perf evlist: Use unshare(CLONE_FS) in sb threads to let setns(CLONE_NEWNS) work
When we started using a thread to catch the PERF_RECORD_BPF_EVENT meta data events to then ask the kernel for further info (BTF, etc) for BPF programs shortly after they get loaded, we forgot to use unshare(CLONE_FS) as was done in: 868a8329 ("perf top: Support lookup of symbols in other mount namespaces.") Do it so that we can enter the namespaces to read the build-ids at the end of a 'perf record' session for the DSOs that had hits. Before: Starting a 'stress-ng --cpus 8' inside a container and then, outside the container running: # perf record -a --namespaces sleep 5 # perf buildid-list | grep stress-ng # We would end up with a 'perf.data' file that had no entry in its build-id table for the /usr/bin/stress-ng binary inside the container that got tons of PERF_RECORD_SAMPLEs. After: # perf buildid-list | grep stress-ng f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/bin/stress-ng # Then its just a matter of making sure that that binary debuginfo package gets available in a place that 'perf report' will look at build-id keyed ELF files, which, in my case, on a f30 notebook, was a matter of installing the debuginfo file for the distro used in the container, fedora 31: # rpm -ivh http://fedora.c3sl.ufpr.br/linux/development/31/Everything/x86_64/debug/tree/Packages/s/stress-ng-debuginfo-0.07.29-10.fc31.x86_64.rpm Then, because perf currently looks for those debuginfo files (richer ELF symtab) inside that namespace (look at the setns calls): openat(AT_FDCWD, "/proc/self/ns/mnt", O_RDONLY) = 137 openat(AT_FDCWD, "/proc/13169/ns/mnt", O_RDONLY) = 139 setns(139, CLONE_NEWNS) = 0 stat("/usr/bin/stress-ng", {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0 openat(AT_FDCWD, "/usr/bin/stress-ng", O_RDONLY) = 140 fcntl(140, F_GETFD) = 0 fstat(140, {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0 mmap(NULL, 3065416, PROT_READ, MAP_PRIVATE, 140, 0) = 0x7ff2fdc5b000 munmap(0x7ff2fdc5b000, 3065416) = 0 close(140) = 0 stat("stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory) stat("/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory) stat("/usr/bin/.debug/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory) stat("/usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory) stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29", 0x7fff45d711e0) = -1 ENOENT (No such file or directory) To only then go back to the "host" namespace to look just in the users's ~/.debug cache: setns(137, CLONE_NEWNS) = 0 chdir("/root") = 0 close(137) = 0 close(139) = 0 stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf", 0x7fff45d732e0) = -1 ENOENT (No such file or directory) It continues to fail to resolve symbols: # perf report | grep stress-ng | head -5 9.50% stress-ng-cpu stress-ng [.] 0x0000000000021ac1 8.58% stress-ng-cpu stress-ng [.] 0x0000000000021ab4 8.51% stress-ng-cpu stress-ng [.] 0x0000000000021489 7.17% stress-ng-cpu stress-ng [.] 0x00000000000219b6 3.93% stress-ng-cpu stress-ng [.] 0x0000000000021478 # To overcome that we use: # perf buildid-cache -v --add /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug Adding f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug: Ok # # ls -la /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf -rw-r--r--. 3 root root 2401184 Jul 27 07:03 /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf # file /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter \004, BuildID[sha1]=f2ed02c68341183a124b9b0f6e2e6c493c465b29, for GNU/Linux 3.2.0, with debug_info, not stripped, too many notes (256) # Now it finally works: # perf report | grep stress-ng | head -5 23.59% stress-ng-cpu stress-ng [.] ackermann 23.33% stress-ng-cpu stress-ng [.] is_prime 17.36% stress-ng-cpu stress-ng [.] stress_cpu_sieve 6.08% stress-ng-cpu stress-ng [.] stress_cpu_correlate 3.55% stress-ng-cpu stress-ng [.] queens_try # I'll make sure that it looks for the build-id keyed files in both the "host" namespace (the namespace the user running 'perf record' was a the time of the recording) and in the container namespace, as it shouldn't matter where a content based key lookup finds the ELF file to use in resolving symbols, etc. Reported-by: Karl Rister <krister@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stanislav Fomichev <sdf@google.com> Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Fixes: 657ee553 ("perf evlist: Introduce side band thread") Link: https://lkml.kernel.org/n/tip-g79k0jz41adiaeuqud742t2l@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Showing
Please register or sign in to comment