    Merge tag 'perf-tools-for-v6.2-1-2022-12-16' of... · aa4800e3
    Linus Torvalds authored
    Merge tag 'perf-tools-for-v6.2-1-2022-12-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
    
    Pull perf tools updates from Arnaldo Carvalho de Melo:
     "Libraries:
    
       - Drop the old copy of libtraceevent in tools/lib/traceevent/ now
         that all major distros ship it from its external repository.
    
         libtraceevent is now just another feature detection. When the
         libtraceevent-dev[el] package isn't installed, a warning is emitted
         and the perf features and tools that strictly require parsing
         things from tracefs are not built. The core functionality stays
         present and working with a subset of the events, the most used ones
         such as CPU cycles, hardware cache and vendor events, etc.
    
         This was tested with lots of containers for Fedora, Debian,
         OpenSUSE, Alpine Linux, Ubuntu, with cross builds, etc.
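The "just another feature detection" approach amounts to compile-time gating. Here is a minimal sketch of the idea. It assumes the build defines a HAVE_LIBTRACEEVENT macro only when the library is found, and that macro name should be treated as an assumption. The event classes and the helper are hypothetical illustrations, not perf's actual code:

```c
#include <stdbool.h>

/* Hypothetical event classes; only tracepoints need tracefs format parsing. */
enum event_kind {
	EV_HARDWARE,    /* e.g. CPU cycles */
	EV_HW_CACHE,    /* hardware cache events */
	EV_VENDOR_PMU,  /* vendor events described by JSON files */
	EV_TRACEPOINT,  /* needs libtraceevent to parse tracefs formats */
};

/* HAVE_LIBTRACEEVENT is assumed to be defined by the build's feature
 * detection only when libtraceevent-dev[el] was found. */
static bool event_kind_supported(enum event_kind kind)
{
#ifdef HAVE_LIBTRACEEVENT
	(void)kind;
	return true;
#else
	/* Without the library, everything except tracepoints keeps working. */
	return kind != EV_TRACEPOINT;
#endif
}
```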
    
      Build:
    
       - Update the C standard to gnu11, as was done for the kernel.
    
       - Install the tools/lib/ libraries locally instead of having headers
         searched directly from the source code directories, to help the
         cases where we can build either from in-kernel source libraries or
         from the same library shipped as a distro package, as is the case
         with libbpf and was the case with libtraceevent.
    
      perf stat:
    
       - Do not delay the workload with --delay: the delay only postpones
         counting the events, to skip noise at workload startup.
    
       - When there are events for each cgroup, the metrics are now printed
         for each cgroup separately:
    
            $ perf stat -a --for-each-cgroup system.slice,user.slice --metric-only sleep 1
    
            Performance counter stats for 'system wide':
    
                            GHz  insn per cycle  branch-misses of all branches
            system.slice  3.792      0.61                  3.24%
            user.slice    3.661      2.32                  0.37%
    
       - Fix printing field separator in CSV metrics output.
    
       - Fix --metric-only --json output.
    
       - Fix summary output in CSV with --metric-only.
    
       - Update the event group check to support uncore events.
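The per-cgroup metric fix is easier to see with the arithmetic written out. Each cgroup's metrics are computed only from that cgroup's own counters. This is a minimal sketch that assumes GHz is cycles divided by task-clock in nanoseconds, IPC is instructions divided by cycles, and branch-miss percentage is branch-misses over branches. The structs are hypothetical and are not perf's internal types:

```c
/* Hypothetical per-cgroup totals, as --for-each-cgroup would collect them. */
struct cgroup_counts {
	const char *name;
	double task_clock_ns;   /* task-clock, in nanoseconds */
	double cycles;
	double instructions;
	double branches;
	double branch_misses;
};

struct cgroup_metrics {
	double ghz;             /* cycles per nanosecond of task-clock */
	double ipc;             /* instructions per cycle */
	double branch_miss_pct; /* branch-misses as a percentage of branches */
};

/* Metrics use only this cgroup's counters, never a system-wide sum;
 * a zero denominator leaves the metric at 0. */
static struct cgroup_metrics compute_metrics(const struct cgroup_counts *c)
{
	struct cgroup_metrics m = { 0.0, 0.0, 0.0 };

	if (c->task_clock_ns > 0)
		m.ghz = c->cycles / c->task_clock_ns;
	if (c->cycles > 0)
		m.ipc = c->instructions / c->cycles;
	if (c->branches > 0)
		m.branch_miss_pct = 100.0 * c->branch_misses / c->branches;
	return m;
}
```

With this design, printing one row per cgroup is just a loop over an array of `struct cgroup_counts`.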
    
      perf test:
    
       - Stop requiring a C toolchain in shell tests: the C snippets that
         the tests previously compiled are now built into perf as workloads
         run via 'perf test -w', which the 'perf test' shell scripts then
         use.
    
       - Add an event group test for events in multiple PMUs.
    
       - The "kernel lock contention analysis" test should not print
         warnings in quiet mode.
    
       - Add attr tests for ARM64's new VG register.
    
       - Fix the record test on KVM guests: using the precise flag with the
         br_inst_retired.near_call event causes the test to fail on KVM
         guests, even when the guests have PMU forwarding enabled and the
         event itself is supported, so just remove the precise flag from the
         event.
    
       - Add mechanism for skipping attr tests on specific kernel versions
         where it is known that these checks will fail.
    
       - Skip watchpoint tests if no watchpoints are available.
    
       - Add more Intel PT 'perf test' entries: hybrid CPU tests, and split
         the packet decoder test into a suite of subtests.
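Skipping attr tests by kernel version comes down to parsing the running kernel's release string and comparing it against bounds. Here is a minimal sketch. The "since <= running < until" rule, the version encoding and all helper names are hypothetical and are not the actual format of perf's attr test files. uname(2) is the standard way to get the release string:

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Encode "major.minor" so versions compare as plain integers. */
static long version_code(int major, int minor)
{
	return major * 1000L + minor;
}

/* Hypothetical skip rule: a test applies when since <= running < until,
 * where until == 0 means "no upper bound". Unparseable releases are never
 * skipped. */
static bool attr_test_applies(const char *release, long since, long until)
{
	int major, minor;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return true;

	long running = version_code(major, minor);

	if (running < since)
		return false;
	if (until != 0 && running >= until)
		return false;
	return true;
}

/* The running kernel's release string, e.g. "6.1.0-rc5", or NULL on error. */
static const char *running_release(struct utsname *u)
{
	return uname(u) == 0 ? u->release : NULL;
}
```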
    
      perf script:
    
       - Introduce a task analyzer python script. First record some events,
         which can be done in two ways:
    
            $ perf script record tasks-analyzer -- sleep 10
            $ perf record -e sched:sched_switch -a -- sleep 10
    
         The script can parse any perf.data file, as long as it contains
         sched:sched_switch events; other events are ignored.
    
         The simplest report use case is to just call the script without
         arguments.
    
         Runtime is the time the task was running on the CPU. Time Out-In is
         the time between the process being scheduled *out* and scheduled
         back *in*, i.e. the last time span between two executions:
    
            $ perf script report tasks-analyzer
                Switched-In     Switched-Out CPU    PID    TID             Comm  Runtime  Time Out-In
            15576.658891407  15576.659156086   4   2412   2428            gdbus      265         1949
            15576.659111320  15576.659455410   0   2412   2412      gnome-shell      344         2267
            15576.659491326  15576.659506173   2     74     74      kworker/2:1       15        13145
            15576.659506173  15576.659825748   2   2858   2858  gnome-terminal-      320        63263
            15576.659871270  15576.659902872   6  20932  20932    kworker/u16:0       32      2314582
            15576.659909951  15576.659945501   3  27264  27264               sh       36           -1
            15576.659853285  15576.659971052   7  27265  27265             perf      118      5050741
            [...]
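The Runtime and Time Out-In columns can be reproduced with a little per-task bookkeeping. This is a minimal C sketch, not the script's Python implementation. It assumes both columns are in microseconds, which matches the Runtime values above (e.g. 15576.659156086 - 15576.658891407 s rounds to 265 us). It also assumes -1 marks a task's first observed execution, as the sh row suggests. All names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 64

/* Per-TID state: when the task was last switched out (nanoseconds). */
struct task_state {
	int tid;
	uint64_t last_out_ns;
};

struct analyzer {
	struct task_state tasks[MAX_TASKS];
	size_t n;
};

static struct task_state *find_task(struct analyzer *a, int tid)
{
	for (size_t i = 0; i < a->n; i++)
		if (a->tasks[i].tid == tid)
			return &a->tasks[i];
	return NULL;
}

/* Nanoseconds to microseconds, rounding half up. */
static int64_t ns_to_us(uint64_t ns)
{
	return (int64_t)((ns + 500) / 1000);
}

/* Account one execution of tid, switched in at in_ns and out at out_ns.
 * Returns the runtime in microseconds and stores Time Out-In (microseconds
 * since the task was last switched out) in *out_in_us, or -1 when this is
 * the first execution seen for the task. */
static int64_t record_execution(struct analyzer *a, int tid, uint64_t in_ns,
				uint64_t out_ns, int64_t *out_in_us)
{
	struct task_state *t = find_task(a, tid);

	if (t) {
		*out_in_us = ns_to_us(in_ns - t->last_out_ns);
	} else {
		*out_in_us = -1;
		if (a->n < MAX_TASKS) {
			t = &a->tasks[a->n++];
			t->tid = tid;
		}
	}
	if (t)
		t->last_out_ns = out_ns;
	return ns_to_us(out_ns - in_ns);
}
```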
    
      perf lock:
    
       - Allow concurrent record and report to support live monitoring of
         kernel lock contention without BPF:
    
            # perf lock record -a -o- sleep 1 | perf lock contention -i-
             contended   total wait     max wait     avg wait         type   caller
    
                     2     10.27 us      6.17 us      5.13 us     spinlock   load_balance+0xc03
                     1      5.29 us      5.29 us      5.29 us     rwlock:W   ep_scan_ready_list+0x54
                     1      4.12 us      4.12 us      4.12 us     spinlock   smpboot_thread_fn+0x116
                     1      3.28 us      3.28 us      3.28 us        mutex   pipe_read+0x50
    
       - Implement -t/--threads option when using BPF:
    
            $ sudo ./perf lock contention -abt -E 5 sleep 1
             contended  total wait   max wait   avg wait      pid  comm
    
                     1   740.66 ms  740.66 ms  740.66 ms     1950  nv_queue
                     3   305.50 ms  298.19 ms  101.83 ms     1884  nvidia-modeset/
                     1    25.14 us   25.14 us   25.14 us  2725038  EventManager_De
                    12    23.09 us    9.30 us    1.92 us        0  swapper
                     1    20.18 us   20.18 us   20.18 us  2725033  EventManager_De
    
       - Add -l/--lock-addr to aggregate per-lock-instance contention:
    
            $ sudo ./perf lock contention -abl sleep 1
             contended  total wait  max wait  avg wait           address  symbol
    
                     1    36.28 us  36.28 us  36.28 us  ffff92615d6448b8
                     9    10.91 us   1.84 us   1.21 us  ffffffffbaed50c0  rcu_state
                     1    10.49 us  10.49 us  10.49 us  ffff9262ac4f0c80
                     8     4.68 us   1.67 us    585 ns  ffffffffbae07a40  jiffies_lock
                     3     3.03 us   1.45 us   1.01 us  ffff9262277861e0
                     1      924 ns    924 ns    924 ns  ffff926095ba9d20
                     1      436 ns    436 ns    436 ns  ffff9260bfda4f60
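Both -t and -l aggregate the same four columns and differ only in the key: a pid for -t and a lock address for -l. Here is a minimal sketch of that aggregation. It uses a hypothetical fixed-size table, whereas perf itself collects this through BPF maps. The average is total wait divided by the contended count, so 8 contentions totalling 4.68 us give the 585 ns shown for jiffies_lock:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aggregation key: a lock address for -l, or a pid for -t. */
struct contention_stat {
	uint64_t key;
	unsigned int contended;
	uint64_t total_ns;
	uint64_t max_ns;
};

#define MAX_KEYS 128

struct contention_table {
	struct contention_stat stats[MAX_KEYS];
	size_t n;
};

/* Account one contention event of wait_ns nanoseconds against key.
 * Returns -1 if the table is full and key is new, else 0. */
static int account_contention(struct contention_table *t, uint64_t key,
			      uint64_t wait_ns)
{
	struct contention_stat *s = NULL;

	for (size_t i = 0; i < t->n; i++) {
		if (t->stats[i].key == key) {
			s = &t->stats[i];
			break;
		}
	}
	if (!s) {
		if (t->n == MAX_KEYS)
			return -1;
		s = &t->stats[t->n++];
		*s = (struct contention_stat){ .key = key };
	}
	s->contended++;
	s->total_ns += wait_ns;
	if (wait_ns > s->max_ns)
		s->max_ns = wait_ns;
	return 0;
}

/* Average wait: total wait divided by the number of contentions. */
static uint64_t avg_wait_ns(const struct contention_stat *s)
{
	return s->contended ? s->total_ns / s->contended : 0;
}
```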
    
      perf record:
    
       - Add remaining branch filters: "no_cycles", "no_flags" & "hw_index",
         to be used with hardware such as Intel's LBR that allows things
         like stitching stacks of two samples to overcome the limits of the
         number of LBR registers.
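Branch filters end up as bits in perf_event_attr.branch_sample_type. This is a minimal sketch of parsing a comma-separated filter list into that mask. It uses the PERF_SAMPLE_BRANCH_* constants from the Linux uapi header <linux/perf_event.h>, and the header must be recent enough to define PERF_SAMPLE_BRANCH_HW_INDEX. The name table only partially mirrors the names 'perf record -j' accepts, and the parser is a hypothetical illustration:

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Filter names mapped to branch_sample_type bits from the uapi header. */
static const struct {
	const char *name;
	uint64_t bit;
} branch_filters[] = {
	{ "u",         PERF_SAMPLE_BRANCH_USER },
	{ "k",         PERF_SAMPLE_BRANCH_KERNEL },
	{ "any",       PERF_SAMPLE_BRANCH_ANY },
	{ "no_flags",  PERF_SAMPLE_BRANCH_NO_FLAGS },
	{ "no_cycles", PERF_SAMPLE_BRANCH_NO_CYCLES },
	{ "hw_index",  PERF_SAMPLE_BRANCH_HW_INDEX },
};

/* Parse e.g. "any,no_cycles,hw_index" into *mask.
 * Returns 0 on success, -1 on an unknown or empty filter name. */
static int parse_branch_filters(const char *spec, uint64_t *mask)
{
	size_t n = sizeof(branch_filters) / sizeof(branch_filters[0]);

	*mask = 0;
	while (*spec) {
		const char *end = strchr(spec, ',');
		size_t len = end ? (size_t)(end - spec) : strlen(spec);
		size_t i;

		for (i = 0; i < n; i++) {
			if (strlen(branch_filters[i].name) == len &&
			    strncmp(spec, branch_filters[i].name, len) == 0)
				break;
		}
		if (i == n)
			return -1;
		*mask |= branch_filters[i].bit;
		spec += len;
		if (*spec == ',')
			spec++;
	}
	return 0;
}
```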
    
      Symbol resolution:
    
       - Handle .debug files created with 'objcopy --only-keep-debug', whose
         program headers are zeroed and thus can't be used for adjustments;
         use the info in runtime_ss (the runtime ELF) instead.
    
      perf trace:
    
       - Add BPF based augmenter for the 'perf_event_open's 'struct
         perf_event_attr' argument.
    
       - Add BPF based augmenter for the 'clock_gettime's 'struct timespec'
         argument.
    
       - In both cases the syscall tracepoint has just the pointer value, so
         a BPF program has to be hooked to collect the pointer contents,
         which 'perf trace' then pretty prints in userspace.
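The userspace half of such an augmenter decodes the bytes the BPF program copied and formats them. This is a minimal sketch for the 'struct timespec' case. It assumes the BPF side has already placed sizeof(struct timespec) bytes in a buffer. The function name and the output format are hypothetical and are not what 'perf trace' actually prints:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Decode raw bytes (assumed copied from the user pointer by a BPF program)
 * into a struct timespec and pretty print it into out. */
static int format_timespec_arg(char *out, size_t outlen,
			       const void *raw, size_t rawlen)
{
	struct timespec ts;

	if (rawlen < sizeof(ts))
		return snprintf(out, outlen, "(not augmented)");
	memcpy(&ts, raw, sizeof(ts)); /* avoid unaligned access to raw */
	return snprintf(out, outlen, "{ .tv_sec: %lld, .tv_nsec: %ld }",
			(long long)ts.tv_sec, (long)ts.tv_nsec);
}
```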
    
      perf list:
    
       - Introduce JSON output of events.
    
       - Streamline how the expression specifying which events to show is
         handled, fixing several corner cases, such as the metric filter,
         which is specified as a glob but was matched using strstr().
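The metric filter bug is the difference between substring matching and glob matching. strstr() treats '*' as a literal character, while POSIX fnmatch() interprets it as a wildcard. This is a minimal sketch. The helper names are hypothetical, and 'tma_retiring' is only an example metric name:

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Substring matching: '*' is literal, so "tma_*" never matches. */
static bool match_substring(const char *name, const char *filter)
{
	return strstr(name, filter) != NULL;
}

/* Glob matching: '*', '?' and [...] act as wildcards, and the whole
 * name must match the pattern. */
static bool match_glob(const char *name, const char *filter)
{
	return fnmatch(filter, name, 0) == 0;
}
```

Note that a glob anchors at both ends, so "retiring" alone no longer matches "tma_retiring" the way a substring search would.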
    
      perf probe:
    
       - Avoid crashing if DW_AT_decl_file is NULL, coping with DWARF5
         generated that way by clang.
    
       - Use dwarf_attr_integrate() as the generic DWARF attr accessor, as it
         supersedes dwarf_attr() and supports abstract origin DIEs.
    
      perf inject:
    
       - Set PERF_RECORD_MISC_BUILD_ID_SIZE in the PERF_RECORD_HEADER_BUILD_ID
         so that perf.data readers can get the real build-id size and avoid
         trailing zeroes.
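Why the size field matters can be seen with a shorter build-id (for example a 16-byte MD5 one) stored in a 20-byte field. Without the size, a reader must assume 20 bytes and prints trailing zeroes. This is a minimal sketch. The struct is a simplified, hypothetical stand-in for the real record layout, and the flag's bit value is an assumption standing in for PERF_RECORD_MISC_BUILD_ID_SIZE:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BUILD_ID_SIZE 20              /* SHA-1 sized build-id field */
#define MISC_BUILD_ID_SIZE (1u << 15) /* assumed stand-in flag value */

/* Simplified, hypothetical stand-in for a build-id record. */
struct build_id_rec {
	uint16_t misc;                /* header misc flags */
	uint8_t data[BUILD_ID_SIZE];
	uint8_t size;                 /* meaningful only if the flag is set */
};

/* Hex-encode the build-id into out. Returns the hex length, or 0 if out
 * is too small. Without the flag, all 20 bytes are printed. */
static size_t build_id_hex(const struct build_id_rec *r, char *out,
			   size_t outlen)
{
	size_t n = BUILD_ID_SIZE;

	if ((r->misc & MISC_BUILD_ID_SIZE) && r->size <= BUILD_ID_SIZE)
		n = r->size;
	if (outlen < 2 * n + 1)
		return 0;
	for (size_t i = 0; i < n; i++)
		snprintf(out + 2 * i, 3, "%02x", r->data[i]);
	out[2 * n] = '\0';
	return 2 * n;
}
```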
    
      perf data:
    
       - Add tracepoint fields when converting a perf.data file to JSON.
    
      arm64:
    
       - Fix mksyscalltbl so that syscalls aren't lost due to 'sort -nu'.
    
       - Add Arm Neoverse V2 PMU events.
    
      riscv:
    
       - Add riscv sbi firmware std event files.
    
       - Add Sifive U74 vendor events (JSON) file.
    
      Intel:
    
       - Add some more events and metrics for Alderlake/Alderlake-N.
    
      Documentation:
    
       - Add data documentation for the PMU structs in the C source code.
    
      Miscellaneous:
    
       - Periodic sanitization of headers, adding missing includes, removing
         needless ones, creating new ones, etc.
    
       - Use sig_atomic_t for variables set in signal handlers, avoiding
         undefined behaviour, in all perf tools.
    
       - Fixes for libbpf 1.0+ compatibility (maps, etc) on 'perf trace' BPF
         examples.
    
       - Remove some old perf bpf examples, leave the best ones that
         demonstrate how to associate BPF functions to points in the kernel.
    
       - Make quiet mode consistent between tools.
    
       - Use dedicated non-atomic clear/set bit helpers.
    
       - Use "grep -E" instead of "egrep", as recommended by the warning
         that GNU grep has emitted since at least version 3.8.
    
       - Complete the list of supported subcommands in the 'perf daemon'
         help message.
    
       - Update John Garry's email address for arm64 perf tooling in the
         MAINTAINERS file; he moved from Huawei to Oracle."
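The sig_atomic_t change follows the C rule for signal handlers. For an asynchronously delivered signal, a handler may only assign to a static object declared volatile sig_atomic_t (or a lock-free atomic); touching other static objects is undefined behaviour. This is a minimal sketch of the pattern. The flag and helper names are hypothetical, and perf's tools use their own names:

```c
#include <signal.h>

/* The only kind of plain static object a handler may portably write. */
static volatile sig_atomic_t done;

static void sig_handler(int sig)
{
	(void)sig;
	done = 1; /* the main loop polls this flag, e.g. while (!done) */
}

static int install_handler(int signo)
{
	return signal(signo, sig_handler) == SIG_ERR ? -1 : 0;
}
```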
    
    * tag 'perf-tools-for-v6.2-1-2022-12-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (239 commits)
      libperf: Fix install_pkgconfig target
      perf tools: Use "grep -E" instead of "egrep"
      perf stat: Do not delay the workload with --delay
      perf evlist: Remove group option.
      perf build: Fix python/perf.so library's name
      perf test arm64: Add attr tests for new VG register
      perf test: Add mechanism for skipping attr tests on kernel versions
      perf test: Add mechanism for skipping attr tests on auxiliary vector values
      perf test: Add ability to test exit code for attr tests
      perf test: add new task-analyzer tests
      perf script: task-analyzer add csv support
      perf script: Introduce task analyzer python script
      perf cs-etm: Print auxtrace info even if OpenCSD isn't linked
      perf cs-etm: Cleanup cs_etm__process_auxtrace_info()
      perf cs-etm: Tidy up auxtrace info header printing
      perf cs-etm: Remove unused stub methods
      perf cs-etm: Print unknown header version as an error
      perf test: Update perf lock contention test
      perf lock contention: Add -l/--lock-addr option
      perf lock contention: Implement -t/--threads option for BPF
      ...