    perf record: Enable off-cpu analysis with BPF · edc41a10
    Namhyung Kim authored
    Add the --off-cpu option to enable off-cpu profiling with BPF.  It
    uses a bpf_output event and renames it to "offcpu-time".  Samples
    are synthesized at the end of the record session from a BPF map
    that holds the off-cpu time aggregated at context switches.  Because
    it relies on BPF, off-cpu profiling requires root privilege.
    
    Each sample has a separate user stacktrace, so kernel threads are
    skipped.  The sample ip is set from the stacktrace and other sample
    data is updated accordingly.  Currently only some basic sample types
    are handled.
    
    The sample timestamp is set to a dummy value so that the samples do
    not interfere with other events during sorting.  It starts from a
    very large initial value and is incremented for each sample
    processed.
    
    A nice property is that it can be used together with regular
    profiling such as cpu cycles.  If you don't want that, you can use a
    dummy event to enable off-cpu profiling only.
    
    Example output:
      $ sudo perf record --off-cpu perf bench sched messaging -l 1000
    
      $ sudo perf report --stdio --call-graph=no
      # Total Lost Samples: 0
      #
      # Samples: 41K of event 'cycles'
      # Event count (approx.): 42137343851
      ...
    
      # Samples: 1K of event 'offcpu-time'
      # Event count (approx.): 587990831640
      #
      # Children      Self  Command          Shared Object       Symbol
      # ........  ........  ...............  ..................  .........................
      #
          81.66%     0.00%  sched-messaging  libc-2.33.so        [.] __libc_start_main
          81.66%     0.00%  sched-messaging  perf                [.] cmd_bench
          81.66%     0.00%  sched-messaging  perf                [.] main
          81.66%     0.00%  sched-messaging  perf                [.] run_builtin
          81.43%     0.00%  sched-messaging  perf                [.] bench_sched_messaging
          40.86%    40.86%  sched-messaging  libpthread-2.33.so  [.] __read
          37.66%    37.66%  sched-messaging  libpthread-2.33.so  [.] __write
           2.91%     2.91%  sched-messaging  libc-2.33.so        [.] __poll
      ...
    
    As you can see, it spent most of its off-cpu time in read and write
    in bench_sched_messaging().  The --call-graph=no option was added
    only to keep the output concise here.
    
    It uses the perf hooks facility to control the BPF program during the
    record session rather than adding new BPF/off-cpu specific calls.
    Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    Acked-by: Ian Rogers <irogers@google.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Blake Jones <blakejones@google.com>
    Cc: Hao Luo <haoluo@google.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Milian Wolff <milian.wolff@kdab.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Song Liu <songliubraving@fb.com>
    Cc: bpf@vger.kernel.org
    Link: https://lore.kernel.org/r/20220518224725.742882-3-namhyung@kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>