    perf intel-pt: Synthesize cycle events · 7e55b956
    Steinar H. Gunderson authored
    There is no good reason why we cannot synthesize "cycle" events from
    Intel PT just as we can synthesize "instruction" events, in particular
    when CYC packets are available. This enables using PT to get much
    more accurate cycle profiles than regular sampling (record -e cycles)
    when the work lasts for very short periods (<10 ms).  Thus, add support
    for this, based off of the existing IPC calculation framework. The new
    option to --itrace is "y" (for cYcles), as c was taken for calls. Cycle
    and instruction events can be synthesized together, and are by default.
    
    The only real caveat is that CYC packets are only emitted whenever some
    other packet is, which in practice is when a branch instruction is
    encountered (and not even all branches). Thus, even at no subsampling
    (e.g. --itrace=y0ns), it is impossible to get more accuracy than a
    single basic block, and all cycles spent executing that block will get
    attributed to the branch instruction that ends the packet.  Thus, one
    cannot know whether the cycles came from e.g. a specific load, a
    mispredicted branch, or something else. When subsampling (which is the
    default), the cycle events will get smeared out even more, but will
    still be generally useful to attribute cycle counts to functions.
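
The attribution caveat above can be illustrated with a small Python sketch. This is a simplified model and not the perf decoder: the function names, the `(branch_ip, cycles)` trace format, and the rule of emitting a sample once the pending cycle count reaches the period are all assumptions made for illustration.

```python
from collections import Counter


def synthesize_cycle_events(branches, period=0):
    """Toy model of cycle-event synthesis (hypothetical, not perf's code).

    branches: iterable of (branch_ip, cycles) pairs, where cycles is the
    count carried by the CYC packet that accompanies that branch.
    period:   minimum number of cycles between emitted samples; 0 models
              "no subsampling".
    Returns a list of (ip, cycles) samples.
    """
    samples = []
    pending = 0
    for ip, cycles in branches:
        pending += cycles
        # All pending cycles are charged to the branch that is current
        # when the sample is emitted.
        if pending > 0 and pending >= period:
            samples.append((ip, pending))
            pending = 0
    # Cycles still pending at the end of the trace are dropped in this model.
    return samples


def profile(samples):
    """Sum the sampled cycles per instruction address."""
    totals = Counter()
    for ip, cycles in samples:
        totals[ip] += cycles
    return dict(totals)


# Made-up trace: three basic blocks ending in branches at 0x10, 0x20, 0x30.
trace = [(0x10, 5), (0x20, 40), (0x30, 3), (0x10, 5), (0x20, 42)]

exact = profile(synthesize_cycle_events(trace, period=0))
coarse = profile(synthesize_cycle_events(trace, period=50))
```

With `period=0`, each block's cycles land on the branch that ends it, so the block ending at `0x20` is visibly the expensive one, though nothing says which instruction inside that block was slow. With `period=50`, the same cycles are charged to whichever branch happens to cross the threshold (`0x10` here), which is the smearing described above.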
    Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
    Signed-off-by: Steinar H. Gunderson <sesse@google.com>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20220322082452.1429091-1-sesse@google.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf-intel-pt.txt 87.8 KB