    perf/core: Add PERF_SAMPLE_WEIGHT_STRUCT · 2a6c6b7d
    Kan Liang authored
    The current PERF_SAMPLE_WEIGHT sample type is very useful for expressing
    the cost of an action represented by the sample. This allows the
    profiler to scale the samples to be more informative to the programmer.
    It can also help to locate a hotspot, e.g., when profiling by memory
    latencies, the expensive loads appear higher up in the histograms. But
    the current PERF_SAMPLE_WEIGHT sample type is solely determined by one
    factor. This could be a problem if users want two or more factors to
    contribute to the weight. For example, the Golden Cove core PMU can
    provide both the instruction latency and the cache latency information
    as factors for memory profiling.
    
    For current X86 platforms, although meminfo::latency is defined as a
    u64, only the lower 32 bits hold valid data in practice (no memory
    access could last longer than 4G cycles). The higher 32 bits can be
    used to store new factors.
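
    The split described above can be sketched with plain shifts and masks.
    The helper names below are hypothetical and are not part of the kernel;
    they only illustrate that a value confined to the lower 32 bits leaves
    the upper 32 bits free for another factor.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not kernel API: the low 32 bits carry the
 * existing latency value, the high 32 bits are room for new factors.
 */
static inline uint32_t weight_low32(uint64_t weight)
{
	return (uint32_t)(weight & 0xffffffffULL);
}

static inline uint32_t weight_high32(uint64_t weight)
{
	return (uint32_t)(weight >> 32);
}

static inline uint64_t weight_pack(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}
```

    With the upper half left at zero, the packed value equals the plain
    32-bit latency, so an existing reader of the full u64 sees the same
    number as before.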
    
    Add a new sample type, PERF_SAMPLE_WEIGHT_STRUCT, to indicate the new
    sample weight structure. It shares the same space as the
    PERF_SAMPLE_WEIGHT sample type.
    
    Users can apply either the PERF_SAMPLE_WEIGHT sample type or the
    PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but
    they cannot apply both sample types simultaneously.
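
    A minimal sketch of how such a mutual-exclusion check could look when
    validating a requested sample_type mask. The function name and the
    SKETCH_ constants are illustrative; the bit positions are assumed to
    match include/uapi/linux/perf_event.h and should be checked against
    the header in use.

```c
#include <errno.h>
#include <stdint.h>

/* Local copies for illustration; bit positions are an assumption. */
#define SKETCH_PERF_SAMPLE_WEIGHT		(1ULL << 14)
#define SKETCH_PERF_SAMPLE_WEIGHT_STRUCT	(1ULL << 24)

/* Hypothetical validator: reject a mask that requests both weight types. */
static int check_weight_sample_type(uint64_t sample_type)
{
	const uint64_t both = SKETCH_PERF_SAMPLE_WEIGHT |
			      SKETCH_PERF_SAMPLE_WEIGHT_STRUCT;

	if ((sample_type & both) == both)
		return -EINVAL;

	return 0;
}
```

    Either bit alone passes, and so does a mask with neither bit set; only
    the combination is rejected.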
    
    Currently, only X86 and PowerPC use the PERF_SAMPLE_WEIGHT sample type.
    - For PowerPC, nothing changes for the PERF_SAMPLE_WEIGHT sample type,
      and the new PERF_SAMPLE_WEIGHT_STRUCT sample type has no effect.
      PowerPC can restructure the weight field similarly later.
    - For X86, the same value is dumped for the PERF_SAMPLE_WEIGHT sample
      type and the PERF_SAMPLE_WEIGHT_STRUCT sample type for now. The
      following patches will apply the new factors for the
      PERF_SAMPLE_WEIGHT_STRUCT sample type.
    
    The fields in the union perf_sample_weight should be shared among
    different architectures. A generic name is required, but it's hard to
    abstract a name that applies to all architectures. For example, on X86,
    the fields store various kinds of latency, while on PowerPC the field
    stores MMCRA[TECX/TECM], which should not be treated as a latency. So
    a general name prefix 'var$NUM' is used here.
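
    As an illustration of the shared space and the 'var$NUM' naming, the
    sketch below overlays one 64-bit value with three generically named
    fields. The type name is hypothetical, and the field widths and order
    are assumptions for a little-endian layout; the authoritative
    definition is the union perf_sample_weight added by this patch.

```c
#include <stdint.h>

/*
 * Sketch only, little-endian layout assumed. 'full' is the view used
 * for PERF_SAMPLE_WEIGHT; the var* fields are the view used for
 * PERF_SAMPLE_WEIGHT_STRUCT. Both views occupy the same 8 bytes.
 */
union sample_weight_sketch {
	uint64_t full;
	struct {
		uint32_t var1_dw;	/* lower 32 bits, e.g. a latency on X86 */
		uint16_t var2_w;	/* architecture-specific factor */
		uint16_t var3_w;	/* architecture-specific factor */
	};
};
```

    The neutral names leave each architecture free to document what it
    stores in each field without implying that the value is a latency.
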

    Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lkml.kernel.org/r/1611873611-156687-2-git-send-email-kan.liang@linux.intel.com
core-book3s.c 61.1 KB