    drivers/perf: Add support for ARMv8.2 Statistical Profiling Extension · d5d9696b
    Will Deacon authored
    The ARMv8.2 architecture introduces the optional Statistical Profiling
    Extension (SPE).
    
    SPE can be used to profile a population of operations in the CPU pipeline
    after instruction decode. These are either architected instructions (i.e.
    a dynamic instruction trace) or CPU-specific uops; the choice is fixed
    statically in the hardware and advertised to userspace via caps/. Sampling
    is controlled using a sampling interval, similar to a regular PMU counter,
    but also with an optional random perturbation to avoid falling into patterns
    where you continuously profile the same instruction in a hot loop.
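
As a rough illustration of the interval-plus-perturbation idea, the sketch below computes a reload value for the interval counter. Everything in it is an assumption made for illustration: the names `next_random` and `reload_interval`, the xorshift generator standing in for the hardware's random source, and the `jitter_mask` parameter. The commit message does not say how the hardware derives its perturbation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical xorshift32 PRNG standing in for the hardware's random source. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Reload value for the interval counter: the programmed interval plus,
 * when jitter is enabled, a pseudo-random value in [0, jitter_mask].
 * Both parameters are illustrative, not architected fields.
 */
static uint32_t reload_interval(uint32_t interval, int jitter,
                                uint32_t jitter_mask, uint32_t *rng_state)
{
    assert(*rng_state != 0); /* xorshift never leaves the all-zero state */

    if (!jitter)
        return interval;
    return interval + (next_random(rng_state) & jitter_mask);
}
```

With jitter disabled, a loop whose length divides the interval would hit the same instruction on every sample; adding a few random low-order bits to each reload breaks that alignment.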
    
    After each operation is decoded, the interval counter is decremented. When
    it hits zero, an operation is chosen for profiling and tracked within the
    pipeline until it retires. Along the way, information such as TLB lookups,
    cache misses, time spent to issue, etc. is captured in the form of a sample.
    The sample is then filtered according to certain criteria (e.g. load
    latency) that can be specified in the event config (described under
    format/) and, if the sample satisfies the filter, it is written out to
    memory as a record, otherwise it is discarded. Only one operation can
    be sampled at a time.
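
The select-then-filter flow described above can be modelled in a few lines of C. This is a simplified software model, not the hardware or the driver: `struct sample`, `struct sampler`, and the single `min_latency` filter are hypothetical, and tracking an operation through the pipeline is collapsed into one function call, so the "one operation at a time" rule holds trivially.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample; the fields are illustrative, not the architected layout. */
struct sample {
    uint64_t pc;
    uint32_t latency; /* cycles */
    bool is_load;
};

struct sampler {
    uint32_t interval;    /* reload value, must be non-zero */
    uint32_t counter;     /* decremented once per decoded operation */
    uint32_t min_latency; /* filter threshold; 0 lets every sample through */
    size_t nr_records;    /* samples written out as records */
    size_t nr_discarded;  /* samples rejected by the filter */
};

static void sampler_init(struct sampler *s, uint32_t interval,
                         uint32_t min_latency)
{
    assert(interval > 0);
    s->interval = interval;
    s->counter = interval;
    s->min_latency = min_latency;
    s->nr_records = 0;
    s->nr_discarded = 0;
}

/* Called for each decoded operation; returns true if a record is written. */
static bool sampler_decode(struct sampler *s, const struct sample *op)
{
    if (--s->counter != 0)
        return false; /* not selected for profiling */

    s->counter = s->interval; /* re-arm for the next sample */

    if (op->latency < s->min_latency) {
        s->nr_discarded++; /* sampled, but rejected by the filter */
        return false;
    }

    s->nr_records++; /* sampled and written out as a record */
    return true;
}
```

With an interval of 3 and a latency threshold of 10 cycles, every third operation is selected, but only those that took at least 10 cycles end up as records.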
    
    The in-memory buffer is linear and virtually addressed, raising an
    interrupt when it fills up. The PMU driver handles these interrupts to
    give the appearance of a ring buffer, as expected by the AUX code.
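
One way to picture how a linear buffer can be presented as a ring is sketched below: on each buffer-full interrupt, the handler publishes the bytes just written and re-arms a new linear window inside the ring. This is a sketch under stated assumptions rather than the driver's actual logic; `struct aux_ring` and the `ring_*` helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a ring built from successive linear windows. */
struct aux_ring {
    size_t size;      /* total ring size in bytes */
    size_t head;      /* total bytes produced (monotonic) */
    size_t tail;      /* total bytes consumed (monotonic) */
    size_t win_base;  /* start offset of the current linear window */
    size_t win_limit; /* end offset of the current window (exclusive) */
};

/*
 * Program the next linear window. It starts at the producer position and
 * stops at the end of the ring or at unconsumed data, whichever comes
 * first. Returns the window length; 0 means the consumer must catch up.
 */
static size_t ring_arm_window(struct aux_ring *r)
{
    size_t free_bytes, off, to_end, len;

    assert(r->size > 0 && r->head - r->tail <= r->size);

    free_bytes = r->size - (r->head - r->tail);
    off = r->head % r->size;
    to_end = r->size - off;
    len = free_bytes < to_end ? free_bytes : to_end;

    r->win_base = off;
    r->win_limit = off + len;
    return len;
}

/* Buffer-full interrupt: publish the filled window, then re-arm. */
static size_t ring_handle_full(struct aux_ring *r)
{
    r->head += r->win_limit - r->win_base;
    return ring_arm_window(r);
}

/* Consumer side: mark n bytes as read. */
static void ring_consume(struct aux_ring *r, size_t n)
{
    r->tail += n;
}
```

The hardware only ever sees the window `[win_base, win_limit)`; wrap-around is handled in software by choosing where the next window starts.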
    
    The in-memory trace-like format is self-describing (though not parseable
    in reverse) and written as a series of records, with each record
    corresponding to a sample and consisting of a sequence of packets. These
    packets are defined by the architecture, although some have CPU-specific
    fields for recording information specific to the microarchitecture.
    
    As a simple example, a record generated for a branch instruction may
    consist of the following packets:
    
      0 (Address) : Virtual PC of the branch instruction
      1 (Type)    : Conditional direct branch
      2 (Counter) : Number of cycles taken from Dispatch to Issue
      3 (Address) : Virtual branch target + condition flags
      4 (Counter) : Number of cycles taken from Dispatch to Complete
      5 (Events)  : Mispredicted as not-taken
      6 (END)     : End of record
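
To show why such a stream can be walked forwards but not backwards, the sketch below parses one record under a deliberately simplified, hypothetical encoding: each packet is a header byte whose upper four bits give the type and whose lower four bits give the payload length in bytes. This is not the architected SPE packet encoding, and `enum pkt_type` and `record_walk` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet types; the values are illustrative only. */
enum pkt_type {
    PKT_ADDRESS = 1,
    PKT_TYPE    = 2,
    PKT_COUNTER = 3,
    PKT_EVENTS  = 4,
    PKT_END     = 15,
};

/*
 * Walk one record starting at buf. Returns the record length in bytes,
 * including the END packet, and stores the packet count in *nr_packets.
 * Returns 0 if the buffer ends before an END packet is seen.
 */
static size_t record_walk(const uint8_t *buf, size_t len, size_t *nr_packets)
{
    size_t off = 0, n = 0;

    assert(buf && nr_packets);

    while (off < len) {
        uint8_t hdr = buf[off];
        unsigned int type = hdr >> 4;
        size_t payload = hdr & 0xf;

        if (payload > len - off - 1)
            return 0; /* payload runs past the end of the buffer */

        off += 1 + payload;
        n++;

        if (type == PKT_END) {
            *nr_packets = n;
            return off;
        }
    }
    return 0; /* no END packet found */
}
```

Because the length of each packet is only known from its header, a parser starting at the end of the buffer cannot tell payload bytes from header bytes, which is the sense in which the format is not parseable in reverse.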
    
    It is also possible to toggle properties such as timestamp packets in
    each record.
    
    This patch adds support for SPE in the form of a new perf driver.
    
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Reviewed-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
arm_spe_pmu.c 33.1 KB