    perf/x86/intel: Handle multiple records in the PEBS buffer · 21509084
    Yan, Zheng authored
    When the PEBS interrupt threshold is larger than one record and the
    machine supports multiple PEBS events, the records of these events are
    mixed up and we need to demultiplex them.
    
    Demuxing the records is hard because the hardware is deficient. The
    hardware has two issues that, when combined, create impossible
    scenarios to demux.
    
    The first issue is that the 'status' field of the PEBS record is a copy
    of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
    problem let us first describe the regular PEBS cycle:
    
    A) the CTRn value reaches 0:
      - the corresponding bit in GLOBAL_STATUS gets set
      - we start arming the hardware assist
      < some unspecified amount of time later -- this could cover multiple
        events of interest >
    
    B) the hardware assist is armed, any next event will trigger it
    
    C) a matching event happens:
      - the hardware assist triggers and generates a PEBS record
        this includes a copy of GLOBAL_STATUS at this moment
      - if we auto-reload we (re)set CTRn
      - we clear the relevant bit in GLOBAL_STATUS
    
    Now consider the following chain of events:
    
      A0, B0, A1, C0
    
    The record generated for counter 0 will include a status with counter 1
    set, even though counter 1 is not at all related to the record. A similar
    thing can happen with a !PEBS event if it just happens to overflow at the
    right moment.
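
To make the first issue concrete, the cycle can be replayed with a toy model in C. This is only a sketch, not kernel code: `struct toy_pmu`, `toy_overflow()` and `toy_assist()` are hypothetical names, and step B is left out because arming the assist has no visible state in this model.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the GLOBAL_STATUS MSR; not a real hardware interface. */
struct toy_pmu {
	uint64_t global_status;
};

/* Step A: counter 'ctr' reaches 0 and its overflow bit gets set. */
static void toy_overflow(struct toy_pmu *pmu, unsigned int ctr)
{
	assert(ctr < 64);
	pmu->global_status |= 1ULL << ctr;
}

/*
 * Step C: the assist for counter 'ctr' fires. The record's status is a
 * copy of the whole GLOBAL_STATUS at this moment; only afterwards is
 * the bit of the triggering counter cleared.
 */
static uint64_t toy_assist(struct toy_pmu *pmu, unsigned int ctr)
{
	uint64_t record_status;

	assert(ctr < 64);
	record_status = pmu->global_status;
	pmu->global_status &= ~(1ULL << ctr);
	return record_status;
}
```

Replaying A0, B0, A1, C0 as `toy_overflow(&pmu, 0); toy_overflow(&pmu, 1); toy_assist(&pmu, 0);` returns a status with bits 0 and 1 both set, although only counter 0 produced the record.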
    
    The second issue is that the hardware will emit only one record for two
    or more counters if the events that trigger their assists are 'close'
    together. 'Close' can mean several cycles, and in some cases even the
    duration of the complete assist, if the event is something that doesn't
    need retirement.
    
    For instance, consider this chain of events:
    
      A0, B0, A1, B1, C01
    
    Here C01 is an event that triggers both hardware assists. The hardware
    generates only a single record, again with both counters listed in the
    status field.
    
    This time the record pertains to both events.
    
    Note that these two cases are different but indistinguishable with the
    data as generated. Therefore demuxing records with multiple PEBS bits
    (we can safely ignore status bits for !PEBS counters) is impossible.
    
    Furthermore we cannot emit the record to both events because that might
    cause a data leak -- the events might not have the same privileges -- so
    what this patch does is discard such events.
    
    The assumption/hope is that such discards will be rare.
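
The discard rule can be sketched as follows. This is a simplified, self-contained illustration and not the code of this patch: `struct toy_pebs_record`, `toy_pebs_owner()` and `toy_demux()` are hypothetical names, and `pebs_enabled` is assumed to be a bitmask of the counters that have PEBS enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced record: only the field discussed here. */
struct toy_pebs_record {
	uint64_t status;
};

/*
 * Return the index of the single PEBS counter the record belongs to,
 * or -1 if the record has to be dropped because zero or several PEBS
 * bits are set.
 */
static int toy_pebs_owner(const struct toy_pebs_record *rec,
			  uint64_t pebs_enabled)
{
	/* Status bits of !PEBS counters can safely be ignored. */
	uint64_t pebs_status = rec->status & pebs_enabled;
	int bit = 0;

	if (pebs_status == 0 || (pebs_status & (pebs_status - 1)) != 0)
		return -1;

	while (!(pebs_status & 1)) {
		pebs_status >>= 1;
		bit++;
	}
	return bit;
}

/* Count records per counter; return how many records were discarded. */
static size_t toy_demux(const struct toy_pebs_record *recs, size_t n,
			uint64_t pebs_enabled, unsigned int counts[64])
{
	size_t discarded = 0;

	for (size_t i = 0; i < n; i++) {
		int owner = toy_pebs_owner(&recs[i], pebs_enabled);

		if (owner < 0) {
			discarded++;
		} else {
			assert(owner < 64);
			counts[owner]++;
		}
	}
	return discarded;
}
```

With `pebs_enabled = 0x3`, a record with status 0x5 is attributed to counter 0 because bit 2 belongs to a !PEBS counter, while a record with status 0x3 is discarded.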
    
    Here are some ways you may get a high discard rate:
    
      - when you count the same thing multiple times, although that is not a
        useful configuration.
      - when you measure with a userspace-only PEBS event along with either
        a kernel or unrestricted PEBS event. Imagine the userspace event
        triggering and setting the overflow flag right before entering the
        kernel. Then all kernel-side records will end up with multiple bits
        set.
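
The second scenario can be illustrated with a small model. It is a sketch under stated assumptions, not a measurement: counter 0 is taken to be the userspace-only PEBS counter and counter 1 the kernel-side one, the userspace assist is assumed never to fire while in the kernel, and `toy_kernel_multibit_records()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_USER_CTR	0	/* assumed: userspace-only PEBS counter */
#define TOY_KERNEL_CTR	1	/* assumed: kernel-side PEBS counter */

/*
 * Return how many of 'n_kernel_events' kernel-side records carry more
 * than one status bit, given whether the userspace counter overflowed
 * right before kernel entry.
 */
static unsigned int toy_kernel_multibit_records(int user_overflow_pending,
						unsigned int n_kernel_events)
{
	uint64_t global_status = 0;
	unsigned int multibit = 0;

	/* Step A for the userspace counter, just before entering the kernel. */
	if (user_overflow_pending)
		global_status |= 1ULL << TOY_USER_CTR;

	for (unsigned int i = 0; i < n_kernel_events; i++) {
		uint64_t status;

		global_status |= 1ULL << TOY_KERNEL_CTR;	/* step A */
		status = global_status;				/* step C: copy */
		global_status &= ~(1ULL << TOY_KERNEL_CTR);	/* step C: clear own bit */

		if (status & (status - 1))
			multibit++;
	}

	assert(multibit <= n_kernel_events);
	return multibit;
}
```

In this model the pending userspace bit is never cleared while in the kernel, so every kernel-side record has both bits set and would be discarded.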
    
    Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
    Signed-off-by: Kan Liang <kan.liang@intel.com>
    [ Changelog improvements. ]
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: acme@infradead.org
    Cc: eranian@google.com
    Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf_event_intel_ds.c 30.3 KB