    drm/i915/perf: improve tail race workaround · 0dd860cf
    Robert Bragg authored
    There's a HW race condition between OA unit tail pointer register
    updates and writes to memory whereby the tail pointer can sometimes get
    ahead of what's been written out to the OA buffer so far (in terms of
    what's visible to the CPU).
    
    Although this can be observed explicitly while copying reports to
    userspace by checking for a zeroed report-id field in tail reports, we
    want to account for this earlier, as part of the _oa_buffer_check to
    avoid lots of redundant read() attempts.
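
    As a rough illustration of the explicit check described above, the
    sketch below scans reports and stops at the first one whose report-id
    is still zero. The layout (report-id in the first dword), the report
    size, and both helper names are assumptions made for this example, not
    the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical report size: 16 dwords (64 bytes) per report. */
#define REPORT_DWORDS 16

/* Assumed layout: dword 0 of each report holds the report-id. A zero
 * value is taken to mean the report is not yet visible to the CPU. */
static bool report_looks_unwritten(const uint32_t *report)
{
	return report[0] == 0;
}

/* Count the leading reports that appear fully written, stopping at the
 * first report whose report-id is still zero. */
static unsigned int count_visible_reports(const uint32_t *buf,
					  unsigned int n_reports)
{
	unsigned int i;

	assert(buf != NULL);

	for (i = 0; i < n_reports; i++) {
		if (report_looks_unwritten(buf + (size_t)i * REPORT_DWORDS))
			break;
	}
	return i;
}
```

    Doing this only at copy time means every read() that runs into an
    unwritten report has already paid for the attempt, which is the cost
    the earlier check is meant to avoid.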
    
    Previously the driver defined an effective tail pointer that lagged
    the real pointer by a 'tail margin' measured in bytes, derived from
    OA_TAIL_MARGIN_NSEC and the configured sampling frequency. This was
    flawed because the OA unit may also automatically generate
    non-periodic reports (such as on context switch), or may be enabled
    without any periodic sampling at all.
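
    To make the weakness of a byte-based margin concrete, here is a
    hypothetical reconstruction of the arithmetic. The function name, the
    rounding, and the numbers used with it are illustrative assumptions
    and are not copied from the old driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte margin: the number of periodic reports expected
 * within margin_ns, multiplied by the size of one report. */
static uint64_t tail_margin_bytes(uint64_t margin_ns, uint64_t period_ns,
				  uint32_t report_size)
{
	uint64_t reports;

	/* With periodic sampling disabled there is no period to divide by,
	 * so a margin of this kind cannot be derived at all. */
	assert(period_ns > 0);

	/* Round up so the margin spans at least margin_ns of reports. */
	reports = (margin_ns + period_ns - 1) / period_ns;
	return reports * report_size;
}
```

    The result only bounds periodic reports. Any context-switch reports
    written during the same window add bytes that the margin does not
    account for, so the effective tail can still point at unwritten data.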
    
    This improves how we define a tail pointer for reading that lags the
    real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
    gives enough time for the corresponding reports to become visible to the
    CPU.
    
    The driver now maintains two tail pointers:
     1) An 'aging' tail with an associated timestamp that is tracked until we
        can trust the corresponding data is visible to the CPU; at which point
        it is considered 'aged'.
     2) An 'aged' tail that can be used for read()ing.
    
    The two separate pointers let us decouple read()s from tail pointer aging.
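
    A minimal sketch of the reader's side of this split, assuming a
    power-of-two ring buffer. The struct, its field names, and BUF_SIZE
    are hypothetical; the point is only that a read() consults the aged
    tail and never the aging one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring size; must be a power of two for the mask below. */
#define BUF_SIZE (1u << 16)

struct oa_tails {
	uint32_t head;           /* next byte offset the reader consumes */
	uint32_t aged_tail;      /* data before this offset is trusted */
	uint32_t aging_tail;     /* newer tail sample, not yet trusted */
	uint64_t aging_since_ns; /* when aging_tail was sampled */
};

/* Bytes a read() may copy: bounded by the aged tail only, with the
 * subtraction wrapped to the ring size. */
static uint32_t readable_bytes(const struct oa_tails *t)
{
	assert((BUF_SIZE & (BUF_SIZE - 1)) == 0);
	return (t->aged_tail - t->head) & (BUF_SIZE - 1);
}
```

    Because the reader never touches aging_tail or its timestamp, the
    aging logic can run on its own schedule without involving read().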
    
    The tail pointers are checked and updated at a limited rate within an
    hrtimer callback (the same callback that is used for delivering POLLIN
    events). Since we're now measuring the wall-clock time elapsed since
    a given tail pointer was read, the mechanism no longer cares about
    the OA unit's periodic sampling frequency.
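
    The time-based aging can be sketched as a small state machine. This is
    a simplified illustration, not the driver's implementation: the
    struct, the function name, the 'aging' flag, and the MARGIN_NS value
    standing in for OA_TAIL_MARGIN_NSEC are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for OA_TAIL_MARGIN_NSEC; the value is assumed. */
#define MARGIN_NS 100000ull

struct tail_state {
	uint32_t head;           /* reader position */
	uint32_t aged_tail;      /* safe for read() */
	uint32_t aging_tail;     /* sampled from HW, waiting out the margin */
	uint64_t aging_since_ns; /* when aging_tail was sampled */
	bool aging;              /* true while a tail is being aged */
};

/* Meant to be called at a limited rate, e.g. from a timer callback.
 * hw_tail is the tail register value just read and now_ns comes from a
 * monotonic clock. Returns true if trusted data is available. */
static bool tail_check(struct tail_state *s, uint32_t hw_tail,
		       uint64_t now_ns)
{
	/* Promote the aging tail once it has waited out the margin. */
	if (s->aging && now_ns - s->aging_since_ns >= MARGIN_NS) {
		s->aged_tail = s->aging_tail;
		s->aging = false;
	}

	/* If nothing is aging and HW has moved on, start aging a new tail. */
	if (!s->aging && hw_tail != s->aged_tail) {
		s->aging_tail = hw_tail;
		s->aging_since_ns = now_ns;
		s->aging = true;
	}

	return s->aged_tail != s->head;
}
```

    Nothing in this check refers to a sampling period: the only inputs are
    the sampled tail and the current time, so it behaves the same whether
    reports are periodic, triggered by context switches, or both.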
    
    The natural place to handle the tail pointer updates was in
    gen7_oa_buffer_is_empty() which is called as part of blocking reads and
    the hrtimer callback used for polling, and so this was renamed to
    oa_buffer_check() considering the added side effect while checking
    whether the buffer contains data.

    Signed-off-by: Robert Bragg <robert@sixbynine.org>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-6-lionel.g.landwerlin@intel.com
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
i915_perf.c 71.4 KB