• Robert Bragg's avatar
    drm/i915/perf: Add OA unit support for Gen 8+ · 19f81df2
    Robert Bragg authored
    Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
    share (more-or-less) the same OA unit design.
    
    Of particular note in comparison to Haswell: some OA unit HW config
    state has become per-context state and as a consequence it is somewhat
    more complicated to manage synchronous state changes from the cpu while
    there's no guarantee of what context (if any) is currently actively
    running on the gpu.
    
    The periodic sampling frequency which can be particularly useful for
    system-wide analysis (as opposed to command stream synchronised
    MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
    have become per-context save and restored (while the OABUFFER
    destination is still a shared, system-wide resource).
    
    This support for gen8+ takes care to consider a number of timing
    challenges involved in synchronously updating per-context state
    primarily by programming all config state from the cpu and updating all
    current and saved contexts synchronously while the OA unit is still
    disabled.
    
    The driver intentionally avoids depending on command streamer
    programming to update OA state considering the lack of synchronization
    between the automatic loading of OACTXCONTROL state (that includes the
    periodic sampling state and enable state) on context restore and the
    parsing of any general purpose BB the driver can control. I.e. this
    implementation is careful to avoid the possibility of a context restore
    temporarily enabling any out-of-date periodic sampling state. In
    addition to the risk of transiently-out-of-date state being loaded
    automatically; there are also internal HW latencies involved in the
    loading of MUX configurations which would be difficult to account for
    from the command streamer (and we only want to enable the unit when once
    the MUX configuration is complete).
    
    Since the Gen8+ OA unit design no longer supports clock gating the unit
    off for a single given context (which effectively stopped any progress
    of counters while any other context was running) and instead supports
    tagging OA reports with a context ID for filtering on the CPU, it means
    we can no longer hide the system-wide progress of counters from a
    non-privileged application only interested in metrics for its own
    context. Although we could theoretically try and subtract the progress
    of other contexts before forwarding reports via read() we aren't in a
    position to filter reports captured via MI_REPORT_PERF_COUNT commands.
    As a result, for Gen8+, we always require the
    dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
    if not root.
    
    v5: Drain submitted requests when enabling metric set to ensure no
        lite-restore erases the context image we just updated (Lionel)
    
    v6: In addition to drain, switch to kernel context & update all
        context in place (Chris)
    
    v7: Add missing mutex_unlock() if switching to kernel context fails
        (Matthew)
    
    v8: Simplify OA period/flex-eu-counters programming by using the
        batchbuffer instead of modifying ctx-image (Lionel)
    
    v9: Back to updating the context image (due to erroneous testing,
        batchbuffer programming the OA unit doesn't actually work)
        (Lionel)
        Pin context before updating context image (Chris)
        Drop MMIO programming now that we switch to a kernel context with
        right values in initial context image (Chris)
    
    v10: Just pin_map the contexts we want to modify or let the
         configuration happen on first use (Chris)
    
    v11: Update kernel context OA config through the batchbuffer rather
         than on the fly ctx-image update (Lionel)
    
    v12: Rework OA context registers update again by swithing away from
         user contexts and reconfiguring the kernel context through the
         batchbuffer and updating all the other contexts' context image.
         Also take care to lock slice/subslice configuration when OA is
         on. (Lionel)
    
    v13: Request rpcs updates on all engine when updating the OA config
         (Lionel)
    
    v14: Drop any kind of rpcs management now that we monitor sseu
         configuration changes in a later patch (Lionel)
         Remove usleep after programming the NOA configs on Gen8+, this
         doesn't seem to be needed (Lionel)
    
    v15: Respect coding style for block comments (Chris)
    
    v16: Add missing i915_add_request() in case we fail to emit OA
         configuration (Matthew)
    Signed-off-by: default avatarRobert Bragg <robert@sixbynine.org>
    Signed-off-by: default avatarLionel Landwerlin <lionel.g.landwerlin@intel.com>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
    Signed-off-by: default avatarBen Widawsky <ben@bwidawsk.net>
    19f81df2
i915_perf.c 100 KB