    drm/i915: Move CSB MMIO reads out of the execlists lock · 26720ab9
    Tvrtko Ursulin authored
    By reading the CSB (slow MMIO accesses) into a temporary local
    buffer we can decrease the duration of holding the execlist
    lock.
    
    Main advantage is that during heavy batch buffer submission we
    reduce the execlist lock contention, which should decrease the
    latency and CPU usage between the submitting userspace process
    and interrupt handling.
    
    Downside is that we need to grab and release the forcewake twice,
    but as the numbers below show, this is completely hidden
    by the primary gains.
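
    As an illustration only, the two-phase pattern described above can
    be sketched in userspace C. This is not the driver code: the
    struct fake_engine type, slow_mmio_read() and drain_csb() are
    hypothetical names, a pthread mutex stands in for the execlist
    lock, a plain array stands in for the CSB registers, and the entry
    count of 6 is an assumption. Forcewake handling is left out. The
    sketch also assumes a single consumer, so reading the read pointer
    before taking the lock is safe here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define CSB_ENTRIES 6 /* assumed size of the status buffer ring */

/* Hypothetical stand-in for the engine state, for illustration only. */
struct fake_engine {
	pthread_mutex_t lock;          /* stands in for the execlist lock */
	uint32_t csb[CSB_ENTRIES];     /* stands in for the CSB MMIO registers */
	unsigned int read_ptr;         /* last entry consumed by software */
	unsigned int write_ptr;        /* last entry written by "hardware" */
	int lock_held;                 /* bookkeeping for the sketch only */
	unsigned int reads_under_lock; /* slow reads done while locked */
	uint32_t processed_sum;        /* trivial "processing" result */
};

/* Simulated slow MMIO read; records whether it ran under the lock. */
static uint32_t slow_mmio_read(struct fake_engine *e, unsigned int idx)
{
	if (e->lock_held)
		e->reads_under_lock++;
	return e->csb[idx % CSB_ENTRIES];
}

/* Returns the number of CSB entries consumed. */
static unsigned int drain_csb(struct fake_engine *e)
{
	uint32_t local[CSB_ENTRIES];
	unsigned int rp = e->read_ptr % CSB_ENTRIES;
	unsigned int wp = e->write_ptr % CSB_ENTRIES;
	unsigned int n = 0, i;

	/*
	 * Phase 1: slow reads into a local buffer, lock not held.
	 * The n < CSB_ENTRIES bound keeps a bogus pointer pair from
	 * overflowing the local buffer.
	 */
	while (rp != wp && n < CSB_ENTRIES) {
		rp = (rp + 1) % CSB_ENTRIES;
		local[n++] = slow_mmio_read(e, rp);
	}

	/* Phase 2: short critical section working only on local data. */
	pthread_mutex_lock(&e->lock);
	e->lock_held = 1;
	assert(n <= CSB_ENTRIES);
	for (i = 0; i < n; i++)
		e->processed_sum += local[i];
	e->read_ptr = rp;
	e->lock_held = 0;
	pthread_mutex_unlock(&e->lock);

	return n;
}
```

    The point of the split is that the time spent under the lock no
    longer depends on how slow the register reads are, only on how
    many entries were copied.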
    
    Testing with "gem_latency -n 100" (submit batch buffers with a
    hundred nops each) shows more than doubling of the throughput
    and more than halving of the dispatch latency, overall latency
    and CPU time spent in the submitting process.
    
    Submitting empty batches ("gem_latency -n 0") does not seem
    significantly affected by this change, with throughput and CPU
    time improving by half a percent, and overall latency worsening
    by the same amount.
    
    Above tests were done in a hundred runs on a big core Broadwell.
    
    v2:
      * Overflow protection to local CSB buffer.
      * Use closer dev_priv in execlists_submit_requests. (Chris Wilson)
    
    v3: Rebase.
    
    v4: Added comment about irqs needing to be disabled in
        execlists_submit_requests. (Chris Wilson)
    
    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Link: http://patchwork.freedesktop.org/patch/msgid/1458219586-20452-1-git-send-email-tvrtko.ursulin@linux.intel.com
intel_lrc.c 80.9 KB