• Daniel Vetter's avatar
    drm/i915: More surgically unbreak the modeset vs reset deadlock · 9db529aa
    Daniel Vetter authored
    There's no reason to entirely wedge the gpu, for the minimal deadlock
    bugfix we only need to unbreak/decouple the atomic commit from the gpu
    reset. The simplest way to fix that is by replacing the
    unconditional fence wait a the top of commit_tail by a wait which
    completes either when the fences are done (normal case, or when a
    reset doesn't need to touch the display state). Or when the gpu reset
    needs to force-unblock all pending modeset states.
    
    The lesser source of deadlocks is when we try to pin a new framebuffer
    and run into a stall. There's a bunch of places this can happen, like
    eviction, changing the caching mode, acquiring a fence on older
    platforms. And we can't just break the depency loop and keep going,
    the only way would be to break out and restart. But the problem with
    that approach is that we must stall for the reset to complete before
    we grab any locks, and with the atomic infrastructure that's a bit
    tricky. The only place is the ioctl code, and we don't want to insert
    code into e.g. the BUSY ioctl. Hence for that problem just create a
    critical section, and if any code is in there, wedge the GPU. For the
    steady-state this should never be a problem.
    
    Note that in both cases TDR itself keeps working, so from a userspace
    pov this trickery isn't observable. Users themselvs might spot a short
    glitch while the rendering is catching up again, but that's still
    better than pre-TDR where we've thrown away all the rendering,
    including innocent batches. Also, this fixes the regression TDR
    introduced of making gpu resets deadlock-prone when we do need to
    touch the display.
    
    One thing I noticed is that gpu_error.flags seems to use both our own
    wait-queue in gpu_error.wait_queue, and the generic wait_on_bit
    facilities. Not entirely sure why this inconsistency exists, I just
    picked one style.
    
    A possible future avenue could be to insert the gpu reset in-between
    ongoing modeset changes, which would avoid the momentary glitch. But
    that's a lot more work to implement in the atomic commit machinery,
    and given that we only need this for pre-g4x hw, of questionable
    utility just for the sake of polishing gpu reset even more on those
    old boxes. It might be useful for other features though.
    
    v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/.
    
    v3: Really emabarrassing fixup, I checked the wrong bit and broke the
    unbreak/wakeup logic.
    
    v4: Also handle deadlocks in pin_to_display.
    
    v5: Review from Michel:
    - Fixup the BUILD_BUG_ON
    - Don't forget about the overlay
    
    Cc: Michel Thierry <michel.thierry@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
    Cc: Michel Thierry <michel.thierry@intel.com>
    Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20170808080828.23650-3-daniel.vetter@ffwll.chReviewed-by: default avatarMichel Thierry <michel.thierry@intel.com>
    9db529aa
i915_irq.c 120 KB