1. 21 Aug, 2017 1 commit
  2. 18 Aug, 2017 16 commits
  3. 17 Aug, 2017 2 commits
  4. 16 Aug, 2017 4 commits
  5. 15 Aug, 2017 14 commits
  6. 14 Aug, 2017 3 commits
    • Daniel Vetter's avatar
      drm/i915: More surgically unbreak the modeset vs reset deadlock · 9db529aa
      Daniel Vetter authored
      There's no reason to entirely wedge the gpu, for the minimal deadlock
      bugfix we only need to unbreak/decouple the atomic commit from the gpu
      reset. The simplest way to fix that is by replacing the
      unconditional fence wait a the top of commit_tail by a wait which
      completes either when the fences are done (normal case, or when a
      reset doesn't need to touch the display state). Or when the gpu reset
      needs to force-unblock all pending modeset states.
      
      The lesser source of deadlocks is when we try to pin a new framebuffer
      and run into a stall. There's a bunch of places this can happen, like
      eviction, changing the caching mode, acquiring a fence on older
      platforms. And we can't just break the depency loop and keep going,
      the only way would be to break out and restart. But the problem with
      that approach is that we must stall for the reset to complete before
      we grab any locks, and with the atomic infrastructure that's a bit
      tricky. The only place is the ioctl code, and we don't want to insert
      code into e.g. the BUSY ioctl. Hence for that problem just create a
      critical section, and if any code is in there, wedge the GPU. For the
      steady-state this should never be a problem.
      
      Note that in both cases TDR itself keeps working, so from a userspace
      pov this trickery isn't observable. Users themselvs might spot a short
      glitch while the rendering is catching up again, but that's still
      better than pre-TDR where we've thrown away all the rendering,
      including innocent batches. Also, this fixes the regression TDR
      introduced of making gpu resets deadlock-prone when we do need to
      touch the display.
      
      One thing I noticed is that gpu_error.flags seems to use both our own
      wait-queue in gpu_error.wait_queue, and the generic wait_on_bit
      facilities. Not entirely sure why this inconsistency exists, I just
      picked one style.
      
      A possible future avenue could be to insert the gpu reset in-between
      ongoing modeset changes, which would avoid the momentary glitch. But
      that's a lot more work to implement in the atomic commit machinery,
      and given that we only need this for pre-g4x hw, of questionable
      utility just for the sake of polishing gpu reset even more on those
      old boxes. It might be useful for other features though.
      
      v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/.
      
      v3: Really emabarrassing fixup, I checked the wrong bit and broke the
      unbreak/wakeup logic.
      
      v4: Also handle deadlocks in pin_to_display.
      
      v5: Review from Michel:
      - Fixup the BUILD_BUG_ON
      - Don't forget about the overlay
      
      Cc: Michel Thierry <michel.thierry@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
      Cc: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170808080828.23650-3-daniel.vetter@ffwll.chReviewed-by: default avatarMichel Thierry <michel.thierry@intel.com>
      9db529aa
    • Daniel Vetter's avatar
      drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit · 42b062b0
      Daniel Vetter authored
      Blocking in a worker is ok, that's what the unbound_wq is for. And it
      unifies the paths between the blocking and nonblocking commit, giving
      me just one path where I have to implement the deadlock avoidance
      trickery in the next patch.
      
      I first tried to implement the following patch without this rework, but
      force-completing i915_sw_fence creates some serious challenges around
      properly cleaning things up. So wasn't a feasible short-term approach.
      Another approach would be to simple keep track of all pending atomic
      commit work items and manually queue them from the reset code. With the
      caveat that double-queue in case we race with the i915_sw_fence must be
      avoided. Given all that, taking the cost of a double schedule in atomic
      for the short-term fix is the best approach, but can be changed in the future of course.
      
      v2: Amend commit message (Chris).
      
      v3: Add comment explaining why we do nothing in the sw_fence complete
      callback (Michel).
      Reviewed-by: default avatarMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
      Cc: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: default avatarMichel Thierry <michel.thierry@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170808080828.23650-2-daniel.vetter@ffwll.ch
      42b062b0
    • Daniel Vetter's avatar
      drm/i915: Avoid the gpu reset vs. modeset deadlock · 97154ec2
      Daniel Vetter authored
      ... using the biggest hammer we have. This is essentially a weaponized
      version of the timeout-based wedging Chris added in
      
      commit 36703e79
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Jun 22 11:56:25 2017 +0100
      
          drm/i915: Break modeset deadlocks on reset
      
      Because defense-in-depth is good it's good to still have both. Also
      note that with the locking change we can now restrict this a lot (old
      gpus and special testing only), so this doesn't kill the TDR benefits
      on at least anything remotely modern.
      
      And futuremore with a few tricks it should be possible to make a much
      more educated guess about whether an atomic commit is stuck waiting on
      the gpu (atomic_t counting the pending i915_sw_fence used by the
      atomic modeset code should do it), so we can improve this.
      
      But for now just start with something that is guaranteed to recover
      faster, for much better CI througput.
      
      This defacto reverts TDR on these platforms, but there's not really a
      single commit to specify as the sole offender.
      
      v2: Add a debug message to explain what's going on. We can't DRM_ERROR
      because that spams CI. And the timeout based fallback still prints a
      DRM_ERROR, in case something goes wrong.
      
      v3: Fix comment layout (Michel)
      
      Fixes: 4680816b ("drm/i915: Wait first for submission, before waiting for request completion")
      Fixes: 221fe799 ("drm/i915: Perform a direct reset of the GPU from the waiter")
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
      Cc: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
      Reviewed-by: default avatarMichel Thierry <michel.thierry@intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170808080828.23650-1-daniel.vetter@ffwll.ch
      97154ec2