    drm/i915: add interface to simulate gpu hangs · e5eb3d63
    Daniel Vetter authored
    gpu reset is a very important piece of our infrastructure.
    Unfortunately we only really test it by actually hanging the gpu,
    which often has bad side-effects for the entire system. And the gpu
    hang handling code is one of the rather complicated pieces of code we
    have, consisting of
    - hang detection
    - error capture
    - actual gpu reset
    - reset of all the gem bookkeeping
    - reinitialization of the entire gpu
    
    This patch adds a debugfs interface to selectively stop rings by
    ceasing to update the hw tail pointer, which will result in the gpu no
    longer updating its head pointer and eventually in the hangcheck firing.
    This way we can exercise the gpu hang code under controlled conditions
    without a dying gpu taking down the entire system.
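
    The mechanism described above can be sketched as a small toy model.
    This is only an illustration under stated assumptions, not the code
    added by this patch: struct toy_ring and the toy_* helpers are
    hypothetical names, and the simulated gpu simply consumes everything
    up to the last tail value it was told about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names come from the i915 driver. */
struct toy_ring {
	unsigned int id;	/* ring number, used as bit index in the stop mask */
	uint32_t sw_tail;	/* tail as tracked by software */
	uint32_t hw_tail;	/* tail as last written to the simulated hardware */
	uint32_t hw_head;	/* how far the simulated gpu has consumed */
};

/* Queue more work; the hw tail is only updated if the ring is not stopped. */
static void toy_ring_advance(struct toy_ring *ring, uint32_t stop_mask,
			     uint32_t bytes)
{
	assert(ring->id < 32);

	ring->sw_tail += bytes;
	if (stop_mask & (UINT32_C(1) << ring->id))
		return;	/* simulated hang: hardware never sees the new work */

	ring->hw_tail = ring->sw_tail;
}

/* The simulated gpu consumes everything up to the hw tail it knows about. */
static void toy_gpu_run(struct toy_ring *ring)
{
	ring->hw_head = ring->hw_tail;
}

/*
 * Hangcheck-style test: work is outstanding, but the head has not moved
 * since the previous sample.
 */
static bool toy_ring_looks_hung(const struct toy_ring *ring,
				uint32_t last_seen_head)
{
	return ring->hw_head == last_seen_head &&
	       ring->hw_head != ring->sw_tail;
}
```

    With a zero stop mask the head follows the tail and the check never
    trips; once the ring's bit is set, software keeps queueing work while
    the head stays put, which is the condition a hangcheck looks for.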
    
    Patch motivated by me forgetting to properly reinitialize ppgtt after
    a gpu reset.
    
    Usage:
    
    echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
    
    echo 0xffffffff > i915_ring_stop # stops all, future-proof version
    
    then run whatever test load is desired. i915_ring_stop is reset
    automatically once a gpu hang is detected, to avoid hanging the gpu
    again too quickly and having it declared wedged.
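
    The mask layout and the automatic reset can be sketched in the same
    spirit. Again this is a hedged illustration only: struct toy_dev,
    toy_ring_bit() and toy_handle_hang() are hypothetical names, and the
    exact point at which the real driver clears the value is not
    spelled out in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; this models the described behaviour, not the driver. */
struct toy_dev {
	uint32_t stop_rings;	/* bit n set means ring n is stopped */
	unsigned int resets;	/* number of simulated resets performed */
};

/* Same value as $((1 << $ringnum)) in the shell usage above. */
static uint32_t toy_ring_bit(unsigned int ringnum)
{
	assert(ringnum < 32);
	return UINT32_C(1) << ringnum;
}

/*
 * Called once a hang has been detected: clear the stop mask so that the
 * rings run normally again after the reset, then count the reset.
 */
static void toy_handle_hang(struct toy_dev *dev)
{
	dev->stop_rings = 0;
	dev->resets++;
}
```

    Clearing the mask as part of hang handling means each write to the
    file provokes at most one simulated hang, so repeated hangs in quick
    succession have to be requested explicitly.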
    
    v2: Incorporate feedback from Chris Wilson.
    
    v3: Add the missing cleanup.
    
    v4: Fix up inconsistent size of ring_stop_read vs _write, noticed by
    Eugeni Dodonov.
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
    Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_debugfs.c 52.7 KB