    drm/i915/guc: Force a reset on internal GuC error · b2edc414
    Author: John Harrison
    If GuC hits an internal error (and survives long enough to report it
    to the KMD), it is basically toast and will stop working until a GT
    reset and subsequent GuC reload are performed. Previously, the KMD just printed
    an error message and then waited for the heartbeat to eventually kick
    in and trigger a reset (assuming the heartbeat had not been disabled).
    Instead, force the reset immediately to guarantee that it happens and
    to eliminate the very long heartbeat delay. The captured error state
    is also more likely to be useful if captured at the time of the error
    rather than many seconds later.
    
    Note that it is not possible to trigger a reset from within the G2H
    (GuC-to-host) handler itself. The reset prepare process involves
    flushing outstanding G2H contents, so a deadlock could result. Instead,
    the G2H handler queues a worker thread to do the reset asynchronously.
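The deferral described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the i915 code: the driver uses a kernel worker, whereas this model uses a dedicated POSIX thread, a mutex and a condition variable. Every name (`fake_guc`, `g2h_handle_guc_error`, `fake_gt_reset`, `ct_lock`, and so on) is hypothetical. `ct_lock` stands in for whatever the reset path must take to flush G2H messages; because the handler already holds it, the handler only records the request and wakes the worker.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace model only; every name here is hypothetical, not from i915. */
struct fake_guc {
	pthread_mutex_t ct_lock;   /* held by the G2H handler; the reset needs it too */
	pthread_mutex_t work_lock; /* protects the worker state below */
	pthread_cond_t work_cond;
	bool reset_pending;        /* a reset has been requested */
	bool reset_running;        /* the worker is inside fake_gt_reset() */
	bool stop;                 /* ask the worker thread to exit */
	int resets_done;
	pthread_t worker;
};

/* Stand-in for reset prepare: flushing outstanding G2H requires ct_lock. */
void fake_gt_reset(struct fake_guc *g)
{
	pthread_mutex_lock(&g->ct_lock);
	/* ... flush outstanding G2H, reset the GT, reload GuC ... */
	pthread_mutex_unlock(&g->ct_lock);
}

void *reset_worker(void *arg)
{
	struct fake_guc *g = arg;

	pthread_mutex_lock(&g->work_lock);
	for (;;) {
		while (!g->reset_pending && !g->stop)
			pthread_cond_wait(&g->work_cond, &g->work_lock);
		if (!g->reset_pending)
			break; /* stop requested and nothing queued */
		g->reset_pending = false;
		g->reset_running = true;
		pthread_mutex_unlock(&g->work_lock);
		fake_gt_reset(g); /* safe: not running inside the G2H handler */
		pthread_mutex_lock(&g->work_lock);
		g->reset_running = false;
		g->resets_done++;
		pthread_cond_broadcast(&g->work_cond); /* wake any flusher */
	}
	pthread_mutex_unlock(&g->work_lock);
	return NULL;
}

/* Models the G2H handler receiving an internal-error notification. */
void g2h_handle_guc_error(struct fake_guc *g)
{
	pthread_mutex_lock(&g->ct_lock);
	/* Calling fake_gt_reset() here would re-lock ct_lock, so queue instead. */
	pthread_mutex_lock(&g->work_lock);
	g->reset_pending = true;
	pthread_cond_broadcast(&g->work_cond);
	pthread_mutex_unlock(&g->work_lock);
	pthread_mutex_unlock(&g->ct_lock);
}

/* Wait until no reset is queued or running (cf. flushing on suspend). */
void fake_guc_flush(struct fake_guc *g)
{
	pthread_mutex_lock(&g->work_lock);
	while (g->reset_pending || g->reset_running)
		pthread_cond_wait(&g->work_cond, &g->work_lock);
	pthread_mutex_unlock(&g->work_lock);
}

int fake_guc_init(struct fake_guc *g)
{
	pthread_mutex_init(&g->ct_lock, NULL);
	pthread_mutex_init(&g->work_lock, NULL);
	pthread_cond_init(&g->work_cond, NULL);
	g->reset_pending = g->reset_running = g->stop = false;
	g->resets_done = 0;
	return pthread_create(&g->worker, NULL, reset_worker, g);
}

/* Flush pending work, then stop and join the worker (cf. shutdown). */
void fake_guc_fini(struct fake_guc *g)
{
	fake_guc_flush(g);
	pthread_mutex_lock(&g->work_lock);
	g->stop = true;
	pthread_cond_broadcast(&g->work_cond);
	pthread_mutex_unlock(&g->work_lock);
	pthread_join(g->worker, NULL);
	pthread_cond_destroy(&g->work_cond);
	pthread_mutex_destroy(&g->work_lock);
	pthread_mutex_destroy(&g->ct_lock);
}
```

The handler takes `ct_lock` and then `work_lock`, while the worker never holds both at once, so the model has no lock-order inversion. Because `reset_pending` is a single flag, several errors reported before the worker runs collapse into one reset. This is similar to how queueing an already pending kernel work item is a no-op.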
    
    v2: Flush the worker on suspend and shutdown. Add rate limiting to
    prevent spam from a totally dead system (review feedback from Daniele).
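The v2 rate limiting can be pictured with a minimal sketch. The message does not say which policy or interval the patch uses. The rule "suppress a forced reset if the last one was less than `min_interval_ms` ago", the names `dead_guc_limiter` and `dead_guc_should_reset`, and the 500 ms value in the usage note are all hypothetical. Timestamps are passed in by the caller to keep the logic deterministic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter; the policy and names are illustrative, not from i915. */
struct dead_guc_limiter {
	uint64_t min_interval_ms; /* assumed minimum gap between forced resets */
	uint64_t last_reset_ms;   /* time of the last reset that was allowed */
	bool have_last;           /* false until the first reset */
};

/*
 * Returns true if a forced reset should be triggered at now_ms (taken from a
 * monotonic clock), false if it should be suppressed because the previous one
 * was too recent. A suppressed request does not move the window, so resets
 * resume once the interval since the last real reset has elapsed.
 */
bool dead_guc_should_reset(struct dead_guc_limiter *l, uint64_t now_ms)
{
	if (l->have_last && now_ms - l->last_reset_ms < l->min_interval_ms)
		return false;
	l->last_reset_ms = now_ms;
	l->have_last = true;
	return true;
}
```

For example, with `min_interval_ms = 500`, calls at 1000, 1200 and 1600 ms return true, false and true respectively.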
    Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
    Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230816003957.3572654-1-John.C.Harrison@Intel.com