    drm/xe: Force wedged state and block GT reset upon any GPU hang · 8ed9aaae
    Rodrigo Vivi authored
    When debugging GPU hangs during validation, it is often useful
    to preserve the GT state from the moment the timeout occurred.
    
    This patch introduces a module parameter that can be used
    in situations like this.
    
    If the xe.wedged module parameter is set to 2, Xe will be declared
    wedged on every single execution timeout (a.k.a. GPU hang), right
    after the devcoredump snapshot is captured, without attempting any
    kind of GT reset and with all further execution blocked.
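
The behaviour described above can be illustrated with a small standalone sketch. It is not the driver's code: every name in it (`device_sketch`, `handle_exec_timeout`, `submission_allowed`) is hypothetical, and the assumption that any value other than 2 falls through to a reset attempt is inferred from the wording of this message, not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not the xe driver's struct. */
struct device_sketch {
	int wedged_mode; /* models the module parameter described above */
	bool wedged;     /* once true, all further execution is refused */
	int snapshots;   /* number of devcoredump-style snapshots taken */
	int gt_resets;   /* number of GT reset attempts */
};

/* Value 2 is the only mode this commit message describes. */
#define WEDGED_MODE_UPON_ANY_HANG 2

/* Models the handling of one execution timeout (GPU hang). */
static void handle_exec_timeout(struct device_sketch *dev)
{
	/* The snapshot is captured first, so the hang state is preserved. */
	dev->snapshots++;

	if (dev->wedged_mode == WEDGED_MODE_UPON_ANY_HANG) {
		/* Declare the device wedged and skip any GT reset. */
		dev->wedged = true;
		return;
	}

	/* Assumption: in other modes a GT reset is attempted as usual. */
	dev->gt_resets++;
}

/* Models the check that blocks execution on a wedged device. */
static bool submission_allowed(const struct device_sketch *dev)
{
	return !dev->wedged;
}
```

With `wedged_mode` set to 2, a single call to `handle_exec_timeout()` leaves one snapshot, zero resets, and `submission_allowed()` returning false, which is the "preserve the state and stop" behaviour the paragraph describes.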
    
    v2: Really block gt_reset from guc side. (Lucas)
        s/wedged/busted (Lucas)
    
    v3: - s/busted/wedged
        - Really use global_flags (Dafna)
        - More robust timeout handling when wedging it.
    
    v4: A really robust clean exit done by Matt Brost.
        No more kernel warns on unbind.
    
    v5: Simplify error message (Lucas)
    
    Cc: Matthew Brost <matthew.brost@intel.com>
    Cc: Dafna Hirschfeld <dhirschfeld@habana.ai>
    Cc: Lucas De Marchi <lucas.demarchi@intel.com>
    Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
    Cc: Himanshu Somaiya <himanshu.somaiya@intel.com>
    Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
xe_module.h 516 Bytes