drm/i915/guc: Fix error capture for virtual engines
GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT] gem_exec_balancer: starting subtest hang [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388] [drm] GT0: GUC: No register capture node found for 0x1005 / 0xFEDC311D [drm] GPU HANG: ecode 12:4:00000000, in gem_exec_balanc [6388] [IGT] gem_exec_balancer: exiting, ret=0 The test causes a hang on both engines of a virtual engine context. The engine instance zero hang gets a valid error capture but the non-instance-zero hang does not. Fix that by scanning through the list of pending register captures when a hang notification for a virtual engine is received. That way, the hang can be assigned to the correct physical engine prior to starting the error capture process. So later on, when the error capture handler tries to find the engine register list, it looks for one on the correct engine. Also, sneak in a missing blank line before a comment in the node search code. v2: Fix null pointer deref on non-GuC platforms. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-5-John.C.Harrison@Intel.com
Showing
Please register or sign in to comment