• Janusz Krzysztofik's avatar
    Revert "drm/i915: Remove extra multi-gt pm-references" · 64753576
    Janusz Krzysztofik authored
    This reverts commit 1f33dc0c.
    
    There was a patch supposed to fix an issue of illegal attempts to free a
    still active i915 VMA object when parking a GT believed to be idle,
    reported by CI on 2-GT Meteor Lake.  As a solution, an extra wakeref for
    a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit
    f56fe3e9 ("drm/i915: Fix a VMA UAF for multi-gt platform").
    
    However, that fix occurred insufficient -- the issue was still reported by
    CI.  That wakeref was released on exit from i915_gem_do_execbuffer(), then
    potentially before completion of the request and deactivation of its
    associated VMAs.  Moreover, CI reports indicated that single-GT platforms
    also suffered sporadically from the same race.
    
    Since that issue was fixed by another commit f3c71b2d ("drm/i915/vma:
    Fix UAF on destroy against retire race"), the changes introduced by that
    insufficient fix were dropped as no longer useful.  However, that series
    resulted in another VMA UAF scenario now being triggered in CI.
    
    <4> [260.290809] ------------[ cut here ]------------
    <4> [260.290988] list_del corruption. prev->next should be ffff888118c5d990, but was ffff888118c5a510. (prev=ffff888118c5a510)
    <4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 __list_del_entry_valid_or_report+0xb7/0xe0
    ..
    <4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
    <4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
    <4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0
    ...
    <4> [260.291087] Call Trace:
    <4> [260.291089]  <TASK>
    <4> [260.291124]  i915_vma_reopen+0x43/0x80 [i915]
    <4> [260.291298]  eb_lookup_vmas+0x9cb/0xcc0 [i915]
    <4> [260.291579]  i915_gem_do_execbuffer+0xc9a/0x26d0 [i915]
    <4> [260.291883]  i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
    ...
    <4> [260.292301]  </TASK>
    ...
    <4> [260.292506] ---[ end trace 0000000000000000 ]---
    <4> [260.292782] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ca3: 0000 [#1] PREEMPT SMP NOPTI
    <4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: G        W          6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
    <4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
    <4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915]
    ...
    <4> [260.428756] Call Trace:
    <4> [260.431192]  <TASK>
    <4> [639.283393]  i915_gem_do_execbuffer+0xd05/0x26d0 [i915]
    <4> [639.305245]  i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
    ...
    <4> [639.411134]  </TASK>
    ...
    <4> [639.449979] ---[ end trace 0000000000000000 ]---
    
    We defer actually closing, unbinding and destroying a VMA until next idle
    point, or until the object is freed in the meantime.  By postponing the
    unbind, we allow for the VMA to be reopened by the client, avoiding the
    work required to rebind the VMA.
    
    Starting from commit b0647a5e ("drm/i915: Avoid live-lock with
    i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA
    would be reopened while we destroy them.  That assumption is no longer
    true in multi-GT configurations, where a VMA we reopen may be handled by a
    GT different from the one that we already keep active via its engine while
    we set up an execbuf request.
    
    Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer()
    processing path seems to fix this issue.
    
    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608Signed-off-by: default avatarJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
    Reviewed-by: default avatarNirmoy Das <nirmoy.das@linux.intel.com>
    Fixes: 1f33dc0c ("drm/i915: Remove extra multi-gt pm-references")
    Link: https://patchwork.freedesktop.org/patch/msgid/20240506180253.96858-2-janusz.krzysztofik@linux.intel.comSigned-off-by: default avatarRodrigo Vivi <rodrigo.vivi@intel.com>
    (cherry picked from commit 749670a5)
    Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
    64753576
i915_gem_execbuffer.c 93.4 KB