    cxl/hdm: Fix dpa translation locking · 6f5c4eca
    Dan Williams authored
    The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
    endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
    sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
    in the helper. Otherwise, it triggers multiple lockdep reports:
    
    1/ Tracing callbacks run in an atomic context that cannot acquire sleeping
    locks:
    
        BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
        in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
        preempt_count: 2, expected: 0
        RCU nest depth: 0, expected: 0
        [..]
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
        Call Trace:
         <TASK>
         dump_stack_lvl+0x71/0x90
         __might_resched+0x1b2/0x2c0
         down_read+0x1a/0x190
         cxl_dpa_resource_start+0x15/0x50 [cxl_core]
         cxl_trace_hpa+0x122/0x300 [cxl_core]
         trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
    
    2/ The rwsem is already held in the inject poison path:
    
        WARNING: possible recursive locking detected
        6.7.0-rc2+ #12 Tainted: G        W  OE    N
        --------------------------------------------
        bash/1288 is trying to acquire lock:
        ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]
    
        but task is already holding lock:
        ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
        [..]
        Call Trace:
         <TASK>
         dump_stack_lvl+0x71/0x90
         __might_resched+0x1b2/0x2c0
         down_read+0x1a/0x190
         cxl_dpa_resource_start+0x15/0x50 [cxl_core]
         cxl_trace_hpa+0x122/0x300 [cxl_core]
         trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
         __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
         cxl_inject_poison+0x1bc/0x1e0 [cxl_core]
    
    This appears to have been an issue since the initial implementation and
    was uncovered by the new cxl-poison.sh test [1]. That test now passes with
    these changes.
    
    Fixes: 28a3ae4f ("cxl/trace: Add an HPA to cxl_poison trace events")
    Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1]
    Cc: <stable@vger.kernel.org>
    Cc: Alison Schofield <alison.schofield@intel.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Dave Jiang <dave.jiang@intel.com>
    Cc: Ira Weiny <ira.weiny@intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
hdm.c 27.1 KB