• Andrew Donnellan's avatar
    cxl: fix nested locking hang during EEH hotplug · 8cdfa0d8
    Andrew Donnellan authored
    commit 171ed0fc upstream.
    
    Commit 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU
    not configured") introduced a rwsem to fix an invalid memory access that
    occurred when someone attempts to access the config space of an AFU on a
    vPHB whilst the AFU is deconfigured, such as during EEH recovery.
    
    It turns out that it's possible to run into a nested locking issue when EEH
    recovery fails and a full device hotplug is required.
    cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
    configured_rwsem. When EEH recovery fails, the EEH code calls
    pci_hp_remove_devices() to remove the device, which in turn calls
    cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
    to grab the writer lock that's already held.
    
    Standard rwsem semantics don't express what we really want to do here and
    don't allow for nested locking. Fix this by replacing the rwsem with an
    atomic_t which we can control more finely. Allow the AFU to be locked
    multiple times so long as there are no readers.
    
    Fixes: 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU not configured")
    Signed-off-by: default avatarAndrew Donnellan <andrew.donnellan@au1.ibm.com>
    Acked-by: default avatarFrederic Barrat <fbarrat@linux.vnet.ibm.com>
    Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    8cdfa0d8
cxl.h 35.2 KB