    cxl: Remove racy attempt to force EEH invocation in reset · 348b0748
    Daniel Axtens authored
    commit 9d8e2767 upstream.
    
    cxl_reset currently PERSTs the slot, and then repeatedly tries to
    read MMIO space in order to kick off EEH.
    
    There are 2 problems with this: it's unnecessary, and it's racy.
    
    It's unnecessary because the PERST will bring down the PHB link.
    That will be picked up by the CAPP, which will send out an HMI.
    Skiboot, noticing an HMI from the CAPP, will send an OPAL
    notification to the kernel, which will trigger EEH recovery.
    
    It's also racy: the EEH recovery triggered by the CAPP will
    eventually cause the MMIO space to have its mapping invalidated
    and the pointer NULLed out. This races with our attempt to read
    the MMIO space. This is causing OOPSes in testing.
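
    The bad ordering described above can be replayed in a small
    userspace sketch. This is not the driver code: struct fake_adapter,
    fake_eeh_recovery(), fake_poll_mmio() and replay_bad_interleaving()
    are hypothetical names made up for illustration, and a plain
    variable stands in for the MMIO register. The NULL check exists only
    so the demo reports the hazard instead of crashing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the adapter; NOT the driver's real structure. */
struct fake_adapter {
    volatile uint64_t *mmio; /* becomes NULL once recovery unmaps it */
};

/* Models EEH recovery invalidating the mapping and NULLing the pointer. */
static void fake_eeh_recovery(struct fake_adapter *a)
{
    a->mmio = NULL;
}

/*
 * Models one iteration of a polling read after the reset.
 * Returns 0 if the read was safe, -1 if it would have dereferenced NULL.
 * In a truly concurrent setting a check like this would not close the
 * race, because the pointer could still change between check and read.
 */
static int fake_poll_mmio(const struct fake_adapter *a, uint64_t *val)
{
    volatile uint64_t *p = a->mmio;

    if (p == NULL)
        return -1;
    *val = *p;
    return 0;
}

/* Replays the ordering: poll once, let recovery win the race, poll again. */
int replay_bad_interleaving(void)
{
    uint64_t backing = 0xffffffffffffffffULL; /* pretend register contents */
    uint64_t val = 0;
    struct fake_adapter a = { .mmio = &backing };

    if (fake_poll_mmio(&a, &val) != 0)
        return 1; /* unexpected: mapping should still be valid here */

    fake_eeh_recovery(&a);           /* recovery tears the mapping down */
    return fake_poll_mmio(&a, &val); /* this read is the one that oopses */
}
```

    The second poll returns -1, which corresponds to the point where an
    unchecked read would hit the NULLed pointer.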
    
    Simply drop all the attempts to force EEH detection, and trust
    that Skiboot will send the notification and that we'll act on it.
    The Skiboot code to send the EEH notification has been in Skiboot
    for as long as CAPP recovery has been supported, so we don't need
    to worry about breaking obscure setups with ancient firmware.
    
    Cc: Ryan Grimm <grimm@linux.vnet.ibm.com>
    Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
    Signed-off-by: Daniel Axtens <dja@axtens.net>
    Acked-by: Ian Munsie <imunsie@au1.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>