    PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits · 203926da
    Kuppuswamy Sathyanarayanan authored
    When a Root Port or Root Complex Event Collector receives an error Message,
    e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status
    register and logs the Requester ID in the Error Source Identification
    register.  If it receives a second ERR_COR Message before software clears
    PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the
    Requester ID is lost.
    
    In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared:
    
      - hardware receives ERR_COR message
      - hardware sets PCI_ERR_ROOT_COR_RCV
      - aer_irq() entered
      - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
      - aer_irq(): now status == PCI_ERR_ROOT_COR_RCV
      - hardware receives second ERR_COR message
      - hardware sets PCI_ERR_ROOT_MULTI_COR_RCV
      - aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status)
      - PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set
      - aer_irq() entered again
      - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
      - aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV
      - aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set
      - PCI_ERR_ROOT_MULTI_COR_RCV is still set
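
    The sequence above can be replayed in user space with a small model.
    This is only a sketch: struct root_status_model and the model_*() and
    old_irq_should_handle() helpers are hypothetical names, not kernel
    code. The bit values are assumed to match
    include/uapi/linux/pci_regs.h, and the "old" check is an approximation
    of the pre-fix test in aer_irq().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pci_regs.h; redefined locally so
 * the sketch builds outside the kernel tree. */
#define PCI_ERR_ROOT_COR_RCV        0x00000001u
#define PCI_ERR_ROOT_MULTI_COR_RCV  0x00000002u
#define PCI_ERR_ROOT_UNCOR_RCV      0x00000004u

/* Hypothetical model of the Root Error Status register. */
struct root_status_model {
	uint32_t bits;
};

/* Hardware side: an ERR_COR Message arrives. The first one sets COR_RCV;
 * any further one, while COR_RCV is still set, sets MULTI_COR_RCV. */
static void model_receive_err_cor(struct root_status_model *r)
{
	if (r->bits & PCI_ERR_ROOT_COR_RCV)
		r->bits |= PCI_ERR_ROOT_MULTI_COR_RCV;
	else
		r->bits |= PCI_ERR_ROOT_COR_RCV;
}

/* Software side: the status bits are write-1-to-clear, so writing back a
 * value clears exactly the bits that are set in that value. */
static void model_write_status(struct root_status_model *r, uint32_t val)
{
	r->bits &= ~val;
}

/* Approximation of the pre-fix check: only the non-MULTI bits count. */
static bool old_irq_should_handle(uint32_t status)
{
	return (status & (PCI_ERR_ROOT_UNCOR_RCV | PCI_ERR_ROOT_COR_RCV)) != 0;
}

/* Replays the scenario and returns the bits left set at the end. */
static uint32_t replay_race_old_handler(void)
{
	struct root_status_model r = { 0 };
	uint32_t status;

	model_receive_err_cor(&r);       /* first ERR_COR */
	status = r.bits;                 /* aer_irq(): read status */
	model_receive_err_cor(&r);       /* second ERR_COR before write-back */
	model_write_status(&r, status);  /* clears COR_RCV only */

	status = r.bits;                 /* aer_irq() entered again */
	if (!old_irq_should_handle(status))
		return r.bits;           /* exits early, nothing cleared */

	model_write_status(&r, status);
	return r.bits;
}
```

    In this model, replay_race_old_handler() returns with only
    PCI_ERR_ROOT_MULTI_COR_RCV set, matching the last step of the scenario.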
    
    The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and
    PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV.
    
    Fix the problem by queueing an AER event and clearing the Root Error Status
    bits when any of these bits are set:
    
      PCI_ERR_ROOT_COR_RCV
      PCI_ERR_ROOT_UNCOR_RCV
      PCI_ERR_ROOT_MULTI_COR_RCV
      PCI_ERR_ROOT_MULTI_UNCOR_RCV
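
    A minimal sketch of the resulting check, assuming the bit values from
    include/uapi/linux/pci_regs.h. SKETCH_AER_ERR_STATUS_MASK and
    sketch_should_queue_and_clear() are illustrative names chosen here;
    the actual patch may structure this differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/pci_regs.h; redefined locally so
 * the sketch builds outside the kernel tree. */
#define PCI_ERR_ROOT_COR_RCV          0x00000001u
#define PCI_ERR_ROOT_MULTI_COR_RCV    0x00000002u
#define PCI_ERR_ROOT_UNCOR_RCV        0x00000004u
#define PCI_ERR_ROOT_MULTI_UNCOR_RCV  0x00000008u

/* Illustrative mask covering all four "received" bits listed above. */
#define SKETCH_AER_ERR_STATUS_MASK (PCI_ERR_ROOT_COR_RCV |       \
				    PCI_ERR_ROOT_UNCOR_RCV |     \
				    PCI_ERR_ROOT_MULTI_COR_RCV | \
				    PCI_ERR_ROOT_MULTI_UNCOR_RCV)

/* Returns true when the handler should queue an AER event and write the
 * status back to clear it, i.e. when any of the four bits is set. */
static bool sketch_should_queue_and_clear(uint32_t root_status)
{
	return (root_status & SKETCH_AER_ERR_STATUS_MASK) != 0;
}
```

    With this check, a status of only PCI_ERR_ROOT_MULTI_COR_RCV is no
    longer ignored, so the write-back clears it.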
    
    See the bugzilla link for details from Eric about how to reproduce this
    problem.
    
    [bhelgaas: commit log, move repro details to bugzilla]
    Fixes: e167bfca ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992
    Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com
    Reported-by: Eric Badger <ebadger@purestorage.com>
    Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Ashok Raj <ashok.raj@intel.com>
aer.c 39.6 KB