Commit 361187e0 authored by Dave Jiang, committed by Dan Williams

PCI/AER: Add optional logging callback for correctable error

Some new devices such as CXL devices may want to record additional error
information on a corrected error. Add a callback to allow the PCI device
driver to do additional logging such as providing additional stats for user
space RAS monitoring.

For CXL devices this is a requirement: the driver must write to the
correctable error status register in the CXL RAS capability structure in
order to clear the unmasked correctable errors. See CXL spec rev3.0 8.2.4.16.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166984619233.2804404.3966368388544312674.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
parent 2905cb52

@@ -83,6 +83,7 @@ This structure has the form::
 		int (*mmio_enabled)(struct pci_dev *dev);
 		int (*slot_reset)(struct pci_dev *dev);
 		void (*resume)(struct pci_dev *dev);
+		void (*cor_error_detected)(struct pci_dev *dev);
 	};
 The possible channel states are::

@@ -422,5 +423,11 @@ That is, the recovery API only requires that:
 - drivers/net/cxgb3
 - drivers/net/s2io.c
+The cor_error_detected() callback is invoked in handle_error_source() when
+the error severity is "correctable". The callback is optional and allows
+additional logging to be done if desired. See example:
+- drivers/cxl/pci.c
 The End
 -------

@@ -961,8 +961,14 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
 		if (aer)
 			pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
 					info->status);
-		if (pcie_aer_is_native(dev))
+		if (pcie_aer_is_native(dev)) {
+			struct pci_driver *pdrv = dev->driver;
+
+			if (pdrv && pdrv->err_handler &&
+			    pdrv->err_handler->cor_error_detected)
+				pdrv->err_handler->cor_error_detected(dev);
 			pcie_clear_device_status(dev);
+		}
 	} else if (info->severity == AER_NONFATAL)
 		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
 	else if (info->severity == AER_FATAL)

@@ -843,6 +843,9 @@ struct pci_error_handlers {
 	/* Device driver may resume normal operations */
 	void (*resume)(struct pci_dev *dev);
+
+	/* Allow device driver to record more details of a correctable error */
+	void (*cor_error_detected)(struct pci_dev *dev);
 };
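
The dispatch added to handle_error_source() can be tried outside the kernel with a small userspace model. The sketch below is not kernel code: the structs are cut-down stand-ins that share only their names with the real ones, and `fake_ras_cor_status`, `cor_errors_logged`, `demo_cor_error_detected()` and `notify_correctable()` are hypothetical names made up for illustration. It assumes the status register behaves as write-1-to-clear, which is simulated here with a plain integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down userspace stand-ins for the kernel structures (illustration only). */
struct pci_dev;

struct pci_error_handlers {
	void (*resume)(struct pci_dev *dev);
	void (*cor_error_detected)(struct pci_dev *dev);
};

struct pci_driver {
	const struct pci_error_handlers *err_handler;
};

struct pci_dev {
	struct pci_driver *driver;
	uint32_t fake_ras_cor_status;	/* hypothetical: simulated status register */
	unsigned int cor_errors_logged;	/* hypothetical: per-device counter */
};

/* Hypothetical driver callback: record the error, then clear what was seen. */
static void demo_cor_error_detected(struct pci_dev *dev)
{
	uint32_t status = dev->fake_ras_cor_status;

	if (!status)
		return;

	dev->cor_errors_logged++;
	/* Simulated write-1-to-clear: every bit written back as 1 is cleared. */
	dev->fake_ras_cor_status &= ~status;
}

static const struct pci_error_handlers demo_err_handlers = {
	.cor_error_detected = demo_cor_error_detected,
};

/*
 * Same NULL-check chain as the patch: no bound driver, no err_handler, or
 * no cor_error_detected all mean "skip". Returns 1 if the callback ran.
 */
static int notify_correctable(struct pci_dev *dev)
{
	struct pci_driver *pdrv;

	assert(dev != NULL);
	pdrv = dev->driver;

	if (pdrv && pdrv->err_handler &&
	    pdrv->err_handler->cor_error_detected) {
		pdrv->err_handler->cor_error_detected(dev);
		return 1;
	}
	return 0;
}
```

A driver that does not need the extra logging leaves `.cor_error_detected` unset. The three-way check then makes the call a no-op, which is what keeps the callback optional for existing drivers.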