1. 17 Feb, 2024 5 commits
  2. 29 Jan, 2024 2 commits
    • cxl/pci: Skip to handle RAS errors if CXL.mem device is detached · eef5c7b2
      Li Ming authored
      The PCI AER model is an awkward fit for CXL error handling. While the
      expectation is that a PCI device can escalate to link reset to recover
      from an AER event, the same reset on CXL amounts to a surprise memory
      hotplug of massive amounts of memory.
      
      At present, the CXL error handler attempts some optimistic error
      handling to unbind the device from the cxl_mem driver after reaping some
      RAS register values. This results in a "hopeful" attempt to unplug the
      memory, but there is no guarantee that will succeed.
      
      A subsequent AER notification after the memdev unbind event can no
      longer assume the registers are mapped. Check for memdev bind before
      reaping status register values to avoid crashes of the form:
      
       BUG: unable to handle page fault for address: ffa00000195e9100
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       [...]
       RIP: 0010:__cxl_handle_ras+0x30/0x110 [cxl_core]
       [...]
       Call Trace:
        <TASK>
        ? __die+0x24/0x70
        ? page_fault_oops+0x82/0x160
        ? kernelmode_fixup_or_oops+0x84/0x110
        ? exc_page_fault+0x113/0x170
        ? asm_exc_page_fault+0x26/0x30
        ? __pfx_dpc_reset_link+0x10/0x10
        ? __cxl_handle_ras+0x30/0x110 [cxl_core]
        ? find_cxl_port+0x59/0x80 [cxl_core]
        cxl_handle_rp_ras+0xbc/0xd0 [cxl_core]
        cxl_error_detected+0x6c/0xf0 [cxl_core]
        report_error_detected+0xc7/0x1c0
        pci_walk_bus+0x73/0x90
        pcie_do_recovery+0x23f/0x330
      
      Longer term, the unbind and PCI_ERS_RESULT_DISCONNECT behavior might
      need to be replaced with a new PCI_ERS_RESULT_PANIC.
      
      Fixes: 6ac07883 ("cxl/pci: Add RCH downstream port error logging")
      Cc: stable@vger.kernel.org
      Suggested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Li Ming <ming4.li@intel.com>
      Link: https://lore.kernel.org/r/20240129131856.2458980-1-ming4.li@intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
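
The guard described in the commit message (check that the memdev is still bound before reaping RAS status registers) can be illustrated with a small userspace model. This is only a sketch of the idea, not the kernel patch: `struct fake_memdev`, `fake_handle_ras()`, and the `driver_bound` flag are hypothetical names invented for illustration, and a plain array stands in for the memory-mapped register block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CXL memory device; not a kernel structure. */
struct fake_memdev {
    bool driver_bound;         /* models "the cxl_mem driver is still attached" */
    const uint32_t *ras_regs;  /* models the mapped RAS registers; NULL once unmapped */
};

/*
 * Reap the first RAS status word only if the device is still bound.
 * Returns true and writes *status when the register was read, and
 * false (leaving *status untouched) when the read was skipped.
 */
static bool fake_handle_ras(const struct fake_memdev *md, uint32_t *status)
{
    assert(md != NULL && status != NULL);

    /* After an unbind the mapping may be gone, so do not dereference it. */
    if (!md->driver_bound || md->ras_regs == NULL)
        return false;

    *status = md->ras_regs[0];
    return true;
}
```

In this model, a second error notification that arrives after the unbind takes the early-return path instead of reading through a stale mapping, which is the failure mode the page-fault trace in the commit message shows.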
    • Linux 6.8-rc2 · 41bccc98
      Linus Torvalds authored
  3. 28 Jan, 2024 7 commits
  4. 27 Jan, 2024 9 commits
  5. 26 Jan, 2024 17 commits