1. 19 Jul, 2024 7 commits
    • Merge branch 'pci/reset' · 62281339
      Bjorn Helgaas authored
      - Warn about doing a Secondary Bus Reset without holding the device lock
        (Dan Williams)
      
      - Lock bridge in addition to downstream hierarchy before doing a Secondary
        Bus Reset (Dan Williams)
      
      * pci/reset:
        PCI: Add missing bridge lock to pci_bus_lock()
        PCI: Warn on missing cfg_access_lock during secondary bus reset
    • Merge branch 'pci/hotplug' · 675ba773
      Bjorn Helgaas authored
      - Detect if a device was removed or replaced during system sleep so we
        don't assume a new device is the one that used to be there.  This uses
        Vendor/Device/Subsystem/Class/Revision and Device Serial Number (if
        implemented), so it's not fool-proof and drivers may know how to detect
        more cases (Lukas Wunner)
      
      - Add missing MODULE_DESCRIPTION() macro (Jeff Johnson)
      
      * pci/hotplug:
        PCI: acpiphp: Add missing MODULE_DESCRIPTION() macro
        PCI: pciehp: Detect device replacement during system sleep
    • Merge branch 'pci/err' · 52490480
      Bjorn Helgaas authored
      - Disable AER and DPC during suspend so that if they share an interrupt
        with PME and errors occur during suspend, the AER or DPC interrupt
        doesn't cause spurious wakeups (Kai-Heng Feng)
      
      * pci/err:
        PCI/DPC: Disable DPC service on suspend
        PCI/AER: Disable AER service on suspend
    • Merge branch 'pci/enumeration' · 903a3b1e
      Bjorn Helgaas authored
      - Move the PRESERVE_BOOT_CONFIG ACPI _DSM evaluation from drivers/acpi to
        drivers/pci so we can unify with similar DT functionality (Vidya Sagar)
      
      - Add of_pci_preserve_config() to check for a DT "linux,pci-probe-only"
        property on a per-host bridge basis in addition to a global basis (Vidya
        Sagar)
      
      - Unify ACPI PRESERVE_BOOT_CONFIG _DSM and DT "linux,pci-probe-only" in a
        generic pci_preserve_config() path (Vidya Sagar)
      
      * pci/enumeration:
        PCI: Use preserve_config in place of pci_flags
        PCI: Unify ACPI and DT 'preserve config' support
        PCI: of: Add of_pci_preserve_config() for per-host bridge support
        PCI: Move PRESERVE_BOOT_CONFIG _DSM evaluation to pci_register_host_bridge()
    • Merge branch 'pci/dpc' · 147ea50e
      Bjorn Helgaas authored
      - If there's a device below a bridge, prevent a use-after-free by holding a
        reference to the device while waiting for the secondary bus to be ready
        in case the device is concurrently removed, e.g., by DPC (Lukas Wunner)
      
      * pci/dpc:
        PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
    • Merge branch 'pci/devres' · 06bbe25c
      Bjorn Helgaas authored
      - Add pcim_add_mapping_to_legacy_table() and
        pcim_remove_mapping_from_legacy_table() helper functions to simplify
        management of the devres iomap table (Philipp Stanner)
      
      - Reimplement devres that take a bit mask of BARs in a way that can be used
        to map partial BARs as well as entire BARs (Philipp Stanner)
      
      - Deprecate pcim_iomap_table() and pcim_iomap_regions_request_all() in
        favor of pcim_* request plus pcim_* mapping (Philipp Stanner)
      
      - Add pcim_request_region(), a managed interface to request a single BAR
        (Philipp Stanner)
      
      - Use the existing pci_is_enabled() interface to replace the struct
        devres.enabled bit (Philipp Stanner)
      
      - Move the struct pci_devres.pinned bit to struct pci_dev (Philipp Stanner)
      
      - Reimplement pcim_set_mwi() so it uses its own devres cleanup callback
        instead of a special-purpose bit in struct pci_devres (Philipp Stanner)
      
      - Add pcim_intx(), which is unambiguously managed, unlike pci_intx(), which
        is managed if pcim_enable_device() has been called but unmanaged
        otherwise (Philipp Stanner)
      
      - Remove pcim_release(), which is no longer needed after previous cleanups
        of pcim_set_mwi() and pci_intx() (Philipp Stanner)
      
      - Add pcim_iomap_range(), a managed interface to map part of a BAR (Philipp
        Stanner)
      
      - Fix vboxvideo leak by using the new pcim_iomap_range() instead of the
        unmanaged pci_iomap_range() (Philipp Stanner)
      
      * pci/devres:
        drm/vboxvideo: fix mapping leaks
        PCI: Add managed pcim_iomap_range()
        PCI: Remove legacy pcim_release()
        PCI: Add managed pcim_intx()
        PCI: Give pcim_set_mwi() its own devres cleanup callback
        PCI: Move struct pci_devres.pinned bit to struct pci_dev
        PCI: Remove struct pci_devres.enabled status bit
        PCI: Document hybrid devres hazards
        PCI: Add managed pcim_request_region()
        PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()
        PCI: Add managed partial-BAR request and map infrastructure
        PCI: Add devres helpers for iomap table
        PCI: Add and use devres helper for bit masks
    • Merge branch 'pci/acs' · cb43487e
      Bjorn Helgaas authored
      - Add ACS quirk for Broadcom BCM5760X NIC, which doesn't allow peer-to-peer
        transactions between functions, but doesn't advertise ACS support (Ajit
        Khaparde)
      
      - Add "pci=config_acs=" kernel command-line parameter to relax default ACS
        settings to enable peer-to-peer configurations.  Requires expert
        knowledge of topology and ACS operation (Vidya Sagar)
      
      * pci/acs:
        PCI: Extend ACS configurability
        PCI: Add ACS quirk for Broadcom BCM5760X NIC
  2. 12 Jul, 2024 2 commits
    • PCI: Extend ACS configurability · 47c8846a
      Vidya Sagar authored
      PCIe ACS settings control the level of isolation and the possible P2P paths
      between devices. With greater isolation the kernel will create smaller
      iommu_groups and with less isolation there is more HW that can achieve P2P
      transfers. From a virtualization perspective all devices in the same
      iommu_group must be assigned to the same VM as they lack security
      isolation.
      
      There is no way for the kernel to automatically know the correct ACS
      settings for any given system and workload. Existing command-line options
      (e.g., disable_acs_redir) allow only large-scale changes, disabling all
      isolation, which is not sufficient for more complex cases.
      
      Add a kernel command-line option 'config_acs' to directly control all the
      ACS bits for specific devices, which allows the operator to set up the
      right level of isolation to achieve the desired P2P configuration.  The
      definition is future-proof; when new ACS bits are added to the spec, the
      open syntax can be extended.
      
      ACS needs to be set up early in the kernel boot as the ACS settings affect
      how iommu_groups are formed. iommu_group formation is a one-time event
      during initial device discovery, so changing ACS bits after kernel boot can
      result in an inaccurate view of the iommu_groups compared to the current
      isolation configuration.
      
      ACS applies to PCIe Downstream Ports and multi-function devices.  The
      default ACS settings are strict and deny any direct traffic between two
      functions. This results in the smallest iommu_group the HW can support.
      Frequently these values result in slow or non-working P2PDMA.
      
      ACS offers a range of security choices controlling how traffic is
      allowed to go directly between two devices. Some popular choices:
      
        - Full prevention
      
        - Translated requests can be direct, with various options
      
        - Asymmetric direct traffic, A can reach B but not the reverse
      
        - All traffic can be direct
      
      There are also some less common choices for special topologies.
      
      The intention is that this option would be used with expert knowledge of
      the HW capability and workload to achieve the desired configuration.
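
      As a rough illustration of per-bit ACS control, the sketch below turns a
      flags string into a mask/value pair and applies it to an ACS Control
      register value. It is a userspace model, not the kernel's parser. The
      '0'/'1'/'x' per-character syntax (rightmost character is bit 0) is an
      assumption recalled from the kernel command-line parameter documentation
      and is not stated in this commit message; parse_acs_flags() and
      apply_acs_flags() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* ACS Control register bits (values as defined by the PCIe spec). */
#define ACS_SV 0x01u /* Source Validation */
#define ACS_TB 0x02u /* Translation Blocking */
#define ACS_RR 0x04u /* P2P Request Redirect */
#define ACS_CR 0x08u /* P2P Completion Redirect */
#define ACS_UF 0x10u /* Upstream Forwarding */
#define ACS_EC 0x20u /* P2P Egress Control */
#define ACS_DT 0x40u /* Direct Translated P2P */

/*
 * Hypothetical helper: parse a flags string such as "10x".
 * ASSUMED syntax: '1' = force on, '0' = force off, 'x' = leave unchanged,
 * with the rightmost character controlling bit 0.
 * Returns 0 on success, -1 on a malformed string.
 */
static int parse_acs_flags(const char *s, unsigned *mask, unsigned *val)
{
	size_t len = strlen(s), i;

	*mask = 0;
	*val = 0;
	if (len == 0 || len > 7)
		return -1;
	for (i = 0; i < len; i++) {
		unsigned bit = 1u << (len - 1 - i);

		switch (s[i]) {
		case '1': *mask |= bit; *val |= bit; break;
		case '0': *mask |= bit; break;
		case 'x': break;
		default: return -1;
		}
	}
	return 0;
}

/* Bits outside the mask keep whatever firmware or power-up left there. */
static unsigned apply_acs_flags(unsigned ctrl, unsigned mask, unsigned val)
{
	return (ctrl & ~mask) | val;
}
```

      Under these assumptions, "10x" would force P2P Request Redirect on,
      force Translation Blocking off, and leave Source Validation untouched.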
      
      Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com
      Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
      [bhelgaas: add example, tidy printk formats]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: Add missing bridge lock to pci_bus_lock() · a4e77289
      Dan Williams authored
      One of the true positives that the cfg_access_lock lockdep effort
      identified is this sequence:
      
        WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70
        RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70
        Call Trace:
         <TASK>
         ? __warn+0x8c/0x190
         ? pci_bridge_secondary_bus_reset+0x5d/0x70
         ? report_bug+0x1f8/0x200
         ? handle_bug+0x3c/0x70
         ? exc_invalid_op+0x18/0x70
         ? asm_exc_invalid_op+0x1a/0x20
         ? pci_bridge_secondary_bus_reset+0x5d/0x70
         pci_reset_bus+0x1d8/0x270
         vmd_probe+0x778/0xa10
         pci_device_probe+0x95/0x120
      
      Here, pci_reset_bus() users are triggering unlocked secondary bus resets.
      Ironically, pci_bus_reset(), several calls down from pci_reset_bus(), uses
      pci_bus_lock() before issuing the reset, which locks everything *but* the
      bridge itself.
      
      For the same motivation as adding:
      
        bridge = pci_upstream_bridge(dev);
        if (bridge)
          pci_dev_lock(bridge);
      
      to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add
      pci_dev_lock() for @bus->self to pci_bus_lock().
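
      The locking order described above can be modelled in userspace roughly
      as follows. This is a simplified sketch, not the kernel code: the
      model_* types and functions are hypothetical, a plain flag stands in for
      the device lock, and it assumes each bridge's subordinate bus points
      back at the bridge through ->self. Recursing into the subordinate bus
      instead of locking the bridge directly is one way to avoid taking the
      same bridge lock twice.

```c
#include <assert.h>
#include <stddef.h>

struct model_bus;

struct model_dev {
	int locked;                    /* 1 while the modelled lock is held */
	struct model_bus *subordinate; /* non-NULL if this device is a bridge */
};

struct model_bus {
	struct model_dev *self;        /* bridge above this bus, NULL at the root */
	struct model_dev *devices[4];
	size_t ndevices;
};

static void model_dev_lock(struct model_dev *dev)
{
	/* Taking the same lock twice would be a recursive-locking deadlock. */
	assert(!dev->locked);
	dev->locked = 1;
}

static void model_bus_lock(struct model_bus *bus)
{
	size_t i;

	/* The previously missing step: lock the bridge leading to this bus. */
	if (bus->self)
		model_dev_lock(bus->self);

	for (i = 0; i < bus->ndevices; i++) {
		struct model_dev *dev = bus->devices[i];

		if (dev->subordinate)
			/* Locks dev itself via subordinate->self (assumed == dev). */
			model_bus_lock(dev->subordinate);
		else
			model_dev_lock(dev);
	}
}
```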
      
      Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com
      Reported-by: Imre Deak <imre.deak@intel.com>
      Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      [bhelgaas: squash in recursive locking deadlock fix from Keith Busch:
      https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Tested-by: Hans de Goede <hdegoede@redhat.com>
      Tested-by: Kalle Valo <kvalo@kernel.org>
      Reviewed-by: Dave Jiang <dave.jiang@intel.com>
  3. 11 Jul, 2024 4 commits
  4. 10 Jul, 2024 9 commits
  5. 01 Jul, 2024 1 commit
    • PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal · 11a1f4bc
      Lukas Wunner authored
      Keith reports a use-after-free when a DPC event occurs concurrently with
      hot-removal of the same portion of the hierarchy:
      
      The dpc_handler() awaits readiness of the secondary bus below the
      Downstream Port where the DPC event occurred.  To do so, it polls the
      config space of the first child device on the secondary bus.  If that
      child device is concurrently removed, accesses to its struct pci_dev
      cause the kernel to oops.
      
      That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
      reference on the child device.  Before v6.3, the function was only
      called on resume from system sleep or on runtime resume.  Holding a
      reference wasn't necessary back then because the pciehp IRQ thread
      could never run concurrently.  (On resume from system sleep, IRQs are
      not enabled until after the resume_noirq phase.  And runtime resume is
      always awaited before a PCI device is removed.)
      
      However, starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
      called on a DPC event.  Commit 53b54ad0 ("PCI/DPC: Await readiness
      of secondary bus after reset"), which introduced that, failed to
      appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
      reference on the child device because dpc_handler() and pciehp may
      indeed run concurrently.  The commit was backported to v5.10+ stable
      kernels, so that's the oldest one affected.
      
      Add the missing reference acquisition.
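
      The effect of holding a reference while waiting can be modelled in
      userspace as below. This is a minimal sketch, not the kernel change
      itself: the model_* names are hypothetical, a counter stands in for the
      device reference count, a "released" flag stands in for freeing the
      structure, and the concurrent hot-removal is simulated by a direct call
      in the middle of the wait.

```c
struct model_child {
	int refcount; /* number of holders */
	int released; /* set once the last reference is dropped */
};

static struct model_child *model_get(struct model_child *c)
{
	if (c)
		c->refcount++;
	return c;
}

static void model_put(struct model_child *c)
{
	if (c && --c->refcount == 0)
		c->released = 1; /* stands in for freeing the structure */
}

/* Hot-removal drops the reference owned by the bus's device list. */
static void model_hot_remove(struct model_child *c)
{
	model_put(c);
}

/*
 * Pin the child before polling it, so a removal in the middle of the wait
 * cannot release it underneath us. Returns 1 if the child was still valid
 * when polled.
 */
static int model_wait_for_secondary_bus(struct model_child *first_child)
{
	struct model_child *child = model_get(first_child);
	int valid;

	model_hot_remove(first_child); /* simulated concurrent removal */
	valid = !child->released;      /* safe: our reference keeps it alive */
	model_put(child);              /* last reference: released only now */
	return valid;
}
```

      Without the model_get()/model_put() pair, the simulated removal would
      drop the only reference and the later poll would touch a released
      object.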
      
      Abridged stack trace:
      
        BUG: unable to handle page fault for address: 00000000091400c0
        CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
        RIP: pci_bus_read_config_dword+0x17/0x50
        pci_dev_wait()
        pci_bridge_wait_for_secondary_bus()
        dpc_reset_link()
        pcie_do_recovery()
        dpc_handler()
      
      Fixes: 53b54ad0 ("PCI/DPC: Await readiness of secondary bus after reset")
      Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/
      Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.1718707743.git.lukas@wunner.de
      Reported-by: Keith Busch <kbusch@kernel.org>
      Tested-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: stable@vger.kernel.org # v5.10+
  6. 18 Jun, 2024 3 commits
  7. 04 Jun, 2024 1 commit
  8. 03 Jun, 2024 4 commits
  9. 30 May, 2024 1 commit
  10. 28 May, 2024 1 commit
  11. 26 May, 2024 5 commits
  12. 25 May, 2024 2 commits
    • Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of... · 9b62e02e
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "16 hotfixes, 11 of which are cc:stable.
      
        A few nilfs2 fixes, the remainder are for MM: a couple of selftests
        fixes, various singletons fixing various issues in various parts"
      
      * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mm/ksm: fix possible UAF of stable_node
        mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
        mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
        nilfs2: fix potential hang in nilfs_detach_log_writer()
        nilfs2: fix unexpected freezing of nilfs_segctor_sync()
        nilfs2: fix use-after-free of timer for log writer thread
        selftests/mm: fix build warnings on ppc64
        arm64: patching: fix handling of execmem addresses
        selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
        selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
        selftests/mm: compaction_test: fix bogus test success on Aarch64
        mailmap: update email address for Satya Priya
        mm/huge_memory: don't unpoison huge_zero_folio
        kasan, fortify: properly rename memintrinsics
        lib: add version into /proc/allocinfo output
        mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
    • Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a0db36ed
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
      
       - Fix x86 IRQ vector leak caused by a CPU offlining race
      
       - Fix build failure in the riscv-imsic irqchip driver
         caused by an API-change semantic conflict
      
       - Fix use-after-free in irq_find_at_or_after()
      
      * tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
        genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
        irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict