- 19 Jul, 2024 8 commits
-
-
Bjorn Helgaas authored
- Rename find_resource() to find_resource_space() to make it more descriptive for exporting outside resource.c (Ilpo Järvinen) - Document find_resource_space() and the resource_constraint struct it uses (Ilpo Järvinen) - Add typedef resource_alignf to make it simpler to declare allocation constraint alignf callbacks (Ilpo Järvinen) - Open-code the no-constraint simple alignment case to make the simple_align_resource() default callback unnecessary (Ilpo Järvinen) - Export find_resource_space() because PCI bridge window allocation needs to learn whether there's space for a window (Ilpo Järvinen) - Fix a double-counting problem in PCI calculate_memsize() that led to allocating larger windows each time a bus was removed and rescanned (Ilpo Järvinen) - When we don't have space to allocate larger bridge windows, allocate windows only large enough for the downstream devices to prevent cases where a device worked originally, but not after being removed and re-added (Ilpo Järvinen) * pci/resource: PCI: Relax bridge window tail sizing rules PCI: Make minimum bridge window alignment reference more obvious PCI: Fix resource double counting on remove & rescan resource: Export find_resource_space() resource: Handle simple alignment inside __find_resource_space() resource: Use typedef for alignf callback resource: Document find_resource_space() and resource_constraint resource: Rename find_resource() to find_resource_space()
-
Bjorn Helgaas authored
- Warn about doing a Secondary Bus Reset without holding the device lock (Dan Williams) - Lock bridge in addition to downstream hierarchy before doing a Secondary Bus Reset (Dan Williams) * pci/reset: PCI: Add missing bridge lock to pci_bus_lock() PCI: Warn on missing cfg_access_lock during secondary bus reset
-
Bjorn Helgaas authored
- Detect if a device was removed or replaced during system sleep so we don't assume a new device is the one that used to be there. This uses Vendor/Device/Subsystem/Class/Revision and Device Serial Number (if implemented), so it's not fool-proof and drivers may know how to detect more cases (Lukas Wunner) - Add missing MODULE_DESCRIPTION() macro (Jeff Johnson) * pci/hotplug: PCI: acpiphp: Add missing MODULE_DESCRIPTION() macro PCI: pciehp: Detect device replacement during system sleep
-
Bjorn Helgaas authored
- Disable AER and DPC during suspend so that if they share an interrupt with PME and errors occur during suspend, the AER or DPC interrupt doesn't cause spurious wakeups (Kai-Heng Feng) * pci/err: PCI/DPC: Disable DPC service on suspend PCI/AER: Disable AER service on suspend
-
Bjorn Helgaas authored
- Move the PRESERVE_BOOT_CONFIG ACPI _DSM evaluation from drivers/acpi to drivers/pci so we can unify with similar DT functionality (Vidya Sagar) - Add of_pci_preserve_config() to check for a DT "linux,pci-probe-only" property on a per-host bridge basis in addition to a global basis (Vidya Sagar) - Unify ACPI PRESERVE_BOOT_CONFIG _DSM and DT "linux,pci-probe-only" in a generic pci_preserve_config() path (Vidya Sagar) * pci/enumeration: PCI: Use preserve_config in place of pci_flags PCI: Unify ACPI and DT 'preserve config' support PCI: of: Add of_pci_preserve_config() for per-host bridge support PCI: Move PRESERVE_BOOT_CONFIG _DSM evaluation to pci_register_host_bridge()
-
Bjorn Helgaas authored
- If there's a device below a bridge, prevent a use-after-free by holding a reference to the device while waiting for the secondary bus to be ready in case the device is concurrently removed, e.g., by DPC (Lukas Wunner) * pci/dpc: PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
-
Bjorn Helgaas authored
- Add pcim_add_mapping_to_legacy_table() and pcim_remove_mapping_from_legacy_table() helper functions to simplify devres iomap table (Philipp Stanner) - Reimplement devres that take a bit mask of BARs in a way that can be used to map partial BARs as well as entire BARs (Philipp Stanner) - Deprecate pcim_iomap_table() and pcim_iomap_regions_request_all() in favor of pcim_* request plus pcim_* mapping (Philipp Stanner) - Add pcim_request_region(), a managed interface to request a single BAR (Philipp Stanner) - Use the existing pci_is_enabled() interface to replace the struct devres.enabled bit (Philipp Stanner) - Move the struct pci_devres.pinned bit to struct pci_dev (Philipp Stanner) - Reimplement pcim_set_mwi() so it uses its own devres cleanup callback instead of a special-purpose bit in struct pci_devres (Philipp Stanner) - Add pcim_intx(), which is unambiguously managed, unlike pci_intx(), which is managed if pcim_enable_device() has been called but unmanaged otherwise (Philipp Stanner) - Remove pcim_release(), which is no longer needed after previous cleanups of pcim_set_mwi() and pci_intx() (Philipp Stanner) - Add pcim_iomap_range(), a managed interface to map part of a BAR (Philipp Stanner) - Fix vboxvideo leak by using the new pcim_iomap_range() instead of the unmanaged pci_iomap_range() (Philipp Stanner) * pci/devres: drm/vboxvideo: fix mapping leaks PCI: Add managed pcim_iomap_range() PCI: Remove legacy pcim_release() PCI: Add managed pcim_intx() PCI: Give pcim_set_mwi() its own devres cleanup callback PCI: Move struct pci_devres.pinned bit to struct pci_dev PCI: Remove struct pci_devres.enabled status bit PCI: Document hybrid devres hazards PCI: Add managed pcim_request_region() PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all() PCI: Add managed partial-BAR request and map infrastructure PCI: Add devres helpers for iomap table PCI: Add and use devres helper for bit masks
-
Bjorn Helgaas authored
- Add ACS quirk for Broadcom BCM5760X NIC, which doesn't allow peer-to-peer transactions between functions, but doesn't advertise ACS support (Ajit Khaparde) - Add "pci=config_acs=" kernel command-line parameter to relax default ACS settings to enable peer-to-peer configurations. Requires expert knowledge of topology and ACS operation (Vidya Sagar) * pci/acs: PCI: Extend ACS configurability PCI: Add ACS quirk for Broadcom BCM5760X NIC
-
- 12 Jul, 2024 2 commits
-
-
Vidya Sagar authored
PCIe ACS settings control the level of isolation and the possible P2P paths between devices. With greater isolation the kernel will create smaller iommu_groups and with less isolation there is more HW that can achieve P2P transfers. From a virtualization perspective all devices in the same iommu_group must be assigned to the same VM as they lack security isolation. There is no way for the kernel to automatically know the correct ACS settings for any given system and workload. Existing command line options (e.g., disable_acs_redir) allow only for large scale change, disabling all isolation, but this is not sufficient for more complex cases. Add a kernel command-line option 'config_acs' to directly control all the ACS bits for specific devices, which allows the operator to setup the right level of isolation to achieve the desired P2P configuration. The definition is future proof; when new ACS bits are added to the spec the open syntax can be extended. ACS needs to be setup early in the kernel boot as the ACS settings affect how iommu_groups are formed. iommu_group formation is a one time event during initial device discovery, so changing ACS bits after kernel boot can result in an inaccurate view of the iommu_groups compared to the current isolation configuration. ACS applies to PCIe Downstream Ports and multi-function devices. The default ACS settings are strict and deny any direct traffic between two functions. This results in the smallest iommu_group the HW can support. Frequently these values result in slow or non-working P2PDMA. ACS offers a range of security choices controlling how traffic is allowed to go directly between two devices. Some popular choices: - Full prevention - Translated requests can be direct, with various options - Asymmetric direct traffic, A can reach B but not the reverse - All traffic can be direct Along with some other less common ones for special topologies. The intention is that this option would be used with expert knowledge of the HW capability and workload to achieve the desired configuration. Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.comSigned-off-by: Vidya Sagar <vidyas@nvidia.com> [bhelgaas: add example, tidy printk formats] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Dan Williams authored
One of the true positives that the cfg_access_lock lockdep effort identified is this sequence: WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70 RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70 Call Trace: <TASK> ? __warn+0x8c/0x190 ? pci_bridge_secondary_bus_reset+0x5d/0x70 ? report_bug+0x1f8/0x200 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? pci_bridge_secondary_bus_reset+0x5d/0x70 pci_reset_bus+0x1d8/0x270 vmd_probe+0x778/0xa10 pci_device_probe+0x95/0x120 Where pci_reset_bus() users are triggering unlocked secondary bus resets. Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses pci_bus_lock() before issuing the reset which locks everything *but* the bridge itself. For the same motivation as adding: bridge = pci_upstream_bridge(dev); if (bridge) pci_dev_lock(bridge); to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add pci_dev_lock() for @bus->self to pci_bus_lock(). Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.comReported-by: Imre Deak <imre.deak@intel.com> Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuchSigned-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <kbusch@kernel.org> [bhelgaas: squash in recursive locking deadlock fix from Keith Busch: https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
-
- 11 Jul, 2024 4 commits
-
-
Philipp Stanner authored
When the PCI devres API was introduced to this driver, it was wrongly assumed that initializing the device with pcim_enable_device() instead of pci_enable_device() will make all PCI functions managed. This is wrong and was caused by the quite confusing PCI devres API in which some, but not all, functions become managed that way. The function pci_iomap_range() is never managed. Replace pci_iomap_range() with the managed function pcim_iomap_range(). Fixes: 8558de40 ("drm/vboxvideo: use managed pci functions") Link: https://lore.kernel.org/r/20240613115032.29098-14-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
-
Philipp Stanner authored
The only managed mapping function currently is pcim_iomap() which doesn't allow for mapping an area starting at a certain offset, which many drivers want. Add pcim_iomap_range() as an exported function. Link: https://lore.kernel.org/r/20240613115032.29098-13-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
Thanks to preceding cleanup steps, pcim_release() is now not needed anymore and can be replaced by pcim_disable_device(), which is the exact counterpart to pcim_enable_device(). This permits removing further parts of the old PCI devres implementation. Replace pcim_release() with pcim_disable_device(). Remove the now unused function get_pci_dr(). Remove the struct pci_devres from pci.h. Link: https://lore.kernel.org/r/20240613115032.29098-12-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
pci_intx() is a "hybrid" function, i.e., it is managed if pcim_enable_device() has been called, but unmanaged otherwise. Add pcim_intx(), which is always managed, and implement pci_intx() using it. Remove the now-unused struct pci_devres.orig_intx and .restore_intx and find_pci_dr(). Link: https://lore.kernel.org/r/20240613115032.29098-11-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> [kwilczynski: squashed in https://lore.kernel.org/r/426645d40776198e0fcc942f4a6cac4433c7a9aa.camel@redhat.com to fix problem reported and tested by Ashish Kalra <Ashish.Kalra@amd.com>: https://lore.kernel.org/r/20240708214656.4721-1-Ashish.Kalra@amd.com https://lore.kernel.org/r/8c4634e9-4f02-4c54-9c89-d75e2f4bf026@amd.com/] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 10 Jul, 2024 9 commits
-
-
Philipp Stanner authored
Managing pci_set_mwi() with devres can easily be done with its own callback, without the necessity to store any state about it in a device-related struct. Remove the MWI state from struct pci_devres. Give pcim_set_mwi() a separate devres cleanup callback. Link: https://lore.kernel.org/r/20240613115032.29098-10-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
The bit describing whether the PCI device is currently pinned is stored in struct pci_devres. To clean up and simplify the PCI devres API, it's better if this information is stored in struct pci_dev. This will later permit simplifying pcim_enable_device(). Move the 'pinned' boolean bit to struct pci_dev. Restructure bits in struct pci_dev so the pm / pme fields are next to each other. Link: https://lore.kernel.org/r/20240613115032.29098-9-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
The struct pci_devres has a separate boolean to track whether a device is enabled. That, however, can easily be tracked in an agnostic manner through the function pci_is_enabled(). Using it allows for simplifying the PCI devres implementation. Replace the separate 'enabled' status bit from struct pci_devres with calls to pci_is_enabled() at the appropriate places. Link: https://lore.kernel.org/r/20240613115032.29098-8-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
These functions: pci_request_region() pci_request_regions() pci_request_regions_exclusive() pci_request_selected_regions() pci_request_selected_regions_exclusive() pci_intx() are "hybrid" functions that are managed if pcim_enable_device() has been called, but unmanaged otherwise. This is confusing and has already caused a bug (in 8558de40 ("drm/vboxvideo: use managed pci functions")) because users believe all PCI functions, such as pci_iomap_range(), can become managed that way, which is not the case. Add comments to the relevant functions' docstrings that warn users about this behavior. Link: https://lore.kernel.org/r/20240613115032.29098-7-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
These existing functions: pci_request_region() pci_request_selected_regions() pci_request_selected_regions_exclusive() are "hybrid" functions built on __pci_request_region() and are managed if pcim_enable_device() has been called, but unmanaged otherwise. Add these new functions: pcim_request_region() pcim_request_region_exclusive() These are *always* managed and use the new pcim_addr_devres tracking infrastructure instead of find_pci_dr() and struct pci_devres.region_mask. Implement the hybrid functions using the new "pure" functions and remove struct pci_devres.region_mask, which is no longer needed. Link: https://lore.kernel.org/r/20240613115032.29098-6-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
Deprecate pcim_iomap_table(). It returns a pointer to a table of ioremapped BARs, or NULL if it fails. This makes uses like this: addr = pcim_iomap_table(pdev)[0]; problematic because it causes a NULL pointer dereference on failure. Callers should use pcim_iomap() instead. Deprecate pcim_iomap_regions_request_all() because it is built on __pci_request_region() and is managed if pcim_enable_device() has been called, but unmanaged otherwise, which is prone to errors. Callers should either use pcim_iomap_regions() to request and map BARs, or use pcim_request_region() followed by pcim_iomap(). Link: https://lore.kernel.org/r/20240613115032.29098-5-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log, sphinx markup] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
The pcim_iomap_devres table tracks entire-BAR mappings, so we can't use it to build a managed version of pci_iomap_range(), which maps partial BARs. Add struct pcim_addr_devres, which can track request and mapping of both entire BARs and partial BARs. Add the following internal devres functions based on struct pcim_addr_devres: pcim_iomap_region() # request & map entire BAR pcim_iounmap_region() # unmap & release entire BAR pcim_request_region() # request entire BAR pcim_release_region() # release entire BAR pcim_request_all_regions() # request all entire BARs pcim_release_all_regions() # release all entire BARs Rework the following public interfaces using the new infrastructure listed above: pcim_iomap() # map partial BAR pcim_iounmap() # unmap partial BAR pcim_iomap_regions() # request & map specified BARs pcim_iomap_regions_request_all() # request all BARs, map specified BARs pcim_iounmap_regions() # unmap & release specified BARs Link: https://lore.kernel.org/r/20240613115032.29098-4-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
The pcim_iomap_devres.table administrated by pcim_iomap_table() has its entries set and unset at several places throughout devres.c using manual iterations which are effectively code duplications. Add pcim_add_mapping_to_legacy_table() and pcim_remove_mapping_from_legacy_table() helper functions and use them where possible. Link: https://lore.kernel.org/r/20240613115032.29098-3-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Philipp Stanner authored
The current devres implementation uses manual shift operations to check whether a bit in a mask is set. The code can be made more readable by writing a small helper function for that. Implement mask_contains_bar() and use it where applicable. Link: https://lore.kernel.org/r/20240613115032.29098-2-pstanner@redhat.comSigned-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 01 Jul, 2024 1 commit
-
-
Lukas Wunner authored
Keith reports a use-after-free when a DPC event occurs concurrently to hot-removal of the same portion of the hierarchy: The dpc_handler() awaits readiness of the secondary bus below the Downstream Port where the DPC event occurred. To do so, it polls the config space of the first child device on the secondary bus. If that child device is concurrently removed, accesses to its struct pci_dev cause the kernel to oops. That's because pci_bridge_wait_for_secondary_bus() neglects to hold a reference on the child device. Before v6.3, the function was only called on resume from system sleep or on runtime resume. Holding a reference wasn't necessary back then because the pciehp IRQ thread could never run concurrently. (On resume from system sleep, IRQs are not enabled until after the resume_noirq phase. And runtime resume is always awaited before a PCI device is removed.) However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also called on a DPC event. Commit 53b54ad0 ("PCI/DPC: Await readiness of secondary bus after reset"), which introduced that, failed to appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a reference on the child device because dpc_handler() and pciehp may indeed run concurrently. The commit was backported to v5.10+ stable kernels, so that's the oldest one affected. Add the missing reference acquisition. Abridged stack trace: BUG: unable to handle page fault for address: 00000000091400c0 CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0 RIP: pci_bus_read_config_dword+0x17/0x50 pci_dev_wait() pci_bridge_wait_for_secondary_bus() dpc_reset_link() pcie_do_recovery() dpc_handler() Fixes: 53b54ad0 ("PCI/DPC: Await readiness of secondary bus after reset") Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/ Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.1718707743.git.lukas@wunner.deReported-by: Keith Busch <kbusch@kernel.org> Tested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org # v5.10+
-
- 18 Jun, 2024 3 commits
-
-
Kai-Heng Feng authored
If the link is powered off during suspend, electrical noise may cause errors that trigger DPC. If the DPC interrupt is enabled and shares an IRQ with PME, that causes a spurious wakeup during suspend. Disable DPC triggering and the DPC interrupt during suspend to prevent this. Clear DPC interrupt status before re-enabling DPC interrupts during resume so we don't get an interrupt for errors that occurred during the suspend/resume process. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Link: https://lore.kernel.org/r/20240416043225.1462548-3-kai.heng.feng@canonical.comSigned-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [bhelgaas: clear status on resume, add comments, commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Kai-Heng Feng authored
If the link is powered off during suspend, electrical noise may cause errors that are logged via AER. If the AER interrupt is enabled and shares an IRQ with PME, that causes a spurious wakeup during suspend. Disable the AER interrupt during suspend to prevent this. Clear error status before re-enabling IRQ interrupts during resume so we don't get an interrupt for errors that occurred during the suspend/resume process. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Link: https://lore.kernel.org/r/20240416043225.1462548-2-kai.heng.feng@canonical.comSigned-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [bhelgaas: drop pci_ancestor_pr3_present() etc, commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Jeff Johnson authored
With ARCH=arm64, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pci/hotplug/acpiphp_ampere_altra.o Add the missing MODULE_DESCRIPTION(). Link: https://lore.kernel.org/r/20240612-md-drivers-pci-hotplug-v1-1-2b30d14d783d@quicinc.comSigned-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 12 Jun, 2024 2 commits
-
-
Ilpo Järvinen authored
During remove & rescan cycle, PCI subsystem will recalculate and adjust the bridge window sizing that was initially done by "BIOS". The size calculation is based on the required alignment of the largest resource among the downstream resources as per pbus_size_mem() (unimportant or zero parameters marked with "..."): min_align = calculate_mem_align(aligns, max_order); size0 = calculate_memsize(size, ..., min_align); inside calculate_memsize(), for the largest alignment: min_align = align1 >> 1; ... return min_align; and then in calculate_memsize(): return ALIGN(max(size, ...), align); If the original bridge window sizing tried to conserve space, this will lead to massive increase of the required bridge window size when the downstream has a large disparity in BAR sizes. E.g., with 16MiB and 16GiB BARs this results in 24GiB bridge window size even if 16MiB BAR does not require gigabytes of space to fit. When doing remove & rescan for a bus that contains such a PCI device, a larger bridge window is suddenly required on rescan but when there is a bridge window upstream that is already assigned based on the original size, it cannot be enlarged to the new requirement. This causes the allocation of the bridge window to fail (0x600000000 > 0x400ffffff): pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] pci 0000:01:00.0: PCI bridge to [bus 02-04] pci 0000:01:00.0: bridge window [mem 0x40400000-0x406fffff] pci 0000:01:00.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] pci 0000:03:00.0: device released pci 0000:02:01.0: device released pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 0 pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 0 pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref] pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref] pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref] pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1 pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1 pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: can't assign; no space pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: failed to assign pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: failed to assign pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff] This is a major surprise for users who are suddenly left with a device that was working fine with the original bridge window sizing. Even if the already assigned bridge window could be enlarged by reallocation in some cases (something the current code does not attempt to do), it is not possible in general case and the large amount of wasted space at the tail of the bridge window may lead to other resource exhaustion problems on Root Complex level (think of multiple PCIe cards with VFs and BAR size disparity in a single system). PCI BARs only need natural alignment (PCIe r6.1, sec 7.5.1.2.1) and bridge memory windows need 1MiB (sec 7.5.1.3). The current bridge window tail alignment rule was introduced in the commit 5d0a8965 ("[PATCH] 2.5.14: New PCI allocation code (alpha, arm, parisc) [2/2]") that only states: "pbus_size_mem: core stuff; tested with randomly generated sets of resources". It does not explain the motivation for the extra tail space allocated that is not truly needed by the downstream resources. As such, it is far from clear if it ever has been required by any HW. To prevent devices with BAR size disparity from becoming unusable after remove & rescan cycle, attempt to do a truly minimal allocation for memory resources if needed. First check if the normally calculated bridge window will not fit into an already assigned upstream resource. In such case, try with relaxed bridge window tail sizing rules instead where no extra tail space is requested beyond what the downstream resources require. Only enforce the alignment requirement of the bridge window itself (normally 1MiB). With this patch, the resources are successfully allocated: pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1 pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1 pcieport 0000:01:00.0: Assigned bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 02-04] cannot fit 0x600000000 required for 0000:02:01.0 bridging to [bus 03] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 03] requires relaxed alignment rules pcieport 0000:01:00.0: Assigned bridge window [mem 0x40400000-0x406fffff] to [bus 02-04] free space at [mem 0x40400000-0x405fffff] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]: assigned pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]: assigned pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]: assigned pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] This patch draws inspiration from the initial investigations and work by Mika Westerberg. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216795 Link: https://lore.kernel.org/linux-pci/20190812144144.2646-1-mika.westerberg@linux.intel.com/ Fixes: 5d0a8965 ("[PATCH] 2.5.14: New PCI allocation code (alpha, arm, parisc) [2/2]") Link: https://lore.kernel.org/r/20240507102523.57320-9-ilpo.jarvinen@linux.intel.comTested-by: Lidong Wang <lidong.wang@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
-
Ilpo Järvinen authored
Calculations related to bridge window size contain literal 20 that is the minimum alignment for a bridge window. Make the code more obvious by converting the literal 20 to __ffs(SZ_1M). Link: https://lore.kernel.org/r/20240507102523.57320-8-ilpo.jarvinen@linux.intel.comSigned-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash https://lore.kernel.org/r/20240612093250.17544-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
-
- 04 Jun, 2024 1 commit
-
-
Dan Williams authored
The recent adventure with adding lockdep tracking for cfg_access_lock, while it yielded many false positives [1], did catch a true positive in the pci_reset_bus() path [2]. So, while lockdep is difficult to deploy, open coding a check that cfg_access_lock is held during the reset is feasible. While this does not offer a full backtrace, it should be sufficient to implicate the caller of pci_bridge_secondary_bus_reset() as a path that needs investigation. Link: https://lore.kernel.org/r/171711746953.1628941.4692125082286867825.stgit@dwillia2-xfh.jf.intel.com Link: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html [1] Link: http://lore.kernel.org/r/cfb50601-5d2a-4676-a958-1bd3f1b06654@intel.com [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
-
- 03 Jun, 2024 4 commits
-
-
Vidya Sagar authored
Use preserve_config in place of checking for PCI_PROBE_ONLY flag to enable support for "linux,pci-probe-only" on a per host bridge basis. This also obviates the use of adding PCI_REASSIGN_ALL_BUS flag if !PCI_PROBE_ONLY, as pci_assign_unassigned_root_bus_resources() takes care of reassigning the resources that are not already claimed. Link: https://lore.kernel.org/r/20240508174138.3630283-5-vidyas@nvidia.comSigned-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Vidya Sagar authored
Unify the 'preserve config' support across ACPI and device-tree boot flows. Link: https://lore.kernel.org/r/20240508174138.3630283-4-vidyas@nvidia.comSigned-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Vidya Sagar authored
Add of_pci_preserve_config() to look for the "linux,pci-probe-only" property under a specified node. If it's not found there, look under "of_chosen" in addition. If the caller didn't specify a node, look under "of_chosen". With a future patch, this will support "linux,pci-probe-only" on a per host bridge basis based on the presence of the property in the respective PCI host bridge DT node. Implement of_pci_check_probe_only() using of_pci_preserve_config(). Link: https://lore.kernel.org/r/20240508174138.3630283-3-vidyas@nvidia.comSigned-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Vidya Sagar authored
Move the PRESERVE_BOOT_CONFIG _DSM evaluation from acpi_pci_root_create() to pci_register_host_bridge(). This will help unify the ACPI _DSM path and the DT-based "linux,pci-probe-only" paths. This should be safe because it happens earlier than it used to: acpi_pci_root_create pci_create_root_bus pci_register_host_bridge + bridge->preserve_config = pci_preserve_config(bridge) pci_acpi_preserve_config + acpi_evaluate_dsm_typed(DSM_PCI_PRESERVE_BOOT_CONFIG) - acpi_evaluate_dsm_typed(DSM_PCI_PRESERVE_BOOT_CONFIG) No functional change intended. Link: https://lore.kernel.org/r/20240508174138.3630283-2-vidyas@nvidia.comSigned-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 30 May, 2024 1 commit
-
-
Lukas Wunner authored
Ricky reports that replacing a device in a hotplug slot during ACPI sleep state S3 does not cause re-enumeration on resume, as one would expect. Instead, the new device is treated as if it was the old one. There is no bulletproof way to detect device replacement, but as a heuristic, check whether the device identity in config space matches cached data in struct pci_dev (Vendor ID, Device ID, Class Code, Revision ID, Subsystem Vendor ID, Subsystem ID). Additionally, cache and compare the Device Serial Number (PCIe r6.2 sec 7.9.3). If a mismatch is detected, mark the old device disconnected (to prevent its driver from accessing the new device) and synthesize a Presence Detect Changed event. The device identity in config space which is compared here is the same as the one included in the signed Subject Alternative Name per PCIe r6.1 sec 6.31.3. Thus, the present commit prevents attacks where a valid device is replaced with a malicious device during system sleep and the valid device's driver obliviously accesses the malicious device. This is about as much as can be done at the PCI layer. Drivers may have additional ways to identify devices (such as reading a WWID from some register) and may trigger re-enumeration when detecting an identity change on resume. Link: https://lore.kernel.org/r/a1afaa12f341d146ecbea27c1743661c71683833.1716992815.git.lukas@wunner.deReported-by: Ricky Wu <ricky_wu@realtek.com> Closes: https://lore.kernel.org/r/a608b5930d0a48f092f717c0e137454b@realtek.comTested-by: Ricky Wu <ricky_wu@realtek.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 28 May, 2024 5 commits
-
-
Ilpo Järvinen authored
pbus_size_mem() keeps the size of the optional resources in children_add_size. When calculating the PCI bridge window size, calculate_memsize() lower bounds size by old_size before adding children_add_size and performing the window size alignment. This results in double counting for the resources in children_add_size because old_size may be based on the previous size of the bridge window after it has already included children_add_size (that is, size1 in pbus_size_mem() from an earlier invocation of that function). As a result, on repeated remove of the bus & rescan cycles the resource size keeps increasing when children_add_size is non-zero as can be seen from this extract: iomem0: 23fffd00000-23fffdfffff : PCI Bus 0000:03 # 1MiB iomem1: 20000000000-200001fffff : PCI Bus 0000:03 # 2MiB iomem2: 20000000000-200002fffff : PCI Bus 0000:03 # 3MiB iomem3: 20000000000-200003fffff : PCI Bus 0000:03 # 4MiB iomem4: 20000000000-200004fffff : PCI Bus 0000:03 # 5MiB Solve the double counting by moving old_size check later in calculate_memsize() so that children_add_size is already accounted for. After the patch, the bridge window retains its size as expected: iomem0: 23fffd00000-23fffdfffff : PCI Bus 0000:03 # 1MiB iomem1: 20000000000-200000fffff : PCI Bus 0000:03 # 1MiB iomem2: 20000000000-200000fffff : PCI Bus 0000:03 # 1MiB Fixes: a4ac9fea ("PCI : Calculate right add_size") Link: https://lore.kernel.org/r/20240507102523.57320-2-ilpo.jarvinen@linux.intel.comTested-by: Lidong Wang <lidong.wang@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
-
Ilpo Järvinen authored
PCI bridge window logic needs to find out in advance to the actual allocation if there is an empty space big enough to fit the window. Export find_resource_space() for the purpose. Also move the struct resource_constraint into generic header to be able to use the new interface. Link: https://lore.kernel.org/r/20240507102523.57320-7-ilpo.jarvinen@linux.intel.comTested-by: Lidong Wang <lidong.wang@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
-
Ilpo Järvinen authored
allocate_resource() accepts ->alignf() callback to perform custom alignment beyond constraint->align. If alignf is NULL, simple_align_resource() is used which only returns avail->start (no change). Using avail->start directly is natural and can be done with a conditional in __find_resource_space() instead which avoids unnecessarily using callback. It makes the code inside __find_resource_space() more obvious and removes the need for the caller to provide constraint->alignf unnecessarily. This is preparation for exporting find_resource_space(). Link: https://lore.kernel.org/r/20240507102523.57320-6-ilpo.jarvinen@linux.intel.comTested-by: Lidong Wang <lidong.wang@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
-
Ilpo Järvinen authored
To make it simpler to declare resource constraint alignf callbacks, add typedef for it and document it. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240507102523.57320-5-ilpo.jarvinen@linux.intel.comTested-by: Lidong Wang <lidong.wang@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
-
Ilpo Järvinen authored
Document find_resource_space() and the struct resource_constraint as they are going to be exposed outside of resource.c. Link: https://lore.kernel.org/r/20240507102523.57320-4-ilpo.jarvinen@linux.intel.comTested-by: Lidong Wang <lidong.wang@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
-