• Jarod Wilson's avatar
    PCI: pciehp: Handle invalid data when reading from non-existent devices · 1469d17d
    Jarod Wilson authored
    It's platform-dependent, but an MMIO read to a non-existent PCI device
    generally returns data with all bits set.  This happens when the host
    bridge or Root Complex times out waiting for a response from the device and
    fabricates return data to complete the CPU's read.
    
    One example, reported in the bugzilla below, involved this hierarchy:
    
      pci 0000:00:1c.0: PCI bridge to [bus 02-3a] Root Port
      pci 0000:02:00.0: PCI bridge to [bus 03-0a] Upstream Port
      pci 0000:03:03.0: PCI bridge to [bus 05-07] Downstream Port
      pci 0000:05:00.0: PCI bridge to [bus 06-07] Thunderbolt Upstream Port
      pci 0000:06:00.0: PCI bridge to [bus 07]    Thunderbolt Downstream Port
      pci 0000:07:00.0: BCM57762 NIC
    
    Unplugging the Thunderbolt switch and the NIC below it resulted in this:
    
      pciehp 0000:03:03.0: Surprise Removal
      tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
      pciehp 0000:06:00.0: unloading service driver pciehp
      pciehp 0000:06:00.0: pcie_isr: intr_loc 11f
      pciehp 0000:06:00.0: Switch interrupt received
      pciehp 0000:06:00.0: Latch open on Slot
      pciehp 0000:06:00.0: Attention button interrupt received
      pciehp 0000:06:00.0: Button pressed on Slot
      pciehp 0000:06:00.0: Presence/Notify input change
      pciehp 0000:06:00.0: Card present on Slot
      pciehp 0000:06:00.0: Power fault interrupt received
      pciehp 0000:06:00.0: Data Link Layer State change
      pciehp 0000:06:00.0: Link Up event
    
    The pciehp driver correctly noticed that the Thunderbolt switch (05:00.0
    and 06:00.0) and NIC (07:00.0) had been removed, and it called their driver
    remove methods.
    
    Since the NIC was already gone, tg3 received 0xffffffff when it tried to
    read from the device.  The resulting timeout is a tg3 issue and not of
    interest here.
    
    Similarly, since the 06:00.0 Thunderbolt switch was already gone,
    pcie_isr() received 0xffff when it tried to read PCI_EXP_SLTSTA, and pciehp
    thought that was valid status showing that many events had happened: the
    latch had been opened, the attention button had been pressed, a card was
    now present, and the link was now up.  These are all wrong, of course, but
    pciehp went on to try to power up and enumerate devices below the
    non-existent bridge:
    
      pciehp 0000:06:00.0: PCI slot - powering on due to button press
      pciehp 0000:06:00.0: Surprise Insertion
      pci 0000:07:00.0 id reading try 50 times with interval 20 ms to get ffffffff
    
    [bhelgaas: changelog, also check in pcie_poll_cmd() & pcie_do_write_cmd()]
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=99841Suggested-by: default avatarBjorn Helgaas <bhelgaas@google.com>
    Signed-off-by: default avatarJarod Wilson <jarod@redhat.com>
    Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
    1469d17d
pciehp_hpc.c 22.6 KB