    PCI: pciehp: Become resilient to missed events · d331710e
    Lukas Wunner authored
    A hotplug port's Slot Status register does not count how often each type
    of event occurred; it only records the fact *that* an event has occurred.
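    
    To illustrate why counting is impossible, here is a minimal sketch of a
    sticky status bit.  It is a standalone model, not pciehp code: the struct,
    the function names and the MODEL_STA_PDC value are assumptions made for
    illustration, and the real register is cleared by writing 1 to a bit,
    which the model simplifies to a read-and-clear helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit standing in for "Presence Detect Changed". */
#define MODEL_STA_PDC 0x0008u

/* Model of a status register whose event bits are sticky flags. */
struct slot_status_model {
	uint16_t bits;
};

/* Hardware side: latching an event that is already pending changes nothing. */
static void hw_latch_event(struct slot_status_model *reg, uint16_t bit)
{
	assert(reg != NULL);
	reg->bits = (uint16_t)(reg->bits | bit);
}

/* Driver side: fetch the pending events and acknowledge them. */
static uint16_t read_and_clear(struct slot_status_model *reg)
{
	uint16_t pending;

	assert(reg != NULL);
	pending = reg->bits;
	reg->bits = 0;
	return pending;
}
```

    Latching the same bit three times (insertion, removal, insertion) before
    the driver reads the model leaves a single pending bit, so the reader
    cannot tell one event from three.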
    
    Previously pciehp queued a work item for each event.  But if it missed
    an event, e.g. removal of a card in-between two back-to-back insertions,
    it queued up the wrong work item or no work item at all.  Commit
    fad214b0 ("PCI: pciehp: Process all hotplug events before looking
    for new ones") sought to improve the situation by shrinking the window
    during which events may be missed.
    
    But Stefan Roese reports unbalanced Card present and Link Up events,
    suggesting that we're still missing events if they occur very rapidly.
    Bjorn Helgaas responds that he considers pciehp's event handling
    "baroque" and calls for its simplification and rationalization:
    https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com
    
    It gets worse once a hotplug port is runtime suspended:  The port can
    signal an interrupt while it and its parents are in D3hot, i.e. while
    it is inaccessible.  By the time we've runtime resumed all parents to D0
    and read the port's Slot Status register, we may have missed an arbitrary
    number of events.  Event handling therefore needs to be reworked to
    become resilient to missed events.
    
    Assume that a Presence Detect Changed event has occurred.
    Consider the following truth table:
    - Slot is in OFF_STATE and is currently empty.    => Do nothing.
      (The event is trailing a Link Down or we've
      missed an insertion and subsequent removal.)
    - Slot is in OFF_STATE and is currently occupied. => Turn the slot on.
    - Slot is in ON_STATE  and is currently empty.    => Turn the slot off.
    - Slot is in ON_STATE  and is currently occupied. => Turn the slot off,
                                                         then back on.
      (Be cautious and assume the card in the slot isn't the same as before.)
    
    This leads to the following simple algorithm:
    1. If the slot is in ON_STATE, turn it off unconditionally.
    2. If the slot is currently occupied, turn it on.
    
    Because those actions are now carried out synchronously, rather than by
    scheduled work items, pciehp reacts to the *current* situation and
    missed events no longer matter.
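    
    The two-step algorithm can be sketched as a standalone model.  This is
    not the driver's implementation: struct slot_model, its counters and
    handle_slot_change() are hypothetical names, and powering the slot on or
    off is reduced to a state assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum slot_state { OFF_STATE, ON_STATE };

/* Hypothetical model of a hotplug slot; not the driver's data structure. */
struct slot_model {
	enum slot_state state;   /* what the driver last did with the slot */
	bool occupied;           /* what the hardware reports right now */
	unsigned int power_ons;  /* counters, only there to observe the model */
	unsigned int power_offs;
};

/*
 * React to a presence or link change by looking only at the current
 * situation.  How many events were signalled or missed is irrelevant.
 */
static void handle_slot_change(struct slot_model *slot)
{
	assert(slot != NULL);

	/* Step 1: if the slot is on, turn it off unconditionally. */
	if (slot->state == ON_STATE) {
		slot->state = OFF_STATE;
		slot->power_offs++;
	}

	/* Step 2: if the slot is currently occupied, turn it on. */
	if (slot->occupied) {
		slot->state = ON_STATE;
		slot->power_ons++;
	}
}
```

    Calling handle_slot_change() for each of the four combinations of state
    and occupancy reproduces the four rows of the truth table, including the
    off-then-on cycle for a slot that is on and still occupied.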
    
    Data Link Layer State Changed events can be handled identically to
    Presence Detect Changed events.  Note that in the above truth table,
    a Link Up trailing a Card present event didn't have to be accounted for:
    It is filtered out by pciehp_check_link_status().
    
    As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says:
    "Once the Power Indicator begins blinking, a 5-second abort interval
    exists during which a second depression of the Attention Button cancels
    the operation."  In other words, the user can only expect the system to
    react to a button press after the Power Indicator starts blinking.
    Missed button presses that occur in-between are irrelevant.
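    
    The abort interval can be modelled as a small state machine.  The sketch
    below is an assumption-laden illustration rather than pciehp code: the
    type and function names are hypothetical, time is a plain count of
    seconds supplied by the caller, and "performing the operation" is reduced
    to incrementing a counter.

```c
#include <assert.h>
#include <stddef.h>

#define ABORT_INTERVAL_SECS 5

enum button_state { IDLE, BLINKING };

/* Hypothetical model of the Attention Button handling. */
struct button_model {
	enum button_state state;
	long blink_start;                  /* when blinking began, in seconds */
	unsigned int operations_started;   /* interval elapsed without abort */
	unsigned int operations_cancelled; /* second press inside the interval */
};

/* A press either starts the blinking or, within the interval, cancels. */
static void button_pressed(struct button_model *b, long now)
{
	assert(b != NULL);

	if (b->state == IDLE) {
		b->state = BLINKING;
		b->blink_start = now;
	} else if (now - b->blink_start < ABORT_INTERVAL_SECS) {
		b->state = IDLE;
		b->operations_cancelled++;
	}
}

/* Once the interval has elapsed, the pending operation goes ahead. */
static void timer_tick(struct button_model *b, long now)
{
	assert(b != NULL);

	if (b->state == BLINKING &&
	    now - b->blink_start >= ABORT_INTERVAL_SECS) {
		b->state = IDLE;
		b->operations_started++;
	}
}
```

    In this model only two things matter: whether the indicator is blinking
    and whether a press arrives inside the interval.  The number of presses
    that happened before the blinking started plays no role.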
    
    Signed-off-by: Lukas Wunner <lukas@wunner.de>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Stefan Roese <sr@denx.de>
    Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
    Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
    Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>