• Stuart Hayes's avatar
    usb: Fix deadlock in hid_reset when Dell iDRAC is reset · dad11ff1
    Stuart Hayes authored
    This was fixed upstream by commit e22bee78
    ('workqueue: implement concurrency managed dynamic worker pool'), but
    that is far too large a change for stable.
    
    When Dell iDRAC is reset, the iDRAC's USB keyboard/mouse device stops
    responding but is not actually disconnected.  This causes usbhid to
    hid hid_io_error(), and you get a chain of calls like...
    
    hid_reset()
     usb_reset_device()
      usb_reset_and_verify_device()
       usb_ep0_reinit()
        usb_disble_endpoint()
         usb_hcd_disable_endpoint()
          ehci_endpoint_disable()
    
    Along the way, as a result of an error/timeout with a USB transaction,
    ehci_clear_tt_buffer() calls usb_hub_clear_tt_buffer() (to clear a failed
    transaction out of the transaction translator in the hub), which schedules
    hub_tt_work() to be run (using keventd), and then sets qh->clearing_tt=1 so
    that nobody will mess with that qh until the TT is cleared.
    
    But run_workqueue() never happens for the keventd workqueue on that CPU, so
    hub_tt_work() never gets run.  And qh->clearing_tt never gets changed back to
    0.
    
    This causes ehci_endpoint_disable() to get stuck in a loop waiting for
    qh->clearing_tt to go to 0.
    
    Part of the problem is hid_reset() is itself running on keventd.  So
    when that thread gets a timeout trying to talk to the HID device, it
    schedules clear_work (to run hub_tt_work) to run, and then gets stuck
    in ehci_endpoint_disable waiting for it to run.
    
    However, clear_work never gets run because the workqueue for that CPU
    is still waiting for hid_reset to finish.
    
    A much less invasive patch for earlier kernels is to just schedule
    clear_work on khubd if the usb code needs to clear the TT and it sees
    that it is already running on keventd.  Khubd isn't used by default
    because it can get blocked by device enumeration sometimes, but I
    think it should be ok for a backup for unusual cases like this just to
    prevent deadlock.
    Signed-off-by: default avatarStuart Hayes <stuart_hayes@dell.com>
    Signed-off-by: default avatarShyam Iyer <shyam_iyer@dell.com>
    [bwh: Use current_is_keventd() rather than checking current->{flags,comm}]
    Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
    Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
    dad11ff1
hub.c 108 KB