• John Stultz's avatar
    usb: f_fs: Avoid crash due to out-of-scope stack ptr access · 54f64d5c
    John Stultz authored
    Since the 5.0 merge window opened, I've been seeing frequent
    crashes on suspend and reboot with the trace:
    
    [   36.911170] Unable to handle kernel paging request at virtual address ffffff801153d660
    [   36.912769] Unable to handle kernel paging request at virtual address ffffff800004b564
    ...
    [   36.950666] Call trace:
    [   36.950670]  queued_spin_lock_slowpath+0x1cc/0x2c8
    [   36.950681]  _raw_spin_lock_irqsave+0x64/0x78
    [   36.950692]  complete+0x28/0x70
    [   36.950703]  ffs_epfile_io_complete+0x3c/0x50
    [   36.950713]  usb_gadget_giveback_request+0x34/0x108
    [   36.950721]  dwc3_gadget_giveback+0x50/0x68
    [   36.950723]  dwc3_thread_interrupt+0x358/0x1488
    [   36.950731]  irq_thread_fn+0x30/0x88
    [   36.950734]  irq_thread+0x114/0x1b0
    [   36.950739]  kthread+0x104/0x130
    [   36.950747]  ret_from_fork+0x10/0x1c
    
    I isolated this down to in ffs_epfile_io():
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
    
    Where the completion done is setup on the stack:
      DECLARE_COMPLETION_ONSTACK(done);
    
    Then later we setup a request and queue it, and wait for it:
      if (unlikely(wait_for_completion_interruptible(&done))) {
        /*
        * To avoid race condition with ffs_epfile_io_complete,
        * dequeue the request first then check
        * status. usb_ep_dequeue API should guarantee no race
        * condition with req->complete callback.
        */
        usb_ep_dequeue(ep->ep, req);
        interrupted = ep->status < 0;
      }
    
    The problem is, that we end up being interrupted, dequeue the
    request, and exit.
    
    But then the irq triggers and we try calling complete() on the
    context pointer which points to now random stack space, which
    results in the panic.
    
    Alan Stern pointed out there is a bug here, in that the snippet
    above "assumes that usb_ep_dequeue() waits until the request has
    been completed." And that:
    
        wait_for_completion(&done);
    
    Is needed right after the usb_ep_dequeue().
    
    Thus this patch implements that change. With it I no longer see
    the crashes on suspend or reboot.
    
    This issue seems to have been uncovered by behavioral changes in
    the dwc3 driver in commit fec9095b ("usb: dwc3: gadget:
    remove wait_end_transfer").
    
    Cc: Alan Stern <stern@rowland.harvard.edu>
    Cc: Felipe Balbi <balbi@kernel.org>
    Cc: Zeng Tao <prime.zeng@hisilicon.com>
    Cc: Jack Pham <jackp@codeaurora.org>
    Cc: Thinh Nguyen <thinh.nguyen@synopsys.com>
    Cc: Chen Yu <chenyu56@huawei.com>
    Cc: Jerry Zhang <zhangjerry@google.com>
    Cc: Lars-Peter Clausen <lars@metafoo.de>
    Cc: Vincent Pelletier <plr.vincent@gmail.com>
    Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Linux USB List <linux-usb@vger.kernel.org>
    Suggested-by: default avatarAlan Stern <stern@rowland.harvard.edu>
    Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
    Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
    54f64d5c
f_fs.c 89.9 KB