    PM: hibernate: defer device probing when resuming from hibernation · 8386c414
    Tetsuo Handa authored
    syzbot is reporting a hung task at misc_open() [1], caused by a race
    window for an AB-BA deadlock involving the probe_count variable. Currently
    wait_for_device_probe(), called from snapshot_open() via misc_open(), can
    sleep forever with misc_mtx held if probe_count cannot become 0.
    
    When a device is probed by the hub_event() work function, probe_count is
    incremented before the probe function starts, and probe_count is
    decremented after the probe function completes.
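
    The counting described above can be modelled in user space. This is a
    minimal sketch and not the kernel code: it uses a pthread mutex and
    condition variable as stand-ins, and model_probe_device(),
    model_wait_for_device_probe(), model_record_probe() and the probe_fn
    type are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* User-space model only; these primitives are stand-ins for the kernel's. */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t probe_done = PTHREAD_COND_INITIALIZER;
static int probe_count;

/* Hypothetical probe callback type; returns 0 on success. */
typedef int (*probe_fn)(void *dev);

/* Count the probe as in flight for the whole duration of the callback. */
static int model_probe_device(probe_fn fn, void *dev)
{
	int ret;

	pthread_mutex_lock(&probe_lock);
	probe_count++;
	pthread_mutex_unlock(&probe_lock);

	ret = fn(dev);	/* if this never returns, probe_count never drops */

	pthread_mutex_lock(&probe_lock);
	if (--probe_count == 0)
		pthread_cond_broadcast(&probe_done);
	pthread_mutex_unlock(&probe_lock);
	return ret;
}

/* Sleep until no probe is in flight; there is deliberately no timeout. */
static void model_wait_for_device_probe(void)
{
	pthread_mutex_lock(&probe_lock);
	while (probe_count != 0)
		pthread_cond_wait(&probe_done, &probe_lock);
	pthread_mutex_unlock(&probe_lock);
}

/* Example callback that records the count it observed while running. */
static int model_record_probe(void *dev)
{
	pthread_mutex_lock(&probe_lock);
	*(int *)dev = probe_count;
	pthread_mutex_unlock(&probe_lock);
	return 0;
}
```

    In this model, a callback that never returns leaves probe_count above 0,
    so model_wait_for_device_probe() would sleep forever, together with any
    lock its caller holds.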
    
    There are three cases that can prevent probe_count from dropping to 0.
    
      (a) A device being probed stopped responding (i.e. broken/malicious
          hardware).
    
      (b) A process emulating a USB device using /dev/raw-gadget interface
          stopped responding for some reason.
    
      (c) New device probe requests keep coming in before existing device
          probe requests complete.
    
    The phenomenon syzbot is reporting is (b). A process which is holding
    system_transition_mutex and misc_mtx is waiting for probe_count to become
    0 inside wait_for_device_probe(), but the probe function which is called
    from the hub_event() work function is waiting for the processes which are
    blocked at mutex_lock(&misc_mtx) to respond via the /dev/raw-gadget
    interface.
    
    This patch mitigates (b) by deferring wait_for_device_probe() from
    snapshot_open() to snapshot_write() and snapshot_ioctl(). Please note that
    the possibility of (b) remains as long as any thread which is emulating a
    USB device via /dev/raw-gadget interface can be blocked by uninterruptible
    blocking operations (e.g. mutex_lock()).
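
    The deferral can be illustrated with a small user-space model. This is
    only a sketch under stated assumptions, not the patch itself: the
    need_wait flag, the model_* functions and the "resuming" parameter are
    hypothetical names, and the wait is replaced by a counter so that the
    ordering can be observed.

```c
#include <assert.h>
#include <stdbool.h>

static int waits_done;		/* how many times the wait was performed */
static bool need_wait;		/* set on open, consumed on first write/ioctl */

/* Stand-in for wait_for_device_probe(); the real one may sleep here. */
static void model_wait_for_probe(void)
{
	waits_done++;
}

/* open(): models the path that runs with misc_mtx held, so it only
 * records that a wait is owed instead of sleeping. */
static int model_snapshot_open(bool resuming)
{
	need_wait = resuming;
	return 0;
}

/* Shared by the write and ioctl paths: perform the deferred wait once. */
static void model_wait_if_needed(void)
{
	if (need_wait) {
		model_wait_for_probe();
		need_wait = false;
	}
}

static long model_snapshot_write(long count)
{
	model_wait_if_needed();
	return count;
}

static int model_snapshot_ioctl(unsigned int cmd)
{
	(void)cmd;	/* the command is irrelevant to this model */
	model_wait_if_needed();
	return 0;
}
```

    In this model the open path returns without waiting, and the wait happens
    exactly once, on whichever of write or ioctl is called first.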
    
    Please also note that (a) and (c) are not addressed. Regarding (c), we
    should change the code to wait for only one device which contains the
    image for resuming from hibernation. I don't know how to address (a), for
    use of timeout for wait_for_device_probe() might result in loss of user
    data in the image. Maybe we should require the userland to wait for the
    image device before opening /dev/snapshot interface.
    
    Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
    Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Tested-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
user.c 9.52 KB