• Tetsuo Handa's avatar
    workqueue: don't skip lockdep work dependency in cancel_work_sync() · c0feea59
    Tetsuo Handa authored
    Like Hillf Danton mentioned
    
      syzbot should have been able to catch cancel_work_sync() in work context
      by checking lockdep_map in __flush_work() for both flush and cancel.
    
    in [1], being unable to report an obvious deadlock scenario shown below is
    broken. From locking dependency perspective, sync version of cancel request
    should behave as if flush request, for it waits for completion of work if
    that work has already started execution.
    
      ----------
      #include <linux/module.h>
      #include <linux/sched.h>
      static DEFINE_MUTEX(mutex);
      static void work_fn(struct work_struct *work)
      {
        schedule_timeout_uninterruptible(HZ / 5);
        mutex_lock(&mutex);
        mutex_unlock(&mutex);
      }
      static DECLARE_WORK(work, work_fn);
      static int __init test_init(void)
      {
        schedule_work(&work);
        schedule_timeout_uninterruptible(HZ / 10);
        mutex_lock(&mutex);
        cancel_work_sync(&work);
        mutex_unlock(&mutex);
        return -EINVAL;
      }
      module_init(test_init);
      MODULE_LICENSE("GPL");
      ----------
    
    The check this patch restores was added by commit 0976dfc1
    ("workqueue: Catch more locking problems with flush_work()").
    
    Then, lockdep's crossrelease feature was added by commit b09be676
    ("locking/lockdep: Implement the 'crossrelease' feature"). As a result,
    this check was once removed by commit fd1a5b04 ("workqueue: Remove
    now redundant lock acquisitions wrt. workqueue flushes").
    
    But lockdep's crossrelease feature was removed by commit e966eaee
    ("locking/lockdep: Remove the cross-release locking checks"). At this
    point, this check should have been restored.
    
    Then, commit d6e89786 ("workqueue: skip lockdep wq dependency in
    cancel_work_sync()") introduced a boolean flag in order to distinguish
    flush_work() and cancel_work_sync(), for checking "struct workqueue_struct"
    dependency when called from cancel_work_sync() was causing false positives.
    
    Then, commit 87915adc ("workqueue: re-add lockdep dependencies for
    flushing") tried to restore "struct work_struct" dependency check, but by
    error checked this boolean flag. Like an example shown above indicates,
    "struct work_struct" dependency needs to be checked for both flush_work()
    and cancel_work_sync().
    
    Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1]
    Reported-by: default avatarHillf Danton <hdanton@sina.com>
    Suggested-by: default avatarLai Jiangshan <jiangshanlai@gmail.com>
    Fixes: 87915adc ("workqueue: re-add lockdep dependencies for flushing")
    Cc: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Signed-off-by: default avatarTejun Heo <tj@kernel.org>
    c0feea59
workqueue.c 168 KB