    workqueue: fix possible deadlock in idle worker rebinding · ec58815a
    Tejun Heo authored
    Currently, rebind_workers() and idle_worker_rebind() are two-way
    interlocked.  rebind_workers() waits for idle workers to finish
    rebinding and rebound idle workers wait for rebind_workers() to finish
    rebinding busy workers before proceeding.
    
    Unfortunately, this isn't enough.  The second wait from idle workers
    is implemented as follows.
    
    	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
    
    rebind_workers() clears WORKER_REBIND, wakes up the idle workers and
    then returns.  If another CPU hotplug cycle happens before one of the
    idle workers finishes the above wait_event(), rebind_workers() will
    repeat the first part of the handshake - set WORKER_REBIND again and
    wait for the idle worker to finish rebinding - and this leads to
    deadlock because the idle worker would be waiting for WORKER_REBIND to
    clear.
    
    This is fixed by adding another interlocking step at the end -
    rebind_workers() now waits for all the idle workers to finish the
    above WORKER_REBIND wait before returning.  This ensures that all
    rebinding steps are complete on all idle workers before the next
    hotplug cycle can happen.
    
    This problem was diagnosed by Lai Jiangshan who also posted a patch to
    fix the issue, upon which this patch is based.
    
    This is the minimal fix and further patches are scheduled for the next
    merge window to simplify the CPU hotplug path.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Original-patch-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    LKML-Reference: <1346516916-1991-3-git-send-email-laijs@cn.fujitsu.com>