    workqueue: fix possible stall on try_to_grab_pending() of a delayed work item · 3aa62497
    Lai Jiangshan authored
    Currently, when try_to_grab_pending() grabs a delayed work item, it
    leaves its linked work items behind on cwq->delayed_works.  The linked
    work items are always NO_COLOR and will cause a future
    cwq_activate_first_delayed() to increase cwq->nr_active incorrectly,
    which may stall the whole cwq.  For example,
    
    state: cwq->max_active = 1, cwq->nr_active = 1
           one work item in cwq->pool, many in cwq->delayed_works.
    
    step1: try_to_grab_pending() removes a work item from delayed_works
           but leaves its NO_COLOR linked work items on it.
    
    step2: Later on, cwq_activate_first_delayed() activates the linked
           work item, increasing cwq->nr_active.
    
    step3: cwq->nr_active = 1, but all activated work items of the cwq are
           NO_COLOR.  When they finish, cwq->nr_active will not be
           decreased due to NO_COLOR, and no further work items will be
           activated from cwq->delayed_works.  The cwq stalls.
    
    Fix it by ensuring the target work item is activated before stealing
    PENDING in try_to_grab_pending().  This ensures that all the linked
    work items are activated without incorrectly bumping cwq->nr_active.
    
    tj: Updated comment and description.
    Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: stable@kernel.org