    rcu: Allow only one expedited GP to run concurrently with wakeups (commit 4bc6b745)
    Author: Neeraj Upadhyay
    The current expedited RCU grace-period code expects that a task
    requesting an expedited grace period cannot awaken until that grace
    period has reached the wakeup phase.  However, it is possible for a long
    preemption to result in the waiting task never sleeping.  For example,
    consider the following sequence of events:
    
    1.	Task A starts an expedited grace period by invoking
    	synchronize_rcu_expedited().  It proceeds normally up to the
    	wait_event() near the end of that function, and is then preempted
    	(or interrupted or whatever).
    
    2.	The expedited grace period completes, and a kworker task starts
    	the wakeup phase, having incremented the rcu_state structure's
    	->expedited_sequence counter and acquired its ->exp_wake_mutex.
    	This kworker task is then preempted or interrupted or whatever.
    
    3.	Task A resumes and enters wait_event(), which notes that the
    	expedited grace period has completed, and thus doesn't sleep.
    
    4.	Task B starts an expedited grace period exactly as did Task A,
    	complete with the preemption (or whatever delay) just before
    	the call to wait_event().
    
    5.	The expedited grace period completes, and another kworker
    	task starts the wakeup phase, having incremented the counter.
    	However, it blocks when attempting to acquire the rcu_state
    	structure's ->exp_wake_mutex because step 2's kworker task has
    	not yet released it.
    
    6.	Steps 4 and 5 repeat, resulting in overflow of the rcu_node
    	structure's ->exp_wq[] array.
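
    The arithmetic behind this overflow can be sketched in userspace C.
    The sketch assumes, following the kernel's rcu_seq_*() helpers, that
    the low two bits of ->expedited_sequence hold the grace-period state,
    that the remaining bits count grace periods, and that a waiter sleeps
    on the ->exp_wq[] element selected by its snapshot's counter modulo
    four.  The names seq_ctr(), seq_start(), seq_end(), seq_snap(),
    exp_wq_slot(), and slots_alias() are hypothetical stand-ins, not kernel
    functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: two low-order state bits, as in the kernel's rcu_seq_*(). */
#define SEQ_CTR_SHIFT	2
#define SEQ_STATE_MASK	((1UL << SEQ_CTR_SHIFT) - 1)
#define N_EXP_WQ	4	/* assumed size of the ->exp_wq[] array */

/* Counter part of a sequence value, with the state bits shifted out. */
static unsigned long seq_ctr(unsigned long s)
{
	return s >> SEQ_CTR_SHIFT;
}

/* Begin a grace period: the state bits go from idle to in-progress. */
static void seq_start(unsigned long *sp)
{
	assert((*sp & SEQ_STATE_MASK) == 0);
	(*sp)++;
}

/* End a grace period: advance to the next counter value, state idle. */
static void seq_end(unsigned long *sp)
{
	assert(*sp & SEQ_STATE_MASK);
	*sp = (*sp | SEQ_STATE_MASK) + 1;
}

/* Value the sequence must reach before a full grace period is
 * guaranteed to have elapsed since this snapshot was taken. */
static unsigned long seq_snap(unsigned long s)
{
	return (s + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* Hypothetical: ->exp_wq[] element that waiters for snapshot s use. */
static unsigned int exp_wq_slot(unsigned long s)
{
	return (unsigned int)(seq_ctr(s) % N_EXP_WQ);
}

/* True if the grace period completing `later` GPs after snapshot s
 * shares a wait-queue element with the waiters for s. */
static bool slots_alias(unsigned long s, unsigned int later)
{
	unsigned long t = s + ((unsigned long)later << SEQ_CTR_SHIFT);

	return exp_wq_slot(t) == exp_wq_slot(s);
}
```

    Starting from an idle sequence of 0, seq_snap() returns 4, and one
    seq_start()/seq_end() pair advances the sequence to exactly 4, whose
    waiters use element 1.  slots_alias(4, 1) is false, but
    slots_alias(4, 4) is true: once four further grace periods complete
    while one wakeup phase is stalled, their waiters share an element with
    the stalled wakeup's waiters.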
    
    In theory, this is harmless.  Tasks waiting on the various ->exp_wq[]
    array elements will merely be spuriously awakened, and will go back to
    sleep on noting that the rcu_state structure's ->expedited_sequence
    value has not advanced far enough.
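
    The recheck that makes these spurious wakeups harmless can be modeled
    with a POSIX condition variable standing in for one ->exp_wq[] element.
    This is a sketch under stated assumptions: struct exp_wq_slot,
    exp_wait(), exp_wake(), and waiter_thread() are hypothetical; the
    global expedited_sequence only stands in for the rcu_state field and,
    unlike the kernel's, is protected by the slot's mutex; ulong_cmp_ge()
    uses the same wraparound-safe comparison as the kernel's ULONG_CMP_GE()
    macro.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" for free-running counters (ULONG_CMP_GE style). */
static bool ulong_cmp_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

/* Hypothetical stand-in for one ->exp_wq[] element. */
struct exp_wq_slot {
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

static struct exp_wq_slot demo_slot = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER
};

/* Stand-in for ->expedited_sequence; protected by the slot lock here. */
static unsigned long expedited_sequence;

/* Sleep until the sequence reaches snapshot s.  A broadcast issued for an
 * earlier grace period sharing this slot only causes a recheck, after
 * which the waiter sleeps again.  Returns the number of wakeups seen. */
static unsigned int exp_wait(struct exp_wq_slot *slot, unsigned long s)
{
	unsigned int wakeups = 0;

	pthread_mutex_lock(&slot->lock);
	while (!ulong_cmp_ge(expedited_sequence, s)) {
		pthread_cond_wait(&slot->cond, &slot->lock);
		wakeups++;
	}
	pthread_mutex_unlock(&slot->lock);
	return wakeups;
}

/* Publish a new sequence value and wake every waiter on the slot. */
static void exp_wake(struct exp_wq_slot *slot, unsigned long newseq)
{
	pthread_mutex_lock(&slot->lock);
	assert(ulong_cmp_ge(newseq, expedited_sequence)); /* never moves back */
	expedited_sequence = newseq;
	pthread_cond_broadcast(&slot->cond);
	pthread_mutex_unlock(&slot->lock);
}

struct waiter_arg {
	struct exp_wq_slot *slot;
	unsigned long s;
	unsigned int wakeups;
};

/* Thread body: wait for the snapshot and record how often it woke. */
static void *waiter_thread(void *p)
{
	struct waiter_arg *a = p;

	a->wakeups = exp_wait(a->slot, a->s);
	return NULL;
}
```

    A waiter targeting 20 that is woken by exp_wake(&demo_slot, 4)
    rechecks, finds the sequence short of 20, and sleeps again; only
    exp_wake(&demo_slot, 20) lets it return.  Each extra pass through the
    loop costs CPU time but never ends the wait early.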
    
    In practice, this wastes CPU time and is an accident waiting to happen.
    This commit therefore moves the rcu_exp_gp_seq_end() call that officially
    ends the expedited grace period (along with associated tracing) until
    after the ->exp_wake_mutex has been acquired.  This prevents Task A from
    awakening prematurely, thus preventing more than one expedited grace
    period from being in flight during a previous expedited grace period's
    wakeup phase.
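
    The effect of the reordering can be illustrated with a userspace model
    in which a pthread mutex plays the role of ->exp_wake_mutex.  In this
    sketch, struct exp_model, exp_model_init(), exp_seq_end(),
    exp_wait_wake_old(), exp_wait_wake_fixed(), run_old(), and run_fixed()
    are hypothetical stand-ins for the rcu_state structure,
    rcu_exp_gp_seq_end(), and the kworker's wakeup phase; tracing and the
    actual wakeups are omitted, and the two-state-bit sequence layout of the
    rcu_seq_*() helpers is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define SEQ_STATE_MASK	3UL	/* assumed: two low-order state bits */

/* Hypothetical model of the rcu_state fields involved. */
struct exp_model {
	pthread_mutex_t exp_wake_mutex;
	atomic_ulong expedited_sequence;
};

static void exp_model_init(struct exp_model *m, unsigned long seq)
{
	pthread_mutex_init(&m->exp_wake_mutex, NULL);
	atomic_init(&m->expedited_sequence, seq);
}

/* Officially end the grace period, in the manner of rcu_exp_gp_seq_end(). */
static void exp_seq_end(struct exp_model *m)
{
	unsigned long s = atomic_load(&m->expedited_sequence);

	assert(s & SEQ_STATE_MASK);	/* a grace period must be in progress */
	atomic_store(&m->expedited_sequence, (s | SEQ_STATE_MASK) + 1);
}

/* Old order: the GP looks complete to wait_event() before this worker
 * holds ->exp_wake_mutex, so a preempted requester can return early. */
static void exp_wait_wake_old(struct exp_model *m)
{
	exp_seq_end(m);
	pthread_mutex_lock(&m->exp_wake_mutex);
	/* ... wake up waiters on ->exp_wq[] (omitted) ... */
	pthread_mutex_unlock(&m->exp_wake_mutex);
}

/* New order: the GP does not end until ->exp_wake_mutex is held, so a
 * requester still in wait_event() keeps sleeping behind any earlier,
 * stalled wakeup phase. */
static void exp_wait_wake_fixed(struct exp_model *m)
{
	pthread_mutex_lock(&m->exp_wake_mutex);
	exp_seq_end(m);
	/* ... wake up waiters on ->exp_wq[] (omitted) ... */
	pthread_mutex_unlock(&m->exp_wake_mutex);
}

static void *run_old(void *p)
{
	exp_wait_wake_old(p);
	return NULL;
}

static void *run_fixed(void *p)
{
	exp_wait_wake_fixed(p);
	return NULL;
}
```

    If the caller holds exp_wake_mutex to stand in for step 2's stalled
    kworker, a run_old() thread still advances a sequence of 1 to 4, so the
    corresponding requester's wait_event() would return and another
    expedited grace period could start.  A run_fixed() thread leaves a
    sequence of 5 untouched until the mutex is released, and only then
    advances it to 8.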
    
    Fixes: 3b5f668e ("rcu: Overlap wakeups with next expedited grace period")
    Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    [ paulmck: Added updated comment. ]
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>