    rcu: Check both root and current rcu_node when setting up future grace period · 48bd8e9b
    Pranith Kumar authored
    The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
    and ->completed twice, once without ACCESS_ONCE() and once with it,
    which is pointless because we hold that rcu_node's ->lock at that point.
    The intent was to check the current rcu_node structure and the root
    rcu_node structure, the latter locklessly with ACCESS_ONCE().  This
    commit therefore makes that change.
    
    The reason that it is safe to locklessly check the root rcu_node's
    ->gpnum and ->completed fields is that we hold the current rcu_node's
    ->lock, which constrains the root rcu_node's ability to change its
    ->gpnum and ->completed fields.  Of course, if there is a single rcu_node
    structure, then rnp_root==rnp, and holding the lock prevents all changes.
    If there is more than one rcu_node structure, then the code updates the
    fields in the following order:
    
    1.	Increment rnp_root->gpnum to start new grace period.
    2.	Increment rnp->gpnum to initialize the current rcu_node,
    	continuing initialization for the new grace period.
    3.	Increment rnp_root->completed to end the current grace period.
    4.	Increment rnp->completed to continue cleaning up after the
    	old grace period.
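
    The update order above can be modeled in a few lines of C.  This is
    only an illustrative userspace sketch, not code from tree.c: the
    gp_state structure and the state_after_steps() helper are hypothetical
    names introduced here, and the model assumes that all four counters
    start out equal to n.

```c
/* Hypothetical model of the four counters; not the kernel's struct rcu_node. */
struct gp_state {
	unsigned long root_gpnum;	/* models rnp_root->gpnum */
	unsigned long root_completed;	/* models rnp_root->completed */
	unsigned long rnp_gpnum;	/* models rnp->gpnum */
	unsigned long rnp_completed;	/* models rnp->completed */
};

/*
 * Return the state reached after applying the first "steps" updates
 * (0 through 4) from the list above, starting from an idle state in
 * which all four counters equal n.
 */
static struct gp_state state_after_steps(int steps, unsigned long n)
{
	struct gp_state s = { n, n, n, n };

	if (steps >= 1)
		s.root_gpnum++;		/* 1. Start new grace period. */
	if (steps >= 2)
		s.rnp_gpnum++;		/* 2. Initialize current rcu_node. */
	if (steps >= 3)
		s.root_completed++;	/* 3. End the grace period. */
	if (steps >= 4)
		s.rnp_completed++;	/* 4. Clean up current rcu_node. */
	return s;
}
```

    For example, state_after_steps(2, n) leaves both ->gpnum counters at
    n + 1 and both ->completed counters at n.  After all four steps, every
    counter equals n + 1, which is the idle state for the next grace period.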
    
    So there are four possible combinations of relative values of these
    four fields, listed here in the order rnp_root->gpnum,
    rnp_root->completed, rnp->gpnum, rnp->completed:
    
    N   N   N   N:  RCU idle, new grace period must be initiated.
    		Although rnp_root->gpnum might be incremented immediately
    		after we check, that will just result in unnecessary work:
    		the grace period will already have started by the time we
    		try to start it.
    
    N+1 N   N   N:  RCU grace period just started.  No further change is
    		possible because we hold rnp->lock, so the checks of
    		rnp_root->gpnum and rnp_root->completed are stable.
    		We know that our request for a future grace period will
    		be seen during grace-period cleanup.
    
    N+1 N   N+1 N:  RCU grace period is ongoing.  Because rnp->gpnum is
    		different than rnp->completed, we won't even look at
    		rnp_root->gpnum and rnp_root->completed, so the possible
    		concurrent change to rnp_root->completed does not matter.
    		We know that our request for a future grace period will
    		be seen during grace-period cleanup, which cannot pass
    		this rcu_node because we hold its ->lock.
    
    N+1 N+1 N+1 N:  RCU grace period has ended, but not yet been cleaned up.
    		Because rnp->gpnum is different than rnp->completed, we
    		won't look at rnp_root->gpnum and rnp_root->completed, so
    		the possible concurrent change to rnp_root->completed does
    		not matter.  We know that our request for a future grace
    		period will be seen during grace-period cleanup, which
    		cannot pass this rcu_node because we hold its ->lock.
    
    Therefore, despite initial appearances, the lockless check is safe.
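
    The shape of the resulting check can be sketched in userspace C as
    follows.  This is a simplified model under stated assumptions, not the
    code in tree.c: node_model, MODEL_ACCESS_ONCE() and
    gp_in_progress_model() are hypothetical names, the caller is assumed
    to hold the current node's ->lock, and the volatile cast merely stands
    in for the kernel's ACCESS_ONCE().

```c
/* Hypothetical userspace stand-in for the kernel's ACCESS_ONCE(). */
#define MODEL_ACCESS_ONCE(x) (*(const volatile unsigned long *)&(x))

/* Hypothetical model of the two rcu_node fields discussed above. */
struct node_model {
	unsigned long gpnum;
	unsigned long completed;
};

/*
 * Return nonzero if a grace period appears to be in progress.
 * The current node's fields are read plainly because the caller is
 * assumed to hold its ->lock; the root node's fields are read
 * locklessly, and only if the current node looks idle.
 */
static int gp_in_progress_model(const struct node_model *rnp,
				const struct node_model *rnp_root)
{
	return rnp->gpnum != rnp->completed ||
	       MODEL_ACCESS_ONCE(rnp_root->gpnum) !=
	       MODEL_ACCESS_ONCE(rnp_root->completed);
}
```

    In this model only the all-equal state reports no grace period in
    progress.  The just-started state is caught by the lockless root
    check, and the remaining two states short-circuit on the current
    node's own fields without reading the root at all.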
    
    Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
    [ paulmck: Update comment to say why the lockless check is safe. ]
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
tree.c 116 KB