• Paul E. McKenney's avatar
    rcu: Fix grace-period hangs from mid-init task resume · ec2c2976
    Paul E. McKenney authored
    Without special fail-safe quiescent-state-propagation checks, grace-period
    hangs can result from the following scenario:
    
    1.	A task running on a given CPU is preempted in its RCU read-side
    	critical section.
    
    2.	That CPU goes offline, and there are now no online CPUs
    	corresponding to that CPU's leaf rcu_node structure.
    
    3.	The rcu_gp_init() function does the first phase of grace-period
    	initialization, and sets the aforementioned leaf rcu_node
    	structure's ->qsmaskinit field to all zeroes.  Because there
    	is a blocked task, it does not propagate the zeroing of either
    	->qsmaskinit or ->qsmaskinitnext up the rcu_node tree.
    
    4.	The task resumes on some other CPU and exits its critical section.
    	There is no grace period in progress, so the resulting quiescent
    	state is not reported up the tree.
    
    5.	The rcu_gp_init() function does the second phase of grace-period
    	initialization, which results in the leaf rcu_node structure
    	being initialized to expect no further quiescent states, but
    	with that structure's parent expecting a quiescent-state report.
    
    	The parent will never receive a quiescent state from this leaf
    	rcu_node structure, so the grace period will hang, resulting in
    	RCU CPU stall warnings.
    
    It would be good to get rid of the special fail-safe quiescent-state
    propagation checks.  This commit therefore checks the leaf rcu_node
    structure's ->wait_blkd_tasks field during grace-period initialization.
    If this flag is set, the rcu_report_qs_rnp() is invoked to immediately
    report the possible quiescent state.  While in the neighborhood, this
    commit also report quiescent states for any CPUs that went offline between
    the two phases of grace-period initialization, thus reducing grace-period
    delays and hopefully eventually allowing removal of offline-CPU checks
    from the force-quiescent-state code path.
    Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
    ec2c2976
tree.c 130 KB