Commit 792bf4d8 authored by Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The biggest RCU changes in this cycle were:

   - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

   - Replace calls of RCU-bh and RCU-sched update-side functions to
     their vanilla RCU counterparts. This series is a step towards
     complete removal of the RCU-bh and RCU-sched update-side functions.

     ( Note that some of these conversions are going upstream via their
       respective maintainers. )

   - Documentation updates, including a number of flavor-consolidation
     updates from Joel Fernandes.

   - Miscellaneous fixes.

   - Automate generation of the initrd filesystem used for rcutorture
     testing.

   - Convert spin_is_locked() assertions to instead use lockdep.

     ( Note that some of these conversions are going upstream via their
       respective maintainers. )

   - SRCU updates, especially including a fix from Dennis Krein for a
     bag-on-head-class bug.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
  rcutorture: Don't do busted forward-progress testing
  rcutorture: Use 100ms buckets for forward-progress callback histograms
  rcutorture: Recover from OOM during forward-progress tests
  rcutorture: Print forward-progress test age upon failure
  rcutorture: Print time since GP end upon forward-progress failure
  rcutorture: Print histogram of CB invocation at OOM time
  rcutorture: Print GP age upon forward-progress failure
  rcu: Print per-CPU callback counts for forward-progress failures
  rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
  rcutorture: Dump grace-period diagnostics upon forward-progress OOM
  rcutorture: Prepare for asynchronous access to rcu_fwd_startat
  torture: Remove unnecessary "ret" variables
  rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
  rcutorture: Break up too-long rcu_torture_fwd_prog() function
  rcutorture: Remove cbflood facility
  torture: Bring any extra CPUs online during kernel startup
  rcutorture: Add call_rcu() flooding forward-progress tests
  rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
  tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
  net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
  ...
parents eed9688f 4bbfd746
@@ -160,9 +160,9 @@ was in flight.
 If the CPU is idle, then <tt>sync_sched_exp_handler()</tt> reports
 the quiescent state.
-<p>
-Otherwise, the handler invokes <tt>resched_cpu()</tt>, which forces
-a future context switch.
+<p> Otherwise, the handler forces a future context switch by setting the
+NEED_RESCHED flag of the current task's thread flag and the CPU preempt
+counter.
 At the time of the context switch, the CPU reports the quiescent state.
 Should the CPU go offline first, it will report the quiescent state
 at that time.
...
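In code, the behavior described by the updated passage corresponds to roughly the following minimal sketch (set_tsk_need_resched() and set_preempt_need_resched() are the real kernel helpers, but the handler shown here is illustrative rather than the exact upstream implementation):

	static void sync_exp_handler_sketch(void *unused)
	{
		/* Mark the current task as needing to reschedule... */
		set_tsk_need_resched(current);
		/* ...and set the per-CPU preempt-counter flag so the
		 * request is noticed at the next preemption point. */
		set_preempt_need_resched();
		/* The quiescent state is then reported at the resulting
		 * context switch (or when the CPU goes offline). */
	}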
@@ -77,7 +77,7 @@ The key point is that the lock-acquisition functions, including
 <tt>smp_mb__after_unlock_lock()</tt> immediately after successful
 acquisition of the lock.
-<p>Therefore, for any given <tt>rcu_node</tt> struction, any access
+<p>Therefore, for any given <tt>rcu_node</tt> structure, any access
 happening before one of the above lock-release functions will be seen
 by all CPUs as happening before any access happening after a later
 one of the above lock-acquisition functions.
...
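The pairing the passage describes looks roughly like this (a sketch; in the kernel this pairing is wrapped in helpers such as raw_spin_lock_rcu_node(), and the rcu_node details are simplified here):

	raw_spin_lock_irqsave(&rnp->lock, flags);
	smp_mb__after_unlock_lock();	/* Full barrier after acquisition. */
	/* All CPUs now see accesses made before any earlier release of
	 * rnp->lock as preceding accesses made here. */
	raw_spin_unlock_irqrestore(&rnp->lock, flags);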
@@ -63,7 +63,7 @@ over a rather long period of time, but improvements are always welcome!
	pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
	rcu_read_lock_sched(), or by the appropriate update-side lock.
	Disabling of preemption can serve as rcu_read_lock_sched(), but
-	is less readable.
+	is less readable and prevents lockdep from detecting locking issues.

	Letting RCU-protected pointers "leak" out of an RCU read-side
	critical section is every bit as bad as letting them leak out
@@ -285,11 +285,7 @@ over a rather long period of time, but improvements are always welcome!
		here is that superuser already has lots of ways to crash
		the machine.

-	d.	Use call_rcu_bh() rather than call_rcu(), in order to take
-		advantage of call_rcu_bh()'s faster grace periods.  (This
-		is only a partial solution, though.)
-
-	e.	Periodically invoke synchronize_rcu(), permitting a limited
+	d.	Periodically invoke synchronize_rcu(), permitting a limited
		number of updates per grace period.

	The same cautions apply to call_rcu_bh(), call_rcu_sched(),
@@ -324,37 +320,14 @@ over a rather long period of time, but improvements are always welcome!
	will break Alpha, cause aggressive compilers to generate bad code,
	and confuse people trying to read your code.

-11.	Note that synchronize_rcu() -only- guarantees to wait until
-	all currently executing rcu_read_lock()-protected RCU read-side
-	critical sections complete.  It does -not- necessarily guarantee
-	that all currently running interrupts, NMIs, preempt_disable()
-	code, or idle loops will complete.  Therefore, if your
-	read-side critical sections are protected by something other
-	than rcu_read_lock(), do -not- use synchronize_rcu().
-	Similarly, disabling preemption is not an acceptable substitute
-	for rcu_read_lock().  Code that attempts to use preemption
-	disabling where it should be using rcu_read_lock() will break
-	in CONFIG_PREEMPT=y kernel builds.
-	If you want to wait for interrupt handlers, NMI handlers, and
-	code under the influence of preempt_disable(), you instead
-	need to use synchronize_irq() or synchronize_sched().
-	This same limitation also applies to synchronize_rcu_bh()
-	and synchronize_srcu(), as well as to the asynchronous and
-	expedited forms of the three primitives, namely call_rcu(),
-	call_rcu_bh(), call_srcu(), synchronize_rcu_expedited(),
-	synchronize_rcu_bh_expedited(), and synchronize_srcu_expedited().
-
-12.	Any lock acquired by an RCU callback must be acquired elsewhere
+11.	Any lock acquired by an RCU callback must be acquired elsewhere
	with softirq disabled, e.g., via spin_lock_irqsave(),
	spin_lock_bh(), etc.  Failing to disable irq on a given
	acquisition of that lock will result in deadlock as soon as
	the RCU softirq handler happens to run your RCU callback while
	interrupting that acquisition's critical section.

-13.	RCU callbacks can be and are executed in parallel.  In many cases,
+12.	RCU callbacks can be and are executed in parallel.  In many cases,
	the callback code simply wrappers around kfree(), so that this
	is not an issue (or, more accurately, to the extent that it is
	an issue, the memory-allocator locking handles it).  However,
@@ -370,7 +343,7 @@ over a rather long period of time, but improvements are always welcome!
	not the case, a self-spawning RCU callback would prevent the
	victim CPU from ever going offline.)

-14.	Unlike other forms of RCU, it -is- permissible to block in an
+13.	Unlike other forms of RCU, it -is- permissible to block in an
	SRCU read-side critical section (demarked by srcu_read_lock()
	and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
	Please note that if you don't need to sleep in read-side critical
@@ -414,7 +387,7 @@ over a rather long period of time, but improvements are always welcome!
	Note that rcu_dereference() and rcu_assign_pointer() relate to
	SRCU just as they do to other forms of RCU.

-15.	The whole point of call_rcu(), synchronize_rcu(), and friends
+14.	The whole point of call_rcu(), synchronize_rcu(), and friends
	is to wait until all pre-existing readers have finished before
	carrying out some otherwise-destructive operation.  It is
	therefore critically important to -first- remove any path
@@ -426,13 +399,13 @@ over a rather long period of time, but improvements are always welcome!
	is the caller's responsibility to guarantee that any subsequent
	readers will execute safely.

-16.	The various RCU read-side primitives do -not- necessarily contain
+15.	The various RCU read-side primitives do -not- necessarily contain
	memory barriers.  You should therefore plan for the CPU
	and the compiler to freely reorder code into and out of RCU
	read-side critical sections.  It is the responsibility of the
	RCU update-side primitives to deal with this.

-17.	Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
+16.	Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
	__rcu sparse checks to validate your RCU code.  These can help
	find problems as follows:
@@ -455,7 +428,7 @@ over a rather long period of time, but improvements are always welcome!
	These debugging aids can help you find problems that are
	otherwise extremely difficult to spot.

-18.	If you register a callback using call_rcu(), call_rcu_bh(),
+17.	If you register a callback using call_rcu(), call_rcu_bh(),
	call_rcu_sched(), or call_srcu(), and pass in a function defined
	within a loadable module, then it is necessary to wait for
	all pending callbacks to be invoked after the last invocation
@@ -469,8 +442,8 @@ over a rather long period of time, but improvements are always welcome!
	You instead need to use one of the barrier functions:

	o	call_rcu() -> rcu_barrier()
-	o	call_rcu_bh() -> rcu_barrier_bh()
-	o	call_rcu_sched() -> rcu_barrier_sched()
+	o	call_rcu_bh() -> rcu_barrier()
+	o	call_rcu_sched() -> rcu_barrier()
	o	call_srcu() -> srcu_barrier()

	However, these barrier functions are absolutely -not- guaranteed
...
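Item 17 above (waiting for module-defined callbacks before unload) is a frequent source of bugs; a minimal sketch, assuming a hypothetical module whose callback is my_free_rcu():

	#include <linux/module.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct my_data {
		struct rcu_head rh;
	};

	static void my_free_rcu(struct rcu_head *rh)
	{
		kfree(container_of(rh, struct my_data, rh));
	}

	static void __exit my_exit(void)
	{
		/* First prevent new call_rcu(..., my_free_rcu) calls
		 * (not shown), then wait for all already-queued callbacks
		 * so that my_free_rcu()'s code cannot be unloaded while
		 * a callback still needs to run it. */
		rcu_barrier();
	}
	module_exit(my_exit);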
@@ -176,9 +176,8 @@ causing stalls, and that the stall was affecting RCU-sched.  This message
 will normally be followed by stack dumps for each CPU.  Please note that
 PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
 the tasks will be indicated by PID, for example, "P3421".  It is even
-possible for a rcu_preempt_state stall to be caused by both CPUs -and-
-tasks, in which case the offending CPUs and tasks will all be called
-out in the list.
+possible for an rcu_state stall to be caused by both CPUs -and- tasks,
+in which case the offending CPUs and tasks will all be called out in the list.

 CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
 the RCU core for the past three grace periods.  In contrast, CPU 16's "(0
@@ -206,7 +205,7 @@ handlers are no longer able to execute on this CPU.  This can happen if
 the stalled CPU is spinning with interrupts disabled, or, in -rt
 kernels, if a high-priority process is starving RCU's softirq handler.

-The "fps=" shows the number of force-quiescent-state idle/offline
+The "fqs=" shows the number of force-quiescent-state idle/offline
 detection passes that the grace-period kthread has made across this
 CPU since the last time that this CPU noted the beginning of a grace
 period.
...
@@ -266,7 +266,7 @@ rcu_dereference()
	unnecessary overhead on Alpha CPUs.

	Note that the value returned by rcu_dereference() is valid
-	only within the enclosing RCU read-side critical section.
+	only within the enclosing RCU read-side critical section [1].
	For example, the following is -not- legal:

		rcu_read_lock();
@@ -292,6 +292,19 @@ rcu_dereference()
	typically used indirectly, via the _rcu list-manipulation
	primitives, such as list_for_each_entry_rcu().

+	[1] The variant rcu_dereference_protected() can be used outside
+	of an RCU read-side critical section as long as the usage is
+	protected by locks acquired by the update-side code.  This variant
+	avoids the lockdep warning that would happen when using (for
+	example) rcu_dereference() without rcu_read_lock() protection.
+	Using rcu_dereference_protected() also has the advantage
+	of permitting compiler optimizations that rcu_dereference()
+	must prohibit.  The rcu_dereference_protected() variant takes
+	a lockdep expression to indicate which locks must be acquired
+	by the caller.  If the indicated protection is not provided,
+	a lockdep splat is emitted.  See RCU/Design/Requirements.html
+	and the API's code comments for more details and example usage.
+
 The following diagram shows how each API communicates among the
 reader, updater, and reclaimer.
@@ -322,28 +335,27 @@ to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
 implementations of the RCU infrastructure make heavy use of batching in
 order to amortize their overhead over many uses of the corresponding APIs.

-There are no fewer than three RCU mechanisms in the Linux kernel; the
-diagram above shows the first one, which is by far the most commonly used.
-The rcu_dereference() and rcu_assign_pointer() primitives are used for
-all three mechanisms, but different defer and protect primitives are
-used as follows:
+There are at least three flavors of RCU usage in the Linux kernel.  The diagram
+above shows the most common one.  On the updater side, the rcu_assign_pointer(),
+synchronize_rcu() and call_rcu() primitives used are the same for all three
+flavors.  However for protection (on the reader side), the primitives used vary
+depending on the flavor:

-	Defer			Protect
-
-a.	synchronize_rcu()	rcu_read_lock() / rcu_read_unlock()
-	call_rcu()		rcu_dereference()
-
-b.	synchronize_rcu_bh()	rcu_read_lock_bh() / rcu_read_unlock_bh()
-	call_rcu_bh()		rcu_dereference_bh()
-
-c.	synchronize_sched()	rcu_read_lock_sched() / rcu_read_unlock_sched()
-	call_rcu_sched()	preempt_disable() / preempt_enable()
+a.	rcu_read_lock() / rcu_read_unlock()
+	rcu_dereference()
+
+b.	rcu_read_lock_bh() / rcu_read_unlock_bh()
+	local_bh_disable() / local_bh_enable()
+	rcu_dereference_bh()
+
+c.	rcu_read_lock_sched() / rcu_read_unlock_sched()
+	preempt_disable() / preempt_enable()
	local_irq_save() / local_irq_restore()
	hardirq enter / hardirq exit
	NMI enter / NMI exit
	rcu_dereference_sched()

-These three mechanisms are used as follows:
+These three flavors are used as follows:

 a.	RCU applied to normal data structures.
@@ -867,18 +879,20 @@ RCU:	Critical sections	Grace period		Barrier

 bh:	Critical sections	Grace period		Barrier

-	rcu_read_lock_bh	call_rcu_bh		rcu_barrier_bh
-	rcu_read_unlock_bh	synchronize_rcu_bh
-	rcu_dereference_bh	synchronize_rcu_bh_expedited
+	rcu_read_lock_bh	call_rcu		rcu_barrier
+	rcu_read_unlock_bh	synchronize_rcu
+	[local_bh_disable]	synchronize_rcu_expedited
+	[and friends]
+	rcu_dereference_bh
	rcu_dereference_bh_check
	rcu_dereference_bh_protected
	rcu_read_lock_bh_held

 sched:	Critical sections	Grace period		Barrier

-	rcu_read_lock_sched	synchronize_sched	rcu_barrier_sched
-	rcu_read_unlock_sched	call_rcu_sched
-	[preempt_disable]	synchronize_sched_expedited
+	rcu_read_lock_sched	call_rcu		rcu_barrier
+	rcu_read_unlock_sched	synchronize_rcu
+	[preempt_disable]	synchronize_rcu_expedited
	[and friends]
	rcu_read_lock_sched_notrace
	rcu_read_unlock_sched_notrace
@@ -890,8 +904,8 @@ sched:	Critical sections	Grace period		Barrier

 SRCU:	Critical sections	Grace period		Barrier

-	srcu_read_lock		synchronize_srcu	srcu_barrier
-	srcu_read_unlock	call_srcu
+	srcu_read_lock		call_srcu		srcu_barrier
+	srcu_read_unlock	synchronize_srcu
	srcu_dereference	synchronize_srcu_expedited
	srcu_dereference_check
	srcu_read_lock_held
@@ -1034,7 +1048,7 @@ Answer:	Just as PREEMPT_RT permits preemption of spinlock
	spinlocks blocking while in RCU read-side critical
	sections.

-	Why the apparent inconsistency?  Because it is it
+	Why the apparent inconsistency?  Because it is
	possible to use priority boosting to keep the RCU
	grace periods short if need be (for example, if running
	short of memory).  In contrast, if blocking waiting
...
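The footnote's update-side usage might look like the following minimal sketch (struct foo, gp, my_lock, and update_foo() are illustration names, not kernel APIs):

	struct foo { int a; };
	static struct foo __rcu *gp;
	static DEFINE_SPINLOCK(my_lock);

	static void update_foo(struct foo *newp)
	{
		struct foo *oldp;

		spin_lock(&my_lock);
		/* Legal without rcu_read_lock(): the lockdep expression
		 * documents and checks the update-side protection. */
		oldp = rcu_dereference_protected(gp,
					lockdep_is_held(&my_lock));
		rcu_assign_pointer(gp, newp);
		spin_unlock(&my_lock);
		synchronize_rcu();	/* Wait for pre-existing readers. */
		kfree(oldp);
	}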
@@ -3754,24 +3754,6 @@
			in microseconds.  The default of zero says
			no holdoff.

-	rcutorture.cbflood_inter_holdoff= [KNL]
-			Set holdoff time (jiffies) between successive
-			callback-flood tests.
-
-	rcutorture.cbflood_intra_holdoff= [KNL]
-			Set holdoff time (jiffies) between successive
-			bursts of callbacks within a given callback-flood
-			test.
-
-	rcutorture.cbflood_n_burst= [KNL]
-			Set the number of bursts making up a given
-			callback-flood test.  Set this to zero to
-			disable callback-flood testing.
-
-	rcutorture.cbflood_n_per_burst= [KNL]
-			Set the number of callbacks to be registered
-			in a given burst of a callback-flood test.
-
	rcutorture.fqs_duration= [KNL]
			Set duration of force_quiescent_state bursts
			in microseconds.
@@ -3784,6 +3766,23 @@
			Set wait time between force_quiescent_state bursts
			in seconds.

+	rcutorture.fwd_progress= [KNL]
+			Enable RCU grace-period forward-progress testing
+			for the types of RCU supporting this notion.
+
+	rcutorture.fwd_progress_div= [KNL]
+			Specify the fraction of a CPU-stall-warning
+			period to do tight-loop forward-progress testing.
+
+	rcutorture.fwd_progress_holdoff= [KNL]
+			Number of seconds to wait between successive
+			forward-progress tests.
+
+	rcutorture.fwd_progress_need_resched= [KNL]
+			Enclose cond_resched() calls within checks for
+			need_resched() during tight-loop forward-progress
+			testing.
+
	rcutorture.gp_cond= [KNL]
			Use conditional/asynchronous update-side
			primitives, if available.
...
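For orientation, "rcutorture.<name>=" boot parameters map onto ordinary module parameters. A sketch of how such parameters are declared (illustrative declarations using the stock module_param() machinery; rcutorture itself declares these through its torture_param() wrapper, and the defaults shown are placeholders):

	#include <linux/module.h>
	#include <linux/moduleparam.h>

	/* Illustrative defaults; consult the rcutorture source for
	 * the authoritative values and permissions. */
	static int fwd_progress = 1;
	module_param(fwd_progress, int, 0444);
	MODULE_PARM_DESC(fwd_progress,
			 "Test grace-period forward progress");

	static int fwd_progress_holdoff = 60;
	module_param(fwd_progress_holdoff, int, 0444);
	MODULE_PARM_DESC(fwd_progress_holdoff,
			 "Seconds between forward-progress tests");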