Commit a83f1fe2 authored by Paul E. McKenney, committed by Linus Torvalds

[PATCH] Update RCU documentation

Update the RCU documentation to allow for the new synchronize_rcu() and
synchronize_sched() primitives.  Fix a few other nits as well.

Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent fbd568a3
@@ -108,8 +108,9 @@ year saw a paper describing an RCU implementation of System V IPC
 2004 has seen a Linux-Journal article on use of RCU in dcache
 [McKenney04a], a performance comparison of locking to RCU on several
 different CPUs [McKenney04b], a dissertation describing use of RCU in a
-number of operating-system kernels [PaulEdwardMcKenneyPhD], and a paper
-describing how to make RCU safe for soft-realtime applications [Sarma04c].
+number of operating-system kernels [PaulEdwardMcKenneyPhD], a paper
+describing how to make RCU safe for soft-realtime applications [Sarma04c],
+and a paper describing SELinux performance with RCU [JamesMorris04b].
 Bibtex Entries
@@ -341,6 +342,17 @@ Dipankar Sarma"
 ,pages="18-26"
 }
+@techreport{Friedberg03a
+,author="Stuart A. Friedberg"
+,title="Lock-Free Wild Card Search Data Structure and Method"
+,institution="US Patent and Trademark Office"
+,address="Washington, DC"
+,year="2003"
+,number="US Patent 6,662,184 (contributed under GPL)"
+,month="December"
+,pages="112"
+}
 @article{McKenney04a
 ,author="Paul E. McKenney and Dipankar Sarma and Maneesh Soni"
 ,title="Scaling dcache with {RCU}"
@@ -373,6 +385,9 @@ in Operating System Kernels"
 ,school="OGI School of Science and Engineering at
 Oregon Health and Sciences University"
 ,year="2004"
+,note="Available:
+\url{http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf}
+[Viewed October 15, 2004]"
 }
 @Conference{Sarma04c
@@ -385,3 +400,13 @@ Oregon Health and Sciences University"
 ,month="June"
 ,pages="182-191"
 }
+@unpublished{JamesMorris04b
+,Author="James Morris"
+,Title="Recent Developments in {SELinux} Kernel Performance"
+,month="December"
+,year="2004"
+,note="Available:
+\url{http://www.livejournal.com/users/james_morris/2153.html}
+[Viewed December 10, 2004]"
+}
@@ -2,11 +2,11 @@ RCU on Uniprocessor Systems
 A common misconception is that, on UP systems, the call_rcu() primitive
-may immediately invoke its function, and that the synchronize_kernel
+may immediately invoke its function, and that the synchronize_rcu()
 primitive may return immediately. The basis of this misconception
 is that since there is only one CPU, it should not be necessary to
 wait for anything else to get done, since there are no other CPUs for
-anything else to be happening on. Although this approach will sort of
+anything else to be happening on. Although this approach will -sort- -of-
 work a surprising amount of the time, it is a very bad idea in general.
 This document presents two examples that demonstrate exactly how bad an
 idea this is.
@@ -44,14 +44,14 @@ its arguments would cause it to fail to make the fundamental guarantee
 underlying RCU, namely that call_rcu() defers invoking its arguments until
 all RCU read-side critical sections currently executing have completed.
-Quick Quiz: why is it -not- legal to invoke synchronize_kernel() in
+Quick Quiz: why is it -not- legal to invoke synchronize_rcu() in
 this case?
 Summary
 Permitting call_rcu() to immediately invoke its arguments or permitting
-synchronize_kernel() to immediately return breaks RCU, even on a UP system.
+synchronize_rcu() to immediately return breaks RCU, even on a UP system.
 So do not do it! Even on a UP system, the RCU infrastructure -must-
 respect grace periods.
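The UP argument above can be made concrete with a small userspace model. This is a minimal sketch, assuming a single CPU and a hypothetical toy API (toy_read_lock(), toy_read_unlock(), toy_call_rcu(), toy_quiescent_point(), struct toy_obj); none of these are kernel functions, and real RCU tracks grace periods rather than just checking a nesting counter. What it shows: toy_call_rcu() only queues the callback, so when it is called while a read-side critical section is still running on the only CPU (for example, from a softirq that interrupted the reader), the object is not freed out from under that reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-CPU model of RCU callback deferral (not kernel code). */
#define TOY_MAX_CBS 16

typedef void (*toy_cb_func)(void *arg);

static int toy_nesting;			/* read-side critical-section depth */
static toy_cb_func toy_cb_funcs[TOY_MAX_CBS];
static void *toy_cb_args[TOY_MAX_CBS];
static size_t toy_ncbs;

static void toy_read_lock(void)
{
	toy_nesting++;
}

static void toy_read_unlock(void)
{
	assert(toy_nesting > 0);
	toy_nesting--;
}

/* Queue the callback; never invoke it directly, even with only one CPU. */
static void toy_call_rcu(toy_cb_func func, void *arg)
{
	assert(toy_ncbs < TOY_MAX_CBS);
	toy_cb_funcs[toy_ncbs] = func;
	toy_cb_args[toy_ncbs] = arg;
	toy_ncbs++;
}

/*
 * Called where the CPU may be quiescent (think: a context switch).
 * Queued callbacks run only if no read-side critical section is active.
 * In this toy, callbacks must not call toy_call_rcu() themselves.
 */
static void toy_quiescent_point(void)
{
	size_t i;

	if (toy_nesting != 0)
		return;
	for (i = 0; i < toy_ncbs; i++)
		toy_cb_funcs[i](toy_cb_args[i]);
	toy_ncbs = 0;
}

/* Hypothetical RCU-protected object and its "free" callback. */
struct toy_obj {
	int value;
	bool freed;
};

static void toy_free_obj(void *arg)
{
	((struct toy_obj *)arg)->freed = true;	/* stands in for kfree() */
}
```

In use, a reader enters with toy_read_lock() and picks up a pointer, something calls toy_call_rcu() on that object, and the callback runs only at the first toy_quiescent_point() after toy_read_unlock(). Invoking the callback directly inside toy_call_rcu() would free the object while the reader still holds the pointer, which is the failure the summary warns about.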
@@ -32,7 +32,10 @@ over a rather long period of time, but improvements are always welcome!
 them -- even x86 allows reads to be reordered), and be prepared
 to explain why this added complexity is worthwhile. If you
 choose #c, be prepared to explain how this single task does not
-become a major bottleneck on big multiprocessor machines.
+become a major bottleneck on big multiprocessor machines (for
+example, if the task is updating information relating to itself
+that other tasks can read, there by definition can be no
+bottleneck).
 2. Do the RCU read-side critical sections make proper use of
 rcu_read_lock() and friends? These primitives are needed
@@ -89,27 +92,34 @@ over a rather long period of time, but improvements are always welcome!
 "_rcu()" list-traversal primitives, such as the
 list_for_each_entry_rcu().
-b. If the list macros are being used, the list_del_rcu(),
-list_add_tail_rcu(), and list_del_rcu() primitives must
-be used in order to prevent weakly ordered machines from
-misordering structure initialization and pointer planting.
+b. If the list macros are being used, the list_add_tail_rcu()
+and list_add_rcu() primitives must be used in order
+to prevent weakly ordered machines from misordering
+structure initialization and pointer planting.
 Similarly, if the hlist macros are being used, the
-hlist_del_rcu() and hlist_add_head_rcu() primitives
-are required.
+hlist_add_head_rcu() primitive is required.
-c. Updates must ensure that initialization of a given
+c. If the list macros are being used, the list_del_rcu()
+primitive must be used to keep list_del()'s pointer
+poisoning from inflicting toxic effects on concurrent
+readers. Similarly, if the hlist macros are being used,
+the hlist_del_rcu() primitive is required.
+The list_replace_rcu() primitive may be used to
+replace an old structure with a new one in an
+RCU-protected list.
+d. Updates must ensure that initialization of a given
 structure happens before pointers to that structure are
 publicized. Use the rcu_assign_pointer() primitive
 when publicizing a pointer to a structure that can
 be traversed by an RCU read-side critical section.
-[The rcu_assign_pointer() primitive is in process.]
 5. If call_rcu(), or a related primitive such as call_rcu_bh(),
 is used, the callback function must be written to be called
 from softirq context. In particular, it cannot block.
-6. Since synchronize_kernel() blocks, it cannot be called from
+6. Since synchronize_rcu() can block, it cannot be called from
 any sort of irq context.
 7. If the updater uses call_rcu(), then the corresponding readers
@@ -125,9 +135,9 @@ over a rather long period of time, but improvements are always welcome!
 such cases is a must, of course! And the jury is still out on
 whether the increased speed is worth it.
-8. Although synchronize_kernel() is a bit slower than is call_rcu(),
+8. Although synchronize_rcu() is a bit slower than is call_rcu(),
 it usually results in simpler code. So, unless update performance
-is important or the updaters cannot block, synchronize_kernel()
+is important or the updaters cannot block, synchronize_rcu()
 should be used in preference to call_rcu().
 9. All RCU list-traversal primitives, which include
@@ -155,3 +165,14 @@ over a rather long period of time, but improvements are always welcome!
 you -must- use the "_rcu()" variants of the list macros.
 Failing to do so will break Alpha and confuse people reading
 your code.
+11. Note that synchronize_rcu() -only- guarantees to wait until
+all currently executing rcu_read_lock()-protected RCU read-side
+critical sections complete. It does -not- necessarily guarantee
+that all currently running interrupts, NMIs, preempt_disable()
+code, or idle loops will complete. Therefore, if you do not have
+rcu_read_lock()-protected read-side critical sections, do -not-
+use synchronize_rcu().
+If you want to wait for some of these other things, you might
+instead need to use synchronize_irq() or synchronize_sched().
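The checklist text requires that initialization of a structure happen before pointers to it are publicized, using rcu_assign_pointer(). As a hedged userspace analogy (not the kernel API), C11 atomics can express the same ordering: a release store plays the role of rcu_assign_pointer(), and an acquire load stands in for the reader side, where kernel code would use rcu_dereference() under rcu_read_lock(). struct config, current_config, config_publish() and config_read_threshold() are hypothetical names, and updaters are assumed to be serialized by a lock the sketch does not show.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical RCU-protected record (illustration only). */
struct config {
	int threshold;
	int mode;
};

/* The shared pointer that readers follow; starts out NULL. */
static _Atomic(struct config *) current_config;

/*
 * Analog of initialize-then-rcu_assign_pointer(). Every field is written
 * before the release store that makes newc reachable. Updaters are
 * assumed to be serialized by a lock not shown here. Returns the previous
 * version, which must not be freed until pre-existing readers finish.
 */
static struct config *config_publish(struct config *newc, int threshold,
				     int mode)
{
	assert(newc != NULL);
	newc->threshold = threshold;
	newc->mode = mode;
	return atomic_exchange_explicit(&current_config, newc,
					memory_order_release);
}

/*
 * Reader side. memory_order_acquire is stronger than the dependency
 * ordering the kernel relies on for rcu_dereference(), but it is the
 * portable C11 choice. Returns -1 if nothing has been published yet.
 */
static int config_read_threshold(void)
{
	struct config *c =
		atomic_load_explicit(&current_config, memory_order_acquire);

	return c != NULL ? c->threshold : -1;
}
```

The pointer returned by config_publish() is the old version; in kernel code it would be passed to call_rcu() or freed after synchronize_rcu(), and this sketch deliberately leaves that grace-period wait out.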
@@ -32,6 +32,7 @@ implementation of audit_filter_task() might be as follows:
 	enum audit_state state;
 	read_lock(&auditsc_lock);
+	/* Note: audit_netlink_sem held by caller. */
 	list_for_each_entry(e, &audit_tsklist, list) {
 		if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
 			read_unlock(&auditsc_lock);
@@ -55,6 +56,7 @@ This means that RCU can be easily applied to the read side, as follows:
 	enum audit_state state;
 	rcu_read_lock();
+	/* Note: audit_netlink_sem held by caller. */
 	list_for_each_entry_rcu(e, &audit_tsklist, list) {
 		if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
 			rcu_read_unlock();
@@ -139,12 +141,15 @@ Normally, the write_lock() and write_unlock() would be replaced by
 a spin_lock() and a spin_unlock(), but in this case, all callers hold
 audit_netlink_sem, so no additional locking is required. The auditsc_lock
 can therefore be eliminated, since use of RCU eliminates the need for
-writers to exclude readers.
+writers to exclude readers. Normally, the write_lock() calls would
+be converted into spin_lock() calls.
 The list_del(), list_add(), and list_add_tail() primitives have been
 replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
 The _rcu() list-manipulation primitives add memory barriers that are
-needed on weakly ordered CPUs (most of them!).
+needed on weakly ordered CPUs (most of them!). The list_del_rcu()
+primitive omits the pointer poisoning debug-assist code that would
+otherwise cause concurrent readers to fail spectacularly.
 So, when readers can tolerate stale data and when entries are either added
 or deleted, without in-place modification, it is very easy to use RCU!
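The hunk above notes that list_del_rcu() omits list_del()'s pointer poisoning. A minimal sketch with a hypothetical hand-rolled circular doubly linked list (struct node, toy_list_del(), toy_list_del_rcu() and TOY_POISON are illustrative, not <linux/list.h>) shows why that matters: a reader already standing on the removed element must still be able to follow ->next back into the list. The sketch is single-threaded and omits the barriers and grace period that concurrent code needs.

```c
#include <assert.h>

/* Hypothetical circular doubly linked list node (not <linux/list.h>). */
struct node {
	struct node *next;
	struct node *prev;
	int key;
};

/* Stand-in for the kernel's poison values: non-NULL, never dereferenced. */
#define TOY_POISON ((struct node *)0x100)

/* Splice n out of the list by redirecting its neighbors around it. */
static void toy_unlink(struct node *n)
{
	assert(n->next != TOY_POISON && n->prev != TOY_POISON);
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Like list_del(): unlink, then poison both pointers to catch misuse. */
static void toy_list_del(struct node *n)
{
	toy_unlink(n);
	n->next = TOY_POISON;
	n->prev = TOY_POISON;
}

/*
 * Like list_del_rcu(): unlink but leave n->next alone, so a reader already
 * standing on n can keep walking forward. This sketch still poisons ->prev
 * on the assumption that readers only ever follow ->next.
 */
static void toy_list_del_rcu(struct node *n)
{
	toy_unlink(n);
	n->prev = TOY_POISON;
}
```

After toy_list_del(), a reader standing on the removed node would load TOY_POISON from ->next and crash on its next step; after toy_list_del_rcu() it simply finishes its traversal, and the node can be freed once a grace period has elapsed.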
@@ -166,6 +171,7 @@ otherwise, the added fields would need to be filled in):
 	struct audit_newentry *ne;
 	write_lock(&auditsc_lock);
+	/* Note: audit_netlink_sem held by caller. */
 	list_for_each_entry(e, list, list) {
 		if (!audit_compare_rule(rule, &e->rule)) {
 			e->rule.action = newaction;
@@ -199,8 +205,7 @@ RCU ("read-copy update") its name. The RCU code is as follows:
 			audit_copy_rule(&ne->rule, &e->rule);
 			ne->rule.action = newaction;
 			ne->rule.file_count = newfield_count;
-			list_add_rcu(ne, e);
-			list_del(e);
+			list_replace_rcu(e, ne);
 			call_rcu(&e->rcu, audit_free_rule, e);
 			return 0;
 		}
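The hunk above replaces list_add_rcu() followed by list_del() with a single list_replace_rcu(). The read-copy-update pattern it relies on can be sketched in userspace, assuming a hypothetical singly linked list (struct rule, rule_replace() and rule_lookup_action() are illustrative names, not audit or kernel code) and C11 release/acquire atomics in place of the kernel's list primitives. The updater copies the old element, modifies the copy while it is still private, and swings one pointer, so a concurrent reader sees either the complete old rule or the complete new one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical RCU-style rule list (illustrative, not the audit code). */
struct rule {
	_Atomic(struct rule *) next;
	int id;
	int action;
};

/*
 * Userspace analog of the list_replace_rcu() step. *prevp is the link
 * that currently points at the element to replace. Updaters are assumed
 * to be serialized by a lock not shown here, so they may use relaxed
 * loads. Returns the old element (which must outlive pre-existing
 * readers), or NULL if nothing was replaced.
 */
static struct rule *rule_replace(_Atomic(struct rule *) *prevp, int new_action)
{
	struct rule *old;
	struct rule *ne;

	assert(prevp != NULL);
	old = atomic_load_explicit(prevp, memory_order_relaxed);
	if (old == NULL)
		return NULL;		/* nothing to replace */
	ne = malloc(sizeof(*ne));
	if (ne == NULL)
		return NULL;		/* allocation failed: list unchanged */

	/* Copy, then update the copy while no reader can see it. */
	ne->id = old->id;
	ne->action = new_action;
	atomic_init(&ne->next,
		    atomic_load_explicit(&old->next, memory_order_relaxed));

	/* Publish: release ordering makes the initialization visible first. */
	atomic_store_explicit(prevp, ne, memory_order_release);
	return old;
}

/* Reader side: acquire loads pair with the release store above. */
static int rule_lookup_action(_Atomic(struct rule *) *headp, int id)
{
	struct rule *r;

	for (r = atomic_load_explicit(headp, memory_order_acquire); r != NULL;
	     r = atomic_load_explicit(&r->next, memory_order_acquire))
		if (r->id == id)
			return r->action;
	return -1;			/* not found */
}
```

The old element returned by rule_replace() is still reachable by readers that started before the swap, which is why the hunk above hands it to call_rcu() instead of freeing it immediately.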
@@ -43,7 +43,9 @@ o If I am running on a uniprocessor kernel, which can only do one
 o How can I see where RCU is currently used in the Linux kernel?
-Search for "rcu_read_lock", "call_rcu", and "synchronize_kernel".
+Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
+"rcu_read_lock_bh", "rcu_read_unlock_bh", "call_rcu_bh",
+"synchronize_rcu", and "synchronize_net".
 o What guidelines should I follow when writing code that uses RCU?