Commit bf28ae56 authored by Alan Stern's avatar Alan Stern Committed by Ingo Molnar

tools/memory-model: Remove rb-dep, smp_read_barrier_depends, and lockless_dereference

Since commit 76ebbe78 ("locking/barriers: Add implicit
smp_read_barrier_depends() to READ_ONCE()") was merged for the 4.15
kernel, it has not been necessary to use smp_read_barrier_depends().
Similarly, commit 59ecbbe7 ("locking/barriers: Kill
lockless_dereference()") removed lockless_dereference() from the
kernel.

Since these primitives are no longer part of the kernel, they do not
belong in the Linux Kernel Memory Consistency Model.  This patch
removes them, along with the internal rb-dep relation, and updates the
revelant documentation.
Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akiyks@gmail.com
Cc: boqun.feng@gmail.com
Cc: dhowells@redhat.com
Cc: j.alglave@ucl.ac.uk
Cc: linux-arch@vger.kernel.org
Cc: luc.maranget@inria.fr
Cc: nborisov@suse.com
Cc: npiggin@gmail.com
Cc: parri.andrea@gmail.com
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1519169112-20593-12-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
parent cac79a39
...@@ -6,8 +6,7 @@ ...@@ -6,8 +6,7 @@
Store, e.g., WRITE_ONCE() Y Y Store, e.g., WRITE_ONCE() Y Y
Load, e.g., READ_ONCE() Y Y Y Load, e.g., READ_ONCE() Y Y Y
Unsuccessful RMW operation Y Y Y Unsuccessful RMW operation Y Y Y
smp_read_barrier_depends() Y Y Y rcu_dereference() Y Y Y Y
*_dereference() Y Y Y Y
Successful *_acquire() R Y Y Y Y Y Y Successful *_acquire() R Y Y Y Y Y Y
Successful *_release() C Y Y Y W Y Successful *_release() C Y Y Y W Y
smp_rmb() Y R Y Y R smp_rmb() Y R Y Y R
......
Explanation of the Linux-Kernel Memory Model Explanation of the Linux-Kernel Memory Consistency Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Author: Alan Stern <stern@rowland.harvard.edu> :Author: Alan Stern <stern@rowland.harvard.edu>
:Created: October 2017 :Created: October 2017
...@@ -35,25 +35,24 @@ Explanation of the Linux-Kernel Memory Model ...@@ -35,25 +35,24 @@ Explanation of the Linux-Kernel Memory Model
INTRODUCTION INTRODUCTION
------------ ------------
The Linux-kernel memory model (LKMM) is rather complex and obscure. The Linux-kernel memory consistency model (LKMM) is rather complex and
This is particularly evident if you read through the linux-kernel.bell obscure. This is particularly evident if you read through the
and linux-kernel.cat files that make up the formal version of the linux-kernel.bell and linux-kernel.cat files that make up the formal
memory model; they are extremely terse and their meanings are far from version of the model; they are extremely terse and their meanings are
clear. far from clear.
This document describes the ideas underlying the LKMM. It is meant This document describes the ideas underlying the LKMM. It is meant
for people who want to understand how the memory model was designed. for people who want to understand how the model was designed. It does
It does not go into the details of the code in the .bell and .cat not go into the details of the code in the .bell and .cat files;
files; rather, it explains in English what the code expresses rather, it explains in English what the code expresses symbolically.
symbolically.
Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed
toward beginners; they explain what memory models are and the basic toward beginners; they explain what memory consistency models are and
notions shared by all such models. People already familiar with these the basic notions shared by all such models. People already familiar
concepts can skim or skip over them. Sections 6 (EVENTS) through 12 with these concepts can skim or skip over them. Sections 6 (EVENTS)
(THE FROM_READS RELATION) describe the fundamental relations used in through 12 (THE FROM_READS RELATION) describe the fundamental
many memory models. Starting in Section 13 (AN OPERATIONAL MODEL), relations used in many models. Starting in Section 13 (AN OPERATIONAL
the workings of the LKMM itself are covered. MODEL), the workings of the LKMM itself are covered.
Warning: The code examples in this document are not written in the Warning: The code examples in this document are not written in the
proper format for litmus tests. They don't include a header line, the proper format for litmus tests. They don't include a header line, the
...@@ -827,8 +826,8 @@ A-cumulative; they only affect the propagation of stores that are ...@@ -827,8 +826,8 @@ A-cumulative; they only affect the propagation of stores that are
executed on C before the fence (i.e., those which precede the fence in executed on C before the fence (i.e., those which precede the fence in
program order). program order).
smp_read_barrier_depends(), rcu_read_lock(), rcu_read_unlock(), and read_lock(), rcu_read_unlock(), and synchronize_rcu() fences have
synchronize_rcu() fences have other properties which we discuss later. other properties which we discuss later.
PROPAGATION ORDER RELATION: cumul-fence PROPAGATION ORDER RELATION: cumul-fence
...@@ -988,8 +987,8 @@ Another possibility, not mentioned earlier but discussed in the next ...@@ -988,8 +987,8 @@ Another possibility, not mentioned earlier but discussed in the next
section, is: section, is:
X and Y are both loads, X ->addr Y (i.e., there is an address X and Y are both loads, X ->addr Y (i.e., there is an address
dependency from X to Y), and an smp_read_barrier_depends() dependency from X to Y), and X is a READ_ONCE() or an atomic
fence occurs between them. access.
Dependencies can also cause instructions to be executed in program Dependencies can also cause instructions to be executed in program
order. This is uncontroversial when the second instruction is a order. This is uncontroversial when the second instruction is a
...@@ -1015,9 +1014,9 @@ After all, a CPU cannot ask the memory subsystem to load a value from ...@@ -1015,9 +1014,9 @@ After all, a CPU cannot ask the memory subsystem to load a value from
a particular location before it knows what that location is. However, a particular location before it knows what that location is. However,
the split-cache design used by Alpha can cause it to behave in a way the split-cache design used by Alpha can cause it to behave in a way
that looks as if the loads were executed out of order (see the next that looks as if the loads were executed out of order (see the next
section for more details). For this reason, the LKMM does not include section for more details). The kernel includes a workaround for this
address dependencies between read events in the ppo relation unless an problem when the loads come from READ_ONCE(), and therefore the LKMM
smp_read_barrier_depends() fence is present. includes address dependencies to loads in the ppo relation.
On the other hand, dependencies can indirectly affect the ordering of On the other hand, dependencies can indirectly affect the ordering of
two loads. This happens when there is a dependency from a load to a two loads. This happens when there is a dependency from a load to a
...@@ -1114,11 +1113,12 @@ code such as the following: ...@@ -1114,11 +1113,12 @@ code such as the following:
int *r1; int *r1;
int r2; int r2;
r1 = READ_ONCE(ptr); r1 = ptr;
r2 = READ_ONCE(*r1); r2 = READ_ONCE(*r1);
} }
can malfunction on Alpha systems. It is quite possible that r1 = &x can malfunction on Alpha systems (notice that P1 uses an ordinary load
to read ptr instead of READ_ONCE()). It is quite possible that r1 = &x
and r2 = 0 at the end, in spite of the address dependency. and r2 = 0 at the end, in spite of the address dependency.
At first glance this doesn't seem to make sense. We know that the At first glance this doesn't seem to make sense. We know that the
...@@ -1141,11 +1141,15 @@ This could not have happened if the local cache had processed the ...@@ -1141,11 +1141,15 @@ This could not have happened if the local cache had processed the
incoming stores in FIFO order. In constrast, other architectures incoming stores in FIFO order. In constrast, other architectures
maintain at least the appearance of FIFO order. maintain at least the appearance of FIFO order.
In practice, this difficulty is solved by inserting an In practice, this difficulty is solved by inserting a special fence
smp_read_barrier_depends() fence between P1's two loads. The effect between P1's two loads when the kernel is compiled for the Alpha
of this fence is to cause the CPU not to execute any po-later architecture. In fact, as of version 4.15, the kernel automatically
instructions until after the local cache has finished processing all adds this fence (called smp_read_barrier_depends() and defined as
the stores it has already received. Thus, if the code was changed to: nothing at all on non-Alpha builds) after every READ_ONCE() and atomic
load. The effect of the fence is to cause the CPU not to execute any
po-later instructions until after the local cache has finished
processing all the stores it has already received. Thus, if the code
was changed to:
P1() P1()
{ {
...@@ -1153,13 +1157,15 @@ the stores it has already received. Thus, if the code was changed to: ...@@ -1153,13 +1157,15 @@ the stores it has already received. Thus, if the code was changed to:
int r2; int r2;
r1 = READ_ONCE(ptr); r1 = READ_ONCE(ptr);
smp_read_barrier_depends();
r2 = READ_ONCE(*r1); r2 = READ_ONCE(*r1);
} }
then we would never get r1 = &x and r2 = 0. By the time P1 executed then we would never get r1 = &x and r2 = 0. By the time P1 executed
its second load, the x = 1 store would already be fully processed by its second load, the x = 1 store would already be fully processed by
the local cache and available for satisfying the read request. the local cache and available for satisfying the read request. Thus
we have yet another reason why shared data should always be read with
READ_ONCE() or another synchronization primitive rather than accessed
directly.
The LKMM requires that smp_rmb(), acquire fences, and strong fences The LKMM requires that smp_rmb(), acquire fences, and strong fences
share this property with smp_read_barrier_depends(): They do not allow share this property with smp_read_barrier_depends(): They do not allow
...@@ -1751,11 +1757,10 @@ no further involvement from the CPU. Since the CPU doesn't ever read ...@@ -1751,11 +1757,10 @@ no further involvement from the CPU. Since the CPU doesn't ever read
the value of x, there is nothing for the smp_rmb() fence to act on. the value of x, there is nothing for the smp_rmb() fence to act on.
The LKMM defines a few extra synchronization operations in terms of The LKMM defines a few extra synchronization operations in terms of
things we have already covered. In particular, rcu_dereference() and things we have already covered. In particular, rcu_dereference() is
lockless_dereference() are both treated as a READ_ONCE() followed by treated as READ_ONCE() and rcu_assign_pointer() is treated as
smp_read_barrier_depends() -- which also happens to be how they are smp_store_release() -- which is basically how the Linux kernel treats
defined in include/linux/rcupdate.h and include/linux/compiler.h, them.
respectively.
There are a few oddball fences which need special treatment: There are a few oddball fences which need special treatment:
smp_mb__before_atomic(), smp_mb__after_atomic(), and smp_mb__before_atomic(), smp_mb__after_atomic(), and
......
...@@ -24,7 +24,6 @@ instructions RMW[{'once,'acquire,'release}] ...@@ -24,7 +24,6 @@ instructions RMW[{'once,'acquire,'release}]
enum Barriers = 'wmb (*smp_wmb*) || enum Barriers = 'wmb (*smp_wmb*) ||
'rmb (*smp_rmb*) || 'rmb (*smp_rmb*) ||
'mb (*smp_mb*) || 'mb (*smp_mb*) ||
'rb_dep (*smp_read_barrier_depends*) ||
'rcu-lock (*rcu_read_lock*) || 'rcu-lock (*rcu_read_lock*) ||
'rcu-unlock (*rcu_read_unlock*) || 'rcu-unlock (*rcu_read_unlock*) ||
'sync-rcu (*synchronize_rcu*) || 'sync-rcu (*synchronize_rcu*) ||
......
...@@ -25,7 +25,6 @@ include "lock.cat" ...@@ -25,7 +25,6 @@ include "lock.cat"
(*******************) (*******************)
(* Fences *) (* Fences *)
let rb-dep = [R] ; fencerel(Rb_dep) ; [R]
let rmb = [R \ Noreturn] ; fencerel(Rmb) ; [R \ Noreturn] let rmb = [R \ Noreturn] ; fencerel(Rmb) ; [R \ Noreturn]
let wmb = [W] ; fencerel(Wmb) ; [W] let wmb = [W] ; fencerel(Wmb) ; [W]
let mb = ([M] ; fencerel(Mb) ; [M]) | let mb = ([M] ; fencerel(Mb) ; [M]) |
...@@ -61,11 +60,9 @@ let dep = addr | data ...@@ -61,11 +60,9 @@ let dep = addr | data
let rwdep = (dep | ctrl) ; [W] let rwdep = (dep | ctrl) ; [W]
let overwrite = co | fr let overwrite = co | fr
let to-w = rwdep | (overwrite & int) let to-w = rwdep | (overwrite & int)
let rrdep = addr | (dep ; rfi) let to-r = addr | (dep ; rfi) | rfi-rel-acq
let strong-rrdep = rrdep+ & rb-dep
let to-r = strong-rrdep | rfi-rel-acq
let fence = strong-fence | wmb | po-rel | rmb | acq-po let fence = strong-fence | wmb | po-rel | rmb | acq-po
let ppo = rrdep* ; (to-r | to-w | fence) let ppo = to-r | to-w | fence
(* Propagation: Ordering from release operations and strong fences. *) (* Propagation: Ordering from release operations and strong fences. *)
let A-cumul(r) = rfe? ; r let A-cumul(r) = rfe? ; r
......
...@@ -13,14 +13,12 @@ WRITE_ONCE(X,V) { __store{once}(X,V); } ...@@ -13,14 +13,12 @@ WRITE_ONCE(X,V) { __store{once}(X,V); }
smp_store_release(X,V) { __store{release}(*X,V); } smp_store_release(X,V) { __store{release}(*X,V); }
smp_load_acquire(X) __load{acquire}(*X) smp_load_acquire(X) __load{acquire}(*X)
rcu_assign_pointer(X,V) { __store{release}(X,V); } rcu_assign_pointer(X,V) { __store{release}(X,V); }
lockless_dereference(X) __load{lderef}(X)
rcu_dereference(X) __load{deref}(X) rcu_dereference(X) __load{deref}(X)
// Fences // Fences
smp_mb() { __fence{mb} ; } smp_mb() { __fence{mb} ; }
smp_rmb() { __fence{rmb} ; } smp_rmb() { __fence{rmb} ; }
smp_wmb() { __fence{wmb} ; } smp_wmb() { __fence{wmb} ; }
smp_read_barrier_depends() { __fence{rb_dep}; }
smp_mb__before_atomic() { __fence{before-atomic} ; } smp_mb__before_atomic() { __fence{before-atomic} ; }
smp_mb__after_atomic() { __fence{after-atomic} ; } smp_mb__after_atomic() { __fence{after-atomic} ; }
smp_mb__after_spinlock() { __fence{after-spinlock} ; } smp_mb__after_spinlock() { __fence{after-spinlock} ; }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment