Commit 387b1468 authored by Mauro Carvalho Chehab

docs: locking: convert docs to ReST and rename to *.rst

Convert the locking documents to ReST and add them to the
kernel development book where they belong.

Most of the changes here just make Sphinx parse the text
files properly; the files were already in good shape and did
not require massive changes in order to be parsed.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

In the new index.rst, add :orphan: for as long as it is not linked from
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
parent fec88ab0
...@@ -1364,7 +1364,7 @@ Futex API reference ...@@ -1364,7 +1364,7 @@ Futex API reference
Further reading Further reading
=============== ===============
- ``Documentation/locking/spinlocks.txt``: Linus Torvalds' spinlocking - ``Documentation/locking/spinlocks.rst``: Linus Torvalds' spinlocking
tutorial in the kernel sources. tutorial in the kernel sources.
- Unix Systems for Modern Architectures: Symmetric Multiprocessing and - Unix Systems for Modern Architectures: Symmetric Multiprocessing and
......
:orphan:

=======
locking
=======

.. toctree::
    :maxdepth: 1

    lockdep-design
    lockstat
    locktorture
    mutex-design
    rt-mutex-design
    rt-mutex
    spinlocks
    ww-mutex-design

.. only::  subproject and html

   Indices
   =======

   * :ref:`genindex`
...@@ -2,6 +2,7 @@ Runtime locking correctness validator ...@@ -2,6 +2,7 @@ Runtime locking correctness validator
===================================== =====================================
started by Ingo Molnar <mingo@redhat.com> started by Ingo Molnar <mingo@redhat.com>
additions by Arjan van de Ven <arjan@linux.intel.com> additions by Arjan van de Ven <arjan@linux.intel.com>
Lock-class Lock-class
...@@ -56,7 +57,7 @@ where the last 1 category is: ...@@ -56,7 +57,7 @@ where the last 1 category is:
When locking rules are violated, these usage bits are presented in the When locking rules are violated, these usage bits are presented in the
locking error messages, inside curlies, with a total of 2 * n STATEs bits. locking error messages, inside curlies, with a total of 2 * n STATEs bits.
A contrived example: A contrived example::
modprobe/2287 is trying to acquire lock: modprobe/2287 is trying to acquire lock:
(&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
...@@ -70,12 +71,14 @@ of the lock and readlock (if exists), for each of the n STATEs listed ...@@ -70,12 +71,14 @@ of the lock and readlock (if exists), for each of the n STATEs listed
above respectively, and the character displayed at each bit position above respectively, and the character displayed at each bit position
indicates: indicates:
=== ===================================================
'.' acquired while irqs disabled and not in irq context '.' acquired while irqs disabled and not in irq context
'-' acquired in irq context '-' acquired in irq context
'+' acquired with irqs enabled '+' acquired with irqs enabled
'?' acquired in irq context with irqs enabled. '?' acquired in irq context with irqs enabled.
=== ===================================================
The bits are illustrated with an example: The bits are illustrated with an example::
(&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24 (&sio_locks[i].lock){-.-.}, at: [<c02867fd>] mutex_lock+0x21/0x24
|||| ||||
...@@ -90,13 +93,13 @@ context and whether that STATE is enabled yields four possible cases as ...@@ -90,13 +93,13 @@ context and whether that STATE is enabled yields four possible cases as
shown in the table below. The bit character is able to indicate which shown in the table below. The bit character is able to indicate which
exact case is for the lock as of the reporting time. exact case is for the lock as of the reporting time.
------------------------------------------- +--------------+-------------+--------------+
| | irq enabled | irq disabled | | | irq enabled | irq disabled |
|-------------------------------------------| +--------------+-------------+--------------+
| ever in irq | ? | - | | ever in irq | ? | - |
|-------------------------------------------| +--------------+-------------+--------------+
| never in irq | + | . | | never in irq | + | . |
------------------------------------------- +--------------+-------------+--------------+
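The table above can be read as a two-input lookup. The following is a minimal userspace sketch of that lookup, not code taken from lockdep itself; the flag names ``EVER_IN_IRQ`` and ``EVER_IRQ_ENABLED`` and the function ``usage_char()`` are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical flags for one STATE of one lock-class (illustration only). */
enum {
	EVER_IN_IRQ      = 1u << 0,	/* lock was ever acquired in irq context */
	EVER_IRQ_ENABLED = 1u << 1,	/* lock was ever acquired with irqs enabled */
};

/* Return the character the table assigns to this combination of flags. */
char usage_char(unsigned int flags)
{
	/* Only the two flags above are meaningful in this sketch. */
	assert((flags & ~(EVER_IN_IRQ | EVER_IRQ_ENABLED)) == 0);

	if (flags & EVER_IN_IRQ)
		return (flags & EVER_IRQ_ENABLED) ? '?' : '-';
	return (flags & EVER_IRQ_ENABLED) ? '+' : '.';
}
```

Reading the table in reverse works the same way: a ``'-'`` can only come from the "ever in irq" row and the "irq disabled" column.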
The character '-' suggests irq is disabled because if otherwise the The character '-' suggests irq is disabled because if otherwise the
charactor '?' would have been shown instead. Similar deduction can be charactor '?' would have been shown instead. Similar deduction can be
...@@ -113,7 +116,7 @@ is irq-unsafe means it was ever acquired with irq enabled. ...@@ -113,7 +116,7 @@ is irq-unsafe means it was ever acquired with irq enabled.
A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
following states must be exclusive: only one of them is allowed to be set following states must be exclusive: only one of them is allowed to be set
for any lock-class based on its usage: for any lock-class based on its usage::
<hardirq-safe> or <hardirq-unsafe> <hardirq-safe> or <hardirq-unsafe>
<softirq-safe> or <softirq-unsafe> <softirq-safe> or <softirq-unsafe>
...@@ -134,7 +137,7 @@ Multi-lock dependency rules: ...@@ -134,7 +137,7 @@ Multi-lock dependency rules:
The same lock-class must not be acquired twice, because this could lead The same lock-class must not be acquired twice, because this could lead
to lock recursion deadlocks. to lock recursion deadlocks.
Furthermore, two locks can not be taken in inverse order: Furthermore, two locks can not be taken in inverse order::
<L1> -> <L2> <L1> -> <L2>
<L2> -> <L1> <L2> -> <L1>
...@@ -148,7 +151,7 @@ operations; the validator will still find whether these locks can be ...@@ -148,7 +151,7 @@ operations; the validator will still find whether these locks can be
acquired in a circular fashion. acquired in a circular fashion.
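As a rough illustration of the ordering rule, the sketch below records "held -> acquired" dependencies between lock-classes and refuses any new dependency that would close a cycle. It is a simplified userspace model under stated assumptions (a fixed ``MAX_CLASSES``, integer class ids, a plain adjacency matrix); the names ``dep_graph``, ``dep_graph_init()`` and ``dep_graph_add()`` are hypothetical and are not the validator's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 16	/* arbitrary limit for this sketch */

/* dep[a][b] is true once class b was acquired while class a was held. */
struct dep_graph {
	bool dep[MAX_CLASSES][MAX_CLASSES];
};

void dep_graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(const struct dep_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record the dependency "held -> acquired". Returns false, and records
 * nothing, if the new edge would close a cycle. Acquiring the same class
 * twice (held == acquired) is rejected as well.
 */
bool dep_graph_add(struct dep_graph *g, int held, int acquired)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(acquired >= 0 && acquired < MAX_CLASSES);

	if (reachable(g, acquired, held, seen))
		return false;
	g->dep[held][acquired] = true;
	return true;
}
```

With this model, recording ``<L1> -> <L2>`` succeeds and a later ``<L2> -> <L1>`` is refused, even if other lock sequences were recorded in between.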
Furthermore, the following usage based lock dependencies are not allowed Furthermore, the following usage based lock dependencies are not allowed
between any two lock-classes: between any two lock-classes::
<hardirq-safe> -> <hardirq-unsafe> <hardirq-safe> -> <hardirq-unsafe>
<softirq-safe> -> <softirq-unsafe> <softirq-safe> -> <softirq-unsafe>
...@@ -204,16 +207,16 @@ the ordering is not static. ...@@ -204,16 +207,16 @@ the ordering is not static.
In order to teach the validator about this correct usage model, new In order to teach the validator about this correct usage model, new
versions of the various locking primitives were added that allow you to versions of the various locking primitives were added that allow you to
specify a "nesting level". An example call, for the block device mutex, specify a "nesting level". An example call, for the block device mutex,
looks like this: looks like this::
enum bdev_bd_mutex_lock_class enum bdev_bd_mutex_lock_class
{ {
BD_MUTEX_NORMAL, BD_MUTEX_NORMAL,
BD_MUTEX_WHOLE, BD_MUTEX_WHOLE,
BD_MUTEX_PARTITION BD_MUTEX_PARTITION
}; };
mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION); mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
In this case the locking is done on a bdev object that is known to be a In this case the locking is done on a bdev object that is known to be a
partition. partition.
...@@ -234,7 +237,7 @@ must be held: lockdep_assert_held*(&lock) and lockdep_*pin_lock(&lock). ...@@ -234,7 +237,7 @@ must be held: lockdep_assert_held*(&lock) and lockdep_*pin_lock(&lock).
As the name suggests, lockdep_assert_held* family of macros assert that a As the name suggests, lockdep_assert_held* family of macros assert that a
particular lock is held at a certain time (and generate a WARN() otherwise). particular lock is held at a certain time (and generate a WARN() otherwise).
This annotation is largely used all over the kernel, e.g. kernel/sched/ This annotation is largely used all over the kernel, e.g. kernel/sched/
core.c core.c::
void update_rq_clock(struct rq *rq) void update_rq_clock(struct rq *rq)
{ {
...@@ -253,7 +256,7 @@ out to be especially helpful to debug code with callbacks, where an upper ...@@ -253,7 +256,7 @@ out to be especially helpful to debug code with callbacks, where an upper
layer assumes a lock remains taken, but a lower layer thinks it can maybe drop layer assumes a lock remains taken, but a lower layer thinks it can maybe drop
and reacquire the lock ("unwittingly" introducing races). lockdep_pin_lock() and reacquire the lock ("unwittingly" introducing races). lockdep_pin_lock()
returns a 'struct pin_cookie' that is then used by lockdep_unpin_lock() to check returns a 'struct pin_cookie' that is then used by lockdep_unpin_lock() to check
that nobody tampered with the lock, e.g. kernel/sched/sched.h that nobody tampered with the lock, e.g. kernel/sched/sched.h::
static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf) static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
{ {
...@@ -280,7 +283,7 @@ correctness) in the sense that for every simple, standalone single-task ...@@ -280,7 +283,7 @@ correctness) in the sense that for every simple, standalone single-task
locking sequence that occurred at least once during the lifetime of the locking sequence that occurred at least once during the lifetime of the
kernel, the validator proves it with a 100% certainty that no kernel, the validator proves it with a 100% certainty that no
combination and timing of these locking sequences can cause any class of combination and timing of these locking sequences can cause any class of
lock related deadlock. [*] lock related deadlock. [1]_
I.e. complex multi-CPU and multi-task locking scenarios do not have to I.e. complex multi-CPU and multi-task locking scenarios do not have to
occur in practice to prove a deadlock: only the simple 'component' occur in practice to prove a deadlock: only the simple 'component'
...@@ -299,7 +302,9 @@ possible combination of locking interaction between CPUs, combined with ...@@ -299,7 +302,9 @@ possible combination of locking interaction between CPUs, combined with
every possible hardirq and softirq nesting scenario (which is impossible every possible hardirq and softirq nesting scenario (which is impossible
to do in practice). to do in practice).
[*] assuming that the validator itself is 100% correct, and no other .. [1]
assuming that the validator itself is 100% correct, and no other
part of the system corrupts the state of the validator in any way. part of the system corrupts the state of the validator in any way.
We also assume that all NMI/SMM paths [which could interrupt We also assume that all NMI/SMM paths [which could interrupt
even hardirq-disabled codepaths] are correct and do not interfere even hardirq-disabled codepaths] are correct and do not interfere
...@@ -310,7 +315,7 @@ to do in practice). ...@@ -310,7 +315,7 @@ to do in practice).
Performance: Performance:
------------ ------------
The above rules require _massive_ amounts of runtime checking. If we did The above rules require **massive** amounts of runtime checking. If we did
that for every lock taken and for every irqs-enable event, it would that for every lock taken and for every irqs-enable event, it would
render the system practically unusably slow. The complexity of checking render the system practically unusably slow. The complexity of checking
is O(N^2), so even with just a few hundred lock-classes we'd have to do is O(N^2), so even with just a few hundred lock-classes we'd have to do
...@@ -369,17 +374,17 @@ be harder to do than to say. ...@@ -369,17 +374,17 @@ be harder to do than to say.
Of course, if you do run out of lock classes, the next thing to do is Of course, if you do run out of lock classes, the next thing to do is
to find the offending lock classes. First, the following command gives to find the offending lock classes. First, the following command gives
you the number of lock classes currently in use along with the maximum: you the number of lock classes currently in use along with the maximum::
grep "lock-classes" /proc/lockdep_stats grep "lock-classes" /proc/lockdep_stats
This command produces the following output on a modest system: This command produces the following output on a modest system::
lock-classes: 748 [max: 8191] lock-classes: 748 [max: 8191]
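For monitoring scripts or test tools, a line of this shape can be parsed with ``sscanf()``. This is a small sketch that assumes the line looks like the sample output above; ``parse_lock_classes()`` is a hypothetical helper name, not an existing kernel or libc function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a "lock-classes: <used> [max: <max>]" line.
 * Returns 0 on success, -1 if the line does not match that shape.
 */
int parse_lock_classes(const char *line, unsigned long *used,
		       unsigned long *max)
{
	assert(line && used && max);

	/* Spaces in the format match any amount of whitespace in the input. */
	if (sscanf(line, " lock-classes: %lu [max: %lu]", used, max) != 2)
		return -1;
	return 0;
}
```

Sampling the "used" value periodically and comparing successive readings is one way to notice the steady growth that the text describes as a likely leak.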
If the number allocated (748 above) increases continually over time, If the number allocated (748 above) increases continually over time,
then there is likely a leak. The following command can be used to then there is likely a leak. The following command can be used to
identify the leaking lock classes: identify the leaking lock classes::
grep "BD" /proc/lockdep grep "BD" /proc/lockdep
......
==================================
Kernel Lock Torture Test Operation Kernel Lock Torture Test Operation
==================================
CONFIG_LOCK_TORTURE_TEST CONFIG_LOCK_TORTURE_TEST
========================
The CONFIG LOCK_TORTURE_TEST config option provides a kernel module The CONFIG LOCK_TORTURE_TEST config option provides a kernel module
that runs torture tests on core kernel locking primitives. The kernel that runs torture tests on core kernel locking primitives. The kernel
...@@ -18,61 +21,77 @@ can be simulated by either enlarging this critical region hold time and/or ...@@ -18,61 +21,77 @@ can be simulated by either enlarging this critical region hold time and/or
creating more kthreads. creating more kthreads.
MODULE PARAMETERS Module Parameters
=================
This module has the following parameters: This module has the following parameters:
** Locktorture-specific ** Locktorture-specific
--------------------
nwriters_stress Number of kernel threads that will stress exclusive lock nwriters_stress
Number of kernel threads that will stress exclusive lock
ownership (writers). The default value is twice the number ownership (writers). The default value is twice the number
of online CPUs. of online CPUs.
nreaders_stress Number of kernel threads that will stress shared lock nreaders_stress
Number of kernel threads that will stress shared lock
ownership (readers). The default is the same amount of writer ownership (readers). The default is the same amount of writer
locks. If the user did not specify nwriters_stress, then locks. If the user did not specify nwriters_stress, then
both readers and writers be the amount of online CPUs. both readers and writers be the amount of online CPUs.
torture_type Type of lock to torture. By default, only spinlocks will torture_type
Type of lock to torture. By default, only spinlocks will
be tortured. This module can torture the following locks, be tortured. This module can torture the following locks,
with string values as follows: with string values as follows:
o "lock_busted": Simulates a buggy lock implementation. - "lock_busted":
Simulates a buggy lock implementation.
o "spin_lock": spin_lock() and spin_unlock() pairs. - "spin_lock":
spin_lock() and spin_unlock() pairs.
o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq() - "spin_lock_irq":
pairs. spin_lock_irq() and spin_unlock_irq() pairs.
o "rw_lock": read/write lock() and unlock() rwlock pairs. - "rw_lock":
read/write lock() and unlock() rwlock pairs.
o "rw_lock_irq": read/write lock_irq() and unlock_irq() - "rw_lock_irq":
read/write lock_irq() and unlock_irq()
rwlock pairs. rwlock pairs.
o "mutex_lock": mutex_lock() and mutex_unlock() pairs. - "mutex_lock":
mutex_lock() and mutex_unlock() pairs.
o "rtmutex_lock": rtmutex_lock() and rtmutex_unlock() - "rtmutex_lock":
pairs. Kernel must have CONFIG_RT_MUTEX=y. rtmutex_lock() and rtmutex_unlock() pairs.
Kernel must have CONFIG_RT_MUTEX=y.
o "rwsem_lock": read/write down() and up() semaphore pairs. - "rwsem_lock":
read/write down() and up() semaphore pairs.
** Torture-framework (RCU + locking) ** Torture-framework (RCU + locking)
---------------------------------
shutdown_secs The number of seconds to run the test before terminating shutdown_secs
The number of seconds to run the test before terminating
the test and powering off the system. The default is the test and powering off the system. The default is
zero, which disables test termination and system shutdown. zero, which disables test termination and system shutdown.
This capability is useful for automated testing. This capability is useful for automated testing.
onoff_interval The number of seconds between each attempt to execute a onoff_interval
The number of seconds between each attempt to execute a
randomly selected CPU-hotplug operation. Defaults randomly selected CPU-hotplug operation. Defaults
to zero, which disables CPU hotplugging. In to zero, which disables CPU hotplugging. In
CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently
refuse to do any CPU-hotplug operations regardless of refuse to do any CPU-hotplug operations regardless of
what value is specified for onoff_interval. what value is specified for onoff_interval.
onoff_holdoff The number of seconds to wait until starting CPU-hotplug onoff_holdoff
The number of seconds to wait until starting CPU-hotplug
operations. This would normally only be used when operations. This would normally only be used when
locktorture was built into the kernel and started locktorture was built into the kernel and started
automatically at boot time, in which case it is useful automatically at boot time, in which case it is useful
...@@ -80,53 +99,59 @@ onoff_holdoff The number of seconds to wait until starting CPU-hotplug ...@@ -80,53 +99,59 @@ onoff_holdoff The number of seconds to wait until starting CPU-hotplug
coming and going. This parameter is only useful if coming and going. This parameter is only useful if
CONFIG_HOTPLUG_CPU is enabled. CONFIG_HOTPLUG_CPU is enabled.
stat_interval Number of seconds between statistics-related printk()s. stat_interval
Number of seconds between statistics-related printk()s.
By default, locktorture will report stats every 60 seconds. By default, locktorture will report stats every 60 seconds.
Setting the interval to zero causes the statistics to Setting the interval to zero causes the statistics to
be printed -only- when the module is unloaded, and this be printed -only- when the module is unloaded, and this
is the default. is the default.
stutter The length of time to run the test before pausing for this stutter
The length of time to run the test before pausing for this
same period of time. Defaults to "stutter=5", so as same period of time. Defaults to "stutter=5", so as
to run and pause for (roughly) five-second intervals. to run and pause for (roughly) five-second intervals.
Specifying "stutter=0" causes the test to run continuously Specifying "stutter=0" causes the test to run continuously
without pausing, which is the old default behavior. without pausing, which is the old default behavior.
shuffle_interval The number of seconds to keep the test threads affinitied shuffle_interval
The number of seconds to keep the test threads affinitied
to a particular subset of the CPUs, defaults to 3 seconds. to a particular subset of the CPUs, defaults to 3 seconds.
Used in conjunction with test_no_idle_hz. Used in conjunction with test_no_idle_hz.
verbose Enable verbose debugging printing, via printk(). Enabled verbose
Enable verbose debugging printing, via printk(). Enabled
by default. This extra information is mostly related to by default. This extra information is mostly related to
high-level errors and reports from the main 'torture' high-level errors and reports from the main 'torture'
framework. framework.
STATISTICS Statistics
==========
Statistics are printed in the following format: Statistics are printed in the following format::
spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0 spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0
(A) (B) (C) (D) (E) (A) (B) (C) (D) (E)
(A): Lock type that is being tortured -- torture_type parameter. (A): Lock type that is being tortured -- torture_type parameter.
(B): Number of writer lock acquisitions. If dealing with a read/write primitive (B): Number of writer lock acquisitions. If dealing with a read/write
a second "Reads" statistics line is printed. primitive a second "Reads" statistics line is printed.
(C): Number of times the lock was acquired. (C): Number of times the lock was acquired.
(D): Min and max number of times threads failed to acquire the lock. (D): Min and max number of times threads failed to acquire the lock.
(E): true/false values if there were errors acquiring the lock. This should (E): true/false values if there were errors acquiring the lock. This should
-only- be positive if there is a bug in the locking primitive's -only- be positive if there is a bug in the locking primitive's
implementation. Otherwise a lock should never fail (i.e., spin_lock()). implementation. Otherwise a lock should never fail (i.e., spin_lock()).
Of course, the same applies for (C), above. A dummy example of this is Of course, the same applies for (C), above. A dummy example of this is
the "lock_busted" type. the "lock_busted" type.
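A test harness that collects these lines could split them into fields as sketched below. The sketch assumes the exact layout of the sample "Writes" line shown above; the struct ``torture_stats`` and the function ``parse_torture_writes()`` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one "Writes" statistics line (layout assumed from the sample). */
struct torture_stats {
	char type[32];		/* text before "-torture:", e.g. "spin_lock" */
	long long total;	/* value after "Total:" */
	long max;		/* first value after "Max/Min:" */
	long min;		/* second value after "Max/Min:" */
	long fail;		/* value after "Fail:" */
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
int parse_torture_writes(const char *line, struct torture_stats *st)
{
	int n;

	assert(line && st);

	/* %31[^-] reads the lock type up to the '-' that precedes "torture:". */
	n = sscanf(line,
		   " %31[^-]-torture: Writes: Total: %lld Max/Min: %ld/%ld Fail: %ld",
		   st->type, &st->total, &st->max, &st->min, &st->fail);
	return n == 5 ? 0 : -1;
}
```

A nonzero ``fail`` field is the value an automated run would flag, since the text above says it should only be set when the locking primitive itself is buggy.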
USAGE Usage
=====
The following script may be used to torture locks: The following script may be used to torture locks::
#!/bin/sh #!/bin/sh
......
=======================
Generic Mutex Subsystem Generic Mutex Subsystem
=======================
started by Ingo Molnar <mingo@redhat.com> started by Ingo Molnar <mingo@redhat.com>
updated by Davidlohr Bueso <davidlohr@hp.com> updated by Davidlohr Bueso <davidlohr@hp.com>
What are mutexes? What are mutexes?
...@@ -23,7 +26,7 @@ Implementation ...@@ -23,7 +26,7 @@ Implementation
Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
and implemented in kernel/locking/mutex.c. These locks use an atomic variable and implemented in kernel/locking/mutex.c. These locks use an atomic variable
(->owner) to keep track of the lock state during its lifetime. Field owner (->owner) to keep track of the lock state during its lifetime. Field owner
actually contains 'struct task_struct *' to the current lock owner and it is actually contains `struct task_struct *` to the current lock owner and it is
therefore NULL if not currently owned. Since task_struct pointers are aligned therefore NULL if not currently owned. Since task_struct pointers are aligned
at at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g., at at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g.,
if waiter list is non-empty). In its most basic form it also includes a if waiter list is non-empty). In its most basic form it also includes a
...@@ -101,29 +104,36 @@ features that make lock debugging easier and faster: ...@@ -101,29 +104,36 @@ features that make lock debugging easier and faster:
Interfaces Interfaces
---------- ----------
Statically define the mutex: Statically define the mutex::
DEFINE_MUTEX(name); DEFINE_MUTEX(name);
Dynamically initialize the mutex: Dynamically initialize the mutex::
mutex_init(mutex); mutex_init(mutex);
Acquire the mutex, uninterruptible: Acquire the mutex, uninterruptible::
void mutex_lock(struct mutex *lock); void mutex_lock(struct mutex *lock);
void mutex_lock_nested(struct mutex *lock, unsigned int subclass); void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
int mutex_trylock(struct mutex *lock); int mutex_trylock(struct mutex *lock);
Acquire the mutex, interruptible: Acquire the mutex, interruptible::
int mutex_lock_interruptible_nested(struct mutex *lock, int mutex_lock_interruptible_nested(struct mutex *lock,
unsigned int subclass); unsigned int subclass);
int mutex_lock_interruptible(struct mutex *lock); int mutex_lock_interruptible(struct mutex *lock);
Acquire the mutex, interruptible, if dec to 0: Acquire the mutex, interruptible, if dec to 0::
int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
Unlock the mutex: Unlock the mutex::
void mutex_unlock(struct mutex *lock); void mutex_unlock(struct mutex *lock);
Test if the mutex is taken: Test if the mutex is taken::
int mutex_is_locked(struct mutex *lock); int mutex_is_locked(struct mutex *lock);
Disadvantages Disadvantages
......
# ==============================
# Copyright (c) 2006 Steven Rostedt
# Licensed under the GNU Free Documentation License, Version 1.2
#
RT-mutex implementation design RT-mutex implementation design
------------------------------ ==============================
Copyright (c) 2006 Steven Rostedt
Licensed under the GNU Free Documentation License, Version 1.2
This document tries to describe the design of the rtmutex.c implementation. This document tries to describe the design of the rtmutex.c implementation.
It doesn't describe the reasons why rtmutex.c exists. For that please see It doesn't describe the reasons why rtmutex.c exists. For that please see
Documentation/locking/rt-mutex.txt. Although this document does explain problems Documentation/locking/rt-mutex.rst. Although this document does explain problems
that happen without this code, but that is in the concept to understand that happen without this code, but that is in the concept to understand
what the code actually is doing. what the code actually is doing.
...@@ -41,16 +42,16 @@ to release the lock, because for all we know, B is a CPU hog and will ...@@ -41,16 +42,16 @@ to release the lock, because for all we know, B is a CPU hog and will
never give C a chance to release the lock. This is called unbounded priority never give C a chance to release the lock. This is called unbounded priority
inversion. inversion.
Here's a little ASCII art to show the problem. Here's a little ASCII art to show the problem::
grab lock L1 (owned by C) grab lock L1 (owned by C)
| |
A ---+ A ---+
C preempted by B C preempted by B
| |
C +----+ C +----+
B +--------> B +-------->
B now keeps A from running. B now keeps A from running.
...@@ -75,24 +76,29 @@ Terminology ...@@ -75,24 +76,29 @@ Terminology
Here I explain some terminology that is used in this document to help describe Here I explain some terminology that is used in this document to help describe
the design that is used to implement PI. the design that is used to implement PI.
PI chain - The PI chain is an ordered series of locks and processes that cause PI chain
- The PI chain is an ordered series of locks and processes that cause
processes to inherit priorities from a previous process that is processes to inherit priorities from a previous process that is
blocked on one of its locks. This is described in more detail blocked on one of its locks. This is described in more detail
later in this document. later in this document.
mutex - In this document, to differentiate from locks that implement mutex
- In this document, to differentiate from locks that implement
PI and spin locks that are used in the PI code, from now on PI and spin locks that are used in the PI code, from now on
the PI locks will be called a mutex. the PI locks will be called a mutex.
lock - In this document from now on, I will use the term lock when lock
- In this document from now on, I will use the term lock when
referring to spin locks that are used to protect parts of the PI referring to spin locks that are used to protect parts of the PI
algorithm. These locks disable preemption for UP (when algorithm. These locks disable preemption for UP (when
CONFIG_PREEMPT is enabled) and on SMP prevents multiple CPUs from CONFIG_PREEMPT is enabled) and on SMP prevents multiple CPUs from
entering critical sections simultaneously. entering critical sections simultaneously.
spin lock - Same as lock above. spin lock
- Same as lock above.
waiter - A waiter is a struct that is stored on the stack of a blocked waiter
- A waiter is a struct that is stored on the stack of a blocked
process. Since the scope of the waiter is within the code for process. Since the scope of the waiter is within the code for
a process being blocked on the mutex, it is fine to allocate a process being blocked on the mutex, it is fine to allocate
the waiter on the process's stack (local variable). This the waiter on the process's stack (local variable). This
...@@ -104,14 +110,18 @@ waiter - A waiter is a struct that is stored on the stack of a blocked ...@@ -104,14 +110,18 @@ waiter - A waiter is a struct that is stored on the stack of a blocked
waiter is sometimes used in reference to the task that is waiting waiter is sometimes used in reference to the task that is waiting
on a mutex. This is the same as waiter->task. on a mutex. This is the same as waiter->task.
waiters - A list of processes that are blocked on a mutex. waiters
- A list of processes that are blocked on a mutex.
top waiter - The highest priority process waiting on a specific mutex. top waiter
- The highest priority process waiting on a specific mutex.
top pi waiter - The highest priority process waiting on one of the mutexes top pi waiter
- The highest priority process waiting on one of the mutexes
that a specific process owns. that a specific process owns.
Note: task and process are used interchangeably in this document, mostly to Note:
task and process are used interchangeably in this document, mostly to
differentiate between two processes that are being described together. differentiate between two processes that are being described together.
...@@ -123,7 +133,7 @@ inheritance to take place. Multiple chains may converge, but a chain ...@@ -123,7 +133,7 @@ inheritance to take place. Multiple chains may converge, but a chain
would never diverge, since a process can't be blocked on more than one would never diverge, since a process can't be blocked on more than one
mutex at a time. mutex at a time.
Example: Example::
Process: A, B, C, D, E Process: A, B, C, D, E
Mutexes: L1, L2, L3, L4 Mutexes: L1, L2, L3, L4
...@@ -137,21 +147,21 @@ Example: ...@@ -137,21 +147,21 @@ Example:
D owns L4 D owns L4
E blocked on L4 E blocked on L4
The chain would be: The chain would be::
E->L4->D->L3->C->L2->B->L1->A E->L4->D->L3->C->L2->B->L1->A
To show where two chains merge, we could add another process F and To show where two chains merge, we could add another process F and
another mutex L5 where B owns L5 and F is blocked on mutex L5. another mutex L5 where B owns L5 and F is blocked on mutex L5.
The chain for F would be: The chain for F would be::
F->L5->B->L1->A F->L5->B->L1->A
Since a process may own more than one mutex, but never be blocked on more than Since a process may own more than one mutex, but never be blocked on more than
one, the chains merge. one, the chains merge.
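The chains above can be modelled with two small structs: a task is blocked on at most one mutex, and a mutex has at most one owner. The following userspace sketch walks such a chain and raises the priority of every owner along the way. It is an illustration under stated assumptions, not the rtmutex.c implementation: the names ``pi_task``, ``pi_mutex``, ``pi_chain_end()`` and ``pi_boost_chain()`` are hypothetical, a larger ``prio`` value is assumed to mean a higher priority, and the chain is assumed to contain no cycle (no deadlock).

```c
#include <assert.h>
#include <stddef.h>

struct pi_mutex;

struct pi_task {
	const char *name;
	int prio;			/* assumption: larger means higher priority */
	struct pi_mutex *blocked_on;	/* at most one mutex, or NULL */
};

struct pi_mutex {
	const char *name;
	struct pi_task *owner;		/* NULL if the mutex is not owned */
};

/* Follow task -> mutex -> owner links and return the last task in the chain. */
struct pi_task *pi_chain_end(struct pi_task *task)
{
	assert(task);
	while (task->blocked_on && task->blocked_on->owner)
		task = task->blocked_on->owner;
	return task;
}

/*
 * Walk the chain starting at 'waiter' and make sure every owner runs at
 * least at the highest priority seen so far. Returns the number of mutexes
 * that were traversed.
 */
int pi_boost_chain(struct pi_task *waiter)
{
	struct pi_task *task = waiter;
	int prio, hops = 0;

	assert(waiter);
	prio = waiter->prio;
	while (task->blocked_on && task->blocked_on->owner) {
		task = task->blocked_on->owner;
		if (task->prio < prio)
			task->prio = prio;	/* inherit the higher priority */
		else
			prio = task->prio;	/* propagate the owner's own priority */
		hops++;
	}
	return hops;
}
```

For the example in the text, starting at E traverses L4, L3, L2 and L1 and ends at A, while starting at F traverses L5 and L1 and ends at the same task A, which is where the two chains merge.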
Here we show both chains: Here we show both chains::
E->L4->D->L3->C->L2-+ E->L4->D->L3->C->L2-+
| |
...@@ -165,12 +175,12 @@ than the processes to the left or below in the chain. ...@@ -165,12 +175,12 @@ than the processes to the left or below in the chain.
Also since a mutex may have more than one process blocked on it, we can Also since a mutex may have more than one process blocked on it, we can
have multiple chains merge at mutexes. If we add another process G that is have multiple chains merge at mutexes. If we add another process G that is
blocked on mutex L2: blocked on mutex L2::
G->L2->B->L1->A G->L2->B->L1->A
And once again, to show how this can grow I will show the merging chains And once again, to show how this can grow I will show the merging chains
again. again::
E->L4->D->L3->C-+ E->L4->D->L3->C-+
+->L2-+ +->L2-+
...@@ -184,7 +194,7 @@ the chain (A and B in this example), must have their priorities increased ...@@ -184,7 +194,7 @@ the chain (A and B in this example), must have their priorities increased
to that of G. to that of G.
Mutex Waiters Tree Mutex Waiters Tree
----------------- ------------------
Every mutex keeps track of all the waiters that are blocked on itself. The Every mutex keeps track of all the waiters that are blocked on itself. The
mutex has an rbtree to store these waiters by priority. This tree is protected mutex has an rbtree to store these waiters by priority. This tree is protected
...@@ -219,19 +229,19 @@ defined. But is very complex to figure it out, since it depends on all ...@@ -219,19 +229,19 @@ defined. But is very complex to figure it out, since it depends on all
the nesting of mutexes. Let's look at the example where we have 3 mutexes, the nesting of mutexes. Let's look at the example where we have 3 mutexes,
L1, L2, and L3, and four separate functions func1, func2, func3 and func4. L1, L2, and L3, and four separate functions func1, func2, func3 and func4.
The following shows a locking order of L1->L2->L3, but may not actually The following shows a locking order of L1->L2->L3, but may not actually
be directly nested that way. be directly nested that way::
void func1(void) void func1(void)
{ {
mutex_lock(L1); mutex_lock(L1);
/* do anything */ /* do anything */
mutex_unlock(L1); mutex_unlock(L1);
} }
void func2(void) void func2(void)
{ {
mutex_lock(L1); mutex_lock(L1);
mutex_lock(L2); mutex_lock(L2);
...@@ -239,10 +249,10 @@ void func2(void) ...@@ -239,10 +249,10 @@ void func2(void)
mutex_unlock(L2); mutex_unlock(L2);
mutex_unlock(L1); mutex_unlock(L1);
} }
void func3(void) void func3(void)
{ {
mutex_lock(L2); mutex_lock(L2);
mutex_lock(L3); mutex_lock(L3);
...@@ -250,30 +260,30 @@ void func3(void) ...@@ -250,30 +260,30 @@ void func3(void)
mutex_unlock(L3); mutex_unlock(L3);
mutex_unlock(L2); mutex_unlock(L2);
} }
void func4(void) void func4(void)
{ {
mutex_lock(L3); mutex_lock(L3);
/* do something again */ /* do something again */
mutex_unlock(L3); mutex_unlock(L3);
} }
Now we add 4 processes that run each of these functions separately. Now we add 4 processes that run each of these functions separately.
Processes A, B, C, and D which run functions func1, func2, func3 and func4 Processes A, B, C, and D which run functions func1, func2, func3 and func4
respectively, and such that D runs first and A last. With D being preempted respectively, and such that D runs first and A last. With D being preempted
in func4 in the "do something again" area, we have a locking that follows: in func4 in the "do something again" area, we have a locking that follows::
D owns L3 D owns L3
C blocked on L3 C blocked on L3
C owns L2 C owns L2
B blocked on L2 B blocked on L2
B owns L1 B owns L1
A blocked on L1 A blocked on L1
And thus we have the chain A->L1->B->L2->C->L3->D. And thus we have the chain A->L1->B->L2->C->L3->D.
This gives us a PI depth of 4 (four processes), but looking at any of the This gives us a PI depth of 4 (four processes), but looking at any of the
functions individually, it seems as though they only have at most a locking functions individually, it seems as though they only have at most a locking
...@@ -298,7 +308,7 @@ not true, the rtmutex.c code will be broken!), this allows for the least ...@@ -298,7 +308,7 @@ not true, the rtmutex.c code will be broken!), this allows for the least
significant bit to be used as a flag. Bit 0 is used as the "Has Waiters" significant bit to be used as a flag. Bit 0 is used as the "Has Waiters"
flag. It's set whenever there are waiters on a mutex. flag. It's set whenever there are waiters on a mutex.
See Documentation/locking/rt-mutex.txt for further details. See Documentation/locking/rt-mutex.rst for further details.
cmpxchg Tricks cmpxchg Tricks
-------------- --------------
...@@ -307,17 +317,17 @@ Some architectures implement an atomic cmpxchg (Compare and Exchange). This ...@@ -307,17 +317,17 @@ Some architectures implement an atomic cmpxchg (Compare and Exchange). This
is used (when applicable) to keep the fast path of grabbing and releasing is used (when applicable) to keep the fast path of grabbing and releasing
mutexes short. mutexes short.
cmpxchg is basically the following function performed atomically: cmpxchg is basically the following function performed atomically::
unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C) unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C)
{ {
unsigned long T = *A; unsigned long T = *A;
if (*A == *B) { if (*A == *B) {
*A = *C; *A = *C;
} }
return T; return T;
} }
#define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c) #define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c)
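For experimenting outside the kernel, roughly the same operation is available through C11 ``<stdatomic.h>``. The sketch below is a user-space analogue under stated assumptions, not the rtmutex fast path: ``try_fast_acquire()`` and ``try_fast_release()`` are hypothetical helpers, and an owner word of 0 is assumed to mean "free". Note that ``atomic_compare_exchange_strong()`` reports success as a boolean and writes the observed value back into ``expected``, instead of returning the old value as the pseudo-code above does.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Owner word: 0 means free, anything else identifies the owner. */
static bool try_fast_acquire(_Atomic uintptr_t *owner, uintptr_t me)
{
	uintptr_t expected = 0;

	/* Succeeds only if *owner was still 0; then *owner becomes me. */
	return atomic_compare_exchange_strong(owner, &expected, me);
}

static bool try_fast_release(_Atomic uintptr_t *owner, uintptr_t me)
{
	uintptr_t expected = me;

	/* Fails if the word changed, e.g. a flag bit was set meanwhile. */
	return atomic_compare_exchange_strong(owner, &expected, 0);
}

/* Count how many of the four steps behave as described. */
static int cmpxchg_example(void)
{
	_Atomic uintptr_t owner = 0;
	int ok = 0;

	ok += try_fast_acquire(&owner, 0x1000);		/* free: succeeds */
	ok += !try_fast_acquire(&owner, 0x2000);	/* owned: fails */
	ok += !try_fast_release(&owner, 0x2000);	/* not owner: fails */
	ok += try_fast_release(&owner, 0x1000);		/* owner: succeeds */
	return ok;
}
```

A failed fast-path attempt is not an error; in this model it only signals that the caller would have to fall back to a slower path.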
This is really nice to have, since it allows you to only update a variable This is really nice to have, since it allows you to only update a variable
if the variable is what you expect it to be. You know if it succeeded if if the variable is what you expect it to be. You know if it succeeded if
...@@ -352,9 +362,10 @@ Then rt_mutex_setprio is called to adjust the priority of the task to the ...@@ -352,9 +362,10 @@ Then rt_mutex_setprio is called to adjust the priority of the task to the
new priority. Note that rt_mutex_setprio is defined in kernel/sched/core.c new priority. Note that rt_mutex_setprio is defined in kernel/sched/core.c
to implement the actual change in priority. to implement the actual change in priority.
(Note: For the "prio" field in task_struct, the lower the number, the Note:
For the "prio" field in task_struct, the lower the number, the
higher the priority. A "prio" of 5 is of higher priority than a higher the priority. A "prio" of 5 is of higher priority than a
"prio" of 10.) "prio" of 10.
It is interesting to note that rt_mutex_adjust_prio can either increase It is interesting to note that rt_mutex_adjust_prio can either increase
or decrease the priority of the task. In the case that a higher priority or decrease the priority of the task. In the case that a higher priority
...@@ -439,6 +450,7 @@ wait_lock, which this code currently holds. So setting the "Has Waiters" flag ...@@ -439,6 +450,7 @@ wait_lock, which this code currently holds. So setting the "Has Waiters" flag
forces the current owner to synchronize with this code. forces the current owner to synchronize with this code.
The lock is taken if the following are true: The lock is taken if the following are true:
1) The lock has no owner 1) The lock has no owner
2) The current task is the highest priority against all other 2) The current task is the highest priority against all other
waiters of the lock waiters of the lock
...@@ -546,10 +558,13 @@ Credits ...@@ -546,10 +558,13 @@ Credits
------- -------
Author: Steven Rostedt <rostedt@goodmis.org> Author: Steven Rostedt <rostedt@goodmis.org>
Updated: Alex Shi <alex.shi@linaro.org> - 7/6/2017 Updated: Alex Shi <alex.shi@linaro.org> - 7/6/2017
Original Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and Original Reviewers:
Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and
Randy Dunlap Randy Dunlap
Update (7/6/2017) Reviewers: Steven Rostedt and Sebastian Siewior Update (7/6/2017) Reviewers: Steven Rostedt and Sebastian Siewior
Updates Updates
......
==================================
RT-mutex subsystem with PI support RT-mutex subsystem with PI support
---------------------------------- ==================================
RT-mutexes with priority inheritance are used to support PI-futexes, RT-mutexes with priority inheritance are used to support PI-futexes,
which enable pthread_mutex_t priority inheritance attributes which enable pthread_mutex_t priority inheritance attributes
...@@ -46,27 +47,30 @@ The state of the rt-mutex is tracked via the owner field of the rt-mutex ...@@ -46,27 +47,30 @@ The state of the rt-mutex is tracked via the owner field of the rt-mutex
structure: structure:
lock->owner holds the task_struct pointer of the owner. Bit 0 is used to lock->owner holds the task_struct pointer of the owner. Bit 0 is used to
keep track of the "lock has waiters" state. keep track of the "lock has waiters" state:
owner bit0 ============ ======= ================================================
owner bit0 Notes
============ ======= ================================================
NULL 0 lock is free (fast acquire possible) NULL 0 lock is free (fast acquire possible)
NULL 1 lock is free and has waiters and the top waiter NULL 1 lock is free and has waiters and the top waiter
is going to take the lock* is going to take the lock [1]_
taskpointer 0 lock is held (fast release possible) taskpointer 0 lock is held (fast release possible)
taskpointer 1 lock is held and has waiters** taskpointer 1 lock is held and has waiters [2]_
============ ======= ================================================
The fast atomic compare exchange based acquire and release is only The fast atomic compare exchange based acquire and release is only
possible when bit 0 of lock->owner is 0. possible when bit 0 of lock->owner is 0.
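The encoding in the table can be sketched as ordinary pointer tagging. The helper names below (``owner_encode()``, ``owner_task()``, ``owner_has_waiters()``, ``fast_path_possible()``) and ``struct demo_task`` are assumptions made for this example and do not match the kernel's own helpers; the only thing taken from the text is that bit 0 of the owner word carries the "lock has waiters" state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HAS_WAITERS ((uintptr_t)1)	/* bit 0 of the owner word */

struct demo_task {
	int id;
};

/* Pack an owner pointer and the waiters flag into one word. */
static uintptr_t owner_encode(struct demo_task *t, int has_waiters)
{
	uintptr_t p = (uintptr_t)t;

	/* Bit 0 must be free, i.e. the pointer is at least 2-byte aligned. */
	assert((p & HAS_WAITERS) == 0);
	return p | (has_waiters ? HAS_WAITERS : 0);
}

/* Strip the flag bit to recover the owner pointer (may be NULL). */
static struct demo_task *owner_task(uintptr_t word)
{
	return (struct demo_task *)(word & ~HAS_WAITERS);
}

static int owner_has_waiters(uintptr_t word)
{
	return (word & HAS_WAITERS) != 0;
}

/* The compare-exchange fast path is only usable while bit 0 is clear. */
static int fast_path_possible(uintptr_t word)
{
	return !owner_has_waiters(word);
}
```

The four rows of the table correspond to the four combinations of a NULL or non-NULL result from ``owner_task()`` with the two values of ``owner_has_waiters()``.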
(*) It also can be a transitional state when grabbing the lock .. [1] It also can be a transitional state when grabbing the lock
with ->wait_lock held. To prevent any fast path cmpxchg to the lock, with ->wait_lock held. To prevent any fast path cmpxchg to the lock,
we need to set the bit0 before looking at the lock, and the owner may be we need to set the bit0 before looking at the lock, and the owner may
NULL in this small time, hence this can be a transitional state. be NULL in this small time, hence this can be a transitional state.
(**) There is a small time when bit 0 is set but there are no .. [2] There is a small time when bit 0 is set but there are no
waiters. This can happen when grabbing the lock in the slow path. waiters. This can happen when grabbing the lock in the slow path.
To prevent a cmpxchg of the owner releasing the lock, we need to To prevent a cmpxchg of the owner releasing the lock, we need to
set this bit before looking at the lock. set this bit before looking at the lock.
BTW, there is still technically a "Pending Owner", it's just not called BTW, there is still technically a "Pending Owner", it's just not called
that anymore. The pending owner happens to be the top_waiter of a lock that anymore. The pending owner happens to be the top_waiter of a lock
......
===============
Locking lessons
===============
Lesson 1: Spin locks Lesson 1: Spin locks
====================
The most basic primitive for locking is spinlock. The most basic primitive for locking is spinlock::
static DEFINE_SPINLOCK(xxx_lock); static DEFINE_SPINLOCK(xxx_lock);
unsigned long flags; unsigned long flags;
...@@ -19,23 +24,25 @@ worry about UP vs SMP issues: the spinlocks work correctly under both. ...@@ -19,23 +24,25 @@ worry about UP vs SMP issues: the spinlocks work correctly under both.
NOTE! Implications of spin_locks for memory are further described in: NOTE! Implications of spin_locks for memory are further described in:
Documentation/memory-barriers.txt Documentation/memory-barriers.txt
(5) LOCK operations. (5) LOCK operations.
(6) UNLOCK operations. (6) UNLOCK operations.
The above is usually pretty simple (you usually need and want only one The above is usually pretty simple (you usually need and want only one
spinlock for most things - using more than one spinlock can make things a spinlock for most things - using more than one spinlock can make things a
lot more complex and even slower and is usually worth it only for lot more complex and even slower and is usually worth it only for
sequences that you _know_ need to be split up: avoid it at all cost if you sequences that you **know** need to be split up: avoid it at all cost if you
aren't sure). aren't sure).
This is really the only really hard part about spinlocks: once you start This is really the only really hard part about spinlocks: once you start
using spinlocks they tend to expand to areas you might not have noticed using spinlocks they tend to expand to areas you might not have noticed
before, because you have to make sure the spinlocks correctly protect the before, because you have to make sure the spinlocks correctly protect the
shared data structures _everywhere_ they are used. The spinlocks are most shared data structures **everywhere** they are used. The spinlocks are most
easily added to places that are completely independent of other code (for easily added to places that are completely independent of other code (for
example, internal driver data structures that nobody else ever touches). example, internal driver data structures that nobody else ever touches).
NOTE! The spin-lock is safe only when you _also_ use the lock itself NOTE! The spin-lock is safe only when you **also** use the lock itself
to do locking across CPU's, which implies that EVERYTHING that to do locking across CPU's, which implies that EVERYTHING that
touches a shared variable has to agree about the spinlock they want touches a shared variable has to agree about the spinlock they want
to use. to use.
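The rule that everything touching a shared variable must go through the same lock can be tried out in user space. The following is a minimal sketch built on C11 ``atomic_flag``; ``struct demo_spinlock`` and the ``demo_spin_*()`` and ``counter_*()`` helpers are invented for this illustration, and unlike ``spin_lock_irqsave()`` it does nothing about interrupts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spinlock; not the kernel's spinlock_t. */
struct demo_spinlock {
	atomic_flag locked;
};

static void demo_spin_lock(struct demo_spinlock *l)
{
	/* Spin until the flag was clear at the moment we set it. */
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;
}

static bool demo_spin_trylock(struct demo_spinlock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* The lock lives next to the data it protects. */
struct counter {
	struct demo_spinlock lock;
	long value;
};

/* Every access to ->value, reads included, takes the same lock. */
static void counter_add(struct counter *c, long n)
{
	demo_spin_lock(&c->lock);
	c->value += n;
	demo_spin_unlock(&c->lock);
}

static long counter_read(struct counter *c)
{
	long v;

	demo_spin_lock(&c->lock);
	v = c->value;
	demo_spin_unlock(&c->lock);
	return v;
}
```

Keeping the lock inside the structure and funnelling all accesses through two small helpers is one way to make sure no caller touches ``value`` without holding the lock.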
...@@ -43,6 +50,7 @@ example, internal driver data structures that nobody else ever touches). ...@@ -43,6 +50,7 @@ example, internal driver data structures that nobody else ever touches).
---- ----
Lesson 2: reader-writer spinlocks. Lesson 2: reader-writer spinlocks.
==================================
If your data accesses have a very natural pattern where you usually tend If your data accesses have a very natural pattern where you usually tend
to mostly read from the shared variables, the reader-writer locks to mostly read from the shared variables, the reader-writer locks
...@@ -54,7 +62,7 @@ to change the variables it has to get an exclusive write lock. ...@@ -54,7 +62,7 @@ to change the variables it has to get an exclusive write lock.
simple spinlocks. Unless the reader critical section is long, you simple spinlocks. Unless the reader critical section is long, you
are better off just using spinlocks. are better off just using spinlocks.
The routines look the same as above: The routines look the same as above::
rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock); rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
...@@ -71,7 +79,7 @@ The routines look the same as above: ...@@ -71,7 +79,7 @@ The routines look the same as above:
The above kind of lock may be useful for complex data structures like The above kind of lock may be useful for complex data structures like
linked lists, especially searching for entries without changing the list linked lists, especially searching for entries without changing the list
itself. The read lock allows many concurrent readers. Anything that itself. The read lock allows many concurrent readers. Anything that
_changes_ the list will have to get the write lock. **changes** the list will have to get the write lock.
NOTE! RCU is better for list traversal, but requires careful NOTE! RCU is better for list traversal, but requires careful
attention to design detail (see Documentation/RCU/listRCU.txt). attention to design detail (see Documentation/RCU/listRCU.txt).
...@@ -87,10 +95,11 @@ to get the write-lock at the very beginning. ...@@ -87,10 +95,11 @@ to get the write-lock at the very beginning.
---- ----
Lesson 3: spinlocks revisited. Lesson 3: spinlocks revisited.
==============================
The single spin-lock primitives above are by no means the only ones. They The single spin-lock primitives above are by no means the only ones. They
are the most safe ones, and the ones that work under all circumstances, are the most safe ones, and the ones that work under all circumstances,
but partly _because_ they are safe they are also fairly slow. They are slower but partly **because** they are safe they are also fairly slow. They are slower
than they'd need to be, because they do have to disable interrupts than they'd need to be, because they do have to disable interrupts
(which is just a single instruction on a x86, but it's an expensive one - (which is just a single instruction on a x86, but it's an expensive one -
and on other architectures it can be worse). and on other architectures it can be worse).
...@@ -98,7 +107,7 @@ and on other architectures it can be worse). ...@@ -98,7 +107,7 @@ and on other architectures it can be worse).
If you have a case where you have to protect a data structure across If you have a case where you have to protect a data structure across
several CPU's and you want to use spinlocks you can potentially use several CPU's and you want to use spinlocks you can potentially use
cheaper versions of the spinlocks. IFF you know that the spinlocks are cheaper versions of the spinlocks. IFF you know that the spinlocks are
never used in interrupt handlers, you can use the non-irq versions: never used in interrupt handlers, you can use the non-irq versions::
spin_lock(&lock); spin_lock(&lock);
... ...
...@@ -110,7 +119,7 @@ This is useful if you know that the data in question is only ever ...@@ -110,7 +119,7 @@ This is useful if you know that the data in question is only ever
manipulated from a "process context", ie no interrupts involved. manipulated from a "process context", ie no interrupts involved.
The reasons you mustn't use these versions if you have interrupts that The reasons you mustn't use these versions if you have interrupts that
play with the spinlock is that you can get deadlocks: play with the spinlock is that you can get deadlocks::
spin_lock(&lock); spin_lock(&lock);
... ...
...@@ -147,9 +156,10 @@ indeed), while write-locks need to protect themselves against interrupts. ...@@ -147,9 +156,10 @@ indeed), while write-locks need to protect themselves against interrupts.
---- ----
Reference information: Reference information:
======================
For dynamic initialization, use spin_lock_init() or rwlock_init() as For dynamic initialization, use spin_lock_init() or rwlock_init() as
appropriate: appropriate::
spinlock_t xxx_lock; spinlock_t xxx_lock;
rwlock_t xxx_rw_lock; rwlock_t xxx_rw_lock;
......
======================================
Wound/Wait Deadlock-Proof Mutex Design Wound/Wait Deadlock-Proof Mutex Design
====================================== ======================================
...@@ -85,6 +86,7 @@ Furthermore there are three different class of w/w lock acquire functions: ...@@ -85,6 +86,7 @@ Furthermore there are three different class of w/w lock acquire functions:
no deadlock potential and hence the ww_mutex_lock call will block and not no deadlock potential and hence the ww_mutex_lock call will block and not
prematurely return -EDEADLK. The advantage of the _slow functions is in prematurely return -EDEADLK. The advantage of the _slow functions is in
interface safety: interface safety:
- ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow
has a void return type. Note that since ww mutex code needs loops/retries has a void return type. Note that since ww mutex code needs loops/retries
anyway the __must_check doesn't result in spurious warnings, even though the anyway the __must_check doesn't result in spurious warnings, even though the
...@@ -115,36 +117,36 @@ expect the number of simultaneous competing transactions to be typically small, ...@@ -115,36 +117,36 @@ expect the number of simultaneous competing transactions to be typically small,
and you want to reduce the number of rollbacks. and you want to reduce the number of rollbacks.
Three different ways to acquire locks within the same w/w class. Common Three different ways to acquire locks within the same w/w class. Common
definitions for methods #1 and #2: definitions for methods #1 and #2::
static DEFINE_WW_CLASS(ww_class); static DEFINE_WW_CLASS(ww_class);
struct obj { struct obj {
struct ww_mutex lock; struct ww_mutex lock;
/* obj data */ /* obj data */
}; };
struct obj_entry { struct obj_entry {
struct list_head head; struct list_head head;
struct obj *obj; struct obj *obj;
}; };
Method 1, using a list in execbuf->buffers that's not allowed to be reordered. Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
This is useful if a list of required objects is already tracked somewhere. This is useful if a list of required objects is already tracked somewhere.
Furthermore the lock helper can propagate the -EALREADY return code back to Furthermore the lock helper can propagate the -EALREADY return code back to
the caller as a signal that an object is twice on the list. This is useful if the caller as a signal that an object is twice on the list. This is useful if
the list is constructed from userspace input and the ABI requires userspace to the list is constructed from userspace input and the ABI requires userspace to
not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl). not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl)::
int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{ {
struct obj *res_obj = NULL; struct obj *res_obj = NULL;
struct obj_entry *contended_entry = NULL; struct obj_entry *contended_entry = NULL;
struct obj_entry *entry; struct obj_entry *entry;
ww_acquire_init(ctx, &ww_class); ww_acquire_init(ctx, &ww_class);
retry: retry:
list_for_each_entry (entry, list, head) { list_for_each_entry (entry, list, head) {
if (entry->obj == res_obj) { if (entry->obj == res_obj) {
res_obj = NULL; res_obj = NULL;
...@@ -160,7 +162,7 @@ retry: ...@@ -160,7 +162,7 @@ retry:
ww_acquire_done(ctx); ww_acquire_done(ctx);
return 0; return 0;
err: err:
list_for_each_entry_continue_reverse (entry, list, head) list_for_each_entry_continue_reverse (entry, list, head)
ww_mutex_unlock(&entry->obj->lock); ww_mutex_unlock(&entry->obj->lock);
...@@ -176,14 +178,14 @@ err: ...@@ -176,14 +178,14 @@ err:
ww_acquire_fini(ctx); ww_acquire_fini(ctx);
return ret; return ret;
} }
Method 2, using a list in execbuf->buffers that can be reordered. Same semantics Method 2, using a list in execbuf->buffers that can be reordered. Same semantics
of duplicate entry detection using -EALREADY as method 1 above. But the of duplicate entry detection using -EALREADY as method 1 above. But the
list-reordering allows for a bit more idiomatic code. list-reordering allows for a bit more idiomatic code::
int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{ {
struct obj_entry *entry, *entry2; struct obj_entry *entry, *entry2;
ww_acquire_init(ctx, &ww_class); ww_acquire_init(ctx, &ww_class);
...@@ -216,24 +218,25 @@ int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) ...@@ -216,24 +218,25 @@ int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
ww_acquire_done(ctx); ww_acquire_done(ctx);
return 0; return 0;
} }
Unlocking works the same way for both methods #1 and #2: Unlocking works the same way for both methods #1 and #2::
void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{ {
struct obj_entry *entry; struct obj_entry *entry;
list_for_each_entry (entry, list, head) list_for_each_entry (entry, list, head)
ww_mutex_unlock(&entry->obj->lock); ww_mutex_unlock(&entry->obj->lock);
ww_acquire_fini(ctx); ww_acquire_fini(ctx);
} }
Method 3 is useful if the list of objects is constructed ad-hoc and not upfront, Method 3 is useful if the list of objects is constructed ad-hoc and not upfront,
e.g. when adjusting edges in a graph where each node has its own ww_mutex lock, e.g. when adjusting edges in a graph where each node has its own ww_mutex lock,
and edges can only be changed when holding the locks of all involved nodes. w/w and edges can only be changed when holding the locks of all involved nodes. w/w
mutexes are a natural fit for such a case for two reasons: mutexes are a natural fit for such a case for two reasons:
- They can handle lock-acquisition in any order which allows us to start walking - They can handle lock-acquisition in any order which allows us to start walking
a graph from a starting point and then iteratively discovering new edges and a graph from a starting point and then iteratively discovering new edges and
locking down the nodes those edges connect to. locking down the nodes those edges connect to.
...@@ -243,6 +246,7 @@ mutexes are a natural fit for such a case for two reasons: ...@@ -243,6 +246,7 @@ mutexes are a natural fit for such a case for two reasons:
as a starting point). as a starting point).
Note that this approach differs in two important ways from the above methods: Note that this approach differs in two important ways from the above methods:
- Since the list of objects is dynamically constructed (and might very well be - Since the list of objects is dynamically constructed (and might very well be
different when retrying due to hitting the -EDEADLK die condition) there's different when retrying due to hitting the -EDEADLK die condition) there's
no need to keep any object on a persistent list when it's not locked. We can no need to keep any object on a persistent list when it's not locked. We can
...@@ -260,17 +264,17 @@ any interface misuse for these cases. ...@@ -260,17 +264,17 @@ any interface misuse for these cases.
Also, method 3 can't fail the lock acquisition step since it doesn't return Also, method 3 can't fail the lock acquisition step since it doesn't return
-EALREADY. Of course this would be different when using the _interruptible -EALREADY. Of course this would be different when using the _interruptible
variants, but that's outside of the scope of these examples here. variants, but that's outside of the scope of these examples here::
struct obj { struct obj {
struct ww_mutex ww_mutex; struct ww_mutex ww_mutex;
struct list_head locked_list; struct list_head locked_list;
}; };
static DEFINE_WW_CLASS(ww_class); static DEFINE_WW_CLASS(ww_class);
void __unlock_objs(struct list_head *list) void __unlock_objs(struct list_head *list)
{ {
struct obj *entry, *temp; struct obj *entry, *temp;
list_for_each_entry_safe (entry, temp, list, locked_list) { list_for_each_entry_safe (entry, temp, list, locked_list) {
...@@ -279,15 +283,15 @@ void __unlock_objs(struct list_head *list) ...@@ -279,15 +283,15 @@ void __unlock_objs(struct list_head *list)
list_del(&entry->locked_list); list_del(&entry->locked_list);
ww_mutex_unlock(&entry->ww_mutex); ww_mutex_unlock(&entry->ww_mutex);
} }
} }
void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{ {
struct obj *obj; struct obj *obj;
ww_acquire_init(ctx, &ww_class); ww_acquire_init(ctx, &ww_class);
retry: retry:
/* re-init loop start state */ /* re-init loop start state */
loop { loop {
/* magic code which walks over a graph and decides which objects /* magic code which walks over a graph and decides which objects
...@@ -312,13 +316,13 @@ retry: ...@@ -312,13 +316,13 @@ retry:
ww_acquire_done(ctx); ww_acquire_done(ctx);
return 0; return 0;
} }
void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx) void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
{ {
__unlock_objs(list); __unlock_objs(list);
ww_acquire_fini(ctx); ww_acquire_fini(ctx);
} }
Method 4: Only lock one single object. In that case deadlock detection and Method 4: Only lock one single object. In that case deadlock detection and
prevention is obviously overkill, since with grabbing just one lock you can't prevention is obviously overkill, since with grabbing just one lock you can't
...@@ -329,11 +333,14 @@ Implementation Details ...@@ -329,11 +333,14 @@ Implementation Details
---------------------- ----------------------
Design: Design:
^^^^^^^
ww_mutex currently encapsulates a struct mutex, this means no extra overhead for ww_mutex currently encapsulates a struct mutex, this means no extra overhead for
normal mutex locks, which are far more common. As such there is only a small normal mutex locks, which are far more common. As such there is only a small
increase in code size if wait/wound mutexes are not used. increase in code size if wait/wound mutexes are not used.
We maintain the following invariants for the wait list: We maintain the following invariants for the wait list:
(1) Waiters with an acquire context are sorted by stamp order; waiters (1) Waiters with an acquire context are sorted by stamp order; waiters
without an acquire context are interspersed in FIFO order. without an acquire context are interspersed in FIFO order.
(2) For Wait-Die, among waiters with contexts, only the first one can have (2) For Wait-Die, among waiters with contexts, only the first one can have
...@@ -355,6 +362,8 @@ Design: ...@@ -355,6 +362,8 @@ Design:
therefore be directed towards the uncontended cases. therefore be directed towards the uncontended cases.
Lockdep: Lockdep:
^^^^^^^^
Special care has been taken to warn for as many cases of api abuse Special care has been taken to warn for as many cases of api abuse
as possible. Some common api abuses will be caught with as possible. Some common api abuses will be caught with
CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended. CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended.
...@@ -379,5 +388,6 @@ Lockdep: ...@@ -379,5 +388,6 @@ Lockdep:
having called ww_acquire_fini on the first. having called ww_acquire_fini on the first.
- 'normal' deadlocks that can occur. - 'normal' deadlocks that can occur.
FIXME: Update this section once we have the TASK_DEADLOCK task state flag magic FIXME:
implemented. Update this section once we have the TASK_DEADLOCK task state flag magic
implemented.
...@@ -119,4 +119,4 @@ properties of futexes, and all four combinations are possible: futex, ...@@ -119,4 +119,4 @@ properties of futexes, and all four combinations are possible: futex,
robust-futex, PI-futex, robust+PI-futex. robust-futex, PI-futex, robust+PI-futex.
More details about priority inheritance can be found in More details about priority inheritance can be found in
Documentation/locking/rt-mutex.txt. Documentation/locking/rt-mutex.rst.
...@@ -1404,7 +1404,7 @@ Riferimento per l'API dei Futex ...@@ -1404,7 +1404,7 @@ Riferimento per l'API dei Futex
Approfondimenti Approfondimenti
=============== ===============
- ``Documentation/locking/spinlocks.txt``: la guida di Linus Torvalds agli - ``Documentation/locking/spinlocks.rst``: la guida di Linus Torvalds agli
spinlock del kernel. spinlock del kernel.
- Unix Systems for Modern Architectures: Symmetric Multiprocessing and - Unix Systems for Modern Architectures: Symmetric Multiprocessing and
......
...@@ -36,7 +36,7 @@ ...@@ -36,7 +36,7 @@
* of extra utility/tracking out of our acquire-ctx. This is provided * of extra utility/tracking out of our acquire-ctx. This is provided
* by &struct drm_modeset_lock and &struct drm_modeset_acquire_ctx. * by &struct drm_modeset_lock and &struct drm_modeset_acquire_ctx.
* *
* For basic principles of &ww_mutex, see: Documentation/locking/ww-mutex-design.txt * For basic principles of &ww_mutex, see: Documentation/locking/ww-mutex-design.rst
* *
* The basic usage pattern is to:: * The basic usage pattern is to::
* *
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
* Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com> * Copyright (C) 2006,2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
* Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
* *
* see Documentation/locking/lockdep-design.txt for more details. * see Documentation/locking/lockdep-design.rst for more details.
*/ */
#ifndef __LINUX_LOCKDEP_H #ifndef __LINUX_LOCKDEP_H
#define __LINUX_LOCKDEP_H #define __LINUX_LOCKDEP_H
......
@@ -151,7 +151,7 @@ static inline bool mutex_is_locked(struct mutex *lock)
 /*
  * See kernel/locking/mutex.c for detailed documentation of these APIs.
- * Also see Documentation/locking/mutex-design.txt.
+ * Also see Documentation/locking/mutex-design.rst.
  */
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 extern void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
......
@@ -160,7 +160,7 @@ extern void downgrade_write(struct rw_semaphore *sem);
  * static then another method for expressing nested locking is
  * the explicit definition of lock class keys and the use of
  * lockdep_set_class() at lock initialization time.
- * See Documentation/locking/lockdep-design.txt for more details.)
+ * See Documentation/locking/lockdep-design.rst for more details.)
  */
 extern void down_read_nested(struct rw_semaphore *sem, int subclass);
 extern void down_write_nested(struct rw_semaphore *sem, int subclass);
......
@@ -16,7 +16,7 @@
  * by Steven Rostedt, based on work by Gregory Haskins, Peter Morreale
  * and Sven Dietrich.
  *
- * Also see Documentation/locking/mutex-design.txt.
+ * Also see Documentation/locking/mutex-design.rst.
  */
 #include <linux/mutex.h>
 #include <linux/ww_mutex.h>
......
@@ -9,7 +9,7 @@
  * Copyright (C) 2005 Kihon Technologies Inc., Steven Rostedt
  * Copyright (C) 2006 Esben Nielsen
  *
- * See Documentation/locking/rt-mutex-design.txt for details.
+ * See Documentation/locking/rt-mutex-design.rst for details.
  */
 #include <linux/spinlock.h>
 #include <linux/export.h>
......
@@ -1139,7 +1139,7 @@ config PROVE_LOCKING
 the proof of observed correctness is also maintained for an
 arbitrary combination of these separate locking variants.
-For more details, see Documentation/locking/lockdep-design.txt.
+For more details, see Documentation/locking/lockdep-design.rst.
 config LOCK_STAT
 bool "Lock usage statistics"
@@ -1153,7 +1153,7 @@ config LOCK_STAT
 help
 This feature enables tracking lock contention points
-For more details, see Documentation/locking/lockstat.txt
+For more details, see Documentation/locking/lockstat.rst
 This also enables lock events required by "perf lock",
 subcommand of perf.
......