Commit 8f0cb666 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:

 - kfree_rcu updates

 - RCU tasks updates

 - Read-side scalability tests

 - SRCU updates

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

* tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits)
  torture: Remove obsolete "cd $KVM"
  torture: Avoid duplicate specification of qemu command
  torture: Dump ftrace at shutdown only if requested
  torture: Add kvm-transform.sh script for qemu-cmd files
  torture: Add more tracing crib notes to kvm.sh
  torture: Improve diagnostic for KCSAN-incapable compilers
  torture: Correctly summarize build-only runs
  torture: Pass --kmake-arg to all make invocations
  rcutorture: Check for unwatched readers
  torture: Abstract out console-log error detection
  torture: Add a stop-run capability
  torture: Create qemu-cmd in --buildonly runs
  rcu/rcutorture: Replace 0 with false
  torture: Add --allcpus argument to the kvm.sh script
  torture: Remove whitespace from identify_qemu_vcpus output
  rcutorture: NULL rcu_torture_current earlier in cleanup code
  rcutorture: Handle non-statistic bang-string error messages
  torture: Set configfile variable to current scenario
  rcutorture: Add races with task-exit processing
  locktorture: Use true and false to assign to bool variables
  ...
parents 5ece0817 c1cc4784
...@@ -2583,7 +2583,12 @@ not work to have these markers in the trampoline itself, because there
would need to be instructions following ``rcu_read_unlock()``.  Although
``synchronize_rcu()`` would guarantee that execution reached the
``rcu_read_unlock()``, it would not be able to guarantee that execution
had completely left the trampoline.  Worse yet, in some situations
the trampoline's protection must extend a few instructions *prior* to
execution reaching the trampoline.  For example, these few instructions
might calculate the address of the trampoline, so that entering the
trampoline would be pre-ordained a surprisingly long time before execution
actually reached the trampoline itself.

The solution, in the form of `Tasks
RCU <https://lwn.net/Articles/607117/>`__, is to have implicit read-side
......
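To make the Tasks RCU solution concrete, here is a hedged sketch (not part of this merge) of how a tracer might tear down a trampoline; struct trampoline, unpatch_call_sites(), and trampoline_free() are hypothetical names, while synchronize_rcu_tasks() is the real Tasks RCU grace-period primitive::

    /* Illustration only: free a trampoline after a Tasks RCU grace period. */
    static void release_trampoline(struct trampoline *tp)
    {
        unpatch_call_sites(tp);     /* no task can newly commit to entering it */
        synchronize_rcu_tasks();    /* wait for every task to pass through a
                                     * voluntary context switch, so none is still
                                     * inside (or headed into) the trampoline */
        trampoline_free(tp);        /* now the trampoline's code may be reclaimed */
    }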
.. SPDX-License-Identifier: GPL-2.0
================================
Review Checklist for RCU Patches
================================
This document contains a checklist for producing and reviewing patches
...@@ -411,18 +415,21 @@ over a rather long period of time, but improvements are always welcome!
__rcu sparse checks to validate your RCU code.  These can help
find problems as follows:

CONFIG_PROVE_LOCKING:
check that accesses to RCU-protected data
structures are carried out under the proper RCU
read-side critical section, while holding the right
combination of locks, or whatever other conditions
are appropriate.

CONFIG_DEBUG_OBJECTS_RCU_HEAD:
check that you don't pass the
same object to call_rcu() (or friends) before an RCU
grace period has elapsed since the last time that you
passed that same object to call_rcu() (or friends).

__rcu sparse checks:
tag the pointer to the RCU-protected data
structure with __rcu, and sparse will warn you if you
access that pointer without the services of one of the
variants of rcu_dereference().
...@@ -442,8 +449,8 @@ over a rather long period of time, but improvements are always welcome!
You instead need to use one of the barrier functions:

- call_rcu() -> rcu_barrier()
- call_srcu() -> srcu_barrier()

However, these barrier functions are absolutely -not- guaranteed
to wait for a grace period.  In fact, if there are no call_rcu()
......
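As a reminder of what the two checklist arrows above mean in practice, here is a hedged sketch of a module-exit path; struct my_obj, my_cache, and my_unregister_hooks() are hypothetical, while call_rcu(), rcu_barrier(), and kmem_cache_destroy() are the real APIs::

    static void my_free_cb(struct rcu_head *rhp)
    {
        struct my_obj *p = container_of(rhp, struct my_obj, rh);

        kmem_cache_free(my_cache, p);
    }

    static void __exit my_exit(void)
    {
        my_unregister_hooks();          /* stop queueing new call_rcu() callbacks */
        rcu_barrier();                  /* wait for already-queued callbacks to run... */
        kmem_cache_destroy(my_cache);   /* ...before their code and data disappear */
    }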
.. SPDX-License-Identifier: GPL-2.0
.. _rcu_concepts:
============
...@@ -8,10 +10,17 @@ RCU concepts
:maxdepth: 3
arrayRCU
checklist
lockdep
lockdep-splat
rcubarrier
rcu_dereference
whatisRCU
rcu
rculist_nulls
rcuref
torture
stallwarn
listRCU
NMI-RCU
UP
......
.. SPDX-License-Identifier: GPL-2.0
=================
Lockdep-RCU Splat
=================
Lockdep-RCU was added to the Linux kernel in early 2010
(http://lwn.net/Articles/371986/).  This facility checks for some common
misuses of the RCU API, most notably using one of the rcu_dereference()
...@@ -12,55 +18,54 @@ overwriting or worse.  There can of course be false positives, this
being the real world and all that.

So let's look at an example RCU lockdep splat from 3.0-rc5, one that
has long since been fixed::

    =============================
    WARNING: suspicious RCU usage
    -----------------------------
    block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!

other info that might help us debug this::

    rcu_scheduler_active = 1, debug_locks = 0
    3 locks held by scsi_scan_6/1552:
    #0:  (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]
    scsi_scan_host_selected+0x5a/0x150
    #1:  (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]
    elevator_exit+0x22/0x60
    #2:  (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]
    cfq_exit_queue+0x43/0x190

    stack backtrace:
    Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
    Call Trace:
    [<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
    [<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
    [<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
    [<ffffffff812a5046>] elevator_exit+0x36/0x60
    [<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
    [<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
    [<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
    [<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
    [<ffffffff817da069>] ? error_exit+0x29/0xb0
    [<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
    [<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
    [<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
    [<ffffffff817da069>] ? error_exit+0x29/0xb0
    [<ffffffff812bcc60>] ? kobject_del+0x40/0x40
    [<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
    [<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
    [<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
    [<ffffffff8145f170>] do_scan_async+0x20/0x160
    [<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
    [<ffffffff810975b6>] kthread+0xa6/0xb0
    [<ffffffff817db154>] kernel_thread_helper+0x4/0x10
    [<ffffffff81066430>] ? finish_task_switch+0x80/0x110
    [<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
    [<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
    [<ffffffff817db150>] ? gs_change+0xb/0xb

Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows::

    if (rcu_dereference(ioc->ioc_data) == cic) {
...@@ -70,7 +75,7 @@ case.  Instead, we hold three locks, one of which might be RCU related.
And maybe that lock really does protect this reference.  If so, the fix
is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to
take the struct request_queue "q" from cfq_exit_queue() as an argument,
which would permit us to invoke rcu_dereference_protected as follows::

    if (rcu_dereference_protected(ioc->ioc_data,
                                  lockdep_is_held(&q->queue_lock)) == cic) {
...@@ -85,7 +90,7 @@ On the other hand, perhaps we really do need an RCU read-side critical
section.  In this case, the critical section must span the use of the
return value from rcu_dereference(), or at least until there is some
reference count incremented or some such.  One way to handle this is to
add rcu_read_lock() and rcu_read_unlock() as follows::

    rcu_read_lock();
    if (rcu_dereference(ioc->ioc_data) == cic) {
...@@ -102,7 +107,7 @@ above lockdep-RCU splat.
But in this particular case, we don't actually dereference the pointer
returned from rcu_dereference().  Instead, that pointer is just compared
to the cic pointer, which means that the rcu_dereference() can be replaced
by rcu_access_pointer() as follows::

    if (rcu_access_pointer(ioc->ioc_data) == cic) {
......
.. SPDX-License-Identifier: GPL-2.0
========================
RCU and lockdep checking
========================
All flavors of RCU have lockdep checking available, so that lockdep is
aware of when each task enters and leaves any flavor of RCU read-side
...@@ -8,7 +12,7 @@ tracking to include RCU state, which can sometimes help when debugging
deadlocks and the like.

In addition, RCU provides the following primitives that check lockdep's
state::

    rcu_read_lock_held() for normal RCU.
    rcu_read_lock_bh_held() for RCU-bh.
...@@ -63,7 +67,7 @@ checking of rcu_dereference() primitives:
The rcu_dereference_check() check expression can be any boolean
expression, but would normally include a lockdep expression.  However,
any boolean expression can be used.  For a moderately ornate example,
consider the following::

    file = rcu_dereference_check(fdt->fd[fd],
                                 lockdep_is_held(&files->file_lock) ||
...@@ -82,7 +86,7 @@ RCU read-side critical sections, in case (2) the ->file_lock prevents
any change from taking place, and finally, in case (3) the current task
is the only task accessing the file_struct, again preventing any change
from taking place.  If the above statement was invoked only from updater
code, it could instead be written as follows::

    file = rcu_dereference_protected(fdt->fd[fd],
                                     lockdep_is_held(&files->file_lock) ||
...@@ -105,7 +109,7 @@ false and they are called from outside any RCU read-side critical section.
For example, the workqueue for_each_pwq() macro is intended to be used
either within an RCU read-side critical section or with wq->mutex held.
It is thus implemented as follows::

    #define for_each_pwq(pwq, wq)
        list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node,
......
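A hedged sketch of how the lockdep-state helpers above are typically combined with rcu_dereference_check(); my_table, my_lock, and struct my_obj are hypothetical, while rcu_read_lock_held(), lockdep_is_held(), and rcu_dereference_check() are the real primitives::

    static struct my_obj __rcu *my_table[16];
    static DEFINE_SPINLOCK(my_lock);

    /* Legal either within an RCU read-side critical section or with my_lock held. */
    static struct my_obj *my_lookup(unsigned int idx)
    {
        return rcu_dereference_check(my_table[idx],
                                     lockdep_is_held(&my_lock));
    }

    /* Reader-only variant: complain once if the caller forgot rcu_read_lock(). */
    static struct my_obj *my_lookup_reader(unsigned int idx)
    {
        WARN_ON_ONCE(!rcu_read_lock_held());
        return rcu_dereference(my_table[idx]);
    }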
.. SPDX-License-Identifier: GPL-2.0

=================================================
Using RCU hlist_nulls to protect list and objects
=================================================

This section describes how to use hlist_nulls to
protect read-mostly linked lists and
objects using SLAB_TYPESAFE_BY_RCU allocations.

Please read the basics in Documentation/RCU/listRCU.rst

Using 'nulls'
=============

Using special markers (called 'nulls') is a convenient way
to solve the following problem:
...@@ -12,63 +22,68 @@ use following algos :

1) Lookup algo
--------------

::

  rcu_read_lock()
  begin:
  obj = lockless_lookup(key);
  if (obj) {
    if (!try_get_ref(obj)) // might fail for free objects
      goto begin;
    /*
     * Because a writer could delete object, and a writer could
     * reuse these object before the RCU grace period, we
     * must check key after getting the reference on object
     */
    if (obj->key != key) { // not the object we expected
      put_ref(obj);
      goto begin;
    }
  }
  rcu_read_unlock();
Beware that lockless_lookup(key) cannot use traditional hlist_for_each_entry_rcu()
but a version with an additional memory barrier (smp_rmb())

::

  lockless_lookup(key)
  {
    struct hlist_node *node, *next;
    for (pos = rcu_dereference((head)->first);
         pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
         ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
         pos = rcu_dereference(next))
      if (obj->key == key)
        return obj;
    return NULL;
  }

And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::

  struct hlist_node *node;
  for (pos = rcu_dereference((head)->first);
       pos && ({ prefetch(pos->next); 1; }) &&
       ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
       pos = rcu_dereference(pos->next))
    if (obj->key == key)
      return obj;
  return NULL;

Quoting Corey Minyard::

  "If the object is moved from one list to another list in-between the
   time the hash is calculated and the next field is accessed, and the
   object has moved to the end of a new list, the traversal will not
   complete properly on the list it should have, since the object will
   be on the end of the new list and there's not a way to tell it's on a
   new list and restart the list traversal. I think that this can be
   solved by pre-fetching the "next" field (with proper barriers) before
   checking the key."

2) Insert algo
--------------

We need to make sure a reader cannot read the new 'obj->obj_next' value
and previous value of 'obj->key'. Or else, an item could be deleted
...@@ -76,21 +91,23 @@ from a chain, and inserted into another chain. If new chain was empty
before the move, 'next' pointer is NULL, and lockless reader can
not detect it missed following items in original chain.

::

  /*
   * Please note that new inserts are done at the head of list,
   * not in the middle or end.
   */
  obj = kmem_cache_alloc(...);
  lock_chain(); // typically a spin_lock()
  obj->key = key;
  /*
   * we need to make sure obj->key is updated before obj->next
   * or obj->refcnt
   */
  smp_wmb();
  atomic_set(&obj->refcnt, 1);
  hlist_add_head_rcu(&obj->obj_node, list);
  unlock_chain(); // typically a spin_unlock()

3) Remove algo
...@@ -99,16 +116,22 @@ Nothing special here, we can use a standard RCU hlist deletion.
But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
very very fast (before the end of RCU grace period)

::

  if (put_last_reference_on(obj)) {
    lock_chain(); // typically a spin_lock()
    hlist_del_init_rcu(&obj->obj_node);
    unlock_chain(); // typically a spin_unlock()
    kmem_cache_free(cachep, obj);
  }

--------------------------------------------------------------------------

Avoiding extra smp_rmb()
========================

With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
and extra smp_wmb() in insert function.
...@@ -124,49 +147,54 @@ scan the list again without harm.

1) lookup algo
--------------

::

  head = &table[slot];
  rcu_read_lock();
  begin:
  hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
    if (obj->key == key) {
      if (!try_get_ref(obj)) // might fail for free objects
        goto begin;
      if (obj->key != key) { // not the object we expected
        put_ref(obj);
        goto begin;
      }
      goto out;
    }
  /*
   * if the nulls value we got at the end of this lookup is
   * not the expected one, we must restart lookup.
   * We probably met an item that was moved to another chain.
   */
  if (get_nulls_value(node) != slot)
    goto begin;
  obj = NULL;

  out:
  rcu_read_unlock();

2) Insert function
------------------

::

  /*
   * Please note that new inserts are done at the head of list,
   * not in the middle or end.
   */
  obj = kmem_cache_alloc(cachep);
  lock_chain(); // typically a spin_lock()
  obj->key = key;
  /*
   * changes to obj->key must be visible before refcnt one
   */
  smp_wmb();
  atomic_set(&obj->refcnt, 1);
  /*
   * insert obj in RCU way (readers might be traversing chain)
   */
  hlist_nulls_add_head_rcu(&obj->obj_node, list);
  unlock_chain(); // typically a spin_unlock()
......
.. SPDX-License-Identifier: GPL-2.0

====================================================================
Reference-count design for elements of lists/arrays protected by RCU
====================================================================

Please note that the percpu-ref feature is likely your first
...@@ -12,32 +16,33 @@ please read on.

Reference counting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores are straightforward:

CODE LISTING A::

    1.                                      2.
    add()                                   search_and_reference()
    {                                       {
        alloc_object                            read_lock(&list_lock);
        ...                                     search_for_element
        atomic_set(&el->rc, 1);                 atomic_inc(&el->rc);
        write_lock(&list_lock);                 ...
        add_element                             read_unlock(&list_lock);
        ...                                     ...
        write_unlock(&list_lock);           }
    }

    3.                                      4.
    release_referenced()                    delete()
    {                                       {
        ...                                     write_lock(&list_lock);
        if(atomic_dec_and_test(&el->rc))        ...
            kfree(el);                          remove_element
        ...                                     write_unlock(&list_lock);
    }                                           ...
                                                if (atomic_dec_and_test(&el->rc))
                                                    kfree(el);
                                                ...
                                            }

If this list/array is made lock free using RCU as in changing the
write_lock() in add() and delete() to spin_lock() and changing read_lock()
...@@ -46,34 +51,35 @@ search_and_reference() could potentially hold reference to an element which
has already been deleted from the list/array.  Use atomic_inc_not_zero()
in this scenario as follows:

CODE LISTING B::

    1.                                      2.
    add()                                   search_and_reference()
    {                                       {
        alloc_object                            rcu_read_lock();
        ...                                     search_for_element
        atomic_set(&el->rc, 1);                 if (!atomic_inc_not_zero(&el->rc)) {
        spin_lock(&list_lock);                      rcu_read_unlock();
                                                    return FAIL;
        add_element                             }
        ...                                     ...
        spin_unlock(&list_lock);                rcu_read_unlock();
    }                                       }

    3.                                      4.
    release_referenced()                    delete()
    {                                       {
        ...                                     spin_lock(&list_lock);
        if (atomic_dec_and_test(&el->rc))       ...
            call_rcu(&el->head, el_free);       remove_element
        ...                                     spin_unlock(&list_lock);
    }                                           ...
                                                if (atomic_dec_and_test(&el->rc))
                                                    call_rcu(&el->head, el_free);
                                                ...
                                            }

Sometimes, a reference to the element needs to be obtained in the
update (write) stream.  In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock.  One might instead
use atomic_inc() in such cases.
...@@ -82,39 +88,40 @@ search_and_reference() code path.  In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:

CODE LISTING C::

    1.                                      2.
    add()                                   search_and_reference()
    {                                       {
        alloc_object                            rcu_read_lock();
        ...                                     search_for_element
        atomic_set(&el->rc, 1);                 atomic_inc(&el->rc);
        spin_lock(&list_lock);                  ...
        add_element                             rcu_read_unlock();
        ...                                 }
        spin_unlock(&list_lock);            4.
    }                                       delete()
    3.                                      {
    release_referenced()                        spin_lock(&list_lock);
    {                                           ...
        ...                                     remove_element
        if (atomic_dec_and_test(&el->rc))       spin_unlock(&list_lock);
            kfree(el);                          ...
        ...                                     call_rcu(&el->head, el_free);
    }                                           ...
    5.                                      }
    void el_free(struct rcu_head *rhp)
    {
        release_referenced();
    }

The key point is that the initial reference added by add() is not removed
until after a grace period has elapsed following removal.  This means that
search_and_reference() cannot find this element, which means that the value
of el->rc cannot increase.  Thus, once it reaches zero, there are no
readers that can or ever will be able to reference the element.  The
element can therefore safely be freed.  This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.
...@@ -130,21 +137,21 @@ the eventual invocation of kfree(), which is usually not a problem on
modern computer systems, even the small ones.

In cases where delete() can sleep, synchronize_rcu() can be called from
delete(), so that el_free() can be subsumed into delete as follows::

    4.
    delete()
    {
        spin_lock(&list_lock);
        ...
        remove_element
        spin_unlock(&list_lock);
        ...
        synchronize_rcu();
        if (atomic_dec_and_test(&el->rc))
            kfree(el);
        ...
    }

As additional examples in the kernel, the pattern in listing C is used by
reference counting of struct pid, while the pattern in listing B is used by
......
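For readers who prefer a concrete rendering of the CODE LISTING B reader path, here is a hedged sketch using the refcount API; struct element, element_lookup(), and element_put() are hypothetical, while rcu_read_lock(), refcount_inc_not_zero(), and call_rcu() are the real primitives the listing relies on::

    struct element {
        struct hlist_node node;
        refcount_t rc;
        int key;
        struct rcu_head rh;
    };

    /* search_and_reference() from listing B, spelled out. */
    static struct element *element_get(int key)
    {
        struct element *el;

        rcu_read_lock();
        el = element_lookup(key);               /* hypothetical RCU-protected search */
        if (el && !refcount_inc_not_zero(&el->rc))
            el = NULL;                          /* already being deleted: treat as a miss */
        rcu_read_unlock();
        return el;                              /* caller drops it later via element_put() */
    }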
.. SPDX-License-Identifier: GPL-2.0
==============================
Using RCU's CPU Stall Detector
==============================
This document first discusses what sorts of issues RCU's CPU stall
detector can locate, and then discusses kernel parameters and Kconfig
...@@ -7,39 +11,40 @@ this document explains the stall detector's "splat" format.

What Causes RCU CPU Stall Warnings?
===================================

So your kernel printed an RCU CPU stall warning.  The next question is
"What caused it?"  The following problems can result in RCU CPU stall
warnings:

- A CPU looping in an RCU read-side critical section.

- A CPU looping with interrupts disabled.

- A CPU looping with preemption disabled.

- A CPU looping with bottom halves disabled.

- For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
  without invoking schedule().  If the looping in the kernel is
  really expected and desirable behavior, you might need to add
  some calls to cond_resched().

- Booting Linux using a console connection that is too slow to
  keep up with the boot-time console-message rate.  For example,
  a 115Kbaud serial console can be -way- too slow to keep up
  with boot-time message rates, and will frequently result in
  RCU CPU stall warning messages.  Especially if you have added
  debug printk()s.

- Anything that prevents RCU's grace-period kthreads from running.
  This can result in the "All QSes seen" console-log message.
  This message will include information on when the kthread last
  ran and how often it should be expected to run.  It can also
  result in the ``rcu_.*kthread starved for`` console-log message,
  which will include additional debugging information.

- A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
  happen to preempt a low-priority task in the middle of an RCU
  read-side critical section.  This is especially damaging if
  that low-priority task is not permitted to run on any other CPU,
...@@ -48,7 +53,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
  While the system is in the process of running itself out of
  memory, you might see stall-warning messages.

- A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
  is running at a higher priority than the RCU softirq threads.
  This will prevent RCU callbacks from ever being invoked,
  and in a CONFIG_PREEMPT_RCU kernel will further prevent
...@@ -63,7 +68,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
  can increase your system's context-switch rate and thus degrade
  performance.

- A periodic interrupt whose handler takes longer than the time
  interval between successive pairs of interrupts.  This can
  prevent RCU's kthreads and softirq handlers from running.
  Note that certain high-overhead debugging options, for example
...@@ -71,20 +76,27 @@ o A periodic interrupt whose handler takes longer than the time
  considerably longer than normal, which can in turn result in
  RCU CPU stall warnings.

- Testing a workload on a fast system, tuning the stall-warning
  timeout down to just barely avoid RCU CPU stall warnings, and then
  running the same workload with the same stall-warning timeout on a
  slow system.  Note that thermal throttling and on-demand governors
  can cause a single system to be sometimes fast and sometimes slow!

- A hardware or software issue shuts off the scheduler-clock
  interrupt on a CPU that is not in dyntick-idle mode.  This
  problem really has happened, and seems to be most likely to
  result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.

- A hardware or software issue that prevents time-based wakeups
  from occurring.  These issues can range from misconfigured or
  buggy timer hardware through bugs in the interrupt or exception
  path (whether hardware, firmware, or software) through bugs
  in Linux's timer subsystem through bugs in the scheduler, and,
  yes, even including bugs in RCU itself.

- A bug in the RCU implementation.

- A hardware failure.  This is quite unlikely, but has occurred
  at least once in real life.  A CPU failed in a running system,
  becoming unresponsive, but not causing an immediate crash.
  This resulted in a series of RCU CPU stall warnings, eventually
...@@ -109,6 +121,7 @@ see include/trace/events/rcu.h.

Fine-Tuning the RCU CPU Stall Detector
======================================

The rcuupdate.rcu_cpu_stall_suppress module parameter disables RCU's
CPU stall detector, which detects conditions that unduly delay RCU grace
...@@ -118,6 +131,7 @@ The stall detector's idea of what constitutes "unduly delayed" is
controlled by a set of kernel configuration variables and cpp macros:

CONFIG_RCU_CPU_STALL_TIMEOUT
----------------------------

    This kernel configuration parameter defines the period of time
    that RCU will wait from the beginning of a grace period until it
...@@ -137,6 +151,7 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
    /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.

RCU_STALL_DELAY_DELTA
---------------------

    Although the lockdep facility is extremely useful, it does add
    some overhead.  Therefore, under CONFIG_PROVE_RCU, the
...@@ -145,6 +160,7 @@ RCU_STALL_DELAY_DELTA
    macro, not a kernel configuration parameter.)

RCU_STALL_RAT_DELAY
-------------------

    The CPU stall detector tries to make the offending CPU print its
    own warnings, as this often gives better-quality stack traces.
...@@ -155,6 +171,7 @@ RCU_STALL_RAT_DELAY
    parameter.)

rcupdate.rcu_task_stall_timeout
-------------------------------

    This boot/sysfs parameter controls the RCU-tasks stall warning
    interval.  A value of zero or less suppresses RCU-tasks stall
...@@ -168,9 +185,10 @@ rcupdate.rcu_task_stall_timeout

Interpreting RCU's CPU Stall-Detector "Splats"
==============================================

For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
it will print a message similar to the following::

    INFO: rcu_sched detected stalls on CPUs/tasks:
    2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
...@@ -223,7 +241,7 @@ an estimate of the total number of RCU callbacks queued across all CPUs
(625 in this case).

In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
for each CPU::

    0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1
...@@ -235,7 +253,7 @@ processing is enabled.

If the grace period ends just as the stall warning starts printing,
there will be a spurious stall-warning message, which will include
the following::

    INFO: Stall ended before state dump start
...@@ -248,7 +266,7 @@ which is overkill for this sort of problem.

If all CPUs and tasks have passed through quiescent states, but the
grace period has nevertheless failed to end, the stall-warning splat
will include something like the following::

    All QSes seen, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0
...@@ -261,7 +279,7 @@ which is way less than 23807.  Finally, the root rcu_node structure's

If the relevant grace-period kthread has been unable to run prior to
the stall warning, as was the case in the "All QSes seen" line above,
the following additional line is printed::

    kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
...@@ -276,6 +294,7 @@ kthread last ran on CPU 5.

Multiple Warnings From One Stall
================================

If a stall lasts long enough, multiple stall-warning messages will be
printed for it.  The second and subsequent messages are printed at
...@@ -285,9 +304,10 @@ of the stall and the first message.

Stall Warnings for Expedited Grace Periods
==========================================

If an expedited grace period detects a stall, it will place a message
like the following in dmesg::

    INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/.
......
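One of the stall causes listed earlier in this file is a !CONFIG_PREEMPT kernel loop that never invokes schedule(). A hedged sketch of the usual cond_resched() fix follows; my_scan_one_chunk() and the kthread itself are hypothetical, while cond_resched() and kthread_should_stop() are the real APIs::

    static int my_scan_thread(void *unused)
    {
        while (!kthread_should_stop()) {
            my_scan_one_chunk();    /* hypothetical bounded unit of work */
            cond_resched();         /* voluntary quiescent state: lets grace
                                     * periods (and other tasks) make progress,
                                     * avoiding RCU CPU stall splats */
        }
        return 0;
    }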
...@@ -4038,6 +4038,14 @@
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
rcutree.rcu_min_cached_objs= [KNL]
Minimum number of objects which are cached and
maintained per CPU. Object size is equal
to PAGE_SIZE. The cache allows reducing the
pressure on the page allocator; it also makes
the whole algorithm behave better under
low-memory conditions.
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
...@@ -4258,6 +4266,20 @@
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
rcutorture.read_exit= [KNL]
Set the number of read-then-exit kthreads used
to test the interaction of RCU updaters and
task-exit processing.
rcutorture.read_exit_burst= [KNL]
The number of times in a given read-then-exit
episode that a set of read-then-exit kthreads
is spawned.
rcutorture.read_exit_delay= [KNL]
The delay, in seconds, between successive
read-then-exit testing episodes.
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s).  Shuffling tasks
allows some CPUs to go into dyntick-idle mode
...@@ -4407,6 +4429,45 @@
reboot_cpu is s[mp]#### with #### being the processor
to be used for rebooting.
refscale.holdoff= [KNL]
Set test-start holdoff period. The purpose of
this parameter is to delay the start of the
test until boot completes in order to avoid
interference.
refscale.loops= [KNL]
Set the number of loops over the synchronization
primitive under test. Increasing this number
reduces noise due to loop start/end overhead,
but the default has already reduced the per-pass
noise to a handful of picoseconds on ca. 2020
x86 laptops.
refscale.nreaders= [KNL]
Set number of readers. The default value of -1
selects N, where N is roughly 75% of the number
of CPUs. A value of zero is an interesting choice.
refscale.nruns= [KNL]
Set number of runs, each of which is dumped onto
the console log.
refscale.readdelay= [KNL]
Set the read-side critical-section duration,
measured in microseconds.
refscale.scale_type= [KNL]
Specify the read-protection implementation to test.
refscale.shutdown= [KNL]
Shut down the system at the end of the performance
test. This defaults to 1 (shut it down) when
rcuperf is built into the kernel and to 0 (leave
it running) when rcuperf is built as a module.
refscale.verbose= [KNL]
Enable additional printk() statements.
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/admin-guide/cgroup-v1/cpusets.rst.
...@@ -5082,6 +5143,13 @@
Prevent the CPU-hotplug component of torturing
until after init has spawned.
torture.ftrace_dump_at_shutdown= [KNL]
Dump the ftrace buffer at torture-test shutdown,
even if there were no errors. This can be a
very costly operation when many torture tests
are running concurrently, especially on systems
with rotating-rust storage.
tp720= [HW,PS2]
tpm_suspend_pcr=[HW,TPM]
......
...@@ -166,4 +166,4 @@ checked for such errors.  The "rmmod" command forces a "SUCCESS",
two are self-explanatory, while the last indicates that while there
were no locking failures, CPU-hotplug problems were detected.

Also see: Documentation/RCU/torture.rst
......
...@@ -14449,7 +14449,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
F: Documentation/RCU/
F: include/linux/rcu*
F: kernel/rcu/
X: Documentation/RCU/torture.rst
X: include/linux/srcu*.h
X: kernel/rcu/srcu*.c
...@@ -17301,7 +17301,7 @@ M: Josh Triplett <josh@joshtriplett.org>
L: linux-kernel@vger.kernel.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
F: Documentation/RCU/torture.rst
F: kernel/locking/locktorture.c
F: kernel/rcu/rcuperf.c
F: kernel/rcu/rcutorture.c
......
...@@ -4541,6 +4541,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
/* once for us */
free_extent_map(em);
cond_resched(); /* Allow large-extent preemption. */
}
}
return try_release_extent_state(tree, page, mask);
......
...@@ -512,7 +512,7 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
 * @right: The hlist head on the right
 *
 * The lists start out as [@left  ][node1 ... ] and
 *                        [@right ][node2 ... ]
 * The lists end up as    [@left  ][node2 ... ]
 *                        [@right ][node1 ... ]
 */
......
...@@ -162,7 +162,7 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
 * The barrier() is needed to make sure compiler doesn't cache first element [1],
 * as this loop can be restarted [2]
 * [1] Documentation/core-api/atomic_ops.rst around line 114
 * [2] Documentation/RCU/rculist_nulls.rst around line 146
 */
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
	for (({barrier();}), \
......
...@@ -828,17 +828,17 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
/*
 * Does the specified offset indicate that the corresponding rcu_head
 * structure can be handled by kvfree_rcu()?
 */
#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)

/*
 * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
 */
#define __kvfree_rcu(head, offset) \
do { \
	BUILD_BUG_ON(!__is_kvfree_rcu_offset(offset)); \
	kvfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
} while (0)

/**
...@@ -857,7 +857,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 * Because the functions are not allowed in the low-order 4096 bytes of
 * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
 * If the offset is larger than 4095 bytes, a compile-time error will
 * be generated in __kvfree_rcu().  If this error is triggered, you can
 * either fall back to use of call_rcu() or rearrange the structure to
 * position the rcu_head structure into the first 4096 bytes.
 *
...@@ -872,7 +872,46 @@ do { \
	typeof (ptr) ___p = (ptr); \
	\
	if (___p) \
		__kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
} while (0)
/**
* kvfree_rcu() - kvfree an object after a grace period.
*
* This macro consists of one or two arguments and it is
* based on whether an object is head-less or not. If it
* has a head then a semantic stays the same as it used
* to be before:
*
* kvfree_rcu(ptr, rhf);
*
* where @ptr is a pointer to kvfree(), @rhf is the name
* of the rcu_head structure within the type of @ptr.
*
* When it comes to head-less variant, only one argument
* is passed and that is just a pointer which has to be
* freed after a grace period. Therefore the semantic is
*
* kvfree_rcu(ptr);
*
* where @ptr is a pointer to kvfree().
*
* Please note, head-less way of freeing is permitted to
* use from a context that has to follow might_sleep()
* annotation. Otherwise, please switch and embed the
* rcu_head structure within the type of @ptr.
*/
#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__, \
kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)
#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
#define kvfree_rcu_arg_2(ptr, rhf) kfree_rcu(ptr, rhf)
#define kvfree_rcu_arg_1(ptr) \
do { \
typeof(ptr) ___p = (ptr); \
\
if (___p) \
kvfree_call_rcu(NULL, (rcu_callback_t) (___p)); \
} while (0) } while (0)
/* /*
......
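A hypothetical usage sketch, not part of this commit: the two-argument kvfree_rcu() names an rcu_head embedded in the caller's structure and never blocks, while the new head-less single-argument form carries no rcu_head and may block (it can fall back to synchronize_rcu()), so it is only legal where might_sleep() would be. The type and field names below (struct foo, rh) are illustrative only.

struct foo {
	int a;
	struct rcu_head rh;	/* Needed only for the two-argument form. */
};

static void example_free_with_head(struct foo *fp)
{
	/* Two arguments: queues fp->rh, safe in atomic context. */
	kvfree_rcu(fp, rh);
}

static void example_free_headless(void *p)
{
	/* One argument: head-less, may sleep waiting for a grace period. */
	kvfree_rcu(p);
}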
...@@ -36,8 +36,8 @@ void rcu_read_unlock_trace_special(struct task_struct *t, int nesting); ...@@ -36,8 +36,8 @@ void rcu_read_unlock_trace_special(struct task_struct *t, int nesting);
/** /**
* rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section * rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section
* *
* When synchronize_rcu_trace() is invoked by one task, then that task * When synchronize_rcu_tasks_trace() is invoked by one task, then that
* is guaranteed to block until all other tasks exit their read-side * task is guaranteed to block until all other tasks exit their read-side
* critical sections. Similarly, if call_rcu_trace() is invoked on one * critical sections. Similarly, if call_rcu_trace() is invoked on one
* task while other tasks are within RCU read-side critical sections, * task while other tasks are within RCU read-side critical sections,
* invocation of the corresponding RCU callback is deferred until after * invocation of the corresponding RCU callback is deferred until after
......
...@@ -34,9 +34,25 @@ static inline void synchronize_rcu_expedited(void) ...@@ -34,9 +34,25 @@ static inline void synchronize_rcu_expedited(void)
synchronize_rcu(); synchronize_rcu();
} }
static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func) /*
* Add one more declaration of kvfree() here. It is
* not so straight forward to just include <linux/mm.h>
* where it is defined due to getting many compile
* errors caused by that include.
*/
extern void kvfree(const void *addr);
static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{ {
call_rcu(head, func); if (head) {
call_rcu(head, func);
return;
}
// kvfree_rcu(one_arg) call.
might_sleep();
synchronize_rcu();
kvfree((void *) func);
} }
void rcu_qs(void); void rcu_qs(void);
......
...@@ -33,7 +33,7 @@ static inline void rcu_virt_note_context_switch(int cpu) ...@@ -33,7 +33,7 @@ static inline void rcu_virt_note_context_switch(int cpu)
} }
void synchronize_rcu_expedited(void); void synchronize_rcu_expedited(void);
void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func); void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
void rcu_barrier(void); void rcu_barrier(void);
bool rcu_eqs_special_set(int cpu); bool rcu_eqs_special_set(int cpu);
......
...@@ -55,6 +55,11 @@ struct torture_random_state { ...@@ -55,6 +55,11 @@ struct torture_random_state {
#define DEFINE_TORTURE_RANDOM_PERCPU(name) \ #define DEFINE_TORTURE_RANDOM_PERCPU(name) \
DEFINE_PER_CPU(struct torture_random_state, name) DEFINE_PER_CPU(struct torture_random_state, name)
unsigned long torture_random(struct torture_random_state *trsp); unsigned long torture_random(struct torture_random_state *trsp);
static inline void torture_random_init(struct torture_random_state *trsp)
{
trsp->trs_state = 0;
trsp->trs_count = 0;
}
/* Task shuffler, which causes CPUs to occasionally go idle. */ /* Task shuffler, which causes CPUs to occasionally go idle. */
void torture_shuffle_task_register(struct task_struct *tp); void torture_shuffle_task_register(struct task_struct *tp);
......
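A hypothetical sketch, not part of this commit, of how a torture kthread might combine the existing DEFINE_TORTURE_RANDOM()/torture_random() helpers with the newly added torture_random_init(); the kthread name and delay policy are made up for illustration.

static int example_jitter_kthread(void *unused)
{
	DEFINE_TORTURE_RANDOM(rand);		/* Per-kthread PRNG state. */

	torture_random_init(&rand);		/* Explicitly zero the state. */
	do {
		/* Sleep 1-8 jiffies, chosen pseudo-randomly. */
		schedule_timeout_interruptible(1 + (torture_random(&rand) & 0x7));
	} while (!torture_must_stop());
	torture_kthread_stopping("example_jitter_kthread");
	return 0;
}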
...@@ -435,11 +435,12 @@ TRACE_EVENT_RCU(rcu_fqs, ...@@ -435,11 +435,12 @@ TRACE_EVENT_RCU(rcu_fqs,
#endif /* #if defined(CONFIG_TREE_RCU) */ #endif /* #if defined(CONFIG_TREE_RCU) */
/* /*
* Tracepoint for dyntick-idle entry/exit events. These take a string * Tracepoint for dyntick-idle entry/exit events. These take 2 strings
* as argument: "Start" for entering dyntick-idle mode, "Startirq" for * as argument:
* entering it from irq/NMI, "End" for leaving it, "Endirq" for leaving it * polarity: "Start", "End", "StillNonIdle" for entering, exiting or still not
* to irq/NMI, "--=" for events moving towards idle, and "++=" for events * being in dyntick-idle mode.
* moving away from idle. * context: "USER" or "IDLE" or "IRQ".
* NMIs nested in IRQs are inferred with dynticks_nesting > 1 in IRQ context.
* *
* These events also take a pair of numbers, which indicate the nesting * These events also take a pair of numbers, which indicate the nesting
* depth before and after the event of interest, and a third number that is * depth before and after the event of interest, and a third number that is
...@@ -506,13 +507,13 @@ TRACE_EVENT_RCU(rcu_callback, ...@@ -506,13 +507,13 @@ TRACE_EVENT_RCU(rcu_callback,
/* /*
* Tracepoint for the registration of a single RCU callback of the special * Tracepoint for the registration of a single RCU callback of the special
* kfree() form. The first argument is the RCU type, the second argument * kvfree() form. The first argument is the RCU type, the second argument
* is a pointer to the RCU callback, the third argument is the offset * is a pointer to the RCU callback, the third argument is the offset
* of the callback within the enclosing RCU-protected data structure, * of the callback within the enclosing RCU-protected data structure,
* the fourth argument is the number of lazy callbacks queued, and the * the fourth argument is the number of lazy callbacks queued, and the
* fifth argument is the total number of callbacks queued. * fifth argument is the total number of callbacks queued.
*/ */
TRACE_EVENT_RCU(rcu_kfree_callback, TRACE_EVENT_RCU(rcu_kvfree_callback,
TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset, TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset,
long qlen), long qlen),
...@@ -596,12 +597,12 @@ TRACE_EVENT_RCU(rcu_invoke_callback, ...@@ -596,12 +597,12 @@ TRACE_EVENT_RCU(rcu_invoke_callback,
/* /*
* Tracepoint for the invocation of a single RCU callback of the special * Tracepoint for the invocation of a single RCU callback of the special
* kfree() form. The first argument is the RCU flavor, the second * kvfree() form. The first argument is the RCU flavor, the second
* argument is a pointer to the RCU callback, and the third argument * argument is a pointer to the RCU callback, and the third argument
* is the offset of the callback within the enclosing RCU-protected * is the offset of the callback within the enclosing RCU-protected
* data structure. * data structure.
*/ */
TRACE_EVENT_RCU(rcu_invoke_kfree_callback, TRACE_EVENT_RCU(rcu_invoke_kvfree_callback,
TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset), TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset),
......
...@@ -5851,9 +5851,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s) ...@@ -5851,9 +5851,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n", pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
!rcu_lockdep_current_cpu_online() !rcu_lockdep_current_cpu_online()
? "RCU used illegally from offline CPU!\n" ? "RCU used illegally from offline CPU!\n"
: !rcu_is_watching() : "",
? "RCU used illegally from idle CPU!\n"
: "",
rcu_scheduler_active, debug_locks); rcu_scheduler_active, debug_locks);
/* /*
......
...@@ -631,13 +631,13 @@ static int lock_torture_writer(void *arg) ...@@ -631,13 +631,13 @@ static int lock_torture_writer(void *arg)
cxt.cur_ops->writelock(); cxt.cur_ops->writelock();
if (WARN_ON_ONCE(lock_is_write_held)) if (WARN_ON_ONCE(lock_is_write_held))
lwsp->n_lock_fail++; lwsp->n_lock_fail++;
lock_is_write_held = 1; lock_is_write_held = true;
if (WARN_ON_ONCE(lock_is_read_held)) if (WARN_ON_ONCE(lock_is_read_held))
lwsp->n_lock_fail++; /* rare, but... */ lwsp->n_lock_fail++; /* rare, but... */
lwsp->n_lock_acquired++; lwsp->n_lock_acquired++;
cxt.cur_ops->write_delay(&rand); cxt.cur_ops->write_delay(&rand);
lock_is_write_held = 0; lock_is_write_held = false;
cxt.cur_ops->writeunlock(); cxt.cur_ops->writeunlock();
stutter_wait("lock_torture_writer"); stutter_wait("lock_torture_writer");
...@@ -665,13 +665,13 @@ static int lock_torture_reader(void *arg) ...@@ -665,13 +665,13 @@ static int lock_torture_reader(void *arg)
schedule_timeout_uninterruptible(1); schedule_timeout_uninterruptible(1);
cxt.cur_ops->readlock(); cxt.cur_ops->readlock();
lock_is_read_held = 1; lock_is_read_held = true;
if (WARN_ON_ONCE(lock_is_write_held)) if (WARN_ON_ONCE(lock_is_write_held))
lrsp->n_lock_fail++; /* rare, but... */ lrsp->n_lock_fail++; /* rare, but... */
lrsp->n_lock_acquired++; lrsp->n_lock_acquired++;
cxt.cur_ops->read_delay(&rand); cxt.cur_ops->read_delay(&rand);
lock_is_read_held = 0; lock_is_read_held = false;
cxt.cur_ops->readunlock(); cxt.cur_ops->readunlock();
stutter_wait("lock_torture_reader"); stutter_wait("lock_torture_reader");
...@@ -686,7 +686,7 @@ static int lock_torture_reader(void *arg) ...@@ -686,7 +686,7 @@ static int lock_torture_reader(void *arg)
static void __torture_print_stats(char *page, static void __torture_print_stats(char *page,
struct lock_stress_stats *statp, bool write) struct lock_stress_stats *statp, bool write)
{ {
bool fail = 0; bool fail = false;
int i, n_stress; int i, n_stress;
long max = 0, min = statp ? statp[0].n_lock_acquired : 0; long max = 0, min = statp ? statp[0].n_lock_acquired : 0;
long long sum = 0; long long sum = 0;
...@@ -904,7 +904,7 @@ static int __init lock_torture_init(void) ...@@ -904,7 +904,7 @@ static int __init lock_torture_init(void)
/* Initialize the statistics so that each run gets its own numbers. */ /* Initialize the statistics so that each run gets its own numbers. */
if (nwriters_stress) { if (nwriters_stress) {
lock_is_write_held = 0; lock_is_write_held = false;
cxt.lwsa = kmalloc_array(cxt.nrealwriters_stress, cxt.lwsa = kmalloc_array(cxt.nrealwriters_stress,
sizeof(*cxt.lwsa), sizeof(*cxt.lwsa),
GFP_KERNEL); GFP_KERNEL);
...@@ -935,7 +935,7 @@ static int __init lock_torture_init(void) ...@@ -935,7 +935,7 @@ static int __init lock_torture_init(void)
} }
if (nreaders_stress) { if (nreaders_stress) {
lock_is_read_held = 0; lock_is_read_held = false;
cxt.lrsa = kmalloc_array(cxt.nrealreaders_stress, cxt.lrsa = kmalloc_array(cxt.nrealreaders_stress,
sizeof(*cxt.lrsa), sizeof(*cxt.lrsa),
GFP_KERNEL); GFP_KERNEL);
......
...@@ -61,6 +61,25 @@ config RCU_TORTURE_TEST ...@@ -61,6 +61,25 @@ config RCU_TORTURE_TEST
Say M if you want the RCU torture tests to build as a module. Say M if you want the RCU torture tests to build as a module.
Say N if you are unsure. Say N if you are unsure.
config RCU_REF_SCALE_TEST
tristate "Scalability tests for read-side synchronization (RCU and others)"
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
select TASKS_RUDE_RCU
select TASKS_TRACE_RCU
default n
help
This option provides a kernel module that runs performance tests
useful comparing RCU with various read-side synchronization mechanisms.
The kernel module may be built after the fact on the running kernel to be
tested, if desired.
Say Y here if you want these performance tests built into the kernel.
Say M if you want to build it as a module instead.
Say N if you are unsure.
config RCU_CPU_STALL_TIMEOUT config RCU_CPU_STALL_TIMEOUT
int "RCU CPU stall timeout in seconds" int "RCU CPU stall timeout in seconds"
depends on RCU_STALL_COMMON depends on RCU_STALL_COMMON
......
...@@ -12,6 +12,7 @@ obj-$(CONFIG_TREE_SRCU) += srcutree.o ...@@ -12,6 +12,7 @@ obj-$(CONFIG_TREE_SRCU) += srcutree.o
obj-$(CONFIG_TINY_SRCU) += srcutiny.o obj-$(CONFIG_TINY_SRCU) += srcutiny.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o
obj-$(CONFIG_RCU_REF_SCALE_TEST) += refscale.o
obj-$(CONFIG_TREE_RCU) += tree.o obj-$(CONFIG_TREE_RCU) += tree.o
obj-$(CONFIG_TINY_RCU) += tiny.o obj-$(CONFIG_TINY_RCU) += tiny.o
obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
...@@ -69,6 +69,11 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>"); ...@@ -69,6 +69,11 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
* value specified by nr_cpus for a read-only test. * value specified by nr_cpus for a read-only test.
* *
* Various other use cases may of course be specified. * Various other use cases may of course be specified.
*
* Note that this test's readers are intended only as a test load for
* the writers. The reader performance statistics will be overly
* pessimistic due to the per-critical-section interrupt disabling,
* test-end checks, and the pair of calls through pointers.
*/ */
#ifdef MODULE #ifdef MODULE
...@@ -309,8 +314,10 @@ static void rcu_perf_wait_shutdown(void) ...@@ -309,8 +314,10 @@ static void rcu_perf_wait_shutdown(void)
} }
/* /*
* RCU perf reader kthread. Repeatedly does empty RCU read-side * RCU perf reader kthread. Repeatedly does empty RCU read-side critical
* critical section, minimizing update-side interference. * section, minimizing update-side interference. However, the point of
* this test is not to evaluate reader performance, but instead to serve
* as a test load for update-side performance testing.
*/ */
static int static int
rcu_perf_reader(void *arg) rcu_perf_reader(void *arg)
...@@ -576,11 +583,8 @@ static int compute_real(int n) ...@@ -576,11 +583,8 @@ static int compute_real(int n)
static int static int
rcu_perf_shutdown(void *arg) rcu_perf_shutdown(void *arg)
{ {
do { wait_event(shutdown_wq,
wait_event(shutdown_wq, atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters);
atomic_read(&n_rcu_perf_writer_finished) >=
nrealwriters);
} while (atomic_read(&n_rcu_perf_writer_finished) < nrealwriters);
smp_mb(); /* Wake before output. */ smp_mb(); /* Wake before output. */
rcu_perf_cleanup(); rcu_perf_cleanup();
kernel_power_off(); kernel_power_off();
...@@ -693,11 +697,8 @@ kfree_perf_cleanup(void) ...@@ -693,11 +697,8 @@ kfree_perf_cleanup(void)
static int static int
kfree_perf_shutdown(void *arg) kfree_perf_shutdown(void *arg)
{ {
do { wait_event(shutdown_wq,
wait_event(shutdown_wq, atomic_read(&n_kfree_perf_thread_ended) >= kfree_nrealthreads);
atomic_read(&n_kfree_perf_thread_ended) >=
kfree_nrealthreads);
} while (atomic_read(&n_kfree_perf_thread_ended) < kfree_nrealthreads);
smp_mb(); /* Wake before output. */ smp_mb(); /* Wake before output. */
......
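The two shutdown hunks above drop the do/while wrappers because wait_event() only returns once its condition is true; it re-checks the condition around each sleep. A minimal sketch of that behavior, assuming the classic prepare_to_wait()/finish_wait() pattern, and not the real macro in include/linux/wait.h, which is considerably more involved:

#define sketch_wait_event(wq, cond)					\
do {									\
	might_sleep();							\
	if (!(cond)) {							\
		DEFINE_WAIT(__wait);					\
		for (;;) {						\
			prepare_to_wait(&(wq), &__wait, TASK_UNINTERRUPTIBLE); \
			if (cond)	/* Re-check before sleeping. */	\
				break;					\
			schedule();					\
		}							\
		finish_wait(&(wq), &__wait);				\
	}								\
} while (0)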
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
* Authors: Paul E. McKenney <paulmck@linux.ibm.com> * Authors: Paul E. McKenney <paulmck@linux.ibm.com>
* Josh Triplett <josh@joshtriplett.org> * Josh Triplett <josh@joshtriplett.org>
* *
* See also: Documentation/RCU/torture.txt * See also: Documentation/RCU/torture.rst
*/ */
#define pr_fmt(fmt) fmt #define pr_fmt(fmt) fmt
...@@ -109,6 +109,10 @@ torture_param(int, object_debug, 0, ...@@ -109,6 +109,10 @@ torture_param(int, object_debug, 0,
torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)"); torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
torture_param(int, onoff_interval, 0, torture_param(int, onoff_interval, 0,
"Time between CPU hotplugs (jiffies), 0=disable"); "Time between CPU hotplugs (jiffies), 0=disable");
torture_param(int, read_exit_delay, 13,
"Delay between read-then-exit episodes (s)");
torture_param(int, read_exit_burst, 16,
"# of read-then-exit bursts per episode, zero to disable");
torture_param(int, shuffle_interval, 3, "Number of seconds between shuffles"); torture_param(int, shuffle_interval, 3, "Number of seconds between shuffles");
torture_param(int, shutdown_secs, 0, "Shutdown time (s), <= zero to disable."); torture_param(int, shutdown_secs, 0, "Shutdown time (s), <= zero to disable.");
torture_param(int, stall_cpu, 0, "Stall duration (s), zero to disable."); torture_param(int, stall_cpu, 0, "Stall duration (s), zero to disable.");
...@@ -146,6 +150,7 @@ static struct task_struct *stall_task; ...@@ -146,6 +150,7 @@ static struct task_struct *stall_task;
static struct task_struct *fwd_prog_task; static struct task_struct *fwd_prog_task;
static struct task_struct **barrier_cbs_tasks; static struct task_struct **barrier_cbs_tasks;
static struct task_struct *barrier_task; static struct task_struct *barrier_task;
static struct task_struct *read_exit_task;
#define RCU_TORTURE_PIPE_LEN 10 #define RCU_TORTURE_PIPE_LEN 10
...@@ -177,6 +182,7 @@ static long n_rcu_torture_boosts; ...@@ -177,6 +182,7 @@ static long n_rcu_torture_boosts;
static atomic_long_t n_rcu_torture_timers; static atomic_long_t n_rcu_torture_timers;
static long n_barrier_attempts; static long n_barrier_attempts;
static long n_barrier_successes; /* did rcu_barrier test succeed? */ static long n_barrier_successes; /* did rcu_barrier test succeed? */
static unsigned long n_read_exits;
static struct list_head rcu_torture_removed; static struct list_head rcu_torture_removed;
static unsigned long shutdown_jiffies; static unsigned long shutdown_jiffies;
...@@ -1166,6 +1172,7 @@ rcu_torture_writer(void *arg) ...@@ -1166,6 +1172,7 @@ rcu_torture_writer(void *arg)
WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count); WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count);
} }
} while (!torture_must_stop()); } while (!torture_must_stop());
rcu_torture_current = NULL; // Let stats task know that we are done.
/* Reset expediting back to unexpedited. */ /* Reset expediting back to unexpedited. */
if (expediting > 0) if (expediting > 0)
expediting = -expediting; expediting = -expediting;
...@@ -1370,6 +1377,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp) ...@@ -1370,6 +1377,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp)
struct rt_read_seg *rtrsp1; struct rt_read_seg *rtrsp1;
unsigned long long ts; unsigned long long ts;
WARN_ON_ONCE(!rcu_is_watching());
newstate = rcutorture_extend_mask(readstate, trsp); newstate = rcutorture_extend_mask(readstate, trsp);
rcutorture_one_extend(&readstate, newstate, trsp, rtrsp++); rcutorture_one_extend(&readstate, newstate, trsp, rtrsp++);
started = cur_ops->get_gp_seq(); started = cur_ops->get_gp_seq();
...@@ -1539,10 +1547,11 @@ rcu_torture_stats_print(void) ...@@ -1539,10 +1547,11 @@ rcu_torture_stats_print(void)
n_rcu_torture_boosts, n_rcu_torture_boosts,
atomic_long_read(&n_rcu_torture_timers)); atomic_long_read(&n_rcu_torture_timers));
torture_onoff_stats(); torture_onoff_stats();
pr_cont("barrier: %ld/%ld:%ld\n", pr_cont("barrier: %ld/%ld:%ld ",
data_race(n_barrier_successes), data_race(n_barrier_successes),
data_race(n_barrier_attempts), data_race(n_barrier_attempts),
data_race(n_rcu_torture_barrier_error)); data_race(n_rcu_torture_barrier_error));
pr_cont("read-exits: %ld\n", data_race(n_read_exits));
pr_alert("%s%s ", torture_type, TORTURE_FLAG); pr_alert("%s%s ", torture_type, TORTURE_FLAG);
if (atomic_read(&n_rcu_torture_mberror) || if (atomic_read(&n_rcu_torture_mberror) ||
...@@ -1634,7 +1643,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag) ...@@ -1634,7 +1643,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
"stall_cpu=%d stall_cpu_holdoff=%d stall_cpu_irqsoff=%d " "stall_cpu=%d stall_cpu_holdoff=%d stall_cpu_irqsoff=%d "
"stall_cpu_block=%d " "stall_cpu_block=%d "
"n_barrier_cbs=%d " "n_barrier_cbs=%d "
"onoff_interval=%d onoff_holdoff=%d\n", "onoff_interval=%d onoff_holdoff=%d "
"read_exit_delay=%d read_exit_burst=%d\n",
torture_type, tag, nrealreaders, nfakewriters, torture_type, tag, nrealreaders, nfakewriters,
stat_interval, verbose, test_no_idle_hz, shuffle_interval, stat_interval, verbose, test_no_idle_hz, shuffle_interval,
stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter, stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter,
...@@ -1643,7 +1653,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag) ...@@ -1643,7 +1653,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
stall_cpu, stall_cpu_holdoff, stall_cpu_irqsoff, stall_cpu, stall_cpu_holdoff, stall_cpu_irqsoff,
stall_cpu_block, stall_cpu_block,
n_barrier_cbs, n_barrier_cbs,
onoff_interval, onoff_holdoff); onoff_interval, onoff_holdoff,
read_exit_delay, read_exit_burst);
} }
static int rcutorture_booster_cleanup(unsigned int cpu) static int rcutorture_booster_cleanup(unsigned int cpu)
...@@ -2175,7 +2186,7 @@ static void rcu_torture_barrier1cb(void *rcu_void) ...@@ -2175,7 +2186,7 @@ static void rcu_torture_barrier1cb(void *rcu_void)
static int rcu_torture_barrier_cbs(void *arg) static int rcu_torture_barrier_cbs(void *arg)
{ {
long myid = (long)arg; long myid = (long)arg;
bool lastphase = 0; bool lastphase = false;
bool newphase; bool newphase;
struct rcu_head rcu; struct rcu_head rcu;
...@@ -2338,6 +2349,99 @@ static bool rcu_torture_can_boost(void) ...@@ -2338,6 +2349,99 @@ static bool rcu_torture_can_boost(void)
return true; return true;
} }
static bool read_exit_child_stop;
static bool read_exit_child_stopped;
static wait_queue_head_t read_exit_wq;
// Child kthread which just does an rcutorture reader and exits.
static int rcu_torture_read_exit_child(void *trsp_in)
{
struct torture_random_state *trsp = trsp_in;
set_user_nice(current, MAX_NICE);
// Minimize time between reading and exiting.
while (!kthread_should_stop())
schedule_timeout_uninterruptible(1);
(void)rcu_torture_one_read(trsp);
return 0;
}
// Parent kthread which creates and destroys read-exit child kthreads.
static int rcu_torture_read_exit(void *unused)
{
int count = 0;
bool errexit = false;
int i;
struct task_struct *tsp;
DEFINE_TORTURE_RANDOM(trs);
// Allocate and initialize.
set_user_nice(current, MAX_NICE);
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: Start of test");
// Each pass through this loop does one read-exit episode.
do {
if (++count > read_exit_burst) {
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: End of episode");
rcu_barrier(); // Wait for task_struct free, avoid OOM.
for (i = 0; i < read_exit_delay; i++) {
schedule_timeout_uninterruptible(HZ);
if (READ_ONCE(read_exit_child_stop))
break;
}
if (!READ_ONCE(read_exit_child_stop))
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: Start of episode");
count = 0;
}
if (READ_ONCE(read_exit_child_stop))
break;
// Spawn child.
tsp = kthread_run(rcu_torture_read_exit_child,
&trs, "%s",
"rcu_torture_read_exit_child");
if (IS_ERR(tsp)) {
VERBOSE_TOROUT_ERRSTRING("out of memory");
errexit = true;
tsp = NULL;
break;
}
cond_resched();
kthread_stop(tsp);
n_read_exits ++;
stutter_wait("rcu_torture_read_exit");
} while (!errexit && !READ_ONCE(read_exit_child_stop));
// Clean up and exit.
smp_store_release(&read_exit_child_stopped, true); // After reaping.
smp_mb(); // Store before wakeup.
wake_up(&read_exit_wq);
while (!torture_must_stop())
schedule_timeout_uninterruptible(1);
torture_kthread_stopping("rcu_torture_read_exit");
return 0;
}
static int rcu_torture_read_exit_init(void)
{
if (read_exit_burst <= 0)
return -EINVAL;
init_waitqueue_head(&read_exit_wq);
read_exit_child_stop = false;
read_exit_child_stopped = false;
return torture_create_kthread(rcu_torture_read_exit, NULL,
read_exit_task);
}
static void rcu_torture_read_exit_cleanup(void)
{
if (!read_exit_task)
return;
WRITE_ONCE(read_exit_child_stop, true);
smp_mb(); // Above write before wait.
wait_event(read_exit_wq, smp_load_acquire(&read_exit_child_stopped));
torture_stop_kthread(rcutorture_read_exit, read_exit_task);
}
static enum cpuhp_state rcutor_hp; static enum cpuhp_state rcutor_hp;
static void static void
...@@ -2359,6 +2463,7 @@ rcu_torture_cleanup(void) ...@@ -2359,6 +2463,7 @@ rcu_torture_cleanup(void)
} }
show_rcu_gp_kthreads(); show_rcu_gp_kthreads();
rcu_torture_read_exit_cleanup();
rcu_torture_barrier_cleanup(); rcu_torture_barrier_cleanup();
torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task); torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task);
torture_stop_kthread(rcu_torture_stall, stall_task); torture_stop_kthread(rcu_torture_stall, stall_task);
...@@ -2370,7 +2475,6 @@ rcu_torture_cleanup(void) ...@@ -2370,7 +2475,6 @@ rcu_torture_cleanup(void)
reader_tasks[i]); reader_tasks[i]);
kfree(reader_tasks); kfree(reader_tasks);
} }
rcu_torture_current = NULL;
if (fakewriter_tasks) { if (fakewriter_tasks) {
for (i = 0; i < nfakewriters; i++) { for (i = 0; i < nfakewriters; i++) {
...@@ -2680,6 +2784,9 @@ rcu_torture_init(void) ...@@ -2680,6 +2784,9 @@ rcu_torture_init(void)
if (firsterr) if (firsterr)
goto unwind; goto unwind;
firsterr = rcu_torture_barrier_init(); firsterr = rcu_torture_barrier_init();
if (firsterr)
goto unwind;
firsterr = rcu_torture_read_exit_init();
if (firsterr) if (firsterr)
goto unwind; goto unwind;
if (object_debug) if (object_debug)
......
...@@ -766,7 +766,7 @@ static void srcu_flip(struct srcu_struct *ssp) ...@@ -766,7 +766,7 @@ static void srcu_flip(struct srcu_struct *ssp)
* it, if this function was preempted for enough time for the counters * it, if this function was preempted for enough time for the counters
* to wrap, it really doesn't matter whether or not we expedite the grace * to wrap, it really doesn't matter whether or not we expedite the grace
* period. The extra overhead of a needlessly expedited grace period is * period. The extra overhead of a needlessly expedited grace period is
* negligible when amoritized over that time period, and the extra latency * negligible when amortized over that time period, and the extra latency
* of a needlessly non-expedited grace period is similarly negligible. * of a needlessly non-expedited grace period is similarly negligible.
*/ */
static bool srcu_might_be_idle(struct srcu_struct *ssp) static bool srcu_might_be_idle(struct srcu_struct *ssp)
...@@ -777,14 +777,15 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp) ...@@ -777,14 +777,15 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
unsigned long t; unsigned long t;
unsigned long tlast; unsigned long tlast;
check_init_srcu_struct(ssp);
/* If the local srcu_data structure has callbacks, not idle. */ /* If the local srcu_data structure has callbacks, not idle. */
local_irq_save(flags); sdp = raw_cpu_ptr(ssp->sda);
sdp = this_cpu_ptr(ssp->sda); spin_lock_irqsave_rcu_node(sdp, flags);
if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) { if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
local_irq_restore(flags); spin_unlock_irqrestore_rcu_node(sdp, flags);
return false; /* Callbacks already present, so not idle. */ return false; /* Callbacks already present, so not idle. */
} }
local_irq_restore(flags); spin_unlock_irqrestore_rcu_node(sdp, flags);
/* /*
* No local callbacks, so probabalistically probe global state. * No local callbacks, so probabalistically probe global state.
...@@ -864,9 +865,8 @@ static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp, ...@@ -864,9 +865,8 @@ static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
} }
rhp->func = func; rhp->func = func;
idx = srcu_read_lock(ssp); idx = srcu_read_lock(ssp);
local_irq_save(flags); sdp = raw_cpu_ptr(ssp->sda);
sdp = this_cpu_ptr(ssp->sda); spin_lock_irqsave_rcu_node(sdp, flags);
spin_lock_rcu_node(sdp);
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp); rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist, rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_gp_seq)); rcu_seq_current(&ssp->srcu_gp_seq));
......
...@@ -103,6 +103,7 @@ module_param(rcu_task_stall_timeout, int, 0644); ...@@ -103,6 +103,7 @@ module_param(rcu_task_stall_timeout, int, 0644);
#define RTGS_WAIT_READERS 9 #define RTGS_WAIT_READERS 9
#define RTGS_INVOKE_CBS 10 #define RTGS_INVOKE_CBS 10
#define RTGS_WAIT_CBS 11 #define RTGS_WAIT_CBS 11
#ifndef CONFIG_TINY_RCU
static const char * const rcu_tasks_gp_state_names[] = { static const char * const rcu_tasks_gp_state_names[] = {
"RTGS_INIT", "RTGS_INIT",
"RTGS_WAIT_WAIT_CBS", "RTGS_WAIT_WAIT_CBS",
...@@ -117,6 +118,7 @@ static const char * const rcu_tasks_gp_state_names[] = { ...@@ -117,6 +118,7 @@ static const char * const rcu_tasks_gp_state_names[] = {
"RTGS_INVOKE_CBS", "RTGS_INVOKE_CBS",
"RTGS_WAIT_CBS", "RTGS_WAIT_CBS",
}; };
#endif /* #ifndef CONFIG_TINY_RCU */
//////////////////////////////////////////////////////////////////////// ////////////////////////////////////////////////////////////////////////
// //
...@@ -129,6 +131,7 @@ static void set_tasks_gp_state(struct rcu_tasks *rtp, int newstate) ...@@ -129,6 +131,7 @@ static void set_tasks_gp_state(struct rcu_tasks *rtp, int newstate)
rtp->gp_jiffies = jiffies; rtp->gp_jiffies = jiffies;
} }
#ifndef CONFIG_TINY_RCU
/* Return state name. */ /* Return state name. */
static const char *tasks_gp_state_getname(struct rcu_tasks *rtp) static const char *tasks_gp_state_getname(struct rcu_tasks *rtp)
{ {
...@@ -139,6 +142,7 @@ static const char *tasks_gp_state_getname(struct rcu_tasks *rtp) ...@@ -139,6 +142,7 @@ static const char *tasks_gp_state_getname(struct rcu_tasks *rtp)
return "???"; return "???";
return rcu_tasks_gp_state_names[j]; return rcu_tasks_gp_state_names[j];
} }
#endif /* #ifndef CONFIG_TINY_RCU */
// Enqueue a callback for the specified flavor of Tasks RCU. // Enqueue a callback for the specified flavor of Tasks RCU.
static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func, static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
...@@ -205,7 +209,7 @@ static int __noreturn rcu_tasks_kthread(void *arg) ...@@ -205,7 +209,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
if (!rtp->cbs_head) { if (!rtp->cbs_head) {
WARN_ON(signal_pending(current)); WARN_ON(signal_pending(current));
set_tasks_gp_state(rtp, RTGS_WAIT_WAIT_CBS); set_tasks_gp_state(rtp, RTGS_WAIT_WAIT_CBS);
schedule_timeout_interruptible(HZ/10); schedule_timeout_idle(HZ/10);
} }
continue; continue;
} }
...@@ -227,7 +231,7 @@ static int __noreturn rcu_tasks_kthread(void *arg) ...@@ -227,7 +231,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
cond_resched(); cond_resched();
} }
/* Paranoid sleep to keep this from entering a tight loop */ /* Paranoid sleep to keep this from entering a tight loop */
schedule_timeout_uninterruptible(HZ/10); schedule_timeout_idle(HZ/10);
set_tasks_gp_state(rtp, RTGS_WAIT_CBS); set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
} }
...@@ -268,6 +272,7 @@ static void __init rcu_tasks_bootup_oddness(void) ...@@ -268,6 +272,7 @@ static void __init rcu_tasks_bootup_oddness(void)
#endif /* #ifndef CONFIG_TINY_RCU */ #endif /* #ifndef CONFIG_TINY_RCU */
#ifndef CONFIG_TINY_RCU
/* Dump out rcutorture-relevant state common to all RCU-tasks flavors. */ /* Dump out rcutorture-relevant state common to all RCU-tasks flavors. */
static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s) static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
{ {
...@@ -281,6 +286,7 @@ static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s) ...@@ -281,6 +286,7 @@ static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
".C"[!!data_race(rtp->cbs_head)], ".C"[!!data_race(rtp->cbs_head)],
s); s);
} }
#endif /* #ifndef CONFIG_TINY_RCU */
static void exit_tasks_rcu_finish_trace(struct task_struct *t); static void exit_tasks_rcu_finish_trace(struct task_struct *t);
...@@ -336,7 +342,7 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp) ...@@ -336,7 +342,7 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
/* Slowly back off waiting for holdouts */ /* Slowly back off waiting for holdouts */
set_tasks_gp_state(rtp, RTGS_WAIT_SCAN_HOLDOUTS); set_tasks_gp_state(rtp, RTGS_WAIT_SCAN_HOLDOUTS);
schedule_timeout_interruptible(HZ/fract); schedule_timeout_idle(HZ/fract);
if (fract > 1) if (fract > 1)
fract--; fract--;
...@@ -402,7 +408,7 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop) ...@@ -402,7 +408,7 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
} }
/* Processing between scanning taskslist and draining the holdout list. */ /* Processing between scanning taskslist and draining the holdout list. */
void rcu_tasks_postscan(struct list_head *hop) static void rcu_tasks_postscan(struct list_head *hop)
{ {
/* /*
* Wait for tasks that are in the process of exiting. This * Wait for tasks that are in the process of exiting. This
...@@ -557,10 +563,12 @@ static int __init rcu_spawn_tasks_kthread(void) ...@@ -557,10 +563,12 @@ static int __init rcu_spawn_tasks_kthread(void)
} }
core_initcall(rcu_spawn_tasks_kthread); core_initcall(rcu_spawn_tasks_kthread);
#ifndef CONFIG_TINY_RCU
static void show_rcu_tasks_classic_gp_kthread(void) static void show_rcu_tasks_classic_gp_kthread(void)
{ {
show_rcu_tasks_generic_gp_kthread(&rcu_tasks, ""); show_rcu_tasks_generic_gp_kthread(&rcu_tasks, "");
} }
#endif /* #ifndef CONFIG_TINY_RCU */
/* Do the srcu_read_lock() for the above synchronize_srcu(). */ /* Do the srcu_read_lock() for the above synchronize_srcu(). */
void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu) void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
...@@ -682,10 +690,12 @@ static int __init rcu_spawn_tasks_rude_kthread(void) ...@@ -682,10 +690,12 @@ static int __init rcu_spawn_tasks_rude_kthread(void)
} }
core_initcall(rcu_spawn_tasks_rude_kthread); core_initcall(rcu_spawn_tasks_rude_kthread);
#ifndef CONFIG_TINY_RCU
static void show_rcu_tasks_rude_gp_kthread(void) static void show_rcu_tasks_rude_gp_kthread(void)
{ {
show_rcu_tasks_generic_gp_kthread(&rcu_tasks_rude, ""); show_rcu_tasks_generic_gp_kthread(&rcu_tasks_rude, "");
} }
#endif /* #ifndef CONFIG_TINY_RCU */
#else /* #ifdef CONFIG_TASKS_RUDE_RCU */ #else /* #ifdef CONFIG_TASKS_RUDE_RCU */
static void show_rcu_tasks_rude_gp_kthread(void) {} static void show_rcu_tasks_rude_gp_kthread(void) {}
...@@ -727,8 +737,8 @@ EXPORT_SYMBOL_GPL(rcu_trace_lock_map); ...@@ -727,8 +737,8 @@ EXPORT_SYMBOL_GPL(rcu_trace_lock_map);
#ifdef CONFIG_TASKS_TRACE_RCU #ifdef CONFIG_TASKS_TRACE_RCU
atomic_t trc_n_readers_need_end; // Number of waited-for readers. static atomic_t trc_n_readers_need_end; // Number of waited-for readers.
DECLARE_WAIT_QUEUE_HEAD(trc_wait); // List of holdout tasks. static DECLARE_WAIT_QUEUE_HEAD(trc_wait); // List of holdout tasks.
// Record outstanding IPIs to each CPU. No point in sending two... // Record outstanding IPIs to each CPU. No point in sending two...
static DEFINE_PER_CPU(bool, trc_ipi_to_cpu); static DEFINE_PER_CPU(bool, trc_ipi_to_cpu);
...@@ -835,7 +845,7 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg) ...@@ -835,7 +845,7 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
bool ofl = cpu_is_offline(cpu); bool ofl = cpu_is_offline(cpu);
if (task_curr(t)) { if (task_curr(t)) {
WARN_ON_ONCE(ofl & !is_idle_task(t)); WARN_ON_ONCE(ofl && !is_idle_task(t));
// If no chance of heavyweight readers, do it the hard way. // If no chance of heavyweight readers, do it the hard way.
if (!ofl && !IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) if (!ofl && !IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
...@@ -1118,11 +1128,10 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks_trace); ...@@ -1118,11 +1128,10 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks_trace);
* synchronize_rcu_tasks_trace - wait for a trace rcu-tasks grace period * synchronize_rcu_tasks_trace - wait for a trace rcu-tasks grace period
* *
* Control will return to the caller some time after a trace rcu-tasks * Control will return to the caller some time after a trace rcu-tasks
* grace period has elapsed, in other words after all currently * grace period has elapsed, in other words after all currently executing
* executing rcu-tasks read-side critical sections have elapsed. These * rcu-tasks read-side critical sections have elapsed. These read-side
* read-side critical sections are delimited by calls to schedule(), * critical sections are delimited by calls to rcu_read_lock_trace()
* cond_resched_tasks_rcu_qs(), userspace execution, and (in theory, * and rcu_read_unlock_trace().
* anyway) cond_resched().
* *
* This is a very specialized primitive, intended only for a few uses in * This is a very specialized primitive, intended only for a few uses in
* tracing and other situations requiring manipulation of function preambles * tracing and other situations requiring manipulation of function preambles
...@@ -1164,6 +1173,7 @@ static int __init rcu_spawn_tasks_trace_kthread(void) ...@@ -1164,6 +1173,7 @@ static int __init rcu_spawn_tasks_trace_kthread(void)
} }
core_initcall(rcu_spawn_tasks_trace_kthread); core_initcall(rcu_spawn_tasks_trace_kthread);
#ifndef CONFIG_TINY_RCU
static void show_rcu_tasks_trace_gp_kthread(void) static void show_rcu_tasks_trace_gp_kthread(void)
{ {
char buf[64]; char buf[64];
...@@ -1174,18 +1184,21 @@ static void show_rcu_tasks_trace_gp_kthread(void) ...@@ -1174,18 +1184,21 @@ static void show_rcu_tasks_trace_gp_kthread(void)
data_race(n_heavy_reader_attempts)); data_race(n_heavy_reader_attempts));
show_rcu_tasks_generic_gp_kthread(&rcu_tasks_trace, buf); show_rcu_tasks_generic_gp_kthread(&rcu_tasks_trace, buf);
} }
#endif /* #ifndef CONFIG_TINY_RCU */
#else /* #ifdef CONFIG_TASKS_TRACE_RCU */ #else /* #ifdef CONFIG_TASKS_TRACE_RCU */
static void exit_tasks_rcu_finish_trace(struct task_struct *t) { } static void exit_tasks_rcu_finish_trace(struct task_struct *t) { }
static inline void show_rcu_tasks_trace_gp_kthread(void) {} static inline void show_rcu_tasks_trace_gp_kthread(void) {}
#endif /* #else #ifdef CONFIG_TASKS_TRACE_RCU */ #endif /* #else #ifdef CONFIG_TASKS_TRACE_RCU */
#ifndef CONFIG_TINY_RCU
void show_rcu_tasks_gp_kthreads(void) void show_rcu_tasks_gp_kthreads(void)
{ {
show_rcu_tasks_classic_gp_kthread(); show_rcu_tasks_classic_gp_kthread();
show_rcu_tasks_rude_gp_kthread(); show_rcu_tasks_rude_gp_kthread();
show_rcu_tasks_trace_gp_kthread(); show_rcu_tasks_trace_gp_kthread();
} }
#endif /* #ifndef CONFIG_TINY_RCU */
#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
static inline void rcu_tasks_bootup_oddness(void) {} static inline void rcu_tasks_bootup_oddness(void) {}
......
...@@ -23,6 +23,7 @@ ...@@ -23,6 +23,7 @@
#include <linux/cpu.h> #include <linux/cpu.h>
#include <linux/prefetch.h> #include <linux/prefetch.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/mm.h>
#include "rcu.h" #include "rcu.h"
...@@ -84,9 +85,9 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head) ...@@ -84,9 +85,9 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
unsigned long offset = (unsigned long)head->func; unsigned long offset = (unsigned long)head->func;
rcu_lock_acquire(&rcu_callback_map); rcu_lock_acquire(&rcu_callback_map);
if (__is_kfree_rcu_offset(offset)) { if (__is_kvfree_rcu_offset(offset)) {
trace_rcu_invoke_kfree_callback("", head, offset); trace_rcu_invoke_kvfree_callback("", head, offset);
kfree((void *)head - offset); kvfree((void *)head - offset);
rcu_lock_release(&rcu_callback_map); rcu_lock_release(&rcu_callback_map);
return true; return true;
} }
......
...@@ -41,7 +41,7 @@ struct rcu_node { ...@@ -41,7 +41,7 @@ struct rcu_node {
raw_spinlock_t __private lock; /* Root rcu_node's lock protects */ raw_spinlock_t __private lock; /* Root rcu_node's lock protects */
/* some rcu_state fields as well as */ /* some rcu_state fields as well as */
/* following. */ /* following. */
unsigned long gp_seq; /* Track rsp->rcu_gp_seq. */ unsigned long gp_seq; /* Track rsp->gp_seq. */
unsigned long gp_seq_needed; /* Track furthest future GP request. */ unsigned long gp_seq_needed; /* Track furthest future GP request. */
unsigned long completedqs; /* All QSes done for this node. */ unsigned long completedqs; /* All QSes done for this node. */
unsigned long qsmask; /* CPUs or groups that need to switch in */ unsigned long qsmask; /* CPUs or groups that need to switch in */
...@@ -73,9 +73,9 @@ struct rcu_node { ...@@ -73,9 +73,9 @@ struct rcu_node {
unsigned long ffmask; /* Fully functional CPUs. */ unsigned long ffmask; /* Fully functional CPUs. */
unsigned long grpmask; /* Mask to apply to parent qsmask. */ unsigned long grpmask; /* Mask to apply to parent qsmask. */
/* Only one bit will be set in this mask. */ /* Only one bit will be set in this mask. */
int grplo; /* lowest-numbered CPU or group here. */ int grplo; /* lowest-numbered CPU here. */
int grphi; /* highest-numbered CPU or group here. */ int grphi; /* highest-numbered CPU here. */
u8 grpnum; /* CPU/group number for next level up. */ u8 grpnum; /* group number for next level up. */
u8 level; /* root is at level 0. */ u8 level; /* root is at level 0. */
bool wait_blkd_tasks;/* Necessary to wait for blocked tasks to */ bool wait_blkd_tasks;/* Necessary to wait for blocked tasks to */
/* exit RCU read-side critical sections */ /* exit RCU read-side critical sections */
...@@ -149,7 +149,7 @@ union rcu_noqs { ...@@ -149,7 +149,7 @@ union rcu_noqs {
/* Per-CPU data for read-copy update. */ /* Per-CPU data for read-copy update. */
struct rcu_data { struct rcu_data {
/* 1) quiescent-state and grace-period handling : */ /* 1) quiescent-state and grace-period handling : */
unsigned long gp_seq; /* Track rsp->rcu_gp_seq counter. */ unsigned long gp_seq; /* Track rsp->gp_seq counter. */
unsigned long gp_seq_needed; /* Track furthest future GP request. */ unsigned long gp_seq_needed; /* Track furthest future GP request. */
union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */ union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */
bool core_needs_qs; /* Core waits for quiesc state. */ bool core_needs_qs; /* Core waits for quiesc state. */
...@@ -171,6 +171,7 @@ struct rcu_data { ...@@ -171,6 +171,7 @@ struct rcu_data {
/* different grace periods. */ /* different grace periods. */
long qlen_last_fqs_check; long qlen_last_fqs_check;
/* qlen at last check for QS forcing */ /* qlen at last check for QS forcing */
unsigned long n_cbs_invoked; /* # callbacks invoked since boot. */
unsigned long n_force_qs_snap; unsigned long n_force_qs_snap;
/* did other CPU force QS recently? */ /* did other CPU force QS recently? */
long blimit; /* Upper limit on a processed batch */ long blimit; /* Upper limit on a processed batch */
...@@ -301,6 +302,8 @@ struct rcu_state { ...@@ -301,6 +302,8 @@ struct rcu_state {
u8 boost ____cacheline_internodealigned_in_smp; u8 boost ____cacheline_internodealigned_in_smp;
/* Subject to priority boost. */ /* Subject to priority boost. */
unsigned long gp_seq; /* Grace-period sequence #. */ unsigned long gp_seq; /* Grace-period sequence #. */
unsigned long gp_max; /* Maximum GP duration in */
/* jiffies. */
struct task_struct *gp_kthread; /* Task for grace periods. */ struct task_struct *gp_kthread; /* Task for grace periods. */
struct swait_queue_head gp_wq; /* Where GP task waits. */ struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */ short gp_flags; /* Commands for GP task. */
...@@ -346,8 +349,6 @@ struct rcu_state { ...@@ -346,8 +349,6 @@ struct rcu_state {
/* a reluctant CPU. */ /* a reluctant CPU. */
unsigned long n_force_qs_gpstart; /* Snapshot of n_force_qs at */ unsigned long n_force_qs_gpstart; /* Snapshot of n_force_qs at */
/* GP start. */ /* GP start. */
unsigned long gp_max; /* Maximum GP duration in */
/* jiffies. */
const char *name; /* Name of structure. */ const char *name; /* Name of structure. */
char abbr; /* Abbreviated name. */ char abbr; /* Abbreviated name. */
......
...@@ -403,7 +403,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) ...@@ -403,7 +403,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
/* Online, so delay for a bit and try again. */ /* Online, so delay for a bit and try again. */
raw_spin_unlock_irqrestore_rcu_node(rnp, flags); raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("selectofl")); trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("selectofl"));
schedule_timeout_uninterruptible(1); schedule_timeout_idle(1);
goto retry_ipi; goto retry_ipi;
} }
/* CPU really is offline, so we must report its QS. */ /* CPU really is offline, so we must report its QS. */
......
...@@ -1033,7 +1033,7 @@ static int rcu_boost_kthread(void *arg) ...@@ -1033,7 +1033,7 @@ static int rcu_boost_kthread(void *arg)
if (spincnt > 10) { if (spincnt > 10) {
WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_YIELDING); WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_YIELDING);
trace_rcu_utilization(TPS("End boost kthread@rcu_yield")); trace_rcu_utilization(TPS("End boost kthread@rcu_yield"));
schedule_timeout_interruptible(2); schedule_timeout_idle(2);
trace_rcu_utilization(TPS("Start boost kthread@rcu_yield")); trace_rcu_utilization(TPS("Start boost kthread@rcu_yield"));
spincnt = 0; spincnt = 0;
} }
...@@ -2005,7 +2005,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp) ...@@ -2005,7 +2005,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
/* Polling, so trace if first poll in the series. */ /* Polling, so trace if first poll in the series. */
if (gotcbs) if (gotcbs)
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Poll")); trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Poll"));
schedule_timeout_interruptible(1); schedule_timeout_idle(1);
} else if (!needwait_gp) { } else if (!needwait_gp) {
/* Wait for callbacks to appear. */ /* Wait for callbacks to appear. */
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep")); trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
......
...@@ -237,14 +237,12 @@ struct rcu_stall_chk_rdr { ...@@ -237,14 +237,12 @@ struct rcu_stall_chk_rdr {
*/ */
static bool check_slow_task(struct task_struct *t, void *arg) static bool check_slow_task(struct task_struct *t, void *arg)
{ {
struct rcu_node *rnp;
struct rcu_stall_chk_rdr *rscrp = arg; struct rcu_stall_chk_rdr *rscrp = arg;
if (task_curr(t)) if (task_curr(t))
return false; // It is running, so decline to inspect it. return false; // It is running, so decline to inspect it.
rscrp->nesting = t->rcu_read_lock_nesting; rscrp->nesting = t->rcu_read_lock_nesting;
rscrp->rs = t->rcu_read_unlock_special; rscrp->rs = t->rcu_read_unlock_special;
rnp = t->rcu_blocked_node;
rscrp->on_blkd_list = !list_empty(&t->rcu_node_entry); rscrp->on_blkd_list = !list_empty(&t->rcu_node_entry);
return true; return true;
} }
...@@ -468,7 +466,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps) ...@@ -468,7 +466,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
/* /*
* OK, time to rat on our buddy... * OK, time to rat on our buddy...
* See Documentation/RCU/stallwarn.txt for info on how to debug * See Documentation/RCU/stallwarn.rst for info on how to debug
* RCU CPU stall warnings. * RCU CPU stall warnings.
*/ */
pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name); pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
...@@ -535,7 +533,7 @@ static void print_cpu_stall(unsigned long gps) ...@@ -535,7 +533,7 @@ static void print_cpu_stall(unsigned long gps)
/* /*
* OK, time to rat on ourselves... * OK, time to rat on ourselves...
* See Documentation/RCU/stallwarn.txt for info on how to debug * See Documentation/RCU/stallwarn.rst for info on how to debug
* RCU CPU stall warnings. * RCU CPU stall warnings.
*/ */
pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name); pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
...@@ -649,6 +647,7 @@ static void check_cpu_stall(struct rcu_data *rdp) ...@@ -649,6 +647,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
*/ */
void show_rcu_gp_kthreads(void) void show_rcu_gp_kthreads(void)
{ {
unsigned long cbs = 0;
int cpu; int cpu;
unsigned long j; unsigned long j;
unsigned long ja; unsigned long ja;
...@@ -690,9 +689,11 @@ void show_rcu_gp_kthreads(void) ...@@ -690,9 +689,11 @@ void show_rcu_gp_kthreads(void)
} }
for_each_possible_cpu(cpu) { for_each_possible_cpu(cpu) {
rdp = per_cpu_ptr(&rcu_data, cpu); rdp = per_cpu_ptr(&rcu_data, cpu);
cbs += data_race(rdp->n_cbs_invoked);
if (rcu_segcblist_is_offloaded(&rdp->cblist)) if (rcu_segcblist_is_offloaded(&rdp->cblist))
show_rcu_nocb_state(rdp); show_rcu_nocb_state(rdp);
} }
pr_info("RCU callbacks invoked since boot: %lu\n", cbs);
show_rcu_tasks_gp_kthreads(); show_rcu_tasks_gp_kthreads();
} }
EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads); EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
......
...@@ -42,6 +42,7 @@ ...@@ -42,6 +42,7 @@
#include <linux/kprobes.h> #include <linux/kprobes.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/irq_work.h> #include <linux/irq_work.h>
#include <linux/rcupdate_trace.h>
#define CREATE_TRACE_POINTS #define CREATE_TRACE_POINTS
...@@ -207,7 +208,7 @@ void rcu_end_inkernel_boot(void) ...@@ -207,7 +208,7 @@ void rcu_end_inkernel_boot(void)
rcu_unexpedite_gp(); rcu_unexpedite_gp();
if (rcu_normal_after_boot) if (rcu_normal_after_boot)
WRITE_ONCE(rcu_normal, 1); WRITE_ONCE(rcu_normal, 1);
rcu_boot_ended = 1; rcu_boot_ended = true;
} }
/* /*
...@@ -279,6 +280,7 @@ struct lockdep_map rcu_sched_lock_map = { ...@@ -279,6 +280,7 @@ struct lockdep_map rcu_sched_lock_map = {
}; };
EXPORT_SYMBOL_GPL(rcu_sched_lock_map); EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
// Tell lockdep when RCU callbacks are being invoked.
static struct lock_class_key rcu_callback_key; static struct lock_class_key rcu_callback_key;
struct lockdep_map rcu_callback_map = struct lockdep_map rcu_callback_map =
STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key); STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key);
...@@ -390,13 +392,14 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array, ...@@ -390,13 +392,14 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
might_sleep(); might_sleep();
continue; continue;
} }
init_rcu_head_on_stack(&rs_array[i].head);
init_completion(&rs_array[i].completion);
for (j = 0; j < i; j++) for (j = 0; j < i; j++)
if (crcu_array[j] == crcu_array[i]) if (crcu_array[j] == crcu_array[i])
break; break;
if (j == i) if (j == i) {
init_rcu_head_on_stack(&rs_array[i].head);
init_completion(&rs_array[i].completion);
(crcu_array[i])(&rs_array[i].head, wakeme_after_rcu); (crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
}
} }
/* Wait for all callbacks to be invoked. */ /* Wait for all callbacks to be invoked. */
...@@ -407,9 +410,10 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array, ...@@ -407,9 +410,10 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
for (j = 0; j < i; j++) for (j = 0; j < i; j++)
if (crcu_array[j] == crcu_array[i]) if (crcu_array[j] == crcu_array[i])
break; break;
if (j == i) if (j == i) {
wait_for_completion(&rs_array[i].completion); wait_for_completion(&rs_array[i].completion);
destroy_rcu_head_on_stack(&rs_array[i].head); destroy_rcu_head_on_stack(&rs_array[i].head);
}
} }
} }
EXPORT_SYMBOL_GPL(__wait_rcu_gp); EXPORT_SYMBOL_GPL(__wait_rcu_gp);
......
...@@ -351,16 +351,24 @@ void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) ...@@ -351,16 +351,24 @@ void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu); EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
/* /*
* Set a per-task tick dependency. Posix CPU timers need this in order to elapse * Set a per-task tick dependency. RCU need this. Also posix CPU timers
* per task timers. * in order to elapse per task timers.
*/ */
void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit) void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
{ {
/* if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
* We could optimize this with just kicking the target running the task if (tsk == current) {
* if that noise matters for nohz full users. preempt_disable();
*/ tick_nohz_full_kick();
tick_nohz_dep_set_all(&tsk->tick_dep_mask, bit); preempt_enable();
} else {
/*
* Some future tick_nohz_full_kick_task()
* should optimize this.
*/
tick_nohz_full_kick_all();
}
}
} }
EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task); EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);
......
...@@ -45,6 +45,9 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>"); ...@@ -45,6 +45,9 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
static bool disable_onoff_at_boot; static bool disable_onoff_at_boot;
module_param(disable_onoff_at_boot, bool, 0444); module_param(disable_onoff_at_boot, bool, 0444);
static bool ftrace_dump_at_shutdown;
module_param(ftrace_dump_at_shutdown, bool, 0444);
static char *torture_type; static char *torture_type;
static int verbose; static int verbose;
...@@ -527,7 +530,8 @@ static int torture_shutdown(void *arg) ...@@ -527,7 +530,8 @@ static int torture_shutdown(void *arg)
torture_shutdown_hook(); torture_shutdown_hook();
else else
VERBOSE_TOROUT_STRING("No torture_shutdown_hook(), skipping."); VERBOSE_TOROUT_STRING("No torture_shutdown_hook(), skipping.");
rcu_ftrace_dump(DUMP_ALL); if (ftrace_dump_at_shutdown)
rcu_ftrace_dump(DUMP_ALL);
kernel_power_off(); /* Shut down the system. */ kernel_power_off(); /* Shut down the system. */
return 0; return 0;
} }
......
...@@ -15,6 +15,8 @@ ...@@ -15,6 +15,8 @@
#include <linux/delay.h> #include <linux/delay.h>
#include <linux/rwsem.h> #include <linux/rwsem.h>
#include <linux/mm.h> #include <linux/mm.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#define __param(type, name, init, msg) \ #define __param(type, name, init, msg) \
static type name = init; \ static type name = init; \
...@@ -35,14 +37,18 @@ __param(int, test_loop_count, 1000000, ...@@ -35,14 +37,18 @@ __param(int, test_loop_count, 1000000,
__param(int, run_test_mask, INT_MAX, __param(int, run_test_mask, INT_MAX,
"Set tests specified in the mask.\n\n" "Set tests specified in the mask.\n\n"
"\t\tid: 1, name: fix_size_alloc_test\n" "\t\tid: 1, name: fix_size_alloc_test\n"
"\t\tid: 2, name: full_fit_alloc_test\n" "\t\tid: 2, name: full_fit_alloc_test\n"
"\t\tid: 4, name: long_busy_list_alloc_test\n" "\t\tid: 4, name: long_busy_list_alloc_test\n"
"\t\tid: 8, name: random_size_alloc_test\n" "\t\tid: 8, name: random_size_alloc_test\n"
"\t\tid: 16, name: fix_align_alloc_test\n" "\t\tid: 16, name: fix_align_alloc_test\n"
"\t\tid: 32, name: random_size_align_alloc_test\n" "\t\tid: 32, name: random_size_align_alloc_test\n"
"\t\tid: 64, name: align_shift_alloc_test\n" "\t\tid: 64, name: align_shift_alloc_test\n"
"\t\tid: 128, name: pcpu_alloc_test\n" "\t\tid: 128, name: pcpu_alloc_test\n"
"\t\tid: 256, name: kvfree_rcu_1_arg_vmalloc_test\n"
"\t\tid: 512, name: kvfree_rcu_2_arg_vmalloc_test\n"
"\t\tid: 1024, name: kvfree_rcu_1_arg_slab_test\n"
"\t\tid: 2048, name: kvfree_rcu_2_arg_slab_test\n"
/* Add a new test case description here. */ /* Add a new test case description here. */
); );
...@@ -316,6 +322,83 @@ pcpu_alloc_test(void) ...@@ -316,6 +322,83 @@ pcpu_alloc_test(void)
return rv; return rv;
} }
struct test_kvfree_rcu {
struct rcu_head rcu;
unsigned char array[20];
};
static int
kvfree_rcu_1_arg_vmalloc_test(void)
{
struct test_kvfree_rcu *p;
int i;
for (i = 0; i < test_loop_count; i++) {
p = vmalloc(1 * PAGE_SIZE);
if (!p)
return -1;
p->array[0] = 'a';
kvfree_rcu(p);
}
return 0;
}
static int
kvfree_rcu_2_arg_vmalloc_test(void)
{
struct test_kvfree_rcu *p;
int i;
for (i = 0; i < test_loop_count; i++) {
p = vmalloc(1 * PAGE_SIZE);
if (!p)
return -1;
p->array[0] = 'a';
kvfree_rcu(p, rcu);
}
return 0;
}
static int
kvfree_rcu_1_arg_slab_test(void)
{
struct test_kvfree_rcu *p;
int i;
for (i = 0; i < test_loop_count; i++) {
p = kmalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return -1;
p->array[0] = 'a';
kvfree_rcu(p);
}
return 0;
}
static int
kvfree_rcu_2_arg_slab_test(void)
{
struct test_kvfree_rcu *p;
int i;
for (i = 0; i < test_loop_count; i++) {
p = kmalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return -1;
p->array[0] = 'a';
kvfree_rcu(p, rcu);
}
return 0;
}
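/*
 * Note on the two call forms exercised above (a reading of the kvfree_rcu()
 * API as of this series, so treat it as a sketch rather than a reference):
 * the two-argument form kvfree_rcu(p, rcu) uses the rcu_head embedded in
 * struct test_kvfree_rcu and may be called where sleeping is forbidden,
 * while the single-argument "headless" form kvfree_rcu(p) needs no rcu_head
 * but may need to allocate or block, so it is restricted to sleepable
 * context.
 */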
struct test_case_desc { struct test_case_desc {
const char *test_name; const char *test_name;
int (*test_func)(void); int (*test_func)(void);
...@@ -330,6 +413,10 @@ static struct test_case_desc test_case_array[] = { ...@@ -330,6 +413,10 @@ static struct test_case_desc test_case_array[] = {
{ "random_size_align_alloc_test", random_size_align_alloc_test }, { "random_size_align_alloc_test", random_size_align_alloc_test },
{ "align_shift_alloc_test", align_shift_alloc_test }, { "align_shift_alloc_test", align_shift_alloc_test },
{ "pcpu_alloc_test", pcpu_alloc_test }, { "pcpu_alloc_test", pcpu_alloc_test },
{ "kvfree_rcu_1_arg_vmalloc_test", kvfree_rcu_1_arg_vmalloc_test },
{ "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test },
{ "kvfree_rcu_1_arg_slab_test", kvfree_rcu_1_arg_slab_test },
{ "kvfree_rcu_2_arg_slab_test", kvfree_rcu_2_arg_slab_test },
/* Add a new test case here. */ /* Add a new test case here. */
}; };
......
...@@ -373,14 +373,14 @@ static void memcg_destroy_list_lru_node(struct list_lru_node *nlru) ...@@ -373,14 +373,14 @@ static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
struct list_lru_memcg *memcg_lrus; struct list_lru_memcg *memcg_lrus;
/* /*
* This is called when shrinker has already been unregistered, * This is called when shrinker has already been unregistered,
* and nobody can use it. So, there is no need to use kvfree_rcu(). * and nobody can use it. So, there is no need to use kvfree_rcu_local().
*/ */
memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true); memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
__memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids); __memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
kvfree(memcg_lrus); kvfree(memcg_lrus);
} }
static void kvfree_rcu(struct rcu_head *head) static void kvfree_rcu_local(struct rcu_head *head)
{ {
struct list_lru_memcg *mlru; struct list_lru_memcg *mlru;
...@@ -419,7 +419,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru, ...@@ -419,7 +419,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
rcu_assign_pointer(nlru->memcg_lrus, new); rcu_assign_pointer(nlru->memcg_lrus, new);
spin_unlock_irq(&nlru->lock); spin_unlock_irq(&nlru->lock);
call_rcu(&old->rcu, kvfree_rcu); call_rcu(&old->rcu, kvfree_rcu_local);
return 0; return 0;
} }
......
...@@ -3171,6 +3171,7 @@ void exit_mmap(struct mm_struct *mm) ...@@ -3171,6 +3171,7 @@ void exit_mmap(struct mm_struct *mm)
if (vma->vm_flags & VM_ACCOUNT) if (vma->vm_flags & VM_ACCOUNT)
nr_accounted += vma_pages(vma); nr_accounted += vma_pages(vma);
vma = remove_vma(vma); vma = remove_vma(vma);
cond_resched();
} }
vm_unacct_memory(nr_accounted); vm_unacct_memory(nr_accounted);
} }
......
...@@ -1973,7 +1973,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority) ...@@ -1973,7 +1973,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
/* /*
* Before updating sk_refcnt, we must commit prior changes to memory * Before updating sk_refcnt, we must commit prior changes to memory
* (Documentation/RCU/rculist_nulls.txt for details) * (Documentation/RCU/rculist_nulls.rst for details)
*/ */
smp_wmb(); smp_wmb();
refcount_set(&newsk->sk_refcnt, 2); refcount_set(&newsk->sk_refcnt, 2);
...@@ -3035,7 +3035,7 @@ void sock_init_data(struct socket *sock, struct sock *sk) ...@@ -3035,7 +3035,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
sk_rx_queue_clear(sk); sk_rx_queue_clear(sk);
/* /*
* Before updating sk_refcnt, we must commit prior changes to memory * Before updating sk_refcnt, we must commit prior changes to memory
* (Documentation/RCU/rculist_nulls.txt for details) * (Documentation/RCU/rculist_nulls.rst for details)
*/ */
smp_wmb(); smp_wmb();
refcount_set(&sk->sk_refcnt, 1); refcount_set(&sk->sk_refcnt, 1);
......
...@@ -32,11 +32,11 @@ if test -z "$TORTURE_TRUST_MAKE" ...@@ -32,11 +32,11 @@ if test -z "$TORTURE_TRUST_MAKE"
then then
make clean > $resdir/Make.clean 2>&1 make clean > $resdir/Make.clean 2>&1
fi fi
make $TORTURE_DEFCONFIG > $resdir/Make.defconfig.out 2>&1 make $TORTURE_KMAKE_ARG $TORTURE_DEFCONFIG > $resdir/Make.defconfig.out 2>&1
mv .config .config.sav mv .config .config.sav
sh $T/upd.sh < .config.sav > .config sh $T/upd.sh < .config.sav > .config
cp .config .config.new cp .config .config.new
yes '' | make oldconfig > $resdir/Make.oldconfig.out 2> $resdir/Make.oldconfig.err yes '' | make $TORTURE_KMAKE_ARG oldconfig > $resdir/Make.oldconfig.out 2> $resdir/Make.oldconfig.err
# verify new config matches specification. # verify new config matches specification.
configcheck.sh .config $c configcheck.sh .config $c
......
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Scan standard input for error messages, dumping any found to standard
# output.
#
# Usage: console-badness.sh
#
# Copyright (C) 2020 Facebook, Inc.
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
grep -v 'ODEBUG: ' |
grep -v 'This means that this is a DEBUG kernel and it is' |
grep -v 'Warning: unable to open an initial console'
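# Example use (hypothetical path), mirroring the parse-console.sh call below:
#	console-badness.sh < /path/to/res/TREE01/console.log > console.diags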
...@@ -215,9 +215,6 @@ identify_qemu_args () { ...@@ -215,9 +215,6 @@ identify_qemu_args () {
then then
echo -device spapr-vlan,netdev=net0,mac=$TORTURE_QEMU_MAC echo -device spapr-vlan,netdev=net0,mac=$TORTURE_QEMU_MAC
echo -netdev bridge,br=br0,id=net0 echo -netdev bridge,br=br0,id=net0
elif test -n "$TORTURE_QEMU_INTERACTIVE"
then
echo -net nic -net user
fi fi
;; ;;
esac esac
...@@ -234,7 +231,7 @@ identify_qemu_args () { ...@@ -234,7 +231,7 @@ identify_qemu_args () {
# Returns the number of virtual CPUs available to the aggregate of the # Returns the number of virtual CPUs available to the aggregate of the
# guest OSes. # guest OSes.
identify_qemu_vcpus () { identify_qemu_vcpus () {
lscpu | grep '^CPU(s):' | sed -e 's/CPU(s)://' lscpu | grep '^CPU(s):' | sed -e 's/CPU(s)://' -e 's/[ ]*//g'
} }
# print_bug # print_bug
...@@ -275,3 +272,21 @@ specify_qemu_cpus () { ...@@ -275,3 +272,21 @@ specify_qemu_cpus () {
esac esac
fi fi
} }
# specify_qemu_net qemu-args
#
# Appends a string containing "-net none" to qemu-args, unless the incoming
# qemu-args already contains a "-net" argument (in which case nothing is
# appended) or unless the TORTURE_QEMU_INTERACTIVE environment variable is
# set, in which case the string appended is instead "-net nic -net user".
specify_qemu_net () {
if echo $1 | grep -q -e -net
then
echo $1
elif test -n "$TORTURE_QEMU_INTERACTIVE"
then
echo $1 -net nic -net user
else
echo $1 -net none
fi
}
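# Illustrative use (values invented): with TORTURE_QEMU_INTERACTIVE unset and
# no "-net" already present,
#	qemu_args="`specify_qemu_net "$qemu_args"`"
# appends "-net none", matching the call added to kvm-test-1-run.sh below.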
...@@ -46,6 +46,12 @@ do ...@@ -46,6 +46,12 @@ do
exit 0; exit 0;
fi fi
# Check for stop request.
if test -f "$TORTURE_STOPFILE"
then
exit 1;
fi
# Set affinity to randomly selected online CPU # Set affinity to randomly selected online CPU
if cpus=`grep 1 /sys/devices/system/cpu/*/online 2>&1 | if cpus=`grep 1 /sys/devices/system/cpu/*/online 2>&1 |
sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'` sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'`
......
...@@ -9,6 +9,12 @@ ...@@ -9,6 +9,12 @@
# #
# Authors: Paul E. McKenney <paulmck@linux.ibm.com> # Authors: Paul E. McKenney <paulmck@linux.ibm.com>
if test -f "$TORTURE_STOPFILE"
then
echo "kvm-build.sh early exit due to run STOP request"
exit 1
fi
config_template=${1} config_template=${1}
if test -z "$config_template" -o ! -f "$config_template" -o ! -r "$config_template" if test -z "$config_template" -o ! -f "$config_template" -o ! -r "$config_template"
then then
......
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0+
#
# Run a group of kvm.sh tests on the specified commits. This currently
# unconditionally does three-minute runs on each scenario in CFLIST,
# taking advantage of all available CPUs and trusting the "make" utility.
# In the short term, adjustments can be made by editing this script and
# CFLIST. If some adjustments appear to have ongoing value, this script
# might grow some command-line arguments.
#
# Usage: kvm-check-branches.sh commit1 commit2..commit3 commit4 ...
#
# This script considers its arguments one at a time. If more elaborate
# specification of commits is needed, please use "git rev-list" to
# produce something that this simple script can understand. The reason
# for retaining the simplicity is that it allows the user to more easily
# see which commit came from which branch.
#
# This script creates a yyyy.mm.dd-hh.mm.ss-group entry in the "res"
# directory. The calls to kvm.sh create the usual entries, but this script
# moves them under the yyyy.mm.dd-hh.mm.ss-group entry, each in its own
# directory numbered in run order, that is, "0001", "0002", and so on.
# For successful runs, the large build artifacts are removed, which reduces
# the disk space required by about two orders of magnitude.
#
# Copyright (C) Facebook, 2020
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
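# Hypothetical invocation: each argument is handed to "git rev-list", and
# every resulting commit gets its own three-minute kvm.sh run:
#	kvm-check-branches.sh HEAD~2..HEAD my-topic-base..my-topic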
if ! git status > /dev/null 2>&1
then
echo '!!!' This script needs to run in a git archive. 1>&2
echo '!!!' Giving up. 1>&2
exit 1
fi
# Remember where we started so that we can get back at the end.
curcommit="`git status | head -1 | awk '{ print $NF }'`"
nfail=0
ntry=0
resdir="tools/testing/selftests/rcutorture/res"
ds="`date +%Y.%m.%d-%H.%M.%S`-group"
if ! test -e $resdir
then
mkdir $resdir || :
fi
mkdir $resdir/$ds
echo Results directory: $resdir/$ds
KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM
PATH=${KVM}/bin:$PATH; export PATH
. functions.sh
cpus="`identify_qemu_vcpus`"
echo Using up to $cpus CPUs.
# Each pass through this loop does one command-line argument.
for gitbr in $@
do
echo ' --- git branch ' $gitbr
# Each pass through this loop tests one commit.
for i in `git rev-list "$gitbr"`
do
ntry=`expr $ntry + 1`
idir=`awk -v ntry="$ntry" 'END { printf "%04d", ntry; }' < /dev/null`
echo ' --- commit ' $i from branch $gitbr
date
mkdir $resdir/$ds/$idir
echo $gitbr > $resdir/$ds/$idir/gitbr
echo $i >> $resdir/$ds/$idir/gitbr
# Test the specified commit.
git checkout $i > $resdir/$ds/$idir/git-checkout.out 2>&1
echo git checkout return code: $? "(Commit $ntry: $i)"
kvm.sh --cpus $cpus --duration 3 --trust-make > $resdir/$ds/$idir/kvm.sh.out 2>&1
ret=$?
echo kvm.sh return code $ret for commit $i from branch $gitbr
# Move the build products to their resting place.
runresdir="`grep -m 1 '^Results directory:' < $resdir/$ds/$idir/kvm.sh.out | sed -e 's/^Results directory://'`"
mv $runresdir $resdir/$ds/$idir
rrd="`echo $runresdir | sed -e 's,^.*/,,'`"
echo Run results: $resdir/$ds/$idir/$rrd
if test "$ret" -ne 0
then
# Failure, so leave all evidence intact.
nfail=`expr $nfail + 1`
else
# Success, so remove large files to save about 1GB.
( cd $resdir/$ds/$idir/$rrd; rm -f */vmlinux */bzImage */System.map */Module.symvers )
fi
done
done
date
# Go back to the original commit.
git checkout "$curcommit"
if test $nfail -ne 0
then
echo '!!! ' $nfail failures in $ntry 'runs!!!'
exit 1
else
echo No failures in $ntry runs.
exit 0
fi
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Analyze a given results directory for refscale performance measurements.
#
# Usage: kvm-recheck-refscale.sh resdir
#
# Copyright (C) IBM Corporation, 2016
#
# Authors: Paul E. McKenney <paulmck@linux.ibm.com>
i="$1"
if test -d "$i" -a -r "$i"
then
:
else
echo Unreadable results directory: $i
exit 1
fi
PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH
. functions.sh
configfile=`echo $i | sed -e 's/^.*\///'`
sed -e 's/^\[[^]]*]//' < $i/console.log | tr -d '\015' |
awk -v configfile="$configfile" '
/^[ ]*Runs Time\(ns\) *$/ {
if (dataphase + 0 == 0) {
dataphase = 1;
# print configfile, $0;
}
next;
}
/[^ ]*[0-9][0-9]* [0-9][0-9]*\.[0-9][0-9]*$/ {
if (dataphase == 1) {
# print $0;
readertimes[++n] = $2;
sum += $2;
}
next;
}
{
if (dataphase == 1)
			dataphase = 2;
next;
}
END {
print configfile " results:";
newNR = asort(readertimes);
if (newNR <= 0) {
print "No refscale records found???"
exit;
}
medianidx = int(newNR / 2);
if (newNR == medianidx * 2)
medianvalue = (readertimes[medianidx - 1] + readertimes[medianidx]) / 2;
else
medianvalue = readertimes[medianidx];
points = "Points:";
for (i = 1; i <= newNR; i++)
points = points " " readertimes[i];
print points;
print "Average reader duration: " sum / newNR " nanoseconds";
print "Minimum reader duration: " readertimes[1];
print "Median reader duration: " medianvalue;
print "Maximum reader duration: " readertimes[newNR];
print "Computed from refscale printk output.";
}'
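# The awk script above keys off refscale console output of roughly this shape
# (numbers invented for illustration):
#	Runs	Time(ns)
#	10000	45.31
#	10000	46.02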
...@@ -31,6 +31,7 @@ do ...@@ -31,6 +31,7 @@ do
head -1 $resdir/log head -1 $resdir/log
fi fi
TORTURE_SUITE="`cat $i/../TORTURE_SUITE`" TORTURE_SUITE="`cat $i/../TORTURE_SUITE`"
configfile=`echo $i | sed -e 's,^.*/,,'`
rm -f $i/console.log.*.diags rm -f $i/console.log.*.diags
kvm-recheck-${TORTURE_SUITE}.sh $i kvm-recheck-${TORTURE_SUITE}.sh $i
if test -f "$i/qemu-retval" && test "`cat $i/qemu-retval`" -ne 0 && test "`cat $i/qemu-retval`" -ne 137 if test -f "$i/qemu-retval" && test "`cat $i/qemu-retval`" -ne 0 && test "`cat $i/qemu-retval`" -ne 137
...@@ -43,7 +44,8 @@ do ...@@ -43,7 +44,8 @@ do
then then
echo QEMU killed echo QEMU killed
fi fi
configcheck.sh $i/.config $i/ConfigFragment configcheck.sh $i/.config $i/ConfigFragment > $T 2>&1
cat $T
if test -r $i/Make.oldconfig.err if test -r $i/Make.oldconfig.err
then then
cat $i/Make.oldconfig.err cat $i/Make.oldconfig.err
...@@ -55,15 +57,15 @@ do ...@@ -55,15 +57,15 @@ do
cat $i/Warnings cat $i/Warnings
fi fi
else else
if test -f "$i/qemu-cmd" if test -f "$i/buildonly"
then
print_bug qemu failed
echo " $i"
elif test -f "$i/buildonly"
then then
echo Build-only run, no boot/test echo Build-only run, no boot/test
configcheck.sh $i/.config $i/ConfigFragment configcheck.sh $i/.config $i/ConfigFragment
parse-build.sh $i/Make.out $configfile parse-build.sh $i/Make.out $configfile
elif test -f "$i/qemu-cmd"
then
print_bug qemu failed
echo " $i"
else else
print_bug Build failed print_bug Build failed
echo " $i" echo " $i"
...@@ -72,7 +74,11 @@ do ...@@ -72,7 +74,11 @@ do
done done
if test -f "$rd/kcsan.sum" if test -f "$rd/kcsan.sum"
then then
if test -s "$rd/kcsan.sum" if grep -q CONFIG_KCSAN=y $T
then
echo "Compiler or architecture does not support KCSAN!"
echo Did you forget to switch your compiler with '--kmake-arg CC=<cc-that-supports-kcsan>'?
elif test -s "$rd/kcsan.sum"
then then
echo KCSAN summary in $rd/kcsan.sum echo KCSAN summary in $rd/kcsan.sum
else else
......
...@@ -124,7 +124,6 @@ seconds=$4 ...@@ -124,7 +124,6 @@ seconds=$4
qemu_args=$5 qemu_args=$5
boot_args=$6 boot_args=$6
cd $KVM
kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null` kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null`
if test -z "$TORTURE_BUILDONLY" if test -z "$TORTURE_BUILDONLY"
then then
...@@ -141,6 +140,7 @@ then ...@@ -141,6 +140,7 @@ then
cpu_count=$TORTURE_ALLOTED_CPUS cpu_count=$TORTURE_ALLOTED_CPUS
fi fi
qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`" qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`"
qemu_args="`specify_qemu_net "$qemu_args"`"
# Generate architecture-specific and interaction-specific qemu arguments # Generate architecture-specific and interaction-specific qemu arguments
qemu_args="$qemu_args `identify_qemu_args "$QEMU" "$resdir/console.log"`" qemu_args="$qemu_args `identify_qemu_args "$QEMU" "$resdir/console.log"`"
...@@ -152,6 +152,7 @@ qemu_append="`identify_qemu_append "$QEMU"`" ...@@ -152,6 +152,7 @@ qemu_append="`identify_qemu_append "$QEMU"`"
boot_args="`configfrag_boot_params "$boot_args" "$config_template"`" boot_args="`configfrag_boot_params "$boot_args" "$config_template"`"
# Generate kernel-version-specific boot parameters # Generate kernel-version-specific boot parameters
boot_args="`per_version_boot_params "$boot_args" $resdir/.config $seconds`" boot_args="`per_version_boot_params "$boot_args" $resdir/.config $seconds`"
echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd
if test -n "$TORTURE_BUILDONLY" if test -n "$TORTURE_BUILDONLY"
then then
...@@ -159,9 +160,16 @@ then ...@@ -159,9 +160,16 @@ then
touch $resdir/buildonly touch $resdir/buildonly
exit 0 exit 0
fi fi
# Decorate qemu-cmd with redirection, backgrounding, and PID capture
sed -e 's/$/ 2>\&1 \&/' < $resdir/qemu-cmd > $T/qemu-cmd
echo 'echo $! > $resdir/qemu_pid' >> $T/qemu-cmd
# In case qemu refuses to run...
echo "NOTE: $QEMU either did not run or was interactive" > $resdir/console.log echo "NOTE: $QEMU either did not run or was interactive" > $resdir/console.log
echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd
( $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append "$qemu_append $boot_args" > $resdir/qemu-output 2>&1 & echo $! > $resdir/qemu_pid; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) & # Attempt to run qemu
( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) &
commandcompleted=0 commandcompleted=0
sleep 10 # Give qemu's pid a chance to reach the file sleep 10 # Give qemu's pid a chance to reach the file
if test -s "$resdir/qemu_pid" if test -s "$resdir/qemu_pid"
...@@ -181,7 +189,7 @@ do ...@@ -181,7 +189,7 @@ do
kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1 if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1
then then
if test $kruntime -ge $seconds if test $kruntime -ge $seconds -o -f "$TORTURE_STOPFILE"
then then
break; break;
fi fi
...@@ -210,10 +218,19 @@ then ...@@ -210,10 +218,19 @@ then
fi fi
if test $commandcompleted -eq 0 -a -n "$qemu_pid" if test $commandcompleted -eq 0 -a -n "$qemu_pid"
then then
echo Grace period for qemu job at pid $qemu_pid if ! test -f "$TORTURE_STOPFILE"
then
echo Grace period for qemu job at pid $qemu_pid
fi
oldline="`tail $resdir/console.log`" oldline="`tail $resdir/console.log`"
while : while :
do do
if test -f "$TORTURE_STOPFILE"
then
echo "PID $qemu_pid killed due to run STOP request" >> $resdir/Warnings 2>&1
kill -KILL $qemu_pid
break
fi
kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null` kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
if kill -0 $qemu_pid > /dev/null 2>&1 if kill -0 $qemu_pid > /dev/null 2>&1
then then
......
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Transform a qemu-cmd file to allow reuse.
#
# Usage: kvm-transform.sh bzImage console.log < qemu-cmd-in > qemu-cmd-out
#
# bzImage: Kernel and initrd from the same prior kvm.sh run.
# console.log: File into which to place console output.
#
# The original qemu-cmd file is provided on standard input.
# The transformed qemu-cmd file is on standard output.
# The transformation assumes that the qemu command is confined to a
# single line. It also assumes no whitespace in filenames.
#
# Copyright (C) 2020 Facebook, Inc.
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
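# Concrete example (paths are hypothetical): reuse a prior run's kernel and
# qemu command, sending console output to a fresh log; the transformed file
# can then be run by hand:
#	kvm-transform.sh /tmp/prior/TREE01/bzImage /tmp/rerun/console.log \
#		< /tmp/prior/TREE01/qemu-cmd > /tmp/rerun/qemu-cmd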
image="$1"
if test -z "$image"
then
echo Need kernel image file.
exit 1
fi
consolelog="$2"
if test -z "$consolelog"
then
echo "Need console log file name."
exit 1
fi
awk -v image="$image" -v consolelog="$consolelog" '
{
line = "";
for (i = 1; i <= NF; i++) {
if (line == "")
line = $i;
else
line = line " " $i;
if ($i == "-serial") {
i++;
line = line " file:" consolelog;
}
if ($i == "-kernel") {
i++;
line = line " " image;
}
}
print line;
}'
...@@ -73,6 +73,10 @@ usage () { ...@@ -73,6 +73,10 @@ usage () {
while test $# -gt 0 while test $# -gt 0
do do
case "$1" in case "$1" in
--allcpus)
cpus=$TORTURE_ALLOTED_CPUS
max_cpus=$TORTURE_ALLOTED_CPUS
;;
--bootargs|--bootarg) --bootargs|--bootarg)
checkarg --bootargs "(list of kernel boot arguments)" "$#" "$2" '.*' '^--' checkarg --bootargs "(list of kernel boot arguments)" "$#" "$2" '.*' '^--'
TORTURE_BOOTARGS="$2" TORTURE_BOOTARGS="$2"
...@@ -180,13 +184,14 @@ do ...@@ -180,13 +184,14 @@ do
shift shift
;; ;;
--torture) --torture)
checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\)$' '^--' checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\|refscale\)$' '^--'
TORTURE_SUITE=$2 TORTURE_SUITE=$2
shift shift
if test "$TORTURE_SUITE" = rcuperf if test "$TORTURE_SUITE" = rcuperf || test "$TORTURE_SUITE" = refscale
then then
# If you really want jitter for rcuperf, specify # If you really want jitter for refscale or
# it after specifying rcuperf. (But why?) # rcuperf, specify it after specifying the rcuperf
# or the refscale. (But why jitter in these cases?)
jitter=0 jitter=0
fi fi
;; ;;
...@@ -333,6 +338,8 @@ then ...@@ -333,6 +338,8 @@ then
mkdir -p "$resdir" || : mkdir -p "$resdir" || :
fi fi
mkdir $resdir/$ds mkdir $resdir/$ds
TORTURE_RESDIR="$resdir/$ds"; export TORTURE_RESDIR
TORTURE_STOPFILE="$resdir/$ds/STOP"; export TORTURE_STOPFILE
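# Sketch of how a run can be stopped early (inferred from the STOP-file checks
# in kvm-build.sh and kvm-test-1-run.sh rather than from a documented
# interface): touch the file named by $TORTURE_STOPFILE, that is, the STOP
# file in this run's results directory.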
echo Results directory: $resdir/$ds echo Results directory: $resdir/$ds
echo $scriptname $args echo $scriptname $args
touch $resdir/$ds/log touch $resdir/$ds/log
...@@ -497,3 +504,7 @@ fi ...@@ -497,3 +504,7 @@ fi
# Tracing: trace_event=rcu:rcu_grace_period,rcu:rcu_future_grace_period,rcu:rcu_grace_period_init,rcu:rcu_nocb_wake,rcu:rcu_preempt_task,rcu:rcu_unlock_preempted_task,rcu:rcu_quiescent_state_report,rcu:rcu_fqs,rcu:rcu_callback,rcu:rcu_kfree_callback,rcu:rcu_batch_start,rcu:rcu_invoke_callback,rcu:rcu_invoke_kfree_callback,rcu:rcu_batch_end,rcu:rcu_torture_read,rcu:rcu_barrier # Tracing: trace_event=rcu:rcu_grace_period,rcu:rcu_future_grace_period,rcu:rcu_grace_period_init,rcu:rcu_nocb_wake,rcu:rcu_preempt_task,rcu:rcu_unlock_preempted_task,rcu:rcu_quiescent_state_report,rcu:rcu_fqs,rcu:rcu_callback,rcu:rcu_kfree_callback,rcu:rcu_batch_start,rcu:rcu_invoke_callback,rcu:rcu_invoke_kfree_callback,rcu:rcu_batch_end,rcu:rcu_torture_read,rcu:rcu_barrier
# Function-graph tracing: ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop # Function-graph tracing: ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
# Also --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y" # Also --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y"
# Control buffer size: --bootargs trace_buf_size=3k
# Get trace-buffer dumps on all oopses: --bootargs ftrace_dump_on_oops
# Ditto, but dump only the oopsing CPU: --bootargs ftrace_dump_on_oops=orig_cpu
# Heavy-handed way to also dump on warnings: --bootargs panic_on_warn
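# Example pulling these together (an illustrative invocation, not taken from
# this series):
#	kvm.sh --cpus 8 --duration 10 --trust-make \
#		--kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y" \
#		--bootargs "ftrace_dump_on_oops trace_buf_size=3k"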
...@@ -33,8 +33,8 @@ then ...@@ -33,8 +33,8 @@ then
fi fi
cat /dev/null > $file.diags cat /dev/null > $file.diags
# Check for proper termination, except that rcuperf runs don't indicate this. # Check for proper termination, except for rcuperf and refscale.
if test "$TORTURE_SUITE" != rcuperf if test "$TORTURE_SUITE" != rcuperf && test "$TORTURE_SUITE" != refscale
then then
# check for abject failure # check for abject failure
...@@ -44,11 +44,23 @@ then ...@@ -44,11 +44,23 @@ then
tail -1 | tail -1 |
awk ' awk '
{ {
for (i=NF-8;i<=NF;i++) normalexit = 1;
for (i=NF-8;i<=NF;i++) {
if (i <= 0 || i !~ /^[0-9]*$/) {
bangstring = $0;
gsub(/^\[[^]]*] /, "", bangstring);
print bangstring;
normalexit = 0;
exit 0;
}
sum+=$i; sum+=$i;
}
} }
END { print sum }'` END {
print_bug $title FAILURE, $nerrs instances if (normalexit)
print sum " instances"
}'`
print_bug $title FAILURE, $nerrs
exit exit
fi fi
...@@ -104,10 +116,7 @@ then ...@@ -104,10 +116,7 @@ then
fi fi
fi | tee -a $file.diags fi | tee -a $file.diags
egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for' < $file | console-badness.sh < $file > $T.diags
grep -v 'ODEBUG: ' |
grep -v 'This means that this is a DEBUG kernel and it is' |
grep -v 'Warning: unable to open an initial console' > $T.diags
if test -s $T.diags if test -s $T.diags
then then
print_warning "Assertion failure in $file $title" print_warning "Assertion failure in $file $title"
......
CONFIG_RCU_REF_SCALE_TEST=y
CONFIG_PRINTK_TIME=y
CONFIG_SMP=y
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
#CHECK#CONFIG_PREEMPT_RCU=n
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ_FULL=n
CONFIG_RCU_FAST_NO_HZ=n
CONFIG_HOTPLUG_CPU=n
CONFIG_SUSPEND=n
CONFIG_HIBERNATION=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
CONFIG_SMP=y
CONFIG_PREEMPT_NONE=n
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=y
#CHECK#CONFIG_PREEMPT_RCU=y
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ_FULL=n
CONFIG_RCU_FAST_NO_HZ=n
CONFIG_HOTPLUG_CPU=n
CONFIG_SUSPEND=n
CONFIG_HIBERNATION=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Torture-suite-dependent shell functions for the rest of the scripts.
#
# Copyright (C) IBM Corporation, 2015
#
# Authors: Paul E. McKenney <paulmck@linux.ibm.com>
# per_version_boot_params bootparam-string config-file seconds
#
# Adds per-version torture-module parameters to kernels supporting them.
per_version_boot_params () {
echo $1 refscale.shutdown=1 \
refscale.verbose=1
}