Commit 8f0cb666 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:

 - kfree_rcu updates

 - RCU tasks updates

 - Read-side scalability tests

 - SRCU updates

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

* tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits)
  torture: Remove obsolete "cd $KVM"
  torture: Avoid duplicate specification of qemu command
  torture: Dump ftrace at shutdown only if requested
  torture: Add kvm-tranform.sh script for qemu-cmd files
  torture: Add more tracing crib notes to kvm.sh
  torture: Improve diagnostic for KCSAN-incapable compilers
  torture: Correctly summarize build-only runs
  torture: Pass --kmake-arg to all make invocations
  rcutorture: Check for unwatched readers
  torture: Abstract out console-log error detection
  torture: Add a stop-run capability
  torture: Create qemu-cmd in --buildonly runs
  rcu/rcutorture: Replace 0 with false
  torture: Add --allcpus argument to the kvm.sh script
  torture: Remove whitespace from identify_qemu_vcpus output
  rcutorture: NULL rcu_torture_current earlier in cleanup code
  rcutorture: Handle non-statistic bang-string error messages
  torture: Set configfile variable to current scenario
  rcutorture: Add races with task-exit processing
  locktorture: Use true and false to assign to bool variables
  ...
parents 5ece0817 c1cc4784
......@@ -2583,7 +2583,12 @@ not work to have these markers in the trampoline itself, because there
would need to be instructions following ``rcu_read_unlock()``. Although
``synchronize_rcu()`` would guarantee that execution reached the
``rcu_read_unlock()``, it would not be able to guarantee that execution
had completely left the trampoline.
had completely left the trampoline. Worse yet, in some situations
the trampoline's protection must extend a few instructions *prior* to
execution reaching the trampoline. For example, these few instructions
might calculate the address of the trampoline, so that entering the
trampoline would be pre-ordained a surprisingly long time before execution
actually reached the trampoline itself.
The solution, in the form of `Tasks
RCU <https://lwn.net/Articles/607117/>`__, is to have implicit read-side
......
.. SPDX-License-Identifier: GPL-2.0
================================
Review Checklist for RCU Patches
================================
This document contains a checklist for producing and reviewing patches
......@@ -411,18 +415,21 @@ over a rather long period of time, but improvements are always welcome!
__rcu sparse checks to validate your RCU code. These can help
find problems as follows:
CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
CONFIG_PROVE_LOCKING:
check that accesses to RCU-protected data
structures are carried out under the proper RCU
read-side critical section, while holding the right
combination of locks, or whatever other conditions
are appropriate.
CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
CONFIG_DEBUG_OBJECTS_RCU_HEAD:
check that you don't pass the
same object to call_rcu() (or friends) before an RCU
grace period has elapsed since the last time that you
passed that same object to call_rcu() (or friends).
__rcu sparse checks: tag the pointer to the RCU-protected data
__rcu sparse checks:
tag the pointer to the RCU-protected data
structure with __rcu, and sparse will warn you if you
access that pointer without the services of one of the
variants of rcu_dereference().
......@@ -442,8 +449,8 @@ over a rather long period of time, but improvements are always welcome!
You instead need to use one of the barrier functions:
o call_rcu() -> rcu_barrier()
o call_srcu() -> srcu_barrier()
- call_rcu() -> rcu_barrier()
- call_srcu() -> srcu_barrier()
However, these barrier functions are absolutely -not- guaranteed
to wait for a grace period. In fact, if there are no call_rcu()
......
.. SPDX-License-Identifier: GPL-2.0
.. _rcu_concepts:
============
......@@ -8,10 +10,17 @@ RCU concepts
:maxdepth: 3
arrayRCU
checklist
lockdep
lockdep-splat
rcubarrier
rcu_dereference
whatisRCU
rcu
rculist_nulls
rcuref
torture
stallwarn
listRCU
NMI-RCU
UP
......
.. SPDX-License-Identifier: GPL-2.0
=================
Lockdep-RCU Splat
=================
Lockdep-RCU was added to the Linux kernel in early 2010
(http://lwn.net/Articles/371986/). This facility checks for some common
misuses of the RCU API, most notably using one of the rcu_dereference()
......@@ -12,55 +18,54 @@ overwriting or worse. There can of course be false positives, this
being the real world and all that.
So let's look at an example RCU lockdep splat from 3.0-rc5, one that
has long since been fixed:
=============================
WARNING: suspicious RCU usage
-----------------------------
block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by scsi_scan_6/1552:
#0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]
scsi_scan_host_selected+0x5a/0x150
#1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]
elevator_exit+0x22/0x60
#2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]
cfq_exit_queue+0x43/0x190
stack backtrace:
Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
Call Trace:
[<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
[<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
[<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
[<ffffffff812a5046>] elevator_exit+0x36/0x60
[<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
[<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
[<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
[<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
[<ffffffff817da069>] ? error_exit+0x29/0xb0
[<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
[<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
[<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff817da069>] ? error_exit+0x29/0xb0
[<ffffffff812bcc60>] ? kobject_del+0x40/0x40
[<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
[<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
[<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
[<ffffffff8145f170>] do_scan_async+0x20/0x160
[<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
[<ffffffff810975b6>] kthread+0xa6/0xb0
[<ffffffff817db154>] kernel_thread_helper+0x4/0x10
[<ffffffff81066430>] ? finish_task_switch+0x80/0x110
[<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
[<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
[<ffffffff817db150>] ? gs_change+0xb/0xb
Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:
has long since been fixed::
=============================
WARNING: suspicious RCU usage
-----------------------------
block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
other info that might help us debug this::
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by scsi_scan_6/1552:
#0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]
scsi_scan_host_selected+0x5a/0x150
#1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]
elevator_exit+0x22/0x60
#2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]
cfq_exit_queue+0x43/0x190
stack backtrace:
Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
Call Trace:
[<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
[<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
[<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
[<ffffffff812a5046>] elevator_exit+0x36/0x60
[<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
[<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
[<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
[<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
[<ffffffff817da069>] ? error_exit+0x29/0xb0
[<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
[<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
[<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff817da069>] ? error_exit+0x29/0xb0
[<ffffffff812bcc60>] ? kobject_del+0x40/0x40
[<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
[<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
[<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
[<ffffffff8145f170>] do_scan_async+0x20/0x160
[<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
[<ffffffff810975b6>] kthread+0xa6/0xb0
[<ffffffff817db154>] kernel_thread_helper+0x4/0x10
[<ffffffff81066430>] ? finish_task_switch+0x80/0x110
[<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
[<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
[<ffffffff817db150>] ? gs_change+0xb/0xb
Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows::
if (rcu_dereference(ioc->ioc_data) == cic) {
......@@ -70,7 +75,7 @@ case. Instead, we hold three locks, one of which might be RCU related.
And maybe that lock really does protect this reference. If so, the fix
is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to
take the struct request_queue "q" from cfq_exit_queue() as an argument,
which would permit us to invoke rcu_dereference_protected as follows:
which would permit us to invoke rcu_dereference_protected as follows::
if (rcu_dereference_protected(ioc->ioc_data,
lockdep_is_held(&q->queue_lock)) == cic) {
......@@ -85,7 +90,7 @@ On the other hand, perhaps we really do need an RCU read-side critical
section. In this case, the critical section must span the use of the
return value from rcu_dereference(), or at least until there is some
reference count incremented or some such. One way to handle this is to
add rcu_read_lock() and rcu_read_unlock() as follows:
add rcu_read_lock() and rcu_read_unlock() as follows::
rcu_read_lock();
if (rcu_dereference(ioc->ioc_data) == cic) {
......@@ -102,7 +107,7 @@ above lockdep-RCU splat.
But in this particular case, we don't actually dereference the pointer
returned from rcu_dereference(). Instead, that pointer is just compared
to the cic pointer, which means that the rcu_dereference() can be replaced
by rcu_access_pointer() as follows:
by rcu_access_pointer() as follows::
if (rcu_access_pointer(ioc->ioc_data) == cic) {
......
.. SPDX-License-Identifier: GPL-2.0
========================
RCU and lockdep checking
========================
All flavors of RCU have lockdep checking available, so that lockdep is
aware of when each task enters and leaves any flavor of RCU read-side
......@@ -8,7 +12,7 @@ tracking to include RCU state, which can sometimes help when debugging
deadlocks and the like.
In addition, RCU provides the following primitives that check lockdep's
state:
state::
rcu_read_lock_held() for normal RCU.
rcu_read_lock_bh_held() for RCU-bh.
......@@ -63,7 +67,7 @@ checking of rcu_dereference() primitives:
The rcu_dereference_check() check expression can be any boolean
expression, but would normally include a lockdep expression. However,
any boolean expression can be used. For a moderately ornate example,
consider the following:
consider the following::
file = rcu_dereference_check(fdt->fd[fd],
lockdep_is_held(&files->file_lock) ||
......@@ -82,7 +86,7 @@ RCU read-side critical sections, in case (2) the ->file_lock prevents
any change from taking place, and finally, in case (3) the current task
is the only task accessing the file_struct, again preventing any change
from taking place. If the above statement was invoked only from updater
code, it could instead be written as follows:
code, it could instead be written as follows::
file = rcu_dereference_protected(fdt->fd[fd],
lockdep_is_held(&files->file_lock) ||
......@@ -105,7 +109,7 @@ false and they are called from outside any RCU read-side critical section.
For example, the workqueue for_each_pwq() macro is intended to be used
either within an RCU read-side critical section or with wq->mutex held.
It is thus implemented as follows:
It is thus implemented as follows::
#define for_each_pwq(pwq, wq)
list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node,
......
Using hlist_nulls to protect read-mostly linked lists and
.. SPDX-License-Identifier: GPL-2.0
=================================================
Using RCU hlist_nulls to protect list and objects
=================================================
This section describes how to use hlist_nulls to
protect read-mostly linked lists and
objects using SLAB_TYPESAFE_BY_RCU allocations.
Please read the basics in Documentation/RCU/listRCU.rst
Using 'nulls'
=============
Using special makers (called 'nulls') is a convenient way
to solve following problem :
......@@ -12,63 +22,68 @@ use following algos :
1) Lookup algo
--------------
rcu_read_lock()
begin:
obj = lockless_lookup(key);
if (obj) {
if (!try_get_ref(obj)) // might fail for free objects
goto begin;
/*
* Because a writer could delete object, and a writer could
* reuse these object before the RCU grace period, we
* must check key after getting the reference on object
*/
if (obj->key != key) { // not the object we expected
put_ref(obj);
goto begin;
}
}
rcu_read_unlock();
::
rcu_read_lock()
begin:
obj = lockless_lookup(key);
if (obj) {
if (!try_get_ref(obj)) // might fail for free objects
goto begin;
/*
* Because a writer could delete object, and a writer could
* reuse these object before the RCU grace period, we
* must check key after getting the reference on object
*/
if (obj->key != key) { // not the object we expected
put_ref(obj);
goto begin;
}
}
rcu_read_unlock();
Beware that lockless_lookup(key) cannot use traditional hlist_for_each_entry_rcu()
but a version with an additional memory barrier (smp_rmb())
lockless_lookup(key)
{
struct hlist_node *node, *next;
for (pos = rcu_dereference((head)->first);
pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
pos = rcu_dereference(next))
if (obj->key == key)
return obj;
return NULL;
And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb() :
::
struct hlist_node *node;
for (pos = rcu_dereference((head)->first);
pos && ({ prefetch(pos->next); 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
pos = rcu_dereference(pos->next))
lockless_lookup(key)
{
struct hlist_node *node, *next;
for (pos = rcu_dereference((head)->first);
pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
pos = rcu_dereference(next))
if (obj->key == key)
return obj;
return NULL;
}
Quoting Corey Minyard :
"If the object is moved from one list to another list in-between the
time the hash is calculated and the next field is accessed, and the
object has moved to the end of a new list, the traversal will not
complete properly on the list it should have, since the object will
be on the end of the new list and there's not a way to tell it's on a
new list and restart the list traversal. I think that this can be
solved by pre-fetching the "next" field (with proper barriers) before
checking the key."
2) Insert algo :
----------------
return obj;
return NULL;
}
And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::
struct hlist_node *node;
for (pos = rcu_dereference((head)->first);
pos && ({ prefetch(pos->next); 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
pos = rcu_dereference(pos->next))
if (obj->key == key)
return obj;
return NULL;
Quoting Corey Minyard::
"If the object is moved from one list to another list in-between the
time the hash is calculated and the next field is accessed, and the
object has moved to the end of a new list, the traversal will not
complete properly on the list it should have, since the object will
be on the end of the new list and there's not a way to tell it's on a
new list and restart the list traversal. I think that this can be
solved by pre-fetching the "next" field (with proper barriers) before
checking the key."
2) Insert algo
--------------
We need to make sure a reader cannot read the new 'obj->obj_next' value
and previous value of 'obj->key'. Or else, an item could be deleted
......@@ -76,21 +91,23 @@ from a chain, and inserted into another chain. If new chain was empty
before the move, 'next' pointer is NULL, and lockless reader can
not detect it missed following items in original chain.
/*
* Please note that new inserts are done at the head of list,
* not in the middle or end.
*/
obj = kmem_cache_alloc(...);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* we need to make sure obj->key is updated before obj->next
* or obj->refcnt
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
hlist_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
::
/*
* Please note that new inserts are done at the head of list,
* not in the middle or end.
*/
obj = kmem_cache_alloc(...);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* we need to make sure obj->key is updated before obj->next
* or obj->refcnt
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
hlist_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
3) Remove algo
......@@ -99,16 +116,22 @@ Nothing special here, we can use a standard RCU hlist deletion.
But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
very very fast (before the end of RCU grace period)
if (put_last_reference_on(obj) {
lock_chain(); // typically a spin_lock()
hlist_del_init_rcu(&obj->obj_node);
unlock_chain(); // typically a spin_unlock()
kmem_cache_free(cachep, obj);
}
::
if (put_last_reference_on(obj) {
lock_chain(); // typically a spin_lock()
hlist_del_init_rcu(&obj->obj_node);
unlock_chain(); // typically a spin_unlock()
kmem_cache_free(cachep, obj);
}
--------------------------------------------------------------------------
Avoiding extra smp_rmb()
========================
With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
and extra smp_wmb() in insert function.
......@@ -124,49 +147,54 @@ scan the list again without harm.
1) lookup algo
--------------
head = &table[slot];
rcu_read_lock();
begin:
hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
if (obj->key == key) {
::
head = &table[slot];
rcu_read_lock();
begin:
hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
if (obj->key == key) {
if (!try_get_ref(obj)) // might fail for free objects
goto begin;
goto begin;
if (obj->key != key) { // not the object we expected
put_ref(obj);
goto begin;
put_ref(obj);
goto begin;
}
goto out;
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
if (get_nulls_value(node) != slot)
goto begin;
obj = NULL;
out:
rcu_read_unlock();
2) Insert function :
--------------------
/*
* Please note that new inserts are done at the head of list,
* not in the middle or end.
*/
obj = kmem_cache_alloc(cachep);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* changes to obj->key must be visible before refcnt one
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
/*
* insert obj in RCU way (readers might be traversing chain)
*/
hlist_nulls_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
goto out;
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
if (get_nulls_value(node) != slot)
goto begin;
obj = NULL;
out:
rcu_read_unlock();
2) Insert function
------------------
::
/*
* Please note that new inserts are done at the head of list,
* not in the middle or end.
*/
obj = kmem_cache_alloc(cachep);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* changes to obj->key must be visible before refcnt one
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
/*
* insert obj in RCU way (readers might be traversing chain)
*/
hlist_nulls_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
Reference-count design for elements of lists/arrays protected by RCU.
.. SPDX-License-Identifier: GPL-2.0
====================================================================
Reference-count design for elements of lists/arrays protected by RCU
====================================================================
Please note that the percpu-ref feature is likely your first
......@@ -12,32 +16,33 @@ please read on.
Reference counting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores are straightforward:
CODE LISTING A:
1. 2.
add() search_and_reference()
{ {
alloc_object read_lock(&list_lock);
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
write_lock(&list_lock); ...
add_element read_unlock(&list_lock);
... ...
write_unlock(&list_lock); }
}
3. 4.
release_referenced() delete()
{ {
... write_lock(&list_lock);
if(atomic_dec_and_test(&el->rc)) ...
kfree(el);
... remove_element
} write_unlock(&list_lock);
...
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
CODE LISTING A::
1. 2.
add() search_and_reference()
{ {
alloc_object read_lock(&list_lock);
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
write_lock(&list_lock); ...
add_element read_unlock(&list_lock);
... ...
write_unlock(&list_lock); }
}
3. 4.
release_referenced() delete()
{ {
... write_lock(&list_lock);
if(atomic_dec_and_test(&el->rc)) ...
kfree(el);
... remove_element
} write_unlock(&list_lock);
...
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
If this list/array is made lock free using RCU as in changing the
write_lock() in add() and delete() to spin_lock() and changing read_lock()
......@@ -46,34 +51,35 @@ search_and_reference() could potentially hold reference to an element which
has already been deleted from the list/array. Use atomic_inc_not_zero()
in this scenario as follows:
CODE LISTING B:
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) {
spin_lock(&list_lock); rcu_read_unlock();
return FAIL;
add_element }
... ...
spin_unlock(&list_lock); rcu_read_unlock();
} }
3. 4.
release_referenced() delete()
{ {
... spin_lock(&list_lock);
if (atomic_dec_and_test(&el->rc)) ...
call_rcu(&el->head, el_free); remove_element
... spin_unlock(&list_lock);
} ...
if (atomic_dec_and_test(&el->rc))
call_rcu(&el->head, el_free);
...
}
CODE LISTING B::
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) {
spin_lock(&list_lock); rcu_read_unlock();
return FAIL;
add_element }
... ...
spin_unlock(&list_lock); rcu_read_unlock();
} }
3. 4.
release_referenced() delete()
{ {
... spin_lock(&list_lock);
if (atomic_dec_and_test(&el->rc)) ...
call_rcu(&el->head, el_free); remove_element
... spin_unlock(&list_lock);
} ...
if (atomic_dec_and_test(&el->rc))
call_rcu(&el->head, el_free);
...
}
Sometimes, a reference to the element needs to be obtained in the
update (write) stream. In such cases, atomic_inc_not_zero() might be
update (write) stream. In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock. One might instead
use atomic_inc() in such cases.
......@@ -82,39 +88,40 @@ search_and_reference() code path. In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:
CODE LISTING C:
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
spin_lock(&list_lock); ...
add_element rcu_read_unlock();
... }
spin_unlock(&list_lock); 4.
} delete()
3. {
release_referenced() spin_lock(&list_lock);
{ ...
... remove_element
if (atomic_dec_and_test(&el->rc)) spin_unlock(&list_lock);
kfree(el); ...
... call_rcu(&el->head, el_free);
} ...
5. }
void el_free(struct rcu_head *rhp)
{
release_referenced();
}
CODE LISTING C::
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
spin_lock(&list_lock); ...
add_element rcu_read_unlock();
... }
spin_unlock(&list_lock); 4.
} delete()
3. {
release_referenced() spin_lock(&list_lock);
{ ...
... remove_element
if (atomic_dec_and_test(&el->rc)) spin_unlock(&list_lock);
kfree(el); ...
... call_rcu(&el->head, el_free);
} ...
5. }
void el_free(struct rcu_head *rhp)
{
release_referenced();
}
The key point is that the initial reference added by add() is not removed
until after a grace period has elapsed following removal. This means that
search_and_reference() cannot find this element, which means that the value
of el->rc cannot increase. Thus, once it reaches zero, there are no
readers that can or ever will be able to reference the element. The
element can therefore safely be freed. This in turn guarantees that if
readers that can or ever will be able to reference the element. The
element can therefore safely be freed. This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.
......@@ -130,21 +137,21 @@ the eventual invocation of kfree(), which is usually not a problem on
modern computer systems, even the small ones.
In cases where delete() can sleep, synchronize_rcu() can be called from
delete(), so that el_free() can be subsumed into delete as follows:
4.
delete()
{
spin_lock(&list_lock);
...
remove_element
spin_unlock(&list_lock);
...
synchronize_rcu();
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
delete(), so that el_free() can be subsumed into delete as follows::
4.
delete()
{
spin_lock(&list_lock);
...
remove_element
spin_unlock(&list_lock);
...
synchronize_rcu();
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
As additional examples in the kernel, the pattern in listing C is used by
reference counting of struct pid, while the pattern in listing B is used by
......
.. SPDX-License-Identifier: GPL-2.0
==============================
Using RCU's CPU Stall Detector
==============================
This document first discusses what sorts of issues RCU's CPU stall
detector can locate, and then discusses kernel parameters and Kconfig
......@@ -7,39 +11,40 @@ this document explains the stall detector's "splat" format.
What Causes RCU CPU Stall Warnings?
===================================
So your kernel printed an RCU CPU stall warning. The next question is
"What caused it?" The following problems can result in RCU CPU stall
warnings:
o A CPU looping in an RCU read-side critical section.
- A CPU looping in an RCU read-side critical section.
o A CPU looping with interrupts disabled.
- A CPU looping with interrupts disabled.
o A CPU looping with preemption disabled.
- A CPU looping with preemption disabled.
o A CPU looping with bottom halves disabled.
- A CPU looping with bottom halves disabled.
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
- For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule(). If the looping in the kernel is
really expected and desirable behavior, you might need to add
some calls to cond_resched().
o Booting Linux using a console connection that is too slow to
- Booting Linux using a console connection that is too slow to
keep up with the boot-time console-message rate. For example,
a 115Kbaud serial console can be -way- too slow to keep up
with boot-time message rates, and will frequently result in
RCU CPU stall warning messages. Especially if you have added
debug printk()s.
o Anything that prevents RCU's grace-period kthreads from running.
- Anything that prevents RCU's grace-period kthreads from running.
This can result in the "All QSes seen" console-log message.
This message will include information on when the kthread last
ran and how often it should be expected to run. It can also
result in the "rcu_.*kthread starved for" console-log message,
result in the ``rcu_.*kthread starved for`` console-log message,
which will include additional debugging information.
o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
- A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
happen to preempt a low-priority task in the middle of an RCU
read-side critical section. This is especially damaging if
that low-priority task is not permitted to run on any other CPU,
......@@ -48,7 +53,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
While the system is in the process of running itself out of
memory, you might see stall-warning messages.
o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
- A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
is running at a higher priority than the RCU softirq threads.
This will prevent RCU callbacks from ever being invoked,
and in a CONFIG_PREEMPT_RCU kernel will further prevent
......@@ -63,7 +68,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
can increase your system's context-switch rate and thus degrade
performance.
o A periodic interrupt whose handler takes longer than the time
- A periodic interrupt whose handler takes longer than the time
interval between successive pairs of interrupts. This can
prevent RCU's kthreads and softirq handlers from running.
Note that certain high-overhead debugging options, for example
......@@ -71,20 +76,27 @@ o A periodic interrupt whose handler takes longer than the time
considerably longer than normal, which can in turn result in
RCU CPU stall warnings.
o Testing a workload on a fast system, tuning the stall-warning
- Testing a workload on a fast system, tuning the stall-warning
timeout down to just barely avoid RCU CPU stall warnings, and then
running the same workload with the same stall-warning timeout on a
slow system. Note that thermal throttling and on-demand governors
can cause a single system to be sometimes fast and sometimes slow!
o A hardware or software issue shuts off the scheduler-clock
- A hardware or software issue shuts off the scheduler-clock
interrupt on a CPU that is not in dyntick-idle mode. This
problem really has happened, and seems to be most likely to
result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
o A bug in the RCU implementation.
- A hardware or software issue that prevents time-based wakeups
from occurring. These issues can range from misconfigured or
buggy timer hardware through bugs in the interrupt or exception
path (whether hardware, firmware, or software) through bugs
in Linux's timer subsystem through bugs in the scheduler, and,
yes, even including bugs in RCU itself.
- A bug in the RCU implementation.
o A hardware failure. This is quite unlikely, but has occurred
- A hardware failure. This is quite unlikely, but has occurred
at least once in real life. A CPU failed in a running system,
becoming unresponsive, but not causing an immediate crash.
This resulted in a series of RCU CPU stall warnings, eventually
......@@ -109,6 +121,7 @@ see include/trace/events/rcu.h.
Fine-Tuning the RCU CPU Stall Detector
======================================
The rcuupdate.rcu_cpu_stall_suppress module parameter disables RCU's
CPU stall detector, which detects conditions that unduly delay RCU grace
......@@ -118,6 +131,7 @@ The stall detector's idea of what constitutes "unduly delayed" is
controlled by a set of kernel configuration variables and cpp macros:
CONFIG_RCU_CPU_STALL_TIMEOUT
----------------------------
This kernel configuration parameter defines the period of time
that RCU will wait from the beginning of a grace period until it
......@@ -137,6 +151,7 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
RCU_STALL_DELAY_DELTA
---------------------
Although the lockdep facility is extremely useful, it does add
some overhead. Therefore, under CONFIG_PROVE_RCU, the
......@@ -145,6 +160,7 @@ RCU_STALL_DELAY_DELTA
macro, not a kernel configuration parameter.)
RCU_STALL_RAT_DELAY
-------------------
The CPU stall detector tries to make the offending CPU print its
own warnings, as this often gives better-quality stack traces.
......@@ -155,6 +171,7 @@ RCU_STALL_RAT_DELAY
parameter.)
rcupdate.rcu_task_stall_timeout
-------------------------------
This boot/sysfs parameter controls the RCU-tasks stall warning
interval. A value of zero or less suppresses RCU-tasks stall
......@@ -168,9 +185,10 @@ rcupdate.rcu_task_stall_timeout
Interpreting RCU's CPU Stall-Detector "Splats"
==============================================
For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
it will print a message similar to the following:
it will print a message similar to the following::
INFO: rcu_sched detected stalls on CPUs/tasks:
2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
......@@ -223,7 +241,7 @@ an estimate of the total number of RCU callbacks queued across all CPUs
(625 in this case).
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
for each CPU:
for each CPU::
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1
......@@ -235,7 +253,7 @@ processing is enabled.
If the grace period ends just as the stall warning starts printing,
there will be a spurious stall-warning message, which will include
the following:
the following::
INFO: Stall ended before state dump start
......@@ -248,7 +266,7 @@ which is overkill for this sort of problem.
If all CPUs and tasks have passed through quiescent states, but the
grace period has nevertheless failed to end, the stall-warning splat
will include something like the following:
will include something like the following::
All QSes seen, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0
......@@ -261,7 +279,7 @@ which is way less than 23807. Finally, the root rcu_node structure's
If the relevant grace-period kthread has been unable to run prior to
the stall warning, as was the case in the "All QSes seen" line above,
the following additional line is printed:
the following additional line is printed::
kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
......@@ -276,6 +294,7 @@ kthread last ran on CPU 5.
Multiple Warnings From One Stall
================================
If a stall lasts long enough, multiple stall-warning messages will be
printed for it. The second and subsequent messages are printed at
......@@ -285,9 +304,10 @@ of the stall and the first message.
Stall Warnings for Expedited Grace Periods
==========================================
If an expedited grace period detects a stall, it will place a message
like the following in dmesg:
like the following in dmesg::
INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/.
......
......@@ -4038,6 +4038,14 @@
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
rcutree.rcu_min_cached_objs= [KNL]
Minimum number of objects which are cached and
maintained per one CPU. Object size is equal
to PAGE_SIZE. The cache allows to reduce the
pressure to page allocator, also it makes the
whole algorithm to behave better in low memory
condition.
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
......@@ -4258,6 +4266,20 @@
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
rcutorture.read_exit= [KNL]
Set the number of read-then-exit kthreads used
to test the interaction of RCU updaters and
task-exit processing.
rcutorture.read_exit_burst= [KNL]
The number of times in a given read-then-exit
episode that a set of read-then-exit kthreads
is spawned.
rcutorture.read_exit_delay= [KNL]
The delay, in seconds, between successive
read-then-exit testing episodes.
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode
......@@ -4407,6 +4429,45 @@
reboot_cpu is s[mp]#### with #### being the processor
to be used for rebooting.
refscale.holdoff= [KNL]
Set test-start holdoff period. The purpose of
this parameter is to delay the start of the
test until boot completes in order to avoid
interference.
refscale.loops= [KNL]
Set the number of loops over the synchronization
primitive under test. Increasing this number
reduces noise due to loop start/end overhead,
but the default has already reduced the per-pass
noise to a handful of picoseconds on ca. 2020
x86 laptops.
refscale.nreaders= [KNL]
Set number of readers. The default value of -1
selects N, where N is roughly 75% of the number
of CPUs. A value of zero is an interesting choice.
refscale.nruns= [KNL]
Set number of runs, each of which is dumped onto
the console log.
refscale.readdelay= [KNL]
Set the read-side critical-section duration,
measured in microseconds.
refscale.scale_type= [KNL]
Specify the read-protection implementation to test.
refscale.shutdown= [KNL]
Shut down the system at the end of the performance
test. This defaults to 1 (shut it down) when
rcuperf is built into the kernel and to 0 (leave
it running) when rcuperf is built as a module.
refscale.verbose= [KNL]
Enable additional printk() statements.
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/admin-guide/cgroup-v1/cpusets.rst.
......@@ -5082,6 +5143,13 @@
Prevent the CPU-hotplug component of torturing
until after init has spawned.
torture.ftrace_dump_at_shutdown= [KNL]
Dump the ftrace buffer at torture-test shutdown,
even if there were no errors. This can be a
very costly operation when many torture tests
are running concurrently, especially on systems
with rotating-rust storage.
tp720= [HW,PS2]
tpm_suspend_pcr=[HW,TPM]
......
......@@ -166,4 +166,4 @@ checked for such errors. The "rmmod" command forces a "SUCCESS",
two are self-explanatory, while the last indicates that while there
were no locking failures, CPU-hotplug problems were detected.
Also see: Documentation/RCU/torture.txt
Also see: Documentation/RCU/torture.rst
......@@ -14449,7 +14449,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
F: Documentation/RCU/
F: include/linux/rcu*
F: kernel/rcu/
X: Documentation/RCU/torture.txt
X: Documentation/RCU/torture.rst
X: include/linux/srcu*.h
X: kernel/rcu/srcu*.c
......@@ -17301,7 +17301,7 @@ M: Josh Triplett <josh@joshtriplett.org>
L: linux-kernel@vger.kernel.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
F: Documentation/RCU/torture.txt
F: Documentation/RCU/torture.rst
F: kernel/locking/locktorture.c
F: kernel/rcu/rcuperf.c
F: kernel/rcu/rcutorture.c
......
......@@ -4541,6 +4541,8 @@ int try_release_extent_mapping(struct page *page, gfp_t mask)
/* once for us */
free_extent_map(em);
cond_resched(); /* Allow large-extent preemption. */
}
}
return try_release_extent_state(tree, page, mask);
......
......@@ -512,7 +512,7 @@ static inline void hlist_replace_rcu(struct hlist_node *old,
* @right: The hlist head on the right
*
* The lists start out as [@left ][node1 ... ] and
[@right ][node2 ... ]
* [@right ][node2 ... ]
* The lists end up as [@left ][node2 ... ]
* [@right ][node1 ... ]
*/
......
......@@ -162,7 +162,7 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
* The barrier() is needed to make sure compiler doesn't cache first element [1],
* as this loop can be restarted [2]
* [1] Documentation/core-api/atomic_ops.rst around line 114
* [2] Documentation/RCU/rculist_nulls.txt around line 146
* [2] Documentation/RCU/rculist_nulls.rst around line 146
*/
#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
for (({barrier();}), \
......
......@@ -828,17 +828,17 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
/*
* Does the specified offset indicate that the corresponding rcu_head
* structure can be handled by kfree_rcu()?
* structure can be handled by kvfree_rcu()?
*/
#define __is_kfree_rcu_offset(offset) ((offset) < 4096)
#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
/*
* Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
*/
#define __kfree_rcu(head, offset) \
#define __kvfree_rcu(head, offset) \
do { \
BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
BUILD_BUG_ON(!__is_kvfree_rcu_offset(offset)); \
kvfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
} while (0)
/**
......@@ -857,7 +857,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
* Because the functions are not allowed in the low-order 4096 bytes of
* kernel virtual memory, offsets up to 4095 bytes can be accommodated.
* If the offset is larger than 4095 bytes, a compile-time error will
* be generated in __kfree_rcu(). If this error is triggered, you can
* be generated in __kvfree_rcu(). If this error is triggered, you can
* either fall back to use of call_rcu() or rearrange the structure to
* position the rcu_head structure into the first 4096 bytes.
*
......@@ -872,7 +872,46 @@ do { \
typeof (ptr) ___p = (ptr); \
\
if (___p) \
__kfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
__kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \
} while (0)
/**
* kvfree_rcu() - kvfree an object after a grace period.
*
* This macro consists of one or two arguments and it is
* based on whether an object is head-less or not. If it
* has a head then a semantic stays the same as it used
* to be before:
*
* kvfree_rcu(ptr, rhf);
*
* where @ptr is a pointer to kvfree(), @rhf is the name
* of the rcu_head structure within the type of @ptr.
*
* When it comes to head-less variant, only one argument
* is passed and that is just a pointer which has to be
* freed after a grace period. Therefore the semantic is
*
* kvfree_rcu(ptr);
*
* where @ptr is a pointer to kvfree().
*
* Please note, head-less way of freeing is permitted to
* use from a context that has to follow might_sleep()
* annotation. Otherwise, please switch and embed the
* rcu_head structure within the type of @ptr.
*/
#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__, \
kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)
#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
#define kvfree_rcu_arg_2(ptr, rhf) kfree_rcu(ptr, rhf)
#define kvfree_rcu_arg_1(ptr) \
do { \
typeof(ptr) ___p = (ptr); \
\
if (___p) \
kvfree_call_rcu(NULL, (rcu_callback_t) (___p)); \
} while (0)
/*
......
......@@ -36,8 +36,8 @@ void rcu_read_unlock_trace_special(struct task_struct *t, int nesting);
/**
* rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section
*
* When synchronize_rcu_trace() is invoked by one task, then that task
* is guaranteed to block until all other tasks exit their read-side
* When synchronize_rcu_tasks_trace() is invoked by one task, then that
* task is guaranteed to block until all other tasks exit their read-side
* critical sections. Similarly, if call_rcu_trace() is invoked on one
* task while other tasks are within RCU read-side critical sections,
* invocation of the corresponding RCU callback is deferred until after
......
......@@ -34,9 +34,25 @@ static inline void synchronize_rcu_expedited(void)
synchronize_rcu();
}
static inline void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
/*
* Add one more declaration of kvfree() here. It is
* not so straight forward to just include <linux/mm.h>
* where it is defined due to getting many compile
* errors caused by that include.
*/
extern void kvfree(const void *addr);
static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
call_rcu(head, func);
if (head) {
call_rcu(head, func);
return;
}
// kvfree_rcu(one_arg) call.
might_sleep();
synchronize_rcu();
kvfree((void *) func);
}
void rcu_qs(void);
......
......@@ -33,7 +33,7 @@ static inline void rcu_virt_note_context_switch(int cpu)
}
void synchronize_rcu_expedited(void);
void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
void rcu_barrier(void);
bool rcu_eqs_special_set(int cpu);
......
......@@ -55,6 +55,11 @@ struct torture_random_state {
#define DEFINE_TORTURE_RANDOM_PERCPU(name) \
DEFINE_PER_CPU(struct torture_random_state, name)
unsigned long torture_random(struct torture_random_state *trsp);
static inline void torture_random_init(struct torture_random_state *trsp)
{
trsp->trs_state = 0;
trsp->trs_count = 0;
}
/* Task shuffler, which causes CPUs to occasionally go idle. */
void torture_shuffle_task_register(struct task_struct *tp);
......
......@@ -435,11 +435,12 @@ TRACE_EVENT_RCU(rcu_fqs,
#endif /* #if defined(CONFIG_TREE_RCU) */
/*
* Tracepoint for dyntick-idle entry/exit events. These take a string
* as argument: "Start" for entering dyntick-idle mode, "Startirq" for
* entering it from irq/NMI, "End" for leaving it, "Endirq" for leaving it
* to irq/NMI, "--=" for events moving towards idle, and "++=" for events
* moving away from idle.
* Tracepoint for dyntick-idle entry/exit events. These take 2 strings
* as argument:
* polarity: "Start", "End", "StillNonIdle" for entering, exiting or still not
* being in dyntick-idle mode.
* context: "USER" or "IDLE" or "IRQ".
* NMIs nested in IRQs are inferred with dynticks_nesting > 1 in IRQ context.
*
* These events also take a pair of numbers, which indicate the nesting
* depth before and after the event of interest, and a third number that is
......@@ -506,13 +507,13 @@ TRACE_EVENT_RCU(rcu_callback,
/*
* Tracepoint for the registration of a single RCU callback of the special
* kfree() form. The first argument is the RCU type, the second argument
* kvfree() form. The first argument is the RCU type, the second argument
* is a pointer to the RCU callback, the third argument is the offset
* of the callback within the enclosing RCU-protected data structure,
* the fourth argument is the number of lazy callbacks queued, and the
* fifth argument is the total number of callbacks queued.
*/
TRACE_EVENT_RCU(rcu_kfree_callback,
TRACE_EVENT_RCU(rcu_kvfree_callback,
TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset,
long qlen),
......@@ -596,12 +597,12 @@ TRACE_EVENT_RCU(rcu_invoke_callback,
/*
* Tracepoint for the invocation of a single RCU callback of the special
* kfree() form. The first argument is the RCU flavor, the second
* kvfree() form. The first argument is the RCU flavor, the second
* argument is a pointer to the RCU callback, and the third argument
* is the offset of the callback within the enclosing RCU-protected
* data structure.
*/
TRACE_EVENT_RCU(rcu_invoke_kfree_callback,
TRACE_EVENT_RCU(rcu_invoke_kvfree_callback,
TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset),
......
......@@ -5851,9 +5851,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
!rcu_lockdep_current_cpu_online()
? "RCU used illegally from offline CPU!\n"
: !rcu_is_watching()
? "RCU used illegally from idle CPU!\n"
: "",
: "",
rcu_scheduler_active, debug_locks);
/*
......
......@@ -631,13 +631,13 @@ static int lock_torture_writer(void *arg)
cxt.cur_ops->writelock();
if (WARN_ON_ONCE(lock_is_write_held))
lwsp->n_lock_fail++;
lock_is_write_held = 1;
lock_is_write_held = true;
if (WARN_ON_ONCE(lock_is_read_held))
lwsp->n_lock_fail++; /* rare, but... */
lwsp->n_lock_acquired++;
cxt.cur_ops->write_delay(&rand);
lock_is_write_held = 0;
lock_is_write_held = false;
cxt.cur_ops->writeunlock();
stutter_wait("lock_torture_writer");
......@@ -665,13 +665,13 @@ static int lock_torture_reader(void *arg)
schedule_timeout_uninterruptible(1);
cxt.cur_ops->readlock();
lock_is_read_held = 1;
lock_is_read_held = true;
if (WARN_ON_ONCE(lock_is_write_held))
lrsp->n_lock_fail++; /* rare, but... */
lrsp->n_lock_acquired++;
cxt.cur_ops->read_delay(&rand);
lock_is_read_held = 0;
lock_is_read_held = false;
cxt.cur_ops->readunlock();
stutter_wait("lock_torture_reader");
......@@ -686,7 +686,7 @@ static int lock_torture_reader(void *arg)
static void __torture_print_stats(char *page,
struct lock_stress_stats *statp, bool write)
{
bool fail = 0;
bool fail = false;
int i, n_stress;
long max = 0, min = statp ? statp[0].n_lock_acquired : 0;
long long sum = 0;
......@@ -904,7 +904,7 @@ static int __init lock_torture_init(void)
/* Initialize the statistics so that each run gets its own numbers. */
if (nwriters_stress) {
lock_is_write_held = 0;
lock_is_write_held = false;
cxt.lwsa = kmalloc_array(cxt.nrealwriters_stress,
sizeof(*cxt.lwsa),
GFP_KERNEL);
......@@ -935,7 +935,7 @@ static int __init lock_torture_init(void)
}
if (nreaders_stress) {
lock_is_read_held = 0;
lock_is_read_held = false;
cxt.lrsa = kmalloc_array(cxt.nrealreaders_stress,
sizeof(*cxt.lrsa),
GFP_KERNEL);
......
......@@ -61,6 +61,25 @@ config RCU_TORTURE_TEST
Say M if you want the RCU torture tests to build as a module.
Say N if you are unsure.
config RCU_REF_SCALE_TEST
tristate "Scalability tests for read-side synchronization (RCU and others)"
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
select TASKS_RUDE_RCU
select TASKS_TRACE_RCU
default n
help
This option provides a kernel module that runs performance tests
useful comparing RCU with various read-side synchronization mechanisms.
The kernel module may be built after the fact on the running kernel to be
tested, if desired.
Say Y here if you want these performance tests built into the kernel.
Say M if you want to build it as a module instead.
Say N if you are unsure.
config RCU_CPU_STALL_TIMEOUT
int "RCU CPU stall timeout in seconds"
depends on RCU_STALL_COMMON
......
......@@ -12,6 +12,7 @@ obj-$(CONFIG_TREE_SRCU) += srcutree.o
obj-$(CONFIG_TINY_SRCU) += srcutiny.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o
obj-$(CONFIG_RCU_REF_SCALE_TEST) += refscale.o
obj-$(CONFIG_TREE_RCU) += tree.o
obj-$(CONFIG_TINY_RCU) += tiny.o
obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
......@@ -69,6 +69,11 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
* value specified by nr_cpus for a read-only test.
*
* Various other use cases may of course be specified.
*
* Note that this test's readers are intended only as a test load for
* the writers. The reader performance statistics will be overly
* pessimistic due to the per-critical-section interrupt disabling,
* test-end checks, and the pair of calls through pointers.
*/
#ifdef MODULE
......@@ -309,8 +314,10 @@ static void rcu_perf_wait_shutdown(void)
}
/*
* RCU perf reader kthread. Repeatedly does empty RCU read-side
* critical section, minimizing update-side interference.
* RCU perf reader kthread. Repeatedly does empty RCU read-side critical
* section, minimizing update-side interference. However, the point of
* this test is not to evaluate reader performance, but instead to serve
* as a test load for update-side performance testing.
*/
static int
rcu_perf_reader(void *arg)
......@@ -576,11 +583,8 @@ static int compute_real(int n)
static int
rcu_perf_shutdown(void *arg)
{
do {
wait_event(shutdown_wq,
atomic_read(&n_rcu_perf_writer_finished) >=
nrealwriters);
} while (atomic_read(&n_rcu_perf_writer_finished) < nrealwriters);
wait_event(shutdown_wq,
atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters);
smp_mb(); /* Wake before output. */
rcu_perf_cleanup();
kernel_power_off();
......@@ -693,11 +697,8 @@ kfree_perf_cleanup(void)
static int
kfree_perf_shutdown(void *arg)
{
do {
wait_event(shutdown_wq,
atomic_read(&n_kfree_perf_thread_ended) >=
kfree_nrealthreads);
} while (atomic_read(&n_kfree_perf_thread_ended) < kfree_nrealthreads);
wait_event(shutdown_wq,
atomic_read(&n_kfree_perf_thread_ended) >= kfree_nrealthreads);
smp_mb(); /* Wake before output. */
......
......@@ -7,7 +7,7 @@
* Authors: Paul E. McKenney <paulmck@linux.ibm.com>
* Josh Triplett <josh@joshtriplett.org>
*
* See also: Documentation/RCU/torture.txt
* See also: Documentation/RCU/torture.rst
*/
#define pr_fmt(fmt) fmt
......@@ -109,6 +109,10 @@ torture_param(int, object_debug, 0,
torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
torture_param(int, onoff_interval, 0,
"Time between CPU hotplugs (jiffies), 0=disable");
torture_param(int, read_exit_delay, 13,
"Delay between read-then-exit episodes (s)");
torture_param(int, read_exit_burst, 16,
"# of read-then-exit bursts per episode, zero to disable");
torture_param(int, shuffle_interval, 3, "Number of seconds between shuffles");
torture_param(int, shutdown_secs, 0, "Shutdown time (s), <= zero to disable.");
torture_param(int, stall_cpu, 0, "Stall duration (s), zero to disable.");
......@@ -146,6 +150,7 @@ static struct task_struct *stall_task;
static struct task_struct *fwd_prog_task;
static struct task_struct **barrier_cbs_tasks;
static struct task_struct *barrier_task;
static struct task_struct *read_exit_task;
#define RCU_TORTURE_PIPE_LEN 10
......@@ -177,6 +182,7 @@ static long n_rcu_torture_boosts;
static atomic_long_t n_rcu_torture_timers;
static long n_barrier_attempts;
static long n_barrier_successes; /* did rcu_barrier test succeed? */
static unsigned long n_read_exits;
static struct list_head rcu_torture_removed;
static unsigned long shutdown_jiffies;
......@@ -1166,6 +1172,7 @@ rcu_torture_writer(void *arg)
WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count);
}
} while (!torture_must_stop());
rcu_torture_current = NULL; // Let stats task know that we are done.
/* Reset expediting back to unexpedited. */
if (expediting > 0)
expediting = -expediting;
......@@ -1370,6 +1377,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp)
struct rt_read_seg *rtrsp1;
unsigned long long ts;
WARN_ON_ONCE(!rcu_is_watching());
newstate = rcutorture_extend_mask(readstate, trsp);
rcutorture_one_extend(&readstate, newstate, trsp, rtrsp++);
started = cur_ops->get_gp_seq();
......@@ -1539,10 +1547,11 @@ rcu_torture_stats_print(void)
n_rcu_torture_boosts,
atomic_long_read(&n_rcu_torture_timers));
torture_onoff_stats();
pr_cont("barrier: %ld/%ld:%ld\n",
pr_cont("barrier: %ld/%ld:%ld ",
data_race(n_barrier_successes),
data_race(n_barrier_attempts),
data_race(n_rcu_torture_barrier_error));
pr_cont("read-exits: %ld\n", data_race(n_read_exits));
pr_alert("%s%s ", torture_type, TORTURE_FLAG);
if (atomic_read(&n_rcu_torture_mberror) ||
......@@ -1634,7 +1643,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
"stall_cpu=%d stall_cpu_holdoff=%d stall_cpu_irqsoff=%d "
"stall_cpu_block=%d "
"n_barrier_cbs=%d "
"onoff_interval=%d onoff_holdoff=%d\n",
"onoff_interval=%d onoff_holdoff=%d "
"read_exit_delay=%d read_exit_burst=%d\n",
torture_type, tag, nrealreaders, nfakewriters,
stat_interval, verbose, test_no_idle_hz, shuffle_interval,
stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter,
......@@ -1643,7 +1653,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
stall_cpu, stall_cpu_holdoff, stall_cpu_irqsoff,
stall_cpu_block,
n_barrier_cbs,
onoff_interval, onoff_holdoff);
onoff_interval, onoff_holdoff,
read_exit_delay, read_exit_burst);
}
static int rcutorture_booster_cleanup(unsigned int cpu)
......@@ -2175,7 +2186,7 @@ static void rcu_torture_barrier1cb(void *rcu_void)
static int rcu_torture_barrier_cbs(void *arg)
{
long myid = (long)arg;
bool lastphase = 0;
bool lastphase = false;
bool newphase;
struct rcu_head rcu;
......@@ -2338,6 +2349,99 @@ static bool rcu_torture_can_boost(void)
return true;
}
static bool read_exit_child_stop;
static bool read_exit_child_stopped;
static wait_queue_head_t read_exit_wq;
// Child kthread which just does an rcutorture reader and exits.
static int rcu_torture_read_exit_child(void *trsp_in)
{
struct torture_random_state *trsp = trsp_in;
set_user_nice(current, MAX_NICE);
// Minimize time between reading and exiting.
while (!kthread_should_stop())
schedule_timeout_uninterruptible(1);
(void)rcu_torture_one_read(trsp);
return 0;
}
// Parent kthread which creates and destroys read-exit child kthreads.
static int rcu_torture_read_exit(void *unused)
{
int count = 0;
bool errexit = false;
int i;
struct task_struct *tsp;
DEFINE_TORTURE_RANDOM(trs);
// Allocate and initialize.
set_user_nice(current, MAX_NICE);
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: Start of test");
// Each pass through this loop does one read-exit episode.
do {
if (++count > read_exit_burst) {
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: End of episode");
rcu_barrier(); // Wait for task_struct free, avoid OOM.
for (i = 0; i < read_exit_delay; i++) {
schedule_timeout_uninterruptible(HZ);
if (READ_ONCE(read_exit_child_stop))
break;
}
if (!READ_ONCE(read_exit_child_stop))
VERBOSE_TOROUT_STRING("rcu_torture_read_exit: Start of episode");
count = 0;
}
if (READ_ONCE(read_exit_child_stop))
break;
// Spawn child.
tsp = kthread_run(rcu_torture_read_exit_child,
&trs, "%s",
"rcu_torture_read_exit_child");
if (IS_ERR(tsp)) {
VERBOSE_TOROUT_ERRSTRING("out of memory");
errexit = true;
tsp = NULL;
break;
}
cond_resched();
kthread_stop(tsp);
n_read_exits ++;
stutter_wait("rcu_torture_read_exit");
} while (!errexit && !READ_ONCE(read_exit_child_stop));
// Clean up and exit.
smp_store_release(&read_exit_child_stopped, true); // After reaping.
smp_mb(); // Store before wakeup.
wake_up(&read_exit_wq);
while (!torture_must_stop())
schedule_timeout_uninterruptible(1);
torture_kthread_stopping("rcu_torture_read_exit");
return 0;
}
static int rcu_torture_read_exit_init(void)
{
if (read_exit_burst <= 0)
return -EINVAL;
init_waitqueue_head(&read_exit_wq);
read_exit_child_stop = false;
read_exit_child_stopped = false;
return torture_create_kthread(rcu_torture_read_exit, NULL,
read_exit_task);
}
static void rcu_torture_read_exit_cleanup(void)
{
if (!read_exit_task)
return;
WRITE_ONCE(read_exit_child_stop, true);
smp_mb(); // Above write before wait.
wait_event(read_exit_wq, smp_load_acquire(&read_exit_child_stopped));
torture_stop_kthread(rcutorture_read_exit, read_exit_task);
}
static enum cpuhp_state rcutor_hp;
static void
......@@ -2359,6 +2463,7 @@ rcu_torture_cleanup(void)
}
show_rcu_gp_kthreads();
rcu_torture_read_exit_cleanup();
rcu_torture_barrier_cleanup();
torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task);
torture_stop_kthread(rcu_torture_stall, stall_task);
......@@ -2370,7 +2475,6 @@ rcu_torture_cleanup(void)
reader_tasks[i]);
kfree(reader_tasks);
}
rcu_torture_current = NULL;
if (fakewriter_tasks) {
for (i = 0; i < nfakewriters; i++) {
......@@ -2680,6 +2784,9 @@ rcu_torture_init(void)
if (firsterr)
goto unwind;
firsterr = rcu_torture_barrier_init();
if (firsterr)
goto unwind;
firsterr = rcu_torture_read_exit_init();
if (firsterr)
goto unwind;
if (object_debug)
......
This diff is collapsed.
......@@ -766,7 +766,7 @@ static void srcu_flip(struct srcu_struct *ssp)
* it, if this function was preempted for enough time for the counters
* to wrap, it really doesn't matter whether or not we expedite the grace
* period. The extra overhead of a needlessly expedited grace period is
* negligible when amoritized over that time period, and the extra latency
* negligible when amortized over that time period, and the extra latency
* of a needlessly non-expedited grace period is similarly negligible.
*/
static bool srcu_might_be_idle(struct srcu_struct *ssp)
......@@ -777,14 +777,15 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
unsigned long t;
unsigned long tlast;
check_init_srcu_struct(ssp);
/* If the local srcu_data structure has callbacks, not idle. */
local_irq_save(flags);
sdp = this_cpu_ptr(ssp->sda);
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_rcu_node(sdp, flags);
if (rcu_segcblist_pend_cbs(&sdp->srcu_cblist)) {
local_irq_restore(flags);
spin_unlock_irqrestore_rcu_node(sdp, flags);
return false; /* Callbacks already present, so not idle. */
}
local_irq_restore(flags);
spin_unlock_irqrestore_rcu_node(sdp, flags);
/*
* No local callbacks, so probabalistically probe global state.
......@@ -864,9 +865,8 @@ static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
}
rhp->func = func;
idx = srcu_read_lock(ssp);
local_irq_save(flags);
sdp = this_cpu_ptr(ssp->sda);
spin_lock_rcu_node(sdp);
sdp = raw_cpu_ptr(ssp->sda);
spin_lock_irqsave_rcu_node(sdp, flags);
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_gp_seq));
......
......@@ -103,6 +103,7 @@ module_param(rcu_task_stall_timeout, int, 0644);
#define RTGS_WAIT_READERS 9
#define RTGS_INVOKE_CBS 10
#define RTGS_WAIT_CBS 11
#ifndef CONFIG_TINY_RCU
static const char * const rcu_tasks_gp_state_names[] = {
"RTGS_INIT",
"RTGS_WAIT_WAIT_CBS",
......@@ -117,6 +118,7 @@ static const char * const rcu_tasks_gp_state_names[] = {
"RTGS_INVOKE_CBS",
"RTGS_WAIT_CBS",
};
#endif /* #ifndef CONFIG_TINY_RCU */
////////////////////////////////////////////////////////////////////////
//
......@@ -129,6 +131,7 @@ static void set_tasks_gp_state(struct rcu_tasks *rtp, int newstate)
rtp->gp_jiffies = jiffies;
}
#ifndef CONFIG_TINY_RCU
/* Return state name. */
static const char *tasks_gp_state_getname(struct rcu_tasks *rtp)
{
......@@ -139,6 +142,7 @@ static const char *tasks_gp_state_getname(struct rcu_tasks *rtp)
return "???";
return rcu_tasks_gp_state_names[j];
}
#endif /* #ifndef CONFIG_TINY_RCU */
// Enqueue a callback for the specified flavor of Tasks RCU.
static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
......@@ -205,7 +209,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
if (!rtp->cbs_head) {
WARN_ON(signal_pending(current));
set_tasks_gp_state(rtp, RTGS_WAIT_WAIT_CBS);
schedule_timeout_interruptible(HZ/10);
schedule_timeout_idle(HZ/10);
}
continue;
}
......@@ -227,7 +231,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
cond_resched();
}
/* Paranoid sleep to keep this from entering a tight loop */
schedule_timeout_uninterruptible(HZ/10);
schedule_timeout_idle(HZ/10);
set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
}
......@@ -268,6 +272,7 @@ static void __init rcu_tasks_bootup_oddness(void)
#endif /* #ifndef CONFIG_TINY_RCU */
#ifndef CONFIG_TINY_RCU
/* Dump out rcutorture-relevant state common to all RCU-tasks flavors. */
static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
{
......@@ -281,6 +286,7 @@ static void show_rcu_tasks_generic_gp_kthread(struct rcu_tasks *rtp, char *s)
".C"[!!data_race(rtp->cbs_head)],
s);
}
#endif /* #ifndef CONFIG_TINY_RCU */
static void exit_tasks_rcu_finish_trace(struct task_struct *t);
......@@ -336,7 +342,7 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
/* Slowly back off waiting for holdouts */
set_tasks_gp_state(rtp, RTGS_WAIT_SCAN_HOLDOUTS);
schedule_timeout_interruptible(HZ/fract);
schedule_timeout_idle(HZ/fract);
if (fract > 1)
fract--;
......@@ -402,7 +408,7 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
}
/* Processing between scanning taskslist and draining the holdout list. */
void rcu_tasks_postscan(struct list_head *hop)
static void rcu_tasks_postscan(struct list_head *hop)
{
/*
* Wait for tasks that are in the process of exiting. This
......@@ -557,10 +563,12 @@ static int __init rcu_spawn_tasks_kthread(void)
}
core_initcall(rcu_spawn_tasks_kthread);
#ifndef CONFIG_TINY_RCU
static void show_rcu_tasks_classic_gp_kthread(void)
{
show_rcu_tasks_generic_gp_kthread(&rcu_tasks, "");
}
#endif /* #ifndef CONFIG_TINY_RCU */
/* Do the srcu_read_lock() for the above synchronize_srcu(). */
void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
......@@ -682,10 +690,12 @@ static int __init rcu_spawn_tasks_rude_kthread(void)
}
core_initcall(rcu_spawn_tasks_rude_kthread);
#ifndef CONFIG_TINY_RCU
static void show_rcu_tasks_rude_gp_kthread(void)
{
show_rcu_tasks_generic_gp_kthread(&rcu_tasks_rude, "");
}
#endif /* #ifndef CONFIG_TINY_RCU */
#else /* #ifdef CONFIG_TASKS_RUDE_RCU */
static void show_rcu_tasks_rude_gp_kthread(void) {}
......@@ -727,8 +737,8 @@ EXPORT_SYMBOL_GPL(rcu_trace_lock_map);
#ifdef CONFIG_TASKS_TRACE_RCU
atomic_t trc_n_readers_need_end; // Number of waited-for readers.
DECLARE_WAIT_QUEUE_HEAD(trc_wait); // List of holdout tasks.
static atomic_t trc_n_readers_need_end; // Number of waited-for readers.
static DECLARE_WAIT_QUEUE_HEAD(trc_wait); // List of holdout tasks.
// Record outstanding IPIs to each CPU. No point in sending two...
static DEFINE_PER_CPU(bool, trc_ipi_to_cpu);
......@@ -835,7 +845,7 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
bool ofl = cpu_is_offline(cpu);
if (task_curr(t)) {
WARN_ON_ONCE(ofl & !is_idle_task(t));
WARN_ON_ONCE(ofl && !is_idle_task(t));
// If no chance of heavyweight readers, do it the hard way.
if (!ofl && !IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
......@@ -1118,11 +1128,10 @@ EXPORT_SYMBOL_GPL(call_rcu_tasks_trace);
* synchronize_rcu_tasks_trace - wait for a trace rcu-tasks grace period
*
* Control will return to the caller some time after a trace rcu-tasks
* grace period has elapsed, in other words after all currently
* executing rcu-tasks read-side critical sections have elapsed. These
* read-side critical sections are delimited by calls to schedule(),
* cond_resched_tasks_rcu_qs(), userspace execution, and (in theory,
* anyway) cond_resched().
* grace period has elapsed, in other words after all currently executing
* rcu-tasks read-side critical sections have elapsed. These read-side
* critical sections are delimited by calls to rcu_read_lock_trace()
* and rcu_read_unlock_trace().
*
* This is a very specialized primitive, intended only for a few uses in
* tracing and other situations requiring manipulation of function preambles
......@@ -1164,6 +1173,7 @@ static int __init rcu_spawn_tasks_trace_kthread(void)
}
core_initcall(rcu_spawn_tasks_trace_kthread);
#ifndef CONFIG_TINY_RCU
static void show_rcu_tasks_trace_gp_kthread(void)
{
char buf[64];
......@@ -1174,18 +1184,21 @@ static void show_rcu_tasks_trace_gp_kthread(void)
data_race(n_heavy_reader_attempts));
show_rcu_tasks_generic_gp_kthread(&rcu_tasks_trace, buf);
}
#endif /* #ifndef CONFIG_TINY_RCU */
#else /* #ifdef CONFIG_TASKS_TRACE_RCU */
static void exit_tasks_rcu_finish_trace(struct task_struct *t) { }
static inline void show_rcu_tasks_trace_gp_kthread(void) {}
#endif /* #else #ifdef CONFIG_TASKS_TRACE_RCU */
#ifndef CONFIG_TINY_RCU
void show_rcu_tasks_gp_kthreads(void)
{
show_rcu_tasks_classic_gp_kthread();
show_rcu_tasks_rude_gp_kthread();
show_rcu_tasks_trace_gp_kthread();
}
#endif /* #ifndef CONFIG_TINY_RCU */
#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
static inline void rcu_tasks_bootup_oddness(void) {}
......
......@@ -23,6 +23,7 @@
#include <linux/cpu.h>
#include <linux/prefetch.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include "rcu.h"
......@@ -84,9 +85,9 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
unsigned long offset = (unsigned long)head->func;
rcu_lock_acquire(&rcu_callback_map);
if (__is_kfree_rcu_offset(offset)) {
trace_rcu_invoke_kfree_callback("", head, offset);
kfree((void *)head - offset);
if (__is_kvfree_rcu_offset(offset)) {
trace_rcu_invoke_kvfree_callback("", head, offset);
kvfree((void *)head - offset);
rcu_lock_release(&rcu_callback_map);
return true;
}
......
This diff is collapsed.
......@@ -41,7 +41,7 @@ struct rcu_node {
raw_spinlock_t __private lock; /* Root rcu_node's lock protects */
/* some rcu_state fields as well as */
/* following. */
unsigned long gp_seq; /* Track rsp->rcu_gp_seq. */
unsigned long gp_seq; /* Track rsp->gp_seq. */
unsigned long gp_seq_needed; /* Track furthest future GP request. */
unsigned long completedqs; /* All QSes done for this node. */
unsigned long qsmask; /* CPUs or groups that need to switch in */
......@@ -73,9 +73,9 @@ struct rcu_node {
unsigned long ffmask; /* Fully functional CPUs. */
unsigned long grpmask; /* Mask to apply to parent qsmask. */
/* Only one bit will be set in this mask. */
int grplo; /* lowest-numbered CPU or group here. */
int grphi; /* highest-numbered CPU or group here. */
u8 grpnum; /* CPU/group number for next level up. */
int grplo; /* lowest-numbered CPU here. */
int grphi; /* highest-numbered CPU here. */
u8 grpnum; /* group number for next level up. */
u8 level; /* root is at level 0. */
bool wait_blkd_tasks;/* Necessary to wait for blocked tasks to */
/* exit RCU read-side critical sections */
......@@ -149,7 +149,7 @@ union rcu_noqs {
/* Per-CPU data for read-copy update. */
struct rcu_data {
/* 1) quiescent-state and grace-period handling : */
unsigned long gp_seq; /* Track rsp->rcu_gp_seq counter. */
unsigned long gp_seq; /* Track rsp->gp_seq counter. */
unsigned long gp_seq_needed; /* Track furthest future GP request. */
union rcu_noqs cpu_no_qs; /* No QSes yet for this CPU. */
bool core_needs_qs; /* Core waits for quiesc state. */
......@@ -171,6 +171,7 @@ struct rcu_data {
/* different grace periods. */
long qlen_last_fqs_check;
/* qlen at last check for QS forcing */
unsigned long n_cbs_invoked; /* # callbacks invoked since boot. */
unsigned long n_force_qs_snap;
/* did other CPU force QS recently? */
long blimit; /* Upper limit on a processed batch */
......@@ -301,6 +302,8 @@ struct rcu_state {
u8 boost ____cacheline_internodealigned_in_smp;
/* Subject to priority boost. */
unsigned long gp_seq; /* Grace-period sequence #. */
unsigned long gp_max; /* Maximum GP duration in */
/* jiffies. */
struct task_struct *gp_kthread; /* Task for grace periods. */
struct swait_queue_head gp_wq; /* Where GP task waits. */
short gp_flags; /* Commands for GP task. */
......@@ -346,8 +349,6 @@ struct rcu_state {
/* a reluctant CPU. */
unsigned long n_force_qs_gpstart; /* Snapshot of n_force_qs at */
/* GP start. */
unsigned long gp_max; /* Maximum GP duration in */
/* jiffies. */
const char *name; /* Name of structure. */
char abbr; /* Abbreviated name. */
......
......@@ -403,7 +403,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
/* Online, so delay for a bit and try again. */
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("selectofl"));
schedule_timeout_uninterruptible(1);
schedule_timeout_idle(1);
goto retry_ipi;
}
/* CPU really is offline, so we must report its QS. */
......
......@@ -1033,7 +1033,7 @@ static int rcu_boost_kthread(void *arg)
if (spincnt > 10) {
WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_YIELDING);
trace_rcu_utilization(TPS("End boost kthread@rcu_yield"));
schedule_timeout_interruptible(2);
schedule_timeout_idle(2);
trace_rcu_utilization(TPS("Start boost kthread@rcu_yield"));
spincnt = 0;
}
......@@ -2005,7 +2005,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
/* Polling, so trace if first poll in the series. */
if (gotcbs)
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Poll"));
schedule_timeout_interruptible(1);
schedule_timeout_idle(1);
} else if (!needwait_gp) {
/* Wait for callbacks to appear. */
trace_rcu_nocb_wake(rcu_state.name, cpu, TPS("Sleep"));
......
......@@ -237,14 +237,12 @@ struct rcu_stall_chk_rdr {
*/
static bool check_slow_task(struct task_struct *t, void *arg)
{
struct rcu_node *rnp;
struct rcu_stall_chk_rdr *rscrp = arg;
if (task_curr(t))
return false; // It is running, so decline to inspect it.
rscrp->nesting = t->rcu_read_lock_nesting;
rscrp->rs = t->rcu_read_unlock_special;
rnp = t->rcu_blocked_node;
rscrp->on_blkd_list = !list_empty(&t->rcu_node_entry);
return true;
}
......@@ -468,7 +466,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
/*
* OK, time to rat on our buddy...
* See Documentation/RCU/stallwarn.txt for info on how to debug
* See Documentation/RCU/stallwarn.rst for info on how to debug
* RCU CPU stall warnings.
*/
pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
......@@ -535,7 +533,7 @@ static void print_cpu_stall(unsigned long gps)
/*
* OK, time to rat on ourselves...
* See Documentation/RCU/stallwarn.txt for info on how to debug
* See Documentation/RCU/stallwarn.rst for info on how to debug
* RCU CPU stall warnings.
*/
pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
......@@ -649,6 +647,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
*/
void show_rcu_gp_kthreads(void)
{
unsigned long cbs = 0;
int cpu;
unsigned long j;
unsigned long ja;
......@@ -690,9 +689,11 @@ void show_rcu_gp_kthreads(void)
}
for_each_possible_cpu(cpu) {
rdp = per_cpu_ptr(&rcu_data, cpu);
cbs += data_race(rdp->n_cbs_invoked);
if (rcu_segcblist_is_offloaded(&rdp->cblist))
show_rcu_nocb_state(rdp);
}
pr_info("RCU callbacks invoked since boot: %lu\n", cbs);
show_rcu_tasks_gp_kthreads();
}
EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
......
......@@ -42,6 +42,7 @@
#include <linux/kprobes.h>
#include <linux/slab.h>
#include <linux/irq_work.h>
#include <linux/rcupdate_trace.h>
#define CREATE_TRACE_POINTS
......@@ -207,7 +208,7 @@ void rcu_end_inkernel_boot(void)
rcu_unexpedite_gp();
if (rcu_normal_after_boot)
WRITE_ONCE(rcu_normal, 1);
rcu_boot_ended = 1;
rcu_boot_ended = true;
}
/*
......@@ -279,6 +280,7 @@ struct lockdep_map rcu_sched_lock_map = {
};
EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
// Tell lockdep when RCU callbacks are being invoked.
static struct lock_class_key rcu_callback_key;
struct lockdep_map rcu_callback_map =
STATIC_LOCKDEP_MAP_INIT("rcu_callback", &rcu_callback_key);
......@@ -390,13 +392,14 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
might_sleep();
continue;
}
init_rcu_head_on_stack(&rs_array[i].head);
init_completion(&rs_array[i].completion);
for (j = 0; j < i; j++)
if (crcu_array[j] == crcu_array[i])
break;
if (j == i)
if (j == i) {
init_rcu_head_on_stack(&rs_array[i].head);
init_completion(&rs_array[i].completion);
(crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
}
}
/* Wait for all callbacks to be invoked. */
......@@ -407,9 +410,10 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
for (j = 0; j < i; j++)
if (crcu_array[j] == crcu_array[i])
break;
if (j == i)
if (j == i) {
wait_for_completion(&rs_array[i].completion);
destroy_rcu_head_on_stack(&rs_array[i].head);
destroy_rcu_head_on_stack(&rs_array[i].head);
}
}
}
EXPORT_SYMBOL_GPL(__wait_rcu_gp);
......
......@@ -351,16 +351,24 @@ void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
/*
* Set a per-task tick dependency. Posix CPU timers need this in order to elapse
* per task timers.
* Set a per-task tick dependency. RCU need this. Also posix CPU timers
* in order to elapse per task timers.
*/
void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
{
/*
* We could optimize this with just kicking the target running the task
* if that noise matters for nohz full users.
*/
tick_nohz_dep_set_all(&tsk->tick_dep_mask, bit);
if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
if (tsk == current) {
preempt_disable();
tick_nohz_full_kick();
preempt_enable();
} else {
/*
* Some future tick_nohz_full_kick_task()
* should optimize this.
*/
tick_nohz_full_kick_all();
}
}
}
EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);
......
......@@ -45,6 +45,9 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
static bool disable_onoff_at_boot;
module_param(disable_onoff_at_boot, bool, 0444);
static bool ftrace_dump_at_shutdown;
module_param(ftrace_dump_at_shutdown, bool, 0444);
static char *torture_type;
static int verbose;
......@@ -527,7 +530,8 @@ static int torture_shutdown(void *arg)
torture_shutdown_hook();
else
VERBOSE_TOROUT_STRING("No torture_shutdown_hook(), skipping.");
rcu_ftrace_dump(DUMP_ALL);
if (ftrace_dump_at_shutdown)
rcu_ftrace_dump(DUMP_ALL);
kernel_power_off(); /* Shut down the system. */
return 0;
}
......
......@@ -15,6 +15,8 @@
#include <linux/delay.h>
#include <linux/rwsem.h>
#include <linux/mm.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#define __param(type, name, init, msg) \
static type name = init; \
......@@ -35,14 +37,18 @@ __param(int, test_loop_count, 1000000,
__param(int, run_test_mask, INT_MAX,
"Set tests specified in the mask.\n\n"
"\t\tid: 1, name: fix_size_alloc_test\n"
"\t\tid: 2, name: full_fit_alloc_test\n"
"\t\tid: 4, name: long_busy_list_alloc_test\n"
"\t\tid: 8, name: random_size_alloc_test\n"
"\t\tid: 16, name: fix_align_alloc_test\n"
"\t\tid: 32, name: random_size_align_alloc_test\n"
"\t\tid: 64, name: align_shift_alloc_test\n"
"\t\tid: 128, name: pcpu_alloc_test\n"
"\t\tid: 1, name: fix_size_alloc_test\n"
"\t\tid: 2, name: full_fit_alloc_test\n"
"\t\tid: 4, name: long_busy_list_alloc_test\n"
"\t\tid: 8, name: random_size_alloc_test\n"
"\t\tid: 16, name: fix_align_alloc_test\n"
"\t\tid: 32, name: random_size_align_alloc_test\n"
"\t\tid: 64, name: align_shift_alloc_test\n"
"\t\tid: 128, name: pcpu_alloc_test\n"
"\t\tid: 256, name: kvfree_rcu_1_arg_vmalloc_test\n"
"\t\tid: 512, name: kvfree_rcu_2_arg_vmalloc_test\n"
"\t\tid: 1024, name: kvfree_rcu_1_arg_slab_test\n"
"\t\tid: 2048, name: kvfree_rcu_2_arg_slab_test\n"
/* Add a new test case description here. */
);
......@@ -316,6 +322,83 @@ pcpu_alloc_test(void)
return rv;
}
struct test_kvfree_rcu {
struct rcu_head rcu;
unsigned char array[20];
};
static int
kvfree_rcu_1_arg_vmalloc_test(void)
{
struct test_kvfree_rcu *p;
int i;
for (i = 0; i < test_loop_count; i++) {
p = vmalloc(1 * PAGE_SIZE);
if (!p)
return -1;
p->array[0] = 'a';
kvfree_rcu(p);
}
return 0;
}
static int
kvfree_rcu_2_arg_vmalloc_test(void)
{
struct test_kvfree_rcu *p;
int i;
for (i = 0; i < test_loop_count; i++) {
p = vmalloc(1 * PAGE_SIZE);
if (!p)
return -1;
p->array[0] = 'a';
kvfree_rcu(p, rcu);
}
return 0;
}
static int
kvfree_rcu_1_arg_slab_test(void)
{
struct test_kvfree_rcu *p;
int i;
for (i = 0; i < test_loop_count; i++) {
p = kmalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return -1;
p->array[0] = 'a';
kvfree_rcu(p);
}
return 0;
}
static int
kvfree_rcu_2_arg_slab_test(void)
{
struct test_kvfree_rcu *p;
int i;
for (i = 0; i < test_loop_count; i++) {
p = kmalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return -1;
p->array[0] = 'a';
kvfree_rcu(p, rcu);
}
return 0;
}
struct test_case_desc {
const char *test_name;
int (*test_func)(void);
......@@ -330,6 +413,10 @@ static struct test_case_desc test_case_array[] = {
{ "random_size_align_alloc_test", random_size_align_alloc_test },
{ "align_shift_alloc_test", align_shift_alloc_test },
{ "pcpu_alloc_test", pcpu_alloc_test },
{ "kvfree_rcu_1_arg_vmalloc_test", kvfree_rcu_1_arg_vmalloc_test },
{ "kvfree_rcu_2_arg_vmalloc_test", kvfree_rcu_2_arg_vmalloc_test },
{ "kvfree_rcu_1_arg_slab_test", kvfree_rcu_1_arg_slab_test },
{ "kvfree_rcu_2_arg_slab_test", kvfree_rcu_2_arg_slab_test },
/* Add a new test case here. */
};
......
......@@ -373,14 +373,14 @@ static void memcg_destroy_list_lru_node(struct list_lru_node *nlru)
struct list_lru_memcg *memcg_lrus;
/*
* This is called when shrinker has already been unregistered,
* and nobody can use it. So, there is no need to use kvfree_rcu().
* and nobody can use it. So, there is no need to use kvfree_rcu_local().
*/
memcg_lrus = rcu_dereference_protected(nlru->memcg_lrus, true);
__memcg_destroy_list_lru_node(memcg_lrus, 0, memcg_nr_cache_ids);
kvfree(memcg_lrus);
}
static void kvfree_rcu(struct rcu_head *head)
static void kvfree_rcu_local(struct rcu_head *head)
{
struct list_lru_memcg *mlru;
......@@ -419,7 +419,7 @@ static int memcg_update_list_lru_node(struct list_lru_node *nlru,
rcu_assign_pointer(nlru->memcg_lrus, new);
spin_unlock_irq(&nlru->lock);
call_rcu(&old->rcu, kvfree_rcu);
call_rcu(&old->rcu, kvfree_rcu_local);
return 0;
}
......
......@@ -3171,6 +3171,7 @@ void exit_mmap(struct mm_struct *mm)
if (vma->vm_flags & VM_ACCOUNT)
nr_accounted += vma_pages(vma);
vma = remove_vma(vma);
cond_resched();
}
vm_unacct_memory(nr_accounted);
}
......
......@@ -1973,7 +1973,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
/*
* Before updating sk_refcnt, we must commit prior changes to memory
* (Documentation/RCU/rculist_nulls.txt for details)
* (Documentation/RCU/rculist_nulls.rst for details)
*/
smp_wmb();
refcount_set(&newsk->sk_refcnt, 2);
......@@ -3035,7 +3035,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
sk_rx_queue_clear(sk);
/*
* Before updating sk_refcnt, we must commit prior changes to memory
* (Documentation/RCU/rculist_nulls.txt for details)
* (Documentation/RCU/rculist_nulls.rst for details)
*/
smp_wmb();
refcount_set(&sk->sk_refcnt, 1);
......
......@@ -32,11 +32,11 @@ if test -z "$TORTURE_TRUST_MAKE"
then
make clean > $resdir/Make.clean 2>&1
fi
make $TORTURE_DEFCONFIG > $resdir/Make.defconfig.out 2>&1
make $TORTURE_KMAKE_ARG $TORTURE_DEFCONFIG > $resdir/Make.defconfig.out 2>&1
mv .config .config.sav
sh $T/upd.sh < .config.sav > .config
cp .config .config.new
yes '' | make oldconfig > $resdir/Make.oldconfig.out 2> $resdir/Make.oldconfig.err
yes '' | make $TORTURE_KMAKE_ARG oldconfig > $resdir/Make.oldconfig.out 2> $resdir/Make.oldconfig.err
# verify new config matches specification.
configcheck.sh .config $c
......
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Scan standard input for error messages, dumping any found to standard
# output.
#
# Usage: console-badness.sh
#
# Copyright (C) 2020 Facebook, Inc.
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
grep -v 'ODEBUG: ' |
grep -v 'This means that this is a DEBUG kernel and it is' |
grep -v 'Warning: unable to open an initial console'
......@@ -215,9 +215,6 @@ identify_qemu_args () {
then
echo -device spapr-vlan,netdev=net0,mac=$TORTURE_QEMU_MAC
echo -netdev bridge,br=br0,id=net0
elif test -n "$TORTURE_QEMU_INTERACTIVE"
then
echo -net nic -net user
fi
;;
esac
......@@ -234,7 +231,7 @@ identify_qemu_args () {
# Returns the number of virtual CPUs available to the aggregate of the
# guest OSes.
identify_qemu_vcpus () {
lscpu | grep '^CPU(s):' | sed -e 's/CPU(s)://'
lscpu | grep '^CPU(s):' | sed -e 's/CPU(s)://' -e 's/[ ]*//g'
}
# print_bug
......@@ -275,3 +272,21 @@ specify_qemu_cpus () {
esac
fi
}
# specify_qemu_net qemu-args
#
# Appends a string containing "-net none" to qemu-args, unless the incoming
# qemu-args already contains "-smp" or unless the TORTURE_QEMU_INTERACTIVE
# environment variable is set, in which case the string that is be added is
# instead "-net nic -net user".
specify_qemu_net () {
if echo $1 | grep -q -e -net
then
echo $1
elif test -n "$TORTURE_QEMU_INTERACTIVE"
then
echo $1 -net nic -net user
else
echo $1 -net none
fi
}
......@@ -46,6 +46,12 @@ do
exit 0;
fi
# Check for stop request.
if test -f "$TORTURE_STOPFILE"
then
exit 1;
fi
# Set affinity to randomly selected online CPU
if cpus=`grep 1 /sys/devices/system/cpu/*/online 2>&1 |
sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//'`
......
......@@ -9,6 +9,12 @@
#
# Authors: Paul E. McKenney <paulmck@linux.ibm.com>
if test -f "$TORTURE_STOPFILE"
then
echo "kvm-build.sh early exit due to run STOP request"
exit 1
fi
config_template=${1}
if test -z "$config_template" -o ! -f "$config_template" -o ! -r "$config_template"
then
......
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0+
#
# Run a group of kvm.sh tests on the specified commits. This currently
# unconditionally does three-minute runs on each scenario in CFLIST,
# taking advantage of all available CPUs and trusting the "make" utility.
# In the short term, adjustments can be made by editing this script and
# CFLIST. If some adjustments appear to have ongoing value, this script
# might grow some command-line arguments.
#
# Usage: kvm-check-branches.sh commit1 commit2..commit3 commit4 ...
#
# This script considers its arguments one at a time. If more elaborate
# specification of commits is needed, please use "git rev-list" to
# produce something that this simple script can understand. The reason
# for retaining the simplicity is that it allows the user to more easily
# see which commit came from which branch.
#
# This script creates a yyyy.mm.dd-hh.mm.ss-group entry in the "res"
# directory. The calls to kvm.sh create the usual entries, but this script
# moves them under the yyyy.mm.dd-hh.mm.ss-group entry, each in its own
# directory numbered in run order, that is, "0001", "0002", and so on.
# For successful runs, the large build artifacts are removed. Doing this
# reduces the disk space required by about two orders of magnitude for
# successful runs.
#
# Copyright (C) Facebook, 2020
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
if ! git status > /dev/null 2>&1
then
echo '!!!' This script needs to run in a git archive. 1>&2
echo '!!!' Giving up. 1>&2
exit 1
fi
# Remember where we started so that we can get back and the end.
curcommit="`git status | head -1 | awk '{ print $NF }'`"
nfail=0
ntry=0
resdir="tools/testing/selftests/rcutorture/res"
ds="`date +%Y.%m.%d-%H.%M.%S`-group"
if ! test -e $resdir
then
mkdir $resdir || :
fi
mkdir $resdir/$ds
echo Results directory: $resdir/$ds
KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM
PATH=${KVM}/bin:$PATH; export PATH
. functions.sh
cpus="`identify_qemu_vcpus`"
echo Using up to $cpus CPUs.
# Each pass through this loop does one command-line argument.
for gitbr in $@
do
echo ' --- git branch ' $gitbr
# Each pass through this loop tests one commit.
for i in `git rev-list "$gitbr"`
do
ntry=`expr $ntry + 1`
idir=`awk -v ntry="$ntry" 'END { printf "%04d", ntry; }' < /dev/null`
echo ' --- commit ' $i from branch $gitbr
date
mkdir $resdir/$ds/$idir
echo $gitbr > $resdir/$ds/$idir/gitbr
echo $i >> $resdir/$ds/$idir/gitbr
# Test the specified commit.
git checkout $i > $resdir/$ds/$idir/git-checkout.out 2>&1
echo git checkout return code: $? "(Commit $ntry: $i)"
kvm.sh --cpus $cpus --duration 3 --trust-make > $resdir/$ds/$idir/kvm.sh.out 2>&1
ret=$?
echo kvm.sh return code $ret for commit $i from branch $gitbr
# Move the build products to their resting place.
runresdir="`grep -m 1 '^Results directory:' < $resdir/$ds/$idir/kvm.sh.out | sed -e 's/^Results directory://'`"
mv $runresdir $resdir/$ds/$idir
rrd="`echo $runresdir | sed -e 's,^.*/,,'`"
echo Run results: $resdir/$ds/$idir/$rrd
if test "$ret" -ne 0
then
# Failure, so leave all evidence intact.
nfail=`expr $nfail + 1`
else
# Success, so remove large files to save about 1GB.
( cd $resdir/$ds/$idir/$rrd; rm -f */vmlinux */bzImage */System.map */Module.symvers )
fi
done
done
date
# Go back to the original commit.
git checkout "$curcommit"
if test $nfail -ne 0
then
echo '!!! ' $nfail failures in $ntry 'runs!!!'
exit 1
else
echo No failures in $ntry runs.
exit 0
fi
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Analyze a given results directory for refscale performance measurements.
#
# Usage: kvm-recheck-refscale.sh resdir
#
# Copyright (C) IBM Corporation, 2016
#
# Authors: Paul E. McKenney <paulmck@linux.ibm.com>
i="$1"
if test -d "$i" -a -r "$i"
then
:
else
echo Unreadable results directory: $i
exit 1
fi
PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH
. functions.sh
configfile=`echo $i | sed -e 's/^.*\///'`
sed -e 's/^\[[^]]*]//' < $i/console.log | tr -d '\015' |
awk -v configfile="$configfile" '
/^[ ]*Runs Time\(ns\) *$/ {
if (dataphase + 0 == 0) {
dataphase = 1;
# print configfile, $0;
}
next;
}
/[^ ]*[0-9][0-9]* [0-9][0-9]*\.[0-9][0-9]*$/ {
if (dataphase == 1) {
# print $0;
readertimes[++n] = $2;
sum += $2;
}
next;
}
{
if (dataphase == 1)
dataphase == 2;
next;
}
END {
print configfile " results:";
newNR = asort(readertimes);
if (newNR <= 0) {
print "No refscale records found???"
exit;
}
medianidx = int(newNR / 2);
if (newNR == medianidx * 2)
medianvalue = (readertimes[medianidx - 1] + readertimes[medianidx]) / 2;
else
medianvalue = readertimes[medianidx];
points = "Points:";
for (i = 1; i <= newNR; i++)
points = points " " readertimes[i];
print points;
print "Average reader duration: " sum / newNR " nanoseconds";
print "Minimum reader duration: " readertimes[1];
print "Median reader duration: " medianvalue;
print "Maximum reader duration: " readertimes[newNR];
print "Computed from refscale printk output.";
}'
......@@ -31,6 +31,7 @@ do
head -1 $resdir/log
fi
TORTURE_SUITE="`cat $i/../TORTURE_SUITE`"
configfile=`echo $i | sed -e 's,^.*/,,'`
rm -f $i/console.log.*.diags
kvm-recheck-${TORTURE_SUITE}.sh $i
if test -f "$i/qemu-retval" && test "`cat $i/qemu-retval`" -ne 0 && test "`cat $i/qemu-retval`" -ne 137
......@@ -43,7 +44,8 @@ do
then
echo QEMU killed
fi
configcheck.sh $i/.config $i/ConfigFragment
configcheck.sh $i/.config $i/ConfigFragment > $T 2>&1
cat $T
if test -r $i/Make.oldconfig.err
then
cat $i/Make.oldconfig.err
......@@ -55,15 +57,15 @@ do
cat $i/Warnings
fi
else
if test -f "$i/qemu-cmd"
then
print_bug qemu failed
echo " $i"
elif test -f "$i/buildonly"
if test -f "$i/buildonly"
then
echo Build-only run, no boot/test
configcheck.sh $i/.config $i/ConfigFragment
parse-build.sh $i/Make.out $configfile
elif test -f "$i/qemu-cmd"
then
print_bug qemu failed
echo " $i"
else
print_bug Build failed
echo " $i"
......@@ -72,7 +74,11 @@ do
done
if test -f "$rd/kcsan.sum"
then
if test -s "$rd/kcsan.sum"
if grep -q CONFIG_KCSAN=y $T
then
echo "Compiler or architecture does not support KCSAN!"
echo Did you forget to switch your compiler with '--kmake-arg CC=<cc-that-supports-kcsan>'?
elif test -s "$rd/kcsan.sum"
then
echo KCSAN summary in $rd/kcsan.sum
else
......
......@@ -124,7 +124,6 @@ seconds=$4
qemu_args=$5
boot_args=$6
cd $KVM
kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null`
if test -z "$TORTURE_BUILDONLY"
then
......@@ -141,6 +140,7 @@ then
cpu_count=$TORTURE_ALLOTED_CPUS
fi
qemu_args="`specify_qemu_cpus "$QEMU" "$qemu_args" "$cpu_count"`"
qemu_args="`specify_qemu_net "$qemu_args"`"
# Generate architecture-specific and interaction-specific qemu arguments
qemu_args="$qemu_args `identify_qemu_args "$QEMU" "$resdir/console.log"`"
......@@ -152,6 +152,7 @@ qemu_append="`identify_qemu_append "$QEMU"`"
boot_args="`configfrag_boot_params "$boot_args" "$config_template"`"
# Generate kernel-version-specific boot parameters
boot_args="`per_version_boot_params "$boot_args" $resdir/.config $seconds`"
echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd
if test -n "$TORTURE_BUILDONLY"
then
......@@ -159,9 +160,16 @@ then
touch $resdir/buildonly
exit 0
fi
# Decorate qemu-cmd with redirection, backgrounding, and PID capture
sed -e 's/$/ 2>\&1 \&/' < $resdir/qemu-cmd > $T/qemu-cmd
echo 'echo $! > $resdir/qemu_pid' >> $T/qemu-cmd
# In case qemu refuses to run...
echo "NOTE: $QEMU either did not run or was interactive" > $resdir/console.log
echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd
( $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append "$qemu_append $boot_args" > $resdir/qemu-output 2>&1 & echo $! > $resdir/qemu_pid; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) &
# Attempt to run qemu
( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) &
commandcompleted=0
sleep 10 # Give qemu's pid a chance to reach the file
if test -s "$resdir/qemu_pid"
......@@ -181,7 +189,7 @@ do
kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
if test -z "$qemu_pid" || kill -0 "$qemu_pid" > /dev/null 2>&1
then
if test $kruntime -ge $seconds
if test $kruntime -ge $seconds -o -f "$TORTURE_STOPFILE"
then
break;
fi
......@@ -210,10 +218,19 @@ then
fi
if test $commandcompleted -eq 0 -a -n "$qemu_pid"
then
echo Grace period for qemu job at pid $qemu_pid
if ! test -f "$TORTURE_STOPFILE"
then
echo Grace period for qemu job at pid $qemu_pid
fi
oldline="`tail $resdir/console.log`"
while :
do
if test -f "$TORTURE_STOPFILE"
then
echo "PID $qemu_pid killed due to run STOP request" >> $resdir/Warnings 2>&1
kill -KILL $qemu_pid
break
fi
kruntime=`gawk 'BEGIN { print systime() - '"$kstarttime"' }' < /dev/null`
if kill -0 $qemu_pid > /dev/null 2>&1
then
......
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Transform a qemu-cmd file to allow reuse.
#
# Usage: kvm-transform.sh bzImage console.log < qemu-cmd-in > qemu-cmd-out
#
# bzImage: Kernel and initrd from the same prior kvm.sh run.
# console.log: File into which to place console output.
#
# The original qemu-cmd file is provided on standard input.
# The transformed qemu-cmd file is on standard output.
# The transformation assumes that the qemu command is confined to a
# single line. It also assumes no whitespace in filenames.
#
# Copyright (C) 2020 Facebook, Inc.
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
image="$1"
if test -z "$image"
then
echo Need kernel image file.
exit 1
fi
consolelog="$2"
if test -z "$consolelog"
then
echo "Need console log file name."
exit 1
fi
awk -v image="$image" -v consolelog="$consolelog" '
{
line = "";
for (i = 1; i <= NF; i++) {
if (line == "")
line = $i;
else
line = line " " $i;
if ($i == "-serial") {
i++;
line = line " file:" consolelog;
}
if ($i == "-kernel") {
i++;
line = line " " image;
}
}
print line;
}'
......@@ -73,6 +73,10 @@ usage () {
while test $# -gt 0
do
case "$1" in
--allcpus)
cpus=$TORTURE_ALLOTED_CPUS
max_cpus=$TORTURE_ALLOTED_CPUS
;;
--bootargs|--bootarg)
checkarg --bootargs "(list of kernel boot arguments)" "$#" "$2" '.*' '^--'
TORTURE_BOOTARGS="$2"
......@@ -180,13 +184,14 @@ do
shift
;;
--torture)
checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\)$' '^--'
checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\|refscale\)$' '^--'
TORTURE_SUITE=$2
shift
if test "$TORTURE_SUITE" = rcuperf
if test "$TORTURE_SUITE" = rcuperf || test "$TORTURE_SUITE" = refscale
then
# If you really want jitter for rcuperf, specify
# it after specifying rcuperf. (But why?)
# If you really want jitter for refscale or
# rcuperf, specify it after specifying the rcuperf
# or the refscale. (But why jitter in these cases?)
jitter=0
fi
;;
......@@ -333,6 +338,8 @@ then
mkdir -p "$resdir" || :
fi
mkdir $resdir/$ds
TORTURE_RESDIR="$resdir/$ds"; export TORTURE_RESDIR
TORTURE_STOPFILE="$resdir/$ds/STOP"; export TORTURE_STOPFILE
echo Results directory: $resdir/$ds
echo $scriptname $args
touch $resdir/$ds/log
......@@ -497,3 +504,7 @@ fi
# Tracing: trace_event=rcu:rcu_grace_period,rcu:rcu_future_grace_period,rcu:rcu_grace_period_init,rcu:rcu_nocb_wake,rcu:rcu_preempt_task,rcu:rcu_unlock_preempted_task,rcu:rcu_quiescent_state_report,rcu:rcu_fqs,rcu:rcu_callback,rcu:rcu_kfree_callback,rcu:rcu_batch_start,rcu:rcu_invoke_callback,rcu:rcu_invoke_kfree_callback,rcu:rcu_batch_end,rcu:rcu_torture_read,rcu:rcu_barrier
# Function-graph tracing: ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop
# Also --kconfig "CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y"
# Control buffer size: --bootargs trace_buf_size=3k
# Get trace-buffer dumps on all oopses: --bootargs ftrace_dump_on_oops
# Ditto, but dump only the oopsing CPU: --bootargs ftrace_dump_on_oops=orig_cpu
# Heavy-handed way to also dump on warnings: --bootargs panic_on_warn
......@@ -33,8 +33,8 @@ then
fi
cat /dev/null > $file.diags
# Check for proper termination, except that rcuperf runs don't indicate this.
if test "$TORTURE_SUITE" != rcuperf
# Check for proper termination, except for rcuperf and refscale.
if test "$TORTURE_SUITE" != rcuperf && test "$TORTURE_SUITE" != refscale
then
# check for abject failure
......@@ -44,11 +44,23 @@ then
tail -1 |
awk '
{
for (i=NF-8;i<=NF;i++)
normalexit = 1;
for (i=NF-8;i<=NF;i++) {
if (i <= 0 || i !~ /^[0-9]*$/) {
bangstring = $0;
gsub(/^\[[^]]*] /, "", bangstring);
print bangstring;
normalexit = 0;
exit 0;
}
sum+=$i;
}
}
END { print sum }'`
print_bug $title FAILURE, $nerrs instances
END {
if (normalexit)
print sum " instances"
}'`
print_bug $title FAILURE, $nerrs
exit
fi
......@@ -104,10 +116,7 @@ then
fi
fi | tee -a $file.diags
egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for' < $file |
grep -v 'ODEBUG: ' |
grep -v 'This means that this is a DEBUG kernel and it is' |
grep -v 'Warning: unable to open an initial console' > $T.diags
console-badness.sh < $file > $T.diags
if test -s $T.diags
then
print_warning "Assertion failure in $file $title"
......
CONFIG_RCU_REF_SCALE_TEST=y
CONFIG_PRINTK_TIME=y
CONFIG_SMP=y
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
#CHECK#CONFIG_PREEMPT_RCU=n
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ_FULL=n
CONFIG_RCU_FAST_NO_HZ=n
CONFIG_HOTPLUG_CPU=n
CONFIG_SUSPEND=n
CONFIG_HIBERNATION=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
CONFIG_SMP=y
CONFIG_PREEMPT_NONE=n
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=y
#CHECK#CONFIG_PREEMPT_RCU=y
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ_FULL=n
CONFIG_RCU_FAST_NO_HZ=n
CONFIG_HOTPLUG_CPU=n
CONFIG_SUSPEND=n
CONFIG_HIBERNATION=n
CONFIG_RCU_NOCB_CPU=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PROVE_LOCKING=n
CONFIG_RCU_BOOST=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Torture-suite-dependent shell functions for the rest of the scripts.
#
# Copyright (C) IBM Corporation, 2015
#
# Authors: Paul E. McKenney <paulmck@linux.ibm.com>
# per_version_boot_params bootparam-string config-file seconds
#
# Adds per-version torture-module parameters to kernels supporting them.
per_version_boot_params () {
echo $1 refscale.shutdown=1 \
refscale.verbose=1
}
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment