Commits · dd56af42bd829c6e770ed69812bd65a04eaeb1e4 · Kirill Smelkov / linux

18 Sep, 2014 1 commit

rcu: Eliminate deadlock between CPU hotplug and expedited grace periods · dd56af42

Paul E. McKenney authored Aug 25, 2014

Currently, the expedited grace-period primitives do get_online_cpus().
This greatly simplifies their implementation, but means that calls
to them holding locks that are acquired by CPU-hotplug notifiers (to
say nothing of calls to these primitives from CPU-hotplug notifiers)
can deadlock. But this is starting to become inconvenient, as can be
seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this
case is that some developers need to acquire a mutex from a CPU-hotplug
notifier, but also need to hold it across a synchronize_rcu_expedited().
As noted above, this currently results in deadlock.

This commit avoids the deadlock and retains the simplicity by creating
a try_get_online_cpus(), which returns false if the get_online_cpus()
reference count could not immediately be incremented. If a call to
try_get_online_cpus() returns true, the expedited primitives operate as
before. If a call returns false, the expedited primitives fall back to
normal grace-period operations. This falling back of course results in
increased grace-period latency, but only during times when CPU hotplug
operations are actually in flight. The effect should therefore be
negligible during normal operation.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Tested-by: Lan Tianyu <tianyu.lan@intel.com>

dd56af42

16 Sep, 2014 30 commits

locktorture: Document boot/module parameters · ec4518aa
Paul E. McKenney authored Sep 12, 2014
```
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
```
ec4518aa

rcutorture: Rename rcutorture_runnable parameter · 59da22a0

Paul E. McKenney authored Sep 12, 2014

This commit changes rcutorture_runnable to torture_runnable, which is
consistent with the names of the other parameters and is a bit shorter
as well.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

59da22a0

locktorture: Add test scenario for rwsem_lock · aaa693e3
Paul E. McKenney authored Sep 12, 2014
```
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
```
aaa693e3
locktorture: Add test scenario for mutex_lock · 862917a5
Paul E. McKenney authored Sep 12, 2014
```
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
```
862917a5
locktorture: Make torture scripting account for new _runnable name · 0acf0153
Paul E. McKenney authored Sep 12, 2014
```
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
```
0acf0153

locktorture: Introduce torture context · 630952c2

Davidlohr Bueso authored Sep 11, 2014

The amount of global variables is getting pretty ugly. Group variables
related to the execution (ie: not parameters) in a new context structure.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

630952c2

locktorture: Support rwsems · 4a3b427f

Davidlohr Bueso authored Sep 11, 2014

We can easily do so with our new reader lock support. Just an arbitrary
design default: readers have higher (5x) critical region latencies than
writers: 50 ms and 10 ms, respectively.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

4a3b427f

locktorture: Add infrastructure for torturing read locks · 4f6332c1

Davidlohr Bueso authored Sep 11, 2014

Most of it is based on what we already have for writers. This allows
readers to be very independent (and thus configurable), enabling
future module parameters to control things such as rw distribution.
Furthermore, readers have their own delaying function, allowing us
to test different rw critical region latencies, and stress locking
internals. Similarly, statistics, for now will only serve for the
number of lock acquisitions -- as opposed to writers, readers have
no failure detection.

In addition, introduce a new nreaders_stress module parameter. The
default number of readers will be the same number of writers threads.
Writer threads are interleaved with readers. Documentation is updated,
respectively.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

4f6332c1

torture: Address race in module cleanup · d36a7a0d

Davidlohr Bueso authored Sep 11, 2014

When performing module cleanups by calling torture_cleanup() the
'torture_type' string in nullified However, callers are not necessarily
done, and might still need to reference the variable. This impacts
both rcutorture and locktorture, causing printing things like:

[   94.226618] (null)-torture: Stopping lock_torture_writer task
[   94.226624] (null)-torture: Stopping lock_torture_stats task

Thus delay this operation until the very end of the cleanup process.
The consequence (which shouldn't matter for this kid of program) is,
of course, that we delay the window between rmmod and modprobing,
for instance in module_torture_begin().
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

d36a7a0d

locktorture: Make statistics generic · 1e6757a9

Davidlohr Bueso authored Sep 11, 2014

The statistics structure can serve well for both reader and writer
locks, thus simply rename some fields that mention 'write' and leave
the declaration of lwsa.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

1e6757a9

locktorture: Teach about lock debugging · f095bfc0

Davidlohr Bueso authored Sep 11, 2014

Regular locks are very different than locks with debugging. For instance
for mutexes, debugging forces to only take the slowpaths. As such, the
locktorture module should take this into account when printing related
information -- specifically when printing user passed parameters, it seems
the right place for such info.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

f095bfc0

locktorture: Support mutexes · 42ddc75d

Davidlohr Bueso authored Sep 11, 2014

Add a "mutex_lock" torture test. The main difference with the already
existing spinlock tests is that the latency of the critical region
is much larger. We randomly delay for (arbitrarily) either 500 ms or,
otherwise, 25 ms. While this can considerably reduce the amount of
writes compared to non blocking locks, if run long enough it can have
the same torturous effect. Furthermore it is more representative of
mutex hold times and can stress better things like thrashing.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

42ddc75d

locktorture: Add documentation · cdf26bb1

Davidlohr Bueso authored Sep 11, 2014

Just like Documentation/RCU/torture.txt, begin a document for the
locktorture module. This module is still pretty green, so I have
just added some specific sections to the doc (general desc, params,
usage, etc.). Further development should update the file.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
[ paulmck: Apply Randy Dunlap review comments. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

cdf26bb1

locktorture: Rename locktorture_runnable parameter · 23a8e5c2

Davidlohr Bueso authored Sep 11, 2014

... to just 'torture_runnable'. It follows other variable naming
and is shorter.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

23a8e5c2

Merge branch 'rcu-tasks.2014.09.10a' into HEAD · 96b46727
Paul E. McKenney authored Sep 16, 2014
```
rcu-tasks.2014.09.10a: Add RCU-tasks flavor of RCU.
```
96b46727

Merge branches 'doc.2014.09.07a', 'fixes.2014.09.10a', 'nocb-nohz.2014.09.16b'... · e98d06dd

Paul E. McKenney authored Sep 16, 2014

Merge branches 'doc.2014.09.07a', 'fixes.2014.09.10a', 'nocb-nohz.2014.09.16b' and 'torture.2014.09.07a' into HEAD

doc.2014.09.07a: Documentation updates.
fixes.2014.09.10a: Miscellaneous fixes.
nocb-nohz.2014.09.16b: No-CBs CPUs and NO_HZ_FULL updates.
torture.2014.09.07a: Torture-test updates.

e98d06dd

rcu: Avoid misordering in nocb_leader_wait() · c847f142

Paul E. McKenney authored Aug 12, 2014

The NOCB follower wakeup ordering depends on the store to the tail
pointer happening before the wakeup. However, because atomic_long_add()
does not return a value, it does not provide ordering guarantees, and
the locking in wake_up() only guarantees that the store will happen
before the unlock, which might be too late. Even though this is only a
theoretical issue, this commit adds a smp_mb__after_atomic() after the
final atomic_long_add() to provide the needed ordering guarantee.
Reported-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

c847f142

rcu: Handle NOCB callbacks from irq-disabled idle code · 1772947b

Paul E. McKenney authored Aug 12, 2014

If an RCU callback is queued on a no-CBs CPU from idle code with irqs
disabled, and if that CPU stays idle forever after, the callback will
never be invoked. This commit therefore adds a check for this situation
in ____call_rcu_nocb(), invoking the RCU core solely for the purpose
of the ensuing return-to-idle transition. (If the CPU doesn't return
to idle, the next scheduling-clock interrupt will fix things up.)
Reported-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

1772947b

rcu: Avoid misordering in __call_rcu_nocb_enqueue() · 39953dfd

Paul E. McKenney authored Aug 12, 2014

The NOCB leader wakeup ordering depends on the store to the header
happening before the check for the leader already being awake.  However,
because atomic_long_add() does not return a value, it does not provide
ordering guarantees, the incorrect comment in wake_nocb_leader()
notwithstanding.  This commit therefore adds a smp_mb__after_atomic()
after the final atomic_long_add() to provide the needed ordering
guarantee.
Reported-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

39953dfd

rcu: Don't track sysidle state if no nohz_full= CPUs · 663e1310

Paul E. McKenney authored Jul 21, 2014

If there are no nohz_full= CPUs, then there is currently no reason to
track sysidle state.  This commit therefore short-circuits this state
tracking if !tick_nohz_full_enabled().

Note that these checks will need to be revisited if nohz_full= state
can ever be changed at runtime.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

663e1310

rcu: Eliminate redundant rcu_sysidle_state variable · 417e8d26

Paul E. McKenney authored Jul 21, 2014

Now that we have rcu_state_p, which references rcu_preempt_state for
TREE_PREEMPT_RCU and rcu_sched_state for TREE_RCU, we don't need a
separate rcu_sysidle_state variable.  This commit therefore eliminates
rcu_preempt_state in favor of rcu_state_p.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

417e8d26

rcu: Check for have_rcu_nocb_mask instead of rcu_nocb_mask · 22c2f669

Pranith Kumar authored Jul 17, 2014

If we configure a kernel with CONFIG_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_NONE=y and
CONFIG_CPUMASK_OFFSTACK=n and do not pass in a rcu_nocb= boot parameter, the
cpumask rcu_nocb_mask can be garbage instead of NULL.

Hence this commit replaces checks for rcu_nocb_mask == NULL with a check for
have_rcu_nocb_mask.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

22c2f669

rcu: Create rcuo kthreads only for onlined CPUs · 35ce7f29

Paul E. McKenney authored Jul 11, 2014

RCU currently uses for_each_possible_cpu() to spawn rcuo kthreads,
which can result in more rcuo kthreads than one would expect, for
example, derRichard reported 64 CPUs worth of rcuo kthreads on an
8-CPU image.  This commit therefore creates rcuo kthreads only for
those CPUs that actually come online.

This was reported by derRichard on the OFTC IRC network.
Reported-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

35ce7f29

rcu: Rationalize kthread spawning · 9386c0b7

Paul E. McKenney authored Jul 13, 2014

Currently, RCU spawns kthreads from several different early_initcall()
functions.  Although this has served RCU well for quite some time,
as more kthreads are added a more deterministic approach is required.
This commit therefore causes all of RCU's early-boot kthreads to be
spawned from a single early_initcall() function.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

9386c0b7

rcu: Return false instead of 0 in rcu_nocb_adopt_orphan_cbs() · f4aa84ba

Pranith Kumar authored Jul 08, 2014

Return false instead of 0 in rcu_nocb_adopt_orphan_cbs() as this has
bool as return type.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

f4aa84ba

rcu: Use false for return in __call_rcu_nocb() · 4afc7e26

Pranith Kumar authored Jul 08, 2014

Return false instead of 0 in __call_rcu_nocb() as this has bool as
return type.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

4afc7e26

rcu: Use true/false for return in rcu_nocb_adopt_orphan_cbs() · 0a9e1e11

Pranith Kumar authored Jul 08, 2014

Return true/false in rcu_nocb_adopt_orphan_cbs() instead of 0/1 as
this function has return type of bool.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

0a9e1e11

rcu: Use true/false for return in __call_rcu_nocb() · c271d3a9

Pranith Kumar authored Jul 08, 2014

Return true/false instead of 0/1 in __call_rcu_nocb() as this returns a
bool type.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

c271d3a9

rcu: Check the return value of zalloc_cpumask_var() · 949cccdb

Pranith Kumar authored Jul 25, 2014

This commit checks the return value of the zalloc_cpumask_var() used for
allocating cpumask for rcu_nocb_mask.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

949cccdb

rcu: Fix attempt to avoid unsolicited offloading of callbacks · f4579fc5

Paul E. McKenney authored Jul 25, 2014

Commit b58cc46c (rcu: Don't offload callbacks unless specifically
requested) failed to adjust the callback lists of the CPUs that are
known to be no-CBs CPUs only because they are also nohz_full= CPUs.
This failure can result in callbacks that are posted during early boot
getting stranded on nxtlist for CPUs whose no-CBs property becomes
apparent late, and there can also be spurious warnings about offline
CPUs posting callbacks.

This commit fixes these problems by adding an early-boot rcu_init_nohz()
that properly initializes the no-CBs CPUs.

Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with
CONFIG_RCU_NOCB_CPU=n do not exhibit this bug.  Neither do kernels
booted without the nohz_full= boot parameter.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>

f4579fc5

10 Sep, 2014 1 commit

rcutorture: Add RCU-tasks tests to default rcutorture list · a53dd6a6

Paul E. McKenney authored Sep 10, 2014

Although the test cases have been added, they must be specified explicitly
via the kvm.sh --configs argument in order to run them.  This commit
therefore adds the RCU-tasks tests to the CFLIST so that they will be
run automatically by default.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

a53dd6a6

07 Sep, 2014 8 commits

rcu: Per-CPU operation cleanups to rcu_*_qs() functions · 284a8c93

Paul E. McKenney authored Aug 14, 2014

The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use
old-style per-CPU variable access and write to ->passed_quiesce even
if it is already set. This commit therefore updates to use the new-style
per-CPU variable access functions and avoids the spurious writes.
This commit also eliminates the "cpu" argument to these functions because
they are always invoked on the indicated CPU.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

284a8c93

rcu: Remove local_irq_disable() in rcu_preempt_note_context_switch() · 1d082fd0

Paul E. McKenney authored Aug 14, 2014

The rcu_preempt_note_context_switch() function is on a scheduling fast
path, so it would be good to avoid disabling irqs. The reason that irqs
are disabled is to synchronize process-level and irq-handler access to
the task_struct ->rcu_read_unlock_special bitmask. This commit therefore
makes ->rcu_read_unlock_special instead be a union of bools with a short
allowing single-access checks in RCU's __rcu_read_unlock(). This results
in the process-level and irq-handler accesses being simple loads and
stores, so that irqs need no longer be disabled. This commit therefore
removes the irq disabling from rcu_preempt_note_context_switch().
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

1d082fd0

rcu: Additional information on RCU-tasks stall-warning messages · 4ff475ed
Paul E. McKenney authored Aug 10, 2014
```
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
```
4ff475ed

rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch() · 01a81330

Paul E. McKenney authored Aug 05, 2014

In theory, synchronize_sched() requires a read-side critical section
to order against.  In practice, preemption can be thought of as
being disabled across every machine instruction, at least for those
machine instructions that are not in the idle loop and not on offline
CPUs.  So this commit removes the redundant preempt_disable() from
rcu_note_voluntary_context_switch().

Please note that the single instruction in question is the store of
zero to ->rcu_tasks_holdout.  The "if" is simply a performance optimization
that avoids unnecessary stores.  To see this, keep in mind that both
the "if" condition and the store are in a quiescent state.  Therefore,
even if the task is preempted for a full grace period (presumably due
to its having done a context switch beforehand), the store will be
recording a legitimate quiescent state.
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Conflicts:
	include/linux/rcupdate.h

01a81330

rcu: Make rcu_tasks_kthread()'s GP-wait loop allow preemption · 8f20a5e8

Paul E. McKenney authored Aug 05, 2014

The grace-period-wait loop in rcu_tasks_kthread() is under (unnecessary)
RCU protection, and therefore has no preemption points in a PREEMPT=n
kernel.  This commit therefore removes the RCU protection and inserts
cond_resched().
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

8f20a5e8

rcu: Make TASKS_RCU handle nohz_full= CPUs · 176f8f7a

Paul E. McKenney authored Aug 04, 2014

Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
usermode execution. There would be neither a context switch nor a
scheduling-clock interrupt to tell TASKS_RCU that the task in question
had passed through a quiescent state. The grace period would therefore
extend indefinitely. This commit therefore makes RCU's dyntick-idle
subsystem record the task_struct structure of the task that is running
in dyntick-idle mode on each CPU. The TASKS_RCU grace period can
then access this information and record a quiescent state on
behalf of any CPU running in dyntick-idle usermode.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

176f8f7a

rcu: Defer rcu_tasks_kthread() creation till first call_rcu_tasks() · 84a8f446

Paul E. McKenney authored Aug 04, 2014

It is expected that many sites will have CONFIG_TASKS_RCU=y, but
will never actually invoke call_rcu_tasks(). For such sites, creating
rcu_tasks_kthread() at boot is wasteful. This commit therefore defers
creation of this kthread until the time of the first call_rcu_tasks().

This of course means that the first call_rcu_tasks() must be invoked
from process context after the scheduler is fully operational.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

84a8f446

documentation: Add verbiage on RCU-tasks stall warning messages · 37fe5f0e

Paul E. McKenney authored Jul 29, 2014

This commit documents RCU-tasks stall warning messages and also describes
when to use the new cond_resched_rcu_qs() API.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

37fe5f0e