Commit 330e9e46 authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The sole purpose of these changes is to shrink and simplify the RCU
  code base, which has suffered from creeping bloat over the past couple
  of years. The end result is a net removal of ~2700 lines of code:

     79 files changed, 1496 insertions(+), 4211 deletions(-)

  Plus there's a marked reduction in the Kconfig space complexity as
  well, here's the number of matches on 'grep RCU' in the .config:

                               before       after

     x86-defconfig                 17          15
     x86-allmodconfig              33          20"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits)
  rcu: Remove RCU CPU stall warnings from Tiny RCU
  rcu: Remove event tracing from Tiny RCU
  rcu: Move RCU debug Kconfig options to kernel/rcu
  rcu: Move RCU non-debug Kconfig options to kernel/rcu
  rcu: Eliminate NOCBs CPU-state Kconfig options
  rcu: Remove debugfs tracing
  srcu: Remove Classic SRCU
  srcu: Fix rcutorture-statistics typo
  rcu: Remove SPARSE_RCU_POINTER Kconfig option
  rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
  rcu: Remove typecheck() from RCU locking wrapper functions
  rcu: Remove #ifdef moving rcu_end_inkernel_boot from rcupdate.h
  rcu: Remove nohz_full full-system-idle state machine
  rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
  rcu: Remove *_SLOW_* Kconfig options
  srcu: Use rnp->lock wrappers to replace explicit memory barriers
  rcu: Move rnp->lock wrappers for SRCU use
  rcu: Convert rnp->lock wrappers to macros for SRCU use
  rcu: Refactor #includes from include/linux/rcupdate.h
  bcm47xx: Fix build regression
  ...
parents e94693f7 567b64aa
......@@ -28,8 +28,6 @@ stallwarn.txt
- RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress)
torture.txt
- RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST)
trace.txt
- CONFIG_RCU_TRACE debugfs files and formats
UP.txt
- RCU on Uniprocessor Systems
whatisRCU.txt
......
......@@ -559,9 +559,7 @@ The <tt>rcu_access_pointer()</tt> on line&nbsp;6 is similar to
For <tt>remove_gp_synchronous()</tt>, as long as all modifications
to <tt>gp</tt> are carried out while holding <tt>gp_lock</tt>,
the above optimizations are harmless.
However,
with <tt>CONFIG_SPARSE_RCU_POINTER=y</tt>,
<tt>sparse</tt> will complain if you
However, <tt>sparse</tt> will complain if you
define <tt>gp</tt> with <tt>__rcu</tt> and then
access it without using
either <tt>rcu_access_pointer()</tt> or <tt>rcu_dereference()</tt>.
......@@ -1849,7 +1847,8 @@ mass storage, or user patience, whichever comes first.
If the nesting is not visible to the compiler, as is the case with
mutually recursive functions each in its own translation unit,
stack overflow will result.
If the nesting takes the form of loops, either the control variable
If the nesting takes the form of loops, perhaps in the guise of tail
recursion, either the control variable
will overflow or (in the Linux kernel) you will get an RCU CPU stall warning.
Nevertheless, this class of RCU implementations is one
of the most composable constructs in existence.
......@@ -1977,9 +1976,8 @@ guard against mishaps and misuse:
and <tt>rcu_dereference()</tt>, perhaps (incorrectly)
substituting a simple assignment.
To catch this sort of error, a given RCU-protected pointer may be
tagged with <tt>__rcu</tt>, after which running sparse
with <tt>CONFIG_SPARSE_RCU_POINTER=y</tt> will complain
about simple-assignment accesses to that pointer.
tagged with <tt>__rcu</tt>, after which sparse
will complain about simple-assignment accesses to that pointer.
Arnd Bergmann made me aware of this requirement, and also
supplied the needed
<a href="https://lwn.net/Articles/376011/">patch series</a>.
......@@ -2036,7 +2034,7 @@ guard against mishaps and misuse:
some other synchronization mechanism, for example, reference
counting.
<li> In kernels built with <tt>CONFIG_RCU_TRACE=y</tt>, RCU-related
information is provided via both debugfs and event tracing.
information is provided via event tracing.
<li> Open-coded use of <tt>rcu_assign_pointer()</tt> and
<tt>rcu_dereference()</tt> to create typical linked
data structures can be surprisingly error-prone.
......@@ -2519,11 +2517,7 @@ It is similarly socially unacceptable to interrupt an
<tt>nohz_full</tt> CPU running in userspace.
RCU must therefore track <tt>nohz_full</tt> userspace
execution.
And in
<a href="https://lwn.net/Articles/558284/"><tt>CONFIG_NO_HZ_FULL_SYSIDLE=y</tt></a>
kernels, RCU must separately track idle CPUs on the one hand and
CPUs that are either idle or executing in userspace on the other.
In both cases, RCU must be able to sample state at two points in
RCU must therefore be able to sample state at two points in
time, and be able to determine whether or not some other CPU spent
any time idle and/or executing in userspace.
......@@ -2935,6 +2929,20 @@ The reason that this is possible is that SRCU is insensitive
to whether or not a CPU is online, which means that <tt>srcu_barrier()</tt>
need not exclude CPU-hotplug operations.
<p>
SRCU also differs from other RCU flavors in that SRCU's expedited and
non-expedited grace periods are implemented by the same mechanism.
This means that in the current SRCU implementation, expediting a
future grace period has the side effect of expediting all prior
grace periods that have not yet completed.
(But please note that this is a property of the current implementation,
not necessarily of future implementations.)
In addition, if SRCU has been idle for longer than the interval
specified by the <tt>srcutree.exp_holdoff</tt> kernel boot parameter
(25&nbsp;microseconds by default),
and if a <tt>synchronize_srcu()</tt> invocation ends this idle period,
that invocation will be automatically expedited.
<p>
As of v4.12, SRCU's callbacks are maintained per-CPU, eliminating
a locking bottleneck present in prior kernel versions.
......
......@@ -413,11 +413,11 @@ over a rather long period of time, but improvements are always welcome!
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.
17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
__rcu sparse checks (enabled by CONFIG_SPARSE_RCU_POINTER) to
validate your RCU code. These can help find problems as follows:
17. Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
__rcu sparse checks to validate your RCU code. These can help
find problems as follows:
CONFIG_PROVE_RCU: check that accesses to RCU-protected data
CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
structures are carried out under the proper RCU
read-side critical section, while holding the right
combination of locks, or whatever other conditions
......
This diff is collapsed.
......@@ -3238,21 +3238,17 @@
rcutree.gp_cleanup_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period cleanup. This only has effect
when CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP is set.
RCU grace-period cleanup.
rcutree.gp_init_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period initialization. This only has
effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT
is set.
RCU grace-period initialization.
rcutree.gp_preinit_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period pre-initialization, that is,
the propagation of recent CPU-hotplug changes up
the rcu_node combining tree. This only has effect
when CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT is set.
the rcu_node combining tree.
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
......@@ -3328,6 +3324,17 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
rcuperf.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
rcuperf.gp_async_max= [KNL]
Specify the maximum number of outstanding
callbacks per writer thread. When a writer
thread exceeds this limit, it invokes the
corresponding flavor of rcu_barrier() to allow
previously posted callbacks to drain.
rcuperf.gp_exp= [KNL]
Measure performance of expedited synchronous
grace-period primitives.
......@@ -3355,17 +3362,22 @@
rcuperf.perf_runnable= [BOOT]
Start rcuperf running at boot time.
rcuperf.perf_type= [KNL]
Specify the RCU implementation to test.
rcuperf.shutdown= [KNL]
Shut the system down after performance tests
complete. This is useful for hands-off automated
testing.
rcuperf.perf_type= [KNL]
Specify the RCU implementation to test.
rcuperf.verbose= [KNL]
Enable additional printk() statements.
rcuperf.writer_holdoff= [KNL]
Write-side holdoff between grace periods,
in microseconds. The default of zero says
no holdoff.
rcutorture.cbflood_inter_holdoff= [KNL]
Set holdoff time (jiffies) between successive
callback-flood tests.
......@@ -3803,6 +3815,15 @@
spia_pedr=
spia_peddr=
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
srcu_data structure's ->srcu_gp_seq_needed field.
The greater the number of bits set in this kernel
parameter, the less frequently counter wrap will
be checked for. Note that the bottom two bits
are ignored.
srcutree.exp_holdoff [KNL]
Specifies how many nanoseconds must elapse
since the end of the last SRCU grace period for
......
......@@ -303,6 +303,11 @@ defined which accomplish this::
void smp_mb__before_atomic(void);
void smp_mb__after_atomic(void);
Preceding a non-value-returning read-modify-write atomic operation with
smp_mb__before_atomic() and following it with smp_mb__after_atomic()
provides the same full ordering that is provided by value-returning
read-modify-write atomic operations.
For example, smp_mb__before_atomic() can be used like so::
obj->dead = 1;
......
......@@ -103,9 +103,3 @@ have already built it.
The optional make variable CF can be used to pass arguments to sparse. The
build system passes -Wbitwise to sparse automatically.
Checking RCU annotations
~~~~~~~~~~~~~~~~~~~~~~~~
RCU annotations are not checked by default. To enable RCU annotation
checks, include -DCONFIG_SPARSE_RCU_POINTER in your CF flags.
......@@ -109,13 +109,12 @@ SCHED_SOFTIRQ: Do all of the following:
on that CPU. If a thread that expects to run on the de-jittered
CPU awakens, the scheduler will send an IPI that can result in
a subsequent SCHED_SOFTIRQ.
2. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU
to be de-jittered is marked as an adaptive-ticks CPU using the
"nohz_full=" boot parameter. This reduces the number of
scheduler-clock interrupts that the de-jittered CPU receives,
minimizing its chances of being selected to do the load balancing
work that runs in SCHED_SOFTIRQ context.
2. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
is marked as an adaptive-ticks CPU using the "nohz_full="
boot parameter. This reduces the number of scheduler-clock
interrupts that the de-jittered CPU receives, minimizing its
chances of being selected to do the load balancing work that
runs in SCHED_SOFTIRQ context.
3. To the extent possible, keep the CPU out of the kernel when it
is non-idle, for example, by avoiding system calls and by
forcing both kernel threads and interrupts to execute elsewhere.
......@@ -135,11 +134,10 @@ HRTIMER_SOFTIRQ: Do all of the following:
RCU_SOFTIRQ: Do at least one of the following:
1. Offload callbacks and keep the CPU in either dyntick-idle or
adaptive-ticks state by doing all of the following:
a. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
CONFIG_NO_HZ_FULL=y, and, in addition ensure that the CPU
to be de-jittered is marked as an adaptive-ticks CPU using
the "nohz_full=" boot parameter. Bind the rcuo kthreads
to housekeeping CPUs, which can tolerate OS jitter.
a. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
de-jittered is marked as an adaptive-ticks CPU using the
"nohz_full=" boot parameter. Bind the rcuo kthreads to
housekeeping CPUs, which can tolerate OS jitter.
b. To the extent possible, keep the CPU out of the kernel
when it is non-idle, for example, by avoiding system
calls and by forcing both kernel threads and interrupts
......@@ -236,11 +234,10 @@ To reduce its OS jitter, do at least one of the following:
is feasible only if your workload never requires RCU priority
boosting, for example, if you ensure frequent idle time on all
CPUs that might execute within the kernel.
3. Build with CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y,
which offloads all RCU callbacks to kthreads that can be moved
off of CPUs susceptible to OS jitter. This approach prevents the
rcuc/%u kthreads from having any work to do, so that they are
never awakened.
3. Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
boot parameter offloading RCU callbacks from all CPUs susceptible
to OS jitter. This approach prevents the rcuc/%u kthreads from
having any work to do, so that they are never awakened.
4. Ensure that the CPU never enters the kernel, and, in particular,
avoid initiating any CPU hotplug operations on this CPU. This is
another way of preventing any callbacks from being queued on the
......
......@@ -27,7 +27,7 @@ The purpose of this document is twofold:
(2) to provide a guide as to how to use the barriers that are available.
Note that an architecture can provide more than the minimum requirement
for any particular barrier, but if the architecure provides less than
for any particular barrier, but if the architecture provides less than
that, that architecture is incorrect.
Note also that it is possible that a barrier may be a no-op for an
......
......@@ -194,32 +194,9 @@ that the RCU callbacks are processed in a timely fashion.
Another approach is to offload RCU callback processing to "rcuo" kthreads
using the CONFIG_RCU_NOCB_CPU=y Kconfig option. The specific CPUs to
offload may be selected via several methods:
1. One of three mutually exclusive Kconfig options specify a
build-time default for the CPUs to offload:
a. The CONFIG_RCU_NOCB_CPU_NONE=y Kconfig option results in
no CPUs being offloaded.
b. The CONFIG_RCU_NOCB_CPU_ZERO=y Kconfig option causes
CPU 0 to be offloaded.
c. The CONFIG_RCU_NOCB_CPU_ALL=y Kconfig option causes all
CPUs to be offloaded. Note that the callbacks will be
offloaded to "rcuo" kthreads, and that those kthreads
will in fact run on some CPU. However, this approach
gives fine-grained control on exactly which CPUs the
callbacks run on, along with their scheduling priority
(including the default of SCHED_OTHER), and it further
allows this control to be varied dynamically at runtime.
2. The "rcu_nocbs=" kernel boot parameter, which takes a comma-separated
list of CPUs and CPU ranges, for example, "1,3-5" selects CPUs 1,
3, 4, and 5. The specified CPUs will be offloaded in addition to
any CPUs specified as offloaded by CONFIG_RCU_NOCB_CPU_ZERO=y or
CONFIG_RCU_NOCB_CPU_ALL=y. This means that the "rcu_nocbs=" boot
parameter has no effect for kernels built with RCU_NOCB_CPU_ALL=y.
offload may be selected using The "rcu_nocbs=" kernel boot parameter,
which takes a comma-separated list of CPUs and CPU ranges, for example,
"1,3-5" selects CPUs 1, 3, 4, and 5.
The offloaded CPUs will never queue RCU callbacks, and therefore RCU
never prevents offloaded CPUs from entering either dyntick-idle mode
......
......@@ -8,6 +8,7 @@
#ifndef __BCM47XX_NVRAM_H
#define __BCM47XX_NVRAM_H
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/vmalloc.h>
......
......@@ -17,11 +17,7 @@
# define __release(x) __context__(x,-1)
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
# define __percpu __attribute__((noderef, address_space(3)))
#ifdef CONFIG_SPARSE_RCU_POINTER
# define __rcu __attribute__((noderef, address_space(4)))
#else /* CONFIG_SPARSE_RCU_POINTER */
# define __rcu
#endif /* CONFIG_SPARSE_RCU_POINTER */
# define __private __attribute__((noderef))
extern void __chk_user_ptr(const volatile void __user *);
extern void __chk_io_ptr(const volatile void __iomem *);
......
......@@ -7,6 +7,10 @@
* unlimited scalability while maintaining a constant level of contention
* on the root node.
*
* This seemingly RCU-private file must be available to SRCU users
* because the size of the TREE SRCU srcu_struct structure depends
* on these definitions.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
......
/*
* RCU segmented callback lists
*
* This seemingly RCU-private file must be available to SRCU users
* because the size of the TREE SRCU srcu_struct structure depends
* on these definitions.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
......
This diff is collapsed.
......@@ -25,7 +25,7 @@
#ifndef __LINUX_TINY_H
#define __LINUX_TINY_H
#include <linux/cache.h>
#include <linux/ktime.h>
struct rcu_dynticks;
static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
......@@ -33,10 +33,8 @@ static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
return 0;
}
static inline bool rcu_eqs_special_set(int cpu)
{
return false; /* Never flag non-existent other CPUs! */
}
/* Never flag non-existent other CPUs! */
static inline bool rcu_eqs_special_set(int cpu) { return false; }
static inline unsigned long get_state_synchronize_rcu(void)
{
......@@ -98,159 +96,38 @@ static inline void kfree_call_rcu(struct rcu_head *head,
rcu_note_voluntary_context_switch_lite(current); \
} while (0)
/*
* Take advantage of the fact that there is only one CPU, which
* allows us to ignore virtualization-based context switches.
*/
static inline void rcu_virt_note_context_switch(int cpu)
{
}
/*
* Return the number of grace periods started.
*/
static inline unsigned long rcu_batches_started(void)
{
return 0;
}
/*
* Return the number of bottom-half grace periods started.
*/
static inline unsigned long rcu_batches_started_bh(void)
{
return 0;
}
/*
* Return the number of sched grace periods started.
*/
static inline unsigned long rcu_batches_started_sched(void)
{
return 0;
}
/*
* Return the number of grace periods completed.
*/
static inline unsigned long rcu_batches_completed(void)
{
return 0;
}
/*
* Return the number of bottom-half grace periods completed.
*/
static inline unsigned long rcu_batches_completed_bh(void)
{
return 0;
}
/*
* Return the number of sched grace periods completed.
*/
static inline unsigned long rcu_batches_completed_sched(void)
static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
{
*nextevt = KTIME_MAX;
return 0;
}
/*
* Return the number of expedited grace periods completed.
*/
static inline unsigned long rcu_exp_batches_completed(void)
{
return 0;
}
/*
* Return the number of expedited sched grace periods completed.
* Take advantage of the fact that there is only one CPU, which
* allows us to ignore virtualization-based context switches.
*/
static inline unsigned long rcu_exp_batches_completed_sched(void)
{
return 0;
}
static inline void rcu_force_quiescent_state(void)
{
}
static inline void rcu_bh_force_quiescent_state(void)
{
}
static inline void rcu_sched_force_quiescent_state(void)
{
}
static inline void show_rcu_gp_kthreads(void)
{
}
static inline void rcu_cpu_stall_reset(void)
{
}
static inline void rcu_idle_enter(void)
{
}
static inline void rcu_idle_exit(void)
{
}
static inline void rcu_irq_enter(void)
{
}
static inline void rcu_irq_exit_irqson(void)
{
}
static inline void rcu_irq_enter_irqson(void)
{
}
static inline void rcu_irq_exit(void)
{
}
static inline void exit_rcu(void)
{
}
static inline void rcu_virt_note_context_switch(int cpu) { }
static inline void rcu_cpu_stall_reset(void) { }
static inline void rcu_idle_enter(void) { }
static inline void rcu_idle_exit(void) { }
static inline void rcu_irq_enter(void) { }
static inline bool rcu_irq_enter_disabled(void) { return false; }
static inline void rcu_irq_exit_irqson(void) { }
static inline void rcu_irq_enter_irqson(void) { }
static inline void rcu_irq_exit(void) { }
static inline void exit_rcu(void) { }
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU)
extern int rcu_scheduler_active __read_mostly;
void rcu_scheduler_starting(void);
#else /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU) */
static inline void rcu_scheduler_starting(void)
{
}
static inline void rcu_scheduler_starting(void) { }
#endif /* #else #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU) */
static inline void rcu_end_inkernel_boot(void) { }
static inline bool rcu_is_watching(void) { return true; }
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE)
static inline bool rcu_is_watching(void)
{
return __rcu_is_watching();
}
#else /* defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
static inline bool rcu_is_watching(void)
{
return true;
}
#endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
static inline void rcu_request_urgent_qs_task(struct task_struct *t)
{
}
static inline void rcu_all_qs(void)
{
barrier(); /* Avoid RCU read-side critical sections leaking across. */
}
/* Avoid RCU read-side critical sections leaking across. */
static inline void rcu_all_qs(void) { barrier(); }
/* RCUtree hotplug events */
#define rcutree_prepare_cpu NULL
......
......@@ -79,37 +79,20 @@ void cond_synchronize_rcu(unsigned long oldstate);
unsigned long get_state_synchronize_sched(void);
void cond_synchronize_sched(unsigned long oldstate);
extern unsigned long rcutorture_testseq;
extern unsigned long rcutorture_vernum;
unsigned long rcu_batches_started(void);
unsigned long rcu_batches_started_bh(void);
unsigned long rcu_batches_started_sched(void);
unsigned long rcu_batches_completed(void);
unsigned long rcu_batches_completed_bh(void);
unsigned long rcu_batches_completed_sched(void);
unsigned long rcu_exp_batches_completed(void);
unsigned long rcu_exp_batches_completed_sched(void);
void show_rcu_gp_kthreads(void);
void rcu_force_quiescent_state(void);
void rcu_bh_force_quiescent_state(void);
void rcu_sched_force_quiescent_state(void);
void rcu_idle_enter(void);
void rcu_idle_exit(void);
void rcu_irq_enter(void);
void rcu_irq_exit(void);
void rcu_irq_enter_irqson(void);
void rcu_irq_exit_irqson(void);
bool rcu_irq_enter_disabled(void);
void exit_rcu(void);
void rcu_scheduler_starting(void);
extern int rcu_scheduler_active __read_mostly;
void rcu_end_inkernel_boot(void);
bool rcu_is_watching(void);
void rcu_request_urgent_qs_task(struct task_struct *t);
void rcu_all_qs(void);
/* RCUtree hotplug events */
......
......@@ -369,6 +369,26 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
})
/**
* spin_unlock_wait - Interpose between successive critical sections
* @lock: the spinlock whose critical sections are to be interposed.
*
* Semantically this is equivalent to a spin_lock() immediately
* followed by a spin_unlock(). However, most architectures have
* more efficient implementations in which the spin_unlock_wait()
* cannot block concurrent lock acquisition, and in some cases
* where spin_unlock_wait() does not write to the lock variable.
* Nevertheless, spin_unlock_wait() can have high overhead, so if
* you feel the need to use it, please check to see if there is
* a better way to get your job done.
*
* The ordering guarantees provided by spin_unlock_wait() are:
*
* 1. All accesses preceding the spin_unlock_wait() happen before
* any accesses in later critical sections for this same lock.
* 2. All accesses following the spin_unlock_wait() happen after
* any accesses in earlier critical sections for this same lock.
*/
static __always_inline void spin_unlock_wait(spinlock_t *lock)
{
raw_spin_unlock_wait(&lock->rlock);
......
......@@ -60,32 +60,15 @@ int init_srcu_struct(struct srcu_struct *sp);
#include <linux/srcutiny.h>
#elif defined(CONFIG_TREE_SRCU)
#include <linux/srcutree.h>
#elif defined(CONFIG_CLASSIC_SRCU)
#include <linux/srcuclassic.h>
#else
#elif defined(CONFIG_SRCU)
#error "Unknown SRCU implementation specified to kernel configuration"
#else
/* Dummy definition for things like notifiers. Actual use gets link error. */
struct srcu_struct { };
#endif
/**
* call_srcu() - Queue a callback for invocation after an SRCU grace period
* @sp: srcu_struct in queue the callback
* @head: structure to be used for queueing the SRCU callback.
* @func: function to be invoked after the SRCU grace period
*
* The callback function will be invoked some time after a full SRCU
* grace period elapses, in other words after all pre-existing SRCU
* read-side critical sections have completed. However, the callback
* function might well execute concurrently with other SRCU read-side
* critical sections that started after call_srcu() was invoked. SRCU
* read-side critical sections are delimited by srcu_read_lock() and
* srcu_read_unlock(), and may be nested.
*
* The callback will be invoked from process context, but must nevertheless
* be fast and must not block.
*/
void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
void (*func)(struct rcu_head *head));
void cleanup_srcu_struct(struct srcu_struct *sp);
int __srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void __srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
......
/*
* Sleepable Read-Copy Update mechanism for mutual exclusion,
* classic v4.11 variant.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, you can access it online at
* http://www.gnu.org/licenses/gpl-2.0.html.
*
* Copyright (C) IBM Corporation, 2017
*
* Author: Paul McKenney <paulmck@us.ibm.com>
*/
#ifndef _LINUX_SRCU_CLASSIC_H
#define _LINUX_SRCU_CLASSIC_H
struct srcu_array {
unsigned long lock_count[2];
unsigned long unlock_count[2];
};
struct rcu_batch {
struct rcu_head *head, **tail;
};
#define RCU_BATCH_INIT(name) { NULL, &(name.head) }
struct srcu_struct {
unsigned long completed;
struct srcu_array __percpu *per_cpu_ref;
spinlock_t queue_lock; /* protect ->batch_queue, ->running */
bool running;
/* callbacks just queued */
struct rcu_batch batch_queue;
/* callbacks try to do the first check_zero */
struct rcu_batch batch_check0;
/* callbacks done with the first check_zero and the flip */
struct rcu_batch batch_check1;
struct rcu_batch batch_done;
struct delayed_work work;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};
void process_srcu(struct work_struct *work);
#define __SRCU_STRUCT_INIT(name) \
{ \
.completed = -300, \
.per_cpu_ref = &name##_srcu_array, \
.queue_lock = __SPIN_LOCK_UNLOCKED(name.queue_lock), \
.running = false, \
.batch_queue = RCU_BATCH_INIT(name.batch_queue), \
.batch_check0 = RCU_BATCH_INIT(name.batch_check0), \
.batch_check1 = RCU_BATCH_INIT(name.batch_check1), \
.batch_done = RCU_BATCH_INIT(name.batch_done), \
.work = __DELAYED_WORK_INITIALIZER(name.work, process_srcu, 0),\
__SRCU_DEP_MAP_INIT(name) \
}
/*
* Define and initialize a srcu struct at build time.
* Do -not- call init_srcu_struct() nor cleanup_srcu_struct() on it.
*
* Note that although DEFINE_STATIC_SRCU() hides the name from other
* files, the per-CPU variable rules nevertheless require that the
* chosen name be globally unique. These rules also prohibit use of
* DEFINE_STATIC_SRCU() within a function. If these rules are too
* restrictive, declare the srcu_struct manually. For example, in
* each file:
*
* static struct srcu_struct my_srcu;
*
* Then, before the first use of each my_srcu, manually initialize it:
*
* init_srcu_struct(&my_srcu);
*
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
#define __DEFINE_SRCU(name, is_static) \
static DEFINE_PER_CPU(struct srcu_array, name##_srcu_array);\
is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
void synchronize_srcu_expedited(struct srcu_struct *sp);
void srcu_barrier(struct srcu_struct *sp);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
static inline void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum,
unsigned long *completed)
{
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
*completed = sp->completed;
*gpnum = *completed;
if (sp->batch_queue.head || sp->batch_check0.head || sp->batch_check0.head)
(*gpnum)++;
}
#endif
......@@ -27,15 +27,14 @@
#include <linux/swait.h>
struct srcu_struct {
int srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
short srcu_idx; /* Current reader array element. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
/* Last srcu_read_unlock() wakes GP. */
unsigned long srcu_gp_seq; /* GP seq # for callback tagging. */
struct rcu_segcblist srcu_cblist;
/* Pending SRCU callbacks. */
int srcu_idx; /* Current reader array element. */
bool srcu_gp_running; /* GP workqueue running? */
bool srcu_gp_waiting; /* GP waiting for readers? */
struct rcu_head *srcu_cb_head; /* Pending callbacks: Head. */
struct rcu_head **srcu_cb_tail; /* Pending callbacks: Tail. */
struct work_struct srcu_work; /* For driving grace periods. */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
......@@ -47,7 +46,7 @@ void srcu_drive_gp(struct work_struct *wp);
#define __SRCU_STRUCT_INIT(name) \
{ \
.srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq), \
.srcu_cblist = RCU_SEGCBLIST_INITIALIZER(name.srcu_cblist), \
.srcu_cb_tail = &name.srcu_cb_head, \
.srcu_work = __WORK_INITIALIZER(name.srcu_work, srcu_drive_gp), \
__SRCU_DEP_MAP_INIT(name) \
}
......@@ -63,31 +62,29 @@ void srcu_drive_gp(struct work_struct *wp);
void synchronize_srcu(struct srcu_struct *sp);
static inline void synchronize_srcu_expedited(struct srcu_struct *sp)
/*
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Can be invoked from irq/bh handlers, but the matching
* __srcu_read_unlock() must be in the same handler instance. Returns an
* index that must be passed to the matching srcu_read_unlock().
*/
static inline int __srcu_read_lock(struct srcu_struct *sp)
{
synchronize_srcu(sp);
}
int idx;
static inline void srcu_barrier(struct srcu_struct *sp)
{
synchronize_srcu(sp);
idx = READ_ONCE(sp->srcu_idx);
WRITE_ONCE(sp->srcu_lock_nesting[idx], sp->srcu_lock_nesting[idx] + 1);
return idx;
}
static inline unsigned long srcu_batches_completed(struct srcu_struct *sp)
static inline void synchronize_srcu_expedited(struct srcu_struct *sp)
{
return 0;
synchronize_srcu(sp);
}
static inline void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum,
unsigned long *completed)
static inline void srcu_barrier(struct srcu_struct *sp)
{
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
*completed = sp->srcu_gp_seq;
*gpnum = *completed;
synchronize_srcu(sp);
}
#endif
......@@ -40,7 +40,7 @@ struct srcu_data {
unsigned long srcu_unlock_count[2]; /* Unlocks per CPU. */
/* Update-side state. */
spinlock_t lock ____cacheline_internodealigned_in_smp;
raw_spinlock_t __private lock ____cacheline_internodealigned_in_smp;
struct rcu_segcblist srcu_cblist; /* List of callbacks.*/
unsigned long srcu_gp_seq_needed; /* Furthest future GP needed. */
unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */
......@@ -58,7 +58,7 @@ struct srcu_data {
* Node in SRCU combining tree, similar in function to rcu_data.
*/
struct srcu_node {
spinlock_t lock;
raw_spinlock_t __private lock;
unsigned long srcu_have_cbs[4]; /* GP seq for children */
/* having CBs, but only */
/* is > ->srcu_gq_seq. */
......@@ -78,7 +78,7 @@ struct srcu_struct {
struct srcu_node *level[RCU_NUM_LVLS + 1];
/* First node at each level. */
struct mutex srcu_cb_mutex; /* Serialize CB preparation. */
spinlock_t gp_lock; /* protect ->srcu_cblist */
raw_spinlock_t __private lock; /* Protect counters */
struct mutex srcu_gp_mutex; /* Serialize GP work. */
unsigned int srcu_idx; /* Current rdr array element. */
unsigned long srcu_gp_seq; /* Grace-period seq #. */
......@@ -109,7 +109,7 @@ void process_srcu(struct work_struct *work);
#define __SRCU_STRUCT_INIT(name) \
{ \
.sda = &name##_srcu_data, \
.gp_lock = __SPIN_LOCK_UNLOCKED(name.gp_lock), \
.lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \
.srcu_gp_seq_needed = 0 - 1, \
__SRCU_DEP_MAP_INIT(name) \
}
......@@ -141,10 +141,5 @@ void process_srcu(struct work_struct *work);
void synchronize_srcu_expedited(struct srcu_struct *sp);
void srcu_barrier(struct srcu_struct *sp);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum, unsigned long *completed);
#endif
......@@ -742,6 +742,7 @@ TRACE_EVENT(rcu_torture_read,
* "OnlineQ": _rcu_barrier() found online CPU with callbacks.
* "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "IRQNQ": An rcu_barrier_callback() callback found no callbacks.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
* "Inc2": _rcu_barrier() piggyback check counter incremented.
......
This diff is collapsed.
This diff is collapsed.
#
# RCU-related configuration options
#
menu "RCU Subsystem"
config TREE_RCU
bool
default y if !PREEMPT && SMP
help
This option selects the RCU implementation that is
designed for very large SMP system with hundreds or
thousands of CPUs. It also scales down nicely to
smaller systems.
config PREEMPT_RCU
bool
default y if PREEMPT
help
This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or
thousands of CPUs, but for which real-time response
is also required. It also scales down nicely to
smaller systems.
Select this option if you are unsure.
config TINY_RCU
bool
default y if !PREEMPT && !SMP
help
This option selects the RCU implementation that is
designed for UP systems from which real-time response
is not required. This option greatly reduces the
memory footprint of RCU.
config RCU_EXPERT
bool "Make expert-level adjustments to RCU configuration"
default n
help
This option needs to be enabled if you wish to make
expert-level adjustments to RCU configuration. By default,
no such adjustments can be made, which has the often-beneficial
side-effect of preventing "make oldconfig" from asking you all
sorts of detailed questions about how you would like numerous
obscure RCU options to be set up.
Say Y if you need to make expert-level adjustments to RCU.
Say N if you are unsure.
config SRCU
bool
help
This option selects the sleepable version of RCU. This version
permits arbitrary sleeping or blocking within RCU read-side critical
sections.
config TINY_SRCU
bool
default y if SRCU && TINY_RCU
help
This option selects the single-CPU non-preemptible version of SRCU.
config TREE_SRCU
bool
default y if SRCU && !TINY_RCU
help
This option selects the full-fledged version of SRCU.
config TASKS_RCU
bool
default n
select SRCU
help
This option enables a task-based RCU implementation that uses
only voluntary context switch (not preemption!), idle, and
user-mode execution as quiescent states.
config RCU_STALL_COMMON
def_bool ( TREE_RCU || PREEMPT_RCU )
help
This option enables RCU CPU stall code that is common between
the TINY and TREE variants of RCU. The purpose is to allow
the tiny variants to disable RCU CPU stall warnings, while
making these warnings mandatory for the tree variants.
config RCU_NEED_SEGCBLIST
def_bool ( TREE_RCU || PREEMPT_RCU || TREE_SRCU )
config CONTEXT_TRACKING
bool
config CONTEXT_TRACKING_FORCE
bool "Force context tracking"
depends on CONTEXT_TRACKING
default y if !NO_HZ_FULL
help
The major pre-requirement for full dynticks to work is to
support the context tracking subsystem. But there are also
other dependencies to provide in order to make the full
dynticks working.
This option stands for testing when an arch implements the
context tracking backend but doesn't yet fullfill all the
requirements to make the full dynticks feature working.
Without the full dynticks, there is no way to test the support
for context tracking and the subsystems that rely on it: RCU
userspace extended quiescent state and tickless cputime
accounting. This option copes with the absence of the full
dynticks subsystem by forcing the context tracking on all
CPUs in the system.
Say Y only if you're working on the development of an
architecture backend for the context tracking.
Say N otherwise, this option brings an overhead that you
don't want in production.
config RCU_FANOUT
int "Tree-based hierarchical RCU fanout value"
range 2 64 if 64BIT
range 2 32 if !64BIT
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
default 64 if 64BIT
default 32 if !64BIT
help
This option controls the fanout of hierarchical implementations
of RCU, allowing RCU to work efficiently on machines with
large numbers of CPUs. This value must be at least the fourth
root of NR_CPUS, which allows NR_CPUS to be insanely large.
The default value of RCU_FANOUT should be used for production
systems, but if you are stress-testing the RCU implementation
itself, small RCU_FANOUT values allow you to test large-system
code paths on small(er) systems.
Select a specific number if testing RCU itself.
Take the default if unsure.
config RCU_FANOUT_LEAF
int "Tree-based hierarchical RCU leaf-level fanout value"
range 2 64 if 64BIT
range 2 32 if !64BIT
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
default 16
help
This option controls the leaf-level fanout of hierarchical
implementations of RCU, and allows trading off cache misses
against lock contention. Systems that synchronize their
scheduling-clock interrupts for energy-efficiency reasons will
want the default because the smaller leaf-level fanout keeps
lock contention levels acceptably low. Very large systems
(hundreds or thousands of CPUs) will instead want to set this
value to the maximum value possible in order to reduce the
number of cache misses incurred during RCU's grace-period
initialization. These systems tend to run CPU-bound, and thus
are not helped by synchronized interrupts, and thus tend to
skew them, which reduces lock contention enough that large
leaf-level fanouts work well. That said, setting leaf-level
fanout to a large number will likely cause problematic
lock contention on the leaf-level rcu_node structures unless
you boot with the skew_tick kernel parameter.
Select a specific number if testing RCU itself.
Select the maximum permissible value for large systems, but
please understand that you may also need to set the skew_tick
kernel boot parameter to avoid contention on the rcu_node
structure's locks.
Take the default if unsure.
config RCU_FAST_NO_HZ
bool "Accelerate last non-dyntick-idle CPU's grace periods"
depends on NO_HZ_COMMON && SMP && RCU_EXPERT
default n
help
This option permits CPUs to enter dynticks-idle state even if
they have RCU callbacks queued, and prevents RCU from waking
these CPUs up more than roughly once every four jiffies (by
default, you can adjust this using the rcutree.rcu_idle_gp_delay
parameter), thus improving energy efficiency. On the other
hand, this option increases the duration of RCU grace periods,
for example, slowing down synchronize_rcu().
Say Y if energy efficiency is critically important, and you
don't care about increased grace-period durations.
Say N if you are unsure.
config RCU_BOOST
bool "Enable RCU priority boosting"
depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT
default n
help
This option boosts the priority of preempted RCU readers that
block the current preemptible RCU grace period for too long.
This option also prevents heavy loads from blocking RCU
callback invocation for all flavors of RCU.
Say Y here if you are working with real-time apps or heavy loads
Say N here if you are unsure.
config RCU_BOOST_DELAY
int "Milliseconds to delay boosting after RCU grace-period start"
range 0 3000
depends on RCU_BOOST
default 500
help
This option specifies the time to wait after the beginning of
a given grace period before priority-boosting preempted RCU
readers blocking that grace period. Note that any RCU reader
blocking an expedited RCU grace period is boosted immediately.
Accept the default if unsure.
config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs"
depends on TREE_RCU || PREEMPT_RCU
depends on RCU_EXPERT || NO_HZ_FULL
default n
help
Use this option to reduce OS jitter for aggressive HPC or
real-time workloads. It can also be used to offload RCU
callback invocation to energy-efficient CPUs in battery-powered
asymmetric multiprocessors.
This option offloads callback invocation from the set of
CPUs specified at boot time by the rcu_nocbs parameter.
For each such CPU, a kthread ("rcuox/N") will be created to
invoke callbacks, where the "N" is the CPU being offloaded,
and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and
"s" for RCU-sched. Nothing prevents this kthread from running
on the specified CPUs, but (1) the kthreads may be preempted
between each callback, and (2) affinity or cgroups can be used
to force the kthreads to run on whatever set of CPUs is desired.
Say Y here if you want to help to debug reduced OS jitter.
Say N here if you are unsure.
endmenu # "RCU Subsystem"
#
# RCU-related debugging configuration options
#
menu "RCU Debugging"
config PROVE_RCU
def_bool PROVE_LOCKING
config TORTURE_TEST
tristate
default n
config RCU_PERF_TEST
tristate "performance tests for RCU"
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
default n
help
This option provides a kernel module that runs performance
tests on the RCU infrastructure. The kernel module may be built
after the fact on the running kernel to be tested, if desired.
Say Y here if you want RCU performance tests to be built into
the kernel.
Say M if you want the RCU performance tests to build as a module.
Say N if you are unsure.
config RCU_TORTURE_TEST
tristate "torture tests for RCU"
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
default n
help
This option provides a kernel module that runs torture tests
on the RCU infrastructure. The kernel module may be built
after the fact on the running kernel to be tested, if desired.
Say Y here if you want RCU torture tests to be built into
the kernel.
Say M if you want the RCU torture tests to build as a module.
Say N if you are unsure.
config RCU_CPU_STALL_TIMEOUT
int "RCU CPU stall timeout in seconds"
depends on RCU_STALL_COMMON
range 3 300
default 21
help
If a given RCU grace period extends more than the specified
number of seconds, a CPU stall warning is printed. If the
RCU grace period persists, additional CPU stall warnings are
printed at more widely spaced intervals.
config RCU_TRACE
bool "Enable tracing for RCU"
depends on DEBUG_KERNEL
default y if TREE_RCU
select TRACE_CLOCK
help
This option enables additional tracepoints for ftrace-style
event tracing.
Say Y here if you want to enable RCU tracing
Say N if you are unsure.
config RCU_EQS_DEBUG
bool "Provide debugging asserts for adding NO_HZ support to an arch"
depends on DEBUG_KERNEL
help
This option provides consistency checks in RCU's handling of
NO_HZ. These checks have proven quite helpful in detecting
bugs in arch-specific NO_HZ code.
Say N here if you need ultimate kernel/user switch latencies
Say Y if you are unsure
endmenu # "RCU Debugging"
......@@ -3,13 +3,11 @@
KCOV_INSTRUMENT := n
obj-y += update.o sync.o
obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
obj-$(CONFIG_TREE_SRCU) += srcutree.o
obj-$(CONFIG_TINY_SRCU) += srcutiny.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o
obj-$(CONFIG_TREE_RCU) += tree.o
obj-$(CONFIG_PREEMPT_RCU) += tree.o
obj-$(CONFIG_TREE_RCU_TRACE) += tree_trace.o
obj-$(CONFIG_TINY_RCU) += tiny.o
obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
......@@ -212,6 +212,18 @@ int rcu_jiffies_till_stall_check(void);
*/
#define TPS(x) tracepoint_string(x)
/*
* Dump the ftrace buffer, but only one time per callsite per boot.
*/
#define rcu_ftrace_dump(oops_dump_mode) \
do { \
static atomic_t ___rfd_beenhere = ATOMIC_INIT(0); \
\
if (!atomic_read(&___rfd_beenhere) && \
!atomic_xchg(&___rfd_beenhere, 1)) \
ftrace_dump(oops_dump_mode); \
} while (0)
void rcu_early_boot_tests(void);
void rcu_test_sync_prims(void);
......@@ -291,6 +303,271 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
cpu <= rnp->grphi; \
cpu = cpumask_next((cpu), cpu_possible_mask))
/*
* Wrappers for the rcu_node::lock acquire and release.
*
* Because the rcu_nodes form a tree, the tree traversal locking will observe
* different lock values, this in turn means that an UNLOCK of one level
* followed by a LOCK of another level does not imply a full memory barrier;
* and most importantly transitivity is lost.
*
* In order to restore full ordering between tree levels, augment the regular
* lock acquire functions with smp_mb__after_unlock_lock().
*
* As ->lock of struct rcu_node is a __private field, therefore one should use
* these wrappers rather than directly call raw_spin_{lock,unlock}* on ->lock.
*/
#define raw_spin_lock_rcu_node(p) \
do { \
raw_spin_lock(&ACCESS_PRIVATE(p, lock)); \
smp_mb__after_unlock_lock(); \
} while (0)
#define raw_spin_unlock_rcu_node(p) raw_spin_unlock(&ACCESS_PRIVATE(p, lock))
#define raw_spin_lock_irq_rcu_node(p) \
do { \
raw_spin_lock_irq(&ACCESS_PRIVATE(p, lock)); \
smp_mb__after_unlock_lock(); \
} while (0)
#define raw_spin_unlock_irq_rcu_node(p) \
raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock))
#define raw_spin_lock_irqsave_rcu_node(p, flags) \
do { \
raw_spin_lock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \
smp_mb__after_unlock_lock(); \
} while (0)
#define raw_spin_unlock_irqrestore_rcu_node(p, flags) \
raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags) \
#define raw_spin_trylock_rcu_node(p) \
({ \
bool ___locked = raw_spin_trylock(&ACCESS_PRIVATE(p, lock)); \
\
if (___locked) \
smp_mb__after_unlock_lock(); \
___locked; \
})
#endif /* #if defined(SRCU) || !defined(TINY_RCU) */
#ifdef CONFIG_TINY_RCU
/* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */
static inline bool rcu_gp_is_normal(void) /* Internal RCU use. */
{
return true;
}
static inline bool rcu_gp_is_expedited(void) /* Internal RCU use. */
{
return false;
}
static inline void rcu_expedite_gp(void)
{
}
static inline void rcu_unexpedite_gp(void)
{
}
#else /* #ifdef CONFIG_TINY_RCU */
bool rcu_gp_is_normal(void); /* Internal RCU use. */
bool rcu_gp_is_expedited(void); /* Internal RCU use. */
void rcu_expedite_gp(void);
void rcu_unexpedite_gp(void);
void rcupdate_announce_bootup_oddness(void);
#endif /* #else #ifdef CONFIG_TINY_RCU */
#define RCU_SCHEDULER_INACTIVE 0
#define RCU_SCHEDULER_INIT 1
#define RCU_SCHEDULER_RUNNING 2
#ifdef CONFIG_TINY_RCU
static inline void rcu_request_urgent_qs_task(struct task_struct *t) { }
#else /* #ifdef CONFIG_TINY_RCU */
void rcu_request_urgent_qs_task(struct task_struct *t);
#endif /* #else #ifdef CONFIG_TINY_RCU */
enum rcutorture_type {
RCU_FLAVOR,
RCU_BH_FLAVOR,
RCU_SCHED_FLAVOR,
RCU_TASKS_FLAVOR,
SRCU_FLAVOR,
INVALID_RCU_FLAVOR
};
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
unsigned long *gpnum, unsigned long *completed);
void rcutorture_record_test_transition(void);
void rcutorture_record_progress(unsigned long vernum);
void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
unsigned long c);
#else
static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
int *flags,
unsigned long *gpnum,
unsigned long *completed)
{
*flags = 0;
*gpnum = 0;
*completed = 0;
}
static inline void rcutorture_record_test_transition(void)
{
}
static inline void rcutorture_record_progress(unsigned long vernum)
{
}
#ifdef CONFIG_RCU_TRACE
void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
unsigned long c);
#else
#define do_trace_rcu_torture_read(rcutorturename, rhp, secs, c_old, c) \
do { } while (0)
#endif
#endif
#ifdef CONFIG_TINY_SRCU
static inline void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum,
unsigned long *completed)
{
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
*completed = sp->srcu_idx;
*gpnum = *completed;
}
#elif defined(CONFIG_TREE_SRCU)
void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum, unsigned long *completed);
#endif
#ifdef CONFIG_TINY_RCU
/*
* Return the number of grace periods started.
*/
static inline unsigned long rcu_batches_started(void)
{
return 0;
}
/*
* Return the number of bottom-half grace periods started.
*/
static inline unsigned long rcu_batches_started_bh(void)
{
return 0;
}
/*
* Return the number of sched grace periods started.
*/
static inline unsigned long rcu_batches_started_sched(void)
{
return 0;
}
/*
* Return the number of grace periods completed.
*/
static inline unsigned long rcu_batches_completed(void)
{
return 0;
}
/*
* Return the number of bottom-half grace periods completed.
*/
static inline unsigned long rcu_batches_completed_bh(void)
{
return 0;
}
/*
* Return the number of sched grace periods completed.
*/
static inline unsigned long rcu_batches_completed_sched(void)
{
return 0;
}
/*
* Return the number of expedited grace periods completed.
*/
static inline unsigned long rcu_exp_batches_completed(void)
{
return 0;
}
/*
* Return the number of expedited sched grace periods completed.
*/
static inline unsigned long rcu_exp_batches_completed_sched(void)
{
return 0;
}
static inline unsigned long srcu_batches_completed(struct srcu_struct *sp)
{
return 0;
}
static inline void rcu_force_quiescent_state(void)
{
}
static inline void rcu_bh_force_quiescent_state(void)
{
}
static inline void rcu_sched_force_quiescent_state(void)
{
}
static inline void show_rcu_gp_kthreads(void)
{
}
#else /* #ifdef CONFIG_TINY_RCU */
extern unsigned long rcutorture_testseq;
extern unsigned long rcutorture_vernum;
unsigned long rcu_batches_started(void);
unsigned long rcu_batches_started_bh(void);
unsigned long rcu_batches_started_sched(void);
unsigned long rcu_batches_completed(void);
unsigned long rcu_batches_completed_bh(void);
unsigned long rcu_batches_completed_sched(void);
unsigned long rcu_exp_batches_completed(void);
unsigned long rcu_exp_batches_completed_sched(void);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
void show_rcu_gp_kthreads(void);
void rcu_force_quiescent_state(void);
void rcu_bh_force_quiescent_state(void);
void rcu_sched_force_quiescent_state(void);
#endif /* #else #ifdef CONFIG_TINY_RCU */
#ifdef CONFIG_RCU_NOCB_CPU
bool rcu_is_nocb_cpu(int cpu);
#else
static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
#endif
#endif /* __LINUX_RCU_H */
......@@ -48,6 +48,8 @@
#include <linux/torture.h>
#include <linux/vmalloc.h>
#include "rcu.h"
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.vnet.ibm.com>");
......@@ -59,12 +61,16 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.vnet.ibm.com>");
#define VERBOSE_PERFOUT_ERRSTRING(s) \
do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0)
torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives");
torture_param(int, gp_async_max, 1000, "Max # outstanding waits per reader");
torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
torture_param(int, holdoff, 10, "Holdoff time before test start (s)");
torture_param(int, nreaders, -1, "Number of RCU reader threads");
torture_param(int, nreaders, 0, "Number of RCU reader threads");
torture_param(int, nwriters, -1, "Number of RCU updater threads");
torture_param(bool, shutdown, false, "Shutdown at end of performance tests.");
torture_param(bool, shutdown, !IS_ENABLED(MODULE),
"Shutdown at end of performance tests.");
torture_param(bool, verbose, true, "Enable verbose debugging printk()s");
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
static char *perf_type = "rcu";
module_param(perf_type, charp, 0444);
......@@ -86,13 +92,16 @@ static u64 t_rcu_perf_writer_started;
static u64 t_rcu_perf_writer_finished;
static unsigned long b_rcu_perf_writer_started;
static unsigned long b_rcu_perf_writer_finished;
static DEFINE_PER_CPU(atomic_t, n_async_inflight);
static int rcu_perf_writer_state;
#define RTWS_INIT 0
#define RTWS_EXP_SYNC 1
#define RTWS_SYNC 2
#define RTWS_IDLE 2
#define RTWS_STOPPING 3
#define RTWS_ASYNC 1
#define RTWS_BARRIER 2
#define RTWS_EXP_SYNC 3
#define RTWS_SYNC 4
#define RTWS_IDLE 5
#define RTWS_STOPPING 6
#define MAX_MEAS 10000
#define MIN_MEAS 100
......@@ -114,6 +123,8 @@ struct rcu_perf_ops {
unsigned long (*started)(void);
unsigned long (*completed)(void);
unsigned long (*exp_completed)(void);
void (*async)(struct rcu_head *head, rcu_callback_t func);
void (*gp_barrier)(void);
void (*sync)(void);
void (*exp_sync)(void);
const char *name;
......@@ -153,6 +164,8 @@ static struct rcu_perf_ops rcu_ops = {
.started = rcu_batches_started,
.completed = rcu_batches_completed,
.exp_completed = rcu_exp_batches_completed,
.async = call_rcu,
.gp_barrier = rcu_barrier,
.sync = synchronize_rcu,
.exp_sync = synchronize_rcu_expedited,
.name = "rcu"
......@@ -181,6 +194,8 @@ static struct rcu_perf_ops rcu_bh_ops = {
.started = rcu_batches_started_bh,
.completed = rcu_batches_completed_bh,
.exp_completed = rcu_exp_batches_completed_sched,
.async = call_rcu_bh,
.gp_barrier = rcu_barrier_bh,
.sync = synchronize_rcu_bh,
.exp_sync = synchronize_rcu_bh_expedited,
.name = "rcu_bh"
......@@ -208,6 +223,16 @@ static unsigned long srcu_perf_completed(void)
return srcu_batches_completed(srcu_ctlp);
}
static void srcu_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
call_srcu(srcu_ctlp, head, func);
}
static void srcu_rcu_barrier(void)
{
srcu_barrier(srcu_ctlp);
}
static void srcu_perf_synchronize(void)
{
synchronize_srcu(srcu_ctlp);
......@@ -226,11 +251,42 @@ static struct rcu_perf_ops srcu_ops = {
.started = NULL,
.completed = srcu_perf_completed,
.exp_completed = srcu_perf_completed,
.async = srcu_call_rcu,
.gp_barrier = srcu_rcu_barrier,
.sync = srcu_perf_synchronize,
.exp_sync = srcu_perf_synchronize_expedited,
.name = "srcu"
};
static struct srcu_struct srcud;
static void srcu_sync_perf_init(void)
{
srcu_ctlp = &srcud;
init_srcu_struct(srcu_ctlp);
}
static void srcu_sync_perf_cleanup(void)
{
cleanup_srcu_struct(srcu_ctlp);
}
static struct rcu_perf_ops srcud_ops = {
.ptype = SRCU_FLAVOR,
.init = srcu_sync_perf_init,
.cleanup = srcu_sync_perf_cleanup,
.readlock = srcu_perf_read_lock,
.readunlock = srcu_perf_read_unlock,
.started = NULL,
.completed = srcu_perf_completed,
.exp_completed = srcu_perf_completed,
.async = srcu_call_rcu,
.gp_barrier = srcu_rcu_barrier,
.sync = srcu_perf_synchronize,
.exp_sync = srcu_perf_synchronize_expedited,
.name = "srcud"
};
/*
* Definitions for sched perf testing.
*/
......@@ -254,6 +310,8 @@ static struct rcu_perf_ops sched_ops = {
.started = rcu_batches_started_sched,
.completed = rcu_batches_completed_sched,
.exp_completed = rcu_exp_batches_completed_sched,
.async = call_rcu_sched,
.gp_barrier = rcu_barrier_sched,
.sync = synchronize_sched,
.exp_sync = synchronize_sched_expedited,
.name = "sched"
......@@ -281,6 +339,8 @@ static struct rcu_perf_ops tasks_ops = {
.readunlock = tasks_perf_read_unlock,
.started = rcu_no_completed,
.completed = rcu_no_completed,
.async = call_rcu_tasks,
.gp_barrier = rcu_barrier_tasks,
.sync = synchronize_rcu_tasks,
.exp_sync = synchronize_rcu_tasks,
.name = "tasks"
......@@ -343,6 +403,15 @@ rcu_perf_reader(void *arg)
return 0;
}
/*
* Callback function for asynchronous grace periods from rcu_perf_writer().
*/
static void rcu_perf_async_cb(struct rcu_head *rhp)
{
atomic_dec(this_cpu_ptr(&n_async_inflight));
kfree(rhp);
}
/*
* RCU perf writer kthread. Repeatedly does a grace period.
*/
......@@ -352,6 +421,7 @@ rcu_perf_writer(void *arg)
int i = 0;
int i_max;
long me = (long)arg;
struct rcu_head *rhp = NULL;
struct sched_param sp;
bool started = false, done = false, alldone = false;
u64 t;
......@@ -380,9 +450,27 @@ rcu_perf_writer(void *arg)
}
do {
if (writer_holdoff)
udelay(writer_holdoff);
wdp = &wdpp[i];
*wdp = ktime_get_mono_fast_ns();
if (gp_exp) {
if (gp_async) {
retry:
if (!rhp)
rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) < gp_async_max) {
rcu_perf_writer_state = RTWS_ASYNC;
atomic_inc(this_cpu_ptr(&n_async_inflight));
cur_ops->async(rhp, rcu_perf_async_cb);
rhp = NULL;
} else if (!kthread_should_stop()) {
rcu_perf_writer_state = RTWS_BARRIER;
cur_ops->gp_barrier();
goto retry;
} else {
kfree(rhp); /* Because we are stopping. */
}
} else if (gp_exp) {
rcu_perf_writer_state = RTWS_EXP_SYNC;
cur_ops->exp_sync();
} else {
......@@ -429,6 +517,10 @@ rcu_perf_writer(void *arg)
i++;
rcu_perf_wait_shutdown();
} while (!torture_must_stop());
if (gp_async) {
rcu_perf_writer_state = RTWS_BARRIER;
cur_ops->gp_barrier();
}
rcu_perf_writer_state = RTWS_STOPPING;
writer_n_durations[me] = i_max;
torture_kthread_stopping("rcu_perf_writer");
......@@ -452,6 +544,17 @@ rcu_perf_cleanup(void)
u64 *wdp;
u64 *wdpp;
/*
* Would like warning at start, but everything is expedited
* during the mid-boot phase, so have to wait till the end.
*/
if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
if (rcu_gp_is_normal() && gp_exp)
VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
if (gp_exp && gp_async)
VERBOSE_PERFOUT_ERRSTRING("No expedited async GPs, so went with async!");
if (torture_cleanup_begin())
return;
......@@ -554,7 +657,7 @@ rcu_perf_init(void)
long i;
int firsterr = 0;
static struct rcu_perf_ops *perf_ops[] = {
&rcu_ops, &rcu_bh_ops, &srcu_ops, &sched_ops,
&rcu_ops, &rcu_bh_ops, &srcu_ops, &srcud_ops, &sched_ops,
RCUPERF_TASKS_OPS
};
......@@ -624,16 +727,6 @@ rcu_perf_init(void)
firsterr = -ENOMEM;
goto unwind;
}
if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp) {
VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
firsterr = -EINVAL;
goto unwind;
}
if (rcu_gp_is_normal() && gp_exp) {
VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
firsterr = -EINVAL;
goto unwind;
}
for (i = 0; i < nrealwriters; i++) {
writer_durations[i] =
kcalloc(MAX_MEAS, sizeof(*writer_durations[i]),
......
......@@ -52,6 +52,8 @@
#include <linux/torture.h>
#include <linux/vmalloc.h>
#include "rcu.h"
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and Josh Triplett <josh@joshtriplett.org>");
......@@ -562,31 +564,19 @@ static void srcu_torture_stats(void)
int __maybe_unused cpu;
int idx;
#if defined(CONFIG_TREE_SRCU) || defined(CONFIG_CLASSIC_SRCU)
#ifdef CONFIG_TREE_SRCU
idx = srcu_ctlp->srcu_idx & 0x1;
#else /* #ifdef CONFIG_TREE_SRCU */
idx = srcu_ctlp->completed & 0x1;
#endif /* #else #ifdef CONFIG_TREE_SRCU */
pr_alert("%s%s Tree SRCU per-CPU(idx=%d):",
torture_type, TORTURE_FLAG, idx);
for_each_possible_cpu(cpu) {
unsigned long l0, l1;
unsigned long u0, u1;
long c0, c1;
#ifdef CONFIG_TREE_SRCU
struct srcu_data *counts;
counts = per_cpu_ptr(srcu_ctlp->sda, cpu);
u0 = counts->srcu_unlock_count[!idx];
u1 = counts->srcu_unlock_count[idx];
#else /* #ifdef CONFIG_TREE_SRCU */
struct srcu_array *counts;
counts = per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu);
u0 = counts->unlock_count[!idx];
u1 = counts->unlock_count[idx];
#endif /* #else #ifdef CONFIG_TREE_SRCU */
/*
* Make sure that a lock is always counted if the corresponding
......@@ -594,13 +584,8 @@ static void srcu_torture_stats(void)
*/
smp_rmb();
#ifdef CONFIG_TREE_SRCU
l0 = counts->srcu_lock_count[!idx];
l1 = counts->srcu_lock_count[idx];
#else /* #ifdef CONFIG_TREE_SRCU */
l0 = counts->lock_count[!idx];
l1 = counts->lock_count[idx];
#endif /* #else #ifdef CONFIG_TREE_SRCU */
c0 = l0 - u0;
c1 = l1 - u1;
......@@ -609,7 +594,7 @@ static void srcu_torture_stats(void)
pr_cont("\n");
#elif defined(CONFIG_TINY_SRCU)
idx = READ_ONCE(srcu_ctlp->srcu_idx) & 0x1;
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%d,%d)\n",
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd)\n",
torture_type, TORTURE_FLAG, idx,
READ_ONCE(srcu_ctlp->srcu_lock_nesting[!idx]),
READ_ONCE(srcu_ctlp->srcu_lock_nesting[idx]));
......
This diff is collapsed.
......@@ -38,8 +38,8 @@ static int init_srcu_struct_fields(struct srcu_struct *sp)
sp->srcu_lock_nesting[0] = 0;
sp->srcu_lock_nesting[1] = 0;
init_swait_queue_head(&sp->srcu_wq);
sp->srcu_gp_seq = 0;
rcu_segcblist_init(&sp->srcu_cblist);
sp->srcu_cb_head = NULL;
sp->srcu_cb_tail = &sp->srcu_cb_head;
sp->srcu_gp_running = false;
sp->srcu_gp_waiting = false;
sp->srcu_idx = 0;
......@@ -88,29 +88,13 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
{
WARN_ON(sp->srcu_lock_nesting[0] || sp->srcu_lock_nesting[1]);
flush_work(&sp->srcu_work);
WARN_ON(rcu_seq_state(sp->srcu_gp_seq));
WARN_ON(sp->srcu_gp_running);
WARN_ON(sp->srcu_gp_waiting);
WARN_ON(!rcu_segcblist_empty(&sp->srcu_cblist));
WARN_ON(sp->srcu_cb_head);
WARN_ON(&sp->srcu_cb_head != sp->srcu_cb_tail);
}
EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
/*
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Can be invoked from irq/bh handlers, but the matching
* __srcu_read_unlock() must be in the same handler instance. Returns an
* index that must be passed to the matching srcu_read_unlock().
*/
int __srcu_read_lock(struct srcu_struct *sp)
{
int idx;
idx = READ_ONCE(sp->srcu_idx);
WRITE_ONCE(sp->srcu_lock_nesting[idx], sp->srcu_lock_nesting[idx] + 1);
return idx;
}
EXPORT_SYMBOL_GPL(__srcu_read_lock);
/*
* Removes the count for the old reader from the appropriate element of
* the srcu_struct.
......@@ -133,52 +117,44 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
void srcu_drive_gp(struct work_struct *wp)
{
int idx;
struct rcu_cblist ready_cbs;
struct srcu_struct *sp;
struct rcu_head *lh;
struct rcu_head *rhp;
struct srcu_struct *sp;
sp = container_of(wp, struct srcu_struct, srcu_work);
if (sp->srcu_gp_running || rcu_segcblist_empty(&sp->srcu_cblist))
if (sp->srcu_gp_running || !READ_ONCE(sp->srcu_cb_head))
return; /* Already running or nothing to do. */
/* Tag recently arrived callbacks and wait for readers. */
/* Remove recently arrived callbacks and wait for readers. */
WRITE_ONCE(sp->srcu_gp_running, true);
rcu_segcblist_accelerate(&sp->srcu_cblist,
rcu_seq_snap(&sp->srcu_gp_seq));
rcu_seq_start(&sp->srcu_gp_seq);
local_irq_disable();
lh = sp->srcu_cb_head;
sp->srcu_cb_head = NULL;
sp->srcu_cb_tail = &sp->srcu_cb_head;
local_irq_enable();
idx = sp->srcu_idx;
WRITE_ONCE(sp->srcu_idx, !sp->srcu_idx);
WRITE_ONCE(sp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */
swait_event(sp->srcu_wq, !READ_ONCE(sp->srcu_lock_nesting[idx]));
WRITE_ONCE(sp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
rcu_seq_end(&sp->srcu_gp_seq);
/* Update callback list based on GP, and invoke ready callbacks. */
rcu_segcblist_advance(&sp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
if (rcu_segcblist_ready_cbs(&sp->srcu_cblist)) {
rcu_cblist_init(&ready_cbs);
local_irq_disable();
rcu_segcblist_extract_done_cbs(&sp->srcu_cblist, &ready_cbs);
local_irq_enable();
rhp = rcu_cblist_dequeue(&ready_cbs);
for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
}
local_irq_disable();
rcu_segcblist_insert_count(&sp->srcu_cblist, &ready_cbs);
local_irq_enable();
/* Invoke the callbacks we removed above. */
while (lh) {
rhp = lh;
lh = lh->next;
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
}
WRITE_ONCE(sp->srcu_gp_running, false);
/*
* If more callbacks, reschedule ourselves. This can race with
* a call_srcu() at interrupt level, but the ->srcu_gp_running
* checks will straighten that out.
* Enable rescheduling, and if there are more callbacks,
* reschedule ourselves. This can race with a call_srcu()
* at interrupt level, but the ->srcu_gp_running checks will
* straighten that out.
*/
if (!rcu_segcblist_empty(&sp->srcu_cblist))
WRITE_ONCE(sp->srcu_gp_running, false);
if (READ_ONCE(sp->srcu_cb_head))
schedule_work(&sp->srcu_work);
}
EXPORT_SYMBOL_GPL(srcu_drive_gp);
......@@ -187,14 +163,16 @@ EXPORT_SYMBOL_GPL(srcu_drive_gp);
* Enqueue an SRCU callback on the specified srcu_struct structure,
* initiating grace-period processing if it is not already running.
*/
void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
void call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
rcu_callback_t func)
{
unsigned long flags;
head->func = func;
rhp->func = func;
rhp->next = NULL;
local_irq_save(flags);
rcu_segcblist_enqueue(&sp->srcu_cblist, head, false);
*sp->srcu_cb_tail = rhp;
sp->srcu_cb_tail = &rhp->next;
local_irq_restore(flags);
if (!READ_ONCE(sp->srcu_gp_running))
schedule_work(&sp->srcu_work);
......
This diff is collapsed.
......@@ -35,15 +35,26 @@
#include <linux/time.h>
#include <linux/cpu.h>
#include <linux/prefetch.h>
#include <linux/trace_events.h>
#include "rcu.h"
/* Forward declarations for tiny_plugin.h. */
struct rcu_ctrlblk;
static void __call_rcu(struct rcu_head *head,
rcu_callback_t func,
struct rcu_ctrlblk *rcp);
/* Global control variables for rcupdate callback mechanism. */
struct rcu_ctrlblk {
struct rcu_head *rcucblist; /* List of pending callbacks (CBs). */
struct rcu_head **donetail; /* ->next pointer of last "done" CB. */
struct rcu_head **curtail; /* ->next pointer of last CB. */
};
/* Definition for rcupdate control block. */
static struct rcu_ctrlblk rcu_sched_ctrlblk = {
.donetail = &rcu_sched_ctrlblk.rcucblist,
.curtail = &rcu_sched_ctrlblk.rcucblist,
};
static struct rcu_ctrlblk rcu_bh_ctrlblk = {
.donetail = &rcu_bh_ctrlblk.rcucblist,
.curtail = &rcu_bh_ctrlblk.rcucblist,
};
#include "tiny_plugin.h"
......@@ -59,19 +70,6 @@ void rcu_barrier_sched(void)
}
EXPORT_SYMBOL(rcu_barrier_sched);
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE)
/*
* Test whether RCU thinks that the current CPU is idle.
*/
bool notrace __rcu_is_watching(void)
{
return true;
}
EXPORT_SYMBOL(__rcu_is_watching);
#endif /* defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
/*
* Helper function for rcu_sched_qs() and rcu_bh_qs().
* Also irqs are disabled to avoid confusion due to interrupt handlers
......@@ -79,7 +77,6 @@ EXPORT_SYMBOL(__rcu_is_watching);
*/
static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
{
RCU_TRACE(reset_cpu_stall_ticks(rcp);)
if (rcp->donetail != rcp->curtail) {
rcp->donetail = rcp->curtail;
return 1;
......@@ -125,7 +122,6 @@ void rcu_bh_qs(void)
*/
void rcu_check_callbacks(int user)
{
RCU_TRACE(check_cpu_stalls();)
if (user)
rcu_sched_qs();
else if (!in_softirq())
......@@ -140,10 +136,8 @@ void rcu_check_callbacks(int user)
*/
static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
{
const char *rn = NULL;
struct rcu_head *next, *list;
unsigned long flags;
RCU_TRACE(int cb_count = 0;)
/* Move the ready-to-invoke callbacks to a local list. */
local_irq_save(flags);
......@@ -152,7 +146,6 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
local_irq_restore(flags);
return;
}
RCU_TRACE(trace_rcu_batch_start(rcp->name, 0, rcp->qlen, -1);)
list = rcp->rcucblist;
rcp->rcucblist = *rcp->donetail;
*rcp->donetail = NULL;
......@@ -162,22 +155,15 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
local_irq_restore(flags);
/* Invoke the callbacks on the local list. */
RCU_TRACE(rn = rcp->name;)
while (list) {
next = list->next;
prefetch(next);
debug_rcu_head_unqueue(list);
local_bh_disable();
__rcu_reclaim(rn, list);
__rcu_reclaim("", list);
local_bh_enable();
list = next;
RCU_TRACE(cb_count++;)
}
RCU_TRACE(rcu_trace_sub_qlen(rcp, cb_count);)
RCU_TRACE(trace_rcu_batch_end(rcp->name,
cb_count, 0, need_resched(),
is_idle_task(current),
false));
}
static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused)
......@@ -221,7 +207,6 @@ static void __call_rcu(struct rcu_head *head,
local_irq_save(flags);
*rcp->curtail = head;
rcp->curtail = &head->next;
RCU_TRACE(rcp->qlen++;)
local_irq_restore(flags);
if (unlikely(is_idle_task(current))) {
......@@ -254,8 +239,5 @@ EXPORT_SYMBOL_GPL(call_rcu_bh);
void __init rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
RCU_TRACE(reset_cpu_stall_ticks(&rcu_sched_ctrlblk);)
RCU_TRACE(reset_cpu_stall_ticks(&rcu_bh_ctrlblk);)
rcu_early_boot_tests();
}
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -147,7 +147,7 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
*
* Caller must hold the rcu_state's exp_mutex.
*/
static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
{
return rnp->exp_tasks == NULL &&
READ_ONCE(rnp->expmask) == 0;
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -5874,15 +5874,9 @@ int sched_cpu_deactivate(unsigned int cpu)
* users of this state to go away such that all new such users will
* observe it.
*
* For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
* not imply sync_sched(), so wait for both.
*
* Do sync before park smpboot threads to take care the rcu boost case.
*/
if (IS_ENABLED(CONFIG_PREEMPT))
synchronize_rcu_mult(call_rcu, call_rcu_sched);
else
synchronize_rcu();
synchronize_rcu_mult(call_rcu, call_rcu_sched);
if (!sched_smp_initialized)
return 0;
......
......@@ -126,56 +126,6 @@ config NO_HZ_FULL_ALL
Note the boot CPU will still be kept outside the range to
handle the timekeeping duty.
config NO_HZ_FULL_SYSIDLE
bool "Detect full-system idle state for full dynticks system"
depends on NO_HZ_FULL
default n
help
At least one CPU must keep the scheduling-clock tick running for
timekeeping purposes whenever there is a non-idle CPU, where
"non-idle" also includes dynticks CPUs as long as they are
running non-idle tasks. Because the underlying adaptive-tick
support cannot distinguish between all CPUs being idle and
all CPUs each running a single task in dynticks mode, the
underlying support simply ensures that there is always a CPU
handling the scheduling-clock tick, whether or not all CPUs
are idle. This Kconfig option enables scalable detection of
the all-CPUs-idle state, thus allowing the scheduling-clock
tick to be disabled when all CPUs are idle. Note that scalable
detection of the all-CPUs-idle state means that larger systems
will be slower to declare the all-CPUs-idle state.
Say Y if you would like to help debug all-CPUs-idle detection.
Say N if you are unsure.
config NO_HZ_FULL_SYSIDLE_SMALL
int "Number of CPUs above which large-system approach is used"
depends on NO_HZ_FULL_SYSIDLE
range 1 NR_CPUS
default 8
help
The full-system idle detection mechanism takes a lazy approach
on large systems, as is required to attain decent scalability.
However, on smaller systems, scalability is not anywhere near as
large a concern as is energy efficiency. The sysidle subsystem
therefore uses a fast but non-scalable algorithm for small
systems and a lazier but scalable algorithm for large systems.
This Kconfig parameter defines the number of CPUs in the largest
system that will be considered to be "small".
The default value will be fine in most cases. Battery-powered
systems that (1) enable NO_HZ_FULL_SYSIDLE, (2) have larger
numbers of CPUs, and (3) are suffering from battery-lifetime
problems due to long sysidle latencies might wish to experiment
with larger values for this Kconfig parameter. On the other
hand, they might be even better served by disabling NO_HZ_FULL
entirely, given that NO_HZ_FULL is intended for HPC and
real-time workloads that at present do not tend to be run on
battery-powered systems.
Take the default if you are unsure.
config NO_HZ
bool "Old Idle dynticks config"
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
......
This diff is collapsed.
......@@ -25,9 +25,6 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
earlycpio.o seq_buf.o siphash.o \
nmi_backtrace.o nodemask.o win_minmax.o
CFLAGS_radix-tree.o += -DCONFIG_SPARSE_RCU_POINTER
CFLAGS_idr.o += -DCONFIG_SPARSE_RCU_POINTER
lib-$(CONFIG_MMU) += ioremap.o
lib-$(CONFIG_SMP) += cpumask.o
lib-$(CONFIG_DMA_NOOP_OPS) += dma-noop.o
......
......@@ -5533,23 +5533,6 @@ sub process {
}
}
# Check for expedited grace periods that interrupt non-idle non-nohz
# online CPUs. These expedited can therefore degrade real-time response
# if used carelessly, and should be avoided where not absolutely
# needed. It is always OK to use synchronize_rcu_expedited() and
# synchronize_sched_expedited() at boot time (before real-time applications
# start) and in error situations where real-time response is compromised in
# any case. Note that synchronize_srcu_expedited() does -not- interrupt
# other CPUs, so don't warn on uses of synchronize_srcu_expedited().
# Of course, nothing comes for free, and srcu_read_lock() and
# srcu_read_unlock() do contain full memory barriers in payment for
# synchronize_srcu_expedited() non-interruption properties.
if ($line =~ /\b(synchronize_rcu_expedited|synchronize_sched_expedited)\(/) {
WARN("EXPEDITED_RCU_GRACE_PERIOD",
"expedited RCU grace periods should be avoided where they can degrade real-time response\n" . $herecurr);
}
# check of hardware specific defines
if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) {
CHK("ARCH_DEFINES",
......
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment