Commit be77f87c authored by Paul E. McKenney's avatar Paul E. McKenney

Merge branches 'cbnum.2013.06.10a', 'doc.2013.06.10a', 'fixes.2013.06.10a',...

Merge branches 'cbnum.2013.06.10a', 'doc.2013.06.10a', 'fixes.2013.06.10a', 'srcu.2013.06.10a' and 'tiny.2013.06.10a' into HEAD

cbnum.2013.06.10a: Apply simplifications stemming from the new callback
	numbering.

doc.2013.06.10a: Documentation updates.

fixes.2013.06.10a: Miscellaneous fixes.

srcu.2013.06.10a: Updates to SRCU.

tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.
......@@ -354,12 +354,6 @@ over a rather long period of time, but improvements are always welcome!
using RCU rather than SRCU, because RCU is almost always faster
and easier to use than is SRCU.
If you need to enter your read-side critical section in a
hardirq or exception handler, and then exit that same read-side
critical section in the task that was interrupted, then you need
to srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid
the lockdep checking that would otherwise this practice illegal.
Also unlike other forms of RCU, explicit initialization
and cleanup is required via init_srcu_struct() and
cleanup_srcu_struct(). These are passed a "struct srcu_struct"
......
......@@ -182,12 +182,6 @@ torture_type The type of RCU to test, with string values as follows:
"srcu_expedited": srcu_read_lock(), srcu_read_unlock() and
synchronize_srcu_expedited().
"srcu_raw": srcu_read_lock_raw(), srcu_read_unlock_raw(),
and call_srcu().
"srcu_raw_sync": srcu_read_lock_raw(), srcu_read_unlock_raw(),
and synchronize_srcu().
"sched": preempt_disable(), preempt_enable(), and
call_rcu_sched().
......
......@@ -530,113 +530,21 @@ o "nos" counts the number of times we balked for other
reasons, e.g., the grace period ended first.
CONFIG_TINY_RCU and CONFIG_TINY_PREEMPT_RCU debugfs Files and Formats
CONFIG_TINY_RCU debugfs Files and Formats
These implementations of RCU provides a single debugfs file under the
top-level directory RCU, namely rcu/rcudata, which displays fields in
rcu_bh_ctrlblk, rcu_sched_ctrlblk and, for CONFIG_TINY_PREEMPT_RCU,
rcu_preempt_ctrlblk.
rcu_bh_ctrlblk and rcu_sched_ctrlblk.
The output of "cat rcu/rcudata" is as follows:
rcu_preempt: qlen=24 gp=1097669 g197/p197/c197 tasks=...
ttb=. btg=no ntb=184 neb=0 nnb=183 j=01f7 bt=0274
normal balk: nt=1097669 gt=0 bt=371 b=0 ny=25073378 nos=0
exp balk: bt=0 nos=0
rcu_sched: qlen: 0
rcu_bh: qlen: 0
This is split into rcu_preempt, rcu_sched, and rcu_bh sections, with the
rcu_preempt section appearing only in CONFIG_TINY_PREEMPT_RCU builds.
The last three lines of the rcu_preempt section appear only in
CONFIG_RCU_BOOST kernel builds. The fields are as follows:
This is split into rcu_sched and rcu_bh sections. The field is as
follows:
o "qlen" is the number of RCU callbacks currently waiting either
for an RCU grace period or waiting to be invoked. This is the
only field present for rcu_sched and rcu_bh, due to the
short-circuiting of grace period in those two cases.
o "gp" is the number of grace periods that have completed.
o "g197/p197/c197" displays the grace-period state, with the
"g" number being the number of grace periods that have started
(mod 256), the "p" number being the number of grace periods
that the CPU has responded to (also mod 256), and the "c"
number being the number of grace periods that have completed
(once again mode 256).
Why have both "gp" and "g"? Because the data flowing into
"gp" is only present in a CONFIG_RCU_TRACE kernel.
o "tasks" is a set of bits. The first bit is "T" if there are
currently tasks that have recently blocked within an RCU
read-side critical section, the second bit is "N" if any of the
aforementioned tasks are blocking the current RCU grace period,
and the third bit is "E" if any of the aforementioned tasks are
blocking the current expedited grace period. Each bit is "."
if the corresponding condition does not hold.
o "ttb" is a single bit. It is "B" if any of the blocked tasks
need to be priority boosted and "." otherwise.
o "btg" indicates whether boosting has been carried out during
the current grace period, with "exp" indicating that boosting
is in progress for an expedited grace period, "no" indicating
that boosting has not yet started for a normal grace period,
"begun" indicating that boosting has bebug for a normal grace
period, and "done" indicating that boosting has completed for
a normal grace period.
o "ntb" is the total number of tasks subjected to RCU priority boosting
periods since boot.
o "neb" is the number of expedited grace periods that have had
to resort to RCU priority boosting since boot.
o "nnb" is the number of normal grace periods that have had
to resort to RCU priority boosting since boot.
o "j" is the low-order 16 bits of the jiffies counter in hexadecimal.
o "bt" is the low-order 16 bits of the value that the jiffies counter
will have at the next time that boosting is scheduled to begin.
o In the line beginning with "normal balk", the fields are as follows:
o "nt" is the number of times that the system balked from
boosting because there were no blocked tasks to boost.
Note that the system will balk from boosting even if the
grace period is overdue when the currently running task
is looping within an RCU read-side critical section.
There is no point in boosting in this case, because
boosting a running task won't make it run any faster.
o "gt" is the number of times that the system balked
from boosting because, although there were blocked tasks,
none of them were preventing the current grace period
from completing.
o "bt" is the number of times that the system balked
from boosting because boosting was already in progress.
o "b" is the number of times that the system balked from
boosting because boosting had already completed for
the grace period in question.
o "ny" is the number of times that the system balked from
boosting because it was not yet time to start boosting
the grace period in question.
o "nos" is the number of times that the system balked from
boosting for inexplicable ("not otherwise specified")
reasons. This can actually happen due to races involving
increments of the jiffies counter.
o In the line beginning with "exp balk", the fields are as follows:
o "bt" is the number of times that the system balked from
boosting because there were no blocked tasks to boost.
o "nos" is the number of times that the system balked from
boosting for inexplicable ("not otherwise specified")
reasons.
......@@ -842,9 +842,7 @@ SRCU: Critical sections Grace period Barrier
srcu_read_lock synchronize_srcu srcu_barrier
srcu_read_unlock call_srcu
srcu_read_lock_raw synchronize_srcu_expedited
srcu_read_unlock_raw
srcu_dereference
srcu_dereference synchronize_srcu_expedited
SRCU: Initialization/cleanup
init_srcu_struct
......@@ -865,38 +863,32 @@ list can be helpful:
a. Will readers need to block? If so, you need SRCU.
b. Is it necessary to start a read-side critical section in a
hardirq handler or exception handler, and then to complete
this read-side critical section in the task that was
interrupted? If so, you need SRCU's srcu_read_lock_raw() and
srcu_read_unlock_raw() primitives.
c. What about the -rt patchset? If readers would need to block
b. What about the -rt patchset? If readers would need to block
in an non-rt kernel, you need SRCU. If readers would block
in a -rt kernel, but not in a non-rt kernel, SRCU is not
necessary.
d. Do you need to treat NMI handlers, hardirq handlers,
c. Do you need to treat NMI handlers, hardirq handlers,
and code segments with preemption disabled (whether
via preempt_disable(), local_irq_save(), local_bh_disable(),
or some other mechanism) as if they were explicit RCU readers?
If so, RCU-sched is the only choice that will work for you.
e. Do you need RCU grace periods to complete even in the face
d. Do you need RCU grace periods to complete even in the face
of softirq monopolization of one or more of the CPUs? For
example, is your code subject to network-based denial-of-service
attacks? If so, you need RCU-bh.
f. Is your workload too update-intensive for normal use of
e. Is your workload too update-intensive for normal use of
RCU, but inappropriate for other synchronization mechanisms?
If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
g. Do you need read-side critical sections that are respected
f. Do you need read-side critical sections that are respected
even though they are in the middle of the idle loop, during
user-mode execution, or on an offlined CPU? If so, SRCU is the
only choice that will work for you.
h. Otherwise, use RCU.
g. Otherwise, use RCU.
Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.
......
......@@ -157,6 +157,53 @@ RCU_SOFTIRQ: Do at least one of the following:
calls and by forcing both kernel threads and interrupts
to execute elsewhere.
Name: kworker/%u:%d%s (cpu, id, priority)
Purpose: Execute workqueue requests
To reduce its OS jitter, do any of the following:
1. Run your workload at a real-time priority, which will allow
preempting the kworker daemons.
2. Do any of the following needed to avoid jitter that your
application cannot tolerate:
a. Build your kernel with CONFIG_SLUB=y rather than
CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
use of each CPU's workqueues to run its cache_reap()
function.
b. Avoid using oprofile, thus avoiding OS jitter from
wq_sync_buffer().
c. Limit your CPU frequency so that a CPU-frequency
governor is not required, possibly enlisting the aid of
special heatsinks or other cooling technologies. If done
correctly, and if you CPU architecture permits, you should
be able to build your kernel with CONFIG_CPU_FREQ=n to
avoid the CPU-frequency governor periodically running
on each CPU, including cs_dbs_timer() and od_dbs_timer().
WARNING: Please check your CPU specifications to
make sure that this is safe on your particular system.
d. It is not possible to entirely get rid of OS jitter
from vmstat_update() on CONFIG_SMP=y systems, but you
can decrease its frequency by writing a large value to
/proc/sys/vm/stat_interval. The default value is HZ,
for an interval of one second. Of course, larger values
will make your virtual-memory statistics update more
slowly. Of course, you can also run your workload at
a real-time priority, thus preempting vmstat_update().
e. If running on high-end powerpc servers, build with
CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS
daemon from running on each CPU every second or so.
(This will require editing Kconfig files and will defeat
this platform's RAS functionality.) This avoids jitter
due to the rtas_event_scan() function.
WARNING: Please check your CPU specifications to
make sure that this is safe on your particular system.
f. If running on Cell Processor, build your kernel with
CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
spu_gov_work().
WARNING: Please check your CPU specifications to
make sure that this is safe on your particular system.
g. If running on PowerMAC, build your kernel with
CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
avoiding OS jitter from rackmeter_do_timer().
Name: rcuc/%u
Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
To reduce its OS jitter, do at least one of the following:
......
......@@ -7,21 +7,59 @@ efficiency and reducing OS jitter. Reducing OS jitter is important for
some types of computationally intensive high-performance computing (HPC)
applications and for real-time applications.
There are two main contexts in which the number of scheduling-clock
interrupts can be reduced compared to the old-school approach of sending
a scheduling-clock interrupt to all CPUs every jiffy whether they need
it or not (CONFIG_HZ_PERIODIC=y or CONFIG_NO_HZ=n for older kernels):
There are three main ways of managing scheduling-clock interrupts
(also known as "scheduling-clock ticks" or simply "ticks"):
1. Idle CPUs (CONFIG_NO_HZ_IDLE=y or CONFIG_NO_HZ=y for older kernels).
1. Never omit scheduling-clock ticks (CONFIG_HZ_PERIODIC=y or
CONFIG_NO_HZ=n for older kernels). You normally will -not-
want to choose this option.
2. CPUs having only one runnable task (CONFIG_NO_HZ_FULL=y).
2. Omit scheduling-clock ticks on idle CPUs (CONFIG_NO_HZ_IDLE=y or
CONFIG_NO_HZ=y for older kernels). This is the most common
approach, and should be the default.
These two cases are described in the following two sections, followed
3. Omit scheduling-clock ticks on CPUs that are either idle or that
have only one runnable task (CONFIG_NO_HZ_FULL=y). Unless you
are running realtime applications or certain types of HPC
workloads, you will normally -not- want this option.
These three cases are described in the following three sections, followed
by a third section on RCU-specific considerations and a fourth and final
section listing known issues.
IDLE CPUs
NEVER OMIT SCHEDULING-CLOCK TICKS
Very old versions of Linux from the 1990s and the very early 2000s
are incapable of omitting scheduling-clock ticks. It turns out that
there are some situations where this old-school approach is still the
right approach, for example, in heavy workloads with lots of tasks
that use short bursts of CPU, where there are very frequent idle
periods, but where these idle periods are also quite short (tens or
hundreds of microseconds). For these types of workloads, scheduling
clock interrupts will normally be delivered any way because there
will frequently be multiple runnable tasks per CPU. In these cases,
attempting to turn off the scheduling clock interrupt will have no effect
other than increasing the overhead of switching to and from idle and
transitioning between user and kernel execution.
This mode of operation can be selected using CONFIG_HZ_PERIODIC=y (or
CONFIG_NO_HZ=n for older kernels).
However, if you are instead running a light workload with long idle
periods, failing to omit scheduling-clock interrupts will result in
excessive power consumption. This is especially bad on battery-powered
devices, where it results in extremely short battery lifetimes. If you
are running light workloads, you should therefore read the following
section.
In addition, if you are running either a real-time workload or an HPC
workload with short iterations, the scheduling-clock interrupts can
degrade your applications performance. If this describes your workload,
you should read the following two sections.
OMIT SCHEDULING-CLOCK TICKS FOR IDLE CPUs
If a CPU is idle, there is little point in sending it a scheduling-clock
interrupt. After all, the primary purpose of a scheduling-clock interrupt
......@@ -59,10 +97,12 @@ By default, CONFIG_NO_HZ_IDLE=y kernels boot with "nohz=on", enabling
dyntick-idle mode.
CPUs WITH ONLY ONE RUNNABLE TASK
OMIT SCHEDULING-CLOCK TICKS FOR CPUs WITH ONLY ONE RUNNABLE TASK
If a CPU has only one runnable task, there is little point in sending it
a scheduling-clock interrupt because there is no other task to switch to.
Note that omitting scheduling-clock ticks for CPUs with only one runnable
task implies also omitting them for idle CPUs.
The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid
sending scheduling-clock interrupts to CPUs with a single runnable task,
......@@ -238,6 +278,11 @@ o Adaptive-ticks does not do anything unless there is only one
single runnable SCHED_FIFO task and multiple runnable SCHED_OTHER
tasks, even though these interrupts are unnecessary.
And even when there are multiple runnable tasks on a given CPU,
there is little point in interrupting that CPU until the current
running task's timeslice expires, which is almost always way
longer than the time of the next scheduling-clock interrupt.
Better handling of these sorts of situations is future work.
o A reboot is required to reconfigure both adaptive idle and RCU
......@@ -268,6 +313,16 @@ o Unless all CPUs are idle, at least one CPU must keep the
scheduling-clock interrupt going in order to support accurate
timekeeping.
o If there are adaptive-ticks CPUs, there will be at least one
CPU keeping the scheduling-clock interrupt going, even if all
CPUs are otherwise idle.
o If there might potentially be some adaptive-ticks CPUs, there
will be at least one CPU keeping the scheduling-clock interrupt
going, even if all CPUs are otherwise idle.
Better handling of this situation is ongoing work.
o Some process-handling operations still require the occasional
scheduling-clock tick. These operations include calculating CPU
load, maintaining sched average, computing CFS entity vruntime,
computing avenrun, and carrying out load balancing. They are
currently accommodated by scheduling-clock tick every second
or so. On-going work will eliminate the need even for these
infrequent scheduling-clock ticks.
......@@ -1864,7 +1864,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
up_out:
up_read(&current->mm->mmap_sem);
goto out;
goto out_srcu;
}
int kvmppc_core_init_vm(struct kvm *kvm)
......
......@@ -128,7 +128,7 @@ extern void synchronize_irq(unsigned int irq);
# define synchronize_irq(irq) barrier()
#endif
#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
#if defined(CONFIG_TINY_RCU)
static inline void rcu_nmi_enter(void)
{
......
......@@ -216,6 +216,7 @@ static inline int rcu_preempt_depth(void)
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
/* Internal to kernel */
extern void rcu_init(void);
extern void rcu_sched_qs(int cpu);
extern void rcu_bh_qs(int cpu);
extern void rcu_check_callbacks(int cpu, int user);
......@@ -239,8 +240,6 @@ static inline void rcu_user_hooks_switch(struct task_struct *prev,
struct task_struct *next) { }
#endif /* CONFIG_RCU_USER_QS */
extern void exit_rcu(void);
/**
* RCU_NONIDLE - Indicate idle-loop code that needs RCU readers
* @a: Code that RCU needs to pay attention to.
......@@ -277,7 +276,7 @@ void wait_rcu_gp(call_rcu_func_t crf);
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
#include <linux/rcutree.h>
#elif defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
#elif defined(CONFIG_TINY_RCU)
#include <linux/rcutiny.h>
#else
#error "Unknown RCU implementation specified to kernel configuration"
......
......@@ -27,10 +27,6 @@
#include <linux/cache.h>
static inline void rcu_init(void)
{
}
static inline void rcu_barrier_bh(void)
{
wait_rcu_gp(call_rcu_bh);
......@@ -41,8 +37,6 @@ static inline void rcu_barrier_sched(void)
wait_rcu_gp(call_rcu_sched);
}
#ifdef CONFIG_TINY_RCU
static inline void synchronize_rcu_expedited(void)
{
synchronize_sched(); /* Only one CPU, so pretty fast anyway!!! */
......@@ -53,17 +47,6 @@ static inline void rcu_barrier(void)
rcu_barrier_sched(); /* Only one CPU, so only one list of callbacks! */
}
#else /* #ifdef CONFIG_TINY_RCU */
void synchronize_rcu_expedited(void);
static inline void rcu_barrier(void)
{
wait_rcu_gp(call_rcu);
}
#endif /* #else #ifdef CONFIG_TINY_RCU */
static inline void synchronize_rcu_bh(void)
{
synchronize_sched();
......@@ -85,35 +68,15 @@ static inline void kfree_call_rcu(struct rcu_head *head,
call_rcu(head, func);
}
#ifdef CONFIG_TINY_RCU
static inline void rcu_preempt_note_context_switch(void)
{
}
static inline int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
{
*delta_jiffies = ULONG_MAX;
return 0;
}
#else /* #ifdef CONFIG_TINY_RCU */
void rcu_preempt_note_context_switch(void);
int rcu_preempt_needs_cpu(void);
static inline int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies)
{
*delta_jiffies = ULONG_MAX;
return rcu_preempt_needs_cpu();
}
#endif /* #else #ifdef CONFIG_TINY_RCU */
static inline void rcu_note_context_switch(int cpu)
{
rcu_sched_qs(cpu);
rcu_preempt_note_context_switch();
}
/*
......@@ -156,6 +119,10 @@ static inline void rcu_cpu_stall_reset(void)
{
}
static inline void exit_rcu(void)
{
}
#ifdef CONFIG_DEBUG_LOCK_ALLOC
extern int rcu_scheduler_active __read_mostly;
extern void rcu_scheduler_starting(void);
......
......@@ -30,7 +30,6 @@
#ifndef __LINUX_RCUTREE_H
#define __LINUX_RCUTREE_H
extern void rcu_init(void);
extern void rcu_note_context_switch(int cpu);
extern int rcu_needs_cpu(int cpu, unsigned long *delta_jiffies);
extern void rcu_cpu_stall_reset(void);
......@@ -86,6 +85,8 @@ extern void rcu_force_quiescent_state(void);
extern void rcu_bh_force_quiescent_state(void);
extern void rcu_sched_force_quiescent_state(void);
extern void exit_rcu(void);
extern void rcu_scheduler_starting(void);
extern int rcu_scheduler_active __read_mostly;
......
......@@ -237,47 +237,4 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
__srcu_read_unlock(sp, idx);
}
/**
* srcu_read_lock_raw - register a new reader for an SRCU-protected structure.
* @sp: srcu_struct in which to register the new reader.
*
* Enter an SRCU read-side critical section. Similar to srcu_read_lock(),
* but avoids the RCU-lockdep checking. This means that it is legal to
* use srcu_read_lock_raw() in one context, for example, in an exception
* handler, and then have the matching srcu_read_unlock_raw() in another
* context, for example in the task that took the exception.
*
* However, the entire SRCU read-side critical section must reside within a
* single task. For example, beware of using srcu_read_lock_raw() in
* a device interrupt handler and srcu_read_unlock() in the interrupted
* task: This will not work if interrupts are threaded.
*/
static inline int srcu_read_lock_raw(struct srcu_struct *sp)
{
unsigned long flags;
int ret;
local_irq_save(flags);
ret = __srcu_read_lock(sp);
local_irq_restore(flags);
return ret;
}
/**
* srcu_read_unlock_raw - unregister reader from an SRCU-protected structure.
* @sp: srcu_struct in which to unregister the old reader.
* @idx: return value from corresponding srcu_read_lock_raw().
*
* Exit an SRCU read-side critical section without lockdep-RCU checking.
* See srcu_read_lock_raw() for more details.
*/
static inline void srcu_read_unlock_raw(struct srcu_struct *sp, int idx)
{
unsigned long flags;
local_irq_save(flags);
__srcu_read_unlock(sp, idx);
local_irq_restore(flags);
}
#endif
......@@ -459,18 +459,10 @@ config TINY_RCU
is not required. This option greatly reduces the
memory footprint of RCU.
config TINY_PREEMPT_RCU
bool "Preemptible UP-only small-memory-footprint RCU"
depends on PREEMPT && !SMP
help
This option selects the RCU implementation that is designed
for real-time UP systems. This option greatly reduces the
memory footprint of RCU.
endchoice
config PREEMPT_RCU
def_bool ( TREE_PREEMPT_RCU || TINY_PREEMPT_RCU )
def_bool TREE_PREEMPT_RCU
help
This option enables preemptible-RCU code that is common between
the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
......@@ -656,7 +648,7 @@ config RCU_BOOST_DELAY
Accept the default if unsure.
config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs (EXPERIMENTAL"
bool "Offload RCU callback processing from boot-selected CPUs"
depends on TREE_RCU || TREE_PREEMPT_RCU
default n
help
......@@ -682,9 +674,10 @@ choice
prompt "Build-forced no-CBs CPUs"
default RCU_NOCB_CPU_NONE
help
This option allows no-CBs CPUs to be specified at build time.
Additional no-CBs CPUs may be specified by the rcu_nocbs=
boot parameter.
This option allows no-CBs CPUs (whose RCU callbacks are invoked
from kthreads rather than from softirq context) to be specified
at build time. Additional no-CBs CPUs may be specified by
the rcu_nocbs= boot parameter.
config RCU_NOCB_CPU_NONE
bool "No build_forced no-CBs CPUs"
......@@ -692,25 +685,40 @@ config RCU_NOCB_CPU_NONE
help
This option does not force any of the CPUs to be no-CBs CPUs.
Only CPUs designated by the rcu_nocbs= boot parameter will be
no-CBs CPUs.
no-CBs CPUs, whose RCU callbacks will be invoked by per-CPU
kthreads whose names begin with "rcuo". All other CPUs will
invoke their own RCU callbacks in softirq context.
Select this option if you want to choose no-CBs CPUs at
boot time, for example, to allow testing of different no-CBs
configurations without having to rebuild the kernel each time.
config RCU_NOCB_CPU_ZERO
bool "CPU 0 is a build_forced no-CBs CPU"
depends on RCU_NOCB_CPU && !NO_HZ_FULL
help
This option forces CPU 0 to be a no-CBs CPU. Additional CPUs
may be designated as no-CBs CPUs using the rcu_nocbs= boot
parameter will be no-CBs CPUs.
This option forces CPU 0 to be a no-CBs CPU, so that its RCU
callbacks are invoked by a per-CPU kthread whose name begins
with "rcuo". Additional CPUs may be designated as no-CBs
CPUs using the rcu_nocbs= boot parameter will be no-CBs CPUs.
All other CPUs will invoke their own RCU callbacks in softirq
context.
Select this if CPU 0 needs to be a no-CBs CPU for real-time
or energy-efficiency reasons.
or energy-efficiency reasons, but the real reason it exists
is to ensure that randconfig testing covers mixed systems.
config RCU_NOCB_CPU_ALL
bool "All CPUs are build_forced no-CBs CPUs"
depends on RCU_NOCB_CPU
help
This option forces all CPUs to be no-CBs CPUs. The rcu_nocbs=
boot parameter will be ignored.
boot parameter will be ignored. All CPUs' RCU callbacks will
be executed in the context of per-CPU rcuo kthreads created for
this purpose. Assuming that the kthreads whose names start with
"rcuo" are bound to "housekeeping" CPUs, this reduces OS jitter
on the remaining CPUs, but might decrease memory locality during
RCU-callback invocation, thus potentially degrading throughput.
Select this if all CPUs need to be no-CBs CPUs for real-time
or energy-efficiency reasons.
......
......@@ -104,31 +104,7 @@ void __rcu_read_unlock(void)
}
EXPORT_SYMBOL_GPL(__rcu_read_unlock);
/*
* Check for a task exiting while in a preemptible-RCU read-side
* critical section, clean up if so. No need to issue warnings,
* as debug_check_no_locks_held() already does this if lockdep
* is enabled.
*/
void exit_rcu(void)
{
struct task_struct *t = current;
if (likely(list_empty(&current->rcu_node_entry)))
return;
t->rcu_read_lock_nesting = 1;
barrier();
t->rcu_read_unlock_special = RCU_READ_UNLOCK_BLOCKED;
__rcu_read_unlock();
}
#else /* #ifdef CONFIG_PREEMPT_RCU */
void exit_rcu(void)
{
}
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
#endif /* #ifdef CONFIG_PREEMPT_RCU */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
static struct lock_class_key rcu_lock_key;
......@@ -145,9 +121,6 @@ static struct lock_class_key rcu_sched_lock_key;
struct lockdep_map rcu_sched_lock_map =
STATIC_LOCKDEP_MAP_INIT("rcu_read_lock_sched", &rcu_sched_lock_key);
EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
int debug_lockdep_rcu_enabled(void)
{
......
......@@ -44,7 +44,6 @@
/* Forward declarations for rcutiny_plugin.h. */
struct rcu_ctrlblk;
static void invoke_rcu_callbacks(void);
static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp);
static void rcu_process_callbacks(struct softirq_action *unused);
static void __call_rcu(struct rcu_head *head,
......@@ -205,7 +204,7 @@ static int rcu_is_cpu_rrupt_from_idle(void)
*/
static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
{
reset_cpu_stall_ticks(rcp);
RCU_TRACE(reset_cpu_stall_ticks(rcp));
if (rcp->rcucblist != NULL &&
rcp->donetail != rcp->curtail) {
rcp->donetail = rcp->curtail;
......@@ -227,7 +226,7 @@ void rcu_sched_qs(int cpu)
local_irq_save(flags);
if (rcu_qsctr_help(&rcu_sched_ctrlblk) +
rcu_qsctr_help(&rcu_bh_ctrlblk))
invoke_rcu_callbacks();
raise_softirq(RCU_SOFTIRQ);
local_irq_restore(flags);
}
......@@ -240,7 +239,7 @@ void rcu_bh_qs(int cpu)
local_irq_save(flags);
if (rcu_qsctr_help(&rcu_bh_ctrlblk))
invoke_rcu_callbacks();
raise_softirq(RCU_SOFTIRQ);
local_irq_restore(flags);
}
......@@ -252,12 +251,11 @@ void rcu_bh_qs(int cpu)
*/
void rcu_check_callbacks(int cpu, int user)
{
check_cpu_stalls();
RCU_TRACE(check_cpu_stalls());
if (user || rcu_is_cpu_rrupt_from_idle())
rcu_sched_qs(cpu);
else if (!in_softirq())
rcu_bh_qs(cpu);
rcu_preempt_check_callbacks();
}
/*
......@@ -278,7 +276,7 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
ACCESS_ONCE(rcp->rcucblist),
need_resched(),
is_idle_task(current),
rcu_is_callbacks_kthread()));
false));
return;
}
......@@ -290,7 +288,6 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
*rcp->donetail = NULL;
if (rcp->curtail == rcp->donetail)
rcp->curtail = &rcp->rcucblist;
rcu_preempt_remove_callbacks(rcp);
rcp->donetail = &rcp->rcucblist;
local_irq_restore(flags);
......@@ -309,14 +306,13 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
RCU_TRACE(rcu_trace_sub_qlen(rcp, cb_count));
RCU_TRACE(trace_rcu_batch_end(rcp->name, cb_count, 0, need_resched(),
is_idle_task(current),
rcu_is_callbacks_kthread()));
false));
}
static void rcu_process_callbacks(struct softirq_action *unused)
{
__rcu_process_callbacks(&rcu_sched_ctrlblk);
__rcu_process_callbacks(&rcu_bh_ctrlblk);
rcu_preempt_process_callbacks();
}
/*
......@@ -382,3 +378,8 @@ void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
__call_rcu(head, func, &rcu_bh_ctrlblk);
}
EXPORT_SYMBOL_GPL(call_rcu_bh);
void rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
}
This diff is collapsed.
......@@ -695,44 +695,6 @@ static struct rcu_torture_ops srcu_sync_ops = {
.name = "srcu_sync"
};
static int srcu_torture_read_lock_raw(void) __acquires(&srcu_ctl)
{
return srcu_read_lock_raw(&srcu_ctl);
}
static void srcu_torture_read_unlock_raw(int idx) __releases(&srcu_ctl)
{
srcu_read_unlock_raw(&srcu_ctl, idx);
}
static struct rcu_torture_ops srcu_raw_ops = {
.init = rcu_sync_torture_init,
.readlock = srcu_torture_read_lock_raw,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock_raw,
.completed = srcu_torture_completed,
.deferred_free = srcu_torture_deferred_free,
.sync = srcu_torture_synchronize,
.call = NULL,
.cb_barrier = NULL,
.stats = srcu_torture_stats,
.name = "srcu_raw"
};
static struct rcu_torture_ops srcu_raw_sync_ops = {
.init = rcu_sync_torture_init,
.readlock = srcu_torture_read_lock_raw,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock_raw,
.completed = srcu_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.sync = srcu_torture_synchronize,
.call = NULL,
.cb_barrier = NULL,
.stats = srcu_torture_stats,
.name = "srcu_raw_sync"
};
static void srcu_torture_synchronize_expedited(void)
{
synchronize_srcu_expedited(&srcu_ctl);
......@@ -1983,7 +1945,6 @@ rcu_torture_init(void)
{ &rcu_ops, &rcu_sync_ops, &rcu_expedited_ops,
&rcu_bh_ops, &rcu_bh_sync_ops, &rcu_bh_expedited_ops,
&srcu_ops, &srcu_sync_ops, &srcu_expedited_ops,
&srcu_raw_ops, &srcu_raw_sync_ops,
&sched_ops, &sched_sync_ops, &sched_expedited_ops, };
mutex_lock(&fullstop_mutex);
......
......@@ -218,8 +218,8 @@ module_param(blimit, long, 0444);
module_param(qhimark, long, 0444);
module_param(qlowmark, long, 0444);
static ulong jiffies_till_first_fqs = RCU_JIFFIES_TILL_FORCE_QS;
static ulong jiffies_till_next_fqs = RCU_JIFFIES_TILL_FORCE_QS;
static ulong jiffies_till_first_fqs = ULONG_MAX;
static ulong jiffies_till_next_fqs = ULONG_MAX;
module_param(jiffies_till_first_fqs, ulong, 0644);
module_param(jiffies_till_next_fqs, ulong, 0644);
......@@ -3171,11 +3171,25 @@ static void __init rcu_init_one(struct rcu_state *rsp,
*/
static void __init rcu_init_geometry(void)
{
ulong d;
int i;
int j;
int n = nr_cpu_ids;
int rcu_capacity[MAX_RCU_LVLS + 1];
/*
* Initialize any unspecified boot parameters.
* The default values of jiffies_till_first_fqs and
* jiffies_till_next_fqs are set to the RCU_JIFFIES_TILL_FORCE_QS
* value, which is a function of HZ, then adding one for each
* RCU_JIFFIES_FQS_DIV CPUs that might be on the system.
*/
d = RCU_JIFFIES_TILL_FORCE_QS + nr_cpu_ids / RCU_JIFFIES_FQS_DIV;
if (jiffies_till_first_fqs == ULONG_MAX)
jiffies_till_first_fqs = d;
if (jiffies_till_next_fqs == ULONG_MAX)
jiffies_till_next_fqs = d;
/* If the compile-time values are accurate, just leave. */
if (rcu_fanout_leaf == CONFIG_RCU_FANOUT_LEAF &&
nr_cpu_ids == NR_CPUS)
......
......@@ -343,12 +343,17 @@ struct rcu_data {
#define RCU_FORCE_QS 3 /* Need to force quiescent state. */
#define RCU_SIGNAL_INIT RCU_SAVE_DYNTICK
#define RCU_JIFFIES_TILL_FORCE_QS 3 /* for rsp->jiffies_force_qs */
#define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
/* For jiffies_till_first_fqs and */
/* and jiffies_till_next_fqs. */
#define RCU_STALL_RAT_DELAY 2 /* Allow other CPUs time */
/* to take at least one */
/* scheduling clock irq */
/* before ratting on them. */
#define RCU_JIFFIES_FQS_DIV 256 /* Very large systems need more */
/* delay between bouts of */
/* quiescent-state forcing. */
#define RCU_STALL_RAT_DELAY 2 /* Allow other CPUs time to take */
/* at least one scheduling clock */
/* irq before ratting on them. */
#define rcu_wait(cond) \
do { \
......
......@@ -81,7 +81,7 @@ static void __init rcu_bootup_announce_oddness(void)
pr_info("\tFour-level hierarchy is enabled.\n");
#endif
if (rcu_fanout_leaf != CONFIG_RCU_FANOUT_LEAF)
pr_info("\tExperimental boot-time adjustment of leaf fanout to %d.\n", rcu_fanout_leaf);
pr_info("\tBoot-time adjustment of leaf fanout to %d.\n", rcu_fanout_leaf);
if (nr_cpu_ids != NR_CPUS)
pr_info("\tRCU restricting CPUs from NR_CPUS=%d to nr_cpu_ids=%d.\n", NR_CPUS, nr_cpu_ids);
#ifdef CONFIG_RCU_NOCB_CPU
......@@ -91,19 +91,19 @@ static void __init rcu_bootup_announce_oddness(void)
have_rcu_nocb_mask = true;
}
#ifdef CONFIG_RCU_NOCB_CPU_ZERO
pr_info("\tExperimental no-CBs CPU 0\n");
pr_info("\tOffload RCU callbacks from CPU 0\n");
cpumask_set_cpu(0, rcu_nocb_mask);
#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ZERO */
#ifdef CONFIG_RCU_NOCB_CPU_ALL
pr_info("\tExperimental no-CBs for all CPUs\n");
pr_info("\tOffload RCU callbacks from all CPUs\n");
cpumask_setall(rcu_nocb_mask);
#endif /* #ifdef CONFIG_RCU_NOCB_CPU_ALL */
#endif /* #ifndef CONFIG_RCU_NOCB_CPU_NONE */
if (have_rcu_nocb_mask) {
cpulist_scnprintf(nocb_buf, sizeof(nocb_buf), rcu_nocb_mask);
pr_info("\tExperimental no-CBs CPUs: %s.\n", nocb_buf);
pr_info("\tOffload RCU callbacks from CPUs: %s.\n", nocb_buf);
if (rcu_nocb_poll)
pr_info("\tExperimental polled no-CBs CPUs.\n");
pr_info("\tPoll for callbacks from no-CBs CPUs.\n");
}
#endif /* #ifdef CONFIG_RCU_NOCB_CPU */
}
......@@ -932,6 +932,24 @@ static void __init __rcu_init_preempt(void)
rcu_init_one(&rcu_preempt_state, &rcu_preempt_data);
}
/*
* Check for a task exiting while in a preemptible-RCU read-side
* critical section, clean up if so. No need to issue warnings,
* as debug_check_no_locks_held() already does this if lockdep
* is enabled.
*/
void exit_rcu(void)
{
struct task_struct *t = current;
if (likely(list_empty(&current->rcu_node_entry)))
return;
t->rcu_read_lock_nesting = 1;
barrier();
t->rcu_read_unlock_special = RCU_READ_UNLOCK_BLOCKED;
__rcu_read_unlock();
}
#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
static struct rcu_state *rcu_state = &rcu_sched_state;
......@@ -1100,6 +1118,14 @@ static void __init __rcu_init_preempt(void)
{
}
/*
* Because preemptible RCU does not exist, tasks cannot possibly exit
* while in preemptible RCU read-side critical sections.
*/
void exit_rcu(void)
{
}
#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
#ifdef CONFIG_RCU_BOOST
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment