Commit 41eea65e authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU changes from Ingo Molnar:

 - Debugging for smp_call_function()

 - RT raw/non-raw lock ordering fixes

 - Strict grace periods for KASAN

 - New smp_call_function() torture test

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

[ This doesn't actually pull the tag - I've dropped the last merge from
  the RCU branch due to questions about the series.   - Linus ]

* tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  smp: Make symbol 'csd_bug_count' static
  kernel/smp: Provide CSD lock timeout diagnostics
  smp: Add source and destination CPUs to __call_single_data
  rcu: Shrink each possible cpu krcp
  rcu/segcblist: Prevent useless GP start if no CBs to accelerate
  torture: Add gdb support
  rcutorture: Allow pointer leaks to test diagnostic code
  rcutorture: Hoist OOM registry up one level
  refperf: Avoid null pointer dereference when buf fails to allocate
  rcutorture: Properly synchronize with OOM notifier
  rcutorture: Properly set rcu_fwds for OOM handling
  torture: Add kvm.sh --help and update help message
  rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
  torture: Update initrd documentation
  rcutorture: Replace HTTP links with HTTPS ones
  locktorture: Make function torture_percpu_rwsem_init() static
  torture: document --allcpus argument added to the kvm.sh script
  rcutorture: Output number of elapsed grace periods
  rcutorture: Remove KCSAN stubs
  rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
  ...
parents 373014bb b36c830f
......@@ -963,7 +963,7 @@ exit and perhaps also vice versa. Therefore, whenever the
``->dynticks_nesting`` field is incremented up from zero, the
``->dynticks_nmi_nesting`` field is set to a large positive number, and
whenever the ``->dynticks_nesting`` field is decremented down to zero,
the the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
the ``->dynticks_nmi_nesting`` field is set to zero. Assuming that
the number of misnested interrupts is not sufficient to overflow the
counter, this approach corrects the ``->dynticks_nmi_nesting`` field
every time the corresponding CPU enters the idle loop from process
......
......@@ -2162,7 +2162,7 @@ scheduling-clock interrupt be enabled when RCU needs it to be:
this sort of thing.
#. If a CPU is in a portion of the kernel that is absolutely positively
no-joking guaranteed to never execute any RCU read-side critical
sections, and RCU believes this CPU to to be idle, no problem. This
sections, and RCU believes this CPU to be idle, no problem. This
sort of thing is used by some architectures for light-weight
exception handlers, which can then avoid the overhead of
``rcu_irq_enter()`` and ``rcu_irq_exit()`` at exception entry and
......@@ -2431,7 +2431,7 @@ However, there are legitimate preemptible-RCU implementations that do
not have this property, given that any point in the code outside of an
RCU read-side critical section can be a quiescent state. Therefore,
*RCU-sched* was created, which follows “classic” RCU in that an
RCU-sched grace period waits for for pre-existing interrupt and NMI
RCU-sched grace period waits for pre-existing interrupt and NMI
handlers. In kernels built with ``CONFIG_PREEMPT=n``, the RCU and
RCU-sched APIs have identical implementations, while kernels built with
``CONFIG_PREEMPT=y`` provide a separate implementation for each.
......
......@@ -360,7 +360,7 @@ order to amortize their overhead over many uses of the corresponding APIs.
There are at least three flavors of RCU usage in the Linux kernel. The diagram
above shows the most common one. On the updater side, the rcu_assign_pointer(),
sychronize_rcu() and call_rcu() primitives used are the same for all three
synchronize_rcu() and call_rcu() primitives used are the same for all three
flavors. However for protection (on the reader side), the primitives used vary
depending on the flavor:
......
......@@ -3095,6 +3095,10 @@
and gids from such clients. This is intended to ease
migration from NFSv2/v3.
nmi_backtrace.backtrace_idle [KNL]
Dump stacks even of idle CPUs in response to an
NMI stack-backtrace request.
nmi_debug= [KNL,SH] Specify one or more actions to take
when a NMI is triggered.
Format: [state][,regs][,debounce][,die]
......@@ -4174,46 +4178,55 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
rcutree.rcu_unlock_delay= [KNL]
In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels,
this specifies an rcu_read_unlock()-time delay
in microseconds. This defaults to zero.
Larger delays increase the probability of
catching RCU pointer leaks, that is, buggy use
of RCU-protected pointers after the relevant
rcu_read_unlock() has completed.
rcutree.sysrq_rcu= [KNL]
Commandeer a sysrq key to dump out Tree RCU's
rcu_node tree with an eye towards determining
why a new grace period has not yet started.
rcuperf.gp_async= [KNL]
rcuscale.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
rcuperf.gp_async_max= [KNL]
rcuscale.gp_async_max= [KNL]
Specify the maximum number of outstanding
callbacks per writer thread. When a writer
thread exceeds this limit, it invokes the
corresponding flavor of rcu_barrier() to allow
previously posted callbacks to drain.
rcuperf.gp_exp= [KNL]
rcuscale.gp_exp= [KNL]
Measure performance of expedited synchronous
grace-period primitives.
rcuperf.holdoff= [KNL]
rcuscale.holdoff= [KNL]
Set test-start holdoff period. The purpose of
this parameter is to delay the start of the
test until boot completes in order to avoid
interference.
rcuperf.kfree_rcu_test= [KNL]
rcuscale.kfree_rcu_test= [KNL]
Set to measure performance of kfree_rcu() flooding.
rcuperf.kfree_nthreads= [KNL]
rcuscale.kfree_nthreads= [KNL]
The number of threads running loops of kfree_rcu().
rcuperf.kfree_alloc_num= [KNL]
rcuscale.kfree_alloc_num= [KNL]
Number of allocations and frees done in an iteration.
rcuperf.kfree_loops= [KNL]
Number of loops doing rcuperf.kfree_alloc_num number
rcuscale.kfree_loops= [KNL]
Number of loops doing rcuscale.kfree_alloc_num number
of allocations and frees.
rcuperf.nreaders= [KNL]
rcuscale.nreaders= [KNL]
Set number of RCU readers. The value -1 selects
N, where N is the number of CPUs. A value
"n" less than -1 selects N-n+1, where N is again
......@@ -4222,23 +4235,23 @@
A value of "n" less than or equal to -N selects
a single reader.
rcuperf.nwriters= [KNL]
rcuscale.nwriters= [KNL]
Set number of RCU writers. The values operate
the same as for rcuperf.nreaders.
the same as for rcuscale.nreaders.
N, where N is the number of CPUs
rcuperf.perf_type= [KNL]
rcuscale.perf_type= [KNL]
Specify the RCU implementation to test.
rcuperf.shutdown= [KNL]
rcuscale.shutdown= [KNL]
Shut the system down after performance tests
complete. This is useful for hands-off automated
testing.
rcuperf.verbose= [KNL]
rcuscale.verbose= [KNL]
Enable additional printk() statements.
rcuperf.writer_holdoff= [KNL]
rcuscale.writer_holdoff= [KNL]
Write-side holdoff between grace periods,
in microseconds. The default of zero says
no holdoff.
......@@ -4291,6 +4304,18 @@
are zero, rcutorture acts as if is interpreted
they are all non-zero.
rcutorture.irqreader= [KNL]
Run RCU readers from irq handlers, or, more
accurately, from a timer handler. Not all RCU
flavors take kindly to this sort of thing.
rcutorture.leakpointer= [KNL]
Leak an RCU-protected pointer out of the reader.
This can of course result in splats, and is
intended to test the ability of things like
CONFIG_RCU_STRICT_GRACE_PERIOD=y to detect
such leaks.
rcutorture.n_barrier_cbs= [KNL]
Set callbacks/threads for rcu_barrier() testing.
......@@ -4512,8 +4537,8 @@
refscale.shutdown= [KNL]
Shut down the system at the end of the performance
test. This defaults to 1 (shut it down) when
rcuperf is built into the kernel and to 0 (leave
it running) when rcuperf is built as a module.
refscale is built into the kernel and to 0 (leave
it running) when refscale is built as a module.
refscale.verbose= [KNL]
Enable additional printk() statements.
......@@ -4659,6 +4684,98 @@
Format: integer between 0 and 10
Default is 0.
scftorture.holdoff= [KNL]
Number of seconds to hold off before starting
test. Defaults to zero for module insertion and
to 10 seconds for built-in smp_call_function()
tests.
scftorture.longwait= [KNL]
Request ridiculously long waits randomly selected
up to the chosen limit in seconds. Zero (the
default) disables this feature. Please note
that requesting even small non-zero numbers of
seconds can result in RCU CPU stall warnings,
softlockup complaints, and so on.
scftorture.nthreads= [KNL]
Number of kthreads to spawn to invoke the
smp_call_function() family of functions.
The default of -1 specifies a number of kthreads
equal to the number of CPUs.
scftorture.onoff_holdoff= [KNL]
Number seconds to wait after the start of the
test before initiating CPU-hotplug operations.
scftorture.onoff_interval= [KNL]
Number seconds to wait between successive
CPU-hotplug operations. Specifying zero (which
is the default) disables CPU-hotplug operations.
scftorture.shutdown_secs= [KNL]
The number of seconds following the start of the
test after which to shut down the system. The
default of zero avoids shutting down the system.
Non-zero values are useful for automated tests.
scftorture.stat_interval= [KNL]
The number of seconds between outputting the
current test statistics to the console. A value
of zero disables statistics output.
scftorture.stutter_cpus= [KNL]
The number of jiffies to wait between each change
to the set of CPUs under test.
scftorture.use_cpus_read_lock= [KNL]
Use use_cpus_read_lock() instead of the default
preempt_disable() to disable CPU hotplug
while invoking one of the smp_call_function*()
functions.
scftorture.verbose= [KNL]
Enable additional printk() statements.
scftorture.weight_single= [KNL]
The probability weighting to use for the
smp_call_function_single() function with a zero
"wait" parameter. A value of -1 selects the
default if all other weights are -1. However,
if at least one weight has some other value, a
value of -1 will instead select a weight of zero.
scftorture.weight_single_wait= [KNL]
The probability weighting to use for the
smp_call_function_single() function with a
non-zero "wait" parameter. See weight_single.
scftorture.weight_many= [KNL]
The probability weighting to use for the
smp_call_function_many() function with a zero
"wait" parameter. See weight_single.
Note well that setting a high probability for
this weighting can place serious IPI load
on the system.
scftorture.weight_many_wait= [KNL]
The probability weighting to use for the
smp_call_function_many() function with a
non-zero "wait" parameter. See weight_single
and weight_many.
scftorture.weight_all= [KNL]
The probability weighting to use for the
smp_call_function_all() function with a zero
"wait" parameter. See weight_single and
weight_many.
scftorture.weight_all_wait= [KNL]
The probability weighting to use for the
smp_call_function_all() function with a
non-zero "wait" parameter. See weight_single
and weight_many.
skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
......
......@@ -17672,8 +17672,9 @@ S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev
F: Documentation/RCU/torture.rst
F: kernel/locking/locktorture.c
F: kernel/rcu/rcuperf.c
F: kernel/rcu/rcuscale.c
F: kernel/rcu/rcutorture.c
F: kernel/rcu/refscale.c
F: kernel/torture.c
TOSHIBA ACPI EXTRAS DRIVER
......
......@@ -229,7 +229,8 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
return;
idx = srcu_read_lock(&head->track_srcu);
hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
srcu_read_lock_held(&head->track_srcu))
if (n->track_write)
n->track_write(vcpu, gpa, new, bytes, n);
srcu_read_unlock(&head->track_srcu, idx);
......@@ -254,7 +255,8 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
return;
idx = srcu_read_lock(&head->track_srcu);
hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
srcu_read_lock_held(&head->track_srcu))
if (n->track_flush_slot)
n->track_flush_slot(kvm, slot, n);
srcu_read_unlock(&head->track_srcu, idx);
......
......@@ -63,9 +63,17 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
RCU_LOCKDEP_WARN(!(cond) && !rcu_read_lock_any_held(), \
"RCU-list traversed in non-reader section!"); \
})
#define __list_check_srcu(cond) \
({ \
RCU_LOCKDEP_WARN(!(cond), \
"RCU-list traversed without holding the required lock!");\
})
#else
#define __list_check_rcu(dummy, cond, extra...) \
({ check_arg_count_one(extra); })
#define __list_check_srcu(cond) ({ })
#endif
/*
......@@ -385,6 +393,25 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
/**
* list_for_each_entry_srcu - iterate over rcu list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the list_head within the struct.
* @cond: lockdep expression for the lock required to traverse the list.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as list_add_rcu()
* as long as the traversal is guarded by srcu_read_lock().
* The lockdep expression srcu_read_lock_held() can be passed as the
* cond argument from read side.
*/
#define list_for_each_entry_srcu(pos, head, member, cond) \
for (__list_check_srcu(cond), \
pos = list_entry_rcu((head)->next, typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
/**
* list_entry_lockless - get the struct for this entry
* @ptr: the &struct list_head pointer.
......@@ -683,6 +710,27 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
&(pos)->member)), typeof(*(pos)), member))
/**
* hlist_for_each_entry_srcu - iterate over rcu list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond: lockdep expression for the lock required to traverse the list.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by srcu_read_lock().
* The lockdep expression srcu_read_lock_held() can be passed as the
* cond argument from read side.
*/
#define hlist_for_each_entry_srcu(pos, head, member, cond) \
for (__list_check_srcu(cond), \
pos = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),\
typeof(*(pos)), member); \
pos; \
pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
&(pos)->member)), typeof(*(pos)), member))
/**
* hlist_for_each_entry_rcu_notrace - iterate over rcu list of given type (for tracing)
* @pos: the type * to use as a loop cursor.
......
......@@ -55,6 +55,12 @@ void __rcu_read_unlock(void);
#else /* #ifdef CONFIG_PREEMPT_RCU */
#ifdef CONFIG_TINY_RCU
#define rcu_read_unlock_strict() do { } while (0)
#else
void rcu_read_unlock_strict(void);
#endif
static inline void __rcu_read_lock(void)
{
preempt_disable();
......@@ -63,6 +69,7 @@ static inline void __rcu_read_lock(void)
static inline void __rcu_read_unlock(void)
{
preempt_enable();
rcu_read_unlock_strict();
}
static inline int rcu_preempt_depth(void)
......@@ -709,8 +716,8 @@ static inline void rcu_read_lock_bh(void)
"rcu_read_lock_bh() used illegally while idle");
}
/*
* rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section
/**
* rcu_read_unlock_bh() - marks the end of a softirq-only RCU critical section
*
* See rcu_read_lock_bh() for more information.
*/
......@@ -751,10 +758,10 @@ static inline notrace void rcu_read_lock_sched_notrace(void)
__acquire(RCU_SCHED);
}
/*
* rcu_read_unlock_sched - marks the end of a RCU-classic critical section
/**
* rcu_read_unlock_sched() - marks the end of a RCU-classic critical section
*
* See rcu_read_lock_sched for more information.
* See rcu_read_lock_sched() for more information.
*/
static inline void rcu_read_unlock_sched(void)
{
......@@ -945,7 +952,7 @@ static inline void rcu_head_init(struct rcu_head *rhp)
}
/**
* rcu_head_after_call_rcu - Has this rcu_head been passed to call_rcu()?
* rcu_head_after_call_rcu() - Has this rcu_head been passed to call_rcu()?
* @rhp: The rcu_head structure to test.
* @f: The function passed to call_rcu() along with @rhp.
*
......
......@@ -103,7 +103,6 @@ static inline void rcu_scheduler_starting(void) { }
static inline void rcu_end_inkernel_boot(void) { }
static inline bool rcu_inkernel_boot_has_ended(void) { return true; }
static inline bool rcu_is_watching(void) { return true; }
static inline bool __rcu_is_watching(void) { return true; }
static inline void rcu_momentary_dyntick_idle(void) { }
static inline void kfree_rcu_scheduler_running(void) { }
static inline bool rcu_gp_might_be_stalled(void) { return false; }
......
......@@ -64,7 +64,6 @@ extern int rcu_scheduler_active __read_mostly;
void rcu_end_inkernel_boot(void);
bool rcu_inkernel_boot_has_ended(void);
bool rcu_is_watching(void);
bool __rcu_is_watching(void);
#ifndef CONFIG_PREEMPTION
void rcu_all_qs(void);
#endif
......
......@@ -26,6 +26,9 @@ struct __call_single_data {
struct {
struct llist_node llist;
unsigned int flags;
#ifdef CONFIG_64BIT
u16 src, dst;
#endif
};
};
smp_call_func_t func;
......
......@@ -61,6 +61,9 @@ struct __call_single_node {
unsigned int u_flags;
atomic_t a_flags;
};
#ifdef CONFIG_64BIT
u16 src, dst;
#endif
};
#endif /* __LINUX_SMP_TYPES_H */
......@@ -74,17 +74,17 @@ TRACE_EVENT_RCU(rcu_grace_period,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gp_seq)
__field(long, gp_seq)
__field(const char *, gpevent)
),
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq = gp_seq;
__entry->gp_seq = (long)gp_seq;
__entry->gpevent = gpevent;
),
TP_printk("%s %lu %s",
TP_printk("%s %ld %s",
__entry->rcuname, __entry->gp_seq, __entry->gpevent)
);
......@@ -114,8 +114,8 @@ TRACE_EVENT_RCU(rcu_future_grace_period,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gp_seq)
__field(unsigned long, gp_seq_req)
__field(long, gp_seq)
__field(long, gp_seq_req)
__field(u8, level)
__field(int, grplo)
__field(int, grphi)
......@@ -124,16 +124,16 @@ TRACE_EVENT_RCU(rcu_future_grace_period,
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq = gp_seq;
__entry->gp_seq_req = gp_seq_req;
__entry->gp_seq = (long)gp_seq;
__entry->gp_seq_req = (long)gp_seq_req;
__entry->level = level;
__entry->grplo = grplo;
__entry->grphi = grphi;
__entry->gpevent = gpevent;
),
TP_printk("%s %lu %lu %u %d %d %s",
__entry->rcuname, __entry->gp_seq, __entry->gp_seq_req, __entry->level,
TP_printk("%s %ld %ld %u %d %d %s",
__entry->rcuname, (long)__entry->gp_seq, (long)__entry->gp_seq_req, __entry->level,
__entry->grplo, __entry->grphi, __entry->gpevent)
);
......@@ -153,7 +153,7 @@ TRACE_EVENT_RCU(rcu_grace_period_init,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gp_seq)
__field(long, gp_seq)
__field(u8, level)
__field(int, grplo)
__field(int, grphi)
......@@ -162,14 +162,14 @@ TRACE_EVENT_RCU(rcu_grace_period_init,
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq = gp_seq;
__entry->gp_seq = (long)gp_seq;
__entry->level = level;
__entry->grplo = grplo;
__entry->grphi = grphi;
__entry->qsmask = qsmask;
),
TP_printk("%s %lu %u %d %d %lx",
TP_printk("%s %ld %u %d %d %lx",
__entry->rcuname, __entry->gp_seq, __entry->level,
__entry->grplo, __entry->grphi, __entry->qsmask)
);
......@@ -197,17 +197,17 @@ TRACE_EVENT_RCU(rcu_exp_grace_period,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gpseq)
__field(long, gpseq)
__field(const char *, gpevent)
),
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gpseq = gpseq;
__entry->gpseq = (long)gpseq;
__entry->gpevent = gpevent;
),
TP_printk("%s %lu %s",
TP_printk("%s %ld %s",
__entry->rcuname, __entry->gpseq, __entry->gpevent)
);
......@@ -316,17 +316,17 @@ TRACE_EVENT_RCU(rcu_preempt_task,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gp_seq)
__field(long, gp_seq)
__field(int, pid)
),
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq = gp_seq;
__entry->gp_seq = (long)gp_seq;
__entry->pid = pid;
),
TP_printk("%s %lu %d",
TP_printk("%s %ld %d",
__entry->rcuname, __entry->gp_seq, __entry->pid)
);
......@@ -343,17 +343,17 @@ TRACE_EVENT_RCU(rcu_unlock_preempted_task,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gp_seq)
__field(long, gp_seq)
__field(int, pid)
),
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq = gp_seq;
__entry->gp_seq = (long)gp_seq;
__entry->pid = pid;
),
TP_printk("%s %lu %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
TP_printk("%s %ld %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
);
/*
......@@ -374,7 +374,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gp_seq)
__field(long, gp_seq)
__field(unsigned long, mask)
__field(unsigned long, qsmask)
__field(u8, level)
......@@ -385,7 +385,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq = gp_seq;
__entry->gp_seq = (long)gp_seq;
__entry->mask = mask;
__entry->qsmask = qsmask;
__entry->level = level;
......@@ -394,7 +394,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
__entry->gp_tasks = gp_tasks;
),
TP_printk("%s %lu %lx>%lx %u %d %d %u",
TP_printk("%s %ld %lx>%lx %u %d %d %u",
__entry->rcuname, __entry->gp_seq,
__entry->mask, __entry->qsmask, __entry->level,
__entry->grplo, __entry->grphi, __entry->gp_tasks)
......@@ -415,19 +415,19 @@ TRACE_EVENT_RCU(rcu_fqs,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(unsigned long, gp_seq)
__field(long, gp_seq)
__field(int, cpu)
__field(const char *, qsevent)
),
TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq = gp_seq;
__entry->gp_seq = (long)gp_seq;
__entry->cpu = cpu;
__entry->qsevent = qsevent;
),
TP_printk("%s %lu %d %s",
TP_printk("%s %ld %d %s",
__entry->rcuname, __entry->gp_seq,
__entry->cpu, __entry->qsevent)
);
......
......@@ -134,6 +134,8 @@ KASAN_SANITIZE_stackleak.o := n
KCSAN_SANITIZE_stackleak.o := n
KCOV_INSTRUMENT_stackleak.o := n
obj-$(CONFIG_SCF_TORTURE_TEST) += scftorture.o
$(obj)/configs.o: $(obj)/config_data.gz
targets += config_data.gz
......
......@@ -304,7 +304,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
* terminate a grace period, if and only if the timer interrupt is
* not nested into another interrupt.
*
* Checking for __rcu_is_watching() here would prevent the nesting
* Checking for rcu_is_watching() here would prevent the nesting
* interrupt to invoke rcu_irq_enter(). If that nested interrupt is
* the tick then rcu_flavor_sched_clock_irq() would wrongfully
* assume that it is the first interupt and eventually claim
......
......@@ -566,7 +566,7 @@ static struct lock_torture_ops rwsem_lock_ops = {
#include <linux/percpu-rwsem.h>
static struct percpu_rw_semaphore pcpu_rwsem;
void torture_percpu_rwsem_init(void)
static void torture_percpu_rwsem_init(void)
{
BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
}
......
......@@ -135,10 +135,12 @@ config RCU_FANOUT
config RCU_FANOUT_LEAF
int "Tree-based hierarchical RCU leaf-level fanout value"
range 2 64 if 64BIT
range 2 32 if !64BIT
range 2 64 if 64BIT && !RCU_STRICT_GRACE_PERIOD
range 2 32 if !64BIT && !RCU_STRICT_GRACE_PERIOD
range 2 3 if RCU_STRICT_GRACE_PERIOD
depends on TREE_RCU && RCU_EXPERT
default 16
default 16 if !RCU_STRICT_GRACE_PERIOD
default 2 if RCU_STRICT_GRACE_PERIOD
help
This option controls the leaf-level fanout of hierarchical
implementations of RCU, and allows trading off cache misses
......
......@@ -23,7 +23,7 @@ config TORTURE_TEST
tristate
default n
config RCU_PERF_TEST
config RCU_SCALE_TEST
tristate "performance tests for RCU"
depends on DEBUG_KERNEL
select TORTURE_TEST
......@@ -114,4 +114,19 @@ config RCU_EQS_DEBUG
Say N here if you need ultimate kernel/user switch latencies
Say Y if you are unsure
config RCU_STRICT_GRACE_PERIOD
bool "Provide debug RCU implementation with short grace periods"
depends on DEBUG_KERNEL && RCU_EXPERT
default n
select PREEMPT_COUNT if PREEMPT=n
help
Select this option to build an RCU variant that is strict about
grace periods, making them as short as it can. This limits
scalability, destroys real-time response, degrades battery
lifetime and kills performance. Don't try this on large
machines, as in systems with more than about 10 or 20 CPUs.
But in conjunction with tools like KASAN, it can be helpful
when looking for certain types of RCU usage bugs, for example,
too-short RCU read-side critical sections.
endmenu # "RCU Debugging"
......@@ -11,7 +11,7 @@ obj-y += update.o sync.o
obj-$(CONFIG_TREE_SRCU) += srcutree.o
obj-$(CONFIG_TINY_SRCU) += srcutiny.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o
obj-$(CONFIG_RCU_SCALE_TEST) += rcuscale.o
obj-$(CONFIG_RCU_REF_SCALE_TEST) += refscale.o
obj-$(CONFIG_TREE_RCU) += tree.o
obj-$(CONFIG_TINY_RCU) += tiny.o
......
......@@ -475,8 +475,16 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
* Also advance to the oldest segment of callbacks whose
* ->gp_seq[] completion is at or after that passed in via "seq",
* skipping any empty segments.
*
* Note that segment "i" (and any lower-numbered segments
* containing older callbacks) will be unaffected, and their
* grace-period numbers remain unchanged. For example, if i ==
* WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched.
* Instead, the CBs in NEXT_TAIL will be merged with those in
* NEXT_READY_TAIL and the grace-period number of NEXT_READY_TAIL
* would be updated. NEXT_TAIL would then be empty.
*/
if (++i >= RCU_NEXT_TAIL)
if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
return false;
/*
......
// SPDX-License-Identifier: GPL-2.0+
/*
* Read-Copy Update module-based performance-test facility
* Read-Copy Update module-based scalability-test facility
*
* Copyright (C) IBM Corporation, 2015
*
......@@ -44,13 +44,13 @@
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
#define PERF_FLAG "-perf:"
#define PERFOUT_STRING(s) \
pr_alert("%s" PERF_FLAG " %s\n", perf_type, s)
#define VERBOSE_PERFOUT_STRING(s) \
do { if (verbose) pr_alert("%s" PERF_FLAG " %s\n", perf_type, s); } while (0)
#define VERBOSE_PERFOUT_ERRSTRING(s) \
do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0)
#define SCALE_FLAG "-scale:"
#define SCALEOUT_STRING(s) \
pr_alert("%s" SCALE_FLAG " %s\n", scale_type, s)
#define VERBOSE_SCALEOUT_STRING(s) \
do { if (verbose) pr_alert("%s" SCALE_FLAG " %s\n", scale_type, s); } while (0)
#define VERBOSE_SCALEOUT_ERRSTRING(s) \
do { if (verbose) pr_alert("%s" SCALE_FLAG "!!! %s\n", scale_type, s); } while (0)
/*
* The intended use cases for the nreaders and nwriters module parameters
......@@ -61,25 +61,25 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
* nr_cpus for a mixed reader/writer test.
*
* 2. Specify the nr_cpus kernel boot parameter, but set
* rcuperf.nreaders to zero. This will set nwriters to the
* rcuscale.nreaders to zero. This will set nwriters to the
* value specified by nr_cpus for an update-only test.
*
* 3. Specify the nr_cpus kernel boot parameter, but set
* rcuperf.nwriters to zero. This will set nreaders to the
* rcuscale.nwriters to zero. This will set nreaders to the
* value specified by nr_cpus for a read-only test.
*
* Various other use cases may of course be specified.
*
* Note that this test's readers are intended only as a test load for
* the writers. The reader performance statistics will be overly
* the writers. The reader scalability statistics will be overly
* pessimistic due to the per-critical-section interrupt disabling,
* test-end checks, and the pair of calls through pointers.
*/
#ifdef MODULE
# define RCUPERF_SHUTDOWN 0
# define RCUSCALE_SHUTDOWN 0
#else
# define RCUPERF_SHUTDOWN 1
# define RCUSCALE_SHUTDOWN 1
#endif
torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives");
......@@ -88,16 +88,16 @@ torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
torture_param(int, holdoff, 10, "Holdoff time before test start (s)");
torture_param(int, nreaders, -1, "Number of RCU reader threads");
torture_param(int, nwriters, -1, "Number of RCU updater threads");
torture_param(bool, shutdown, RCUPERF_SHUTDOWN,
"Shutdown at end of performance tests.");
torture_param(bool, shutdown, RCUSCALE_SHUTDOWN,
"Shutdown at end of scalability tests.");
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() perf test?");
torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
static char *perf_type = "rcu";
module_param(perf_type, charp, 0444);
MODULE_PARM_DESC(perf_type, "Type of RCU to performance-test (rcu, srcu, ...)");
static char *scale_type = "rcu";
module_param(scale_type, charp, 0444);
MODULE_PARM_DESC(scale_type, "Type of RCU to scalability-test (rcu, srcu, ...)");
static int nrealreaders;
static int nrealwriters;
......@@ -107,12 +107,12 @@ static struct task_struct *shutdown_task;
static u64 **writer_durations;
static int *writer_n_durations;
static atomic_t n_rcu_perf_reader_started;
static atomic_t n_rcu_perf_writer_started;
static atomic_t n_rcu_perf_writer_finished;
static atomic_t n_rcu_scale_reader_started;
static atomic_t n_rcu_scale_writer_started;
static atomic_t n_rcu_scale_writer_finished;
static wait_queue_head_t shutdown_wq;
static u64 t_rcu_perf_writer_started;
static u64 t_rcu_perf_writer_finished;
static u64 t_rcu_scale_writer_started;
static u64 t_rcu_scale_writer_finished;
static unsigned long b_rcu_gp_test_started;
static unsigned long b_rcu_gp_test_finished;
static DEFINE_PER_CPU(atomic_t, n_async_inflight);
......@@ -124,7 +124,7 @@ static DEFINE_PER_CPU(atomic_t, n_async_inflight);
* Operations vector for selecting different types of tests.
*/
struct rcu_perf_ops {
struct rcu_scale_ops {
int ptype;
void (*init)(void);
void (*cleanup)(void);
......@@ -140,19 +140,19 @@ struct rcu_perf_ops {
const char *name;
};
static struct rcu_perf_ops *cur_ops;
static struct rcu_scale_ops *cur_ops;
/*
* Definitions for rcu perf testing.
* Definitions for rcu scalability testing.
*/
static int rcu_perf_read_lock(void) __acquires(RCU)
static int rcu_scale_read_lock(void) __acquires(RCU)
{
rcu_read_lock();
return 0;
}
static void rcu_perf_read_unlock(int idx) __releases(RCU)
static void rcu_scale_read_unlock(int idx) __releases(RCU)
{
rcu_read_unlock();
}
......@@ -162,15 +162,15 @@ static unsigned long __maybe_unused rcu_no_completed(void)
return 0;
}
static void rcu_sync_perf_init(void)
static void rcu_sync_scale_init(void)
{
}
static struct rcu_perf_ops rcu_ops = {
static struct rcu_scale_ops rcu_ops = {
.ptype = RCU_FLAVOR,
.init = rcu_sync_perf_init,
.readlock = rcu_perf_read_lock,
.readunlock = rcu_perf_read_unlock,
.init = rcu_sync_scale_init,
.readlock = rcu_scale_read_lock,
.readunlock = rcu_scale_read_unlock,
.get_gp_seq = rcu_get_gp_seq,
.gp_diff = rcu_seq_diff,
.exp_completed = rcu_exp_batches_completed,
......@@ -182,23 +182,23 @@ static struct rcu_perf_ops rcu_ops = {
};
/*
* Definitions for srcu perf testing.
* Definitions for srcu scalability testing.
*/
DEFINE_STATIC_SRCU(srcu_ctl_perf);
static struct srcu_struct *srcu_ctlp = &srcu_ctl_perf;
DEFINE_STATIC_SRCU(srcu_ctl_scale);
static struct srcu_struct *srcu_ctlp = &srcu_ctl_scale;
static int srcu_perf_read_lock(void) __acquires(srcu_ctlp)
static int srcu_scale_read_lock(void) __acquires(srcu_ctlp)
{
return srcu_read_lock(srcu_ctlp);
}
static void srcu_perf_read_unlock(int idx) __releases(srcu_ctlp)
static void srcu_scale_read_unlock(int idx) __releases(srcu_ctlp)
{
srcu_read_unlock(srcu_ctlp, idx);
}
static unsigned long srcu_perf_completed(void)
static unsigned long srcu_scale_completed(void)
{
return srcu_batches_completed(srcu_ctlp);
}
......@@ -213,78 +213,78 @@ static void srcu_rcu_barrier(void)
srcu_barrier(srcu_ctlp);
}
static void srcu_perf_synchronize(void)
static void srcu_scale_synchronize(void)
{
synchronize_srcu(srcu_ctlp);
}
static void srcu_perf_synchronize_expedited(void)
static void srcu_scale_synchronize_expedited(void)
{
synchronize_srcu_expedited(srcu_ctlp);
}
static struct rcu_perf_ops srcu_ops = {
static struct rcu_scale_ops srcu_ops = {
.ptype = SRCU_FLAVOR,
.init = rcu_sync_perf_init,
.readlock = srcu_perf_read_lock,
.readunlock = srcu_perf_read_unlock,
.get_gp_seq = srcu_perf_completed,
.init = rcu_sync_scale_init,
.readlock = srcu_scale_read_lock,
.readunlock = srcu_scale_read_unlock,
.get_gp_seq = srcu_scale_completed,
.gp_diff = rcu_seq_diff,
.exp_completed = srcu_perf_completed,
.exp_completed = srcu_scale_completed,
.async = srcu_call_rcu,
.gp_barrier = srcu_rcu_barrier,
.sync = srcu_perf_synchronize,
.exp_sync = srcu_perf_synchronize_expedited,
.sync = srcu_scale_synchronize,
.exp_sync = srcu_scale_synchronize_expedited,
.name = "srcu"
};
static struct srcu_struct srcud;
static void srcu_sync_perf_init(void)
static void srcu_sync_scale_init(void)
{
srcu_ctlp = &srcud;
init_srcu_struct(srcu_ctlp);
}
static void srcu_sync_perf_cleanup(void)
static void srcu_sync_scale_cleanup(void)
{
cleanup_srcu_struct(srcu_ctlp);
}
static struct rcu_perf_ops srcud_ops = {
static struct rcu_scale_ops srcud_ops = {
.ptype = SRCU_FLAVOR,
.init = srcu_sync_perf_init,
.cleanup = srcu_sync_perf_cleanup,
.readlock = srcu_perf_read_lock,
.readunlock = srcu_perf_read_unlock,
.get_gp_seq = srcu_perf_completed,
.init = srcu_sync_scale_init,
.cleanup = srcu_sync_scale_cleanup,
.readlock = srcu_scale_read_lock,
.readunlock = srcu_scale_read_unlock,
.get_gp_seq = srcu_scale_completed,
.gp_diff = rcu_seq_diff,
.exp_completed = srcu_perf_completed,
.exp_completed = srcu_scale_completed,
.async = srcu_call_rcu,
.gp_barrier = srcu_rcu_barrier,
.sync = srcu_perf_synchronize,
.exp_sync = srcu_perf_synchronize_expedited,
.sync = srcu_scale_synchronize,
.exp_sync = srcu_scale_synchronize_expedited,
.name = "srcud"
};
/*
* Definitions for RCU-tasks perf testing.
* Definitions for RCU-tasks scalability testing.
*/
static int tasks_perf_read_lock(void)
static int tasks_scale_read_lock(void)
{
return 0;
}
static void tasks_perf_read_unlock(int idx)
static void tasks_scale_read_unlock(int idx)
{
}
static struct rcu_perf_ops tasks_ops = {
static struct rcu_scale_ops tasks_ops = {
.ptype = RCU_TASKS_FLAVOR,
.init = rcu_sync_perf_init,
.readlock = tasks_perf_read_lock,
.readunlock = tasks_perf_read_unlock,
.init = rcu_sync_scale_init,
.readlock = tasks_scale_read_lock,
.readunlock = tasks_scale_read_unlock,
.get_gp_seq = rcu_no_completed,
.gp_diff = rcu_seq_diff,
.async = call_rcu_tasks,
......@@ -294,7 +294,7 @@ static struct rcu_perf_ops tasks_ops = {
.name = "tasks"
};
static unsigned long rcuperf_seq_diff(unsigned long new, unsigned long old)
static unsigned long rcuscale_seq_diff(unsigned long new, unsigned long old)
{
if (!cur_ops->gp_diff)
return new - old;
......@@ -302,60 +302,60 @@ static unsigned long rcuperf_seq_diff(unsigned long new, unsigned long old)
}
/*
* If performance tests complete, wait for shutdown to commence.
* If scalability tests complete, wait for shutdown to commence.
*/
static void rcu_perf_wait_shutdown(void)
static void rcu_scale_wait_shutdown(void)
{
cond_resched_tasks_rcu_qs();
if (atomic_read(&n_rcu_perf_writer_finished) < nrealwriters)
if (atomic_read(&n_rcu_scale_writer_finished) < nrealwriters)
return;
while (!torture_must_stop())
schedule_timeout_uninterruptible(1);
}
/*
* RCU perf reader kthread. Repeatedly does empty RCU read-side critical
* section, minimizing update-side interference. However, the point of
* this test is not to evaluate reader performance, but instead to serve
* as a test load for update-side performance testing.
* RCU scalability reader kthread. Repeatedly does empty RCU read-side
* critical section, minimizing update-side interference. However, the
* point of this test is not to evaluate reader scalability, but instead
* to serve as a test load for update-side scalability testing.
*/
static int
rcu_perf_reader(void *arg)
rcu_scale_reader(void *arg)
{
unsigned long flags;
int idx;
long me = (long)arg;
VERBOSE_PERFOUT_STRING("rcu_perf_reader task started");
VERBOSE_SCALEOUT_STRING("rcu_scale_reader task started");
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
set_user_nice(current, MAX_NICE);
atomic_inc(&n_rcu_perf_reader_started);
atomic_inc(&n_rcu_scale_reader_started);
do {
local_irq_save(flags);
idx = cur_ops->readlock();
cur_ops->readunlock(idx);
local_irq_restore(flags);
rcu_perf_wait_shutdown();
rcu_scale_wait_shutdown();
} while (!torture_must_stop());
torture_kthread_stopping("rcu_perf_reader");
torture_kthread_stopping("rcu_scale_reader");
return 0;
}
/*
* Callback function for asynchronous grace periods from rcu_perf_writer().
* Callback function for asynchronous grace periods from rcu_scale_writer().
*/
static void rcu_perf_async_cb(struct rcu_head *rhp)
static void rcu_scale_async_cb(struct rcu_head *rhp)
{
atomic_dec(this_cpu_ptr(&n_async_inflight));
kfree(rhp);
}
/*
* RCU perf writer kthread. Repeatedly does a grace period.
* RCU scale writer kthread. Repeatedly does a grace period.
*/
static int
rcu_perf_writer(void *arg)
rcu_scale_writer(void *arg)
{
int i = 0;
int i_max;
......@@ -366,7 +366,7 @@ rcu_perf_writer(void *arg)
u64 *wdp;
u64 *wdpp = writer_durations[me];
VERBOSE_PERFOUT_STRING("rcu_perf_writer task started");
VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started");
WARN_ON(!wdpp);
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
sched_set_fifo_low(current);
......@@ -383,8 +383,8 @@ rcu_perf_writer(void *arg)
schedule_timeout_uninterruptible(1);
t = ktime_get_mono_fast_ns();
if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) {
t_rcu_perf_writer_started = t;
if (atomic_inc_return(&n_rcu_scale_writer_started) >= nrealwriters) {
t_rcu_scale_writer_started = t;
if (gp_exp) {
b_rcu_gp_test_started =
cur_ops->exp_completed() / 2;
......@@ -404,7 +404,7 @@ rcu_perf_writer(void *arg)
rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) < gp_async_max) {
atomic_inc(this_cpu_ptr(&n_async_inflight));
cur_ops->async(rhp, rcu_perf_async_cb);
cur_ops->async(rhp, rcu_scale_async_cb);
rhp = NULL;
} else if (!kthread_should_stop()) {
cur_ops->gp_barrier();
......@@ -421,19 +421,19 @@ rcu_perf_writer(void *arg)
*wdp = t - *wdp;
i_max = i;
if (!started &&
atomic_read(&n_rcu_perf_writer_started) >= nrealwriters)
atomic_read(&n_rcu_scale_writer_started) >= nrealwriters)
started = true;
if (!done && i >= MIN_MEAS) {
done = true;
sched_set_normal(current, 0);
pr_alert("%s%s rcu_perf_writer %ld has %d measurements\n",
perf_type, PERF_FLAG, me, MIN_MEAS);
if (atomic_inc_return(&n_rcu_perf_writer_finished) >=
pr_alert("%s%s rcu_scale_writer %ld has %d measurements\n",
scale_type, SCALE_FLAG, me, MIN_MEAS);
if (atomic_inc_return(&n_rcu_scale_writer_finished) >=
nrealwriters) {
schedule_timeout_interruptible(10);
rcu_ftrace_dump(DUMP_ALL);
PERFOUT_STRING("Test complete");
t_rcu_perf_writer_finished = t;
SCALEOUT_STRING("Test complete");
t_rcu_scale_writer_finished = t;
if (gp_exp) {
b_rcu_gp_test_finished =
cur_ops->exp_completed() / 2;
......@@ -448,30 +448,30 @@ rcu_perf_writer(void *arg)
}
}
if (done && !alldone &&
atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters)
atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters)
alldone = true;
if (started && !alldone && i < MAX_MEAS - 1)
i++;
rcu_perf_wait_shutdown();
rcu_scale_wait_shutdown();
} while (!torture_must_stop());
if (gp_async) {
cur_ops->gp_barrier();
}
writer_n_durations[me] = i_max;
torture_kthread_stopping("rcu_perf_writer");
torture_kthread_stopping("rcu_scale_writer");
return 0;
}
static void
rcu_perf_print_module_parms(struct rcu_perf_ops *cur_ops, const char *tag)
rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag)
{
pr_alert("%s" PERF_FLAG
pr_alert("%s" SCALE_FLAG
"--- %s: nreaders=%d nwriters=%d verbose=%d shutdown=%d\n",
perf_type, tag, nrealreaders, nrealwriters, verbose, shutdown);
scale_type, tag, nrealreaders, nrealwriters, verbose, shutdown);
}
static void
rcu_perf_cleanup(void)
rcu_scale_cleanup(void)
{
int i;
int j;
......@@ -484,11 +484,11 @@ rcu_perf_cleanup(void)
* during the mid-boot phase, so have to wait till the end.
*/
if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
VERBOSE_SCALEOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
if (rcu_gp_is_normal() && gp_exp)
VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
VERBOSE_SCALEOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
if (gp_exp && gp_async)
VERBOSE_PERFOUT_ERRSTRING("No expedited async GPs, so went with async!");
VERBOSE_SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
if (torture_cleanup_begin())
return;
......@@ -499,30 +499,30 @@ rcu_perf_cleanup(void)
if (reader_tasks) {
for (i = 0; i < nrealreaders; i++)
torture_stop_kthread(rcu_perf_reader,
torture_stop_kthread(rcu_scale_reader,
reader_tasks[i]);
kfree(reader_tasks);
}
if (writer_tasks) {
for (i = 0; i < nrealwriters; i++) {
torture_stop_kthread(rcu_perf_writer,
torture_stop_kthread(rcu_scale_writer,
writer_tasks[i]);
if (!writer_n_durations)
continue;
j = writer_n_durations[i];
pr_alert("%s%s writer %d gps: %d\n",
perf_type, PERF_FLAG, i, j);
scale_type, SCALE_FLAG, i, j);
ngps += j;
}
pr_alert("%s%s start: %llu end: %llu duration: %llu gps: %d batches: %ld\n",
perf_type, PERF_FLAG,
t_rcu_perf_writer_started, t_rcu_perf_writer_finished,
t_rcu_perf_writer_finished -
t_rcu_perf_writer_started,
scale_type, SCALE_FLAG,
t_rcu_scale_writer_started, t_rcu_scale_writer_finished,
t_rcu_scale_writer_finished -
t_rcu_scale_writer_started,
ngps,
rcuperf_seq_diff(b_rcu_gp_test_finished,
b_rcu_gp_test_started));
rcuscale_seq_diff(b_rcu_gp_test_finished,
b_rcu_gp_test_started));
for (i = 0; i < nrealwriters; i++) {
if (!writer_durations)
break;
......@@ -534,7 +534,7 @@ rcu_perf_cleanup(void)
for (j = 0; j <= writer_n_durations[i]; j++) {
wdp = &wdpp[j];
pr_alert("%s%s %4d writer-duration: %5d %llu\n",
perf_type, PERF_FLAG,
scale_type, SCALE_FLAG,
i, j, *wdp);
if (j % 100 == 0)
schedule_timeout_uninterruptible(1);
......@@ -573,22 +573,22 @@ static int compute_real(int n)
}
/*
* RCU perf shutdown kthread. Just waits to be awakened, then shuts
* RCU scalability shutdown kthread. Just waits to be awakened, then shuts
* down system.
*/
static int
rcu_perf_shutdown(void *arg)
rcu_scale_shutdown(void *arg)
{
wait_event(shutdown_wq,
atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters);
atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters);
smp_mb(); /* Wake before output. */
rcu_perf_cleanup();
rcu_scale_cleanup();
kernel_power_off();
return -EINVAL;
}
/*
* kfree_rcu() performance tests: Start a kfree_rcu() loop on all CPUs for number
* kfree_rcu() scalability tests: Start a kfree_rcu() loop on all CPUs for number
* of iterations and measure total time and number of GP for all iterations to complete.
*/
......@@ -598,8 +598,8 @@ torture_param(int, kfree_loops, 10, "Number of loops doing kfree_alloc_num alloc
static struct task_struct **kfree_reader_tasks;
static int kfree_nrealthreads;
static atomic_t n_kfree_perf_thread_started;
static atomic_t n_kfree_perf_thread_ended;
static atomic_t n_kfree_scale_thread_started;
static atomic_t n_kfree_scale_thread_ended;
struct kfree_obj {
char kfree_obj[8];
......@@ -607,7 +607,7 @@ struct kfree_obj {
};
static int
kfree_perf_thread(void *arg)
kfree_scale_thread(void *arg)
{
int i, loop = 0;
long me = (long)arg;
......@@ -615,13 +615,13 @@ kfree_perf_thread(void *arg)
u64 start_time, end_time;
long long mem_begin, mem_during = 0;
VERBOSE_PERFOUT_STRING("kfree_perf_thread task started");
VERBOSE_SCALEOUT_STRING("kfree_scale_thread task started");
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
set_user_nice(current, MAX_NICE);
start_time = ktime_get_mono_fast_ns();
if (atomic_inc_return(&n_kfree_perf_thread_started) >= kfree_nrealthreads) {
if (atomic_inc_return(&n_kfree_scale_thread_started) >= kfree_nrealthreads) {
if (gp_exp)
b_rcu_gp_test_started = cur_ops->exp_completed() / 2;
else
......@@ -646,7 +646,7 @@ kfree_perf_thread(void *arg)
cond_resched();
} while (!torture_must_stop() && ++loop < kfree_loops);
if (atomic_inc_return(&n_kfree_perf_thread_ended) >= kfree_nrealthreads) {
if (atomic_inc_return(&n_kfree_scale_thread_ended) >= kfree_nrealthreads) {
end_time = ktime_get_mono_fast_ns();
if (gp_exp)
......@@ -656,7 +656,7 @@ kfree_perf_thread(void *arg)
pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld, memory footprint: %lldMB\n",
(unsigned long long)(end_time - start_time), kfree_loops,
rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started),
rcuscale_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started),
(mem_begin - mem_during) >> (20 - PAGE_SHIFT));
if (shutdown) {
......@@ -665,12 +665,12 @@ kfree_perf_thread(void *arg)
}
}
torture_kthread_stopping("kfree_perf_thread");
torture_kthread_stopping("kfree_scale_thread");
return 0;
}
static void
kfree_perf_cleanup(void)
kfree_scale_cleanup(void)
{
int i;
......@@ -679,7 +679,7 @@ kfree_perf_cleanup(void)
if (kfree_reader_tasks) {
for (i = 0; i < kfree_nrealthreads; i++)
torture_stop_kthread(kfree_perf_thread,
torture_stop_kthread(kfree_scale_thread,
kfree_reader_tasks[i]);
kfree(kfree_reader_tasks);
}
......@@ -691,20 +691,20 @@ kfree_perf_cleanup(void)
* shutdown kthread. Just waits to be awakened, then shuts down system.
*/
static int
kfree_perf_shutdown(void *arg)
kfree_scale_shutdown(void *arg)
{
wait_event(shutdown_wq,
atomic_read(&n_kfree_perf_thread_ended) >= kfree_nrealthreads);
atomic_read(&n_kfree_scale_thread_ended) >= kfree_nrealthreads);
smp_mb(); /* Wake before output. */
kfree_perf_cleanup();
kfree_scale_cleanup();
kernel_power_off();
return -EINVAL;
}
static int __init
kfree_perf_init(void)
kfree_scale_init(void)
{
long i;
int firsterr = 0;
......@@ -713,7 +713,7 @@ kfree_perf_init(void)
/* Start up the kthreads. */
if (shutdown) {
init_waitqueue_head(&shutdown_wq);
firsterr = torture_create_kthread(kfree_perf_shutdown, NULL,
firsterr = torture_create_kthread(kfree_scale_shutdown, NULL,
shutdown_task);
if (firsterr)
goto unwind;
......@@ -730,13 +730,13 @@ kfree_perf_init(void)
}
for (i = 0; i < kfree_nrealthreads; i++) {
firsterr = torture_create_kthread(kfree_perf_thread, (void *)i,
firsterr = torture_create_kthread(kfree_scale_thread, (void *)i,
kfree_reader_tasks[i]);
if (firsterr)
goto unwind;
}
while (atomic_read(&n_kfree_perf_thread_started) < kfree_nrealthreads)
while (atomic_read(&n_kfree_scale_thread_started) < kfree_nrealthreads)
schedule_timeout_uninterruptible(1);
torture_init_end();
......@@ -744,35 +744,35 @@ kfree_perf_init(void)
unwind:
torture_init_end();
kfree_perf_cleanup();
kfree_scale_cleanup();
return firsterr;
}
static int __init
rcu_perf_init(void)
rcu_scale_init(void)
{
long i;
int firsterr = 0;
static struct rcu_perf_ops *perf_ops[] = {
static struct rcu_scale_ops *scale_ops[] = {
&rcu_ops, &srcu_ops, &srcud_ops, &tasks_ops,
};
if (!torture_init_begin(perf_type, verbose))
if (!torture_init_begin(scale_type, verbose))
return -EBUSY;
/* Process args and tell the world that the perf'er is on the job. */
for (i = 0; i < ARRAY_SIZE(perf_ops); i++) {
cur_ops = perf_ops[i];
if (strcmp(perf_type, cur_ops->name) == 0)
/* Process args and announce that the scalability'er is on the job. */
for (i = 0; i < ARRAY_SIZE(scale_ops); i++) {
cur_ops = scale_ops[i];
if (strcmp(scale_type, cur_ops->name) == 0)
break;
}
if (i == ARRAY_SIZE(perf_ops)) {
pr_alert("rcu-perf: invalid perf type: \"%s\"\n", perf_type);
pr_alert("rcu-perf types:");
for (i = 0; i < ARRAY_SIZE(perf_ops); i++)
pr_cont(" %s", perf_ops[i]->name);
if (i == ARRAY_SIZE(scale_ops)) {
pr_alert("rcu-scale: invalid scale type: \"%s\"\n", scale_type);
pr_alert("rcu-scale types:");
for (i = 0; i < ARRAY_SIZE(scale_ops); i++)
pr_cont(" %s", scale_ops[i]->name);
pr_cont("\n");
WARN_ON(!IS_MODULE(CONFIG_RCU_PERF_TEST));
WARN_ON(!IS_MODULE(CONFIG_RCU_SCALE_TEST));
firsterr = -EINVAL;
cur_ops = NULL;
goto unwind;
......@@ -781,20 +781,20 @@ rcu_perf_init(void)
cur_ops->init();
if (kfree_rcu_test)
return kfree_perf_init();
return kfree_scale_init();
nrealwriters = compute_real(nwriters);
nrealreaders = compute_real(nreaders);
atomic_set(&n_rcu_perf_reader_started, 0);
atomic_set(&n_rcu_perf_writer_started, 0);
atomic_set(&n_rcu_perf_writer_finished, 0);
rcu_perf_print_module_parms(cur_ops, "Start of test");
atomic_set(&n_rcu_scale_reader_started, 0);
atomic_set(&n_rcu_scale_writer_started, 0);
atomic_set(&n_rcu_scale_writer_finished, 0);
rcu_scale_print_module_parms(cur_ops, "Start of test");
/* Start up the kthreads. */
if (shutdown) {
init_waitqueue_head(&shutdown_wq);
firsterr = torture_create_kthread(rcu_perf_shutdown, NULL,
firsterr = torture_create_kthread(rcu_scale_shutdown, NULL,
shutdown_task);
if (firsterr)
goto unwind;
......@@ -803,17 +803,17 @@ rcu_perf_init(void)
reader_tasks = kcalloc(nrealreaders, sizeof(reader_tasks[0]),
GFP_KERNEL);
if (reader_tasks == NULL) {
VERBOSE_PERFOUT_ERRSTRING("out of memory");
VERBOSE_SCALEOUT_ERRSTRING("out of memory");
firsterr = -ENOMEM;
goto unwind;
}
for (i = 0; i < nrealreaders; i++) {
firsterr = torture_create_kthread(rcu_perf_reader, (void *)i,
firsterr = torture_create_kthread(rcu_scale_reader, (void *)i,
reader_tasks[i]);
if (firsterr)
goto unwind;
}
while (atomic_read(&n_rcu_perf_reader_started) < nrealreaders)
while (atomic_read(&n_rcu_scale_reader_started) < nrealreaders)
schedule_timeout_uninterruptible(1);
writer_tasks = kcalloc(nrealwriters, sizeof(reader_tasks[0]),
GFP_KERNEL);
......@@ -823,7 +823,7 @@ rcu_perf_init(void)
kcalloc(nrealwriters, sizeof(*writer_n_durations),
GFP_KERNEL);
if (!writer_tasks || !writer_durations || !writer_n_durations) {
VERBOSE_PERFOUT_ERRSTRING("out of memory");
VERBOSE_SCALEOUT_ERRSTRING("out of memory");
firsterr = -ENOMEM;
goto unwind;
}
......@@ -835,7 +835,7 @@ rcu_perf_init(void)
firsterr = -ENOMEM;
goto unwind;
}
firsterr = torture_create_kthread(rcu_perf_writer, (void *)i,
firsterr = torture_create_kthread(rcu_scale_writer, (void *)i,
writer_tasks[i]);
if (firsterr)
goto unwind;
......@@ -845,9 +845,9 @@ rcu_perf_init(void)
unwind:
torture_init_end();
rcu_perf_cleanup();
rcu_scale_cleanup();
return firsterr;
}
module_init(rcu_perf_init);
module_exit(rcu_perf_cleanup);
module_init(rcu_scale_init);
module_exit(rcu_scale_cleanup);
......@@ -52,19 +52,6 @@
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com> and Josh Triplett <josh@joshtriplett.org>");
#ifndef data_race
#define data_race(expr) \
({ \
expr; \
})
#endif
#ifndef ASSERT_EXCLUSIVE_WRITER
#define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0)
#endif
#ifndef ASSERT_EXCLUSIVE_ACCESS
#define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0)
#endif
/* Bits for ->extendables field, extendables param, and related definitions. */
#define RCUTORTURE_RDR_SHIFT 8 /* Put SRCU index in upper bits. */
#define RCUTORTURE_RDR_MASK ((1 << RCUTORTURE_RDR_SHIFT) - 1)
......@@ -100,6 +87,7 @@ torture_param(bool, gp_normal, false,
"Use normal (non-expedited) GP wait primitives");
torture_param(bool, gp_sync, false, "Use synchronous GP wait primitives");
torture_param(int, irqreader, 1, "Allow RCU readers from irq handlers");
torture_param(int, leakpointer, 0, "Leak pointer dereferences from readers");
torture_param(int, n_barrier_cbs, 0,
"# of callbacks/kthreads for barrier testing");
torture_param(int, nfakewriters, 4, "Number of RCU fake writer threads");
......@@ -185,6 +173,7 @@ static long n_barrier_successes; /* did rcu_barrier test succeed? */
static unsigned long n_read_exits;
static struct list_head rcu_torture_removed;
static unsigned long shutdown_jiffies;
static unsigned long start_gp_seq;
static int rcu_torture_writer_state;
#define RTWS_FIXED_DELAY 0
......@@ -1413,6 +1402,9 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp)
preempt_enable();
rcutorture_one_extend(&readstate, 0, trsp, rtrsp);
WARN_ON_ONCE(readstate & RCUTORTURE_RDR_MASK);
// This next splat is expected behavior if leakpointer, especially
// for CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels.
WARN_ON_ONCE(leakpointer && READ_ONCE(p->rtort_pipe_count) > 1);
/* If error or close call, record the sequence of reader protections. */
if ((pipe_count > 1 || completed > 1) && !xchg(&err_segs_recorded, 1)) {
......@@ -1808,6 +1800,7 @@ struct rcu_fwd {
unsigned long rcu_launder_gp_seq_start;
};
static DEFINE_MUTEX(rcu_fwd_mutex);
static struct rcu_fwd *rcu_fwds;
static bool rcu_fwd_emergency_stop;
......@@ -2074,8 +2067,14 @@ static void rcu_torture_fwd_prog_cr(struct rcu_fwd *rfp)
static int rcutorture_oom_notify(struct notifier_block *self,
unsigned long notused, void *nfreed)
{
struct rcu_fwd *rfp = rcu_fwds;
struct rcu_fwd *rfp;
mutex_lock(&rcu_fwd_mutex);
rfp = rcu_fwds;
if (!rfp) {
mutex_unlock(&rcu_fwd_mutex);
return NOTIFY_OK;
}
WARN(1, "%s invoked upon OOM during forward-progress testing.\n",
__func__);
rcu_torture_fwd_cb_hist(rfp);
......@@ -2093,6 +2092,7 @@ static int rcutorture_oom_notify(struct notifier_block *self,
smp_mb(); /* Frees before return to avoid redoing OOM. */
(*(unsigned long *)nfreed)++; /* Forward progress CBs freed! */
pr_info("%s returning after OOM processing.\n", __func__);
mutex_unlock(&rcu_fwd_mutex);
return NOTIFY_OK;
}
......@@ -2114,13 +2114,11 @@ static int rcu_torture_fwd_prog(void *args)
do {
schedule_timeout_interruptible(fwd_progress_holdoff * HZ);
WRITE_ONCE(rcu_fwd_emergency_stop, false);
register_oom_notifier(&rcutorture_oom_nb);
if (!IS_ENABLED(CONFIG_TINY_RCU) ||
rcu_inkernel_boot_has_ended())
rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries);
if (rcu_inkernel_boot_has_ended())
rcu_torture_fwd_prog_cr(rfp);
unregister_oom_notifier(&rcutorture_oom_nb);
/* Avoid slow periods, better to test when busy. */
stutter_wait("rcu_torture_fwd_prog");
......@@ -2160,9 +2158,26 @@ static int __init rcu_torture_fwd_prog_init(void)
return -ENOMEM;
spin_lock_init(&rfp->rcu_fwd_lock);
rfp->rcu_fwd_cb_tail = &rfp->rcu_fwd_cb_head;
mutex_lock(&rcu_fwd_mutex);
rcu_fwds = rfp;
mutex_unlock(&rcu_fwd_mutex);
register_oom_notifier(&rcutorture_oom_nb);
return torture_create_kthread(rcu_torture_fwd_prog, rfp, fwd_prog_task);
}
static void rcu_torture_fwd_prog_cleanup(void)
{
struct rcu_fwd *rfp;
torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task);
rfp = rcu_fwds;
mutex_lock(&rcu_fwd_mutex);
rcu_fwds = NULL;
mutex_unlock(&rcu_fwd_mutex);
unregister_oom_notifier(&rcutorture_oom_nb);
kfree(rfp);
}
/* Callback function for RCU barrier testing. */
static void rcu_torture_barrier_cbf(struct rcu_head *rcu)
{
......@@ -2460,7 +2475,7 @@ rcu_torture_cleanup(void)
show_rcu_gp_kthreads();
rcu_torture_read_exit_cleanup();
rcu_torture_barrier_cleanup();
torture_stop_kthread(rcu_torture_fwd_prog, fwd_prog_task);
rcu_torture_fwd_prog_cleanup();
torture_stop_kthread(rcu_torture_stall, stall_task);
torture_stop_kthread(rcu_torture_writer, writer_task);
......@@ -2482,8 +2497,9 @@ rcu_torture_cleanup(void)
rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq);
srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq);
pr_alert("%s: End-test grace-period state: g%lu f%#x\n",
cur_ops->name, gp_seq, flags);
pr_alert("%s: End-test grace-period state: g%ld f%#x total-gps=%ld\n",
cur_ops->name, (long)gp_seq, flags,
rcutorture_seq_diff(gp_seq, start_gp_seq));
torture_stop_kthread(rcu_torture_stats, stats_task);
torture_stop_kthread(rcu_torture_fqs, fqs_task);
if (rcu_torture_can_boost())
......@@ -2607,6 +2623,8 @@ rcu_torture_init(void)
long i;
int cpu;
int firsterr = 0;
int flags = 0;
unsigned long gp_seq = 0;
static struct rcu_torture_ops *torture_ops[] = {
&rcu_ops, &rcu_busted_ops, &srcu_ops, &srcud_ops,
&busted_srcud_ops, &tasks_ops, &tasks_rude_ops,
......@@ -2649,6 +2667,11 @@ rcu_torture_init(void)
nrealreaders = 1;
}
rcu_torture_print_module_parms(cur_ops, "Start of test");
rcutorture_get_gp_data(cur_ops->ttype, &flags, &gp_seq);
srcutorture_get_gp_data(cur_ops->ttype, srcu_ctlp, &flags, &gp_seq);
start_gp_seq = gp_seq;
pr_alert("%s: Start-test grace-period state: g%ld f%#x\n",
cur_ops->name, (long)gp_seq, flags);
/* Set up the freelist. */
......
......@@ -546,9 +546,11 @@ static int main_func(void *arg)
// Print the average of all experiments
SCALEOUT("END OF TEST. Calculating average duration per loop (nanoseconds)...\n");
buf[0] = 0;
strcat(buf, "\n");
strcat(buf, "Runs\tTime(ns)\n");
if (!errexit) {
buf[0] = 0;
strcat(buf, "\n");
strcat(buf, "Runs\tTime(ns)\n");
}
for (exp = 0; exp < nruns; exp++) {
u64 avg;
......
......@@ -29,19 +29,6 @@
#include "rcu.h"
#include "rcu_segcblist.h"
#ifndef data_race
#define data_race(expr) \
({ \
expr; \
})
#endif
#ifndef ASSERT_EXCLUSIVE_WRITER
#define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0)
#endif
#ifndef ASSERT_EXCLUSIVE_ACCESS
#define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0)
#endif
/* Holdoff in nanoseconds for auto-expediting. */
#define DEFAULT_SRCU_EXP_HOLDOFF (25 * 1000)
static ulong exp_holdoff = DEFAULT_SRCU_EXP_HOLDOFF;
......
......@@ -70,19 +70,6 @@
#endif
#define MODULE_PARAM_PREFIX "rcutree."
#ifndef data_race
#define data_race(expr) \
({ \
expr; \
})
#endif
#ifndef ASSERT_EXCLUSIVE_WRITER
#define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0)
#endif
#ifndef ASSERT_EXCLUSIVE_ACCESS
#define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0)
#endif
/* Data structures. */
/*
......@@ -178,6 +165,12 @@ module_param(gp_init_delay, int, 0444);
static int gp_cleanup_delay;
module_param(gp_cleanup_delay, int, 0444);
// Add delay to rcu_read_unlock() for strict grace periods.
static int rcu_unlock_delay;
#ifdef CONFIG_RCU_STRICT_GRACE_PERIOD
module_param(rcu_unlock_delay, int, 0444);
#endif
/*
* This rcu parameter is runtime-read-only. It reflects
* a minimum allowed number of objects which can be cached
......@@ -468,24 +461,25 @@ static int rcu_is_cpu_rrupt_from_idle(void)
return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
}
#define DEFAULT_RCU_BLIMIT 10 /* Maximum callbacks per rcu_do_batch ... */
#define DEFAULT_MAX_RCU_BLIMIT 10000 /* ... even during callback flood. */
#define DEFAULT_RCU_BLIMIT (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 1000 : 10)
// Maximum callbacks per rcu_do_batch ...
#define DEFAULT_MAX_RCU_BLIMIT 10000 // ... even during callback flood.
static long blimit = DEFAULT_RCU_BLIMIT;
#define DEFAULT_RCU_QHIMARK 10000 /* If this many pending, ignore blimit. */
#define DEFAULT_RCU_QHIMARK 10000 // If this many pending, ignore blimit.
static long qhimark = DEFAULT_RCU_QHIMARK;
#define DEFAULT_RCU_QLOMARK 100 /* Once only this many pending, use blimit. */
#define DEFAULT_RCU_QLOMARK 100 // Once only this many pending, use blimit.
static long qlowmark = DEFAULT_RCU_QLOMARK;
#define DEFAULT_RCU_QOVLD_MULT 2
#define DEFAULT_RCU_QOVLD (DEFAULT_RCU_QOVLD_MULT * DEFAULT_RCU_QHIMARK)
static long qovld = DEFAULT_RCU_QOVLD; /* If this many pending, hammer QS. */
static long qovld_calc = -1; /* No pre-initialization lock acquisitions! */
static long qovld = DEFAULT_RCU_QOVLD; // If this many pending, hammer QS.
static long qovld_calc = -1; // No pre-initialization lock acquisitions!
module_param(blimit, long, 0444);
module_param(qhimark, long, 0444);
module_param(qlowmark, long, 0444);
module_param(qovld, long, 0444);
static ulong jiffies_till_first_fqs = ULONG_MAX;
static ulong jiffies_till_first_fqs = IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 0 : ULONG_MAX;
static ulong jiffies_till_next_fqs = ULONG_MAX;
static bool rcu_kick_kthreads;
static int rcu_divisor = 7;
......@@ -1092,11 +1086,6 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
}
}
noinstr bool __rcu_is_watching(void)
{
return !rcu_dynticks_curr_cpu_in_eqs();
}
/**
* rcu_is_watching - see if RCU thinks that the current CPU is not idle
*
......@@ -1229,13 +1218,28 @@ static int rcu_implicit_dynticks_qs(struct rcu_data *rdp)
return 1;
}
/* If waiting too long on an offline CPU, complain. */
if (!(rdp->grpmask & rcu_rnp_online_cpus(rnp)) &&
time_after(jiffies, rcu_state.gp_start + HZ)) {
/*
* Complain if a CPU that is considered to be offline from RCU's
* perspective has not yet reported a quiescent state. After all,
* the offline CPU should have reported a quiescent state during
* the CPU-offline process, or, failing that, by rcu_gp_init()
* if it ran concurrently with either the CPU going offline or the
* last task on a leaf rcu_node structure exiting its RCU read-side
* critical section while all CPUs corresponding to that structure
* are offline. This added warning detects bugs in any of these
* code paths.
*
* The rcu_node structure's ->lock is held here, which excludes
* the relevant portions the CPU-hotplug code, the grace-period
* initialization code, and the rcu_read_unlock() code paths.
*
* For more detail, please refer to the "Hotplug CPU" section
* of RCU's Requirements documentation.
*/
if (WARN_ON_ONCE(!(rdp->grpmask & rcu_rnp_online_cpus(rnp)))) {
bool onl;
struct rcu_node *rnp1;
WARN_ON(1); /* Offline CPUs are supposed to report QS! */
pr_info("%s: grp: %d-%d level: %d ->gp_seq %ld ->completedqs %ld\n",
__func__, rnp->grplo, rnp->grphi, rnp->level,
(long)rnp->gp_seq, (long)rnp->completedqs);
......@@ -1498,9 +1502,10 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
/* Trace depending on how much we were able to accelerate. */
if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccWaitCB"));
else
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));
return ret;
}
......@@ -1575,6 +1580,19 @@ static void __maybe_unused rcu_advance_cbs_nowake(struct rcu_node *rnp,
raw_spin_unlock_rcu_node(rnp);
}
/*
* In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels, attempt to generate a
* quiescent state. This is intended to be invoked when the CPU notices
* a new grace period.
*/
static void rcu_strict_gp_check_qs(void)
{
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) {
rcu_read_lock();
rcu_read_unlock();
}
}
/*
* Update CPU-local rcu_data state to record the beginnings and ends of
* grace periods. The caller must hold the ->lock of the leaf rcu_node
......@@ -1645,6 +1663,7 @@ static void note_gp_changes(struct rcu_data *rdp)
}
needwake = __note_gp_changes(rnp, rdp);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
rcu_strict_gp_check_qs();
if (needwake)
rcu_gp_kthread_wake();
}
......@@ -1682,6 +1701,15 @@ static void rcu_gp_torture_wait(void)
}
}
/*
* Handler for on_each_cpu() to invoke the target CPU's RCU core
* processing.
*/
static void rcu_strict_gp_boundary(void *unused)
{
invoke_rcu_core();
}
/*
* Initialize a new grace period. Return false if no grace period required.
*/
......@@ -1720,10 +1748,13 @@ static bool rcu_gp_init(void)
raw_spin_unlock_irq_rcu_node(rnp);
/*
* Apply per-leaf buffered online and offline operations to the
* rcu_node tree. Note that this new grace period need not wait
* for subsequent online CPUs, and that quiescent-state forcing
* will handle subsequent offline CPUs.
* Apply per-leaf buffered online and offline operations to
* the rcu_node tree. Note that this new grace period need not
* wait for subsequent online CPUs, and that RCU hooks in the CPU
* offlining path, when combined with checks in this function,
* will handle CPUs that are currently going offline or that will
* go offline later. Please also refer to "Hotplug CPU" section
* of RCU's Requirements documentation.
*/
rcu_state.gp_state = RCU_GP_ONOFF;
rcu_for_each_leaf_node(rnp) {
......@@ -1810,6 +1841,10 @@ static bool rcu_gp_init(void)
WRITE_ONCE(rcu_state.gp_activity, jiffies);
}
// If strict, make all CPUs aware of new grace period.
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
return true;
}
......@@ -1898,7 +1933,7 @@ static void rcu_gp_fqs_loop(void)
break;
/* If time for quiescent-state forcing, do it. */
if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
(gf & RCU_GP_FLAG_FQS)) {
(gf & (RCU_GP_FLAG_FQS | RCU_GP_FLAG_OVLD))) {
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
TPS("fqsstart"));
rcu_gp_fqs(first_gp_fqs);
......@@ -2026,6 +2061,10 @@ static void rcu_gp_cleanup(void)
rcu_state.gp_flags & RCU_GP_FLAG_INIT);
}
raw_spin_unlock_irq_rcu_node(rnp);
// If strict, make all CPUs aware of the end of the old grace period.
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
on_each_cpu(rcu_strict_gp_boundary, NULL, 0);
}
/*
......@@ -2204,7 +2243,7 @@ rcu_report_unblock_qs_rnp(struct rcu_node *rnp, unsigned long flags)
* structure. This must be called from the specified CPU.
*/
static void
rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
rcu_report_qs_rdp(struct rcu_data *rdp)
{
unsigned long flags;
unsigned long mask;
......@@ -2213,6 +2252,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
rcu_segcblist_is_offloaded(&rdp->cblist);
struct rcu_node *rnp;
WARN_ON_ONCE(rdp->cpu != smp_processor_id());
rnp = rdp->mynode;
raw_spin_lock_irqsave_rcu_node(rnp, flags);
if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
......@@ -2229,8 +2269,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
return;
}
mask = rdp->grpmask;
if (rdp->cpu == smp_processor_id())
rdp->core_needs_qs = false;
rdp->core_needs_qs = false;
if ((rnp->qsmask & mask) == 0) {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
} else {
......@@ -2279,7 +2318,7 @@ rcu_check_quiescent_state(struct rcu_data *rdp)
* Tell RCU we are done (but rcu_report_qs_rdp() will be the
* judge of that).
*/
rcu_report_qs_rdp(rdp->cpu, rdp);
rcu_report_qs_rdp(rdp);
}
/*
......@@ -2376,6 +2415,7 @@ int rcutree_dead_cpu(unsigned int cpu)
*/
static void rcu_do_batch(struct rcu_data *rdp)
{
int div;
unsigned long flags;
const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
rcu_segcblist_is_offloaded(&rdp->cblist);
......@@ -2404,9 +2444,15 @@ static void rcu_do_batch(struct rcu_data *rdp)
rcu_nocb_lock(rdp);
WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
pending = rcu_segcblist_n_cbs(&rdp->cblist);
bl = max(rdp->blimit, pending >> rcu_divisor);
if (unlikely(bl > 100))
tlimit = local_clock() + rcu_resched_ns;
div = READ_ONCE(rcu_divisor);
div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
bl = max(rdp->blimit, pending >> div);
if (unlikely(bl > 100)) {
long rrn = READ_ONCE(rcu_resched_ns);
rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn;
tlimit = local_clock() + rrn;
}
trace_rcu_batch_start(rcu_state.name,
rcu_segcblist_n_cbs(&rdp->cblist), bl);
rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
......@@ -2547,8 +2593,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rcu_state.cbovldnext |= !!rnp->cbovldmask;
if (rnp->qsmask == 0) {
if (!IS_ENABLED(CONFIG_PREEMPT_RCU) ||
rcu_preempt_blocked_readers_cgp(rnp)) {
if (rcu_preempt_blocked_readers_cgp(rnp)) {
/*
* No point in scanning bits because they
* are all zero. But we might need to
......@@ -2616,6 +2661,14 @@ void rcu_force_quiescent_state(void)
}
EXPORT_SYMBOL_GPL(rcu_force_quiescent_state);
// Workqueue handler for an RCU reader for kernels enforcing struct RCU
// grace periods.
static void strict_work_handler(struct work_struct *work)
{
rcu_read_lock();
rcu_read_unlock();
}
/* Perform RCU core processing work for the current CPU. */
static __latent_entropy void rcu_core(void)
{
......@@ -2660,6 +2713,10 @@ static __latent_entropy void rcu_core(void)
/* Do any needed deferred wakeups of rcuo kthreads. */
do_nocb_deferred_wakeup(rdp);
trace_rcu_utilization(TPS("End RCU core"));
// If strict GPs, schedule an RCU reader in a clean environment.
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
queue_work_on(rdp->cpu, rcu_gp_wq, &rdp->strict_work);
}
static void rcu_core_si(struct softirq_action *h)
......@@ -3443,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
unsigned long count = 0;
/* Snapshot count of all CPUs */
for_each_online_cpu(cpu) {
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
count += READ_ONCE(krcp->count);
......@@ -3458,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
int cpu, freed = 0;
unsigned long flags;
for_each_online_cpu(cpu) {
for_each_possible_cpu(cpu) {
int count;
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
......@@ -3491,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
int cpu;
unsigned long flags;
for_each_online_cpu(cpu) {
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
raw_spin_lock_irqsave(&krcp->lock, flags);
......@@ -3855,6 +3912,7 @@ rcu_boot_init_percpu_data(int cpu)
/* Set up local state, ensuring consistent view of global state. */
rdp->grpmask = leaf_node_cpu_bit(rdp->mynode, cpu);
INIT_WORK(&rdp->strict_work, strict_work_handler);
WARN_ON_ONCE(rdp->dynticks_nesting != 1);
WARN_ON_ONCE(rcu_dynticks_in_eqs(rcu_dynticks_snap(rdp)));
rdp->rcu_ofl_gp_seq = rcu_state.gp_seq;
......@@ -3973,8 +4031,6 @@ int rcutree_offline_cpu(unsigned int cpu)
return 0;
}
static DEFINE_PER_CPU(int, rcu_cpu_started);
/*
* Mark the specified CPU as being online so that subsequent grace periods
* (both expedited and normal) will wait on it. Note that this means that
......@@ -3994,12 +4050,11 @@ void rcu_cpu_starting(unsigned int cpu)
struct rcu_node *rnp;
bool newcpu;
if (per_cpu(rcu_cpu_started, cpu))
rdp = per_cpu_ptr(&rcu_data, cpu);
if (rdp->cpu_started)
return;
rdp->cpu_started = true;
per_cpu(rcu_cpu_started, cpu) = 1;
rdp = per_cpu_ptr(&rcu_data, cpu);
rnp = rdp->mynode;
mask = rdp->grpmask;
raw_spin_lock_irqsave_rcu_node(rnp, flags);
......@@ -4059,7 +4114,7 @@ void rcu_report_dead(unsigned int cpu)
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
raw_spin_unlock(&rcu_state.ofl_lock);
per_cpu(rcu_cpu_started, cpu) = 0;
rdp->cpu_started = false;
}
/*
......
......@@ -156,6 +156,7 @@ struct rcu_data {
bool beenonline; /* CPU online at least once. */
bool gpwrap; /* Possible ->gp_seq wrap. */
bool exp_deferred_qs; /* This CPU awaiting a deferred QS? */
bool cpu_started; /* RCU watching this onlining CPU. */
struct rcu_node *mynode; /* This CPU's leaf of hierarchy */
unsigned long grpmask; /* Mask to apply to leaf qsmask. */
unsigned long ticks_this_gp; /* The number of scheduling-clock */
......@@ -164,6 +165,7 @@ struct rcu_data {
/* period it is aware of. */
struct irq_work defer_qs_iw; /* Obtain later scheduler attention. */
bool defer_qs_iw_pending; /* Scheduler attention pending? */
struct work_struct strict_work; /* Schedule readers for strict GPs. */
/* 2) batch handling */
struct rcu_segcblist cblist; /* Segmented callback list, with */
......
......@@ -732,11 +732,9 @@ static void rcu_exp_need_qs(void)
/* Invoked on each online non-idle CPU for expedited quiescent state. */
static void rcu_exp_handler(void *unused)
{
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
rdp = this_cpu_ptr(&rcu_data);
rnp = rdp->mynode;
if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
__this_cpu_read(rcu_data.cpu_no_qs.b.exp))
return;
......
......@@ -36,6 +36,8 @@ static void __init rcu_bootup_announce_oddness(void)
pr_info("\tRCU dyntick-idle grace-period acceleration is enabled.\n");
if (IS_ENABLED(CONFIG_PROVE_RCU))
pr_info("\tRCU lockdep checking is enabled.\n");
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
pr_info("\tRCU strict (and thus non-scalable) grace periods enabled.\n");
if (RCU_NUM_LVLS >= 4)
pr_info("\tFour(or more)-level hierarchy is enabled.\n");
if (RCU_FANOUT_LEAF != 16)
......@@ -374,6 +376,8 @@ void __rcu_read_lock(void)
rcu_preempt_read_enter();
if (IS_ENABLED(CONFIG_PROVE_LOCKING))
WARN_ON_ONCE(rcu_preempt_depth() > RCU_NEST_PMAX);
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) && rcu_state.gp_kthread)
WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
barrier(); /* critical section after entry code. */
}
EXPORT_SYMBOL_GPL(__rcu_read_lock);
......@@ -455,8 +459,14 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
return;
}
t->rcu_read_unlock_special.s = 0;
if (special.b.need_qs)
rcu_qs();
if (special.b.need_qs) {
if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)) {
rcu_report_qs_rdp(rdp);
udelay(rcu_unlock_delay);
} else {
rcu_qs();
}
}
/*
* Respond to a request by an expedited grace period for a
......@@ -768,6 +778,24 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
#else /* #ifdef CONFIG_PREEMPT_RCU */
/*
* If strict grace periods are enabled, and if the calling
* __rcu_read_unlock() marks the beginning of a quiescent state, immediately
* report that quiescent state and, if requested, spin for a bit.
*/
void rcu_read_unlock_strict(void)
{
struct rcu_data *rdp;
if (!IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
return;
rdp = this_cpu_ptr(&rcu_data);
rcu_report_qs_rdp(rdp);
udelay(rcu_unlock_delay);
}
EXPORT_SYMBOL_GPL(rcu_read_unlock_strict);
/*
* Tell them what RCU they are running.
*/
......@@ -1926,6 +1954,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
* nearest grace period (if any) to wait for next. The CB kthreads
* and the global grace-period kthread are awakened if needed.
*/
WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp);
for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
rcu_nocb_lock_irqsave(rdp, flags);
......@@ -2411,13 +2440,12 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
return;
waslocked = raw_spin_is_locked(&rdp->nocb_gp_lock);
wastimer = timer_pending(&rdp->nocb_timer);
wastimer = timer_pending(&rdp->nocb_bypass_timer);
wassleep = swait_active(&rdp->nocb_gp_wq);
if (!rdp->nocb_defer_wakeup && !rdp->nocb_gp_sleep &&
!waslocked && !wastimer && !wassleep)
if (!rdp->nocb_gp_sleep && !waslocked && !wastimer && !wassleep)
return; /* Nothing untowards. */
pr_info(" !!! %c%c%c%c %c\n",
pr_info(" nocb GP activity on CB-only CPU!!! %c%c%c%c %c\n",
"lL"[waslocked],
"dD"[!!rdp->nocb_defer_wakeup],
"tT"[wastimer],
......
......@@ -158,7 +158,7 @@ static void rcu_stall_kick_kthreads(void)
{
unsigned long j;
if (!rcu_kick_kthreads)
if (!READ_ONCE(rcu_kick_kthreads))
return;
j = READ_ONCE(rcu_state.jiffies_kick_kthreads);
if (time_after(jiffies, j) && rcu_state.gp_kthread &&
......@@ -580,7 +580,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
unsigned long js;
struct rcu_node *rnp;
if ((rcu_stall_is_suppressed() && !rcu_kick_kthreads) ||
if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) ||
!rcu_gp_in_progress())
return;
rcu_stall_kick_kthreads();
......@@ -623,7 +623,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
/* We haven't checked in, so go dump stack. */
print_cpu_stall(gps);
if (rcu_cpu_stall_ftrace_dump)
if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
rcu_ftrace_dump(DUMP_ALL);
} else if (rcu_gp_in_progress() &&
......@@ -632,7 +632,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
/* They had a few time units to dump stack, so complain. */
print_other_cpu_stall(gs2, gps);
if (rcu_cpu_stall_ftrace_dump)
if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
rcu_ftrace_dump(DUMP_ALL);
}
}
......
......@@ -53,19 +53,6 @@
#endif
#define MODULE_PARAM_PREFIX "rcupdate."
#ifndef data_race
#define data_race(expr) \
({ \
expr; \
})
#endif
#ifndef ASSERT_EXCLUSIVE_WRITER
#define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0)
#endif
#ifndef ASSERT_EXCLUSIVE_ACCESS
#define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0)
#endif
#ifndef CONFIG_TINY_RCU
module_param(rcu_expedited, int, 0);
module_param(rcu_normal, int, 0);
......
// SPDX-License-Identifier: GPL-2.0+
//
// Torture test for smp_call_function() and friends.
//
// Copyright (C) Facebook, 2020.
//
// Author: Paul E. McKenney <paulmck@kernel.org>
#define pr_fmt(fmt) fmt
#include <linux/atomic.h>
#include <linux/bitops.h>
#include <linux/completion.h>
#include <linux/cpu.h>
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/kthread.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/notifier.h>
#include <linux/percpu.h>
#include <linux/rcupdate.h>
#include <linux/rcupdate_trace.h>
#include <linux/reboot.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/smp.h>
#include <linux/stat.h>
#include <linux/srcu.h>
#include <linux/slab.h>
#include <linux/torture.h>
#include <linux/types.h>
#define SCFTORT_STRING "scftorture"
#define SCFTORT_FLAG SCFTORT_STRING ": "
#define SCFTORTOUT(s, x...) \
pr_alert(SCFTORT_FLAG s, ## x)
#define VERBOSE_SCFTORTOUT(s, x...) \
do { if (verbose) pr_alert(SCFTORT_FLAG s, ## x); } while (0)
#define VERBOSE_SCFTORTOUT_ERRSTRING(s, x...) \
do { if (verbose) pr_alert(SCFTORT_FLAG "!!! " s, ## x); } while (0)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Paul E. McKenney <paulmck@kernel.org>");
// Wait until there are multiple CPUs before starting test.
torture_param(int, holdoff, IS_BUILTIN(CONFIG_SCF_TORTURE_TEST) ? 10 : 0,
"Holdoff time before test start (s)");
torture_param(int, longwait, 0, "Include ridiculously long waits? (seconds)");
torture_param(int, nthreads, -1, "# threads, defaults to -1 for all CPUs.");
torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (s), 0=disable");
torture_param(int, shutdown_secs, 0, "Shutdown time (ms), <= zero to disable.");
torture_param(int, stat_interval, 60, "Number of seconds between stats printk()s.");
torture_param(int, stutter_cpus, 5, "Number of jiffies to change CPUs under test, 0=disable");
torture_param(bool, use_cpus_read_lock, 0, "Use cpus_read_lock() to exclude CPU hotplug.");
torture_param(int, verbose, 0, "Enable verbose debugging printk()s");
torture_param(int, weight_single, -1, "Testing weight for single-CPU no-wait operations.");
torture_param(int, weight_single_wait, -1, "Testing weight for single-CPU operations.");
torture_param(int, weight_many, -1, "Testing weight for multi-CPU no-wait operations.");
torture_param(int, weight_many_wait, -1, "Testing weight for multi-CPU operations.");
torture_param(int, weight_all, -1, "Testing weight for all-CPU no-wait operations.");
torture_param(int, weight_all_wait, -1, "Testing weight for all-CPU operations.");
char *torture_type = "";
#ifdef MODULE
# define SCFTORT_SHUTDOWN 0
#else
# define SCFTORT_SHUTDOWN 1
#endif
torture_param(bool, shutdown, SCFTORT_SHUTDOWN, "Shutdown at end of torture test.");
struct scf_statistics {
struct task_struct *task;
int cpu;
long long n_single;
long long n_single_ofl;
long long n_single_wait;
long long n_single_wait_ofl;
long long n_many;
long long n_many_wait;
long long n_all;
long long n_all_wait;
};
static struct scf_statistics *scf_stats_p;
static struct task_struct *scf_torture_stats_task;
static DEFINE_PER_CPU(long long, scf_invoked_count);
// Data for random primitive selection
#define SCF_PRIM_SINGLE 0
#define SCF_PRIM_MANY 1
#define SCF_PRIM_ALL 2
#define SCF_NPRIMS (2 * 3) // Need wait and no-wait versions of each.
static char *scf_prim_name[] = {
"smp_call_function_single",
"smp_call_function_many",
"smp_call_function",
};
struct scf_selector {
unsigned long scfs_weight;
int scfs_prim;
bool scfs_wait;
};
static struct scf_selector scf_sel_array[SCF_NPRIMS];
static int scf_sel_array_len;
static unsigned long scf_sel_totweight;
// Communicate between caller and handler.
struct scf_check {
bool scfc_in;
bool scfc_out;
int scfc_cpu; // -1 for not _single().
bool scfc_wait;
};
// Use to wait for all threads to start.
static atomic_t n_started;
static atomic_t n_errs;
static atomic_t n_mb_in_errs;
static atomic_t n_mb_out_errs;
static atomic_t n_alloc_errs;
static bool scfdone;
static char *bangstr = "";
static DEFINE_TORTURE_RANDOM_PERCPU(scf_torture_rand);
// Print torture statistics. Caller must ensure serialization.
static void scf_torture_stats_print(void)
{
int cpu;
int i;
long long invoked_count = 0;
bool isdone = READ_ONCE(scfdone);
struct scf_statistics scfs = {};
for_each_possible_cpu(cpu)
invoked_count += data_race(per_cpu(scf_invoked_count, cpu));
for (i = 0; i < nthreads; i++) {
scfs.n_single += scf_stats_p[i].n_single;
scfs.n_single_ofl += scf_stats_p[i].n_single_ofl;
scfs.n_single_wait += scf_stats_p[i].n_single_wait;
scfs.n_single_wait_ofl += scf_stats_p[i].n_single_wait_ofl;
scfs.n_many += scf_stats_p[i].n_many;
scfs.n_many_wait += scf_stats_p[i].n_many_wait;
scfs.n_all += scf_stats_p[i].n_all;
scfs.n_all_wait += scf_stats_p[i].n_all_wait;
}
if (atomic_read(&n_errs) || atomic_read(&n_mb_in_errs) ||
atomic_read(&n_mb_out_errs) || atomic_read(&n_alloc_errs))
bangstr = "!!! ";
pr_alert("%s %sscf_invoked_count %s: %lld single: %lld/%lld single_ofl: %lld/%lld many: %lld/%lld all: %lld/%lld ",
SCFTORT_FLAG, bangstr, isdone ? "VER" : "ver", invoked_count,
scfs.n_single, scfs.n_single_wait, scfs.n_single_ofl, scfs.n_single_wait_ofl,
scfs.n_many, scfs.n_many_wait, scfs.n_all, scfs.n_all_wait);
torture_onoff_stats();
pr_cont("ste: %d stnmie: %d stnmoe: %d staf: %d\n", atomic_read(&n_errs),
atomic_read(&n_mb_in_errs), atomic_read(&n_mb_out_errs),
atomic_read(&n_alloc_errs));
}
// Periodically prints torture statistics, if periodic statistics printing
// was specified via the stat_interval module parameter.
static int
scf_torture_stats(void *arg)
{
VERBOSE_TOROUT_STRING("scf_torture_stats task started");
do {
schedule_timeout_interruptible(stat_interval * HZ);
scf_torture_stats_print();
torture_shutdown_absorb("scf_torture_stats");
} while (!torture_must_stop());
torture_kthread_stopping("scf_torture_stats");
return 0;
}
// Add a primitive to the scf_sel_array[].
static void scf_sel_add(unsigned long weight, int prim, bool wait)
{
struct scf_selector *scfsp = &scf_sel_array[scf_sel_array_len];
// If no weight, if array would overflow, if computing three-place
// percentages would overflow, or if the scf_prim_name[] array would
// overflow, don't bother. In the last three two cases, complain.
if (!weight ||
WARN_ON_ONCE(scf_sel_array_len >= ARRAY_SIZE(scf_sel_array)) ||
WARN_ON_ONCE(0 - 100000 * weight <= 100000 * scf_sel_totweight) ||
WARN_ON_ONCE(prim >= ARRAY_SIZE(scf_prim_name)))
return;
scf_sel_totweight += weight;
scfsp->scfs_weight = scf_sel_totweight;
scfsp->scfs_prim = prim;
scfsp->scfs_wait = wait;
scf_sel_array_len++;
}
// Dump out weighting percentages for scf_prim_name[] array.
static void scf_sel_dump(void)
{
int i;
unsigned long oldw = 0;
struct scf_selector *scfsp;
unsigned long w;
for (i = 0; i < scf_sel_array_len; i++) {
scfsp = &scf_sel_array[i];
w = (scfsp->scfs_weight - oldw) * 100000 / scf_sel_totweight;
pr_info("%s: %3lu.%03lu %s(%s)\n", __func__, w / 1000, w % 1000,
scf_prim_name[scfsp->scfs_prim],
scfsp->scfs_wait ? "wait" : "nowait");
oldw = scfsp->scfs_weight;
}
}
// Randomly pick a primitive and wait/nowait, based on weightings.
static struct scf_selector *scf_sel_rand(struct torture_random_state *trsp)
{
int i;
unsigned long w = torture_random(trsp) % (scf_sel_totweight + 1);
for (i = 0; i < scf_sel_array_len; i++)
if (scf_sel_array[i].scfs_weight >= w)
return &scf_sel_array[i];
WARN_ON_ONCE(1);
return &scf_sel_array[0];
}
// Update statistics and occasionally burn up mass quantities of CPU time,
// if told to do so via scftorture.longwait. Otherwise, occasionally burn
// a little bit.
static void scf_handler(void *scfc_in)
{
int i;
int j;
unsigned long r = torture_random(this_cpu_ptr(&scf_torture_rand));
struct scf_check *scfcp = scfc_in;
if (likely(scfcp)) {
WRITE_ONCE(scfcp->scfc_out, false); // For multiple receivers.
if (WARN_ON_ONCE(unlikely(!READ_ONCE(scfcp->scfc_in))))
atomic_inc(&n_mb_in_errs);
}
this_cpu_inc(scf_invoked_count);
if (longwait <= 0) {
if (!(r & 0xffc0))
udelay(r & 0x3f);
goto out;
}
if (r & 0xfff)
goto out;
r = (r >> 12);
if (longwait <= 0) {
udelay((r & 0xff) + 1);
goto out;
}
r = r % longwait + 1;
for (i = 0; i < r; i++) {
for (j = 0; j < 1000; j++) {
udelay(1000);
cpu_relax();
}
}
out:
if (unlikely(!scfcp))
return;
if (scfcp->scfc_wait)
WRITE_ONCE(scfcp->scfc_out, true);
else
kfree(scfcp);
}
// As above, but check for correct CPU.
static void scf_handler_1(void *scfc_in)
{
struct scf_check *scfcp = scfc_in;
if (likely(scfcp) && WARN_ONCE(smp_processor_id() != scfcp->scfc_cpu, "%s: Wanted CPU %d got CPU %d\n", __func__, scfcp->scfc_cpu, smp_processor_id())) {
atomic_inc(&n_errs);
}
scf_handler(scfcp);
}
// Randomly do an smp_call_function*() invocation.
static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_random_state *trsp)
{
uintptr_t cpu;
int ret = 0;
struct scf_check *scfcp = NULL;
struct scf_selector *scfsp = scf_sel_rand(trsp);
if (use_cpus_read_lock)
cpus_read_lock();
else
preempt_disable();
if (scfsp->scfs_prim == SCF_PRIM_SINGLE || scfsp->scfs_wait) {
scfcp = kmalloc(sizeof(*scfcp), GFP_ATOMIC);
if (WARN_ON_ONCE(!scfcp)) {
atomic_inc(&n_alloc_errs);
} else {
scfcp->scfc_cpu = -1;
scfcp->scfc_wait = scfsp->scfs_wait;
scfcp->scfc_out = false;
}
}
switch (scfsp->scfs_prim) {
case SCF_PRIM_SINGLE:
cpu = torture_random(trsp) % nr_cpu_ids;
if (scfsp->scfs_wait)
scfp->n_single_wait++;
else
scfp->n_single++;
if (scfcp) {
scfcp->scfc_cpu = cpu;
barrier(); // Prevent race-reduction compiler optimizations.
scfcp->scfc_in = true;
}
ret = smp_call_function_single(cpu, scf_handler_1, (void *)scfcp, scfsp->scfs_wait);
if (ret) {
if (scfsp->scfs_wait)
scfp->n_single_wait_ofl++;
else
scfp->n_single_ofl++;
kfree(scfcp);
scfcp = NULL;
}
break;
case SCF_PRIM_MANY:
if (scfsp->scfs_wait)
scfp->n_many_wait++;
else
scfp->n_many++;
if (scfcp) {
barrier(); // Prevent race-reduction compiler optimizations.
scfcp->scfc_in = true;
}
smp_call_function_many(cpu_online_mask, scf_handler, scfcp, scfsp->scfs_wait);
break;
case SCF_PRIM_ALL:
if (scfsp->scfs_wait)
scfp->n_all_wait++;
else
scfp->n_all++;
if (scfcp) {
barrier(); // Prevent race-reduction compiler optimizations.
scfcp->scfc_in = true;
}
smp_call_function(scf_handler, scfcp, scfsp->scfs_wait);
break;
default:
WARN_ON_ONCE(1);
if (scfcp)
scfcp->scfc_out = true;
}
if (scfcp && scfsp->scfs_wait) {
if (WARN_ON_ONCE((num_online_cpus() > 1 || scfsp->scfs_prim == SCF_PRIM_SINGLE) &&
!scfcp->scfc_out))
atomic_inc(&n_mb_out_errs); // Leak rather than trash!
else
kfree(scfcp);
barrier(); // Prevent race-reduction compiler optimizations.
}
if (use_cpus_read_lock)
cpus_read_unlock();
else
preempt_enable();
if (!(torture_random(trsp) & 0xfff))
schedule_timeout_uninterruptible(1);
}
// SCF test kthread. Repeatedly does calls to members of the
// smp_call_function() family of functions.
static int scftorture_invoker(void *arg)
{
int cpu;
DEFINE_TORTURE_RANDOM(rand);
struct scf_statistics *scfp = (struct scf_statistics *)arg;
bool was_offline = false;
VERBOSE_SCFTORTOUT("scftorture_invoker %d: task started", scfp->cpu);
cpu = scfp->cpu % nr_cpu_ids;
set_cpus_allowed_ptr(current, cpumask_of(cpu));
set_user_nice(current, MAX_NICE);
if (holdoff)
schedule_timeout_interruptible(holdoff * HZ);
VERBOSE_SCFTORTOUT("scftorture_invoker %d: Waiting for all SCF torturers from cpu %d", scfp->cpu, smp_processor_id());
// Make sure that the CPU is affinitized appropriately during testing.
WARN_ON_ONCE(smp_processor_id() != scfp->cpu);
if (!atomic_dec_return(&n_started))
while (atomic_read_acquire(&n_started)) {
if (torture_must_stop()) {
VERBOSE_SCFTORTOUT("scftorture_invoker %d ended before starting", scfp->cpu);
goto end;
}
schedule_timeout_uninterruptible(1);
}
VERBOSE_SCFTORTOUT("scftorture_invoker %d started", scfp->cpu);
do {
scftorture_invoke_one(scfp, &rand);
while (cpu_is_offline(cpu) && !torture_must_stop()) {
schedule_timeout_interruptible(HZ / 5);
was_offline = true;
}
if (was_offline) {
set_cpus_allowed_ptr(current, cpumask_of(cpu));
was_offline = false;
}
cond_resched();
} while (!torture_must_stop());
VERBOSE_SCFTORTOUT("scftorture_invoker %d ended", scfp->cpu);
end:
torture_kthread_stopping("scftorture_invoker");
return 0;
}
static void
scftorture_print_module_parms(const char *tag)
{
pr_alert(SCFTORT_FLAG
"--- %s: verbose=%d holdoff=%d longwait=%d nthreads=%d onoff_holdoff=%d onoff_interval=%d shutdown_secs=%d stat_interval=%d stutter_cpus=%d use_cpus_read_lock=%d, weight_single=%d, weight_single_wait=%d, weight_many=%d, weight_many_wait=%d, weight_all=%d, weight_all_wait=%d\n", tag,
verbose, holdoff, longwait, nthreads, onoff_holdoff, onoff_interval, shutdown, stat_interval, stutter_cpus, use_cpus_read_lock, weight_single, weight_single_wait, weight_many, weight_many_wait, weight_all, weight_all_wait);
}
static void scf_cleanup_handler(void *unused)
{
}
static void scf_torture_cleanup(void)
{
int i;
if (torture_cleanup_begin())
return;
WRITE_ONCE(scfdone, true);
if (nthreads)
for (i = 0; i < nthreads; i++)
torture_stop_kthread("scftorture_invoker", scf_stats_p[i].task);
else
goto end;
smp_call_function(scf_cleanup_handler, NULL, 0);
torture_stop_kthread(scf_torture_stats, scf_torture_stats_task);
scf_torture_stats_print(); // -After- the stats thread is stopped!
kfree(scf_stats_p); // -After- the last stats print has completed!
scf_stats_p = NULL;
if (atomic_read(&n_errs) || atomic_read(&n_mb_in_errs) || atomic_read(&n_mb_out_errs))
scftorture_print_module_parms("End of test: FAILURE");
else if (torture_onoff_failures())
scftorture_print_module_parms("End of test: LOCK_HOTPLUG");
else
scftorture_print_module_parms("End of test: SUCCESS");
end:
torture_cleanup_end();
}
static int __init scf_torture_init(void)
{
long i;
int firsterr = 0;
unsigned long weight_single1 = weight_single;
unsigned long weight_single_wait1 = weight_single_wait;
unsigned long weight_many1 = weight_many;
unsigned long weight_many_wait1 = weight_many_wait;
unsigned long weight_all1 = weight_all;
unsigned long weight_all_wait1 = weight_all_wait;
if (!torture_init_begin(SCFTORT_STRING, verbose))
return -EBUSY;
scftorture_print_module_parms("Start of test");
if (weight_single == -1 && weight_single_wait == -1 &&
weight_many == -1 && weight_many_wait == -1 &&
weight_all == -1 && weight_all_wait == -1) {
weight_single1 = 2 * nr_cpu_ids;
weight_single_wait1 = 2 * nr_cpu_ids;
weight_many1 = 2;
weight_many_wait1 = 2;
weight_all1 = 1;
weight_all_wait1 = 1;
} else {
if (weight_single == -1)
weight_single1 = 0;
if (weight_single_wait == -1)
weight_single_wait1 = 0;
if (weight_many == -1)
weight_many1 = 0;
if (weight_many_wait == -1)
weight_many_wait1 = 0;
if (weight_all == -1)
weight_all1 = 0;
if (weight_all_wait == -1)
weight_all_wait1 = 0;
}
if (weight_single1 == 0 && weight_single_wait1 == 0 &&
weight_many1 == 0 && weight_many_wait1 == 0 &&
weight_all1 == 0 && weight_all_wait1 == 0) {
VERBOSE_SCFTORTOUT_ERRSTRING("all zero weights makes no sense");
firsterr = -EINVAL;
goto unwind;
}
scf_sel_add(weight_single1, SCF_PRIM_SINGLE, false);
scf_sel_add(weight_single_wait1, SCF_PRIM_SINGLE, true);
scf_sel_add(weight_many1, SCF_PRIM_MANY, false);
scf_sel_add(weight_many_wait1, SCF_PRIM_MANY, true);
scf_sel_add(weight_all1, SCF_PRIM_ALL, false);
scf_sel_add(weight_all_wait1, SCF_PRIM_ALL, true);
scf_sel_dump();
if (onoff_interval > 0) {
firsterr = torture_onoff_init(onoff_holdoff * HZ, onoff_interval, NULL);
if (firsterr)
goto unwind;
}
if (shutdown_secs > 0) {
firsterr = torture_shutdown_init(shutdown_secs, scf_torture_cleanup);
if (firsterr)
goto unwind;
}
// Worker tasks invoking smp_call_function().
if (nthreads < 0)
nthreads = num_online_cpus();
scf_stats_p = kcalloc(nthreads, sizeof(scf_stats_p[0]), GFP_KERNEL);
if (!scf_stats_p) {
VERBOSE_SCFTORTOUT_ERRSTRING("out of memory");
firsterr = -ENOMEM;
goto unwind;
}
VERBOSE_SCFTORTOUT("Starting %d smp_call_function() threads\n", nthreads);
atomic_set(&n_started, nthreads);
for (i = 0; i < nthreads; i++) {
scf_stats_p[i].cpu = i;
firsterr = torture_create_kthread(scftorture_invoker, (void *)&scf_stats_p[i],
scf_stats_p[i].task);
if (firsterr)
goto unwind;
}
if (stat_interval > 0) {
firsterr = torture_create_kthread(scf_torture_stats, NULL, scf_torture_stats_task);
if (firsterr)
goto unwind;
}
torture_init_end();
return 0;
unwind:
torture_init_end();
scf_torture_cleanup();
return firsterr;
}
module_init(scf_torture_init);
module_exit(scf_torture_cleanup);
......@@ -20,6 +20,9 @@
#include <linux/sched.h>
#include <linux/sched/idle.h>
#include <linux/hypervisor.h>
#include <linux/sched/clock.h>
#include <linux/nmi.h>
#include <linux/sched/debug.h>
#include "smpboot.h"
#include "sched/smp.h"
......@@ -96,6 +99,103 @@ void __init call_function_init(void)
smpcfd_prepare_cpu(smp_processor_id());
}
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
static DEFINE_PER_CPU(void *, cur_csd_info);
#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
static atomic_t csd_bug_count = ATOMIC_INIT(0);
/* Record current CSD work for current CPU, NULL to erase. */
static void csd_lock_record(call_single_data_t *csd)
{
if (!csd) {
smp_mb(); /* NULL cur_csd after unlock. */
__this_cpu_write(cur_csd, NULL);
return;
}
__this_cpu_write(cur_csd_func, csd->func);
__this_cpu_write(cur_csd_info, csd->info);
smp_wmb(); /* func and info before csd. */
__this_cpu_write(cur_csd, csd);
smp_mb(); /* Update cur_csd before function call. */
/* Or before unlock, as the case may be. */
}
static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
{
unsigned int csd_type;
csd_type = CSD_TYPE(csd);
if (csd_type == CSD_TYPE_ASYNC || csd_type == CSD_TYPE_SYNC)
return csd->dst; /* Other CSD_TYPE_ values might not have ->dst. */
return -1;
}
/*
* Complain if too much time spent waiting. Note that only
* the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
* so waiting on other types gets much less information.
*/
static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
{
int cpu = -1;
int cpux;
bool firsttime;
u64 ts2, ts_delta;
call_single_data_t *cpu_cur_csd;
unsigned int flags = READ_ONCE(csd->flags);
if (!(flags & CSD_FLAG_LOCK)) {
if (!unlikely(*bug_id))
return true;
cpu = csd_lock_wait_getcpu(csd);
pr_alert("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock.\n",
*bug_id, raw_smp_processor_id(), cpu);
return true;
}
ts2 = sched_clock();
ts_delta = ts2 - *ts1;
if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
return false;
firsttime = !*bug_id;
if (firsttime)
*bug_id = atomic_inc_return(&csd_bug_count);
cpu = csd_lock_wait_getcpu(csd);
if (WARN_ONCE(cpu < 0 || cpu >= nr_cpu_ids, "%s: cpu = %d\n", __func__, cpu))
cpux = 0;
else
cpux = cpu;
cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */
pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n",
firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts2 - ts0,
cpu, csd->func, csd->info);
if (cpu_cur_csd && csd != cpu_cur_csd) {
pr_alert("\tcsd: CSD lock (#%d) handling prior %pS(%ps) request.\n",
*bug_id, READ_ONCE(per_cpu(cur_csd_func, cpux)),
READ_ONCE(per_cpu(cur_csd_info, cpux)));
} else {
pr_alert("\tcsd: CSD lock (#%d) %s.\n",
*bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
}
if (cpu >= 0) {
if (!trigger_single_cpu_backtrace(cpu))
dump_cpu_task(cpu);
if (!cpu_cur_csd) {
pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
arch_send_call_function_single_ipi(cpu);
}
}
dump_stack();
*ts1 = ts2;
return false;
}
/*
* csd_lock/csd_unlock used to serialize access to per-cpu csd resources
*
......@@ -103,10 +203,30 @@ void __init call_function_init(void)
* previous function call. For multi-cpu calls its even more interesting
* as we'll have to ensure no other cpu is observing our csd.
*/
static __always_inline void csd_lock_wait(call_single_data_t *csd)
{
int bug_id = 0;
u64 ts0, ts1;
ts1 = ts0 = sched_clock();
for (;;) {
if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
break;
cpu_relax();
}
smp_acquire__after_ctrl_dep();
}
#else
static void csd_lock_record(call_single_data_t *csd)
{
}
static __always_inline void csd_lock_wait(call_single_data_t *csd)
{
smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
}
#endif
static __always_inline void csd_lock(call_single_data_t *csd)
{
......@@ -166,9 +286,11 @@ static int generic_exec_single(int cpu, call_single_data_t *csd)
* We can unlock early even for the synchronous on-stack case,
* since we're doing this from the same CPU..
*/
csd_lock_record(csd);
csd_unlock(csd);
local_irq_save(flags);
func(info);
csd_lock_record(NULL);
local_irq_restore(flags);
return 0;
}
......@@ -268,8 +390,10 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
entry = &csd_next->llist;
}
csd_lock_record(csd);
func(info);
csd_unlock(csd);
csd_lock_record(NULL);
} else {
prev = &csd->llist;
}
......@@ -296,8 +420,10 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
smp_call_func_t func = csd->func;
void *info = csd->info;
csd_lock_record(csd);
csd_unlock(csd);
func(info);
csd_lock_record(NULL);
} else if (type == CSD_TYPE_IRQ_WORK) {
irq_work_single(csd);
}
......@@ -375,6 +501,10 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
csd->func = func;
csd->info = info;
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
csd->src = smp_processor_id();
csd->dst = cpu;
#endif
err = generic_exec_single(cpu, csd);
......@@ -540,6 +670,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
csd->flags |= CSD_TYPE_SYNC;
csd->func = func;
csd->info = info;
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
csd->src = smp_processor_id();
csd->dst = cpu;
#endif
if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
}
......
......@@ -927,7 +927,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
if (ratelimit < 10 &&
(local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
pr_warn("NOHZ: local_softirq_pending %02x\n",
pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n",
(unsigned int) local_softirq_pending());
ratelimit++;
}
......
......@@ -1367,6 +1367,27 @@ config WW_MUTEX_SELFTEST
Say M if you want these self tests to build as a module.
Say N if you are unsure.
config SCF_TORTURE_TEST
tristate "torture tests for smp_call_function*()"
depends on DEBUG_KERNEL
select TORTURE_TEST
help
This option provides a kernel module that runs torture tests
on the smp_call_function() family of primitives. The kernel
module may be built after the fact on the running kernel to
be tested, if desired.
config CSD_LOCK_WAIT_DEBUG
bool "Debugging for csd_lock_wait(), called from smp_call_function*()"
depends on DEBUG_KERNEL
depends on 64BIT
default n
help
This option enables debug prints when CPUs are slow to respond
to the smp_call_function*() IPI wrappers. These debug prints
include the IPI handler function currently executing (if any)
and relevant stack traces.
endmenu # lock debugging
config TRACE_IRQFLAGS
......
......@@ -85,12 +85,16 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
put_cpu();
}
// Dump stacks even for idle CPUs.
static bool backtrace_idle;
module_param(backtrace_idle, bool, 0644);
bool nmi_cpu_backtrace(struct pt_regs *regs)
{
int cpu = smp_processor_id();
if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
if (regs && cpu_in_idle(instruction_pointer(regs))) {
if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) {
pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
cpu, (void *)instruction_pointer(regs));
} else {
......
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Analyze a given results directory for rcuperf performance measurements,
# Analyze a given results directory for rcuscale performance measurements,
# looking for ftrace data. Exits with 0 if data was found, analyzed, and
# printed. Intended to be invoked from kvm-recheck-rcuperf.sh after
# printed. Intended to be invoked from kvm-recheck-rcuscale.sh after
# argument checking.
#
# Usage: kvm-recheck-rcuperf-ftrace.sh resdir
# Usage: kvm-recheck-rcuscale-ftrace.sh resdir
#
# Copyright (C) IBM Corporation, 2016
#
......
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Analyze a given results directory for rcuperf performance measurements.
# Analyze a given results directory for rcuscale scalability measurements.
#
# Usage: kvm-recheck-rcuperf.sh resdir
# Usage: kvm-recheck-rcuscale.sh resdir
#
# Copyright (C) IBM Corporation, 2016
#
......@@ -20,7 +20,7 @@ fi
PATH=`pwd`/tools/testing/selftests/rcutorture/bin:$PATH; export PATH
. functions.sh
if kvm-recheck-rcuperf-ftrace.sh $i
if kvm-recheck-rcuscale-ftrace.sh $i
then
# ftrace data was successfully analyzed, call it good!
exit 0
......@@ -30,12 +30,12 @@ configfile=`echo $i | sed -e 's/^.*\///'`
sed -e 's/^\[[^]]*]//' < $i/console.log |
awk '
/-perf: .* gps: .* batches:/ {
/-scale: .* gps: .* batches:/ {
ngps = $9;
nbatches = $11;
}
/-perf: .*writer-duration/ {
/-scale: .*writer-duration/ {
gptimes[++n] = $5 / 1000.;
sum += $5 / 1000.;
}
......@@ -43,7 +43,7 @@ awk '
END {
newNR = asort(gptimes);
if (newNR <= 0) {
print "No rcuperf records found???"
print "No rcuscale records found???"
exit;
}
pct50 = int(newNR * 50 / 100);
......@@ -79,5 +79,5 @@ END {
print "99th percentile grace-period duration: " gptimes[pct99];
print "Maximum grace-period duration: " gptimes[newNR];
print "Grace periods: " ngps + 0 " Batches: " nbatches + 0 " Ratio: " ngps / nbatches;
print "Computed from rcuperf printk output.";
print "Computed from rcuscale printk output.";
}'
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Analyze a given results directory for rcutorture progress.
#
# Usage: kvm-recheck-rcu.sh resdir
#
# Copyright (C) Facebook, 2020
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
i="$1"
if test -d "$i" -a -r "$i"
then
:
else
echo Unreadable results directory: $i
exit 1
fi
. functions.sh
configfile=`echo $i | sed -e 's/^.*\///'`
nscfs="`grep 'scf_invoked_count ver:' $i/console.log 2> /dev/null | tail -1 | sed -e 's/^.* scf_invoked_count ver: //' -e 's/ .*$//' | tr -d '\015'`"
if test -z "$nscfs"
then
echo "$configfile ------- "
else
dur="`sed -e 's/^.* scftorture.shutdown_secs=//' -e 's/ .*$//' < $i/qemu-cmd 2> /dev/null`"
if test -z "$dur"
then
rate=""
else
nscfss=`awk -v nscfs=$nscfs -v dur=$dur '
BEGIN { print nscfs / dur }' < /dev/null`
rate=" ($nscfss/s)"
fi
echo "${configfile} ------- ${nscfs} SCF handler invocations$rate"
fi
......@@ -66,6 +66,7 @@ config_override_param () {
echo > $T/KcList
config_override_param "$config_dir/CFcommon" KcList "`cat $config_dir/CFcommon 2> /dev/null`"
config_override_param "$config_template" KcList "`cat $config_template 2> /dev/null`"
config_override_param "--gdb options" KcList "$TORTURE_KCONFIG_GDB_ARG"
config_override_param "--kasan options" KcList "$TORTURE_KCONFIG_KASAN_ARG"
config_override_param "--kcsan options" KcList "$TORTURE_KCONFIG_KCSAN_ARG"
config_override_param "--kconfig argument" KcList "$TORTURE_KCONFIG_ARG"
......@@ -152,7 +153,11 @@ qemu_append="`identify_qemu_append "$QEMU"`"
boot_args="`configfrag_boot_params "$boot_args" "$config_template"`"
# Generate kernel-version-specific boot parameters
boot_args="`per_version_boot_params "$boot_args" $resdir/.config $seconds`"
echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" > $resdir/qemu-cmd
if test -n "$TORTURE_BOOT_GDB_ARG"
then
boot_args="$boot_args $TORTURE_BOOT_GDB_ARG"
fi
echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" $TORTURE_QEMU_GDB_ARG > $resdir/qemu-cmd
if test -n "$TORTURE_BUILDONLY"
then
......@@ -171,14 +176,26 @@ echo "NOTE: $QEMU either did not run or was interactive" > $resdir/console.log
# Attempt to run qemu
( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) &
commandcompleted=0
sleep 10 # Give qemu's pid a chance to reach the file
if test -s "$resdir/qemu_pid"
if test -z "$TORTURE_KCONFIG_GDB_ARG"
then
qemu_pid=`cat "$resdir/qemu_pid"`
echo Monitoring qemu job at pid $qemu_pid
else
qemu_pid=""
echo Monitoring qemu job at yet-as-unknown pid
sleep 10 # Give qemu's pid a chance to reach the file
if test -s "$resdir/qemu_pid"
then
qemu_pid=`cat "$resdir/qemu_pid"`
echo Monitoring qemu job at pid $qemu_pid
else
qemu_pid=""
echo Monitoring qemu job at yet-as-unknown pid
fi
fi
if test -n "$TORTURE_KCONFIG_GDB_ARG"
then
echo Waiting for you to attach a debug session, for example: > /dev/tty
echo " gdb $base_resdir/vmlinux" > /dev/tty
echo 'After symbols load and the "(gdb)" prompt appears:' > /dev/tty
echo " target remote :1234" > /dev/tty
echo " continue" > /dev/tty
kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null`
fi
while :
do
......
......@@ -31,6 +31,9 @@ TORTURE_DEFCONFIG=defconfig
TORTURE_BOOT_IMAGE=""
TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD
TORTURE_KCONFIG_ARG=""
TORTURE_KCONFIG_GDB_ARG=""
TORTURE_BOOT_GDB_ARG=""
TORTURE_QEMU_GDB_ARG=""
TORTURE_KCONFIG_KASAN_ARG=""
TORTURE_KCONFIG_KCSAN_ARG=""
TORTURE_KMAKE_ARG=""
......@@ -46,6 +49,7 @@ jitter="-1"
usage () {
echo "Usage: $scriptname optional arguments:"
echo " --allcpus"
echo " --bootargs kernel-boot-arguments"
echo " --bootimage relative-path-to-kernel-boot-image"
echo " --buildonly"
......@@ -55,17 +59,19 @@ usage () {
echo " --defconfig string"
echo " --dryrun sched|script"
echo " --duration minutes"
echo " --gdb"
echo " --help"
echo " --interactive"
echo " --jitter N [ maxsleep (us) [ maxspin (us) ] ]"
echo " --kconfig Kconfig-options"
echo " --kmake-arg kernel-make-arguments"
echo " --mac nn:nn:nn:nn:nn:nn"
echo " --memory megabytes | nnnG"
echo " --memory megabytes|nnnG"
echo " --no-initrd"
echo " --qemu-args qemu-arguments"
echo " --qemu-cmd qemu-system-..."
echo " --results absolute-pathname"
echo " --torture rcu"
echo " --torture lock|rcu|rcuscale|refscale|scf"
echo " --trust-make"
exit 1
}
......@@ -126,6 +132,14 @@ do
dur=$(($2*60))
shift
;;
--gdb)
TORTURE_KCONFIG_GDB_ARG="CONFIG_DEBUG_INFO=y"; export TORTURE_KCONFIG_GDB_ARG
TORTURE_BOOT_GDB_ARG="nokaslr"; export TORTURE_BOOT_GDB_ARG
TORTURE_QEMU_GDB_ARG="-s -S"; export TORTURE_QEMU_GDB_ARG
;;
--help|-h)
usage
;;
--interactive)
TORTURE_QEMU_INTERACTIVE=1; export TORTURE_QEMU_INTERACTIVE
;;
......@@ -184,13 +198,13 @@ do
shift
;;
--torture)
checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuperf\|refscale\)$' '^--'
checkarg --torture "(suite name)" "$#" "$2" '^\(lock\|rcu\|rcuscale\|refscale\|scf\)$' '^--'
TORTURE_SUITE=$2
shift
if test "$TORTURE_SUITE" = rcuperf || test "$TORTURE_SUITE" = refscale
if test "$TORTURE_SUITE" = rcuscale || test "$TORTURE_SUITE" = refscale
then
# If you really want jitter for refscale or
# rcuperf, specify it after specifying the rcuperf
# rcuscale, specify it after specifying the rcuscale
# or the refscale. (But why jitter in these cases?)
jitter=0
fi
......@@ -248,6 +262,15 @@ do
done
touch $T/cfgcpu
configs_derep="`echo $configs_derep | sed -e "s/\<CFLIST\>/$defaultconfigs/g"`"
if test -n "$TORTURE_KCONFIG_GDB_ARG"
then
if test "`echo $configs_derep | wc -w`" -gt 1
then
echo "The --config list is: $configs_derep."
echo "Only one --config permitted with --gdb, terminating."
exit 1
fi
fi
for CF1 in $configs_derep
do
if test -f "$CONFIGFRAG/$CF1"
......@@ -323,6 +346,9 @@ TORTURE_BUILDONLY="$TORTURE_BUILDONLY"; export TORTURE_BUILDONLY
TORTURE_DEFCONFIG="$TORTURE_DEFCONFIG"; export TORTURE_DEFCONFIG
TORTURE_INITRD="$TORTURE_INITRD"; export TORTURE_INITRD
TORTURE_KCONFIG_ARG="$TORTURE_KCONFIG_ARG"; export TORTURE_KCONFIG_ARG
TORTURE_KCONFIG_GDB_ARG="$TORTURE_KCONFIG_GDB_ARG"; export TORTURE_KCONFIG_GDB_ARG
TORTURE_BOOT_GDB_ARG="$TORTURE_BOOT_GDB_ARG"; export TORTURE_BOOT_GDB_ARG
TORTURE_QEMU_GDB_ARG="$TORTURE_QEMU_GDB_ARG"; export TORTURE_QEMU_GDB_ARG
TORTURE_KCONFIG_KASAN_ARG="$TORTURE_KCONFIG_KASAN_ARG"; export TORTURE_KCONFIG_KASAN_ARG
TORTURE_KCONFIG_KCSAN_ARG="$TORTURE_KCONFIG_KCSAN_ARG"; export TORTURE_KCONFIG_KCSAN_ARG
TORTURE_KMAKE_ARG="$TORTURE_KMAKE_ARG"; export TORTURE_KMAKE_ARG
......
......@@ -33,8 +33,8 @@ then
fi
cat /dev/null > $file.diags
# Check for proper termination, except for rcuperf and refscale.
if test "$TORTURE_SUITE" != rcuperf && test "$TORTURE_SUITE" != refscale
# Check for proper termination, except for rcuscale and refscale.
if test "$TORTURE_SUITE" != rcuscale && test "$TORTURE_SUITE" != refscale
then
# check for abject failure
......@@ -67,6 +67,7 @@ then
grep --binary-files=text 'torture:.*ver:' $file |
egrep --binary-files=text -v '\(null\)|rtc: 000000000* ' |
sed -e 's/^(initramfs)[^]]*] //' -e 's/^\[[^]]*] //' |
sed -e 's/^.*ver: //' |
awk '
BEGIN {
ver = 0;
......@@ -74,13 +75,13 @@ then
}
{
if (!badseq && ($5 + 0 != $5 || $5 <= ver)) {
if (!badseq && ($1 + 0 != $1 || $1 <= ver)) {
badseqno1 = ver;
badseqno2 = $5;
badseqno2 = $1;
badseqnr = NR;
badseq = 1;
}
ver = $5
ver = $1
}
END {
......
......@@ -16,5 +16,6 @@ CONFIG_RCU_NOCB_CPU=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
#CHECK#CONFIG_PROVE_RCU=y
CONFIG_PROVE_RCU_LIST=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_RCU_EXPERT=y
......@@ -11,6 +11,6 @@
#
# Adds per-version torture-module parameters to kernels supporting them.
per_version_boot_params () {
echo $1 rcuperf.shutdown=1 \
rcuperf.verbose=1
echo $1 rcuscale.shutdown=1 \
rcuscale.verbose=1
}
CONFIG_SCF_TORTURE_TEST=y
CONFIG_PRINTK_TIME=y
CONFIG_SMP=y
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=n
CONFIG_NO_HZ_FULL=y
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PROVE_LOCKING=n
CONFIG_SMP=y
CONFIG_PREEMPT_NONE=n
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=y
CONFIG_HZ_PERIODIC=n
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ_FULL=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# Torture-suite-dependent shell functions for the rest of the scripts.
#
# Copyright (C) Facebook, 2020
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
# scftorture_param_onoff bootparam-string config-file
#
# Adds onoff scftorture module parameters to kernels having it.
scftorture_param_onoff () {
if ! bootparam_hotplug_cpu "$1" && configfrag_hotplug_cpu "$2"
then
echo CPU-hotplug kernel, adding scftorture onoff. 1>&2
echo scftorture.onoff_interval=1000 scftorture.onoff_holdoff=30
fi
}
# per_version_boot_params bootparam-string config-file seconds
#
# Adds per-version torture-module parameters to kernels supporting them.
per_version_boot_params () {
echo $1 `scftorture_param_onoff "$1" "$2"` \
scftorture.stat_interval=15 \
scftorture.shutdown_secs=$3 \
scftorture.verbose=1 \
scf
}
The rcutorture scripting tools automatically create the needed initrd
directory using dracut. Failing that, this tool will create an initrd
containing a single statically linked binary named "init" that loops
over a very long sleep() call. In both cases, this creation is done
by tools/testing/selftests/rcutorture/bin/mkinitrd.sh.
The rcutorture scripting tools automatically create an initrd containing
a single statically linked binary named "init" that loops over a
very long sleep() call. In both cases, this creation is done by
tools/testing/selftests/rcutorture/bin/mkinitrd.sh.
However, if you are attempting to run rcutorture on a system that does
not have dracut installed, and if you don't like the notion of static
linking, you might wish to press an existing initrd into service:
However, if you don't like the notion of statically linked bare-bones
userspace environments, you might wish to press an existing initrd
into service:
------------------------------------------------------------------------
cd tools/testing/selftests/rcutorture
......@@ -15,24 +14,3 @@ mkdir initrd
cd initrd
cpio -id < /tmp/initrd.img.zcat
# Manually verify that initrd contains needed binaries and libraries.
------------------------------------------------------------------------
Interestingly enough, if you are running rcutorture, you don't really
need userspace in many cases. Running without userspace has the
advantage of allowing you to test your kernel independently of the
distro in place, the root-filesystem layout, and so on. To make this
happen, put the following script in the initrd's tree's "/init" file,
with 0755 mode.
------------------------------------------------------------------------
#!/bin/sh
while :
do
sleep 10
done
------------------------------------------------------------------------
This approach also allows most of the binaries and libraries in the
initrd filesystem to be dispensed with, which can save significant
space in rcutorture's "res" directory.
This document describes one way to create the rcu-test-image file
that contains the filesystem used by the guest-OS kernel. There are
probably much better ways of doing this, and this filesystem could no
doubt be smaller. It is probably also possible to simply download
an appropriate image from any number of places.
Normally, a minimal initrd is created automatically by the rcutorture
scripting. But minimal really does mean "minimal", namely just a single
root directory with a single statically linked executable named "init":
$ size tools/testing/selftests/rcutorture/initrd/init
text data bss dec hex filename
328 0 8 336 150 tools/testing/selftests/rcutorture/initrd/init
Suppose you need to run some scripts, perhaps to monitor or control
some aspect of the rcutorture testing. This will require a more fully
filled-out userspace, perhaps containing libraries, executables for
the shell and other utilities, and soforth. In that case, place your
desired filesystem here:
tools/testing/selftests/rcutorture/initrd
For example, your tools/testing/selftests/rcutorture/initrd/init might
be a script that does any needed mount operations and starts whatever
scripts need starting to properly monitor or control your testing.
The next rcutorture build will then incorporate this filesystem into
the kernel image that is passed to qemu.
Or maybe you need a real root filesystem for some reason, in which case
please read on!
The remainder of this document describes one way to create the
rcu-test-image file that contains the filesystem used by the guest-OS
kernel. There are probably much better ways of doing this, and this
filesystem could no doubt be smaller. It is probably also possible to
simply download an appropriate image from any number of places.
That said, here are the commands:
......@@ -36,7 +61,7 @@ References:
https://help.ubuntu.com/community/JeOSVMBuilder
http://wiki.libvirt.org/page/UbuntuKVMWalkthrough
http://www.moe.co.uk/2011/01/07/pci_add_option_rom-failed-to-find-romfile-pxe-rtl8139-bin/ -- "apt-get install kvm-pxe"
http://www.landley.net/writing/rootfs-howto.html
http://en.wikipedia.org/wiki/Initrd
http://en.wikipedia.org/wiki/Cpio
https://www.landley.net/writing/rootfs-howto.html
https://en.wikipedia.org/wiki/Initrd
https://en.wikipedia.org/wiki/Cpio
http://wiki.libvirt.org/page/UbuntuKVMWalkthrough
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment