Commit 31da0670 authored by Paul E. McKenney

Merge branches 'consolidate.2019.08.01b', 'fixes.2019.08.12a', 'lists.2019.08.13a' and 'torture.2019.08.01b' into HEAD

consolidate.2019.08.01b: Further consolidation cleanups
fixes.2019.08.12a: Miscellaneous fixes
lists.2019.08.13a: Optional lockdep arguments for RCU list macros
torture.2019.08.01b: Torture-test updates
......@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
<li> <a href="#Hotplug CPU">Hotplug CPU</a>.
<li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
<li> <a href="#Tracing and RCU">Tracing and RCU</a>.
<li> <a href="#Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a>.
<li> <a href="#Energy Efficiency">Energy Efficiency</a>.
<li> <a href="#Scheduling-Clock Interrupts and RCU">
Scheduling-Clock Interrupts and RCU</a>.
......@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
<p>
It is possible to use tracing on RCU code, but tracing itself
uses RCU.
For this reason, <tt>rcu_dereference_raw_notrace()</tt>
For this reason, <tt>rcu_dereference_raw_check()</tt>
is provided for use by tracing, which avoids the destructive
recursion that could otherwise ensue.
This API is also used by virtualization in some architectures,
......@@ -2521,6 +2523,75 @@ cannot be used.
The tracing folks both located the requirement and provided the
needed fix, so this surprise requirement was relatively painless.
<h3><a name="Accesses to User Memory and RCU">
Accesses to User Memory and RCU</a></h3>
<p>
The kernel needs to access user-space memory, for example, to access
data referenced by system-call parameters.
The <tt>get_user()</tt> macro does this job.
<p>
However, user-space memory might well be paged out, which means
that <tt>get_user()</tt> might well page-fault and thus block while
waiting for the resulting I/O to complete.
It would be a very bad thing for the compiler to reorder
a <tt>get_user()</tt> invocation into an RCU read-side critical
section.
For example, suppose that the source code looked like this:
<blockquote>
<pre>
1 rcu_read_lock();
2 p = rcu_dereference(gp);
3 v = p-&gt;value;
4 rcu_read_unlock();
5 get_user(user_v, user_p);
6 do_something_with(v, user_v);
</pre>
</blockquote>
<p>
The compiler must not be permitted to transform this source code into
the following:
<blockquote>
<pre>
1 rcu_read_lock();
2 p = rcu_dereference(gp);
3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
4 v = p-&gt;value;
5 rcu_read_unlock();
6 do_something_with(v, user_v);
</pre>
</blockquote>
<p>
If the compiler did make this transformation in a
<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
page fault, the result would be a quiescent state in the middle
of an RCU read-side critical section.
This misplaced quiescent state could result in line&nbsp;4 being
a use-after-free access, which could be bad for your kernel's
actuarial statistics.
Similar examples can be constructed with the call to <tt>get_user()</tt>
preceding the <tt>rcu_read_lock()</tt>.
<p>
Unfortunately, <tt>get_user()</tt> doesn't have any particular
ordering properties, and in some architectures the underlying <tt>asm</tt>
isn't even marked <tt>volatile</tt>.
And even if it was marked <tt>volatile</tt>, the above access to
<tt>p-&gt;value</tt> is not volatile, so the compiler would not have any
reason to keep those two accesses in order.
<p>
Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
at least for outermost instances of <tt>rcu_read_lock()</tt> and
<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
sections.
<h3><a name="Energy Efficiency">Energy Efficiency</a></h3>
<p>
......
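The compiler-barrier requirement stated in the "Accesses to User Memory and RCU" text can be modelled in user space. The following is a minimal sketch, not the kernel's implementation: `model_rcu_read_lock()`, `model_rcu_read_unlock()`, `model_get_user()` and `model_nesting` are hypothetical names, and the barrier uses the GCC/Clang empty-asm idiom. For simplicity it places a barrier at every nesting level, whereas the text only requires it for the outermost pair.

```c
#include <assert.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no instructions, but
 * forbids the compiler from moving memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical nesting counter standing in for per-task reader state. */
static int model_nesting;

static void model_rcu_read_lock(void)
{
	model_nesting++;
	barrier();	/* later accesses may not move above this point */
}

static void model_rcu_read_unlock(void)
{
	barrier();	/* earlier accesses may not move below this point */
	model_nesting--;
}

struct item { int value; };
static struct item shared_item = { .value = 42 };
static struct item *gp = &shared_item;

/* Stand-in for get_user(): in a real kernel this access might block. */
static int model_get_user(const int *user_p)
{
	/* Checks source-level placement only; it cannot detect reordering. */
	assert(model_nesting == 0);
	return *user_p;
}

static int read_then_fetch(const int *user_p)
{
	int v, user_v;

	model_rcu_read_lock();
	v = gp->value;
	model_rcu_read_unlock();
	user_v = model_get_user(user_p);	/* stays outside the reader */
	return v + user_v;
}
```

Calling `read_then_fetch()` with a pointer to an ordinary `int` exercises the same lock, read, unlock, fetch ordering as the six-line example in the documentation hunk.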
......@@ -57,6 +57,12 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
CONFIG_PREEMPT_RCU case, you might see stall-warning
messages.
You can use the rcutree.kthread_prio kernel boot parameter to
increase the scheduling priority of RCU's kthreads, which can
help avoid this problem. However, please note that doing this
can increase your system's context-switch rate and thus degrade
performance.
o A periodic interrupt whose handler takes longer than the time
interval between successive pairs of interrupts. This can
prevent RCU's kthreads and softirq handlers from running.
......
......@@ -4047,6 +4047,10 @@
rcutorture.verbose= [KNL]
Enable additional printk() statements.
rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
Dump ftrace buffer after reporting RCU CPU
stall warning.
rcupdate.rcu_cpu_stall_suppress= [KNL]
Suppress RCU CPU stall warning messages.
......
......@@ -9326,7 +9326,7 @@ F: drivers/misc/lkdtm/*
LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
M: Alan Stern <stern@rowland.harvard.edu>
M: Andrea Parri <andrea.parri@amarulasolutions.com>
M: Andrea Parri <parri.andrea@gmail.com>
M: Will Deacon <will@kernel.org>
M: Peter Zijlstra <peterz@infradead.org>
M: Boqun Feng <boqun.feng@gmail.com>
......
......@@ -264,15 +264,13 @@ int __cpu_disable(void)
return 0;
}
static DECLARE_COMPLETION(cpu_died);
/*
* called on the thread which is asking for a CPU to be shutdown -
* waits until shutdown has completed, or it is timed out.
*/
void __cpu_die(unsigned int cpu)
{
if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
if (!cpu_wait_death(cpu, 5)) {
pr_err("CPU%u: cpu didn't die\n", cpu);
return;
}
......@@ -319,7 +317,7 @@ void arch_cpu_idle_dead(void)
* this returns, power and/or clocks can be removed at any point
* from this CPU and its cache by platform_cpu_kill().
*/
complete(&cpu_died);
(void)cpu_report_death();
/*
* Ensure that the cache lines associated with that completion are
......
......@@ -535,7 +535,7 @@ static inline void note_hpte_modification(struct kvm *kvm,
*/
static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
{
return rcu_dereference_raw_notrace(kvm->memslots[0]);
return rcu_dereference_raw_check(kvm->memslots[0]);
}
extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
......
......@@ -29,6 +29,7 @@
static bool pci_mmcfg_running_state;
static bool pci_mmcfg_arch_init_failed;
static DEFINE_MUTEX(pci_mmcfg_lock);
#define pci_mmcfg_lock_held() lock_is_held(&(pci_mmcfg_lock).dep_map)
LIST_HEAD(pci_mmcfg_list);
......@@ -54,7 +55,7 @@ static void list_add_sorted(struct pci_mmcfg_region *new)
struct pci_mmcfg_region *cfg;
/* keep list sorted by segment and starting bus number */
list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held()) {
if (cfg->segment > new->segment ||
(cfg->segment == new->segment &&
cfg->start_bus >= new->start_bus)) {
......@@ -118,7 +119,7 @@ struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
{
struct pci_mmcfg_region *cfg;
list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list, pci_mmcfg_lock_held())
if (cfg->segment == segment &&
cfg->start_bus <= bus && bus <= cfg->end_bus)
return cfg;
......
......@@ -14,6 +14,7 @@
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/lockdep.h>
#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/kmod.h>
......@@ -80,6 +81,7 @@ struct acpi_ioremap {
static LIST_HEAD(acpi_ioremaps);
static DEFINE_MUTEX(acpi_ioremap_lock);
#define acpi_ioremap_lock_held() lock_is_held(&acpi_ioremap_lock.dep_map)
static void __init acpi_request_region (struct acpi_generic_address *gas,
unsigned int length, char *desc)
......@@ -206,7 +208,7 @@ acpi_map_lookup(acpi_physical_address phys, acpi_size size)
{
struct acpi_ioremap *map;
list_for_each_entry_rcu(map, &acpi_ioremaps, list)
list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
if (map->phys <= phys &&
phys + size <= map->phys + map->size)
return map;
......@@ -249,7 +251,7 @@ acpi_map_lookup_virt(void __iomem *virt, acpi_size size)
{
struct acpi_ioremap *map;
list_for_each_entry_rcu(map, &acpi_ioremaps, list)
list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
if (map->virt <= virt &&
virt + size <= map->virt + map->size)
return map;
......
......@@ -165,6 +165,7 @@ static inline int devtmpfs_init(void) { return 0; }
/* Device links support */
extern int device_links_read_lock(void);
extern void device_links_read_unlock(int idx);
extern int device_links_read_lock_held(void);
extern int device_links_check_suppliers(struct device *dev);
extern void device_links_driver_bound(struct device *dev);
extern void device_links_driver_cleanup(struct device *dev);
......
......@@ -68,6 +68,11 @@ void device_links_read_unlock(int idx)
{
srcu_read_unlock(&device_links_srcu, idx);
}
int device_links_read_lock_held(void)
{
return srcu_read_lock_held(&device_links_srcu);
}
#else /* !CONFIG_SRCU */
static DECLARE_RWSEM(device_links_lock);
......@@ -91,6 +96,13 @@ void device_links_read_unlock(int not_used)
{
up_read(&device_links_lock);
}
#ifdef CONFIG_DEBUG_LOCK_ALLOC
int device_links_read_lock_held(void)
{
return lockdep_is_held(&device_links_lock);
}
#endif
#endif /* !CONFIG_SRCU */
/**
......
......@@ -287,7 +287,8 @@ static int rpm_get_suppliers(struct device *dev)
{
struct device_link *link;
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held()) {
int retval;
if (!(link->flags & DL_FLAG_PM_RUNTIME) ||
......@@ -309,7 +310,8 @@ static void rpm_put_suppliers(struct device *dev)
{
struct device_link *link;
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held()) {
if (READ_ONCE(link->status) == DL_STATE_SUPPLIER_UNBIND)
continue;
......@@ -1640,7 +1642,8 @@ void pm_runtime_clean_up_links(struct device *dev)
idx = device_links_read_lock();
list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
list_for_each_entry_rcu(link, &dev->links.consumers, s_node,
device_links_read_lock_held()) {
if (link->flags & DL_FLAG_STATELESS)
continue;
......@@ -1662,7 +1665,8 @@ void pm_runtime_get_suppliers(struct device *dev)
idx = device_links_read_lock();
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held())
if (link->flags & DL_FLAG_PM_RUNTIME) {
link->supplier_preactivated = true;
refcount_inc(&link->rpm_active);
......@@ -1683,7 +1687,8 @@ void pm_runtime_put_suppliers(struct device *dev)
idx = device_links_read_lock();
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
device_links_read_lock_held())
if (link->supplier_preactivated) {
link->supplier_preactivated = false;
if (refcount_dec_not_one(&link->rpm_active))
......
......@@ -31,9 +31,7 @@ struct rcu_sync {
*/
static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
{
RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
!rcu_read_lock_bh_held() &&
!rcu_read_lock_sched_held(),
RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
"suspicious rcu_sync_is_idle() usage");
return !READ_ONCE(rsp->gp_state); /* GP_IDLE */
}
......
......@@ -40,6 +40,24 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
*/
#define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
/*
* Check during list traversal that we are within an RCU reader
*/
#define check_arg_count_one(dummy)
#ifdef CONFIG_PROVE_RCU_LIST
#define __list_check_rcu(dummy, cond, extra...) \
({ \
check_arg_count_one(extra); \
RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(), \
"RCU-list traversed in non-reader section!"); \
})
#else
#define __list_check_rcu(dummy, cond, extra...) \
({ check_arg_count_one(extra); })
#endif
/*
* Insert a new entry between two known consecutive entries.
*
......@@ -343,14 +361,16 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the list_head within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as list_add_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
#define list_for_each_entry_rcu(pos, head, member) \
for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
&pos->member != (head); \
#define list_for_each_entry_rcu(pos, head, member, cond...) \
for (__list_check_rcu(dummy, ## cond, 0), \
pos = list_entry_rcu((head)->next, typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
/**
......@@ -616,13 +636,15 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
#define hlist_for_each_entry_rcu(pos, head, member) \
for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
#define hlist_for_each_entry_rcu(pos, head, member, cond...) \
for (__list_check_rcu(dummy, ## cond, 0), \
pos = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),\
typeof(*(pos)), member); \
pos; \
pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
......@@ -642,10 +664,10 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
* not do any RCU debugging or tracing.
*/
#define hlist_for_each_entry_rcu_notrace(pos, head, member) \
for (pos = hlist_entry_safe (rcu_dereference_raw_notrace(hlist_first_rcu(head)),\
for (pos = hlist_entry_safe(rcu_dereference_raw_check(hlist_first_rcu(head)),\
typeof(*(pos)), member); \
pos; \
pos = hlist_entry_safe(rcu_dereference_raw_notrace(hlist_next_rcu(\
pos = hlist_entry_safe(rcu_dereference_raw_check(hlist_next_rcu(\
&(pos)->member)), typeof(*(pos)), member))
/**
......
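The optional `cond` argument added to the list macros above relies on two GNU C preprocessor extensions: named variadic parameters (`cond...`) and the `, ## cond` form that drops the comma when the variadic argument is omitted. The sketch below reproduces the trick in user space under stated assumptions: `model_list_check()`, `model_traverse()`, `checks_failed` and `in_reader` are hypothetical names, a counter replaces `RCU_LOCKDEP_WARN()`, and GCC or Clang in their default GNU mode is assumed. Unlike the hunk above, the sketch parenthesizes `cond` so that a compound expression is negated as a whole.

```c
#include <assert.h>

/* Accepts exactly one (possibly empty) argument and expands to nothing.
 * Two or more arguments are rejected at compile time. */
#define check_arg_count_one(dummy)

static int checks_failed;	/* hypothetical: counts would-be warnings */
static int in_reader;		/* hypothetical: "inside an RCU reader" flag */

#define model_list_check(dummy, cond, extra...)			\
	({							\
		check_arg_count_one(extra);			\
		if (!(cond) && !in_reader)			\
			checks_failed++;			\
	})

/* "tag" is a fixed (unused) parameter; "cond" is the optional one.
 * With cond omitted this expands to model_list_check(dummy, 0). */
#define model_traverse(tag, cond...)				\
	model_list_check(dummy, ## cond, 0)

static int traverse_plain(void)
{
	model_traverse("plain");		/* no lockdep expression */
	return checks_failed;
}

static int traverse_locked(int held)
{
	model_traverse("locked", held);		/* one lockdep expression */
	return checks_failed;
}
```

Passing two expressions, as in `model_traverse("x", a, b)`, makes `extra` expand to `b, 0`, so `check_arg_count_one()` receives two arguments and the build fails. That is the purpose of the otherwise empty macro.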
......@@ -221,6 +221,7 @@ int debug_lockdep_rcu_enabled(void);
int rcu_read_lock_held(void);
int rcu_read_lock_bh_held(void);
int rcu_read_lock_sched_held(void);
int rcu_read_lock_any_held(void);
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
......@@ -241,6 +242,12 @@ static inline int rcu_read_lock_sched_held(void)
{
return !preemptible();
}
static inline int rcu_read_lock_any_held(void)
{
return !preemptible();
}
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
#ifdef CONFIG_PROVE_RCU
......@@ -476,7 +483,7 @@ do { \
* The no-tracing version of rcu_dereference_raw() must not call
* rcu_read_lock_held().
*/
#define rcu_dereference_raw_notrace(p) __rcu_dereference_check((p), 1, __rcu)
#define rcu_dereference_raw_check(p) __rcu_dereference_check((p), 1, __rcu)
/**
* rcu_dereference_protected() - fetch RCU pointer when updates prevented
......
......@@ -620,7 +620,7 @@ static void print_lock(struct held_lock *hlock)
return;
}
printk(KERN_CONT "%p", hlock->instance);
printk(KERN_CONT "%px", hlock->instance);
print_lock_name(lock);
printk(KERN_CONT ", at: %pS\n", (void *)hlock->acquire_ip);
}
......
......@@ -8,6 +8,17 @@ menu "RCU Debugging"
config PROVE_RCU
def_bool PROVE_LOCKING
config PROVE_RCU_LIST
bool "RCU list lockdep debugging"
depends on PROVE_RCU && RCU_EXPERT
default n
help
Enable RCU lockdep checking for list usages. By default it is
turned off since there are several list RCU users that still
need to be converted to pass a lockdep expression. To prevent
false-positive splats, we keep it default disabled but once all
users are converted, we can remove this config option.
config TORTURE_TEST
tristate
default n
......
......@@ -227,6 +227,7 @@ static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
#ifdef CONFIG_RCU_STALL_COMMON
extern int rcu_cpu_stall_ftrace_dump;
extern int rcu_cpu_stall_suppress;
extern int rcu_cpu_stall_timeout;
int rcu_jiffies_till_stall_check(void);
......
......@@ -76,27 +76,6 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg)
return !*rsclp->tails[seg];
}
/*
* Interim function to return rcu_segcblist head pointer. Longer term, the
* rcu_segcblist will be used more pervasively, removing the need for this
* function.
*/
static inline struct rcu_head *rcu_segcblist_head(struct rcu_segcblist *rsclp)
{
return rsclp->head;
}
/*
* Interim function to return rcu_segcblist head pointer. Longer term, the
* rcu_segcblist will be used more pervasively, removing the need for this
* function.
*/
static inline struct rcu_head **rcu_segcblist_tail(struct rcu_segcblist *rsclp)
{
WARN_ON_ONCE(rcu_segcblist_empty(rsclp));
return rsclp->tails[RCU_NEXT_TAIL];
}
void rcu_segcblist_init(struct rcu_segcblist *rsclp);
void rcu_segcblist_disable(struct rcu_segcblist *rsclp);
bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp);
......
......@@ -89,7 +89,7 @@ torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable
static char *perf_type = "rcu";
module_param(perf_type, charp, 0444);
MODULE_PARM_DESC(perf_type, "Type of RCU to performance-test (rcu, rcu_bh, ...)");
MODULE_PARM_DESC(perf_type, "Type of RCU to performance-test (rcu, srcu, ...)");
static int nrealreaders;
static int nrealwriters;
......@@ -375,6 +375,14 @@ rcu_perf_writer(void *arg)
if (holdoff)
schedule_timeout_uninterruptible(holdoff * HZ);
/*
* Wait until rcu_end_inkernel_boot() is called for normal GP tests
* so that RCU is not always expedited for normal GP tests.
* The system_state test is approximate, but works well in practice.
*/
while (!gp_exp && system_state != SYSTEM_RUNNING)
schedule_timeout_uninterruptible(1);
t = ktime_get_mono_fast_ns();
if (atomic_inc_return(&n_rcu_perf_writer_started) >= nrealwriters) {
t_rcu_perf_writer_started = t;
......
......@@ -161,6 +161,7 @@ static atomic_long_t n_rcu_torture_timers;
static long n_barrier_attempts;
static long n_barrier_successes; /* did rcu_barrier test succeed? */
static struct list_head rcu_torture_removed;
static unsigned long shutdown_jiffies;
static int rcu_torture_writer_state;
#define RTWS_FIXED_DELAY 0
......@@ -228,6 +229,15 @@ static u64 notrace rcu_trace_clock_local(void)
}
#endif /* #else #ifdef CONFIG_RCU_TRACE */
/*
* Stop aggressive CPU-hog tests a bit before the end of the test in order
* to avoid interfering with test shutdown.
*/
static bool shutdown_time_arrived(void)
{
return shutdown_secs && time_after(jiffies, shutdown_jiffies - 30 * HZ);
}
static unsigned long boost_starttime; /* jiffies of next boost test start. */
static DEFINE_MUTEX(boost_mutex); /* protect setting boost_starttime */
/* and boost task create/destroy. */
......@@ -1713,12 +1723,14 @@ static void rcu_torture_fwd_cb_cr(struct rcu_head *rhp)
}
// Give the scheduler a chance, even on nohz_full CPUs.
static void rcu_torture_fwd_prog_cond_resched(void)
static void rcu_torture_fwd_prog_cond_resched(unsigned long iter)
{
if (IS_ENABLED(CONFIG_PREEMPT) && IS_ENABLED(CONFIG_NO_HZ_FULL)) {
if (need_resched())
// Real call_rcu() floods hit userspace, so emulate that.
if (need_resched() || (iter & 0xfff))
schedule();
} else {
// No userspace emulation: CB invocation throttles call_rcu()
cond_resched();
}
}
......@@ -1746,7 +1758,7 @@ static unsigned long rcu_torture_fwd_prog_cbfree(void)
spin_unlock_irqrestore(&rcu_fwd_lock, flags);
kfree(rfcp);
freed++;
rcu_torture_fwd_prog_cond_resched();
rcu_torture_fwd_prog_cond_resched(freed);
}
return freed;
}
......@@ -1785,15 +1797,17 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
WRITE_ONCE(rcu_fwd_startat, jiffies);
stopat = rcu_fwd_startat + dur;
while (time_before(jiffies, stopat) &&
!shutdown_time_arrived() &&
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
idx = cur_ops->readlock();
udelay(10);
cur_ops->readunlock(idx);
if (!fwd_progress_need_resched || need_resched())
rcu_torture_fwd_prog_cond_resched();
rcu_torture_fwd_prog_cond_resched(1);
}
(*tested_tries)++;
if (!time_before(jiffies, stopat) &&
!shutdown_time_arrived() &&
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
(*tested)++;
cver = READ_ONCE(rcu_torture_current_version) - cver;
......@@ -1852,6 +1866,7 @@ static void rcu_torture_fwd_prog_cr(void)
gps = cur_ops->get_gp_seq();
rcu_launder_gp_seq_start = gps;
while (time_before(jiffies, stopat) &&
!shutdown_time_arrived() &&
!READ_ONCE(rcu_fwd_emergency_stop) && !torture_must_stop()) {
rfcp = READ_ONCE(rcu_fwd_cb_head);
rfcpn = NULL;
......@@ -1875,7 +1890,7 @@ static void rcu_torture_fwd_prog_cr(void)
rfcp->rfc_gps = 0;
}
cur_ops->call(&rfcp->rh, rcu_torture_fwd_cb_cr);
rcu_torture_fwd_prog_cond_resched();
rcu_torture_fwd_prog_cond_resched(n_launders + n_max_cbs);
}
stoppedat = jiffies;
n_launders_cb_snap = READ_ONCE(n_launders_cb);
......@@ -1884,7 +1899,8 @@ static void rcu_torture_fwd_prog_cr(void)
cur_ops->cb_barrier(); /* Wait for callbacks to be invoked. */
(void)rcu_torture_fwd_prog_cbfree();
if (!torture_must_stop() && !READ_ONCE(rcu_fwd_emergency_stop)) {
if (!torture_must_stop() && !READ_ONCE(rcu_fwd_emergency_stop) &&
!shutdown_time_arrived()) {
WARN_ON(n_max_gps < MIN_FWD_CBS_LAUNDERED);
pr_alert("%s Duration %lu barrier: %lu pending %ld n_launders: %ld n_launders_sa: %ld n_max_gps: %ld n_max_cbs: %ld cver %ld gps %ld\n",
__func__,
......@@ -2465,6 +2481,7 @@ rcu_torture_init(void)
goto unwind;
rcutor_hp = firsterr;
}
shutdown_jiffies = jiffies + shutdown_secs * HZ;
firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
if (firsterr)
goto unwind;
......
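The `shutdown_time_arrived()` helper above compares `jiffies` against a deadline with `time_after()`, which remains correct when the counter wraps. A minimal user-space sketch of that comparison follows; `model_time_after()`, `MODEL_HZ` and `model_shutdown_time_arrived()` are hypothetical names, and the sketch assumes the usual two's-complement conversion from `unsigned long` to `long`.

```c
#include <assert.h>

/* Wrap-safe "a is after b" for a free-running unsigned counter, in the
 * style of the kernel's time_after(). Valid while the two values are
 * less than half the counter range apart. */
#define model_time_after(a, b) ((long)((b) - (a)) < 0)

#define MODEL_HZ 1000UL	/* hypothetical tick rate */

/* Mirrors the helper in the hunk: true once "now" is within 30 seconds
 * of the shutdown deadline, and never when no shutdown was requested. */
static int model_shutdown_time_arrived(unsigned long now,
				       unsigned long shutdown_secs,
				       unsigned long shutdown_jiffies)
{
	return shutdown_secs &&
	       model_time_after(now, shutdown_jiffies - 30 * MODEL_HZ);
}
```

A plain `now > deadline` test would misfire if the deadline wrapped past zero while `now` had not yet done so, which is why the subtraction-and-sign form is used.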
......@@ -1279,8 +1279,9 @@ void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf)
c0 = l0 - u0;
c1 = l1 - u1;
pr_cont(" %d(%ld,%ld %1p)",
cpu, c0, c1, rcu_segcblist_head(&sdp->srcu_cblist));
pr_cont(" %d(%ld,%ld %c)",
cpu, c0, c1,
"C."[rcu_segcblist_empty(&sdp->srcu_cblist)]);
s0 += c0;
s1 += c1;
}
......
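The replacement `pr_cont()` call above prints a single character chosen by indexing a string literal with a boolean. A small sketch of the idiom, using the hypothetical helper name `cblist_flag()`; reading `C` as "callbacks queued" is an interpretation, not something the hunk states:

```c
#include <assert.h>
#include <stdbool.h>

/* Index a two-character string literal with a 0/1 value:
 * false (list not empty) selects 'C', true (list empty) selects '.'. */
static char cblist_flag(bool empty)
{
	return "C."[empty];
}
```

This avoids printing a raw list-head pointer, which is what the removed `%1p` format did.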
......@@ -781,7 +781,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
* other hand, if the CPU is not in an RCU read-side critical section,
* the IPI handler reports the quiescent state immediately.
*
* Although this is a greate improvement over previous expedited
* Although this is a great improvement over previous expedited
* implementations, it is still unfriendly to real-time workloads, so is
* thus not recommended for any sort of common-case code. In fact, if
* you are using synchronize_rcu_expedited() in a loop, please restructure
......@@ -792,6 +792,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
*/
void synchronize_rcu_expedited(void)
{
bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
struct rcu_exp_work rew;
struct rcu_node *rnp;
unsigned long s;
......@@ -817,7 +818,7 @@ void synchronize_rcu_expedited(void)
return; /* Someone else did our work for us. */
/* Ensure that load happens before action based on it. */
if (unlikely(rcu_scheduler_active == RCU_SCHEDULER_INIT)) {
if (unlikely(boottime)) {
/* Direct call during scheduler init and early_initcalls(). */
rcu_exp_sel_wait_wake(s);
} else {
......@@ -835,5 +836,8 @@ void synchronize_rcu_expedited(void)
/* Let the next expedited grace period start. */
mutex_unlock(&rcu_state.exp_mutex);
if (likely(!boottime))
destroy_work_on_stack(&rew.rew_work);
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
......@@ -288,7 +288,6 @@ void rcu_note_context_switch(bool preempt)
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp;
barrier(); /* Avoid RCU read-side critical sections leaking down. */
trace_rcu_utilization(TPS("Start context switch"));
lockdep_assert_irqs_disabled();
WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
......@@ -331,7 +330,6 @@ void rcu_note_context_switch(bool preempt)
if (rdp->exp_deferred_qs)
rcu_report_exp_rdp(rdp);
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
......@@ -815,11 +813,6 @@ static void rcu_qs(void)
* dyntick-idle quiescent state visible to other CPUs, which will in
* some cases serve for expedited as well as normal grace periods.
* Either way, register a lightweight quiescent state.
*
* The barrier() calls are redundant in the common case when this is
* called externally, but just in case this is called from within this
* file.
*
*/
void rcu_all_qs(void)
{
......@@ -834,14 +827,12 @@ void rcu_all_qs(void)
return;
}
this_cpu_write(rcu_data.rcu_urgent_qs, false);
barrier(); /* Avoid RCU read-side critical sections leaking down. */
if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs))) {
local_irq_save(flags);
rcu_momentary_dyntick_idle();
local_irq_restore(flags);
}
rcu_qs();
barrier(); /* Avoid RCU read-side critical sections leaking up. */
preempt_enable();
}
EXPORT_SYMBOL_GPL(rcu_all_qs);
......@@ -851,7 +842,6 @@ EXPORT_SYMBOL_GPL(rcu_all_qs);
*/
void rcu_note_context_switch(bool preempt)
{
barrier(); /* Avoid RCU read-side critical sections leaking down. */
trace_rcu_utilization(TPS("Start context switch"));
rcu_qs();
/* Load rcu_urgent_qs before other flags. */
......@@ -864,7 +854,6 @@ void rcu_note_context_switch(bool preempt)
rcu_tasks_qs(current);
out:
trace_rcu_utilization(TPS("End context switch"));
barrier(); /* Avoid RCU read-side critical sections leaking up. */
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
......@@ -1121,7 +1110,7 @@ static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
* already exist. We only create this kthread for preemptible RCU.
* Returns zero if all is well, a negated errno otherwise.
*/
static int rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
{
int rnp_index = rnp - rcu_get_root();
unsigned long flags;
......@@ -1129,25 +1118,27 @@ static int rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
struct task_struct *t;
if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
return 0;
return;
if (!rcu_scheduler_fully_active || rcu_rnp_online_cpus(rnp) == 0)
return 0;
return;
rcu_state.boost = 1;
if (rnp->boost_kthread_task != NULL)
return 0;
return;
t = kthread_create(rcu_boost_kthread, (void *)rnp,
"rcub/%d", rnp_index);
if (IS_ERR(t))
return PTR_ERR(t);
if (WARN_ON_ONCE(IS_ERR(t)))
return;
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rnp->boost_kthread_task = t;
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
sp.sched_priority = kthread_prio;
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
wake_up_process(t); /* get to TASK_INTERRUPTIBLE quickly. */
return 0;
}
/*
......@@ -1188,7 +1179,7 @@ static void __init rcu_spawn_boost_kthreads(void)
struct rcu_node *rnp;
rcu_for_each_leaf_node(rnp)
(void)rcu_spawn_one_boost_kthread(rnp);
rcu_spawn_one_boost_kthread(rnp);
}
static void rcu_prepare_kthreads(int cpu)
......@@ -1198,7 +1189,7 @@ static void rcu_prepare_kthreads(int cpu)
/* Fire up the incoming CPU's kthread and leaf rcu_node kthread. */
if (rcu_scheduler_fully_active)
(void)rcu_spawn_one_boost_kthread(rnp);
rcu_spawn_one_boost_kthread(rnp);
}
#else /* #ifdef CONFIG_RCU_BOOST */
......
......@@ -527,6 +527,8 @@ static void check_cpu_stall(struct rcu_data *rdp)
/* We haven't checked in, so go dump stack. */
print_cpu_stall();
if (rcu_cpu_stall_ftrace_dump)
rcu_ftrace_dump(DUMP_ALL);
} else if (rcu_gp_in_progress() &&
ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
......@@ -534,6 +536,8 @@ static void check_cpu_stall(struct rcu_data *rdp)
/* They had a few time units to dump stack, so complain. */
print_other_cpu_stall(gs2);
if (rcu_cpu_stall_ftrace_dump)
rcu_ftrace_dump(DUMP_ALL);
}
}
......
......@@ -61,9 +61,15 @@ module_param(rcu_normal_after_boot, int, 0);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
/**
* rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
* rcu_read_lock_held_common() - might we be in RCU-sched read-side critical section?
* @ret: Best guess answer if lockdep cannot be relied on
*
* If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
* Returns true if lockdep must be ignored, in which case *ret contains
* the best guess described below. Otherwise returns false, in which
* case *ret tells the caller nothing and the caller should instead
* consult lockdep.
*
* If CONFIG_DEBUG_LOCK_ALLOC is selected, set *ret to nonzero iff in an
* RCU-sched read-side critical section. In absence of
* CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
* critical section unless it can prove otherwise. Note that disabling
......@@ -75,35 +81,45 @@ module_param(rcu_normal_after_boot, int, 0);
* Check debug_lockdep_rcu_enabled() to prevent false positives during boot
* and while lockdep is disabled.
*
* Note that if the CPU is in the idle loop from an RCU point of
* view (ie: that we are in the section between rcu_idle_enter() and
* rcu_idle_exit()) then rcu_read_lock_held() returns false even if the CPU
* did an rcu_read_lock(). The reason for this is that RCU ignores CPUs
* that are in such a section, considering these as in extended quiescent
* state, so such a CPU is effectively never in an RCU read-side critical
* section regardless of what RCU primitives it invokes. This state of
* affairs is required --- we need to keep an RCU-free window in idle
* where the CPU may possibly enter into low power mode. This way we can
* notice an extended quiescent state to other CPUs that started a grace
* period. Otherwise we would delay any grace period as long as we run in
* the idle task.
* Note that if the CPU is in the idle loop from an RCU point of view (ie:
* that we are in the section between rcu_idle_enter() and rcu_idle_exit())
* then rcu_read_lock_held() sets *ret to false even if the CPU did an
* rcu_read_lock(). The reason for this is that RCU ignores CPUs that are
* in such a section, considering these as in extended quiescent state,
* so such a CPU is effectively never in an RCU read-side critical section
* regardless of what RCU primitives it invokes. This state of affairs is
* required --- we need to keep an RCU-free window in idle where the CPU may
* possibly enter into low power mode. This way we can notice an extended
* quiescent state to other CPUs that started a grace period. Otherwise
* we would delay any grace period as long as we run in the idle task.
*
* Similarly, we avoid claiming an SRCU read lock held if the current
* Similarly, we avoid claiming an RCU read lock held if the current
* CPU is offline.
*/
static bool rcu_read_lock_held_common(bool *ret)
{
if (!debug_lockdep_rcu_enabled()) {
*ret = 1;
return true;
}
if (!rcu_is_watching()) {
*ret = 0;
return true;
}
if (!rcu_lockdep_current_cpu_online()) {
*ret = 0;
return true;
}
return false;
}
int rcu_read_lock_sched_held(void)
{
int lockdep_opinion = 0;
bool ret;
if (!debug_lockdep_rcu_enabled())
return 1;
if (!rcu_is_watching())
return 0;
if (!rcu_lockdep_current_cpu_online())
return 0;
if (debug_locks)
lockdep_opinion = lock_is_held(&rcu_sched_lock_map);
return lockdep_opinion || !preemptible();
if (rcu_read_lock_held_common(&ret))
return ret;
return lock_is_held(&rcu_sched_lock_map) || !preemptible();
}
EXPORT_SYMBOL(rcu_read_lock_sched_held);
#endif
......@@ -136,8 +152,7 @@ static atomic_t rcu_expedited_nesting = ATOMIC_INIT(1);
*/
bool rcu_gp_is_expedited(void)
{
return rcu_expedited || atomic_read(&rcu_expedited_nesting) ||
rcu_scheduler_active == RCU_SCHEDULER_INIT;
return rcu_expedited || atomic_read(&rcu_expedited_nesting);
}
EXPORT_SYMBOL_GPL(rcu_gp_is_expedited);
......@@ -261,12 +276,10 @@ NOKPROBE_SYMBOL(debug_lockdep_rcu_enabled);
*/
int rcu_read_lock_held(void)
{
if (!debug_lockdep_rcu_enabled())
return 1;
if (!rcu_is_watching())
return 0;
if (!rcu_lockdep_current_cpu_online())
return 0;
bool ret;
if (rcu_read_lock_held_common(&ret))
return ret;
return lock_is_held(&rcu_lock_map);
}
EXPORT_SYMBOL_GPL(rcu_read_lock_held);
......@@ -288,16 +301,28 @@ EXPORT_SYMBOL_GPL(rcu_read_lock_held);
*/
int rcu_read_lock_bh_held(void)
{
if (!debug_lockdep_rcu_enabled())
return 1;
if (!rcu_is_watching())
return 0;
if (!rcu_lockdep_current_cpu_online())
return 0;
bool ret;
if (rcu_read_lock_held_common(&ret))
return ret;
return in_softirq() || irqs_disabled();
}
EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
int rcu_read_lock_any_held(void)
{
bool ret;
if (rcu_read_lock_held_common(&ret))
return ret;
if (lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_sched_lock_map))
return 1;
return !preemptible();
}
EXPORT_SYMBOL_GPL(rcu_read_lock_any_held);
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
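The hunks above fold three repeated early-exit checks into rcu_read_lock_held_common(), which returns true when it has already decided the answer and stores that answer through its pointer argument. The following is a minimal userspace sketch of that calling convention, not the kernel code itself. The struct model_env type and the model_* function names are hypothetical stand-ins for the kernel's state queries, introduced only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state the real checks consult. */
struct model_env {
	bool lockdep_enabled;	/* models debug_lockdep_rcu_enabled() */
	bool rcu_watching;	/* models rcu_is_watching() */
	bool cpu_online;	/* models rcu_lockdep_current_cpu_online() */
	bool lock_held;		/* models lock_is_held(&rcu_lock_map) */
};

/*
 * Returns true if the common checks settle the question, in which case
 * *ret holds the answer.  Returns false if the caller must still apply
 * its own flavor-specific test.
 */
static bool model_held_common(const struct model_env *env, bool *ret)
{
	if (!env->lockdep_enabled) {
		*ret = true;	/* no lockdep: assume the lock is held */
		return true;
	}
	if (!env->rcu_watching || !env->cpu_online) {
		*ret = false;	/* idle or offline: never a reader */
		return true;
	}
	return false;
}

/* Models rcu_read_lock_held(): common checks first, then the lock map. */
static int model_read_lock_held(const struct model_env *env)
{
	bool ret;

	if (model_held_common(env, &ret))
		return ret;
	return env->lock_held;
}
```

Each flavor-specific function then differs only in its final return statement, which is what allows rcu_read_lock_any_held() to be added in this commit without repeating the three checks a fourth time.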
/**
......@@ -437,6 +462,8 @@ EXPORT_SYMBOL_GPL(rcutorture_sched_setaffinity);
#endif
#ifdef CONFIG_RCU_STALL_COMMON
int rcu_cpu_stall_ftrace_dump __read_mostly;
module_param(rcu_cpu_stall_ftrace_dump, int, 0644);
int rcu_cpu_stall_suppress __read_mostly; /* 1 = suppress stall warnings. */
EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress);
module_param(rcu_cpu_stall_suppress, int, 0644);
......
......@@ -3486,8 +3486,36 @@ void scheduler_tick(void)
struct tick_work {
int cpu;
atomic_t state;
struct delayed_work work;
};
/* Values for ->state, see diagram below. */
#define TICK_SCHED_REMOTE_OFFLINE 0
#define TICK_SCHED_REMOTE_OFFLINING 1
#define TICK_SCHED_REMOTE_RUNNING 2
/*
* State diagram for ->state:
*
*
* TICK_SCHED_REMOTE_OFFLINE
* | ^
* | |
* | | sched_tick_remote()
* | |
* | |
* +--TICK_SCHED_REMOTE_OFFLINING
* | ^
* | |
* sched_tick_start() | | sched_tick_stop()
* | |
* V |
* TICK_SCHED_REMOTE_RUNNING
*
*
* Other transitions get WARN_ON_ONCE(), except that sched_tick_remote()
* and sched_tick_start() are happy to leave the state in RUNNING.
*/
static struct tick_work __percpu *tick_work_cpu;
......@@ -3500,6 +3528,7 @@ static void sched_tick_remote(struct work_struct *work)
struct task_struct *curr;
struct rq_flags rf;
u64 delta;
int os;
/*
* Handle the tick only if it appears the remote CPU is running in full
......@@ -3513,7 +3542,7 @@ static void sched_tick_remote(struct work_struct *work)
rq_lock_irq(rq, &rf);
curr = rq->curr;
if (is_idle_task(curr))
if (is_idle_task(curr) || cpu_is_offline(cpu))
goto out_unlock;
update_rq_clock(rq);
......@@ -3533,13 +3562,18 @@ static void sched_tick_remote(struct work_struct *work)
/*
* Run the remote tick once per second (1Hz). This arbitrary
* frequency is large enough to avoid overload but short enough
* to keep scheduler internal stats reasonably up to date.
* to keep scheduler internal stats reasonably up to date. But
* first update state to reflect hotplug activity if required.
*/
queue_delayed_work(system_unbound_wq, dwork, HZ);
os = atomic_fetch_add_unless(&twork->state, -1, TICK_SCHED_REMOTE_RUNNING);
WARN_ON_ONCE(os == TICK_SCHED_REMOTE_OFFLINE);
if (os == TICK_SCHED_REMOTE_RUNNING)
queue_delayed_work(system_unbound_wq, dwork, HZ);
}
static void sched_tick_start(int cpu)
{
int os;
struct tick_work *twork;
if (housekeeping_cpu(cpu, HK_FLAG_TICK))
......@@ -3548,15 +3582,20 @@ static void sched_tick_start(int cpu)
WARN_ON_ONCE(!tick_work_cpu);
twork = per_cpu_ptr(tick_work_cpu, cpu);
twork->cpu = cpu;
INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
queue_delayed_work(system_unbound_wq, &twork->work, HZ);
os = atomic_xchg(&twork->state, TICK_SCHED_REMOTE_RUNNING);
WARN_ON_ONCE(os == TICK_SCHED_REMOTE_RUNNING);
if (os == TICK_SCHED_REMOTE_OFFLINE) {
twork->cpu = cpu;
INIT_DELAYED_WORK(&twork->work, sched_tick_remote);
queue_delayed_work(system_unbound_wq, &twork->work, HZ);
}
}
#ifdef CONFIG_HOTPLUG_CPU
static void sched_tick_stop(int cpu)
{
struct tick_work *twork;
int os;
if (housekeeping_cpu(cpu, HK_FLAG_TICK))
return;
......@@ -3564,7 +3603,10 @@ static void sched_tick_stop(int cpu)
WARN_ON_ONCE(!tick_work_cpu);
twork = per_cpu_ptr(tick_work_cpu, cpu);
cancel_delayed_work_sync(&twork->work);
/* There cannot be competing actions, but don't rely on stop-machine. */
os = atomic_xchg(&twork->state, TICK_SCHED_REMOTE_OFFLINING);
WARN_ON_ONCE(os != TICK_SCHED_REMOTE_RUNNING);
/* Don't cancel, as this would mess up the state machine. */
}
#endif /* CONFIG_HOTPLUG_CPU */
......@@ -3572,7 +3614,6 @@ int __init sched_tick_offload_init(void)
{
tick_work_cpu = alloc_percpu(struct tick_work);
BUG_ON(!tick_work_cpu);
return 0;
}
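The ->state transitions introduced above can be modeled in userspace with C11 atomics. This is a sketch under stated assumptions rather than the scheduler code: the model_* functions and the fetch_add_unless() helper are hypothetical names, the helper only imitates the semantics of the kernel's atomic_fetch_add_unless() (add unless the value equals a given one, return the old value), and the WARN_ON_ONCE() checks and the workqueue itself are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { REMOTE_OFFLINE = 0, REMOTE_OFFLINING = 1, REMOTE_RUNNING = 2 };

/* The "add -1" step below relies on OFFLINING sitting just above OFFLINE. */
static_assert(REMOTE_OFFLINING - 1 == REMOTE_OFFLINE, "state encoding");

/* Userspace imitation: add a to *v unless *v == u; return the old value. */
static int fetch_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u &&
	       !atomic_compare_exchange_weak(v, &old, old + a))
		;	/* a failed exchange reloads old, so just retry */
	return old;
}

/* Models sched_tick_start(): true means the work item must be queued. */
static bool model_tick_start(atomic_int *state)
{
	int os = atomic_exchange(state, REMOTE_RUNNING);

	return os == REMOTE_OFFLINE;
}

/* Models sched_tick_stop(): mark OFFLINING without cancelling the work. */
static void model_tick_stop(atomic_int *state)
{
	atomic_exchange(state, REMOTE_OFFLINING);
}

/* Models the tail of sched_tick_remote(): true means requeue the work. */
static bool model_tick_remote(atomic_int *state)
{
	int os = fetch_add_unless(state, -1, REMOTE_RUNNING);

	return os == REMOTE_RUNNING;
}
```

In this model, a stop followed by a start before the work item next runs leaves the state in RUNNING with the original work item still pending, so model_tick_start() reports that no second queueing is needed. That matches the diagram's note that sched_tick_start() is happy to leave an already-live work item alone.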
......
......@@ -241,13 +241,14 @@ static void do_idle(void)
check_pgt_cache();
rmb();
local_irq_disable();
if (cpu_is_offline(cpu)) {
tick_nohz_idle_stop_tick_protected();
tick_nohz_idle_stop_tick();
cpuhp_report_idle_dead();
arch_cpu_idle_dead();
}
local_irq_disable();
arch_cpu_idle_enter();
/*
......
......@@ -263,7 +263,6 @@ static void torture_onoff_cleanup(void)
onoff_task = NULL;
#endif /* #ifdef CONFIG_HOTPLUG_CPU */
}
EXPORT_SYMBOL_GPL(torture_onoff_cleanup);
/*
* Print online/offline testing statistics.
......@@ -449,7 +448,6 @@ static void torture_shuffle_cleanup(void)
}
shuffler_task = NULL;
}
EXPORT_SYMBOL_GPL(torture_shuffle_cleanup);
/*
* Variables for auto-shutdown. This allows "lights out" torture runs
......
......@@ -6,22 +6,22 @@
/*
* Traverse the ftrace_global_list, invoking all entries. The reason that we
* can use rcu_dereference_raw_notrace() is that elements removed from this list
* can use rcu_dereference_raw_check() is that elements removed from this list
* are simply leaked, so there is no need to interact with a grace-period
* mechanism. The rcu_dereference_raw_notrace() calls are needed to handle
* mechanism. The rcu_dereference_raw_check() calls are needed to handle
* concurrent insertions into the ftrace_global_list.
*
* Silly Alpha and silly pointer-speculation compiler optimizations!
*/
#define do_for_each_ftrace_op(op, list) \
op = rcu_dereference_raw_notrace(list); \
op = rcu_dereference_raw_check(list); \
do
/*
* Optimized for just a single item in the list (as that is the normal case).
*/
#define while_for_each_ftrace_op(op) \
while (likely(op = rcu_dereference_raw_notrace((op)->next)) && \
while (likely(op = rcu_dereference_raw_check((op)->next)) && \
unlikely((op) != &ftrace_list_end))
extern struct ftrace_ops __rcu *ftrace_ops_list;
......
......@@ -2642,10 +2642,10 @@ static void ftrace_exports(struct ring_buffer_event *event)
preempt_disable_notrace();
export = rcu_dereference_raw_notrace(ftrace_exports_list);
export = rcu_dereference_raw_check(ftrace_exports_list);
while (export) {
trace_process_export(export, event);
export = rcu_dereference_raw_notrace(export->next);
export = rcu_dereference_raw_check(export->next);
}
preempt_enable_notrace();
......
......@@ -124,7 +124,8 @@ struct fib_table *fib_get_table(struct net *net, u32 id)
h = id & (FIB_TABLE_HASHSZ - 1);
head = &net->ipv4.fib_table_hash[h];
hlist_for_each_entry_rcu(tb, head, tb_hlist) {
hlist_for_each_entry_rcu(tb, head, tb_hlist,
lockdep_rtnl_is_held()) {
if (tb->tb_id == id)
return tb;
}
......
......@@ -227,7 +227,7 @@ then
must_continue=yes
fi
last_ts="`tail $resdir/console.log | grep '^\[ *[0-9]\+\.[0-9]\+]' | tail -1 | sed -e 's/^\[ *//' -e 's/\..*$//'`"
if test -z "last_ts"
if test -z "$last_ts"
then
last_ts=0
fi
......
......@@ -3,3 +3,4 @@ rcutree.gp_preinit_delay=12
rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3
rcutree.kthread_prio=2
threadirqs