Commit 4b978934 authored by Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Expedited grace-period changes, most notably avoiding having user
     threads drive expedited grace periods, using a workqueue instead.

   - Miscellaneous fixes, including a performance fix for lists that was
     sent with the lists modifications.

   - CPU hotplug updates, most notably providing exact CPU-online
     tracking for RCU. This will in turn allow removal of the checks
     supporting RCU's prior heuristic that was based on the assumption
     that CPUs would take no longer than one jiffy to come online.

   - Torture-test updates.

   - Documentation updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  list: Expand list_first_entry_or_null()
  torture: TOROUT_STRING(): Insert a space between flag and message
  rcuperf: Consistently insert space between flag and message
  rcutorture: Print out barrier error as document says
  torture: Add task state to writer-task stall printk()s
  torture: Convert torture_shutdown() to hrtimer
  rcutorture: Convert to hotplug state machine
  cpu/hotplug: Get rid of CPU_STARTING reference
  rcu: Provide exact CPU-online tracking for RCU
  rcu: Avoid redundant quiescent-state chasing
  rcu: Don't use modular infrastructure in non-modular code
  sched: Make wake_up_nohz_cpu() handle CPUs going offline
  rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads
  rcu: Use RCU's online-CPU state for expedited IPI retry
  rcu: Exclude RCU-offline CPUs from expedited grace periods
  rcu: Make expedited RCU CPU stall warnings respond to controls
  rcu: Stop disabling expedited RCU CPU stall warnings
  rcu: Drive expedited grace periods from workqueue
  rcu: Consolidate expedited grace period machinery
  documentation: Record reason for rcu_head two-byte alignment
  ...
parents 72a9cdd0 2d8fbcd1
@@ -2493,6 +2493,28 @@ or some future “lazy”
variant of <tt>call_rcu()</tt> that might one day be created for
energy-efficiency purposes.
<p>
That said, there are limits.
RCU requires that the <tt>rcu_head</tt> structure be aligned to a
two-byte boundary, and passing a misaligned <tt>rcu_head</tt>
structure to one of the <tt>call_rcu()</tt> family of functions
will result in a splat.
It is therefore necessary to exercise caution when packing
structures containing fields of type <tt>rcu_head</tt>.
Why not a four-byte or even eight-byte alignment requirement?
Because the m68k architecture provides only two-byte alignment,
and thus acts as alignment's least common denominator.
<p>
The reason for reserving the bottom bit of pointers to
<tt>rcu_head</tt> structures is to leave the door open to
&ldquo;lazy&rdquo; callbacks whose invocations can safely be deferred.
Deferring invocation could potentially have energy-efficiency
benefits, but only if the rate of non-lazy callbacks decreases
significantly for some important workload.
In the meantime, reserving the bottom bit keeps this option open
in case it one day becomes useful.
<h3><a name="Performance, Scalability, Response Time, and Reliability">
Performance, Scalability, Response Time, and Reliability</a></h3>
......
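For illustration only, here is a minimal sketch (hypothetical struct and function names, not taken from this patch) of how an rcu_head is normally embedded and handed to call_rcu(); the two-byte alignment requirement described above simply means the enclosing structure must not pack the rcu_head onto an odd address, which is what keeps the bottom bit of the pointer free for the possible "lazy" marking.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int data;
	struct rcu_head rh;	/* keep naturally aligned; never force an odd offset */
};

/* Invoked after a grace period; safe to free the enclosing structure. */
static void foo_reclaim(struct rcu_head *rhp)
{
	struct foo *fp = container_of(rhp, struct foo, rh);

	kfree(fp);
}

/* Caller has already removed fp from all reader-visible structures. */
static void foo_free_deferred(struct foo *fp)
{
	call_rcu(&fp->rh, foo_reclaim);
}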
@@ -10,21 +10,6 @@ status messages via printk(), which can be examined via the dmesg
command (perhaps grepping for "torture"). The test is started
when the module is loaded, and stops when the module is unloaded.
CONFIG_RCU_TORTURE_TEST_RUNNABLE
It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will
result in the tests being loaded into the base kernel. In this case,
the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify
whether the RCU torture tests are to be started immediately during
boot or whether the /proc/sys/kernel/rcutorture_runnable file is used
to enable them. This /proc file can be used to repeatedly pause and
restart the tests, regardless of the initial state specified by the
CONFIG_RCU_TORTURE_TEST_RUNNABLE config option.
You will normally -not- want to start the RCU torture tests during boot
(and thus the default is CONFIG_RCU_TORTURE_TEST_RUNNABLE=n), but doing
this can sometimes be useful in finding boot-time bugs.
MODULE PARAMETERS
......
@@ -381,8 +381,11 @@ static inline void list_splice_tail_init(struct list_head *list,
*
* Note that if the list is empty, it returns NULL.
*/
#define list_first_entry_or_null(ptr, type, member) \
(!list_empty(ptr) ? list_first_entry(ptr, type, member) : NULL)
#define list_first_entry_or_null(ptr, type, member) ({ \
struct list_head *head__ = (ptr); \
struct list_head *pos__ = READ_ONCE(head__->next); \
pos__ != head__ ? list_entry(pos__, type, member) : NULL; \
})
/**
* list_next_entry - get the next element in list
......
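The expanded list_first_entry_or_null() above reads the list head's ->next pointer exactly once via READ_ONCE(), so the emptiness check and the entry that is returned are derived from the same pointer value rather than from two separate reads. A minimal usage sketch, with hypothetical types and locking that are not part of this patch:

#include <linux/list.h>
#include <linux/spinlock.h>

struct item {
	int payload;
	struct list_head node;
};

/* Return the first element's payload, or -1 if the list is empty. */
static int peek_first_payload(struct list_head *head, spinlock_t *lock)
{
	struct item *first;
	int ret = -1;

	spin_lock(lock);
	first = list_first_entry_or_null(head, struct item, node);
	if (first)
		ret = first->payload;
	spin_unlock(lock);
	return ret;
}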
@@ -334,6 +334,7 @@ void rcu_sched_qs(void);
void rcu_bh_qs(void);
void rcu_check_callbacks(int user);
void rcu_report_dead(unsigned int cpu);
void rcu_cpu_starting(unsigned int cpu);
#ifndef CONFIG_TINY_RCU
void rcu_end_inkernel_boot(void);
......
@@ -43,7 +43,7 @@
#define TORTURE_FLAG "-torture:"
#define TOROUT_STRING(s) \
pr_alert("%s" TORTURE_FLAG s "\n", torture_type)
pr_alert("%s" TORTURE_FLAG " %s\n", torture_type, s)
#define VERBOSE_TOROUT_STRING(s) \
do { if (verbose) pr_alert("%s" TORTURE_FLAG " %s\n", torture_type, s); } while (0)
#define VERBOSE_TOROUT_ERRSTRING(s) \
......
@@ -889,6 +889,7 @@ void notify_cpu_starting(unsigned int cpu)
struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
rcu_cpu_starting(cpu); /* Enables RCU usage on this CPU. */
while (st->state < target) {
struct cpuhp_step *step;
......
@@ -52,7 +52,7 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.vnet.ibm.com>");
#define PERF_FLAG "-perf:"
#define PERFOUT_STRING(s) \
pr_alert("%s" PERF_FLAG s "\n", perf_type)
pr_alert("%s" PERF_FLAG " %s\n", perf_type, s)
#define VERBOSE_PERFOUT_STRING(s) \
do { if (verbose) pr_alert("%s" PERF_FLAG " %s\n", perf_type, s); } while (0)
#define VERBOSE_PERFOUT_ERRSTRING(s) \
@@ -400,9 +400,8 @@ rcu_perf_writer(void *arg)
sp.sched_priority = 0;
sched_setscheduler_nocheck(current,
SCHED_NORMAL, &sp);
pr_alert("%s" PERF_FLAG
"rcu_perf_writer %ld has %d measurements\n",
perf_type, me, MIN_MEAS);
pr_alert("%s%s rcu_perf_writer %ld has %d measurements\n",
perf_type, PERF_FLAG, me, MIN_MEAS);
if (atomic_inc_return(&n_rcu_perf_writer_finished) >=
nrealwriters) {
schedule_timeout_interruptible(10);
......
@@ -1238,6 +1238,7 @@ rcu_torture_stats_print(void)
long pipesummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
long batchsummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
static unsigned long rtcv_snap = ULONG_MAX;
struct task_struct *wtp;
for_each_possible_cpu(cpu) {
for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) {
@@ -1258,8 +1259,9 @@ rcu_torture_stats_print(void)
atomic_read(&n_rcu_torture_alloc),
atomic_read(&n_rcu_torture_alloc_fail),
atomic_read(&n_rcu_torture_free));
pr_cont("rtmbe: %d rtbke: %ld rtbre: %ld ",
pr_cont("rtmbe: %d rtbe: %ld rtbke: %ld rtbre: %ld ",
atomic_read(&n_rcu_torture_mberror),
n_rcu_torture_barrier_error,
n_rcu_torture_boost_ktrerror,
n_rcu_torture_boost_rterror);
pr_cont("rtbf: %ld rtb: %ld nt: %ld ",
@@ -1312,10 +1314,12 @@ rcu_torture_stats_print(void)
rcutorture_get_gp_data(cur_ops->ttype,
&flags, &gpnum, &completed);
pr_alert("??? Writer stall state %s(%d) g%lu c%lu f%#x\n",
wtp = READ_ONCE(writer_task);
pr_alert("??? Writer stall state %s(%d) g%lu c%lu f%#x ->state %#lx\n",
rcu_torture_writer_state_getname(),
rcu_torture_writer_state,
gpnum, completed, flags);
gpnum, completed, flags,
wtp == NULL ? ~0UL : wtp->state);
show_rcu_gp_kthreads();
rcu_ftrace_dump(DUMP_ALL);
}
@@ -1362,12 +1366,12 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag)
onoff_interval, onoff_holdoff);
}
static void rcutorture_booster_cleanup(int cpu)
static int rcutorture_booster_cleanup(unsigned int cpu)
{
struct task_struct *t;
if (boost_tasks[cpu] == NULL)
return;
return 0;
mutex_lock(&boost_mutex);
t = boost_tasks[cpu];
boost_tasks[cpu] = NULL;
@@ -1375,9 +1379,10 @@ static void rcutorture_booster_cleanup(int cpu)
/* This must be outside of the mutex, otherwise deadlock! */
torture_stop_kthread(rcu_torture_boost, t);
return 0;
}
static int rcutorture_booster_init(int cpu)
static int rcutorture_booster_init(unsigned int cpu)
{
int retval;
@@ -1577,28 +1582,7 @@ static void rcu_torture_barrier_cleanup(void)
}
}
static int rcutorture_cpu_notify(struct notifier_block *self,
unsigned long action, void *hcpu)
{
long cpu = (long)hcpu;
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_ONLINE:
case CPU_DOWN_FAILED:
(void)rcutorture_booster_init(cpu);
break;
case CPU_DOWN_PREPARE:
rcutorture_booster_cleanup(cpu);
break;
default:
break;
}
return NOTIFY_OK;
}
static struct notifier_block rcutorture_cpu_nb = {
.notifier_call = rcutorture_cpu_notify,
};
static enum cpuhp_state rcutor_hp;
static void
rcu_torture_cleanup(void)
@@ -1638,11 +1622,8 @@ rcu_torture_cleanup(void)
for (i = 0; i < ncbflooders; i++)
torture_stop_kthread(rcu_torture_cbflood, cbflood_task[i]);
if ((test_boost == 1 && cur_ops->can_boost) ||
test_boost == 2) {
unregister_cpu_notifier(&rcutorture_cpu_nb);
for_each_possible_cpu(i)
rcutorture_booster_cleanup(i);
}
test_boost == 2)
cpuhp_remove_state(rcutor_hp);
/*
* Wait for all RCU callbacks to fire, then do flavor-specific
@@ -1869,14 +1850,13 @@ rcu_torture_init(void)
test_boost == 2) {
boost_starttime = jiffies + test_boost_interval * HZ;
register_cpu_notifier(&rcutorture_cpu_nb);
for_each_possible_cpu(i) {
if (cpu_is_offline(i))
continue; /* Heuristic: CPU can go offline. */
firsterr = rcutorture_booster_init(i);
if (firsterr)
goto unwind;
}
firsterr = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "RCU_TORTURE",
rcutorture_booster_init,
rcutorture_booster_cleanup);
if (firsterr < 0)
goto unwind;
rcutor_hp = firsterr;
}
firsterr = torture_shutdown_init(shutdown_secs, rcu_torture_cleanup);
if (firsterr)
......
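The rcutorture changes above replace the old CPU notifier with the CPU-hotplug state machine. As a rough, generic sketch of that pattern (hypothetical module and callback names, not from this patch): cpuhp_setup_state() runs the startup callback on every CPU that is already online and returns a dynamically allocated state number, which is later handed to cpuhp_remove_state() to run the teardown callback everywhere.

#include <linux/cpuhotplug.h>
#include <linux/module.h>

static enum cpuhp_state my_hp_state;

static int my_cpu_online(unsigned int cpu)
{
	/* Per-CPU setup; a negative return unwinds the bringup. */
	return 0;
}

static int my_cpu_prep_down(unsigned int cpu)
{
	/* Per-CPU teardown, also run when the CPU prepares to go offline. */
	return 0;
}

static int __init my_init(void)
{
	int ret;

	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "subsys/my:online",
				my_cpu_online, my_cpu_prep_down);
	if (ret < 0)
		return ret;
	my_hp_state = ret;	/* remember the dynamic state for removal */
	return 0;
}

static void __exit my_exit(void)
{
	cpuhp_remove_state(my_hp_state);
}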
@@ -41,7 +41,6 @@
#include <linux/export.h>
#include <linux/completion.h>
#include <linux/moduleparam.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/notifier.h>
#include <linux/cpu.h>
@@ -60,7 +59,6 @@
#include "tree.h"
#include "rcu.h"
MODULE_ALIAS("rcutree");
#ifdef MODULE_PARAM_PREFIX
#undef MODULE_PARAM_PREFIX
#endif
@@ -1848,6 +1846,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
struct rcu_data *rdp)
{
bool ret;
bool need_gp;
/* Handle the ends of any preceding grace periods first. */
if (rdp->completed == rnp->completed &&
@@ -1874,9 +1873,10 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
*/
rdp->gpnum = rnp->gpnum;
trace_rcu_grace_period(rsp->name, rdp->gpnum, TPS("cpustart"));
rdp->cpu_no_qs.b.norm = true;
need_gp = !!(rnp->qsmask & rdp->grpmask);
rdp->cpu_no_qs.b.norm = need_gp;
rdp->rcu_qs_ctr_snap = __this_cpu_read(rcu_qs_ctr);
rdp->core_needs_qs = !!(rnp->qsmask & rdp->grpmask);
rdp->core_needs_qs = need_gp;
zero_cpu_stall_ticks(rdp);
WRITE_ONCE(rdp->gpwrap, false);
}
@@ -2344,7 +2344,7 @@ static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(rsp), flags);
swake_up(&rsp->gp_wq); /* Memory barrier implied by swake_up() path. */
rcu_gp_kthread_wake(rsp);
}
/*
@@ -2970,7 +2970,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
}
WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
raw_spin_unlock_irqrestore_rcu_node(rnp_old, flags);
swake_up(&rsp->gp_wq); /* Memory barrier implied by swake_up() path. */
rcu_gp_kthread_wake(rsp);
}
/*
@@ -3792,8 +3792,6 @@ rcu_init_percpu_data(int cpu, struct rcu_state *rsp)
rnp = rdp->mynode;
mask = rdp->grpmask;
raw_spin_lock_rcu_node(rnp); /* irqs already disabled. */
rnp->qsmaskinitnext |= mask;
rnp->expmaskinitnext |= mask;
if (!rdp->beenonline)
WRITE_ONCE(rsp->ncpus, READ_ONCE(rsp->ncpus) + 1);
rdp->beenonline = true; /* We have now been online. */
@@ -3860,6 +3858,32 @@ int rcutree_dead_cpu(unsigned int cpu)
return 0;
}
/*
* Mark the specified CPU as being online so that subsequent grace periods
* (both expedited and normal) will wait on it. Note that this means that
* incoming CPUs are not allowed to use RCU read-side critical sections
* until this function is called. Failing to observe this restriction
* will result in lockdep splats.
*/
void rcu_cpu_starting(unsigned int cpu)
{
unsigned long flags;
unsigned long mask;
struct rcu_data *rdp;
struct rcu_node *rnp;
struct rcu_state *rsp;
for_each_rcu_flavor(rsp) {
rdp = this_cpu_ptr(rsp->rda);
rnp = rdp->mynode;
mask = rdp->grpmask;
raw_spin_lock_irqsave_rcu_node(rnp, flags);
rnp->qsmaskinitnext |= mask;
rnp->expmaskinitnext |= mask;
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
}
#ifdef CONFIG_HOTPLUG_CPU
/*
* The CPU is exiting the idle loop into the arch_cpu_idle_dead()
@@ -4209,8 +4233,10 @@ void __init rcu_init(void)
* or the scheduler are operational.
*/
pm_notifier(rcu_pm_notify, 0);
for_each_online_cpu(cpu)
for_each_online_cpu(cpu) {
rcutree_prepare_cpu(cpu);
rcu_cpu_starting(cpu);
}
}
#include "tree_exp.h"
......
@@ -400,6 +400,7 @@ struct rcu_data {
#ifdef CONFIG_RCU_FAST_NO_HZ
struct rcu_head oom_head;
#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
atomic_long_t exp_workdone0; /* # done by workqueue. */
atomic_long_t exp_workdone1; /* # done by others #1. */
atomic_long_t exp_workdone2; /* # done by others #2. */
atomic_long_t exp_workdone3; /* # done by others #3. */
......
@@ -359,7 +359,8 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
if (raw_smp_processor_id() == cpu ||
!(atomic_add_return(0, &rdtp->dynticks) & 0x1))
!(atomic_add_return(0, &rdtp->dynticks) & 0x1) ||
!(rnp->qsmaskinitnext & rdp->grpmask))
mask_ofl_test |= rdp->grpmask;
}
mask_ofl_ipi = rnp->expmask & ~mask_ofl_test;
@@ -384,17 +385,16 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
mask_ofl_ipi &= ~mask;
continue;
}
/* Failed, raced with offline. */
/* Failed, raced with CPU hotplug operation. */
raw_spin_lock_irqsave_rcu_node(rnp, flags);
if (cpu_online(cpu) &&
if ((rnp->qsmaskinitnext & mask) &&
(rnp->expmask & mask)) {
/* Online, so delay for a bit and try again. */
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
schedule_timeout_uninterruptible(1);
if (cpu_online(cpu) &&
(rnp->expmask & mask))
goto retry_ipi;
raw_spin_lock_irqsave_rcu_node(rnp, flags);
goto retry_ipi;
}
/* CPU really is offline, so we can ignore it. */
if (!(rnp->expmask & mask))
mask_ofl_ipi &= ~mask;
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
@@ -427,12 +427,10 @@ static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
jiffies_stall);
if (ret > 0 || sync_rcu_preempt_exp_done(rnp_root))
return;
if (ret < 0) {
/* Hit a signal, disable CPU stall warnings. */
swait_event(rsp->expedited_wq,
sync_rcu_preempt_exp_done(rnp_root));
return;
}
WARN_ON(ret < 0); /* workqueues should not be signaled. */
if (rcu_cpu_stall_suppress)
continue;
panic_on_rcu_stall();
pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
rsp->name); rsp->name);
ndetected = 0; ndetected = 0;
...@@ -500,7 +498,6 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s) ...@@ -500,7 +498,6 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
* next GP, to proceed. * next GP, to proceed.
*/ */
mutex_lock(&rsp->exp_wake_mutex); mutex_lock(&rsp->exp_wake_mutex);
mutex_unlock(&rsp->exp_mutex);
rcu_for_each_node_breadth_first(rsp, rnp) {
if (ULONG_CMP_LT(READ_ONCE(rnp->exp_seq_rq), s)) {
@@ -516,6 +513,70 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
mutex_unlock(&rsp->exp_wake_mutex);
}
/* Let the workqueue handler know what it is supposed to do. */
struct rcu_exp_work {
smp_call_func_t rew_func;
struct rcu_state *rew_rsp;
unsigned long rew_s;
struct work_struct rew_work;
};
/*
* Work-queue handler to drive an expedited grace period forward.
*/
static void wait_rcu_exp_gp(struct work_struct *wp)
{
struct rcu_exp_work *rewp;
/* Initialize the rcu_node tree in preparation for the wait. */
rewp = container_of(wp, struct rcu_exp_work, rew_work);
sync_rcu_exp_select_cpus(rewp->rew_rsp, rewp->rew_func);
/* Wait and clean up, including waking everyone. */
rcu_exp_wait_wake(rewp->rew_rsp, rewp->rew_s);
}
/*
* Given an rcu_state pointer and a smp_call_function() handler, kick
* off the specified flavor of expedited grace period.
*/
static void _synchronize_rcu_expedited(struct rcu_state *rsp,
smp_call_func_t func)
{
struct rcu_data *rdp;
struct rcu_exp_work rew;
struct rcu_node *rnp;
unsigned long s;
/* If expedited grace periods are prohibited, fall back to normal. */
if (rcu_gp_is_normal()) {
wait_rcu_gp(rsp->call);
return;
}
/* Take a snapshot of the sequence number. */
s = rcu_exp_gp_seq_snap(rsp);
if (exp_funnel_lock(rsp, s))
return; /* Someone else did our work for us. */
/* Marshall arguments and schedule the expedited grace period. */
rew.rew_func = func;
rew.rew_rsp = rsp;
rew.rew_s = s;
INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
schedule_work(&rew.rew_work);
/* Wait for expedited grace period to complete. */
rdp = per_cpu_ptr(rsp->rda, raw_smp_processor_id());
rnp = rcu_get_root(rsp);
wait_event(rnp->exp_wq[(s >> 1) & 0x3],
sync_exp_work_done(rsp,
&rdp->exp_workdone0, s));
/* Let the next expedited grace period start. */
mutex_unlock(&rsp->exp_mutex);
}
/**
* synchronize_sched_expedited - Brute-force RCU-sched grace period
*
@@ -534,29 +595,13 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
*/
void synchronize_sched_expedited(void)
{
unsigned long s;
struct rcu_state *rsp = &rcu_sched_state;
/* If only one CPU, this is automatically a grace period. */
if (rcu_blocking_is_gp())
return;
/* If expedited grace periods are prohibited, fall back to normal. */
if (rcu_gp_is_normal()) {
wait_rcu_gp(call_rcu_sched);
return;
}
/* Take a snapshot of the sequence number. */
s = rcu_exp_gp_seq_snap(rsp);
if (exp_funnel_lock(rsp, s))
return; /* Someone else did our work for us. */
/* Initialize the rcu_node tree in preparation for the wait. */
sync_rcu_exp_select_cpus(rsp, sync_sched_exp_handler);
/* Wait and clean up, including waking everyone. */
rcu_exp_wait_wake(rsp, s);
_synchronize_rcu_expedited(rsp, sync_sched_exp_handler);
}
EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
@@ -620,23 +665,8 @@ static void sync_rcu_exp_handler(void *info)
void synchronize_rcu_expedited(void)
{
struct rcu_state *rsp = rcu_state_p;
unsigned long s;
/* If expedited grace periods are prohibited, fall back to normal. */
if (rcu_gp_is_normal()) {
wait_rcu_gp(call_rcu);
return;
}
s = rcu_exp_gp_seq_snap(rsp);
if (exp_funnel_lock(rsp, s))
return; /* Someone else did our work for us. */
/* Initialize the rcu_node tree in preparation for the wait. */
sync_rcu_exp_select_cpus(rsp, sync_rcu_exp_handler);
/* Wait for ->blkd_tasks lists to drain, then wake everyone up. */
rcu_exp_wait_wake(rsp, s);
_synchronize_rcu_expedited(rsp, sync_rcu_exp_handler);
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
......
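In _synchronize_rcu_expedited() above, the task requesting the expedited grace period no longer drives it directly; it marshals its arguments into a work item on its own stack, schedules that item, and then sleeps until the grace period completes. A generic sketch of that on-stack workqueue pattern, with hypothetical names and a completion in place of RCU's sequence-number wait:

#include <linux/workqueue.h>
#include <linux/completion.h>

struct my_request {
	int arg;
	struct completion done;
	struct work_struct work;
};

/* Workqueue handler: does the heavy lifting on behalf of the requester. */
static void my_worker(struct work_struct *wp)
{
	struct my_request *req = container_of(wp, struct my_request, work);

	/* ... expensive processing using req->arg ... */
	complete(&req->done);
}

/* Requester: hand the work off, then wait for it to finish. */
static void my_synchronous_request(int arg)
{
	struct my_request req = { .arg = arg };

	init_completion(&req.done);
	INIT_WORK_ONSTACK(&req.work, my_worker);
	schedule_work(&req.work);
	wait_for_completion(&req.done);
	destroy_work_on_stack(&req.work);
}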
@@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
cl++;
c++;
local_bh_enable();
cond_resched_rcu_qs();
list = next;
}
trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
......
@@ -185,16 +185,17 @@ static int show_rcuexp(struct seq_file *m, void *v)
int cpu;
struct rcu_state *rsp = (struct rcu_state *)m->private;
struct rcu_data *rdp;
unsigned long s1 = 0, s2 = 0, s3 = 0;
unsigned long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
for_each_possible_cpu(cpu) {
rdp = per_cpu_ptr(rsp->rda, cpu);
s0 += atomic_long_read(&rdp->exp_workdone0);
s1 += atomic_long_read(&rdp->exp_workdone1);
s2 += atomic_long_read(&rdp->exp_workdone2);
s3 += atomic_long_read(&rdp->exp_workdone3);
}
seq_printf(m, "s=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
rsp->expedited_sequence, s1, s2, s3,
seq_printf(m, "s=%lu wd0=%lu wd1=%lu wd2=%lu wd3=%lu n=%lu enq=%d sc=%lu\n",
rsp->expedited_sequence, s0, s1, s2, s3,
atomic_long_read(&rsp->expedited_normal),
atomic_read(&rsp->expedited_need_qs),
rsp->expedited_sequence / 2);
......
@@ -46,7 +46,7 @@
#include <linux/export.h>
#include <linux/hardirq.h>
#include <linux/delay.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/kthread.h>
#include <linux/tick.h>
@@ -54,7 +54,6 @@
#include "rcu.h"
MODULE_ALIAS("rcupdate");
#ifdef MODULE_PARAM_PREFIX
#undef MODULE_PARAM_PREFIX
#endif
......
@@ -581,6 +581,8 @@ static bool wake_up_full_nohz_cpu(int cpu)
* If needed we can still optimize that later with an
* empty IRQ.
*/
if (cpu_is_offline(cpu))
return true; /* Don't try to wake offline CPUs. */
if (tick_nohz_full_cpu(cpu)) {
if (cpu != smp_processor_id() ||
tick_nohz_tick_stopped())
@@ -591,6 +593,11 @@ static bool wake_up_full_nohz_cpu(int cpu)
return false;
}
/*
* Wake up the specified CPU. If the CPU is going offline, it is the
* caller's responsibility to deal with the lost wakeup, for example,
* by hooking into the CPU_DEAD notifier like timers and hrtimers do.
*/
void wake_up_nohz_cpu(int cpu)
{
if (!wake_up_full_nohz_cpu(cpu))
......
@@ -43,6 +43,7 @@
#include <linux/stat.h>
#include <linux/slab.h>
#include <linux/trace_clock.h>
#include <linux/ktime.h>
#include <asm/byteorder.h>
#include <linux/torture.h>
@@ -446,9 +447,8 @@ EXPORT_SYMBOL_GPL(torture_shuffle_cleanup);
* Variables for auto-shutdown. This allows "lights out" torture runs
* to be fully scripted.
*/
static int shutdown_secs; /* desired test duration in seconds. */
static struct task_struct *shutdown_task;
static unsigned long shutdown_time; /* jiffies to system shutdown. */
static ktime_t shutdown_time; /* time to system shutdown. */
static void (*torture_shutdown_hook)(void);
/*
@@ -471,20 +471,20 @@ EXPORT_SYMBOL_GPL(torture_shutdown_absorb);
*/
static int torture_shutdown(void *arg)
{
long delta;
unsigned long jiffies_snap;
ktime_t ktime_snap;
VERBOSE_TOROUT_STRING("torture_shutdown task started");
jiffies_snap = jiffies;
while (ULONG_CMP_LT(jiffies_snap, shutdown_time) &&
ktime_snap = ktime_get();
while (ktime_before(ktime_snap, shutdown_time) &&
!torture_must_stop()) {
delta = shutdown_time - jiffies_snap;
if (verbose)
pr_alert("%s" TORTURE_FLAG
"torture_shutdown task: %lu jiffies remaining\n",
torture_type, delta);
schedule_timeout_interruptible(delta);
jiffies_snap = jiffies;
"torture_shutdown task: %llu ms remaining\n",
torture_type,
ktime_ms_delta(shutdown_time, ktime_snap));
set_current_state(TASK_INTERRUPTIBLE);
schedule_hrtimeout(&shutdown_time, HRTIMER_MODE_ABS);
ktime_snap = ktime_get();
}
if (torture_must_stop()) {
torture_kthread_stopping("torture_shutdown");
@@ -511,10 +511,9 @@ int torture_shutdown_init(int ssecs, void (*cleanup)(void))
{
int ret = 0;
shutdown_secs = ssecs;
torture_shutdown_hook = cleanup;
if (shutdown_secs > 0) {
shutdown_time = jiffies + shutdown_secs * HZ;
if (ssecs > 0) {
shutdown_time = ktime_add(ktime_get(), ktime_set(ssecs, 0));
ret = torture_create_kthread(torture_shutdown, NULL,
shutdown_task);
}
......
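The torture_shutdown() conversion above swaps jiffies arithmetic for an absolute ktime_t deadline and schedule_hrtimeout(). A minimal sketch of that sleep-until-deadline idiom (hypothetical helper, not from this patch):

#include <linux/ktime.h>
#include <linux/hrtimer.h>
#include <linux/sched.h>

/* Sleep until the absolute CLOCK_MONOTONIC deadline, tolerating early wakeups. */
static void sleep_until(ktime_t deadline)
{
	while (ktime_before(ktime_get(), deadline)) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule_hrtimeout(&deadline, HRTIMER_MODE_ABS);
	}
}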