Commit 2227e5b2 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The RCU updates for this cycle were:

   - RCU-tasks update, including addition of RCU Tasks Trace for BPF use
     and TASKS_RUDE_RCU

   - kfree_rcu() updates.

   - Remove scheduler locking restriction

   - RCU CPU stall warning updates.

   - Torture-test updates.

   - Miscellaneous fixes and other updates"

* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
  rcu: Allow for smp_call_function() running callbacks from idle
  rcu: Provide rcu_irq_exit_check_preempt()
  rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
  rcu: Provide __rcu_is_watching()
  rcu: Provide rcu_irq_exit_preempt()
  rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
  rcu/tree: Mark the idle relevant functions noinstr
  x86: Replace ist_enter() with nmi_enter()
  x86/mce: Send #MC singal from task work
  x86/entry: Get rid of ist_begin/end_non_atomic()
  sched,rcu,tracing: Avoid tracing before in_nmi() is correct
  sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
  lockdep: Always inline lockdep_{off,on}()
  hardirq/nmi: Allow nested nmi_enter()
  arm64: Prepare arch_nmi_enter() for recursion
  printk: Disallow instrumenting print_nmi_enter()
  printk: Prepare for nested printk_nmi_enter()
  rcutorture: Convert ULONG_CMP_LT() to time_before()
  torture: Add a --kasan argument
  torture: Save a few lines by using config_override_param initially
  ...
parents 0bd957eb cb3cb673
......@@ -1943,56 +1943,27 @@ invoked from a CPU-hotplug notifier.
Scheduler and RCU
~~~~~~~~~~~~~~~~~
RCU depends on the scheduler, and the scheduler uses RCU to protect some
of its data structures. The preemptible-RCU ``rcu_read_unlock()``
implementation must therefore be written carefully to avoid deadlocks
involving the scheduler's runqueue and priority-inheritance locks. In
particular, ``rcu_read_unlock()`` must tolerate an interrupt where the
interrupt handler invokes both ``rcu_read_lock()`` and
``rcu_read_unlock()``. This possibility requires ``rcu_read_unlock()``
to use negative nesting levels to avoid destructive recursion via
interrupt handler's use of RCU.
This scheduler-RCU requirement came as a `complete
surprise <https://lwn.net/Articles/453002/>`__.
As noted above, RCU makes use of kthreads, and it is necessary to avoid
excessive CPU-time accumulation by these kthreads. This requirement was
no surprise, but RCU's violation of it when running context-switch-heavy
workloads when built with ``CONFIG_NO_HZ_FULL=y`` `did come as a
surprise
RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
accumulation by these kthreads. This requirement was no surprise, but
RCU's violation of it when running context-switch-heavy workloads when
built with ``CONFIG_NO_HZ_FULL=y`` `did come as a surprise
[PDF] <http://www.rdrop.com/users/paulmck/scalability/paper/BareMetal.2015.01.15b.pdf>`__.
RCU has made good progress towards meeting this requirement, even for
context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is
room for further improvement.
It is forbidden to hold any of scheduler's runqueue or
priority-inheritance spinlocks across an ``rcu_read_unlock()`` unless
interrupts have been disabled across the entire RCU read-side critical
section, that is, up to and including the matching ``rcu_read_lock()``.
Violating this restriction can result in deadlocks involving these
scheduler spinlocks. There was hope that this restriction might be
lifted when interrupt-disabled calls to ``rcu_read_unlock()`` started
deferring the reporting of the resulting RCU-preempt quiescent state
until the end of the corresponding interrupts-disabled region.
Unfortunately, timely reporting of the corresponding quiescent state to
expedited grace periods requires a call to ``raise_softirq()``, which
can acquire these scheduler spinlocks. In addition, real-time systems
using RCU priority boosting need this restriction to remain in effect
because deferred quiescent-state reporting would also defer deboosting,
which in turn would degrade real-time latencies.
In theory, if a given RCU read-side critical section could be guaranteed
to be less than one second in duration, holding a scheduler spinlock
across that critical section's ``rcu_read_unlock()`` would require only
that preemption be disabled across the entire RCU read-side critical
section, not interrupts. Unfortunately, given the possibility of vCPU
preemption, long-running interrupts, and so on, it is not possible in
practice to guarantee that a given RCU read-side critical section will
complete in less than one second. Therefore, as noted above, if
scheduler spinlocks are held across a given call to
``rcu_read_unlock()``, interrupts must be disabled across the entire RCU
read-side critical section.
There is no longer any prohibition against holding any of
scheduler's runqueue or priority-inheritance spinlocks across an
``rcu_read_unlock()``, even if interrupts and preemption were enabled
somewhere within the corresponding RCU read-side critical section.
Therefore, it is now perfectly legal to execute ``rcu_read_lock()``
with preemption enabled, acquire one of the scheduler locks, and hold
that lock across the matching ``rcu_read_unlock()``.
Similarly, the RCU flavor consolidation has removed the need for negative
nesting. The fact that interrupt-disabled regions of code act as RCU
read-side critical sections implicitly avoids earlier issues that used
to result in destructive recursion via interrupt handler's use of RCU.
Tracing and RCU
~~~~~~~~~~~~~~~
......
......@@ -4210,12 +4210,24 @@
Duration of CPU stall (s) to test RCU CPU stall
warnings, zero to disable.
rcutorture.stall_cpu_block= [KNL]
Sleep while stalling if set. This will result
in warnings from preemptible RCU in addition
to any other stall-related activity.
rcutorture.stall_cpu_holdoff= [KNL]
Time to wait (s) after boot before inducing stall.
rcutorture.stall_cpu_irqsoff= [KNL]
Disable interrupts while stalling if set.
rcutorture.stall_gp_kthread= [KNL]
Duration (s) of forced sleep within RCU
grace-period kthread to test RCU CPU stall
warnings, zero to disable. If both stall_cpu
and stall_gp_kthread are specified, the
kthread is starved first, then the CPU.
rcutorture.stat_interval= [KNL]
Time (s) between statistics printk()s.
......@@ -4286,6 +4298,13 @@
only normal grace-period primitives. No effect
on CONFIG_TINY_RCU kernels.
rcupdate.rcu_task_ipi_delay= [KNL]
Set time in jiffies during which RCU tasks will
avoid sending IPIs, starting with the beginning
of a given grace period. Setting a large
number avoids disturbing real-time workloads,
but lengthens grace periods.
rcupdate.rcu_task_stall_timeout= [KNL]
Set timeout in jiffies for RCU task stall warning
messages. Disable with a value less than or equal
......
......@@ -229,14 +229,6 @@ Adding support for it is easy: just define the macro in asm/ftrace.h and
pass the return address pointer as the 'retp' argument to
ftrace_push_return_trace().
HAVE_FTRACE_NMI_ENTER
---------------------
If you can't trace NMI functions, then skip this option.
<details to be filled>
HAVE_SYSCALL_TRACEPOINTS
------------------------
......
......@@ -32,30 +32,70 @@ u64 smp_irq_stat_cpu(unsigned int cpu);
struct nmi_ctx {
u64 hcr;
unsigned int cnt;
};
DECLARE_PER_CPU(struct nmi_ctx, nmi_contexts);
#define arch_nmi_enter() \
do { \
if (is_kernel_in_hyp_mode()) { \
struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
nmi_ctx->hcr = read_sysreg(hcr_el2); \
if (!(nmi_ctx->hcr & HCR_TGE)) { \
write_sysreg(nmi_ctx->hcr | HCR_TGE, hcr_el2); \
isb(); \
} \
} \
} while (0)
#define arch_nmi_enter() \
do { \
struct nmi_ctx *___ctx; \
u64 ___hcr; \
\
if (!is_kernel_in_hyp_mode()) \
break; \
\
___ctx = this_cpu_ptr(&nmi_contexts); \
if (___ctx->cnt) { \
___ctx->cnt++; \
break; \
} \
\
___hcr = read_sysreg(hcr_el2); \
if (!(___hcr & HCR_TGE)) { \
write_sysreg(___hcr | HCR_TGE, hcr_el2); \
isb(); \
} \
/* \
* Make sure the sysreg write is performed before ___ctx->cnt \
* is set to 1. NMIs that see cnt == 1 will rely on us. \
*/ \
barrier(); \
___ctx->cnt = 1; \
/* \
* Make sure ___ctx->cnt is set before we save ___hcr. We \
* don't want ___ctx->hcr to be overwritten. \
*/ \
barrier(); \
___ctx->hcr = ___hcr; \
} while (0)
#define arch_nmi_exit() \
do { \
if (is_kernel_in_hyp_mode()) { \
struct nmi_ctx *nmi_ctx = this_cpu_ptr(&nmi_contexts); \
if (!(nmi_ctx->hcr & HCR_TGE)) \
write_sysreg(nmi_ctx->hcr, hcr_el2); \
} \
} while (0)
#define arch_nmi_exit() \
do { \
struct nmi_ctx *___ctx; \
u64 ___hcr; \
\
if (!is_kernel_in_hyp_mode()) \
break; \
\
___ctx = this_cpu_ptr(&nmi_contexts); \
___hcr = ___ctx->hcr; \
/* \
* Make sure we read ___ctx->hcr before we release \
* ___ctx->cnt as it makes ___ctx->hcr updatable again. \
*/ \
barrier(); \
___ctx->cnt--; \
/* \
* Make sure ___ctx->cnt release is visible before we \
* restore the sysreg. Otherwise a new NMI occurring \
* right after write_sysreg() can be fooled and think \
* we secured things for it. \
*/ \
barrier(); \
if (!___ctx->cnt && !(___hcr & HCR_TGE)) \
write_sysreg(___hcr, hcr_el2); \
} while (0)
static inline void ack_bad_irq(unsigned int irq)
{
......
......@@ -251,22 +251,12 @@ asmlinkage __kprobes notrace unsigned long
__sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
{
unsigned long ret;
bool do_nmi_exit = false;
/*
* nmi_enter() deals with printk() re-entrance and use of RCU when
* RCU believed this CPU was idle. Because critical events can
* interrupt normal events, we may already be in_nmi().
*/
if (!in_nmi()) {
nmi_enter();
do_nmi_exit = true;
}
nmi_enter();
ret = _sdei_handler(regs, arg);
if (do_nmi_exit)
nmi_exit();
nmi_exit();
return ret;
}
......@@ -906,17 +906,13 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, unsigned int esr)
asmlinkage void do_serror(struct pt_regs *regs, unsigned int esr)
{
const bool was_in_nmi = in_nmi();
if (!was_in_nmi)
nmi_enter();
nmi_enter();
/* non-RAS errors are not containable */
if (!arm64_is_ras_serror(esr) || arm64_is_fatal_ras_serror(regs, esr))
arm64_serror_panic(regs, esr);
if (!was_in_nmi)
nmi_exit();
nmi_exit();
}
asmlinkage void enter_from_user_mode(void)
......
......@@ -441,15 +441,9 @@ void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
void system_reset_exception(struct pt_regs *regs)
{
unsigned long hsrr0, hsrr1;
bool nested = in_nmi();
bool saved_hsrrs = false;
/*
* Avoid crashes in case of nested NMI exceptions. Recoverability
* is determined by RI and in_nmi
*/
if (!nested)
nmi_enter();
nmi_enter();
/*
* System reset can interrupt code where HSRRs are live and MSR[RI]=1.
......@@ -521,8 +515,7 @@ void system_reset_exception(struct pt_regs *regs)
mtspr(SPRN_HSRR1, hsrr1);
}
if (!nested)
nmi_exit();
nmi_exit();
/* What should we do here? We could issue a shutdown or hard reset. */
}
......@@ -823,9 +816,8 @@ int machine_check_generic(struct pt_regs *regs)
void machine_check_exception(struct pt_regs *regs)
{
int recover = 0;
bool nested = in_nmi();
if (!nested)
nmi_enter();
nmi_enter();
__this_cpu_inc(irq_stat.mce_exceptions);
......@@ -851,8 +843,7 @@ void machine_check_exception(struct pt_regs *regs)
if (check_io_access(regs))
goto bail;
if (!nested)
nmi_exit();
nmi_exit();
die("Machine check", regs, SIGBUS);
......@@ -863,8 +854,7 @@ void machine_check_exception(struct pt_regs *regs)
return;
bail:
if (!nested)
nmi_exit();
nmi_exit();
}
void SMIException(struct pt_regs *regs)
......
......@@ -71,7 +71,6 @@ config SUPERH32
select HAVE_FUNCTION_TRACER
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_DYNAMIC_FTRACE
select HAVE_FTRACE_NMI_ENTER if DYNAMIC_FTRACE
select ARCH_WANT_IPC_PARSE_VERSION
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_ARCH_KGDB
......
......@@ -170,11 +170,21 @@ BUILD_TRAP_HANDLER(bug)
force_sig(SIGTRAP);
}
#ifdef CONFIG_DYNAMIC_FTRACE
extern void arch_ftrace_nmi_enter(void);
extern void arch_ftrace_nmi_exit(void);
#else
static inline void arch_ftrace_nmi_enter(void) { }
static inline void arch_ftrace_nmi_exit(void) { }
#endif
BUILD_TRAP_HANDLER(nmi)
{
unsigned int cpu = smp_processor_id();
TRAP_HANDLER_DECL;
arch_ftrace_nmi_enter();
nmi_enter();
nmi_count(cpu)++;
......@@ -190,4 +200,6 @@ BUILD_TRAP_HANDLER(nmi)
}
nmi_exit();
arch_ftrace_nmi_exit();
}
......@@ -118,11 +118,6 @@ void smp_spurious_interrupt(struct pt_regs *regs);
void smp_error_interrupt(struct pt_regs *regs);
asmlinkage void smp_irq_move_cleanup_interrupt(void);
extern void ist_enter(struct pt_regs *regs);
extern void ist_exit(struct pt_regs *regs);
extern void ist_begin_non_atomic(struct pt_regs *regs);
extern void ist_end_non_atomic(void);
#ifdef CONFIG_VMAP_STACK
void __noreturn handle_stack_overflow(const char *message,
struct pt_regs *regs,
......
......@@ -42,6 +42,8 @@
#include <linux/export.h>
#include <linux/jump_label.h>
#include <linux/set_memory.h>
#include <linux/task_work.h>
#include <linux/hardirq.h>
#include <asm/intel-family.h>
#include <asm/processor.h>
......@@ -1086,23 +1088,6 @@ static void mce_clear_state(unsigned long *toclear)
}
}
static int do_memory_failure(struct mce *m)
{
int flags = MF_ACTION_REQUIRED;
int ret;
pr_err("Uncorrected hardware memory error in user-access at %llx", m->addr);
if (!(m->mcgstatus & MCG_STATUS_RIPV))
flags |= MF_MUST_KILL;
ret = memory_failure(m->addr >> PAGE_SHIFT, flags);
if (ret)
pr_err("Memory error not recovered");
else
set_mce_nospec(m->addr >> PAGE_SHIFT);
return ret;
}
/*
* Cases where we avoid rendezvous handler timeout:
* 1) If this CPU is offline.
......@@ -1204,6 +1189,29 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
*m = *final;
}
static void kill_me_now(struct callback_head *ch)
{
force_sig(SIGBUS);
}
static void kill_me_maybe(struct callback_head *cb)
{
struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
int flags = MF_ACTION_REQUIRED;
pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
if (!(p->mce_status & MCG_STATUS_RIPV))
flags |= MF_MUST_KILL;
if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags)) {
set_mce_nospec(p->mce_addr >> PAGE_SHIFT);
return;
}
pr_err("Memory error not recovered");
kill_me_now(cb);
}
/*
* The actual machine check handler. This only handles real
* exceptions when something got corrupted coming in through int 18.
......@@ -1222,7 +1230,7 @@ static void __mc_scan_banks(struct mce *m, struct mce *final,
* backing the user stack, tracing that reads the user stack will cause
* potentially infinite recursion.
*/
void notrace do_machine_check(struct pt_regs *regs, long error_code)
void noinstr do_machine_check(struct pt_regs *regs, long error_code)
{
DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
......@@ -1259,7 +1267,7 @@ void notrace do_machine_check(struct pt_regs *regs, long error_code)
if (__mc_check_crashing_cpu(cpu))
return;
ist_enter(regs);
nmi_enter();
this_cpu_inc(mce_exception_count);
......@@ -1352,23 +1360,24 @@ void notrace do_machine_check(struct pt_regs *regs, long error_code)
/* Fault was in user mode and we need to take some action */
if ((m.cs & 3) == 3) {
ist_begin_non_atomic(regs);
local_irq_enable();
if (kill_it || do_memory_failure(&m))
force_sig(SIGBUS);
local_irq_disable();
ist_end_non_atomic();
/* If this triggers there is no way to recover. Die hard. */
BUG_ON(!on_thread_stack() || !user_mode(regs));
current->mce_addr = m.addr;
current->mce_status = m.mcgstatus;
current->mce_kill_me.func = kill_me_maybe;
if (kill_it)
current->mce_kill_me.func = kill_me_now;
task_work_add(current, &current->mce_kill_me, true);
} else {
if (!fixup_exception(regs, X86_TRAP_MC, error_code, 0))
mce_panic("Failed kernel mode recovery", &m, msg);
}
out_ist:
ist_exit(regs);
nmi_exit();
}
EXPORT_SYMBOL_GPL(do_machine_check);
NOKPROBE_SYMBOL(do_machine_check);
#ifndef CONFIG_MEMORY_FAILURE
int memory_failure(unsigned long pfn, int flags)
......
......@@ -7,6 +7,7 @@
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/smp.h>
#include <linux/hardirq.h>
#include <asm/processor.h>
#include <asm/traps.h>
......@@ -24,7 +25,7 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
{
u32 loaddr, hi, lotype;
ist_enter(regs);
nmi_enter();
rdmsr(MSR_IA32_P5_MC_ADDR, loaddr, hi);
rdmsr(MSR_IA32_P5_MC_TYPE, lotype, hi);
......@@ -39,7 +40,7 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
ist_exit(regs);
nmi_exit();
}
/* Set up machine check reporting for processors with Intel style MCE: */
......
......@@ -6,6 +6,7 @@
#include <linux/interrupt.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/hardirq.h>
#include <asm/processor.h>
#include <asm/traps.h>
......@@ -18,12 +19,12 @@
/* Machine check handler for WinChip C6: */
static void winchip_machine_check(struct pt_regs *regs, long error_code)
{
ist_enter(regs);
nmi_enter();
pr_emerg("CPU0: Machine Check Exception.\n");
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
ist_exit(regs);
nmi_exit();
}
/* Set up machine check reporting on the Winchip C6 series */
......
......@@ -37,10 +37,12 @@
#include <linux/mm.h>
#include <linux/smp.h>
#include <linux/io.h>
#include <linux/hardirq.h>
#include <linux/atomic.h>
#include <asm/stacktrace.h>
#include <asm/processor.h>
#include <asm/debugreg.h>
#include <linux/atomic.h>
#include <asm/text-patching.h>
#include <asm/ftrace.h>
#include <asm/traps.h>
......@@ -82,78 +84,6 @@ static inline void cond_local_irq_disable(struct pt_regs *regs)
local_irq_disable();
}
/*
* In IST context, we explicitly disable preemption. This serves two
* purposes: it makes it much less likely that we would accidentally
* schedule in IST context and it will force a warning if we somehow
* manage to schedule by accident.
*/
void ist_enter(struct pt_regs *regs)
{
if (user_mode(regs)) {
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
} else {
/*
* We might have interrupted pretty much anything. In
* fact, if we're a machine check, we can even interrupt
* NMI processing. We don't want in_nmi() to return true,
* but we need to notify RCU.
*/
rcu_nmi_enter();
}
preempt_disable();
/* This code is a bit fragile. Test it. */
RCU_LOCKDEP_WARN(!rcu_is_watching(), "ist_enter didn't work");
}
NOKPROBE_SYMBOL(ist_enter);
void ist_exit(struct pt_regs *regs)
{
preempt_enable_no_resched();
if (!user_mode(regs))
rcu_nmi_exit();
}
/**
* ist_begin_non_atomic() - begin a non-atomic section in an IST exception
* @regs: regs passed to the IST exception handler
*
* IST exception handlers normally cannot schedule. As a special
* exception, if the exception interrupted userspace code (i.e.
* user_mode(regs) would return true) and the exception was not
* a double fault, it can be safe to schedule. ist_begin_non_atomic()
* begins a non-atomic section within an ist_enter()/ist_exit() region.
* Callers are responsible for enabling interrupts themselves inside
* the non-atomic section, and callers must call ist_end_non_atomic()
* before ist_exit().
*/
void ist_begin_non_atomic(struct pt_regs *regs)
{
BUG_ON(!user_mode(regs));
/*
* Sanity check: we need to be on the normal thread stack. This
* will catch asm bugs and any attempt to use ist_preempt_enable
* from double_fault.
*/
BUG_ON(!on_thread_stack());
preempt_enable_no_resched();
}
/**
* ist_end_non_atomic() - begin a non-atomic section in an IST exception
*
* Ends a non-atomic section started with ist_begin_non_atomic().
*/
void ist_end_non_atomic(void)
{
preempt_disable();
}
int is_valid_bugaddr(unsigned long addr)
{
unsigned short ud;
......@@ -363,7 +293,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
* The net result is that our #GP handler will think that we
* entered from usermode with the bad user context.
*
* No need for ist_enter here because we don't use RCU.
* No need for nmi_enter() here because we don't use RCU.
*/
if (((long)regs->sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY &&
regs->cs == __KERNEL_CS &&
......@@ -398,7 +328,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code, unsign
}
#endif
ist_enter(regs);
nmi_enter();
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
tsk->thread.error_code = error_code;
......@@ -592,19 +522,13 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
return;
/*
* Unlike any other non-IST entry, we can be called from a kprobe in
* non-CONTEXT_KERNEL kernel mode or even during context tracking
* state changes. Make sure that we wake up RCU even if we're coming
* from kernel code.
*
* This means that we can't schedule even if we came from a
* preemptible kernel context. That's okay.
* Unlike any other non-IST entry, we can be called from pretty much
* any location in the kernel through kprobes -- text_poke() will most
* likely be handled by poke_int3_handler() above. This means this
* handler is effectively NMI-like.
*/
if (!user_mode(regs)) {
rcu_nmi_enter();
preempt_disable();
}
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
if (!user_mode(regs))
nmi_enter();
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
......@@ -626,10 +550,8 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
cond_local_irq_disable(regs);
exit:
if (!user_mode(regs)) {
preempt_enable_no_resched();
rcu_nmi_exit();
}
if (!user_mode(regs))
nmi_exit();
}
NOKPROBE_SYMBOL(do_int3);
......@@ -733,7 +655,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
unsigned long dr6;
int si_code;
ist_enter(regs);
nmi_enter();
get_debugreg(dr6, 6);
/*
......@@ -826,7 +748,7 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code)
debug_stack_usage_dec();
exit:
ist_exit(regs);
nmi_exit();
}
NOKPROBE_SYMBOL(do_debug);
......
......@@ -5499,7 +5499,7 @@ struct drm_dp_aux *drm_dp_mst_dsc_aux_for_port(struct drm_dp_mst_port *port)
{
struct drm_dp_mst_port *immediate_upstream_port;
struct drm_dp_mst_port *fec_port;
struct drm_dp_desc desc = { 0 };
struct drm_dp_desc desc = { };
u8 endpoint_fec;
u8 endpoint_dsc;
......
......@@ -2,15 +2,6 @@
#ifndef _LINUX_FTRACE_IRQ_H
#define _LINUX_FTRACE_IRQ_H
#ifdef CONFIG_FTRACE_NMI_ENTER
extern void arch_ftrace_nmi_enter(void);
extern void arch_ftrace_nmi_exit(void);
#else
static inline void arch_ftrace_nmi_enter(void) { }
static inline void arch_ftrace_nmi_exit(void) { }
#endif
#ifdef CONFIG_HWLAT_TRACER
extern bool trace_hwlat_callback_enabled;
extern void trace_hwlat_callback(bool enter);
......@@ -22,12 +13,10 @@ static inline void ftrace_nmi_enter(void)
if (trace_hwlat_callback_enabled)
trace_hwlat_callback(true);
#endif
arch_ftrace_nmi_enter();
}
static inline void ftrace_nmi_exit(void)
{
arch_ftrace_nmi_exit();
#ifdef CONFIG_HWLAT_TRACER
if (trace_hwlat_callback_enabled)
trace_hwlat_callback(false);
......
......@@ -2,31 +2,28 @@
#ifndef LINUX_HARDIRQ_H
#define LINUX_HARDIRQ_H
#include <linux/context_tracking_state.h>
#include <linux/preempt.h>
#include <linux/lockdep.h>
#include <linux/ftrace_irq.h>
#include <linux/vtime.h>
#include <asm/hardirq.h>
extern void synchronize_irq(unsigned int irq);
extern bool synchronize_hardirq(unsigned int irq);
#if defined(CONFIG_TINY_RCU)
static inline void rcu_nmi_enter(void)
{
}
#ifdef CONFIG_NO_HZ_FULL
void __rcu_irq_enter_check_tick(void);
#else
static inline void __rcu_irq_enter_check_tick(void) { }
#endif
static inline void rcu_nmi_exit(void)
static __always_inline void rcu_irq_enter_check_tick(void)
{
if (context_tracking_enabled())
__rcu_irq_enter_check_tick();
}
#else
extern void rcu_nmi_enter(void);
extern void rcu_nmi_exit(void);
#endif
/*
* It is safe to do non-atomic ops on ->hardirq_context,
* because NMI handlers may not preempt and the ops are
......@@ -65,14 +62,34 @@ extern void irq_exit(void);
#define arch_nmi_exit() do { } while (0)
#endif
#ifdef CONFIG_TINY_RCU
static inline void rcu_nmi_enter(void) { }
static inline void rcu_nmi_exit(void) { }
#else
extern void rcu_nmi_enter(void);
extern void rcu_nmi_exit(void);
#endif
/*
* NMI vs Tracing
* --------------
*
* We must not land in a tracer until (or after) we've changed preempt_count
* such that in_nmi() becomes true. To that effect all NMI C entry points must
* be marked 'notrace' and call nmi_enter() as soon as possible.
*/
/*
* nmi_enter() can nest up to 15 times; see NMI_BITS.
*/
#define nmi_enter() \
do { \
arch_nmi_enter(); \
printk_nmi_enter(); \
lockdep_off(); \
ftrace_nmi_enter(); \
BUG_ON(in_nmi()); \
preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
BUG_ON(in_nmi() == NMI_MASK); \
__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
rcu_nmi_enter(); \
lockdep_hardirq_enter(); \
} while (0)
......@@ -82,7 +99,7 @@ extern void irq_exit(void);
lockdep_hardirq_exit(); \
rcu_nmi_exit(); \
BUG_ON(!in_nmi()); \
preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
ftrace_nmi_exit(); \
lockdep_on(); \
printk_nmi_exit(); \
......
......@@ -308,8 +308,27 @@ extern void lockdep_set_selftest_task(struct task_struct *task);
extern void lockdep_init_task(struct task_struct *task);
extern void lockdep_off(void);
extern void lockdep_on(void);
/*
* Split the recrursion counter in two to readily detect 'off' vs recursion.
*/
#define LOCKDEP_RECURSION_BITS 16
#define LOCKDEP_OFF (1U << LOCKDEP_RECURSION_BITS)
#define LOCKDEP_RECURSION_MASK (LOCKDEP_OFF - 1)
/*
* lockdep_{off,on}() are macros to avoid tracing and kprobes; not inlines due
* to header dependencies.
*/
#define lockdep_off() \
do { \
current->lockdep_recursion += LOCKDEP_OFF; \
} while (0)
#define lockdep_on() \
do { \
current->lockdep_recursion -= LOCKDEP_OFF; \
} while (0)
extern void lockdep_register_key(struct lock_class_key *key);
extern void lockdep_unregister_key(struct lock_class_key *key);
......
......@@ -26,13 +26,13 @@
* PREEMPT_MASK: 0x000000ff
* SOFTIRQ_MASK: 0x0000ff00
* HARDIRQ_MASK: 0x000f0000
* NMI_MASK: 0x00100000
* NMI_MASK: 0x00f00000
* PREEMPT_NEED_RESCHED: 0x80000000
*/
#define PREEMPT_BITS 8
#define SOFTIRQ_BITS 8
#define HARDIRQ_BITS 4
#define NMI_BITS 1
#define NMI_BITS 4
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
......
......@@ -371,7 +371,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the list_head within the struct.
* @cond...: optional lockdep expression if called from non-RCU protection.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as list_add_rcu()
......@@ -646,7 +646,7 @@ static inline void hlist_add_behind_rcu(struct hlist_node *n,
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
* @member: the name of the hlist_node within the struct.
* @cond...: optional lockdep expression if called from non-RCU protection.
* @cond: optional lockdep expression if called from non-RCU protection.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
......
......@@ -37,6 +37,7 @@
/* Exported common interfaces */
void call_rcu(struct rcu_head *head, rcu_callback_t func);
void rcu_barrier_tasks(void);
void rcu_barrier_tasks_rude(void);
void synchronize_rcu(void);
#ifdef CONFIG_PREEMPT_RCU
......@@ -129,25 +130,57 @@ static inline void rcu_init_nohz(void) { }
* Note a quasi-voluntary context switch for RCU-tasks's benefit.
* This is a macro rather than an inline function to avoid #include hell.
*/
#ifdef CONFIG_TASKS_RCU
#define rcu_tasks_qs(t) \
do { \
if (READ_ONCE((t)->rcu_tasks_holdout)) \
WRITE_ONCE((t)->rcu_tasks_holdout, false); \
#ifdef CONFIG_TASKS_RCU_GENERIC
# ifdef CONFIG_TASKS_RCU
# define rcu_tasks_classic_qs(t, preempt) \
do { \
if (!(preempt) && READ_ONCE((t)->rcu_tasks_holdout)) \
WRITE_ONCE((t)->rcu_tasks_holdout, false); \
} while (0)
#define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t)
void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
void synchronize_rcu_tasks(void);
# else
# define rcu_tasks_classic_qs(t, preempt) do { } while (0)
# define call_rcu_tasks call_rcu
# define synchronize_rcu_tasks synchronize_rcu
# endif
# ifdef CONFIG_TASKS_RCU_TRACE
# define rcu_tasks_trace_qs(t) \
do { \
if (!likely(READ_ONCE((t)->trc_reader_checked)) && \
!unlikely(READ_ONCE((t)->trc_reader_nesting))) { \
smp_store_release(&(t)->trc_reader_checked, true); \
smp_mb(); /* Readers partitioned by store. */ \
} \
} while (0)
# else
# define rcu_tasks_trace_qs(t) do { } while (0)
# endif
#define rcu_tasks_qs(t, preempt) \
do { \
rcu_tasks_classic_qs((t), (preempt)); \
rcu_tasks_trace_qs((t)); \
} while (0)
# ifdef CONFIG_TASKS_RUDE_RCU
void call_rcu_tasks_rude(struct rcu_head *head, rcu_callback_t func);
void synchronize_rcu_tasks_rude(void);
# endif
#define rcu_note_voluntary_context_switch(t) rcu_tasks_qs(t, false)
void exit_tasks_rcu_start(void);
void exit_tasks_rcu_finish(void);
#else /* #ifdef CONFIG_TASKS_RCU */
#define rcu_tasks_qs(t) do { } while (0)
#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
#define rcu_tasks_qs(t, preempt) do { } while (0)
#define rcu_note_voluntary_context_switch(t) do { } while (0)
#define call_rcu_tasks call_rcu
#define synchronize_rcu_tasks synchronize_rcu
static inline void exit_tasks_rcu_start(void) { }
static inline void exit_tasks_rcu_finish(void) { }
#endif /* #else #ifdef CONFIG_TASKS_RCU */
#endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
/**
* cond_resched_tasks_rcu_qs - Report potential quiescent states to RCU
......@@ -158,7 +191,7 @@ static inline void exit_tasks_rcu_finish(void) { }
*/
#define cond_resched_tasks_rcu_qs() \
do { \
rcu_tasks_qs(current); \
rcu_tasks_qs(current, false); \
cond_resched(); \
} while (0)
......
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* Read-Copy Update mechanism for mutual exclusion, adapted for tracing.
*
* Copyright (C) 2020 Paul E. McKenney.
*/
#ifndef __LINUX_RCUPDATE_TRACE_H
#define __LINUX_RCUPDATE_TRACE_H
#include <linux/sched.h>
#include <linux/rcupdate.h>
#ifdef CONFIG_DEBUG_LOCK_ALLOC
extern struct lockdep_map rcu_trace_lock_map;
static inline int rcu_read_lock_trace_held(void)
{
return lock_is_held(&rcu_trace_lock_map);
}
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
static inline int rcu_read_lock_trace_held(void)
{
return 1;
}
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
#ifdef CONFIG_TASKS_TRACE_RCU
void rcu_read_unlock_trace_special(struct task_struct *t, int nesting);
/**
* rcu_read_lock_trace - mark beginning of RCU-trace read-side critical section
*
* When synchronize_rcu_trace() is invoked by one task, then that task
* is guaranteed to block until all other tasks exit their read-side
* critical sections. Similarly, if call_rcu_trace() is invoked on one
* task while other tasks are within RCU read-side critical sections,
* invocation of the corresponding RCU callback is deferred until after
* the all the other tasks exit their critical sections.
*
* For more details, please see the documentation for rcu_read_lock().
*/
static inline void rcu_read_lock_trace(void)
{
struct task_struct *t = current;
WRITE_ONCE(t->trc_reader_nesting, READ_ONCE(t->trc_reader_nesting) + 1);
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) &&
t->trc_reader_special.b.need_mb)
smp_mb(); // Pairs with update-side barriers
rcu_lock_acquire(&rcu_trace_lock_map);
}
/**
* rcu_read_unlock_trace - mark end of RCU-trace read-side critical section
*
* Pairs with a preceding call to rcu_read_lock_trace(), and nesting is
* allowed. Invoking a rcu_read_unlock_trace() when there is no matching
* rcu_read_lock_trace() is verboten, and will result in lockdep complaints.
*
* For more details, please see the documentation for rcu_read_unlock().
*/
static inline void rcu_read_unlock_trace(void)
{
int nesting;
struct task_struct *t = current;
rcu_lock_release(&rcu_trace_lock_map);
nesting = READ_ONCE(t->trc_reader_nesting) - 1;
if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
WRITE_ONCE(t->trc_reader_nesting, nesting);
return; // We assume shallow reader nesting.
}
rcu_read_unlock_trace_special(t, nesting);
}
void call_rcu_tasks_trace(struct rcu_head *rhp, rcu_callback_t func);
void synchronize_rcu_tasks_trace(void);
void rcu_barrier_tasks_trace(void);
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
#endif /* __LINUX_RCUPDATE_TRACE_H */
......@@ -31,4 +31,23 @@ do { \
#define wait_rcu_gp(...) _wait_rcu_gp(false, __VA_ARGS__)
/**
* synchronize_rcu_mult - Wait concurrently for multiple grace periods
* @...: List of call_rcu() functions for different grace periods to wait on
*
* This macro waits concurrently for multiple types of RCU grace periods.
* For example, synchronize_rcu_mult(call_rcu, call_rcu_tasks) would wait
* on concurrent RCU and RCU-tasks grace periods. Waiting on a given SRCU
* domain requires you to write a wrapper function for that SRCU domain's
* call_srcu() function, with this wrapper supplying the pointer to the
* corresponding srcu_struct.
*
* The first argument tells Tiny RCU's _wait_rcu_gp() not to
* bother waiting for RCU. The reason for this is because anywhere
* synchronize_rcu_mult() can be called is automatically already a full
* grace period.
*/
#define synchronize_rcu_mult(...) \
_wait_rcu_gp(IS_ENABLED(CONFIG_TINY_RCU), __VA_ARGS__)
#endif /* _LINUX_SCHED_RCUPDATE_WAIT_H */
......@@ -49,7 +49,7 @@ static inline void rcu_softirq_qs(void)
#define rcu_note_context_switch(preempt) \
do { \
rcu_qs(); \
rcu_tasks_qs(current); \
rcu_tasks_qs(current, (preempt)); \
} while (0)
static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
......@@ -71,6 +71,8 @@ static inline void rcu_irq_enter(void) { }
static inline void rcu_irq_exit_irqson(void) { }
static inline void rcu_irq_enter_irqson(void) { }
static inline void rcu_irq_exit(void) { }
static inline void rcu_irq_exit_preempt(void) { }
static inline void rcu_irq_exit_check_preempt(void) { }
static inline void exit_rcu(void) { }
static inline bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
......@@ -85,8 +87,10 @@ static inline void rcu_scheduler_starting(void) { }
static inline void rcu_end_inkernel_boot(void) { }
static inline bool rcu_inkernel_boot_has_ended(void) { return true; }
static inline bool rcu_is_watching(void) { return true; }
static inline bool __rcu_is_watching(void) { return true; }
static inline void rcu_momentary_dyntick_idle(void) { }
static inline void kfree_rcu_scheduler_running(void) { }
static inline bool rcu_gp_might_be_stalled(void) { return false; }
/* Avoid RCU read-side critical sections leaking across. */
static inline void rcu_all_qs(void) { barrier(); }
......
......@@ -39,6 +39,7 @@ void rcu_barrier(void);
bool rcu_eqs_special_set(int cpu);
void rcu_momentary_dyntick_idle(void);
void kfree_rcu_scheduler_running(void);
bool rcu_gp_might_be_stalled(void);
unsigned long get_state_synchronize_rcu(void);
void cond_synchronize_rcu(unsigned long oldstate);
......@@ -46,9 +47,16 @@ void rcu_idle_enter(void);
void rcu_idle_exit(void);
void rcu_irq_enter(void);
void rcu_irq_exit(void);
void rcu_irq_exit_preempt(void);
void rcu_irq_enter_irqson(void);
void rcu_irq_exit_irqson(void);
#ifdef CONFIG_PROVE_RCU
void rcu_irq_exit_check_preempt(void);
#else
static inline void rcu_irq_exit_check_preempt(void) { }
#endif
void exit_rcu(void);
void rcu_scheduler_starting(void);
......@@ -56,6 +64,7 @@ extern int rcu_scheduler_active __read_mostly;
void rcu_end_inkernel_boot(void);
bool rcu_inkernel_boot_has_ended(void);
bool rcu_is_watching(void);
bool __rcu_is_watching(void);
#ifndef CONFIG_PREEMPTION
void rcu_all_qs(void);
#endif
......
......@@ -613,7 +613,7 @@ union rcu_special {
u8 blocked;
u8 need_qs;
u8 exp_hint; /* Hint for performance. */
u8 deferred_qs;
u8 need_mb; /* Readers need smp_mb(). */
} b; /* Bits. */
u32 s; /* Set of bits. */
};
......@@ -724,6 +724,14 @@ struct task_struct {
struct list_head rcu_tasks_holdout_list;
#endif /* #ifdef CONFIG_TASKS_RCU */
#ifdef CONFIG_TASKS_TRACE_RCU
int trc_reader_nesting;
int trc_ipi_to_cpu;
union rcu_special trc_reader_special;
bool trc_reader_checked;
struct list_head trc_holdout_list;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
struct sched_info sched_info;
struct list_head tasks;
......@@ -1289,6 +1297,12 @@ struct task_struct {
unsigned long prev_lowest_stack;
#endif
#ifdef CONFIG_X86_MCE
u64 mce_addr;
u64 mce_status;
struct callback_head mce_kill_me;
#endif
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
......
......@@ -89,7 +89,7 @@ void _torture_stop_kthread(char *m, struct task_struct **tp);
#ifdef CONFIG_PREEMPTION
#define torture_preempt_schedule() preempt_schedule()
#else
#define torture_preempt_schedule()
#define torture_preempt_schedule() do { } while (0)
#endif
#endif /* __LINUX_TORTURE_H */
......@@ -1149,4 +1149,6 @@ int autoremove_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, i
(wait)->flags = 0; \
} while (0)
bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg);
#endif /* _LINUX_WAIT_H */
......@@ -141,6 +141,11 @@ struct task_struct init_task
.rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
.rcu_tasks_idle_cpu = -1,
#endif
#ifdef CONFIG_TASKS_TRACE_RCU
.trc_reader_nesting = 0,
.trc_reader_special.s = 0,
.trc_holdout_list = LIST_HEAD_INIT(init_task.trc_holdout_list),
#endif
#ifdef CONFIG_CPUSETS
.mems_allowed_seq = SEQCNT_ZERO(init_task.mems_allowed_seq),
#endif
......
......@@ -1683,6 +1683,11 @@ static inline void rcu_copy_process(struct task_struct *p)
INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
p->rcu_tasks_idle_cpu = -1;
#endif /* #ifdef CONFIG_TASKS_RCU */
#ifdef CONFIG_TASKS_TRACE_RCU
p->trc_reader_nesting = 0;
p->trc_reader_special.s = 0;
INIT_LIST_HEAD(&p->trc_holdout_list);
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
}
struct pid *pidfd_pid(const struct file *file)
......
......@@ -393,25 +393,6 @@ void lockdep_init_task(struct task_struct *task)
task->lockdep_recursion = 0;
}
/*
* Split the recrursion counter in two to readily detect 'off' vs recursion.
*/
#define LOCKDEP_RECURSION_BITS 16
#define LOCKDEP_OFF (1U << LOCKDEP_RECURSION_BITS)
#define LOCKDEP_RECURSION_MASK (LOCKDEP_OFF - 1)
void lockdep_off(void)
{
current->lockdep_recursion += LOCKDEP_OFF;
}
EXPORT_SYMBOL(lockdep_off);
void lockdep_on(void)
{
current->lockdep_recursion -= LOCKDEP_OFF;
}
EXPORT_SYMBOL(lockdep_on);
static inline void lockdep_recursion_finish(void)
{
if (WARN_ON_ONCE(--current->lockdep_recursion))
......
......@@ -6,9 +6,11 @@
#ifdef CONFIG_PRINTK
#define PRINTK_SAFE_CONTEXT_MASK 0x3fffffff
#define PRINTK_NMI_DIRECT_CONTEXT_MASK 0x40000000
#define PRINTK_NMI_CONTEXT_MASK 0x80000000
#define PRINTK_SAFE_CONTEXT_MASK 0x007ffffff
#define PRINTK_NMI_DIRECT_CONTEXT_MASK 0x008000000
#define PRINTK_NMI_CONTEXT_MASK 0xff0000000
#define PRINTK_NMI_CONTEXT_OFFSET 0x010000000
extern raw_spinlock_t logbuf_lock;
......
......@@ -10,6 +10,7 @@
#include <linux/cpumask.h>
#include <linux/irq_work.h>
#include <linux/printk.h>
#include <linux/kprobes.h>
#include "internal.h"
......@@ -293,14 +294,14 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args)
return printk_safe_log_store(s, fmt, args);
}
void notrace printk_nmi_enter(void)
void noinstr printk_nmi_enter(void)
{
this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
this_cpu_add(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
}
void notrace printk_nmi_exit(void)
void noinstr printk_nmi_exit(void)
{
this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
this_cpu_sub(printk_context, PRINTK_NMI_CONTEXT_OFFSET);
}
/*
......
......@@ -70,13 +70,37 @@ config TREE_SRCU
help
This option selects the full-fledged version of SRCU.
config TASKS_RCU_GENERIC
def_bool TASKS_RCU || TASKS_RUDE_RCU || TASKS_TRACE_RCU
select SRCU
help
This option enables generic infrastructure code supporting
task-based RCU implementations. Not for manual selection.
config TASKS_RCU
def_bool PREEMPTION
select SRCU
help
This option enables a task-based RCU implementation that uses
only voluntary context switch (not preemption!), idle, and
user-mode execution as quiescent states.
user-mode execution as quiescent states. Not for manual selection.
config TASKS_RUDE_RCU
def_bool 0
help
This option enables a task-based RCU implementation that uses
only context switch (including preemption) and user-mode
execution as quiescent states. It forces IPIs and context
switches on all online CPUs, including idle ones, so use
with caution.
config TASKS_TRACE_RCU
def_bool 0
help
This option enables a task-based RCU implementation that uses
explicit rcu_read_lock_trace() read-side markers, and allows
these readers to appear in the idle loop as well as on the CPU
hotplug code paths. It can force IPIs on online CPUs, including
idle ones, so use with caution.
config RCU_STALL_COMMON
def_bool TREE_RCU
......@@ -210,4 +234,22 @@ config RCU_NOCB_CPU
Say Y here if you want to help to debug reduced OS jitter.
Say N here if you are unsure.
config TASKS_TRACE_RCU_READ_MB
bool "Tasks Trace RCU readers use memory barriers in user and idle"
depends on RCU_EXPERT
default PREEMPT_RT || NR_CPUS < 8
help
Use this option to further reduce the number of IPIs sent
to CPUs executing in userspace or idle during tasks trace
RCU grace periods. Given that a reasonable setting of
the rcupdate.rcu_task_ipi_delay kernel boot parameter
eliminates such IPIs for many workloads, proper setting
of this Kconfig option is important mostly for aggressive
real-time installations and for battery-powered devices,
hence the default chosen above.
Say Y here if you hate IPIs.
Say N here if you hate read-side memory barriers.
Take the default if you are unsure.
endmenu # "RCU Subsystem"
......@@ -29,6 +29,8 @@ config RCU_PERF_TEST
select TORTURE_TEST
select SRCU
select TASKS_RCU
select TASKS_RUDE_RCU
select TASKS_TRACE_RCU
default n
help
This option provides a kernel module that runs performance
......@@ -46,6 +48,8 @@ config RCU_TORTURE_TEST
select TORTURE_TEST
select SRCU
select TASKS_RCU
select TASKS_RUDE_RCU
select TASKS_TRACE_RCU
default n
help
This option provides a kernel module that runs torture tests
......
......@@ -431,6 +431,7 @@ bool rcu_gp_is_expedited(void); /* Internal RCU use. */
void rcu_expedite_gp(void);
void rcu_unexpedite_gp(void);
void rcupdate_announce_bootup_oddness(void);
void show_rcu_tasks_gp_kthreads(void);
void rcu_request_urgent_qs_task(struct task_struct *t);
#endif /* #else #ifdef CONFIG_TINY_RCU */
......@@ -441,6 +442,8 @@ void rcu_request_urgent_qs_task(struct task_struct *t);
enum rcutorture_type {
RCU_FLAVOR,
RCU_TASKS_FLAVOR,
RCU_TASKS_RUDE_FLAVOR,
RCU_TASKS_TRACING_FLAVOR,
RCU_TRIVIAL_FLAVOR,
SRCU_FLAVOR,
INVALID_RCU_FLAVOR
......@@ -454,6 +457,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
unsigned long secs,
unsigned long c_old,
unsigned long c);
void rcu_gp_set_torture_wait(int duration);
#else
static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
int *flags, unsigned long *gp_seq)
......@@ -471,6 +475,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
#define do_trace_rcu_torture_read(rcutorturename, rhp, secs, c_old, c) \
do { } while (0)
#endif
static inline void rcu_gp_set_torture_wait(int duration) { }
#endif
#if IS_ENABLED(CONFIG_RCU_TORTURE_TEST) || IS_MODULE(CONFIG_RCU_TORTURE_TEST)
......@@ -498,6 +503,7 @@ void srcutorture_get_gp_data(enum rcutorture_type test_type,
#endif
#ifdef CONFIG_TINY_RCU
static inline bool rcu_dynticks_zero_in_eqs(int cpu, int *vp) { return false; }
static inline unsigned long rcu_get_gp_seq(void) { return 0; }
static inline unsigned long rcu_exp_batches_completed(void) { return 0; }
static inline unsigned long
......@@ -507,6 +513,7 @@ static inline void show_rcu_gp_kthreads(void) { }
static inline int rcu_get_gp_kthreads_prio(void) { return 0; }
static inline void rcu_fwd_progress_check(unsigned long j) { }
#else /* #ifdef CONFIG_TINY_RCU */
bool rcu_dynticks_zero_in_eqs(int cpu, int *vp);
unsigned long rcu_get_gp_seq(void);
unsigned long rcu_exp_batches_completed(void);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
......
......@@ -88,6 +88,7 @@ torture_param(bool, shutdown, RCUPERF_SHUTDOWN,
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() perf test?");
torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
static char *perf_type = "rcu";
module_param(perf_type, charp, 0444);
......@@ -635,7 +636,7 @@ kfree_perf_thread(void *arg)
}
for (i = 0; i < kfree_alloc_num; i++) {
alloc_ptr = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL);
alloc_ptr = kmalloc(kfree_mult * sizeof(struct kfree_obj), GFP_KERNEL);
if (!alloc_ptr)
return -ENOMEM;
......@@ -722,6 +723,8 @@ kfree_perf_init(void)
schedule_timeout_uninterruptible(1);
}
pr_alert("kfree object size=%lu\n", kfree_mult * sizeof(struct kfree_obj));
kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
GFP_KERNEL);
if (kfree_reader_tasks == NULL) {
......
This diff is collapsed.
......@@ -29,6 +29,19 @@
#include "rcu.h"
#include "rcu_segcblist.h"
#ifndef data_race
#define data_race(expr) \
({ \
expr; \
})
#endif
#ifndef ASSERT_EXCLUSIVE_WRITER
#define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0)
#endif
#ifndef ASSERT_EXCLUSIVE_ACCESS
#define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0)
#endif
/* Holdoff in nanoseconds for auto-expediting. */
#define DEFAULT_SRCU_EXP_HOLDOFF (25 * 1000)
static ulong exp_holdoff = DEFAULT_SRCU_EXP_HOLDOFF;
......@@ -1268,8 +1281,8 @@ void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf)
struct srcu_data *sdp;
sdp = per_cpu_ptr(ssp->sda, cpu);
u0 = sdp->srcu_unlock_count[!idx];
u1 = sdp->srcu_unlock_count[idx];
u0 = data_race(sdp->srcu_unlock_count[!idx]);
u1 = data_race(sdp->srcu_unlock_count[idx]);
/*
* Make sure that a lock is always counted if the corresponding
......@@ -1277,8 +1290,8 @@ void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf)
*/
smp_rmb();
l0 = sdp->srcu_lock_count[!idx];
l1 = sdp->srcu_lock_count[idx];
l0 = data_race(sdp->srcu_lock_count[!idx]);
l1 = data_race(sdp->srcu_lock_count[idx]);
c0 = l0 - u0;
c1 = l1 - u1;
......
This diff is collapsed.
This diff is collapsed.
......@@ -359,6 +359,7 @@ struct rcu_state {
/* Values for rcu_state structure's gp_flags field. */
#define RCU_GP_FLAG_INIT 0x1 /* Need grace-period initialization. */
#define RCU_GP_FLAG_FQS 0x2 /* Need grace-period quiescent-state forcing. */
#define RCU_GP_FLAG_OVLD 0x4 /* Experiencing callback overload. */
/* Values for rcu_state structure's gp_state field. */
#define RCU_GP_IDLE 0 /* Initial state and no GP in progress. */
......@@ -454,6 +455,8 @@ static void rcu_bind_gp_kthread(void);
static bool rcu_nohz_full_cpu(void);
static void rcu_dynticks_task_enter(void);
static void rcu_dynticks_task_exit(void);
static void rcu_dynticks_task_trace_enter(void);
static void rcu_dynticks_task_trace_exit(void);
/* Forward declarations for tree_stall.h */
static void record_gp_stall_check_time(void);
......
......@@ -150,7 +150,7 @@ static void __maybe_unused sync_exp_reset_tree(void)
static bool sync_rcu_exp_done(struct rcu_node *rnp)
{
raw_lockdep_assert_held_rcu_node(rnp);
return rnp->exp_tasks == NULL &&
return READ_ONCE(rnp->exp_tasks) == NULL &&
READ_ONCE(rnp->expmask) == 0;
}
......@@ -373,7 +373,7 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
* until such time as the ->expmask bits are cleared.
*/
if (rcu_preempt_has_tasks(rnp))
rnp->exp_tasks = rnp->blkd_tasks.next;
WRITE_ONCE(rnp->exp_tasks, rnp->blkd_tasks.next);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
/* IPI the remaining CPUs for expedited quiescent state. */
......@@ -542,8 +542,8 @@ static void synchronize_rcu_expedited_wait(void)
}
pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
jiffies - jiffies_start, rcu_state.expedited_sequence,
READ_ONCE(rnp_root->expmask),
".T"[!!rnp_root->exp_tasks]);
data_race(rnp_root->expmask),
".T"[!!data_race(rnp_root->exp_tasks)]);
if (ndetected) {
pr_err("blocking rcu_node structures:");
rcu_for_each_node_breadth_first(rnp) {
......@@ -553,8 +553,8 @@ static void synchronize_rcu_expedited_wait(void)
continue;
pr_cont(" l=%u:%d-%d:%#lx/%c",
rnp->level, rnp->grplo, rnp->grphi,
READ_ONCE(rnp->expmask),
".T"[!!rnp->exp_tasks]);
data_race(rnp->expmask),
".T"[!!data_race(rnp->exp_tasks)]);
}
pr_cont("\n");
}
......@@ -639,6 +639,7 @@ static void wait_rcu_exp_gp(struct work_struct *wp)
*/
static void rcu_exp_handler(void *unused)
{
int depth = rcu_preempt_depth();
unsigned long flags;
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
......@@ -649,7 +650,7 @@ static void rcu_exp_handler(void *unused)
* critical section. If also enabled or idle, immediately
* report the quiescent state, otherwise defer.
*/
if (!rcu_preempt_depth()) {
if (!depth) {
if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
rcu_dynticks_curr_cpu_in_eqs()) {
rcu_report_exp_rdp(rdp);
......@@ -673,7 +674,7 @@ static void rcu_exp_handler(void *unused)
* can have caused this quiescent state to already have been
* reported, so we really do need to check ->expmask.
*/
if (rcu_preempt_depth() > 0) {
if (depth > 0) {
raw_spin_lock_irqsave_rcu_node(rnp, flags);
if (rnp->expmask & rdp->grpmask) {
rdp->exp_deferred_qs = true;
......@@ -683,30 +684,8 @@ static void rcu_exp_handler(void *unused)
return;
}
/*
* The final and least likely case is where the interrupted
* code was just about to or just finished exiting the RCU-preempt
* read-side critical section, and no, we can't tell which.
* So either way, set ->deferred_qs to flag later code that
* a quiescent state is required.
*
* If the CPU is fully enabled (or if some buggy RCU-preempt
* read-side critical section is being used from idle), just
* invoke rcu_preempt_deferred_qs() to immediately report the
* quiescent state. We cannot use rcu_read_unlock_special()
* because we are in an interrupt handler, which will cause that
* function to take an early exit without doing anything.
*
* Otherwise, force a context switch after the CPU enables everything.
*/
rdp->exp_deferred_qs = true;
if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
WARN_ON_ONCE(rcu_dynticks_curr_cpu_in_eqs())) {
rcu_preempt_deferred_qs(t);
} else {
set_tsk_need_resched(t);
set_preempt_need_resched();
}
// Finally, negative nesting depth should not happen.
WARN_ON_ONCE(1);
}
/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */
......@@ -721,17 +700,20 @@ static void sync_sched_exp_online_cleanup(int cpu)
*/
static int rcu_print_task_exp_stall(struct rcu_node *rnp)
{
struct task_struct *t;
unsigned long flags;
int ndetected = 0;
struct task_struct *t;
if (!rnp->exp_tasks)
if (!READ_ONCE(rnp->exp_tasks))
return 0;
raw_spin_lock_irqsave_rcu_node(rnp, flags);
t = list_entry(rnp->exp_tasks->prev,
struct task_struct, rcu_node_entry);
list_for_each_entry_continue(t, &rnp->blkd_tasks, rcu_node_entry) {
pr_cont(" P%d", t->pid);
ndetected++;
}
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
return ndetected;
}
......
......@@ -226,7 +226,7 @@ static void rcu_preempt_ctxt_queue(struct rcu_node *rnp, struct rcu_data *rdp)
WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq);
}
if (!rnp->exp_tasks && (blkd_state & RCU_EXP_BLKD))
rnp->exp_tasks = &t->rcu_node_entry;
WRITE_ONCE(rnp->exp_tasks, &t->rcu_node_entry);
WARN_ON_ONCE(!(blkd_state & RCU_GP_BLKD) !=
!(rnp->qsmask & rdp->grpmask));
WARN_ON_ONCE(!(blkd_state & RCU_EXP_BLKD) !=
......@@ -331,6 +331,7 @@ void rcu_note_context_switch(bool preempt)
rcu_qs();
if (rdp->exp_deferred_qs)
rcu_report_exp_rdp(rdp);
rcu_tasks_qs(current, preempt);
trace_rcu_utilization(TPS("End context switch"));
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
......@@ -345,9 +346,7 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)
return READ_ONCE(rnp->gp_tasks) != NULL;
}
/* Bias and limit values for ->rcu_read_lock_nesting. */
#define RCU_NEST_BIAS INT_MAX
#define RCU_NEST_NMAX (-INT_MAX / 2)
/* limit value for ->rcu_read_lock_nesting. */
#define RCU_NEST_PMAX (INT_MAX / 2)
static void rcu_preempt_read_enter(void)
......@@ -355,9 +354,9 @@ static void rcu_preempt_read_enter(void)
current->rcu_read_lock_nesting++;
}
static void rcu_preempt_read_exit(void)
static int rcu_preempt_read_exit(void)
{
current->rcu_read_lock_nesting--;
return --current->rcu_read_lock_nesting;
}
static void rcu_preempt_depth_set(int val)
......@@ -390,21 +389,15 @@ void __rcu_read_unlock(void)
{
struct task_struct *t = current;
if (rcu_preempt_depth() != 1) {
rcu_preempt_read_exit();
} else {
if (rcu_preempt_read_exit() == 0) {
barrier(); /* critical section before exit code. */
rcu_preempt_depth_set(-RCU_NEST_BIAS);
barrier(); /* assign before ->rcu_read_unlock_special load */
if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
rcu_read_unlock_special(t);
barrier(); /* ->rcu_read_unlock_special load before assign */
rcu_preempt_depth_set(0);
}
if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
int rrln = rcu_preempt_depth();
WARN_ON_ONCE(rrln < 0 && rrln > RCU_NEST_NMAX);
WARN_ON_ONCE(rrln < 0 || rrln > RCU_NEST_PMAX);
}
}
EXPORT_SYMBOL_GPL(__rcu_read_unlock);
......@@ -500,12 +493,12 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
if (&t->rcu_node_entry == rnp->gp_tasks)
WRITE_ONCE(rnp->gp_tasks, np);
if (&t->rcu_node_entry == rnp->exp_tasks)
rnp->exp_tasks = np;
WRITE_ONCE(rnp->exp_tasks, np);
if (IS_ENABLED(CONFIG_RCU_BOOST)) {
/* Snapshot ->boost_mtx ownership w/rnp->lock held. */
drop_boost_mutex = rt_mutex_owner(&rnp->boost_mtx) == t;
if (&t->rcu_node_entry == rnp->boost_tasks)
rnp->boost_tasks = np;
WRITE_ONCE(rnp->boost_tasks, np);
}
/*
......@@ -556,7 +549,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
return (__this_cpu_read(rcu_data.exp_deferred_qs) ||
READ_ONCE(t->rcu_read_unlock_special.s)) &&
rcu_preempt_depth() <= 0;
rcu_preempt_depth() == 0;
}
/*
......@@ -569,16 +562,11 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
static void rcu_preempt_deferred_qs(struct task_struct *t)
{
unsigned long flags;
bool couldrecurse = rcu_preempt_depth() >= 0;
if (!rcu_preempt_need_deferred_qs(t))
return;
if (couldrecurse)
rcu_preempt_depth_set(rcu_preempt_depth() - RCU_NEST_BIAS);
local_irq_save(flags);
rcu_preempt_deferred_qs_irqrestore(t, flags);
if (couldrecurse)
rcu_preempt_depth_set(rcu_preempt_depth() + RCU_NEST_BIAS);
}
/*
......@@ -615,19 +603,18 @@ static void rcu_read_unlock_special(struct task_struct *t)
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
(rdp->grpmask & READ_ONCE(rnp->expmask)) ||
tick_nohz_full_cpu(rdp->cpu);
exp = (t->rcu_blocked_node &&
READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
(rdp->grpmask & READ_ONCE(rnp->expmask));
// Need to defer quiescent state until everything is enabled.
if (irqs_were_disabled && use_softirq &&
(in_interrupt() ||
(exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
// Using softirq, safe to awaken, and we get
// no help from enabling irqs, unlike bh/preempt.
if (use_softirq && (in_irq() || (exp && !irqs_were_disabled))) {
// Using softirq, safe to awaken, and either the
// wakeup is free or there is an expedited GP.
raise_softirq_irqoff(RCU_SOFTIRQ);
} else {
// Enabling BH or preempt does reschedule, so...
// Also if no expediting or NO_HZ_FULL, slow is OK.
// Also if no expediting, slow is OK.
// Plus nohz_full CPUs eventually get tick enabled.
set_tsk_need_resched(current);
set_preempt_need_resched();
if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
......@@ -640,7 +627,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
}
}
t->rcu_read_unlock_special.b.deferred_qs = true;
local_irq_restore(flags);
return;
}
......@@ -699,7 +685,7 @@ static void rcu_flavor_sched_clock_irq(int user)
} else if (rcu_preempt_need_deferred_qs(t)) {
rcu_preempt_deferred_qs(t); /* Report deferred QS. */
return;
} else if (!rcu_preempt_depth()) {
} else if (!WARN_ON_ONCE(rcu_preempt_depth())) {
rcu_qs(); /* Report immediate QS. */
return;
}
......@@ -760,8 +746,8 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
pr_info("%s: %d:%d ->qsmask %#lx ->qsmaskinit %#lx ->qsmaskinitnext %#lx\n",
__func__, rnp1->grplo, rnp1->grphi, rnp1->qsmask, rnp1->qsmaskinit, rnp1->qsmaskinitnext);
pr_info("%s: ->gp_tasks %p ->boost_tasks %p ->exp_tasks %p\n",
__func__, READ_ONCE(rnp->gp_tasks), rnp->boost_tasks,
rnp->exp_tasks);
__func__, READ_ONCE(rnp->gp_tasks), data_race(rnp->boost_tasks),
READ_ONCE(rnp->exp_tasks));
pr_info("%s: ->blkd_tasks", __func__);
i = 0;
list_for_each(lhp, &rnp->blkd_tasks) {
......@@ -854,8 +840,7 @@ void rcu_note_context_switch(bool preempt)
this_cpu_write(rcu_data.rcu_urgent_qs, false);
if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs)))
rcu_momentary_dyntick_idle();
if (!preempt)
rcu_tasks_qs(current);
rcu_tasks_qs(current, preempt);
out:
trace_rcu_utilization(TPS("End context switch"));
}
......@@ -1036,7 +1021,8 @@ static int rcu_boost_kthread(void *arg)
for (;;) {
WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_WAITING);
trace_rcu_utilization(TPS("End boost kthread@rcu_wait"));
rcu_wait(rnp->boost_tasks || rnp->exp_tasks);
rcu_wait(READ_ONCE(rnp->boost_tasks) ||
READ_ONCE(rnp->exp_tasks));
trace_rcu_utilization(TPS("Start boost kthread@rcu_wait"));
WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_RUNNING);
more2boost = rcu_boost(rnp);
......@@ -1079,9 +1065,9 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
(rnp->gp_tasks != NULL &&
rnp->boost_tasks == NULL &&
rnp->qsmask == 0 &&
(ULONG_CMP_GE(jiffies, rnp->boost_time) || rcu_state.cbovld))) {
(!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld))) {
if (rnp->exp_tasks == NULL)
rnp->boost_tasks = rnp->gp_tasks;
WRITE_ONCE(rnp->boost_tasks, rnp->gp_tasks);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
rcu_wake_cond(rnp->boost_kthread_task,
READ_ONCE(rnp->boost_kthread_status));
......@@ -2536,7 +2522,7 @@ static bool rcu_nohz_full_cpu(void)
#ifdef CONFIG_NO_HZ_FULL
if (tick_nohz_full_cpu(smp_processor_id()) &&
(!rcu_gp_in_progress() ||
ULONG_CMP_LT(jiffies, READ_ONCE(rcu_state.gp_start) + HZ)))
time_before(jiffies, READ_ONCE(rcu_state.gp_start) + HZ)))
return true;
#endif /* #ifdef CONFIG_NO_HZ_FULL */
return false;
......@@ -2553,7 +2539,7 @@ static void rcu_bind_gp_kthread(void)
}
/* Record the current task on dyntick-idle entry. */
static void rcu_dynticks_task_enter(void)
static void noinstr rcu_dynticks_task_enter(void)
{
#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
WRITE_ONCE(current->rcu_tasks_idle_cpu, smp_processor_id());
......@@ -2561,9 +2547,27 @@ static void rcu_dynticks_task_enter(void)
}
/* Record no current task on dyntick-idle exit. */
static void rcu_dynticks_task_exit(void)
static void noinstr rcu_dynticks_task_exit(void)
{
#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
WRITE_ONCE(current->rcu_tasks_idle_cpu, -1);
#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
}
/* Turn on heavyweight RCU tasks trace readers on idle/user entry. */
static void rcu_dynticks_task_trace_enter(void)
{
#ifdef CONFIG_TASKS_RCU_TRACE
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
current->trc_reader_special.b.need_mb = true;
#endif /* #ifdef CONFIG_TASKS_RCU_TRACE */
}
/* Turn off heavyweight RCU tasks trace readers on idle/user exit. */
static void rcu_dynticks_task_trace_exit(void)
{
#ifdef CONFIG_TASKS_RCU_TRACE
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
current->trc_reader_special.b.need_mb = false;
#endif /* #ifdef CONFIG_TASKS_RCU_TRACE */
}
This diff is collapsed.
This diff is collapsed.
......@@ -2561,6 +2561,8 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
*
* Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
* __schedule(). See the comment for smp_mb__after_spinlock().
*
* A similar smb_rmb() lives in try_invoke_on_locked_down_task().
*/
smp_rmb();
if (p->on_rq && ttwu_remote(p, wake_flags))
......@@ -2634,6 +2636,52 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
return success;
}
/**
* try_invoke_on_locked_down_task - Invoke a function on task in fixed state
* @p: Process for which the function is to be invoked.
* @func: Function to invoke.
* @arg: Argument to function.
*
* If the specified task can be quickly locked into a definite state
* (either sleeping or on a given runqueue), arrange to keep it in that
* state while invoking @func(@arg). This function can use ->on_rq and
* task_curr() to work out what the state is, if required. Given that
* @func can be invoked with a runqueue lock held, it had better be quite
* lightweight.
*
* Returns:
* @false if the task slipped out from under the locks.
* @true if the task was locked onto a runqueue or is sleeping.
* However, @func can override this by returning @false.
*/
bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg)
{
bool ret = false;
struct rq_flags rf;
struct rq *rq;
lockdep_assert_irqs_enabled();
raw_spin_lock_irq(&p->pi_lock);
if (p->on_rq) {
rq = __task_rq_lock(p, &rf);
if (task_rq(p) == rq)
ret = func(p, arg);
rq_unlock(rq, &rf);
} else {
switch (p->state) {
case TASK_RUNNING:
case TASK_WAKING:
break;
default:
smp_rmb(); // See smp_rmb() comment in try_to_wake_up().
if (!p->on_rq)
ret = func(p, arg);
}
}
raw_spin_unlock_irq(&p->pi_lock);
return ret;
}
/**
* wake_up_process - Wake up a specific process
* @p: The process to be woken up.
......
......@@ -10,11 +10,6 @@ config USER_STACKTRACE_SUPPORT
config NOP_TRACER
bool
config HAVE_FTRACE_NMI_ENTER
bool
help
See Documentation/trace/ftrace-design.rst
config HAVE_FUNCTION_TRACER
bool
help
......@@ -72,11 +67,6 @@ config RING_BUFFER
select TRACE_CLOCK
select IRQ_WORK
config FTRACE_NMI_ENTER
bool
depends on HAVE_FTRACE_NMI_ENTER
default y
config EVENT_TRACING
select CONTEXT_SWITCH_TRACER
select GLOB
......@@ -158,6 +148,7 @@ config FUNCTION_TRACER
select CONTEXT_SWITCH_TRACER
select GLOB
select TASKS_RCU if PREEMPTION
select TASKS_RUDE_RCU
help
Enable the kernel to trace every kernel function. This is done
by using a compiler feature to insert a small, 5-byte No-Operation
......
......@@ -160,17 +160,6 @@ static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip,
op->saved_func(ip, parent_ip, op, regs);
}
static void ftrace_sync(struct work_struct *work)
{
/*
* This function is just a stub to implement a hard force
* of synchronize_rcu(). This requires synchronizing
* tasks even in userspace and idle.
*
* Yes, function tracing is rude.
*/
}
static void ftrace_sync_ipi(void *data)
{
/* Probably not needed, but do it anyway */
......@@ -256,7 +245,7 @@ static void update_ftrace_function(void)
* Make sure all CPUs see this. Yes this is slow, but static
* tracing is slow and nasty to have enabled.
*/
schedule_on_each_cpu(ftrace_sync);
synchronize_rcu_tasks_rude();
/* Now all cpus are using the list ops. */
function_trace_op = set_function_trace_op;
/* Make sure the function_trace_op is visible on all CPUs */
......@@ -2932,7 +2921,7 @@ int ftrace_shutdown(struct ftrace_ops *ops, int command)
* infrastructure to do the synchronization, thus we must do it
* ourselves.
*/
schedule_on_each_cpu(ftrace_sync);
synchronize_rcu_tasks_rude();
/*
* When the kernel is preeptive, tasks can be preempted
......@@ -5888,7 +5877,7 @@ ftrace_graph_release(struct inode *inode, struct file *file)
* infrastructure to do the synchronization, thus we must do it
* ourselves.
*/
schedule_on_each_cpu(ftrace_sync);
synchronize_rcu_tasks_rude();
free_ftrace_hash(old_hash);
}
......
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
#
# If this was a KCSAN run, collapse the reports in the various console.log
# files onto pairs of functions.
#
# Usage: kcsan-collapse.sh resultsdir
#
# Copyright (C) 2020 Facebook, Inc.
#
# Authors: Paul E. McKenney <paulmck@kernel.org>
if test -z "$TORTURE_KCONFIG_KCSAN_ARG"
then
exit 0
fi
cat $1/*/console.log |
grep "BUG: KCSAN: " |
sed -e 's/^\[[^]]*] //' |
sort |
uniq -c |
sort -k1nr > $1/kcsan.sum
......@@ -41,7 +41,21 @@ else
title="$title ($ngpsps/s)"
fi
echo $title $stopstate $fwdprog
nclosecalls=`grep --binary-files=text 'torture: Reader Batch' $i/console.log | tail -1 | awk '{for (i=NF-8;i<=NF;i++) sum+=$i; } END {print sum}'`
nclosecalls=`grep --binary-files=text 'torture: Reader Batch' $i/console.log | tail -1 | \
awk -v sum=0 '
{
for (i = 0; i <= NF; i++) {
sum += $i;
if ($i ~ /Batch:/) {
sum = 0;
i = i + 2;
}
}
}
END {
print sum
}'`
if test -z "$nclosecalls"
then
exit 0
......
......@@ -70,6 +70,15 @@ do
fi
fi
done
if test -f "$rd/kcsan.sum"
then
if test -s "$rd/kcsan.sum"
then
echo KCSAN summary in $rd/kcsan.sum
else
echo Clean KCSAN run in $rd
fi
fi
done
EDITOR=echo kvm-find-errors.sh "${@: -1}" > $T 2>&1
ret=$?
......
......@@ -31,6 +31,8 @@ TORTURE_DEFCONFIG=defconfig
TORTURE_BOOT_IMAGE=""
TORTURE_INITRD="$KVM/initrd"; export TORTURE_INITRD
TORTURE_KCONFIG_ARG=""
TORTURE_KCONFIG_KASAN_ARG=""
TORTURE_KCONFIG_KCSAN_ARG=""
TORTURE_KMAKE_ARG=""
TORTURE_QEMU_MEM=512
TORTURE_SHUTDOWN_GRACE=180
......@@ -133,6 +135,12 @@ do
TORTURE_KCONFIG_ARG="$2"
shift
;;
--kasan)
TORTURE_KCONFIG_KASAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KASAN=y"; export TORTURE_KCONFIG_KASAN_ARG
;;
--kcsan)
TORTURE_KCONFIG_KCSAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y"; export TORTURE_KCONFIG_KCSAN_ARG
;;
--kmake-arg)
checkarg --kmake-arg "(kernel make arguments)" $# "$2" '.*' '^error$'
TORTURE_KMAKE_ARG="$2"
......@@ -310,6 +318,8 @@ TORTURE_BUILDONLY="$TORTURE_BUILDONLY"; export TORTURE_BUILDONLY
TORTURE_DEFCONFIG="$TORTURE_DEFCONFIG"; export TORTURE_DEFCONFIG
TORTURE_INITRD="$TORTURE_INITRD"; export TORTURE_INITRD
TORTURE_KCONFIG_ARG="$TORTURE_KCONFIG_ARG"; export TORTURE_KCONFIG_ARG
TORTURE_KCONFIG_KASAN_ARG="$TORTURE_KCONFIG_KASAN_ARG"; export TORTURE_KCONFIG_KASAN_ARG
TORTURE_KCONFIG_KCSAN_ARG="$TORTURE_KCONFIG_KCSAN_ARG"; export TORTURE_KCONFIG_KCSAN_ARG
TORTURE_KMAKE_ARG="$TORTURE_KMAKE_ARG"; export TORTURE_KMAKE_ARG
TORTURE_QEMU_CMD="$TORTURE_QEMU_CMD"; export TORTURE_QEMU_CMD
TORTURE_QEMU_INTERACTIVE="$TORTURE_QEMU_INTERACTIVE"; export TORTURE_QEMU_INTERACTIVE
......@@ -464,6 +474,7 @@ echo
echo
echo " --- `date` Test summary:"
echo Results directory: $resdir/$ds
kcsan-collapse.sh $resdir/$ds
kvm-recheck.sh $resdir/$ds
___EOF___
......
......@@ -14,3 +14,6 @@ TINY02
TASKS01
TASKS02
TASKS03
RUDE01
TRACE01
TRACE02
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment