Commit 7d9d077c authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offload updates, perhaps most notably a new
   RCU_NOCB_CPU_DEFAULT_ALL Kconfig option that causes all CPUs to be
   offloaded at boot time, regardless of kernel boot parameters.

   This is useful to battery-powered systems such as ChromeOS and
   Android. In addition, a new RCU_NOCB_CPU_CB_BOOST kernel boot
   parameter prevents offloaded callbacks from interfering with
   real-time workloads and with energy-efficiency mechanisms

 - Polled grace-period updates, perhaps most notably making these APIs
   account for both normal and expedited grace periods

 - Tasks RCU updates, perhaps most notably reducing the CPU overhead of
   RCU tasks trace grace periods by more than a factor of two on a
   system with 15,000 tasks.

   The reduction is expected to increase with the number of tasks, so it
   seems reasonable to hypothesize that a system with 150,000 tasks
   might see a 20-fold reduction in CPU overhead

 - Torture-test updates

 - Updates that merge RCU's dyntick-idle tracking into context tracking,
   thus reducing the overhead of transitioning to kernel mode from
   either idle or nohz_full userspace execution for kernels that track
   context independently of RCU.

   This is expected to be helpful primarily for kernels built with
   CONFIG_NO_HZ_FULL=y

* tag 'rcu.2022.07.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (98 commits)
  rcu: Add irqs-disabled indicator to expedited RCU CPU stall warnings
  rcu: Diagnose extended sync_rcu_do_polled_gp() loops
  rcu: Put panic_on_rcu_stall() after expedited RCU CPU stall warnings
  rcutorture: Test polled expedited grace-period primitives
  rcu: Add polled expedited grace-period primitives
  rcutorture: Verify that polled GP API sees synchronous grace periods
  rcu: Make Tiny RCU grace periods visible to polled APIs
  rcu: Make polled grace-period API account for expedited grace periods
  rcu: Switch polled grace-period APIs to ->gp_seq_polled
  rcu/nocb: Avoid polling when my_rdp->nocb_head_rdp list is empty
  rcu/nocb: Add option to opt rcuo kthreads out of RT priority
  rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
  rcu/nocb: Add an option to offload all CPUs on boot
  rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
  rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
  rcu/nocb: Add/del rdp to iterate from rcuog itself
  rcu/tree: Add comment to describe GP-done condition in fqs loop
  rcu: Initialize first_gp_fqs at declaration in rcu_gp_fqs()
  rcu/kvfree: Remove useless monitor_todo flag
  rcu: Cleanup RCU urgency state for offline CPU
  ...
parents c2a24a7a 34bc7b45
......@@ -1844,10 +1844,10 @@ that meets this requirement.
Furthermore, NMI handlers can be interrupted by what appear to RCU to be
normal interrupts. One way that this can happen is for code that
directly invokes rcu_irq_enter() and rcu_irq_exit() to be called
directly invokes ct_irq_enter() and ct_irq_exit() to be called
from an NMI handler. This astonishing fact of life prompted the current
code structure, which has rcu_irq_enter() invoking
rcu_nmi_enter() and rcu_irq_exit() invoking rcu_nmi_exit().
code structure, which has ct_irq_enter() invoking
ct_nmi_enter() and ct_irq_exit() invoking ct_nmi_exit().
And yes, I also learned of this requirement the hard way.
Loadable Modules
......@@ -2195,7 +2195,7 @@ scheduling-clock interrupt be enabled when RCU needs it to be:
sections, and RCU believes this CPU to be idle, no problem. This
sort of thing is used by some architectures for light-weight
exception handlers, which can then avoid the overhead of
rcu_irq_enter() and rcu_irq_exit() at exception entry and
ct_irq_enter() and ct_irq_exit() at exception entry and
exit, respectively. Some go further and avoid the entireties of
irq_enter() and irq_exit().
Just make very sure you are running some of your tests with
......@@ -2226,7 +2226,7 @@ scheduling-clock interrupt be enabled when RCU needs it to be:
+-----------------------------------------------------------------------+
| **Answer**: |
+-----------------------------------------------------------------------+
| One approach is to do ``rcu_irq_exit();rcu_irq_enter();`` every so |
| One approach is to do ``ct_irq_exit();ct_irq_enter();`` every so |
| often. But given that long-running interrupt handlers can cause other |
| problems, not least for response time, shouldn't you work to keep |
| your interrupt handler's runtime within reasonable bounds? |
......
......@@ -97,12 +97,12 @@ warnings:
which will include additional debugging information.
- A low-level kernel issue that either fails to invoke one of the
variants of rcu_user_enter(), rcu_user_exit(), rcu_idle_enter(),
rcu_idle_exit(), rcu_irq_enter(), or rcu_irq_exit() on the one
variants of rcu_eqs_enter(true), rcu_eqs_exit(true), ct_idle_enter(),
ct_idle_exit(), ct_irq_enter(), or ct_irq_exit() on the one
hand, or that invokes one of them too many times on the other.
Historically, the most frequent issue has been an omission
of either irq_enter() or irq_exit(), which in turn invoke
rcu_irq_enter() or rcu_irq_exit(), respectively. Building your
ct_irq_enter() or ct_irq_exit(), respectively. Building your
kernel with CONFIG_RCU_EQS_DEBUG=y can help track down these types
of issues, which sometimes arise in architecture-specific code.
......
......@@ -3667,6 +3667,9 @@
just as if they had also been called out in the
rcu_nocbs= boot parameter.
Note that this argument takes precedence over
the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
noiotrap [SH] Disables trapped I/O port accesses.
noirqdebug [X86-32] Disables the code which attempts to detect and
......@@ -4560,6 +4563,9 @@
no-callback mode from boot but the mode may be
toggled at runtime via cpusets.
Note that this argument takes precedence over
the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
rcu_nocb_poll [KNL]
Rather than requiring that offloaded CPUs
(specified by rcu_nocbs= above) explicitly
......@@ -4669,6 +4675,34 @@
When RCU_NOCB_CPU is set, also adjust the
priority of NOCB callback kthreads.
rcutree.rcu_divisor= [KNL]
Set the shift-right count to use to compute
the callback-invocation batch limit bl from
the number of callbacks queued on this CPU.
The result will be bounded below by the value of
the rcutree.blimit kernel parameter. Every bl
callbacks, the softirq handler will exit in
order to allow the CPU to do other work.
Please note that this callback-invocation batch
limit applies only to non-offloaded callback
invocation. Offloaded callbacks are instead
invoked in the context of an rcuoc kthread, which
scheduler will preempt as it does any other task.
rcutree.nocb_nobypass_lim_per_jiffy= [KNL]
On callback-offloaded (rcu_nocbs) CPUs,
RCU reduces the lock contention that would
otherwise be caused by callback floods through
use of the ->nocb_bypass list. However, in the
common non-flooded case, RCU queues directly to
the main ->cblist in order to avoid the extra
overhead of the ->nocb_bypass list and its lock.
But if there are too many callbacks queued during
a single jiffy, RCU pre-queues the callbacks into
the ->nocb_bypass queue. The definition of "too
many" is supplied by this kernel boot parameter.
rcutree.rcu_nocb_gp_stride= [KNL]
Set the number of NOCB callback kthreads in
each group, which defaults to the square root
......
#
# Feature name: context-tracking
# Kconfig: HAVE_CONTEXT_TRACKING
# description: arch supports context tracking for NO_HZ_FULL
# Feature name: user-context-tracking
# Kconfig: HAVE_CONTEXT_TRACKING_USER
# description: arch supports user context tracking for NO_HZ_FULL
#
-----------------------
| arch |status|
......
......@@ -5165,6 +5165,7 @@ F: include/linux/console*
CONTEXT TRACKING
M: Frederic Weisbecker <frederic@kernel.org>
M: "Paul E. McKenney" <paulmck@kernel.org>
S: Maintained
F: kernel/context_tracking.c
F: include/linux/context_tracking*
......
......@@ -784,7 +784,7 @@ config HAVE_ARCH_WITHIN_STACK_FRAMES
and similar) by implementing an inline arch_within_stack_frames(),
which is used by CONFIG_HARDENED_USERCOPY.
config HAVE_CONTEXT_TRACKING
config HAVE_CONTEXT_TRACKING_USER
bool
help
Provide kernel/user boundaries probes necessary for subsystems
......@@ -792,10 +792,10 @@ config HAVE_CONTEXT_TRACKING
Syscalls need to be wrapped inside user_exit()-user_enter(), either
optimized behind static key or through the slow path using TIF_NOHZ
flag. Exceptions handlers must be wrapped as well. Irqs are already
protected inside rcu_irq_enter/rcu_irq_exit() but preemption or signal
protected inside ct_irq_enter/ct_irq_exit() but preemption or signal
handling on irq exit still need to be protected.
config HAVE_CONTEXT_TRACKING_OFFSTACK
config HAVE_CONTEXT_TRACKING_USER_OFFSTACK
bool
help
Architecture neither relies on exception_enter()/exception_exit()
......@@ -807,7 +807,7 @@ config HAVE_CONTEXT_TRACKING_OFFSTACK
- Critical entry code isn't preemptible (or better yet:
not interruptible).
- No use of RCU read side critical sections, unless rcu_nmi_enter()
- No use of RCU read side critical sections, unless ct_nmi_enter()
got called.
- No use of instrumentation, unless instrumentation_begin() got
called.
......
......@@ -84,7 +84,7 @@ config ARM
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if ARM_LPAE
select HAVE_ARM_SMCCC if CPU_V7
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
select HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER
select HAVE_C_RECORDMCOUNT
select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK if !XIP_KERNEL
......
......@@ -28,7 +28,7 @@
#include "entry-header.S"
saved_psr .req r8
#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING)
#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING_USER)
saved_pc .req r9
#define TRACE(x...) x
#else
......@@ -38,7 +38,7 @@ saved_pc .req lr
.section .entry.text,"ax",%progbits
.align 5
#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING) || \
#if !(IS_ENABLED(CONFIG_TRACE_IRQFLAGS) || IS_ENABLED(CONFIG_CONTEXT_TRACKING_USER) || \
IS_ENABLED(CONFIG_DEBUG_RSEQ))
/*
* This is the fast syscall return path. We do as little as possible here,
......
......@@ -366,25 +366,25 @@ ALT_UP_B(.L1_\@)
* between user and kernel mode.
*/
.macro ct_user_exit, save = 1
#ifdef CONFIG_CONTEXT_TRACKING
#ifdef CONFIG_CONTEXT_TRACKING_USER
.if \save
stmdb sp!, {r0-r3, ip, lr}
bl context_tracking_user_exit
bl user_exit_callable
ldmia sp!, {r0-r3, ip, lr}
.else
bl context_tracking_user_exit
bl user_exit_callable
.endif
#endif
.endm
.macro ct_user_enter, save = 1
#ifdef CONFIG_CONTEXT_TRACKING
#ifdef CONFIG_CONTEXT_TRACKING_USER
.if \save
stmdb sp!, {r0-r3, ip, lr}
bl context_tracking_user_enter
bl user_enter_callable
ldmia sp!, {r0-r3, ip, lr}
.else
bl context_tracking_user_enter
bl user_enter_callable
.endif
#endif
.endm
......
......@@ -3,6 +3,7 @@
* Copyright (C) 2012 Freescale Semiconductor, Inc.
*/
#include <linux/context_tracking.h>
#include <linux/cpuidle.h>
#include <linux/module.h>
#include <asm/cpuidle.h>
......@@ -24,9 +25,9 @@ static int imx6q_enter_wait(struct cpuidle_device *dev,
imx6_set_lpm(WAIT_UNCLOCKED);
raw_spin_unlock(&cpuidle_lock);
rcu_idle_enter();
ct_idle_enter();
cpu_do_idle();
rcu_idle_exit();
ct_idle_exit();
raw_spin_lock(&cpuidle_lock);
if (num_idle_cpus-- == num_online_cpus())
......
......@@ -176,7 +176,7 @@ config ARM64
select HAVE_C_RECORDMCOUNT
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
......
......@@ -41,7 +41,7 @@ static __always_inline void __enter_from_kernel_mode(struct pt_regs *regs)
if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
lockdep_hardirqs_off(CALLER_ADDR0);
rcu_irq_enter();
ct_irq_enter();
trace_hardirqs_off_finish();
regs->exit_rcu = true;
......@@ -76,7 +76,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
if (regs->exit_rcu) {
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare();
rcu_irq_exit();
ct_irq_exit();
lockdep_hardirqs_on(CALLER_ADDR0);
return;
}
......@@ -84,7 +84,7 @@ static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
trace_hardirqs_on();
} else {
if (regs->exit_rcu)
rcu_irq_exit();
ct_irq_exit();
}
}
......@@ -161,7 +161,7 @@ static void noinstr arm64_enter_nmi(struct pt_regs *regs)
__nmi_enter();
lockdep_hardirqs_off(CALLER_ADDR0);
lockdep_hardirq_enter();
rcu_nmi_enter();
ct_nmi_enter();
trace_hardirqs_off_finish();
ftrace_nmi_enter();
......@@ -182,7 +182,7 @@ static void noinstr arm64_exit_nmi(struct pt_regs *regs)
lockdep_hardirqs_on_prepare();
}
rcu_nmi_exit();
ct_nmi_exit();
lockdep_hardirq_exit();
if (restore)
lockdep_hardirqs_on(CALLER_ADDR0);
......@@ -199,7 +199,7 @@ static void noinstr arm64_enter_el1_dbg(struct pt_regs *regs)
regs->lockdep_hardirqs = lockdep_hardirqs_enabled();
lockdep_hardirqs_off(CALLER_ADDR0);
rcu_nmi_enter();
ct_nmi_enter();
trace_hardirqs_off_finish();
}
......@@ -218,7 +218,7 @@ static void noinstr arm64_exit_el1_dbg(struct pt_regs *regs)
lockdep_hardirqs_on_prepare();
}
rcu_nmi_exit();
ct_nmi_exit();
if (restore)
lockdep_hardirqs_on(CALLER_ADDR0);
}
......
......@@ -42,7 +42,7 @@ config CSKY
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER
select HAVE_VIRT_CPU_ACCOUNTING_GEN
select HAVE_DEBUG_BUGVERBOSE
select HAVE_DEBUG_KMEMLEAK
......
......@@ -19,11 +19,11 @@
.endm
.macro context_tracking
#ifdef CONFIG_CONTEXT_TRACKING
#ifdef CONFIG_CONTEXT_TRACKING_USER
mfcr a0, epsr
btsti a0, 31
bt 1f
jbsr context_tracking_user_exit
jbsr user_exit_callable
ldw a0, (sp, LSAVE_A0)
ldw a1, (sp, LSAVE_A1)
ldw a2, (sp, LSAVE_A2)
......@@ -159,8 +159,8 @@ ret_from_exception:
and r10, r9
cmpnei r10, 0
bt exit_work
#ifdef CONFIG_CONTEXT_TRACKING
jbsr context_tracking_user_enter
#ifdef CONFIG_CONTEXT_TRACKING_USER
jbsr user_enter_callable
#endif
1:
#ifdef CONFIG_PREEMPTION
......
......@@ -75,7 +75,7 @@ config LOONGARCH
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_ASM_MODVERSIONS
select HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_DMA_CONTIGUOUS
select HAVE_EXIT_THREAD
......
......@@ -56,7 +56,7 @@ config MIPS
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES
select HAVE_ASM_MODVERSIONS
select HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER
select HAVE_TIF_NOHZ
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_KMEMLEAK
......
......@@ -202,7 +202,7 @@ config PPC
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ASM_MODVERSIONS
select HAVE_CONTEXT_TRACKING if PPC64
select HAVE_CONTEXT_TRACKING_USER if PPC64
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_KMEMLEAK
select HAVE_DEBUG_STACKOVERFLOW
......
......@@ -2,7 +2,7 @@
#ifndef _ASM_POWERPC_CONTEXT_TRACKING_H
#define _ASM_POWERPC_CONTEXT_TRACKING_H
#ifdef CONFIG_CONTEXT_TRACKING
#ifdef CONFIG_CONTEXT_TRACKING_USER
#define SCHEDULE_USER bl schedule_user
#else
#define SCHEDULE_USER bl schedule
......
......@@ -86,7 +86,7 @@ config RISCV
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_VMAP_STACK if MMU && 64BIT
select HAVE_ASM_MODVERSIONS
select HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_EBPF_JIT if MMU
......
......@@ -111,12 +111,12 @@ _save_context:
call __trace_hardirqs_off
#endif
#ifdef CONFIG_CONTEXT_TRACKING
/* If previous state is in user mode, call context_tracking_user_exit. */
#ifdef CONFIG_CONTEXT_TRACKING_USER
/* If previous state is in user mode, call user_exit_callable(). */
li a0, SR_PP
and a0, s1, a0
bnez a0, skip_context_tracking
call context_tracking_user_exit
call user_exit_callable
skip_context_tracking:
#endif
......@@ -176,7 +176,7 @@ handle_syscall:
*/
csrs CSR_STATUS, SR_IE
#endif
#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING)
#if defined(CONFIG_TRACE_IRQFLAGS) || defined(CONFIG_CONTEXT_TRACKING_USER)
/* Recover a0 - a7 for system calls */
REG_L a0, PT_A0(sp)
REG_L a1, PT_A1(sp)
......@@ -269,8 +269,8 @@ resume_userspace:
andi s1, s0, _TIF_WORK_MASK
bnez s1, work_pending
#ifdef CONFIG_CONTEXT_TRACKING
call context_tracking_user_enter
#ifdef CONFIG_CONTEXT_TRACKING_USER
call user_enter_callable
#endif
/* Save unwound kernel stack pointer in thread_info */
......
......@@ -73,7 +73,7 @@ config SPARC64
select HAVE_DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER
select HAVE_TIF_NOHZ
select HAVE_DEBUG_KMEMLEAK
select IOMMU_HELPER
......
......@@ -15,7 +15,7 @@
#include <asm/visasm.h>
#include <asm/processor.h>
#ifdef CONFIG_CONTEXT_TRACKING
#ifdef CONFIG_CONTEXT_TRACKING_USER
# define SCHEDULE_USER schedule_user
#else
# define SCHEDULE_USER schedule
......
......@@ -186,8 +186,8 @@ config X86
select HAVE_ASM_MODVERSIONS
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_CONTEXT_TRACKING if X86_64
select HAVE_CONTEXT_TRACKING_OFFSTACK if HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER if X86_64
select HAVE_CONTEXT_TRACKING_USER_OFFSTACK if HAVE_CONTEXT_TRACKING_USER
select HAVE_C_RECORDMCOUNT
select HAVE_OBJTOOL_MCOUNT if HAVE_OBJTOOL
select HAVE_BUILDTIME_MCOUNT_SORT
......
......@@ -1526,7 +1526,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
/*
* Entry handling for valid #PF from kernel mode is slightly
* different: RCU is already watching and rcu_irq_enter() must not
* different: RCU is already watching and ct_irq_enter() must not
* be invoked because a kernel fault on a user space address might
* sleep.
*
......
......@@ -33,7 +33,7 @@ config XTENSA
select HAVE_ARCH_KCSAN
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_CONTEXT_TRACKING
select HAVE_CONTEXT_TRACKING_USER
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_EXIT_THREAD
......
......@@ -455,10 +455,10 @@ KABI_W or a3, a3, a2
abi_call trace_hardirqs_off
1:
#endif
#ifdef CONFIG_CONTEXT_TRACKING
#ifdef CONFIG_CONTEXT_TRACKING_USER
l32i abi_tmp0, a1, PT_PS
bbci.l abi_tmp0, PS_UM_BIT, 1f
abi_call context_tracking_user_exit
abi_call user_exit_callable
1:
#endif
......@@ -544,8 +544,8 @@ common_exception_return:
j .Lrestore_state
.Lexit_tif_loop_user:
#ifdef CONFIG_CONTEXT_TRACKING
abi_call context_tracking_user_enter
#ifdef CONFIG_CONTEXT_TRACKING_USER
abi_call user_enter_callable
#endif
#ifdef CONFIG_HAVE_HW_BREAKPOINT
_bbci.l abi_saved0, TIF_DB_DISABLED, 1f
......
......@@ -23,6 +23,7 @@
#include <linux/minmax.h>
#include <linux/perf_event.h>
#include <acpi/processor.h>
#include <linux/context_tracking.h>
/*
* Include the apic definitions for x86 to have the APIC timer related defines
......@@ -647,11 +648,11 @@ static int __cpuidle acpi_idle_enter_bm(struct cpuidle_driver *drv,
raw_spin_unlock(&c3_lock);
}
rcu_idle_enter();
ct_idle_enter();
acpi_idle_do_entry(cx);
rcu_idle_exit();
ct_idle_exit();
/* Re-enable bus master arbitration */
if (dis_bm) {
......
......@@ -69,12 +69,12 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
return -1;
/* Do runtime PM to manage a hierarchical CPU toplogy. */
rcu_irq_enter_irqson();
ct_irq_enter_irqson();
if (s2idle)
dev_pm_genpd_suspend(pd_dev);
else
pm_runtime_put_sync_suspend(pd_dev);
rcu_irq_exit_irqson();
ct_irq_exit_irqson();
state = psci_get_domain_state();
if (!state)
......@@ -82,12 +82,12 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
ret = psci_cpu_suspend_enter(state) ? -1 : idx;
rcu_irq_enter_irqson();
ct_irq_enter_irqson();
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
pm_runtime_get_sync(pd_dev);
rcu_irq_exit_irqson();
ct_irq_exit_irqson();
cpu_pm_exit();
......
......@@ -116,12 +116,12 @@ static int __sbi_enter_domain_idle_state(struct cpuidle_device *dev,
return -1;
/* Do runtime PM to manage a hierarchical CPU toplogy. */
rcu_irq_enter_irqson();
ct_irq_enter_irqson();
if (s2idle)
dev_pm_genpd_suspend(pd_dev);
else
pm_runtime_put_sync_suspend(pd_dev);
rcu_irq_exit_irqson();
ct_irq_exit_irqson();
if (sbi_is_domain_state_available())
state = sbi_get_domain_state();
......@@ -130,12 +130,12 @@ static int __sbi_enter_domain_idle_state(struct cpuidle_device *dev,
ret = sbi_suspend(state) ? -1 : idx;
rcu_irq_enter_irqson();
ct_irq_enter_irqson();
if (s2idle)
dev_pm_genpd_resume(pd_dev);
else
pm_runtime_get_sync(pd_dev);
rcu_irq_exit_irqson();
ct_irq_exit_irqson();
cpu_pm_exit();
......
......@@ -23,6 +23,7 @@
#include <linux/suspend.h>
#include <linux/tick.h>
#include <linux/mmu_context.h>
#include <linux/context_tracking.h>
#include <trace/events/power.h>
#include "cpuidle.h"
......@@ -150,12 +151,12 @@ static void enter_s2idle_proper(struct cpuidle_driver *drv,
*/
stop_critical_timings();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_enter();
ct_idle_enter();
target_state->enter_s2idle(dev, drv, index);
if (WARN_ON_ONCE(!irqs_disabled()))
local_irq_disable();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_exit();
ct_idle_exit();
tick_unfreeze();
start_critical_timings();
......@@ -233,10 +234,10 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
stop_critical_timings();
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_enter();
ct_idle_enter();
entered_state = target_state->enter(dev, drv, index);
if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
rcu_idle_exit();
ct_idle_exit();
start_critical_timings();
sched_clock_idle_wakeup_event();
......
......@@ -10,71 +10,72 @@
#include <asm/ptrace.h>
#ifdef CONFIG_CONTEXT_TRACKING
extern void context_tracking_cpu_set(int cpu);
#ifdef CONFIG_CONTEXT_TRACKING_USER
extern void ct_cpu_track_user(int cpu);
/* Called with interrupts disabled. */
extern void __context_tracking_enter(enum ctx_state state);
extern void __context_tracking_exit(enum ctx_state state);
extern void __ct_user_enter(enum ctx_state state);
extern void __ct_user_exit(enum ctx_state state);
extern void context_tracking_enter(enum ctx_state state);
extern void context_tracking_exit(enum ctx_state state);
extern void context_tracking_user_enter(void);
extern void context_tracking_user_exit(void);
extern void ct_user_enter(enum ctx_state state);
extern void ct_user_exit(enum ctx_state state);
extern void user_enter_callable(void);
extern void user_exit_callable(void);
static inline void user_enter(void)
{
if (context_tracking_enabled())
context_tracking_enter(CONTEXT_USER);
ct_user_enter(CONTEXT_USER);
}
static inline void user_exit(void)
{
if (context_tracking_enabled())
context_tracking_exit(CONTEXT_USER);
ct_user_exit(CONTEXT_USER);
}
/* Called with interrupts disabled. */
static __always_inline void user_enter_irqoff(void)
{
if (context_tracking_enabled())
__context_tracking_enter(CONTEXT_USER);
__ct_user_enter(CONTEXT_USER);
}
static __always_inline void user_exit_irqoff(void)
{
if (context_tracking_enabled())
__context_tracking_exit(CONTEXT_USER);
__ct_user_exit(CONTEXT_USER);
}
static inline enum ctx_state exception_enter(void)
{
enum ctx_state prev_ctx;
if (IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK) ||
if (IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK) ||
!context_tracking_enabled())
return 0;
prev_ctx = this_cpu_read(context_tracking.state);
prev_ctx = __ct_state();
if (prev_ctx != CONTEXT_KERNEL)
context_tracking_exit(prev_ctx);
ct_user_exit(prev_ctx);
return prev_ctx;
}
static inline void exception_exit(enum ctx_state prev_ctx)
{
if (!IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK) &&
if (!IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK) &&
context_tracking_enabled()) {
if (prev_ctx != CONTEXT_KERNEL)
context_tracking_enter(prev_ctx);
ct_user_enter(prev_ctx);
}
}
static __always_inline bool context_tracking_guest_enter(void)
{
if (context_tracking_enabled())
__context_tracking_enter(CONTEXT_GUEST);
__ct_user_enter(CONTEXT_GUEST);
return context_tracking_enabled_this_cpu();
}
......@@ -82,40 +83,56 @@ static __always_inline bool context_tracking_guest_enter(void)
static __always_inline void context_tracking_guest_exit(void)
{
if (context_tracking_enabled())
__context_tracking_exit(CONTEXT_GUEST);
__ct_user_exit(CONTEXT_GUEST);
}
/**
* ct_state() - return the current context tracking state if known
*
* Returns the current cpu's context tracking state if context tracking
* is enabled. If context tracking is disabled, returns
* CONTEXT_DISABLED. This should be used primarily for debugging.
*/
static __always_inline enum ctx_state ct_state(void)
{
return context_tracking_enabled() ?
this_cpu_read(context_tracking.state) : CONTEXT_DISABLED;
}
#define CT_WARN_ON(cond) WARN_ON(context_tracking_enabled() && (cond))
#else
static inline void user_enter(void) { }
static inline void user_exit(void) { }
static inline void user_enter_irqoff(void) { }
static inline void user_exit_irqoff(void) { }
static inline enum ctx_state exception_enter(void) { return 0; }
static inline int exception_enter(void) { return 0; }
static inline void exception_exit(enum ctx_state prev_ctx) { }
static inline enum ctx_state ct_state(void) { return CONTEXT_DISABLED; }
static inline int ct_state(void) { return -1; }
static __always_inline bool context_tracking_guest_enter(void) { return false; }
static inline void context_tracking_guest_exit(void) { }
#define CT_WARN_ON(cond) do { } while (0)
#endif /* !CONFIG_CONTEXT_TRACKING_USER */
#endif /* !CONFIG_CONTEXT_TRACKING */
#define CT_WARN_ON(cond) WARN_ON(context_tracking_enabled() && (cond))
#ifdef CONFIG_CONTEXT_TRACKING_FORCE
#ifdef CONFIG_CONTEXT_TRACKING_USER_FORCE
extern void context_tracking_init(void);
#else
static inline void context_tracking_init(void) { }
#endif /* CONFIG_CONTEXT_TRACKING_FORCE */
#endif /* CONFIG_CONTEXT_TRACKING_USER_FORCE */
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
extern void ct_idle_enter(void);
extern void ct_idle_exit(void);
/*
* Is the current CPU in an extended quiescent state?
*
* No ordering, as we are sampling CPU-local information.
*/
static __always_inline bool rcu_dynticks_curr_cpu_in_eqs(void)
{
return !(arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & RCU_DYNTICKS_IDX);
}
/*
* Increment the current CPU's context_tracking structure's ->state field
* with ordering. Return the new value.
*/
static __always_inline unsigned long ct_state_inc(int incby)
{
return arch_atomic_add_return(incby, this_cpu_ptr(&context_tracking.state));
}
#else
static inline void ct_idle_enter(void) { }
static inline void ct_idle_exit(void) { }
#endif /* !CONFIG_CONTEXT_TRACKING_IDLE */
#endif
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_CONTEXT_TRACKING_IRQ_H
#define _LINUX_CONTEXT_TRACKING_IRQ_H
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
void ct_irq_enter(void);
void ct_irq_exit(void);
void ct_irq_enter_irqson(void);
void ct_irq_exit_irqson(void);
void ct_nmi_enter(void);
void ct_nmi_exit(void);
#else
static inline void ct_irq_enter(void) { }
static inline void ct_irq_exit(void) { }
static inline void ct_irq_enter_irqson(void) { }
static inline void ct_irq_exit_irqson(void) { }
static inline void ct_nmi_enter(void) { }
static inline void ct_nmi_exit(void) { }
#endif
#endif
......@@ -4,8 +4,28 @@
#include <linux/percpu.h>
#include <linux/static_key.h>
#include <linux/context_tracking_irq.h>
/* Offset to allow distinguishing irq vs. task-based idle entry/exit. */
#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1)
enum ctx_state {
CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */
CONTEXT_KERNEL = 0,
CONTEXT_IDLE = 1,
CONTEXT_USER = 2,
CONTEXT_GUEST = 3,
CONTEXT_MAX = 4,
};
/* Even value for idle, else odd. */
#define RCU_DYNTICKS_IDX CONTEXT_MAX
#define CT_STATE_MASK (CONTEXT_MAX - 1)
#define CT_DYNTICKS_MASK (~CT_STATE_MASK)
struct context_tracking {
#ifdef CONFIG_CONTEXT_TRACKING_USER
/*
* When active is false, probes are unset in order
* to minimize overhead: TIF flags are cleared
......@@ -14,18 +34,73 @@ struct context_tracking {
*/
bool active;
int recursion;
enum ctx_state {
CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */
CONTEXT_KERNEL = 0,
CONTEXT_USER,
CONTEXT_GUEST,
} state;
#endif
#ifdef CONFIG_CONTEXT_TRACKING
atomic_t state;
#endif
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
long dynticks_nesting; /* Track process nesting level. */
long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */
#endif
};
#ifdef CONFIG_CONTEXT_TRACKING
extern struct static_key_false context_tracking_key;
DECLARE_PER_CPU(struct context_tracking, context_tracking);
static __always_inline int __ct_state(void)
{
return arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK;
}
#endif
#ifdef CONFIG_CONTEXT_TRACKING_IDLE
static __always_inline int ct_dynticks(void)
{
return atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_DYNTICKS_MASK;
}
static __always_inline int ct_dynticks_cpu(int cpu)
{
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
return atomic_read(&ct->state) & CT_DYNTICKS_MASK;
}
static __always_inline int ct_dynticks_cpu_acquire(int cpu)
{
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
return atomic_read_acquire(&ct->state) & CT_DYNTICKS_MASK;
}
static __always_inline long ct_dynticks_nesting(void)
{
return __this_cpu_read(context_tracking.dynticks_nesting);
}
static __always_inline long ct_dynticks_nesting_cpu(int cpu)
{
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
return ct->dynticks_nesting;
}
static __always_inline long ct_dynticks_nmi_nesting(void)
{
return __this_cpu_read(context_tracking.dynticks_nmi_nesting);
}
static __always_inline long ct_dynticks_nmi_nesting_cpu(int cpu)
{
struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu);
return ct->dynticks_nmi_nesting;
}
#endif /* #ifdef CONFIG_CONTEXT_TRACKING_IDLE */
#ifdef CONFIG_CONTEXT_TRACKING_USER
extern struct static_key_false context_tracking_key;
static __always_inline bool context_tracking_enabled(void)
{
return static_branch_unlikely(&context_tracking_key);
......@@ -41,15 +116,31 @@ static inline bool context_tracking_enabled_this_cpu(void)
return context_tracking_enabled() && __this_cpu_read(context_tracking.active);
}
static __always_inline bool context_tracking_in_user(void)
/**
* ct_state() - return the current context tracking state if known
*
* Returns the current cpu's context tracking state if context tracking
* is enabled. If context tracking is disabled, returns
* CONTEXT_DISABLED. This should be used primarily for debugging.
*/
static __always_inline int ct_state(void)
{
return __this_cpu_read(context_tracking.state) == CONTEXT_USER;
int ret;
if (!context_tracking_enabled())
return CONTEXT_DISABLED;
preempt_disable();
ret = __ct_state();
preempt_enable();
return ret;
}
#else
static __always_inline bool context_tracking_in_user(void) { return false; }
static __always_inline bool context_tracking_enabled(void) { return false; }
static __always_inline bool context_tracking_enabled_cpu(int cpu) { return false; }
static __always_inline bool context_tracking_enabled_this_cpu(void) { return false; }
#endif /* CONFIG_CONTEXT_TRACKING */
#endif /* CONFIG_CONTEXT_TRACKING_USER */
#endif
......@@ -357,7 +357,7 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs);
/**
* struct irqentry_state - Opaque object for exception state storage
* @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the
* exit path has to invoke rcu_irq_exit().
* exit path has to invoke ct_irq_exit().
* @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that
* lockdep state is restored correctly on exit from nmi.
*
......@@ -395,12 +395,12 @@ typedef struct irqentry_state {
*
* For kernel mode entries RCU handling is done conditional. If RCU is
* watching then the only RCU requirement is to check whether the tick has
* to be restarted. If RCU is not watching then rcu_irq_enter() has to be
* invoked on entry and rcu_irq_exit() on exit.
* to be restarted. If RCU is not watching then ct_irq_enter() has to be
* invoked on entry and ct_irq_exit() on exit.
*
* Avoiding the rcu_irq_enter/exit() calls is an optimization but also
* Avoiding the ct_irq_enter/exit() calls is an optimization but also
* solves the problem of kernel mode pagefaults which can schedule, which
* is not possible after invoking rcu_irq_enter() without undoing it.
* is not possible after invoking ct_irq_enter() without undoing it.
*
* For user mode entries irqentry_enter_from_user_mode() is invoked to
* establish the proper context for NOHZ_FULL. Otherwise scheduling on exit
......
......@@ -92,14 +92,6 @@ void irq_exit_rcu(void);
#define arch_nmi_exit() do { } while (0)
#endif
#ifdef CONFIG_TINY_RCU
static inline void rcu_nmi_enter(void) { }
static inline void rcu_nmi_exit(void) { }
#else
extern void rcu_nmi_enter(void);
extern void rcu_nmi_exit(void);
#endif
/*
* NMI vs Tracing
* --------------
......@@ -124,7 +116,7 @@ extern void rcu_nmi_exit(void);
do { \
__nmi_enter(); \
lockdep_hardirq_enter(); \
rcu_nmi_enter(); \
ct_nmi_enter(); \
instrumentation_begin(); \
ftrace_nmi_enter(); \
instrumentation_end(); \
......@@ -143,7 +135,7 @@ extern void rcu_nmi_exit(void);
instrumentation_begin(); \
ftrace_nmi_exit(); \
instrumentation_end(); \
rcu_nmi_exit(); \
ct_nmi_exit(); \
lockdep_hardirq_exit(); \
__nmi_exit(); \
} while (0)
......
......@@ -29,6 +29,7 @@
#include <linux/lockdep.h>
#include <asm/processor.h>
#include <linux/cpumask.h>
#include <linux/context_tracking_irq.h>
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
......@@ -41,6 +42,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
void rcu_barrier_tasks(void);
void rcu_barrier_tasks_rude(void);
void synchronize_rcu(void);
unsigned long get_completed_synchronize_rcu(void);
#ifdef CONFIG_PREEMPT_RCU
......@@ -103,13 +105,11 @@ static inline void rcu_sysrq_start(void) { }
static inline void rcu_sysrq_end(void) { }
#endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */
#ifdef CONFIG_NO_HZ_FULL
void rcu_user_enter(void);
void rcu_user_exit(void);
#if defined(CONFIG_NO_HZ_FULL) && (!defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK))
void rcu_irq_work_resched(void);
#else
static inline void rcu_user_enter(void) { }
static inline void rcu_user_exit(void) { }
#endif /* CONFIG_NO_HZ_FULL */
static inline void rcu_irq_work_resched(void) { }
#endif
#ifdef CONFIG_RCU_NOCB_CPU
void rcu_init_nohz(void);
......@@ -128,7 +128,7 @@ static inline void rcu_nocb_flush_deferred_wakeup(void) { }
* @a: Code that RCU needs to pay attention to.
*
* RCU read-side critical sections are forbidden in the inner idle loop,
* that is, between the rcu_idle_enter() and the rcu_idle_exit() -- RCU
* that is, between the ct_idle_enter() and the ct_idle_exit() -- RCU
* will happily ignore any such read-side critical sections. However,
* things like powertop need tracepoints in the inner idle loop.
*
......@@ -143,9 +143,9 @@ static inline void rcu_nocb_flush_deferred_wakeup(void) { }
*/
#define RCU_NONIDLE(a) \
do { \
rcu_irq_enter_irqson(); \
ct_irq_enter_irqson(); \
do { a; } while (0); \
rcu_irq_exit_irqson(); \
ct_irq_exit_irqson(); \
} while (0)
/*
......@@ -169,13 +169,24 @@ void synchronize_rcu_tasks(void);
# endif
# ifdef CONFIG_TASKS_TRACE_RCU
# define rcu_tasks_trace_qs(t) \
do { \
if (!likely(READ_ONCE((t)->trc_reader_checked)) && \
!unlikely(READ_ONCE((t)->trc_reader_nesting))) { \
smp_store_release(&(t)->trc_reader_checked, true); \
smp_mb(); /* Readers partitioned by store. */ \
} \
// Bits for ->trc_reader_special.b.need_qs field.
#define TRC_NEED_QS 0x1 // Task needs a quiescent state.
#define TRC_NEED_QS_CHECKED 0x2 // Task has been checked for needing quiescent state.
u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new);
void rcu_tasks_trace_qs_blkd(struct task_struct *t);
# define rcu_tasks_trace_qs(t) \
do { \
int ___rttq_nesting = READ_ONCE((t)->trc_reader_nesting); \
\
if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) && \
likely(!___rttq_nesting)) { \
rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED); \
} else if (___rttq_nesting && ___rttq_nesting != INT_MIN && \
!READ_ONCE((t)->trc_reader_special.b.blocked)) { \
rcu_tasks_trace_qs_blkd(t); \
} \
} while (0)
# else
# define rcu_tasks_trace_qs(t) do { } while (0)
......@@ -184,7 +195,7 @@ void synchronize_rcu_tasks(void);
#define rcu_tasks_qs(t, preempt) \
do { \
rcu_tasks_classic_qs((t), (preempt)); \
rcu_tasks_trace_qs((t)); \
rcu_tasks_trace_qs(t); \
} while (0)
# ifdef CONFIG_TASKS_RUDE_RCU
......
......@@ -75,7 +75,7 @@ static inline void rcu_read_unlock_trace(void)
nesting = READ_ONCE(t->trc_reader_nesting) - 1;
barrier(); // Critical section before disabling.
// Disable IPI-based setting of .need_qs.
WRITE_ONCE(t->trc_reader_nesting, INT_MIN);
WRITE_ONCE(t->trc_reader_nesting, INT_MIN + nesting);
if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) {
WRITE_ONCE(t->trc_reader_nesting, nesting);
return; // We assume shallow reader nesting.
......
......@@ -23,6 +23,16 @@ static inline void cond_synchronize_rcu(unsigned long oldstate)
might_sleep();
}
static inline unsigned long start_poll_synchronize_rcu_expedited(void)
{
return start_poll_synchronize_rcu();
}
static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
{
cond_synchronize_rcu(oldstate);
}
extern void rcu_barrier(void);
static inline void synchronize_rcu_expedited(void)
......@@ -38,7 +48,7 @@ static inline void synchronize_rcu_expedited(void)
*/
extern void kvfree(const void *addr);
static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
if (head) {
call_rcu(head, func);
......@@ -51,6 +61,15 @@ static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
kvfree((void *) func);
}
#ifdef CONFIG_KASAN_GENERIC
void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func);
#else
static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
__kvfree_call_rcu(head, func);
}
#endif
void rcu_qs(void);
static inline void rcu_softirq_qs(void)
......@@ -76,12 +95,6 @@ static inline int rcu_needs_cpu(void)
static inline void rcu_virt_note_context_switch(int cpu) { }
static inline void rcu_cpu_stall_reset(void) { }
static inline int rcu_jiffies_till_stall_check(void) { return 21 * HZ; }
static inline void rcu_idle_enter(void) { }
static inline void rcu_idle_exit(void) { }
static inline void rcu_irq_enter(void) { }
static inline void rcu_irq_exit_irqson(void) { }
static inline void rcu_irq_enter_irqson(void) { }
static inline void rcu_irq_exit(void) { }
static inline void rcu_irq_exit_check_preempt(void) { }
#define rcu_is_idle_cpu(cpu) \
(is_idle_task(current) && !in_nmi() && !in_hardirq() && !in_serving_softirq())
......
......@@ -40,17 +40,13 @@ bool rcu_eqs_special_set(int cpu);
void rcu_momentary_dyntick_idle(void);
void kfree_rcu_scheduler_running(void);
bool rcu_gp_might_be_stalled(void);
unsigned long start_poll_synchronize_rcu_expedited(void);
void cond_synchronize_rcu_expedited(unsigned long oldstate);
unsigned long get_state_synchronize_rcu(void);
unsigned long start_poll_synchronize_rcu(void);
bool poll_state_synchronize_rcu(unsigned long oldstate);
void cond_synchronize_rcu(unsigned long oldstate);
void rcu_idle_enter(void);
void rcu_idle_exit(void);
void rcu_irq_enter(void);
void rcu_irq_exit(void);
void rcu_irq_enter_irqson(void);
void rcu_irq_exit_irqson(void);
bool rcu_is_idle_cpu(int cpu);
#ifdef CONFIG_PROVE_RCU
......@@ -59,6 +55,9 @@ void rcu_irq_exit_check_preempt(void);
static inline void rcu_irq_exit_check_preempt(void) { }
#endif
struct task_struct;
void rcu_preempt_deferred_qs(struct task_struct *t);
void exit_rcu(void);
void rcu_scheduler_starting(void);
......
......@@ -843,8 +843,9 @@ struct task_struct {
int trc_reader_nesting;
int trc_ipi_to_cpu;
union rcu_special trc_reader_special;
bool trc_reader_checked;
struct list_head trc_holdout_list;
struct list_head trc_blkd_node;
int trc_blkd_cpu;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
struct sched_info sched_info;
......@@ -2223,6 +2224,7 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
extern bool sched_task_on_rq(struct task_struct *p);
extern unsigned long get_wchan(struct task_struct *p);
extern struct task_struct *cpu_curr_snapshot(int cpu);
/*
* In order to reduce various lock holder preemption latencies provide an
......
......@@ -200,13 +200,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
*/ \
if (rcuidle) { \
__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
rcu_irq_enter_irqson(); \
ct_irq_enter_irqson(); \
} \
\
__DO_TRACE_CALL(name, TP_ARGS(args)); \
\
if (rcuidle) { \
rcu_irq_exit_irqson(); \
ct_irq_exit_irqson(); \
srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
} \
\
......
......@@ -494,11 +494,11 @@ config VIRT_CPU_ACCOUNTING_NATIVE
config VIRT_CPU_ACCOUNTING_GEN
bool "Full dynticks CPU time accounting"
depends on HAVE_CONTEXT_TRACKING
depends on HAVE_CONTEXT_TRACKING_USER
depends on HAVE_VIRT_CPU_ACCOUNTING_GEN
depends on GENERIC_CLOCKEVENTS
select VIRT_CPU_ACCOUNTING
select CONTEXT_TRACKING
select CONTEXT_TRACKING_USER
help
Select this option to enable task and CPU time accounting on full
dynticks systems. This accounting is implemented by watching every
......
......@@ -157,6 +157,7 @@ struct task_struct init_task
.trc_reader_nesting = 0,
.trc_reader_special.s = 0,
.trc_holdout_list = LIST_HEAD_INIT(init_task.trc_holdout_list),
.trc_blkd_node = LIST_HEAD_INIT(init_task.trc_blkd_node),
#endif
#ifdef CONFIG_CPUSETS
.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
......
......@@ -295,7 +295,7 @@ static inline cfi_check_fn find_check_fn(unsigned long ptr)
rcu_idle = !rcu_is_watching();
if (rcu_idle) {
local_irq_save(flags);
rcu_irq_enter();
ct_irq_enter();
}
if (IS_ENABLED(CONFIG_CFI_CLANG_SHADOW))
......@@ -304,7 +304,7 @@ static inline cfi_check_fn find_check_fn(unsigned long ptr)
fn = find_module_check_fn(ptr);
if (rcu_idle) {
rcu_irq_exit();
ct_irq_exit();
local_irq_restore(flags);
}
......
This diff is collapsed.
......@@ -35,11 +35,11 @@ static int cpu_pm_notify(enum cpu_pm_event event)
* disfunctional in cpu idle. Copy RCU_NONIDLE code to let RCU know
* this.
*/
rcu_irq_enter_irqson();
ct_irq_enter_irqson();
rcu_read_lock();
ret = raw_notifier_call_chain(&cpu_pm_notifier.chain, event, NULL);
rcu_read_unlock();
rcu_irq_exit_irqson();
ct_irq_exit_irqson();
return notifier_to_errno(ret);
}
......@@ -49,11 +49,11 @@ static int cpu_pm_notify_robust(enum cpu_pm_event event_up, enum cpu_pm_event ev
unsigned long flags;
int ret;
rcu_irq_enter_irqson();
ct_irq_enter_irqson();
raw_spin_lock_irqsave(&cpu_pm_notifier.lock, flags);
ret = raw_notifier_call_chain_robust(&cpu_pm_notifier.chain, event_up, event_down, NULL);
raw_spin_unlock_irqrestore(&cpu_pm_notifier.lock, flags);
rcu_irq_exit_irqson();
ct_irq_exit_irqson();
return notifier_to_errno(ret);
}
......
......@@ -321,7 +321,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
}
/*
* If this entry hit the idle task invoke rcu_irq_enter() whether
* If this entry hit the idle task invoke ct_irq_enter() whether
* RCU is watching or not.
*
* Interrupts can nest when the first interrupt invokes softirq
......@@ -332,12 +332,12 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
* not nested into another interrupt.
*
* Checking for rcu_is_watching() here would prevent the nesting
* interrupt to invoke rcu_irq_enter(). If that nested interrupt is
* interrupt to invoke ct_irq_enter(). If that nested interrupt is
* the tick then rcu_flavor_sched_clock_irq() would wrongfully
* assume that it is the first interrupt and eventually claim
* quiescent state and end grace periods prematurely.
*
* Unconditionally invoke rcu_irq_enter() so RCU state stays
* Unconditionally invoke ct_irq_enter() so RCU state stays
* consistent.
*
* TINY_RCU does not support EQS, so let the compiler eliminate
......@@ -350,7 +350,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
* as in irqentry_enter_from_user_mode().
*/
lockdep_hardirqs_off(CALLER_ADDR0);
rcu_irq_enter();
ct_irq_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
instrumentation_end();
......@@ -418,7 +418,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare();
instrumentation_end();
rcu_irq_exit();
ct_irq_exit();
lockdep_hardirqs_on(CALLER_ADDR0);
return;
}
......@@ -436,7 +436,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
* was not watching on entry.
*/
if (state.exit_rcu)
rcu_irq_exit();
ct_irq_exit();
}
}
......@@ -449,7 +449,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
__nmi_enter();
lockdep_hardirqs_off(CALLER_ADDR0);
lockdep_hardirq_enter();
rcu_nmi_enter();
ct_nmi_enter();
instrumentation_begin();
trace_hardirqs_off_finish();
......@@ -469,7 +469,7 @@ void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
}
instrumentation_end();
rcu_nmi_exit();
ct_nmi_exit();
lockdep_hardirq_exit();
if (irq_state.lockdep)
lockdep_hardirqs_on(CALLER_ADDR0);
......
......@@ -114,7 +114,7 @@ int kernel_text_address(unsigned long addr)
/* Treat this like an NMI as it can happen anywhere */
if (no_rcu)
rcu_nmi_enter();
ct_nmi_enter();
if (is_module_text_address(addr))
goto out;
......@@ -127,7 +127,7 @@ int kernel_text_address(unsigned long addr)
ret = 0;
out:
if (no_rcu)
rcu_nmi_exit();
ct_nmi_exit();
return ret;
}
......
......@@ -1814,6 +1814,7 @@ static inline void rcu_copy_process(struct task_struct *p)
p->trc_reader_nesting = 0;
p->trc_reader_special.s = 0;
INIT_LIST_HEAD(&p->trc_holdout_list);
INIT_LIST_HEAD(&p->trc_blkd_node);
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
}
......
......@@ -6571,7 +6571,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
/*
* If a CPU is in the RCU-free window in idle (ie: in the section
* between rcu_idle_enter() and rcu_idle_exit(), then RCU
* between ct_idle_enter() and ct_idle_exit(), then RCU
* considers that CPU to be in an "extended quiescent state",
* which means that RCU will be completely ignoring that CPU.
* Therefore, rcu_read_lock() and friends have absolutely no
......
......@@ -8,6 +8,8 @@ menu "RCU Subsystem"
config TREE_RCU
bool
default y if SMP
# Dynticks-idle tracking
select CONTEXT_TRACKING_IDLE
help
This option selects the RCU implementation that is
designed for very large SMP system with hundreds or
......@@ -262,6 +264,35 @@ config RCU_NOCB_CPU
Say Y here if you need reduced OS jitter, despite added overhead.
Say N here if you are unsure.
config RCU_NOCB_CPU_DEFAULT_ALL
bool "Offload RCU callback processing from all CPUs by default"
depends on RCU_NOCB_CPU
default n
help
Use this option to offload callback processing from all CPUs
by default, in the absence of the rcu_nocbs or nohz_full boot
parameter. This also avoids the need to use any boot parameters
to achieve the effect of offloading all CPUs on boot.
Say Y here if you want offload all CPUs by default on boot.
Say N here if you are unsure.
config RCU_NOCB_CPU_CB_BOOST
bool "Offload RCU callback from real-time kthread"
depends on RCU_NOCB_CPU && RCU_BOOST
default y if PREEMPT_RT
help
Use this option to invoke offloaded callbacks as SCHED_FIFO
to avoid starvation by heavy SCHED_OTHER background load.
Of course, running as SCHED_FIFO during callback floods will
cause the rcuo[ps] kthreads to monopolize the CPU for hundreds
of milliseconds or more. Therefore, when enabling this option,
it is your responsibility to ensure that latency-sensitive
tasks either run with higher priority or run on some other CPU.
Say Y here if you want to set RT priority for offloading kthreads.
Say N here if you are building a !PREEMPT_RT kernel and are unsure.
config TASKS_TRACE_RCU_READ_MB
bool "Tasks Trace RCU readers use memory barriers in user and idle"
depends on RCU_EXPERT && TASKS_TRACE_RCU
......
......@@ -121,7 +121,7 @@ config RCU_EQS_DEBUG
config RCU_STRICT_GRACE_PERIOD
bool "Provide debug RCU implementation with short grace periods"
depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4
depends on DEBUG_KERNEL && RCU_EXPERT && NR_CPUS <= 4 && !TINY_RCU
default n
select PREEMPT_COUNT if PREEMPT=n
help
......
......@@ -12,10 +12,6 @@
#include <trace/events/rcu.h>
/* Offset to allow distinguishing irq vs. task-based idle entry/exit. */
#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1)
/*
* Grace-period counter management.
*/
......@@ -23,6 +19,9 @@
#define RCU_SEQ_CTR_SHIFT 2
#define RCU_SEQ_STATE_MASK ((1 << RCU_SEQ_CTR_SHIFT) - 1)
/* Low-order bit definition for polled grace-period APIs. */
#define RCU_GET_STATE_COMPLETED 0x1
extern int sysctl_sched_rt_runtime;
/*
......@@ -119,6 +118,18 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s)
return ULONG_CMP_GE(READ_ONCE(*sp), s);
}
/*
* Given a snapshot from rcu_seq_snap(), determine whether or not a
* full update-side operation has occurred, but do not allow the
* (ULONG_MAX / 2) safety-factor/guard-band.
*/
static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
{
unsigned long cur_s = READ_ONCE(*sp);
return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
}
/*
* Has a grace period completed since the time the old gp_seq was collected?
*/
......
......@@ -419,6 +419,7 @@ rcu_scale_writer(void *arg)
VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started");
WARN_ON(!wdpp);
set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
current->flags |= PF_NO_SETAFFINITY;
sched_set_fifo_low(current);
if (holdoff)
......
This diff is collapsed.
......@@ -385,7 +385,7 @@ static struct ref_scale_ops rwsem_ops = {
};
// Definitions for global spinlock
static DEFINE_SPINLOCK(test_lock);
static DEFINE_RAW_SPINLOCK(test_lock);
static void ref_lock_section(const int nloops)
{
......@@ -393,8 +393,8 @@ static void ref_lock_section(const int nloops)
preempt_disable();
for (i = nloops; i >= 0; i--) {
spin_lock(&test_lock);
spin_unlock(&test_lock);
raw_spin_lock(&test_lock);
raw_spin_unlock(&test_lock);
}
preempt_enable();
}
......@@ -405,9 +405,9 @@ static void ref_lock_delay_section(const int nloops, const int udl, const int nd
preempt_disable();
for (i = nloops; i >= 0; i--) {
spin_lock(&test_lock);
raw_spin_lock(&test_lock);
un_delay(udl, ndl);
spin_unlock(&test_lock);
raw_spin_unlock(&test_lock);
}
preempt_enable();
}
......@@ -427,8 +427,8 @@ static void ref_lock_irq_section(const int nloops)
preempt_disable();
for (i = nloops; i >= 0; i--) {
spin_lock_irqsave(&test_lock, flags);
spin_unlock_irqrestore(&test_lock, flags);
raw_spin_lock_irqsave(&test_lock, flags);
raw_spin_unlock_irqrestore(&test_lock, flags);
}
preempt_enable();
}
......@@ -440,9 +440,9 @@ static void ref_lock_irq_delay_section(const int nloops, const int udl, const in
preempt_disable();
for (i = nloops; i >= 0; i--) {
spin_lock_irqsave(&test_lock, flags);
raw_spin_lock_irqsave(&test_lock, flags);
un_delay(udl, ndl);
spin_unlock_irqrestore(&test_lock, flags);
raw_spin_unlock_irqrestore(&test_lock, flags);
}
preempt_enable();
}
......
This diff is collapsed.
......@@ -58,7 +58,7 @@ void rcu_qs(void)
rcu_ctrlblk.donetail = rcu_ctrlblk.curtail;
raise_softirq_irqoff(RCU_SOFTIRQ);
}
WRITE_ONCE(rcu_ctrlblk.gp_seq, rcu_ctrlblk.gp_seq + 1);
WRITE_ONCE(rcu_ctrlblk.gp_seq, rcu_ctrlblk.gp_seq + 2);
local_irq_restore(flags);
}
......@@ -139,8 +139,10 @@ static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused
/*
* Wait for a grace period to elapse. But it is illegal to invoke
* synchronize_rcu() from within an RCU read-side critical section.
* Therefore, any legal call to synchronize_rcu() is a quiescent
* state, and so on a UP system, synchronize_rcu() need do nothing.
* Therefore, any legal call to synchronize_rcu() is a quiescent state,
* and so on a UP system, synchronize_rcu() need do nothing, other than
* let the polled APIs know that another grace period elapsed.
*
* (But Lai Jiangshan points out the benefits of doing might_sleep()
* to reduce latency.)
*
......@@ -152,6 +154,7 @@ void synchronize_rcu(void)
lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_rcu() in RCU read-side critical section");
WRITE_ONCE(rcu_ctrlblk.gp_seq, rcu_ctrlblk.gp_seq + 2);
}
EXPORT_SYMBOL_GPL(synchronize_rcu);
......@@ -213,10 +216,24 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
*/
bool poll_state_synchronize_rcu(unsigned long oldstate)
{
return READ_ONCE(rcu_ctrlblk.gp_seq) != oldstate;
return oldstate == RCU_GET_STATE_COMPLETED || READ_ONCE(rcu_ctrlblk.gp_seq) != oldstate;
}
EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
#ifdef CONFIG_KASAN_GENERIC
void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
if (head) {
void *ptr = (void *) head - (unsigned long) func;
kasan_record_aux_stack_noalloc(ptr);
}
__kvfree_call_rcu(head, func);
}
EXPORT_SYMBOL_GPL(kvfree_call_rcu);
#endif
void __init rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
......
This diff is collapsed.
......@@ -133,6 +133,10 @@ struct rcu_node {
wait_queue_head_t exp_wq[4];
struct rcu_exp_work rew;
bool exp_need_flush; /* Need to flush workitem? */
raw_spinlock_t exp_poll_lock;
/* Lock and data for polled expedited grace periods. */
unsigned long exp_seq_poll_rq;
struct work_struct exp_poll_wq;
} ____cacheline_internodealigned_in_smp;
/*
......@@ -187,9 +191,6 @@ struct rcu_data {
/* 3) dynticks interface. */
int dynticks_snap; /* Per-GP tracking for dynticks. */
long dynticks_nesting; /* Track process nesting level. */
long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */
atomic_t dynticks; /* Even value for idle, else odd. */
bool rcu_need_heavy_qs; /* GP old, so heavy quiescent state! */
bool rcu_urgent_qs; /* GP old need light quiescent state. */
bool rcu_forced_tick; /* Forced tick to provide QS. */
......@@ -235,6 +236,7 @@ struct rcu_data {
* if rdp_gp.
*/
struct list_head nocb_entry_rdp; /* rcu_data node in wakeup chain. */
struct rcu_data *nocb_toggling_rdp; /* rdp queued for (de-)offloading */
/* The following fields are used by CB kthread, hence new cacheline. */
struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
......@@ -323,6 +325,9 @@ struct rcu_state {
short gp_state; /* GP kthread sleep state. */
unsigned long gp_wake_time; /* Last GP kthread wake. */
unsigned long gp_wake_seq; /* ->gp_seq at ^^^. */
unsigned long gp_seq_polled; /* GP seq for polled API. */
unsigned long gp_seq_polled_snap; /* ->gp_seq_polled at normal GP start. */
unsigned long gp_seq_polled_exp_snap; /* ->gp_seq_polled at expedited GP start. */
/* End of fields guarded by root rcu_node's lock. */
......@@ -425,12 +430,11 @@ static void rcu_flavor_sched_clock_irq(int user);
static void dump_blkd_tasks(struct rcu_node *rnp, int ncheck);
static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags);
static void rcu_preempt_boost_start_gp(struct rcu_node *rnp);
static bool rcu_is_callbacks_kthread(void);
static bool rcu_is_callbacks_kthread(struct rcu_data *rdp);
static void rcu_cpu_kthread_setup(unsigned int cpu);
static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp);
static bool rcu_preempt_has_tasks(struct rcu_node *rnp);
static bool rcu_preempt_need_deferred_qs(struct task_struct *t);
static void rcu_preempt_deferred_qs(struct task_struct *t);
static void zero_cpu_stall_ticks(struct rcu_data *rdp);
static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
......@@ -470,10 +474,6 @@ do { \
static void rcu_bind_gp_kthread(void);
static bool rcu_nohz_full_cpu(void);
static void rcu_dynticks_task_enter(void);
static void rcu_dynticks_task_exit(void);
static void rcu_dynticks_task_trace_enter(void);
static void rcu_dynticks_task_trace_exit(void);
/* Forward declarations for tree_stall.h */
static void record_gp_stall_check_time(void);
......@@ -481,3 +481,6 @@ static void rcu_iw_handler(struct irq_work *iwp);
static void check_cpu_stall(struct rcu_data *rdp);
static void rcu_check_gp_start_stall(struct rcu_node *rnp, struct rcu_data *rdp,
const unsigned long gpssdelay);
/* Forward declarations for tree_exp.h. */
static void sync_rcu_do_polled_gp(struct work_struct *wp);
......@@ -18,6 +18,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp);
static void rcu_exp_gp_seq_start(void)
{
rcu_seq_start(&rcu_state.expedited_sequence);
rcu_poll_gp_seq_start_unlocked(&rcu_state.gp_seq_polled_exp_snap);
}
/*
......@@ -34,6 +35,7 @@ static __maybe_unused unsigned long rcu_exp_gp_seq_endval(void)
*/
static void rcu_exp_gp_seq_end(void)
{
rcu_poll_gp_seq_end_unlocked(&rcu_state.gp_seq_polled_exp_snap);
rcu_seq_end(&rcu_state.expedited_sequence);
smp_mb(); /* Ensure that consecutive grace periods serialize. */
}
......@@ -356,7 +358,7 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp)
!(rnp->qsmaskinitnext & mask)) {
mask_ofl_test |= mask;
} else {
snap = rcu_dynticks_snap(rdp);
snap = rcu_dynticks_snap(cpu);
if (rcu_dynticks_in_eqs(snap))
mask_ofl_test |= mask;
else
......@@ -621,7 +623,6 @@ static void synchronize_rcu_expedited_wait(void)
return;
if (rcu_stall_is_suppressed())
continue;
panic_on_rcu_stall();
trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
rcu_state.name);
......@@ -636,10 +637,11 @@ static void synchronize_rcu_expedited_wait(void)
continue;
ndetected++;
rdp = per_cpu_ptr(&rcu_data, cpu);
pr_cont(" %d-%c%c%c", cpu,
pr_cont(" %d-%c%c%c%c", cpu,
"O."[!!cpu_online(cpu)],
"o."[!!(rdp->grpmask & rnp->expmaskinit)],
"N."[!!(rdp->grpmask & rnp->expmaskinitnext)]);
"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
"D."[!!(rdp->cpu_no_qs.b.exp)]);
}
}
pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
......@@ -669,6 +671,7 @@ static void synchronize_rcu_expedited_wait(void)
}
}
jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
panic_on_rcu_stall();
}
}
......@@ -913,8 +916,18 @@ void synchronize_rcu_expedited(void)
"Illegal synchronize_rcu_expedited() in RCU read-side critical section");
/* Is the state is such that the call is a grace period? */
if (rcu_blocking_is_gp())
return;
if (rcu_blocking_is_gp()) {
// Note well that this code runs with !PREEMPT && !SMP.
// In addition, all code that advances grace periods runs
// at process level. Therefore, this expedited GP overlaps
// with other expedited GPs only by being fully nested within
// them, which allows reuse of ->gp_seq_polled_exp_snap.
rcu_poll_gp_seq_start_unlocked(&rcu_state.gp_seq_polled_exp_snap);
rcu_poll_gp_seq_end_unlocked(&rcu_state.gp_seq_polled_exp_snap);
if (rcu_init_invoked())
cond_resched();
return; // Context allows vacuous grace periods.
}
/* If expedited grace periods are prohibited, fall back to normal. */
if (rcu_gp_is_normal()) {
......@@ -950,3 +963,93 @@ void synchronize_rcu_expedited(void)
synchronize_rcu_expedited_destroy_work(&rew);
}
EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
/*
* Ensure that start_poll_synchronize_rcu_expedited() has the expedited
* RCU grace periods that it needs.
*/
static void sync_rcu_do_polled_gp(struct work_struct *wp)
{
unsigned long flags;
int i = 0;
struct rcu_node *rnp = container_of(wp, struct rcu_node, exp_poll_wq);
unsigned long s;
raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
s = rnp->exp_seq_poll_rq;
rnp->exp_seq_poll_rq = RCU_GET_STATE_COMPLETED;
raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
if (s == RCU_GET_STATE_COMPLETED)
return;
while (!poll_state_synchronize_rcu(s)) {
synchronize_rcu_expedited();
if (i == 10 || i == 20)
pr_info("%s: i = %d s = %lx gp_seq_polled = %lx\n", __func__, i, s, READ_ONCE(rcu_state.gp_seq_polled));
i++;
}
raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
s = rnp->exp_seq_poll_rq;
if (poll_state_synchronize_rcu(s))
rnp->exp_seq_poll_rq = RCU_GET_STATE_COMPLETED;
raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
}
/**
* start_poll_synchronize_rcu_expedited - Snapshot current RCU state and start expedited grace period
*
* Returns a cookie to pass to a call to cond_synchronize_rcu(),
* cond_synchronize_rcu_expedited(), or poll_state_synchronize_rcu(),
* allowing them to determine whether or not any sort of grace period has
* elapsed in the meantime. If the needed expedited grace period is not
* already slated to start, initiates that grace period.
*/
unsigned long start_poll_synchronize_rcu_expedited(void)
{
unsigned long flags;
struct rcu_data *rdp;
struct rcu_node *rnp;
unsigned long s;
s = get_state_synchronize_rcu();
rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
rnp = rdp->mynode;
if (rcu_init_invoked())
raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
if (!poll_state_synchronize_rcu(s)) {
rnp->exp_seq_poll_rq = s;
if (rcu_init_invoked())
queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
}
if (rcu_init_invoked())
raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
return s;
}
EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_expedited);
/**
* cond_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
*
* @oldstate: value from get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or start_poll_synchronize_rcu_expedited()
*
* If any type of full RCU grace period has elapsed since the earlier
* call to get_state_synchronize_rcu(), start_poll_synchronize_rcu(),
* or start_poll_synchronize_rcu_expedited(), just return. Otherwise,
* invoke synchronize_rcu_expedited() to wait for a full grace period.
*
* Yes, this function does not take counter wrap into account.
* But counter wrap is harmless. If the counter wraps, we have waited for
* more than 2 billion grace periods (and way more on a 64-bit system!),
* so waiting for a couple of additional grace periods should be just fine.
*
* This function provides the same memory-ordering guarantees that
* would be provided by a synchronize_rcu() that was invoked at the call
* to the function that provided @oldstate and that returned at the end
* of this function.
*/
void cond_synchronize_rcu_expedited(unsigned long oldstate)
{
if (!poll_state_synchronize_rcu(oldstate))
synchronize_rcu_expedited();
}
EXPORT_SYMBOL_GPL(cond_synchronize_rcu_expedited);
This diff is collapsed.
......@@ -460,7 +460,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
* be quite short, for example, in the case of the call from
* rcu_read_unlock_special().
*/
static void
static notrace void
rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
{
bool empty_exp;
......@@ -581,7 +581,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
* is disabled. This function cannot be expected to understand these
* nuances, so the caller must handle them.
*/
static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
return (__this_cpu_read(rcu_data.cpu_no_qs.b.exp) ||
READ_ONCE(t->rcu_read_unlock_special.s)) &&
......@@ -595,7 +595,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
* evaluate safety in terms of interrupt, softirq, and preemption
* disabling.
*/
static void rcu_preempt_deferred_qs(struct task_struct *t)
notrace void rcu_preempt_deferred_qs(struct task_struct *t)
{
unsigned long flags;
......@@ -899,8 +899,8 @@ void rcu_note_context_switch(bool preempt)
this_cpu_write(rcu_data.rcu_urgent_qs, false);
if (unlikely(raw_cpu_read(rcu_data.rcu_need_heavy_qs)))
rcu_momentary_dyntick_idle();
rcu_tasks_qs(current, preempt);
out:
rcu_tasks_qs(current, preempt);
trace_rcu_utilization(TPS("End context switch"));
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
......@@ -926,7 +926,7 @@ static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
* Because there is no preemptible RCU, there can be no deferred quiescent
* states.
*/
static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
{
return false;
}
......@@ -935,7 +935,7 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t)
// period for a quiescent state from this CPU. Note that requests from
// tasks are handled when removing the task from the blocked-tasks list
// below.
static void rcu_preempt_deferred_qs(struct task_struct *t)
notrace void rcu_preempt_deferred_qs(struct task_struct *t)
{
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
......@@ -1012,6 +1012,25 @@ static void rcu_cpu_kthread_setup(unsigned int cpu)
WRITE_ONCE(rdp->rcuc_activity, jiffies);
}
static bool rcu_is_callbacks_nocb_kthread(struct rcu_data *rdp)
{
#ifdef CONFIG_RCU_NOCB_CPU
return rdp->nocb_cb_kthread == current;
#else
return false;
#endif
}
/*
* Is the current CPU running the RCU-callbacks kthread?
* Caller must have preemption disabled.
*/
static bool rcu_is_callbacks_kthread(struct rcu_data *rdp)
{
return rdp->rcu_cpu_kthread_task == current ||
rcu_is_callbacks_nocb_kthread(rdp);
}
#ifdef CONFIG_RCU_BOOST
/*
......@@ -1140,7 +1159,8 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
(rnp->gp_tasks != NULL &&
rnp->boost_tasks == NULL &&
rnp->qsmask == 0 &&
(!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld))) {
(!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld ||
IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD)))) {
if (rnp->exp_tasks == NULL)
WRITE_ONCE(rnp->boost_tasks, rnp->gp_tasks);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
......@@ -1151,15 +1171,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
}
}
/*
* Is the current CPU running the RCU-callbacks kthread?
* Caller must have preemption disabled.
*/
static bool rcu_is_callbacks_kthread(void)
{
return __this_cpu_read(rcu_data.rcu_cpu_kthread_task) == current;
}
#define RCU_BOOST_DELAY_JIFFIES DIV_ROUND_UP(CONFIG_RCU_BOOST_DELAY * HZ, 1000)
/*
......@@ -1242,11 +1253,6 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}
static bool rcu_is_callbacks_kthread(void)
{
return false;
}
static void rcu_preempt_boost_start_gp(struct rcu_node *rnp)
{
}
......@@ -1290,37 +1296,3 @@ static void rcu_bind_gp_kthread(void)
return;
housekeeping_affine(current, HK_TYPE_RCU);
}
/* Record the current task on dyntick-idle entry. */
static __always_inline void rcu_dynticks_task_enter(void)
{
#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
WRITE_ONCE(current->rcu_tasks_idle_cpu, smp_processor_id());
#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
}
/* Record no current task on dyntick-idle exit. */
static __always_inline void rcu_dynticks_task_exit(void)
{
#if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL)
WRITE_ONCE(current->rcu_tasks_idle_cpu, -1);
#endif /* #if defined(CONFIG_TASKS_RCU) && defined(CONFIG_NO_HZ_FULL) */
}
/* Turn on heavyweight RCU tasks trace readers on idle/user entry. */
static __always_inline void rcu_dynticks_task_trace_enter(void)
{
#ifdef CONFIG_TASKS_TRACE_RCU
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
current->trc_reader_special.b.need_mb = true;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
}
/* Turn off heavyweight RCU tasks trace readers on idle/user exit. */
static __always_inline void rcu_dynticks_task_trace_exit(void)
{
#ifdef CONFIG_TASKS_TRACE_RCU
if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB))
current->trc_reader_special.b.need_mb = false;
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
}
This diff is collapsed.
......@@ -85,7 +85,7 @@ module_param(rcu_normal_after_boot, int, 0444);
* and while lockdep is disabled.
*
* Note that if the CPU is in the idle loop from an RCU point of view (ie:
* that we are in the section between rcu_idle_enter() and rcu_idle_exit())
* that we are in the section between ct_idle_enter() and ct_idle_exit())
* then rcu_read_lock_held() sets ``*ret`` to false even if the CPU did an
* rcu_read_lock(). The reason for this is that RCU ignores CPUs that are
* in such a section, considering these as in extended quiescent state,
......@@ -516,6 +516,19 @@ int rcu_cpu_stall_suppress_at_boot __read_mostly; // !0 = suppress boot stalls.
EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress_at_boot);
module_param(rcu_cpu_stall_suppress_at_boot, int, 0444);
/**
* get_completed_synchronize_rcu - Return a pre-completed polled state cookie
*
* Returns a value that will always be treated by functions like
* poll_state_synchronize_rcu() as a cookie whose grace period has already
* completed.
*/
unsigned long get_completed_synchronize_rcu(void)
{
return RCU_GET_STATE_COMPLETED;
}
EXPORT_SYMBOL_GPL(get_completed_synchronize_rcu);
#ifdef CONFIG_PROVE_RCU
/*
......
This diff is collapsed.
This diff is collapsed.
......@@ -27,6 +27,7 @@
#include <linux/capability.h>
#include <linux/cgroup_api.h>
#include <linux/cgroup.h>
#include <linux/context_tracking.h>
#include <linux/cpufreq.h>
#include <linux/cpumask_api.h>
#include <linux/ctype.h>
......
......@@ -174,9 +174,9 @@ static int __init csdlock_debug(char *str)
if (val)
static_branch_enable(&csdlock_debug_enabled);
return 0;
return 1;
}
early_param("csdlock_debug", csdlock_debug);
__setup("csdlock_debug=", csdlock_debug);
static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
......
This diff is collapsed.
This diff is collapsed.
......@@ -570,7 +570,7 @@ void __init tick_nohz_init(void)
}
for_each_cpu(cpu, tick_nohz_full_mask)
context_tracking_cpu_set(cpu);
ct_cpu_track_user(cpu);
ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
"kernel/nohz:predown", NULL,
......
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment