Commit 890f2420 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'rcu.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU updates from Paul McKenney:

 - Documentation updates.

   This is the first in a series from an ongoing review of the RCU
   documentation. "Why are people thinking -that- about RCU? Oh. Because
   that is an entirely reasonable interpretation of its documentation."

 - Miscellaneous fixes.

 - Improved memory allocation and heuristics.

 - Improve rcu_nocbs diagnostic output.

 - Add full-sized polled RCU grace period state values.

   These are the same size as an rcu_head structure, which is double
   that of the traditional unsigned long state values that may still be
   obtained from et_state_synchronize_rcu(). The added size avoids
   missing overlapping grace periods. This benefit is that call_rcu()
   can be replaced by polling, which can be attractive in situations
   where RCU-protected data is aged out of memory.

   Early in the series, the size of this state value is three unsigned
   longs. Later in the series, the fastpaths in synchronize_rcu() and
   synchronize_rcu_expedited() are reworked to permit the full state to
   be represented by only two unsigned longs. This reworking slows these
   two functions down in SMP kernels running either on single-CPU
   systems or on systems with all but one CPU offlined, but this should
   not be a significant problem. And if it somehow becomes a problem in
   some yet-as-unforeseen situations, three-value state values can be
   provided for only those situations.

   Finally, a pair of functions named same_state_synchronize_rcu() and
   same_state_synchronize_rcu_full() allow grace-period state values to
   be compared for equality. This permits users to maintain lists of
   data structures having the same state value, removing the need for
   per-data-structure grace-period state values, thus decreasing memory
   footprint.

 - Polled SRCU grace-period updates, including adding tests to
   rcutorture and reducing the incidence of Tiny SRCU grace-period-state
   counter wrap.

 - Improve Tasks RCU diagnostics and quiescent-state detection.

* tag 'rcu.2022.09.30a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (55 commits)
  rcutorture: Use the barrier operation specified by cur_ops
  rcu-tasks: Make RCU Tasks Trace check for userspace execution
  rcu-tasks: Ensure RCU Tasks Trace loops have quiescent states
  rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()
  srcu: Make Tiny SRCU use full-sized grace-period counters
  srcu: Make Tiny SRCU poll_state_synchronize_srcu() more precise
  srcu: Add GP and maximum requested GP to Tiny SRCU rcutorture output
  rcutorture: Make "srcud" option also test polled grace-period API
  rcutorture: Limit read-side polling-API testing
  rcu: Add functions to compare grace-period state values
  rcutorture: Expand rcu_torture_write_types() first "if" statement
  rcutorture: Use 1-suffixed variable in rcu_torture_write_types() check
  rcu: Make synchronize_rcu() fastpath update only boot-CPU counters
  rcutorture: Adjust rcu_poll_need_2gp() for rcu_gp_oldstate field removal
  rcu: Remove ->rgos_polled field from rcu_gp_oldstate structure
  rcu: Make synchronize_rcu_expedited() fast path update .expedited_sequence
  rcu: Remove expedited grace-period fast-path forward-progress helper
  rcu: Make synchronize_rcu() fast path update ->gp_seq counters
  rcu-tasks: Remove grace-period fast-path rcu-tasks helper
  rcu: Set rcu_data structures' initial ->gpwrap value to true
  ...
parents b8fb65e1 5c0ec490
......@@ -66,8 +66,13 @@ over a rather long period of time, but improvements are always welcome!
As a rough rule of thumb, any dereference of an RCU-protected
pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
rcu_read_lock_sched(), or by the appropriate update-side lock.
Disabling of preemption can serve as rcu_read_lock_sched(), but
is less readable and prevents lockdep from detecting locking issues.
Explicit disabling of preemption (preempt_disable(), for example)
can serve as rcu_read_lock_sched(), but is less readable and
prevents lockdep from detecting locking issues.
Please not that you *cannot* rely on code known to be built
only in non-preemptible kernels. Such code can and will break,
especially in kernels built with CONFIG_PREEMPT_COUNT=y.
Letting RCU-protected pointers "leak" out of an RCU read-side
critical section is every bit as bad as letting them leak out
......@@ -185,6 +190,9 @@ over a rather long period of time, but improvements are always welcome!
5. If call_rcu() or call_srcu() is used, the callback function will
be called from softirq context. In particular, it cannot block.
If you need the callback to block, run that code in a workqueue
handler scheduled from the callback. The queue_rcu_work()
function does this for you in the case of call_rcu().
6. Since synchronize_rcu() can block, it cannot be called
from any sort of irq context. The same rule applies
......@@ -297,7 +305,8 @@ over a rather long period of time, but improvements are always welcome!
the machine.
d. Periodically invoke synchronize_rcu(), permitting a limited
number of updates per grace period.
number of updates per grace period. Better yet, periodically
invoke rcu_barrier() to wait for all outstanding callbacks.
The same cautions apply to call_srcu() and kfree_rcu().
......
......@@ -128,10 +128,16 @@ Follow these rules to keep your RCU code working properly:
This sort of comparison occurs frequently when scanning
RCU-protected circular linked lists.
Note that if checks for being within an RCU read-side
critical section are not required and the pointer is never
dereferenced, rcu_access_pointer() should be used in place
of rcu_dereference().
Note that if the pointer comparison is done outside
of an RCU read-side critical section, and the pointer
is never dereferenced, rcu_access_pointer() should be
used in place of rcu_dereference(). In most cases,
it is best to avoid accidental dereferences by testing
the rcu_access_pointer() return value directly, without
assigning it to a variable.
Within an RCU read-side critical section, there is little
reason to use rcu_access_pointer().
- The comparison is against a pointer that references memory
that was initialized "a long time ago." The reason
......
......@@ -6,13 +6,15 @@ What is RCU? -- "Read, Copy, Update"
Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:
| 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
| 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
| 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
| 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
| 2010 Big API Table http://lwn.net/Articles/419086/
| 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
| 2014 Big API Table http://lwn.net/Articles/609973/
| 1. What is RCU, Fundamentally? https://lwn.net/Articles/262464/
| 2. What is RCU? Part 2: Usage https://lwn.net/Articles/263130/
| 3. RCU part 3: the RCU API https://lwn.net/Articles/264090/
| 4. The RCU API, 2010 Edition https://lwn.net/Articles/418853/
| 2010 Big API Table https://lwn.net/Articles/419086/
| 5. The RCU API, 2014 Edition https://lwn.net/Articles/609904/
| 2014 Big API Table https://lwn.net/Articles/609973/
| 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
| 2019 Big API Table https://lwn.net/Articles/777165/
What is RCU?
......@@ -915,13 +917,18 @@ which an RCU reference is held include:
The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``. RCU operations may yield a
reference to an object from such a cache that has been concurrently
freed and the memory reallocated to a completely different object,
though of the same type. In this case RCU doesn't even protect the
identity of the object from changing, only its type. So the object
found may not be the one expected, but it will be one where it is safe
to take a reference or spinlock and then confirm that the identity
matches the expectations.
reference to an object from such a cache that has been concurrently freed
and the memory reallocated to a completely different object, though of
the same type. In this case RCU doesn't even protect the identity of the
object from changing, only its type. So the object found may not be the
one expected, but it will be one where it is safe to take a reference
(and then potentially acquiring a spinlock), allowing subsequent code
to check whether the identity matches expectations. It is tempting
to simply acquire the spinlock without first taking the reference, but
unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe. Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
......@@ -1057,14 +1064,20 @@ SRCU: Initialization/cleanup::
init_srcu_struct
cleanup_srcu_struct
All: lockdep-checked RCU-protected pointer access::
All: lockdep-checked RCU utility APIs::
rcu_access_pointer
rcu_dereference_raw
RCU_LOCKDEP_WARN
rcu_sleep_check
RCU_NONIDLE
All: Unchecked RCU-protected pointer access::
rcu_dereference_raw
All: Unchecked RCU-protected pointer access with dereferencing prohibited::
rcu_access_pointer
See the comment headers in the source code (or the docbook generated
from them) for more information.
......
......@@ -42,7 +42,31 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func);
void rcu_barrier_tasks(void);
void rcu_barrier_tasks_rude(void);
void synchronize_rcu(void);
struct rcu_gp_oldstate;
unsigned long get_completed_synchronize_rcu(void);
void get_completed_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
// Maximum number of unsigned long values corresponding to
// not-yet-completed RCU grace periods.
#define NUM_ACTIVE_RCU_POLL_OLDSTATE 2
/**
* same_state_synchronize_rcu - Are two old-state values identical?
* @oldstate1: First old-state value.
* @oldstate2: Second old-state value.
*
* The two old-state values must have been obtained from either
* get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or
* get_completed_synchronize_rcu(). Returns @true if the two values are
* identical and @false otherwise. This allows structures whose lifetimes
* are tracked by old-state values to push these values to a list header,
* allowing those structures to be slightly smaller.
*/
static inline bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned long oldstate2)
{
return oldstate1 == oldstate2;
}
#ifdef CONFIG_PREEMPT_RCU
......@@ -496,13 +520,21 @@ do { \
* against NULL. Although rcu_access_pointer() may also be used in cases
* where update-side locks prevent the value of the pointer from changing,
* you should instead use rcu_dereference_protected() for this use case.
* Within an RCU read-side critical section, there is little reason to
* use rcu_access_pointer().
*
* It is usually best to test the rcu_access_pointer() return value
* directly in order to avoid accidental dereferences being introduced
* by later inattentive changes. In other words, assigning the
* rcu_access_pointer() return value to a local variable results in an
* accident waiting to happen.
*
* It is also permissible to use rcu_access_pointer() when read-side
* access to the pointer was removed at least one grace period ago, as
* is the case in the context of the RCU callback that is freeing up
* the data, or after a synchronize_rcu() returns. This can be useful
* when tearing down multi-linked structures after a grace period
* has elapsed.
* access to the pointer was removed at least one grace period ago, as is
* the case in the context of the RCU callback that is freeing up the data,
* or after a synchronize_rcu() returns. This can be useful when tearing
* down multi-linked structures after a grace period has elapsed. However,
* rcu_dereference_protected() is normally preferred for this use case.
*/
#define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)
......
......@@ -14,25 +14,75 @@
#include <asm/param.h> /* for HZ */
struct rcu_gp_oldstate {
unsigned long rgos_norm;
};
// Maximum number of rcu_gp_oldstate values corresponding to
// not-yet-completed RCU grace periods.
#define NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE 2
/*
* Are the two oldstate values the same? See the Tree RCU version for
* docbook header.
*/
static inline bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1,
struct rcu_gp_oldstate *rgosp2)
{
return rgosp1->rgos_norm == rgosp2->rgos_norm;
}
unsigned long get_state_synchronize_rcu(void);
static inline void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
rgosp->rgos_norm = get_state_synchronize_rcu();
}
unsigned long start_poll_synchronize_rcu(void);
static inline void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
rgosp->rgos_norm = start_poll_synchronize_rcu();
}
bool poll_state_synchronize_rcu(unsigned long oldstate);
static inline bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
return poll_state_synchronize_rcu(rgosp->rgos_norm);
}
static inline void cond_synchronize_rcu(unsigned long oldstate)
{
might_sleep();
}
static inline void cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
cond_synchronize_rcu(rgosp->rgos_norm);
}
static inline unsigned long start_poll_synchronize_rcu_expedited(void)
{
return start_poll_synchronize_rcu();
}
static inline void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
{
rgosp->rgos_norm = start_poll_synchronize_rcu_expedited();
}
static inline void cond_synchronize_rcu_expedited(unsigned long oldstate)
{
cond_synchronize_rcu(oldstate);
}
static inline void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
{
cond_synchronize_rcu_expedited(rgosp->rgos_norm);
}
extern void rcu_barrier(void);
static inline void synchronize_rcu_expedited(void)
......
......@@ -40,12 +40,52 @@ bool rcu_eqs_special_set(int cpu);
void rcu_momentary_dyntick_idle(void);
void kfree_rcu_scheduler_running(void);
bool rcu_gp_might_be_stalled(void);
struct rcu_gp_oldstate {
unsigned long rgos_norm;
unsigned long rgos_exp;
};
// Maximum number of rcu_gp_oldstate values corresponding to
// not-yet-completed RCU grace periods.
#define NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE 4
/**
* same_state_synchronize_rcu_full - Are two old-state values identical?
* @rgosp1: First old-state value.
* @rgosp2: Second old-state value.
*
* The two old-state values must have been obtained from either
* get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(),
* or get_completed_synchronize_rcu_full(). Returns @true if the two
* values are identical and @false otherwise. This allows structures
* whose lifetimes are tracked by old-state values to push these values
* to a list header, allowing those structures to be slightly smaller.
*
* Note that equality is judged on a bitwise basis, so that an
* @rcu_gp_oldstate structure with an already-completed state in one field
* will compare not-equal to a structure with an already-completed state
* in the other field. After all, the @rcu_gp_oldstate structure is opaque
* so how did such a situation come to pass in the first place?
*/
static inline bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1,
struct rcu_gp_oldstate *rgosp2)
{
return rgosp1->rgos_norm == rgosp2->rgos_norm && rgosp1->rgos_exp == rgosp2->rgos_exp;
}
unsigned long start_poll_synchronize_rcu_expedited(void);
void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp);
void cond_synchronize_rcu_expedited(unsigned long oldstate);
void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp);
unsigned long get_state_synchronize_rcu(void);
void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
unsigned long start_poll_synchronize_rcu(void);
void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
bool poll_state_synchronize_rcu(unsigned long oldstate);
bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
void cond_synchronize_rcu(unsigned long oldstate);
void cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp);
bool rcu_is_idle_cpu(int cpu);
......
......@@ -15,10 +15,10 @@
struct srcu_struct {
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
unsigned short srcu_idx; /* Current reader array element in bit 0x2. */
unsigned short srcu_idx_max; /* Furthest future srcu_idx request. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
unsigned long srcu_idx; /* Current reader array element in bit 0x2. */
unsigned long srcu_idx_max; /* Furthest future srcu_idx request. */
struct swait_queue_head srcu_wq;
/* Last srcu_read_unlock() wakes GP. */
struct rcu_head *srcu_cb_head; /* Pending callbacks: Head. */
......@@ -82,10 +82,12 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
int idx;
idx = ((data_race(READ_ONCE(ssp->srcu_idx)) + 1) & 0x2) >> 1;
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd)\n",
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd) gp: %lu->%lu\n",
tt, tf, idx,
data_race(READ_ONCE(ssp->srcu_lock_nesting[!idx])),
data_race(READ_ONCE(ssp->srcu_lock_nesting[idx])));
data_race(READ_ONCE(ssp->srcu_lock_nesting[idx])),
data_race(READ_ONCE(ssp->srcu_idx)),
data_race(READ_ONCE(ssp->srcu_idx_max)));
}
#endif
This diff is collapsed.
......@@ -117,7 +117,7 @@ void srcu_drive_gp(struct work_struct *wp)
struct srcu_struct *ssp;
ssp = container_of(wp, struct srcu_struct, srcu_work);
if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
if (ssp->srcu_gp_running || ULONG_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
return; /* Already running or nothing to do. */
/* Remove recently arrived callbacks and wait for readers. */
......@@ -150,17 +150,17 @@ void srcu_drive_gp(struct work_struct *wp)
* straighten that out.
*/
WRITE_ONCE(ssp->srcu_gp_running, false);
if (USHORT_CMP_LT(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
if (ULONG_CMP_LT(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
schedule_work(&ssp->srcu_work);
}
EXPORT_SYMBOL_GPL(srcu_drive_gp);
static void srcu_gp_start_if_needed(struct srcu_struct *ssp)
{
unsigned short cookie;
unsigned long cookie;
cookie = get_state_synchronize_srcu(ssp);
if (USHORT_CMP_GE(READ_ONCE(ssp->srcu_idx_max), cookie))
if (ULONG_CMP_GE(READ_ONCE(ssp->srcu_idx_max), cookie))
return;
WRITE_ONCE(ssp->srcu_idx_max, cookie);
if (!READ_ONCE(ssp->srcu_gp_running)) {
......@@ -215,7 +215,7 @@ unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
barrier();
ret = (READ_ONCE(ssp->srcu_idx) + 3) & ~0x1;
barrier();
return ret & USHRT_MAX;
return ret;
}
EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
......@@ -240,10 +240,10 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
*/
bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
{
bool ret = USHORT_CMP_GE(READ_ONCE(ssp->srcu_idx), cookie);
unsigned long cur_s = READ_ONCE(ssp->srcu_idx);
barrier();
return ret;
return ULONG_CMP_GE(cur_s, cookie) || ULONG_CMP_LT(cur_s, cookie - 3);
}
EXPORT_SYMBOL_GPL(poll_state_synchronize_srcu);
......
......@@ -560,7 +560,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
{
/* Complain if the scheduler has not started. */
RCU_LOCKDEP_WARN(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
"synchronize_rcu_tasks called too soon");
// If the grace-period kthread is running, use it.
......@@ -1500,6 +1500,7 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
if (rcu_tasks_trace_pertask_prep(t, true))
trc_add_holdout(t, hop);
rcu_read_unlock();
cond_resched_tasks_rcu_qs();
}
// Only after all running tasks have been accounted for is it
......@@ -1520,6 +1521,7 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
}
raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
cond_resched_tasks_rcu_qs();
}
// Re-enable CPU hotplug now that the holdout list is populated.
......@@ -1619,6 +1621,7 @@ static void check_all_holdout_tasks_trace(struct list_head *hop,
trc_del_holdout(t);
else if (needreport)
show_stalled_task_trace(t, firstreport);
cond_resched_tasks_rcu_qs();
}
// Re-enable CPU hotplug now that the holdout list scan has completed.
......
......@@ -158,6 +158,10 @@ void synchronize_rcu(void)
}
EXPORT_SYMBOL_GPL(synchronize_rcu);
static void tiny_rcu_leak_callback(struct rcu_head *rhp)
{
}
/*
* Post an RCU callback to be invoked after the end of an RCU grace
* period. But since we have but one CPU, that would be after any
......@@ -165,9 +169,20 @@ EXPORT_SYMBOL_GPL(synchronize_rcu);
*/
void call_rcu(struct rcu_head *head, rcu_callback_t func)
{
static atomic_t doublefrees;
unsigned long flags;
debug_rcu_head_queue(head);
if (debug_rcu_head_queue(head)) {
if (atomic_inc_return(&doublefrees) < 4) {
pr_err("%s(): Double-freed CB %p->%pS()!!! ", __func__, head, head->func);
mem_dump_obj(head);
}
if (!__is_kvfree_rcu_offset((unsigned long)head->func))
WRITE_ONCE(head->func, tiny_rcu_leak_callback);
return;
}
head->func = func;
head->next = NULL;
......@@ -183,6 +198,16 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
}
EXPORT_SYMBOL_GPL(call_rcu);
/*
* Store a grace-period-counter "cookie". For more information,
* see the Tree RCU header comment.
*/
void get_completed_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
{
rgosp->rgos_norm = RCU_GET_STATE_COMPLETED;
}
EXPORT_SYMBOL_GPL(get_completed_synchronize_rcu_full);
/*
* Return a grace-period-counter "cookie". For more information,
* see the Tree RCU header comment.
......
This diff is collapsed.
......@@ -828,11 +828,13 @@ static void rcu_exp_handler(void *unused)
{
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
bool preempt_bh_enabled = !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
__this_cpu_read(rcu_data.cpu_no_qs.b.exp))
return;
if (rcu_is_cpu_rrupt_from_idle()) {
if (rcu_is_cpu_rrupt_from_idle() ||
(IS_ENABLED(CONFIG_PREEMPT_COUNT) && preempt_bh_enabled)) {
rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
return;
}
......@@ -906,6 +908,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
void synchronize_rcu_expedited(void)
{
bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
unsigned long flags;
struct rcu_exp_work rew;
struct rcu_node *rnp;
unsigned long s;
......@@ -924,8 +927,11 @@ void synchronize_rcu_expedited(void)
// them, which allows reuse of ->gp_seq_polled_exp_snap.
rcu_poll_gp_seq_start_unlocked(&rcu_state.gp_seq_polled_exp_snap);
rcu_poll_gp_seq_end_unlocked(&rcu_state.gp_seq_polled_exp_snap);
if (rcu_init_invoked())
cond_resched();
local_irq_save(flags);
WARN_ON_ONCE(num_online_cpus() > 1);
rcu_state.expedited_sequence += (1 << RCU_SEQ_CTR_SHIFT);
local_irq_restore(flags);
return; // Context allows vacuous grace periods.
}
......@@ -1027,6 +1033,24 @@ unsigned long start_poll_synchronize_rcu_expedited(void)
}
EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_expedited);
/**
* start_poll_synchronize_rcu_expedited_full - Take a full snapshot and start expedited grace period
* @rgosp: Place to put snapshot of grace-period state
*
* Places the normal and expedited grace-period states in rgosp. This
* state value can be passed to a later call to cond_synchronize_rcu_full()
* or poll_state_synchronize_rcu_full() to determine whether or not a
* grace period (whether normal or expedited) has elapsed in the meantime.
* If the needed expedited grace period is not already slated to start,
* initiates that grace period.
*/
void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
{
get_state_synchronize_rcu_full(rgosp);
(void)start_poll_synchronize_rcu_expedited();
}
EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_expedited_full);
/**
* cond_synchronize_rcu_expedited - Conditionally wait for an expedited RCU grace period
*
......@@ -1053,3 +1077,30 @@ void cond_synchronize_rcu_expedited(unsigned long oldstate)
synchronize_rcu_expedited();
}
EXPORT_SYMBOL_GPL(cond_synchronize_rcu_expedited);
/**
* cond_synchronize_rcu_expedited_full - Conditionally wait for an expedited RCU grace period
* @rgosp: value from get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), or start_poll_synchronize_rcu_expedited_full()
*
* If a full RCU grace period has elapsed since the call to
* get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(),
* or start_poll_synchronize_rcu_expedited_full() from which @rgosp was
* obtained, just return. Otherwise, invoke synchronize_rcu_expedited()
* to wait for a full grace period.
*
* Yes, this function does not take counter wrap into account.
* But counter wrap is harmless. If the counter wraps, we have waited for
* more than 2 billion grace periods (and way more on a 64-bit system!),
* so waiting for a couple of additional grace periods should be just fine.
*
* This function provides the same memory-ordering guarantees that
* would be provided by a synchronize_rcu() that was invoked at the call
* to the function that provided @rgosp and that returned at the end of
* this function.
*/
void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp)
{
if (!poll_state_synchronize_rcu_full(rgosp))
synchronize_rcu_expedited();
}
EXPORT_SYMBOL_GPL(cond_synchronize_rcu_expedited_full);
......@@ -1111,7 +1111,7 @@ int rcu_nocb_cpu_deoffload(int cpu)
if (!ret)
cpumask_clear_cpu(cpu, rcu_nocb_mask);
} else {
pr_info("NOCB: Can't CB-deoffload an offline CPU\n");
pr_info("NOCB: Cannot CB-deoffload offline CPU %d\n", rdp->cpu);
ret = -EINVAL;
}
}
......@@ -1196,7 +1196,7 @@ int rcu_nocb_cpu_offload(int cpu)
if (!ret)
cpumask_set_cpu(cpu, rcu_nocb_mask);
} else {
pr_info("NOCB: Can't CB-offload an offline CPU\n");
pr_info("NOCB: Cannot CB-offload offline CPU %d\n", rdp->cpu);
ret = -EINVAL;
}
}
......@@ -1452,8 +1452,8 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
(long)rdp->nocb_gp_seq,
rnp->grplo, rnp->grphi, READ_ONCE(rdp->nocb_gp_loops),
rdp->nocb_gp_kthread ? task_state_to_char(rdp->nocb_gp_kthread) : '.',
rdp->nocb_cb_kthread ? (int)task_cpu(rdp->nocb_gp_kthread) : -1,
show_rcu_should_be_on_cpu(rdp->nocb_cb_kthread));
rdp->nocb_gp_kthread ? (int)task_cpu(rdp->nocb_gp_kthread) : -1,
show_rcu_should_be_on_cpu(rdp->nocb_gp_kthread));
}
/* Dump out nocb kthread state for the specified rcu_data structure. */
......@@ -1497,7 +1497,7 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
".B"[!!rcu_cblist_n_cbs(&rdp->nocb_bypass)],
rcu_segcblist_n_cbs(&rdp->cblist),
rdp->nocb_cb_kthread ? task_state_to_char(rdp->nocb_cb_kthread) : '.',
rdp->nocb_cb_kthread ? (int)task_cpu(rdp->nocb_gp_kthread) : -1,
rdp->nocb_cb_kthread ? (int)task_cpu(rdp->nocb_cb_kthread) : -1,
show_rcu_should_be_on_cpu(rdp->nocb_cb_kthread));
/* It is OK for GP kthreads to have GP state. */
......
......@@ -641,7 +641,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
(rdp->grpmask & READ_ONCE(rnp->expmask)) ||
IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
(IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
((rdp->grpmask & READ_ONCE(rnp->qsmask)) || t->rcu_blocked_node)) ||
(IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled &&
t->rcu_blocked_node);
// Need to defer quiescent state until everything is enabled.
......@@ -718,9 +719,6 @@ static void rcu_flavor_sched_clock_irq(int user)
struct task_struct *t = current;
lockdep_assert_irqs_disabled();
if (user || rcu_is_cpu_rrupt_from_idle()) {
rcu_note_voluntary_context_switch(current);
}
if (rcu_preempt_depth() > 0 ||
(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
/* No QS, force context switch if deferred. */
......@@ -824,6 +822,7 @@ void rcu_read_unlock_strict(void)
if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
return;
rdp = this_cpu_ptr(&rcu_data);
rdp->cpu_no_qs.b.norm = false;
rcu_report_qs_rdp(rdp);
udelay(rcu_unlock_delay);
}
......@@ -869,7 +868,7 @@ void rcu_all_qs(void)
if (!raw_cpu_read(rcu_data.rcu_urgent_qs))
return;
preempt_disable();
preempt_disable(); // For CONFIG_PREEMPT_COUNT=y kernels
/* Load rcu_urgent_qs before other flags. */
if (!smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs))) {
preempt_enable();
......@@ -931,10 +930,13 @@ static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
return false;
}
// Except that we do need to respond to a request by an expedited grace
// period for a quiescent state from this CPU. Note that requests from
// tasks are handled when removing the task from the blocked-tasks list
// below.
// Except that we do need to respond to a request by an expedited
// grace period for a quiescent state from this CPU. Note that in
// non-preemptible kernels, there can be no context switches within RCU
// read-side critical sections, which in turn means that the leaf rcu_node
// structure's blocked-tasks list is always empty. is therefore no need to
// actually check it. Instead, a quiescent state from this CPU suffices,
// and this function is only called from such a quiescent state.
notrace void rcu_preempt_deferred_qs(struct task_struct *t)
{
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
......@@ -972,7 +974,6 @@ static void rcu_flavor_sched_clock_irq(int user)
* neither access nor modify, at least not while the
* corresponding CPU is online.
*/
rcu_qs();
}
}
......@@ -1238,8 +1239,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
cpu != outgoingcpu)
cpumask_set_cpu(cpu, cm);
cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
if (cpumask_empty(cm))
if (cpumask_empty(cm)) {
cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
if (outgoingcpu >= 0)
cpumask_clear_cpu(outgoingcpu, cm);
}
set_cpus_allowed_ptr(t, cm);
mutex_unlock(&rnp->boost_kthread_mutex);
free_cpumask_var(cm);
......
......@@ -368,7 +368,7 @@ static void rcu_dump_cpu_stacks(void)
if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
if (cpu_is_offline(cpu))
pr_err("Offline CPU %d blocking current GP.\n", cpu);
else if (!trigger_single_cpu_backtrace(cpu))
else
dump_cpu_task(cpu);
}
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
......@@ -511,8 +511,7 @@ static void rcu_check_gp_kthread_starvation(void)
pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
} else {
pr_err("Stack dump where RCU GP kthread last ran:\n");
if (!trigger_single_cpu_backtrace(cpu))
dump_cpu_task(cpu);
dump_cpu_task(cpu);
}
}
wake_up_process(gpk);
......
......@@ -73,6 +73,7 @@
#include <uapi/linux/sched/types.h>
#include <asm/irq_regs.h>
#include <asm/switch_to.h>
#include <asm/tlb.h>
......@@ -11183,6 +11184,19 @@ struct cgroup_subsys cpu_cgrp_subsys = {
void dump_cpu_task(int cpu)
{
if (cpu == smp_processor_id() && in_hardirq()) {
struct pt_regs *regs;
regs = get_irq_regs();
if (regs) {
show_regs(regs);
return;
}
}
if (trigger_single_cpu_backtrace(cpu))
return;
pr_info("Task dump for CPU %d:\n", cpu);
sched_show_task(cpu_curr(cpu));
}
......
......@@ -370,8 +370,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
if (cpu >= 0) {
if (static_branch_unlikely(&csdlock_debug_extended))
csd_lock_print_extended(csd, cpu);
if (!trigger_single_cpu_backtrace(cpu))
dump_cpu_task(cpu);
dump_cpu_task(cpu);
if (!cpu_cur_csd) {
pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
arch_send_call_function_single_ipi(cpu);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment