1. 27 Apr, 2020 40 commits
    • rcu-tasks: Add IPI failure count to statistics · 7e0669c3
      Paul E. McKenney authored
      This commit adds a failure-return count for smp_call_function_single(),
      and adds this to the console messages for rcutorture writer stalls and at
      the end of rcutorture testing.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      7e0669c3
    • rcutorture: Add TRACE02 scenario enabling RCU Tasks Trace IPIs · 039f3cc9
      Paul E. McKenney authored
      This commit adds a TRACE02 scenario which enables preemption and RCU
      Tasks Trace IPIs, more specifically, disabling heavyweight readers.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      039f3cc9
    • rcu-tasks: Add count for idle tasks on offline CPUs · edf3775f
      Paul E. McKenney authored
      This commit adds a counter for the number of times the quiescent state
      was an idle task associated with an offline CPU, and prints this count
      at the end of rcutorture runs and at stall time.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      edf3775f
    • rcu-tasks: Add rcu_dynticks_zero_in_eqs() effectiveness statistics · 40471509
      Paul E. McKenney authored
      This commit adds counts of the number of calls and number of successful
      calls to rcu_dynticks_zero_in_eqs(), which are printed at the end
      of rcutorture runs and at stall time.  This allows evaluation of the
      effectiveness of rcu_dynticks_zero_in_eqs().
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      40471509
    • rcu-tasks: Make RCU tasks trace also wait for idle tasks · 9796e1ae
      Paul E. McKenney authored
      This commit scans the CPUs, adding each CPU's idle task to the list of
      tasks that need quiescent states.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      9796e1ae
    • rcu-tasks: Handle the running-offline idle-task special case · 7e3b70e0
      Paul E. McKenney authored
      The idle task corresponding to an offline CPU can appear to be running
      while that CPU is offline.  This commit therefore adds checks for this
      situation, treating it as a quiescent state.  Because the tasklist scan
      and the holdout-list scan now exclude CPU-hotplug operations, readers
      on the CPU-hotplug paths are still waited for.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      7e3b70e0
    • rcu-tasks: Disable CPU hotplug across RCU tasks trace scans · 81b4a7bc
      Paul E. McKenney authored
      This commit disables CPU hotplug across RCU tasks trace scans, which
      is a first step towards correctly recognizing idle tasks "running" on
      offline CPUs.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      81b4a7bc
    • rcu-tasks: Allow rcu_read_unlock_trace() under scheduler locks · b38f57c1
      Paul E. McKenney authored
      The rcu_read_unlock_trace() can invoke rcu_read_unlock_trace_special(),
      which in turn can call wake_up().  Therefore, if any scheduler lock is
      held across a call to rcu_read_unlock_trace(), self-deadlock can occur.
      This commit therefore uses the irq_work facility to defer the wake_up()
      to a clean environment where no scheduler locks will be held.
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      [ paulmck: Update #includes for m68k per kbuild test robot. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      b38f57c1
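      The deferral pattern described in b38f57c1 can be illustrated with a
      minimal userspace sketch.  This is not the kernel's irq_work API: the
      struct and both function names below are hypothetical stand-ins.  The
      unlock path only records that a wakeup is wanted, and the wakeup itself
      happens later from a context assumed to hold no scheduler locks.

```c
#include <stdbool.h>

/* Hypothetical model of a deferred wakeup; not the kernel's irq_work. */
struct deferred_wakeup {
	bool pending;		/* A wakeup has been requested. */
	int wakeups_done;	/* Illustration only: wakeups actually issued. */
};

/*
 * Called from the unlock path, possibly with scheduler locks held.
 * It must not wake anyone up, so it only marks the work as pending.
 */
static void request_wakeup(struct deferred_wakeup *dw)
{
	dw->pending = true;
}

/*
 * Called later from a clean context (modelling the deferred handler),
 * where issuing the wakeup cannot self-deadlock.
 */
static void run_deferred(struct deferred_wakeup *dw)
{
	if (dw->pending) {
		dw->pending = false;
		dw->wakeups_done++;	/* Stands in for the real wake_up(). */
	}
}
```

      Multiple requests made before the handler runs collapse into a single
      wakeup, which is acceptable in this sketch because the waiter rechecks
      its condition after being awakened.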
    • rcu-tasks: Avoid IPIing userspace/idle tasks if kernel is so built · 7d0c9c50
      Paul E. McKenney authored
      Systems running CPU-bound real-time tasks do not want IPIs sent to CPUs
      executing nohz_full userspace tasks.  Battery-powered systems don't
      want IPIs sent to idle CPUs in low-power mode.  Unfortunately, RCU tasks
      trace can and will send such IPIs in some cases.
      
      Both of these situations occur only when the target CPU is in RCU
      dyntick-idle mode, in other words, when RCU is not watching the
      target CPU.  This suggests that CPUs in dyntick-idle mode should use
      memory barriers in outermost invocations of rcu_read_lock_trace()
      and rcu_read_unlock_trace(), which would allow the RCU tasks trace
      grace period to directly read out the target CPU's read-side state.
      One challenge is that RCU tasks trace is not targeting a specific
      CPU, but rather a task.  And that task could switch from one CPU to
      another at any time.
      
      This commit therefore uses try_invoke_on_locked_down_task()
      and checks for task_curr() in trc_inspect_reader_notrunning().
      When this condition holds, the target task is running and cannot move.
      If CONFIG_TASKS_TRACE_RCU_READ_MB=y, the new rcu_dynticks_zero_in_eqs()
      function can be used to check if the specified integer (in this case,
      t->trc_reader_nesting) is zero while the target CPU remains in that same
      dyntick-idle sojourn.  If so, the target task is in a quiescent state.
      If not, trc_read_check_handler() must indicate failure so that the
      grace-period kthread can take appropriate action or retry after an
      appropriate delay, as the case may be.
      
      With this change, given CONFIG_TASKS_TRACE_RCU_READ_MB=y, if a given
      CPU remains idle or a given task continues executing in nohz_full mode,
      the RCU tasks trace grace-period kthread will detect this without the
      need to send an IPI.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      7d0c9c50
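      The idea behind rcu_dynticks_zero_in_eqs() in 7d0c9c50 can be sketched
      in userspace with C11 atomics.  This is only a sketch under stated
      assumptions: struct cpu_state, the even/odd counter convention, and both
      function names are hypothetical and do not reproduce the kernel's
      dynticks encoding.  The point is the snapshot, sample, recheck sequence
      that validates a remote read without sending an IPI.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-CPU state, for illustration only.  An even counter
 * value means the CPU is in an extended quiescent state (idle or
 * nohz_full userspace); an odd value means RCU is watching it.
 */
struct cpu_state {
	atomic_uint eqs_counter;
};

/* Called by the CPU itself on each transition into or out of idle. */
static void cpu_toggle_eqs(struct cpu_state *cs)
{
	atomic_fetch_add(&cs->eqs_counter, 1);	/* seq_cst: fully ordered. */
}

/*
 * Return true only if *vp was observed to be zero while the CPU stayed
 * in one uninterrupted quiescent sojourn.
 */
static bool zero_in_eqs(struct cpu_state *cs, atomic_int *vp)
{
	unsigned int snap = atomic_load(&cs->eqs_counter);

	if (snap & 1)
		return false;	/* CPU is being watched: caller must fall back. */
	if (atomic_load(vp) != 0)
		return false;	/* A reader appears to be active. */
	/* Any transition since the snapshot invalidates the sample. */
	return atomic_load(&cs->eqs_counter) == snap;
}
```

      A false return does not mean that a reader exists, only that the
      remote read could not be validated, so the caller would retry later
      or fall back to an IPI.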
    • rcu-tasks: Add Kconfig option to mediate smp_mb() vs. IPI · 9ae58d7b
      Paul E. McKenney authored
      This commit provides a new TASKS_TRACE_RCU_READ_MB Kconfig option that
      enables use of read-side memory barriers by both rcu_read_lock_trace()
      and rcu_read_unlock_trace() when they are executed with the
      current->trc_reader_special.b.need_mb flag set.  This flag is currently
      never set.  Doing that is the subject of a later commit.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      9ae58d7b
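      A simplified userspace sketch of the read-side markers with the
      conditional barrier described in 9ae58d7b follows.  It assumes a
      compile-time macro standing in for the Kconfig option and a
      hypothetical struct task; the real unlock path also handles outermost
      and special cases that are omitted here, and the barriers_executed
      counter exists only so the behavior can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for CONFIG_TASKS_TRACE_RCU_READ_MB; assumed enabled here. */
#ifndef TASKS_TRACE_RCU_READ_MB
#define TASKS_TRACE_RCU_READ_MB 1
#endif

/* Hypothetical stand-in for the per-task reader fields named above. */
struct task {
	int trc_reader_nesting;
	atomic_bool need_mb;	/* Models ->trc_reader_special.b.need_mb. */
	int barriers_executed;	/* Illustration only: counts fences taken. */
};

/* Execute a full fence only when built in and requested for this task. */
static void full_barrier_if_needed(struct task *t)
{
	if (TASKS_TRACE_RCU_READ_MB &&
	    atomic_load_explicit(&t->need_mb, memory_order_relaxed)) {
		atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() analogue. */
		t->barriers_executed++;
	}
}

static void read_lock_trace(struct task *t)
{
	t->trc_reader_nesting++;
	full_barrier_if_needed(t);	/* Nesting update before critical section. */
}

static void read_unlock_trace(struct task *t)
{
	full_barrier_if_needed(t);	/* Critical section before nesting update. */
	t->trc_reader_nesting--;
}
```

      In the common case where need_mb is clear, each marker costs one
      relaxed load and a predictable branch on top of the nesting update.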
    • rcu-tasks: Add grace-period and IPI counts to statistics · 238dbce3
      Paul E. McKenney authored
      This commit adds a grace-period count and a count of IPIs sent since
      boot, which is printed in response to rcutorture writer stalls and at
      the end of rcutorture testing.  These counts will be used to evaluate
      various schemes to reduce the number of IPIs sent.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      238dbce3
    • rcu-tasks: Split ->trc_reader_need_end · 276c4104
      Paul E. McKenney authored
      This commit splits ->trc_reader_need_end by using the rcu_special union.
      This change permits readers to check to see if a memory barrier is
      required without any added overhead in the common case where no such
      barrier is required.  This commit also adds the read-side checking.
      Later commits will add the machinery to properly set the new
      ->trc_reader_special.b.need_mb field.
      
      This commit also makes rcu_read_unlock_trace_special() tolerate nested
      read-side critical sections within interrupt and NMI handlers.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      276c4104
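      Why a union lets readers check for a required memory barrier without
      added common-case overhead (276c4104) can be shown with a small sketch.
      The union name, its layout, and the helper are assumptions made for
      illustration and are not copied from the kernel's rcu_special
      definition.  Individual byte-sized flags are set independently, yet a
      single word-sized load covers all of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout; the kernel's rcu_special union may differ. */
union reader_special {
	struct {
		uint8_t need_qs;	/* Grace period wants a quiescent-state report. */
		uint8_t need_mb;	/* Reader must execute memory barriers. */
		uint8_t unused[2];	/* Padding up to the width of the word below. */
	} b;
	uint32_t s;			/* All flags viewed as one word. */
};

/* Fast path: one load and one compare show that nothing is pending. */
static bool reader_needs_special_handling(const union reader_special *rs)
{
	return rs->s != 0;
}
```

      Only when the combined word is nonzero does the reader need to look at
      the individual flags, so the extra need_mb check stays off the fast
      path.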
    • rcu-tasks: Provide boot parameter to delay IPIs until late in grace period · b0afa0f0
      Paul E. McKenney authored
      This commit provides a rcupdate.rcu_task_ipi_delay kernel boot parameter
      that specifies how old the RCU tasks trace grace period must be before
      the grace-period kthread starts sending IPIs.  This delay allows more
      tasks to pass through rcu_tasks_qs() quiescent states, thus reducing
      (or even eliminating) the number of IPIs that must be sent.
      
      On a short rcutorture test setting this kernel boot parameter to HZ/2
      resulted in zero IPIs for all 877 RCU-tasks trace grace periods that
      elapsed during that test.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      b0afa0f0
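      The age check implied by rcupdate.rcu_task_ipi_delay in b0afa0f0 might
      look roughly like the sketch below.  The function name and parameters
      are hypothetical, and the kernel's actual comparison may be written
      differently (for example, with its time_after() family of macros).  All
      values are in ticks of a free-running counter such as jiffies.

```c
#include <stdbool.h>

/*
 * Return true once the grace period that started at gp_start is at least
 * ipi_delay ticks old.  Unsigned subtraction keeps the comparison correct
 * across counter wraparound.
 */
static bool gp_old_enough_for_ipi(unsigned long now, unsigned long gp_start,
				  unsigned long ipi_delay)
{
	return now - gp_start >= ipi_delay;
}
```

      With a delay of HZ/2, the grace-period kthread in this model would
      send no IPIs during the first half second of each grace period, giving
      tasks time to report quiescent states on their own.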
    • rcu-tasks: Add a grace-period start time for throttling and debug · 88092d0c
      Paul E. McKenney authored
      This commit adds a place to record the grace-period start in jiffies.
      This will be used by later commits for debugging purposes and to throttle
      IPIs early in the grace period.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      88092d0c
    • rcu-tasks: Make RCU Tasks Trace make use of RCU scheduler hooks · 43766c3e
      Paul E. McKenney authored
      This commit makes the calls to rcu_tasks_qs() detect and report
      quiescent states for RCU tasks trace.  If the task is in a quiescent
      state and if ->trc_reader_checked is not yet set, the task sets its own
      ->trc_reader_checked.  This will cause the grace-period kthread to
      remove it from the holdout list if it still remains there.
      
      [ paulmck: Fix conditional compilation per kbuild test robot feedback. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      43766c3e
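      The self-reporting step described in 43766c3e can be sketched with C11
      atomics.  The struct below is a hypothetical stand-in for the two
      task_struct fields involved, and the memory orderings are one plausible
      choice rather than a statement of what the kernel uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task fields named above. */
struct task {
	atomic_int trc_reader_nesting;
	atomic_bool trc_reader_checked;
};

/*
 * Called from a scheduler hook.  If the task is outside any read-side
 * critical section and has not yet been checked off, it checks itself
 * off so that the grace-period kthread can drop it from the holdout list.
 */
static void tasks_trace_qs(struct task *t)
{
	if (!atomic_load_explicit(&t->trc_reader_checked, memory_order_relaxed) &&
	    atomic_load_explicit(&t->trc_reader_nesting, memory_order_relaxed) == 0) {
		atomic_store_explicit(&t->trc_reader_checked, true,
				      memory_order_release);
		/* Separate earlier readers from any that start later. */
		atomic_thread_fence(memory_order_seq_cst);
	}
}
```

      A task that is still inside a reader leaves the flag clear, so it
      stays a holdout until a later hook invocation or an explicit check by
      the grace-period kthread.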
    • rcu-tasks: Make rcutorture writer stall output include GP state · af051ca4
      Paul E. McKenney authored
      This commit adds grace-period state and time to the rcutorture writer
      stall output.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      af051ca4
    • rcu-tasks: Add RCU tasks to rcutorture writer stall output · e21408ce
      Paul E. McKenney authored
      This commit adds state for each RCU-tasks flavor to the rcutorture
      writer stall output.  The initial state is minimal, but you have to
      start somewhere.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      [ paulmck: Fixes based on feedback from kbuild test robot. ]
      e21408ce
    • rcu-tasks: Move #ifdef into tasks.h · 8fd8ca38
      Paul E. McKenney authored
      This commit pushes the #ifdef CONFIG_TASKS_RCU_GENERIC from
      kernel/rcu/update.c to kernel/rcu/tasks.h in order to improve
      readability as more APIs are added.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      8fd8ca38
    • rcu-tasks: Add stall warnings for RCU Tasks Trace · 4593e772
      Paul E. McKenney authored
      This commit adds RCU CPU stall warnings for RCU Tasks Trace.  These
      dump out any tasks blocking the current grace period, as well as any
      CPUs that have not responded to an IPI request.  This happens in two
      phases, when initially extracting state from the tasks and later when
      waiting for any holdout tasks to check in.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      4593e772
    • rcutorture: Add torture tests for RCU Tasks Trace · c1a76c0b
      Paul E. McKenney authored
      This commit adds the definitions required to torture the tracing flavor
      of RCU tasks.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      c1a76c0b
    • rcu-tasks: Add an RCU Tasks Trace to simplify protection of tracing hooks · d5f177d3
      Paul E. McKenney authored
      Because RCU does not watch exception early-entry/late-exit, idle-loop,
      or CPU-hotplug execution, protection of tracing and BPF operations is
      needlessly complicated.  This commit therefore adds a variant of
      Tasks RCU that:
      
      o	Has explicit read-side markers to allow finite grace periods in
      	the face of in-kernel loops for PREEMPT=n builds.  These markers
      	are rcu_read_lock_trace() and rcu_read_unlock_trace().
      
      o	Protects code in the idle loop, exception entry/exit, and
      	CPU-hotplug code paths.  In this respect, RCU-tasks trace is
      	similar to SRCU, but with lighter-weight readers.
      
      o	Avoids expensive read-side instructions, having overhead similar
      	to that of Preemptible RCU.
      
      There are of course downsides:
      
      o	The grace-period code can send IPIs to CPUs, even when those
      	CPUs are in the idle loop or in nohz_full userspace.  This is
      	mitigated by later commits.
      
      o	It is necessary to scan the full tasklist, much as for Tasks RCU.
      
      o	There is a single callback queue guarded by a single lock,
      	again, much as for Tasks RCU.  However, those early use cases
      	that request multiple grace periods in quick succession are
      	expected to do so from a single task, which makes the single
      	lock almost irrelevant.  If needed, multiple callback queues
      	can be provided using any number of schemes.
      
      Perhaps most important, this variant of RCU does not affect the vanilla
      flavors, rcu_preempt and rcu_sched.  The fact that RCU Tasks Trace
      readers can operate from idle, offline, and exception entry/exit in no
      way enables rcu_preempt and rcu_sched readers to do so.
      
      The memory ordering was outlined here:
      https://lore.kernel.org/lkml/20200319034030.GX3199@paulmck-ThinkPad-P72/
      
      This effort benefited greatly from off-list discussions of BPF
      requirements with Alexei Starovoitov and Andrii Nakryiko.  At least
      some of the on-list discussions are captured in the Link: tags below.
      In addition, KCSAN was quite helpful in finding some early bugs.
      
      Link: https://lore.kernel.org/lkml/20200219150744.428764577@infradead.org/
      Link: https://lore.kernel.org/lkml/87mu8p797b.fsf@nanos.tec.linutronix.de/
      Link: https://lore.kernel.org/lkml/20200225221305.605144982@linutronix.de/
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      [ paulmck: Apply feedback from Steve Rostedt and Joel Fernandes. ]
      [ paulmck: Decrement trc_n_readers_need_end upon IPI failure. ]
      [ paulmck: Fix locking issue reported by rcutorture. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      d5f177d3
    • rcu-tasks: Code movement to allow more Tasks RCU variants · d01aa263
      Paul E. McKenney authored
      This commit does nothing but move rcu_tasks_wait_gp() up to a new section
      for common code.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      d01aa263
    • rcu-tasks: Further refactor RCU-tasks to allow adding more variants · e4fe5dd6
      Paul E. McKenney authored
      This commit refactors RCU tasks to allow variants to be added.  These
      variants will share the current Tasks-RCU tasklist scan and the holdout
      list processing.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      e4fe5dd6
    • rcu-tasks: Use unique names for RCU-Tasks kthreads and messages · c97d12a6
      Paul E. McKenney authored
      This commit causes the flavors of RCU Tasks to use different names
      for their kthreads and in their console messages.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      c97d12a6
    • rcutorture: Add torture tests for RCU Tasks Rude · 3d6e43c7
      Paul E. McKenney authored
      This commit adds the definitions required to torture the rude flavor of
      RCU tasks.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      3d6e43c7
    • rcu-tasks: Add an RCU-tasks rude variant · c84aad76
      Paul E. McKenney authored
      This commit adds a "rude" variant of RCU-tasks that has as quiescent
      states schedule(), cond_resched_tasks_rcu_qs(), userspace execution,
      and (in theory, anyway) cond_resched().  In other words, RCU-tasks rude
      readers are regions of code with preemption disabled, but excluding code
      early in the CPU-online sequence and late in the CPU-offline sequence.
      Updates make use of IPIs and force an IPI and a context switch on each
      online CPU.  This variant is useful in some situations in tracing.
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      [ paulmck: Apply EXPORT_SYMBOL_GPL() feedback from Qiujun Huang. ]
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      [ paulmck: Apply review feedback from Steve Rostedt. ]
      c84aad76
    • rcu-tasks: Refactor RCU-tasks to allow variants to be added · 5873b8a9
      Paul E. McKenney authored
      This commit splits out generic processing from RCU-tasks-specific
      processing in order to allow additional flavors to be added.  It also
      adds a def_bool TASKS_RCU_GENERIC to enable the common RCU-tasks
      infrastructure code.
      
      This is primarily, but not entirely, a code-movement commit.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      5873b8a9
    • rcutorture: Add a test for synchronize_rcu_mult() · 9cf8fc6f
      Paul E. McKenney authored
      This commit adds a crude test for synchronize_rcu_mult().  This is
      currently a smoke test rather than a high-quality stress test.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      9cf8fc6f
    • rcu: Reinstate synchronize_rcu_mult() · b3d73156
      Paul E. McKenney authored
      With the advent and likely usage of synchronize_rcu_tasks_rude(), there is
      again a need to wait on multiple types of RCU grace periods, for
      example, call_rcu_tasks() and call_rcu_tasks_rude().  This commit
      therefore reinstates synchronize_rcu_mult() in order to allow these
      grace periods to be straightforwardly waited on concurrently.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      b3d73156
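      Waiting on several kinds of grace period concurrently, as b3d73156
      describes, amounts to posting one callback per flavor and then waiting
      for all of them.  The single-threaded sketch below models that with toy
      flavors; every name in it is hypothetical, and it does not reproduce
      the kernel's synchronize_rcu_mult() implementation.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*gp_callback_t)(void *arg);

/* Toy flavor: remembers one callback until its grace period ends. */
struct toy_flavor {
	gp_callback_t cb;
	void *arg;
};

/* Analogue of a call_rcu-style function: queue cb for a later grace period. */
static void toy_call(struct toy_flavor *f, gp_callback_t cb, void *arg)
{
	f->cb = cb;
	f->arg = arg;
}

/* Model the end of this flavor's grace period by invoking the callback. */
static void toy_end_gp(struct toy_flavor *f)
{
	if (f->cb) {
		gp_callback_t cb = f->cb;

		f->cb = NULL;
		cb(f->arg);
	}
}

/* Tracks how many flavors have yet to complete a grace period. */
struct multi_wait {
	size_t remaining;
};

static void multi_wait_cb(void *arg)
{
	struct multi_wait *mw = arg;

	mw->remaining--;
}

/* Start all grace periods first, so that they overlap instead of running serially. */
static void multi_wait_start(struct multi_wait *mw,
			     struct toy_flavor **flavors, size_t n)
{
	mw->remaining = n;
	for (size_t i = 0; i < n; i++)
		toy_call(flavors[i], multi_wait_cb, mw);
}

static bool multi_wait_done(const struct multi_wait *mw)
{
	return mw->remaining == 0;
}
```

      Because every callback is posted before any waiting begins, the total
      latency in this model is that of the slowest flavor rather than the
      sum over all flavors.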
    • rcu-tasks: Create struct to hold state information · 07e10515
      Paul E. McKenney authored
      This commit creates an rcu_tasks struct to hold state information for
      RCU Tasks.  This is a preparation commit for adding additional flavors
      of Tasks RCU, each of which would have its own rcu_tasks struct.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      07e10515
    • rcu-tasks: Move Tasks RCU to its own file · eacd6f04
      Paul E. McKenney authored
      This code-movement-only commit is in preparation for adding an additional
      flavor of Tasks RCU, which relies on workqueues to detect grace periods.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      eacd6f04
    • rcu: Add per-task state to RCU CPU stall warnings · 5bef8da6
      Paul E. McKenney authored
      Currently, an RCU-preempt CPU stall warning simply lists the PIDs of
      those tasks holding up the current grace period.  This can be helpful,
      but more can be even more helpful.
      
      To this end, this commit adds the nesting level, whether the task
      thinks it was preempted in its current RCU read-side critical section,
      whether RCU core has asked this task for a quiescent state, whether the
      expedited-grace-period hint is set, and whether the task believes that
      it is on the blocked-tasks list (it must be, or it would not be printed,
      but if things are broken, best not to take too much for granted).
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      5bef8da6
    • sched/core: Add function to sample state of locked-down task · 2beaf328
      Paul E. McKenney authored
      A running task's state can be sampled in a consistent manner (for example,
      for diagnostic purposes) simply by invoking smp_call_function_single()
      on its CPU, which may be obtained using task_cpu(), then having the
      IPI handler verify that the desired task is in fact still running.
      However, if the task is not running, this sampling can in theory be done
      immediately and directly.  In practice, the task might start running at
      any time, including during the sampling period.  Gaining a consistent
      sample of a not-running task therefore requires that something be done
      to lock down the target task's state.
      
      This commit therefore adds a try_invoke_on_locked_down_task() function
      that invokes a specified function if the specified task can be locked
      down, returning true if successful and if the specified function returns
      true.  Otherwise this function simply returns false.  Given that the
      function passed to try_invoke_on_locked_down_task() might be invoked with
      a runqueue lock held, that function had better be quite lightweight.
      
      The function is passed the target task's task_struct pointer and the
      argument passed to try_invoke_on_locked_down_task(), allowing easy access
      to task state and to a location for further variables to be passed in
      and out.
      
      Note that the specified function will be called even if the specified
      task is currently running.  The function can use ->on_rq and task_curr()
      to quickly and easily determine the task's state, and can return false
      if this state is not to the function's liking.  The caller of the
      try_invoke_on_locked_down_task() would then see the false return value,
      and could take appropriate action, for example, trying again later or
      sending an IPI if matters are more urgent.
      
      It is expected that use cases such as the RCU CPU stall warning code will
      simply return false if the task is currently running.  However, there are
      use cases involving nohz_full CPUs where the specified function might
      instead fall back to an alternative sampling scheme that relies on heavier
      synchronization (such as memory barriers) in the target task.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      [ paulmck: Apply feedback from Peter Zijlstra and Steven Rostedt. ]
      [ paulmck: Invoke if running to handle feedback from Mathieu Desnoyers. ]
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      2beaf328
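      The calling convention described in 2beaf328 can be modelled in
      userspace as follows.  This is a sketch only: the struct, the
      atomic_flag used as a try-lock, and both function names are
      hypothetical, whereas the real function locks down the task using
      scheduler locks.  The callback receives the task and an opaque
      argument, and returns false when the task's state is not to its liking.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a task; not the kernel's task_struct. */
struct task {
	atomic_flag lock;	/* Stands in for the locks that pin task state. */
	bool running;		/* Models what task_curr() would report. */
	int nesting;		/* Some state that the caller wants to sample. */
};

typedef bool (*task_func_t)(struct task *t, void *arg);

/*
 * Invoke func on t if t can be locked down.  Return true only if the
 * lock was obtained and func itself returned true.
 */
static bool try_invoke_on_locked_task(struct task *t, task_func_t func,
				      void *arg)
{
	bool ret;

	if (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		return false;	/* Could not lock down the task. */
	ret = func(t, arg);	/* Runs with the lock held: keep it lightweight. */
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
	return ret;
}

/* Example callback: sample the nesting level only if the task is not running. */
static bool sample_nesting_if_not_running(struct task *t, void *arg)
{
	if (t->running)
		return false;	/* Caller may retry later or send an IPI. */
	*(int *)arg = t->nesting;
	return true;
}
```

      Leaving the running-task decision to the callback is what allows
      different callers to choose between simply giving up and falling back
      to a heavier sampling scheme.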
    • rcu-tasks: Use context-switch hook for PREEMPT=y kernels · 66777e58
      Paul E. McKenney authored
      Currently, the PREEMPT=y version of rcu_note_context_switch() does not
      invoke rcu_tasks_qs(), and we need it to in order to keep RCU Tasks
      Trace's IPIs down to a dull roar.  This commit therefore enables this
      hook.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      66777e58
    • rcu: Add comments marking transitions between RCU watching and not · ac3caf82
      Paul E. McKenney authored
      It is not as clear as it might be just where in RCU's idle entry/exit
      code RCU stops and starts watching the current CPU.  This commit therefore
      adds comments calling out the transitions.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      ac3caf82
    • rcutorture: Add test of holding scheduler locks across rcu_read_unlock() · 52b1fc3f
      Paul E. McKenney authored
      Now that it should be safe to hold scheduler locks across
      rcu_read_unlock(), even in cases where the corresponding RCU read-side
      critical section might have been preempted and boosted, this commit adds
      a test of this capability to rcutorture.  This has been tested on current
      mainline (which can deadlock in this situation), and lockdep duly reported
      the expected deadlock.  On -rcu, lockdep is silent, thus far, anyway.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      52b1fc3f
    • rcu: Don't use negative nesting depth in __rcu_read_unlock() · 5f5fa7ea
      Lai Jiangshan authored
      Now that RCU flavors have been consolidated, an RCU-preempt
      rcu_read_unlock() in an interrupt or softirq handler cannot possibly
      end the RCU read-side critical section.  Consider the old vulnerability
      involving rcu_read_unlock() being invoked within such a handler that
      interrupted an __rcu_read_unlock_special(), in which a wakeup might be
      invoked with a scheduler lock held.  Because rcu_read_unlock_special()
      no longer does wakeups in such situations, it is no longer necessary
      for __rcu_read_unlock() to set the nesting level negative.
      
      This commit therefore removes this recursion-protection code from
      __rcu_read_unlock().
      
      [ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ]
      [ paulmck: Adjust other checks given no more negative nesting. ]
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      5f5fa7ea
    • rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs field · f0bdf6d4
      Lai Jiangshan authored
      The ->rcu_read_unlock_special.b.deferred_qs field is set to true in
      rcu_read_unlock_special() but never set to false.  This is not
      particularly useful, so this commit removes this field.
      
      The only possible justification for this field is to ease debugging
      of RCU deferred quiescent states, but the combination of the other
      ->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course
      ->rcu_read_lock_nesting should cover debugging needs.  And if this last
      proves incorrect, this patch can always be reverted, along with the
      required setting of ->rcu_read_unlock_special.b.deferred_qs to false
      in rcu_preempt_deferred_qs_irqrestore().
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      f0bdf6d4
    • rcu: Don't set nesting depth negative in rcu_preempt_deferred_qs() · 07b4a930
      Lai Jiangshan authored
      Now that RCU flavors have been consolidated, an RCU-preempt
      rcu_read_unlock() in an interrupt or softirq handler cannot possibly
      end the RCU read-side critical section.  Consider the old vulnerability
      involving rcu_preempt_deferred_qs() being invoked within such a handler
      that interrupted an extended RCU read-side critical section, in which
      a wakeup might be invoked with a scheduler lock held.  Because
      rcu_read_unlock_special() no longer does wakeups in such situations,
      it is no longer necessary for rcu_preempt_deferred_qs() to set the
      nesting level negative.
      
      This commit therefore removes this recursion-protection code from
      rcu_preempt_deferred_qs().
      
      [ paulmck: Fix typo in commit log per Steve Rostedt. ]
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      07b4a930
    • rcu: Make rcu_read_unlock_special() safe for rq/pi locks · e4453d8a
      Paul E. McKenney authored
      The scheduler is currently required to hold rq/pi locks across the entire
      RCU read-side critical section or not at all.  This is inconvenient and
      leaves traps for the unwary, including the author of this commit.
      
      But now that excessively long grace periods enable scheduling-clock
      interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in
      rcu_read_unlock_special() can be dispensed with.  In other words, the
      rcu_read_unlock_special() function can refrain from doing wakeups unless
      such wakeups are guaranteed safe.
      
      This commit therefore avoids unsafe wakeups, freeing the scheduler to
      hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU
      read-side critical section might have been preempted.  This commit also
      updates RCU's requirements documentation.
      
      This commit is inspired by a patch from Lai Jiangshan:
      https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com
      This commit is further intended to be a step towards his goal of permitting
      the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock().
      
      Cc: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      e4453d8a