  1. 13 Aug, 2019 3 commits
    • rcu/nocb: Provide separate no-CBs grace-period kthreads · 12f54c3a
      Paul E. McKenney authored
      Currently, there is one no-CBs rcuo kthread per CPU, and these kthreads
      are divided into groups.  The first rcuo kthread to come online in a
      given group is that group's leader, and the leader both waits for grace
      periods and invokes its CPU's callbacks.  The non-leader rcuo kthreads
      only invoke callbacks.
      
      This works well in the real-time/embedded environments for which it was
      intended because such environments tend not to generate all that many
      callbacks.  However, given huge floods of callbacks, it is possible for
      the leader kthread to be stuck invoking callbacks while its followers
      wait helplessly while their callbacks pile up.  This is a good recipe
      for an OOM, and rcutorture's new callback-flood capability does generate
      such OOMs.
      
      One strategy would be to wait until such OOMs start happening in
      production, but similar OOMs have in fact happened starting in 2018.
      It would therefore be wise to take a more proactive approach.
      
      This commit therefore features per-CPU rcuo kthreads that do nothing
      but invoke callbacks.  Instead of having one of these kthreads act as
      leader, each group has a separate rcuog kthread that handles grace periods
      for its group.  Because these rcuog kthreads do not invoke callbacks,
      callback floods on one CPU no longer block callbacks from reaching the
      rcuc callback-invocation kthreads on other CPUs.
      
      This change does introduce additional kthreads, however:
      
      1.	The number of additional kthreads is about the square root of
      	the number of CPUs, so that a 4096-CPU system would have only
      	about 64 additional kthreads.  Note that recent changes
      	decreased the number of rcuo kthreads by a factor of two
      	(CONFIG_PREEMPT=n) or even three (CONFIG_PREEMPT=y), so
      	this still represents a significant improvement on most systems.
      
      2.	The leading "rcuo" of the rcuog kthreads should allow existing
      	scripting to affinity these additional kthreads as needed, the
      	same as for the rcuop and rcuos kthreads.  (There are no longer
      	any rcuob kthreads.)
      
      3.	A state-machine approach was considered and rejected.  Although
      	this would allow the rcuo kthreads to continue their dual
      	leader/follower roles, it complicates callback invocation
      	and makes it more difficult to consolidate rcuo callback
      	invocation with existing softirq callback invocation.
      
      The introduction of rcuog kthreads should thus be acceptable.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu/nocb: Update comments to prepare for forward-progress work · 6484fe54
      Paul E. McKenney authored
      This commit simply rewords comments to prepare for leader nocb kthreads
      doing only grace-period work and callback shuffling.  This will mean
      the addition of replacement kthreads to invoke callbacks.  The "leader"
      and "follower" thus become less meaningful, so the commit changes no-CB
      comments with these strings to "GP" and "CB", respectively.  (Give or
      take the usual grammatical transformations.)
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu/nocb: Rename rcu_data fields to prepare for forward-progress work · 58bf6f77
      Paul E. McKenney authored
      This commit simply renames rcu_data fields to prepare for leader
      nocb kthreads doing only grace-period work and callback shuffling.
      This will mean the addition of replacement kthreads to invoke callbacks.
      The "leader" and "follower" thus become less meaningful, so the commit
      changes no-CB fields with these strings to "gp" and "cb", respectively.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
  2. 01 Aug, 2019 5 commits
    • rcu: Change return type of rcu_spawn_one_boost_kthread() · 3545832f
      Byungchul Park authored
      The return value of rcu_spawn_one_boost_kthread() is not used any longer.
      This commit therefore changes its return type from int to void, and
      removes the cast to void from its callers.
      Signed-off-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Restore barrier() to rcu_read_lock() and rcu_read_unlock() · 1f3ebc82
      Paul E. McKenney authored
      Commit bb73c52b ("rcu: Don't disable preemption for Tiny and Tree
      RCU readers") removed the barrier() calls from rcu_read_lock() and
      rcu_read_unlock() in CONFIG_PREEMPT=n&&CONFIG_PREEMPT_COUNT=n kernels.
      Within RCU, this commit was OK, but it failed to account for things like
      get_user() that can pagefault and that can be reordered by the compiler.
      Lack of the barrier() calls in rcu_read_lock() and rcu_read_unlock()
      can cause these page faults to migrate into RCU read-side critical
      sections, which in CONFIG_PREEMPT=n kernels could result in too-short
      grace periods and arbitrary misbehavior.  Please see commit 386afc91
      ("spinlocks and preemption points need to be at least compiler barriers")
      and Linus's commit 66be4e66 ("rcu: locking and unlocking need to
      always be at least barriers"), this last of which restores the barrier()
      call to both rcu_read_lock() and rcu_read_unlock().
      
      This commit removes barrier() calls that are no longer needed given
      their addition in Linus's commit noted above.  The combination of
      this commit and Linus's commit effectively reverts commit bb73c52b
      ("rcu: Don't disable preemption for Tiny and Tree RCU readers").
      Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      [ paulmck: Fix embarrassing typo located by Alan Stern. ]
    • rcu: Simplify rcu_note_context_switch exit from critical section · cb4dbbfa
      Joel Fernandes (Google) authored
      Because __rcu_read_unlock() can be preempted just before the call to
      rcu_read_unlock_special(), it is possible for a task to be preempted just
      before it would have fully exited its RCU read-side critical section.
      This would result in a needless extension of that critical section until
      that task was resumed, which might in turn result in a needlessly
      long grace period, needless RCU priority boosting, and needless
      force-quiescent-state actions.  Therefore, rcu_note_context_switch()
      invokes __rcu_read_unlock() followed by rcu_preempt_deferred_qs() when
      it detects this situation.  This action by rcu_note_context_switch()
      ends the RCU read-side critical section immediately.
      
      Of course, once the task resumes, it will invoke rcu_read_unlock_special()
      redundantly.  This is harmless because the fact that a preemption
      happened means that interrupts, preemption, and softirqs cannot
      have been disabled, so there would be no deferred quiescent state.
      While ->rcu_read_lock_nesting remains less than zero, none of the
      ->rcu_read_unlock_special.b bits can be set, and they were all zeroed by
      the call to rcu_note_context_switch() at task-preemption time.  Therefore,
      setting ->rcu_read_unlock_special.b.exp_hint to false has no effect.
      
      Therefore, the extra call to rcu_preempt_deferred_qs_irqrestore()
      would return immediately.  The one possible exception is an expedited
      grace period starting just as the task is being resumed, which could
      leave ->exp_deferred_qs set.  This will cause
      rcu_preempt_deferred_qs_irqrestore() to invoke rcu_report_exp_rdp(),
      reporting the quiescent state, just as it should.  (Such an expedited
      grace period won't affect the preemption code path due to interrupts
      having already been disabled.)
      
      But when rcu_note_context_switch() invokes __rcu_read_unlock(), it
      is doing so with preemption disabled, hence __rcu_read_unlock() will
      unconditionally defer the quiescent state, only to immediately invoke
      rcu_preempt_deferred_qs(), thus immediately reporting the deferred
      quiescent state.  It turns out to be safe (and faster) to instead
      just invoke rcu_preempt_deferred_qs() without the __rcu_read_unlock()
      middleman.
      
      Because this is the invocation during the preemption (as opposed to
      the invocation just after the resume), at least one of the bits in
      ->rcu_read_unlock_special.b must be set and ->rcu_read_lock_nesting
      must be negative.  This means that rcu_preempt_need_deferred_qs() must
      return true, avoiding the early exit from rcu_preempt_deferred_qs().
      Thus, rcu_preempt_deferred_qs_irqrestore() will be invoked immediately,
      as required.
      
      This commit therefore simplifies the CONFIG_PREEMPT=y version of
      rcu_note_context_switch() by removing the "else if" branch of its
      "if" statement.  This change means that all callers that would have
      invoked rcu_read_unlock_special() followed by rcu_preempt_deferred_qs()
      will now simply invoke rcu_preempt_deferred_qs(), thus avoiding the
      rcu_read_unlock_special() middleman when __rcu_read_unlock() is preempted.
      
      Cc: rcu@vger.kernel.org
      Cc: kernel-team@android.com
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Make rcu_read_unlock_special() checks match raise_softirq_irqoff() · 87446b48
      Paul E. McKenney authored
      Threaded interrupts provide additional interesting interactions between
      RCU and raise_softirq() that can result in self-deadlocks in v5.0-2 of
      the Linux kernel.  These self-deadlocks can be provoked in susceptible
      kernels within a few minutes using the following rcutorture command on
      an 8-CPU system:
      
      tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs "TREE03" --bootargs "threadirqs"
      
      Although post-v5.2 RCU commits have at least greatly reduced the
      probability of these self-deadlocks, this was entirely by accident.
      Although this sort of accident should be rowdily celebrated on those
      rare occasions when it does occur, such celebrations should be quickly
      followed by a principled patch, which is what this patch purports to be.
      
      The key point behind this patch is that when in_interrupt() returns
      true, __raise_softirq_irqoff() will never attempt a wakeup.  Therefore,
      if in_interrupt(), calls to raise_softirq*() are both safe and
      extremely cheap.
      
      This commit therefore replaces the in_irq() calls in the "if" statement
      in rcu_read_unlock_special() with in_interrupt() and simplifies the
      "if" condition to the following:
      
      	if (irqs_were_disabled && use_softirq &&
      	    (in_interrupt() ||
      	     (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
      		raise_softirq_irqoff(RCU_SOFTIRQ);
      	} else {
      		/* Appeal to the scheduler. */
      	}
      
      The rationale behind the "if" condition is as follows:
      
      1.	irqs_were_disabled:  If interrupts are enabled, we should
      	instead appeal to the scheduler so as to let the upcoming
      	irq_enable()/local_bh_enable() do the rescheduling for us.
      2.	use_softirq: If this kernel isn't using softirq, then
      	raise_softirq_irqoff() will be unhelpful.
      3.	a.	in_interrupt(): If this returns true, the subsequent
      		call to raise_softirq_irqoff() is guaranteed not to
      		do a wakeup, so that call will be both very cheap and
      		quite safe.
      	b.	Otherwise, if !in_interrupt() the raise_softirq_irqoff()
      		might do a wakeup, which is expensive and, in some
      		contexts, unsafe.
      		i.	The "exp" (an expedited RCU grace period is being
      			blocked) says that the wakeup is worthwhile, and:
      		ii.	The !.deferred_qs says that scheduler locks
      			cannot be held, so the wakeup will be safe.
      
      Backporting this requires considerable care, so no auto-backport, please!
      
      Fixes: 05f41571 ("rcu: Speed up expedited GPs when interrupting RCU reader")
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Simplify rcu_read_unlock_special() deferred wakeups · d143b3d1
      Paul E. McKenney authored
      In !use_softirq runs, we clearly cannot rely on raise_softirq() and
      its lightweight bit setting, so we must instead do some form of wakeup.
      In the absence of a self-IPI when interrupts are disabled, these wakeups
      can be delayed until the next interrupt occurs.  This means that calling
      invoke_rcu_core() doesn't actually do any expediting.
      
      In this case, it is better to take the "else" clause, which sets the
      current CPU's resched bits and, if there is an expedited grace period
      in flight, uses IRQ-work to force the needed self-IPI.  This commit
      therefore removes the "else if" clause that calls invoke_rcu_core().
      Reported-by: Scott Wood <swood@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
  3. 28 May, 2019 2 commits
  4. 25 May, 2019 5 commits
    • rcu: Use irq_work to get scheduler's attention in clean context · 0864f057
      Paul E. McKenney authored
      When rcu_read_unlock_special() is invoked with interrupts disabled, when
      it is either not running in an interrupt handler or not using RCU_SOFTIRQ,
      when the just-ended reader is not the first RCU read-side critical section
      in the chain, and when either an expedited grace period is in flight or
      this is a NO_HZ_FULL kernel, the end of the grace period can be unduly
      delayed.  The reason for this is that it is not safe to do wakeups in
      this situation.
      
      This commit fixes this problem by using the irq_work subsystem to
      force a later interrupt handler in a clean environment.  Because
      set_tsk_need_resched(current) and set_preempt_need_resched() are
      invoked prior to this, the scheduler will force a context switch
      upon return from this interrupt (though perhaps at the end of any
      interrupted preempt-disable or BH-disable region of code), which will
      invoke rcu_note_context_switch() (again in a clean environment), which
      will in turn give RCU the chance to report the deferred quiescent state.
      
      Of course, by then this task might be within another RCU read-side
      critical section.  But that will be detected at that time and reporting
      will be further deferred to the outermost rcu_read_unlock().  See
      rcu_preempt_need_deferred_qs() and rcu_preempt_deferred_qs() for more
      details on the checking.
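
      A minimal sketch of the irq_work pattern involved (the field and handler
      names below are illustrative rather than exact):

      	#include <linux/irq_work.h>

      	/* Runs later, from a clean hard-interrupt context. */
      	static void rcu_deferred_qs_handler(struct irq_work *iwp)
      	{
      		/* Nothing needed here: the interrupt return does the work. */
      	}

      	/* In rcu_read_unlock_special(), with interrupts disabled: */
      	init_irq_work(&rdp->defer_qs_iw, rcu_deferred_qs_handler);
      	irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);

      Queueing the irq_work arms a self-IPI, and the previously executed
      set_tsk_need_resched(current) and set_preempt_need_resched() then force
      a context switch once that interrupt returns.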
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Allow rcu_read_unlock_special() to raise_softirq() if in_irq() · 385b599e
      Paul E. McKenney authored
      When running in an interrupt handler, raise_softirq() and
      raise_softirq_irqoff() have extremely low overhead: They simply set a
      bit in a per-CPU mask, which is checked upon exit from that interrupt
      handler.  Therefore, if rcu_read_unlock_special() is invoked within an
      interrupt handler and RCU_SOFTIRQ is in use, this commit makes use of
      raise_softirq_irqoff() even if there is no expedited grace period in
      flight and even if this is not a nohz_full CPU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Only do rcu_read_unlock_special() wakeups if expedited · 25102de6
      Paul E. McKenney authored
      Currently, rcu_read_unlock_special() will do wakeups whenever it is safe
      to do so.  However, wakeups are expensive, and they are only really
      needed when the just-ended RCU read-side critical section is blocking
      an expedited grace period (in which case speed is of the essence)
      or on a nohz_full CPU (where it might be a good long time before an
      interrupt arrives).  This commit therefore checks for these conditions,
      and does the expensive wakeups only if doing so would be useful.
      
      Note it can be rather expensive to determine whether or not the current
      task (as opposed to the current CPU) is blocking the current expedited
      grace period.  Doing so requires traversing the ->blkd_tasks list, which
      can be quite long.  This commit therefore cheats:  If the current task
      is on a given ->blkd_tasks list, and some task on that list is blocking
      the current expedited grace period, the code assumes that the current
      task is blocking that expedited grace period.
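
      Expressed as a sketch (the exact condition lives in
      rcu_read_unlock_special() and its helpers; names here are simplified):

      	struct task_struct *t = current;
      	struct rcu_node *rnp = t->rcu_blocked_node;

      	/*
      	 * Cheap approximation: if this task is queued on a leaf rcu_node
      	 * whose ->blkd_tasks list contains some task blocking the current
      	 * expedited grace period, assume this task is one of the blockers.
      	 */
      	bool expedited_needs_wakeup = rnp && READ_ONCE(rnp->exp_tasks);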
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Check for wakeup-safe conditions in rcu_read_unlock_special() · 23634ebc
      Paul E. McKenney authored
      When RCU core processing is offloaded from RCU_SOFTIRQ to the rcuc
      kthreads, a full and unconditional wakeup is required to initiate RCU
      core processing.  In contrast, when RCU core processing is carried
      out by RCU_SOFTIRQ, a raise_softirq() suffices.  Of course, there are
      situations where raise_softirq() does a full wakeup, but these do not
      occur with normal usage of rcu_read_unlock().
      
      The reason that full wakeups can be problematic is that the scheduler
      sometimes invokes rcu_read_unlock() with its pi or rq locks held,
      which can of course result in deadlock in CONFIG_PREEMPT=y kernels when
      rcu_read_unlock() invokes the scheduler.  Scheduler invocations can happen
      in the following situations: (1) The just-ended reader has been subjected
      to RCU priority boosting, in which case rcu_read_unlock() must deboost,
      (2) Interrupts were disabled across the call to rcu_read_unlock(), so
      the quiescent state must be deferred, requiring a wakeup of the rcuc
      kthread corresponding to the current CPU.
      
      Now, the scheduler may hold one of its locks across rcu_read_unlock()
      only if preemption has been disabled across the entire RCU read-side
      critical section, which in the days prior to RCU flavor consolidation
      meant that rcu_read_unlock() never needed to do wakeups.  However, this
      is no longer the case for any but the first rcu_read_unlock() following a
      condition (e.g., preempted RCU reader) requiring special rcu_read_unlock()
      attention.  For example, an RCU read-side critical section might be
      preempted, but preemption might be disabled across the rcu_read_unlock().
      The rcu_read_unlock() must defer the quiescent state, and therefore
      leaves the task queued on its leaf rcu_node structure.  If a scheduler
      interrupt occurs, the scheduler might well invoke rcu_read_unlock() with
      one of its locks held.  However, the preempted task is still queued, so
      rcu_read_unlock() will attempt to defer the quiescent state once more.
      When RCU core processing is carried out by RCU_SOFTIRQ, this works just
      fine: The raise_softirq() function simply sets a bit in a per-CPU mask
      and the RCU core processing will be undertaken upon return from interrupt.
      
      Not so when RCU core processing is carried out by the rcuc kthread: In this
      case, the required wakeup can result in deadlock.
      
      The initial solution to this problem was to use set_tsk_need_resched() and
      set_preempt_need_resched() to force a future context switch, which allows
      rcu_preempt_note_context_switch() to report the deferred quiescent state
      to RCU's core processing.  Unfortunately for expedited grace periods,
      there can be a significant delay between the call for a context switch
      and the actual context switch.
      
      This commit therefore introduces a ->deferred_qs flag to the task_struct
      structure's rcu_special structure.  This flag is initially false, and
      is set to true by the first call to rcu_read_unlock() requiring special
      attention, then finally reset back to false when the quiescent state is
      finally reported.  Then rcu_read_unlock() attempts full wakeups only when
      ->deferred_qs is false, that is, on the first rcu_read_unlock() requiring
      special attention.  Note that a chain of RCU readers linked by some other
      sort of reader may find that a later rcu_read_unlock() is once again able
      to do a full wakeup, courtesy of an intervening preemption:
      
      	rcu_read_lock();
      	/* preempted */
      	local_irq_disable();
      	rcu_read_unlock(); /* Can do full wakeup, sets ->deferred_qs. */
      	rcu_read_lock();
      	local_irq_enable();
      	preempt_disable()
      	rcu_read_unlock(); /* Cannot do full wakeup, ->deferred_qs set. */
      	rcu_read_lock();
      	preempt_enable();
      	/* preempted, ->deferred_qs reset. */
      	local_irq_disable();
      	rcu_read_unlock(); /* Can again do full wakeup, sets ->deferred_qs. */
      
      Such linked RCU readers do not yet seem to appear in the Linux kernel, and
      it is probably best if they don't.  However, RCU needs to handle them, and
      some variations on this theme could make even raise_softirq() unsafe due to
      the possibility of its doing a full wakeup.  This commit therefore also
      avoids invoking raise_softirq() when the ->deferred_qs flag is set.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    • rcu: Enable elimination of Tree-RCU softirq processing · 48d07c04
      Sebastian Andrzej Siewior authored
      Some workloads need to change kthread priority for RCU core processing
      without affecting other softirq work.  This commit therefore introduces
      the rcutree.use_softirq kernel boot parameter, which moves the RCU core
      work from softirq to a per-CPU SCHED_OTHER kthread named rcuc.  Use of
      the SCHED_OTHER approach avoids the scalability problems that appeared
      with the earlier attempt to move RCU core processing from softirq
      to kthreads.  That said, kernels built with RCU_BOOST=y will run the
      rcuc kthreads at the RCU-boosting priority.
      
      Note that rcutree.use_softirq=0 must be specified to move RCU core
      processing to the rcuc kthreads: rcutree.use_softirq=1 is the default.
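
      For example, booting with the following kernel command-line fragment
      moves RCU core processing into the rcuc kthreads:

      	rcutree.use_softirq=0

      Omitting the parameter (or specifying rcutree.use_softirq=1) keeps RCU
      core processing in RCU_SOFTIRQ as before.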
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      [ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked
        from RCU core processing, in contrast to softirq->rcuc transition
        in old mainline RCU priority boosting. ]
      [ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock()
        while holding rq or pi locks, also possibly fixing a pre-existing latent
        bug involving raise_softirq()-induced wakeups. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
  5. 26 Mar, 2019 9 commits
    • rcu: Move FAST_NO_HZ stall-warning code to tree_stall.h · 59b73a27
      Paul E. McKenney authored
      This commit further consolidates the stall-warning code by moving
      print_cpu_stall_info() and its helper functions along with
      zero_cpu_stall_ticks() to kernel/rcu/tree_stall.h.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Inline RCU stall-warning info helper functions · 40e69ac7
      Paul E. McKenney authored
      The print_cpu_stall_info_begin() and print_cpu_stall_info_end() functions print a
      single character each onto the console, and are a holdover from a time
      when RCU CPU stall warning messages could be abbreviated using a long-gone
      Kconfig option.  This commit therefore adds these single characters to
      already-printed strings in the calling functions, and then eliminates
      both print_cpu_stall_info_begin() and print_cpu_stall_info_end().
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Move rcu_print_task_exp_stall() to tree_exp.h · d87cda50
      Paul E. McKenney authored
      Because expedited CPU stall warnings are contained within the
      kernel/rcu/tree_exp.h file, rcu_print_task_exp_stall() should live
      there too.  This commit carries out the required code motion.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Move RCU CPU stall-warning code out of tree_plugin.h · 3fc3d170
      Paul E. McKenney authored
      The RCU CPU stall-warning code for normal grace periods is currently
      scattered across two files, due to earlier Tiny RCU support for RCU
      CPU stall warnings and for old Kconfig options that have long since
      been retired.  Given that it is hard for the lead RCU maintainer to
      find relevant stall-warning code, it would be good to consolidate it.
      This commit continues this process by moving stall-warning code from
      kernel/rcu/tree_plugin.h to a new kernel/rcu/tree_stall.h file.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Correct READ_ONCE()/WRITE_ONCE() for ->rcu_read_unlock_special · add0d37b
      Paul E. McKenney authored
      The task_struct structure's ->rcu_read_unlock_special field is only ever
      read or written by the owning task, but it is accessed both at process
      and interrupt levels.  It may therefore be accessed using plain reads
      and writes while interrupts are disabled, but must be accessed using
      READ_ONCE() and WRITE_ONCE() or better otherwise.  This commit makes a
      few adjustments to align with this discipline.
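
      A sketch of the resulting access discipline (illustrative fragment only):

      	struct task_struct *t = current;
      	unsigned long flags;

      	/* Interrupts disabled: plain accesses by the owning task are fine. */
      	local_irq_save(flags);
      	t->rcu_read_unlock_special.b.exp_hint = false;
      	local_irq_restore(flags);

      	/* Interrupts possibly enabled: use marked accesses. */
      	if (READ_ONCE(t->rcu_read_unlock_special.s))
      		WRITE_ONCE(t->rcu_read_unlock_special.b.need_qs, true);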
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Eliminate redundant NULL-pointer check · a2badefa
      Paul E. McKenney authored
      Because rcu_wake_cond() checks for a null task_struct pointer, there is
      no need for its callers to do so.  This commit eliminates the redundant
      check.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Report error for bad rcu_nocbs= parameter values · 497e4260
      Paul E. McKenney authored
      This commit prints a console message when cpulist_parse() reports a
      bad list of CPUs, and sets all CPUs' bits in that case.  The reason for
      setting all CPUs' bits is that this is the safe(r) choice for real-time
      workloads, which would normally be the ones using the rcu_nocbs= kernel
      boot parameter.  Either way, later RCU console log messages list the
      actual set of CPUs whose RCU callbacks will be offloaded.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Allow rcu_nocbs= to specify all CPUs · da8739f2
      Paul E. McKenney authored
      Currently, the rcu_nocbs= kernel boot parameter requires that a specific
      list of CPUs be specified, and has no way to say "all of them".
      As noted by user RavFX in a comment to Phoronix topic 1002538, this
      is an inconvenient side effect of the removal of the RCU_NOCB_CPU_ALL
      Kconfig option.  This commit therefore enables the rcu_nocbs= kernel boot
      parameter to be given the string "all", as in "rcu_nocbs=all" to specify
      that all CPUs on the system are to have their RCU callbacks offloaded.
      
      Another approach would be to make cpulist_parse() check for "all", but
      there are uses of cpulist_parse() that do other checking, which could
      conflict with an "all".  This commit therefore focuses on the specific
      use of cpulist_parse() in rcu_nocb_setup().
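
      For example, either of the following boot-parameter forms now works
      (the explicit CPU list is purely illustrative):

      	rcu_nocbs=0-7
      	rcu_nocbs=all

      The first form offloads callbacks from CPUs 0-7 only; the second
      offloads callbacks from every CPU on the system.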
      
      Just a note to other people who would like changes to Linux-kernel RCU:
      If you send your requests to me directly, they might get fixed somewhat
      faster.  RavFX's comment was posted on January 22, 2018 and I first saw
      it on March 5, 2019.  And the only reason that I found it -at- -all- was
      that I was looking for projects using RCU, and my search engine showed
      me that Phoronix comment quite by accident.  Your choice, though!  ;-)
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Make exit_rcu() handle non-preempted RCU readers · 884157ce
      Paul E. McKenney authored
      The purpose of exit_rcu() is to handle cases where buggy code causes a
      task to exit within an RCU read-side critical section.  It currently
      does that in the case where said RCU read-side critical section was
      preempted at least once, but fails to handle cases where preemption did
      not occur.  This case needs to be handled because otherwise the final
      context switch away from the exiting task will incorrectly behave as if
      task exit were instead a preemption of an RCU read-side critical section,
      and will therefore queue the exiting task.  The exiting task will have
      exited, and thus won't ever execute rcu_read_unlock(), which means that
      it will remain queued forever, blocking all subsequent grace periods,
      and eventually resulting in OOM.
      
      Although this is arguably better than letting grace periods proceed
      and having a later rcu_read_unlock() access the now-freed task
      structure that once belonged to the exiting task, it would obviously
      be better to correctly handle this case.  This commit therefore sets
      ->rcu_read_lock_nesting to 1 in that case, so that the subsequent call
      to __rcu_read_unlock() causes the exiting task to exit its dangling RCU
      read-side critical section.
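
      A rough sketch of the resulting logic (simplified; the real function in
      kernel/rcu/tree_plugin.h carries additional commentary):

      	void exit_rcu(void)
      	{
      		struct task_struct *t = current;

      		if (unlikely(!list_empty(&t->rcu_node_entry))) {
      			/* Reader was preempted: mark it blocked, as before. */
      			t->rcu_read_lock_nesting = 1;
      			barrier();
      			WRITE_ONCE(t->rcu_read_unlock_special.b.blocked, true);
      		} else if (unlikely(t->rcu_read_lock_nesting)) {
      			/* Non-preempted reader: the newly handled case. */
      			t->rcu_read_lock_nesting = 1;
      		} else {
      			return;
      		}
      		__rcu_read_unlock();
      		rcu_preempt_deferred_qs(t);
      	}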
      
      Note that deferred quiescent states need not be considered.  The reason
      is that removing the task from the ->blkd_tasks[] list in the call to
      rcu_preempt_deferred_qs() handles the per-task component of any deferred
      quiescent state, and all other components of any deferred quiescent state
      are associated with the CPU, which isn't going anywhere until some later
      CPU-hotplug operation, which will report any remaining deferred quiescent
      states from within the rcu_report_dead() function.
      
      Note also that negative values of ->rcu_read_lock_nesting need not be
      considered.  First, these won't show up in exit_rcu() unless there is
      a serious bug in RCU, and second, setting ->rcu_read_lock_nesting sets
      the state so that the RCU read-side critical section will be exited
      normally.
      
      Again, this code has no effect unless there has been some prior bug
      that prevents a task from leaving an RCU read-side critical section
      before exiting.  Furthermore, there have been no reports of the bug
      fixed by this commit appearing in production.  This commit is therefore
      absolutely -not- recommended for backporting to -stable.
      Reported-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in>
      Reported-by: BHARATH Y MOURYA <bharathm@iisc.ac.in>
      Reported-by: Aravinda Prasad <aravinda@iisc.ac.in>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Tested-by: ABHISHEK DUBEY <dabhishek@iisc.ac.in>
  6. 09 Feb, 2019 1 commit
  7. 25 Jan, 2019 10 commits
    • rcu: Rename rcu_check_callbacks() to rcu_sched_clock_irq() · c98cac60
      Paul E. McKenney authored
      The name rcu_check_callbacks() arguably made sense back in the early
      2000s when RCU was quite a bit simpler than it is today, but it has
      become quite misleading, especially with the advent of dyntick-idle
      and NO_HZ_FULL.  The rcu_check_callbacks() function is RCU's hook into
      the scheduling-clock interrupt, and is now but one of many ways that
      callbacks get promoted to invocable state.
      
      This commit therefore changes the name to rcu_sched_clock_irq(),
      which is the same number of characters and clearly indicates this
      function's relation to the rest of the Linux kernel.  In addition, for
      the sake of consistency, rcu_flavor_check_callbacks() is also renamed
      to rcu_flavor_sched_clock_irq().
      
      While in the area, the header comments for both functions are reworked.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Update NOCB comments · a9fefdb2
      Paul E. McKenney authored
      This commit updates a few obsolete comments in the RCU callback-offload
      code.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Move rcu_cpu_has_work to rcu_data structure · f7e972ee
      Paul E. McKenney authored
      Given that RCU has a perfectly good per-CPU rcu_data structure, most
      per-CPU quantities should be stored there.
      
      This commit therefore moves the rcu_cpu_has_work per-CPU variable to
      the rcu_data structure.  This also makes this variable unconditionally
      present, which should be acceptable given the memory reduction due to the
      RCU flavor consolidation and also due to simplifications this will enable.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Remove unused rcu_cpu_kthread_loops per-CPU variable · 8b4d0f48
      Paul E. McKenney authored
      The rcu_cpu_kthread_loops variable used to provide debugfs information,
      but is no longer used.  This commit therefore removes it.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Move rcu_cpu_kthread_status to rcu_data structure · 6ffdde28
      Paul E. McKenney authored
      Given that RCU has a perfectly good per-CPU rcu_data structure, most
      per-CPU quantities should be stored there.
      
      This commit therefore moves the rcu_cpu_kthread_status per-CPU variable
      to the rcu_data structure.  This also makes this variable unconditionally
      present, which should be acceptable given the memory reduction due to the
      RCU flavor consolidation and also due to simplifications this will enable.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Move rcu_cpu_kthread_task to rcu_data structure · 37f62d7c
      Paul E. McKenney authored
      Given that RCU has a perfectly good per-CPU rcu_data structure, most
      per-CPU quantities should be stored there.
      
      This commit therefore moves the rcu_cpu_kthread_task per-CPU variable to
      the rcu_data structure.  This also makes this variable unconditionally
      present, which should be acceptable given the memory reduction due to the
      RCU flavor consolidation and also due to simplifications this will enable.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Discard separate per-CPU callback counts · 260e1e4f
      Paul E. McKenney authored
      Back when there were multiple flavors of RCU, it was necessary to
      separately count lazy and non-lazy callbacks for each CPU.  These counts
      were used in CONFIG_RCU_FAST_NO_HZ kernels to determine how long a newly
      idle CPU should be allowed to sleep before handling its RCU callbacks.
      But now that there is only one flavor, the callback counts for a given
      CPU's sole rcu_data structure are the counts for that CPU.
      
      This commit therefore removes the rcu_data structure's ->nonlazy_posted
      and ->nonlazy_posted_snap fields, the rcu_idle_count_callbacks_posted()
      and rcu_cpu_has_callbacks() functions, repurposes the rcu_data structure's
      ->all_lazy field to record the laziness state at the beginning of the
      latest idle sojourn, and modifies CONFIG_RCU_FAST_NO_HZ RCU CPU stall
      warnings accordingly.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu() · e5bc3af7
      Paul E. McKenney authored
      Now that rcu_blocking_is_gp() makes the correct immediate-return
      decision for both PREEMPT and !PREEMPT, a single implementation of
      synchronize_rcu() will work correctly under both configurations.
      This commit therefore eliminates a few lines of code by consolidating
      the two implementations of synchronize_rcu().
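
      A simplified sketch of the consolidated function (lockdep assertions and
      kernel-doc omitted):

      	void synchronize_rcu(void)
      	{
      		if (rcu_blocking_is_gp())
      			return;	/* Contexts where blocking already implies a GP. */
      		if (rcu_gp_is_expedited())
      			synchronize_rcu_expedited();
      		else
      			wait_rcu_gp(call_rcu);
      	}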
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Inline rcu_kthread_do_work() into its sole remaining caller · c46f497a
      Paul E. McKenney authored
      The rcu_kthread_do_work() function has a single-line body and only one
      remaining caller.  This commit therefore saves a few lines of code by
      inlining rcu_kthread_do_work() into its sole remaining caller.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Rename and comment changes due to only one rcuo kthread per CPU · ad368d15
      Paul E. McKenney authored
      Given RCU flavor consolidation, the name rcu_spawn_all_nocb_kthreads()
      is quite misleading.  It no longer ever creates more than one kthread,
      and it does so only for the specified CPU.  This commit therefore changes
      this name to the more descriptive rcu_spawn_cpu_nocb_kthread(), and also
      fixes up a similar issue in its header comment while in the area.
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
  8. 01 Dec, 2018 2 commits
  9. 12 Nov, 2018 3 commits
    • rcu: Avoid signed integer overflow in rcu_preempt_deferred_qs() · 5f1a6ef3
      Paul E. McKenney authored
      Subtracting INT_MIN can be interpreted as unconditional signed integer
      overflow, which according to the C standard is undefined behavior.
      Therefore, kernel build arguments notwithstanding, it would be good to
      future-proof the code.  This commit therefore substitutes INT_MAX for
      INT_MIN in order to avoid undefined behavior.
      
      While in the neighborhood, this commit also creates some meaningful names
      for INT_MAX and friends in order to improve readability, as suggested
      by Joel Fernandes.
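
      A minimal illustration of the issue and the fix (macro name illustrative):

      	#define RCU_NEST_BIAS	INT_MAX	/* Previously INT_MIN was used. */

      	/* Temporarily bias the nesting count during deferred-QS processing: */
      	t->rcu_read_lock_nesting -= RCU_NEST_BIAS;	/* Well-defined. */

      	/*
      	 * With the old bias, "nesting - INT_MIN" overflows for any
      	 * non-negative nesting value, which is undefined behavior in C even
      	 * though current compilers happen to generate the intended wraparound.
      	 */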
      Reported-by: Ran Rozenstein <ranro@mellanox.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Replace this_cpu_ptr() with __this_cpu_read() · 117f683c
      Paul E. McKenney authored
      Because __this_cpu_read() can be lighter weight than equivalent uses of
      this_cpu_ptr(), this commit replaces the latter with the former.
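
      The transformation in sketch form (variable and helper names illustrative):

      	DEFINE_PER_CPU(bool, rcu_cpu_has_work);

      	/* Before: materialize a per-CPU pointer, then dereference it. */
      	if (*this_cpu_ptr(&rcu_cpu_has_work))
      		do_rcu_core_work();

      	/* After: read the per-CPU variable directly. */
      	if (__this_cpu_read(rcu_cpu_has_work))
      		do_rcu_core_work();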
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
    • rcu: Speed up expedited GPs when interrupting RCU reader · 05f41571
      Paul E. McKenney authored
      In PREEMPT kernels, an expedited grace period might send an IPI to a
      CPU that is executing an RCU read-side critical section.  In that case,
      it would be nice if the rcu_read_unlock() directly interacted with the
      RCU core code to immediately report the quiescent state.  And this does
      happen in the case where the reader has been preempted.  But it would
      also be a nice performance optimization if immediate reporting also
      happened in the preemption-free case.
      
      This commit therefore adds an ->exp_hint field to the task_struct structure's
      ->rcu_read_unlock_special field.  The IPI handler sets this hint when
      it has interrupted an RCU read-side critical section, and this causes
      the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
      which, if preemption is enabled, reports the quiescent state immediately.
      If preemption is disabled, then the report is required to be deferred
      until preemption (or bottom halves or interrupts or whatever) is re-enabled.
      
      Because this is a hint, it does nothing for more complicated cases.  For
      example, if the IPI interrupts an RCU reader, but interrupts are disabled
      across the rcu_read_unlock(), but another rcu_read_lock() is executed
      before interrupts are re-enabled, the hint will already have been cleared.
      If you do crazy things like this, reporting will be deferred until some
      later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.
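
      For reference, a sketch of the resulting layout (simplified from
      include/linux/sched.h; padding and later additions omitted):

      	union rcu_special {
      		struct {
      			u8 blocked;
      			u8 need_qs;
      			u8 exp_hint;	/* Set by the expedited IPI handler. */
      		} b;	/* Bits. */
      		u32 s;	/* Set of bits. */
      	};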
      Reported-by: Joel Fernandes <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>