1. 04 Jul, 2024 5 commits
    • Merge branches 'doc.2024.06.06a', 'fixes.2024.07.04a', 'mb.2024.06.28a',... · 02219caa
      Paul E. McKenney authored
      Merge branches 'doc.2024.06.06a', 'fixes.2024.07.04a', 'mb.2024.06.28a', 'nocb.2024.06.03a', 'rcu-tasks.2024.06.06a', 'rcutorture.2024.06.06a' and 'srcu.2024.06.18a' into HEAD
      
      doc.2024.06.06a: Documentation updates.
      fixes.2024.07.04a: Miscellaneous fixes.
      mb.2024.06.28a: Grace-period memory-barrier redundancy removal.
      nocb.2024.06.03a: No-CB CPU updates.
      rcu-tasks.2024.06.06a: RCU-Tasks updates.
      rcutorture.2024.06.06a: Torture-test updates.
      srcu.2024.06.18a: SRCU polled-grace-period updates.
    • rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation · 55d4669e
      Frederic Weisbecker authored
      When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU absent
      from rnp->qsmaskinitnext, it means that all accesses from the offline
      CPU preceding CPUHP_TEARDOWN_CPU are visible to rcu_barrier(),
      including callback expiration and counter updates.
      
      However, interrupts can still fire after stop_machine() re-enables
      interrupts and before rcutree_report_cpu_dead(). The related accesses,
      which happen between CPUHP_TEARDOWN_CPU and the clearing of
      rnp->qsmaskinitnext, are _NOT_ guaranteed to be seen by rcu_barrier()
      without proper ordering, especially when the remaining callbacks are
      invoked there to completion, which lets rcutree_migrate_callbacks()
      return without taking the barrier_lock.
      
      The following theoretical race example can make rcu_barrier() hang:
      
      CPU 0                                               CPU 1
      -----                                               -----
      //cpu_down()
      smpboot_park_threads()
      //ksoftirqd is parked now
      <IRQ>
      rcu_sched_clock_irq()
         invoke_rcu_core()
      do_softirq()
         rcu_core()
            rcu_do_batch()
               // callback storm
               // rcu_do_batch() returns
               // before completing all
               // of them
         // do_softirq also returns early because of
         // timeout. It defers to ksoftirqd but
         // it's parked
      </IRQ>
      stop_machine()
         take_cpu_down()
                                                          rcu_barrier()
                                                              spin_lock(barrier_lock)
                                                              // observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
      <IRQ>
      do_softirq()
         rcu_core()
            rcu_do_batch()
               //completes all pending callbacks
               //smp_mb() implied _after_ callback number dec
      </IRQ>
      
      rcutree_report_cpu_dead()
         rnp->qsmaskinitnext &= ~rdp->grpmask;
      
      rcutree_migrate_callback()
         // no callback, early return without locking
         // barrier_lock
                                                              //observes !rcu_rdp_cpu_online(rdp)
                                                              rcu_barrier_entrain()
                                                                 rcu_segcblist_entrain()
                                                                    // Observe rcu_segcblist_n_cbs(rsclp) == 0
                                                                    // because no barrier between reading
                                                                    // rnp->qsmaskinitnext and rsclp->len
                                                                    rcu_segcblist_add_len()
                                                                       smp_mb__before_atomic()
                                                                       // will now observe the 0 count and empty
                                                                       // list, but too late, we enqueue regardless
                                                                       WRITE_ONCE(rsclp->len, rsclp->len + v);
                                                              // ignored barrier callback
                                                              // rcu barrier stall...
      
      This could be solved with a read memory barrier enforcing the message
      passing between rnp->qsmaskinitnext and rsclp->len, pairing with the
      full memory barrier that follows the addition to rsclp->len in
      rcu_segcblist_add_len(), as performed at the end of rcu_do_batch().
      
      However, rcu_barrier() is complicated enough and probably doesn't need
      more subtleties. CPU down is a slowpath and the barrier_lock is seldom
      contended. Solve the issue by unconditionally locking the barrier_lock
      in rcutree_migrate_callbacks(). This makes sure that either
      rcu_barrier() sees the empty queue or its entrained callback will be
      migrated.
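      
      A minimal sketch of the resulting early-exit path in
      rcutree_migrate_callbacks() (close to, but not necessarily identical
      to, the actual patch; the migration logic itself is elided):
      
      	void rcutree_migrate_callbacks(int cpu)
      	{
      		unsigned long flags;
      		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
      
      		/* Taken unconditionally: serializes with rcu_barrier(). */
      		raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
      		if (rcu_segcblist_empty(&rdp->cblist)) {
      			raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
      			return; /* rcu_barrier() sees the empty queue. */
      		}
      		/* ... entrain/migrate remaining callbacks under the lock ... */
      	}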
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Eliminate lockless accesses to rcu_sync->gp_count · 6f4cec22
      Oleg Nesterov authored
      The rcu_sync structure's ->gp_count field is always accessed under the
      protection of that same structure's ->rss_lock field, with the exception
      of a pair of WARN_ON_ONCE() calls just prior to acquiring that lock in
      functions rcu_sync_exit() and rcu_sync_dtor().  These lockless accesses
      are unnecessary and impair KCSAN's ability to catch bugs that might be
      inserted via other lockless accesses.
      
      This commit therefore moves those WARN_ON_ONCE() calls under the lock.
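      
      A minimal sketch of the change in rcu_sync_exit() (rcu_sync_dtor() is
      analogous; surrounding logic elided):
      
      	void rcu_sync_exit(struct rcu_sync *rsp)
      	{
      		spin_lock_irq(&rsp->rss_lock);
      		WARN_ON_ONCE(rsp->gp_count == 0); /* Now checked under ->rss_lock. */
      		if (!--rsp->gp_count) {
      			/* ... kick off grace-period cleanup ... */
      		}
      		spin_unlock_irq(&rsp->rss_lock);
      	}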
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • MAINTAINERS: Add Uladzislau Rezki as RCU maintainer · 7f09e70f
      Paul E. McKenney authored
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Uladzislau Rezki <urezki@gmail.com>
    • rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter · 68d124b0
      Paul E. McKenney authored
      If a CPU is running either a userspace application or a guest OS in
      nohz_full mode, it is possible for a system call to occur just as an
      RCU grace period is starting.  If that CPU also has the scheduling-clock
      tick enabled for any reason (such as a second runnable task), and if the
      system was booted with rcutree.use_softirq=0, then RCU can add insult to
      injury by awakening that CPU's rcuc kthread, resulting in yet another
      task and yet more OS jitter due to switching to that task, running it,
      and switching back.
      
      In addition, in the common case where that system call is not of
      excessively long duration, awakening the rcuc task is pointless.
      This pointlessness is due to the fact that the CPU will enter an extended
      quiescent state upon returning to the userspace application or guest OS.
      In this case, the rcuc kthread cannot do anything that the main RCU
      grace-period kthread cannot do on its behalf, at least if it is given
      a few additional milliseconds (for example, given the time duration
      specified by rcutree.jiffies_till_first_fqs, give or take scheduling
      delays).
      
      This commit therefore adds a rcutree.nohz_full_patience_delay kernel
      boot parameter that specifies the grace period age (in milliseconds,
      rounded to jiffies) before which RCU will refrain from awakening the
      rcuc kthread.  Preliminary experimentation suggests a value of 1000,
      that is, one second.  Increasing rcutree.nohz_full_patience_delay will
      increase grace-period latency and in turn increase memory footprint,
      so systems with constrained memory might choose a smaller value.
      Systems with less-aggressive OS-jitter requirements might choose the
      default value of zero, which keeps the traditional immediate-wakeup
      behavior, thus avoiding increases in grace-period latency.
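      
      For example, a system wanting the experimentally suggested one second
      of patience would boot with:
      
      	rcutree.nohz_full_patience_delay=1000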
      
      [ paulmck: Apply Leonardo Bras feedback.  ]
      
      Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/
      Reported-by: Leonardo Bras <leobras@redhat.com>
      Suggested-by: Leonardo Bras <leobras@redhat.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Leonardo Bras <leobras@redhat.com>
  2. 28 Jun, 2024 6 commits
    • rcu/exp: Remove redundant full memory barrier at the end of GP · 677ab23b
      Frederic Weisbecker authored
      A full memory barrier is necessary at the end of the expedited grace
      period to order:
      
      1) The grace period completion (as reflected by the GP sequence
         number) with all preceding accesses. This pairs with rcu_seq_end()
         performed by the concurrent kworker.
      
      2) The grace period completion with subsequent post-GP update-side
         accesses. This again pairs with rcu_seq_end().
      
      This full barrier is already provided by the final sync_exp_work_done()
      test, making the subsequent explicit one redundant. Remove it and
      improve comments.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu: Remove full memory barrier on RCU stall printout · 55911a9f
      Frederic Weisbecker authored
      RCU stall printout fetches the EQS state of a CPU with a preceding full
      memory barrier. However, there is nothing to order this read against at
      this debugging stage. It is inherently racy when performed remotely.
      
      Do a plain read instead.
      
      This was the last user of rcu_dynticks_snap().
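      
      A minimal sketch of the change, with eqs_counter as an illustrative
      stand-in for the per-CPU state actually read:
      
      	/* Before: full barrier preceding the remote snapshot. */
      	smp_mb();
      	snap = READ_ONCE(per_cpu(eqs_counter, cpu));
      
      	/* After: plain read; raciness is acceptable in a stall printout. */
      	snap = READ_ONCE(per_cpu(eqs_counter, cpu));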
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu: Remove full memory barrier on boot time eqs sanity check · e7a3c8ea
      Frederic Weisbecker authored
      When the boot CPU initializes the per-CPU data on behalf of all possible
      CPUs, a sanity check is performed on each of them to make sure none is
      initialized in an extended quiescent state.
      
      This check involves a full memory barrier, which is useless at this
      early boot stage.
      
      Do a plain access instead.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot · 33c0860b
      Frederic Weisbecker authored
      When the grace period kthread checks the extended quiescent state
      counter of a CPU, full ordering is necessary to ensure that either:
      
      * If the GP kthread observes the remote target in an extended quiescent
        state, then that target must observe all accesses prior to the current
        grace period, including the current grace period sequence number, once
        it exits that extended quiescent state.
      
      or:
      
      * If the GP kthread observes the remote target NOT in an extended
        quiescent state, then the target must, once it subsequently enters
        an extended quiescent state, observe all accesses prior to the
        current grace period, including the current grace period sequence
        number.
      
      This ordering is enforced through a full memory barrier placed right
      before taking the first EQS snapshot. However, this is superfluous
      because the snapshot is taken while holding the target's rnp lock,
      which provides the necessary ordering through its chain of
      smp_mb__after_unlock_lock() calls.
      
      Remove the needless explicit barrier before the snapshot and add a
      comment about the implicit barrier newly relied upon here.
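      
      A minimal sketch of the ordering now relied upon (the rnp locking
      helpers are from the kernel; eqs_counter is an illustrative stand-in):
      
      	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Chains smp_mb__after_unlock_lock(). */
      	/* The explicit smp_mb() formerly here is no longer needed. */
      	snap = smp_load_acquire(&eqs_counter);      /* First EQS snapshot. */
      	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);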
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu: Remove superfluous full memory barrier upon first EQS snapshot · 9a7e73c9
      Frederic Weisbecker authored
      When the grace period kthread checks the extended quiescent state
      counter of a CPU, full ordering is necessary to ensure that either:
      
      * If the GP kthread observes the remote target in an extended quiescent
        state, then that target must observe all accesses prior to the current
        grace period, including the current grace period sequence number, once
        it exits that extended quiescent state.
      
      or:
      
      * If the GP kthread observes the remote target NOT in an extended
        quiescent state, then the target must, once it subsequently enters
        an extended quiescent state, observe all accesses prior to the
        current grace period, including the current grace period sequence
        number.
      
      This ordering is enforced through a full memory barrier placed right
      before taking the first EQS snapshot. However, this is superfluous
      because the snapshot is taken while holding the target's rnp lock,
      which provides the necessary ordering through its chain of
      smp_mb__after_unlock_lock() calls.
      
      Remove the needless explicit barrier before the snapshot and add a
      comment about the implicit barrier newly relied upon here.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
    • rcu: Remove full ordering on second EQS snapshot · 0a5e9bd3
      Frederic Weisbecker authored
      When the grace period kthread checks the extended quiescent state
      counter of a CPU, full ordering is necessary to ensure that either:
      
      * If the GP kthread observes the remote target in an extended quiescent
        state, then that target must observe all accesses prior to the current
        grace period, including the current grace period sequence number, once
        it exits that extended quiescent state. Also, the GP kthread must
        observe all accesses performed by the target prior to the target's
        entry into that EQS.
      
      or:
      
      * If the GP kthread observes the remote target NOT in an extended
        quiescent state, then the target must, once it subsequently enters
        an extended quiescent state, observe all accesses prior to the
        current grace period, including the current grace period sequence
        number. Also, the GP kthread, upon later observing that EQS, must
        also observe all accesses performed by the target prior to the
        target's entry into that EQS.
      
      This ordering is explicitly enforced on both the first EQS snapshot
      and the second one through the combination of a preceding full barrier
      followed by an acquire read. However, the second snapshot's full memory
      barrier is redundant and not needed to enforce the above
      guarantees:
      
          GP kthread                  Remote target
          ----                        -----
          // Access prior GP
          WRITE_ONCE(A, 1)
          // first snapshot
          smp_mb()
          x = smp_load_acquire(EQS)
                                     // Access prior GP
                                     WRITE_ONCE(B, 1)
                                     // EQS enter
                                     // implied full barrier by atomic_add_return()
                                     atomic_add_return(RCU_DYNTICKS_IDX, EQS)
                                     // implied full barrier by atomic_add_return()
                                     READ_ONCE(A)
          // second snapshot
          y = smp_load_acquire(EQS)
          z = READ_ONCE(B)
      
      If the GP kthread above fails to observe the remote target in EQS
      (x not in EQS), the remote target will observe A == 1 after it
      subsequently enters EQS. The second snapshot taken by the GP kthread
      then only needs to be an acquire read in order to observe z == 1.
      
      Therefore, remove the needless full memory barrier on the second
      snapshot.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
      Reviewed-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
  3. 18 Jun, 2024 6 commits
  4. 06 Jun, 2024 6 commits
    • rcu/tasks: Fix stale task snapshot for Tasks Trace · 399ced95
      Frederic Weisbecker authored
      When RCU-TASKS-TRACE pre-gp takes a snapshot of the task currently
      running on each online CPU, no explicit ordering properly synchronizes
      with a context switch.  This lack of ordering can permit the new task
      to miss pre-grace-period update-side accesses.  The following diagram,
      courtesy of Paul, shows the possible bad scenario:
      
              CPU 0                                           CPU 1
              -----                                           -----
      
              // Pre-GP update side access
              WRITE_ONCE(*X, 1);
              smp_mb();
              r0 = rq->curr;
                                                              RCU_INIT_POINTER(rq->curr, TASK_B)
                                                              spin_unlock(rq)
                                                              rcu_read_lock_trace()
                                                              r1 = X;
              /* ignore TASK_B */
      
      Either r0==TASK_B or r1==1 is needed but neither is guaranteed.
      
      One possible solution is to wait for an RCU grace period at the
      beginning of the RCU-tasks-trace grace period before taking the
      current-task snapshot. However, this would introduce large additional
      latencies to RCU-tasks-trace grace periods.
      
      Another solution is to lock the target runqueue while taking the
      current-task snapshot. This ensures that the update side sees the
      latest context switch and that subsequent context switches will see the
      pre-grace-period update-side accesses.
      
      This commit therefore adds runqueue locking to cpu_curr_snapshot().
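      
      A minimal sketch of the locked snapshot (a sketch under assumed
      details, not necessarily the exact patch):
      
      	struct task_struct *cpu_curr_snapshot(int cpu)
      	{
      		struct rq *rq = cpu_rq(cpu);
      		struct task_struct *t;
      		struct rq_flags rf;
      
      		rq_lock_irqsave(rq, &rf);      /* Serialize with context switches. */
      		smp_mb__after_spinlock();      /* Order against the updater's accesses. */
      		t = rcu_dereference(rq->curr); /* Snapshot the latest current task. */
      		rq_unlock_irqrestore(rq, &rf);
      		return t;
      	}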
      
      Fixes: e386b672 ("rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUs")
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • tools/rcu: Add rcu-updaters.sh script · 0ac55d09
      Paul E. McKenney authored
      This commit adds a tools/rcu/rcu-updaters.sh script that uses bpftrace
      to print a histogram of the RCU update-side primitives invoked during
      the specified time interval, or until manually terminated if no interval
      is specified.
      
      Sample output on an idle laptop:
      
      @counts[poll_state_synchronize_rcu]: 6
      @counts[synchronize_srcu]: 13
      @counts[call_rcu_tasks_trace]: 25
      @counts[synchronize_rcu]: 54
      @counts[kvfree_call_rcu]: 428
      @counts[call_rcu]: 2134
      
      Note that when run on a kernel missing one or more of the symbols, this
      script will issue a diagnostic for each that is not found, but continue
      normally for the rest of the functions.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Add missing MODULE_DESCRIPTION() macros · b9f147cd
      Jeff Johnson authored
      Fix the following 'make W=1' warnings:
      
      WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/rcu/rcutorture.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/rcu/rcuscale.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/rcu/refscale.o
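      
      The fix is a one-line macro per module; for example (description text
      assumed):
      
      	MODULE_DESCRIPTION("Read-Copy Update module-based torture test facility");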
      Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcutorture: Fix rcu_torture_fwd_cb_cr() data race · 6040072f
      Paul E. McKenney authored
      On powerpc systems, spinlock acquisition does not order prior stores
      against later loads.  This means that this statement:
      
      	rfcp->rfc_next = NULL;
      
      can be reordered to follow this statement:
      
      	WRITE_ONCE(*rfcpp, rfcp);
      
      which then forms a data race with rcu_torture_fwd_prog_cr(),
      specifically, with this statement:
      
      	rfcpn = READ_ONCE(rfcp->rfc_next)
      
      KCSAN located this data race, which represents a real failure on powerpc.
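      
      A minimal sketch of one likely shape of such a fix (an assumption here,
      not necessarily the exact patch): mark the racing store so that the
      compiler cannot tear or reorder it, and so KCSAN knows it is intended:
      
      	WRITE_ONCE(rfcp->rfc_next, NULL); /* Pairs with the reader's READ_ONCE(). */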
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Acked-by: Marco Elver <elver@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: <kasan-dev@googlegroups.com>
    • doc: Clarify rcu_assign_pointer() and rcu_dereference() ordering · a3fbf860
      Paul E. McKenney authored
      This commit expands on the ordering properties of rcu_assign_pointer()
      and rcu_dereference(), outlining their constraints on CPUs and compilers.
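      
      The canonical publish/subscribe pattern being documented (struct foo,
      gp, p, and q are illustrative):
      
      	struct foo { int a; };
      	struct foo __rcu *gp;        /* Illustrative RCU-protected pointer. */
      	struct foo *p, *q;
      
      	/* Updater: initialize, then publish. */
      	p = kmalloc(sizeof(*p), GFP_KERNEL);
      	p->a = 1;
      	rcu_assign_pointer(gp, p);   /* Orders initialization before publication. */
      
      	/* Reader: subscribe under RCU protection. */
      	rcu_read_lock();
      	q = rcu_dereference(gp);     /* Orders pointer fetch before uses of *q. */
      	if (q)
      		pr_info("a = %d\n", q->a);
      	rcu_read_unlock();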
      Reported-by: Rao Shoaib <rao.shoaib@oracle.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • doc: Update Tasks RCU and Tasks Rude RCU description in Requirements.rst · 293d9013
      Paul E. McKenney authored
      This commit adds more detail to the Tasks RCU and Tasks Rude RCU
      descriptions in Requirements.rst.  While in the area, add Tasks Trace
      RCU to the Tasks-RCU table of contents.
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
  5. 04 Jun, 2024 7 commits
    • rcutorture: Make rcutorture support srcu double call test · 43b39caf
      Zqiang authored
      This commit allows rcutorture to test double-call_srcu() when the
      CONFIG_DEBUG_OBJECTS_RCU_HEAD Kconfig option is enabled.  The non-raw
      sdp structure's ->spinlock will be acquired in call_srcu(), hence this
      commit also removes the current IRQ and preemption disabling so as to
      avoid lockdep complaints.
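      
      A minimal sketch of the double-call pattern now exercised
      (test_srcu_struct and test_cb are illustrative):
      
      	static struct rcu_head rh;
      
      	call_srcu(&test_srcu_struct, &rh, test_cb);
      	call_srcu(&test_srcu_struct, &rh, test_cb); /* Debug-objects should flag this. */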
      
      Link: https://lore.kernel.org/all/20240407112714.24460-1-qiang.zhang1211@gmail.com/
      Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • Revert "rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes()" · 9855c37e
      Frederic Weisbecker authored
      This reverts commit 28319d6d. The race
      it fixed was subject to conditions that don't exist anymore since:
      
      	1612160b ("rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks")
      
      This latter commit removes the use of SRCU that used to cover the
      RCU-tasks blind spot on exit, between the task's removal from the
      tasklist and the final preemption disabling. The task is now instead
      placed on a temporary list, inside which voluntary sleeps are accounted
      as RCU-tasks quiescent states. This disarms the deadlock initially
      reported against PID namespace exit.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Remove buggy bypass lock contention mitigation · e4f78057
      Frederic Weisbecker authored
      The bypass lock contention mitigation assumes there can be at most
      2 contenders on the bypass lock, following this scheme:
      
      1) One kthread takes the bypass lock
      2) Another one spins on it and increments the contended counter
      3) A third one (a bypass enqueuer) sees the contended counter set and
         busy-loops waiting for it to be decremented.
      
      However, this assumption is wrong. Only one CPU can find the lock
      contended, because call_rcu() (the bypass enqueuer) is the only bypass
      lock acquisition site that may not already hold the NOCB lock
      beforehand; all the other sites must first contend on the NOCB lock.
      Therefore step 2) is impossible.
      
      The other problem is that the mitigation assumes that contenders all
      belong to the same rdp CPU, which is also impossible for a raw
      spinlock. In theory the warning could trigger if the enqueuer holds the
      bypass lock and another CPU flushes the bypass queue concurrently, but
      this is prevented by all flush users:
      
      1) NOCB kthreads only flush if they successfully _tried_ to lock the
         bypass lock. So no contention management here.
      
      2) Flush on callback migration happens remotely when the CPU is
         offline. No concurrency against bypass enqueue.
      
      3) Flush on deoffloading happens either locally with IRQs disabled or
         remotely when the CPU is not yet online. No concurrency against
         bypass enqueue.
      
      4) Flush on barrier entrain happens either locally with IRQs disabled
         or remotely when the CPU is offline. No concurrency against
         bypass enqueue.
      
      For those reasons, the bypass lock contention mitigation isn't needed
      and is even wrong. Remove it, but keep the warning reporting a
      contended bypass lock on a remote CPU, to retain awareness of
      unexpected contention.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Use kthread parking instead of ad-hoc implementation · 483d5bf2
      Frederic Weisbecker authored
      Upon NOCB deoffloading, the rcuo kthread must be forced to sleep
      until the corresponding rdp is offloaded again, if ever. The
      deoffloader clears the SEGCBLIST_OFFLOADED flag and wakes up the rcuo
      kthread, which then notices that change and clears in turn its
      SEGCBLIST_KTHREAD_CB flag before going to sleep, until it sees the
      SEGCBLIST_OFFLOADED flag again, should a re-offloading happen.
      
      Upon NOCB offloading, the rcuo kthread must be forced to wake up and
      handle callbacks until the corresponding rdp is deoffloaded again, if
      ever. The offloader sets the SEGCBLIST_OFFLOADED flag and wakes up the
      rcuo kthread, which then notices that change and sets in turn its
      SEGCBLIST_KTHREAD_CB flag before going to check callbacks, until it
      sees the SEGCBLIST_OFFLOADED flag cleared again, should a
      de-offloading happen again.
      
      This is all a crude ad-hoc and error-prone kthread (un-)parking
      re-implementation.
      
      Consolidate the behaviour with the appropriate API instead.
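      
      A minimal sketch of the parking-based approach (the rdp field name is
      from the kernel; the exact call sites are assumed):
      
      	kthread_park(rdp->nocb_cb_kthread);   /* De-offload: kthread sleeps once parked. */
      	/* ... rdp runs de-offloaded; callbacks handled by RCU core ... */
      	kthread_unpark(rdp->nocb_cb_kthread); /* Re-offload: resume callback handling. */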
      
      [ paulmck: Apply Qiang Zhang feedback provided in Link: below. ]
      Link: https://lore.kernel.org/all/20240509074046.15629-1-qiang.zhang1211@gmail.com/
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Fix segcblist state machine stale comments about timers · aa97b9a5
      Frederic Weisbecker authored
      The (de-)offloading process used to take care of the NOCB timer back
      when the timer depended on the per-rdp NOCB locking. However, this
      hasn't been the case for a long while. The timer can now safely be
      armed and run during the (de-)offloading process, which no longer
      cares about it.
      
      Update the comments accordingly.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu/nocb: Fix segcblist state machine comments about bypass · ce418966
      Frederic Weisbecker authored
      The parts explaining the bypass lifecycle in (de-)offloading are out
      of date and/or wrong. Bypass is simply enabled whenever the
      SEGCBLIST_RCU_CORE flag is off.
      
      Fix the comments accordingly.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    • rcu: Add lockdep_assert_in_rcu_read_lock() and friends · 32d99593
      Paul E. McKenney authored
      There is no direct RCU counterpart to lockdep_assert_irqs_disabled()
      and friends.  Although it is possible to construct them, it would
      be more convenient to have the following lockdep assertions:
      
      lockdep_assert_in_rcu_read_lock()
      lockdep_assert_in_rcu_read_lock_bh()
      lockdep_assert_in_rcu_read_lock_sched()
      lockdep_assert_in_rcu_reader()
      
      This commit therefore creates them.
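      
      Example use of one of the new assertions (foo_lookup() and foo_table
      are hypothetical):
      
      	static struct foo *foo_lookup(int key)
      	{
      		lockdep_assert_in_rcu_read_lock(); /* Splat unless within rcu_read_lock(). */
      		return rcu_dereference(foo_table[key]);
      	}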
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  6. 26 May, 2024 5 commits
  7. 25 May, 2024 5 commits
    • Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of... · 9b62e02e
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "16 hotfixes, 11 of which are cc:stable.
      
        A few nilfs2 fixes, the remainder are for MM: a couple of selftests
        fixes, various singletons fixing various issues in various parts"
      
      * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mm/ksm: fix possible UAF of stable_node
        mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
        mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
        nilfs2: fix potential hang in nilfs_detach_log_writer()
        nilfs2: fix unexpected freezing of nilfs_segctor_sync()
        nilfs2: fix use-after-free of timer for log writer thread
        selftests/mm: fix build warnings on ppc64
        arm64: patching: fix handling of execmem addresses
        selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
        selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
        selftests/mm: compaction_test: fix bogus test success on Aarch64
        mailmap: update email address for Satya Priya
        mm/huge_memory: don't unpoison huge_zero_folio
        kasan, fortify: properly rename memintrinsics
        lib: add version into /proc/allocinfo output
        mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
    • Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a0db36ed
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
      
       - Fix x86 IRQ vector leak caused by a CPU offlining race
      
       - Fix build failure in the riscv-imsic irqchip driver
         caused by an API-change semantic conflict
      
       - Fix use-after-free in irq_find_at_or_after()
      
      * tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
        genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
        irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
    • Merge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3a390f24
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
      
       - Fix regressions of the new x86 CPU VFM (vendor/family/model)
         enumeration/matching code
      
       - Fix crash kernel detection on buggy firmware with
         non-compliant ACPI MADT tables
      
       - Address Kconfig warning
      
      * tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
        crypto: x86/aes-xts - switch to new Intel CPU model defines
        x86/topology: Handle bogus ACPI tables correctly
        x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
    • Merge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi · 56676c4c
      Linus Torvalds authored
      Pull ipmi updates from Corey Minyard:
       "Mostly updates for deprecated interfaces, platform.remove and
        converting from a tasklet to a BH workqueue.
      
        Also use HAS_IOPORT for disabling inb()/outb()"
      
      * tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi:
        ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void
        ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void
        ipmi: ipmi_ssif: Convert to platform remove callback returning void
        ipmi: ipmi_si_platform: Convert to platform remove callback returning void
        ipmi: ipmi_powernv: Convert to platform remove callback returning void
        ipmi: bt-bmc: Convert to platform remove callback returning void
        char: ipmi: handle HAS_IOPORT dependencies
        ipmi: Convert from tasklet to BH workqueue
    • Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client · 74eca356
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "A series from Xiubo that adds support for additional access checks
        based on MDS auth caps which were recently made available to clients.
      
        This is needed to prevent scenarios where the MDS quietly discards
        updates that a UID-restricted client previously (wrongfully) acked to
        the user.
      
        Other than that, just a documentation fixup"
      
      * tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
        doc: ceph: update userspace command to get CephFS metadata
        ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
        ceph: check the cephx mds auth access for async dirop
        ceph: check the cephx mds auth access for open
        ceph: check the cephx mds auth access for setattr
        ceph: add ceph_mds_check_access() helper
        ceph: save cap_auths in MDS client when session is opened