1. 18 Apr, 2017 35 commits
    • srcu: Create a tiny SRCU · d8be8173
      Paul E. McKenney authored
      In response to automated complaints about modifications to SRCU
      increasing its size, this commit creates a tiny SRCU that is
      used in SMP=n && PREEMPT=n builds.
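      To illustrate why such a build can get by with very little state,
      here is a single-threaded userspace model in which a reader bumps one
      of two counters and the updater flips which counter new readers use,
      then waits for the old one to drain. This is a sketch under those
      assumptions, not the kernel's Tiny SRCU code; toy_srcu and its
      functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: two reader counters selected by an index. */
struct toy_srcu {
	int lock_nesting[2];	/* readers currently using each index */
	int idx;		/* index that new readers will use */
};

#define TOY_SRCU_INIT { { 0, 0 }, 0 }

static int toy_srcu_read_lock(struct toy_srcu *sp)
{
	int idx = sp->idx;

	sp->lock_nesting[idx]++;
	return idx;		/* reader hands this back to unlock */
}

static void toy_srcu_read_unlock(struct toy_srcu *sp, int idx)
{
	assert(sp->lock_nesting[idx] > 0);
	sp->lock_nesting[idx]--;
}

/* Updater: steer new readers to the other counter, return the old index. */
static int toy_srcu_flip(struct toy_srcu *sp)
{
	int old = sp->idx;

	sp->idx = !old;
	return old;
}

/* The grace period for 'idx' can end once its readers have drained. */
static bool toy_srcu_drained(const struct toy_srcu *sp, int idx)
{
	return sp->lock_nesting[idx] == 0;
}
```

      A real grace-period routine would block until toy_srcu_drained()
      returns true for the old index instead of checking it once.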
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • mm: Use static initialization for "srcu" · dde8da6c
      Paul E. McKenney authored
      The MM-notifier code currently dynamically initializes the srcu_struct
      named "srcu" at subsys_initcall() time, and includes a BUG_ON() to check
      this initialization in do_mmu_notifier_register().  Unfortunately, there
      is no foolproof way to verify that an srcu_struct has been initialized,
      given the possibility of an srcu_struct being allocated on the stack or
      on the heap.  This means that creating an srcu_struct_is_initialized()
      function is not a reasonable course of action.  Nor is peppering
      do_mmu_notifier_register() with SRCU-specific #ifdefs an attractive
      alternative.
      
      This commit therefore uses DEFINE_STATIC_SRCU() to initialize
      this srcu_struct at compile time, thus eliminating both the
      subsys_initcall()-time initialization and the runtime BUG_ON().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <linux-mm@kvack.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
    • srcu: Crude control of expedited grace periods · f60d231a
      Paul E. McKenney authored
      SRCU's implementation of expedited grace periods has always assumed
      that the SRCU instance is idle when the expedited request arrives.
      This commit improves this a bit by maintaining a count of the number
      of outstanding expedited requests, thus allowing prior non-expedited
      grace periods to accommodate these requests by shifting to expedited mode.
      However, any non-expedited wait already in progress will still wait for
      the full duration.
      
      Improved control of expedited grace periods is planned, but one step
      at a time.
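      A minimal sketch of the counting idea, assuming a hypothetical
      toy_srcu_gp structure and a polling delay between reader scans:
      expedited callers bracket their request with a counter, and the
      grace-period worker picks a zero delay while that counter is nonzero.
      A delay that was already chosen is not revisited, which mirrors the
      caveat above about waits already in progress.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NORMAL_DELAY 1	/* hypothetical delay between reader scans */

struct toy_srcu_gp {
	atomic_int exp_cnt;	/* number of outstanding expedited requests */
};

static void toy_exp_begin(struct toy_srcu_gp *gp)
{
	atomic_fetch_add(&gp->exp_cnt, 1);
}

static void toy_exp_end(struct toy_srcu_gp *gp)
{
	int old = atomic_fetch_sub(&gp->exp_cnt, 1);

	assert(old > 0);	/* catches unbalanced begin/end pairs */
}

/*
 * Called by the grace-period worker before each wait: while any
 * expedited request is outstanding, skip the delay entirely.
 */
static int toy_next_delay(struct toy_srcu_gp *gp)
{
	return atomic_load(&gp->exp_cnt) ? 0 : TOY_NORMAL_DELAY;
}
```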
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Merge ->srcu_state into ->srcu_gp_seq · 80a7956f
      Paul E. McKenney authored
      Updating ->srcu_state and ->srcu_gp_seq separately would lead to
      extremely complex race conditions given multiple callback queues, so this commit takes
      advantage of the two-bit state now available in rcu_seq counters to
      store the state in the bottom two bits of ->srcu_gp_seq.
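      Packing the state into the low-order bits means a single store changes
      the counter and the state together, so no observer can pair a new
      counter with a stale state. A sketch assuming a hypothetical
      three-state encoding (idle, scan 1, scan 2) and hypothetical toy_
      names:

```c
#include <assert.h>

#define TOY_SEQ_CTR_SHIFT	2
#define TOY_SEQ_STATE_MASK	((1UL << TOY_SEQ_CTR_SHIFT) - 1)

/* Hypothetical three-state grace period kept in the low two bits. */
enum { TOY_STATE_IDLE, TOY_STATE_SCAN1, TOY_STATE_SCAN2 };

static unsigned long toy_seq_state(unsigned long s)
{
	return s & TOY_SEQ_STATE_MASK;
}

static unsigned long toy_seq_ctr(unsigned long s)
{
	return s >> TOY_SEQ_CTR_SHIFT;
}

/* One store updates the state while leaving the counter bits intact. */
static void toy_seq_set_state(unsigned long *sp, unsigned long newstate)
{
	assert(newstate <= TOY_SEQ_STATE_MASK);
	*sp = (*sp & ~TOY_SEQ_STATE_MASK) | newstate;
}
```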
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Allow a second bit in rcu_seq for SRCU state · f1ec57a4
      Paul E. McKenney authored
      This commit increases the number of reserved bits at the bottom of an
      rcu_seq grace-period counter from one to two, as will be needed to
      accommodate SRCU's three-state grace periods.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Improve rcu_seq grace-period-counter abstraction · 031aeee0
      Paul E. McKenney authored
      The expedited grace-period code contains several open-coded shifts that
      know the format of an rcu_seq grace-period counter, which is not
      particularly good style.  This commit therefore creates a new
      rcu_seq_ctr() function that extracts the counter portion of the
      counter, and an rcu_seq_state() function that extracts the low-order
      state bit.  This commit prepares for SRCU callback parallelization,
      which will require two state bits.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Make num_rcu_lvl[] array be external · e95d68d2
      Paul E. McKenney authored
      This commit makes the num_rcu_lvl[] array external so that SRCU can
      make use of it for initializing its upcoming srcu_node tree.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move rcu_node traversal macros to rcu.h · efbe451d
      Paul E. McKenney authored
      This commit moves rcu_for_each_node_breadth_first(),
      rcu_for_each_nonleaf_node_breadth_first(), and
      rcu_for_each_leaf_node() from kernel/rcu/tree.h to
      kernel/rcu/rcu.h so that SRCU can access them.
      This commit is code-movement only.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove redundant levelcnt[] array from rcu_init_one() · 41f5c631
      Paul E. McKenney authored
      The levelcnt[] array is identical to num_rcu_lvl[], so this commit
      removes levelcnt[].
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move rcu_init_levelspread() to rcu_tree_node.h · 2b34c43c
      Paul E. McKenney authored
      This commit moves the rcu_init_levelspread() function from
      kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it.  This is
      another step towards enabling SRCU to create its own combining tree.
      This commit is code-movement only, give or take knock-on adjustments.
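      For intuition about what a level spread is, the sketch below computes,
      for a given CPU count and fanouts, how many nodes each level of a
      combining tree needs (root first), then how many children each node
      at that level should take so the load is spread evenly. It is modeled
      on the description here rather than copied from kernel/rcu, and
      toy_tree_shape() is a hypothetical name.

```c
#include <assert.h>

#define TOY_MAX_LVLS 4

/*
 * Hypothetical sketch: fill levelcnt[] (nodes per level, root first) and
 * levelspread[] (children per node at each level) for a tree over ncpus.
 * Returns the number of levels, or -1 if TOY_MAX_LVLS is too small.
 */
static int toy_tree_shape(int ncpus, int fanout_leaf, int fanout,
			  int *levelcnt, int *levelspread)
{
	int cnt[TOY_MAX_LVLS];
	int nlvls = 0;
	int n = (ncpus + fanout_leaf - 1) / fanout_leaf;	/* leaf nodes */

	/* Count nodes level by level, leaf first, until one root remains. */
	for (;;) {
		if (nlvls == TOY_MAX_LVLS)
			return -1;
		cnt[nlvls++] = n;
		if (n == 1)
			break;
		n = (n + fanout - 1) / fanout;
	}
	for (int i = 0; i < nlvls; i++)
		levelcnt[i] = cnt[nlvls - 1 - i];	/* store root first */

	/* Spread each level's children as evenly as possible. */
	int cprv = ncpus;

	for (int i = nlvls - 1; i >= 0; i--) {
		int ccur = levelcnt[i];

		levelspread[i] = (cprv + ccur - 1) / ccur;
		cprv = ccur;
	}
	return nlvls;
}
```

      With 100 CPUs, a leaf fanout of 16, and an interior fanout of 64, this
      yields one root over seven leaf nodes: the root takes 7 children and
      each leaf at most 15 CPUs.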
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move combining-tree definitions for SRCU's benefit · f2425b4e
      Paul E. McKenney authored
      This commit moves the C preprocessor code that defines the default shape
      of the rcu_node combining tree to a new include/linux/rcu_node_tree.h
      file as a first step towards enabling SRCU to create its own combining
      tree, which in turn enables SRCU to implement per-CPU callback handling,
      thus avoiding contention on the lock currently guarding the single list
      of callbacks.  Note that users of SRCU still need to know the size of
      the srcu_struct structure, hence include/linux rather than kernel/rcu.
      
      This commit is code-movement only.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Use rcu_segcblist to track SRCU callbacks · 8660b7d8
      Paul E. McKenney authored
      This commit switches SRCU from custom-built callback queues to the new
      rcu_segcblist structure.  This change associates grace-period sequence
      numbers with groups of callbacks, which will be needed for efficient
      processing of per-CPU callbacks.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Add grace-period sequence numbers · ac367c1c
      Paul E. McKenney authored
      This commit adds grace-period sequence numbers, which will be used to
      handle mid-boot grace periods and per-CPU callback lists.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move to state-based grace-period sequencing · c2a8ec07
      Paul E. McKenney authored
      The current SRCU grace-period processing might never reach the last
      portion of srcu_advance_batches().  This is OK given the current
      implementation, as the first portion, up to the try_check_zero()
      following the srcu_flip(), is sufficient to drive grace periods forward.
      However, it has the unfortunate side-effect of making it impossible to
      determine when a given grace period has ended, and it will be necessary
      to track the ends of grace periods in order to efficiently handle
      per-CPU SRCU callback lists.
      
      This commit therefore adds states to the SRCU grace-period processing,
      so that the end of a given SRCU grace period is marked by the transition
      to the SRCU_STATE_DONE state.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Push srcu_advance_batches() fastpath into common case · c6e56f59
      Paul E. McKenney authored
      This commit simplifies the SRCU state machine by pushing the
      srcu_advance_batches() idle-SRCU fastpath into the common case.  This is
      done by giving srcu_reschedule() a delay parameter, which is zero in
      the call from srcu_advance_batches().
      
      This commit is a step towards numbering callbacks in order to
      efficiently handle per-CPU callback lists.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix warning in rcu_seq_end() · f010ed82
      Dmitry Vyukov authored
      The rcu_seq_end() function increments seq to signify completion of a
      grace period, then checks that seq is even and wakes up
      _synchronize_rcu_expedited().  The _synchronize_rcu_expedited() function
      uses wait_event() to wait for seq to become even.  The problem is that
      wait_event() can return as soon as seq becomes even, without waiting for
      the wakeup.  In such a case, the warning in rcu_seq_end() can falsely
      fire if the next expedited grace period starts before the check.

      Check that seq has a good value before incrementing it.
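      The race can be replayed deterministically in a single thread. In this
      hypothetical model the low bit of seq means "grace period in
      progress". The first function checks after incrementing, with a
      callback standing in for the next grace period starting in between;
      the second checks before incrementing, as this fix does. Both return
      true when the check passes (no warning).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter: an odd value means a grace period is in progress. */
static void toy_seq_start(unsigned long *sp)
{
	(*sp)++;
}

/*
 * Old order: increment, then check for "even". 'interleave' simulates the
 * waiter returning early and the next grace period starting in between.
 */
static bool toy_seq_end_check_after(unsigned long *sp,
				    void (*interleave)(unsigned long *))
{
	(*sp)++;
	if (interleave)
		interleave(sp);
	return (*sp & 1) == 0;
}

/* New order: check for "odd" (grace period in progress), then increment. */
static bool toy_seq_end_check_before(unsigned long *sp)
{
	bool ok = (*sp & 1) == 1;

	(*sp)++;
	return ok;
}
```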
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: syzkaller@googlegroups.com
      Cc: linux-kernel@vger.kernel.org
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: josh@joshtriplett.org
      Cc: jiangshanlai@gmail.com
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      
      ---
      
      syzkaller-triggered warning:
      
      WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
      rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
      CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: events wait_rcu_exp_gp
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       panic+0x1fb/0x412 kernel/panic.c:179
       __warn+0x1c4/0x1e0 kernel/panic.c:540
       warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
       rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
       rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
       rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
       rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
       wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
       process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
       worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
       kthread+0x326/0x3f0 kernel/kthread.c:227
       ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
      ---
    • rcu: Expedited wakeups need to be fully ordered · 3c345825
      Paul E. McKenney authored
      Expedited grace periods use workqueue handlers that wake up the requesters,
      but there is no lock mediating this wakeup.  Therefore, memory barriers
      are required to ensure that the handler's memory references are seen by
      all to occur before synchronize_*_expedited() returns to its caller.
      Possibly detected by syzkaller.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Move rcu_seq_start() and friends to rcu.h · 2e8c28c2
      Paul E. McKenney authored
      This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(),
      and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h.
      This will allow SRCU to use these functions, which in turn will
      allow SRCU to move from a single global callback queue to a
      per-CPU callback queue.
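      A single-threaded sketch of what these four functions do, assuming
      one low-order state bit and hypothetical toy_ names; the kernel
      versions also contain memory barriers, which are omitted here. As
      modeled, the snapshot is the counter value marking the end of the
      first grace period to start after the call, so a caller can later ask
      whether a full grace period has elapsed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" for free-running unsigned counters. */
#define TOY_CMP_GE(a, b) (ULONG_MAX / 2 >= (unsigned long)((a) - (b)))

/* Hypothetical model; the low bit set means a grace period is in progress. */
static void toy_seq_start(unsigned long *sp)
{
	(*sp)++;
	assert(*sp & 1);
}

static void toy_seq_end(unsigned long *sp)
{
	assert(*sp & 1);
	(*sp)++;
}

/* Value the counter must reach for a full grace period to have elapsed. */
static unsigned long toy_seq_snap(const unsigned long *sp)
{
	return (*sp + 3) & ~1UL;
}

static bool toy_seq_done(const unsigned long *sp, unsigned long s)
{
	return TOY_CMP_GE(*sp, s);
}
```

      Taking a snapshot while idle needs one grace period; taking it in the
      middle of a grace period also needs the next one, because the current
      one may have started before the caller's updates.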
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add single-element dequeue functions to rcu_segcblist · bdcabf4c
      Paul E. McKenney authored
      This commit adds single-element dequeue functions to rcu_segcblist.
      These are less efficient than using the extract and insert functions,
      but allow more precise debugging code.  These functions are thus
      expected to be used only in debug builds, for example, CONFIG_PROVE_RCU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Allow early boot use of synchronize_srcu() · b5eaeaa5
      Paul E. McKenney authored
      This commit checks for the pre-scheduler state; if the system is that
      early in the boot process, synchronize_srcu() and friends are no-ops.
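      The early-boot check can be sketched as follows, assuming a
      three-valued scheduler-state variable in the style of
      rcu_scheduler_active; the toy_ names are hypothetical. The reasoning
      assumed here is that before the scheduler starts only the boot task
      runs, so no other reader can be in flight and there is nothing to
      wait for.

```c
#include <assert.h>

/* Hypothetical mirror of a three-valued scheduler state. */
enum { TOY_SCHED_INACTIVE, TOY_SCHED_INIT, TOY_SCHED_RUNNING };

static int toy_scheduler_active = TOY_SCHED_INACTIVE;
static int toy_gp_count;	/* grace periods actually waited for */

static void toy_synchronize_srcu(void)
{
	/* Pre-scheduler: a single task, so treat the call as a no-op. */
	if (toy_scheduler_active == TOY_SCHED_INACTIVE)
		return;
	toy_gp_count++;		/* stand-in for waiting on a real grace period */
}
```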
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Allow SRCU to access rcu_scheduler_active · 900b1028
      Paul E. McKenney authored
      This is primarily a code-movement commit in preparation for allowing
      SRCU to handle early-boot SRCU grace periods.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Abstract multi-tail callback list handling · 15fecf89
      Paul E. McKenney authored
      RCU has only one multi-tail callback list, which is implemented via
      the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the
      rcu_data structure, and whose operations are open-coded throughout the
      Tree RCU implementation.  This has been more or less OK in the past,
      but upcoming callback-list optimizations in SRCU could really use
      a multi-tail callback list there as well.
      
      This commit therefore abstracts the multi-tail callback list handling
      into a new kernel/rcu/rcu_segcblist.h file, and uses this new API.
      The simple head-and-tail pointer callback list is also abstracted and
      applied everywhere except for the NOCB callback-offload lists.  (Yes,
      the plan is to apply them there as well, but this commit is already
      bigger than would be good.)
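      A simplified sketch of a multi-tail (segmented) callback list, assuming
      four segments (done, waiting, next-ready, next) and hypothetical
      names: one singly linked list plus one tail pointer per segment.
      Enqueueing appends to the last segment, and when a grace period ends
      every segment is promoted by copying tail pointers, without touching
      any callback. The real rcu_segcblist also records per-segment
      grace-period numbers and callback counts, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified segmented callback list. */
enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, NSEGS };

struct cb {
	struct cb *next;
	int id;
};

struct segcblist {
	struct cb *head;
	struct cb **tails[NSEGS];	/* &next of last cb in each segment */
};

static void segcblist_init(struct segcblist *l)
{
	l->head = NULL;
	for (int i = 0; i < NSEGS; i++)
		l->tails[i] = &l->head;	/* all segments empty */
}

/* New callbacks always join the last (NEXT) segment. */
static void segcblist_enqueue(struct segcblist *l, struct cb *c)
{
	c->next = NULL;
	*l->tails[SEG_NEXT] = c;
	l->tails[SEG_NEXT] = &c->next;
}

/* A grace period ended: each segment moves one step closer to DONE. */
static void segcblist_advance(struct segcblist *l)
{
	for (int i = SEG_DONE; i < SEG_NEXT; i++)
		l->tails[i] = l->tails[i + 1];
}

/* Detach the DONE segment and return it as a NULL-terminated list. */
static struct cb *segcblist_extract_done(struct segcblist *l)
{
	struct cb *done = l->head;

	if (l->tails[SEG_DONE] == &l->head)
		return NULL;			/* nothing ready to invoke */
	l->head = *l->tails[SEG_DONE];		/* first not-yet-done cb */
	*l->tails[SEG_DONE] = NULL;		/* terminate detached list */
	/* Segments that ended where DONE ended are now empty. */
	for (int i = NSEGS - 1; i >= SEG_DONE; i--)
		if (l->tails[i] == l->tails[SEG_DONE])
			l->tails[i] = &l->head;
	return done;
}
```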
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Default RCU_FANOUT_LEAF to 16 unless explicitly changed · b8c78d3a
      Paul E. McKenney authored
      If the RCU_EXPERT Kconfig option is not set (the default), then the
      RCU_FANOUT_LEAF Kconfig option will not be defined, which will cause
      the leaf-level rcu_node tree fanout to default to 32 on 32-bit systems
      and 64 on 64-bit systems.  This can result in excessive lock contention.
      This commit therefore changes the computation of the leaf-level rcu_node
      tree fanout so that the result will be 16 unless an explicit Kconfig or
      kernel-boot setting says otherwise.
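      The selection logic can be sketched with the preprocessor. This
      assumes the Kconfig value arrives as CONFIG_RCU_FANOUT_LEAF and models
      the boot-time override as a plain parameter; TOY_FANOUT_LEAF and
      toy_leaf_fanout() are hypothetical names.

```c
#include <assert.h>

/* Without an explicit Kconfig value, use 16 regardless of word size. */
#ifdef CONFIG_RCU_FANOUT_LEAF
#define TOY_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF
#else
#define TOY_FANOUT_LEAF 16
#endif

/* Hypothetical boot-parameter override; 0 means "not given". */
static int toy_leaf_fanout(int boot_param)
{
	return boot_param > 0 ? boot_param : TOY_FANOUT_LEAF;
}
```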
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions · 9226b10d
      Paul E. McKenney authored
      The rcu_all_qs() and rcu_note_context_switch() functions do a series of checks,
      taking various actions to supply RCU with quiescent states, depending
      on the outcomes of the various checks.  This is a bit much for scheduling
      fastpaths, so this commit creates a separate ->rcu_urgent_qs field in
      the rcu_dynticks structure that acts as a global guard for these checks.
      Thus, in the common case, rcu_all_qs() and rcu_note_context_switch()
      check the ->rcu_urgent_qs field, find it false, and simply return.
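      A sketch of the guard pattern with hypothetical names (only the
      rcu_urgent_qs field name comes from the text): the common case costs
      one load and a return, and the series of checks runs only when the
      guard is set. Clearing the guard in the slow path is an assumption of
      this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; only the guard's name follows the text. */
struct toy_dynticks {
	bool rcu_urgent_qs;	/* set when RCU needs a quiescent state soon */
	int slow_checks;	/* counts how often the slow path ran */
};

static void toy_slow_checks(struct toy_dynticks *d)
{
	d->slow_checks++;	/* stand-in for the series of checks */
	d->rcu_urgent_qs = false;
}

/* Fast path: one load of the guard, and usually an immediate return. */
static void toy_note_context_switch(struct toy_dynticks *d)
{
	if (!d->rcu_urgent_qs)
		return;
	toy_slow_checks(d);
}
```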
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
    • rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle() · 0f9be8ca
      Paul E. McKenney authored
      The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking
      that one of them still needs a quiescent state before doing an expensive
      atomic operation on the ->dynticks counter.  However, this check reduces
      overhead only after a rare race condition, and increases complexity.  This
      commit therefore removes the scan and the mechanism enabling the scan.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Pull rcu_qs_ctr into rcu_dynticks structure · 9577df9a
      Paul E. McKenney authored
      The rcu_qs_ctr variable is yet another isolated per-CPU variable,
      so this commit pulls it into the pre-existing rcu_dynticks per-CPU
      structure.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure · abb06b99
      Paul E. McKenney authored
      The rcu_sched_qs_mask variable is yet another isolated per-CPU variable,
      so this commit pulls it into the pre-existing rcu_dynticks per-CPU
      structure.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Semicolon inside RCU_TRACE() for tree.c · 88a4976d
      Paul E. McKenney authored
      The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
      where "statement" is a local-variable declaration, as it can leave a
      misplaced ";" in the source code.  This commit therefore converts these
      to "RCU_TRACE(statement;)", which avoids the misplaced ";".
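      To see the difference, assume RCU_TRACE() expands to its argument when
      CONFIG_RCU_TRACE is set and to nothing otherwise (a simplified
      stand-in, not the kernel's header). With tracing off, the old form
      "RCU_TRACE(int calls = 0);" leaves a bare ";" ahead of the next
      declaration, which trips declaration-after-statement warnings, and at
      file scope is not valid ISO C. With the ";" inside the argument, it
      disappears along with the statement. traced_add() is a hypothetical
      example function.

```c
#include <assert.h>

/* Simplified stand-in for the kernel macro. */
#ifdef CONFIG_RCU_TRACE
#define RCU_TRACE(stmt) stmt
#else
#define RCU_TRACE(stmt)
#endif

static int traced_add(int a, int b)
{
	/* New form: with tracing off, this line expands to nothing at all. */
	RCU_TRACE(int calls = 0;)
	int sum = a + b;

	RCU_TRACE(calls++;)
	return sum;
}
```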
      Reported-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Semicolon inside RCU_TRACE() for Tiny RCU · 6c8c1485
      Paul E. McKenney authored
      The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
      where "statement" is a local-variable declaration, as it can leave a
      misplaced ";" in the source code.  This commit therefore converts these
      to "RCU_TRACE(statement;)", which avoids the misplaced ";".
      Reported-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Semicolon inside RCU_TRACE() for rcu.h · dffd06a7
      Paul E. McKenney authored
      The current use of "RCU_TRACE(statement);" can cause odd bugs, especially
      where "statement" is a local-variable declaration, as it can leave a
      misplaced ";" in the source code.  This commit therefore converts these
      to "RCU_TRACE(statement;)", which avoids the misplaced ";".
      Reported-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Check for tardy grace-period activity in cleanup_srcu_struct() · 15c68f7f
      Paul E. McKenney authored
      Users of SRCU are obliged to complete all grace-period activity before
      invoking cleanup_srcu_struct().  This means that all calls to either
      synchronize_srcu() or synchronize_srcu_expedited() must have returned,
      and all calls to call_srcu() must have returned, and the last call to
      call_srcu() must have been followed by a call to srcu_barrier().
      Furthermore, the caller must have done something to prevent any
      further calls to synchronize_srcu(), synchronize_srcu_expedited(),
      and call_srcu().
      
      Therefore, if there has ever been an invocation of call_srcu() on
      the srcu_struct in question, the sequence of events must be as
      follows:
      
      1.  Prevent any further calls to call_srcu().
      2.  Wait for any pre-existing call_srcu() invocations to return.
      3.  Invoke srcu_barrier().
      4.  It is now safe to invoke cleanup_srcu_struct().
      
      On the other hand, if there has ever been a call to synchronize_srcu()
      or synchronize_srcu_expedited(), the sequence of events must be as
      follows:
      
      1.  Prevent any further calls to synchronize_srcu() or
          synchronize_srcu_expedited().
      2.  Wait for any pre-existing synchronize_srcu() or
          synchronize_srcu_expedited() invocations to return.
      3.  It is now safe to invoke cleanup_srcu_struct().
      
      If there have been calls to both types of functions (call_srcu()
      and either of synchronize_srcu() and synchronize_srcu_expedited()), then
      the caller must do the first three steps of the call_srcu() procedure
      above and the first two steps of the synchronize_s*() procedure above,
      and only then invoke cleanup_srcu_struct().
      
      Note that cleanup_srcu_struct() does some probabilistic checks
      for the caller failing to follow these procedures, in which case
      cleanup_srcu_struct() does WARN_ON() and avoids freeing the per-CPU
      structures associated with the specified srcu_struct structure.
      Reported-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • srcu: Consolidate batch checking into rcu_all_batches_empty() · cc985822
      Paul E. McKenney authored
      The srcu_reschedule() function invokes rcu_batch_empty() on each of
      the four rcu_batch structures in the srcu_struct in question twice.
      Given that this check will also be needed in cleanup_srcu_struct(), this
      commit consolidates these four checks into a new rcu_all_batches_empty()
      function.
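      The consolidation amounts to a helper like the following. The four
      batch fields and the toy_ names are hypothetical stand-ins for the
      srcu_struct's rcu_batch members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the four per-srcu_struct callback batches. */
struct toy_batch {
	void *head;		/* NULL when the batch is empty */
};

struct toy_srcu_struct {
	struct toy_batch batch_queue;
	struct toy_batch batch_check0;
	struct toy_batch batch_check1;
	struct toy_batch batch_done;
};

static bool toy_batch_empty(const struct toy_batch *b)
{
	return b->head == NULL;
}

/* One helper instead of four open-coded checks at each call site. */
static bool toy_all_batches_empty(const struct toy_srcu_struct *sp)
{
	return toy_batch_empty(&sp->batch_queue) &&
	       toy_batch_empty(&sp->batch_check0) &&
	       toy_batch_empty(&sp->batch_check1) &&
	       toy_batch_empty(&sp->batch_done);
}
```

      A cleanup routine can then call the same helper to detect callbacks
      that are still queued, as the text anticipates for
      cleanup_srcu_struct().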
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Make arch select smp_mb__after_unlock_lock() strength · 77e58496
      Paul E. McKenney authored
      The definition of smp_mb__after_unlock_lock() is currently smp_mb()
      for CONFIG_PPC and a no-op otherwise.  It would be better to instead
      provide an architecture-selectable Kconfig option, and select the
      strength of smp_mb__after_unlock_lock() based on that option.  This
      commit therefore creates ARCH_WEAK_RELEASE_ACQUIRE, has PPC select it,
      and bases the definition of smp_mb__after_unlock_lock() on this new
      ARCH_WEAK_RELEASE_ACQUIRE Kconfig option.
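      The Kconfig-driven definition can be sketched in userspace as below,
      assuming smp_mb() is modeled by a C11 sequentially consistent fence
      and that selecting ARCH_WEAK_RELEASE_ACQUIRE shows up as
      CONFIG_ARCH_WEAK_RELEASE_ACQUIRE. toy_barrier_is_full() is a
      hypothetical helper that reports which branch was compiled.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for the kernel's full memory barrier. */
#define smp_mb() atomic_thread_fence(memory_order_seq_cst)

#ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE
#define smp_mb__after_unlock_lock() smp_mb()	/* e.g. selected by PPC */
#else
#define smp_mb__after_unlock_lock() do { } while (0)
#endif

static int toy_barrier_is_full(void)
{
#ifdef CONFIG_ARCH_WEAK_RELEASE_ACQUIRE
	return 1;
#else
	return 0;
#endif
}
```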
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: <linuxppc-dev@lists.ozlabs.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Maintain special bits at bottom of ->dynticks counter · b8c17e66
      Paul E. McKenney authored
      Currently, IPIs are used to force other CPUs to invalidate their TLBs
      in response to a kernel virtual-memory mapping change.  This works, but
      degrades both battery lifetime (for idle CPUs) and real-time response
      (for nohz_full CPUs), and in addition results in unnecessary IPIs due to
      the fact that CPUs executing in usermode are unaffected by stale kernel
      mappings.  It would be better to cause a CPU executing in usermode to
      wait until it is entering kernel mode to do the flush, first to avoid
      interrupting usermode tasks and second to handle multiple flush requests
      with a single flush in the case of a long-running user task.
      
      This commit therefore reserves a bit at the bottom of the ->dynticks
      counter, which is checked upon exit from extended quiescent states.
      If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is
      invoked, which, if not supplied, is an empty single-pass do-while loop.
      If this bottom bit is set on -entry- to an extended quiescent state,
      then a WARN_ON_ONCE() triggers.
      
      This bottom bit may be set using a new rcu_eqs_special_set() function,
      which returns true if the bit was set, or false if the CPU turned
      out to not be in an extended quiescent state.  Please note that this
      function refuses to set the bit for a non-nohz_full CPU when that CPU
      is executing in usermode because usermode execution is tracked by RCU
      as a dyntick-idle extended quiescent state only for nohz_full CPUs.
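      A single-threaded C11 model of the bit protocol, assuming bit 0 is the
      special-action request and the counter advances in units of 2, with
      the value-2 bit set whenever the CPU is outside an extended quiescent
      state. All names are hypothetical, and the nohz_full usermode
      restriction described above is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_CTRL_MASK 0x1	/* bottom bit: special action requested */
#define TOY_CTRL_CTR  0x2	/* counter unit; this bit set => not in EQS */

struct toy_dynticks {
	atomic_int dynticks;
	int special_exits;	/* counts stand-ins for rcu_eqs_special_exit() */
};

static void toy_eqs_enter(struct toy_dynticks *d)
{
	int seq = atomic_fetch_add(&d->dynticks, TOY_CTRL_CTR) + TOY_CTRL_CTR;

	assert(!(seq & TOY_CTRL_CTR));	/* now in an EQS */
	assert(!(seq & TOY_CTRL_MASK));	/* no special action may be pending */
}

static void toy_eqs_exit(struct toy_dynticks *d)
{
	int seq = atomic_fetch_add(&d->dynticks, TOY_CTRL_CTR) + TOY_CTRL_CTR;

	assert(seq & TOY_CTRL_CTR);	/* no longer in an EQS */
	if (seq & TOY_CTRL_MASK) {
		atomic_fetch_and(&d->dynticks, ~TOY_CTRL_MASK);
		d->special_exits++;	/* e.g. a deferred TLB flush */
	}
}

/* Set the bottom bit, but only while the CPU is in an EQS. */
static bool toy_eqs_special_set(struct toy_dynticks *d)
{
	int old = atomic_load(&d->dynticks);

	do {
		if (old & TOY_CTRL_CTR)
			return false;	/* CPU is not in an EQS */
	} while (!atomic_compare_exchange_weak(&d->dynticks, &old,
					       old | TOY_CTRL_MASK));
	return true;
}
```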
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  2. 12 Mar, 2017 5 commits
    • Linux 4.11-rc2 · 4495c08e
      Linus Torvalds authored
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 56b24d1b
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
      
       - four patches to get the new cputime code in shape for s390
      
       - add the new statx system call
      
       - a few bug fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: wire up statx system call
        KVM: s390: Fix guest migration for huge guests resulting in panic
        s390/ipl: always use load normal for CCW-type re-IPL
        s390/timex: micro optimization for tod_to_ns
        s390/cputime: provide archicture specific cputime_to_nsecs
        s390/cputime: reset all accounting fields on fork
        s390/cputime: remove last traces of cputime_t
        s390: fix in-kernel program checks
        s390/crypt: fix missing unlock in ctr_paes_crypt on error path
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5a45a5a8
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
      
       - a fix for the kexec/purgatory regression which was introduced in the
         merge window via an innocent sparse fix. We could have reverted that
         commit, but on deeper inspection it turned out that the whole
         machinery is neither documented nor robust. So a proper cleanup was
         done instead
      
       - the fix for the TLB flush issue which was discovered recently
      
       - a simple typo fix for a reboot quirk
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/tlb: Fix tlb flushing when lguest clears PGE
        kexec, x86/purgatory: Unbreak it and clean it up
        x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ecade114
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
      
       - a workaround for a GIC erratum
      
       - a missing stub function for CONFIG_IRQDOMAIN=n
      
       - fixes for a couple of type inconsistencies
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/crossbar: Fix incorrect type of register size
        irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065
        irqdomain: Add empty irq_domain_check_msi_remap
        irqchip/crossbar: Fix incorrect type of local variables
    • x86/tlb: Fix tlb flushing when lguest clears PGE · 2c4ea6e2
      Daniel Borkmann authored
      Fengguang reported random corruptions from various locations on x86-32
      after commits d2852a22 ("arch: add ARCH_HAS_SET_MEMORY config") and
      9d876e79 ("bpf: fix unlocking of jited image when module ronx not set"),
      which uses the former. While x86-32 doesn't have a JIT like x86_64, the
      bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to
      ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module
      support built in and therefore never had the DEBUG_SET_MODULE_RONX setting
      enabled.
      
      After investigating the crashes further, it turned out that using
      set_memory_ro() and set_memory_rw() didn't have the desired effect, for
      example, setting the pages as read-only on x86-32 would still let
      probe_kernel_write() succeed without error. This behavior would manifest
      itself in situations where the vmalloc'ed buffer was accessed prior to
      set_memory_*() such as in case of bpf_prog_alloc(). In cases where it
      wasn't, the page attribute changes seemed to have taken effect, leading to
      the conclusion that a TLB invalidate didn't happen. Moreover, it turned out
      that this issue reproduced with qemu in "-cpu kvm64" mode, but not for
      "-cpu host". When the issue occurred, change_page_attr_set_clr() did
      trigger a TLB flush as expected via __flush_tlb_all() through
      cpa_flush_range(), though.
      
      There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends
      on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush
      (depends on X86_FEATURE_PGE), and cr3 based flush.  For the "-cpu host"
      case in my setup, the flush used the invpcid_flush_all() variant, whereas
      for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to
      cr3 manually worked fine, and further investigation of the cr4 case
      showed that the X86_CR4_PGE bit was not set in the cr4 register, meaning
      that __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the
      same value instead of clearing X86_CR4_PGE in the first write to trigger
      the flush.
      
      It turned out that X86_CR4_PGE was cleared from cr4 during init from
      lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also
      cleared from there due to concerns that using PGE in the guest kernel can
      lead to hard-to-trace bugs (see bff672e6 ("lguest: documentation V:
      Host") in init()). The CPU feature bits are cleared in dynamic
      boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses
      static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB
      flushing to use, meaning they still used the old setting of the host
      kernel.
      
      Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate
      to static_cpu_has() checks is too late at this point as sections have been
      patched already, so for now, it seems reasonable to switch back to
      boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95
      ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via
      cr3 as originally intended, properly makes the new page attributes visible
      and thus fixes the crashes seen by Fengguang.
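      The failure mode can be modeled in a few lines. In this hypothetical
      sketch, a flush helper chooses between the cr4 toggle and the cr3
      reload using either a frozen (static_cpu_has()-like) or a live
      (boot_cpu_has()-like) view of the PGE feature bit; the invpcid
      variant is left out. A cr4 toggle only flushes if its first write
      actually clears PGE.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CR4_PGE (1UL << 7)	/* PGE is bit 7 of CR4 on x86 */

enum toy_flush { TOY_FLUSH_NONE, TOY_FLUSH_CR4, TOY_FLUSH_CR3 };

struct toy_cpu {
	unsigned long cr4;	/* live control-register value */
	bool pge_boot;		/* boot_cpu_has()-like: current feature bits */
	bool pge_static;	/* static_cpu_has()-like: frozen at patch time */
};

/* Returns which kind of flush actually took effect. */
static enum toy_flush toy_flush_tlb_all(const struct toy_cpu *c,
					bool use_static)
{
	bool pge = use_static ? c->pge_static : c->pge_boot;

	if (pge) {
		unsigned long first = c->cr4 & ~TOY_CR4_PGE;

		/* Writing cr4 flushes only if the write changes PGE. */
		return first != c->cr4 ? TOY_FLUSH_CR4 : TOY_FLUSH_NONE;
	}
	/* With PGE off there are no global entries; cr3 reload flushes all. */
	return TOY_FLUSH_CR3;
}
```

      In the lguest case, CR4 and the dynamic feature bits have PGE cleared
      while the frozen view still reports it, so the static check selects
      a toggle that never changes CR4, and the live check falls back to cr3.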
      
      Fixes: c109bf95 ("x86/cpufeature: Remove cpu_has_pge")
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: bp@suse.de
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: lkp@01.org
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com
      Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>