1. 26 Sep, 2012 13 commits
    • Frederic Weisbecker's avatar
      x86: Use the new schedule_user API on userspace preemption · 0430499c
      Frederic Weisbecker authored
      This way we can exit the RCU extended quiescent state before
      we schedule a new task from irq/exception exit.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      0430499c
    • Frederic Weisbecker's avatar
      rcu: Exit RCU extended QS on user preemption · 20ab65e3
      Frederic Weisbecker authored
      When exceptions or irq are about to resume userspace, if
      the task needs to be rescheduled, the arch low level code
      calls schedule() directly.
      
      If we call it, it is because we have the TIF_RESCHED flag:
      
      - It can be set after random local calls to set_need_resched()
      (RCU, drm, ...)
      
      - A wake up happened and the CPU needs preemption. This can
        happen in several ways:
      
          * Remotely: the remote waking CPU has set TIF_RESCHED and send the
            wakee an IPI to schedule the new task.
          * Remotely enqueued: the remote waking CPU sends an IPI to the target
            and the wake up is made by the target.
          * Locally: waking CPU == wakee CPU and the wakeup is done locally.
            set_need_resched() is called without IPI.
      
      In the case of local and remotely enqueued wake ups, the tick can
      be restarted when we enqueue the new task and RCU can exit the
      extended quiescent state at the same time. Then by the time we reach
      irq exit path and we call schedule, we are not in RCU user mode.
      
      But if we call schedule() only because something called set_need_resched(),
      RCU may still be in user mode when we reach schedule.
      
      Also if a wake up is done remotely, the CPU might see the TIF_RESCHED
      flag and call schedule while the IPI has not yet happen to restart the
      tick and exit RCU user mode.
      
      We need to manually protect against these corner cases.
      
      Create a new API schedule_user() that calls schedule() inside
      rcu_user_exit()-rcu_user_enter() in order to protect it. Archs
      will need to rely on it now to implement user preemption safely.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      20ab65e3
    • Frederic Weisbecker's avatar
      rcu: Exit RCU extended QS on kernel preemption after irq/exception · 90a340ed
      Frederic Weisbecker authored
      When an exception or an irq exits, and we are going to resume into
      interrupted kernel code, the low level architecture code calls
      preempt_schedule_irq() if there is a need to reschedule.
      
      If the interrupt/exception occured between a call to rcu_user_enter()
      (from syscall exit, exception exit, do_notify_resume exit, ...) and
      a real resume to userspace (iret,...), preempt_schedule_irq() can be
      called whereas RCU thinks we are in userspace. But preempt_schedule_irq()
      is going to run kernel code and may be some RCU read side critical
      section. We must exit the userspace extended quiescent state before
      we call it.
      
      To solve this, just call rcu_user_exit() in the beginning of
      preempt_schedule_irq().
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      90a340ed
    • Frederic Weisbecker's avatar
      x86: Exception hooks for userspace RCU extended QS · 6ba3c97a
      Frederic Weisbecker authored
      Add necessary hooks to x86 exception for userspace
      RCU extended quiescent state support.
      
      This includes traps, page fault, debug exceptions, etc...
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      6ba3c97a
    • Frederic Weisbecker's avatar
      x86: Unspaghettize do_general_protection() · ef3f6288
      Frederic Weisbecker authored
      There is some unnatural label based layout in this function.
      Convert the unnecessary goto to readable conditional blocks.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      ef3f6288
    • Frederic Weisbecker's avatar
      x86: Syscall hooks for userspace RCU extended QS · bf5a3c13
      Frederic Weisbecker authored
      Add syscall slow path hooks to notify syscall entry
      and exit on CPUs that want to support userspace RCU
      extended quiescent state.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      bf5a3c13
    • Frederic Weisbecker's avatar
      rcu: Switch task's syscall hooks on context switch · 04e7e951
      Frederic Weisbecker authored
      Clear the syscalls hook of a task when it's scheduled out so that if
      the task migrates, it doesn't run the syscall slow path on a CPU
      that might not need it.
      
      Also set the syscalls hook on the next task if needed.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      04e7e951
    • Frederic Weisbecker's avatar
      rcu: Ignore userspace extended quiescent state by default · 1e1a689f
      Frederic Weisbecker authored
      By default we don't want to enter into RCU extended quiescent
      state while in userspace because doing this produces some overhead
      (eg: use of syscall slowpath). Set it off by default and ready to
      run when some feature like adaptive tickless need it.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      1e1a689f
    • Frederic Weisbecker's avatar
      rcu: Allow rcu_user_enter()/exit() to nest · c5d900bf
      Frederic Weisbecker authored
      Allow calls to rcu_user_enter() even if we are already
      in userspace (as seen by RCU) and allow calls to rcu_user_exit()
      even if we are already in the kernel.
      
      This makes the APIs more flexible to be called from architectures.
      Exception entries for example won't need to know if they come from
      userspace before calling rcu_user_exit().
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      c5d900bf
    • Frederic Weisbecker's avatar
      rcu: Settle config for userspace extended quiescent state · 2b1d5024
      Frederic Weisbecker authored
      Create a new config option under the RCU menu that put
      CPUs under RCU extended quiescent state (as in dynticks
      idle mode) when they run in userspace. This require
      some contribution from architectures to hook into kernel
      and userspace boundaries.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      2b1d5024
    • Paul E. McKenney's avatar
      rcu: Make RCU_FAST_NO_HZ handle adaptive ticks · 9a0c6fef
      Paul E. McKenney authored
      The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid
      the current CPU of RCU callbacks.  This is appropriate when the CPU is
      entering idle, where it doesn't have much useful to do anyway, but is most
      definitely not what you want when transitioning to user-mode execution.
      This commit therefore detects the adaptive-tick case, and refrains from
      burning CPU time getting rid of RCU callbacks in that case.
      Signed-off-by: default avatarPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      9a0c6fef
    • Frederic Weisbecker's avatar
      rcu: New rcu_user_enter_after_irq() and rcu_user_exit_after_irq() APIs · 19dd1591
      Frederic Weisbecker authored
      In some cases, it is necessary to enter or exit userspace-RCU-idle mode
      from an interrupt handler, for example, if some other CPU sends this
      CPU a resched IPI.  In this case, the current CPU would enter the IPI
      handler in userspace-RCU-idle mode, but would need to exit the IPI handler
      after having exited that mode.
      
      To allow this to work, this commit adds two new APIs to TREE_RCU:
      
      - rcu_user_enter_after_irq(). This must be called from an interrupt between
      rcu_irq_enter() and rcu_irq_exit().  After the irq calls rcu_irq_exit(),
      the irq handler will return into an RCU extended quiescent state.
      In theory, this interrupt is never a nested interrupt, but in practice
      it might interrupt softirq, which looks to RCU like a nested interrupt.
      
      - rcu_user_exit_after_irq(). This must be called from a non-nesting
      interrupt, interrupting an RCU extended quiescent state, also
      between rcu_irq_enter() and rcu_irq_exit(). After the irq calls
      rcu_irq_exit(), the irq handler will return in an RCU non-quiescent
      state.
      
      [ Combined with "Allow calls to rcu_exit_user_irq from nesting irqs." ]
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      19dd1591
    • Frederic Weisbecker's avatar
      rcu: New rcu_user_enter() and rcu_user_exit() APIs · adf5091e
      Frederic Weisbecker authored
      RCU currently insists that only idle tasks can enter RCU idle mode, which
      prohibits an adaptive tickless kernel (AKA nohz cpusets), which in turn
      would mean that usermode execution would always take scheduling-clock
      interrupts, even when there is only one task runnable on the CPU in
      question.
      
      This commit therefore adds rcu_user_enter() and rcu_user_exit(), which
      allow non-idle tasks to enter RCU idle mode.  These are quite similar
      to rcu_idle_enter() and rcu_idle_exit(), respectively, except that they
      omit the idle-task checks.
      
      [ Updated to use "user" flag rather than separate check functions. ]
      
      [ paulmck: Updated to drop exports of new functions based on Josh's patch
        getting rid of the need for them. ]
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      adf5091e
  2. 25 Sep, 2012 4 commits
  3. 24 Sep, 2012 1 commit
  4. 23 Sep, 2012 22 commits
    • Linus Torvalds's avatar
      Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 56bae802
      Linus Torvalds authored
      Pull kbuild fixes from Michal Marek:
       "There are two more kbuild fixes for 3.6.
      
        One fixes a race between x86's archscripts target and the rule
        (re)building scripts/basic/fixdep.  The second is a fix for the
        previous attempt at fixing make firmware_install with make 3.82.
        This new solution should work with any version of GNU make"
      
      * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        x86/kbuild: archscripts depends on scripts_basic
        firmware: fix directory creation rule matching with make 3.80
      56bae802
    • Linus Torvalds's avatar
      Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · 0737c8d7
      Linus Torvalds authored
      Pull hwmon subsystem fixes from Jean Delvare.
      
      * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
        hwmon: (fam15h_power) Tweak runavg_range on resume
        hwmon: (coretemp) Use get_online_cpus to avoid races involving CPU hotplug
        hwmon: (via-cputemp) Use get_online_cpus to avoid races involving CPU hotplug
      0737c8d7
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 0bf7a705
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is a set of four essential fixes: two oops related (bnx2i,
        virtio-scsi), one data corruption related (hpsa) and one failure to
        boot due to interrupt routing issues (mpt2ss).
      
        Signed-off-by: James Bottomley <JBottomley@Parallels.com>"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        [SCSI] hpsa: fix handling of protocol error
        [SCSI] mpt2sas: Fix for issue - Unable to boot from the drive connected to HBA
        [SCSI] bnx2i: Fixed NULL ptr deference for 1G bnx2 Linux iSCSI offload
        [SCSI] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list
      0bf7a705
    • Shaun Ruffell's avatar
      edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs. · faa2ad09
      Shaun Ruffell authored
      Fix potential NULL pointer dereference in edac_unregister_sysfs() on
      system boot introduced in 3.6-rc1.
      
      Since commit 7a623c03 ("edac: rewrite the sysfs code to use struct
      device") edac_mc_alloc() no longer initializes embedded kobjects in
      struct mem_ctl_info.  Therefore edac_mc_free() can no longer simply
      decrement a kobject reference count to free the allocated memory unless
      the memory controller driver module had also called edac_mc_add_mc().
      
      Now edac_mc_free() will check if the newly embedded struct device has
      been registered with sysfs before using either the standard device
      release functions or freeing the data structures itself with logic
      pulled out of the error path of edac_mc_alloc().
      
      The BUG this patch resolves for me:
      
        BUG: unable to handle kernel NULL pointer dereference at   (null)
        EIP is at __wake_up_common+0x1a/0x6a
        Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
        Call Trace:
          complete_all+0x3f/0x50
          device_pm_remove+0x23/0xa2
          device_del+0x34/0x142
          edac_unregister_sysfs+0x3b/0x5c [edac_core]
          edac_mc_free+0x29/0x2f [edac_core]
          e7xxx_probe1+0x268/0x311 [e7xxx_edac]
          e7xxx_init_one+0x56/0x61 [e7xxx_edac]
          local_pci_probe+0x13/0x15
        ...
      
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: default avatarShaun Ruffell <sruffell@digium.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      faa2ad09
    • Fengguang Wu's avatar
      edac_mc: fix messy kfree calls in the error path · ef6e7816
      Fengguang Wu authored
      coccinelle warns about:
      
      + drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429
      
         421         if (mci->csrows) {
       > 422                 for (chn = 0; chn < tot_channels; chn++) {
         423                         csr = mci->csrows[chn];
         424                         if (csr) {
       > 425                                 for (chn = 0; chn < tot_channels; chn++)
         426                                          kfree(csr->channels[chn]);
         427                                  kfree(csr);
         428                          }
       > 429                          kfree(mci->csrows[i]);
         430                  }
         431                  kfree(mci->csrows);
         432          }
      
      and that code block seem to mess things up in several ways (double free, memory
      leak, out-of-bound reads etc.):
      
      L422: The iterator "chn" and bound "tot_channels" are totally wrong. Should be
            "row" and "tot_csrows" respectively. Which means either memory leak, or
            out-of-bound reads (which if does not trigger an immediate page fault
            error, will further lead to kfree() on random addresses).
      
      L425: The inner loop is reusing the same iterator "chn" as the outer loop,
            which could lead to premature end of the outer loop, and hence memory leak.
      
      L429: The array index 'i' in mci->csrows[i] is a temporary value used in
            previous loops, and won't change at all in the current loop. Which
            means either out-of-bound read and possibly kfree(random number), or the
            same mci->csrows[i] get freed once and again, and possibly double free
            for the kfree(csr) in L427.
      
      L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory.
      
      The buggy code was introduced by commit de3910eb ("edac: change the mem
      allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1
      merge window. Fix it by freeing up resources in this order:
      
        free csrows[i]->channels[j]
        free csrows[i]->channels
        free csrows[i]
        free csrows
      
      CC: Mauro Carvalho Chehab <mchehab@redhat.com>
      CC: Shaun Ruffell <sruffell@digium.com>
      Signed-off-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ef6e7816
    • Andreas Herrmann's avatar
      hwmon: (fam15h_power) Tweak runavg_range on resume · 5f0ecb90
      Andreas Herrmann authored
      The quirk introduced with commit
      00250ec9 (hwmon: fam15h_power: fix
      bogus values with current BIOSes) is not only required during driver
      load but also when system resumes from suspend. The BIOS might set the
      previously recommended (but unsuitable) initilization value for the
      running average range register during resume.
      Signed-off-by: default avatarAndreas Herrmann <andreas.herrmann3@amd.com>
      Tested-by: default avatarAndreas Hartmann <andihartmann@01019freenet.de>
      Signed-off-by: default avatarJean Delvare <khali@linux-fr.org>
      Cc: stable@vger.kernel.org # 3.0+
      5f0ecb90
    • Silas Boyd-Wickizer's avatar
      hwmon: (coretemp) Use get_online_cpus to avoid races involving CPU hotplug · 641f1456
      Silas Boyd-Wickizer authored
      coretemp_init loops with for_each_online_cpu, adding platform_devices
      and sysfs interfaces, then calls register_hotcpu_notifier.  There is a
      race if a CPU is offlined or onlined after the loop, but before
      register_hotcpu_notifier.  The race might result in the absence of a
      platform_device+sysfs interface for an online CPU, or the presence of
      a platform_device+sysfs interface for an offline CPU.  A similar race
      occurs during coretemp_exit, after the module calls
      unregister_hotcpu_notifier, but before it unregisters all devices, a
      CPU might offline and a device for an offline CPU will exist for a
      short while.
      
      This fix surrounds for_each_online_cpu and register_hotcpu_notifier
      with get_online_cpus+put_online_cpus; and surrounds
      unregister_hotcpu_notifier and device unregistering with
      get_online_cpus+put_online_cpus.
      
      Build tested.
      Signed-off-by: default avatarSilas Boyd-Wickizer <sbw@mit.edu>
      Signed-off-by: default avatarJean Delvare <khali@linux-fr.org>
      641f1456
    • Silas Boyd-Wickizer's avatar
      hwmon: (via-cputemp) Use get_online_cpus to avoid races involving CPU hotplug · 1ec3ddfd
      Silas Boyd-Wickizer authored
      via_cputemp_init loops with for_each_online_cpu, adding
      platform_devices, then calls register_hotcpu_notifier.  If a CPU is
      offlined between the loop and register_hotcpu_notifier, then later
      onlined, via_cputemp_device_add will attempt to add platform devices
      with the same ID.  A similar race occurs during via_cputemp_exit,
      after the module calls unregister_hotcpu_notifier, a CPU might offline
      and a device will exist for a CPU that is offline.
      
      This fix surrounds for_each_online_cpu and register_hotcpu_notifier
      with get_online_cpus+put_online_cpus; and surrounds
      unregister_hotcpu_notifier and device unregistering with
      get_online_cpus+put_online_cpus.
      
      Build tested.
      Signed-off-by: default avatarSilas Boyd-Wickizer <sbw@mit.edu>
      Acked-by: default avatarHarald Welte <laforge@gnumonks.org>
      Signed-off-by: default avatarJean Delvare <khali@linux-fr.org>
      1ec3ddfd
    • Paul E. McKenney's avatar
      ia64: Add missing RCU idle APIs on idle loop · 93482f4e
      Paul E. McKenney authored
      Traditionally, the entire idle task served as an RCU quiescent state.
      But when RCU read side critical sections started appearing within the
      idle loop, this traditional strategy became untenable.  The fix was to
      create new RCU APIs named rcu_idle_enter() and rcu_idle_exit(), which
      must be called by each architecture's idle loop so that RCU can tell
      when it is safe to ignore a given idle CPU.
      
      Unfortunately, this fix was never applied to ia64, a shortcoming remedied
      by this commit.
      
      Reported by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarPaul E. McKenney <paul.mckenney@linaro.org>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested by: Tony Luck <tony.luck@intel.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      93482f4e
    • Frederic Weisbecker's avatar
      xtensa: Add missing RCU idle APIs on idle loop · 11ad47a0
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the xtensa's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      11ad47a0
    • Frederic Weisbecker's avatar
      score: Add missing RCU idle APIs on idle loop · 0ee23fda
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in scores's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      0ee23fda
    • Frederic Weisbecker's avatar
      parisc: Add missing RCU idle APIs on idle loop · fbe75218
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the parisc's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Parisc <linux-parisc@vger.kernel.org>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      fbe75218
    • Frederic Weisbecker's avatar
      mn10300: Add missing RCU idle APIs on idle loop · 5b0753a9
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the mn10300's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      5b0753a9
    • Frederic Weisbecker's avatar
      m68k: Add missing RCU idle APIs on idle loop · 5b57ba37
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the m68k's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: m68k <linux-m68k@lists.linux-m68k.org>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      5b57ba37
    • Frederic Weisbecker's avatar
      m32r: Add missing RCU idle APIs on idle loop · 48ae077c
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the m32r's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      48ae077c
    • Frederic Weisbecker's avatar
      h8300: Add missing RCU idle APIs on idle loop · b2fe1430
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the h8300's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      b2fe1430
    • Frederic Weisbecker's avatar
      frv: Add missing RCU idle APIs on idle loop · 41d8fe5b
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the Frv's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <stable@vger.kernel.org> # 3.3+
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      41d8fe5b
    • Frederic Weisbecker's avatar
      cris: Add missing RCU idle APIs on idle loop · c633f9e7
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the Cris's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Cris <linux-cris-kernel@axis.com>
      Cc: <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      c633f9e7
    • Frederic Weisbecker's avatar
      alpha: Add missing RCU idle APIs on idle loop · 4c94cada
      Frederic Weisbecker authored
      In the old times, the whole idle task was considered
      as an RCU quiescent state. But as RCU became more and
      more successful overtime, some RCU read side critical
      section have been added even in the code of some
      architectures idle tasks, for tracing for example.
      
      So nowadays, rcu_idle_enter() and rcu_idle_exit() must
      be called by the architecture to tell RCU about the part
      in the idle loop that doesn't make use of rcu read side
      critical sections, typically the part that puts the CPU
      in low power mode.
      
      This is necessary for RCU to find the quiescent states in
      idle in order to complete grace periods.
      
      Add this missing pair of calls in the Alpha's idle loop.
      Reported-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Tested-by: default avatarMichael Cree <mcree@orcon.net.nz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: alpha <linux-alpha@vger.kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org> # 3.3+
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      4c94cada
    • Frederic Weisbecker's avatar
      alpha: Fix preemption handling in idle loop · 6a6c0272
      Frederic Weisbecker authored
      cpu_idle() is called on the boot CPU by the init code with
      preemption disabled. But the cpu_idle() function in alpha
      doesn't handle this when it calls schedule() directly.
      
      Fix it by converting it into schedule_preempt_disabled().
      
      Also disable preemption before calling cpu_idle() from
      secondary CPU entry code to stay consistent with this
      state.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Tested-by: default avatarMichael Cree <mcree@orcon.net.nz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: alpha <linux-alpha@vger.kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      6a6c0272
    • Silas Boyd-Wickizer's avatar
      Use get_online_cpus to avoid races involving CPU hotplug · 429227bb
      Silas Boyd-Wickizer authored
      If arch/x86/kernel/cpuid.c is a module, a CPU might offline or online
      between the for_each_online_cpu() loop and the call to
      register_hotcpu_notifier in cpuid_init or the call to
      unregister_hotcpu_notifier in cpuid_exit.  The potential races can
      lead to leaks/duplicates, attempts to destroy non-existant devices, or
      random pointer dereferences.
      
      For example, in cpuid_exit if:
      
              for_each_online_cpu(cpu)
                      cpuid_device_destroy(cpu);
              class_destroy(cpuid_class);
              __unregister_chrdev(CPUID_MAJOR, 0, NR_CPUS, "cpu/cpuid");
              <----- CPU onlines
              unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
      
      the hotcpu notifier will attempt to create a device for the
      cpuid_class, which the module already destroyed.
      
      This fix surrounds for_each_online_cpu and register_hotcpu_notifier or
      unregister_hotcpu_notifier with get_online_cpus+put_online_cpus.
      
      Tested on a VM.
      Signed-off-by: default avatarSilas Boyd-Wickizer <sbw@mit.edu>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      429227bb
    • Silas Boyd-Wickizer's avatar
      Use get_online_cpus to avoid races involving CPU hotplug · a2db672a
      Silas Boyd-Wickizer authored
      If arch/x86/kernel/msr.c is a module, a CPU might offline or online
      between the for_each_online_cpu(i) loop and the call to
      register_hotcpu_notifier in msr_init or the call to
      unregister_hotcpu_notifier in msr_exit. The potential races can lead
      to leaks/duplicates, attempts to destroy non-existant devices, or
      random pointer dereferences.
      
      For example, in msr_init if:
      
              for_each_online_cpu(i) {
                      err = msr_device_create(i);
                      if (err != 0)
                              goto out_class;
              }
              <----- CPU offlines
              register_hotcpu_notifier(&msr_class_cpu_notifier);
      
      and the CPU never onlines before msr_exit, then the module will never
      call msr_device_destroy for the associated CPU.
      
      This fix surrounds for_each_online_cpu and register_hotcpu_notifier or
      unregister_hotcpu_notifier with get_online_cpus+put_online_cpus.
      
      Tested on a VM.
      Signed-off-by: default avatarSilas Boyd-Wickizer <sbw@mit.edu>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      a2db672a