1. 28 Oct, 2014 13 commits
    • sched/numa: Calculate node scores in complex NUMA topologies · 6c6b1193
      Rik van Riel authored
      In order to do task placement on systems with complex NUMA topologies,
      it is necessary to count the faults on nodes near the node that is
      being examined for a potential move.

      In the case of a system with a backplane interconnect, we are dealing with
      groups of NUMA nodes; each of the nodes within a group is the same number
      of hops away from nodes in other groups in the system. Optimal placement
      on this topology is achieved by counting all nearby nodes equally. When
      comparing nodes A and B at distance N, nearby nodes are those at distances
      smaller than N from nodes A or B.
      
      Placement strategy on a system with a glueless mesh NUMA topology needs
      to be different, because there are no natural groups of nodes determined
      by the hardware. Instead, when dealing with two nodes A and B at distance
      N, N >= 2, there will be intermediate nodes at distance < N from both nodes
      A and B. Good placement can be achieved by right shifting the faults on
      nearby nodes by the number of hops from the node being scored. In this
      context, a nearby node is any node less than the maximum distance in the
      system away from the node. Nodes at the maximum distance are skipped for
      efficiency reasons; there is no real policy reason to do so.
      
      Placement policy on directly connected NUMA systems is not affected.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Tested-by: Chegu Vinod <chegu_vinod@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Link: http://lkml.kernel.org/r/1413530994-9732-5-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
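      As a rough illustration of the two scoring rules described above, the C
      sketch below scores one node against a hypothetical four-node ring. It
      follows the commit message's wording (equal counting for backplane
      groups, a right shift by hop count for glueless meshes) rather than the
      kernel's actual code; the distance table, the fault counts,
      HOP_DISTANCE, and the node_score() helper are all assumptions made for
      this example.

```c
#include <assert.h>

#define NR_NODES 4
#define LOCAL_DISTANCE 10 /* distance from a node to itself */
#define HOP_DISTANCE 10   /* assumption: every hop adds 10 */

enum numa_topology { NUMA_DIRECT, NUMA_GLUELESS_MESH, NUMA_BACKPLANE };

/* Hypothetical distance table: four nodes wired as a ring. */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 30, 20 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 20, 30, 20, 10 },
};

/* Hypothetical NUMA hinting fault counts of one task, per node. */
static const unsigned long faults[NR_NODES] = { 800, 400, 200, 100 };

static int max_distance(void)
{
	int max = LOCAL_DISTANCE;

	for (int a = 0; a < NR_NODES; a++)
		for (int b = 0; b < NR_NODES; b++)
			if (dist[a][b] > max)
				max = dist[a][b];
	return max;
}

/*
 * Score node nid while comparing it with a node maxdist away. The node's
 * own faults count in full; nearby nodes are added per topology type.
 */
static unsigned long node_score(int nid, int maxdist, enum numa_topology topo)
{
	unsigned long score;
	int sys_max = max_distance();

	assert(nid >= 0 && nid < NR_NODES);
	score = faults[nid];
	if (topo == NUMA_DIRECT)
		return score; /* directly connected: nothing else to add */

	for (int node = 0; node < NR_NODES; node++) {
		int d = dist[nid][node];
		unsigned long f = faults[node];

		/* Skip the node itself and, for efficiency, the furthest nodes. */
		if (node == nid || d == sys_max)
			continue;

		if (topo == NUMA_BACKPLANE) {
			/* Group members are closer than maxdist; count them equally. */
			if (d >= maxdist)
				continue;
		} else {
			/* Glueless mesh: right shift by the hop count. */
			f >>= (d - LOCAL_DISTANCE) / HOP_DISTANCE;
		}
		score += f;
	}
	return score;
}
```

      With these numbers, scoring node 0 as a glueless mesh adds 400 >> 1
      and 100 >> 1 to its own 800 faults, while node 2 (at the maximum
      distance) is skipped. The tests drive both branches over the same table
      purely to exercise them.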
    • sched/numa: Prepare for complex topology placement · 7bd95320
      Rik van Riel authored
      Preparatory patch for adding NUMA placement on systems with
      complex NUMA topologies. Also fix a potential divide by zero
      in group_weight().
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Tested-by: Chegu Vinod <chegu_vinod@hp.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413530994-9732-4-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
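      The divide by zero mentioned above comes from normalizing a node's
      faults by a group-wide total that can still be 0. A minimal guard,
      sketched with a hypothetical weight_permille() helper (not the
      kernel's function or signature), simply returns 0 when there is
      nothing to divide by:

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a group_weight()-style helper: the share of
 * a NUMA group's faults that landed on one node, in units of 1/1000.
 */
static unsigned long weight_permille(unsigned long node_faults,
				     unsigned long total_faults)
{
	/* No faults recorded yet: report zero instead of dividing by 0. */
	if (!total_faults)
		return 0;

	assert(node_faults <= total_faults);
	return 1000 * node_faults / total_faults;
}
```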
    • sched/numa: Classify the NUMA topology of a system · e3fe70b1
      Rik van Riel authored
      Smaller NUMA systems tend to have all NUMA nodes directly connected
      to each other. This includes the degenerate case of a system with just
      one node, i.e. a non-NUMA system.
      
      Larger systems can have two kinds of NUMA topology, which affects how
      tasks and memory should be placed on the system.
      
      On glueless mesh systems, nodes that are not directly connected to
      each other will bounce traffic through intermediary nodes. Task groups
      can be run closer to each other by moving tasks from a node to an
      intermediary node between it and the task's preferred node.
      
      On NUMA systems with backplane controllers, the intermediary hops
      are incapable of running programs. This creates "islands" of nodes
      that are at an equal distance to anywhere else in the system.
      
      Each kind of topology requires a slightly different placement
      algorithm; this patch provides the mechanism to detect the kind
      of NUMA topology of a system.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Tested-by: Chegu Vinod <chegu_vinod@hp.com>
      [ Changed to use kernel/sched/sched.h ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Link: http://lkml.kernel.org/r/1413530994-9732-3-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
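      One way to tell the three cases apart from a node distance table is
      sketched below: if all remote distances are equal, the nodes are
      directly connected; otherwise take a pair of nodes at the maximum
      distance and look for an intermediary node closer to both. Finding one
      suggests a glueless mesh, finding none suggests backplane "islands".
      The classify() function, its flat-array input, and the "a single
      remote distance means directly connected" rule are assumptions of this
      sketch, not necessarily how the patch implements it.

```c
#include <assert.h>

#define MAX_NODES 8

enum numa_topology { NUMA_DIRECT, NUMA_GLUELESS_MESH, NUMA_BACKPLANE };

/* Classify an n x n distance table stored row-major in d. */
static enum numa_topology classify(const int *d, int n)
{
	int max = 0, remote = -1, multi = 0;

	assert(n > 0 && n <= MAX_NODES);
	for (int a = 0; a < n; a++)
		for (int b = 0; b < n; b++) {
			int v = d[a * n + b];

			if (v > max)
				max = v;
			if (a != b) {
				if (remote < 0)
					remote = v;
				else if (v != remote)
					multi = 1;
			}
		}
	if (!multi)
		return NUMA_DIRECT; /* also covers the single-node case */

	/* Find the first pair of nodes furthest removed from each other. */
	for (int a = 0; a < n; a++)
		for (int b = 0; b < n; b++) {
			if (d[a * n + b] < max)
				continue;
			/* Is there an intermediary node closer to both? */
			for (int c = 0; c < n; c++)
				if (d[a * n + c] < max && d[b * n + c] < max)
					return NUMA_GLUELESS_MESH;
			return NUMA_BACKPLANE;
		}
	return NUMA_DIRECT; /* not reached: some pair sits at max */
}
```

      A four-node ring (distances 20 and 30) classifies as a glueless mesh,
      while two pairs of nodes joined only at distance 40 classify as a
      backplane.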
    • sched/numa: Export info needed for NUMA balancing on complex topologies · 9942f79b
      Rik van Riel authored
      Export some information that is necessary to do placement of
      tasks on systems with multi-level NUMA topologies.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: mgorman@suse.de
      Cc: chegu_vinod@hp.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413530994-9732-2-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/dl: Fix preemption checks · f3a7e1a9
      Kirill Tkhai authored
      1) The switched_to_dl() check is wrong. We reschedule only
         if rq->curr is a deadline task, and we do not reschedule
         if it is a lower-priority task. But we must always
         preempt a task of another class.

      2) dl_task_timer():
         The policy does not change in case of priority inheritance:
         rt_mutex_setprio() changes prio, while the policy stays the same.

      So we lose some balancing logic in dl_task_timer() and
      switched_to_dl() when we check the policy instead of the priority.
      A boosted task may be rq->curr.

      (I didn't change switched_from_dl() because no check is necessary
      there at all.)

      I've looked at this place (switched_to_dl()) several times and even
      fixed this function before, but only noticed this now. I suppose some
      performance tests may work better after this.
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413909356.19914.128.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
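      The policy versus priority distinction behind this fix can be modelled
      in a few lines of user-space C. This is a simplified model, not kernel
      code: struct task and switched_to_dl_action() are stand-ins, while
      SCHED_DEADLINE (6) and the "deadline priorities are below MAX_DL_PRIO,
      which is 0" rule mirror Linux. A task boosted by priority inheritance
      keeps its old policy but runs with a deadline priority, so only a
      priority-based check treats it as a deadline task.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_NORMAL   0
#define SCHED_DEADLINE 6
#define MAX_DL_PRIO    0 /* deadline priorities are below this */

/* Hypothetical, simplified view of the two task fields involved. */
struct task {
	int policy; /* set by sched_setattr(); unchanged by PI boosting */
	int prio;   /* effective priority; rt_mutex_setprio() changes it */
};

static bool task_has_dl_policy(const struct task *p)
{
	return p->policy == SCHED_DEADLINE;
}

static bool dl_task(const struct task *p)
{
	return p->prio < MAX_DL_PRIO;
}

enum action { CHECK_PREEMPT_DL, RESCHED_CURR };

/*
 * A task has just switched to the deadline class while curr runs:
 * compare deadlines only if curr really runs with deadline priority,
 * otherwise curr belongs to a lower class and must always be preempted.
 */
static enum action switched_to_dl_action(const struct task *curr)
{
	return dl_task(curr) ? CHECK_PREEMPT_DL : RESCHED_CURR;
}
```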
    • sched: Update comments for CLONE_NEWNS · fcd964dd
      Chen Hanxiao authored
      Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/1412674147-8941-1-git-send-email-chenhanxiao@cn.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: stop the unbound recursion in preempt_schedule_context() · 009f60e2
      Oleg Nesterov authored
      preempt_schedule_context() does preempt_enable_notrace() at the end
      and this can call the same function again; exception_exit() is heavy
      and it is quite possible that need-resched is true again.
      
      1. Change this code to decrement preempt_count() and check
         need_resched() by hand.

      2. As Linus suggested, we can use the PREEMPT_ACTIVE bit and avoid
         the enable/disable dance around __schedule(). But in this case
         the function needs to move into sched/core.c.

      3. Cosmetic, but x86 forgets to declare this function. This doesn't
         really matter because it is only called by asm helpers, but it
         still makes sense to add the declaration to asm/preempt.h to match
         preempt_schedule().
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20141005202322.GB27962@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
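      Why a loop bounds the stack while re-entry does not can be shown with
      a toy user-space model. Everything here (pending, schedule_once(), and
      both function shapes) is hypothetical and only mimics the control flow
      described in point 1: instead of re-entering through
      preempt_enable_notrace(), the fixed shape keeps calling the scheduler
      for as long as need_resched() is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Model state: how many more reschedule requests will arrive. */
static int pending;
static int depth, max_depth;

static bool need_resched(void) { return pending > 0; }
static void schedule_once(void) { if (pending > 0) pending--; }

/* Old shape: re-enabling preemption re-enters the same function. */
static void preempt_schedule_recursive(void)
{
	depth++;
	if (depth > max_depth)
		max_depth = depth;
	schedule_once();
	if (need_resched()) /* models preempt_enable_notrace() */
		preempt_schedule_recursive();
	depth--;
}

/* New shape: loop while need_resched() is set; depth stays at one. */
static void preempt_schedule_loop(void)
{
	depth++;
	if (depth > max_depth)
		max_depth = depth;
	do {
		schedule_once();
	} while (need_resched());
	depth--;
}
```

      With five back-to-back reschedule requests, the recursive shape
      reaches a call depth of five, while the loop stays at depth one.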
    • sched/fair: Fix division by zero sysctl_numa_balancing_scan_size · 64192658
      Kirill Tkhai authored
      The file /proc/sys/kernel/numa_balancing_scan_size_mb allows writing zero.

      This bash command reproduces the problem:
      
      $ while :; do echo 0 > /proc/sys/kernel/numa_balancing_scan_size_mb; \
      	   echo 256 > /proc/sys/kernel/numa_balancing_scan_size_mb; done
      
      	divide error: 0000 [#1] SMP
      	Modules linked in:
      	CPU: 0 PID: 24112 Comm: bash Not tainted 3.17.0+ #8
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      	task: ffff88013c852600 ti: ffff880037a68000 task.ti: ffff880037a68000
      	RIP: 0010:[<ffffffff81074191>]  [<ffffffff81074191>] task_scan_min+0x21/0x50
      	RSP: 0000:ffff880037a6bce0  EFLAGS: 00010246
      	RAX: 0000000000000a00 RBX: 00000000000003e8 RCX: 0000000000000000
      	RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88013c852600
      	RBP: ffff880037a6bcf0 R08: 0000000000000001 R09: 0000000000015c90
      	R10: ffff880239bf6c00 R11: 0000000000000016 R12: 0000000000003fff
      	R13: ffff88013c852600 R14: ffffea0008d1b000 R15: 0000000000000003
      	FS:  00007f12bb048700(0000) GS:ffff88007da00000(0000) knlGS:0000000000000000
      	CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      	CR2: 0000000001505678 CR3: 0000000234770000 CR4: 00000000000006f0
      	Stack:
      	 ffff88013c852600 0000000000003fff ffff880037a6bd18 ffffffff810741d1
      	 ffff88013c852600 0000000000003fff 000000000002bfff ffff880037a6bda8
      	 ffffffff81077ef7 ffffea0008a56d40 0000000000000001 0000000000000001
      	Call Trace:
      	 [<ffffffff810741d1>] task_scan_max+0x11/0x40
      	 [<ffffffff81077ef7>] task_numa_fault+0x1f7/0xae0
      	 [<ffffffff8115a896>] ? migrate_misplaced_page+0x276/0x300
      	 [<ffffffff81134a4d>] handle_mm_fault+0x62d/0xba0
      	 [<ffffffff8103e2f1>] __do_page_fault+0x191/0x510
      	 [<ffffffff81030122>] ? native_smp_send_reschedule+0x42/0x60
      	 [<ffffffff8106dc00>] ? check_preempt_curr+0x80/0xa0
      	 [<ffffffff8107092c>] ? wake_up_new_task+0x11c/0x1a0
      	 [<ffffffff8104887d>] ? do_fork+0x14d/0x340
      	 [<ffffffff811799bb>] ? get_unused_fd_flags+0x2b/0x30
      	 [<ffffffff811799df>] ? __fd_install+0x1f/0x60
      	 [<ffffffff8103e67c>] do_page_fault+0xc/0x10
      	 [<ffffffff8150d322>] page_fault+0x22/0x30
      	RIP  [<ffffffff81074191>] task_scan_min+0x21/0x50
      	RSP <ffff880037a6bce0>
      	---[ end trace 9a826d16936c04de ]---
      
      Also fix a race in task_scan_min() (whether it triggers depends on
      compiler behaviour).
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Link: http://lkml.kernel.org/r/1413455977.24793.78.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
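      Both problems can be sketched in user-space C under stated
      assumptions: scan_size_mb is a hypothetical stand-in for the sysctl,
      volatile plays the role of the kernel's ACCESS_ONCE() so the value is
      loaded exactly once, and MAX_SCAN_WINDOW = 2560 is assumed. Reading the
      tunable into a local means the zero check and the division cannot see
      different values. Treating 0 as 1 is only one option for this sketch;
      rejecting 0 when the sysctl is written is another.

```c
#include <assert.h>

#define MAX_SCAN_WINDOW 2560 /* assumed cap, in MB per second */

/* Hypothetical stand-in for the tunable; another CPU may rewrite it. */
static volatile unsigned int scan_size_mb = 256;

/* Lower bound for the scan period, in ms, derived from the window size. */
static unsigned int scan_period_floor_ms(void)
{
	/* Single load: the check and the division see the same value. */
	unsigned int scan_size = scan_size_mb;
	unsigned int windows = 1;

	if (scan_size == 0)
		scan_size = 1; /* assumption: treat 0 as the smallest size */

	if (scan_size < MAX_SCAN_WINDOW)
		windows = MAX_SCAN_WINDOW / scan_size;

	return 1000 / windows;
}
```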
    • sched/fair: Care divide error in update_task_scan_period() · 2847c90e
      Yasuaki Ishimatsu authored
      While offlining a node by hot-removing memory, the following divide
      error occurs:
      
        divide error: 0000 [#1] SMP
        [...]
        Call Trace:
         [...] handle_mm_fault
         [...] ? try_to_wake_up
         [...] ? wake_up_state
         [...] __do_page_fault
         [...] ? do_futex
         [...] ? put_prev_entity
         [...] ? __switch_to
         [...] do_page_fault
         [...] page_fault
        [...]
        RIP  [<ffffffff810a7081>] task_numa_fault
         RSP <ffff88084eb2bcb0>
      
      The issue occurs as follows:
        1. When a page fault occurs and the page is allocated from node 1,
           task_struct->numa_faults_buffer_memory[] of node 1 is
           incremented and p->numa_faults_locality[] is also incremented
           as follows:
      
           o numa_faults_buffer_memory[]       o numa_faults_locality[]
                    NR_NUMA_HINT_FAULT_TYPES
                   |      0     |     1     |
           ----------------------------------  ----------------------
            node 0 |      0     |     0     |   remote |      0     |
            node 1 |      0     |     1     |    local |      1     |
           ----------------------------------  ----------------------
      
        2. Node 1 is offlined by hot-removing memory.

        3. When a page fault occurs, fault_types[] is calculated from
           p->numa_faults_buffer_memory[] of all online nodes in
           task_numa_placement(). But node 1 went offline in step 2, so
           fault_types[] is calculated from p->numa_faults_buffer_memory[]
           of node 0 only, and both elements of fault_types[] are set to 0.

        4. The values (0) of fault_types[] are passed to
           update_task_scan_period().

        5. numa_faults_locality[1] is 1, so the following division is
           performed:
      
              static void update_task_scan_period(struct task_struct *p,
                                      unsigned long shared, unsigned long private){
              ...
                      ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared));
              }
      
        6. But private and shared are both 0, so a divide error occurs
           here.

      The divide error is rare because its trigger is a node going offline.
      This patch always increments the denominator to avoid the divide
      error.
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/54475703.8000505@jp.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
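      The fix described above, always incrementing the denominator, can be
      sketched as follows. NUMA_PERIOD_SLOTS = 10 is an assumption here,
      private_ratio() is a hypothetical wrapper around the expression quoted
      in step 5, and DIV_ROUND_UP is defined to round up the same way the
      kernel macro does.

```c
#include <assert.h>

#define NUMA_PERIOD_SLOTS 10 /* assumed slot count for this sketch */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Share of private faults in NUMA_PERIOD_SLOTS units. The "+ 1" keeps
 * the denominator non-zero even when both counts are 0, as happens
 * after the faulting node has gone offline.
 */
static unsigned long private_ratio(unsigned long shared, unsigned long private)
{
	return DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, private + shared + 1);
}
```

      The + 1 slightly biases the ratio downwards: 9 private faults and no
      shared ones give 9 rather than 10, a small price for never dividing by
      zero.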
    • sched/numa: Fix unsafe get_task_struct() in task_numa_assign() · 1effd9f1
      Kirill Tkhai authored
      Unlocked access to dst_rq->curr in task_numa_compare() is racy.
      If the curr task is exiting, this may cause a use-after-free:
      
      task_numa_compare()                    do_exit()
          ...                                        current->flags |= PF_EXITING;
          ...                                    release_task()
          ...                                        ~~delayed_put_task_struct()~~
          ...                                    schedule()
          rcu_read_lock()                        ...
          cur = ACCESS_ONCE(dst_rq->curr)        ...
              ...                                rq->curr = next;
              ...                                    context_switch()
              ...                                        finish_task_switch()
              ...                                            put_task_struct()
              ...                                                __put_task_struct()
              ...                                                    free_task_struct()
              task_numa_assign()                                     ...
                  get_task_struct()                                  ...
      
      As noted by Oleg:
      
        <<The lockless get_task_struct(tsk) is only safe if tsk == current
          and didn't pass exit_notify(), or if this tsk was found on a rcu
          protected list (say, for_each_process() or find_task_by_vpid()).
          IOW, it is only safe if release_task() was not called before we
          take rcu_read_lock(), in this case we can rely on the fact that
          delayed_put_pid() can not drop the (potentially) last reference
          until rcu_read_unlock().
      
          And as Kirill pointed out task_numa_compare()->task_numa_assign()
          path does get_task_struct(dst_rq->curr) and this is not safe. The
          task_struct itself can't go away, but rcu_read_lock() can't save
          us from the final put_task_struct() in finish_task_switch(); this
          reference goes away without rcu gp>>
      
      This patch adds a simple check of the PF_EXITING flag. If it is not
      set, the call_rcu() of the delayed_put_task_struct() callback is
      guaranteed not to have happened yet, so we can safely do
      get_task_struct() in task_numa_assign().

      Holding dst_rq->lock protects against concurrency with the last
      schedule(). Without it, cur's memory may be reused or unmapped.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1413962231.19914.130.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
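      The pattern behind this fix (look at curr under the runqueue lock and
      refuse to pin a task that has PF_EXITING set) is modelled below with a
      pthread mutex. It is a simplification: the model takes the reference
      while still holding the lock instead of relying on RCU afterwards, and
      struct task, struct runqueue, and get_curr_if_alive() are
      hypothetical. PF_EXITING mirrors the kernel's flag value but is used
      here only as a flag bit.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define PF_EXITING 0x00000004 /* "getting shut down" flag bit */

/* Hypothetical, simplified task and runqueue structures. */
struct task {
	unsigned int flags;
	int usage; /* reference count */
};

struct runqueue {
	pthread_mutex_t lock; /* stands in for dst_rq->lock */
	struct task *curr;
};

/*
 * Return rq->curr with an extra reference, or NULL if it is exiting.
 * The lock keeps curr from passing its final schedule() while we look
 * at it; an exiting task may already be on its way to being freed, so
 * it is never pinned.
 */
static struct task *get_curr_if_alive(struct runqueue *rq)
{
	struct task *cur;

	pthread_mutex_lock(&rq->lock);
	cur = rq->curr;
	if (cur && (cur->flags & PF_EXITING))
		cur = NULL;
	if (cur)
		cur->usage++;
	pthread_mutex_unlock(&rq->lock);
	return cur;
}
```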
    • sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer() · aee38ea9
      Juri Lelli authored
      dl_task_timer() is racy against several paths. Daniel noticed that
      the replenishment timer may experience a race condition against an
      enqueue_dl_entity() called from rt_mutex_setprio(). In his own
      words:

       rt_mutex_setprio() resets p->dl.dl_throttled. So the pattern is:
       start_dl_timer() throttled = 1, rt_mutex_setprio() throttled = 0,
       sched_switch() -> enqueue_task(), dl_task_timer-> enqueue_task()
       throttled is 0
      
      => BUG_ON(on_dl_rq(dl_se)) fires as the scheduling entity is already
      enqueued on the -deadline runqueue.
      
      As we do for the other races, we just bail out in the replenishment
      timer code.
      Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: vincent@legout.info
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Michael Trimarchi <michael@amarulasolutions.com>
      Cc: Fabio Checconi <fchecconi@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414142198-18552-5-git-send-email-juri.lelli@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
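      The interleaving Daniel describes can be replayed in a small state
      model. The sketch is illustrative only: struct dl_entity,
      setprio_path(), and dl_timer_fire() are hypothetical, the assert() in
      enqueue() stands in for the BUG_ON(), and the timer bails out on the
      throttled flag, which is one plausible reading of "bail out" rather
      than the exact upstream condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified -deadline scheduling entity. */
struct dl_entity {
	bool throttled; /* budget exhausted, waiting for the timer */
	bool on_rq;     /* queued on the -deadline runqueue */
	int enqueues;   /* how many times it has been enqueued */
};

static void enqueue(struct dl_entity *se)
{
	assert(!se->on_rq); /* models BUG_ON(on_dl_rq(dl_se)) */
	se->on_rq = true;
	se->enqueues++;
}

/* Models rt_mutex_setprio(): clears throttled and enqueues the entity. */
static void setprio_path(struct dl_entity *se)
{
	se->throttled = false;
	if (!se->on_rq)
		enqueue(se);
}

/*
 * Replenishment timer: if someone else already unthrottled the entity,
 * bail out instead of enqueueing it a second time. Returns true if the
 * timer performed the enqueue itself.
 */
static bool dl_timer_fire(struct dl_entity *se)
{
	if (!se->throttled)
		return false;
	se->throttled = false;
	enqueue(se);
	return true;
}
```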
    • sched/deadline: Don't replenish from a !SCHED_DEADLINE entity · 64be6f1f
      Juri Lelli authored
      In the deboost path, right after the dl_boosted flag has been
      reset, we can currently end up replenishing using -deadline
      parameters of a !SCHED_DEADLINE entity. This of course causes
      a bug, as those parameters are empty.
      
      In the case depicted above it is safe to simply bail out, as
      the deboosted task is going back to its original scheduling
      class anyway.
      Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: Juri Lelli <juri.lelli@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: vincent@legout.info
      Cc: Dario Faggioli <raistlin@linux.it>
      Cc: Michael Trimarchi <michael@amarulasolutions.com>
      Cc: Fabio Checconi <fchecconi@gmail.com>
      Link: http://lkml.kernel.org/r/1414142198-18552-4-git-send-email-juri.lelli@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
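      A minimal model of the bail-out: a replenishment only uses -deadline
      parameters when the task's normal (unboosted) priority is a deadline
      priority, since a task being deboosted back to its own class has
      empty parameters. The struct task layout and replenish() are
      hypothetical; dl_prio() mirrors the kernel's "priority below
      MAX_DL_PRIO, which is 0" rule.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DL_PRIO 0

/* Hypothetical, simplified task: priority plus -deadline budget. */
struct task {
	int normal_prio;               /* priority without PI boosting */
	unsigned long long dl_runtime; /* 0 for a !SCHED_DEADLINE task */
	unsigned long long runtime;    /* current budget */
};

static bool dl_prio(int prio)
{
	return prio < MAX_DL_PRIO;
}

/* Refill the budget, but only from real -deadline parameters.
 * Returns true if a replenishment happened. */
static bool replenish(struct task *p)
{
	if (!dl_prio(p->normal_prio))
		return false; /* deboosted: going back to its own class */
	assert(p->dl_runtime > 0);
	p->runtime = p->dl_runtime;
	return true;
}
```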
    • sched: Fix race between task_group and sched_task_group · eeb61e53
      Kirill Tkhai authored
      The race may happen when somebody changes the task_group of a forking
      task. After dup_task_struct() the child's cgroup is the same as the
      parent's (it is just a memory copy), and so are its cfs_rq and rt_rq.

      But if the parent's task_group changes before cgroup_post_fork() is
      called, the change is not reflected in the child: the child's cfs_rq
      and rt_rq remain the same, while its task_group changes in
      cgroup_post_fork().

      To fix this, we introduce a fork() method that calls sched_move_task()
      directly. This function updates sched_task_group appropriately (its
      logic also works for freshly created tasks, so nothing special needs to
      be introduced; we can just use it).

      Possibly, this solves Burke Libbey's problem: https://lkml.org/lkml/2014/10/24/456
      Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1414405105.19914.169.camel@tkhai
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 26 Oct, 2014 4 commits
    • Linux 3.18-rc2 · cac7f242
      Linus Torvalds authored
    • Merge tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 88e23761
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "Another week, another small batch of fixes.
      
        Most of these make zynq, socfpga and sunxi platforms work a bit
        better:
      
         - due to new requirements for regulators, DWMMC on socfpga broke past
           v3.17
         - SMP spinup fix for socfpga
         - a few DT fixes for zynq
         - another option (FIXED_REGULATOR) for sunxi is needed that used to
           be selected by other options but no longer is.
         - a couple of small DT fixes for at91
         - ...and a couple for i.MX"
      
      * tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: dts: imx28-evk: Let i2c0 run at 100kHz
        ARM: i.MX6: Fix "emi" clock name typo
        ARM: multi_v7_defconfig: enable CONFIG_MMC_DW_ROCKCHIP
        ARM: sunxi_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE
        ARM: dts: socfpga: Add a 3.3V fixed regulator node
        ARM: dts: socfpga: Fix SD card detect
        ARM: dts: socfpga: rename gpio nodes
        ARM: at91/dt: sam9263: fix PLLB frequencies
        power: reset: at91-reset: fix power down register
        MAINTAINERS: add atmel ssc driver maintainer entry
        arm: socfpga: fix fetching cpu1start_addr for SMP
        ARM: zynq: DT: trivial: Fix mc node
        ARM: zynq: DT: Add cadence watchdog node
        ARM: zynq: DT: Add missing reference for memory-controller
        ARM: zynq: DT: Add missing reference for ADC
        ARM: zynq: DT: Add missing address for L2 pl310
        ARM: zynq: DT: Remove 222 MHz OPP
        ARM: zynq: DT: Fix GEM register area size
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d1e14f1d
      Linus Torvalds authored
      Pull vfs updates from Al Viro:
       "overlayfs merge + leak fix for d_splice_alias() failure exits"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        overlayfs: embed middle into overlay_readdir_data
        overlayfs: embed root into overlay_readdir_data
        overlayfs: make ovl_cache_entry->name an array instead of pointer
        overlayfs: don't hold ->i_mutex over opening the real directory
        fix inode leaks on d_splice_alias() failure exits
        fs: limit filesystem stacking depth
        overlay: overlay filesystem documentation
        overlayfs: implement show_options
        overlayfs: add statfs support
        overlay filesystem
        shmem: support RENAME_WHITEOUT
        ext4: support RENAME_WHITEOUT
        vfs: add RENAME_WHITEOUT
        vfs: add whiteout support
        vfs: export check_sticky()
        vfs: introduce clone_private_mount()
        vfs: export __inode_permission() to modules
        vfs: export do_splice_direct() to modules
        vfs: add i_op->dentry_open()
    • Merge tag 'imx-fixes-3.18' of... · efc176a8
      Olof Johansson authored
      Merge tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
      
      Merge "ARM: imx: fixes for 3.18" from Shawn Guo:
      
      The i.MX fixes for 3.18:
       - Revert one patch which increases I2C bus frequency on imx28-evk
       - Fix a typo on imx6q EIM clock name
      
      * tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
        ARM: dts: imx28-evk: Let i2c0 run at 100kHz
        ARM: i.MX6: Fix "emi" clock name typo
      Signed-off-by: Olof Johansson <olof@lixom.net>
  3. 25 Oct, 2014 6 commits
  4. 24 Oct, 2014 17 commits
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 2cc91884
      Linus Torvalds authored
      Pull MIPS fixes from Ralf Baechle:
       "This is the first round of fixes and tying up loose ends for MIPS.
      
         - plenty of fixes for build errors in specific obscure configurations
         - remove redundant code on the Lantiq platform
         - removal of a useless SEAD I2C driver that was causing a build issue
         - fix an earlier TLB exception handler fix to also work on Octeon.
         - fix ISA level dependencies in FPU emulator's instruction decoding.
         - don't hardcode kernel command line in Octeon software emulator.
         - fix an earlier fix for the Loongson 2 clock setting"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: SEAD3: Fix I2C device registration.
        MIPS: SEAD3: Nuke PIC32 I2C driver.
        MIPS: ftrace: Fix a microMIPS build problem
        MIPS: MSP71xx: Fix build error
        MIPS: Malta: Do not build the malta-amon.c file if CMP is not enabled
        MIPS: Prevent compiler warning from cop2_{save,restore}
        MIPS: Kconfig: Add missing MIPS_CPS dependencies to PM and cpuidle
        MIPS: idle: Remove leftover __pastwait symbol and its references
        MIPS: Sibyte: Include the swarm subdir to the sb1250 LittleSur builds
        MIPS: ptrace.h: Add a missing include
        MIPS: ath79: Fix compilation error when CONFIG_PCI is disabled
        MIPS: MSP71xx: Remove compilation error when CONFIG_MIPS_MT is present
        MIPS: Octeon: Remove special case for simulator command line.
        MIPS: tlbex: Properly fix HUGE TLB Refill exception handler
        MIPS: loongson2_cpufreq: Fix CPU clock rate setting mismerge
        pci: pci-lantiq: remove duplicate check on resource
        MIPS: Lasat: Add missing CONFIG_PROC_FS dependency to PICVUE_PROC
        MIPS: cp1emu: Fix ISA restrictions for cop1x_op instructions
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · cdc63a05
      Linus Torvalds authored
      Pull arm64 fixes from Catalin Marinas:
      
       - enable 48-bit VA space now that KVM has been fixed, together with a
         couple of fixes for pgd allocation alignment and initial memblock
         current_limit.  There is still a dependency on !ARM_SMMU which needs
         to be updated as it uses the page table manipulation macros of the
         host kernel
       - eBPF fixes following changes/conflicts during the merge window
       - Compat types affecting compat_elf_prpsinfo
       - Compilation error on UP builds
       - ASLR fix when /proc/sys/kernel/randomize_va_space == 0
       - DT definitions for CLCD support on ARMv8 model platform
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: Fix memblock current_limit with 64K pages and 48-bit VA
        arm64: ASLR: Don't randomise text when randomise_va_space == 0
        arm64: vexpress: Add CLCD support to the ARMv8 model platform
        arm64: Fix compilation error on UP builds
        Documentation/arm64/memory.txt: fix typo
        net: bpf: arm64: minor fix of type in jited
        arm64: bpf: add 'load 64-bit immediate' instruction
        arm64: bpf: add 'shift by register' instructions
        net: bpf: arm64: address randomize and write protect JIT code
        arm64: mm: Correct fixmap pagetable types
        arm64: compat: fix compat types affecting struct compat_elf_prpsinfo
        arm64: Align less than PAGE_SIZE pgds naturally
        arm64: Allow 48-bits VA space without ARM_SMMU
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 83da00fb
      Linus Torvalds authored
      Pull two sparc fixes from David Miller:
      
       1) Fix boots with gcc-4.9 compiled sparc64 kernels.
      
       2) Add missing __get_user_pages_fast() on sparc64 to fix hangs on
          futexes used in transparent hugepage areas.
      
          It's really idiotic to have a weak symbolled fallback that just
          returns zero, and causes this kind of bug.  There should be no
          backup implementation and the link should fail if the architecture
          fails to provide __get_user_pages_fast() and supports transparent
          hugepages.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Implement __get_user_pages_fast().
        sparc64: Fix register corruption in top-most kernel stack frame during boot.
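      The weak-symbol pitfall described in the sparc pull can be reproduced
      in plain C with GCC or Clang on an ELF target. fast_pin_pages() is a
      hypothetical name, not the kernel's __get_user_pages_fast(): because
      the generic definition is weak, a build that lacks an
      architecture-specific strong definition still links, and every caller
      quietly receives 0.

```c
#include <assert.h>

/*
 * Hypothetical generic fallback, marked weak so that a strong definition
 * of the same symbol elsewhere replaces it at link time. When nothing
 * overrides it, the link still succeeds and callers always get 0
 * ("no pages pinned").
 */
__attribute__((weak)) int fast_pin_pages(unsigned long start, int nr_pages)
{
	assert(nr_pages >= 0);
	(void)start;
	return 0;
}
```

      Removing the weak generic definition altogether would turn a missing
      architecture implementation into an undefined-reference link error,
      which is what the pull message argues for.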
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 96971e9a
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "This is a pretty large update.  I think it is roughly as big as what I
        usually had for the _whole_ rc period.
      
        There are a few bad bugs where the guest can OOPS or crash the host.
        We have also started looking at attack models for nested
        virtualization; bugs that usually result in the guest ring 0 crashing
        itself become more worrisome if you have nested virtualization,
        because the nested guest might bring down the non-nested guest as
        well.  For current uses of nested virtualization these do not really
        have a security impact, but you never know and bugs are bugs
        nevertheless.
      
        A lot of these bugs are in 3.17 too, resulting in a large number of
        stable@ Ccs.  I checked that all the patches apply there with no
        conflicts"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: vfio: fix unregister kvm_device_ops of vfio
        KVM: x86: Wrong assertion on paging_tmpl.h
        kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
        KVM: x86: PREFETCH and HINT_NOP should have SrcMem flag
        KVM: x86: Emulator does not decode clflush well
        KVM: emulate: avoid accessing NULL ctxt->memopp
        KVM: x86: Decoding guest instructions which cross page boundary may fail
        kvm: x86: don't kill guest on unknown exit reason
        kvm: vmx: handle invvpid vm exit gracefully
        KVM: x86: Handle errors when RIP is set during far jumps
        KVM: x86: Emulator fixes for eip canonical checks on near branches
        KVM: x86: Fix wrong masking on relative jump/call
        KVM: x86: Improve thread safety in pit
        KVM: x86: Prevent host from panicking on shared MSR writes.
        KVM: x86: Check non-canonical addresses upon WRMSR
      96971e9a
    • Linus Torvalds's avatar
      Merge tag 'stable/for-linus-3.18-b-rc1-tag' of... · 20ca57cd
      Linus Torvalds authored
      Merge tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
      
      Pull xen bug fixes from David Vrabel:
      
       - Fix regression in xen_clocksource_read() which caused all Xen guests
         to crash early in boot.
       - Several fixes for super rare race conditions in the p2m.
       - Assorted other minor fixes.
      
      * tag 'stable/for-linus-3.18-b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/pci: Allocate memory for physdev_pci_device_add's optarr
        x86/xen: panic on bad Xen-provided memory map
        x86/xen: Fix incorrect per_cpu accessor in xen_clocksource_read()
        x86/xen: avoid race in p2m handling
        x86/xen: delay construction of mfn_list_list
        x86/xen: avoid writing to freed memory after race in p2m handling
        xen/balloon: Don't continue ballooning when BP_ECANCELED is encountered
      20ca57cd
    • Linus Torvalds's avatar
      Merge tag 'sound-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c6d13403
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Here is a chunk of small fixes since rc1: two PCM core fixes, one for
        a long-standing lockdep annoyance and another for an ARM64 mmap
        issue.
      
        The rest are an HD-audio HDMI hotplug notification fix, a fix for
        missing NULL termination in Realtek codec quirks and a few new
        device/codec-specific quirks as usual"
      
      * tag 'sound-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Add missing terminating entry to SND_HDA_PIN_QUIRK macro
        ALSA: pcm: Fix false lockdep warnings
        ALSA: hda - Fix inverted LED gpio setup for Lenovo Ideapad
        ALSA: hda - hdmi: Fix missing ELD change event on plug/unplug
        ALSA: usb-audio: Add support for Steinberg UR22 USB interface
        ALSA: ALC283 codec - Avoid pop noise on headphones during suspend/resume
        ALSA: pcm: use the same dma mmap codepath both for arm and arm64
      c6d13403
    • Linus Torvalds's avatar
      Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random · 14d4cc08
      Linus Torvalds authored
      Pull /dev/random updates from Ted Ts'o:
       "This adds a memzero_explicit() call which is guaranteed not to be
        optimized away by GCC.  This is important when we are wiping
        cryptographically sensitive material"
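A minimal userspace sketch of the idea behind memzero_explicit(), assuming a GCC- or Clang-compatible compiler. A plain memset() on a buffer that is never read again is a dead store that the optimizer may delete. The sketch therefore follows the memset() with an empty inline-asm statement that takes the pointer as an input and clobbers memory, so the compiler must assume the zeroed bytes can be observed. The names memzero_explicit_sketch() and use_key_then_wipe() are hypothetical, and the kernel's own version may use a different barrier macro than this exact asm.

```c
#include <stddef.h>
#include <string.h>

/* Clear the buffer, then hide the pointer behind an asm barrier so the
 * memset() cannot be removed as a dead store (GCC/Clang extension). */
static void memzero_explicit_sketch(void *s, size_t count)
{
    memset(s, 0, count);
    __asm__ __volatile__("" : : "r"(s) : "memory");
}

/* Hypothetical caller: use a key, then wipe it before the stack frame
 * goes away, which is where a plain memset() is most at risk. */
static int use_key_then_wipe(unsigned char *out_first_byte)
{
    unsigned char key[32];

    for (size_t i = 0; i < sizeof(key); i++)
        key[i] = (unsigned char)(i * 7 + 1);
    *out_first_byte = key[0];              /* "use" the key */
    memzero_explicit_sketch(key, sizeof(key));
    return 0;
}
```

Callers would wipe secrets with it just before the buffer goes out of scope or is freed, which is exactly the point where an ordinary memset() is most likely to be optimized away.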
      
      * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
        crypto: memzero_explicit - make sure to clear out sensitive data
        random: add and use memzero_explicit() for clearing data
      14d4cc08
    • Linus Torvalds's avatar
      Merge tag 'pm+acpi-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1c45d9a9
      Linus Torvalds authored
      Pull ACPI and power management updates from Rafael Wysocki:
       "This is material that didn't make it to my 3.18-rc1 pull request for
        various reasons, mostly related to timing and travel (LinuxCon EU /
        LPC) plus a couple of fixes for recent bugs.
      
        The only really new thing here is the PM QoS class for memory
        bandwidth, but it is simple enough and users of it will be added in
        the next cycle.  One major change in behavior is that platform devices
        enumerated by ACPI will use 32-bit DMA mask by default.  Also included
        is an ACPICA update to a new upstream release, but that's mostly
        cleanups, changes in tools and similar.  The rest is fixes and
        cleanups mostly.
      
        Specifics:
      
         - Fix for a recent PCI power management change that overlooked the
           fact that some IRQ chips might not be able to configure PCIe PME
           for system wakeup from Lucas Stach.
      
         - Fix for a bug introduced in 3.17 where acpi_device_wakeup() is
           called with a wrong ordering of arguments from Zhang Rui.
      
         - A bunch of intel_pstate driver fixes (all -stable candidates) from
           Dirk Brandewie, Gabriele Mazzotta and Pali Rohár.
      
         - Fixes for a rather long-standing problem with the OOM killer and
           the freezer: frozen processes killed by the OOM killer do not
           actually release any memory until they are thawed, so OOM-killing
           them is rather pointless.  A couple of cleanups go on top (Michal
           Hocko, Cong Wang, Rafael J Wysocki).
      
         - ACPICA update to upstream release 20140926, including mostly
           cleanups reducing differences between the upstream ACPICA and the
           kernel code, tools changes (acpidump, acpiexec) and support for the
           _DDN object (Bob Moore, Lv Zheng).
      
         - New PM QoS class for memory bandwidth from Tomeu Vizoso.
      
         - Default 32-bit DMA mask for platform devices enumerated by ACPI
           (this change is mostly needed for the development of some drivers
           targeted at 3.19) from Heikki Krogerus.
      
         - ACPI EC driver cleanups, mostly related to debugging, from Lv
           Zheng.
      
         - cpufreq-dt driver updates from Thomas Petazzoni.
      
         - powernv cpuidle driver update from Preeti U Murthy"
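For the default 32-bit DMA mask item above: a DMA mask is a bitmask of the bus addresses a device can generate. The macro below is written in the style of the kernel's DMA_BIT_MASK(), special-casing 64 because shifting a 64-bit value by 64 is undefined. The range check dma_range_ok() is a hypothetical helper added only to show what a 32-bit mask means in practice: nothing at or above 4 GiB is reachable.

```c
#include <stdint.h>

/* n low bits set; 64 is special-cased to avoid an undefined shift. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical check: can a device with this mask reach [addr, addr+size)? */
static int dma_range_ok(uint64_t mask, uint64_t addr, uint64_t size)
{
    if (size == 0)
        return 1;
    uint64_t last = addr + size - 1;
    if (last < addr)            /* the range wrapped around */
        return 0;
    return last <= mask;
}
```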
      
      * tag 'pm+acpi-3.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits)
        intel_pstate: Correct BYT VID values.
        intel_pstate: Fix BYT frequency reporting
        intel_pstate: Don't lose sysfs settings during cpu offline
        cpufreq: intel_pstate: Reflect current no_turbo state correctly
        cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers
        cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy
        PCI / PM: handle failure to enable wakeup on PCIe PME
        ACPI: invoke acpi_device_wakeup() with correct parameters
        PM / freezer: Clean up code after recent fixes
        PM: convert do_each_thread to for_each_process_thread
        OOM, PM: OOM killed task shouldn't escape PM suspend
        freezer: remove obsolete comments in __thaw_task()
        freezer: Do not freeze tasks killed by OOM killer
        ACPI / platform: provide default DMA mask
        cpuidle: powernv: Populate cpuidle state details by querying the device-tree
        cpufreq: cpufreq-dt: adjust message related to regulators
        cpufreq: cpufreq-dt: extend with platform_data
        cpufreq: allow driver-specific data
        ACPI / EC: Cleanup coding style.
        ACPI / EC: Refine event/query debugging messages.
        ...
      1c45d9a9
    • Linus Torvalds's avatar
      Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 8264fce6
      Linus Torvalds authored
      Pull thermal management updates from Zhang Rui:
       "Sorry that I missed the merge window: a bug was found at the last
        minute, and I had to fix it and wait for the code to be tested in the
        linux-next tree for a few days.  The buggy patch has now been dropped
        entirely from my next branch, so I hope these changes can still be
        merged in 3.18-rc2, as most of them are platform thermal driver
        changes.
      
        Specifics:
      
         - introduce ACPI INT340X thermal drivers.
      
           Newer laptops and tablets may have thermal sensors and other
           devices with thermal control capabilities that are exposed for the
           OS to use via the ACPI INT340x device objects.  Several drivers are
           introduced to expose the temperature information and cooling
           ability from these objects to user-space via the normal thermal
           framework.
      
           From: Lu Aaron, Lan Tianyu, Jacob Pan and Zhang Rui.
      
         - introduce a new thermal governor, which just uses a hysteresis to
           switch a cooling device abruptly on or off.  This governor can be
           used to control certain fan devices that cannot be throttled but
           only switched on or off.  From: Peter Feuerer.
      
         - introduce support for some new thermal interrupt functions on
           i.MX6SX, in IMX thermal driver.  From: Anson, Huang.
      
         - introduce tracing support on thermal framework.  From: Punit
           Agrawal.
      
         - small fixes in OF thermal and thermal step_wise governor"
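A sketch of the hysteresis control that the bang-bang governor item describes, under assumed semantics: the fan switches fully on at the trip temperature and switches off only once the temperature has fallen by the hysteresis width. struct bang_bang_state and bang_bang_update() are hypothetical names, not the governor's actual code, and millidegree Celsius units are an assumption.

```c
#include <stdbool.h>

/* Hypothetical state for one on/off cooling device (millidegrees C). */
struct bang_bang_state {
    int trip_temp;   /* switch-on threshold */
    int hyst;        /* hysteresis width */
    bool fan_on;     /* current cooling-device state */
};

static bool bang_bang_update(struct bang_bang_state *s, int temp)
{
    if (temp >= s->trip_temp)
        s->fan_on = true;                    /* hot: switch on */
    else if (temp <= s->trip_temp - s->hyst)
        s->fan_on = false;                   /* cooled enough: switch off */
    /* otherwise inside the hysteresis band: keep the current state */
    return s->fan_on;
}
```

The band between trip_temp - hyst and trip_temp is what keeps an on/off-only fan from toggling on every small temperature fluctuation.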
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
        Thermal: int340x thermal: select ACPI fan driver
        Thermal: int3400_thermal: use acpi_thermal_rel parsing APIs
        Thermal: int340x_thermal: expose acpi thermal relationship tables
        Thermal: introduce int3403 thermal driver
        Thermal: introduce INT3402 thermal driver
        Thermal: move the KELVIN_TO_MILLICELSIUS macro to thermal.h
        ACPI / Fan: support INT3404 thermal device
        ACPI / Fan: add ACPI 4.0 style fan support
        ACPI / fan: convert to platform driver
        ACPI / fan: use acpi_device_xxx_power instead of acpi_bus equivelant
        ACPI / fan: remove no need check for device pointer
        ACPI / fan: remove unused macro
        Thermal: int3400 thermal: register to thermal framework
        Thermal: int3400 thermal: add capability to detect supporting UUIDs
        Thermal: introduce int3400 thermal driver
        ACPI: add ACPI_TYPE_LOCAL_REFERENCE support to acpi_extract_package()
        ACPI: make acpi_create_platform_device() an external API
        thermal: step_wise: fix: Prevent from binary overflow when trend is dropping
        ACPI: introduce ACPI int340x thermal scan handler
        thermal: Added Bang-bang thermal governor
        ...
      8264fce6
    • Catalin Marinas's avatar
      arm64: Fix memblock current_limit with 64K pages and 48-bit VA · 3dec0fe4
      Catalin Marinas authored
      With 48-bit VA space, the 64K page configuration uses 3 levels instead
      of 2 and PUD_SIZE != PMD_SIZE. Since with 64K pages we only cover
      PMD_SIZE with the initial swapper_pg_dir populated in head.S, the
      memblock current_limit needs to be set accordingly in map_mem() to avoid
      allocating unmapped memory. The memblock current_limit is progressively
      increased as more blocks are mapped.
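Worked arithmetic for the sizes this commit refers to, as a sketch: each arm64 translation table is one page of 8-byte descriptors, so it resolves PAGE_SHIFT - 3 address bits. With 64K pages (PAGE_SHIFT = 16) that is 13 bits per level, so PMD_SIZE is 2^29 = 512 MB. With 2 levels PUD_SIZE equals PMD_SIZE, while with 3 levels (48-bit VA) PUD_SIZE is 2^42 = 4 TB. The struct va_layout and layout_for() helper are hypothetical and only model how unused levels are folded.

```c
/* Hypothetical model of the PMD/PUD shifts for an arm64 granule. */
struct va_layout {
    unsigned page_shift;
    unsigned pmd_shift;
    unsigned pud_shift;
};

static struct va_layout layout_for(unsigned page_shift, unsigned levels)
{
    unsigned bits = page_shift - 3;   /* index bits per table (8-byte entries) */
    struct va_layout l = { page_shift, page_shift + bits, page_shift + bits };

    /* With 2 levels, PMD and PUD are folded into the PGD: PUD_SIZE == PMD_SIZE.
     * With 3 or more levels, the PUD sits one level above the PMD. */
    if (levels >= 3)
        l.pud_shift = l.pmd_shift + bits;
    return l;
}
```

So an initial mapping that covers only one PMD_SIZE block from the start of RAM spans 512 MB, far less than PUD_SIZE in the 3-level configuration, which is why the memblock limit has to follow PMD_SIZE there.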
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      3dec0fe4
    • David S. Miller's avatar
      sparc64: Implement __get_user_pages_fast(). · 06090e8e
      David S. Miller authored
      It is not sufficient to implement only get_user_pages_fast(); you
      must also implement the atomic version __get_user_pages_fast(),
      otherwise you end up using the weak symbol fallback implementation,
      which simply returns zero.
      
      This is dangerous, because it causes the futex code to loop forever
      if transparent hugepages are supported (see get_futex_key()).
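A userspace sketch of this failure mode, assuming GCC or Clang on an ELF target: a weak default definition that simply returns zero is used whenever no strong definition is linked in, and a caller that retries until the fast path succeeds never makes progress. gup_fast_sketch() and lookup_with_retry() are hypothetical, with a simplified signature, and the retry bound exists only so the example terminates.

```c
/* Generic fallback, marked weak so another object file could override it.
 * Like the fallback described above, it pins nothing and reports 0. */
__attribute__((weak)) int gup_fast_sketch(unsigned long start, int nr_pages)
{
    (void)start;
    (void)nr_pages;
    return 0;
}

/* Caller modelled on a retry loop that expects the fast path to succeed
 * eventually; without max_tries it would spin forever on the fallback. */
static int lookup_with_retry(unsigned long addr, int max_tries)
{
    for (int tries = 1; tries <= max_tries; tries++) {
        if (gup_fast_sketch(addr, 1) == 1)
            return tries;    /* succeeded after this many attempts */
    }
    return -1;               /* gave up: the fallback never pins anything */
}
```

Dropping the weak default, as the sparc merge message suggests, would turn this silent zero into a link-time error on any architecture that forgets to provide the function.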
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06090e8e
    • David S. Miller's avatar
      sparc64: Fix register corruption in top-most kernel stack frame during boot. · ef3e035c
      David S. Miller authored
      Meelis Roos reported that kernels built with gcc-4.9 do not boot; we
      eventually narrowed this down to only impacting machines using
      UltraSPARC-III and derivative cpus.
      
      The crash happens right when the first user process is spawned:
      
      [   54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      [   54.451346]
      [   54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab7 #96
      [   54.666431] Call Trace:
      [   54.698453]  [0000000000762f8c] panic+0xb0/0x224
      [   54.759071]  [000000000045cf68] do_exit+0x948/0x960
      [   54.823123]  [000000000042cbc0] fault_in_user_windows+0xe0/0x100
      [   54.902036]  [0000000000404ad0] __handle_user_windows+0x0/0x10
      [   54.978662] Press Stop-A (L1-A) to return to the boot prom
      [   55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      
      Further investigation showed that compiling only per_cpu_patch() with
      an older compiler fixes the boot.
      
      Detailed analysis showed that the function is not being miscompiled by
      gcc-4.9, but it is using a different register allocation ordering.
      
      With the gcc-4.9 compiled function, something during the code patching
      causes some of the %i* input registers to get corrupted.  Perhaps
      we have a TLB miss path into the firmware that is deep enough to
      cause a register window spill and subsequent restore when we get
      back from the TLB miss trap.
      
      Let's plug this up by doing two things:
      
      1) Stop using the firmware stack for client interface calls into
         the firmware.  Just use the kernel's stack.
      
      2) As soon as we can, call into a new function "start_early_boot()"
         to put a one-register-window buffer between the firmware's
         deepest stack frame and the top-most initial kernel one.
      Reported-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ef3e035c
    • Arun Chandran's avatar
      arm64: ASLR: Don't randomise text when randomise_va_space == 0 · 92980405
      Arun Chandran authored
      When the user asks to turn off ASLR by writing "0" to
      /proc/sys/kernel/randomize_va_space, there should not be
      any randomization of the mmap base, stack, VDSO, libs, text and heap.
      
      Currently arm64 violates this behavior by randomising text.
      Fix this by defining a constant ELF_ET_DYN_BASE. The randomisation of
      mm->mmap_base is done by setup_new_exec -> arch_pick_mmap_layout ->
      mmap_base -> mmap_rnd.
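A simplified model of the rule this commit enforces, using made-up constants rather than arm64's real address-space layout: when randomize_va_space is 0, every random offset must be zero, so both the mmap base and the load base of a position-independent (ET_DYN) executable come out identical on every exec. All SKETCH_* constants and *_sketch() functions are hypothetical, random_bits is passed in so the model is deterministic, and placing a randomized PIE directly at the mmap base is a simplification; only the idea of a constant ELF_ET_DYN_BASE comes from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up layout values for illustration; not arm64's real constants. */
#define SKETCH_TASK_SIZE   (UINT64_C(1) << 39)
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_RND_MASK    UINT64_C(0x3ffff)      /* random bits, in pages */

/* Constant load base for ET_DYN executables, as the commit introduces. */
#define SKETCH_ET_DYN_BASE (2 * SKETCH_TASK_SIZE / 3)

/* mmap_rnd()-style helper: a random page offset only when enabled. */
static uint64_t mmap_rnd_sketch(bool randomize, uint64_t random_bits)
{
    uint64_t rnd = 0;
    if (randomize)
        rnd = random_bits & SKETCH_RND_MASK;
    return rnd << SKETCH_PAGE_SHIFT;
}

static uint64_t mmap_base_sketch(int randomize_va_space, uint64_t random_bits)
{
    return SKETCH_TASK_SIZE / 2 -
           mmap_rnd_sketch(randomize_va_space != 0, random_bits);
}

/* PIE load base: randomized via the mmap base when enabled, otherwise
 * the fixed SKETCH_ET_DYN_BASE, with no hidden randomization of text. */
static uint64_t pie_load_base_sketch(int randomize_va_space, uint64_t random_bits)
{
    if (randomize_va_space != 0)
        return mmap_base_sketch(randomize_va_space, random_bits);
    return SKETCH_ET_DYN_BASE;
}
```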
      Signed-off-by: Arun Chandran <achandran@mvista.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      92980405
    • Ralf Baechle's avatar
      MIPS: SEAD3: Fix I2C device registration. · 4846f118
      Ralf Baechle authored
      This isn't a module and shouldn't be one.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      4846f118
    • Wanpeng Li's avatar
      kvm: vfio: fix unregister kvm_device_ops of vfio · 571ee1b6
      Wanpeng Li authored
      After commit 80ce1639 (KVM: VFIO: register kvm_device_ops dynamically),
      kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd
      (kvm-vfio: do not use module_init) moved the dynamic registration so
      that it is invoked by kvm_init, in order to fix broken unloading of the
      kvm module. However, kvm_device_ops of vfio is not unregistered when
      the kvm-intel module is removed, which leads to a device type collision
      warning when the kvm-intel module is inserted again.
      
          WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
          Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
          CPU: 1 PID: 10358 Comm: insmod Tainted: G        W  O   3.17.0-rc1 #2
          Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
           0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
           0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
           ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
          Call Trace:
           [<ffffffff814a61d9>] dump_stack+0x49/0x60
           [<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96
           [<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm]
           [<ffffffff810417e6>] warn_slowpath_null+0x15/0x17
           [<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm]
           [<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel]
           [<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
           [<ffffffff810002ab>] do_one_initcall+0xe3/0x170
           [<ffffffff811168a9>] ? __vunmap+0xad/0xb8
           [<ffffffff8109c58f>] do_init_module+0x2b/0x174
           [<ffffffff8109d414>] load_module+0x43e/0x569
           [<ffffffff8109c6d8>] ? do_init_module+0x174/0x174
           [<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82
           [<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20
           [<ffffffff8109d65f>] SyS_init_module+0x54/0x81
           [<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b
          ---[ end trace 0626f4a3ddea56f3 ]---
      
      The bug can be reproduced by:
      
          rmmod kvm_intel.ko
          insmod kvm_intel.ko
      
      without rmmod/insmod of kvm.ko.
      
      This patch fixes the bug by unregistering kvm_device_ops of vfio when the
      kvm-intel module is removed.
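A hypothetical sketch of why the warning appears: a registry that rejects a second registration of the same device type, plus a module init/exit pair. If exit does not undo the registration made by init, reloading the module hits the collision; the fix corresponds to the `fixed` branch, which unregisters on exit. None of the names below are KVM's real functions or types.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical registry keyed by device type; duplicates are rejected. */
#define SKETCH_MAX_TYPES 8

struct dev_ops_sketch { const char *name; };

static const struct dev_ops_sketch *ops_table[SKETCH_MAX_TYPES];

static int register_ops(const struct dev_ops_sketch *ops, unsigned type)
{
    if (type >= SKETCH_MAX_TYPES)
        return -ENOSPC;
    if (ops_table[type] != NULL)
        return -EEXIST;          /* the collision behind the WARN */
    ops_table[type] = ops;
    return 0;
}

static void unregister_ops(unsigned type)
{
    if (type < SKETCH_MAX_TYPES)
        ops_table[type] = NULL;
}

static const struct dev_ops_sketch vfio_ops = { "kvm-vfio" };
enum { SKETCH_DEV_TYPE_VFIO = 3 };

/* Module init/exit pair: exit must undo what init registered. */
static int module_init_sketch(void)
{
    return register_ops(&vfio_ops, SKETCH_DEV_TYPE_VFIO);
}

static void module_exit_sketch(int fixed)
{
    if (fixed)
        unregister_ops(SKETCH_DEV_TYPE_VFIO);
    /* buggy variant: returns without unregistering */
}
```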
      Reported-by: Liu Rongrong <rongrongx.liu@intel.com>
      Fixes: 3c3c29fd
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      571ee1b6
    • Nadav Amit's avatar
      KVM: x86: Wrong assertion on paging_tmpl.h · 1715d0dc
      Nadav Amit authored
      Even after the recent fix, the assertion on paging_tmpl.h is triggered.
      Apparently, the assertion wants to check that PAE is always set in
      long mode, but does so in an incorrect way.  Note that the assertion is not
      enabled unless the code is debugged by defining MMU_DEBUG.
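The invariant the assertion is meant to express, written as a plain boolean sketch: long mode implies PAE, so the only illegal combination is long mode with PAE clear. pae_invariant_holds() is hypothetical. pae_check_too_strict() is a made-up contrast, not necessarily the expression that was in paging_tmpl.h, showing how requiring long_mode == pae wrongly rejects legacy 32-bit PAE paging.

```c
#include <stdbool.h>

/* Long mode implies PAE: equivalent to (!long_mode || pae). */
static bool pae_invariant_holds(bool long_mode, bool pae)
{
    return !(long_mode && !pae);
}

/* Hypothetical over-strict check, for contrast only: it also flags
 * legacy PAE paging (PAE enabled, long mode off) as a violation. */
static bool pae_check_too_strict(bool long_mode, bool pae)
{
    return long_mode == pae;
}
```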
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1715d0dc
    • Quentin Casasnovas's avatar
      kvm: fix excessive pages un-pinning in kvm_iommu_map error path. · 3d32e4db
      Quentin Casasnovas authored
      The third parameter of kvm_unpin_pages() when called from
      kvm_iommu_map_pages() is wrong: it should be the number of pages to un-pin
      and not the page size.
      
      This error was facilitated by an inconsistent API: kvm_pin_pages() takes
      a size, but kvm_unpin_pages() takes a number of pages, so fix the problem
      by matching the two.
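A sketch of the unit mismatch, with hypothetical pin_range()/unpin_range() helpers and a single pin counter standing in for per-page reference counts, assuming 4 KB pages. The pin side takes a size in bytes and the unpin side takes a page count, so passing a page size to the unpin side releases thousands of pins for a single page. The patch itself makes the two APIs consistent; the sketch only contrasts the buggy call with an explicit bytes-to-pages conversion.

```c
/* Assumed 4 KB pages for this sketch. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical total pin count; negative means pins wrongly released. */
static long pin_count;

static void pin_range(unsigned long size_bytes)   /* size-based API */
{
    pin_count += (long)(size_bytes >> SKETCH_PAGE_SHIFT);
}

static void unpin_range(unsigned long npages)     /* count-based API */
{
    pin_count -= (long)npages;
}

/* Error path after pinning one mapping of page_size bytes. */
static long error_path(unsigned long page_size, int fixed)
{
    pin_count = 0;
    pin_range(page_size);
    if (fixed)
        unpin_range(page_size >> SKETCH_PAGE_SHIFT);  /* bytes -> pages */
    else
        unpin_range(page_size);                       /* bytes passed as pages */
    return pin_count;
}
```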
      
      This was introduced by commit 350b8bdd ("kvm: iommu: fix the third parameter
      of kvm_iommu_put_pages (CVE-2014-3601)"), which fixed the lack of
      un-pinning for pages intended to be un-pinned (i.e. a memory leak) but
      unfortunately potentially increased the number of pages we un-pin that
      should have stayed pinned. As far as I understand, though, the same
      practical mitigations apply.
      
      This issue was found during review of Red Hat 6.6 patches to prepare
      Ksplice rebootless updates.
      
      Thanks to Vegard for his time on a late Friday evening to help me in
      understanding this code.
      
      Fixes: 350b8bdd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
      Cc: stable@vger.kernel.org
      Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
      Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3d32e4db