1. 10 Jun, 2013 40 commits
    • MCE: Fix vm86 handling for 32bit mce handler · a4d30bc1
      Andi Kleen authored
      commit a129a7c8 upstream.
      
      When running on 32bit the mce handler could misinterpret
      vm86 mode as ring 0. This can affect whether it does recovery
      or not; it was possible to panic when recovery was actually
      possible.
      
      Fix this by always forcing vm86 to look like ring 3.
      
      [ Backport to 3.0 notes:
      Things changed there slightly:
         - move mce_get_rip() up. It fills up m->cs and m->ip values which
           are evaluated in mce_severity(). Therefore move it up right before
           the mce_severity call. This seems to be another bug in 3.0?
         - Place the backport (fix m->cs in V86 case) to where m->cs gets
           filled which is mce_get_rip() in 3.0
      ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
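The vm86 fix described in the commit above can be illustrated with a small user-space sketch. This is not the kernel code: the helper names `mce_effective_cs` and `mce_in_kernel` are hypothetical, and the only hardware fact assumed is that EFLAGS.VM is bit 17. In vm86 mode the low two bits of CS do not encode a privilege level, so the sketch forces them to 3 before any ring check.

```c
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS.VM (bit 17) is set while the CPU runs in virtual-8086 mode. */
#define EFLAGS_VM 0x00020000u

/*
 * Hypothetical helper: return the CS value a severity check should see.
 * In vm86 mode CS is a real-mode style segment, so its low two bits are
 * not a privilege level; force them to 3 so it looks like ring 3.
 */
static uint16_t mce_effective_cs(uint16_t cs, uint32_t eflags)
{
    if (eflags & EFLAGS_VM)
        cs |= 3;
    return cs;
}

/* Hypothetical ring 0 test on a saved CS value. */
static bool mce_in_kernel(uint16_t cs)
{
    return (cs & 3) == 0;
}
```

Without the adjustment, a vm86 segment such as 0xf000 has its low bits clear and would be classified as ring 0, which is the misinterpretation the commit message describes.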
    • KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) · 228b7542
      Petr Matousek authored
      commit 6d1068b3 upstream.
      
      On hosts without XSAVE support, an unprivileged local user can trigger
      an oops similar to the one below by setting the X86_CR4_OSXSAVE bit in
      the guest cr4 register using the KVM_SET_SREGS ioctl and later issuing
      the KVM_RUN ioctl.
      
      invalid opcode: 0000 [#2] SMP
      Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
      ...
      Pid: 24935, comm: zoog_kvm_monito Tainted: G      D      3.2.0-3-686-pae
      EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
      EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
      EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
      ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
      task.ti=d7c62000)
      Stack:
       00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
       ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
       c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
      Call Trace:
       [<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
      ...
       [<c12bfb44>] ? syscall_call+0x7/0xb
      Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
      1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
      d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
      EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
      0068:d7c63e70
      
      QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
      and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
      out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
      X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
      X86_FEATURE_XSAVE even on hosts that do not support it, might be
      susceptible to this attack from inside the guest as well.
      
      Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
      Signed-off-by: Petr Matousek <pmatouse@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      [bwh: Backported to 2.6.32: XSAVE is not supported at all, so always
       deny setting OSXSAVE]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
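The rule stated in the commit above ("allow setting X86_CR4_OSXSAVE only if the host has XSAVE support") can be sketched as a simple validity check. The function name `cr4_write_allowed` and the `host_has_xsave` parameter are hypothetical; only the bit position of CR4.OSXSAVE (bit 18) is taken as a given.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_OSXSAVE (1u << 18)  /* CR4.OSXSAVE bit position */

/*
 * Hypothetical validity check for a requested guest CR4 value.
 * Reject OSXSAVE when the host cannot execute XSAVE-family
 * instructions; all other bits are ignored in this sketch.
 */
static bool cr4_write_allowed(uint32_t new_cr4, bool host_has_xsave)
{
    if ((new_cr4 & CR4_OSXSAVE) && !host_has_xsave)
        return false;
    return true;
}
```

For the 2.6.32 backport noted in the trailer, the equivalent behaviour would be calling this with `host_has_xsave` always false, since that tree does not support XSAVE at all.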
    • KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) · d277a2de
      Andy Honig authored
      commit a2c118bf upstream.
      
      If the guest specifies an IOAPIC_REG_SELECT with an invalid value and follows
      that with a read of the IOAPIC_REG_WINDOW, KVM does not properly validate
      that request.  ioapic_read_indirect contains an
      ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in
      non-debug builds.  In recent kernels this allows a guest to cause a kernel
      oops by reading invalid memory.  In older kernels (pre-3.3) this allows a
      guest to read from large ranges of host memory.
      
      Tested: tested against apic unit tests.
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
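The missing bounds check described above can be sketched in user space as follows. The structure `ioapic_sketch`, the function `ioapic_read_redir`, and the pin count of 24 are assumptions made for illustration; the point is only that the index derived from the guest-controlled register select is validated at run time rather than by a debug-only ASSERT.

```c
#include <stdint.h>

#define IOAPIC_NUM_PINS   24    /* assumed pin count for this sketch */
#define IOAPIC_REDIR_BASE 0x10  /* first redirection-table register */

/* Simplified stand-in for the emulated IOAPIC state. */
struct ioapic_sketch {
    uint32_t ioregsel;                   /* guest-controlled selector */
    uint64_t redirtbl[IOAPIC_NUM_PINS];  /* 64-bit redirection entries */
};

/*
 * Read one 32-bit half of a redirection entry. An out-of-range
 * selector yields all ones instead of indexing past the table.
 * Selectors below IOAPIC_REDIR_BASE wrap to a huge unsigned index
 * and are therefore rejected by the same check in this sketch.
 */
static uint32_t ioapic_read_redir(const struct ioapic_sketch *io)
{
    uint32_t redir_index = (io->ioregsel - IOAPIC_REDIR_BASE) >> 1;
    uint64_t content;

    if (redir_index < IOAPIC_NUM_PINS)
        content = io->redirtbl[redir_index];
    else
        content = ~0ULL;

    return (io->ioregsel & 1) ? (uint32_t)(content >> 32)
                              : (uint32_t)content;
}
```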
    • KVM: x86: relax MSR_KVM_SYSTEM_TIME alignment check · aa184a86
      Marcelo Tosatti authored
      This was fixed by commit 8f964525
      upstream.  This alternate fix avoids the need for extensive backporting.
      
      RHEL5 i386 guests register addresses that are not 32-byte aligned:
      
      kvm-clock: cpu 1, msr 0:3018aa5, secondary cpu clock
      kvm-clock: cpu 2, msr 0:301f8e9, secondary cpu clock
      kvm-clock: cpu 3, msr 0:302672d, secondary cpu clock
      
      Check for an address+len that would cross page boundary
      instead.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
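The relaxed check mentioned in the commit above (reject only an address+len that would cross a page boundary) might look roughly like this. The helper name `crosses_page`, the 4096-byte page size, and the 32-byte structure length used in the usage note are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u  /* assumed page size */

/*
 * Return true if a write of len bytes starting at guest physical
 * address gpa would run past the end of the page containing gpa.
 */
static bool crosses_page(uint64_t gpa, uint32_t len)
{
    uint32_t offset = (uint32_t)(gpa & (PAGE_SIZE_SKETCH - 1));

    return offset + len > PAGE_SIZE_SKETCH;
}
```

Assuming bit 0 of the logged MSR value is an enable flag, the first address in the log above is 0x3018aa4; `crosses_page(0x3018aa4, 32)` is false, so such a guest would be accepted even though the address is not 32-byte aligned, while an address at page offset 0xff0 would be rejected.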
    • KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) · 1a33aaff
      Andy Honig authored
      commit c300aa64 upstream.
      
      If the guest sets the GPA of the time_page so that the request to update the
      time straddles a page boundary, then KVM will write onto an incorrect page.  The
      write is done by using kmap_atomic to get a pointer to the page for the time
      structure and then performing a memcpy to that page starting at an offset
      that the guest controls.  Well-behaved guests always provide a 32-byte aligned
      address; however, a malicious guest could use this to corrupt host kernel
      memory.
      
      Tested: Tested against kvmclock unit test.
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • xen/bootup: allow {read|write}_cr8 pvops call. · 13fbb875
      Konrad Rzeszutek Wilk authored
      commit 1a7bbda5 upstream.
      
      We actually do not do anything about it. Just return a default
      value of zero and if the kernel tries to write anything but 0
      we BUG_ON.
      
      This fixes the case when a user tries to suspend the machine
      and it blows up in save_processor_state because 'read_cr8' is set
      to NULL and we get:
      
      kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
      invalid opcode: 0000 [#1] SMP
      Pid: 2687, comm: init.late Tainted: G           O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
      RIP: e030:[<ffffffff814d5f42>]  [<ffffffff814d5f42>] save_processor_state+0x212/0x270
      
      .. snip..
      Call Trace:
       [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
       [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
       [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • xen/bootup: allow read_tscp call for Xen PV guests. · e73a5ba6
      Konrad Rzeszutek Wilk authored
      commit cd0608e7 upstream.
      
      The hypervisor will trap it. However, without this patch
      we would crash, as .read_tscp is set to NULL. This patch
      fixes that by setting it to the native_read_tscp call.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates · 2ed9527f
      Samu Kallio authored
      commit 1160c277 upstream.
      
      In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
      when lazy MMU updates are enabled, because set_pgd effects are being
      deferred.
      
      One instance of this problem is during process mm cleanup with memory
      cgroups enabled. The chain of events is as follows:
      
      - zap_pte_range enables lazy MMU updates
      - zap_pte_range eventually calls mem_cgroup_charge_statistics,
        which accesses the vmalloc'd mem_cgroup per-cpu stat area
      - vmalloc_fault is triggered which tries to sync the corresponding
        PGD entry with set_pgd, but the update is deferred
      - vmalloc_fault oopses due to a mismatch in the PUD entries
      
      The oops usually looks like this:
      
      ------------[ cut here ]------------
      kernel BUG at arch/x86/mm/fault.c:396!
      invalid opcode: 0000 [#1] SMP
      .. snip ..
      CPU 1
      Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
      RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
      .. snip ..
      Call Trace:
       [<ffffffff81627759>] do_page_fault+0x399/0x4b0
       [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
       [<ffffffff81624065>] page_fault+0x25/0x30
       [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
       [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
       [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
       [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
       [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
       [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
       [<ffffffff81154962>] unmap_vmas+0x52/0xa0
       [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
       [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
       [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81059ce3>] mmput+0x83/0xf0
       [<ffffffff810624c4>] exit_mm+0x104/0x130
       [<ffffffff8106264a>] do_exit+0x15a/0x8c0
       [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
       [<ffffffff81063177>] sys_exit_group+0x17/0x20
       [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b
      
      Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
      changes visible to the consistency checks.
      
      RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
      Tested-by: Josh Boyer <jwboyer@redhat.com>
      Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
      Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
      Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
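The interaction between deferred set_pgd updates and the consistency check in the commit above can be modelled with a toy example. Everything here is hypothetical (`lazy_mmu_sketch`, `sketch_set_pgd`, `sketch_flush_lazy_mmu`); it is a model of the deferral, not the paravirt implementation. It shows why a flush right after the update is needed before anything reads the entry back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one page-table entry with an optional queued update. */
struct lazy_mmu_sketch {
    uint64_t pgd_entry;    /* value a page-table walker actually sees */
    uint64_t pending;      /* queued value, valid only if has_pending */
    bool     has_pending;
    bool     lazy;         /* true while lazy MMU mode is active */
};

/* In lazy mode the write is only queued; otherwise it applies at once. */
static void sketch_set_pgd(struct lazy_mmu_sketch *m, uint64_t val)
{
    if (m->lazy) {
        m->pending = val;
        m->has_pending = true;
    } else {
        m->pgd_entry = val;
    }
}

/* Apply any queued update so later reads observe it. */
static void sketch_flush_lazy_mmu(struct lazy_mmu_sketch *m)
{
    if (m->has_pending) {
        m->pgd_entry = m->pending;
        m->has_pending = false;
    }
}
```

In this model, a check that reads `pgd_entry` between `sketch_set_pgd` and `sketch_flush_lazy_mmu` still sees the stale value, which mirrors the mismatch the commit message reports.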
    • x86/mm: Check if PUD is large when validating a kernel address · 49beb96c
      Mel Gorman authored
      commit 0ee364eb upstream.
      
      A user reported the following oops when a backup process reads
      /proc/kcore:
      
       BUG: unable to handle kernel paging request at ffffbb00ff33b000
       IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
       [...]
      
       Call Trace:
        [<ffffffff811b8aaa>] read_kcore+0x17a/0x370
        [<ffffffff811ad847>] proc_reg_read+0x77/0xc0
        [<ffffffff81151687>] vfs_read+0xc7/0x130
        [<ffffffff811517f3>] sys_read+0x53/0xa0
        [<ffffffff81449692>] system_call_fastpath+0x16/0x1b
      
      Investigation determined that the bug triggered when reading
      system RAM at the 4G mark. On this system, that was the first
      address using 1G pages for the virt->phys direct mapping so the
      PUD is pointing to a physical address, not a PMD page.
      
      The problem is that the page table walker in kern_addr_valid() is
      not checking pud_large() and treats the physical address as if
      it was a PMD.  If it happens to look like pmd_none then it'll
      silently fail, probably returning zeros instead of real data. If
      the data happens to look like a present PMD though, it will be
      walked resulting in the oops above.
      
      This patch adds the necessary pud_large() check.
      
      Unfortunately the problem was not readily reproducible and now
      they are running the backup program without accessing
      /proc/kcore so the patch has not been validated but I think it
      makes sense.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.coM>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
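The missing step in the page-table walk described above can be sketched as a classification of a PUD entry before descending. The names `classify_pud` and `enum pud_kind` are hypothetical, and treating "not present" as "none" is a simplification; the bit positions for Present (bit 0) and PSE (bit 7) follow the x86 page-table format.

```c
#include <stdint.h>

#define PAGE_PRESENT (1ULL << 0)  /* entry is present */
#define PAGE_PSE     (1ULL << 7)  /* entry maps a large page directly */

enum pud_kind {
    PUD_NONE,        /* nothing mapped */
    PUD_LARGE_LEAF,  /* 1G mapping: the entry holds a physical address */
    PUD_TABLE        /* entry points to a PMD page */
};

/*
 * Decide what a PUD entry refers to. A walker must stop at
 * PUD_LARGE_LEAF rather than interpret the target as a PMD page.
 */
static enum pud_kind classify_pud(uint64_t pud)
{
    if (!(pud & PAGE_PRESENT))
        return PUD_NONE;
    if (pud & PAGE_PSE)
        return PUD_LARGE_LEAF;
    return PUD_TABLE;
}
```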
    • x86/msr: Add capabilities check · 3dcf19f3
      Alan Cox authored
      commit c903f045 upstream
      
      At the moment the MSR driver only relies upon file system
      checks. This means that anything as root with any capability set
      can write to MSRs. Historically that wasn't very interesting but
      on modern processors the MSRs are such that writing to them
      provides several ways to execute arbitrary code in kernel space.
      Sample code and documentation on doing this is circulating and
      MSR attacks are used on Windows 64bit rootkits already.
      
      In the Linux case you still need to be able to open the device
      file so the impact is fairly limited and reduces the security of
      some capability and security model based systems down towards
      that of a generic "root owns the box" setup.
      
      Therefore they should require CAP_SYS_RAWIO to prevent an
      elevation of capabilities. The impact of this is fairly minimal
      on most setups because they don't have heavy use of
      capabilities. Those using SELinux, SMACK or AppArmor rules might
      want to consider if their rulesets on the MSR driver could be
      tighter.
      Signed-off-by: Alan Cox <alan@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Horses <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
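The extra gate described above, requiring CAP_SYS_RAWIO in addition to file permissions, can be modelled in user space as a check on a capability bitmask. The function `msr_open_check` and the bitmask parameter are hypothetical stand-ins for the kernel's capability test; the capability number 17 for CAP_SYS_RAWIO matches the Linux capability list.

```c
#include <errno.h>
#include <stdint.h>

#define CAP_SYS_RAWIO_BIT 17  /* CAP_SYS_RAWIO in the Linux capability list */

/*
 * Hypothetical open-time check: even if file permissions allowed the
 * open, refuse it unless the caller holds CAP_SYS_RAWIO.
 * Returns 0 on success or -EPERM on refusal.
 */
static int msr_open_check(uint64_t effective_caps)
{
    if (!(effective_caps & (1ULL << CAP_SYS_RAWIO_BIT)))
        return -EPERM;
    return 0;
}
```

A process running as root with a reduced capability set that lacks CAP_SYS_RAWIO would fail this check, which is the elevation path the commit message aims to close.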
    • x86/xen: don't assume %ds is usable in xen_iret for 32-bit PVOPS. · 783defce
      Jan Beulich authored
      commit 13d2b4d1 upstream.
      
      This fixes CVE-2013-0228 / XSA-42
      
      Drew Jones, while working on CVE-2013-0190, found that an unprivileged guest user
      in a 32bit PV guest can crash the guest with a panic like this:
      
      -------------
      general protection fault: 0000 [#1] SMP
      last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
      Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
      iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
      xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
      mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
      unloaded: scsi_wait_scan]
      
      Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
      EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
      EIP is at xen_iret+0x12/0x2b
      EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
      ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
       DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
      Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
      Stack:
       00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
      Call Trace:
      Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
      8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
      10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
      EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
      general protection fault: 0000 [#2]
      ---[ end trace ab0d29a492dcd330 ]---
      Kernel panic - not syncing: Fatal exception
      Pid: 1250, comm: r Tainted: G      D    ---------------
      2.6.32-356.el6.i686 #1
      Call Trace:
       [<c08476df>] ? panic+0x6e/0x122
       [<c084b63c>] ? oops_end+0xbc/0xd0
       [<c084b260>] ? do_general_protection+0x0/0x210
       [<c084a9b7>] ? error_code+0x73/
      -------------
      
      Petr says: "
       I've analysed the bug and I think that xen_iret() cannot cope with
       mangled DS, in this case zeroed out (null selector/descriptor) by either
       xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
       entry was invalidated by the reproducer. "
      
      Jan took a look at the preliminary patch and came up with a fix that solves
      this problem:
      
      "This code gets called after all registers other than those handled by
      IRET got already restored, hence a null selector in %ds or a non-null
      one that got loaded from a code or read-only data descriptor would
      cause a kernel mode fault (with the potential of crashing the kernel
      as a whole, if panic_on_oops is set)."
      
      The way to fix this is to realize that we can only rely on the
      registers that IRET restores. The two that are guaranteed are
      %cs and %ss, as they are always fixed GDT selectors. They are also
      inaccessible from user mode, so they cannot be altered. This is
      the approach taken in this patch.
      
      An alternative suggested by Jan would be to rely on
      the subtle fact that %ebp- or %esp-relative references use
      the %ss segment.  In that case we could switch from using %eax to %ebp and
      would not need the %ss overrides. That would also require one extra
      instruction to compensate for the one place where the register is used
      as a scaled index. However, Andrew pointed out that this is too subtle, and if
      further work were done in this code path it could escape people's attention
      and lead to accidents.
      Reviewed-by: Petr Matousek <pmatouse@redhat.com>
      Reported-by: Petr Matousek <pmatouse@redhat.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, random: make ARCH_RANDOM prompt if EMBEDDED, not EXPERT · 119274d6
      Romain Francoise authored
      Before v2.6.38 CONFIG_EXPERT was known as CONFIG_EMBEDDED but the
      Kconfig entry was not changed to match when upstream commit
      628c6246 ("x86, random: Architectural
      inlines to get random integers with RDRAND") was backported.
      Signed-off-by: Romain Francoise <romain@orebokech.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86: Don't use the EFI reboot method by default · d9869cbe
      Matthew Garrett authored
      Testing suggests that at least some Lenovos and some Intels will
      fail to reboot via EFI, attempting to jump to an unmapped
      physical address. In the long run we could handle this by
      providing a page table with a 1:1 mapping of physical addresses,
      but for now it's probably just easier to assume that ACPI or
      legacy methods will be present and reboot via those.
      
      [2.6.32: additional background information from Jonathan below]
      >
      > Please consider
      >
      >   f70e957c x86: Don't use the EFI reboot method by default,
      >                2011-07-06
      >
      > for application to the 2.6.32.y and 2.6.34.y trees.  The patch was
      > applied upstream late in the 3.0 cycle, so newer kernels don't need
      > it.
      >
      > In 2011, Keith Ward wrote[1]:
      >
      > > When attempting to reboot my UEFI enabled system, the system hangs when
      > > calling reboot requiring me to manually reset the system via the reset switch.
      > >
      > > Screenshot: http://twitgoo.com/29bq1c
      >
      > Ben Hutchings writes[1]:
      >
      > > Version: 3.0.0-1
      > >
      > > I also had this problem on my own system, but it is fixed now.
      > > I bisected the fix to:
      > >
      > > commit f70e957c
      > > Author: Matthew Garrett <mjg@redhat.com>
      > > Date:   Wed Jul 6 16:52:37 2011 -0400
      > >
      > >     x86: Don't use the EFI reboot method by default
      > >
      > > which is basically equivalent to the workaround!
      > >
      > > I'll also apply this fix to squeeze as it's so simple.
      >
      > Keith Ward also wrote[1]:
      >
      > > It seems as if this has recently been reported at Ubuntu's Launchpad as well:
      > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/721576
      >
      > There are a variety of reports of the same panic at that bug on
      > 2.6.32.y-, 2.6.38.y-, and 2.6.39-based kernels.  Passing "reboot=a,w"
      > on the kernel command line avoids trouble for reporters.
      Signed-off-by: Matthew Garrett <mjg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Cox <alan@linux.intel.com>
      Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      (cherry picked from commit f70e957c)
      Cc: Jonathan Nieder <jrnieder@gmail.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • x86, ioapic: initialize nr_ioapic_registers early in mp_register_ioapic() · dadd486a
      Suresh Siddha authored
      Lin Bao reported that one of the HP platforms failed to boot a
      2.6.32 kernel when the BIOS enabled interrupt-remapping and
      x2apic before handing over control to the Linux kernel.
      
      During boot, the Linux kernel masks all the interrupt sources
      (8259, IO-APIC RTE's), sets up the interrupt-remapping hardware
      with the OS controlled table and unmasks the 8259 interrupts
      but not the IO-APIC RTE's (as the newly set up interrupt-remapping
      table and the IO-APIC RTE's are not yet programmed by the kernel).
      
      Shortly after this, the IO-APIC RTE's and the interrupt-remapping table
      entries are programmed based on the ACPI tables etc. So the
      expectation is that any interrupt during this window will be dropped
      and not see the intermediate configuration.
      
      In the reported problematic case, the BIOS has configured the IO-APIC
      in virtual wire-B mode. In the window between the kernel setting up the
      new interrupt-remapping table and the IO-APIC RTE's being properly
      configured, an interrupt gets routed by the IO-APIC RTE (set up
      by the virtual wire-B configuration) and sees the empty
      interrupt-remapping table entry, resulting in a VT-d fault that causes
      the platform to generate an NMI. The OS then panics on this unexpected NMI.
      
      This problem doesn't happen with more recent kernels, and a closer
      look at the 2.6.32 kernel shows that the code which masks
      the IO-APIC RTE's is not working as expected because nr_ioapic_registers
      for each IO-APIC is not yet initialized at this point. In later
      kernels nr_ioapic_registers is initialized much earlier and
      everything works as expected.
      
      For 2.6.[32..34] kernels, fix this issue by initializing
      nr_ioapic_registers early in mp_register_ioapic().
      
      [ Relevant upstream commit info:
        commit 7716a5c4
        Author: Eric W. Biederman <ebiederm@xmission.com>
        Date:   Tue Mar 30 01:07:12 2010 -0700
      
          x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic.
      
        As the upstream commit depends on quite a few prior commits
        and some followup fixes in the mainline, we just picked
        the smallest relevant hunk for fixing the issue at hand.
        Problematic platform uses ACPI for IO-APIC, VT-d enumeration etc
        and this hunk only touches the ACPI based platforms.
      
        nr_ioapic_registers initialization in enable_IO_APIC() is still
        retained, so that other configurations like legacy MPS table based
        enumeration etc. work with no change.
      ]
      Reported-and-tested-by: Zhang, Lin-Bao <linbao.zhang@hp.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ALSA: seq: Fix missing error handling in snd_seq_timer_open() · 402adfdd
      Takashi Iwai authored
      commit 66efdc71 upstream.
      
      snd_seq_timer_open() didn't catch the whole error path but let the error
      through if the timer id is a slave.  This may lead to an Oops by accessing
      the uninitialized pointer.
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000002ae
       IP: [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
       PGD 785cd067 PUD 76964067 PMD 0
       Oops: 0002 [#4] SMP
       CPU 0
       Pid: 4288, comm: trinity-child7 Tainted: G      D W 3.9.0-rc1+ #100 Bochs Bochs
       RIP: 0010:[<ffffffff819b3477>]  [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130
       RSP: 0018:ffff88006ece7d38  EFLAGS: 00010246
       RAX: 0000000000000286 RBX: ffff88007851b400 RCX: 0000000000000000
       RDX: 000000000000ffff RSI: ffff88006ece7d58 RDI: ffff88006ece7d38
       RBP: ffff88006ece7d98 R08: 000000000000000a R09: 000000000000fffe
       R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
       R13: ffff8800792c5400 R14: 0000000000e8f000 R15: 0000000000000007
       FS:  00007f7aaa650700(0000) GS:ffff88007f800000(0000) GS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000002ae CR3: 000000006efec000 CR4: 00000000000006f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process trinity-child7 (pid: 4288, threadinfo ffff88006ece6000, task ffff880076a8a290)
       Stack:
        0000000000000286 ffffffff828f2be0 ffff88006ece7d58 ffffffff810f354d
        65636e6575716573 2065756575712072 ffff8800792c0030 0000000000000000
        ffff88006ece7d98 ffff8800792c5400 ffff88007851b400 ffff8800792c5520
       Call Trace:
        [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
        [<ffffffff819b17e9>] snd_seq_queue_timer_open+0x29/0x70
        [<ffffffff819ae01a>] snd_seq_ioctl_set_queue_timer+0xda/0x120
        [<ffffffff819acb9b>] snd_seq_do_ioctl+0x9b/0xd0
        [<ffffffff819acbe0>] snd_seq_ioctl+0x10/0x20
        [<ffffffff811b9542>] do_vfs_ioctl+0x522/0x570
        [<ffffffff8130a4b3>] ? file_has_perm+0x83/0xa0
        [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10
        [<ffffffff811b95ed>] sys_ioctl+0x5d/0xa0
        [<ffffffff813663fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
        [<ffffffff81faed69>] system_call_fastpath+0x16/0x1b
      Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ALSA: hda - Add a pin-fix for FSC Amilo Pi1505 · 0b6c9f05
      Takashi Iwai authored
      FSC Amilo Pi 1505 has a buggy BIOS and doesn't set up the HP and
      speaker pins properly.  Add the pinfix entry for that.
      
      Reference: Novell bnc#557403
         https://bugzilla.novell.com/show_bug.cgi?id=557403
      
      [2.6.32: additional background from Jonathan below]
      > Hi Willy,
      >
      > Please consider
      >
      >   cfc9b06f ALSA: hda - Add a pin-fix for FSC Amilo Pi1505
      >
      > for application to the 2.6.32.y tree.  Without this patch, the Amilo
      > Pi 1505's internal speaker is silent unless a jack is plugged into its
      > headphone jack.
      >
      > Jose Manuel Castroagudin noticed[1] that 2.6.30 is not affected, so
      > this seems to be a regression.
      >
      > The patch was applied upstream during the 2.6.33 merge window, where
      > it worked.  That said, I didn't manage to track down anyone with a
      > Pi1505 to test it against 2.6.32, so thoughts from alsa folks on
      > whether this is appropriate for 2.6.32.y would be useful.
      >
      > Hope that helps,
      > Jonathan
      >
      > [1] http://bugs.debian.org/599582 has many more details.
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      (cherry picked from commit cfc9b06f)
      Cc: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • ALSA: hda - More ALC663 fixes and support of compatible chips · da8aad73
      Kailang Yang authored
      commit ebb83eeb upstream.
      
      1. Add more ASUS NB models.
      2. Fix alc663_m51va_setup.
         M51VA has a Digital Mic whose NID is 0x12. The record source index is
         0x9 for ALC663.
         So, modify the alc663_m51va_setup function to use index 0x9
         and add the analog Mic support function alc663_mode1_setup.
      3. Add ASUS mode7 and mode8 modules for ALC663.
      
      [jn: backport to 2.6.32.y to address http://bugs.debian.org/688564]
      Signed-off-by: Kailang Yang <kailang@realtek.com.tw>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Tested-by: Bill Allombert <Bill.Allombert@math.u-bordeaux1.fr> # Vaio w/ ALC275
      Signed-off-by: Willy Tarreau <w@1wt.eu>
    • Mel Gorman's avatar
      mempolicy: fix a race in shared_policy_replace() · aaa1bc47
      Mel Gorman authored
      commit b22d127a upstream.
      
      shared_policy_replace() use of sp_alloc() is unsafe.  1) sp_node cannot
      be dereferenced if sp->lock is not held and 2) another thread can modify
      sp_node between spin_unlock for allocating a new sp node and next
      spin_lock.  The bug was introduced before 2.6.12-rc2.
      
      Kosaki's original patch for this problem was to allocate an sp node and
      policy within shared_policy_replace and initialise it when the lock is
      reacquired.  I was not keen on this approach because it partially
      duplicates sp_alloc().  As the paths where sp->lock is taken are not that
      performance critical this patch converts sp->lock to sp->mutex so it can
      sleep when calling sp_alloc().
      
      [kosaki.motohiro@jp.fujitsu.com: Original patch]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      aaa1bc47
    • Hugh Dickins's avatar
      mm: fix invalidate_complete_page2() lock ordering · bc68657f
      Hugh Dickins authored
      commit ec4d9f62 upstream.
      
      In fuzzing with trinity, lockdep protested "possible irq lock inversion
      dependency detected" when isolate_lru_page() reenabled interrupts while
      still holding the supposedly irq-safe tree_lock:
      
      invalidate_inode_pages2
        invalidate_complete_page2
          spin_lock_irq(&mapping->tree_lock)
          clear_page_mlock
            isolate_lru_page
              spin_unlock_irq(&zone->lru_lock)
      
      isolate_lru_page() is correct to enable interrupts unconditionally:
      invalidate_complete_page2() is incorrect to call clear_page_mlock() while
      holding tree_lock, which is supposed to nest inside lru_lock.
      
      Both truncate_complete_page() and invalidate_complete_page() call
      clear_page_mlock() before taking tree_lock to remove page from radix_tree.
       I guess invalidate_complete_page2() preferred to test PageDirty (again)
      under tree_lock before committing to the munlock; but since the page has
      already been unmapped, its state is already somewhat inconsistent, and no
      worse if clear_page_mlock() moved up.
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Deciphered-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      bc68657f
    • Takamori Yamaguchi's avatar
      mm: bugfix: set current->reclaim_state to NULL while returning from kswapd() · 61daa923
      Takamori Yamaguchi authored
      commit b0a8cc58 upstream.
      
      In kswapd(), set current->reclaim_state to NULL before returning, as
      current->reclaim_state holds reference to variable on kswapd()'s stack.
      
      In rare cases, while returning from kswapd() during memory offlining,
      __free_slab() and freepages() can access the dangling pointer of
      current->reclaim_state.
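
      The pattern described above can be sketched in user space.  This is a
      minimal model, not the kernel code: struct task, worker() and the
      helper below are hypothetical stand-ins for task_struct, kswapd() and
      its caller.  It only shows why the pointer is cleared before the
      function that owns the stack variable returns.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's reclaim_state and task_struct. */
struct reclaim_state {
    unsigned long reclaimed_slab;
};

struct task {
    struct reclaim_state *reclaim_state;
};

/* Model of a kswapd-like worker that publishes a pointer to a stack variable. */
void worker(struct task *current_task)
{
    struct reclaim_state state = { .reclaimed_slab = 0 };

    current_task->reclaim_state = &state;

    /* Reclaim work would update the state through the published pointer. */
    current_task->reclaim_state->reclaimed_slab += 1;

    /* Clear the pointer before this stack frame (and 'state') goes away. */
    current_task->reclaim_state = NULL;
}

/* Returns 1 if no pointer into worker()'s dead stack frame is left behind. */
int worker_leaves_no_dangling_pointer(void)
{
    struct task t = { .reclaim_state = NULL };

    worker(&t);
    return t.reclaim_state == NULL;
}
```

      Without the final assignment in worker(), any later reader of
      t.reclaim_state would dereference memory that is no longer valid.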
      Signed-off-by: Takamori Yamaguchi <takamori.yamaguchi@jp.sony.com>
      Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      61daa923
    • Christoffer Dall's avatar
      mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED · 6d9f42be
      Christoffer Dall authored
      commit ad4b3fb7 upstream.
      
      Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and
      (PageHead) is true, for tail pages.  If this is indeed the intended
      behavior, which I doubt because it breaks cache cleaning on some ARM
      systems, then the nomenclature is highly problematic.
      
      This patch makes sure PageHead is only true for head pages and PageTail
      is only true for tail pages, and neither is true for non-compound pages.
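
      A sketch of the intended truth table follows.  It assumes (this is not
      stated in the message above) that without CONFIG_PAGEFLAGS_EXTENDED a
      tail page is encoded as "compound bit plus a second bit" while a head
      page has only the compound bit.  The bit positions and function names
      are hypothetical and chosen for illustration only.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the real kernel values differ. */
enum { PG_COMPOUND = 0, PG_RECLAIM = 1 };

#define PG_HEAD_TAIL_MASK ((1UL << PG_COMPOUND) | (1UL << PG_RECLAIM))

/* Head page: compound bit set, second bit clear. */
bool page_head(unsigned long flags)
{
    return (flags & PG_HEAD_TAIL_MASK) == (1UL << PG_COMPOUND);
}

/* Tail page: both bits set. */
bool page_tail(unsigned long flags)
{
    return (flags & PG_HEAD_TAIL_MASK) == PG_HEAD_TAIL_MASK;
}

/* The behaviour complained about: only the compound bit is tested. */
bool page_head_old(unsigned long flags)
{
    return (flags & (1UL << PG_COMPOUND)) != 0;
}
```

      With this encoding, page_head_old() returns true for a tail page
      because the compound bit is set on both kinds of page, whereas
      page_head() compares against the full mask and so rejects it.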
      
      [ This buglet seems ancient - seems to have been introduced back in Apr
        2008 in commit 6a1e7f77: "pageflags: convert to the use of new
        macros".  And the reason nobody noticed is because the PageHead()
        tests are almost all about just sanity-checking, and only used on
        pages that are actual page heads.  The fact that the old code returned
        true for tail pages too was thus not really noticeable.   - Linus ]
      Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
      Acked-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Will Deacon <Will.Deacon@arm.com>
      Cc: Steve Capper <Steve.Capper@arm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      6d9f42be
    • Dave Hansen's avatar
      mm: fix vma_resv_map() NULL pointer · 6b36635b
      Dave Hansen authored
      commit 4523e145 upstream
      
      hugetlb_reserve_pages() can be used for either normal file-backed
      hugetlbfs mappings, or MAP_HUGETLB.  In the MAP_HUGETLB, semi-anonymous
      mode, there is not a VMA around.  The new call to resv_map_put() assumed
      that there was, and resulted in a NULL pointer dereference:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
        IP: vma_resv_map+0x9/0x30
        PGD 141453067 PUD 1421e1067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP
        ...
        Pid: 14006, comm: trinity-child6 Not tainted 3.4.0+ #36
        RIP: vma_resv_map+0x9/0x30
        ...
        Process trinity-child6 (pid: 14006, threadinfo ffff8801414e0000, task ffff8801414f26b0)
        Call Trace:
          resv_map_put+0xe/0x40
          hugetlb_reserve_pages+0xa6/0x1d0
          hugetlb_file_setup+0x102/0x2c0
          newseg+0x115/0x360
          ipcget+0x1ce/0x310
          sys_shmget+0x5a/0x60
          system_call_fastpath+0x16/0x1b
      
      This was reported by Dave Jones, but was reproducible with the
      libhugetlbfs test cases, so shame on me for not running them in the
      first place.
      
      With this, the oops is gone, and the output of libhugetlbfs's
      run_tests.py is identical to plain 3.4 again.
      
      [ Marked for stable, since this was introduced by commit c50ac050
        ("hugetlb: fix resv_map leak in error path") which was also marked for
        stable ]
      Reported-by: Dave Jones <davej@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      6b36635b
    • Dave Hansen's avatar
      hugetlb: fix resv_map leak in error path · bc89fc4a
      Dave Hansen authored
      commit c50ac050 upstream
      
      When called for anonymous (non-shared) mappings, hugetlb_reserve_pages()
      does a resv_map_alloc().  It depends on code in hugetlbfs's
      vm_ops->close() to release that allocation.
      
      However, in the mmap() failure path, we do a plain unmap_region() without
      the remove_vma() which actually calls vm_ops->close().
      
      This is a decent fix.  This leak could get reintroduced if new code (say,
      after hugetlb_reserve_pages() in hugetlbfs_file_mmap()) decides to return
      an error.  But, I think it would have to unroll the reservation anyway.
      
      Christoph's test case:
      
      	http://marc.info/?l=linux-mm&m=133728900729735
      
      This patch applies to 3.4 and later.  A version for earlier kernels is at
      https://lkml.org/lkml/2012/5/22/418.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: Christoph Lameter <cl@linux.com>
      Tested-by: Christoph Lameter <cl@linux.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [dannf: backported to Debian's 2.6.32]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      bc89fc4a
    • Namhyung Kim's avatar
      tracing: Fix double free when function profile init failed · 83c84290
      Namhyung Kim authored
      commit 83e03b3f upstream.
      
      On the failure path, stat->start and stat->pages will refer to the same
      page, so the code attempts to free the same page twice and the kernel
      panics.
      
      Link: http://lkml.kernel.org/r/1364820385-32027-1-git-send-email-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      83c84290
    • Wen Congyang's avatar
      tracing: Don't call page_to_pfn() if page is NULL · a2cefc7b
      Wen Congyang authored
      commit 85f2a2ef upstream.
      
      When allocating memory fails, page is NULL. page_to_pfn() will
      cause a kernel panic if we don't use sparsemem vmemmap.
      
      Link: http://lkml.kernel.org/r/505AB1FF.8020104@cn.fujitsu.com
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      a2cefc7b
    • Li Zhong's avatar
      Fix a dead loop in async_synchronize_full() · 48bac809
      Li Zhong authored
      [Fixed upstream by commits 2955b47d and
      a4683487 from Dan Williams, but they are much
      more intrusive than this tiny fix, according to Andrew - gregkh]
      
      This patch tries to fix a dead loop in  async_synchronize_full(), which
      could be seen when preemption is disabled on a single cpu machine.
      
      void async_synchronize_full(void)
      {
              do {
                      async_synchronize_cookie(next_cookie);
              } while (!list_empty(&async_running) ||
                       !list_empty(&async_pending));
      }
      
      async_synchronize_cookie() calls async_synchronize_cookie_domain() with
      &async_running as the default domain to synchronize.
      
      However, there might be some works in the async_pending list from other
      domains. On a single cpu system, without preemption, there is no chance
      for the other works to finish, so async_synchronize_full() enters a dead
      loop.
      
      It seems async_synchronize_full() wants to synchronize all entries in
      all running lists(domains), so maybe we could just check the entry_count
      to know whether all works are finished.
      
      Currently, async_synchronize_cookie_domain() expects a non-NULL running
      list ( if NULL, there would be NULL pointer dereference ), so maybe a
      NULL pointer could be used as an indication for the functions to
      synchronize all works in all domains.
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Christian Kujau <lists@nerdbynature.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Cc: Christian Kujau <lists@nerdbynature.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      48bac809
    • Tejun Heo's avatar
      cgroup: remove incorrect dget/dput() pair in cgroup_create_dir() · 5e224edf
      Tejun Heo authored
      commit 17543163 upstream.
      
      cgroup_create_dir() does weird dancing with dentry refcnt.  On
      success, it gets and then puts it, achieving nothing.  On failure, it
      puts but there is no matching get anywhere, leading to the following
      oops if cgroup_create_file() fails for whatever reason.
      
        ------------[ cut here ]------------
        kernel BUG at /work/os/work/fs/dcache.c:552!
        invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in:
        CPU 2
        Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs
        RIP: 0010:[<ffffffff811d9c0c>]  [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
        RSP: 0018:ffff88001a3ebef8  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403
        RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58
        RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001
        R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea
        R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60
        FS:  00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080)
        Stack:
         ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000
         ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8
         ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8
        Call Trace:
         [<ffffffff811cc889>] done_path_create+0x19/0x50
         [<ffffffff811d1fc9>] sys_mkdirat+0x59/0x80
         [<ffffffff811d2009>] sys_mkdir+0x19/0x20
         [<ffffffff81be1e02>] system_call_fastpath+0x16/0x1b
        Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41
        RIP  [<ffffffff811d9c0c>] dput+0x1dc/0x1e0
         RSP <ffff88001a3ebef8>
        ---[ end trace 1277bcfd9561ddb0 ]---
      
      Fix it by dropping the unnecessary dget/dput() pair.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      5e224edf
    • Bjorn Helgaas's avatar
      Driver core: treat unregistered bus_types as having no devices · a21f3aa1
      Bjorn Helgaas authored
      commit 4fa3e78b upstream.
      
      A bus_type has a list of devices (klist_devices), but the list and the
      subsys_private structure that contains it are not initialized until the
      bus_type is registered with bus_register().
      
      The panic/reboot path has fixups that look up devices in pci_bus_type.  If
      we panic before registering pci_bus_type, the bus_type exists but the list
      does not, so mach_reboot_fixups() trips over a null pointer and panics
      again:
      
          mach_reboot_fixups
            pci_get_device
              ..
                bus_find_device(&pci_bus_type, ...)
                  bus->p is NULL
      
      Joonsoo reported a problem when panicking before PCI was initialized.
      I think this patch should be sufficient to replace the patch he posted
      here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip
      reboot_fixups in early boot phase")
      Reported-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      a21f3aa1
    • T Makphaibulchoke's avatar
      kernel/resource.c: fix stack overflow in __reserve_region_with_split() · 0cb1546c
      T Makphaibulchoke authored
      commit 4965f566 upstream.
      
      Using recursive calls to add a non-conflicting region in
      __reserve_region_with_split() could result in a stack overflow if the
      recursion becomes too deep.  Convert the recursive calls to an
      iterative loop to avoid the problem.
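
      The recursion-to-loop conversion can be illustrated with a simplified
      user-space model.  This is a sketch under assumptions, not the kernel
      implementation: struct range, first_conflict() and
      reserve_with_split() are hypothetical names, and busy regions are
      kept in a flat array rather than a resource tree.

```c
#include <stddef.h>

/* Inclusive address range; a simplified, hypothetical stand-in for struct resource. */
struct range {
    unsigned long start;
    unsigned long end;
};

/* Return the busy range with the lowest start that overlaps [start, end], or NULL. */
static const struct range *first_conflict(const struct range *busy, size_t nbusy,
                                          unsigned long start, unsigned long end)
{
    const struct range *best = NULL;
    size_t i;

    for (i = 0; i < nbusy; i++) {
        if (busy[i].end < start || busy[i].start > end)
            continue;
        if (!best || busy[i].start < best->start)
            best = &busy[i];
    }
    return best;
}

/*
 * Reserve every part of [start, end] that does not overlap a busy range.
 * Each conflict advances 'start' past the conflicting range instead of
 * recursing, so stack usage stays constant however many conflicts exist.
 * Returns the number of ranges written to out[] (at most max_out).
 */
size_t reserve_with_split(const struct range *busy, size_t nbusy,
                          unsigned long start, unsigned long end,
                          struct range *out, size_t max_out)
{
    size_t count = 0;

    while (start <= end && count < max_out) {
        const struct range *c = first_conflict(busy, nbusy, start, end);

        if (!c) {
            /* No conflict left: the remainder is reserved as one piece. */
            out[count].start = start;
            out[count].end = end;
            count++;
            break;
        }
        if (c->start > start) {
            /* Reserve the gap in front of the conflict. */
            out[count].start = start;
            out[count].end = c->start - 1;
            count++;
        }
        if (c->end >= end)
            break;              /* conflict covers the rest of the request */
        start = c->end + 1;     /* continue after the conflict */
    }
    return count;
}

/* Example: reserve [0, 49] around busy ranges [10, 19] and [30, 39]. */
size_t example_reserve(struct range *out, size_t max_out)
{
    static const struct range busy[] = { { 10, 19 }, { 30, 39 } };

    return reserve_with_split(busy, 2, 0, 49, out, max_out);
}
```

      In the example, the request is split into [0, 9], [20, 29] and
      [40, 49]; a recursive version would need one stack frame per split.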
      
      Tested on a machine containing 135 regions.  The kernel no longer panicked
      with stack overflow.
      
      Also tested with code arbitrarily adding regions with no conflict,
      embedding two consecutive conflicts and embedding two non-consecutive
      conflicts.
      Signed-off-by: T Makphaibulchoke <tmac@hp.com>
      Reviewed-by: Ram Pai <linuxram@us.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@gmail.com>
      Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      0cb1546c
    • Thadeu Lima de Souza Cascardo's avatar
      genalloc: stop crashing the system when destroying a pool · b4c006c3
      Thadeu Lima de Souza Cascardo authored
      commit eedce141 upstream.
      
      The genalloc code uses the bitmap API from include/linux/bitmap.h and
      lib/bitmap.c, which is based on long values.  Both bitmap_set from
      lib/bitmap.c and bitmap_set_ll, which is the lockless version from
      genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
      the bitmap.
      
      That one uses (1 << bits) - 1, 0b111, if you are setting the first three
      bits.  This means that the API counts from the least significant bits
      (LSB from now on) to the MSB.  The LSB in the first long is bit 0, then.
      The same works for the lookup functions.
      
      The genalloc code uses longs for the bitmap, as it should.  In
      include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
      bits[0] as its last member.  When allocating the struct, genalloc should
      reserve enough space for the bitmap.  This should be a proper number of
      longs that can fit the amount of bits in the bitmap.
      
      However, genalloc allocates an integer number of bytes that fit the
      amount of bits, but may not be an integer amount of longs.  9 bytes, for
      example, could be allocated for 70 bits.
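
      The size mismatch can be checked with a little arithmetic.  The
      helpers below are hypothetical illustrations, not genalloc's code:
      one rounds the bitmap up to whole bytes, the other to whole unsigned
      longs.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in one unsigned long on the build machine. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed if the bitmap is sized in whole bytes (the problematic sizing). */
size_t bitmap_bytes_rounded_to_bytes(size_t nbits)
{
    return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Bytes needed if the bitmap is sized in whole unsigned longs. */
size_t bitmap_bytes_rounded_to_longs(size_t nbits)
{
    size_t nlongs = (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

    return nlongs * sizeof(unsigned long);
}
```

      For 70 bits the first helper gives 9 bytes.  Where unsigned long is
      8 bytes wide, the second gives 16 bytes (two longs), so the
      byte-rounded allocation leaves 7 bytes of the second long outside
      the buffer.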
      
      This is a problem in itself if the Least Significant Bit in a long is in
      the byte with the largest address, which happens in Big Endian machines.
      This means genalloc is not allocating the byte in which it will try to
      set or check for a bit.
      
      This may end up in memory corruption, where genalloc will try to set the
      bits it has not allocated.  In fact, genalloc may not set these bits
      because it may find them already set, because they were not zeroed since
      they were not allocated.  And that's what causes a BUG when
      gen_pool_destroy is called and checks for any set bits.
      
      What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
      on gen_pool_add_virt.  With SLAB and SLUB, this means the whole slab
      will be cleared, not only the requested bytes.  Since struct
      gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
      multiples of 8, we get lucky and allocate and clear the right amount of
      bytes.
      
      However, this is not the case with SLOB or with older code that did memset
      after allocating instead of using __GFP_ZERO.
      
      So, a simple module such as this (running 3.6.0) will cause a crash when
      rmmod'ed.
      
        [root@phantom-lp2 foo]# cat foo.c
        #include <linux/kernel.h>
        #include <linux/module.h>
        #include <linux/init.h>
        #include <linux/genalloc.h>
      
        MODULE_LICENSE("GPL");
        MODULE_VERSION("0.1");
      
        static struct gen_pool *foo_pool;
      
        static __init int foo_init(void)
        {
                int ret;
                foo_pool = gen_pool_create(10, -1);
                if (!foo_pool)
                        return -ENOMEM;
                ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
                if (ret) {
                        gen_pool_destroy(foo_pool);
                        return ret;
                }
                return 0;
        }
      
        static __exit void foo_exit(void)
        {
                gen_pool_destroy(foo_pool);
        }
      
        module_init(foo_init);
        module_exit(foo_exit);
        [root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
        CONFIG_SLOB=y
        [root@phantom-lp2 foo]# insmod ./foo.ko
        [root@phantom-lp2 foo]# rmmod foo
        ------------[ cut here ]------------
        kernel BUG at lib/genalloc.c:243!
        cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960]
            pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110
            lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
            sp: c0000000bb0e7be0
           msr: 8000000000029032
          current = 0xc0000000bb0e0000
          paca    = 0xc000000006d30e00   softe: 0        irq_happened: 0x01
            pid   = 13044, comm = rmmod
        kernel BUG at lib/genalloc.c:243!
        [c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo]
        [c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290
        [c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94
        --- Exception: c00 (System Call) at 000000800753d1a0
        SP (fffd0b0e640) is in userspace
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Benjamin Gaignard <benjamin.gaignard@stericsson.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      b4c006c3
    • Steven Rostedt's avatar
      ring-buffer: Fix race between integrity check and readers · 501563ac
      Steven Rostedt authored
      commit 54f7be5b upstream.
      
      The function rb_check_pages() was added to make sure the ring buffer's
      pages were sane. This check is done when the ring buffer size is modified
      as well as when the iterator is released (closing the "trace" file),
      as that was considered a non fast path and a good place to do a sanity
      check.
      
      The problem is that the check does not have any locks around it.
      If one process were to read the trace file, and another were to read
      the raw binary file, the check could happen while the reader is reading
      the file.
      
      The issue is that the check must clear the HEAD page before doing the
      full check, and it restores it afterward. But a reader requires the
      HEAD page to exist before it can read the buffer; otherwise it gives a
      nasty warning and disables the buffer.
      
      By adding the reader lock around the check, this keeps the race from
      happening.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      501563ac
    • Shawn Guo's avatar
      kernel/sys.c: call disable_nonboot_cpus() in kernel_restart() · a289a811
      Shawn Guo authored
      commit f96972f2 upstream.
      
      As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
      have kernel_restart() call disable_nonboot_cpus().  Doing so can help
      machines that require boot cpu be the last alive cpu during reboot to
      survive with kernel restart.
      
      This fixes one reboot issue seen on imx6q (Cortex-A9 Quad).  The machine
      requires that the restart routine be run on the primary cpu rather than
      secondary ones.  Otherwise, the secondary core running the restart
      routine will fail to come online after reboot.
      Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      a289a811
    • Denys Vlasenko's avatar
      coredump: prevent double-free on an error path in core dumper · 8b7435d2
      Denys Vlasenko authored
      commit f34f9d18 upstream.
      
      In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate
      memory for info->fields, it frees already allocated stuff and returns
      error to its caller, fill_note_info.  Which in turn returns error to its
      caller, elf_core_dump.  Which jumps to cleanup label and calls
      free_note_info, which will happily try to free all info->fields again.
      BOOM.
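
      One common way to avoid this kind of double free is sketched below.
      It is not necessarily how the upstream patch does it: the struct,
      field names and the simulated allocation failure are hypothetical.
      The idea is that the init function never frees on failure, and a
      single cleanup function frees everything and resets the pointers.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the note info structure; field names are illustrative. */
struct note_info {
    void *notes;
    void *psinfo;
    void *prstatus;
};

/* Set to nonzero to simulate an allocation failure for the second field. */
static int fail_second_alloc;

static void *alloc_field(size_t size, int is_second)
{
    if (is_second && fail_second_alloc)
        return NULL;
    return malloc(size);
}

/* Returns 1 on success, 0 on failure.  On failure nothing is freed here. */
int note_info_init(struct note_info *info)
{
    info->notes = NULL;
    info->psinfo = NULL;
    info->prstatus = NULL;

    info->notes = alloc_field(64, 0);
    if (!info->notes)
        return 0;
    info->psinfo = alloc_field(64, 1);
    if (!info->psinfo)
        return 0;
    info->prstatus = alloc_field(64, 0);
    if (!info->prstatus)
        return 0;
    return 1;
}

/* The only place that frees; resets pointers so a second call is harmless. */
void note_info_free(struct note_info *info)
{
    free(info->notes);
    free(info->psinfo);
    free(info->prstatus);
    info->notes = NULL;
    info->psinfo = NULL;
    info->prstatus = NULL;
}

/* Simulates the failing path: init fails, then the caller's cleanup runs. */
int failed_init_is_cleaned_up_once(void)
{
    struct note_info info;
    int ok;

    fail_second_alloc = 1;
    ok = note_info_init(&info);
    note_info_free(&info);
    note_info_free(&info);  /* safe: pointers are NULL and free(NULL) is a no-op */
    fail_second_alloc = 0;

    return !ok && !info.notes && !info.psinfo && !info.prstatus;
}
```

      The design choice is single ownership of cleanup: the caller's
      cleanup label stays the only path that releases memory, so a failing
      init cannot leave already-freed pointers behind for it.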
      
      This is the fix.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Venu Byravarasu <vbyravarasu@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      8b7435d2
    • Oleg Nesterov's avatar
      wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task · 16365e5b
      Oleg Nesterov authored
      wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task
      
      CVE-2013-0871
      
      BugLink: http://bugs.launchpad.net/bugs/1129192
      
      wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
      Change it to use TASK_NORMAL and add the WARN_ON().
      
      TASK_ALL has no other users, probably can be killed.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      (backported from commit 9067ac85)
      
      Conflicts:
      	kernel/sched/core.c
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Acked-by: Colin King <colin.king@canonical.com>
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      16365e5b
    • Andrew Morton's avatar
      kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead of SA_RESTORER · 77ae04b3
      Andrew Morton authored
      commit 522cff14 upstream.
      
      __ARCH_HAS_SA_RESTORER is the preferred conditional for use in 3.9 and
      later kernels, per Kees.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Julien Tinnes <jln@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      77ae04b3
    • Ben Hutchings's avatar
      signal: Define __ARCH_HAS_SA_RESTORER so we know whether to clear sa_restorer · e3542c8c
      Ben Hutchings authored
      Vaguely based on upstream commit 574c4866 'consolidate kernel-side
      struct sigaction declarations'.
      
      flush_signal_handlers() needs to know whether sigaction::sa_restorer
      is defined, not whether SA_RESTORER is defined.  Define the
      __ARCH_HAS_SA_RESTORER macro to indicate this.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      e3542c8c
    • Emese Revfy's avatar
      kernel/signal.c: stop info leak via the tkill and the tgkill syscalls · d7328a9b
      Emese Revfy authored
      commit b9e146d8 upstream.
      
      This fixes a kernel memory contents leak via the tkill and tgkill syscalls
      for compat processes.
      
      This is visible in the siginfo_t->_sifields._rt.si_sigval.sival_ptr field
      when handling signals delivered from tkill.
      
      The place of the infoleak:
      
      int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
      {
              ...
              put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr);
              ...
      }
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      Reviewed-by: PaX Team <pageexec@freemail.hu>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
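      The kind of leak described in the tkill/tgkill commit above can be
      illustrated with a user-space mock. The sketch assumes a simplified
      struct mock_siginfo (not the real siginfo_t layout) and a
      hypothetical helper mock_fill_tkill(). The sender fills only the
      short "kill" member of the union, while a later reader looks at the
      longer "rt" member; unless the struct is zeroed first, the reader
      sees whatever was in that memory before.

```c
#include <string.h>

/* Simplified stand-in for siginfo_t: union members of different sizes,
 * so filling one member leaves bytes of the other untouched. */
struct mock_siginfo {
    int si_signo;
    int si_code;
    union {
        struct { int pid; int uid; } kill;                     /* what a tkill-style sender fills */
        struct { int pid; int uid; unsigned long sival; } rt;  /* what the compat copy reads;
                                                                  sival stands in for si_ptr */
    } fields;
};

/* Fill the struct the way a tkill-style sender would.
 * With zero_first == 0, bytes the sender never writes keep their old
 * contents; with zero_first != 0 they are cleared before use. */
static void mock_fill_tkill(struct mock_siginfo *info, int sig, int pid,
                            int zero_first)
{
    if (zero_first)
        memset(info, 0, sizeof(*info));
    info->si_signo = sig;
    info->si_code = -6;          /* illustrative value standing in for SI_TKILL */
    info->fields.kill.pid = pid;
    info->fields.kill.uid = 0;
}
```

      Pre-filling the struct with a byte pattern before calling
      mock_fill_tkill() with zero_first set to 0 leaves that pattern in
      fields.rt.sival, which models stale stack contents reaching the
      reader; with zero_first set to 1 the field reads as zero.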
    • ptrace: Fix ptrace when task is in task_is_stopped() state · 96ec5658
      John Johansen authored
      This patch fixes a regression in ptrace introduced by commit 9e74eb39
      (backport of 9899d11f), which makes assumptions about ptrace behavior
      that do not hold in the 2.6.32 kernel.
      
      BugLink: http://bugs.launchpad.net/bugs/1145234
      
      9899d11f assumes that task_is_stopped() is not a valid state in
      ptrace, because it is built on top of a series of patches that change
      how the TASK_STOPPED state is tracked (321fb561, which requires
      d79fdd6d and several other patches).
      
      Because Lucid does not have the set of patches that make
      task_is_stopped() an invalid state in ptrace_check_attach(), partially
      revert 9e74eb39 so that ptrace_check_attach() correctly handles
      task_is_stopped(). However, we must replace the assignment of
      TASK_TRACED with __TASK_TRACED to ensure TASK_WAKEKILL is cleared.
      Signed-off-by: John Johansen <john.johansen@canonical.com>
      Acked-by: Colin King <colin.king@canonical.com>
      Acked-by: Stefan Bader <stefan.bader@canonical.com>
      Acked-by: Luis Henriques <luis.henriques@canonical.com>
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
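      Why assigning __TASK_TRACED rather than TASK_TRACED matters in the
      task_is_stopped() fix above can be seen from the state bits. The
      sketch below is a user-space model: the MOCK_ constants and
      mock_check_attach() are hypothetical, and the numeric values are
      illustrative, chosen to mirror the usual relationship in which the
      composite state is TASK_WAKEKILL ORed with the bare state bit.

```c
/* Illustrative state bits modelled on the kernel's task-state flags. */
#define MOCK_TASK_STOPPED_BIT 0x04   /* models __TASK_STOPPED */
#define MOCK_TASK_TRACED_BIT  0x08   /* models __TASK_TRACED */
#define MOCK_TASK_WAKEKILL    0x80   /* models TASK_WAKEKILL */

/* Composite states: the bare bit plus "may be woken by a fatal signal". */
#define MOCK_TASK_STOPPED (MOCK_TASK_WAKEKILL | MOCK_TASK_STOPPED_BIT)
#define MOCK_TASK_TRACED  (MOCK_TASK_WAKEKILL | MOCK_TASK_TRACED_BIT)

static int mock_is_stopped(long state)
{
    return (state & MOCK_TASK_STOPPED_BIT) != 0;
}

static int mock_is_traced(long state)
{
    return (state & MOCK_TASK_TRACED_BIT) != 0;
}

/* Accept a tracee that is either stopped or traced and move it to the
 * bare traced state. Returns 0 on success, -1 if the task is in neither
 * state (and then leaves *state unchanged). */
static int mock_check_attach(long *state)
{
    if (mock_is_stopped(*state) || mock_is_traced(*state)) {
        /* Bare bit only: assigning MOCK_TASK_TRACED here would set
         * MOCK_TASK_WAKEKILL again and make the tracee wakeable. */
        *state = MOCK_TASK_TRACED_BIT;
        return 0;
    }
    return -1;
}
```

      Starting from MOCK_TASK_STOPPED, a successful mock_check_attach()
      leaves the state equal to MOCK_TASK_TRACED_BIT with the wakekill
      bit clear.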
    • ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL · 062bdcb3
      Oleg Nesterov authored
      CVE-2013-0871
      
      BugLink: http://bugs.launchpad.net/bugs/1129192
      
      putreg() assumes that the tracee is not running and that
      pt_regs_access() can safely manipulate its stack.  However, a killed
      tracee can return from ptrace_stop() to the low-level asm code and do
      RESTORE_REST, which means that the debugger can actually read or
      modify the kernel stack until the tracee does SAVE_REST again.
      
      set_task_blockstep() can race with SIGKILL too, and in some sense this
      race is even worse: the very fact that the tracee can be woken up
      breaks the logic.
      
      As Linus suggested, we can clear TASK_WAKEKILL around the arch_ptrace()
      call; this ensures that nobody can ever wake up the tracee while the
      debugger looks at it.  Not only does this fix the problems mentioned
      above, it also allows some cleanups and simplifications in the
      arch_ptrace() paths.
      
      ptrace_unfreeze_traced() probably needs more callers; for example, it
      makes sense to make the tracee killable by the oom-killer before
      access_process_vm().
      
      While at it, add a comment to may_ptrace_stop() to explain why
      ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
      Reported-by: Salman Qazi <sqazi@google.com>
      Reported-by: Suleiman Souhlal <suleiman@google.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      (backported from commit 9899d11f)
      
      Conflicts:
      	arch/x86/kernel/step.c
      	kernel/ptrace.c
      	kernel/signal.c
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Acked-by: Colin King <colin.king@canonical.com>
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
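      The idea of clearing TASK_WAKEKILL around the arch_ptrace() call,
      described in the SIGKILL race commit above, can be modelled as a
      small state machine. This is only a sketch under stated
      assumptions: struct mock_task, mock_freeze_traced(),
      mock_unfreeze_traced() and mock_send_sigkill() are hypothetical
      user-space stand-ins, the bit values are illustrative, and the
      locking the kernel needs is omitted.

```c
#define MOCK_TRACED_BIT   0x08   /* models __TASK_TRACED */
#define MOCK_WAKEKILL     0x80   /* models TASK_WAKEKILL */
#define MOCK_TASK_TRACED  (MOCK_WAKEKILL | MOCK_TRACED_BIT)
#define MOCK_TASK_RUNNING 0

struct mock_task {
    long state;
    int sigkill_pending;
};

/* Deliver SIGKILL: always mark it pending, but wake the task only if
 * its state allows wakeups by fatal signals. Returns 1 if woken. */
static int mock_send_sigkill(struct mock_task *t)
{
    t->sigkill_pending = 1;
    if (t->state & MOCK_WAKEKILL) {
        t->state = MOCK_TASK_RUNNING;
        return 1;
    }
    return 0;
}

/* Before the debugger touches the tracee: drop the wakekill bit so the
 * tracee cannot start running underneath it. Fails (returns 0) if the
 * tracee is not traced or has already been killed. */
static int mock_freeze_traced(struct mock_task *t)
{
    if (t->state != MOCK_TASK_TRACED || t->sigkill_pending)
        return 0;
    t->state = MOCK_TRACED_BIT;
    return 1;
}

/* After the debugger is done: make the tracee killable again, and let a
 * SIGKILL that arrived in the meantime take effect. */
static void mock_unfreeze_traced(struct mock_task *t)
{
    if (t->state != MOCK_TRACED_BIT)
        return;
    if (t->sigkill_pending)
        t->state = MOCK_TASK_RUNNING;
    else
        t->state = MOCK_TASK_TRACED;
}
```

      In this model a SIGKILL sent between mock_freeze_traced() and
      mock_unfreeze_traced() is recorded but does not wake the tracee;
      the wakeup is deferred to the unfreeze step, so the debugger never
      inspects a task that is running.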