1. 06 Jun, 2015 25 commits
    • David Vrabel's avatar
      xen/events: don't bind non-percpu VIRQs with percpu chip · a4b513b2
      David Vrabel authored
      commit 77bb3dfd upstream.
      
      A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different
      VCPU than it is bound to.  This can result in a race between
      handle_percpu_irq() and removing the action in __free_irq() because
      handle_percpu_irq() does not take desc->lock.  The interrupt handler
      sees a NULL action and oopses.
      
      Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER).
      
        # cat /proc/interrupts | grep virq
         40:      87246          0  xen-percpu-virq      timer0
         44:          0          0  xen-percpu-virq      debug0
         47:          0      20995  xen-percpu-virq      timer1
         51:          0          0  xen-percpu-virq      debug1
         69:          0          0   xen-dyn-virq      xen-pcpu
         74:          0          0   xen-dyn-virq      mce
         75:         29          0   xen-dyn-virq      hvc_console
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a4b513b2
    • Filipe Manana's avatar
      Btrfs: fix racy system chunk allocation when setting block group ro · 68482f12
      Filipe Manana authored
      commit a9629596 upstream.
      
      If while setting a block group read-only we end up allocating a system
      chunk, through check_system_chunk(), we were not doing it while holding
      the chunk mutex which is a problem if a concurrent chunk allocation is
      happening, through do_chunk_alloc(), as it means both block groups can
      end up using the same logical addresses and physical regions in the
      device(s). So make sure we hold the chunk mutex.
      
      Fixes: 2f081088 ("btrfs: delete chunk allocation attemp when
                            setting block group ro")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68482f12
    • Ilya Dryomov's avatar
      Revert "libceph: clear r_req_lru_item in __unregister_linger_request()" · f0034a3a
      Ilya Dryomov authored
      commit 521a04d0 upstream.
      
      This reverts commit ba9d114e.
      
      .. which introduced a regression that prevented all lingering requests
      requeued in kick_requests() from ever being sent to the OSDs, resulting
      in a lot of missed notifies.  In retrospect it's pretty obvious that
      r_req_lru_item item in the case of lingering requests can be used not
      only for notarget, but also for unsent linkage due to how tightly
      actual map and enqueue operations are coupled in __map_request().
      
      The assertion that was being silenced is taken care of in the previous
      ("libceph: request a new osdmap if lingering request maps to no osd")
      commit: by always kicking homeless lingering requests we ensure that
      none of them ends up on the notarget list outside of the critical
      section guarded by request_mutex.
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0034a3a
    • Ilya Dryomov's avatar
      libceph: request a new osdmap if lingering request maps to no osd · 5b4715d1
      Ilya Dryomov authored
      commit b0494532 upstream.
      
      This commit does two things.  First, if there are any homeless
      lingering requests, we now request a new osdmap even if the osdmap that
      is being processed brought no changes, i.e. if a given lingering
      request turned homeless in one of the previous epochs and remained
      homeless in the current epoch.  Not doing so leaves us with a stale
      osdmap and as a result we may miss our window for reestablishing the
      watch and lose notifies.
      
      MON=1 OSD=1:
      
          # cat linger-needmap.sh
          #!/bin/bash
          rbd create --size 1 test
          DEV=$(rbd map test)
          ceph osd out 0
          rbd map dne/dne # obtain a new osdmap as a side effect (!)
          sleep 1
          ceph osd in 0
          rbd resize --size 2 test
          # rbd info test | grep size -> 2M
          # blockdev --getsize $DEV -> 1M
      
      N.B.: Not obtaining a new osdmap in between "osd out" and "osd in"
      above is enough to make it miss that resize notify, but that is a
      bug^Wlimitation of ceph watch/notify v1.
      
      Second, homeless lingering requests are now kicked just like those
      lingering requests whose mapping has changed.  This is mainly to
      recognize that a homeless lingering request makes no sense and to
      preserve the invariant that a registered lingering request is not
      sitting on any of r_req_lru_item lists.  This spares us a WARN_ON,
      which commit ba9d114e ("libceph: clear r_req_lru_item in
      __unregister_linger_request()") tried to fix the _wrong_ way.
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b4715d1
    • Johan Hovold's avatar
      mfd: da9052: Fix broken regulator probe · 5b465771
      Johan Hovold authored
      commit e0c21530 upstream.
      
      Fix broken probe of da9052 regulators, which since commit b3f6c73d
      ("mfd: da9052-core: Fix platform-device id collision") use a
      non-deterministic platform-device id to retrieve static regulator
      information. Fortunately, adequate error handling was in place so probe
      would simply fail with an error message.
      
      Update the mfd-cell ids to be zero-based and use those to identify the
      cells when probing the regulator devices.
      
      Fixes: b3f6c73d ("mfd: da9052-core: Fix platform-device id collision")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Acked-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Reviewed-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b465771
    • Miklos Szeredi's avatar
      ovl: mount read-only if workdir can't be created · 5a0408db
      Miklos Szeredi authored
      commit cc6f67bc upstream.
      
      OpenWRT folks reported that overlayfs fails to mount if upper fs is full,
      because workdir can't be created.  Wordir creation can fail for various
      other reasons too.
      
      There's no reason that the mount itself should fail, overlayfs can work
      fine without a workdir, as long as the overlay isn't modified.
      
      So mount it read-only and don't allow remounting read-write.
      
      Add a couple of WARN_ON()s for the impossible case of workdir being used
      despite being read-only.
      Reported-by: default avatarBastian Bittorf <bittorf@bluebottle.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a0408db
    • Miklos Szeredi's avatar
      ovl: don't remove non-empty opaque directory · 5738c250
      Miklos Szeredi authored
      commit d377c5eb upstream.
      
      When removing an opaque directory we can't just call rmdir() to check for
      emptiness, because the directory will need to be replaced with a whiteout.
      The replacement is done with RENAME_EXCHANGE, which doesn't check
      emptiness.
      
      Solution is just to check emptiness by reading the directory.  In the
      future we could add a new rename flag to check for emptiness even for
      RENAME_EXCHANGE to optimize this case.
      Reported-by: default avatarVincent Batts <vbatts@gmail.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Tested-by: default avatarJordi Pujol Palomer <jordipujolp@gmail.com>
      Fixes: 263b4a0f ("ovl: dont replace opaque dir")
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5738c250
    • Rusty Russell's avatar
      lguest: fix out-by-one error in address checking. · b8eadc72
      Rusty Russell authored
      commit 83a35114 upstream.
      
      This bug has been there since day 1; addresses in the top guest physical
      page weren't considered valid.  You could map that page (the check in
      check_gpte() is correct), but if a guest tried to put a pagetable there
      we'd check that address manually when walking it, and kill the guest.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8eadc72
    • Dave Chinner's avatar
      xfs: xfs_iozero can return positive errno · d5d4c3be
      Dave Chinner authored
      commit cddc1162 upstream.
      
      It was missed when we converted everything in XFs to use negative error
      numbers, so fix it now. Bug introduced in 3.17 by commit 2451337d ("xfs: global
      error sign conversion"), and should go back to stable kernels.
      
      Thanks to Brian Foster for noticing it.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5d4c3be
    • Dave Chinner's avatar
      xfs: xfs_attr_inactive leaves inconsistent attr fork state behind · d6ae1895
      Dave Chinner authored
      commit 6dfe5a04 upstream.
      
      xfs_attr_inactive() is supposed to clean up the attribute fork when
      the inode is being freed. While it removes attribute fork extents,
      it completely ignores attributes in local format, which means that
      there can still be active attributes on the inode after
      xfs_attr_inactive() has run.
      
      This leads to problems with concurrent inode writeback - the in-core
      inode attribute fork is removed without locking on the assumption
      that nothing will be attempting to access the attribute fork after a
      call to xfs_attr_inactive() because it isn't supposed to exist on
      disk any more.
      
      To fix this, make xfs_attr_inactive() completely remove all traces
      of the attribute fork from the inode, regardless of it's state.
      Further, also remove the in-core attribute fork structure safely so
      that there is nothing further that needs to be done by callers to
      clean up the attribute fork. This means we can remove the in-core
      and on-disk attribute forks atomically.
      
      Also, on error simply remove the in-memory attribute fork. There's
      nothing that can be done with it once we have failed to remove the
      on-disk attribute fork, so we may as well just blow it away here
      anyway.
      Reported-by: default avatarWaiman Long <waiman.long@hp.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6ae1895
    • Bob Copeland's avatar
      omfs: fix sign confusion for bitmap loop counter · a3593474
      Bob Copeland authored
      commit c0345ee5 upstream.
      
      The count variable is used to iterate down to (below) zero from the size
      of the bitmap and handle the one-filling the remainder of the last
      partial bitmap block.  The loop conditional expects count to be signed
      in order to detect when the final block is processed, after which count
      goes negative.
      
      Unfortunately, a recent change made this unsigned along with some other
      related fields.  The result of is this is that during mount,
      omfs_get_imap will overrun the bitmap array and corrupt memory unless
      number of blocks happens to be a multiple of 8 * blocksize.
      
      Fix by changing count back to signed: it is guaranteed to fit in an s32
      without overflow due to an enforced limit on the number of blocks in the
      filesystem.
      Signed-off-by: default avatarBob Copeland <me@bobcopeland.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a3593474
    • Sasha Levin's avatar
      fs, omfs: add NULL terminator in the end up the token list · 4215727c
      Sasha Levin authored
      commit dcbff39d upstream.
      
      match_token() expects a NULL terminator at the end of the token list so
      that it would know where to stop.  Not having one causes it to overrun
      to invalid memory.
      
      In practice, passing a mount option that omfs didn't recognize would
      sometimes panic the system.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarBob Copeland <me@bobcopeland.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4215727c
    • John Stultz's avatar
      ktime: Fix ktime_divns to do signed division · 1d280829
      John Stultz authored
      commit f7bcb70e upstream.
      
      It was noted that the 32bit implementation of ktime_divns()
      was doing unsigned division and didn't properly handle
      negative values.
      
      And when a ktime helper was changed to utilize
      ktime_divns, it caused a regression on some IR blasters.
      See the following bugzilla for details:
        https://bugzilla.redhat.com/show_bug.cgi?id=1200353
      
      This patch fixes the problem in ktime_divns by checking
      and preserving the sign bit, and then reapplying it if
      appropriate after the division, it also changes the return
      type to a s64 to make it more obvious this is expected.
      
      Nicolas also pointed out that negative dividers would
      cause infinite loops on 32bit systems, negative dividers
      is unlikely for users of this function, but out of caution
      this patch adds checks for negative dividers for both
      32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure
      no such use cases creep in.
      
      [ tglx: Hand an u64 to do_div() to avoid the compiler warning ]
      
      Fixes: 166afb64 'ktime: Sanitize ktime_to_us/ms conversion'
      Reported-and-tested-by: default avatarTrevor Cordes <trevor@tecnopolis.ca>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Acked-by: default avatarNicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.orgSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d280829
    • Liang Li's avatar
      kvm/fpu: Enable eager restore kvm FPU for MPX · aeeaad9e
      Liang Li authored
      commit c447e76b upstream.
      
      The MPX feature requires eager KVM FPU restore support. We have verified
      that MPX cannot work correctly with the current lazy KVM FPU restore
      mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
      exposed to VM.
      Signed-off-by: default avatarYang Zhang <yang.z.zhang@intel.com>
      Signed-off-by: default avatarLiang Li <liang.z.li@intel.com>
      [Also activate the FPU on AMD processors. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aeeaad9e
    • Xiao Guangrong's avatar
      KVM: MMU: fix SMAP virtualization · dbedd5a5
      Xiao Guangrong authored
      commit 0be0226f upstream.
      
      KVM may turn a user page to a kernel page when kernel writes a readonly
      user page if CR0.WP = 1. This shadow page entry will be reused after
      SMAP is enabled so that kernel is allowed to access this user page
      
      Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
      once CR4.SMAP is updated
      Signed-off-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbedd5a5
    • Andrea Arcangeli's avatar
      kvm: fix crash in kvm_vcpu_reload_apic_access_page · 211b28d7
      Andrea Arcangeli authored
      commit e8fd5e9e upstream.
      
      memslot->userfault_addr is set by the kernel with a mmap executed
      from the kernel but the userland can still munmap it and lead to the
      below oops after memslot->userfault_addr points to a host virtual
      address that has no vma or mapping.
      
      [  327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe
      [  327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50
      [  327.538474] PGD 1a01067 PUD 1a03067 PMD 0
      [  327.538529] Oops: 0000 [#1] SMP
      [  327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me
      [  327.539488]  mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod
      [  327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1
      [  327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014
      [  327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000
      [  327.540184] RIP: 0010:[<ffffffff811a7b55>]  [<ffffffff811a7b55>] put_page+0x5/0x50
      [  327.540261] RSP: 0018:ffff880317c5bcf8  EFLAGS: 00010246
      [  327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000
      [  327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe
      [  327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000
      [  327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe
      [  327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50
      [  327.540643] FS:  00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000
      [  327.540717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0
      [  327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  327.540974] Stack:
      [  327.541008]  ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0
      [  327.541093]  ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d
      [  327.541177]  00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000
      [  327.541261] Call Trace:
      [  327.541321]  [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm]
      [  327.543615]  [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm]
      [  327.545918]  [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm]
      [  327.548211]  [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm]
      [  327.550500]  [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm]
      [  327.552768]  [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50
      [  327.555069]  [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30
      [  327.557373]  [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80
      [  327.559663]  [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530
      [  327.561917]  [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0
      [  327.564185]  [<ffffffff816de829>] system_call_fastpath+0x16/0x1b
      [  327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      211b28d7
    • Xiao Guangrong's avatar
      KVM: MMU: fix smap permission check · 496862ce
      Xiao Guangrong authored
      commit 7cbeed9b upstream.
      
      Current permission check assumes that RSVD bit in PFEC is always zero,
      however, it is not true since MMIO #PF will use it to quickly identify
      MMIO access
      
      Fix it by clearing the bit if walking guest page table is needed
      Signed-off-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      496862ce
    • Paolo Bonzini's avatar
      KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages · b3aa70cd
      Paolo Bonzini authored
      commit 89876115 upstream.
      
      smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages
      should not be reused for different values of it.  Thus, it has to be
      added to the mask in kvm_mmu_pte_write.
      Reviewed-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3aa70cd
    • Ingo Molnar's avatar
      x86/fpu: Disable XSAVES* support for now · 8e592180
      Ingo Molnar authored
      commit e88221c5 upstream.
      
      The kernel's handling of 'compacted' xsave state layout is buggy:
      
          http://marc.info/?l=linux-kernel&m=142967852317199
      
      I don't have such a system, and the description there is vague, but
      from extrapolation I guess that there were two kinds of bugs
      observed:
      
        - boot crashes, due to size calculations being wrong and the dynamic
          allocation allocating a too small xstate area. (This is now fixed
          in the new FPU code - but still present in stable kernels.)
      
        - FPU state corruption and ABI breakage: if signal handlers try to
          change the FPU state in standard format, which then the kernel
          tries to restore in the compacted format.
      
      These breakages are scary, but they only occur on a small number of
      systems that have XSAVES* CPU support. Yet we have had XSAVES support
      in the upstream kernel for a large number of stable kernel releases,
      and the fixes are involved and unproven.
      
      So do the safe resolution first: disable XSAVES* support and only
      use the standard xstate format. This makes the code work and is
      easy to backport.
      
      On top of this we can work on enabling (and testing!) proper
      compacted format support, without backporting pressure, on top of the
      new, cleaned up FPU code.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e592180
    • Borislav Petkov's avatar
      x86/mce: Fix MCE severity messages · 270314c6
      Borislav Petkov authored
      commit 17fea54b upstream.
      
      Derek noticed that a critical MCE gets reported with the wrong
      error type description:
      
        [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0
        [Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170}
        [Hardware Error]: TSC 49587b8e321cb
        [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29
        [Hardware Error]: Some CPUs didn't answer in synchronization
        [Hardware Error]: Machine check: Invalid
      				   ^^^^^^^
      
      The last line with 'Invalid' should have printed the high level
      MCE error type description we get from mce_severity, i.e.
      something like:
      
        [Hardware Error]: Machine check: Action required: data load error in a user process
      
      this happens due to the fact that mce_no_way_out() iterates over
      all MCA banks and possibly overwrites the @msg argument which is
      used in the panic printing later.
      
      Change behavior to take the message of only and the (last)
      critical MCE it detects.
      Reported-by: default avatarDerek <denc716@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      270314c6
    • Paolo Bonzini's avatar
      Revert "KVM: x86: drop fpu_activate hook" · 42a60630
      Paolo Bonzini authored
      commit 0fdd74f7 upstream.
      
      This reverts commit 4473b570.  We'll
      use the hook again.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42a60630
    • Will Deacon's avatar
      iommu/arm-smmu: Fix sign-extension of upstream bus addresses at stage 1 · 4c951487
      Will Deacon authored
      commit 5dc5616e upstream.
      
      Stage 1 translation is controlled by two sets of page tables (TTBR0 and
      TTBR1) which grow up and down from zero respectively in the ARMv8
      translation regime. For the SMMU, we only care about TTBR0 and, in the
      case of a 48-bit virtual space, we expect to map virtual addresses 0x0
      through to 0xffff_ffff_ffff.
      
      Given that some masters may be incapable of emitting virtual addresses
      targetting TTBR1 (e.g. because they sit on a 48-bit bus), the SMMU
      architecture allows bit 47 to be sign-extended, halving the virtual
      range of TTBR0 but allowing TTBR1 to be used. This is controlled by the
      SEP field in TTBCR2.
      
      The SMMU driver incorrectly enables this sign-extension feature, which
      causes problems when userspace addresses are programmed into a master
      device with the SMMU expecting to map the incoming transactions via
      TTBR0; if the top bit of address is set, we will instead get a
      translation fault since TTBR1 walks are disabled in the TTBCR.
      
      This patch fixes the issue by disabling sign-extension of a fixed
      virtual address bit and instead basing the behaviour on the upstream bus
      size: the incoming address is zero extended unless the upstream bus is
      only 49 bits wide, in which case bit 48 is used as the sign bit and is
      replicated to the upper bits.
      Reported-by: default avatarVarun Sethi <varun.sethi@freescale.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c951487
    • Oded Gabbay's avatar
      iommu/amd: Fix bug in put_pasid_state_wait · 9f453d6e
      Oded Gabbay authored
      commit 1bf1b431 upstream.
      
      This patch fixes a bug in put_pasid_state_wait that appeared in kernel 4.0
      The bug is that pasid_state->count wasn't decremented before entering the
      wait_event. Thus, the condition in wait_event will never be true.
      
      The fix is to decrement (atomically) the pasid_state->count before the
      wait_event.
      Signed-off-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f453d6e
    • Eric W. Biederman's avatar
      fs_pin: Allow for the possibility that m_list or s_list go unused. · ef20854f
      Eric W. Biederman authored
      commit 820f9f14 upstream.
      
      This is needed to support lazily umounting locked mounts.  Because the
      entire unmounted subtree needs to stay together until there are no
      users with references to any part of the subtree.
      
      To support this guarantee that the fs_pin m_list and s_list nodes
      are initialized by initializing them in init_fs_pin allowing
      for the possibility that pin_insert_group does not touch them.
      
      Further use hlist_del_init in pin_remove so that there is
      a hlist_unhashed test before the list we attempt to update
      the previous list item.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef20854f
    • Eric W. Biederman's avatar
      mnt: Fail collect_mounts when applied to unmounted mounts · 9993cbfd
      Eric W. Biederman authored
      commit cd4a4017 upstream.
      
      The only users of collect_mounts are in audit_tree.c
      
      In audit_trim_trees and audit_add_tree_rule the path passed into
      collect_mounts is generated from kern_path passed an audit_tree
      pathname which is guaranteed to be an absolute path.   In those cases
      collect_mounts is obviously intended to work on mounted paths and
      if a race results in paths that are unmounted when collect_mounts
      it is reasonable to fail early.
      
      The paths passed into audit_tag_tree don't have the absolute path
      check.  But are used to play with fsnotify and otherwise interact with
      the audit_trees, so again operating only on mounted paths appears
      reasonable.
      
      Avoid having to worry about what happens when we try and audit
      unmounted filesystems by restricting collect_mounts to mounts
      that appear in the mount tree.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9993cbfd
  2. 17 May, 2015 15 commits