1. 06 Jun, 2015 11 commits
    • Xiao Guangrong's avatar
      KVM: MMU: fix SMAP virtualization · dbedd5a5
      Xiao Guangrong authored
      commit 0be0226f upstream.
      
      KVM may turn a user page to a kernel page when kernel writes a readonly
      user page if CR0.WP = 1. This shadow page entry will be reused after
      SMAP is enabled so that kernel is allowed to access this user page
      
      Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
      once CR4.SMAP is updated
      Signed-off-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbedd5a5
    • Andrea Arcangeli's avatar
      kvm: fix crash in kvm_vcpu_reload_apic_access_page · 211b28d7
      Andrea Arcangeli authored
      commit e8fd5e9e upstream.
      
      memslot->userfault_addr is set by the kernel with a mmap executed
      from the kernel but the userland can still munmap it and lead to the
      below oops after memslot->userfault_addr points to a host virtual
      address that has no vma or mapping.
      
      [  327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe
      [  327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50
      [  327.538474] PGD 1a01067 PUD 1a03067 PMD 0
      [  327.538529] Oops: 0000 [#1] SMP
      [  327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me
      [  327.539488]  mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod
      [  327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1
      [  327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014
      [  327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000
      [  327.540184] RIP: 0010:[<ffffffff811a7b55>]  [<ffffffff811a7b55>] put_page+0x5/0x50
      [  327.540261] RSP: 0018:ffff880317c5bcf8  EFLAGS: 00010246
      [  327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000
      [  327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe
      [  327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000
      [  327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe
      [  327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50
      [  327.540643] FS:  00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000
      [  327.540717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0
      [  327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  327.540974] Stack:
      [  327.541008]  ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0
      [  327.541093]  ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d
      [  327.541177]  00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000
      [  327.541261] Call Trace:
      [  327.541321]  [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm]
      [  327.543615]  [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm]
      [  327.545918]  [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm]
      [  327.548211]  [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm]
      [  327.550500]  [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm]
      [  327.552768]  [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50
      [  327.555069]  [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30
      [  327.557373]  [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80
      [  327.559663]  [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530
      [  327.561917]  [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0
      [  327.564185]  [<ffffffff816de829>] system_call_fastpath+0x16/0x1b
      [  327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      211b28d7
    • Xiao Guangrong's avatar
      KVM: MMU: fix smap permission check · 496862ce
      Xiao Guangrong authored
      commit 7cbeed9b upstream.
      
      Current permission check assumes that RSVD bit in PFEC is always zero,
      however, it is not true since MMIO #PF will use it to quickly identify
      MMIO access
      
      Fix it by clearing the bit if walking guest page table is needed
      Signed-off-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      496862ce
    • Paolo Bonzini's avatar
      KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages · b3aa70cd
      Paolo Bonzini authored
      commit 89876115 upstream.
      
      smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages
      should not be reused for different values of it.  Thus, it has to be
      added to the mask in kvm_mmu_pte_write.
      Reviewed-by: default avatarXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3aa70cd
    • Ingo Molnar's avatar
      x86/fpu: Disable XSAVES* support for now · 8e592180
      Ingo Molnar authored
      commit e88221c5 upstream.
      
      The kernel's handling of 'compacted' xsave state layout is buggy:
      
          http://marc.info/?l=linux-kernel&m=142967852317199
      
      I don't have such a system, and the description there is vague, but
      from extrapolation I guess that there were two kinds of bugs
      observed:
      
        - boot crashes, due to size calculations being wrong and the dynamic
          allocation allocating a too small xstate area. (This is now fixed
          in the new FPU code - but still present in stable kernels.)
      
        - FPU state corruption and ABI breakage: if signal handlers try to
          change the FPU state in standard format, which then the kernel
          tries to restore in the compacted format.
      
      These breakages are scary, but they only occur on a small number of
      systems that have XSAVES* CPU support. Yet we have had XSAVES support
      in the upstream kernel for a large number of stable kernel releases,
      and the fixes are involved and unproven.
      
      So do the safe resolution first: disable XSAVES* support and only
      use the standard xstate format. This makes the code work and is
      easy to backport.
      
      On top of this we can work on enabling (and testing!) proper
      compacted format support, without backporting pressure, on top of the
      new, cleaned up FPU code.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e592180
    • Borislav Petkov's avatar
      x86/mce: Fix MCE severity messages · 270314c6
      Borislav Petkov authored
      commit 17fea54b upstream.
      
      Derek noticed that a critical MCE gets reported with the wrong
      error type description:
      
        [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0
        [Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170}
        [Hardware Error]: TSC 49587b8e321cb
        [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29
        [Hardware Error]: Some CPUs didn't answer in synchronization
        [Hardware Error]: Machine check: Invalid
      				   ^^^^^^^
      
      The last line with 'Invalid' should have printed the high level
      MCE error type description we get from mce_severity, i.e.
      something like:
      
        [Hardware Error]: Machine check: Action required: data load error in a user process
      
      this happens due to the fact that mce_no_way_out() iterates over
      all MCA banks and possibly overwrites the @msg argument which is
      used in the panic printing later.
      
      Change behavior to take the message of only and the (last)
      critical MCE it detects.
      Reported-by: default avatarDerek <denc716@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      270314c6
    • Paolo Bonzini's avatar
      Revert "KVM: x86: drop fpu_activate hook" · 42a60630
      Paolo Bonzini authored
      commit 0fdd74f7 upstream.
      
      This reverts commit 4473b570.  We'll
      use the hook again.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42a60630
    • Will Deacon's avatar
      iommu/arm-smmu: Fix sign-extension of upstream bus addresses at stage 1 · 4c951487
      Will Deacon authored
      commit 5dc5616e upstream.
      
      Stage 1 translation is controlled by two sets of page tables (TTBR0 and
      TTBR1) which grow up and down from zero respectively in the ARMv8
      translation regime. For the SMMU, we only care about TTBR0 and, in the
      case of a 48-bit virtual space, we expect to map virtual addresses 0x0
      through to 0xffff_ffff_ffff.
      
      Given that some masters may be incapable of emitting virtual addresses
      targetting TTBR1 (e.g. because they sit on a 48-bit bus), the SMMU
      architecture allows bit 47 to be sign-extended, halving the virtual
      range of TTBR0 but allowing TTBR1 to be used. This is controlled by the
      SEP field in TTBCR2.
      
      The SMMU driver incorrectly enables this sign-extension feature, which
      causes problems when userspace addresses are programmed into a master
      device with the SMMU expecting to map the incoming transactions via
      TTBR0; if the top bit of address is set, we will instead get a
      translation fault since TTBR1 walks are disabled in the TTBCR.
      
      This patch fixes the issue by disabling sign-extension of a fixed
      virtual address bit and instead basing the behaviour on the upstream bus
      size: the incoming address is zero extended unless the upstream bus is
      only 49 bits wide, in which case bit 48 is used as the sign bit and is
      replicated to the upper bits.
      Reported-by: default avatarVarun Sethi <varun.sethi@freescale.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c951487
    • Oded Gabbay's avatar
      iommu/amd: Fix bug in put_pasid_state_wait · 9f453d6e
      Oded Gabbay authored
      commit 1bf1b431 upstream.
      
      This patch fixes a bug in put_pasid_state_wait that appeared in kernel 4.0
      The bug is that pasid_state->count wasn't decremented before entering the
      wait_event. Thus, the condition in wait_event will never be true.
      
      The fix is to decrement (atomically) the pasid_state->count before the
      wait_event.
      Signed-off-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f453d6e
    • Eric W. Biederman's avatar
      fs_pin: Allow for the possibility that m_list or s_list go unused. · ef20854f
      Eric W. Biederman authored
      commit 820f9f14 upstream.
      
      This is needed to support lazily umounting locked mounts.  Because the
      entire unmounted subtree needs to stay together until there are no
      users with references to any part of the subtree.
      
      To support this guarantee that the fs_pin m_list and s_list nodes
      are initialized by initializing them in init_fs_pin allowing
      for the possibility that pin_insert_group does not touch them.
      
      Further use hlist_del_init in pin_remove so that there is
      a hlist_unhashed test before the list we attempt to update
      the previous list item.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef20854f
    • Eric W. Biederman's avatar
      mnt: Fail collect_mounts when applied to unmounted mounts · 9993cbfd
      Eric W. Biederman authored
      commit cd4a4017 upstream.
      
      The only users of collect_mounts are in audit_tree.c
      
      In audit_trim_trees and audit_add_tree_rule the path passed into
      collect_mounts is generated from kern_path passed an audit_tree
      pathname which is guaranteed to be an absolute path.   In those cases
      collect_mounts is obviously intended to work on mounted paths and
      if a race results in paths that are unmounted when collect_mounts
      it is reasonable to fail early.
      
      The paths passed into audit_tag_tree don't have the absolute path
      check.  But are used to play with fsnotify and otherwise interact with
      the audit_trees, so again operating only on mounted paths appears
      reasonable.
      
      Avoid having to worry about what happens when we try and audit
      unmounted filesystems by restricting collect_mounts to mounts
      that appear in the mount tree.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9993cbfd
  2. 17 May, 2015 29 commits