  1. 26 Jan, 2022 5 commits
    • KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests · 55467fcd
      Sean Christopherson authored
      Always signal that emulation is possible for !SEV guests regardless of
      whether or not the CPU provided a valid instruction byte stream.  KVM can
      read all guest state (memory and registers) for !SEV guests, i.e. can
      fetch the code stream from memory even if the CPU failed to do so because
      of the SMAP errata.
      
      Fixes: 05d5a486 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
      Cc: stable@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
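
The decision described in this commit can be sketched as a small stand-alone C model. This is a minimal illustration, not the kernel code: `struct guest`, `can_emulate()` and the `is_sev` field are hypothetical names, and the SEV branch is reduced to "refuse when no instruction bytes were provided", which is an assumption made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a guest; not a KVM structure. */
struct guest {
	bool is_sev;	/* guest memory is encrypted, the host cannot read it */
};

/*
 * Return true if emulation may proceed.  insn_len is the number of
 * instruction bytes the CPU provided (0 when the erratum hits).
 */
static bool can_emulate(const struct guest *g, size_t insn_len)
{
	assert(g != NULL);

	/* !SEV: the host can fetch the code stream from guest memory itself. */
	if (!g->is_sev)
		return true;

	/* SEV (assumed simplification): depend on the CPU-provided bytes. */
	return insn_len != 0;
}
```

For a !SEV guest the result does not depend on `insn_len` at all, which is the point of the fix.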
    • KVM: x86: nSVM: skip eax alignment check for non-SVM instructions · 47c28d43
      Denis Valeev authored
      The bug occurs when the VMware backdoor triggers a #GP while the eax
      value is unaligned. The eax alignment check should not be applied to
      non-SVM instructions, because it causes emulation of those instructions
      to be skipped incorrectly.
      Apply the alignment check only to SVM instructions to fix this.
      
      Fixes: d1cba6c9 ("KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround")
      Signed-off-by: Denis Valeev <lemniscattaden@gmail.com>
      Message-Id: <Yexlhaoe1Fscm59u@q>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
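
The ordering this commit describes (classify the instruction first, then check alignment only for SVM instructions) can be sketched as follows. This is a simplified model under stated assumptions: the enums, `handle_gp()` and its parameters are hypothetical names for illustration and do not mirror the kernel's #GP handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two for the mask below");

/* Hypothetical classification of the instruction that took the #GP. */
enum gp_insn { INSN_NOT_SVM, INSN_VMRUN, INSN_VMLOAD, INSN_VMSAVE };

/* Hypothetical outcomes of handling the #GP. */
enum gp_action { GP_REINJECT, GP_EMULATE_BACKDOOR, GP_EMULATE_SVM };

static enum gp_action handle_gp(enum gp_insn insn, uint64_t rax,
				bool vmware_backdoor)
{
	if (insn == INSN_NOT_SVM) {
		/* No alignment check: eax is an ordinary register here. */
		return vmware_backdoor ? GP_EMULATE_BACKDOOR : GP_REINJECT;
	}

	/* SVM instructions take a 4K-aligned address in rax/eax. */
	if (rax & (SKETCH_PAGE_SIZE - 1))
		return GP_REINJECT;

	return GP_EMULATE_SVM;
}
```

With the check placed after the classification, an unaligned eax no longer prevents emulation of a non-SVM instruction.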
    • KVM: x86/cpuid: Exclude unpermitted xfeatures sizes at KVM_GET_SUPPORTED_CPUID · 1ffce092
      Like Xu authored
      With the help of xstate_get_guest_group_perm(), KVM can exclude unpermitted
      xfeatures in cpuid.0xd.0.eax, in which case the reported xfeature sizes
      should also match the permitted xfeatures.
      
      To fix this inconsistency, define permitted_xcr0 and permitted_xss
      consistently, i.e. as what is 'supported' combined with the permissions
      that apply to this task. This also fixes cpuid.0xd.1.ebx and the later
      leaf-by-leaf queries.
      
      Fixes: 445ecdf7 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID")
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20220125115223.33707-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
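
The idea that both the feature mask and the reported size should be derived from the same permitted set can be sketched like this. It is an illustrative model only: `permitted_xfeatures()`, `xsave_size()` and `struct xfeature_layout` are hypothetical names, the layout table is supplied by the caller, and the size rule assumes the standard (non-compacted) XSAVE format with a 512-byte legacy region and a 64-byte header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-feature layout entry (offset and size in bytes). */
struct xfeature_layout {
	uint32_t offset;
	uint32_t size;
};

/* Features the guest may see: supported by the host AND permitted. */
static uint64_t permitted_xfeatures(uint64_t supported, uint64_t guest_perm)
{
	return supported & guest_perm;
}

/*
 * Size of a standard-format XSAVE area covering 'features'.  Bits 0 and 1
 * live in the legacy region, so only bits >= 2 can extend the area.
 */
static uint32_t xsave_size(uint64_t features,
			   const struct xfeature_layout *layout,
			   unsigned int nr)
{
	uint32_t size = 512 + 64;	/* legacy region + XSAVE header */

	assert(nr <= 64);
	for (unsigned int i = 2; i < nr; i++) {
		uint32_t end;

		if (!(features & (UINT64_C(1) << i)))
			continue;
		end = layout[i].offset + layout[i].size;
		if (end > size)
			size = end;
	}
	return size;
}
```

Computing the size from `permitted_xfeatures()` rather than from the raw supported mask keeps the two values consistent, which is the inconsistency the commit addresses.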
    • KVM: LAPIC: Also cancel preemption timer during SET_LAPIC · 35fe7cfb
      Wanpeng Li authored
      The warning below is triggered during guest reboot.
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 1931 at arch/x86/kvm/x86.c:10322 kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm]
        CPU: 0 PID: 1931 Comm: qemu-system-x86 Tainted: G          I       5.17.0-rc1+ #5
        RIP: 0010:kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm]
        Call Trace:
         <TASK>
         kvm_vcpu_ioctl+0x279/0x710 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7fd39797350b
      
      This can be triggered by not exposing tsc-deadline mode and rebooting the
      guest. The lapic_shutdown() function, which is called in the sys_reboot
      path, does not disarm the in-flight timer; it only masks LVTT.
      lapic_shutdown() clears the APIC state with LVT_MASKED set and the
      timer-mode bits set to 0. This can trigger a timer-mode switch between
      tsc-deadline and oneshot/periodic, which can result in the preemption
      timer being cancelled in apic_update_lvtt(). However, we can't depend on
      this when tsc-deadline mode is not exposed and the oneshot/periodic modes
      are emulated by the preemption timer. QEMU synchronises state around
      reset, so cancel the preemption timer under KVM_SET_LAPIC.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1643102220-35667-1-git-send-email-wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Remove vmcs_config.order · 519669cc
      Jim Mattson authored
      The maximum size of a VMCS (or VMXON region) is 4096. By definition,
      these are order 0 allocations.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20220125004359.147600-1-jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
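
The "order 0 by definition" argument can be checked with a few lines of C. This is a stand-alone sketch assuming 4096-byte pages; `order_for_size()` and the `SKETCH_*` macros are hypothetical names, not kernel helpers.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Smallest order such that (page size << order) >= size; size must be > 0. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}
```

Any size up to 4096 bytes yields order 0, so a separately stored order value carries no information for a VMCS or VMXON region.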
  2. 25 Jan, 2022 4 commits
    • KVM/X86: Make kvm_vcpu_reload_apic_access_page() static · d081a343
      Quanfa Fu authored
      Make kvm_vcpu_reload_apic_access_page() static
      as it is no longer invoked directly by vmx
      and it is also no longer exported.
      
      No functional change intended.
      Signed-off-by: Quanfa Fu <quanfafu@gmail.com>
      Message-Id: <20211219091446.174584-1-quanfafu@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Re-enable access_tracking_perf_test · de1956f4
      David Matlack authored
      This selftest was accidentally removed by commit 6a581508
      ("selftest: KVM: Add intra host migration tests"). Add it back.
      
      Fixes: 6a581508 ("selftest: KVM: Add intra host migration tests")
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220120003826.2805036-1-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow · b9bed78e
      Sean Christopherson authored
      Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step
      breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or
      MOVSS blocking is active.  Setting the flag is necessary to make VM-Entry
      consistency checks happy, as VMX has an invariant that if RFLAGS.TF is
      set and STI/MOVSS blocking is true, then the previous instruction must
      have been STI or MOV/POP, and therefore a single-step #DB must be pending
      since the RFLAGS.TF cannot have been set by the previous instruction,
      i.e. the one instruction delay after setting RFLAGS.TF must have already
      expired.
      
      Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately
      when recording guest state as part of a VM-Exit, but #DB VM-Exits
      intentionally do not treat the #DB as "guest state" as interception of
      the #DB effectively makes the #DB host-owned, thus KVM needs to manually
      set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest.
      
      Note, although this bug can be triggered by guest userspace, doing so
      requires IOPL=3, and guest userspace running with IOPL=3 has full access
      to all I/O ports (from the guest's perspective) and can crash/reboot the
      guest any number of ways.  IOPL=3 is required because STI blocking kicks
      in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either
      takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF.
      
      MOVSS blocking can be initiated by userspace, but can be coincident with
      a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
      executed in the MOVSS shadow.  MOV DR #GPs at CPL>0, thus MOVSS blocking
      is problematic only for CPL0 (and only if the guest is crazy enough to
      access a DR in a MOVSS shadow).  All other sources of #DBs are either
      suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
      are mutually exclusive with MOVSS blocking (T-bit task switch), or are
      already handled by KVM (ICEBP, a.k.a. INT1).
      
      This bug was originally found by running tests[1] created for XSA-308[2].
      Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
      presumably why the Xen bug was deemed to be an exploitable DOS from guest
      userspace.  KVM already handles ICEBP by skipping the ICEBP instruction
      and thus clears MOVSS blocking as a side effect of its "emulation".
      
      [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html
      [2] https://xenbits.xen.org/xsa/advisory-308.html

      Reported-by: David Woodhouse <dwmw2@infradead.org>
      Reported-by: Alexander Graf <graf@amazon.de>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220120000624.655815-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
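
The condition under which the BS flag has to be set can be sketched as a pure function over the three values involved. This is a minimal model, not the kernel change: `fixup_pending_dbg()` is a hypothetical name, the VMCS fields are passed in as plain integers, and the ICEBP case (which the commit message says KVM already handles by skipping the instruction) is deliberately left out. The bit positions are the architectural ones: RFLAGS.TF is bit 8, STI and MOV-SS blocking are bits 0 and 1 of the interruptibility state, and BS is bit 14 of the pending debug exceptions field.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_TF           (UINT64_C(1) << 8)
#define GUEST_INTR_STATE_STI    (UINT32_C(1) << 0)
#define GUEST_INTR_STATE_MOV_SS (UINT32_C(1) << 1)
#define PENDING_DBG_BS          (UINT64_C(1) << 14)

static_assert(PENDING_DBG_BS == 0x4000, "BS is bit 14");

/*
 * Return the pending-debug-exceptions value to use when re-injecting
 * a #DB, given the guest's RFLAGS and interruptibility state.
 */
static uint64_t fixup_pending_dbg(uint64_t rflags, uint32_t intr_state,
				  uint64_t pending_dbg)
{
	const uint32_t blocking = GUEST_INTR_STATE_STI |
				  GUEST_INTR_STATE_MOV_SS;

	/* TF set while in an STI/MOVSS shadow: a single-step #DB is pending. */
	if ((rflags & X86_EFLAGS_TF) && (intr_state & blocking))
		pending_dbg |= PENDING_DBG_BS;

	return pending_dbg;
}
```

Both conditions must hold; with only one of them the field is returned unchanged.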
    • KVM: x86: Move CPUID.(EAX=0x12,ECX=1) mangling to __kvm_update_cpuid_runtime() · 5c89be1d
      Vitaly Kuznetsov authored
      The full equality check of CPUID data on update (kvm_cpuid_check_equal())
      may fail for SGX-enabled CPUs, as CPUID.(EAX=0x12,ECX=1) is currently
      being mangled in kvm_vcpu_after_set_cpuid(). Move the mangling to
      __kvm_update_cpuid_runtime() and split off a cpuid_get_supported_xcr0()
      helper, as the 'vcpu->arch.guest_supported_xcr0' update needs (logically)
      to stay in kvm_vcpu_after_set_cpuid().
      
      Cc: stable@vger.kernel.org
      Fixes: feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220124103606.2630588-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 24 Jan, 2022 3 commits
    • KVM: remove async parameter of hva_to_pfn_remapped() · 1625566e
      Xianting Tian authored
      The async parameter of hva_to_pfn_remapped() is not used, so remove it.
      Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
      Message-Id: <20220124020456.156386-1-xianting.tian@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86,kvm/xen: Remove superfluous .fixup usage · adb759e5
      Peter Zijlstra authored
      Commit 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and
      event channel delivery") adds superfluous .fixup usage after the whole
      .fixup section was removed in commit e5eefda5 ("x86: Remove .fixup
      section").
      
      Fixes: 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery")
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Message-Id: <20220123124219.GH20638@worktop.programming.kicks-ass.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Zero host's SYSENTER_ESP iff SYSENTER is NOT used · 94fea1d8
      Sean Christopherson authored
      Zero vmcs.HOST_IA32_SYSENTER_ESP when initializing *constant* host state
      if and only if SYSENTER cannot be used, i.e. the kernel is a 64-bit
      kernel and is not emulating 32-bit syscalls.  As the name suggests,
      vmx_set_constant_host_state() is intended for state that is *constant*.
      When SYSENTER is used, SYSENTER_ESP isn't constant because stacks are
      per-CPU, and the VMCS must be updated whenever the vCPU is migrated to a
      new CPU.  The logic in vmx_vcpu_load_vmcs() doesn't differentiate between
      "never loaded" and "loaded on a different CPU", i.e. setting SYSENTER_ESP
      on VMCS load also handles setting correct host state when the VMCS is
      first loaded.
      
      Because a VMCS must be loaded before it is initialized during vCPU RESET,
      zeroing the field in vmx_set_constant_host_state() obliterates the value
      that was written when the VMCS was loaded.  If the vCPU is run before it
      is migrated, the subsequent VM-Exit will zero out MSR_IA32_SYSENTER_ESP,
      leading to a #DF on the next 32-bit syscall.
      
        double fault: 0000 [#1] SMP
        CPU: 0 PID: 990 Comm: stable Not tainted 5.16.0+ #97
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        EIP: entry_SYSENTER_32+0x0/0xe7
        Code: <9c> 50 eb 17 0f 20 d8 a9 00 10 00 00 74 0d 25 ff ef ff ff 0f 22 d8
        EAX: 000000a2 EBX: a8d1300c ECX: a8d13014 EDX: 00000000
        ESI: a8f87000 EDI: a8d13014 EBP: a8d12fc0 ESP: 00000000
        DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210093
        CR0: 80050033 CR2: fffffffc CR3: 02c3b000 CR4: 00152e90
      
      Fixes: 6ab8a405 ("KVM: VMX: Avoid to rdmsrl(MSR_IA32_SYSENTER_ESP)")
      Cc: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220122015211.1468758-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
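
The ordering problem in this commit (load first, then initialize constant state) can be modelled in a few lines. This is a simplified sketch under stated assumptions: `struct kernel_cfg`, `struct vmcs_model`, `load_vmcs()` and `set_constant_host_state()` are hypothetical stand-ins, and the kernel configuration is passed as runtime booleans rather than compile-time options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the host kernel configuration. */
struct kernel_cfg {
	bool is_64bit;
	bool ia32_emulation;	/* 64-bit kernel emulating 32-bit syscalls */
};

/* Hypothetical one-field stand-in for the VMCS host state. */
struct vmcs_model {
	unsigned long host_sysenter_esp;
};

/* SYSENTER cannot be used only on a 64-bit kernel without emulation. */
static bool sysenter_unused(const struct kernel_cfg *cfg)
{
	assert(cfg != NULL);
	return cfg->is_64bit && !cfg->ia32_emulation;
}

/* Loading on a CPU writes the per-CPU stack when SYSENTER is in use. */
static void load_vmcs(struct vmcs_model *v, const struct kernel_cfg *cfg,
		      unsigned long percpu_stack)
{
	if (!sysenter_unused(cfg))
		v->host_sysenter_esp = percpu_stack;
}

/* Constant state: zero the field only when it really is constant. */
static void set_constant_host_state(struct vmcs_model *v,
				    const struct kernel_cfg *cfg)
{
	if (sysenter_unused(cfg))
		v->host_sysenter_esp = 0;
}
```

Because the zeroing is conditional, calling `set_constant_host_state()` after `load_vmcs()` no longer overwrites the per-CPU value on kernels that use SYSENTER.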
  4. 20 Jan, 2022 3 commits
  5. 19 Jan, 2022 25 commits