1. 26 Jan, 2022 21 commits
    • Xiaoyao Li's avatar
      KVM: x86: Keep MSR_IA32_XSS unchanged for INIT · be4f3b3f
      Xiaoyao Li authored
      It has been corrected from SDM version 075 that MSR_IA32_XSS is reset to
      zero on Power up and Reset but keeps unchanged on INIT.
      
      Fixes: a554d207 ("KVM: X86: Processor States following Reset or INIT")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarXiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220126172226.2298529-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      be4f3b3f
    • Sean Christopherson's avatar
      KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2} · 811f95ff
      Sean Christopherson authored
      Free the "struct kvm_cpuid_entry2" array on successful post-KVM_RUN
      KVM_SET_CPUID{,2} to fix a memory leak, the callers of kvm_set_cpuid()
      free the array only on failure.
      
       BUG: memory leak
       unreferenced object 0xffff88810963a800 (size 2048):
        comm "syz-executor025", pid 3610, jiffies 4294944928 (age 8.080s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00  ................
          47 65 6e 75 6e 74 65 6c 69 6e 65 49 00 00 00 00  GenuntelineI....
        backtrace:
          [<ffffffff814948ee>] kmalloc_node include/linux/slab.h:604 [inline]
          [<ffffffff814948ee>] kvmalloc_node+0x3e/0x100 mm/util.c:580
          [<ffffffff814950f2>] kvmalloc include/linux/slab.h:732 [inline]
          [<ffffffff814950f2>] vmemdup_user+0x22/0x100 mm/util.c:199
          [<ffffffff8109f5ff>] kvm_vcpu_ioctl_set_cpuid2+0x8f/0xf0 arch/x86/kvm/cpuid.c:423
          [<ffffffff810711b9>] kvm_arch_vcpu_ioctl+0xb99/0x1e60 arch/x86/kvm/x86.c:5251
          [<ffffffff8103e92d>] kvm_vcpu_ioctl+0x4ad/0x950 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4066
          [<ffffffff815afacc>] vfs_ioctl fs/ioctl.c:51 [inline]
          [<ffffffff815afacc>] __do_sys_ioctl fs/ioctl.c:874 [inline]
          [<ffffffff815afacc>] __se_sys_ioctl fs/ioctl.c:860 [inline]
          [<ffffffff815afacc>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
          [<ffffffff844a3335>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff844a3335>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: c6617c61 ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+be576ad7655690586eec@syzkaller.appspotmail.com
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220125210445.2053429-1-seanjc@google.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      811f95ff
    • Sean Christopherson's avatar
      KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 · d6e656cd
      Sean Christopherson authored
      WARN if KVM attempts to allocate a shadow VMCS for vmcs02.  KVM emulates
      VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate
      a "real" shadow VMCS for L2.
      
      The previous code WARNed but continued anyway with the allocation,
      presumably in an attempt to avoid NULL pointer dereference.
      However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and
      indeed the sole caller does:
      
      	if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu))
      		goto out_shadow_vmcs;
      
      which makes it not a useful attempt.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220125220527.2093146-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d6e656cd
    • Sean Christopherson's avatar
      KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest · 4cf3d3eb
      Sean Christopherson authored
      Don't skip the vmcall() in l2_guest_code() prior to re-entering L2, doing
      so will result in L2 running to completion, popping '0' off the stack for
      RET, jumping to address '0', and ultimately dying with a triple fault
      shutdown.
      
      It's not at all obvious why the test re-enters L2 and re-executes VMCALL,
      but presumably it serves a purpose.  The VMX path doesn't skip vmcall(),
      and the test can't possibly have passed on SVM, so just do what VMX does.
      
      Fixes: d951b221 ("KVM: selftests: smm_test: Test SMM enter from L2")
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220125221725.2101126-1-seanjc@google.com>
      Reviewed-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4cf3d3eb
    • Vitaly Kuznetsov's avatar
      KVM: x86: Check .flags in kvm_cpuid_check_equal() too · 033a3ea5
      Vitaly Kuznetsov authored
      kvm_cpuid_check_equal() checks for the (full) equality of the supplied
      CPUID data so .flags need to be checked too.
      Reported-by: default avatarSean Christopherson <seanjc@google.com>
      Fixes: c6617c61 ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN")
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220126131804.2839410-1-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      033a3ea5
    • Sean Christopherson's avatar
      KVM: x86: Forcibly leave nested virt when SMM state is toggled · f7e57078
      Sean Christopherson authored
      Forcibly leave nested virtualization operation if userspace toggles SMM
      state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
      forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
      vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
      vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
      
      Don't attempt to gracefully handle the transition as (a) most transitions
      are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't
      sufficient information to handle all transitions, e.g. SVM wants access
      to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
      KVM_SET_NESTED_STATE during state restore as the latter disallows putting
      the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
      being post-VMXON in SMM if SMM is not active.
      
      Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
      due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
      just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
      in an architecturally impossible state.
      
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Modules linked in:
        CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
        Call Trace:
         <TASK>
         kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
         kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
         kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
         kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
         kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
         kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
         kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
         __fput+0x286/0x9f0 fs/file_table.c:311
         task_work_run+0xdd/0x1a0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0xb29/0x2a30 kernel/exit.c:806
         do_group_exit+0xd2/0x2f0 kernel/exit.c:935
         get_signal+0x4b0/0x28c0 kernel/signal.c:2862
         arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
         do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220125220358.2091737-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f7e57078
    • Vitaly Kuznetsov's avatar
      KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments() · aa3b39f3
      Vitaly Kuznetsov authored
      Commit 3fa5e8fd ("KVM: SVM: delay svm_vcpu_init_msrpm after
      svm->vmcb is initialized") re-arranged svm_vcpu_init_msrpm() call in
      svm_create_vcpu(), thus making the comment about vmcb being NULL
      obsolete. Drop it.
      
      While on it, drop superfluous vmcb_is_clean() check: vmcb_mark_dirty()
      is a bit flip, an extra check is unlikely to bring any performance gain.
      Drop now-unneeded vmcb_is_clean() helper as well.
      
      Fixes: 3fa5e8fd ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized")
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211220152139.418372-2-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      aa3b39f3
    • Vitaly Kuznetsov's avatar
      KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real · 38dfa830
      Vitaly Kuznetsov authored
      Commit c4327f15 ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support")
      introduced enlightened MSR-Bitmap support for KVM-on-Hyper-V but it didn't
      actually enable the support. Similar to enlightened NPT TLB flush and
      direct TLB flush features, the guest (KVM) has to tell L0 (Hyper-V) that
      it's using the feature by setting the appropriate feature fit in VMCB
      control area (sw reserved fields).
      
      Fixes: c4327f15 ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support")
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211220152139.418372-3-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      38dfa830
    • Sean Christopherson's avatar
      KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode · cdf85e0c
      Sean Christopherson authored
      Inject a #GP instead of synthesizing triple fault to try to avoid killing
      the guest if emulation of an SEV guest fails due to encountering the SMAP
      erratum.  The injected #GP may still be fatal to the guest, e.g. if the
      userspace process is providing critical functionality, but KVM should
      make every attempt to keep the guest alive.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-10-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cdf85e0c
    • Sean Christopherson's avatar
      KVM: SVM: Don't apply SEV+SMAP workaround on code fetch or PT access · 3280cc22
      Sean Christopherson authored
      Resume the guest instead of synthesizing a triple fault shutdown if the
      instruction bytes buffer is empty due to the #NPF being on the code fetch
      itself or on a page table access.  The SMAP errata applies if and only if
      the code fetch was successful and ucode's subsequent data read from the
      code page encountered a SMAP violation.  In practice, the guest is likely
      hosed either way, but crashing the guest on a code fetch to emulated MMIO
      is technically wrong according to the behavior described in the APM.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-9-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3280cc22
    • Sean Christopherson's avatar
      KVM: SVM: Inject #UD on attempted emulation for SEV guest w/o insn buffer · 04c40f34
      Sean Christopherson authored
      Inject #UD if KVM attempts emulation for an SEV guests without an insn
      buffer and instruction decoding is required.  The previous behavior of
      allowing emulation if there is no insn buffer is undesirable as doing so
      means KVM is reading guest private memory and thus decoding cyphertext,
      i.e. is emulating garbage.  The check was previously necessary as the
      emulation type was not provided, i.e. SVM needed to allow emulation to
      handle completion of emulation after exiting to userspace to handle I/O.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-8-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      04c40f34
    • Sean Christopherson's avatar
      KVM: SVM: WARN if KVM attempts emulation on #UD or #GP for SEV guests · 132627c6
      Sean Christopherson authored
      WARN if KVM attempts to emulate in response to #UD or #GP for SEV guests,
      i.e. if KVM intercepts #UD or #GP, as emulation on any fault except #NPF
      is impossible since KVM cannot read guest private memory to get the code
      stream, and the CPU's DecodeAssists feature only provides the instruction
      bytes on #NPF.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-7-seanjc@google.com>
      [Warn on EMULTYPE_TRAP_UD_FORCED according to Liam Merwick's review. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      132627c6
    • Sean Christopherson's avatar
      KVM: x86: Pass emulation type to can_emulate_instruction() · 4d31d9ef
      Sean Christopherson authored
      Pass the emulation type to kvm_x86_ops.can_emulate_insutrction() so that
      a future commit can harden KVM's SEV support to WARN on emulation
      scenarios that should never happen.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-6-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4d31d9ef
    • Sean Christopherson's avatar
      KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support · c532f290
      Sean Christopherson authored
      Add a sanity check on DECODEASSIST being support if SEV is supported, as
      KVM cannot read guest private memory and thus relies on the CPU to
      provide the instruction byte stream on #NPF for emulation.  The intent of
      the check is to document the dependency, it should never fail in practice
      as producing hardware that supports SEV but not DECODEASSISTS would be
      non-sensical.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-5-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c532f290
    • Sean Christopherson's avatar
      KVM: SVM: Don't intercept #GP for SEV guests · 0b0be065
      Sean Christopherson authored
      Never intercept #GP for SEV guests as reading SEV guest private memory
      will return cyphertext, i.e. emulating on #GP can't work as intended.
      
      Cc: stable@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0b0be065
    • Sean Christopherson's avatar
      Revert "KVM: SVM: avoid infinite loop on NPF from bad address" · 31c25585
      Sean Christopherson authored
      Revert a completely broken check on an "invalid" RIP in SVM's workaround
      for the DecodeAssists SMAP errata.  kvm_vcpu_gfn_to_memslot() obviously
      expects a gfn, i.e. operates in the guest physical address space, whereas
      RIP is a virtual (not even linear) address.  The "fix" worked for the
      problematic KVM selftest because the test identity mapped RIP.
      
      Fully revert the hack instead of trying to translate RIP to a GPA, as the
      non-SEV case is now handled earlier, and KVM cannot access guest page
      tables to translate RIP.
      
      This reverts commit e72436bc.
      
      Fixes: e72436bc ("KVM: SVM: avoid infinite loop on NPF from bad address")
      Reported-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      31c25585
    • Sean Christopherson's avatar
      KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests · 55467fcd
      Sean Christopherson authored
      Always signal that emulation is possible for !SEV guests regardless of
      whether or not the CPU provided a valid instruction byte stream.  KVM can
      read all guest state (memory and registers) for !SEV guests, i.e. can
      fetch the code stream from memory even if the CPU failed to do so because
      of the SMAP errata.
      
      Fixes: 05d5a486 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
      Cc: stable@vger.kernel.org
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarLiam Merwick <liam.merwick@oracle.com>
      Message-Id: <20220120010719.711476-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      55467fcd
    • Denis Valeev's avatar
      KVM: x86: nSVM: skip eax alignment check for non-SVM instructions · 47c28d43
      Denis Valeev authored
      The bug occurs on #GP triggered by VMware backdoor when eax value is
      unaligned. eax alignment check should not be applied to non-SVM
      instructions because it leads to incorrect omission of the instructions
      emulation.
      Apply the alignment check only to SVM instructions to fix.
      
      Fixes: d1cba6c9 ("KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround")
      Signed-off-by: default avatarDenis Valeev <lemniscattaden@gmail.com>
      Message-Id: <Yexlhaoe1Fscm59u@q>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      47c28d43
    • Like Xu's avatar
      KVM: x86/cpuid: Exclude unpermitted xfeatures sizes at KVM_GET_SUPPORTED_CPUID · 1ffce092
      Like Xu authored
      With the help of xstate_get_guest_group_perm(), KVM can exclude unpermitted
      xfeatures in cpuid.0xd.0.eax, in which case the corresponding xfeatures
      sizes should also be matched to the permitted xfeatures.
      
      To fix this inconsistency, the permitted_xcr0 and permitted_xss are defined
      consistently, which implies 'supported' plus certain permissions for this
      task, and it also fixes cpuid.0xd.1.ebx and later leaf-by-leaf queries.
      
      Fixes: 445ecdf7 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID")
      Signed-off-by: default avatarLike Xu <likexu@tencent.com>
      Message-Id: <20220125115223.33707-1-likexu@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1ffce092
    • Wanpeng Li's avatar
      KVM: LAPIC: Also cancel preemption timer during SET_LAPIC · 35fe7cfb
      Wanpeng Li authored
      The below warning is splatting during guest reboot.
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 1931 at arch/x86/kvm/x86.c:10322 kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm]
        CPU: 0 PID: 1931 Comm: qemu-system-x86 Tainted: G          I       5.17.0-rc1+ #5
        RIP: 0010:kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm]
        Call Trace:
         <TASK>
         kvm_vcpu_ioctl+0x279/0x710 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7fd39797350b
      
      This can be triggered by not exposing tsc-deadline mode and doing a reboot in
      the guest. The lapic_shutdown() function which is called in sys_reboot path
      will not disarm the flying timer, it just masks LVTT. lapic_shutdown() clears
      APIC state w/ LVT_MASKED and timer-mode bit is 0, this can trigger timer-mode
      switch between tsc-deadline and oneshot/periodic, which can result in preemption
      timer be cancelled in apic_update_lvtt(). However, We can't depend on this when
      not exposing tsc-deadline mode and oneshot/periodic modes emulated by preemption
      timer. Qemu will synchronise states around reset, let's cancel preemption timer
      under KVM_SET_LAPIC.
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Message-Id: <1643102220-35667-1-git-send-email-wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      35fe7cfb
    • Jim Mattson's avatar
      KVM: VMX: Remove vmcs_config.order · 519669cc
      Jim Mattson authored
      The maximum size of a VMCS (or VMXON region) is 4096. By definition,
      these are order 0 allocations.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Message-Id: <20220125004359.147600-1-jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      519669cc
  2. 25 Jan, 2022 4 commits
    • Quanfa Fu's avatar
      KVM/X86: Make kvm_vcpu_reload_apic_access_page() static · d081a343
      Quanfa Fu authored
      Make kvm_vcpu_reload_apic_access_page() static
      as it is no longer invoked directly by vmx
      and it is also no longer exported.
      
      No functional change intended.
      Signed-off-by: default avatarQuanfa Fu <quanfafu@gmail.com>
      Message-Id: <20211219091446.174584-1-quanfafu@gmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d081a343
    • David Matlack's avatar
      KVM: selftests: Re-enable access_tracking_perf_test · de1956f4
      David Matlack authored
      This selftest was accidentally removed by commit 6a581508
      ("selftest: KVM: Add intra host migration tests"). Add it back.
      
      Fixes: 6a581508 ("selftest: KVM: Add intra host migration tests")
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Message-Id: <20220120003826.2805036-1-dmatlack@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      de1956f4
    • Sean Christopherson's avatar
      KVM: VMX: Set vmcs.PENDING_DBG.BS on #DB in STI/MOVSS blocking shadow · b9bed78e
      Sean Christopherson authored
      Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step
      breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or
      MOVSS blocking is active.  Setting the flag is necessary to make VM-Entry
      consistency checks happy, as VMX has an invariant that if RFLAGS.TF is
      set and STI/MOVSS blocking is true, then the previous instruction must
      have been STI or MOV/POP, and therefore a single-step #DB must be pending
      since the RFLAGS.TF cannot have been set by the previous instruction,
      i.e. the one instruction delay after setting RFLAGS.TF must have already
      expired.
      
      Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately
      when recording guest state as part of a VM-Exit, but #DB VM-Exits
      intentionally do not treat the #DB as "guest state" as interception of
      the #DB effectively makes the #DB host-owned, thus KVM needs to manually
      set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest.
      
      Note, although this bug can be triggered by guest userspace, doing so
      requires IOPL=3, and guest userspace running with IOPL=3 has full access
      to all I/O ports (from the guest's perspective) and can crash/reboot the
      guest any number of ways.  IOPL=3 is required because STI blocking kicks
      in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either
      takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF.
      
      MOVSS blocking can be initiated by userspace, but can be coincident with
      a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is
      executed in the MOVSS shadow.  MOV DR #GPs at CPL>0, thus MOVSS blocking
      is problematic only for CPL0 (and only if the guest is crazy enough to
      access a DR in a MOVSS shadow).  All other sources of #DBs are either
      suppressed by MOVSS blocking (single-step, code fetch, data, and I/O),
      are mutually exclusive with MOVSS blocking (T-bit task switch), or are
      already handled by KVM (ICEBP, a.k.a. INT1).
      
      This bug was originally found by running tests[1] created for XSA-308[2].
      Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is
      presumably why the Xen bug was deemed to be an exploitable DOS from guest
      userspace.  KVM already handles ICEBP by skipping the ICEBP instruction
      and thus clears MOVSS blocking as a side effect of its "emulation".
      
      [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html
      [2] https://xenbits.xen.org/xsa/advisory-308.htmlReported-by: default avatarDavid Woodhouse <dwmw2@infradead.org>
      Reported-by: default avatarAlexander Graf <graf@amazon.de>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220120000624.655815-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b9bed78e
    • Vitaly Kuznetsov's avatar
      KVM: x86: Move CPUID.(EAX=0x12,ECX=1) mangling to __kvm_update_cpuid_runtime() · 5c89be1d
      Vitaly Kuznetsov authored
      Full equality check of CPUID data on update (kvm_cpuid_check_equal()) may
      fail for SGX enabled CPUs as CPUID.(EAX=0x12,ECX=1) is currently being
      mangled in kvm_vcpu_after_set_cpuid(). Move it to
      __kvm_update_cpuid_runtime() and split off cpuid_get_supported_xcr0()
      helper  as 'vcpu->arch.guest_supported_xcr0' update needs (logically)
      to stay in kvm_vcpu_after_set_cpuid().
      
      Cc: stable@vger.kernel.org
      Fixes: feb627e8 ("KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN")
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220124103606.2630588-2-vkuznets@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5c89be1d
  3. 24 Jan, 2022 3 commits
    • Xianting Tian's avatar
      KVM: remove async parameter of hva_to_pfn_remapped() · 1625566e
      Xianting Tian authored
      The async parameter of hva_to_pfn_remapped() is not used, so remove it.
      Signed-off-by: default avatarXianting Tian <xianting.tian@linux.alibaba.com>
      Message-Id: <20220124020456.156386-1-xianting.tian@linux.alibaba.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      1625566e
    • Peter Zijlstra's avatar
      x86,kvm/xen: Remove superfluous .fixup usage · adb759e5
      Peter Zijlstra authored
      Commit 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and
      event channel delivery") adds superfluous .fixup usage after the whole
      .fixup section was removed in commit e5eefda5 ("x86: Remove .fixup
      section").
      
      Fixes: 14243b38 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery")
      Reported-by: default avatarBorislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Message-Id: <20220123124219.GH20638@worktop.programming.kicks-ass.net>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      adb759e5
    • Sean Christopherson's avatar
      KVM: VMX: Zero host's SYSENTER_ESP iff SYSENTER is NOT used · 94fea1d8
      Sean Christopherson authored
      Zero vmcs.HOST_IA32_SYSENTER_ESP when initializing *constant* host state
      if and only if SYSENTER cannot be used, i.e. the kernel is a 64-bit
      kernel and is not emulating 32-bit syscalls.  As the name suggests,
      vmx_set_constant_host_state() is intended for state that is *constant*.
      When SYSENTER is used, SYSENTER_ESP isn't constant because stacks are
      per-CPU, and the VMCS must be updated whenever the vCPU is migrated to a
      new CPU.  The logic in vmx_vcpu_load_vmcs() doesn't differentiate between
      "never loaded" and "loaded on a different CPU", i.e. setting SYSENTER_ESP
      on VMCS load also handles setting correct host state when the VMCS is
      first loaded.
      
      Because a VMCS must be loaded before it is initialized during vCPU RESET,
      zeroing the field in vmx_set_constant_host_state() obliterates the value
      that was written when the VMCS was loaded.  If the vCPU is run before it
      is migrated, the subsequent VM-Exit will zero out MSR_IA32_SYSENTER_ESP,
      leading to a #DF on the next 32-bit syscall.
      
        double fault: 0000 [#1] SMP
        CPU: 0 PID: 990 Comm: stable Not tainted 5.16.0+ #97
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        EIP: entry_SYSENTER_32+0x0/0xe7
        Code: <9c> 50 eb 17 0f 20 d8 a9 00 10 00 00 74 0d 25 ff ef ff ff 0f 22 d8
        EAX: 000000a2 EBX: a8d1300c ECX: a8d13014 EDX: 00000000
        ESI: a8f87000 EDI: a8d13014 EBP: a8d12fc0 ESP: 00000000
        DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210093
        CR0: 80050033 CR2: fffffffc CR3: 02c3b000 CR4: 00152e90
      
      Fixes: 6ab8a405 ("KVM: VMX: Avoid to rdmsrl(MSR_IA32_SYSENTER_ESP)")
      Cc: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220122015211.1468758-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      94fea1d8
  4. 20 Jan, 2022 3 commits
  5. 19 Jan, 2022 9 commits