1. 11 Jun, 2020 2 commits
    • Merge branch 'kvm-basic-exit-reason' into HEAD · 77f81f37
      Paolo Bonzini authored
      Using a topic branch so that stable branches can simply cherry-pick the
      patch.
      Reviewed-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Consult only the "basic" exit reason when routing nested exit · 2ebac8bb
      Sean Christopherson authored
      Consult only the basic exit reason, i.e. bits 15:0 of vmcs.EXIT_REASON,
      when determining whether a nested VM-Exit should be reflected into L1 or
      handled by KVM in L0.
      
      For better or worse, the switch statement in nested_vmx_exit_reflected()
      currently defaults to "true", i.e. reflects any nested VM-Exit without
      dedicated logic.  Because the case statements only contain the basic
      exit reason, any VM-Exit with modifier bits set will be reflected to L1,
      even if KVM intended to handle it in L0.
      
      Practically speaking, this only affects EXIT_REASON_MCE_DURING_VMENTRY,
      i.e. a #MC that occurs on nested VM-Enter would be incorrectly routed to
      L1, as "failed VM-Entry" is the only modifier that KVM can currently
      encounter.  The SMM modifiers will never be generated as KVM doesn't
      support/employ an SMI Transfer Monitor.  Ditto for "exit from enclave",
      as KVM doesn't yet support virtualizing SGX, i.e. it's impossible to
      enter an enclave in a KVM guest (L1 or L2).
      
      Fixes: 644d711a ("KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Xiaoyao Li <xiaoyao.li@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200227174430.26371-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 09 Jun, 2020 2 commits
  3. 08 Jun, 2020 6 commits
    • KVM: x86: Fix APIC page invalidation race · e649b3f0
      Eiichi Tsukata authored
      Commit b1394e74 ("KVM: x86: fix APIC page invalidation") tried
      to fix inappropriate APIC page invalidation by re-introducing arch
      specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
      kvm_mmu_notifier_invalidate_range_start(). However, the patch left a
      possible race where the VMCS APIC address cache is updated *before*
      it is unmapped:
      
        (Invalidator) kvm_mmu_notifier_invalidate_range_start()
        (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
        (KVM VCPU) vcpu_enter_guest()
        (KVM VCPU) kvm_vcpu_reload_apic_access_page()
        (Invalidator) actually unmap page
      
      Because of the above race, there can be a mismatch between the
      host physical address stored in the APIC_ACCESS_PAGE VMCS field and
      the host physical address stored in the EPT entry for the APIC GPA
      (0xfee00000).  When this happens, the processor will not trap APIC
      accesses, and will instead show the raw contents of the APIC-access page.
      Because Windows periodically checks for unexpected modifications to
      the LAPIC registers, this shows up as a BSOD crash with BugCheck
      CRITICAL_STRUCTURE_CORRUPTION (109), which is what we are currently
      seeing in https://bugzilla.redhat.com/show_bug.cgi?id=1751017.
      
      The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
      cannot guarantee that no additional references are taken to the pages in
      the range before kvm_mmu_notifier_invalidate_range_end().  Fortunately,
      this case is supported by the MMU notifier API, as documented in
      include/linux/mmu_notifier.h:
      
        * If the subsystem
        * can't guarantee that no additional references are taken to
        * the pages in the range, it has to implement the
        * invalidate_range() notifier to remove any references taken
        * after invalidate_range_start().
      
      The fix therefore is to reload the APIC-access page field in the VMCS
      from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().
      
      Cc: stable@vger.kernel.org
      Fixes: b1394e74 ("KVM: x86: fix APIC page invalidation")
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951
      Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
      Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: fix calls to is_intercept · fb7333df
      Paolo Bonzini authored
      is_intercept takes an INTERCEPT_* constant, not SVM_EXIT_*; because
      of this, the compiler was removing the body of the conditionals,
      as if is_intercept returned 0.
      
      This unveils a latent bug: when clearing the VINTR intercept,
      int_ctl must also be changed in the L1 VMCB (svm->nested.hsave),
      just like the intercept itself is also changed in the L1 VMCB.
      Otherwise V_IRQ remains set and, due to the VINTR intercept being clear,
      we get a spurious injection of a vector 0 interrupt on the next
      L2->L1 vmexit.
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: fix vmx_preemption_timer_test build with GCC10 · 75ad6e80
      Vitaly Kuznetsov authored
      GCC10 fails to build vmx_preemption_timer_test:
      
      gcc -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99
      -fno-stack-protector -fno-PIE -I../../../../tools/include
       -I../../../../tools/arch/x86/include -I../../../../usr/include/
       -Iinclude -Ix86_64 -Iinclude/x86_64 -I..  -pthread  -no-pie
       x86_64/evmcs_test.c ./linux/tools/testing/selftests/kselftest_harness.h
       ./linux/tools/testing/selftests/kselftest.h
       ./linux/tools/testing/selftests/kvm/libkvm.a
       -o ./linux/tools/testing/selftests/kvm/x86_64/evmcs_test
      /usr/bin/ld: ./linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):
       ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:603:
       multiple definition of `ctrl_exit_rev'; /tmp/ccMQpvNt.o:
       ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:603:
       first defined here
      /usr/bin/ld: ./linux/tools/testing/selftests/kvm/libkvm.a(vmx.o):
       ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:602:
       multiple definition of `ctrl_pin_rev'; /tmp/ccMQpvNt.o:
       ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:602:
       first defined here
       ...
      
      The ctrl_exit_rev, ctrl_pin_rev and basic variables are only used in
      vmx_preemption_timer_test.c, so just move them there.
      
      Fixes: 8d7fbf01 ("KVM: selftests: VMX preemption timer migration test")
      Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200608112346.593513-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Add x86_64/debug_regs to .gitignore · 5ae1452f
      Vitaly Kuznetsov authored
      Add x86_64/debug_regs to .gitignore.
      Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
      Fixes: 449aa906 ("KVM: selftests: Add KVM_SET_GUEST_DEBUG test")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200608112346.593513-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Revert "KVM: x86: work around leak of uninitialized stack contents" · 25597f64
      Vitaly Kuznetsov authored
      Now that handle_vmptrst()/handle_vmread() no longer inject #PF
      unconditionally and instead call nested_vmx_handle_memory_failure(),
      which just kills the guest with KVM_EXIT_INTERNAL_ERROR in case of MMIO
      access, zeroing 'exception' in kvm_write_guest_virt_system() is no
      longer needed.
      
      This reverts commit 541ab2ae.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200605115906.532682-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Properly handle kvm_read/write_guest_virt*() result · 7a35e515
      Vitaly Kuznetsov authored
      Syzbot reports the following issue:
      
      WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618
       kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
      ...
      Call Trace:
      ...
      RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
      ...
       nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638
       handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline]
       handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728
       vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067
      
      The 'exception' we're trying to inject with kvm_inject_emulated_page_fault()
      comes from:
      
        nested_vmx_get_vmptr()
         kvm_read_guest_virt()
           kvm_read_guest_virt_helper()
             vcpu->arch.walk_mmu->gva_to_gpa()
      
      but it is only set when the GVA-to-GPA translation fails. If the
      translation succeeds but kvm_vcpu_read_guest_page() still fails,
      X86EMUL_IO_NEEDED is returned and nested_vmx_get_vmptr() calls
      kvm_inject_emulated_page_fault() with a zeroed 'exception'. This happens
      when the operand is in MMIO.
      
      Paolo also noticed that nested_vmx_get_vmptr() is not the only place in
      KVM code where the return value of kvm_read/write_guest_virt*() is
      mishandled: VMX instructions along with INVPCID have the same issue. This
      was noticed before, e.g. see commit 541ab2ae ("KVM: x86: work around
      leak of uninitialized stack contents"), but was never fully fixed.
      
      KVM could've handled the request correctly by going to userspace and
      performing I/O, but there doesn't seem to be a real use case for such
      requests in the first place.
      
      Introduce vmx_handle_memory_failure() as an interim solution.
      
      Note that nested_vmx_get_vmptr() now has three possible outcomes (OK, PF,
      KVM_EXIT_INTERNAL_ERROR), and on failure callers need to know whether an
      exit to userspace is needed (for KVM_EXIT_INTERNAL_ERROR). There doesn't
      seem to be a good enum describing this tristate, so just add an
      "int *ret" parameter to the nested_vmx_get_vmptr() interface to pass the
      information.
      
      Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200605115906.532682-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 05 Jun, 2020 2 commits
    • KVM: x86: emulate reserved nops from 0f/18 to 0f/1f · 34d2618d
      Paolo Bonzini authored
      Instructions starting with 0f18 up to 0f1f are reserved nops, except those
      that were assigned to MPX.  These include the endbr markers used by CET.
      List them correctly in the opcode table.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Fix build with "make ARCH=x86_64" · b80db73d
      Vitaly Kuznetsov authored
      Marcelo reports that kvm selftests fail to build with
      "make ARCH=x86_64":
      
      gcc -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99
       -fno-stack-protector -fno-PIE -I../../../../tools/include
       -I../../../../tools/arch/x86_64/include  -I../../../../usr/include/
       -Iinclude -Ilib -Iinclude/x86_64 -I.. -c lib/kvm_util.c
       -o /var/tmp/20200604202744-bin/lib/kvm_util.o
      
      In file included from lib/kvm_util.c:11:
      include/x86_64/processor.h:14:10: fatal error: asm/msr-index.h: No such
       file or directory
      
       #include <asm/msr-index.h>
                ^~~~~~~~~~~~~~~~~
      compilation terminated.
      
      "make ARCH=x86", however, works. The problem is that arch specific headers
      for x86_64 live in 'tools/arch/x86/include', not in
      'tools/arch/x86_64/include'.
      
      Fixes: 66d69e08 ("selftests: fix kvm relocatable native/cross builds and installs")
      Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200605142028.550068-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 04 Jun, 2020 28 commits