1. 11 Apr, 2018 1 commit
    • X86/KVM: Do not allow DISABLE_EXITS_MWAIT when LAPIC ARAT is not available · 8e9b29b6
      KarimAllah Ahmed authored
      If the processor does not have an "Always Running APIC Timer" (aka ARAT),
      we should not give guests direct access to MWAIT. The LAPIC timer would
      stop ticking in deep C-states, so any host deadlines would not wake up the
      host kernel.
      
      The host kernel intel_idle driver handles this by switching to broadcast
      mode when ARAT is not available and MWAIT is issued with a deep C-state
      that would stop the LAPIC timer. When MWAIT is passed through, we cannot
      tell when MWAIT is issued.
      
      So just disable this capability when LAPIC ARAT is not available. I am not
      even sure whether there are any CPUs with VMX support but no LAPIC ARAT.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Reported-by: Wanpeng Li <kernellwp@gmail.com>
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
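      The rule in this commit can be sketched as a small predicate. This is
      only an illustration, not the kernel code: the helper names are
      hypothetical, and the one hardware fact relied on is that CPUID leaf 6
      reports ARAT in EAX bit 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:EAX bit 2 reports ARAT (Always Running APIC Timer). */
#define CPUID_6_EAX_ARAT (1u << 2)

/* Hypothetical helper: decode ARAT from a CPUID leaf 6 EAX value. */
static bool cpu_has_arat(uint32_t cpuid6_eax)
{
    return (cpuid6_eax & CPUID_6_EAX_ARAT) != 0;
}

/*
 * Hypothetical policy check: MWAIT exits may be disabled only when the
 * CPU supports MWAIT and its LAPIC timer keeps running in deep C-states.
 */
static bool can_disable_mwait_exits(bool has_mwait, uint32_t cpuid6_eax)
{
    return has_mwait && cpu_has_arat(cpuid6_eax);
}
```

      A caller would pass the EAX value it read from CPUID leaf 6; without
      ARAT the capability is simply not offered.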
  2. 10 Apr, 2018 2 commits
    • kvm: selftests: fix spelling mistake: "divisable" and "divisible" · 4d5f26ee
      Colin Ian King authored
      Trivial fix to spelling mistakes in comment and message text
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted · 386c6ddb
      KarimAllah Ahmed authored
      The VMX-preemption timer is used by KVM as a way to set deadlines for the
      guest (i.e. timer emulation). That was safe till very recently when
      capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
      introduced. According to Intel SDM 25.5.1:
      
      """
      The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
      operates in the shutdown and wait-for-SIPI states. If the timer counts down
      to zero in any state other than the wait-for SIPI state, the logical
      processor transitions to the C0 C-state and causes a VM exit; the timer
      does not cause a VM exit if it counts down to zero in the wait-for-SIPI
      state. The timer is not decremented in C-states deeper than C2.
      """
      
      Once the guest issues MWAIT with a C-state deeper than C2, the
      preemption timer will never wake it up again because the timer has stopped
      ticking. This is usually masked by other activity in the system that
      wakes the core from the deep C-state (and causes a VM exit), for
      example the host's own tick or incoming interrupts.
      
      So disable the VMX-preemption timer if MWAIT is exposed to the guest.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Fixes: 4d5422ce
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
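      The interaction described in this commit can be sketched as follows.
      This is an illustration under stated assumptions, not the kernel change
      itself: the function names are hypothetical, the hint decoding follows
      the MWAIT EAX layout (bits 7:4 hold the target C-state minus one), and
      the mapping from MWAIT C-states to actual package or core states is
      processor-specific.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * MWAIT hint layout (EAX): bits 3:0 are the sub-state, bits 7:4 the target
 * C-state minus one, so 0x00 requests C1, 0x20 requests C3 and 0xf0 is C0.
 */
static unsigned int mwait_hint_to_cstate(uint32_t eax)
{
    return (((eax >> 4) & 0xf) + 1) & 0xf;
}

/* Per the SDM text quoted in the commit, the timer counts only in C0-C2. */
static bool preemption_timer_counts(unsigned int cstate)
{
    return cstate <= 2;
}

/*
 * Hypothetical policy mirroring the commit: the host cannot see which hint
 * a guest passes to a non-intercepted MWAIT, so it must not rely on the
 * preemption timer at all in that configuration.
 */
static bool use_preemption_timer(bool hw_has_timer, bool mwait_in_guest)
{
    return hw_has_timer && !mwait_in_guest;
}
```

      The first two helpers show why a deep hint is a problem; the last one
      shows why the decision has to be made per VM rather than per MWAIT.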
  3. 06 Apr, 2018 1 commit
  4. 04 Apr, 2018 11 commits
    • kvm: selftests: add sync_regs_test · 6089ae0b
      Paolo Bonzini authored
      This includes the infrastructure to map the test into the guest and
      run code from the test program inside a VM.
      Signed-off-by: Ken Hofsass <hofsass@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: selftests: add API testing infrastructure · 783e9e51
      Paolo Bonzini authored
      Testsuite contributed by Google and cleaned up by myself for
      inclusion in Linux.
      Signed-off-by: Ken Hofsass <hofsass@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: fix a compile warning · 3140c156
      Peng Hao authored
      fix a "warning: no previous prototype".
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Add Force Emulation Prefix for "emulate the next instruction" · 6c86eedc
      Wanpeng Li authored
      There is no easy way to force KVM to run an instruction through the emulator
      (by design, as that would expose the x86 emulator as a significant attack surface).
      However, we do wish to expose the x86 emulator when we are testing it
      (e.g. via kvm-unit-tests). Therefore, this patch adds a "force emulation prefix"
      that is designed to raise #UD, which KVM traps; its #UD exit handler then
      matches the "force emulation prefix" and runs the instruction after the prefix
      through the x86 emulator. To avoid exposing the x86 emulator by default, we add
      a module parameter that should be off by default.
      
      A simple testcase here:
      
          #include <stdio.h>
          #include <string.h>
      
          #define HYPERVISOR_INFO 0x40000000
      
          /* ud2a + "kvm" is the force emulation prefix; KVM then emulates cpuid. */
          /* The leaf must be loaded into eax, i.e. tied to operand 1, not 0 (ebx). */
          #define CPUID(idx, eax, ebx, ecx, edx) \
              asm volatile (\
              "ud2a; .ascii \"kvm\"; cpuid" \
              :"=b" (*ebx), "=a" (*eax), "=c" (*ecx), "=d" (*edx) \
                  :"1"(idx) )
      
          int main(void)
          {
              unsigned int eax, ebx, ecx, edx;
              char string[13];
      
              CPUID(HYPERVISOR_INFO, &eax, &ebx, &ecx, &edx);
              /* The signature comes back in ebx, ecx, edx, in that order. */
              memcpy(string + 0, &ebx, sizeof(ebx));
              memcpy(string + 4, &ecx, sizeof(ecx));
              memcpy(string + 8, &edx, sizeof(edx));
      
              string[12] = 0;
              if (strncmp(string, "KVMKVMKVM\0\0\0", 12) == 0)
                  printf("kvm guest\n");
              else
                  printf("bare hardware\n");
              return 0;
          }

      Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Correctly handle usermode exits. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
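      The matching step performed on the #UD exit can be sketched in plain C.
      This is a user-space illustration, not the KVM handler: the function
      name and the `enabled` flag (standing in for the module parameter) are
      hypothetical. The byte sequence follows the testcase above, where ud2a
      encodes as 0x0f 0x0b and is followed by the ASCII bytes "kvm".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ud2a (0x0f 0x0b) followed by the ASCII bytes "kvm", as in the testcase. */
static const uint8_t force_emulation_prefix[] = { 0x0f, 0x0b, 'k', 'v', 'm' };

/*
 * Hypothetical helper: return how many bytes to skip before emulating the
 * next instruction, or 0 if the feature is disabled or the bytes at the
 * faulting address are not the prefix (a plain #UD in that case).
 */
static size_t match_force_emulation_prefix(const uint8_t *insn, size_t len,
                                           int enabled)
{
    if (!enabled || len < sizeof(force_emulation_prefix))
        return 0;
    if (memcmp(insn, force_emulation_prefix,
               sizeof(force_emulation_prefix)) != 0)
        return 0;
    return sizeof(force_emulation_prefix);
}
```

      On a match the caller would advance the instruction pointer by the
      returned length and hand the following instruction to the emulator.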
    • KVM: X86: Introduce handle_ud() · 082d06ed
      Wanpeng Li authored
      Introduce handle_ud() to handle invalid opcodes; this function will be
      used by later patches.
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: unify adjacent #ifdefs · 4fde8d57
      Paolo Bonzini authored
      vmx_save_host_state has multiple ifdefs for CONFIG_X86_64 that have
      no other code between them.  Simplify by reducing them to a single
      conditional.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86: kvm: hide the unused 'cpu' variable · 51e8a8cc
      Arnd Bergmann authored
      The local variable was newly introduced and is accessed in one
      place on x86_64, but not at all on 32-bit:
      
      arch/x86/kvm/vmx.c: In function 'vmx_save_host_state':
      arch/x86/kvm/vmx.c:2175:6: error: unused variable 'cpu' [-Werror=unused-variable]
      
      This puts it into another #ifdef.
      
      Fixes: 35060ed6 ("x86/kvm/vmx: avoid expensive rdmsr for MSR_GS_BASE")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig · c75d0edc
      Sean Christopherson authored
      Remove the WARN_ON in handle_ept_misconfig() as it is unnecessary
      and causes false positives.  Return the unmodified result of
      kvm_mmu_page_fault() instead of converting a system error code to
      KVM_EXIT_UNKNOWN so that userspace sees the error code of the
      actual failure, not a generic "we don't know what went wrong".
      
        * kvm_mmu_page_fault() will WARN if reserved bits are set in the
          SPTEs, i.e. it covers the case where an EPT misconfig occurred
          because of a KVM bug.
      
        * The WARN_ON will fire on any system error code that is hit while
          handling the fault, e.g. -ENOMEM from mmu_topup_memory_caches()
          while handling a legitimate MMIO EPT misconfig or -EFAULT from
          kvm_handle_bad_page() if the corresponding HVA is invalid.  In
          either case, userspace should receive the original error code
          and firing a warning is incorrect behavior as KVM is operating
          as designed.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
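      The behavioural change can be modelled with two small functions. This
      is a simplified sketch, not the kernel code: the struct and function
      names are hypothetical, `fault_ret` stands in for the result of
      kvm_mmu_page_fault(), and returning 0 from the old path after flagging
      an unknown exit is an assumption about how the conversion was done.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what the handler hands back to its caller. */
struct ept_misconfig_result {
    int ret;            /* value returned to the run loop */
    bool warned;        /* whether a WARN would have fired */
    bool exit_unknown;  /* whether the error was replaced by "unknown" */
};

/* Old behaviour as described: warn on any negative code and hide it. */
static struct ept_misconfig_result ept_misconfig_old(int fault_ret)
{
    struct ept_misconfig_result r = { fault_ret, false, false };

    if (fault_ret < 0) {
        r.warned = true;
        r.exit_unknown = true;
        r.ret = 0;  /* assumption: exit to userspace with a generic reason */
    }
    return r;
}

/* New behaviour: pass the page fault result through unmodified. */
static struct ept_misconfig_result ept_misconfig_new(int fault_ret)
{
    struct ept_misconfig_result r = { fault_ret, false, false };
    return r;
}
```

      With the new form, a legitimate -ENOMEM or -EFAULT reaches userspace
      as-is and no warning is logged.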
    • Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" · 2c151b25
      Sean Christopherson authored
      The bug that led to commit 95e057e2 was a benign warning (no adverse
      effects other than the warning itself) that was detected by syzkaller.
      Further inspection shows that the WARN_ON in question, in
      handle_ept_misconfig(), is unnecessary and flawed (this was also briefly
      discussed in the original patch: https://patchwork.kernel.org/patch/10204649).
      
        * The WARN_ON is unnecessary as kvm_mmu_page_fault() will WARN
          if reserved bits are set in the SPTEs, i.e. it covers the case
          where an EPT misconfig occurred because of a KVM bug.
      
        * The WARN_ON is flawed because it will fire on any system error
          code that is hit while handling the fault, e.g. -ENOMEM can be
          returned by mmu_topup_memory_caches() while handling a legitimate
          MMIO EPT misconfig.
      
      The original behavior of returning -EFAULT when userspace munmaps
      an HVA without first removing the memslot is correct and desirable,
      i.e. KVM is letting userspace know it has generated a bad address.
      Returning RET_PF_EMULATE masks the WARN_ON in the EPT misconfig path,
      but does not fix the underlying bug, i.e. the WARN_ON is bogus.
      
      Furthermore, returning RET_PF_EMULATE has the unwanted side effect of
      causing KVM to attempt to emulate an instruction on any page fault
      with an invalid HVA translation, e.g. a not-present EPT violation
      on a VM_PFNMAP VMA whose fault handler failed to insert a PFN.
      
        * There is no guarantee that the fault is directly related to the
          instruction, i.e. the fault could have been triggered by a side
          effect memory access in the guest, e.g. while vectoring a #DB or
          writing a tracing record.  This could cause KVM to effectively
          mask the fault if KVM doesn't model the behavior leading to the
          fault, i.e. emulation could succeed and resume the guest.
      
        * If emulation does fail, KVM will return EMULATION_FAILED instead
          of -EFAULT, which is a red herring as the user will either debug
          a bogus emulation attempt or scratch their head wondering why we
          were attempting emulation in the first place.
      
      TL;DR: revert to returning -EFAULT and remove the bogus WARN_ON in
      handle_ept_misconfig in a future patch.
      
      This reverts commit 95e057e2.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: Add emulation for movups/movupd · 29916968
      Stefan Fritsch authored
      This is very similar to the aligned versions movaps/movapd.
      
      We have seen the corresponding emulation failures with OpenBSD as a guest
      and with Windows 10 with Intel HD Graphics passthrough.
      Signed-off-by: Christian Ehrhardt <christian_ehrhardt@genua.de>
      Signed-off-by: Stefan Fritsch <sf@sfritsch.de>
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: raise internal error for exception during invalid protected mode state · add5ff7a
      Sean Christopherson authored
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating guest due to invalid
      guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
      in PM, i.e. PM exceptions are always injected via the VMCS.  Because
      we will never do VMRESUME due to emulation_required, the exception is
      never realized and we'll keep emulating the faulting instruction over
      and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  5. 29 Mar, 2018 1 commit
  6. 28 Mar, 2018 24 commits