1. 19 Mar, 2014 2 commits
    • MIPS: KVM: Pass reserved instruction exceptions to guest · 15505679
      James Hogan authored
      Previously a reserved instruction exception while in guest code would
      cause a KVM internal error if kvm_mips_handle_ri() didn't recognise the
      instruction (including a RDHWR from an unrecognised hardware register).
      
      However the guest OS should really have the opportunity to catch the
      exception so that it can take the appropriate actions such as sending a
      SIGILL to the guest user process or emulating the instruction itself.
      
      Therefore in these cases emulate a guest RI exception and only return
      EMULATE_FAIL if that fails, being careful to revert the PC first in case
      the exception occurred in a branch delay slot in which case the PC will
      already point to the branch target.
      
      Also turn the printk messages relating to these cases into kvm_debug
      messages so that they aren't usually visible.
      
      This allows crashme to run in the guest without killing the entire VM.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
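      The PC handling described in the MIPS commit above can be sketched
      roughly as follows. This is a toy model, not the kernel code: every
      name here (toy_vcpu, toy_handle_ri, the EMU_* codes) is hypothetical,
      and the "recognised" flag stands in for the real instruction decoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes, loosely modelled on the EMULATE_* idea. */
enum emu_result { EMU_DONE, EMU_GUEST_EXCEPTION };

/* Hypothetical, heavily reduced vCPU state. */
struct toy_vcpu {
	uint32_t pc;           /* guest program counter */
	bool guest_ri_raised;  /* an RI exception was queued for the guest */
};

/*
 * Hypothetical PC update: step past the faulting instruction. In a branch
 * delay slot the next PC is the branch target, not pc + 4.
 */
static void toy_update_pc(struct toy_vcpu *v, bool in_delay_slot,
			  uint32_t branch_target)
{
	v->pc = in_delay_slot ? branch_target : v->pc + 4;
}

/* Hypothetical stand-in for delivering an RI exception to the guest. */
static enum emu_result toy_emulate_guest_ri(struct toy_vcpu *v)
{
	v->guest_ri_raised = true;
	return EMU_GUEST_EXCEPTION;
}

enum emu_result toy_handle_ri(struct toy_vcpu *v, bool recognised,
			      bool in_delay_slot, uint32_t branch_target)
{
	uint32_t saved_pc = v->pc;  /* remember where the fault happened */

	toy_update_pc(v, in_delay_slot, branch_target);
	if (recognised)
		return EMU_DONE;    /* emulated in the hypervisor */

	/* Unrecognised: undo the PC update, then let the guest handle it. */
	v->pc = saved_pc;
	return toy_emulate_guest_ri(v);
}
```

      Restoring the saved PC before raising the guest exception matters
      most in the delay-slot case, where the updated PC no longer points
      anywhere near the faulting instruction.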
    • MIPS: KVM: asm/kvm_host.h: Clean up whitespace · 22027945
      James Hogan authored
      The whitespace in asm/kvm_host.h is quite inconsistent in places. Clean
      up the whole file to use tabs more consistently.
      
      When you use the --ignore-space-change argument to git diff this patch
      only changes line wrapping in TLB_IS_GLOBAL and TLB_IS_VALID macros.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 18 Mar, 2014 1 commit
    • KVM: eventfd: Fix lock order inversion. · 684a0b71
      Cornelia Huck authored
      When registering a new irqfd, we call its ->poll method to collect any
      event that might have previously been pending so that we can trigger it.
      This is done under the kvm->irqfds.lock, which means the eventfd's ctx
      lock is taken under it.
      
      However, if we get a POLLHUP in irqfd_wakeup, we will be called with the
      ctx lock held before getting the irqfds.lock to deactivate the irqfd,
      causing lockdep to complain.
      
      Calling the ->poll method does not really need the irqfds.lock, so let's
      just move it after we've given up the irqfds.lock in kvm_irqfd_assign().
      Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
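      The reordering in the eventfd commit above can be illustrated with a
      small single-threaded model. It uses no real locks: toy_state and the
      functions below are hypothetical, and a boolean simply records
      whether the ctx lock would have been taken while the irqfds lock was
      held.

```c
#include <stdbool.h>

/* Hypothetical tracker for lock nesting; no real locking is performed. */
struct toy_state {
	bool irqfds_lock_held;       /* models kvm->irqfds.lock */
	bool ctx_taken_under_irqfds; /* nesting that lockdep would flag */
	int registered;              /* irqfds added to the list */
};

/* Models ->poll, which takes the eventfd ctx lock internally. */
static void toy_poll(struct toy_state *s)
{
	if (s->irqfds_lock_held)
		s->ctx_taken_under_irqfds = true;
}

/* Old order: poll while still holding the irqfds lock. */
void toy_assign_old(struct toy_state *s)
{
	s->irqfds_lock_held = true;
	s->registered++;
	toy_poll(s);
	s->irqfds_lock_held = false;
}

/* New order: drop the irqfds lock first, then poll. */
void toy_assign_new(struct toy_state *s)
{
	s->irqfds_lock_held = true;
	s->registered++;
	s->irqfds_lock_held = false;
	toy_poll(s);
}
```

      With the new order the irqfds lock is never the outer lock around the
      ctx lock, so the only remaining nesting is the one used on the
      POLLHUP path (ctx lock first, then irqfds lock).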
  3. 17 Mar, 2014 8 commits
    • KVM: x86: handle missing MPX in nested virtualization · 93c4adc7
      Paolo Bonzini authored
      When doing nested virtualization, we may be able to read BNDCFGS but
      still not be allowed to write to GUEST_BNDCFGS in the VMCS.  Guard
      writes to the field with vmx_mpx_supported(), and similarly hide the
      MSR from userspace if the processor does not support the field.
      
      We could work around this with the generic MSR save/load machinery,
      but there is only a limited number of MSR save/load slots and it is
      not really worthwhile to waste one for a scenario that should not
      happen except in the nested virtualization case.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add nested virtualization support for MPX · 36be0b9d
      Paolo Bonzini authored
      This is simple to do: the "host" BNDCFGS is either 0 or the guest value.
      However, both controls have to be present.  We cannot provide MPX if
      we only have one of the "load BNDCFGS" or "clear BNDCFGS" controls.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: introduce kvm_supported_xcr0() · 4ff41732
      Paolo Bonzini authored
      XSAVE support for KVM is already using host_xcr0 & KVM_SUPPORTED_XCR0 as
      a "dynamic" version of KVM_SUPPORTED_XCR0.
      
      However, this is not enough because the MPX bits should not be presented
      to the guest unless kvm_x86_ops confirms the support.  So, replace all
      instances of host_xcr0 & KVM_SUPPORTED_XCR0 with a new function
      kvm_supported_xcr0() that also has this check.
      
      Note that here:
      
      		if (xstate_bv & ~KVM_SUPPORTED_XCR0)
      			return -EINVAL;
      		if (xstate_bv & ~host_xcr0)
      			return -EINVAL;
      
      the code is equivalent to
      
      		if ((xstate_bv & ~KVM_SUPPORTED_XCR0) ||
      		    (xstate_bv & ~host_xcr0))
      			return -EINVAL;
      
      i.e. "xstate_bv & (~KVM_SUPPORTED_XCR0 | ~host_xcr0)" which is in turn
      equal to "xstate_bv & ~(KVM_SUPPORTED_XCR0 & host_xcr0)".  So we should
      also use the new function there.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
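      The De Morgan step in the commit above can be checked mechanically.
      The sketch below is illustrative only: the function names are
      hypothetical, the masks are plain parameters rather than the kernel's
      KVM_SUPPORTED_XCR0 and host XCR0 values, and the extra MPX check that
      kvm_supported_xcr0() performs is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two separate tests, as in the code quoted in the commit message. */
bool rejects_two_checks(uint64_t xstate_bv, uint64_t supported,
			uint64_t host_xcr0)
{
	if (xstate_bv & ~supported)
		return true;
	if (xstate_bv & ~host_xcr0)
		return true;
	return false;
}

/* Single test against the intersection of both masks. */
bool rejects_combined(uint64_t xstate_bv, uint64_t supported,
		      uint64_t host_xcr0)
{
	return (xstate_bv & ~(supported & host_xcr0)) != 0;
}

/* Exhaustively compare both forms over all 4-bit mask combinations. */
bool checks_agree_on_small_masks(void)
{
	for (uint64_t bv = 0; bv < 16; bv++)
		for (uint64_t sup = 0; sup < 16; sup++)
			for (uint64_t host = 0; host < 16; host++)
				if (rejects_two_checks(bv, sup, host) !=
				    rejects_combined(bv, sup, host))
					return false;
	return true;
}
```

      Because the two forms agree bit by bit, a single helper returning the
      intersection of the masks can replace both tests.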
    • Merge tag 'kvm-s390-20140317' of... · 94b3ffcd
      Paolo Bonzini authored
      Merge tag 'kvm-s390-20140317' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      Two patches:
      - one regression fix for reducing the amount of ucontrol userspace exits
      - get rid of BUG_ONs in hot inner loops
    • KVM: x86 emulator: emulate MOVAPD · 6fec27d8
      Igor Mammedov authored
      Add emulation for the 0x66-prefixed form of the 0f 28 opcode,
      which was added earlier.
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86 emulator: emulate MOVAPS · 27ce8258
      Igor Mammedov authored
      The HCK memory driver test fails when testing 32-bit Windows 8.1
      with the balloon driver.
      
      Tracing KVM shows the error:
      reason EXIT_ERR rip 0x81c18326 info 0 0
      
      x/10i 0x81c18326-20
      0x0000000081c18312:  add    %al,(%eax)
      0x0000000081c18314:  add    %cl,-0x7127711d(%esi)
      0x0000000081c1831a:  rolb   $0x0,0x80ec(%ecx)
      0x0000000081c18321:  and    $0xfffffff0,%esp
      0x0000000081c18324:  mov    %esp,%esi
      0x0000000081c18326:  movaps %xmm0,(%esi)
      0x0000000081c18329:  movaps %xmm1,0x10(%esi)
      0x0000000081c1832d:  movaps %xmm2,0x20(%esi)
      0x0000000081c18331:  movaps %xmm3,0x30(%esi)
      0x0000000081c18335:  movaps %xmm4,0x40(%esi)
      
      which points to a MOVAPS instruction that is currently not emulated by KVM.
      Fix it by adding appropriate entries to opcode table in KVM's emulator.
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: s390: Optimize ucontrol path · 2955c83f
      Christian Borntraeger authored
      Since commit 7c470539
      (s390/kvm: avoid automatic sie reentry) we will run through the C code
      of KVM on host interrupts instead of just reentering the guest. This
      will result in additional ucontrol exits (at least HZ per second). Let's
      handle a 0 intercept in the kernel and not return to userspace,
      even if in ucontrol mode.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      CC: stable@vger.kernel.org
    • KVM: s390: Removing untriggerable BUG_ONs · fed495d2
      Dominik Dingel authored
      The BUG_ON in kvm-s390.c is unreachable, as we get the vcpu per common code,
      which itself does this from the private_data field of the file descriptor,
      and there is no KVM_UNCREATE_VCPU.
      
      The __{set,unset}_cpu_idle BUG_ONs are not triggerable because the vcpu
      creation code already checks against KVM_MAX_VCPUS.
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
  4. 14 Mar, 2014 1 commit
  5. 13 Mar, 2014 3 commits
    • kvm: x86: ignore ioapic polarity · 100943c5
      Gabriel L. Somlo authored
      Both QEMU and KVM have already accumulated a significant number of
      optimizations based on the hard-coded assumption that ioapic polarity
      will always use the ActiveHigh convention, where the logical and
      physical states of level-triggered irq lines always match (i.e.,
      active(asserted) == high == 1, inactive == low == 0). QEMU guests
      are expected to follow directions given via ACPI and configure the
      ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
      guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
      QEMU will still use the ActiveHigh signaling convention when
      interfacing with KVM.
      
      This patch modifies KVM to completely ignore ioapic polarity as set by
      the guest OS, enabling misbehaving guests to work alongside those which
      comply with the ActiveHigh polarity specified by QEMU's ACPI tables.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
      [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
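      The effect of ignoring the polarity bit can be shown with a toy
      comparison. The names below are hypothetical, and the "honouring"
      variant is only an assumption about how a polarity bit would
      ordinarily be applied; it is not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical, reduced redirection entry: 0 = ActiveHigh, 1 = ActiveLow. */
struct toy_redir_entry {
	unsigned int polarity : 1;
};

/* Assumed conventional behaviour: polarity inverts the line level. */
bool toy_asserted_honouring_polarity(struct toy_redir_entry e, bool level)
{
	return level != (e.polarity != 0);
}

/* Behaviour described by the commit: the guest's polarity is ignored. */
bool toy_asserted_ignoring_polarity(struct toy_redir_entry e, bool level)
{
	(void)e;      /* the guest-programmed polarity is deliberately unused */
	return level; /* ActiveHigh convention: asserted == high == 1 */
}
```

      For a guest that programs polarity 1 while the line is still
      signalled ActiveHigh, the first variant would treat a raised line as
      deasserted, whereas the second keeps the logical and physical states
      equal.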
    • KVM: PPC: Book3S HV: Fix register usage when loading/saving VRSAVE · e724f080
      Paul Mackerras authored
      Commit 595e4f7e ("KVM: PPC: Book3S HV: Use load/store_fp_state
      functions in HV guest entry/exit") changed the register usage in
      kvmppc_save_fp() and kvmppc_load_fp() but omitted changing the
      instructions that load and save VRSAVE.  The result is that the
      VRSAVE value was loaded from a constant address, and saved to a
      location past the end of the vcpu struct, causing host kernel
      memory corruption and various kinds of host kernel crashes.
      
      This fixes the problem by using register r31, which contains the
      vcpu pointer, instead of r3 and r4.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: PPC: Book3S HV: Remove bogus duplicate code · a5b0ccb0
      Paul Mackerras authored
      Commit 7b490411 ("KVM: PPC: Book3S HV: Add new state for
      transactional memory") incorrectly added some duplicate code to the
      guest exit path because I didn't manage to clean up after a rebase
      correctly.  This removes the extraneous material.  The presence of
      this extraneous code causes host crashes whenever a guest is run.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 11 Mar, 2014 11 commits
    • KVM: svm: Allow the guest to run with dirty debug registers · facb0139
      Paolo Bonzini authored
      When not running in guest-debug mode (i.e. the guest controls the debug
      registers), having to take an exit for each DR access is a waste of time.
      If the guest gets into a state where each context switch causes DR to be
      saved and restored, this can take away as much as 40% of the execution
      time from the guest.
      
      If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
      can let it write freely to the debug registers and reload them on the
      next exit.  We still need to exit on the first access, so that the
      KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
      accesses to the debug registers will not cause a vmexit.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: svm: set/clear all DR intercepts in one swoop · 5315c716
      Paolo Bonzini authored
      Unlike other intercepts, debug register intercepts will be modified
      in hot paths if the guest OS is bad or otherwise gets tricked into
      doing so.
      
      Avoid calling recalc_intercepts 16 times for debug registers.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Allow nested guests to run with dirty debug registers · d16c293e
      Paolo Bonzini authored
      When preparing the VMCS02, the CPU-based execution controls are computed
      by vmx_exec_control.  Turn off DR access exits there, too, if the
      KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: Allow the guest to run with dirty debug registers · 81908bf4
      Paolo Bonzini authored
      When not running in guest-debug mode (i.e. the guest controls the debug
      registers), having to take an exit for each DR access is a waste of time.
      If the guest gets into a state where each context switch causes DR to be
      saved and restored, this can take away as much as 40% of the execution
      time from the guest.
      
      If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
      can let it write freely to the debug registers and reload them on the
      next exit.  We still need to exit on the first access, so that the
      KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
      accesses to the debug registers will not cause a vmexit.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Allow the guest to run with dirty debug registers · c77fb5fe
      Paolo Bonzini authored
      When not running in guest-debug mode, the guest controls the debug
      registers and having to take an exit for each DR access is a waste
      of time.  If the guest gets into a state where each context switch
      causes DR to be saved and restored, this can take away as much as 40%
      of the execution time from the guest.
      
      After this patch, VMX- and SVM-specific code can set a flag in
      switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
      registers might be dirty and need to be reloaded (syncing will be taken
      care of by a new callback in kvm_x86_ops).  This flag can be set on the
      first access to a debug register, so that multiple accesses to the
      debug registers only cause one vmexit.
      
      Note that since the guest will be able to read debug registers and
      enable breakpoints in DR7, we need to ensure that they are synchronized
      on entry to the guest---including DR6 that was not synced before.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
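      The "only one vmexit for many DR accesses" idea can be modelled with
      a flag in a bit mask. This is a minimal sketch under stated
      assumptions: the TOY_* names, the counters and both functions are
      hypothetical, and clearing the flag on the next exit is an assumption
      about how the intercept gets re-armed.

```c
/* Hypothetical flag bits for a switch_db_regs-style bit mask. */
#define TOY_DEBUGREG_BP_ENABLED 1u
#define TOY_DEBUGREG_WONT_EXIT  2u

/* Hypothetical, heavily reduced vCPU state. */
struct toy_vcpu {
	unsigned int switch_db_regs; /* bit mask of TOY_DEBUGREG_* flags */
	unsigned int dr_exits;       /* exits caused by DR accesses */
	unsigned int dr_syncs;       /* times the DR values were read back */
};

/* The guest touches a debug register while it is running. */
void toy_guest_dr_access(struct toy_vcpu *v)
{
	if (v->switch_db_regs & TOY_DEBUGREG_WONT_EXIT)
		return; /* intercepts already off: no exit */

	v->dr_exits++; /* the first access still traps */
	v->switch_db_regs |= TOY_DEBUGREG_WONT_EXIT;
}

/* Any later exit: registers may be dirty, so sync them and re-arm. */
void toy_vmexit(struct toy_vcpu *v)
{
	if (v->switch_db_regs & TOY_DEBUGREG_WONT_EXIT) {
		v->dr_syncs++;
		v->switch_db_regs &= ~TOY_DEBUGREG_WONT_EXIT;
	}
}
```

      In this model five consecutive debug register accesses cost a single
      exit, and the values are read back once at the next exit instead of
      on every access.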
    • KVM: x86: change vcpu->arch.switch_db_regs to a bit mask · 360b948d
      Paolo Bonzini authored
      The next patch will add another bit that we can test with the
      same "if".
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: we do rely on loading DR7 on entry · c845f9c6
      Paolo Bonzini authored
      Currently, this works even if the bit is not in "min", because the bit is always
      set in MSR_IA32_VMX_ENTRY_CTLS.  Mention it for the sake of documentation, and
      to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f
      Jan Kiszka authored
      It's no longer possible to enter enable_irq_window in guest mode when
      L1 intercepts external interrupts and we are entering L2. This is now
      caught in vcpu_enter_guest. So we can remove the check from the VMX
      version of enable_irq_window, and with it the need to return an error
      code from both enable_irq_window and enable_nmi_window.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt · 220c5672
      Jan Kiszka authored
      According to SDM 27.2.3, IDT vectoring information will not be valid on
      vmexits caused by external NMIs. So we have to avoid creating such
      scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we
      have a pending interrupt because that one would be migrated to L1's IDT
      vectoring info on nested exit.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Fully emulate preemption timer · f4124500
      Jan Kiszka authored
      We cannot rely on the hardware-provided preemption timer support because
      we are holding L2 in HLT outside non-root mode. Furthermore, emulating
      the preemption will resolve tick rate errata on older Intel CPUs.
      
      The emulation is based on hrtimer which is started on L2 entry, stopped
      on L2 exit and evaluated via the new check_nested_events hook. As we no
      longer rely on hardware features, we can enable both the preemption
      timer support and value saving unconditionally.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Rework interception of IRQs and NMIs · b6b8a145
      Jan Kiszka authored
      Move the check for leaving L2 on pending and intercepted IRQs or NMIs
      from the *_allowed handler into a dedicated callback. Invoke this
      callback at the relevant points before KVM checks if IRQs/NMIs can be
      injected. The callback has the task to switch from L2 to L1 if needed
      and inject the proper vmexit events.
      
      The rework fixes L2 wakeups from HLT and provides the foundation for
      preemption timer emulation.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 06 Mar, 2014 2 commits
  8. 04 Mar, 2014 12 commits