1. 11 Jul, 2014 22 commits
    • KVM: x86: use kvm_read_guest_page for emulator accesses · 44583cba
      Paolo Bonzini authored
      Emulator accesses are always done a page at a time, either by the emulator
      itself (for fetches) or because we need to query the MMU for address
      translations.  Speed up these accesses by using kvm_read_guest_page
      and, in the case of fetches, by inlining kvm_read_guest_virt_helper and
      dropping the loop around kvm_read_guest_page.
      
      This final tweak saves 30-100 more clock cycles (4-10%), bringing the
      count (as measured by kvm-unit-tests) down to 720-1100 clock cycles on
      a Sandy Bridge Xeon host, compared to 2300-3200 before the whole series
      and 925-1700 after the first two low-hanging fruit changes.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: ensure emulator fetches do not span multiple pages · 719d5a9b
      Paolo Bonzini authored
      When the CS base is not page-aligned, the linear address of the code could
      get close to the page boundary (e.g. 0x...ffe) even if the EIP value is
      not.  So we need to first linearize the address, and only then compute
      the number of valid bytes that can be fetched.
      
      This happens relatively often when executing real mode code.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
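      A minimal sketch of the idea, with hypothetical names (the real code
      lives in the x86 emulator's fetch path): linearize first, then clamp
      the fetch size so it never crosses the page boundary.

        /* Illustrative only, not the actual KVM emulator code. */
        #include <stdint.h>

        #define PAGE_SIZE 4096u

        /* Number of bytes that can be fetched before the page ends. */
        static unsigned int clamp_to_page(uint64_t linear, unsigned int wanted)
        {
            unsigned int room = PAGE_SIZE - (linear & (PAGE_SIZE - 1));

            return wanted < room ? wanted : room;
        }

        /* e.g. CS base 0x1000, EIP 0xffe -> linear 0x1ffe: only 2 bytes fit. */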
    • KVM: emulate: put pointers in the fetch_cache · 17052f16
      Paolo Bonzini authored
      This simplifies the code a bit, especially the overflow checks.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: avoid per-byte copying in instruction fetches · 9506d57d
      Paolo Bonzini authored
      We do not need a memory copying loop anymore in insn_fetch; we
      can use a byte-aligned pointer to access instruction fields directly
      from the fetch_cache.  This eliminates 50-150 cycles (corresponding to
      a 5-10% improvement in performance) from each instruction.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
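      A hedged sketch of the access pattern described above (struct and macro
      names are illustrative, not the real fetch_cache layout): a pointer into
      the cache is advanced and each field is read in place with a fixed-size
      copy that the compiler turns into a single load, instead of a per-byte loop.

        /* Illustrative only. */
        #include <stdint.h>
        #include <string.h>

        struct fetch_cache_sketch {
            uint8_t  data[15];   /* x86 instructions are at most 15 bytes */
            uint8_t *ptr;        /* next byte to consume */
        };

        /* Consume sizeof(*(out)) bytes without a per-byte copying loop. */
        #define FETCH_FIELD(cache, out)                      \
            do {                                             \
                memcpy((out), (cache)->ptr, sizeof(*(out))); \
                (cache)->ptr += sizeof(*(out));              \
            } while (0)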
    • KVM: emulate: avoid repeated calls to do_insn_fetch_bytes · 5cfc7e0f
      Paolo Bonzini authored
      do_insn_fetch_bytes will only be called once in a given insn_fetch and
      insn_fetch_arr, because in fact it will only be called at most twice
      for any instruction and the first call is explicit in x86_decode_insn.
      This observation lets us hoist the call out of the memory copying loop.
      It does not buy performance, because most fetches are one byte long
      anyway, but it prepares for the next patch.
      
      The overflow check is tricky, but correct.  Because do_insn_fetch_bytes
      has already been called once, we know that fc->end is at least 15.  So
      it is okay to subtract the number of bytes we want to read.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
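      A hedged sketch of the overflow argument above, using stand-in names
      (fc_done, fc_end) rather than the real fetch_cache fields: because the
      cache has already been primed, the number of valid bytes is known to be
      at least 15, so subtracting the requested size cannot go negative.

        /* Illustrative only. */
        #include <stdbool.h>

        /* fc_end counts valid bytes; the initial fill guarantees fc_end >= 15,
         * so fc_end - size cannot underflow for size <= 15 (the x86 limit). */
        static bool need_more_bytes(unsigned long fc_done, unsigned long fc_end,
                                    unsigned int size)
        {
            return fc_done > fc_end - size;
        }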
    • KVM: emulate: speed up do_insn_fetch · 285ca9e9
      Paolo Bonzini authored
      Hoist the common case up from do_insn_fetch_byte to do_insn_fetch,
      and prime the fetch_cache in x86_decode_insn.  This helps a bit the
      compiler and the branch predictor, but above all it lays the
      ground for further changes in the next few patches.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: do not initialize memopp · 41061cdb
      Bandan Das authored
      rip_relative is only set if decode_modrm runs, and if you have ModRM
      you will also have a memopp.  We can then access memopp unconditionally.
      Note that rip_relative cannot be hoisted up to decode_modrm, or you
      break "mov $0, xyz(%rip)".
      
      Also, move typecast on "out of range value" of mem.ea to decode_modrm.
      
      Together, all these optimizations save about 50 cycles on each emulated
      instruction (4-6%).
      Signed-off-by: Bandan Das <bsd@redhat.com>
      [Fix immediate operands with rip-relative addressing. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: rework seg_override · 573e80fe
      Bandan Das authored
      x86_decode_insn already sets a default for seg_override,
      so remove it from the zeroed area. Also replace set/get functions
      with direct access to the field.
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: clean up initializations in init_decode_cache · c44b4c6a
      Bandan Das authored
      A lot of initializations are unnecessary, as the fields get set to
      appropriate values before they are actually used. Also optimize the
      placement of fields in x86_emulate_ctxt.
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: cleanup decode_modrm · 02357bdc
      Bandan Das authored
      Remove the if conditional - that will help us avoid an "else initialize
      to 0". Also, rearrange operators for slightly better code.
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks · 685bbf4a
      Bandan Das authored
      The same information can be gleaned from ctxt->d; this avoids having
      to zero/NULL-initialize intercept and check_perm.
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: move init_decode_cache to emulate.c · 1498507a
      Bandan Das authored
      Core emulator functions all belong in emulator.c; x86 should have no
      knowledge of emulator internals.
      Signed-off-by: Bandan Das <bsd@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: simplify writeback · f5f87dfb
      Paolo Bonzini authored
      The "if/return" checks are useless, because we return X86EMUL_CONTINUE
      anyway if we do not return.
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: emulate: speed up emulated moves · 54cfdb3e
      Paolo Bonzini authored
      We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
      write_register_operand will take care of writing only the lower bytes.
      
      Avoiding a call to memcpy (the compiler optimizes it out) gains about
      200 cycles on kvm-unit-tests for register-to-register moves, and makes
      them about as fast as arithmetic instructions.
      
      We could perhaps get a larger speedup by moving all instructions _except_
      moves out of x86_emulate_insn, removing opcode_len, and replacing the
      switch statement with an inlined em_mov.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
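      A sketch of the idea under simplified, assumed operand types (not the
      real emulator structs): copy the full fixed-size value unconditionally
      and let write-back honour the operand width.

        /* Illustrative only. */
        #include <stdint.h>
        #include <string.h>

        struct operand_sketch {
            unsigned int bytes;    /* operand width actually in use */
            uint8_t      val[16];  /* fixed-size backing store */
        };

        /* Fixed-size copy; the compiler lowers this to two 8-byte moves. */
        static void mov_sketch(struct operand_sketch *dst,
                               const struct operand_sketch *src)
        {
            memcpy(dst->val, src->val, sizeof(dst->val));
        }

        /* Write-back later consults dst->bytes, so the extra bytes are harmless. */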
    • KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())" · d40a6898
      Paolo Bonzini authored
      There are several checks for "peculiar" aspects of instructions in both
      x86_decode_insn and x86_emulate_insn.  Group them together, and guard
      them with a single "if" that lets the processor quickly skip them all.
      Make this more effective by adding two more flag bits that say whether the
      .intercept and .check_perm fields are valid.  We will reuse these
      flags later to avoid initializing fields of the emulate_ctxt struct.
      
      This skims about 30 cycles for each emulated instruction, which is
      approximately a 3% improvement.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
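      A hedged sketch of the pattern (the flag names are illustrative, not the
      actual emulator flag bits): rare per-instruction work is gated behind a
      single unlikely() test of a combined mask, including bits that say
      whether the intercept and permission-check fields are valid at all.

        /* Illustrative only. */
        #include <stdint.h>

        #define F_PRIV       (1u << 0)   /* hypothetical flag bits */
        #define F_INTERCEPT  (1u << 1)   /* ".intercept is valid"  */
        #define F_CHECK_PERM (1u << 2)   /* ".check_perm is valid" */
        #define F_RARE_MASK  (F_PRIV | F_INTERCEPT | F_CHECK_PERM)

        #define unlikely(x) __builtin_expect(!!(x), 0)

        static int emulate_step_sketch(uint32_t d_flags)
        {
            if (unlikely(d_flags & F_RARE_MASK)) {
                if (d_flags & F_PRIV)       { /* privilege check     */ }
                if (d_flags & F_INTERCEPT)  { /* intercept callback  */ }
                if (d_flags & F_CHECK_PERM) { /* permission callback */ }
            }
            /* fast path continues here for the common case */
            return 0;
        }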
    • KVM: emulate: move around some checks · e24186e0
      Paolo Bonzini authored
      The only purpose of this patch is to make the next patch simpler
      to review.  No semantic change.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation · 6addfc42
      Paolo Bonzini authored
      Despite the provisions to emulate up to 130 consecutive instructions, in
      practice KVM will emulate just one before exiting handle_invalid_guest_state,
      because x86_emulate_instruction always sets KVM_REQ_EVENT.
      
      However, we only need to do this if an interrupt could be injected,
      which happens a) if an interrupt shadow bit (STI or MOV SS) has gone
      away; b) if the interrupt flag has just been set (other instructions
      than STI can set it without enabling an interrupt shadow).
      
      This cuts another 700-900 cycles from the cost of emulating an
      instruction (measured on a Sandy Bridge Xeon: 1650-2600 cycles
      before the patch on kvm-unit-tests, 925-1700 afterwards).
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
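      A sketch of the condition described above, with hypothetical helper and
      constant names; the real logic sits in x86_emulate_instruction and uses
      KVM's own accessors.

        /* Illustrative only. */
        #include <stdbool.h>
        #include <stdint.h>

        #define EFLAGS_IF (1u << 9)   /* x86 interrupt-enable flag */

        /* Request an event check only when an interrupt could now be injected:
         * a) an interrupt-shadow bit (STI/MOV SS) went away, or
         * b) IF has just been turned on. */
        static bool needs_event_check(uint32_t shadow_before, uint32_t shadow_after,
                                      uint32_t flags_before, uint32_t flags_after)
        {
            bool shadow_cleared = (shadow_before & ~shadow_after) != 0;
            bool if_just_set = !(flags_before & EFLAGS_IF) && (flags_after & EFLAGS_IF);

            return shadow_cleared || if_just_set;
        }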
    • KVM: x86: return all bits from get_interrupt_shadow · 37ccdcbe
      Paolo Bonzini authored
      For the next patch we will need to know the full state of the
      interrupt shadow; we will then set KVM_REQ_EVENT when one bit
      is cleared.
      
      However, right now get_interrupt_shadow only returns the one
      corresponding to the emulated instruction, or an unconditional
      0 if the emulated instruction does not have an interrupt shadow.
      This is confusing and does not allow us to check for cleared
      bits as mentioned above.
      
      Clean the callback up, and modify toggle_interruptibility to
      match the comment above the call.  As a small result, the
      call to set_interrupt_shadow will be skipped in the common
      case where int_shadow == 0 && mask == 0.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
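      A hedged sketch of the toggle_interruptibility behaviour described above,
      with stand-in fields instead of the vendor callbacks: the write is skipped
      in the common case where both the current shadow and the new mask are zero,
      and an event check is requested when a shadow bit is cleared.

        /* Illustrative only. */
        #include <stdint.h>

        struct vcpu_sketch {
            uint32_t int_shadow;     /* all interrupt-shadow bits, not just one */
            int      event_pending;  /* ~ KVM_REQ_EVENT */
        };

        static void toggle_interruptibility_sketch(struct vcpu_sketch *v, uint32_t mask)
        {
            uint32_t old = v->int_shadow;

            if (!old && !mask)       /* common case: nothing to do */
                return;

            v->int_shadow = mask;    /* "set_interrupt_shadow" */
            if (old & ~mask)         /* a shadow bit went away */
                v->event_pending = 1;
        }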
    • KVM: vmx: speed up emulation of invalid guest state · 98eb2f8b
      Paolo Bonzini authored
      About 25% of the time spent in emulation of invalid guest state
      is wasted in checking whether emulation is required for the next
      instruction.  However, this almost never changes except when a
      segment register (or TR or LDTR) changes, or when there is a mode
      transition (i.e. CR0 changes).
      
      In fact, vmx_set_segment and vmx_set_cr0 already modify
      vmx->emulation_required (except that the former for some reason
      uses |= instead of just an assignment).  So there is no need to
      call guest_state_valid in the emulation loop.
      
      Emulation performance test results indicate 1650-2600 cycles
      for common instructions, versus 2300-3200 before this patch on
      a Sandy Bridge Xeon.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
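      A hedged sketch of the caching pattern (names are illustrative): the
      "emulation required" predicate is recomputed only at the points that can
      change its value, and the per-instruction loop just reads the cached flag.

        /* Illustrative only. */
        #include <stdbool.h>

        struct vmx_sketch {
            bool guest_state_ok;       /* stands in for guest_state_valid() */
            bool emulation_required;   /* cached result */
        };

        /* Recompute only where the answer can change ... */
        static void on_segment_or_cr0_write(struct vmx_sketch *v)
        {
            v->emulation_required = !v->guest_state_ok;   /* plain assignment, not |= */
        }

        /* ... and just read the cached flag in the emulation loop. */
        static bool must_keep_emulating(const struct vmx_sketch *v)
        {
            return v->emulation_required;
        }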
    • KVM: svm: writes to MSR_K7_HWCR generates GPE in guest · 22d48b2d
      Matthias Lange authored
      Since commit 575203 the MCE subsystem in the Linux kernel for AMD sets bit 18
      in MSR_K7_HWCR. Running such a kernel as a guest in KVM on an AMD host results
      in a GPE injected into the guest because kvm_set_msr_common returns 1. This
      patch fixes this by masking bit 18 from the MSR value desired by the guest.
      Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
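      A hedged sketch of the shape of the fix: bit 18 is stripped from the
      guest-requested value before the remaining bits are validated, so the
      write no longer makes the MSR handler fail.

        /* Illustrative only. */
        #include <stdint.h>

        static int set_msr_k7_hwcr_sketch(uint64_t data)
        {
            data &= ~(uint64_t)0x40000;   /* drop bit 18, set by the guest's MCE code */
            if (data != 0)                /* any other bit is still refused */
                return 1;                 /* -> exception injected into the guest */
            return 0;
        }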
    • KVM: x86: Pending interrupt may be delivered after INIT · 5f7552d4
      Nadav Amit authored
      We encountered a scenario in which after an INIT is delivered, a pending
      interrupt is delivered, although it was sent before the INIT.  As the SDM
      states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT as
      KVM does.  This also means that pending interrupts should be cleared.  This
      patch clears the pending interrupts upon reset (and INIT); on the same
      occasion it also clears the pending exceptions, since they may cause a similar issue.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Synthesize G bit for all segments. · 80112c89
      Jim Mattson authored
      We have noticed that qemu-kvm hangs early in the BIOS when running nested
      under some versions of VMware ESXi.
      
      We believe the problem is that KVM assumes the platform preserves the 'G'
      bit for any segment register. The SVM specification itemizes the
      segment attribute bits that are observed by the CPU, but the (G)ranularity bit
      is not one of the bits itemized, for any segment. Though current AMD CPUs keep
      track of the (G)ranularity bit for all segment registers other than CS, the
      specification does not require it. VMware's virtual CPU may not track the
      (G)ranularity bit for any segment register.
      
      Since kvm already synthesizes the (G)ranularity bit for the CS segment, it
      should do so for all segments. The patch below does that, and helps get rid of
      the hangs. Patch applies on top of Linus' tree.
      Signed-off-by: Jim Mattson <jmattson@vmware.com>
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
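      A hedged sketch of how the granularity bit can be synthesized from the
      segment limit instead of being read back from the attribute bits (field
      names only loosely follow the KVM segment structure).

        /* Illustrative only: a limit above 1 MiB - 1 implies 4 KiB granularity. */
        #include <stdint.h>

        struct seg_sketch {
            uint32_t     limit;
            unsigned int g : 1;
        };

        static void synthesize_g_bit(struct seg_sketch *s)
        {
            s->g = s->limit > 0xfffff;   /* limits wider than 20 bits must be page-granular */
        }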
  2. 09 Jul, 2014 9 commits
    • KVM: MIPS: Document MIPS specifics of KVM API. · c2d2c21b
      James Hogan authored
      Document the MIPS specific parts of the KVM API, including:
       - The layout of the kvm_regs structure.
       - The interrupt number passed to KVM_INTERRUPT.
       - The registers supported by the KVM_{GET,SET}_ONE_REG interface, and
         the encoding of those register ids.
       - That KVM_INTERRUPT and KVM_GET_REG_LIST are supported on MIPS.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Reformat KVM_SET_ONE_REG register documentation · bf5590f3
      James Hogan authored
      Some of the MIPS registers that can be accessed with the
      KVM_{GET,SET}_ONE_REG interface have fairly long names, so widen the
      Register column of the table in the KVM_SET_ONE_REG documentation to
      allow them to fit.
      
      Tabs in the table are replaced with spaces at the same time for
      consistency.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Document KVM_SET_SIGNAL_MASK as universal · 572e0929
      James Hogan authored
      KVM_SET_SIGNAL_MASK is implemented in generic code and isn't x86
      specific, so document it as being applicable for all architectures.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix lapic.c debug prints · 98eff52a
      Nadav Amit authored
      In two cases lapic.c does not use the apic_debug macro correctly. This patch
      fixes them.
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix TSC matching · 0d3da0d2
      Tomasz Grabiec authored
      I've observed kvmclock being marked as unstable on a modern
      single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0.
      
      The culprit was failure in TSC matching because of overflow of
      kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes
      in a single synchronization cycle.
      
      It turns out that qemu does multiple TSC writes during init; below is the
      evidence of that (qemu-2.0.0):
      
      The first one:
      
       0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
       0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
       0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm]
       0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm]
      
      The second one:
      
       0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
       0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
       0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
       0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
       0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
       0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
       0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
      
       #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
       #1  kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
       #2  kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
       #3  kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641
       #4  cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330
       #5  cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521
       #6  main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390
      
      The third one:
      
       0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel]
       0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm]
       0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel]
       0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm]
       0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm]
       0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm]
       0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm]
      
       #0  kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780
       #1  kvm_put_msrs  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270
       #2  kvm_arch_put_registers  at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909
       #3  kvm_cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635
       #4  cpu_synchronize_post_reset  at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323
       #5  cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512
       #6  main  at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482
      
      The fix is to count each vCPU only once when matched, so that
      nr_vcpus_matched_tsc holds the size of the matched set. This is
      achieved by reusing generation counters. Every vCPU with
      this_tsc_generation == cur_tsc_generation is in the matched set. The
      match set is cleared by setting cur_tsc_generation to a value which no
      other vCPU is set to (by incrementing it).
      
      I needed to bump up the counter size from u8 to u64 to ensure it never
      overflows. Otherwise, in cases where the TSC is not written the same number
      of times on each vCPU, the counter could overflow and incorrectly indicate
      some vCPUs as being in the matched set. This scenario seems unlikely
      but I'm not sure if it can be disregarded.
      Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
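      A hedged sketch of the generation-counter idea, with stand-in names for
      the kvm_arch and per-vCPU fields: a vCPU joins the matched set at most
      once per generation, and bumping the generation empties the set.

        /* Illustrative only. */
        #include <stdint.h>

        struct tsc_match_sketch {
            uint64_t     cur_generation;   /* u64, so it effectively never wraps */
            unsigned int nr_matched;       /* size of the matched set */
        };

        struct vcpu_tsc_sketch {
            uint64_t this_generation;
        };

        /* Matching TSC write: count each vCPU only once per generation. */
        static void tsc_write_matched(struct tsc_match_sketch *k,
                                      struct vcpu_tsc_sketch *v)
        {
            if (v->this_generation != k->cur_generation) {
                v->this_generation = k->cur_generation;
                k->nr_matched++;
            }
        }

        /* Non-matching write: start a new generation, i.e. an empty set. */
        static void tsc_write_new_generation(struct tsc_match_sketch *k,
                                             struct vcpu_tsc_sketch *v)
        {
            v->this_generation = ++k->cur_generation;
            k->nr_matched = 1;
        }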
    • KVM: nSVM: Set correct port for IOIO interception evaluation · 6cbc5f5a
      Jan Kiszka authored
      Obtaining the port number from DX is bogus as a) there are immediate
      port accesses and b) user space may have changed the register content
      while processing the PIO access. Forward the correct value from the
      instruction emulator instead.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Fix IOIO size reported on emulation · 6493f157
      Jan Kiszka authored
      The access size of an in/ins is reported in dst_bytes, and that of
      out/outs in src_bytes.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Fix IOIO bitmap evaluation · 9bf41833
      Jan Kiszka authored
      First, kvm_read_guest returns 0 on success. And then we need to take the
      access size into account when testing the bitmap: intercept if any of the
      bits corresponding to the access is set.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
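      A hedged sketch of the corrected bitmap test, assuming the usual IOPM
      layout of one bit per port (the helper is a stand-in that follows the
      0-on-success convention mentioned above, not the real kvm_read_guest).

        /* Illustrative only. */
        #include <stdbool.h>
        #include <stdint.h>

        /* Stand-in for reading guest memory: returns 0 on success. */
        static int read_guest_sketch(uint64_t gpa, void *buf, unsigned int len)
        {
            (void)gpa; (void)buf; (void)len;
            return 0;
        }

        static bool ioio_intercepted(uint64_t iopm_base, uint16_t port, unsigned int size)
        {
            uint64_t gpa = iopm_base + port / 8;
            unsigned int bit = port & 7;
            unsigned int len = (bit + size > 8) ? 2 : 1;  /* bits may straddle a byte */
            uint16_t mask = ((1u << size) - 1) << bit;
            uint16_t val = 0;

            if (read_guest_sketch(gpa, &val, len))
                return true;              /* treat a failed read as intercepted */

            return (val & mask) != 0;     /* intercept if ANY covered bit is set */
        }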
    • KVM: nSVM: Do not report CLTS via SVM_EXIT_WRITE_CR0 to L1 · 62baf44c
      Jan Kiszka authored
      CLTS only changes TS which is not monitored by selected CR0
      interception. So skip any attempt to translate WRITE_CR0 to
      CR0_SEL_WRITE for this instruction.
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 30 Jun, 2014 9 commits