1. 09 Jul, 2012 11 commits
    • KVM: x86 emulator: initialize memop · cbd27ee7
      Avi Kivity authored
      memop is not initialized; this can cause a two-byte operation
      following a four-byte operation to see garbage values.  Usually
      truncation fixes things for us later on, but in at least one case
      (call abs) it doesn't.
      
      Fix by moving memop to the auto-initialized field area.
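      The fix depends on the emulator clearing a contiguous run of per-instruction fields before it decodes each instruction. A minimal sketch of that pattern follows; `struct decode_ctx`, its fields and `init_decode()` are hypothetical stand-ins, not the emulator's real structures. Everything from the first per-instruction field to the end of the struct is zeroed with `offsetof()` and `memset()`, so a field placed in that area cannot carry a stale value into the next instruction.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical, cut-down decode context (not the real emulator struct). */
      struct operand {
          unsigned int bytes;
          unsigned long val;
      };

      struct decode_ctx {
          int mode;               /* persists across instructions */
          /* Fields from here to the end are cleared before each decode. */
          unsigned char opcode;
          unsigned int op_bytes;
          struct operand memop;   /* lives in the cleared area */
      };

      /* Zero the per-instruction fields; leave the persistent ones alone. */
      static void init_decode(struct decode_ctx *ctxt)
      {
          size_t start = offsetof(struct decode_ctx, opcode);

          memset((char *)ctxt + start, 0, sizeof(*ctxt) - start);
      }
      ```

      A field declared before `opcode` would keep its value from one instruction to the next, which is how an uninitialized memop could leak a previous four-byte value.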
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: emulate LEAVE · f47cfa31
      Avi Kivity authored
      Opcode c9; used by some variants of Windows during boot, in big real mode.
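      For reference, LEAVE tears down a stack frame: it copies BP into SP and then pops the saved BP. A toy sketch of the 16-bit form in real mode follows; `struct cpu16` and the helpers are hypothetical, and 32-bit operand and stack sizes are not modeled.

      ```c
      #include <stdint.h>

      /* Toy real-mode CPU state (hypothetical; not KVM's structures). */
      struct cpu16 {
          uint16_t ss, sp, bp;
          uint8_t mem[1u << 20];  /* 1 MiB real-mode address space */
      };

      static uint8_t read8(const struct cpu16 *cpu, uint32_t addr)
      {
          return cpu->mem[addr & 0xFFFFF];  /* wrap at 1 MiB */
      }

      /* Pop a 16-bit little-endian value from SS:SP. */
      static uint16_t pop16(struct cpu16 *cpu)
      {
          uint32_t addr = ((uint32_t)cpu->ss << 4) + cpu->sp;
          uint16_t val = (uint16_t)(read8(cpu, addr) | (read8(cpu, addr + 1) << 8));

          cpu->sp += 2;
          return val;
      }

      /* LEAVE (opcode C9), 16-bit form: SP <- BP, then BP <- pop(). */
      static void emulate_leave16(struct cpu16 *cpu)
      {
          cpu->sp = cpu->bp;
          cpu->bp = pop16(cpu);
      }
      ```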
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: VMX: Limit iterations with emulator_invalid_guest_state · b8405c18
      Avi Kivity authored
      Otherwise, if the guest ends up looping, we never exit the srcu critical
      section, which causes synchronize_srcu() to hang.
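      A sketch of a bounded emulation loop, assuming a toy vCPU whose state becomes valid after a known number of emulated instructions (or never, if the guest loops). `MAX_EMULATION_ITERATIONS`, `struct toy_vcpu` and the helpers are hypothetical, and the SRCU unlock is only described in a comment. The point is that the loop always returns to its caller after a fixed budget, which gives the caller a chance to leave the SRCU read-side critical section.

      ```c
      #include <stdbool.h>

      #define MAX_EMULATION_ITERATIONS 1024  /* hypothetical budget */

      struct toy_vcpu {
          int insns_until_valid;  /* -1: the guest loops forever */
          long emulated;
      };

      enum emu_result { EMU_STATE_VALID, EMU_FAILED, EMU_BUDGET_EXHAUSTED };

      static bool guest_state_valid(const struct toy_vcpu *v)
      {
          return v->insns_until_valid == 0;
      }

      static bool emulate_one(struct toy_vcpu *v)
      {
          v->emulated++;
          if (v->insns_until_valid > 0)
              v->insns_until_valid--;
          return true;  /* this toy never fails to emulate */
      }

      /*
       * Emulate until hardware can run the guest again, emulation fails, or
       * the budget runs out. On EMU_BUDGET_EXHAUSTED the caller drops back
       * to its outer loop, where it can leave the SRCU critical section,
       * handle signals and reschedule before re-entering.
       */
      static enum emu_result handle_invalid_guest_state(struct toy_vcpu *v)
      {
          for (int i = 0; i < MAX_EMULATION_ITERATIONS; i++) {
              if (guest_state_valid(v))
                  return EMU_STATE_VALID;
              if (!emulate_one(v))
                  return EMU_FAILED;
          }
          return EMU_BUDGET_EXHAUSTED;
      }
      ```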
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: VMX: Relax check on unusable segment · f0495f9b
      Avi Kivity authored
      Some userspace (e.g. QEMU 1.1) munges the d and g bits of segment
      descriptors, causing us not to recognize them as unusable segments
      with emulate_invalid_guest_state=1.  Relax the check by testing for
      segment not present (a non-present segment cannot be usable).
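      A sketch of the relaxed test, using the VMX segment access-rights layout from the Intel SDM (P is bit 7, D/B bit 14, G bit 15, and the "unusable" flag bit 16). The helper name is hypothetical and this is not the patch's code: a segment counts as unusable if the unusable flag is set or the segment is not present, regardless of what userspace did to D/B and G.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* VMX guest segment access-rights bits (Intel SDM). */
      #define AR_P_MASK         (1u << 7)   /* segment present */
      #define AR_DB_MASK        (1u << 14)  /* default operation size */
      #define AR_G_MASK         (1u << 15)  /* granularity */
      #define AR_UNUSABLE_MASK  (1u << 16)  /* segment unusable */

      /* Hypothetical helper: the D/B and G bits are deliberately ignored. */
      static bool segment_unusable(uint32_t ar)
      {
          /* A non-present segment cannot be usable. */
          return (ar & AR_UNUSABLE_MASK) || !(ar & AR_P_MASK);
      }
      ```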
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: fix LIDT/LGDT in long mode · 510425ff
      Avi Kivity authored
      The operand size for these instructions is 8 bytes in long mode, even without
      a REX prefix.  Set it explicitly.
      
      Triggered while booting Linux with emulate_invalid_guest_state=1.
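      LGDT and LIDT read a pseudo-descriptor from memory: a 16-bit limit followed by the base. Per the Intel SDM, a 16-bit operand size loads only 24 bits of base, a 32-bit operand size loads 32 bits, and in 64-bit mode the base is always 64 bits. A sketch of that decoding with a hypothetical `read_desc_ptr()` helper (not the emulator's code), where forcing `op_bytes = 8` corresponds to the fix:

      ```c
      #include <stdint.h>

      struct desc_ptr {
          uint16_t limit;
          uint64_t base;
      };

      /* Decode the limit and base of an LGDT/LIDT memory operand. */
      static struct desc_ptr read_desc_ptr(const uint8_t *mem, int mode64, int op_bytes)
      {
          struct desc_ptr dt = { 0, 0 };
          int base_bytes;

          if (mode64)
              op_bytes = 8;  /* 64-bit base, with or without a REX prefix */
          base_bytes = (op_bytes == 2) ? 3 : op_bytes;  /* 16-bit: 24-bit base */

          dt.limit = (uint16_t)(mem[0] | (mem[1] << 8));
          for (int i = 0; i < base_bytes; i++)
              dt.base |= (uint64_t)mem[2 + i] << (8 * i);
          return dt;
      }
      ```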
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: allow loading null SS in long mode · 79d5b4c3
      Avi Kivity authored
      Null SS is valid in long mode; allow loading it.
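      A simplified sketch of the check, with hypothetical names: a selector is null when its index and TI bits are zero (the RPL bits are ignored), a null CS is always rejected, and a null SS is accepted only in long mode. The privilege checks the architecture also applies to SS are left out.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      enum seg_reg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

      /* Null selector: GDT index 0 with TI = 0; the RPL bits do not matter. */
      static bool selector_is_null(uint16_t sel)
      {
          return (sel & ~3u) == 0;
      }

      /*
       * Hypothetical check: may a null selector be loaded into seg?
       * Privilege checks on SS are omitted for brevity.
       */
      static bool null_selector_allowed(enum seg_reg seg, bool long_mode)
      {
          if (seg == SEG_CS)
              return false;
          if (seg == SEG_SS)
              return long_mode;  /* null SS is valid only in long mode */
          return true;           /* data segments may be null */
      }
      ```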
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: emulate cpuid · 6d6eede4
      Avi Kivity authored
      Opcode 0F A2.
      
      Used by Linux during the mode change trampoline while in a state that is
      not virtualizable on vmx without unrestricted_guest, so we need to emulate
      it if emulate_invalid_guest_state=1.
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86 emulator: change ->get_cpuid() accessor to use the x86 semantics · 0017f93a
      Avi Kivity authored
      Instead of requiring an exact leaf match, follow the spec and fall back to
      the last main leaf when the requested leaf is out of range.  This lets us
      easily emulate the cpuid instruction in the emulator.
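      A sketch of the lookup semantics the Intel SDM describes: a leaf above the maximum of its range (basic leaves are bounded by leaf 0's EAX, extended ones by leaf 0x80000000's EAX) returns the data of the highest basic leaf. The table layout and `cpuid_lookup()` are hypothetical, not KVM's code, and subleaves are ignored.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      struct cpuid_entry {
          uint32_t function, eax, ebx, ecx, edx;
      };

      static const struct cpuid_entry *
      find_exact(const struct cpuid_entry *tbl, size_t n, uint32_t function)
      {
          for (size_t i = 0; i < n; i++)
              if (tbl[i].function == function)
                  return &tbl[i];
          return NULL;
      }

      /*
       * Hypothetical lookup with x86 semantics: leaf 0 (or 0x80000000 for the
       * extended range) reports the range maximum in EAX; a leaf above it
       * returns the data of the highest basic leaf instead.
       */
      static const struct cpuid_entry *
      cpuid_lookup(const struct cpuid_entry *tbl, size_t n, uint32_t function)
      {
          const struct cpuid_entry *max = find_exact(tbl, n, function & 0x80000000u);

          if (max && function > max->eax) {
              const struct cpuid_entry *basic = find_exact(tbl, n, 0);

              if (basic)
                  function = basic->eax;  /* fall back to the last basic leaf */
          }
          return find_exact(tbl, n, function);  /* NULL: leaf not present */
      }
      ```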
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Split cpuid register access from computation · 62046e5a
      Avi Kivity authored
      Introduce kvm_cpuid() to perform the leaf limit check and calculate
      register values, and let kvm_emulate_cpuid() just handle reading and
      writing the registers from/to the vcpu.  This allows us to reuse
      kvm_cpuid() in a context where directly reading and writing registers
      is not desired.
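      The split can be sketched as a pure function that maps a leaf to four values, plus a thin wrapper that moves them in and out of a register file. Both function names, the toy leaf table and `struct regs` are hypothetical; an emulator-style caller would use the pure function directly with its own register copies.

      ```c
      #include <stdint.h>

      /* Hypothetical register file (not KVM's vcpu layout). */
      struct regs {
          uint64_t rax, rbx, rcx, rdx;
      };

      /*
       * Pure computation: leaf in *eax (subleaf in *ecx, ignored here),
       * results out. It touches no vCPU state.
       */
      static void cpuid_compute(uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
      {
          static const uint32_t leaves[2][4] = {
              { 1, 0x756e6547, 0x6c65746e, 0x49656e69 }, /* leaf 0: max 1, "GenuineIntel" */
              { 0x000306a9, 0, 0, 0 },                   /* leaf 1: toy values */
          };
          uint32_t leaf = *eax;

          if (leaf > leaves[0][0])
              leaf = leaves[0][0];  /* out of range: last basic leaf */
          *eax = leaves[leaf][0];
          *ebx = leaves[leaf][1];
          *ecx = leaves[leaf][2];
          *edx = leaves[leaf][3];
      }

      /* Register plumbing only: read inputs, compute, write results back. */
      static void emulate_cpuid(struct regs *r)
      {
          uint32_t eax = (uint32_t)r->rax, ebx = 0, ecx = (uint32_t)r->rcx, edx = 0;

          cpuid_compute(&eax, &ebx, &ecx, &edx);
          r->rax = eax;  /* 32-bit results are zero-extended */
          r->rbx = ebx;
          r->rcx = ecx;
          r->rdx = edx;
      }
      ```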
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: VMX: Return correct CPL during transition to protected mode · d881e6f6
      Avi Kivity authored
      In protected mode, the CPL is defined as the lower two bits of CS, as set by
      the last far jump.  But during the transition to protected mode, there is no
      last far jump, so we need to return zero (the inherited real mode CPL).
      
      Fix by reading CPL from the cache during the transition.  This isn't 100%
      correct since we don't set the CPL cache on a far jump, but since protected
      mode transition will always jump to a segment with RPL=0, it will always
      work.
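      A sketch of the decision, assuming a hypothetical vCPU view that records whether the guest is between setting CR0.PE and its first far jump. The names are illustrative, not VMX code, and virtual-8086 mode is not modeled. During the transition the CS selector still holds a real-mode value whose low bits say nothing about privilege, so the cached CPL (zero) is returned instead.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical vCPU view; field names are illustrative only. */
      struct toy_vcpu {
          bool cr0_pe;           /* CR0.PE set: protected mode enabled */
          bool in_transition;    /* PE set, but no far jump has reloaded CS yet */
          uint16_t cs_selector;  /* may still hold a real-mode value */
          uint8_t cached_cpl;    /* CPL carried over from real mode (0) */
      };

      static uint8_t toy_get_cpl(const struct toy_vcpu *v)
      {
          if (!v->cr0_pe)
              return 0;                          /* real mode */
          if (v->in_transition)
              return v->cached_cpl;              /* CS bits not meaningful yet */
          return (uint8_t)(v->cs_selector & 3);  /* low two bits of CS */
      }
      ```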
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation · e676505a
      Avi Kivity authored
      Currently the MMU's ->new_cr3() callback does nothing when guest paging
      is disabled or when two-dimensional paging (e.g. EPT on Intel) is active.
      This means that an emulated write to cr3 can be lost; kvm_set_cr3() will
      write vcpu->arch.cr3, but the GUEST_CR3 field in the VMCS will retain its
      old value and this is what the guest sees.
      
      This bug did not have any effect until now because:
      - with unrestricted guest, or with svm, we never emulate a mov cr3 instruction
      - without unrestricted guest, and with paging enabled, we also never emulate a
        mov cr3 instruction
      - without unrestricted guest, but with paging disabled, the guest's cr3 is
        ignored until the guest enables paging; at this point the value from arch.cr3
        is loaded correctly by the mov cr0 instruction which turns on paging
      
      However, the patchset that enables big real mode causes us to emulate mov cr3
      instructions in protected mode sometimes (when guest state is not virtualizable
      by vmx); this mov cr3 is effectively ignored and will crash the guest.
      
      The fix is to make nonpaging_new_cr3() call mmu_free_roots() to force a cr3
      reload.  This is awkward because now all the new_cr3 callbacks do the same
      thing, and because mmu_free_roots() is somewhat overkill; but fixing
      that is more complicated and will be done after this minimal fix.
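      A toy model of the bug and the fix, with hypothetical names: `arch_cr3` plays the role of vcpu->arch.cr3, `vmcs_guest_cr3` the GUEST_CR3 field, and `root_valid` the MMU root. With a no-op callback the emulated write never reaches the VMCS; invalidating the root forces a reload on the next entry.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      struct toy_mmu_vcpu {
          uint64_t arch_cr3;        /* KVM's copy of the guest cr3 */
          uint64_t vmcs_guest_cr3;  /* what the guest actually runs with */
          bool root_valid;          /* MMU root (and loaded cr3) up to date */
      };

      /* Old behaviour with TDP or guest paging off: nothing happens. */
      static void new_cr3_noop(struct toy_mmu_vcpu *v) { (void)v; }

      /* New behaviour: drop the roots so the next entry reloads them. */
      static void new_cr3_free_roots(struct toy_mmu_vcpu *v) { v->root_valid = false; }

      /* Emulated "mov cr3": update KVM's copy, then notify the MMU. */
      static void emulate_mov_cr3(struct toy_mmu_vcpu *v, uint64_t val,
                                  void (*new_cr3)(struct toy_mmu_vcpu *))
      {
          v->arch_cr3 = val;
          new_cr3(v);
      }

      /* Before VM entry: reload GUEST_CR3 only if the root was invalidated. */
      static void prepare_entry(struct toy_mmu_vcpu *v)
      {
          if (!v->root_valid) {
              v->vmcs_guest_cr3 = v->arch_cr3;
              v->root_valid = true;
          }
      }
      ```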
      
      Observed in the Windows XP 32-bit installer while bringing up secondary vcpus.
      Signed-off-by: Avi Kivity <avi@redhat.com>
  2. 06 Jul, 2012 1 commit
  3. 03 Jul, 2012 9 commits
  4. 25 Jun, 2012 8 commits
    • KVM: host side for eoi optimization · ae7a2a3f
      Michael S. Tsirkin authored
      Implementation of PV EOI using shared memory.
      This reduces the number of exits an interrupt
      causes by as much as half.
      
      The idea is simple: there's a bit, per APIC, in guest memory,
      that tells the guest that it does not need EOI.
      We set it before injecting an interrupt and clear it
      before injecting a nested one. The guest tests it using
      a test-and-clear operation (this is necessary
      so that the host can detect interrupt nesting)
      and, if it was set, can skip the EOI MSR write.
      
      There's a new MSR to set the address of said register
      in guest memory. Otherwise not much changed:
      - Guest EOI is not required
      - Register is tested & ISR is automatically cleared on exit
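      A single-threaded model of the handshake may help. `struct pv_eoi_model` and the three functions are hypothetical; guest/host concurrency and the MSR that registers the bit's address are not modeled, and an EOI MSR write is simply counted as an exit. The host sets the bit only when no other vector is in service, and on each exit treats a bit it had set, now found clear, as an EOI.

      ```c
      #include <stdbool.h>

      struct pv_eoi_model {
          bool shared_bit;     /* the per-APIC bit in guest memory */
          bool host_pending;   /* host set the bit and has not seen it cleared */
          int isr_count;       /* vectors currently in service */
          int eoi_msr_exits;   /* guest EOI writes that trapped */
      };

      /* Host, on every exit: a cleared bit that the host set means "EOI done". */
      static void host_sync(struct pv_eoi_model *m)
      {
          if (m->host_pending && !m->shared_bit) {
              m->host_pending = false;
              m->isr_count--;  /* clear the ISR without a guest EOI exit */
          }
      }

      /* Host: inject an interrupt, allowing PV EOI only when not nested. */
      static void host_inject(struct pv_eoi_model *m)
      {
          host_sync(m);
          m->shared_bit = (m->isr_count == 0);
          m->host_pending = m->shared_bit;
          m->isr_count++;
      }

      /* Guest: test-and-clear the bit; write the EOI MSR only if it was clear. */
      static void guest_eoi(struct pv_eoi_model *m)
      {
          bool was_set = m->shared_bit;

          m->shared_bit = false;
          if (!was_set) {
              m->eoi_msr_exits++;  /* trap: host clears the highest ISR bit */
              m->isr_count--;
          }
      }
      ```

      In the non-nested case the guest's EOI costs no exit at all; when a second interrupt nests on top of the first, both handlers fall back to real EOI writes.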
      
      For testing results see description of previous patch
      'kvm_para: guest side for eoi avoidance'.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: rearrange injection cancelling code · d905c069
      Michael S. Tsirkin authored
      Each time we need to cancel injection we invoke the same code
      (the cancel_injection callback).  Move it towards the end of the function
      using the familiar goto-on-error pattern.
      
      This will make further PV EOI cleanups easier.
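      The goto-on-error pattern can be sketched as below. `struct entry_ctx`, `toy_enter_guest()` and the failure flags are hypothetical stand-ins for the real entry path; what matters is that every early exit jumps to one label that cancels the injection, instead of each branch repeating the call.

      ```c
      #include <stdbool.h>

      /* Hypothetical entry context: flags force the failure paths. */
      struct entry_ctx {
          bool fail_reload;
          bool pending_request;
          int cancel_calls;
          int entries;
      };

      static void cancel_injection(struct entry_ctx *c)
      {
          c->cancel_calls++;  /* stand-in for the cancel_injection callback */
      }

      static int toy_enter_guest(struct entry_ctx *c)
      {
          int r;

          /* ... events have been injected by this point ... */
          if (c->fail_reload) {
              r = -1;
              goto out_cancel;
          }
          if (c->pending_request) {
              r = 1;
              goto out_cancel;
          }
          c->entries++;
          return 0;

      out_cancel:
          /* Single place that undoes the injection for every early exit. */
          cancel_injection(c);
          return r;
      }
      ```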
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: only sync when attention bits set · 5cfb1d5a
      Michael S. Tsirkin authored
      Commit eb0dc6d0368072236dcd086d7fdc17fd3c4574d4 introduced an apic
      attention bitmask, but kvm still syncs the lapic unconditionally.
      As that commit suggested, and in anticipation of adding more attention
      bits, only sync the lapic if apic_attention is nonzero.
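      The change amounts to an early return on an empty bitmask. A sketch with a hypothetical attention bit and struct (not the real lapic structures), where the per-bit work stands in for the lapic sync:

      ```c
      #define APIC_ATTN_VAPIC (1UL << 0)  /* hypothetical attention bit */

      struct toy_lapic {
          unsigned long apic_attention;  /* bitmask of pending sync work */
          int vapic_syncs;
      };

      /* Called on every exit; cheap when no attention bit is set. */
      static void sync_lapic_from_guest(struct toy_lapic *apic)
      {
          if (!apic->apic_attention)
              return;  /* common case: nothing to sync */

          if (apic->apic_attention & APIC_ATTN_VAPIC)
              apic->vapic_syncs++;  /* stand-in for reading the vAPIC page */
          /* ... further attention bits would be handled here ... */
      }
      ```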
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: eoi msi documentation · c1af87dc
      Michael S. Tsirkin authored
      Document the new EOI MSR. I couldn't decide whether this change belongs
      conceptually on the guest or the host side, so it is a separate patch.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • x86, bitops: note on __test_and_clear_bit atomicity · d0a69d63
      Michael S. Tsirkin authored
      __test_and_clear_bit is actually atomic with respect
      to the local CPU. Add a note saying that KVM on x86
      relies on this behaviour so people don't accidentally break it.
      Also warn not to rely on this in portable code.
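      On x86, __test_and_clear_bit() is implemented with a single BTR instruction without a LOCK prefix, so an interrupt on the same CPU cannot land between the test and the clear, although another CPU can still race with it. A sketch contrasting a plain C read-modify-write, which gives no such guarantee, with a portable C11 atomic version; both function names are hypothetical and neither is the kernel's bitops API.

      ```c
      #include <stdatomic.h>
      #include <stdbool.h>

      /* Plain read-modify-write: the load and the store are separate steps. */
      static bool test_and_clear_bit_plain(unsigned nr, unsigned long *word)
      {
          unsigned long mask = 1UL << nr;
          bool old = (*word & mask) != 0;

          *word &= ~mask;
          return old;
      }

      /* Portable atomic version: one indivisible fetch-and-AND. */
      static bool test_and_clear_bit_atomic(unsigned nr, _Atomic unsigned long *word)
      {
          unsigned long mask = 1UL << nr;

          return (atomic_fetch_and(word, ~mask) & mask) != 0;
      }
      ```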
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM guest: guest side for eoi avoidance · ab9cf499
      Michael S. Tsirkin authored
      The idea is simple: there's a bit, per APIC, in guest memory,
      that tells the guest that it does not need EOI.
      The guest tests it using a single test-and-clear operation (this is
      necessary so that the host can detect interrupt nesting) and, if it
      was set, can skip the EOI MSR write.
      
      I ran a simple microbenchmark to show the exit reduction
      (note: for testing, the follow-up patch 'kvm: host side for eoi
      optimization' and a QEMU patch I posted separately must be applied
      on the host):
      
      Before:
      
      Performance counter stats for 'sleep 1s':
      
                  47,357 kvm:kvm_entry                                                [99.98%]
                       0 kvm:kvm_hypercall                                            [99.98%]
                       0 kvm:kvm_hv_hypercall                                         [99.98%]
                   5,001 kvm:kvm_pio                                                  [99.98%]
                       0 kvm:kvm_cpuid                                                [99.98%]
                  22,124 kvm:kvm_apic                                                 [99.98%]
                  49,849 kvm:kvm_exit                                                 [99.98%]
                  21,115 kvm:kvm_inj_virq                                             [99.98%]
                       0 kvm:kvm_inj_exception                                        [99.98%]
                       0 kvm:kvm_page_fault                                           [99.98%]
                  22,937 kvm:kvm_msr                                                  [99.98%]
                       0 kvm:kvm_cr                                                   [99.98%]
                       0 kvm:kvm_pic_set_irq                                          [99.98%]
                       0 kvm:kvm_apic_ipi                                             [99.98%]
                  22,207 kvm:kvm_apic_accept_irq                                      [99.98%]
                  22,421 kvm:kvm_eoi                                                  [99.98%]
                       0 kvm:kvm_pv_eoi                                               [99.99%]
                       0 kvm:kvm_nested_vmrun                                         [99.99%]
                       0 kvm:kvm_nested_intercepts                                    [99.99%]
                       0 kvm:kvm_nested_vmexit                                        [99.99%]
                       0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                       0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                       0 kvm:kvm_invlpga                                              [99.99%]
                       0 kvm:kvm_skinit                                               [99.99%]
                      57 kvm:kvm_emulate_insn                                         [99.99%]
                       0 kvm:vcpu_match_mmio                                          [99.99%]
                       0 kvm:kvm_userspace_exit                                       [99.99%]
                       2 kvm:kvm_set_irq                                              [99.99%]
                       2 kvm:kvm_ioapic_set_irq                                       [99.99%]
                  23,609 kvm:kvm_msi_set_irq                                          [99.99%]
                       1 kvm:kvm_ack_irq                                              [99.99%]
                     131 kvm:kvm_mmio                                                 [99.99%]
                     226 kvm:kvm_fpu                                                  [100.00%]
                       0 kvm:kvm_age_page                                             [100.00%]
                       0 kvm:kvm_try_async_get_page                                    [100.00%]
                       0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                       0 kvm:kvm_async_pf_not_present                                    [100.00%]
                       0 kvm:kvm_async_pf_ready                                       [100.00%]
                       0 kvm:kvm_async_pf_completed
      
             1.002100578 seconds time elapsed
      
      After:
      
       Performance counter stats for 'sleep 1s':
      
                  28,354 kvm:kvm_entry                                                [99.98%]
                       0 kvm:kvm_hypercall                                            [99.98%]
                       0 kvm:kvm_hv_hypercall                                         [99.98%]
                   1,347 kvm:kvm_pio                                                  [99.98%]
                       0 kvm:kvm_cpuid                                                [99.98%]
                   1,931 kvm:kvm_apic                                                 [99.98%]
                  29,595 kvm:kvm_exit                                                 [99.98%]
                  24,884 kvm:kvm_inj_virq                                             [99.98%]
                       0 kvm:kvm_inj_exception                                        [99.98%]
                       0 kvm:kvm_page_fault                                           [99.98%]
                   1,986 kvm:kvm_msr                                                  [99.98%]
                       0 kvm:kvm_cr                                                   [99.98%]
                       0 kvm:kvm_pic_set_irq                                          [99.98%]
                       0 kvm:kvm_apic_ipi                                             [99.99%]
                  25,953 kvm:kvm_apic_accept_irq                                      [99.99%]
                  26,132 kvm:kvm_eoi                                                  [99.99%]
                  26,593 kvm:kvm_pv_eoi                                               [99.99%]
                       0 kvm:kvm_nested_vmrun                                         [99.99%]
                       0 kvm:kvm_nested_intercepts                                    [99.99%]
                       0 kvm:kvm_nested_vmexit                                        [99.99%]
                       0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                       0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                       0 kvm:kvm_invlpga                                              [99.99%]
                       0 kvm:kvm_skinit                                               [99.99%]
                     284 kvm:kvm_emulate_insn                                         [99.99%]
                      68 kvm:vcpu_match_mmio                                          [99.99%]
                      68 kvm:kvm_userspace_exit                                       [99.99%]
                       2 kvm:kvm_set_irq                                              [99.99%]
                       2 kvm:kvm_ioapic_set_irq                                       [99.99%]
                  28,288 kvm:kvm_msi_set_irq                                          [99.99%]
                       1 kvm:kvm_ack_irq                                              [99.99%]
                     131 kvm:kvm_mmio                                                 [100.00%]
                     588 kvm:kvm_fpu                                                  [100.00%]
                       0 kvm:kvm_age_page                                             [100.00%]
                       0 kvm:kvm_try_async_get_page                                    [100.00%]
                       0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                       0 kvm:kvm_async_pf_not_present                                    [100.00%]
                       0 kvm:kvm_async_pf_ready                                       [100.00%]
                       0 kvm:kvm_async_pf_completed
      
             1.002039622 seconds time elapsed
      
      We see that the number of exits drops from 49,849 to 29,595 (about 40%),
      largely because kvm_msr events fall from 22,937 to 1,986.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: optimize ISR lookups · 8680b94b
      Michael S. Tsirkin authored
      We perform ISR lookups twice: during interrupt
      injection and on EOI. Typical workloads only have
      a single bit set there. So we can avoid ISR scans by
      1. counting bits as we set/clear them in ISR
      2. on set, caching the injected vector number
      3. on clear, invalidating the cache
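      The three steps above can be sketched as follows. This illustrates the idea rather than reproducing the patch: `struct isr_state` and the function names are hypothetical, and the cached vector is assumed to be the highest in service because only the highest-priority pending vector is injected.

      ```c
      #include <stdint.h>

      struct isr_state {
          uint32_t isr[8];    /* 8 x 32 bits = 256 vectors */
          int isr_count;      /* step 1: number of bits set */
          int highest_cache;  /* step 2: last injected vector, or -1 */
      };

      static int isr_scan_highest(const struct isr_state *s)
      {
          for (int v = 255; v >= 0; v--)
              if (s->isr[v / 32] & (1u << (v % 32)))
                  return v;
          return -1;
      }

      static void isr_set(struct isr_state *s, int vec)
      {
          uint32_t mask = 1u << (vec % 32);

          if (!(s->isr[vec / 32] & mask)) {
              s->isr[vec / 32] |= mask;
              s->isr_count++;
          }
          s->highest_cache = vec;  /* injected vector is the highest in service */
      }

      static void isr_clear(struct isr_state *s, int vec)
      {
          uint32_t mask = 1u << (vec % 32);

          if (s->isr[vec / 32] & mask) {
              s->isr[vec / 32] &= ~mask;
              s->isr_count--;
          }
          s->highest_cache = -1;  /* step 3: invalidate */
      }

      static int isr_find_highest(const struct isr_state *s)
      {
          if (s->isr_count == 0)
              return -1;                /* nothing in service: no scan */
          if (s->highest_cache != -1)
              return s->highest_cache;  /* common case: no scan */
          return isr_scan_highest(s);   /* only after a nested EOI */
      }
      ```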
      
      The real purpose of this is enabling PV EOI
      which needs to quickly validate the vector.
      But non PV guests also benefit: with this patch,
      and without interrupt nesting, apic_find_highest_isr
      will always return immediately without scanning ISR.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: document lapic regs field · 5eadf916
      Michael S. Tsirkin authored
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  5. 19 Jun, 2012 1 commit
  6. 18 Jun, 2012 2 commits
  7. 13 Jun, 2012 4 commits
  8. 12 Jun, 2012 1 commit
  9. 06 Jun, 2012 3 commits
    • Merge branch 'for-upstream' of git://github.com/agraf/linux-2.6 into next · 25e531a9
      Avi Kivity authored
      Alex says:
      
      "Changes this time include:
      
        - Generalize KVM_GUEST support to overall ePAPR code
        - Fix reset for Book3S HV
        - Fix machine check deferral when CONFIG_KVM_GUEST=y
        - Add support for BookE register DECAR"
      
      * 'for-upstream' of git://github.com/agraf/linux-2.6:
        KVM: PPC: Not optimizing MSR_CE and MSR_ME with paravirt.
        KVM: PPC: booke: Added DECAR support
        KVM: PPC: Book3S HV: Make the guest hash table size configurable
        KVM: PPC: Factor out guest epapr initialization
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: disable uninitialized var warning · 79f702a6
      Michael S. Tsirkin authored
      I see this in 3.5-rc1:
      
      arch/x86/kvm/mmu.c: In function ‘kvm_test_age_rmapp’:
      arch/x86/kvm/mmu.c:1271: warning: ‘iter.desc’ may be used uninitialized in this function
      
      The line in question was introduced by commit
      1e3f42f0
      
       static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
                                    unsigned long data)
       {
      -       u64 *spte;
      +       u64 *sptep;
      +       struct rmap_iterator iter;   <- line 1271
              int young = 0;
      
              /*
      
      I think the reason is that the compiler assumes that
      the rmap value could be 0, in which case
      
      static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
      {
              if (!rmap)
                      return NULL;
      
              if (!(rmap & 1)) {
                      iter->desc = NULL;
                      return (u64 *)rmap;
              }
      
              iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
              iter->pos = 0;
              return iter->desc->sptes[iter->pos];
      }
      
      will not initialize iter.desc, but the compiler isn't
      smart enough to see that
      
              for (sptep = rmap_get_first(*rmapp, &iter); sptep;
                   sptep = rmap_get_next(&iter)) {
      
      will immediately exit in this case.
      I checked by adding
              if (!*rmapp)
                      goto out;
      at the top; this is clearly equivalent but silences the warning.
      
      This patch uses uninitialized_var to disable the warning without
      increasing code size.
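      The helpers quoted above rely on a tagged-pointer encoding of the rmap word: 0 means empty, a value with bit 0 clear is a single spte pointer, and a value with bit 0 set points to a descriptor. A simplified, self-contained sketch of that encoding and its iteration, including the early-return guard from the experiment above; `DESC_SPTES`, `rmap_count()` and the use of `uintptr_t` (the kernel uses `unsigned long` and `u64`) are assumptions for illustration.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      #define DESC_SPTES 3  /* hypothetical slots per descriptor */

      /* Simplified: real descriptors can also chain to further descriptors. */
      struct pte_list_desc {
          uint64_t *sptes[DESC_SPTES];  /* unused slots are NULL */
      };

      struct rmap_iterator {
          struct pte_list_desc *desc;  /* NULL when the rmap is a single spte */
          int pos;
      };

      static uint64_t *rmap_get_first(uintptr_t rmap, struct rmap_iterator *iter)
      {
          if (!rmap)
              return NULL;  /* iter is left uninitialized on this path */
          if (!(rmap & 1)) {
              iter->desc = NULL;
              return (uint64_t *)rmap;
          }
          iter->desc = (struct pte_list_desc *)(rmap & ~(uintptr_t)1);
          iter->pos = 0;
          return iter->desc->sptes[0];
      }

      static uint64_t *rmap_get_next(struct rmap_iterator *iter)
      {
          if (!iter->desc || ++iter->pos >= DESC_SPTES)
              return NULL;
          return iter->desc->sptes[iter->pos];
      }

      /* The early return mirrors the "if (!*rmapp) goto out;" experiment. */
      static int rmap_count(uintptr_t rmap)
      {
          struct rmap_iterator iter;
          uint64_t *sptep;
          int n = 0;

          if (!rmap)
              return 0;
          for (sptep = rmap_get_first(rmap, &iter); sptep; sptep = rmap_get_next(&iter))
              n++;
          return n;
      }
      ```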
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: Cleanup the kvm_print functions and introduce pr_XX wrappers · a737f256
      Christoffer Dall authored
      Introduces a couple of print functions, which are essentially wrappers
      around standard printk functions, with a KVM: prefix.
      
      Functions introduced or modified are:
       - kvm_err(fmt, ...)
       - kvm_info(fmt, ...)
       - kvm_debug(fmt, ...)
       - kvm_pr_unimpl(fmt, ...)
       - pr_unimpl(vcpu, fmt, ...) -> vcpu_unimpl(vcpu, fmt, ...)
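      A user-space analogue of such wrappers, sketched with hypothetical macro bodies: each macro pastes a level tag and a "KVM: " prefix onto the caller's format string at compile time (so `fmt` must be a string literal), then hands the result to a standard printf-family function instead of printk(). The level tags and the `kvm_last_msg` buffer are assumptions made for illustration, and `##__VA_ARGS__` is a GNU extension supported by GCC and Clang.

      ```c
      #include <stdio.h>

      /* Last formatted message, kept so callers (and tests) can inspect it. */
      static char kvm_last_msg[256];

      #define kvm_log(level, fmt, ...)                                \
          do {                                                        \
              snprintf(kvm_last_msg, sizeof(kvm_last_msg),            \
                       level "KVM: " fmt, ##__VA_ARGS__);             \
              fputs(kvm_last_msg, stderr);                            \
          } while (0)

      #define kvm_err(fmt, ...)   kvm_log("[err] ", fmt, ##__VA_ARGS__)
      #define kvm_info(fmt, ...)  kvm_log("[info] ", fmt, ##__VA_ARGS__)
      #define kvm_debug(fmt, ...) kvm_log("[debug] ", fmt, ##__VA_ARGS__)
      ```

      For example, `kvm_err("bad slot %d\n", 3)` formats "[err] KVM: bad slot 3" followed by a newline.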
      Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>