1. 24 Sep, 2014 1 commit
  2. 17 Sep, 2014 6 commits
  3. 16 Sep, 2014 2 commits
    • kvm: ioapic: conditionally delay irq delivery during eoi broadcast · 184564ef
      Zhang Haoyu authored
      Currently, we call ioapic_service() immediately when we find the irq is still
      active during eoi broadcast. But for real hardware, there's some delay between
      the EOI write and irq delivery.  If we do not emulate this behavior, and
      re-inject the interrupt immediately after the guest sends an EOI and re-enables
      interrupts, a guest might spend all its time in the ISR if it has a broken
      handler for a level-triggered interrupt.
      
      Such livelock actually happens with Windows guests when resuming from
      hibernation.
      
      As there's no way to distinguish a broken handler from newly raised
      interrupts, this patch delays an interrupt once 10,000 consecutive EOIs have
      found it still high.  The guest can then make a little forward progress, until
      a proper IRQ handler is set or until some detection routine in the guest (such
      as Linux's note_interrupt()) recognizes the situation.
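
      A minimal sketch of the mechanism (identifiers modeled on the ioapic
      emulation code, but illustrative rather than the exact upstream patch):
      count back-to-back EOIs that find the line still asserted, and past a
      threshold defer the re-injection through delayed work instead of
      calling ioapic_service() right away.

        #define SUCCESSIVE_IRQ_MAX_COUNT 10000  /* EOIs before backing off */

        static void eoi_irq_still_active(struct kvm_ioapic *ioapic, int irq)
        {
                if (++ioapic->irq_eoi[irq] >= SUCCESSIVE_IRQ_MAX_COUNT) {
                        /*
                         * Likely a broken guest handler: schedule the
                         * re-injection so the guest can make some forward
                         * progress in the meantime.
                         */
                        schedule_delayed_work(&ioapic->eoi_inject, HZ / 100);
                        ioapic->irq_eoi[irq] = 0;
                } else {
                        ioapic_service(ioapic, irq);  /* old immediate path */
                }
        }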
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Zhang Haoyu <zhanghy@sangfor.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Use kvm_make_request when applicable · 105b21bb
      Guo Hui Liu authored
      This patch replaces the set_bit method with kvm_make_request
      to make the code more readable and consistent.
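
      For illustration, the change is essentially of this shape (the request
      bit is picked arbitrarily):

        /* before */
        set_bit(KVM_REQ_UNHALT, &vcpu->requests);

        /* after */
        kvm_make_request(KVM_REQ_UNHALT, vcpu);
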
      Signed-off-by: Guo Hui Liu <liuguohui@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 11 Sep, 2014 3 commits
  5. 10 Sep, 2014 9 commits
  6. 05 Sep, 2014 5 commits
  7. 03 Sep, 2014 6 commits
    • KVM: nSVM: propagate the NPF EXITINFO to the guest · 5e352519
      Paolo Bonzini authored
      This is similar to what the EPT code does with the exit qualification.
      This allows the guest to see a valid value for bits 33:32.
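
      A hedged sketch of the resulting nested #NPF injection (the VMCB
      field names are real; the exact composition of EXITINFO1 here is an
      assumption, not verified upstream code):

        /* Reflect a nested page fault to L1 with meaningful exit info:
         * EXITINFO1 carries the page-fault error code (so bits 33:32 are
         * valid), EXITINFO2 the faulting guest-physical address. */
        svm->vmcb->control.exit_code   = SVM_EXIT_NPF;
        svm->vmcb->control.exit_info_1 = fault->error_code;
        svm->vmcb->control.exit_info_2 = fault->address;
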
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD · a0c0feb5
      Paolo Bonzini authored
      Bit 8 would be the "global" bit, which does not quite make sense for non-leaf
      page table entries.  Intel ignores it; AMD ignores it in PDEs, but reserves it
      in PDPEs and PML4Es.  The SVM test is relying on this behavior, so enforce it.
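
      A hedged sketch of the enforcement, reusing rsvd_bits() from the MMU
      code (the mask-table indices are illustrative):

        /* Bit 8 (the "global" bit in leaf entries) is reserved in non-leaf
         * PDPEs and PML4Es on AMD; Intel simply ignores it. */
        u64 nonleaf_rsvd = guest_cpuid_is_amd(vcpu) ? rsvd_bits(8, 8) : 0;

        context->rsvd_bits_mask[0][3] |= nonleaf_rsvd;  /* PML4E */
        context->rsvd_bits_mask[0][2] |= nonleaf_rsvd;  /* PDPE  */
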
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: mmio: cleanup kvm_set_mmio_spte_mask · d1431483
      Tiejun Chen authored
      Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask()
      for slightly better code.
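
      Roughly, the cleanup has this shape (the exact bit range is quoted
      from memory, so treat it as illustrative):

        /* before: open-coded mask of the reserved physical address bits */
        mask = ((1ull << (51 - maxphyaddr + 1)) - 1) << maxphyaddr;

        /* after: the same mask via the existing helper */
        mask = rsvd_bits(maxphyaddr, 51);
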
      Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: fix stale mmio cache bug · 56f17dd3
      David Matlack authored
      The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
      up to userspace:
      
      (1) Guest accesses gpa X without a memory slot. The gfn is cached in
      struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
      the SPTE write-execute-noread so that future accesses cause
      EPT_MISCONFIGs.
      
      (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
      covering the page just accessed.
      
      (3) Guest attempts to read or write to gpa X again. On Intel, this
      generates an EPT_MISCONFIG. The memory slot generation number that
      was incremented in (2) would normally take care of this but we fast
      path mmio faults through quickly_check_mmio_pf(), which only checks
      the per-vcpu mmio cache. Since we hit the cache, KVM passes a
      KVM_EXIT_MMIO up to userspace.
      
      This patch fixes the issue by using the memslot generation number
      to validate the mmio cache.
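
      A minimal sketch of the validity check (close to, though not
      necessarily identical with, the actual helper):

        /* The cache is only usable if it was filled during the current
         * memslot generation; any memslot update invalidates it. */
        static bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
        {
                return vcpu->arch.mmio_gen == kvm_memslots(vcpu->kvm)->generation;
        }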
      
      Cc: stable@vger.kernel.org
      Signed-off-by: David Matlack <dmatlack@google.com>
      [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Tested-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: fix potentially corrupt mmio cache · ee3d1570
      David Matlack authored
      vcpu exits and memslot mutations can run concurrently as long as the
      vcpu does not acquire the slots mutex. Thus it is theoretically possible
      for memslots to change underneath a vcpu that is handling an exit.
      
      If we increment the memslot generation number again after
      synchronize_srcu_expedited(), vcpus can safely cache the memslot
      generation without maintaining a single rcu_dereference through an
      entire vm exit. This matters because much of the x86/kvm code does not
      maintain a single rcu_dereference of the current memslots during each
      exit.
      
      We can prevent the following case:
      
         vcpu (CPU 0)                             | thread (CPU 1)
      --------------------------------------------+--------------------------
      1  vm exit                                  |
      2  srcu_read_unlock(&kvm->srcu)             |
      3  decide to cache something based on       |
           old memslots                           |
      4                                           | change memslots
                                                  | (increments generation)
      5                                           | synchronize_srcu(&kvm->srcu);
      6  retrieve generation # from new memslots  |
      7  tag cache with new memslot generation    |
      8  srcu_read_unlock(&kvm->srcu)             |
      ...                                         |
         <action based on cache occurs even       |
          though the caching decision was based   |
          on the old memslots>                    |
      ...                                         |
         <action *continues* to occur until next  |
          memslot generation change, which may    |
          be never>                               |
                                                  |
      
      By incrementing the generation after synchronizing with kvm->srcu readers,
      we ensure that the generation retrieved in (6) will become invalid soon
      after (8).
      
      Keeping the existing increment is not strictly necessary, but we
      do keep it and just move it for consistency from update_memslots to
      install_new_memslots.  It invalidates old cached MMIOs immediately,
      instead of having to wait for the end of synchronize_srcu_expedited,
      which makes the code more clearly correct in case CPU 1 is preempted
      right after synchronize_srcu() returns.
      
      To avoid halving the generation space in SPTEs, always presume that the
      low bit of the generation is zero when reconstructing a generation number
      out of an SPTE.  This effectively disables MMIO caching in SPTEs during
      the call to synchronize_srcu_expedited.  Using the low bit this way is
      somewhat like a seqcount: the protected thing is a cache, and instead of
      retrying we can simply punt if we observe the low bit to be 1.
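
      In sketch form, the update sequence in install_new_memslots becomes
      (simplified, with error handling and the rest of the function omitted):

        /* Odd generation: an update is in flight, and MMIO caching in
         * SPTEs is effectively off because reconstruction presumes an
         * even value. */
        slots->generation = old_memslots->generation + 1;

        rcu_assign_pointer(kvm->memslots, slots);
        synchronize_srcu_expedited(&kvm->srcu);

        /* Even again: no cache entry tagged during the update window can
         * match a generation retrieved from here on. */
        slots->generation++;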
      
      Cc: stable@vger.kernel.org
      Signed-off-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Reviewed-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: do not bias the generation number in kvm_current_mmio_generation · 00f034a1
      Paolo Bonzini authored
      The next patch will give a meaning (a la seqcount) to the low bit of the
      generation number.  Ensure that it matches between kvm->memslots->generation
      and kvm_current_mmio_generation().
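
      A hedged sketch of the resulting helper (the mask name follows mmu.c;
      details are illustrative):

        static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
        {
                /* No +1 bias: keep the low bit in step with
                 * kvm->memslots->generation so the next patch can give
                 * it a meaning. */
                return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
        }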
      
      Cc: stable@vger.kernel.org
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 29 Aug, 2014 8 commits
    • KVM: x86: use guest maxphyaddr to check MTRR values · fd275235
      Paolo Bonzini authored
      The check introduced in commit d7a2a246 (KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRs, 2014-08-19)
      will break if the guest maxphyaddr is higher than the host's (which
      sometimes happens depending on your hardware and how QEMU is
      configured).
      
      To fix this, use cpuid_maxphyaddr, as the handling of the APIC_BASE
      MSR already does.
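
      A hedged sketch of the check (both helpers exist in the KVM tree; the
      exact placement and mask are illustrative):

        /* Writes to the variable-range MTRR PhysBase/PhysMask MSRs must
         * not set bits above the guest's physical address width. */
        if (data & rsvd_bits(cpuid_maxphyaddr(vcpu), 63))
                return false;  /* caller injects #GP */
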
      Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
      Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: remove garbage arg to *hardware_{en,dis}able · 13a34e06
      Radim Krčmář authored
      In the beginning was on_each_cpu(), which required an unused argument to
      kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.
      
      Remove unnecessary arguments that stem from this.
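
      The interface change is then simply (sketched):

        /* before */
        int  kvm_arch_hardware_enable(void *garbage);
        void kvm_arch_hardware_disable(void *garbage);

        /* after */
        int  kvm_arch_hardware_enable(void);
        void kvm_arch_hardware_disable(void);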
      
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: static inline empty kvm_arch functions · 0865e636
      Radim Krčmář authored
      Using static inline saves a few bytes and cycles. For example, on
      powerpc the difference is 700 B after stripping (5 kB before
      stripping).
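
      The pattern, sketched with one of the functions named below:

        /* before: every arch had to provide a definition, often empty */
        void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
        {
        }

        /* after: a static inline stub in a header, compiled away */
        static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}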
      
      This patch also deals with two overlooked empty functions:
      kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
        2df72e9b KVM: split kvm_arch_flush_shadow
      and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
        e790d9ef KVM: add kvm_arch_sched_in
      
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: forward declare structs in kvm_types.h · 65647300
      Paolo Bonzini authored
      Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
      "'struct foo' declared inside parameter list" warnings (and consequent
      breakage due to conflicting types).
      
      Move them from individual files to a generic place in linux/kvm_types.h.
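
      The declarations themselves are just opaque struct names, e.g.:

        /* include/linux/kvm_types.h */
        struct kvm;
        struct kvm_interrupt;
        struct kvm_memory_slot;
        struct kvm_vcpu;
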
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: remove Aligned bit from movntps/movntpd · d5b77069
      Paolo Bonzini authored
      These are not explicitly aligned, and do not require alignment on AVX.
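
      A hedged sketch of the decode-table change (the gprefix entry name is
      an assumption):

        /* 0F 2B: movntps/movntpd no longer demand an aligned operand */
        /* before */
        GP(ModRM | DstMem | SrcReg | Mov | Sse | Aligned, &pfx_0f_2b),
        /* after */
        GP(ModRM | DstMem | SrcReg | Mov | Sse, &pfx_0f_2b),
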
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86 emulator: emulate MOVNTDQ · 0a37027e
      Alex Williamson authored
      Windows 8.1 guest with NVIDIA driver and GPU fails to boot with an
      emulation failure.  The KVM spew suggests the fault is with lack of
      movntdq emulation (courtesy of Paolo):
      
      Code=02 00 00 b8 08 00 00 00 f3 0f 6f 44 0a f0 f3 0f 6f 4c 0a e0 <66> 0f e7 41 f0 66 0f e7 49 e0 48 83 e9 40 f3 0f 6f 44 0a 10 f3 0f 6f 0c 0a 66 0f e7 41 10
      
      $ as -o a.out
              .section .text
              .byte 0x66, 0x0f, 0xe7, 0x41, 0xf0
              .byte 0x66, 0x0f, 0xe7, 0x49, 0xe0
      $ objdump -d a.out
          0:  66 0f e7 41 f0          movntdq %xmm0,-0x10(%rcx)
          5:  66 0f e7 49 e0          movntdq %xmm1,-0x20(%rcx)
      
      Add the necessary emulation.
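
      A hedged sketch of the wiring in the emulator's two-byte opcode table
      (the entry shape follows emulate.c conventions; treat the names as
      illustrative):

        /* movntdq only exists with a 66 prefix, so route opcode 0F E7
         * through a gprefix table and reuse the existing em_mov handler. */
        static const struct gprefix pfx_0f_e7 = {
                N, I(Sse, em_mov), N, N,
        };

        /* ... and in the 0xE0-0xEF row of the two-byte table: */
        GP(SrcReg | DstMem | ModRM | Mov, &pfx_0f_e7),
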
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: vmx: VMXOFF emulation in vm86 should cause #UD · 0f54a321
      Nadav Amit authored
      Unlike VMCALL, the instructions VMXOFF, VMLAUNCH and VMRESUME should
      cause a #UD exception in real mode or vm86.  However, the emulator
      treats all these instructions the same for the purposes of mode checks
      and of emulation upon exit due to a #UD exception.
      
      As a result, the hypervisor behaves incorrectly in vm86 mode: VMXOFF,
      VMLAUNCH or VMRESUME cause a vm86 exit due to #UD, and the hypervisor
      then emulates the instruction and injects #GP into the guest instead
      of #UD.
      
      This patch creates a new group for these instructions and marks only
      VMCALL as an instruction which can be emulated.
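
      A hedged sketch of the new group (names follow emulate.c conventions;
      the exact entries are illustrative): only the VMCALL slot is marked
      EmulateOnUD, so VMLAUNCH, VMRESUME and VMXOFF correctly surface #UD
      instead of being emulated.

        /* 0F 01 /0, distinguished by the ModRM rm field */
        static const struct opcode group7_rm0[] = {
                N,
                I(SrcNone | Priv | EmulateOnUD, em_vmcall),
                N, N, N, N, N, N,  /* VMLAUNCH, VMRESUME, VMXOFF, ... */
        };
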
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix some sparse warnings · 48d89b92
      Paolo Bonzini authored
      Sparse reports the following easily fixed warnings:
      
         arch/x86/kvm/vmx.c:8795:48: sparse: Using plain integer as NULL pointer
         arch/x86/kvm/vmx.c:2138:5: sparse: symbol vmx_read_l1_tsc was not declared. Should it be static?
         arch/x86/kvm/vmx.c:6151:48: sparse: Using plain integer as NULL pointer
         arch/x86/kvm/vmx.c:8851:6: sparse: symbol vmx_sched_in was not declared. Should it be static?
      
         arch/x86/kvm/svm.c:2162:5: sparse: symbol svm_read_l1_tsc was not declared. Should it be static?
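
      The fixes are mechanical; in diff-style sketch form:

        /* "Using plain integer as NULL pointer" */
        -        return 0;
        +        return NULL;

        /* "symbol 'vmx_read_l1_tsc' was not declared. Should it be static?" */
        -u64 vmx_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
        +static u64 vmx_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)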
      
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>