1. 01 Sep, 2017 1 commit
  2. 31 Aug, 2017 8 commits
  3. 29 Aug, 2017 1 commit
  4. 25 Aug, 2017 1 commit
  5. 24 Aug, 2017 13 commits
    • Wanpeng Li's avatar
      KVM: nVMX: Fix trying to cancel vmlauch/vmresume · bfcf83b1
      Wanpeng Li authored
      ------------[ cut here ]------------
      WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
      CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0-rc4+ #11
      RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
      Call Trace:
       ? kvm_multiple_exception+0x149/0x170 [kvm]
       ? handle_emulation_failure+0x79/0x230 [kvm]
       ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel]
       ? check_chain_key+0x137/0x1e0
       ? reexecute_instruction.part.168+0x130/0x130 [kvm]
       nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
       ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
       vmx_queue_exception+0x197/0x300 [kvm_intel]
       kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm]
       ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm]
       ? preempt_count_sub+0x18/0xc0
       ? restart_apic_timer+0x17d/0x300 [kvm]
       ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm]
       ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm]
       kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
       ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
       ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm]
      
      The flag "nested_run_pending", which can override the decision of which should run
      next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This
      is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to
      be run (and perhaps be injected with an event it specified, etc.). Nested_run_pending
      is especially intended to avoid switching  to L1 in the injection decision-point.
      
      This can be handled just like the other cases in vmx_check_nested_events, instead of
      having a special case in vmx_queue_exception.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bfcf83b1
    • Wanpeng Li's avatar
      KVM: X86: Fix loss of exception which has not yet been injected · 664f8e26
      Wanpeng Li authored
      vmx_complete_interrupts() assumes that the exception is always injected,
      so it can be dropped by kvm_clear_exception_queue().  However,
      an exception cannot be injected immediately if it is: 1) originally
      destined to a nested guest; 2) trapped to cause a vmexit; 3) happening
      right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true.
      
      This patch applies to exceptions the same algorithm that is used for
      NMIs, replacing exception.reinject with "exception.injected" (equivalent
      to nmi_injected).
      
      exception.pending now represents an exception that is queued and whose
      side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet.
      If exception.pending is true, the exception might result in a nested
      vmexit instead, too (in which case the side effects must not be applied).
      
      exception.injected instead represents an exception that is going to be
      injected into the guest at the next vmentry.
      Reported-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      664f8e26
    • Wanpeng Li's avatar
      KVM: VMX: use kvm_event_needs_reinjection · 274bba52
      Wanpeng Li authored
      Use kvm_event_needs_reinjection() encapsulation.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      274bba52
    • Paolo Bonzini's avatar
      KVM: MMU: speedup update_permission_bitmask · 09f037aa
      Paolo Bonzini authored
      update_permission_bitmask currently does a 128-iteration loop to,
      essentially, compute a constant array.  Computing the 8 bits in parallel
      reduces it to 16 iterations, and is enough to speed it up substantially
      because many boolean operations in the inner loop become constants or
      simplify noticeably.
      
      Because update_permission_bitmask is actually the top item in the profile
      for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
      clock cycles, or up to 30%:
      
                                               before     after
         cpuid                                 35173      25954
         vmcall                                35122      27079
         inl_from_pmtimer                      52635      42675
         inl_from_qemu                         53604      44599
         inl_from_kernel                       38498      30798
         outl_to_kernel                        34508      28816
         wr_tsc_adjust_msr                     34185      26818
         rd_tsc_adjust_msr                     37409      27049
         mmio-no-eventfd:pci-mem               50563      45276
         mmio-wildcard-eventfd:pci-mem         34495      30823
         mmio-datamatch-eventfd:pci-mem        35612      31071
         portio-no-eventfd:pci-io              44925      40661
         portio-wildcard-eventfd:pci-io        29708      27269
         portio-datamatch-eventfd:pci-io       31135      27164
      
      (I wrote a small C program to compare the tables for all values of CR0.WP,
      CR4.SMAP and CR4.SMEP, and they match).
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      09f037aa
    • Yu Zhang's avatar
      KVM: MMU: Expose the LA57 feature to VM. · fd8cb433
      Yu Zhang authored
      This patch exposes 5 level page table feature to the VM.
      At the same time, the canonical virtual address checking is
      extended to support both 48-bits and 57-bits address width.
      Signed-off-by: default avatarYu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fd8cb433
    • Yu Zhang's avatar
      KVM: MMU: Add 5 level EPT & Shadow page table support. · 855feb67
      Yu Zhang authored
      Extends the shadow paging code, so that 5 level shadow page
      table can be constructed if VM is running in 5 level paging
      mode.
      
      Also extends the ept code, so that 5 level ept table can be
      constructed if maxphysaddr of VM exceeds 48 bits. Unlike the
      shadow logic, KVM should still use 4 level ept table for a VM
      whose physical address width is less than 48 bits, even when
      the VM is running in 5 level paging mode.
      Signed-off-by: default avatarYu Zhang <yu.c.zhang@linux.intel.com>
      [Unconditionally reset the MMU context in kvm_cpuid_update.
       Changing MAXPHYADDR invalidates the reserved bit bitmasks.
       - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      855feb67
    • Yu Zhang's avatar
      KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL. · 2a7266a8
      Yu Zhang authored
      Now we have 4 level page table and 5 level page table in 64 bits
      long mode, let's rename the PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL,
      then we can use PT64_ROOT_5LEVEL for 5 level page table, it's
      helpful to make the code more clear.
      
      Also PT64_ROOT_MAX_LEVEL is defined as 4, so that we can just
      redefine it to 5 whenever a replacement is needed for 5 level
      paging.
      Signed-off-by: default avatarYu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2a7266a8
    • Yu Zhang's avatar
      KVM: MMU: check guest CR3 reserved bits based on its physical address width. · d1cd3ce9
      Yu Zhang authored
      Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the
      reserved bits in CR3. Yet the length of reserved bits in
      guest CR3 should be based on the physical address width
      exposed to the VM. This patch changes CR3 check logic to
      calculate the reserved bits at runtime.
      Signed-off-by: default avatarYu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d1cd3ce9
    • Yu Zhang's avatar
      KVM: x86: Add return value to kvm_cpuid(). · e911eb3b
      Yu Zhang authored
      Return false in kvm_cpuid() when it fails to find the cpuid
      entry. Also, this routine(and its caller) is optimized with
      a new argument - check_limit, so that the check_cpuid_limit()
      fall back can be avoided.
      Signed-off-by: default avatarYu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e911eb3b
    • Paolo Bonzini's avatar
      kvm: vmx: Raise #UD on unsupported XSAVES/XRSTORS · 3db13480
      Paolo Bonzini authored
      A guest may not be configured to support XSAVES/XRSTORS, even when the host
      does. If the guest does not support XSAVES/XRSTORS, clear the secondary
      execution control so that the processor will raise #UD.
      
      Also clear the "allowed-1" bit for XSAVES/XRSTORS exiting in the
      IA32_VMX_PROCBASED_CTLS2 MSR, and pass through VMCS12's control in
      the VMCS02.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3db13480
    • Jim Mattson's avatar
      kvm: vmx: Raise #UD on unsupported RDSEED · 75f4fc8d
      Jim Mattson authored
      A guest may not be configured to support RDSEED, even when the host
      does. If the guest does not support RDSEED, intercept the instruction
      and synthesize #UD. Also clear the "allowed-1" bit for RDSEED exiting
      in the IA32_VMX_PROCBASED_CTLS2 MSR.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      75f4fc8d
    • Jim Mattson's avatar
      kvm: vmx: Raise #UD on unsupported RDRAND · 45ec368c
      Jim Mattson authored
      A guest may not be configured to support RDRAND, even when the host
      does. If the guest does not support RDRAND, intercept the instruction
      and synthesize #UD. Also clear the "allowed-1" bit for RDRAND exiting
      in the IA32_VMX_PROCBASED_CTLS2 MSR.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      45ec368c
    • Paolo Bonzini's avatar
      KVM: VMX: cache secondary exec controls · 80154d77
      Paolo Bonzini authored
      Currently, secondary execution controls are divided in three groups:
      
      - static, depending mostly on the module arguments or the processor
        (vmx_secondary_exec_control)
      
      - static, depending on CPUID (vmx_cpuid_update)
      
      - dynamic, depending on nested VMX or local APIC state
      
      Because walking CPUID is expensive, prepare_vmcs02 is using only
      the first group.  This however is unnecessarily complicated.  Just
      cache the static secondary execution controls, and then prepare_vmcs02
      does not need to compute them every time.  Computation of all static
      secondary execution controls is now kept in a single function,
      vmx_compute_secondary_exec_control.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      80154d77
  6. 23 Aug, 2017 2 commits
  7. 18 Aug, 2017 6 commits
  8. 17 Aug, 2017 1 commit
    • Aneesh Kumar K.V's avatar
      powerpc/mm: Rename find_linux_pte_or_hugepte() · 94171b19
      Aneesh Kumar K.V authored
      Add newer helpers to make the function usage simpler. It is always
      recommended to use find_current_mm_pte() for walking the page table.
      If we cannot use find_current_mm_pte(), it should be documented why
      the said usage of __find_linux_pte() is safe against a parallel THP
      split.
      
      For now we have KVM code using __find_linux_pte(). This is because kvm
      code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
      with PACA soft_enabled = 1. We may want to fix that later and make
      sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
      we can switch kvm to use find_linux_pte().
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      94171b19
  9. 15 Aug, 2017 1 commit
    • Arnd Bergmann's avatar
      kvm: avoid uninitialized-variable warnings · 076b925d
      Arnd Bergmann authored
      When PAGE_OFFSET is not a compile-time constant, we run into
      warnings from the use of kvm_is_error_hva() that the compiler
      cannot optimize out:
      
      arch/arm/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_gfn_to_hva_cache_init':
      arch/arm/kvm/../../../virt/kvm/kvm_main.c:1978:14: error: 'nr_pages_avail' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      arch/arm/kvm/../../../virt/kvm/kvm_main.c: In function 'gfn_to_page_many_atomic':
      arch/arm/kvm/../../../virt/kvm/kvm_main.c:1660:5: error: 'entry' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This adds fake initializations to the two instances I ran into.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      076b925d
  10. 11 Aug, 2017 5 commits
    • Jim Mattson's avatar
      kvm: x86: Disallow illegal IA32_APIC_BASE MSR values · d3802286
      Jim Mattson authored
      Host-initiated writes to the IA32_APIC_BASE MSR do not have to follow
      local APIC state transition constraints, but the value written must be
      valid.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d3802286
    • Wanpeng Li's avatar
      KVM: MMU: Bail out immediately if there is no available mmu page · 26eeb53c
      Wanpeng Li authored
      Bailing out immediately if there is no available mmu page to alloc.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      26eeb53c
    • Wanpeng Li's avatar
      KVM: MMU: Fix softlockup due to mmu_lock is held too long · 42bcbebf
      Wanpeng Li authored
      watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089]
       irq event stamp: 20532
       hardirqs last  enabled at (20531): [<ffffffff8e9b6908>] restore_regs_and_iret+0x0/0x1d
       hardirqs last disabled at (20532): [<ffffffff8e9b7ae8>] apic_timer_interrupt+0x98/0xb0
       softirqs last  enabled at (8266): [<ffffffff8e9badc6>] __do_softirq+0x206/0x4c1
       softirqs last disabled at (8253): [<ffffffff8e083918>] irq_exit+0xf8/0x100
       CPU: 5 PID: 3089 Comm: warn_test Tainted: G           OE   4.13.0-rc3+ #8
       RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm]
       Call Trace:
        make_mmu_pages_available.isra.120+0x71/0xc0 [kvm]
        kvm_mmu_load+0x1cf/0x410 [kvm]
        kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x23/0xc2
        ? __this_cpu_preempt_check+0x13/0x20
      
      This can be reproduced readily by ept=N and running syzkaller tests since
      many syzkaller testcases don't setup any memory regions. However, if ept=Y
      rmode identity map will be created, then kvm_mmu_calculate_mmu_pages() will
      extend the number of VM's mmu pages to at least KVM_MIN_ALLOC_MMU_PAGES
      which just hide the issue.
      
      I saw the scenario kvm->arch.n_max_mmu_pages == 0 && kvm->arch.n_used_mmu_pages == 1,
      so there is one active mmu page on the list, kvm_mmu_prepare_zap_page() fails
      to zap any pages, however prepare_zap_oldest_mmu_page() always returns true.
      It incurs infinite loop in make_mmu_pages_available() which causes mmu->lock
      softlockup.
      
      This patch fixes it by setting the return value of prepare_zap_oldest_mmu_page()
      according to whether or not there is mmu page zapped.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      42bcbebf
    • David Hildenbrand's avatar
      KVM: nVMX: validate eptp pointer · a057e0e2
      David Hildenbrand authored
      Let's reuse the function introduced with eptp switching.
      
      We don't explicitly have to check against enable_ept_ad_bits, as this
      is implicitly done when checking against nested_vmx_ept_caps in
      valid_ept_address().
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a057e0e2
    • Andrew Jones's avatar
      KVM: MAINTAINERS improvements · a170504f
      Andrew Jones authored
      Remove nonexistent files, allow less awkward expressions when
      extracting arch-specific information, and only return relevant
      information when using arch-specific expressions. Additionally
      add include/trace/events/kvm.h, arch/*/include/uapi/asm/kvm*,
      and arch/powerpc/kernel/kvm* to appropriate sections. The arch-
      specific expressions are now:
      
       /KVM/                                        -- All KVM
       /\(KVM\)|\(KVM\/x86\)/                       -- X86
       /\(KVM\)|\(KVM\/x86\)|\(KVM\/amd\)/          -- X86 plus AMD
       /\(KVM\)|\(KVM\/arm\)/                       -- ARM
       /\(KVM\)|\(KVM\/arm\)|\(KVM\/arm64\)/        -- ARM plus ARM64
       /\(KVM\)|\(KVM\/powerpc\)/                   -- POWERPC
       /\(KVM\)|\(KVM\/s390\)/                      -- S390
       /\(KVM\)|\(KVM\/mips\)/                      -- MIPS
      Signed-off-by: default avatarAndrew Jones <drjones@redhat.com>
      Acked-by: default avatarCornelia Huck <cohuck@redhat.com>
      Acked-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      a170504f
  11. 10 Aug, 2017 1 commit