1. 26 Mar, 2020 1 commit
    • KVM: X86: Narrow down the IPI fastpath to single target IPI · e1be9ac8
      Wanpeng Li authored
      The original single-target IPI fastpath patch forgot to filter on the
      ICR destination shorthand field. Multicast IPIs are not suitable for
      the fastpath, since waking up multiple sleeping vCPUs extends the
      interrupt-disabled time. This is especially bad when the host is
      over-subscribed and the VM has a relatively large number of vCPUs.
      Narrow the fastpath down to single-target IPIs.

      Test setup: two VMs with 76 vCPUs each, one running 'ebizzy -M' and
      the other running cyclictest on all vCPUs. With this patch, the
      average cyclictest score improves by more than 5%. (PV TLB, PV IPI
      and PV sched yield were disabled during testing to avoid interference.)
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1585189202-1708-3-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
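The filter described in this commit can be sketched as a predicate over the value written to the ICR. This is a minimal user-space sketch, not the patch itself: the macro and function names are hypothetical, the bit positions follow the x86 local APIC ICR layout, and the fixed-delivery and physical-destination checks are assumed from the title of the original fastpath commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ICR field masks (names are illustrative, positions follow the ICR layout). */
#define ICR_DELIVERY_MODE_MASK 0x00700u /* bits 8-10, 0 = fixed */
#define ICR_DEST_MODE_MASK     0x00800u /* bit 11, 0 = physical */
#define ICR_SHORTHAND_MASK     0xC0000u /* bits 18-19, 0 = no shorthand */

/*
 * Hypothetical helper: true only for a fixed, physical-mode IPI that uses
 * no destination shorthand (self, all-including-self, all-excluding-self).
 */
static bool icr_is_fastpath_candidate(uint64_t icr)
{
    uint32_t lo = (uint32_t)icr; /* low half holds mode and shorthand bits */

    return (lo & ICR_SHORTHAND_MASK) == 0 &&
           (lo & ICR_DEST_MODE_MASK) == 0 &&
           (lo & ICR_DELIVERY_MODE_MASK) == 0;
}
```

Any write with a non-zero shorthand field is rejected here, so such IPIs would take the regular (slow) path.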
  2. 24 Mar, 2020 1 commit
  3. 23 Mar, 2020 3 commits
    • KVM: VMX: don't allow memory operands for inline asm that modifies SP · 428b8f1d
      Nick Desaulniers authored
      THUNK_TARGET defines [thunk_target] as having "rm" input constraints
      when CONFIG_RETPOLINE is not set, which isn't constrained enough for
      this specific case.
      
      For inline assembly that modifies the stack pointer before using this
      input, the underspecification of constraints is dangerous, and results
      in an indirect call to a previously pushed flags register.
      
      In this case `entry`'s stack slot is good enough to satisfy the "m"
      constraint in "rm", but the inline assembly in
      handle_external_interrupt_irqoff() modifies the stack pointer via
      push+pushf before using this input, which in this case results in
      calling what was the previous state of the flags register, rather than
      `entry`.
      
      Be more specific in the constraints by requiring `entry` be in a
      register, and not a memory operand.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: syzbot+3f29ca2efb056a761e38@syzkaller.appspotmail.com
      Debugged-by: Alexander Potapenko <glider@google.com>
      Debugged-by: Paolo Bonzini <pbonzini@redhat.com>
      Debugged-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Message-Id: <20200323191243.30002-1-ndesaulniers@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: LAPIC: Mark hrtimer for period or oneshot mode to expire in hard interrupt context · edec6e01
      He Zhe authored
      apic->lapic_timer.timer was initialized with HRTIMER_MODE_ABS_HARD but
      started later with HRTIMER_MODE_ABS, which may cause the following warning
      on a PREEMPT_RT kernel.
      
      WARNING: CPU: 1 PID: 2957 at kernel/time/hrtimer.c:1129 hrtimer_start_range_ns+0x348/0x3f0
      CPU: 1 PID: 2957 Comm: qemu-system-x86 Not tainted 5.4.23-rt11 #1
      Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1a 09/18/2018
      RIP: 0010:hrtimer_start_range_ns+0x348/0x3f0
      Code: 4d b8 0f 94 c1 0f b6 c9 e8 35 f1 ff ff 4c 8b 45
            b0 e9 3b fd ff ff e8 d7 3f fa ff 48 98 4c 03 34
            c5 a0 26 bf 93 e9 a1 fd ff ff <0f> 0b e9 fd fc ff
            ff 65 8b 05 fa b7 90 6d 89 c0 48 0f a3 05 60 91
      RSP: 0018:ffffbc60026ffaf8 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: ffff9d81657d4110 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000006cc7987bcf RDI: ffff9d81657d4110
      RBP: ffffbc60026ffb58 R08: 0000000000000001 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000006cc7987bcf
      R13: 0000000000000000 R14: 0000006cc7987bcf R15: ffffbc60026d6a00
      FS: 00007f401daed700(0000) GS:ffff9d81ffa40000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000ffffffff CR3: 0000000fa7574000 CR4: 00000000003426e0
      Call Trace:
      ? kvm_release_pfn_clean+0x22/0x60 [kvm]
      start_sw_timer+0x85/0x230 [kvm]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      kvm_lapic_switch_to_sw_timer+0x72/0x80 [kvm]
      vmx_pre_block+0x1cb/0x260 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_vmexit+0x1b/0x30 [kvm_intel]
      ? vmx_vmexit+0xf/0x30 [kvm_intel]
      ? vmx_sync_pir_to_irr+0x9e/0x100 [kvm_intel]
      ? kvm_apic_has_interrupt+0x46/0x80 [kvm]
      kvm_arch_vcpu_ioctl_run+0x85b/0x1fa0 [kvm]
      ? _raw_spin_unlock_irqrestore+0x18/0x50
      ? _copy_to_user+0x2c/0x30
      kvm_vcpu_ioctl+0x235/0x660 [kvm]
      ? rt_spin_unlock+0x2c/0x50
      do_vfs_ioctl+0x3e4/0x650
      ? __fget+0x7a/0xa0
      ksys_ioctl+0x67/0x90
      __x64_sys_ioctl+0x1a/0x20
      do_syscall_64+0x4d/0x120
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7f4027cc54a7
      Code: 00 00 90 48 8b 05 e9 59 0c 00 64 c7 00 26 00 00
            00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00
            00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff
            73 01 c3 48 8b 0d b9 59 0c 00 f7 d8 64 89 01 48
      RSP: 002b:00007f401dae9858 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00005558bd029690 RCX: 00007f4027cc54a7
      RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d
      RBP: 00007f4028b72000 R08: 00005558bc829ad0 R09: 00000000ffffffff
      R10: 00005558bcf90ca0 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 00005558bce1c840
      --[ end trace 0000000000000002 ]--
      Signed-off-by: He Zhe <zhe.he@windriver.com>
      Message-Id: <1584687967-332859-1-git-send-email-zhe.he@windriver.com>
      Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
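The mismatch behind this warning can be modelled in user space. The sketch below is a toy model, not kernel code: all names and flag values are hypothetical, and it assumes the warning fires when the "hard" flag passed at start time differs from the one recorded at init time.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, not the kernel's definitions. */
enum toy_timer_mode {
    TOY_MODE_ABS      = 0x0,
    TOY_MODE_HARD     = 0x8,
    TOY_MODE_ABS_HARD = TOY_MODE_ABS | TOY_MODE_HARD,
};

struct toy_timer {
    bool is_hard; /* expiry context recorded when the timer is initialized */
};

static void toy_timer_init(struct toy_timer *t, enum toy_timer_mode mode)
{
    t->is_hard = (mode & TOY_MODE_HARD) != 0;
}

/* True when the start mode agrees with the mode used at init time. */
static bool toy_timer_start_mode_ok(const struct toy_timer *t,
                                    enum toy_timer_mode mode)
{
    bool start_hard = (mode & TOY_MODE_HARD) != 0;

    return start_hard == t->is_hard;
}
```

In this model, initializing with TOY_MODE_ABS_HARD and starting with TOY_MODE_ABS is flagged, while starting with TOY_MODE_ABS_HARD is accepted, which mirrors the direction of the fix.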
    • KVM: SVM: Issue WBINVD after deactivating an SEV guest · 2e2409af
      Tom Lendacky authored
      Currently, CLFLUSH is used to flush SEV guest memory before the guest is
      terminated (or a memory hotplug region is removed). However, CLFLUSH is
      not enough to ensure that SEV guest tagged data is flushed from the cache.
      
      With 33af3a7e ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations"), the
      original WBINVD was removed. This then exposed crashes at random times
      because of a cache flush race with a page that had both a hypervisor and
      a guest tag in the cache.
      
      Restore the WBINVD when destroying an SEV guest and add a WBINVD to the
      svm_unregister_enc_region() function to ensure hotplug memory is flushed
      when removed. The DF_FLUSH can still be avoided at this point.
      
      Fixes: 33af3a7e ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <c8bf9087ca3711c5770bdeaafa3e45b717dc5ef4.1584720426.git.thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 20 Mar, 2020 2 commits
    • KVM: SVM: document KVM_MEM_ENCRYPT_OP, let userspace detect if SEV is available · 2da1ed62
      Paolo Bonzini authored
      Userspace has no way to query if SEV has been disabled with the
      sev module parameter of kvm-amd.ko.  Actually it has one, but it
      is a hack: do ioctl(KVM_MEM_ENCRYPT_OP, NULL) and check if it
      returns EFAULT.  Make it a little nicer by returning zero for
      SEV enabled and NULL argument, and while at it document the
      ioctl arguments.
      
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
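A userspace probe based on this commit might look like the sketch below. It is an illustration under assumptions: the ioctl is spelled KVM_MEMORY_ENCRYPT_OP in the UAPI header <linux/kvm.h> and is issued on a VM file descriptor; the enum and function names are hypothetical. Any error other than EFAULT is treated as "SEV not available" without assuming a particular errno.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Hypothetical result codes for the probe. */
enum sev_probe {
    SEV_PROBE_ERROR      = -1, /* could not open /dev/kvm or create a VM */
    SEV_UNAVAILABLE      = 0,
    SEV_AVAILABLE        = 1,  /* NULL argument returned zero */
    SEV_AVAILABLE_LEGACY = 2,  /* EFAULT: the older detection hack */
};

static enum sev_probe probe_sev(void)
{
    enum sev_probe result;
    int kvm_fd, vm_fd;

    kvm_fd = open("/dev/kvm", O_RDWR);
    if (kvm_fd < 0)
        return SEV_PROBE_ERROR;

    vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
    if (vm_fd < 0) {
        close(kvm_fd);
        return SEV_PROBE_ERROR;
    }

    /* A NULL argument only asks whether SEV is usable. */
    if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, NULL) == 0)
        result = SEV_AVAILABLE;
    else if (errno == EFAULT)
        result = SEV_AVAILABLE_LEGACY;
    else
        result = SEV_UNAVAILABLE;

    close(vm_fd);
    close(kvm_fd);
    return result;
}
```

On a host without /dev/kvm the probe reports SEV_PROBE_ERROR rather than guessing.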
    • KVM: x86: remove bogus user-triggerable WARN_ON · d3329454
      Paolo Bonzini authored
      The WARN_ON is essentially comparing a user-provided value with 0.  It is
      trivial to trigger it just by passing garbage to KVM_SET_CLOCK.  Guests
      can break if you do so, but the same applies to every KVM_SET_* ioctl.
      So, if it hurts when you do like this, just do not do it.
      
      Reported-by: syzbot+00be5da1d75f1cc95f6b@syzkaller.appspotmail.com
      Fixes: 9446e6fc ("KVM: x86: fix WARN_ON check of an unsigned less than zero")
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 14 Mar, 2020 5 commits
    • Paolo Bonzini's avatar
      018cabb6
    • KVM: nVMX: avoid NULL pointer dereference with incorrect EVMCS GPAs · 95fa1010
      Vitaly Kuznetsov authored
      When an EVMCS-enabled L1 guest on KVM tries to do an enlightened VMEnter
      with EVMCS GPA = 0, the host crashes because the
      
      evmcs_gpa != vmx->nested.hv_evmcs_vmptr
      
      condition in nested_vmx_handle_enlightened_vmptrld() will evaluate to
      false (as nested.hv_evmcs_vmptr is zeroed after init). The crash will
      happen on vmx->nested.hv_evmcs pointer dereference.
      
      Another problematic EVMCS ptr value is '-1' but it only causes host crash
      after nested_release_evmcs() invocation. The problem is exactly the same as
      with '0', we mistakenly think that the EVMCS pointer hasn't changed and
      thus nested.hv_evmcs_vmptr is valid.
      
      Resolve the issue by adding an additional !vmx->nested.hv_evmcs
      check to nested_vmx_handle_enlightened_vmptrld(). This way we always
      try kvm_vcpu_map() when nested.hv_evmcs is NULL, which is supposed
      to catch all invalid EVMCS GPAs.
      
      Also, initialize hv_evmcs_vmptr to '0' in nested_release_evmcs()
      to be consistent with initialization where we don't currently
      set hv_evmcs_vmptr to '-1'.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
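The difference between the old and the new condition can be shown with a small user-space model. The struct and function names below are hypothetical stand-ins for the nested VMX state; only the shape of the two conditions is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the relevant part of the nested state. */
struct toy_nested_state {
    uint64_t hv_evmcs_vmptr; /* GPA of the last mapped EVMCS */
    void *hv_evmcs;          /* host mapping, NULL when nothing is mapped */
};

/* Old condition: GPA 0 looks "unchanged" right after zero-initialization. */
static bool evmcs_needs_map_old(const struct toy_nested_state *n, uint64_t gpa)
{
    return gpa != n->hv_evmcs_vmptr;
}

/* New condition: always (re)map when no mapping exists yet. */
static bool evmcs_needs_map_new(const struct toy_nested_state *n, uint64_t gpa)
{
    return n->hv_evmcs == NULL || gpa != n->hv_evmcs_vmptr;
}
```

With a zeroed state and GPA 0, the old predicate skips the mapping step and the NULL hv_evmcs pointer would be used; the new predicate forces a mapping attempt, which is where an invalid GPA can be rejected.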
    • Merge tag 'kvm-s390-master-5.6-1' of... · 997224fe
      Paolo Bonzini authored
      Merge tag 'kvm-s390-master-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
      
      KVM: s390: Fully do the CPU resets as intended
      
      With 7de3f142 ("KVM: s390: Add new reset vcpu API") we clarified
      the meaning of the reset ioctl to fully reset the CPU and not only the
      parts that can not be handled by userspace. Turns out that we missed
      some parts.
    • KVM: x86: Initializing all kvm_lapic_irq fields in ioapic_write_indirect · 0c22056f
      Nitesh Narayan Lal authored
      Previously, not all fields of struct kvm_lapic_irq were initialized
      before it was passed to kvm_bitmap_or_dest_vcpus(), which causes
      problems whenever one of the uninitialized fields is used to process
      a request. For example, leaving msi_redir_hint uninitialized before
      the call to kvm_bitmap_or_dest_vcpus() may cause
      kvm_apic_map_get_dest_lapic() to misbehave. Specifically,
      kvm_lowest_prio_delivery() returns true when msi_redir_hint holds a
      non-zero garbage value. That should not happen, because the request
      uses APIC fixed delivery mode and the interrupt must not be delivered
      only to the lowest-priority candidate.
      
      This patch initializes all the fields of kvm_lapic_irq based on the
      values of ioapic redirect_entry object before passing it on to
      kvm_bitmap_or_dest_vcpus().
      
      Fixes: 7ee30bc1 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
      Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      [Set level to false since the value doesn't really matter. Suggested
       by Vitaly Kuznetsov. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
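The hazard described here, a stale msi_redir_hint turning a fixed-mode request into a lowest-priority one, can be illustrated with a toy struct. Everything below is a sketch with hypothetical names; the predicate's shape (lowest-priority delivery mode or redirection hint set) is an assumption drawn from the description above, and the designated initializer is one way to guarantee full initialization, not necessarily how the patch does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DM_FIXED  0x000u
#define TOY_DM_LOWEST 0x100u

/* Toy stand-in for the interrupt request descriptor. */
struct toy_lapic_irq {
    uint32_t vector;
    uint32_t delivery_mode;
    uint32_t dest_id;
    bool level;
    bool msi_redir_hint;
};

/* Assumed shape of the lowest-priority check. */
static bool toy_lowest_prio_delivery(const struct toy_lapic_irq *irq)
{
    return irq->delivery_mode == TOY_DM_LOWEST || irq->msi_redir_hint;
}

/* Build a fixed-mode request with every field set explicitly. */
static struct toy_lapic_irq toy_make_fixed_irq(uint32_t vector, uint32_t dest_id)
{
    struct toy_lapic_irq irq = {
        .vector = vector,
        .delivery_mode = TOY_DM_FIXED,
        .dest_id = dest_id,
        .level = false,
        .msi_redir_hint = false,
    };

    return irq;
}
```

A request built this way is never mistaken for lowest-priority delivery, whereas a struct declared on the stack without an initializer may carry any value in msi_redir_hint.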
    • KVM: VMX: Condition ENCLS-exiting enabling on CPU support for SGX1 · 7a57c09b
      Sean Christopherson authored
      Enable ENCLS-exiting (and thus set vmcs.ENCLS_EXITING_BITMAP) only if
      the CPU supports SGX1.  Per Intel's SDM, all ENCLS leafs #UD if SGX1
      is not supported[*], i.e. intercepting ENCLS to inject a #UD is
      unnecessary.
      
      Avoiding ENCLS-exiting even when it is reported as supported by the CPU
      works around a reported issue where SGX is "hard" disabled after an S3
      suspend/resume cycle, i.e. CPUID.0x7.SGX=0 and the VMCS field/control
      are enumerated as unsupported.  While the root cause of the S3 issue is
      unknown, it's definitely _not_ a KVM (or kernel) bug, i.e. this is a
      workaround for what is most likely a hardware or firmware issue.  As a
      bonus side effect, KVM saves a VMWRITE when first preparing vmcs01 and
      vmcs02.
      
      Note, SGX must be disabled in BIOS to take advantage of this workaround.
      
      [*] The additional ENCLS CPUID check on SGX1 exists so that SGX can be
          globally "soft" disabled post-reset, e.g. if #MC bits in MCi_CTL are
          cleared.  Soft disabled meaning disabling SGX without clearing the
          primary CPUID bit (in leaf 0x7) and without poking into non-SGX
          CPU paths, e.g. for the VMCS controls.
      
      Fixes: 0b665d30 ("KVM: vmx: Inject #UD for SGX ENCLS instruction in guest")
      Reported-by: Toni Spets <toni.spets@iki.fi>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
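The decision described in this commit can be sketched as a pure function over the supported control bits and the SGX CPUID leaf. This is a hedged illustration: the names are hypothetical, the bit positions (ENCLS exiting as bit 15 of the secondary processor-based controls, SGX1 as bit 0 of CPUID.(EAX=12H,ECX=0):EAX) are assumptions to be checked against the SDM, and the caller is assumed to pass 0 for the CPUID value when SGX is not enumerated at all.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; verify against the Intel SDM before relying on them. */
#define TOY_SECONDARY_EXEC_ENCLS_EXITING (1u << 15)
#define TOY_CPUID_12_0_EAX_SGX1          (1u << 0)

/*
 * Drop ENCLS-exiting from the usable secondary controls unless the CPU
 * reports SGX1, even if the control itself is reported as supported.
 */
static uint32_t toy_filter_secondary_controls(uint32_t supported_ctls,
                                              uint32_t cpuid_12_0_eax)
{
    if (!(cpuid_12_0_eax & TOY_CPUID_12_0_EAX_SGX1))
        supported_ctls &= ~TOY_SECONDARY_EXEC_ENCLS_EXITING;

    return supported_ctls;
}
```

All other control bits pass through unchanged, so the filter only affects ENCLS interception.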
  6. 11 Mar, 2020 1 commit
  7. 05 Mar, 2020 1 commit
  8. 03 Mar, 2020 2 commits
  9. 02 Mar, 2020 2 commits
    • KVM: SVM: Fix the svm vmexit code for WRMSR · aaca2100
      Haiwei Li authored
      In SVM, the exit_code for MSR writes is not EXIT_REASON_MSR_WRITE, which
      belongs to VMX.
      
      According to the AMD manual, SVM_EXIT_MSR (7Ch) is the exit_code of
      VMEXIT_MSR, raised by a RDMSR or WRMSR access to a protected MSR.
      Additionally, the processor indicates in the VMCB's EXITINFO1 whether a
      RDMSR (EXITINFO1=0) or WRMSR (EXITINFO1=1) was intercepted.
      Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
      Fixes: 1e9e2622 ("KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath", 2019-11-21)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
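The exit classification described above can be written as a small predicate. The struct and names are hypothetical stand-ins for the VMCB control area; the exit code 7Ch and the EXITINFO1 encoding come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SVM_EXIT_MSR 0x7cu /* shared by RDMSR and WRMSR intercepts */

/* Toy stand-in for the VMCB control fields used here. */
struct toy_vmcb_control {
    uint32_t exit_code;
    uint64_t exit_info_1; /* 0 = RDMSR, 1 = WRMSR */
};

/* True only for an intercepted WRMSR. */
static bool toy_exit_is_wrmsr(const struct toy_vmcb_control *c)
{
    return c->exit_code == TOY_SVM_EXIT_MSR && c->exit_info_1 == 1;
}
```

Checking the exit code alone is not enough on SVM, because RDMSR intercepts share the same code.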
    • KVM: X86: Fix dereference null cpufreq policy · 9a11997e
      Wanpeng Li authored
      Naresh Kamboju reported:
      
         Linux version 5.6.0-rc4 (oe-user@oe-host) (gcc version
        (GCC)) #1 SMP Sun Mar 1 22:59:08 UTC 2020
         kvm: no hardware support
         BUG: kernel NULL pointer dereference, address: 000000000000028c
         #PF: supervisor read access in kernel mode
         #PF: error_code(0x0000) - not-present page
         PGD 0 P4D 0
         Oops: 0000 [#1] SMP NOPTI
         CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4 #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
        04/01/2014
         RIP: 0010:kobject_put+0x12/0x1c0
         Call Trace:
          cpufreq_cpu_put+0x15/0x20
          kvm_arch_init+0x1f6/0x2b0
          kvm_init+0x31/0x290
          ? svm_check_processor_compat+0xd/0xd
          ? svm_check_processor_compat+0xd/0xd
          svm_init+0x21/0x23
          do_one_initcall+0x61/0x2f0
          ? rdinit_setup+0x30/0x30
          ? rcu_read_lock_sched_held+0x4f/0x80
          kernel_init_freeable+0x219/0x279
          ? rest_init+0x250/0x250
          kernel_init+0xe/0x110
          ret_from_fork+0x27/0x50
         Modules linked in:
         CR2: 000000000000028c
         ---[ end trace 239abf40c55c409b ]---
         RIP: 0010:kobject_put+0x12/0x1c0
      
      The cpufreq policy returned by cpufreq_cpu_get() can be NULL on failure;
      this patch handles that case.
      
      Fixes: aaec7c03 (KVM: x86: avoid useless copy of cpufreq policy)
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
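The pattern behind this fix, checking a refcounted lookup for NULL before using and releasing it, can be sketched in user space. All names below are hypothetical stand-ins for the cpufreq get/put pair; the put helper is assumed to be unsafe to call with NULL, as the trace above suggests for the real one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a refcounted frequency policy. */
struct toy_policy {
    unsigned int max_freq_khz;
    int refcount;
};

/* Lookup that may fail: returns NULL when no policy exists. */
static struct toy_policy *toy_policy_get(struct toy_policy *p)
{
    if (p)
        p->refcount++;
    return p;
}

/* Release a reference; must not be called with NULL. */
static void toy_policy_put(struct toy_policy *p)
{
    p->refcount--;
}

/* Use the policy's maximum frequency if available, else the fallback. */
static unsigned int toy_max_khz(struct toy_policy *maybe, unsigned int fallback)
{
    unsigned int khz = fallback;
    struct toy_policy *policy = toy_policy_get(maybe);

    if (policy) {
        if (policy->max_freq_khz)
            khz = policy->max_freq_khz;
        toy_policy_put(policy); /* only reached with a valid reference */
    }
    return khz;
}
```

Keeping both the read and the put inside the NULL check means a failed lookup simply falls back to the default value.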
  10. 01 Mar, 2020 1 commit
  11. 28 Feb, 2020 9 commits
  12. 24 Feb, 2020 1 commit
  13. 23 Feb, 2020 5 commits
  14. 22 Feb, 2020 3 commits
  15. 21 Feb, 2020 3 commits