1. 28 Oct, 2021 1 commit
    • David Woodhouse's avatar
      KVM: x86: Take srcu lock in post_kvm_run_save() · f3d1436d
      David Woodhouse authored
      The Xen interrupt injection for event channels relies on accessing the
      guest's vcpu_info structure in __kvm_xen_has_interrupt(), through a
      gfn_to_hva_cache.
      
      This requires the srcu lock to be held, which is mostly the case except
      for this code path:
      
      [   11.822877] WARNING: suspicious RCU usage
      [   11.822965] -----------------------------
      [   11.823013] include/linux/kvm_host.h:664 suspicious rcu_dereference_check() usage!
      [   11.823131]
      [   11.823131] other info that might help us debug this:
      [   11.823131]
      [   11.823196]
      [   11.823196] rcu_scheduler_active = 2, debug_locks = 1
      [   11.823253] 1 lock held by dom:0/90:
      [   11.823292]  #0: ffff998956ec8118 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x680
      [   11.823379]
      [   11.823379] stack backtrace:
      [   11.823428] CPU: 2 PID: 90 Comm: dom:0 Kdump: loaded Not tainted 5.4.34+ #5
      [   11.823496] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   11.823612] Call Trace:
      [   11.823645]  dump_stack+0x7a/0xa5
      [   11.823681]  lockdep_rcu_suspicious+0xc5/0x100
      [   11.823726]  __kvm_xen_has_interrupt+0x179/0x190
      [   11.823773]  kvm_cpu_has_extint+0x6d/0x90
      [   11.823813]  kvm_cpu_accept_dm_intr+0xd/0x40
      [   11.823853]  kvm_vcpu_ready_for_interrupt_injection+0x20/0x30
                    < post_kvm_run_save() inlined here >
      [   11.823906]  kvm_arch_vcpu_ioctl_run+0x135/0x6a0
      [   11.823947]  kvm_vcpu_ioctl+0x263/0x680
      
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: stable@vger.kernel.org
      Message-Id: <606aaaf29fca3850a63aa4499826104e77a72346.camel@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f3d1436d
  2. 27 Oct, 2021 1 commit
  3. 25 Oct, 2021 3 commits
    • David Woodhouse's avatar
      KVM: x86/xen: Fix kvm_xen_has_interrupt() sleeping in kvm_vcpu_block() · 0985dba8
      David Woodhouse authored
      In kvm_vcpu_block, the current task is set to TASK_INTERRUPTIBLE before
      making a final check whether the vCPU should be woken from HLT by any
      incoming interrupt.
      
      This is a problem for the get_user() in __kvm_xen_has_interrupt(), which
      really shouldn't be sleeping when the task state has already been set.
      I think it's actually harmless as it would just manifest itself as a
      spurious wakeup, but it's causing a debug warning:
      
      [  230.963649] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b6bcdbc9>] prepare_to_swait_exclusive+0x30/0x80
      
      Fix the warning by turning it into an *explicit* spurious wakeup. When
      invoked with !task_is_running(current) (and we might as well add
      in_atomic() there while we're at it), just return 1 to indicate that
      an IRQ is pending, which will cause a wakeup and then something will
      call it again in a context that *can* sleep so it can fault the page
      back in.
      
      Cc: stable@vger.kernel.org
      Fixes: 40da8ccd ("KVM: x86/xen: Add event channel interrupt vector upcall")
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      
      Message-Id: <168bf8c689561da904e48e2ff5ae4713eaef9e2d.camel@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0985dba8
    • Paolo Bonzini's avatar
      Merge tag 'kvm-s390-master-5.15-2' of... · 4b2caef0
      Paolo Bonzini authored
      Merge tag 'kvm-s390-master-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      KVM: s390: Fixes for interrupt delivery
      
      Two bugs that might result in CPUs not woken up when interrupts are
      pending.
      4b2caef0
    • David Woodhouse's avatar
      KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock · 8228c77d
      David Woodhouse authored
      On the preemption path when updating a Xen guest's runstate times, this
      lock is taken inside the scheduler rq->lock, which is a raw spinlock.
      This was shown in a lockdep warning:
      
      [   89.138354] =============================
      [   89.138356] [ BUG: Invalid wait context ]
      [   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
      [   89.138360] -----------------------------
      [   89.138361] xen_shinfo_test/2575 is trying to lock:
      [   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138442] other info that might help us debug this:
      [   89.138444] context-{5:5}
      [   89.138445] 4 locks held by xen_shinfo_test/2575:
      [   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
      [   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
      [   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
      [   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
      ...
      [   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
      [   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
      [   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
      [   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
      [   89.138900]  __schedule+0x5de/0xbd0
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8228c77d
  4. 20 Oct, 2021 2 commits
  5. 18 Oct, 2021 1 commit
    • Paolo Bonzini's avatar
      KVM: X86: fix lazy allocation of rmaps · fa13843d
      Paolo Bonzini authored
      If allocation of rmaps fails, but some of the pointers have already been written,
      those pointers can be cleaned up when the memslot is freed, or even reused later
      for another attempt at allocating the rmaps.  Therefore there is no need to
      WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
      skipped lest KVM will overwrite the previous pointer and will indeed leak memory.
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fa13843d
  6. 15 Oct, 2021 2 commits
  7. 05 Oct, 2021 3 commits
  8. 04 Oct, 2021 1 commit
  9. 30 Sep, 2021 4 commits
    • Sean Christopherson's avatar
      KVM: selftests: Ensure all migrations are performed when test is affined · 7b0035ea
      Sean Christopherson authored
      Rework the CPU selection in the migration worker to ensure the specified
      number of migrations are performed when the test iteslf is affined to a
      subset of CPUs.  The existing logic skips iterations if the target CPU is
      not in the original set of possible CPUs, which causes the test to fail
      if too many iterations are skipped.
      
        ==== Test Assertion Failure ====
        rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
        pid=10127 tid=10127 errno=4 - Interrupted system call
           1  0x00000000004018e5: main at rseq_test.c:227
           2  0x00007fcc8fc66bf6: ?? ??:0
           3  0x0000000000401959: _start at ??:?
        Only performed 4 KVM_RUNs, task stalled too much?
      
      Calculate the min/max possible CPUs as a cheap "best effort" to avoid
      high runtimes when the test is affined to a small percentage of CPUs.
      Alternatively, a list or xarray of the possible CPUs could be used, but
      even in a horrendously inefficient setup, such optimizations are not
      needed because the runtime is completely dominated by the cost of
      migrating the task, and the absolute runtime is well under a minute in
      even truly absurd setups, e.g. running on a subset of vCPUs in a VM that
      is heavily overcommited (16 vCPUs per pCPU).
      
      Fixes: 61e52f16 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
      Reported-by: default avatarDongli Zhang <dongli.zhang@oracle.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210929234112.1862848-1-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7b0035ea
    • Sean Christopherson's avatar
      KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks · e8a747d0
      Sean Christopherson authored
      Check whether a CPUID entry's index is significant before checking for a
      matching index to hack-a-fix an undefined behavior bug due to consuming
      uninitialized data.  RESET/INIT emulation uses kvm_cpuid() to retrieve
      CPUID.0x1, which does _not_ have a significant index, and fails to
      initialize the dummy variable that doubles as EBX/ECX/EDX output _and_
      ECX, a.k.a. index, input.
      
      Practically speaking, it's _extremely_  unlikely any compiler will yield
      code that causes problems, as the compiler would need to inline the
      kvm_cpuid() call to detect the uninitialized data, and intentionally hose
      the kernel, e.g. insert ud2, instead of simply ignoring the result of
      the index comparison.
      
      Although the sketchy "dummy" pattern was introduced in SVM by commit
      66f7b72e ("KVM: x86: Make register state after reset conform to
      specification"), it wasn't actually broken until commit 7ff6c035
      ("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the
      order of operations such that "index" was checked before the significant
      flag.
      
      Avoid consuming uninitialized data by reverting to checking the flag
      before the index purely so that the fix can be easily backported; the
      offending RESET/INIT code has been refactored, moved, and consolidated
      from vendor code to common x86 since the bug was introduced.  A future
      patch will directly address the bad RESET/INIT behavior.
      
      The undefined behavior was detected by syzbot + KernelMemorySanitizer.
      
        BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68
        BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103
        BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
         cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline]
         kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline]
         kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183
         kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885
         kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
         vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534
         vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788
         kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020
      
        Local variable ----dummy@kvm_vcpu_reset created at:
         kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812
         kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923
      
      Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Fixes: 2a24be79 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping")
      Fixes: 7ff6c035 ("KVM: x86: Remove stateful CPUID handling")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarJim Mattson <jmattson@google.com>
      Message-Id: <20210929222426.1855730-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e8a747d0
    • Zelin Deng's avatar
      ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm · 773e89ab
      Zelin Deng authored
      hv_clock is preallocated to have only HVC_BOOT_ARRAY_SIZE (64) elements;
      if the PTP_SYS_OFFSET_PRECISE ioctl is executed on vCPUs whose index is
      64 of higher, retrieving the struct pvclock_vcpu_time_info pointer with
      "src = &hv_clock[cpu].pvti" will result in an out-of-bounds access and
      a wild pointer.  Change it to "this_cpu_pvti()" which is guaranteed to
      be valid.
      
      Fixes: 95a3d445 ("Switch kvmclock data to a PER_CPU variable")
      Signed-off-by: default avatarZelin Deng <zelin.deng@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Message-Id: <1632892429-101194-3-git-send-email-zelin.deng@linux.alibaba.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      773e89ab
    • Zelin Deng's avatar
      x86/kvmclock: Move this_cpu_pvti into kvmclock.h · ad9af930
      Zelin Deng authored
      There're other modules might use hv_clock_per_cpu variable like ptp_kvm,
      so move it into kvmclock.h and export the symbol to make it visiable to
      other modules.
      Signed-off-by: default avatarZelin Deng <zelin.deng@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ad9af930
  10. 28 Sep, 2021 2 commits
  11. 27 Sep, 2021 1 commit
    • Zhenzhong Duan's avatar
      KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue · 5c49d185
      Zhenzhong Duan authored
      When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry,
      clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i].
      Modifying guest_uret_msrs directly is completely broken as 'i' does not
      point at the MSR_IA32_TSX_CTRL entry.  In fact, it's guaranteed to be an
      out-of-bounds accesses as is always set to kvm_nr_uret_msrs in a prior
      loop. By sheer dumb luck, the fallout is limited to "only" failing to
      preserve the host's TSX_CTRL_CPUID_CLEAR.  The out-of-bounds access is
      benign as it's guaranteed to clear a bit in a guest MSR value, which are
      always zero at vCPU creation on both x86-64 and i386.
      
      Cc: stable@vger.kernel.org
      Fixes: 8ea8b8d6 ("KVM: VMX: Use common x86's uret MSR list as the one true list")
      Signed-off-by: default avatarZhenzhong Duan <zhenzhong.duan@intel.com>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5c49d185
  12. 24 Sep, 2021 3 commits
  13. 23 Sep, 2021 7 commits
  14. 22 Sep, 2021 9 commits