1. 01 Dec, 2023 2 commits
    • Revert "KVM: Prevent module exit until all VMs are freed" · ea61294b
      Sean Christopherson authored
      Revert KVM's misguided attempt to "fix" a use-after-module-unload bug that
      was actually due to failure to flush a workqueue, not a lack of module
      refcounting.  Pinning the KVM module until kvm_destroy_vm() doesn't
      prevent use-after-free due to the module being unloaded, as userspace can
      invoke delete_module() the instant the last reference to KVM is put, i.e.
      can cause all KVM code to be unmapped while KVM is actively executing said
      code.
      
      Generally speaking, the many instances of module_put(THIS_MODULE)
      notwithstanding, outside of a few special paths, a module can never safely
      put the last reference to itself without creating deadlock, i.e. something
      external to the module *must* put the last reference.  In other words,
      having VMs grab a reference to the KVM module is futile, pointless, and as
      evidenced by the now-reverted commit 70375c2d ("Revert "KVM: set owner
      of cpu and vm file operations""), actively dangerous.
      
      This reverts commit 405294f2 and commit 5f6de5cb.
      
      Fixes: 405294f2 ("KVM: Unconditionally get a ref to /dev/kvm module when creating a VM")
      Fixes: 5f6de5cb ("KVM: Prevent module exit until all VMs are freed")
      Link: https://lore.kernel.org/r/20231018204624.1905300-4-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: Set file_operations.owner appropriately for all such structures · 087e1520
      Sean Christopherson authored
      Set .owner for all KVM-owned file types so that the KVM module is pinned
      until any files with callbacks back into KVM are completely freed.  Using
      "struct kvm" as a proxy for the module, i.e. keeping KVM-the-module alive
      while there are active VMs, doesn't provide full protection.
      
      Userspace can invoke delete_module() the instant the last reference to KVM
      is put.  If KVM itself puts the last reference, e.g. via kvm_destroy_vm(),
      then it's possible for KVM to be preempted and deleted/unloaded before KVM
      fully exits, e.g. when the task running kvm_destroy_vm() is scheduled back
      in, it will jump to a code page that is no longer mapped.
      
      Note, file types that can call into sub-module code, e.g. kvm-intel.ko or
      kvm-amd.ko on x86, must use the module pointer passed to kvm_init(), not
      THIS_MODULE (which points at kvm.ko).  KVM assumes that if /dev/kvm is
      reachable, e.g. VMs are active, then the vendor module is loaded.
      
      To reduce the probability of forgetting to set .owner entirely, use
      THIS_MODULE for stats files where KVM does not call back into vendor code.
      
      This reverts commit 70375c2d, and fixes
      several other file types that have been buggy since their introduction.
      
      Fixes: 70375c2d ("Revert "KVM: set owner of cpu and vm file operations"")
      Fixes: 3bcd0662 ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file")
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Link: https://lore.kernel.org/all/20231010003746.GN800259@ZenIV
      Link: https://lore.kernel.org/r/20231018204624.1905300-2-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
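The pinning argument in the commit above can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `toy_*` name is hypothetical and invented for illustration. It assumes the behaviour the commit relies on, namely that the VFS takes a reference on `file_operations.owner` when a file is opened and drops it from outside the module when the file is finally released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct module: only a refcount. */
struct toy_module {
    const char *name;
    int refcount;
};

/* Hypothetical stand-in for struct file_operations: .owner is the module to pin. */
struct toy_file_operations {
    struct toy_module *owner;
};

struct toy_file {
    const struct toy_file_operations *f_op;
    bool open;
};

/* Opening a file pins the owning module, if .owner is set. */
static void toy_open(struct toy_file *f, const struct toy_file_operations *fops)
{
    f->f_op = fops;
    f->open = true;
    if (fops->owner)
        fops->owner->refcount++;
}

/* The final release drops the pin; this runs in "VFS" code, not module code. */
static void toy_release(struct toy_file *f)
{
    assert(f->open);
    if (f->f_op->owner)
        f->f_op->owner->refcount--;
    f->open = false;
}

/* delete_module() analogue: unloading is allowed only with no references. */
static bool toy_can_unload(const struct toy_module *m)
{
    return m->refcount == 0;
}
```

In this model, a file whose `owner` is NULL leaves the module unloadable while the file is still open, which corresponds to the bug class the commit fixes. Setting `owner` keeps the module pinned until the release path, which lives outside the module, has run.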
  2. 29 Nov, 2023 1 commit
    • KVM: x86: Get CPL directly when checking if loaded vCPU is in kernel mode · 547c9192
      Like Xu authored
      When querying whether or not a vCPU "is" running in kernel mode, directly
      get the CPL if the vCPU is the currently loaded vCPU.  In scenarios where
      a guest is profiled via perf-kvm, querying vcpu->arch.preempted_in_kernel
      from kvm_guest_state() is wrong if the vCPU is actively running, i.e. isn't
      scheduled out due to being preempted, and so preempted_in_kernel is stale.
      
      This affects perf/core's ability to accurately tag guest RIP with
      PERF_RECORD_MISC_GUEST_{KERNEL|USER} and record it in the sample.  This
      causes perf/tool to fail to connect the vCPU RIPs to the guest kernel
      space symbols when parsing these samples due to incorrect PERF_RECORD_MISC
      flags:
      
         Before (perf-report of a cpu-cycles sample):
            1.23%  :58945   [unknown]         [u] 0xffffffff818012e0
      
         After:
            1.35%  :60703   [kernel.vmlinux]  [g] asm_exc_page_fault
      
      Note, checking preempted_in_kernel in kvm_arch_vcpu_in_kernel() is awful
      as nothing in the API suggests that it's safe to use if and only if the
      vCPU was preempted.  That can be cleaned up in the future; for now, just
      fix the glaring correctness bug.
      
      Note #2, checking vcpu->preempted is NOT safe, as getting the CPL on VMX
      requires VMREAD, i.e. is correct if and only if the vCPU is loaded.  If
      the target vCPU *was* preempted, then it can be scheduled back in after
      the check on vcpu->preempted in kvm_vcpu_on_spin(), i.e. KVM could end up
      trying to do VMREAD on a VMCS that isn't loaded on the current pCPU.
      
      Signed-off-by: Like Xu <likexu@tencent.com>
      Fixes: e1bfc245 ("KVM: Move x86's perf guest info callbacks to generic KVM")
      Link: https://lore.kernel.org/r/20231123075818.12521-1-likexu@tencent.com
      [sean: massage changelog, add Fixes]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
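The decision described in the commit above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the actual patch: the `toy_*` names are hypothetical, the "currently loaded vCPU" is represented by a plain pointer, and the CPL is a stored field rather than something read from hardware state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-vCPU state that matters for this check. */
struct toy_vcpu {
    int cpl;                  /* live privilege level, valid only while loaded */
    bool preempted_in_kernel; /* snapshot taken when the vCPU was preempted */
};

/* Hypothetical stand-in for "the vCPU currently loaded on this pCPU". */
static const struct toy_vcpu *toy_running_vcpu;

/*
 * If the vCPU is the loaded one, query the live CPL (0 means kernel mode).
 * Otherwise fall back to the snapshot, which is meaningful only if the
 * vCPU really was preempted.
 */
static bool toy_vcpu_in_kernel(const struct toy_vcpu *vcpu)
{
    if (vcpu != toy_running_vcpu)
        return vcpu->preempted_in_kernel;

    return vcpu->cpl == 0;
}
```

With a vCPU whose `cpl` is 0 but whose `preempted_in_kernel` snapshot is still false, the function reports kernel mode only when that vCPU is the loaded one. That is the stale-snapshot case the commit describes for perf sampling of a running guest.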
  3. 30 Oct, 2023 1 commit
  4. 28 Oct, 2023 15 commits
  5. 27 Oct, 2023 15 commits
  6. 26 Oct, 2023 6 commits