1. 29 Mar, 2022 1 commit
    • KVM: Prevent module exit until all VMs are freed · 5f6de5cb
      David Matlack authored
      Tie the lifetime of the KVM module to the lifetime of each VM via
      kvm.users_count. This way anything that grabs a reference to the VM via
      kvm_get_kvm() cannot accidentally outlive the KVM module.
      
      Prior to this commit, the lifetime of the KVM module was tied to the
      lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
      file descriptors by their respective file_operations "owner" field.
      This approach is insufficient because references grabbed via
      kvm_get_kvm() do not prevent closing any of the aforementioned file
      descriptors.
      
      This fixes a long-standing theoretical bug in KVM that at least affects
      async page faults. kvm_setup_async_pf() grabs a reference via
      kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
      prevents the VM file descriptor from being closed and the KVM module
      from being unloaded before this callback runs.
      
      Fixes: af585b92 ("KVM: Halt vcpu if page it tries to access is swapped out")
      Fixes: 3d3aab1b ("KVM: set owner of cpu and vm file operations")
      Cc: stable@vger.kernel.org
      Suggested-by: Ben Gardon <bgardon@google.com>
      [ Based on a patch from Ben implemented for Google's kernel. ]
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220303183328.1499189-2-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
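The lifetime rule described in this commit can be modelled in plain userspace C. The following is a minimal sketch, not the kernel code: `module_model`, `vm_model` and every function name are hypothetical stand-ins, and only the idea (the VM holds a module reference until its last user is gone) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel module's reference count. */
struct module_model {
    int refcount;
    bool unloading;
};

/* Models a "try get": fails once unloading has started. */
static bool model_try_get(struct module_model *m)
{
    if (m->unloading)
        return false;
    m->refcount++;
    return true;
}

static void model_put(struct module_model *m)
{
    assert(m->refcount > 0);
    m->refcount--;
}

/* The module may only go away when nothing references it. */
static bool model_can_unload(const struct module_model *m)
{
    return m->refcount == 0;
}

/* Hypothetical VM object: users_count plays the role of kvm.users_count. */
struct vm_model {
    int users_count;
    struct module_model *owner;
};

/* Creating a VM pins the module for as long as the VM exists. */
static bool vm_create(struct vm_model *vm, struct module_model *owner)
{
    if (!model_try_get(owner))
        return false;
    vm->users_count = 1;
    vm->owner = owner;
    return true;
}

/* Anything that needs the VM later (e.g. async work) takes a reference. */
static void vm_get(struct vm_model *vm)
{
    vm->users_count++;
}

/* The module reference is dropped only with the last VM reference. */
static void vm_put(struct vm_model *vm)
{
    assert(vm->users_count > 0);
    if (--vm->users_count == 0)
        model_put(vm->owner);
}
```

In this model, a deferred callback that called `vm_get()` keeps `model_can_unload()` false even after the file descriptor's reference has been dropped with `vm_put()`, which is the window the commit closes.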
  2. 21 Mar, 2022 8 commits
  3. 18 Mar, 2022 3 commits
  4. 15 Mar, 2022 2 commits
  5. 14 Mar, 2022 6 commits
  6. 11 Mar, 2022 9 commits
  7. 09 Mar, 2022 4 commits
  8. 08 Mar, 2022 7 commits
    • KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255 · 4a204f78
      Suravee Suthikulpanit authored
      Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined
      by the architecture.  The number of bits consumed by hardware is model
      specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM
      to enumerate the "true" size.  So, KVM must allow using all bits, else it
      risks rejecting completely legal x2APIC IDs on newer CPUs.
      
      This means KVM relies on hardware to not assign x2APIC IDs that exceed the
      "true" width of the field, but presumably hardware is smart enough to tie
      the width to the max x2APIC ID.  KVM also relies on hardware to support at
      least 8 bits, as the legacy xAPIC ID is writable by software.  But, those
      assumptions are unavoidable due to the lack of any way to enumerate the
      "true" width.
      
      Cc: stable@vger.kernel.org
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support")
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Message-Id: <20220211000851.185799-1-suravee.suthikulpanit@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
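The effect of widening the mask from 8 to 12 bits can be illustrated with a small check. This is a sketch under stated assumptions: the macro and function names are hypothetical, and only the two field widths (bits 7:0 before, bits 11:0 after) are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the widths come from the commit text. */
#define HOST_PHYS_ID_MASK_OLD 0xFFULL   /* bits 7:0  */
#define HOST_PHYS_ID_MASK_NEW 0xFFFULL  /* bits 11:0 */

/* True if the APIC ID can be stored in the field without losing bits. */
static bool host_id_fits(uint64_t apic_id, uint64_t mask)
{
    return (apic_id & ~mask) == 0;
}
```

With the old mask an ID of 256 does not fit, while the 12-bit mask accepts every ID up to 4095.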
    • KVM: selftests: Add test to populate a VM with the max possible guest mem · b58c55d5
      Sean Christopherson authored
      Add a selftest that enables populating a VM with the maximum amount of
      guest memory allowed by the underlying architecture.  Abuse KVM's
      memslots by mapping a single host memory region into multiple memslots so
      that the selftest doesn't require a system with terabytes of RAM.
      
      Default to 512gb of guest memory, which isn't all that interesting, but
      should work on all MMUs and doesn't take an exorbitant amount of memory
      or time.  E.g. testing with ~64tb of guest memory takes the better part
      of an hour, and requires 200gb of memory for KVM's page tables when using
      4kb pages.
      
      To inflict maximum abuse on KVM's MMU, default to 4kb pages (or whatever
      the not-hugepage size is) in the backing store (memfd).  Use memfd for
      the host backing store to ensure that hugepages are guaranteed when
      requested, and to give the user explicit control of the size of hugepage
      being tested.
      
      By default, spin up as many vCPUs as there are available to the selftest,
      and distribute the work of dirtying each 4kb chunk of memory across all
      vCPUs.  Dirtying guest memory forces KVM to populate its page tables, and
      also forces KVM to write back accessed/dirty information to struct page
      when the guest memory is freed.
      
      On x86, perform two passes with a MMU context reset between each pass to
      coerce KVM into dropping all references to the MMU root, e.g. to emulate
      a vCPU dropping the last reference.  Perform both passes and all
      rendezvous on all architectures in the hope that arm64 and s390x can gain
      similar shenanigans in the future.
      
      Measure and report the duration of each operation, which is helpful not
      only to verify the test is working as intended, but also to easily
      evaluate the performance differences between different page sizes.
      
      Provide command line options to limit the amount of guest memory, set the
      size of each slot (i.e. of the host memory region), set the number of
      vCPUs, and to enable usage of hugepages.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220226001546.360188-29-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
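The memslot aliasing trick described in this commit amounts to giving each slot its own guest physical range while pointing all of them at one host region. The sketch below only computes such a layout and makes no KVM calls. `struct slot_desc` mirrors the field names of `struct kvm_userspace_memory_region` from the KVM UAPI, but the struct itself, `layout_aliased_slots()` and its parameters are hypothetical and are not the selftest's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the memslot fields, used here for illustration only. */
struct slot_desc {
    uint32_t slot;
    uint32_t flags;
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
};

/*
 * Fill 'out' with slots that cover [gpa_base, gpa_base + guest_size) in
 * slot_size steps. Every slot uses the same host address, so the guest
 * sees guest_size bytes while the host backs only slot_size bytes.
 * Returns the number of slots written.
 */
static size_t layout_aliased_slots(struct slot_desc *out, size_t max_slots,
                                   uint32_t first_slot, uint64_t gpa_base,
                                   uint64_t guest_size, uint64_t slot_size,
                                   uint64_t host_addr)
{
    size_t n = 0;
    uint64_t off;

    if (slot_size == 0)
        return 0;

    for (off = 0; off + slot_size <= guest_size && n < max_slots;
         off += slot_size, n++) {
        out[n].slot = first_slot + (uint32_t)n;
        out[n].flags = 0;
        out[n].guest_phys_addr = gpa_base + off;
        out[n].memory_size = slot_size;
        out[n].userspace_addr = host_addr; /* same backing for all slots */
    }
    return n;
}
```

For example, a 4 GiB guest range with a 1 GiB slot size yields four slots with distinct guest physical addresses and one shared host address.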
    • KVM: selftests: Define cpu_relax() helpers for s390 and x86 · 17ae5ebc
      Sean Christopherson authored
      Add cpu_relax() for s390 and x86 for use in arch-agnostic tests.  arm64
      already defines its own version.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220226001546.360188-28-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
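A helper of this kind is typically a spin-wait hint plus a compiler barrier. The following is a sketch that assumes GCC or Clang inline assembly; `cpu_relax_sketch()` and `spin_until_set()` are hypothetical names, and the bodies are not copied from the selftests.

```c
#include <assert.h>

/* Spin-wait hint: PAUSE on x86, a plain compiler barrier elsewhere. */
static inline void cpu_relax_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");
#endif
}

/*
 * Poll 'flag' until it becomes non-zero or max_spins polls have elapsed.
 * Returns the number of relax calls that were made.
 */
static int spin_until_set(const volatile int *flag, int max_spins)
{
    int spins = 0;

    while (!*flag && spins < max_spins) {
        cpu_relax_sketch();
        spins++;
    }
    return spins;
}
```

An arch-agnostic test can then poll shared state with `spin_until_set()` without an `#ifdef` at each call site.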
    • KVM: selftests: Split out helper to allocate guest mem via memfd · a4187c9b
      Sean Christopherson authored
      Extract the code for allocating guest memory via memfd out of
      vm_userspace_mem_region_add() and into a new helper, kvm_memfd_alloc().
      A future selftest to populate a guest with the maximum amount of guest
      memory will abuse KVM's memslots to alias guest memory regions to a
      single memfd-backed host region, i.e. needs to back a guest with memfd
      memory without a 1:1 association between a memslot and a memfd instance.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220226001546.360188-27-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Move raw KVM_SET_USER_MEMORY_REGION helper to utils · 3d7d6043
      Sean Christopherson authored
      Move set_memory_region_test's KVM_SET_USER_MEMORY_REGION helper to KVM's
      utils so that it can be used by other tests.  Provide a raw version as
      well as an assert-success version to reduce the amount of boilerplate
      code needed for basic usage.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220226001546.360188-26-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: WARN on any attempt to atomically update REMOVED SPTE · 396fd74d
      Sean Christopherson authored
      Disallow calling tdp_mmu_set_spte_atomic() with a REMOVED "old" SPTE.
      This solves a conundrum introduced by commit 3255530a ("KVM: x86/mmu:
      Automatically update iter->old_spte if cmpxchg fails"); if the helper
      doesn't update old_spte in the REMOVED case, then theoretically the
      caller could get stuck in an infinite loop as it will fail indefinitely
      on the REMOVED SPTE.  E.g. until recently, clear_dirty_gfn_range() didn't
      check for a present SPTE and would have spun until getting rescheduled.
      
      In practice, only the page fault path should "create" a new SPTE, all
      other paths should only operate on existing, a.k.a. shadow present,
      SPTEs.  Now that the page fault path pre-checks for a REMOVED SPTE in all
      cases, require all other paths to indirectly pre-check by verifying the
      target SPTE is a shadow-present SPTE.
      
      Note, this does not guarantee the actual SPTE isn't REMOVED, nor is that
      scenario disallowed.  The invariant is only that the caller mustn't
      invoke tdp_mmu_set_spte_atomic() if the SPTE was REMOVED when last
      observed by the caller.
      
      Cc: David Matlack <dmatlack@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220226001546.360188-25-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
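The invariant in this commit can be illustrated with C11 atomics, where a failed compare-exchange refreshes the caller's expected value, much like the "update old_spte if cmpxchg fails" behaviour the message refers to. This is a userspace sketch, not the TDP MMU code: the sentinel value, the present-bit test, the return codes and every name ending in `_sketch` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define REMOVED_SPTE_SKETCH 0x5a0ULL /* placeholder sentinel for this sketch */
#define PRESENT_BIT_SKETCH  0x1ULL   /* hypothetical "shadow-present" bit */

struct iter_sketch {
    _Atomic uint64_t *sptep;
    uint64_t old_spte; /* value last observed by the caller */
};

static bool is_present_sketch(uint64_t spte)
{
    return spte != REMOVED_SPTE_SKETCH && (spte & PRESENT_BIT_SKETCH);
}

/*
 * Returns 0 on success, -1 if the entry changed underneath the caller
 * (old_spte is refreshed in that case), and -2 if the caller violated
 * the invariant by passing a REMOVED old value (the kernel would WARN).
 */
static int set_spte_atomic_sketch(struct iter_sketch *it, uint64_t new_spte)
{
    if (it->old_spte == REMOVED_SPTE_SKETCH)
        return -2;
    if (!atomic_compare_exchange_strong(it->sptep, &it->old_spte, new_spte))
        return -1;
    return 0;
}

/*
 * Caller-side pattern: pre-check that the last observed value is present
 * before every attempt, so a REMOVED entry ends the loop instead of
 * causing endless retries.
 */
static bool clear_bits_if_present(struct iter_sketch *it, uint64_t bits)
{
    for (;;) {
        if (!is_present_sketch(it->old_spte))
            return false;
        if (set_spte_atomic_sketch(it, it->old_spte & ~bits) == 0)
            return true;
        /* old_spte now holds the current value; re-check and retry. */
    }
}
```

If another thread has replaced the entry with the sentinel, the first failed exchange loads the sentinel into `old_spte` and the pre-check then exits the loop, which is the behaviour the commit requires of callers other than the page fault path.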
    • KVM: x86/mmu: Check for a REMOVED leaf SPTE before making the SPTE · 58298b06
      Sean Christopherson authored
      Explicitly check for a REMOVED leaf SPTE prior to attempting to map
      the final SPTE when handling a TDP MMU fault.  Functionally, this is a
      nop as tdp_mmu_set_spte_atomic() will eventually detect the frozen SPTE.
      Pre-checking for a REMOVED SPTE is a minor optimization, but the real goal
      is to allow tdp_mmu_set_spte_atomic() to have an invariant that the "old"
      SPTE is never a REMOVED SPTE.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20220226001546.360188-24-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>