1. 24 Jun, 2022 19 commits
  2. 22 Jun, 2022 1 commit
    • Sean Christopherson's avatar
      KVM: selftests: Add MONITOR/MWAIT quirk test · 2325d4dd
      Sean Christopherson authored
      Add a test to verify the "MONITOR/MWAIT never fault" quirk, and as a
      bonus, also verify the related "MISC_ENABLES ignores ENABLE_MWAIT" quirk.
      
      If the "never fault" quirk is enabled, MONITOR/MWAIT should always be
      emulated as NOPs, even if they're reported as disabled in guest CPUID.
      Use the MISC_ENABLES quirk to coerce KVM into toggling the MWAIT CPUID
      enable, as KVM now disallows manually toggling CPUID bits after running
      the vCPU.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220608224516.3788274-6-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2325d4dd
  3. 20 Jun, 2022 20 commits
    • Sean Christopherson's avatar
      KVM: selftests: Use exception fixup for #UD/#GP Hyper-V MSR/hcall tests · cc5851c6
      Sean Christopherson authored
      Use exception fixup to verify VMCALL/RDMSR/WRMSR fault as expected in the
      Hyper-V Features test.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220608224516.3788274-5-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cc5851c6
    • Sean Christopherson's avatar
      KVM: selftests: Mostly fix broken Hyper-V Features test · 9f88d062
      Sean Christopherson authored
      Explicitly do all setup at every stage of the Hyper-V Features test, e.g.
      set the MSR/hypercall, enable capabilities, etc...  Now that the VM is
      recreated for every stage, values that are written into the VM's address
      space, i.e. shared with the guest, are reset between sub-tests, as are
      any capabilities, etc...
      
      Fix the hypercall params as well, which were broken in the same rework.
      The "hcall" struct/pointer needs to point at the hcall_params object, not
      the set of hypercall pages.
      
      The goofs were hidden by the test's dubious behavior of using '0' to
      signal "done", i.e. the MSR test ran exactly one sub-test, and the
      hypercall test was a gigantic nop.
      
      Fixes: 6c118643 ("KVM: selftests: Avoid KVM_SET_CPUID2 after KVM_RUN in hyperv_features test")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220608224516.3788274-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      9f88d062
    • Sean Christopherson's avatar
      KVM: selftests: Add x86-64 support for exception fixup · 3b23054c
      Sean Christopherson authored
      Add x86-64 support for exception fixup on single instructions, without
      forcing tests to install their own fault handlers.  Use registers r9-r11
      to flag the instruction as "safe" and pass fixup/vector information,
      i.e. introduce yet another flavor of fixup (versus the kernel's in-memory
      tables and KUT's per-CPU area) to take advantage of KVM sefltests being
      64-bit only.
      
      Using only registers avoids the need to allocate fixup tables, ensure
      FS or GS base is valid for the guest, ensure memory is mapped into the
      guest, etc..., and also reduces the potential for recursive faults due to
      accessing memory.
      
      Providing exception fixup trivializes tests that just want to verify that
      an instruction faults, e.g. no need to track start/end using global
      labels, no need to install a dedicated handler, etc...
      
      Deliberately do not support #DE in exception fixup so that the fixup glue
      doesn't need to account for a fault with vector == 0, i.e. the vector can
      also indicate that a fault occurred.  KVM injects #DE only for esoteric
      emulation scenarios, i.e. there's very, very little value in testing #DE.
      Force any test that wants to generate #DEs to install its own handler(s).
      
      Use kvm_pv_test as a guinea pig for the new fixup, as it has a very
      straightforward use case of wanting to verify that RDMSR and WRMSR fault.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220608224516.3788274-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3b23054c
    • Sean Christopherson's avatar
      KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior · bfbcc81b
      Sean Christopherson authored
      Add a quirk for KVM's behavior of emulating intercepted MONITOR/MWAIT
      instructions a NOPs regardless of whether or not they are supported in
      guest CPUID.  KVM's current behavior was likely motiviated by a certain
      fruity operating system that expects MONITOR/MWAIT to be supported
      unconditionally and blindly executes MONITOR/MWAIT without first checking
      CPUID.  And because KVM does NOT advertise MONITOR/MWAIT to userspace,
      that's effectively the default setup for any VMM that regurgitates
      KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID2.
      
      Note, this quirk interacts with KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT.  The
      behavior is actually desirable, as userspace VMMs that want to
      unconditionally hide MONITOR/MWAIT from the guest can leave the
      MISC_ENABLE quirk enabled.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220608224516.3788274-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bfbcc81b
    • Sean Christopherson's avatar
      KVM: x86: Ignore benign host writes to "unsupported" F15H_PERF_CTL MSRs · ff81a90f
      Sean Christopherson authored
      Ignore host userspace writes of '0' to F15H_PERF_CTL MSRs KVM reports
      in the MSR-to-save list, but the MSRs are ultimately unsupported.  All
      MSRs in said list must be writable by userspace, e.g. if userspace sends
      the list back at KVM without filtering out the MSRs it doesn't need.
      
      Note, reads of said MSRs already have the desired behavior.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220611005755.753273-8-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ff81a90f
    • Sean Christopherson's avatar
      KVM: x86: Ignore benign host accesses to "unsupported" PEBS and BTS MSRs · 157fc497
      Sean Christopherson authored
      Ignore host userspace reads and writes of '0' to PEBS and BTS MSRs that
      KVM reports in the MSR-to-save list, but the MSRs are ultimately
      unsupported.  All MSRs in said list must be writable by userspace, e.g.
      if userspace sends the list back at KVM without filtering out the MSRs it
      doesn't need.
      
      Fixes: 8183a538 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS")
      Fixes: 902caeb6 ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS")
      Fixes: c59a1f10 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220611005755.753273-7-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      157fc497
    • Sean Christopherson's avatar
      KVM: VMX: Use vcpu_get_perf_capabilities() to get guest-visible value · 3f7999b9
      Sean Christopherson authored
      Use vcpu_get_perf_capabilities() when querying MSR_IA32_PERF_CAPABILITIES
      from the guest's perspective, e.g. to update the vPMU and to determine
      which MSRs exist.  If userspace ignores MSR_IA32_PERF_CAPABILITIES but
      clear X86_FEATURE_PDCM, the guest should see '0'.
      
      Fixes: 902caeb6 ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS")
      Fixes: c59a1f10 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220611005755.753273-6-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      3f7999b9
    • Sean Christopherson's avatar
      Revert "KVM: x86: always allow host-initiated writes to PMU MSRs" · 545feb96
      Sean Christopherson authored
      Revert the hack to allow host-initiated accesses to all "PMU" MSRs,
      as intel_is_valid_msr() returns true for _all_ MSRs, regardless of whether
      or not it has a snowball's chance in hell of actually being a PMU MSR.
      
      That mostly gets papered over by the actual get/set helpers only handling
      MSRs that they knows about, except there's the minor detail that
      kvm_pmu_{g,s}et_msr() eat reads and writes when the PMU is disabled.
      I.e. KVM will happy allow reads and writes to _any_ MSR if the PMU is
      disabled, either via module param or capability.
      
      This reverts commit d1c88a40.
      
      Fixes: d1c88a40 ("KVM: x86: always allow host-initiated writes to PMU MSRs")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220611005755.753273-5-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      545feb96
    • Sean Christopherson's avatar
      Revert "KVM: x86/pmu: Accept 0 for absent PMU MSRs when host-initiated if !enable_pmu" · 5d4283df
      Sean Christopherson authored
      Eating reads and writes to all "PMU" MSRs when there is no PMU is wildly
      broken as it results in allowing accesses to _any_ MSR on Intel CPUs
      as intel_is_valid_msr() returns true for all host_initiated accesses.
      
      A revert of commit d1c88a40 ("KVM: x86: always allow host-initiated
      writes to PMU MSRs") will soon follow.
      
      This reverts commit 8e6a58e2.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220611005755.753273-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5d4283df
    • Sean Christopherson's avatar
      KVM: VMX: Give host userspace full control of MSR_IA32_PERF_CAPABILITIES · 0f4a7185
      Sean Christopherson authored
      Do not clear manipulate MSR_IA32_PERF_CAPABILITIES in intel_pmu_refresh(),
      i.e. give userspace full control over capability/read-only MSRs.  KVM is
      not a babysitter, it is userspace's responsiblity to provide a valid and
      coherent vCPU model.
      
      Attempting to "help" the guest by forcing a consistent model creates edge
      cases, and ironicially leads to inconsistent behavior.
      
      Example #1:  KVM doesn't do intel_pmu_refresh() when userspace writes
      the MSR.
      
      Example #2: KVM doesn't clear the bits when the PMU is disabled, or when
      there's no architectural PMU.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220611005755.753273-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      0f4a7185
    • Sean Christopherson's avatar
      KVM: x86: Give host userspace full control of MSR_IA32_MISC_ENABLES · 9fc22296
      Sean Christopherson authored
      Give userspace full control of the read-only bits in MISC_ENABLES, i.e.
      do not modify bits on PMU refresh and do not preserve existing bits when
      userspace writes MISC_ENABLES.  With a few exceptions where KVM doesn't
      expose the necessary controls to userspace _and_ there is a clear cut
      association with CPUID, e.g. reserved CR4 bits, KVM does not own the vCPU
      and should not manipulate the vCPU model on behalf of "dummy user space".
      
      The argument that KVM is doing userspace a favor because "the order of
      setting vPMU capabilities and MSR_IA32_MISC_ENABLE is not strictly
      guaranteed" is specious, as attempting to configure MSRs on behalf of
      userspace inevitably leads to edge cases precisely because KVM does not
      prescribe a specific order of initialization.
      
      Example #1: intel_pmu_refresh() consumes and modifies the vCPU's
      MSR_IA32_PERF_CAPABILITIES, and so assumes userspace initializes config
      MSRs before setting the guest CPUID model.  If userspace sets CPUID
      first, then KVM will mark PEBS as available when arch.perf_capabilities
      is initialized with a non-zero PEBS format, thus creating a bad vCPU
      model if userspace later disables PEBS by writing PERF_CAPABILITIES.
      
      Example #2: intel_pmu_refresh() does not clear PERF_CAP_PEBS_MASK in
      MSR_IA32_PERF_CAPABILITIES if there is no vPMU, making KVM inconsistent
      in its desire to be consistent.
      
      Example #3: intel_pmu_refresh() does not clear MSR_IA32_MISC_ENABLE_EMON
      if KVM_SET_CPUID2 is called multiple times, first with a vPMU, then
      without a vPMU.  While slightly contrived, it's plausible a VMM could
      reflect KVM's default vCPU and then operate on KVM's copy of CPUID to
      later clear the vPMU settings, e.g. see KVM's selftests.
      
      Example #4: Enumerating an Intel vCPU on an AMD host will not call into
      intel_pmu_refresh() at any point, and so the BTS and PEBS "unavailable"
      bits will be left clear, without any way for userspace to set them.
      
      Keep the "R" behavior of the bit 7, "EMON available", for the guest.
      Unlike the BTS and PEBS bits, which are fully "RO", the EMON bit can be
      written with a different value, but that new value is ignored.
      
      Cc: Like Xu <likexu@tencent.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reported-by: default avatarkernel test robot <oliver.sang@intel.com>
      Message-Id: <20220611005755.753273-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      9fc22296
    • Dongliang Mu's avatar
      x86: kvm: remove NULL check before kfree · e20918f6
      Dongliang Mu authored
      kfree can handle NULL pointer as its argument.
      According to coccinelle isnullfree check, remove NULL check
      before kfree operation.
      Signed-off-by: default avatarDongliang Mu <mudongliangabcd@gmail.com>
      Message-Id: <20220614133458.147314-1-dzm91@hust.edu.cn>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e20918f6
    • Sean Christopherson's avatar
      KVM: Do not zero initialize 'pfn' in hva_to_pfn() · 943dfea8
      Sean Christopherson authored
      Drop the unnecessary initialization of the local 'pfn' variable in
      hva_to_pfn().  First and foremost, '0' is not an invalid pfn, it's a
      perfectly valid pfn on most architectures.  I.e. if hva_to_pfn() were to
      return an "uninitializd" pfn, it would actually be interpeted as a legal
      pfn by most callers.
      
      Second, hva_to_pfn() can't return an uninitialized pfn as hva_to_pfn()
      explicitly sets pfn to an error value (or returns an error value directly)
      if a helper returns failure, and all helpers set the pfn on success.
      
      The zeroing of 'pfn' was introduced by commit 2fc84311 ("KVM:
      reorganize hva_to_pfn"), probably to avoid "uninitialized variable"
      warnings on statements that return pfn.  However, no compiler seems
      to produce them, making the initialization unnecessary.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      943dfea8
    • Sean Christopherson's avatar
      KVM: x86/mmu: Shove refcounted page dependency into host_pfn_mapping_level() · 5d49f08c
      Sean Christopherson authored
      Move the check that restricts mapping huge pages into the guest to pfns
      that are backed by refcounted 'struct page' memory into the helper that
      actually "requires" a 'struct page', host_pfn_mapping_level().  In
      addition to deduplicating code, moving the check to the helper eliminates
      the subtle requirement that the caller check that the incoming pfn is
      backed by a refcounted struct page, and as an added bonus avoids an extra
      pfn_to_page() lookup.
      
      Note, the is_error_noslot_pfn() check in kvm_mmu_hugepage_adjust() needs
      to stay where it is, as it guards against dereferencing a NULL memslot in
      the kvm_slot_dirty_track_enabled() that follows.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-11-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5d49f08c
    • Sean Christopherson's avatar
      KVM: Rename/refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page() · b14b2690
      Sean Christopherson authored
      Rename and refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page()
      to better reflect what KVM is actually checking, and to eliminate extra
      pfn_to_page() lookups.  The kvm_release_pfn_*() an kvm_try_get_pfn()
      helpers in particular benefit from "refouncted" nomenclature, as it's not
      all that obvious why KVM needs to get/put refcounts for some PG_reserved
      pages (ZERO_PAGE and ZONE_DEVICE).
      
      Add a comment to call out that the list of exceptions to PG_reserved is
      all but guaranteed to be incomplete.  The list has mostly been compiled
      by people throwing noodles at KVM and finding out they stick a little too
      well, e.g. the ZERO_PAGE's refcount overflowed and ZONE_DEVICE pages
      didn't get freed.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-10-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b14b2690
    • Sean Christopherson's avatar
      KVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page() · 284dc493
      Sean Christopherson authored
      Operate on a 'struct page' instead of a pfn when checking if a page is a
      ZONE_DEVICE page, and rename the helper accordingly.  Generally speaking,
      KVM doesn't actually care about ZONE_DEVICE memory, i.e. shouldn't do
      anything special for ZONE_DEVICE memory.  Rather, KVM wants to treat
      ZONE_DEVICE memory like regular memory, and the need to identify
      ZONE_DEVICE memory only arises as an exception to PG_reserved pages. In
      other words, KVM should only ever check for ZONE_DEVICE memory after KVM
      has already verified that there is a struct page associated with the pfn.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-9-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      284dc493
    • Sean Christopherson's avatar
      KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page() · b1624f99
      Sean Christopherson authored
      Drop helpers to convert a gfn/gpa to a 'struct page' in the context of a
      vCPU.  KVM doesn't require that guests be backed by 'struct page' memory,
      thus any use of helpers that assume 'struct page' is bound to be flawed,
      as was the case for the recently removed last user in x86's nested VMX.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-8-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b1624f99
    • Sean Christopherson's avatar
      KVM: Don't WARN if kvm_pfn_to_page() encounters a "reserved" pfn · 6573a691
      Sean Christopherson authored
      Drop a WARN_ON() if kvm_pfn_to_page() encounters a "reserved" pfn, which
      in this context means a struct page that has PG_reserved but is not a/the
      ZERO_PAGE and is not a ZONE_DEVICE page.  The usage, via gfn_to_page(),
      in x86 is safe as gfn_to_page() is used only to retrieve a page from
      KVM-controlled memslot, but the usage in PPC and s390 operates on
      arbitrary gfns and thus memslots that can be backed by incompatible
      memory.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-7-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6573a691
    • Sean Christopherson's avatar
      KVM: nVMX: Use kvm_vcpu_map() to get/pin vmcs12's APIC-access page · fe1911aa
      Sean Christopherson authored
      Use kvm_vcpu_map() to get/pin the backing for vmcs12's APIC-access page,
      there's no reason it has to be restricted to 'struct page' backing.  The
      APIC-access page actually doesn't need to be backed by anything, which is
      ironically why it got left behind by the series which introduced
      kvm_vcpu_map()[1]; the plan was to shove a dummy pfn into vmcs02[2], but
      that code never got merged.
      
      Switching the APIC-access page to kvm_vcpu_map() doesn't preclude using a
      magic pfn in the future, and will allow a future patch to drop
      kvm_vcpu_gpa_to_page().
      
      [1] https://lore.kernel.org/all/1547026933-31226-1-git-send-email-karahmed@amazon.de
      [2] https://lore.kernel.org/lkml/1543845551-4403-1-git-send-email-karahmed@amazon.deSigned-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-6-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fe1911aa
    • Sean Christopherson's avatar
      KVM: Avoid pfn_to_page() and vice versa when releasing pages · 8e1c6914
      Sean Christopherson authored
      Invert the order of KVM's page/pfn release helpers so that the "inner"
      helper operates on a page instead of a pfn.  As pointed out by Linus[*],
      converting between struct page and a pfn isn't necessarily cheap, and
      that's not even counting the overhead of is_error_noslot_pfn() and
      kvm_is_reserved_pfn().  Even if the checks were dirt cheap, there's no
      reason to convert from a page to a pfn and back to a page, just to mark
      the page dirty/accessed or to put a reference to the page.
      
      Opportunistically drop a stale declaration of kvm_set_page_accessed()
      from kvm_host.h (there was no implementation).
      
      No functional change intended.
      
      [*] https://lore.kernel.org/all/CAHk-=wifQimj2d6npq-wCi5onYPjzQg4vyO4tFcPJJZr268cRw@mail.gmail.comSigned-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-5-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8e1c6914