1. 23 May, 2024 1 commit
  2. 15 May, 2024 1 commit
  3. 12 May, 2024 7 commits
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-misc-6.10' of https://github.com/kvm-x86/linux into HEAD · 7d41e24d
      Paolo Bonzini authored
      KVM x86 misc changes for 6.10:
      
       - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which
         is unused by hardware, so that KVM can communicate its inability to map GPAs
         that set bits 51:48 due to lack of 5-level paging.  Guest firmware is
         expected to use the information to safely remap BARs in the uppermost GPA
         space, i.e to avoid placing a BAR at a legal, but unmappable, GPA.
      
       - Use vfree() instead of kvfree() for allocations that always use vcalloc()
         or __vcalloc().
      
       - Don't completely ignore same-value writes to immutable feature MSRs, as
         doing so results in KVM failing to reject accesses to MSR that aren't
         supposed to exist given the vCPU model and/or KVM configuration.
      
       - Don't mark APICv as being inhibited due to ABSENT if APICv is disabled
         KVM-wide to avoid confusing debuggers (KVM will never bother clearing the
         ABSENT inhibit, even if userspace enables in-kernel local APIC).
      7d41e24d
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-mmu-6.10' of https://github.com/kvm-x86/linux into HEAD · 5a1c72e0
      Paolo Bonzini authored
      KVM x86 MMU changes for 6.10:
      
       - Process TDP MMU SPTEs that are are zapped while holding mmu_lock for read
         after replacing REMOVED_SPTE with '0' and flushing remote TLBs, which allows
         vCPU tasks to repopulate the zapped region while the zapper finishes tearing
         down the old, defunct page tables.
      
       - Fix a longstanding, likely benign-in-practice race where KVM could fail to
         detect a write from kvm_mmu_track_write() to a shadowed GPTE if the GPTE is
         first page table being shadowed.
      5a1c72e0
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-selftests_utils-6.10' of https://github.com/kvm-x86/linux into HEAD · dee7ea42
      Paolo Bonzini authored
      KVM selftests treewide updates for 6.10:
      
       - Define _GNU_SOURCE for all selftests to fix a warning that was introduced by
         a change to kselftest_harness.h late in the 6.9 cycle, and because forcing
         every test to #define _GNU_SOURCE is painful.
      
       - Provide a global psuedo-RNG instance for all tests, so that library code can
         generate random, but determinstic numbers.
      
       - Use the global pRNG to randomly force emulation of select writes from guest
         code on x86, e.g. to help validate KVM's emulation of locked accesses.
      
       - Rename kvm_util_base.h back to kvm_util.h, as the weird layer of indirection
         was added purely to avoid manually #including ucall_common.h in a handful of
         locations.
      
       - Allocate and initialize x86's GDT, IDT, TSS, segments, and default exception
         handlers at VM creation, instead of forcing tests to manually trigger the
         related setup.
      dee7ea42
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-vmx-6.10' of https://github.com/kvm-x86/linux into HEAD · 31a6cd7f
      Paolo Bonzini authored
      KVM VMX changes for 6.10:
      
       - Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to
         L1, as per the SDM.
      
       - Move kvm_vcpu_arch's exit_qualification into x86_exception, as the field is
         used only when synthesizing nested EPT violation, i.e. it's not the vCPU's
         "real" exit_qualification, which is tracked elsewhere.
      
       - Add a sanity check to assert that EPT Violations are the only sources of
         nested PML Full VM-Exits.
      31a6cd7f
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-selftests-6.10' of https://github.com/kvm-x86/linux into HEAD · 56f40708
      Paolo Bonzini authored
      KVM selftests cleanups and fixes for 6.10:
      
       - Enhance the demand paging test to allow for better reporting and stressing
         of UFFD performance.
      
       - Convert the steal time test to generate TAP-friendly output.
      
       - Fix a flaky false positive in the xen_shinfo_test due to comparing elapsed
         time across two different clock domains.
      
       - Skip the MONITOR/MWAIT test if the host doesn't actually support MWAIT.
      
       - Avoid unnecessary use of "sudo" in the NX hugepage test to play nice with
         running in a minimal userspace environment.
      
       - Allow skipping the RSEQ test's sanity check that the vCPU was able to
         complete a reasonable number of KVM_RUNs, as the assert can fail on a
         completely valid setup.  If the test is run on a large-ish system that is
         otherwise idle, and the test isn't affined to a low-ish number of CPUs, the
         vCPU task can be repeatedly migrated to CPUs that are in deep sleep states,
         which results in the vCPU having very little net runtime before the next
         migration due to high wakeup latencies.
      56f40708
    • Paolo Bonzini's avatar
      Merge tag 'kvm-x86-generic-6.10' of https://github.com/kvm-x86/linux into HEAD · f4bc1373
      Paolo Bonzini authored
      KVM cleanups for 6.10:
      
       - Misc cleanups extracted from the "exit on missing userspace mapping" series,
         which has been put on hold in anticipation of a "KVM Userfault" approach,
         which should provide a superset of functionality.
      
       - Remove kvm_make_all_cpus_request_except(), which got added to hack around an
         AVIC bug, and then became dead code when a more robust fix came along.
      
       - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation.
      f4bc1373
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · e5f62e27
      Paolo Bonzini authored
      KVM/arm64 updates for Linux 6.10
      
      - Move a lot of state that was previously stored on a per vcpu
        basis into a per-CPU area, because it is only pertinent to the
        host while the vcpu is loaded. This results in better state
        tracking, and a smaller vcpu structure.
      
      - Add full handling of the ERET/ERETAA/ERETAB instructions in
        nested virtualisation. The last two instructions also require
        emulating part of the pointer authentication extension.
        As a result, the trap handling of pointer authentication has
        been greattly simplified.
      
      - Turn the global (and not very scalable) LPI translation cache
        into a per-ITS, scalable cache, making non directly injected
        LPIs much cheaper to make visible to the vcpu.
      
      - A batch of pKVM patches, mostly fixes and cleanups, as the
        upstreaming process seems to be resuming. Fingers crossed!
      
      - Allocate PPIs and SGIs outside of the vcpu structure, allowing
        for smaller EL2 mapping and some flexibility in implementing
        more or less than 32 private IRQs.
      
      - Purge stale mpidr_data if a vcpu is created after the MPIDR
        map has been created.
      
      - Preserve vcpu-specific ID registers across a vcpu reset.
      
      - Various minor cleanups and improvements.
      e5f62e27
  4. 10 May, 2024 4 commits
  5. 09 May, 2024 8 commits
  6. 08 May, 2024 4 commits
    • Marc Zyngier's avatar
      Merge branch kvm-arm64/misc-6.10 into kvmarm-master/next · e2815706
      Marc Zyngier authored
      * kvm-arm64/misc-6.10:
        : .
        : Misc fixes and updates targeting 6.10
        :
        : - Improve boot-time diagnostics when the sysreg tables
        :   are not correctly sorted
        :
        : - Allow FFA_MSG_SEND_DIRECT_REQ in the FFA proxy
        :
        : - Fix duplicate XNX field in the ID_AA64MMFR1_EL1
        :   writeable mask
        :
        : - Allocate PPIs and SGIs outside of the vcpu structure, allowing
        :   for smaller EL2 mapping and some flexibility in implementing
        :   more or less than 32 private IRQs.
        :
        : - Use bitmap_gather() instead of its open-coded equivalent
        :
        : - Make protected mode use hVHE if available
        :
        : - Purge stale mpidr_data if a vcpu is created after the MPIDR
        :   map has been created
        : .
        KVM: arm64: Destroy mpidr_data for 'late' vCPU creation
        KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support
        KVM: arm64: Fix hvhe/nvhe early alias parsing
        KVM: arm64: Convert kvm_mpidr_index() to bitmap_gather()
        KVM: arm64: vgic: Allocate private interrupts on demand
        KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNX
        KVM: arm64: Remove FFA_MSG_SEND_DIRECT_REQ from the denylist
        KVM: arm64: Improve out-of-order sysreg table diagnostics
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      e2815706
    • Oliver Upton's avatar
      KVM: arm64: Destroy mpidr_data for 'late' vCPU creation · ce5d2448
      Oliver Upton authored
      A particularly annoying userspace could create a vCPU after KVM has
      computed mpidr_data for the VM, either by racing against VGIC
      initialization or having a userspace irqchip.
      
      In any case, this means mpidr_data no longer fully describes the VM, and
      attempts to find the new vCPU with kvm_mpidr_to_vcpu() will fail. The
      fix is to discard mpidr_data altogether, as it is only a performance
      optimization and not required for correctness. In all likelihood KVM
      will recompute the mappings when KVM_RUN is called on the new vCPU.
      
      Note that reads of mpidr_data are not guarded by a lock; promote to RCU
      to cope with the possibility of mpidr_data being invalidated at runtime.
      
      Fixes: 54a8006d ("KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available")
      Signed-off-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20240508071952.2035422-1-oliver.upton@linux.devSigned-off-by: default avatarMarc Zyngier <maz@kernel.org>
      ce5d2448
    • Will Deacon's avatar
      KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support · 5053c3f0
      Will Deacon authored
      The early command line parsing treats "kvm-arm.mode=protected" as an
      alias for "id_aa64mmfr1.vh=0", forcing the use of nVHE so that the host
      kernel runs at EL1 with the pKVM hypervisor at EL2.
      
      With the introduction of hVHE support in ad744e8c ("arm64: Allow
      arm64_sw.hvhe on command line"), the hypervisor can run using the EL2+0
      translation regime. This is interesting for unusual CPUs that have VH
      stuck to 1, but also because it opens the possibility of a hypervisor
      "userspace" in the distant future which could be used to isolate vCPU
      contexts in the hypervisor (see Marc's talk from KVM Forum 2022 [1]).
      
      Repaint the "kvm-arm.mode=protected" alias to map to "arm64_sw.hvhe=1",
      which will use hVHE on CPUs that support it and remain with nVHE
      otherwise.
      
      [1] https://www.youtube.com/watch?v=1F_Mf2j9eIoSigned-off-by: default avatarWill Deacon <will@kernel.org>
      Acked-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20240501163400.15838-3-will@kernel.orgSigned-off-by: default avatarMarc Zyngier <maz@kernel.org>
      5053c3f0
    • Will Deacon's avatar
      KVM: arm64: Fix hvhe/nvhe early alias parsing · 3c142f9d
      Will Deacon authored
      Booting a kernel with "arm64_sw.hvhe=1 kvm-arm.mode=nvhe" on the
      command-line results in KVM initialising using hVHE, whereas one might
      expect the latter option to override the former.
      
      Fix this by adding "arm64_sw.hvhe=0" to the alias expansion for
      "kvm-arm.mode=nvhe".
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Acked-by: default avatarOliver Upton <oliver.upton@linux.dev>
      Link: https://lore.kernel.org/r/20240501163400.15838-2-will@kernel.orgSigned-off-by: default avatarMarc Zyngier <maz@kernel.org>
      3c142f9d
  7. 07 May, 2024 15 commits
    • Michael Roth's avatar
      KVM: SEV: Allow per-guest configuration of GHCB protocol version · 4af663c2
      Michael Roth authored
      The GHCB protocol version may be different from one guest to the next.
      Add a field to track it for each KVM instance and extend KVM_SEV_INIT2
      to allow it to be configured by userspace.
      
      Now that all SEV-ES support for GHCB protocol version 2 is in place, go
      ahead and default to it when creating SEV-ES guests through the new
      KVM_SEV_INIT2 interface. Keep the older KVM_SEV_ES_INIT interface
      restricted to GHCB protocol version 1.
      Suggested-by: default avatarSean Christopherson <seanjc@google.com>
      Signed-off-by: default avatarMichael Roth <michael.roth@amd.com>
      Message-ID: <20240501071048.2208265-5-michael.roth@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4af663c2
    • Michael Roth's avatar
      KVM: SEV: Add GHCB handling for termination requests · 8d1a36e4
      Michael Roth authored
      GHCB version 2 adds support for a GHCB-based termination request that
      a guest can issue when it reaches an error state and wishes to inform
      the hypervisor that it should be terminated. Implement support for that
      similarly to GHCB MSR-based termination requests that are already
      available to SEV-ES guests via earlier versions of the GHCB protocol.
      
      See 'Termination Request' in the 'Invoking VMGEXIT' section of the GHCB
      specification for more details.
      Signed-off-by: default avatarMichael Roth <michael.roth@amd.com>
      Message-ID: <20240501071048.2208265-4-michael.roth@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8d1a36e4
    • Brijesh Singh's avatar
      KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests · ae018183
      Brijesh Singh authored
      Version 2 of the GHCB specification introduced advertisement of features
      that are supported by the Hypervisor.
      
      Now that KVM supports version 2 of the GHCB specification, bump the
      maximum supported protocol version.
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarAshish Kalra <ashish.kalra@amd.com>
      Signed-off-by: default avatarMichael Roth <michael.roth@amd.com>
      Message-ID: <20240501071048.2208265-3-michael.roth@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      ae018183
    • Tom Lendacky's avatar
      KVM: SEV: Add support to handle AP reset MSR protocol · d916f003
      Tom Lendacky authored
      Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
      available in version 2 of the GHCB specification.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: default avatarAshish Kalra <ashish.kalra@amd.com>
      Signed-off-by: default avatarMichael Roth <michael.roth@amd.com>
      Message-ID: <20240501071048.2208265-2-michael.roth@amd.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d916f003
    • Sean Christopherson's avatar
      KVM: x86: Explicitly zero kvm_caps during vendor module load · 40269c03
      Sean Christopherson authored
      Zero out all of kvm_caps when loading a new vendor module to ensure that
      KVM can't inadvertently rely on global initialization of a field, and add
      a comment above the definition of kvm_caps to call out that all fields
      needs to be explicitly computed during vendor module load.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarXiaoyao Li <xiaoyao.li@intel.com>
      Message-ID: <20240423165328.2853870-4-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      40269c03
    • Sean Christopherson's avatar
      KVM: x86: Fully re-initialize supported_mce_cap on vendor module load · 555485bd
      Sean Christopherson authored
      Effectively reset supported_mce_cap on vendor module load to ensure that
      capabilities aren't unintentionally preserved across module reload, e.g.
      if kvm-intel.ko added a module param to control LMCE support, or if
      someone somehow managed to load a vendor module that doesn't support LMCE
      after loading and unloading kvm-intel.ko.
      
      Practically speaking, this bug is a non-issue as kvm-intel.ko doesn't have
      a module param for LMCE, and there is no system in the world that supports
      both kvm-intel.ko and kvm-amd.ko.
      
      Fixes: c45dcc71 ("KVM: VMX: enable guest access to LMCE related MSRs")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarXiaoyao Li <xiaoyao.li@intel.com>
      Message-ID: <20240423165328.2853870-3-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      555485bd
    • Sean Christopherson's avatar
      KVM: x86: Fully re-initialize supported_vm_types on vendor module load · c43ad190
      Sean Christopherson authored
      Recompute the entire set of supported VM types when a vendor module is
      loaded, as preserving supported_vm_types across vendor module unload and
      reload can result in VM types being incorrectly treated as supported.
      
      E.g. if a vendor module is loaded with TDP enabled, unloaded, and then
      reloaded with TDP disabled, KVM_X86_SW_PROTECTED_VM will be incorrectly
      retained.  Ditto for SEV_VM and SEV_ES_VM and their respective module
      params in kvm-amd.ko.
      
      Fixes: 2a955c4d ("KVM: x86: Add supported_vm_types to kvm_caps")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarXiaoyao Li <xiaoyao.li@intel.com>
      Message-ID: <20240423165328.2853870-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      c43ad190
    • Paolo Bonzini's avatar
      Merge tag 'kvm-riscv-6.10-1' of https://github.com/kvm-riscv/linux into HEAD · aa24865f
      Paolo Bonzini authored
       KVM/riscv changes for 6.10
      
      - Support guest breakpoints using ebreak
      - Introduce per-VCPU mp_state_lock and reset_cntx_lock
      - Virtualize SBI PMU snapshot and counter overflow interrupts
      - New selftests for SBI PMU and Guest ebreak
      aa24865f
    • Sean Christopherson's avatar
      KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns · 2b1f4355
      Sean Christopherson authored
      WARN if __kvm_faultin_pfn() generates a "no slot" pfn, and gracefully
      handle the unexpected behavior instead of continuing on with dangerous
      state, e.g. tdp_mmu_map_handle_target_level() _only_ checks fault->slot,
      and so could install a bogus PFN into the guest.
      
      The existing code is functionally ok, because kvm_faultin_pfn() pre-checks
      all of the cases that result in KVM_PFN_NOSLOT, but it is unnecessarily
      unsafe as it relies on __gfn_to_pfn_memslot() getting the _exact_ same
      memslot, i.e. not a re-retrieved pointer with KVM_MEMSLOT_INVALID set.
      And checking only fault->slot would fall apart if KVM ever added a flag or
      condition that forced emulation, similar to how KVM handles writes to
      read-only memslots.
      
      Cc: David Matlack <dmatlack@google.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarKai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-17-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2b1f4355
    • Sean Christopherson's avatar
      KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values · f3310e62
      Sean Christopherson authored
      Explicitly set "pfn" and "hva" to error values in kvm_mmu_do_page_fault()
      to harden KVM against using "uninitialized" values.  In quotes because the
      fields are actually zero-initialized, and zero is a legal value for both
      page frame numbers and virtual addresses.  E.g. failure to set "pfn" prior
      to creating an SPTE could result in KVM pointing at physical address '0',
      which is far less desirable than KVM generating a SPTE with reserved PA
      bits set and thus effectively killing the VM.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarKai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-16-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f3310e62
    • Sean Christopherson's avatar
      KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults · 36d44927
      Sean Christopherson authored
      Explicitly set fault->hva to KVM_HVA_ERR_BAD when handling a "no slot"
      fault to ensure that KVM doesn't use a bogus virtual address, e.g. if
      there *was* a slot but it's unusable (APIC access page), or if there
      really was no slot, in which case fault->hva will be '0' (which is a
      legal address for x86).
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarKai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-15-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      36d44927
    • Sean Christopherson's avatar
      KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() · f6adeae8
      Sean Christopherson authored
      Handle the "no memslot" case at the beginning of kvm_faultin_pfn(), just
      after the private versus shared check, so that there's no need to
      repeatedly query whether or not a slot exists.  This also makes it more
      obvious that, except for private vs. shared attributes, the process of
      faulting in a pfn simply doesn't apply to gfns without a slot.
      
      Opportunistically stuff @fault's metadata in kvm_handle_noslot_fault() so
      that it doesn't need to be duplicated in all paths that invoke
      kvm_handle_noslot_fault(), and to minimize the probability of not stuffing
      the right fields.
      
      Leave the existing handle behind, but convert it to a WARN, to guard
      against __kvm_faultin_pfn() unexpectedly nullifying fault->slot.
      
      Cc: David Matlack <dmatlack@google.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarKai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-14-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f6adeae8
    • Sean Christopherson's avatar
      KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() · cd272fc4
      Sean Christopherson authored
      Move the checks related to the validity of an access to a memslot from the
      inner __kvm_faultin_pfn() to its sole caller, kvm_faultin_pfn().  This
      allows emulating accesses to the APIC access page, which don't need to
      resolve a pfn, even if there is a relevant in-progress mmu_notifier
      invalidation.  Ditto for accesses to KVM internal memslots from L2, which
      KVM also treats as emulated MMIO.
      
      More importantly, this will allow for future cleanup by having the
      "no memslot" case bail from kvm_faultin_pfn() very early on.
      
      Go to rather extreme and gross lengths to make the change a glorified
      nop, e.g. call into __kvm_faultin_pfn() even when there is no slot, as the
      related code is very subtle.  E.g. fault->slot can be nullified if it
      points at the APIC access page, some flows in KVM x86 expect fault->pfn
      to be KVM_PFN_NOSLOT, while others check only fault->slot, etc.
      
      No functional change intended.
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarKai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-13-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      cd272fc4
    • Sean Christopherson's avatar
      KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO · bde9f9d2
      Sean Christopherson authored
      Explicitly detect and disallow private accesses to emulated MMIO in
      kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
      to perform the check.  This will allow the page fault path to go straight
      to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-ID: <20240228024147.41573-12-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bde9f9d2
    • Sean Christopherson's avatar
      KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots · 5bd74f6e
      Sean Christopherson authored
      Allow mapping KVM's internal memslots used for EPT without unrestricted
      guest into L2, i.e. allow mapping the hidden TSS and the identity mapped
      page tables into L2.  Unlike the APIC access page, there is no correctness
      issue with letting L2 access the "hidden" memory.  Allowing these memslots
      to be mapped into L2 fixes a largely theoretical bug where KVM could
      incorrectly emulate subsequent _L1_ accesses as MMIO, and also ensures
      consistent KVM behavior for L2.
      
      If KVM is using TDP, but L1 is using shadow paging for L2, then routing
      through kvm_handle_noslot_fault() will incorrectly cache the gfn as MMIO,
      and create an MMIO SPTE.  Creating an MMIO SPTE is ok, but only because
      kvm_mmu_page_role.guest_mode ensure KVM uses different roots for L1 vs.
      L2.  But vcpu->arch.mmio_gfn will remain valid, and could cause KVM to
      incorrectly treat an L1 access to the hidden TSS or identity mapped page
      tables as MMIO.
      
      Furthermore, forcing L2 accesses to be treated as "no slot" faults doesn't
      actually prevent exposing KVM's internal memslots to L2, it simply forces
      KVM to emulate the access.  In most cases, that will trigger MMIO,
      amusingly due to filling vcpu->arch.mmio_gfn, but also because
      vcpu_is_mmio_gpa() unconditionally treats APIC accesses as MMIO, i.e. APIC
      accesses are ok.  But the hidden TSS and identity mapped page tables could
      go either way (MMIO or access the private memslot's backing memory).
      
      Alternatively, the inconsistent emulator behavior could be addressed by
      forcing MMIO emulation for L2 access to all internal memslots, not just to
      the APIC.  But that's arguably less correct than letting L2 access the
      hidden TSS and identity mapped page tables, not to mention that it's
      *extremely* unlikely anyone cares what KVM does in this case.  From L1's
      perspective there is R/W memory at those memslots, the memory just happens
      to be initialized with non-zero data.  Making the memory disappear when it
      is accessed by L2 is far more magical and arbitrary than the memory
      existing in the first place.
      
      The APIC access page is special because KVM _must_ emulate the access to
      do the right thing (emulate an APIC access instead of reading/writing the
      APIC access page).  And despite what commit 3a2936de ("kvm: mmu: Don't
      expose private memslots to L2") said, it's not just necessary when L1 is
      accelerating L2's virtual APIC, it's just as important (likely *more*
      imporant for correctness when L1 is passing through its own APIC to L2.
      
      Fixes: 3a2936de ("kvm: mmu: Don't expose private memslots to L2")
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarKai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-11-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      5bd74f6e