1. 10 May, 2024 4 commits
  2. 07 May, 2024 25 commits
    • KVM: SEV: Allow per-guest configuration of GHCB protocol version · 4af663c2
      Michael Roth authored
      The GHCB protocol version may be different from one guest to the next.
      Add a field to track it for each KVM instance and extend KVM_SEV_INIT2
      to allow it to be configured by userspace.
      
      Now that all SEV-ES support for GHCB protocol version 2 is in place, go
      ahead and default to it when creating SEV-ES guests through the new
      KVM_SEV_INIT2 interface. Keep the older KVM_SEV_ES_INIT interface
      restricted to GHCB protocol version 1.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240501071048.2208265-5-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
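
The version selection described in this commit can be sketched roughly as follows. This is a minimal illustration, not KVM's code: every `ex_`/`EX_` identifier is hypothetical, and treating a requested version of 0 as "no preference" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; the names are assumptions, not KVM's definitions. */
#define EX_GHCB_VERSION_MAX      2u
#define EX_GHCB_VERSION_DEFAULT  2u
#define EX_GHCB_VERSION_LEGACY   1u

/* Hypothetical stand-ins for the two init interfaces named in the commit. */
enum ex_init_iface { EX_IFACE_SEV_ES_INIT, EX_IFACE_SEV_INIT2 };

/*
 * Choose the GHCB protocol version for one guest.  requested == 0 means
 * "no preference" (an assumption of this sketch).  Returns 0 and stores the
 * version in *out, or -1 if the request cannot be honoured.
 */
static int ex_pick_ghcb_version(enum ex_init_iface iface, uint16_t requested,
                                uint16_t *out)
{
    assert(out != NULL);

    if (iface == EX_IFACE_SEV_ES_INIT) {
        /* Older interface: restricted to version 1; the request is ignored here. */
        *out = EX_GHCB_VERSION_LEGACY;
        return 0;
    }

    /* Newer interface: reject versions beyond what the host supports. */
    if (requested > EX_GHCB_VERSION_MAX)
        return -1;

    *out = requested ? requested : EX_GHCB_VERSION_DEFAULT;
    return 0;
}
```

Keeping the version in per-guest state, rather than in a global, is what lets two guests on the same host negotiate different protocol versions.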
    • KVM: SEV: Add GHCB handling for termination requests · 8d1a36e4
      Michael Roth authored
      GHCB version 2 adds support for a GHCB-based termination request that
      a guest can issue when it reaches an error state and wishes to inform
      the hypervisor that it should be terminated. Implement support for that
      similarly to GHCB MSR-based termination requests that are already
      available to SEV-ES guests via earlier versions of the GHCB protocol.
      
      See 'Termination Request' in the 'Invoking VMGEXIT' section of the GHCB
      specification for more details.
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240501071048.2208265-4-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests · ae018183
      Brijesh Singh authored
      Version 2 of the GHCB specification introduced advertisement of features
      that are supported by the Hypervisor.
      
      Now that KVM supports version 2 of the GHCB specification, bump the
      maximum supported protocol version.
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240501071048.2208265-3-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV: Add support to handle AP reset MSR protocol · d916f003
      Tom Lendacky authored
      Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
      available in version 2 of the GHCB specification.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240501071048.2208265-2-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Explicitly zero kvm_caps during vendor module load · 40269c03
      Sean Christopherson authored
      Zero out all of kvm_caps when loading a new vendor module to ensure that
      KVM can't inadvertently rely on global initialization of a field, and add
      a comment above the definition of kvm_caps to call out that all fields
      need to be explicitly computed during vendor module load.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-ID: <20240423165328.2853870-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fully re-initialize supported_mce_cap on vendor module load · 555485bd
      Sean Christopherson authored
      Effectively reset supported_mce_cap on vendor module load to ensure that
      capabilities aren't unintentionally preserved across module reload, e.g.
      if kvm-intel.ko added a module param to control LMCE support, or if
      someone somehow managed to load a vendor module that doesn't support LMCE
      after loading and unloading kvm-intel.ko.
      
      Practically speaking, this bug is a non-issue as kvm-intel.ko doesn't have
      a module param for LMCE, and there is no system in the world that supports
      both kvm-intel.ko and kvm-amd.ko.
      
      Fixes: c45dcc71 ("KVM: VMX: enable guest access to LMCE related MSRs")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-ID: <20240423165328.2853870-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fully re-initialize supported_vm_types on vendor module load · c43ad190
      Sean Christopherson authored
      Recompute the entire set of supported VM types when a vendor module is
      loaded, as preserving supported_vm_types across vendor module unload and
      reload can result in VM types being incorrectly treated as supported.
      
      E.g. if a vendor module is loaded with TDP enabled, unloaded, and then
      reloaded with TDP disabled, KVM_X86_SW_PROTECTED_VM will be incorrectly
      retained.  Ditto for SEV_VM and SEV_ES_VM and their respective module
      params in kvm-amd.ko.
      
      Fixes: 2a955c4d ("KVM: x86: Add supported_vm_types to kvm_caps")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-ID: <20240423165328.2853870-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
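
The stale-capability problem in this commit comes down to accumulating bits into a global across module loads instead of recomputing them. The sketch below contrasts the two patterns; the structure, flag names and values are hypothetical stand-ins, and "module load" is modelled as a plain function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative VM type bits; the names and values are assumptions. */
#define EX_VM_DEFAULT       (1u << 0)
#define EX_VM_SW_PROTECTED  (1u << 1)

/* Hypothetical stand-in for a global capabilities structure. */
struct ex_caps {
    uint32_t supported_vm_types;
};

static struct ex_caps ex_caps;

/* Fragile pattern: only ORs bits in, so bits from a previous load survive. */
static void ex_vendor_load_accumulate(bool tdp_enabled)
{
    ex_caps.supported_vm_types |= EX_VM_DEFAULT;
    if (tdp_enabled)
        ex_caps.supported_vm_types |= EX_VM_SW_PROTECTED;
}

/* Hardened pattern: zero everything, then compute each field explicitly. */
static void ex_vendor_load_recompute(bool tdp_enabled)
{
    memset(&ex_caps, 0, sizeof(ex_caps));
    ex_caps.supported_vm_types = EX_VM_DEFAULT;
    if (tdp_enabled)
        ex_caps.supported_vm_types |= EX_VM_SW_PROTECTED;
}

/* Load with TDP enabled, then "reload" with TDP disabled, for both patterns. */
static void ex_reload_demo(void)
{
    ex_vendor_load_accumulate(true);
    ex_vendor_load_accumulate(false);
    assert(ex_caps.supported_vm_types & EX_VM_SW_PROTECTED);    /* stale bit */

    ex_vendor_load_recompute(true);
    ex_vendor_load_recompute(false);
    assert(!(ex_caps.supported_vm_types & EX_VM_SW_PROTECTED)); /* cleared */
}
```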
    • Merge tag 'kvm-riscv-6.10-1' of https://github.com/kvm-riscv/linux into HEAD · aa24865f
      Paolo Bonzini authored
       KVM/riscv changes for 6.10
      
      - Support guest breakpoints using ebreak
      - Introduce per-VCPU mp_state_lock and reset_cntx_lock
      - Virtualize SBI PMU snapshot and counter overflow interrupts
      - New selftests for SBI PMU and Guest ebreak
    • KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns · 2b1f4355
      Sean Christopherson authored
      WARN if __kvm_faultin_pfn() generates a "no slot" pfn, and gracefully
      handle the unexpected behavior instead of continuing on with dangerous
      state, e.g. tdp_mmu_map_handle_target_level() _only_ checks fault->slot,
      and so could install a bogus PFN into the guest.
      
      The existing code is functionally ok, because kvm_faultin_pfn() pre-checks
      all of the cases that result in KVM_PFN_NOSLOT, but it is unnecessarily
      unsafe as it relies on __gfn_to_pfn_memslot() getting the _exact_ same
      memslot, i.e. not a re-retrieved pointer with KVM_MEMSLOT_INVALID set.
      And checking only fault->slot would fall apart if KVM ever added a flag or
      condition that forced emulation, similar to how KVM handles writes to
      read-only memslots.
      
      Cc: David Matlack <dmatlack@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-17-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values · f3310e62
      Sean Christopherson authored
      Explicitly set "pfn" and "hva" to error values in kvm_mmu_do_page_fault()
      to harden KVM against using "uninitialized" values.  In quotes because the
      fields are actually zero-initialized, and zero is a legal value for both
      page frame numbers and virtual addresses.  E.g. failure to set "pfn" prior
      to creating an SPTE could result in KVM pointing at physical address '0',
      which is far less desirable than KVM generating a SPTE with reserved PA
      bits set and thus effectively killing the VM.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-16-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot" faults · 36d44927
      Sean Christopherson authored
      Explicitly set fault->hva to KVM_HVA_ERR_BAD when handling a "no slot"
      fault to ensure that KVM doesn't use a bogus virtual address, e.g. if
      there *was* a slot but it's unusable (APIC access page), or if there
      really was no slot, in which case fault->hva will be '0' (which is a
      legal address for x86).
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-15-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn() · f6adeae8
      Sean Christopherson authored
      Handle the "no memslot" case at the beginning of kvm_faultin_pfn(), just
      after the private versus shared check, so that there's no need to
      repeatedly query whether or not a slot exists.  This also makes it more
      obvious that, except for private vs. shared attributes, the process of
      faulting in a pfn simply doesn't apply to gfns without a slot.
      
      Opportunistically stuff @fault's metadata in kvm_handle_noslot_fault() so
      that it doesn't need to be duplicated in all paths that invoke
      kvm_handle_noslot_fault(), and to minimize the probability of not stuffing
      the right fields.
      
      Leave the existing handle behind, but convert it to a WARN, to guard
      against __kvm_faultin_pfn() unexpectedly nullifying fault->slot.
      
      Cc: David Matlack <dmatlack@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-14-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn() · cd272fc4
      Sean Christopherson authored
      Move the checks related to the validity of an access to a memslot from the
      inner __kvm_faultin_pfn() to its sole caller, kvm_faultin_pfn().  This
      allows emulating accesses to the APIC access page, which don't need to
      resolve a pfn, even if there is a relevant in-progress mmu_notifier
      invalidation.  Ditto for accesses to KVM internal memslots from L2, which
      KVM also treats as emulated MMIO.
      
      More importantly, this will allow for future cleanup by having the
      "no memslot" case bail from kvm_faultin_pfn() very early on.
      
      Go to rather extreme and gross lengths to make the change a glorified
      nop, e.g. call into __kvm_faultin_pfn() even when there is no slot, as the
      related code is very subtle.  E.g. fault->slot can be nullified if it
      points at the APIC access page, some flows in KVM x86 expect fault->pfn
      to be KVM_PFN_NOSLOT, while others check only fault->slot, etc.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-13-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO · bde9f9d2
      Sean Christopherson authored
      Explicitly detect and disallow private accesses to emulated MMIO in
      kvm_handle_noslot_fault() instead of relying on kvm_faultin_pfn_private()
      to perform the check.  This will allow the page fault path to go straight
      to kvm_handle_noslot_fault() without bouncing through __kvm_faultin_pfn().
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240228024147.41573-12-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots · 5bd74f6e
      Sean Christopherson authored
      Allow mapping KVM's internal memslots used for EPT without unrestricted
      guest into L2, i.e. allow mapping the hidden TSS and the identity mapped
      page tables into L2.  Unlike the APIC access page, there is no correctness
      issue with letting L2 access the "hidden" memory.  Allowing these memslots
      to be mapped into L2 fixes a largely theoretical bug where KVM could
      incorrectly emulate subsequent _L1_ accesses as MMIO, and also ensures
      consistent KVM behavior for L2.
      
      If KVM is using TDP, but L1 is using shadow paging for L2, then routing
      through kvm_handle_noslot_fault() will incorrectly cache the gfn as MMIO,
      and create an MMIO SPTE.  Creating an MMIO SPTE is ok, but only because
      kvm_mmu_page_role.guest_mode ensures KVM uses different roots for L1 vs.
      L2.  But vcpu->arch.mmio_gfn will remain valid, and could cause KVM to
      incorrectly treat an L1 access to the hidden TSS or identity mapped page
      tables as MMIO.
      
      Furthermore, forcing L2 accesses to be treated as "no slot" faults doesn't
      actually prevent exposing KVM's internal memslots to L2, it simply forces
      KVM to emulate the access.  In most cases, that will trigger MMIO,
      amusingly due to filling vcpu->arch.mmio_gfn, but also because
      vcpu_is_mmio_gpa() unconditionally treats APIC accesses as MMIO, i.e. APIC
      accesses are ok.  But the hidden TSS and identity mapped page tables could
      go either way (MMIO or access the private memslot's backing memory).
      
      Alternatively, the inconsistent emulator behavior could be addressed by
      forcing MMIO emulation for L2 access to all internal memslots, not just to
      the APIC.  But that's arguably less correct than letting L2 access the
      hidden TSS and identity mapped page tables, not to mention that it's
      *extremely* unlikely anyone cares what KVM does in this case.  From L1's
      perspective there is R/W memory at those memslots, the memory just happens
      to be initialized with non-zero data.  Making the memory disappear when it
      is accessed by L2 is far more magical and arbitrary than the memory
      existing in the first place.
      
      The APIC access page is special because KVM _must_ emulate the access to
      do the right thing (emulate an APIC access instead of reading/writing the
      APIC access page).  And despite what commit 3a2936de ("kvm: mmu: Don't
      expose private memslots to L2") said, it's not just necessary when L1 is
      accelerating L2's virtual APIC, it's just as important (likely *more*
      important) for correctness when L1 is passing through its own APIC to L2.
      
      Fixes: 3a2936de ("kvm: mmu: Don't expose private memslots to L2")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-11-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Move private vs. shared check above slot validity checks · 44f42ef3
      Sean Christopherson authored
      Prioritize private vs. shared gfn attribute checks above slot validity
      checks to ensure a consistent userspace ABI.  E.g. as is, KVM will exit to
      userspace if there is no memslot, but emulate accesses to the APIC access
      page even if the attributes mismatch.
      
      Fixes: 8dd2eee9 ("KVM: x86/mmu: Handle page fault for private memory")
      Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
      Cc: Chao Peng <chao.p.peng@linux.intel.com>
      Cc: Fuad Tabba <tabba@google.com>
      Cc: Michael Roth <michael.roth@amd.com>
      Cc: Isaku Yamahata <isaku.yamahata@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-10-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults · 07702e5a
      Sean Christopherson authored
      WARN and skip the emulated MMIO fastpath if a private, reserved page fault
      is encountered, as private+reserved should be an impossible combination
      (KVM should never create an MMIO SPTE for a private access).
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240228024147.41573-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: check for invalid async page faults involving private memory · cd389f50
      Paolo Bonzini authored
      Right now the error code is not used when an async page fault is completed.
      This is not a problem in the current code, but it is untidy.  For protected
      VMs, we will also need to check that the page attributes match the current
      state of the page, because asynchronous page faults can only occur on
      shared pages (private pages go through kvm_faultin_pfn_private() instead of
      __gfn_to_pfn_memslot()).
      
      Start by piping the error code from kvm_arch_setup_async_pf() to
      kvm_arch_async_page_ready() via the architecture-specific async page
      fault data.  For now, it can be used to assert that there are no
      async page faults on private memory.
      
      Extracted from a patch by Isaku Yamahata.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Use synthetic page fault error code to indicate private faults · b3d5dc62
      Sean Christopherson authored
      Add and use a synthetic, KVM-defined page fault error code to indicate
      whether a fault is to private vs. shared memory.  TDX and SNP have
      different mechanisms for reporting private vs. shared, and KVM's
      software-protected VMs have no mechanism at all.  Usurp an error code
      flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
      and friends.
      
      Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
      for TDX and software-protected VMs as appropriate, but that would require
      *clearing* the flag for SEV and SEV-ES VMs, which support encrypted
      memory at the hardware layer, but don't utilize private memory at the
      KVM layer.
      
      Opportunistically add a comment to call out that the logic for software-
      protected VMs is (and was before this commit) broken for nested MMUs, i.e.
      for nested TDP, as the GPA is an L2 GPA.  Punt on trying to play nice with
      nested MMUs as there is a _lot_ of functionality that simply doesn't work
      for software-protected VMs, e.g. all of the paths where KVM accesses guest
      memory need to be updated to be aware of private vs. shared memory.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20240228024147.41573-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
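
As a rough illustration of the idea, the sketch below folds "is this fault private?" into a spare bit of the 64-bit error code so that no extra parameter has to be passed down the page fault path. All identifiers are hypothetical, the position chosen for the synthetic bit is an assumption, and using a per-gfn attribute as the source of truth for software-protected VMs is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PFERR_GUEST_ENC  (1ull << 34)  /* ENC bit, as the APM lists it for #NPF */
#define EX_PFERR_PRIVATE    (1ull << 49)  /* hypothetical synthetic bit position */

/* Hypothetical VM flavours, just enough to show the three cases. */
enum ex_vm_kind { EX_KIND_SEV_ES, EX_KIND_SNP, EX_KIND_SW_PROTECTED };

/* Return the error code with the synthetic "private" flag applied. */
static uint64_t ex_tag_private(enum ex_vm_kind kind, uint64_t error_code,
                               bool gfn_is_private)
{
    switch (kind) {
    case EX_KIND_SNP:
        /* Hardware reports encrypted accesses; translate to the synthetic flag. */
        if (error_code & EX_PFERR_GUEST_ENC)
            error_code |= EX_PFERR_PRIVATE;
        break;
    case EX_KIND_SW_PROTECTED:
        /* No hardware mechanism: rely on software-tracked attributes. */
        if (gfn_is_private)
            error_code |= EX_PFERR_PRIVATE;
        break;
    case EX_KIND_SEV_ES:
        /* Encrypted in hardware, but no private memory at this layer. */
        break;
    }
    return error_code;
}

/* Consumers only ever look at the synthetic flag. */
static bool ex_fault_is_private(uint64_t error_code)
{
    return (error_code & EX_PFERR_PRIVATE) != 0;
}

static void ex_private_flag_demo(void)
{
    assert(ex_fault_is_private(ex_tag_private(EX_KIND_SNP, EX_PFERR_GUEST_ENC, false)));
    assert(!ex_fault_is_private(ex_tag_private(EX_KIND_SEV_ES, EX_PFERR_GUEST_ENC, false)));
    assert(ex_fault_is_private(ex_tag_private(EX_KIND_SW_PROTECTED, 0, true)));
}
```

A separate synthetic flag avoids having to clear the hardware ENC bit for SEV and SEV-ES guests, which is the drawback the commit message raises for the alternative of borrowing PFERR_GUEST_ENC_MASK.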
    • KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero · 7bdbb820
      Sean Christopherson authored
      WARN if bits 63:32 are non-zero when handling an intercepted legacy #PF,
      as the error code for #PF is limited to 32 bits (and in practice, 16 bits
      on Intel CPUs).  This behavior is architectural, is part of KVM's ABI
      (see kvm_vcpu_events.error_code), and is explicitly documented as being
      preserved for intercepted #PF in both the APM:
      
        The error code saved in EXITINFO1 is the same as would be pushed onto
        the stack by a non-intercepted #PF exception in protected mode.
      
      and even more explicitly in the SDM as VMCS.VM_EXIT_INTR_ERROR_CODE is a
      32-bit field.
      
      Simply drop the upper bits if hardware provides garbage, as spurious
      information should do no harm (though in all likelihood hardware is buggy
      and the kernel is doomed).
      
      Handling all upper 32 bits in the #PF path will allow moving the sanity
      check on synthetic checks from kvm_mmu_page_fault() to npf_interception(),
      which in turn will allow deriving PFERR_PRIVATE_ACCESS from AMD's
      PFERR_GUEST_ENC_MASK without running afoul of the sanity check.
      
      Note, this is also why Intel uses bit 15 for SGX (highest bit on Intel CPUs)
      and AMD uses bit 31 for RMP (highest bit on AMD CPUs); using the highest
      bit minimizes the probability of a collision with the "other" vendor,
      without needing to plumb more bits through microcode.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Message-ID: <20240228024147.41573-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
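
The "warn, then drop the upper bits" behavior described here can be sketched as below. The function name is hypothetical, and a message on stderr plus a boolean out-parameter stand in for the kernel's WARN machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sanitize the error code of an intercepted legacy #PF: report when any of
 * bits 63:32 are set, then keep only the architectural low 32 bits.
 * *warned tells the caller whether unexpected bits were seen.
 */
static uint32_t ex_sanitize_pf_error_code(uint64_t error_code, bool *warned)
{
    uint32_t upper = (uint32_t)(error_code >> 32);

    *warned = upper != 0;
    if (*warned)
        fprintf(stderr, "unexpected #PF error code bits 63:32: 0x%08x\n",
                (unsigned int)upper);

    /* Spurious upper bits are simply discarded. */
    return (uint32_t)error_code;
}

static void ex_sanitize_demo(void)
{
    bool warned;

    assert(ex_sanitize_pf_error_code(0x5, &warned) == 0x5);
    assert(!warned);

    assert(ex_sanitize_pf_error_code((1ull << 40) | 0x3, &warned) == 0x3);
    assert(warned);
}
```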
    • KVM: x86/mmu: Pass full 64-bit error code when handling page faults · c9710130
      Isaku Yamahata authored
      Plumb the full 64-bit error code throughout the page fault handling code
      so that KVM can use the upper 32 bits, e.g. SNP's PFERR_GUEST_ENC_MASK
      will be used to determine whether or not a fault is private vs. shared.
      
      Note, passing the 64-bit error code to FNAME(walk_addr)() does NOT change
      the behavior of permission_fault() when invoked in the page fault path, as
      KVM explicitly clears PFERR_IMPLICIT_ACCESS in kvm_mmu_page_fault().
      
      Continue passing '0' from the async #PF worker, as guest_memfd and thus
      private memory doesn't support async page faults.
      Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
      [mdr: drop references/changes on rebase, update commit message]
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      [sean: drop truncation in call to FNAME(walk_addr)(), rewrite changelog]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-ID: <20240228024147.41573-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler · dee281e4
      Sean Christopherson authored
      Move the sanity check that hardware never sets bits that collide with
      KVM-defined synthetic bits from kvm_mmu_page_fault() to npf_interception(),
      i.e. make the sanity check #NPF specific.  The legacy #PF path already
      WARNs if _any_ of bits 63:32 are set, and the error code that comes from
      VMX's EPT Violation and Misconfig is 100% synthesized (KVM morphs VMX's
      EXIT_QUALIFICATION into error code flags).
      
      Add a compile-time assert in the legacy #PF handler to make sure that
      KVM-defined flags are covered by its existing sanity check on the upper bits.
      
      Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
      are removing the comment that defined it.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Kai Huang <kai.huang@intel.com>
      Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
      Message-ID: <20240228024147.41573-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Define more SEV+ page fault error bits/flags for #NPF · 9b62e03e
      Sean Christopherson authored
      Define more #NPF error code flags that are relevant to SEV+ (mostly SNP)
      guests, as specified by the APM:
      
       * Bit 31 (RMP):   Set to 1 if the fault was caused due to an RMP check or a
                         VMPL check failure, 0 otherwise.
       * Bit 34 (ENC):   Set to 1 if the guest’s effective C-bit was 1, 0 otherwise.
       * Bit 35 (SIZEM): Set to 1 if the fault was caused by a size mismatch between
                         PVALIDATE or RMPADJUST and the RMP, 0 otherwise.
       * Bit 36 (VMPL):  Set to 1 if the fault was caused by a VMPL permission
                         check failure, 0 otherwise.
      
      Note, the APM is *extremely* misleading, and strongly implies that the
      above flags can _only_ be set for #NPF exits from SNP guests.  That is a
      lie, as bit 34 (C-bit=1, i.e. was encrypted) can be set when running _any_
      flavor of SEV guest on SNP capable hardware.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240228024147.41573-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
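
The four bit positions listed in this commit message can be expressed as masks and decoded as in the sketch below. The `EX_` names are illustrative rather than the kernel's identifiers; only the bit numbers come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_BIT_ULL(n)  (1ull << (n))

/* Bit numbers as quoted from the APM in the commit message above. */
#define EX_NPF_RMP     EX_BIT_ULL(31)  /* RMP or VMPL check failure */
#define EX_NPF_ENC     EX_BIT_ULL(34)  /* guest's effective C-bit was 1 */
#define EX_NPF_SIZEM   EX_BIT_ULL(35)  /* size mismatch against the RMP */
#define EX_NPF_VMPL    EX_BIT_ULL(36)  /* VMPL permission check failure */

struct ex_npf_info {
    bool rmp;
    bool enc;
    bool sizem;
    bool vmpl;
};

/* Decode the SEV-related flags of a 64-bit #NPF error code. */
static struct ex_npf_info ex_decode_npf(uint64_t error_code)
{
    struct ex_npf_info info = {
        .rmp   = (error_code & EX_NPF_RMP) != 0,
        .enc   = (error_code & EX_NPF_ENC) != 0,
        .sizem = (error_code & EX_NPF_SIZEM) != 0,
        .vmpl  = (error_code & EX_NPF_VMPL) != 0,
    };
    return info;
}

static void ex_decode_demo(void)
{
    /* ENC alone: per the commit message, possible for any flavor of SEV guest. */
    struct ex_npf_info info = ex_decode_npf(EX_NPF_ENC);

    assert(info.enc && !info.rmp && !info.sizem && !info.vmpl);
}
```

Three of the four flags live above bit 31, which is why the error code has to be carried as a full 64-bit value for them to be visible at all.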
    • KVM: x86: Remove separate "bit" defines for page fault error code masks · 63b6206e
      Sean Christopherson authored
      Open code the bit number directly in the PFERR_* masks and drop the
      intermediate PFERR_*_BIT defines, as having to bounce through two macros
      just to see which flag corresponds to which bit is quite annoying, as is
      having to define two macros just to add recognition of a new flag.
      
      Use ternary operator to derive the bit in permission_fault(), the one
      function that actually needs the bit number as part of clever shifting
      to avoid conditional branches.  Generally the compiler is able to turn
      it into a conditional move, and if not it's not really a big deal.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-ID: <20240228024147.41573-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation · d0bf8e6e
      Sean Christopherson authored
      Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault
      triggers emulation of any kind, as KVM doesn't currently support emulating
      access to guest private memory.  Practically speaking, private faults and
      emulation are already mutually exclusive, but there are many flows that
      can result in KVM returning RET_PF_EMULATE, and adding one last check
      to harden against weird, unexpected combinations and/or KVM bugs is
      inexpensive.
      Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240228024147.41573-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 06 May, 2024 8 commits
    • LoongArch: KVM: Add mmio trace events support · 7b7e584f
      Bibo Mao authored
      Add mmio trace event support. Currently the generic mmio events
      KVM_TRACE_MMIO_WRITE/xxx_READ/xx_READ_UNSATISFIED are added here.
      
      Also add a vcpu id field to all kvm trace events, since the perf
      KVM tool parses vcpu id information for the kvm entry event.
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: KVM: Add software breakpoint support · 163e9fc6
      Bibo Mao authored
      When a VM runs in kvm mode, the system does not exit to host mode when
      executing a general software breakpoint instruction such as INSN_BREAK;
      the trap exception happens in guest mode rather than host mode. In order
      to debug the guest kernel from the host side, a mechanism is needed to
      let the VM exit to host mode.
      
      Here a hypercall instruction with a special code is used for software
      breakpoints. The VM exits to host mode, the kvm hypervisor identifies
      the special hypercall code and sets exit_reason to KVM_EXIT_DEBUG, and
      then qemu handles it.
      
      The idea comes from ppc kvm. One api, KVM_REG_LOONGARCH_DEBUG_INST, is
      added to get the hypercall code. The VMM needs to get the sw breakpoint
      instruction with this api and set the corresponding sw breakpoint for
      the guest kernel.
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: KVM: Add PV IPI support on guest side · 74c16b2e
      Bibo Mao authored
      The PARAVIRT config option and PV IPI are added for the guest side. The
      function pv_ipi_init() adds the IPI sending and IPI receiving hooks. It
      first checks whether the system runs in VM mode; if the kernel runs in
      VM mode, it calls kvm_para_available() to detect the current hypervisor
      type (only KVM type detection is supported for now). The paravirt
      functions work only if the current hypervisor type is KVM, since KVM is
      the only hypervisor supported on LoongArch now.
      
      PV IPI uses virtual IPI sender and virtual IPI receiver functions. With
      the virtual IPI sender, the IPI message is stored in memory rather than
      in emulated HW. IPI multicast is also supported, and 128 vcpus can
      receive IPIs at the same time, like the X86 KVM method. A hypercall is
      used for IPI sending.
      
      With the virtual IPI receiver, HW SWI0 is used rather than real IPI HW.
      Since each VCPU has a separate HW SWI0, like the HW timer, there is no
      trap on IPI interrupt acknowledge. Since the IPI message is stored in
      memory, there is no trap when getting the IPI message.
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: KVM: Add PV IPI support on host side · e33bda7e
      Bibo Mao authored
      On LoongArch system, IPI hw uses iocsr registers. There are one iocsr
      register access on IPI sending, and two iocsr access on IPI receiving
      for the IPI interrupt handler. In VM mode all iocsr accessing will cause
      VM to trap into hypervisor. So with one IPI hw notification there will
      be three times of trap.
      
      This patch adds PV IPI for VMs: the IPI sender uses the hypercall
      instruction, and the hypervisor injects an SWI into the destination
      vcpu. In the SWI interrupt handler, only the CSR.ESTAT register is
      written to clear the irq. CSR.ESTAT access does not trap into the
      hypervisor, so with PV IPI there is one trap on the sender side and
      none on the receiver side: one trap in total per IPI notification.
      
      This patch also adds IPI multicast support, using a method similar to
      x86. With IPI multicast, an IPI notification can be sent to at most
      128 vcpus at one time, which greatly reduces the number of traps into
      the hypervisor.
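
      One way to picture the 128-vcpu multicast is a 128-bit mask carried
      as two 64-bit words plus a base cpu id, which the hypervisor expands
      into destination vcpus. The sketch below assumes that layout (it
      mirrors the general shape of the x86 approach the message refers
      to); the function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expand a 128-bit destination mask (mask_lo = bits 0..63,
 * mask_hi = bits 64..127) into cpu ids relative to min_cpu.
 * Writes at most out_len ids into out and returns how many were
 * written. Layout and names are assumptions for illustration.
 */
static size_t sketch_expand_ipi_mask(uint64_t mask_lo, uint64_t mask_hi,
                                     unsigned int min_cpu,
                                     unsigned int *out, size_t out_len)
{
    const uint64_t words[2] = { mask_lo, mask_hi };
    size_t n = 0;

    for (unsigned int bit = 0; bit < 128 && n < out_len; bit++) {
        if ((words[bit / 64] >> (bit % 64)) & 1)
            out[n++] = min_cpu + bit;
    }
    return n;
}
```

      A single call (and so a single trap) can name every destination,
      where per-vcpu sending would need one trap per destination.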
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: KVM: Add vcpu mapping from physical cpuid · 73516e9d
      Bibo Mao authored
      Physical CPUID is used for interrupt routing by irqchips such as the
      ipi, msgint and eiointc interrupt controllers. Physical CPUID is
      stored in the CSR register LOONGARCH_CSR_CPUID. It cannot be changed
      once the vcpu is created, and two vcpus cannot share the same
      physical CPUID.
      
      Different irqchips declare different sizes for physical CPUID: the
      max CPUID value for CSR LOONGARCH_CSR_CPUID on Loongson-3A5000 is
      512, the max CPUID supported by IPI hardware is 1024, while for the
      eiointc irqchip it is 256, and for the msgint irqchip it is 65536.
      
      The smallest value among all interrupt controllers is selected for
      now, so KVM defines the max cpuid size as 256, which comes from the
      eiointc irqchip.
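
      The constraints above (bounded range, unique per vcpu) can be
      sketched as a small lookup table indexed by physical CPUID. All
      names here are hypothetical, and treating 256 as the table size
      (valid ids 0..255) is an assumption drawn from the wording "max
      cpuid size".

```c
#include <stdbool.h>

#define SKETCH_MAX_PHYID 256 /* smallest irqchip limit (eiointc) */

struct sketch_phyid_map {
    bool used[SKETCH_MAX_PHYID];   /* slot already claimed by a vcpu? */
    int vcpu_id[SKETCH_MAX_PHYID]; /* owning vcpu, valid when used[] */
};

/* Claim a physical CPUID for a vcpu; -1 if out of range or taken. */
static int sketch_claim_phyid(struct sketch_phyid_map *map,
                              unsigned int cpuid, int vcpu_id)
{
    if (cpuid >= SKETCH_MAX_PHYID || map->used[cpuid])
        return -1;
    map->used[cpuid] = true;
    map->vcpu_id[cpuid] = vcpu_id;
    return 0;
}

/* Route by physical CPUID: return the owning vcpu, or -1 if none. */
static int sketch_lookup_vcpu(const struct sketch_phyid_map *map,
                              unsigned int cpuid)
{
    if (cpuid >= SKETCH_MAX_PHYID || !map->used[cpuid])
        return -1;
    return map->vcpu_id[cpuid];
}
```

      A zero-initialized map starts with every slot free, so a second
      claim of the same CPUID fails, as does any id at or above 256.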
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: KVM: Add cpucfg area for kvm hypervisor · 9753d303
      Bibo Mao authored
      The cpucfg instruction can be used to get processor features. It
      raises a trap exception when executed in VM mode, so it can also be
      used to provide cpu features to the VM. On real hardware, cpucfg
      areas 0 - 20 are used so far. Here the dedicated area 0x40000000 -
      0x400000ff is used by the KVM hypervisor to provide PV features, and
      the area can be extended for other hypervisors in future. This area
      will never be used by real HW; it is only used by software.
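
      As a rough sketch of how an emulation path might tell the two index
      ranges apart, the check below classifies a cpucfg index using only
      the bounds quoted above. The macro and function names are
      hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds quoted in the commit message; names are illustrative. */
#define SKETCH_CPUCFG_KVM_BASE UINT32_C(0x40000000)
#define SKETCH_CPUCFG_KVM_END  UINT32_C(0x400000ff)

/* True when the index falls in the software-only KVM PV area. */
static bool sketch_is_kvm_cpucfg(uint32_t index)
{
    return index >= SKETCH_CPUCFG_KVM_BASE && index <= SKETCH_CPUCFG_KVM_END;
}
```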
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch: KVM: Add hypercall instruction emulation · 372631bb
      Bibo Mao authored
      On LoongArch systems, there is a hypercall instruction dedicated to
      virtualization. When this instruction is executed on the host side,
      an illegal instruction exception is reported; when it is executed in
      VM mode, it traps into the host.
      
      When the hypercall is emulated, the A0 register is set to
      KVM_HCALL_INVALID_CODE rather than injecting an EXCCODE_INE invalid
      instruction exception, so the VM can continue executing the code
      that follows.
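
      The behaviour described above can be sketched on a toy vcpu state.
      The struct, the function name and the placeholder value -1 are all
      assumptions (the commit message does not give the value of
      KVM_HCALL_INVALID_CODE); stepping the pc by 4 assumes the usual
      fixed 32-bit LoongArch instruction size.

```c
/* Toy register state; not the kernel's vcpu structure. */
struct sketch_vcpu {
    unsigned long pc;
    long a0;
};

/* Placeholder standing in for KVM_HCALL_INVALID_CODE (value assumed). */
#define SKETCH_HCALL_INVALID_CODE (-1L)

/*
 * Emulate an unsupported hypercall: report failure through A0 and
 * step past the instruction instead of raising an exception in the
 * guest, so the guest keeps running.
 */
static void sketch_emulate_hypercall(struct sketch_vcpu *vcpu)
{
    vcpu->a0 = SKETCH_HCALL_INVALID_CODE;
    vcpu->pc += 4; /* assumed: skip the 4-byte hypercall instruction */
}
```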
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    • LoongArch/smp: Refine some ipi functions on LoongArch platform · 316863cb
      Bibo Mao authored
      Refine the ipi handling on the LoongArch platform. There are three
      modifications:
      
      1. Add the generic function get_percpu_irq() to replace percpu irq
      functions such as get_ipi_irq()/get_pmc_irq()/get_timer_irq().
      
      2. Change the definition of the action parameter passed to
      loongson_send_ipi_single() and loongson_send_ipi_mask(): it now uses
      normal decimal encoding on the ipi sender side rather than binary
      bitmap encoding. The ipi hw sender takes the decimal code and the
      ipi receiver gets a binary bitmap, because the ipi hw converts the
      code into a bitmap in the ipi message buffer.
      
      3. Add an smp_ops structure on the LoongArch platform so that pv ipi
      can be used later.
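
      The encoding change in item 2 amounts to the sender passing an
      action number while the receiver sees one bit per action. A minimal
      sketch of that conversion follows; the function names are
      hypothetical and the code models the behaviour attributed to the
      ipi hw rather than any kernel function.

```c
#include <stdint.h>

/*
 * Model of the conversion: the sender supplies a decimal action
 * number, and the message buffer accumulates it as a single bit.
 */
static uint32_t sketch_post_action(uint32_t pending, unsigned int action)
{
    return pending | (UINT32_C(1) << action);
}

/* Receiver side: test whether a given action is pending. */
static int sketch_action_pending(uint32_t pending, unsigned int action)
{
    return (int)((pending >> action) & 1);
}
```

      Sending actions 1 and 3 therefore leaves the receiver with the
      bitmap 0xa, from which each action can be tested independently.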
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
  4. 05 May, 2024 3 commits