1. 12 Jul, 2024 10 commits
  2. 20 Jun, 2024 10 commits
    • KVM: x86/tdp_mmu: Take a GFN in kvm_tdp_mmu_fast_pf_get_last_sptep() · c2f38f75
      Rick Edgecombe authored
      Pass fault->gfn into kvm_tdp_mmu_fast_pf_get_last_sptep(), instead of
      passing fault->addr and then converting it to a GFN.
      
      Future changes will make fault->addr and fault->gfn differ when running
      TDX guests. The GFN will be conceptually the same as it is for normal VMs,
      but fault->addr may contain a TDX specific bit that differentiates between
      "shared" and "private" memory. This bit will be used to direct faults to
      be handled on different roots, either the normal "direct" root or a new
      type of root that handles private memory. The TDP iterators will process
      the traditional GFN concept and apply the required TDX specifics depending
      on the root type. For this reason, this function needs to operate on a
      regular GFN and not the addr, which may contain these special TDX
      specific bits.
      
      Today kvm_tdp_mmu_fast_pf_get_last_sptep() takes fault->addr and then
      immediately converts it to a GFN with a bit shift. However, this would
      unfortunately retain the TDX specific bits in what is supposed to be a
      traditional GFN. Excluding TDX's needs, it is also unnecessary to pass
      fault->addr and convert it to a GFN when the GFN is already on hand.
      
      So instead just pass the GFN into kvm_tdp_mmu_fast_pf_get_last_sptep() and
      use it directly.
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Message-ID: <20240619223614.290657-9-rick.p.edgecombe@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
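The hazard described in the commit above can be sketched in a few lines of C. This is an illustration, not the kernel code: the shared bit position (bit 47), `SHARED_GPA_BIT`, `struct page_fault`, `gpa_to_gfn()` and `make_fault()` are all assumptions made for the example.

```c
#include <stdint.h>

typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

#define PAGE_SHIFT 12
/* Hypothetical: bit 47 of the address marks "shared" memory. */
#define SHARED_GPA_BIT ((gpa_t)1 << 47)

struct page_fault {
    gpa_t addr; /* raw faulting address, may carry the shared bit */
    gfn_t gfn;  /* traditional GFN, never carries the shared bit */
};

/* Plain shift: any special bit set in gpa survives in the result. */
static gfn_t gpa_to_gfn(gpa_t gpa)
{
    return gpa >> PAGE_SHIFT;
}

/* Build a fault whose gfn has the shared bit stripped up front. */
static struct page_fault make_fault(gpa_t addr)
{
    struct page_fault fault = {
        .addr = addr,
        .gfn = gpa_to_gfn(addr & ~SHARED_GPA_BIT),
    };
    return fault;
}
```

With these definitions, shifting `fault.addr` again yields a value that still contains the shifted shared bit, whereas `fault.gfn` does not, which is why passing the GFN directly is the safer interface.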
    • KVM: x86/tdp_mmu: Rename REMOVED_SPTE to FROZEN_SPTE · 964cea81
      Rick Edgecombe authored
      Rename REMOVED_SPTE to FROZEN_SPTE so that it can be used for other
      multi-part operations.
      
      REMOVED_SPTE is used as a non-present intermediate value for multi-part
      operations that can happen when a thread doesn't have an MMU write lock.
      Today, the only such operations are PTE removals.
      
      However, future changes will want to use the same concept for setting a
      PTE. In that case the REMOVED_SPTE name does not quite fit. So rename it
      to FROZEN_SPTE so it can be used for both types of operations.
      
      Also rename the relevant helpers and comments that refer to "removed"
      within the context of the SPTE value. Take care to not update naming
      referring to the "remove" operations, which are still distinct.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Message-ID: <20240619223614.290657-2-rick.p.edgecombe@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
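The "frozen" intermediate value can be pictured with a small userspace sketch using C11 atomics. It is only an analogy under stated assumptions: the marker value `0x5a0` and the helpers `is_frozen_spte()`, `freeze_spte()` and `unfreeze_spte()` are hypothetical and do not reproduce the kernel's encoding or function names.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical non-present marker value; the kernel's encoding differs. */
#define FROZEN_SPTE ((uint64_t)0x5a0)

static bool is_frozen_spte(uint64_t spte)
{
    return spte == FROZEN_SPTE;
}

/* Step 1: claim the entry by swapping in the frozen marker. */
static int freeze_spte(_Atomic uint64_t *sptep, uint64_t old_spte)
{
    if (is_frozen_spte(old_spte))
        return -EBUSY; /* another thread is in the middle of an operation */
    if (!atomic_compare_exchange_strong(sptep, &old_spte, FROZEN_SPTE))
        return -EBUSY; /* the entry changed since old_spte was read */
    return 0;
}

/* Step 2: once the multi-part work is done, publish the final value. */
static void unfreeze_spte(_Atomic uint64_t *sptep, uint64_t new_spte)
{
    atomic_store(sptep, new_spte);
}
```

The same two steps serve a removal (publish a non-present value) and a set (publish a present value), which is the reason the commit gives for the more neutral name.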
    • Merge branch 'kvm-6.10-fixes' into HEAD · 02b0d3b9
      Paolo Bonzini authored
    • KVM: x86/tdp_mmu: Sprinkle __must_check · 8a4e2742
      Isaku Yamahata authored
      The TDP MMU function __tdp_mmu_set_spte_atomic uses a cmpxchg64 to replace
      the SPTE value and returns -EBUSY on failure.  The caller must check the
      return value and retry.  Add __must_check to it, as well as to two more
      functions that forward the return value of __tdp_mmu_set_spte_atomic to
      their caller.
      Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
      Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
      Message-Id: <8f7d5a1b241bf5351eaab828d1a1efe5c17699ca.1705965635.git.isaku.yamahata@intel.com>
      Acked-by: Kai Huang <kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
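As a rough userspace analogy for the pattern in the commit above, the sketch below pairs a compare-and-exchange with a "must check" annotation. It assumes GCC or Clang (for `warn_unused_result`) and C11 atomics; `must_check`, `set_spte_atomic()` and `set_bits_retry()` are names invented for this example, not kernel symbols.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the kernel's __must_check (GCC/Clang attribute). */
#define must_check __attribute__((warn_unused_result))

/* Replace *sptep with new_spte only if it still holds old_spte. */
static must_check int set_spte_atomic(_Atomic uint64_t *sptep,
                                      uint64_t old_spte, uint64_t new_spte)
{
    if (!atomic_compare_exchange_strong(sptep, &old_spte, new_spte))
        return -EBUSY; /* lost the race; the caller has to retry */
    return 0;
}

/* A caller that honors the contract: re-read and retry on -EBUSY. */
static void set_bits_retry(_Atomic uint64_t *sptep, uint64_t bits)
{
    uint64_t old;

    do {
        old = atomic_load(sptep);
    } while (set_spte_atomic(sptep, old, old | bits) == -EBUSY);
}
```

A caller that ignores the return value of `set_spte_atomic()` would draw a compiler warning, which is the kind of mistake the annotation is meant to surface.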
    • KVM: interrupt kvm_gmem_populate() on signals · d8147384
      Paolo Bonzini authored
      kvm_gmem_populate() is a potentially lengthy operation that can involve
      multiple calls to the firmware.  Interrupt it if a signal arrives.
      
      Fixes: 1f6c06b1 ("KVM: guest_memfd: Add interface for populating gmem pages with user data")
      Cc: Isaku Yamahata <isaku.yamahata@intel.com>
      Cc: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Discard zero mask with function kvm_dirty_ring_reset · 676f819c
      Bibo Mao authored
      kvm_reset_dirty_gfn() may be called with cur_slot, cur_offset and mask
      all zero, which does not represent a real dirty page, so there is
      nothing to clear in that case. In addition, kvm_reset_dirty_gfn() calls
      __fls(), whose return value is undefined if mask is zero. Just return
      early in that case.
      Signed-off-by: Bibo Mao <maobibo@loongson.cn>
      Message-ID: <20240613122803.1031511-1-maobibo@loongson.cn>
      [Move the conditional inside kvm_reset_dirty_gfn; suggested by
       Sean Christopherson. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
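The zero-mask guard can be illustrated with a small sketch. It assumes GCC or Clang, whose `__builtin_clzl()` is likewise undefined for a zero argument; `highest_set_bit()` and `reset_dirty_span()` are hypothetical helpers for this example and do not mirror the kernel function's signature.

```c
#include <limits.h>

/*
 * Index of the most significant set bit.
 * Undefined for mask == 0, because __builtin_clzl(0) is undefined.
 */
static unsigned int highest_set_bit(unsigned long mask)
{
    return (unsigned int)(sizeof(mask) * CHAR_BIT - 1) -
           (unsigned int)__builtin_clzl(mask);
}

/* Number of page offsets the mask spans, or 0 when there is nothing to do. */
static unsigned int reset_dirty_span(unsigned long mask)
{
    if (!mask)
        return 0; /* guard: never hand a zero mask to highest_set_bit() */
    return highest_set_bit(mask) + 1;
}
```

Placing the check inside the helper, rather than at each call site, keeps every caller safe without further changes.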
    • virt: guest_memfd: fix reference leak on hwpoisoned page · c31745d2
      Paolo Bonzini authored
      If kvm_gmem_get_pfn() detects an hwpoisoned page, it returns -EHWPOISON
      but it does not put back the reference that kvm_gmem_get_folio() had
      grabbed.  Add the forgotten folio_put().
      
      Fixes: a7800aa8 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory")
      Cc: stable@vger.kernel.org
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
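The shape of this fix, dropping a reference on an error path, can be sketched with a toy reference count. Everything here is a simplified assumption: `struct toy_folio`, `toy_folio_get()`, `toy_folio_put()` and `toy_get_pfn()` are hypothetical stand-ins, not the guest_memfd implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* value used by Linux; defined here for portability */
#endif

struct toy_folio {
    int refcount;
    bool hwpoison;
};

static struct toy_folio *toy_folio_get(struct toy_folio *folio)
{
    folio->refcount++;
    return folio;
}

static void toy_folio_put(struct toy_folio *folio)
{
    folio->refcount--;
}

/* On success the caller owns one reference; on error it owns none. */
static int toy_get_pfn(struct toy_folio *backing, struct toy_folio **out)
{
    struct toy_folio *folio = toy_folio_get(backing);

    if (folio->hwpoison) {
        toy_folio_put(folio); /* the put that was previously forgotten */
        return -EHWPOISON;
    }

    *out = folio;
    return 0;
}
```

Without the `toy_folio_put()` call in the error branch, every failed lookup would leave the count one higher than before, which is the leak the commit describes.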
    • kvm: do not account temporary allocations to kmem · f474092c
      Alexey Dobriyan authored
      Some allocations done by KVM are temporary: they are created as a result
      of program actions, but can't exist for arbitrarily long times.
      
      They should have been GFP_TEMPORARY (rip!).
      
      OTOH, kvm-nx-lpage-recovery and kvm-pit kernel threads exist for as long
      as the VM exists, but their task_struct memory is not accounted.
      This is a story for another day.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Message-ID: <c0122f66-f428-417e-a360-b25fc0f154a0@p183>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MAINTAINERS: Drop Wanpeng Li as a Reviewer for KVM Paravirt support · b0185890
      Sean Christopherson authored
      Drop Wanpeng as a KVM PARAVIRT reviewer as his @tencent.com email is
      bouncing, and according to lore[*], the last activity from his @gmail.com
      address was almost two years ago.
      
      [*] https://lore.kernel.org/all/CANRm+Cwj29M9HU3=JRUOaKDR+iDKgr0eNMWQi0iLkR5THON-bg@mail.gmail.com
      
      Cc: Wanpeng Li <kernellwp@gmail.com>
      Cc: Like Xu <like.xu.linux@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240610163427.3359426-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Always sync PIR to IRR prior to scanning I/O APIC routes · f3ced000
      Sean Christopherson authored
      Sync pending posted interrupts to the IRR prior to re-scanning I/O APIC
      routes, irrespective of whether the I/O APIC is emulated by userspace or
      by KVM.  If a level-triggered interrupt routed through the I/O APIC is
      pending or in-service for a vCPU, KVM needs to intercept EOIs on said
      vCPU even if the vCPU isn't the destination for the new routing, e.g. if
      servicing an interrupt using the old routing races with I/O APIC
      reconfiguration.
      
      Commit fceb3a36 ("KVM: x86: ioapic: Fix level-triggered EOI and
      userspace I/OAPIC reconfigure race") fixed the common cases, but
      kvm_apic_pending_eoi() only checks if an interrupt is in the local
      APIC's IRR or ISR, i.e. misses the uncommon case where an interrupt is
      pending in the PIR.
      
      Failure to intercept EOI can manifest as guest hangs with Windows 11 if
      the guest uses the RTC as its timekeeping source, e.g. if the VMM doesn't
      expose a more modern form of time to the guest.
      
      Cc: stable@vger.kernel.org
      Cc: Adamos Ttofari <attofari@amazon.de>
      Cc: Raghavendra Rao Ananta <rananta@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240611014845.82795-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
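Why a check of only the IRR and ISR can miss an interrupt is easier to see with plain bitmaps. The sketch below is a deliberately simplified model: `struct toy_apic` and its helpers are hypothetical, and real posted-interrupt handling uses atomic operations on a hardware-defined descriptor rather than ordinary arrays.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256
#define NR_WORDS (NR_VECTORS / 64)

struct toy_apic {
    uint64_t pir[NR_WORDS]; /* posted, not yet visible to the local APIC */
    uint64_t irr[NR_WORDS]; /* requested */
    uint64_t isr[NR_WORDS]; /* in service */
};

static bool test_vector(const uint64_t *map, unsigned int vec)
{
    return (map[vec / 64] >> (vec % 64)) & 1;
}

static void set_vector(uint64_t *map, unsigned int vec)
{
    map[vec / 64] |= (uint64_t)1 << (vec % 64);
}

/* Fold every posted vector into the IRR and clear the PIR. */
static void sync_pir_to_irr(struct toy_apic *apic)
{
    for (unsigned int i = 0; i < NR_WORDS; i++) {
        apic->irr[i] |= apic->pir[i];
        apic->pir[i] = 0;
    }
}

/* Like the check described above: looks at the IRR and ISR only. */
static bool pending_eoi(const struct toy_apic *apic, unsigned int vec)
{
    return test_vector(apic->irr, vec) || test_vector(apic->isr, vec);
}
```

In this model a vector that sits only in `pir` is invisible to `pending_eoi()` until `sync_pir_to_irr()` has run, which matches the ordering requirement stated in the commit.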
  3. 05 Jun, 2024 3 commits
    • KVM: SNP: Fix LBR Virtualization for SNP guest · f99b0522
      Ravi Bangoria authored
      SEV-ES guests, and thus SNP guests, require LBR Virtualization to be
      _always_ ON. Although commit b7e4be0a ("KVM: SEV-ES: Delegate LBR
      virtualization to the processor") made the correct change for SEV-ES
      guests, it missed SNP guests. Fix it.
      Reported-by: Srikanth Aithal <sraithal@amd.com>
      Fixes: b7e4be0a ("KVM: SEV-ES: Delegate LBR virtualization to the processor")
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Message-ID: <20240605114810.1304-1-ravi.bangoria@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr · db574f2f
      Tao Su authored
      Drop the second snapshot of mmu_invalidate_seq in kvm_faultin_pfn().
      Before checking for a private vs. shared mismatch, mmu_invalidate_seq is
      saved to fault->mmu_seq, which can be used to detect that an invalidation
      related to the gfn has occurred, i.e. KVM will not install a mapping in
      the page table if fault->mmu_seq != mmu_invalidate_seq.
      
      Currently there is a second snapshot of mmu_invalidate_seq, which may not
      be the same as the first snapshot in kvm_faultin_pfn(), i.e. the gfn
      attribute may change between the two snapshots, yet the gfn may still be
      mapped in the page table without hindrance. Therefore, drop the second
      snapshot as it has no obvious benefits.
      
      Fixes: f6adeae8 ("KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()")
      Signed-off-by: Tao Su <tao1.su@linux.intel.com>
      Message-ID: <20240528102234.2162763-1-tao1.su@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
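The effect of a second snapshot can be shown with a bare sequence counter. This is a heavily simplified model: `struct toy_vm`, `struct toy_fault`, `snapshot_seq()` and `fault_is_stale()` are hypothetical, and the kernel's real retry check considers more state than a single counter.

```c
#include <stdbool.h>

struct toy_vm {
    unsigned long mmu_invalidate_seq; /* bumped by every invalidation */
};

struct toy_fault {
    unsigned long mmu_seq; /* snapshot taken while handling the fault */
};

static void snapshot_seq(struct toy_fault *fault, const struct toy_vm *vm)
{
    fault->mmu_seq = vm->mmu_invalidate_seq;
}

/* True if an invalidation happened after the snapshot was taken. */
static bool fault_is_stale(const struct toy_fault *fault,
                           const struct toy_vm *vm)
{
    return fault->mmu_seq != vm->mmu_invalidate_seq;
}
```

If an invalidation lands after the first `snapshot_seq()` call, `fault_is_stale()` reports it; taking a second snapshot afterwards overwrites that evidence and the fault no longer looks stale.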
    • Merge tag 'kvmarm-fixes-6.10-1' of... · 45ce0314
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 6.10, take #1
      
      - Large set of FP/SVE fixes for pKVM, addressing the fallout
        from the per-CPU data rework and making sure that the host
        is not involved in the FP/SVE switching any more
      
      - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH
        is completely supported
      
      - Fix for the respective priorities of Failed PAC, Illegal
        Execution state and Instruction Abort exceptions
      
      - Fix the handling of AArch32 instruction traps failing their
        condition code, which was broken by the introduction of
        ESR_EL2.ISS2
      
      - Allow vCPUs running in AArch32 state to be restored in
        System mode
      
      - Fix AArch32 GPR restore that would lose the 64 bit state
        under some conditions
  4. 04 Jun, 2024 9 commits
  5. 03 Jun, 2024 8 commits
    • Merge branch 'kvm-6.11-sev-snp' into HEAD · ab978c62
      Paolo Bonzini authored
      Pull base x86 KVM support for running SEV-SNP guests from Michael Roth:
      
      * add some basic infrastructure and introduce a new KVM_X86_SNP_VM
        vm_type to handle differences versus the existing KVM_X86_SEV_VM and
        KVM_X86_SEV_ES_VM types.
      
      * implement the KVM API to handle the creation of a cryptographic
        launch context, encrypt/measure the initial image into guest memory,
        and finalize it before launching it.
      
      * implement handling for various guest-generated events such as page
        state changes, onlining of additional vCPUs, etc.
      
      * implement the gmem/mmu hooks needed to prepare gmem-allocated pages
        before mapping them into guest private memory ranges as well as
        cleaning them up prior to returning them to the host for use as
        normal memory. Because those cleanup hooks supplant certain
        activities like issuing WBINVDs during KVM MMU invalidations, avoid
        duplicating that work to avoid unnecessary overhead.
      
      This merge leaves out support for attestation guest requests
      and for loading the signing keys to be used for attestation requests.
    • Merge tag 'kvm-riscv-fixes-6.10-1' of https://github.com/kvm-riscv/linux into HEAD · b50788f7
      Paolo Bonzini authored
      KVM/riscv fixes for 6.10, take #1
      
      - No need to use mask when hart-index-bits is 0
      - Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
    • Merge branch 'kvm-fixes-6.10-1' into HEAD · b3233c73
      Paolo Bonzini authored
      * Fixes and debugging help for the #VE sanity check.  Also disable
        it by default, even for CONFIG_DEBUG_KERNEL, because it was found
        to trigger spuriously (most likely a processor erratum as the
        exact symptoms vary by generation).
      
      * Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
        situation (GIF=0 or interrupt shadow) when the processor supports
        virtual NMI.  While generally KVM will not request an NMI window
        when virtual NMIs are supported, in this case it *does* have to
        single-step over the interrupt shadow or enable the STGI intercept,
        in order to deliver the latched second NMI.
      
      * Drop support for hand tuning APIC timer advancement from userspace.
        Since we have adaptive tuning, and it has proved to work well,
        drop the module parameter for manual configuration and with it a
        few stupid bugs that it had.
    • Merge branch 'kvm-fixes-6.10-1' into HEAD · f9d1b541
      Paolo Bonzini authored
      * Fixes and debugging help for the #VE sanity check.  Also disable
        it by default, even for CONFIG_DEBUG_KERNEL, because it was found
        to trigger spuriously (most likely a processor erratum as the
        exact symptoms vary by generation).
      
      * Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
        situation (GIF=0 or interrupt shadow) when the processor supports
        virtual NMI.  While generally KVM will not request an NMI window
        when virtual NMIs are supported, in this case it *does* have to
        single-step over the interrupt shadow or enable the STGI intercept,
        in order to deliver the latched second NMI.
      
      * Drop support for hand tuning APIC timer advancement from userspace.
        Since we have adaptive tuning, and it has proved to work well,
        drop the module parameter for manual configuration and with it a
        few stupid bugs that it had.
    • KVM: x86: Drop support for hand tuning APIC timer advancement from userspace · 89a58812
      Sean Christopherson authored
      Remove support for specifying a static local APIC timer advancement value,
      and instead present a read-only boolean parameter to let userspace enable
      or disable KVM's dynamic APIC timer advancement.  Realistically, it's all
      but impossible for userspace to specify an advancement that is more
      precise than what KVM's adaptive tuning can provide.  E.g. a static value
      needs to be tuned for the exact hardware and kernel, and if KVM is using
      hrtimers, likely requires additional tuning for the exact configuration of
      the entire system.
      
      Dropping support for a userspace provided value also fixes several flaws
      in the interface.  E.g. KVM interprets a negative value other than -1 as a
      large advancement, toggling between a negative and positive value yields
      unpredictable behavior as vCPUs will switch from dynamic to static
      advancement, changing the advancement in the middle of VM creation can
      result in different values for vCPUs within a VM, etc.  Those flaws are
      mostly fixable, but there's almost no justification for taking on yet more
      complexity (it's minimal complexity, but still non-zero).
      
      The only arguments against using KVM's adaptive tuning are that a setup
      needs a higher maximum, or that the adjustments are too reactive, but
      those are arguments for letting userspace control the absolute max
      advancement and the granularity of each adjustment, e.g. similar to how
      KVM provides knobs for halt polling.
      
      Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com
      Cc: Shuling Zhou <zhoushuling@huawei.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240522010304.1650603-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV-ES: Delegate LBR virtualization to the processor · b7e4be0a
      Ravi Bangoria authored
      As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
      guests. Although KVM currently enforces LBRV for SEV-ES guests, there
      are multiple issues with it:
      
      o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR
        interception is used to dynamically toggle LBRV for performance reasons,
        this can be fatal for SEV-ES guests. For example, on an SEV-ES guest on Zen3:
      
        [guest ~]# wrmsr 0x1d9 0x4
        KVM: entry failed, hardware error 0xffffffff
        EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
      
        Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
        No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR
        is of swap type A.
      
      o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the
        VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA
        encryption.
      
      [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
           2023, Vol 2, 15.35.2 Enabling SEV-ES.
           https://bugzilla.kernel.org/attachment.cgi?id=304653
      
      Fixes: 376c6d28 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
      Co-developed-by: Nikunj A Dadhania <nikunj@amd.com>
      Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent · d9220562
      Ravi Bangoria authored
      As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
      guests. So, prevent SEV-ES guests when LBRV support is missing.
      
      [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
           2023, Vol 2, 15.35.2 Enabling SEV-ES.
           https://bugzilla.kernel.org/attachment.cgi?id=304653
      
      Fixes: 376c6d28 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SEV-ES: Prevent MSR access post VMSA encryption · 27bd5fdc
      Nikunj A Dadhania authored
      KVM currently allows userspace to read/write MSRs even after the VMSA is
      encrypted. This can cause unintentional issues if MSR access has side
      effects. For example, while migrating a guest, userspace could attempt to
      migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on
      the target. Fix this by preventing access to those MSRs which are context
      switched via the VMSA, once the VMSA is encrypted.
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
      Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>