- 31 Aug, 2023 30 commits
-
-
Sean Christopherson authored
Bail from ppgtt_populate_shadow_entry() if an unexpected GTT entry type is encountered instead of subtly falling through to the common "direct shadow" path. Eliminating the default/error path's reliance on the common handling will allow hoisting intel_gvt_dma_map_guest_page() into the case statements so that the 2MiB case can try intel_gvt_dma_map_guest_page() and fall back to splitting the entry on failure. Reviewed-by:
Zhi Wang <zhi.a.wang@intel.com> Tested-by:
Yongwei Ma <yongwei.ma@intel.com> Reviewed-by:
Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20230729013535.1070024-8-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
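The "bail on unexpected type" pattern described in this commit can be sketched in plain C as follows. This is a minimal userspace illustration, not the GVT-g code: the enum values, `map_guest_page()`, and `populate_shadow_entry()` are hypothetical stand-ins, and the stub mapping function always succeeds.

```c
#include <errno.h>

/* Hypothetical entry types; the real driver defines its own enum. */
enum gtt_type { GTT_PTE_4K, GTT_PTE_2M, GTT_PTE_1G, GTT_INVALID };

/* Hypothetical stand-in for the DMA mapping helper; always succeeds here. */
static int map_guest_page(unsigned long gfn, unsigned long size)
{
    (void)gfn;
    (void)size;
    return 0;
}

/*
 * Each supported case performs its own mapping; the default case
 * returns an error instead of falling through to a shared mapping path.
 */
static int populate_shadow_entry(enum gtt_type type, unsigned long gfn)
{
    switch (type) {
    case GTT_PTE_4K:
        return map_guest_page(gfn, 4096UL);
    case GTT_PTE_2M:
        return map_guest_page(gfn, 2UL << 20);
    default:
        /* Unexpected or unsupported entry type: bail out. */
        return -EINVAL;
    }
}
```

Because no case relies on code after the switch, a per-case fallback (such as splitting a 2MiB entry when mapping fails) can later be added without touching the error path.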
-
Sean Christopherson authored
Move the check that a vGPU is attached from is_2MB_gtt_possible() all the way up to shadow_ppgtt_mm() to avoid unnecessary work, and to make it more obvious that a future cleanup of is_2MB_gtt_possible() isn't introducing a bug.

is_2MB_gtt_possible() has only one caller, ppgtt_populate_shadow_entry(), and all paths in ppgtt_populate_shadow_entry() eventually check for attachment by way of intel_gvt_dma_map_guest_page().

And of the paths that lead to ppgtt_populate_shadow_entry(), shadow_ppgtt_mm() is the only one that doesn't already check for INTEL_VGPU_STATUS_ACTIVE or INTEL_VGPU_STATUS_ATTACHED.

  workload_thread() <= pick_next_workload() => INTEL_VGPU_STATUS_ACTIVE
  |
  -> dispatch_workload()
     |
     |-> prepare_workload()
         |
         -> intel_vgpu_sync_oos_pages()
         |  |
         |  |-> ppgtt_set_guest_page_sync()
         |      |
         |      |-> sync_oos_page()
         |          |
         |          |-> ppgtt_populate_shadow_entry()
         |
         |-> intel_vgpu_flush_post_shadow()
             |
  1:         |-> ppgtt_handle_guest_write_page_table()
                 |
                 |-> ppgtt_handle_guest_entry_add()
                     |
  2:                 | -> ppgtt_populate_spt_by_guest_entry()
                     |    |
                     |    |-> ppgtt_populate_spt()
                     |        |
                     |        |-> ppgtt_populate_shadow_entry()
                     |            |
                     |            |-> ppgtt_populate_spt_by_guest_entry() [see 2]
                     |
                     |-> ppgtt_populate_shadow_entry()

  kvmgt_page_track_write() <= KVM callback => INTEL_VGPU_STATUS_ATTACHED
  |
  |-> intel_vgpu_page_track_handler()
      |
      |-> ppgtt_write_protection_handler()
          |
          |-> ppgtt_handle_guest_write_page_table_bytes()
              |
              |-> ppgtt_handle_guest_write_page_table() [see 1]

Reviewed-by:
Yan Zhao <yan.y.zhao@intel.com> Tested-by:
Yan Zhao <yan.y.zhao@intel.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Put the struct page reference acquired by gfn_to_pfn(), as KVM's API is that the caller is ultimately responsible for dropping any reference. Note, kvm_release_pfn_clean() ensures the pfn is actually a refcounted struct page before trying to put any references. Fixes: b901b252 ("drm/i915/gvt: Add 2M huge gtt support") Reviewed-by:
Yan Zhao <yan.y.zhao@intel.com> Tested-by:
Yongwei Ma <yongwei.ma@intel.com> Reviewed-by:
Zhi Wang <zhi.a.wang@intel.com> Link: https://lore.kernel.org/r/20230729013535.1070024-6-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Yan Zhao authored
Attempt to unpin pages in the error path of gvt_pin_guest_page() if and only if at least one page was successfully pinned. Unpinning doesn't cause functional problems, but vfio_device_container_unpin_pages() rightfully warns about being asked to unpin zero pages. Signed-off-by:
Yan Zhao <yan.y.zhao@intel.com> [sean: write changelog] Reviewed-by:
Zhi Wang <zhi.a.wang@intel.com> Link: https://lore.kernel.org/r/20230729013535.1070024-5-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
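The "unpin if and only if at least one page was pinned" rule can be illustrated with a small userspace sketch. The helpers `pin_one()`, `unpin_pages()`, and `pin_range()` are hypothetical stand-ins for the VFIO pin/unpin calls, and the failure injection via `fail_at` exists only for demonstration.

```c
#include <stddef.h>

static int unpin_calls; /* counts how often the unpin stub is invoked */

/* Hypothetical pin primitive: fails when asked to pin page index fail_at. */
static int pin_one(size_t idx, size_t fail_at)
{
    return idx == fail_at ? -1 : 0;
}

/* Hypothetical unpin primitive; a real one would complain about npinned == 0. */
static void unpin_pages(size_t first, size_t npinned)
{
    (void)first;
    (void)npinned;
    unpin_calls++;
}

/* Pin pages [0, total); on failure, unwind only what was actually pinned. */
static int pin_range(size_t total, size_t fail_at)
{
    size_t npage;

    for (npage = 0; npage < total; npage++) {
        if (pin_one(npage, fail_at))
            goto err;
    }
    return 0;

err:
    /* Skip the unpin entirely when the very first pin failed. */
    if (npage)
        unpin_pages(0, npage);
    return -1;
}
```

With this guard, a failure on the first page never reaches the unpin helper, which is the situation the warning in the commit message is about.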
-
Sean Christopherson authored
When shadowing a GTT entry with a 2M page, verify that the pfns are contiguous, not just that the struct page pointers are contiguous. The memory map is virtually contiguous if "CONFIG_FLATMEM=y || CONFIG_SPARSEMEM_VMEMMAP=y", but not for "CONFIG_SPARSEMEM=y && CONFIG_SPARSEMEM_VMEMMAP=n", so theoretically KVMGT could encounter struct pages that are virtually contiguous, but not physically contiguous. In practice, this flaw is likely a non-issue as it would cause functional problems iff a section isn't 2M aligned _and_ is directly adjacent to another section with discontiguous pfns. Tested-by:
Yongwei Ma <yongwei.ma@intel.com> Reviewed-by:
Zhi Wang <zhi.a.wang@intel.com> Reviewed-by:
Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20230729013535.1070024-4-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
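A contiguity check on pfns rather than on struct page pointers might look like the sketch below. It is a userspace illustration only: `pfns_contiguous()` is a hypothetical helper operating on an array of already-resolved frame numbers, whereas the kernel code resolves each pfn from a guest frame number as it goes.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if pfns[0..n-1] form one physically contiguous run,
 * i.e. each entry is exactly one frame after its predecessor.
 */
static bool pfns_contiguous(const unsigned long *pfns, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (pfns[i] != pfns[0] + i)
            return false;
    }
    return true;
}
```

Comparing frame numbers directly sidesteps any assumption about how the memory map is laid out in virtual memory.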
-
Yan Zhao authored
Currently intel_gvt_is_valid_gfn() is called in two places:
(1) shadowing guest GGTT entry
(2) shadowing guest PPGTT leaf entry, which was introduced in commit cc753fbe ("drm/i915/gvt: validate gfn before set shadow page entry").

However, now it's not necessary to call this interface any more, because

a. GGTT partial write issue has been fixed by
   commit bc0686ff ("drm/i915/gvt: support inconsecutive partial gtt entry write")
   commit 510fe10b ("drm/i915/gvt: fix a bug of partially write ggtt enties")

b. PPGTT resides in normal guest RAM and we only treat 8-byte writes as valid page table writes. Any invalid GPA found is regarded as an error, either due to guest misbehavior/attack or bug in host shadow code. So, rather than do GFN pre-checking and replace invalid GFNs with scratch GFN and continue silently, just remove the pre-checking and abort PPGTT shadowing when an error is detected.

c. GFN validity check is still performed in intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page(). It's more desirable to call VFIO interface to do both validity check and mapping. Calling intel_gvt_is_valid_gfn() to do GFN validity check from KVM side while later mapping the GFN through VFIO interface is unnecessarily fragile and confusing for unaware readers.

Signed-off-by:
Yan Zhao <yan.y.zhao@intel.com> [sean: remove now-unused local variables] Acked-by:
Zhi Wang <zhi.a.wang@intel.com> Tested-by:
Yongwei Ma <yongwei.ma@intel.com> Link: https://lore.kernel.org/r/20230729013535.1070024-3-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Check that the pfn found by gfn_to_pfn() is actually backed by "struct page" memory prior to retrieving and dereferencing the page. KVM supports backing guest memory with VM_PFNMAP, VM_IO, etc., and so there is no guarantee the pfn returned by gfn_to_pfn() has an associated "struct page". Fixes: b901b252 ("drm/i915/gvt: Add 2M huge gtt support") Reviewed-by:
Yan Zhao <yan.y.zhao@intel.com> Tested-by:
Yongwei Ma <yongwei.ma@intel.com> Reviewed-by:
Zhi Wang <zhi.a.wang@intel.com> Link: https://lore.kernel.org/r/20230729013535.1070024-2-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Introduce KVM_BUG_ON_DATA_CORRUPTION() and use it in the low-level rmap helpers to convert the existing BUG()s to WARN_ON_ONCE() when the kernel is built with CONFIG_BUG_ON_DATA_CORRUPTION=n, i.e. does NOT want to BUG() on corruption of host kernel data structures. Environments that don't have infrastructure to automatically capture crash dumps, i.e. aren't likely to enable CONFIG_BUG_ON_DATA_CORRUPTION=y, are typically better served overall by WARN-and-continue behavior (for the kernel, the VM is dead regardless), as a BUG() while holding mmu_lock all but guarantees the _best_ case scenario is a panic(). Make the BUG()s conditional instead of removing/replacing them entirely as there's a non-zero chance (though by no means a guarantee) that the damage isn't contained to the target VM, e.g. if no rmap is found for a SPTE then KVM may be double-zapping the SPTE, i.e. has already freed the memory the SPTE pointed at and thus KVM is reading/writing memory that KVM no longer owns. Link: https://lore.kernel.org/all/20221129191237.31447-1-mizhang@google.com Suggested-by:
Mingwei Zhang <mizhang@google.com> Cc: David Matlack <dmatlack@google.com> Cc: Jim Mattson <jmattson@google.com> Reviewed-by:
Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230729004722.1056172-13-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
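The conditional behavior described for KVM_BUG_ON_DATA_CORRUPTION() can be modeled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel macro: `BUG_ON_DATA_CORRUPTION` is a hypothetical compile-time switch standing in for the Kconfig option, `struct vm` stands in for "struct kvm", and `abort()` plays the role of BUG().

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build-time switch modeling CONFIG_BUG_ON_DATA_CORRUPTION. */
#ifndef BUG_ON_DATA_CORRUPTION
#define BUG_ON_DATA_CORRUPTION 0
#endif

struct vm {
    bool bugged; /* once set, the VM is considered dead */
};

/*
 * If cond is true: either crash hard (switch enabled) or warn once per VM
 * and mark only that VM as dead. Returns cond so callers can bail out.
 */
static bool check_data_corruption(struct vm *vm, bool cond, const char *what)
{
    if (!cond)
        return false;

    if (BUG_ON_DATA_CORRUPTION) {
        fprintf(stderr, "BUG: %s\n", what);
        abort();
    }

    if (!vm->bugged)
        fprintf(stderr, "WARN: %s\n", what);
    vm->bugged = true;
    return true;
}
```

A caller would typically test the return value and skip the operation that depends on the corrupted structure.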
-
Mingwei Zhang authored
Plumb "struct kvm" all the way to pte_list_remove() to allow the usage of KVM_BUG() and/or KVM_BUG_ON(). This will allow killing only the offending VM instead of doing BUG() if the kernel is built with CONFIG_BUG_ON_DATA_CORRUPTION=n, i.e. does NOT want to BUG() if KVM's data structures (rmaps) appear to be corrupted. Signed-off-by:
Mingwei Zhang <mizhang@google.com> [sean: tweak changelog] Link: https://lore.kernel.org/r/20230729004722.1056172-12-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use BUILD_BUG_ON_INVALID() instead of an empty do-while loop to stub out KVM_MMU_WARN_ON() when CONFIG_KVM_PROVE_MMU=n, that way _some_ build issues with the usage of KVM_MMU_WARN_ON() will be detected even if the kernel is using the stubs, e.g. basic syntax errors will be detected. Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20230729004722.1056172-11-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
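The idea behind stubbing an assertion with a sizeof-based expression, so the compiler still parses the condition without evaluating it, can be sketched as below. `SKETCH_MMU_WARN_ON` and `PROVE_MMU` are hypothetical names for this illustration; the kernel's own macros differ in detail.

```c
#ifdef PROVE_MMU
#include <stdio.h>
/* Debug build: evaluate the condition and report when it is true. */
#define SKETCH_MMU_WARN_ON(x) \
    ((x) ? (fprintf(stderr, "warn: %s\n", #x), 1) : 0)
#else
/*
 * Stub: the operand of sizeof is type-checked by the compiler but
 * never evaluated, so typos and undeclared names still break the build.
 */
#define SKETCH_MMU_WARN_ON(x) ((void)sizeof((long)(x)))
#endif

static int count_calls;

static int side_effect(void)
{
    return ++count_calls;
}

/* Returns how many times the condition was evaluated at run time. */
static int evaluations_in_stub_build(void)
{
    count_calls = 0;
    SKETCH_MMU_WARN_ON(side_effect() > 0);
    return count_calls;
}
```

When `PROVE_MMU` is not defined, replacing `side_effect()` with an undeclared function name fails to compile, whereas an empty `do { } while (0)` stub would silently accept it.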
-
Sean Christopherson authored
Replace MMU_DEBUG, which requires manually modifying KVM to enable the macro, with a proper Kconfig, KVM_PROVE_MMU. Now that pgprintk() and rmap_printk() are gone, i.e. the macro guards only KVM_MMU_WARN_ON() and won't flood the kernel logs, enabling the option for debug kernels is both desirable and feasible. Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20230729004722.1056172-10-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Promote the ASSERT(), which is quite dead code in KVM, into a KVM_BUG_ON() for KVM's sanity check that CR4.PAE=1 if the vCPU is in long mode when performing a walk of guest page tables. The sanity check is quite cheap since neither EFER nor CR4.PAE requires a VMREAD, especially relative to the cost of walking the guest page tables. More importantly, the sanity check would have prevented the true badness fixed by commit 112e6601 ("KVM: nVMX: add missing consistency checks for CR0 and CR4"). The missed consistency check resulted in some versions of KVM corrupting the on-stack guest_walker structure due to KVM thinking there are 4/5 levels of page tables, but wiring up the MMU hooks to point at the paging32 implementation, which only allocates space for two levels of page tables in "struct guest_walker32". Queue a page fault for injection if the assertion fails, as both callers, FNAME(gva_to_gpa) and FNAME(walk_addr_generic), assume that walker.fault contains sane info on a walk failure. E.g. not populating the fault info could result in KVM consuming and/or exposing uninitialized stack data before the vCPU is kicked out to userspace, which doesn't happen until KVM checks for KVM_REQ_VM_DEAD on the next enter. Move the check below the initialization of "pte_access" so that the aforementioned to-be-injected page fault doesn't consume uninitialized stack data. The information _shouldn't_ reach the guest or userspace, but there's zero downside to being paranoid in this case. Link: https://lore.kernel.org/r/20230729004722.1056172-9-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Convert all "runtime" assertions, i.e. assertions that can be triggered while running vCPUs, from WARN_ON() to WARN_ON_ONCE(). Every WARN in the MMU that is tied to running vCPUs, i.e. not contained to loading and initializing KVM, is likely to fire _a lot_ when it does trigger. E.g. if KVM ends up with a bug that causes a root to be invalidated before the page fault handler is invoked, pretty much _every_ page fault VM-Exit triggers the WARN. If a WARN is triggered frequently, the resulting spam usually causes a lot of damage of its own, e.g. consumes resources to log the WARN and pollutes the kernel log, often to the point where other useful information can be lost. In many cases, the damage caused by the spam is actually worse than the bug itself, e.g. KVM can almost always recover from an unexpectedly invalid root. On the flip side, warning every time is rarely helpful for debug and triage, i.e. a single splat is usually sufficient to point a debugger in the right direction, and automated testing, e.g. syzkaller, typically runs with panic_on_warn=1, i.e. will never get past the first WARN anyways. Lastly, when an assertion fails multiple times, the stack traces in KVM are almost always identical, i.e. the full splat only needs to be captured once. And _if_ there is value in capturing information about the failed assert, a ratelimited printk() is sufficient and less likely to rack up a large amount of collateral damage. Link: https://lore.kernel.org/r/20230729004722.1056172-8-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
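The difference between warning on every hit and warning once per call site can be sketched in userspace as follows. `warn_on_once()` and `handle_faults()` are hypothetical helpers for this illustration; the kernel's WARN_ON_ONCE() keeps its per-site flag inside the macro rather than taking it as an argument.

```c
#include <stdbool.h>
#include <stdio.h>

static int splats; /* number of warnings actually emitted */

/*
 * Report the condition only the first time it fires for a given flag,
 * but always return the condition so callers can still react to it.
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
    if (cond && !*warned) {
        *warned = true;
        splats++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Simulate nr_faults page faults that all hit the same broken invariant. */
static int handle_faults(int nr_faults)
{
    static bool warned; /* one flag per call site */
    int recovered = 0;

    for (int i = 0; i < nr_faults; i++) {
        if (warn_on_once(true, &warned, "unexpectedly invalid root"))
            recovered++; /* recover and carry on without further logging */
    }
    return recovered;
}
```

Every fault is still detected and handled, but the log receives a single splat instead of one per fault.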
-
Sean Christopherson authored
Rename MMU_WARN_ON() to make it super obvious that the assertions are all about KVM's MMU, not the primary MMU. Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20230729004722.1056172-7-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Massage the error message for the sanity check on SPTEs when freeing a shadow page to be more verbose, and to print out all shadow-present SPTEs, not just the first SPTE encountered. Printing all SPTEs can be quite valuable for debug, e.g. highlights whether the leak is a one-off or widespread, or possibly the result of memory corruption (something else in the kernel stomping on KVM's SPTEs). Opportunistically move the MMU_WARN_ON() into the helper itself, which will allow a future cleanup to use BUILD_BUG_ON_INVALID() as the stub for MMU_WARN_ON(). BUILD_BUG_ON_INVALID() works as intended and results in the compiler complaining about is_empty_shadow_page() not being declared. Link: https://lore.kernel.org/r/20230729004722.1056172-6-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Replace the pointer arithmetic used to iterate over SPTEs in is_empty_shadow_page() with more standard integer-based iteration. No functional change intended. Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20230729004722.1056172-5-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
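The two iteration styles mentioned in this commit can be compared side by side in a short sketch. The function names, the 512-entry page size constant, and the "non-zero means present" test are simplifications chosen for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SPTES_PER_PAGE 512 /* assumed: 4KiB page of 8-byte entries */

/* Pointer-arithmetic style: walk a cursor until it reaches the end pointer. */
static bool is_empty_ptr(const uint64_t *spt)
{
    const uint64_t *pos, *end;

    for (pos = spt, end = pos + SKETCH_SPTES_PER_PAGE; pos != end; pos++) {
        if (*pos)
            return false;
    }
    return true;
}

/* Integer-index style: same check, and the index is available for logging. */
static bool is_empty_idx(const uint64_t *spt)
{
    for (size_t i = 0; i < SKETCH_SPTES_PER_PAGE; i++) {
        if (spt[i])
            return false;
    }
    return true;
}
```

Both functions return the same result for any input; the indexed form makes it straightforward to report which entry was non-empty.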
-
Sean Christopherson authored
Delete KVM's "dbg" module param now that its usage in KVM is gone (it used to guard pgprintk() and rmap_printk()). Link: https://lore.kernel.org/r/20230729004722.1056172-4-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Delete rmap_printk() so that MMU_WARN_ON() and MMU_DEBUG can be morphed into something that can be regularly enabled for debug kernels. The information provided by rmap_printk() isn't all that useful now that the rmap and unsync code is mature, as the prints are simultaneously too verbose (_lots_ of messages) and yet not verbose enough to be helpful for debug (most instances print just the SPTE pointer/value, which is rarely sufficient to root cause anything but trivial bugs). Alternatively, rmap_printk() could be reworked into tracepoints, but it's not clear there is a real need as rmap bugs rarely escape initial development, and when bugs do escape to production, they are often edge cases and/or reside in code that isn't directly related to the rmaps. In other words, the problems with rmap_printk() being unhelpful also apply to tracepoints. And deleting rmap_printk() doesn't preclude adding tracepoints in the future. Link: https://lore.kernel.org/r/20230729004722.1056172-3-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Delete KVM's pgprintk() and all its usage, as the code is very prone to bitrot due to being buried behind MMU_DEBUG, and the functionality has been rendered almost entirely obsolete by the tracepoints KVM has gained over the years. And for the situations where the information provided by KVM's tracepoints is insufficient, pgprintk() rarely fills in the gaps, and is almost always far too noisy, i.e. developers end up implementing custom prints anyways. Link: https://lore.kernel.org/r/20230729004722.1056172-2-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add an assertion in kvm_mmu_page_fault() to ensure the error code provided by hardware doesn't conflict with KVM's software-defined IMPLICIT_ACCESS flag. In the unlikely scenario that future hardware starts using bit 48 for a hardware-defined flag, preserving the bit could result in KVM incorrectly interpreting the unknown flag as KVM's IMPLICIT_ACCESS flag. WARN so that any such conflict can be surfaced to KVM developers and resolved, but otherwise ignore the bit as KVM can't possibly rely on a flag it knows nothing about. Fixes: 4f4aa80e ("KVM: X86: Handle implicit supervisor access with SMAP") Acked-by:
Kai Huang <kai.huang@intel.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230721223711.2334426-1-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
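The "warn about the conflict, then ignore the bit" handling described above can be sketched as a small filter over the hardware-provided error code. `SKETCH_IMPLICIT_ACCESS` and `sanitize_error_code()` are hypothetical names; only the use of bit 48 comes from the commit message.

```c
#include <stdint.h>

/* Software-defined flag living in bit 48 of the 64-bit error code. */
#define SKETCH_IMPLICIT_ACCESS (1ULL << 48)

static int conflicts_seen; /* stands in for a one-time warning */

/*
 * Hardware is not expected to set the software-defined bit. If it does,
 * record the conflict and strip the bit so later code cannot mistake it
 * for the software flag.
 */
static uint64_t sanitize_error_code(uint64_t hw_error_code)
{
    if (hw_error_code & SKETCH_IMPLICIT_ACCESS) {
        conflicts_seen++;
        hw_error_code &= ~SKETCH_IMPLICIT_ACCESS;
    }
    return hw_error_code;
}
```

All other bits pass through untouched, so existing hardware-defined flags keep their meaning.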
-
Like Xu authored
Move the lockdep_assert_held_write(&kvm->mmu_lock) from its only caller, kvm_tdp_mmu_clear_dirty_pt_masked(), to inside clear_dirty_pt_masked(). This change makes it more obvious why it's safe for clear_dirty_pt_masked() to use the non-atomic (for non-volatile SPTEs) tdp_mmu_clear_spte_bits() helper. for_each_tdp_mmu_root() does its own lockdep, so the only "loss" in lockdep coverage is if the list is completely empty. Suggested-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230627042639.12636-1-likexu@tencent.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
https://github.com/kvm-x86/linux
Paolo Bonzini authored
KVM x86 changes for 6.6:
- Misc cleanups
- Retry APIC optimized recalculation if a vCPU is added/enabled
- Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM
- Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR can diverge from the default iff TSC scaling is enabled, and clean up related code
- Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID
-
https://github.com/kvm-x86/linux
Paolo Bonzini authored
KVM: x86: SVM changes for 6.6:
- Add support for SEV-ES DebugSwap, i.e. allow SEV-ES guests to use debug registers and generate/handle #DBs
- Clean up LBR virtualization code
- Fix a bug where KVM fails to set the target pCPU during an IRTE update
- Fix fatal bugs in SEV-ES intrahost migration
- Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it)
-
https://github.com/kvm-x86/linux
Paolo Bonzini authored
KVM: x86: VMX changes for 6.6:
- Misc cleanups
- Fix a bug where KVM reads a stale vmcs.IDT_VECTORING_INFO_FIELD when trying to handle NMI VM-Exits
-
https://github.com/kvm-x86/linux
Paolo Bonzini authored
KVM x86 PMU changes for 6.6:
- Clean up KVM's handling of Intel architectural events
-
https://github.com/kvm-riscv/linux
Paolo Bonzini authored
KVM/riscv changes for 6.6
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for Guest/VM
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
-
Paolo Bonzini authored
Merge tag 'kvm-s390-next-6.6-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
- PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
  Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) loses access until it is not bound anymore. Enablement is done via flags when creating the PV VM.
- Guest debug fixes (Ilya)
-
https://github.com/kvm-x86/linux
Paolo Bonzini authored
KVM: x86: Selftests changes for 6.6:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and convert all guest asserts to use printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry
-
https://github.com/kvm-x86/linux
Paolo Bonzini authored
Common KVM changes for 6.6:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers.
- Drop unused function declarations
-
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
Paolo Bonzini authored
KVM/arm64 updates for Linux 6.6
- Add support for TLB range invalidation of Stage-2 page tables, avoiding unnecessary invalidations. Systems that do not implement range invalidation still rely on a full invalidation when dealing with large ranges.
- Add infrastructure for forwarding traps taken from a L2 guest to the L1 guest, with L0 acting as the dispatcher, another baby step towards the full nested support.
- Simplify the way we deal with the (long deprecated) 'CPU target', resulting in a much needed cleanup.
- Fix another set of PMU bugs, both on the guest and host sides, as we seem to never have any shortage of those...
- Relax the alignment requirements of EL2 VA allocations for non-stack allocations, as we were otherwise wasting a lot of that precious VA space.
- The usual set of non-functional cleanups, although I note the lack of spelling fixes...
-
- 29 Aug, 2023 1 commit
-
-
Sean Christopherson authored
Reset the mask of available "registers" and refresh the IDT vectoring info snapshot in vmx_vcpu_enter_exit(), before KVM potentially handles an NMI VM-Exit. One of the "registers" that KVM VMX lazily loads is the vmcs.VM_EXIT_INTR_INFO field, which holds the vector+type on "exception or NMI" VM-Exits, i.e. is needed to identify NMIs. Clearing the available registers bitmask after handling NMIs results in KVM querying info from the last VM-Exit that read vmcs.VM_EXIT_INTR_INFO, and leads to both missed NMIs and spurious NMIs in the host. Opportunistically grab vmcs.IDT_VECTORING_INFO_FIELD early in the VM-Exit path too, e.g. to guard against similar consumption of stale data. The field is read on every "normal" VM-Exit, and there's no point in delaying the inevitable. Reported-by:
Like Xu <like.xu.linux@gmail.com> Fixes: 11df586d ("KVM: VMX: Handle NMI VM-Exits in noinstr region") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230825014532.2846714-1-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com>
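The stale-read hazard with a lazily loaded register cache can be modeled in a few lines of userspace C. Everything here is a hypothetical simplification: `struct vcpu`, `read_reg()`, and `on_vm_exit()` stand in for the VMX register cache, and the `hw` array plays the role of the VMCS.

```c
#include <stdint.h>

enum { REG_EXIT_INTR_INFO, NR_REGS };

struct vcpu {
    uint32_t avail;          /* bit N set => cache[N] is valid */
    uint32_t cache[NR_REGS]; /* lazily filled copies */
    uint32_t hw[NR_REGS];    /* stand-in for the hardware structure */
    int hw_reads;            /* counts the expensive reads */
};

/* Return the cached value, reading "hardware" only on the first access. */
static uint32_t read_reg(struct vcpu *v, int reg)
{
    if (!(v->avail & (1u << reg))) {
        v->cache[reg] = v->hw[reg];
        v->hw_reads++;
        v->avail |= 1u << reg;
    }
    return v->cache[reg];
}

/*
 * Model a new exit: hardware state changes, so the availability mask
 * must be cleared before any handler consults the cache.
 */
static void on_vm_exit(struct vcpu *v, uint32_t intr_info)
{
    v->hw[REG_EXIT_INTR_INFO] = intr_info;
    v->avail = 0;
}
```

If the mask is cleared only after an early handler has already called `read_reg()`, that handler sees the value cached during a previous exit, which is the class of bug the commit addresses.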
-
- 28 Aug, 2023 9 commits
-
-
Steffen Eiden authored
Introduces new feature bits and enablement flags for AP and AP IRQ support. Signed-off-by:
Steffen Eiden <seiden@linux.ibm.com> Reviewed-by:
Janosch Frank <frankja@linux.ibm.com> Reviewed-by:
Michael Mueller <mimu@linux.ibm.com> Signed-off-by:
Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-5-seiden@linux.ibm.com Message-Id: <20230815151415.379760-5-seiden@linux.ibm.com>
-
Steffen Eiden authored
Add a uv_feature list for pv-guests to the KVM cpu-model. The feature bits 'AP-interpretation for secure guests' and 'AP-interrupt for secure guests' are available. Signed-off-by:
Steffen Eiden <seiden@linux.ibm.com> Reviewed-by:
Janosch Frank <frankja@linux.ibm.com> Reviewed-by:
Michael Mueller <mimu@linux.ibm.com> Signed-off-by:
Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-4-seiden@linux.ibm.com Message-Id: <20230815151415.379760-4-seiden@linux.ibm.com>
-
Steffen Eiden authored
Introduces a function to check the existence of an UV feature. Refactor feature bit checks to use the new function. Signed-off-by:
Steffen Eiden <seiden@linux.ibm.com> Reviewed-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by:
Janosch Frank <frankja@linux.ibm.com> Signed-off-by:
Janosch Frank <frankja@linux.ibm.com> Reviewed-by:
Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-3-seiden@linux.ibm.com Message-Id: <20230815151415.379760-3-seiden@linux.ibm.com>
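A single helper that tests for a feature bit, replacing open-coded bit checks, might look like the sketch below. This is an illustration only: `has_feature()` is a hypothetical name, the feature indications are modeled as one 64-bit word, and the MSB-first numbering (bit 0 is the most significant bit) is an assumption made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Test one feature bit in a 64-bit indications word, assuming MSB-first
 * numbering: feature_bit 0 is the top bit, feature_bit 63 the lowest.
 */
static bool has_feature(uint64_t indications, unsigned int feature_bit)
{
    if (feature_bit > 63)
        return false;
    return (indications >> (63 - feature_bit)) & 1;
}
```

Centralizing the check keeps the bit-numbering convention in one place instead of repeating shifts and masks at every call site.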
-
Viktor Mihajlovski authored
Destroy configuration fast may return with RC 0x104 if there are still bound APQNs in the configuration. The final cleanup will occur with the standard destroy configuration UVC as at this point in time all APQNs have been reset and thus unbound. Therefore, don't warn if RC 0x104 is reported. Signed-off-by:
Viktor Mihajlovski <mihajlov@linux.ibm.com> Reviewed-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by:
Janosch Frank <frankja@linux.ibm.com> Reviewed-by:
Steffen Eiden <seiden@linux.ibm.com> Reviewed-by:
Michael Mueller <mimu@linux.ibm.com> Link: https://lore.kernel.org/r/20230815151415.379760-2-seiden@linux.ibm.com Signed-off-by:
Janosch Frank <frankja@linux.ibm.com> Message-ID: <20230815151415.379760-2-seiden@linux.ibm.com>
-
Janosch Frank authored
The Secure Execution AP support makes it possible for SE VMs to securely use APQNs without a third party being able to snoop IO. VMs first bind to an APQN to securely attach it, which grants protected key crypto function access. Afterwards they can associate the APQN, which grants them clear key crypto function access. Once bound, the APQNs are not accessible to the host until a reset is performed. The vfio-ap patches being merged here provide the base hypervisor Secure Execution / Protected Virtualization AP support. This includes proper handling of APQNs that are securely attached to a SE/PV guest, especially regarding resets.
-
Ilya Leoshkevich authored
Test different variations of single-stepping into interrupts:
- SVC and PGM interrupts;
- Interrupts generated by ISKE;
- Interrupts generated by instructions emulated by KVM;
- Interrupts generated by instructions emulated by userspace.
Reviewed-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by:
Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20230725143857.228626-7-iii@linux.ibm.com> Signed-off-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> [frankja@de.igm.com: s/ASSERT_EQ/TEST_ASSERT_EQ/ because function was renamed in the selftest printf series] Signed-off-by:
Janosch Frank <frankja@linux.ibm.com>
-
Ilya Leoshkevich authored
kvm_s390_skey_check_enable() does not emulate any instructions, rather, it clears CPUSTAT_KSS and arranges the instruction that caused the exit (e.g., ISKE, SSKE, RRBE or LPSWE with a keyed PSW) to run again. Therefore, skip the PER check and let the instruction execution happen. Otherwise, a debugger will see two single-step events on the same instruction. Reviewed-by:
Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20230725143857.228626-6-iii@linux.ibm.com> Signed-off-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by:
Janosch Frank <frankja@linux.ibm.com>
-
Ilya Leoshkevich authored
Single-stepping a userspace-emulated instruction that generates an interrupt causes GDB to land on the instruction following it instead of the respective interrupt handler. The reason is that after arranging a KVM_EXIT_S390_SIEIC exit, kvm_handle_sie_intercept() calls kvm_s390_handle_per_ifetch_icpt(), which sets KVM_GUESTDBG_EXIT_PENDING. This bit, however, is not processed immediately, but rather persists until the next ioctl(), causing a spurious single-step exit. Fix by clearing this bit in ioctl(). Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by:
Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20230725143857.228626-5-iii@linux.ibm.com> Signed-off-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by:
Janosch Frank <frankja@linux.ibm.com>
-
Ilya Leoshkevich authored
Single-stepping a kernel-emulated instruction that generates an interrupt causes GDB to land on the instruction following it instead of the respective interrupt handler. The reason is that kvm_handle_sie_intercept(), after injecting the interrupt, also processes the PER event and arranges a KVM_SINGLESTEP exit. The interrupt is not yet delivered, however, so the userspace sees the next instruction. Fix by avoiding the KVM_SINGLESTEP exit when there is a pending interrupt. The next __vcpu_run() loop iteration will arrange a KVM_SINGLESTEP exit after delivering the interrupt. Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by:
Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20230725143857.228626-4-iii@linux.ibm.com> Signed-off-by:
Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by:
Janosch Frank <frankja@linux.ibm.com>
-