- 26 Sep, 2022 40 commits
-
Paolo Bonzini authored
kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore it cannot sleep. Writing to guest memory is therefore forbidden, but it can happen on AMD processors if kvm_check_nested_events() causes a vmexit.

Fortunately, all events that are caught by kvm_check_nested_events() are also recognized by kvm_vcpu_has_events() through vendor callbacks such as kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so remove the call and postpone the actual processing to vcpu_block().

Opportunistically honor the return of kvm_check_nested_events(). KVM punted on the check in kvm_vcpu_running() because the only error path is if vmx_complete_nested_posted_interrupt() fails, in which case KVM exits to userspace with "internal error" i.e. the VM is likely dead anyways so it wasn't worth overloading the return of kvm_vcpu_running(). Add the check mostly so that KVM is consistent with itself; the return of the call via kvm_apic_accept_events()=>kvm_check_nested_events() that immediately follows _is_ checked.

Reported-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> [sean: check and handle return of kvm_check_nested_events()] Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-11-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Don't snapshot pending INIT/SIPI events prior to checking nested events; architecturally there's nothing wrong with KVM processing (dropping) a SIPI that is received immediately after synthesizing a VM-Exit. Taking and consuming the snapshot makes the flow way more subtle than it needs to be, e.g. nVMX consumes/clears events that trigger VM-Exit (INIT/SIPI), and so at first glance it appears that KVM is double-dipping on pending INITs and SIPIs. But that's not the case, because INIT is blocked unconditionally in VMX root mode and the CPU cannot be in wait-for-SIPI after VM-Exit, i.e. the paths that truly consume the snapshot are unreachable if apic->pending_events is modified by kvm_check_nested_events().

nSVM is a similar story as GIF is cleared by the CPU on VM-Exit; INIT is blocked regardless of whether or not it was pending prior to VM-Exit.

Drop the snapshot logic so that a future fix doesn't create weirdness when kvm_vcpu_running()'s call to kvm_check_nested_events() is moved to vcpu_block(). In that case, kvm_check_nested_events() will be called immediately before kvm_apic_accept_events(), which raises the obvious question of why that change doesn't break the snapshot logic.

Note, there is a subtle functional change. Previously, KVM would clear pending SIPIs if and only if a SIPI was pending prior to VM-Exit, whereas now KVM clears pending SIPIs unconditionally if INIT+SIPI are blocked. The latter is architecturally allowed, as SIPI is ignored if the CPU is not in wait-for-SIPI mode (arguably, KVM should be even more aggressive in dropping SIPIs). It is software's responsibility to ensure the SIPI is delivered, i.e. software shouldn't be firing INIT-SIPI at a CPU until it knows with 100% certainty that the target CPU isn't in VMX root mode. Furthermore, the existing code is extra weird as SIPIs that arrive after VM-Exit _are_ dropped if there also happened to be a pending SIPI before VM-Exit.

Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-10-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Explicitly check for a pending INIT/SIPI event when emulating VMXOFF instead of blindly making an event request. There's obviously no need to evaluate events if none are pending. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-9-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
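For reference, a minimal sketch of the shape of this change, assuming the INIT/SIPI helper renamed elsewhere in this series and an assumed call site in the VMXOFF handler:

    /* In handle_vmxoff() (sketch): INIT becomes unblocked when leaving VMX
     * root mode, so re-evaluate events only if an INIT/SIPI is actually
     * pending instead of unconditionally requesting an event check.
     */
    if (kvm_apic_has_pending_init_or_sipi(vcpu))
        kvm_make_request(KVM_REQ_EVENT, vcpu);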
-
Sean Christopherson authored
Evaluate interrupts, i.e. set KVM_REQ_EVENT, if INIT or SIPI is pending when emulating nested VM-Enter. INIT is blocked while the CPU is in VMX root mode, but not in VMX non-root, i.e. becomes unblocked on VM-Enter. This bug has been masked by KVM calling ->check_nested_events() in the core run loop, but that hack will be fixed in the near future. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-8-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Set KVM_REQ_EVENT if INIT or SIPI is pending when the guest enables GIF. INIT in particular is blocked when GIF=0 and needs to be processed when GIF is toggled to '1'. This bug has been masked by (a) KVM calling ->check_nested_events() in the core run loop and (b) hypervisors toggling GIF from 0=>1 only when entering guest mode (L1 entering L2). Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-7-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
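A rough sketch of the fix, assuming the check lands in svm_set_gif() when GIF is toggled to '1' (the parameter name and surrounding context are assumptions):

    /* Sketch: INIT is blocked while GIF=0 and becomes deliverable the
     * moment GIF is set, so re-evaluate pending events on the 0=>1
     * transition.
     */
    if (value /* GIF 0=>1 */ && kvm_apic_has_pending_init_or_sipi(vcpu))
        kvm_make_request(KVM_REQ_EVENT, vcpu);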
-
Paolo Bonzini authored
Do not return true from kvm_vcpu_has_events() if the vCPU isn't going to immediately process a pending INIT/SIPI. INIT/SIPI shouldn't be treated as wake events if they are blocked. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> [sean: rebase onto refactored INIT/SIPI helpers, massage changelog] Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-6-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename kvm_apic_has_events() to kvm_apic_has_pending_init_or_sipi() so that it's more obvious that "events" really just means "INIT or SIPI". Opportunistically clean up a weirdly worded comment that referenced kvm_apic_has_events() instead of kvm_apic_accept_events(). No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-5-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename and invert kvm_vcpu_latch_init() to kvm_apic_init_sipi_allowed() so as to match the behavior of {interrupt,nmi,smi}_allowed(), and expose the helper so that it can be used by kvm_vcpu_has_events() to determine whether or not an INIT or SIPI is pending _and_ can be taken immediately. Opportunistically replace usage of the "latch" terminology with "blocked" and/or "allowed", again to align with KVM's terminology used for all other event types. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-4-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
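A minimal sketch of the renamed and inverted helper, under the assumption that it is the straight inversion of the old kvm_vcpu_latch_init() (SMM and the vendor hook are what block INIT/SIPI):

    static inline bool kvm_apic_init_sipi_allowed(struct kvm_vcpu *vcpu)
    {
        /* INIT/SIPI are blocked in SMM and whenever the vendor code says
         * so, e.g. VMX root mode, or SVM with GIF=0.
         */
        return !is_smm(vcpu) &&
               !static_call(kvm_x86_apic_init_signal_blocked)(vcpu);
    }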
-
Sean Christopherson authored
Set KVM_REQ_EVENT when MTF becomes pending to ensure that KVM will run through inject_pending_event() and thus vmx_check_nested_events() prior to re-entering the guest. MTF currently works by virtue of KVM's hack that calls kvm_check_nested_events() from kvm_vcpu_running(), but that hack will be removed in the near future. Until that call is removed, the patch introduces no real functional change. Fixes: 5ef8acbd ("KVM: nVMX: Emulate MTF when performing instruction emulation") Cc: stable@vger.kernel.org Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-3-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Interrupts, NMIs etc. sent while in guest mode are already handled properly by the *_interrupt_allowed callbacks, but other events can cause a vCPU to be runnable that are specific to guest mode. In the case of VMX there are two: the preemption timer and the monitor trap flag (MTF). The VMX preemption timer is already special cased via the hv_timer_pending callback, but the purpose of the callback can be easily extended to MTF or in fact any other event that can occur only in guest mode. Rename the callback and add an MTF check; kvm_arch_vcpu_runnable() now can return true if an MTF is pending, without relying on kvm_vcpu_running()'s call to kvm_check_nested_events(). Until that call is removed, however, the patch introduces no functional change. Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-2-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
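The renamed callback might look roughly like the sketch below; the function and field names are assumptions based on the changelog, i.e. the existing preemption timer check plus the new MTF check:

    static bool vmx_has_nested_events(struct kvm_vcpu *vcpu)
    {
        /* Events that exist only in guest mode: the VMX preemption timer
         * and a pending Monitor Trap Flag VM-Exit.
         */
        return nested_vmx_preemption_timer_pending(vcpu) ||
               to_vmx(vcpu)->nested.mtf_pending;
    }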
-
Oliver Upton authored
While I'm still at Google, I've since switched to a linux.dev account for working upstream. Add an alias to the new address. Signed-off-by:
Oliver Upton <oliver.upton@linux.dev> Message-Id: <20220819190158.234290-1-oliver.upton@linux.dev> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Allow force_emulation_prefix to be written by privileged userspace without reloading KVM. The param does not have any persistent effects and is trivial to snapshot. Signed-off-by:
Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830231614.3580124-28-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a test to verify that KVM_{G,S}ET_EVENTS play nice with pending vs. injected exceptions when an exception is being queued for L2, and that KVM correctly handles L1's exception intercept wants. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-27-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Include the vmx.h and svm.h uapi headers that KVM so kindly provides instead of manually defining all the same exit reasons/codes. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-26-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename inject_pending_events() to kvm_check_and_inject_events() in order to capture the fact that it handles more than just pending events, and to (mostly) align with kvm_check_nested_events(), which omits the "inject" for brevity. Add a comment above kvm_check_and_inject_events() to provide a high-level synopsis, and to document a virtualization hole (KVM erratum if you will) that exists due to KVM not strictly tracking instruction boundaries with respect to coincident instruction restarts and asynchronous events. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-25-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Document the oddities of ICEBP interception (trap-like #DB is intercepted as a fault-like exception), and how using VMX's inner "skip" helper deliberately bypasses the pending MTF and single-step #DB logic. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-24-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Treat pending TRIPLE_FAULTS as pending exceptions. A triple fault is an exception for all intents and purposes, it's just not tracked as such because there's no vector associated with the exception. E.g. if userspace were to set vcpu->request_interrupt_window while running L2 and L2 hit a triple fault, a triple fault nested VM-Exit should be synthesized to L1 before exiting to userspace with KVM_EXIT_IRQ_WINDOW_OPEN. Link: https://lore.kernel.org/all/YoVHAIGcFgJit1qp@google.com
Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-23-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
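A hedged sketch of what the kvm_vcpu_has_events() side of this could look like; the exact placement is an assumption, while KVM_REQ_TRIPLE_FAULT is the existing request KVM uses to track pending triple faults:

    /* Sketch: a pending triple fault is, for all intents and purposes,
     * a pending exception, so report it as a pending event.
     */
    if (kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu))
        return true;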
-
Sean Christopherson authored
Morph pending exceptions to pending VM-Exits (due to interception) when the exception is queued instead of waiting until nested events are checked at VM-Entry. This fixes a longstanding bug where KVM fails to handle an exception that occurs during delivery of a previous exception, KVM (L0) and L1 both want to intercept the exception (e.g. #PF for shadow paging), and KVM determines that the exception is in the guest's domain, i.e. queues the new exception for L2. Deferring the interception check causes KVM to escalate various combinations of injected+pending exceptions to double fault (#DF) without consulting L1's interception desires, and ends up injecting a spurious #DF into L2.

KVM has fudged around the issue for #PF by special casing emulated #PF injection for shadow paging, but the underlying issue is not unique to shadow paging in L0, e.g. if KVM is intercepting #PF because the guest has a smaller maxphyaddr and L1 (but not L0) is using shadow paging. Other exceptions are affected as well, e.g. if KVM is intercepting #GP for one of SVM's workarounds or for the VMware backdoor emulation stuff. The other cases have gone unnoticed because the #DF is spurious if and only if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1 would have injected #DF anyways.

The hack-a-fix has also led to ugly code, e.g. bailing from the emulator if #PF injection forced a nested VM-Exit and the emulator finds itself back in L1. Allowing for direct-to-VM-Exit queueing also neatly solves the async #PF in L2 mess; no need to set a magic flag and token, simply queue a #PF nested VM-Exit.

Deal with event migration by flagging that a pending exception was queued by userspace and check for interception at the next KVM_RUN, e.g. so that KVM does the right thing regardless of the order in which userspace restores nested state vs. event state.

When "getting" events from userspace, simply drop any pending exception that is destined to be intercepted if there is also an injected exception to be migrated. Ideally, KVM would migrate both events, but that would require new ABI, and practically speaking losing the event is unlikely to be noticed, let alone fatal. The injected exception is captured, RIP still points at the original faulting instruction, etc... So either the injection on the target will trigger the same intercepted exception, or the source of the intercepted exception was transient and/or non-deterministic, thus dropping it is ok-ish.

Fixes: a04aead1 ("KVM: nSVM: fix running nested guests when npt=0") Fixes: feaf0c7d ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2") Cc: Jim Mattson <jmattson@google.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-22-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a gigantic comment above vmx_check_nested_events() to document the priorities of all known events on Intel CPUs. Intel's SDM doesn't include VMX-specific events in its "Priority Among Concurrent Events", which makes it painfully difficult to suss out the correct priority between things like Monitor Trap Flag VM-Exits and pending #DBs. Kudos to Jim Mattson for doing the hard work of collecting and interpreting the priorities from various locations throughout the SDM (because putting them all in one place in the SDM would be too easy). Cc: Jim Mattson <jmattson@google.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-21-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a helper to identify "low"-priority #DB traps, i.e. trap-like #DBs that aren't TSS T flag #DBs, and tweak the related code to operate on any queued exception. A future commit will separate exceptions that are intercepted by L1, i.e. cause nested VM-Exit, from those that do NOT trigger nested VM-Exit. I.e. there will be multiple exception structs and multiple invocations of the helpers. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-20-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Determine whether or not new events can be injected after checking nested events. If a VM-Exit occurred during nested event handling, any previous event that needed re-injection is gone from KVM's perspective; the event is captured in the vmc*12 VM-Exit information, but doesn't exist in terms of what needs to be done for entry to L1. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-19-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Perform nested event checks before re-injecting exceptions/events into L2. If a pending exception causes VM-Exit to L1, re-injecting events into vmcs02 is premature and wasted effort. Take care to ensure events that need to be re-injected are still re-injected if checking for nested events "fails", i.e. if KVM needs to force an immediate entry+exit to complete the to-be-re-injected event. Keep the "can_inject" logic the same for now; it too can be pushed below the nested checks, but is a slightly riskier change (see past bugs about events not being properly purged on nested VM-Exit). Add and/or modify comments to better document the various interactions. Of note is the comment regarding "blocking" previously injected NMIs and IRQs if an exception is pending. The old comment isn't wrong strictly speaking, but it failed to capture the reason why the logic even exists. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-18-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Queue #DF by recursing on kvm_multiple_exception() by way of kvm_queue_exception_e() instead of open coding the behavior. This will allow KVM to Just Work when a future commit moves exception interception checks (for L2 => L1) into kvm_multiple_exception(). No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-17-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
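Roughly, the open-coded #DF setup collapses into a recursive queueing call; a sketch under the assumption that kvm_multiple_exception() first drops the colliding pending exception:

    /* Sketch: instead of manually filling in vcpu->arch.exception for #DF,
     * clear the colliding exception and requeue through the normal path.
     */
    vcpu->arch.exception.pending = false;
    kvm_queue_exception_e(vcpu, DF_VECTOR, 0);  /* #DF always has error code 0 */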
-
Sean Christopherson authored
Capture nested_run_pending as block_pending_exceptions so that the logic of why exceptions are blocked only needs to be documented once instead of at every place that employs the logic. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-16-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
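A sketch of the captured local on the VMX side; the variable name comes straight from the changelog, and the SVM side would presumably mirror it:

    /* Exceptions are blocked while a nested VM-Enter is pending: KVM must
     * complete the VM-Enter before it can inject anything into L2.
     * Capturing the condition in a named local documents the rule once.
     */
    bool block_pending_exceptions = vmx->nested.nested_run_pending;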
-
Sean Christopherson authored
Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch in anticipation of adding a second instance in kvm_vcpu_arch to handle exceptions that occur when vectoring an injected exception and are morphed to VM-Exit instead of leading to #DF. Opportunistically take advantage of the churn to rename "nr" to "vector". No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-15-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename the kvm_x86_ops hook for exception injection to better reflect reality, and to align with pretty much every other related function name in KVM. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-14-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Treat #PFs that occur during emulation of ENCLS as, wait for it, emulated page faults. Practically speaking, this is a glorified nop as the exception is never of the nested flavor, and it's extremely unlikely the guest is relying on the side effect of an implicit INVLPG on the faulting address. Fixes: 70210c04 ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions") Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-13-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Clear mtf_pending on nested VM-Exit instead of handling the clear on a case-by-case basis in vmx_check_nested_events(). The pending MTF should never survive nested VM-Exit, as it is a property of KVM's run of the current L2, i.e. should never affect the next L2 run by L1.

In practice, this is likely a nop as getting to L1 with nested_run_pending is impossible, and KVM doesn't correctly handle morphing a pending exception that occurs on a prior injected exception (the need to re-inject an exception being the other case where MTF isn't cleared). However, KVM will hopefully soon correctly deal with a pending exception on top of an injected exception.

Add a TODO to document that KVM has a priority inversion bug between SMIs and MTF (and trap-like #DBs), and that KVM also doesn't properly save/restore MTF across SMI/RSM. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-12-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
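The mechanical part of the change is tiny; a sketch assuming the clear lands in nested_vmx_vmexit():

    /* A pending MTF is a property of KVM's current run of L2 and must
     * never leak past nested VM-Exit into L1's next run of L2.
     */
    vmx->nested.mtf_pending = false;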
-
Sean Christopherson authored
Fall through to handling other pending exception/events for L2 if SIPI is pending while the CPU is not in Wait-for-SIPI. KVM correctly ignores the event, but incorrectly returns immediately, e.g. a SIPI coincident with another event could lead to KVM incorrectly routing the event to L1 instead of L2. Fixes: bf0cd88c ("KVM: x86: emulate wait-for-SIPI and SIPI-VMExit") Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-11-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use DR7_GD in the emulator instead of open coding the check, and drop a comically wrong comment. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-10-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
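A sketch of the emulator-side check, assuming the emulator reads DR7 through its ops table and raises #DB via an existing helper (the get_dr() signature and the emulate_db() helper are assumptions):

    unsigned long dr7;

    ctxt->ops->get_dr(ctxt, 7, &dr7);
    if (dr7 & DR7_GD)   /* DR7.GD (bit 13): General Detect enable */
        return emulate_db(ctxt);  /* debug register access must raise #DB */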
-
Sean Christopherson authored
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or trap-like depending the sub-type of #DB, and effectively defer the decision of what to do with the #DB to the caller. For the emulator's two calls to exception_type(), treat the #DB as fault-like, as the emulator handles only code breakpoint and general detect #DBs, both of which are fault-like. For event injection, which uses exception_type() to determine whether to set EFLAGS.RF=1 on the stack, keep the current behavior of not setting RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs, so exempting by failing the "== EXCPT_FAULT" check is correct. The only other fault-like #DB is General Detect, and despite Intel and AMD both strongly implying (through omission) that General Detect #DBs should set RF=1, hardware (multiple generations of both Intel and AMD), in fact does not. Through insider knowledge, extreme foresight, sheer dumb luck, or some combination thereof, KVM correctly handled RF for General Detect #DBs. Fixes: 38827dbd ("KVM: x86: Do not update EFLAGS on faulting emulation") Cc: stable@vger.kernel.org Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
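A sketch of the new classification inside exception_type(), assuming a new EXCPT_DB constant alongside the existing EXCPT_FAULT/EXCPT_TRAP values:

    /* Sketch: #DB gets its own type because it can be fault-like (code
     * breakpoint, General Detect) or trap-like (single-step, data
     * breakpoints, TSS T-flag); callers decide how to treat EXCPT_DB.
     */
    if (vector == DB_VECTOR)
        return EXCPT_DB;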
-
Sean Christopherson authored
Service TSS T-flag #DBs prior to pending MTFs, as such #DBs are higher priority than MTF. KVM itself doesn't emulate TSS #DBs, and any such exceptions injected from L1 will be handled by hardware (or morphed to a fault-like exception if injection fails), but theoretically userspace could pend a TSS T-flag #DB in conjunction with a pending MTF. Note, there's no known use case this fixes, it's purely to be technically correct with respect to Intel's SDM. Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Fixes: 5ef8acbd ("KVM: nVMX: Emulate MTF when performing instruction emulation") Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-8-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Exclude General Detect #DBs, which have fault-like behavior but also have a non-zero payload (DR6.BD=1), from nVMX's handling of pending debug traps. Opportunistically rewrite the comment to better document what is being checked, i.e. "has a non-zero payload" vs. "has a payload", and to call out the many caveats surrounding #DBs that KVM dodges one way or another. Cc: Oliver Upton <oupton@google.com> Cc: Peter Shier <pshier@google.com> Fixes: 684c0422 ("KVM: nVMX: Handle pending #DB when injecting INIT VM-exit") Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-7-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Suppress code breakpoints if MOV/POP SS blocking is active and the guest CPU is Intel, i.e. if the guest thinks it's running on an Intel CPU. Intel CPUs inhibit code #DBs when MOV/POP SS blocking is active, whereas AMD (and its descendants) do not. Signed-off-by:
Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830231614.3580124-6-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
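A hedged sketch of a helper implementing the rule described above; the helper name is an assumption, while guest_cpuid_is_intel() and KVM_X86_SHADOW_INT_MOV_SS are existing KVM symbols:

    static bool kvm_is_code_breakpoint_inhibited(struct kvm_vcpu *vcpu)
    {
        /* RF=1 inhibits code #DBs on both Intel and AMD. */
        if (kvm_get_rflags(vcpu) & X86_EFLAGS_RF)
            return true;

        /* MOV/POP SS blocking inhibits code #DBs on Intel only. */
        if (!guest_cpuid_is_intel(vcpu))
            return false;

        return static_call(kvm_x86_get_interrupt_shadow)(vcpu) &
               KVM_X86_SHADOW_INT_MOV_SS;
    }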
-
Sean Christopherson authored
Extend force_emulation_prefix to an 'int' and use bit 1 as a flag to indicate that KVM should clear RFLAGS.RF before emulating, e.g. to allow tests to force emulation of code breakpoints in conjunction with MOV/POP SS blocking, which is impossible without KVM intervention as VMX unconditionally sets RFLAGS.RF on intercepted #UD. Make the behavior controllable so that tests can also test RFLAGS.RF=1 (again in conjunction with code #DBs). Note, clearing RFLAGS.RF won't create an infinite #DB loop as the guest's IRET from the #DB handler will return to the instruction and not the prefix, i.e. the restart won't force emulation. Opportunistically convert the permissions to the preferred octal format. Signed-off-by:
Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830231614.3580124-5-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
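The flag could be plumbed roughly as below; the flag name and call site are assumptions, while X86_EFLAGS_RF and kvm_get/set_rflags() are existing kernel symbols:

    #define KVM_FEP_CLEAR_RFLAGS_RF  BIT(1)  /* assumed name for bit 1 */

    /* When emulation is forced via the prefix and the flag is set, undo
     * VMX's unconditional setting of RFLAGS.RF on intercepted #UD so that
     * tests can exercise code #DBs with RF=0.
     */
    if (force_emulation_prefix & KVM_FEP_CLEAR_RFLAGS_RF)
        kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) & ~X86_EFLAGS_RF);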
-
Sean Christopherson authored
Don't check for code breakpoints during instruction emulation if the emulation was triggered by exception interception. Code breakpoints are the highest priority fault-like exception, and KVM only emulates on exceptions that are fault-like. Thus, if hardware signaled a different exception, then the vCPU is already past the stage of checking for hardware breakpoints. This is likely a glorified nop in terms of functionality, and is more for clarification, though it is technically an optimization.

Intel's SDM explicitly states vmcs.GUEST_RFLAGS.RF on exception interception is the same as the value that would have been saved on the stack had the exception not been intercepted, i.e. will be '1' due to all fault-like exceptions setting RF to '1'. AMD says "guest state saved ... is the processor state as of the moment the intercept triggers", but that begs the question, "when does the intercept trigger?". Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-4-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Deliberately truncate the exception error code when shoving it into the VMCS (VM-Entry field for vmcs01 and vmcs02, VM-Exit field for vmcs12). Intel CPUs are incapable of handling 32-bit error codes and will never generate an error code with bits 31:16, but userspace can provide an arbitrary error code via KVM_SET_VCPU_EVENTS. Failure to drop the bits on exception injection results in failed VM-Entry, as VMX disallows setting bits 31:16. Setting the bits on VM-Exit would at best confuse L1, and at worst induce a nested VM-Entry failure, e.g. if L1 decided to reinject the exception back into L2. Cc: stable@vger.kernel.org Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Jim Mattson <jmattson@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-3-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
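The masking itself is a one-liner at each write; a sketch for the VM-Entry case, where VM_ENTRY_EXCEPTION_ERROR_CODE is the real VMCS field but the surrounding context is assumed:

    /* Hardware never generates error codes with bits 31:16 set and VMX
     * refuses to consume them, so drop the upper bits that userspace may
     * have smuggled in via KVM_SET_VCPU_EVENTS.
     */
    vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, (u16)ex->error_code);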
-
Sean Christopherson authored
Drop pending exceptions and events queued for re-injection when leaving nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced by host userspace. Failure to purge events could result in an event belonging to L2 being injected into L1.

This _should_ never happen for VM-Fail as all events should be blocked by nested_run_pending, but it's possible if KVM, not the L1 hypervisor, is the source of VM-Fail when running vmcs02. SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry to SMM is blocked by pending exceptions and re-injected events.

Forced exit is definitely buggy, but has likely gone unnoticed because userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or some other ioctl() that purges the queue).

Fixes: 4f350c6d ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly") Cc: stable@vger.kernel.org Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Jim Mattson <jmattson@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-2-seanjc@google.com
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Hou Wenlong authored
Since the RDMSR/WRMSR emulation uses a separate emulator interface, the trace points for RDMSR/WRMSR can be added in the emulator path just like in the normal path. Signed-off-by:
Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/39181a9f777a72d61a4d0bb9f6984ccbd1de2ea3.1661930557.git.houwenlong.hwl@antgroup.com
Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
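A sketch of the added tracing, assuming the emulator helpers reuse the same tracepoints as the normal RDMSR/WRMSR path (trace_kvm_msr_read/trace_kvm_msr_write are KVM's existing tracepoints; the exact call sites are assumptions):

    /* In the emulator's MSR get/set helpers (sketch): */
    trace_kvm_msr_read(msr_index, msr_data);   /* emulated RDMSR */
    trace_kvm_msr_write(msr_index, msr_data);  /* emulated WRMSR */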
-
Hou Wenlong authored
The return value of emulator_{get|set}_msr_with_filter() is confusing, since MSR access errors and emulator errors are mixed. Although KVM_MSR_RET_* doesn't conflict with X86EMUL_IO_NEEDED at present, it is better to convert MSR access errors to emulator errors if an error value is needed. So move the "r < 0" handling for wrmsr emulation into the set helper function, then only X86EMUL_* is returned in the helper functions. Also add an "r < 0" check in the get helper function; although KVM doesn't return -errno today, assuming that will always hold true is unnecessarily risky. Suggested-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Hou Wenlong <houwenlong.hwl@antgroup.com> Link: https://lore.kernel.org/r/09b2847fc3bcb8937fb11738f0ccf7be7f61d9dd.1661930557.git.houwenlong.hwl@antgroup.com [sean: wrap changelog less aggressively] Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-