- 27 Jan, 2023 2 commits
-
-
Sean Christopherson authored
Add helpers to print unimplemented MSR accesses and condition all such prints on report_ignored_msrs, i.e. honor userspace's request to not print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited, printing can still be problematic, e.g. if a print gets stalled when host userspace is writing MSRs during live migration, an effective stall can result in very noticeable disruption in the guest. E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU counters while the PMU was disabled in KVM. - 99.75% 0.00% [.] __ioctl - __ioctl - 99.74% entry_SYSCALL_64_after_hwframe do_syscall_64 sys_ioctl - do_vfs_ioctl - 92.48% kvm_vcpu_ioctl - kvm_arch_vcpu_ioctl - 85.12% kvm_set_msr_ignored_check svm_set_msr kvm_set_msr_common printk vprintk_func vprintk_default vprintk_emit console_unlock call_console_drivers univ8250_console_write serial8250_console_write uart_console_write Reported-by:
Aaron Lewis <aaronlewis@google.com> Reviewed-by:
Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
Sean Christopherson authored
Limit kvm_pmu_cap.num_counters_gp during kvm_init_pmu_capability() based on the vendor PMU capabilities so that consuming num_counters_gp naturally does the right thing. This fixes a mostly theoretical bug where KVM could over-report its PMU support in KVM_GET_SUPPORTED_CPUID for leaf 0xA, e.g. if the number of counters reported by perf is greater than KVM's hardcoded internal limit. Incorporating input from the AMD PMU also avoids over-reporting MSRs to save when running on AMD. Link: https://lore.kernel.org/r/20230124234905.3774678-2-seanjc@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
- 26 Jan, 2023 1 commit
-
-
Like Xu authored
After commit ("02791a5c KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,fixed}counter()"), vPMU starts to directly use the hardware event eventsel and unit_mask to reprogram perf_event, and the event_type field in the "struct kvm_event_hw_type_mapping" is simply no longer being used. Convert the struct into an anonymous struct as the current name is obsolete as the structure no longer has any mapping semantics, and placing the struct definition directly above its sole user makes its easier to understand what the array is filling in. Signed-off-by:
Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20221205122048.16023-1-likexu@tencent.com [sean: drop new comment, use anonymous struct] Signed-off-by:
Sean Christopherson <seanjc@google.com>
-
- 24 Jan, 2023 11 commits
-
-
Aaron Lewis authored
Add testing to show that a pmu event can be filtered with a generalized match on it's unit mask. These tests set up test cases to demonstrate various ways of filtering a pmu event that has multiple unit mask values. It does this by setting up the filter in KVM with the masked events provided, then enabling three pmu counters in the guest. The test then verifies that the pmu counters agree with which counters should be counting and which counters should be filtered for both a sparse filter list and a dense filter list. Signed-off-by:
Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-8-aaronlewis@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
Aaron Lewis authored
Test that masked events are not using invalid bits, and if they are, ensure the pmu event filter is not accepted by KVM_SET_PMU_EVENT_FILTER. The only valid bits that can be used for masked events are set when using KVM_PMU_ENCODE_MASKED_ENTRY() with one exception: If any of the high bits (35:32) of the event select are set when using Intel, the pmu event filter will fail. Also, because validation was not being done prior to the introduction of masked events, only expect validation to fail when masked events are used. E.g. in the first test a filter event with all its bits set is accepted by KVM_SET_PMU_EVENT_FILTER when flags = 0. Signed-off-by:
Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-7-aaronlewis@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
Aaron Lewis authored
Now that the flags field can be non-zero, pass it in when creating a pmu event filter. This is needed in preparation for testing masked events. No functional change intended. Signed-off-by:
Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-6-aaronlewis@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
Aaron Lewis authored
When building a list of filter events, it can sometimes be a challenge to fit all the events needed to adequately restrict the guest into the limited space available in the pmu event filter. This stems from the fact that the pmu event filter requires each event (i.e. event select + unit mask) be listed, when the intention might be to restrict the event select all together, regardless of it's unit mask. Instead of increasing the number of filter events in the pmu event filter, add a new encoding that is able to do a more generalized match on the unit mask. Introduce masked events as another encoding the pmu event filter understands. Masked events has the fields: mask, match, and exclude. When filtering based on these events, the mask is applied to the guest's unit mask to see if it matches the match value (i.e. umask & mask == match). The exclude bit can then be used to exclude events from that match. E.g. for a given event select, if it's easier to say which unit mask values shouldn't be filtered, a masked event can be set up to match all possible unit mask values, then another masked event can be set up to match the unit mask values that shouldn't be filtered. Userspace can query to see if this feature exists by looking for the capability, KVM_CAP_PMU_EVENT_MASKED_EVENTS. This feature is enabled by setting the flags field in the pmu event filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS. Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY(). It is an error to have a bit set outside the valid bits for a masked event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in such cases, including the high bits of the event select (35:32) if called on Intel. With these updates the filter matching code has been updated to match on a common event. Masked events were flexible enough to handle both event types, so they were used as the common event. This changes how guest events get filtered because regardless of the type of event used in the uAPI, they will be converted to masked events. Because of this there could be a slight performance hit because instead of matching the filter event with a lookup on event select + unit mask, it does a lookup on event select then walks the unit masks to find the match. This shouldn't be a big problem because I would expect the set of common event selects to be small, and if they aren't the set can likely be reduced by using masked events to generalize the unit mask. Using one type of event when filtering guest events allows for a common code path to be used. Signed-off-by:
Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
Aaron Lewis authored
Refactor check_pmu_event_filter() in preparation for masked events. No functional changes intended Signed-off-by:
Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-4-aaronlewis@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
Aaron Lewis authored
If it's not possible for an event in the pmu event filter to match a pmu event being programmed by the guest, it's pointless to have it in the list. Opt for a shorter list by removing those events. Because this is established uAPI the pmu event filter can't outright rejected these events as garbage and return an error. Instead, play nice and remove them from the list. Also, opportunistically rewrite the comment when the filter is set to clarify that it guards against *all* TOCTOU attacks on the verified data. Signed-off-by:
Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-3-aaronlewis@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
Aaron Lewis authored
When checking if a pmu event the guest is attempting to program should be filtered, only consider the event select + unit mask in that decision. Use an architecture specific mask to mask out all other bits, including bits 35:32 on Intel. Those bits are not part of the event select and should not be considered in that decision. Fixes: 66bb8a06 ("KVM: x86: PMU Event Filter") Signed-off-by:
Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.comSigned-off-by:
Sean Christopherson <seanjc@google.com>
-
Sean Christopherson authored
Fix a build error due to a mixup during a recent refactoring. The error was reported during code review, but the fixed up patch didn't make it into the final commit. Fixes: 474856bad921 ("KVM: PPC: Move processor compatibility check to module init") Link: https://lore.kernel.org/all/87cz93snqc.fsf@mpe.ellerman.id.au Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230119182158.4026656-1-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The first half or so patches fix semi-urgent, real-world relevant APICv and AVIC bugs. The second half fixes a variety of AVIC and optimized APIC map bugs where KVM doesn't play nice with various edge cases that are architecturally legal(ish), but are unlikely to occur in most real world scenarios Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
ARM: * Fix the PMCR_EL0 reset value after the PMU rework * Correctly handle S2 fault triggered by a S1 page table walk by not always classifying it as a write, as this breaks on R/O memslots * Document why we cannot exit with KVM_EXIT_MMIO when taking a write fault from a S1 PTW on a R/O memslot * Put the Apple M2 on the naughty list for not being able to correctly implement the vgic SEIS feature, just like the M1 before it * Reviewer updates: Alex is stepping down, replaced by Zenghui x86: * Fix various rare locking issues in Xen emulation and teach lockdep to detect them * Documentation improvements * Do not return host topology information from KVM_GET_SUPPORTED_CPUID
-
Paolo Bonzini authored
The main theme of this series is to kill off kvm_arch_init(), kvm_arch_hardware_(un)setup(), and kvm_arch_check_processor_compat(), which all originated in x86 code from way back when, and needlessly complicate both common KVM code and architecture code. E.g. many architectures don't mark functions/data as __init/__ro_after_init purely because kvm_init() isn't marked __init to support x86's separate vendor modules. The idea/hope is that with those hooks gone (moved to arch code), it will be easier for x86 (and other architectures) to modify their module init sequences as needed without having to fight common KVM code. E.g. I'm hoping that ARM can build on this to simplify its hardware enabling logic, especially the pKVM side of things. There are bug fixes throughout this series. They are more scattered than I would usually prefer, but getting the sequencing correct was a gigantic pain for many of the x86 fixes due to needing to fix common code in order for the x86 fix to have any meaning. And while the bugs are often fatal, they aren't all that interesting for most users as they either require a malicious admin or broken hardware, i.e. aren't likely to be encountered by the vast majority of KVM users. So unless someone _really_ wants a particular fix isolated for backporting, I'm not planning on shuffling patches. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 13 Jan, 2023 26 commits
-
-
Sean Christopherson authored
Move the guts of kvm_recalculate_apic_map()'s main loop to two separate helpers to handle recalculating the physical and logical pieces of the optimized map. Having 100+ lines of code in the for-loop makes it hard to understand what is being calculated where. No functional change intended. Suggested-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-34-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Greg Edwards authored
Legacy kernels prior to commit 4399c03c ("x86/apic: Remove verify_local_APIC()") write the APIC ID of the boot CPU twice to verify a functioning local APIC. This results in APIC acceleration inhibited on these kernels for reason APICV_INHIBIT_REASON_APIC_ID_MODIFIED. Allow the APICV_INHIBIT_REASON_APIC_ID_MODIFIED inhibit reason to be cleared if/when all APICs in xAPIC mode set their APIC ID back to the expected vcpu_id value. Fold the functionality previously in kvm_lapic_xapic_id_updated() into kvm_recalculate_apic_map(), as this allows examining all APICs in one pass. Fixes: 3743c2f0 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") Signed-off-by:
Greg Edwards <gedwards@ddn.com> Link: https://lore.kernel.org/r/20221117183247.94314-1-gedwards@ddn.comSigned-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-33-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Track the per-vendor required APICv inhibits with a variable instead of calling into vendor code every time KVM wants to query the set of required inhibits. The required inhibits are a property of the vendor's virtualization architecture, i.e. are 100% static. Using a variable allows the compiler to inline the check, e.g. generate a single-uop TEST+Jcc, and thus eliminates any desire to avoid checking inhibits for performance reasons. No functional change intended. Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-32-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Turns out that some warnings exist for good reasons. Restore the warning in avic_vcpu_load() that guards against calling avic_vcpu_load() on a running vCPU now that KVM avoids doing so when switching between x2APIC and xAPIC. The entire point of the WARN is to highlight that KVM should not be reloading an AVIC. Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid spamming the kernel if it does fire. This reverts commit c0caeee6. Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-31-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC unaccelerated write traps. The register is read-only and isn't emulated by KVM. Sending the register through kvm_apic_write_nodecode() will result in screaming when x2APIC is enabled due to the unexpected failure to retrieve the MSR (KVM expects that only "legal" accesses will trap). Fixes: 4d1d7942 ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-30-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Iterate over all target logical IDs in the AVIC kick fastpath instead of bailing if there is more than one target. Now that KVM inhibits AVIC if vCPUs aren't mapped 1:1 with logical IDs, each bit in the destination is guaranteed to match to at most one vCPU, i.e. iterating over the bitmap is guaranteed to kick each valid target exactly once. Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-29-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Do not modify AVIC's logical ID table if the logical ID portion of the LDR is not a power-of-2, i.e. if the LDR has multiple bits set. Taking only the first bit means that KVM will fail to match MDAs that intersect with "higher" bits in the "ID" The "ID" acts as a bitmap, but is referred to as an ID because there's an implicit, unenforced "requirement" that software only set one bit. This edge case is arguably out-of-spec behavior, but KVM cleanly handles it in all other cases, e.g. the optimized logical map (and AVIC!) is also disabled in this scenario. Refactor the code to consolidate the checks, and so that the code looks more like avic_kick_target_vcpus_fast(). Fixes: 18f40c53 ("svm: Add VMEXIT handlers for AVIC") Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-28-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Update SVM's cache of the LDR even if the new value is "bad". Leaving stale information in the cache can result in KVM missing updates and/or invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry() is triggered after a different vCPU has "claimed" the old LDR. Fixes: 18f40c53 ("svm: Add VMEXIT handlers for AVIC") Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-27-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Update the vCPU's local (virtual) APIC on LDR writes even if the write "fails". The APIC needs to recalc the optimized logical map even if the LDR is invalid or zero, e.g. if the guest clears its LDR, the optimized map will be left as is and the vCPU will receive interrupts using its old LDR. Fixes: 18f40c53 ("svm: Add VMEXIT handlers for AVIC") Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-26-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Inhibit SVM's AVIC if multiple vCPUs are aliased to the same logical ID. Architecturally, all CPUs whose logical ID matches the MDA are supposed to receive the interrupt; overwriting existing entries in AVIC's logical=>physical map can result in missed IPIs. Fixes: 18f40c53 ("svm: Add VMEXIT handlers for AVIC") Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-25-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Inhibit APICv/AVIC if the optimized physical map is disabled so that KVM KVM provides consistent APIC behavior if xAPIC IDs are aliased due to vcpu_id being truncated and the x2APIC hotplug hack isn't enabled. If the hotplug hack is disabled, events that are emulated by KVM will follow architectural behavior (all matching vCPUs receive events, even if the "match" is due to truncation), whereas APICv and AVIC will deliver events only to the first matching vCPU, i.e. the vCPU that matches without truncation. Note, the "extra" inhibit is needed because KVM deliberately ignores mismatches due to truncation when applying the APIC_ID_MODIFIED inhibit so that large VMs (>255 vCPUs) can run with APICv/AVIC. Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-24-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to honor x86 architectural behavior if multiple vCPUs shared a physical APIC ID. As called out in the changelog that added the hack, all CPUs whose (possibly truncated) APIC ID matches the target are supposed to receive the IPI. KVM intentionally differs from real hardware, because real hardware (Knights Landing) does just "x2apic_id & 0xff" to decide whether to accept the interrupt in xAPIC mode and it can deliver one interrupt to more than one physical destination, e.g. 0x123 to 0x123 and 0x23. Applying the hack even when x2APIC is not fully enabled means KVM doesn't correctly handle scenarios where the guest has aliased xAPIC IDs across multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any interrupts. It's extremely unlikely any real world guest aliases APIC IDs, or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which vCPU is "normal". Furthermore, the hack is _not_ guaranteed to work! The hack works if and only if the optimized APIC map is successfully allocated. If the map allocation fails (unlikely), KVM will fall back to its unoptimized behavior, which _does_ honor the architectural behavior. Pivot on 32-bit x2APIC IDs being enabled as that is required to take advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't break existing setups unless they are way, way off in the weeds. And an entry in KVM's errata to document the hack. Alternatively, KVM could provide an actual x2APIC quirk and document the hack that way, but there's unlikely to ever be a use case for disabling the quirk. Go the errata route to avoid having to validate a quirk no one cares about. Fixes: 5bd5db38 ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff") Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-23-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Disable the optimized APIC logical map if multiple vCPUs are aliased to the same logical ID. Architecturally, all CPUs whose logical ID matches the MDA are supposed to receive the interrupt; overwriting existing map entries can result in missed IPIs. Fixes: 1e08ec4a ("KVM: optimize apic interrupt delivery") Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-22-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Disable the optimized APIC logical map if a logical ID covers multiple MDAs, i.e. if a vCPU has multiple bits set in its ID. In logical mode, events match if "ID & MDA != 0", i.e. creating an entry for only the first bit can cause interrupts to be missed. Note, creating an entry for every bit is also wrong as KVM would generate IPIs for every matching bit. It would be possible to teach KVM to play nice with this edge case, but it is very much an edge case and probably not used in any real world OS, i.e. it's not worth optimizing. Fixes: 1e08ec4a ("KVM: optimize apic interrupt delivery") Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-21-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Skip the optimized cluster[] setup for x2APIC logical mode, as KVM reuses the optimized map's phys_map[] and doesn't actually need to insert the target apic into the cluster[]. The LDR is derived from the x2APIC ID, and both are read-only in KVM, thus the vCPU's cluster[ldr] is guaranteed to be the same entry as the vCPU's phys_map[x2apic_id] entry. Skipping the unnecessary setup will allow a future fix for aliased xAPIC logical IDs to simply require that cluster[ldr] is non-NULL, i.e. won't have to special case x2APIC. Alternatively, the future check could allow "cluster[ldr] == apic", but that ends up being terribly confusing because cluster[ldr] is only set at the very end, i.e. it's only possible due to x2APIC's shenanigans. Another alternative would be to send x2APIC down a separate path _after_ the calculation and then assert that all of the above, but the resulting code is rather messy, and it's arguably unnecessary since asserting that the actual LDR matches the expected LDR means that simply testing that interrupts are delivered correctly provides the same guarantees. Reported-by:
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-20-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Track all possibilities for the optimized APIC map's logical modes instead of overloading the pseudo-bitmap and treating any "unknown" value as "invalid". As documented by the now-stale comment above the mode values, the values did have meaning when the optimized map was originally added. That dependent logical was removed by commit e45115b6 ("KVM: x86: use physical LAPIC array for logical x2APIC"), but the obfuscated behavior and its comment were left behind. Opportunistically rename "mode" to "logical_mode", partly to make it clear that the "disabled" case applies only to the logical map, but also to prove that there is no lurking code that expects "mode" to be a bitmap. Functionally, this is a glorified nop. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-19-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Explicitly skip the optimized map setup if the vCPU's LDR is '0', i.e. if the vCPU will never respond to logical mode interrupts. KVM already skips setup in this case, but relies on kvm_apic_map_get_logical_dest() to generate mask==0. KVM still needs the mask=0 check as a non-zero LDR can yield mask==0 depending on the mode, but explicitly handling the LDR will make it simpler to clean up the logical mode tracking in the future. No functional change intended. Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-18-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a helper to perform the final kick, two instances of the ICR decoding is one too many. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-17-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Document that AVIC is inhibited if any vCPU's APIC ID diverges from its vCPU ID, i.e. that there's no need to check for a destination match in the AVIC kick fast path. Opportunistically tweak comments to remove "guest bug", as that suggests KVM is punting on error handling, which is not the case. Targeting a non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether or not it's a guest bug is irrelevant. Such behavior is architecturally legal and thus needs to faithfully emulated by KVM (and it is). Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-16-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Due to a likely mismerge of patches, KVM ended up with a superfluous commit to "enable" AVIC's fast path for x2AVIC mode. Even worse, the superfluous commit has several bugs and creates a nasty local shadow variable. Rather than fix the bugs piece-by-piece[*] to achieve the same end result, revert the patch wholesale. Opportunistically add a comment documenting the x2AVIC dependencies. This reverts commit 8c9e639d. [*] https://lore.kernel.org/all/YxEP7ZBRIuFWhnYJ@google.com Fixes: 8c9e639d ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible") Suggested-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-15-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Suravee Suthikulpanit authored
For X2APIC ID in cluster mode, the logical ID is bit [15:0]. Fixes: 603ccef4 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast") Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-14-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Compute the destination from ICRH using the sender's x2APIC status, not each (potential) target's x2APIC status. Fixes: c514d3a3 ("KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID") Cc: Li RongQing <lirongqing@baidu.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Li RongQing <lirongqing@baidu.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-13-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Replace the "avic_mode" enum with a single bool to track whether or not x2AVIC is enabled. KVM already has "apicv_enabled" that tracks if any flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are redundant and unnecessary noise. No functional change intended. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-12-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Free the APIC access page memslot if any vCPU enables x2APIC and SVM's AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with x2APIC enabled. On AMD, if its "hybrid" mode is enabled (AVIC is enabled when x2APIC is enabled even without x2AVIC support), keeping the APIC access page memslot results in the guest being able to access the virtual APIC page as x2APIC is fully emulated by KVM. I.e. hardware isn't aware that the guest is operating in x2APIC mode. Exempt nested SVM's update of APICv state from the new logic as x2APIC can't be toggled on VM-Exit. In practice, invoking the x2APIC logic should be harmless precisely because it should be a glorified nop, but play it safe to avoid latent bugs, e.g. with dropping the vCPU's SRCU lock. Intel doesn't suffer from the same issue as APICv has fully independent VMCS controls for xAPIC vs. x2APIC virtualization. Technically, KVM should provide bus error semantics and not memory semantics for the APIC page when x2APIC is enabled, but KVM already provides memory semantics in other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware disabled (via APIC_BASE MSR). Note, checking apic_access_memslot_enabled without taking locks relies it being set during vCPU creation (before kvm_vcpu_reset()). vCPUs can race to set the inhibit and delete the memslot, i.e. can get false positives, but can't get false negatives as apic_access_memslot_enabled can't be toggled "on" once any vCPU reaches KVM_RUN. Opportunistically drop the "can" while updating avic_activate_vmcb()'s comment, i.e. to state that KVM _does_ support the hybrid mode. Move the "Note:" down a line to conform to preferred kernel/KVM multi-line comment style. Opportunistically update the apicv_update_lock comment, as it isn't actually used to protect apic_access_memslot_enabled (which is protected by slots_lock). Fixes: 0e311d33 ("KVM: SVM: Introduce hybrid-AVIC mode") Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-11-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the APIC access page allocation helper function to common x86 code, the allocation routine is virtually identical between APICv (VMX) and AVIC (SVM). Keep APICv's gfn_to_page() + put_page() sequence, which verifies that a backing page can be allocated, i.e. that the system isn't under heavy memory pressure. Forcing the backing page to be populated isn't strictly necessary, but skipping the effective prefetch only delays the inevitable. Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-10-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use KVM_REQ_UPDATE_APICV to react to APIC "mode" changes, i.e. to handle the APIC being hardware enabled/disabled and/or x2APIC being toggled. There is no need to immediately update APICv state, the only requirement is that APICv be updating prior to the next VM-Enter. Making a request will allow piggybacking KVM_REQ_UPDATE_APICV to "inhibit" the APICv memslot when x2APIC is enabled. Doing that directly from kvm_lapic_set_base() isn't feasible as KVM's SRCU must not be held when modifying memslots (to avoid deadlock), and may or may not be held when kvm_lapic_set_base() is called, i.e. KVM can't do the right thing without tracking that is rightly buried behind CONFIG_PROVE_RCU=y. Suggested-by:
Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20230106011306.85230-9-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-