- 04 Feb, 2021 27 commits
-
-
Sean Christopherson authored
Use x2apic_mode instead of x2apic_enabled() when adjusting the destination ID during Posted Interrupt updates. This avoids the costly RDMSR that is hidden behind x2apic_enabled(). Reported-by: luferry <luferry@163.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210115220354.434807-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Export x2apic_mode so that KVM can query whether x2APIC is active without having to incur the RDMSR in x2apic_enabled(). When Posted Interrupts are in use for a guest with an assigned device, KVM ends up checking for x2APIC at least once every time a vCPU halts. KVM could obviously snapshot x2apic_enabled() to avoid the RDMSR, but that's rather silly given that x2apic_mode holds the exact info needed by KVM. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210115220354.434807-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Chenyi Qiang authored
Introduce a new capability named KVM_CAP_X86_BUS_LOCK_EXIT, which is used to handle bus locks detected in guest. It allows the userspace to do custom throttling policies to mitigate the 'noisy neighbour' problem. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-5-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Chenyi Qiang authored
Virtual Machine can exploit bus locks to degrade the performance of system. Bus lock can be caused by split locked access to writeback(WB) memory or by using locks on uncacheable(UC) memory. The bus lock is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). To address the threat, bus lock VM exit is introduced to notify the VMM when a bus lock was acquired, allowing it to enforce throttling or other policy based mitigations. A VMM can enable VM exit due to bus locks by setting a new "Bus Lock Detection" VM-execution control(bit 30 of Secondary Processor-based VM execution controls). If delivery of this VM exit was preempted by a higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of exit reason in vmcs field is set to 1. In current implementation, the KVM exposes this capability through KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap (i.e. off and exit) and enable it explicitly (disabled by default). If bus locks in guest are detected by KVM, exit to user space even when current exit reason is handled by KVM internally. Set a new field KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there is a bus lock detected in guest. Document for Bus Lock VM exit is now available at the latest "Intel Architecture Instruction Set Extensions Programming Reference". Document Link: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Chenyi Qiang authored
Reset the vcpu->run->flags at the beginning of kvm_arch_vcpu_ioctl_run. It can avoid every thunk of code that needs to set the flag clear it, which increases the odds of missing a case and ending up with a flag in an undefined state. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-3-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32). The full VM_EXIT_REASON field is comprised of a 16-bit basic exit reason in bits 15:0, and single-bit modifiers in bits 31:16. Historically, KVM has only had to worry about handling the "failed VM-Entry" modifier, which could only be set in very specific flows and required dedicated handling. I.e. manually stripping the FAILED_VMENTRY bit was a somewhat viable approach. But even with only a single bit to worry about, KVM has had several bugs related to comparing a basic exit reason against the full exit reason store in vcpu_vmx. Upcoming Intel features, e.g. SGX, will add new modifier bits that can be set on more or less any VM-Exit, as opposed to the significantly more restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off flows isn't scalable. Tracking exit reason in a union forces code to explicitly choose between consuming the full exit reason and the basic exit, and is a convenient way to document and access the modifiers. No functional change intended. Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-2-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Brijesh Singh authored
The SEV FW version >= 0.23 added a new command that can be used to query the attestation report containing the SHA-256 digest of the guest memory encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and sign the report with the Platform Endorsement Key (PEK). See the SEV FW API spec section 6.8 for more details. Note there already exist a command (KVM_SEV_LAUNCH_MEASURE) that can be used to get the SHA-256 digest. The main difference between the KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the latter can be called while the guest is running and the measurement value is signed with PEK. Cc: James Bottomley <jejb@linux.ibm.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: David Rientjes <rientjes@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: John Allen <john.allen@amd.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: linux-crypto@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: David Rientjes <rientjes@google.com> Tested-by: James Bottomley <jejb@linux.ibm.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Message-Id: <20210104151749.30248-1-brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
Disabling dirty logging is much more intestesting from a testing perspective if the vCPUs are still running. This also excercises the code-path in which collapsible SPTEs must be faulted back in at a higher level after disabling dirty logging. To: linux-kselftest@vger.kernel.org CC: Peter Xu <peterx@redhat.com> CC: Andrew Jones <drjones@redhat.com> CC: Thomas Huth <thuth@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-29-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
Add a parameter to control the backing memory type for dirty_log_perf_test so that the test can be run with hugepages. To: linux-kselftest@vger.kernel.org CC: Peter Xu <peterx@redhat.com> CC: Andrew Jones <drjones@redhat.com> CC: Thomas Huth <thuth@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-28-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
Add a memslot modification stress test in which a memslot is repeatedly created and removed while vCPUs access memory in another memslot. Most userspaces do not create or remove memslots on running VMs which makes it hard to test races in adding and removing memslots without a dedicated test. Adding and removing a memslot also has the effect of tearing down the entire paging structure, which leads to more page faults and pressure on the page fault handling path than a one-and-done memory population test. Reviewed-by: Jacob Xu <jacobhxu@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210112214253.463999-7-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
Add an option to overlap the ranges of memory each vCPU accesses instead of partitioning them. This option will increase the probability of multiple vCPUs faulting on the same page at the same time, and causing interesting races, if there are bugs in the page fault handler or elsewhere in the kernel. Reviewed-by: Jacob Xu <jacobhxu@google.com> Reviewed-by: Makarand Sonare <makarandsonare@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210112214253.463999-6-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
Currently the population stage in the dirty_log_perf_test does nothing as the per-vCPU iteration counters are not initialized and the loop does not wait for each vCPU. Remedy those errors. Reviewed-by: Jacob Xu <jacobhxu@google.com> Reviewed-by: Makarand Sonare <makarandsonare@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210112214253.463999-5-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
In order to add an iteration -1 to indicate that the memory population phase has not yet completed, convert the interations counters to ints. No functional change intended. Reviewed-by: Jacob Xu <jacobhxu@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210112214253.463999-4-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
Peter Xu pointed out that a log message printed while waiting for the memory population phase of the dirty_log_perf_test will flood the debug logs as there is no delay after printing the message. Since the message does not provide much value anyway, remove it. Reviewed-by: Jacob Xu <jacobhxu@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210112214253.463999-3-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
In response to some earlier comments from Peter Xu, rename timespec_diff_now to the much more sensible timespec_elapsed. No functional change intended. Reviewed-by: Jacob Xu <jacobhxu@google.com> Reviewed-by: Makarand Sonare <makarandsonare@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210112214253.463999-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Remove the update_pte() shadow paging logic, which was obsoleted by commit 4731d4c7 ("KVM: MMU: out of sync shadow core"), but never removed. As pointed out by Yu, KVM never write protects leaf page tables for the purposes of shadow paging, and instead marks their associated shadow page as unsync so that the guest can write PTEs at will. The update_pte() path, which predates the unsync logic, optimizes COW scenarios by refreshing leaf SPTEs when they are written, as opposed to zapping the SPTE, restarting the guest, and installing the new SPTE on the subsequent fault. Since KVM no longer write-protects leaf page tables, update_pte() is unreachable and can be dropped. Reported-by: Yu Zhang <yu.c.zhang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210115004051.4099250-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Shier authored
When a guest is using xAPIC KVM allocates a backing page for the required EPT entry for the APIC access address set in the VMCS. If mm decides to move that page the KVM mmu notifier will update the VMCS with the new HPA. This test induces a page move to test that APIC access continues to work correctly. It is a directed test for commit e649b3f0 "KVM: x86: Fix APIC page invalidation race". Tested: ran for 1 hour on a skylake, migrating backing page every 1ms Depends on patch "selftests: kvm: Add exception handling to selftests" from aaronlewis@google.com that has not yet been queued. Signed-off-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Message-Id: <20201105223823.850068-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Yang Zhong authored
Expose AVX (VEX-encoded) versions of the Vector Neural Network Instructions to guest. The bit definition: CPUID.(EAX=7,ECX=1):EAX[bit 4] AVX_VNNI The following instructions are available when this feature is present in the guest. 1. VPDPBUS: Multiply and Add Unsigned and Signed Bytes 2. VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation 3. VPDPWSSD: Multiply and Add Signed Word Integers 4. VPDPWSSDS: Multiply and Add Signed Integers with Saturation This instruction is currently documented in the latest "extensions" manual (ISE). It will appear in the "main" manual (SDM) in the future. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Message-Id: <20210105004909.42000-3-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Kyung Min Park authored
Add AVX version of the Vector Neural Network (VNNI) Instructions. A processor supports AVX VNNI instructions if CPUID.0x07.0x1:EAX[4] is present. The following instructions are available when this feature is present. 1. VPDPBUS: Multiply and Add Unsigned and Signed Bytes 2. VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation 3. VPDPWSSD: Multiply and Add Signed Word Integers 4. VPDPWSSDS: Multiply and Add Signed Integers with Saturation The only in-kernel usage of this is kvm passthrough. The CPU feature flag is shown as "avx_vnni" in /proc/cpuinfo. This instruction is currently documented in the latest "extensions" manual (ISE). It will appear in the "main" manual (SDM) in the future. Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Message-Id: <20210105004909.42000-2-yang.zhong@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
YANG LI authored
Fix the following coccicheck warning: ./arch/x86/kvm/x86.c:8012:5-48: WARNING: Comparison to bool Signed-off-by: YANG LI <abaci-bugfix@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Message-Id: <1610357578-66081-1-git-send-email-abaci-bugfix@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Walk the list of MMU pages in reverse in kvm_mmu_zap_oldest_mmu_pages(). The list is FIFO, meaning new pages are inserted at the head and thus the oldest pages are at the tail. Using a "forward" iterator causes KVM to zap MMU pages that were just added, which obliterates guest performance once the max number of shadow MMU pages is reached. Fixes: 6b82ef2c ("KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages") Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210113205030.3481307-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Return a 'bool' instead of an 'int' for various PTE accessors that are boolean in nature, e.g. is_shadow_present_pte(). Returning an int is goofy and potentially dangerous, e.g. if a flag being checked is moved into the upper 32 bits of a SPTE, then the compiler may silently squash the entire check since casting to an int is guaranteed to yield a return value of '0'. Opportunistically refactor is_last_spte() so that it naturally returns a bool value instead of letting it implicitly cast 0/1 to false/true. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210123003003.3137525-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tian Tao authored
fixed the following warning: /virt/kvm/dirty_ring.c:70:20-27: WARNING: vzalloc should be used for ring -> dirty_gfns, instead of vmalloc/memset. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Message-Id: <1611547045-13669-1-git-send-email-tiantao6@hisilicon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Enter a SRCU critical section for a memslots lookup during steal time update if and only if a steal time update is actually needed. Taking the lock can be avoided if steal time is disabled by the guest, or if KVM knows it has already flagged the vCPU as being preempted. Reword the comment to be more precise as to exactly why memslots will be queried. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210123000334.3123628-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Remove the disabling of page faults across kvm_steal_time_set_preempted() as KVM now accesses the steal time struct (shared with the guest) via a cached mapping (see commit b0431382, "x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed".) The cache lookup is flagged as atomic, thus it would be a bug if KVM tried to resolve a new pfn, i.e. we want the splat that would be reached via might_fault(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210123000334.3123628-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functinso. This however is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn. In doing this however KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, however it is a bug. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped or passed back to the caller via *writable. Usage of follow_pfn was introduced in commit add6a0cd ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older version have the same issue, all the way back to commit 2e2e3738 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable. Fixes: 2e2e3738 ("KVM: Handle vma regions with no backing page") Reported-by: David Stevens <stevensd@google.com> Cc: 3pvd@google.com Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
There is a bug in the TDP MMU function to zap SPTEs which could be replaced with a larger mapping which prevents the function from doing anything. Fix this by correctly zapping the last level SPTEs. Cc: stable@vger.kernel.org Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-11-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Feb, 2021 2 commits
-
-
Paolo Bonzini authored
If not in long mode, the low bits of CR3 are reserved but not enforced to be zero, so remove those checks. If in long mode, however, the MBZ bits extend down to the highest physical address bit of the guest, excluding the encryption bit. Make the checks consistent with the above, and match them between nested_vmcb_checks and KVM_SET_SREGS. Cc: stable@vger.kernel.org Fixes: 761e4169 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests") Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Don't let KVM load when running as an SEV guest, regardless of what CPUID says. Memory is encrypted with a key that is not accessible to the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll see garbage when reading the VMCB. Technically, KVM could decrypt all memory that needs to be accessible to the L0 and use shadow paging so that L0 does not need to shadow NPT, but exposing such information to L0 largely defeats the purpose of running as an SEV guest. This can always be revisited if someone comes up with a use case for running VMs inside SEV guests. Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM is doomed even if the SEV guest is debuggable and the hypervisor is willing to decrypt the VMCB. This may or may not be fixed on CPUs that have the SVME_ADDR_CHK fix. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210202212017.2486595-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 02 Feb, 2021 1 commit
-
-
Sean Christopherson authored
Set the emulator context to PROT64 if SYSENTER transitions from 32-bit userspace (compat mode) to a 64-bit kernel, otherwise the RIP update at the end of x86_emulate_insn() will incorrectly truncate the new RIP. Note, this bug is mostly limited to running an Intel virtual CPU model on an AMD physical CPU, as other combinations of virtual and physical CPUs do not trigger full emulation. On Intel CPUs, SYSENTER in compatibility mode is legal, and unconditionally transitions to 64-bit mode. On AMD CPUs, SYSENTER is illegal in compatibility mode and #UDs. If the vCPU is AMD, KVM injects a #UD on SYSENTER in compat mode. If the pCPU is Intel, SYSENTER will execute natively and not trigger #UD->VM-Exit (ignoring guest TLB shenanigans). Fixes: fede8076 ("KVM: x86: handle wrap around 32-bit address space") Cc: stable@vger.kernel.org Signed-off-by: Jonny Barker <jonny@jonnybarker.com> [sean: wrote changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210202165546.2390296-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 01 Feb, 2021 3 commits
-
-
Vitaly Kuznetsov authored
Commit 7a873e45 ("KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2") reveals that KVM allows to set X86_CR4_PCIDE even when PCID support is missing: ==== Test Assertion Failure ==== x86_64/set_sregs_test.c:41: rc pid=6956 tid=6956 - Invalid argument 1 0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41 2 0x00000000004014fc: main at set_sregs_test.c:119 3 0x00007f2d9346d041: ?? ??:0 4 0x000000000040164d: _start at ??:? KVM allowed unsupported CR4 bit (0x20000) Add X86_FEATURE_PCID feature check to __cr4_reserved_bits() to make kvm_is_valid_cr4() fail. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210201142843.108190-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Zheng Zhan Liang authored
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Zheng Zhan Liang <zhengzhanliang@huorong.cn> Message-Id: <20210201055310.267029-1-zhengzhanliang@huorong.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST will generally use the default value for MSR_IA32_ARCH_CAPABILITIES. When this happens and the host has tsx=on, it is possible to end up with virtual machines that have HLE and RTM disabled, but TSX_CTRL available. If the fleet is then switched to tsx=off, kvm_get_arch_capabilities() will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to use the tsx=off hosts as migration destinations, even though the guests do not have TSX enabled. To allow this migration, allow guests to write to their TSX_CTRL MSR, while keeping the host MSR unchanged for the entire life of the guests. This ensures that TSX remains disabled and also saves MSR reads and writes, and it's okay to do because with tsx=off we know that guests will not have the HLE and RTM features in their CPUID. (If userspace sets bogus CPUID data, we do not expect HLE and RTM to work in guests anyway). Cc: stable@vger.kernel.org Fixes: cbbaa272 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 28 Jan, 2021 4 commits
-
-
Peter Gonda authored
Grab kvm->lock before pinning memory when registering an encrypted region; sev_pin_memory() relies on kvm->lock being held to ensure correctness when checking and updating the number of pinned pages. Add a lockdep assertion to help prevent future regressions. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: 1e80fdc0 ("KVM: SVM: Pin guest memory when SEV is active") Signed-off-by: Peter Gonda <pgonda@google.com> V2 - Fix up patch description - Correct file paths svm.c -> sev.c - Add unlock of kvm->lock on sev_pin_memory error V1 - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/ Message-Id: <20210127161524.2832400-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Yu Zhang authored
Nested VMX was enabled by default in commit 1e58e5e5 ("KVM: VMX: enable nested virtualization by default"), which was merged in Linux 4.20. This patch is to fix the documentation accordingly. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210128154747.4242-1-yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Merge tag 'kvmarm-fixes-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.11, take #3 - Avoid clobbering extra registers on initialisation
-
Michael Roth authored
Recent commit 255cbecf modified struct kvm_vcpu_arch to make 'cpuid_entries' a pointer to an array of kvm_cpuid_entry2 entries rather than embedding the array in the struct. KVM_SET_CPUID and KVM_SET_CPUID2 were updated accordingly, but KVM_GET_CPUID2 was missed. As a result, KVM_GET_CPUID2 currently returns random fields from struct kvm_vcpu_arch to userspace rather than the expected CPUID values. Fix this by treating 'cpuid_entries' as a pointer when copying its contents to userspace buffer. Fixes: 255cbecf ("KVM: x86: allocate vcpu->arch.cpuid_entries dynamically") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Michael Roth <michael.roth@amd.com.com> Message-Id: <20210128024451.1816770-1-michael.roth@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 25 Jan, 2021 3 commits
-
-
Paolo Bonzini authored
VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed. Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES Cc: <stable@vger.kernel.org> # 5.10.x Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Revert the dirty/available tracking of GPRs now that KVM copies the GPRs to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty. Per commit de3cd117 ("KVM: x86: Omit caching logic for always-available GPRs"), tracking for GPRs noticeably impacts KVM's code footprint. This reverts commit 1c04d8c9. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210122235049.3107620-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the GRPs' dirty bits are set from time zero and never cleared, i.e. will always be seen as dirty. The obvious alternative would be to clear the dirty bits when appropriate, but removing the dirty checks is desirable as it allows reverting GPR dirty+available tracking, which adds overhead to all flavors of x86 VMs. Note, unconditionally writing the GPRs in the GHCB is tacitly allowed by the GHCB spec, which allows the hypervisor (or guest) to provide unnecessary info; it's the guest's responsibility to consume only what it needs (the hypervisor is untrusted after all). The guest and hypervisor can supply additional state if desired but must not rely on that additional state being provided. Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210122235049.3107620-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-