- 28 Sep, 2020 40 commits
-
-
Sean Christopherson authored
Move the logic that controls whether or not FNAME(invlpg) needs to flush fully into FNAME(invlpg) so that mmu_page_zap_pte() doesn't return a value. This allows a future patch to redefine the return semantics for mmu_page_zap_pte() so that it can recursively zap orphaned child shadow pages for nested TDP MMUs. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923221406.16297-2-sean.j.christopherson@intel.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Hyper-V Synthetic timers require SynIC but we don't seem to check that upon HV_X64_MSR_STIMER[X]_CONFIG/HV_X64_MSR_STIMER0_COUNT writes. Make the behavior match synic_set_msr(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200924145757.1035782-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
We forgot to update KVM_GET_SUPPORTED_HV_CPUID's documentation in api.rst when SynDBG leaves were added. While on it, fix 'KVM_GET_SUPPORTED_CPUID' copy-paste error. Fixes: f97f5a56 ("x86/kvm/hyper-v: Add support for synthetic debugger interface") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200924145757.1035782-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
To make kvm_mmu_free_roots() a bit more readable, capture 'kvm' in a local variable instead of doing vcpu->kvm over and over (and over). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923191204.8410-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a helper function and several wrapping macros to consolidate the copy-paste code in vmx_compute_secondary_exec_control() for adjusting controls that are dependent on guest CPUID bits. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200925003011.21016-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename SECONDARY_EXEC_RDTSCP to SECONDARY_EXEC_ENABLE_RDTSCP in preparation for consolidating the logic for adjusting secondary exec controls based on the guest CPUID model. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923165048.20486-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
If PCID is not exposed to the guest, clear INVPCID in the guest's CPUID even if the VMCS INVPCID enable is not supported. This will allow consolidating the secondary execution control adjustment code without having to special case INVPCID. Technically, this fixes a bug where !CPUID.PCID && CPUID.INVCPID would result in unexpected guest behavior (#UD instead of #GP/#PF), but KVM doesn't support exposing INVPCID if it's not supported in the VMCS, i.e. such a config is broken/bogus no matter what. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923165048.20486-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename helpers for a few controls to conform to the more prevelant style of cpu_has_vmx_<feature>(). Consistent names will allow adding macros to consolidate the boilerplate code for adjusting secondary execution controls. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923165048.20486-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Li Qiang authored
The 'arch_haltpoll_disable' is used to disable guest halt poll. Correct the comments. Fixes: a1c4423b ("cpuidle-haltpoll: disable host side polling when kvm virtualized") Signed-off-by: Li Qiang <liq3ea@163.com> Message-Id: <20200924155800.4939-1-liq3ea@163.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use kvm_vcpu_is_illegal_gpa() to check for a legal GPA when validating a PT output base instead of open coding a clever, but difficult to read, variant. Code readability is far more important than shaving a few uops in a slow path. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rename kvm_mmu_is_illegal_gpa() to kvm_vcpu_is_illegal_gpa() and move it to cpuid.h so that's it's colocated with cpuid_maxphyaddr(). The helper is not MMU specific and will gain a user that is completely unrelated to the MMU in a future patch. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Replace the subtly not-a-constant MSR_IA32_RTIT_OUTPUT_BASE_MASK with a proper helper function to check whether or not the specified base is valid. Blindly referencing the local 'vcpu' is especially nasty. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Stop exporting cpuid_query_maxphyaddr() now that it's not being abused by VMX. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use cpuid_maxphyaddr() instead of cpuid_query_maxphyaddr() for the RTIT base MSR check. There is no reason to recompute MAXPHYADDR as the precomputed version is synchronized with CPUID updates, and MSR_IA32_RTIT_OUTPUT_BASE is not written between stuffing CPUID and refreshing vcpu->arch.maxphyaddr. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200924194250.19137-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
The INVD instruction is emulated as a NOP, just skip the instruction instead. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <addd41be2fbf50f5f4059e990a2a0cff182d2136.1600972918.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Very similar content is present in four comments in sev.c. Unfortunately there are small differences that make it harder to place the comment in sev_clflush_pages itself, but at least we can make it more concise. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Cfir Cohen authored
The LAUNCH_SECRET command performs encryption of the launch secret memory contents. Mark pinned pages as dirty, before unpinning them. This matches the logic in sev_launch_update_data(). Signed-off-by: Cfir Cohen <cfir@google.com> Message-Id: <20200808003746.66687-1-cfir@google.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add tracepoints for the early consistency checks in nested_vmx_run(). The "VMLAUNCH vs. VMRESUME" check in particular is useful to trace, as there is no architectural way to check VMCS.LAUNCH_STATE, and subtle bugs such as VMCLEAR on the wrong HPA can lead to confusing errors in the L1 VMM. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200812180615.22372-1-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Krish Sadhukhan authored
Commit 761e4169 created a wrong mask for the CR3 MBZ bits. According to APM vol 2, only the upper 12 bits are MBZ. Fixes: 761e4169 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests", 2020-07-08) Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200829004824.4577-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Robert Hoo authored
Per Intel's SDM, RDPID takes a #UD if it is unsupported, which is more or less what KVM is emulating when MSR_TSC_AUX is not available. In fact, there are no scenarios in which RDPID is supposed to #GP. Fixes: fb6d4d34 ("KVM: x86: emulate RDPID") Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Message-Id: <1598581422-76264-1-git-send-email-robert.hu@linux.intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
On successful nested VM-Enter, check for pending interrupts and convert the highest priority interrupt to a pending posted interrupt if it matches L2's notification vector. If the vCPU receives a notification interrupt before nested VM-Enter (assuming L1 disables IRQs before doing VM-Enter), the pending interrupt (for L1) should be recognized and processed as a posted interrupt when interrupts become unblocked after VM-Enter to L2. This fixes a bug where L1/L2 will get stuck in an infinite loop if L1 is trying to inject an interrupt into L2 by setting the appropriate bit in L2's PIR and sending a self-IPI prior to VM-Enter (as opposed to KVM's method of manually moving the vector from PIR->vIRR/RVI). KVM will observe the IPI while the vCPU is in L1 context and so won't immediately morph it to a posted interrupt for L2. The pending interrupt will be seen by vmx_check_nested_events(), cause KVM to force an immediate exit after nested VM-Enter, and eventually be reflected to L1 as a VM-Exit. After handling the VM-Exit, L1 will see that L2 has a pending interrupt in PIR, send another IPI, and repeat until L2 is killed. Note, posted interrupts require virtual interrupt deliveriy, and virtual interrupt delivery requires exit-on-interrupt, ergo interrupts will be unconditionally unmasked on VM-Enter if posted interrupts are enabled. Fixes: 705699a1 ("KVM: nVMX: Enable nested posted interrupt processing") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200812175129.12172-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tianjia Zhang authored
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Huacai Chen <chenhc@lemote.com> Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <20200623131418.31473-6-tianjia.zhang@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Haiwei Li authored
Add trace_kvm_cr_write and trace_kvm_cr_read for svm. Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <f3031602-db3b-c4fe-b719-d402663b0a2b@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
Analyze is_guest_mode() in svm_vcpu_run() instead of svm_exit_handlers_fastpath() in conformity with VMX version. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1600066548-4343-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rework NMI VM-Exit handling to invoke the kernel handler by function call instead of INTn. INTn microcode is relatively expensive, and aligning the IRQ and NMI handling will make it easier to update KVM should some newfangled method for invoking the handlers come along. Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200915191505.10355-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the asm blob that invokes the appropriate IRQ handler after VM-Exit into a proper subroutine. Unconditionally create a stack frame in the subroutine so that, as objtool sees things, the function has standard stack behavior. The dynamic stack adjustment makes using unwind hints problematic. Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200915191505.10355-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a more generic is_emulatable(), and unconditionally call the new function in x86_emulate_instruction(). KVM will use the generic hook to support multiple security related technologies that prevent emulation in one way or another. Similar to the existing AMD #NPF case where emulation of the current instruction is not possible due to lack of information, AMD's SEV-ES and Intel's SGX and TDX will introduce scenarios where emulation is impossible due to the guest's register state being inaccessible. And again similar to the existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(), i.e. outside of the control of vendor-specific code. While the cause and architecturally visible behavior of the various cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean resume or complete shutdown, and SEV-ES and TDX "return" an error, the impact on the common emulation code is identical: KVM must stop emulation immediately and resume the guest. Query is_emulatable() in handle_ud() as well so that the force_emulation_prefix code doesn't incorrectly modify RIP before calling emulate_instruction() in the absurdly unlikely scenario that KVM encounters forced emulation in conjunction with "do not emulate". Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Haiwei Li authored
Use __GFP_ZERO while alloc_page(). Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <20200916083621.5512-1-lihaiwei.kernel@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Krish Sadhukhan authored
KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it Currently, prepare_vmcs02_early() does not check if the "unrestricted guest" VM-execution control in vmcs12 is turned off and leaves the corresponding bit on in vmcs02. Due to this setting, vmentry checks which are supposed to render the nested guest state as invalid when this VM-execution control is not set, are passing in hardware. This patch turns off the "unrestricted guest" VM-execution control in vmcs02 if vmcs12 has turned it off. Suggested-by: Jim Mattson <jmattson@google.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200921081027.23047-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
MSR reads/writes should always access the L1 state, since the (nested) hypervisor should intercept all the msrs it wants to adjust, and these that it doesn't should be read by the guest as if the host had read it. However IA32_TSC is an exception. Even when not intercepted, guest still reads the value + TSC offset. The write however does not take any TSC offset into account. This is documented in Intel's SDM and seems also to happen on AMD as well. This creates a problem when userspace wants to read the IA32_TSC value and then write it. (e.g for migration) In this case it reads L2 value but write is interpreted as an L1 value. To fix this make the userspace initiated reads of IA32_TSC return L1 value as well. Huge thanks to Dave Gilbert for helping me understand this very confusing semantic of MSR writes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200921103805.9102-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Rustam Kovhaev authored
Make use of the struct_size() helper to avoid any potential type mistakes and protect against potential integer overflows Make use of the flex_array_size() helper to calculate the size of a flexible array member within an enclosing structure Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Message-Id: <20200918120500.954436-1-rkovhaev@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
The following intercept bit has been added to support VMEXIT for INVPCID instruction: Code Name Cause A2h VMEXIT_INVPCID INVPCID instruction The following bit has been added to the VMCB layout control area to control intercept of INVPCID: Byte Offset Bit(s) Function 14h 2 intercept INVPCID Enable the interceptions when the the guest is running with shadow page table enabled and handle the tlbflush based on the invpcid instruction type. For the guests with nested page table (NPT) support, the INVPCID feature works as running it natively. KVM does not need to do any special handling in this case. AMD documentation for INVPCID feature is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985255929.11252.17346684135277453258.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
INVPCID instruction handling is mostly same across both VMX and SVM. So, move the code to common x86.c. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
Handling of kvm_read/write_guest_virt*() errors can be moved to common code. The same code can be used by both VMX and SVM. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
Remove set_cr_intercept, clr_cr_intercept and is_cr_intercept. Instead call generic svm_set_intercept, svm_clr_intercept an dsvm_is_intercep tfor all cr intercepts. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985253016.11252.16945893859439811480.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
The new intercept bits have been added in vmcb control area to support few more interceptions. Here are the some of them. - INTERCEPT_INVLPGB, - INTERCEPT_INVLPGB_ILLEGAL, - INTERCEPT_INVPCID, - INTERCEPT_MCOMMIT, - INTERCEPT_TLBSYNC, Add a new intercept word in vmcb_control_area to support these instructions. Also update kvm_nested_vmrun trace function to support the new addition. AMD documentation for these instructions is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
Convert all the intercepts to one array of 32 bit vectors in vmcb_control_area. This makes it easy for future intercept vector additions. Also update trace functions. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_exceptions bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
Modify intercept_dr to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_dr bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Babu Moger authored
Change intercept_cr to generic intercepts in vmcb_control_area. Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept where applicable. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu> [Change constant names. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-