- 15 Dec, 2020 9 commits
-
-
Tom Lendacky authored
SEV-ES adds a new VMEXIT reason code, VMGEXIT. Initial support for a VMGEXIT includes mapping the GHCB based on the guest GPA, which is obtained from a new VMCB field, and then validating the required inputs for the VMGEXIT exit reason. Since many of the VMGEXIT exit reasons correspond to existing VMEXIT reasons, the information from the GHCB is copied into the VMCB control exit code areas and KVM register areas. The standard exit handlers are invoked, similar to standard VMEXIT processing. Before restarting the vCPU, the GHCB is updated with any registers that have been updated by the hypervisor. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <c6a4ed4294a369bd75c44d03bd7ce0f0c3840e50.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
This is a pre-patch to consolidate some exit handling code into callable functions. Follow-on patches for SEV-ES exit handling will then be able to use them from the sev.c file. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <5b8b0ffca8137f3e1e257f83df9f5c881c8a96a3.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
When a SHUTDOWN VMEXIT is encountered, normally the VMCB is re-initialized so that the guest can be re-launched. But when a guest is running as an SEV-ES guest, the VMSA cannot be re-initialized because it has been encrypted. For now, just return -EINVAL to prevent a possible attempt at a guest reset. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <aa6506000f6f3a574de8dbcdab0707df844cb00c.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
When a guest is running as an SEV-ES guest, it is not possible to emulate instructions. Add support to prevent instruction emulation. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <f6355ea3024fda0a3eb5eb99c6b62dca10d792bd.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
Since the guest register state of an SEV-ES guest is encrypted, debugging is not supported. Update the code to prevent guest debugging when the guest has protected state. Additionally, an SEV-ES guest must only and always intercept DR7 reads and writes. Update set_dr_intercepts() and clr_dr_intercepts() to account for this. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <8db966fa2f9803d6454ce773863025d0e2e7f3cc.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
When a guest is running under SEV-ES, the hypervisor cannot access the guest register state. There are numerous places in the KVM code where certain registers are accessed that are not allowed to be accessed (e.g. RIP, CR0, etc). Add checks to prevent register accesses and add intercept update support at various points within the KVM code. Also, when handling a VMGEXIT, exceptions are passed back through the GHCB. Since the RDMSR/WRMSR intercepts (may) inject a #GP on error, update the SVM intercepts to handle this for SEV-ES guests. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> [Redo MSR part using the .complete_emulated_msr callback. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This will be used by SEV-ES to inject MSR failure via the GHCB. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Simplify the four functions that handle {kernel,user} {rd,wr}msr, there is still some repetition between the two instances of rdmsr but the whole business of calling kvm_inject_gp and kvm_skip_emulated_instruction can be unified nicely. Because complete_emulated_wrmsr now becomes essentially a call to kvm_complete_insn_gp, remove complete_emulated_msr. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
There is no need to inject a #GP from kvm_mtrr_set_msr, kvm_emulate_wrmsr will handle it. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 Dec, 2020 7 commits
-
-
Tom Lendacky authored
When performing VMGEXIT processing for an SEV-ES guest, register values will be synced between KVM and the GHCB. Prepare for detecting when a GPR has been updated (marked dirty) in order to determine whether to sync the register to the GHCB. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <7ca2a1cdb61456f2fe9c64193e34d601e395c133.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
Allocate a page during vCPU creation to be used as the encrypted VM save area (VMSA) for the SEV-ES guest. Provide a flag in the kvm_vcpu_arch structure that indicates whether the guest state is protected. When freeing a VMSA page that has been encrypted, the cache contents must be flushed using the MSR_AMD64_VM_PAGE_FLUSH before freeing the page. [ i386 build warnings ] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <fde272b17eec804f3b9db18c131262fe074015c5.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
Update the GHCB accessor functions to add functions for retrieve GHCB fields by name. Update existing code to use the new accessor functions. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <664172c53a5fb4959914e1a45d88e805649af0ad.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
Add support to KVM for determining if a system is capable of supporting SEV-ES as well as determining if a guest is an SEV-ES guest. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <e66792323982c822350e40c7a1cf67ea2978a70b.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
When both KVM support and the CCP driver are built into the kernel instead of as modules, KVM initialization can happen before CCP initialization. As a result, sev_platform_status() will return a failure when it is called from sev_hardware_setup(), when this isn't really an error condition. Since sev_platform_status() doesn't need to be called at this time anyway, remove the invocation from sev_hardware_setup(). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <618380488358b56af558f2682203786f09a49483.1607620209.git.thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Tom Lendacky authored
On systems that do not have hardware enforced cache coherency between encrypted and unencrypted mappings of the same physical page, the hypervisor can use the VM page flush MSR (0xc001011e) to flush the cache contents of an SEV guest page. When a small number of pages are being flushed, this can be used in place of issuing a WBINVD across all CPUs. CPUID 0x8000001f_eax[2] is used to determine if the VM page flush MSR is available. Add a CPUID feature to indicate it is supported and define the MSR. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <f1966379e31f9b208db5257509c4a089a87d33d0.1607620209.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Uros Bizjak authored
Move kvm_machine_check to x86.h to avoid two exact copies of the same function in kvm.c and svm.c. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20201029135600.122392-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 12 Dec, 2020 7 commits
-
-
Paolo Bonzini authored
Merge tag 'kvm-s390-next-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Features and Test for 5.11 - memcg accouting for s390 specific parts of kvm and gmap - selftest for diag318 - new kvm_stat for when async_pf falls back to sync The selftest even triggers a non-critical bug that is unrelated to diag318, fix will follow later.
-
Paolo Bonzini authored
Until commit e7c587da ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP"), KVM was testing both Intel and AMD CPUID bits before allowing the guest to write MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD. Testing only Intel bits on VMX processors, or only AMD bits on SVM processors, fails if the guests are created with the "opposite" vendor as the host. While at it, also tweak the host CPU check to use the vendor-agnostic feature bit X86_FEATURE_IBPB, since we only care about the availability of the MSR on the host here and not about specific CPUID bits. Fixes: e7c587da ("x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP") Cc: stable@vger.kernel.org Reported-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Cathy Zhang authored
AVX512_FP16 is supported by Intel processors, like Sapphire Rapids. It could gain better performance for it's faster compared to FP32 if the precision or magnitude requirements are met. It's availability is indicated by CPUID.(EAX=7,ECX=0):EDX[bit 23]. Expose it in KVM supported CPUID, then guest could make use of it; no new registers are used, only new instructions. Signed-off-by: Cathy Zhang <cathy.zhang@intel.com> Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Message-Id: <20201208033441.28207-3-kyung.min.park@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Kyung Min Park authored
Enumerate AVX512 Half-precision floating point (FP16) CPUID feature flag. Compared with using FP32, using FP16 cut the number of bits required for storage in half, reducing the exponent from 8 bits to 5, and the mantissa from 23 bits to 10. Using FP16 also enables developers to train and run inference on deep learning models fast when all precision or magnitude (FP32) is not needed. A processor supports AVX512 FP16 if CPUID.(EAX=7,ECX=0):EDX[bit 23] is present. The AVX512 FP16 requires AVX512BW feature be implemented since the instructions for manipulating 32bit masks are associated with AVX512BW. The only in-kernel usage of this is kvm passthrough. The CPU feature flag is shown as "avx512_fp16" in /proc/cpuinfo. Signed-off-by: Kyung Min Park <kyung.min.park@intel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Message-Id: <20201208033441.28207-2-kyung.min.park@intel.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Aaron Lewis authored
Both user_msr_test and userspace_msr_exit_test tests the functionality of kvm_msr_filter. Instead of testing this feature in two tests, merge them together, so there is only one test for this feature. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20201204172530.2958493-1-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Aaron Lewis authored
Add a selftest to test that when the ioctl KVM_X86_SET_MSR_FILTER is called with an MSR list, those MSRs exit to userspace. This test uses 3 MSRs to test this: 1. MSR_IA32_XSS, an MSR the kernel knows about. 2. MSR_IA32_FLUSH_CMD, an MSR the kernel does not know about. 3. MSR_NON_EXISTENT, an MSR invented in this test for the purposes of passing a fake MSR from the guest to userspace. KVM just acts as a pass through. Userspace is also able to inject a #GP. This is demonstrated when MSR_IA32_XSS and MSR_IA32_FLUSH_CMD are misused in the test. When this happens a #GP is initiated in userspace to be thrown in the guest which is handled gracefully by the exception handling framework introduced earlier in this series. Tests for the generic instruction emulator were also added. For this to work the module parameter kvm.force_emulation_prefix=1 has to be enabled. If it isn't enabled the tests will be skipped. A test was also added to ensure the MSR permission bitmap is being set correctly by executing reads and writes of MSR_FS_BASE and MSR_GS_BASE in the guest while alternating which MSR userspace should intercept. If the permission bitmap is being set correctly only one of the MSRs should be coming through at a time, and the guest should be able to read and write the other one directly. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Alexander Graf <graf@amazon.com> Message-Id: <20201012194716.3950330-5-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Uros Bizjak authored
Saves one byte in __vmx_vcpu_run for the same functionality. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20201029140457.126965-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 10 Dec, 2020 4 commits
-
-
Christian Borntraeger authored
Right now we do count pfault (pseudo page faults aka async page faults start and completion events). What we do not count is, if an async page fault would have been possible by the host, but it was disabled by the guest (e.g. interrupts off, pfault disabled, secure execution....). Let us count those as well in the pfault_sync counter. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20201125090658.38463-1-borntraeger@de.ibm.com
-
Collin Walling authored
The DIAGNOSE 0x0318 instruction, unique to s390x, is a privileged call that must be intercepted via SIE, handled in userspace, and the information set by the instruction is communicated back to KVM. To test the instruction interception, an ad-hoc handler is defined which simply has a VM execute the instruction and then userspace will extract the necessary info. The handler is defined such that the instruction invocation occurs only once. It is up to the caller to determine how the info returned by this handler should be used. The diag318 info is communicated from userspace to KVM via a sync_regs call. This is tested during a sync_regs test, where the diag318 info is requested via the handler, then the info is stored in the appropriate register in KVM via a sync registers call. If KVM does not support diag318, then the tests will print a message stating that diag318 was skipped, and the asserts will simply test against a value of 0. Signed-off-by: Collin Walling <walling@linux.ibm.com> Link: https://lore.kernel.org/r/20201207154125.10322-1-walling@linux.ibm.comAcked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Christian Borntraeger authored
gmap allocations can be attributed to a process. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com>
-
Christian Borntraeger authored
Almost all kvm allocations in the s390x KVM code can be attributed to the process that triggers the allocation (in other words, no global allocation for other guests). This will help the memcg controller to make the right decisions. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com>
-
- 09 Dec, 2020 1 commit
-
-
Maxim Levitsky authored
In the commit 1c96dcce ("KVM: x86: fix apic_accept_events vs check_nested_events"), we accidently started latching SIPIs that are received while the cpu is not waiting for them. This causes vCPUs to never enter a halted state. Fixes: 1c96dcce ("KVM: x86: fix apic_accept_events vs check_nested_events") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201203143319.159394-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Dec, 2020 1 commit
-
-
Paolo Bonzini authored
Since the ASID is now stored in svm->asid, pre_sev_run should also place it there and not directly in the VMCB control area. Reported-by: Ashish Kalra <Ashish.Kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 27 Nov, 2020 1 commit
-
-
Paolo Bonzini authored
SVM generally ignores fixed-1 bits. Set them manually so that we do not end up by mistake without those bits set in struct kvm_vcpu; it is part of userspace API that KVM always returns value with the bits set. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 19 Nov, 2020 2 commits
-
-
Ben Gardon authored
Add an extremely verbose trace point to the TDP MMU to log all SPTE changes, regardless of callstack / motivation. This is useful when a complete picture of the paging structure is needed or a change cannot be explained with the other, existing trace points. Tested: ran the demand paging selftest on an Intel Skylake machine with all the trace points used by the TDP MMU enabled and observed them firing with expected values. This patch can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/3813Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20201027175944.1183301-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ben Gardon authored
The TDP MMU was initially implemented without some of the usual tracepoints found in mmu.c. Correct this discrepancy by adding the missing trace points to the TDP MMU. Tested: ran the demand paging selftest on an Intel Skylake machine with all the trace points used by the TDP MMU enabled and observed them firing with expected values. This patch can be viewed in Gerrit at: https://linux-review.googlesource.com/c/virt/kvm/kvm/+/3812Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20201027175944.1183301-1-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 16 Nov, 2020 5 commits
-
-
Paolo Bonzini authored
Similarly to what vmx/vmx.c does, use vcpu->arch.cr4 to check if CR4 bits PGE, PKE and OSXSAVE have changed. When switching between VMCB01 and VMCB02, CPUID has to be adjusted every time if CR4.PKE or CR4.OSXSAVE change; without this patch, instead, CR4 would be checked against the previous value for L2 on vmentry, and against the previous value for L1 on vmexit, and CPUID would not be updated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Cathy Avery authored
KVM does not have separate ASIDs for L1 and L2; either the nested hypervisor and nested guests share a single ASID, or on older processor the ASID is used only to implement TLB flushing. Either way, ASIDs are handled at the VM level. In preparation for having different VMCBs passed to VMLOAD/VMRUN/VMSAVE for L1 and L2, store the current ASID to struct vcpu_svm and only move it to the VMCB in svm_vcpu_run. This way, TLB flushes can be applied no matter which VMCB will be active during the next svm_vcpu_run. Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20201011184818.3609-2-cavery@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Alex Shi authored
This macro is useless, and could cause gcc warning: arch/x86/kernel/kvmclock.c:47:0: warning: macro "HV_CLOCK_SIZE" is not used [-Wunused-macros] Let's remove it. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <1604651963-10067-1-git-send-email-alex.shi@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andrew Jones authored
Almost all tests do this anyway and the ones that don't don't appear to care. Only vmx_set_nested_state_test assumes that a feature (VMX) is disabled until later setting the supported CPUIDs. It's better to disable that explicitly anyway. Signed-off-by: Andrew Jones <drjones@redhat.com> Message-Id: <20201111122636.73346-11-drjones@redhat.com> [Restore CPUID_VMX, or vmx_set_nested_state breaks. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andrew Jones authored
Signed-off-by: Andrew Jones <drjones@redhat.com> Message-Id: <20201111122636.73346-12-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 15 Nov, 2020 3 commits
-
-
Andrew Jones authored
Signed-off-by: Andrew Jones <drjones@redhat.com> Message-Id: <20201111122636.73346-10-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andrew Jones authored
Introduce new vm_create variants that also takes a number of vcpus, an amount of per-vcpu pages, and optionally a list of vcpuids. These variants will create default VMs with enough additional pages to cover the vcpu stacks, per-vcpu pages, and pagetable pages for all. The new 'default' variant uses VM_MODE_DEFAULT, whereas the other new variant accepts the mode as a parameter. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Message-Id: <20201111122636.73346-6-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Andrew Jones authored
The code is almost 100% the same anyway. Just move it to common and add a few arch-specific macros. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Message-Id: <20201111122636.73346-5-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-