- 16 Jan, 2018 34 commits
-
-
Paolo Bonzini authored
These fields are also simple copies of the data in the vmcs12 struct. For some of them, prepare_vmcs02 was skipping the copy when the field was unused. In prepare_vmcs02_full, we copy them always as long as the field exists on the host, because the corresponding execution control might be one of the shadowed fields. Optimization opportunities remain for MSRs that, depending on the entry/exit controls, have to be copied from either the vmcs01 or the vmcs12: EFER (whose value is partly stored in the entry controls too), PAT, DEBUGCTL (and also DR7). Before moving these three and the entry/exit controls to prepare_vmcs02_full, KVM would have to set dirty_vmcs12 on writes to the L1 MSRs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
This part is separate for ease of review, because git prefers to move prepare_vmcs02 below the initial long sequence of vmcs_write* operations. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
VMCS12 fields that are not handled through shadow VMCS are rarely written, and thus they are also almost constant in the vmcs02. We can thus optimize prepare_vmcs02 by skipping all the work for non-shadowed fields in the common case. This patch introduces the (pretty simple) tracking infrastructure; the next patches will move work to prepare_vmcs02_full and save a few hundred clock cycles per VMRESUME on a Haswell Xeon E5 system: before after cpuid 14159 13869 vmcall 15290 14951 inl_from_kernel 17703 17447 outl_to_kernel 16011 14692 self_ipi_sti_nop 16763 15825 self_ipi_tpr_sti_nop 17341 15935 wr_tsc_adjust_msr 14510 14264 rd_tsc_adjust_msr 15018 14311 mmio-wildcard-eventfd:pci-mem 16381 14947 mmio-datamatch-eventfd:pci-mem 18620 17858 portio-wildcard-eventfd:pci-io 15121 14769 portio-datamatch-eventfd:pci-io 15761 14831 (average savings 748, stdev 460). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Prepare for multiple inclusions of the list. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
The vmcs_field_to_offset_table was a rather sparse table of short integers with a maximum index of 0x6c16, amounting to 55342 bytes. Now that we are considering support for multiple VMCS12 formats, it would be unfortunate to replicate that large, sparse table. Rotating the field encoding (as a 16-bit integer) left by 6 reduces that table to 5926 bytes. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
Per the SDM, "[VMCS] Fields are grouped by width (16-bit, 32-bit, etc.) and type (guest-state, host-state, etc.)." Previously, the width was indicated by vmcs_field_type. To avoid confusion when we start dealing with both field width and field type, change vmcs_field_type to vmcs_field_width. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
This is the highest index value used in any supported VMCS12 field encoding. It is used to populate the IA32_VMX_VMCS_ENUM MSR. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Because all fields can be read/written with a single vmread/vmwrite on 64-bit kernels, the switch statements in copy_vmcs12_to_shadow and copy_shadow_to_vmcs12 are unnecessary. What I did in this patch is to copy the two parts of 64-bit fields separately on 32-bit kernels, to keep all complicated #ifdef-ery in init_vmcs_shadow_fields. The disadvantage is that 64-bit fields have to be listed separately in shadow_read_only/read_write_fields, but those are few and we can validate the arrays when building the VMREAD and VMWRITE bitmaps. This saves a few hundred clock cycles per nested vmexit. However there is still a "switch" in vmcs_read_any and vmcs_write_any. So, while at it, this patch reorders the fields by type, hoping that the branch predictor appreciates it. Cc: Jim Mattson <jmattson@google.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Compared to when VMCS shadowing was added to KVM, we are reading/writing a few more fields: the PML index, the interrupt status and the preemption timer value. The first two are because we are exposing more features to nested guests, the preemption timer is simply because we have grown a new optimization. Adding them to the shadow VMCS field lists reduces the cost of a vmexit by about 1000 clock cycles for each field that exists on bare metal. On the other hand, the guest BNDCFGS and TSC offset are not written on fast paths, so remove them. Suggested-by: Jim Mattson <jmattson@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linuxRadim Krčmář authored
KVM: s390: Fixes and features for 4.16 - add the virtio-ccw transport for kvmconfig - more debug tracing for cpu model - cleanups and fixes
-
Liran Alon authored
Consider the following scenario: 1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI to CPU B via virtual posted-interrupt mechanism. 2. CPU B is currently executing L2 guest. 3. vmx_deliver_nested_posted_interrupt() calls kvm_vcpu_trigger_posted_interrupt() which will note that vcpu->mode == IN_GUEST_MODE. 4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR IPI, CPU B exits from L2 to L0 during event-delivery (valid IDT-vectoring-info). 5. CPU A now sends the physical IPI. The IPI is received in host and it's handler (smp_kvm_posted_intr_nested_ipi()) does nothing. 6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT, CPU B continues to run in L0 and reach vcpu_enter_guest(). As KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume L2 guest. 7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be consumed at next L2 entry! Another scenario to consider: 1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI to CPU B via virtual posted-interrupt mechanism. 2. Assume that before CPU A calls kvm_vcpu_trigger_posted_interrupt(), CPU B is at L0 and is about to resume into L2. Further assume that it is in vcpu_enter_guest() after check for KVM_REQ_EVENT. 3. At this point, CPU A calls kvm_vcpu_trigger_posted_interrupt() which will note that vcpu->mode != IN_GUEST_MODE. Therefore, do nothing and return false. Then, will set pi_pending=true and KVM_REQ_EVENT. 4. Now CPU B continue and resumes into L2 guest without processing the posted-interrupt until next L2 entry! To fix both issues, we just need to change vmx_deliver_nested_posted_interrupt() to set pi_pending=true and KVM_REQ_EVENT before calling kvm_vcpu_trigger_posted_interrupt(). It will fix the first scenario by chaging step (6) to note that KVM_REQ_EVENT and pi_pending=true and therefore process nested posted-interrupt. It will fix the second scenario by two possible ways: 1. If kvm_vcpu_trigger_posted_interrupt() is called while CPU B has changed vcpu->mode to IN_GUEST_MODE, physical IPI will be sent and will be received when CPU resumes into L2. 2. If kvm_vcpu_trigger_posted_interrupt() is called while CPU B hasn't yet changed vcpu->mode to IN_GUEST_MODE, then after CPU B will change vcpu->mode it will call kvm_request_pending() which will return true and therefore force another round of vcpu_enter_guest() which will note that KVM_REQ_EVENT and pi_pending=true and therefore process nested posted-interrupt. Cc: stable@vger.kernel.org Fixes: 705699a1 ("KVM: nVMX: Enable nested posted interrupt processing") Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> [Add kvm_vcpu_kick to also handle the case where L1 doesn't intercept L2 HLT and L2 executes HLT instruction. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Liran Alon authored
Before each vmentry to guest, vcpu_enter_guest() calls sync_pir_to_irr() which calls vmx_hwapic_irr_update() to update RVI. Currently, vmx_hwapic_irr_update() contains a tweak in case it is called when CPU is running L2 and L1 don't intercept external-interrupts. In that case, code injects interrupt directly into L2 instead of updating RVI. Besides being hacky (wouldn't expect function updating RVI to also inject interrupt), it also doesn't handle this case correctly. The code contains several issues: 1. When code calls kvm_queue_interrupt() it just passes it max_irr which represents the highest IRR currently pending in L1 LAPIC. This is problematic as interrupt was injected to guest but it's bit is still set in LAPIC IRR instead of being cleared from IRR and set in ISR. 2. Code doesn't check if LAPIC PPR is set to accept an interrupt of max_irr priority. It just checks if interrupts are enabled in guest with vmx_interrupt_allowed(). To fix the above issues: 1. Simplify vmx_hwapic_irr_update() to just update RVI. Note that this shouldn't happen when CPU is running L2 (See comment in code). 2. Since now vmx_hwapic_irr_update() only does logic for L1 virtual-interrupt-delivery, inject_pending_event() should be the one responsible for injecting the interrupt directly into L2. Therefore, change kvm_cpu_has_injectable_intr() to check L1 LAPIC when CPU is running L2. 3. Change vmx_sync_pir_to_irr() to set KVM_REQ_EVENT when L1 has a pending injectable interrupt. Fixes: 963fee16 ("KVM: nVMX: Fix virtual interrupt delivery injection") Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Liran Alon authored
In case posted-interrupt was delivered to CPU while it is in host (outside guest), then posted-interrupt delivery will be done by calling sync_pir_to_irr() at vmentry after interrupts are disabled. sync_pir_to_irr() will check vmx->pi_desc.control ON bit and if set, it will sync vmx->pi_desc.pir to IRR and afterwards update RVI to ensure virtual-interrupt-delivery will dispatch interrupt to guest. However, it is possible that L1 will receive a posted-interrupt while CPU runs at host and is about to enter L2. In this case, the call to sync_pir_to_irr() will indeed update the L1's APIC IRR but vcpu_enter_guest() will then just resume into L2 guest without re-evaluating if it should exit from L2 to L1 as a result of this new pending L1 event. To address this case, if sync_pir_to_irr() has a new L1 injectable interrupt and CPU is running L2, we force exit GUEST_MODE which will result in another iteration of vcpu_run() run loop which will call kvm_vcpu_running() which will call check_nested_events() which will handle the pending L1 event properly. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Liran Alon authored
This commit doesn't change semantics. It is done as a preparation for future commits. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Liran Alon authored
sync_pir_to_irr() is only called if vcpu->arch.apicv_active()==true. In case it is false, VMX code make sure to set sync_pir_to_irr to NULL. Therefore, having SVM stubs allows to remove check for if sync_pir_to_irr != NULL from all calling sites. Signed-off-by: Liran Alon <liran.alon@oracle.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> [Return highest IRR in the SVM case. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Liran Alon authored
kvm_clear_exception_queue() should clear pending exception. This also includes exceptions which were only marked pending but not yet injected. This is because exception.pending is used for both L1 and L2 to determine if an exception should be raised to guest. Note that an exception which is pending but not yet injected will be raised again once the guest will be resumed. Consider the following scenario: 1) L0 KVM with ignore_msrs=false. 2) L1 prepare vmcs12 with the following: a) No intercepts on MSR (MSR_BITMAP exist and is filled with 0). b) No intercept for #GP. c) vmx-preemption-timer is configured. 3) L1 enters into L2. 4) L2 reads an unhandled MSR that exists in MSR_BITMAP (such as 0x1fff). L2 RDMSR could be handled as described below: 1) L2 exits to L0 on RDMSR and calls handle_rdmsr(). 2) handle_rdmsr() calls kvm_inject_gp() which sets KVM_REQ_EVENT, exception.pending=true and exception.injected=false. 3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls inject_pending_event() which calls vmx_check_nested_events() which sees that exception.pending=true but nested_vmx_check_exception() returns 0 and therefore does nothing at this point. However let's assume it later sees vmx-preemption-timer expired and therefore exits from L2 to L1 by calling nested_vmx_vmexit(). 4) nested_vmx_vmexit() calls prepare_vmcs12() which calls vmcs12_save_pending_event() but it does nothing as exception.injected is false. Also prepare_vmcs12() calls kvm_clear_exception_queue() which does nothing as exception.injected is already false. 5) We now return from vmx_check_nested_events() with 0 while still having exception.pending=true! 6) Therefore inject_pending_event() continues and we inject L2 exception to L1!... This commit will fix above issue by changing step (4) to clear exception.pending in kvm_clear_exception_queue(). Fixes: 664f8e26 ("KVM: X86: Fix loss of exception which has not yet been injected") Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Borislav Petkov authored
... just like in vmx_set_msr(). No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Haozhong Zhang authored
Some reserved pages, such as those from NVDIMM DAX devices, are not for MMIO, and can be mapped with cached memory type for better performance. However, the above check misconceives those pages as MMIO. Because KVM maps MMIO pages with UC memory type, the performance of guest accesses to those pages would be harmed. Therefore, we check the host memory type in addition and only treat UC/UC-/WC pages as MMIO. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reported-by: Cuevas Escareno, Ivan D <ivan.d.cuevas.escareno@intel.com> Reported-by: Kumar, Karthik <karthik.kumar@intel.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Haozhong Zhang authored
Check whether the PAT memory type of a pfn cannot be overridden by MTRR UC memory type, i.e. the PAT memory type is UC, UC- or WC. This function will be used by KVM to distinguish MMIO pfns and give them UC memory type in the EPT page tables (on Intel processors, EPT memory types work like MTRRs). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Topic branch for CVE-2017-5753, avoiding conflicts in the next merge window.
-
Paolo Bonzini authored
Avoid reverse dependencies. Instead, SEV will only be enabled if the PSP driver is available. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
https://github.com/codomania/kvmPaolo Bonzini authored
This part of Secure Encrypted Virtualization (SEV) patch series focuses on KVM changes required to create and manage SEV guests. SEV is an extension to the AMD-V architecture which supports running encrypted virtual machine (VMs) under the control of a hypervisor. Encrypted VMs have their pages (code and data) secured such that only the guest itself has access to unencrypted version. Each encrypted VM is associated with a unique encryption key; if its data is accessed to a different entity using a different key the encrypted guest's data will be incorrectly decrypted, leading to unintelligible data. This security model ensures that hypervisor will no longer able to inspect or alter any guest code or data. The key management of this feature is handled by a separate processor known as the AMD Secure Processor (AMD-SP) which is present on AMD SOCs. The SEV Key Management Specification (see below) provides a set of commands which can be used by hypervisor to load virtual machine keys through the AMD-SP driver. The patch series adds a new ioctl in KVM driver (KVM_MEMORY_ENCRYPT_OP). The ioctl will be used by qemu to issue SEV guest-specific commands defined in Key Management Specification. The following links provide additional details: AMD Memory Encryption white paper: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf AMD64 Architecture Programmer's Manual: http://support.amd.com/TechDocs/24593.pdf SME is section 7.10 SEV is section 15.34 SEV Key Management: http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf KVM Forum Presentation: http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf SEV Guest BIOS support: SEV support has been add to EDKII/OVMF BIOS https://github.com/tianocore/edk2Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
xsetbv can be expensive when running on nested virtualization, try to avoid it. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Quan Xu <quan.xu0@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Wanpeng Li authored
syzkaller reported: WARNING: CPU: 0 PID: 12927 at arch/x86/kernel/traps.c:780 do_debug+0x222/0x250 CPU: 0 PID: 12927 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #16 RIP: 0010:do_debug+0x222/0x250 Call Trace: <#DB> debug+0x3e/0x70 RIP: 0010:copy_user_enhanced_fast_string+0x10/0x20 </#DB> _copy_from_user+0x5b/0x90 SyS_timer_create+0x33/0x80 entry_SYSCALL_64_fastpath+0x23/0x9a The testcase sets a watchpoint (with perf_event_open) on a buffer that is passed to timer_create() as the struct sigevent argument. In timer_create(), copy_from_user()'s rep movsb triggers the BP. The testcase also sets the debug registers for the guest. However, KVM only restores host debug registers when the host has active watchpoints, which triggers a race condition when running the testcase with multiple threads. The guest's DR6.BS bit can escape to the host before another thread invokes timer_create(), and do_debug() complains. The fix is to respect do_debug()'s dr6 invariant when leaving KVM. Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Wanpeng Li authored
When running on a virtual machine, IPIs are expensive when the target CPU is sleeping. Thus, it is nice to be able to avoid them for TLB shootdowns. KVM can just do the flush via INVVPID on the guest's behalf the next time the CPU is scheduled. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> [Use "&" to test the bit instead of "==". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Wanpeng Li authored
Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush, it will be used by later patches to just flush guest tlb. For VMX, this will use INVVPID instead of INVEPT, which will invalidate combined mappings while keeping guest-physical mappings. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Wanpeng Li authored
Remote TLB flush does a busy wait which is fine in bare-metal scenario. But with-in the guest, the vcpus might have been pre-empted or blocked. In this scenario, the initator vcpu would end up busy-waiting for a long amount of time; it also consumes CPU unnecessarily to wake up the target of the shootdown. This patch set adds support for KVM's new paravirtualized TLB flush; remote TLB flush does not wait for vcpus that are sleeping, instead KVM will flush the TLB as soon as the vCPU starts running again. The improvement is clearly visible when the host is overcommitted; in this case, the PV TLB flush (in addition to avoiding the wait on the main CPU) prevents preempted vCPUs from stealing precious execution time from the running ones. Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads, so 64 pCPUs, and each VM is 64 vCPUs. ebizzy -M vanilla optimized boost 1VM 46799 48670 4% 2VM 23962 42691 78% 3VM 16152 37539 132% Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Wanpeng Li authored
The next patch will add another bit to the preempted field in kvm_steal_time. Define a constant for bit 0 (the only one that is currently used). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
David Hildenbrand authored
"wq" is not used at all. "cpuflags" can be access directly via the vcpu, just as "float_int" via vcpu->kvm. While at it, reuse _set_cpuflag() to make the code look nicer. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180108193747.10818-1-david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Christian Borntraeger authored
make kvmconfig currently does not select CONFIG_S390_GUEST. Since the virtio-ccw transport depends on CONFIG_S390_GUEST, we want to add CONFIG_S390_GUEST to kvmconfig. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com>
-
Michael Mueller authored
It is not required to take to a lock to protect access to the cpuflags of the local interrupt structure of a vcpu as the performed operation is an atomic_or. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Christian Borntraeger authored
The cpu model already traces the cpu facilities, the ibc and guest CPU ids. We should do the same for the cpu features (on success only). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
-
Christian Borntraeger authored
commit a03825bb ("KVM: s390: use kvm->created_vcpus") introduced kvm->created_vcpus to avoid races with the existing kvm->online_vcpus scheme. One place was "forgotten" and one new place was "added". Let's fix those. Reported-by: Halil Pasic <pasic@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Fixes: 4e0b1ab7 ("KVM: s390: gs support for kvm guests") Fixes: a03825bb ("KVM: s390: use kvm->created_vcpus")
-
David Hildenbrand authored
gmap_mprotect_notify() refuses shadow gmaps. Turns out that a) gmap_protect_range() b) gmap_read_table() c) gmap_pte_op_walk() Are never called for gmap shadows. And never should be. This dates back to gmap shadow prototypes where we allowed to call mprotect_notify() on the gmap shadow (to get notified about the prefix pages getting removed). This is avoided by always getting notified about any change on the gmap shadow. The only real function for walking page tables on shadow gmaps is gmap_table_walk(). So, essentially, these functions should never get called and gmap_pte_op_walk() can be cleaned up. Add some checks to callers of gmap_pte_op_walk(). Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171110151805.7541-1-david@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 11 Jan, 2018 1 commit
-
-
Andrew Honig authored
This adds a memory barrier when performing a lookup into the vmcs_field_to_offset_table. This is related to CVE-2017-5753. Signed-off-by: Andrew Honig <ahonig@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 Dec, 2017 5 commits
-
-
Paolo Bonzini authored
After the vcpu_load/vcpu_put pushdown, the handling of asynchronous VCPU ioctl is already much clearer in that it is obvious that they bypass vcpu_load and vcpu_put. However, it is still not perfect in that the different state of the VCPU mutex is still hidden in the caller. Separate those ioctls into a new function kvm_arch_vcpu_async_ioctl that returns -ENOIOCTLCMD for more "traditional" synchronous ioctls. Cc: James Hogan <jhogan@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move the calls to vcpu_load() and vcpu_put() in to the architecture specific implementations of kvm_arch_vcpu_ioctl() which dispatches further architecture-specific ioctls on to other functions. Some architectures support asynchronous vcpu ioctls which cannot call vcpu_load() or take the vcpu->mutex, because that would prevent concurrent execution with a running VCPU, which is the intended purpose of these ioctls, for example because they inject interrupts. We repeat the separate checks for these specifics in the architecture code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and calling vcpu_load for these ioctls. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_fpu(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_fpu(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_guest_debug(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-