- 16 Jan, 2018 3 commits
-
Wanpeng Li authored
When running on a virtual machine, IPIs are expensive when the target CPU is sleeping. Thus, it is nice to be able to avoid them for TLB shootdowns. KVM can just do the flush via INVVPID on the guest's behalf the next time the CPU is scheduled.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Use "&" to test the bit instead of "==". - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
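Paolo's bracketed note refers to how the host tests the preempted field. A minimal sketch of the consumer in record_steal_time(), assuming the KVM_VCPU_FLUSH_TLB bit this series adds alongside KVM_VCPU_PREEMPTED:

    /* Sketch: flush on the preempted vCPU's behalf before it runs again.
     * The bit is tested with "&" because other flag bits may be set too. */
    if (xchg(&st->preempted, 0) & KVM_VCPU_FLUSH_TLB)
            kvm_vcpu_flush_tlb(vcpu, false);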
-
Wanpeng Li authored
Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush; it will be used by later patches to flush only the guest's TLB entries. For VMX, this will use INVVPID instead of INVEPT, which invalidates combined mappings while keeping guest-physical mappings.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
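A rough sketch of the changed hook and its VMX side, assuming the vmx.c helpers ept_sync_context(), vpid_sync_context() and construct_eptp() and the enable_ept/enable_vpid knobs; this is illustrative, not the verbatim patch:

    /* kvm_x86_ops member gains the new argument: */
    void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);

    static void vmx_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa)
    {
            if (enable_ept && (invalidate_gpa || !enable_vpid))
                    ept_sync_context(construct_eptp(vcpu, vcpu->arch.mmu.root_hpa));
            else
                    vpid_sync_context(to_vmx(vcpu)->vpid);  /* guest-physical mappings survive */
    }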
-
Wanpeng Li authored
The next patch will add another bit to the preempted field in kvm_steal_time. Define a constant for bit 0 (the only one that is currently used).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
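The resulting definition lives in arch/x86/include/uapi/asm/kvm_para.h and replaces the literal 1 wherever the field was set:

    #define KVM_VCPU_PREEMPTED      (1 << 0)

    /* e.g. in kvm_steal_time_set_preempted(): */
    vcpu->arch.st.steal.preempted = KVM_VCPU_PREEMPTED;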
-
- 14 Dec, 2017 18 commits
-
Christoffer Dall authored
Move the calls to vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl(), which dispatches further architecture-specific ioctls on to other functions. Some architectures support asynchronous vcpu ioctls which cannot call vcpu_load() or take the vcpu->mutex, because that would prevent concurrent execution with a running VCPU, which is the intended purpose of these ioctls, for example because they inject interrupts. We repeat the separate checks for these specifics in the architecture code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and calling vcpu_load for these ioctls.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
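A hedged sketch of the dispatcher shape this produces; the kvm_arch_vcpu_async_ioctl() name is purely illustrative here (each architecture open-codes the check at this point), standing in for the tests MIPS, S390 and PPC perform before taking the lock:

    long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl,
                             unsigned long arg)
    {
            struct kvm_vcpu *vcpu = filp->private_data;
            long r;

            /* Asynchronous ioctls (e.g. interrupt injection) must not
             * take vcpu->mutex or call vcpu_load(). */
            r = kvm_arch_vcpu_async_ioctl(filp, ioctl, arg);  /* illustrative name */
            if (r != -ENOIOCTLCMD)
                    return r;

            vcpu_load(vcpu);
            switch (ioctl) {
            /* ...synchronous, vcpu->mutex-protected ioctls... */
            default:
                    r = -EINVAL;
            }
            vcpu_put(vcpu);
            return r;
    }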
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_fpu().

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
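The mechanical shape of the change, which repeats in each of the following commits of this series, is roughly:

    int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
    {
            vcpu_load(vcpu);

            /* ...existing body, unchanged... */

            vcpu_put(vcpu);
            return 0;
    }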
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_fpu().

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_guest_debug().

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_translate().

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_mpstate().

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_mpstate().

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_sregs().

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_sregs().

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_regs().

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_regs().

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_run().

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390 parts
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[Rebased. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Christoffer Dall authored
As we're about to call vcpu_load() from architecture-specific implementations of the KVM vcpu ioctls, yet we still access data structures protected by the vcpu->mutex in the generic code, factor this locking out of vcpu_load(). x86 is the only architecture which calls vcpu_load() outside of the main vcpu ioctl function, and these calls will no longer take the vcpu mutex following this patch. However, with the exception of kvm_arch_vcpu_postcreate (see below), the callers are either in the creation or destruction path of the VCPU, which means there cannot be any concurrent access to the data structure, because the file descriptor is not yet accessible, or is already gone. kvm_arch_vcpu_postcreate makes the newly created vcpu potentially accessible by other in-kernel threads through the kvm->vcpus array, and we therefore take the vcpu mutex in this case directly.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
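After the change, vcpu_load() no longer acquires the lock itself; a sketch of the result, with callers such as kvm_vcpu_ioctl() and kvm_arch_vcpu_postcreate() taking vcpu->mutex explicitly:

    void vcpu_load(struct kvm_vcpu *vcpu)
    {
            int cpu = get_cpu();

            preempt_notifier_register(&vcpu->preempt_notifier);
            kvm_arch_vcpu_load(vcpu, cpu);
            put_cpu();
    }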
-
Wanpeng Li authored
When I run ebizzy in a 32-vCPU guest on a 32-pCPU Xeon box, I can observe ~8000 kvm_wait_lapic_expire CurAvg/s through the kvm_stat tool even though the advance tscdeadline hrtimer expiration is disabled. Each call to wait_lapic_expire() consumes ~70 cycles when a timer fires, since apic_timer_expire() sets expired_tscdeadline and wait_lapic_expire() then does some calculation before bailing out. So a total of ~175us per second is lost on this 3.2GHz machine. This patch reduces the overhead by skipping wait_lapic_expire() when lapic_timer_advance is disabled.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
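The change itself is tiny; a sketch of the call site in vcpu_enter_guest(), guarded by the existing lapic_timer_advance_ns module parameter:

    if (lapic_timer_advance_ns)
            wait_lapic_expire(vcpu);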
-
Liran Alon authored
This MSR returns the number of #SMIs that have occurred on the CPU since boot. It was seen to be used frequently by ESXi guests. The patch adds a new per-vCPU arch field, smi_count, to save the number of #SMIs that occurred on the CPU since boot. It is exposed as a read-only MSR to the guest (WRMSR causes a #GP) in the RDMSR/WRMSR emulation code. MSR_SMI_COUNT is also added to emulated_msrs[] to make sure user-space can save/restore it for migration purposes.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
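A sketch of the RDMSR/WRMSR handling this describes; host-initiated writes stay allowed so user-space can restore the value on migration:

    /* kvm_get_msr_common(): */
    case MSR_SMI_COUNT:
            msr_info->data = vcpu->arch.smi_count;
            break;

    /* kvm_set_msr_common(): read-only for the guest, #GP on WRMSR */
    case MSR_SMI_COUNT:
            if (!msr_info->host_initiated)
                    return 1;
            vcpu->arch.smi_count = data;
            break;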
-
Paolo Bonzini authored
Add the CPUID bits, make the CR4.UMIP bit not reserved anymore, and add UMIP support for instructions that are already emulated by KVM.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
    ------------[ cut here ]------------
    Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
    WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
    CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10
    RIP: 0010:ex_handler_fprestore+0x88/0x90
    Call Trace:
     fixup_exception+0x4e/0x60
     do_general_protection+0xff/0x270
     general_protection+0x22/0x30
    RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
    RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
     kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
     kvm_apic_accept_events+0x1c0/0x240 [kvm]
     kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
     kvm_vcpu_ioctl+0x479/0x880 [kvm]
     do_vfs_ioctl+0x142/0x9a0
     SyS_ioctl+0x74/0x80
     do_syscall_64+0x15f/0x600

where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu. To fix it, move kvm_load_guest_fpu to the very beginning of kvm_arch_vcpu_ioctl_run.

Cc: stable@vger.kernel.org
Fixes: f775b13e
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
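A sketch of the fix: the load now happens before anything in the run path (such as kvm_apic_accept_events() -> kvm_vcpu_reset()) can reach kvm_put_guest_fpu():

    int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
    {
            int r;

            kvm_load_guest_fpu(vcpu);       /* moved to the very beginning */

            /* ...mp_state checks, vcpu_run(), etc. unchanged... */

            kvm_put_guest_fpu(vcpu);        /* now always balanced */
            return r;
    }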
-
Wanpeng Li authored
    *** Guest State ***
    CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
    CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
    CR3 = 0x00000000fffbc000
    RSP = 0x0000000000000000  RIP = 0x0000000000000000
    RFLAGS=0x00000000         DR7 = 0x0000000000000400
            ^^^^^^^^^^

The failed vmentry is triggered by the following testcase when ept=Y:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
            r[2] = open("/dev/kvm", O_RDONLY);
            r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
            r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
            struct kvm_regs regs = {
                    .rflags = 0,
            };
            ioctl(r[4], KVM_SET_REGS, &regs);
            ioctl(r[4], KVM_RUN, 0);
    }

x86 RFLAGS bit 1 is fixed set; userspace can simply clear bit 1 of RFLAGS with the KVM_SET_REGS ioctl, which makes the subsequent vmentry fail. This patch fixes it by ORing X86_EFLAGS_FIXED into the value during the ioctl.

Cc: stable@vger.kernel.org
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Quan Xu <quan.xu0@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
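The fix is a one-liner in kvm_arch_vcpu_ioctl_set_regs() (sketch):

    /* Bit 1 of RFLAGS is architecturally fixed to 1, so force it on
     * regardless of what userspace passed in. */
    kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);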
-
- 06 Dec, 2017 1 commit
-
Radim Krčmář authored
Implementation of the unpinned APIC page didn't update the VMCS address cache when invalidation was done through range mmu notifiers. This became a problem when the page notifier was removed. Re-introduce the arch-specific helper and call it from ...range_start.

Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fixes: 38b99173 ("kvm: vmx: Implement set_apic_access_page_addr")
Fixes: 369ea824 ("mm/rmap: update to new mmu_notifier semantic v2")
Cc: <stable@vger.kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>
Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
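A sketch of the re-introduced helper, close to the actual patch; it is invoked from the ...range_start notifier path:

    void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                                unsigned long start,
                                                unsigned long end)
    {
            unsigned long apic_address;

            /* The physical address of the APIC access page is cached in
             * the VMCS; request a reload when it becomes invalid. */
            apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
            if (start <= apic_address && apic_address < end)
                    kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
    }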
-
- 05 Dec, 2017 2 commits
-
Rik van Riel authored
Now that get_fpu and put_fpu do nothing, because the scheduler will automatically load and restore the guest FPU context for us while we are in this code (deep inside the vcpu_run main loop), we can get rid of the get_fpu and put_fpu hooks.

Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Rik van Riel authored
Currently, every time a VCPU is scheduled out, the host kernel will first save the guest FPU/xstate context, then load the qemu userspace FPU context, only to then immediately save the qemu userspace FPU context back to memory. When scheduling in a VCPU, the same extraneous FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is loaded and stored with preemption disabled, to a model where the qemu userspace FPU is swapped out for the guest FPU context for the duration of the KVM_RUN ioctl. This is done under the VCPU mutex, which is also taken when other tasks inspect the VCPU FPU context, so the code should already be safe for this change. That should come as no surprise, given that s390 already has this optimization.

This can fix a bug where KVM calls get_user_pages while owning the FPU, and the file system ends up requesting the FPU again:

    [258270.527947] __warn+0xcb/0xf0
    [258270.527948] warn_slowpath_null+0x1d/0x20
    [258270.527951] kernel_fpu_disable+0x3f/0x50
    [258270.527953] __kernel_fpu_begin+0x49/0x100
    [258270.527955] kernel_fpu_begin+0xe/0x10
    [258270.527958] crc32c_pcl_intel_update+0x84/0xb0
    [258270.527961] crypto_shash_update+0x3f/0x110
    [258270.527968] crc32c+0x63/0x8a [libcrc32c]
    [258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
    [258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data]
    [258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
    [258270.527988] submit_io+0x170/0x1b0 [dm_bufio]
    [258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio]
    [258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio]
    [258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio]
    [258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
    [258270.528002] shrink_slab.part.40+0x1f5/0x420
    [258270.528004] shrink_node+0x22c/0x320
    [258270.528006] do_try_to_free_pages+0xf5/0x330
    [258270.528008] try_to_free_pages+0xe9/0x190
    [258270.528009] __alloc_pages_slowpath+0x40f/0xba0
    [258270.528011] __alloc_pages_nodemask+0x209/0x260
    [258270.528014] alloc_pages_vma+0x1f1/0x250
    [258270.528017] do_huge_pmd_anonymous_page+0x123/0x660
    [258270.528021] handle_mm_fault+0xfd3/0x1330
    [258270.528025] __get_user_pages+0x113/0x640
    [258270.528027] get_user_pages+0x4f/0x60
    [258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
    [258270.528108] try_async_pf+0x66/0x230 [kvm]
    [258270.528135] tdp_page_fault+0x130/0x280 [kvm]
    [258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm]
    [258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel]
    [258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel]

No performance changes were detected in quick ping-pong tests on my 4 socket system, which is expected since an FPU+xstate load is on the order of 0.1us, while ping-ponging between CPUs is on the order of 20us, and somewhat noisy.

Cc: stable@vger.kernel.org
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without a preceding load_fpu, which happened from inside the KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
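A sketch of the resulting swap, performed once on KVM_RUN entry (with the mirror-image kvm_put_guest_fpu() on exit) instead of at every preemption point:

    /* Swap (qemu) user FPU context for the guest FPU context. */
    static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
    {
            preempt_disable();
            copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);   /* save user state */
            /* PKRU is separately restored in kvm_x86_ops->run. */
            __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
                                    ~XFEATURE_MASK_PKRU);
            preempt_enable();
    }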
-
- 27 Nov, 2017 2 commits
-
Jan H. Schönherr authored
The KVM API says of the signal mask set via KVM_SET_SIGNAL_MASK that "any unblocked signal received [...] will cause KVM_RUN to return with -EINTR" and that "the signal will only be delivered if not blocked by the original signal mask". This, however, is only true when the calling task has a signal handler registered for the signal. If not, signal evaluation is short-circuited for SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN returning, or the whole process is terminated. Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar to that in do_sigtimedwait() to avoid the short-circuiting of signals.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
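A sketch of the helpers this introduces, modeled on do_sigtimedwait(): the original mask is parked in ->real_blocked so a SIG_IGN/SIG_DFL signal still interrupts KVM_RUN instead of being silently dropped:

    void kvm_sigset_activate(struct kvm_vcpu *vcpu)
    {
            if (!vcpu->sigset_active)
                    return;
            /* Unblock the requested set; stash the caller's mask. */
            sigprocmask(SIG_SETMASK, &vcpu->sigset, &current->real_blocked);
    }

    void kvm_sigset_deactivate(struct kvm_vcpu *vcpu)
    {
            if (!vcpu->sigset_active)
                    return;
            sigprocmask(SIG_SETMASK, &current->real_blocked, NULL);
            sigemptyset(&current->real_blocked);
    }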
-
Wanpeng Li authored
    watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
    CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G OE 4.14.0-rc4+ #4
    RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
    Call Trace:
     get_time_ref_counter+0x5a/0x80 [kvm]
     kvm_hv_process_stimers+0x120/0x5f0 [kvm]
     kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
     kvm_vcpu_ioctl+0x33a/0x620 [kvm]
     do_vfs_ioctl+0xa1/0x5d0
     SyS_ioctl+0x79/0x90
     entry_SYSCALL_64_fastpath+0x1e/0xa9

This can be reproduced when running kvm-unit-tests/hyperv_stimer.flat and cpu-hotplug stress tests simultaneously. __this_cpu_read(cpu_tsc_khz) returns 0 (set in kvmclock_cpu_down_prep()) when the pCPU is hot-unplugged, which sends kvm_get_time_scale() into an infinite loop. This patch fixes it by treating a hot-unplugged pCPU as not using the master clock.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
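A hedged sketch of the guard this implies in __get_kvmclock_ns(); the exact surroundings are reconstructed, but the idea is what the message states: a zero cpu_tsc_khz falls back to the boot clock instead of looping.

    /* both __this_cpu_read() and rdtsc() should run on the same CPU */
    get_cpu();
    if (__this_cpu_read(cpu_tsc_khz)) {
            kvm_get_time_scale(NSEC_PER_SEC,
                               __this_cpu_read(cpu_tsc_khz) * 1000LL,
                               &hv_clock.tsc_shift,
                               &hv_clock.tsc_to_system_mul);
            ret = __pvclock_read_cycles(&hv_clock, rdtsc());
    } else {
            /* pCPU was hot-unplugged: don't use the master clock */
            ret = ktime_get_boot_ns() + ka->kvmclock_offset;
    }
    put_cpu();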
-
- 17 Nov, 2017 4 commits
-
Paolo Bonzini authored
Sometimes, a processor might execute an instruction while another processor is updating the page tables for that instruction's code page, but before the TLB shootdown completes. The interesting case happens if the page is in the TLB.

In general, the processor will succeed in executing the instruction and nothing bad happens. However, what if the instruction is an MMIO access? If *that* happens, KVM invokes the emulator, and the emulator gets the updated page tables. If the update side had marked the code page as non present, the page table walk then will fail and so will x86_decode_insn. Unfortunately, even though kvm_fetch_guest_virt is correctly returning X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as a fatal error if the instruction cannot simply be reexecuted (as is the case for MMIO). And this in fact happened sometimes when rebooting Windows 2012r2 guests. Just checking ctxt->have_exception and injecting the exception if true is enough to fix the case.

Thanks to Eduardo Habkost for helping in the debugging of this issue.

Reported-by: Yanan Fu <yfu@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
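The fix, as described, amounts to one extra check in x86_emulate_instruction() after x86_decode_insn() fails (a sketch; other branches of the error path are elided):

    if (r != EMULATION_OK) {
            if (reexecute_instruction(vcpu, cr2, write_fault_to_spt,
                                      emulation_type))
                    return EMULATE_DONE;
            /* new: propagate the fault the page-table walk produced */
            if (ctxt->have_exception && inject_emulated_exception(vcpu))
                    return EMULATE_DONE;
            return handle_emulation_failure(vcpu);
    }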
-
Eyal Moscovici authored
Some guests use these unhandled MSRs very frequently. This causes dmesg to be populated with lots of aggregated messages about usage of ignored MSRs. Since ignore_msrs=true means the user is well aware that the guest uses ignored MSRs, allow the prints on their usage to be disabled as well. An example of such a guest is ESXi, which tends to access MSR 0x34 (MSR_SMI_COUNT) very frequently. In addition, we have observed this to cause unnecessary delays to guest execution; for example, ESXi experiences networking delays in its guests (L2 guests) because of these prints (even when the prints are rate-limited). This can easily be reproduced by pinging from one L2 guest to another: once in a while, a peak in ping RTT is observed. Removing these unhandled-MSR prints solves the issue. Because these prints can help diagnose issues with guests, this commit only suppresses them behind a module parameter instead of removing them from the code entirely.

Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[Changed suppress_ignore_msrs_prints to report_ignored_msrs - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
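A sketch of the knob and one of its call sites in x86.c:

    static bool __read_mostly report_ignored_msrs = true;
    module_param(report_ignored_msrs, bool, S_IRUGO | S_IWUSR);

    /* ...at the ignore_msrs sites: */
    if (report_ignored_msrs)
            vcpu_unimpl(vcpu, "ignored wrmsr: 0x%x data 0x%llx\n",
                        msr, data);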
-
Liran Alon authored
In this case, handle_emulation_failure() fills kvm_run with internal-error information which it expects to be delivered to user-mode for further processing. However, the code reports a wrong return value, which prevents KVM from ever returning to user-mode in this scenario.

Fixes: 6d77dbfc ("KVM: inject #UD if instruction emulation fails and exit to userspace")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
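A sketch of the fix in handle_emulation_failure(); returning EMULATE_USER_EXIT instead of EMULATE_FAIL lets the filled-in kvm_run actually reach user-mode:

    if (!is_guest_mode(vcpu) && kvm_x86_ops->get_cpl(vcpu) == 0) {
            vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
            vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
            vcpu->run->internal.ndata = 0;
            r = EMULATE_USER_EXIT;  /* was EMULATE_FAIL */
    }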
-
Liran Alon authored
When the guest passes KVM its pvclock-page GPA via WRMSR to MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, KVM doesn't initialize the pvclock page to some start values. It just requests a clock update, which will happen before entering the guest. The clock-update logic will call kvm_setup_pvclock_page() to update the pvclock page with info. However, kvm_setup_pvclock_page() *wrongly* assumes that the version field is initialized to an even number. This is wrong because on the first write the field could hold any value. The fix simply makes sure that if the version field is odd on the first write, it is incremented once more to make it even, and only then does the standard logic start. This follows the same logic as done for the other pvclock shared pages (see kvm_write_wall_clock() and record_steal_time()).

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
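The fix in kvm_setup_pvclock_page() is small (sketch):

    if (guest_hv_clock.version & 1)
            ++guest_hv_clock.version;  /* first-time write: random junk */

    vcpu->hv_clock.version = guest_hv_clock.version + 1;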
-
- 20 Oct, 2017 1 commit
-
Wanpeng Li authored
Both the Intel SDM and the AMD APM mention that when the MCi_STATUS register is implemented, it can be cleared by explicitly writing 0s to it; writing 1s to it will cause a general-protection exception. The MCE is emulated in qemu, so a guest's attempt to write 1 to this register should cause a #GP; this patch implements that.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
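A hedged sketch of the check added to set_msr_mce(); here offset is the MSR's distance from MSR_IA32_MC0_CTL, so (offset & 0x3) == 1 selects the MCi_STATUS register in each four-MSR bank:

    /* Non-zero guest write to MCi_STATUS: inject #GP.  Host-initiated
     * writes stay allowed so migration can restore the value. */
    if (!msr_info->host_initiated &&
        (offset & 0x3) == 1 && data != 0)
            return -1;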
-
- 18 Oct, 2017 1 commit
-
Ladi Prosek authored
Commit 05cade71 ("KVM: nSVM: fix SMI injection in guest mode") made KVM mask SMI if GIF=0 but it didn't do anything to unmask it when GIF is enabled. The issue manifests for me as a significantly longer boot time of Windows guests when running with SMM-enabled OVMF. This commit fixes it by intercepting STGI instead of requesting immediate exit if the reason why SMM was masked is GIF.

Fixes: 05cade71 ("KVM: nSVM: fix SMI injection in guest mode")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
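A sketch of the SVM-side logic, close to the enable_smi_window() hook the patch adds:

    static void enable_smi_window(struct kvm_vcpu *vcpu)
    {
            struct vcpu_svm *svm = to_svm(vcpu);

            if (!gif_set(svm)) {
                    if (vgif_enabled(svm))
                            set_intercept(svm, INTERCEPT_STGI);
                    /* STGI will cause a vm exit */
            } else {
                    /* We must be in SMM; RSM will cause a vmexit anyway. */
            }
    }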
-
- 12 Oct, 2017 7 commits
-
Ladi Prosek authored
Entering SMM while running in guest mode wasn't working very well because several pieces of the vcpu state were left set up for nested operation. Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running in SMM execution environment)
* MMU was confused (walk_mmu was still set to nested_mmu)
* INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)

Intel SDM actually prescribes the logical processor to "leave VMX operation" upon entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to document this but they provide fields in the SMM state-save area to stash the current state of SVM. What we need to do is basically get out of guest mode for the duration of SMM. All of this is completely transparent to L1, i.e. L1 is not given control and no L1-observable state changes.

To avoid code duplication this commit takes advantage of the existing nested vmexit and run functionality, perhaps at the cost of efficiency. To get out of guest mode, nested_svm_vmexit is called, unchanged. Re-entering is performed using enter_svm_guest_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ladi Prosek authored
Similar to NMI, there may be ISA specific reasons why an SMI cannot be injected into the guest. This commit adds a new smi_allowed callback to be implemented in following commits.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
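The new hook and a sketch of its use when deciding whether a pending SMI can be injected:

    /* kvm_x86_ops: */
    int (*smi_allowed)(struct kvm_vcpu *vcpu);

    /* inject_pending_event() (sketch): */
    if (vcpu->arch.smi_pending && !is_smm(vcpu) &&
        kvm_x86_ops->smi_allowed(vcpu)) {
            vcpu->arch.smi_pending = false;
            enter_smm(vcpu);
    }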
-
Ladi Prosek authored
Entering and exiting SMM may require ISA specific handling under certain circumstances. This commit adds two new callbacks with empty implementations. Actual functionality will be added in following commits.

* pre_enter_smm() is to be called when injecting an SMM, before any SMM related vcpu state has been changed
* pre_leave_smm() is to be called when emulating the RSM instruction, when the vcpu is in real mode and before any SMM related vcpu state has been restored

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
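The callback signatures added to kvm_x86_ops (empty implementations for now):

    int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
    int (*pre_leave_smm)(struct kvm_vcpu *vcpu, u64 smbase);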
-
Wanpeng Li authored
Per the SDM:

- XCR0 is reset to 1 by RESET but not INIT
- XSS is zeroed by both RESET and INIT
- BNDCFGU, BND0-BND3, BNDCFGS, BNDSTATUS are zeroed by both RESET and INIT

This patch implements these semantics accordingly.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
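A hedged sketch of the kvm_vcpu_reset() logic this implies; init_event distinguishes INIT from RESET, and the MPX xsave-area details are elided:

    if (kvm_mpx_supported()) {
            /* zero BND0-BND3, BNDCFGU, BNDCFGS, BNDSTATUS in the guest
             * xsave area (on both RESET and INIT) */
    }

    vcpu->arch.ia32_xss = 0;                     /* RESET and INIT */

    if (!init_event)
            vcpu->arch.xcr0 = XFEATURE_MASK_FP;  /* RESET only: XCR0 = 1 */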
-
David Hildenbrand authored
Changing it afterwards doesn't make too much sense and will only result in inconsistencies.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
David Hildenbrand authored
vmx and svm use zalloc, so this is not necessary.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
David Hildenbrand authored
And also get rid of that superfluous local variable "kvm".

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- 26 Sep, 2017 1 commit
-
Ingo Molnar authored
Rename this function to better express that it's all about initializing the FPU state of a task, which goes hand in hand with the fpu::initialized field.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-