- 07 Sep, 2017 1 commit
Radim Krčmář authored
Merge from git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux:

KVM: s390: Fixes and features for 4.14

 - merge of topic branch tlb-flushing from the s390 tree to get the no-dat base features
 - merge of kvm/master to avoid conflicts with additional sthyi fixes
 - wire up the no-dat enhancements in KVM
 - multiple epoch facility (z14 feature)
 - Configuration z/Architecture Mode
 - more sthyi fixes
 - gdb server range checking fix
 - small code cleanups
- 31 Aug, 2017 3 commits
David Hildenbrand authored
The machine check information is part of the vsie_page. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170830160603.5452-4-david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand authored
Move the real logic that always has to be executed out of the WARN_ON_ONCE. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170830160603.5452-3-david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
David Hildenbrand authored
The "overflowing" (wrapped) range check was wrong. For a range that wraps around the end of the address space:

 |=======b-------a=======|

an address is in range iff: addr >= a || addr <= b

Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170830160603.5452-2-david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
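A minimal standalone sketch of the corrected predicate (names invented here, not from the patch):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Range [a, b] that wraps around the end of the address space:
     *   |=======b-------a=======|
     * An address is inside iff it lies at/above the start OR at/below
     * the wrapped end.
     */
    static bool in_wrapped_range(uint64_t addr, uint64_t a, uint64_t b)
    {
        return addr >= a || addr <= b;
    }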
- 29 Aug, 2017 4 commits
David Hildenbrand authored
Independent of the underlying hardware, kvm will now always handle SIGP SET ARCHITECTURE as if czam were enabled. Therefore, let's not only forward that bit but always set it. While at it, add a comment regarding STHYI. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170829143108.14703-1-david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Claudio Imbrenda authored
The STFLE bit 147 indicates whether the ESSA no-DAT operation code is valid. The bit is not normally provided to the host; the host is instead provided with an SCLP bit that indicates whether guests can support the feature. This patch:

 * enables the STFLE bit in the guest if the corresponding SCLP bit is present in the host,
 * adds support for migrating the no-DAT bit in the PGSTEs,
 * fixes the software interpretation of the ESSA instruction that is used when migrating, both for the new operation code and for the old "set stable", as per specifications.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Heiko Carstens authored
handle_sthyi() always writes to guest memory if the sthyi function code is zero, in order to fault in the page that is later written to. However, a function code of zero does not necessarily mean that a write to guest memory happens: if the KVM host is running as a second-level guest under z/VM 6.2, the sthyi instruction is indicated as available to the KVM host, but executing it will always return a return code that indicates "unsupported function code". In such a case handle_sthyi() must not write to guest memory, which means the prior write access to fault in the guest page may result in invalid guest exceptions and/or invalid data modification.

In order to be architecture compliant, simply remove the write_guest() call. Given that the guest assumed a write access anyway, this fix does not qualify for -stable; it just makes sure the sthyi handler is architecture compliant.

Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation") Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Collin L. Walling authored
Allow for the enablement of MEF and the support for the extended epoch in SIE and VSIE for the extended guest TOD-Clock. A new interface is used for getting/setting a guest's extended TOD-Clock, using a single ioctl invocation: KVM_S390_VM_TOD_EXT. Since the host time is a moving target that might see an epoch switch or STP sync checks, we need an atomic ioctl and cannot use the existing two interfaces. The old method of getting and setting the guest TOD-Clock is still retained and is used when the old ioctls are called. Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
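For illustration, a hedged sketch of reading the extended TOD-Clock atomically from userspace via the one-shot attribute interface; struct kvm_s390_vm_tod_clock (epoch index plus TOD value) and the KVM_S390_VM_TOD / KVM_S390_VM_TOD_EXT constants come from the s390 uapi headers, so this only builds on s390:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Fetch epoch index + TOD word in a single, atomic ioctl. */
    static int get_ext_tod(int vm_fd, struct kvm_s390_vm_tod_clock *tod)
    {
        struct kvm_device_attr attr = {
            .group = KVM_S390_VM_TOD,
            .attr  = KVM_S390_VM_TOD_EXT,
            .addr  = (uint64_t)(unsigned long)tod,
        };

        return ioctl(vm_fd, KVM_GET_DEVICE_ATTR, &attr);
    }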
- 28 Aug, 2017 2 commits
Jason J. Herne authored
kvm has always supported the concept of starting in z/Arch mode so let's reflect the feature bit to the guest. Also, we change sigp set architecture to reject any request to change architecture modes. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Christian Borntraeger authored
Additional fixes on top of these two are necessary:

 - missing inline assembly constraint
 - wrong exception handling
- 25 Aug, 2017 1 commit
Jim Mattson authored
According to the SDM, if the "use TPR shadow" VM-execution control is 1, bits 11:0 of the virtual-APIC address must be 0 and the address must not set any bits beyond the processor's physical-address width. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
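The check reduces to two conditions; a small sketch (maxphyaddr would come from CPUID, function name invented here):

    #include <stdbool.h>
    #include <stdint.h>

    static bool virtual_apic_addr_valid(uint64_t addr, unsigned int maxphyaddr)
    {
        if (addr & 0xfffULL)        /* bits 11:0 must be 0 (4k aligned) */
            return false;
        if (addr >> maxphyaddr)     /* no bits beyond the PA width      */
            return false;
        return true;
    }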
- 24 Aug, 2017 13 commits
Wanpeng Li authored
------------[ cut here ]------------
WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G W OE 4.13.0-rc4+ #11
RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
Call Trace:
 ? kvm_multiple_exception+0x149/0x170 [kvm]
 ? handle_emulation_failure+0x79/0x230 [kvm]
 ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel]
 ? check_chain_key+0x137/0x1e0
 ? reexecute_instruction.part.168+0x130/0x130 [kvm]
 nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 vmx_queue_exception+0x197/0x300 [kvm_intel]
 kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm]
 ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm]
 ? preempt_count_sub+0x18/0xc0
 ? restart_apic_timer+0x17d/0x300 [kvm]
 ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm]
 ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm]
 kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm]

The flag "nested_run_pending" can override the decision of which should run next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps be injected with an event it specified, etc.). nested_run_pending is especially intended to avoid switching to L1 in the injection decision-point. This case can be handled just like the other cases in vmx_check_nested_events, instead of having a special case in vmx_queue_exception.

Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li authored
vmx_complete_interrupts() assumes that the exception is always injected, so it can be dropped by kvm_clear_exception_queue(). However, an exception cannot be injected immediately if it is: 1) originally destined to a nested guest; 2) trapped to cause a vmexit; 3) happening right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true. This patch applies to exceptions the same algorithm that is used for NMIs, replacing exception.reinject with "exception.injected" (equivalent to nmi_injected). exception.pending now represents an exception that is queued and whose side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet. If exception.pending is true, the exception might result in a nested vmexit instead, too (in which case the side effects must not be applied). exception.injected instead represents an exception that is going to be injected into the guest at the next vmentry. Reported-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
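Illustrative only, the two flags described above side by side (field names follow the description; the real state lives in KVM's per-vcpu exception bookkeeping):

    #include <stdbool.h>

    struct queued_exception {
        bool pending;     /* queued; side effects (RFLAGS.RF, DR7) not yet
                           * applied; may still become a nested vmexit    */
        bool injected;    /* delivery committed for the next vmentry      */
        unsigned char nr; /* vector number                                */
    };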
Wanpeng Li authored
Use kvm_event_needs_reinjection() encapsulation. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini authored
update_permission_bitmask currently does a 128-iteration loop to, essentially, compute a constant array. Computing the 8 bits in parallel reduces it to 16 iterations, and is enough to speed it up substantially because many boolean operations in the inner loop become constants or simplify noticeably. Because update_permission_bitmask is actually the top item in the profile for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand clock cycles, or up to 30%:

                                     before   after
    cpuid                             35173   25954
    vmcall                            35122   27079
    inl_from_pmtimer                  52635   42675
    inl_from_qemu                     53604   44599
    inl_from_kernel                   38498   30798
    outl_to_kernel                    34508   28816
    wr_tsc_adjust_msr                 34185   26818
    rd_tsc_adjust_msr                 37409   27049
    mmio-no-eventfd:pci-mem           50563   45276
    mmio-wildcard-eventfd:pci-mem     34495   30823
    mmio-datamatch-eventfd:pci-mem    35612   31071
    portio-no-eventfd:pci-io          44925   40661
    portio-wildcard-eventfd:pci-io    29708   27269
    portio-datamatch-eventfd:pci-io   31135   27164

(I wrote a small C program to compare the tables for all values of CR0.WP, CR4.SMAP and CR4.SMEP, and they match.)

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
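A simplified sketch of the byte-parallel trick (write/user faults only; fetch, SMEP/SMAP and WP are ignored, and the mask layout is hypothetical): the fault attributes are constant per table entry and expand to 0x00/0xff, while the pte attributes are 8-bit masks over the 8 permission combinations, so one boolean expression answers all 8 at once:

    #include <stdint.h>

    /* Bit i of the result is set iff pte-permission combination i faults. */
    static uint8_t fault_map(int fault_is_write, int fault_is_user,
                             uint8_t pte_writable, uint8_t pte_user)
    {
        uint8_t wf = fault_is_write ? 0xff : 0x00;
        uint8_t uf = fault_is_user  ? 0xff : 0x00;

        /* fault if (write fault && !writable) || (user fault && !user) */
        return (uint8_t)((wf & ~pte_writable) | (uf & ~pte_user));
    }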
Yu Zhang authored
This patch exposes the 5-level page table feature to the VM. At the same time, canonical virtual address checking is extended to support both 48-bit and 57-bit address widths. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yu Zhang authored
Extends the shadow paging code so that a 5-level shadow page table can be constructed if the VM is running in 5-level paging mode. Also extends the EPT code so that a 5-level EPT table can be constructed if the maxphysaddr of the VM exceeds 48 bits. Unlike the shadow logic, KVM should still use a 4-level EPT table for a VM whose physical address width is less than 48 bits, even when the VM is running in 5-level paging mode. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> [Unconditionally reset the MMU context in kvm_cpuid_update. Changing MAXPHYADDR invalidates the reserved bit bitmasks. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yu Zhang authored
Now that we have both 4-level and 5-level page tables in 64-bit long mode, rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL so that PT64_ROOT_5LEVEL can be used for the 5-level case; this makes the code clearer. Also, PT64_ROOT_MAX_LEVEL is defined as 4, so that it can simply be redefined to 5 once a replacement is needed for 5-level paging. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Yu Zhang authored
Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the reserved bits in CR3. Yet the length of reserved bits in guest CR3 should be based on the physical address width exposed to the VM. This patch changes CR3 check logic to calculate the reserved bits at runtime. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
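The shape of the runtime computation, as a sketch (rsvd_bits() mirrors the helper of the same name in KVM's MMU code; assumes 0 < s <= e < 64):

    #include <stdint.h>

    /* Mask with bits s..e (inclusive) set. */
    static uint64_t rsvd_bits(int s, int e)
    {
        return ((1ULL << (e - s + 1)) - 1) << s;
    }

    /* CR3 reserved bits depend on the MAXPHYADDR exposed to the guest. */
    static uint64_t cr3_rsvd_bits(int guest_maxphyaddr)
    {
        return rsvd_bits(guest_maxphyaddr, 63);
    }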
Yu Zhang authored
Return false in kvm_cpuid() when it fails to find the cpuid entry. Also, this routine (and its caller) is optimized with a new argument, check_limit, so that the check_cpuid_limit() fallback can be avoided. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini authored
A guest may not be configured to support XSAVES/XRSTORS, even when the host does. If the guest does not support XSAVES/XRSTORS, clear the secondary execution control so that the processor will raise #UD. Also clear the "allowed-1" bit for XSAVES/XRSTORS exiting in the IA32_VMX_PROCBASED_CTLS2 MSR, and pass through VMCS12's control in the VMCS02. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson authored
A guest may not be configured to support RDSEED, even when the host does. If the guest does not support RDSEED, intercept the instruction and synthesize #UD. Also clear the "allowed-1" bit for RDSEED exiting in the IA32_VMX_PROCBASED_CTLS2 MSR. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson authored
A guest may not be configured to support RDRAND, even when the host does. If the guest does not support RDRAND, intercept the instruction and synthesize #UD. Also clear the "allowed-1" bit for RDRAND exiting in the IA32_VMX_PROCBASED_CTLS2 MSR. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
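Kernel-side sketch of the exit-handler shape shared by the RDRAND/RDSEED intercepts (handler name invented here; kvm_queue_exception() is the existing helper used to deliver a #UD):

    /* Called when the guest executes RDRAND but its CPUID does not
     * advertise the feature. */
    static int handle_rdrand_exit(struct kvm_vcpu *vcpu)
    {
        kvm_queue_exception(vcpu, UD_VECTOR);   /* synthesize #UD   */
        return 1;                               /* resume the guest */
    }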
Paolo Bonzini authored
Currently, secondary execution controls are divided into three groups:

 - static, depending mostly on the module arguments or the processor (vmx_secondary_exec_control)
 - static, depending on CPUID (vmx_cpuid_update)
 - dynamic, depending on nested VMX or local APIC state

Because walking CPUID is expensive, prepare_vmcs02 is using only the first group. This however is unnecessarily complicated. Just cache the static secondary execution controls, and then prepare_vmcs02 does not need to compute them every time. Computation of all static secondary execution controls is now kept in a single function, vmx_compute_secondary_exec_control. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 23 Aug, 2017 2 commits
Janakarajan Natarajan authored
Enable the Virtual GIF feature. This is done by setting bit 25 of the field at offset 60h in the VMCB. With this feature enabled, the processor uses bit 9 of the same field as the virtual GIF when executing STGI/CLGI instructions. Since the execution of STGI by the L1 hypervisor does not cause a return to the outermost (L0) hypervisor, enable_irq_window and enable_nmi_window are modified. The IRQ window will be opened even if GIF is not set, under the assumption that on resuming the L1 hypervisor the IRQ will be held pending until the processor executes the STGI instruction. For the NMI window, the STGI intercept is set; this assists in opening the window only when GIF=1. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
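The bit positions as a sketch (the shifts match the description above; the vcpu_svm/int_ctl access follows KVM's SVM code, so treat it as illustrative):

    #define V_GIF_ENABLE_SHIFT 25                        /* vGIF enable    */
    #define V_GIF_ENABLE_MASK  (1 << V_GIF_ENABLE_SHIFT)
    #define V_GIF_SHIFT        9                         /* the GIF itself */
    #define V_GIF_MASK         (1 << V_GIF_SHIFT)

    /* Turn the feature on in the VMCB control area (int_ctl, offset 60h). */
    static void enable_vgif(struct vcpu_svm *svm)
    {
        svm->vmcb->control.int_ctl |= V_GIF_ENABLE_MASK;
    }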
Janakarajan Natarajan authored
Add a new cpufeature definition for Virtual GIF. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 21 Aug, 2017 2 commits
Heiko Carstens authored
sthyi should only generate a specification exception if the function code is zero and the response buffer is not on a 4k boundary. The current code applied that test to unknown function codes as well: if the response buffer, which is currently only defined for function code 0, was not on a 4k boundary, it incorrectly injected a specification exception instead of returning with condition code 3 and return code 4 (unsupported function code). Fix this by moving the boundary check. Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation") Cc: <stable@vger.kernel.org> # 4.8+ Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
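A hedged fragment of the reordered checks (variable and helper names approximate the s390 KVM code):

    /* Unknown function codes take the cc 3 / rc 4 path before the
     * buffer is ever examined... */
    if (code != 0) {
        cc = 3;     /* condition code 3          */
        rc = 4;     /* unsupported function code */
        goto out;
    }

    /* ...and the 4k-boundary test now only applies to function code 0. */
    if (addr & ~PAGE_MASK)
        return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);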
Heiko Carstens authored
The sthyi inline assembly misses register r3 within the clobber list. The sthyi instruction will always write a return code to register "R2+1", which in this case would be r3. Due to that we may have register corruption and see host crashes or data corruption depending on how gcc decided to allocate and use registers during compile time. Fixes: 95ca2cb5 ("KVM: s390: Add sthyi emulation") Cc: <stable@vger.kernel.org> # 4.8+ Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
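The fixed inline assembly, lightly simplified from the s390 code; the "3" clobber is the point of the patch, telling gcc that STHYI writes its return code into r3 (R2+1):

    static int sthyi(u64 vaddr)
    {
        register u64 code asm("0") = 0;     /* function code in r0  */
        register u64 addr asm("2") = vaddr; /* buffer address in r2 */
        int cc;

        asm volatile(
            ".insn rre,0xB2560000,%[code],%[addr]\n"
            "ipm   %[cc]\n"
            "srl   %[cc],28\n"
            : [cc] "=d" (cc)
            : [code] "d" (code), [addr] "a" (addr)
            : "3", "cc", "memory");         /* r3 holds the return code */
        return cc;
    }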
- 18 Aug, 2017 6 commits
David Hildenbrand authored
We already always set the (write-back, WB) EPT memory type but don't check if it is supported. Also, for nVMX we only support WB for now. Let's just require it. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
David Hildenbrand authored
Don't use shifts, tag them correctly as EPTP and use better matching names (PWL vs. GAW). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Denys Vlasenko authored
With lightly tweaked defconfig:

        text     data      bss       dec      hex  filename
    11259661  5109408  2981888  19350957  12745ad  vmlinux.before
    11259661  5109408   884736  17253805  10745ad  vmlinux.after

Only compile-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Paolo Bonzini authored
There is currently some confusion between nested and L1 GPAs. The assignment to "direct" in kvm_mmu_page_fault tries to fix that, but it is not enough. What this patch does is fence off the MMIO cache completely when using shadow nested page tables, since we have neither a GVA nor an L1 GPA to put in the cache. This also allows some simplifications in kvm_mmu_page_fault and FNAME(page_fault). The EPT misconfig likewise does not have an L1 GPA to pass to kvm_io_bus_write, so that must be skipped for guest mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Changed comment to say "GPAs" instead of "L1's physical addresses", as per David's review. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Brijesh Singh authored
When a guest causes a page fault which requires emulation, the vcpu->arch.gpa_available flag is set to indicate that cr2 contains a valid GPA. Currently, emulator_read_write_onepage() makes use of the gpa_available flag to avoid a guest page walk for known MMIO regions. Let's not limit the gpa_available optimization to just MMIO regions. The patch extends the check to avoid the page walk whenever the gpa_available flag is set. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> [Fix EPT=0 according to Wanpeng Li's fix, plus ensure VMX also uses the new code. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Moved "ret < 0" to the else branch, as per David's review. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Paolo Bonzini authored
Calling handle_mmio_page_fault() has been unnecessary since commit e9ee956e ("KVM: x86: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault()", 2016-02-22). handle_mmio_page_fault() can now be made static. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
- 15 Aug, 2017 1 commit
Arnd Bergmann authored
When PAGE_OFFSET is not a compile-time constant, we run into warnings from the use of kvm_is_error_hva() that the compiler cannot optimize out:

    arch/arm/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_gfn_to_hva_cache_init':
    arch/arm/kvm/../../../virt/kvm/kvm_main.c:1978:14: error: 'nr_pages_avail' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    arch/arm/kvm/../../../virt/kvm/kvm_main.c: In function 'gfn_to_page_many_atomic':
    arch/arm/kvm/../../../virt/kvm/kvm_main.c:1660:5: error: 'entry' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This adds fake initializations to the two instances I ran into. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
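The workaround has this shape (shown for the first warning; the dummy value is irrelevant since every real path assigns the variable first):

    -    int nr_pages_avail;
    +    int nr_pages_avail = 0; /* fake init, silences -Wmaybe-uninitialized */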
- 11 Aug, 2017 5 commits
Jim Mattson authored
Host-initiated writes to the IA32_APIC_BASE MSR do not have to follow local APIC state transition constraints, but the value written must be valid. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
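A sketch of what "the value written must be valid" amounts to (bit layout per the local APIC architecture: bits 7:0 and 9 reserved, bit 10 = x2APIC enable, nothing above MAXPHYADDR; exact KVM internals may differ):

    #include <stdbool.h>
    #include <stdint.h>

    static bool apic_base_valid(uint64_t data, unsigned int maxphyaddr,
                                bool guest_has_x2apic)
    {
        uint64_t reserved = (~0ULL << maxphyaddr) | 0x2ffULL;

        if (!guest_has_x2apic)
            reserved |= 1ULL << 10;     /* x2APIC enable is reserved too */

        return !(data & reserved);
    }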
Wanpeng Li authored
Bail out immediately if there is no available mmu page to allocate. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li authored
watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [warn_test:3089]
irq event stamp: 20532
hardirqs last enabled at (20531): [<ffffffff8e9b6908>] restore_regs_and_iret+0x0/0x1d
hardirqs last disabled at (20532): [<ffffffff8e9b7ae8>] apic_timer_interrupt+0x98/0xb0
softirqs last enabled at (8266): [<ffffffff8e9badc6>] __do_softirq+0x206/0x4c1
softirqs last disabled at (8253): [<ffffffff8e083918>] irq_exit+0xf8/0x100
CPU: 5 PID: 3089 Comm: warn_test Tainted: G OE 4.13.0-rc3+ #8
RIP: 0010:kvm_mmu_prepare_zap_page+0x72/0x4b0 [kvm]
Call Trace:
 make_mmu_pages_available.isra.120+0x71/0xc0 [kvm]
 kvm_mmu_load+0x1cf/0x410 [kvm]
 kvm_arch_vcpu_ioctl_run+0x1316/0x1bf0 [kvm]
 kvm_vcpu_ioctl+0x340/0x700 [kvm]
 ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
 ? __fget+0xfc/0x210
 do_vfs_ioctl+0xa4/0x6a0
 ? __fget+0x11d/0x210
 SyS_ioctl+0x79/0x90
 entry_SYSCALL_64_fastpath+0x23/0xc2
 ? __this_cpu_preempt_check+0x13/0x20

This can be reproduced readily with ept=N when running syzkaller tests, since many syzkaller testcases don't set up any memory regions. With ept=Y, however, an rmode identity map is created and kvm_mmu_calculate_mmu_pages() extends the number of the VM's mmu pages to at least KVM_MIN_ALLOC_MMU_PAGES, which just hides the issue.

I saw the scenario kvm->arch.n_max_mmu_pages == 0 && kvm->arch.n_used_mmu_pages == 1: there is one active mmu page on the list and kvm_mmu_prepare_zap_page() fails to zap any pages, yet prepare_zap_oldest_mmu_page() always returns true. This incurs an infinite loop in make_mmu_pages_available(), which causes the mmu->lock soft lockup. This patch fixes it by setting the return value of prepare_zap_oldest_mmu_page() according to whether or not a mmu page was zapped.

Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
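The shape of the fix, close to the actual patch but trimmed: report whether a page really was zapped instead of returning true unconditionally, so the loop in make_mmu_pages_available() can terminate:

    static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
                                            struct list_head *invalid_list)
    {
        struct kvm_mmu_page *sp;

        if (list_empty(&kvm->arch.active_mmu_pages))
            return false;

        sp = list_last_entry(&kvm->arch.active_mmu_pages,
                             struct kvm_mmu_page, link);

        /* Propagate whether anything was actually zapped. */
        return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
    }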
David Hildenbrand authored
Let's reuse the function introduced with eptp switching. We don't explicitly have to check against enable_ept_ad_bits, as this is implicitly done when checking against nested_vmx_ept_caps in valid_ept_address(). Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andrew Jones authored
Remove nonexistent files, allow less awkward expressions when extracting arch-specific information, and only return relevant information when using arch-specific expressions. Additionally add include/trace/events/kvm.h, arch/*/include/uapi/asm/kvm*, and arch/powerpc/kernel/kvm* to appropriate sections. The arch-specific expressions are now:

 /KVM/                                 -- All KVM
 /\(KVM\)|\(KVM\/x86\)/                -- X86
 /\(KVM\)|\(KVM\/x86\)|\(KVM\/amd\)/   -- X86 plus AMD
 /\(KVM\)|\(KVM\/arm\)/                -- ARM
 /\(KVM\)|\(KVM\/arm\)|\(KVM\/arm64\)/ -- ARM plus ARM64
 /\(KVM\)|\(KVM\/powerpc\)/            -- POWERPC
 /\(KVM\)|\(KVM\/s390\)/               -- S390
 /\(KVM\)|\(KVM\/mips\)/               -- MIPS

Signed-off-by: Andrew Jones <drjones@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>