- 20 Apr, 2017 16 commits
-
-
Alexey Kardashevskiy authored
VFIO on sPAPR already implements guest memory pre-registration when the entire guest RAM gets pinned. This can be used to translate the physical address of a guest page containing the TCE list from H_PUT_TCE_INDIRECT. This makes use of the pre-registrered memory API to access TCE list pages in order to avoid unnecessary locking on the KVM memory reverse map as we know that all of guest memory is pinned and we have a flat array mapping GPA to HPA which makes it simpler and quicker to index into that array (even with looking up the kernel page tables in vmalloc_to_phys) than it is to find the memslot, lock the rmap entry, look up the user page tables, and unlock the rmap entry. Note that the rmap pointer is initialized to NULL where declared (not in this patch). If a requested chunk of memory has not been preregistered, this will fall back to non-preregistered case and lock rmap. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Alexey Kardashevskiy authored
The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm* there. This will be used in the following patches where we will be attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than to VCPU). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Alexey Kardashevskiy authored
It does not make much sense to have KVM in book3s-64 and not to have IOMMU bits for PCI pass through support as it costs little and allows VFIO to function on book3s KVM. Having IOMMU_API always enabled makes it unnecessary to have a lot of "#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those ifdef's we could have only user space emulated devices accelerated (but not VFIO) which do not seem to be very useful. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Alexey Kardashevskiy authored
This adds a capability number for in-kernel support for VFIO on SPAPR platform. The capability will tell the user space whether in-kernel handlers of H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space must not attempt allocating a TCE table in the host kernel via the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests will not be passed to the user space which is desired action in the situation like that. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Paul Mackerras authored
This merges in the commits in the topic/ppc-kvm branch of the powerpc tree to get the changes to arch/powerpc which subsequent patches will rely on. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Alexey Kardashevskiy authored
At the moment the userspace can request a table smaller than a page size and this value will be stored as kvmppc_spapr_tce_table::size. However the actual allocated size will still be aligned to the system page size as alloc_page() is used there. This aligns the table size up to the system page size. It should not change the existing behaviour but when in-kernel TCE acceleration patchset reaches the upstream kernel, this will allow small TCE tables be accelerated as well: PCI IODA iommu_table allocator already aligns the size and, without this patch, an IOMMU group won't attach to LIOBN due to the mismatching table size. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Alexey Kardashevskiy authored
PR KVM page fault handler performs eaddr to pte translation for a guest, however kvmppc_mmu_book3s_64_xlate() does not preserve WIMG bits (storage control) in the kvmppc_pte struct. If PR KVM is running as a second level guest under HV KVM, and PR KVM tries inserting HPT entry, this fails in HV KVM if it already has this mapping. This preserves WIMG bits between kvmppc_mmu_book3s_64_xlate() and kvmppc_mmu_map_page(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Alexey Kardashevskiy authored
At the moment kvmppc_mmu_map_page() returns -1 if mmu_hash_ops.hpte_insert() fails for any reason so the page fault handler resumes the guest and it faults on the same address again. This adds distinction to kvmppc_mmu_map_page() to return -EIO if mmu_hash_ops.hpte_insert() failed for a reason other than full pteg. At the moment only pSeries_lpar_hpte_insert() returns -2 if plpar_pte_enter() failed with a code other than H_PTEG_FULL. Other mmu_hash_ops.hpte_insert() instances can only fail with -1 "full pteg". With this change, if PR KVM fails to update HPT, it can signal the userspace about this instead of returning to guest and having the very same page fault over and over again. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Alexey Kardashevskiy authored
@is_mmio has never been used since introduction in commit 2f4cf5e4 ("Add book3s.c") from 2009. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Markus Elfring authored
* A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Markus Elfring authored
Add a jump target so that a bit of exception handling can be better reused at the end of this function. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Paul Mackerras authored
For completeness, this adds emulation of the lfiwax and lfiwzx instructions. With this, all floating-point load and store instructions as of Power ISA V2.07 are emulated. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Paul Mackerras authored
This adds emulation for the following integer loads and stores, thus enabling them to be used in a guest for accessing emulated MMIO locations. - lhaux - lwaux - lwzux - ldu - lwa - stdux - stwux - stdu - ldbrx - stdbrx Previously, most of these would cause an emulation failure exit to userspace, though ldu and lwa got treated incorrectly as ld, and stdu got treated incorrectly as std. This also tidies up some of the formatting and updates the comment listing instructions that still need to be implemented. With this, all integer loads and stores that are defined in the Power ISA v2.07 are emulated, except for those that are permitted to trap when used on cache-inhibited or write-through mappings (and which do in fact trap on POWER8), that is, lmw/stmw, lswi/stswi, lswx/stswx, lq/stq, and l[bhwdq]arx/st[bhwdq]cx. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Alexey Kardashevskiy authored
This adds missing stdx emulation for emulated MMIO accesses by KVM guests. This allows the Mellanox mlx5_core driver from recent kernels to work when MMIO emulation is enforced by userspace. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Bin Lu authored
This patch provides the MMIO load/store emulation for instructions of 'double & vector unsigned char & vector signed char & vector unsigned short & vector signed short & vector unsigned int & vector signed int & vector double '. The instructions that this adds emulation for are: - ldx, ldux, lwax, - lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux, - stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx, - lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx, - stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x [paulus@ozlabs.org - some cleanups, fixes and rework, make it compile for Book E, fix build when PR KVM is built in] Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Paul Mackerras authored
This provides functions that can be used for generating interrupts indicating that a given functional unit (floating point, vector, or VSX) is unavailable. These functions will be used in instruction emulation code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
- 11 Apr, 2017 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linuxRadim Krčmář authored
From: Christian Borntraeger <borntraeger@de.ibm.com> KVM: s390: features for 4.12 1. guarded storage support for guests This contains an s390 base Linux feature branch that is necessary to implement the KVM part 2. Provide an interface to implement adapter interruption suppression which is necessary for proper zPCI support 3. Use more defines instead of numbers 4. Provide logging for lazy enablement of runtime instrumentation
-
- 07 Apr, 2017 16 commits
-
-
Jim Mattson authored
The userspace exception injection API and code path are entirely unprepared for exceptions that might cause a VM-exit from L2 to L1, so the best course of action may be to simply disallow this for now. 1. The API provides no mechanism for userspace to specify the new DR6 bits for a #DB exception or the new CR2 value for a #PF exception. Presumably, userspace is expected to modify these registers directly with KVM_SET_SREGS before the next KVM_RUN ioctl. However, in the event that L1 intercepts the exception, these registers should not be changed. Instead, the new values should be provided in the exit_qualification field of vmcs12 (Intel SDM vol 3, section 27.1). 2. In the case of a userspace-injected #DB, inject_pending_event() clears DR7.GD before calling vmx_queue_exception(). However, in the event that L1 intercepts the exception, this is too early, because DR7.GD should not be modified by a #DB that causes a VM-exit directly (Intel SDM vol 3, section 27.1). 3. If the injected exception is a #PF, nested_vmx_check_exception() doesn't properly check whether or not L1 is interested in the associated error code (using the #PF error code mask and match fields from vmcs12). It may either return 0 when it should call nested_vmx_vmexit() or vice versa. 4. nested_vmx_check_exception() assumes that it is dealing with a hardware-generated exception intercept from L2, with some of the relevant details (the VM-exit interruption-information and the exit qualification) live in vmcs02. For userspace-injected exceptions, this is not the case. 5. prepare_vmcs12() assumes that when its exit_intr_info argument specifies valid information with a valid error code that it can VMREAD the VM-exit interruption error code from vmcs02. For userspace-injected exceptions, this is not the case. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
David Hildenbrand authored
If we already entered/are about to enter SMM, don't allow switching to INIT/SIPI_RECEIVED, otherwise the next call to kvm_apic_accept_events() will report a warning. Same applies if we are already in MP state INIT_RECEIVED and SMM is requested to be turned on. Refuse to set the VCPU events in this case. Fixes: cd7764fe ("KVM: x86: latch INITs while in system management mode") Cc: stable@vger.kernel.org # 4.2+ Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Remove code from architecture files that can be moved to virt/kvm, since there is already common code for coalesced MMIO. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed a pointless 'break' after 'return'.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
In order to simplify adding exit reasons in the future, the array of exit reason names is now also sorted by exit reason code. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Now use bit 6 of EPTP to optionally enable A/D bits for EPTP. Another thing to change is that, when EPT accessed and dirty bits are not in use, VMX treats accesses to guest paging structures as data reads. When they are in use (bit 6 of EPTP is set), they are treated as writes and the corresponding EPT dirty bit is set. The MMU didn't know this detail, so this patch adds it. We also have to fix up the exit qualification. It may be wrong because KVM sets bit 6 but the guest might not. L1 emulates EPT A/D bits using write permissions, so in principle it may be possible for EPT A/D bits to be used by L1 even though not available in hardware. The problem is that guest page-table walks will be treated as reads rather than writes, so they would not cause an EPT violation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Fixed typo in walk_addr_generic() comment and changed bit clear + conditional-set pattern in handle_ept_violation() to conditional-clear] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
This prepares the MMU paging code for EPT accessed and dirty bits, which can be enabled optionally at runtime. Code that updates the accessed and dirty bits will need a pointer to the struct kvm_mmu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
handle_ept_violation is checking for "guest-linear-address invalid" + "not a paging-structure walk". However, _all_ EPT violations without a valid guest linear address are paging structure walks, because those EPT violations happen when loading the guest PDPTEs. Therefore, the check can never be true, and even if it were, KVM doesn't care about the guest linear address; it only uses the guest *physical* address VMCS field. So, remove the check altogether. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Paolo Bonzini authored
Large pages at the PDPE level can be emulated by the MMU, so the bit can be set unconditionally in the EPT capabilities MSR. The same is true of 2MB EPT pages, though all Intel processors with EPT in practice support those. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Legacy device assignment has been deprecated since 4.2 (released 1.5 years ago). VFIO is better and everyone should have switched to it. If they haven't, this should convince them. :) Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Virtual NMIs are only missing in Prescott and Yonah chips. Both are obsolete for virtualization usage---Yonah is 32-bit only even---so drop vNMI emulation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
MCG_CAP[63:9] bits are reserved on AMD. However, on an AMD guest, this MSR returns 0x100010a. More specifically, bit 24 is set, which is simply wrong. That bit is MCG_SER_P and is present only on Intel. Thus, clean up the reserved bits in order not to confuse guests. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Hildenbrand authored
Let's combine it in a single function vmx_switch_vmcs(). Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
According to the Intel SDM, volume 3, section 28.3.2: Creating and Using Cached Translation Information, "No linear mappings are used while EPT is in use." INVEPT will invalidate both the guest-physical mappings and the combined mappings in the TLBs and paging-structure caches, so an INVVPID is superfluous. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Yi Min Zhao authored
Introduce a cap to enable AIS facility bit, and add documentation for this capability. Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 06 Apr, 2017 4 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/kvm-mipsRadim Krčmář authored
From: James Hogan <james.hogan@imgtec.com> KVM: MIPS: VZ support, Octeon III, and TLBR Add basic support for the MIPS Virtualization Module (generally known as MIPS VZ) in KVM. We primarily support the ImgTec P5600, P6600, I6400, and Cavium Octeon III cores so far. Support is included for the following VZ / guest hardware features: - MIPS32 and MIPS64, r5 (VZ requires r5 or later) and r6 - TLBs with GuestID (IMG cores) or Root ASID Dealias (Octeon III) - Shared physical root/guest TLB (IMG cores) - FPU / MSA - Cop0 timer (up to 1GHz for now due to soft timer limit) - Segmentation control (EVA) - Hardware page table walker (HTW) both for root and guest TLB Also included is a proper implementation of the TLBR instruction for the trap & emulate MIPS KVM implementation. Preliminary MIPS architecture changes are applied directly with Ralf's ack.
-
Yi Min Zhao authored
Inject adapter interrupts on a specified adapter which allows to retrieve the adapter flags, e.g. if the adapter is subject to AIS facility or not. And add documentation for this interface. For adapters subject to AIS, handle the airq injection suppression for a given ISC according to the interruption mode: - before injection, if NO-Interruptions Mode, just return 0 and suppress, otherwise, allow the injection. - after injection, if SINGLE-Interruption Mode, change it to NO-Interruptions Mode to suppress the following interrupts. Besides, add tracepoint for suppressed airq and AIS mode transitions. Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Fei Li authored
Provide an interface for userspace to modify AIS (adapter-interruption-suppression) mode state, and add documentation for the interface. Allowed target modes are ALL-Interruptions mode and SINGLE-Interruption mode. We introduce the 'simm' and 'nimm' fields in kvm_s390_float_interrupt to store interruption modes for each ISC. Each bit in 'simm' and 'nimm' targets to one ISC, and collaboratively indicate three modes: ALL-Interruptions, SINGLE-Interruption and NO-Interruptions. This interface can initiate most transitions between the states; transition from SINGLE-Interruption to NO-Interruptions via adapter interrupt injection will be introduced in a following patch. The meaningful combinations are as follows: interruption mode | simm bit | nimm bit ------------------|----------|---------- ALL | 0 | 0 SINGLE | 1 | 0 NO | 1 | 1 Besides, add tracepoint to track AIS mode transitions. Co-Authored-By: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com> Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Fei Li authored
In order to properly implement adapter-interruption suppression, we need a way for userspace to specify which adapters are subject to suppression. Let's convert the existing (and unused) 'pad' field into a 'flags' field and define a flag value for suppressible adapters. Besides, add documentation for the interface. Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 30 Mar, 2017 3 commits
-
-
Alexey Kardashevskiy authored
So far iommu_table obejcts were only used in virtual mode and had a single owner. We are going to change this by implementing in-kernel acceleration of DMA mapping requests. The proposed acceleration will handle requests in real mode and KVM will keep references to tables. This adds a kref to iommu_table and defines new helpers to update it. This replaces iommu_free_table() with iommu_tce_table_put() and makes iommu_free_table() static. iommu_tce_table_get() is not used in this patch but it will be in the following patch. Since this touches prototypes, this also removes @node_name parameter as it has never been really useful on powernv and carrying it for the pseries platform code to iommu_free_table() seems to be quite useless as well. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alexey Kardashevskiy authored
At the moment iommu_table can be disposed by either calling iommu_table_free() directly or it_ops::free(); the only implementation of free() is in IODA2 - pnv_ioda2_table_free() - and it calls iommu_table_free() anyway. As we are going to have reference counting on tables, we need an unified way of disposing tables. This moves it_ops::free() call into iommu_free_table() and makes use of the latter. The free() callback now handles only platform-specific data. As from now on the iommu_free_table() calls it_ops->free(), we need to have it_ops initialized before calling iommu_free_table() so this moves this initialization in pnv_pci_ioda2_create_table(). This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Alexey Kardashevskiy authored
In real mode, TCE tables are invalidated using special cache-inhibited store instructions which are not available in virtual mode This defines and implements exchange_rm() callback. This does not define set_rm/clear_rm/flush_rm callbacks as there is no user for those - exchange/exchange_rm are only to be used by KVM for VFIO. The exchange_rm callback is defined for IODA1/IODA2 powernv platforms. This replaces list_for_each_entry_rcu with its lockless version as from now on pnv_pci_ioda2_tce_invalidate() can be called in the real mode too. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-