- 11 Sep, 2019 12 commits
-
-
Liran Alon authored
Commit cd7764fe ("KVM: x86: latch INITs while in system management mode") changed code to latch INIT while vCPU is in SMM and process latched INIT when leaving SMM. It left a subtle remark in commit message that similar treatment should also be done while vCPU is in VMX non-root-mode. However, INIT signals should actually be latched in various vCPU states: (*) For both Intel and AMD, INIT signals should be latched while vCPU is in SMM. (*) For Intel, INIT should also be latched while vCPU is in VMX operation and later processed when vCPU leaves VMX operation by executing VMXOFF. (*) For AMD, INIT should also be latched while vCPU runs with GIF=0 or in guest-mode with intercept defined on INIT signal. To fix this: 1) Add kvm_x86_ops->apic_init_signal_blocked() such that each CPU vendor can define the various CPU states in which INIT signals should be blocked and modify kvm_apic_accept_events() to use it. 2) Modify vmx_check_nested_events() to check for pending INIT signal while vCPU in guest-mode. If so, emualte vmexit on EXIT_REASON_INIT_SIGNAL. Note that nSVM should have similar behaviour but is currently left as a TODO comment to implement in the future because nSVM don't yet implement svm_check_nested_events(). Note: Currently KVM nVMX implementation don't support VMX wait-for-SIPI activity state as specified in MSR_IA32_VMX_MISC bits 6:8 exposed to guest (See nested_vmx_setup_ctls_msrs()). If and when support for this activity state will be implemented, kvm_check_nested_events() would need to avoid emulating vmexit on INIT signal in case activity-state is wait-for-SIPI. In addition, kvm_apic_accept_events() would need to be modified to avoid discarding SIPI in case VMX activity-state is wait-for-SIPI but instead delay SIPI processing to vmx_check_nested_events() that would clear pending APIC events and emulate vmexit on SIPI. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Co-developed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
According to Intel SDM section 25.2 "Other Causes of VM Exits", When INIT signal is received on a CPU that is running in VMX non-root mode it should cause an exit with exit-reason of 3. (See Intel SDM Appendix C "VMX BASIC EXIT REASONS") This patch introduce the exit-reason definition. Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Co-developed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Merge tag 'kvm-s390-next-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD * More selftests * Improved KVM_S390_MEM_OP ioctl input checking * Add kvm_valid_regs and kvm_dirty_regs invalid bit checking
-
Wanpeng Li authored
The hrtimer which is used to emulate lapic timer is stopped during vcpu reset, preemption timer should do the same. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Wanpeng Li authored
This patch optimizes the virtual IPI emulation sequence: write ICR2 write ICR2 write ICR read ICR2 read ICR ==> send virtual IPI read ICR2 write ICR send virtual IPI It can reduce kvm-unit-tests/vmexit.flat IPI testing latency(from sender send IPI to sender receive the ACK) from 3319 cycles to 3203 cycles on SKylake server. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jiří Paleček authored
On AMD processors, in PAE 32bit mode, nested KVM instances don't work. The L0 host get a kernel OOPS, which is related to arch.mmu->pae_root being NULL. The reason for this is that when setting up nested KVM instance, arch.mmu is set to &arch.guest_mmu (while normally, it would be &arch.root_mmu). However, the initialization and allocation of pae_root only creates it in root_mmu. KVM code (ie. in mmu_alloc_shadow_roots) then accesses arch.mmu->pae_root, which is the unallocated arch.guest_mmu->pae_root. This fix just allocates (and frees) pae_root in both guest_mmu and root_mmu (and also lm_root if it was allocated). The allocation is subject to previous restrictions ie. it won't allocate anything on 64-bit and AFAIK not on Intel. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=203923 Fixes: 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu") Signed-off-by: Jiri Palecek <jpalecek@web.de> Tested-by: Jiri Palecek <jpalecek@web.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jan Dakinevich authored
x86_emulate_instruction() takes into account ctxt->have_exception flag during instruction decoding, but in practice this flag is never set in x86_decode_insn(). Fixes: 6ea6e843 ("KVM: x86: inject exceptions produced by x86_decode_insn") Cc: stable@vger.kernel.org Cc: Denis Lunev <den@virtuozzo.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jan Dakinevich authored
inject_emulated_exception() returns true if and only if nested page fault happens. However, page fault can come from guest page tables walk, either nested or not nested. In both cases we should stop an attempt to read under RIP and give guest to step over its own page fault handler. This is also visible when an emulated instruction causes a #GP fault and the VMware backdoor is enabled. To handle the VMware backdoor, KVM intercepts #GP faults; with only the next patch applied, x86_emulate_instruction() injects a #GP but returns EMULATE_FAIL instead of EMULATE_DONE. EMULATE_FAIL causes handle_exception_nmi() (or gp_interception() for SVM) to re-inject the original #GP because it thinks emulation failed due to a non-VMware opcode. This patch prevents the issue as x86_emulate_instruction() will return EMULATE_DONE after injecting the #GP. Fixes: 6ea6e843 ("KVM: x86: inject exceptions produced by x86_decode_insn") Cc: stable@vger.kernel.org Cc: Denis Lunev <den@virtuozzo.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use the recently added tracepoint for logging nested VM-Enter failures instead of spamming the kernel log when hardware detects a consistency check failure. Take the opportunity to print the name of the error code instead of dumping the raw hex number, but limit the symbol table to error codes that can reasonably be encountered by KVM. Add an equivalent tracepoint in nested_vmx_check_vmentry_hw(), e.g. so that tracing of "invalid control field" errors isn't suppressed when nested early checks are enabled. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Debugging a failed VM-Enter is often like searching for a needle in a haystack, e.g. there are over 80 consistency checks that funnel into the "invalid control field" error code. One way to expedite debug is to run the buggy code as an L1 guest under KVM (and pray that the failing check is detected by KVM). However, extracting useful debug information out of L0 KVM requires attaching a debugger to KVM and/or modifying the source, e.g. to log which check is failing. Make life a little less painful for VMM developers and add a tracepoint for failed VM-Enter consistency checks. Ideally the tracepoint would capture both what check failed and precisely why it failed, but logging why a checked failed is difficult to do in a generic tracepoint without resorting to invasive techniques, e.g. generating a custom string on failure. That being said, for the vast majority of VM-Enter failures the most difficult step is figuring out exactly what to look at, e.g. figuring out which bit was incorrectly set in a control field is usually not too painful once the guilty field as been identified. To reach a happy medium between precision and ease of use, simply log the code that detected a failed check, using a macro to execute the check and log the trace event on failure. This approach enables tracing arbitrary code, e.g. it's not limited to function calls or specific formats of checks, and the changes to the existing code are minimally invasive. A macro with a two-character name is desirable as usage of the macro doesn't result in overly long lines or confusing alignment, while still retaining some amount of readability. I.e. a one-character name is a little too terse, and a three-character name results in the contents being passed to the macro aligning with an indented line when the macro is used an in if-statement, e.g.: if (VCC(nested_vmx_check_long_line_one(...) && nested_vmx_check_long_line_two(...))) return -EINVAL; And that is the story of how the CC(), a.k.a. Consistency Check, macro got its name. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Dan Carpenter authored
We refactored this code a bit and accidentally deleted the "-" character from "-EINVAL". The kvm_vcpu_map() function never returns positive EINVAL. Fixes: c8e16b78 ("x86: KVM: svm: eliminate hardcoded RIP advancement from vmrun_interception()") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
Receiving an unexpected exit reason from hardware should be considered as a severe bug in KVM. Therefore, instead of just injecting #UD to guest and ignore it, exit to userspace on internal error so that it could handle it properly (probably by terminating guest). In addition, prefer to use vcpu_unimpl() instead of WARN_ONCE() as handling unexpected exit reason should be a rare unexpected event (that was expected to never happen) and we prefer to print a message on it every time it occurs to guest. Furthermore, dump VMCS/VMCB to dmesg to assist diagnosing such cases. Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 10 Sep, 2019 11 commits
-
-
Sean Christopherson authored
Move RDMSR and WRMSR emulation into common x86 code to consolidate nearly identical SVM and VMX code. Note, consolidating RDMSR introduces an extra indirect call, i.e. retpoline, due to reaching {svm,vmx}_get_msr() via kvm_x86_ops, but a guest kernel likely has bigger problems if increasing the latency of RDMSR VM-Exits by ~70 cycles has a measurable impact on overall VM performance. E.g. the only recurring RDMSR VM-Exits (after booting) on my system running Linux 5.2 in the guest are for MSR_IA32_TSC_ADJUST via arch_cpu_idle_enter(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Refactor the top-level MSR accessors to take/return the index and value directly instead of requiring the caller to dump them into a msr_data struct. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Xiaoyao Li authored
Userspace can use ioctl KVM_SET_MSRS to update a set of MSRs of guest. This ioctl set specified MSRs one by one. If it fails to set an MSR, e.g., due to setting reserved bits, the MSR is not supported/emulated by KVM, etc..., it stops processing the MSR list and returns the number of MSRs have been set successfully. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
The PLE window tracepoint triggers even if the window is not changed, and the wording can be a bit confusing too. One example line: kvm_ple_window: vcpu 0: ple_window 4096 (shrink 4096) It easily let people think of "the window now is 4096 which is shrinked", but the truth is the value actually didn't change (4096). Let's only dump this message if the value really changed, and we make the message even simpler like: kvm_ple_window: vcpu 4 old 4096 new 8192 (growed) Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
The VMX ple_window is 32 bits wide, so logically it can overflow with an int. The module parameter is declared as unsigned int which is good, however the dynamic variable is not. Switching all the ple_window references to use unsigned int. The tracepoint changes will also affect SVM, but SVM is using an even smaller width (16 bits) so it's always fine. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
It's done by TP_printk() already. Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peter Xu authored
Tracing the ID helps to pair vmenters and vmexits for guests with multiple vCPUs. Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarmPaolo Bonzini authored
KVM/arm updates for 5.4 - New ITS translation cache - Allow up to 512 CPUs to be supported with GICv3 (for real this time) - Now call kvm_arch_vcpu_blocking early in the blocking sequence - Tidy-up device mappings in S2 when DIC is available - Clean icache invalidation on VMID rollover - General cleanup
-
Paolo Bonzini authored
Merge tag 'kvm-ppc-next-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM update for 5.4 - Some prep for extending the uses of the rmap array - Various minor fixes - Commits from the powerpc topic/ppc-kvm branch, which fix a problem with interrupts arriving after free_irq, causing host hangs and crashes.
-
Sean Christopherson authored
Manually generate the PDPTR reserved bit mask when explicitly loading PDPTRs. The reserved bits that are being tracked by the MMU reflect the current paging mode, which is unlikely to be PAE paging in the vast majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation, __set_sregs(), etc... This can cause KVM to incorrectly signal a bad PDPTR, or more likely, miss a reserved bit check and subsequently fail a VM-Enter due to a bad VMCS.GUEST_PDPTR. Add a one off helper to generate the reserved bits instead of sharing code across the MMU's calculations and the PDPTR emulation. The PDPTR reserved bits are basically set in stone, and pushing a helper into the MMU's calculation adds unnecessary complexity without improving readability. Oppurtunistically fix/update the comment for load_pdptrs(). Note, the buggy commit also introduced a deliberate functional change, "Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was effectively (and correctly) reverted by commit cd9ae5fe ("KVM: x86: Fix page-tables reserved bits"). A bit of SDM archaeology shows that the SDM from late 2008 had a bug (likely a copy+paste error) where it listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved for 2mb entries. I.e. the SDM contradicted itself, and bits 6:5 are and always have been reserved. Fixes: 20c466b5 ("KVM: Use rsvd_bits_mask in load_pdptrs()") Cc: stable@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Reported-by: Doug Reiland <doug.reiland@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Alexander Graf authored
We can easily route hardware interrupts directly into VM context when they target the "Fixed" or "LowPriority" delivery modes. However, on modes such as "SMI" or "Init", we need to go via KVM code to actually put the vCPU into a different mode of operation, so we can not post the interrupt Add code in the VMX and SVM PI logic to explicitly refuse to establish posted mappings for advanced IRQ deliver modes. This reflects the logic in __apic_accept_irq() which also only ever passes Fixed and LowPriority interrupts as posted interrupts into the guest. This fixes a bug I have with code which configures real hardware to inject virtual SMIs into my guest. Signed-off-by: Alexander Graf <graf@amazon.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 09 Sep, 2019 1 commit
-
-
Marc Zyngier authored
While parts of the VGIC support a large number of vcpus (we bravely allow up to 512), other parts are more limited. One of these limits is visible in the KVM_IRQ_LINE ioctl, which only allows 256 vcpus to be signalled when using the CPU or PPI types. Unfortunately, we've cornered ourselves badly by allocating all the bits in the irq field. Since the irq_type subfield (8 bit wide) is currently only taking the values 0, 1 and 2 (and we have been careful not to allow anything else), let's reduce this field to only 4 bits, and allocate the remaining 4 bits to a vcpu2_index, which acts as a multiplier: vcpu_id = 256 * vcpu2_index + vcpu_index With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2) allowing this to be discovered, it becomes possible to inject PPIs to up to 4096 vcpus. But please just don't. Whilst we're there, add a clarification about the use of KVM_IRQ_LINE on arm, which is not completely conditionned by KVM_CAP_IRQCHIP. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
-
- 04 Sep, 2019 2 commits
-
-
Thomas Huth authored
Now that we disallow invalid bits in kvm_valid_regs and kvm_dirty_regs on s390x, too, we should also check this condition in the selftests. The code has been taken from the x86-version of the sync_regs_test. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/lkml/20190904085200.29021-3-thuth@redhat.com/Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
-
Thomas Huth authored
If unknown bits are set in kvm_valid_regs or kvm_dirty_regs, this clearly indicates that something went wrong in the KVM userspace application. The x86 variant of KVM already contains a check for bad bits, so let's do the same on s390x now, too. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/lkml/20190904085200.29021-2-thuth@redhat.com/Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
-
- 29 Aug, 2019 3 commits
-
-
Thomas Huth authored
Check that we can write and read the guest memory with this s390x ioctl, and that some error cases are handled correctly. Signed-off-by: Thomas Huth <thuth@redhat.com> Link: https://lkml.kernel.org/r/20190829130732.580-1-thuth@redhat.comSigned-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Cornelia Huck authored
Explicitly specify the valid ranges for size and ar, and reword buf requirements a bit. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lkml.kernel.org/r/20190829124746.28665-1-cohuck@redhat.comSigned-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Thomas Huth authored
If the KVM_S390_MEM_OP ioctl is called with an access register >= 16, then there is certainly a bug in the calling userspace application. We check for wrong access registers, but only if the vCPU was already in the access register mode before (i.e. the SIE block has recorded it). The check is also buried somewhere deep in the calling chain (in the function ar_translation()), so this is somewhat hard to find. It's better to always report an error to the userspace in case this field is set wrong, and it's safer in the KVM code if we block wrong values here early instead of relying on a check somewhere deep down the calling chain, so let's add another check to kvm_s390_guest_mem_op() directly. We also should check that the "size" is non-zero here (thanks to Janosch Frank for the hint!). If we do not check the size, we could call vmalloc() with this 0 value, and this will cause a kernel warning. Signed-off-by: Thomas Huth <thuth@redhat.com> Link: https://lkml.kernel.org/r/20190829122517.31042-1-thuth@redhat.comReviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 27 Aug, 2019 4 commits
-
-
James Morse authored
Since commit 2f6ea23f ("arm64: KVM: Avoid marking pages as XN in Stage-2 if CTR_EL0.DIC is set"), KVM has stopped marking normal memory as execute-never at stage2 when the system supports D->I Coherency at the PoU. This avoids KVM taking a trap when the page is first executed, in order to clean it to PoU. The patch that added this change also wrapped PAGE_S2_DEVICE mappings up in this too. The upshot is, if your CPU caches support DIC ... you can execute devices. Revert the PAGE_S2_DEVICE change so PTE_S2_XN is always used directly. Fixes: 2f6ea23f ("arm64: KVM: Avoid marking pages as XN in Stage-2 if CTR_EL0.DIC is set") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Paul Mackerras authored
On POWER9, when userspace reads the value of the DPDES register on a vCPU, it is possible for 0 to be returned although there is a doorbell interrupt pending for the vCPU. This can lead to a doorbell interrupt being lost across migration. If the guest kernel uses doorbell interrupts for IPIs, then it could malfunction because of the lost interrupt. This happens because a newly-generated doorbell interrupt is signalled by setting vcpu->arch.doorbell_request to 1; the DPDES value in vcpu->arch.vcore->dpdes is not updated, because it can only be updated when holding the vcpu mutex, in order to avoid races. To fix this, we OR in vcpu->arch.doorbell_request when reading the DPDES value. Cc: stable@vger.kernel.org # v4.13+ Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
-
Paul Mackerras authored
When we are running multiple vcores on the same physical core, they could be from different VMs and so it is possible that one of the VMs could have its arch.mmu_ready flag cleared (for example by a concurrent HPT resize) when we go to run it on a physical core. We currently check the arch.mmu_ready flag for the primary vcore but not the flags for the other vcores that will be run alongside it. This adds that check, and also a check when we select the secondary vcores from the preempted vcores list. Cc: stable@vger.kernel.org # v4.14+ Fixes: 38c53af8 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Paul Mackerras authored
There are some POWER9 machines where the OPAL firmware does not support the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls. The impact of this is that a guest using XIVE natively will not be able to be migrated successfully. On the source side, the get_attr operation on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute will fail; on the destination side, the set_attr operation for the same attribute will fail. This adds tests for the existence of the OPAL get/set queue state functions, and if they are not supported, the XIVE-native KVM device is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false. Userspace can then either provide a software emulation of XIVE, or else tell the guest that it does not have a XIVE controller available to it. Cc: stable@vger.kernel.org # v5.2+ Fixes: 3fab2d10 ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode") Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
- 25 Aug, 2019 2 commits
-
-
Eric Auger authored
At the moment we use 2 IO devices per GICv3 redistributor: one one for the RD_base frame and one for the SGI_base frame. Instead we can use a single IO device per redistributor (the 2 frames are contiguous). This saves slots on the KVM_MMIO_BUS which is currently limited to NR_IOBUS_DEVS (1000). This change allows to instantiate up to 512 redistributors and may speed the guest boot with a large number of VCPUs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
Detected by Coccinelle (and Will Deacon) using scripts/coccinelle/misc/semicolon.cocci. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
-
- 23 Aug, 2019 3 commits
-
-
Suraj Jitindar Singh authored
The rmap array in the guest memslot is an array of size number of guest pages, allocated at memslot creation time. Each rmap entry in this array is used to store information about the guest page to which it corresponds. For example for a hpt guest it is used to store a lock bit, rc bits, a present bit and the index of a hpt entry in the guest hpt which maps this page. For a radix guest which is running nested guests it is used to store a pointer to a linked list of nested rmap entries which store the nested guest physical address which maps this guest address and for which there is a pte in the shadow page table. As there are currently two uses for the rmap array, and the potential for this to expand to more in the future, define a type field (being the top 8 bits of the rmap entry) to be used to define the type of the rmap entry which is currently present and define two values for this field for the two current uses of the rmap array. Since the nested case uses the rmap entry to store a pointer, define this type as having the two high bits set as is expected for a pointer. Define the hpt entry type as having bit 56 set (bit 7 IBM bit ordering). Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Paul Menzel authored
Fix the error below triggered by `-Wimplicit-fallthrough`, by tagging it as an expected fall-through. arch/powerpc/kvm/book3s_32_mmu.c: In function ‘kvmppc_mmu_book3s_32_xlate_pte’: arch/powerpc/kvm/book3s_32_mmu.c:241:21: error: this statement may fall through [-Werror=implicit-fallthrough=] pte->may_write = true; ~~~~~~~~~~~~~~~^~~~~~ arch/powerpc/kvm/book3s_32_mmu.c:242:5: note: here case 3: ^~~~ Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Paul Mackerras authored
This merges in fixes for the XIVE interrupt controller which touch both generic powerpc and PPC KVM code. To avoid merge conflicts, these commits will go upstream via the powerpc tree as well as the KVM tree. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
- 22 Aug, 2019 2 commits
-
-
Sean Christopherson authored
Fix an incorrect/stale comment regarding the vmx_vcpu pointer, as guest registers are now loaded using a direct pointer to the start of the register array. Opportunistically add a comment to document why the vmx_vcpu pointer is needed, its consumption via 'call vmx_update_host_rsp' is rather subtle. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
KVM implementations that wrap struct kvm_vcpu with a vendor specific struct, e.g. struct vcpu_vmx, must place the vcpu member at offset 0, otherwise the usercopy region intended to encompass struct kvm_vcpu_arch will instead overlap random chunks of the vendor specific struct. E.g. padding a large number of bytes before struct kvm_vcpu triggers a usercopy warn when running with CONFIG_HARDENED_USERCOPY=y. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-