- 20 Oct, 2024 13 commits
-
-
Paolo Bonzini authored
Merge tag 'kvmarm-fixes-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.12, take #3 - Stop wasting space in the HYP idmap, as we are dangerously close to the 4kB limit, and this has already exploded in -next - Fix another race in vgic_init() - Fix a UBSAN error when faking the cache topology with MTE enabled
-
Paolo Bonzini authored
Merge tag 'kvmarm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.12, take #2 - Fix the guest view of the ID registers, making the relevant fields writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1) - Correcly expose S1PIE to guests, fixing a regression introduced in 6.12-rc1 with the S1POE support - Fix the recycling of stage-2 shadow MMUs by tracking the context (are we allowed to block or not) as well as the recycling state - Address a couple of issues with the vgic when userspace misconfigures the emulation, resulting in various splats. Headaches courtesy of our Syzkaller friends
-
Cyan Yang authored
For the external interrupt updating procedure in imsic, there was a spinlock to protect it already. But since it should not be preempted in any cases, we should turn to use raw_spinlock to prevent any preemption in case PREEMPT_RT was enabled. Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Reviewed-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Message-ID: <20240919160126.44487-1-cyan.yang@sifive.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
When looking for a "mangled", i.e. dynamic, CPUID entry, terminate the walk based on the number of array _entries_, not the size in bytes of the array. Iterating based on the total size of the array can result in false passes, e.g. if the random data beyond the array happens to match a CPUID entry's function and index. Fixes: fb18d053 ("selftest: kvm: x86: test KVM_GET_CPUID2 and guest visible CPUIDs against KVM_GET_SUPPORTED_CPUID") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-ID: <20241003234337.273364-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Some distros switched gcc to '-march=x86-64-v3' by default and while it's hard to find a CPU which doesn't support it today, many KVM selftests fail with ==== Test Assertion Failure ==== lib/x86_64/processor.c:570: Unhandled exception in guest pid=72747 tid=72747 errno=4 - Interrupted system call Unhandled exception '0x6' at guest RIP '0x4104f7' The failure is easy to reproduce elsewhere with $ make clean && CFLAGS='-march=x86-64-v3' make -j && ./x86_64/kvm_pv_test The root cause of the problem seems to be that with '-march=x86-64-v3' GCC uses AVX* instructions (VMOVQ in the example above) and without prior XSETBV() in the guest this results in #UD. It is certainly possible to add it there, e.g. the following saves the day as well: Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-ID: <20240920154422.2890096-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits 4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't enforce 32-byte alignment of nCR3. In the absolute worst case scenario, failure to ignore bits 4:0 can result in an out-of-bounds read, e.g. if the target page is at the end of a memslot, and the VMM isn't using guard pages. Per the APM: The CR3 register points to the base address of the page-directory-pointer table. The page-directory-pointer table is aligned on a 32-byte boundary, with the low 5 address bits 4:0 assumed to be 0. And the SDM's much more explicit: 4:0 Ignored Note, KVM gets this right when loading PDPTRs, it's only the nSVM flow that is broken. Fixes: e4e517b4 ("KVM: MMU: Do not unconditionally read PDPTE from guest memory") Reported-by: Kirk Swidowski <swidowski@google.com> Cc: Andy Nguyen <theflow@google.com> Cc: 3pvd <3pvd@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20241009140838.1036226-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
Reset the segment cache after segment initialization in vmx_vcpu_reset() to harden KVM against caching stale/uninitialized data. Without the recent fix to bypass the cache in kvm_arch_vcpu_put(), the following scenario is possible: - vCPU is just created, and the vCPU thread is preempted before SS.AR_BYTES is written in vmx_vcpu_reset(). - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() => vmx_get_cpl() reads and caches '0' for SS.AR_BYTES. - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't invoke vmx_segment_cache_clear() to invalidate the cache. As a result, KVM retains a stale value in the cache, which can be read, e.g. via KVM_GET_SREGS. Usually this is not a problem because the VMX segment cache is reset on each VM-Exit, but if the userspace VMM (e.g KVM selftests) reads and writes system registers just after the vCPU was created, _without_ modifying SS.AR_BYTES, userspace will write back the stale '0' value and ultimately will trigger a VM-Entry failure due to incorrect SS segment type. Invalidating the cache after writing the VMCS doesn't address the general issue of cache accesses from IRQ context being unsafe, but it does prevent KVM from clobbering the VMCS, i.e. mitigates the harm done _if_ KVM has a bug that results in an unsafe cache access. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Fixes: 2fb92db1 ("KVM: VMX: Cache vmcs segment fields") [sean: rework changelog to account for previous patch] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20241009175002.1118178-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Massage the documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL to call out that it applies to moved memslots as well as deleted memslots, to avoid KVM's "fast zap" terminology (which has no meaning for userspace), and to reword the documented targeted zap behavior to specifically say that KVM _may_ zap a subset of all SPTEs. As evidenced by the fix to zap non-leafs SPTEs with gPTEs, formally documenting KVM's exact internal behavior is risky and unnecessary. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20241009192345.1148353-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add a lockdep assertion in kvm_unmap_gfn_range() to ensure that either mmu_invalidate_in_progress is elevated, or that the range is being zapped due to memslot removal (loosely detected by slots_lock being held). Zapping SPTEs without mmu_invalidate_{in_progress,seq} protection is unsafe as KVM's page fault path snapshots state before acquiring mmu_lock, and thus can create SPTEs with stale information if vCPUs aren't forced to retry faults (due to seeing an in-progress or past MMU invalidation). Memslot removal is a special case, as the memslot is retrieved outside of mmu_invalidate_seq, i.e. doesn't use the "standard" protections, and instead relies on SRCU synchronization to ensure any in-flight page faults are fully resolved before zapping SPTEs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20241009192345.1148353-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
When performing a targeted zap on memslot removal, zap only MMU pages that shadow guest PTEs, as zapping all SPs that "match" the gfn is inexact and unnecessary. Furthermore, for_each_gfn_valid_sp() arguably shouldn't exist, because it doesn't do what most people would it expect it to do. The "round gfn for level" adjustment that is done for direct SPs (no gPTE) means that the exact gfn comparison will not get a match, even when a SP does "cover" a gfn, or was even created specifically for a gfn. For memslot deletion specifically, KVM's behavior will vary significantly based on the size and alignment of a memslot, and in weird ways. E.g. for a 4KiB memslot, KVM will zap more SPs if the slot is 1GiB aligned than if it's only 4KiB aligned. And as described below, zapping SPs in the aligned case overzaps for direct MMUs, as odds are good the upper-level SPs are serving other memslots. To iterate over all potentially-relevant gfns, KVM would need to make a pass over the hash table for each level, with the gfn used for lookup rounded for said level. And then check that the SP is of the correct level, too, e.g. to avoid over-zapping. But even then, KVM would massively overzap, as processing every level is all but guaranteed to zap SPs that serve other memslots, especially if the memslot being removed is relatively small. KVM could mitigate that issue by processing only levels that can be possible guest huge pages, i.e. are less likely to be re-used for other memslot, but while somewhat logical, that's quite arbitrary and would be a bit of a mess to implement. So, zap only SPs with gPTEs, as the resulting behavior is easy to describe, is predictable, and is explicitly minimal, i.e. KVM only zaps SPs that absolutely must be zapped. Cc: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Message-ID: <20241009192345.1148353-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Kirill A. Shutemov authored
AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not advertised in CPUID or it cannot be programmed (on TDX, due to #VE on CR0.CD clear). This results in guests using uncached mappings where it shouldn't and pmd/pud_set_huge() failures due to non-uniform memory type reported by mtrr_type_lookup(). Override MTRR state, making it WB by default as the kernel does for Hyper-V guests. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: Binbin Wu <binbin.wu@intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20241015095818.357915-1-kirill.shutemov@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Dr. David Alan Gilbert authored
The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit 1bbc60d0 ("KVM: x86/mmu: Remove MMU auditing") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Message-ID: <20241001141354.18009-3-linux@treblig.org> [Adjust Documentation/virt/kvm/locking.rst. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Dr. David Alan Gilbert authored
The last use of kvm_vcpu_gfn_to_pfn was removed by commit b1624f99 ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Message-ID: <20241001141354.18009-2-linux@treblig.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 17 Oct, 2024 4 commits
-
-
Oliver Upton authored
kvm_vgic_map_resources() prematurely marks the distributor as 'ready', potentially allowing vCPUs to enter the guest before the distributor's MMIO registration has been made visible. Plug the race by marking the distributor as ready only after MMIO registration is completed. Rely on the implied ordering of synchronize_srcu() to ensure the MMIO registration is visible before vgic_dist::ready. This also means that writers to vgic_dist::ready are now serialized by the slots_lock, which was effectively the case already as all writers held the slots_lock in addition to the config_lock. Fixes: 59112e9c ("KVM: arm64: vgic: Fix a circular locking issue") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241017001947.2707312-3-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Oliver Upton authored
KVM commits to a particular sizing of SPIs when the vgic is initialized, which is before the point a vgic becomes ready. On top of that, KVM supplies a default amount of SPIs should userspace not explicitly configure this. As such, the check for vgic_ready() in the handling of KVM_DEV_ARM_VGIC_GRP_NR_IRQS is completely wrong, and testing if nr_spis is nonzero is sufficient for preventing userspace from playing games with us. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241017001947.2707312-2-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Ilkka Koskinen authored
Fix a shift-out-of-bounds bug reported by UBSAN when running VM with MTE enabled host kernel. UBSAN: shift-out-of-bounds in arch/arm64/kvm/sys_regs.c:1988:14 shift exponent 33 is too large for 32-bit type 'int' CPU: 26 UID: 0 PID: 7629 Comm: qemu-kvm Not tainted 6.12.0-rc2 #34 Hardware name: IEI NF5280R7/Mitchell MB, BIOS 00.00. 2024-10-12 09:28:54 10/14/2024 Call trace: dump_backtrace+0xa0/0x128 show_stack+0x20/0x38 dump_stack_lvl+0x74/0x90 dump_stack+0x18/0x28 __ubsan_handle_shift_out_of_bounds+0xf8/0x1e0 reset_clidr+0x10c/0x1c8 kvm_reset_sys_regs+0x50/0x1c8 kvm_reset_vcpu+0xec/0x2b0 __kvm_vcpu_set_target+0x84/0x158 kvm_vcpu_set_target+0x138/0x168 kvm_arch_vcpu_ioctl_vcpu_init+0x40/0x2b0 kvm_arch_vcpu_ioctl+0x28c/0x4b8 kvm_vcpu_ioctl+0x4bc/0x7a8 __arm64_sys_ioctl+0xb4/0x100 invoke_syscall+0x70/0x100 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x3c/0x158 el0t_64_sync_handler+0x120/0x130 el0t_64_sync+0x194/0x198 Fixes: 7af0c253 ("KVM: arm64: Normalize cache configuration") Cc: stable@vger.kernel.org Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20241017025701.67936-1-ilkka@os.amperecomputing.comSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
Our idmap is becoming too big, to the point where it doesn't fit in a 4kB page anymore. There are some low-hanging fruits though, such as the el2_init_state horror that is expanded 3 times in the kernel. Let's at least limit ourselves to two copies, which makes the kernel link again. At some point, we'll have to have a better way of doing this. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241009204903.GA3353168@thelio-3990X
-
- 11 Oct, 2024 1 commit
-
-
Marc Zyngier authored
As there is very little ordering in the KVM API, userspace can instanciate a half-baked GIC (missing its memory map, for example) at almost any time. This means that, with the right timing, a thread running vcpu-0 can enter the kernel without a GIC configured and get a GIC created behind its back by another thread. Amusingly, it will pick up that GIC and start messing with the data structures without the GIC having been fully initialised. Similarly, a thread running vcpu-1 can enter the kernel, and try to init the GIC that was previously created. Since this GIC isn't properly configured (no memory map), it fails to correctly initialise. And that's the point where we decide to teardown the GIC, freeing all its resources. Behind vcpu-0's back. Things stop pretty abruptly, with a variety of symptoms. Clearly, this isn't good, we should be a bit more careful about this. It is obvious that this guest is not viable, as it is missing some important part of its configuration. So instead of trying to tear bits of it down, let's just mark it as *dead*. It means that any further interaction from userspace will result in -EIO. The memory will be released on the "normal" path, when userspace gives up. Cc: stable@vger.kernel.org Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241009183603.3221824-1-maz@kernel.orgSigned-off-by: Marc Zyngier <maz@kernel.org>
-
- 08 Oct, 2024 7 commits
-
-
Mark Brown authored
Prior to commit 70ed7238 ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1") we just exposed the santised view of ID_AA64MMFR3_EL1 to guests, meaning that they saw both TCRX and S1PIE if present on the host machine. That commit added VMM control over the contents of the register and exposed S1POE but removed S1PIE, meaning that the extension is no longer visible to guests. Reenable support for S1PIE with VMM control. Fixes: 70ed7238 ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241005-kvm-arm64-fix-s1pie-v1-1-5901f02de749@kernel.orgSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Oliver Upton authored
There's been a decent amount of attention around unmaps of nested MMUs, and TLBI handling is no exception to this. Add a comment clarifying why it is safe to reschedule during a TLBI unmap, even without a reference on the MMU in progress. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-5-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Oliver Upton authored
Currently, when a nested MMU is repurposed for some other MMU context, KVM unmaps everything during vcpu_load() while holding the MMU lock for write. This is quite a performance bottleneck for large nested VMs, as all vCPU scheduling will spin until the unmap completes. Start punting the MMU cleanup to a vCPU request, where it is then possible to periodically release the MMU lock and CPU in the presence of contention. Ensure that no vCPU winds up using a stale MMU by tracking the pending unmap on the S2 MMU itself and requesting an unmap on every vCPU that finds it. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-4-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Oliver Upton authored
Right now the nested code allows unmap operations on a shadow stage-2 to block unconditionally. This is wrong in a couple places, such as a non-blocking MMU notifier or on the back of a sched_in() notifier as part of shadow MMU recycling. Carry through whether or not blocking is allowed to kvm_pgtable_stage2_unmap(). This 'fixes' an issue where stage-2 MMU reclaim would precipitate a stack overflow from a pile of kvm_sched_in() callbacks, all trying to recycle a stage-2 MMU. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-3-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Oliver Upton authored
If a vCPU is scheduling out and not in WFI emulation, it is highly likely it will get scheduled again soon and reuse the MMU it had before. Dropping the MMU at vcpu_put() can have some unfortunate consequences, as the MMU could get reclaimed and used in a different context, forcing another 'cold start' on an otherwise active MMU. Avoid that altogether by keeping a reference on the MMU if the vCPU is scheduling out, ensuring that another vCPU cannot reclaim it while the current vCPU is away. Since there are more MMUs than vCPUs, this does not affect the guarantee that an unused MMU is available at any time. Furthermore, this makes the vcpu->arch.hw_mmu ~stable in preemptible code, at least for where it matters in the stage-2 abort path. Yes, the MMU can change across WFI emulation, but there isn't even a use case where this would matter. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007233028.2236133-2-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Oliver Upton authored
Alex reports that syzkaller has managed to trigger a use-after-free when tearing down a VM: BUG: KASAN: slab-use-after-free in kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 Read of size 8 at addr ffffff801c6890d0 by task syz.3.2219/10758 CPU: 3 UID: 0 PID: 10758 Comm: syz.3.2219 Not tainted 6.11.0-rc6-dirty #64 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x17c/0x1a8 arch/arm64/kernel/stacktrace.c:317 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x94/0xc0 lib/dump_stack.c:119 print_report+0x144/0x7a4 mm/kasan/report.c:377 kasan_report+0xcc/0x128 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1409 __fput+0x198/0x71c fs/file_table.c:422 ____fput+0x20/0x30 fs/file_table.c:450 task_work_run+0x1cc/0x23c kernel/task_work.c:228 do_notify_resume+0x144/0x1a0 include/linux/resume_user_mode.h:50 el0_svc+0x64/0x68 arch/arm64/kernel/entry-common.c:169 el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Upon closer inspection, it appears that we do not properly tear down the MMIO registration for a vCPU that fails creation late in the game, e.g. a vCPU w/ the same ID already exists in the VM. It is important to consider the context of commit that introduced this bug by moving the unregistration out of __kvm_vgic_vcpu_destroy(). That change correctly sought to avoid an srcu v. config_lock inversion by breaking up the vCPU teardown into two parts, one guarded by the config_lock. Fix the use-after-free while avoiding lock inversion by adding a special-cased unregistration to __kvm_vgic_vcpu_destroy(). This is safe because failed vCPUs are torn down outside of the config_lock. Cc: stable@vger.kernel.org Fixes: f6165067 ("KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007223909.2157336-1-oliver.upton@linux.devSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Marc Zyngier authored
* kvm-arm64/idregs-6.12: : . : Make some fields of ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1 : writable from userspace, so that a VMM can influence the : set of guest-visible features. : : - for ID_AA64DFR0_EL1: DoubleLock, WRPs, PMUVer and DebugVer : are writable (courtesy of Shameer Kolothum) : : - for ID_AA64PFR1_EL1: BT, SSBS, CVS2_frac are writable : (courtesy of Shaoqin Huang) : . KVM: selftests: aarch64: Add writable test for ID_AA64PFR1_EL1 KVM: arm64: Allow userspace to change ID_AA64PFR1_EL1 KVM: arm64: Use kvm_has_feat() to check if FEAT_SSBS is advertised to the guest KVM: arm64: Disable fields that KVM doesn't know how to handle in ID_AA64PFR1_EL1 KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace Signed-off-by: Marc Zyngier <maz@kernel.org>
-
- 06 Oct, 2024 3 commits
-
-
Paolo Bonzini authored
Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.12, take #1 - Fix pKVM error path on init, making sure we do not change critical system registers as we're about to fail - Make sure that the host's vector length is at capped by a value common to all CPUs - Fix kvm_has_feat*() handling of "negative" features, as the current code is pretty broken - Promote Joey to the status of official reviewer, while James steps down -- hopefully only temporarly
-
Paolo Bonzini authored
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules. In practice this has no functional change, because CONFIG_KVM_X86_COMMON is set if and only if at least one vendor-specific module is being built. However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that are used in kvm.ko. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 590b09b1 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Fixes: 6d55a942 ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko. It provides no functionality on its own and it is unnecessary unless one of the vendor-specific module is compiled. In particular, /dev/kvm is not created until one of kvm-intel.ko or kvm-amd.ko is loaded. Use CONFIG_KVM to decide if it is built-in or a module, but use the vendor-specific modules for the actual decision on whether to build it. This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD are both disabled. The cpu_emergency_register_virt_callback() function is called from kvm.ko, but it is only defined if at least one of CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided. Fixes: 590b09b1 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Oct, 2024 2 commits
-
-
Paolo Bonzini authored
As was tried in commit 4e103134 ("KVM: x86/mmu: Zap only the relevant pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs, need to be zapped. All of the accounting for a shadow page is tied to the memslot, i.e. the shadow page holds a reference to the memslot, for all intents and purposes. Deleting the memslot without removing all relevant shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled, results in NULL pointer derefs when tearing down the VM. Reintroduce from that commit the code that walks the whole memslot when there are active shadow MMU pages. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Marc Zyngier authored
Oliver reports that the kvm_has_feat() helper is not behaviing as expected for negative feature. On investigation, the main issue seems to be caused by the following construct: #define get_idreg_field(kvm, id, fld) \ (id##_##fld##_SIGNED ? \ get_idreg_field_signed(kvm, id, fld) : \ get_idreg_field_unsigned(kvm, id, fld)) where one side of the expression evaluates as something signed, and the other as something unsigned. In retrospect, this is totally braindead, as the compiler converts this into an unsigned expression. When compared to something that is 0, the test is simply elided. Epic fail. Similar issue exists in the expand_field_sign() macro. The correct way to handle this is to chose between signed and unsigned comparisons, so that both sides of the ternary expression are of the same type (bool). In order to keep the code readable (sort of), we introduce new comparison primitives taking an operator as a parameter, and rewrite the kvm_has_feat*() helpers in terms of these primitives. Fixes: c62d7a23 ("KVM: arm64: Add feature checking helpers") Reported-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Oliver Upton <oliver.upton@linux.dev> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241002204239.2051637-1-maz@kernel.orgSigned-off-by: Marc Zyngier <maz@kernel.org>
-
- 01 Oct, 2024 4 commits
-
-
Mark Brown authored
The recent addition of support for testing with the x86 specific quirk KVM_X86_QUIRK_SLOT_ZAP_ALL disabled in the generic memslot tests broke the build of the KVM selftests for all other architectures: In file included from include/kvm_util.h:8, from include/memstress.h:13, from memslot_modification_stress_test.c:21: memslot_modification_stress_test.c: In function ‘main’: memslot_modification_stress_test.c:176:38: error: ‘KVM_X86_QUIRK_SLOT_ZAP_ALL’ undeclared (first use in this function) 176 | KVM_X86_QUIRK_SLOT_ZAP_ALL); | ^~~~~~~~~~~~~~~~~~~~~~~~~~ Add __x86_64__ guard defines to avoid building the relevant code on other architectures. Fixes: 61de4c34 ("KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled") Fixes: 218f6415 ("KVM: selftests: Allow slot modification stress test with quirk disabled") Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Message-ID: <20240930-kvm-build-breakage-v1-1-866fad3cc164@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Marc Zyngier authored
It has been a while since James had any significant bandwidth to review KVM/arm64 patches. But in the meantime, Joey has stepped up and did a really good job reviewing some terrifying patch series. Having talked with the interested parties, it appears that James is unlikely to have time for KVM in the near future, and that Joey is willing to take more responsibilities. So let's appoint Joey as an official reviewer, and give James some breathing space, as well as my personal thanks. I'm sure he will be back one way or another! Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Joey Gouly <joey.gouly@arm.com> Acked-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240927104956.1223658-1-maz@kernel.orgSigned-off-by: Marc Zyngier <maz@kernel.org>
-
Mark Brown authored
When pKVM saves and restores the host floating point state on a SVE system, it programs the vector length in ZCR_EL2.LEN to be whatever the maximum VL for the PE is. But it uses a buffer allocated with kvm_host_sve_max_vl, the maximum VL shared by all PEs in the system. This means that if we run on a system where the maximum VLs are not consistent, we will overflow the buffer on PEs which support larger VLs. Since the host will not currently attempt to make use of non-shared VLs, fix this by explicitly setting the EL2 VL to be the maximum shared VL when we save and restore. This will enforce the limit on host VL usage. Should we wish to support asymmetric VLs, this code will need to be updated along with the required changes for the host: https://lore.kernel.org/r/20240730-kvm-arm64-fix-pkvm-sve-vl-v6-0-cae8a2e0bd66@kernel.org Fixes: b5b99556 ("KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM") Signed-off-by: Mark Brown <broonie@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240912-kvm-arm64-limit-guest-vl-v2-1-dd2c29cb2ac9@kernel.org [maz: added punctuation to the commit message] Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Vincent Donnefort authored
On an error, hyp_vcpu will be accessed while this memory has already been relinquished to the host and unmapped from the hypervisor. Protect the CPTR assignment with an early return. Fixes: b5b99556 ("KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20240919110500.2345927-1-vdonnefort@google.comSigned-off-by: Marc Zyngier <maz@kernel.org>
-
- 29 Sep, 2024 6 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
The cpu_emergency_register_virt_callback() function is used unconditionally by the x86 kvm code, but it is declared (and defined) conditionally: #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD) void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback); ... leading to a build error when neither KVM_INTEL nor KVM_AMD support is enabled: arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’: arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration] 12517 | cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’: arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration] 12522 | cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix the build by defining empty helper functions the same way the old cpu_emergency_disable_virtualization() function was dealt with for the same situation. Maybe we could instead have made the call sites conditional, since the callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak fallback. I'll leave that to the kvm people to argue about, this at least gets the build going for that particular config. Fixes: 590b09b1 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Kai Huang <kai.huang@intel.com> Cc: Chao Gao <chao.gao@intel.com> Cc: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailboxLinus Torvalds authored
Pull mailbox updates from Jassi Brar: - fix kconfig dependencies (mhu-v3, omap2+) - use devie name instead of genereic imx_mu_chan as interrupt name (imx) - enable sa8255p and qcs8300 ipc controllers (qcom) - Fix timeout during suspend mode (bcm2835) - convert to use use of_property_match_string (mailbox) - enable mt8188 (mediatek) - use devm_clk_get_enabled helpers (spreadtrum) - fix device-id typo (rockchip) * tag 'mailbox-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: mailbox, remoteproc: omap2+: fix compile testing dt-bindings: mailbox: qcom-ipcc: Document QCS8300 IPCC dt-bindings: mailbox: qcom-ipcc: document the support for SA8255p dt-bindings: mailbox: mtk,adsp-mbox: Add compatible for MT8188 mailbox: Use of_property_match_string() instead of open-coding mailbox: bcm2835: Fix timeout during suspend mode mailbox: sprd: Use devm_clk_get_enabled() helpers mailbox: rockchip: fix a typo in module autoloading mailbox: imx: use device name in interrupt name mailbox: ARM_MHU_V3 should depend on ARM64
-
Linus Torvalds authored
Merge tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - fix DesignWare driver ENABLE-ABORT sequence, ensuring ABORT can always be sent when needed - check for PCLK in the SynQuacer controller as an optional clock, allowing ACPI to directly provide the clock rate - KEBA driver Kconfig dependency fix - fix XIIC driver power suspend sequence * tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: xiic: Fix pm_runtime_set_suspended() with runtime pm enabled i2c: keba: I2C_KEBA should depend on KEBA_CP500 i2c: synquacer: Deal with optional PCLK correctly i2c: designware: fix controller is holding SCL low while ENABLE bit is disabled
-
git://git.infradead.org/users/hch/dma-mappingLinus Torvalds authored
Pull dma-mapping fix from Christoph Hellwig: - handle chained SGLs in the new tracing code (Christoph Hellwig) * tag 'dma-mapping-6.12-2024-09-29' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: fix DMA API tracing for chained scatterlists
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull more SCSI updates from James Bottomley: "These are mostly minor updates. There are two drivers (lpfc and mpi3mr) which missed the initial pull and a core change to retry a start/stop unit which affect suspend/resume" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits) scsi: lpfc: Update lpfc version to 14.4.0.5 scsi: lpfc: Support loopback tests with VMID enabled scsi: lpfc: Revise TRACE_EVENT log flag severities from KERN_ERR to KERN_WARNING scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instance scsi: lpfc: Fix kref imbalance on fabric ndlps from dev_loss_tmo handler scsi: lpfc: Restrict support for 32 byte CDBs to specific HBAs scsi: lpfc: Update phba link state conditional before sending CMF_SYNC_WQE scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd() scsi: mpi3mr: Update driver version to 8.12.0.0.50 scsi: mpi3mr: Improve wait logic while controller transitions to READY state scsi: mpi3mr: Update MPI Headers to revision 34 scsi: mpi3mr: Use firmware-provided timestamp update interval scsi: mpi3mr: Enhance the Enable Controller retry logic scsi: sd: Fix off-by-one error in sd_read_block_characteristics() scsi: pm8001: Do not overwrite PCI queue mapping scsi: scsi_debug: Remove a useless memset() scsi: pmcraid: Convert comma to semicolon scsi: sd: Retry START STOP UNIT commands scsi: mpi3mr: A performance fix scsi: ufs: qcom: Update MODE_MAX cfg_bw value ...
-