- 15 Jul, 2021 18 commits
-
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: - Allow again loading KVM on 32-bit non-PAE builds - Fixes for host SMIs on AMD - Fixes for guest SMIs on AMD - Fixes for selftests on s390 and ARM - Fix memory leak - Enforce no-instrumentation area on vmentry when hardware breakpoints are in use. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) KVM: selftests: smm_test: Test SMM enter from L2 KVM: nSVM: Restore nested control upon leaving SMM KVM: nSVM: Fix L1 state corruption upon return from SMM KVM: nSVM: Introduce svm_copy_vmrun_state() KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails KVM: SVM: add module param to control the #SMI interception KVM: SVM: remove INIT intercept handler KVM: SVM: #SMI interception must not skip the instruction KVM: VMX: Remove vmx_msr_index from vmx.h KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run() KVM: selftests: Address extra memslot parameters in vm_vaddr_alloc kvm: debugfs: fix memory leak in kvm_create_vm_debugfs KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds authored
Pull iommu fixes from Joerg Roedel: - Revert a patch which caused boot failures with QCOM IOMMU - Two fixes for Intel VT-d context table handling - Physical address decoding fix for Rockchip IOMMU - Add a reviewer for AMD IOMMU * tag 'iommu-fixes-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: MAINTAINERS: Add Suravee Suthikulpanit as Reviewer for AMD IOMMU (AMD-Vi) iommu/rockchip: Fix physical address decoding iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries iommu/vt-d: Global devTLB flush when present context entry changed iommu/qcom: Revert "iommu/arm: Cleanup resources in case of probe error path"
-
Vitaly Kuznetsov authored
Two additional tests are added: - SMM triggered from L2 does not currupt L1 host state. - Save/restore during SMM triggered from L2 does not corrupt guest/host state. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-7-vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
If the VM was migrated while in SMM, no nested state was saved/restored, and therefore svm_leave_smm has to load both save and control area of the vmcb12. Save area is already loaded from HSAVE area, so now load the control area as well from the vmcb12. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
VMCB split commit 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest") broke return from SMM when we entered there from guest (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem manifests itself like this: kvm_exit: reason EXIT_RSM rip 0x7ffbb280 info 0 0 kvm_emulate_insn: 0:7ffbb280: 0f aa kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x7ffb3000 kvm_nested_vmrun: rip: 0x000000007ffbb280 vmcb: 0x0000000008224000 nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000 npt: on kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002 intercepts: fd44bfeb 0000217f 00000000 kvm_entry: vcpu 0, rip 0xffffffffffbbe119 kvm_exit: reason EXIT_NPF rip 0xffffffffffbbe119 info 200000006 1ab000 kvm_nested_vmexit: vcpu 0 reason npf rip 0xffffffffffbbe119 info1 0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000 error_code 0x00000000 kvm_page_fault: address 1ab000 error_code 6 kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000 int_info 0 int_info_err 0 kvm_entry: vcpu 0, rip 0x7ffbb280 kvm_exit: reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0 kvm_emulate_insn: 0:7ffbb280: 0f aa kvm_inj_exception: #GP (0x0) Note: return to L2 succeeded but upon first exit to L1 its RIP points to 'RSM' instruction but we're not in SMM. The problem appears to be that VMCB01 gets irreversibly destroyed during SMM execution. Previously, we used to have 'hsave' VMCB where regular (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just switch to VMCB01 from VMCB02. Pre-split (working) flow looked like: - SMM is triggered during L2's execution - L2's state is pushed to SMRAM - nested_svm_vmexit() restores L1's state from 'hsave' - SMM -> RSM - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have pre-SMM (and pre L2 VMRUN) L1's state there - L2's state is restored from SMRAM - upon first exit L1's state is restored from L1. This was always broken with regards to svm_get_nested_state()/ svm_set_nested_state(): 'hsave' was never a part of what's being save and restored so migration happening during SMM triggered from L2 would never restore L1's state correctly. Post-split flow (broken) looks like: - SMM is triggered during L2's execution - L2's state is pushed to SMRAM - nested_svm_vmexit() switches to VMCB01 from VMCB02 - SMM -> RSM - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01 is already lost. - L2's state is restored from SMRAM - upon first exit L1's state is restored from VMCB01 but it is corrupted (reflects the state during 'RSM' execution). VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest and host state so when we switch back to VMCS02 L1's state is intact there. To resolve the issue we need to save L1's state somewhere. We could've created a third VMCB for SMM but that would require us to modify saved state format. L1's architectural HSAVE area (pointed by MSR_VM_HSAVE_PA) seems appropriate: L0 is free to save any (or none) of L1's state there. Currently, KVM does 'none'. Note, for nested state migration to succeed, both source and destination hypervisors must have the fix. We, however, don't need to create a new flag indicating the fact that HSAVE area is now populated as migration during SMM triggered from L2 was always broken. Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Separate the code setting non-VMLOAD-VMSAVE state from svm_set_nested_state() into its own function. This is going to be re-used from svm_enter_smm()/svm_leave_smm(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
APM states that "The address written to the VM_HSAVE_PA MSR, which holds the address of the page used to save the host state on a VMRUN, must point to a hypervisor-owned page. If this check fails, the WRMSR will fail with a #GP(0) exception. Note that a value of 0 is not considered valid for the VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will fail with a #GP(0) exception." svm_set_msr() already checks that the supplied address is valid, so only check for '0' is missing. Add it to nested_svm_vmrun(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-3-vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when the supplied address is not page-aligned or is outside of "maximum supported physical address for this implementation". page_address_valid() check seems suitable. Also, forcefully page-align the address when it's written from VMM. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-2-vkuznets@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> [Add comment about behavior for host-provided values. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use IS_ERR() instead of checking for a NULL pointer when querying for sev_pin_memory() failures. sev_pin_memory() always returns an error code cast to a pointer, or a valid pointer; it never returns NULL. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Steve Rutherford <srutherford@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Fixes: 15fb7de1 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210506175826.2166383-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Return -EFAULT if copy_to_user() fails; if accessing user memory faults, copy_to_user() returns the number of bytes remaining, not an error code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Steve Rutherford <srutherford@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210506175826.2166383-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
In theory there are no side effects of not intercepting #SMI, because then #SMI becomes transparent to the OS and the KVM. Plus an observation on recent Zen2 CPUs reveals that these CPUs ignore #SMI interception and never deliver #SMI VMexits. This is also useful to test nested KVM to see that L1 handles #SMIs correctly in case when L1 doesn't intercept #SMI. Finally the default remains the same, the SMI are intercepted by default thus this patch doesn't have any effect unless non default module param value is used. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
Kernel never sends real INIT even to CPUs, other than on boot. Thus INIT interception is an error which should be caught by a check for an unknown VMexit reason. On top of that, the current INIT VM exit handler skips the current instruction which is wrong. That was added in commit 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code"). Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-3-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
Commit 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code"), unfortunately made a mistake of treating nop_on_interception and nop_interception in the same way. Former does truly nothing while the latter skips the instruction. SMI VM exit handler should do nothing. (SMI itself is handled by the host when we do STGI) Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Yu Zhang authored
vmx_msr_index was used to record the list of MSRs which can be lazily restored when kvm returns to userspace. It is now reimplemented as kvm_uret_msrs_list, a common x86 list which is only used inside x86.c. So just remove the obsolete declaration in vmx.h. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210707235702.31595-1-yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Lai Jiangshan authored
When the host is using debug registers but the guest is not using them nor is the guest in guest-debug state, the kvm code does not reset the host debug registers before kvm_x86->run(). Rather, it relies on the hardware vmentry instruction to automatically reset the dr7 registers which ensures that the host breakpoints do not affect the guest. This however violates the non-instrumentable nature around VM entry and exit; for example, when a host breakpoint is set on vcpu->arch.cr2, Another issue is consistency. When the guest debug registers are active, the host breakpoints are reset before kvm_x86->run(). But when the guest debug registers are inactive, the host breakpoints are delayed to be disabled. The host tracing tools may see different results depending on what the guest is doing. To fix the problems, we clear %db7 unconditionally before kvm_x86->run() if the host has set any breakpoints, no matter if the guest is using them or not. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20210628172632.81029-1-jiangshanlai@gmail.com> Cc: stable@vger.kernel.org [Only clear %db7 instead of reloading all debug registers. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Ricardo Koller authored
Commit a75a895e ("KVM: selftests: Unconditionally use memslot 0 for vaddr allocations") removed the memslot parameters from vm_vaddr_alloc. It addressed all callers except one under lib/aarch64/, due to a race with commit e3db7579 ("KVM: selftests: Add exception handling support for aarch64") Fix the vm_vaddr_alloc call in lib/aarch64/processor.c. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Ricardo Koller <ricarkol@google.com> Message-Id: <20210702201042.4036162-1-ricarkol@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Pavel Skripkin authored
In commit bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors") loop for filling debugfs_stat_data was copy-pasted 2 times, but in the second loop pointers are saved over pointers allocated in the first loop. All this causes is a memory leak, fix it. Fixes: bc9e9e67 ("KVM: debugfs: Reuse binary stats descriptors") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210701195500.27097-1-paskripkin@gmail.com> Reviewed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Suravee Suthikulpanit authored
To help review changes related to AMD IOMMU. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/1626296542-30454-1-git-send-email-suravee.suthikulpanit@amd.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
- 14 Jul, 2021 16 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski. "Including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: - do not renew entry stuck in tcp SYN_SENT state - do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison" * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits) net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() sfc: add logs explaining XDP_TX/REDIRECT is not available sfc: ensure correct number of XDP queues sfc: fix lack of XDP TX queues - error XDP TX failed (-22) net: fddi: fix UAF in fza_probe net: dsa: sja1105: fix address learning getting disabled on the CPU port net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload net: Use nlmsg_unicast() instead of netlink_unicast() octeontx2-pf: Fix uninitialized boolean variable pps ipv6: allocate enough headroom in ip6_finish_output2() net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific net: bridge: multicast: fix MRD advertisement router port marking race net: bridge: multicast: fix PIM hello router port marking race net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340 dsa: fix for_each_child.cocci warnings virtio_net: check virtqueue_add_sgs() return value mptcp: properly account bulk freed memory selftests: mptcp: fix case multiple subflows limited by server mptcp: avoid processing packet if a subflow reset mptcp: fix syncookie process if mptcp can not_accept new subflow ...
-
Christian Brauner authored
Add a simple helper that filesystems can use in their parameter parser to parse the "source" parameter. A few places open-coded this function and that already caused a bug in the cgroup v1 parser that we fixed. Let's make it harder to get this wrong by introducing a helper which performs all necessary checks. Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12 Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christian Brauner authored
The following sequence can be used to trigger a UAF: int fscontext_fd = fsopen("cgroup"); int fd_null = open("/dev/null, O_RDONLY); int fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", fd_null); close_range(3, ~0U, 0); The cgroup v1 specific fs parser expects a string for the "source" parameter. However, it is perfectly legitimate to e.g. specify a file descriptor for the "source" parameter. The fs parser doesn't know what a filesystem allows there. So it's a bug to assume that "source" is always of type fs_value_is_string when it can reasonably also be fs_value_is_file. This assumption in the cgroup code causes a UAF because struct fs_parameter uses a union for the actual value. Access to that union is guarded by the param->type member. Since the cgroup paramter parser didn't check param->type but unconditionally moved param->string into fc->source a close on the fscontext_fd would trigger a UAF during put_fs_context() which frees fc->source thereby freeing the file stashed in param->file causing a UAF during a close of the fd_null. Fix this by verifying that param->type is actually a string and report an error if not. In follow up patches I'll add a new generic helper that can be used here and by other filesystems instead of this error-prone copy-pasta fix. But fixing it in here first makes backporting a it to stable a lot easier. Fixes: 8d2451f4 ("cgroup1: switch to option-by-option parsing") Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@kernel.org> Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Like Xu authored
The AMD platform does not support the functions Ah CPUID leaf. The returned results for this entry should all remain zero just like the native does: AMD host: 0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000 (uncanny) AMD guest: 0x0000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00008000 Fixes: cadbaa03 ("perf/x86/intel: Make anythread filter support conditional") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20210628074354.33848-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Kefeng Wang authored
BUG: KASAN: use-after-free in kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183 Read of size 8 at addr ffff0000c03a2500 by task syz-executor083/4269 CPU: 5 PID: 4269 Comm: syz-executor083 Not tainted 5.10.0 #7 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183 kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 4269: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 kmem_cache_alloc_trace include/linux/slab.h:450 [inline] kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:664 [inline] kvm_vm_ioctl_register_coalesced_mmio+0x78/0x1cc arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:146 kvm_vm_ioctl+0x7e8/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3746 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 4269: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0x104/0x38c mm/slub.c:4124 coalesced_mmio_destructor+0x94/0xa4 arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:102 kvm_iodevice_destructor include/kvm/iodev.h:61 [inline] kvm_io_bus_unregister_dev+0x248/0x280 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:4374 kvm_vm_ioctl_unregister_coalesced_mmio+0x158/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:186 kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 If kvm_io_bus_unregister_dev() return -ENOMEM, we already call kvm_iodevice_destructor() inside this function to delete 'struct kvm_coalesced_mmio_dev *dev' from list and free the dev, but kvm_iodevice_destructor() is called again, it will lead the above issue. Let's check the the return value of kvm_io_bus_unregister_dev(), only call kvm_iodevice_destructor() if the return value is 0. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Message-Id: <20210626070304.143456-1-wangkefeng.wang@huawei.com> Cc: stable@vger.kernel.org Fixes: 5d3c4c79 ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed", 2021-04-20) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for non-SEV guests, and for SEV guests the C-bit is dropped before the GPA hits the NPT in hardware. Clearing the bit for non-SEV guests causes KVM to mishandle #NPFs with that collide with the host's C-bit. Although the APM doesn't explicitly state that the C-bit is not reserved for non-SEV, Tom Lendacky confirmed that the following snippet about the effective reduction due to the C-bit does indeed apply only to SEV guests. Note that because guest physical addresses are always translated through the nested page tables, the size of the guest physical address space is not impacted by any physical address space reduction indicated in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however, the guest physical address space is effectively reduced by 1 bit. And for SEV guests, the APM clearly states that the bit is dropped before walking the nested page tables. If the C-bit is an address bit, this bit is masked from the guest physical address when it is translated through the nested page tables. Consequently, the hypervisor does not need to be aware of which pages the guest has chosen to mark private. Note, the bogus C-bit clearing was removed from legacy #PF handler in commit 6d1b867d ("KVM: SVM: Don't strip the C-bit from CR2 on #PF interception"). Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address") Cc: Peter Gonda <pgonda@google.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210625020354.431829-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Ignore "dynamic" host adjustments to the physical address mask when generating the masks for guest PTEs, i.e. the guest PA masks. The host physical address space and guest physical address space are two different beasts, e.g. even though SEV's C-bit is the same bit location for both host and guest, disabling SME in the host (which clears shadow_me_mask) does not affect the guest PTE->GPA "translation". For non-SEV guests, not dropping bits is the correct behavior. Assuming KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits that are lost as collateral damage from memory encryption are treated as reserved bits, i.e. KVM will never get to the point where it attempts to generate a gfn using the affected bits. And if userspace wants to create a bogus vCPU, then userspace gets to deal with the fallout of hardware doing odd things with bad GPAs. For SEV guests, not dropping the C-bit is technically wrong, but it's a moot point because KVM can't read SEV guest's page tables in any case since they're always encrypted. Not to mention that the current KVM code is also broken since sme_me_mask does not have to be non-zero for SEV to be supported by KVM. The proper fix would be to teach all of KVM to correctly handle guest private memory, but that's a task for the future. Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210623230552.4027702-5-seanjc@google.com> [Use a new header instead of adding header guards to paging_tmpl.h. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Use boot_cpu_data.x86_phys_bits instead of the raw CPUID information to enumerate the MAXPHYADDR for KVM guests when TDP is disabled (the guest version is only relevant to NPT/TDP). When using shadow paging, any reductions to the host's MAXPHYADDR apply to KVM and its guests as well, i.e. using the raw CPUID info will cause KVM to misreport the number of PA bits available to the guest. Unconditionally zero out the "Physical Address bit reduction" entry. For !TDP, the adjustment is already done, and for TDP enumerating the host's reduction is wrong as the reduction does not apply to GPAs. Fixes: 9af9b940 ("x86/cpu/AMD: Handle SME reduction in physical address size") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210623230552.4027702-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Ignore the guest MAXPHYADDR reported by CPUID.0x8000_0008 if TDP, i.e. NPT, is disabled, and instead use the host's MAXPHYADDR. Per AMD'S APM: Maximum guest physical address size in bits. This number applies only to guests using nested paging. When this field is zero, refer to the PhysAddrSize field for the maximum guest physical address size. Fixes: 24c82e57 ("KVM: Sanitize cpuid") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210623230552.4027702-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Let KVM load if EFER.NX=0 even if NX is supported, the analysis and testing (or lack thereof) for the non-PAE host case was garbage. If the kernel won't be using PAE paging, .Ldefault_entry in head_32.S skips over the entire EFER sequence. Hopefully that can be changed in the future to allow KVM to require EFER.NX, but the motivation behind KVM's requirement isn't yet merged. Reverting and revisiting the mess at a later date is by far the safest approach. This reverts commit 8bbed95d. Fixes: 8bbed95d ("KVM: x86: WARN and reject loading KVM if NX is supported but not enabled") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210625001853.318148-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Marc Zyngier authored
Commit b78f4a59 ("KVM: selftests: Rename vm_handle_exception") raced with a couple of new x86 tests, missing two vm_handle_exception to vm_install_exception_handler conversions. Help the two broken tests to catch up with the new world. Cc: Andrew Jones <drjones@redhat.com> CC: Ricardo Koller <ricarkol@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20210701071928.2971053-1-maz@kernel.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Merge tag 'kvm-s390-master-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: selftests: Fixes - provide memory model for IBM z196 and zEC12 - do not require 64GB of memory
-
Benjamin Gaignard authored
Restore bits 39 to 32 at correct position. It reverses the operation done in rk_dma_addr_dte_v2(). Fixes: c55356c5 ("iommu: rockchip: Add support for iommu v2") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Link: https://lore.kernel.org/r/20210712101232.318589-1-benjamin.gaignard@collabora.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Lu Baolu authored
The commit 2b0140c6 ("iommu/vt-d: Use pci_real_dma_dev() for mapping") fixes an issue of "sub-device is removed where the context entry is cleared for all aliases". But this commit didn't consider the PASID entry and PASID table in VT-d scalable mode. This fix increases the coverage of scalable mode. Suggested-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Fixes: 8038bdb8 ("iommu/vt-d: Only clear real DMA device's context entries") Fixes: 2b0140c6 ("iommu/vt-d: Use pci_real_dma_dev() for mapping") Cc: stable@vger.kernel.org # v5.6+ Cc: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210712071712.3416949-1-baolu.lu@linux.intel.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Sanjay Kumar authored
This fixes a bug in context cache clear operation. The code was not following the correct invalidation flow. A global device TLB invalidation should be added after the IOTLB invalidation. At the same time, it uses the domain ID from the context entry. But in scalable mode, the domain ID is in PASID table entry, not context entry. Fixes: 7373a8cc ("iommu/vt-d: Setup context and enable RID2PASID support") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Sanjay Kumar <sanjay.k.kumar@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210712071315.3416543-1-baolu.lu@linux.intel.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
Marek Szyprowski authored
QCOM IOMMU driver calls bus_set_iommu() for every IOMMU device controller, what fails for the second and latter IOMMU devices. This is intended and must be not fatal to the driver registration process. Also the cleanup path should take care of the runtime PM state, what is missing in the current patch. Revert relevant changes to the QCOM IOMMU driver until a proper fix is prepared. This partially reverts commit 249c9dc6. Fixes: 249c9dc6 ("iommu/arm: Cleanup resources in case of probe error path") Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210705065657.30356-1-m.szyprowski@samsung.comSigned-off-by: Joerg Roedel <jroedel@suse.de>
-
- 13 Jul, 2021 6 commits
-
-
Vladimir Oltean authored
This was not caught because there is no switch driver which implements the .port_bridge_join but not .port_bridge_leave method, but it should nonetheless be fixed, as in certain conditions (driver development) it might lead to NULL pointer dereference. Fixes: f66a6a69 ("net: dsa: permit cross-chip bridging between all trees in the system") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linuxLinus Torvalds authored
Pull vboxsf fixes from Hans de Goede: "This adds support for the atomic_open directory-inode op to vboxsf. Note this is not just an enhancement this also fixes an actual issue which users are hitting, see the commit message of the "boxsf: Add support for the atomic_open directory-inode" patch" * tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux: vboxsf: Add support for the atomic_open directory-inode op vboxsf: Add vboxsf_[create|release]_sf_handle() helpers vboxsf: Make vboxsf_dir_create() return the handle for the created file vboxsf: Honor excl flag to the dir-inode create op
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs zoned mode fixes from David Sterba: - fix deadlock when allocating system chunk - fix wrong mutex unlock on an error path - fix extent map splitting for append operation - update and fix message reporting unusable chunk space - don't block when background zone reclaim runs with balance in parallel * tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree btrfs: don't block if we can't acquire the reclaim lock btrfs: properly split extent_map for REQ_OP_ZONE_APPEND btrfs: rework chunk allocation to avoid exhaustion of the system chunk array btrfs: fix deadlock with concurrent chunk allocations involving system chunks btrfs: zoned: print unusable percentage when reclaiming block groups btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_work
-
David S. Miller authored
Íñigo Huguet says: ==================== sfc: Fix lack of XDP TX queues A change introduced in commit e26ca4b5 ("sfc: reduce the number of requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT because it unintentionally reduced the number of XDP TX queues, letting not enough queues to have one per CPU, which leaded to errors if XDP TX/REDIRECT was done from a high numbered CPU. This patchs make the following changes: - Fix the bug mentioned above - Revert commit 99ba0ea6 ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") which intended to fix a related problem, created by mentioned bug, but it's no longer necessary - Add a new error log message if there are not enough resources to make XDP_TX/REDIRECT work V1 -> V2: keep the calculation of how many tx queues can handle a single event queue, but apply the "max. tx queues per channel" upper limit. V2 -> V3: WARN_ON if the number of initialized XDP TXQs differs from the expected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Íñigo Huguet authored
If it's not possible to allocate enough channels for XDP, XDP_TX and XDP_REDIRECT don't work. However, only a message saying that not enough channels were available was shown, but not saying what are the consequences in that case. The user didn't know if he/she can use XDP or not, if the performance is reduced, or what. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Íñigo Huguet authored
Commit 99ba0ea6 ("sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues") intended to fix a problem caused by a round up when calculating the number of XDP channels and queues. However, this was not the real problem. The real problem was that the number of XDP TX queues had been reduced to half in commit e26ca4b5 ("sfc: reduce the number of requested xdp ev queues"), but the variable xdp_tx_queue_count had remained the same. Once the correct number of XDP TX queues is created again in the previous patch of this series, this also can be reverted since the error doesn't actually exist. Only in the case that there is a bug in the code we can have different values in xdp_queue_number and efx->xdp_tx_queue_count. Because of this, and per Edward Cree's suggestion, I add instead a WARN_ON to catch if it happens again in the future. Note that the number of allocated queues can be higher than the number of used ones due to the round up, as explained in the existing comment in the code. That's why we also have to stop increasing xdp_queue_number beyond efx->xdp_tx_queue_count. Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-