- 30 May, 2018 40 commits
-
-
Mauricio Faria de Oliveira authored
commit 6232774f upstream. After migration the security feature flags might have changed (e.g., destination system with unpatched firmware), but some flags are not set/clear again in init_cpu_char_feature_flags() because it assumes the security flags to be the defaults. Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then init_cpu_char_feature_flags() does not run again, which potentially might leave the system in an insecure or sub-optimal configuration. So, just restore the security feature flags to the defaults assumed by init_cpu_char_feature_flags() so it can set/clear them correctly, and to ensure safe settings are in place in case the hypercall fail. Fixes: f636c147 ("powerpc/pseries: Set or clear security feature flags") Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags") Signed-off-by:
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mauricio Faria de Oliveira authored
commit e7347a86 upstream. This moves the definition of the default security feature flags (i.e., enabled by default) closer to the security feature flags. This can be used to restore current flags to the default flags. Signed-off-by:
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mauricio Faria de Oliveira authored
commit 0f9bdfe3 upstream. The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_* flags. Found by playing around with QEMU's implementation of the hypercall: H_CPU_CHAR=0xf000000000000000 H_CPU_BEHAV=0x0000000000000000 This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR so pseries_setup_rfi_flush() disables 'rfi_flush'; and it also clears H_CPU_CHAR_L1D_THREAD_PRIV flag. So there is no RFI flush mitigation at all for cpu_show_meltdown() to report; but currently it does: Original kernel: # cat /sys/devices/system/cpu/vulnerabilities/meltdown Mitigation: RFI Flush Patched kernel: # cat /sys/devices/system/cpu/vulnerabilities/meltdown Not affected H_CPU_CHAR=0x0000000000000000 H_CPU_BEHAV=0xf000000000000000 This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR so cpu_show_spectre_v1() should report vulnerable; but currently it doesn't: Original kernel: # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1 Not affected Patched kernel: # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1 Vulnerable Brown-paper-bag-by:
Michael Ellerman <mpe@ellerman.id.au> Fixes: f636c147 ("powerpc/pseries: Set or clear security feature flags") Signed-off-by:
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit d6fbe1c5 upstream. Add a definition for cpu_show_spectre_v2() to override the generic version. This has several permuations, though in practice some may not occur we cater for any combination. The most verbose is: Mitigation: Indirect branch serialisation (kernel only), Indirect branch cache disabled, ori31 speculation barrier enabled We don't treat the ori31 speculation barrier as a mitigation on its own, because it has to be *used* by code in order to be a mitigation and we don't know if userspace is doing that. So if that's all we see we say: Vulnerable, ori31 speculation barrier enabled Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 56986016 upstream. Add a definition for cpu_show_spectre_v1() to override the generic version. Currently this just prints "Not affected" or "Vulnerable" based on the firmware flag. Although the kernel does have array_index_nospec() in a few places, we haven't yet audited all the powerpc code to see where it's necessary, so for now we don't list that as a mitigation. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 2e4a1616 upstream. Now that we have the security flags we can simplify the code in pseries_setup_rfi_flush() because the security flags have pessimistic defaults. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 37c0bdd0 upstream. Now that we have the security flags we can significantly simplify the code in pnv_setup_rfi_flush(), because we can use the flags instead of checking device tree properties and because the security flags have pessimistic defaults. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit ff348355 upstream. Now that we have the security feature flags we can make the information displayed in the "meltdown" file more informative. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 8ad33041 upstream. This landed in setup_64.c for no good reason other than we had nowhere else to put it. Now that we have a security-related file, that is a better place for it so move it. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 77addf6e upstream. Now that we have feature flags for security related things, set or clear them based on what we see in the device tree provided by firmware. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit f636c147 upstream. Now that we have feature flags for security related things, set or clear them based on what we receive from the hypercall. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 9a868f63 upstream. This commit adds security feature flags to reflect the settings we receive from firmware regarding Spectre/Meltdown mitigations. The feature names reflect the names we are given by firmware on bare metal machines. See the hostboot source for details. Arguably these could be firmware features, but that then requires them to be read early in boot so they're available prior to asm feature patching, but we don't actually want to use them for patching. We may also want to dynamically update them in future, which would be incompatible with the way firmware features work (at the moment at least). So for now just make them separate flags. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit c4bc3662 upstream. Add some additional values which have been defined for the H_GET_CPU_CHARACTERISTICS hypercall. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 921bc6cf upstream. We might have migrated to a machine that uses a different flush type, or doesn't need flushing at all. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Mauricio Faria de Oliveira authored
commit 0063d61c upstream. Currently the rfi-flush messages print 'Using <type> flush' for all enabled_flush_types, but that is not necessarily true -- as now the fallback flush is always enabled on pseries, but the fixup function overwrites its nop/branch slot with other flush types, if available. So, replace the 'Using <type> flush' messages with '<type> flush is available'. Also, print the patched flush types in the fixup function, so users can know what is (not) being used (e.g., the slower, fallback flush, or no flush type at all if flush is disabled via the debugfs switch). Suggested-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 84749a58 upstream. This ensures the fallback flush area is always allocated on pseries, so in case a LPAR is migrated from a patched to an unpatched system, it is possible to enable the fallback flush in the target system. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit abf110f3 upstream. For PowerVM migration we want to be able to call setup_rfi_flush() again after we've migrated the partition. To support that we need to check that we're not trying to allocate the fallback flush area after memblock has gone away (i.e., boot-time only). Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 1e2a9fc7 upstream. rfi_flush_enable() includes a check to see if we're already enabled (or disabled), and in that case does nothing. But that means calling setup_rfi_flush() a 2nd time doesn't actually work, which is a bit confusing. Move that check into the debugfs code, where it really belongs. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit eb0a2d26 upstream. Some versions of firmware will have a setting that can be configured to disable the RFI flush, add support for it. Fixes: 6e032b35 ("powerpc/powernv: Check device-tree for RFI flush settings") Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Ellerman authored
commit 582605a4 upstream. Some versions of firmware will have a setting that can be configured to disable the RFI flush, add support for it. Fixes: 8989d568 ("powerpc/pseries: Query hypervisor for RFI flush settings") Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicholas Piggin authored
commit bdcb1aef upstream. The fallback RFI flush is used when firmware does not provide a way to flush the cache. It's a "displacement flush" that evicts useful data by displacing it with an uninteresting buffer. The flush has to take care to work with implementation specific cache replacment policies, so the recipe has been in flux. The initial slow but conservative approach is to touch all lines of a congruence class, with dependencies between each load. It has since been determined that a linear pattern of loads without dependencies is sufficient, and is significantly faster. Measuring the speed of a null syscall with RFI fallback flush enabled gives the relative improvement: P8 - 1.83x P9 - 1.75x The flush also becomes simpler and more adaptable to different cache geometries. Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Vrabel authored
commit d8f2f498 upstream. Since 4.10, commit 8003c9ae (KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support), guests using periodic LAPIC timers (such as FreeBSD 8.4) would see their timers drift significantly over time. Differences in the underlying clocks and numerical errors means the periods of the two timers (hv and sw) are not the same. This difference will accumulate with every expiry resulting in a large error between the hv and sw timer. This means the sw timer may be running slow when compared to the hv timer. When the timer is switched from hv to sw, the now active sw timer will expire late. The guest VCPU is reentered and it switches to using the hv timer. This timer catches up, injecting multiple IRQs into the guest (of which the guest only sees one as it does not get to run until the hv timer has caught up) and thus the guest's timer rate is low (and becomes increasing slower over time as the sw timer lags further and further behind). I believe a similar problem would occur if the hv timer is the slower one, but I have not observed this. Fix this by synchronizing the deadlines for both timers to the same time source on every tick. This prevents the errors from accumulating. Fixes: 8003c9ae Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by:
David Vrabel <david.vrabel@nutanix.com> Cc: stable@vger.kernel.org Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Wanpeng Li <wanpengli@tencent.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jim Mattson authored
commit 1eaafe91 upstream. If there is a possibility that a VM may migrate to a Skylake host, then the hypervisor should report IA32_ARCH_CAPABILITIES.RSBA[bit 2] as being set (future work, of course). This implies that CPUID.(EAX=7,ECX=0):EDX.ARCH_CAPABILITIES[bit 29] should be set. Therefore, kvm should report this CPUID bit as being supported whether or not the host supports it. Userspace is still free to clear the bit if it chooses. For more information on RSBA, see Intel's white paper, "Retpoline: A Branch Target Injection Mitigation" (Document Number 337131-001), currently available at https://bugzilla.kernel.org/show_bug.cgi?id=199511. Since the IA32_ARCH_CAPABILITIES MSR is emulated in kvm, there is no dependency on hardware support for this feature. Signed-off-by:
Jim Mattson <jmattson@google.com> Reviewed-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Fixes: 28c1c9fa ("KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES") Cc: stable@vger.kernel.org Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Wei Huang authored
commit c4d21882 upstream. The CPUID bits of OSXSAVE (function=0x1) and OSPKE (func=0x7, leaf=0x0) allows user apps to detect if OS has set CR4.OSXSAVE or CR4.PKE. KVM is supposed to update these CPUID bits when CR4 is updated. Current KVM code doesn't handle some special cases when updates come from emulator. Here is one example: Step 1: guest boots Step 2: guest OS enables XSAVE ==> CR4.OSXSAVE=1 and CPUID.OSXSAVE=1 Step 3: guest hot reboot ==> QEMU reset CR4 to 0, but CPUID.OSXAVE==1 Step 4: guest os checks CPUID.OSXAVE, detects 1, then executes xgetbv Step 4 above will cause an #UD and guest crash because guest OS hasn't turned on OSXAVE yet. This patch solves the problem by comparing the the old_cr4 with cr4. If the related bits have been changed, kvm_update_cpuid() needs to be called. Signed-off-by:
Wei Huang <wei@redhat.com> Reviewed-by:
Bandan Das <bsd@redhat.com> Cc: stable@vger.kernel.org Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Hildenbrand authored
commit f4a551b7 upstream. By missing an "L", we might detect some addresses to be <8k, although they are not. e.g. for itdba = 100001fff !(gpa & ~0x1fffU) -> 1 !(gpa & ~0x1fffUL) -> 0 So we would report a SIE validity intercept although everything is fine. Fixes: 166ecb3d ("KVM: s390: vsie: support transactional execution") Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by:
Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by:
Janosch Frank <frankja@linux.ibm.com> Reviewed-by:
Cornelia Huck <cohuck@redhat.com> Signed-off-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Janosch Frank <frankja@linux.ibm.com> Cc: stable@vger.kernel.org # v4.8+ Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Konrad Rzeszutek Wilk authored
commit 0aa48468 upstream. The X86_FEATURE_SSBD is an synthetic CPU feature - that is it bit location has no relevance to the real CPUID 0x7.EBX[31] bit position. For that we need the new CPU feature name. Fixes: 52817587 ("x86/cpufeatures: Disentangle SSBD enumeration") Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://lkml.kernel.org/r/20180521215449.26423-2-konrad.wilk@oracle.comSigned-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Gustavo A. R. Silva authored
commit 23d6aef7 upstream. `resource' can be controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap) kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap) Fix this by sanitizing *resource* before using it to index current->signal->rlim Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.comSigned-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Hildenbrand authored
commit 3f195972 upstream. Using module_init() is wrong. E.g. ACPI adds and onlines memory before our memory notifier gets registered. This makes sure that ACPI memory detected during boot up will not result in a kernel crash. Easily reproducible with QEMU, just specify a DIMM when starting up. Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com Fixes: 786a8959 ("kasan: disable memory hotplug") Signed-off-by:
David Hildenbrand <david@redhat.com> Acked-by:
Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
David Hildenbrand authored
commit ed1596f9 upstream. We have to free memory again when we cancel onlining, otherwise a later onlining attempt will fail. Link: http://lkml.kernel.org/r/20180522100756.18478-2-david@redhat.com Fixes: fa69b598 ("mm/kasan: add support for memory hotplug") Signed-off-by:
David Hildenbrand <david@redhat.com> Acked-by:
Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrey Ryabinin authored
commit 0f901dcb upstream. KASAN uses different routines to map shadow for hot added memory and memory obtained in boot process. Attempt to offline memory onlined by normal boot process leads to this: Trying to vfree() nonexistent vm area (000000005d3b34b9) WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190 Call Trace: kasan_mem_notifier+0xad/0xb9 notifier_call_chain+0x166/0x260 __blocking_notifier_call_chain+0xdb/0x140 __offline_pages+0x96a/0xb10 memory_subsys_offline+0x76/0xc0 device_offline+0xb8/0x120 store_mem_state+0xfa/0x120 kernfs_fop_write+0x1d5/0x320 __vfs_write+0xd4/0x530 vfs_write+0x105/0x340 SyS_write+0xb0/0x140 Obviously we can't call vfree() to free memory that wasn't allocated via vmalloc(). Use find_vm_area() to see if we can call vfree(). Unfortunately it's a bit tricky to properly unmap and free shadow allocated during boot, so we'll have to keep it. If memory will come online again that shadow will be reused. Matthew asked: how can you call vfree() on something that isn't a vmalloc address? vfree() is able to free any address returned by __vmalloc_node_range(). And __vmalloc_node_range() gives you any address you ask. It doesn't have to be an address in [VMALLOC_START, VMALLOC_END] range. That's also how the module_alloc()/module_memfree() works on architectures that have designated area for modules. [aryabinin@virtuozzo.com: improve comments] Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com [akpm@linux-foundation.org: fix typos, reflow comment] Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com Fixes: fa69b598 ("mm/kasan: add support for memory hotplug") Signed-off-by:
Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by:
Paul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Davidlohr Bueso authored
commit 8f89c007 upstream. shmat()'s SHM_REMAP option forbids passing a nil address for; this is in fact the very first thing we check for. Andrea reported that for SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check, but we need to check again if the address was rounded down to nil. As of this patch, such cases will return -EINVAL. Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reported-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Davidlohr Bueso authored
commit a73ab244 upstream. Patch series "ipc/shm: shmat() fixes around nil-page". These patches fix two issues reported[1] a while back by Joe and Andrea around how shmat(2) behaves with nil-page. The first reverts a commit that it was incorrectly thought that mapping nil-page (address=0) was a no no with MAP_FIXED. This is not the case, with the exception of SHM_REMAP; which is address in the second patch. I chose two patches because it is easier to backport and it explicitly reverts bogus behaviour. Both patches ought to be in -stable and ltp testcases need updated (the added testcase around the cve can be modified to just test for SHM_RND|SHM_REMAP). [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805 This patch (of 2): Commit 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection") worked on the idea that we should not be mapping as root addr=0 and MAP_FIXED. However, it was reported that this scenario is in fact valid, thus making the patch both bogus and breaks userspace as well. For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem initialization[1]. [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347 Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net Fixes: 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection") Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reported-by:
Joe Lawrence <joe.lawrence@redhat.com> Reported-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Matthew Wilcox authored
commit 7a4deea1 upstream. If the radix tree underlying the IDR happens to be full and we attempt to remove an id which is larger than any id in the IDR, we will call __radix_tree_delete() with an uninitialised 'slot' pointer, at which point anything could happen. This was easiest to hit with a single entry at id 0 and attempting to remove a non-0 id, but it could have happened with 64 entries and attempting to remove an id >= 64. Roman said: The syzcaller test boils down to opening /dev/kvm, creating an eventfd, and calling a couple of KVM ioctls. None of this requires superuser. And the result is dereferencing an uninitialized pointer which is likely a crash. The specific path caught by syzbot is via KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are other user-triggerable paths, so cc:stable is probably justified. Matthew added: We have around 250 calls to idr_remove() in the kernel today. Many of them pass an ID which is embedded in the object they're removing, so they're safe. Picking a few likely candidates: drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl. drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar drivers/atm/nicstar.c could be taken down by a handcrafted packet Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree") Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com> Debugged-by:
Roman Kagan <rkagan@virtuozzo.com> Signed-off-by:
Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
commit f7068114 upstream. We're casting the CDROM layer request_sense to the SCSI sense buffer, but the former is 64 bytes and the latter is 96 bytes. As we generally allocate these on the stack, we end up blowing up the stack. Fix this by wrapping the scsi_execute() call with a properly sized sense buffer, and copying back the bits for the CDROM layer. Cc: stable@vger.kernel.org Reported-by:
Piotr Gabriel Kosinski <pg.kosinski@gmail.com> Reported-by:
Daniel Shapira <daniel@twistlock.com> Tested-by:
Kees Cook <keescook@chromium.org> Fixes: 82ed4db4 ("block: split scsi_request out of struct request") Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Lidong Chen authored
commit 8e907ed4 upstream. User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads. If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has exited, get_pid_task will return NULL and ib_umem_release will not decrease mm->pinned_vm. Instead of using threads to locate the mm, use the overall tgid from the ib_ucontext struct instead. This matches the behavior of ODP and disassociate in handling the mm of the process that called ibv_reg_mr. Cc: <stable@vger.kernel.org> Fixes: 87773dd5 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get") Signed-off-by:
Lidong Chen <lidongchen@tencent.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael J. Ruhl authored
commit f9e76ca3 upstream. A pio send egress error can occur when the PSM library attempts to to send a bad packet. That issue is still being investigated. The pio error interrupt handler then attempts to progress the recovery of the errored pio send context. Code inspection reveals that the handling lacks the necessary locking if that recovery interleaves with a PSM close of the "context" object contains the pio send context. The lack of the locking can cause the recovery to access the already freed pio send context object and incorrectly deduce that the pio send context is actually a kernel pio send context as shown by the NULL deref stack below: [<ffffffff8143d78c>] _dev_info+0x6c/0x90 [<ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1] [<ffffffff816ab124>] ? __schedule+0x424/0x9b0 [<ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1] [<ffffffff810aa3ba>] process_one_work+0x17a/0x440 [<ffffffff810ab086>] worker_thread+0x126/0x3c0 [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0 [<ffffffff810b252f>] kthread+0xcf/0xe0 [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 [<ffffffff816b8798>] ret_from_fork+0x58/0x90 [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40 This is the best case scenario and other scenarios can corrupt the already freed memory. Fix by adding the necessary locking in the pio send context error handler. Cc: <stable@vger.kernel.org> # 4.9.x Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Doug Ledford <dledford@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael Neuling authored
commit faf37c44 upstream. Clear the PCR (Processor Compatibility Register) on boot to ensure we are not running in a compatibility mode. We've seen this cause problems when a crash (and kdump) occurs while running compat mode guests. The kdump kernel then runs with the PCR set and causes problems. The symptom in the kdump kernel (also seen in petitboot after fast-reboot) is early userspace programs taking sigills on newer instructions (seen in libc). Signed-off-by:
Michael Neuling <mikey@neuling.org> Cc: stable@vger.kernel.org Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Will Deacon authored
commit 32c3fa7c upstream. For LSE atomics that read and write a register operand, we need to ensure that these operands are annotated as "early clobber" if the register is written before all of the input operands have been consumed. Failure to do so can result in the compiler allocating the same register to both operands, leading to splats such as: Unable to handle kernel paging request at virtual address 11111122222221 [...] x1 : 1111111122222222 x0 : 1111111122222221 Process swapper/0 (pid: 1, stack limit = 0x000000008209f908) Call trace: test_atomic64+0x1360/0x155c where x0 has been allocated as both the value to be stored and also the atomic_t pointer. This patch adds the missing clobbers. Cc: <stable@vger.kernel.org> Cc: Dave Martin <dave.martin@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Reported-by:
Mark Salter <msalter@redhat.com> Signed-off-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Hellstrom authored
commit 938ae725 upstream. Depending on whether the kernel is compiled with frame-pointer or not, the temporary memory location used for the bp parameter in these macros is referenced relative to the stack pointer or the frame pointer. Hence we can never reference that parameter when we've modified either the stack pointer or the frame pointer, because then the compiler would generate an incorrect stack reference. Fix this by pushing the temporary memory parameter on a known location on the stack before modifying the stack- and frame pointers. Cc: <stable@vger.kernel.org> Signed-off-by:
Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by:
Brian Paul <brianp@vmware.com> Reviewed-by:
Sinclair Yeh <syeh@vmware.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Joe Jin authored
commit 4855c92d upstream. When run raidconfig from Dom0 we found that the Xen DMA heap is reduced, but Dom Heap is increased by the same size. Tracing raidconfig we found that the related ioctl() in megaraid_sas will call dma_alloc_coherent() to apply memory. If the memory allocated by Dom0 is not in the DMA area, it will exchange memory with Xen to meet the requiment. Later drivers call dma_free_coherent() to free the memory, on xen_swiotlb_free_coherent() the check condition (dev_addr + size - 1 <= dma_mask) is always false, it prevents calling xen_destroy_contiguous_region() to return the memory to the Xen DMA heap. This issue introduced by commit 6810df88 "xen-swiotlb: When doing coherent alloc/dealloc check before swizzling the MFNs.". Signed-off-by:
Joe Jin <joe.jin@oracle.com> Tested-by:
John Sobecki <john.sobecki@oracle.com> Reviewed-by:
Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@vger.kernel.org Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-