- 26 May, 2018 8 commits
-
-
Michal Hocko authored
Oscar has reported: : Due to an unfortunate setting with movablecore, memblocks containing bootmem : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable. : So while trying to remove that memory, the system failed in do_migrate_range : and __offline_pages never returned. : : This can be reproduced by running : qemu-system-x86_64 -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M : and movablecore=4G kernel command line : : linux kernel: BIOS-provided physical RAM map: : linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable : linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved : linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable : linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved : linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved : linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable : linux kernel: NX (Execute Disable) protection: active : linux kernel: SMBIOS 2.8 present. : linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org : linux kernel: Hypervisor detected: KVM : linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved : linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable : linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000 : : linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0 : linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1 : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff] : linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff] : linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug : linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0 : linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0 : linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff] : linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff] : : zoneinfo shows that the zone movable is placed into both numa nodes: : Node 0, zone Movable : pages free 160140 : min 1823 : low 2278 : high 2733 : spanned 262144 : present 262144 : managed 245670 : Node 1, zone Movable : pages free 448427 : min 3827 : low 4783 : high 5739 : spanned 524288 : present 524288 : managed 515766 Note how only Node 0 has a hutplugable memory region which would rule it out from the early memblock allocations (most likely memmap). Node1 will surely contain memmaps on the same node and those would prevent offlining to succeed. So this is arguably a configuration issue. Although one could argue that we should be more clever and rule early allocations from the zone movable. This would be correct but probably not worth the effort considering what a hack movablecore is. Anyway, We could do better for those cases though. We rely on start_isolate_page_range resp. has_unmovable_pages to do their job. The first one isolates the whole range to be offlined so that we do not allocate from it anymore and the later makes sure we are not stumbling over non-migrateable pages. has_unmovable_pages is overly optimistic, however. It doesn't check all the pages if we are withing zone_movable because we rely that those pages will be always migrateable. As it turns out we are still not perfect there. While bootmem pages in zonemovable sound like a clear bug which should be fixed let's remove the optimization for now and warn if we encounter unmovable pages in zone_movable in the meantime. That should help for now at least. Btw. this wasn't a real problem until commit 72b39cfc ("mm, memory_hotplug: do not fail offlining too early") because we used to have a small number of retries and then failed. This turned out to be too fragile though. Link: http://lkml.kernel.org/r/20180523125555.30039-2-mhocko@kernel.orgSigned-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Oscar Salvador <osalvador@techadventures.net> Tested-by: Oscar Salvador <osalvador@techadventures.net> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrey Ryabinin authored
KASAN uses different routines to map shadow for hot added memory and memory obtained in boot process. Attempt to offline memory onlined by normal boot process leads to this: Trying to vfree() nonexistent vm area (000000005d3b34b9) WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190 Call Trace: kasan_mem_notifier+0xad/0xb9 notifier_call_chain+0x166/0x260 __blocking_notifier_call_chain+0xdb/0x140 __offline_pages+0x96a/0xb10 memory_subsys_offline+0x76/0xc0 device_offline+0xb8/0x120 store_mem_state+0xfa/0x120 kernfs_fop_write+0x1d5/0x320 __vfs_write+0xd4/0x530 vfs_write+0x105/0x340 SyS_write+0xb0/0x140 Obviously we can't call vfree() to free memory that wasn't allocated via vmalloc(). Use find_vm_area() to see if we can call vfree(). Unfortunately it's a bit tricky to properly unmap and free shadow allocated during boot, so we'll have to keep it. If memory will come online again that shadow will be reused. Matthew asked: how can you call vfree() on something that isn't a vmalloc address? vfree() is able to free any address returned by __vmalloc_node_range(). And __vmalloc_node_range() gives you any address you ask. It doesn't have to be an address in [VMALLOC_START, VMALLOC_END] range. That's also how the module_alloc()/module_memfree() works on architectures that have designated area for modules. [aryabinin@virtuozzo.com: improve comments] Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com [akpm@linux-foundation.org: fix typos, reflow comment] Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com Fixes: fa69b598 ("mm/kasan: add support for memory hotplug") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Paul Menzel <pmenzel+linux-kasan-dev@molgen.mpg.de> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mike Kravetz authored
The current hugetlbfs maintainer has not been active for more than a few years. I have been been active in this area for more than two years and plan to remain active in the foreseeable future. Also, update the hugetlbfs entry to include linux-mm mail list and additional hugetlbfs related files. hugetlb.c and hugetlb.h are not 100% hugetlbfs, but a majority of their content is hugetlbfs related. Link: http://lkml.kernel.org/r/20180518225236.19079-1-mike.kravetz@oracle.comSigned-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Nadia Yvette Chambers <nyc@holomorphy.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
shmat()'s SHM_REMAP option forbids passing a nil address for; this is in fact the very first thing we check for. Andrea reported that for SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check, but we need to check again if the address was rounded down to nil. As of this patch, such cases will return -EINVAL. Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
Patch series "ipc/shm: shmat() fixes around nil-page". These patches fix two issues reported[1] a while back by Joe and Andrea around how shmat(2) behaves with nil-page. The first reverts a commit that it was incorrectly thought that mapping nil-page (address=0) was a no no with MAP_FIXED. This is not the case, with the exception of SHM_REMAP; which is address in the second patch. I chose two patches because it is easier to backport and it explicitly reverts bogus behaviour. Both patches ought to be in -stable and ltp testcases need updated (the added testcase around the cve can be modified to just test for SHM_RND|SHM_REMAP). [1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805 This patch (of 2): Commit 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection") worked on the idea that we should not be mapping as root addr=0 and MAP_FIXED. However, it was reported that this scenario is in fact valid, thus making the patch both bogus and breaks userspace as well. For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem initialization[1]. [1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347 Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net Fixes: 95e91b83 ("ipc/shm: Fix shmat mmap nil-page protection") Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox authored
If the radix tree underlying the IDR happens to be full and we attempt to remove an id which is larger than any id in the IDR, we will call __radix_tree_delete() with an uninitialised 'slot' pointer, at which point anything could happen. This was easiest to hit with a single entry at id 0 and attempting to remove a non-0 id, but it could have happened with 64 entries and attempting to remove an id >= 64. Roman said: The syzcaller test boils down to opening /dev/kvm, creating an eventfd, and calling a couple of KVM ioctls. None of this requires superuser. And the result is dereferencing an uninitialized pointer which is likely a crash. The specific path caught by syzbot is via KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are other user-triggerable paths, so cc:stable is probably justified. Matthew added: We have around 250 calls to idr_remove() in the kernel today. Many of them pass an ID which is embedded in the object they're removing, so they're safe. Picking a few likely candidates: drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl. drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar drivers/atm/nicstar.c could be taken down by a handcrafted packet Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree") Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com> Debugged-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Changwei Ge authored
This reverts commit ba16ddfb ("ocfs2/o2hb: check len for bio_add_page() to avoid getting incorrect bio"). In my testing, this patch introduces a problem that mkfs can't have slots more than 16 with 4k block size. And the original logic is safe actually with the situation it mentions so revert this commit. Attach test log: (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0 (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ffffea0002d7ed40, len 0, vec_len 4096, vec_start 0,bi_sector 8192 (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5 (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5 (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5 Link: http://lkml.kernel.org/r/SIXPR06MB0461721F398A5A92FC68C39ED5920@SIXPR06MB0461.apcprd06.prod.outlook.comSigned-off-by: Changwei Ge <ge.changwei@h3c.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Omar Sandoval authored
If swapon() fails after incrementing nr_rotate_swap, we don't decrement it and thus effectively leak it. Make sure we decrement it if we incremented it. Link: http://lkml.kernel.org/r/b6fe6b879f17fa68eee6cbd876f459f6e5e33495.1526491581.git.osandov@fb.com Fixes: 81a0298b ("mm, swap: don't use VMA based swap readahead if HDD is used as swap") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 May, 2018 5 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds authored
Pull more arm64 fixes from Will Deacon: - fix application of read-only permissions to kernel section mappings - sanitise reported ESR values for signals delivered on a kernel address - ensure tishift GCC helpers are exported to modules - fix inline asm constraints for some LSE atomics * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Make sure permission updates happen for pmd/pud arm64: fault: Don't leak data in ESR context for user fault on kernel VA arm64: export tishift functions to modules arm64: lse: Add early clobbers to some input/output asm operands
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fix from Michael Ellerman: "Just one fix, to make sure the PCR (Processor Compatibility Register) is reset on boot. Otherwise if we're running in compat mode in a guest (eg. pretending a Power9 is a Power8) and the host kernel oopses and kdumps then the kdump kernel's userspace will be running in Power8 mode, and will SIGILL if it uses Power9-only instructions. Thanks to Michael Neuling" * tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Clear PCR on boot
-
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds authored
Pull MMC fixes from Ulf Hansson: "MMC core: - Propagate correct error code for RPMB requests MMC host: - sdhci-iproc: Drop hard coded cap for 1.8v - sdhci-iproc: Fix 32bit writes for transfer mode - sdhci-iproc: Enable SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus" * tag 'mmc-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register mmc: sdhci-iproc: remove hard coded mmc cap 1.8v mmc: block: propagate correct returned value in mmc_rpmb_ioctl
-
git://people.freedesktop.org/~airlied/linuxLinus Torvalds authored
Pull drm fixes from Dave Airlie: "Only two sets of drivers fixes: one rcar-du lvds regression fix, and a group of fixes for vmwgfx" * tag 'drm-fixes-for-v4.17-rc7' of git://people.freedesktop.org/~airlied/linux: drm/vmwgfx: Schedule an fb dirty update after resume drm/vmwgfx: Fix host logging / guestinfo reading error paths drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros drm: rcar-du: lvds: Fix crash in .atomic_check when disabling connector
-
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds authored
Pull sound fixes from Takashi Iwai: "Two fixes: - a timer pause event notification was garbled upon the recent hardening work; corrected now - HD-audio runtime PM regression fix due to the incorrect return type" * tag 'sound-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix runtime PM ALSA: timer: Fix pause event notification
-
- 24 May, 2018 12 commits
-
-
git://people.freedesktop.org/~thomash/linuxDave Airlie authored
Three fixes for vmwgfx. Two are cc'd stable and fix host logging and its error paths on 32-bit VMs. One is a fix for a hibernate flaw introduced with the 4.17 merge window. * 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux: drm/vmwgfx: Schedule an fb dirty update after resume drm/vmwgfx: Fix host logging / guestinfo reading error paths drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
-
Linus Torvalds authored
Merge branch 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fix from Konrad Rzeszutek Wilk: "One single fix in here: under Xen the DMA32 heap (in the hypervisor) would end up looking like swiss cheese. The reason being that for every coherent DMA allocation we didn't do the proper hypercall to tell Xen to return the page back to the DMA32 heap. End result was (eventually) no DMA32 space if you (for example) continously unloaded and loaded modules" * 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
-
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds authored
Pull rdma fixes from Jason Gunthorpe: "This is pretty much just the usual array of smallish driver bugs. - remove bouncing addresses from the MAINTAINERS file - kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and hns drivers - various small LOC behavioral/operational bugs in mlx5, hns, qedr and i40iw drivers - two fixes for patches already sent during the merge window - a long-standing bug related to not decreasing the pinned pages count in the right MM was found and fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits) RDMA/hns: Move the location for initializing tmp_len RDMA/hns: Bugfix for cq record db for kernel IB/uverbs: Fix uverbs_attr_get_obj RDMA/qedr: Fix doorbell bar mapping for dpi > 1 IB/umem: Use the correct mm during ib_umem_release iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()' RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint RDMA/i40iw: Avoid reference leaks when processing the AEQ RDMA/i40iw: Avoid panic when objects are being created and destroyed RDMA/hns: Fix the bug with NULL pointer RDMA/hns: Set NULL for __internal_mr RDMA/hns: Enable inner_pa_vld filed of mpt RDMA/hns: Set desc_dma_addr for zero when free cmq desc RDMA/hns: Fix the bug with rq sge RDMA/hns: Not support qp transition from reset to reset for hip06 RDMA/hns: Add return operation when configured global param fail RDMA/hns: Update convert function of endian format RDMA/hns: Load the RoCE dirver automatically RDMA/hns: Bugfix for rq record db for kernel RDMA/hns: Add rq inline flags judgement ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fix from David Sterba: "A one-liner that prevents leaking an internal error value 1 out of the ftruncate syscall. This has been observed in practice. The steps to reproduce make a common pattern (open/write/fync/ftruncate) but also need the application to not check only for negative values and happens only for compressed inlined files. The conditions are narrow but as this could break userspace I think it's better to merge it now and not wait for the merge window" * tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix error handling in btrfs_truncate()
-
Lukas Wunner authored
Before commit 3b5b899c ("ALSA: hda: Make use of core codec functions to sync power state"), hda_set_power_state() returned the response to the Get Power State verb, a 32-bit unsigned integer whose expected value is 0x233 after transitioning a codec to D3, and 0x0 after transitioning it to D0. The response value is significant because hda_codec_runtime_suspend() does not clear the codec's bit in the codec_powered bitmask unless the AC_PWRST_CLK_STOP_OK bit (0x200) is set in the response value. That in turn prevents the HDA controller from runtime suspending because azx_runtime_idle() checks that the codec_powered bitmask is zero. Since commit 3b5b899c, hda_set_power_state() only returns 0x0 or 0x1, thereby breaking runtime PM for any HDA controller. That's because an inline function introduced by the commit returns a bool instead of a 32-bit unsigned int. The change was likely erroneous and resulted from copying and pasting snd_hda_check_power_state(), which is immediately preceding the newly introduced inline function. Fix it. Link: https://bugs.freedesktop.org/show_bug.cgi?id=106597 Fixes: 3b5b899c ("ALSA: hda: Make use of core codec functions to sync power state") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Abhijeet Kumar <abhijeet.kumar@intel.com> Reported-and-tested-by: Gunnar Krüger <taijian@posteo.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
-
Joonsoo Kim authored
This reverts the following commits that change CMA design in MM. 3d2054ad ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y") 1d47a3ec ("mm/cma: remove ALLOC_CMA") bad8c6c0 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE") Ville reported a following error on i386. Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) microcode: microcode updated early to revision 0x4, date = 2013-06-28 Initializing CPU#0 Initializing HighMem for node 0 (000377fe:00118000) Initializing Movable for node 0 (00000001:00118000) BUG: Bad page state in process swapper pfn:377fe page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0 flags: 0x80000000() raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145 Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013 Call Trace: dump_stack+0x60/0x96 bad_page+0x9a/0x100 free_pages_check_bad+0x3f/0x60 free_pcppages_bulk+0x29d/0x5b0 free_unref_page_commit+0x84/0xb0 free_unref_page+0x3e/0x70 __free_pages+0x1d/0x20 free_highmem_page+0x19/0x40 add_highpages_with_active_regions+0xab/0xeb set_highmem_pages_init+0x66/0x73 mem_init+0x1b/0x1d7 start_kernel+0x17a/0x363 i386_start_kernel+0x95/0x99 startup_32_smp+0x164/0x168 The reason for this error is that the span of MOVABLE_ZONE is extended to whole node span for future CMA initialization, and, normal memory is wrongly freed here. I submitted the fix and it seems to work, but, another problem happened. It's so late time to fix the later problem so I decide to reverting the series. Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libataLinus Torvalds authored
Pull libata fixes from Tejun Heo: "Nothing too interesting. Four patches to update the blacklist and add a controller ID" * 'for-4.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ahci: Add PCI ID for Cannon Lake PCH-LP AHCI libata: blacklist Micron 500IT SSD with MU01 firmware libata: Apply NOLPM quirk for SAMSUNG PM830 CXM13D1Q. libata: Blacklist some Sandisk SSDs for NCQ
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull block fixes from Jens Axboe: "Two fixes that should go into this release: - a loop writeback error clearing fix from Jeff - the sr sense fix from myself" * tag 'for-linus-20180524' of git://git.kernel.dk/linux-block: loop: clear wb_err in bd_inode when detaching backing file sr: pass down correctly sized SCSI sense buffer
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management fix from Rafael Wysocki: "Fix a regression from the 4.15 cycle that caused the system suspend and resume overhead to increase on many systems and triggered more serious problems on some of them (Rafael Wysocki)" * tag 'pm-4.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / core: Fix direct_complete handling for devices with no callbacks
-
Mika Westerberg authored
This one should be using the default LPM policy for mobile chipsets so add the PCI ID to the driver list of supported revices. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
-
Laura Abbott authored
Commit 15122ee2 ("arm64: Enforce BBM for huge IO/VMAP mappings") disallowed block mappings for ioremap since that code does not honor break-before-make. The same APIs are also used for permission updating though and the extra checks prevent the permission updates from happening, even though this should be permitted. This results in read-only permissions not being fully applied. Visibly, this can occasionaly be seen as a failure on the built in rodata test when the test data ends up in a section or as an odd RW gap on the page table dump. Fix this by using pgattr_change_is_safe instead of p*d_present for determining if the change is permitted. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Peter Robinson <pbrobinson@gmail.com> Reported-by: Peter Robinson <pbrobinson@gmail.com> Fixes: 15122ee2 ("arm64: Enforce BBM for huge IO/VMAP mappings") Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Omar Sandoval authored
Jun Wu at Facebook reported that an internal service was seeing a return value of 1 from ftruncate() on Btrfs in some cases. This is coming from the NEED_TRUNCATE_BLOCK return value from btrfs_truncate_inode_items(). btrfs_truncate() uses two variables for error handling, ret and err. When btrfs_truncate_inode_items() returns non-zero, we set err to the return value. However, NEED_TRUNCATE_BLOCK is not an error. Make sure we only set err if ret is an error (i.e., negative). To reproduce the issue: mount a filesystem with -o compress-force=zstd and the following program will encounter return value of 1 from ftruncate: int main(void) { char buf[256] = { 0 }; int ret; int fd; fd = open("test", O_CREAT | O_WRONLY | O_TRUNC, 0666); if (fd == -1) { perror("open"); return EXIT_FAILURE; } if (write(fd, buf, sizeof(buf)) != sizeof(buf)) { perror("write"); close(fd); return EXIT_FAILURE; } if (fsync(fd) == -1) { perror("fsync"); close(fd); return EXIT_FAILURE; } ret = ftruncate(fd, 128); if (ret) { printf("ftruncate() returned %d\n", ret); close(fd); return EXIT_FAILURE; } close(fd); return EXIT_SUCCESS; } Fixes: ddfae63c ("btrfs: move btrfs_truncate_block out of trans handle") CC: stable@vger.kernel.org # 4.15+ Reported-by: Jun Wu <quark@fb.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
- 23 May, 2018 13 commits
-
-
oulijun authored
When posted work request, it need to compute the length of all sges of every wr and fill it into the msg_len field of send wqe. Thus, While posting multiple wr, tmp_len should be reinitialized to zero. Fixes: 8b9b8d14 ("RDMA/hns: Fix the endian problem for hns") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
oulijun authored
When use cq record db for kernel, it needs to set the hr_cq->db_en to 1 and configure the dma address of record cq db of qp context. Fixes: 86188a88 ("RDMA/hns: Support cq record doorbell for kernel space") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Jason Gunthorpe authored
The err pointer comes from uverbs_attr_get, not from the uobject member, which does not store an ERR_PTR. Fixes: be934cca ("IB/uverbs: Add device memory registration ioctl support") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
-
Kalderon, Michal authored
Each user_context receives a separate dpi value and thus a different address on the doorbell bar. The qedr_mmap function needs to validate the address and map the doorbell bar accordingly. The current implementation always checked against dpi=0 doorbell range leading to a wrong mapping for doorbell bar. (It entered an else case that mapped the address differently). qedr_mmap should only be used for doorbells, so the else was actually wrong in the first place. This only has an affect on arm architecture and not an issue on a x86 based architecture. This lead to doorbells not occurring on arm based systems and left applications that use more than one dpi (or several applications run simultaneously ) to hang. Fixes: ac1b36e5 ("qedr: Add support for user context verbs") Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfdLinus Torvalds authored
Pull MFD fix from Lee Jones: "A single cros_ec_spi fix correcting the handling for long-running commands" * tag 'mfd-fixes-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: cros_ec: Retry commands when EC is known to be busy
-
git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alphaLinus Torvalds authored
Pull alpha fixes from Matt Turner: "A few small changes for alpha" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha: alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2 alpha: simplify get_arch_dma_ops alpha: use dma_direct_ops for jensen
-
Thomas Hellstrom authored
We have had problems displaying fbdev after a resume and as a workaround we have had to call vmw_fb_refresh(). This has had a number of unwanted side-effects. The root of the problem was, however that the coalesced fbdev dirty region was not empty on the first dirty_mark() after a resume, so a flush was never scheduled. Fix this by force scheduling an fbdev flush after resume, and remove the workaround. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Deepak Rawat <drawat@vmware.com>
-
Thomas Hellstrom authored
The error paths were leaking opened channels. Fix by using dedicated error paths. Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
-
Thomas Hellstrom authored
Depending on whether the kernel is compiled with frame-pointer or not, the temporary memory location used for the bp parameter in these macros is referenced relative to the stack pointer or the frame pointer. Hence we can never reference that parameter when we've modified either the stack pointer or the frame pointer, because then the compiler would generate an incorrect stack reference. Fix this by pushing the temporary memory parameter on a known location on the stack before modifying the stack- and frame pointers. Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
-
Brian Norris authored
Commit 001dde94 ("mfd: cros ec: spi: Fix "in progress" error signaling") pointed out some bad code, but its analysis and conclusion was not 100% correct. It *is* correct that we should not propagate result==EC_RES_IN_PROGRESS for transport errors, because this has a special meaning -- that we should follow up with EC_CMD_GET_COMMS_STATUS until the EC is no longer busy. This is definitely the wrong thing for many commands, because among other problems, EC_CMD_GET_COMMS_STATUS doesn't actually retrieve any RX data from the EC, so commands that expected some data back will instead start processing junk. For such commands, the right answer is to either propagate the error (and return that error to the caller) or resend the original command (*not* EC_CMD_GET_COMMS_STATUS). Unfortunately, commit 001dde94 forgets a crucial point: that for some long-running operations, the EC physically cannot respond to commands any more. For example, with EC_CMD_FLASH_ERASE, the EC may be re-flashing its own code regions, so it can't respond to SPI interrupts. Instead, the EC prepares us ahead of time for being busy for a "long" time, and fills its hardware buffer with EC_SPI_PAST_END. Thus, we expect to see several "transport" errors (or, messages filled with EC_SPI_PAST_END). So we should really translate that to a retryable error (-EAGAIN) and continue sending EC_CMD_GET_COMMS_STATUS until we get a ready status. IOW, it is actually important to treat some of these "junk" values as retryable errors. Together with commit 001dde94, this resolves bugs like the following: 1. EC_CMD_FLASH_ERASE now works again (with commit 001dde94, we would abort the first time we saw EC_SPI_PAST_END) 2. Before commit 001dde94, transport errors (e.g., EC_SPI_RX_BAD_DATA) seen in other commands (e.g., EC_CMD_RTC_GET_VALUE) used to yield junk data in the RX buffer; they will now yield -EAGAIN return values, and tools like 'hwclock' will simply fail instead of retrieving and re-programming undefined time values Fixes: 001dde94 ("mfd: cros ec: spi: Fix "in progress" error signaling") Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
-
Sinan Kaya authored
memory-barriers.txt has been updated with the following requirement. "When using writel(), a prior wmb() is not needed to guarantee that the cache coherent memory writes have completed before writing to the MMIO region." Current writeX() and iowriteX() implementations on alpha are not satisfying this requirement as the barrier is after the register write. Move mb() in writeX() and iowriteX() functions to guarantee that HW observes memory changes before performing register operations. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matt Turner <mattst88@gmail.com>
-
Christoph Hellwig authored
Remove the dma_ops indirection. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matt Turner <mattst88@gmail.com>
-
Christoph Hellwig authored
The generic dma_direct implementation does the same thing as the alpha pci-noop implementation, just with more bells and whistles. And unlike the current code it at least has a theoretical chance to actually compile. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matt Turner <mattst88@gmail.com>
-
- 22 May, 2018 2 commits
-
-
Peter Maydell authored
If userspace faults on a kernel address, handing them the raw ESR value on the sigframe as part of the delivered signal can leak data useful to attackers who are using information about the underlying hardware fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. However there are also legitimate uses for the information provided in the ESR -- notably the GCC and LLVM sanitizers use this to report whether wild pointer accesses by the application are reads or writes (since a wild write is a more serious bug than a wild read), so we don't want to drop the ESR information entirely. For faulting addresses in the kernel, sanitize the ESR. We choose to present userspace with the illusion that there is nothing mapped in the kernel's part of the address space at all, by reporting all faults as level 0 translation faults taken to EL1. These fields are safe to pass through to userspace as they depend only on the instruction that userspace used to provoke the fault: EC IL (always) ISV CM WNR (for all data aborts) All the other fields in ESR except DFSC are architecturally RES0 for an L0 translation fault taken to EL1, so can be zeroed out without confusing userspace. The illusion is not entirely perfect, as there is a tiny wrinkle where we will report an alignment fault that was not due to the memory type (for instance a LDREX to an unaligned address) as a translation fault, whereas if you do this on real unmapped memory the alignment fault takes precedence. This is not likely to trip anybody up in practice, as the only users we know of for the ESR information who care about the behaviour for kernel addresses only really want to know about the WnR bit. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Rafael J. Wysocki authored
Commit 08810a41 (PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags) inadvertently prevented the power.direct_complete flag from being set for devices without PM callbacks and with disabled runtime PM which also prevents power.direct_complete from being set for their parents. That led to problems including a resume crash on HP ZBook 14u. Restore the previous behavior by causing power.direct_complete to be set for those devices again, but do that in a more direct way to avoid overlooking that case in the future. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199693 Fixes: 08810a41 (PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags) Reported-by: Thomas Martitz <kugel@rockbox.org> Tested-by: Thomas Martitz <kugel@rockbox.org> Cc: 4.15+ <stable@vger.kernel.org> # 4.15+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Johan Hovold <johan@kernel.org>
-