- 16 Sep, 2022 8 commits
-
-
Mark Rutland authored
For each instance of an alternative, the compiler outputs a distinct copy of the alternative instructions into a subsection. As the compiler doesn't have special knowledge of alternatives, it cannot coalesce these to save space. In a defconfig kernel built with GCC 12.1.0, there are approximately 10,000 instances of alternative_has_feature_likely(), where the replacement instruction is always a NOP. As NOPs are position-independent, we don't need a unique copy per alternative sequence. This patch adds a callback to patch an alternative sequence with NOPs, and make use of this in alternative_has_feature_likely(). So that this can be used for other sites in future, this is written to patch multiple instructions up to the original sequence length. For NVHE, an alias is added to image-vars.h. For modules, the callback is exported. Note that as modules are loaded within 2GiB of the kernel, an alt_instr entry in a module can always refer directly to the callback, and no special handling is necessary. When building with GCC 12.1.0, the vmlinux is ~158KiB smaller, though the resulting Image size is unchanged due to alignment constraints and padding: | % ls -al vmlinux-* | -rwxr-xr-x 1 mark mark 134644592 Sep 1 14:52 vmlinux-after | -rwxr-xr-x 1 mark mark 134486232 Sep 1 14:50 vmlinux-before | % ls -al Image-* | -rw-r--r-- 1 mark mark 37108224 Sep 1 14:52 Image-after | -rw-r--r-- 1 mark mark 37108224 Sep 1 14:50 Image-before Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220912162210.3626215-9-mark.rutland@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
Currrently we use a mixture of alternative sequences and static branches to handle features detected at boot time. For ease of maintenance we generally prefer to use static branches in C code, but this has a few downsides: * Each static branch has metadata in the __jump_table section, which is not discarded after features are finalized. This wastes some space, and slows down the patching of other static branches. * The static branches are patched at a different point in time from the alternatives, so changes are not atomic. This leaves a transient period where there could be a mismatch between the behaviour of alternatives and static branches, which could be problematic for some features (e.g. pseudo-NMI). * More (instrumentable) kernel code is executed to patch each static branch, which can be risky when patching certain features (e.g. irqflags management for pseudo-NMI). * When CONFIG_JUMP_LABEL=n, static branches are turned into a load of a flag and a conditional branch. This means it isn't safe to use such static branches in an alternative address space (e.g. the NVHE/PKVM hyp code), where the generated address isn't safe to acccess. To deal with these issues, this patch introduces new alternative_has_feature_*() helpers, which work like static branches but are patched using alternatives. This ensures the patching is performed at the same time as other alternative patching, allows the metadata to be freed after patching, and is safe for use in alternative address spaces. Note that all supported toolchains have asm goto support, and since commit: a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO)" ... the CC_HAS_ASM_GOTO Kconfig symbol has been removed, so no feature check is necessary, and we can always make use of asm goto. Additionally, note that: * This has no impact on cpus_have_cap(), which is a dynamic check. * This has no functional impact on cpus_have_const_cap(). The branches are patched slightly later than before this patch, but these branches are not reachable until caps have been finalised. * It is now invalid to use cpus_have_final_cap() in the window between feature detection and patching. All existing uses are only expected after patching anyway, so this should not be a problem. * The LSE atomics will now be enabled during alternatives patching rather than immediately before. As the LL/SC an LSE atomics are functionally equivalent this should not be problematic. When building defconfig with GCC 12.1.0, the resulting Image is 64KiB smaller: | % ls -al Image-* | -rw-r--r-- 1 mark mark 37108224 Aug 23 09:56 Image-after | -rw-r--r-- 1 mark mark 37173760 Aug 23 09:54 Image-before According to bloat-o-meter.pl: | add/remove: 44/34 grow/shrink: 602/1294 up/down: 39692/-61108 (-21416) | Function old new delta | [...] | Total: Before=16618336, After=16596920, chg -0.13% | add/remove: 0/2 grow/shrink: 0/0 up/down: 0/-1296 (-1296) | Data old new delta | arm64_const_caps_ready 16 - -16 | cpu_hwcap_keys 1280 - -1280 | Total: Before=8987120, After=8985824, chg -0.01% | add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0) | RO Data old new delta | Total: Before=18408, After=18408, chg +0.00% Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220912162210.3626215-8-mark.rutland@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
Today, callback alternatives are special-cased within __apply_alternatives(), and are applied alongside patching for system capabilities as ARM64_NCAPS is not part of the boot_capabilities feature mask. This special-casing is less than ideal. Giving special meaning to ARM64_NCAPS for this requires some structures and loops to use ARM64_NCAPS + 1 (AKA ARM64_NPATCHABLE), while others use ARM64_NCAPS. It's also not immediately clear callback alternatives are only applied when applying alternatives for system-wide features. To make this a bit clearer, changes the way that callback alternatives are identified to remove the special-casing of ARM64_NCAPS, and to allow callback alternatives to be associated with a cpucap as with all other alternatives. New cpucaps, ARM64_ALWAYS_BOOT and ARM64_ALWAYS_SYSTEM are added which are always detected alongside boot cpu capabilities and system capabilities respectively. All existing callback alternatives are made to use ARM64_ALWAYS_SYSTEM, and so will be patched at the same point during the boot flow as before. Subsequent patches will make more use of these new cpucaps. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220912162210.3626215-7-mark.rutland@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
We never alter a struct alt_region after creation, and we open-code the bounds of the kernel alternatives region in two functions. The duplication is a bit unfortunate for clarity (and in future we're likely to have more functions altering alternative regions), and to avoid accidents it would be good to make the structure const. This patch adds a shared struct `kernel_alternatives` alt_region for the main kernel image, and marks the alt_regions as const to prevent unintentional modification. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220912162210.3626215-6-mark.rutland@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
Printing in the middle of __apply_alternatives() is potentially unsafe and not all that helpful given these days we practically always patch *something*. Hoist the print out of __apply_alternatives(), and add separate prints to __apply_alternatives() and apply_alternatives_all(), which will make it easier to spot if either patching call goes wrong. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220912162210.3626215-5-mark.rutland@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
The spectre patching callbacks use cpus_have_final_cap(), and subsequent patches will make it invalid to call cpus_have_final_cap() before alternatives patching has completed. In preparation for said change, this patch modifies the spectre patching callbacks use cpus_have_cap(). This is not subject to patching, and will dynamically check the cpu_hwcaps array, which is functionally equivalent to the existing behaviour. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220912162210.3626215-4-mark.rutland@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
The KVM patching callbacks use cpus_have_final_cap() internally within has_vhe(), and subsequent patches will make it invalid to call cpus_have_final_cap() before alternatives patching has completed, and will mean that cpus_have_const_cap() will always fall back to dynamic checks prior to alternatives patching. In preparation for said change, this patch modifies the KVM patching callbacks to use cpus_have_cap() directly. This is not subject to patching, and will dynamically check the cpu_hwcaps array, which is functionally equivalent to the existing behaviour. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220912162210.3626215-3-mark.rutland@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Mark Rutland authored
Currently it isn't safe to use cpus_have_cap() from noinstr code as test_bit() is explicitly instrumented, and were cpus_have_cap() placed out-of-line, cpus_have_cap() itself could be instrumented. Make cpus_have_cap() noinstr safe by marking it __always_inline and using arch_test_bit(). Aside from the prevention of instrumentation, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220912162210.3626215-2-mark.rutland@arm.comSigned-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
- 28 Aug, 2022 25 commits
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds authored
Pull more hotfixes from Andrew Morton: "Seventeen hotfixes. Mostly memory management things. Ten patches are cc:stable, addressing pre-6.0 issues" * tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: .mailmap: update Luca Ceresoli's e-mail address mm/mprotect: only reference swap pfn page if type match squashfs: don't call kmalloc in decompressors mm/damon/dbgfs: avoid duplicate context directory creation mailmap: update email address for Colin King asm-generic: sections: refactor memory_intersects bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown Revert "memcg: cleanup racy sum avoidance code" mm/zsmalloc: do not attempt to free IS_ERR handle binder_alloc: add missing mmap_lock calls when using the VMA mm: re-allow pinning of zero pfns (again) vmcoreinfo: add kallsyms_num_syms symbol mailmap: update Guilherme G. Piccoli's email addresses writeback: avoid use-after-free after removing device shmem: update folio if shmem_replace_page() updates the page mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
-
Linus Torvalds authored
Pull bitmap fixes from Yury Norov: "Fix the reported issues, and implements the suggested improvements, for the version of the cpumask tests [1] that was merged with commit c41e8866 ("lib/test: introduce cpumask KUnit test suite"). These changes include fixes for the tests, and better alignment with the KUnit style guidelines" * tag 'bitmap-6.0-rc3' of github.com:/norov/linux: lib/cpumask_kunit: add tests file to MAINTAINERS lib/cpumask_kunit: log mask contents lib/test_cpumask: follow KUnit style guidelines lib/test_cpumask: fix cpu_possible_mask last test lib/test_cpumask: drop cpu_possible_mask full test
-
Luca Ceresoli authored
My Bootlin address is preferred from now on. Link: https://lkml.kernel.org/r/20220826130515.3011951-1-luca.ceresoli@bootlin.comSigned-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Atish Patra <atishp@atishpatra.org> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]: kernel BUG at include/linux/swapops.h:117! CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S O L 6.0.0-dbg-DEV #2 RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0 Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6 c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b 48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48 RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282 RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000 RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000 R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738 R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a FS: 00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> change_pte_range+0x36e/0x880 change_p4d_range+0x2e8/0x670 change_protection_range+0x14e/0x2c0 mprotect_fixup+0x1ee/0x330 do_mprotect_pkey+0x34c/0x440 __x64_sys_mprotect+0x1d/0x30 It triggers because pfn_swap_entry_to_page() could be called upon e.g. a genuine swap entry. Fix it by only calling it when it's a write migration entry where the page* is used. [1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VTS+k1w@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220823221138.45602-1-peterx@redhat.com Fixes: 6c287605 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Yu Zhao <yuzhao@google.com> Tested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Phillip Lougher authored
The decompressors may be called while in an atomic section. So move the kmalloc() out of this path, and into the "page actor" init function. This fixes a regression introduced by commit f268eedd ("squashfs: extend "page actor" to handle missing pages") Link: https://lkml.kernel.org/r/20220822215430.15933-1-phillip@squashfs.org.uk Fixes: f268eedd ("squashfs: extend "page actor" to handle missing pages") Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Badari Pulavarty authored
When user tries to create a DAMON context via the DAMON debugfs interface with a name of an already existing context, the context directory creation fails but a new context is created and added in the internal data structure, due to absence of the directory creation success check. As a result, memory could leak and DAMON cannot be turned on. An example test case is as below: # cd /sys/kernel/debug/damon/ # echo "off" > monitor_on # echo paddr > target_ids # echo "abc" > mk_context # echo "abc" > mk_context # echo $$ > abc/target_ids # echo "on" > monitor_on <<< fails Return value of 'debugfs_create_dir()' is expected to be ignored in general, but this is an exceptional case as DAMON feature is depending on the debugfs functionality and it has the potential duplicate name issue. This commit therefore fixes the issue by checking the directory creation failure and immediately return the error in the case. Link: https://lkml.kernel.org/r/20220821180853.2400-1-sj@kernel.org Fixes: 75c1c2b5 ("mm/damon/dbgfs: support multiple contexts") Signed-off-by: Badari Pulavarty <badari.pulavarty@intel.com> Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [ 5.15.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Colin Ian King authored
Colin King is working on kernel janitorial fixes in his spare time and using his Intel email is confusing. Use his gmail account as the default email address. Link: https://lkml.kernel.org/r/20220817212753.101109-1-colin.i.king@gmail.comSigned-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Quanyang Wang authored
There are two problems with the current code of memory_intersects: First, it doesn't check whether the region (begin, end) falls inside the region (virt, vend), that is (virt < begin && vend > end). The second problem is if vend is equal to begin, it will return true but this is wrong since vend (virt + size) is not the last address of the memory region but (virt + size -1) is. The wrong determination will trigger the misreporting when the function check_for_illegal_area calls memory_intersects to check if the dma region intersects with stext region. The misreporting is as below (stext is at 0x80100000): WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168 DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536] Modules linked in: CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5 Hardware name: Xilinx Zynq Platform unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from __warn+0xb0/0x198 __warn from warn_slowpath_fmt+0x80/0xb4 warn_slowpath_fmt from check_for_illegal_area+0x130/0x168 check_for_illegal_area from debug_dma_map_sg+0x94/0x368 debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128 __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24 dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4 usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214 usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118 usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70 usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360 usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440 usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238 usb_stor_control_thread from kthread+0xf8/0x104 kthread from ret_from_fork+0x14/0x2c Refactor memory_intersects to fix the two problems above. Before the 1d7db834 ("dma-debug: use memory_intersects() directly"), memory_intersects is called only by printk_late_init: printk_late_init -> init_section_intersects ->memory_intersects. There were few places where memory_intersects was called. When commit 1d7db834 ("dma-debug: use memory_intersects() directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA subsystem uses it to check for an illegal area and the calltrace above is triggered. [akpm@linux-foundation.org: fix nearby comment typo] Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com Fixes: 97955936 ("asm/sections: add helpers to check for section data") Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thierry Reding <treding@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liu Shixin authored
The vmemmap pages is marked by kmemleak when allocated from memblock. Remove it from kmemleak when freeing the page. Otherwise, when we reuse the page, kmemleak may report such an error and then stop working. kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing) kmemleak: Kernel memory leak detector disabled kmemleak: Object 0xffff98fb6be00000 (size 335544320): kmemleak: comm "swapper", pid 0, jiffies 4294892296 kmemleak: min_count = 0 kmemleak: count = 0 kmemleak: flags = 0x1 kmemleak: checksum = 0 kmemleak: backtrace: Link: https://lkml.kernel.org/r/20220819094005.2928241-1-liushixin2@huawei.com Fixes: f41f2ed4 (mm: hugetlb: free the vmemmap pages associated with each HugeTLB page) Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Heming Zhao authored
After commit 0737e01d ("ocfs2: ocfs2_mount_volume does cleanup job before return error"), any procedure after ocfs2_dlm_init() fails will trigger crash when calling ocfs2_dlm_shutdown(). ie: On local mount mode, no dlm resource is initialized. If ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling will call ocfs2_dlm_shutdown(), then does dlm resource cleanup job, which will trigger kernel crash. This solution should bypass uninitialized resources in ocfs2_dlm_shutdown(). Link: https://lkml.kernel.org/r/20220815085754.20417-1-heming.zhao@suse.com Fixes: 0737e01d ("ocfs2: ocfs2_mount_volume does cleanup job before return error") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Shakeel Butt authored
This reverts commit 96e51ccf. Recently we started running the kernel with rstat infrastructure on production traffic and begin to see negative memcg stats values. Particularly the 'sock' stat is the one which we observed having negative value. $ grep "sock " /mnt/memory/job/memory.stat sock 253952 total_sock 18446744073708724224 Re-run after couple of seconds $ grep "sock " /mnt/memory/job/memory.stat sock 253952 total_sock 53248 For now we are only seeing this issue on large machines (256 CPUs) and only with 'sock' stat. I think the networking stack increase the stat on one cpu and decrease it on another cpu much more often. So, this negative sock is due to rstat flusher flushing the stats on the CPU that has seen the decrement of sock but missed the CPU that has increments. A typical race condition. For easy stable backport, revert is the most simple solution. For long term solution, I am thinking of two directions. First is just reduce the race window by optimizing the rstat flusher. Second is if the reader sees a negative stat value, force flush and restart the stat collection. Basically retry but limited. Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com Fixes: 96e51ccf ("memcg: cleanup racy sum avoidance code") Signed-off-by: Shakeel Butt <shakeelb@google.com> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: <stable@vger.kernel.org> [5.15] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Sergey Senozhatsky authored
zsmalloc() now returns ERR_PTR values as handles, which zram accidentally can pass to zs_free(). Another bad scenario is when zcomp_compress() fails - handle has default -ENOMEM value, and zs_free() will try to free that "pointer value". Add the missing check and make sure that zs_free() bails out when ERR_PTR() is passed to it. Link: https://lkml.kernel.org/r/20220816050906.2583956-1-senozhatsky@chromium.org Fixes: c7e6f17b ("zsmalloc: zs_malloc: return ERR_PTR on failure") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Liam Howlett authored
Take the mmap_read_lock() when using the VMA in binder_alloc_print_pages() and when checking for a VMA in binder_alloc_new_buf_locked(). It is worth noting binder_alloc_new_buf_locked() drops the VMA read lock after it verifies a VMA exists, but may be taken again deeper in the call stack, if necessary. Link: https://lkml.kernel.org/r/20220810160209.1630707-1-Liam.Howlett@oracle.com Fixes: a43cfc87 (android: binder: stop saving a pointer to the VMA) Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Reported-by: <syzbot+a7b60a176ec13cafb793@syzkaller.appspotmail.com> Acked-by: Carlos Llamas <cmllamas@google.com> Tested-by: Ondrej Mosnacek <omosnace@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hridya Valsaraju <hridya@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Martijn Coenen <maco@android.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Todd Kjos <tkjos@android.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Arve Hjønnevåg" <arve@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Alex Williamson authored
The below referenced commit makes the same error as 1c563432 ("mm: fix is_pinnable_page against a cma page"), re-interpreting the logic to exclude pinning of the zero page, which breaks device assignment with vfio. To avoid further subtle mistakes, split the logic into discrete tests. [akpm@linux-foundation.org: simplify comment, per John] Link: https://lkml.kernel.org/r/166015037385.760108.16881097713975517242.stgit@omen Link: https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen Fixes: f25cbb7a ("mm: add zone device coherent type memory support") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Tested-by: Slawomir Laba <slawomirx.laba@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Alex Sierra <alex.sierra@amd.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Stephen Brennan authored
The rest of the kallsyms symbols are useless without knowing the number of symbols in the table. In an earlier patch, I somehow dropped the kallsyms_num_syms symbol, so add it back in. Link: https://lkml.kernel.org/r/20220808205410.18590-1-stephen.s.brennan@oracle.com Fixes: 5fd8fea9 ("vmcoreinfo: include kallsyms symbols") Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Guilherme G. Piccoli authored
Both @canonical and @ibm email addresses are invalid now; use my personal address instead. Link: https://lkml.kernel.org/r/20220804202207.439427-1-gpiccoli@igalia.comSigned-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Khazhismel Kumykov authored
When a disk is removed, bdi_unregister gets called to stop further writeback and wait for associated delayed work to complete. However, wb_inode_writeback_end() may schedule bandwidth estimation dwork after this has completed, which can result in the timer attempting to access the just freed bdi_writeback. Fix this by checking if the bdi_writeback is alive, similar to when scheduling writeback work. Since this requires wb->work_lock, and wb_inode_writeback_end() may get called from interrupt, switch wb->work_lock to an irqsafe lock. Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com Fixes: 45a2966f ("writeback: fix bandwidth estimate for spiky workload") Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Michael Stapelberg <stapelberg+linux@google.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
If we allocate a new page, we need to make sure that our folio matches that new page. If we do end up in this code path, we store the wrong page in the shmem inode's page cache, and I would rather imagine that data corruption ensues. This will be solved by changing shmem_replace_page() to shmem_replace_folio(), but this is the minimal fix. Link: https://lkml.kernel.org/r/20220730042518.1264767-1-willy@infradead.org Fixes: da08e9b7 ("mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
In MCOPY_ATOMIC_CONTINUE case with a non-shared VMA, pages in the page cache are installed in the ptes. But hugepage_add_new_anon_rmap is called for them mistakenly because they're not vm_shared. This will corrupt the page->mapping used by page cache code. Link: https://lkml.kernel.org/r/20220712130542.18836-1-linmiaohe@huawei.com Fixes: f6191471 ("userfaultfd: add UFFDIO_CONTINUE ioctl") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fixes from David Sterba: "Fixes: - check that subvolume is writable when changing xattrs from security namespace - fix memory leak in device lookup helper - update generation of hole file extent item when merging holes - fix space cache corruption and potential double allocations; this is a rare bug but can be serious once it happens, stable backports and analysis tool will be provided - fix error handling when deleting root references - fix crash due to assert when attempting to cancel suspended device replace, add message what to do if mount fails due to missing replace item Regressions: - don't merge pages into bio if their page offset is not contiguous - don't allow large NOWAIT direct reads, this could lead to short reads eg. in io_uring" * tag 'for-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add info when mount fails due to stale replace target btrfs: replace: drop assert for suspended replace btrfs: fix silent failure when deleting root reference btrfs: fix space cache corruption and potential double allocations btrfs: don't allow large NOWAIT direct reads btrfs: don't merge pages into bio if their page offset is not contiguous btrfs: update generation of hole file extent item when merging holes btrfs: fix possible memory leak in btrfs_get_dev_args_from_path() btrfs: check if root is readonly while setting security xattr
-
git://git.samba.org/sfrench/cifs-2.6Linus Torvalds authored
Pull cfis fixes from Steve French: - two locking fixes (zero range, punch hole) - DFS 9 fix (padding), affecting some servers - three minor cleanup changes * tag '6.0-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Add helper function to check smb1+ server cifs: Use help macro to get the mid header size cifs: Use help macro to get the header preamble size cifs: skip extra NULL byte in filenames smb3: missing inode locks in punch hole smb3: missing inode locks in zero range
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 perf fixes from Ingo Molnar: "Misc fixes: an Arch-LBR fix, a PEBS enumeration fix, an Intel DS fix, PEBS constraints fix on Alder Lake CPUs and an Intel uncore PMU fix" * tag 'perf-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU perf/x86/intel: Fix pebs event constraints for ADL perf/x86/intel/ds: Fix precise store latency handling perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the Baseline perf/x86/lbr: Enable the branch type for the Arch LBR by default
-
Linus Torvalds authored
Merge tag 'perf-tools-fixes-for-v6.0-2022-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fixup setup of weak groups when using 'perf stat --repeat', add a 'perf test' for it. - Fix memory leaks in 'perf sched record' detected with -fsanitize=address. - Fix build when PYTHON_CONFIG is user supplied. - Capitalize topdown metrics' names in 'perf stat', so that the output, sometimes parsed, matches the Intel SDM docs. - Make sure the documentation for the save_type filter about Intel systems with Arch LBR support (12th-Gen+ client or 4th-Gen Xeon+ server) reflects recent related kernel changes. - Fix 'perf record' man page formatting of description of support to hybrid systems. - Update arm64´s KVM header from the kernel sources. * tag 'perf-tools-fixes-for-v6.0-2022-08-27' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf stat: Capitalize topdown metrics' names perf docs: Update the documentation for the save_type filter perf sched: Fix memory leaks in __cmd_record detected with -fsanitize=address perf record: Fix manpage formatting of description of support to hybrid systems perf test: Stat test for repeat with a weak group perf stat: Clear evsel->reset_group for each stat run tools kvm headers arm64: Update KVM header from the kernel sources perf python: Fix build when PYTHON_CONFIG is user supplied
-
- 27 Aug, 2022 7 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull thermal control fixes from Rafael Wysocki: "Fix two issues introduced recently and one driver problem leading to a NULL pointer dereference in some cases. Specifics: - Add missing EXPORT_SYMBOL_GPL in the thermal core and add back the required 'trips' property to the thermal zone DT bindings (Daniel Lezcano) - Prevent the int340x_thermal driver from crashing when a package with a buffer of 0 length is returned by an ACPI control method evaluated by it (Lee, Chun-Yi)" * tag 'thermal-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTR dt-bindings: thermal: Fix missing required property thermal/core: Add missing EXPORT_SYMBOL_GPL
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management fix from Rafael Wysocki: "Make __resolve_freq() check the presence of the frequency table instead of checking whether or not the ->target_index() callback is implemented by the driver, because that need not be the case when __resolve_freq() is used (Lukasz Luba)" * tag 'pm-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: check only freq_table in __resolve_freq()
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull ACPI fixes from Rafael Wysocki: "These fix issues introduced by recent changes related to the handling of ACPI device properties and a coding mistake in the exit path of the ACPI processor driver. Specifics: - Prevent acpi_thermal_cpufreq_exit() from attempting to remove the same frequency QoS request multiple times (Riwen Lu) - Fix type detection for integer ACPI device properties (Stefan Binding) - Avoid emitting false-positive warnings when processing ACPI device properties and drop the useless default case from the acpi_copy_property_array_uint() macro (Sakari Ailus)" * tag 'acpi-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: property: Remove default association from integer maximum values ACPI: property: Ignore already existing data node tags ACPI: property: Fix type detection of unified integer reading functions ACPI: processor: Remove freq Qos request for all CPUs
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds authored
Pull s390 fixes from Vasily Gorbik: - Fix double free of guarded storage and runtime instrumentation control blocks on fork() failure - Fix triggering write fault when VMA does not allow VM_WRITE * tag 's390-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: do not trigger write fault when vma does not allow VM_WRITE s390: fix double free of GS and RI CBs on fork() failure
-
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds authored
Pull xen fixes from Juergen Gross: - two minor cleanups - a fix of the xen/privcmd driver avoiding a possible NULL dereference in an error case * tag 'for-linus-6.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/privcmd: fix error exit of privcmd_ioctl_dm_op() xen: move from strlcpy with unused retval to strscpy xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY
-
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/auditLinus Torvalds authored
Pull audit fix from Paul Moore: "Another small audit patch, this time to fix a bug where the return codes were not properly set before the audit filters were run, potentially resulting in missed audit records" * tag 'audit-pr-20220826' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: move audit_return_fixup before the filters
-
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdevLinus Torvalds authored
Pull fbdev fixes and updates from Helge Deller: "Mostly just small patches, with the exception of the bigger indenting cleanups in the sisfb and radeonfb drivers. Two patches should be mentioned though: A fix-up for fbdev if the screen resize fails (by Shigeru Yoshida), and a potential divide by zero fix in fb_pm2fb (by Letu Ren). Summary: Major fixes: - Revert the changes for fbcon console when vc_resize() fails [Shigeru Yoshida] - Avoid a potential divide by zero error in fb_pm2fb [Letu Ren] Minor fixes: - Add missing pci_disable_device() in chipsfb_pci_init() [Yang Yingliang] - Fix tests for platform_get_irq() failure in omapfb [Yu Zhe] - Destroy mutex on freeing struct fb_info in fbsysfs [Shigeru Yoshida] Cleanups: - Move fbdev drivers from strlcpy to strscpy [Wolfram Sang] - Indenting fixes, comment fixes, ... [Jiapeng Chong & Jilin Yuan]" * tag 'fbdev-for-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: fbdev: fbcon: Properly revert changes when vc_resize() failed fbdev: Move fbdev drivers from strlcpy to strscpy fbdev: omap: Remove unnecessary print function dev_err() fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init() fbdev: fbcon: Destroy mutex on freeing struct fb_info fbdev: radeon: Clean up some inconsistent indenting fbdev: sisfb: Clean up some inconsistent indenting fbdev: fb_pm2fb: Avoid potential divide by zero error fbdev: ssd1307fb: Fix repeated words in comments fbdev: omapfb: Fix tests for platform_get_irq() failure
-