- 04 Jul, 2024 40 commits
-
-
Ilya Leoshkevich authored
All other sanitizers are disabled for boot as well. While at it, add a comment explaining why we need this.
Link: https://lkml.kernel.org/r/20240621113706.315500-23-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
On s390 the virtual address 0 is valid (the current CPU's lowcore is mapped there), so KMSAN should not complain about it. Disable the respective check on s390. There doesn't seem to be a Kconfig option to describe this situation, so explicitly check for s390.
Link: https://lkml.kernel.org/r/20240621113706.315500-22-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
The constraints of the DFLTCC inline assembly are not precise: they do not communicate the size of the output buffers to the compiler, so it cannot automatically instrument them. Add manual kmsan_unpoison_memory() calls for the output buffers. The logic is the same as in [1].
[1] https://github.com/zlib-ng/zlib-ng/commit/1f5ddcc009ac3511e99fc88736a9e1a6381168c5
Link: https://lkml.kernel.org/r/20240621113706.315500-21-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reported-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
KMSAN warns about check_canary() accessing the canary. The reason is that, even though set_canary() is properly instrumented and sets shadow, slub explicitly poisons the canary's address range afterwards. Unpoisoning the canary is not the right thing to do: only check_canary() is supposed to ever touch it. Instead, disable KMSAN checks around canary read accesses.
Link: https://lkml.kernel.org/r/20240621113706.315500-20-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
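A minimal sketch of the pattern this commit describes, assuming the kmsan_disable_current()/kmsan_enable_current() API introduced later in this series (illustrative, not the actual slub diff):

    /* Read the canary without tripping KMSAN; only check_canary()
     * is supposed to ever touch this deliberately poisoned range. */
    static unsigned long read_canary(const void *canary)
    {
            unsigned long value;

            kmsan_disable_current();
            value = *(const unsigned long *)canary;
            kmsan_enable_current();
            return value;
    }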
-
Ilya Leoshkevich authored
Even though the KMSAN warnings generated by memchr_inv() are suppressed by metadata_access_enable(), its return value may still be poisoned. The reason is that the last iteration of memchr_inv() returns `*start != value ? start : NULL`, where *start is poisoned. Because of this, somewhat counterintuitively, the shadow value computed by visitSelectInst() is equal to `(uintptr_t)start`. Since the intention behind guarding memchr_inv() with metadata_access_enable() is to touch poisoned metadata without triggering KMSAN, one way to fix this would be to unpoison its return value. However, this approach is too fragile. So simply disable the KMSAN checks in the respective functions.
Link: https://lkml.kernel.org/r/20240621113706.315500-19-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
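A reduced model of the memchr_inv() tail described above (not the kernel code), showing why the returned pointer ends up poisoned:

    /* KMSAN's visitSelectInst() derives the select's shadow from the
     * condition and both operands, so a poisoned *start taints the
     * returned pointer even though the comparison itself is allowed
     * inside metadata_access_enable(). */
    static const u8 *check_tail(const u8 *start, u8 value)
    {
            return *start != value ? start : NULL;
    }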
-
Ilya Leoshkevich authored
Building the kernel with CONFIG_SLUB_DEBUG and CONFIG_KMSAN causes KMSAN to complain about touching redzones in kfree(). Fix by extending the existing KASAN-related metadata_access_enable() and metadata_access_disable() functions to KMSAN.
Link: https://lkml.kernel.org/r/20240621113706.315500-18-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
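A sketch of the extension described above, assuming KMSAN gains KASAN-style per-task enable/disable helpers (the exact slub wrappers may differ):

    static inline void metadata_access_enable(void)
    {
            kasan_disable_current();
            kmsan_disable_current();
    }

    static inline void metadata_access_disable(void)
    {
            kmsan_enable_current();
            kasan_enable_current();
    }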
-
Ilya Leoshkevich authored
KMSAN_WARN_ON() is required for implementing s390-specific KMSAN functions, but right now it's available only to the KMSAN internal functions. Expose it to subsystems through <linux/kmsan.h>.
Link: https://lkml.kernel.org/r/20240621113706.315500-17-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
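A hedged usage sketch; check_lowcore_shadow() and its argument are made up for illustration:

    #include <linux/kmsan.h>

    /* Arch code can now assert KMSAN invariants without reaching
     * into mm/kmsan/ internals. */
    static void check_lowcore_shadow(void *shadow)
    {
            KMSAN_WARN_ON(!shadow);
    }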
-
Ilya Leoshkevich authored
x86's alloc_node_data() rounds up node data size to PAGE_SIZE. It's not explained why it's needed, but it's most likely for performance reasons, since the padding bytes are not used anywhere. Some other architectures do it as well, e.g., mips rounds it up to the cache line size. kmsan_init_shadow() initializes metadata for each node data and assumes the x86 rounding, which does not match other architectures. This may cause the range end to overshoot the end of available memory, in turn causing virt_to_page_or_null() in kmsan_init_alloc_meta_for_range() to return NULL, which leads to kernel panic shortly after. Since the padding bytes are not used, drop the rounding.
Link: https://lkml.kernel.org/r/20240621113706.315500-16-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
Improve the readability by replacing the custom aligning logic with ALIGN_DOWN(). Unlike other places where a similar sequence is used, there is no size parameter that needs to be adjusted, so the standard macro fits.
Link: https://lkml.kernel.org/r/20240621113706.315500-15-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
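The shape of the cleanup, as a sketch with illustrative names (the actual expression in the KMSAN code differs):

    #include <linux/align.h>

    /* was: addr & ~(PAGE_SIZE - 1) */
    static unsigned long page_base(unsigned long addr)
    {
            return ALIGN_DOWN(addr, PAGE_SIZE);
    }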
-
Ilya Leoshkevich authored
Avoid false KMSAN negatives with SLUB_DEBUG by allowing kmsan_slab_free() to poison the freed memory, and by preventing init_object() from unpoisoning new allocations by using __memset(). There are two alternatives to this approach. First, init_object() can be marked with __no_sanitize_memory. This annotation should be used with great care, because it drops all instrumentation from the function, and any shadow writes will be lost. Even though this is not a concern with the current init_object() implementation, this may change in the future. Second, kmsan_poison_memory() calls may be added after memset() calls. The downside is that init_object() is called from free_debug_processing(), in which case poisoning will erase the distinction between simply uninitialized memory and UAF.
Link: https://lkml.kernel.org/r/20240621113706.315500-14-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
Add a wrapper for memset() that prevents unpoisoning. This is useful for filling memory allocator redzones.
Link: https://lkml.kernel.org/r/20240621113706.315500-13-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
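A sketch of the intended use, assuming the non-instrumented __memset() variant and SLUB_RED_INACTIVE as the fill byte; the instrumented memset() would unpoison the range, while __memset() leaves the KMSAN shadow untouched:

    #include <linux/poison.h>
    #include <linux/string.h>

    /* Fill a redzone without marking it initialized for KMSAN. */
    static void fill_redzone(void *redzone, size_t size)
    {
            __memset(redzone, SLUB_RED_INACTIVE, size);
    }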
-
Ilya Leoshkevich authored
Like for KASAN, it's useful to temporarily disable KMSAN checks around, e.g., redzone accesses. Introduce kmsan_disable_current() and kmsan_enable_current(), which are similar to their KASAN counterparts. Make them reentrant in order to handle memory allocations in interrupt context. Repurpose the allow_reporting field for this.
Link: https://lkml.kernel.org/r/20240621113706.315500-12-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
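A minimal sketch of the reentrant counter described above; the depth field name is illustrative (the commit repurposes the former allow_reporting field):

    void kmsan_disable_current(void)
    {
            current->kmsan_ctx.depth++;
    }

    void kmsan_enable_current(void)
    {
            KMSAN_WARN_ON(!current->kmsan_ctx.depth);
            current->kmsan_ctx.depth--;
    }

Because the calls nest, an allocation taken in interrupt context can disable and re-enable checks without clobbering the state of the interrupted task.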
-
Ilya Leoshkevich authored
When building the kmsan test as a module, modpost fails with the following error message:

  ERROR: modpost: "panic_on_kmsan" [mm/kmsan/kmsan_test.ko] undefined!

Export panic_on_kmsan in order to improve KMSAN usability for modules.
Link: https://lkml.kernel.org/r/20240621113706.315500-11-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
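The fix boils down to an export next to the variable's definition; the declaration shape and the choice of the _GPL variant here are assumptions, not the commit's verbatim diff:

    /* mm/kmsan/report.c (sketch) */
    bool panic_on_kmsan;
    EXPORT_SYMBOL_GPL(panic_on_kmsan);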
-
Ilya Leoshkevich authored
Each s390 CPU has lowcore pages associated with it. Each CPU sees its own lowcore at virtual address 0 through a hardware mechanism called prefixing. Additionally, all lowcores are mapped to non-0 virtual addresses stored in the lowcore_ptr[] array. When lowcore is accessed through virtual address 0, one needs to resolve metadata for lowcore_ptr[raw_smp_processor_id()]. Expose kmsan_get_metadata() to make it possible to do this from the arch code.
Link: https://lkml.kernel.org/r/20240621113706.315500-10-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
Replace the x86-specific asm/pgtable_64_types.h #include with the linux/pgtable.h one, which all architectures have. While at it, sort the headers alphabetically for the sake of consistency with other KMSAN code.
Link: https://lkml.kernel.org/r/20240621113706.315500-9-iii@linux.ibm.com Fixes: f80be457 ("kmsan: add KMSAN runtime core") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Suggested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
The value assigned to prot is immediately overwritten on the next line with PAGE_KERNEL. The right-hand side of the assignment has no side effects.
Link: https://lkml.kernel.org/r/20240621113706.315500-8-iii@linux.ibm.com Fixes: b073d7f8 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Suggested-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
Comparing pointers with TASK_SIZE does not make sense when kernel and userspace overlap. Assume that we are handling user memory access in this case.
Link: https://lkml.kernel.org/r/20240621113706.315500-7-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reported-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
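A hedged sketch of the decision described above; the overlap predicate is hypothetical and stands in for whatever mechanism the patch actually uses:

    /* Decide whether an access targets user memory. When kernel and
     * user address spaces overlap (e.g. s390), the TASK_SIZE
     * comparison is meaningless, so assume user memory. */
    static bool targets_user_memory(uintptr_t addr)
    {
            if (kernel_and_user_address_spaces_overlap())   /* hypothetical */
                    return true;
            return addr < TASK_SIZE;
    }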
-
Ilya Leoshkevich authored
Comparing pointers with TASK_SIZE does not make sense when kernel and userspace overlap. Skip the comparison when this is the case.
Link: https://lkml.kernel.org/r/20240621113706.315500-6-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
The inline assembly block in s390's chsc() stores that much.
Link: https://lkml.kernel.org/r/20240621113706.315500-5-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
KMSAN relies on memblock returning all available pages to it (see kmsan_memblock_free_pages()). It partitions these pages into 3 categories: pages available to the buddy allocator, shadow pages and origin pages. This partitioning is static. If new pages appear after kmsan_init_runtime(), it is considered an error. DEFERRED_STRUCT_PAGE_INIT causes this, so mark it as incompatible with KMSAN.
Link: https://lkml.kernel.org/r/20240621113706.315500-4-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Ilya Leoshkevich authored
It's useful to have both tests and kmsan.panic=1 during development, but right now the warnings that the tests cause lead to kernel panics. Temporarily set kmsan.panic=0 for the duration of the KMSAN testing.
Link: https://lkml.kernel.org/r/20240621113706.315500-3-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
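A sketch of the save/restore around the KUnit suite, with illustrative names:

    static bool panic_on_kmsan_saved;

    static int kmsan_suite_init(struct kunit_suite *suite)
    {
            panic_on_kmsan_saved = panic_on_kmsan;
            panic_on_kmsan = false; /* keep test-triggered warnings non-fatal */
            return 0;
    }

    static void kmsan_suite_exit(struct kunit_suite *suite)
    {
            panic_on_kmsan = panic_on_kmsan_saved;
    }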
-
Ilya Leoshkevich authored
Patch series "kmsan: Enable on s390", v7. Architectures use assembly code to initialize ftrace_regs and call ftrace_ops_list_func(). Therefore, from the KMSAN's point of view, ftrace_regs is poisoned on ftrace_ops_list_func entry(). This causes KMSAN warnings when running the ftrace testsuite. Fix by trusting the architecture-specific assembly code and always unpoisoning ftrace_regs in ftrace_ops_list_func. The issue was not encountered on x86_64 so far only by accident: assembly-allocated ftrace_regs was overlapping a stale partially unpoisoned stack frame. Poisoning stack frames before returns [1] makes the issue appear on x86_64 as well. [1] https://github.com/iii-i/llvm-project/commits/msan-poison-allocas-before-returning-2024-06-12/ Link: https://lkml.kernel.org/r/20240621113706.315500-1-iii@linux.ibm.com Link: https://lkml.kernel.org/r/20240621113706.315500-2-iii@linux.ibm.comSigned-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
The DAMON bi-weekly community meetup series has continued since 2022-08-15 for community members who prefer synchronous chat over asynchronous mails. Recently I got some feedback about the series from a few people. They told me the series is helpful for understanding the project and participating in the development, but that its visibility could be better. Based on that, I started sending a meeting reminder for every occurrence. For people who don't subscribe to the mailing list, however, an announcement in the official documentation could be helpful. Document the series on the DAMON maintainer's profile for that purpose.
Link: https://lkml.kernel.org/r/20240621163626.74815-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Patch series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series", v2. There is a mailing tool that developed and maintained by DAMON maintainer aiming to support DAMON community. Also there are DAMON community meetup series. Both are known to have rooms of improvements in terms of their visibility. Document those on the maintainer's profile document. This patch (of 2): Since DAMON was merged into mainline, I periodically received some questions around DAMON's mailing lists based workflow. The workflow is not different from the normal ones that well documented, but it is also true that it is not always easy and familiar for everyone. I personally overcame it by developing and using a simple tool, named HacKerMaiL (hkml)[1]. Based on my experience, I believe it is matured enough to be used for simple workflows like that of DAMON. Actually some DAMON contributors and Linux kernel developers other than myself told me they are using the tool. As DAMON maintainer, I also believe helping new DAMON community members onboarding to the worklow is one of the most important parts of my responsibilities. For the reason, the tool is announced[2] to support DAMON community. To further increasing the visibility of the fact, document the tool and the support plan on DAMON maintainer's profile. [1] https://github.com/damonitor/hackermail [2] https://github.com/damonitor/hackermail/commit/3909dad91301 Link: https://lkml.kernel.org/r/20240621163626.74815-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240621163626.74815-2-sj@kernel.orgSigned-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
David Hildenbrand authored
KCSAN complains about possible data races: while we check for a page_type -- for example for sanity checks -- we might concurrently modify the mapcount that overlays page_type. Let's use READ_ONCE to avoid load tearing (shouldn't make a difference) and to make KCSAN happy. Likely, we might also want to use WRITE_ONCE for the writer side of page_type, if KCSAN ever complains about that. But we'll not mess with that for now. Note: nothing should really be broken besides wrong KCSAN complaints. The sanity check that triggers this was added in commit 68f03208 ("mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()"). Even before that, similar races were likely possible, ever since we added page_type in commit 6e292b9b ("mm: split page_type out from _mapcount").
Link: https://lkml.kernel.org/r/20240531125616.2850153-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202405281431.c46a3be9-lkp@intel.com Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
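The flavor of the conversion, as a hedged sketch (the real patch touches the page_type helpers in page-flags.h; names here are illustrative):

    /* Read page_type once: a concurrent mapcount update overlays the
     * same word, and a torn load would confuse the sanity checks. */
    static inline bool page_has_type(const struct page *page)
    {
            return (int)READ_ONCE(page->page_type) < PAGE_MAPCOUNT_RESERVE;
    }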
-
Kefeng Wang authored
Since the callers are converted to use nr_pages naming, use it inside too.
Link: https://lkml.kernel.org/r/20240618091242.2140164-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Use nr_pages instead of pages_per_huge_page and move the address alignment from copy_user_large_folio() into the callers, since it is only needed when we don't know which address will be accessed.
Link: https://lkml.kernel.org/r/20240618091242.2140164-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Directly use folio in struct copy_subpage_arg.
Link: https://lkml.kernel.org/r/20240618091242.2140164-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Patch series "mm: improve clear and copy user folio", v2. Some folio conversions. An improvement is to move address alignment into the caller as it is only needed if we don't know which address will be accessed when clearing/copying user folios. This patch (of 4): Replace clear_huge_page() with folio_zero_user(), and take a folio instead of a page. Directly get number of pages by folio_nr_pages() to remove pages_per_huge_page argument, furthermore, move the address alignment from folio_zero_user() to the callers since the alignment is only needed when we don't know which address will be accessed. Link: https://lkml.kernel.org/r/20240618091242.2140164-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20240618091242.2140164-2-wangkefeng.wang@huawei.comSigned-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
For a page with order O, we check its order-(O + 1) buddy. If that buddy is free, we would like to put the page at the tail and expect it to be merged into a page with order (O + 2). Reword the comment to reflect this.
Link: https://lkml.kernel.org/r/20240619010612.20740-4-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
The GFP flag used to choose the zonelist is __GFP_THISNODE. Let's change it to exactly what it should be.
Link: https://lkml.kernel.org/r/20240619010612.20740-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
The current check on MAX_ZONELISTS is wrapped in CONFIG_DEBUG_MEMORY_INIT and thus may not always be active. Let's move it to a more general place.
Link: https://lkml.kernel.org/r/20240619010612.20740-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
Function subsection_map_init() is only used in free_area_init(), in the loop of for_each_mem_pfn_range(). In each iteration of for_each_mem_pfn_range() we are sure that start_pfn < end_pfn, so nr_pages cannot be 0 and the check can be removed.
Link: https://lkml.kernel.org/r/20240619010612.20740-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Jiaqi Yan authored
Logs from memory_failure and other memory-failure.c code follow the format:

  "Memory failure: 0x{pfn}: ${lower_case_message}"

Convert the logs in unpoison_memory to follow a similar format:

  "Unpoison: 0x${pfn}: ${lower_case_message}"

For example (from a local test):

  [ 1331.938397] Unpoison: 0x144bc8: page was already unpoisoned

No functional change in this commit.
Link: https://lkml.kernel.org/r/20240619063355.171313-1-jiaqiyan@google.com Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
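A sketch of a converted call site; the helper is hypothetical, but pr_info() and the hex pfn match the memory_failure() convention described above:

    /* unpoison_report() is illustrative, not a function in the patch. */
    static void unpoison_report(unsigned long pfn, const char *msg)
    {
            pr_info("Unpoison: %#lx: %s\n", pfn, msg);
    }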
-
Javier Martinez Canillas authored
Currently ppc64 and x86 are mentioned as architectures where a 65536 value is reasonable, but arm64 isn't listed even though it is also a 64-bit architecture. The help text says that for "arm" the value should be no higher than 32768, but it's only talking about 32-bit ARM. Adding arm64 to the above list can make this clearer and avoid confusing users who may think that the 32k limit also applies to 64-bit ARM.
Link: https://lkml.kernel.org/r/20240619083047.114613-1-javierm@redhat.com Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Cc: Brian Masney <bmasney@redhat.com> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Maxime Ripard <mripard@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
JaeJoon Jung authored
Since the return value of mas_wr_store_entry() is not used, the return type can be changed to void.
Link: https://lkml.kernel.org/r/20240614092428.29491-1-rgbi3307@gmail.com Signed-off-by: JaeJoon Jung <rgbi3307@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
The folio_test_anon(folio)==false case has been relocated to folio_add_new_anon_rmap(). Additionally, four other callers consistently pass anonymous folios.

stack 1: remove_migration_pmd -> folio_add_anon_rmap_pmd -> __folio_add_anon_rmap

stack 2: __split_huge_pmd_locked -> folio_add_anon_rmap_ptes -> __folio_add_anon_rmap

stack 3: remove_migration_pmd -> folio_add_anon_rmap_pmd -> __folio_add_anon_rmap (RMAP_LEVEL_PMD)

stack 4: try_to_merge_one_page -> replace_page -> folio_add_anon_rmap_pte -> __folio_add_anon_rmap

__folio_add_anon_rmap() only needs to handle the folio_test_anon(folio)==true case now, so we can remove the !folio_test_anon(folio) path within it.
Link: https://lkml.kernel.org/r/20240617231137.80726-4-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Suggested-by: David Hildenbrand <david@redhat.com> Tested-by: Shuai Yuan <yuanshuai@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
For the !folio_test_anon(folio) case, we can now invoke folio_add_new_anon_rmap() with the rmap flags set to either EXCLUSIVE or non-EXCLUSIVE. This action will suppress the VM_WARN_ON_FOLIO check within __folio_add_anon_rmap() while initiating the process of bringing up mTHP swapin.

    static __always_inline void __folio_add_anon_rmap(struct folio *folio,
                    struct page *page, int nr_pages, struct vm_area_struct *vma,
                    unsigned long address, rmap_t flags, enum rmap_level level)
    {
            ...
            if (unlikely(!folio_test_anon(folio))) {
                    VM_WARN_ON_FOLIO(folio_test_large(folio) &&
                                     level != RMAP_LEVEL_PMD, folio);
            }
            ...
    }

It also improves the code's readability. Currently, all new anonymous folios calling folio_add_anon_rmap_ptes() are order-0. This ensures that new folios cannot be partially exclusive; they are either entirely exclusive or entirely shared.

A useful comment from Hugh's fix:

: Commit "mm: use folio_add_new_anon_rmap() if folio_test_anon(folio)==
: false" has extended folio_add_new_anon_rmap() to use on non-exclusive
: folios, already visible to others in swap cache and on LRU.
:
: That renders its non-atomic __folio_set_swapbacked() unsafe: it risks
: overwriting concurrent atomic operations on folio->flags, losing bits
: added or restoring bits cleared. Since it's only used in this risky way
: when folio_test_locked and !folio_test_anon, many such races are excluded;
: but, for example, isolations by folio_test_clear_lru() are vulnerable, and
: setting or clearing active.
:
: It could just use the atomic folio_set_swapbacked(); but this function
: does try to avoid atomics where it can, so use a branch instead: just
: avoid setting swapbacked when it is already set, that is good enough.
: (Swapbacked is normally stable once set: lazyfree can undo it, but only
: later, when found anon in a page table.)
:
: This fixes a lot of instability under compaction and swapping loads:
: assorted "Bad page"s, VM_BUG_ON_FOLIO()s, apparently even page double
: frees - though I've not worked out what races could lead to the latter.

[akpm@linux-foundation.org: comment fixes, per David and akpm]
[v-songbaohua@oppo.com: lock the folio to avoid race]
Link: https://lkml.kernel.org/r/20240622032002.53033-1-21cnbao@gmail.com
[hughd@google.com: folio_add_new_anon_rmap() careful __folio_set_swapbacked()]
Link: https://lkml.kernel.org/r/f3599b1d-8323-0dc5-e9e0-fdb3cfc3dd5a@google.com
Link: https://lkml.kernel.org/r/20240617231137.80726-3-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Signed-off-by: Hugh Dickins <hughd@google.com> Suggested-by: David Hildenbrand <david@redhat.com> Tested-by: Shuai Yuan <yuanshuai@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
Patch series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()", v2. This patchset is preparatory work for mTHP swapin. folio_add_new_anon_rmap() assumes that new anon rmaps are always exclusive. However, this assumption doesn’t hold true for cases like do_swap_page(), where a new anon might be added to the swapcache and is not necessarily exclusive. The patchset extends the rmap flags to allow folio_add_new_anon_rmap() to handle both exclusive and non-exclusive new anon folios. The do_swap_page() function is updated to use this extended API with rmap flags. Consequently, all new anon folios now consistently use folio_add_new_anon_rmap(). The special case for !folio_test_anon() in __folio_add_anon_rmap() can be safely removed. In conclusion, new anon folios always use folio_add_new_anon_rmap(), regardless of exclusivity. Old anon folios continue to use __folio_add_anon_rmap() via folio_add_anon_rmap_pmd() and folio_add_anon_rmap_ptes(). This patch (of 3): In the case of a swap-in, a new anonymous folio is not necessarily exclusive. This patch updates the rmap flags to allow a new anonymous folio to be treated as either exclusive or non-exclusive. To maintain the existing behavior, we always use EXCLUSIVE as the default setting. [akpm@linux-foundation.org: cleanup and constifications per David and akpm] [v-songbaohua@oppo.com: fix missing doc for flags of folio_add_new_anon_rmap()] Link: https://lkml.kernel.org/r/20240619210641.62542-1-21cnbao@gmail.com [v-songbaohua@oppo.com: enhance doc for extend rmap flags arguments for folio_add_new_anon_rmap] Link: https://lkml.kernel.org/r/20240622030256.43775-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240617231137.80726-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240617231137.80726-2-21cnbao@gmail.comSigned-off-by: Barry Song <v-songbaohua@oppo.com> Suggested-by: David Hildenbrand <david@redhat.com> Tested-by: Shuai Yuan <yuanshuai@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Shubhang Kaushik OS authored
'vmap allocation for size %lu failed: use vmalloc=<size> to increase size'

The above warning is seen when exercising the kernel's allocation of the restricted virtual memory range until exhaustion. This message is misleading because 'vmalloc=' is supported on arm32 and x86 platforms but is not a valid kernel parameter on a number of other platforms (in particular, it's not supported on arm64, alpha, loongarch, arc, csky, hexagon, microblaze, mips, nios2, openrisc, parisc, m68k, powerpc, riscv, sh, um, xtensa, s390, sparc). With the update, the output is modified to include the function parameters along with the start and end of the allowed virtual memory range.

The warning message after the fix, on kernel version 6.10.0-rc1+:

  vmalloc_node_range for size 33619968 failed: Address range restricted between 0xffff800082640000 - 0xffff800084650000

Backtrace with the misleading error message:

  vmap allocation for size 33619968 failed: use vmalloc=<size> to increase size
  insmod: vmalloc error: size 33554432, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 46 PID: 1977 Comm: insmod Tainted: G E 6.10.0-rc1+ #79
  Hardware name: INGRASYS Yushan Server iSystem TEMP-S000141176+10/Yushan Motherboard, BIOS 2.10.20230517 (SCP: xxx) yyyy/mm/dd
  Call trace:
   dump_backtrace+0xa0/0x128
   show_stack+0x20/0x38
   dump_stack_lvl+0x78/0x90
   dump_stack+0x18/0x28
   warn_alloc+0x12c/0x1b8
   __vmalloc_node_range_noprof+0x28c/0x7e0
   custom_init+0xb4/0xfff8 [test_driver]
   do_one_initcall+0x60/0x290
   do_init_module+0x68/0x250
   load_module+0x236c/0x2428
   init_module_from_file+0x8c/0xd8
   __arm64_sys_finit_module+0x1b4/0x388
   invoke_syscall+0x78/0x108
   el0_svc_common.constprop.0+0x48/0xf0
   do_el0_svc+0x24/0x38
   el0_svc+0x3c/0x130
   el0t_64_sync_handler+0x100/0x130
   el0t_64_sync+0x190/0x198

[Shubhang@os.amperecomputing.com: v5]
Link: https://lkml.kernel.org/r/CH2PR01MB5894B0182EA0B28DF2EFB916F5C72@CH2PR01MB5894.prod.exchangelabs.com
Link: https://lkml.kernel.org/r/MN2PR01MB59025CC02D1D29516527A693F5C62@MN2PR01MB5902.prod.exchangelabs.com Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com> Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com> Cc: Christoph Lameter <cl@linux.com> Cc: Guo Ren <guoren@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Xiongwei Song <xiongwei.song@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-