- 17 Jul, 2019 36 commits
-
-
Anshuman Khandual authored
Finish up what commit c2febafc ("mm: convert generic code to 5-level paging") started while levelling up P4D huge mapping support at par with PUD and PMD. A new arch call back arch_ioremap_p4d_supported() is added which just maintains status quo (P4D huge map not supported) on x86, arm64 and powerpc. When HAVE_ARCH_HUGE_VMAP is enabled its just a simple check from the arch about the support, hence runtime effects are minimal. Link: http://lkml.kernel.org/r/1561699231-20991-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Anshuman Khandual authored
Virtual address alignment is essential in ensuring correct clearing for all intermediate level pgtable entries and freeing associated pgtable pages. An unaligned address can end up randomly freeing pgtable page that potentially still contains valid mappings. Hence also check it's alignment along with existing phys_addr check. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexander Potapenko authored
Add tests for heap and pagealloc initialization. These can be used to check init_on_alloc and init_on_free implementations as well as other approaches to initialization. Expected test output in the case the kernel provides heap initialization (e.g. when running with either init_on_alloc=1 or init_on_free=1): test_meminit: all 10 tests in test_pages passed test_meminit: all 40 tests in test_kvmalloc passed test_meminit: all 60 tests in test_kmemcache passed test_meminit: all 10 tests in test_rcu_persistent passed test_meminit: all 120 tests passed! Link: http://lkml.kernel.org/r/20190529123812.43089-4-glider@google.comSigned-off-by: Alexander Potapenko <glider@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Sandeep Patil <sspatil@android.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to avoid tainting the kernel. Additionally fixes up the math on wrap size to be architecture and page size agnostic. Link: http://lkml.kernel.org/r/201905282012.0A8767E24@keescook Fixes: ca90800a ("test_overflow: Add memory allocation overflow tests") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Rosin authored
Make sure that the trailing NUL is considered part of the string and can be found. Link: http://lkml.kernel.org/r/20190506124634.6807-4-peda@axentia.seSigned-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Rosin authored
If a memsetXX implementation is completely broken and fails in the first iteration, when i, j, and k are all zero, the failure is masked as zero is returned. Failing in the first iteration is perhaps the most likely failure, so this makes the tests pretty much useless. Avoid the situation by always setting a random unused bit in the result on failure. Link: http://lkml.kernel.org/r/20190506124634.6807-3-peda@axentia.se Fixes: 03270c13 ("lib/string.c: add testcases for memset16/32/64") Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Rosin authored
Patch series "lib/string: search for NUL with strchr/strnchr". I noticed an inconsistency where strchr and strnchr do not behave the same with respect to the trailing NUL. strchr is standardised and the kernel function conforms, and the kernel relies on the behavior. So, naturally strchr stays as-is and strnchr is what I change. While writing a few tests to verify that my new strnchr loop was sane, I noticed that the tests for memset16/32/64 had a problem. Since it's all about the lib/string.c file I made a short series of it all... This patch (of 3): strchr considers the terminating NUL to be part of the string, and NUL can thus be searched for with that function. For consistency, do the same with strnchr. Link: http://lkml.kernel.org/r/20190506124634.6807-2-peda@axentia.seSigned-off-by: Peter Rosin <peda@axentia.se> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
list_del() poisoning can generate 2 64-bit immediate loads but it also can generate one 64-bit immediate load and an addition: 48 b8 00 01 00 00 00 00 ad de movabs rax,0xdead000000000100 48 89 47 58 mov QWORD PTR [rdi+0x58],rax 48 05 00 01 00 00 <=====> add rax,0x100 48 89 47 60 mov QWORD PTR [rdi+0x60],rax However on x86_64 not all constants are equal: those within [-128, 127] range can be added with shorter "add r64, imm32" instruction: 48 b8 00 01 00 00 00 00 ad de movabs rax,0xdead000000000100 48 89 47 58 mov QWORD PTR [rdi+0x58],rax 48 83 c0 22 <======> add rax,0x22 48 89 47 60 mov QWORD PTR [rdi+0x60],rax Patch saves 2 bytes per some LIST_POISON2 usage. (Slightly disappointing) space savings on F29 x86_64 config: add/remove: 0/0 grow/shrink: 0/2164 up/down: 0/-5184 (-5184) Function old new delta zstd_get_workspace 548 546 -2 ... mlx4_delete_all_resources_for_slave 4826 4804 -22 Total: Before=83304131, After=83298947, chg -0.01% New constants are: 0xdead000000000100 0xdead000000000122 Note: LIST_POISON1 can't be changed to ...11 because something in page allocator requires low bit unset. Link: http://lkml.kernel.org/r/20190513191502.GA8492@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Joe Perches authored
Add a command line switch --no-moderated to skip L: mailing lists marked with 'moderated'. Some people prefer not emailing moderated mailing lists as the moderation time can be indeterminate and some emails can be intentionally dropped by a moderator. This can cause fragmentation of email threads when some are subscribed to a moderated list but others are not and emails are dropped. Link: http://lkml.kernel.org/r/6f23c2918ad9fc744269feb8f909bdfb105c5afc.camel@perches.comSigned-off-by: Joe Perches <joe@perches.com> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Qian Cai authored
Fix this compilation warning on x86 by making flush_cache_vmap() inline. lib/ioremap.c: In function 'ioremap_page_range': lib/ioremap.c:214:16: warning: variable 'start' set but not used [-Wunused-but-set-variable] unsigned long start; ^~~~~ While at it, convert all other similar functions to inline for consistency. Link: http://lkml.kernel.org/r/1562594592-15228-1-git-send-email-cai@lca.pwSigned-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Stephen Kitt authored
isa_page_to_bus() is deprecated and is no longer used anywhere. Remove it entirely. Link: http://lkml.kernel.org/r/20190613161155.16946-1-steve@sk2.orgSigned-off-by: Stephen Kitt <steve@sk2.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Masahiro Yamada authored
Now that BIT() can be used from assembly code, we can safely replace _BITUL() with equivalent BIT(). UAPI headers are still required to use _BITUL(), but there is no more reason to use it in kernel headers. BIT() is shorter. Link: http://lkml.kernel.org/r/20190609153941.17249-2-yamada.masahiro@socionext.comSigned-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Masahiro Yamada authored
BIT(), GENMASK(), etc. are useful to define register bits of hardware. However, low-level code is often written in assembly, where they are not available due to the hard-coded 1UL, 0UL. In fact, in-kernel headers such as arch/arm64/include/asm/sysreg.h use _BITUL() instead of BIT() so that the register bit macros are available in assembly. Using macros in include/uapi/linux/const.h have two reasons: [1] For use in uapi headers We should use underscore-prefixed variants for user-space. [2] For use in assembly code Since _BITUL() uses UL(1) instead of 1UL, it can be used as an alternative of BIT(). For [2], it is pretty easy to change BIT() etc. for use in assembly. This allows to replace _BUTUL() in kernel-space headers with BIT(). Link: http://lkml.kernel.org/r/20190609153941.17249-1-yamada.masahiro@socionext.comSigned-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Weitao Hou authored
fix lenght to length Link: http://lkml.kernel.org/r/20190521050937.4370-1-houweitaoo@gmail.comSigned-off-by: Weitao Hou <houweitaoo@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Radoslaw Burny authored
Normally, the inode's i_uid/i_gid are translated relative to s_user_ns, but this is not a correct behavior for proc. Since sysctl permission check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more sense to use these values in u_[ug]id of proc inodes. In other words: although uid/gid in the inode is not read during test_perm, the inode logically belongs to the root of the namespace. I have confirmed this with Eric Biederman at LPC and in this thread: https://lore.kernel.org/lkml/87k1kzjdff.fsf@xmission.com Consequences ============ Since the i_[ug]id values of proc nodes are not used for permissions checks, this change usually makes no functional difference. However, it causes an issue in a setup where: * a namespace container is created without root user in container - hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID * container creator tries to configure it by writing /proc/sys files, e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit Kernel does not allow to open an inode for writing if its i_[ug]id are invalid, making it impossible to write shmmax and thus - configure the container. Using a container with no root mapping is apparently rare, but we do use this configuration at Google. Also, we use a generic tool to configure the container limits, and the inability to write any of them causes a failure. History ======= The invalid uids/gids in inodes first appeared due to 81754357 (fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns). However, AFAIK, this did not immediately cause any issues. The inability to write to these "invalid" inodes was only caused by a later commit 0bd23d09 (vfs: Don't modify inodes with a uid or gid unknown to the vfs). Tested: Used a repro program that creates a user namespace without any mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside. Before the change, it shows the overflow uid, with the change it's 0. The overflow uid indicates that the uid in the inode is not correct and thus it is not possible to open the file for writing. Link: http://lkml.kernel.org/r/20190708115130.250149-1-rburny@google.com Fixes: 0bd23d09 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs") Signed-off-by: Radoslaw Burny <rburny@google.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W . Biederman" <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: John Sperbeck <jsperbeck@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
I thought that /proc/sysvipc has the same bug as /proc/net commit 1fde6f21 proc: fix /proc/net/* after setns(2) However, it doesn't! /proc/sysvipc files do get_ipc_ns(current->nsproxy->ipc_ns); in their open() hook and avoid the problem. Keep the test, maybe /proc/sysvipc will become broken someday :-\ Link: http://lkml.kernel.org/r/20190706180146.GA21015@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
Don't repeat function signatures twice. This is a kind-of-precursor for "struct proc_ops". Note: typeof(pde->proc_fops->...) ...; can't be used because ->proc_fops is "const struct file_operations *". "const" prevents assignment down the code and it can't be deleted in the type system. Link: http://lkml.kernel.org/r/20190529191110.GB5703@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
Add typeof_member() macro so that types can be extracted without introducing dummy variables. Link: http://lkml.kernel.org/r/20190529190720.GA5703@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kairui Song authored
Since commit 2724273e ("vmcore: add API to collect hardware dump in second kernel"), drivers are allowed to add device related dump data to vmcore as they want by using the device dump API. This has a potential issue, the data is stored in memory, drivers may append too much data and use too much memory. The vmcore is typically used in a kdump kernel which runs in a pre-reserved small chunk of memory. So as a result it will make kdump unusable at all due to OOM issues. So introduce new 'novmcoredd' command line option. User can disable device dump to reduce memory usage. This is helpful if device dump is using too much memory, disabling device dump could make sure a regular vmcore without device dump data is still available. [akpm@linux-foundation.org: tweak documentation] [akpm@linux-foundation.org: vmcore.c needs moduleparam.h] Link: http://lkml.kernel.org/r/20190528111856.7276-1-kasong@redhat.comSigned-off-by: Kairui Song <kasong@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
Test tries to access vsyscall page and if it doesn't exist gets SIGSEGV which can spam into dmesg. However the segfault happens by design. Handle it and carry information via exit code to parent. Link: http://lkml.kernel.org/r/20190524181256.GA2260@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
The whole header file deals with swap entries and PTEs, none of which can exist for nommu builds. The current nommu ports have lots of stubs to allow the inline functions in swapops.h to compile, but as none of this functionality is actually used there is no point in even providing it. This way we don't have to provide the stubs for the upcoming RISC-V nommu port, and can eventually remove it from the existing ports. Link: http://lkml.kernel.org/r/20190703122359.18200-4-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
Link: http://lkml.kernel.org/r/20190703122359.18200-3-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Hellwig authored
We can't expose UAPI symbols differently based on CONFIG_ symbols, as userspace won't have them available. Instead always define the flag, but only respect it based on the config option. Link: http://lkml.kernel.org/r/20190703122359.18200-2-hch@lst.deSigned-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Doug Berger authored
The description of cma_declare_contiguous() indicates that if the 'fixed' argument is true the reserved contiguous area must be exactly at the address of the 'base' argument. However, the function currently allows the 'base', 'size', and 'limit' arguments to be silently adjusted to meet alignment constraints. This commit enforces the documented behavior through explicit checks that return an error if the region does not fit within a specified region. Link: http://lkml.kernel.org/r/1561422051-16142-1-git-send-email-opendmb@gmail.com Fixes: 5ea3b1b2 ("cma: add placement specifier for "cma=" kernel parameter") Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Yue Hu <huyue2@yulong.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Peng Fan <peng.fan@nxp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Henry Burns authored
z3fold_page_migration() calls memcpy(new_zhdr, zhdr, PAGE_SIZE). However, zhdr contains fields that can't be directly coppied over (ex: list_head, a circular linked list). We only need to initialize the linked lists in new_zhdr, as z3fold_isolate_page() already ensures that these lists are empty Additionally it is possible that zhdr->work has been placed in a workqueue. In this case we shouldn't migrate the page, as zhdr->work references zhdr as opposed to new_zhdr. Link: http://lkml.kernel.org/r/20190716000520.230595-1-henryburns@google.com Fixes: 1f862989 ("mm/z3fold.c: support page migration") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Jonathan Adams <jwadams@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Henry Burns authored
z3fold_page_migrate() will never succeed because it attempts to acquire a lock that has already been taken by migrate.c in __unmap_and_move(). __unmap_and_move() migrate.c trylock_page(oldpage) move_to_new_page(oldpage_newpage) a_ops->migrate_page(oldpage, newpage) z3fold_page_migrate(oldpage, newpage) trylock_page(oldpage) Link: http://lkml.kernel.org/r/20190710213238.91835-1-henryburns@google.com Fixes: 1f862989 ("mm/z3fold.c: support page migration") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Vitaly Vul <vitaly.vul@sony.com> Cc: Jonathan Adams <jwadams@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Snild Dolkow <snild@sony.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Morton authored
Six sites are presently altering current->reclaim_state. There is a risk that one function stomps on a caller's value. Use a helper function to catch such errors. Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yafang Shao authored
There are six different reclaim paths by now: - kswapd reclaim path - node reclaim path - hibernate preallocate memory reclaim path - direct reclaim path - memcg reclaim path - memcg softlimit reclaim path The slab caches reclaimed in these paths are only calculated in the above three paths. There're some drawbacks if we don't calculate the reclaimed slab caches. - The sc->nr_reclaimed isn't correct if there're some slab caches relcaimed in this path. - The slab caches may be reclaimed thoroughly if there're lots of reclaimable slab caches and few page caches. Let's take an easy example for this case. If one memcg is full of slab caches and the limit of it is 512M, in other words there're approximately 512M slab caches in this memcg. Then the limit of the memcg is reached and the memcg reclaim begins, and then in this memcg reclaim path it will continuesly reclaim the slab caches until the sc->priority drops to 0. After this reclaim stops, you will find there're few slab caches left, which is less than 20M in my test case. While after this patch applied the number is greater than 300M and the sc->priority only drops to 3. Link: http://lkml.kernel.org/r/1561112086-6169-3-git-send-email-laoar.shao@gmail.comSigned-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yafang Shao authored
Patch series "mm/vmscan: calculate reclaimed slab in all reclaim paths". This patchset is to fix the issues in doing shrink slab. There're six different reclaim paths by now, - kswapd reclaim path - node reclaim path - hibernate preallocate memory reclaim path - direct reclaim path - memcg reclaim path - memcg softlimit reclaim path The slab caches reclaimed in these paths are only calculated in the above three paths. The issues are detailed explained in patch #2. We should calculate the reclaimed slab caches in every reclaim path. In order to do it, the struct reclaim_state is placed into the struct shrink_control. In node reclaim path, there'is another issue about shrinking slab, which is adressed in "mm/vmscan: shrink slab in node reclaim" (https://lore.kernel.org/linux-mm/1559874946-22960-1-git-send-email-laoar.shao@gmail.com/). This patch (of 2): The struct reclaim_state is used to record how many slab caches are reclaimed in one reclaim path. The struct shrink_control is used to control one reclaim path. So we'd better put reclaim_state into shrink_control. [laoar.shao@gmail.com: remove reclaim_state assignment from __perform_reclaim()] Link: http://lkml.kernel.org/r/1561381582-13697-1-git-send-email-laoar.shao@gmail.com Link: http://lkml.kernel.org/r/1561112086-6169-2-git-send-email-laoar.shao@gmail.comSigned-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yafang Shao authored
After commit 815744d7 ("mm: memcontrol: don't batch updates of local VM stats and events"), the local VM counter are not in sync with the hierarchical ones. Below is one example in a leaf memcg on my server (with 8 CPUs): inactive_file 3567570944 total_inactive_file 3568029696 We find that the deviation is very great because the 'val' in __mod_memcg_state() is in pages while the effective value in memcg_stat_show() is in bytes. So the maximum of this deviation between local VM stats and total VM stats can be (32 * number_of_cpu * PAGE_SIZE), that may be an unacceptably great value. We should keep the local VM stats in sync with the total stats. In order to keep this behavior the same across counters, this patch updates __mod_lruvec_state() and __count_memcg_events() as well. Link: http://lkml.kernel.org/r/1562851979-10610-1-git-send-email-laoar.shao@gmail.comSigned-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yafang Shao <shaoyafang@didiglobal.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Henry Burns authored
One of the gfp flags used to show that a page is movable is __GFP_HIGHMEM. Currently z3fold_alloc() fails when __GFP_HIGHMEM is passed. Now that z3fold pages are movable, we allow __GFP_HIGHMEM. We strip the movability related flags from the call to kmem_cache_alloc() for our slots since it is a kernel allocation. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20190712222118.108192-1-henryburns@google.comSigned-off-by: Henry Burns <henryburns@google.com> Acked-by: Vitaly Wool <vitalywool@gmail.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ryohei Suzuki authored
A comment referred to a non-existent function alloc_cma(), which should have been cma_alloc(). Link: http://lkml.kernel.org/r/20190712085549.5920-1-ryh.szk.cmnty@gmail.comSigned-off-by: Ryohei Suzuki <ryh.szk.cmnty@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arnd Bergmann authored
Clang gets rather confused about two variables in the same special section when one of them is not initialized, leading to an assembler warning later: /tmp/slab_common-18f869.s: Assembler messages: /tmp/slab_common-18f869.s:7526: Warning: ignoring changed section attributes for .data..ro_after_init Adding an initialization to kmalloc_caches is rather silly here but does avoid the issue. Link: https://bugs.llvm.org/show_bug.cgi?id=42570 Link: http://lkml.kernel.org/r/20190712090455.266021-1-arnd@arndb.deSigned-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arnd Bergmann authored
The mpi library contains some rather old inline assembly statements that produce a lot of warnings for 32-bit x86, such as: lib/mpi/mpih-div.c:76:16: error: invalid use of a cast in a inline asm context requiring an l-value: remove the cast or build with -fheinous-gnu-extensions udiv_qrnnd(qp[i], n1, n1, np[i], d); ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ lib/mpi/longlong.h:423:20: note: expanded from macro 'udiv_qrnnd' : "=a" ((USItype)(q)), \ ~~~~~~~~~~^~ There is no point in doing a type cast for the output of an inline assembler statement, so just remove the cast here, as we have done for other architectures in the past. See also dea632ca ("lib/mpi: fix build with clang"). Link: http://lkml.kernel.org/r/20190712090740.340186-1-arnd@arndb.deSigned-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Stefan Agner <stefan@agner.ch> Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arnd Bergmann authored
When CONFIG_SYSFS is disabled but CONFIG_TMPFS is enabled, we get a warning about shmem_parse_huge() never being called: mm/shmem.c:417:12: error: unused function 'shmem_parse_huge' [-Werror,-Wunused-function] static int shmem_parse_huge(const char *str) Change the #ifdef so we no longer build this function in that configuration. Link: http://lkml.kernel.org/r/20190712091141.673355-1-arnd@arndb.de Fixes: 144df3b288c4 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Hugh Dickins <hughd@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vineeth Remanan Pillai <vpillai@digitalocean.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vitaly Wool authored
As reported by Henry Burns: Running z3fold stress testing with address sanitization showed zhdr->slots was being used after it was freed. z3fold_free(z3fold_pool, handle) free_handle(handle) kmem_cache_free(pool->c_handle, zhdr->slots) release_z3fold_page_locked_list(kref) __release_z3fold_page(zhdr, true) zhdr_to_pool(zhdr) slots_to_pool(zhdr->slots) *BOOM* To fix this, add pointer to the pool back to z3fold_header and modify zhdr_to_pool to return zhdr->pool. Link: http://lkml.kernel.org/r/20190708134808.e89f3bfadd9f6ffd7eff9ba9@gmail.com Fixes: 7c2b8baa ("mm/z3fold.c: add structure for buddy handles") Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Reported-by: Henry Burns <henryburns@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 16 Jul, 2019 4 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlightLinus Torvalds authored
Pull backlight updates from Lee Jones: "New Functionality: - Provide support for ACPI enumeration; gpio_backlight Fix-ups: - SPDX fixups; pwm_bl - Fix linear brightness levels to include number available; pwm_bl" * tag 'backlight-next-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: backlight: pwm_bl: Fix heuristic to determine number of brightness levels backlight: gpio_backlight: Enable ACPI enumeration backlight: pwm_bl: Convert to use SPDX identifier
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull more block updates from Jens Axboe: "A later pull request with some followup items. I had some vacation coming up to the merge window, so certain things items were delayed a bit. This pull request also contains fixes that came in within the last few days of the merge window, which I didn't want to push right before sending you a pull request. This contains: - NVMe pull request, mostly fixes, but also a few minor items on the feature side that were timing constrained (Christoph et al) - Report zones fixes (Damien) - Removal of dead code (Damien) - Turn on cgroup psi memstall (Josef) - block cgroup MAINTAINERS entry (Konstantin) - Flush init fix (Josef) - blk-throttle low iops timing fix (Konstantin) - nbd resize fixes (Mike) - nbd 0 blocksize crash fix (Xiubo) - block integrity error leak fix (Wenwen) - blk-cgroup writeback and priority inheritance fixes (Tejun)" * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits) MAINTAINERS: add entry for block io cgroup null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED block: Limit zone array allocation size sd_zbc: Fix report zones buffer allocation block: Kill gfp_t argument of blkdev_report_zones() block: Allow mapping of vmalloc-ed buffers block/bio-integrity: fix a memory leak bug nvme: fix NULL deref for fabrics options nbd: add netlink reconfigure resize support nbd: fix crash when the blksize is zero block: Disable write plugging for zoned block devices block: Fix elevator name declaration block: Remove unused definitions nvme: fix regression upon hot device removal and insertion blk-throttle: fix zero wait time for iops throttled group block: Fix potential overflow in blk_report_zones() blkcg: implement REQ_CGROUP_PUNT blkcg, writeback: Implement wbc_blkcg_css() blkcg, writeback: Add wbc->no_cgroup_owner blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c updates from Wolfram Sang: "New stuff from the I2C world: - in the core, getting irqs from ACPI is now similar to OF - new driver for MediaTek MT7621/7628/7688 SoCs - bcm2835, i801, and tegra drivers got some more attention - GPIO API cleanups - cleanups in the core headers - lots of usual driver updates" * 'i2c/for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (74 commits) i2c: mt7621: Fix platform_no_drv_owner.cocci warnings i2c: cpm: remove casting dma_alloc dt-bindings: i2c: sun6i-p2wi: Fix the binding example dt-bindings: i2c: mv64xxx: Fix the example compatible i2c: i801: Documentation update i2c: i801: Add support for Intel Tiger Lake i2c: i801: Fix PCI ID sorting dt-bindings: i2c-stm32: document optional dmas i2c: i2c-stm32f7: Add I2C_SMBUS_I2C_BLOCK_DATA support i2c: core: Tidy up handling of init_irq i2c: core: Move ACPI gpio IRQ handling into i2c_acpi_get_irq i2c: core: Move ACPI IRQ handling to probe time i2c: acpi: Factor out getting the IRQ from ACPI i2c: acpi: Use available IRQ helper functions i2c: core: Allow whole core to use i2c_dev_irq_from_resources eeprom: at24: modify a comment referring to platform data dt-bindings: i2c: omap: Add new compatible for J721E SoCs dt-bindings: i2c: mv64xxx: Add YAML schemas dt-bindings: i2c: sun6i-p2wi: Add YAML schemas i2c: mt7621: Add MediaTek MT7621/7628/7688 I2C driver ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supplyLinus Torvalds authored
Pull power supply and reset updates from Sebastian Reichel: "Core: - add HWMON compat layer - new properties: - input power limit - input voltage limit Drivers: - qcom-pon: add gen2 support - new driver for storing reboot move in NVMEM - new driver for Wilco EC charger configuration - simplify getting the adapter of a client" * tag 'for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power: reset: nvmem-reboot-mode: add CONFIG_OF dependency power_supply: wilco_ec: Add charging config driver power: supply: cros: allow to set input voltage and current limit power: supply: add input power and voltage limit properties power: supply: fix semicolon.cocci warnings power: reset: nvmem-reboot-mode: use NVMEM as reboot mode write interface dt-bindings: power: reset: add document for NVMEM based reboot-mode reset: qcom-pon: Add support for gen2 pon dt-bindings: power: reset: qcom: Add qcom,pm8998-pon compatibility line power: supply: Add HWMON compatibility layer power: supply: sbs-manager: simplify getting the adapter of a client power: supply: rt9455_charger: simplify getting the adapter of a client power: supply: rt5033_battery: simplify getting the adapter of a client power: supply: max17042_battery: simplify getting the adapter of a client power: supply: max17040_battery: simplify getting the adapter of a client power: supply: max14656_charger_detector: simplify getting the adapter of a client power: supply: bq25890_charger: simplify getting the adapter of a client power: supply: bq24257_charger: simplify getting the adapter of a client power: supply: bq24190_charger: simplify getting the adapter of a client
-