- 30 Jul, 2022 40 commits
-
-
Sophia Gabriella authored
Fixes a typo in the help section for ZSWAP. Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by:
Sophia Gabriella <sophia.gabriellla@outlook.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Use pr_fmt to prefix all pr_<level> output, but unpoison_memory() and soft_offline_page() are used by error injection, which have own prefixes like "Unpoison:" and "soft offline:", meanwhile, soft_offline_page() could be used by memory hotremove, so reset pr_fmt before unpoison_pr_info definition to keep the original output for them. [wangkefeng.wang@huawei.com: v3] Link: https://lkml.kernel.org/r/20220729031919.72331-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20220726081046.10742-1-wangkefeng.wang@huawei.comSigned-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by:
Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
Use is_zone_movable_page() helper to simplify code. Link: https://lkml.kernel.org/r/20220726131135.146912-1-wangkefeng.wang@huawei.comSigned-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Pankaj Gupta <pankaj.gupta@amd.com> Acked-by:
Jason Wang <jasowang@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
In some cases, e.g. when size option is not specified, f_blocks, f_bavail and f_bfree will be set to -1 instead of 0. Likewise, when nr_inodes isn't specified, f_files and f_ffree will be set to -1 too. Update the comment to make this clear. Link: https://lkml.kernel.org/r/20220726142918.51693-6-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
The function generic_file_buffered_read has been renamed to filemap_read since commit 87fa0f3e ("mm/filemap: rename generic_file_buffered_read to filemap_read"). Update the corresponding comment. And duplicated taken in hugetlbfs_fill_super is removed. Link: https://lkml.kernel.org/r/20220726142918.51693-5-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
The header file signal.h is unneeded now. Remove it. Link: https://lkml.kernel.org/r/20220726142918.51693-4-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
The forward declaration for hugetlbfs_ops is unnecessary. Remove it. Link: https://lkml.kernel.org/r/20220726142918.51693-3-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Patch series "A few cleanup and fixup patches for hugetlbfs", v2. This series contains a few cleaup patches to remove unneeded forward declaration, use helper macro and so on. More details can be found in the respective changelogs. This patch (of 5): Use helper macro SZ_1K and SZ_1M to do the size conversion. Minor readability improvement. Link: https://lkml.kernel.org/r/20220726142918.51693-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220726142918.51693-2-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
It is unnecessary to add CONFIG_HIGHMEM check in is_highmem(), which has been done in is_highmem_idx(), and move is_highmem() close to is_highmem_idx(). This has no functional impact. Link: https://lkml.kernel.org/r/20220726131816.149075-1-wangkefeng.wang@huawei.comSigned-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Ralph Campbell authored
Add a simple test case for when hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a device private PTE is found for a device other than the hmm_range::dev_private_owner. This should cause the page to be faulted back to system memory from the other device and the PFN returned in the output array. Also, remove a piece of code that unnecessarily unmaps part of the buffer. Link: https://lkml.kernel.org/r/20220727000837.4128709-3-rcampbell@nvidia.com Link: https://lkml.kernel.org/r/20220725183615.4118795-3-rcampbell@nvidia.comSigned-off-by:
Ralph Campbell <rcampbell@nvidia.com> Reviewed-by:
Alistair Popple <apopple@nvidia.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Philip Yang <Philip.Yang@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Link: https://lkml.kernel.org/r/20220725142048.30450-4-peterx@redhat.comSigned-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Add two soft-dirty test cases for mprotect() on both anon or file. Link: https://lkml.kernel.org/r/20220725142048.30450-3-peterx@redhat.comSigned-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Peter Xu authored
Patch series "mm/mprotect: Fix soft-dirty checks", v4. This patch (of 3): The check wanted to make sure when soft-dirty tracking is enabled we won't grant write bit by accident, as a page fault is needed for dirty tracking. The intention is correct but we didn't check it right because VM_SOFTDIRTY set actually means soft-dirty tracking disabled. Fix it. There's another thing tricky about soft-dirty is that, we can't check the vma flag !(vma_flags & VM_SOFTDIRTY) directly but only check it after we checked CONFIG_MEM_SOFT_DIRTY because otherwise VM_SOFTDIRTY will be defined as zero, and !(vma_flags & VM_SOFTDIRTY) will constantly return true. To avoid misuse, introduce a helper for checking whether vma has soft-dirty tracking enabled. We can easily verify this with any exclusive anonymous page, like program below: =======8<====== #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <assert.h> #include <inttypes.h> #include <stdint.h> #include <sys/types.h> #include <sys/mman.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #include <fcntl.h> #include <stdbool.h> #define BIT_ULL(nr) (1ULL << (nr)) #define PM_SOFT_DIRTY BIT_ULL(55) unsigned int psize; char *page; uint64_t pagemap_read_vaddr(int fd, void *vaddr) { uint64_t value; int ret; ret = pread(fd, &value, sizeof(uint64_t), ((uint64_t)vaddr >> 12) * sizeof(uint64_t)); assert(ret == sizeof(uint64_t)); return value; } void clear_refs_write(void) { int fd = open("/proc/self/clear_refs", O_RDWR); assert(fd >= 0); write(fd, "4", 2); close(fd); } #define check_soft_dirty(str, expect) do { \ bool dirty = pagemap_read_vaddr(fd, page) & PM_SOFT_DIRTY; \ if (dirty != expect) { \ printf("ERROR: %s, soft-dirty=%d (expect: %d) ", str, dirty, expect); \ exit(-1); \ } \ } while (0) int main(void) { int fd = open("/proc/self/pagemap", O_RDONLY); assert(fd >= 0); psize = getpagesize(); page = mmap(NULL, psize, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); assert(page != MAP_FAILED); *page = 1; check_soft_dirty("Just faulted in page", 1); clear_refs_write(); check_soft_dirty("Clear_refs written", 0); mprotect(page, psize, PROT_READ); check_soft_dirty("Marked RO", 0); mprotect(page, psize, PROT_READ|PROT_WRITE); check_soft_dirty("Marked RW", 0); *page = 2; check_soft_dirty("Wrote page again", 1); munmap(page, psize); close(fd); printf("Test passed. "); return 0; } =======8<====== Here we attach a Fixes to commit 64fe24a3 only for easy tracking, as this patch won't apply to a tree before that point. However the commit wasn't the source of problem, but instead 64e45507. It's just that after 64fe24a3 anonymous memory will also suffer from this problem with mprotect(). Link: https://lkml.kernel.org/r/20220725142048.30450-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20220725142048.30450-2-peterx@redhat.com Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Fixes: 64fe24a3 ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection") Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Tetsuo Handa authored
syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock. Fix this problem by removing the allocation from memory_stat_format() completely, and pass static buffer when calling from memcg OOM path. Note that the caller holding filesystem lock was the trigger for syzbot to report this locking dependency. Doing GFP_KERNEL allocation with filesystem lock held can deadlock the system even without involving OOM situation. Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1] Link: https://lkml.kernel.org/r/86afb39f-8c65-bec2-6cfc-c5e3cd600c0b@I-love.SAKURA.ne.jp Fixes: c8713d0b ("mm: memcontrol: dump memory.stat during cgroup OOM") Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by:
syzbot <syzbot+2d2aeadc6ce1e1f11d45@syzkaller.appspotmail.com> Suggested-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Alistair Popple authored
Commit b05a79d4 ("mm/gup: migrate device coherent pages when pinning instead of failing") added a badly formatted if statement. Fix it. Link: https://lkml.kernel.org/r/20220721020552.1397598-2-apopple@nvidia.comSigned-off-by:
Alistair Popple <apopple@nvidia.com> Reported-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Shiyang Ruan authored
Failure notification is not supported on partitions. So, when we mount a reflink enabled xfs on a partition with dax option, let it fail with -EINVAL code. Link: https://lkml.kernel.org/r/20220609143435.393724-1-ruansy.fnst@fujitsu.comSigned-off-by:
Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Jiebin Sun authored
Remove the redundant updating of stats_flush_threshold. If the global var stats_flush_threshold has exceeded the trigger value for __mem_cgroup_flush_stats, further increment is unnecessary. Apply the patch and test the pts/hackbench-1.0.0 Count:4 (160 threads). Score gain: 1.95x Reduce CPU cycles in __mod_memcg_lruvec_state (44.88% -> 0.12%) CPU: ICX 8380 x 2 sockets Core number: 40 x 2 physical cores Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads) Link: https://lkml.kernel.org/r/20220722164949.47760-1-jiebin.sun@intel.comSigned-off-by:
Jiebin Sun <jiebin.sun@intel.com> Acked-by:
Shakeel Butt <shakeelb@google.com> Reviewed-by:
Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by:
Tim Chen <tim.c.chen@linux.intel.com> Acked-by:
Muchun Song <songmuchun@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Amadeusz Sawiski <amadeuszx.slawinski@linux.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Axel Rasmussen authored
The basic interaction for setting up a userfaultfd is, userspace issues a UFFDIO_API ioctl, and passes in a set of zero or more feature flags, indicating the features they would prefer to use. Of course, different kernels may support different sets of features (depending on kernel version, kconfig options, architecture, etc). Userspace's expectations may also not match: perhaps it was built against newer kernel headers, which defined some features the kernel it's running on doesn't support. Currently, if userspace passes in a flag we don't recognize, the initialization fails and we return -EINVAL. This isn't great, though. Userspace doesn't have an obvious way to react to this; sure, one of the features I asked for was unavailable, but which one? The only option it has is to turn off things "at random" and hope something works. Instead, modify UFFDIO_API to just ignore any unrecognized feature flags. The interaction is now that the initialization will succeed, and as always we return the *subset* of feature flags that can actually be used back to userspace. Now userspace has an obvious way to react: it checks if any flags it asked for are missing. If so, it can conclude this kernel doesn't support those, and it can either resign itself to not using them, or fail with an error on its own, or whatever else. Link: https://lkml.kernel.org/r/20220722201513.1624158-1-axelrasmussen@google.comSigned-off-by:
Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
We forget to set cft->private for numa stat file. As a result, numa stat of hstates[0] is always showed for all hstates. Encode the hstates index into cft->private to fix this issue. Link: https://lkml.kernel.org/r/20220723073804.53035-1-linmiaohe@huawei.com Fixes: f4776199 ("hugetlb: add hugetlb.*.numa_stat file") Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Acked-by:
Muchun Song <songmuchun@bytedance.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Dan Carpenter authored
Initialize "length" to zero by default. Link: https://lkml.kernel.org/r/YtZzjvHXVXMXxpXO@kili Fixes: ff712a62 ("selftests/vm: cleanup hugetlb file after mremap test") Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by:
Mina Almasry <almasrymina@google.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Kassey Li authored
Avoids truncating the debugfs output to 16 chars. Potentially alters the userspace output, but this is a debugfs interface and there are no stability guarantees. Link: https://lkml.kernel.org/r/20220719091554.27864-1-quic_yingangl@quicinc.comSigned-off-by:
Kassey Li <quic_yingangl@quicinc.com> Cc: Sasha Levin <sashal@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Dan Carpenter authored
This code just reads from memory without caring about the data itself. However static checkers complain that "tmp" is never properly initialized. Initialize it to zero and change the name to "dummy" to show that we don't care about the value stored in it. Link: https://lkml.kernel.org/r/YtZ8mKJmktA2GaHB@kili Fixes: c4b6cb88 ("selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test") Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Acked-by:
Souptick Joarder (HPE) <jrdr.linux@gmail.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
We can use unlock label to unlock ptl and return ret directly to remove the unneeded out label and reduce the size of mempolicy.o. No functional change intended. [Before] text data bss dec hex filename 26702 3972 6168 36842 8fea mm/mempolicy.o [After] text data bss dec hex filename 26662 3972 6168 36802 8fc2 mm/mempolicy.o Link: https://lkml.kernel.org/r/20220719115233.6706-1-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Mark-PK Tsai authored
cpuset.c was moved to kernel/cgroup/ in below commit 201af4c0 ("cgroup: move cgroup files under kernel/cgroup/") Correct the wrong path in comment. Link: https://lkml.kernel.org/r/20220718120336.5145-1-mark-pk.tsai@mediatek.comSigned-off-by:
Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
When code reaches here, the page must be !PageAnon. There's no need to check PageAnon again. Remove it. Link: https://lkml.kernel.org/r/20220716081816.10752-1-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Yixuan Cao authored
I noticed one more indentation than necessary in is_need(). Link: https://lkml.kernel.org/r/20220717195506.7602-1-caoyixuan2019@email.szu.edu.cnSigned-off-by:
Yixuan Cao <caoyixuan2019@email.szu.edu.cn> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Theodore Ts'o authored
This allows userspace to set flags like FS_APPEND_FL, FS_IMMUTABLE_FL, FS_NODUMP_FL, etc., like all other standard Linux file systems. [akpm@linux-foundation.org: fix CONFIG_TMPFS_XATTR=n warnings] Link: https://lkml.kernel.org/r/20220715015912.2560575-1-tytso@mit.eduSigned-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: Hugh Dickins <hughd@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Jianglei Nie authored
damon_reclaim_init() allocates a memory chunk for ctx with damon_new_ctx(). When damon_select_ops() fails, ctx is not released, which will lead to a memory leak. We should release the ctx with damon_destroy_ctx() when damon_select_ops() fails to fix the memory leak. Link: https://lkml.kernel.org/r/20220714063746.2343549-1-niejianglei2021@163.com Fixes: 4d69c345 ("mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations()") Signed-off-by:
Jianglei Nie <niejianglei2021@163.com> Reviewed-by:
SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Yosry Ahmed authored
memory.reclaim is a cgroup v2 interface that allows users to proactively reclaim memory from a memcg, without real memory pressure. Reclaim operations invoke vmpressure, which is used: (a) To notify userspace of reclaim efficiency in cgroup v1, and (b) As a signal for a memcg being under memory pressure for networking (see mem_cgroup_under_socket_pressure()). For (a), vmpressure notifications in v1 are not affected by this change since memory.reclaim is a v2 feature. For (b), the effects of the vmpressure signal (according to Shakeel [1]) are as follows: 1. Reducing send and receive buffers of the current socket. 2. May drop packets on the rx path. 3. May throttle current thread on the tx path. Since proactive reclaim is invoked directly by userspace, not by memory pressure, it makes sense not to throttle networking. Hence, this change makes sure that proactive reclaim caused by memory.reclaim does not trigger vmpressure. [1] https://lore.kernel.org/lkml/CALvZod68WdrXEmBpOkadhB5GPYmCXaDZzXH=yyGOCAjFRn4NDQ@mail.gmail.com/ [yosryahmed@google.com: update documentation] Link: https://lkml.kernel.org/r/20220721173015.2643248-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20220714064918.2576464-1-yosryahmed@google.comSigned-off-by:
Yosry Ahmed <yosryahmed@google.com> Acked-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Hui Zhu authored
zs_malloc returns 0 if it fails. zs_zpool_malloc will return -1 when zs_malloc return 0. But -1 makes the return value unclear. For example, when zswap_frontswap_store calls zs_malloc through zs_zpool_malloc, it will return -1 to its caller. The other return value is -EINVAL, -ENODEV or something else. This commit changes zs_malloc to return ERR_PTR on failure. It didn't just let zs_zpool_malloc return -ENOMEM becaue zs_malloc has two types of failure: - size is not OK return -EINVAL - memory alloc fail return -ENOMEM. Link: https://lkml.kernel.org/r/20220714080757.12161-1-teawater@gmail.comSigned-off-by:
Hui Zhu <teawater@antgroup.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Xiu Jianfeng authored
inode_to_wb_is_valid() is no longer used since commit fe55d563 ("remove inode_congested()"), remove it. Link: https://lkml.kernel.org/r/20220714084147.140324-1-xiujianfeng@huawei.comSigned-off-by:
Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Zhou Guanghui authored
In a system(Huawei Ascend ARM64 SoC) using HBM, a multi-bit ECC error occurs, and the BIOS will mark the corresponding area (for example, 2 MB) as unusable. When the system restarts next time, these areas are not reported or reported as EFI_UNUSABLE_MEMORY. Both cases lead to an increase in the number of memblocks, whereas EFI_UNUSABLE_MEMORY leads to a larger number of memblocks. For example, if the EFI_UNUSABLE_MEMORY type is reported: ... memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0 memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0 memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0 memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4 memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0 memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0 memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0 memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0 memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4 memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0 ... The EFI memory map is parsed to construct the memblock arrays before the memblock arrays can be resized. As the result, memory regions beyond INIT_MEMBLOCK_REGIONS are lost. Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace INIT_MEMBLOCK_REGTIONS to define the size of the static memblock.memory array. Allow overriding memblock.memory array size with architecture defined INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 to set INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled. Link: https://lkml.kernel.org/r/20220615102742.96450-1-zhouguanghui1@huawei.comSigned-off-by:
Zhou Guanghui <zhouguanghui1@huawei.com> Acked-by:
Mike Rapoport <rppt@linux.ibm.com> Tested-by:
Darren Hart <darren@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> [arm64] Reviewed-by:
Anshuman Khandual <anshuman.khandual@arm.com> Cc: Xu Qiang <xuqiang36@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Since commit 7267ec00 ("mm: postpone page table allocation until we have page to map"), do_fault_around is not called with page table lock held. Cleanup the corresponding comments. Link: https://lkml.kernel.org/r/20220716080359.38791-1-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
William Lam authored
The number of scanned pages can be lower than the number of isolated pages when isolating mirgratable or free pageblock. The metric is being reported in trace event and also used in vmstat. some example output from trace where it shows nr_taken can be greater than nr_scanned: Produced by kernel v5.19-rc6 kcompactd0-42 [001] ..... 1210.268022: mm_compaction_isolate_migratepages: range=(0x107ae4 ~ 0x107c00) nr_scanned=265 nr_taken=255 [...] kcompactd0-42 [001] ..... 1210.268382: mm_compaction_isolate_freepages: range=(0x215800 ~ 0x215a00) nr_scanned=13 nr_taken=128 kcompactd0-42 [001] ..... 1210.268383: mm_compaction_isolate_freepages: range=(0x215600 ~ 0x215680) nr_scanned=1 nr_taken=128 mm_compaction_isolate_migratepages does not seem to have this behaviour, but for the reason of consistency, nr_scanned should also be taken care of in that side. This behaviour is confusing since currently the count for isolated pages takes account of compound page but not for the case of scanned pages. And given that the number of isolated pages(nr_taken) reported in mm_compaction_isolate_template trace event is on a single-page basis, the ambiguity when reporting the number of scanned pages can be removed by also including compound page count. Link: https://lkml.kernel.org/r/20220711202806.22296-1-william.lam@bytedance.comSigned-off-by:
William Lam <william.lam@bytedance.com> Reviewed-by:
Punit Agrawal <punit.agrawal@bytedance.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Adam Sindelar authored
The test va_128TBswitch.c exercises a feature only supported on PPC and x86_64, but it's run on other 64-bit archs as well. Before this patch, the test did nothing and returned 0 for KSFT_PASS. This patch makes it return the KSFT codes from kselftest.h, including KSFT_SKIP when appropriate. Verified on arm64 and x86_64. Link: https://lkml.kernel.org/r/20220704123813.427625-1-adam@wowsignal.ioSigned-off-by:
Adam Sindelar <adam@wowsignal.io> Cc: David Vernet <void@manifault.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Adam Sindelar authored
mrelease_test should return KSFT_SKIP when process_mrelease is not defined, but due to a perror call consuming the errno, it returns KSFT_FAIL. This patch decides the exit code before calling perror. [adam@wowsignal.io: fix remaining instances of errno mishandling] Link: https://lkml.kernel.org/r/20220706141602.10159-1-adam@wowsignal.io Link: https://lkml.kernel.org/r/20220704173351.19595-1-adam@wowsignal.io Fixes: 33776141 ("selftests: vm: add process_mrelease tests") Signed-off-by:
Adam Sindelar <adam@wowsignal.io> Reviewed-by:
David Vernet <void@manifault.com> Reviewed-by:
Suren Baghdasaryan <surenb@google.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Roman Gushchin authored
Yafang Shao reported an issue related to the accounting of bpf memory: if a bpf map is charged indirectly for memory consumed from an interrupt context and allocations are enforced, MEMCG_MAX events are not raised. It's not/less of an issue in a generic case because consequent allocations from a process context will trigger the direct reclaim and MEMCG_MAX events will be raised. However a bpf map can belong to a dying/abandoned memory cgroup, so there will be no allocations from a process context and no MEMCG_MAX events will be triggered. Link: https://lkml.kernel.org/r/20220702033521.64630-1-roman.gushchin@linux.devSigned-off-by:
Roman Gushchin <roman.gushchin@linux.dev> Reported-by:
Yafang Shao <laoar.shao@gmail.com> Acked-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Restructure the logic in filemap_write_and_wait_range to simplify the code and make it more consistent with file_write_and_wait_range. No functional change intended. Link: https://lkml.kernel.org/r/20220627132351.55680-1-linmiaohe@huawei.comSigned-off-by:
Miaohe Lin <linmiaohe@huawei.com> Reviewed-by:
Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Miaohe Lin authored
Since the beginning, charged is set to 0 to avoid calling vm_unacct_memory twice because vm_unacct_memory will be called by above unmap_region. But since commit 4f74d2c8 ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces"), unmap_region doesn't call vm_unacct_memory anymore. So charged shouldn't be set to 0 now otherwise the calling to paired vm_unacct_memory will be missed and leads to imbalanced account. Link: https://lkml.kernel.org/r/20220618082027.43391-1-linmiaohe@huawei.com Fixes: 4f74d2c8 ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces") Signed-off-by:
Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam Howlett authored
When munmapping a vma, the mmap_lock can be degraded to a write before calling close() on the file handle. The binder close() function calls binder_alloc_set_vma() to clear the vma address, which now has a lock dep check for writing on the mmap_lock. Change the lockdep check to ensure the reading lock is held while clearing and keep the write check while writing. Link: https://lkml.kernel.org/r/20220627151857.2316964-1-Liam.Howlett@oracle.com Fixes: 472a68df605b ("android: binder: stop saving a pointer to the VMA") Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: syzbot+da54fa8d793ca89c741f@syzkaller.appspotmail.com Acked-by:
Todd Kjos <tkjos@google.com> Cc: "Arve Hjønnevåg" <arve@android.com> Cc: Christian Brauner (Microsoft) <brauner@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hridya Valsaraju <hridya@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Martijn Coenen <maco@android.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-