1. 05 Jul, 2024 7 commits
    • mm: memcg: introduce memcontrol-v1.c · 1b1e1344
      Roman Gushchin authored
      Patch series "mm: memcg: separate legacy cgroup v1 code and put under
      config option", v2.
      
      Cgroups v2 have been around for a while and many users have fully adopted
      them, so they never use cgroup v1 features and functionality.  Yet they
      have to "pay" for the cgroup v1 support anyway:
      1) the kernel binary contains unused cgroup v1 code,
      2) some code paths have additional checks which are not needed,
      3) some common structures like task_struct and mem_cgroup contain unused
         cgroup v1-specific members.
      
      Cgroup v1's memory controller has a number of features that are not
      supported by cgroup v2, and their implementation is pretty much
      self-contained.  Most notably, these features are: soft limit reclaim,
      oom handling in userspace, the complicated event notification system,
      and charge migration.  Cgroup v1-specific code in memcontrol.c is close
      to 4k lines in size and is intertwined with generic and cgroup
      v2-specific code.  It's a burden on developers and maintainers.
      
      This patchset aims to solve these problems by:
      1) moving cgroup v1-specific memcg code to the new mm/memcontrol-v1.c file,
      2) putting definitions shared by memcontrol.c and memcontrol-v1.c into the
         mm/memcontrol-v1.h header,
      3) introducing the CONFIG_MEMCG_V1 config option, turned off by default,
      4) compiling memcontrol-v1.c only if CONFIG_MEMCG_V1 is set.
      
      If CONFIG_MEMCG_V1 is not set, the cgroup v1 memory controller is still
      available for mounting; however, no memory-specific control knobs are
      present.
      
      This patch (of 14):
      
      
      This patch introduces the mm/memcontrol-v1.c source file which will be
      used for all legacy (cgroup v1) memory cgroup code.  It also introduces
      mm/memcontrol-v1.h to keep declarations shared between mm/memcontrol.c and
      mm/memcontrol-v1.c.
      
      As of now, let's compile it if CONFIG_MEMCG is set, similar to
      mm/memcontrol.c.  Later on it can be switched to use a separate config
      option, so that the legacy code won't be compiled if not required.
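      The build wiring this implies can be sketched as follows (a sketch only:
      the exact mm/Makefile and Kconfig text may differ, and CONFIG_MEMCG_V1
      itself is only introduced later in the series):

```makefile
# mm/Makefile (sketch): for now the new file is built whenever
# CONFIG_MEMCG is set, exactly like memcontrol.o
obj-$(CONFIG_MEMCG) += memcontrol.o memcontrol-v1.o

# later in the series this becomes a separate, default-off option:
# obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
```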
      
      Link: https://lkml.kernel.org/r/20240625005906.106920-1-roman.gushchin@linux.dev
      Link: https://lkml.kernel.org/r/20240625005906.106920-2-roman.gushchin@linux.dev
      Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1b1e1344
    • mm/ksm: optimize the chain()/chain_prune() interfaces · a0b856b6
      Chengming Zhou authored
      The current implementation of stable_node_dup() makes the
      chain()/chain_prune() interfaces and their usage overcomplicated.

      Why?  stable_node_dup() only finds and returns a candidate stable_node
      for sharing, so the users have to recheck using stable_node_dup_any()
      whether any non-candidate stable_node exists, and try ksm_get_folio()
      from it again.

      Actually, stable_node_dup() can just return the best stable_node it can
      find, and the users can then check whether it's a candidate for sharing
      or not.

      This also simplifies the code and leaves fewer corner cases: for
      example, stable_node and stable_node_dup can't be NULL if the returned
      tree_folio is not NULL.
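      The interface change can be sketched with a toy model (plain Python;
      the names, the dict representation, and the sharing limit below are all
      illustrative, not the kernel code): the lookup returns the best dup node
      it found, and the caller, not the lookup, decides whether that node is a
      sharing candidate.

```python
# Toy model of the simplified contract: return the best stable_node dup
# (here: fewest sharers) and let the caller check the sharing limit itself,
# instead of forcing a re-search via stable_node_dup_any().
def stable_node_dup(dups):
    """Return (node, folio) for the least-shared dup, or (None, None)."""
    best = min(dups, key=lambda n: n["refcount"], default=None)
    if best is None:
        return None, None
    return best, best["folio"]   # non-None folio implies non-None node

dups = [{"refcount": 3, "folio": "A"}, {"refcount": 1, "folio": "B"}]
node, folio = stable_node_dup(dups)
# Caller-side check replacing the old recheck-and-retry dance:
is_candidate = node is not None and node["refcount"] < 2  # hypothetical limit
```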
      
      Link: https://lkml.kernel.org/r/20240621-b4-ksm-scan-optimize-v2-3-1c328aa9e30b@linux.dev
      Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a0b856b6
    • mm/ksm: don't waste time searching stable tree for fast changing page · d58a361b
      Chengming Zhou authored
      The code flow in cmp_and_merge_page() is suboptimal for handling the ksm
      page and non-ksm page at the same time.  For example:

      - ksm page
       1. Mostly we just return if this ksm page has not been migrated and
          this rmap_item is already on the rmap hlist; otherwise we have to
          fix up the rmap_item mapping.
       2. We absolutely don't need to checksum this ksm page, since it can't
          change.

      - non-ksm page
       1. First, don't waste time searching the stable tree if the page is
          changing fast.
       2. Try to merge with the zero page before searching the stable tree.
       3. Then search the stable tree to find a mergeable ksm page.

      This patch optimizes the code flow so the handling differences between
      ksm pages and non-ksm pages become clearer and more efficient.
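      The reordered flow can be sketched as a toy model (plain Python, not the
      kernel implementation; every helper is a stand-in that just records
      which steps run):

```python
# Toy sketch of the reordered cmp_and_merge_page() flow.
calls = []

def cmp_and_merge(page, old_csum):
    if page["is_ksm"]:
        calls.append("fixup_mapping")   # ksm pages are immutable: no checksum
        return old_csum
    calls.append("checksum")            # non-ksm anon page: checksum first
    new_csum = page["csum"]
    if new_csum != old_csum:
        return new_csum                 # fast-changing: skip the stable tree
    calls.append("try_zero_page")       # zero page before the stable tree
    calls.append("search_stable_tree")
    return new_csum

cmp_and_merge({"is_ksm": True, "csum": 7}, 0)    # ksm page: fixup only
cmp_and_merge({"is_ksm": False, "csum": 7}, 0)   # checksum changed: bail out
cmp_and_merge({"is_ksm": False, "csum": 7}, 7)   # stable: zero page, then tree
```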
      
      Link: https://lkml.kernel.org/r/20240621-b4-ksm-scan-optimize-v2-2-1c328aa9e30b@linux.dev
      Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d58a361b
    • mm/ksm: refactor out try_to_merge_with_zero_page() · ac90c56b
      Chengming Zhou authored
      Patch series "mm/ksm: cmp_and_merge_page() optimizations and cleanup", v2.
      
      This series mainly optimizes cmp_and_merge_page() to have more efficient
      separate code flow for ksm page and non-ksm anon page.
      
      - ksm page: obviously doesn't need the checksum calculated.
      - anon page: doesn't need to search the stable tree if changing fast,
        and tries to merge with the zero page before searching the stable
        tree for a ksm page.
      
      Please see patch 2 for details.

      Patch 3 is a cleanup and also a small optimization of the
      chain()/chain_prune() interfaces, which made
      stable_tree_search()/stable_tree_insert() overly complex.
      
      I have done simple testing using "hackbench -g 1 -l 300000" (maybe a
      better workload is needed) on my machine, and have seen a small decrease
      in ksmd CPU usage and some improvement in cmp_and_merge_page() latency;
      in particular, the latency of cmp_and_merge_page() when handling non-ksm
      anon pages has been improved.
      
      
      This patch (of 3):
      
      In preparation for later changes, refactor out a new function,
      try_to_merge_with_zero_page(), which tries to merge the page with the
      zero page.
      
      Link: https://lkml.kernel.org/r/20240621-b4-ksm-scan-optimize-v2-0-1c328aa9e30b@linux.dev
      Link: https://lkml.kernel.org/r/20240621-b4-ksm-scan-optimize-v2-1-1c328aa9e30b@linux.dev
      Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ac90c56b
    • hugetlb: force allocating surplus hugepages on mempolicy allowed nodes · 003af997
      Aristeu Rozanski authored
      When trying to allocate a hugepage with no reserved ones free, the
      allocation may still be allowed if a number of overcommit hugepages was
      configured (using /proc/sys/vm/nr_overcommit_hugepages) and that number
      wasn't reached.  This allows extra hugepages to be allocated
      dynamically, if there are resources for it.  Some sysadmins even prefer
      not reserving any hugepages and setting a big number of overcommit
      hugepages.
      
      But while attempting to allocate overcommit hugepages on a multi-node
      system (either NUMA or mempolicy/cpuset), said allocations might
      randomly fail even when there are resources available for the
      allocation.

      This happens because allowed_mems_nr() only accounts for the number of
      free hugepages on the nodes the current process belongs to, while the
      surplus hugepage allocation can be satisfied from any node.  In case
      one or more of the requested surplus hugepages are allocated on a
      different node, the whole allocation fails because allowed_mems_nr()
      returns a lower value.

      So allocate surplus hugepages on one of the nodes the current process
      belongs to.
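      The accounting mismatch can be sketched with a toy model (plain Python,
      not kernel code; the per-node bookkeeping below is illustrative):
      allowed_mems_nr() counts free surplus pages only on the nodes the task
      is allowed to use, while the surplus page could previously land on any
      node.

```python
# Toy model of the bug and the fix.
def allowed_mems_nr(free_per_node, allowed_nodes):
    """Count free surplus huge pages only on the task's allowed nodes."""
    return sum(free_per_node[n] for n in allowed_nodes)

free = {0: 0, 1: 0}
allowed = {0}                 # task bound to node 0, as with `numactl -m0`

free[1] += 1                  # before the fix: surplus page lands on node 1
broken = allowed_mems_nr(free, allowed)   # 0 -> the allocation check fails

free[1] -= 1
free[0] += 1                  # after the fix: allocate on an allowed node
fixed = allowed_mems_nr(free, allowed)    # 1 -> the check passes
```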
      
      An easy way to reproduce this issue is to use a system with 2+ NUMA
      nodes:
      
      	# echo 0 >/proc/sys/vm/nr_hugepages
      	# echo 1 >/proc/sys/vm/nr_overcommit_hugepages
      	# numactl -m0 ./tools/testing/selftests/mm/map_hugetlb 2
      
      Repeating the execution of the map_hugetlb test application will
      eventually fail when the hugepage ends up allocated on a different node.
      
      [aris@ruivo.org: v2]
        Link: https://lkml.kernel.org/r/20240701212343.GG844599@cathedrallabs.org
      Link: https://lkml.kernel.org/r/20240621190050.mhxwb65zn37doegp@redhat.com
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Aristeu Rozanski <aris@ruivo.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vishal Moola <vishal.moola@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      003af997
    • mm/damon/paddr: initialize nr_succeeded in __damon_pa_migrate_folio_list() · 64548bc5
      SeongJae Park authored
      The variable is supposed to be set by the later migrate_pages() call.
      However, the function does not do that when CONFIG_MIGRATION is unset.
      Initialize the variable to zero.
      
      Link: https://lkml.kernel.org/r/20240701165332.47495-1-sj@kernel.org
      Fixes: 5311c0a2eee3 ("mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion")
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Closes: https://lore.kernel.org/r/202406251102.GE07hqfQ-lkp@intel.com/
      Cc: Honggyu Kim <honggyu.kim@sk.com>
      Cc: Hyeongtak Ji <hyeongtak.ji@sk.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      64548bc5
    • mm: refactor folio_undo_large_rmappable() · 593a10da
      Kefeng Wang authored
      Folios of order <= 1 are not kept on the deferred list.  The order
      check was added to folio_undo_large_rmappable() in commit 8897277a
      ("mm: support order-1 folios in the page cache"), but each call still
      repeats the check for small (order-0) folios, so keep only the
      folio_order() check inside the function.

      In addition, move all the checks into the header file to save a
      function call for folios that are not large-rmappable or have an empty
      deferred_list.
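      The shape of the split can be sketched with a toy model (plain Python,
      not the kernel code; the dict fields are illustrative): a cheap check,
      which in C becomes a static inline in the header, decides whether the
      out-of-line function needs to run at all, saving a function call on the
      fast path.

```python
# Toy sketch: fast inline-style check guarding the expensive path.
slow_calls = 0

def __folio_undo_large_rmappable(folio):
    global slow_calls
    slow_calls += 1            # expensive path: locking, deferred-list removal

def folio_undo_large_rmappable(folio):
    # header-side fast path: only order > 1, large-rmappable folios with a
    # non-empty deferred_list need the real work
    if folio["order"] <= 1:
        return
    if not folio["large_rmappable"] or not folio["deferred_list"]:
        return
    __folio_undo_large_rmappable(folio)

folio_undo_large_rmappable({"order": 0, "large_rmappable": False, "deferred_list": []})
folio_undo_large_rmappable({"order": 2, "large_rmappable": True, "deferred_list": []})
folio_undo_large_rmappable({"order": 2, "large_rmappable": True, "deferred_list": ["d"]})
```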
      
      Link: https://lkml.kernel.org/r/20240521130315.46072-1-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeel.butt@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      593a10da
  2. 04 Jul, 2024 33 commits