    mm: multi-gen LRU: rename lru_gen_struct to lru_gen_folio · 391655fe
    Yu Zhao authored
    Patch series "mm: multi-gen LRU: memcg LRU", v3.
    
    Overview
    ========
    
    An memcg LRU is a per-node LRU of memcgs.  It is also an LRU of LRUs,
    since each node and memcg combination has an LRU of folios (see
    mem_cgroup_lruvec()).
    
    Its goal is to improve the scalability of global reclaim, which is
    critical to system-wide memory overcommit in data centers.  Note that
    memcg reclaim is currently out of scope.
    
    Its memory overhead is one pointer per lruvec and is negligible per
    pglist_data.  In terms of traversing memcgs during global reclaim, it
    improves the best-case complexity from O(n) to O(1) and does not affect
    the worst-case complexity O(n).  Therefore, on average, it has a sublinear
    complexity in contrast to the current linear complexity.
    
    The basic structure of an memcg LRU can be understood by an analogy to
    the active/inactive LRU (of folios):
    1. It has the young and the old (generations), i.e., the counterparts
       to the active and the inactive;
    2. The increment of max_seq triggers promotion, i.e., the counterpart
       to activation;
    3. Other events trigger similar operations, e.g., offlining an memcg
       triggers demotion, i.e., the counterpart to deactivation.
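    The young/old analogy above can be modeled with a small C sketch.  It
    is a hypothetical, simplified model, not the kernel's lru_gen_memcg
    code.  It assumes two generations indexed by seq modulo 2, that
    promotion appends an lruvec to the tail of the young generation, that
    demotion inserts it at the head of the old generation, and that reclaim
    always takes the head of the old generation.  The names list,
    memcg_lru, promote(), demote() and pick_victim() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of one node's memcg LRU. The names and
 * policies are illustrative assumptions, not the kernel's implementation. */
#define NR_GENS 2

struct list { struct list *prev, *next; };

static void list_init(struct list *h) { h->prev = h->next = h; }
static int list_empty(const struct list *h) { return h->next == h; }

static void list_del(struct list *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}

static void list_add_tail(struct list *e, struct list *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

static void list_add_head(struct list *e, struct list *h)
{
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

/* One (node, memcg) lruvec. 'link' comes first so that a list pointer can
 * be converted back to the lruvec containing it. */
struct lruvec {
    struct list link;
    int memcg_id;
};

struct memcg_lru {
    unsigned long seq;          /* old gen = seq % NR_GENS, young = (seq + 1) % NR_GENS */
    struct list gens[NR_GENS];
};

static struct list *old_gen(struct memcg_lru *lru)
{
    return &lru->gens[lru->seq % NR_GENS];
}

static struct list *young_gen(struct memcg_lru *lru)
{
    return &lru->gens[(lru->seq + 1) % NR_GENS];
}

static void memcg_lru_init(struct memcg_lru *lru)
{
    lru->seq = 0;
    for (int i = 0; i < NR_GENS; i++)
        list_init(&lru->gens[i]);
}

/* Counterpart to activation, e.g. after the lruvec's max_seq increments. */
static void promote(struct memcg_lru *lru, struct lruvec *v)
{
    assert(v != NULL);
    list_del(&v->link);
    list_add_tail(&v->link, young_gen(lru));
}

/* Counterpart to deactivation, e.g. when the memcg is offlined. */
static void demote(struct memcg_lru *lru, struct lruvec *v)
{
    assert(v != NULL);
    list_del(&v->link);
    list_add_head(&v->link, old_gen(lru));
}

/* Next lruvec to reclaim from. When the old generation runs empty, the
 * young generation becomes the old one by incrementing seq. */
static struct lruvec *pick_victim(struct memcg_lru *lru)
{
    if (list_empty(old_gen(lru)))
        lru->seq++;
    if (list_empty(old_gen(lru)))
        return NULL;
    return (struct lruvec *)old_gen(lru)->next;
}
```

    In this model a demoted (e.g. offlined) memcg is reclaimed first, while
    a promoted one is not revisited until the generations flip.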
    
    In terms of global reclaim, it has two distinct features:
    1. Sharding, which allows each thread to start at a random memcg (in
       the old generation) and improves parallelism;
    2. Eventual fairness, which allows direct reclaim to bail out at will
       and reduces latency without affecting fairness over some time.
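    How sharding and eventual fairness can coexist is illustrated by the
    hypothetical C model below, which is not the kernel's code.  Each memcg
    carries a generation number.  A reclaimer starts its walk at a random
    memcg, reclaims only from memcgs still in the old generation, rotates
    each memcg it visits into the young generation, and bails out as soon
    as its target is met.  The rand()-based start, the fixed batch size and
    the memcg_lru_model type are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_MEMCGS 8

/* Hypothetical model: memcg i is in the old generation iff gen[i] == seq,
 * and in the young generation iff gen[i] == seq + 1. */
struct memcg_lru_model {
    unsigned long seq;
    unsigned long gen[NR_MEMCGS];
    long pgsteal[NR_MEMCGS];
};

/* One reclaimer: start at a random memcg (sharding), take up to 'batch'
 * pages from each old-generation memcg it meets, rotate that memcg into the
 * young generation, and bail out once 'nr_to_reclaim' pages are reclaimed. */
static long reclaim(struct memcg_lru_model *m, long nr_to_reclaim, long batch)
{
    long reclaimed = 0;
    int i = rand() % NR_MEMCGS;     /* random starting point */
    int misses = 0;                 /* consecutive memcgs not in the old gen */

    assert(batch > 0);
    while (reclaimed < nr_to_reclaim) {
        if (misses == NR_MEMCGS) {
            /* A full lap found no old memcg: the young gen becomes old. */
            m->seq++;
            misses = 0;
        }
        if (m->gen[i] != m->seq) {
            misses++;
        } else {
            long want = nr_to_reclaim - reclaimed;
            long take = want < batch ? want : batch;

            m->pgsteal[i] += take;
            reclaimed += take;
            m->gen[i] = m->seq + 1; /* rotate into the young generation */
            misses = 0;
        }
        i = (i + 1) % NR_MEMCGS;
    }
    return reclaimed;
}
```

    Because a visited memcg moves to the young generation, a reclaimer that
    bails out early only defers the remaining memcgs to later reclaimers
    instead of skipping them.  For example, with 8 memcgs and 24 calls that
    each bail out after one memcg, every memcg is reclaimed exactly three
    times, whatever rand() returns.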
    
    The commit message in patch 6 details the workflow:
    https://lore.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com/
    
    The following is a simple test to quickly verify its effectiveness.
    
      Test design:
      1. Create multiple memcgs.
      2. Each memcg contains a job (fio).
      3. All jobs access the same amount of memory randomly.
      4. The system does not experience global memory pressure.
      5. Periodically write to the root memory.reclaim.
    
      Desired outcome:
      1. All memcgs have similar pgsteal counts, i.e., stddev(pgsteal)
         over mean(pgsteal) is close to 0%.
      2. The total pgsteal is close to the total requested through
         memory.reclaim, i.e., sum(pgsteal) over sum(requested) is close
         to 100%.
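    The two metrics above can be computed from the per-memcg pgsteal
    values in memory.stat.  The C sketch below assumes the population
    standard deviation (the text does not say which variant was used) and
    that the requested total has already been converted to pages, because
    pgsteal counts pages while memory.reclaim takes bytes.  The helper
    names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Square root by Newton's method, so the sketch does not need -lm. */
static double newton_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;

    assert(x >= 0.0);
    if (x == 0.0)
        return 0.0;
    for (int i = 0; i < 100; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* stddev(pgsteal) / mean(pgsteal), with the population standard deviation. */
static double pgsteal_cv(const long *pgsteal, size_t n)
{
    double mean = 0.0, var = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        mean += pgsteal[i];
    mean /= n;
    assert(mean > 0.0);
    for (size_t i = 0; i < n; i++)
        var += (pgsteal[i] - mean) * (pgsteal[i] - mean);
    var /= n;
    return newton_sqrt(var) / mean;
}

/* sum(pgsteal) / sum(requested), with 'requested_pages' already in pages. */
static double reclaim_ratio(const long *pgsteal, size_t n, long requested_pages)
{
    long sum = 0;

    assert(requested_pages > 0);
    for (size_t i = 0; i < n; i++)
        sum += pgsteal[i];
    return (double)sum / requested_pages;
}
```

    For instance, with 4 KiB pages, 600 writes of 256m to memory.reclaim
    request 600 x 65536 = 39,321,600 pages in total.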
    
      Actual outcome [1]:
                                         MGLRU off    MGLRU on
      stddev(pgsteal) / mean(pgsteal)    75%          20%
      sum(pgsteal) / sum(requested)      425%         95%
    
      ####################################################################
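      # Assumes cgroup v2 mounted at /sys/fs/cgroup, with the memory
      # controller enabled in the root cgroup.subtree_control.
      MEMCGS=128
    
      # Create one child memcg per job.
      for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
          mkdir /sys/fs/cgroup/memcg$memcg
      done
    
      # Move this background subshell into its memcg, then run fio there so
      # that the job's memory is charged to that memcg. Each job randomly
      # accesses a 1920M mmap of /dev/zero (128 x 1920M = 240G in total).
      start() {
          echo $BASHPID > /sys/fs/cgroup/memcg$memcg/cgroup.procs
    
          fio -name=memcg$memcg --numjobs=1 --ioengine=mmap \
              --filename=/dev/zero --size=1920M --rw=randrw \
              --rate=64m,64m --random_distribution=random \
              --fadvise_hint=0 --time_based --runtime=10h \
              --group_reporting --minimal
      }
    
      for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
          start &
      done
    
      # Let the jobs warm up for 10 minutes.
      sleep 600
    
      # Request 256m of reclaim from the root every 6 seconds, 600 times
      # (about one hour).
      for ((i = 0; i < 600; i++)); do
          echo 256m >/sys/fs/cgroup/memory.reclaim
          sleep 6
      done
    
      # Report the pgsteal count of each memcg.
      for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
          grep "pgsteal " /sys/fs/cgroup/memcg$memcg/memory.stat
      done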
      ####################################################################
    
    [1]: This was obtained from running the above script (which touches
         less than 256GB of memory) on an EPYC 7B13 with 512GB DRAM for
         over an hour.
    
    
    This patch (of 8):
    
    The new name lru_gen_folio will be more distinct from the coming
    lru_gen_memcg.
    
    Link: https://lkml.kernel.org/r/20221222041905.2431096-1-yuzhao@google.com
    Link: https://lkml.kernel.org/r/20221222041905.2431096-2-yuzhao@google.com
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Michael Larabel <Michael@MichaelLarabel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>