    mm: add defines for min/max swappiness · 410abb20
    Dan Schatzberg authored
    Patch series "Add swappiness argument to memory.reclaim", v6.
    
    This patch proposes augmenting the memory.reclaim interface with a
    swappiness=<val> argument that overrides the swappiness value for that
    instance of proactive reclaim.
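
    As a rough userspace illustration of the proposed argument, the sketch
    below builds the string a proactive reclaimer might write to a cgroup's
    memory.reclaim file.  The helper name format_reclaim_request() is
    hypothetical, and the "<bytes> swappiness=<val>" layout (size first,
    then the argument, separated by a space) is an assumption here, as is
    rejecting values outside 0..200.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of this series: format a memory.reclaim
 * request of the assumed form "<bytes> swappiness=<val>" into buf.
 * Returns the string length on success, or -1 if the swappiness value is
 * outside the assumed 0..200 range or the buffer is too small.
 */
static int format_reclaim_request(char *buf, size_t len,
                                  unsigned long long bytes, int swappiness)
{
    int n;

    if (swappiness < 0 || swappiness > 200)
        return -1;

    n = snprintf(buf, len, "%llu swappiness=%d", bytes, swappiness);
    if (n < 0 || (size_t)n >= len)
        return -1;  /* output error or truncated output */

    return n;
}
```

    A reclaimer would then write the resulting string to the target
    cgroup's memory.reclaim file, leaving vm.swappiness untouched.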
    
    Userspace proactive reclaimers use the memory.reclaim interface to trigger
    reclaim.  The memory.reclaim interface provides no way to affect the
    balance of file vs anon during proactive reclaim.  The only
    approach is to adjust the vm.swappiness setting.  However, there are a few
    reasons we look to control the balance of file vs anon during proactive
    reclaim, separately from reactive reclaim:
    
    * Swapout should be limited to manage SSD write endurance.  In near-OOM
      situations we are fine with lots of swap-out to avoid OOMs.  As these
      are typically rare events, they have relatively little impact on write
      endurance.  However, proactive reclaim runs continuously and so its
      impact on SSD write endurance is more significant.  Therefore it is
      desirable to control swap-out for proactive reclaim separately from
      reactive reclaim.
    
    * Some userspace OOM killers like systemd-oomd[1] support OOM killing on
      swap exhaustion.  This makes sense if the swap exhaustion is triggered
      due to reactive reclaim but less so if it is triggered due to proactive
      reclaim (e.g.  one could see OOMs when free memory is ample but anon is
      just particularly cold).  Therefore, it's desirable to have proactive
      reclaim reduce or stop swap-out before the threshold at which OOM
      killing occurs.
    
    In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness
    before writes to memory.reclaim[2].  This has been in production for
    nearly two years and has addressed our needs to control proactive vs
    reactive reclaim behavior but is still not ideal for a number of reasons:
    
    * vm.swappiness is a global setting; adjusting it can race/interfere
      with other system administration that wishes to control vm.swappiness.
      In our case, we need to disable Senpai before adjusting vm.swappiness.
    
    * vm.swappiness is stateful - so a crash or restart of Senpai can leave
      a misconfigured setting.  This requires some additional management to
      record the "desired" setting and ensure Senpai always adjusts to it.
    
    With this patch, we avoid these downsides of adjusting vm.swappiness
    globally.
    
    Previously, this exact interface addition was proposed by Yosry[3].  In
    response, Roman proposed instead an interface to specify precise
    file/anon/slab reclaim amounts[4].  More recently, Huan proposed this
    as well[5], and others similarly questioned whether this was the proper
    interface.
    
    Previous proposals sought to use this to allow proactive reclaimers to
    effectively perform a custom reclaim algorithm by issuing proactive
    reclaim with different settings to control file vs anon reclaim (e.g.  to
    only reclaim anon from some applications).  Responses argued that
    adjusting swappiness is a poor interface for custom reclaim.
    
    In contrast, I argue in favor of a swappiness setting not as a way to
    implement custom reclaim algorithms but rather to bias the balance of anon
    vs file due to differences of proactive vs reactive reclaim.  In this
    context, swappiness is the existing interface for controlling this balance
    and this patch simply allows for it to be configured differently for
    proactive vs reactive reclaim.
    
    Specifying explicit amounts of anon vs file pages to reclaim feels
    inappropriate for this purpose.  Proactive reclaimers are unaware of the
    relative age of file vs anon for a cgroup which makes it difficult to
    manage proactive reclaim of different memory pools.  A proactive reclaimer
    would need some amount of anon reclaim attempts separate from the amount
    of file reclaim attempts which seems brittle given that it's difficult to
    observe the impact.
    
    [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
    [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598
    [3]https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/
    [4]https://lore.kernel.org/linux-mm/YoPHtHXzpK51F%2F1Z@carbon/
    [5]https://lore.kernel.org/lkml/20231108065818.19932-1-link@vivo.com/
    
    
    This patch (of 2):
    
    We use the constants 0 and 200 in a few places in the mm code when
    referring to the min and max swappiness.  This patch adds MIN_SWAPPINESS
    and MAX_SWAPPINESS #defines to improve clarity.  There are no functional
    changes.
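
    A minimal sketch of the idea, using the values 0 and 200 stated above.
    The header the definitions live in is not shown here, and the helper
    swappiness_is_valid() is a hypothetical example of a call site, not
    code from this patch.

```c
/* Named bounds replacing the bare constants 0 and 200. */
#define MIN_SWAPPINESS 0
#define MAX_SWAPPINESS 200

/*
 * Hypothetical call site: a range check that previously would have
 * compared against the literals 0 and 200.
 */
static int swappiness_is_valid(int val)
{
    return val >= MIN_SWAPPINESS && val <= MAX_SWAPPINESS;
}
```

    The check behaves exactly as it would with the literal constants; only
    the readability of the comparison changes.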
    
    Link: https://lkml.kernel.org/r/20240103164841.2800183-1-schatzberg.dan@gmail.com
    Link: https://lkml.kernel.org/r/20240103164841.2800183-2-schatzberg.dan@gmail.com
    Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Acked-by: Chris Li <chrisl@kernel.org>
    Reviewed-by: Nhat Pham <nphamcs@gmail.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shakeel Butt <shakeel.butt@linux.dev>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Yosry Ahmed <yosryahmed@google.com>
    Cc: Yue Zhao <findns94@gmail.com>
    Cc: Zefan Li <lizefan.x@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>