    mm: vmscan: enforce inactive:active ratio at the reclaim root · b91ac374
    Johannes Weiner authored
    We split the LRU lists into inactive and active parts to maximize
    workingset protection while allowing just enough inactive cache space to
    facilitate readahead and writeback for one-off file accesses (e.g.  a
    linear scan through a file, or logging); or just enough inactive anon to
    maintain recent reference information when reclaim needs to swap.
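
    For reference, here is a small, compilable userspace sketch of that
    "just enough inactive" check.  It is modeled on the inactive_is_low()
    heuristic in mm/vmscan.c as I understand it; the exact ratio formula,
    the 4K page size, and the example numbers are simplifying assumptions
    for illustration, not a copy of the kernel code.

    #include <stdbool.h>
    #include <stdio.h>

    /* Integer square root, sufficient for the small values used here. */
    static unsigned long int_sqrt(unsigned long x)
    {
            unsigned long r = 0;

            while ((r + 1) * (r + 1) <= x)
                    r++;
            return r;
    }

    /* Should active pages be deactivated to refill the inactive list? */
    static bool inactive_is_low(unsigned long inactive, unsigned long active)
    {
            /* LRU size in gigabytes, assuming 4K pages. */
            unsigned long gb = (inactive + active) >> (30 - 12);
            /* Bigger LRUs tolerate a proportionally smaller inactive share. */
            unsigned long ratio = gb ? int_sqrt(10 * gb) : 1;

            return inactive * ratio < active;
    }

    int main(void)
    {
            /* Example: 4G of active file pages vs. 512M inactive. */
            unsigned long active = 4UL << (30 - 12);
            unsigned long inactive = 512UL << (20 - 12);

            printf("deactivate? %s\n",
                   inactive_is_low(inactive, active) ? "yes" : "no");
            return 0;
    }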
    
    With cgroups and their nested LRU lists, we currently don't do this
    correctly.  While recursive cgroup reclaim establishes a relative LRU
    order among the pages of all involved cgroups, inactive:active size
    decisions are made at the per-cgroup level.  As a result, we'll reclaim a
    cgroup's workingset when it doesn't have cold pages, even when one of its
    siblings has plenty of cold pages that should be reclaimed first.
    
    For example: workload A has 50M worth of hot cache but doesn't do any
    one-off file accesses; meanwhile, parallel workload B scans files and
    rarely accesses the same page twice.
    
    If these workloads were to run in an uncgrouped system, A would be
    protected from the high rate of cache faults from B.  But if they were put
    in parallel cgroups for memory accounting purposes, B's fast cache fault
    rate would push out the hot cache pages of A.  This is unexpected and
    undesirable - the "scan resistance" of the page cache is broken.
    
    This patch moves inactive:active size balancing decisions to the root of
    reclaim - the same level where the LRU order is established.
    
    It does this by looking at the recursive size of the inactive and the
    active file sets of the cgroup subtree at the beginning of the reclaim
    cycle, and then making a decision - scan or skip active pages - that
    applies throughout the entire run and to every cgroup involved.
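
    The standalone sketch below illustrates that flow under simplified
    assumptions: the struct names, the flat placeholder ratio check and the
    recursive traversal are hypothetical stand-ins for the kernel's lruvec
    and scan_control machinery, kept only to show one decision being made
    at the reclaim root and reused unchanged across the subtree.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct cgroup_lru {
            const char *name;
            unsigned long inactive_file;
            unsigned long active_file;
            struct cgroup_lru *children;
            size_t nr_children;
    };

    struct scan_control {
            bool may_deactivate;    /* decided once, at the reclaim root */
    };

    /* Recursive inactive/active totals over the whole subtree. */
    static void sum_subtree(const struct cgroup_lru *cg,
                            unsigned long *inactive, unsigned long *active)
    {
            *inactive += cg->inactive_file;
            *active += cg->active_file;
            for (size_t i = 0; i < cg->nr_children; i++)
                    sum_subtree(&cg->children[i], inactive, active);
    }

    /* Placeholder target check; the real kernel uses a size-dependent ratio. */
    static bool inactive_is_low(unsigned long inactive, unsigned long active)
    {
            return inactive < active;
    }

    /* Every cgroup in the run obeys the decision made at the root. */
    static void shrink_one(const struct cgroup_lru *cg,
                           const struct scan_control *sc)
    {
            printf("%s: scan inactive%s\n", cg->name,
                   sc->may_deactivate ? " and active" : " only");
    }

    static void shrink_subtree(const struct cgroup_lru *cg,
                               const struct scan_control *sc)
    {
            shrink_one(cg, sc);
            for (size_t i = 0; i < cg->nr_children; i++)
                    shrink_subtree(&cg->children[i], sc);
    }

    int main(void)
    {
            struct cgroup_lru kids[] = {
                    { .name = "A (hot cache)", .inactive_file = 100,   .active_file = 12800 },
                    { .name = "B (file scan)", .inactive_file = 20000, .active_file = 50 },
            };
            struct cgroup_lru root = {
                    .name = "root", .children = kids, .nr_children = 2,
            };
            unsigned long inactive = 0, active = 0;
            struct scan_control sc;

            /* One decision at the root, based on the recursive totals... */
            sum_subtree(&root, &inactive, &active);
            sc.may_deactivate = inactive_is_low(inactive, active);

            /* ...applied unchanged to every cgroup in the subtree. */
            shrink_subtree(&root, &sc);
            return 0;
    }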
    
    With that in place, in the test above, the VM will recognize that there
    are plenty of inactive pages in the combined cache set of workloads A and
    B and prefer the one-off cache in B over the hot pages in A.  The scan
    resistance of the cache is restored.
    
    [cai@lca.pw: fix some -Wenum-conversion warnings]
      Link: http://lkml.kernel.org/r/1573848697-29262-1-git-send-email-cai@lca.pw
    Link: http://lkml.kernel.org/r/20191107205334.158354-4-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Suren Baghdasaryan <surenb@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>