    mm: memcontrol: make cgroup stats and events query API explicitly local · 205b20cc
    Johannes Weiner authored
    Patch series "mm: memcontrol: memory.stat cost & correctness".
    
    The cgroup memory.stat file holds recursive statistics for the entire
    subtree.  The current implementation does this tree walk on-demand
    whenever the file is read.  This causes two problems in production.
    
    1. The cost of aggregating the statistics on-demand is high.  A lot of
       system service cgroups are mostly idle and their stats don't change
       between reads, yet we always have to check them.  There are also always
       some lazily-dying cgroups sitting around that are pinned by a handful
       of remaining page cache; the same applies to them.
    
       In an application that periodically monitors memory.stat in our
       fleet, we have seen the aggregation consume up to 5% CPU time.
    
    2. When cgroups die and disappear from the cgroup tree, so do their
       accumulated vm events.  The result is that the event counters at
       higher-level cgroups can go backwards and confuse some of our
       automation, let alone people looking at the graphs over time.
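
    As a rough illustration of the second problem, the sketch below models
    on-demand aggregation over a parent with a single child.  The struct and
    function names are hypothetical and are not the kernel's data
    structures; the numbers are made up for the example.

```c
#include <stddef.h>

/* Hypothetical model of a cgroup with at most one child (not kernel code). */
struct ev_group {
    long events;             /* events counted in this group itself */
    struct ev_group *child;  /* NULL once the child has left the tree */
};

/* On-demand aggregation: only groups still in the tree contribute. */
long ev_events_on_demand(const struct ev_group *g)
{
    long sum = 0;

    for (; g; g = g->child)
        sum += g->events;
    return sum;
}

/* Returns how far the parent's reading drops when the child disappears. */
long ev_demo_counter_drop(void)
{
    struct ev_group child = { .events = 500, .child = NULL };
    struct ev_group parent = { .events = 100, .child = &child };
    long before = ev_events_on_demand(&parent);  /* 100 + 500 */
    long after;

    parent.child = NULL;                         /* child dies */
    after = ev_events_on_demand(&parent);        /* only the parent's 100 */
    return before - after;
}
```

    In this model the parent's reading falls from 600 to 100 even though no
    event was ever undone, which is the backwards step described above.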
    
    To address both issues, this patch series changes the stat
    implementation to spill counts upwards when the counters change.
    
    The upward spilling is batched using the existing per-cpu cache.  In a
    sparse file stress test with 5-level cgroup nesting, the additional cost
    of the flushing was negligible (a little under 1% of CPU at 100% CPU
    utilization, compared to the 5% spent reading memory.stat during regular
    operation).
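
    The batching idea can be sketched in userspace C as follows.  This is a
    minimal model under stated assumptions, not the code from this series:
    the names, the single "pending" field standing in for one per-cpu cache
    slot, and the threshold SPILL_BATCH are all hypothetical.

```c
#include <stdlib.h>

#define SPILL_BATCH 32  /* hypothetical threshold, not the kernel's value */

/* Hypothetical model of a cgroup's counter (not kernel code). */
struct spill_group {
    struct spill_group *parent;
    long recursive;  /* this group plus all descendants, updated on flush */
    long pending;    /* stand-in for one per-cpu cache slot */
};

/* Record a change; spill upward only once the cached delta is large. */
void spill_mod_state(struct spill_group *g, long delta)
{
    g->pending += delta;
    if (labs(g->pending) > SPILL_BATCH) {
        /* Push the cached delta into this group and every ancestor. */
        for (struct spill_group *p = g; p; p = p->parent)
            p->recursive += g->pending;
        g->pending = 0;
    }
}

/* Reading is O(1): no subtree walk, but it may lag by the cached delta. */
long spill_read_recursive(const struct spill_group *g)
{
    return g->recursive;
}

/* 100 increments in a child: three flushes of 33 reach the root. */
long spill_demo(void)
{
    struct spill_group root = { 0 };
    struct spill_group child = { .parent = &root };

    for (int i = 0; i < 100; i++)
        spill_mod_state(&child, 1);
    return spill_read_recursive(&root);  /* last increment is still cached */
}
```

    The trade-off in this model is that a read can miss up to SPILL_BATCH
    units per cache slot, in exchange for not walking the subtree.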
    
    This patch (of 4):
    
    memcg_page_state(), lruvec_page_state(), and memcg_sum_events()
    currently return the state of the local memcg or lruvec, not the
    recursive state.
    
    In practice there is a demand for both versions, although the callers
    that want the recursive counts currently sum them up by hand.
    
    By default, cgroups are considered recursive entities, and we generally
    expect more users of the recursive counters, with the local counts being
    special cases.  To reflect that in the names, add a _local suffix to the
    current implementations.
    
    The following patch will re-incarnate these functions with recursive
    semantics, but with an O(1) implementation.
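
    To make the local/recursive distinction concrete, here is a small
    userspace sketch.  The struct and function names are hypothetical
    stand-ins, not the signatures in memcontrol.c; the second function
    shows the kind of by-hand summation that callers wanting recursive
    counts have to do today.

```c
#include <stddef.h>

#define Q_MAX_CHILDREN 4

/* Hypothetical model of a cgroup with a few children (not kernel code). */
struct q_group {
    long count;                                /* charged to this group only */
    struct q_group *children[Q_MAX_CHILDREN];
    size_t nr_children;
};

/* Local query: only what was charged to this group itself. */
long q_state_local(const struct q_group *g)
{
    return g->count;
}

/* Recursive query done by hand: cost grows with the size of the subtree. */
long q_state_sum_by_hand(const struct q_group *g)
{
    long sum = q_state_local(g);

    for (size_t i = 0; i < g->nr_children; i++)
        sum += q_state_sum_by_hand(g->children[i]);
    return sum;
}

/* Parent with 10 of its own and two leaves holding 3 and 4. */
long q_demo(int want_recursive)
{
    struct q_group leaf_a = { .count = 3 };
    struct q_group leaf_b = { .count = 4 };
    struct q_group parent = {
        .count = 10,
        .children = { &leaf_a, &leaf_b },
        .nr_children = 2,
    };

    return want_recursive ? q_state_sum_by_hand(&parent)
                          : q_state_local(&parent);
}
```

    In this model the local query on the parent yields 10 and the by-hand
    recursive sum yields 17.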
    
    [hannes@cmpxchg.org: fix bisection hole]
      Link: http://lkml.kernel.org/r/20190417160347.GC23013@cmpxchg.org
    Link: http://lkml.kernel.org/r/20190412151507.2769-2-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Reviewed-by: Roman Gushchin <guro@fb.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>