    mm: memcg: factor out memcg- and lruvec-level changes out of __mod_lruvec_state() · eedc4e5a
    Patch series "The new cgroup slab memory controller", v7.
    
    The patchset moves the accounting from the page level to the object
    level.  It allows slab pages to be shared between memory cgroups.
    This leads to a significant win in slab utilization (up to 45%) and a
    corresponding drop in the total kernel memory footprint.  The reduced
    number of unmovable slab pages should also have a positive effect on
    memory fragmentation.
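
    As an illustration of the idea, here is a simplified sketch (the
    page_meta_* structures are illustrative only, not the exact kernel
    data layout; the obj_cgroup type is introduced later in this series):

    	struct mem_cgroup;
    	struct obj_cgroup;	/* byte-charged handle to a memcg */

    	/* Before: a whole slab page is charged to one memory cgroup. */
    	struct page_meta_old {
    		struct mem_cgroup *mem_cgroup;	/* one owner per page */
    	};

    	/* After: each object slot records its own owner, so a single
    	 * slab page can hold objects of different memory cgroups. */
    	struct page_meta_new {
    		struct obj_cgroup **obj_cgroups; /* one per object */
    	};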
    
    The patchset makes the slab accounting code simpler: there is no
    longer a need for the complicated dynamic creation and destruction of
    per-cgroup slab caches; all memory cgroups use a global set of shared
    slab caches.  The lifetime of slab caches is no longer tied to the
    lifetime of memory cgroups.
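
    To illustrate, a sketch of how an accounted allocation changes
    (kernel context assumed; the accounted_alloc_*() wrappers are
    illustrative only, error handling is elided, and obj_cgroup_charge()
    is added later in this series):

    	/* Old scheme: redirect to a dynamically created per-memcg
    	 * clone of the cache. */
    	void *accounted_alloc_old(struct kmem_cache *cachep, gfp_t flags)
    	{
    		cachep = memcg_kmem_get_cache(cachep);
    		return kmem_cache_alloc(cachep, flags);
    	}

    	/* New scheme: all cgroups share one global cache and the
    	 * charge is applied per object. */
    	void *accounted_alloc_new(struct kmem_cache *cachep, gfp_t flags,
    				  struct obj_cgroup *objcg, size_t size)
    	{
    		void *obj = kmem_cache_alloc(cachep, flags);

    		if (obj)
    			obj_cgroup_charge(objcg, flags, size);
    		return obj;
    	}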
    
    The more precise accounting does require more CPU, but in practice
    the difference seems to be negligible.  We've been using the new slab
    controller in Facebook production for several months with different
    workloads and haven't seen any noticeable regressions.  What we have
    seen are memory savings on the order of 1 GB per host (varying
    heavily with the actual workload, size of RAM, number of CPUs, memory
    pressure, etc).
    
    The third version of the patchset added yet another step towards
    simplifying the code: sharing slab caches between accounted and
    non-accounted allocations.  It comes with significant upsides (most
    noticeably, the complete elimination of dynamic slab cache creation),
    but not without some regression risk, so this change sits on top of
    the patchset and is not completely merged in.  In the unlikely event
    of a noticeable performance regression it can be reverted separately.
    
    The slab memory accounting works in exactly the same way for SLAB and
    SLUB.  With both allocators the new controller shows significant
    memory savings, with SLUB the difference is bigger.  On my 16-core
    desktop machine running Fedora 32, the amount of slab memory measured
    after system start was lower by 58% with SLUB and by 38% with SLAB.
    
    As an estimate of the potential CPU overhead, below are the results
    of the slab_bulk_test01 test, kindly provided by Jesper D. Brouer,
    who also helped with the evaluation of the results.
    
    The test can be found here: https://github.com/netoptimizer/prototype-kernel/
    The smallest number in each row should be picked for a comparison.
    
    SLUB-patched - bulk-API
     - SLUB-patched : bulk_quick_reuse objects=1 : 187 -  90 - 224  cycles(tsc)
     - SLUB-patched : bulk_quick_reuse objects=2 : 110 -  53 - 133  cycles(tsc)
     - SLUB-patched : bulk_quick_reuse objects=3 :  88 -  95 -  42  cycles(tsc)
     - SLUB-patched : bulk_quick_reuse objects=4 :  91 -  85 -  36  cycles(tsc)
     - SLUB-patched : bulk_quick_reuse objects=8 :  32 -  66 -  32  cycles(tsc)
    
    SLUB-original - bulk-API
     - SLUB-original: bulk_quick_reuse objects=1 :  87 -  87 - 142  cycles(tsc)
     - SLUB-original: bulk_quick_reuse objects=2 :  52 -  53 -  53  cycles(tsc)
     - SLUB-original: bulk_quick_reuse objects=3 :  42 -  42 -  91  cycles(tsc)
     - SLUB-original: bulk_quick_reuse objects=4 :  91 -  37 -  37  cycles(tsc)
     - SLUB-original: bulk_quick_reuse objects=8 :  31 -  79 -  76  cycles(tsc)
    
    SLAB-patched - bulk-API
     - SLAB-patched : bulk_quick_reuse objects=1 :  67 -  67 - 140  cycles(tsc)
     - SLAB-patched : bulk_quick_reuse objects=2 :  55 -  46 -  46  cycles(tsc)
     - SLAB-patched : bulk_quick_reuse objects=3 :  93 -  94 -  39  cycles(tsc)
     - SLAB-patched : bulk_quick_reuse objects=4 :  35 -  88 -  85  cycles(tsc)
     - SLAB-patched : bulk_quick_reuse objects=8 :  30 -  30 -  30  cycles(tsc)
    
    SLAB-original - bulk-API
     - SLAB-original: bulk_quick_reuse objects=1 : 143 - 136 -  67  cycles(tsc)
     - SLAB-original: bulk_quick_reuse objects=2 :  45 -  46 -  46  cycles(tsc)
     - SLAB-original: bulk_quick_reuse objects=3 :  38 -  39 -  39  cycles(tsc)
     - SLAB-original: bulk_quick_reuse objects=4 :  35 -  87 -  87  cycles(tsc)
     - SLAB-original: bulk_quick_reuse objects=8 :  29 -  66 -  30  cycles(tsc)
    
    This patch (of 19):
    
    To convert the memcg and lruvec slab counters to bytes, there must be
    a way to change these counters without touching the node counters.
    Factor __mod_memcg_lruvec_state() out of __mod_lruvec_state().
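
    A sketch of the resulting split (simplified: per-cpu batching and the
    hierarchical propagation of the counters are elided):

    	void __mod_memcg_lruvec_state(struct lruvec *lruvec,
    				      enum node_stat_item idx, int val)
    	{
    		struct mem_cgroup_per_node *pn;

    		pn = container_of(lruvec, struct mem_cgroup_per_node,
    				  lruvec);

    		/* Update memcg */
    		__mod_memcg_state(pn->memcg, idx, val);

    		/* Update per-memcg lruvec counter */
    		__this_cpu_add(pn->lruvec_stat_local->count[idx], val);
    	}

    	void __mod_lruvec_state(struct lruvec *lruvec,
    				enum node_stat_item idx, int val)
    	{
    		/* Update node counters */
    		__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);

    		/* Update memcg and per-memcg lruvec counters */
    		if (!mem_cgroup_disabled())
    			__mod_memcg_lruvec_state(lruvec, idx, val);
    	}
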
    Signed-off-by: Roman Gushchin <guro@fb.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Link: http://lkml.kernel.org/r/20200623174037.3951353-1-guro@fb.com
    Link: http://lkml.kernel.org/r/20200623174037.3951353-2-guro@fb.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>