    percpu: return number of released bytes from pcpu_free_area() · 5b32af91
    Roman Gushchin authored
    Patch series "mm: memcg accounting of percpu memory", v3.
    
    This patchset adds percpu memory accounting to memory cgroups.  It's based
    on the rework of the slab controller and reuses concepts and features
    introduced for the per-object slab accounting.
    
    Percpu memory is becoming more and more widely used by various subsystems,
    and the total amount of memory controlled by the percpu allocator can make
    up a significant part of the total memory.
    
    As an example, bpf maps can consume a lot of percpu memory, and they are
    created by a user.  Also, some cgroup internals (e.g.  memory controller
    statistics) can be quite large.  On a machine with many CPUs and a large
    number of cgroups they can consume hundreds of megabytes.
    
    So the lack of memcg accounting creates a breach in memory isolation.
    Like slab memory, percpu memory should be accounted by default.
    
    Percpu allocations by their nature are scattered over multiple pages, so
    they can't be tracked on a per-page basis.  Instead, the per-object
    tracking introduced by the new slab controller is reused.
    
    The patchset implements charging of percpu allocations, adds memcg-level
    statistics, enables accounting for percpu allocations made by memory
    cgroup internals and provides some basic tests.
    
    To implement the accounting of percpu memory without a significant memory
    and performance overhead the following approach is used: all accounted
    allocations are placed into a separate percpu chunk (or chunks).  These
    chunks are similar to default chunks, except that they have an attached
    vector of pointers to obj_cgroup objects, which is big enough to hold a
    pointer for each allocated object.  On allocation, if the allocation
    has to be accounted (__GFP_ACCOUNT is passed, the allocating process
    belongs to a non-root memory cgroup, etc), the memory cgroup is charged
    and, if the maximum limit is not exceeded, the allocation is performed
    using a memcg-aware chunk.  Otherwise -ENOMEM is returned or the
    allocation is forced over the limit, depending on gfp (as with any other
    kernel memory allocation).  The memory cgroup information is saved in the
    obj_cgroup vector at the corresponding offset.  At release time the
    memcg information is read back from the vector and the cgroup is
    uncharged.  Unaccounted allocations (at this point the absolute majority
    of all percpu allocations) are performed in the old way, so no additional
    overhead is expected.
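    
    The charge-on-allocation and uncharge-on-release flow described above can
    be sketched as a small userspace model.  This is only an illustration under
    stated assumptions: chunk_model, cgroup_model, model_alloc() and
    model_free() are hypothetical names, not kernel APIs, the allocator is a
    plain bump pointer that never reuses space, and the charge is the raw byte
    count rather than whatever the kernel actually charges per CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define UNIT_BYTES  4   /* assumed minimum allocation unit */
#define CHUNK_UNITS 16  /* assumed chunk capacity, in units */

struct cgroup_model {
    size_t charged;     /* bytes currently charged */
    size_t limit;       /* maximum bytes allowed */
};

struct chunk_model {
    /* Bytes of the allocation starting at each unit, 0 if none starts there. */
    size_t len[CHUNK_UNITS];
    /* Stand-in for the obj_cgroup vector: one owner slot per unit offset. */
    struct cgroup_model *owner[CHUNK_UNITS];
    size_t next_free;   /* bump pointer, in units */
};

/*
 * Charge the cgroup first, then record the owner at the allocation's offset.
 * Returns the unit offset, or -1 if the chunk is full or the limit would be
 * exceeded (the latter models the -ENOMEM case).
 */
static int model_alloc(struct chunk_model *c, struct cgroup_model *cg,
                       size_t bytes)
{
    size_t units = (bytes + UNIT_BYTES - 1) / UNIT_BYTES;
    size_t off = c->next_free;

    if (units == 0 || off + units > CHUNK_UNITS)
        return -1;
    if (cg->charged + bytes > cg->limit)
        return -1;

    cg->charged += bytes;
    c->len[off] = bytes;
    c->owner[off] = cg;
    c->next_free += units;
    return (int)off;
}

/* Read the owner back from the vector and uncharge it. */
static void model_free(struct chunk_model *c, int off)
{
    struct cgroup_model *cg = c->owner[off];

    assert(cg != NULL);
    cg->charged -= c->len[off];
    c->owner[off] = NULL;
    c->len[off] = 0;
}
```

    In this model the free path needs no argument identifying the cgroup:
    the owner is recovered purely from the offset, which is the property the
    attached vector is meant to provide.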
    
    To avoid pinning dying memory cgroups by outstanding allocations, the
    obj_cgroup API is used instead of directly saving memory cgroup pointers.
    obj_cgroup is basically a pointer to a memory cgroup with a standalone
    reference counter.  The trick is that it can be atomically swapped to
    point at the parent cgroup, so that the original memory cgroup can be
    released before all the objects that have been charged to it.  Because all
    charges and statistics are fully recursive, it's perfectly correct to
    uncharge the parent cgroup instead.  This scheme is used in the slab
    memory accounting, and percpu memory can just follow the scheme.
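    
    A minimal single-threaded sketch of why uncharging the parent is correct
    when charges are recursive.  The names memcg_model, objcg_model and the
    helper functions are hypothetical and chosen for illustration; the real
    swap is atomic, whereas here it is a plain pointer assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; these are not the kernel's structures. */
struct memcg_model {
    struct memcg_model *parent;
    long usage;                 /* recursive: includes all descendants */
};

struct objcg_model {
    struct memcg_model *memcg;  /* current target, may be swapped to parent */
    int refcnt;                 /* one reference per outstanding object */
};

/* Apply a (possibly negative) charge to a cgroup and all its ancestors. */
static void charge_hier(struct memcg_model *m, long bytes)
{
    for (; m != NULL; m = m->parent)
        m->usage += bytes;
}

static void objcg_charge(struct objcg_model *o, long bytes)
{
    o->refcnt++;
    charge_hier(o->memcg, bytes);
}

static void objcg_uncharge(struct objcg_model *o, long bytes)
{
    charge_hier(o->memcg, -bytes);
    o->refcnt--;
}

/*
 * Re-point the objcg at the parent of a dying cgroup.  The parent's usage
 * already contains the child's charges, so later uncharges through the
 * objcg stay balanced without touching the dying cgroup again.
 */
static void objcg_reparent(struct objcg_model *o)
{
    assert(o->memcg->parent != NULL);
    o->memcg = o->memcg->parent;
}
```

    After objcg_reparent() the outstanding objects still hold references to
    the objcg, not to the dying cgroup, so in this model nothing keeps the
    child alive.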
    
    This patch (of 5):
    
    To implement accounting of percpu memory we need the information about the
    size of the freed object.  Return it from pcpu_free_area().
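    
    As an illustration of the interface change only, here is a toy free
    routine that reports how many bytes it released.  It is a sketch, not the
    code in percpu.c: area_map, area_mark(), area_free(), MIN_ALLOC and
    MAP_BITS are assumed names and values, and stale boundary marks are left
    in place for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_ALLOC 4    /* assumed bytes per map slot */
#define MAP_BITS  16   /* assumed number of slots in the toy chunk */

/* Hypothetical map: which slots are in use and where areas start or end. */
struct area_map {
    bool allocated[MAP_BITS];
    bool boundary[MAP_BITS + 1];
};

/* Mark `units` slots starting at `off` as one allocated area. */
static void area_mark(struct area_map *m, size_t off, size_t units)
{
    assert(units > 0 && off + units <= MAP_BITS);
    for (size_t i = off; i < off + units; i++)
        m->allocated[i] = true;
    m->boundary[off] = true;
    m->boundary[off + units] = true;
}

/*
 * Free the area starting at `off` and return the number of bytes released,
 * so that a caller can uncharge exactly that amount.
 */
static size_t area_free(struct area_map *m, size_t off)
{
    size_t end = off + 1;

    /* The area runs up to the next boundary mark (or the end of the map). */
    while (end < MAP_BITS && !m->boundary[end])
        end++;
    for (size_t i = off; i < end; i++)
        m->allocated[i] = false;
    return (end - off) * MIN_ALLOC;
}
```

    A caller that tracks charges can then pass the returned size straight to
    its uncharge step instead of having to remember the size separately.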
    
    Signed-off-by: Roman Gushchin <guro@fb.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Dennis Zhou <dennis@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Tobin C. Harding <tobin@kernel.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Waiman Long <longman@redhat.com>
    Cc: Bixuan Cui <cuibixuan@huawei.com>
    Cc: Michal Koutný <mkoutny@suse.com>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Link: http://lkml.kernel.org/r/20200623184515.4132564-1-guro@fb.com
    Link: http://lkml.kernel.org/r/20200608230819.832349-1-guro@fb.com
    Link: http://lkml.kernel.org/r/20200608230819.832349-2-guro@fb.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
percpu.c 91.5 KB