    mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred} · 307becec
    Qi Zheng authored
    Currently, we maintain two linear arrays per node per memcg, which are
    shrinker_info::map and shrinker_info::nr_deferred. And we need to resize
    them when the shrinker_nr_max is exceeded, that is, allocate a new array,
    and then copy the old array to the new array, and finally free the old
    array by RCU.
    
    For shrinker_info::map, we do set_bit() under the RCU lock, so we may set
    the value into the old map which is about to be freed. This may cause the
    value set to be lost. The current solution is not to copy the old map when
    resizing, but to set all the corresponding bits in the new map to 1. This
    solves the data loss problem, but brings the overhead of more pointless
    loops while doing memcg slab shrink.
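
    The map workaround described above can be sketched in userspace C. This
    is only an illustration, not the kernel code: flat_map_expand() and
    flat_map_test() are hypothetical names, and a plain byte array stands in
    for the real bitmap. The old range is filled with ones instead of being
    copied, so a bit set late in the stale map is never missed, at the cost
    of visiting shrinkers that have nothing to reclaim.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for resizing a flat shrinker map.
 * The old contents are NOT copied: every bit in the old range is set
 * to 1 (so no concurrent set_bit() on the stale map can be lost), and
 * the newly added range starts out as 0.
 */
static unsigned char *flat_map_expand(size_t old_bytes, size_t new_bytes)
{
	unsigned char *map;

	if (new_bytes < old_bytes)
		return NULL;

	map = malloc(new_bytes);
	if (!map)
		return NULL;

	memset(map, 0xff, old_bytes);
	memset(map + old_bytes, 0, new_bytes - old_bytes);
	return map;
}

/* Return 1 if bit @id is set in @map, 0 otherwise. */
static int flat_map_test(const unsigned char *map, size_t id)
{
	assert(map != NULL);
	return (map[id / CHAR_BIT] >> (id % CHAR_BIT)) & 1;
}
```

    Every id in the old range reads back as set after the resize, which is
    exactly the source of the pointless loops mentioned above.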
    
    For shrinker_info::nr_deferred, we will only modify it under the read lock
    of shrinker_rwsem, so it will not run concurrently with the resizing. But
    after we make memcg slab shrink lockless, there will be the same data loss
    problem as shrinker_info::map, and we can't work around it like the map.
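
    Why the all-ones trick does not carry over to nr_deferred can be seen in
    a small single-threaded simulation. The helper counters_expand_copy() is
    a hypothetical name, and the "concurrent" update is simply performed
    after the copy to model a writer that still holds the old array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for resizing a flat nr_deferred array by
 * allocating a larger array and copying the old values into it.
 * Any update applied to @old after this returns is invisible in the
 * returned array, and a counter has no "safe" fill value to fall back on.
 */
static long *counters_expand_copy(const long *old, size_t old_n, size_t new_n)
{
	long *grown;

	assert(old != NULL || old_n == 0);
	if (new_n < old_n)
		return NULL;

	grown = calloc(new_n, sizeof(*grown));
	if (!grown)
		return NULL;

	if (old_n)
		memcpy(grown, old, old_n * sizeof(*grown));
	return grown;
}
```

    If a writer adds to old[0] after the copy, the new array keeps the value
    from before the update, so the deferred work is silently dropped once
    the old array is freed.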
    
    For such resizable arrays, the most straightforward idea is to convert
    them to an xarray, like we did for list_lru [1]. We would then need to do
    xa_store() in the list_lru_add()-->set_shrinker_bit() path, but this can
    allocate memory, and list_lru_add() cannot tolerate failure. A possible
    solution is to pre-allocate, but there is no well-defined place to do the
    pre-allocation (such as in the deferred_split_shrinker case).
    
    Therefore, this commit chooses to introduce the following secondary array
    for shrinker_info::{map, nr_deferred}:
    
    +---------------+--------+--------+-----+
    | shrinker_info | unit 0 | unit 1 | ... | (secondary array)
    +---------------+--------+--------+-----+
                         |
                         v
                    +---------------+-----+
                    | nr_deferred[] | map | (leaf array)
                    +---------------+-----+
                    (shrinker_info_unit)
    
    The leaf array is never freed unless the memcg is destroyed. The secondary
    array will be resized every time the shrinker id exceeds shrinker_nr_max.
    So the shrinker_info_unit can be indexed from both the old and the new
    shrinker_info->unit[x]. Then even if we get the old secondary array under
    the RCU lock, the map and nr_deferred found through it are still valid,
    so the updated nr_deferred and map will not be lost.
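
    The two-level layout can be sketched in userspace C as follows. This is
    a minimal illustration under stated assumptions, not the code in
    shrinker.c: struct info, struct info_unit, UNIT_BITS and the helpers are
    hypothetical names, plain longs replace atomics, and RCU-deferred freeing
    of the old secondary array is left to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Assumed leaf size: one unsigned long worth of map bits per unit. */
#define UNIT_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Leaf array: allocated once and shared by old and new secondary arrays. */
struct info_unit {
	long nr_deferred[UNIT_BITS];
	unsigned long map;
};

/* Secondary array: a resizable table of pointers to leaf arrays. */
struct info {
	int nr_units;
	struct info_unit *unit[];
};

/*
 * Grow the secondary array. Existing leaf pointers are copied, so the
 * leaves themselves never move; only the new slots get fresh leaves.
 * The caller remains responsible for freeing @old later.
 */
static struct info *info_expand(const struct info *old, int new_nr_units)
{
	int old_nr = old ? old->nr_units : 0;
	struct info *grown;
	int i;

	if (new_nr_units < old_nr)
		return NULL;

	grown = calloc(1, sizeof(*grown) +
			  (size_t)new_nr_units * sizeof(grown->unit[0]));
	if (!grown)
		return NULL;
	grown->nr_units = new_nr_units;

	for (i = 0; i < old_nr; i++)
		grown->unit[i] = old->unit[i];
	for (; i < new_nr_units; i++) {
		grown->unit[i] = calloc(1, sizeof(struct info_unit));
		if (!grown->unit[i]) {
			/* Undo only the leaves allocated by this call. */
			while (--i >= old_nr)
				free(grown->unit[i]);
			free(grown);
			return NULL;
		}
	}
	return grown;
}

/* Set the map bit for shrinker @id through any secondary array. */
static void info_set_bit(struct info *info, int id)
{
	assert(id >= 0 && id / UNIT_BITS < info->nr_units);
	info->unit[id / UNIT_BITS]->map |= 1UL << (id % UNIT_BITS);
}

/* Return 1 if the map bit for shrinker @id is set, 0 otherwise. */
static int info_test_bit(const struct info *info, int id)
{
	assert(id >= 0 && id / UNIT_BITS < info->nr_units);
	return (info->unit[id / UNIT_BITS]->map >> (id % UNIT_BITS)) & 1;
}

/* Add @nr to the deferred count of shrinker @id and return the new value. */
static long info_add_deferred(struct info *info, int id, long nr)
{
	assert(id >= 0 && id / UNIT_BITS < info->nr_units);
	info->unit[id / UNIT_BITS]->nr_deferred[id % UNIT_BITS] += nr;
	return info->unit[id / UNIT_BITS]->nr_deferred[id % UNIT_BITS];
}
```

    In this sketch, a bit set or a count added through the stale secondary
    array after info_expand() is observed through the new one as well,
    because both point at the same leaf.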
    
    [1]. https://lore.kernel.org/all/20220228122126.37293-13-songmuchun@bytedance.com/
    
    [zhengqi.arch@bytedance.com: unlock the &shrinker_rwsem before the call to free_shrinker_info()]
      Link: https://lkml.kernel.org/r/20230928141517.12164-1-zhengqi.arch@bytedance.com
    Link: https://lkml.kernel.org/r/20230911094444.68966-41-zhengqi.arch@bytedance.com
    Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
    Cc: Alasdair Kergon <agk@redhat.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Andreas Gruenbacher <agruenba@redhat.com>
    Cc: Anna Schumaker <anna@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Bob Peterson <rpeterso@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Carlos Llamas <cmllamas@google.com>
    Cc: Chandan Babu R <chandan.babu@oracle.com>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Chris Mason <clm@fb.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Christian Koenig <christian.koenig@amd.com>
    Cc: Chuck Lever <cel@kernel.org>
    Cc: Coly Li <colyli@suse.de>
    Cc: Dai Ngo <Dai.Ngo@oracle.com>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: "Darrick J. Wong" <djwong@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Airlie <airlied@gmail.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Sterba <dsterba@suse.com>
    Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
    Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Huang Rui <ray.huang@amd.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jaegeuk Kim <jaegeuk@kernel.org>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jason Wang <jasowang@redhat.com>
    Cc: Jeff Layton <jlayton@kernel.org>
    Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
    Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Josef Bacik <josef@toxicpanda.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: Kent Overstreet <kent.overstreet@gmail.com>
    Cc: Kirill Tkhai <tkhai@ya.ru>
    Cc: Marijn Suijten <marijn.suijten@somainline.org>
    Cc: "Michael S. Tsirkin" <mst@redhat.com>
    Cc: Mike Snitzer <snitzer@kernel.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Neil Brown <neilb@suse.de>
    Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
    Cc: Olga Kornievskaia <kolga@netapp.com>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rob Clark <robdclark@gmail.com>
    Cc: Rob Herring <robh@kernel.org>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Sean Paul <sean@poorly.run>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Song Liu <song@kernel.org>
    Cc: Stefano Stabellini <sstabellini@kernel.org>
    Cc: Steven Price <steven.price@arm.com>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
    Cc: Tom Talpey <tom@talpey.com>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Cc: Yue Hu <huyue2@coolpad.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>