    writeback: move wb_over_bg_thresh() call outside lock section · 2816ea2a
    Yosry Ahmed authored
    Patch series "cgroup: eliminate atomic rstat flushing", v5.
    
    A previous patch series [1] changed most atomic rstat flushing contexts to
    become non-atomic.  This was done to prevent an expensive operation that
    scales with # cgroups and # cpus from running with irqs disabled and
    scheduling not permitted.  Two atomic flushing contexts remained after
    that series.  This series eliminates them as well, removing atomic rstat
    flushing completely.
    
    The two remaining atomic flushing contexts are:
    (a) wb_over_bg_thresh()->mem_cgroup_wb_stats()
    (b) mem_cgroup_threshold()->mem_cgroup_usage()
    
    For (a), flushing needs to be atomic as wb_writeback() calls
    wb_over_bg_thresh() with a spinlock held.  However, it seems like the call
    to wb_over_bg_thresh() doesn't need to be protected by that spinlock, so
    this series proposes a refactoring that moves the call outside the lock
    critical section and makes the stats flushing in mem_cgroup_wb_stats()
    non-atomic.
    
    For (b), flushing needs to be atomic as mem_cgroup_threshold() is called
    with irqs disabled.  We only flush the stats when calculating the root
    usage, as it is approximated as the sum of some memcg stats (file, anon,
    and optionally swap) instead of the conventional page counter.  This
    series proposes changing this calculation to use the global stats instead,
    eliminating the need for a memcg stat flush.
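
    The idea behind (b) can be sketched as a toy C model (the names, numbers,
    and flush function below are illustrative stand-ins, not the kernel
    implementation): global counters are always readable, while per-memcg
    counters must first be flushed to be accurate, so deriving root usage
    from global counters removes the need to flush at all.

```c
#include <assert.h>

/* Toy model.  Global counters (like global vmstat) are always current,
 * while per-memcg counters have pending per-cpu updates that must be
 * flushed before a read is accurate. */
static long pcpu_pending = 7;              /* updates not yet flushed  */
static long memcg_file = 93, memcg_anon = 50;
static long global_file = 100, global_anon = 50;

static int flushes;                        /* expensive flush counter  */

static void flush_memcg_stats(void)        /* stand-in for rstat flush */
{
    memcg_file += pcpu_pending;
    pcpu_pending = 0;
    flushes++;
}

/* Old scheme: approximate root usage from per-memcg stats; an atomic
 * flush is required first, since this runs with irqs disabled. */
static long root_usage_old(void)
{
    flush_memcg_stats();
    return memcg_file + memcg_anon;
}

/* New scheme: read the global counters directly; no flush required. */
static long root_usage_new(void)
{
    return global_file + global_anon;
}
```

    Both paths yield the same approximation; only the old one pays for a
    flush with irqs disabled.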
    
    After these 2 contexts are eliminated, we no longer need
    mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic().  We can
    remove them and simplify the code.
    
    [1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/
    
    
    This patch (of 5):
    
    wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat
    flush, which can be expensive on large systems. Currently,
    wb_writeback() calls wb_over_bg_thresh() within a lock section, so we
    have to do the rstat flush atomically.  On systems with many cpus
    and/or cgroups, this can keep irqs disabled for a long time,
    potentially causing latency problems.
    
    Move the call to wb_over_bg_thresh() outside the lock section in
    preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic.
    The list_empty(&wb->work_list) check should be safe outside the
    critical section of wb->list_lock, as the work list is protected by a
    separate lock (wb->work_lock), and wb_over_bg_thresh() does not appear
    to modify any of the wb->b_* lists that wb->list_lock protects.
    Also, the loop already releases and reacquires the lock on each
    iteration, so this refactoring looks safe.
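
    The shape of the refactoring can be illustrated with a toy
    single-threaded C model (this is not the kernel code; the lock, the
    budget counter, and the function names are stand-ins): the expensive
    check moves from inside the critical section to before the lock is
    taken, which the loop's existing release/reacquire pattern permits.

```c
#include <assert.h>

static int lock_held;         /* models wb->list_lock being held      */
static int atomic_checks;     /* expensive checks done with lock held */
static int budget;            /* makes the toy loops terminate        */

static void lock(void)   { lock_held = 1; }
static void unlock(void) { lock_held = 0; }

/* Stand-in for wb_over_bg_thresh(): triggers an rstat flush, which we
 * would like to run with the lock dropped (non-atomically). */
static int over_bg_thresh(void)
{
    if (lock_held)
        atomic_checks++;
    return --budget > 0;
}

/* Old shape (simplified): the check runs inside the critical section,
 * so the flush it triggers must be atomic. */
static void writeback_old(void)
{
    lock();
    for (;;) {
        if (!over_bg_thresh())
            break;
        /* ...queue and write back inodes... */
        unlock();             /* the loop already drops the lock ...  */
        lock();               /* ...and retakes it each iteration     */
    }
    unlock();
}

/* New shape: check before taking the lock.  Safe here because the
 * check touches nothing the lock protects. */
static void writeback_new(void)
{
    for (;;) {
        if (!over_bg_thresh())   /* checked with the lock dropped */
            break;
        lock();
        /* ...queue and write back inodes... */
        unlock();
    }
}
```

    In the old shape every threshold check (and hence every flush) runs
    under the lock; in the new shape none does, so the flush can sleep.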
    
    Link: https://lkml.kernel.org/r/20230421174020.2994750-1-yosryahmed@google.com
    Link: https://lkml.kernel.org/r/20230421174020.2994750-2-yosryahmed@google.com
    Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
    Reviewed-by: Michal Koutný <mkoutny@suse.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>