    mm/zswap: add more comments in shrink_memcg_cb() · f9c0f1c3
    Chengming Zhou authored
    Patch series "mm/zswap: optimize zswap lru list", v2.
    
    This series is motivated by observing the zswap lru list shrinking: tracing
    showed some unexpected return values from zswap_writeback_entry().
    
    bpftrace -e 'kr:zswap_writeback_entry {@[(int32)retval]=count()}'
    
    There are some -ENOMEM failures because when a swap entry is freed to the
    per-cpu swap pool, the corresponding zswap entry is not invalidated/dropped.
    When the shrinker later encounters these stale zswap entries, they can't be
    reclaimed and it returns -ENOMEM.
    
    So move the invalidation ahead to the point where the swap entry is freed to
    the per-cpu swap pool, since there is no benefit to leaving stale zswap
    entries on the zswap tree and lru list.
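    As a rough illustration of the idea (not the exact patch), the invalidation
    would be hooked into the swap-slot free path.  The sketch below assumes the
    free_swap_slot() helper from mm/swap_slots.c and a zswap_invalidate() that
    takes the swap entry; the real call sites and signatures may differ between
    kernel versions:

    /*
     * Sketch only: drop the compressed copy as soon as the swap slot goes
     * back to the per-cpu cache, so the shrinker never sees a dead entry.
     * Names and signatures here are assumptions for illustration.
     */
    void free_swap_slot(swp_entry_t entry)
    {
            struct swap_slots_cache *cache;

            /* The compressed copy can never be loaded again -- drop it now. */
            zswap_invalidate(entry);

            cache = raw_cpu_ptr(&swp_slots);
            /* ... return the slot to the per-cpu cache as before ... */
    }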
    
    Another case is -EEXIST, which is seen more often with
    !zswap_exclusive_loads_enabled, in which case a swapped-in folio leaves its
    compressed copy on the tree and lru list, and that copy can't be reclaimed
    until the folio is removed from the swapcache.
    
    Switching to zswap_exclusive_loads_enabled mode invalidates the entry on
    folio swapin, which has its own drawback: if that folio is still clean in
    the swapcache and gets swapped out again, we need to compress it again.
    Please see the relevant commit for details on why we choose exclusive load
    as the default for zswap.
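    For orientation, the exclusive-load path in zswap_load() looks roughly like
    the sketch below (condensed from the upstream code of that period; the
    helper name zswap_invalidate_entry() and the surrounding variables are
    assumptions here, not a verbatim copy):

    bool zswap_load(struct folio *folio)
    {
            /* ... look up the entry in the tree and decompress it ... */

            if (zswap_exclusive_loads_enabled) {
                    /*
                     * Exclusive load: drop the compressed copy now and mark
                     * the folio dirty so a later swapout recompresses it.
                     */
                    zswap_invalidate_entry(tree, entry);
                    folio_mark_dirty(folio);
            }

            /* ... */
            return true;
    }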
    
    Another optimization for -EEXIST is that we add LRU_STOP to support
    terminating the shrinking process, to avoid evicting a warmer region.
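    Conceptually, LRU_STOP is one more return value for the list_lru walk
    callback.  A minimal sketch of the idea (the enum comments are paraphrased
    and the exact placement inside shrink_memcg_cb() is a simplification):

    /* include/linux/list_lru.h: LRU_STOP added alongside the existing codes. */
    enum lru_status {
            LRU_REMOVED,            /* item removed from list */
            LRU_REMOVED_RETRY,      /* item removed, but lock was dropped */
            LRU_ROTATE,             /* item referenced, give another pass */
            LRU_SKIP,               /* item cannot be locked, skip */
            LRU_RETRY,              /* item not freeable, retry later */
            LRU_STOP,               /* stop walking, the rest is warmer */
    };

    /*
     * In shrink_memcg_cb() (simplified): hitting a folio already in the
     * swapcache (-EEXIST) suggests we are shrinking into the warmer region,
     * so stop the walk instead of evicting further.
     */
    if (writeback_result == -EEXIST)
            return LRU_STOP;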
    
    Testing was done with a kernel build in tmpfs, one 50GB swapfile and the
    zswap shrinker_enabled, with memory.max set to 2GB.
    
                    mm-unstable   zswap-optimize
    real               63.90s       63.25s
    user             1064.05s     1063.40s
    sys               292.32s      270.94s
    
    The main improvement is in sys CPU time, about 7%.
    
    
    This patch (of 6):
    
    Add more comments in shrink_memcg_cb() to describe the deref dance that is
    implemented to fix the race between lru writeback and swapoff, and the
    reason why we rotate the entry at the beginning.
    
    Also fix the stale comments in zswap_writeback_entry(), and add more
    comments to state that we only deref the tree after we get the swapcache
    reference.
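    The shape of that dance in shrink_memcg_cb() is roughly the following
    (condensed sketch; locking details, the return-status handling and some
    variable names are simplified from the real callback):

    static enum lru_status shrink_memcg_cb(struct list_head *item,
                                           struct list_lru_one *l,
                                           spinlock_t *lock, void *arg)
    {
            struct zswap_entry *entry = container_of(item, struct zswap_entry, lru);
            swp_entry_t swpentry;

            /*
             * Rotate first: if writeback fails or races with an
             * invalidation, the entry is already out of the way of
             * concurrent reclaimers.
             */
            list_move_tail(item, &l->list);

            /*
             * Copy swpentry to the stack before dropping the lru lock;
             * once the lock is dropped the entry may be freed by swapoff
             * or invalidation, so it must not be dereferenced again until
             * the swapcache reference is held and the entry is verified
             * to still be alive in the tree.
             */
            swpentry = entry->swpentry;
            spin_unlock(lock);

            /* ... grab the swapcache folio, only then deref the tree ... */
    }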
    
    Link: https://lkml.kernel.org/r/20240201-b4-zswap-invalidate-entry-v2-0-99d4084260a0@bytedance.com
    Link: https://lkml.kernel.org/r/20240201-b4-zswap-invalidate-entry-v2-1-99d4084260a0@bytedance.com
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
    Suggested-by: Yosry Ahmed <yosryahmed@google.com>
    Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Yosry Ahmed <yosryahmed@google.com>
    Reviewed-by: Nhat Pham <nphamcs@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>