    list_lru: fix broken LRU_RETRY behaviour · 5cedf721
    Dave Chinner authored
    The LRU_RETRY code assumes that the list traversal state is still valid
    after we have dropped and regained the list lock.  Unfortunately, this is
    not a valid assumption, and that can lead to racing traversals isolating
    objects that the other traversal expects to be the next item on the list.
    
    This is causing problems with the inode cache shrinker isolation, with
    races resulting in an inode on a dispose list being "isolated" because a
    racing traversal still thinks it is on the LRU.  The inode is then never
    reclaimed and that causes hangs if a subsequent lookup on that inode
    occurs.
    
    Fix it by always restarting the list walk on an LRU_RETRY return from the
    isolate callback.  The current code was trying to avoid livelocks; keep
    that guarantee by always decrementing the nr_to_walk counter on retries,
    so that even if we keep hitting the same item on the list we will
    eventually stop walking and exit the situation causing the problem.
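    
    The restart-and-decrement behaviour described above can be sketched in
    userspace C.  This is a minimal model under stated assumptions, not the
    kernel code: struct item, struct lru, lru_walk() and demo_isolate() are
    hypothetical names, there is no locking, and the walker unlinks removed
    items itself to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's list_lru types. */
enum lru_status { LRU_REMOVED, LRU_SKIP, LRU_RETRY };

struct item {
    struct item *prev, *next;
    int busy;               /* how many more times isolation must be retried */
};

struct lru {
    struct item head;       /* sentinel of a circular doubly linked list */
    long nr_items;
};

typedef enum lru_status (*isolate_cb)(struct item *it, void *arg);

static void lru_init(struct lru *l)
{
    l->head.prev = l->head.next = &l->head;
    l->head.busy = 0;
    l->nr_items = 0;
}

/* Append an item at the tail of the list. */
static void lru_add(struct lru *l, struct item *it)
{
    it->prev = l->head.prev;
    it->next = &l->head;
    l->head.prev->next = it;
    l->head.prev = it;
    l->nr_items++;
}

/* Unlink an item and leave it pointing at itself. */
static void lru_del(struct lru *l, struct item *it)
{
    it->prev->next = it->next;
    it->next->prev = it->prev;
    it->prev = it->next = it;
    l->nr_items--;
}

/*
 * Walk the list, spending one unit of *nr_to_walk per callback invocation,
 * including invocations that end in LRU_RETRY. Returns the number of items
 * removed from the list.
 */
static long lru_walk(struct lru *l, isolate_cb cb, void *arg, long *nr_to_walk)
{
    struct item *it, *next;
    long isolated = 0;

    assert(cb != NULL);
restart:
    for (it = l->head.next; it != &l->head; it = next) {
        next = it->next;    /* saved so the current item can be unlinked */

        /* Charge the budget first so repeated retries cannot livelock. */
        if (*nr_to_walk == 0)
            break;
        --*nr_to_walk;

        switch (cb(it, arg)) {
        case LRU_REMOVED:
            lru_del(l, it);
            isolated++;
            break;
        case LRU_SKIP:
            break;
        case LRU_RETRY:
            /*
             * In the real code the callback dropped the list lock, so
             * the saved `next` may be stale. Start again from the head.
             */
            goto restart;
        }
    }
    return isolated;
}

/* Example callback: ask for a retry while the item is marked busy. */
static enum lru_status demo_isolate(struct item *it, void *arg)
{
    (void)arg;
    if (it->busy > 0) {
        it->busy--;
        return LRU_RETRY;
    }
    return LRU_REMOVED;
}
```

    In this model every LRU_RETRY costs one unit of the walk budget, so an
    item that keeps asking for a retry exhausts nr_to_walk and the walk
    returns instead of spinning on the same entry.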
    
    Reported-by: Michal Hocko <mhocko@suse.cz>
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Cc: Glauber Costa <glommer@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
list_lru.c 4.15 KB