• Zhihao Cheng's avatar
    vfs: Don't evict inode under the inode lru traversing context · 2a062983
    Zhihao Cheng authored
    The inode reclaiming process(See function prune_icache_sb) collects all
    reclaimable inodes and mark them with I_FREEING flag at first, at that
    time, other processes will be stuck if they try getting these inodes
    (See function find_inode_fast), then the reclaiming process destroy the
    inodes by function dispose_list(). Some filesystems(eg. ext4 with
    ea_inode feature, ubifs with xattr) may do inode lookup in the inode
    evicting callback function, if the inode lookup is operated under the
    inode lru traversing context, deadlock problems may happen.
    
    Case 1: In function ext4_evict_inode(), the ea inode lookup could happen
            if ea_inode feature is enabled, the lookup process will be stuck
    	under the evicting context like this:
    
     1. File A has inode i_reg and an ea inode i_ea
     2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea
     3. Then, following three processes running like this:
    
        PA                              PB
     echo 2 > /proc/sys/vm/drop_caches
      shrink_slab
       prune_dcache_sb
       // i_reg is added into lru, lru->i_ea->i_reg
       prune_icache_sb
        list_lru_walk_one
         inode_lru_isolate
          i_ea->i_state |= I_FREEING // set inode state
         inode_lru_isolate
          __iget(i_reg)
          spin_unlock(&i_reg->i_lock)
          spin_unlock(lru_lock)
                                         rm file A
                                          i_reg->nlink = 0
          iput(i_reg) // i_reg->nlink is 0, do evict
           ext4_evict_inode
            ext4_xattr_delete_inode
             ext4_xattr_inode_dec_ref_all
              ext4_xattr_inode_iget
               ext4_iget(i_ea->i_ino)
                iget_locked
                 find_inode_fast
                  __wait_on_freeing_inode(i_ea) ----→ AA deadlock
        dispose_list // cannot be executed by prune_icache_sb
         wake_up_bit(&i_ea->i_state)
    
    Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file
            deleting process holds BASEHD's wbuf->io_mutex while getting the
    	xattr inode, which could race with inode reclaiming process(The
            reclaiming process could try locking BASEHD's wbuf->io_mutex in
    	inode evicting function), then an ABBA deadlock problem would
    	happen as following:
    
     1. File A has inode ia and a xattr(with inode ixa), regular file B has
        inode ib and a xattr.
     2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa
     3. Then, following three processes running like this:
    
            PA                PB                        PC
                    echo 2 > /proc/sys/vm/drop_caches
                     shrink_slab
                      prune_dcache_sb
                      // ib and ia are added into lru, lru->ixa->ib->ia
                      prune_icache_sb
                       list_lru_walk_one
                        inode_lru_isolate
                         ixa->i_state |= I_FREEING // set inode state
                        inode_lru_isolate
                         __iget(ib)
                         spin_unlock(&ib->i_lock)
                         spin_unlock(lru_lock)
                                                       rm file B
                                                        ib->nlink = 0
     rm file A
      iput(ia)
       ubifs_evict_inode(ia)
        ubifs_jnl_delete_inode(ia)
         ubifs_jnl_write_inode(ia)
          make_reservation(BASEHD) // Lock wbuf->io_mutex
          ubifs_iget(ixa->i_ino)
           iget_locked
            find_inode_fast
             __wait_on_freeing_inode(ixa)
              |          iput(ib) // ib->nlink is 0, do evict
              |           ubifs_evict_inode
              |            ubifs_jnl_delete_inode(ib)
              ↓             ubifs_jnl_write_inode
         ABBA deadlock ←-----make_reservation(BASEHD)
                       dispose_list // cannot be executed by prune_icache_sb
                        wake_up_bit(&ixa->i_state)
    
    Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING
    to pin the inode in memory while inode_lru_isolate() reclaims its pages
    instead of using ordinary inode reference. This way inode deletion
    cannot be triggered from inode_lru_isolate() thus avoiding the deadlock.
    evict() is made to wait for I_LRU_ISOLATING to be cleared before
    proceeding with inode cleanup.
    
    Link: https://lore.kernel.org/all/37c29c42-7685-d1f0-067d-63582ffac405@huaweicloud.com/
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022
    Fixes: e50e5129 ("ext4: xattr-in-inode support")
    Fixes: 7959cf3a ("ubifs: journal: Handle xattrs like files")
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarZhihao Cheng <chengzhihao1@huawei.com>
    Link: https://lore.kernel.org/r/20240809031628.1069873-1-chengzhihao@huaweicloud.comReviewed-by: default avatarJan Kara <jack@suse.cz>
    Suggested-by: default avatarJan Kara <jack@suse.cz>
    Suggested-by: default avatarMateusz Guzik <mjguzik@gmail.com>
    Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
    2a062983
inode.c 72.5 KB