    btrfs: zoned: use dedicated lock for data relocation · 5f0addf7
    Naohiro Aota authored
    Currently, btrfs_zoned_data_reloc_{lock,unlock}() use
    btrfs_inode_{lock,unlock}() to make writeback of the relocation data
    inode exclusive. However, that can cause a deadlock in the following
    path.
    
    Thread A takes btrfs_inode_lock() and then waits for a metadata
    reservation, e.g., by waiting for writeback:
    
    prealloc_file_extent_cluster()
      - btrfs_inode_lock(&inode->vfs_inode, 0);
      - btrfs_prealloc_file_range()
      ...
        - btrfs_replace_file_extents()
          - btrfs_start_transaction
          ...
            - btrfs_reserve_metadata_bytes()
    
    Thread B (e.g., doing writeback work) needs to wait for the inode lock to
    continue the writeback process:
    
    do_writepages
      - btrfs_writepages
        - extent_writepages
          - btrfs_zoned_data_reloc_lock(BTRFS_I(inode));
            - btrfs_inode_lock()
    
    The deadlock is caused by relying on the vfs_inode's lock. Using it
    introduced an unnecessary mutual exclusion between writeback and
    btrfs_prealloc_file_range(). Also, the lock at this point is useless, as
    the inode does not have any dirty pages yet.
    
    Introduce fs_info->zoned_data_reloc_io_lock and use it for the exclusive
    writeback.
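
    The difference between the two schemes can be sketched with a rough
    user-space model. This is not the kernel patch: toy_lock, toy_fs and
    writeback_can_progress() are hypothetical names introduced only for
    illustration, and a failed trylock stands in for a thread that would
    block while thread A keeps waiting on writeback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock (hypothetical): only records whether somebody holds it. */
struct toy_lock {
	bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

/* Hypothetical stand-ins for the two locks discussed in this message. */
struct toy_fs {
	struct toy_lock inode_lock;               /* models the vfs_inode lock */
	struct toy_lock zoned_data_reloc_io_lock; /* models the new fs_info lock */
};

/*
 * Returns true if writeback (thread B) can make progress while thread A
 * holds the inode lock and waits on writeback for its metadata reservation.
 * A false result models the deadlock: B would block on a lock that A only
 * releases after B has finished.
 */
static bool writeback_can_progress(bool use_dedicated_lock)
{
	struct toy_fs fs = { { false }, { false } };
	struct toy_lock *wb_lock;
	bool a_locked, ok;

	/* Thread A: the preallocation path takes the inode lock first. */
	a_locked = toy_trylock(&fs.inode_lock);
	assert(a_locked);

	/* Thread B: writeback serializes on one of the two locks. */
	wb_lock = use_dedicated_lock ? &fs.zoned_data_reloc_io_lock
				     : &fs.inode_lock;
	ok = toy_trylock(wb_lock);
	if (ok)
		toy_unlock(wb_lock);

	toy_unlock(&fs.inode_lock);
	return ok;
}
```

    In this model, writeback_can_progress(false) reports that writeback is
    stuck behind the inode lock, while writeback_can_progress(true) reports
    that it proceeds, because the dedicated lock is never held by the
    preallocation path.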
    
    Fixes: 35156d85 ("btrfs: zoned: only allow one process to add pages to a relocation inode")
    CC: stable@vger.kernel.org # 5.16.x: 869f4cdc: btrfs: zoned: encapsulate inode locking for zoned relocation
    CC: stable@vger.kernel.org # 5.16.x
    CC: stable@vger.kernel.org # 5.17
    Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
ctree.h 132 KB