• Qu Wenruo's avatar
    btrfs: relocation: fix reloc_root lifespan and access · 6282675e
    Qu Wenruo authored
    [BUG]
    There are several different KASAN reports for balance + snapshot
    workloads.  Involved call paths include:
    
       should_ignore_root+0x54/0xb0 [btrfs]
       build_backref_tree+0x11af/0x2280 [btrfs]
       relocate_tree_blocks+0x391/0xb80 [btrfs]
       relocate_block_group+0x3e5/0xa00 [btrfs]
       btrfs_relocate_block_group+0x240/0x4d0 [btrfs]
       btrfs_relocate_chunk+0x53/0xf0 [btrfs]
       btrfs_balance+0xc91/0x1840 [btrfs]
       btrfs_ioctl_balance+0x416/0x4e0 [btrfs]
       btrfs_ioctl+0x8af/0x3e60 [btrfs]
       do_vfs_ioctl+0x831/0xb10
    
       create_reloc_root+0x9f/0x460 [btrfs]
       btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs]
       create_pending_snapshot+0xa9b/0x15f0 [btrfs]
       create_pending_snapshots+0x111/0x140 [btrfs]
       btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
       btrfs_mksubvol+0x915/0x960 [btrfs]
       btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
       btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
       btrfs_ioctl+0x241b/0x3e60 [btrfs]
       do_vfs_ioctl+0x831/0xb10
    
       btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
       create_pending_snapshot+0x209/0x15f0 [btrfs]
       create_pending_snapshots+0x111/0x140 [btrfs]
       btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
       btrfs_mksubvol+0x915/0x960 [btrfs]
       btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
       btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
       btrfs_ioctl+0x241b/0x3e60 [btrfs]
       do_vfs_ioctl+0x831/0xb10
    
    [CAUSE]
    All these call sites are only relying on root->reloc_root, which can
    undergo btrfs_drop_snapshot(), and since we don't have real refcount
    based protection to reloc roots, we can reach already dropped reloc
    root, triggering KASAN.
    
    [FIX]
    To avoid such access to unstable root->reloc_root, we should check
    BTRFS_ROOT_DEAD_RELOC_TREE bit first.
    
    This patch introduces wrappers that provide the correct way to check the
    bit with memory barriers protection.
    
    Most callers don't distinguish merged reloc tree and no reloc tree.  The
    only exception is should_ignore_root(), as merged reloc tree can be
    ignored, while no reloc tree shouldn't.
    
    [CRITICAL SECTION ANALYSIS]
    Although test_bit()/set_bit()/clear_bit() doesn't imply a barrier, the
    DEAD_RELOC_TREE bit has extra help from transaction as a higher level
    barrier, the lifespan of root::reloc_root and DEAD_RELOC_TREE bit are:
    
    	NULL: reloc_root is NULL	PTR: reloc_root is not NULL
    	0: DEAD_RELOC_ROOT bit not set	DEAD: DEAD_RELOC_ROOT bit set
    
    	(NULL, 0)    Initial state		 __
    	  |					 /\ Section A
            btrfs_init_reloc_root()			 \/
    	  |				 	 __
    	(PTR, 0)     reloc_root initialized      /\
              |					 |
    	btrfs_update_reloc_root()		 |  Section B
              |					 |
    	(PTR, DEAD)  reloc_root has been merged  \/
              |					 __
    	=== btrfs_commit_transaction() ====================
    	  |					 /\
    	clean_dirty_subvols()			 |
    	  |					 |  Section C
    	(NULL, DEAD) reloc_root cleanup starts   \/
              |					 __
    	btrfs_drop_snapshot()			 /\
    	  |					 |  Section D
    	(NULL, 0)    Back to initial state	 \/
    
    Every have_reloc_root() or test_bit(DEAD_RELOC_ROOT) caller holds
    transaction handle, so none of such caller can cross transaction boundary.
    
    In Section A, every caller just found no DEAD bit, and grab reloc_root.
    
    In the cross section A-B, caller may get no DEAD bit, but since reloc_root
    is still completely valid thus accessing reloc_root is completely safe.
    
    No test_bit() caller can cross the boundary of Section B and Section C.
    
    In Section C, every caller found the DEAD bit, so no one will access
    reloc_root.
    
    In the cross section C-D, either caller gets the DEAD bit set, avoiding
    access reloc_root no matter if it's safe or not.  Or caller get the DEAD
    bit cleared, then access reloc_root, which is already NULL, nothing will
    be wrong.
    
    The memory write barriers are between the reloc_root updates and bit
    set/clear, the pairing read side is before test_bit.
    Reported-by: default avatarZygo Blaxell <ce3g8jdj@umail.furryterror.org>
    Fixes: d2311e69 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
    CC: stable@vger.kernel.org # 5.4+
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    [ barriers ]
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    6282675e
relocation.c 116 KB