    btrfs: honor path->skip_locking in backref code · 38e3eebf
    Josef Bacik authored
    Qgroups do the old roots lookup at delayed ref time, which can happen
    while walking down the extent root while running a delayed ref.  This
    should be fine, except the backref walking code locks ebs irrespective
    of path->skip_locking, which deadlocks the system.  Fix up the backref
    code to honor path->skip_locking; nobody will be modifying the
    commit_root while we're searching, so it is completely safe to do.
    
    Since commit fb235dc0 ("btrfs: qgroup: Move half of the qgroup
    accounting time out of commit trans"), the kernel may lock up with
    quota enabled.
    
    The lockup is triggered by a backref walk during snapshot dropping,
    combined with a write operation in the source subvolume.  It can be
    reliably reproduced:
    
      btrfs-cleaner   D    0  4062      2 0x80000000
      Call Trace:
       schedule+0x32/0x90
       btrfs_tree_read_lock+0x93/0x130 [btrfs]
       find_parent_nodes+0x29b/0x1170 [btrfs]
       btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
       btrfs_find_all_roots+0x57/0x70 [btrfs]
       btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
       btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
       btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
       do_walk_down+0x541/0x5e3 [btrfs]
       walk_down_tree+0xab/0xe7 [btrfs]
       btrfs_drop_snapshot+0x356/0x71a [btrfs]
       btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
       cleaner_kthread+0x12b/0x160 [btrfs]
       kthread+0x112/0x130
       ret_from_fork+0x27/0x50
    
    When dropping snapshots with qgroup enabled, we trigger a backref
    walk.
    
    However, a backref walk at that point is dangerous: if one of the
    parent nodes gets write-locked by another thread, we can deadlock.
    
    For example:
    
               FS 260     FS 261 (Dropped)
                node A        node B
               /      \      /      \
           node C      node D      node E
          /   \         /  \        /     \
      leaf F|leaf G|leaf H|leaf I|leaf J|leaf K
    
    The lock sequence would be:
    
          Thread A (cleaner)             |       Thread B (other writer)
    -----------------------------------------------------------------------
    write_lock(B)                        |
    write_lock(D)                        |
    ^^^ called by walk_down_tree()       |
                                         |       write_lock(A)
                                         |       write_lock(D) << Stall
    read_lock(H) << for backref walk     |
    read_lock(D) << lock owner is        |
                    the same thread A    |
                    so read lock is OK   |
    read_lock(A) << Stall                |
    
    So thread A holds the write lock on D and needs the read lock on A to
    proceed, while thread B holds the write lock on A and needs the write
    lock on D.
    
    This is a deadlock.
    
    This is not limited to the snapshot dropping case.  The backref walk,
    even though it only happens on commit trees, breaks the normal
    top-down locking order, which makes it deadlock prone.
    
    Fixes: fb235dc0 ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
    CC: stable@vger.kernel.org # 4.14+
    Reported-and-tested-by: David Sterba <dsterba@suse.com>
    Reported-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    [ rebase to latest branch and fix lock assert bug in btrfs/007 ]
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    [ copy logs and deadlock analysis from Qu's patch ]
    Signed-off-by: David Sterba <dsterba@suse.com>