• Filipe Manana's avatar
    Btrfs: fix deadlock when writing out free space caches · 5ce55557
    Filipe Manana authored
    When writing out a block group free space cache we can end deadlocking
    with ourselves on an extent buffer lock resulting in a warning like the
    following:
    
      [245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251 btrfs_tree_lock+0x1be/0x1d0 [btrfs]
      [245043.392792] CPU: 4 PID: 2608 Comm: btrfs-transacti Tainted: G
        W I      4.16.8 #1
      [245043.395489] RIP: 0010:btrfs_tree_lock+0x1be/0x1d0 [btrfs]
      [245043.396791] RSP: 0018:ffffc9000424b840 EFLAGS: 00010246
      [245043.398093] RAX: 0000000000000a30 RBX: ffff8807e20a3d20 RCX: 0000000000000001
      [245043.399414] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8807e20a3d20
      [245043.400732] RBP: 0000000000000001 R08: ffff88041f39a700 R09: ffff880000000000
      [245043.402021] R10: 0000000000000040 R11: ffff8807e20a3d20 R12: ffff8807cb220630
      [245043.403296] R13: 0000000000000001 R14: ffff8807cb220628 R15: ffff88041fbdf000
      [245043.404780] FS:  0000000000000000(0000) GS:ffff88082fc80000(0000) knlGS:0000000000000000
      [245043.406050] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [245043.407321] CR2: 00007fffdbdb9f10 CR3: 0000000001c09005 CR4: 00000000000206e0
      [245043.408670] Call Trace:
      [245043.409977]  btrfs_search_slot+0x761/0xa60 [btrfs]
      [245043.411278]  btrfs_insert_empty_items+0x62/0xb0 [btrfs]
      [245043.412572]  btrfs_insert_item+0x5b/0xc0 [btrfs]
      [245043.413922]  btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs]
      [245043.415216]  do_chunk_alloc+0x1e5/0x2a0 [btrfs]
      [245043.416487]  find_free_extent+0xcd0/0xf60 [btrfs]
      [245043.417813]  btrfs_reserve_extent+0x96/0x1e0 [btrfs]
      [245043.419105]  btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs]
      [245043.420378]  __btrfs_cow_block+0x127/0x550 [btrfs]
      [245043.421652]  btrfs_cow_block+0xee/0x190 [btrfs]
      [245043.422979]  btrfs_search_slot+0x227/0xa60 [btrfs]
      [245043.424279]  ? btrfs_update_inode_item+0x59/0x100 [btrfs]
      [245043.425538]  ? iput+0x72/0x1e0
      [245043.426798]  write_one_cache_group.isra.49+0x20/0x90 [btrfs]
      [245043.428131]  btrfs_start_dirty_block_groups+0x102/0x420 [btrfs]
      [245043.429419]  btrfs_commit_transaction+0x11b/0x880 [btrfs]
      [245043.430712]  ? start_transaction+0x8e/0x410 [btrfs]
      [245043.432006]  transaction_kthread+0x184/0x1a0 [btrfs]
      [245043.433341]  kthread+0xf0/0x130
      [245043.434628]  ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
      [245043.435928]  ? kthread_create_worker_on_cpu+0x40/0x40
      [245043.437236]  ret_from_fork+0x1f/0x30
      [245043.441054] ---[ end trace 15abaa2aaf36827f ]---
    
    This is because at write_one_cache_group() when we are COWing a leaf from
    the extent tree we end up allocating a new block group (chunk) and,
    because we have hit a threshold on the number of bytes reserved for system
    chunks, we attempt to finalize the creation of new block groups from the
    current transaction, by calling btrfs_create_pending_block_groups().
    However here we also need to modify the extent tree in order to insert
    a block group item, and if the location for this new block group item
    happens to be in the same leaf that we were COWing earlier, we deadlock
    since btrfs_search_slot() tries to write lock the extent buffer that we
    locked before at write_one_cache_group().
    
    We have already hit similar cases in the past and commit d9a0540a
    ("Btrfs: fix deadlock when finalizing block group creation") fixed some
    of those cases by delaying the creation of pending block groups at the
    known specific spots that could lead to a deadlock. This change reworks
    that commit to be more generic so that we don't have to add similar logic
    to every possible path that can lead to a deadlock. This is done by
    making __btrfs_cow_block() disallowing the creation of new block groups
    (setting the transaction's can_flush_pending_bgs to false) before it
    attempts to allocate a new extent buffer for either the extent, chunk or
    device trees, since those are the trees that pending block creation
    modifies. Once the new extent buffer is allocated, it allows creation of
    pending block groups to happen again.
    
    This change depends on a recent patch from Josef which is not yet in
    Linus' tree, named "btrfs: make sure we create all new block groups" in
    order to avoid occasional warnings at btrfs_trans_release_chunk_metadata().
    
    Fixes: d9a0540a ("Btrfs: fix deadlock when finalizing block group creation")
    CC: stable@vger.kernel.org # 4.4+
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199753
    Link: https://lore.kernel.org/linux-btrfs/CAJtFHUTHna09ST-_EEiyWmDH6gAqS6wa=zMNMBsifj8ABu99cw@mail.gmail.com/Reported-by: default avatarE V <eliventer@gmail.com>
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    5ce55557
ctree.c 151 KB