• Josef Bacik's avatar
    btrfs: run delayed items before dropping the snapshot · 0568e82d
    Josef Bacik authored
    With my delayed refs patches in place we started seeing a large amount
    of aborts in __btrfs_free_extent:
    
     BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964  owner 1 offset 0
     Call Trace:
      ? btrfs_merge_delayed_refs+0xaf/0x340
      __btrfs_run_delayed_refs+0x6ea/0xfc0
      ? btrfs_set_path_blocking+0x31/0x60
      btrfs_run_delayed_refs+0xeb/0x180
      btrfs_commit_transaction+0x179/0x7f0
      ? btrfs_check_space_for_delayed_refs+0x30/0x50
      ? should_end_transaction.isra.19+0xe/0x40
      btrfs_drop_snapshot+0x41c/0x7c0
      btrfs_clean_one_deleted_snapshot+0xb5/0xd0
      cleaner_kthread+0xf6/0x120
      kthread+0xf8/0x130
      ? btree_invalidatepage+0x90/0x90
      ? kthread_bind+0x10/0x10
      ret_from_fork+0x35/0x40
    
    This was because btrfs_drop_snapshot depends on the root not being
    modified while it's dropping the snapshot.  It will unlock the root node
    (and really every node) as it walks down the tree, only to re-lock it
    when it needs to do something.  This is a problem because if we modify
    the tree we could cow a block in our path, which frees our reference to
    that block.  Then once we get back to that shared block we'll free our
    reference to it again, and get ENOENT when trying to lookup our extent
    reference to that block in __btrfs_free_extent.
    
    This is ultimately happening because we have delayed items left to be
    processed for our deleted snapshot _after_ all of the inodes are closed
    for the snapshot.  We only run the delayed inode item if we're deleting
    the inode, and even then we do not run the delayed insertions or delayed
    removals.  These can be run at any point after our final inode does its
    last iput, which is what triggers the snapshot deletion.  We can end up
    with the snapshot deletion happening and then have the delayed items run
    on that file system, resulting in the above problem.
    
    This problem has existed forever, however my patches made it much easier
    to hit as I wake up the cleaner much more often to deal with delayed
    iputs, which made us more likely to start the snapshot dropping work
    before the transaction commits, which is when the delayed items would
    generally be run.  Before, generally speaking, we would run the delayed
    items, commit the transaction, and wakeup the cleaner thread to start
    deleting snapshots, which means we were less likely to hit this problem.
    You could still hit it if you had multiple snapshots to be deleted and
    ended up with lots of delayed items, but it was definitely harder.
    
    Fix for now by simply running all the delayed items before starting to
    drop the snapshot.  We could make this smarter in the future by making
    the delayed items per-root, and then simply drop any delayed items for
    roots that we are going to delete.  But for now just a quick and easy
    solution is the safest.
    
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    0568e82d
extent-tree.c 312 KB