    Btrfs: fix the number of transaction units needed to remove a block group · 7fd01182
    Filipe Manana authored
    We were using only 1 transaction unit when attempting to delete an unused
    block group but in reality we need 3 + N units, where N corresponds to the
    number of stripes. We were accounting only for the addition of the orphan
    item (for the block group's free space cache inode) but we were not
    accounting that we need to delete one block group item from the extent
    tree, one free space item from the tree of tree roots and N device extent
    items from the device tree.
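    
    The accounting above can be sketched as follows. This is only an
    illustration: remove_bg_trans_units() is a hypothetical name and is not
    a function taken from the kernel sources.

```c
/*
 * Hypothetical helper (not the kernel's API): number of transaction units
 * needed to remove a block group that has num_stripes stripes.
 */
static unsigned int remove_bg_trans_units(unsigned int num_stripes)
{
	/*
	 * 1 unit  to add the orphan item (free space cache inode)
	 * 1 unit  to delete the block group item (extent tree)
	 * 1 unit  to delete the free space item (tree of tree roots)
	 * N units to delete the device extent items (device tree)
	 */
	return 3 + num_stripes;
}
```

    For example, a block group with two stripes would need
    remove_bg_trans_units(2) == 5 units instead of the single unit that was
    being reserved.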
    
    While one unit is not enough, it worked most of the time because for each
    single unit we are too pessimistic and assume an entire tree path, with
    the highest possible height (8), needs to be COWed with eventual node
    splits at every possible level in the tree, so there was usually enough
    reserved space for removing all the items and adding the orphan item.
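    
    A rough sketch of why a single unit is so generous. The name
    worst_case_reserve() and the factor of 2 (one extra node per level to
    allow for a split) are assumptions made for illustration. Only the
    maximum height of 8 comes from the description above.

```c
#include <stdint.h>

/* Highest possible tree height, as stated in the text above. */
#define MAX_TREE_LEVEL 8

/*
 * Hypothetical helper: worst-case number of bytes reserved for num_units
 * transaction units. Each unit assumes a full path of MAX_TREE_LEVEL nodes
 * is COWed, and the factor of 2 is an assumed allowance for a node split
 * at every level of that path.
 */
static uint64_t worst_case_reserve(uint64_t nodesize, unsigned int num_units)
{
	return nodesize * MAX_TREE_LEVEL * 2 * num_units;
}
```

    Assuming a 16 KiB node size, worst_case_reserve(16384, 1) gives 262144
    bytes (256 KiB) for one unit, which is often enough to cover several
    item removals in shallow trees even though it is not guaranteed to be.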
    
    However, after adding the orphan item, writepages() can be called by the VM
    subsystem against the btree inode when we are under memory pressure, which
    causes writeback to start for the nodes we COWed before. This forces the
    operation to remove the free space item to COW again some (or all of) the
    same nodes (in the tree of tree roots). Even without writepages() being
    called, we could fail with ENOSPC because these items are located in
    multiple trees and one of them might have a higher height and require
    node/leaf splits at many levels, exhausting all the reserved space before
    removing all the items and adding the orphan.
    
    In the kernel 4.0 release, commit 3d84be79 ("Btrfs: fix BUG_ON in
    btrfs_orphan_add() when delete unused block group"), we attempted to fix
    a BUG_ON due to ENOSPC when trying to add the orphan item by making the
    cleaner kthread reserve one transaction unit before attempting to remove
    the block group, but this was not enough. We had a couple of user reports
    still hitting the same BUG_ON after 4.0, like Stefan Priebe's report on
    a 4.2-rc6 kernel for example:
    
        http://www.spinics.net/lists/linux-btrfs/msg46070.html
    
    So fix this by reserving all the necessary units of metadata.
    
    Reported-by: Stefan Priebe <s.priebe@profihost.ag>
    Fixes: 3d84be79 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group")
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
extent-tree.c 288 KB