    btrfs: do not delete unused block group if it may be used soon · f4a9f219
    Filipe Manana authored
    Before deleting a block group that is in the list of unused block groups
    (fs_info->unused_bgs), we check if the block group became used before
    deleting it, as extents from it may have been allocated after it was added
    to the list.
    
    However, even if the block group was not yet used, there may be tasks that
    have only reserved space and have not yet allocated extents, and they
    might be relying on the availability of the unused block group in order
    to allocate extents. The reservation works first by increasing the
    "bytes_may_use" field of the corresponding space_info object (which may
    first require flushing delayed items, allocating a new block group, etc),
    and only later a task does the actual allocation of extents.
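    
    The two phases described above can be illustrated with a minimal sketch.
    This is not the kernel code: the struct and the model_reserve() and
    model_allocate() helpers are hypothetical, and flushing and block group
    allocation are reduced to a simple failure return. Only the field name
    bytes_may_use is taken from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a space_info object. */
struct space_info_model {
	uint64_t total_bytes;   /* sum of the sizes of all block groups */
	uint64_t bytes_used;    /* bytes backing already allocated extents */
	uint64_t bytes_may_use; /* bytes reserved but not yet allocated */
};

/* Phase 1: reserve space. Only the counter changes, no extent exists yet. */
static bool model_reserve(struct space_info_model *si, uint64_t bytes)
{
	if (si->bytes_used + si->bytes_may_use + bytes > si->total_bytes)
		return false; /* real code may flush or allocate a block group */
	si->bytes_may_use += bytes;
	return true;
}

/*
 * Phase 2: allocate extents. The whole reservation is dropped from
 * bytes_may_use and only the bytes actually needed become used space.
 * Assumes needed <= reserved <= si->bytes_may_use.
 */
static void model_allocate(struct space_info_model *si, uint64_t reserved,
			   uint64_t needed)
{
	si->bytes_may_use -= reserved;
	si->bytes_used += needed;
}
```

    Between the two calls the space is only promised, so anything that
    shrinks total_bytes in that window (such as deleting a block group) can
    break the promise.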
    
    For metadata we usually don't end up using all reserved space, as we are
    pessimistic and typically account for the worst cases (need to COW every
    single node in a path of a tree at maximum possible height, etc). For
    data we usually reserve the exact amount of space we're going to allocate
    later, except when using compression where we always reserve space based
    on the uncompressed size, as compression is only triggered when writeback
    starts so we don't know in advance how much space we'll actually need, or
    if the data is compressible.
    
    So don't delete an unused block group if the total size of its space_info
    object minus the block group's size is less than the sum of used space and
    space that may be used (space_info->bytes_may_use), as that means we have
    tasks that reserved space and may need to allocate extents from the block
    group. In this case, besides skipping the deletion, re-add the block group
    to the list of unused block groups so that it may be reconsidered later,
    in case the tasks that reserved space end up not needing to allocate
    extents from it.
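    
    The condition can be sketched as follows. This is a simplified model and
    not the actual change to block-group.c: the struct and the function name
    unused_bg_may_be_needed() are hypothetical, "used space" is collapsed
    into a single bytes_used field, and locking is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a space_info object. */
struct space_info_sketch {
	uint64_t total_bytes;   /* includes the unused block group's size */
	uint64_t bytes_used;    /* space backing allocated extents */
	uint64_t bytes_may_use; /* space reserved but not yet allocated */
};

/*
 * Returns true when deleting an unused block group of size bg_length
 * would leave less space than what is already used plus reserved, in
 * which case the deletion should be skipped and the block group
 * re-added to the unused list for later reconsideration.
 * Assumes bg_length <= si->total_bytes.
 */
static bool unused_bg_may_be_needed(const struct space_info_sketch *si,
				    uint64_t bg_length)
{
	uint64_t remaining = si->total_bytes - bg_length;
	uint64_t needed = si->bytes_used + si->bytes_may_use;

	return remaining < needed;
}
```

    For example, with a 2 GiB space_info holding one unused 1 GiB block
    group, 512 MiB used and 768 MiB reserved, the remaining 1 GiB is less
    than the 1.25 GiB needed, so the block group is kept.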
    
    Allowing the deletion of the block group while we have reserved space can
    result in tasks failing to allocate metadata extents (-ENOSPC) while under
    a transaction handle, resulting in a transaction abort, or failure during
    writeback for the case of data extents.
    
    CC: stable@vger.kernel.org # 6.0+
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Boris Burkov <boris@bur.io>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
block-group.c 136 KB