    btrfs: don't refill whole delayed refs block reserve when starting transaction · 2f6397e4
    Filipe Manana authored
    Since commit 28270e25 ("btrfs: always reserve space for delayed refs
    when starting transaction") we started not only to reserve metadata space
    for the delayed refs a caller of btrfs_start_transaction() might generate
    but also to try to fully refill the delayed refs block reserve, because
    there are several cases where we generate delayed refs and haven't reserved
    space for them, relying on the global block reserve. Relying too much on
    the global block reserve is not always safe, and can result in hitting
    -ENOSPC during transaction commits or worse, in rare cases, being unable
    to mount a filesystem that needs to do orphan cleanup or anything that
    requires modifying the filesystem during mount, and has no more
    unallocated space and the metadata space is nearly full. This was
    explained in detail in that commit's change log.
    
    However the gap between the reserved amount and the size of the delayed
    refs block reserve can be huge, so attempting to reserve space for such
    a gap can result in allocating many metadata block groups that end up
    not being used. After a recent patch, with the subject:
    
      "btrfs: add new unused block groups to the list of unused block groups"
    
    we started to add new block groups that are unused to the list of unused
    block groups, to avoid having them around for a very long time in case
    they are never used, because a block group is only added to the list of
    unused block groups when we deallocate the last extent or when mounting
    the filesystem and the block group has 0 bytes used. This is not a problem
    introduced by the commit mentioned earlier; it has always existed, as our
    metadata space reservations are, most of the time, pessimistic and end up
    not using all the space they reserved, so we can occasionally end up with
    one or two unused metadata block groups for a long period. However, after
    the commit mentioned earlier, we are even more pessimistic in the
    metadata space reservations when starting a transaction, and therefore
    the issue is more likely to happen.
    
    Adding new unused block groups to that list is, however, not always
    enough, because we might create unused metadata block groups when
    reserving metadata space at a high rate if there's always a gap in the
    delayed refs block reserve and the cleaner kthread isn't triggered often
    enough or is busy with other work (running delayed iputs, cleaning
    deleted roots, etc). On top of that, a removed block group's allocated
    space is only usable for a new block group after the transaction used to
    remove it is committed.
    
    A user reported getting a lot of allocated metadata block groups while
    the usage percentage of metadata space was very low compared to the
    total allocated space, especially after running a series of block group
    relocations.
    
    So for now stop trying to refill the gap in the delayed refs block reserve
    and reserve space only for the delayed refs we are expected to generate
    when starting a transaction.
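
    The difference between the two reservation policies can be sketched
    with a simplified userspace model. This is only an illustration, not
    the kernel code: struct rsv_model and the helpers rsv_gap(),
    bytes_to_reserve_old() and bytes_to_reserve_new() are hypothetical
    names, and the real logic in btrfs_start_transaction() handles more
    cases (flushing modes, the global reserve, error paths).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a btrfs block reserve. */
struct rsv_model {
	uint64_t size;     /* bytes the reserve would like to hold */
	uint64_t reserved; /* bytes the reserve currently holds */
};

/* Gap between the reserve's target size and what it currently holds. */
static uint64_t rsv_gap(const struct rsv_model *rsv)
{
	assert(rsv != NULL);
	return rsv->size > rsv->reserved ? rsv->size - rsv->reserved : 0;
}

/*
 * Old policy (assumed model): reserve for the caller's items, for the
 * delayed refs the caller may generate, and for the whole gap in the
 * delayed refs block reserve.
 */
static uint64_t bytes_to_reserve_old(const struct rsv_model *delayed_refs_rsv,
				     uint64_t caller_bytes,
				     uint64_t delayed_refs_bytes)
{
	return caller_bytes + delayed_refs_bytes + rsv_gap(delayed_refs_rsv);
}

/*
 * New policy (assumed model): reserve only for the caller's items and
 * for the delayed refs the caller is expected to generate.
 */
static uint64_t bytes_to_reserve_new(uint64_t caller_bytes,
				     uint64_t delayed_refs_bytes)
{
	return caller_bytes + delayed_refs_bytes;
}
```

    In this model, a large gap inflates every reservation under the old
    policy, which is what can trigger the allocation of metadata block
    groups that are never used, while the new policy is independent of
    the gap.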
    
    CC: stable@vger.kernel.org # 6.7+
    Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
    Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
    Link: https://lore.kernel.org/linux-btrfs/CAL3q7H6802ayLHUJFztzZAVzBLJAGdFx=6FHNNy87+obZXXZpQ@mail.gmail.com/
    Tested-by: Ivan Shapovalov <intelfx@intelfx.name>
    Reported-by: Heddxh <g311571057@gmail.com>
    Link: https://lore.kernel.org/linux-btrfs/CAE93xANEby6RezOD=zcofENYZOT-wpYygJyauyUAZkLv6XVFOA@mail.gmail.com/
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
transaction.c 79.9 KB