    btrfs: make ranged full fsyncs more efficient · 0a8068a3
    Filipe Manana authored
    Commit 0c713cba ("Btrfs: fix race between ranged fsync and writeback
    of adjacent ranges") fixed a bug where we could end up with file extent
    items in a log tree that represent file ranges that overlap due to a race
    between the hole detection of a ranged full fsync and writeback for a
    different file range.
    
    The problem was solved by forcing any ranged full fsync to become a
    non-ranged full fsync - setting the range start to 0 and the end offset to
    LLONG_MAX. This was a simple solution because the code that detected and
    marked holes was very complex: it used to be done at copy_items() and
    required several searches on the fs/subvolume tree. The drawback of that
    solution was that we started to flush delalloc for the entire file and
    wait for all the ordered extents to complete for ranged full fsyncs
    (including ordered extents covering ranges completely outside the given
    range). Fortunately, ranged full fsyncs are not the most common case
    (hopefully for most workloads).
    
    However, a later fix for detecting and marking holes was made by commit
    0e56315c ("Btrfs: fix missing hole after hole punching and fsync
    when using NO_HOLES"), which greatly simplified the detection of holes:
    copy_items() no longer does it, and we now do it in a much simpler way
    at btrfs_log_holes().
    
    This makes it possible for the code that detects holes to operate only
    on the initial range instead of the whole file, while also avoiding the
    need to flush delalloc for the entire file and to wait for ordered
    extents that cover ranges that don't overlap the given range.
    
    Special care is also needed when copying inode items from the
    fs/subvolume tree into the log tree: we must skip file extent items that
    fall entirely outside the fsync range. This avoids races with ordered
    extent completion for extents falling outside the fsync range, which
    could cause us to end up with file extent items in the log tree that
    have overlapping ranges. For example, if the fsync range is [1M, 2M],
    when we copy inode items we could copy an extent item for the range
    [0, 512K], then release the search path and, before moving to the next
    leaf, an ordered extent for the range [256K, 512K] completes. This would
    cause us to copy the new extent item for the range [256K, 512K] into the
    log tree after we have copied one for the range [0, 512K]. The extents
    overlap, resulting in a corruption.
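
    The skip rule can be illustrated with a small standalone sketch. It is
    not the kernel code: extent_outside_range() and its parameters are
    hypothetical names, and the fsync range is assumed to be inclusive at
    both ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the btrfs implementation: returns true when the
 * extent [ext_start, ext_start + ext_len) has no byte inside the fsync
 * range [start, end] (both ends inclusive, an assumption of this sketch).
 */
static bool extent_outside_range(uint64_t ext_start, uint64_t ext_len,
                                 uint64_t start, uint64_t end)
{
    assert(start <= end);
    assert(ext_len > 0);

    /* Last byte covered by the extent. */
    uint64_t ext_end = ext_start + ext_len - 1;

    /* Entirely before the range, or entirely after it. */
    return ext_end < start || ext_start > end;
}
```

    With an fsync range of 1 MiB to 2 MiB, both the extent at [0, 512 KiB)
    and the later one at [256 KiB, 512 KiB) are reported as outside the
    range, so neither would be copied and the overlap cannot reach the log.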
    
    So this change just does these steps:
    
    1) When the NO_HOLES feature is enabled, leave the initial range
       intact - no longer set it to [0, LLONG_MAX] when the full sync bit
       is set in the inode. If NO_HOLES is not enabled, always set the range
       to the full file range, just like before this change, to avoid
       missing file extent items representing holes after replaying the log
       (for both full and fast fsyncs);
    
    2) Make the hole detection code operate only on the fsync range;
    
    3) Make the code that copies items from the fs/subvolume tree skip
       file extent items that cover a range completely outside the range
       of the fsync.
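
    Step 1 can be sketched as a standalone function. This is only an
    illustration of the rule as described here, not the actual change:
    struct fsync_range, effective_range() and the no_holes flag are
    hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical type for this sketch: an fsync range in bytes. */
struct fsync_range {
    long long start;
    long long end;
};

/*
 * Hypothetical helper: pick the range the fsync operates on.
 * With NO_HOLES the caller's range is kept as is; without it the range is
 * widened to the whole file, as was done before this change.
 */
static struct fsync_range effective_range(bool no_holes,
                                          long long start, long long end)
{
    assert(start >= 0 && start <= end);

    struct fsync_range r = { start, end };

    if (!no_holes) {
        r.start = 0;
        r.end = LLONG_MAX;
    }
    return r;
}
```

    For a request covering 1 MiB to 2 MiB, the sketch returns that same
    range when no_holes is true and [0, LLONG_MAX] otherwise.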

    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>