• Filipe Manana's avatar
    btrfs: use cached state when looking for delalloc ranges with lseek · 3c32c721
    Filipe Manana authored
    During lseek (SEEK_HOLE/DATA), whenever we find a hole or prealloc extent,
    we will look for delalloc in that range, and one of the things we do for
    that is to find out ranges in the inode's io_tree marked with
    EXTENT_DELALLOC, using calls to count_range_bits().
    
    Typically there's a single, or few, searches in the io_tree for delalloc
    per lseek call. However it's common for applications to keep calling
    lseek with SEEK_HOLE and SEEK_DATA to find where extents and holes are in
    a file, read the extents and skip holes in order to avoid unnecessary IO
    and save disk space by preserving holes.
    
    One popular user is the cp utility from coreutils. Starting with coreutils
    9.0, cp uses SEEK_HOLE and SEEK_DATA to iterate over the extents of a
    file. Before 9.0, it used fiemap to figure out where holes and extents are
    in the source file. Another popular user is the tar utility when used with
    the --sparse / -S option to detect and preserve holes.
    
    Given that the pattern is to keep calling lseek with a start offset that
    matches the returned offset from the previous lseek call, we can benefit
    from caching the last extent state visited in count_range_bits() and use
    it for the next count_range_bits() from the next lseek call. Example,
    the following strace excerpt from running tar:
    
       $ strace tar cJSvf foo.tar.xz qemu_disk_file.raw
       (...)
       lseek(5, 125019574272, SEEK_HOLE)       = 125024989184
       lseek(5, 125024989184, SEEK_DATA)       = 125024993280
       lseek(5, 125024993280, SEEK_HOLE)       = 125025239040
       lseek(5, 125025239040, SEEK_DATA)       = 125025255424
       lseek(5, 125025255424, SEEK_HOLE)       = 125025353728
       lseek(5, 125025353728, SEEK_DATA)       = 125025357824
       lseek(5, 125025357824, SEEK_HOLE)       = 125026766848
       lseek(5, 125026766848, SEEK_DATA)       = 125026770944
       lseek(5, 125026770944, SEEK_HOLE)       = 125027053568
       (...)
    
    Shows that pattern, which is the same as with cp from coreutils 9.0+.
    
    So start using a cached state for the delalloc searches in lseek, and
    store it in struct file's private data so that it can be reused across
    lseek calls.
    
    This change is part of a patchset that is comprised of the following
    patches:
    
      1/9 btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree
      2/9 btrfs: add an early exit when searching for delalloc range for lseek/fiemap
      3/9 btrfs: skip unnecessary delalloc searches during lseek/fiemap
      4/9 btrfs: search for delalloc more efficiently during lseek/fiemap
      5/9 btrfs: remove no longer used btrfs_next_extent_map()
      6/9 btrfs: allow passing a cached state record to count_range_bits()
      7/9 btrfs: update stale comment for count_range_bits()
      8/9 btrfs: use cached state when looking for delalloc ranges with fiemap
      9/9 btrfs: use cached state when looking for delalloc ranges with lseek
    
    The following test was run before and after applying the whole patchset:
    
       $ cat test-cp.sh
       #!/bin/bash
    
       DEV=/dev/sdh
       MNT=/mnt/sdh
    
       # coreutils 8.32, cp uses fiemap to detect holes and extents
       #CP_PROG=/usr/bin/cp
       # coreutils 9.1, cp uses SEEK_HOLE/DATA to detect holes and extents
       CP_PROG=/home/fdmanana/git/hub/coreutils/src/cp
    
       umount $DEV &> /dev/null
       mkfs.btrfs -f $DEV
       mount $DEV $MNT
    
       FILE_SIZE=$((1024 * 1024 * 1024))
       echo "Creating file with a size of $((FILE_SIZE / 1024 / 1024))M"
       # Create a very sparse file, where each extent has a length of 4K and
       # is preceded by a 4K hole and followed by another 4K hole.
       start=$(date +%s%N)
       echo -n > $MNT/foobar
       for ((off = 0; off < $FILE_SIZE; off += 8192)); do
               xfs_io -c "pwrite -S 0xab $off 4K" $MNT/foobar > /dev/null
               echo -ne "\r$off / $FILE_SIZE ..."
       done
       end=$(date +%s%N)
       echo -e "\nFile created ($(( (end - start) / 1000000 )) milliseconds)"
    
       start=$(date +%s%N)
       $CP_PROG $MNT/foobar /dev/null
       end=$(date +%s%N)
       dur=$(( (end - start) / 1000000 ))
       echo "cp took $dur milliseconds with data/metadata cached and delalloc"
    
       # Flush all delalloc.
       sync
    
       start=$(date +%s%N)
       $CP_PROG $MNT/foobar /dev/null
       end=$(date +%s%N)
       dur=$(( (end - start) / 1000000 ))
       echo "cp took $dur milliseconds with data/metadata cached and no delalloc"
    
       # Unmount and mount again to test the case without any metadata
       # loaded in memory.
       umount $MNT
       mount $DEV $MNT
    
       start=$(date +%s%N)
       $CP_PROG $MNT/foobar /dev/null
       end=$(date +%s%N)
       dur=$(( (end - start) / 1000000 ))
       echo "cp took $dur milliseconds without data/metadata cached and no delalloc"
    
       umount $MNT
    
    The results, running on a box with a non-debug kernel (Debian's default
    kernel config), were the following:
    
    128M file, before patchset:
    
       cp took 16574 milliseconds with data/metadata cached and delalloc
       cp took 122 milliseconds with data/metadata cached and no delalloc
       cp took 20144 milliseconds without data/metadata cached and no delalloc
    
    128M file, after patchset:
    
       cp took 6277 milliseconds with data/metadata cached and delalloc
       cp took 109 milliseconds with data/metadata cached and no delalloc
       cp took 210 milliseconds without data/metadata cached and no delalloc
    
    512M file, before patchset:
    
       cp took 14369 milliseconds with data/metadata cached and delalloc
       cp took 429 milliseconds with data/metadata cached and no delalloc
       cp took 88034 milliseconds without data/metadata cached and no delalloc
    
    512M file, after patchset:
    
       cp took 12106 milliseconds with data/metadata cached and delalloc
       cp took 427 milliseconds with data/metadata cached and no delalloc
       cp took 824 milliseconds without data/metadata cached and no delalloc
    
    1G file, before patchset:
    
       cp took 10074 milliseconds with data/metadata cached and delalloc
       cp took 886 milliseconds with data/metadata cached and no delalloc
       cp took 181261 milliseconds without data/metadata cached and no delalloc
    
    1G file, after patchset:
    
       cp took 3320 milliseconds with data/metadata cached and delalloc
       cp took 880 milliseconds with data/metadata cached and no delalloc
       cp took 1801 milliseconds without data/metadata cached and no delalloc
    Reported-by: default avatarWang Yugui <wangyugui@e16-tech.com>
    Link: https://lore.kernel.org/linux-btrfs/20221106073028.71F9.409509F4@e16-tech.com/
    Link: https://lore.kernel.org/linux-btrfs/CAL3q7H5NSVicm7nYBJ7x8fFkDpno8z3PYt5aPU43Bajc1H0h1Q@mail.gmail.com/Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    3c32c721
ctree.h 23 KB