    btrfs: avoid duplicated resolution of indirect backrefs during fiemap · 877c1476
    Filipe Manana authored
    During fiemap, when determining whether a data extent is shared, if we
    find that the extent is not directly shared, we then need to determine
    whether it's shared through subtrees. For that we need to resolve the indirect
    reference we found in order to figure out the path in the inode's fs tree,
    which is a path starting at the fs tree's root node and going down to the
    leaf that contains the file extent item that points to the data extent.
    We then proceed to determine if any extent buffer in that path is shared
    with other trees or not.
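
    As a rough illustration of that last step, the sketch below walks a
    root-to-leaf path and reports whether any node on it is shared. The
    struct node_sketch type, its refs field and path_is_shared() are
    hypothetical names for illustration only; the real code works on
    extent buffers and backrefs in the extent tree, not on a plain
    reference count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent buffer (a b+tree node or leaf). */
struct node_sketch {
	unsigned long long bytenr; /* logical address of the node */
	unsigned int refs;         /* assumed: number of trees referencing it */
};

/*
 * Return true if any node on the root-to-leaf path is referenced by more
 * than one tree. Everything below such a node, including the data extents
 * its leaves point to, is then shared through that subtree.
 */
static bool path_is_shared(const struct node_sketch *path, size_t depth)
{
	for (size_t i = 0; i < depth; i++) {
		if (path[i].refs > 1)
			return true;
	}
	return false;
}
```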
    
    Currently whenever we find the data extent that a file extent item points
    to is not directly shared, we always resolve the path in the fs tree, and
    then check if any extent buffer in the path is shared. This is a lot of
    work and when we have file extent items that belong to the same leaf, we
    have the same path, so we only need to calculate it once.
    
    This change does that: it keeps track of the current and previous leaf,
    and when we find that a data extent is not directly shared, we try to
    compute the fs tree path only once and then use it for every other file
    extent item in the same leaf, using the existing cached path result for
    the leaf as long as the cache results are valid.
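
    The idea can be sketched as a one-entry cache keyed by leaf. This is
    a simplified model and not the patch itself: struct share_ctx_sketch,
    resolve_path_shared() and extent_shared_via_subtree() are hypothetical
    names, the resolver is a placeholder that always reports "shared",
    and cache invalidation is reduced to a single flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state carried across file extent items in one fiemap call. */
struct share_ctx_sketch {
	uint64_t cached_leaf;     /* leaf the cached result was computed for */
	bool cached_valid;        /* false until a result has been stored */
	bool cached_shared;       /* cached answer for cached_leaf's path */
	unsigned int resolutions; /* how many expensive resolutions ran */
};

/* Placeholder for the expensive step: resolve the path, check its nodes. */
static bool resolve_path_shared(struct share_ctx_sketch *ctx, uint64_t leaf)
{
	(void)leaf;
	ctx->resolutions++;
	return true; /* placeholder: pretend the path is shared */
}

/*
 * Called for a file extent item whose data extent is not directly shared.
 * Items in the same leaf have the same fs tree path, so the result computed
 * for the first item is reused for the remaining items in that leaf.
 */
static bool extent_shared_via_subtree(struct share_ctx_sketch *ctx,
				      uint64_t leaf)
{
	if (ctx->cached_valid && ctx->cached_leaf == leaf)
		return ctx->cached_shared;

	ctx->cached_shared = resolve_path_shared(ctx, leaf);
	ctx->cached_leaf = leaf;
	ctx->cached_valid = true;
	return ctx->cached_shared;
}
```

    With a zero-initialized context, calling extent_shared_via_subtree()
    repeatedly for the same leaf runs the resolver once; moving to a new
    leaf runs it again.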
    
    This saves us from doing expensive b+tree searches in the fs tree of our
    target inode, as well as other minor work.
    
    The following test was run on a non-debug kernel (Debian's default kernel
    config):
    
       $ cat test-with-snapshots.sh
       #!/bin/bash
    
       DEV=/dev/sdi
       MNT=/mnt/sdi
    
       umount $DEV &> /dev/null
       mkfs.btrfs -f $DEV
       # Use compression to quickly create files with a lot of extents
       # (each with a size of 128K).
       mount -o compress=lzo $DEV $MNT
    
       # 40G gives 327680 extents, each with a size of 128K.
       xfs_io -f -c "pwrite -S 0xab -b 1M 0 40G" $MNT/foobar
    
       # Add some more files to increase the size of the fs and extent
       # trees (in the real world there's a lot of files and extents
       # from other files).
       xfs_io -f -c "pwrite -S 0xcd -b 1M 0 20G" $MNT/file1
       xfs_io -f -c "pwrite -S 0xef -b 1M 0 20G" $MNT/file2
       xfs_io -f -c "pwrite -S 0x73 -b 1M 0 20G" $MNT/file3
    
       # Create a snapshot so all the extents become indirectly shared
       # through subtrees, with a generation less than or equal to the
       # generation used to create the snapshot.
       btrfs subvolume snapshot -r $MNT $MNT/snap1
    
       umount $MNT
       mount -o compress=lzo $DEV $MNT
    
       start=$(date +%s%N)
       filefrag $MNT/foobar
       end=$(date +%s%N)
       dur=$(( (end - start) / 1000000 ))
       echo "fiemap took $dur milliseconds (metadata not cached)"
       echo
    
       start=$(date +%s%N)
       filefrag $MNT/foobar
       end=$(date +%s%N)
       dur=$(( (end - start) / 1000000 ))
       echo "fiemap took $dur milliseconds (metadata cached)"
    
       umount $MNT
    
    Result before applying this patch:
    
       (...)
       /mnt/sdi/foobar: 327680 extents found
       fiemap took 1204 milliseconds (metadata not cached)
    
       /mnt/sdi/foobar: 327680 extents found
       fiemap took 729 milliseconds (metadata cached)
    
    Result after applying this patch:
    
       (...)
       /mnt/sdi/foobar: 327680 extents found
       fiemap took 732 milliseconds (metadata not cached)
    
       /mnt/sdi/foobar: 327680 extents found
       fiemap took 421 milliseconds (metadata cached)
    
    That's a 39.2% reduction for the metadata not cached case (1204 ms to
    732 ms), and a 42.2% reduction for the cached metadata case (729 ms to
    421 ms).
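
    The relative reduction can be recomputed from the timings listed
    above. A minimal sketch follows; reduction_pct() is a hypothetical
    helper written for this note and is not part of the patch.

```c
/*
 * Percentage reduction going from `before` to `after`, both in the same
 * unit (milliseconds here). Assumes before > 0.
 */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}

/*
 * With the timings from the test runs:
 *   reduction_pct(1204, 732) is about 39.2 (metadata not cached)
 *   reduction_pct(729, 421)  is about 42.2 (metadata cached)
 */
```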
    
    The test is somewhat limited, and the gains may be higher in practice:
    the test filesystem is small, so the fs and extent trees are small, and
    there is no concurrent access to the trees, therefore no lock
    contention.
    
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>