• Filipe Manana's avatar
    Btrfs: use generic_remap_file_range_prep() for cloning and deduplication · 34a28e3d
    Filipe Manana authored
    Since cloning and deduplication are no longer Btrfs specific operations, we
    now have generic code to handle parameter validation, compare file ranges
    used for deduplication, clear capabilities when cloning, etc. This change
    makes Btrfs use it, eliminating a lot of code in Btrfs and also fixing a
    few bugs, such as:
    
    1) When cloning, the destination file's capabilities were not dropped
       (the fstest generic/513 tests this);
    
    2) We were not checking if the destination file is immutable;
    
    3) Not checking if either the source or destination files are swap
       files (swap file support is coming soon for Btrfs);
    
    4) System limits were not checked (resource limits and O_LARGEFILE).
    
    Note that the generic helper generic_remap_file_range_prep() does start
    and waits for writeback by calling filemap_write_and_wait_range(), however
    that is not enough for Btrfs for two reasons:
    
    1) With compression, we need to start writeback twice in order to get the
       pages marked for writeback and ordered extents created;
    
    2) filemap_write_and_wait_range() (and all its other variants) only waits
       for the IO to complete, but we need to wait for the ordered extents to
       finish, so that when we do the actual reflinking operations the file
       extent items are in the fs tree. This is also important due to the fact
       that the generic helper, for the deduplication case, compares the
       contents of the pages in the requested range, which might require
       reading extents from disk in the very unlikely case that pages get
       invalidated after writeback finishes (so the file extent items must be
       up to date in the fs tree).
    
    Since these reasons are specific to Btrfs we have to do it in the Btrfs
    code before calling generic_remap_file_range_prep(). This also results
    in a simpler way of dealing with existing delalloc in the source/target
    ranges, specially for the deduplication case where we used to lock all
    the pages first and then if we found any dealloc for the range, or
    ordered extent, we would unlock the pages trigger writeback and wait for
    ordered extents to complete, then lock all the pages again and check if
    deduplication can be done. So now we get a simpler approach: lock the
    inodes, then trigger writeback and then wait for ordered extents to
    complete.
    
    So make btrfs use generic_remap_file_range_prep() (XFS and OCFS2 use it)
    to eliminate duplicated code, fix a few bugs and benefit from future bug
    fixes done there - for example the recent clone and dedupe bugs involving
    reflinking a partial EOF block got a counterpart fix in the generic
    helper, since it affected all filesystems supporting these operations,
    so we no longer need special checks in Btrfs for them.
    Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    34a28e3d
ioctl.c 139 KB