• Filipe Manana's avatar
    Btrfs: make defrag not fragment files when using prealloc extents · e2127cf0
    Filipe Manana authored
    When using prealloc extents, a file defragment operation may actually
    fragment the file and increase the amount of data space used by the file.
    This change fixes that behaviour.
    
    Example:
    
    $ mkfs.btrfs -f /dev/sdb3
    $ mount /dev/sdb3 /mnt
    $ cd /mnt
    $ xfs_io -f -c 'falloc 0 1048576' foobar && sync
    $ xfs_io -c 'pwrite -S 0xff -b 100000 5000 100000' foobar
    $ xfs_io -c 'pwrite -S 0xac -b 100000 200000 100000' foobar
    $ xfs_io -c 'pwrite -S 0xe1 -b 100000 900000 100000' foobar && sync
    
    Before defragmenting the file:
    
    $ btrfs filesystem df /mnt
    Data, single: total=8.00MiB, used=1.25MiB
    System, DUP: total=8.00MiB, used=16.00KiB
    System, single: total=4.00MiB, used=0.00
    Metadata, DUP: total=1.00GiB, used=112.00KiB
    Metadata, single: total=8.00MiB, used=0.00
    
    $ btrfs-debug-tree /dev/sdb3
    (...)
    	item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
    		prealloc data disk byte 12845056 nr 1048576
    		prealloc data offset 0 nr 4096
    	item 7 key (257 EXTENT_DATA 4096) itemoff 15757 itemsize 53
    		extent data disk byte 12845056 nr 1048576
    		extent data offset 4096 nr 102400 ram 1048576
    		extent compression 0
    	item 8 key (257 EXTENT_DATA 106496) itemoff 15704 itemsize 53
    		prealloc data disk byte 12845056 nr 1048576
    		prealloc data offset 106496 nr 90112
    	item 9 key (257 EXTENT_DATA 196608) itemoff 15651 itemsize 53
    		extent data disk byte 12845056 nr 1048576
    		extent data offset 196608 nr 106496 ram 1048576
    		extent compression 0
    	item 10 key (257 EXTENT_DATA 303104) itemoff 15598 itemsize 53
    		prealloc data disk byte 12845056 nr 1048576
    		prealloc data offset 303104 nr 593920
    	item 11 key (257 EXTENT_DATA 897024) itemoff 15545 itemsize 53
    		extent data disk byte 12845056 nr 1048576
    		extent data offset 897024 nr 106496 ram 1048576
    		extent compression 0
    	item 12 key (257 EXTENT_DATA 1003520) itemoff 15492 itemsize 53
    		prealloc data disk byte 12845056 nr 1048576
    		prealloc data offset 1003520 nr 45056
    (...)
    
    Now defragmenting the file results in more data space used than before:
    
    $ btrfs filesystem defragment -f foobar && sync
    $ btrfs filesystem df /mnt
    Data, single: total=8.00MiB, used=1.55MiB
    System, DUP: total=8.00MiB, used=16.00KiB
    System, single: total=4.00MiB, used=0.00
    Metadata, DUP: total=1.00GiB, used=112.00KiB
    Metadata, single: total=8.00MiB, used=0.00
    
    And the corresponding file extent items are now no longer perfectly sequential
    as before, and we're now needlessly using more space from data block groups:
    
    $ btrfs-debug-tree /dev/sdb3
    (...)
    	item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
    		extent data disk byte 12845056 nr 1048576
    		extent data offset 0 nr 4096 ram 1048576
    		extent compression 0
    	item 7 key (257 EXTENT_DATA 4096) itemoff 15757 itemsize 53
    		extent data disk byte 13893632 nr 102400
    		extent data offset 0 nr 102400 ram 102400
    		extent compression 0
    	item 8 key (257 EXTENT_DATA 106496) itemoff 15704 itemsize 53
    		extent data disk byte 12845056 nr 1048576
    		extent data offset 106496 nr 90112 ram 1048576
    		extent compression 0
    	item 9 key (257 EXTENT_DATA 196608) itemoff 15651 itemsize 53
    		extent data disk byte 13996032 nr 106496
    		extent data offset 0 nr 106496 ram 106496
    		extent compression 0
    	item 10 key (257 EXTENT_DATA 303104) itemoff 15598 itemsize 53
    		prealloc data disk byte 12845056 nr 1048576
    		prealloc data offset 303104 nr 593920
    	item 11 key (257 EXTENT_DATA 897024) itemoff 15545 itemsize 53
    		extent data disk byte 14102528 nr 106496
    		extent data offset 0 nr 106496 ram 106496
    		extent compression 0
    	item 12 key (257 EXTENT_DATA 1003520) itemoff 15492 itemsize 53
    		extent data disk byte 12845056 nr 1048576
    		extent data offset 1003520 nr 45056 ram 1048576
    		extent compression 0
    (...)
    
    With this change, the above example will no longer cause allocation of new data
    space nor change the sequentiality of the file extents, that is, defragment will
    be effectless, leaving all extent items pointing to the extent starting at disk
    byte 12845056.
    
    In a 20Gb filesystem I had, mounted with the autodefrag option and 20 files of
    400Mb each, initially consisting of a single prealloc extent of 400Mb, having
    random writes happening at a low rate, lead to a total of over ~17Gb of data
    space used, not far from eventually reaching an ENOSPC state.
    Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
    Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
    e2127cf0
ioctl.c 117 KB