    btrfs: zlib: fix and simplify the inline extent decompression · 2c25716d
    Qu Wenruo authored
    [BUG]
    
    If we have a filesystem with 4k sectorsize, and an inlined compressed
    extent created like this:
    
    	item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
    		generation 8 transid 8 size 4096 nbytes 4096
    		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
    		sequence 1 flags 0x0(none)
    	item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
    		index 2 namelen 14 name: source_inlined
    	item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
    		generation 8 type 0 (inline)
    		inline extent data size 48 ram_bytes 4096 compression 1 (zlib)
    
    This is an inline compressed extent at file offset 0 whose decompressed
    size is 4K, which allows us to reflink that 4K range to another
    location (where it will not be compressed).
    
    If we do such a reflink on a subpage system (page size larger than
    sectorsize), it fails like this:
    
      # xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
      XFS_IOC_CLONE_RANGE: Input/output error
    
    [CAUSE]
    In zlib_decompress(), we don't treat @start_byte as just a page offset;
    we also use it as an indicator of whether we should switch our output
    buffer.
    
    In reality, for subpage cases, although @start_byte can be non-zero,
    we should never switch input/output buffer, since the whole input/output
    buffer should never exceed one sector.
    
    Note: The above assumption only breaks if we ever support a
    multi-page sectorsize.
    
    Thus the current code using @start_byte as a condition to switch
    input/output buffer or finish the decompression is completely incorrect.
    
    [FIX]
    The fix involves several modifications:
    
    - Rename @start_byte to @dest_pgoff to properly express its meaning
    
    - Add an extra ASSERT() inside btrfs_decompress() to make sure the
      input/output size never exceeds one sector.
    
    - Use Z_FINISH flag to make sure the decompression happens in one go
    
    - Remove the loop needed to switch input/output buffers
    
    - Use correct destination offset inside the destination page
    
    - Treat an early end of the decompressed data as an error
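
The last few points can be illustrated with a minimal userspace sketch. This is not the kernel code from this commit: the helper names `dest_pgoff_of()` and `copy_inline_to_page()`, the constants, and the return convention are all hypothetical, and the 4K sectorsize and 64K page size are assumptions taken from the example above. It only shows the intended behavior: bound the size by one sector, write at the offset inside the destination page, and fail if the decompressed data ends early.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTORSIZE 4096u   /* assumed 4K sectorsize, as in the example */
#define PAGE_SZ    65536u  /* assumed 64K page size (subpage case) */

/* Hypothetical helper: offset inside the destination page for a file offset. */
static size_t dest_pgoff_of(unsigned long long file_offset)
{
	return (size_t)(file_offset % PAGE_SZ);
}

/*
 * Hypothetical helper: place the decompressed bytes of one inline extent
 * into the destination page at @dest_pgoff.
 *
 * @produced is how many bytes the decompressor actually output,
 * @expected is how many bytes the caller needs.
 *
 * Returns 0 on success, -1 if the decompressed data ended early.
 */
static int copy_inline_to_page(unsigned char *page, size_t dest_pgoff,
			       const unsigned char *buf, size_t produced,
			       size_t expected)
{
	/* The output must never exceed one sector or run past the page. */
	assert(expected <= SECTORSIZE);
	assert(dest_pgoff + expected <= PAGE_SZ);

	/* An early end is an error, not a reason to switch buffers. */
	if (produced < expected)
		return -1;

	memcpy(page + dest_pgoff, buf, expected);
	return 0;
}
```

With these assumptions, the reflink destination at file offset 60k maps to offset 61440 inside the single 64K page, so a non-zero page offset says nothing about needing another buffer.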
    
    After the fix, even on aarch64 with 64K page size, the above reflink
    now works as expected:
    
      # xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
      linked 4096/4096 bytes at offset 61440
    
    And it results in a correct file layout:
    
    	item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
    		generation 10 transid 10 size 65536 nbytes 4096
    		block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
    		sequence 1 flags 0x0(none)
    	item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
    		index 3 namelen 4 name: dest
    	item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
    		location key (0 UNKNOWN.0 0) type XATTR
    		transid 10 data_len 37 name_len 16
    		name: security.selinux
    		data unconfined_u:object_r:unlabeled_t:s0
    	item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
    		generation 10 type 1 (regular)
    		extent data disk byte 13631488 nr 4096
    		extent data offset 0 nr 4096 ram 4096
    		extent compression 0 (none)
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>