• Zygo Blaxell's avatar
    btrfs: add missing memset while reading compressed inline extents · e1699d2d
    Zygo Blaxell authored
    This is a story about 4 distinct (and very old) btrfs bugs.
    
    Commit c8b97818 ("Btrfs: Add zlib compression support") added
    three data corruption bugs for inline extents (bugs #1-3).
    
    Commit 93c82d57 ("Btrfs: zero page past end of inline file items")
    fixed bug #1:  uncompressed inline extents followed by a hole and more
    extents could get non-zero data in the hole as they were read.  The fix
    was to add a memset in btrfs_get_extent to zero out the hole.
    
    Commit 166ae5a4 ("btrfs: fix inline compressed read err corruption")
    fixed bug #2:  compressed inline extents which contained non-zero bytes
    might be replaced with zero bytes in some cases.  This patch removed an
    unhelpful memset from uncompress_inline, but the case where memset is
    required was missed.
    
    There is also a memset in the decompression code, but this only covers
    decompressed data that is shorter than the ram_bytes from the extent
    ref record.  This memset doesn't cover the region between the end of the
    decompressed data and the end of the page.  It has also moved around a
    few times over the years, so there's no single patch to refer to.
    
    This patch fixes bug #3:  compressed inline extents followed by a hole
    and more extents could get non-zero data in the hole as they were read
    (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
    The fix is the same:  zero out the hole in the compressed case too,
    by putting a memset back in uncompress_inline, but this time with
    correct parameters.
    
    The last and oldest bug, bug #0, is the cause of the offending inline
    extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
    of behavior somewhere in the btrfs write code.  In a few special cases,
    an inline extent and hole are allowed to persist where they normally
    would be combined with later extents in the file.
    
    A fast reproducer for bug #0 is presented below.  A few offending extents
    are also created in the wild during large rsync transfers with the -S
    flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
    will produce a handful of offending files as well.  Once an offending
    file is created, it can present different content to userspace each
    time it is read.
    
    Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
    kernel back to v3.5 has this behavior.  There are fossil records of this
    bug's effects in commits all the way back to v2.6.32.  I have no reason
    to believe bug #0 wasn't present at the beginning of btrfs compression
    support in v2.6.29, but I can't easily test kernels that old to be sure.
    
    It is not clear whether bug #0 is worth fixing.  A fix would likely
    require injecting extra reads into currently write-only paths, and most
    of the exceptional cases caused by bug #0 are already handled now.
    
    Whether we like them or not, bug #0's inline extents followed by holes
    are part of the btrfs de-facto disk format now, and we need to be able
    to read them without data corruption or an infoleak.  So enough about
    bug #0, let's get back to bug #3 (this patch).
    
    An example of on-disk structure leading to data corruption found in
    the wild:
    
            item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
                    inode generation 50 transid 50 size 47424 nbytes 49141
                    block group 0 mode 100644 links 1 uid 0 gid 0
                    rdev 0 flags 0x0(none)
            item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
                    inode ref index 3 namelen 10 name: DB_File.so
            item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
                    inline extent data size 1341 ram 4085 compress(zlib)
            item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
                    extent data disk byte 5367308288 nr 20480
                    extent data offset 0 nr 45056 ram 45056
                    extent compression(zlib)
    
    Different data appears in userspace during each read of the 11 bytes
    between 4085 and 4096.  The extent in item 63 is not long enough to
    fill the first page of the file, so a memset is required to fill the
    space between item 63 (ending at 4085) and item 64 (beginning at 4096)
    with zero.
    
    Here is a reproducer from Liu Bo, which demonstrates another method
    of creating the same inline extent and hole pattern:
    
    Using 'page_poison=on' kernel command line (or enable
    CONFIG_PAGE_POISONING) run the following:
    
    	# touch foo
    	# chattr +c foo
    	# xfs_io -f -c "pwrite -W 0 1000" foo
    	# xfs_io -f -c "falloc 4 8188" foo
    	# od -x foo
    	# echo 3 >/proc/sys/vm/drop_caches
    	# od -x foo
    
    This produce the following on my box:
    
    Correct output:  file contains 1000 data bytes followed
    by zeros:
    
    	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
    	*
    	0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000
    	0001760 0000 0000 0000 0000 0000 0000 0000 0000
    	*
    	0020000
    
    Actual output:  the data after the first 1000 bytes
    will be different each run:
    
    	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
    	*
    	0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
    	0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
    	0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
    	(...)
    Signed-off-by: default avatarZygo Blaxell <ce3g8jdj@umail.furryterror.org>
    Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
    Reviewed-by: default avatarChris Mason <clm@fb.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    e1699d2d
inode.c 283 KB