    btrfs: fix encoded write i_size corruption with no-holes · e7db9e5c
    Boris Burkov authored
    We have observed a btrfs filesystem corruption on workloads using
    no-holes and encoded writes via send stream v2. The symptom is that a
    file appears to be truncated to the end of its last aligned extent, even
    though the final unaligned extent, its file extent item, and an otherwise
    correctly updated inode item have all been written.
    
    So if we were writing out a 1MiB+X file via 8 128K extents and one
    extent of length X, i_size would be set to 1MiB, but the ninth extent,
    nbytes, etc. would all appear correct otherwise.
    
    The source of the race is a narrow (one line of code) window in which a
    no-holes fs has read in an updated i_size, but has not yet written it to
    the shared disk_i_size variable. Therefore, if two ordered extents run in
    parallel (par for the course for receive workloads), the following
    sequence can play out: (following "threads" a bit loosely, since there
    are callbacks involved for endio but extra threads aren't needed to
    cause the issue)
    
      ENC-WR1 (second to last)                  ENC-WR2 (last)
      ------------------------                  --------------
      btrfs_do_encoded_write
        set i_size = 1M
        submit bio B1 ending at 1M
      endio B1
      btrfs_inode_safe_disk_i_size_write
        local i_size = 1M
        falls off a cliff for some reason
                                                btrfs_do_encoded_write
                                                  set i_size = 1M+X
                                                  submit bio B2 ending at 1M+X
                                                endio B2
                                                btrfs_inode_safe_disk_i_size_write
                                                  local i_size = 1M+X
                                                  disk_i_size = 1M+X
        disk_i_size = 1M
                                                btrfs_delayed_update_inode
      btrfs_delayed_update_inode
    
    And the delayed inode ends up filled with nbytes=1M+X and i_size=1M, and
    writes respect i_size and present a corrupted file missing its last
    extents.
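
The interleaving above can be replayed deterministically in userspace. What
follows is only a toy model, not kernel code: struct toy_inode and the helpers
snapshot_i_size, publish_disk_i_size and replay_race are hypothetical names
made up for illustration, and the "stall" is modelled simply by delaying the
second half of ENC-WR1's update until ENC-WR2 has finished.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two fields involved; NOT the kernel's inode struct. */
struct toy_inode {
	uint64_t i_size;      /* in-memory size, set by the encoded write */
	uint64_t disk_i_size; /* size that ends up in the on-disk inode item */
};

/* First half of the unlocked update: take a local copy of i_size. */
static uint64_t snapshot_i_size(const struct toy_inode *inode)
{
	return inode->i_size;
}

/* Second half: store the (possibly stale) local copy to disk_i_size. */
static void publish_disk_i_size(struct toy_inode *inode, uint64_t snap)
{
	inode->disk_i_size = snap;
}

/* Replay the sequence from the diagram; return the final disk_i_size. */
static uint64_t replay_race(uint64_t x)
{
	const uint64_t one_mib = 1024 * 1024;
	struct toy_inode inode = { 0, 0 };

	inode.i_size = one_mib;                    /* ENC-WR1: set i_size = 1M */
	uint64_t snap1 = snapshot_i_size(&inode);  /* ENC-WR1 endio: local = 1M, stalls */

	inode.i_size = one_mib + x;                /* ENC-WR2: set i_size = 1M+X */
	uint64_t snap2 = snapshot_i_size(&inode);  /* ENC-WR2 endio: local = 1M+X */
	publish_disk_i_size(&inode, snap2);        /* disk_i_size = 1M+X */

	assert(snap2 >= snap1);                    /* ENC-WR1's copy is the older one */
	publish_disk_i_size(&inode, snap1);        /* ENC-WR1 resumes: disk_i_size = 1M */

	return inode.disk_i_size;
}
```

In this model, replay_race(x) returns 1 MiB for any x, i.e. the tail of length
X is lost from disk_i_size even though i_size itself was 1M+X.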
    
    Fix this by holding the inode lock in the no-holes case so that a thread
    can't sneak in a write to disk_i_size that gets overwritten with an out
    of date i_size.
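
As a rough userspace sketch of why the lock helps (this is not the actual
patch): if the read of i_size and the store to disk_i_size happen inside one
critical section, a delayed updater re-reads the current i_size instead of
publishing a stale copy. A pthread mutex stands in for the inode lock here, and
struct toy_inode, locked_disk_i_size_write and replay_with_lock are
hypothetical names. The replay is single-threaded, so the lock only marks where
the critical section would be.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Toy stand-in for the inode; field names follow the commit message only. */
struct toy_inode {
	pthread_mutex_t lock; /* assumed stand-in for the inode lock */
	uint64_t i_size;
	uint64_t disk_i_size;
};

/* Read i_size and store disk_i_size inside a single critical section. */
static void locked_disk_i_size_write(struct toy_inode *inode)
{
	pthread_mutex_lock(&inode->lock);
	uint64_t snap = inode->i_size; /* read happens under the lock */
	inode->disk_i_size = snap;     /* so the store can never use a stale copy */
	pthread_mutex_unlock(&inode->lock);
}

/*
 * Replay both possible orders of the two endio updates.
 * wr1_delayed != 0 models ENC-WR1's update running after ENC-WR2's.
 */
static uint64_t replay_with_lock(uint64_t x, int wr1_delayed)
{
	const uint64_t one_mib = 1024 * 1024;
	struct toy_inode inode = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

	inode.i_size = one_mib;                   /* ENC-WR1: set i_size = 1M */
	if (!wr1_delayed)
		locked_disk_i_size_write(&inode); /* ENC-WR1 endio runs promptly */

	inode.i_size = one_mib + x;               /* ENC-WR2: set i_size = 1M+X */
	locked_disk_i_size_write(&inode);         /* ENC-WR2 endio */

	if (wr1_delayed)
		locked_disk_i_size_write(&inode); /* late ENC-WR1 endio re-reads i_size */

	assert(inode.disk_i_size == inode.i_size);
	return inode.disk_i_size;
}
```

Under these assumptions both orders end with disk_i_size equal to 1M+X, because
the late updater can no longer carry a copy of i_size across another thread's
store.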
    
    Fixes: 41a2ee75 ("btrfs: introduce per-inode file extent tree")
    CC: stable@vger.kernel.org # 5.10+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Boris Burkov <boris@bur.io>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
file-item.c 37.7 KB