• Josef Bacik's avatar
    btrfs: use the file extent tree infrastructure · 9ddc959e
    Josef Bacik authored
    We want to use this everywhere we modify the file extent items
    permanently.  These include:
    
      1) Inserting new file extents for writes and prealloc extents.
      2) Truncating inode items.
      3) btrfs_cont_expand().
      4) Insert inline extents.
      5) Insert new extents from log replay.
      6) Insert a new extent for clone, as it could be past i_size.
      7) Hole punching
    
    For hole punching in particular it might seem it's not necessary because
    anybody extending would use btrfs_cont_expand, however there is a corner
    that still can give us trouble.  Start with an empty file and
    
    fallocate KEEP_SIZE 1M-2M
    
    We now have a 0 length file, and a hole file extent from 0-1M, and a
    prealloc extent from 1M-2M.  Now
    
    punch 1M-1.5M
    
    Because this is past i_size we have
    
    [HOLE EXTENT][ NOTHING ][PREALLOC]
    [0        1M][1M   1.5M][1.5M  2M]
    
    with an i_size of 0.  Now if we pwrite 0-1.5M we'll increas our i_size
    to 1.5M, but our disk_i_size is still 0 until the ordered extent
    completes.
    
    However if we now immediately truncate 2M on the file we'll just call
    btrfs_cont_expand(inode, 1.5M, 2M), since our old i_size is 1.5M.  If we
    commit the transaction here and crash we'll expose the gap.
    
    To fix this we need to clear the file extent mapping for the range that
    we punched but didn't insert a corresponding file extent for.  This will
    mean the truncate will only get an disk_i_size set to 1M if we crash
    before the finish ordered io happens.
    
    I've written an xfstest to reproduce the problem and validate this fix.
    Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    9ddc959e
delayed-inode.c 51.5 KB