    Btrfs: fix missing hole after hole punching and fsync when using NO_HOLES · 0e56315c
    Filipe Manana authored
    When using the NO_HOLES feature, if we punch a hole into a file and then
    fsync it, there are cases where a subsequent fsync will miss the fact that
    a hole was punched, resulting in the holes not existing after replaying
    the log tree.
    
    Essentially, in all these cases tree-log.c:copy_items() is not invoked
    for the leafs that delimit holes, because nothing changed those leafs in
    the current transaction. And it's precisely in copy_items() that we
    currently detect and log holes, which works as long as the holes are
    between file extent items in the input leaf, or between the beginning of
    the input leaf and the previous leaf, or between the last item in the
    leaf and the next leaf.
    
    First example where we miss a hole:
    
      *) The extent items of the inode span multiple leafs;
    
      *) The punched hole covers a range that affects only the extent items of
         the first leaf;
    
      *) The fsync operation is done in full mode (BTRFS_INODE_NEEDS_FULL_SYNC
         is set in the inode's runtime flags).
    
      That results in the hole not existing after replaying the log tree.
    
      For example, if the fs/subvolume tree has the following layout for a
      particular inode:
    
          Leaf N, generation 10:
    
          [ ... INODE_ITEM INODE_REF EXTENT_ITEM (0 64K) EXTENT_ITEM (64K 128K) ]
    
          Leaf N + 1, generation 10:
    
          [ EXTENT_ITEM (128K 64K) ... ]
    
      If at transaction 11 we punch a hole covering the range [0, 128K[, we end
      up dropping the two extent items from leaf N, but we don't touch the other
      leaf, so we end up in the following state:
    
          Leaf N, generation 11:
    
          [ ... INODE_ITEM INODE_REF ]
    
          Leaf N + 1, generation 10:
    
          [ EXTENT_ITEM (128K 64K) ... ]
    
      A full fsync after punching the hole will only process leaf N because it
      was modified in the current transaction, but not leaf N + 1, since it
      was not modified in the current transaction (generation 10 and not 11).
      As a result the fsync will not log any holes, because it didn't process
      any leaf with extent items.
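
      The effect of that generation filter can be illustrated with a small
      standalone model. This is only a sketch, not kernel code: struct
      leaf_model and extent_items_seen() are hypothetical names made up for
      the illustration, and the only assumption is that leafs whose
      generation is older than the current transaction are skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a btree leaf (not a kernel type). */
struct leaf_model {
	uint64_t generation;     /* transaction that last modified the leaf */
	size_t nr_extent_items;  /* file extent items stored in the leaf */
};

/*
 * Count the file extent items that would be visited if only leafs
 * modified in the current transaction are processed.
 */
static size_t extent_items_seen(const struct leaf_model *leafs, size_t nr,
				uint64_t cur_trans)
{
	size_t seen = 0;

	for (size_t i = 0; i < nr; i++) {
		if (leafs[i].generation < cur_trans)
			continue; /* leaf not modified in this transaction */
		seen += leafs[i].nr_extent_items;
	}
	return seen;
}
```

      With leaf N at generation 11 holding no extent items and leaf N + 1 at
      generation 10 holding one, a call with cur_trans set to 11 visits zero
      extent items, so there is nothing from which a hole could be inferred.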
    
    Second example where we will miss a hole:
    
      *) An inode has its items spanning 5 (or more) leafs;
    
      *) A hole is punched and it covers only the extent items of the 3rd
         leaf. This results in deleting the entire leaf and not touching any
         of the other leafs.
    
      So the only leaf that is modified in the current transaction, when
      punching the hole, is the first leaf, which contains the inode item.
      During the full fsync, the only leaf that is passed to copy_items()
      is that first leaf, and that's not enough for the hole detection
      code in copy_items() to determine there's a hole between the last
      file extent item in the 2nd leaf and the first file extent item in
      the 3rd leaf (which was the 4th leaf before punching the hole).
    
    Fix this by scanning all leafs and punching holes as necessary when doing a
    full fsync (less common than a non-full fsync) when the NO_HOLES feature
    is enabled. The lack of explicit file extent items to mark holes makes it
    necessary to scan existing extents to determine if holes exist.
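
    The idea of inferring holes from the gaps between existing extents can
    be sketched in standalone C as follows. This is an illustration under
    stated assumptions, not the code of this patch: struct extent_model,
    struct hole_model and find_holes() are hypothetical names, extents are
    assumed to be sorted by file offset and non-overlapping, and a gap
    between the last extent and the file size is assumed to count as a hole
    as well.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical models of a file extent and a detected hole. */
struct extent_model { uint64_t offset; uint64_t len; };
struct hole_model   { uint64_t offset; uint64_t len; };

/*
 * Walk extents sorted by offset and record every gap as a hole.
 * Returns the number of holes found; at most max_holes are stored.
 */
static size_t find_holes(const struct extent_model *extents, size_t nr,
			 uint64_t i_size, struct hole_model *holes,
			 size_t max_holes)
{
	uint64_t prev_end = 0; /* end of the previous extent */
	size_t found = 0;

	for (size_t i = 0; i < nr; i++) {
		if (extents[i].offset > prev_end) {
			if (found < max_holes) {
				holes[found].offset = prev_end;
				holes[found].len = extents[i].offset - prev_end;
			}
			found++;
		}
		prev_end = extents[i].offset + extents[i].len;
	}

	/* Gap between the last extent and the end of the file. */
	if (i_size > prev_end) {
		if (found < max_holes) {
			holes[found].offset = prev_end;
			holes[found].len = i_size - prev_end;
		}
		found++;
	}
	return found;
}
```

    For the first example above, a single extent at offset 128K with length
    64K and a file size of 192K yields one hole covering [0, 128K[. Unlike
    the per-leaf detection, this scan does not depend on which leafs were
    modified in the current transaction.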
    
    A test case for fstests follows soon.
    
    Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>