• Filipe Manana's avatar
    btrfs: ignore fiemap path cache if we have multiple leaves for a data extent · 63c84b46
    Filipe Manana authored
    The path cache used during fiemap used to determine the sharedness of
    extent buffers in a path from a leaf containing a file extent item
    pointing to our data extent up to the root node of the tree, is meant to
    be used for a single path. Having a single path is by far the most common
    case, and therefore worth to optimize for, but it's possible to actually
    have multiple paths because we have 2 or more leaves.
    
    If we have multiple leaves, the 'level' variable keeps getting incremented
    in each iteration of the while loop at btrfs_is_data_extent_shared(),
    which means we will treat the second leaf in the 'tmp' ulist as a level 1
    node, and so forth. In the worst case this can lead to getting a level
    greater than or equals to BTRFS_MAX_LEVEL (8), which will trigger a
    WARN_ON_ONCE() in the functions to lookup from or store in the path cache
    (lookup_backref_shared_cache() and store_backref_shared_cache()). If the
    current level never goes beyond 8, due to shared nodes in the paths and
    a fs tree height smaller than 8, it can still result in incorrectly
    marking one leaf as shared because some other leaf is shared and is stored
    one level below that other leaf, as when storing a true sharedness value
    in the cache results in updating the sharedness to true of all entries in
    the cache below the current level.
    
    Having multiple leaves happens in a case like the following:
    
      - We have a file extent item point to data extent at bytenr X, for
        a file range [0, 1M[ for example;
    
      - At this moment we have an extent data ref for the extent, with
        an offset of 0 and a count of 1;
    
      - A write into the middle of the extent happens, file range [64K, 128K)
        so the file extent item is split into two (at btrfs_drop_extents()):
    
        1) One for file range [0, 64K), with a length (num_bytes field) of
           64K and an extent offset of 0;
    
        2) Another one for file range [128K, 1M), with a length of 896K
           (1M - 128K) and an extent offset of 128K.
    
      - At this moment the two file extent items are located in the same
        leaf;
    
      - A new file extent item for the range [64K, 128K), pointing to a new
        data extent, is inserted in the leaf. This results in a leaf split
        and now those two file extent items pointing to data extent X end
        up located in different leaves;
    
      - Once delayed refs are run, we still have a single extent data ref
        item for our data extent at bytenr X, for offset 0, but now with a
        count of 2 instead of 1;
    
      - So during fiemap, at btrfs_is_data_extent_shared(), after we call
        find_parent_nodes() for the data extent, we get two leaves, since
        we have two file extent items point to data extent at bytenr X that
        are located in two different leaves.
    
    So skip the use of the path cache when we get more than one leaf.
    
    Fixes: 12a824dc ("btrfs: speedup checking for extent sharedness during fiemap")
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    63c84b46
backref.h 10.9 KB