    btrfs: always pin deleted leaves when there are active tree mod log users · 485df755
    Filipe Manana authored
    When freeing a tree block we may end up adding its extent back to the
    free space cache/tree, as long as there are no more references for it,
    it was created in the current transaction and writeback for it never
    happened. This is generally fine. However, when there are tree mod log
    users, it can result in inconsistent versions of a btree after
    unwinding extent buffers with the recorded tree mod log operations.
    
    This is because:
    
    * We only log operations for nodes (adding and removing key/pointers),
      for leaves we don't do anything;
    
    * This means that we can log a MOD_LOG_KEY_REMOVE_WHILE_FREEING operation
      for a node that points to a leaf that was deleted;
    
    * Before we apply the logged operation to unwind a node, we can have
      that leaf's extent allocated again, either as a node or as a leaf, and
      possibly for another btree. This is possible if the leaf was created in
      the current transaction and writeback for it never started, in which
      case btrfs_free_tree_block() returns its extent back to the free space
      cache/tree;
    
    * Then, before applying the tree mod log operation, some task allocates
      the metadata extent just freed before, and uses it either as a leaf or
      as a node for some btree (can be the same or another one, it does not
      matter);
    
    * After applying the MOD_LOG_KEY_REMOVE_WHILE_FREEING operation, the
      target node ends up with an item pointing to a metadata extent whose
      content now differs from what it had before the leaf was deleted.
      The extent might now belong to a different btree and be a node
      instead of a leaf.
    
      As a consequence, searches done after the unwinding can produce
      unpredictable results.
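
    The sequence above can be replayed with a toy user-space model. This is
    only a sketch: the block array, the first-fit allocator and every name
    in it are hypothetical and are not btrfs kernel APIs. It only shows that
    a pointer remembered by a log record can end up referring to a reused
    block unless the freed block is pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy user-space model. None of these names are btrfs kernel APIs. */
#define NR_BLOCKS 4

enum block_kind { BLOCK_FREE, BLOCK_LEAF, BLOCK_NODE };

struct block {
	enum block_kind kind;
	uint64_t owner_root;	/* id of the btree using this block */
	bool pinned;		/* pinned blocks cannot be reallocated */
};

static struct block blocks[NR_BLOCKS];

/* First-fit allocation: return the first free, unpinned block, or -1. */
static int alloc_block(enum block_kind kind, uint64_t root)
{
	for (int i = 0; i < NR_BLOCKS; i++) {
		if (blocks[i].kind == BLOCK_FREE && !blocks[i].pinned) {
			blocks[i].kind = kind;
			blocks[i].owner_root = root;
			return i;
		}
	}
	return -1;
}

/* Free a block, optionally pinning it so it cannot be handed out again. */
static void free_block(int idx, bool pin)
{
	assert(idx >= 0 && idx < NR_BLOCKS);
	blocks[idx].kind = BLOCK_FREE;
	blocks[idx].pinned = pin;
}

/*
 * Replay the scenario: a leaf is allocated and freed, its location is
 * remembered by a (simulated) log record, and another task then allocates
 * a block for a different btree. Returns true if that allocation reused
 * the location the log record still points to.
 */
bool rewound_pointer_is_stale(bool pin_on_free)
{
	for (int i = 0; i < NR_BLOCKS; i++)
		blocks[i] = (struct block){ .kind = BLOCK_FREE };

	int leaf = alloc_block(BLOCK_LEAF, 5);
	assert(leaf >= 0);
	int logged_ptr = leaf;	/* what the log record would remember */

	free_block(leaf, pin_on_free);

	/* Another task allocates a node for a different btree. */
	int reused = alloc_block(BLOCK_NODE, 7);
	assert(reused >= 0);

	/* Unwinding would restore a pointer to logged_ptr. */
	return reused == logged_ptr;
}
```

    In this model, rewound_pointer_is_stale(false) reports that the freed
    location was reused, while rewound_pointer_is_stale(true) reports that
    it was not, because the pinned block is skipped by the allocator.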
    
    So make sure we pin extent buffers corresponding to leaves when there
    are tree mod log users.
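
    The resulting rule can be sketched as a small predicate. This is an
    illustration under stated assumptions, not the code of this patch: the
    struct, its fields and the function name are hypothetical, and the
    conditions are taken only from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a tree block being freed (not a kernel type). */
struct toy_freed_block {
	bool is_leaf;
	bool last_ref_dropped;		/* no more references for it */
	bool created_in_current_trans;
	bool writeback_started;
};

/*
 * Returns true if the extent may go straight back to the free space
 * cache/tree. A false return means the extent must not be reused
 * right away (in this sketch that stands for pinning it).
 */
bool can_reuse_immediately(const struct toy_freed_block *b,
			   bool tree_mod_log_users)
{
	assert(b != NULL);

	/* Conditions for immediate reuse, as described in the message. */
	if (!b->last_ref_dropped || !b->created_in_current_trans ||
	    b->writeback_started)
		return false;

	/* New rule: never reuse a deleted leaf while the log has users. */
	if (b->is_leaf && tree_mod_log_users)
		return false;

	return true;
}
```

    With a leaf that satisfies all three reuse conditions, the predicate
    allows immediate reuse only when there are no tree mod log users.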
    
    CC: stable@vger.kernel.org # 4.14+
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
extent-tree.c 163 KB