• Filipe Manana's avatar
    btrfs: send, orphanize first all conflicting inodes when processing references · 98272bb7
    Filipe Manana authored
    When doing an incremental send it is possible that when processing the new
    references for an inode we end up issuing rename or link operations that
    have an invalid path, which contains the orphanized name of a directory
    before we actually orphanized it, causing the receiver to fail.
    
    The following reproducer triggers such scenario:
    
      $ cat reproducer.sh
      #!/bin/bash
    
      mkfs.btrfs -f /dev/sdi >/dev/null
      mount /dev/sdi /mnt/sdi
    
      touch /mnt/sdi/a
      touch /mnt/sdi/b
      mkdir /mnt/sdi/testdir
      # We want "a" to have a lower inode number then "testdir" (257 vs 259).
      mv /mnt/sdi/a /mnt/sdi/testdir/a
    
      # Filesystem looks like:
      #
      # .                           (ino 256)
      # |----- testdir/             (ino 259)
      # |          |----- a         (ino 257)
      # |
      # |----- b                    (ino 258)
    
      btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap1
      btrfs send -f /tmp/snap1.send /mnt/sdi/snap1
    
      # Now rename 259 to "testdir_2", then change the name of 257 to
      # "testdir" and make it a direct descendant of the root inode (256).
      # Also create a new link for inode 257 with the old name of inode 258.
      # By swapping the names and location of several inodes and create a
      # nasty dependency chain of rename and link operations.
      mv /mnt/sdi/testdir/a /mnt/sdi/a2
      touch /mnt/sdi/testdir/a
      mv /mnt/sdi/b /mnt/sdi/b2
      ln /mnt/sdi/a2 /mnt/sdi/b
      mv /mnt/sdi/testdir /mnt/sdi/testdir_2
      mv /mnt/sdi/a2 /mnt/sdi/testdir
    
      # Filesystem now looks like:
      #
      # .                            (ino 256)
      # |----- testdir_2/            (ino 259)
      # |          |----- a          (ino 260)
      # |
      # |----- testdir               (ino 257)
      # |----- b                     (ino 257)
      # |----- b2                    (ino 258)
    
      btrfs subvolume snapshot -r /mnt/sdi /mnt/sdi/snap2
      btrfs send -f /tmp/snap2.send -p /mnt/sdi/snap1 /mnt/sdi/snap2
    
      mkfs.btrfs -f /dev/sdj >/dev/null
      mount /dev/sdj /mnt/sdj
    
      btrfs receive -f /tmp/snap1.send /mnt/sdj
      btrfs receive -f /tmp/snap2.send /mnt/sdj
    
      umount /mnt/sdi
      umount /mnt/sdj
    
    When running the reproducer, the receive of the incremental send stream
    fails:
    
      $ ./reproducer.sh
      Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
      At subvol /mnt/sdi/snap1
      Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
      At subvol /mnt/sdi/snap2
      At subvol snap1
      At snapshot snap2
      ERROR: link b -> o259-6-0/a failed: No such file or directory
    
    The problem happens because of the following:
    
    1) Before we start iterating the list of new references for inode 257,
       we generate its current path and store it at @valid_path, done at
       the very beginning of process_recorded_refs(). The generated path
       is "o259-6-0/a", containing the orphanized name for inode 259;
    
    2) Then we iterate over the list of new references, which has the
       references "b" and "testdir" in that specific order;
    
    3) We process reference "b" first, because it is in the list before
       reference "testdir". We then issue a link operation to create
       the new reference "b" using a target path corresponding to the
       content at @valid_path, which corresponds to "o259-6-0/a".
       However we haven't yet orphanized inode 259, its name is still
       "testdir", and not "o259-6-0". The orphanization of 259 did not
       happen yet because we will process the reference named "testdir"
       for inode 257 only in the next iteration of the loop that goes
       over the list of new references.
    
    Fix the issue by having a preliminar iteration over all the new references
    at process_recorded_refs(). This iteration is responsible only for doing
    the orphanization of other inodes that have and old reference that
    conflicts with one of the new references of the inode we are currently
    processing. The emission of rename and link operations happen now in the
    next iteration of the new references.
    
    A test case for fstests will follow soon.
    
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    98272bb7
send.c 178 KB