    Btrfs: fix loading of orphan roots leading to BUG_ON · 909c3a22
    Filipe Manana authored
    When looking for orphan roots during mount we can end up hitting a
    BUG_ON() (at root-tree.c:btrfs_find_orphan_roots()) if a log tree is
    replayed and qgroups are enabled. This is because after a log tree is
    replayed, a transaction commit is made. The commit triggers qgroup extent
    accounting, which in turn does backref walking, which ends up reading and
    inserting all roots in the radix tree fs_info->fs_root_radix, including
    orphan roots (deleted snapshots). So after the log tree is replayed, when
    finding orphan roots we hit the BUG_ON with the following trace:
    
    [118209.182438] ------------[ cut here ]------------
    [118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
    [118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse
    processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata
    virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
    [118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G        W       4.5.0-rc5-btrfs-next-24+ #1
    [118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
    [118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
    [118209.186318] RIP: 0010:[<ffffffffa04237d7>]  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
    [118209.186318] RSP: 0018:ffff8800af34faa8  EFLAGS: 00010246
    [118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
    [118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
    [118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
    [118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
    [118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
    [118209.186318] FS:  00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
    [118209.186318] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
    [118209.186318] Stack:
    [118209.186318]  fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
    [118209.186318]  fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
    [118209.186318]  0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
    [118209.186318] Call Trace:
    [118209.186318]  [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
    [118209.186318]  [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
    [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
    [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
    [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
    [118209.186318]  [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
    [118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
    [118209.186318]  [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
    [118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
    [118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
    [118209.186318]  [<ffffffff81195637>] do_mount+0x8a6/0x9e8
    [118209.186318]  [<ffffffff8119598d>] SyS_mount+0x77/0x9f
    [118209.186318]  [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
    [118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b
    4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
    [118209.186318] RIP  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
    [118209.186318]  RSP <ffff8800af34faa8>
    [118209.230735] ---[ end trace 83938f987d85d477 ]---
    
    So fix this by not treating the error -EEXIST, returned when attempting
    to insert a root already inserted by the backref walking code, as an error.
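
    The idea can be illustrated with a small user-space sketch. This is not
    the kernel code: struct root_registry, registry_insert() and the two
    load_orphan_root_*() helpers are hypothetical names standing in for
    fs_info->fs_root_radix and the insertion done while processing an orphan
    root. It only shows the pattern of tolerating -EEXIST when another code
    path may have inserted the same root first.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for fs_info->fs_root_radix (hypothetical): a small set of
 * root ids. The kernel uses a radix tree; an array is enough for a sketch. */
#define MAX_ROOTS 16

struct root_registry {
    uint64_t ids[MAX_ROOTS];
    size_t count;
};

/* Insert a root id. Returns 0 on success, -EEXIST if the id is already
 * registered, -ENOSPC if the toy registry is full. */
int registry_insert(struct root_registry *reg, uint64_t root_id)
{
    assert(reg != NULL);

    for (size_t i = 0; i < reg->count; i++) {
        if (reg->ids[i] == root_id)
            return -EEXIST;
    }
    if (reg->count == MAX_ROOTS)
        return -ENOSPC;

    reg->ids[reg->count++] = root_id;
    return 0;
}

/* Strict variant: assumes the caller is always the first one to insert the
 * root, so -EEXIST is passed up as a failure. */
int load_orphan_root_strict(struct root_registry *reg, uint64_t root_id)
{
    return registry_insert(reg, root_id);
}

/* Tolerant variant: -EEXIST only means some other path (in the scenario
 * above, backref walking) already inserted the root, so it is not an error. */
int load_orphan_root_tolerant(struct root_registry *reg, uint64_t root_id)
{
    int ret = registry_insert(reg, root_id);

    if (ret == -EEXIST)
        ret = 0;
    return ret;
}
```

    With a root already registered by an earlier caller, the strict variant
    reports -EEXIST while the tolerant variant returns 0 and leaves the
    registry unchanged. Other errors are still propagated by both.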
    
    The following test case for xfstests reproduces the bug:
    
      seq=`basename $0`
      seqres=$RESULT_DIR/$seq
      echo "QA output created by $seq"
      tmp=/tmp/$$
      status=1	# failure is the default!
      trap "_cleanup; exit \$status" 0 1 2 3 15
    
      _cleanup()
      {
          _cleanup_flakey
          cd /
          rm -f $tmp.*
      }
    
      # get standard environment, filters and checks
      . ./common/rc
      . ./common/filter
      . ./common/dmflakey
    
      # real QA test starts here
      _supported_fs btrfs
      _supported_os Linux
      _require_scratch
      _require_dm_target flakey
      _require_metadata_journaling $SCRATCH_DEV
    
      rm -f $seqres.full
    
      _scratch_mkfs >>$seqres.full 2>&1
      _init_flakey
      _mount_flakey
    
      _run_btrfs_util_prog quota enable $SCRATCH_MNT
    
      # Create 2 directories with one file in one of them.
      # We use these just to trigger a transaction commit later, moving the file from
      # directory a to directory b and doing an fsync against directory a.
      mkdir $SCRATCH_MNT/a
      mkdir $SCRATCH_MNT/b
      touch $SCRATCH_MNT/a/f
      sync
    
      # Create our test file with 2 4K extents.
      $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
    
      # Create a snapshot and delete it. This doesn't really delete the snapshot
      # immediately, it just makes it inaccessible and invisible to user space. The
      # snapshot is deleted later by a dedicated kernel thread (cleaner kthread),
      # which is woken up at the next transaction commit.
      # A root orphan item is inserted into the tree of tree roots, so that if a
      # power failure happens before the dedicated kernel thread does the snapshot
      # deletion, the next time the filesystem is mounted it resumes the snapshot
      # deletion.
      _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
      _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
    
      # Now overwrite half of the extents we wrote before. Because we made a snapshot
      # before, which isn't really deleted yet (since no transaction commit happened
      # after we did the snapshot delete request), the non-overwritten extents get
      # referenced twice, once by the default subvolume and once by the snapshot.
      $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
    
      # Now move file f from directory a to directory b and fsync directory a.
      # The fsync on the directory a triggers a transaction commit (because a file
      # was moved from it to another directory) and the file fsync leaves a log tree
      # with file extent items to replay.
      mv $SCRATCH_MNT/a/f $SCRATCH_MNT/b/f
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
    
      echo "File digest before power failure:"
      md5sum $SCRATCH_MNT/foobar | _filter_scratch
    
      # Now simulate a power failure and mount the filesystem to replay the log tree.
      # After the log tree was replayed, we used to hit a BUG_ON() when processing
      # the root orphan item for the deleted snapshot. This is because when processing
      # an orphan root the code expected to be the first code inserting the root into
      # the fs_info->fs_root_radix radix tree, while in reality it was the second
      # caller attempting to do it - the first caller was the transaction commit that
      # took place after replaying the log tree, when updating the qgroup counters.
      _flakey_drop_and_remount
    
      echo "File digest after power failure:"
      # Must match what we got before the power failure.
      md5sum $SCRATCH_MNT/foobar | _filter_scratch
    
      _unmount_flakey
      status=0
      exit
    
    Fixes: 2d9e9776 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
    Cc: stable@vger.kernel.org  # 4.4+
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Chris Mason <clm@fb.com>