    Btrfs: fix race between balance and unused block group deletion · 67c5e7d4
    Filipe Manana authored
    We have a race between deleting an unused block group and balancing the
    same block group that leads to an assertion failure/BUG(), producing the
    following trace:
    
    [181631.208236] BTRFS: assertion failed: 0, file: fs/btrfs/volumes.c, line: 2622
    [181631.220591] ------------[ cut here ]------------
    [181631.222959] kernel BUG at fs/btrfs/ctree.h:4062!
    [181631.223932] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    [181631.224566] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse acpi_cpufreq parpor$
    [181631.224566] CPU: 8 PID: 17451 Comm: btrfs Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
    [181631.224566] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
    [181631.224566] task: ffff880127e09590 ti: ffff8800b5824000 task.ti: ffff8800b5824000
    [181631.224566] RIP: 0010:[<ffffffffa03f19f6>]  [<ffffffffa03f19f6>] assfail.constprop.50+0x1e/0x20 [btrfs]
    [181631.224566] RSP: 0018:ffff8800b5827ae8  EFLAGS: 00010246
    [181631.224566] RAX: 0000000000000040 RBX: ffff8800109fc218 RCX: ffffffff81095dce
    [181631.224566] RDX: 0000000000005124 RSI: ffffffff81464819 RDI: 00000000ffffffff
    [181631.224566] RBP: ffff8800b5827ae8 R08: 0000000000000001 R09: 0000000000000000
    [181631.224566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800109fc200
    [181631.224566] R13: ffff880020095000 R14: ffff8800b1a13f38 R15: ffff880020095000
    [181631.224566] FS:  00007f70ca0b0c80(0000) GS:ffff88013ec00000(0000) knlGS:0000000000000000
    [181631.224566] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [181631.224566] CR2: 00007f2872ab6e68 CR3: 00000000a717c000 CR4: 00000000000006e0
    [181631.224566] Stack:
    [181631.224566]  ffff8800b5827ba8 ffffffffa03f3916 ffff8800b5827b38 ffffffffa03d080e
    [181631.224566]  ffffffffa03d1423 ffff880020095000 ffff88001233c000 0000000000000001
    [181631.224566]  ffff880020095000 ffff8800b1a13f38 0000000a69c00000 0000000000000000
    [181631.224566] Call Trace:
    [181631.224566]  [<ffffffffa03f3916>] btrfs_remove_chunk+0xa4/0x6bb [btrfs]
    [181631.224566]  [<ffffffffa03d080e>] ? join_transaction.isra.8+0xb9/0x3ba [btrfs]
    [181631.224566]  [<ffffffffa03d1423>] ? wait_current_trans.isra.13+0x22/0xfc [btrfs]
    [181631.224566]  [<ffffffffa03f3fbc>] btrfs_relocate_chunk.isra.29+0x8f/0xa7 [btrfs]
    [181631.224566]  [<ffffffffa03f54df>] btrfs_balance+0xaa4/0xc52 [btrfs]
    [181631.224566]  [<ffffffffa03fd388>] btrfs_ioctl_balance+0x23f/0x2b0 [btrfs]
    [181631.224566]  [<ffffffff810872f9>] ? trace_hardirqs_on+0xd/0xf
    [181631.224566]  [<ffffffffa04019a3>] btrfs_ioctl+0xfe2/0x2220 [btrfs]
    [181631.224566]  [<ffffffff812603ed>] ? __this_cpu_preempt_check+0x13/0x15
    [181631.224566]  [<ffffffff81084669>] ? arch_local_irq_save+0x9/0xc
    [181631.224566]  [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2
    [181631.224566]  [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2
    [181631.224566]  [<ffffffff8103e48c>] ? __do_page_fault+0x211/0x424
    [181631.224566]  [<ffffffff811755e6>] do_vfs_ioctl+0x3c6/0x479
    (...)
    
    The sequence of steps leading to this is:
    
               CPU 0                                         CPU 1
    
      btrfs_balance()
        btrfs_relocate_chunk()
    
          btrfs_relocate_block_group(bg X)
            btrfs_lookup_block_group(bg X)
    
                                                   cleaner_kthread
                                                      locks fs_info->cleaner_mutex
    
                                                      btrfs_delete_unused_bgs()
                                                        finds bg X, which became
                                                        unused in the previous
                                                        transaction
    
                                                        checks bg X ->ro == 0,
                                                        so it proceeds
            sets bg X ->ro to 1
            (btrfs_set_block_group_ro(bg X))
    
            blocks on fs_info->cleaner_mutex
                                                        btrfs_remove_chunk(bg X)
                                                      unlocks fs_info->cleaner_mutex
    
            acquires fs_info->cleaner_mutex
            relocate_block_group()
              --> does nothing, no extents found in
                  the extent tree from bg X
            unlocks fs_info->cleaner_mutex
    
          btrfs_relocate_block_group(bg X) returns
    
        btrfs_remove_chunk(bg X)
           extent map not found
              --> ASSERT(0)
    
    Fix this by using a new mutex to make sure these 2 operations, block
    group relocation and removal, are serialized.
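
The serialization described above can be pictured with a small user-space sketch. It is not the kernel patch: `struct demo_chunk`, `reloc_remove_mutex`, and the `demo_*` functions are hypothetical names invented for illustration, and a pthread mutex stands in for the kernel mutex. The point it tries to show is that the check for "is the chunk still there" and the removal itself happen under one lock on both paths, so the second caller sees the chunk is gone instead of tripping the assertion.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block group/chunk; not the btrfs structure. */
struct demo_chunk {
    bool removed;      /* true once the chunk has been removed */
    int remove_calls;  /* how many times removal actually ran */
};

/* Hypothetical mutex serializing relocation and removal (name is assumed). */
static pthread_mutex_t reloc_remove_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Remove the chunk. The caller must hold reloc_remove_mutex.
 * The assert mirrors the failure in the trace: removing a chunk
 * that is already gone is treated as a bug.
 */
void demo_remove_chunk(struct demo_chunk *c)
{
    assert(!c->removed);
    c->removed = true;
    c->remove_calls++;
}

/* Cleaner-thread path: delete the chunk if it is still present. */
int demo_delete_unused(struct demo_chunk *c)
{
    int ret = 0;

    pthread_mutex_lock(&reloc_remove_mutex);
    if (c->removed)
        ret = -ENOENT;          /* someone else removed it first */
    else
        demo_remove_chunk(c);
    pthread_mutex_unlock(&reloc_remove_mutex);
    return ret;
}

/* Balance path: relocate, then remove, all under the same mutex. */
int demo_relocate_chunk(struct demo_chunk *c)
{
    int ret = 0;

    pthread_mutex_lock(&reloc_remove_mutex);
    if (c->removed) {
        ret = -ENOENT;          /* chunk vanished before we got the lock */
    } else {
        /* Relocating the chunk's extents would happen here. */
        demo_remove_chunk(c);
    }
    pthread_mutex_unlock(&reloc_remove_mutex);
    return ret;
}
```

Whichever of `demo_delete_unused()` and `demo_relocate_chunk()` takes the mutex first removes the chunk; the other returns `-ENOENT` and never reaches the assertion. Both functions can be called from separate threads on the same `struct demo_chunk`.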
    
    This issue is reproducible by running fstests generic/038 (which stresses
    chunk allocation and automatic removal of unused block groups) together
    with the following balance loop:
    
        while true; do btrfs balance start -dusage=0 <mountpoint> ; done

    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>