    Btrfs: find_free_extent: Do not erroneously skip LOOP_CACHING_WAIT state · 13a0db5a
    chandan authored
    When executing generic/001 in a loop on a ppc64 machine (with both sectorsize
    and nodesize set to 64k), the following call trace is observed:
    
    WARNING: at /root/repos/linux/fs/btrfs/locking.c:253
    Modules linked in:
    CPU: 2 PID: 8353 Comm: umount Not tainted 4.3.0-rc5-13676-ga5e681d9 #54
    task: c0000000f2b1f560 ti: c0000000f6008000 task.ti: c0000000f6008000
    NIP: c000000000520c88 LR: c0000000004a3b34 CTR: 0000000000000000
    REGS: c0000000f600a820 TRAP: 0700   Not tainted  (4.3.0-rc5-13676-ga5e681d9)
    MSR: 8000000102029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 24444884  XER: 00000000
    CFAR: c0000000004a3b30 SOFTE: 1
    GPR00: c0000000004a3b34 c0000000f600aaa0 c00000000108ac00 c0000000f5a808c0
    GPR04: 0000000000000000 c0000000f600ae60 0000000000000000 0000000000000005
    GPR08: 00000000000020a1 0000000000000001 c0000000f2b1f560 0000000000000030
    GPR12: 0000000084842882 c00000000fdc0900 c0000000f600ae60 c0000000f070b800
    GPR16: 0000000000000000 c0000000f3c8a000 0000000000000000 0000000000000049
    GPR20: 0000000000000001 0000000000000001 c0000000f5aa01f8 0000000000000000
    GPR24: 0f83e0f83e0f83e1 c0000000f5a808c0 c0000000f3c8d000 c000000000000000
    GPR28: c0000000f600ae74 0000000000000001 c0000000f3c8d000 c0000000f5a808c0
    NIP [c000000000520c88] .btrfs_tree_lock+0x48/0x2a0
    LR [c0000000004a3b34] .btrfs_lock_root_node+0x44/0x80
    Call Trace:
    [c0000000f600aaa0] [c0000000f600ab80] 0xc0000000f600ab80 (unreliable)
    [c0000000f600ab80] [c0000000004a3b34] .btrfs_lock_root_node+0x44/0x80
    [c0000000f600ac00] [c0000000004a99dc] .btrfs_search_slot+0xa8c/0xc00
    [c0000000f600ad40] [c0000000004ab878] .btrfs_insert_empty_items+0x98/0x120
    [c0000000f600adf0] [c00000000050da44] .btrfs_finish_chunk_alloc+0x1d4/0x620
    [c0000000f600af20] [c0000000004be854] .btrfs_create_pending_block_groups+0x1d4/0x2c0
    [c0000000f600b020] [c0000000004bf188] .do_chunk_alloc+0x3c8/0x420
    [c0000000f600b100] [c0000000004c27cc] .find_free_extent+0xbfc/0x1030
    [c0000000f600b260] [c0000000004c2ce8] .btrfs_reserve_extent+0xe8/0x250
    [c0000000f600b330] [c0000000004c2f90] .btrfs_alloc_tree_block+0x140/0x590
    [c0000000f600b440] [c0000000004a47b4] .__btrfs_cow_block+0x124/0x780
    [c0000000f600b530] [c0000000004a4fc0] .btrfs_cow_block+0xf0/0x250
    [c0000000f600b5e0] [c0000000004a917c] .btrfs_search_slot+0x22c/0xc00
    [c0000000f600b720] [c00000000050aa40] .btrfs_remove_chunk+0x1b0/0x9f0
    [c0000000f600b850] [c0000000004c4e04] .btrfs_delete_unused_bgs+0x434/0x570
    [c0000000f600b950] [c0000000004d3cb8] .close_ctree+0x2e8/0x3b0
    [c0000000f600ba20] [c00000000049d178] .btrfs_put_super+0x18/0x30
    [c0000000f600ba90] [c000000000243cd4] .generic_shutdown_super+0xa4/0x1a0
    [c0000000f600bb10] [c0000000002441d8] .kill_anon_super+0x18/0x30
    [c0000000f600bb90] [c00000000049c898] .btrfs_kill_super+0x18/0xc0
    [c0000000f600bc10] [c0000000002444f8] .deactivate_locked_super+0x98/0xe0
    [c0000000f600bc90] [c000000000269f94] .cleanup_mnt+0x54/0xa0
    [c0000000f600bd10] [c0000000000bd744] .task_work_run+0xc4/0x100
    [c0000000f600bdb0] [c000000000016334] .do_notify_resume+0x74/0x80
    [c0000000f600be30] [c0000000000098b8] .ret_from_except_lite+0x64/0x68
    Instruction dump:
    fba1ffe8 fbc1fff0 fbe1fff8 7c791b78 f8010010 f821ff21 e94d0290 81030040
    812a04e8 7d094a78 7d290034 5529d97e <0b090000> 3b400000 3be30050 3bc3004c
    
    The same call trace is also seen on x86_64, albeit very rarely, and only
    with nodesize set to 64k and the nospace_cache mount option in use.
    
    The call trace is caused by the following sequence:
    btrfs_remove_chunk
      check_system_chunk
        Allocate chunk if required
      For each physical stripe on underlying device,
        btrfs_free_dev_extent
          ...
          Take lock on Device tree's root node
          btrfs_cow_block("dev tree's root node");
            btrfs_reserve_extent
              find_free_extent
                index = BTRFS_RAID_DUP;
                have_caching_bg = false;

                When in LOOP_CACHING_NOWAIT state, assume we find a block group
                which is being cached; hence have_caching_bg is set to true.

                When repeating the search for the next RAID index, we set
                have_caching_bg to false.
    
    Hence, right after completing the LOOP_CACHING_NOWAIT state, we incorrectly
    skip the LOOP_CACHING_WAIT state and move to the LOOP_ALLOC_CHUNK state, where
    we allocate a chunk and try to add entries corresponding to the chunk's
    physical stripes into the device tree. In doing so, the task deadlocks against
    itself, waiting for the blocking lock on the device tree's root node, which
    it already holds.
    
    This commit fixes the issue by introducing a new local variable that tracks
    whether a block group of any RAID type is being cached.
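The skipped state and the fix described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the `caching` array, the helper names `after_nowait_buggy` and `after_nowait_fixed`, and the variable name `any_caching_bg` are all hypothetical, and the real `find_free_extent` loop does considerably more work per RAID index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Search passes named in the commit message (simplified; not the kernel enum). */
enum loop_state { LOOP_CACHING_NOWAIT, LOOP_CACHING_WAIT, LOOP_ALLOC_CHUNK };

/*
 * Hypothetical model: caching[i] is true when the NOWAIT pass over RAID
 * index i meets a block group whose free-space cache is still loading.
 */

/* Buggy shape: the flag is reset per index, so only the last index counts. */
static enum loop_state after_nowait_buggy(const bool *caching, size_t n_raid)
{
	bool have_caching_bg = false;

	assert(n_raid > 0);
	for (size_t index = 0; index < n_raid; index++) {
		have_caching_bg = false;	/* reset on every retry */
		if (caching[index])
			have_caching_bg = true;
	}
	return have_caching_bg ? LOOP_CACHING_WAIT : LOOP_ALLOC_CHUNK;
}

/* Fixed shape: a second flag remembers a caching group at any index. */
static enum loop_state after_nowait_fixed(const bool *caching, size_t n_raid)
{
	bool have_caching_bg;
	bool any_caching_bg = false;	/* hypothetical name for the new variable */

	assert(n_raid > 0);
	for (size_t index = 0; index < n_raid; index++) {
		have_caching_bg = caching[index];
		if (have_caching_bg)
			any_caching_bg = true;
	}
	return any_caching_bg ? LOOP_CACHING_WAIT : LOOP_ALLOC_CHUNK;
}
```

In this model, when a caching block group is seen at an early index but not at the last one, the buggy helper goes straight to LOOP_ALLOC_CHUNK, while the fixed helper selects LOOP_CACHING_WAIT first.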
    Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
    Reviewed-by: Josef Bacik <jbacik@fb.com>
    Signed-off-by: Chris Mason <clm@fb.com>
extent-tree.c 287 KB