    f2fs: reuse the locked dnode page and its inode · b292dcab
    Jaegeuk Kim authored
    This patch fixes the following deadlock during recovery.
    
    INFO: task mount:1322 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    mount           D ffffffff81125870     0  1322   1266 0x00000000
     ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
     ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
     ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
    Call Trace:
    [<ffffffff81125870>] ? __lock_page+0x70/0x70
    [<ffffffff816a92d9>] schedule+0x29/0x70
    [<ffffffff816a93af>] io_schedule+0x8f/0xd0
    [<ffffffff8112587e>] sleep_on_page+0xe/0x20
    [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
    [<ffffffff81125867>] __lock_page+0x67/0x70
    [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
    [<ffffffff81126857>] find_lock_page+0x67/0x80
    [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
    [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
    [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
    [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
    [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
    [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
    [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
    [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
    [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
    [<ffffffff81185a13>] mount_fs+0x43/0x1b0
    [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
    [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
    [<ffffffff811a2cb7>] do_mount+0x237/0xa10
    [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
    [<ffffffff811a3520>] SyS_mount+0x90/0xe0
    [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b
    
    The bug is triggered when check_index_in_prev_nodes calls get_node_page to
    get the direct node page.
    If that page was already locked by get_dnode_of_data on the calling path,
    get_node_page waits for a page lock that is never released, and the mount
    deadlocks.
    
    This patch adds a check before the get_node_page call so that a direct node
    page that is already locked is reused instead of being locked again.
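
The idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from recovery.c: `toy_page`, `toy_cache`, `toy_get_node_page`, `toy_put_page` and `toy_lookup_node` are hypothetical names, and a NULL return stands in for the point where the real get_node_page would sleep forever on the page lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a node page; NOT the kernel's struct page. */
struct toy_page {
    unsigned int nid;   /* node id held by this page */
    bool locked;        /* models the page lock */
};

#define TOY_NR_PAGES 8
static struct toy_page toy_cache[TOY_NR_PAGES];

/*
 * Hypothetical model of get_node_page(): returns the page locked.
 * Returns NULL where the real code would block on an already held lock.
 */
static struct toy_page *toy_get_node_page(unsigned int nid)
{
    struct toy_page *page = &toy_cache[nid % TOY_NR_PAGES];

    if (page->locked)
        return NULL;    /* the real code would deadlock here */
    page->nid = nid;
    page->locked = true;
    return page;
}

/* Release a page obtained from toy_get_node_page(). */
static void toy_put_page(struct toy_page *page)
{
    assert(page->locked);
    page->locked = false;
}

/*
 * Look up the page for nid. If the caller already holds that page locked,
 * hand it back instead of locking it a second time. *reused tells the
 * caller that it must not release the page on this path.
 */
static struct toy_page *toy_lookup_node(struct toy_page *held,
                                        unsigned int nid, bool *reused)
{
    if (held && held->locked && held->nid == nid) {
        *reused = true;
        return held;
    }
    *reused = false;
    return toy_get_node_page(nid);
}
```

In this model a second `toy_get_node_page(3)` fails while the first page for nid 3 is still held, whereas `toy_lookup_node(held, 3, &reused)` returns the held page with `reused` set. The `reused` flag reflects one possible design choice: the lookup path must know whether it owns the lock so it does not unlock a page its caller still needs.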

    Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
recovery.c 10.3 KB