    md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang · 7bcda714
    Guoqing Jiang authored
    When a node leaves the cluster, its bitmap needs to be synced by
    another node, so the "md*_recover" thread is triggered for that
    purpose. However, with the steps below, tasks can hang on either
    B or C.
    
    1. Node A creates a resyncing cluster raid1 and assembles it on the
       other two nodes (B and C).
    2. Stop the array on B and C.
    3. Stop the array on A.
    
    linux44:~ # ps aux|grep md|grep D
    root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
    root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]
    
    linux44:~ # cat /proc/5939/stack
    [<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
    [<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
    [<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
    [<ffffffff8107ad94>] kthread+0xb4/0xc0
    [<ffffffff8152a518>] ret_from_fork+0x58/0x90
    
    linux44:~ # cat /proc/5938/stack
    [<ffffffff8107afde>] kthread_stop+0x6e/0x120
    [<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
    [<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
    [<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
    [<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
    [<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
    [<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
    [<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
    [<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
    [<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
    [<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
    [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
    
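    As the stacks show, mdadm waits in kthread_stop() for md0_recover to
    exit, while md0_recover sleeps in dlm_lock_sync() waiting for the DLM
    lock, so neither makes progress. The problem is that recover_bitmaps
    cannot reliably abort when the thread is unregistered. So
    dlm_lock_sync_interruptible is introduced to detect when the thread
    is being stopped, which fixes the problem.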
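
    A minimal userspace sketch of the interruptible idea, under stated
    assumptions: pthreads stand in for the kernel primitives, a `stop`
    flag stands in for kthread_should_stop(), and pthread_join() stands
    in for kthread_stop() waiting on the thread. All names here
    (lock_res, lock_sync_interruptible(), run_waiter()) are hypothetical;
    this is not the actual dlm_lock_sync_interruptible. The wait ends on
    either the grant or a stop request; on a stop it withdraws the
    pending request (in the kernel, presumably a DLM cancel such as
    dlm_unlock() with DLM_LKF_CANCEL) and returns -EPERM in this sketch,
    so the caller can bail out.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a lock resource; not struct dlm_lock_resource. */
struct lock_res {
	pthread_mutex_t mtx;
	pthread_cond_t cond;
	bool granted;   /* set by the completion callback (models sync_ast) */
	bool stop;      /* set by the stopper (models kthread_should_stop()) */
	bool cancelled; /* records that the pending request was withdrawn */
};

static void res_init(struct lock_res *r)
{
	pthread_mutex_init(&r->mtx, NULL);
	pthread_cond_init(&r->cond, NULL);
	r->granted = r->stop = r->cancelled = false;
}

static void res_destroy(struct lock_res *r)
{
	pthread_cond_destroy(&r->cond);
	pthread_mutex_destroy(&r->mtx);
}

/* Set one flag under the mutex and wake the waiter. */
static void set_and_wake(struct lock_res *r, bool *flag)
{
	pthread_mutex_lock(&r->mtx);
	*flag = true;
	pthread_cond_broadcast(&r->cond);
	pthread_mutex_unlock(&r->mtx);
}

/* Returns 0 once granted, or -EPERM if a stop was requested first. */
static int lock_sync_interruptible(struct lock_res *r)
{
	int ret = 0;

	pthread_mutex_lock(&r->mtx);
	while (!r->granted && !r->stop)
		pthread_cond_wait(&r->cond, &r->mtx);
	if (!r->granted) {
		/* Stand-in for cancelling the still-queued DLM request. */
		r->cancelled = true;
		ret = -EPERM;
	}
	pthread_mutex_unlock(&r->mtx);
	return ret;
}

struct waiter {
	struct lock_res res;
	int result;
};

/* Thread body: plays the role of the md*_recover thread. */
static void *waiter_fn(void *arg)
{
	struct waiter *w = arg;

	w->result = lock_sync_interruptible(&w->res);
	return NULL;
}

/* Start a waiter, then either stop it or grant the lock, and join it. */
static int run_waiter(bool stop_instead_of_grant)
{
	struct waiter w;
	pthread_t t;

	res_init(&w.res);
	w.result = 1; /* sentinel, overwritten by the waiter */
	if (pthread_create(&t, NULL, waiter_fn, &w) != 0) {
		res_destroy(&w.res);
		return -EAGAIN;
	}
	set_and_wake(&w.res, stop_instead_of_grant ? &w.res.stop
						   : &w.res.granted);
	pthread_join(t, NULL); /* like kthread_stop(): waits for exit */
	res_destroy(&w.res);
	return w.result;
}
```

    run_waiter(true) returns -EPERM and run_waiter(false) returns 0.
    With a grant-only wait instead, run_waiter(true) would block in
    pthread_join() indefinitely, mirroring the mdadm -S hang in the
    stacks above.
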

    Reviewed-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
    Signed-off-by: Shaohua Li <shli@fb.com>