• Guoqing Jiang's avatar
    md: fix raid10 hang issue caused by barrier · e820d55c
    Guoqing Jiang authored
    When both regular IO and resync IO happen at the same time,
    and if we also need to split regular. Then we can see tasks
    hang due to barrier.
    
    1. resync thread
    [ 1463.757205] INFO: task md1_resync:5215 blocked for more than 480 seconds.
    [ 1463.757207]       Not tainted 4.19.5-1-default #1
    [ 1463.757209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1463.757212] md1_resync      D    0  5215      2 0x80000000
    [ 1463.757216] Call Trace:
    [ 1463.757223]  ? __schedule+0x29a/0x880
    [ 1463.757231]  ? raise_barrier+0x8d/0x140 [raid10]
    [ 1463.757236]  schedule+0x78/0x110
    [ 1463.757243]  raise_barrier+0x8d/0x140 [raid10]
    [ 1463.757248]  ? wait_woken+0x80/0x80
    [ 1463.757257]  raid10_sync_request+0x1f6/0x1e30 [raid10]
    [ 1463.757265]  ? _raw_spin_unlock_irq+0x22/0x40
    [ 1463.757284]  ? is_mddev_idle+0x125/0x137 [md_mod]
    [ 1463.757302]  md_do_sync.cold.78+0x404/0x969 [md_mod]
    [ 1463.757311]  ? wait_woken+0x80/0x80
    [ 1463.757336]  ? md_rdev_init+0xb0/0xb0 [md_mod]
    [ 1463.757351]  md_thread+0xe9/0x140 [md_mod]
    [ 1463.757358]  ? _raw_spin_unlock_irqrestore+0x2e/0x60
    [ 1463.757364]  ? __kthread_parkme+0x4c/0x70
    [ 1463.757369]  kthread+0x112/0x130
    [ 1463.757374]  ? kthread_create_worker_on_cpu+0x40/0x40
    [ 1463.757380]  ret_from_fork+0x3a/0x50
    
    2. regular IO
    [ 1463.760679] INFO: task kworker/0:8:5367 blocked for more than 480 seconds.
    [ 1463.760683]       Not tainted 4.19.5-1-default #1
    [ 1463.760684] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1463.760687] kworker/0:8     D    0  5367      2 0x80000000
    [ 1463.760718] Workqueue: md submit_flushes [md_mod]
    [ 1463.760721] Call Trace:
    [ 1463.760731]  ? __schedule+0x29a/0x880
    [ 1463.760741]  ? wait_barrier+0xdd/0x170 [raid10]
    [ 1463.760746]  schedule+0x78/0x110
    [ 1463.760753]  wait_barrier+0xdd/0x170 [raid10]
    [ 1463.760761]  ? wait_woken+0x80/0x80
    [ 1463.760768]  raid10_write_request+0xf2/0x900 [raid10]
    [ 1463.760774]  ? wait_woken+0x80/0x80
    [ 1463.760778]  ? mempool_alloc+0x55/0x160
    [ 1463.760795]  ? md_write_start+0xa9/0x270 [md_mod]
    [ 1463.760801]  ? try_to_wake_up+0x44/0x470
    [ 1463.760810]  raid10_make_request+0xc1/0x120 [raid10]
    [ 1463.760816]  ? wait_woken+0x80/0x80
    [ 1463.760831]  md_handle_request+0x121/0x190 [md_mod]
    [ 1463.760851]  md_make_request+0x78/0x190 [md_mod]
    [ 1463.760860]  generic_make_request+0x1c6/0x470
    [ 1463.760870]  raid10_write_request+0x77a/0x900 [raid10]
    [ 1463.760875]  ? wait_woken+0x80/0x80
    [ 1463.760879]  ? mempool_alloc+0x55/0x160
    [ 1463.760895]  ? md_write_start+0xa9/0x270 [md_mod]
    [ 1463.760904]  raid10_make_request+0xc1/0x120 [raid10]
    [ 1463.760910]  ? wait_woken+0x80/0x80
    [ 1463.760926]  md_handle_request+0x121/0x190 [md_mod]
    [ 1463.760931]  ? _raw_spin_unlock_irq+0x22/0x40
    [ 1463.760936]  ? finish_task_switch+0x74/0x260
    [ 1463.760954]  submit_flushes+0x21/0x40 [md_mod]
    
    So resync io is waiting for regular write io to complete to
    decrease nr_pending (conf->barrier++ is called before waiting).
    The regular write io splits another bio after call wait_barrier
    which call nr_pending++, then the splitted bio would continue
    with raid10_write_request -> wait_barrier, so the splitted bio
    has to wait for barrier to be zero, then deadlock happens as
    follows.
    
    	resync io		regular io
    
    	raise_barrier
    				wait_barrier
    				generic_make_request
    				wait_barrier
    
    To resolve the issue, we need to call allow_barrier to decrease
    nr_pending before generic_make_request since regular IO is not
    issued to underlying devices, and wait_barrier is called again
    to ensure no internal IO happening.
    
    Fixes: fc9977dd ("md/raid10: simplify the splitting of requests.")
    Reported-and-tested-by: default avatarSiniša Bandin <sinisa@4net.rs>
    Signed-off-by: default avatarGuoqing Jiang <gqjiang@suse.com>
    Signed-off-by: default avatarShaohua Li <shli@fb.com>
    e820d55c
raid10.c 137 KB