drbd: drbdsetup detach of an unresponsive local disk should not block IO "forever"

When detaching, we make sure no application IO is in-flight by internally suspending IO, then trigger the state change, wait for the result, and finally internally resume IO again.

Once we triggered the state change to "Failed", we expect it to change from Failed to Diskless. (To avoid races, we actually wait for it to leave "Failed").

On an unresponsive local IO backend, this may not happen, ever.

Don't have a "hung" detach block IO "forever", but resume IO before waiting for the state change to Diskless. We may well be able to continue IO to and from a healthy peer.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

Commit 05a72772, authored Jan 26, 2015 by Lars Ellenberg, committed by Jens Axboe Nov 25, 2015

Showing 1 changed file with 1 addition and 1 deletion

drivers/block/drbd/drbd_nl.c +1 -1

--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1929,9 +1929,9 @@ static int adm_detach(struct drbd_device *device, int force)
 	retcode = drbd_request_state(device, NS(disk, D_FAILED));
 	drbd_md_put_buffer(device);
 	/* D_FAILED will transition to DISKLESS. */
+	drbd_resume_io(device);
 	ret = wait_event_interruptible(device->misc_wait,
 			device->state.disk != D_FAILED);
-	drbd_resume_io(device);
 	if ((int)retcode == (int)SS_IS_DISKLESS)
 		retcode = SS_NOTHING_TO_DO;
 	if (ret)
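
The effect of the reordering described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not DRBD code: every `toy_*` name is hypothetical, the suspend counter merely stands in for DRBD's internal IO suspension, and the wait is bounded to a few polls so the example terminates (the kernel's `wait_event_interruptible()` would sleep until the condition holds or a signal arrives).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in DRBD. */
enum toy_disk_state { TOY_ATTACHED, TOY_FAILED, TOY_DISKLESS };

struct toy_device {
	enum toy_disk_state disk;
	int suspend_cnt;          /* > 0: application IO is held back */
	bool backend_responsive;  /* false models a hung local disk */
};

static void toy_suspend_io(struct toy_device *d)
{
	d->suspend_cnt++;
}

static void toy_resume_io(struct toy_device *d)
{
	assert(d->suspend_cnt > 0);
	d->suspend_cnt--;
}

/*
 * Bounded stand-in for waiting until the disk leaves "Failed".
 * Records whether application IO was still suspended when the wait began.
 * Returns 0 once the state has left TOY_FAILED, -1 if the polls run out
 * (assumption: this models an interrupted wait on a hung backend).
 */
static int toy_wait_leave_failed(struct toy_device *d, int max_polls,
				 bool *io_blocked_while_waiting)
{
	*io_blocked_while_waiting = d->suspend_cnt > 0;
	for (int i = 0; i < max_polls; i++) {
		if (d->backend_responsive)
			d->disk = TOY_DISKLESS; /* Failed -> Diskless completes */
		if (d->disk != TOY_FAILED)
			return 0;
	}
	return -1;
}

/*
 * resume_before_wait == false: ordering before the patch.
 * resume_before_wait == true:  ordering after the patch.
 */
static int toy_detach(struct toy_device *d, bool resume_before_wait,
		      bool *io_blocked_while_waiting)
{
	int ret;

	toy_suspend_io(d);
	d->disk = TOY_FAILED;           /* request the state change */
	if (resume_before_wait)
		toy_resume_io(d);
	ret = toy_wait_leave_failed(d, 3, io_blocked_while_waiting);
	if (!resume_before_wait)
		toy_resume_io(d);
	return ret;
}
```

Calling `toy_detach()` on a device with `backend_responsive == false` shows the difference: with the old ordering the wait runs while IO is still suspended, whereas with the new ordering IO is already resumed when the wait starts, so requests that a healthy peer could serve are not held up by the hung detach.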