drbd: fix potential protocol error and resulting disconnect/reconnect

When we notice a disk failure on the receiving side, we stop sending it new incoming writes. Depending on exact timing of various events, the same transfer log epoch could end up containing both replicated (before we noticed the failure) and local-only requests (after we noticed the failure). The sanity checks in tl_release(), called when receiving a P_BARRIER_ACK, check that the ack'ed transfer log epoch matches the expected epoch, and the number of contained writes matches the number of ack'ed writes. In this case, they counted both replicated and local-only writes, but the peer only acknowledges those it has seen. We get a mismatch, resulting in a protocol error and disconnect/reconnect cycle. Messages logged are "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n" A similar issue can also be triggered when starting a resync while having a healthy replication link, by invalidating one side, forcing a full sync, or attaching to a diskless node. Fix this by closing the current epoch if the state changes in a way that would cause the replication intent of the next write. Epochs now contain either only non-replicated, or only replicated writes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential protocol error and resulting disconnect/reconnect
When we notice a disk failure on the receiving side, we stop sending it new incoming writes. Depending on exact timing of various events, the same transfer log epoch could end up containing both replicated (before we noticed the failure) and local-only requests (after we noticed the failure). The sanity checks in tl_release(), called when receiving a P_BARRIER_ACK, check that the ack'ed transfer log epoch matches the expected epoch, and the number of contained writes matches the number of ack'ed writes. In this case, they counted both replicated and local-only writes, but the peer only acknowledges those it has seen. We get a mismatch, resulting in a protocol error and disconnect/reconnect cycle. Messages logged are "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n" A similar issue can also be triggered when starting a resync while having a healthy replication link, by invalidating one side, forcing a full sync, or attaching to a diskless node. Fix this by closing the current epoch if the state changes in a way that would cause the replication intent of the next write. Epochs now contain either only non-replicated, or only replicated writes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2681f7f6 · Lars Ellenberg · Philipp Reisner · d2ec180c · 2681f7f6 · 2681f7f6
Commit 2681f7f6 authored Jan 21, 2013 by Lars Ellenberg Committed by Philipp Reisner Jan 21, 2013
Showing with 9 additions and 1 deletion

drivers/block/drbd/drbd_req.c drivers/block/drbd/drbd_req.c +1 -1

drivers/block/drbd/drbd_req.h drivers/block/drbd/drbd_req.h +1 -0

drivers/block/drbd/drbd_state.c drivers/block/drbd/drbd_state.c +7 -0

No files found.
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -168,7 +168,7 @@ static void wake_all_senders(struct drbd_tconn *tconn) {
 }

 /* must hold resource->req_lock */
-static void start_new_tl_epoch(struct drbd_tconn *tconn)
+void start_new_tl_epoch(struct drbd_tconn *tconn)
 {
 	/* no point closing an epoch, if it is empty, anyways. */
 	if (tconn->current_tle_writes == 0)

--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -267,6 +267,7 @@ struct bio_and_error {
 	int error;
 };

+extern void start_new_tl_epoch(struct drbd_tconn *tconn);
 extern void drbd_req_destroy(struct kref *kref);
 extern void _req_may_be_done(struct drbd_request *req,
 		struct bio_and_error *m);

--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -931,6 +931,7 @@ __drbd_set_state(struct drbd_conf *mdev, union drbd_state ns,
 	enum drbd_state_rv rv = SS_SUCCESS;
 	enum sanitize_state_warnings ssw;
 	struct after_state_chg_work *ascw;
+	bool did_remote, should_do_remote;

 	os = drbd_read_state(mdev);

@@ -981,11 +982,17 @@ __drbd_set_state(struct drbd_conf *mdev, union drbd_state ns,
 	    (os.disk != D_DISKLESS && ns.disk == D_DISKLESS))
 		atomic_inc(&mdev->local_cnt);

+	did_remote = drbd_should_do_remote(mdev->state);
 	mdev->state.i = ns.i;
+	should_do_remote = drbd_should_do_remote(mdev->state);
 	mdev->tconn->susp = ns.susp;
 	mdev->tconn->susp_nod = ns.susp_nod;
 	mdev->tconn->susp_fen = ns.susp_fen;

+	/* put replicated vs not-replicated requests in seperate epochs */
+	if (did_remote != should_do_remote)
+		start_new_tl_epoch(mdev->tconn);
+
 	if (os.disk == D_ATTACHING && ns.disk >= D_NEGOTIATING)
 		drbd_print_uuids(mdev, "attached to UUIDs");