Commit c91fb14e authored by Kirill Smelkov, committed by Levin Zimmermann

wcfs: tests: Extend faulty protection tests with more kinds of faulty clients

So far we were testing only against a faulty client that reads the pin
notification OK but does not reply to it. But there could
be more problems:

1) a client does not read pin notification at all
2) a client closes watchlink abruptly after reading pin notification
3) a client replies to pin notification but the reply is not "ack"

The first problem, if not handled, leads to a whole set of clients
becoming stuck on reading the same block as the faulty client. The other
problems also indicate breakage of the isolation protocol on the client
side, meaning that wcfs can no longer be sure that it provides good,
uncorrupted data to the client.

In the first case, similarly to the "no reply" situation, we need to kill the
client to make progress while still maintaining safety. In cases 2
and 3 we cannot maintain safety if the faulty client remains in the set
of live and served clients, so it is also logical to send SIGBUS/SIGKILL
to it.
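
The cases discussed above, together with the previously tested "no reply"
case, can be summarized as a small classification. This is only an
illustrative Python sketch: PinOutcome, classify, must_kill and their
parameters are hypothetical names and are not part of wcfs or its test
suite. The only protocol details taken from this commit are that the
expected reply is "ack" and that closing gently with "bye" does not get the
client killed.

```python
from enum import Enum
from typing import Optional


class PinOutcome(Enum):
    """Hypothetical summary of how a client reacted to a pin notification."""
    OK = "replied ack"
    GENTLE_CLOSE = "closed with bye"        # not a fault: client is not killed
    NO_READ = "did not read pin"            # case 1
    NO_REPLY = "read pin, never replied"    # case tested before this commit
    ABRUPT_CLOSE = "closed without bye"     # case 2
    BAD_REPLY = "reply is not ack"          # case 3


def classify(read_pin: bool, reply: Optional[bytes],
             closed: bool = False, said_bye: bool = False) -> PinOutcome:
    """Map observed client behaviour to one of the outcomes above."""
    if closed:
        return PinOutcome.GENTLE_CLOSE if said_bye else PinOutcome.ABRUPT_CLOSE
    if not read_pin:
        return PinOutcome.NO_READ
    if reply is None:
        return PinOutcome.NO_REPLY
    return PinOutcome.OK if reply == b"ack" else PinOutcome.BAD_REPLY


def must_kill(outcome: PinOutcome) -> bool:
    """Every outcome except a proper ack or a gentle close counts as faulty."""
    return outcome not in (PinOutcome.OK, PinOutcome.GENTLE_CLOSE)
```

For example, classify(True, b"nak") yields PinOutcome.BAD_REPLY, for which
must_kill returns True.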

Killing a client with SIGBUS is similar to how the OS kernel sends SIGBUS when
a memory-mapped file is accessed and loading file data results in EIO. It is
also similar to wendelin.core 1, where SIGBUS is raised if loading a file block
results in an error.
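
What delivering SIGBUS looks like from outside the client process can be
sketched in Python as below. This is not how wcfs implements it: kill_client
and its grace parameter are hypothetical, and escalating to SIGKILL only after
a grace period is an assumption about the ordering implied by "SIGBUS/SIGKILL".
The sketch is POSIX-only.

```python
import signal
import subprocess


def kill_client(proc: subprocess.Popen, grace: float = 1.0) -> int:
    """Send SIGBUS to proc; if it is still alive after `grace` seconds,
    fall back to SIGKILL. Returns the process return code."""
    proc.send_signal(signal.SIGBUS)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # The client ignored or handled SIGBUS: SIGKILL cannot be caught.
        proc.kill()
        proc.wait()
    return proc.returncode
```

On POSIX, a negative return code -N from subprocess means the child was
terminated by signal N, so a client that does not handle SIGBUS ends with
return code -signal.SIGBUS.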

Extend the tests to cover all the scenarios explained above.

/reviewed-by @levin.zimmermann
/reviewed-on !18
parent 0c35ae45
@@ -1459,7 +1459,8 @@ def test_wcfs_watch_robust():
         "file not yet known to wcfs or is not a ZBigFile"
     wl.close()
-    # closeTX/bye cancels blocked pin handlers
+    # closeTX gently with "bye" cancels blocked pin handlers without killing client
+    # (closing abruptly is verified in wcfs_faultyprot_test.py)
     f = t.open(zf)
     f.assertBlk(2, 'c2')
     f.assertCache([0,0,1])
@@ -1467,23 +1468,15 @@ def test_wcfs_watch_robust():
     wl = t.openwatch()
     wg = sync.WorkGroup(timeout())
     def _(ctx):
-        # TODO clarify what wcfs should do if pin handler closes wlink TX:
-        # - reply error + close, or
-        # - just close
-        # t = when reviewing WatchLink.serve in wcfs.go
-        #assert wl.sendReq(ctx, b"watch %s @%s" % (h(zf._p_oid), h(at1))) == \
-        # "error setup watch f<%s> @%s: " % (h(zf._p_oid), h(at1)) + \
-        # "pin #%d @%s: context canceled" % (2, h(at1))
-        #with raises(error, match="unexpected EOF"):
         with raises(error, match="recvReply: link is down"):
             wl.sendReq(ctx, b"watch %s @%s" % (h(zf._p_oid), h(at1)))
     wg.go(_)
     def _(ctx):
         req = wl.recvReq(ctx)
         assert req is not None
         assert req.msg == b"pin %s #%d @%s" % (h(zf._p_oid), 2, h(at1))
         # don't reply to req - close instead
+        # NOTE this closes watchlink gently with first sending "bye" message
         wl.closeWrite()
     wg.go(_)
     wg.wait()
...