1. 25 Sep, 2024 5 commits
    • Kirill Smelkov's avatar
      wcfs: zdata: Fix loading ZBlk data saved on py3 · bf816c40
      Kirill Smelkov authored
      Previously TestZBlk was xfailing as e.g.
      
              --- FAIL: TestZBlk/py3_pickle3 (0.00s)
          panic: ZBlk0(0000000000000002): loadBlkData: wendelin.bigfile.file_zodb.ZBlk0(0000000000000002): activate: pysetstate: expect str; got ogórek.Bytes [recovered]
                  panic: ZBlk0(0000000000000002): loadBlkData: wendelin.bigfile.file_zodb.ZBlk0(0000000000000002): activate: pysetstate: expect str; got ogórek.Bytes
      
      because on py3 ZBlk data is saved as bytes while zblk.go code was trying
      to decode it as bytestring (str from py2).
      
      -> Fix that by adjusting ZBlk/go to follow py3 model with ZBlk data being
         considered to be bytes but also accepting bytestr from py2 for that.
      
      Now on go side loading ZBlk works for all py2_pickle{1,2,3} and
      py3_pickle3 data.
      bf816c40
    • Kirill Smelkov's avatar
      wcfs: v↑ neo/go · f237dc35
      Kirill Smelkov authored
      This brings ZODB/go updates to support Python3 from
      neo!8 that we will need to
      support Python3(*). For now do not add any py3 support on wcfs side - just
      update neo/go with preserving wcfs/py2 in working state.
      
      Note that this ZODB/go update brings backward incompatible changes to
      activate StrictUnicode and PyDict modes(+) on unpickling and also changes
      API of zodb.Map to follow that PyDict mode. We thus need to update
      go bits inside wcfs because else compilation and tests, expectedly,
      start to break in the following ways
      
          # lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/xbtree/xbtreetest
          internal/xbtree/xbtreetest/treeenv.go:292:24: zroot.Data undefined (type *zodb.Map has no field or method Data)
      
          # lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata [lab.nexedi.com/nexedi/wendelin.core/wcfs/internal/zdata.test]
          ./δftail_test.go:737:16: zroot.Data undefined (type *zodb.Map has no field or method Data)
      
          --- FAIL: TestZBlk (0.00s)
              --- FAIL: TestZBlk/py2_pickle1 (0.00s)
          panic: ZBlk0(0000000000000002): loadBlkData: wendelin.bigfile.file_zodb.ZBlk0(0000000000000002): activate: pysetstate: expect str; got ogórek.ByteString [recovered]
                  panic: ZBlk0(0000000000000002): loadBlkData: wendelin.bigfile.file_zodb.ZBlk0(0000000000000002): activate: pysetstate: expect str; got ogórek.ByteString
      
          --- FAIL: TestΔFtail (0.98s)
          panic: root['treegen/values']: key ["a"]: expected str, got ogórek.ByteString [recovered]
                  panic: file:///tmp/TestΔFtail4275282346/001/1.fs: @03fb8a25642d1e88: get blktab: root['treegen/values']: key ["a"]: expected str, got ogórek.ByteString [recovered]
                  panic: file:///tmp/TestΔFtail4275282346/001/1.fs: @03fb8a25642d1e88: get blktab: root['treegen/values']: key ["a"]: expected str, got ogórek.ByteString
      
          --- FAIL: TestΔBTail (1.13s)
          panic: root['treegen/values']: key ["a"]: expected str, got ogórek.ByteString [recovered]
                  panic: file:///tmp/TestΔBTail3238267558/001/1.fs: @03fb8a212cd06933: get blktab: root['treegen/values']: key ["a"]: expected str, got ogórek.ByteString [recovered]
                  panic: file:///tmp/TestΔBTail3238267558/001/1.fs: @03fb8a212cd06933: get blktab: root['treegen/values']: key ["a"]: expected str, got ogórek.ByteString
      
      The changes are minimal just to keep wcfs/py2 working ok. We will do more changes in the follow-up patches to support py3.
      
      All the changes in go.mod and go.sum are produced automatically by sole `go get lab.nexedi.com/kirr/neo/go@t+ypy3`.
      
      (*) see neo@efde5253...6235fb60 for full list of changes.
      (+) see neo@078b6262, neo@1e87fe5b,
          https://github.com/kisielk/og-rek/pull/68 and https://github.com/kisielk/og-rek/pull/75 for details.
      f237dc35
    • Kirill Smelkov's avatar
      wcfs: zdata: Replace pycompat.Int64 with ogórek.AsInt64 · d8249b62
      Kirill Smelkov authored
      Our "as int64" code was put into ogórek as generalized utility in
      https://github.com/kisielk/og-rek/commit/010fbd2e.
      d8249b62
    • Kirill Smelkov's avatar
      wcfs: v↑ ogórek v1.2.0 -> current v1.3 pre-release · 727ca0d6
      Kirill Smelkov authored
      Going from v1.2.0 to e691997e3596 brings in StrictUnicode + PyDict modes and other
      improvements: https://github.com/kisielk/og-rek/compare/v1.2.0...e691997e3596. We will
      need those improvements in the next patches.
      727ca0d6
    • Kirill Smelkov's avatar
      wcfs: zdata: Test ZBlk loading on all py2/py3 ZODB kinds of data we care about · 76f7e270
      Kirill Smelkov authored
      Previously we were testing ZBlk loading on Go side only with data generated by
      python2 and pickle protocol=2. However even on py2 there are more pickle
      protocols that are in use, and also there is python3.
      
      -> Modernize testdata/zblk_test_gen.py to use run_with_all_zodb_pickle_kinds
         that was recently added as part of nexedi/zodbtools@f9d36ba7
         and generate test data with both python2 and python3. It is handy to
         use py2py3-venv(*) to prepare python environment to do that.
      
         Adjust tests on Go side to verify how ZBlk is loaded for all generated zkinds.
      
      py2_pickle1, py2_pickle2 and py2_pickle3 are handled well.
      ZBlk test for py3_pickle3 currently fails with
      
              --- FAIL: TestZBlk/py3_pickle3 (0.01s)
          panic: ZBlk0(0000000000000002): loadBlkData: wendelin.bigfile.file_zodb.ZBlk0(0000000000000002): activate: pysetstate: expect str; got ogórek.Bytes [recovered]
      
      and so is marked with "xfail".
      
      We will fix tests for py3_pickle3 in follow-up patches.
      
      (*) see nexedi/zodbtools@fac2f190
      76f7e270
  2. 19 Sep, 2024 2 commits
  3. 18 Sep, 2024 1 commit
    • Kirill Smelkov's avatar
      wcfs: Clarify error context when WatchLink.sendReq is waiting for reply · 39d53cbb
      Kirill Smelkov authored
      sendReq has two phases: a) send request, and b) read reply. When there
      is an error on the first phase, e.g. client does not read what wcfs is
      trying to send, it returns an error like
      
          pin #2 @03fb63abd6d65b33: sendReq: send .2: context deadline exceeded
      
      however when there is an error on the second phase, e.g. client does not
      reply to wcfs request, it currently returns an error like
      
          pin #2 @03fb63abd6d65b33: sendReq: context deadline exceeded
      
      which is not clear to interpret about which part was problematic.
      
      After this patch the error for the second case becomes
      
          pin #2 @03fb63abd6d65b33: sendReq: waiting for reply: context deadline exceeded
      
      which is easier to interpret.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!31
      39d53cbb
  4. 17 Sep, 2024 28 commits
    • Levin Zimmermann's avatar
      wcfs: v↑ go dependencies · 764d4da8
      Levin Zimmermann authored
      This patch updates:
      
      - github.com/golang/glog: we already wanted to do so in
          nexedi/wendelin.core!23,
          but we deferred it to keep go 1.18 support. However in recent patches
          we already dropped go 1.18 support and we can therefore update glog now.
      - lab.nexedi.com/kirr/neo/go: add fix in handshake, see here for more information:
          neo@d75f4ac2 and
          neo@03db1d8a
      
      This patch doesn't update:
      
      - github.com/hanwen/go-fuse: This was updated upstream and Kirill already
          reviewed and integrated patches in custom branch. However when updating
          go-fuse to v2.4.3-0.20240904154523-9546fc238dc6 (this is
          go-fuse@9546fc23),
          WCFS tests fail on my machine [1] => let's defer update
      - github.com/kisielk/og-rek: there are new patches that will be needed
          in the future, but we didn't update NEO/go og-rek dependency yet,
          so let's defer the update in wendelin.core until we updated og-rek
          in NEO/go
      - github.com/johncgriffin/overflow: no update on upstream
      - github.com/pkg/errors: no update on upstream
      - github.com/stretchr/testify: This was already updated with
          nexedi/wendelin.core@c559ec1a
          'testify' seems to have a major release in the future which may break
          some of our test code, but for now major version 1 is still the
          stable release.
      
      ----
      kirr: I confirm that
      go-fuse@9546fc23 brings in
      regression to WCFS tests. It seems I missed some error in that go-fuse
      update and it will need to be bisected and debugged.
      
      ---
      
      [1] Test failure log:
      
      ========================================== FAILURES ==========================================
      ______________________________________ test_wcfs_basic _______________________________________
      
          @func
          def test_wcfs_basic():
              t = tDB(); zf = t.zfile
              defer(t.close)
      
              # >>> lookup non-BigFile -> must be rejected
              with raises(OSError) as exc:
                  t.wc._stat("head/bigfile/%s" % h(t.nonzfile._p_oid))
              assert exc.value.errno == EINVAL
      
              # >>> file initially empty
              f = t.open(zf)
              f.assertCache([])
              f.assertData ([], mtime=t.at0)
      
              # >>> (@at1) commit data -> we can see it on wcfs
              at1 = t.commit(zf, {2:'c1'})
      
              f.assertCache([0,0,0])  # initially not cached
              f.assertData (['','','c1'], mtime=t.head)
      
              # >>> (@at2) commit again -> we can see both latest and snapshotted states
              # NOTE blocks e(4) and f(5) will be accessed only in the end
              at2 = t.commit(zf, {2:'c2', 3:'d2', 5:'f2'})
      
              # f @head
      >       f.assertCache([1,1,0,0,0,0])
      
      wcfs/wcfs_test.py:1341:
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      
      t = <wcfs.wcfs_test.tFile instance at 0x7ff61457b960>, incorev = [1, 1, 0, 0, 0, 0]
      
          def assertCache(t, incorev):
      >       assert t.cached() == incorev
      E       assert [0, 0, 0, 0, 0, 0] == [1, 1, 0, 0, 0, 0]
      E         At index 0 diff: 0 != 1
      E         Use -v to get the full diff
      
      wcfs/wcfs_test.py:791: AssertionError
      ------------------------------------ Captured stdout call ------------------------------------
      
      M: commit -> @at0 (03fb5dfbe3c1cd55)
      
      M: commit -> @at1 (03fb5dfbe4936a66)
      M:      f<0000000000000002>     [2]
      
      M: commit -> @at2 (03fb5dfbe4d01166)
      M:      f<0000000000000002>     [2, 3, 5]
      >>> Change history by file:
      
      f<0000000000000002>:
                                      0 1 2 3 4 5 6 7
                                      a b c d e f g h
              @at0 (03fb5dfbe3c1cd55)
              @at1 (03fb5dfbe4936a66)     2
              @at2 (03fb5dfbe4d01166)     2 3   5
      
      ------------------------------------ Captured stderr call ------------------------------------
      I0917 12:43:53.392222  124283 wcfs.go:2752] start "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
      I0917 12:43:53.392282  124283 wcfs.go:2758] (built with go1.21.13)
      W0917 12:43:53.392404  124283 storage.go:232] zodb: FIXME: open file:///tmp/testdb_fs.z5ZoMH/1.fs: raw cache is not ready for invalidations -> NoCache forced
      W0917 12:43:53.567807  124283 wcfs.go:2331] /head/bigfile: lookup "0000000000000001": bigfopen 0000000000000001 @03fb5dfbe3c1cd55: invalid argument: ZODB.Broken("persistent.Persistent") is not a ZBigFile
      I0917 12:43:53.710208  124283 wcfs.go:2933] stop "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
      ------------------------------------- Captured log call --------------------------------------
      WARNING  ZODB.FileStorage:FileStorage.py:412 Ignoring index for /tmp/testdb_fs.z5ZoMH/1.fs
      _________________________________ test_wcfs_watch_vs_access __________________________________
      
          @func
          def test_wcfs_watch_vs_access():
              t = tDB(); zf = t.zfile; at0=t.at0
              defer(t.close)
      
              f = t.open(zf)
              at1 = t.commit(zf, {2:'c1'})
              at2 = t.commit(zf, {2:'c2', 3:'d2', 5:'f2'})
              at3 = t.commit(zf, {0:'a3', 2:'c3', 5:'f3'})
      
              f.assertData(['a3','','c3','d2','x','x'])
              f.assertCache([1,1,1,1,0,0])
      
              # watched + commit -> read -> receive pin messages.
              # read vs pin ordering is checked by assertBlk.
              #
              # f(5) is kept not accessed to check later how wcfs.go handles δFtail
              # rebuild after it sees not yet accessed ZBlk that has change history.
              wl3  = t.openwatch();  w3 = wl3.watch(zf, at3);  assert at3 == t.head
              assert w3.at     == at3
              assert w3.pinned == {}
      
              wl3_ = t.openwatch();  w3_ = wl3_.watch(zf, at3)
              assert w3_.at     == at3
              assert w3_.pinned == {}
      
              wl2  = t.openwatch();  w2 = wl2.watch(zf, at2)
              assert w2.at     == at2
              assert w2.pinned == {0:at0, 2:at2}
      
              # w_assertPin asserts on state of .pinned for {w3,w3_,w2}
              def w_assertPin(pinw3, pinw3_, pinw2):
                  assert w3.pinned   == pinw3
                  assert w3_.pinned  == pinw3_
                  assert w2.pinned   == pinw2
      
              f.assertCache([1,1,1,1,0,0])
              at4 = t.commit(zf, {1:'b4', 2:'c4', 5:'f4', 6:'g4'})
      >       f.assertCache([1,0,0,1,0,0,0])
      
      wcfs/wcfs_test.py:1702:
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
      
      t = <wcfs.wcfs_test.tFile instance at 0x7ff614512050>, incorev = [1, 0, 0, 1, 0, 0, ...]
      
          def assertCache(t, incorev):
      >       assert t.cached() == incorev
      E       assert [0, 0, 0, 0, 0, 0, ...] == [1, 0, 0, 1, 0, 0, ...]
      E         At index 0 diff: 0 != 1
      E         Use -v to get the full diff
      
      wcfs/wcfs_test.py:791: AssertionError
      ------------------------------------ Captured stdout call ------------------------------------
      
      M: commit -> @at0 (03fb5dfc0fd82300)
      
      M: commit -> @at1 (03fb5dfc10b92ecc)
      M:      f<0000000000000049>     [2]
      
      M: commit -> @at2 (03fb5dfc10cee9dd)
      M:      f<0000000000000049>     [2, 3, 5]
      
      M: commit -> @at3 (03fb5dfc1100c999)
      M:      f<0000000000000049>     [0, 2, 5]
      
      C: setup watch f<0000000000000049> @at3 (03fb5dfc1100c999)
      
      C: setup watch f<0000000000000049> @at3 (03fb5dfc1100c999)
      
      C: setup watch f<0000000000000049> @at2 (03fb5dfc10cee9dd)
      
      M: commit -> @at4 (03fb5dfc120ed611)
      M:      f<0000000000000049>     [1, 2, 5, 6]
      >>> Change history by file:
      
      f<0000000000000049>:
                                      0 1 2 3 4 5 6 7
                                      a b c d e f g h
              @at0 (03fb5dfc0fd82300)
              @at1 (03fb5dfc10b92ecc)     2
              @at2 (03fb5dfc10cee9dd)     2 3   5
              @at3 (03fb5dfc1100c999) 0   2     5
              @at4 (03fb5dfc120ed611)   1 2     5 6
      
      ------------------------------------ Captured stderr call ------------------------------------
      I0917 12:44:03.733037  125217 wcfs.go:2752] start "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
      I0917 12:44:03.733126  125217 wcfs.go:2758] (built with go1.21.13)
      W0917 12:44:03.733418  125217 storage.go:232] zodb: FIXME: open file:///tmp/testdb_fs.z5ZoMH/1.fs: raw cache is not ready for invalidations -> NoCache forced
      I0917 12:44:04.475273  125217 wcfs.go:2933] stop "/dev/shm/wcfs/0ca22ca24e4cff2d01c10aa546fe5d5ac64bce72" "file:///tmp/testdb_fs.z5ZoMH/1.fs"
      ============================ 2 failed, 42 passed in 55.81 seconds ============================
      I0917 12:44:17.882140  125540 wcfs.go:2933] stop "/dev/shm/wcfs/c4d833a0bdea4c51decf5425b8ad2cc4d017280f" "file:///tmp/testdb_fs.bvHBy9/1.fs"
      make: *** [Makefile:174: test.wcfs] Error 1
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!30
      764d4da8
    • Kirill Smelkov's avatar
      wcfs: Implement protection against faulty client + related fixes and improvements · 89d653c0
      Kirill Smelkov authored
      The WCFS documentation specifies [1]:
      
      - - - 8> - - - 8> - - -
      
      If a client, on purpose or due to a bug or being stopped, is slow to respond
      with ack to file invalidation notification, it creates a problem because the
      server will become blocked waiting for pin acknowledgments, and thus all
      other clients, that try to work with the same file, will get stuck.
      
      [...]
      
      Lacking OS primitives to change address space of another process and not
      being able to work it around with ptrace in userspace, wcfs takes approach
      to kill a slow client on 30 seconds timeout by default.
      
      - - - <8 - - - <8 - - -
      
      But before, this protection wasn't implemented yet: one
      faulty client could therefore freeze the whole system. With this work
      this protection is implemented now: faulty clients are killed after the
      timeout or any other misbehaviour in their pin handlers.
      
      Working on this topic also resulted in several fixes and improvements
      around isolation protocol implementation on the server side.
      
      See individual patches for details.
      
      [1] https://lab.nexedi.com/nexedi/wendelin.core/blob/38dde766/wcfs/wcfs.go#L186-208Co-authored-by: Levin Zimmermann's avatarLevin Zimmermann <levin.zimmermann@nexedi.com>
      
      /reviewed-on nexedi/wendelin.core!18
      89d653c0
    • Levin Zimmermann's avatar
      wcfs: Require Go 1.19 + go mod tidy · 1fcef9c9
      Levin Zimmermann authored
      I would only suggest one very tiny change. In go.mod we have:
      
          module lab.nexedi.com/nexedi/wendelin.core/wcfs
      
          go 1.14
      
      I think this needs to be updated to go 1.19 due to atomic.Int64.
      
      And maybe we just need general go mod tidy update.
      
      /reviewed-by @kirr
      /reviewed-on nexedi/wendelin.core!18
      1fcef9c9
    • Kirill Smelkov's avatar
      wcfs: Implement protection against faulty client · c559ec1a
      Kirill Smelkov authored
      The WCFS documentation specifies [1]:
      
      - - - 8> - - - 8> - - -
      
      If a client, on purpose or due to a bug or being stopped, is slow to respond
      with ack to file invalidation notification, it creates a problem because the
      server will become blocked waiting for pin acknowledgments, and thus all
      other clients, that try to work with the same file, will get stuck.
      
      [...]
      
      Lacking OS primitives to change address space of another process and not
      being able to work it around with ptrace in userspace, wcfs takes approach
      to kill a slow client on 30 seconds timeout by default.
      
      - - - <8 - - - <8 - - -
      
      But before this patch, this protection wasn't implemented yet: one
      faulty client could therefore freeze the whole system. With this patch
      this protection is implemented now: faulty clients are killed after the
      timeout or any other misbehaviour in their pin handlers.
      
      [1] https://lab.nexedi.com/nexedi/wendelin.core/blob/38dde766/wcfs/wcfs.go#L186-208
      
      Preliminary history:
      
          levin.zimmermann/wendelin.core@24904e82
          levin.zimmermann/wendelin.core@b02dcadcCo-authored-by: Levin Zimmermann's avatarLevin Zimmermann <levin.zimmermann@nexedi.com>
      
      /discussed-on !18
      c559ec1a
    • Kirill Smelkov's avatar
      wcfs: Shutdown WatchLink on any pin error · 007d53db
      Kirill Smelkov authored
      If a pin misbehaves or there is IO error or anything else, we want to
      stop all communication on the watchlink, cancel on in-flight pin
      handlers, and (TODO) kill the client with SIGBUS.
      
      This patch organizes WatchLink shutdown on any pin error.
      This functionality is indirectly tested by test_Wcfs_watch_robust and
      will be also indirectly tested by faultyprotection tests.
      
      It would be good to have dedicated tests probably, but that is,
      hopefully, TODO.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      007d53db
    • Kirill Smelkov's avatar
      wcfs: Do not cancel pin handler by a READ interrupt · 7a79bdd6
      Kirill Smelkov authored
      Pinning is critical operation whose failure will soon lead to client
      being killed with SIGBUS. WCFS correctness also depend fundamentally on
      pin operation, if started, to be handled by the client.
      
      -> rework the READ handler not to cancel pin if a READ interrupt comes
         in from the OS client.
      
      Do this via organizing WatchLink.serveCtx and running pins under this
      context instead of under READ context. Later we will adjust pins to also
      cancel this context on any error.
      
      Test is, hopefully, TODO.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      7a79bdd6
    • Kirill Smelkov's avatar
      wcfs: Fix potential stuck in WatchLink.serve exit codepath · a6dd7806
      Kirill Smelkov authored
      When serve is completing and going to exit, it sends an error message to
      the client without any timeout. So if the client is not reading from the
      channel, wcfs will get stuck waiting for the message to be consumed.
      
      -> Fix that by trying to send that last error only during 1 second and
         ignoring errors if any
      
      Test is, hopefully, TODO.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on !18
      a6dd7806
    • Kirill Smelkov's avatar
      wcfs: Rework WatchLink.serve exit codepath for better clarity · 08b011f5
      Kirill Smelkov authored
      Bring in more structure:
      
      - final watchlink cleanup is done in its own block
      - cancelling spawned handlers is done in another block
      - add more comments explaining things
      
      /reviewed-by @levin.zimmermann
      /reviewed-on !18
      08b011f5
    • Kirill Smelkov's avatar
      wcfs: Rework WatchLink.serve to rely on context cancellation to stop reading · c7c3b82a
      Kirill Smelkov authored
      Previously we were using .sk.CloseRead() to interrupt sk.Read(), but
      that is not necessary since .sk, relying on xio.Pipe, implements
      xio.Reader natively with full support for cancellation.
      
      The original code to cancel via CloseRead comes from mid 2019 and predates
      
      go123@7ad867a3
      go123@0e368363
      go123@0bdac628
      go123@9db4dfac
      go123@d2dc6c09
      
      And in b17aeb8c and
      6f0cdaff (wcfs: Provide isolation to clients), it seems, I missed to
      update WatchLink.serve code to that.
      
      Do that now because it simplifies code flow organization a bit.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      c7c3b82a
    • Kirill Smelkov's avatar
      wcfs: tests: Extend faulty protection tests with more kinds of faulty clients · c91fb14e
      Kirill Smelkov authored
      So far we were testing only against faulty client that reads pin
      notification ok, but does not reply to the notification. But there could
      be more problems:
      
      1) a client does not read pin notification at all
      2) a client closes watchlink abruptly after reading pin notification
      3) a client replies to pin notification but the reply is not "ack"
      
      The first problem, if not handled leads to whole set of clients to
      become stuck on reading the same block as the faulty client. The other
      problems also indicate breakage of the isolation protocol from the client
      side and that wcfs can no longer be sure that it provides good
      uncorrupted data to the client.
      
      In the first case, similarly to "no reply" situation we need to kill the
      client to make progress while maintaining safety as well. In the cases 2
      and 3 we cannot maintain safety if the faulty client remains in the set
      of live and served clients, so it is also logical to send SIGBUS/SIGKILL
      to it.
      
      Killing a client with SIGBUS is similar to how OS kernel sends SIGBUS when
      a memory-mapped file is accessed and loading file data results in EIO. It is
      also similar to wendelin.core 1 where SIGBUS is raised if loading file block
      results in an error.
      
      Extend tests to cover all explained scenarios.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      c91fb14e
    • Kirill Smelkov's avatar
      wcfs: tests: Add test to exercies faulty client that does not reply to pin... · 0c35ae45
      Kirill Smelkov authored
      wcfs: tests: Add test to exercies faulty client that does not reply to pin triggered by readPinWatchers
      
      Levin writes:
      
          This patch extends the test scope of 'test_wcfs_pintimeout_kill'. Before
          this patch, the test only ensured that a client that does not
          respond to pin requests during the initial watch request [1] is
          killed. Now it also tests that a faulty client is killed when a block
          is invalidated. Since there are no other situations where the WCFS
          server sends pin requests to a client, the tests now cover all situations
          where a faulty client might not respond. This patch therefore aims to
          increase the security that WCFS is not blocked by a faulty client.
      
          [1] See nexedi/wendelin.core!18
      
      Preliminary history:
      
          levin.zimmermann/wendelin.core@9d42efffCo-authored-by: Levin Zimmermann's avatarLevin Zimmermann <levin.zimmermann@nexedi.com>
      
      /discussed-on nexedi/wendelin.core!18
      0c35ae45
    • Kirill Smelkov's avatar
      wcfs: tests: Factor assertion that process should be killed into assertKilled · 008211fb
      Kirill Smelkov authored
      We will need to use this utilitin from several places in the next patch.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      008211fb
    • Kirill Smelkov's avatar
      wcfs: tests: Allow to adjust tDB.assertBlk timeout · 82aa5949
      Kirill Smelkov authored
      Currently assertBlk uses default timeout() to wait for READ operation to
      complete. That works well everywhere except that in faulty
      protection tests wcfs server will first need to wait for its own
      pintimeout time to kill the faulty client and only then	return read
      result to all non-faulty clients.
      
      This way corresponding test, when one client fails to handle pin
      notification well triggered due to READ operations, will need to use
      adjusted longer timeout for the good client when doing assertBlk.
      
      Adjust assertBlk to allow specifying custom timeout as preparatory step
      for that.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      82aa5949
    • Kirill Smelkov's avatar
      wcfs: tests: Extend faultyprotect test with good client · 001e2e7e
      Kirill Smelkov authored
      And make sure that that good client can setup its watch ok even
      through there simultaneously is a faulty client that should get killed.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      001e2e7e
    • Kirill Smelkov's avatar
      wcfs: tests: Move client to be pinkill'ed into separate process · 33ea7769
      Kirill Smelkov authored
      If we don't the whole testing process will become killed when wcfs
      becomes taught to kill clients that do not handle pin notifications
      well.
      
      Use multiprocessing to do so and to be able to interoperate with spawned
      test process by sending/receiving objects to/from it.
      
      Preliminary history:
      
          levin.zimmermann/wendelin.core@aef0f0e1Co-authored-by: Levin Zimmermann's avatarLevin Zimmermann <levin.zimmermann@nexedi.com>
      
      /discussed-on nexedi/wendelin.core!18
      33ea7769
    • Kirill Smelkov's avatar
      wcfs: tests: Fix thinko in "sleep > wcfs pin timeout - wcfs must kill us" · 1303799e
      Kirill Smelkov authored
      If wcfs kills client that did not respond to pin notification in
      pintimeout time, we need to wait strictly _more_ than that time to detect
      whether client was killed or not. And in practice, due to noise in
      operating system load and other factors, that waiting time should be
      significantly greater to detect lack of expected event. However we were
      waiting for exactly 1·pintimeout time and were claiming that there was
      no pinkill event right after that.
      
      -> Wait for 2·pintimeout instead of 1·pintimeout to make pinkill detection robust.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      1303799e
    • Kirill Smelkov's avatar
      wcfs: tests: Use small "pin timeout" for faulty protection tests · e8a3f34a
      Kirill Smelkov authored
      The default "pin timeout" is 30s and we are going to add many tests that
      exercise pinkilling functionality soon. If every such test takes
      2·pintimeout time = 60s, it will result in significant time increase
      needed to run WCFS tests. Avoid that by adjusting pin timeout to
      one order of magnitude smaller pintimeout=3s during faulty protection
      tests.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      e8a3f34a
    • Kirill Smelkov's avatar
      wcfs: tests: Add context to tWCFS · 869e597d
      Kirill Smelkov authored
      This testing helper limits whole test time to detect FUSE-related
      deadlocks via aborting FUSE connection on timeout. It is working good so
      far. But soon we will need pinkill-related tests, where timeout will
      need to be detected independently of FUSE connection. Expose tWCFS.ctx
      for tests to be able to use this context and do things limited in time.
      Adjust FUSE aborting to correlate exactly with this context
      cancellation.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      869e597d
    • Kirill Smelkov's avatar
      wcfs: tests: Move test for verifying protection against faulty/slow clients to dedicated file · ab38f971
      Kirill Smelkov authored
      We are going to add more tests on this topic + supporting infrastructure.
      It makes sense to move everything related to dedicated test file first
      as a preparatory step because wcfs_test.py feels already overloaded.
      
      Plain code movement.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      ab38f971
    • Kirill Smelkov's avatar
      wcfs: Fix setupWatch vs setupWatch race on the same file · 64468d47
      Kirill Smelkov authored
      WCFS allows issuing simultaneous watch requests and when two watch
      requests are simultaneously issued for the same file there was a race in
      their handling: the code was relying on w.atMu.W to protect setupWatch
      from concurrent readPinWatcher, and also, seemingly from another
      setupWatch running on the same file.
      
      But there is a bug about that: lacking atomic primitive to downgrade
      RWMutex from wlock to rlock, atMu.W was first fully unlocked and then
      rlocked again. The code prepare wrt readPinWatcher to start running in
      that unlock->rlock time window, but it was not prepared wrt another
      setupWatch starting to run on the same file in that pause time.
      
      -> Fix that via using dedicated Watch.setupMu lock that protects
         setupWatch from setupWatch.
      
      Test is, hopefully, TODO.
      
      My mistake from 6f0cdaff (wcfs: Provide isolation to clients)
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      64468d47
    • Kirill Smelkov's avatar
      wcfs: Fix readPinWatchers error path · 7bbd6177
      Kirill Smelkov authored
      Inside readPinWatchers:
      
          https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1536-1591
      
      if δFtail.BlkRevAt would return an error, then f.watchMu was not
      RUnlocked back, and wg.Wait was not called at all.
      
      -> Fix that by scheduling unlock and wg wait right after f.watchMu is
         rlocked and workgroup is created.
      
      Test is, hopefully, TODO.
      
      My mistake from 6f0cdaff (wcfs: Provide isolation to clients)
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      7bbd6177
    • Kirill Smelkov's avatar
      wcfs: Cleanup wlinkTab entry when client drops opened head/watch handle · b20a26cb
      Kirill Smelkov authored
      The code was already behaving like that but there was XXX to do it. Add
      test to verify it is actually done.
      
      Opened WatchLink handle is released after RELEASE because
      read in WatchLink.serve, after RELEASE, returns EOF and then the code
      inside WCFS does all necessary WatchLink-related cleanup:
      
      https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1828-1872
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      b20a26cb
    • Kirill Smelkov's avatar
      wcfs: Cleanup zheadSockTab entry when client drops opened .wcfs/zhead handle · 87818b0d
      Kirill Smelkov authored
      This was marked as TODO in server code and not implemented.
      Without this cleanup zheadSockTab was growing indefinitely after every
      open/close and leaking memory.
      
      -> Fix it via registering RELEASE handler to FUSE and removing
      corresponding zheadSockTab entry from there.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      87818b0d
    • Kirill Smelkov's avatar
      wcfs: Add .wcfs/stats file with basic usage statistics · 8abfd27d
      Kirill Smelkov authored
      Report there number of inside-WCFS instances, e.g. number of tracked
      BigFiles, WatchLinks etc, and also number of counted events, for example
      how many times a pin event happened.
      
      Soon we will need this statistics to implement tests e.g. for pinkilling
      and other functionalities, and it might be also useful to have in general.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      8abfd27d
    • Kirill Smelkov's avatar
      wcfs: Fix wlinkTab locking · 96b216f6
      Kirill Smelkov authored
      ZWatcher says it does not need to lock wlinkMu because it is already
      holding zheadMu and setupWatch runs with zheadMu locked. That is indeed
      true, but the mistake here is that it i not only setupWatch that makes
      access to wlinkTab. For example WatchNode.Open registers new entries
      there only under wlinkMu:
      
      https://lab.nexedi.com/nexedi/wendelin.core/-/blob/wendelin.core-2.0.alpha3-26-g79e6f7b9/wcfs/wcfs.go#L1819-1822
      
      -> Fix it by always using wlinkMu when accessing wlinkTab.
      
      My mistake from 6f0cdaff (wcfs: Provide isolation to clients)
      
      Test is, hopefully, TODO.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      96b216f6
    • Kirill Smelkov's avatar
      wcfs: Switch debug.zheadSockTab to fine-grained locking · 82359abe
      Kirill Smelkov authored
      Previously we were protecting access to zheadSockTab with zheadMu
      because this table was accessed from only two places: when opening
      .wcfs/zhead and in zwatcher. Soon we are going to add another place that
      will access this table and still using big zheadMu seem less and less
      logical.
      
      -> Switch to using dedicated lock to protect table of .wcfs/zhead opens
         as preparatory step for that.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      82359abe
    • Kirill Smelkov's avatar
      wcfs: Switch filesystem to EIO mode on zwatcher failure · a36b5562
      Kirill Smelkov authored
      Currently zwatcher failure leads to wcfs starting to provide stale data
      instead of uptodate data. Fix that by detecting zwatcher failures and
      explicitly switching the filesystem to a mode where any access to
      anything returns "input/output error".
      
      Zwatcher can fail on e.g. failure to retrieve transactions from ZODB
      storage or any other failure. With this patch we make sure this does not
      go unnoticed.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      a36b5562
    • Kirill Smelkov's avatar
      wcfs: Remove TODO to teach go-fuse about Init.MaxPages · 6dfcb69e
      Kirill Smelkov authored
      go-fuse added functionality to handle Init.MaxPages in
      https://github.com/hanwen/go-fuse/commit/265a39266958.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on nexedi/wendelin.core!18
      6dfcb69e
  5. 23 Jul, 2024 3 commits
  6. 22 Jul, 2024 1 commit
    • Kirill Smelkov's avatar
      bigfile/zodb: Apply auto format as default only in WCFS mode · 34309058
      Kirill Smelkov authored
      This semantically reverts 99f262dd (bigfile/zodb: Make auto format the
      default) for wendelin.core-1 mode because in non-WCFS mode there are
      known problems with data corruption on BTree topology changes(*) and
      auto mode actually does change those topologies with first setting
      ZBigFile[blk] -> ZBlk1 and then updating the same block to point to
      ZBlk0 object.
      
      Avoid pressuring those problems and use auto as default only in WCFS
      mode that should handle invalidations with all those BTree topology
      changes well.
      
      The patch is based on suggestion by Levin Zimmermann: !20 (comment 212405)
      
      We have to move _default_use_wcfs because now it is invoked at module
      import time and needs to be already defined at the time of the call.
      
      (*) see 8c32c9f6 for details.
      
      /reviewed-by @levin.zimmermann
      /reviewed-on !29
      34309058