1. 26 Mar, 2021 5 commits
    • Kirill Smelkov's avatar
      go/zodb: Fix IStorage implementation not to deadlock driver if watcher fails · 003c44cb
      Kirill Smelkov authored
      Before the patch if storage.watcher fails, storage.driver.Close is not
      called, and so the driver will continue to send to .drvWatchq, but noone
      is receiving from it.
      
      a5dbb92b (go/zodb: Require drivers to close watchq on Close), provides
      the guarantee that the driver will stop sending on drvWatchq right after
      drv.Close call.
      003c44cb
    • Kirill Smelkov's avatar
      go/zodb/zeo: Fix it to provide "Close vs watchq" guaranty · 58e0142c
      Kirill Smelkov authored
      Provide guaranty that Close forces the driver to stop sending to watchq
      and to close it. See a5dbb92b ("go/zodb: Require drivers to close watchq
      on Close") for details.
      
      Without the fix TestWatch fails with test timeout:
      
          panic: test timed out after 30s
      
          # Close waits for serve to stop
          goroutine 93 [semacquire]:
          sync.runtime_Semacquire(0xc000152170)
                  /home/kirr/src/tools/go/go/src/runtime/sema.go:56 +0x45
          sync.(*WaitGroup).Wait(0xc000152168)
                  /home/kirr/src/tools/go/go/src/sync/waitgroup.go:130 +0x65
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zLink).Close(0xc0001520f0, 0x1313, 0x1)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zrpc.go:159 +0x47
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zeo).Close(0xc000313680, 0xc000107c78, 0x1)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo.go:526 +0x2e
          lab.nexedi.com/kirr/neo/go/internal/xtesting.DrvTestWatch(0xc000082c00, 0xc0000aa2a0, 0x24, 0x6a4a38)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/internal/xtesting/xtesting.go:442 +0xdb5
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.TestWatch.func1(0xc000082c00, 0x6e3498, 0xc00009a380)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo_test.go:270 +0x99
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.withZEOSrv.func2.1(0xc0000a4168, 0x16)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo_test.go:207 +0xfb
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.withZEOSrv.func1(0xc000082c00, 0xc00009c180)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo_test.go:186 +0x129
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.withZEOSrv.func2(0xc000082c00)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo_test.go:199 +0x10e
          testing.tRunner(0xc000082c00, 0xc00009c160)
                  /home/kirr/src/tools/go/go/src/testing/testing.go:1194 +0xef
          created by testing.(*T).Run
                  /home/kirr/src/tools/go/go/src/testing/testing.go:1239 +0x2b3
      
          # serve is stuck in invalidateTransaction doing watchq<-
          goroutine 26 [chan send]:
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zeo).invalidateTransaction(0xc000313680, 0x6417e0, 0xc000323b60, 0x0, 0x0)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo.go:176 +0x373
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zLink).serveRecv1(0xc0001520f0, 0xc000393890, 0x0, 0x0)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zrpc.go:225 +0x4b4
          lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zLink).serveRecv(0xc0001520f0)
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zrpc.go:176 +0x8d
          created by lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zLink).start
                  /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zrpc.go:99 +0xc8
      58e0142c
    • Kirill Smelkov's avatar
      go/zodb/fs1: Fix it to provide "Close vs watchq" guaranty · 88848c31
      Kirill Smelkov authored
      Provide guaranty that Close forces the driver to stop sending to watchq
      and to close it. See a5dbb92b ("go/zodb: Require drivers to close watchq
      on Close") for details.
      88848c31
    • Kirill Smelkov's avatar
      go/zodb: Require drivers to close watchq on Close · a5dbb92b
      Kirill Smelkov authored
      If we don't require drivers to stop sending to watchq after Close, there
      could be many deadlock scenarios, for example:
      
      - client called drv.Close(); no longer listens to watchq; driver is
        stuck sending to watchq, or
      
      - client called drv.Close(), which itself waits for tasks spawned inside
        driver to complete, which are stuck sending to watchq because client
        no longer does <-watchq.
      
      The change is similar in spirit to safety guaranty provided by high-level
      Watcher where after DelWatch call it is guaranteed that there will be no
      more sends to subscribed watchq (see c41c2907 "go/zodb: High-level
      watching - initial draft") for details.
      
      All drivers don't provide requested guarantee yet.
      We'll be fixing them one-by-one in followup commits.
      a5dbb92b
    • Kirill Smelkov's avatar
      go/internal/xtesting: Fix FatalIf to return a helper · 20d48ded
      Kirill Smelkov authored
      Else, on an error, it is the lineno of `t.Fatal(err)` inside FatalIf
      that is printed, not the line number inside user test.
      20d48ded
  2. 17 Mar, 2021 4 commits
    • Kirill Smelkov's avatar
      go/zodb/fs1: Fix access to whiteout · 543041a3
      Kirill Smelkov authored
      A data record with len(data)=0 and backpointer=0 is considered by
      FileStorage/py as "no data":
      
      https://github.com/zopefoundation/ZODB/blob/5.6.0-15-g22d1405d4/src/ZODB/FileStorage/FileStorage.py#L576-L582
      
      Even though currently it is not possible to create such data record via
      FileStorage(py).deleteObject (becase it raises POSKeyError if there is
      no previous object revision), being able to use such data records is
      semantically useful in overlayed DemoStorage settings, where δ part
      marks an object that exists only in base with delete record whiteout.
      
      It is also generally meaningfull to be able to create "delete" record
      even if object was not previously existing: "deleteObject" is actually
      similar to "store" (and so should be better named as "storeDelete"). If
      one wants to store deletion, there should not be a reason to reject it,
      because deleteObject already has seatbelt in the form of oldserial, and
      if the user calls deleteObject(oid, oldserial=z64), he/she is already
      telling that "I know that this object does not exist in this storage
      (oldserial=z64), but still please create a deletion record for it". Once
      again this is useful in overlayed DemoStorage settings described above.
      
      For the reference, such whiteout deletion records pass ZODB/scripts/fstest just fine.
      
      Even though FileStorage/py loads such data records just fine, on
      FileStorage/go side it was not the case - DataHeader.LoadBackRef, even
      with backpointer=0, was verifying that backpointer to be valid and
      failing seeing it might overlap with current transaction:
      
          === RUN   TestLoadWhiteout
          2021/03/17 06:40:58 index load: open testdata/whiteout.fs.index: no such file or directory
          2021/03/17 06:40:58 testdata/whiteout.fs: index rebuild...
              filestorage_test.go:398: load 0000000000000017:0000000000000001: bad err:
                  have: testdata/whiteout.fs: load 0000000000000017:0000000000000001: testdata/whiteout.fs: data record @27: check: backpointer (0) overlaps with txn (4)
                  want: testdata/whiteout.fs: load 0000000000000017:0000000000000001: 0000000000000001: object was not yet created
      
      It was a thinko: backPos==0 was already kind of handled in LoadBackRef,
      but only in one verification case. -> Fix all checks not to trigger when
      seeing backPos=0. DataHeader.LoadBack - the caller of LoadBackRef -
      already handles returned backPos=0 as "no data".
      543041a3
    • Kirill Smelkov's avatar
      go/zodb/fs1tools: tests: Use fresh dumper for each subtest · a708663d
      Kirill Smelkov authored
      We were reusing Dumper instance in between testDump subtests. This was
      not noticed, as it was only "1" and then "empty" case, because "emtpy"
      has no transactions. However in the next patch we'll add another
      subcase, and if the dumper instance is not reset, it will think that it
      starts from transaction number non-zero, which would differ from fresh
      dumper output.
      
      -> Fix it.
      a708663d
    • Kirill Smelkov's avatar
      go/zodb/fs1: Fix Python database generator to work with recent zodbtools · fc69e00d
      Kirill Smelkov authored
      As Zodbtools dropped ZODB3 support its run_with_zodb3py2_compat was
      renamed to run_with_zodb4py2_compat:
      
      nexedi/zodbtools@c59a54ca
      fc69e00d
    • Kirill Smelkov's avatar
      go/zodb/zeo: msgpack.{Encode,Decode} -> msgpack.{Marshal,Unmarshal} · 363e6e61
      Kirill Smelkov authored
      Encode/decode was deprecated and removed in recent github.com/shamaton/msgpack.
      363e6e61
  3. 19 Feb, 2021 1 commit
  4. 18 Jan, 2021 5 commits
  5. 15 Jan, 2021 20 commits
  6. 14 Jan, 2021 4 commits
    • Kirill Smelkov's avatar
      go/neo/neonet: Kill pktBuf.Dump · dfec9278
      Kirill Smelkov authored
      It is not used anywhere and would have to be reworked with upcoming
      introduction of msgpack encoding. -> Simply remove that.
      dfec9278
    • Kirill Smelkov's avatar
      go/neo/neonet: msgPack -> pktEncode · 47e3cdd9
      Kirill Smelkov authored
      Just renaming.
      47e3cdd9
    • Kirill Smelkov's avatar
      go/neo/neonet: tests: Introduce T · 41a120c1
      Kirill Smelkov authored
      This represents a neonet testing environment and in the future will be
      used to run tests under different protocol versions, protocol encodings, etc.
      
      For now it is noop wrapper around testing.T + t.bin that wraps []byte
      and will be amended to return something different depending on t's
      encoding.
      41a120c1
    • Kirill Smelkov's avatar
      go/neo/neonet: Rework handshake to differentiate client and server parts · e407f725
      Kirill Smelkov authored
      Previously we were doing handshake symmetrically: both client and server
      were transmitting hello and receiving peer's hello simultaneously.
      However this does not allow server to adjust its behaviour depending on
      which client (protocol version, protocol encoding, ...) is connecting to it.
      
      -> Rework handshake so that client always sends its hello first, and
      only then the server side replies. This matches actual NEO/py behaviour:
      
      https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-67-g261dd4b4/neo/lib/connector.py#L293-294
      
      even though the "NEO protocol" states that
      
      	Handshake transmissions are not ordered with respect to each other and can go in parallel.
      
      	( https://neo.nexedi.com/P-NEO-Protocol.Specification.2019?portal_skin=CI_slideshow#/9/2 )
      
      If I recall correctly that sentence was authored by me in 2018 based on
      previous understanding of should-be full symmetry in-between client and
      server.
      
      However soon we are going to teach server sides to autodetect client
      encoding and adjust server to talk to client via its preferred way.
      This needs handshake for client and server to be differentiated.
      
      The protocol needs to be adjusted as well. However I'm not sure it is
      going to happen...
      e407f725
  7. 13 Jan, 2021 1 commit