- 17 Jul, 2023 2 commits
-
-
Kirill Smelkov authored
Levin Zimmermann discovered that NEO/py sometimes accepts our handshake hello with encoding 'M', then replies with its own handshake ehlo with encoding 'N', and later terminates the connection. In other words, it looks as if the handshake succeeded, but it actually did not, and NEO/py terminates the link some time afterwards. This manifests itself, e.g., as infrequent TestLoad failures on the t branch with the following output:

    === RUN TestLoad/py/!ssl
    I: runneo.py: /tmp/neo776618506/1 !ssl: started master(s): 127.0.0.1:21151
    === RUN TestLoad/py/!ssl/enc=N(dialTryOrder=N,M)
    client_test.go:598: skip: does not excercise client redial
    === RUN TestLoad/py/!ssl/enc=N(dialTryOrder=M,N)
    xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned err unexpected:
    have: neo://127.0.0.1:21151/1: load 0285cbac258bf266:0000000000000000: dial S1: dial 127.0.0.1:40345 (STORAGE): 127.0.0.1:56678 - 127.0.0.1:40345: request identification: 127.0.0.1:56678 - 127.0.0.1:40345 .1: recv: EOF
    want: nil
    xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned tid unexpected: 0000000000000000 ; want: 0285cbac258bf266
    xtesting.go:330: load 0285cbac258bf266:0000000000000000: returned buf = nil
    xtesting.go:339: load 0285cbac258bf265:0000000000000000: returned err unexpected:
    have: neo://127.0.0.1:21151/1: load 0285cbac258bf265:0000000000000000: dial S1: dial 127.0.0.1:40345 (STORAGE): 127.0.0.1:56688 - 127.0.0.1:40345: request identification: 127.0.0.1:56688 - 127.0.0.1:40345 .1: recv: EOF
    want: neo://127.0.0.1:21151/1: load 0285cbac258bf265:0000000000000000: 0000000000000000: object was not yet created
    ...
    client_test.go:588: NEO log tail:
    log file 'storage_0.log' tail:
    2023-07-17 17:21:57.1519 DEBUG S1 connection completed for <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, fd=20, server) at 7f3583fd4f50> (from 127.0.0.1:40345)
    2023-07-17 17:21:57.1537 WARNING S1 Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, fd=20, server) at 7f3583fd4f50>
    2023-07-17 17:21:57.1548 DEBUG S1 connection closed for <ServerConnection(nid=None, address=127.0.0.1:51230, handler=IdentificationHandler, closed, server) at 7f3583fd4f50>
    2023-07-17 17:21:57.1551 WARNING S1 A connection was lost during identification
    2023-07-17 17:22:00.1582 DEBUG S1 accepted a connection from 127.0.0.1:51236
    2023-07-17 17:22:00.1585 DEBUG S1 connection completed for <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, fd=20, server) at 7f3583fd4e90> (from 127.0.0.1:40345)
    2023-07-17 17:22:00.1604 WARNING S1 Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, fd=20, server) at 7f3583fd4e90>
    2023-07-17 17:22:00.1622 DEBUG S1 connection closed for <ServerConnection(nid=None, address=127.0.0.1:51236, handler=IdentificationHandler, closed, server) at 7f3583fd4e90>
    2023-07-17 17:22:00.1625 WARNING S1 A connection was lost during identification
    2023-07-17 17:22:03.1663 DEBUG S1 accepted a connection from 127.0.0.1:51238
    2023-07-17 17:22:03.1666 DEBUG S1 connection completed for <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, fd=20, server) at 7f3583fd4d10> (from 127.0.0.1:40345)
    2023-07-17 17:22:03.1674 WARNING S1 Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, fd=20, server) at 7f3583fd4d10>
    2023-07-17 17:22:03.1688 DEBUG S1 connection closed for <ServerConnection(nid=None, address=127.0.0.1:51238, handler=IdentificationHandler, closed, server) at 7f3583fd4d10>
    2023-07-17 17:22:03.1691 WARNING S1 A connection was lost during identification
    2023-07-17 17:22:06.1714 DEBUG S1 accepted a connection from 127.0.0.1:57072
    2023-07-17 17:22:06.1719 DEBUG S1 connection completed for <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, fd=20, server) at 7f3583fd4b50> (from 127.0.0.1:40345)
    2023-07-17 17:22:06.1727 WARNING S1 Protocol version mismatch with <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, fd=20, server) at 7f3583fd4b50>
    2023-07-17 17:22:06.1738 DEBUG S1 connection closed for <ServerConnection(nid=None, address=127.0.0.1:57072, handler=IdentificationHandler, closed, server) at 7f3583fd4b50>
    2023-07-17 17:22:06.1738 WARNING S1 A connection was lost during identification
    log file 'master_0.log' tail:
    2023-07-17 17:21:21.0799 PACKET M1 #0x0012 NotifyNodeInformation > A1 (127.0.0.1:37906)
    2023-07-17 17:21:21.0799 PACKET M1 ! C0 | CLIENT | | RUNNING | 2023-07-17 14:21:21.079838
    2023-07-17 17:21:21.0800 PACKET M1 #0x0102 NotifyNodeInformation > S1 (127.0.0.1:37918)
    2023-07-17 17:21:21.0800 PACKET M1 ! C0 | CLIENT | | RUNNING | 2023-07-17 14:21:21.079838
    2023-07-17 17:21:21.0801 DEBUG M1 Handler changed on <ServerConnection(nid=None, address=127.0.0.1:37966, handler=ClientServiceHandler, fd=18, server) at 7f3584245910>
    2023-07-17 17:21:21.0802 PACKET M1 #0x0001 AnswerRequestIdentification > C0 (127.0.0.1:37966)
    2023-07-17 17:21:21.0804 PACKET M1 #0x0000 NotifyNodeInformation > C0 (127.0.0.1:37966)
    2023-07-17 17:21:21.0804 PACKET M1 ! C0 | CLIENT | | RUNNING | 2023-07-17 14:21:21.079838
    2023-07-17 17:21:21.0804 PACKET M1 ! M1 | MASTER | 127.0.0.1:21151 | RUNNING | None
    2023-07-17 17:21:21.0804 PACKET M1 ! S1 | STORAGE | 127.0.0.1:40345 | RUNNING | 2023-07-17 14:21:18.737469
    2023-07-17 17:21:21.0805 PACKET M1 #0x0002 NotifyPartitionTable > C0 (127.0.0.1:37966)
    2023-07-17 17:21:21.0810 PACKET M1 #0x0003 LastTransaction < C0 (127.0.0.1:37966)
    2023-07-17 17:21:21.0811 PACKET M1 #0x0003 AnswerLastTransaction > C0 (127.0.0.1:37966)
    2023-07-17 17:22:06.2053 DEBUG M1 <SocketConnectorIPv4 at 0x7f3584252d10 fileno 18 ('127.0.0.1', 21151), opened from ('127.0.0.1', 37966)> closed in recv
    2023-07-17 17:22:06.2056 DEBUG M1 connection closed for <ServerConnection(nid=C0, address=127.0.0.1:37966, handler=ClientServiceHandler, closed, server) at 7f3584245910>
    2023-07-17 17:22:06.2058 PACKET M1 #0x0014 NotifyNodeInformation > A1 (127.0.0.1:37906)
    2023-07-17 17:22:06.2058 PACKET M1 ! C0 | CLIENT | | UNKNOWN | 2023-07-17 14:21:21.079838
    2023-07-17 17:22:06.2059 PACKET M1 #0x0104 NotifyNodeInformation > S1 (127.0.0.1:37918)
    2023-07-17 17:22:06.2059 PACKET M1 ! C0 | CLIENT | | UNKNOWN | 2023-07-17 14:21:21.079838

The problem is that my analysis in e407f725 (go/neo/neonet: Rework handshake to differentiate client and server parts) turned out to be incorrect. Quoting that patch:

    -> Rework handshake so that client always sends its hello first, and only then the server side replies. This matches actual NEO/py behaviour:

        https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-67-g261dd4b4/neo/lib/connector.py#L293-294

    even though the "NEO protocol" states that

        Handshake transmissions are not ordered with respect to each other and can go in parallel.

        ( https://neo.nexedi.com/P-NEO-Protocol.Specification.2019?portal_skin=CI_slideshow#/9/2 )

    If I recall correctly that sentence was authored by me in 2018 based on previous understanding of should-be full symmetry in-between client and server.
So here "This matches actual NEO/py behaviour" was wrong: even though https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-67-g261dd4b4/neo/lib/connector.py#L293-294 indeed says that

    # The NEO protocol is such that a client connection is always the
    # first to send a packet, as soon as the connection is established,

in reality this is not the case, because SocketConnector always queues its handshake hello upon creation, before receiving anything from the remote side: https://lab.nexedi.com/nexedi/neoppod/blob/v1.12-93-gfd87e153/neo/lib/connector.py#L77-78 . In practice this means that, in the non-SSL case, the NEO/py server might be fast enough to send its prepared hello before it receives the hello from us.

Levin also explains at !5 (comment 187429):

    I think what happens is this: the NEO protocol doesn't specify in which order handshakes happen after initial dial. If the peer sends a handshake before receiving our handshake and if this peers handshake is received by us, 'DialLink' assumes everything is fine (no err is returned), it breaks the loop and returns the link. But then, very little time later, when the peer finally receives our handshake, this looks strange for the peer and it closes the connection.

    So in my understanding this should be fixed by explicitly comparing the encodings between our expected one and what the peer provided us. If encodings don't match we should retry with a new encoding in order to prevent the peer from closing the connection.

    For me this also explains why sometimes the tests passed and sometimes didn't: it depended on which node was faster ('race condition').

-> In this patch we add a corresponding handshake test that demonstrates this problem.
It currently fails as

    --- FAIL: TestHandshake (0.01s)
    --- FAIL: TestHandshake/enc=N (0.00s)
    newlink_test.go:154: handshake encoding mismatch: client: unexpected error:
    have: <nil> "<nil>"
    want: &neonet._HandshakeError{LocalRole:1, LocalAddr:net.pipeAddr{}, RemoteAddr:net.pipeAddr{}, Err:(*neonet._EncodingMismatchError)(0xc0000a4190)} "pipe - pipe: handshake (client): protocol encoding mismatch: peer = 'M' ; our side = 'N'"
    --- FAIL: TestHandshake/enc=M (0.00s)
    newlink_test.go:154: handshake encoding mismatch: client: unexpected error:
    have: <nil> "<nil>"
    want: &neonet._HandshakeError{LocalRole:1, LocalAddr:net.pipeAddr{}, RemoteAddr:net.pipeAddr{}, Err:(*neonet._EncodingMismatchError)(0xc0001a22cc)} "pipe - pipe: handshake (client): protocol encoding mismatch: peer = 'N' ; our side = 'M'"

We will fix it in the next patch.

/reported-by @levin.zimmermann
/reported-on !5
-
Kirill Smelkov authored
go/neo/neonet: Dedicate an error type to indicate "protocol version mismatch" as handshake failure cause

We will soon need to detect whether a handshake failure was due to a mismatch of protocol encodings, and that will require introducing a dedicated error type for that cause. As a preparatory step, first refactor the "version mismatch" cause to follow the same style, for symmetry.
-
- 18 Jan, 2023 2 commits
-
-
Levin Zimmermann authored
This patch fixes a discrepancy between NEO/py and NEO/go: NEO/py expands the '~' and '~username' prefixes in the file paths of the TLS certificate/key files [1]. This syntax is used in the NEO/py SlapOS SR [2]. We need to fix this discrepancy in NEO/go in order to use TLS encryption with NEO + WCFS.

[1] https://lab.nexedi.com/nexedi/neoppod/blob/7c539f0f/neo/lib/config.py#L149 and https://lab.nexedi.com/nexedi/neoppod/blob/fa63d856/neo/lib/app.py#L25-31
[2] https://lab.nexedi.com/nexedi/slapos/blob/397726e1/stack/erp5/instance-zodb-base.cfg.in#L18-20 and https://lab.nexedi.com/nexedi/slapos/blob/a8150a1a/software/neoppod/instance-neo-input-schema.json#L62

/reviewed-by @kirr
/reviewed-on kirr/neo!1
-
Levin Zimmermann authored
The xfilepath package supports resolving file paths with a user prefix to absolute paths: it expands '~' to the current user's $HOME and '~username' to the home directory of that user (as, for instance, bash does). No builtin Go package supports this functionality [1]. We need this functionality in order to imitate the behaviour of NEO/py in NEO/go [2].

---

[1] https://stackoverflow.com/questions/47261719/how-can-i-resolve-a-relative-path-to-absolute-path-in-golang
[2] nexedi/slapos!1307 (comment 17574)

/reviewed-by @kirr
/reviewed-on !1
-
- 18 May, 2022 2 commits
-
-
Kirill Smelkov authored
Without parentheses it was failing on py3:

    (neo) (py3.venv) (g.env) kirr@deca:~/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t$ ./neotest info-local
    date: Wed, 18 May 2022 11:05:50 +0300
    xnode: kirr@deca.navytux.spb.ru (2401:5180:0:af::1 192.168.0.3 (+ 1·ipv4))
    uname: Linux deca 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
    cpu: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
    File "<string>", line 1
    print '%.2fGHz' % (400000 / 1E6)
    ^
    SyntaxError: invalid syntax
-
Kirill Smelkov authored
On py3 dict.keys() returns a view object (dict_keys) instead of a list:

    $ ./tzodb.py zhash
    Traceback (most recent call last):
      File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/./tzodb.py", line 141, in <module>
        main()
      File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/./tzodb.py", line 138, in main
        zhash()
      File "/home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/neo/t/./tzodb.py", line 59, in zhash
        optv, argv = getopt(sys.argv[2:], "h", ["help", "check=", "bench="] + hashRegistry.keys())
    TypeError: can only concatenate list (not "dict_keys") to list
-
- 25 Nov, 2021 1 commit
-
-
Kirill Smelkov authored
xurl.ParseQuery is a simpler analog of url.ParseQuery. It is simpler in that it returns a regular map instead of url.Values, because it does not allow duplicate keys.
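A minimal sketch of such a helper on top of the standard `net/url.ParseQuery`. The name `parseQuery` and its error message are assumptions for this illustration, not the actual xurl code.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseQuery is a hypothetical sketch of an xurl.ParseQuery-like helper:
// it parses a query like url.ParseQuery, but returns a plain
// map[string]string and rejects duplicate keys instead of collecting them.
func parseQuery(query string) (map[string]string, error) {
	values, err := url.ParseQuery(query)
	if err != nil {
		return nil, err
	}
	m := make(map[string]string, len(values))
	for k, vv := range values {
		if len(vv) != 1 {
			return nil, fmt.Errorf("parse query %q: duplicate key %q", query, k)
		}
		m[k] = vv[0]
	}
	return m, nil
}

func main() {
	q, err := parseQuery("a=1&b=2")
	fmt.Println(q, err) // map[a:1 b:2] <nil>

	_, err = parseQuery("a=1&a=2")
	fmt.Println(err)
}
```

Rejecting duplicates is what makes a plain map sufficient: there is never more than one value to keep per key.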
-
- 04 Oct, 2021 5 commits
-
-
Kirill Smelkov authored
WCFS needs to know the key coverage for every visited node. WARNING: this is an API change.
-
Kirill Smelkov authored
KeyRange represents a [lo,hi) key range. It simplifies working with ranges of keys. We will use it in the next commit. KeyRange originated in WCFS and was copied from there: https://lab.nexedi.com/kirr/wendelin.core/blob/57be0126/wcfs/internal/xbtree/blib/keyrange.go
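A minimal sketch of what a half-open [lo,hi) key range type looks like, assuming `int64` keys. This is not the copied WCFS code: the field names and the `Has`, `Empty`, `Intersect` methods are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Key is assumed to be int64, as for BTrees with 64-bit integer keys.
type Key = int64

// KeyRange is a hypothetical sketch of a [Lo, Hi) key range.
// NOTE: with a half-open range, Hi cannot express "up to and including
// math.MaxInt64"; a real implementation may store an inclusive upper
// bound instead to avoid that overflow.
type KeyRange struct {
	Lo, Hi Key // covers Lo <= k < Hi
}

// Has reports whether k belongs to the range.
func (r KeyRange) Has(k Key) bool { return r.Lo <= k && k < r.Hi }

// Empty reports whether the range contains no keys.
func (r KeyRange) Empty() bool { return r.Lo >= r.Hi }

// Intersect returns the overlap of r and s (possibly empty).
func (r KeyRange) Intersect(s KeyRange) KeyRange {
	lo, hi := r.Lo, r.Hi
	if s.Lo > lo {
		lo = s.Lo
	}
	if s.Hi < hi {
		hi = s.Hi
	}
	return KeyRange{lo, hi}
}

func (r KeyRange) String() string { return fmt.Sprintf("[%d,%d)", r.Lo, r.Hi) }

func main() {
	all := KeyRange{math.MinInt64, math.MaxInt64}
	node := KeyRange{0, 10}
	fmt.Println(all.Intersect(node), node.Has(10)) // [0,10) false
}
```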
-
Kirill Smelkov authored
We were already using math.Min<Key> in one place, but the number of such places is going to increase. -> Keep min/max definition in only one place.
-
Kirill Smelkov authored
NOTE: db was already being closed in the test's code.
-
Kirill Smelkov authored
-
- 08 Sep, 2021 4 commits
-
-
Kirill Smelkov authored
staticcheck says: xtesting.go:386:2: this value of err is never used (SA4006)
-
Kirill Smelkov authored
staticcheck reports:

    fs1/format.go:204:2: only the first constant in this group has an explicit type (SA9004)
    zeo/proto.go:56:2: only the first constant in this group has an explicit type (SA9004)
-
Kirill Smelkov authored
staticcheck reports:

    ziobtree.go:606:4: Errorf is a pure function but its return value is ignored (SA4017)
    ziobtree.go:626:4: Errorf is a pure function but its return value is ignored (SA4017)
    zlobtree.go:606:4: Errorf is a pure function but its return value is ignored (SA4017)
    zlobtree.go:626:4: Errorf is a pure function but its return value is ignored (SA4017)
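SA4017 fires because `fmt.Errorf` only builds an error value and has no side effects, so calling it as a statement silently does nothing. The buggy shape is kept in a comment below; the function `check` is a made-up example, not the generated BTree code.

```go
package main

import "fmt"

// Buggy shape (flagged by staticcheck as SA4017):
//
//	if n < 0 {
//		fmt.Errorf("negative value %d", n) // error is built, then dropped
//	}
//	return nil
//
// Fixed: the constructed error is actually returned to the caller.
func check(n int) error {
	if n < 0 {
		return fmt.Errorf("negative value %d", n)
	}
	return nil
}

func main() {
	fmt.Println(check(-1)) // negative value -1
	fmt.Println(check(1))  // <nil>
}
```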
-
Kirill Smelkov authored
run_with_zodb3py2_compat was renamed to run_with_zodb4py2_compat in nexedi/zodbtools@c59a54ca . Without the fix, go generate was failing as

    (neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/btree$ go generate
    Traceback (most recent call last):
      File "./testdata/gen-testdata", line 26, in <module>
        from zodbtools.test.gen_testdata import run_with_zodb3py2_compat
    ImportError: cannot import name run_with_zodb3py2_compat

and

    (neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb$ go generate
    Traceback (most recent call last):
      File "./py/pydata-gen-testdata", line 24, in <module>
        from zodbtools.test.gen_testdata import run_with_zodb3py2_compat
    ImportError: cannot import name run_with_zodb3py2_compat

This amends commit fc69e00d (go/zodb/fs1: Fix Python database generator to work with recent zodbtools).
-
- 20 Jul, 2021 1 commit
-
-
Kirill Smelkov authored
- The only valid range for at is [tail, head]. Don't try to return anything meaningful for queries outside of this range; just panic instead. This is consistent with SliceByRev, which also panics on an invalid query, and it is also consistent with the semantic model that ΔTail is a vector with data keyed by tid in range (tail, head]: if a key is out of the vector's range, access to the vector should panic, shouldn't it?

- Instead of returning the revision of the minimum entry on exact=n, always return (tail, exact=n) in that case. The change in behaviour is consistent with ΔFtail and ΔBtail from WCFS and is needed for ΔFtail to function correctly:

  https://lab.nexedi.com/kirr/wendelin.core/blob/22f5f096/wcfs/internal/xbtree/δbtail.go
  https://lab.nexedi.com/kirr/wendelin.core/blob/22f5f096/wcfs/internal/zdata/δftail.go
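The two rules above can be illustrated on a simplified model that tracks the change history of a single object. This is a hypothetical sketch: `deltaTail` and its `LastRevOf` method are not the real ΔTail API (which, among other things, is keyed by oid), but the panic on out-of-range `at` and the `(tail, false)` answer follow the behaviour described in the commit.

```go
package main

import "fmt"

type Tid uint64

// deltaTail is a hypothetical, simplified model of ΔTail for one object:
// it covers the revision range (tail, head] and records, in ascending
// order, the revisions at which the object was changed.
type deltaTail struct {
	tail, head Tid
	revs       []Tid // ascending, each in (tail, head]
}

// LastRevOf returns the revision of the last change <= at.
// If no change is recorded in (tail, at], it returns (tail, false): the
// object might have been changed at or before tail, which lies outside the
// covered range. Queries with at outside [tail, head] panic.
func (dt *deltaTail) LastRevOf(at Tid) (_ Tid, exact bool) {
	if !(dt.tail <= at && at <= dt.head) {
		panic(fmt.Sprintf("LastRevOf(%d): at out of range [%d, %d]", at, dt.tail, dt.head))
	}
	for i := len(dt.revs) - 1; i >= 0; i-- {
		if dt.revs[i] <= at {
			return dt.revs[i], true
		}
	}
	return dt.tail, false
}

func main() {
	dt := &deltaTail{tail: 10, head: 20, revs: []Tid{12, 15}}
	fmt.Println(dt.LastRevOf(16)) // 15 true
	fmt.Println(dt.LastRevOf(11)) // 10 false
}
```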
-
- 24 May, 2021 6 commits
-
-
Kirill Smelkov authored
When an object is just created, it is not yet assigned an OID, but it can already be reachable from other objects. The code that processes a transaction can reach that new object and try to PActivate/PDeactivate it, and currently PDeactivate drops the object state completely. Another example of an object without an OID is a Bucket embedded into a Tree object. There, the code that scans the tree can reach that bucket and try to activate/deactivate it, leading, again, to dropping the state of that bucket. -> Fix it.
-
Kirill Smelkov authored
Persistent.PActivate used to panic when called a second time if the first call hit an error. WCFS hit this in practice via an object that was previously accessed and pinned into the cache, but later deleted in the storage.

-> Fix PActivate to reset .loading on an error, so that the next time PActivate is called, it tries to trigger the load again instead of panicking.

Change the doload criteria from

    state==GHOST && refcnt==1

to

    state==GHOST && loading==nil

because now, after a failed PActivate, refcnt can be != 0 if there are several other PActivate calls that were waiting for the failed PActivate but have not yet woken up.

Here is how the added test fails without the fix:

    --- FAIL: TestActivateAfterDelete (1.65s)
    panic: t.zodb.MyObject(0000000000000065): activate: need to load, but .loading != nil [recovered]
        panic: t.zodb.MyObject(0000000000000065): activate: need to load, but .loading != nil

    goroutine 10085 [running]:
    testing.tRunner.func1.2(0x649020, 0xc000520660)
        /home/kirr/src/tools/go/go/src/testing/testing.go:1143 +0x332
    testing.tRunner.func1(0xc0001cb080)
        /home/kirr/src/tools/go/go/src/testing/testing.go:1146 +0x4b6
    panic(0x649020, 0xc000520660)
        /home/kirr/src/tools/go/go/src/runtime/panic.go:965 +0x1b9
    lab.nexedi.com/kirr/neo/go/zodb.(*Persistent).PActivate(0xc0001184d0, 0x6e8360, 0xc00019ac90, 0x0, 0x0)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/persistent.go:191 +0xce5
    lab.nexedi.com/kirr/neo/go/zodb.TestActivateAfterDelete(0xc0001cb080)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/persistent_test.go:786 +0x72c
-
Kirill Smelkov authored
Add a utility to verify FileStorage data for consistency. To verify, we just need to iterate through all records, because the FileStorage driver performs all consistency checks by itself. Mimic normal output to be the same as in fstest from ZODB/py. Example runs of fstest.py and `fs1 verify` on a broken file:

    $ python ~/src/wendelin/z/ZODB/src/ZODB/scripts/fstest.py -v 1.fs
    4: transaction tid 0x03e044f6448c8022 #0
    213: transaction tid 0x03e044f646e044bb #1
    1.fs has data records that extend beyond the transaction record; end at 466

    $ fs1 verify -v 1.fs
    4: transaction tid 0x03e044f6448c8022 #0
    213: transaction tid 0x03e044f646e044bb #1
    2021/05/24 12:43:37 fsverify: 1.fs: 1.fs: transaction record @355: -> (iter data): 1.fs: data record @416: check: data record [..., 466) overlaps txn boundary [..., 458)

As can be seen, in the fs1 case the error contains more details: [start, end) of both the mismatching transaction and data records.

In addition to fstest-like verbosity, add a progress mode, where the % of total completion is printed in a style similar to the one used by `fs1 verify-index`.

The Go-based implementation is also faster, even when data is on HDD. For example, on a 73GB database provided by @jerome[1] fsrefs.py takes ~15 minutes to run and occupies ~70-100% of CPU. On the other hand, `fs1 verify` takes ~7 minutes to run and occupies ~20-30% of CPU.

Tests pending.

[1] nexedi/zodbtools!19 (comment 129480)
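The kind of check that produces the "overlaps txn boundary" error can be sketched as a containment test of two [start, end) byte ranges. This is a hypothetical helper (`span`, `checkDataInTxn`); in fs1 the real checks are performed by the driver itself while iterating records, and the offsets in `main` are only illustrative.

```go
package main

import "fmt"

// span is a hypothetical [Start, End) byte range of a record in the file.
type span struct{ Start, End int64 }

// checkDataInTxn reports an error if data record d does not lie fully
// inside transaction record t.
func checkDataInTxn(d, t span) error {
	if d.Start < t.Start || d.End > t.End {
		return fmt.Errorf("data record @%d: check: data record [%d, %d) overlaps txn boundary [%d, %d)",
			d.Start, d.Start, d.End, t.Start, t.End)
	}
	return nil
}

func main() {
	// Illustrative offsets: a data record ending at 466 inside a
	// transaction that ends at 458.
	fmt.Println(checkDataInTxn(span{416, 466}, span{355, 458}))
}
```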
-
Kirill Smelkov authored
Some dumpers might want to print something at the end of their dump. We will need this functionality for Verify (see next patch).
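One common Go way to let *some* dumpers print something at the end, without forcing every dumper to implement it, is an optional interface checked with a type assertion. This is a hypothetical sketch (`Dumper`, `dumpEnder`, `DumpEnd`, `plainDumper`, `verifyDumper` are assumed names) and not necessarily how the patch implements the hook.

```go
package main

import (
	"fmt"
	"strings"
)

// Dumper is a hypothetical, simplified dumper called once per transaction.
type Dumper interface {
	DumpTxn(buf *strings.Builder, tid uint64) error
}

// dumpEnder is an optional extension: dumpers that implement it get a
// chance to print something after the last transaction.
type dumpEnder interface {
	DumpEnd(buf *strings.Builder, ntxn int) error
}

// dump runs d over all transactions and then calls DumpEnd if d supports it.
func dump(d Dumper, tids []uint64) (string, error) {
	var buf strings.Builder
	for _, tid := range tids {
		if err := d.DumpTxn(&buf, tid); err != nil {
			return "", err
		}
	}
	if e, ok := d.(dumpEnder); ok {
		if err := e.DumpEnd(&buf, len(tids)); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

type plainDumper struct{}

func (plainDumper) DumpTxn(buf *strings.Builder, tid uint64) error {
	fmt.Fprintf(buf, "txn %016x\n", tid)
	return nil
}

// verifyDumper reuses plainDumper's per-transaction output and adds a summary.
type verifyDumper struct{ plainDumper }

func (verifyDumper) DumpEnd(buf *strings.Builder, ntxn int) error {
	fmt.Fprintf(buf, "OK: %d transactions\n", ntxn)
	return nil
}

func main() {
	out, _ := dump(verifyDumper{}, []uint64{1, 2})
	fmt.Print(out)
}
```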
-
Kirill Smelkov authored
-
Kirill Smelkov authored
And use that in the callers.
-
- 19 May, 2021 1 commit
-
-
Kirill Smelkov authored
This error type is documented (see Loader) to always be created as *NoObjectError. -> Fix the Error receiver accordingly.
-
- 10 May, 2021 1 commit
-
-
Kirill Smelkov authored
WCFS needs this to run tests faster. In general it is also a good idea to pass options to the DB constructor, in particular options that affect the live cache size or other properties of connections created later.
-
- 26 Mar, 2021 11 commits
-
-
Kirill Smelkov authored
-
Kirill Smelkov authored
-
Kirill Smelkov authored
For Load, the demo.Storage implementation is similar to DemoStorage in ZODB/py, with fixes "cherry-picked" from:

- https://github.com/zopefoundation/ZODB/issues/318 (DemoStorage does not take whiteouts into account -> leading to data corruption)
- https://github.com/zopefoundation/ZODB/pull/323 (loadAt + fix for the above issue)

For safety, demo.Storage, contrary to DemoStorage/py, actually verifies that for demo=base+δ, δ comes strictly after base and that base remains unchanged.

The URI schema follows the XRI Cross-references approach and is

    demo:(zurl_base)/(zurl_δ)

https://en.wikipedia.org/wiki/Extensible_Resource_Identifier provides some related details and examples. For ZODB/py, a corresponding pull request for zodburi to support the demo: URI scheme has been made here: https://github.com/Pylons/zodburi/pull/29 .

Tests need:

- recent zodbtools with zodbrestore: https://lab.nexedi.com/nexedi/zodbtools/blob/129afa67/zodbtools/zodbrestore.py nexedi/zodbtools!19
- ZODB with support for DemoStorage.deleteObject: https://github.com/zopefoundation/ZODB/pull/341

On the Go side, demo storage is needed for wendelin.core 2 because ERP5 uses DemoStorage to run tests.
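Splitting a `demo:(zurl_base)/(zurl_δ)` URI needs parenthesis matching rather than a plain split on "/", because the component zurls contain slashes and may themselves be `demo:` URIs. A minimal, hypothetical sketch (`parseDemoURL`, `cutParen` are assumed names, not the demo.Storage code); it assumes that parentheses inside the component zurls are balanced.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDemoURL is a hypothetical sketch that splits
// "demo:(zurl_base)/(zurl_δ)" into its two component zurls.
func parseDemoURL(zurl string) (base, delta string, err error) {
	const prefix = "demo:"
	if !strings.HasPrefix(zurl, prefix) {
		return "", "", fmt.Errorf("%q: not a demo: URL", zurl)
	}
	base, rest, ok := cutParen(zurl[len(prefix):])
	if ok && strings.HasPrefix(rest, "/") {
		delta, rest, ok = cutParen(rest[1:])
	} else {
		ok = false
	}
	if !ok || rest != "" {
		return "", "", fmt.Errorf("%q: invalid demo: URL; want demo:(zurl_base)/(zurl_δ)", zurl)
	}
	return base, delta, nil
}

// cutParen expects s to start with "(" and returns the text up to the
// matching ")" together with whatever follows it. Nesting is handled by
// counting parentheses.
func cutParen(s string) (inner, rest string, ok bool) {
	if !strings.HasPrefix(s, "(") {
		return "", "", false
	}
	depth := 0
	for i, c := range s {
		switch c {
		case '(':
			depth++
		case ')':
			depth--
			if depth == 0 {
				return s[1:i], s[i+1:], true
			}
		}
	}
	return "", "", false
}

func main() {
	base, delta, err := parseDemoURL("demo:(file:///tmp/base.fs)/(file:///tmp/delta.fs)")
	fmt.Println(base, delta, err) // file:///tmp/base.fs file:///tmp/delta.fs <nil>
}
```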
-
Kirill Smelkov authored
This is a low-level API to open IStorageDriver instead of IStorage. Demo storage will need this. Maybe it would be a good idea to move driver-related functionality into a separate package in the future.
-
Kirill Smelkov authored
In ZODB/go, when there is no schema in zurl, Open automatically prepends file:// . However, the filename itself could contain ":", and so, generally speaking, it is incorrect to return a URL without the file:// schema prepended to the file name. Another reason to always use fully-constructed URLs with a schema is interoperability with ZODB/py: there, zodburi, when given a zurl without a schema, does not assume that it is of the file:// kind and rejects opening such URIs.
-
Kirill Smelkov authored
A URI schema is required to have ":" after it, but (even though it is frequently used in practice) not "//". We will soon introduce the "demo:" URI scheme that comes without //, so fix Open to detect schema presence just by ":" and not to fix up a "demo:..." url to "file://demo:..." automatically.
-
Kirill Smelkov authored
Before the patch, if storage.watcher failed, storage.driver.Close was not called, and so the driver would continue to send to .drvWatchq, but no one was receiving from it. a5dbb92b (go/zodb: Require drivers to close watchq on Close) provides the guarantee that the driver will stop sending on drvWatchq right after the drv.Close call.
-
Kirill Smelkov authored
Provide a guarantee that Close forces the driver to stop sending to watchq and to close it. See a5dbb92b ("go/zodb: Require drivers to close watchq on Close") for details.

Without the fix, TestWatch fails with a test timeout:

    panic: test timed out after 30s

    # Close waits for serve to stop
    goroutine 93 [semacquire]:
    sync.runtime_Semacquire(0xc000152170)
        /home/kirr/src/tools/go/go/src/runtime/sema.go:56 +0x45
    sync.(*WaitGroup).Wait(0xc000152168)
        /home/kirr/src/tools/go/go/src/sync/waitgroup.go:130 +0x65
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zLink).Close(0xc0001520f0, 0x1313, 0x1)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zrpc.go:159 +0x47
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zeo).Close(0xc000313680, 0xc000107c78, 0x1)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo.go:526 +0x2e
    lab.nexedi.com/kirr/neo/go/internal/xtesting.DrvTestWatch(0xc000082c00, 0xc0000aa2a0, 0x24, 0x6a4a38)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/internal/xtesting/xtesting.go:442 +0xdb5
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.TestWatch.func1(0xc000082c00, 0x6e3498, 0xc00009a380)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo_test.go:270 +0x99
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.withZEOSrv.func2.1(0xc0000a4168, 0x16)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo_test.go:207 +0xfb
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.withZEOSrv.func1(0xc000082c00, 0xc00009c180)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo_test.go:186 +0x129
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.withZEOSrv.func2(0xc000082c00)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo_test.go:199 +0x10e
    testing.tRunner(0xc000082c00, 0xc00009c160)
        /home/kirr/src/tools/go/go/src/testing/testing.go:1194 +0xef
    created by testing.(*T).Run
        /home/kirr/src/tools/go/go/src/testing/testing.go:1239 +0x2b3

    # serve is stuck in invalidateTransaction doing watchq<-
    goroutine 26 [chan send]:
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zeo).invalidateTransaction(0xc000313680, 0x6417e0, 0xc000323b60, 0x0, 0x0)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zeo.go:176 +0x373
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zLink).serveRecv1(0xc0001520f0, 0xc000393890, 0x0, 0x0)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zrpc.go:225 +0x4b4
    lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zLink).serveRecv(0xc0001520f0)
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zrpc.go:176 +0x8d
    created by lab.nexedi.com/kirr/neo/go/zodb/storage/zeo.(*zLink).start
        /home/kirr/src/neo/src/lab.nexedi.com/kirr/neo/go/zodb/storage/zeo/zrpc.go:99 +0xc8
-
Kirill Smelkov authored
Provide a guarantee that Close forces the driver to stop sending to watchq and to close it. See a5dbb92b ("go/zodb: Require drivers to close watchq on Close") for details.
-
Kirill Smelkov authored
If we don't require drivers to stop sending to watchq after Close, there could be many deadlock scenarios, for example:

- the client called drv.Close() and no longer listens to watchq; the driver is stuck sending to watchq, or
- the client called drv.Close(), which itself waits for tasks spawned inside the driver to complete, but those tasks are stuck sending to watchq because the client no longer does <-watchq.

The change is similar in spirit to the safety guarantee provided by the high-level Watcher, where after a DelWatch call it is guaranteed that there will be no more sends to the subscribed watchq (see c41c2907 "go/zodb: High-level watching - initial draft" for details).

The drivers do not provide the requested guarantee yet. We'll be fixing them one by one in follow-up commits.
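One way for a driver to honour this guarantee is to perform every send to watchq inside a `select` that also watches a "down" channel closed by Close, then have Close wait for the serving goroutine and only then close watchq. This is a hypothetical sketch (`driver`, `newDriver`, `serve`, `events` are assumed names), not the code of any of the actual drivers.

```go
package main

import (
	"fmt"
	"sync"
)

// driver is a hypothetical storage driver that forwards commit
// notifications to watchq from a background goroutine.
type driver struct {
	watchq  chan uint64   // notifications for the client
	down    chan struct{} // closed by Close to stop serve
	serveWG sync.WaitGroup
	once    sync.Once
}

func newDriver(events <-chan uint64) *driver {
	d := &driver{watchq: make(chan uint64), down: make(chan struct{})}
	d.serveWG.Add(1)
	go d.serve(events)
	return d
}

// serve forwards events to watchq. Every send is done in a select together
// with <-d.down, so Close can always interrupt a blocked send even if the
// client no longer receives from watchq.
func (d *driver) serve(events <-chan uint64) {
	defer d.serveWG.Done()
	for {
		select {
		case <-d.down:
			return
		case tid, ok := <-events:
			if !ok {
				return
			}
			select {
			case d.watchq <- tid:
			case <-d.down:
				return
			}
		}
	}
}

// Close guarantees that after it returns there are no more sends to
// watchq and that watchq is closed.
func (d *driver) Close() error {
	d.once.Do(func() {
		close(d.down)
		d.serveWG.Wait() // serve is no longer sending ...
		close(d.watchq)  // ... so watchq can be closed safely
	})
	return nil
}

func main() {
	events := make(chan uint64)
	d := newDriver(events)
	go func() { events <- 1; events <- 2 }()

	fmt.Println(<-d.watchq) // 1
	// The client stops listening; serve may now be blocked sending 2.
	d.Close()
	_, ok := <-d.watchq
	fmt.Println("watchq open:", ok) // watchq open: false
}
```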
-
Kirill Smelkov authored
Otherwise, on an error, it is the line number of `t.Fatal(err)` inside FatalIf that gets printed, not the line number inside the user's test.
-
- 17 Mar, 2021 4 commits
-
-
Kirill Smelkov authored
A data record with len(data)=0 and backpointer=0 is considered by FileStorage/py as "no data": https://github.com/zopefoundation/ZODB/blob/5.6.0-15-g22d1405d4/src/ZODB/FileStorage/FileStorage.py#L576-L582

Even though it is currently not possible to create such a data record via FileStorage(py).deleteObject (because it raises POSKeyError if there is no previous object revision), being able to use such data records is semantically useful in overlaid DemoStorage settings, where the δ part marks an object that exists only in base with a delete-record whiteout.

It is also generally meaningful to be able to create a "delete" record even if the object did not previously exist: "deleteObject" is actually similar to "store" (and so would be better named "storeDelete"). If one wants to store a deletion, there should be no reason to reject it, because deleteObject already has a seatbelt in the form of oldserial, and if the user calls deleteObject(oid, oldserial=z64), he/she is already saying "I know that this object does not exist in this storage (oldserial=z64), but still please create a deletion record for it". Once again, this is useful in the overlaid DemoStorage settings described above.

For reference, such whiteout deletion records pass ZODB/scripts/fstest just fine.

Even though FileStorage/py loads such data records just fine, on the FileStorage/go side this was not the case: DataHeader.LoadBackRef, even with backpointer=0, was verifying that backpointer to be valid and failing, as it saw that it might overlap with the current transaction:

    === RUN TestLoadWhiteout
    2021/03/17 06:40:58 index load: open testdata/whiteout.fs.index: no such file or directory
    2021/03/17 06:40:58 testdata/whiteout.fs: index rebuild...
    filestorage_test.go:398: load 0000000000000017:0000000000000001: bad err:
    have: testdata/whiteout.fs: load 0000000000000017:0000000000000001: testdata/whiteout.fs: data record @27: check: backpointer (0) overlaps with txn (4)
    want: testdata/whiteout.fs: load 0000000000000017:0000000000000001: 0000000000000001: object was not yet created

It was a thinko: backPos==0 was already kind of handled in LoadBackRef, but only in one verification case. -> Fix all checks not to trigger when seeing backPos=0. DataHeader.LoadBack, the caller of LoadBackRef, already handles returned backPos=0 as "no data".
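The essence of the fix is that `backPos == 0` must short-circuit *all* validity checks, because it means "no data" rather than "points to offset 0". A hypothetical sketch (`checkBackPos` is an assumed name, and the overlap rule is simplified); the 42-byte data header size is taken from FileStorage's on-disk data record header layout.

```go
package main

import "fmt"

// DataHeaderSize is the size of a FileStorage data record header.
const DataHeaderSize = 42

// checkBackPos is a hypothetical, simplified backpointer check. backPos==0
// means "no data" (e.g. a whiteout deletion record) and is accepted before
// any other check runs; any other value must point to a data record that
// lies entirely before the current transaction at txnPos.
func checkBackPos(backPos, txnPos int64) error {
	if backPos == 0 {
		return nil // "no data": skip all other checks
	}
	if backPos < 0 {
		return fmt.Errorf("backpointer (%d) is invalid", backPos)
	}
	if backPos+DataHeaderSize > txnPos {
		return fmt.Errorf("backpointer (%d) overlaps with txn (%d)", backPos, txnPos)
	}
	return nil
}

func main() {
	fmt.Println(checkBackPos(0, 4))  // <nil>
	fmt.Println(checkBackPos(4, 30)) // backpointer (4) overlaps with txn (30)
}
```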
-
Kirill Smelkov authored
We were reusing a Dumper instance between testDump subtests. This was not noticed, as there were only the "1" and "empty" cases, and "empty" has no transactions. However, in the next patch we'll add another subcase, and if the dumper instance is not reset, it will think that it starts from a non-zero transaction number, which would differ from a fresh dumper's output. -> Fix it.
-
Kirill Smelkov authored
As zodbtools dropped ZODB3 support, its run_with_zodb3py2_compat was renamed to run_with_zodb4py2_compat: nexedi/zodbtools@c59a54ca
-
Kirill Smelkov authored
Encode/Decode were deprecated and then removed in recent github.com/shamaton/msgpack.
-