- 04 Sep, 2022 1 commit
-
-
Kirill Smelkov authored
As explained by "Bug1" in https://github.com/zopefoundation/ZEO/issues/209 there is a race in between load and invalidate when ZEO5 client works wrt ZEO4 server: ---- 8< ---- ZEO5.client - contrary to ZEO4.client - does not account for simultaneous invalidations when working wrt ZEO4.server. It shows as e.g. ```console (z-dev) kirr@deca:~/src/wendelin/z/ZEO5$ ZEO4_SERVER=1 zope-testrunner -fvvvx --test-path=src -t check_race_load_vs_external ... AssertionError: T1: obj1 (24) != obj2 (23) obj1._p_serial: 0x03ea4b6fb05b52cc obj2._p_serial: 0x03ea4b6faf253855 zconn_at: 0x03ea4b6fb05b52cc # approximated as max(serials) zstor.loadBefore(obj1, @zconn.at) -> serial: 0x03ea4b6fb05b52cc next_serial: None zstor.loadBefore(obj2, @zconn.at) -> serial: 0x03ea4b6faf253855 next_serial: 0x03ea4b6fb07def66 zstor._cache.clear() zstor.loadBefore(obj1, @zconn.at) -> serial: 0x03ea4b6fb05b52cc next_serial: 0x03ea4b6fb07def66 zstor.loadBefore(obj2, @zconn.at) -> serial: 0x03ea4b6fb05b52cc next_serial: 0x03ea4b6fb07def66 ``` indicating that obj2 was provided to user from the cache that erroneously got stale. With [IO trace](kirr/ZEO@f184a1d9) showing message exchange in between client and the server, this looks as: ``` # loadBefore issued tx (('\x00\x00\x00\x00\x00\x00\x00\x02', '\x03\xeaKo\xaf%8V'), False, 'loadBefore', ('\x00\x00\x00\x00\x00\x00\x00\x02', '\x03\xeaKo\xaf%8V')) # received invalidateTransaction rx (0, 1, 'invalidateTransaction', ('\x03\xeaKo\xb0[R\xcc', ['\x00\x00\x00\x00\x00\x00\x00\x01', '\x00\x00\x00\x00\x00\x00\x00\x02'])) # received loadBefore reply but with end_tid=None !!! rx (('\x00\x00\x00\x00\x00\x00\x00\x02', '\x03\xeaKo\xaf%8V'), 0, '.reply', ('\x80\x03cZODB.tests.MinPO\nMinPO\nq\x01.\x80\x03}q\x02U\x05valueq\x03K\x17s.', '\x03\xeaKo\xaf%8U', None)) ``` which: 1) contradicts what @jimfulton [wrote about ZEO4](https://github.com/zopefoundation/ZEO/blob/master/docs/ordering.rst#zeo-4): that there _invalidations are sent in a callback called when the storage lock is held, blocking loads while committing_, and 2) points out what the bug is: Since in ZEO4 loads can be handled while a commit is in progress, ZEO4.client has special care to detect if an `invalidateTransaction` comes in between `load` request and corresponding `.reply` response, and _does not update the cache if invalidateTransaction sneaked in-between_: https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/ClientStorage.py#L367-L374 https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/ClientStorage.py#L841-L852 https://github.com/zopefoundation/ZEO/blob/47d3fbe8cbf24cad91b183483df069ef20708874/src/ZEO/ClientStorage.py#L1473-L1476 but in ZEO5.client `loadBefore` does not have anything like that https://github.com/zopefoundation/ZEO/blob/fc0729b3cc754bda02c7f54319260b5527dd42a3/src/ZEO/ClientStorage.py#L603-L608 https://github.com/zopefoundation/ZEO/blob/fc0729b3cc754bda02c7f54319260b5527dd42a3/src/ZEO/asyncio/client.py#L289-L309 and thus `invalidateTransaction` sneaked in between `loadBefore` request and corresponding `.reply` causes ZEO client cache corruption. In the original `check_race_load_vs_external_invalidate` the problem appears only sometimes, but the bug happens with ~ 100% probability with the following delay injected on the server after `loadBefore`: ```diff --- a/src/ZEO/tests/ZEO4/StorageServer.py +++ b/src/ZEO/tests/ZEO4/StorageServer.py @@ -285,7 +285,9 @@ def loadEx(self, oid): def loadBefore(self, oid, tid): self.stats.loads += 1 - return self.storage.loadBefore(oid, tid) + x = self.storage.loadBefore(oid, tid) + time.sleep(0.1) + return x def getInvalidations(self, tid): invtid, invlist = self.server.get_invalidations(self.storage_id, tid) ``` so maybe, in addition to normal test runs, it could be also good idea to run the whole ZEO testsuite against so-amended storage backend. This way it will be similar to how e.g. [races are detected](https://lab.nexedi.com/kirr/go123/blob/070bfdbb/tracing/tracetest/tracetest.go#L76-102) by my [tracetest](https://lab.nexedi.com/kirr/go123/blob/070bfdbb/tracing/tracetest/tracetest.go#L20-73) package. ---- 8< ---- -> Add suggested test to the whole ZEO testsuite. Currently it reliably catches the data curruption caused by explained load/invalidate race condition and fails as follows: (z-dev) kirr@deca:~/src/wendelin/z/ZEO5$ ZEO4_SERVER=1 zope-testrunner -fvvvc -a 100 --test-path=src ... Running .FileStorageLoadDelayedTests tests: ... Failure in test check_race_load_vs_external_invalidate (ZEO.tests.testZEO.FileStorageLoadDelayedTests) Traceback (most recent call last): File "/usr/lib/python2.7/unittest/case.py", line 329, in run testMethod() File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/racetest.py", line 253, in check_race_load_vs_external_invalidate return self._check_race_load_vs_external_invalidate(T2ObjectsInc()) File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/util.py", line 417, in _ return f(*argv, **kw) File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/tests/racetest.py", line 346, in _check_race_load_vs_external_invalidate self.fail('\n\n'.join([_ for _ in failure if _])) File "/usr/lib/python2.7/unittest/case.py", line 410, in fail raise self.failureException(msg) AssertionError: T4: obj1 (19) != obj2 (20) obj1._p_serial: 0x03eabd580073db55 obj2._p_serial: 0x03eabd580079a5dd zconn_at: 0x03eabd580079a5dd # approximated as max(serials) zstor.loadBefore(obj1, @zconn.at) -> serial: 0x03eabd580073db55 next_serial: None zstor.loadBefore(obj2, @zconn.at) -> serial: 0x03eabd580079a5dd next_serial: None zstor._cache.clear() zstor.loadBefore(obj1, @zconn.at) -> serial: 0x03eabd580079a5dd next_serial: None zstor.loadBefore(obj2, @zconn.at) -> serial: 0x03eabd580079a5dd next_serial: 0x03eabd5800aff0ee T7: obj1 (18) != obj2 (20) obj1._p_serial: 0x03eabd580071a011 obj2._p_serial: 0x03eabd580079a5dd zconn_at: 0x03eabd580079a5dd # approximated as max(serials) zstor.loadBefore(obj1, @zconn.at) -> serial: 0x03eabd580071a011 next_serial: None zstor.loadBefore(obj2, @zconn.at) -> serial: 0x03eabd580079a5dd next_serial: None zstor._cache.clear() zstor.loadBefore(obj1, @zconn.at) -> serial: 0x03eabd580079a5dd next_serial: None zstor.loadBefore(obj2, @zconn.at) -> serial: 0x03eabd580079a5dd next_serial: None Fortunately added tests do not fail for ZEO5.client-ZEO5.server. Running added tests takes more time compared to running regular tests. For example the time to run BlobAdaptedFileStorageTests at default level=1 takes 19s. To run the same BlobAdaptedFileStorageTests at level=2 (thus including check_race_external_invalidate_vs_disconnect from https://github.com/zopefoundation/ZODB/pull/368) takes 25s. And to run hereby-added FileStorageLoadDelayedTests takes 85s. That's why the test is marked as very long via level=10
-
- 02 Sep, 2022 3 commits
-
-
Kirill Smelkov authored
/reviewed-by @d-maurer /reviewed-on https://github.com/zopefoundation/ZEO/pull/210
-
Kirill Smelkov authored
Since 2016 this mode was opt-in for a long time with the default being single-threaded asyncio-based server implementation. We believe nobody actually used multi-threaded ZEO5 server mode for real at all. Multi-threaded server mode was deprecated this spring (https://github.com/zopefoundation/ZEO/pull/190), and later found to have concurrency bug that lead to data corruption (https://github.com/zopefoundation/ZEO/issues/209). Given deprecation status, instead of fixing those bugs let's finally remove the multi-threaded server mode.
-
Kirill Smelkov authored
@d-maurer notes that --all on the command line prevents using `-a X` on the tox command line, e.g. as in `tox -- -a 1` because `--all` takes precedence over `-a`: https://github.com/zopefoundation/ZEO/pull/210#pullrequestreview-1094352596 Work this around via `-a 1000` used inside tox -> then, any `-a X` passed to tox on command line will take precedence over the default `-a 1000`. Note: Dieter suggested to use `-a 0` as the default, probably because zope-testrunner documents that "level 0 runs all tests". But unfortunately it is not true because `-a 0` disables tests on any level >= 0: https://github.com/zopefoundation/zope.testrunner/blob/eab00d6f/src/zope/testrunner/find.py#L471-L473 (z-dev) kirr@deca:~/src/wendelin/z/ZEO5$ zope-testrunner -uvvx -a 0 --test-path=src Running tests at level 0 Total: 0 tests, 0 failures, 0 errors and 0 skipped in 0.000 seconds. (z-dev) kirr@deca:~/src/wendelin/z/ZEO5$ zope-testrunner -uvvx -a 1 --test-path=src Running tests at level 1 Running zope.testrunner.layer.UnitTests tests: Set up zope.testrunner.layer.UnitTests in 0.000 seconds. Running: testClientBasics (ZEO.asyncio.tests.ClientTests) test_ClientDisconnected_on_call_timeout (ZEO.asyncio.tests.ClientTests) test_bad_protocol (ZEO.asyncio.tests.ClientTests) test_bad_server_tid (ZEO.asyncio.tests.ClientTests) ...
-
- 01 Sep, 2022 1 commit
-
-
Kirill Smelkov authored
Since https://github.com/zopefoundation/ZODB/commit/a94d63c552d6 (part of https://github.com/zopefoundation/ZODB/pull/368) some ZODB tests are marked as being long to help skipping them easily during development via running tests only at level 1. But by default we want to continue to run all tests. -> Adjust tox to run tests at all levels explicitly, since zope-testrunner's default is to run tests at level 1 only.
-
- 02 Aug, 2022 1 commit
-
-
Michael Howitz authored
-
- 01 Jun, 2022 1 commit
-
-
dieter authored
-------- kirr: Those methods are not private to Protocol, because they are invoked e.g. from other modules besides ZEO.asyncio.base - e.g. from ZEO.asyncio.server . Make them public via dropping "_" prefix and naming them using more descriptive names. Extracted from https://github.com/zopefoundation/ZEO/pull/195
-
- 31 May, 2022 1 commit
-
-
dieter authored
-------- kirr: Always use appropriate logger object set up in a module. It was only ZEO/tests/ConnectionTests.py which was using logging directly even though corresponding logger object was set up in that module. Extracted from https://github.com/zopefoundation/ZEO/pull/195
-
- 30 May, 2022 1 commit
-
-
dieter authored
-------- kirr: Extract from https://github.com/zopefoundation/ZEO/pull/195 bits that add documentation to existing code without changing semantic, and fix typos. The only things I added myself with further help from @d-maurer are: - documentation for server_sync in ClientStorage; - stub documentation for credentials in ClientStorage. Even though we agree to deprecate credentials in favour of peer-to-peer TLS, removing their support should go as a separate step. For server_sync feature, that https://github.com/zopefoundation/ZEO/pull/195 currently removes, we actually do use it for correctness: https://lab.nexedi.com/nexedi/erp5/blob/eaae74a082a0/product/ERP5Type/tests/custom_zodb.py#L175-179 erp5@c663257f https://github.com/zopefoundation/ZODB/commit/9821696f584f So document it with the intent to preserve it. /reviewed-on https://github.com/zopefoundation/ZEO/pull/202
-
- 23 May, 2022 1 commit
-
-
Kirill Smelkov authored
We can use stdlib random module and manually adjust randint to behave exactly as before Python3, so that testdata are matched exactly as before. Also drop unnecessary `random.seed(42)` in cache_run. It is not needed there because random is not used in cache_run at all. Based on work of @d-maurer in https://github.com/zopefoundation/ZEO/pull/195 .
-
- 11 Apr, 2022 1 commit
-
-
Dieter Maurer authored
Reduce ZEO testing time by dropping tests with some storages
-
- 09 Apr, 2022 2 commits
- 07 Apr, 2022 7 commits
-
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
Full linting with flake8
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
Co-authored-by: Dieter Maurer <d-maurer@users.noreply.github.com>
-
Jens Vagelpohl authored
Co-authored-by: Dieter Maurer <d-maurer@users.noreply.github.com>
-
Jens Vagelpohl authored
-
- 06 Apr, 2022 3 commits
-
-
Jens Vagelpohl authored
Configure with zope.meta.config.config-package
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
Python 3.10 support
-
- 05 Apr, 2022 1 commit
-
-
Jens Vagelpohl authored
-
- 31 Mar, 2022 4 commits
-
-
Dieter Maurer authored
tpc_vote/abort + log message enhancements
-
dieter authored
-
dieter authored
-
dieter authored
-
- 30 Mar, 2022 2 commits
- 24 Mar, 2022 5 commits
-
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
Remove testing for mtacceptor and dead Python versions
-
- 19 Mar, 2022 5 commits
-
-
dieter authored
-
dieter authored
-
dieter authored
Python 3.10 distinguishes SSL contexts for server and client use - and SSL fails with the wrong context type
-
Jens Vagelpohl authored
-
Jens Vagelpohl authored
-