- 02 Apr, 2021 7 commits
-
-
Julien Muchembled authored
- When undoing the current record, fix:
  - crash of storage nodes that don't have the undo data (non-readable cells);
  - conflict resolution.
- Fix undo deduplication in replication when NEO deduplication is disabled.
- client: minor fixes in undo() regarding concurrent storage disconnections and PT updates.
-
Julien Muchembled authored
undone_data_tid can't be equal to a TTID.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 22 Mar, 2021 1 commit
-
-
Julien Muchembled authored
-
- 04 Mar, 2021 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 15 Jan, 2021 2 commits
-
-
Julien Muchembled authored
The purpose of suppress_ragged_eofs=False was to micro-optimize the normal case, i.e. when there's no EOF. But commit 061cd5d8 showed that this option only turns a ragged EOF into an exception. It may be easier for alternate NEO implementations to close the SSL connection properly. Or the performance benefit was not worth the risk of freezing a NEO process.
-
Kirill Smelkov authored
Testing the NEO/go client against a NEO/py server revealed a bug in NEO/py SSL handling: a proper, non-ragged EOF from a peer is ignored, which leads to a hang in an infinite loop inside _SSL.receive, with read_buf memory growing indefinitely. Details are below:

NEO/py wraps raw sockets with ssl.wrap_socket(suppress_ragged_eofs=False), which instructs the SSL layer to convert an unexpected EOF while receiving a TLS record into an SSLEOFError exception. However, when the remote peer properly closes its side of the connection, socket.read() still returns b'' to report a regular, non-ragged EOF:

https://github.com/python/cpython/blob/v2.7.18/Lib/ssl.py#L630-L650

The code was handling SSLEOFError but not a b'' return from socket recv. Thus, after the NEO/go client disconnected and properly closed its side of the connection, the code started to loop indefinitely in _SSL.receive under `while 1`, with the b'' returned by self.socket.recv() appended to read_buf again and again.

-> Fix it by detecting non-ragged EOF as well and, similarly to how SSLEOFError is handled, converting it into self._error('recv', None).

See merge request nexedi/neoppod!17
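The fix can be illustrated with a minimal receive loop that treats both kinds of EOF as a closed connection. This is a sketch, not NEO's actual `_SSL.receive`: the `ConnectionClosed` exception is a hypothetical stand-in for `self._error('recv', None)`, and `FakeSocket` is a hypothetical test double that replays `recv()` results.

```python
import ssl


class ConnectionClosed(Exception):
    """Hypothetical stand-in for NEO's self._error('recv', None)."""


def receive(sock, read_buf, size=4096):
    """Append everything currently readable from sock to read_buf.

    Both kinds of EOF end the loop:
    - ssl.SSLEOFError: ragged EOF (raised because suppress_ragged_eofs=False);
    - b'': the peer properly closed its side (non-ragged EOF).
    """
    while True:
        try:
            data = sock.recv(size)
        except ssl.SSLWantReadError:
            return  # nothing more to read for now
        except ssl.SSLEOFError:
            raise ConnectionClosed('ragged EOF')
        if not data:
            # Without this check, b'' would be appended forever (the bug).
            raise ConnectionClosed('EOF')
        read_buf.append(data)


class FakeSocket:
    """Hypothetical test double replaying recv() results or exceptions."""

    def __init__(self, results):
        self.results = list(results)

    def recv(self, size):
        result = self.results.pop(0)
        if isinstance(result, Exception):
            raise result
        return result


buf = []
try:
    receive(FakeSocket([b'ab', b'cd', b'']), buf)
except ConnectionClosed as exc:
    print(exc, buf)  # EOF [b'ab', b'cd']
```

Handling b'' next to SSLEOFError keeps a single error path for both ways a peer can go away.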
-
- 11 Jan, 2021 4 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
The scenario that was described in comments was meaningless because S1 never goes out-of-date.
-
- 02 Oct, 2020 1 commit
-
-
Julien Muchembled authored
For the master, the purpose of -m/--masters is to specify the addresses of other master nodes, since its own address is already known via -b/--bind. Therefore, an empty value for -m/--masters is valid. The user remains free to repeat the -b value in -m. More generally, a node may choose to only specify master addresses via -D/--dynamic-master-list, so the check that at least one master address is specified is moved to where the NodeManager is expected to be initialized.
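The check described above can be sketched as a function that merges all sources of master addresses before validating them. The function name `master_addresses` and the `host:port` string format are hypothetical; this is not NEO's NodeManager code, only an illustration of deferring the check until every source is known.

```python
def master_addresses(masters, dynamic_masters=(), bind=None):
    """Return the set of known master addresses for a node (hypothetical).

    masters: addresses given with -m/--masters (may be empty for a master).
    dynamic_masters: addresses loaded from the -D/--dynamic-master-list file.
    bind: for a master node only, its own -b/--bind address.
    """
    addresses = set(masters) | set(dynamic_masters)
    if bind is not None:
        addresses.add(bind)  # a master already knows its own address
    if not addresses:
        # The check happens here, once all sources are known,
        # rather than when parsing -m/--masters.
        raise ValueError('at least one master address is required')
    return addresses
```

Using a set means repeating the `-b` value in `-m` is harmless.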
-
- 29 Sep, 2020 1 commit
-
-
Julien Muchembled authored
-
- 25 Sep, 2020 4 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
The time complexity of the previous algorithm was too high. With several tens of concurrent transactions, we saw commits take minutes to complete and the whole application looked frozen. This new algorithm is much simpler. Instead of asking the oldest transaction to restart, in a way (we used the term "rebase" because the concept was similar to what git-rebase does), the storage gives it priority and the newest one is asked to relock (this request is ignored if the vote already happened, which means there was actually no deadlock). testLocklessWriteDuringConflictResolution was initially more complex because Transaction.written (client) ignored KeyError (which is no longer the case since commit 8ef1ddba).
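The priority rule above can be sketched with a toy lock table. This is a simplified model under stated assumptions, not NEO's storage code: `Txn`, `Storage`, `ask_relock` and the `to_relock` list are hypothetical, a smaller `ttid` stands for an older transaction, and the relock request is modeled as a direct call instead of a network message.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Toy transaction: a smaller ttid means an older transaction."""
    ttid: int
    voted: bool = False
    to_relock: list = field(default_factory=list)  # oids it must lock again


class Storage:
    """Toy storage node tracking which transaction holds each write lock."""

    def __init__(self):
        self.owner = {}  # oid -> Txn holding the write lock

    def lock(self, txn, oid):
        """Try to lock oid for txn: True on success, False if txn must wait."""
        holder = self.owner.get(oid)
        if holder is None or holder is txn:
            self.owner[oid] = txn
            return True
        if txn.ttid < holder.ttid and ask_relock(holder, oid):
            # Possible deadlock: the older transaction gets priority.
            self.owner[oid] = txn
            return True
        return False  # txn waits until holder releases oid


def ask_relock(txn, oid):
    """Ask the newer transaction to give up oid and lock it again later.

    The request is ignored once txn has voted: there was no deadlock after
    all, so the older transaction simply keeps waiting.
    """
    if txn.voted:
        return False
    txn.to_relock.append(oid)
    return True
```

For example, if a newer transaction holds `'oid1'` and an older one requests it, the older one takes the lock and the newer one finds `'oid1'` in its `to_relock` list; if the newer one had already voted, the older one would wait instead. The design choice is that only one side ever yields, so no transaction has to be rebased.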
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 10 Sep, 2020 3 commits
-
-
Julien Muchembled authored
This is all the more important for RocksDB, which wants to keep all transaction work in RAM. Once, we had to truncate 40% of a 1 TB MyRocks DB with 24 partitions, 4 being processed in parallel. Even when committing between partitions, MariaDB used up to 200 GB of RAM. Without these commits, 1 TB of RAM would not have been enough.
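The idea of committing between partitions can be sketched as follows. Here `sqlite3` merely stands in for MariaDB/MyRocks so the example is self-contained, and the `obj` table with its `part`, `oid` and `tid` columns is a hypothetical schema, not NEO's.

```python
import sqlite3


def truncate(conn, tid, num_partitions):
    """Delete all object records newer than tid, one partition at a time.

    Committing after each partition bounds how much uncommitted work the
    database must keep (in RAM, for an engine like RocksDB).
    """
    for part in range(num_partitions):
        conn.execute("DELETE FROM obj WHERE part = ? AND tid > ?",
                     (part, tid))
        conn.commit()


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE obj (part INTEGER, oid INTEGER, tid INTEGER)')
conn.executemany('INSERT INTO obj VALUES (?, ?, ?)',
                 [(p, o, t) for p in range(2) for o in range(3)
                  for t in range(1, 11)])
conn.commit()
truncate(conn, 6, 2)
```

A single DELETE over all partitions would be simpler, but the whole operation would then be one transaction, which is what exhausts RAM with RocksDB.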
-
Julien Muchembled authored
-
Julien Muchembled authored
The default value is quickly exceeded when truncating a DB. Obviously, you may need a lot of RAM.
-
- 04 Sep, 2020 1 commit
-
-
Julien Muchembled authored
-
- 21 Aug, 2020 1 commit
-
-
Julien Muchembled authored
Resetting a storage node could mark all TEST log entries as being emitted by this storage node. For example:

    16:18:12.9114 S2 #0x0007 AskStoreObject > S1 (...)
-
- 25 Jun, 2020 1 commit
-
-
Julien Muchembled authored
-
- 24 Jun, 2020 1 commit
-
-
Julien Muchembled authored
-
- 12 Jun, 2020 1 commit
-
-
Julien Muchembled authored
======================================================================
FAIL: check_tid_ordering_w_commit (neo.tests.zodb.testBasic.BasicTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "ZODB/tests/BasicStorage.py", line 397, in check_tid_ordering_w_commit
    self.assertEqual(results.pop('lastTransaction'), tids[1])
  File "neo/tests/__init__.py", line 301, in assertEqual
    return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
failureException: '\x03\xd8\x85H\xbffp\xbb' != '\x03\xd8\x85H\xbfs\x0b\xdd'
-
- 11 Jun, 2020 1 commit
-
-
Julien Muchembled authored
This requires ZODB >= 5.6.0
-
- 29 May, 2020 3 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 18 May, 2020 1 commit
-
-
Julien Muchembled authored
This fixes a bug where, with email notification only, monitoring stopped checking whether backup clusters were lagging once the status was unchanged since the last check (regarding lagging, what is compared is the set of lagging backups), until another event woke monitoring up. The code is also simplified in that, for the moment, there's no need for a different timeout between the normal case and an SMTP failure.
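The corrected behaviour can be sketched as a toy loop in which the next check is always scheduled, whatever the outcome of the previous one. This is not NEO's admin code: `monitor`, `get_lagging_backups`, `send_email` and `CHECK_INTERVAL` are hypothetical, and `max_checks`/`sleep` exist only to make the loop testable.

```python
import time

CHECK_INTERVAL = 60  # hypothetical: seconds between checks


def monitor(get_lagging_backups, send_email, max_checks=None,
            sleep=time.sleep):
    """Periodically report changes in the set of lagging backup clusters.

    The next check is always scheduled, whether or not anything changed,
    and the same interval is used after an SMTP failure.
    """
    last = None  # set of lagging backups last reported by email
    checks = 0
    while max_checks is None or checks < max_checks:
        lagging = frozenset(get_lagging_backups())
        if lagging != last:
            try:
                send_email(sorted(lagging))
                last = lagging
            except OSError:  # smtplib.SMTPException derives from OSError
                pass  # not reported: retried at the next check
        checks += 1
        sleep(CHECK_INTERVAL)
```

Since `last` is only updated after a successful send, an SMTP failure is retried at the next regular check instead of needing its own timeout.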
-
- 20 Mar, 2020 1 commit
-
-
Vincent Pelletier authored
-
- 16 Mar, 2020 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 14 Feb, 2020 1 commit
-
-
Julien Muchembled authored
When concurrent transactions fail with different storage nodes (e.g. network issues only between C1-S2 and C2-S1), in such a way that either transaction could be committed but not both (otherwise the cluster would become non-operational), and if the first transaction is aborted (between tpc_vote and tpc_finish), then the second one wrongly failed with INCOMPLETE_TRANSACTION. And if both transactions could be committed (e.g. with more than 1 replica), some nodes would be disconnected for nothing.
-
- 21 Jan, 2020 1 commit
-
-
Julien Muchembled authored
This fixes:

    Traceback (most recent call last):
      ...
      File "neo/admin/handler.py", line 200, in answerLastTransaction
        app.maybeNotify(name)
      File "neo/admin/app.py", line 380, in maybeNotify
        self._notify(False)
      File "neo/admin/app.py", line 302, in _notify
        body += '', name, ' ' + backup.formatSummary(upstream)[1]
      File "neo/admin/app.py", line 74, in formatSummary
        tid = self.backup_tid if backup else self.ltid
    AttributeError: 'Backup' object has no attribute 'backup_tid'
-