- 11 May, 2017 1 commit
Julien Muchembled authored
The next line (MTClientConnection) already logs new connections and the storage node is necessarily in RUNNING state.
-
- 10 May, 2017 2 commits
Julien Muchembled authored
Now, the primary master is the running master with None displayed in the last column. Before, it could be the id timestamp of when it was secondary, which was obsolete information.
-
Julien Muchembled authored
This fixes up commit 23b6a66a, which reimplements election.

    poll raised, retrying
    Traceback (most recent call last):
      ...
      File "neo/client/handlers/master.py", line 41, in notPrimaryMaster
        super(PrimaryNotificationsHandler, self).notPrimaryMaster(*args)
      File "neo/lib/handler.py", line 157, in notPrimaryMaster
        assert primary != self.app.server
      File "neo/client/app.py", line 109, in __getattr__
        return self.__getattribute__(attr)
    AttributeError: 'Application' object has no attribute 'server'
-
- 04 May, 2017 1 commit
Julien Muchembled authored
-
- 02 May, 2017 1 commit
Julien Muchembled authored
This fixes the following crash:

    Traceback (most recent call last):
      ...
      File "neo/master/handlers/identification.py", line 94, in requestIdentification
        uuid = app.getNewUUID(uuid, address, node_type)
      File "neo/master/app.py", line 449, in getNewUUID
        assert uuid != self.uuid
    AssertionError
-
- 28 Apr, 2017 3 commits
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
This really fixes the bug described in commit 40bac312, which could probably be reverted: that commit only reduced the probability of failure.

What happened is that the second conflict on 'a' for t3 was first reported by an answer to the first store, with:
- a base serial at which a=0
- a conflict serial at which a=7

However, the cached data is no longer 8 but 12, since a second store already occurred after the first conflict (reported by the other storage node). When this conflict was resolved before receiving the conflict for the second store, it gave:

    resolve(old=0, saved=7, new=12) -> 19

instead of:

    resolve(old=4, saved=7, new=12) -> 15

(If we still had the data of the first store, we could also do resolve(old=0, saved=7, new=8), but that would be inefficient from a memory point of view.)

The bug was difficult to reproduce: testNotifyReplicated had to be run many, many times before race conditions triggered it. The test was changed to enforce some of them, and the above scenario now happens almost always.
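The numbers above are consistent with an additive counter whose conflict resolution returns `saved + new - old` (the rule used, for instance, by `BTrees.Length`). Assuming the test's values behave like such a counter, this minimal sketch reproduces the three resolutions quoted; `resolve` here is an illustrative function, not NEO's API:

```python
def resolve(old, saved, new):
    """Resolve a conflict on an additive counter (assumed semantics).

    Both the committed change (saved - old) and our own change (new - old)
    are applied on top of the common base `old`.
    """
    return saved + new - old


# Wrong base: the cached data (12) already includes the second store,
# but it is combined with the base of the first conflict (a=0).
wrong = resolve(old=0, saved=7, new=12)
# Base matching the cached data.
right = resolve(old=4, saved=7, new=12)
# Equivalent result, but requires keeping the data of the first store.
alt = resolve(old=0, saved=7, new=8)
print(wrong, right, alt)  # 19 15 15
```

The sketch shows why the base serial must match the data being resolved: using a=0 with data that already contains a later store counts that store's increment twice.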
-
- 27 Apr, 2017 7 commits
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
- atomic write to disk to avoid corruption
- update when the address changes (not only when a node is removed/added)
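The log does not say which file is written or how. A common way to get an atomic write is to write a temporary file in the same directory, flush it to disk, and rename it over the target. The following is a minimal sketch of that technique, assuming Python 3's `os.replace`; `atomic_write` is a hypothetical helper name:

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that os.replace()
    # is a rename within one filesystem (atomic on POSIX).
    fd, tmp = tempfile.mkstemp(dir=directory, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data reaches the disk before the rename
        os.replace(tmp, path)
    except BaseException:
        # On any failure, leave the previous file untouched and clean up.
        os.unlink(tmp)
        raise
```

A crash at any point leaves either the complete old file or the complete new one, never a truncated mix.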
-
Julien Muchembled authored
-
Julien Muchembled authored
This fixes 2 issues:
- Because neoctl connects to admin nodes without requesting identification, the protocol version was not checked, which could even be dangerous (think of a user asking for information, but the packet sent by neoctl could be decoded as a packet to alter data, like Truncate).
- In case of mismatched protocol version, the error was not logged on the node that initiated the connection.

Compatibility is handled as follows:
- For an old node receiving data from a new node, the 2 high-order bytes of the packet id, which is always 0 for the first packet, are decoded as the packet code. Packet 0 has never existed, which results in PacketMalformedError.
- For a new node receiving data from an old node, the id of the first packet, which is always 0, is decoded as the version, which results in a version mismatch error.

This new protocol also guarantees that there's no conflict with SSL. For simplification, the packet length no longer counts the header.
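The compatibility rules above are consistent with a stream that starts with a 4-byte version, followed by packets whose header is packet id (4 bytes), code (2 bytes) and body length (4 bytes). That layout is inferred from the description, not taken from NEO's source, and `VERSION`, `encode_stream` and `decode_stream` are hypothetical names. The sketch also shows what an old node would read as the packet code:

```python
import struct

VERSION = 4                       # hypothetical protocol version number
VERSION_FMT = struct.Struct('!L')
HEADER = struct.Struct('!LHL')    # packet id, packet code, body length


class ProtocolError(Exception):
    pass


def encode_stream(packets):
    """Encode (id, code, body) packets, prefixed once by the version."""
    out = [VERSION_FMT.pack(VERSION)]
    for msg_id, code, body in packets:
        # The length field does not count the header itself.
        out.append(HEADER.pack(msg_id, code, len(body)) + body)
    return b''.join(out)


def decode_stream(data):
    """Check the version first, then split the rest into packets."""
    if len(data) < VERSION_FMT.size:
        raise ProtocolError('truncated version')
    (version,) = VERSION_FMT.unpack_from(data)
    if version != VERSION:
        raise ProtocolError('protocol version mismatch: %d' % version)
    offset = VERSION_FMT.size
    packets = []
    while offset < len(data):
        if len(data) - offset < HEADER.size:
            raise ProtocolError('truncated header')
        msg_id, code, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        body = data[offset:offset + length]
        if len(body) != length:
            raise ProtocolError('truncated packet')
        packets.append((msg_id, code, body))
        offset += length
    return packets


stream = encode_stream([(0, 1, b'hello')])
# An old node parses the header at offset 0: its "code" field covers the
# 2 high-order bytes of the first packet id, which are always 0.
print(HEADER.unpack_from(stream)[1])  # 0
```

Conversely, a stream from an old node starts with packet id 0, which `decode_stream` reads as version 0 and rejects with a mismatch error.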
-
- 25 Apr, 2017 4 commits
Julien Muchembled authored
When using network byte order ('!'), struct uses standard sizes, so the size of each item is independent of the platform. These sizes have never changed from one version of Python to another.
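For instance, with the `!` prefix the standard `struct` module uses fixed sizes and no alignment padding, so sizes like these can safely be hardcoded:

```python
import struct

# '!' means network (big-endian) byte order with standard sizes and
# no padding, so these values do not depend on the platform.
assert struct.calcsize('!B') == 1
assert struct.calcsize('!H') == 2
assert struct.calcsize('!L') == 4
assert struct.calcsize('!Q') == 8
assert struct.calcsize('!LHL') == 10  # 4 + 2 + 4, no alignment padding
print(struct.calcsize('!LHL'))  # 10
```

Without a prefix (native mode, '@'), sizes and alignment follow the C compiler of the platform, so the same format string may have a different size elsewhere.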
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 24 Apr, 2017 6 commits
Julien Muchembled authored
The election is not a separate process anymore. It happens during the RECOVERING phase, and there's no use of timeouts anymore. Each master node keeps a timestamp of when it started to play the primary role, and the node with the smallest timestamp is elected. The election stops when the cluster is started: as long as it is operational, the primary master can't be deposed. An election must happen whenever the cluster is not operational anymore, to handle the case of a network cut between a primary master and all other nodes: then another master node (secondary) takes over and when the initial primary master is back, it loses against the new primary master if the cluster is already started.
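The election rule above can be sketched as follows. `Master`, `id_timestamp` and `elect` are hypothetical names, and breaking ties by uuid is an assumption of this sketch, not something the commit states:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Master:
    uuid: int
    running: bool
    # When this node started to play the primary role (None if never).
    id_timestamp: Optional[float]


def elect(masters, cluster_started, current_primary=None):
    """Return the primary master according to the smallest timestamp."""
    if cluster_started and current_primary is not None:
        # While the cluster is operational, the primary can't be deposed.
        return current_primary
    candidates = [m for m in masters
                  if m.running and m.id_timestamp is not None]
    if not candidates:
        return None
    # Smallest timestamp wins; uuid is an assumed tie-breaker.
    return min(candidates, key=lambda m: (m.id_timestamp, m.uuid))
```

This is why an old primary coming back after a network cut loses: once the cluster is started, the new primary is kept regardless of timestamps.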
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
In order to do that correctly, this commit contains several other changes:

When connecting to a primary master, a full node list always follows the identification. For storage nodes, this means that they now know all nodes during the RECOVERING phase.

The initial full node list now always contains a node tuple for:
- the server-side node (i.e. the primary master): on a master, this is done by always having a node describing itself in its node manager.
- the client-side node, to make sure it gets an id timestamp: now an admin node also receives a node for itself.
-
Julien Muchembled authored
This keeps the connection fully functional when a handler raises an exception.
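The commit does not show how this is done. A typical pattern is to reset per-connection state in a `finally` clause around the handler call, so that an exception does not leave the connection stuck. A minimal sketch with a hypothetical `Connection` class:

```python
class Connection:
    """Hypothetical connection that runs one handler at a time."""

    def __init__(self, handler):
        self.handler = handler
        self._handling = False  # True while a handler is running

    def dispatch(self, packet):
        if self._handling:
            raise RuntimeError('connection is busy')
        self._handling = True
        try:
            self.handler(self, packet)
        finally:
            # Reset even if the handler raised: otherwise every later
            # packet would hit the 'busy' error above.
            self._handling = False
```

The exception still propagates to the caller, but the next packet is processed normally.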
-
- 19 Apr, 2017 2 commits
Julien Muchembled authored
Commits like 7eb7cf1b ("Minimize the amount of work during tpc_finish") dropped what was done in commit 07b48079 ("Ignore some requests, based on connection state") to protect request handlers when they respond. This commit fixes this in a generic way.
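One generic way to protect request handlers is to make the answering primitive itself check the connection state, instead of testing it in every handler. The sketch below assumes that approach; `Connection`, `answer` and `sent` are hypothetical names:

```python
class Connection:
    """Hypothetical connection; answer() is what request handlers call."""

    def __init__(self):
        self.closed = False
        self.sent = []

    def close(self):
        self.closed = True

    def answer(self, packet):
        # A handler may respond after the peer went away (e.g. at the end
        # of a long operation): drop the answer instead of failing.
        if self.closed:
            return False
        self.sent.append(packet)
        return True
```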
-
Julien Muchembled authored
-
- 18 Apr, 2017 4 commits
Julien Muchembled authored
The initial intention was to rely on stable sorting when several events have the same key. For this to happen, sorting must not continue the comparison with the second item of events. Doing so could lead to data corruption (conflict resolution with wrong base):

    FAIL: testNotifyReplicated (neo.tests.threaded.test.Test)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "neo/tests/threaded/__init__.py", line 1093, in wrapper
        return wrapped(self, cluster, *args, **kw)
      File "neo/tests/threaded/test.py", line 2019, in testNotifyReplicated
        self.assertEqual([15, 11, 13, 16], [r[x].value for x in 'abcd'])
      File "neo/tests/__init__.py", line 187, in assertEqual
        return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
    failureException: Lists differ: [15, 11, 13, 16] != [19, 11, 13, 16]

    First differing element 0:
    15
    19

    - [15, 11, 13, 16]
    ?   ^
    + [19, 11, 13, 16]
    ?   ^
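The difference is easy to see with Python's built-in sorting: comparing whole tuples falls through to the second item when keys are equal, while sorting on the key alone keeps equal-key events in insertion order because the sort is stable. The event values here are made up for illustration:

```python
from operator import itemgetter

events = [(1, 'b: registered first'), (1, 'a: registered second'), (0, 'c')]

# Whole-tuple comparison: equal keys are ordered by the second item.
print(sorted(events))
# [(0, 'c'), (1, 'a: registered second'), (1, 'b: registered first')]

# Key-only comparison: stable, so equal keys keep insertion order.
print(sorted(events, key=itemgetter(0)))
# [(0, 'c'), (1, 'b: registered first'), (1, 'a: registered second')]
```

With non-comparable second items (such as callbacks), Python 3 would even raise TypeError on the whole-tuple comparison.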
-
Julien Muchembled authored
-
Julien Muchembled authored
'aborted' could appear twice.
-
Julien Muchembled authored
-
- 13 Apr, 2017 1 commit
Julien Muchembled authored
-
- 04 Apr, 2017 1 commit
Kirill Smelkov authored
zodburi [1] provides a way to open ZODB storages by URL/URI. It already supports the file://, zeo://, zconfig://, memory:// and other schemes out of the box, and storages not bundled with ZODB can add support for their own schemes by providing a zodburi.resolvers entry point. For example, relstorage [2] and newtdb [3] do this.

Let's also teach NEO to open itself via the neo:// URI scheme.

[1] http://docs.pylonsproject.org/projects/zodburi
[2] https://github.com/zodb/relstorage/blob/2.1a1-15-g68c8cf1/relstorage/zodburi_resolver.py
[3] https://github.com/newtdb/db/blob/0.5.2-1-gbd36e90/src/newt/db/zodburi.py
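A zodburi resolver is a callable that takes the URI and returns a `(storage_factory, dbkw)` pair; it is registered under the `zodburi.resolvers` entry point group. The sketch below is hypothetical: the URI layout `neo://<cluster>@<master>[,<master>...]?<options>`, the module path in `ENTRY_POINTS`, and the keywords passed to NEO's client `Storage` are all assumptions, not the commit's actual implementation:

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical setup(entry_points=...) value; the module path is assumed.
ENTRY_POINTS = {
    'zodburi.resolvers': ['neo = neo.client.zodburi:resolve_uri'],
}


def parse_neo_uri(uri):
    """Split an assumed neo://<cluster>@<master>[,<master>...]?<opts> URI."""
    parts = urlsplit(uri)
    if parts.scheme != 'neo':
        raise ValueError('not a neo:// URI: %r' % uri)
    name, sep, masters = parts.netloc.rpartition('@')
    if not sep or not name or not masters:
        raise ValueError('expected neo://<cluster>@<masters>: %r' % uri)
    return name, masters.split(','), dict(parse_qsl(parts.query))


def resolve_uri(uri):
    """zodburi resolver: return (storage_factory, dbkw)."""
    name, masters, options = parse_neo_uri(uri)

    def factory():
        # Imported lazily; the keyword names and the space-separated
        # master list are assumptions about NEO's client Storage.
        from neo.client.Storage import Storage
        return Storage(master_nodes=' '.join(masters), name=name, **options)

    return factory, {}
```

With such an entry point installed, `zodburi.resolve_uri('neo://...')` would dispatch to this resolver based on the scheme.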
-
- 31 Mar, 2017 7 commits
Julien Muchembled authored
Commit 58d0b602 didn't fix the issue completely. Storage space can be freed with the --repair option. This adds an expectedFailure test.
-
Julien Muchembled authored
This is a follow-up of commit 64afd7d2, which focused on read accesses when there is no transaction activity. This commit also includes a test to check a simpler scenario than the one described in the previous commit.
-
Julien Muchembled authored
-
Julien Muchembled authored
Commit ad43dcd3 should have bumped it as well.
-
Julien Muchembled authored
Unused but it is likely to be useful in the future.
-
Julien Muchembled authored
The bug could lead to data corruption (if a partition is wrongly marked as UP_TO_DATE) or crashes (assertion failure on either the storage or the master).

The protocol is extended to handle the following scenario:

    S                                           M
    partition 0 outdated <--
    UnfinishedTransactions ------>
    replication of partition 0 ...
    partition 1 outdated
    --- UnfinishedTransactions ...
    ...
    replication finished
    --- ReplicationDone ...
                                                tweak
    <-- partition 1 discarded --------
                                                tweak
    <-- partition 1 outdated ---------
    ... UnfinishedTransactions -->
    ... ReplicationDone --------->

The master can't simply mark all outdated cells as being updatable when it receives an UnfinishedTransactions packet.
-
Julien Muchembled authored
-