- 21 Mar, 2016 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 09 Mar, 2016 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 08 Mar, 2016 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 04 Mar, 2016 3 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
Before this change, a storage node did 3 commits per transaction:
  - once all data are stored
  - when locking the transaction
  - when unlocking the transaction
The last one is not important for ACID. In case of a crash, the transaction is unlocked again (verification phase). By deferring it by 1 second, we only have 2 commits per transaction during high activity, because all pending changes are merged with the commits caused by other transactions.
This change compensates for the extra commit(s) per transaction that were introduced in commit 7eb7cf1b ("Minimize the amount of work during tpc_finish").
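The effect of deferring the unlock commit can be illustrated with a small simulation. This is only a sketch under stated assumptions, not NEO's code: `DeferredCommitter`, `defer_commit`, `poll` and `simulate` are hypothetical names, and time is passed in explicitly instead of being read from a clock.

```python
class DeferredCommitter:
    """Hypothetical model of a backend that can postpone a commit."""

    def __init__(self, delay=1.0):
        self.delay = delay
        self.commit_count = 0
        self._deadline = None  # time at which a deferred commit becomes due

    def commit(self):
        # A real commit flushes everything, including deferred changes.
        self.commit_count += 1
        self._deadline = None

    def defer_commit(self, now):
        # Remember that changes are pending; keep the earliest deadline.
        if self._deadline is None:
            self._deadline = now + self.delay

    def poll(self, now):
        # Called regularly: commit only if a deferred commit is overdue.
        if self._deadline is not None and now >= self._deadline:
            self.commit()


def simulate(transactions, interval, delay=1.0):
    """Count commits for transactions arriving every `interval` seconds."""
    db = DeferredCommitter(delay)
    now = 0.0
    for _ in range(transactions):
        db.poll(now)
        db.commit()            # once all data are stored
        db.commit()            # when locking the transaction
        db.defer_commit(now)   # when unlocking the transaction
        now += interval
    db.poll(now + delay)       # activity stopped: flush what is left
    return db.commit_count
```

With transactions arriving faster than the delay, the deferred commit is absorbed by the next transaction's commits, so the model stays close to 2 commits per transaction. With slow activity, it falls back to 3.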
-
Julien Muchembled authored
-
- 02 Mar, 2016 1 commit
-
-
Julien Muchembled authored
Since commit d2d77437 ("client: make the cache tolerant to late invalidations when the entry is in the history queue"), invalidated items became current again when they were moved to the history queue, which was wrong for 2 reasons:
  - only the last items of _oid_dict values may have next_tid=None,
  - and for such items, they could be wrongly reused when caching the real current data.
-
- 01 Mar, 2016 1 commit
-
-
Julien Muchembled authored
-
- 26 Feb, 2016 4 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 05 Feb, 2016 1 commit
-
-
Julien Muchembled authored
This fixes the following scenario:
  1. the master sends invalidations to clients, and unlocks to storages (oid1, tid1)
  2. the storage receives/processes the unlock
  3. the client asks for data (oid1, tid0)
  4. the storage returns tid1 as next tid, whereas it's still None in the cache (before, it caused an assertion failure)
  6. the client processes invalidations
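The race in this scenario can be sketched with a toy cache. This is a hypothetical illustration, not the NEO client cache: `TinyCache` and its methods are invented here, and only the `_oid_dict` name and the `next_tid=None` convention are borrowed from the commit messages on this page. The point is that a store and an invalidation carrying the same next tid may arrive in either order without failing.

```python
class TinyCache:
    """Hypothetical, simplified stand-in for a client cache."""

    def __init__(self):
        # oid -> list of [tid, next_tid, data], sorted by tid
        self._oid_dict = {}

    def store(self, oid, data, tid, next_tid):
        items = self._oid_dict.setdefault(oid, [])
        for item in items:
            if item[0] == tid:
                # Already cached: the storage may know next_tid before the
                # client has processed the matching invalidation.
                if item[1] is None:
                    item[1] = next_tid
                return
        items.append([tid, next_tid, data])
        items.sort(key=lambda item: item[0])

    def invalidate(self, oid, tid):
        items = self._oid_dict.get(oid)
        # Only the last item may still be current (next_tid is None).
        if items and items[-1][1] is None and items[-1][0] < tid:
            items[-1][1] = tid

    def next_tid(self, oid, tid):
        for item in self._oid_dict.get(oid, ()):
            if item[0] == tid:
                return item[1]
        raise KeyError((oid, tid))
```

In this sketch, a late invalidation is a no-op when the storage's answer has already set the next tid.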
-
- 25 Jan, 2016 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 21 Jan, 2016 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 12 Jan, 2016 1 commit
-
-
Julien Muchembled authored
See commit c277ed20 ("client: really process all invalidations in poll thread").
-
- 16 Dec, 2015 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 13 Dec, 2015 3 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
This is a partial implementation. To truncate at a smaller tid, you must wait until data is imported up to this tid and then stop using the Importer backend.
-
Julien Muchembled authored
This backend does not support replication. Even if we implemented it, such a node could only be a source for other nodes, so we should never delete transactions.
-
- 12 Dec, 2015 1 commit
-
-
Julien Muchembled authored
-
- 11 Dec, 2015 1 commit
-
-
Julien Muchembled authored
-
- 09 Dec, 2015 1 commit
-
-
Julien Muchembled authored
This fixes a regression caused by commit eef52c27.
-
- 02 Dec, 2015 1 commit
-
-
Julien Muchembled authored
-
- 01 Dec, 2015 3 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
With the previous commit, the request to truncate the DB was not stored persistently, which means that this operation was still vulnerable to the case where the master is restarted after some nodes, but not all, have already truncated. The master didn't have the information to fix this and the result was a partially truncated DB.
-> On a Truncate packet, a storage node only stores the tid somewhere, to send it back to the master, which stays in RECOVERING state as long as any node has a value different from that of the node with the latest partition table.

We also want to make sure that there is no unfinished data, because a user may truncate at a tid higher than a locked one.
-> Truncation is now effective at the end of the VERIFYING phase, just before returning the last ids to the master.

Finally, all nodes should be truncated, so that an offline node cannot come back with a different history. Currently, this would not be an issue since replication always restarts from the beginning, but later we'd like nodes to remember where they stopped replicating.
-> If a truncation is requested, the master waits for all nodes to be pending, even if the cluster was previously started (the user can still force the cluster to start with neoctl). Any node lost during verification also causes the master to go back to recovery.

Obviously, the protocol has been changed to split the LastIDs packet and introduce a new Recovery packet, since it no longer makes sense to ask for last ids during recovery.
-
- 30 Nov, 2015 7 commits
-
-
Julien Muchembled authored
Currently, the database may only be truncated when leaving backup mode, but the issue will be the same when neoctl gets a new command to truncate at an arbitrary tid: we want to be sure that all nodes are truncated before anything else.
Therefore, we stop sending Truncate orders before stopping operation because nodes could fail/exit before actually processing them. Truncation must also happen before asking nodes their last ids.
With this commit, if a truncation is requested:
  - this is always the first thing done when a storage node connects to the primary master during the RECOVERING phase,
  - and the cluster does not start automatically if there are missing nodes, unless an admin forces it.
Other changes:
  - Connections to storage nodes don't need to be aborted anymore when leaving backup mode.
  - The master always initiates communication when a storage node identifies, which simplifies code and reduces the number of exchanged packets.
-
Julien Muchembled authored
At some point, the master asks a storage node for its partition table. If this node is lost before it answers, another node (or the same one if it comes back) must be asked. Before this change, the master node had to be restarted.
-
Julien Muchembled authored
The important bugfix is to update the last oid when the master verifies a transaction with new oids.
By resetting the transaction manager at the beginning of the recovery phase, it becomes possible to avoid tid/oid holes:
  - by reallocating previously unused allocated oids
  - when going back "in the past", i.e. reverting to an older version of the database (with fewer oids) and/or adjusting the clock
-
Julien Muchembled authored
This fixes several cases where the partition table could become corrupt and the whole cluster could get stuck in VERIFYING state. This also reduces the probability of having out-of-date cells when restarting several storage nodes simultaneously.
Finally, if a master node becomes primary again, a cluster must not be started automatically if nodes with readable cells are missing, in order to avoid a split of the database. This could happen if this master node was previously forced to start it.
-
Julien Muchembled authored
NEO did not ensure that all data and metadata are written on disk before tpc_finish, and it was for example vulnerable to ENOSPC errors. In other words, some work had to be moved to tpc_vote:
  - In tpc_vote, all involved storage nodes are now asked to write all metadata to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid column of ttrans and tobj now contains NULL and the ttid respectively.
  - In tpc_finish, AskLockInformation is still required for read locking, ttrans.tid is updated with the final value and this change is _committed_.
  - The verification phase is greatly simplified, more reliable and faster. For all voted transactions, we can know if a tpc_finish was started by getting the final tid from the ttid, either from ttrans or from trans. And we know that such transactions can't be partial, so we don't need to check oids.
So in addition to minimizing the risk of failures during tpc_finish, we also fix a bug causing the verification phase to discard transactions with objects for which readCurrent was called.
On the performance side:
  - Although tpc_vote now asks all involved storages, instead of only those storing the transaction metadata, the client has been improved to do this in parallel. The additional commits are also all done in parallel.
  - A possible improvement to compensate for the additional commits is to delay the commit done by the unlock.
  - By minimizing the time to lock transactions, objects are read-locked for a much shorter period. This is all the more important because locked transactions must be unlocked in the same order.
Transactions with too many modified objects will now time out inside tpc_vote instead of tpc_finish. Of course, such transactions may still cause other transactions to time out in tpc_finish.
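The vote/finish split and the simplified verification can be sketched with an in-memory SQLite database. This is a minimal illustration under assumptions, not NEO's schema or code: the tables are reduced to the ttid and tid columns (tobj and all other columns are left out), and `make_db`, `vote`, `lock`, `unlock` and `final_tid` are hypothetical helpers.

```python
import sqlite3


def make_db():
    # Assumed, heavily simplified schema: only the columns discussed above.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ttrans (ttid INTEGER PRIMARY KEY, tid INTEGER)")
    db.execute("CREATE TABLE trans (tid INTEGER PRIMARY KEY, ttid INTEGER)")
    return db


def vote(db, ttid):
    # tpc_vote: metadata is written and committed; the final tid is unknown.
    db.execute("INSERT INTO ttrans (ttid, tid) VALUES (?, NULL)", (ttid,))
    db.commit()


def lock(db, ttid, tid):
    # tpc_finish: record the final tid and commit this change.
    db.execute("UPDATE ttrans SET tid = ? WHERE ttid = ?", (tid, ttid))
    db.commit()


def unlock(db, ttid):
    # Move the transaction from the temporary table to the permanent one.
    row = db.execute(
        "SELECT tid FROM ttrans WHERE ttid = ?", (ttid,)).fetchone()
    db.execute("INSERT INTO trans (tid, ttid) VALUES (?, ?)", (row[0], ttid))
    db.execute("DELETE FROM ttrans WHERE ttid = ?", (ttid,))
    db.commit()


def final_tid(db, ttid):
    # Verification: a tpc_finish was started if a final tid is recorded
    # for this ttid, either in ttrans or in trans.
    for table in ("ttrans", "trans"):
        row = db.execute(
            "SELECT tid FROM %s WHERE ttid = ?" % table, (ttid,)).fetchone()
        if row is not None and row[0] is not None:
            return row[0]
    return None
```

In this sketch, a transaction that was only voted yields no final tid, while one that was locked yields the same tid both before and after the unlock.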
-
Julien Muchembled authored
-
Julien Muchembled authored
This fixes a regression in commit 83fe64bf when ttrans has several rows referring to the same data_id.
-