Commits · 3e3eab5bed678ab76639026099823e9a948e0c3c · Carlos Ramos Carreño / neoppod

30 Nov, 2015 10 commits

Perform DB truncation during recovery, send PT to storages before verification · 3e3eab5b

Julien Muchembled authored Nov 25, 2015

Currently, the database may only be truncated when leaving backup mode, but
the issue will be the same when neoctl gets a new command to truncate at an
arbitrary tid: we want to be sure that all nodes are truncated before anything
else.

Therefore, we stop sending Truncate orders before stopping operation because
nodes could fail/exit before actually processing them. Truncation must also
happen before asking nodes their last ids.

With this commit, if a truncation is requested:
- this is always the first thing done when a storage node connects to the
  primary master during the RECOVERING phase,
- and the cluster does not start automatically if there are missing nodes,
  unless an admin forces it.

Other changes:
- Connections to storage nodes don't need to be aborted anymore when leaving
  backup mode.
- The master always initiates communication when a storage node identifies,
  which simplifies code and reduces the number of exchanged packets.
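
The truncation rules described in this commit can be sketched as follows. This is only an illustration of the ordering and of the start condition: the function names and the packet names in the tuples are hypothetical and are not taken from the NEO code base.

```python
def requests_on_identification(truncate_tid):
    """Ordered requests sent to a storage node that has just identified.

    Hypothetical model: a pending truncation always comes first, so that
    the node reports its last ids only after it has been truncated.
    """
    requests = []
    if truncate_tid is not None:
        requests.append(("Truncate", truncate_tid))
    requests.append(("AskLastIDs",))
    return requests


def may_start_automatically(truncate_tid, missing_nodes, forced=False):
    """Model only the rule stated above, ignoring every other start condition."""
    if forced:
        # An admin explicitly forces the cluster to start.
        return True
    return not (truncate_tid is not None and missing_nodes)
```

With a pending truncation and at least one missing node, `may_start_automatically` returns False unless `forced` is set.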

3e3eab5b

master: fix possible blockage during recovery after a storage disconnection · 2485f151

Julien Muchembled authored Nov 19, 2015

At some point, the master asks a storage node for its partition table. If this
node is lost before answering, another node (or the same one if it comes
back) must be asked.

Before this change, the master node had to be restarted.
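
A minimal sketch of the retry behaviour described above, assuming a simplified master that only tracks which storage node it is waiting on. The class and method names are hypothetical and do not come from NEO.

```python
class PartitionTableRequest(object):
    """Track which storage node has been asked for its partition table."""

    def __init__(self):
        self.candidates = []  # identified storage nodes, in arrival order
        self.asked = None     # node we are currently waiting on, if any

    def node_identified(self, node):
        self.candidates.append(node)
        self._ask_if_needed()

    def node_lost(self, node):
        if node in self.candidates:
            self.candidates.remove(node)
        if node == self.asked:
            # The pending answer will never arrive: ask someone else,
            # or wait for a node to (re)connect.
            self.asked = None
            self._ask_if_needed()

    def _ask_if_needed(self):
        if self.asked is None and self.candidates:
            self.asked = self.candidates[0]
```

If the asked node disappears and no other node is available, `asked` stays `None` until a node identifies again, which models the "same one if it comes back" case.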

2485f151

master: last tid/oid after recovery/verification · dec81519

Julien Muchembled authored Nov 20, 2015

The important bugfix is to update the last oid when the master verifies a
transaction with new oids.

By resetting the transaction manager at the beginning of the recovery phase,
it becomes possible to avoid tid/oid holes:
- by reallocating previously unused allocated oids
- when going back "in the past", i.e. reverting to an older version of the
  database (with fewer oids) and/or adjusting the clock
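
The idea can be illustrated with a toy oid allocator. This is a sketch under the assumption that the last oid is recomputed after a reset from the oids actually found in the database; the class and its methods are hypothetical, not NEO's transaction manager.

```python
class OidAllocator(object):
    """Toy allocator handing out consecutive integer oids."""

    def __init__(self):
        self.last_oid = 0

    def reset(self):
        # Done at the beginning of recovery: forget what was allocated.
        self.last_oid = 0

    def seen(self, oid):
        # Called for oids found in the database, including new oids of a
        # transaction that is being verified.
        if oid > self.last_oid:
            self.last_oid = oid

    def allocate(self, count):
        first = self.last_oid + 1
        self.last_oid += count
        return list(range(first, first + count))
```

For example, if oids 1 to 5 were allocated but only oid 2 was ever committed, then after `reset()` and `seen(2)` the next allocation starts at 3 instead of 6.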

dec81519

Go back/stay in RECOVERING state when the partition table can't be operational · e1f9a7da

Julien Muchembled authored Nov 25, 2015

This fixes several cases where the partition table could become corrupt and
the whole cluster could get stuck in VERIFYING state.

This also reduces the probability of having out-of-date cells when restarting
several storage nodes simultaneously.

Finally, if a master node becomes primary again, a cluster must not be started
automatically if nodes with readable cells are missing, in order to avoid
a split of the database. This could happen if this master node was previously
forced to start it.

e1f9a7da

Minimize the amount of work during tpc_finish · 7eb7cf1b

Julien Muchembled authored Nov 25, 2015

NEO did not ensure that all data and metadata were written to disk before
tpc_finish, which made it vulnerable to ENOSPC errors, for example.
To fix this, some work had to be moved to tpc_vote:

- In tpc_vote, all involved storage nodes are now asked to write all metadata
  to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
  column of ttrans and tobj now contains NULL and the ttid respectively.

- In tpc_finish, AskLockInformation is still required for read locking,
  ttrans.tid is updated with the final value and this change is _committed_.

- The verification phase is greatly simplified, more reliable and faster. For
  all voted transactions, we can know if a tpc_finish was started by getting
  the final tid from the ttid, either from ttrans or from trans. And we know
  that such transactions can't be partial so we don't need to check oids.

So in addition to minimizing the risk of failures during tpc_finish, we also
fix a bug causing the verification phase to discard transactions with objects
for which readCurrent was called.
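
The split between tpc_vote and tpc_finish can be sketched with an in-memory SQLite database. This is a simplified illustration, not NEO's schema: the two-column tables, the function names and the `unlock` step that moves a row from `ttrans` to `trans` are assumptions made for the example.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    # Hypothetical, heavily reduced tables: real ones hold much more.
    db.execute("CREATE TABLE ttrans (ttid INTEGER PRIMARY KEY, tid INTEGER)")
    db.execute("CREATE TABLE trans (tid INTEGER PRIMARY KEY, ttid INTEGER)")
    return db


def vote(db, ttid):
    # tpc_vote: metadata is written and committed; final tid still unknown.
    db.execute("INSERT INTO ttrans (ttid, tid) VALUES (?, NULL)", (ttid,))
    db.commit()


def lock(db, ttid, tid):
    # tpc_finish: only the final tid is recorded, then committed.
    db.execute("UPDATE ttrans SET tid = ? WHERE ttid = ?", (tid, ttid))
    db.commit()


def unlock(db, ttid):
    # Assumed final step: move the row to the permanent table.
    db.execute("INSERT INTO trans (tid, ttid) "
               "SELECT tid, ttid FROM ttrans WHERE ttid = ?", (ttid,))
    db.execute("DELETE FROM ttrans WHERE ttid = ?", (ttid,))
    db.commit()


def final_tid(db, ttid):
    """Verification: was tpc_finish started for this voted transaction?"""
    for table in ("ttrans", "trans"):
        row = db.execute("SELECT tid FROM %s WHERE ttid = ?" % table,
                         (ttid,)).fetchone()
        if row is not None and row[0] is not None:
            return row[0]
    return None
```

A voted transaction for which `final_tid` returns `None` never reached tpc_finish, so under this model verification needs no per-object check.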

On the performance side:

- Although tpc_vote now asks all involved storages, instead of only those
  storing the transaction metadata, the client has been improved to do this
  in parallel. The additional commits are also all done in parallel.

- A possible improvement to compensate for the additional commits is to delay
  the commit done by the unlock.

- By minimizing the time to lock transactions, objects are read-locked for a
  much shorter period. This is all the more important because locked
  transactions must be unlocked in the same order.

Transactions with too many modified objects will now time out inside tpc_vote
instead of tpc_finish. Of course, such transactions may still cause other
transactions to time out in tpc_finish.

7eb7cf1b

Do not send useless node information to bootstrapping node · 99ac542c
Julien Muchembled authored Nov 23, 2015

99ac542c
fixup! storage: fix pruning of data when deleting partial transactions during verification · cff279af
Julien Muchembled authored Nov 30, 2015
```
This fixes a regression in commit 83fe64bf
when ttrans has several rows referring to the same data_id.
```
cff279af
threaded: prevent neoctl from looping forever when something went wrong during the test · a63bf12f
Julien Muchembled authored Nov 26, 2015

a63bf12f
ssl: fix handshaking connections being stuck when they're aborted · fe487c07
Julien Muchembled authored Nov 27, 2015

fe487c07

ssl: consider connections completed after the handshake · aaefaf8b

Julien Muchembled authored Nov 27, 2015

- Server connections can now be in 'connecting' state.
- connectionAccepted event (which has never been used so far) is merged into
  connectionCompleted.

aaefaf8b

25 Nov, 2015 13 commits
- storage: always restart replication of outdated cells from the beginning (ZERO_TID) · 6b1f198f
  Julien Muchembled authored Nov 25, 2015
```
This is a workaround to fix holes if replication is interrupted after new data
is committed.
```
  6b1f198f
- threaded: fix typo · 949f7e0f
  Julien Muchembled authored Nov 25, 2015
  
  949f7e0f
- Ignore but log exceptions while closing a connection for which an assertion failed · 34a2fea3
  Julien Muchembled authored Nov 24, 2015
```
An AssertionError is certainly more severe than any other exception
(including OperationFailure) because the process is in an unknown state.
```
  34a2fea3
- threaded: make it possible to send packets from a connection filter · 50134569
  Julien Muchembled authored Nov 24, 2015
```
This could have been useful in testStorageFailureDuringTpcFinish:
close() could not be called from answerTransactionFinished because it
deadlocked while trying to send notifications.
```
  50134569
- tests: clarify intention in testStorageFailureDuringTpcFinish · c5913373
  Julien Muchembled authored Nov 24, 2015
```
The test was relying on the fact that 'c.abort()' caused an assertion
failure, which closed the connection and then raised OperationFailure.
Actually, I wanted to close the connection on master, but it's clearer this way.
```
  c5913373
- TODO: review election timeouts and transaction aborting on client disconnection · 20b7cecd
  Julien Muchembled authored Nov 20, 2015
  
  20b7cecd
- Small optimizations & cleanups · 79ea07c8
  Julien Muchembled authored Nov 19, 2015
  
  79ea07c8
- Fix 2 'except' statements that will bug when moving to Python 3 · 0d36de7b
  Julien Muchembled authored Nov 19, 2015
```
Previous code relied on the fact that the exception target is kept past
the end of the except clause. 2to3 is not smart enough to detect that.

Without this change, a different OperationalError exception would be
ignored because there's already a local variable of the same name.
```
  0d36de7b
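
The Python 3 behaviour behind this fix can be reproduced in a few lines. This is a generic illustration, not the code changed by the commit, and both function names are made up for the example.

```python
def target_survives_except_clause():
    """Return True if 'e' is still bound after its except clause."""
    try:
        raise ValueError("boom")
    except ValueError as e:
        pass
    try:
        e  # Python 3 deletes 'e' when the except clause ends
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
    return True


def last_failure(callables):
    """Portable pattern: copy the exception to another name."""
    last = None
    for call in callables:
        try:
            call()
        except ValueError as e:
            last = e  # 'last' outlives the except clause, 'e' does not
    return last
```

Under Python 3, `target_survives_except_clause()` returns False, whereas Python 2 keeps the name bound. Copying the exception to a separate variable, as in `last_failure`, behaves the same way on both versions.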
- mysql: drop 'bigdata' table when erasing the database · b0023b43
  Julien Muchembled authored Nov 19, 2015
```
This was forgotten when this table was introduced in
commit f9a8500d
```
  b0023b43
- threaded: new method to sort storage nodes · 9d24294a
  Julien Muchembled authored Nov 13, 2015
```
If needed, sortStorageList can be extended in the
future to support a 'readable' parameter.
```
  9d24294a
- threaded: expose a method to stop an A/M/S node · 93f5b0d8
  Julien Muchembled authored Nov 13, 2015
  
  93f5b0d8
- neolog: new --node option to filter logs produced by threaded tests · e57b1bdd
  Julien Muchembled authored Nov 12, 2015
  
  e57b1bdd
- master: simplify code in verification by removing useless checks · 259539e5
  Julien Muchembled authored Nov 09, 2015
```
We can never receive several answers from the same node.

testVerification is dropped for the same reason as for testEvent and most of
testConnection, since there are many incoming changes for verification.
```
  259539e5
03 Nov, 2015 2 commits
- storage: fix pruning of data when deleting partial transactions during verification · 83fe64bf
  Julien Muchembled authored Nov 02, 2015
  
  83fe64bf
- master: fix 2 bugs in verification phase · daa83cb4
  Julien Muchembled authored Nov 02, 2015
```
- Last known TID was not updated when recovering a transaction.
- Missing OIDs were ignored, which caused partial transactions to be committed
  instead of being deleted.
```
  daa83cb4
29 Oct, 2015 2 commits
- BUGS: mark whether bugs concern basic features of ZODB or promised features of NEO · 524463e8
  Julien Muchembled authored Oct 27, 2015
  
  524463e8
- TODO: safer tpc_finish and faster storage · 63324838
  Julien Muchembled authored Oct 27, 2015
  
  63324838
26 Oct, 2015 2 commits
- Release version 1.5.1 · 6275f7c6
  Julien Muchembled authored Oct 26, 2015
  
  6275f7c6
- storage: faster resumption when many transactions have already been imported to MySQL · 7469e55b
  Julien Muchembled authored Oct 26, 2015
```
The previous SQL query caused a full table scan of the 'trans' table at startup.
```
  7469e55b
21 Oct, 2015 3 commits

tests: regenerate patch to ZODB3 using git · 0f0700a8
Julien Muchembled authored Oct 21, 2015
```
I used git-diff for each file and concatenated the result to preserve the order.
```
0f0700a8
client: add assertion in cache to detect wrong invalidation · badd9de3
Julien Muchembled authored Oct 21, 2015

badd9de3

Do not send invalidations for objects on which only readCurrent was called · 9682722b

Julien Muchembled authored Oct 20, 2015

This fixes invalid next_serial entries in cache,
and the following error for values not in cache:

  Traceback (most recent call last):
    File "ZODB/Connection.py", line 856, in setstate
      self._setstate(obj)
    File "ZODB/Connection.py", line 894, in _setstate
      self._load_before_or_conflict(obj)
    File "ZODB/Connection.py", line 922, in _load_before_or_conflict
      if not self._setstate_noncurrent(obj):
    File "ZODB/Connection.py", line 945, in _setstate_noncurrent
      assert end is not None
  AssertionError

9682722b

20 Oct, 2015 1 commit
- importer: fix crash when aborting transaction · b3522b1b
  Julien Muchembled authored Oct 20, 2015
  
  b3522b1b
19 Oct, 2015 4 commits
- mysql: use fewer queries to fill (t)obj when storing a transaction · 68401e70
  Julien Muchembled authored Oct 19, 2015
  
  68401e70
- mysql: refuse to start if max_allowed_packet is too small · b70b4689
  Julien Muchembled authored Oct 19, 2015
  
  b70b4689
- Do not flood logs when a client node sends a big packet in threaded tests · f4a66782
  Julien Muchembled authored Oct 19, 2015
```
When run with MySQL, testBasicStore (neo.tests.threaded.test.Test) was slow
and generated log exceeded 29MB.
```
  f4a66782
- Importer: fix retrieval of an object from ZODB when next serial is in NEO · c9658ff3
  Julien Muchembled authored Oct 19, 2015
  
  c9658ff3
16 Oct, 2015 1 commit
- storage: speed up checking of replicas · f1bc3c32
  Julien Muchembled authored Oct 16, 2015
```
This increases the number of rows to check per AskCheck*Range packets.
```
  f1bc3c32
13 Oct, 2015 1 commit

storage: partially fix a potential crash during replication · 7af9d2d3

Julien Muchembled authored Oct 12, 2015

And document 3 bugs found by running testBackupNodeLost many times. About the
tic() issue, I had a case where the test exited instead of looping forever after
the storage crash.

7af9d2d3

12 Oct, 2015 1 commit
- storage: fix crash when a corruption is found while checking TIDs · 6da59ae8
  Julien Muchembled authored Oct 12, 2015
  
  6da59ae8