Commits · 7025db52513639f881e5996c8a87850cdc4c3fa5 · Kirill Smelkov / neo

15 Sep, 2015 1 commit

Rewrite of scheduler for threaded tests · 7025db52

Julien Muchembled authored Sep 03, 2015

The previous implementation was built around a 'pending' global variable that
was set by a few monkey-patches when some network activity was pending between
nodes. All this is replaced by an extra epoll object that is used to wait for nodes
that have pending network events: this is simpler, and faster since it
significantly reduces the number of context switches.

7025db52
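
The idea of an extra epoll object that waits for nodes with pending events can be sketched as follows. This is only an illustration, not the code of the commit: `Node`, `pending_nodes` and the socket pairs standing in for network connections are hypothetical. It relies on the Linux behaviour that an epoll file descriptor is itself readable when it has events ready, so it requires Linux.

```python
import select
import socket


class Node:
    """Hypothetical stand-in for a test node owning its own epoll object."""

    def __init__(self, name):
        self.name = name
        self.epoll = select.epoll()
        # A socket pair plays the role of a network connection to this node.
        self.local, self.remote = socket.socketpair()
        self.epoll.register(self.local.fileno(), select.EPOLLIN)

    def close(self):
        self.epoll.close()
        self.local.close()
        self.remote.close()


def pending_nodes(scheduler, nodes_by_fd, timeout=0):
    """Return the names of nodes whose epoll object has events ready."""
    return sorted(nodes_by_fd[fd].name for fd, _ in scheduler.poll(timeout))


nodes = [Node("master"), Node("storage"), Node("client")]
scheduler = select.epoll()  # the extra epoll object
nodes_by_fd = {}
for node in nodes:
    # Register each node's epoll fd, not its sockets.
    scheduler.register(node.epoll.fileno(), select.EPOLLIN)
    nodes_by_fd[node.epoll.fileno()] = node

idle = pending_nodes(scheduler, nodes_by_fd)   # nothing sent yet
nodes[1].remote.send(b"ping")                  # traffic for "storage"
busy = pending_nodes(scheduler, nodes_by_fd)

scheduler.close()
for node in nodes:
    node.close()
```

A single poll on the extra object tells the scheduler which nodes have work, without touching any shared flag.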

14 Sep, 2015 1 commit
- Thread.isAlive is deprecated · 61009341
  Julien Muchembled authored Sep 14, 2015
  
  61009341
07 Sep, 2015 1 commit

Fix potential deadlock when connecting to primary master · af06676a

Julien Muchembled authored Sep 07, 2015

This is a regression caused by commit eef52c27
("Tickless poll loop, for lowest latency and cpu usage"), affecting:
- admins
- storages
- primary masters of backup clusters

af06676a

28 Aug, 2015 6 commits

client: drop now useless wrapper to log safely in poll thread during shutdown · 9531c9cb
Julien Muchembled authored Aug 28, 2015
```
Recent Python already catches exceptions due to garbage collection on exit.
```
9531c9cb

storage: fix history() not waiting for oid to be unlocked · e27358d1

Julien Muchembled authored Aug 28, 2015

This fixes a random failure in testClientReconnection:

Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 754, in testClientReconnection
    self.assertTrue(cluster.client.history(x1._p_oid))
failureException: None is not true

e27358d1

Fix random failure in testRecycledClientUUID · 79be7787

Julien Muchembled authored Aug 28, 2015

Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 838, in testRecycledClientUUID
    x = client.load(ZERO_TID)
  [...]
  File "neo/tests/threaded/test.py", line 822, in notReady
    m2s.remove(delayNotifyInformation)
  File "neo/tests/threaded/__init__.py", line 482, in remove
    del self.filter_dict[filter]
KeyError: <function delayNotifyInformation at 0x7f511063a578>

79be7787

Fix several random failures in tests that didn't wait for transaction to be unlocked · c4ac45a8

Julien Muchembled authored Aug 28, 2015

NEOCluster.tic() gets a new 'slave' parameter that must be True when a client
node is in 'master' mode (i.e. setPoll(True)). In this case, tic() will wait
until all nodes finish their work and the client polls with a non-zero timeout.

Here, tic(slave=1) is used to wait for the storage to process
NotifyUnlockInformation notification from the master.

Traceback (most recent call last):
  File "neo/tests/threaded/test.py", line 80, in testBasicStore
    self.assertEqual(data_info, cluster.storage.getDataLockInfo())
  File "neo/tests/__init__.py", line 170, in assertEqual
    return super(NeoTestBase, self).assertEqual(first, second, msg=msg)
failureException: {('\x0b\xee\xc7\xb5\xea?\x0f\xdb\xc9]\r\xd4\x7f<[\xc2u\xda\x8a3', 0): 0} != {('\x0b\xee\xc7\xb5\xea?\x0f\xdb\xc9]\r\xd4\x7f<[\xc2u\xda\x8a3', 0): 1}

c4ac45a8

Several improvements to verbose locks · 5dc1f06c

Julien Muchembled authored Aug 28, 2015

All these changes were useful to debug deadlocks in threaded tests:
- New verbose Semaphore.
- Logs with numerical 'ident' were too annoying to read, so revert to the
  thread name (as before commit 5b69d553), with an exception for threaded
  tests. There remains one case where the result is not unique: when several
  client apps are instantiated.
- Make deadlock detection optional.
- Make it possible to name locks.
- Make output more compact.
- Remove useless 'debug_lock' option.
- Add timing information.
- Make exception more verbose when an un-acquired lock is released.

Here is how I used 'locking':

--- a/neo/tests/threaded/__init__.py
+++ b/neo/tests/threaded/__init__.py
@@ -37,0 +38 @@
+from neo.lib.locking import VerboseSemaphore
@@ -71 +72,2 @@ def init(cls):
-        cls._global_lock = threading.Semaphore(0)
+        cls._global_lock = VerboseSemaphore(0, check_owner=False,
+                                            name="Serialized._global_lock")
@@ -265 +267,2 @@ def start(self):
-        self.em._lock = l = threading.Semaphore(0)
+        self.em._lock = l = VerboseSemaphore(0, check_owner=False,
+                                             name=self.node_name)
@@ -346 +349,2 @@ def __init__(self, master_nodes, name, **kw):
-        self.em._lock = threading.Semaphore(0)
+        self.em._lock = VerboseSemaphore(0, check_owner=False,
+                                         name=repr(self))

5dc1f06c
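
A few of the features listed above (naming, timing information, a more verbose exception when an un-acquired lock is released) can be illustrated with a small wrapper. This is a minimal sketch and not the neo.lib.locking implementation: `VerboseLockSketch`, its `log` parameter and the message format are all hypothetical, and it wraps a plain lock rather than a semaphore.

```python
import threading
import time


class VerboseLockSketch:
    """Hypothetical named lock that logs each operation with its duration."""

    def __init__(self, name, log=print):
        self.name = name
        self._lock = threading.Lock()
        self._log = log

    def _report(self, action, started):
        self._log("%s: %s in %s after %.3fs" % (
            self.name, action, threading.current_thread().name,
            time.monotonic() - started))

    def acquire(self, blocking=True):
        started = time.monotonic()
        acquired = self._lock.acquire(blocking)
        self._report("acquired" if acquired else "not acquired", started)
        return acquired

    def release(self):
        started = time.monotonic()
        try:
            self._lock.release()
        except RuntimeError:
            # The standard error carries no context: add the lock and thread names.
            raise RuntimeError("%s: released in %s but it was not acquired" % (
                self.name, threading.current_thread().name))
        self._report("released", started)

    def __enter__(self):
        return self.acquire()

    def __exit__(self, *exc_info):
        self.release()


messages = []
lock = VerboseLockSketch("demo_lock", log=messages.append)
with lock:
    second_try = lock.acquire(blocking=False)  # already held, so it fails
try:
    lock.release()  # not held any more
except RuntimeError as e:
    error = str(e)
```

Passing `log=messages.append` keeps the output in a list, which is convenient when the log lines of several threads must be inspected after a hang.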

Fix occasional deadlocks in threaded tests · 0b93b1fb

Julien Muchembled authored Aug 28, 2015

Deadlocks mainly happened while stopping a cluster, hence the complete review
of NEOCluster.stop().

A major change is to make the client node handle its lock like other nodes
(i.e. in the polling thread itself) to better know when to call
Serialized.background() (there was a race condition with the test of
'self.poll_thread.isAlive()' in ClientApplication.close).

0b93b1fb

14 Aug, 2015 2 commits

Remove useless assert in a private method of MTClientConnection · 1ab594b4
Julien Muchembled authored Aug 12, 2015

1ab594b4

Do not reconnect too quickly to a node after an error · d898a83d

Julien Muchembled authored Aug 09, 2015

For example, a backup storage node that was rejected because the upstream
cluster was not ready could reconnect in a loop without delay, using 100% CPU
and flooding logs.

A new 'setReconnectionNoDelay' method on Connection can be used for cases where
it's legitimate to quickly reconnect.

With this new delayed reconnection, it's possible to remove the remaining
time.sleep().

d898a83d
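
The behaviour described above, a minimum delay between connection attempts with an opt-out for legitimate quick reconnections, might look like the sketch below. Only the name `setReconnectionNoDelay` comes from the commit message; `ReconnectThrottle`, `set_no_delay`, `remaining`, `attempt` and the 1-second default are assumptions made for illustration.

```python
import time


class ReconnectThrottle:
    """Hypothetical helper enforcing a minimum delay between attempts."""

    def __init__(self, min_delay=1.0, clock=time.monotonic):
        self.min_delay = min_delay
        self._clock = clock
        self._next_allowed = float("-inf")  # first attempt is always allowed
        self._no_delay = False

    def set_no_delay(self):
        """Allow the next attempt immediately."""
        self._no_delay = True

    def remaining(self):
        """Seconds to wait before the next attempt is allowed."""
        if self._no_delay:
            return 0.0
        return max(0.0, self._next_allowed - self._clock())

    def attempt(self):
        """Record an attempt; return False if it comes too early."""
        if self.remaining() > 0:
            return False
        self._no_delay = False
        self._next_allowed = self._clock() + self.min_delay
        return True


# A fake clock makes the example deterministic.
now = [100.0]
throttle = ReconnectThrottle(min_delay=1.0, clock=lambda: now[0])
first = throttle.attempt()        # allowed
too_soon = throttle.attempt()     # refused: no time has passed
wait = throttle.remaining()       # time left before the next attempt
now[0] += 1.0
after_delay = throttle.attempt()  # allowed again
throttle.set_no_delay()
immediate = throttle.attempt()    # allowed despite the recent attempt
```

Because `remaining()` returns a duration instead of sleeping, an event loop can use it as a poll timeout, which is one way to avoid a blocking time.sleep().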

12 Aug, 2015 16 commits
- Remove useless testEvent · 71e30fb9
  Julien Muchembled authored Aug 12, 2015
```
This kind of test has never helped to detect regressions and any bug in
EpollEventManager would be quickly reported by other tests.

testConnection may go the same way if it keeps annoying me too much.
```
  71e30fb9
- client: do not wait for the remote to close the connection if it's not ready · f9df31be
  Julien Muchembled authored Aug 10, 2015
```
This is currently not an issue because the 'time.sleep(1)' in iterateForObject
(storage) and _connectToPrimaryNode (master) leave enough time. What could
happen is a new connection attempt for a node that already has a connection
(causing an assertion failure in Node.setConnection).
```
  f9df31be
- Fix invalid processing of unregistered connections · a4731a0c
  Julien Muchembled authored Aug 09, 2015
```
This could happen if a file descriptor was reallocated by the kernel.
```
  a4731a0c
- Simplify API to establish connections and accept mix of IPv4/IPv6 · ed50edca
  Julien Muchembled authored Aug 08, 2015
  
  ed50edca
- Rename parameter of polling methods now that _poll computes the timeout itself · c2c97752
  Julien Muchembled authored Aug 12, 2015
  
  c2c97752
- Tickless poll loop, for lowest latency and cpu usage · eef52c27
  Julien Muchembled authored Aug 02, 2015
```
With this patch, the epoll object is not awoken every second to check
if a timeout has expired. The API of Connection is changed to get the smallest
timeout.
```
  eef52c27
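
For the tickless poll loop (eef52c27), the core computation is to ask every connection for its deadline and poll with the smallest remaining time. A minimal sketch, assuming hypothetical names (`TimedConnection`, `next_poll_timeout`) and absolute deadlines; the real Connection API is not shown in the commit message:

```python
class TimedConnection:
    """Hypothetical connection that only knows its own deadline."""

    def __init__(self, name, deadline=None):
        self.name = name
        self.deadline = deadline  # absolute time, or None if nothing is pending


def next_poll_timeout(connections, now):
    """Return seconds until the earliest deadline, or None to block."""
    deadlines = [c.deadline for c in connections if c.deadline is not None]
    if not deadlines:
        return None  # no timeout pending: wait for I/O only
    # A deadline already in the past means "do not block at all".
    return max(0.0, min(deadlines) - now)


conns = [
    TimedConnection("a", deadline=105.0),
    TimedConnection("b"),
    TimedConnection("c", deadline=102.5),
]
timeout = next_poll_timeout(conns, now=100.0)
no_timeout = next_poll_timeout([TimedConnection("idle")], now=100.0)
expired = next_poll_timeout([TimedConnection("late", 99.0)], now=100.0)
```

The returned value would be passed as the poll timeout, so the loop sleeps exactly until the next event or deadline instead of waking up every second.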
- tests: make Patch usable as a context manager · fd0b9c98
  Julien Muchembled authored Aug 05, 2015
  
  fd0b9c98
- Add file descriptor and aborted flag to __repr__ of connections · 91c66356
  Julien Muchembled authored Aug 02, 2015
  
  91c66356
- client: replace Event by a pipe as a way to stop the poll loop · cb8a5a88
  Julien Muchembled authored Jul 25, 2015
```
This is a prerequisite for tickless poll loops.
```
  cb8a5a88
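
Why a pipe helps for cb8a5a88: an Event can only be checked between two polls, so the loop must wake up periodically, whereas a pipe is a file descriptor that wakes up a poll blocked without timeout. The sketch below shows the technique with the standard `selectors` module for portability (the project uses epoll); all names are hypothetical.

```python
import os
import selectors
import threading

read_fd, write_fd = os.pipe()
selector = selectors.DefaultSelector()
selector.register(read_fd, selectors.EVENT_READ)
stopped = threading.Event()


def poll_loop():
    """Block in select() with no timeout until the pipe becomes readable."""
    while True:
        for key, _ in selector.select():
            if key.fd == read_fd:
                os.read(read_fd, 1)  # consume the wake-up byte
                stopped.set()
                return
            # Real connections would be handled here.


thread = threading.Thread(target=poll_loop)
thread.start()
os.write(write_fd, b"\0")  # ask the loop to stop
thread.join(5)
stopped_cleanly = stopped.is_set() and not thread.is_alive()

selector.close()
os.close(read_fd)
os.close(write_fd)
```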
- Fix 100% CPU usage when the closure of a connection is delayed · 4a328ade
  Julien Muchembled authored Aug 01, 2015
  
  4a328ade
- client: review connection locking (MTClientConnection) · 4e739de4
  Julien Muchembled authored Jul 27, 2015
```
This mainly changes several methods to lock automatically instead of asserting
that the caller did it. This removes any overhead for non-MT classes, and
the use of 'with' instead of lock/unlock methods also simplifies the API.
```
  4e739de4
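
The approach of 4e739de4, methods that lock by themselves in the multi-threaded class only, can be sketched like this. `Connection`, `MTConnection` and `send` are simplified, hypothetical stand-ins and not the actual NEO classes.

```python
import threading


class Connection:
    """Hypothetical single-threaded connection: no locking overhead."""

    def __init__(self):
        self.queue = []

    def send(self, packet):
        self.queue.append(packet)
        return len(self.queue)


class MTConnection(Connection):
    """Hypothetical thread-safe variant: methods take the lock themselves."""

    def __init__(self):
        super().__init__()
        self.lock = threading.RLock()

    def send(self, packet):
        # 'with' releases the lock even if the base method raises.
        with self.lock:
            return super().send(packet)


conn = MTConnection()


def worker():
    for i in range(100):
        conn.send(i)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = len(conn.queue)
```

Callers no longer need to remember to lock, and the base class pays nothing for thread safety it does not need.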
- client: a simple lock is enough for the connection pool · e438f864
  Julien Muchembled authored Aug 10, 2015
  
  e438f864
- Remove useless socket shutdown on close · c319b065
  Julien Muchembled authored Jul 24, 2015
```
shutdown is implicit because we don't duplicate sockets.
```
  c319b065
- Small optimizations & cleanups · 19745e7c
  Julien Muchembled authored Jul 24, 2015
  
  19745e7c
- Better output of verbose locks · 5b69d553
  Julien Muchembled authored Jul 28, 2015
```
- For all threads except the main one, the id is displayed instead of the name,
  because the latter is not always unique.
- Output from concurrent threads may be interleaved, so tracebacks are also
  prefixed by their idents.
```
  5b69d553
- Fix verbose locks when acquiring without blocking · ede173f8
  Julien Muchembled authored Jul 28, 2015
  
  ede173f8
28 Jul, 2015 1 commit
- Add a neo/debug.py example to display tracebacks of threads · 52ed5aab
  Julien Muchembled authored Jul 28, 2015
  
  52ed5aab
13 Jul, 2015 2 commits
- Release version 1.4 · f4e656f6
  Julien Muchembled authored Jul 13, 2015
  
  f4e656f6
- Better handling of NotReady error · 167ad36b
  Julien Muchembled authored Jul 10, 2015
  
  167ad36b
10 Jul, 2015 1 commit
- Some documentation cleanup · 8ec87379
  Julien Muchembled authored Jul 10, 2015
  
  8ec87379
09 Jul, 2015 1 commit
- client: fix misleading exception message in case of checksum mismatch · 197054be
  Julien Muchembled authored Jul 09, 2015
  
  197054be
03 Jul, 2015 3 commits
- Fix neo/debug.py example for clients · 9e026d08
  Julien Muchembled authored Jul 03, 2015
  
  9e026d08
- client: prevent RTMIN+3 from connecting to master if not connected yet · e03a836a
  Julien Muchembled authored Jul 03, 2015
  
  e03a836a
- client: fix "signal only works in main thread" when adding a ZODB Mount Point to NEO · c324955d
  Julien Muchembled authored Jul 03, 2015
  
  c324955d
01 Jul, 2015 1 commit
- Update changelog · 79fca358
  Julien Muchembled authored Jul 01, 2015
  
  79fca358
30 Jun, 2015 2 commits
- Add upgrade notes about MySQL/SQLite schema changes since NEO 1.3 · 02a5b4e3
  Julien Muchembled authored Jun 30, 2015
  
  02a5b4e3
- master: new option to automatically start a new cluster · 58774fb6
  Julien Muchembled authored Jun 29, 2015
  
  58774fb6
29 Jun, 2015 2 commits
- master: simplify recovery loop · 5a76664a
  Julien Muchembled authored Jun 29, 2015
  
  5a76664a
- Add support for IPython >= 1, ignore older versions · b19bf40e
  Julien Muchembled authored Jun 29, 2015
  
  b19bf40e