- 20 Dec, 2016 1 commit
-
-
Julien Muchembled authored
-
- 06 Dec, 2016 2 commits
-
-
Julien Muchembled authored
A backup master crashed with the following traceback after a reconnection: Traceback (most recent call last): File "neo/master/app.py", line 127, in run self._run() File "neo/master/app.py", line 147, in _run self.playPrimaryRole() File "neo/master/app.py", line 348, in playPrimaryRole self.backup_app.provideService()) File "neo/master/backup_app.py", line 123, in provideService poll(1) File "neo/lib/event.py", line 126, in poll to_process.process() File "neo/lib/connection.py", line 500, in process self._handlers.handle(self, self._queue.pop(0)) File "neo/lib/connection.py", line 110, in handle self._handle(connection, packet) File "neo/lib/connection.py", line 125, in _handle handler.packetReceived(connection, packet) File "neo/lib/handler.py", line 117, in packetReceived self.dispatch(*args) File "neo/lib/handler.py", line 66, in dispatch method(conn, *args, **kw) File "neo/master/handlers/backup.py", line 52, in invalidateObjects app.invalidatePartitions(tid, partition_set) File "neo/master/backup_app.py", line 257, in invalidatePartitions self.triggerBackup(node) File "neo/master/backup_app.py", line 281, in triggerBackup assert cell_list, offset AssertionError: 0
-
Julien Muchembled authored
-
- 01 Dec, 2016 5 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
Many "unit" tests (!= "threaded" tests) don't do more than checking implementation details, and increase coverage artificially. As with testEvent in commit 71e30fb9, most of these tests will either be removed or rewritten as threaded tests. The fact that the remaining unit tests actually cover code that other test don't gives motivation to maintain them. It will be also less code to update when switching to https://pypi.python.org/pypi/mock I proceeded as follows: 1. Measure coverage for all tests except unit tests. While checking my work, I found that coverage stats for threaded/functional/zodb tests are quite unstable, so I restarted from the beginning by doing this measure several times and only keeping the intersection of coverage data. 2. Measure coverage individually for each 'unit' tests, and substract the each result with the data in 1. 3. The candidates for deletion are those without any code covered. Tests I didn't delete: - neo.tests.master.testElectionHandler: I always do minimal changes about election, as long as there's no serious review. - neo.tests.master.testMasterPT.MasterPartitionTableTests.test_13_outdate - 4 tests in neo.tests.testPT: test_01_Cell, test_04_removeCell, test_06_clear, test_08_filled - neo.tests.storage.testStorage{MySQL,SQLite} - neo.tests.testUtil.UtilTests.testReadBufferRead In a way, this commit is actually quite conservative. There are still many useless tests that only check error paths and for simple tested methods, this is just duplicating thie tested code.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 30 Nov, 2016 3 commits
-
-
Julien Muchembled authored
- format IPv6 inside [] when followed by :<port> - unify rendering of node lists - neoctl: do not crash on empty DB (no PT id)
-
Julien Muchembled authored
-
Julien Muchembled authored
This fixes a reqression in commit c39d5c67, which could leads to failures like: 2016-11-29 09:56:58,756 ERROR ZODB.Connection Couldn't load state for 0x4843 Traceback (most recent call last): File "ZODB/Connection.py", line 860, in setstate self._setstate(obj) File "ZODB/Connection.py", line 901, in _setstate p, serial = self._storage.load(obj._p_oid, '') File "neo/client/Storage.py", line 82, in load return self.app.load(oid)[:2] File "neo/client/app.py", line 352, in load data, tid, next_tid, _ = self._loadFromStorage(oid, tid, before_tid) File "neo/client/app.py", line 372, in _loadFromStorage for node, conn in self.cp.iterateForObject(oid, readable=True): File "neo/client/pool.py", line 91, in iterateForObject pt = self.app.pt File "neo/client/app.py", line 146, in __getattr__ return self.__getattribute__(attr) AttributeError: 'Application' object has no attribute 'pt'
-
- 28 Nov, 2016 4 commits
-
-
Julien Muchembled authored
This is similar to commit 7aecdada and for completeness, we also protect unlock the same way.
-
Julien Muchembled authored
For 2 previous commits, we didn't reindent in order to keep the diff readable.
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 27 Nov, 2016 11 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
The added test describes how the new id timestamps fix the race condition. These timestamps could be any unique opaque values, and the protocol is extended to exchange them along with node ids. Internally, nodes also reuse timestamps as a marker to identify the first NotifyNodeInformation packets from the master: since this packet is a complete list of nodes in the cluster, any other node in the node manager has left the cluster definitely and is removed. The secondary masters didn't receive update about master nodes. It's also useless to send them information about non-master nodes.
-
Julien Muchembled authored
-
Julien Muchembled authored
When Client (including backup master) and admin nodes are identified, the primary master now sends them automatically all nodes with NotifyNodeInformation, as with storage nodes.
-
Julien Muchembled authored
- check address conflicts - on invalid values, reject peer instead of dying
-
Julien Muchembled authored
Listing connected/connecting nodes with a UUID is used: - in one place by storage nodes: here, it does not matter if we skip nodes that aren't really identified - in many places by the master, only for server connections, in which case we have equivalence with real identification So in practice, NodeManager is only simplified to reuse the 'identified' property of nodes.
-
Julien Muchembled authored
-
Julien Muchembled authored
Therefore, a client node in the node manager is always RUNNING.
-
Julien Muchembled authored
Although the change applies to any node with a temporary ids (all but storage), only clients don't have addresses and are therefore not recognizable. After a client is disconnected from the master and before reconnecting, another client may join the cluster and "steals" the id of the first client. This issue leads to stuck clients, failing in loop with exceptions like the following one: ERROR ZODB.Connection Couldn't load state for 0x0251 Traceback (most recent call last): File "ZODB/Connection.py", line 860, in setstate self._setstate(obj) File "ZODB/Connection.py", line 901, in _setstate p, serial = self._storage.load(obj._p_oid, '') File "neo/client/Storage.py", line 82, in load return self.app.load(oid)[:2] File "neo/client/app.py", line 353, in load data, tid, next_tid, _ = self._loadFromStorage(oid, tid, before_tid) File "neo/client/app.py", line 373, in _loadFromStorage for node, conn in self.cp.iterateForObject(oid, readable=True): File "neo/client/pool.py", line 91, in iterateForObject pt = self.app.pt File "neo/client/app.py", line 145, in __getattr__ self._getMasterConnection() File "neo/client/app.py", line 214, in _getMasterConnection result = self.master_conn = self._connectToPrimaryNode() File "neo/client/app.py", line 246, in _connectToPrimaryNode handler=handler) File "neo/lib/threaded_app.py", line 154, in _ask _handlePacket(qconn, qpacket, kw, handler) File "neo/lib/threaded_app.py", line 135, in _handlePacket handler.dispatch(conn, packet, kw) File "neo/lib/handler.py", line 66, in dispatch method(conn, *args, **kw) File "neo/lib/handler.py", line 188, in error getattr(self, Errors[code])(conn, message) File "neo/client/handlers/__init__.py", line 23, in protocolError raise StorageError("protocol error: %s" % message) StorageError: protocol error: already connected
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 25 Nov, 2016 2 commits
-
-
Julien Muchembled authored
-
Julien Muchembled authored
-
- 21 Nov, 2016 2 commits
-
-
Julien Muchembled authored
`ClientCache._oid_dict` shall not have empty values. For a given oid, when the last item is removed from the cache, the oid must be removed as well to free memory. In some cases, this was not done. A consequence of this bug is the following exception: ERROR ZODB.Connection Couldn't load state for 0x02d1e1e4 Traceback (most recent call last): File "ZODB/Connection.py", line 860, in setstate self._setstate(obj) File "ZODB/Connection.py", line 901, in _setstate p, serial = self._storage.load(obj._p_oid, '') File "neo/client/Storage.py", line 82, in load return self.app.load(oid)[:2] File "neo/client/app.py", line 358, in load self._cache.store(oid, data, tid, next_tid) File "neo/client/cache.py", line 228, in store prev = item_list[-1] IndexError: list index out of range
-
Julien Muchembled authored
-
- 15 Nov, 2016 2 commits
-
-
Kirill Smelkov authored
A backup cluster for tids <= backup_tid has all data to provide regular read-only ZODB service. Having regular ZODB access to the data can be handy e.g. for externally verifying data for consistency between main and backup clusters. Peeking around without disturbing main cluster might be also useful sometimes. In this patch: - master & storage nodes are taught: * to instantiate read-only or regular client service handler depending on cluster state: RUNNING -> regular BACKINGUP -> read-only * in read-only client handler: + to reject write-related operations + to provide read operations but adjust semantic as last_tid in the database would be = backup_tid - new READ_ONLY_ACCESS protocol error code is introduced so that client can raise POSException.ReadOnlyError upon receiving it. I have not implemented back-channel for invalidations in read-only mode (yet ?). This way once a client connects to cluster in backup state, it won't see new data fetched by backup cluster from upstream after client connected. The reasons invalidations are not implemented is that for now (imho) there is no off-hand ready infrastructure to get updates from replicating node on transaction-by-transaction basis (it currently only notifies when whole batch is done). For consistency verification (main reason for this patch) we also don't need invalidations to work, as in that task we always connect afresh to backup. So I simply only put relevant TODOs about invalidations for now. The patch is not very polished but should work. /reviewed-on nexedi/neoppod!4
-
Kirill Smelkov authored
-
- 27 Oct, 2016 1 commit
-
-
Iliya Manolov authored
Currently, the command "neoctl [arguments] print ids" has the following output: last_oid = 0x... last_tid = 0x... last_ptid = ... or backup_tid = 0x... last_tid = 0x... last_ptid = ... depending on whether the cluster is in normal or backup mode. This is extremely unreadable since the admin is often interested in the time that corresponds to each tid. Now the output is: last_oid = 0x... last_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss) last_ptid = ... or backup_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss) last_tid = 0x... (yyyy-mm-dd hh:mm:ss.ssssss) last_ptid = ... /reviewed-on nexedi/neoppod!2
-
- 17 Oct, 2016 1 commit
-
-
Kirill Smelkov authored
Similarly to 13911ca3 on the same instance after MariaDB was upgraded to 10.1.17 the following query, even after `OPTIMIZE TABLE obj`, started to execute very slowly: MariaDB [(none)]> SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1; +--------------------+ | tid | +--------------------+ | 268707072758797063 | +--------------------+ 1 row in set (4.82 sec) Both explain and analyze says the query will/is using `partition` key but only partially (note key_len is only 10, not 18): MariaDB [(none)]> SHOW INDEX FROM neo1.obj; +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | obj | 0 | PRIMARY | 1 | partition | A | 28755928 | NULL | NULL | | BTREE | | | | obj | 0 | PRIMARY | 2 | tid | A | 28755928 | NULL | NULL | | BTREE | | | | obj | 0 | PRIMARY | 3 | oid | A | 28755928 | NULL | NULL | | BTREE | | | | obj | 0 | partition | 1 | partition | A | 28755928 | NULL | NULL | | BTREE | | | | obj | 0 | partition | 2 | oid | A | 28755928 | NULL | NULL | | BTREE | | | | obj | 0 | partition | 3 | tid | A | 28755928 | NULL | NULL | | BTREE | | | | obj | 1 | data_id | 1 | data_id | A | 28755928 | NULL | NULL | YES | BTREE | | | +-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 7 rows in set (0.00 sec) MariaDB [(none)]> explain SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1; +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+ | 1 | SIMPLE | obj | ref | PRIMARY,partition | partition | 10 | const,const | 2 | Using where; Using index | +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+--------------------------+ 1 row in set (0.00 sec) MariaDB [(none)]> analyze SELECT tid FROM neo1.obj WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1; +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+ | 1 | SIMPLE | obj | ref | PRIMARY,partition | partition | 10 | const,const | 2 | 9741121.00 | 100.00 | 0.00 | Using where; Using index | +------+-------------+-------+------+-------------------+-----------+---------+-------------+------+------------+----------+------------+--------------------------+ 1 row in set (4.93 sec) By explicitly forcing (partition, oid, tid) index usage which is precisely designed to serve this and similar queries can avoid the query from being slow: MariaDB [(none)]> analyze SELECT tid FROM neo1.obj FORCE INDEX(`partition`) WHERE `partition`=5 AND oid=79613 AND tid>268707071353462798 ORDER BY tid LIMIT 1; +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+ | 1 | SIMPLE | obj | range | partition | partition | 18 | NULL | 2 | 1.00 | 100.00 | 100.00 | Using where; Using index | +------+-------------+-------+-------+---------------+-----------+---------+------+------+--------+----------+------------+--------------------------+ 1 row in set (0.00 sec) /cc @jm, @vpelltier, @Tyagov /reviewed-on nexedi/neoppod!1
-
- 12 Sep, 2016 1 commit
-
-
Julien Muchembled authored
Many patches have been merged upstream :) A notable change is that lastTransaction() does not ping the master anymore (but it still causes a connection to the master if the client is disconnected).
-
- 29 Aug, 2016 2 commits
-
-
Julien Muchembled authored
After partitions were dropped with TokuDB, we had a case where MariaDB 10.1.14 stopped using the most appropriate index. MariaDB [neo0]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=5; +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+ | 1 | SIMPLE | obj | range | PRIMARY,partition | data_id | 11 | NULL | 10 | Using where; Using index for group-by | +------+-------------+-------+-------+-------------------+---------+---------+------+------+---------------------------------------+ MariaDB [neo0]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=5; Empty set (1 min 51.47 sec) Expected: MariaDB [neo1]> explain SELECT DISTINCT data_id FROM obj WHERE `partition`=4; +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+ | 1 | SIMPLE | obj | ref | PRIMARY,partition | PRIMARY | 2 | const | 1 | Using where; Using temporary | +------+-------------+-------+------+-------------------+---------+---------+-------+------+------------------------------+ 1 row in set (0.00 sec) MariaDB [neo1]> SELECT SQL_NO_CACHE DISTINCT data_id FROM obj WHERE `partition`=4; Empty set (0.00 sec) Restarting the server or 'OPTIMIZE TABLE obj; ' does not help. Such issue could prevent the cluster to start due to timeouts, by always going back to RECOVERING state.
-
Julien Muchembled authored
-
- 11 Aug, 2016 2 commits
-
-
Julien Muchembled authored
Freeing disk space when a cell is dropped will have to be implemented with care, not only for performance reasons.
-
Julien Muchembled authored
TRUNCATE was chosen for performance reasons, but it's usually done on small tables, and not for performance-critical operations. TRUNCATE commits implicitely, so for pt/ttrans in particular, it's certainly slower due to extra fsyncs to disk. On the other side, committing too early can corrupt the database if the storage node is stopped just after. For example, a failure in changePartitionTable() can cause 'pt' to remain empty.
-
- 01 Aug, 2016 1 commit
-
-
Julien Muchembled authored
-