Commits · a4846b1077c448ec40905d337c730f87dd4c6071 · Carlos Ramos Carreño / neoppod

20 May, 2019 2 commits

WIP: Make admin node a web-app · a4846b10

Julien Muchembled authored Aug 31, 2015

The goal is to get rid of the neoctl command-line tool, and to manage the
cluster via a web browser, or tools like 'wget'. Then, it will be possible to
provide a web user interface to connect to the underlying DB of any storage
node, usually a SQL client.

The design of admin app is finished:
- it's threaded like clients
- it's a WSGI app

I also hacked an HTTP API as quickly as possible to make all tests pass.
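
As a rough illustration of what "a WSGI app" means here, the following minimal sketch uses only the Python standard library. The `/cluster/state` path, the `CLUSTER_STATE` value and the JSON response are assumptions made up for the example; they are not the HTTP API hacked in this commit.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory state; a real admin node would get it from the master.
CLUSTER_STATE = {"state": "RUNNING"}

def admin_app(environ, start_response):
    """Minimal WSGI callable: answer GET /cluster/state with JSON."""
    if (environ["REQUEST_METHOD"] == "GET"
            and environ["PATH_INFO"] == "/cluster/state"):
        status = "200 OK"
        body = json.dumps(CLUSTER_STATE).encode("utf-8")
    else:
        status = "404 Not Found"
        body = b'{"error": "not found"}'
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]

def call(app, path, method="GET"):
    """Invoke a WSGI app in-process and return (status, body)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in the other required keys
    captured = {}
    def start_response(status, headers, exc_info=None):
        captured["status"] = status
    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Such a callable could be served with `wsgiref.simple_server.make_server('', 8000, admin_app).serve_forever()` and then queried with a browser or 'wget'.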

TODO:
- SSL
- define a better HTTP API
- there's no UI at all yet
- remove all unused packets from the protocol (those that were only used
  between neoctl and admin node)

There are a few dead files, not deleted yet, in case they contain a few
pieces of useful code:
 neo/neoctl/app.py
 neo/neoctl/handler.py
 neo/scripts/neoctl.py

a4846b10

Use bool instead of int for 'dry_run' field in (Notify)Repair packets · 0b34a051
Julien Muchembled authored May 20, 2019

0b34a051

09 May, 2019 3 commits
- importer: add support for undo, faster conflict check when import is finished · de645092
  Julien Muchembled authored May 07, 2019
  
  de645092
- Fix undo of transactions during which readCurrent() was used · bd5ba87a
  Julien Muchembled authored May 07, 2019
  
  bd5ba87a
- storage: require backends to use @fallback implementation explicitly · 1a72a60f
  Julien Muchembled authored May 07, 2019
```
... rather than logging when the backend does not override.
```
  1a72a60f
30 Apr, 2019 3 commits

importer: fix writeback of transactions during which readCurrent() was used · 042f5ac0
Julien Muchembled authored Apr 30, 2019
```
Contrary to FileStorage, NEO remembers uses of readCurrent().
```
042f5ac0
importer: forbid truncation when writeback is active · 68f26415
Julien Muchembled authored Apr 30, 2019

68f26415

master: fix crash in STARTING_BACKUP when connecting to an upstream secondary master · dba07e72

Julien Muchembled authored Apr 30, 2019

This fixes the following assertion:

  Traceback (most recent call last):
    File "neo/master/app.py", line 172, in run
      self._run()
    File "neo/master/app.py", line 182, in _run
      self.playPrimaryRole()
    File "neo/master/app.py", line 302, in playPrimaryRole
      self.backup_app.provideService())
    File "neo/master/backup_app.py", line 114, in provideService
      node, conn = bootstrap.getPrimaryConnection()
    File "neo/lib/bootstrap.py", line 74, in getPrimaryConnection
      poll(1)
    File "neo/lib/event.py", line 160, in poll
      to_process.process()
    File "neo/lib/connection.py", line 504, in process
      self._handlers.handle(self, self._queue.pop(0))
    File "neo/lib/connection.py", line 92, in handle
      self._handle(connection, packet)
    File "neo/lib/connection.py", line 107, in _handle
      pending[0][1].packetReceived(connection, packet)
    File "neo/lib/handler.py", line 125, in packetReceived
      self.dispatch(*args)
    File "neo/lib/handler.py", line 75, in dispatch
      method(conn, *args, **kw)
    File "neo/lib/handler.py", line 159, in notPrimaryMaster
      assert primary != self.app.server
  AttributeError: 'BackupApplication' object has no attribute 'server'
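
The last frame shows why the failure surfaces as an AttributeError rather than an AssertionError: the asserted expression is evaluated first, and it reads an attribute that the object does not have. A minimal reproduction follows; the classes are stand-ins written for this example, not NEO's real ones, and `error_name` is a hypothetical helper.

```python
class BackupApplication:
    """Stand-in with no 'server' attribute, as in the traceback."""

class Handler:
    def __init__(self, app):
        self.app = app

    def notPrimaryMaster(self, primary):
        # The expression is evaluated before 'assert' can judge it, so a
        # missing attribute raises AttributeError, not AssertionError.
        assert primary != self.app.server

def error_name(func, *args):
    """Return the name of the exception raised by func(*args), if any."""
    try:
        func(*args)
    except Exception as e:
        return type(e).__name__
    return None
```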

dba07e72

28 Apr, 2019 3 commits

qa: add testrunner options to dump/check the format of network packets · e3cd5c5b

Julien Muchembled authored Jan 02, 2019

With the switch to msgpack, there was no schema anymore whereas it was
sometimes used for both automatic conversion (e.g. the last argument of
AskStoreTransaction must now be explicitly cast to list) and type checking.

This somewhat reintroduces a kind of schema that:
- is used by the test suite for type checking
- can be generated automatically from the test suite
  when one changes the protocol
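
To give an idea of what such a lightweight schema can look like, here is a minimal sketch of a recursive type checker. The schema notation and the `STORE_TRANSACTION` layout are assumptions invented for this example; they do not reproduce the format used by the test runner.

```python
def check(schema, value):
    """Return True if 'value' matches 'schema'.

    A schema is a type, a 1-item list [item_schema] (homogeneous list),
    or a tuple of schemas (fixed-length sequence).
    """
    if isinstance(schema, list):
        (item,) = schema
        return isinstance(value, list) and all(check(item, v) for v in value)
    if isinstance(schema, tuple):
        return (isinstance(value, (list, tuple))
                and len(value) == len(schema)
                and all(check(s, v) for s, v in zip(schema, value)))
    if schema is int:
        # bool is a subclass of int in Python; keep the two distinct.
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, schema)

# Hypothetical packet layout: two byte strings followed by a list of them.
STORE_TRANSACTION = (bytes, bytes, [bytes])
```

With this notation, a packet whose last argument is a tuple instead of a list is rejected, which is the kind of mistake the explicit cast mentioned above guards against.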

e3cd5c5b

protocol: switch to msgpack for packet serialization · 9d0bf97a

Julien Muchembled authored May 07, 2018

Not only for performance reasons (at least 3% faster) but also because of
several ugly things in the way packets were defined:
- packet field names, which are only documentary; for root fields,
  they even just duplicate the packet names
- a lot of repetitions for packet names, and even confusion between the name
  of the packet definition and the name of the actual notify/request packet
- the need to implement field types for anything, like PByte to support new
  compression formats, since PBoolean is not enough

neo/lib/protocol.py is now much smaller.

9d0bf97a

Release version 1.12 · 6332112c
Julien Muchembled authored Apr 28, 2019

6332112c

27 Apr, 2019 12 commits

master: reject drop/tweak ctl commands that could lead to unwanted status · 55a6dd0f

Julien Muchembled authored Apr 11, 2019

The following 2 operations can be onerous and they should not be
directly usable without some kind of confirmation by the user:
- Dropping a node now requires stopping it first.
- Tweaking no longer automatically excludes DOWN nodes,
  because a node could go DOWN between the moment the user sends
  the command to tweak and the actual tweak by the master.

55a6dd0f

qa: extend test reproducing the migration of a big ZODB to NEO · ef4d58f6
Julien Muchembled authored Apr 07, 2019

ef4d58f6
neoctl: better display of full partition tables · ab082d7e
Julien Muchembled authored Apr 04, 2019

ab082d7e
Bump protocol version · c6453626
Julien Muchembled authored Apr 26, 2019

c6453626

tweak: add option to simulate · 2a27239d

Julien Muchembled authored Mar 31, 2019

Initially, I wanted to do the simulation inside neoctl but it has no knowledge
of the topology (the master doesn't send devpath values of storage nodes).
Therefore, the work is delegated to the master node, which implies a change
of the protocol.

2a27239d

tweak: do not crash when trying to remove all nodes · 3839d224
Julien Muchembled authored Apr 04, 2019

3839d224
tweak: do not touch cells of nodes that are intended to be dropped · 8a645d9f
Julien Muchembled authored Mar 29, 2019

8a645d9f

Better error reporting from the master to neoctl for denied requests · c2c9e99d

Julien Muchembled authored Apr 06, 2019

This stops abusing ProtocolError, which disconnects the admin node needlessly.

The many 'if ... raise RuntimeError' in neo/neoctl/neoctl.py
could be turned into assertions.

c2c9e99d

Make 'neoctl print pt' report the number of replicas · 21190ee7
Julien Muchembled authored Mar 31, 2019

21190ee7

Make the number of replicas modifiable when the cluster is running · ef5fc508

Julien Muchembled authored Mar 27, 2019

neoctl gets a new command to change the number of replicas.

The number of replicas becomes a new partition table attribute and
like the PT id, it is stored in the config table. On the other hand,
the configuration value for the number of partitions is dropped,
since it can be computed from the partition table, which is
always stored in full.
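
The claim that the number of partitions can be derived from a fully stored partition table can be sketched as follows. The `(partition, nid, state)` row layout and the state strings are assumptions for the example, not the actual database schema.

```python
def count_partitions(pt_rows):
    """Derive the number of partitions from a fully stored partition table.

    'pt_rows' is an iterable of (partition, nid, state) rows; this layout
    is an assumption made for the example.
    """
    partitions = {partition for partition, _nid, _state in pt_rows}
    # A full table numbers its partitions 0..n-1 without gaps.
    if partitions != set(range(len(partitions))):
        raise ValueError("partition table is not stored in full")
    return len(partitions)

# 3 partitions spread over 2 hypothetical storage nodes.
rows = [
    (0, 1, "UP_TO_DATE"), (0, 2, "UP_TO_DATE"),
    (1, 1, "UP_TO_DATE"), (1, 2, "UP_TO_DATE"),
    (2, 2, "UP_TO_DATE"),
]
```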

The -p/-r master options now only apply at database creation.

Some implementation notes:

- The protocol is slightly optimized in that the master now
  automatically sends the whole partition table to the admin & client
  nodes upon connection, like for storage nodes.
  This makes the protocol more consistent, and the master is the
  only remaining node requesting partition tables, during recovery.

- Some parts become tricky because app.pt can be None in more cases.
  For example, the extra condition in NodeManager.update
  (before app.pt.dropNode) was added for this reason.
  Likewise, the 'loadPartitionTable' method (storage) is not inlined
  because of unit tests.
  Overall, this commit simplifies more than it complicates.

- In the master handlers, we stop hijacking the 'connectionCompleted'
  method for tasks to be performed (often send the full partition
  table) on handler switches.

- The admin's 'bootstrapped' flag could have been removed earlier:
  race conditions can't happen since the AskNodeInformation packet
  was removed (commit d048a52d).

ef5fc508

New --new-nid storage option for fast cloning · 27e3f620

Julien Muchembled authored Mar 21, 2019

It is often faster to set up replicas by stopping a node (and any
underlying database server like MariaDB) and do a raw copy of the
database (e.g. with rsync). So far, this required stopping the whole
cluster and using tools like 'mysql' or 'sqlite3' to edit:
- the 'pt' table in databases,
- the 'config.nid' values of the new nodes.

With this new option, if you already have 1 replica, you can set up
new replicas with such fast raw copy, and without interruption of
service. Obviously, this implies less redundancy during the operation.

27e3f620

qa: fix 2 tests with ZODB5 · 64e02391
Julien Muchembled authored Apr 26, 2019

64e02391

26 Apr, 2019 4 commits
- qa: new tools/stress options to evaluate MySQL engines · 491f4c89
  Julien Muchembled authored Apr 23, 2019
```
--kill-mysqld should be combined with something like -f .3 -r .1
to give storage nodes enough time to recover.
And also -D 0 to focus testing on the storage backend rather than NEO.
```
  491f4c89
- qa: provide a way to let tests start 1 mysqld per storage node · c11410ef
  Julien Muchembled authored Apr 23, 2019
  
  c11410ef
- mysql: make 'user' actually optional in the DB connection string · 74ec44e3
  Julien Muchembled authored Apr 23, 2019
  
  74ec44e3
- mysql: specify column families for RocksDB · 87c1de3b
  Julien Muchembled authored Apr 17, 2019
  
  87c1de3b
16 Apr, 2019 5 commits
- qa: add testIncremental (testImporter) test · aa7b654f
  Julien Muchembled authored Apr 09, 2019
  
  aa7b654f
- importer: fix hidden "maximum recursion depth exceeded" at startup · d5834ee9
  Julien Muchembled authored Apr 09, 2019
  
  d5834ee9
- importer: fix closure of ZODB, and also do it when the import is finished · c37bcfa3
  Julien Muchembled authored Apr 09, 2019
  
  c37bcfa3
- sqlite: fix resumption of migration to NEO with Importer · 6608a868
  Julien Muchembled authored Apr 09, 2019
  
  6608a868
- qa: fix a random failure in threaded tests · 989e9920
  Julien Muchembled authored Apr 06, 2019
```
This also reverts commit 442bb43a.
```
  989e9920
05 Apr, 2019 3 commits
- importer: speed up startup when the import is already finished · 26b1246a
  Julien Muchembled authored Apr 05, 2019
  
  26b1246a
- importer: fix replication (as source) once import is finished · 9d14ea1b
  Julien Muchembled authored Apr 05, 2019
```
This fixes up commit be839e92.
```
  9d14ea1b
- storage: fix DatabaseManager.getLastTID with max_tid · c58d4862
  Julien Muchembled authored Apr 05, 2019
  
  c58d4862
01 Apr, 2019 1 commit
- qa: remove 2 useless unit tests · b10cc750
  Julien Muchembled authored Mar 29, 2019
  
  b10cc750
21 Mar, 2019 2 commits

storage: allow the master to change our node id · 15369269
Julien Muchembled authored Mar 21, 2019
```
This is not used currently.
```
15369269

Rename --uuid command-line options into --nid · e8473a23

Julien Muchembled authored Mar 21, 2019

This breaks compatibility but it was mentioned from the beginning
that these options are only there for testing purposes.

TODO: rename all remaining occurrences of UUID into NID in the code

e8473a23

16 Mar, 2019 1 commit

importer: fix possible data loss on writeback · e387ad59

Julien Muchembled authored Mar 12, 2019

If the source DB is lost during the import and then restored from a backup,
all new transactions have to be written back again on resume. It is the most
common case for which the writeback hits the maximum number of transactions
per partition to process at each iteration; the previous code was buggy in
that it could skip transactions.
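
The commit message does not describe the bug in detail, so the following is only a generic sketch of iterating with a per-partition limit without skipping anything; it is not the importer's code. The function name and the `partitions` mapping are hypothetical.

```python
def writeback_batches(partitions, limit):
    """Yield batches of (partition, tid), at most 'limit' per partition each.

    'partitions' maps a partition number to its sorted list of tids.
    One cursor is kept per partition, so a partition that hits the limit
    resumes exactly where it stopped and no transaction is skipped.
    """
    cursor = dict.fromkeys(partitions, 0)
    while True:
        batch = []
        for p, tids in sorted(partitions.items()):
            start = cursor[p]
            chunk = tids[start:start + limit]
            cursor[p] = start + len(chunk)
            batch.extend((p, tid) for tid in chunk)
        if not batch:
            return
        yield batch
```

A single shared resume point (for instance the highest tid seen in the previous iteration) would be unsafe here, because a partition that reached the limit may still hold older transactions.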

e387ad59

11 Mar, 2019 1 commit
- Release version 1.11 · 48d936cb
  Julien Muchembled authored Mar 11, 2019
  
  48d936cb