  1. 01 Dec, 2016 1 commit
  2. 27 Nov, 2016 1 commit
      Fix identification issues, including a race condition causing id conflicts · 9385706f
      Julien Muchembled authored
      The added test describes how the new id timestamps fix the race condition.
      These timestamps could be any unique opaque values, and the protocol is
      extended to exchange them along with node ids.
      
      Internally, nodes also reuse these timestamps as a marker to identify the
      first NotifyNodeInformation packet from the master: since this packet is a
      complete list of the nodes in the cluster, any other node still known to the
      node manager has definitely left the cluster and is removed.
      
      The secondary masters didn't receive updates about master nodes.
      It is also useless to send them information about non-master nodes.
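
      A minimal sketch, in Python with invented names (this is not the actual NEO
      code), of the two uses of id timestamps described above: a later
      identification supersedes an earlier one that raced for the same node id,
      and the first complete node list prunes anything that already left the
      cluster.

          import itertools
          from dataclasses import dataclass, field

          _id_timestamp = itertools.count(1)  # any unique, opaque, ordered value would do

          @dataclass
          class NodeManager:
              nodes: dict = field(default_factory=dict)  # node id -> id timestamp
              seen_full_list: bool = False

              def identify(self, node_id):
                  # Handed out along with the node id; a later identification wins a race.
                  ts = next(_id_timestamp)
                  if ts > self.nodes.get(node_id, 0):
                      self.nodes[node_id] = ts
                  return ts

              def on_notify_node_information(self, node_list):
                  # The first NotifyNodeInformation from the master is a complete list
                  # of cluster nodes: any other node still known here has definitely
                  # left the cluster and is removed.
                  if not self.seen_full_list:
                      self.seen_full_list = True
                      for node_id in list(self.nodes):
                          if node_id not in node_list:
                              del self.nodes[node_id]
                  self.nodes.update(node_list)  # node_list: {node id: id timestamp}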
  3. 24 Oct, 2016 1 commit
  4. 21 Oct, 2016 1 commit
  5. 25 Jan, 2016 1 commit
  6. 01 Dec, 2015 1 commit
      Safer DB truncation, new 'truncate' ctl command · d3c8b76d
      Julien Muchembled authored
      With the previous commit, the request to truncate the DB was not stored
      persistently, which means that this operation was still vulnerable to the case
      where the master is restarted after some nodes, but not all, have already
      truncated. The master didn't have the information to fix this and the result
      was a partially truncated DB.
      
      -> On a Truncate packet, a storage node only stores the tid somewhere, to send
         it back to the master, which stays in RECOVERING state as long as any node
         has a different value than that of the node with the latest partition table.
      
      We also want to make sure that there is no unfinished data, because a user may
      truncate at a tid higher than a locked one.
      
      -> Truncation is now effective at the end of the VERIFYING phase, just before
         returning the last ids to the master.
      
      Lastly, all nodes should be truncated, to prevent an offline node from coming
      back with a different history. Currently, this would not be an issue since
      replication always restarts from the beginning, but later we'd like nodes to
      remember where they stopped replicating.
      
      -> If a truncation is requested, the master waits for all nodes to be pending,
         even if the cluster was previously started (the user can still force the
         cluster to start with neoctl). And any node lost during verification also
         causes the master to go back to recovery.
      
      Obviously, the protocol has been changed to split the LastIDs packet and
      introduce a new Recovery packet, since it no longer makes sense to ask for
      last ids during recovery.
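
      A hedged sketch of the recovery gate described above (function name and data
      layout are invented): each storage reports the truncation tid it stored when
      it received the Truncate packet, and the master stays in RECOVERING as long
      as any node disagrees with the node holding the latest partition table.

          def may_leave_recovery(answers):
              """answers: {storage: (ptid, truncate_tid)} collected during recovery."""
              if not answers:
                  return False
              # Reference value: the truncation tid reported by the node that has
              # the latest partition table (highest ptid).
              _, reference_tid = max(answers.values(), key=lambda a: a[0])
              return all(tid == reference_tid for _, tid in answers.values())

          # One storage has not stored the tid yet, so the master keeps recovering.
          print(may_leave_recovery({'s1': (10, 0x42),
                                    's2': (10, 0x42),
                                    's3': (9, None)}))  # False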
  7. 30 Nov, 2015 1 commit
      Perform DB truncation during recovery, send PT to storages before verification · 3e3eab5b
      Julien Muchembled authored
      Currently, the database may only be truncated when leaving backup mode, but
      the issue will be the same when neoctl gets a new command to truncate at an
      arbitrary tid: we want to be sure that all nodes are truncated before anything
      else.
      
      Therefore, we no longer send Truncate orders before stopping operation,
      because nodes could fail or exit before actually processing them. Truncation
      must also happen before asking nodes their last ids.
      
      With this commit, if a truncation is requested:
      - this is always the first thing done when a storage node connects to the
        primary master during the RECOVERING phase,
      - and the cluster does not start automatically if there are missing nodes,
        unless an admin forces it.
      
      Other changes:
      - Connections to storage nodes don't need to be aborted anymore when leaving
        backup mode.
      - The master always initiates communication when a storage node identifies,
        which simplifies code and reduces the number of exchanged packets.
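
      An illustrative ordering, with hypothetical packet and method names, of the
      behaviour described above: when a storage node identifies during RECOVERING
      and a truncation is pending, the master itself speaks first and the Truncate
      order is the very first thing it sends.

          class RecoveryHandler:
              """Toy master-side handler, not NEO's real one."""
              def __init__(self, truncate_tid=None):
                  self.truncate_tid = truncate_tid
                  self.sent = []  # packets sent, recorded for the demo

              def connection_completed(self, storage):
                  # The master always initiates communication with the storage node.
                  if self.truncate_tid is not None:
                      self.sent.append(('Truncate', storage, self.truncate_tid))
                  self.sent.append(('AskRecovery', storage))

          h = RecoveryHandler(truncate_tid=0x42)
          h.connection_completed('s1')
          print(h.sent)  # the Truncate order comes before anything else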
  8. 05 Oct, 2015 1 commit
  9. 24 Sep, 2015 2 commits
  10. 12 Aug, 2015 2 commits
  11. 24 Jun, 2015 1 commit
  12. 21 May, 2015 1 commit
  13. 05 May, 2015 1 commit
  14. 30 Jul, 2014 1 commit
  15. 22 Jul, 2014 1 commit
  16. 04 Jul, 2014 1 commit
  17. 03 Jun, 2014 1 commit
  18. 07 Jan, 2014 1 commit
  19. 23 Aug, 2012 1 commit
  20. 21 Aug, 2012 2 commits
  21. 20 Aug, 2012 1 commit
      More bugfixes to backup mode · 08742377
      Julien Muchembled authored
      - catch OperationFailure
      - reset transaction manager when leaving backup mode
      - send appropriate target tid to a storage that updates an outdated cell
      - clean up partition table when leaving BACKINGUP state unexpectedly
      - make sure all readable cells of a partition have the same 'backup_tid'
        if they have the same data, so that we know when internal replication is
        finished when leaving backup mode
      - fix storages that had not finished internal replication when leaving
        backup mode
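
      A small sketch of the last two points, with an invented data layout: internal
      replication is considered finished once every readable cell of each partition
      reports the same 'backup_tid'.

          def internal_replication_finished(readable_cells):
              """readable_cells: {partition: {storage: backup_tid}}."""
              return all(len(set(cells.values())) == 1
                         for cells in readable_cells.values())

          print(internal_replication_finished({0: {'s1': 100, 's2': 100},
                                               1: {'s1': 100, 's3': 99}}))  # False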
  22. 14 Aug, 2012 1 commit
  23. 09 Aug, 2012 1 commit
      Backup bugfixes · ad01f379
      Julien Muchembled authored
      - fix stopping backup cluster
      - fix leaving backup mode, including truncating to consistent TID
      - fix backup_tid on master and storages
  24. 17 Jul, 2012 1 commit
      storage: fix save or reset of 'backup_tid' config value · 1d8a0dbe
      Julien Muchembled authored
      Because masters don't have persistent storage, the task of remembering whether
      the cluster is in backup mode or not is delegated to storages, via the presence
      of a 'backup_tid' config value.
      
      This fixes a bug that set 'backup_tid' after a simple replication.
      If the cluster was restarted, it would have tried to switch to BACKINGUP state.
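
      A hedged illustration of this design, with sqlite standing in for the storage
      node's backend: since masters keep nothing on disk, backup mode only survives
      a restart through a 'backup_tid' configuration value that each storage saves,
      and resets, in its own database.

          import sqlite3

          db = sqlite3.connect(':memory:')
          db.execute("CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT)")

          def set_backup_tid(tid):
              if tid is None:  # reset when leaving backup mode or after a plain replication
                  db.execute("DELETE FROM config WHERE name='backup_tid'")
              else:            # save when entering backup mode
                  db.execute("REPLACE INTO config VALUES ('backup_tid', ?)", (str(tid),))

          def get_backup_tid():
              row = db.execute("SELECT value FROM config WHERE name='backup_tid'").fetchone()
              return row and row[0]

          set_backup_tid(0x42)
          print(get_backup_tid())  # a non-None value after a restart means: resume BACKINGUP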
  25. 13 Jul, 2012 1 commit
  26. 21 Mar, 2012 1 commit
  27. 20 Mar, 2012 1 commit
  28. 13 Mar, 2012 1 commit
  29. 12 Mar, 2012 1 commit
      New feature to check that partitions are replicated properly · 04f72a4c
      Julien Muchembled authored
      This includes an API change to Node.isIdentified, which now tells whether
      identification packets have been exchanged or not.
      All handlers must be updated to implement '_acceptIdentification' instead of
      overriding EventHandler.acceptIdentification: this patch only does so for
      StorageOperationHandler.
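
      A simplified sketch of the handler pattern mentioned above (this is not NEO's
      real EventHandler): the base class performs the bookkeeping shared by every
      handler, and subclasses only fill in the '_acceptIdentification' hook instead
      of overriding acceptIdentification entirely.

          class Node:
              identified = False  # isIdentified: identification packets were exchanged

          class EventHandler:
              def acceptIdentification(self, node):
                  node.identified = True            # common part, done exactly once
                  self._acceptIdentification(node)  # handler-specific part

              def _acceptIdentification(self, node):
                  pass                              # default: nothing extra to do

          class StorageOperationHandler(EventHandler):
              def _acceptIdentification(self, node):
                  print('peer identified, replication may proceed')

          n = Node()
          StorageOperationHandler().acceptIdentification(n)
          print(n.identified)  # True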
  30. 24 Feb, 2012 1 commit
      Implements backup using specialised storage nodes and relying on replication · 8e3c7b01
      Julien Muchembled authored
      Replication is also fully reimplemented:
      - It is no longer done on whole partitions.
      - It runs at the lowest priority, so as not to degrade performance for client
        nodes.

      The schema of the MySQL tables is changed to optimize the storage layout:
      rows are now grouped by age, for good partial-replication performance.
      This certainly also speeds up simple loads/stores.
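
      A purely illustrative schema sketch in Python/sqlite (the real project uses
      MySQL and its actual table definitions differ): keying rows by
      (partition_id, tid, oid) groups them by age, so partial replication of one
      partition over a tid window becomes a contiguous range scan.

          import sqlite3

          db = sqlite3.connect(':memory:')
          db.execute("""CREATE TABLE obj (
              partition_id INTEGER, tid INTEGER, oid INTEGER, data BLOB,
              PRIMARY KEY (partition_id, tid, oid))""")

          def replicate_range(partition_id, min_tid, max_tid):
              # Only the rows of one partition within a tid window are fetched.
              return db.execute("SELECT tid, oid FROM obj WHERE partition_id=?"
                                " AND tid>? AND tid<=? ORDER BY tid, oid",
                                (partition_id, min_tid, max_tid)).fetchall()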
  31. 23 Feb, 2012 3 commits
  32. 10 Feb, 2012 2 commits
  33. 26 Jan, 2012 1 commit
  34. 16 Jan, 2012 1 commit