1. 22 Mar, 2016 2 commits
  2. 21 Mar, 2016 3 commits
  3. 09 Mar, 2016 2 commits
  4. 08 Mar, 2016 2 commits
  5. 04 Mar, 2016 3 commits
  6. 02 Mar, 2016 1 commit
    • client: revert incorrect memory optimization · 763806e0
      Julien Muchembled authored
      Since commit d2d77437 ("client: make the cache
      tolerant to late invalidations when the entry is in the history queue"),
      invalidated items became current again when they were moved to the history
      queue, which was wrong for 2 reasons:
      - only the last items of _oid_dict values may have next_tid=None,
      - and such items could be wrongly reused when caching the real
        current data.
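
The first reason can be illustrated with a minimal sketch, assuming a simplified cache in which `_oid_dict` maps each oid to a list of items ordered by tid. `Item` and `check_oid_dict` are hypothetical names used only for illustration; they are not NEO's actual classes or functions.

```python
from collections import namedtuple

# Hypothetical, simplified cache item: the data is valid from `tid` up to
# `next_tid`; next_tid=None means the item is still considered current.
Item = namedtuple("Item", "oid tid next_tid")


def check_oid_dict(oid_dict):
    """Return True if, for every oid, only the last item may be current."""
    for items in oid_dict.values():
        for item in items[:-1]:
            if item.next_tid is None:
                return False
    return True


# Consistent state: the older revision has been invalidated (next_tid=20).
valid = {1: [Item(1, 10, 20), Item(1, 20, None)]}

# Inconsistent state: the invalidated item became "current" again.
broken = {1: [Item(1, 10, None), Item(1, 20, None)]}
```

Under these assumptions, `check_oid_dict(valid)` holds while `check_oid_dict(broken)` does not, which is the kind of state the reverted optimization could produce.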
  7. 01 Mar, 2016 1 commit
  8. 26 Feb, 2016 4 commits
  9. 05 Feb, 2016 1 commit
  10. 25 Jan, 2016 2 commits
  11. 21 Jan, 2016 2 commits
  12. 12 Jan, 2016 1 commit
  13. 16 Dec, 2015 2 commits
  14. 13 Dec, 2015 3 commits
  15. 12 Dec, 2015 1 commit
  16. 11 Dec, 2015 1 commit
  17. 09 Dec, 2015 1 commit
  18. 02 Dec, 2015 1 commit
  19. 01 Dec, 2015 3 commits
    • Safer DB truncation, new 'truncate' ctl command · d3c8b76d
      Julien Muchembled authored
      With the previous commit, the request to truncate the DB was not stored
      persistently, which means that this operation was still vulnerable to the case
      where the master is restarted after some nodes, but not all, have already
      truncated. The master didn't have the information to fix this and the result
      was a partially truncated DB.
      
      -> On a Truncate packet, a storage node only stores the tid somewhere, to send
         it back to the master, which stays in RECOVERING state as long as any node
         has a different value than that of the node with the latest partition table.
      
      We also want to make sure that there is no unfinished data, because a user may
      truncate at a tid higher than a locked one.
      
      -> Truncation is now effective at the end of the VERIFYING phase, just before
         returning the last ids to the master.
      
      Lastly, all nodes should be truncated, so that an offline node cannot come
      back with a different history. Currently, this would not be an issue since
      replication always restarts from the beginning, but later we'd like nodes to
      remember where they stopped replicating.
      
      -> If a truncation is requested, the master waits for all nodes to be pending,
         even if the cluster was previously started (the user can still force the
         cluster to start with neoctl). Any node lost during verification also
         causes the master to go back to recovery.
      
      Obviously, the protocol has been changed to split the LastIDs packet and
      introduce a new Recovery packet, since it does not make sense anymore to ask
      for last ids during recovery.
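
The rule that keeps the master in RECOVERING state can be sketched as follows. This is only an illustration under stated assumptions: `must_stay_recovering`, the dictionary layout and the node names are hypothetical, and the real master tracks this state differently.

```python
def must_stay_recovering(truncate_by_node, reference_node):
    """Tell whether the master should remain in RECOVERING state.

    truncate_by_node: hypothetical mapping of node id -> truncate tid stored
        by that storage node (None when no truncation is pending).
    reference_node: id of the node with the latest partition table.
    """
    expected = truncate_by_node[reference_node]
    # Any node reporting a different value has not caught up yet.
    return any(tid != expected for tid in truncate_by_node.values())


# s2 has not stored the truncate tid yet: recovery cannot finish.
pending = must_stay_recovering({"s1": 100, "s2": None}, "s1")

# All nodes agree with the reference node: recovery may proceed.
agreed = must_stay_recovering({"s1": 100, "s2": 100}, "s1")
```

Comparing against the node with the latest partition table, rather than taking a majority, follows the wording of the commit message; everything else in the sketch is an assumption.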
  20. 30 Nov, 2015 4 commits
    • Perform DB truncation during recovery, send PT to storages before verification · 3e3eab5b
      Julien Muchembled authored
      Currently, the database may only be truncated when leaving backup mode, but
      the issue will be the same when neoctl gets a new command to truncate at an
      arbitrary tid: we want to be sure that all nodes are truncated before anything
      else.
      
      Therefore, we stop sending Truncate orders before stopping operation because
      nodes could fail/exit before actually processing them. Truncation must also
      happen before asking nodes for their last ids.
      
      With this commit, if a truncation is requested:
      - this is always the first thing done when a storage node connects to the
        primary master during the RECOVERING phase,
      - and the cluster does not start automatically if there are missing nodes,
        unless an admin forces it.
      
      Other changes:
      - Connections to storage nodes don't need to be aborted anymore when leaving
        backup mode.
      - The master always initiates communication when a storage node identifies,
        which simplifies code and reduces the number of exchanged packets.
    • master: fix possible blockage during recovery after a storage disconnection · 2485f151
      Julien Muchembled authored
      At some point, the master asks a storage node for its partition table. If
      this node is lost before the master gets an answer, another node (or the same
      one if it comes back) must be asked.
      
      Before this change, the master node had to be restarted.
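
The retry behaviour described above might look like the following sketch. The `PartitionTableRequest` class is hypothetical and deliberately simplified (it asks nodes in connection order); it is not the master's actual recovery code.

```python
class PartitionTableRequest:
    """Hypothetical tracker for the single pending partition table request."""

    def __init__(self):
        self.candidates = []  # connected storage nodes, in connection order
        self.asked = None     # node currently being asked, if any

    def connected(self, node):
        self.candidates.append(node)
        if self.asked is None:
            # Nobody is being asked (first node, or the asked one was lost).
            self.asked = node

    def lost(self, node):
        self.candidates.remove(node)
        if self.asked == node:
            # The answer will never arrive: ask another node if there is one,
            # otherwise wait for the next connection.
            self.asked = self.candidates[0] if self.candidates else None


request = PartitionTableRequest()
request.connected("s1")
request.connected("s2")
request.lost("s1")  # s1 disappears before answering
```

After `s1` is lost, `request.asked` is `"s2"`, so recovery no longer depends on a node that will never answer.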
    • master: last tid/oid after recovery/verification · dec81519
      Julien Muchembled authored
      The important bugfix is to update the last oid when the master verifies a
      transaction with new oids.
      
      By resetting the transaction manager at the beginning of the recovery phase,
      it becomes possible to avoid tid/oid holes:
      - by reallocating previously unused allocated oids
      - when going back "in the past", i.e. reverting to an older version of the
        database (with fewer oids) and/or adjusting the clock
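
A minimal sketch of the idea, assuming integer tids and oids for readability. The `TransactionManager` class below and its method names are hypothetical and only mimic the behaviour described in this message.

```python
class TransactionManager:
    """Hypothetical, simplified tracker of the last allocated tid and oid."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Called at the beginning of the recovery phase.
        self.last_tid = 0
        self.last_oid = 0

    def verified(self, tid, oids):
        # The bugfix: a verified transaction may contain new oids, so the
        # last oid must be updated along with the last tid.
        self.last_tid = max(self.last_tid, tid)
        if oids:
            self.last_oid = max(self.last_oid, max(oids))

    def new_oid(self):
        self.last_oid += 1
        return self.last_oid


tm = TransactionManager()
tm.verified(5, [1, 2, 7])
first = tm.new_oid()

# Going back to an older database: reset, then verify what is really there.
tm.reset()
tm.verified(3, [1, 2])
second = tm.new_oid()
```

Because the manager is reset before verification, the next oid after reverting is allocated right after the highest oid actually present, instead of continuing from a stale counter.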
    • Go back/stay in RECOVERING state when the partition table can't be operational · e1f9a7da
      Julien Muchembled authored
      This fixes several cases where the partition table could become corrupt and
      the whole cluster could get stuck in VERIFYING state.
      
      This also reduces the probability of having out-of-date cells when restarting
      several storage nodes simultaneously.
      
      Lastly, if a master node becomes primary again, the cluster must not be
      started automatically if nodes with readable cells are missing, in order to
      avoid a split of the database. This could happen if this master node was
      previously forced to start it.
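
The automatic-start condition can be sketched as below. The function name and the data layout are assumptions made for this example, not the master's real API.

```python
def may_start_automatically(readable_cells, connected_nodes):
    """Tell whether the cluster may start without an admin forcing it.

    readable_cells: hypothetical mapping of partition number -> set of node
        ids holding a readable cell for that partition.
    connected_nodes: ids of the storage nodes currently connected.
    """
    # Every node that holds a readable cell must be present, even if the
    # remaining nodes would be enough to serve all partitions.
    needed = set().union(*readable_cells.values())
    return needed <= set(connected_nodes)


cells = {0: {"s1"}, 1: {"s1", "s2"}}
all_present = may_start_automatically(cells, {"s1", "s2"})
s2_missing = may_start_automatically(cells, {"s1"})
```

In the second call, `s1` alone could serve both partitions, yet the check fails: starting without `s2` is exactly the situation that could lead to two diverging histories.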