1. 31 Mar, 2017 3 commits
  2. 30 Mar, 2017 1 commit
  3. 23 Mar, 2017 9 commits
  4. 22 Mar, 2017 1 commit
  5. 21 Mar, 2017 1 commit
  6. 20 Mar, 2017 2 commits
  7. 18 Mar, 2017 1 commit
    • master: fix crash when a transaction begins while a storage node starts operation · 781b4eb5
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/master/handlers/client.py", line 70, in askFinishTransaction
          conn.getPeerId(),
        File "neo/master/transactions.py", line 387, in prepare
          assert node_list, (ready, failed)
      AssertionError: (set([]), frozenset([]))
      
      Master log leading to the crash:
        PACKET    #0x0009 StartOperation                 > S1
        PACKET    #0x0004 BeginTransaction               < C1
        DEBUG     Begin <...>
        PACKET    #0x0004 AnswerBeginTransaction         > C1
        PACKET    #0x0001 NotifyReady                    < S1
      
      It was wrong to process BeginTransaction before receiving NotifyReady.
      
      The changes in the storage are cosmetic: the 'ready' attribute has become
      redundant with 'operational'.
  8. 17 Mar, 2017 3 commits
  9. 14 Mar, 2017 4 commits
  10. 07 Mar, 2017 1 commit
  11. 03 Mar, 2017 1 commit
    • qa: fix random failure of check_checkCurrentSerialInTransaction · fec9a3a5
      Julien Muchembled authored
      Generators are not thread-safe:
      
      Exception in thread T2:
      Traceback (most recent call last):
        ...
        File "ZODB/tests/StorageTestBase.py", line 157, in _dostore
          r2 = self._storage.tpc_vote(t)
        File "neo/client/Storage.py", line 95, in tpc_vote
          return self.app.tpc_vote(transaction)
        File "neo/client/app.py", line 507, in tpc_vote
          self.waitStoreResponses(txn_context)
        File "neo/client/app.py", line 500, in waitStoreResponses
          _waitAnyTransactionMessage(txn_context)
        File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
          self._waitAnyMessage(queue, block=block)
        File "neo/client/app.py", line 128, in _waitAnyMessage
          conn, packet, kw = get(block)
        File "neo/lib/locking.py", line 203, in get
          self._lock()
        File "neo/tests/threaded/__init__.py", line 590, in _lock
          for i in TIC_LOOP:
      ValueError: generator already executing
      
      ======================================================================
      FAIL: check_checkCurrentSerialInTransaction (neo.tests.zodb.testBasic.BasicTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "neo/tests/zodb/testBasic.py", line 33, in check_checkCurrentSerialInTransaction
          super(BasicTests, self).check_checkCurrentSerialInTransaction()
        File "ZODB/tests/BasicStorage.py", line 294, in check_checkCurrentSerialInTransaction
          utils.load_current(self._storage, b'\0\0\0\0\0\0\0\xf4')[1])
      failureException: False is not true
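
The ValueError above is what CPython raises whenever a generator is resumed while it is already running. A minimal Python 3 sketch of the failure and of one common remedy (serializing `next()` calls with a lock) follows. `LockedIterator`, `demo_already_executing` and `consume_concurrently` are hypothetical names used for illustration only; this is not NEO code and not necessarily how the commit fixed the test helper.

```python
import threading


class LockedIterator:
    """Hypothetical helper: serialize next() calls on a shared iterator."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may resume the underlying generator.
        with self._lock:
            return next(self._it)


def demo_already_executing():
    """Trigger the error deterministically by re-entering a running generator."""
    def gen():
        yield next(g)  # resumes g while g is still executing

    g = gen()
    try:
        next(g)
    except ValueError as exc:
        return str(exc)


def consume_concurrently(n_items=1000, n_threads=4):
    """Let several threads drain one generator through the locked wrapper."""
    shared = LockedIterator(i for i in range(n_items))
    seen = []

    def worker():
        for item in shared:
            seen.append(item)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(seen)
```

Without the lock, two threads calling `next()` on the same generator at the same moment can hit the same `ValueError` nondeterministically, which matches the random nature of the test failure.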
  12. 02 Mar, 2017 2 commits
    • storage: fix PT updates in case of late AnswerUnfinishedTransactions · a74937c8
      Julien Muchembled authored
      This is done by moving
              self.replicator.populate()
      after the switch to MasterOperationHandler, so that the latter is not delayed.
      
      This change comes with some refactoring of the main loop,
      to clean up app.checker and app.replicator properly (like app.tm).
      
      Another option could have been to process notifications with the last handler,
      instead of the first one. But if possible, cleaning up the whole code so that
      handlers are no longer delayed looks like the best option.
    • mysql: code clean up · 041a3eda
      Julien Muchembled authored
  13. 27 Feb, 2017 3 commits
    • Fix oids remaining write-locked forever · 9b33b1db
      Julien Muchembled authored
      This happened in 2 cases:
      - Commit a4c06242 ("Review aborting of transactions") introduced a race
        condition causing oids to remain write-locked forever after the
        transaction modifying them is aborted.
      - An unfinished transaction is not locked/unlocked during tpc_finish: oids
        must be unlocked when being notified that the transaction is finished.
    • storage: fix bug not replicating unfinished transactions when the last ones are aborted · 7f754b5e
      Julien Muchembled authored
      This was found by the first assertion of answerRebaseObject (client) because
      a storage node missed a few transactions and reported a conflict with an older
      serial than the one being stored: this must never happen and this commit adds a
      more generic assertion on the storage side.
      
      The above case is when the "first phase" of replication of a partition
      (all history up to the tid before unfinished transactions) ended after
      the unfinished transactions were finished: this was a corruption bug,
      where UP_TO_DATE cells could miss data.
      
      Otherwise, if the "first phase" ended before, then the partition remained stuck
      in OUT_OF_DATE state. Restarting the storage node was enough to recover.
    • client: fix an AssertionError while processing late AnswerRebaseObject · 44452395
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/client/app.py", line 507, in tpc_vote
          self.waitStoreResponses(txn_context)
        File "neo/client/app.py", line 500, in waitStoreResponses
          _waitAnyTransactionMessage(txn_context)
        File "neo/client/app.py", line 150, in _waitAnyTransactionMessage
          self._handleConflicts(txn_context)
        File "neo/client/app.py", line 474, in _handleConflicts
          self._store(txn_context, oid, conflict_serial, data)
        File "neo/client/app.py", line 410, in _store
          self._waitAnyTransactionMessage(txn_context, False)
        File "neo/client/app.py", line 145, in _waitAnyTransactionMessage
          self._waitAnyMessage(queue, block=block)
        File "neo/client/app.py", line 133, in _waitAnyMessage
          _handlePacket(conn, packet, kw)
        File "neo/lib/threaded_app.py", line 133, in _handlePacket
          handler.dispatch(conn, packet, kw)
        File "neo/lib/handler.py", line 72, in dispatch
          method(conn, *args, **kw)
        File "neo/client/handlers/storage.py", line 122, in answerRebaseObject
          assert txn_context.conflict_dict[oid] == (serial, conflict)
      AssertionError
      
      Scenario:
      0. unanswered rebase from S2
      1. conflict resolved between t1 and t2 -> S1 & S2
      2. S1 reports a new conflict
      3. S2 answers the rebase:
         the returned serial (t1) is smaller than the one in conflict_dict (t2)
      4. S2 reports the same conflict as in 2
  14. 24 Feb, 2017 2 commits
    • storage: fix an AssertionError in internal replication · 560e4fb1
      Julien Muchembled authored
      Traceback (most recent call last):
        ...
        File "neo/storage/handlers/storage.py", line 111, in answerFetchObjects
          self.app.replicator.finish()
        File "neo/storage/replicator.py", line 370, in finish
          self._nextPartition()
        File "neo/storage/replicator.py", line 279, in _nextPartition
          assert app.pt.getCell(offset, app.uuid).isOutOfDate()
      AssertionError
      
      The scenario is:
      1. partition A: start of replication, with unfinished transactions
      2. partition A: all unfinished transactions are finished
      3. partition A: end of replication with ReplicationDone notification
      4. replication of partition B
      5. partition A: AssertionError when starting replication
      
      The bug is in step 3: partition A is only partially replicated, so the
      storage node must not notify the master.
  15. 23 Feb, 2017 1 commit
  16. 21 Feb, 2017 5 commits
    • Remove obsolete comment · df01cdcf
      Julien Muchembled authored
    • Bump protocol version · c42baaef
      Julien Muchembled authored
    • Implement deadlock avoidance · 092992db
      Julien Muchembled authored
      This is a first version with several optimizations possible:
      - improve EventQueue (or implement a specific queue) to minimize deadlocks
      - turn the RebaseObject packet into a notification
      
      Sorting oids could also be useful to reduce the probability of deadlocks,
      but that would never be enough to avoid them completely, even if there's a
      single storage. For example:
      
      1. C1 does a first store (x or y)
      2. C2 stores x and y; one is delayed
      3. C1 stores the other -> deadlock
         When solving the deadlock, the data of the first store may only
         exist on the storage.
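
The three steps above can be replayed in a toy model to see where the wait-for cycle appears. The sketch below is an illustration under simplifying assumptions (a single storage, one write lock per oid, a client keeps sending stores while an earlier one is delayed); `first_deadlock` is a hypothetical helper, not part of NEO, and it only detects the cycle rather than implementing the avoidance protocol of this commit.

```python
def first_deadlock(stores):
    """Replay (client, oid) store requests in arrival order.

    Returns the list of clients forming the first wait-for cycle,
    or None if no deadlock occurs.
    """
    owner = {}      # oid -> client currently holding its write lock
    waits_for = {}  # delayed client -> client holding the lock it needs
    for client, oid in stores:
        holder = owner.setdefault(oid, client)
        if holder == client:
            continue  # lock granted (or already held)
        # The store is delayed: record who this client is waiting for.
        waits_for[client] = holder
        # Follow wait-for edges; coming back to `client` means a cycle.
        chain = [client]
        while chain[-1] in waits_for:
            nxt = waits_for[chain[-1]]
            if nxt == client:
                return chain
            chain.append(nxt)
    return None


# The scenario from the commit message, with C1 storing x first:
scenario = [("C1", "x"), ("C2", "x"), ("C2", "y"), ("C1", "y")]
# Same stores when C1 happens to lock both oids before C2 gets any:
no_cycle = [("C1", "x"), ("C2", "x"), ("C1", "y"), ("C2", "y")]
```

In `scenario`, C2 is delayed on x (held by C1) but obtains y, so C1's later store of y closes the cycle. In `no_cycle`, C2 simply waits behind C1 for both oids.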
      
      2 functional tests are removed because they're redundant,
      either with ZODB tests or with the new threaded tests.
    • Fixes/improvements to EventQueue · cc8d0a7c
      Julien Muchembled authored
      - Make sure that errors while processing a delayed packet are reported to the
        connection that sent this packet.
      - Provide a mechanism to process events for the same connection in
        chronological order.
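
The two properties listed above can be sketched with a toy queue. Everything below (`ToyEventQueue`, `ToyConnection`, `Delayed`, `submit`, `retry`) is hypothetical and written for illustration; it assumes that a handler signals "try again later" by raising an exception, and it does not reproduce the actual EventQueue API in NEO.

```python
from collections import deque


class Delayed(Exception):
    """Raised by a handler that cannot process its packet yet (assumption)."""


class ToyConnection:
    def __init__(self, name):
        self.name = name
        self.errors = []

    def report_error(self, exc):
        # Stand-in for answering the peer with an error packet.
        self.errors.append(type(exc).__name__)


class ToyEventQueue:
    def __init__(self):
        self._queue = deque()  # (conn, handler, args), oldest first

    def _run(self, conn, handler, args):
        """Return False if the handler asked to be delayed."""
        try:
            handler(conn, *args)
        except Delayed:
            return False
        except Exception as exc:
            # Errors go to the connection that sent the packet.
            conn.report_error(exc)
        return True

    def submit(self, conn, handler, *args):
        # Keep per-connection order: never overtake an event already
        # delayed for the same connection.
        has_pending = any(c is conn for c, _, _ in self._queue)
        if has_pending or not self._run(conn, handler, args):
            self._queue.append((conn, handler, args))

    def retry(self):
        """Retry delayed events once, oldest first."""
        blocked = set()  # connections whose earlier event is still delayed
        for _ in range(len(self._queue)):
            conn, handler, args = self._queue.popleft()
            if id(conn) in blocked or not self._run(conn, handler, args):
                blocked.add(id(conn))
                self._queue.append((conn, handler, args))
```

A short usage example: a store on a locked oid is delayed, later events from the same connection wait behind it, and an error raised during the retry lands on the original sender.

```python
locked, log = {"x"}, []

def store(conn, oid):
    if oid in locked:
        raise Delayed
    log.append((conn.name, oid))

def bad(conn):
    raise ValueError("boom")

q, c1, c2 = ToyEventQueue(), ToyConnection("C1"), ToyConnection("C2")
q.submit(c1, store, "x")   # delayed: x is locked
q.submit(c1, store, "y")   # queued behind the delayed event of C1
q.submit(c2, store, "y")   # other connection: processed immediately
q.submit(c1, bad)          # queued; will fail when retried
locked.clear()
q.retry()
```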