  1. 07 Nov, 2018 1 commit
  2. 05 Nov, 2018 1 commit
  3. 30 May, 2018 1 commit
  4. 31 Mar, 2017 1 commit
  5. 23 Mar, 2017 1 commit
    • storage: in deadlock avoidance, fix performance issue that could freeze the cluster · 1280f73e
      Julien Muchembled authored
      In the worst case, with many clients trying to lock the same oids,
      the cluster could enter an infinite cascade of deadlocks.
      
      Here is an overview with 3 storage nodes and 3 transactions:
      
       S1     S2     S3     order of locking tids          # abbreviations:
       l1     l1     l2     123                            #  l: lock
       q23    q23    d1q3   231                            #  d: deadlock triggered
       r1:l3  r1:l2  (r1)   # for S3, we still have l2     #  q: queued
       d2q1   q13    q13    312                            #  r: rebase
      
      Above, we show what happens when a random transaction gets a lock just
      after another is rebased. Here, the result is that the last 2 lines are
      a permutation of the first 2, and with bad luck this can repeat
      indefinitely.
      
      This commit reduces the probability of deadlock by processing delayed
      stores/checks in the order of their locking tid. In the above example,
      S1 would give the lock to 2 when 1 is rebased, and 2 would vote successfully.
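The fix can be sketched as a min-heap of delayed operations keyed by locking tid, so the oldest transaction is always served first. This is an illustrative sketch, not NEO's actual code; class and method names are assumptions.

```python
# Illustrative sketch: delayed stores/checks are replayed in ascending
# locking-tid order, so the lock goes to the oldest waiting transaction.
import heapq

class DelayedStores:
    def __init__(self):
        self._heap = []

    def delay(self, locking_tid, oid):
        # remember a store that could not take the write lock yet
        heapq.heappush(self._heap, (locking_tid, oid))

    def replay(self):
        # pop in ascending locking tid: transaction 2 is served before 3
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap))
        return order
```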
  6. 27 Feb, 2017 1 commit
    • Fix oids remaining write-locked forever · 9b33b1db
      Julien Muchembled authored
      This happened in 2 cases:
      - Commit a4c06242 ("Review aborting of
        transactions") introduced a race condition causing oids to remain
        write-locked forever after the transaction modifying them is aborted.
      - An unfinished transaction is not locked/unlocked during tpc_finish: oids
        must be unlocked when being notified that the transaction is finished.
  7. 21 Feb, 2017 3 commits
    • Implement deadlock avoidance · 092992db
      Julien Muchembled authored
      This is a first version with several optimizations possible:
      - improve EventQueue (or implement a specific queue) to minimize deadlocks
      - turn the RebaseObject packet into a notification
      
      Sorting oids could also be useful to reduce the probability of
      deadlocks, but that would never be enough to avoid them completely,
      even with a single storage. For example:
      
      1. C1 does a first store (x or y)
      2. C2 stores x and y; one is delayed
      3. C1 stores the other -> deadlock
         When solving the deadlock, the data of the first store may only
         exist on the storage.
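The interleaving above can be traced with a hypothetical lock-table sketch (not NEO's API): a store either takes the write lock on an oid or is delayed behind the current holder, and after step 3 each transaction waits on the other.

```python
# Hypothetical lock-table sketch of the example above (not NEO's API):
# a store either takes the write lock on an oid or is delayed.
locks = {}      # oid -> ttid currently holding the write lock
delayed = []    # (ttid, oid) pairs waiting for the lock

def store(ttid, oid):
    if oid not in locks:
        locks[oid] = ttid
        return "locked"
    delayed.append((ttid, oid))
    return "delayed"
```

Running the three steps from the example: C1 locks x, C2's store of x is delayed while its store of y is locked, and C1's store of y is then delayed too, so each transaction waits on a lock the other holds.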
      
      2 functional tests are removed because they're redundant,
      either with ZODB tests or with the new threaded tests.
    • Fixes/improvements to EventQueue · cc8d0a7c
      Julien Muchembled authored
      - Make sure that errors while processing a delayed packet are reported to the
        connection that sent this packet.
      - Provide a mechanism to process events for the same connection in
        chronological order.
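One common way to get chronological ordering is a heap keyed by a monotonically increasing sequence number. This is a minimal sketch under assumed names, not NEO's actual EventQueue.

```python
# Minimal sketch (assumed API, not NEO's actual EventQueue): a sequence
# number records arrival order, so delayed events for a connection are
# replayed chronologically.
import heapq
import itertools

class EventQueue:
    def __init__(self):
        self._queue = []
        self._counter = itertools.count()

    def queueEvent(self, func, conn, args=()):
        # the sequence number ties each event to its arrival time
        heapq.heappush(self._queue, (next(self._counter), func, conn, args))

    def executeQueuedEvents(self):
        while self._queue:
            _, func, conn, args = heapq.heappop(self._queue)
            func(conn, *args)
```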
  8. 14 Feb, 2017 3 commits
  9. 02 Feb, 2017 4 commits
  10. 18 Jan, 2017 1 commit
  11. 23 Dec, 2016 1 commit
  12. 27 Nov, 2016 2 commits
  13. 15 Nov, 2016 1 commit
    • backup: Teach cluster in BACKUPING state to also serve regular ZODB clients in read-only mode · d4944062
      Kirill Smelkov authored
      A backup cluster for tids <= backup_tid has all data to provide regular
      read-only ZODB service. Having regular ZODB access to the data can be
      handy e.g. for externally verifying data for consistency between
      main and backup clusters. Peeking around without disturbing the main
      cluster might also be useful sometimes.
      
      In this patch:
      
      - master & storage nodes are taught:
      
          * to instantiate read-only or regular client service handler depending on cluster state:
            RUNNING   -> regular
            BACKINGUP -> read-only
      
          * in read-only client handler:
            + to reject write-related operations
            + to provide read operations, but adjust semantics so that
              last_tid in the database appears to be backup_tid
      
      - new READ_ONLY_ACCESS protocol error code is introduced so that client can
        raise POSException.ReadOnlyError upon receiving it.
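The dispatch described above can be sketched as follows; class and method names are illustrative assumptions, not NEO's actual API.

```python
# Hedged sketch: the service handler is chosen from the cluster state,
# and the read-only variant rejects writes and reports backup_tid.
class ReadOnlyError(Exception):
    """Stand-in for the READ_ONLY_ACCESS protocol error."""

class ClientServiceHandler:
    def __init__(self, app):
        self.app = app
    def askLastTransaction(self):
        return self.app.last_tid
    def askStoreObject(self, *args):
        return "stored"

class ReadOnlyClientServiceHandler(ClientServiceHandler):
    def askLastTransaction(self):
        return self.app.backup_tid      # last_tid is clamped to backup_tid
    def askStoreObject(self, *args):
        raise ReadOnlyError             # write-related operations rejected

def chooseHandler(app):
    # RUNNING -> regular handler, BACKINGUP -> read-only handler
    cls = (ClientServiceHandler if app.cluster_state == "RUNNING"
           else ReadOnlyClientServiceHandler)
    return cls(app)
```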
      
      I have not implemented a back-channel for invalidations in read-only
      mode (yet?). This way, once a client connects to a cluster in backup
      state, it won't see new data fetched by the backup cluster from
      upstream after the client connected.
      
      The reason invalidations are not implemented is that, for now (imho),
      there is no off-hand ready infrastructure to get updates from the
      replicating node on a transaction-by-transaction basis (it currently
      only notifies when a whole batch is done). For consistency verification
      (the main reason for this patch) we also don't need invalidations to
      work, as in that task we always connect afresh to the backup. So I
      simply put relevant TODOs about invalidations for now.
      
      The patch is not very polished but should work.
      
      /reviewed-on nexedi/neoppod!4
  14. 01 Aug, 2016 1 commit
  15. 22 Mar, 2016 1 commit
  16. 25 Jan, 2016 1 commit
  17. 30 Nov, 2015 1 commit
    • Minimize the amount of work during tpc_finish · 7eb7cf1b
      Julien Muchembled authored
      NEO did not ensure that all data and metadata were written to disk
      before tpc_finish, so it was, for example, vulnerable to ENOSPC errors.
      In other words, some work had to be moved to tpc_vote:
      
      - In tpc_vote, all involved storage nodes are now asked to write all metadata
        to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
        column of ttrans and tobj now contains NULL and the ttid respectively.
      
      - In tpc_finish, AskLockInformation is still required for read locking,
        ttrans.tid is updated with the final value and this change is _committed_.
      
      - The verification phase is greatly simplified, more reliable and faster. For
        all voted transactions, we can know if a tpc_finish was started by getting
        the final tid from the ttid, either from ttrans or from trans. And we know
        that such transactions can't be partial so we don't need to check oids.
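The two-step metadata write above can be sketched with sqlite3 standing in for MySQL and an assumed minimal schema: tpc_vote durably commits ttrans with tid = NULL, and tpc_finish fills in the final tid.

```python
# Sketch of the two-step write described above (sqlite3 in place of
# MySQL, schema is an assumption): tpc_vote commits all metadata with
# tid = NULL; tpc_finish only updates and commits the final tid.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ttrans (ttid INTEGER PRIMARY KEY, tid INTEGER)")

def tpc_vote(ttid):
    # all metadata is written and committed before tpc_finish starts,
    # so errors like ENOSPC surface here, where aborting is still easy
    db.execute("INSERT INTO ttrans VALUES (?, NULL)", (ttid,))
    db.commit()

def tpc_finish(ttid, tid):
    # only the final tid remains to be written and committed
    db.execute("UPDATE ttrans SET tid = ? WHERE ttid = ?", (tid, ttid))
    db.commit()
```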
      
      So in addition to minimizing the risk of failures during tpc_finish, we also
      fix a bug causing the verification phase to discard transactions with objects
      for which readCurrent was called.
      
      On performance side:
      
      - Although tpc_vote now asks all involved storages, instead of only those
        storing the transaction metadata, the client has been improved to do this
        in parallel. The additional commits are also all done in parallel.
      
      - A possible improvement to compensate for the additional commits is to
        delay the commit done by the unlock.
      
      - By minimizing the time to lock transactions, objects are read-locked
        for a much shorter period. This matters even more because locked
        transactions must be unlocked in the same order.
      
      Transactions with too many modified objects will now timeout inside
      tpc_vote instead of tpc_finish. Of course, such transactions may still
      cause other transactions to timeout in tpc_finish.
  18. 28 Aug, 2015 1 commit
    • storage: fix history() not waiting for oid to be unlocked · e27358d1
      Julien Muchembled authored
      This fixes a random failure in testClientReconnection:
      
      Traceback (most recent call last):
        File "neo/tests/threaded/test.py", line 754, in testClientReconnection
          self.assertTrue(cluster.client.history(x1._p_oid))
      failureException: None is not true
  19. 15 Jun, 2015 1 commit
  20. 21 May, 2015 1 commit
  21. 24 Jun, 2014 1 commit
  22. 07 Jan, 2014 1 commit
  23. 28 Oct, 2013 2 commits
  24. 20 Mar, 2012 1 commit
  25. 13 Mar, 2012 1 commit
  26. 24 Feb, 2012 1 commit
    • Implements backup using specialised storage nodes and relying on replication · 8e3c7b01
      Julien Muchembled authored
      Replication is also fully reimplemented:
      - It is no longer done on whole partitions.
      - It runs at the lowest priority so as not to degrade performance for
        client nodes.
      
      The schema of the MySQL table is changed to optimize storage layout:
      rows are now grouped by age, for good partial-replication performance.
      This certainly also speeds up simple loads/stores.
  27. 07 Feb, 2012 1 commit
  28. 17 Jan, 2012 1 commit
  29. 26 Oct, 2011 1 commit
  30. 11 Oct, 2011 2 commits
    • Fix protocol and DB schema so that storages can handle transactions of any size · d5c469be
      Julien Muchembled authored
      - Change protocol to use SHA1 for all checksums:
        - Use SHA1 instead of CRC32 for data checksums.
        - Use SHA1 instead of MD5 for replication.
      
      - Change DatabaseManager API so that backends can store raw data separately from
        object metadata:
        - When processing AskStoreObject, call the backend to store the data
          immediately, instead of keeping it in RAM or in the temporary object table.
          Data is then referenced only by its checksum.
          Without such a change, the storage could fail to store the
          transaction due to lack of RAM, or it could make the tpc_finish
          step very slow.
        - Backends have to store data in a separate space, and remove entries as soon
          as they get unreferenced. So they must have an index of checksums in object
          metadata space. A new '_uncommitted_data' backend attribute keeps references
          of uncommitted data.
        - New methods: _pruneData, _storeData, storeData, unlockData
        - MySQL: change vertical partitioning of 'obj' by having data in a separate
          'data' table instead of using a shortened 'obj_short' table.
        - BTree: data is moved from '_obj' to a new '_data' btree.
      
      - Undo is optimized so that backpointers are not required anymore to fetch data:
        - The checksum of an object is None only when creation is undone.
        - Removed DatabaseManager methods: _getObjectData, _getDataTIDFromData
        - DatabaseManager: move some code from _getDataTID to findUndoTID so
          that _getDataTID only contains what's specific to the backend.
      
      - Removed because already covered by ZODB tests:
        - neo.tests.storage.testStorageDBTests.StorageDBTests.test__getDataTID
        - neo.tests.storage.testStorageDBTests.StorageDBTests.test__getDataTIDFromData
    • Allow NEO to store empty values · d90c5b83
      Julien Muchembled authored
      This changes how NEO stores undo information
      and how it is transmitted on the network.