  1. 07 Nov, 2018 1 commit
  2. 05 Nov, 2018 1 commit
  3. 30 May, 2018 1 commit
  4. 31 Mar, 2017 1 commit
  5. 23 Mar, 2017 1 commit
    • storage: in deadlock avoidance, fix performance issue that could freeze the cluster · 1280f73e
      Julien Muchembled authored
      In the worst case, with many clients trying to lock the same oids,
      the cluster could enter an infinite cascade of deadlocks.
      
      Here is an overview with 3 storage nodes and 3 transactions:
      
       S1     S2     S3     order of locking tids          # abbreviations:
       l1     l1     l2     123                            #  l: lock
       q23    q23    d1q3   231                            #  d: deadlock triggered
       r1:l3  r1:l2  (r1)   # for S3, we still have l2     #  q: queued
       d2q1   q13    q13    312                            #  r: rebase
      
      Above, we show what happens when a random transaction gets a lock just
      after another is rebased. Here, the result is that the last 2 lines are
      a permutation of the first 2, and with bad luck this can repeat
      indefinitely.
      
      This commit reduces the probability of deadlock by processing delayed
      stores/checks in the order of their locking tid. In the above example,
      S1 would give the lock to 2 when 1 is rebased, and 2 would vote successfully.
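The fix can be sketched as a min-heap of delayed operations keyed by locking tid, so the oldest transaction is always served first. This is an illustrative sketch, not NEO's actual code; class and method names are assumptions.

```python
# Illustrative sketch: delayed stores/checks are replayed in ascending
# locking-tid order, so the lock goes to the oldest waiting transaction.
import heapq

class DelayedStores:
    def __init__(self):
        self._heap = []

    def delay(self, locking_tid, oid):
        # remember a store that could not take the write lock yet
        heapq.heappush(self._heap, (locking_tid, oid))

    def replay(self):
        # pop in ascending locking tid: transaction 2 is served before 3
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap))
        return order
```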
  6. 27 Feb, 2017 1 commit
    • Fix oids remaining write-locked forever · 9b33b1db
      Julien Muchembled authored
      This happened in 2 cases:
      - Commit a4c06242 ("Review aborting of
        transactions") introduced a race condition causing oids to remain
        write-locked forever after the transaction modifying them is aborted.
      - An unfinished transaction is not locked/unlocked during tpc_finish: oids
        must be unlocked when being notified that the transaction is finished.
  7. 21 Feb, 2017 3 commits
    • Implement deadlock avoidance · 092992db
      Julien Muchembled authored
      This is a first version with several optimizations possible:
      - improve EventQueue (or implement a specific queue) to minimize deadlocks
      - turn the RebaseObject packet into a notification
      
      Sorting oids could also be useful to reduce the probability of
      deadlocks, but that would never be enough to avoid them completely,
      even with a single storage. For example:
      
      1. C1 does a first store (x or y)
      2. C2 stores x and y; one is delayed
      3. C1 stores the other -> deadlock
         When solving the deadlock, the data of the first store may only
         exist on the storage.
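The interleaving above can be traced with a hypothetical lock-table sketch (not NEO's API): a store either takes the write lock on an oid or is delayed behind the current holder, and after step 3 each transaction waits on the other.

```python
# Hypothetical lock-table sketch of the example above (not NEO's API):
# a store either takes the write lock on an oid or is delayed.
locks = {}      # oid -> ttid currently holding the write lock
delayed = []    # (ttid, oid) pairs waiting for the lock

def store(ttid, oid):
    if oid not in locks:
        locks[oid] = ttid
        return "locked"
    delayed.append((ttid, oid))
    return "delayed"
```

Running the three steps from the example: C1 locks x, C2's store of x is delayed while its store of y is locked, and C1's store of y is then delayed too, so each transaction waits on a lock the other holds.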
      
      2 functional tests are removed because they're redundant,
      either with ZODB tests or with the new threaded tests.
    • Fixes/improvements to EventQueue · cc8d0a7c
      Julien Muchembled authored
      - Make sure that errors while processing a delayed packet are reported to the
        connection that sent this packet.
      - Provide a mechanism to process events for the same connection in
        chronological order.
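One common way to get chronological ordering is a heap keyed by a monotonically increasing sequence number. This is a minimal sketch under assumed names, not NEO's actual EventQueue.

```python
# Minimal sketch (assumed API, not NEO's actual EventQueue): a sequence
# number records arrival order, so delayed events for a connection are
# replayed chronologically.
import heapq
import itertools

class EventQueue:
    def __init__(self):
        self._queue = []
        self._counter = itertools.count()

    def queueEvent(self, func, conn, args=()):
        # the sequence number ties each event to its arrival time
        heapq.heappush(self._queue, (next(self._counter), func, conn, args))

    def executeQueuedEvents(self):
        while self._queue:
            _, func, conn, args = heapq.heappop(self._queue)
            func(conn, *args)
```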
  8. 14 Feb, 2017 3 commits
  9. 02 Feb, 2017 4 commits
  10. 18 Jan, 2017 1 commit
  11. 23 Dec, 2016 1 commit
  12. 27 Nov, 2016 2 commits
  13. 15 Nov, 2016 1 commit
    • backup: Teach cluster in BACKUPING state to also serve regular ZODB clients in read-only mode · d4944062
      Kirill Smelkov authored
      A backup cluster for tids <= backup_tid has all data to provide regular
      read-only ZODB service. Having regular ZODB access to the data can be
      handy e.g. for externally verifying data for consistency between
      main and backup clusters. Peeking around without disturbing the main
      cluster might also be useful sometimes.
      
      In this patch:
      
      - master & storage nodes are taught:
      
          * to instantiate read-only or regular client service handler depending on cluster state:
            RUNNING   -> regular
            BACKINGUP -> read-only
      
          * in read-only client handler:
            + to reject write-related operations
            + to provide read operations, but adjust semantics so that
              last_tid in the database appears to be backup_tid
      
      - new READ_ONLY_ACCESS protocol error code is introduced so that client can
        raise POSException.ReadOnlyError upon receiving it.
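The dispatch described above can be sketched as follows; class and method names are illustrative assumptions, not NEO's actual API.

```python
# Hedged sketch: the service handler is chosen from the cluster state,
# and the read-only variant rejects writes and reports backup_tid.
class ReadOnlyError(Exception):
    """Stand-in for the READ_ONLY_ACCESS protocol error."""

class ClientServiceHandler:
    def __init__(self, app):
        self.app = app
    def askLastTransaction(self):
        return self.app.last_tid
    def askStoreObject(self, *args):
        return "stored"

class ReadOnlyClientServiceHandler(ClientServiceHandler):
    def askLastTransaction(self):
        return self.app.backup_tid      # last_tid is clamped to backup_tid
    def askStoreObject(self, *args):
        raise ReadOnlyError             # write-related operations rejected

def chooseHandler(app):
    # RUNNING -> regular handler, BACKINGUP -> read-only handler
    cls = (ClientServiceHandler if app.cluster_state == "RUNNING"
           else ReadOnlyClientServiceHandler)
    return cls(app)
```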
      
      I have not implemented a back-channel for invalidations in read-only
      mode (yet?). This way, once a client connects to a cluster in backup
      state, it won't see new data fetched by the backup cluster from
      upstream after the client connected.
      
      The reason invalidations are not implemented is that, for now (imho),
      there is no off-hand ready infrastructure to get updates from the
      replicating node on a transaction-by-transaction basis (it currently
      only notifies when a whole batch is done). For consistency verification
      (the main reason for this patch) we also don't need invalidations to
      work, as in that task we always connect afresh to the backup. So I
      simply put relevant TODOs about invalidations for now.
      
      The patch is not very polished but should work.
      
      /reviewed-on nexedi/neoppod!4
  14. 01 Aug, 2016 1 commit
  15. 22 Mar, 2016 1 commit
  16. 25 Jan, 2016 1 commit
  17. 30 Nov, 2015 1 commit
    • Minimize the amount of work during tpc_finish · 7eb7cf1b
      Julien Muchembled authored
      NEO did not ensure that all data and metadata were written to disk
      before tpc_finish, so it was, for example, vulnerable to ENOSPC errors.
      In other words, some work had to be moved to tpc_vote:
      
      - In tpc_vote, all involved storage nodes are now asked to write all metadata
        to ttrans/tobj and _commit_. Because the final tid is not known yet, the tid
        column of ttrans and tobj now contains NULL and the ttid respectively.
      
      - In tpc_finish, AskLockInformation is still required for read locking,
        ttrans.tid is updated with the final value and this change is _committed_.
      
      - The verification phase is greatly simplified, more reliable and faster. For
        all voted transactions, we can know if a tpc_finish was started by getting
        the final tid from the ttid, either from ttrans or from trans. And we know
        that such transactions can't be partial so we don't need to check oids.
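The two-step metadata write above can be sketched with sqlite3 standing in for MySQL and an assumed minimal schema: tpc_vote durably commits ttrans with tid = NULL, and tpc_finish fills in the final tid.

```python
# Sketch of the two-step write described above (sqlite3 in place of
# MySQL, schema is an assumption): tpc_vote commits all metadata with
# tid = NULL; tpc_finish only updates and commits the final tid.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ttrans (ttid INTEGER PRIMARY KEY, tid INTEGER)")

def tpc_vote(ttid):
    # all metadata is written and committed before tpc_finish starts,
    # so errors like ENOSPC surface here, where aborting is still easy
    db.execute("INSERT INTO ttrans VALUES (?, NULL)", (ttid,))
    db.commit()

def tpc_finish(ttid, tid):
    # only the final tid remains to be written and committed
    db.execute("UPDATE ttrans SET tid = ? WHERE ttid = ?", (tid, ttid))
    db.commit()
```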
      
      So in addition to minimizing the risk of failures during tpc_finish, we also
      fix a bug causing the verification phase to discard transactions with objects
      for which readCurrent was called.
      
      On performance side:
      
      - Although tpc_vote now asks all involved storages, instead of only those
        storing the transaction metadata, the client has been improved to do this
        in parallel. The additional commits are also all done in parallel.
      
      - A possible improvement to compensate for the additional commits is to
        delay the commit done by the unlock.
      
      - By minimizing the time to lock transactions, objects are read-locked
        for a much shorter period. This matters even more because locked
        transactions must be unlocked in the same order.
      
      Transactions with too many modified objects will now timeout inside
      tpc_vote instead of tpc_finish. Of course, such transactions may still
      cause other transactions to timeout in tpc_finish.
  18. 28 Aug, 2015 1 commit
    • storage: fix history() not waiting for oid to be unlocked · e27358d1
      Julien Muchembled authored
      This fixes a random failure in testClientReconnection:
      
      Traceback (most recent call last):
        File "neo/tests/threaded/test.py", line 754, in testClientReconnection
          self.assertTrue(cluster.client.history(x1._p_oid))
      failureException: None is not true
  19. 15 Jun, 2015 1 commit
  20. 21 May, 2015 1 commit
  21. 24 Jun, 2014 1 commit
  22. 07 Jan, 2014 1 commit
  23. 28 Oct, 2013 2 commits
  24. 20 Mar, 2012 1 commit
  25. 13 Mar, 2012 1 commit
  26. 24 Feb, 2012 1 commit
    • Implements backup using specialised storage nodes and relying on replication · 8e3c7b01
      Julien Muchembled authored
      Replication is also fully reimplemented:
      - It is no longer done on whole partitions.
      - It runs at the lowest priority so as not to degrade performance for
        client nodes.
      
      The schema of the MySQL table is changed to optimize storage layout:
      rows are now grouped by age, for good partial-replication performance.
      This certainly also speeds up simple loads/stores.
  27. 07 Feb, 2012 1 commit
  28. 17 Jan, 2012 1 commit
  29. 26 Oct, 2011 1 commit
  30. 11 Oct, 2011 2 commits
    • Fix protocol and DB schema so that storages can handle transactions of any size · d5c469be
      Julien Muchembled authored
      - Change protocol to use SHA1 for all checksums:
        - Use SHA1 instead of CRC32 for data checksums.
        - Use SHA1 instead of MD5 for replication.
      
      - Change DatabaseManager API so that backends can store raw data separately from
        object metadata:
        - When processing AskStoreObject, call the backend to store the data
          immediately, instead of keeping it in RAM or in the temporary object table.
          Data is then referenced only by its checksum.
          Without such a change, the storage could fail to store the
          transaction due to lack of RAM, or it could make the tpc_finish
          step very slow.
        - Backends have to store data in a separate space, and remove entries as soon
          as they get unreferenced. So they must have an index of checksums in object
          metadata space. A new '_uncommitted_data' backend attribute keeps references
          of uncommitted data.
        - New methods: _pruneData, _storeData, storeData, unlockData
        - MySQL: change vertical partitioning of 'obj' by having data in a separate
          'data' table instead of using a shortened 'obj_short' table.
        - BTree: data is moved from '_obj' to a new '_data' btree.
      
      - Undo is optimized so that backpointers are not required anymore to fetch data:
        - The checksum of an object is None only when creation is undone.
        - Removed DatabaseManager methods: _getObjectData, _getDataTIDFromData
        - DatabaseManager: move some code from _getDataTID to findUndoTID so
          that _getDataTID only contains what's specific to the backend.
      
      - Removed because already covered by ZODB tests:
        - neo.tests.storage.testStorageDBTests.StorageDBTests.test__getDataTID
        - neo.tests.storage.testStorageDBTests.StorageDBTests.test__getDataTIDFromData
    • Allow NEO to store empty values · d90c5b83
      Julien Muchembled authored
      This changes how NEO stores undo information
      and how it is transmitted on the network.