client: fix load error during conflict resolution in case of late invalidation
-
Owner
Why are we subject to last_tid during conflict resolution? I thought conflict resolution always happened outside of the transaction snapshot by using loadSerial. Then the cache should miss and the load should reach the storage without needing to explicitly wait for the invalidation.
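For context, a minimal sketch of the path I have in mind (names are illustrative, not NEO's actual client code):

```python
# Illustration only: conflict resolution asks for one exact revision, so it
# should bypass the snapshot machinery, and thus last_tid, entirely.
def load_for_conflict_resolution(client, oid, committed_serial):
    # loadSerial targets an exact serial; the snapshot cache, which is only
    # valid up to last_tid, is expected to miss, and the request should go
    # straight to a storage node.
    return client.loadSerial(oid, committed_serial)
```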
On a related note, I see `tryToResolveConflict` supports an extra argument, `committedData`, which would allow the storage to send us the conflicting data without requiring an extra round-trip. It may be nice to make use of this.
-
Owner
I found this while writing `testPartialConflict` in 94e4f469d087159ee67928ae6e63b3b5cd403d6b (a previous version of 74c69d54). In this test:
- I delay communications with the master
- 1 storage node is being disconnected by the master because it missed a transaction (actually the one that caused the conflict)
IOW:
- 2 clients C1, C2
- 2 storages S1, S2 replicated
- network cut between C2 & S1 just before the last committed transaction
- network delay between C1 & M
- TCP FIN delayed or dropped from M to S1
The client is not written to load data past last_tid. Without this, it fails in 3 ways:
- Failed assertion in `_loadFromCache`.
  Easy to solve: `return None` if `result or result[1] < at_tid`. That's what I did at the beginning (see the sketch after this list).
- Failed assertion when storing the value in cache (assuming `_loadFromStorage` works).
  Also easy: just `return` there if `item.tid < tid`. But this commit is better with respect to the cache.
- A storage may fail to read the data.
  A pending notification that the storage is down may also be late, and `_loadFromStorage` may try it first. When this node is not yet aware that it's no longer up-to-date, it happily answers that such an oid does not exist at this revision. I don't want to change `_loadFromStorage` to ignore this error when a second load from another cell works: it looks too difficult to make sure this is reliable.
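For ways 1 and 2, a minimal sketch of what those easy workarounds would look like; the flat `dict` cache and `CacheEntry` below are stand-ins for illustration, not NEO's actual client cache:

```python
from collections import namedtuple

# Hypothetical cache entry; this only illustrates turning the two failed
# assertions into silent no-ops.
CacheEntry = namedtuple('CacheEntry', 'data tid')

def load_from_cache(cache, oid, at_tid):
    """Way 1: treat a stale entry as a cache miss instead of asserting."""
    entry = cache.get(oid)
    if entry is None or entry.tid < at_tid:
        return None  # fall through to _loadFromStorage
    return entry

def store_in_cache(cache, oid, item, tid):
    """Way 2: silently ignore an item older than the expected tid."""
    if item.tid < tid:
        return  # drop instead of asserting
    cache[oid] = item
```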
Dealing with an outdated partition table is actually a subject we'll have to review.
-
Owner
> I see `tryToResolveConflict` supports an extra argument, `committedData`, which would allow the storage to send us the conflicting data without requiring an extra round-trip. It may be nice to make use of this.

2 problems:
- the data may already be loaded by another thread
- with replicas, the data would be transferred several times on the network
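For reference, a rough sketch of what passing `committedData` would save, assuming the resolution hook keeps ZODB's usual argument order (the merge itself is elided and simply raises here):

```python
from ZODB.POSException import ConflictError

def try_to_resolve_conflict(client, oid, committed_serial, old_serial,
                            new_pickle, committed_data=b''):
    if not committed_data:
        # Without committedData, the client has to fetch the committed
        # revision itself: this is the extra round-trip mentioned above.
        committed_data = client.loadSerial(oid, committed_serial)
    # The actual merge (unpickling the three states and calling the class'
    # _p_resolveConflict) is out of scope for this sketch.
    raise ConflictError(oid=oid, serials=(committed_serial, old_serial))
```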
-
Owner
> The client is not written to load data past last_tid. Without this, it fails in 3 ways:
Ways 1 and 2 are related to the current cache's rigidity. I happened to drop both asserts in my old load-lock-removal patchset.
For 3, I think the client has to try the next storage on such a failure during a loadSerial, as it is not supposed to be scoped to any running transaction AFAIK. It must indeed be legal for a storage to fall out-of-date without either it or the client knowing it through a partition table update.
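Concretely, something like the sketch below; the cell iteration and failover policy are assumptions, not NEO's actual connection-pool code:

```python
from ZODB.POSException import POSKeyError

def load_serial_with_failover(cells, oid, serial):
    """Ask each storage node holding the partition until one succeeds.

    A node that silently fell out-of-date may wrongly answer that the
    revision does not exist, so only give up once every cell was tried.
    """
    last_error = None
    for cell in cells:
        try:
            return cell.loadSerial(oid, serial)
        except POSKeyError as e:
            # Possibly an outdated node that missed the transaction:
            # try the next replica before propagating the error.
            last_error = e
    if last_error is None:
        last_error = POSKeyError(oid)
    raise last_error
```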
> 2 problems:
> - the data may already be loaded by another thread
Not sure this is very relevant: the more clients there are, the less likely it is that the current one caused the invalidation, so the less likely the data is in the local cache. Also, the more clients there are, the more likely latency is high (actual networking becomes more likely, instead of localhost use). And the fewer clients there are, the less likely the database is to be the bottleneck (client-side CPU being more likely).
> - with replicas, the data would be transferred several times on the network
Good point, although I'm not sure of the cost in practice (how many replicas will be in use? how long does an individual pickle take to be transferred? what is the typical latency?).
-
Owner
> as it is not supposed to be scoped to any running transaction
What I mean here is that while we can expect consistency across storages while we are inside a transaction snapshot (and I believe this still holds when partition table updates are lagging: if a storage was good enough for a given transaction before whatever caused it to become outdated, it should still be good enough between the time it became outdated and the time the client receives the updated partition table), we should probably not expect consistency outside of a transaction snapshot.
-
Owner
I forgot another reason. The commit mentioned in my previous comment describes the same scenario in TransactionManager.lockObject (storage), which now rejects multiple stores that aren't undos: the master disconnects S1 (node/pt notifications) before the invalidation* for the conflicting transaction (missed by S1), so this sync guarantees that the client does not store the same oid twice to S1 (either down, or up with a new TM).
So reverting this commit also means that the storage should reset itself in the above case, something like `app.master_conn.close()`.

* actually, just before locking the transaction (see TransactionManager.prepare), to make sure that the verification phase works with a new PT.
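A minimal sketch of that reset, assuming the storage app keeps its master connection in `app.master_conn` as in the snippet above; the hook itself is hypothetical:

```python
def reset_on_duplicate_store(app):
    # Hypothetical hook for the case described above: the node accepted a
    # second store of the same oid while its partition table was stale.
    # Dropping the master connection sends it back through identification
    # and verification, where it gets a fresh PT before locking anything.
    app.master_conn.close()
```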
-
Owner
I'm going to revert the raise in case of multiple stores of an oid. There's one valid case: when the previous undo resulted in a resolved conflict.