WIP: component/mariadb: revive innodb_locks_unsafe_for_binlog
This is to revive innodb_locks_unsafe_for_binlog
for the latest stable MariaDB.
Performance
The following graph is the result of software/erp5/test/test/benchmarks.py
using the default REPEATABLE-READ
transaction isolation.
- 10.4-innodb-unsafe: with innodb_locks_unsafe_for_binlog=1
- 10.11-innodb: without patch
- 11.4-innodb: without patch
- 11.4-innodb-snapshot: without patch, start transaction always with
WITH CONSISTENT SNAPSHOT
- 11.4-innodb-unsafe: with patch here, with innodb_locks_unsafe_for_binlog=1
Q & A
innodb_locks_unsafe_for_binlog
is important for parallel indexing in ERP5 ?
Why (by @vpelletier) Here is how I remember understanding the performance impact of innodb_locks_unsafe_for_binlog
not being enabled: When deleting rows based on a condition which is not a strict exact match on the entire primary key (ex: DELETE FROM stock WHERE uid IN (...)
whereas stock
table has a second column in its primary key), mariadb must take a "gap lock" over the entire range, on the primary key, of each of each condition. In the stock
example this means not only locking each existing row for each given uid
, but also somewhere storing the information that any possible row until at least the next potential uid
(in practice probably uid
to uid
+ 1, even if uid
+ 1 does not exist in stock
, but reading the revert patches it seems it locks until the next actually existing row).
I believe these locks, intrinsically, may not be the main issue. I believe (but did not investigate at the time) that the issue is the fact these locks must then be checked by any concurrent query, which has a cost. Especially, I can guess per-row locks have to be stored very differently from locks which do not have any underlying row - what is the computational complexity of navigating such set of locks to find whether a query intersects with any ?
Also, my understanding is that we do not get any benefit from such lock: ERP5 should not have any race-condition which would lead to one node inserting stuff in a table while another one deletes the same stuff. Or at least it should not have it in a way which causes permanent divergence: there will be a DELETE + INSERT
which happens last in a race-y sequence and brings everything to consistency.
Is it safe in our usage while we use binlogs for maste-slave replication etc. ?
(by @vpelletier) Basically, this means that if two data-modifying transactions can be committed in either order because of a lack of gap-locks, and the outcome differs, then the lack of gap-lock introduce a bug which can cause replication to diverge.
So the question becomes: what can cause such divergence ? I believe this can only happen when the source of data is mariadb itself (IOW, a data modiication query which directly references data from a table, as opposed to containing a literal: a string, a scalar, ...). But in ERP5 it is Zope which is supposed to be the source of data: we pull values from the ZODB (or we generate values in python) and we push the result in mariadb.
There are some fuzzy edges to this though: what happens if we insert rows with literal values but those values come from the result of a previous query ? Do we have such cases ? Can they be divergent (ex: security_uid
during indexation matches this case, but it should by design not become divergent).
My belief is that this is in practice not an issue in the case of ERP5, so we can enable innodb_locks_unsafe_for_binlog
and still have consistent replication.
How sure are we that this will not lead to data inconsistencies in table contents ?
(to be written)
WITH CONSISTENT SNAPSHOT
with READ-COMMITTED
isolation ?
How about That combination sounds nice at a glance, but WITH CONSISTENT SNAPSHOT
has no meaning with READ-COMMITTED
transaction isolation. (ref: transaction isolation level at least REPEATABLE READ
)
How about MyRocks that does not use next-key lock ?
The number of deadlocks is significantly smaller than InnoDB without innodb_locks_unsafe_for_binlog
but its overall performance is bad, because it is LSM.
How about other databases like PostgreSQL etc. ?
(to be written)