WIP: component/mariadb: revive innodb_locks_unsafe_for_binlog

This is to revive innodb_locks_unsafe_for_binlog for the latest stable MariaDB.

Performance

The following graph is the result of software/erp5/test/test/benchmarks.py using the default REPEATABLE-READ transaction isolation.

10.4-innodb-unsafe: with innodb_locks_unsafe_for_binlog=1
10.11-innodb: without patch
11.4-innodb: without patch
11.4-innodb-snapshot: without patch, start transaction always with WITH CONSISTENT SNAPSHOT
11.4-innodb-unsafe: with patch here, with innodb_locks_unsafe_for_binlog=1

Q & A

Why `innodb_locks_unsafe_for_binlog` is important for parallel indexing in ERP5 ?

(by @vpelletier) Here is how I remember understanding the performance impact of innodb_locks_unsafe_for_binlog not being enabled: When deleting rows based on a condition which is not a strict exact match on the entire primary key (ex: DELETE FROM stock WHERE uid IN (...) whereas stock table has a second column in its primary key), mariadb must take a "gap lock" over the entire range, on the primary key, of each of each condition. In the stock example this means not only locking each existing row for each given uid, but also somewhere storing the information that any possible row until at least the next potential uid (in practice probably uid to uid + 1, even if uid + 1 does not exist in stock, but reading the revert patches it seems it locks until the next actually existing row).

I believe these locks, intrinsically, may not be the main issue. I believe (but did not investigate at the time) that the issue is the fact these locks must then be checked by any concurrent query, which has a cost. Especially, I can guess per-row locks have to be stored very differently from locks which do not have any underlying row - what is the computational complexity of navigating such set of locks to find whether a query intersects with any ?

Also, my understanding is that we do not get any benefit from such lock: ERP5 should not have any race-condition which would lead to one node inserting stuff in a table while another one deletes the same stuff. Or at least it should not have it in a way which causes permanent divergence: there will be a DELETE + INSERT which happens last in a race-y sequence and brings everything to consistency.

Is it safe in our usage while we use binlogs for maste-slave replication etc. ?

(by @vpelletier) Basically, this means that if two data-modifying transactions can be committed in either order because of a lack of gap-locks, and the outcome differs, then the lack of gap-lock introduce a bug which can cause replication to diverge.

So the question becomes: what can cause such divergence ? I believe this can only happen when the source of data is mariadb itself (IOW, a data modiication query which directly references data from a table, as opposed to containing a literal: a string, a scalar, ...). But in ERP5 it is Zope which is supposed to be the source of data: we pull values from the ZODB (or we generate values in python) and we push the result in mariadb.

There are some fuzzy edges to this though: what happens if we insert rows with literal values but those values come from the result of a previous query ? Do we have such cases ? Can they be divergent (ex: security_uid during indexation matches this case, but it should by design not become divergent).

My belief is that this is in practice not an issue in the case of ERP5, so we can enable innodb_locks_unsafe_for_binlog and still have consistent replication.

How sure are we that this will not lead to data inconsistencies in table contents ?

(to be written)

How about `WITH CONSISTENT SNAPSHOT` with `READ-COMMITTED` isolation ?

That combination sounds nice at a glance, but WITH CONSISTENT SNAPSHOT has no meaning with READ-COMMITTED transaction isolation. (ref: transaction isolation level at least REPEATABLE READ)

How about MyRocks that does not use next-key lock ?

The number of deadlocks is significantly smaller than InnoDB without innodb_locks_unsafe_for_binlog but its overall performance is bad, because it is LSM.

How about other databases like PostgreSQL etc. ?

(to be written)

This is to revive `innodb_locks_unsafe_for_binlog` for the latest stable MariaDB.

# Performance

The following graph is the result of `software/erp5/test/test/benchmarks.py` using the default `REPEATABLE-READ` transaction isolation.

![duration_conflicts_deadlocks_graphs_8.svg](/uploads/3757ed1006f0920a86fb3af8ce001ae8/duration_conflicts_deadlocks_graphs_8.svg)

* 10.4-innodb-unsafe: with innodb_locks_unsafe_for_binlog=1
* 10.11-innodb: without patch
* 11.4-innodb: without patch
* 11.4-innodb-snapshot: without patch, start transaction always with `WITH CONSISTENT SNAPSHOT`
* 11.4-innodb-unsafe: with patch here, with innodb_locks_unsafe_for_binlog=1

# Q & A

## Why `innodb_locks_unsafe_for_binlog` is important for parallel indexing in ERP5 ?

(by @vpelletier) Here is how I remember understanding the performance impact of `innodb_locks_unsafe_for_binlog` not being enabled: When deleting rows based on a condition which is not a strict exact match on the entire primary key (ex: `DELETE FROM stock WHERE uid IN (...)` whereas `stock` table has a second column in its primary key), mariadb must take a "gap lock" over the entire range, on the primary key, of each of each condition. In the `stock` example this means not only locking each existing row for each given `uid`, but also somewhere storing the information that any possible row until at least the next potential `uid` (in practice probably `uid` to `uid` + 1, even if `uid` + 1 does not exist in `stock`, but reading the revert patches it seems it locks until the next actually existing row).

Also, my understanding is that we do not get any benefit from such lock: ERP5 should not have any race-condition which would lead to one node inserting stuff in a table while another one deletes the same stuff. Or at least it should not have it in a way which causes permanent divergence: there will be a `DELETE + INSERT` which happens last in a race-y sequence and brings everything to consistency.

## Is it safe in our usage while we use binlogs for maste-slave replication etc. ?

There are some fuzzy edges to this though: what happens if we insert rows with literal values but those values come from the result of a previous query ? Do we have such cases ? Can they be divergent (ex: `security_uid` during indexation matches this case, but it should by design not become divergent).

My belief is that this is in practice not an issue in the case of ERP5, so we can enable `innodb_locks_unsafe_for_binlog` and still have consistent replication.

## How sure are we that this will not lead to data inconsistencies in table contents ?

(to be written)

## How about `WITH CONSISTENT SNAPSHOT` with `READ-COMMITTED` isolation ?

That combination sounds nice at a glance, but `WITH CONSISTENT SNAPSHOT` has no meaning with `READ-COMMITTED` transaction isolation. (ref: [transaction isolation level at least `REPEATABLE READ`](https://mariadb.com/kb/en/enhancements-for-start-transaction-with-consistent-snapshot/))

## How about MyRocks that does not use next-key lock ?

The number of deadlocks is significantly smaller than InnoDB without `innodb_locks_unsafe_for_binlog` but its overall performance is bad, because it is LSM.

## How about other databases like PostgreSQL etc. ?

(to be written)

Edited Oct 03, 2024 by Kazuhiko Shiozaki

WIP: component/mariadb: revive innodb_locks_unsafe_for_binlog

Performance

Q & A

Why innodb_locks_unsafe_for_binlog is important for parallel indexing in ERP5 ?

Is it safe in our usage while we use binlogs for maste-slave replication etc. ?

How sure are we that this will not lead to data inconsistencies in table contents ?

How about WITH CONSISTENT SNAPSHOT with READ-COMMITTED isolation ?

How about MyRocks that does not use next-key lock ?

How about other databases like PostgreSQL etc. ?

Check out, review, and merge locally

Why `innodb_locks_unsafe_for_binlog` is important for parallel indexing in ERP5 ?

How about `WITH CONSISTENT SNAPSHOT` with `READ-COMMITTED` isolation ?