Commits · c68007d958aaab0953a01c96eb02326f19d3c9d7 · nexedi / MariaDB

17 Feb, 2021 4 commits

MDEV-24738 Improve the InnoDB deadlock checker · c68007d9

Marko Mäkelä authored Feb 17, 2021

A new configuration parameter innodb_deadlock_report is introduced:
* innodb_deadlock_report=off: Do not report any details of deadlocks.
* innodb_deadlock_report=basic: Report transactions and waiting locks.
* innodb_deadlock_report=full (default): Report also the blocking locks.

The improved deadlock checker will consider all involved transactions
in one loop, even if the deadlock loop includes several transactions.
The theoretical maximum number of transactions that can be involved in
a deadlock is `innodb_page_size` * 8, limited by the persistent data
structures.

Note: Similar to
mysql/mysql-server@3859219875b62154b921e8c6078c751198071b9c
our deadlock checker will consider at most one blocking transaction
for each waiting transaction. The new field trx->lock.wait_trx will be
nullptr if and only if trx->lock.wait_lock is nullptr. Note that
trx->lock.wait_lock->trx == trx (the waiting transaction), while
trx->lock.wait_trx points to one of the transactions whose lock is
conflicting with trx->lock.wait_lock.

Considering only one blocking transaction will greatly simplify
our deadlock checker, but it may also make the deadlock checker
blind to some deadlocks where the deadlock cycle is 'hidden' by
the fact that the registered trx->lock.wait_trx is not actually
waiting for any InnoDB lock, but something else. So, instead of
deadlocks, sometimes lock wait timeout may be reported.

To improve on this, whenever trx->lock.wait_trx is changed, we
will register further 'candidate' transactions in Deadlock::to_check(),
and check for 'revealed' deadlocks as soon as possible, in lock_release()
and innobase_kill_query().

The old DeadlockChecker was holding lock_sys.latch, even though using
lock_sys.wait_mutex should be less contended (and thus preferred)
in the likely case that no deadlock is present.

lock_wait(): Defer the deadlock check to this function, instead of
executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().

DeadlockChecker: Complete rewrite:
(1) Explicitly keep track of transactions that are being waited for,
in trx->lock.wait_trx, protected by lock_sys.wait_mutex. Previously,
we were painstakingly traversing the lock heaps while blocking
concurrent registration or removal of any locks (even uncontended ones).
(2) Use Brent's cycle-detection algorithm for deadlock detection,
traversing each trx->lock.wait_trx edge at most 2 times.
(3) If a deadlock is detected, release lock_sys.wait_mutex,
acquire LockMutexGuard, re-acquire lock_sys.wait_mutex and re-invoke
find_cycle() to find out whether the deadlock is still present.
(4) Display information on all transactions that are involved in the
deadlock, and choose a victim to be rolled back.

lock_sys.deadlocks: Replaces lock_deadlock_found. Protected by wait_mutex.

Deadlock::find_cycle(): Quickly find a cycle of trx->lock.wait_trx...
using Brent's cycle detection algorithm.
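
As an illustration only, the idea of following single trx->lock.wait_trx edges with Brent's algorithm can be sketched as below. The Trx struct and this find_cycle() are simplified stand-ins, not the InnoDB code; latching (lock_sys.wait_mutex) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a transaction: only the waits-for edge matters.
struct Trx {
  Trx *wait_trx = nullptr;  // the one transaction we wait for, or nullptr
};

// Follow wait_trx edges from start. Return a transaction that lies on a
// cycle, or nullptr if the chain of waits ends.
// Brent's scheme: the tortoise stays put while the hare advances, and the
// tortoise jumps to the hare each time the step count reaches a power of 2.
inline Trx *find_cycle(Trx *start) {
  Trx *tortoise = start;
  Trx *hare = start;
  for (std::size_t power = 1, steps = 0;;) {
    hare = hare->wait_trx;
    if (!hare) return nullptr;           // nobody is waited for: no cycle
    if (hare == tortoise) return hare;   // the hare came back: cycle found
    if (++steps == power) {
      tortoise = hare;                   // move the reference point forward
      power *= 2;
      steps = 0;
    }
  }
}
```

Because each transaction has at most one outgoing edge, the walk either ends at a nullptr or enters a cycle, so no visited-set is needed.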

Deadlock::report(): Report a deadlock cycle that was found by
Deadlock::find_cycle(), and choose a victim with the least weight.
Altogether, we may traverse each trx->lock.wait_trx edge up to 5
times (2 invocations of find_cycle() at up to 2 traversals each, plus
1 traversal for reporting and choosing the victim).
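
A minimal sketch of the single extra traversal, assuming a transaction known to lie on a cycle. The weight field and choose_victim() are hypothetical names for illustration; how InnoDB actually computes the weight is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a transaction on the waits-for cycle.
struct Trx {
  Trx *wait_trx = nullptr;
  std::uint64_t weight = 0;  // assumed cost of rolling this transaction back
};

// Precondition: on_cycle lies on a cycle of wait_trx edges.
// Walk the cycle exactly once and return the lightest transaction.
inline Trx *choose_victim(Trx *on_cycle) {
  Trx *victim = on_cycle;
  for (Trx *t = on_cycle->wait_trx; t != on_cycle; t = t->wait_trx)
    if (t->weight < victim->weight) victim = t;
  return victim;
}
```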

Deadlock::check_and_resolve(): Find and resolve a deadlock.

lock_wait_rpl_report(): Report the waits-for information to
replication. This used to be executed as part of DeadlockChecker.
Replication must know the waits-for relations even if no deadlocks
are present in InnoDB.

Reviewed by: Vladislav Vaintroub

c68007d9

MDEV-24738: Extend the test innodb.deadlock_detect · 3ddb4fdd
Marko Mäkelä authored Feb 15, 2021

3ddb4fdd

MDEV-24884 Hang in ssux_lock_low::write_lock() · 272a1289

Marko Mäkelä authored Feb 17, 2021

ssux_lock_low::write_lock(): Before invoking writer_wait(), keep
attempting write_lock_wait_try() as long as no conflict exists.

rw_lock::upgrade_trylock(): Relax a bogus assertion and correct
the acquisition operation. Another thread may be executing in
ssux_lock_low::write_lock() on the same latch. Because we are the
only thread that can make progress on that latch, we must become
the writer. Any waiting thread will be eventually woken up by
ssux_lock_low::u_unlock() or ssux_lock_low::wr_unlock(), but not
by wr_u_downgrade() because the upgrade is a very rare operation.

272a1289

MDEV-20612 fixup: Make comments refer to lock_sys.latch · 584e5211
Marko Mäkelä authored Feb 17, 2021

584e5211

16 Feb, 2021 2 commits
- MDEV-24341 - followup remove assert. · 1146e98b
  Vladislav Vaintroub authored Feb 16, 2021
  
  1146e98b
- MDEV-20612 fixup: Fix a memory leak in buffer pool resize · e5d83ad4
  Marko Mäkelä authored Feb 16, 2021
  
  e5d83ad4
15 Feb, 2021 1 commit

MDEV-24861 Assertion `trx->rsegs.m_redo.rseg' failed in innodb_prepare_commit_versioned · 2e84846e

Marko Mäkelä authored Feb 15, 2021

trx_t::commit_tables(): Ensure that mod_tables will be empty.
This was broken in commit b08448de
where the query cache invalidation was moved from lock_release().

2e84846e

14 Feb, 2021 1 commit
- MDEV-24341 Innodb - do not block in foreground thread in log_write_up_to( · 4df0249b
  Vladislav Vaintroub authored Feb 14, 2021
  
  4df0249b
12 Feb, 2021 3 commits

MDEV-24643: Assertion failed in rw_lock::update_unlock() · a1542f8a

Marko Mäkelä authored Feb 12, 2021

mtr_defer_drop_ahi(): Upgrade the U lock to X lock and downgrade
it back to U lock in case the adaptive hash index needs to be dropped.

This regression was introduced in
commit 03ca6495 (MDEV-24142).

a1542f8a

MDEV-20612: Enable concurrent lock_release() · 26d6224d

Marko Mäkelä authored Feb 12, 2021

lock_release_try(): Try to release locks while only holding
shared lock_sys.latch.

lock_release(): If 5 attempts of lock_release_try() fail,
proceed to acquire exclusive lock_sys.latch.
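
The retry-then-escalate pattern can be sketched as follows. This is not the InnoDB code: LockSys, release_try() and release() are hypothetical, and the reason an optimistic attempt fails (a contended shard mutex) is an assumption made for the example.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical model: one global latch plus one shard mutex.
struct LockSys {
  std::shared_mutex latch;  // global latch, shared or exclusive
  std::mutex shard;         // stand-in for a table or page hash mutex
};

// One optimistic attempt under the shared latch.
// Returns false if the shard mutex could not be acquired without waiting.
inline bool release_try(LockSys &sys, int &locks_held) {
  std::shared_lock<std::shared_mutex> shared(sys.latch);
  std::unique_lock<std::mutex> shard(sys.shard, std::try_to_lock);
  if (!shard.owns_lock()) return false;
  locks_held = 0;  // "release" all locks
  return true;
}

// Returns the number of the successful optimistic attempt (1..5),
// or 0 if we had to fall back to the exclusive latch.
inline int release(LockSys &sys, int &locks_held) {
  for (int attempt = 1; attempt <= 5; attempt++)
    if (release_try(sys, locks_held)) return attempt;
  std::unique_lock<std::shared_mutex> exclusive(sys.latch);
  locks_held = 0;  // no shard mutex is needed while holding the latch exclusively
  return 0;
}
```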

26d6224d

MDEV-20612: Partition lock_sys.latch · b08448de

Marko Mäkelä authored Feb 12, 2021

We replace the old lock_sys.mutex (which was renamed to lock_sys.latch)
with a combination of a global lock_sys.latch and table or page hash lock
mutexes.

The global lock_sys.latch can be acquired in exclusive mode, or
it can be acquired in shared mode and another mutex will be acquired
to protect the locks for a particular page or a table.
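
A simplified sketch of this two-level scheme, using standard C++ primitives instead of the InnoDB latch types. ShardedLocks, the shard count and the modulo mapping are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>

class ShardedLocks {
  static constexpr std::size_t N_SHARDS = 8;  // arbitrary for the sketch
  std::shared_mutex latch_;                   // global latch
  std::array<std::mutex, N_SHARDS> shard_mutex_;
  std::array<int, N_SHARDS> lock_count_{};

public:
  // Common path: shared global latch plus the mutex of one shard.
  void add_lock(std::size_t page_no) {
    std::shared_lock<std::shared_mutex> global(latch_);
    const std::size_t s = page_no % N_SHARDS;
    std::lock_guard<std::mutex> shard(shard_mutex_[s]);
    ++lock_count_[s];
    // Destruction order releases the shard mutex before the global latch.
  }

  // Rare path: the exclusive global latch covers every shard at once.
  int total() {
    std::unique_lock<std::shared_mutex> global(latch_);
    int sum = 0;
    for (int c : lock_count_) sum += c;
    return sum;
  }
};
```

Threads that touch different shards only share the global latch in shared mode, so they do not block each other.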

This is inspired by
mysql/mysql-server@1d259b87a63defa814e19a7534380cb43ee23c48
but the optimization of lock_release() will be done in the next commit.
Also, we will interleave mutexes with the hash table elements, similar
to how buf_pool.page_hash was optimized
in commit 5155a300 (MDEV-22871).

dict_table_t::autoinc_trx: Use Atomic_relaxed.

dict_table_t::autoinc_mutex: Use srw_mutex in order to reduce the
memory footprint. On 64-bit Linux or OpenBSD, both this and the new
dict_table_t::lock_mutex should be 32 bits and be stored in the same
64-bit word. On Microsoft Windows, the underlying SRWLOCK is 32 or 64
bits, and on other systems, sizeof(pthread_mutex_t) can be much larger.

ib_lock_t::trx_locks, trx_lock_t::trx_locks: Document the new rules.
Writers must assert lock_sys.is_writer() || trx->mutex_is_owner().

LockGuard: A RAII wrapper for acquiring a page hash table lock.

LockGGuard: Like LockGuard, but when Galera Write-Set Replication
is enabled, we must acquire all shards, for updating arbitrary trx_locks.

LockMultiGuard: A RAII wrapper for acquiring two page hash table locks.
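
A hedged sketch of a two-mutex RAII guard. MultiGuard is a hypothetical name; acquiring in address order and locking only once when both pages map to the same mutex are assumptions of this example, chosen because they are a common way to avoid deadlocks between two such guards.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

class MultiGuard {
  std::mutex *first_;
  std::mutex *second_;  // nullptr if both arguments were the same mutex

public:
  MultiGuard(std::mutex &a, std::mutex &b) : first_(&a), second_(&b) {
    // Always acquire in address order, whatever the argument order was.
    if (std::less<std::mutex *>()(second_, first_)) std::swap(first_, second_);
    first_->lock();
    if (second_ != first_)
      second_->lock();
    else
      second_ = nullptr;  // same shard: lock it only once
  }
  ~MultiGuard() {
    if (second_) second_->unlock();
    first_->unlock();
  }
  MultiGuard(const MultiGuard &) = delete;
  MultiGuard &operator=(const MultiGuard &) = delete;

  // Number of distinct mutexes held by this guard.
  int held() const { return second_ ? 2 : 1; }
};
```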

lock_rec_create_wsrep(), lock_table_create_wsrep(): Special
Galera conflict resolution in non-inlined functions in order
to keep the common code paths shorter.

lock_sys_t::prdt_page_free_from_discard(): Refactored from
lock_prdt_page_free_from_discard() and
lock_rec_free_all_from_discard_page().

trx_t::commit_tables(): Replaces trx_update_mod_tables_timestamp().

lock_release(): Let trx_t::commit_tables() invalidate the query cache
for those tables that were actually modified by the transaction.
Merge lock_check_dict_lock() to lock_release().

We must never release lock_sys.latch while holding any
lock_sys_t::hash_latch. Violating this rule could lead to
memory corruption if the buffer pool is resized between
the time lock_sys.latch is released and the hash_latch is released.

b08448de

11 Feb, 2021 5 commits

MDEV-20612: Replace lock_sys.mutex with lock_sys.latch · b01d8e1a

Marko Mäkelä authored Feb 11, 2021

For now, we will acquire the lock_sys.latch only in exclusive mode,
that is, use it as a mutex.

This is preparation for the next commit where we will introduce
a less intrusive alternative, combining a shared lock_sys.latch
with dict_table_t::lock_mutex or a mutex embedded in
lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash.

b01d8e1a

MDEV-20612 preparation: LockMutexGuard · 90346492

Marko Mäkelä authored Feb 11, 2021

Let us use the RAII wrapper LockMutexGuard for most operations where
lock_sys.mutex is acquired.
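
For readers unfamiliar with the pattern, an RAII guard of this kind can be sketched as below. LockSys, MutexGuard and add_lock() are simplified stand-ins and not the actual InnoDB definitions.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the lock subsystem.
struct LockSys {
  std::mutex mutex;
  int n_locks = 0;  // state protected by mutex
};

// Acquire in the constructor, release in the destructor.
class MutexGuard {
  std::mutex &m_;

public:
  explicit MutexGuard(LockSys &sys) : m_(sys.mutex) { m_.lock(); }
  ~MutexGuard() { m_.unlock(); }
  MutexGuard(const MutexGuard &) = delete;
  MutexGuard &operator=(const MutexGuard &) = delete;
};

inline int add_lock(LockSys &sys) {
  MutexGuard g(sys);
  return ++sys.n_locks;  // the mutex is released on every return path
}
```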

90346492

MDEV-20612 preparation: Fewer calls to buf_page_t::id() · 2e64513f
Marko Mäkelä authored Feb 05, 2021

2e64513f
Merge 10.5 into 10.6 · b19ec884
Marko Mäkelä authored Feb 11, 2021

b19ec884

MDEV-24366: s3 test postfix - use default for S3_BUCKET · c7edbe5b

Daniel Black authored Feb 11, 2021

and S3_HOST_NAME.

Required environment variables are now S3_ACCESS_KEY and S3_SECRET_KEY.

Or a running minio instance on localhost:9000.

c7edbe5b

10 Feb, 2021 2 commits

MDEV-24834 Assertion mtr->memo_contains_flagged() in btr0cur.cc:1500 · bfb4761c

Marko Mäkelä authored Feb 10, 2021

A too strict debug assertion was introduced in
commit 03ca6495 (MDEV-24142).
It turns out that row_ins_sec_index_entry_low() may acquire a stronger
latch on the index. The old rw_lock_own(..., RW_LOCK_S) assertion
would hold also for the SX (Update) latch mode.

btr_cur_search_to_nth_level_func(): Relax the assertion to require
that the mini-transaction hold any of S or U latch.

bfb4761c

MDEV-24832 Root page AHI removal fails during rollback of bulk insert · a2fbbba2

Thirunarayanan Balathandayuthapani authored Feb 10, 2021

This failure is caused by commit 43ca6059
(MDEV-24720). InnoDB fails to remove the AHI entries
during rollback of a bulk insert operation. InnoDB should
remove the AHI entries of the root page before reinitialising it.

Reviewed-by: Marko Mäkelä

a2fbbba2

09 Feb, 2021 1 commit
- MDEV-24344: BINLOG REPLAY privilege is missing from SHOW PRIVILEGES · 5e3d3220
  Daniel Black authored Feb 08, 2021
```
Was added in 10.5.2 (MDEV-21975)
```
  5e3d3220
08 Feb, 2021 4 commits

MDEV-24087 s3.replication_partition fails in buildbot with replication failure · ffc5d064

Monty authored Feb 08, 2021

A few of the failures were caused by a missing sync_slave_to_master in
the test suite.

However, the main reason for most failures was that in the case of
ALTER PARTITION the master writes the query to the binary log before
it has updated the .frm and .par files. This causes a problem for an
S3 slave, as it will start executing the ALTER PARTITION but get old .frm and
.par files from S3, which causes "open table" to fail, either with an error
or in some cases with a crash.
Fixed

ffc5d064

Make maria_data_root const char* · bd5ac038

Monty authored Feb 05, 2021

This allows one to remove some casts like:
maria_data_root= (char *)".";

It also removes warnings from icc.

bd5ac038

Added 'const' to arguments in get_one_option and find_typeset() · 5d6ad2ad

Monty authored Feb 05, 2021

One should not change the program arguments!
This change also reduces warnings from the icc compiler.

Almost all changes are just syntax changes (adding const to
'get_one_option function' declarations).

Other changes:
- Added a few casts of 'argument' from 'const char*' to 'char *'. This
  was mainly in calls to 'external' functions we don't have control of.
- Ensure that all resets of 'password command line argument' are similar.
  (In almost all cases it was just adding a comment and a cast)
- In mysqlbinlog.cc and mysqld.cc there were a few cases that changed
  the command line argument. These places were changed to instead allocate
  the option in a MEM_ROOT to avoid changing the argument. Some of this
  code was changed to ensure that different programs parse options the
  same way. Added a test case for the changes in mysqlbinlog.cc
- Changed a few variables that took their value from command line options
  from 'char *' to 'const char *'.

5d6ad2ad

Ensure that mysqlbinlog frees all memory at exit · e30a3048
Monty authored Feb 07, 2021

e30a3048

07 Feb, 2021 5 commits
- Cleanup: Replace mysql_cond_t with pthread_cond_t · 786bc312
  Marko Mäkelä authored Feb 07, 2021
```
Let us avoid the memory overhead and the dead duplicated code
for each use of never-instrumented condition variables in InnoDB.
```
  786bc312
- Merge 10.5 into 10.6 · 520c76bf
  Marko Mäkelä authored Feb 07, 2021
  
  520c76bf
- MDEV-23399 fixup: Use plain pthread_cond · 4f4a4cf9
  Marko Mäkelä authored Feb 07, 2021
```
The condition variables that were introduced in
commit 7cffb5f6 (MDEV-23399)
are never instrumented with PERFORMANCE_SCHEMA.
Let us avoid the storage overhead and dead code.
```
  4f4a4cf9
- MDEV-23379 fixup: Remove dead PERFORMANCE_SCHEMA code · 7ce64378
  Marko Mäkelä authored Feb 07, 2021
  
  7ce64378
- Cleanup: Remove lock_trx_lock_list_init(), lock_table_get_n_locks() · 74ab97f5
  Marko Mäkelä authored Feb 07, 2021
  
  74ab97f5
05 Feb, 2021 7 commits

MDEV-21452 fixup: Introduce trx_t::mutex_is_owner() · 487fbc2e

Marko Mäkelä authored Feb 05, 2021

When we replaced trx_t::mutex with srw_mutex
in commit 38fd7b7d
we lost the SAFE_MUTEX instrumentation.
Let us introduce a replacement and restore the assertions.
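
One way such an ownership check can be built is sketched below with standard C++ primitives. OwnedMutex is a hypothetical illustration and not the trx_t implementation; it only shows how recording the owner thread makes an is_owner() assertion possible.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

class OwnedMutex {
  std::mutex m_;
  // Default-constructed id means "no owner".
  std::atomic<std::thread::id> owner_{std::thread::id()};

public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
  }
  void unlock() {
    owner_.store(std::thread::id(), std::memory_order_relaxed);
    m_.unlock();
  }
  // True only if the calling thread currently holds the mutex.
  bool is_owner() const {
    return owner_.load(std::memory_order_relaxed) ==
           std::this_thread::get_id();
  }
};
```

Relaxed ordering is enough here because a thread only ever compares the stored value with its own id, which it wrote itself while holding the mutex.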

487fbc2e

MDEV-24789: Try to reduce mutex contention · 455514c8
Marko Mäkelä authored Feb 05, 2021

455514c8

MDEV-24789: Reduce sizeof(trx_lock_t) · 3e45f8e3

Marko Mäkelä authored Feb 05, 2021

trx_lock_t::cond: Use pthread_cond_t directly, because no instrumentation
will ever be used. This saves sizeof(void*) and removes some duplicated
inline code.

trx_lock_t::was_chosen_as_wsrep_victim: Fold into
trx_lock_t::was_chosen_as_deadlock_victim.

trx_lock_t::cancel, trx_lock_t::rec_cached, trx_lock_t::table_cached:
Use only one byte of storage, reducing memory alignment waste.

On AMD64 GNU/Linux, MDEV-24671 caused a sizeof(trx_lock_t) increase
of 48 bytes (plus the PLUGIN_PERFSCHEMA overhead of trx_lock_t::cond).
These changes should save 32 bytes.
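
The effect of narrowing flag-like fields can be illustrated with two hypothetical structs. The field types in Before are assumptions for the example and are not taken from the real trx_lock_t; the exact sizes depend on the platform ABI.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout with word-sized flags.
struct Before {
  void *wait_lock;
  unsigned long cancel;
  unsigned long rec_cached;
  unsigned long table_cached;
};

// Same fields, with each flag stored in a single byte.
struct After {
  void *wait_lock;
  std::uint8_t cancel;
  std::uint8_t rec_cached;
  std::uint8_t table_cached;
};

static_assert(sizeof(After) < sizeof(Before),
              "one-byte flags should shrink the struct");
```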

3e45f8e3

Cleanup: Reduce some lock_sys.mutex contention · 465bdabb

Marko Mäkelä authored Feb 04, 2021

lock_table(): Remove the constant parameter flags=0.

lock_table_resurrect(): Merge lock_table_ix_resurrect() and
lock_table_x_resurrect().

lock_rec_lock(): Only acquire LockMutexGuard if lock_table_has()
does not hold.

465bdabb

MDEV-24731 fixup: bogus assertion · de407e7c

Marko Mäkelä authored Feb 05, 2021

DeadlockChecker::search(): Move a bogus assertion into a condition.
If the current transaction is waiting for a table lock (on something
other than an auto-increment lock), it is quite possible that other
transactions are holding not only a conflicting lock, but also an
auto-increment lock.

This mistake was noticed during the testing of MDEV-24731, but it was
accidentally introduced in commit 5f463857.

lock_wait_end(): Remove an unused variable, and add an assertion.

de407e7c

MDEV-24781 fixup: Adjust innodb.innodb-index-debug · c42ee8a7

Marko Mäkelä authored Feb 05, 2021

Now that an INSERT into an empty table is replicated more efficiently
during online ALTER, an old test case started to fail. Let us disable
the MDEV-515 logic for the critical INSERT statement.

c42ee8a7

MDEV-24781 Assertion `mode == 16 || mode == 12 || fix_block->page.status !=... · 597510ad

Thirunarayanan Balathandayuthapani authored Feb 04, 2021

MDEV-24781 Assertion `mode == 16 || mode == 12 || fix_block->page.status != buf_page_t::FREED' failed in buf_page_get_low

This is caused by commit 3cef4f8f
(MDEV-515). dict_table_t::clear() frees all the BLOBs during
rollback of bulk insert. But the online log tries to read the
freed BLOBs while applying the log. It can be fixed if we
truncate the online log during rollback of bulk insert operation.

597510ad

04 Feb, 2021 1 commit

MDEV-24731 Excessive mutex contention in DeadlockChecker::check_and_resolve() · 5f463857

Marko Mäkelä authored Feb 04, 2021

The DeadlockChecker expects to be able to freeze the waits-for graph.
Hence, it is best executed somewhere where we are not holding any
additional mutexes.

lock_wait(): Defer the deadlock check to this function, instead
of executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().

DeadlockChecker::trx_rollback(): Merge with the only caller,
check_and_resolve().

LockMutexGuard: RAII accessor for lock_sys.mutex.

lock_sys.deadlocks: Replaces lock_deadlock_found.

trx_t: Clean up some comments.

5f463857

03 Feb, 2021 1 commit

MDEV-24750 Various corruptions caused by Aria subsystem... · eacefbca

Monty authored Feb 01, 2021

The test case was setting aria_sort_buffer_size to MAX_ULONGLONG-1
which was not handled gracefully by my_malloc() or safemalloc().
Fixed by ensuring that the malloc functions return 0 if the size
is too big.
I also added some protection to Aria repair:
- Limit sort_buffer_size to 16G (after that a bigger sort buffer will
  not help that much anyway)
- Limit sort_buffer_size also according to sort file size. This will
  help by allocating less memory if someone sets the buffer size too
  high.
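
The two protections can be sketched as follows. Everything here is hypothetical: the function names, the 32-byte header, reading 16G as 16 GiB, and using a plain minimum with the sort file size are assumptions for illustration, not the Aria or my_malloc() code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Assumed cap of 16 GiB on the sort buffer.
constexpr std::uint64_t MAX_SORT_BUFFER = 16ULL << 30;

// Never use a sort buffer larger than the cap or than the data to sort.
inline std::uint64_t clamp_sort_buffer(std::uint64_t requested,
                                       std::uint64_t sort_file_size) {
  return std::min({requested, MAX_SORT_BUFFER, sort_file_size});
}

// Allocate size bytes plus a bookkeeping header, returning nullptr instead
// of letting size + header wrap around for huge requests.
inline void *checked_malloc(std::uint64_t size, std::uint64_t header = 32) {
  if (size > SIZE_MAX - header) return nullptr;
  return std::malloc(static_cast<std::size_t>(size + header));
}
```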

eacefbca

02 Feb, 2021 3 commits

MDEV-24720 AHI removal during rollback of bulk insert · 43ca6059

Thirunarayanan Balathandayuthapani authored Feb 02, 2021

InnoDB fails to remove the AHI entries during rollback
of a bulk insert operation. InnoDB reports an error when
validating the AHI hash tables. InnoDB should remove
the AHI entries while freeing the segment only during
bulk index rollback operation.

Reviewed-by: Marko Mäkelä

43ca6059

Merge 10.5 into 10.6 · 1110becc
Marko Mäkelä authored Feb 02, 2021

1110becc
MDEV-24765 fseg_free_extent fails to call buf_page_free() for the whole segment · b76e5c66
Thirunarayanan Balathandayuthapani authored Feb 02, 2021
```
This is caused by commit c92f7e28 (MDEV-8139).
InnoDB fails to set the page status as FREED in buffer pool while freeing
the extent.
```
b76e5c66