Commits · 68938d2b42474a6ff1fced3b1495ec45de2f5c47 · nexedi / MariaDB

17 Sep, 2024 1 commit

MDEV-33500 (part 2): rpl.rpl_parallel_sbm can still fail · 68938d2b

Brandon Nesterenko authored 5 months ago

The failing test case validates Seconds_Behind_Master for a delayed
slave when STOP SLAVE is executed during a delay. The original test
fix (commit b04c8575) added a table lock to ensure that a transaction
could not finish before Seconds_Behind_Master was validated after
START SLAVE. It did not address the possibility that the transaction
could finish before the STOP SLAVE command runs, which invalidates
the validations for the rest of the test case. Specifically, this
would result in 1) a timeout in “Waiting for table metadata lock” on
the replica, which expects the transaction to retry after slave
restart and hit a lock conflict on the locked tables (added in
b04c8575), and 2) a failed check that Seconds_Behind_Master should
have increased.

The failure can be reproduced by synchronizing the slave to the master
before the MDEV-32265 echo statement (i.e. before the SLAVE STOP).

This patch fixes the test by adding a mechanism to use DEBUG_SYNC to
synchronize a MASTER_DELAY, rather than continually increase the
duration of the delay each time the test fails on buildbot. This is
to ensure that on slow machines, a delay does not pass before the
test gets a chance to validate results. Additionally, it decreases
overall test time because the test can continue immediately after
validation, thereby bypassing the remainder of a full delay for each
transaction.
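
The idea of replacing a timed delay with an explicit signal can be illustrated with a generic analogy. The sketch below is Python using `threading.Event`; it is not DEBUG_SYNC syntax or MTR test code, and every name in it is hypothetical.

```python
import threading


def run_delayed_apply(validate):
    """Apply one 'transaction' only after the test has validated state.

    Instead of sleeping for a fixed delay and hoping validation finishes
    in time, the applier blocks on an explicit signal from the test.
    """
    delay_reached = threading.Event()   # applier -> test: "I am now delaying"
    may_continue = threading.Event()    # test -> applier: "validation done"
    log = []

    def applier():
        delay_reached.set()
        may_continue.wait()             # no timer, so it cannot expire early
        log.append("applied")

    worker = threading.Thread(target=applier)
    worker.start()

    delay_reached.wait()
    log.append(validate())              # always recorded before "applied"
    may_continue.set()                  # resume at once, no leftover sleep
    worker.join()
    return log
```

Because the applier waits on a signal rather than a clock, a slow machine cannot let the delay lapse before validation, and a fast machine does not wait out the remainder of the delay.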

68938d2b

08 Jul, 2024 1 commit

MDEV-32892: IO Thread Reports False Error When Stopped During Connecting to Primary · 744580d5

Brandon Nesterenko authored 8 months ago

The IO thread can report error code 2013 into the error log when it
is stopped during the initial connection process to the primary, as
well as when trying to read an event. However, because the IO thread
is being stopped, its connection to the primary is force-killed by
the signaling thread (see THD::awake_no_mutex()), so these
connection errors should be ignored.
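
The rule described above can be sketched as a small predicate. This is a Python model, not the server's C++ code: `should_report_io_error` and the `io_thread_stopping` flag are hypothetical names. 2013 is the error code named in the commit message (known as CR_SERVER_LOST in the client library).

```python
CR_SERVER_LOST = 2013  # client error: lost connection to the server


def should_report_io_error(errno: int, io_thread_stopping: bool) -> bool:
    """Return True if the IO thread should write this error to the error log.

    Hypothetical helper: a lost-connection error seen while the thread is
    being stopped is a side effect of the forced kill, so it is suppressed.
    Any other error, or the same error during normal operation, is reported.
    """
    if io_thread_stopping and errno == CR_SERVER_LOST:
        return False
    return True
```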

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>

744580d5

06 May, 2024 1 commit

MDEV-34071: Failure during the galera_3nodes_sr.GCF-336 test · 52c45332

Julius Goryavsky authored 10 months ago

This commit fixes sporadic failures in galera_3nodes_sr.GCF-336
test. The following changes have been made here:

1) A small addition to the test itself which should make
   it more deterministic by waiting for non-primary state
   before COMMIT;
2) More careful handling of the wsrep_ready variable in
   the server code (it should always be protected with mutex).

No additional tests are required.

52c45332

05 May, 2024 1 commit
- cleanup: use THD_STAGE_INFO, not thd_proc_info · cea083af
  Sergei Golubchik authored 10 months ago
```
and put master-slave.inc *last* in the series of includes
```
  cea083af
20 Apr, 2024 1 commit

MDEV-19415: use-after-free on charsets_dir from slave connect · 57f6a1ca

Kristian Nielsen authored 10 months ago

The slave IO thread sets MYSQL_SET_CHARSET_DIR. However, the code for this
option in sql-common/client.c is not thread-safe. The value set is
temporarily written to the mysys global variable `charsets_dir` and can be
seen by other threads running in parallel, which can result in a
use-after-free error.

Problem was visible as random failures of test cases in suite multi_source
with Valgrind or MSAN.

Work around this by not setting the option for slave connect; it is redundant
anyway, as it just sets the default value.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

57f6a1ca

13 Feb, 2024 1 commit

MDEV-33393 audit plugin do not report user did the action.. · 85517f60

Alexey Botchkov authored 1 year ago

The '<replication_slave>' user is assigned to the slave replication
thread so this name appears in the auditing logs.

85517f60

30 Jan, 2024 1 commit

MDEV-33327: rpl_seconds_behind_master_spike Sensitive to IO Thread Stop Position · c75905ca

Brandon Nesterenko authored 1 year ago

rpl.rpl_seconds_behind_master_spike uses the DEBUG_SYNC mechanism to
count how many format descriptor events (FDEs) have been executed,
to attempt to pause on a specific relay log FDE after executing
transactions. However, depending on when the IO thread is stopped,
it can send an extra FDE before sending the transactions, forcing
the test to pause before executing any transactions. The test then
attempts to read a COUNT from a table that does not exist yet.

This patch fixes this by no longer counting FDEs, but rather by
programmatically waiting until the SQL thread has executed the
transaction and then automatically activating the DEBUG_SYNC point
to trigger at the next relay log FDE.

c75905ca

29 Jan, 2024 1 commit

MDEV-33327: rpl_seconds_behind_master_spike Sensitive to IO Thread Stop Position · e4f221a5

Brandon Nesterenko authored 1 year ago

e4f221a5

19 Dec, 2023 1 commit

MDEV-10653: Fix segfault in SHOW MASTER STATUS with NULL inuse_relaylog · eaa4968f

Kristian Nielsen authored 1 year ago

The previous patch for MDEV-10653 changes the rpl_parallel::workers_idle()
function to use Relay_log_info::last_inuse_relaylog to check for idle
workers. But the code was missing a NULL check. Also, there was one place
during SQL slave thread start which was missing mutex synchronisation when
updating inuse_relaylog.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

eaa4968f

11 Dec, 2023 1 commit

MDEV-10653: SHOW SLAVE STATUS Can Deadlock an Errored Slave · 8dad5148

Brandon Nesterenko authored 1 year ago

AKA rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in
buildbot with timeout in include

A replication parallel worker thread can deadlock with another
connection running SHOW SLAVE STATUS. That is, if the replication
worker thread is in do_gco_wait() and is killed, it will already
hold the LOCK_parallel_entry, and during error reporting, try to
grab the err_lock. SHOW SLAVE STATUS, however, grabs these locks in
reverse order. It will initially grab the err_lock, and then try to
grab LOCK_parallel_entry. This leads to a deadlock when both threads
have grabbed their first lock without the second.

This patch implements the MDEV-31894 proposed fix to optimize the
workers_idle() check to compare the last in-use relay log’s
queued_count==dequeued_count for idleness. This removes the need for
workers_idle() to grab LOCK_parallel_entry, as these values are
atomically updated.
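
The lock-order inversion described above can be checked mechanically. The following is a minimal Python sketch, not server code: it builds a "held while requesting" graph from each code path's acquisition order and looks for a cycle. The lock names come from the commit message; the function and variable names are hypothetical.

```python
def has_lock_order_cycle(acquisition_orders):
    """Detect a potential deadlock from per-path lock acquisition orders.

    Each entry is the sequence of locks one code path takes while holding
    the earlier ones. An edge A -> B means "B is requested while A is held";
    a cycle in that graph means two paths can block each other.
    """
    edges = {}
    for order in acquisition_orders:
        for i, held in enumerate(order):
            for wanted in order[i + 1:]:
                edges.setdefault(held, set()).add(wanted)

    visiting, done = set(), set()

    def visit(lock):
        if lock in done:
            return False
        if lock in visiting:
            return True          # back edge: cycle found
        visiting.add(lock)
        found = any(visit(nxt) for nxt in edges.get(lock, ()))
        visiting.discard(lock)
        done.add(lock)
        return found

    return any(visit(lock) for lock in list(edges))


worker_on_error = ["LOCK_parallel_entry", "err_lock"]
show_slave_status_before = ["err_lock", "LOCK_parallel_entry"]
show_slave_status_after = ["err_lock"]  # idle check no longer takes the entry lock
```

With the pre-patch orders the graph contains a cycle; once the idle check stops taking LOCK_parallel_entry, the second edge disappears and no cycle remains.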

Huge thanks to Kristian Nielsen for diagnosing the problem!

Reviewed By:
============
Kristian Nielsen <knielsen@knielsen-hq.org>
Andrei Elkin <andrei.elkin@mariadb.com>

8dad5148

28 Nov, 2023 1 commit
- Fixed build failure on aarch64-macos · 1ffa8c50
  Monty authored 1 year ago
```
debug_sync.h was wrongly combined with replication
```
  1ffa8c50
16 Nov, 2023 1 commit
- MDEV-32168: slave_error_param condition is never checked from the wait_for_slave_param.inc · a7d186a1
  Anel Husakovic authored 1 year ago
```
- Reviewer: <knielsen@knielsen-hq.org>
            <brandon.nesterenko@mariadb.com>
            <andrei.elkin@mariadb.com>
```
  a7d186a1
23 Oct, 2023 1 commit

MDEV-32265: seconds_behind_master is inaccurate for Delayed replication · c5f776e9

Brandon Nesterenko authored 1 year ago

If a replica is actively delaying a transaction when restarted (STOP
SLAVE/START SLAVE), when the sql thread is back up,
Seconds_Behind_Master will present as 0 until the configured
MASTER_DELAY has passed. That is, before the restart,
last_master_timestamp is updated to the timestamp of the delayed
event. Then after the restart, the negation of sql_thread_caught_up
is skipped because the timestamp of the event has already been used
for the last_master_timestamp, and their update is grouped together
in the same conditional block.

This patch fixes this by separating the negation of
sql_thread_caught_up out of the timestamp-dependent block, so it is
called any time an idle parallel slave queues an event to a worker.
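
The change can be modeled roughly as follows. This is a Python sketch of the logic as the message describes it, not the actual server code: the exact shape of the original condition is an assumption, and `ReplicaState` and both functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    last_master_timestamp: int = 0
    sql_thread_caught_up: bool = True


def queue_event_before_fix(state: ReplicaState, event_ts: int) -> None:
    # Assumed old shape: both updates share one timestamp-dependent block,
    # so an already-used timestamp skips the flag update as well.
    if state.sql_thread_caught_up and event_ts > state.last_master_timestamp:
        state.last_master_timestamp = event_ts
        state.sql_thread_caught_up = False


def queue_event_after_fix(state: ReplicaState, event_ts: int) -> None:
    # The flag is cleared whenever an idle replica queues an event;
    # only the timestamp update still depends on the comparison.
    if state.sql_thread_caught_up:
        if event_ts > state.last_master_timestamp:
            state.last_master_timestamp = event_ts
        state.sql_thread_caught_up = False
```

In this model, a restarted replica whose last_master_timestamp already equals the delayed event's timestamp stays "caught up" under the old logic, which is what makes Seconds_Behind_Master present as 0.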

Note that sql_thread_caught_up is still left in the check for internal
events, as SBM should remain idle in such case to not "magically" begin
incrementing.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>

c5f776e9

13 Sep, 2023 1 commit

MDEV-31177: SHOW SLAVE STATUS Last_SQL_Errno Race Condition on Errored Slave Restart · 1407f999

Brandon Nesterenko authored 1 year ago

The SQL thread and a user connection executing SHOW SLAVE STATUS
have a race condition on Last_SQL_Errno, such that a slave which
previously errored and stopped, on its next start, SHOW SLAVE STATUS
can show that the SQL Thread is running while the previous error is
also showing.

The fix is to clear the last error earlier in SQL thread startup,
before the status of Slave_SQL_Running is set.

Thanks to Kristian Nielsen for his work diagnosing the problem!

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
Kristian Nielsen <knielsen@knielsen-hq.org>

1407f999

12 Sep, 2023 1 commit

MDEV-31833 replication breaks when using optimistic replication and replica is a galera node · a3cbc44b

sjaakola authored 1 year ago

The MariaDB async replication SQL thread was stopped on any failure
to apply replication events, and the error message logged for the failure
was: "Node has dropped from cluster". The assumption was that an event
applying failure is always due to the node dropping out.
With optimistic parallel replication, event applying can fail for natural
reasons, and applying should be retried to handle the failure. This retry
logic was never exercised because the slave SQL thread was stopped on the
first applying failure.

To support optimistic parallel replication retrying logic this commit will
now skip replication slave abort, if node remains in cluster (wsrep_ready==ON)
and replication is configured for optimistic or aggressive retry logic.
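
The decision described here reduces to a small predicate. The sketch below is a Python model with hypothetical names; treating "optimistic or aggressive retry logic" as values of a parallel-mode setting is an assumption made for illustration.

```python
def should_abort_slave(wsrep_ready: bool, parallel_mode: str) -> bool:
    """Decide whether an event-apply failure stops the async SQL thread.

    Hypothetical model: the abort is skipped only when the node is still
    in the cluster and the configured mode is one that retries failures.
    """
    retry_capable = parallel_mode in ("optimistic", "aggressive")
    if wsrep_ready and retry_capable:
        return False   # leave the failure to the retry logic
    return True        # previous behavior: stop the slave
```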

During the development of this fix, galera.galera_as_slave_nonprim test showed
some problems. The test was analyzed, and it appears to need some attention.
One excessive sleep command was removed in this commit, but it will need more
fixes still to be fully deterministic. After this commit galera_as_slave_nonprim
is successful, though.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

a3cbc44b

15 Aug, 2023 1 commit

MDEV-31655: Parallel replication deadlock victim preference code erroneously removed · 900c4d69

Kristian Nielsen authored 1 year ago

Restore code to make InnoDB choose the second transaction as a deadlock
victim if two transactions deadlock that need to commit in-order for
parallel replication. This code was erroneously removed when VATS was
implemented in InnoDB.
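
As a toy illustration of the victim preference, the following Python sketch picks the transaction that must commit later. It is not InnoDB code; the tuple representation and the function name are hypothetical.

```python
def choose_deadlock_victim(trx_a, trx_b):
    """Pick the transaction to roll back when two in-order transactions deadlock.

    Each argument is a (name, commit_order) tuple, where a lower commit_order
    means the transaction must commit earlier (hypothetical representation).
    The later one is chosen, matching the "second transaction" in the text.
    """
    return max(trx_a, trx_b, key=lambda trx: trx[1])[0]
```

The result does not depend on argument order, so whichever transaction detects the deadlock, the one that commits later is the one rolled back.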

Also add a test case for InnoDB choosing the right deadlock victim.
Also fixes this bug, with testcase that reliably reproduces:

MDEV-28776: rpl.rpl_mark_optimize_tbl_ddl fails with timeout on sync_with_master

Note: This should be null-merged to 10.6, as a different fix is needed
there due to InnoDB locking code changes.
Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

900c4d69

08 Aug, 2023 1 commit

MDEV-31413 : Node has been dropped from the cluster on Startup / Shutdown with async replica · 277968aa

Jan Lindström authored 1 year ago

There were two related problems:

(1) A Galera node that is defined as a slave to an async MariaDB
master might do an SST (state transfer) at restart, and as
part of that it will copy the mysql.gtid_slave_pos table.
The problem is that updates to that table are not replicated
within the cluster. Therefore, the table from a donor that is not
a slave is copied, and the joiner loses the GTID position it was at
and starts executing events from the wrong position in the binlog.
This incorrect position could break replication and
cause the node to be dropped, requiring user action.

(2) The slave SQL thread might start executing events before
Galera is ready (wsrep_ready=ON), and that could also
cause the node to be dropped from the cluster.

In this fix we enable replication of the mysql.gtid_slave_pos
table within the cluster. This way all nodes in the cluster
know the GTID slave position, and even after an SST the joiner
knows the correct GTID position to start from.

Furthermore, we wait for Galera to be ready before the slave
SQL thread executes any events, to prevent too-early
execution.
Signed-off-by: Julius Goryavsky <julius.goryavsky@mariadb.com>

277968aa

25 Jul, 2023 1 commit

MDEV-30619: Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers · 063f4ac2

Brandon Nesterenko authored 1 year ago

MDEV-31749 sporadic assert in MDEV-30619 new test

If the workers of a parallel replica are busy (potentially with long
queues), but the SQL thread has no events left to distribute (so it
goes idle), then the next event that comes from the primary will
update mi->last_master_timestamp with its timestamp, even if the
workers have not yet finished.

This patch changes the parallel replica logic which updates
last_master_timestamp after idling from using solely sql_thread_caught_up
(added in MDEV-29639) to using the latter with rli queued/dequeued
event counters.
That is, if the queued count is equal to the dequeued count, it
means all events have been processed and the replica is considered
idle when the driver thread has also distributed all events.
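
The idleness rule can be written down as a short model. This Python sketch uses the field names from the message (`queued_count`, `dequeued_count`, `sql_thread_caught_up`), but the class and functions are hypothetical and do not reproduce the server's data structures.

```python
from dataclasses import dataclass


@dataclass
class RelayLogCounters:
    queued_count: int = 0     # events handed to worker threads
    dequeued_count: int = 0   # events the workers have finished


def replica_is_idle(sql_thread_caught_up: bool, counters: RelayLogCounters) -> bool:
    """Idle only if the driver has nothing left AND the workers have drained."""
    return sql_thread_caught_up and counters.queued_count == counters.dequeued_count


def next_last_master_timestamp(current_ts, event_ts, caught_up, counters):
    """Return the timestamp to keep after a new event arrives from the primary."""
    if replica_is_idle(caught_up, counters):
        return event_ts       # truly idle: the new event resets the baseline
    return current_ts         # workers still busy: keep the older timestamp
```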

Low level details of the commit include
- to make a more generalized test for Seconds_Behind_Master on
  the parallel replica, rpl_delayed_parallel_slave_sbm.test
  is renamed to rpl_parallel_sbm.test for this purpose.
- pause_sql_thread_on_next_event usage was removed
  with the MDEV-30619 fixes. Rather than remove it, we adapt it
  to the needs of this test case
- added test case to cover SBM spike of relay log read and LMT
  update that was fixed by MDEV-29639
- rpl_seconds_behind_master_spike.test is made to use
  the negate_clock_diff_with_master debug eval.

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>

063f4ac2

06 Jun, 2023 1 commit
- Revert "MDEV-13915: STOP SLAVE takes very long time on a busy system" · 8ed88e34
  Brandon Nesterenko authored 1 year ago
```
This reverts commit 0a99d457
because it should go into only 10.5+
```
  8ed88e34
05 Jun, 2023 1 commit

MDEV-13915: STOP SLAVE takes very long time on a busy system · 0a99d457

Brandon Nesterenko authored 2 years ago

The problem is that a parallel replica would not immediately stop
running/queued transactions when issued STOP SLAVE. That is, it
allowed the current group of transactions to run, and sometimes the
transactions which belong to the next group could be started and run
through commit after STOP SLAVE was issued too, if the last group
had started committing. This would lead to long periods to wait for
all waiting transactions to finish.

This patch updates a parallel replica to try to abort immediately
and roll back any ongoing transactions. The exception is
transactions which are non-transactional (e.g. those modifying
sequences or non-transactional tables): these, and any prior
transactions, will be run to completion.
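
The stop rule above can be sketched as a partition of the in-flight transactions. This is a Python model under the stated reading of the message; `plan_stop` and its input format are hypothetical.

```python
def plan_stop(in_flight):
    """Decide the fate of in-flight transactions when STOP SLAVE is issued.

    `in_flight` is a list of (name, is_transactional) tuples in commit order.
    Returns (run_to_completion, roll_back) as two lists of names.
    """
    # Find the last transaction that cannot be rolled back.
    last_non_trans = -1
    for index, (_, is_transactional) in enumerate(in_flight):
        if not is_transactional:
            last_non_trans = index

    # It and everything before it must finish; everything after is rolled back.
    finish = [name for name, _ in in_flight[:last_non_trans + 1]]
    roll_back = [name for name, _ in in_flight[last_non_trans + 1:]]
    return finish, roll_back
```

For example, with T1 (transactional), T2 (non-transactional) and T3 (transactional) in flight, T1 and T2 run to completion and T3 is rolled back.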

The specifics are as follows:

 1. A new stage was added to SHOW PROCESSLIST output for the SQL
Thread when it is waiting for a replica thread to either rollback or
finish its transaction before stopping. This stage presents as
“Waiting for worker thread to stop”

 2. Worker threads which error or are killed no longer perform GCO
cleanup if there is a concurrently running prior transaction. This
is because a worker thread scheduled to run in a future GCO could be
killed and incorrectly perform cleanup of the active GCO.

 3. Refined cases when the FL_TRANSACTIONAL flag is added to GTID
binlog events to disallow adding it to transactions which modify
both transactional and non-transactional engines when the binlogging
configuration allows the modifications to exist in the same event,
i.e. when using binlog_direct_non_transactional_updates == 0 and
binlog_format == statement.

 4. A few existing MTR tests relied on the completion of certain
transactions after issuing STOP SLAVE, and were re-recorded
(potentially with added synchronizations) under the new rollback
behavior.

Reviewed By
===========
Andrei Elkin <andrei.elkin@mariadb.com>

0a99d457

27 Apr, 2023 1 commit

MDEV-29621: Replica stopped by locks on sequence · 55a53949

Andrei authored 1 year ago

When using binlog_row_image=FULL with sequence table inserts, a
replica can deadlock because it treats full inserts in a sequence as DDL
statements by getting an exclusive lock on the sequence table. It
has been observed that with parallel replication, this exclusive
lock on the sequence table can lead to a deadlock where one
transaction has the exclusive lock and is waiting on a prior
transaction to commit, whereas this prior transaction is waiting on
the MDL lock.

The fix is on the master side: raise the FL_DDL
flag on the GTID of a full binlog_row_image write of a sequence table.
This forces the slave to execute the statement serially so a deadlock
cannot happen.

A test also verifies the deadlock, to prove that it happens on the OLD
(pre-fix) slave.

Support for OLD (buggy master) -replication-> NEW (fixed slave) is provided.
As the pre-fix master's full row-image may represent both
SELECT NEXT VALUE and INSERT, the parallel slave pessimistically
waits for the prior transaction to have committed before taking on the
critical part of the second (like INSERT in the test) event execution.
The waiting exploits a parallel slave's retry mechanism, which is
controlled by `@@global.slave_transaction_retries`.

Note that in order to avoid any persistent 'Deadlock found' (1213) error
in OLD -> NEW, `slave_transaction_retries` may need to be set to a
higher value than the default.
START SLAVE is an effective work-around if this still happens.

55a53949

28 Mar, 2023 1 commit

MDEV-30936 clang 15.0.7 -fsanitize=memory fails massively · dfa90257

Marko Mäkelä authored 1 year ago

handle_slave_io(), handle_slave_sql(), os_thread_exit():
Remove a redundant pthread_exit(nullptr) call, because it
would cause SIGSEGV.

mysql_print_status(): Add MEM_MAKE_DEFINED() to work around
some missing instrumentation around mallinfo2().

que_graph_free_stat_list(): Invoke que_node_get_next(node) before
que_graph_free_recursive(node). That is the logical and
MSAN_OPTIONS=poison_in_dtor=1 compatible way of freeing memory.

ins_node_t::~ins_node_t(): Invoke mem_heap_free(entry_sys_heap).

que_graph_free_recursive(): Rely on ins_node_t::~ins_node_t().

fts_t::~fts_t(): Invoke mem_heap_free(fts_heap).

fts_free(): Replace with direct calls to fts_t::~fts_t().

The failures in free_root() due to MSAN_OPTIONS=poison_in_dtor=1
will be covered in MDEV-30942.

dfa90257

09 Feb, 2023 1 commit

MDEV-30608: rpl.rpl_delayed_parallel_slave_sbm sometimes fails with... · eecd4f14

Brandon Nesterenko authored 2 years ago

MDEV-30608: rpl.rpl_delayed_parallel_slave_sbm sometimes fails with Seconds_Behind_Master should not have used second transaction timestamp

One of the constraints added in the MDEV-29639 patch, is that only
the first event after idling should update last_master_timestamp;
and as long as the replica has more events to execute, the variable
should not be updated. The corresponding test,
rpl_delayed_parallel_slave_sbm.test, aims to verify this; however,
if the IO thread takes too long to queue events, the SQL thread can
appear to catch up too fast.

This fix ensures that the relay log has been fully written before
executing the events.

Note that the underlying cause of this test failure needs to be
addressed as a bug-fix; this is only a temporary fix to stop test
failures. To track work on the bug-fix for the underlying issue,
please see MDEV-30619.

eecd4f14

24 Jan, 2023 1 commit

MDEV-29639: Seconds_Behind_Master is incorrect for Delayed, Parallel Replicas · d69e8357

Brandon Nesterenko authored 2 years ago

Problem
========
On a parallel, delayed replica, Seconds_Behind_Master will not be
calculated until after MASTER_DELAY seconds have passed and the
event has finished executing, resulting in potentially very large
values of Seconds_Behind_Master (which could be much larger than the
MASTER_DELAY parameter) for the entire duration the event is
delayed. This contradicts the documented MASTER_DELAY behavior,
which specifies how many seconds to withhold replicated events from
execution.

Solution
========
After a parallel replica idles, the first event after idling should
immediately update last_master_timestamp with the time that it began
execution on the primary.

Reviewed By
===========
Andrei Elkin <andrei.elkin@mariadb.com>

d69e8357

30 Nov, 2022 1 commit
- Merge 10.4 into 10.5 · 4eb8e51c
  Jan Lindström authored 2 years ago
  
  4eb8e51c
22 Nov, 2022 2 commits

MDEV-29817: Issues with handling options for SSL CRLs (and some others) · 1ebf0b73

Julius Goryavsky authored 2 years ago

This patch adds the correct setting of the "--tls-version" and
"--ssl-verify-server-cert" options in the client-side utilities
such as mysqltest, mysqlcheck and mysqlslap, as well as the correct
setting of the "--ssl-crl" option when executing queries on the
slave side, and also the correct option codes in the "sslopts-logopts.h"
file (in the latter case, incorrect values are not a problem right
now, but may cause subtle test failures in the future, if the option
handling code changes).

1ebf0b73

MDEV-29817: Issues with handling options for SSL CRLs (and some others) · f0820400

Julius Goryavsky authored 2 years ago

This patch adds the correct setting of the "--ssl-verify-server-cert"
option in the client-side utilities such as mysqlcheck and mysqlslap,
as well as the correct setting of the "--ssl-crl" option when executing
queries on the slave side, and also add the correct option codes in
the "sslopts-logopts.h" file (in the latter case, incorrect values
are not a problem right now, but may cause subtle test failures in
the future, if the option handling code changes).

f0820400

23 Sep, 2022 2 commits

Fix build without either ENABLED_DEBUG_SYNC or DBUG_OFF · 3c92050d

Marko Mäkelä authored 2 years ago

There are separate flags DBUG_OFF for disabling the DBUG facility
and ENABLED_DEBUG_SYNC for enabling the DEBUG_SYNC facility.
Let us allow debug builds without DEBUG_SYNC.

Note: For CMAKE_BUILD_TYPE=Debug, CMakeLists.txt will continue to
define ENABLED_DEBUG_SYNC.

3c92050d

MDEV-29613 Improve WITH_DBUG_TRACE=OFF · a69cf6f0

Marko Mäkelä authored 2 years ago

In commit 28325b08
a compile-time option was introduced to disable the macros
DBUG_ENTER and DBUG_RETURN or DBUG_VOID_RETURN.

The parameter name WITH_DBUG_TRACE would hint that it also
covers DBUG_PRINT statements. Let us do that: WITH_DBUG_TRACE=OFF
shall disable DBUG_PRINT() as well.

A few InnoDB recovery tests used to check that some output from
DBUG_PRINT("ib_log", ...) is present. We can live without those checks.

Reviewed by: Vladislav Vaintroub

a69cf6f0

03 Jun, 2022 1 commit
- MDEV-27697 fixup: Exclude debug code from non-debug builds · 099b9202
  Marko Mäkelä authored 2 years ago
  
  099b9202
13 May, 2022 1 commit

MDEV-28550 improper handling of replication event group that contains · 726bd8c9

Andrei authored 2 years ago

GTID_LIST_EVENT or INCIDENT_EVENT.

It's legal to have either of the two inside a group. E.g
  Gtid_event, Gtid_log_list_event, Query_1, ... Xid_log_event
is permitted.
However, the slave IO thread treated both
as the terminal even when the group represents a DDL query.
That causes a premature GTID state update, so the slave IO thread would think
the whole group has been collected while in fact Query_1 etc. are yet to be processed.

Fixed by correcting the condition that computes the terminal event
of the group.
Tested with rpl_mysqlbinlog_slave_consistency (of 10.9) and
rpl_gtid_errorlog.test.

726bd8c9

28 Apr, 2022 1 commit
- MDEV-28428 Master_SSL_Crl shows Master_SSL_CA value in SHOW SLAVE STATUS output · 1430cf78
  Sergei Golubchik authored 2 years ago
```
it was showing ca and capath instead of crl and crl_path
```
  1430cf78
26 Apr, 2022 2 commits
- MDEV-27697. Removed a false assert. · 388032e9
  Andrei authored 2 years ago
  
  388032e9
- MDEV-27697. Two affected tests fixed. · 945245ae
  Andrei authored 2 years ago
```
A result file is updated in one case and former error simulation got
refined.
```
  945245ae
25 Apr, 2022 1 commit

MDEV-27697 slave must recognize incomplete replication event group · 1bcdc3e9

Andrei authored 2 years ago

When a faulty master, or an incorrect binlog event producer that the slave is
working with, sends an incomplete group of events, the slave must react with an
error so that it does not log into the relay log any new events that do not
belong to the incomplete group.

Fixed by extending the check of received event properties when the slave connects
to the master in GTID mode.
Specifically, relay-logging of an event that can be part of a group is
permitted only when its position within the group is validated.
Otherwise the slave IO thread stops with ER_SLAVE_RELAY_LOG_WRITE_FAILURE.

1bcdc3e9

22 Apr, 2022 1 commit

MDEV-11853: semisync thread can be killed after sync binlog but before ACK in the sync state · a83c7ab1

Brandon Nesterenko authored 3 years ago

Problem:
========
If a primary is shut down during an active semi-sync connection
during the period when the primary is awaiting an ACK, the primary
hard kills the active communication thread and does not ensure the
transaction was received by a replica. This can lead to an
inconsistent replication state.

Solution:
========
During shutdown, the primary should wait for an ACK or timeout
before hard killing a thread which is awaiting a communication. We
extend the `SHUTDOWN WAIT FOR SLAVES` logic to identify and ignore
any threads waiting for a semi-sync ACK in phase 1. Then, before
stopping the ack receiver thread, the shutdown is delayed until all
waiting semi-sync connections receive an ACK or time out. The
connections are then killed in phase 2.
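
The shutdown sequence described in the solution can be modeled in a few lines. This is a Python sketch with hypothetical names, not the server's implementation: each connection is either not waiting (`None`) or waiting on a `threading.Event` that is set when the replica's ACK arrives. For simplicity the waits run one after another.

```python
import threading


def shutdown_primary(connections, ack_timeout):
    """Model the two kill phases with an ACK wait in between.

    `connections` maps a connection name to None (not awaiting an ACK) or to
    a threading.Event that is set once the semi-sync ACK has been received.
    Returns the ordered list of (action, name) steps taken.
    """
    steps = []
    waiting = {name: ack for name, ack in connections.items() if ack is not None}

    # Phase 1: kill everything except connections awaiting a semi-sync ACK.
    for name in connections:
        if name not in waiting:
            steps.append(("phase1_kill", name))

    # Delay shutdown until each waiter is ACKed or its wait times out.
    for name, ack in waiting.items():
        outcome = "acked" if ack.wait(timeout=ack_timeout) else "timed_out"
        steps.append((outcome, name))

    # Phase 2: only now kill the connections that were waiting.
    for name in waiting:
        steps.append(("phase2_kill", name))
    return steps
```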

Notes:
 1) There remains an unresolved corner case that affects this
patch. MDEV-28141: Slave crashes with Packets out of order when
connecting to a shutting down master. Specifically, if a slave is
connecting to a master which is actively shutting down, the slave
can crash with a "Packets out of order" assertion error. To get
around this issue in the MTR tests, the primary will wait a small
amount of time before phase 1 killing threads to let the replicas
safely stop (if applicable).
 2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
Thread Can Error on COM_QUIT

Reviewed By
============
Andrei Elkin <andrei.elkin@mariadb.com>

a83c7ab1

04 Jan, 2022 1 commit

MDEV-16091: Seconds_Behind_Master spikes to millions of seconds · 96de6bfd

Brandon Nesterenko authored 3 years ago

Problem:
========
A slave’s relay log format description event is used when
calculating Seconds_Behind_Master (SBM). This forces the SBM
value to spike when processing these events, as their creation
date is set to the timestamp that the IO thread begins.

Solution:
========
When the slave generates a format description event, mark the
event as a relay log event so it does not update the
rli->last_master_timestamp variable.
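
The effect of the fix can be shown with a simplified model. This Python sketch is an assumption-laden illustration: the `Event` class, the `relay_log_event` flag name, and the simplified lag formula (current time minus last master timestamp minus clock difference, floored at zero) are chosen for the example and are not taken from the server source.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: int
    relay_log_event: bool = False  # True for events the replica generated itself


def apply_events(events, last_master_timestamp=0):
    """Return last_master_timestamp after processing `events` in order."""
    for ev in events:
        if ev.relay_log_event:
            continue               # replica-generated events do not move the baseline
        last_master_timestamp = ev.timestamp
    return last_master_timestamp


def seconds_behind_master(now, last_master_timestamp, clock_diff=0):
    """Simplified lag estimate, never negative."""
    return max(0, now - last_master_timestamp - clock_diff)
```

If a format description event stamped with the IO thread's start time were allowed to overwrite the baseline, the computed lag would jump to the age of that timestamp; marking the event keeps the baseline at the last real transaction.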

Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>

96de6bfd

29 Oct, 2021 3 commits

MDEV-23328 Server hang due to Galera lock conflict resolution · ef2dbb8d

sjaakola authored 3 years ago

Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is

          wsrep_thd_LOCK()
          wsrep_kill_victim()
          lock_rec_other_has_conflicting()
          lock_clust_rec_read_check_and_lock()
          row_search_mvcc()
          ha_innobase::index_read()
          ha_innobase::rnd_pos()
          handler::ha_rnd_pos()
          handler::rnd_pos_by_record()
          handler::ha_rnd_pos_by_record()
          Rows_log_event::find_row()
          Update_rows_log_event::do_exec_row()
          Rows_log_event::do_apply_event()
          Log_event::apply_event()
          wsrep_apply_events()

and mutexes are taken in the order

          lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data

When a normal KILL statement is executed, the stack is

          innobase_kill_query()
          kill_handlerton()
          plugin_foreach_with_mask()
          ha_kill_query()
          THD::awake()
          kill_one_thread()

and mutexes are taken in the order

          victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

This patch is the plan D variant for fixing the potential mutex locking
order violation exercised by BF aborting and KILL command execution.

In this approach, the KILL command is replicated as a TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting happening in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting therefore
cannot happen.

TOI replication is used in this approach purely as a means
to provide isolated KILL command execution in the first node.
The KILL command should not (and must not) be applied in secondary
nodes. In this patch, we ensure this by skipping KILL
execution in secondary nodes, in the applying phase, where we
bail out if the applier thread is trying to execute a KILL command.
This is effective, but skipping the applying of the KILL command
could happen much earlier as well.

This also fixes unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort the transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>

ef2dbb8d

MDEV-25114: Crash: WSREP: invalid state ROLLED_BACK (FATAL) · d5bc0579
Jan Lindström authored 3 years ago
```
Revert "MDEV-23328 Server hang due to Galera lock conflict resolution"

This reverts commit eac8341d.
```
d5bc0579

MDEV-23328 Server hang due to Galera lock conflict resolution · 5c230b21

sjaakola authored 3 years ago

Mutex order violation when wsrep bf thread kills a conflicting trx,
the stack is

          wsrep_thd_LOCK()
          wsrep_kill_victim()
          lock_rec_other_has_conflicting()
          lock_clust_rec_read_check_and_lock()
          row_search_mvcc()
          ha_innobase::index_read()
          ha_innobase::rnd_pos()
          handler::ha_rnd_pos()
          handler::rnd_pos_by_record()
          handler::ha_rnd_pos_by_record()
          Rows_log_event::find_row()
          Update_rows_log_event::do_exec_row()
          Rows_log_event::do_apply_event()
          Log_event::apply_event()
          wsrep_apply_events()

and mutexes are taken in the order

          lock_sys->mutex -> victim_trx->mutex -> victim_thread->LOCK_thd_data

When a normal KILL statement is executed, the stack is

          innobase_kill_query()
          kill_handlerton()
          plugin_foreach_with_mask()
          ha_kill_query()
          THD::awake()
          kill_one_thread()

and mutexes are taken in the order

          victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

This patch is the plan D variant for fixing the potential mutex locking
order violation exercised by BF aborting and KILL command execution.

In this approach, the KILL command is replicated as a TOI operation.
This guarantees total isolation for the KILL command execution
in the first node: there is no concurrent replication applying
and no concurrent DDL executing. Therefore there is no risk of
BF aborting happening in parallel with KILL command execution
either. Potential mutex deadlocks between the different mutex
access paths with KILL command execution and BF aborting therefore
cannot happen.

TOI replication is used in this approach purely as a means
to provide isolated KILL command execution in the first node.
The KILL command should not (and must not) be applied in secondary
nodes. In this patch, we ensure this by skipping KILL
execution in secondary nodes, in the applying phase, where we
bail out if the applier thread is trying to execute a KILL command.
This is effective, but skipping the applying of the KILL command
could happen much earlier as well.

This also fixes unprotected calls to wsrep_thd_abort
that will use wsrep_abort_transaction. This is fixed
by holding THD::LOCK_thd_data while we abort the transaction.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>

5c230b21