Commit a1e70388 authored by sjaakola's avatar sjaakola Committed by Jan Lindström

MDEV-24966 Galera multi-master regression

After the merging of MDEV-24915, 10.6 branch has regressions with handling of
concurrent write load against two or more cluster nodes. These regressions may
surface as cluster hanging, node crashes or data inconsistency. With some test
scenarios, the only visible symptom could be that the BF victim aborting happens
only by innodb lock wait timeout expiration. This would result only to poor
performance (by default 50 sec hang for each BF conflict), and could be somewhat
difficult to diagnose.

This pull request has following fixes to handle concurrent write load from
multiple nodes:

In lock_wait_wsrep_kill(), the victim trx was expected to be only in
TRX_STATE_ACTIVE state. With the delayed BF conflict handling, it can happen
that victim has advanced into pre commit state. This was fixed by choosing
victim both in TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.

Victim transaction may be in several different states at the time of detected
lock conflict, and due to delayed BF aborting practice in MDEV-24915, the victim
may advance further before the actual BF aborting takes place. The BF aborting
in MDEV-24915 did not wake the victim, if it was in the state of waiting for
some other lock (than the one that was blocking the high priority thread).
This anomaly caused the innodb lock wait timeout expiration delays and poor
performance symptom. To fix this, lock_wait_wsrep_kill() now looks if
victim is in lock waiting state, and uses lock_cancel_waiting_and_release()
to cancel this lock wait.

wsrep_bf_abort() checks if the victim has active transaction (in wsrep-lib),
and starts a new transaction if there was no active transaction before.
Due to late BF aborting, the victim may have e.g. failed in certification
and is already aborting or has aborted at this stage. This has caused
problems in testing where BF aborter tries to BF abort himself.
The fix in wsrep_bf_abort() now skips the BF abort, if victim is aborting
or has aborted. Victim may not have started transaction yet in wsrep context,
but it may have acquired MDL locks (due to DDL execution), and this has
caused BF conflict. Such case does not require aborting in wsrep or
replication provider state.

BF aborting could cause BF-BF conflict scenario, if victim was already aborted
and changed to replayer having high priority as well. This BF-BF conflict
scenario is now avoided in lock_wait_wsrep() where we now check if blocking
lock holder is also high priority and is ordered before, caller should wait
for the lock in this situation.

The natural innodb deadlock resolving algorithm could pick BF thread as
deadlock victim. This is fixed by giving max weigh to BF threads in
Deadlock::report().

MDEV-24341 has changed excution paths in do_command() and this affects BF
aborted victim execution. This PR fixes one assert in do_command():
 DBUG_ASSERT(!thd->async_state.pending_ops())
Which fired if the thd was BF aborted earlier. This assert is now changed
to allow pending_ops() if thd was BF aborted before.

With these fixes, long term highly conflicting write load could be run against
to node cluster. If binlogging is configured, log_slave_updates should be
also set.
parent f74704c7
...@@ -119,6 +119,7 @@ SET debug_sync='RESET'; ...@@ -119,6 +119,7 @@ SET debug_sync='RESET';
connection node_1; connection node_1;
SET GLOBAL wsrep_slave_threads = DEFAULT; SET GLOBAL wsrep_slave_threads = DEFAULT;
connection node_2; connection node_2;
SET SESSION wsrep_sync_wait=15;
SELECT * FROM t1; SELECT * FROM t1;
f1 f2 f3 f1 f2 f3
1 1 0 1 1 0
......
...@@ -266,6 +266,7 @@ SET debug_sync='RESET'; ...@@ -266,6 +266,7 @@ SET debug_sync='RESET';
SET GLOBAL wsrep_slave_threads = DEFAULT; SET GLOBAL wsrep_slave_threads = DEFAULT;
--connection node_2 --connection node_2
SET SESSION wsrep_sync_wait=15;
SELECT * FROM t1; SELECT * FROM t1;
# replicate some transactions, so that wsrep slave thread count can reach # replicate some transactions, so that wsrep slave thread count can reach
......
connection node_2; connection node_2;
connection node_1; connection node_1;
connection node_1; connection node_1;
connection node_2;
connection node_3;
connection node_1;
CREATE TABLE t1 (pk INT PRIMARY KEY, node INT) ENGINE=innodb; CREATE TABLE t1 (pk INT PRIMARY KEY, node INT) ENGINE=innodb;
INSERT INTO t1 VALUES (1, 1); INSERT INTO t1 VALUES (1, 1);
connection node_2; connection node_2;
......
...@@ -15,6 +15,12 @@ ...@@ -15,6 +15,12 @@
--let $galera_server_number = 3 --let $galera_server_number = 3
--source include/galera_connect.inc --source include/galera_connect.inc
# Save original auto_increment_offset values.
--let $node_1=node_1
--let $node_2=node_2
--let $node_3=node_3
--source ../galera/include/auto_increment_offset_save.inc
--connection node_1 --connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 3 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size'; --let $wait_condition = SELECT VARIABLE_VALUE = 3 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc --source include/wait_condition.inc
...@@ -260,3 +266,5 @@ call mtr.add_suppression("WSREP: Rejecting JOIN message from \(.*\): new State T ...@@ -260,3 +266,5 @@ call mtr.add_suppression("WSREP: Rejecting JOIN message from \(.*\): new State T
--connection node_3 --connection node_3
call mtr.add_suppression("WSREP: Rejecting JOIN message from \(.*\): new State Transfer required."); call mtr.add_suppression("WSREP: Rejecting JOIN message from \(.*\): new State Transfer required.");
--source ../galera/include/auto_increment_offset_restore.inc
call mtr.add_suppression("WSREP: Initial position was provided by configuration or SST, avoiding override");
SET @wsrep_provider_options_saved= @@global.wsrep_provider_options;
SET @wsrep_cluster_address_saved= @@global.wsrep_cluster_address;
# MDEV#5534: mysql_tzinfo_to_sql generates wrong query
#
# Testing wsrep_replicate_myisam variable.
SELECT @@session.wsrep_replicate_myisam;
ERROR HY000: Variable 'wsrep_replicate_myisam' is a GLOBAL variable
SELECT @@global.wsrep_replicate_myisam;
@@global.wsrep_replicate_myisam
0
SET SESSION wsrep_replicate_myisam= ON;
ERROR HY000: Variable 'wsrep_replicate_myisam' is a GLOBAL variable and should be set with SET GLOBAL
SET GLOBAL wsrep_replicate_myisam= ON;
SET GLOBAL wsrep_replicate_myisam= OFF;
SET GLOBAL wsrep_provider=none;
#
# MDEV#5790: SHOW GLOBAL STATUS LIKE does not show the correct list of
# variables when using "_"
#
CALL mtr.add_suppression("WSREP: Could not open saved state file for reading.*");
SHOW GLOBAL STATUS LIKE 'wsrep%';
Variable_name Value
wsrep_local_state_uuid #
wsrep_protocol_version #
wsrep_last_committed #
wsrep_replicated #
wsrep_replicated_bytes #
wsrep_repl_keys #
wsrep_repl_keys_bytes #
wsrep_repl_data_bytes #
wsrep_repl_other_bytes #
wsrep_received #
wsrep_received_bytes #
wsrep_local_commits #
wsrep_local_cert_failures #
wsrep_local_replays #
wsrep_local_send_queue #
wsrep_local_send_queue_max #
wsrep_local_send_queue_min #
wsrep_local_send_queue_avg #
wsrep_local_recv_queue #
wsrep_local_recv_queue_max #
wsrep_local_recv_queue_min #
wsrep_local_recv_queue_avg #
wsrep_local_cached_downto #
wsrep_flow_control_paused_ns #
wsrep_flow_control_paused #
wsrep_flow_control_sent #
wsrep_flow_control_recv #
wsrep_flow_control_active #
wsrep_flow_control_requested #
wsrep_cert_deps_distance #
wsrep_apply_oooe #
wsrep_apply_oool #
wsrep_apply_window #
wsrep_commit_oooe #
wsrep_commit_oool #
wsrep_commit_window #
wsrep_local_state #
wsrep_local_state_comment #
wsrep_cert_index_size #
wsrep_causal_reads #
wsrep_cert_interval #
wsrep_open_transactions #
wsrep_open_connections #
wsrep_incoming_addresses #
wsrep_debug_sync_waiters #
wsrep_applier_thread_count #
wsrep_cluster_capabilities #
wsrep_cluster_conf_id #
wsrep_cluster_size #
wsrep_cluster_state_uuid #
wsrep_cluster_status #
wsrep_connected #
wsrep_local_bf_aborts #
wsrep_local_index #
wsrep_provider_capabilities #
wsrep_provider_name #
wsrep_provider_vendor #
wsrep_provider_version #
wsrep_ready #
wsrep_rollbacker_thread_count #
wsrep_thread_count #
SHOW GLOBAL STATUS LIKE 'wsrep_%';
Variable_name Value
wsrep_local_state_uuid #
wsrep_protocol_version #
wsrep_last_committed #
wsrep_replicated #
wsrep_replicated_bytes #
wsrep_repl_keys #
wsrep_repl_keys_bytes #
wsrep_repl_data_bytes #
wsrep_repl_other_bytes #
wsrep_received #
wsrep_received_bytes #
wsrep_local_commits #
wsrep_local_cert_failures #
wsrep_local_replays #
wsrep_local_send_queue #
wsrep_local_send_queue_max #
wsrep_local_send_queue_min #
wsrep_local_send_queue_avg #
wsrep_local_recv_queue #
wsrep_local_recv_queue_max #
wsrep_local_recv_queue_min #
wsrep_local_recv_queue_avg #
wsrep_local_cached_downto #
wsrep_flow_control_paused_ns #
wsrep_flow_control_paused #
wsrep_flow_control_sent #
wsrep_flow_control_recv #
wsrep_flow_control_active #
wsrep_flow_control_requested #
wsrep_cert_deps_distance #
wsrep_apply_oooe #
wsrep_apply_oool #
wsrep_apply_window #
wsrep_commit_oooe #
wsrep_commit_oool #
wsrep_commit_window #
wsrep_local_state #
wsrep_local_state_comment #
wsrep_cert_index_size #
wsrep_causal_reads #
wsrep_cert_interval #
wsrep_open_transactions #
wsrep_open_connections #
wsrep_incoming_addresses #
wsrep_debug_sync_waiters #
wsrep_applier_thread_count #
wsrep_cluster_capabilities #
wsrep_cluster_conf_id #
wsrep_cluster_size #
wsrep_cluster_state_uuid #
wsrep_cluster_status #
wsrep_connected #
wsrep_local_bf_aborts #
wsrep_local_index #
wsrep_provider_capabilities #
wsrep_provider_name #
wsrep_provider_vendor #
wsrep_provider_version #
wsrep_ready #
wsrep_rollbacker_thread_count #
wsrep_thread_count #
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
Variable_name Value
wsrep_local_state_comment #
# Should show nothing.
SHOW STATUS LIKE 'x';
Variable_name Value
SET GLOBAL wsrep_provider=none;
SHOW STATUS LIKE 'wsrep_local_state_uuid';
Variable_name Value
wsrep_local_state_uuid #
SHOW STATUS LIKE 'wsrep_last_committed';
Variable_name Value
wsrep_last_committed #
SET GLOBAL wsrep_provider=none;
#
# MDEV#6206: wsrep_slave_threads subtracts from max_connections
#
call mtr.add_suppression("WSREP: Failed to get provider options");
SELECT @@global.wsrep_provider;
@@global.wsrep_provider
libgalera_smm.so
SELECT @@global.wsrep_slave_threads;
@@global.wsrep_slave_threads
1
SELECT @@global.wsrep_cluster_address;
@@global.wsrep_cluster_address
SELECT @@global.wsrep_on;
@@global.wsrep_on
1
SHOW STATUS LIKE 'threads_connected';
Variable_name Value
Threads_connected 1
SHOW STATUS LIKE 'wsrep_thread_count';
Variable_name Value
wsrep_thread_count 0
SELECT @@global.wsrep_provider;
@@global.wsrep_provider
libgalera_smm.so
SELECT @@global.wsrep_cluster_address;
@@global.wsrep_cluster_address
SELECT @@global.wsrep_on;
@@global.wsrep_on
1
SHOW STATUS LIKE 'threads_connected';
Variable_name Value
Threads_connected 1
SHOW STATUS LIKE 'wsrep_thread_count';
Variable_name Value
wsrep_thread_count 0
# Setting wsrep_cluster_address triggers the creation of
# applier/rollbacker threads.
SET GLOBAL wsrep_cluster_address= 'gcomm://';
# Wait for applier thread to get created 1.
# Wait for applier thread to get created 2.
SELECT VARIABLE_VALUE AS EXPECT_1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_applier_thread_count';
EXPECT_1
1
SELECT VARIABLE_VALUE AS EXPECT_1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_rollbacker_thread_count';
EXPECT_1
1
SELECT VARIABLE_VALUE AS EXPECT_2 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_thread_count';
EXPECT_2
2
SELECT @@global.wsrep_provider;
@@global.wsrep_provider
libgalera_smm.so
SELECT @@global.wsrep_cluster_address;
@@global.wsrep_cluster_address
gcomm://
SELECT @@global.wsrep_on;
@@global.wsrep_on
1
SHOW STATUS LIKE 'threads_connected';
Variable_name Value
Threads_connected 1
SHOW STATUS LIKE 'wsrep_thread_count';
Variable_name Value
wsrep_thread_count 2
SET @wsrep_slave_threads_saved= @@global.wsrep_slave_threads;
SET GLOBAL wsrep_slave_threads= 10;
# Wait for 9 applier threads to get created.
SELECT VARIABLE_VALUE AS EXPECT_10 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_applier_thread_count';
EXPECT_10
10
SELECT VARIABLE_VALUE AS EXPECT_1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_rollbacker_thread_count';
EXPECT_1
1
SELECT VARIABLE_VALUE AS EXPECT_11 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_thread_count';
EXPECT_11
11
SHOW STATUS LIKE 'threads_connected';
Variable_name Value
Threads_connected 1
set wsrep_on=0;
set wsrep_on=1;
create user test@localhost;
connect con1,localhost,test;
set auto_increment_increment=10;
set wsrep_on=0;
ERROR 42000: Access denied; you need (at least one of) the SUPER privilege(s) for this operation
disconnect con1;
connection default;
drop user test@localhost;
#
# MDEV#6411: Setting set @@global.wsrep_sst_auth=NULL causes crash
#
SET @wsrep_sst_auth_saved= @@global.wsrep_sst_auth;
SET @@global.wsrep_sst_auth= 'user:pass';
SELECT @@global.wsrep_sst_auth;
@@global.wsrep_sst_auth
********
SET @@global.wsrep_sst_auth= '';
SELECT @@global.wsrep_sst_auth;
@@global.wsrep_sst_auth
SET @@global.wsrep_sst_auth= NULL;
SELECT @@global.wsrep_sst_auth;
@@global.wsrep_sst_auth
NULL
SET @@global.wsrep_sst_auth= @wsrep_sst_auth_saved;
# End of test.
!include ../my.cnf
[mysqld.1]
wsrep-on=ON
wsrep-cluster-address=gcomm://
wsrep-provider=@ENV.WSREP_PROVIDER
binlog-format=ROW
--source include/have_wsrep.inc
--source include/force_restart.inc
--source include/have_innodb.inc
--source include/galera_have_debug_sync.inc
call mtr.add_suppression("WSREP: Initial position was provided by configuration or SST, avoiding override");
SET @wsrep_provider_options_saved= @@global.wsrep_provider_options;
SET @wsrep_cluster_address_saved= @@global.wsrep_cluster_address;
--echo
--echo # MDEV#5534: mysql_tzinfo_to_sql generates wrong query
--echo #
--echo # Testing wsrep_replicate_myisam variable.
--error ER_INCORRECT_GLOBAL_LOCAL_VAR
SELECT @@session.wsrep_replicate_myisam;
SELECT @@global.wsrep_replicate_myisam;
--error ER_GLOBAL_VARIABLE
SET SESSION wsrep_replicate_myisam= ON;
SET GLOBAL wsrep_replicate_myisam= ON;
# Reset it back.
SET GLOBAL wsrep_replicate_myisam= OFF;
SET GLOBAL wsrep_provider=none;
--echo #
--echo # MDEV#5790: SHOW GLOBAL STATUS LIKE does not show the correct list of
--echo # variables when using "_"
--echo #
CALL mtr.add_suppression("WSREP: Could not open saved state file for reading.*");
--disable_query_log
eval SET GLOBAL wsrep_provider= '$WSREP_PROVIDER';
--enable_query_log
--replace_column 2 #
SHOW GLOBAL STATUS LIKE 'wsrep%';
--echo
--replace_column 2 #
SHOW GLOBAL STATUS LIKE 'wsrep_%';
--replace_column 2 #
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
--echo # Should show nothing.
SHOW STATUS LIKE 'x';
# Reset it back.
SET GLOBAL wsrep_provider=none;
--disable_query_log
eval SET GLOBAL wsrep_provider= '$WSREP_PROVIDER';
--enable_query_log
# The following 2 variables are used by mariabackup
# SST.
--echo
--replace_column 2 #
SHOW STATUS LIKE 'wsrep_local_state_uuid';
--echo
--replace_column 2 #
SHOW STATUS LIKE 'wsrep_last_committed';
# Reset it back.
SET GLOBAL wsrep_provider=none;
--echo
--echo #
--echo # MDEV#6206: wsrep_slave_threads subtracts from max_connections
--echo #
call mtr.add_suppression("WSREP: Failed to get provider options");
--disable_query_log
eval SET GLOBAL wsrep_provider= '$WSREP_PROVIDER';
--enable_query_log
--replace_regex /.*libgalera_smm.*/libgalera_smm.so/
SELECT @@global.wsrep_provider;
SELECT @@global.wsrep_slave_threads;
SELECT @@global.wsrep_cluster_address;
SELECT @@global.wsrep_on;
SHOW STATUS LIKE 'threads_connected';
SHOW STATUS LIKE 'wsrep_thread_count';
--echo
--disable_query_log
eval SET GLOBAL wsrep_provider= '$WSREP_PROVIDER';
--enable_query_log
--replace_regex /.*libgalera_smm.*/libgalera_smm.so/
SELECT @@global.wsrep_provider;
SELECT @@global.wsrep_cluster_address;
SELECT @@global.wsrep_on;
SHOW STATUS LIKE 'threads_connected';
SHOW STATUS LIKE 'wsrep_thread_count';
--echo
--echo # Setting wsrep_cluster_address triggers the creation of
--echo # applier/rollbacker threads.
SET GLOBAL wsrep_cluster_address= 'gcomm://';
--echo # Wait for applier thread to get created 1.
--let $wait_condition = SELECT VARIABLE_VALUE = 1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_applier_thread_count';
--source include/wait_condition.inc
--echo # Wait for applier thread to get created 2.
--let $wait_condition = SELECT VARIABLE_VALUE = 1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_rollbacker_thread_count';
--source include/wait_condition.inc
SELECT VARIABLE_VALUE AS EXPECT_1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_applier_thread_count';
SELECT VARIABLE_VALUE AS EXPECT_1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_rollbacker_thread_count';
SELECT VARIABLE_VALUE AS EXPECT_2 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_thread_count';
--replace_regex /.*libgalera_smm.*/libgalera_smm.so/
SELECT @@global.wsrep_provider;
SELECT @@global.wsrep_cluster_address;
SELECT @@global.wsrep_on;
SHOW STATUS LIKE 'threads_connected';
SHOW STATUS LIKE 'wsrep_thread_count';
--echo
SET @wsrep_slave_threads_saved= @@global.wsrep_slave_threads;
SET GLOBAL wsrep_slave_threads= 10;
--echo # Wait for 9 applier threads to get created.
--let $wait_condition = SELECT VARIABLE_VALUE = 10 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_applier_thread_count';
--source include/wait_condition.inc
SELECT VARIABLE_VALUE AS EXPECT_10 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_applier_thread_count';
SELECT VARIABLE_VALUE AS EXPECT_1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_rollbacker_thread_count';
SELECT VARIABLE_VALUE AS EXPECT_11 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_thread_count';
SHOW STATUS LIKE 'threads_connected';
#
# privileges for wsrep_on
#
set wsrep_on=0;
set wsrep_on=1;
--source include/wait_until_connected_again.inc
create user test@localhost;
connect con1,localhost,test;
set auto_increment_increment=10;
--error ER_SPECIFIC_ACCESS_DENIED_ERROR
set wsrep_on=0;
disconnect con1;
connection default;
drop user test@localhost;
--echo #
--echo # MDEV#6411: Setting set @@global.wsrep_sst_auth=NULL causes crash
--echo #
SET @wsrep_sst_auth_saved= @@global.wsrep_sst_auth;
SET @@global.wsrep_sst_auth= 'user:pass';
SELECT @@global.wsrep_sst_auth;
SET @@global.wsrep_sst_auth= '';
SELECT @@global.wsrep_sst_auth;
SET @@global.wsrep_sst_auth= NULL;
SELECT @@global.wsrep_sst_auth;
SET @@global.wsrep_sst_auth= @wsrep_sst_auth_saved;
# Reset (for mtr internal checks)
--disable_query_log
SET GLOBAL wsrep_slave_threads= @wsrep_slave_threads_saved;
eval SET GLOBAL wsrep_provider= '$WSREP_PROVIDER';
SET GLOBAL wsrep_cluster_address= @wsrep_cluster_address_saved;
SET GLOBAL wsrep_provider_options= @wsrep_provider_options_saved;
--enable_query_log
--source include/galera_wait_ready.inc
--echo # End of test.
...@@ -7972,7 +7972,6 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry) ...@@ -7972,7 +7972,6 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry)
DBUG_ASSERT(entry != NULL); DBUG_ASSERT(entry != NULL);
cur= entry->thd->wait_for_commit_ptr; cur= entry->thd->wait_for_commit_ptr;
} }
#ifdef WITH_WSREP #ifdef WITH_WSREP
if (wsrep_is_active(entry->thd) && if (wsrep_is_active(entry->thd) &&
wsrep_run_commit_hook(entry->thd, entry->all)) wsrep_run_commit_hook(entry->thd, entry->all))
...@@ -7986,7 +7985,7 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry) ...@@ -7986,7 +7985,7 @@ MYSQL_BIN_LOG::queue_for_group_commit(group_commit_entry *orig_entry)
result= -3; result= -3;
} }
else else
DBUG_ASSERT(result != -2 && result != -3); DBUG_ASSERT(result == 0);
#endif /* WITH_WSREP */ #endif /* WITH_WSREP */
if (opt_binlog_commit_wait_count > 0 && orig_queue != NULL) if (opt_binlog_commit_wait_count > 0 && orig_queue != NULL)
......
...@@ -1206,7 +1206,8 @@ dispatch_command_return do_command(THD *thd, bool blocking) ...@@ -1206,7 +1206,8 @@ dispatch_command_return do_command(THD *thd, bool blocking)
enum enum_server_command command; enum enum_server_command command;
DBUG_ENTER("do_command"); DBUG_ENTER("do_command");
DBUG_ASSERT(!thd->async_state.pending_ops()); DBUG_ASSERT(!thd->async_state.pending_ops() ||
(WSREP(thd) && thd->wsrep_trx().state() == wsrep::transaction::s_aborted));
if (thd->async_state.m_state == thd_async_state::enum_async_state::RESUMED) if (thd->async_state.m_state == thd_async_state::enum_async_state::RESUMED)
{ {
/* /*
......
...@@ -348,8 +348,23 @@ bool wsrep_bf_abort(const THD* bf_thd, THD* victim_thd) ...@@ -348,8 +348,23 @@ bool wsrep_bf_abort(const THD* bf_thd, THD* victim_thd)
if (WSREP(victim_thd) && !victim_thd->wsrep_trx().active()) if (WSREP(victim_thd) && !victim_thd->wsrep_trx().active())
{ {
WSREP_DEBUG("wsrep_bf_abort, BF abort for non active transaction"); WSREP_DEBUG("wsrep_bf_abort, BF abort for non active transaction."
wsrep_start_transaction(victim_thd, victim_thd->wsrep_next_trx_id()); " Victim state %s bf state %s",
wsrep::to_c_string(victim_thd->wsrep_trx().state()),
wsrep::to_c_string(bf_thd->wsrep_trx().state()));
switch (victim_thd->wsrep_trx().state()) {
case wsrep::transaction::s_aborting: /* fall through */
case wsrep::transaction::s_aborted:
WSREP_DEBUG("victim is aborting or has aborted");
break;
default: break;
}
/* victim may not have started transaction yet in wsrep context, but it may
have acquired MDL locks (due to DDL execution), and this has caused BF conflict.
such case does not require aborting in wsrep or replication provider state.
*/
return false;
} }
bool ret; bool ret;
...@@ -359,6 +374,7 @@ bool wsrep_bf_abort(const THD* bf_thd, THD* victim_thd) ...@@ -359,6 +374,7 @@ bool wsrep_bf_abort(const THD* bf_thd, THD* victim_thd)
} }
else else
{ {
DBUG_ASSERT(WSREP(victim_thd) ? victim_thd->wsrep_trx().active() : 1);
ret= victim_thd->wsrep_cs().bf_abort(bf_seqno); ret= victim_thd->wsrep_cs().bf_abort(bf_seqno);
} }
if (ret) if (ret)
......
...@@ -18075,41 +18075,53 @@ void lock_wait_wsrep_kill(trx_t *bf_trx, ulong thd_id, trx_id_t trx_id) ...@@ -18075,41 +18075,53 @@ void lock_wait_wsrep_kill(trx_t *bf_trx, ulong thd_id, trx_id_t trx_id)
{ {
bool aborting= false; bool aborting= false;
wsrep_thd_LOCK(vthd); wsrep_thd_LOCK(vthd);
if (trx_t *vtrx= thd_to_trx(vthd)) trx_t *vtrx= thd_to_trx(vthd);
if (vtrx)
{ {
lock_sys.wr_lock(SRW_LOCK_CALL); lock_sys.wr_lock(SRW_LOCK_CALL);
mysql_mutex_lock(&lock_sys.wait_mutex); mysql_mutex_lock(&lock_sys.wait_mutex);
vtrx->mutex_lock(); vtrx->mutex_lock();
if (vtrx->id == trx_id && vtrx->state == TRX_STATE_ACTIVE) /* victim transaction is either active or prepared, if it has already
proceeded to replication phase */
if (vtrx->id == trx_id)
{ {
WSREP_LOG_CONFLICT(bf_thd, vthd, TRUE); switch (vtrx->state) {
WSREP_DEBUG("Aborter BF trx_id: " TRX_ID_FMT " thread: %ld " default:
"seqno: %lld client_state: %s " break;
"client_mode: %s transaction_mode: %s query: %s", case TRX_STATE_PREPARED:
bf_trx->id, if (!wsrep_is_wsrep_xid(vtrx->xid))
thd_get_thread_id(bf_thd), break;
wsrep_thd_trx_seqno(bf_thd), /* fall through */
wsrep_thd_client_state_str(bf_thd), case TRX_STATE_ACTIVE:
wsrep_thd_client_mode_str(bf_thd), WSREP_LOG_CONFLICT(bf_thd, vthd, TRUE);
wsrep_thd_transaction_state_str(bf_thd), WSREP_DEBUG("Aborter BF trx_id: " TRX_ID_FMT " thread: %ld "
wsrep_thd_query(bf_thd)); "seqno: %lld client_state: %s "
WSREP_DEBUG("Victim %s trx_id: " TRX_ID_FMT " thread: %ld " "client_mode: %s transaction_mode: %s query: %s",
"seqno: %lld client_state: %s " bf_trx->id,
"client_mode: %s transaction_mode: %s query: %s", thd_get_thread_id(bf_thd),
wsrep_thd_is_BF(vthd, false) ? "BF" : "normal", wsrep_thd_trx_seqno(bf_thd),
vtrx->id, wsrep_thd_client_state_str(bf_thd),
thd_get_thread_id(vthd), wsrep_thd_client_mode_str(bf_thd),
wsrep_thd_trx_seqno(vthd), wsrep_thd_transaction_state_str(bf_thd),
wsrep_thd_client_state_str(vthd), wsrep_thd_query(bf_thd));
wsrep_thd_client_mode_str(vthd), WSREP_DEBUG("Victim %s trx_id: " TRX_ID_FMT " thread: %ld "
wsrep_thd_transaction_state_str(vthd), "seqno: %lld client_state: %s "
wsrep_thd_query(vthd)); "client_mode: %s transaction_mode: %s query: %s",
/* Mark transaction as a victim for Galera abort */ wsrep_thd_is_BF(vthd, false) ? "BF" : "normal",
vtrx->lock.was_chosen_as_deadlock_victim.fetch_or(2); vtrx->id,
if (!wsrep_thd_set_wsrep_aborter(bf_thd, vthd)) thd_get_thread_id(vthd),
aborting= true; wsrep_thd_trx_seqno(vthd),
else wsrep_thd_client_state_str(vthd),
WSREP_DEBUG("kill transaction skipped due to wsrep_aborter set"); wsrep_thd_client_mode_str(vthd),
wsrep_thd_transaction_state_str(vthd),
wsrep_thd_query(vthd));
/* Mark transaction as a victim for Galera abort */
vtrx->lock.was_chosen_as_deadlock_victim.fetch_or(2);
if (!wsrep_thd_set_wsrep_aborter(bf_thd, vthd))
aborting= true;
else
WSREP_DEBUG("kill transaction skipped due to wsrep_aborter set");
}
} }
lock_sys.wr_unlock(); lock_sys.wr_unlock();
mysql_mutex_unlock(&lock_sys.wait_mutex); mysql_mutex_unlock(&lock_sys.wait_mutex);
...@@ -18118,6 +18130,11 @@ void lock_wait_wsrep_kill(trx_t *bf_trx, ulong thd_id, trx_id_t trx_id) ...@@ -18118,6 +18130,11 @@ void lock_wait_wsrep_kill(trx_t *bf_trx, ulong thd_id, trx_id_t trx_id)
wsrep_thd_UNLOCK(vthd); wsrep_thd_UNLOCK(vthd);
if (aborting) if (aborting)
{ {
/* if victim is waiting for some other lock, we have to cancel
that waiting
*/
lock_sys.cancel_lock_wait_for_trx(vtrx);
DEBUG_SYNC(bf_thd, "before_wsrep_thd_abort"); DEBUG_SYNC(bf_thd, "before_wsrep_thd_abort");
wsrep_thd_bf_abort(bf_thd, vthd, true); wsrep_thd_bf_abort(bf_thd, vthd, true);
} }
...@@ -18146,7 +18163,7 @@ wsrep_abort_transaction( ...@@ -18146,7 +18163,7 @@ wsrep_abort_transaction(
ut_ad(bf_thd); ut_ad(bf_thd);
ut_ad(victim_thd); ut_ad(victim_thd);
trx_t* victim_trx = thd_to_trx(victim_thd); trx_t* victim_trx= thd_to_trx(victim_thd);
WSREP_DEBUG("abort transaction: BF: %s victim: %s victim conf: %s", WSREP_DEBUG("abort transaction: BF: %s victim: %s victim conf: %s",
wsrep_thd_query(bf_thd), wsrep_thd_query(bf_thd),
......
...@@ -859,6 +859,11 @@ class lock_sys_t ...@@ -859,6 +859,11 @@ class lock_sys_t
@param id page to be discarded @param id page to be discarded
@param page whether to discard also from lock_sys.prdt_hash */ @param page whether to discard also from lock_sys.prdt_hash */
void prdt_page_free_from_discard(const page_id_t id, bool all= false); void prdt_page_free_from_discard(const page_id_t id, bool all= false);
#ifdef WITH_WSREP
/** Cancel possible lock waiting for a transaction */
static void cancel_lock_wait_for_trx(trx_t *trx);
#endif /* WITH_WSREP */
}; };
/** The lock system */ /** The lock system */
......
...@@ -921,8 +921,13 @@ ATTRIBUTE_COLD ATTRIBUTE_NOINLINE static void lock_wait_wsrep(trx_t *trx) ...@@ -921,8 +921,13 @@ ATTRIBUTE_COLD ATTRIBUTE_NOINLINE static void lock_wait_wsrep(trx_t *trx)
dict_table_t *table= wait_lock->un_member.tab_lock.table; dict_table_t *table= wait_lock->un_member.tab_lock.table;
for (lock_t *lock= UT_LIST_GET_FIRST(table->locks); lock; for (lock_t *lock= UT_LIST_GET_FIRST(table->locks); lock;
lock= UT_LIST_GET_NEXT(un_member.tab_lock.locks, lock)) lock= UT_LIST_GET_NEXT(un_member.tab_lock.locks, lock))
if (lock->trx != trx) /* if victim has also BF status, but has earlier seqno, we have to wait */
if (lock->trx != trx &&
!(wsrep_thd_is_BF(lock->trx->mysql_thd, false) &&
wsrep_thd_order_before(lock->trx->mysql_thd, trx->mysql_thd)))
{
victims.emplace(lock->trx); victims.emplace(lock->trx);
}
} }
else else
{ {
...@@ -936,8 +941,13 @@ ATTRIBUTE_COLD ATTRIBUTE_NOINLINE static void lock_wait_wsrep(trx_t *trx) ...@@ -936,8 +941,13 @@ ATTRIBUTE_COLD ATTRIBUTE_NOINLINE static void lock_wait_wsrep(trx_t *trx)
if (!lock_rec_get_nth_bit(lock, heap_no)) if (!lock_rec_get_nth_bit(lock, heap_no))
lock= lock_rec_get_next(heap_no, lock); lock= lock_rec_get_next(heap_no, lock);
do do
if (lock->trx != trx) /* if victim has also BF status, but has earlier seqno, we have to wait */
if (lock->trx != trx &&
!(wsrep_thd_is_BF(lock->trx->mysql_thd, false) &&
wsrep_thd_order_before(lock->trx->mysql_thd, trx->mysql_thd)))
{
victims.emplace(lock->trx); victims.emplace(lock->trx);
}
while ((lock= lock_rec_get_next(heap_no, lock))); while ((lock= lock_rec_get_next(heap_no, lock)));
} }
} }
...@@ -5362,6 +5372,21 @@ static void lock_cancel_waiting_and_release(lock_t *lock) ...@@ -5362,6 +5372,21 @@ static void lock_cancel_waiting_and_release(lock_t *lock)
lock_wait_end(trx); lock_wait_end(trx);
trx->mutex_unlock(); trx->mutex_unlock();
} }
#ifdef WITH_WSREP
void lock_sys_t::cancel_lock_wait_for_trx(trx_t *trx)
{
lock_sys.wr_lock(SRW_LOCK_CALL);
mysql_mutex_lock(&lock_sys.wait_mutex);
if (lock_t *lock= trx->lock.wait_lock)
{
/* check if victim is still waiting */
if (lock->is_waiting())
lock_cancel_waiting_and_release(lock);
}
lock_sys.wr_unlock();
mysql_mutex_unlock(&lock_sys.wait_mutex);
}
#endif /* WITH_WSREP */
/** Cancel a waiting lock request. /** Cancel a waiting lock request.
@param lock waiting lock request @param lock waiting lock request
...@@ -5758,8 +5783,15 @@ namespace Deadlock ...@@ -5758,8 +5783,15 @@ namespace Deadlock
If current_trx=false, a concurrent commit is protected by both If current_trx=false, a concurrent commit is protected by both
lock_sys.latch and lock_sys.wait_mutex. */ lock_sys.latch and lock_sys.wait_mutex. */
const undo_no_t trx_weight= TRX_WEIGHT(trx) | const undo_no_t trx_weight= TRX_WEIGHT(trx) |
(trx->mysql_thd && thd_has_edited_nontrans_tables(trx->mysql_thd) (trx->mysql_thd &&
#ifdef WITH_WSREP
(thd_has_edited_nontrans_tables(trx->mysql_thd) ||
(trx->is_wsrep() && wsrep_thd_is_BF(trx->mysql_thd, false)))
#else
thd_has_edited_nontrans_tables(trx->mysql_thd)
#endif /* WITH_WSREP */
? 1ULL << 63 : 0); ? 1ULL << 63 : 0);
trx_t *victim= nullptr; trx_t *victim= nullptr;
undo_no_t victim_weight= ~0ULL; undo_no_t victim_weight= ~0ULL;
unsigned victim_pos= 0, trx_pos= 0; unsigned victim_pos= 0, trx_pos= 0;
...@@ -5782,7 +5814,13 @@ namespace Deadlock ...@@ -5782,7 +5814,13 @@ namespace Deadlock
{ {
next= next->lock.wait_trx; next= next->lock.wait_trx;
const undo_no_t next_weight= TRX_WEIGHT(next) | const undo_no_t next_weight= TRX_WEIGHT(next) |
(next->mysql_thd && thd_has_edited_nontrans_tables(next->mysql_thd) (next->mysql_thd &&
#ifdef WITH_WSREP
(thd_has_edited_nontrans_tables(next->mysql_thd) ||
(next->is_wsrep() && wsrep_thd_is_BF(next->mysql_thd, false)))
#else
thd_has_edited_nontrans_tables(next->mysql_thd)
#endif /* WITH_WSREP */
? 1ULL << 63 : 0); ? 1ULL << 63 : 0);
if (next_weight < victim_weight) if (next_weight < victim_weight)
{ {
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment