Commit fa804497 authored by Brandon Nesterenko

MDEV-34274: Test rpl.rpl_change_master_demote frequently fails on buildbot with "IO thread should not be running..."

Note this is a backport of 8c8b3ab7
from 11.1.

The test rpl.rpl_change_master_demote used a `sleep 1` command
to give START SLAVE UNTIL time to start the slave threads and
for those threads to stop automatically once the UNTIL
condition was reached.  On heavily loaded machines (especially
the MSAN buildbot builders), one second was not enough, and the
test would fail because the IO thread was still running.

This patch fixes the test by replacing the sleep with specific
conditions to wait for. The test cannot wait for the IO or SQL
threads to start, because they could be started and stopped
before the MTR executor gets to check the slave status.
Instead, we test for proof that they existed: the Connections
status variable must have been incremented by at least 2
(Connections just shows the global thread id). At this point,
we still can't use the wait_for_slave_to_stop helper, as the
Slave_SQL_Running/Slave_IO_Running fields of SHOW SLAVE STATUS
may not be updated yet. We use information_schema.processlist
instead, which shows whether the Slave_SQL/IO threads are
present. So to "wait for the slave to stop", we wait for the
Slave_SQL/IO threads to be gone from the processlist.
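
The race described above can be illustrated with a small
simulation. This is a minimal sketch in Python, not the MTR
implementation: `wait_until` and `FakeServer` are hypothetical
names invented for illustration, and the fake server only models
a monotonic counter (standing in for Connections) and a set of
live threads (standing in for the processlist). It assumes the
worst case, where both threads start and exit before the test
gets a chance to look.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


class FakeServer:
    """Hypothetical stand-in for a replica; not a MariaDB API."""

    def __init__(self):
        self.connections = 10     # only ever grows, like Connections
        self.processlist = set()  # names of currently live threads

    def start_slave_until(self):
        # Worst case: both threads start and stop before anyone polls.
        for name in ("Slave_IO", "Slave_SQL"):
            self.connections += 1
            self.processlist.add(name)
        self.processlist.clear()


server = FakeServer()
before = server.connections
server.start_slave_until()

# Waiting to observe a running thread can miss it entirely.
saw_running = wait_until(lambda: "Slave_IO" in server.processlist,
                         timeout=0.05)
# The counter keeps the evidence that two threads were created.
saw_started = wait_until(lambda: server.connections >= before + 2,
                         timeout=0.05)
# Once the start is proven, absence from the list means "stopped".
saw_stopped = wait_until(
    lambda: not server.processlist & {"Slave_IO", "Slave_SQL"},
    timeout=0.05)
```

In this sketch the first wait times out even though the threads
did run, while the counter check and the absence check both
succeed, which is the ordering the patch relies on.
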
parent 579450c2
@@ -500,6 +500,9 @@ START SLAVE UNTIL master_gtid_pos="ssu_middle_binlog_pos";
Warnings:
Note 1278 It is recommended to use --skip-slave-start when doing step-by-step replication with START SLAVE UNTIL; otherwise, you will get problems if you get an unexpected slave's mariadbd restart
# Slave needs time to start and stop automatically
# Waiting for both SQL and IO threads to have started..
# Waiting for SQL thread to be killed..
# Waiting for IO thread to be killed..
# Validating neither SQL nor IO threads are running..
# ..success
# Clean slave state of master
@@ -276,14 +276,27 @@ SELECT VARIABLE_NAME, GLOBAL_VALUE FROM INFORMATION_SCHEMA.SYSTEM_VARIABLES WHER
--echo # binlog position and should still succeed despite the SSU stop
--echo # position pointing to a previous event (because
--echo # master_demote_to_slave=1 merges gtid_binlog_pos into gtid_slave_pos).
--let $pre_start_slave_thread_count= query_get_value(SHOW STATUS LIKE 'Connections', Value, 1)
--replace_result $ssu_middle_binlog_pos ssu_middle_binlog_pos
eval START SLAVE UNTIL master_gtid_pos="$ssu_middle_binlog_pos";
--echo # Slave needs time to start and stop automatically
# Note sync_with_master_gtid.inc, wait_for_slave_to_start.inc, and
# wait_for_slave_to_stop.inc won't work due to replication state and race
# conditions
--sleep 1
--echo # Waiting for both SQL and IO threads to have started..
--let $expected_cons_after_start_slave= `SELECT ($pre_start_slave_thread_count + 2)`
--let $status_var= Connections
--let $status_var_value= $expected_cons_after_start_slave
--let $status_var_comparsion= >=
--source include/wait_for_status_var.inc
--let $status_var_comparsion=
--echo # Waiting for SQL thread to be killed..
--let $wait_condition= SELECT count(*)=0 from information_schema.PROCESSLIST where COMMAND="Slave_SQL"
--source include/wait_condition.inc
--echo # Waiting for IO thread to be killed..
--let $wait_condition= SELECT count(*)=0 from information_schema.PROCESSLIST where COMMAND="Slave_IO"
--source include/wait_condition.inc
--echo # Validating neither SQL nor IO threads are running..
--let $io_state= query_get_value("SHOW SLAVE STATUS", Slave_IO_State, 1)