Commit e42192d7 authored by Sujatha Sivakumar's avatar Sujatha Sivakumar

MDEV-13895: GTID and Master_Delay causes excessive initial delay

Problem:
========
When attempting to delay a Slave attached with GTID, there appears to be an
extra delay applied initially. For example, this output reflects a Slave that is
already delayed by 43200 seconds. When switching to GTID replication,
replication is paused until SQL_Remaining_Delay counts down to 0:

CHANGE MASTER TO master_use_gtid=current_pos; CHANGE MASTER TO
MASTER_DELAY=43200;

Seconds_Behind_Master: 44847
Using_Gtid: Current_Pos
SQL_Delay: 43200
SQL_Remaining_Delay: 43089
Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master
executed event

Analysis:
=========
When slave initiates a GTID based connection request to master, the master sends
two GTID_LIST events.  The first one is actual GTID_LIST event and the second
one is a fake GTID_LIST event. This is sent by master to provide its current
binlary log file position. The fake GTID_LIST events will have their ev->when=0.
'when' (the timestamp) is set to 0 so that slave could distinguish between real
and fake Rotate events.

On slave side when MASTER_DELAY is configured to "X" the applier will ensure
that there is a time delay of "X" seconds before the event is applied.

General behaviour of MASTER_DELAY example:-

Master
timestamp of event e1=10
timestamp of event e2=11

On slave MASTER_DELAY=5
Event e1 will be applied at = 15
e2 will be applied at =16

In bug scenario:-

On Master: With GTIDs
timestamp of event e1=10
timestamp of event e2=0

On Slave:
e1 will be applied at = 10 + 5 =15
For e2, since "e2->when=0" e2->when is set to current timestamp.
i.e since the e2->when and current timestamp on slave is the same applier waits
for additional master_delay=5 seconds. the ev->when contributes to
"rli->last_master_timestamp".

rli->last_master_timestamp= ev->when + (time_t) ev->exec_time;

Fake events should not update the "ev->when" to "current timestamp" on slave.

Fix:
===
Remove the assignment of current timestamp to "ev->when" when "ev->when=0".
parent 05ad7fc3
include/master-slave.inc
[connection master]
CREATE TABLE t1 (i INT);
connection slave;
include/stop_slave.inc
CHANGE MASTER TO MASTER_USE_GTID= current_pos, MASTER_DELAY= 10;
include/start_slave.inc
connection master;
INSERT INTO t1 VALUES (1);
include/sync_slave_io_with_master.inc
connection slave;
"Sleeping for 15"
# Asserted this: Seconds_Behind_Master should be less than MASTER_DELAY
# Asserted this: One row shoule be found in table t1.
"======= Clean up ========"
STOP SLAVE;
CHANGE MASTER TO MASTER_USE_GTID=no, MASTER_DELAY=0;
START SLAVE;
connection master;
DROP TABLE t1;
connection slave;
connection master;
include/rpl_end.inc
# ==== Purpose ====
#
# Test verifies that when "Master_Delay" is specified on slave with GTIDS there
# will not be any extra delay initially.
#
# ==== Implementation ====
#
# Steps:
# 0 - Stop the slave and execute CHANGE MASTER command with
# master_use_gtid= curren_pos and master_delay= 10
# 1 - On slave introduce a sleep of 15 seconds and check that the
# Seconds_Behind_Master is within specified master_delay limit. It should
# not be more that "10" seconds.
#
# ==== References ====
#
# MDEV-13895: GTID and Master_Delay causes excessive initial delay
--source include/have_binlog_format_mixed.inc
--source include/master-slave.inc
CREATE TABLE t1 (i INT);
--sync_slave_with_master
--source include/stop_slave.inc
CHANGE MASTER TO MASTER_USE_GTID= current_pos, MASTER_DELAY= 10;
--source include/start_slave.inc
--connection master
INSERT INTO t1 VALUES (1);
--source include/sync_slave_io_with_master.inc
--connection slave
--let $actual_delay= query_get_value(SHOW SLAVE STATUS, SQL_Delay, 1)
--let $sleep_time= `SELECT 5 + $actual_delay`
--echo "Sleeping for $sleep_time"
--sleep $sleep_time
--let $assert_cond= [SHOW SLAVE STATUS, Seconds_Behind_Master, 1] <= 10
--let $assert_text= Seconds_Behind_Master should be less than MASTER_DELAY
--source include/rpl_assert.inc
# The row should be available in table after master_delay=20 seconds.
--let $assert_text= One row shoule be found in table t1.
--let $assert_cond= COUNT(*) = 1 FROM t1
--source include/rpl_assert.inc
--echo "======= Clean up ========"
STOP SLAVE;
CHANGE MASTER TO MASTER_USE_GTID=no, MASTER_DELAY=0;
START SLAVE;
--connection master
DROP TABLE t1;
--sync_slave_with_master
--connection master
--source include/rpl_end.inc
......@@ -3532,12 +3532,6 @@ apply_event_and_update_pos_setup(Log_event* ev, THD* thd, rpl_group_info *rgi)
thd->variables.server_id = ev->server_id;
thd->set_time(); // time the query
thd->lex->current_select= 0;
if (!ev->when)
{
my_hrtime_t hrtime= my_hrtime();
ev->when= hrtime_to_my_time(hrtime);
ev->when_sec_part= hrtime_sec_part(hrtime);
}
thd->variables.option_bits=
(thd->variables.option_bits & ~OPTION_SKIP_REPLICATION) |
(ev->flags & LOG_EVENT_SKIP_REPLICATION_F ? OPTION_SKIP_REPLICATION : 0);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment