    MDEV-13895: GTID and Master_Delay causes excessive initial delay · e42192d7
    Sujatha Sivakumar authored
    Problem:
    ========
    When a delay is configured on a slave that connects using GTID, an extra
    delay is applied initially. For example, the output below comes from a slave
    that is already delayed by 43200 seconds. After switching to GTID
    replication, replication pauses until SQL_Remaining_Delay counts down to 0:
    
    CHANGE MASTER TO master_use_gtid=current_pos;
    CHANGE MASTER TO MASTER_DELAY=43200;
    
    Seconds_Behind_Master: 44847
    Using_Gtid: Current_Pos
    SQL_Delay: 43200
    SQL_Remaining_Delay: 43089
    Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event
    
    Analysis:
    =========
    When the slave initiates a GTID-based connection request to the master, the
    master sends two GTID_LIST events. The first is the actual GTID_LIST event
    and the second is a fake GTID_LIST event, which the master sends to provide
    its current binary log file position. Fake GTID_LIST events have ev->when=0.
    'when' (the timestamp) is set to 0 so that the slave can distinguish between
    real and fake Rotate events.
    
    On the slave side, when MASTER_DELAY is set to "X", the applier ensures that
    an event is applied no earlier than "X" seconds after its master timestamp.
    
    Example of general MASTER_DELAY behaviour:

    On master:
    timestamp of event e1 = 10
    timestamp of event e2 = 11

    On slave: MASTER_DELAY=5
    e1 will be applied at = 10 + 5 = 15
    e2 will be applied at = 11 + 5 = 16
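
The arithmetic in this example can be written as a small sketch. The helper
names `earliest_apply_time` and `remaining_delay` are hypothetical and do not
come from the server source; they only model "apply no earlier than master
timestamp plus MASTER_DELAY".

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper (not a server function): earliest time at which the
// applier may execute an event that the master stamped with 'when'.
std::time_t earliest_apply_time(std::time_t when, std::time_t master_delay)
{
  assert(master_delay >= 0);  // this sketch assumes a non-negative delay
  return when + master_delay;
}

// Seconds the applier still has to wait at slave time 'now'; 0 once due.
std::time_t remaining_delay(std::time_t when, std::time_t master_delay,
                            std::time_t now)
{
  const std::time_t due= earliest_apply_time(when, master_delay);
  return due > now ? due - now : 0;
}
```

With the values above, `earliest_apply_time(10, 5)` gives 15 for e1 and
`earliest_apply_time(11, 5)` gives 16 for e2.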
    
    In bug scenario:-
    
    On Master: With GTIDs
    timestamp of event e1=10
    timestamp of event e2=0
    
    On Slave:
    e1 will be applied at = 10 + 5 =15
    For e2, since "e2->when=0", e2->when is set to the current timestamp on the
    slave. Because e2->when and the slave's current timestamp are then the same,
    the applier waits for an additional master_delay=5 seconds. ev->when also
    contributes to "rli->last_master_timestamp":
    
    rli->last_master_timestamp= ev->when + (time_t) ev->exec_time;
    
    For fake events, "ev->when" should not be updated to the current timestamp
    on the slave.
    
    Fix:
    ====
    Remove the assignment of the current timestamp to "ev->when" when
    "ev->when=0".
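
The effect of removing that assignment can be illustrated with a simplified
model of the delay check. This is only a sketch under stated assumptions:
`Event`, `wait_before_fix` and `wait_after_fix` are hypothetical names, and the
real applier code contains more logic than is shown here.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for a replication event; only the timestamp matters.
struct Event
{
  std::time_t when;  // master timestamp, 0 for a fake event
};

// Models the behaviour described in the analysis: a zero timestamp is
// replaced by the slave's current time before the delay is computed.
std::time_t wait_before_fix(Event ev, std::time_t now, std::time_t master_delay)
{
  assert(master_delay >= 0);
  if (ev.when == 0)
    ev.when= now;  // the assignment that the fix removes
  const std::time_t due= ev.when + master_delay;
  return due > now ? due - now : 0;
}

// Models the behaviour after the fix: ev.when is left untouched, so a fake
// event with when=0 is already past its due time and adds no wait.
std::time_t wait_after_fix(Event ev, std::time_t now, std::time_t master_delay)
{
  assert(master_delay >= 0);
  const std::time_t due= ev.when + master_delay;
  return due > now ? due - now : 0;
}
```

For a fake event (when=0) seen at slave time 1000 with MASTER_DELAY=5, the
first function returns a wait of 5 seconds and the second returns 0. An event
with a real timestamp gets the same wait from both functions.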
rpl_gtid_excess_initial_delay.test 1.7 KB