    MDEV-11853: semisync thread can be killed after sync binlog but before ACK in the sync state · a83c7ab1
    Brandon Nesterenko authored
    Problem:
    ========
    If a primary is shut down while it is awaiting an ACK on an active
    semi-sync connection, the primary hard-kills the active
    communication thread without ensuring that the transaction was
    received by a replica. This can leave replication in an
    inconsistent state.
    
    Solution:
    ========
    During shutdown, the primary should wait for an ACK or a timeout
    before hard-killing a thread that is awaiting one. We extend the
    `SHUTDOWN WAIT FOR SLAVES` logic so that phase 1 identifies and
    skips any threads waiting for a semi-sync ACK. Then, before the
    ACK receiver thread is stopped, the shutdown is delayed until every
    waiting semi-sync connection has received an ACK or timed out.
    Those connections are then killed in phase 2.
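    The wait-then-kill sequence can be sketched as a small C++ model.
    This is a hypothetical illustration, not MariaDB's implementation:
    `Connection`, `AckWaitTracker`, and `shutdown_connections` are
    invented names, the per-connection ACK waits are reduced to one
    counter guarded by a mutex and condition variable, and
    `ack_timeout` stands in for the semi-sync timeout.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical model of one connection on the primary.
struct Connection {
  bool awaiting_semisync_ack = false;  // blocked waiting for a replica ACK
  bool killed = false;
};

// Hypothetical counter of connections still waiting for a semi-sync ACK.
class AckWaitTracker {
 public:
  void begin_wait() {
    std::lock_guard<std::mutex> lock(mu_);
    ++waiting_;
  }

  // Called when a waiter's ACK arrives or its own wait times out.
  void end_wait() {
    std::lock_guard<std::mutex> lock(mu_);
    assert(waiting_ > 0);
    if (--waiting_ == 0) cv_.notify_all();
  }

  // Blocks until no connection is waiting or `timeout` elapses.
  // Returns true if every waiter finished in time.
  bool wait_all(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [this] { return waiting_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::size_t waiting_ = 0;
};

// Two-phase shutdown; returns how many connections were killed in phase 2.
std::size_t shutdown_connections(std::vector<Connection>& conns,
                                 AckWaitTracker& tracker,
                                 std::chrono::milliseconds ack_timeout) {
  // Phase 1: kill every connection that is not waiting for an ACK.
  for (Connection& c : conns)
    if (!c.awaiting_semisync_ack) c.killed = true;

  // Delay shutdown until all ACK waiters finish or the timeout expires.
  tracker.wait_all(ack_timeout);

  // Phase 2: kill whatever is left, whether or not the wait timed out.
  std::size_t killed_in_phase2 = 0;
  for (Connection& c : conns) {
    if (!c.killed) {
      c.killed = true;
      ++killed_in_phase2;
    }
  }
  return killed_in_phase2;
}
```

    In this model the delay only bounds how long shutdown waits; it does
    not guarantee that every ACK arrives, so phase 2 still kills any
    connection whose wait timed out.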
    
    Notes:
     1) An unresolved corner case still affects this patch:
    MDEV-28141: Slave crashes with Packets out of order when
    connecting to a shutting down master. Specifically, if a slave is
    connecting to a master that is actively shutting down, the slave
    can crash with a "Packets out of order" assertion error. To work
    around this issue in the MTR tests, the primary waits briefly
    before killing threads in phase 1, which lets any connected
    replicas stop safely.
     2) This patch also fixes MDEV-28114: Semi-sync Master ACK Receiver
    Thread Can Error on COM_QUIT
    
    Reviewed By
    ============
    Andrei Elkin <andrei.elkin@mariadb.com>