    MDEV-7888, MDEV-7929: Parallel replication hangs sometimes on ANALYZE TABLE or DDL · 3b961347
    Kristian Nielsen authored
    The hangs occur when the group_commit_orderer object is freed before the last
    mark_start_commit() call on it - this loses the wakeup to other waiting worker
    threads, causing them to hang until killed manually.
    
    The object was freed because wakeup_subsequent_commits() was called too early
    in two places: for MDEV-7888 during ANALYZE TABLE, and for MDEV-7929 during
    record_gtid() after processing a DDL event. The group_commit_orderer object
    can be freed when its last transaction has called wait_for_prior_commit().
    
    Fix by implementing a suspend/resume mechanism for wakeup_subsequent_commits()
    that can be used in places where a transaction is committed without this being
    the commit of the actual replication event group.
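
    As an illustration only, the suspend/resume idea can be sketched as a
    counter that turns wakeup_subsequent_commits() into a no-op while it is
    non-zero. This is not the code from rpl_parallel.cc: CommitWaiter,
    CommitOrderSketch, suspend_wakeup() and resume_wakeup() are hypothetical
    names, and the real server uses locking that is omitted here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a later transaction waiting on our commit.
struct CommitWaiter {
    bool woken = false;
};

// Single-threaded sketch; the real code would protect this state with a mutex.
class CommitOrderSketch {
public:
    void register_waiter(CommitWaiter *w) { waiters_.push_back(w); }

    // Call before an intermediate commit (e.g. inside ANALYZE TABLE) that is
    // not the commit of the replication event group itself.
    void suspend_wakeup() { ++suspend_depth_; }

    // Call once the intermediate commit is done.
    void resume_wakeup() {
        assert(suspend_depth_ > 0);
        --suspend_depth_;
    }

    // Wakes later transactions unless wakeups are suspended.
    // Returns true if the wakeup was actually delivered.
    bool wakeup_subsequent_commits() {
        if (suspend_depth_ > 0)
            return false;           // too early: keep the waiters registered
        for (CommitWaiter *w : waiters_)
            w->woken = true;
        waiters_.clear();
        return true;
    }

private:
    int suspend_depth_ = 0;
    std::vector<CommitWaiter *> waiters_;
};
```

    In this sketch, a wakeup attempted between suspend_wakeup() and
    resume_wakeup() is skipped, and the waiters are released only by the
    wakeup issued at the real commit of the event group.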
    
    Also add a protection mechanism (that asserts in debug builds) which can
    prevent the too-early free and hang if other similar bugs should remain in
    other parts of the code.
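
    One way such a protection could look, again as a hedged sketch rather
    than the actual patch: count the mark_start_commit() calls still
    outstanding and defer the free until the count reaches zero. GcoSketch,
    request_release() and the field names are assumptions made for this
    example.

```cpp
#include <cassert>

// Hypothetical model of a group_commit_orderer-like object.
struct GcoSketch {
    int pending_start_commits;      // transactions that have not yet called
                                    // mark_start_commit()
    bool release_requested = false; // a free was asked for too early
    bool released = false;          // stands in for actually freeing the object

    explicit GcoSketch(int transactions)
        : pending_start_commits(transactions) {}
};

// Ask to free the object. If mark_start_commit() calls are still outstanding,
// the request is only recorded; a debug build could additionally assert here
// to flag the too-early free.
bool request_release(GcoSketch &gco) {
    if (gco.pending_start_commits > 0) {
        gco.release_requested = true;
        return false;
    }
    gco.released = true;
    return true;
}

// Called once per transaction; the last call completes a deferred release.
void mark_start_commit(GcoSketch &gco) {
    assert(gco.pending_start_commits > 0);
    --gco.pending_start_commits;
    if (gco.pending_start_commits == 0 && gco.release_requested)
        gco.released = true;
}
```

    With two transactions in the group, a release requested after the first
    mark_start_commit() is deferred and takes effect only after the second,
    so the final wakeup is never issued on a freed object.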
rpl_parallel.cc 66.5 KB