    MDEV-31448: Killing a replica thread awaiting its GCO can hang/crash a parallel replica · a8ea6627
    Kristian Nielsen authored
    The problem was an incorrect unmark_start_commit() in
    signal_error_to_sql_driver_thread(). If an event group gets an error, this
    unmark could run after the following GCO started, and the subsequent
    re-marking could access a de-allocated GCO.
    
    The offending unmark_start_commit() looks obviously incorrect, and the fix
    is to just remove it. It was introduced in the MDEV-8302 patch, the commit
    message of which suggests it was added there solely to satisfy an assertion
    in ha_rollback_trans(). So instead, update this assertion so that it does
    not trigger for event groups that hit an error (rgi->worker_error). When an error
    occurs in an event group, all following event groups are skipped anyway, so
    the unmark should never be needed in this case.
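
The relaxed assertion can be illustrated with a small standalone model. This is only a sketch and not the actual server code: `RgiModel` and `rollback_precondition_holds()` are hypothetical names, and the `did_mark_start_commit` field is an assumption about how the "marked start-commit" state is tracked. Only `worker_error` is named in the message above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the per-event-group replication
// state (rgi). Only the two pieces of state relevant here are modeled.
struct RgiModel {
  bool did_mark_start_commit = false;  // assumed flag: group has marked start-commit
  int worker_error = 0;                // non-zero once the event group hit an error
};

// Returns true when the relaxed rollback precondition holds:
//  - the thread is not a replication worker (rgi == nullptr), or
//  - the group is not currently marked as having started commit, or
//  - the group failed, in which case all following groups are skipped
//    anyway and no unmark is needed.
bool rollback_precondition_holds(const RgiModel *rgi) {
  if (rgi == nullptr) return true;
  return !rgi->did_mark_start_commit || rgi->worker_error != 0;
}
```

Under this model, a group that is still marked and has no error violates the check, while the same group with a non-zero `worker_error` passes it, so the unmark in the error path is no longer needed just to keep the assertion quiet.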

    Reviewed-by: Andrei Elkin <andrei.elkin@mariadb.com>
    Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>
rpl_parallel.h 13.5 KB