    MDEV-24966 Galera multi-master regression · a1e70388
    sjaakola authored
    After the merge of MDEV-24915, the 10.6 branch has regressions in handling
    concurrent write load against two or more cluster nodes. These regressions may
    surface as cluster hangs, node crashes or data inconsistency. In some test
    scenarios, the only visible symptom may be that BF victim aborting happens
    only through innodb lock wait timeout expiration. This results only in poor
    performance (by default a 50 sec hang for each BF conflict) and can be
    somewhat difficult to diagnose.
    
    This pull request has the following fixes to handle concurrent write load
    from multiple nodes:
    
    In lock_wait_wsrep_kill(), the victim trx was expected to be only in
    TRX_STATE_ACTIVE state. With the delayed BF conflict handling, the victim
    may already have advanced into the pre-commit state. This is fixed by
    accepting victims in both TRX_STATE_ACTIVE and TRX_STATE_PREPARED states.
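
    The widened victim check can be illustrated with a minimal sketch. The
    enum below is a simplified stand-in for InnoDB's transaction states, and
    is_bf_abort_candidate() is a hypothetical helper name, not a function in
    the server code.

```cpp
#include <cassert>

// Simplified stand-in for InnoDB's transaction state enum (assumption:
// the real enum has more members and different spelling).
enum class TrxState { NotStarted, Active, Prepared, CommittedInMemory };

// Hypothetical helper: true if a transaction in this state may be
// selected as a BF abort victim. Before the fix only Active qualified;
// after the fix Prepared qualifies as well.
constexpr bool is_bf_abort_candidate(TrxState s) {
    return s == TrxState::Active || s == TrxState::Prepared;
}
```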
    
    The victim transaction may be in several different states when the lock
    conflict is detected, and due to the delayed BF aborting introduced in
    MDEV-24915, the victim may advance further before the actual BF abort takes
    place. The BF aborting in MDEV-24915 did not wake the victim if it was
    waiting for some lock other than the one blocking the high priority thread.
    This anomaly caused the innodb lock wait timeout expiration delays and the
    poor performance symptom. To fix this, lock_wait_wsrep_kill() now checks
    whether the victim is in lock waiting state, and uses
    lock_cancel_waiting_and_release() to cancel this lock wait.
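
    A rough model of the wake-up logic, assuming a much simplified victim
    object. VictimTrx and bf_abort_and_wake() are hypothetical names used only
    for illustration; the assignment that clears the wait flag stands in for
    the call to lock_cancel_waiting_and_release().

```cpp
#include <cassert>

// Hypothetical, simplified model of a victim transaction.
struct VictimTrx {
    bool waiting_for_lock = false;  // parked in some lock wait
    bool marked_aborted = false;    // BF abort has been requested
};

// Sketch: mark the victim as aborted and, if it is parked in any lock
// wait (not necessarily the one blocking the BF thread), cancel that
// wait so the victim wakes up at once instead of at lock wait timeout.
// Returns true if a lock wait was cancelled.
inline bool bf_abort_and_wake(VictimTrx& victim) {
    victim.marked_aborted = true;
    if (!victim.waiting_for_lock) {
        return false;
    }
    // Stand-in for lock_cancel_waiting_and_release() in the real code.
    victim.waiting_for_lock = false;
    return true;
}
```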
    
    wsrep_bf_abort() checks whether the victim has an active transaction (in
    wsrep-lib), and starts a new transaction if there was none. Due to late BF
    aborting, the victim may have e.g. failed in certification and is already
    aborting or has aborted at this stage. This has caused problems in testing,
    where the BF aborter tries to BF abort itself. wsrep_bf_abort() now skips
    the BF abort if the victim is aborting or has aborted. The victim may not
    have started a transaction yet in wsrep context, but it may have acquired
    MDL locks (due to DDL execution), and this has caused the BF conflict. Such
    a case does not require aborting in wsrep or replication provider state.
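
    The skip condition could look roughly like this. WsrepTxState is a reduced,
    hypothetical version of the transaction states tracked by wsrep-lib, and
    should_skip_bf_abort() is an illustrative name only.

```cpp
#include <cassert>

// Reduced, hypothetical set of wsrep transaction states (assumption:
// wsrep-lib tracks more states than shown here).
enum class WsrepTxState { Executing, Certifying, MustAbort, Aborting, Aborted };

// Sketch: a victim that is already aborting or has aborted needs no
// further BF abort, so the abort request is skipped for it.
constexpr bool should_skip_bf_abort(WsrepTxState s) {
    return s == WsrepTxState::Aborting || s == WsrepTxState::Aborted;
}
```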
    
    BF aborting could cause a BF-BF conflict scenario if the victim was already
    aborted and had changed to a replayer, which has high priority as well.
    This BF-BF conflict scenario is now avoided in lock_wait_wsrep(), which
    checks whether the blocking lock holder is also high priority and is
    ordered before the caller. In that case the caller should wait for the lock.
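
    The ordering check can be sketched as a predicate over two transactions.
    BfTrx, its seqno field and must_wait_for_holder() are hypothetical; the
    sketch assumes that a lower sequence number means "ordered before" and
    says nothing about how other combinations are resolved.

```cpp
#include <cassert>

// Hypothetical view of a transaction for ordering purposes.
struct BfTrx {
    bool high_priority;  // true for BF (applier or replayer) threads
    long long seqno;     // assumed global order; lower is earlier
};

// Sketch: the lock requester must wait, rather than abort the holder,
// when the holder is itself high priority and is ordered before it.
constexpr bool must_wait_for_holder(const BfTrx& requester,
                                    const BfTrx& holder) {
    return holder.high_priority && holder.seqno < requester.seqno;
}
```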
    
    The natural innodb deadlock resolution algorithm could pick a BF thread as
    the deadlock victim. This is fixed by giving max weight to BF threads in
    Deadlock::report().
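
    As an illustration of the weighting idea, the sketch below picks the
    lowest-weight transaction in a deadlock cycle as the victim and treats BF
    threads as having the maximum possible weight. CycleTrx,
    effective_weight() and pick_victim() are hypothetical names, and the
    actual weight computation in InnoDB is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical participant in a deadlock cycle.
struct CycleTrx {
    int id;
    std::uint64_t weight;  // assumed cost of rolling this trx back
    bool is_bf;            // high priority (brute force) thread
};

// BF threads get the maximum weight so they are chosen last.
inline std::uint64_t effective_weight(const CycleTrx& t) {
    return t.is_bf ? std::numeric_limits<std::uint64_t>::max() : t.weight;
}

// Returns the id of the transaction with the lowest effective weight.
// Precondition: cycle is not empty.
inline int pick_victim(const std::vector<CycleTrx>& cycle) {
    auto it = std::min_element(
        cycle.begin(), cycle.end(),
        [](const CycleTrx& a, const CycleTrx& b) {
            return effective_weight(a) < effective_weight(b);
        });
    return it->id;
}
```

    With this weighting, a BF thread is picked only if every transaction in
    the cycle is a BF thread.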
    
    MDEV-24341 has changed execution paths in do_command(), and this affects
    the execution of a BF aborted victim. This PR fixes one assert in
    do_command():
     DBUG_ASSERT(!thd->async_state.pending_ops())
    which fired if the thd was BF aborted earlier. The assert is now changed
    to allow pending_ops() if the thd was BF aborted before.
    
    With these fixes, a long-term, highly conflicting write load can be run
    against a two node cluster. If binlogging is configured, log_slave_updates
    should also be set.