1. 15 Jan, 2016 1 commit
    • Fix error handling for GTID and domain-based parallel replication · 06b2e327
      Kristian Nielsen authored
      The problem occurs when replication stops with an error, domain-based
      parallel replication is used, and the GTID position contains more than one
      domain. It further requires that the SQL thread is restarted without first
      stopping the IO thread.
      
      In this case, the file/offset relay-log position does not correctly
      represent the slave's multi-dimensional position, because other domains may
      be far ahead of, or behind, the domain with the failing event. So the code
      reverts the relay log position back to the start of a relay log file that is
      known to be before all active domains.
      
      The bug was that when the SQL thread was restarted,
      rli->relay_log_state was incorrectly initialised from @@gtid_slave_pos. This
      position will likely be too far ahead, due to reverting the relay log
      position. Thus, if the replication fails again after the SQL thread restart,
      the rli->restart_gtid_pos might be updated incorrectly. This in turn would
      cause a second SQL thread restart to replicate from the wrong position, if
      the IO thread was still left running.
      
      The fix is to initialise rli->relay_log_state from @@gtid_slave_pos only
      when we actually purge and re-fetch relay logs from the master, not at every
      SQL thread start.
      
      A related problem is the use of sql_slave_skip_counter to resolve
      replication failures in this kind of scenario. Since the slave position is
      multi-dimensional, sql_slave_skip_counter cannot work properly: it is
      indeterminate exactly which event would be skipped, so it is unlikely to
      work as the user expects. So make this an error in the case where
      domain-based parallel replication is used with multiple domains, suggesting
      instead that the user set @@gtid_slave_pos to reliably skip the desired event.
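
To illustrate why a GTID position identifies the event to skip unambiguously where a single counter does not, here is a minimal Python sketch. It assumes the textual MariaDB GTID form domain-server-seq, with one entry per domain separated by commas. The helper names (parse_gtid_pos, format_gtid_pos, skip_event) are hypothetical and are not part of the server; the sketch only computes the string a user might then assign to @@gtid_slave_pos.

```python
def parse_gtid_pos(pos):
    """Parse 'domain-server-seq,...' into {domain_id: (server_id, seq_no)}."""
    state = {}
    for gtid in pos.split(","):
        domain, server, seq = (int(part) for part in gtid.strip().split("-"))
        state[domain] = (server, seq)
    return state


def format_gtid_pos(state):
    """Render the per-domain state back into the comma-separated text form."""
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(state.items()))


def skip_event(pos, failing_gtid):
    """Return a position that treats failing_gtid as already applied.

    Only the domain of the failing event moves; every other domain keeps
    its own position. This is the sense in which the slave position is
    multi-dimensional (hypothetical helper, for illustration only).
    """
    state = parse_gtid_pos(pos)
    domain, server, seq = (int(part) for part in failing_gtid.split("-"))
    state[domain] = (server, seq)
    return format_gtid_pos(state)


if __name__ == "__main__":
    # Domain 1 failed on event 1-2-201; domain 0 is left untouched.
    print(skip_event("0-1-100,1-2-200", "1-2-201"))
```

A counter such as sql_slave_skip_counter has no domain component, so it cannot say which of the two streams should advance; the GTID names the domain explicitly.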
  2. 05 Aug, 2015 3 commits
  3. 04 Aug, 2015 8 commits
    • Merge branch 'bb-10.0-jan' into 10.0 · 1610c428
      Sergei Golubchik authored
      5.5 with our InnoDB changes
    • correct the NULL-pointer test · fa51f70d
      Sergei Golubchik authored
    • after-merge fixes · 006ffca5
      Sergei Golubchik authored
    • Merge fix of embedded server build. · d6d54584
      Kristian Nielsen authored
    • Fix embedded server build · 5ca061e6
      Kristian Nielsen authored
    • Merge MDEV-8302 into 10.0 · e8e2ef47
      Kristian Nielsen authored
    • MDEV-8302: Duplicate key with parallel replication · 9b9c5e89
      Kristian Nielsen authored
      This bug is essentially another variant of MDEV-7458.
      
      If a transaction conflict caused a deadlock kill of T2 in record_gtid()
      during commit, the code would do a rollback _before_ running
      rgi->unmark_start_commit(). This creates a race where following transactions
      could start too early (before T2 has completed its transaction retry). This
      in turn could lead to replication failure if there was a conflict that
      caused, e.g., a duplicate key error or similar.
      
      The fix is to remove these rollbacks (in Query_log_event::do_apply_event()
      and Xid_log_event::do_apply_event()). They seem out of place; code in
      log_event.cc generally does not roll back on error, as this is handled
      higher up.
      
      In addition, because of the extreme difficulty of reproducing bugs like
      MDEV-7458 and MDEV-8302, this patch adds some extra precautions to try to
      detect (in debug builds) or prevent (in release builds) similar bugs.
      ha_rollback_trans() will now call unmark_start_commit() if needed (and
      assert in debug build when a caller does rollback without unmark first).
      
      We also add an extra check for thd->killed() so that we avoid doing
      mark_start_commit() if we already have a pending deadlock kill.
      
      And we add a missing unmark_start_commit() call in the error case, found by
      the above assertion.
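
The ordering rules described above can be modelled with a small Python sketch. This is a toy model under stated assumptions, not the server's C++ code: the class GroupInfo, its did_mark_start_commit flag, the killed and debug parameters, and the function rollback_trans are hypothetical stand-ins for rgi, thd->killed(), the debug/release build distinction, and ha_rollback_trans().

```python
class GroupInfo:
    """Toy stand-in for the per-transaction replication state (rgi)."""

    def __init__(self):
        self.did_mark_start_commit = False

    def mark_start_commit(self, killed=False):
        # Do not mark if a deadlock kill is already pending; following
        # transactions must not be released before the retry has run.
        if killed:
            return False
        self.did_mark_start_commit = True
        return True

    def unmark_start_commit(self):
        self.did_mark_start_commit = False


def rollback_trans(rgi, debug=False):
    """Roll back, making sure the start-commit mark is cleared first.

    With debug=True a caller that forgot to unmark is reported loudly
    (modelling the debug-build assertion); otherwise the mark is cleared
    here as a safety net (modelling the release-build behaviour).
    """
    if rgi.did_mark_start_commit:
        if debug:
            raise AssertionError("rollback without unmark_start_commit()")
        rgi.unmark_start_commit()
    # ... the actual rollback work would happen here ...
    return True


if __name__ == "__main__":
    rgi = GroupInfo()
    rgi.mark_start_commit()
    rollback_trans(rgi)               # release-style: unmarks silently
    print(rgi.did_mark_start_commit)
```

The point of the guard is that the mark is always cleared before the rollback completes, so later transactions cannot observe a transaction as having started its commit when it is about to be retried.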
    • Fix merge error. · d71b5840
      Jan Lindström authored
  4. 03 Aug, 2015 12 commits
  5. 01 Aug, 2015 7 commits
  6. 31 Jul, 2015 9 commits