  1. 11 Jul, 2014 3 commits
    • MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel... · 501c56ef
      Kristian Nielsen authored
      MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail.
      
      Merge the patches into MariaDB 10.0 main.
      
      With this patch, parallel replication will now automatically retry a
      transaction that fails due to deadlock or other temporary error, same as
      single-threaded replication.
      
      We catch deadlocks with InnoDB transactions due to enforced commit order. If
      T1 must commit before T2 in parallel replication and T1 ends up waiting for T2
      inside InnoDB, we kill T2 and retry it later to resolve the deadlock
      automatically.
      
      501c56ef
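
The retry behaviour described in this commit message can be illustrated with a minimal sketch. This is not MariaDB's code: `TemporaryError`, `apply_with_retry`, `make_flaky` and the `max_retries` parameter are hypothetical names chosen for illustration, and a real applier would also roll back and wait for earlier transactions before retrying.

```python
class TemporaryError(Exception):
    """Hypothetical stand-in for a deadlock or other temporary error."""


def apply_with_retry(apply_transaction, max_retries=10):
    """Call apply_transaction(), retrying when it raises TemporaryError.

    Returns the number of attempts that were needed. After max_retries
    retries the last TemporaryError is re-raised to the caller.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            apply_transaction()
            return attempts
        except TemporaryError:
            if attempts > max_retries:
                raise
            # A real applier would roll back here before trying again.


def make_flaky(failures):
    """Build a fake transaction that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def txn():
        if state["left"] > 0:
            state["left"] -= 1
            raise TemporaryError("chosen as deadlock victim")

    return txn
```

For example, `apply_with_retry(make_flaky(2))` succeeds on the third attempt.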
    • Fix test failure seen in buildbot on power8. · fd0abeca
      Kristian Nielsen authored
      GTID order in @@gtid_binlog_pos depends on internal hash order,
      so the test requires --replace_result for stable output.
      fd0abeca
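
To see why the raw value is unstable, consider a small sketch that compares two GTID position strings without regard to order. The commit itself uses `--replace_result` in the test; the helper below (`normalize_gtid_pos`, a hypothetical name) only illustrates the underlying issue, assuming the usual `domain-server-sequence` GTID text format.

```python
def normalize_gtid_pos(pos):
    """Return the GTIDs of a comma-separated position string, sorted.

    Each GTID is assumed to look like "domain-server-sequence"; sorting
    numerically makes two positions comparable regardless of the order
    in which the server happened to list them.
    """
    gtids = [g.strip() for g in pos.split(",") if g.strip()]
    return sorted(gtids, key=lambda g: tuple(int(part) for part in g.split("-")))
```

With this, `"1-2-5,0-1-3"` and `"0-1-3,1-2-5"` normalize to the same list.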
    • MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel... · e81ecc9c
      Kristian Nielsen authored
      MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail.
      
      Fix a bug discovered in Buildbot valgrind. The logic for checking slave
      init thread completion was reversed, so, depending on thread scheduling,
      server startup could hang.
      
      Also add another variant of SSL valgrind suppression, needed for different
      library version.
      
      e81ecc9c
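
A reversed completion check of this kind is easy to picture with a condition variable. The sketch below is a generic illustration, not the server's code; `InitThreadState` and its methods are hypothetical names.

```python
import threading


class InitThreadState:
    """Hypothetical start-up hand-off guarded by its own lock and condition."""

    def __init__(self):
        self.cond = threading.Condition()
        self.done = False

    def mark_done(self):
        # Called by the init thread when it has finished.
        with self.cond:
            self.done = True
            self.cond.notify_all()

    def wait_done(self):
        # Correct predicate: keep waiting WHILE the thread is NOT done.
        # With the test reversed (`while self.done`), the waiter would
        # return too early if it ran first, or block forever if the init
        # thread had already finished, depending on scheduling.
        with self.cond:
            while not self.done:
                self.cond.wait()
```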
  2. 10 Jul, 2014 4 commits
  3. 09 Jul, 2014 1 commit
  4. 08 Jul, 2014 3 commits
    • Fix small merge errors after rebase · ba4e56d8
      Kristian Nielsen authored
      ba4e56d8
    • MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel... · 92577cc0
      Kristian Nielsen authored
      MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail.
      
      Fix small (but nasty) typo.
      92577cc0
    • MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel... · 98fc5b3a
      Kristian Nielsen authored
      MDEV-5262, MDEV-5914, MDEV-5941, MDEV-6020: Deadlocks during parallel replication causing replication to fail.
      
      After-review changes.
      
      For this patch in 10.0, we do not introduce a new public storage engine API,
      we just fix the InnoDB/XtraDB issues. In 10.1, we will make a better public
      API that can be used for all storage engines (MDEV-6429).
      
      Eliminate the background thread that did deadlock kills asynchronously.
      Instead, we ensure that the InnoDB/XtraDB code can handle doing the kill from
      inside the deadlock detection code (when thd_report_wait_for() needs to kill a
      later thread to resolve a deadlock).
      
      (We preserve the part of the original patch that introduces dedicated mutex
      and condition for the slave init thread, to remove the abuse of
      LOCK_thread_count for start/stop synchronisation of the slave init thread).
      
      98fc5b3a
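
The decision made when a wait is reported can be sketched as follows. This is only an illustration of the rule stated in these commits (kill the later transaction when an earlier one would wait for it); `report_wait_for_sketch`, the integer commit-order numbers and the `kill` callback are assumptions, not the actual `thd_report_wait_for()` interface.

```python
def report_wait_for_sketch(waiter_order, holder_order, kill):
    """Decide what to do when one transaction is about to wait for another.

    waiter_order / holder_order are hypothetical commit sequence numbers
    (lower must commit first). If the waiter has to commit before the
    lock holder, waiting could never end, so the holder is killed
    synchronously via the `kill` callback and will be retried later.
    Returns True if a kill was issued.
    """
    if waiter_order < holder_order:
        kill(holder_order)
        return True
    # The holder commits first anyway, so an ordinary lock wait is fine.
    return False
```

Calling the kill directly from the reporting path, rather than handing it to a background thread, mirrors the change this commit describes.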
  5. 04 Jul, 2014 1 commit
  6. 25 Jun, 2014 1 commit
    • MDEV-4937: sql_slave_skip_counter does not work with GTID · 9150a0c7
      Kristian Nielsen authored
      The sql_slave_skip_counter is important to be able to recover replication from
      certain errors. Often, an appropriate solution is to set
      sql_slave_skip_counter to skip over a problem event. But setting
      sql_slave_skip_counter produced an error in GTID mode, with a suggestion to
      instead set @@gtid_slave_pos to point past the problem event. This however is
      not always possible; for example, in case of an INCIDENT event, that event
      does not have any GTID to assign to @@gtid_slave_pos.
      
      With this patch, sql_slave_skip_counter now works in GTID mode the same way as
      in non-GTID mode. When set, that many initial events are skipped when the SQL
      thread starts, plus as many extra events as are needed to completely skip any
      partially skipped event group. The GTID position is updated to point past the
      skipped event(s).
      9150a0c7
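
The skipping rule in the last paragraph of this message can be sketched in a few lines. The event representation (a list of `(group_id, name)` tuples) and the function name `skip_events` are assumptions made for illustration only.

```python
def skip_events(events, skip_counter):
    """Return the events left after skipping, honouring group boundaries.

    events is a list of (group_id, name) tuples in replication order.
    The first skip_counter events are dropped; if that stops in the
    middle of an event group, the rest of that group is dropped too.
    """
    i = min(skip_counter, len(events))
    if 0 < i < len(events):
        last_skipped_group = events[i - 1][0]
        # Finish skipping the group that was only partially skipped.
        while i < len(events) and events[i][0] == last_skipped_group:
            i += 1
    return events[i:]
```

For a stream with a three-event group followed by a two-event group, a counter of 1 skips the whole first group and leaves the second intact.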
  7. 09 Jul, 2014 4 commits
  8. 08 Jul, 2014 8 commits
  9. 07 Jul, 2014 1 commit
    • MDEV-6120: When slave stops with error, error message should indicate the failing GTID · 2b4b857d
      Kristian Nielsen authored
      Follow-up patch. The original patch added an extra argument to the
      rli->report() function, but a few calls were not adjusted
      accordingly.
      
      This patch updates the remaining calls as needed. In files log_event_old.cc
      and rpl_record_old.cc, it just adds NULL, since this is only for old event
      formats from ancient master servers, which would not have any GTID information
      to add to the error messages in any case.
      2b4b857d
  10. 04 Jul, 2014 2 commits
  11. 03 Jul, 2014 1 commit
  12. 30 Jun, 2014 2 commits
    • MDEV-6073 Merge gis test cases from 5.6. · 80a02037
      Alexey Botchkov authored
      Tests were merged.
      As the implementation is different, the 'internal debugging' part
      was not merged; only a stub was created for it.
      80a02037
    • Fix test failures in rpl.rpl_checksum and rpl.rpl_gtid_errorlog. · 439f75f8
      Kristian Nielsen authored
      These tests use search_pattern_in_file.inc to search the error log for
      expected output. However, search_pattern_in_file.inc by default searched only
      the first 50000 bytes, so if the error log grew too big the tests would fail.
      
      This patch extends search_pattern_in_file.inc with an option to specify how
      much of the file to search, and whether to search from the start of the file
      or from the end. Then the rpl.rpl_checksum and rpl.rpl_gtid_errorlog test
      cases are fixed to search the last 50000 bytes of the error log, which will
      work no matter how large prior tests have made it.
      439f75f8
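
The idea of searching only a bounded window, taken from either end of the file, can be sketched in Python. This is not the mysqltest include file itself; `pattern_in_file` and its parameters `search_range` and `from_end` are hypothetical names for illustration.

```python
import os
import re


def pattern_in_file(path, pattern, search_range=50000, from_end=True):
    """Report whether `pattern` occurs in a bounded window of the file.

    Only search_range bytes are examined: the last ones when from_end is
    true, otherwise the first ones. Searching from the end keeps working
    however large earlier output has made the file.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if from_end and size > search_range:
            f.seek(size - search_range)
        data = f.read(search_range)
    return re.search(pattern.encode(), data) is not None
```

With a 100000-byte log whose interesting line is at the end, searching the first 50000 bytes misses it while searching the last 50000 bytes finds it.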
  13. 27 Jun, 2014 2 commits
    • MDEV-6386: Assertion `thd->transaction.stmt.is_empty() || thd->in_sub_stmt ||... · 370318f8
      Kristian Nielsen authored
      MDEV-6386: Assertion `thd->transaction.stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)' fails with parallel replication
      
      The direct cause of the assertion was missing error handling in
      record_gtid(). If ha_commit_trans() fails for the statement commit, there was
      missing code to catch the error and do ha_rollback_trans() in this case; this
      caused close_thread_tables() to assert.
      
      Normally, this error case is not hit, but in this case it was triggered due to
      another bug: When a transaction T1 fails during parallel replication, the code
      would signal following transactions that they could start to run without
      properly marking the error condition. This caused subsequent transactions to
      incorrectly start replicating, only to get an error later during their own
      commit step. This was particularly serious if the subsequent transactions were
      DDL or MyISAM updates, which cannot be rolled back and would leave replication
      in an inconsistent state.
      
      Fixed by 1) in case of error, only signal following transactions to continue
      once the error has been properly marked and those transactions will know not
      to start; and 2) implement proper error handling in record_gtid() in the case
      that statement commit fails.
      
      370318f8
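
Point 1) of the fix, marking the error before waking the followers, can be sketched with a simple hand-off object. The names `CommitSlot`, `signal`, `wait_for_prior` and `run_follower` are hypothetical; the real server uses its own structures.

```python
import threading


class CommitSlot:
    """Hypothetical hand-off from one transaction to the one after it."""

    def __init__(self):
        self._event = threading.Event()
        self.prior_failed = False

    def signal(self, failed):
        # Record the outcome BEFORE waking the follower, so the follower
        # can never observe "go ahead" without also seeing the error.
        self.prior_failed = failed
        self._event.set()

    def wait_for_prior(self):
        """Block until signalled; return True only if it is safe to run."""
        self._event.wait()
        return not self.prior_failed


def run_follower(slot, apply):
    """Apply the following transaction only if the prior one succeeded."""
    if not slot.wait_for_prior():
        return "skipped"
    apply()
    return "applied"
```

Because the follower never starts after a failure, work that cannot be rolled back (such as DDL) is not applied out of turn.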
    • MDEV-6401 SET ROLE returning ERROR 1959 Invalid role specification for valid role · b9ddeeff
      Sergei Golubchik authored
      Use user's IP address when verifying privileges for SET ROLE (just like check_access() does)
      b9ddeeff
  14. 25 Jun, 2014 2 commits
    • MDEV-6120: When slave stops with error, error message should indicate the failing GTID · 86362129
      Kristian Nielsen authored
      If replication breaks in GTID mode, it is not trivial to determine the GTID of
      the failing event group. This is a problem, as that GTID is needed, e.g., to
      explicitly set @@gtid_slave_pos to skip to after that event group, or to
      compare errors on different servers, etc.
      
      Fix by ensuring that relevant slave errors logged to the error log include the
      GTID of the event group containing the problem event.
      86362129
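
As a rough illustration of attaching the GTID to an error message, consider the sketch below. The function name `format_slave_error` and the exact message layout are assumptions; the server's real wording may differ. Passing no GTID corresponds to events that carry none, such as old event formats.

```python
def format_slave_error(message, gtid=None):
    """Append the GTID of the failing event group to an error message.

    gtid is an optional (domain_id, server_id, seq_no) tuple. When it is
    None the message is returned unchanged.
    """
    if gtid is None:
        return message
    domain_id, server_id, seq_no = gtid
    return f"{message}; Gtid {domain_id}-{server_id}-{seq_no}"
```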
    • MDEV-5799: Error messages written upon LOST EVENTS incident are corrupted · 00467e13
      Kristian Nielsen authored
      This is MySQL Bug#59123. The message string stored in an INCIDENT event was
      not zero-terminated. This caused any following checksum bytes (if enabled on
      the master) to be output to the error log as trailing garbage when the message
      was printed to the error log.
      
      Backport the patch from MySQL 5.6:
      
        revno: 2876.228.200
        revision-id: zhenxing.he@sun.com-20110111051323-w2xnzvcjn46x6h6u
        committer: He Zhenxing <zhenxing.he@sun.com>
        timestamp: Tue 2011-01-11 13:13:23 +0800
        message:
          BUG#59123 rpl_stm_binlog_max_cache_size fails sporadically with found warnings
      
      Also add a test case.
      00467e13
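
The effect of reading past the end of the message can be shown with a small sketch. The payload layout used here (a 2-byte incident number, a 1-byte message length, the message, then a 4-byte checksum) is an assumption for illustration, and the Python slicing only stands in for the missing zero terminator in the original C string.

```python
import struct
import zlib

HEADER = "<HB"  # assumed layout: incident number, then message length
HEADER_SIZE = struct.calcsize(HEADER)


def build_payload(message):
    """Build a fake incident payload followed by a CRC32 checksum."""
    body = struct.pack(HEADER, 1, len(message)) + message
    return body + struct.pack("<I", zlib.crc32(body))


def read_message_unbounded(payload):
    # Mimics the bug: everything after the header is taken as message
    # text, so the trailing checksum bytes are printed as garbage.
    return payload[HEADER_SIZE:]


def read_message(payload):
    # Bounded read: honour the stored length and stop before the checksum.
    _incident, length = struct.unpack_from(HEADER, payload, 0)
    return payload[HEADER_SIZE:HEADER_SIZE + length]
```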
  15. 24 Jun, 2014 3 commits
  16. 23 Jun, 2014 1 commit
  17. 20 Jun, 2014 1 commit