1. 28 Aug, 2017 3 commits
    • Elena Stepanova's avatar
    • Marko Mäkelä's avatar
      309fe35f
    • MDEV-13637 InnoDB change buffer housekeeping can cause redo log overrun and possibly deadlocks · f87cb652
      Marko Mäkelä authored
      The function ibuf_remove_free_page() may be called while the caller
      is holding several mutexes or rw-locks. Because of this, this
      housekeeping loop may cause performance glitches for operations that
      involve tables that are stored in the InnoDB system tablespace.
      Also deadlocks might be possible.
      
      The worst impact of all is that due to the mutexes being held, calls to
      log_free_check() had to be skipped during this housekeeping.
      This means that the cyclic InnoDB redo log may be overwritten.
      If the system crashes during this, it would be unable to recover.
      
      The entry point to the problematic code is ibuf_free_excess_pages().
      It would make sense to call it before acquiring any mutexes or rw-locks,
      in any 'pessimistic' operation that involves the system tablespace.
      
      fseg_create_general(), fseg_alloc_free_page_general(): Do not call
      ibuf_free_excess_pages() while potentially holding some latches.
      
      ibuf_remove_free_page(): Do call log_free_check(), like every operation
      that is about to generate redo log should do.
      
      ibuf_free_excess_pages(): Remove some assertions that are replaced
      by stricter assertions in the log_free_check() that is now called by
      ibuf_remove_free_page().
      
      row_mtr_start(): New function, to perform necessary preparations when
      starting a mini-transaction for row operations. For pessimistic operations
      on secondary indexes that are located in the system tablespace,
      this includes calling ibuf_free_excess_pages().
      
      row_undo_ins_remove_sec_low(), row_undo_mod_del_mark_or_remove_sec_low(),
      row_undo_mod_del_unmark_sec_and_undo_update(): Call row_mtr_start().
      
      row_ins_sec_index_entry(): Call ibuf_free_excess_pages() if the operation
      may involve allocating pages and change buffering in the system tablespace.
      
      row_upd_sec_index_entry(): Slightly refactor the code. The
      delete-marking of the old entry is done in-place. It could be
      change-buffered, but the old code should be unlikely to have
      invoked ibuf_free_excess_pages() in this case.
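
      The ordering rule described above (check for free redo log space before latching, never while latches are held) can be illustrated with a toy model. This is only a sketch: ToyRedoLog, free_check(), append() and run_housekeeping() are hypothetical names invented for illustration and are not InnoDB's data structures or functions.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a cyclic redo log. All names are hypothetical and only
// illustrate the ordering rule; this is not how InnoDB is implemented.
struct ToyRedoLog {
    std::uint64_t capacity;        // bytes in the circular log
    std::uint64_t lsn = 0;         // end of the written log
    std::uint64_t checkpoint = 0;  // log before this point may be overwritten
    int latches_held = 0;          // latches held by the calling thread

    explicit ToyRedoLog(std::uint64_t cap) : capacity(cap) {}

    // Analogue of a free-space check: it must run while no latches are
    // held, because making a checkpoint may need to flush pages.
    void free_check(std::uint64_t margin) {
        assert(latches_held == 0);
        if (capacity - (lsn - checkpoint) < margin) {
            checkpoint = lsn;  // pretend that all dirty pages were flushed
        }
    }

    // Appends a record. Returns false if the write overran log that is
    // still needed for crash recovery.
    bool append(std::uint64_t len) {
        lsn += len;
        return lsn - checkpoint <= capacity;
    }
};

// Runs `steps` housekeeping steps that each write `len` bytes of redo.
// With check_first=true, every step checks for space before latching.
inline bool run_housekeeping(ToyRedoLog& log, int steps, std::uint64_t len,
                             bool check_first) {
    bool ok = true;
    for (int i = 0; i < steps; i++) {
        if (check_first) {
            log.free_check(len);
        }
        log.latches_held++;          // acquire latches
        ok = log.append(len) && ok;  // generate redo while latched
        log.latches_held--;          // release latches
    }
    return ok;
}
```

      In this model, ten steps of 30 bytes against a 100-byte log succeed when the check runs first, and overrun the log when the check is skipped.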
  2. 24 Aug, 2017 2 commits
    • Update README.md · a544225d
      Kenny John Jacob authored
      Fix minor typo.
    • MDEV-13534 InnoDB STATS_PERSISTENT fails to ignore garbage delete-mark flag on node pointer pages · e7bf8bca
      Marko Mäkelä authored
      This bug was a regression caused by MDEV-12698.
      
      On non-leaf pages, the delete-mark flag in the node pointer records is
      basically garbage. (Delete-marking only makes sense at the leaf level
      anyway. The purpose of the delete-mark is to tell MVCC, locking and purge
      that a leaf-level record does not exist in the READ UNCOMMITTED view,
      but it used to exist.)
      Node pointer records and non-leaf pages are glue that attaches multiple
      leaf pages to an index. This glue is supposed to be transparent to the
      transactional layer.
      
      When a page is split, InnoDB creates a node pointer record out of the
      child page record that the cursor is positioned on. The node pointer record
      for the parent page will be a copy of the child page record, amended with
      the child page number. If the child page record happened to carry the
      delete-mark flag, then the node pointer record would also carry this flag
      (even though the flag makes no sense outside child pages).
      
      (On a related note, for the first node pointer record in the first
      node pointer page of each tree level, if the MIN_REC_FLAG is set,
      the rest of the record contents (except the child page number)
      is basically garbage. From this garbage you could deduce at which point
      the child was originally split.)
      
      page_scan_method_t: Replace with bool, as there are only 2 values.
      
      dict_stats_scan_page(): Replace the parameter scan_method with is_leaf.
      Ignore the bogus (garbage) delete-mark flag if !is_leaf.
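
      The effect of the is_leaf parameter can be sketched as follows. This is a simplified illustration, not the actual dict_stats_scan_page() logic: ToyRecord and count_scanned_records() are hypothetical, and the real function does more than count records.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified record with only the fields this sketch needs.
struct ToyRecord {
    int key;
    bool delete_marked;  // meaningful only on leaf pages
};

// Counts the records that a statistics scan would take into account.
// On non-leaf pages the delete-mark flag is treated as garbage and ignored.
inline std::size_t count_scanned_records(const std::vector<ToyRecord>& page,
                                         bool is_leaf) {
    std::size_t n = 0;
    for (const ToyRecord& rec : page) {
        if (is_leaf && rec.delete_marked) {
            continue;  // a delete-marked leaf record is skipped
        }
        n++;
    }
    return n;
}
```

      With the same three records, one of which carries the flag, the sketch counts two records on a leaf page and three on a non-leaf page.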
  3. 23 Aug, 2017 8 commits
    • MDEV-13602: rocksdb.index_merge_rocksdb2 failed in buildbot · ae0759ad
      Sergei Petrunia authored
      - Add include/index_merge*. Upstream has different files than MariaDB,
        use copies of theirs, not ours.
      - There was a problem with running "DDL-like" commands with binlog=ON:
        MariaDB sets binlog_format=STATEMENT for the duration of such command
        to prevent RBR replication from catching (and replicating) updates to
        system tables.
        However, MyRocks tries to prevent any writes to MyRocks tables with
        binlog_format!=ROW.
      - Added exceptions for DDL-type commands (ANALYZE TABLE, OPTIMIZE TABLE)
      - Added special handling for "LOCK TABLE(s) myrocks_table WRITE".
    • The test failed once on Buildbot with the result difference: · 06b4b99f
      Marko Mäkelä authored
       # ib_logfile0 expecting FOUND
      -FOUND 3 /public|gossip/ in ib_logfile0
      +FOUND 2 /public|gossip/ in ib_logfile0
      
      The most plausible explanation for this difference
      is that the redo log payload grew so big that
      one of the strings (for writing the undo log record,
      clustered index record, and secondary index record)
      was written to ib_logfile1 instead of ib_logfile0.
      
      Let us run the test with --innodb-log-files-in-group=1 so that
      only a single log file will be used.
    • Adjust InnoDB debug assertions for Oracle Bug#25551311 aka Bug#23517560 · 81bd81fb
      Marko Mäkelä authored
      The MySQL 5.6.36 merge (commit 0af98182
      in MariaDB Server 10.0.31, 10.1.24, 10.2.7) introduced a change from
      Oracle:
      Bug#25551311 BACKPORT BUG #23517560 REMOVE SPACE_ID RESTRICTION
      FOR UNDO TABLESPACES
      
      Some debug assertions in MariaDB 10.2 were still assuming that the
      InnoDB undo tablespace IDs start from 1. With the above mentioned
      change, the undo tablespace IDs must be contiguous and nonzero.
    • MDEV-13167 InnoDB key rotation is not skipping unused pages · 36a97172
      Marko Mäkelä authored
      In key rotation, we must initialize unallocated but previously
      initialized pages, so that if encryption is enabled on a table,
      all clear-text data for the page will eventually be overwritten.
      But we should not rotate keys on pages that were never allocated
      after the data file was created.
      
      According to the latching order rules, after acquiring the
      tablespace latch, no page latches of previously allocated user pages
      may be acquired. So, key rotation should check the page allocation
      status after acquiring the page latch, not before. But, the latching
      order rules also prohibit accessing pages that were not allocated first,
      and then acquiring the tablespace latch. Such behaviour would indeed
      result in a deadlock when running the following tests:
      encryption.innodb_encryption-page-compression
      encryption.innodb-checksum-algorithm
      
      Because the key rotation is accessing potentially unallocated pages, it
      cannot reliably check if these pages were allocated. It can only check
      the page header. If the page number is zero, we can assume that the
      page is unallocated.
      
      fil_crypt_rotate_pages(): Skip pages that are known to be uninitialized.
      
      fil_crypt_rotate_page(): Detect uninitialized pages by FIL_PAGE_OFFSET.
      Page 0 is never encrypted, and on other pages that are initialized,
      FIL_PAGE_OFFSET must contain the page number.
      
      fil_crypt_is_page_uninitialized(): Remove. It suffices to check the
      page number field in fil_crypt_rotate_page().
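
      The check on the page number field can be sketched like this. It assumes that FIL_PAGE_OFFSET is a 4-byte big-endian field at byte offset 4 of the page header; the constant and both function names below are hypothetical stand-ins, not the code of fil_crypt_rotate_page().

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: the page number is stored as a 4-byte big-endian integer at
// byte offset 4 of the page header (the FIL_PAGE_OFFSET field).
constexpr std::size_t kToyFilPageOffset = 4;

// Reads the page number that is stored in the page header.
inline std::uint32_t toy_read_page_no(const unsigned char* page) {
    const unsigned char* p = page + kToyFilPageOffset;
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Returns true if key rotation should skip the page. A stored page number
// of zero means either page 0 (never encrypted) or a page that was never
// initialized; both cases are skipped.
inline bool toy_skip_key_rotation(const unsigned char* page) {
    return toy_read_page_no(page) == 0;
}
```

      A zero-filled buffer is skipped, while a buffer whose header carries a nonzero page number is not.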
    • Code clean-up related to MDEV-13167 · e52dd13c
      Marko Mäkelä authored
      xdes_get_descriptor_const(): New function, to get read-only access to
      the allocation descriptor.
      
      fseg_page_is_free(): Only acquire a shared latch on the tablespace,
      not an exclusive latch. Calculate the descriptor page address before
      acquiring the tablespace latch. If the page number is out of bounds,
      return without fetching any page. Access only one descriptor page.
      
      fsp_page_is_free(), fsp_page_is_free_func(): Remove.
      Use fseg_page_is_free() instead.
      
      fsp_init_file_page(): Move the debug parameter into a separate function.
      
      btr_validate_level(): Remove the unused variable "seg".
    • MDEV-13485 MTR tests fail massively with --innodb-sync-debug · 59caf2c3
      Marko Mäkelä authored
      The parameter --innodb-sync-debug, which is disabled by default,
      aims to find potential deadlocks in InnoDB.
      
      When the parameter is enabled, lots of tests failed. Most of these
      failures were due to bogus diagnostics. But, as part of this fix,
      we are also fixing a bug in error handling code and removing dead
      code, and fixing cases where an uninitialized mutex was being
      locked and unlocked.
      
      dict_create_foreign_constraints_low(): Remove an extraneous
      mutex_exit() call that could cause corruption in an error handling
      path. Also, do not unnecessarily acquire dict_foreign_err_mutex.
      Its only purpose is to control concurrent access to
      dict_foreign_err_file.
      
      row_ins_foreign_trx_print(): Replace a redundant condition with a
      debug assertion.
      
      srv_dict_tmpfile, srv_dict_tmpfile_mutex: Remove. The
      temporary file is never being written to or read from.
      
      log_free_check(): Allow SYNC_FTS_CACHE (fts_cache_t::lock)
      to be held.
      
      ha_innobase::inplace_alter_table(), row_merge_insert_index_tuples():
      Assert that no unexpected latches are being held.
      
      sync_latch_meta_init(): Properly initialize dict_operation_lock_key
      at SYNC_DICT_OPERATION. dict_sys->mutex is SYNC_DICT, and
      the now-removed SRV_DICT_TMPFILE was wrongly registered at
      SYNC_DICT_OPERATION.
      
      buf_block_init(): Correctly register buf_block_t::debug_latch.
      It was previously misleadingly reported as LATCH_ID_DICT_FOREIGN_ERR.
      
      latch_level_t: Correct the relative latching order of
      SYNC_IBUF_PESS_INSERT_MUTEX, SYNC_INDEX_TREE and
      SYNC_FILE_FORMAT_TAG, SYNC_DICT_OPERATION to avoid bogus failures.
      
      row_drop_table_for_mysql(): Avoid accessing btr_defragment_mutex
      if the defragmentation thread has not been started. This is the
      case during fts_drop_orphaned_tables() in recv_recovery_rollback_active().
      
      fil_space_destroy_crypt_data(): Avoid acquiring fil_crypt_threads_mutex
      when it is uninitialized. We may have created crypt_data before the
      mutex was created, and the mutex creation would be skipped if
      InnoDB startup failed or --innodb-read-only was specified.
    • Remove the unused redo log record type MLOG_INIT_FILE_PAGE · 1621d32e
      Marko Mäkelä authored
      InnoDB stopped generating the MLOG_INIT_FILE_PAGE record in
      MySQL 5.7.5. Starting with MySQL 5.7.9 (which was imported to
      MariaDB Server 10.2.2), the InnoDB redo log format tag prevents
      crash recovery from old-format redo logs.
      
      Remove the dead code for dealing with MLOG_INIT_FILE_PAGE.
    • MDEV-13452 Assertion `!recv_no_log_write' failed at startup · 825b6a35
      Marko Mäkelä authored
      The previous fix (commit dcdc1c6d)
      should have removed the assertion from log_close(), because every
      caller that requires this assertion is already asserting that log
      writes are allowed. When fil_names_clear() is called, it must be
      able to write the MLOG_CHECKPOINT records. The purpose of the debug
      variable recv_no_log_write is to prevent the creation of page-level
      redo log records, or modifications to persistent data.
  4. 21 Aug, 2017 2 commits
  5. 18 Aug, 2017 7 commits
    • MDEV-13559 encryption.innodb-redo-badkey failed in buildbot · 86fc5ece
      Marko Mäkelä authored
      Add suppressions for the read and decompression errors.
      This may be 10.3 specific and related to MDEV-13536 which increases
      purge activity. But it does not hurt to suppress rarely occurring
      and plausible error messages for this fault-injection test already in 10.2.
    • MDEV-13570 Assertion failure !srv_read_only_mode in --innodb-read-only shutdown when buf_resize_thread is active · 8a9e9d89
      Marko Mäkelä authored
      logs_empty_and_mark_files_at_shutdown(): Skip the debug assertion
      when the buf_resize_thread is active.
    • MDEV-13575 On failure, Mariabackup --backup --safe-slave-backup may forget to START SLAVE SQL_THREAD · 8a3e2970
      Marko Mäkelä authored
      backup_release(): New function, refactored from backup_finish().
      Release some resources that may have been acquired by backup_start()
      and should be released even after a failed operation.
      
      xtrabackup_backup_low(): Refactored from xtrabackup_backup_func().
      
      xtrabackup_backup_func(): Always call backup_release() after calling
      backup_start().
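
      One common way to guarantee that a release step runs on every exit path is a scope guard. The sketch below only shows that idiom: ScopeExit, ToyBackup and toy_backup_func() are hypothetical and are not the Mariabackup code, which may simply call backup_release() explicitly.

```cpp
#include <utility>

// Minimal scope guard: runs the given callable when the scope is left,
// on success and on failure alike.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F f) : f_(std::move(f)) {}
    ~ScopeExit() { f_(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    F f_;
};

// Hypothetical stand-in for the state that a backup start may change.
struct ToyBackup {
    bool sql_thread_stopped = false;

    // Stops the (pretend) SQL thread, then reports success or failure.
    bool start(bool fail) {
        sql_thread_stopped = true;
        return !fail;
    }

    // Undoes what start() may have done.
    void release() { sql_thread_stopped = false; }
};

// Always releases after start(), even when start() fails.
inline bool toy_backup_func(ToyBackup& b, bool fail_in_start) {
    auto release = [&b] { b.release(); };
    ScopeExit<decltype(release)> guard(release);
    if (!b.start(fail_in_start)) {
        return false;  // the guard still runs release()
    }
    return true;
}
```

      After toy_backup_func() returns, the flag is cleared again regardless of whether start() reported a failure.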
    • bump the VERSION · 72ac85cd
      Daniel Bartholomew authored
    • MDEV-13574 Memory leak in mariabackup.incremental_backup · 605b8352
      Marko Mäkelä authored
      The test mariabackup.incremental_backup revealed a memory leak
      in have_queries_to_wait_for(). The problem is that
      xb_mysql_query() is being invoked with bool use_result=true
      but the result of mysql_store_result() is not being freed.
      There are similar leaks in other functions.
      
      have_queries_to_wait_for(): Invoke mysql_free_result() to
      clean up after the mysql_store_result() that was invoked
      by xb_mysql_query().
      
      select_incremental_lsn_from_history(): Plug the leak on failure.
      
      kill_long_queries(): Plug the memory leak.
      (This function always leaked memory when it was called.)
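
      Leaks of this kind can also be prevented structurally by tying the result to a scope. The following sketch uses stand-in types so that it compiles without the client library: ToyResult, toy_store_result() and toy_free_result() are hypothetical substitutes for MYSQL_RES, mysql_store_result() and mysql_free_result(), and the commit itself fixes the leaks with explicit mysql_free_result() calls rather than with this idiom.

```cpp
#include <memory>

// Stand-ins for MYSQL_RES, mysql_store_result() and mysql_free_result().
// They only count live results so that a leak would be observable.
struct ToyResult {
    int rows;
};

static int toy_live_results = 0;

inline ToyResult* toy_store_result(int rows) {
    ++toy_live_results;
    return new ToyResult{rows};
}

inline void toy_free_result(ToyResult* r) {
    if (r) {
        --toy_live_results;
        delete r;
    }
}

// Deleter that frees the result when the owning pointer goes out of scope.
struct ToyResultDeleter {
    void operator()(ToyResult* r) const { toy_free_result(r); }
};
using ToyResultPtr = std::unique_ptr<ToyResult, ToyResultDeleter>;

// Hypothetical analogue of a function that inspects a query result.
// The result is freed on every return path.
inline bool toy_have_queries_to_wait_for(int rows) {
    ToyResultPtr result(toy_store_result(rows));
    if (!result) {
        return false;
    }
    return result->rows > 0;
}
```

      After any number of calls, toy_live_results is back at zero, which is the property that the leak fix restores.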
    • MDEV-13574 related Mariabackup code cleanup (non-functional change) · 74ce0cf1
      Marko Mäkelä authored
      have_queries_to_wait_for(), kill_long_queries(): Declare and initialize
      variables in one go.
    • Follow-up fix to MDEV-12988 backup fails if innodb_undo_tablespaces>0 · e9e051d2
      Marko Mäkelä authored
      The fix broke mariabackup --prepare --incremental.
      
      The restore of an incremental backup starts up (parts of) InnoDB twice.
      First, all data files are discovered for applying .delta files. Then,
      after the .delta files have been applied, InnoDB will be restarted
      more completely, so that the redo log records will be applied via the
      buffer pool.
      
      During the first startup, the buffer pool is not initialized, and thus
      trx_rseg_get_n_undo_tablespaces() must not be invoked. The apply of
      the .delta files will currently assume that the --innodb-undo-tablespaces
      option correctly specifies the number of undo tablespace files, just
      like --backup does.
      
      The second InnoDB startup of --prepare for applying the redo log will
      properly invoke trx_rseg_get_n_undo_tablespaces().
      
      enum srv_operation_mode: Add SRV_OPERATION_RESTORE_DELTA for
      distinguishing the apply of .delta files from SRV_OPERATION_RESTORE.
      
      srv_undo_tablespaces_init(): In mariabackup --prepare --incremental,
      in the initial SRV_OPERATION_RESTORE_DELTA phase, do not invoke
      trx_rseg_get_n_undo_tablespaces() because the buffer pool or the
      redo logs are not available. Instead, blindly rely on the parameter
      --innodb-undo-tablespaces.
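
      The decision described above can be sketched as a small function. Only the two mode names come from the commit message; the Toy-prefixed enum, toy_undo_tablespace_count() and its callback are hypothetical and merely show that the page-based lookup is bypassed in the delta phase.

```cpp
#include <functional>

// Hypothetical mirror of the operation modes named in the commit message.
enum ToyOperationMode {
    TOY_OPERATION_NORMAL,
    TOY_OPERATION_RESTORE,
    TOY_OPERATION_RESTORE_DELTA
};

// Returns the number of undo tablespaces to open. While .delta files are
// being applied, the buffer pool is unavailable, so the configured value
// is trusted and the page-based lookup is never invoked.
inline unsigned toy_undo_tablespace_count(
    ToyOperationMode mode, unsigned configured,
    const std::function<unsigned()>& read_from_pages) {
    if (mode == TOY_OPERATION_RESTORE_DELTA) {
        return configured;
    }
    return read_from_pages();
}
```

      In the delta phase the callback is never called; in the later restore phase it supplies the count.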
  6. 17 Aug, 2017 6 commits
  7. 16 Aug, 2017 6 commits
  8. 15 Aug, 2017 6 commits
    • Fixed the bug mdev-13346. · a28152aa
      Igor Babaev authored
      The bug was caused by a defect of the patch for the bug 11081.
      The patch was actually a port of the fix for this bug from the mysql
      code line. Later a correction of this fix was added to the
      mysql code. Here's the comment this correction was provided with:
      
        Bug#16499751: Opening cursor on SELECT in stored procedure causes segfault
        This is a regression from the fix of bug#14740889.
        The fix started using another set of expressions as the source for
        the temporary table used for the materialized cursor. However,
        JOIN::make_tmp_tables_info() calls setup_copy_fields() which creates
        an Item_copy wrapper object on top of the function being selected.
        The Item_copy objects were not properly handled by create_tmp_table -
        they were simply ignored. This patch creates temporary table fields
        based on the underlying item of the Item_copy objects.
      
      The test case for the bug 13346 was taken from mdev-13380.
    • MDEV-13515: rocksdb.use_direct_reads_writes fails in buildbot with not found pattern · c354cb66
      Sergei Petrunia authored
      The test mis-used MTR's "restart the server if it crashed or exited"
      feature to try starting MyRocks plugin with invalid arguments.
      
      Changed the test to use the --default-storage-engine=myisam which
      allows the server to start when MyRocks fails to start.
      
      This removes the need to "start the server with the arguments which
      will cause it to fail to start", and so removes the race conditions
      with MTR server restart code and mysqld.*.expect file.
    • MDEV-13331 FK DELETE CASCADE does not honor innodb_lock_wait_timeout · 5d1c0d00
      Marko Mäkelä authored
      row_ins_check_foreign_constraint(): On timeout,
      return DB_LOCK_WAIT_TIMEOUT instead of DB_LOCK_WAIT,
      so that the lock wait will be properly terminated.
      Also, replace some redundant assignments.
      
      It looks like this bug was introduced in MySQL 5.7.8 by:
      
          commit a97f6b91227c7e0fc3151cfe5421891e79c12d19
          Author: Annamalai Gurusami <annamalai.gurusami@oracle.com>
          Date:   Tue Jun 9 16:02:31 2015 +0530
      
              Bug #20953265 INNODB: FAILING ASSERTION: RESULT != FTS_INVALID
    • MDEV-13498 DELETE with CASCADE constraints takes long time / MDEV-13246 · 2f342c45
      Marko Mäkelä authored
      MDEV-13498 is a performance regression that was introduced in MariaDB 10.2.2
      by commit fec844ac
      which introduced some Galera-specific conditions that were being
      evaluated even if the write-set replication was not enabled.
      
      MDEV-13246 Stale rows despite ON DELETE CASCADE constraint
      is a correctness regression that was introduced by the same commit.
      
      Especially the subcondition
      	!(parent && que_node_get_type(parent) == QUE_NODE_UPDATE)
      which is equivalent to
      	!parent || que_node_get_type(parent) != QUE_NODE_UPDATE
      makes little sense. If parent==NULL, the evaluation would proceed to the
      std::find() expression, which would dereference parent. Because no SIGSEGV
      was observed related to this, we can conclude that parent!=NULL always
      holds. But then, the condition would be equivalent to
      	que_node_get_type(parent) != QUE_NODE_UPDATE
      which would not make sense either, because the std::find() expression
      is actually assuming the opposite when casting parent to upd_node_t*.
      
      It looks like this condition never worked properly, or that
      it was never properly tested, or both.
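
      The equivalence of the two forms of the subcondition follows from De Morgan's law and can be checked exhaustively. In this sketch the two booleans stand for parent != NULL and que_node_get_type(parent) == QUE_NODE_UPDATE; the function names are hypothetical.

```cpp
// has_parent models `parent != NULL`; is_update models
// `que_node_get_type(parent) == QUE_NODE_UPDATE`.
inline bool original_form(bool has_parent, bool is_update) {
    return !(has_parent && is_update);
}

inline bool rewritten_form(bool has_parent, bool is_update) {
    return !has_parent || !is_update;
}

// Compares both forms over all four input combinations.
inline bool forms_agree() {
    for (int p = 0; p < 2; p++) {
        for (int u = 0; u < 2; u++) {
            if (original_form(p != 0, u != 0) !=
                rewritten_form(p != 0, u != 0)) {
                return false;
            }
        }
    }
    return true;
}
```

      Note that both forms evaluate to true when there is no parent, which is exactly the case in which the subsequent std::find() expression would dereference a null pointer.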
      
      wsrep_must_process_fk(): Helper function to check if FOREIGN KEY
      constraints need to be processed. Only evaluate the costly std::find()
      expression when write-set replication is enabled.
      
      Also, rely on operator<<(std::ostream&, const id_name_t&) and
      operator<<(std::ostream&, const table_name_t&) for pretty-printing
      index and table names.
      
      row_upd_sec_index_entry(): Add !wsrep_thd_is_BF() to the condition.
      This is applying part of "Galera MW-369 FK fixes"
      https://github.com/codership/mysql-wsrep/commit/f37b79c6dab101310a45a9e8cb23c0f98716da52
      that is described by the following part of the commit comment:
          additionally: skipping wsrep_row_upd_check_foreign_constraint if thd has
          BF, essentially is applier or replaying
          This FK check would be needed only for populating parent row FK keys
          in write set, so no use for appliers
    • MDEV-13520 InnoDB attempts UPDATE with DB_TRX_ID=0 if innodb_force_recovery=3 · b4f6b678
      Marko Mäkelä authored
      trx_set_rw_mode(): Check the flag high_level_read_only instead
      of testing srv_force_recovery (innodb_force_recovery) directly.
      There is no need to prevent the creation of read-write transactions
      if innodb_force_recovery=3 is used. Yes, in that mode any recovered
      incomplete transactions will not be rolled back, but these transactions
      will continue to hold locks on the records that they have modified.
      If the new read-write transactions hit conflicts with already existing
      (possibly recovered) transactions, the lock wait timeout mechanism
      will work just fine.
    • Fix a test result · a5e4365e
      Marko Mäkelä authored