  1. 18 Oct, 2021 1 commit
    • MDEV-26682 Replication timeouts with XA PREPARE · 18eab4a8
      Marko Mäkelä authored
      The purpose of non-exclusive locks in a transaction is to guarantee
      that the records covered by those locks will remain unchanged until
      the transaction is committed. (The purpose of gap locks is to ensure
      that a record that was nonexistent will remain that way.)
      
      Once a transaction has reached the XA PREPARE state, the only allowed
      further actions are XA ROLLBACK or XA COMMIT. Therefore, it can be
      argued that only the exclusive locks that the XA PREPARE transaction
      is holding are essential.
      
      Furthermore, InnoDB never preserved explicit locks across server restart.
      For XA PREPARE transactions, we will only recover implicit exclusive locks
      for records that had been modified.
      
      Because XA PREPARE followed by a server restart will
      cause some locks to be lost, we might as well always release all
      non-exclusive locks during the execution of an XA PREPARE statement.
      
      lock_release_on_prepare(): Release non-exclusive locks on XA PREPARE.
      
      trx_prepare(): Invoke lock_release_on_prepare() unless the
      isolation level is SERIALIZABLE or this is an internal distributed
      transaction with the binlog (not an actual XA PREPARE statement).
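      
      The rule above can be illustrated with a toy model. This is only a
      sketch under stated assumptions: the types Lock and Trx and the
      functions release_non_exclusive() and prepare() are hypothetical
      stand-ins, not the InnoDB lock_t, trx_t, lock_release_on_prepare()
      or trx_prepare().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified lock model (not InnoDB's lock_t).
enum class LockMode { Shared, Exclusive };

struct Lock {
  int record_id;
  LockMode mode;
};

// Hypothetical, simplified transaction (not InnoDB's trx_t).
struct Trx {
  std::vector<Lock> locks;
  bool serializable = false;
};

// Drop every lock whose mode is not exclusive.
void release_non_exclusive(Trx& trx) {
  auto& locks = trx.locks;
  locks.erase(std::remove_if(locks.begin(), locks.end(),
                             [](const Lock& l) {
                               return l.mode != LockMode::Exclusive;
                             }),
              locks.end());
}

// Release non-exclusive locks on prepare, unless the isolation level is
// SERIALIZABLE or this is an internal two-phase commit with the binlog.
void prepare(Trx& trx, bool internal_binlog_2pc) {
  if (!trx.serializable && !internal_binlog_2pc) {
    release_non_exclusive(trx);
  }
}
```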
      
      This has been discussed with Sergei Golubchik and Andrei Elkin.
      
      Reviewed by: Sergei Golubchik
      18eab4a8
  2. 06 Sep, 2021 1 commit
    • MDEV-26467: Avoid re-reading srv_spin_wait_delay inside a loop · 0f0b7e47
      Marko Mäkelä authored
      Invoking ut_delay(srv_spin_wait_delay) inside a spinloop would
      cause a read of 2 global variables as well as a multiplication.
      Let us loop around MY_RELAX_CPU() using a precomputed loop count
      to keep the loops simpler, to help them scale better.
      
      We also tried precomputing the delay into a global variable,
      but that appeared to result in slightly worse throughput.
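      
      A minimal sketch of the idea, assuming a simple test-and-set lock.
      The names relax_cpu(), spin_wait_delay, spin_rounds, delay_multiplier
      and spin_acquire() are hypothetical; relax_cpu() merely stands in
      for MY_RELAX_CPU(), and the multiplier value is made up.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for MY_RELAX_CPU().
inline void relax_cpu() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#else
  std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler barrier only
#endif
}

// Hypothetical tunables that could change at runtime.
std::atomic<unsigned> spin_wait_delay{4};
std::atomic<unsigned> spin_rounds{30};
constexpr unsigned delay_multiplier = 5;  // assumed scale factor

// Try to set `locked`; give up after a bounded number of spin rounds.
bool spin_acquire(std::atomic<bool>& locked) {
  // Read the globals and multiply once, before entering the loop.
  const unsigned rounds = spin_rounds.load(std::memory_order_relaxed);
  const unsigned delay =
      spin_wait_delay.load(std::memory_order_relaxed) * delay_multiplier;

  for (unsigned r = 0; r < rounds; ++r) {
    bool expected = false;
    if (locked.compare_exchange_strong(expected, true,
                                       std::memory_order_acquire)) {
      return true;
    }
    // The inner loop touches only a local constant.
    for (unsigned i = 0; i < delay; ++i) {
      relax_cpu();
    }
  }
  return false;
}
```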
      0f0b7e47
  3. 31 Aug, 2021 3 commits
    • MDEV-25919: Lock tables before acquiring dict_sys.latch · c5fd9aa5
      Marko Mäkelä authored
      In commit 1bd681c8 (MDEV-25506 part 3)
      we introduced a "fake instant timeout" when a transaction would wait
      for a table or record lock while holding dict_sys.latch. This prevented
      a deadlock of the server but could cause bogus errors for operations
      on the InnoDB persistent statistics tables.
      
      A better fix is to ensure that whenever a transaction is being
      executed in the InnoDB internal SQL parser (which will for now
      require dict_sys.latch to be held), it will already have acquired
      all locks that could be required for the execution. So, we will
      acquire the following locks upfront, before acquiring dict_sys.latch:
      
      (1) MDL on the affected user table (acquired by the SQL layer)
      (2) If applicable (not for RENAME TABLE): InnoDB table lock
      (3) If persistent statistics are going to be modified:
      (3.a) MDL_SHARED on mysql.innodb_table_stats, mysql.innodb_index_stats
      (3.b) exclusive table locks on the statistics tables
      (4) Exclusive table locks on the InnoDB data dictionary tables
      (not needed in ANALYZE TABLE and the like)
      
      Note: Acquiring exclusive locks on the statistics tables may cause
      more locking conflicts between concurrent DDL operations.
      Notably, RENAME TABLE will lock the statistics tables
      even if no persistent statistics are enabled for the table.
      
      DROP DATABASE will only acquire locks on statistics tables if
      persistent statistics are enabled for the tables on which the
      SQL layer is invoking ha_innobase::delete_table().
      For any "garbage collection" in innodb_drop_database(), a timeout
      while acquiring locks on the statistics tables will result in any
      statistics not being deleted for any tables that the SQL layer
      did not know about.
      
      If innodb_defragment=ON, information may be written to the statistics
      tables even for tables for which InnoDB persistent statistics are
      disabled. But, DROP TABLE will no longer attempt to delete that
      information if persistent statistics are not enabled for the table.
      
      This change should also fix the hangs related to InnoDB persistent
      statistics and STATS_AUTO_RECALC (MDEV-15020), as well as
      a bug where running ALTER TABLE on the statistics tables
      concurrently with ALTER TABLE on InnoDB tables could
      cause trouble.
      
      lock_rec_enqueue_waiting(), lock_table_enqueue_waiting():
      Do not issue a fake instant timeout error when the transaction
      is holding dict_sys.latch. Instead, assert that the dict_sys.latch
      is never being held here.
      
      lock_sys_tables(): A new function to acquire exclusive locks on all
      dictionary tables, in case DROP TABLE or similar operation is
      being executed. Locking non-hard-coded tables is optional to avoid
      a crash in row_merge_drop_temp_indexes(). The SYS_VIRTUAL table was
      introduced in MySQL 5.7 and MariaDB Server 10.2. Normally, we require
      all these dictionary tables to exist before executing any DDL, but
      the function row_merge_drop_temp_indexes() is an exception.
      When upgrading from MariaDB Server 10.1 or MySQL 5.6 or earlier,
      the table SYS_VIRTUAL would not exist at this point.
      
      ha_innobase::commit_inplace_alter_table(): Invoke
      log_write_up_to() while not holding dict_sys.latch.
      
      dict_sys_t::remove(), dict_table_close(): No longer try to
      drop index stubs that were left behind by aborted online ADD INDEX.
      Such indexes should be dropped from the InnoDB data dictionary by
      row_merge_drop_indexes() as part of the failed DDL operation.
      Stubs for aborted indexes may only be left behind in the
      data dictionary cache.
      
      dict_stats_fetch_from_ps(): Use a normal read-only transaction.
      
      ha_innobase::delete_table(), ha_innobase::truncate(), fts_lock_table():
      While waiting for purge to stop using the table,
      do not hold dict_sys.latch.
      
      ha_innobase::delete_table(): Implement a work-around for the rollback
      of ALTER TABLE...ADD PARTITION. MDL_EXCLUSIVE would not be held if
      ALTER TABLE hits lock_wait_timeout while trying to upgrade the MDL
      due to a conflicting LOCK TABLES, such as in the first ALTER TABLE
      in the test case of Bug#53676 in parts.partition_special_innodb.
      Therefore, we must explicitly stop purge, because it would not be
      stopped by MDL.
      
      dict_stats_func(), btr_defragment_chunk(): Allocate a THD so that
      we can acquire MDL on the InnoDB persistent statistics tables.
      
      mysqltest_embedded: Invoke ha_pre_shutdown() before free_used_memory()
      in order to avoid ASAN heap-use-after-free related to acquire_thd().
      
      trx_t::dict_operation_lock_mode: Changed the type to bool.
      
      row_mysql_lock_data_dictionary(), row_mysql_unlock_data_dictionary():
      Implemented as macros.
      
      rollback_inplace_alter_table(): Apply an infinite timeout to lock waits.
      
      innodb_thd_increment_pending_ops(): Wrapper for
      thd_increment_pending_ops(). Never attempt async operation for
      InnoDB background threads, such as the trx_t::commit() in
      dict_stats_process_entry_from_recalc_pool().
      
      lock_sys_t::cancel(trx_t*): Make dictionary transactions immune to KILL.
      
      lock_wait(): Make dictionary transactions immune to KILL, and to
      lock wait timeout when waiting for locks on dictionary tables.
      
      parts.partition_special_innodb: Use lock_wait_timeout=0 to instantly
      get ER_LOCK_WAIT_TIMEOUT.
      
      main.mdl: Filter out MDL on InnoDB persistent statistics tables
      
      Reviewed by: Thirunarayanan Balathandayuthapani
      c5fd9aa5
    • MDEV-25919 preparation: Various cleanup · 094de717
      Marko Mäkelä authored
      que_eval_sql(): Remove the parameter lock_dict. The only caller
      with lock_dict=true was dict_stats_exec_sql(), which will now
      explicitly invoke dict_sys.lock() and dict_sys.unlock() by itself.
      
      row_import_cleanup(): Do not unnecessarily lock the dictionary.
      Concurrent access to the table during ALTER TABLE...IMPORT TABLESPACE
      is prevented by MDL and the fact that there cannot exist any
      undo log or change buffer records that would refer to the table
      or tablespace.
      
      row_import_for_mysql(): Do not unnecessarily lock the dictionary
      while accessing fil_system. Thanks to MDL_EXCLUSIVE that was acquired
      by the SQL layer, only one IMPORT may be in effect for the table name.
      
      row_quiesce_set_state(): Do not unnecessarily lock the dictionary.
      The dict_table_t::quiesce state is documented to be protected by
      all index latches, which we are acquiring.
      
      dict_table_close(): Introduce a simpler variant with fewer parameters.
      
      dict_table_close(): Reduce the number of calls.
      We can simply invoke dict_table_t::release() on startup or
      in DDL operations, or when the table is inaccessible.
      In none of these cases is there any need to invalidate the
      InnoDB persistent statistics.
      
      pars_info_t::graph_owns_us: Remove (unused).
      
      pars_info_free(): Define inline.
      
      fts_delete(), trx_t::evict_table(), row_prebuilt_free(),
      row_rename_table_for_mysql(): Simplify.
      
      row_mysql_lock_data_dictionary(): Remove some references;
      use dict_sys.lock() and dict_sys.unlock() instead.
      
      row_mysql_lock_table(): Remove. Use lock_table_for_trx() instead.
      
      ha_innobase::check_if_supported_inplace_alter(),
      row_create_table_for_mysql(): Simply assert dict_sys.sys_tables_exist().
      As of commit 49e2c8f0 and
      commit 1bd681c8, srv_start()
      actually guarantees that the system tables will exist,
      or the server is in read-only mode, or startup will fail.
      
      Reviewed by: Thirunarayanan Balathandayuthapani
      094de717
    • MDEV-24258 Merge dict_sys.mutex into dict_sys.latch · 82b7c561
      Marko Mäkelä authored
      In the parent commit, dict_sys.latch could theoretically have been
      replaced with a mutex. But, we can do better and merge dict_sys.mutex
      into dict_sys.latch. Generally, every occurrence of dict_sys.mutex_lock()
      will be replaced with dict_sys.lock().
      
      The PERFORMANCE_SCHEMA instrumentation for dict_sys_mutex
      will be removed along with dict_sys.mutex. The dict_sys.latch
      will remain instrumented as dict_operation_lock.
      
      Some use of dict_sys.lock() will be replaced with dict_sys.freeze(),
      which we will reintroduce for the new shared mode. Most notably,
      concurrent table lookups are possible as long as the tables are present
      in the dict_sys cache. In particular, this will allow more concurrency
      among InnoDB purge workers.
      
      Because dict_sys.mutex will no longer 'throttle' the threads that purge
      InnoDB transaction history, a performance degradation may be observed
      unless innodb_purge_threads=1.
      
      The table cache eviction policy will become FIFO-like,
      similar to what happened to fil_system.LRU
      in commit 45ed9dd9.
      The name of the list dict_sys.table_LRU will become somewhat misleading;
      that list contains tables that may be evicted, even though the
      eviction policy no longer is least-recently-used but first-in-first-out.
      (Note: Tables can never be evicted as long as locks exist on them or
      the tables are in use by some thread.)
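      
      A toy illustration of the FIFO-like policy, assuming that tables
      are appended when they become evictable and that lookups never
      reorder the list. CachedTable and EvictableList are hypothetical
      names; this is not the dict_sys.table_LRU implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// Hypothetical cached table descriptor.
struct CachedTable {
  std::string name;
  unsigned refs = 0;   // threads currently using the table
  unsigned locks = 0;  // locks that exist on the table
};

class EvictableList {
  std::list<CachedTable*> fifo_;  // oldest entry at the front

 public:
  // New evictable tables go to the tail; lookups never move entries,
  // so a lookup needs no exclusive access to this list.
  void add(CachedTable* t) { fifo_.push_back(t); }

  bool contains(const CachedTable* t) const {
    return std::find(fifo_.begin(), fifo_.end(), t) != fifo_.end();
  }

  // Evict up to max_evict tables starting from the oldest,
  // skipping tables that are in use or locked.
  unsigned evict(unsigned max_evict) {
    unsigned evicted = 0;
    for (auto it = fifo_.begin();
         it != fifo_.end() && evicted < max_evict;) {
      if ((*it)->refs == 0 && (*it)->locks == 0) {
        it = fifo_.erase(it);
        ++evicted;
      } else {
        ++it;
      }
    }
    return evicted;
  }
};
```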
      
      As demonstrated by the test perfschema.sxlock_func, there
      will be less contention on dict_sys.latch, because some previous
      use of exclusive latches will be replaced with shared latches.
      
      fts_parse_sql_no_dict_lock(): Replaced with pars_sql().
      
      fts_get_table_name_prefix(): Merged to fts_optimize_create().
      
      dict_stats_update_transient_for_index(): Deduplicated some code.
      
      ha_innobase::info_low(), dict_stats_stop_bg(): Use a combination
      of dict_sys.latch and table->stats_mutex_lock() to cover the
      changes of BG_STAT_SHOULD_QUIT, because the flag is being read
      in dict_stats_update_persistent() while not holding dict_sys.latch.
      
      row_discard_tablespace_for_mysql(): Protect stats_bg_flag by
      exclusive dict_sys.latch, like most other code does.
      
      row_quiesce_table_has_fts_index(): Remove unnecessary mutex
      acquisition. FLUSH TABLES...FOR EXPORT is protected by MDL.
      
      row_import::set_root_by_heuristic(): Remove unnecessary mutex
      acquisition. ALTER TABLE...IMPORT TABLESPACE is protected by MDL.
      
      row_ins_sec_index_entry_low(): Replace a call
      to dict_set_corrupted_index_cache_only(). Reads of index->type
      were not really protected by dict_sys.mutex, and writes
      (flagging an index corrupted) should be extremely rare.
      
      dict_stats_process_entry_from_defrag_pool(): Only freeze the dictionary,
      do not lock it exclusively.
      
      dict_stats_wait_bg_to_stop_using_table(), DICT_BG_YIELD: Remove trx.
      We can simply invoke dict_sys.unlock() and dict_sys.lock() directly.
      
      dict_acquire_mdl_shared()<trylock=false>: Assert that dict_sys.latch is
      only held in shared mode, not exclusive mode. Only acquire it in
      exclusive mode if the table needs to be loaded to the cache.
      
      dict_sys_t::acquire(): Remove. Relocating elements in dict_sys.table_LRU
      would require holding an exclusive latch, which we want to avoid
      for performance reasons.
      
      dict_sys_t::allow_eviction(): Add the table first to dict_sys.table_LRU,
      to compensate for the removal of dict_sys_t::acquire(). This function
      is only invoked by INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS.
      
      dict_table_open_on_id(), dict_table_open_on_name(): If dict_locked=false,
      try to acquire dict_sys.latch in shared mode. Only acquire the latch in
      exclusive mode if the table is not found in the cache.
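      
      The lookup pattern described above might look roughly like the
      following sketch, which uses std::shared_mutex in place of
      dict_sys.latch. DictCache, Table and open() are hypothetical names,
      and "loading" a table is reduced to inserting an empty entry.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical table handle; the reference count is atomic because it
// is incremented while holding only a shared latch.
struct Table {
  std::atomic<unsigned> refs{0};
};

class DictCache {
  mutable std::shared_mutex latch_;
  std::unordered_map<std::string, Table> cache_;
  unsigned loads_ = 0;  // how many times a table had to be loaded

 public:
  Table* open(const std::string& name) {
    {
      // Fast path: a shared latch suffices if the table is cached.
      std::shared_lock<std::shared_mutex> shared(latch_);
      auto it = cache_.find(name);
      if (it != cache_.end()) {
        it->second.refs.fetch_add(1, std::memory_order_relaxed);
        return &it->second;
      }
    }
    // Slow path: take the exclusive latch to load the table.
    // try_emplace() also covers the case where another thread
    // loaded the table after we released the shared latch.
    std::unique_lock<std::shared_mutex> exclusive(latch_);
    auto result = cache_.try_emplace(name);
    if (result.second) {
      ++loads_;
    }
    result.first->second.refs.fetch_add(1, std::memory_order_relaxed);
    return &result.first->second;
  }

  unsigned loads() const {
    std::shared_lock<std::shared_mutex> shared(latch_);
    return loads_;
  }
};
```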
      
      Reviewed by: Thirunarayanan Balathandayuthapani
      82b7c561
  4. 27 Jul, 2021 1 commit
    • MDEV-25594: Improve debug checks · cf1fc598
      Marko Mäkelä authored
      trx_t::will_lock: Changed the type to bool.
      
      trx_t::is_autocommit_non_locking(): Replaces
      trx_is_autocommit_non_locking().
      
      trx_is_ac_nl_ro(): Remove (replaced with equivalent assertion expressions).
      
      assert_trx_nonlocking_or_in_list(): Remove.
      Replaced with at least as strict checks in each place.
      
      check_trx_state(): Moved to a static function; partially replaced with
      individual debug assertions implementing equivalent or stricter checks.
      
      This is a backport of commit 7b51d11c
      from 10.5.
      cf1fc598
  5. 23 Jul, 2021 1 commit
  6. 22 Jul, 2021 1 commit
    • MDEV-26193: Wake up purge less often · a4dc9265
      Marko Mäkelä authored
      Starting with commit 6e12ebd4
      (MDEV-25062), srv_wake_purge_thread_if_not_active() became
      a more expensive operation, especially on NUMA systems, because
      instead of reading an atomic global variable trx_sys.rseg_history_len
      we are traversing up to 128 cache lines in trx_sys.history_exists().
      
      trx_t::commit_cleanup(): Do not wake up purge at all.
      We will wake up purge about once per second in srv_master_callback().
      
      srv_master_do_active_tasks(), srv_master_do_idle_tasks():
      Move some duplicated code to srv_master_callback().
      
      srv_master_callback(): Invoke purge_coordinator_timer_callback()
      to ensure that purge will be periodically woken up, even if the
      latest execution of trx_t::commit_cleanup() allowed the purge view
      to advance but did not wake up purge.
      Do not call log_free_check(), because every thread that is going
      to generate redo log is supposed to call that function anyway,
      before acquiring any page latches. Additional calls to the function
      every few seconds should not make any difference.
      
      srv_shutdown_threads(): Ensure that srv_shutdown_state can be at most
      SRV_SHUTDOWN_INITIATED in srv_master_callback(), by first invoking
      srv_master_timer.reset() before changing srv_shutdown_state.
      (Note: We first terminate the srv_master_callback and only then
      terminate the purge tasks. Thus, the purge subsystem should exist
      when srv_master_callback() invokes purge_coordinator_timer_callback()
      if it was initiated in the first place.)
      a4dc9265
  7. 20 Jul, 2021 1 commit
    • Fix switch case statement in trx_flush_log_if_needed_low() · 5f8651ac
      Jagdeep Sidhu authored
      In commit 2e814d47 on MariaDB 10.2
      the switch case statement in trx_flush_log_if_needed_low() regressed.
      
      Since 10.2, this code has been refactored to list the switch cases in
      descending order, so a value of 3 for innodb_flush_log_at_trx_commit
      behaves the same as a value of 2; that is, no FSYNC is enforced during
      the COMMIT phase. The switch should however not be empty and cases 2 and 3
      should not have identical contents.
      
      As per the documentation, setting innodb_flush_log_at_trx_commit to 3
      should do an FSYNC to disk.
      This fixes the regression so that the switch statement again does
      what users expect the setting to do.
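      
      As an illustration only, the intended mapping could be sketched as
      below. FlushAction and action_at_commit() are hypothetical names,
      not the body of trx_flush_log_if_needed_low(); the behaviour for
      values 0 and 1 and the fallback for other values are assumptions
      that this commit message does not spell out.

```cpp
#include <cassert>

// Hypothetical description of what happens to the redo log at COMMIT.
struct FlushAction {
  bool write;  // write the log buffer to the log file
  bool fsync;  // additionally force the file to disk
};

FlushAction action_at_commit(unsigned flush_log_at_trx_commit) {
  switch (flush_log_at_trx_commit) {
  case 3:
  case 1:
    // Values 3 and 1: write and FSYNC at commit (assumed for 1).
    return {true, true};
  case 2:
    // Value 2: write at commit, but no FSYNC.
    return {true, false};
  case 0:
    // Value 0: nothing is done at commit (assumption).
    return {false, false};
  }
  // Hypothetical fallback for unexpected values: be durable.
  return {true, true};
}
```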
      
      All new code of the whole pull request, including one or several files
      that are either new files or modified ones, are contributed under the
      BSD-new license. I am contributing on behalf of my employer Amazon Web
      Services, Inc.
      5f8651ac
  8. 03 Jul, 2021 1 commit
    • fixup 0a67b15a · 789a2a36
      Marko Mäkelä authored
      trx_t::free(): Declare xid as fully initialized in order to
      avoid tripping the subsequent MEM_CHECK_DEFINED
      (in WITH_MSAN and WITH_VALGRIND builds).
      789a2a36
  9. 01 Jul, 2021 2 commits
  10. 24 Jun, 2021 1 commit
    • MDEV-26007 Rollback unnecessarily initiates redo log write · 033e29b6
      Marko Mäkelä authored
      trx_t::commit_in_memory(): Do not initiate a redo log write if
      the transaction has no visible effect. If anything for this
      transaction had been made durable, crash recovery will roll back
      the transaction just fine even if the end of ROLLBACK is not
      durably written.
      
      Rollbacks of transactions that are associated with XA identifiers
      (possibly internally via the binlog) will always be persisted.
      The test rpl.rpl_gtid_crash covers this.
      033e29b6
  11. 23 Jun, 2021 1 commit
    • MDEV-25062: Reduce trx_rseg_t::mutex contention · 6e12ebd4
      Marko Mäkelä authored
      redo_rseg_mutex, noredo_rseg_mutex: Remove the PERFORMANCE_SCHEMA keys.
      The rollback segment mutex will be uninstrumented.
      
      trx_sys_t: Remove pointer indirection for rseg_array, temp_rseg.
      Align each element to the cache line.
      
      trx_sys_t::rseg_id(): Replaces trx_rseg_t::id.
      
      trx_rseg_t::ref: Replaces needs_purge, trx_ref_count, skip_allocation
      in a single std::atomic<uint32_t>.
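      
      One way to fold two flags and a reference count into a single
      atomic word is sketched below. The class name RsegRef, the bit
      layout and the method names are assumptions made for illustration;
      the actual encoding used by trx_rseg_t::ref is not stated here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class RsegRef {
  // Hypothetical bit layout: two flag bits, then the reference count.
  static constexpr uint32_t SKIP_ALLOCATION = 1;
  static constexpr uint32_t NEEDS_PURGE = 2;
  static constexpr uint32_t REF = 4;  // value of one reference

  std::atomic<uint32_t> ref_{0};

 public:
  // Take a reference unless allocation from this segment is disabled.
  // The flag check and the increment happen in one atomic step.
  bool acquire_if_available() {
    uint32_t r = ref_.load(std::memory_order_relaxed);
    while (!(r & SKIP_ALLOCATION)) {
      if (ref_.compare_exchange_weak(r, r + REF,
                                     std::memory_order_acquire,
                                     std::memory_order_relaxed)) {
        return true;
      }
    }
    return false;
  }

  void release() { ref_.fetch_sub(REF, std::memory_order_release); }

  void set_skip_allocation() {
    ref_.fetch_or(SKIP_ALLOCATION, std::memory_order_relaxed);
  }
  void set_needs_purge() {
    ref_.fetch_or(NEEDS_PURGE, std::memory_order_relaxed);
  }

  bool needs_purge() const {
    return ref_.load(std::memory_order_relaxed) & NEEDS_PURGE;
  }
  uint32_t references() const {
    return ref_.load(std::memory_order_relaxed) / REF;
  }
};
```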
      
      trx_rseg_t::latch: Replaces trx_rseg_t::mutex.
      
      trx_rseg_t::history_size: Replaces trx_sys_t::rseg_history_len
      
      trx_sys_t::history_size_approx(): Replaces trx_sys.rseg_history_len
      in those places where the exact count does not matter. We must not
      acquire any trx_rseg_t::latch while holding index page latches, because
      normally the trx_rseg_t::latch is acquired before any page latches.
      
      trx_sys_t::history_exists(): Replaces trx_sys.rseg_history_len!=0
      with an approximation.
      
      We remove some unnecessary trx_rseg_t::latch acquisition around
      trx_undo_set_state_at_prepare() and trx_undo_set_state_at_finish().
      Those operations will only access fields that remain constant
      after trx_rseg_t::init().
      6e12ebd4
  12. 21 Jun, 2021 1 commit
    • MDEV-15912: Remove traces of insert_undo · e46f76c9
      Marko Mäkelä authored
      Let us simply refuse an upgrade from earlier versions if the
      upgrade procedure was not followed. This simplifies the purge,
      commit, and rollback of transactions.
      
      Before upgrading to MariaDB 10.3 or later, a clean shutdown
      of the server (with innodb_fast_shutdown=1 or 0) is necessary,
      to ensure that any incomplete transactions are rolled back.
      The undo log format was changed in MDEV-12288. There is only
      one persistent undo log for each transaction.
      e46f76c9
  13. 09 Jun, 2021 1 commit
    • MDEV-25506 (3 of 3): Do not delete .ibd files before commit · 1bd681c8
      Marko Mäkelä authored
      This is a complete rewrite of DROP TABLE, also as part of other DDL,
      such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE.
      
      The background DROP TABLE queue hack is removed.
      If a transaction needs to drop and create a table by the same name
      (like TRUNCATE TABLE does), it must first rename the table to an
      internal #sql-ib name. No committed version of the data dictionary
      will include any #sql-ib tables, because whenever a transaction
      renames a table to a #sql-ib name, it will also drop that table.
      Either the rename will be rolled back, or the drop will be committed.
      
      Data files will be unlinked after the transaction has been committed
      and a FILE_RENAME record has been durably written. The file will
      actually be deleted when the detached file handle returned by
      fil_delete_tablespace() will be closed, after the latches have been
      released. It is possible that a purge of the delete of the SYS_INDEXES
      record for the clustered index will execute fil_delete_tablespace()
      concurrently with the DDL transaction. In that case, the thread that
      arrives later will wait for the other thread to finish.
      
      HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag.
      ha_innobase::truncate() now requires that all other references to
      the table be released in advance. This was implemented by Monty.
      
      ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected,
      we will "hijack" the current transaction, drop the table in
      the current transaction and commit the current transaction.
      This essentially fixes MDEV-21602. There is a FIXME comment about
      making the check less failure-prone.
      
      ha_innobase::truncate(), ha_innobase::delete_table():
      Implement a fast path for temporary tables. We will no longer allow
      temporary tables to use the adaptive hash index.
      
      dict_table_t::mdl_name: The original table name for the purpose of
      acquiring MDL in purge, to prevent a race condition between a
      DDL transaction that is dropping a table, and purge processing
      undo log records of DML that had executed before the DDL operation.
      For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the
      dict_table_t::mdl_name will differ from dict_table_t::name.
      
      dict_table_t::parse_name(): Use mdl_name instead of name.
      
      dict_table_rename_in_cache(): Update mdl_name.
      
      For the internal FTS_ tables of FULLTEXT INDEX, purge would
      acquire MDL on the FTS_ table name, but not on the main table,
      and therefore it would be able to run concurrently with a
      DDL transaction that is dropping the table. Previously, the
      DROP TABLE queue hack prevented a race between purge and DDL.
      For now, we introduce purge_sys.stop_FTS() to prevent purge from
      opening any table, while a DDL transaction that may drop FTS_
      tables is in progress. The function fts_lock_table(), which will
      be invoked before the dictionary is locked, will wait for
      purge to release any table handles.
      
      trx_t::drop_table_statistics(): Drop statistics for the table.
      This replaces dict_stats_drop_index(). We will drop or rename
      persistent statistics atomically as part of DDL transactions.
      On lock conflict for dropping statistics, we will fail instantly
      with DB_LOCK_WAIT_TIMEOUT, because we will be holding the
      exclusive data dictionary latch.
      
      trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory().
      Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT
      in addition to DB_DUPLICATE_KEY. The call to fts_commit() is
      entirely misplaced here and may obviously break the consistency
      of transactions that affect FULLTEXT INDEX. It needs to be fixed
      separately.
      
      dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175).
      The counter was a work-around for missing meta-data locking (MDL)
      on the SQL layer, and not really needed in MariaDB.
      
      ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28.
      
      HA_ERR_TABLE_IN_FK_CHECK: Remove.
      
      row_ins_check_foreign_constraints(): Do not acquire
      dict_sys.latch either. The SQL-layer MDL will protect us.
      
      This was reviewed by Thirunarayanan Balathandayuthapani
      and tested by Matthias Leich.
      1bd681c8
  14. 27 May, 2021 1 commit
    • MDEV-25791: Remove UNIV_INTERN · a7d68e7a
      Marko Mäkelä authored
      Back in 2006 or 2007, when MySQL AB and Innobase Oy existed as
      separately controlled entities (Innobase had been acquired by
      Oracle Corporation), MySQL 5.1 introduced a storage engine plugin
      interface and Oracle made use of it by distributing a separate
      InnoDB Plugin, which would contain some more bug fixes and
      improvements, compared to the version of InnoDB that was statically
      linked with the mysqld server that was distributed by MySQL AB.
      The built-in InnoDB would export global symbols, which would clash
      with the symbols of the dynamic InnoDB Plugin (which was supposed
      to override the built-in one when present).
      
      The solution to this problem was to declare all global symbols with
      UNIV_INTERN, so that they would get the GCC function attribute that
      specifies hidden visibility.
      
      Later, in MariaDB Server, something based on Percona XtraDB (a fork of
      MySQL InnoDB) became the statically linked implementation, and something
      closer to MySQL InnoDB was available as a dynamic plugin. Starting with
      version 10.2, MariaDB Server includes only one InnoDB implementation,
      and hence any reason to have the UNIV_INTERN definition was lost.
      
      btr_get_size_and_reserved(): Move to the same compilation unit with
      the only caller.
      
      innodb_set_buf_pool_size(): Remove. Modify innobase_buffer_pool_size
      directly.
      
      fil_crypt_calculate_checksum(): Merge to the only caller.
      
      ha_innobase::innobase_reset_autoinc(): Merge to the only caller.
      
      thd_query_start_micro(): Remove. Call thd_start_utime() directly.
      a7d68e7a
  15. 21 May, 2021 1 commit
    • MDEV-25743: Unnecessary copying of table names in InnoDB dictionary · 49e2c8f0
      Marko Mäkelä authored
      Many InnoDB data dictionary cache operations require that the
      table name be copied so that it will be NUL terminated.
      (For example, SYS_TABLES.NAME is not guaranteed to be NUL-terminated.)
      
      dict_table_t::is_garbage_name(): Check if a name belongs to
      the background drop table queue.
      
      dict_check_if_system_table_exists(): Remove.
      
      dict_sys_t::load_sys_tables(): Load the non-hard-coded system tables
      SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL on startup.
      
      dict_sys_t::create_or_check_sys_tables(): Replaces
      dict_create_or_check_foreign_constraint_tables() and
      dict_create_or_check_sys_virtual().
      
      dict_sys_t::load_table(): Replaces dict_table_get_low()
      and dict_load_table().
      
      dict_sys_t::find_table(): Renamed from get_table().
      
      dict_sys_t::sys_tables_exist(): Check whether all the non-hard-coded
      tables SYS_FOREIGN, SYS_FOREIGN_COLS, SYS_VIRTUAL exist.
      
      trx_t::has_stats_table_lock(): Moved to dict0stats.cc.
      
      Some error messages will now report table names in the internal
      databasename/tablename format, instead of `databasename`.`tablename`.
      49e2c8f0
  16. 19 May, 2021 1 commit
    • MDEV-25180 Atomic ALTER TABLE · 7762ee5d
      Monty authored
      MDEV-25604 Atomic DDL: Binlog event written upon recovery does not
                 have default database
      
      The purpose of this task is to ensure that ALTER TABLE is atomic even if
      the MariaDB server would be killed at any point of the alter table.
      This means that either the ALTER TABLE succeeds (including that triggers,
      the status tables and the binary log are updated) or things should be
      reverted to their original state.
      
      If the server crashes before the new version is fully up to date and
      committed, it will revert to the original table and remove all
      temporary files and tables.
      If the new version is committed, crash recovery will use the new version,
      and update triggers, the status tables and the binary log.
      The one exception is ALTER TABLE .. RENAME .. where no changes are done
      to table definition. This one will work as RENAME and roll back unless
      the whole statement completed, including updating the binary log (if
      enabled).
      
      Other changes:
      - Added handlerton->check_version() function to allow the ddl recovery
        code to check, in case of inplace alter table, if the table in the
        storage engine is of the new or old version.
      - Added handler->table_version() so that an engine can report the current
        version of the table. This should be changed each time the table
        definition changes.
      - Added ha_signal_ddl_recovery_done() and
        handlerton::signal_ddl_recovery_done() to inform all handlers when
        ddl recovery has been done. (Needed by InnoDB).
      - Added handlerton call inplace_alter_table_committed, to signal engine
        that ddl_log has been closed for the alter table query.
      - Added new handlerton flag
        HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we
        should call hton->notify_tabledef_changed() during
        mysql_inplace_alter_table. This was required as MyRocks and InnoDB
        needed the call at different times.
      - Added function server_uuid_value() to be able to generate a temporary
        xid when ddl recovery writes the query to the binary log. This is
        needed to be able to handle crashes during ddl log recovery.
      - Moved freeing of the frm definition to end of mysql_alter_table() to
        remove duplicate code and have a common exit strategy.
      
      -------
      InnoDB part of atomic ALTER TABLE
      (Implemented by Marko Mäkelä)
      innodb_check_version(): Compare the saved dict_table_t::def_trx_id
      to determine whether an ALTER TABLE operation was committed.
      
      We must correctly recover dict_table_t::def_trx_id for this to work.
      Before purge removes any trace of DB_TRX_ID from system tables, it
      will make an effort to load the user table into the cache, so that
      the dict_table_t::def_trx_id can be recovered.
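
The version check described above can be pictured as a single comparison. This is only a sketch under assumptions: `EngineTable`, `Recovery` and `decide()` are hypothetical names, not InnoDB code; only the idea of comparing a saved `def_trx_id` comes from the commit message.

```cpp
#include <cstdint>

// Hypothetical outcome of DDL recovery for one interrupted ALTER TABLE.
enum class Recovery { UseNewVersion, RevertToOriginal };

// Hypothetical view of what the engine has stored for the table.
struct EngineTable {
  std::uint64_t def_trx_id;  // id of the transaction that last changed the definition
};

// If the id saved for the ALTER matches what the engine holds, the engine
// side was committed and recovery can keep the new version.
inline Recovery decide(const EngineTable& table, std::uint64_t saved_trx_id) {
  return table.def_trx_id == saved_trx_id ? Recovery::UseNewVersion
                                          : Recovery::RevertToOriginal;
}
```

The comparison is only meaningful if the stored id survives until recovery runs, which is why the recovery of `def_trx_id` matters.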
      
      ha_innobase::table_version(): return garbage, or the trx_id that would
      be used for committing an ALTER TABLE operation.
      
      In InnoDB, table names starting with #sql-ib will remain special:
      they will be dropped on startup. This may be revisited later in
      MDEV-18518 when we implement proper undo logging and rollback
      for creating or dropping multiple tables in a transaction.
      
      Table names starting with #sql will retain some special meaning:
      dict_table_t::parse_name() will not consider such names for
      MDL acquisition, and dict_table_rename_in_cache() will treat such
      names specially when handling FOREIGN KEY constraints.
      
      Simplify InnoDB DROP INDEX.
      Prevent purge wakeup
      
      To ensure that dict_table_t::def_trx_id will be recovered correctly
      in case the server is killed before ddl_log_complete(), we will block
      the purge of any history in SYS_TABLES, SYS_INDEXES, SYS_COLUMNS
      between ha_innobase::commit_inplace_alter_table(commit=true)
      (purge_sys.stop_SYS()) and purge_sys.resume_SYS().
      The completion callback purge_sys.resume_SYS() must be between
      ddl_log_complete() and MDL release.
      
      --------
      
      MyRocks support for atomic ALTER TABLE
      (Implemented by Sergei Petrunia)
      
      Implement these SE API functions:
      - ha_rocksdb::table_version()
      - hton->check_version = rocksdb_check_version
      
      MyRocks data dictionary now stores table version for each table.
      (Absence of table version record is interpreted as table_version=0,
      which means no upgrade changes are needed.)
      
      - For inplace alter table of a partitioned table, call the underlying
        handlerton when checking if the table is ok. This assumes that the
        partition engine commits all changes at once.
      7762ee5d
  17. 18 May, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-25594: Improve debug checks · 7b51d11c
      Marko Mäkelä authored
      trx_t::will_lock: Changed the type to bool.
      
      trx_t::is_autocommit_non_locking(): Replaces
      trx_is_autocommit_non_locking().
      
      trx_is_ac_nl_ro(): Remove (replaced with equivalent assertion expressions).
      
      assert_trx_nonlocking_or_in_list(): Remove.
      Replaced with at least as strict checks in each place.
      
      check_trx_state(): Moved to a static function; partially replaced with
      individual debug assertions implementing equivalent or stricter checks.
      7b51d11c
  18. 17 May, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-25687: Remove trx_active_transactions · a6cff02a
      Marko Mäkelä authored
      MONITOR_TRX_ACTIVE: Remove. The count is not being updated consistently,
      and it would also include read-only transactions that are otherwise
      fully invisible to any other threads.
      
      If it later turns out that a reliable count of active transactions
      is needed, it can be exposed via a different interface.
      
      trx_commit_for_mysql(): If the transaction was not started, return
      immediately.
      a6cff02a
  19. 04 May, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-18518 Multi-table CREATE and DROP transactions for InnoDB · 52aac131
      Marko Mäkelä authored
      InnoDB used to support at most one CREATE TABLE or DROP TABLE
      per transaction. This caused complications for DDL operations on
      partitioned tables (where each partition is treated as a separate
      table by InnoDB) and FULLTEXT INDEX (where each index is maintained
      in a number of internal InnoDB tables).
      
      dict_drop_index_tree(): Extend the MDEV-24589 logic and treat
      the purge or rollback of SYS_INDEXES records of clustered indexes
      specially: by dropping the tablespace if it exists. This is the only
      form of recovery that we will need.
      
      trx_undo_ddl_type: Document the DDL undo log record types better.
      
      trx_t::dict_operation: Change the type to bool.
      
      trx_t::ddl: Remove.
      
      trx_t::table_id, trx_undo_t::table_id: Remove.
      
      dict_build_table_def_step(): Remove trx_t::table_id logging.
      
      dict_table_close_and_drop(), row_merge_drop_table(): Remove.
      
      row_merge_lock_table(): Merged to the only callers, which can
      call lock_table_for_trx() directly.
      
      fts_aux_table_t, fts_aux_id, fts_space_set_t: Remove.
      
      fts_drop_orphaned_tables(): Remove.
      
      row_merge_rename_index_to_drop(): Remove. Thanks to MDEV-24589,
      we can simply delete the to-be-dropped indexes from SYS_INDEXES,
      while still being able to roll back the operation.
      
      ha_innobase_inplace_ctx: Make a few data members const.
      Preallocate trx.
      
      prepare_inplace_alter_table_dict(): Simplify the logic. Let the
      normal rollback take care of some cleanup.
      
      row_undo_ins_remove_clust_rec(): Simplify the parsing of SYS_COLUMNS.
      
      trx_rollback_active(): Remove the special DROP TABLE logic.
      
      trx_undo_mem_create_at_db_start(), trx_undo_reuse_cached():
      Always write TRX_UNDO_TABLE_ID as 0.
      52aac131
  20. 27 Apr, 2021 1 commit
  21. 09 Apr, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-25297 Assertion: trx->roll_limit <= trx->undo_no in ROLLBACK TO SAVEPOINT · de119fa2
      Marko Mäkelä authored
      In commit 8ea923f5 (MDEV-24818)
      when we optimized multi-statement INSERT transactions into empty tables,
      we would roll back the entire transaction on any error. But, we would
      fail to invalidate any SAVEPOINT that had been requested in the past.
      
      trx_t::savepoints_discard(): Renamed from trx_roll_savepoints_free().
      
      row_mysql_handle_errors(): If we were in bulk insert, invoke
      trx_t::savepoints_discard(). In this way, a future attempt of
      ROLLBACK TO SAVEPOINT will return an error.
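
A toy model of the fix, assuming a transaction that keeps its savepoints in a simple list. The `Trx` struct and its members here are hypothetical illustrations; only the name `savepoints_discard()` and the bulk-insert condition are taken from the message above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical transaction model with named savepoints.
struct Trx {
  std::vector<std::string> savepoints;
  bool bulk_insert = false;

  void savepoint(const std::string& name) { savepoints.push_back(name); }

  // Mirrors the idea of trx_t::savepoints_discard(): forget every savepoint.
  void savepoints_discard() { savepoints.clear(); }

  // On an error in bulk-insert mode the whole transaction is rolled back,
  // so any earlier savepoint no longer refers to a valid state.
  void handle_error() {
    if (bulk_insert) {
      savepoints_discard();
      bulk_insert = false;
    }
  }

  // Returns false (an error) when the savepoint is unknown.
  bool rollback_to_savepoint(const std::string& name) {
    for (std::size_t i = savepoints.size(); i-- > 0;) {
      if (savepoints[i] == name) {
        savepoints.resize(i + 1);  // later savepoints are released
        return true;
      }
    }
    return false;
  }
};
```

After `handle_error()` in bulk-insert mode, a later `rollback_to_savepoint()` finds nothing and reports an error instead of tripping an assertion.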
      de119fa2
  22. 19 Mar, 2021 1 commit
  23. 16 Mar, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-24818: Optimize multi-statement INSERT into an empty table · 8ea923f5
      Marko Mäkelä authored
      If the user "opts in" (as in the parent
      commit 92b2a911),
      we can optimize multiple INSERT statements to use table-level locking
      and undo logging.
      
      There will be a change of behavior:
      
          CREATE TABLE t(a INT PRIMARY KEY) ENGINE=InnoDB;
          SET foreign_key_checks=0, unique_checks=0;
          BEGIN; INSERT INTO t SET a=1; INSERT INTO t SET a=1; COMMIT;
      
      will end up with an empty table, because in case of an error,
      the entire transaction will be rolled back, instead of rolling
      back the failing statement. Previously, the second INSERT statement
      would have been logged row by row, and only that second statement
      would have been rolled back, leaving the first INSERT intact.
      
      lock_table_x_unlock(), trx_mod_table_time_t::WAS_BULK: Remove.
      Because we cannot really support statement rollback in this
      optimized mode, we will not optimize the locking. The exclusive
      table lock will be held until the end of the transaction.
      8ea923f5
  24. 03 Mar, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-24811 Assertion find(table) failed with innodb_evict_tables_on_commit_debug · 5bd994b0
      Marko Mäkelä authored
      This is a backport of commit 18535a40
      from 10.6.
      
      lock_release(): Implement innodb_evict_tables_on_commit_debug.
      Before releasing any locks, collect the identifiers of tables to
      be evicted. After releasing all locks, look up the tables and
      evict them if it is safe to do so.
      
      trx_commit_in_memory(): Invoke trx_update_mod_tables_timestamp()
      before lock_release(), so that our locks will protect the tables
      from being evicted.
      5bd994b0
  25. 02 Mar, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-24811 Assertion find(table) failed with innodb_evict_tables_on_commit_debug · 18535a40
      Marko Mäkelä authored
      lock_release_try(): Implement innodb_evict_tables_on_commit_debug.
      Before releasing any locks, collect the identifiers of tables to
      be evicted. After releasing all locks, look up the tables and
      evict them if it is safe to do so.
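
The collect-then-evict order can be sketched as follows. All types and names below (`TableCache`, `Trx`, `release_and_evict()`) are hypothetical stand-ins, not the InnoDB data structures; the point is only that identifiers are gathered while the locks still protect the tables, and the tables are looked up again by identifier afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical table cache keyed by table id.
struct TableCache {
  std::set<std::uint64_t> cached;
  std::set<std::uint64_t> pinned;  // tables still in use by someone else

  // Evict only if the table is still cached and nobody else is using it.
  bool evict_if_safe(std::uint64_t id) {
    if (!cached.count(id) || pinned.count(id)) return false;
    cached.erase(id);
    return true;
  }
};

// Hypothetical transaction that only remembers which tables it has locked.
struct Trx {
  std::vector<std::uint64_t> locked_tables;
};

// Two-phase release: remember the ids while the locks still protect the
// tables, drop the locks, then look each table up again by id.
inline std::size_t release_and_evict(Trx& trx, TableCache& cache) {
  const std::vector<std::uint64_t> to_evict = trx.locked_tables;  // phase 1
  trx.locked_tables.clear();                                      // release
  std::size_t evicted = 0;
  for (std::uint64_t id : to_evict)                               // phase 2
    evicted += cache.evict_if_safe(id) ? 1 : 0;
  return evicted;
}
```

Keeping identifiers rather than pointers means a table that disappeared in the meantime is simply not found, instead of being dereferenced after its lock is gone.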
      
      trx_t::commit_tables(): Remove the eviction logic.
      
      trx_t::commit_in_memory(): Invoke release_locks() only after
      commit_tables().
      18535a40
  26. 24 Feb, 2021 1 commit
  27. 15 Feb, 2021 1 commit
  28. 14 Feb, 2021 1 commit
  29. 12 Feb, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-20612: Partition lock_sys.latch · b08448de
      Marko Mäkelä authored
      We replace the old lock_sys.mutex (which was renamed to lock_sys.latch)
      with a combination of a global lock_sys.latch and table or page hash lock
      mutexes.
      
      The global lock_sys.latch can be acquired in exclusive mode, or
      it can be acquired in shared mode and another mutex will be acquired
      to protect the locks for a particular page or a table.
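
A minimal sketch of this two-level scheme, using standard C++17 primitives rather than the InnoDB latch types. `ShardedLockTable`, its shard count and the per-shard counters are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <shared_mutex>

// Hypothetical miniature of a partitioned lock table: one global
// reader-writer latch plus one mutex per hash shard.
class ShardedLockTable {
public:
  static constexpr std::size_t kShards = 16;

  // Common path: hold the global latch in shared mode and lock only
  // the shard that covers the given page.
  template <typename F>
  void with_page(std::size_t page_id, F&& f) {
    std::shared_lock<std::shared_mutex> global(latch_);
    std::lock_guard<std::mutex> shard(shards_[page_id % kShards]);
    f(counts_[page_id % kShards]);
  }

  // Rare path: exclusive global latch. No shard mutex is needed, because
  // every shard user also holds the global latch in shared mode.
  template <typename F>
  void with_all(F&& f) {
    std::unique_lock<std::shared_mutex> global(latch_);
    f(counts_);
  }

private:
  std::shared_mutex latch_;
  std::array<std::mutex, kShards> shards_;
  std::array<int, kShards> counts_{};
};
```

Threads working on different shards contend only on the shared acquisition of the global latch, while an operation that needs the whole table takes it exclusively.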
      
      This is inspired by
      mysql/mysql-server@1d259b87a63defa814e19a7534380cb43ee23c48
      but the optimization of lock_release() will be done in the next commit.
      Also, we will interleave mutexes with the hash table elements, similar
      to how buf_pool.page_hash was optimized
      in commit 5155a300 (MDEV-22871).
      
      dict_table_t::autoinc_trx: Use Atomic_relaxed.
      
      dict_table_t::autoinc_mutex: Use srw_mutex in order to reduce the
      memory footprint. On 64-bit Linux or OpenBSD, both this and the new
      dict_table_t::lock_mutex should be 32 bits and be stored in the same
      64-bit word. On Microsoft Windows, the underlying SRWLOCK is 32 or 64
      bits, and on other systems, sizeof(pthread_mutex_t) can be much larger.
      
      ib_lock_t::trx_locks, trx_lock_t::trx_locks: Document the new rules.
      Writers must assert lock_sys.is_writer() || trx->mutex_is_owner().
      
      LockGuard: A RAII wrapper for acquiring a page hash table lock.
      
      LockGGuard: Like LockGuard, but when Galera Write-Set Replication
      is enabled, we must acquire all shards, for updating arbitrary trx_locks.
      
      LockMultiGuard: A RAII wrapper for acquiring two page hash table locks.
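
A guard over two shards has to cope with both pages mapping to the same shard, and it needs a fixed acquisition order so that two threads cannot deadlock on each other. The sketch below illustrates that idea with `std::mutex`; `TwoShardGuard` is a hypothetical class and is not the actual LockMultiGuard implementation.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>

// Hypothetical RAII guard that locks the shards of two pages at once.
class TwoShardGuard {
public:
  TwoShardGuard(std::mutex* shards, std::size_t n_shards,
                std::size_t page1, std::size_t page2)
      : first_(&shards[page1 % n_shards]),
        second_(&shards[page2 % n_shards]) {
    // Always lock the lower address first so that every thread agrees
    // on the order.
    if (second_ < first_) std::swap(first_, second_);
    first_->lock();
    // Both pages may share a shard; lock it only once in that case.
    if (second_ != first_) second_->lock();
  }

  ~TwoShardGuard() {
    if (second_ != first_) second_->unlock();
    first_->unlock();
  }

  TwoShardGuard(const TwoShardGuard&) = delete;
  TwoShardGuard& operator=(const TwoShardGuard&) = delete;

  bool same_shard() const { return first_ == second_; }

private:
  std::mutex* first_;
  std::mutex* second_;
};
```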
      
      lock_rec_create_wsrep(), lock_table_create_wsrep(): Special
      Galera conflict resolution in non-inlined functions in order
      to keep the common code paths shorter.
      
      lock_sys_t::prdt_page_free_from_discard(): Refactored from
      lock_prdt_page_free_from_discard() and
      lock_rec_free_all_from_discard_page().
      
      trx_t::commit_tables(): Replaces trx_update_mod_tables_timestamp().
      
      lock_release(): Let trx_t::commit_tables() invalidate the query cache
      for those tables that were actually modified by the transaction.
      Merge lock_check_dict_lock() to lock_release().
      
      We must never release lock_sys.latch while holding any
      lock_sys_t::hash_latch. Violating this rule could lead to
      memory corruption if the buffer pool is resized between
      the time lock_sys.latch is released and the hash_latch is released.
      b08448de
  30. 11 Feb, 2021 2 commits
    • Marko Mäkelä's avatar
      MDEV-20612: Replace lock_sys.mutex with lock_sys.latch · b01d8e1a
      Marko Mäkelä authored
      For now, we will acquire the lock_sys.latch only in exclusive mode,
      that is, use it as a mutex.
      
      This is preparation for the next commit where we will introduce
      a less intrusive alternative, combining a shared lock_sys.latch
      with dict_table_t::lock_mutex or a mutex embedded in
      lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash.
      b01d8e1a
    • Marko Mäkelä's avatar
      MDEV-20612 preparation: LockMutexGuard · 90346492
      Marko Mäkelä authored
      Let us use the RAII wrapper LockMutexGuard for most operations where
      lock_sys.mutex is acquired.
      90346492
  31. 07 Feb, 2021 1 commit
  32. 05 Feb, 2021 3 commits
    • Marko Mäkelä's avatar
      MDEV-21452 fixup: Introduce trx_t::mutex_is_owner() · 487fbc2e
      Marko Mäkelä authored
      When we replaced trx_t::mutex with srw_mutex
      in commit 38fd7b7d
      we lost the SAFE_MUTEX instrumentation.
      Let us introduce a replacement and restore the assertions.
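
One common way to get such an ownership check is to record the owning thread next to the mutex. The sketch below assumes that approach; `OwnedMutex` is a hypothetical wrapper over `std::mutex`, not the srw_mutex-based code in the server.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper that records the owning thread so that code can
// assert "I hold this mutex" even when the mutex type itself has no
// ownership tracking.
class OwnedMutex {
public:
  void lock() {
    m_.lock();
    assert(owner_.load(std::memory_order_relaxed) == std::thread::id());
    owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
  }

  void unlock() {
    assert(is_owner());
    owner_.store(std::thread::id(), std::memory_order_relaxed);
    m_.unlock();
  }

  // True only when the calling thread currently holds the mutex.
  bool is_owner() const {
    return owner_.load(std::memory_order_relaxed) == std::this_thread::get_id();
  }

private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};
```

In a real build the owner field would typically exist only in debug builds, so that release builds keep the smaller mutex.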
      487fbc2e
    • Marko Mäkelä's avatar
      MDEV-24789: Reduce sizeof(trx_lock_t) · 3e45f8e3
      Marko Mäkelä authored
      trx_lock_t::cond: Use pthread_cond_t directly, because no instrumentation
      will ever be used. This saves sizeof(void*) and removes some duplicated
      inline code.
      
      trx_lock_t::was_chosen_as_wsrep_victim: Fold into
      trx_lock_t::was_chosen_as_deadlock_victim.
      
      trx_lock_t::cancel, trx_lock_t::rec_cached, trx_lock_t::table_cached:
      Use only one byte of storage, reducing memory alignment waste.
      
      On AMD64 GNU/Linux, MDEV-24671 caused a sizeof(trx_lock_t) increase
      of 48 bytes (plus the PLUGIN_PERFSCHEMA overhead of trx_lock_t::cond).
      These changes should save 32 bytes.
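
The effect of narrowing and grouping small members can be seen in a toy layout. The structs below are hypothetical: `cancel`, `rec_cached` and `table_cached` borrow names from the message above, while `wait_lock` and `wait_start` are placeholders, and the real trx_lock_t has many more members.

```cpp
#include <cstdint>

// Small members scattered between pointer-sized ones: each of them
// drags alignment padding along.
struct Scattered {
  bool cancel;
  void* wait_lock;
  std::uint8_t rec_cached;
  std::uint64_t wait_start;
  std::uint8_t table_cached;
};

// The same members with the one-byte fields placed together at the end.
struct Grouped {
  void* wait_lock;
  std::uint64_t wait_start;
  bool cancel;
  std::uint8_t rec_cached;
  std::uint8_t table_cached;
};

static_assert(sizeof(Grouped) <= sizeof(Scattered),
              "grouping the one-byte members should not cost extra padding");
```

The exact sizes depend on the platform ABI, so the check only states that the grouped layout is no larger.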
      3e45f8e3
    • Marko Mäkelä's avatar
      Cleanup: Reduce some lock_sys.mutex contention · 465bdabb
      Marko Mäkelä authored
      lock_table(): Remove the constant parameter flags=0.
      
      lock_table_resurrect(): Merge lock_table_ix_resurrect() and
      lock_table_x_resurrect().
      
      lock_rec_lock(): Only acquire LockMutexGuard if lock_table_has()
      does not hold.
      465bdabb
  33. 27 Jan, 2021 2 commits
    • Marko Mäkelä's avatar
      MDEV-24671: Replace lock_wait_timeout_task with mysql_cond_timedwait() · e71e6133
      Marko Mäkelä authored
      lock_wait(): Replaces lock_wait_suspend_thread(). Wait for the lock to
      be granted or the transaction to be killed using mysql_cond_timedwait()
      or mysql_cond_wait().
      
      lock_wait_end(): Replaces que_thr_end_lock_wait() and
      lock_wait_release_thread_if_suspended().
      
      lock_wait_timeout_task: Remove. The operating system kernel will
      resume the mysql_cond_timedwait() in lock_wait(). An added benefit
      is that innodb_lock_wait_timeout no longer has a 'jitter' of 1 second,
      which was caused by this wake-up task waking up only once per second,
      and then waking up any threads for which the timeout (which was only
      measured in seconds) was exceeded.
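
The difference between a periodic wake-up task and a timed wait on a condition variable can be sketched with the standard library. This assumes `std::condition_variable` in place of mysql_cond_timedwait(), and `LockWait` is a hypothetical struct, not the server's lock_wait().

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a lock wait: the waiter blocks on a condition
// variable with a deadline instead of being polled by a periodic task.
struct LockWait {
  std::mutex wait_mutex;
  std::condition_variable cond;
  bool granted = false;

  // Returns true if the lock was granted, false on timeout.
  bool wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(wait_mutex);
    return cond.wait_for(lk, timeout, [this] { return granted; });
  }

  // Called by the thread that releases the conflicting lock.
  void grant() {
    {
      std::lock_guard<std::mutex> lk(wait_mutex);
      granted = true;
    }
    cond.notify_one();
  }
};
```

Because the deadline is handed to the wait itself, the timeout no longer depends on when a once-per-second task happens to run.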
      
      innobase_kill_query(): Set trx->error_state=DB_INTERRUPTED,
      so that a call to trx_is_interrupted(trx) in lock_wait() can be avoided.
      
      We will protect things more consistently with lock_sys.wait_mutex,
      which will be moved below lock_sys.mutex in the latching order.
      
      trx_lock_t::cond: Condition variable for !wait_lock, used with
      lock_sys.wait_mutex.
      
      srv_slot_t: Remove. Replaced by trx_lock_t::cond.
      
      lock_grant_after_reset(): Merged into lock_grant().
      
      lock_rec_get_index_name(): Remove.
      
      lock_sys_t: Introduce wait_pending, wait_count, wait_time, wait_time_max
      that are protected by wait_mutex.
      
      trx_lock_t::que_state: Remove.
      
      que_thr_state_t: Remove QUE_THR_COMMAND_WAIT, QUE_THR_LOCK_WAIT.
      
      que_thr_t: Remove is_active, start_running(), stop_no_error().
      
      que_fork_t::n_active_thrs, trx_lock_t::n_active_thrs: Remove.
      e71e6133
    • Marko Mäkelä's avatar
      Cleanups: · 7f1ab8f7
      Marko Mäkelä authored
      que_thr_t::fork_type: Remove.
      
      QUE_THR_SUSPENDED, TRX_QUE_COMMITTING: Remove.
      
      Clean up lock_cancel_waiting_and_release().
      7f1ab8f7