  1. 22 Apr, 2022 1 commit
    • MDEV-27094 Debug builds include useless InnoDB "disabled" options · c009ce7d
      Marko Mäkelä authored
      This is a backport of commit 4489a89c
      in order to remove the test innodb.redo_log_during_checkpoint
      that would cause trouble in the DBUG subsystem invoked by
      safe_mutex_lock() via log_checkpoint(). Before
      commit 7cffb5f6
      these mutexes were of different type.
      
      The following options were introduced in
      commit 2e814d47 (mariadb-10.2.2)
      and have little use:
      
      innodb_disable_resize_buffer_pool_debug had no effect even in
      MariaDB 10.2.2 or MySQL 5.7.9. It was introduced in
      mysql/mysql-server@5c4094cf4971eebab89da4ee4ae92c71f69cd524
      to work around a problem that was fixed in
      mysql/mysql-server@2957ae4f990bf3aed25822b0ce15d3ccad0b54b6
      (but the parameter was not removed).
      
      innodb_page_cleaner_disabled_debug and innodb_master_thread_disabled_debug
      are only used by the test innodb.redo_log_during_checkpoint
      that will be removed as part of this commit.
      
      innodb_dict_stats_disabled_debug is only used by that test,
      and it is redundant because one could simply use
      innodb_stats_persistent=OFF or the STATS_PERSISTENT=0 attribute
      of the table in the test to achieve the same effect.
  2. 19 Apr, 2022 1 commit
  3. 06 Apr, 2022 1 commit
    • MDEV-25975 innodb_disallow_writes causes shutdown to hang · e9735a81
      Marko Mäkelä authored
      We will remove the parameter innodb_disallow_writes because it is badly
      designed and implemented. The parameter was never allowed at startup.
      It was only internally used by Galera snapshot transfer.
      If a user executed
      SET GLOBAL innodb_disallow_writes=ON;
      the server could hang even on subsequent read operations.
      
      During Galera snapshot transfer, we will block writes
      to implement an rsync friendly snapshot, as follows:
      
      sst_flush_tables() will acquire a global lock by executing
      FLUSH TABLES WITH READ LOCK, which will block any writes
      at the high level.
      
      sst_disable_innodb_writes(), invoked via ha_disable_internal_writes(true),
      will suspend or disable InnoDB background tasks or threads that could
      initiate writes. As part of this, log_make_checkpoint() will be invoked
      to ensure that anything in the InnoDB buf_pool.flush_list will be written
      to the data files. This has the nice side effect that the Galera joiner
      will avoid crash recovery.
      
      The changes to sql/wsrep.cc and to the tests are based on a prototype
      that was developed by Jan Lindström.
      
      Reviewed by: Jan Lindström
  4. 19 Jan, 2022 1 commit
    • MDEV-27467: innodb to enforce the minimum innodb_buffer_pool_size in SET GLOBAL · 410c4ede
      Daniel Black authored
      .. to be the same as startup.
      
      In resolving MDEV-27461, BUF_LRU_MIN_LEN (256) is the minimum number of
      pages for the innodb buffer pool size. Obviously we need more than just
      flushing pages. Taking the 16k page size and its default minimum, an
      extra 25% is needed on top of the flushing pages to make a workable buffer
      pool.
      
      The minimum innodb_buffer_pool_chunk_size (1M) restricts the minimum;
      otherwise we'd have a pool made up of different chunk sizes.
      
      The resulting minimum innodb buffer pool sizes are:
      
      Page size   Previous minimum (startup)   Minimum with change
             4k                           5M                    2M
             8k                           5M                    3M
            16k                           5M                    5M
            32k                          24M                   10M
            64k                          24M                   20M
      
      With this patch, SET GLOBAL innodb_buffer_pool_size minimums are
      enforced.
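
      The "with change" column is consistent with simple arithmetic:
      BUF_LRU_MIN_LEN (256) pages of the given page size, plus the extra 25%,
      rounded up to a whole number of 1M chunks. The following is a minimal
      sketch of that calculation. The helper name min_buffer_pool_bytes() and
      the rounding step are illustrative assumptions, not the code in ha_innodb.

```cpp
#include <cstdint>

// Values taken from the commit message.
const std::uint64_t BUF_LRU_MIN_LEN = 256;       // minimum number of pages
const std::uint64_t MIN_CHUNK_SIZE = 1ULL << 20; // minimum chunk size (1M)

// Hypothetical helper (not an InnoDB function): smallest workable
// buffer pool size in bytes for a given page size in bytes.
inline std::uint64_t min_buffer_pool_bytes(std::uint64_t page_size)
{
  // Pages needed for flushing, plus an extra 25%.
  const std::uint64_t raw = BUF_LRU_MIN_LEN * page_size * 5 / 4;
  // Assumption: round up to a whole number of minimum-size chunks.
  return (raw + MIN_CHUNK_SIZE - 1) / MIN_CHUNK_SIZE * MIN_CHUNK_SIZE;
}
```

      For example, with 16k pages this gives 256 * 16k = 4M, plus 25% = 5M,
      which is already a multiple of 1M.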
      
      The evident minimum system variable size for innodb_buffer_pool_size
      is 2M; however, this is only settable when using the 4k page size. As
      the order of the page_size and buffer_pool_size aren't fixed, we can't
      hide this change.
      
      Subsequent changes:
      * innodb_buffer_pool_resize_with_chunks.test - raised the sizes used for the pool resize due to the new
        minimums. The chunk size also needed an increase, as the test was for
        pool_size < chunk_size to generate a warning.
      * Removed srv_buf_pool_min_size and replaced its use with MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
      * Removed srv_buf_pool_def_size and replaced it with a constant definition in
        MYSQL_SYSVAR_LONGLONG(buffer_pool_size)
      * Reordered ha_innodb to allow for direct use of MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
      * Moved buf_pool_size_align into ha_innodb to allow access to MYSQL_SYSVAR_NAME(buffer_pool_size).min_val
      * loose-innodb_disable_resize_buffer_pool_debug is needed in the
        innodb.restart.opt test so that under debug mode, resizing of the
        innodb buffer pool can occur.
  5. 04 Jan, 2022 1 commit
    • MDEV-27416 InnoDB hang in buf_flush_wait_flushed(), on log checkpoint · 4c3ad244
      Marko Mäkelä authored
      InnoDB could sometimes hang when triggering a log checkpoint. This is
      due to commit 7b1252c0 (MDEV-24278),
      which introduced an untimed wait to buf_flush_page_cleaner().
      
      The hang was noticed by occasional failures of IMPORT TABLESPACE tests,
      such as innodb.innodb-wl5522, which would (unnecessarily) invoke
      log_make_checkpoint() from row_import_cleanup().
      
      The reason for the hang was that buf_flush_page_cleaner() would enter
      untimed sleep despite buf_flush_sync_lsn being set. The exact failure
      scenario is unclear, because buf_flush_sync_lsn should actually be
      protected by buf_pool.flush_list_mutex. We prevent the hang by
      invoking buf_pool.page_cleaner_set_idle(false) whenever we are
      setting buf_flush_sync_lsn and signaling buf_pool.do_flush_list.
      
      The bulk of these changes was originally developed as a preparation
      for MDEV-26827, to invoke buf_flush_list() from fewer threads,
      and tested on 10.6 by Matthias Leich.
      
      This fix was tested by running 100 repetitions of 100 concurrent instances
      of the test innodb.innodb-wl5522 on a RelWithDebInfo build, using ext4fs
      and innodb_flush_method=O_DIRECT on a SATA SSD with 4096-byte block size.
      During the test, the call to log_make_checkpoint() in row_import_cleanup()
      was present.
      
      buf_flush_list(): Make static.
      
      buf_flush_wait(): Wait for buf_pool.get_oldest_modification()
      to reach a target, by work done in the buf_flush_page_cleaner.
      If buf_flush_sync_lsn is going to be set, we will invoke
      buf_pool.page_cleaner_set_idle(false).
      
      buf_flush_ahead(): If buf_flush_sync_lsn or buf_flush_async_lsn
      is going to be set and the page cleaner woken up, we will invoke
      buf_pool.page_cleaner_set_idle(false).
      
      buf_flush_wait_flushed(): Invoke buf_flush_wait().
      
      buf_flush_sync(): Invoke recv_sys.apply() at the start in case
      crash recovery is active. Invoke buf_flush_wait().
      
      buf_flush_sync_batch(): A lower-level variant of buf_flush_sync()
      that is only called by recv_sys_t::apply().
      
      buf_flush_sync_for_checkpoint(): Do not trigger log apply
      or checkpoint during recovery.
      
      buf_dblwr_t::create(): Only initiate a buffer pool flush, not
      a checkpoint.
      
      row_import_cleanup(): Do not unnecessarily invoke log_make_checkpoint().
      Invoking buf_flush_list_space() before starting to generate redo log
      for the imported tablespace should suffice.
      
      srv_prepare_to_delete_redo_log_file():
      Set recv_sys.recovery_on in order to prevent
      buf_flush_sync_for_checkpoint() from initiating a checkpoint
      while the log is inaccessible. Remove a wait loop that is already
      part of buf_flush_sync().
      Do not invoke fil_names_clear() if the log is being upgraded,
      because the FILE_MODIFY record is specific to the latest format.
      
      create_log_file(): Clear recv_sys.recovery_on only after calling
      log_make_checkpoint(), to prevent buf_flush_page_cleaner from
      invoking a checkpoint.
      
      innodb_shutdown(): Simplify the logic in mariadb-backup --prepare.
      
      os_aio_wait_until_no_pending_writes(): Update the function comment.
      Apart from row_quiesce_table_start() during FLUSH TABLES...FOR EXPORT,
      this is being called by buf_flush_list_space(), which is invoked
      by ALTER TABLE...IMPORT TABLESPACE as well as some encryption operations.
  6. 22 Oct, 2021 1 commit
    • MDEV-26769 InnoDB does not support hardware lock elision · 1f022809
      Marko Mäkelä authored
      This implements memory transaction support for:
      
      * Intel Restricted Transactional Memory (RTM), also known as TSX-NI
      (Transactional Synchronization Extensions New Instructions)
      * POWER v2.09 Hardware Transactional Memory (HTM) on GNU/Linux
      
      transactional_lock_guard, transactional_shared_lock_guard:
      RAII lock guards that try to elide the lock acquisition
      when transactional memory is available.
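
      The idea behind such a guard can be sketched as follows. This is a
      minimal illustration under stated assumptions, not the InnoDB
      implementation: simple_latch, no_tx and elided_lock_guard are
      hypothetical names, and the Tx policy stands in for the hardware
      begin/end/abort primitives, which are not invoked here.

```cpp
#include <atomic>

// Hypothetical test-and-set latch that can report whether it is held.
class simple_latch
{
  std::atomic<bool> locked{false};
public:
  void lock() { while (locked.exchange(true, std::memory_order_acquire)) {} }
  void unlock() { locked.store(false, std::memory_order_release); }
  bool is_locked() const { return locked.load(std::memory_order_relaxed); }
};

// Policy for platforms without transactional memory:
// never start a memory transaction.
struct no_tx
{
  static bool begin() { return false; }
  static void end() {}
  static void abort() {}
};

// RAII guard: run the critical section as a memory transaction when
// possible, otherwise fall back to really acquiring the latch.
template<class Latch, class Tx>
class elided_lock_guard
{
  Latch &latch;
  bool elided = false;
public:
  explicit elided_lock_guard(Latch &l) : latch(l)
  {
    if (Tx::begin())
    {
      // With real hardware support, reading the latch inside the
      // transaction means that a conflicting lock() by another thread
      // would abort the transaction.
      if (!latch.is_locked()) { elided = true; return; }
      Tx::abort();
    }
    latch.lock();
  }
  ~elided_lock_guard() { if (elided) Tx::end(); else latch.unlock(); }
  elided_lock_guard(const elided_lock_guard&) = delete;
  elided_lock_guard &operator=(const elided_lock_guard&) = delete;
  bool was_elided() const { return elided; }
};
```

      With the no_tx policy the guard behaves like an ordinary lock guard.
      A policy backed by real hardware transactions would let uncontended
      critical sections run without ever writing to the latch.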
      
      buf_pool.page_hash: Try to elide latches whenever feasible.
      Related to the InnoDB change buffer and ROW_FORMAT=COMPRESSED
      tables, this is not always possible.
      In buf_page_get_low(), memory transactions only work reasonably
      well for validating a guessed block address.
      
      TMLockGuard, TMLockTrxGuard, TMLockMutexGuard: RAII lock guards
      that try to elide lock_sys.latch and related latches.
  7. 13 Oct, 2021 1 commit
  8. 24 Sep, 2021 1 commit
    • MDEV-26672 innodb_undo_log_truncate may reset transaction ID sequence · 4bfdba2e
      Marko Mäkelä authored
      trx_rseg_header_create(): Add a parameter for the value that is
      to be written to TRX_RSEG_MAX_TRX_ID. If we omit this write, then
      the updated test innodb.undo_truncate will fail for the 4k, 8k, 16k
      page sizes. This was broken ever since
      commit 947efe17 (MDEV-15158)
      removed the writes of transaction identifiers to the TRX_SYS page.
      
      srv_do_purge(): Truncate undo tablespaces also during slow shutdown
      (innodb_fast_shutdown=0).
      
      Thanks to Krunal Bauskar for noticing this problem.
  9. 17 Sep, 2021 1 commit
  10. 14 Sep, 2021 2 commits
    • MDEV-26356 Adaptive purge scheduling based on redo log fill factor · ea52a3eb
      Marko Mäkelä authored
      This should be equivalent to pull request #1889 by Krunal Bauskar.
      
      The existing logic in purge_coordinator_state::do_purge()
      activates a number of the configured innodb_purge_threads
      based on the history list length. Activating more purge worker
      tasks should shrink the history list faster. But, more purge
      workers will also generate more redo log, which may slow down
      writes by user connections.
      
      row_purge_parse_undo_rec(): Revert the work-around that was added in
      commit 46904424.
      
      purge_coordinator_state: Keep track of the redo log fill factor
      (what percentage of innodb_log_file_size is being occupied by
      log records that were generated since the latest checkpoint).
      If the redo log is getting full, log checkpoints will be triggered
      more frequently, and user threads may end up waiting in
      log_free_check(). We try to reduce purge-induced jitter in overall
      throughput by throttling down the active number of purge tasks as
      the log checkpoint age is approaching the log size (in other words,
      the redo log fill factor is approaching 100%).
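
      The throttling idea can be sketched as below. The fill factor follows
      the definition given above; the 50% starting point and the linear
      scale-down are purely illustrative assumptions, and both function names
      are hypothetical. The actual policy in purge_coordinator_state may
      differ.

```cpp
#include <algorithm>
#include <cstdint>

// Percentage of the redo log occupied by records written since the
// latest checkpoint. Assumes log_size > 0 and lsn >= checkpoint_lsn.
inline unsigned redo_fill_percent(std::uint64_t lsn,
                                  std::uint64_t checkpoint_lsn,
                                  std::uint64_t log_size)
{
  const std::uint64_t age = lsn - checkpoint_lsn;
  return static_cast<unsigned>(
    std::min<std::uint64_t>(100, age * 100 / log_size));
}

// Hypothetical policy: keep the wanted number of purge tasks while the
// log is at most half full, then scale down linearly to a single task
// as the fill factor approaches 100%.
inline unsigned throttled_purge_tasks(unsigned wanted, unsigned fill_percent)
{
  const unsigned start = 50; // assumption: throttling starts at 50%
  if (wanted == 0 || fill_percent <= start)
    return wanted;
  if (fill_percent >= 100)
    return 1;
  return 1 + (wanted - 1) * (100 - fill_percent) / (100 - start);
}
```

      For example, with 4 wanted tasks this sketch keeps all 4 running at a
      25% fill factor, runs 2 at 75%, and runs a single task at 100%.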
    • MDEV-26356 preparation: Refactor purge_state · 717a3215
      Marko Mäkelä authored
      purge_coordinator_timer_callback(): Remove. We will have
      purge_coordinator_timer invoke purge_coordinator_callback()
      directly.
      
      srv_master_callback(): Invoke srv_wake_purge_thread_if_not_active()
      instead of purge_coordinator_timer_callback(). That is, we will
      trigger purge_coordinator_callback() once per second if there is
      any work to be done.
      
      purge_state::do_purge(): Replaces srv_do_purge(),
      purge_coordinator_callback_low(), and
      purge_coordinator_timer_callback(). The static variables
      inside srv_do_purge() were moved to purge_state data members.
  11. 31 Aug, 2021 2 commits
    • MDEV-24258 Merge dict_sys.mutex into dict_sys.latch · 82b7c561
      Marko Mäkelä authored
      In the parent commit, dict_sys.latch could theoretically have been
      replaced with a mutex. But, we can do better and merge dict_sys.mutex
      into dict_sys.latch. Generally, every occurrence of dict_sys.mutex_lock()
      will be replaced with dict_sys.lock().
      
      The PERFORMANCE_SCHEMA instrumentation for dict_sys_mutex
      will be removed along with dict_sys.mutex. The dict_sys.latch
      will remain instrumented as dict_operation_lock.
      
      Some use of dict_sys.lock() will be replaced with dict_sys.freeze(),
      which we will reintroduce for the new shared mode. Most notably,
      concurrent table lookups are possible as long as the tables are present
      in the dict_sys cache. In particular, this will allow more concurrency
      among InnoDB purge workers.
      
      Because dict_sys.mutex will no longer 'throttle' the threads that purge
      InnoDB transaction history, a performance degradation may be observed
      unless innodb_purge_threads=1.
      
      The table cache eviction policy will become FIFO-like,
      similar to what happened to fil_system.LRU
      in commit 45ed9dd9.
      The name of the list dict_sys.table_LRU will become somewhat misleading;
      that list contains tables that may be evicted, even though the
      eviction policy no longer is least-recently-used but first-in-first-out.
      (Note: Tables can never be evicted as long as locks exist on them or
      the tables are in use by some thread.)
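
      The difference between the two policies can be illustrated with a
      small model. This is a sketch only: fifo_table_cache and its members
      are hypothetical names, and the "in use" set is a stand-in for the
      locks and references that prevent eviction.

```cpp
#include <list>
#include <string>
#include <unordered_set>

// Hypothetical model of a FIFO-like table cache.
class fifo_table_cache
{
  std::list<std::string> evictable;        // oldest insertion at the front
  std::unordered_set<std::string> in_use;  // tables that must not be evicted
public:
  void add(const std::string &name) { evictable.push_back(name); }

  // Opening a table does NOT move it within the list. An LRU policy
  // would relocate it here, which requires exclusive access to the list.
  void open(const std::string &name) { in_use.insert(name); }
  void close(const std::string &name) { in_use.erase(name); }

  // Evict the oldest-inserted table that is not in use.
  bool evict_one(std::string &victim)
  {
    for (auto it = evictable.begin(); it != evictable.end(); ++it)
      if (!in_use.count(*it))
      {
        victim = *it;
        evictable.erase(it);
        return true;
      }
    return false;
  }
};
```

      If t1, t2 and t3 are added in that order and t1 is then opened and
      closed, this model still evicts t1 first, whereas a
      least-recently-used policy would have chosen t2.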
      
      As demonstrated by the test perfschema.sxlock_func, there
      will be less contention on dict_sys.latch, because some previous
      use of exclusive latches will be replaced with shared latches.
      
      fts_parse_sql_no_dict_lock(): Replaced with pars_sql().
      
      fts_get_table_name_prefix(): Merged to fts_optimize_create().
      
      dict_stats_update_transient_for_index(): Deduplicated some code.
      
      ha_innobase::info_low(), dict_stats_stop_bg(): Use a combination
      of dict_sys.latch and table->stats_mutex_lock() to cover the
      changes of BG_STAT_SHOULD_QUIT, because the flag is being read
      in dict_stats_update_persistent() while not holding dict_sys.latch.
      
      row_discard_tablespace_for_mysql(): Protect stats_bg_flag by
      exclusive dict_sys.latch, like most other code does.
      
      row_quiesce_table_has_fts_index(): Remove unnecessary mutex
      acquisition. FLUSH TABLES...FOR EXPORT is protected by MDL.
      
      row_import::set_root_by_heuristic(): Remove unnecessary mutex
      acquisition. ALTER TABLE...IMPORT TABLESPACE is protected by MDL.
      
      row_ins_sec_index_entry_low(): Replace a call
      to dict_set_corrupted_index_cache_only(). Reads of index->type
      were not really protected by dict_sys.mutex, and writes
      (flagging an index corrupted) should be extremely rare.
      
      dict_stats_process_entry_from_defrag_pool(): Only freeze the dictionary,
      do not lock it exclusively.
      
      dict_stats_wait_bg_to_stop_using_table(), DICT_BG_YIELD: Remove trx.
      We can simply invoke dict_sys.unlock() and dict_sys.lock() directly.
      
      dict_acquire_mdl_shared()<trylock=false>: Assert that dict_sys.latch is
      only held in shared mode, not exclusive mode. Only acquire it in
      exclusive mode if the table needs to be loaded to the cache.
      
      dict_sys_t::acquire(): Remove. Relocating elements in dict_sys.table_LRU
      would require holding an exclusive latch, which we want to avoid
      for performance reasons.
      
      dict_sys_t::allow_eviction(): Add the table first to dict_sys.table_LRU,
      to compensate for the removal of dict_sys_t::acquire(). This function
      is only invoked by INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS.
      
      dict_table_open_on_id(), dict_table_open_on_name(): If dict_locked=false,
      try to acquire dict_sys.latch in shared mode. Only acquire the latch in
      exclusive mode if the table is not found in the cache.
      
      Reviewed by: Thirunarayanan Balathandayuthapani
    • MDEV-26511 - Do not change purge thread count during bootstrap · 1fcd8db7
      Vladislav Vaintroub authored
      Apparently, in bootstrap this could crash when creating new THDs
  12. 30 Aug, 2021 1 commit
  13. 17 Aug, 2021 1 commit
  14. 22 Jul, 2021 1 commit
    • MDEV-26193: Wake up purge less often · a4dc9265
      Marko Mäkelä authored
      Starting with commit 6e12ebd4
      (MDEV-25062), srv_wake_purge_thread_if_not_active() became a
      more expensive operation, especially on NUMA systems, because
      instead of reading an atomic global variable trx_sys.rseg_history_len
      we are traversing up to 128 cache lines in trx_sys.history_exists().
      
      trx_t::commit_cleanup(): Do not wake up purge at all.
      We will wake up purge about once per second in srv_master_callback().
      
      srv_master_do_active_tasks(), srv_master_do_idle_tasks():
      Move some duplicated code to srv_master_callback().
      
      srv_master_callback(): Invoke purge_coordinator_timer_callback()
      to ensure that purge will be periodically woken up, even if the
      latest execution of trx_t::commit_cleanup() allowed the purge view
      to advance but did not wake up purge.
      Do not call log_free_check(), because every thread that is going
      to generate redo log is supposed to call that function anyway,
      before acquiring any page latches. Additional calls to the function
      every few seconds should not make any difference.
      
      srv_shutdown_threads(): Ensure that srv_shutdown_state can be at most
      SRV_SHUTDOWN_INITIATED in srv_master_callback(), by first invoking
      srv_master_timer.reset() before changing srv_shutdown_state.
      (Note: We first terminate the srv_master_callback and only then
      terminate the purge tasks. Thus, the purge subsystem should exist
      when srv_master_callback() invokes purge_coordinator_timer_callback()
      if it was initiated in the first place.)
  15. 16 Jul, 2021 1 commit
  16. 23 Jun, 2021 2 commits
    • MDEV-25113: Make page flushing faster · 22b62eda
      Marko Mäkelä authored
      buf_page_write_complete(): Reduce the buf_pool.mutex hold time,
      and do not acquire buf_pool.flush_list_mutex at all.
      Instead, mark blocks clean by setting oldest_modification to 1.
      Dirty pages of temporary tables will be identified by the special
      value 2 instead of the previous special value 1.
      (By design of the ib_logfile0 format, actual LSN values smaller
      than 2048 are not possible.)
      
      buf_LRU_free_page(), buf_pool_t::get_oldest_modification()
      and many other functions will remove the garbage (clean blocks)
      from buf_pool.flush_list while holding buf_pool.flush_list_mutex.
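
      The special values and the lazy cleanup can be modelled as follows.
      This is a simplified sketch: the names are hypothetical, the flush
      list is reduced to a list of oldest_modification values, and the
      meaning of the value 0 is an assumption that is not stated above.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>

enum class flush_state
{
  NOT_IN_FLUSH_LIST, // 0: assumption, not stated in the commit message
  CLEAN_GARBAGE,     // 1: written back, not yet removed from the list
  DIRTY_TEMPORARY,   // 2: dirty page of a temporary table
  DIRTY              // otherwise: LSN of the oldest modification
};

inline flush_state classify(std::uint64_t oldest_modification)
{
  switch (oldest_modification) {
  case 0: return flush_state::NOT_IN_FLUSH_LIST;
  case 1: return flush_state::CLEAN_GARBAGE;
  case 2: return flush_state::DIRTY_TEMPORARY;
  default: return flush_state::DIRTY;
  }
}

// Lazily remove clean blocks from the simplified flush list. In the
// server, the corresponding work is done while holding
// buf_pool.flush_list_mutex. Returns the number of removed entries.
inline std::size_t remove_garbage(std::list<std::uint64_t> &flush_list)
{
  std::size_t removed = 0;
  for (auto it = flush_list.begin(); it != flush_list.end(); )
  {
    if (classify(*it) == flush_state::CLEAN_GARBAGE)
    {
      it = flush_list.erase(it);
      ++removed;
    }
    else
      ++it;
  }
  return removed;
}
```

      The point of the design is that write completion only has to store
      the value 1, and the cost of unlinking the block is deferred to a
      thread that already holds the flush list mutex.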
      
      buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list:
      Replaced with non-atomic variables, protected by buf_pool.mutex,
      to avoid unnecessary synchronization when modifying the counts.
      
      export_vars: Remove unnecessary indirection for
      innodb_pages_created, innodb_pages_read, innodb_pages_written.
    • MDEV-25062: Reduce trx_rseg_t::mutex contention · 6e12ebd4
      Marko Mäkelä authored
      redo_rseg_mutex, noredo_rseg_mutex: Remove the PERFORMANCE_SCHEMA keys.
      The rollback segment mutex will be uninstrumented.
      
      trx_sys_t: Remove pointer indirection for rseg_array, temp_rseg.
      Align each element to the cache line.
      
      trx_sys_t::rseg_id(): Replaces trx_rseg_t::id.
      
      trx_rseg_t::ref: Replaces needs_purge, trx_ref_count, skip_allocation
      in a single std::atomic<uint32_t>.
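
      Packing two flags and a reference count into one atomic word could
      look like the sketch below. The class name rseg_ref, the member
      function names and the exact bit layout are assumptions made for
      illustration, not taken from the commit.

```cpp
#include <atomic>
#include <cstdint>

// Assumed bit layout of the combined word.
const std::uint32_t SKIP = 1;        // skip_allocation flag
const std::uint32_t NEEDS_PURGE = 2; // needs_purge flag
const std::uint32_t REF = 4;         // one reference (count in upper bits)

// Hypothetical stand-in for the combined field.
class rseg_ref
{
  std::atomic<std::uint32_t> ref{0};
public:
  void set_skip_allocation() { ref.fetch_or(SKIP, std::memory_order_relaxed); }
  void set_needs_purge() { ref.fetch_or(NEEDS_PURGE, std::memory_order_relaxed); }
  bool skip_allocation() const { return ref.load(std::memory_order_relaxed) & SKIP; }
  bool needs_purge() const { return ref.load(std::memory_order_relaxed) & NEEDS_PURGE; }
  std::uint32_t references() const { return ref.load(std::memory_order_relaxed) / REF; }

  // Acquire a reference unless skip_allocation is set. The flag check
  // and the increment happen in one atomic step.
  bool acquire_if_available()
  {
    std::uint32_t r = ref.load(std::memory_order_relaxed);
    while (!(r & SKIP))
      if (ref.compare_exchange_weak(r, r + REF, std::memory_order_acquire,
                                    std::memory_order_relaxed))
        return true;
    return false;
  }
  void release() { ref.fetch_sub(REF, std::memory_order_release); }
};
```

      Because the flags and the count share one word, a caller can test a
      flag and take a reference with a single compare-and-swap instead of
      holding a mutex around separate fields.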
      
      trx_rseg_t::latch: Replaces trx_rseg_t::mutex.
      
      trx_rseg_t::history_size: Replaces trx_sys_t::rseg_history_len
      
      trx_sys_t::history_size_approx(): Replaces trx_sys.rseg_history_len
      in those places where the exact count does not matter. We must not
      acquire any trx_rseg_t::latch while holding index page latches, because
      normally the trx_rseg_t::latch is acquired before any page latches.
      
      trx_sys_t::history_exists(): Replaces trx_sys.rseg_history_len!=0
      with an approximation.
      
      We remove some unnecessary trx_rseg_t::latch acquisition around
      trx_undo_set_state_at_prepare() and trx_undo_set_state_at_finish().
      Those operations will only access fields that remain constant
      after trx_rseg_t::init().
  17. 17 Jun, 2021 1 commit
    • MDEV-25854: Remove garbage tables after restoring a backup · f778a5d5
      Marko Mäkelä authored
      In commit 1c5ae991 (MDEV-25666)
      we had changed Mariabackup so that it would no longer skip files
      whose names start with #sql. This turned out to be wrong.
      Because operations on such named files are not protected by any
      locks in the server, it is not safe to copy them.
      
      Not copying the files may make the InnoDB data dictionary
      inconsistent with the file system. So, we must do something
      in InnoDB to adjust for that.
      
      If InnoDB is being started up without the redo log (ib_logfile0)
      or with a zero-length log file, we will assume that the server
      was restored from a backup, and adjust things as follows:
      
      dict_check_sys_tables(), fil_ibd_open(): Do not complain about
      missing #sql files if they would be dropped a little later.
      
      dict_stats_update_if_needed(): Never add #sql tables to
      the recomputing queue. This avoids a potential race condition when
      dropping the garbage tables.
      
      drop_garbage_tables_after_restore(): Try to drop any garbage tables.
      
      innodb_ddl_recovery_done(): Invoke drop_garbage_tables_after_restore()
      if srv_start_after_restore (a new flag) was set and we are not in
      read-only mode (innodb_read_only=ON or innodb_force_recovery>3).
      
      The tests and dbug_mariabackup_event() instrumentation
      were developed by Vladislav Vaintroub, who also reviewed this.
  18. 11 Jun, 2021 1 commit
    • MDEV-25882: Statistics used to track b-tree (non-adaptive) searches should be updated only when adaptive hashing is turned on · 102ff420
      Krunal Bauskar authored
      
      Currently, btr_cur_n_non_sea is used to track the searches that missed
      the adaptive hash index. The adaptive hash index is turned off by default,
      but the said variable is always updated, even though its value only makes
      sense when the adaptive hash index is enabled. It is meant to count how
      many searches did not go through the adaptive hash index.
      
      Because it is a global variable that is updated on each search path, it
      causes contention in a multi-threaded workload.
      
      The patch moves the said variables inside a loop, so that they are now
      updated only when the adaptive hash index is enabled. In theory, that
      should also reduce the update frequency of the said variable, as the
      majority of the requests should be serviced through the adaptive hash index.
      
      Variables (btr_cur_n_non_sea and btr_cur_n_sea) are also converted to
      use distributed counter to avoid contention.
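
      A distributed counter spreads updates over several cache lines, so that
      concurrent threads rarely write to the same one. The sketch below shows
      the general technique only. The class name sharded_counter, the slot
      count and the 64-byte alignment are assumptions, and this is not the
      counter type used in InnoDB.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical distributed counter.
class sharded_counter
{
  static const std::size_t SLOTS = 64;
  // Each slot is aligned to its own (assumed 64-byte) cache line.
  struct alignas(64) slot { std::atomic<std::uint64_t> value{0}; };
  std::array<slot, SLOTS> slots;
public:
  // 'hint' selects the slot, e.g. a value derived from the thread identity.
  void add(std::size_t hint, std::uint64_t n = 1)
  {
    slots[hint % SLOTS].value.fetch_add(n, std::memory_order_relaxed);
  }

  // Reading sums all slots. The result is approximate while writers
  // are active, which is acceptable for status counters.
  std::uint64_t sum() const
  {
    std::uint64_t total = 0;
    for (const slot &s : slots)
      total += s.value.load(std::memory_order_relaxed);
    return total;
  }
};
```

      Updates become cheap at the cost of a slower read, which suits
      counters that are incremented on every search but read only when
      status variables are displayed.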
      
      User visible changes:
      
      This also means that the user will now see
      Innodb_adaptive_hash_non_hash_searches (viewed as part of SHOW STATUS)
      only if the code is compiled with -DWITH_INNODB_AHI=ON (default), and it
      will be updated only if innodb_adaptive_hash_index=1; otherwise it is reported as 0.
  19. 09 Jun, 2021 1 commit
    • MDEV-25506 (3 of 3): Do not delete .ibd files before commit · 1bd681c8
      Marko Mäkelä authored
      This is a complete rewrite of DROP TABLE, also as part of other DDL,
      such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE.
      
      The background DROP TABLE queue hack is removed.
      If a transaction needs to drop and create a table by the same name
      (like TRUNCATE TABLE does), it must first rename the table to an
      internal #sql-ib name. No committed version of the data dictionary
      will include any #sql-ib tables, because whenever a transaction
      renames a table to a #sql-ib name, it will also drop that table.
      Either the rename will be rolled back, or the drop will be committed.
      
      Data files will be unlinked after the transaction has been committed
      and a FILE_RENAME record has been durably written. The file will
      actually be deleted when the detached file handle returned by
      fil_delete_tablespace() will be closed, after the latches have been
      released. It is possible that a purge of the delete of the SYS_INDEXES
      record for the clustered index will execute fil_delete_tablespace()
      concurrently with the DDL transaction. In that case, the thread that
      arrives later will wait for the other thread to finish.
      
      HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag.
      ha_innobase::truncate() now requires that all other references to
      the table be released in advance. This was implemented by Monty.
      
      ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected,
      we will "hijack" the current transaction, drop the table in
      the current transaction and commit the current transaction.
      This essentially fixes MDEV-21602. There is a FIXME comment about
      making the check less failure-prone.
      
      ha_innobase::truncate(), ha_innobase::delete_table():
      Implement a fast path for temporary tables. We will no longer allow
      temporary tables to use the adaptive hash index.
      
      dict_table_t::mdl_name: The original table name for the purpose of
      acquiring MDL in purge, to prevent a race condition between a
      DDL transaction that is dropping a table, and purge processing
      undo log records of DML that had executed before the DDL operation.
      For #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the
      dict_table_t::mdl_name will differ from dict_table_t::name.
      
      dict_table_t::parse_name(): Use mdl_name instead of name.
      
      dict_table_rename_in_cache(): Update mdl_name.
      
      For the internal FTS_ tables of FULLTEXT INDEX, purge would
      acquire MDL on the FTS_ table name, but not on the main table,
      and therefore it would be able to run concurrently with a
      DDL transaction that is dropping the table. Previously, the
      DROP TABLE queue hack prevented a race between purge and DDL.
      For now, we introduce purge_sys.stop_FTS() to prevent purge from
      opening any table, while a DDL transaction that may drop FTS_
      tables is in progress. The function fts_lock_table(), which will
      be invoked before the dictionary is locked, will wait for
      purge to release any table handles.
      
      trx_t::drop_table_statistics(): Drop statistics for the table.
      This replaces dict_stats_drop_index(). We will drop or rename
      persistent statistics atomically as part of DDL transactions.
      On lock conflict for dropping statistics, we will fail instantly
      with DB_LOCK_WAIT_TIMEOUT, because we will be holding the
      exclusive data dictionary latch.
      
      trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory().
      Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT
      in addition to DB_DUPLICATE_KEY. The call to fts_commit() is
      entirely misplaced here and may obviously break the consistency
      of transactions that affect FULLTEXT INDEX. It needs to be fixed
      separately.
      
      dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175).
      The counter was a work-around for missing meta-data locking (MDL)
      on the SQL layer, and not really needed in MariaDB.
      
      ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28.
      
      HA_ERR_TABLE_IN_FK_CHECK: Remove.
      
      row_ins_check_foreign_constraints(): Do not acquire
      dict_sys.latch either. The SQL-layer MDL will protect us.
      
      This was reviewed by Thirunarayanan Balathandayuthapani
      and tested by Matthias Leich.
  20. 27 May, 2021 1 commit
    • MDEV-25791: Remove UNIV_INTERN · a7d68e7a
      Marko Mäkelä authored
      Back in 2006 or 2007, when MySQL AB and Innobase Oy existed as
      separately controlled entities (Innobase had been acquired by
      Oracle Corporation), MySQL 5.1 introduced a storage engine plugin
      interface and Oracle made use of it by distributing a separate
      InnoDB Plugin, which would contain some more bug fixes and
      improvements, compared to the version of InnoDB that was statically
      linked with the mysqld server that was distributed by MySQL AB.
      The built-in InnoDB would export global symbols, which would clash
      with the symbols of the dynamic InnoDB Plugin (which was supposed
      to override the built-in one when present).
      
      The solution to this problem was to declare all global symbols with
      UNIV_INTERN, so that they would get the GCC function attribute that
      specifies hidden visibility.
      
      Later, in MariaDB Server, something based on Percona XtraDB (a fork of
      MySQL InnoDB) became the statically linked implementation, and something
      closer to MySQL InnoDB was available as a dynamic plugin. Starting with
      version 10.2, MariaDB Server includes only one InnoDB implementation,
      and hence any reason to have the UNIV_INTERN definition was lost.
      
      btr_get_size_and_reserved(): Move to the same compilation unit with
      the only caller.
      
      innodb_set_buf_pool_size(): Remove. Modify innobase_buffer_pool_size
      directly.
      
      fil_crypt_calculate_checksum(): Merge to the only caller.
      
      ha_innobase::innobase_reset_autoinc(): Merge to the only caller.
      
      thd_query_start_micro(): Remove. Call thd_start_utime() directly.
  21. 21 May, 2021 1 commit
    • Cleanup: Access lower_case_table_names, tdc_size directly · 9eb4ad57
      Marko Mäkelä authored
      dict_sys_t::evict_table_LRU(): Replaces dict_make_room_in_cache() and
      srv_master_evict_from_table_cache().
      
      innobase_get_table_cache_size(): Replaced with direct read of tdc_size,
      in dict_sys_t::evict_table_LRU().
      
      innobase_get_lower_case_table_names(): Replaced with direct reads of
      lower_case_table_names.
      9eb4ad57
  22. 19 May, 2021 1 commit
    • Monty's avatar
      MDEV-25180 Atomic ALTER TABLE · 7762ee5d
      Monty authored
      MDEV-25604 Atomic DDL: Binlog event written upon recovery does not
                 have default database
      
      The purpose of this task is to ensure that ALTER TABLE is atomic even if
      the MariaDB server would be killed at any point of the alter table.
      This means that either the ALTER TABLE succeeds (including that triggers,
      the status tables and the binary log are updated) or things should be
      reverted to their original state.
      
      If the server crashes before the new version is fully up to date and
      committed, it will revert to the original table and remove all
      temporary files and tables.
      If the new version is committed, crash recovery will use the new version,
      and update triggers, the status tables and the binary log.
      The one exception is ALTER TABLE .. RENAME .. where no changes are done
      to table definition. This one will work as RENAME and roll back unless
      the whole statement completed, including updating the binary log (if
      enabled).
      
      Other changes:
      - Added handlerton->check_version() function to allow the ddl recovery
        code to check, in case of inplace alter table, if the table in the
        storage engine is of the new or old version.
      - Added handler->table_version() so that an engine can report the current
        version of the table. This should be changed each time the table
        definition changes.
      - Added ha_signal_ddl_recovery_done() and
        handlerton::signal_ddl_recovery_done() to inform all handlers when
        ddl recovery has been done. (Needed by InnoDB).
      - Added handlerton call inplace_alter_table_committed, to signal engine
        that ddl_log has been closed for the alter table query.
      - Added new handlerton flag
        HTON_REQUIRES_NOTIFY_TABLEDEF_CHANGED_AFTER_COMMIT to signal when we
        should call hton->notify_tabledef_changed() during
        mysql_inplace_alter_table. This was required as MyRocks and InnoDB
        needed the call at different times.
      - Added function server_uuid_value() to be able to generate a temporary
        xid when ddl recovery writes the query to the binary log. This is
        needed to be able to handle crashes during ddl log recovery.
      - Moved freeing of the frm definition to end of mysql_alter_table() to
        remove duplicate code and have a common exit strategy.
      
      -------
      InnoDB part of atomic ALTER TABLE
      (Implemented by Marko Mäkelä)
      innodb_check_version(): Compare the saved dict_table_t::def_trx_id
      to determine whether an ALTER TABLE operation was committed.
      
      We must correctly recover dict_table_t::def_trx_id for this to work.
      Before purge removes any trace of DB_TRX_ID from system tables, it
      will make an effort to load the user table into the cache, so that
      the dict_table_t::def_trx_id can be recovered.
      
      ha_innobase::table_version(): return garbage, or the trx_id that would
      be used for committing an ALTER TABLE operation.
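
The version check described above can be sketched roughly as follows. This is a simplified, hypothetical model: table_model and alter_was_committed() are illustrative names, and the assumption that the check is an equality test between the recovered def_trx_id and the transaction ID recorded for the ALTER TABLE is ours, not something this message spells out.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a cached table definition.
struct table_model {
  uint64_t def_trx_id;  // ID of the transaction that last changed the definition
};

// Assumption: DDL recovery remembers the transaction ID that the
// ALTER TABLE would have committed with. If the recovered table carries
// the same ID, the new definition was committed inside the engine.
bool alter_was_committed(const table_model& t, uint64_t alter_trx_id) {
  return t.def_trx_id == alter_trx_id;
}
```

This also suggests why def_trx_id must survive until recovery: if purge erased it too early, the comparison would have nothing reliable to compare against.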
      
      In InnoDB, table names starting with #sql-ib will remain special:
      they will be dropped on startup. This may be revisited later in
      MDEV-18518 when we implement proper undo logging and rollback
      for creating or dropping multiple tables in a transaction.
      
      Table names starting with #sql will retain some special meaning:
      dict_table_t::parse_name() will not consider such names for
      MDL acquisition, and dict_table_rename_in_cache() will treat such
      names specially when handling FOREIGN KEY constraints.
      
      Simplify InnoDB DROP INDEX.
      Prevent purge wakeup.
      
      To ensure that dict_table_t::def_trx_id will be recovered correctly
      in case the server is killed before ddl_log_complete(), we will block
      the purge of any history in SYS_TABLES, SYS_INDEXES, SYS_COLUMNS
      between ha_innobase::commit_inplace_alter_table(commit=true)
      (purge_sys.stop_SYS()) and purge_sys.resume_SYS().
      The completion callback purge_sys.resume_SYS() must be between
      ddl_log_complete() and MDL release.
      
      --------
      
      MyRocks support for atomic ALTER TABLE
      (Implemented by Sergei Petrunia)
      
      Implement these SE API functions:
      - ha_rocksdb::table_version()
      - hton->check_version = rocksdb_check_version
      - MyRocks data dictionary now stores table version for each table.
        (Absence of table version record is interpreted as table_version=0,
        which means no upgrade changes are needed)
      - For inplace alter table of a partitioned table, call the underlying
        handlerton when checking if the table is ok. This assumes that the
        partition engine commits all changes at once.
      7762ee5d
  23. 27 Apr, 2021 1 commit
    • Nikita Malyavin's avatar
      revive innodb_debug_sync · 300253ac
      Nikita Malyavin authored
      innodb_debug_sync was introduced in commit
      b393e2cb and reverted in
      commit fc58c172 due to memory leak reported
      by valgrind, see MDEV-21336.
      
      The leak is now fixed by adding `rw_lock_free(&slot->debug_sync_lock)`
      after the background thread's working loop has finished, and the patch
      is reapplied, taking into account the C++98 fixes by Marko.
      
      The missing DEBUG_SYNC for MDEV-18546 in row0vers.cc is also reapplied.
      300253ac
  24. 30 Mar, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-24302 follow-up: RESET MASTER hangs · 8c2e3259
      Marko Mäkelä authored
      As pointed out by Andrei Elkin, the previous fix did not fix one
      race condition that may have caused the observed hang.
      
      innodb_log_flush_request(): If we are enqueueing the very first
      request at the same time the log write is being completed,
      we must ensure that a near-concurrent call to log_flush_notify()
      will not result in a missed notification. We guarantee this by
      release-acquire operations on log_requests.start and
      log_sys.flushed_to_disk_lsn.
      
      log_flush_notify_and_unlock(): Cleanup: Always release the mutex.
      
      log_sys_t::get_flushed_lsn(): Use acquire memory order.
      
      log_sys_t::set_flushed_lsn(): Use release memory order.
      
      log_sys_t::set_lsn(): Use release memory order.
      
      log_sys_t::get_lsn(): Use relaxed memory order by default, and
      allow the caller to specify acquire memory order explicitly.
      Whenever the log_sys.mutex is being held or when log writes are
      prohibited during startup, we can use a relaxed load. Likewise,
      in some assertions where reading a stale value of log_sys.lsn
      should not matter, we can use a relaxed load.
      
      This will cause some additional instructions to be emitted on
      architectures that do not implement Total Store Ordering (TSO),
      such as POWER, ARM, and RISC-V Weak Memory Ordering (RVWMO).
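
The memory-ordering scheme described above can be sketched with std::atomic. The struct mini_log is a hypothetical miniature and not the actual log_sys_t; only the accessor names and the choice of memory orders follow the text of this message.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical miniature of a log system; illustrative only.
struct mini_log {
  std::atomic<uint64_t> lsn{0};
  std::atomic<uint64_t> flushed_to_disk_lsn{0};

  // Release store: writes made before this store become visible to any
  // thread whose acquire load observes the stored value.
  void set_lsn(uint64_t v) { lsn.store(v, std::memory_order_release); }
  void set_flushed_lsn(uint64_t v) {
    flushed_to_disk_lsn.store(v, std::memory_order_release);
  }

  // Acquire load: pairs with the release store in set_flushed_lsn().
  uint64_t get_flushed_lsn() const {
    return flushed_to_disk_lsn.load(std::memory_order_acquire);
  }

  // Relaxed by default, for callers that already hold a mutex or that
  // tolerate a stale value; the caller may request acquire explicitly.
  uint64_t get_lsn(
      std::memory_order order = std::memory_order_relaxed) const {
    return lsn.load(order);
  }
};
```

On a TSO architecture such as AMD64, ordinary loads and stores already satisfy acquire and release ordering, which is why the extra instructions mentioned above appear only on weakly ordered targets.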
      8c2e3259
  25. 24 Mar, 2021 1 commit
  26. 22 Mar, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-22653: Remove the useless parameter innodb_simulate_comp_failures · 0f8caadc
      Marko Mäkelä authored
      The debug parameter innodb_simulate_comp_failures injected compression
      failures for ROW_FORMAT=COMPRESSED tables, breaking the pre-existing
      logic that I had implemented in the InnoDB Plugin for MySQL 5.1 to prevent
      compressed page overflows. A much better check is already achieved by
      defining UNIV_ZIP_COPY at the compilation time.
      (Only UNIV_ZIP_DEBUG is part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON.)
      0f8caadc
  27. 19 Mar, 2021 1 commit
  28. 11 Mar, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-25105 Remove innodb_checksum_algorithm values none,innodb,... · 7a4fbb55
      Marko Mäkelä authored
      Historically, InnoDB supported a buggy page checksum algorithm that did not
      compute a checksum over the full page. Later, well before MySQL 4.1
      introduced .ibd files and the innodb_file_per_table option, the algorithm
      was corrected and the first 4 bytes of each page were redefined to be
      a checksum.
      
      The original checksum was so slow that an option to disable page checksum
      was introduced for benchmarketing purposes.
      
      The Intel Nehalem microarchitecture introduced the SSE4.2 instruction set
      extension, which includes instructions for faster computation of CRC-32C.
      In MySQL 5.6 (and MariaDB 10.0), innodb_checksum_algorithm=crc32 was
      implemented to make use of that. As that option was changed to be the default
      in MySQL 5.7, a bug was found on big-endian platforms and some work-around
      code was added to weaken that checksum further. MariaDB disables that
      work-around by default since MDEV-17958.
      
      Later, SIMD-accelerated CRC-32C has been implemented in MariaDB for POWER
      and ARM and also for IA-32/AMD64, making use of carry-less multiplication
      where available.
      
      Long story short, innodb_checksum_algorithm=crc32 is faster and more secure
      than the pre-MySQL 5.6 checksum, called innodb_checksum_algorithm=innodb.
      It should have removed any need to use innodb_checksum_algorithm=none.
      
      The setting innodb_checksum_algorithm=crc32 is the default in
      MySQL 5.7 and MariaDB Server 10.2, 10.3, 10.4. In MariaDB 10.5,
      MDEV-19534 made innodb_checksum_algorithm=full_crc32 the default.
      It is even faster and more secure.
      
      The default settings in MariaDB do allow old data files to be read,
      no matter if a worse checksum algorithm had been used.
      (Unfortunately, before innodb_checksum_algorithm=full_crc32,
      the data files did not identify which checksum algorithm is being used.)
      
      The non-default settings innodb_checksum_algorithm=strict_crc32 or
      innodb_checksum_algorithm=strict_full_crc32 would only allow CRC-32C
      checksums. The incompatibility with old data files is why they are
      not the default.
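
To illustrate the difference between the lenient and strict settings, here is a small sketch. crc32c() is a plain bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78) with no hardware acceleration. legacy_sum() is a hypothetical placeholder for "some older, weaker checksum" and is NOT the historical InnoDB algorithm; checksum_ok() and its strict flag are likewise illustrative names.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C; slow but simple.
uint32_t crc32c(const unsigned char* data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Hypothetical stand-in for an older, weaker checksum (a byte sum).
uint32_t legacy_sum(const unsigned char* data, size_t len) {
  uint32_t sum = 0;
  for (size_t i = 0; i < len; i++) sum += data[i];
  return sum;
}

// Lenient mode accepts either checksum when reading; strict mode
// accepts only CRC-32C, so files written with the old one are rejected.
bool checksum_ok(uint32_t stored, const unsigned char* data, size_t len,
                 bool strict) {
  if (stored == crc32c(data, len)) return true;
  return !strict && stored == legacy_sum(data, len);
}
```

In this model, a page whose stored value is the legacy sum passes with strict=false and fails with strict=true, which mirrors the incompatibility with old data files described above.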
      
      The newest servers not to support innodb_checksum_algorithm=crc32
      were MySQL 5.5 and MariaDB 5.5. Both have reached their end of life.
      A valid reason for using innodb_checksum_algorithm=innodb could have
      been the ability to downgrade. If it is really needed, data files
      can be converted with an older version of the innochecksum utility.
      
      Because there is no good reason to allow data files to be written
      with insecure checksums, we will reject those option values:
      
          innodb_checksum_algorithm=none
          innodb_checksum_algorithm=innodb
          innodb_checksum_algorithm=strict_none
          innodb_checksum_algorithm=strict_innodb
      
      Furthermore, the following innochecksum options will be removed,
      because only strict crc32 will be supported:
      
          innochecksum --strict-check=crc32
          innochecksum -C crc32
          innochecksum --write=crc32
          innochecksum -w crc32
      
      If a user wishes to convert a data file to use a different checksum
      (so that it might be used with the no-longer-supported
      MySQL 5.5 or MariaDB 5.5, which do not support IMPORT TABLESPACE
      nor system tablespace format changes that were made in MariaDB 10.3),
      then the innochecksum tool from MariaDB 10.2, 10.3, 10.4, 10.5 or
      MySQL 5.7 can be used.
      
      Reviewed by: Thirunarayanan Balathandayuthapani
      7a4fbb55
  29. 17 Feb, 2021 1 commit
  30. 11 Feb, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-20612: Replace lock_sys.mutex with lock_sys.latch · b01d8e1a
      Marko Mäkelä authored
      For now, we will acquire the lock_sys.latch only in exclusive mode,
      that is, use it as a mutex.
      
      This is preparation for the next commit where we will introduce
      a less intrusive alternative, combining a shared lock_sys.latch
      with dict_table_t::lock_mutex or a mutex embedded in
      lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash.
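
The two locking modes mentioned above can be sketched with a C++17 std::shared_mutex. mini_lock_sys and mini_table are hypothetical miniatures; the real latch and the hash-table mutexes are InnoDB-specific types that are not shown in this message.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical miniatures; illustrative only.
struct mini_lock_sys {
  std::shared_mutex latch;
};

struct mini_table {
  std::mutex lock_mutex;  // protects n_locks when the latch is only shared
  int n_locks = 0;
};

// Mode used by this commit: the latch is taken exclusively, so it
// behaves exactly like a global mutex.
void add_lock_exclusive(mini_lock_sys& sys, mini_table& t) {
  std::unique_lock<std::shared_mutex> guard(sys.latch);
  ++t.n_locks;
}

// Less intrusive alternative: many threads may hold the latch in shared
// mode at once, and a finer-grained mutex serializes access per table.
void add_lock_shared(mini_lock_sys& sys, mini_table& t) {
  std::shared_lock<std::shared_mutex> guard(sys.latch);
  std::lock_guard<std::mutex> table_guard(t.lock_mutex);
  ++t.n_locks;
}
```

In the shared variant, threads working on different tables no longer block each other, while an operation that needs a global view can still take the latch exclusively.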
      b01d8e1a
  31. 07 Feb, 2021 1 commit
  32. 29 Jan, 2021 1 commit
  33. 27 Jan, 2021 2 commits
    • Marko Mäkelä's avatar
      MDEV-24671: Replace lock_wait_timeout_task with mysql_cond_timedwait() · e71e6133
      Marko Mäkelä authored
      lock_wait(): Replaces lock_wait_suspend_thread(). Wait for the lock to
      be granted or the transaction to be killed using mysql_cond_timedwait()
      or mysql_cond_wait().
      
      lock_wait_end(): Replaces que_thr_end_lock_wait() and
      lock_wait_release_thread_if_suspended().
      
      lock_wait_timeout_task: Remove. The operating system kernel will
      resume the mysql_cond_timedwait() in lock_wait(). An added benefit
      is that innodb_lock_wait_timeout no longer has a 'jitter' of 1 second,
      which was caused by this wake-up task waking up only once per second,
      and then waking up any threads for which the timeout (which was only
      measured in seconds) was exceeded.
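
The waiting scheme described above can be sketched with the standard C++ condition variable, standing in for mysql_cond_timedwait(). lock_waiter and its members are hypothetical names; the point is only that the waiter itself sleeps with a deadline, so no periodic wake-up task is needed and the timeout is not rounded to whole seconds.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical miniature of a per-transaction wait state.
struct lock_waiter {
  std::mutex wait_mutex;
  std::condition_variable cond;
  bool granted = false;

  // Sleeps until the lock is granted or the timeout expires.
  // Returns true if the lock was granted in time.
  bool wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> guard(wait_mutex);
    return cond.wait_for(guard, timeout, [this] { return granted; });
  }

  // Called by the thread that releases the conflicting lock.
  void grant() {
    {
      std::lock_guard<std::mutex> guard(wait_mutex);
      granted = true;
    }
    cond.notify_one();
  }
};
```

The predicate passed to wait_for() guards against spurious wake-ups, which plays the same role as re-checking the wait condition after mysql_cond_timedwait() returns.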
      
      innobase_kill_query(): Set trx->error_state=DB_INTERRUPTED,
      so that a call to trx_is_interrupted(trx) in lock_wait() can be avoided.
      
      We will protect things more consistently with lock_sys.wait_mutex,
      which will be moved below lock_sys.mutex in the latching order.
      
      trx_lock_t::cond: Condition variable for !wait_lock, used with
      lock_sys.wait_mutex.
      
      srv_slot_t: Remove. Replaced by trx_lock_t::cond.
      
      lock_grant_after_reset(): Merged to lock_grant().
      
      lock_rec_get_index_name(): Remove.
      
      lock_sys_t: Introduce wait_pending, wait_count, wait_time, wait_time_max
      that are protected by wait_mutex.
      
      trx_lock_t::que_state: Remove.
      
      que_thr_state_t: Remove QUE_THR_COMMAND_WAIT, QUE_THR_LOCK_WAIT.
      
      que_thr_t: Remove is_active, start_running(), stop_no_error().
      
      que_fork_t::n_active_thrs, trx_lock_t::n_active_thrs: Remove.
      e71e6133
    • Vladislav Vaintroub's avatar
      MDEV-24685 - remove IO thread states output from SHOW ENGINE INNODB STATUS · c310f4c3
      Vladislav Vaintroub authored
      There are no IO threads anymore.
      c310f4c3
  34. 13 Jan, 2021 2 commits
    • Marko Mäkelä's avatar
      MDEV-24536 innodb_idle_flush_pct has no effect · e4205fba
      Marko Mäkelä authored
      The parameter innodb_idle_flush_pct that was introduced in
      MariaDB Server 10.1.2 by MDEV-6932 has no effect ever since
      the InnoDB changes from MySQL 5.7.9 were applied in
      commit 2e814d47.
      
      Let us declare the parameter as MARIADB_REMOVED_OPTION.
      For earlier versions, commit ea9cd97f
      declared the parameter deprecated.
      e4205fba
    • Marko Mäkelä's avatar
      MDEV-24536 innodb_idle_flush_pct has no effect · ea9cd97f
      Marko Mäkelä authored
      The parameter innodb_idle_flush_pct that was introduced in
      MariaDB Server 10.1.2 by MDEV-6932 has no effect ever since
      the InnoDB changes from MySQL 5.7.9 were applied in
      commit 2e814d47.
      
      Let us declare the parameter as deprecated and having no effect.
      ea9cd97f
  35. 07 Jan, 2021 1 commit