  1. 03 Sep, 2020 1 commit
  2. 19 Aug, 2020 1 commit
    • MDEV-23475 InnoDB performance regression for write-heavy workloads · 309302a3
      Marko Mäkelä authored
      In commit fe39d02f (MDEV-20638)
      we removed some wake-up signaling of the master thread that should
      have been there, to ensure a steady log checkpointing workload.
      
      Common sense suggests that the commit omitted some necessary calls
      to srv_inc_activity_count(). However, an attempt to add the call to
      trx_flush_log_if_needed_low() as well as to reinstate the function
      innobase_active_small() did not restore the performance for the
      case where sync_binlog=1 is set.
      
      Therefore, we will revert the entire commit in MariaDB Server 10.2.
      In MariaDB Server 10.5, adding a srv_inc_activity_count() call to
      trx_flush_log_if_needed_low() did restore the performance, so we
      will not revert MDEV-20638 across all versions.
  3. 04 Aug, 2020 1 commit
    • MDEV-23379 Deprecate&ignore InnoDB concurrency throttling parameters · bbd70fcc
      Marko Mäkelä authored
      The parameters innodb_thread_concurrency and innodb_commit_concurrency
      were useful years ago when both computing resources and the implementation
      of some shared data structures were limited. MySQL 5.0 or 5.1 had trouble
      scaling beyond 8 concurrent connections. Most of the scalability bottlenecks
      have been removed since then, and the transactions per second delivered
      by MariaDB Server 10.5 should not dramatically drop upon exceeding the
      'optimal' number of connections.
      
      Hence, enabling any concurrency throttling for InnoDB actually makes
      things worse. We have seen many customers mistakenly set this to a
      small value like 16 or 64 and then complain that the server was slow.
      
      Ignoring the parameters allows us to remove some normally unused code
      and data structures, which could slightly improve performance.
      
      innodb_thread_concurrency, innodb_commit_concurrency,
      innodb_replication_delay, innodb_concurrency_tickets,
      innodb_thread_sleep_delay, innodb_adaptive_max_sleep_delay:
      Deprecate and ignore; hard-wire to 0.
      
      The column INFORMATION_SCHEMA.INNODB_TRX.trx_concurrency_tickets
      will always report 0.
  4. 23 Jul, 2020 1 commit
    • MDEV-20638 Remove the deadcode from srv_master_thread() and srv_active_wake_master_thread_low() · fe39d02f
      Thirunarayanan Balathandayuthapani authored
      - Due to commit fe95cb2e (MDEV-16125),
      the InnoDB master thread does not need to call srv_resume_thread(),
      and therefore there is no need to wake up the thread.
      Because of this, InnoDB should remove the following dead code.
      
      srv_check_activity(): Make the parameter in,out, so that it returns the
      recent activity value.
      
      innobase_active_small(): Removed
      
      srv_active_wake_master_thread(): Removed
      
      srv_wake_master_thread(): Removed
      
      srv_active_wake_master_thread_low(): Removed
      
      Simplify srv_master_thread(): remove the switch cases and add an assertion.
      
      Replace srv_wake_master_thread() with srv_inc_activity_count()
      
      INNOBASE_WAKE_INTERVAL: Removed
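      The activity-count mechanism that replaces the explicit wake-ups can be sketched as follows. This is a minimal illustration, assuming a single global std::atomic counter. The names inc_activity_count() and check_activity() are hypothetical stand-ins for srv_inc_activity_count() and srv_check_activity(), not the InnoDB source. Producers of work bump the counter instead of signaling the master thread. The master thread compares the counter with the value it saw last time, which is passed as an in,out parameter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical global activity counter. Assumption: relaxed ordering is
// enough, because the value is only a hint for the master thread.
static std::atomic<std::size_t> activity_count{0};

// Called by threads that generate work, instead of waking the master thread.
inline void inc_activity_count()
{
  activity_count.fetch_add(1, std::memory_order_relaxed);
}

// in,out parameter: on entry, the value observed last time; on exit, the
// current value. Returns true if any activity happened in between.
inline bool check_activity(std::size_t &last)
{
  const std::size_t now = activity_count.load(std::memory_order_relaxed);
  const bool changed = now != last;
  last = now;
  return changed;
}
```

      The master thread can then choose between "active" and "idle" work on each iteration without any condition-variable signaling from the producers.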
  5. 20 Jul, 2020 1 commit
    • MDEV-23190 InnoDB data file extension is not crash-safe · 57ec42bc
      Marko Mäkelä authored
      When InnoDB is extending a data file, it is updating the FSP_SIZE
      field in the first page of the data file.
      
      In commit 8451e090 (MDEV-11556)
      we removed a work-around for this bug and made recovery stricter,
      by making it track changes to FSP_SIZE via redo log records, and
      extend the data files before any changes are applied to them.
      
      It turns out that the function fsp_fill_free_list() is not crash-safe
      with respect to this when it is initializing the change buffer bitmap
      page (page 1, or generally, N*innodb_page_size+1). It uses a separate
      mini-transaction that is committed (and will be written to the redo
      log file) before the mini-transaction that actually extended the data
      file. Hence, recovery can observe a reference to a page that is
      beyond the current end of the data file.
      
      fsp_fill_free_list(): Initialize the change buffer bitmap page in
      the same mini-transaction.
      
      The rest of the changes are fixing a bug that the use of the separate
      mini-transaction was attempting to work around. Namely, we must ensure
      that no other thread will access the change buffer bitmap page before
      our mini-transaction has been committed and all page latches have been
      released.
      
      That is, for read-ahead as well as neighbour flushing, we must avoid
      accessing pages that might not yet be durably part of the tablespace.
      
      fil_space_t::committed_size: The size of the tablespace
      as persisted by mtr_commit().
      
      fil_space_t::max_page_number_for_io(): Limit the highest page
      number for I/O batches to committed_size.
      
      MTR_MEMO_SPACE_X_LOCK: Replaces MTR_MEMO_X_LOCK for fil_space_t::latch.
      
      mtr_x_space_lock(): Replaces mtr_x_lock() for fil_space_t::latch.
      
      mtr_memo_slot_release_func(): When releasing MTR_MEMO_SPACE_X_LOCK,
      copy space->size to space->committed_size. In this way, read-ahead
      or flushing will never be invoked on pages that do not yet exist
      according to FSP_SIZE.
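      The committed_size idea can be illustrated with a simplified sketch. The struct space_sketch and its members are hypothetical and do not reproduce fil_space_t. The assumptions are that the exclusive tablespace latch is a plain std::mutex, and that page numbers passed to the I/O limit are exclusive upper bounds. The size changes inside a mini-transaction, but the new value is only published to I/O batches when the latch is released at commit.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical simplification of a tablespace descriptor.
struct space_sketch
{
  std::mutex latch;                          // stands in for fil_space_t::latch
  uint32_t size = 0;                         // modified inside a mini-transaction
  std::atomic<uint32_t> committed_size{0};   // size as of the last commit

  // Called while holding latch exclusively, e.g. when extending the file.
  void extend(uint32_t new_size) { size = new_size; }

  // Models releasing the exclusive latch at mini-transaction commit:
  // only now does the new size become visible to read-ahead and flushing.
  void release_x_latch()
  {
    committed_size.store(size, std::memory_order_release);
    latch.unlock();
  }

  // Clamp the (exclusive) end of a read-ahead or neighbour-flush batch so
  // that pages beyond the committed size are never accessed.
  uint32_t max_page_number_for_io(uint32_t end) const
  {
    return std::min(end, committed_size.load(std::memory_order_acquire));
  }
};
```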
  6. 18 Jun, 2020 1 commit
    • MDEV-22871: Clean up btr_search_sys · bf3c862f
      Marko Mäkelä authored
      btr_search_sys::parts[]: A single structure for the partitions of
      the adaptive hash index. Replaces the 3 separate arrays:
      btr_search_latches[], btr_search_sys->hash_tables,
      btr_search_sys->hash_tables[i]->heap.
      
      hash_table_t::heap, hash_table_t::adaptive: Remove.
      
      ha0ha.cc: Remove. Move all code to btr0sea.cc.
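      The layout change from parallel arrays to one structure per partition could look roughly like this. It is a sketch under stated assumptions: ahi_partition and ahi_sketch are hypothetical names, std::unordered_map and a vector of buffers stand in for InnoDB's own hash table and memory heap, and the partition is assumed to be chosen by index id modulo the number of partitions. Keeping the latch, table and heap together means that one lookup yields everything a caller needs for a given index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <vector>

// One struct per adaptive hash index partition, instead of three
// parallel arrays (latches, hash tables, heaps).
struct ahi_partition
{
  std::shared_mutex latch;                          // was btr_search_latches[i]
  std::unordered_map<uint64_t, const void*> table;  // was hash_tables[i]
  std::vector<std::unique_ptr<char[]>> heap;        // was hash_tables[i]->heap
};

class ahi_sketch
{
  std::size_t n_parts;
  std::unique_ptr<ahi_partition[]> parts;
public:
  explicit ahi_sketch(std::size_t n) : n_parts(n), parts(new ahi_partition[n]) {}

  // Assumption: all entries of one index map to the same partition.
  ahi_partition &get_part(uint64_t index_id)
  { return parts[index_id % n_parts]; }
};
```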
  7. 08 Jun, 2020 2 commits
    • MDEV-15053 follow-up to reduce buf_pool.mutex contention · 1b01833a
      Marko Mäkelä authored
      buf_LRU_make_block_young(): Merge with buf_page_make_young().
      
      buf_pool_check_no_pending_io(): Remove. Replaced with
      buf_pool.any_io_pending() and buf_pool.io_pending(),
      which do not unnecessarily acquire buf_pool.mutex.
      
      buf_pool_t::init_flush[]: Use atomic access, so that
      buf_flush_wait_LRU_batch_end() can avoid acquiring buf_pool.mutex.
      
      buf_pool_t::try_LRU_scan: Declare as bool.
    • MDEV-22827 InnoDB: Failing assertion: purge_sys->n_stop == 0 · f458b40f
      Marko Mäkelä authored
      When MDEV-22769 introduced srv_shutdown_state=SRV_SHUTDOWN_INITIATED in
      commit efc70da5
      we forgot to adjust a few checks for SRV_SHUTDOWN_NONE.
      
      In the initial shutdown step, we are waiting for the background
      DROP TABLE queue to be processed or discarded. At that time,
      some background tasks (such as buffer pool resizing or dumping
      or encryption key rotation) may be terminated, but others must
      remain running normally.
      
      srv_purge_coordinator_suspend(), srv_purge_coordinator_thread(),
      srv_start_wait_for_purge_to_start(): Treat SRV_SHUTDOWN_NONE
      and SRV_SHUTDOWN_INITIATED equally.
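      The fix amounts to treating the first two shutdown states alike wherever purge decides whether to keep running. In this minimal sketch the enum, its ordering, and the later states are assumptions for illustration; only the NONE and INITIATED states are named in the commit messages. purge_may_run() is a hypothetical helper, not an InnoDB function.

```cpp
#include <cassert>

// Hypothetical shutdown states. The ordering is an assumption; only the
// first two correspond to states named in the commit messages.
enum shutdown_state_sketch
{
  SKETCH_SHUTDOWN_NONE = 0,
  SKETCH_SHUTDOWN_INITIATED,  // waiting for the background DROP TABLE queue
  SKETCH_SHUTDOWN_CLEANUP,
  SKETCH_SHUTDOWN_LAST_PHASE
};

// Purge must keep running normally during the initial shutdown step,
// so NONE and INITIATED are treated equally.
inline bool purge_may_run(shutdown_state_sketch s)
{
  return s <= SKETCH_SHUTDOWN_INITIATED;
}
```

      A check written as `s == SKETCH_SHUTDOWN_NONE` would stop purge as soon as shutdown is initiated, which is the kind of mismatch this commit corrects.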
  8. 05 Jun, 2020 3 commits
    • MDEV-22769 Shutdown hang or crash due to XA breaking locks · efc70da5
      Marko Mäkelä authored
      The background drop table queue in InnoDB is a work-around for
      cases where the SQL layer is requesting DDL on tables on which
      transactional locks exist.
      
      One such case is XA transactions. Our test case exploits the
      fact that the recovery of XA PREPARE transactions will
      only resurrect InnoDB table locks, but not MDL that should
      block any concurrent DDL.
      
      srv_shutdown_t: Introduce the srv_shutdown_state=SRV_SHUTDOWN_INITIATED
      for the initial part of shutdown, to wait for the background drop
      table queue to be emptied.
      
      srv_shutdown_bg_undo_sources(): Assign
      srv_shutdown_state=SRV_SHUTDOWN_INITIATED
      before waiting for the background drop table queue to be emptied.
      
      row_drop_tables_for_mysql_in_background(): On slow shutdown, if
      no active transactions exist (excluding ones that are in
      XA PREPARE state), skip any tables on which locks exist.
      
      row_drop_table_for_mysql(): Do not unnecessarily attempt to
      drop InnoDB persistent statistics for tables that have
      already been added to the background drop table queue.
      
      row_mysql_close(): Relax an assertion, and free all memory
      even if innodb_force_recovery=2 would prevent the background
      drop table queue from being emptied.
    • MDEV-15053 Reduce buf_pool_t::mutex contention · b1ab211d
      Marko Mäkelä authored
      User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE
      and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0
      and will no longer report the PAGE_STATE value READY_FOR_USE.
      
      We will remove some fields from buf_page_t and move much code to
      member functions of buf_pool_t and buf_page_t, so that the access
      rules of data members can be enforced consistently.
      
      Evicting or adding pages in buf_pool.LRU will remain covered by
      buf_pool.mutex.
      
      Evicting or adding pages in buf_pool.page_hash will remain
      covered by both buf_pool.mutex and the buf_pool.page_hash X-latch.
      
      After this fix, buf_pool.page_hash lookups can entirely
      avoid acquiring buf_pool.mutex, only relying on
      buf_pool.hash_lock_get() S-latch.
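      The locking rule for lookups can be pictured with a simplified latch-partitioned hash. This is a sketch under stated assumptions. page_hash_sketch is hypothetical, and it uses one std::unordered_map per latch instead of a single hash table whose cells are covered by the latches. The commit also has inserts and removals hold buf_pool.mutex, which this sketch omits. A lookup takes only the S-latch covering the page id's hash value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical page hash protected by an array of reader-writer latches.
class page_hash_sketch
{
  static constexpr std::size_t N_LATCHES = 16;
  std::array<std::shared_mutex, N_LATCHES> latches;
  std::array<std::unordered_map<uint64_t, int>, N_LATCHES> cells;

  // Pick the latch (and cell group) from the hash of the page id.
  static std::size_t slot(uint64_t page_id)
  { return std::hash<uint64_t>{}(page_id) % N_LATCHES; }

public:
  // Modifications take the X-latch of the covering partition.
  void insert(uint64_t page_id, int frame)
  {
    const std::size_t i = slot(page_id);
    std::unique_lock<std::shared_mutex> x(latches[i]);
    cells[i][page_id] = frame;
  }

  // Lookups take only the S-latch; returns -1 if the page is absent.
  int lookup(uint64_t page_id)
  {
    const std::size_t i = slot(page_id);
    std::shared_lock<std::shared_mutex> s(latches[i]);
    auto it = cells[i].find(page_id);
    return it == cells[i].end() ? -1 : it->second;
  }
};
```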
      
      Similarly, buf_flush_check_neighbors() will rely solely on
      buf_pool.mutex, without any buf_pool.page_hash latch.
      
      The buf_pool.mutex is rather contended in I/O heavy benchmarks,
      especially when the workload does not fit in the buffer pool.
      
      The first attempt to alleviate the contention was the
      buf_pool_t::mutex split in
      commit 4ed7082e
      which introduced buf_block_t::mutex, which we are now removing.
      
      Later, multiple instances of buf_pool_t were introduced
      in commit c18084f7
      and recently removed by us in
      commit 1a6f708e (MDEV-15058).
      
      UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool
      related debugging in otherwise non-debug builds has not been used
      for years. Instead, we have been using UNIV_DEBUG, which is enabled
      in CMAKE_BUILD_TYPE=Debug.
      
      buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on
      std::atomic and the buf_pool.page_hash latches, and in some cases
      depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before.
      We must always release buf_block_t::lock before invoking
      unfix() or io_unfix(), to prevent a glitch where a block that was
      added to the buf_pool.free list would appear X-latched. See
      commit c5883deb for how this glitch
      was finally caught in a debug environment.
      
      We move some buf_pool_t::page_hash specific code from the
      ha and hash modules to buf_pool, for improved readability.
      
      buf_pool_t::close(): Assert that all blocks are clean, except
      on aborted startup or crash-like shutdown.
      
      buf_pool_t::validate(): No longer attempt to validate
      n_flush[] against the number of BUF_IO_WRITE fixed blocks,
      because buf_page_t::flush_type no longer exists.
      
      buf_pool_t::watch_set(): Replaces buf_pool_watch_set().
      Reduce mutex contention by separating the buf_pool.watch[]
      allocation and the insert into buf_pool.page_hash.
      
      buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a
      buf_pool.page_hash latch.
      Replaces and extends buf_page_hash_lock_s_confirm()
      and buf_page_hash_lock_x_confirm().
      
      buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES.
      
      buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads:
      Use Atomic_counter.
      
      buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out().
      
      buf_pool_t::LRU_remove(): Remove a block from the LRU list
      and return its predecessor. Incorporates buf_LRU_adjust_hp(),
      which was removed.
      
      buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(),
      for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by
      BTR_DELETE_OP (purge), which is never invoked on temporary tables.
      
      buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments.
      
      buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition.
      
      buf_LRU_free_page(): Clarify the function comment.
      
      buf_flush_check_neighbor(), buf_flush_check_neighbors():
      Rewrite the construction of the page hash range. We will hold
      the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64)
      consecutive lookups of buf_pool.page_hash.
      
      buf_flush_page_and_try_neighbors(): Remove.
      Merge to its only callers, and remove redundant operations in
      buf_flush_LRU_list_batch().
      
      buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite.
      Do not acquire buf_pool.mutex, and iterate directly with page_id_t.
      
      ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined
      and avoids any loops.
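      One common loop-free way to round a 32-bit value up to the next power of 2 is bit smearing, sketched below. round_up_pow2() is a hypothetical name, and this is not claimed to be the source of my_round_up_to_next_power(). Decrementing first keeps exact powers of 2 unchanged. The shifts then copy the highest set bit into every lower position, and the final increment carries into the next power.

```cpp
#include <cassert>
#include <cstdint>

// Loop-free round-up to the next power of 2 (bit-smearing trick).
// Edge cases: round_up_pow2(0) returns 0, and inputs above 2^31 wrap to 0;
// callers are assumed to avoid those inputs.
constexpr uint32_t round_up_pow2(uint32_t v)
{
  v--;
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  return v + 1;
}
```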
      
      fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove.
      
      buf_flush_page(): Add a fil_space_t* parameter. Minimize the
      buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated
      atomically with the io_fix, and we will protect most buf_block_t
      fields with buf_block_t::lock. The function
      buf_flush_write_block_low() is removed and merged here.
      
      buf_page_init_for_read(): Use static linkage. Initialize the newly
      allocated block and acquire the exclusive buf_block_t::lock while not
      holding any mutex.
      
      IORequest::IORequest(): Remove the body. We only need to invoke
      set_punch_hole() in buf_flush_page() and nowhere else.
      
      buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type.
      This field is only used during a fil_io() call.
      That function already takes IORequest as a parameter, so we had
      better introduce  for the rarely changing field.
      
      buf_block_t::init(): Replaces buf_page_init().
      
      buf_page_t::init(): Replaces buf_page_init_low().
      
      buf_block_t::initialise(): Initialise many fields, but
      keep the buf_page_t::state(). Both buf_pool_t::validate() and
      buf_page_optimistic_get() require that buf_page_t::in_file()
      be protected atomically with buf_page_t::in_page_hash
      and buf_page_t::in_LRU_list.
      
      buf_page_optimistic_get(): Now that buf_block_t::mutex
      no longer exists, we must check buf_page_t::io_fix()
      after acquiring the buf_pool.page_hash lock, to detect
      whether buf_page_init_for_read() has been initiated.
      We will also check the io_fix() before acquiring hash_lock
      in order to avoid unnecessary computation.
      The field buf_block_t::modify_clock (protected by buf_block_t::lock)
      allows buf_page_optimistic_get() to validate the block.
      
      buf_page_t::real_size: Remove. It was only used while flushing
      pages of page_compressed tables.
      
      buf_page_encrypt(): Add an output parameter that allows us to eliminate
      buf_page_t::real_size. Replace a condition with debug assertion.
      
      buf_page_should_punch_hole(): Remove.
      
      buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch().
      Add the parameter size (to replace buf_page_t::real_size).
      
      buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page().
      Add the parameter size (to replace buf_page_t::real_size).
      
      fil_system_t::detach(): Replaces fil_space_detach().
      Ensure that fil_validate() will not be violated even if
      fil_system.mutex is released and reacquired.
      
      fil_node_t::complete_io(): Renamed from fil_node_complete_io().
      
      fil_node_t::close_to_free(): Replaces fil_node_close_to_free().
      Avoid invoking fil_node_t::close() because fil_system.n_open
      has already been decremented in fil_space_t::detach().
      
      BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY.
      
      BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE,
      and distinguish dirty pages by buf_page_t::oldest_modification().
      
      BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead.
      This state was only being used for buf_page_t that are in
      buf_pool.watch.
      
      buf_pool_t::watch[]: Remove pointer indirection.
      
      buf_page_t::in_flush_list: Remove. It was set if and only if
      buf_page_t::oldest_modification() is nonzero.
      
      buf_page_decrypt_after_read(), buf_corrupt_page_release(),
      buf_page_check_corrupt(): Change the const fil_space_t* parameter
      to const fil_node_t& so that we can report the correct file name.
      
      buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function.
      
      buf_page_io_complete(): Split to buf_page_read_complete() and
      buf_page_write_complete().
      
      buf_dblwr_t::in_use: Remove.
      
      buf_dblwr_t::buf_block_array: Add IORequest::flush_t.
      
      buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of
      os_aio_wait_until_no_pending_writes().
      
      buf_flush_write_complete(): Declare static, not global.
      Add the parameter IORequest::flush_t.
      
      buf_flush_freed_page(): Simplify the code.
      
      recv_sys_t::flush_lru: Renamed from flush_type and changed to bool.
      
      fil_read(), fil_write(): Replaced with direct use of fil_io().
      
      fil_buffering_disabled(): Remove. Check srv_file_flush_method directly.
      
      fil_mutex_enter_and_prepare_for_io(): Return the resolved
      fil_space_t* to avoid a duplicated lookup in the caller.
      
      fil_report_invalid_page_access(): Clean up the parameters.
      
      fil_io(): Return fil_io_t, which comprises fil_node_t and error code.
      Always invoke fil_space_t::acquire_for_io() and let either the
      sync=true caller or fil_aio_callback() invoke
      fil_space_t::release_for_io().
      
      fil_aio_callback(): Rewrite to replace buf_page_io_complete().
      
      fil_check_pending_operations(): Remove a parameter, and remove some
      redundant lookups.
      
      fil_node_close_to_free(): Wait for n_pending==0. Because we no longer
      do an extra lookup of the tablespace between fil_io() and the
      completion of the operation, we must give fil_node_t::complete_io() a
      chance to decrement the counter.
      
      fil_close_tablespace(): Remove unused parameter trx, and document
      that this is only invoked during the error handling of IMPORT TABLESPACE.
      
      row_import_discard_changes(): Merged with the only caller,
      row_import_cleanup(). Do not lock up the data dictionary while
      invoking fil_close_tablespace().
      
      logs_empty_and_mark_files_at_shutdown(): Do not invoke
      fil_close_all_files(), to avoid a !needs_flush assertion failure
      on fil_node_t::close().
      
      innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files().
      
      fil_close_all_files(): Invoke fil_flush_file_spaces()
      to ensure proper durability.
      
      thread_pool::unbind(): Fix a crash that would occur on Windows
      after srv_thread_pool->disable_aio() and os_file_close().
      This fix was submitted by Vladislav Vaintroub.
      
      Thanks to Matthias Leich and Axel Schwenke for extensive testing,
      Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.
  9. 04 Jun, 2020 1 commit
    • MDEV-22721 Remove bloat caused by InnoDB logger class · eba2d10a
      Marko Mäkelä authored
      Introduce a new ATTRIBUTE_NOINLINE to
      ib::logger member functions, and add UNIV_UNLIKELY hints to callers.
      
      Also, remove some crash reporting output. If needed, the
      information will be available using debugging tools.
      
      Furthermore, remove some fts_enable_diag_print output that included
      indexed words in raw form. The code seemed to assume that words are
      NUL-terminated byte strings. It is not clear whether a NUL terminator
      is always guaranteed to be present. Also, UCS2 or UTF-16 strings would
      typically contain many NUL bytes.
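      The two techniques named above, out-of-line logging functions and branch-probability hints at call sites, can be sketched for GCC and clang. SKETCH_NOINLINE and SKETCH_UNLIKELY are hypothetical stand-ins for ATTRIBUTE_NOINLINE and UNIV_UNLIKELY. Their definitions via __attribute__((noinline)) and __builtin_expect are an assumption that holds only for those compilers. log_error() and checked_read() are illustrative helpers, not ib::logger members.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-ins for the macros named in the commit (GCC/clang only).
#define SKETCH_NOINLINE __attribute__((noinline))
#define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)

static int n_logged = 0;  // counts calls, only for the usage check

// Kept out of line: each call site contains just a call instruction,
// not an inlined copy of the formatting code.
SKETCH_NOINLINE void log_error(const std::string &msg)
{
  std::fprintf(stderr, "[ERROR] %s\n", msg.c_str());
  ++n_logged;
}

inline int checked_read(int err)
{
  if (SKETCH_UNLIKELY(err != 0))  // hint: the error branch is cold
    log_error("read failed with code " + std::to_string(err));
  return err;
}
```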
  10. 03 Jun, 2020 2 commits
    • MDEV-21751 innodb_fast_shutdown=0 can be unnecessarily slow · bee4b044
      Vladislav Vaintroub authored
      Max out the parallel purge worker tasks on slow shutdown, to speed it up.
    • MDEV-22646 Assertion `table2->cached' failed in dict_table_t::add_to_cache · ad2bf112
      Thirunarayanan Balathandayuthapani authored
      Problem:
      ========
      During buffer pool resizing, InnoDB recreates the dictionary hash
      tables. The dictionary hash tables reuse the heap of the AHI hash
      tables, which leads to memory corruption.
      
      Fix:
      ====
      - While disabling AHI, free the heap and AHI hash tables. Recreate the
      AHI hash tables and assign new heap when AHI is enabled.
      
      - btr_blob_free() accesses an invalid page if the page was reallocated
      during buffer pool resizing. So btr_blob_free() should get the page from
      buf_pool instead of using the existing block.
      
      - btr_search_enabled and block->index should be checked after
      acquiring the btr_search_sys latch
      
      - Moved the buffer_pool_scan debug sync point earlier, before accessing
      the btr_search_sys latches, to avoid a hang of the truncate_purge_debug
      test case.
      
      - srv_printf_innodb_monitor() should acquire the btr_search_sys latches
      before accessing the AHI hash tables.
  11. 22 May, 2020 1 commit
  12. 12 May, 2020 1 commit
  13. 30 Apr, 2020 1 commit
    • split log_t::buf into two buffers · 7f9dc0d8
      Eugene Kosov authored
      This patch may help catch problems like buffer overflows.
      
      log_t::first_in_use: removed
      
      log_t::buf: this is where mtr_t are supposed to append data
      log_t::flush_buf: this is the buffer from which the server writes to a file
      
      The two buffers are std::swap()ped when a thread is about to write
      to a file.
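      The double-buffering scheme can be sketched in a few lines. log_sketch is a hypothetical class, std::string stands in for the raw log buffers, and the returned string stands in for the file write. Appenders only ever touch buf under the mutex. The writer swaps the buffers under the mutex and then writes flush_buf without holding it, so appends are not blocked during I/O. The sketch assumes that only one thread calls write_out() at a time.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical double-buffered log.
class log_sketch
{
  std::mutex mutex;
  std::string buf;        // mini-transactions append here
  std::string flush_buf;  // the writer writes this to the file
public:
  void append(const std::string &rec)
  {
    std::lock_guard<std::mutex> g(mutex);
    buf += rec;
  }

  // Assumes a single writer thread. Returns what would be written.
  std::string write_out()
  {
    {
      std::lock_guard<std::mutex> g(mutex);
      std::swap(buf, flush_buf);  // appenders continue on the empty buffer
    }
    std::string written = flush_buf;  // stands in for the file write
    flush_buf.clear();
    return written;
  }
};
```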
  14. 28 Apr, 2020 1 commit
    • MDEV-22177 more fsync() -> fdatasync() in InnoDB · 0cd2b4c2
      Eugene Kosov authored
      Replace all fsync() with fdatasync() when possible (e.g. on Linux).
      
      InnoDB doesn't care about file timestamps. So, to achieve better
      performance, it makes sense to use fdatasync() everywhere.
      
      file_io::flush(): renamed from flush_data_only()
      
      os_file_flush_data(): removed
      
      os_file_sync_posix(): renamed from os_file_fsync_posix(). Now it uses
      fdatasync() when it's available.
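      The fallback logic can be sketched for POSIX systems. file_sync() is a hypothetical helper, not os_file_sync_posix(). Using `#if defined(__linux__)` to detect fdatasync() is an assumption, since real build systems usually probe for the function at configure time. fdatasync() may skip flushing metadata such as timestamps, which InnoDB does not need. The call is retried if it is interrupted by a signal.

```cpp
#include <cassert>
#include <cerrno>
#include <stdlib.h>  // mkstemp, used by the usage check
#include <unistd.h>

// Flush a file's data to durable storage, preferring fdatasync() where it
// is assumed to be available and falling back to fsync() otherwise.
inline bool file_sync(int fd)
{
  for (;;)
  {
#if defined(__linux__)
    int ret = fdatasync(fd);
#else
    int ret = fsync(fd);
#endif
    if (ret == 0)
      return true;
    if (errno != EINTR)  // retry only if interrupted by a signal
      return false;
  }
}
```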
  15. 07 Apr, 2020 1 commit
  16. 31 Mar, 2020 1 commit
  17. 30 Mar, 2020 1 commit
    • Cleanup recv_sys: Move things to members · aae3f921
      Marko Mäkelä authored
      recv_sys.recovery_on: Replaces recv_recovery_on.
      
      recv_sys_t::apply(): Replaces recv_apply_hashed_log_recs().
      
      recv_sys_var_init(): Remove.
      
      recv_sys_t::recover_low(): Attempt to initialize a page based
      on buffered redo log records.
  18. 18 Mar, 2020 1 commit
    • MDEV-21962 Allocate buf_pool statically · a786f50d
      Marko Mäkelä authored
      Thanks to MDEV-15058, there is only one InnoDB buffer pool.
      Allocating buf_pool statically removes one level of pointer indirection
      and makes code more readable, and removes the awkward initialization of
      some buf_pool members.
      
      While doing this, we will also declare some buf_pool_t data members
      private and replace some functions with member functions. This is
      mostly affecting buffer pool resizing.
      
      This is not aiming to be a complete rewrite of buf_pool_t to
      a proper class. Most of the buffer pool interface, such as
      buf_page_get_gen(), will remain in the C programming style
      for now.
      
      buf_pool_t::withdrawing: Replaces buf_pool_withdrawing.
      buf_pool_t::withdraw_clock_: Replaces buf_withdraw_clock.
      
      buf_pool_t::create(): Replaces buf_pool_init().
      buf_pool_t::close(): Replaces buf_pool_free().
      
      buf_pool_t::will_be_withdrawn(): Replaces buf_block_will_be_withdrawn(),
      buf_frame_will_be_withdrawn().
      
      buf_pool_t::clear_hash_index(): Replaces buf_pool_clear_hash_index().
      buf_pool_t::get_n_pages(): Replaces buf_pool_get_n_pages().
      buf_pool_t::validate(): Replaces buf_validate().
      buf_pool_t::print(): Replaces buf_print().
      buf_pool_t::block_from_ahi(): Replaces buf_block_from_ahi().
      buf_pool_t::is_block_field(): Replaces buf_pointer_is_block_field().
      buf_pool_t::is_block_mutex(): Replaces buf_pool_is_block_mutex().
      buf_pool_t::is_block_lock(): Replaces buf_pool_is_block_lock().
      buf_pool_t::is_obsolete(): Replaces buf_pool_is_obsolete().
      buf_pool_t::io_buf: Make default-constructible.
      buf_pool_t::io_buf::create(): Delayed 'constructor'
      buf_pool_t::io_buf::close(): Early 'destructor'
      
      HazardPointer: Make default-constructible. Define all member functions
      inline, also for derived classes.
  19. 16 Mar, 2020 1 commit
    • cleanup redo log · 0c2365c4
      Eugene Kosov authored
      Write the log header just once, when the file is created, instead of
      writing to it on every log file wrap-around.
      
      log_t::file::write_header_durable(): writes the log header
      
      log_write_buf(): no longer writes to the log header
  20. 12 Mar, 2020 1 commit
    • MDEV-21907: InnoDB: Enable -Wconversion on clang and GCC · f2245252
      Marko Mäkelä authored
      The -Wconversion in GCC seems to be stricter than in clang.
      GCC at least since version 4.4.7 issues truncation warnings for
      assignments to bitfields, while clang 10 appears to only issue
      warnings when the sizes in bytes rounded to the nearest integer
      powers of 2 are different.
      
      Before GCC 10.0.0, -Wconversion required more casts and would not
      allow some operations, such as x<<=1 or x+=1 on a data type that
      is narrower than int.
      
      GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining
      about x|=y even when x and y are compatible types that are narrower
      than int.  Hence, we must rewrite some x|=y as
      x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion.
      
      In GCC 6 and later, the warning for assigning wider values to bitfields
      that are narrower than 8, 16, or 32 bits can be suppressed by
      applying a bitwise & with the exact bitmask of the bitfield.
      For older GCC, we must disable -Wconversion for GCC 4 or 5 in such
      cases.
      
      The bitwise negation operator appears to promote short integers
      to a wider type, and hence we must add explicit truncation casts
      around them. Microsoft Visual C does not allow a static_cast to
      truncate a constant, such as static_cast<byte>(1) truncating int.
      Hence, we will use the constructor-style cast byte(~1) for such cases.
      
      This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0,
      clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019)
      on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
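      The three rewriting patterns described above can be shown in isolation. In this sketch, `byte` is assumed to be an 8-bit unsigned typedef. flags_sketch and the helper functions are hypothetical. The comments restate the compiler behaviour reported in the commit message rather than independently verified behaviour for each compiler version.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t byte;  // assumption: byte is an 8-bit unsigned type

struct flags_sketch
{
  unsigned kind : 3;  // a bitfield narrower than 8 bits
};

// x | y is computed in int; truncating explicitly avoids the
// implicit-conversion warning reported for some GCC versions.
inline byte set_bits(byte x, byte y)
{
  x = static_cast<byte>(x | y);
  return x;
}

// Masking with the exact bitfield mask lets GCC 6 and later see that
// the value fits into 3 bits.
inline void set_kind(flags_sketch &f, unsigned v)
{
  f.kind = v & 7;
}

// ~1 is an int constant; the constructor-style cast byte(~1) truncates it,
// which the commit message says MSVC rejects for static_cast.
inline byte clear_lowest_bit(byte x)
{
  return static_cast<byte>(x & byte(~1));
}
```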
  21. 11 Mar, 2020 1 commit
  22. 10 Mar, 2020 1 commit
    • MDEV-15528 Punch holes when pages are freed · a5584b13
      Thirunarayanan Balathandayuthapani authored
      The following parameters are deprecated:
      
        innodb-background-scrub-data-uncompressed
        innodb-background-scrub-data-compressed
        innodb-background-scrub-data-interval
        innodb-background-scrub-data-check-interval
      
      Removed the scrubbing code completely (btr0scrub.h, btr0scrub.cc)
      Removed the information_schema.innodb_tablespaces_scrubbing table
      Removed the scrubbing logic from fil_crypt_thread()
  23. 07 Mar, 2020 1 commit
  24. 05 Mar, 2020 1 commit
    • MDEV-14425 Cleanup: Use std::atomic for some log_sys members · a4ab54d7
      Marko Mäkelä authored
      Some fields were protected by log_sys.mutex, which adds considerable
      overhead for readers. Some readers were performing dirty reads.
      
      log_t::lsn: Declare private and atomic. Add wrappers get_lsn()
      and set_lsn() that will use relaxed memory access. Many accesses
      to log_sys.lsn are still protected by log_sys.mutex; we avoid the
      mutex for some readers.
      
      log_t::flushed_to_disk_lsn: Declare private and atomic, and move
      to the same cache line with log_t::lsn.
      
      log_t::buf_free: Declare as size_t, and move to the same cache line
      with log_t::lsn.
      
      log_t::check_flush_or_checkpoint_: Declare private and atomic,
      and move to the same cache line with log_t::lsn.
      
      log_get_lsn(): Define as an alias of log_sys.get_lsn().
      
      log_get_lsn_nowait(), log_peek_lsn(): Remove.
      
      log_get_flush_lsn(): Define as an alias of log_sys.get_flush_lsn().
      
      log_t::initiate_write(): Replaces log_buffer_sync_in_background().
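      The get_lsn()/set_lsn() wrappers with relaxed memory access can be sketched as follows. log_lsn_sketch is a hypothetical class, and lsn_t is assumed to be a 64-bit unsigned integer. Writers are assumed to keep serializing updates through the mutex. Readers that only need a recent value load the atomic directly without taking it, which avoids the data race of a plain dirty read.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

typedef uint64_t lsn_t;  // assumption: LSNs are 64-bit unsigned

class log_lsn_sketch
{
  std::atomic<lsn_t> lsn{0};
public:
  std::mutex mutex;  // still serializes writers

  // Relaxed load: returns some recent value, without taking the mutex
  // and without ordering guarantees relative to other memory.
  lsn_t get_lsn() const { return lsn.load(std::memory_order_relaxed); }

  // Callers are assumed to hold mutex.
  void set_lsn(lsn_t l) { lsn.store(l, std::memory_order_relaxed); }
};
```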
  25. 04 Mar, 2020 2 commits
    • MDEV-21870 Deprecate and ignore innodb_scrub_log and innodb_scrub_log_speed · 64be4ab4
      Marko Mäkelä authored
      The configuration parameter innodb_scrub_log never really worked, as
      reported in MDEV-13019 and MDEV-18370.
      
      Because MDEV-14425 is changing the redo log format, the innodb_scrub_log
      feature would have to be adjusted for it. Due to the known problems,
      it is easier to remove the feature for now, and to ignore and deprecate
      the parameters.
      
      If old log contents should be kept secret, then enabling innodb_encrypt_log
      or setting a smaller innodb_log_file_size could help.
    • Cleanup: Make MONITOR_LSN_CHECKPOINT_AGE a value. · 9e488653
      Marko Mäkelä authored
      Compute MONITOR_LSN_CHECKPOINT_AGE on demand in
      srv_mon_process_existing_counter().
      This allows us to remove the overhead of MONITOR_SET
      calls for the counter.
  26. 02 Mar, 2020 1 commit
  27. 01 Mar, 2020 1 commit
    • MDEV-21534 - Improve innodb redo log group commit performance · 30ea63b7
      Vladislav Vaintroub authored
      Introduce a special synchronization primitive, group_commit_lock,
      for more efficient synchronization of redo log writing and flushing.
      
      The goal is to reduce CPU consumption on log_write_up_to, to reduce
      the spurious wakeups, and improve the throughput in write-intensive
      benchmarks.
      30ea63b7
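To show the general group commit idea, the sketch below makes the first waiting thread a leader. The leader writes and flushes on behalf of every request registered so far, and the other threads wait for it. This is a hypothetical simplification, not the actual group_commit_lock. It uses a single std::condition_variable with notify_all(), which still causes the kind of wakeups the real primitive aims to reduce. It also has no error handling.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical leader/follower group commit; NOT the real group_commit_lock.
class group_commit_sketch {
 public:
  // Make everything up to `lsn` durable. The first caller that finds no
  // write in progress becomes the leader and covers every request seen so
  // far; the others wait. Returns true if this caller did the write.
  template <typename WriteFn>
  bool write_up_to(uint64_t lsn, WriteFn write_and_flush) {
    std::unique_lock<std::mutex> lk(mutex_);
    if (lsn > requested_) requested_ = lsn;  // register this request
    while (flushed_ < lsn) {
      if (!write_in_progress_) {
        write_in_progress_ = true;
        const uint64_t target = requested_;  // >= lsn, may cover others too
        lk.unlock();
        write_and_flush(target);  // one write+flush for the whole group
        lk.lock();
        assert(target >= flushed_);  // leaders are serialized, so this grows
        flushed_ = target;
        write_in_progress_ = false;
        cv_.notify_all();
        return true;
      }
      cv_.wait(lk);  // a leader is busy; re-check after it finishes
    }
    return false;  // another thread's write already covered `lsn`
  }

  uint64_t flushed_lsn() {
    std::lock_guard<std::mutex> g(mutex_);
    return flushed_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  uint64_t requested_ = 0;  // highest LSN any caller has asked for
  uint64_t flushed_ = 0;    // highest LSN known to be durable
  bool write_in_progress_ = false;
};
```

The point of the design is that one expensive flush satisfies every waiter whose LSN is at or below the leader's target. A caller whose LSN is already durable returns without writing at all.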
  28. 20 Feb, 2020 1 commit
    • Cleanup: Remove dict_ind_redundant · 96901d95
      Marko Mäkelä authored
      There is no reason for the dummy index object dict_ind_redundant
      to exist any more. It was only being passed to btr_create().
      
      btr_create(): If !index, assume that a ROW_FORMAT=REDUNDANT
      table is being created.
      
      We could pass ibuf.index, dict_sys.sys_tables->indexes.start
      and so on, if those objects had been initialized before the
      function btr_create() is called.
      96901d95
  29. 19 Feb, 2020 1 commit
    • MDEV-14425 deprecate and ignore innodb_log_files_in_group · 9ef2d29f
      Eugene Kosov authored
      There is now only one redo log file, instead of several files
      that logically worked as a single file.
      
      Possible names of redo log files: ib_logfile0, or
      ib_logfile101 (for a file that has just been created).
      
      innodb_log_files_in_group: The value of this variable is not used
      by InnoDB. The permitted values are still 1..100, so as not to break upgrades.
      
      LOG_FILE_NAME: Add a constant with the value "ib_logfile0".
      LOG_FILE_NAME_PREFIX: Add a constant with the value "ib_logfile".
      
      get_log_file_path(): A convenience function that returns the full
      path of a redo log file.
      
      SRV_N_LOG_FILES_MAX: removed
      
      srv_n_log_files: This cannot be removed for compatibility reasons,
      but the server no longer uses this variable.
      
      log_sys_t::file::fd: Now a single descriptor instead of a std::vector.
      
      log_sys_t::log_capacity: Renamed to drop the word 'group'.
      
      find_and_check_log_file(): Part of the logic of the huge srv_start()
      was moved here.
      
      recv_sys_t::files: File descriptors of redo log files.
      There can be several of them when upgrading
      from an older MariaDB version.
      
      recv_sys_t::remove_extra_log_files: Whether to remove
      ib_logfile{1,2,3...} after a successful upgrade.
      
      recv_sys_t::read(): Open the files if needed, and read from one
      of several log files.
      
      recv_sys_t::files_size(): Open the files if needed, and return the number of files.
      
      redo_file_sizes_are_correct(): Check that all redo log files
      have the same size. This is only used to report an error to the user.
      The corresponding check was moved from srv0start.cc.
      
      namespace deprecated: All deprecated variables are placed here to
      prevent developers from accidentally using them.
      9ef2d29f
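Here is a small C++ sketch of the file naming scheme described above. The helper functions are hypothetical, and the constant types and the '/' path separator are assumptions. The real get_log_file_path() and redo_file_sizes_are_correct() may have different signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Values as described in the commit message; the declaration style is ASSUMED.
constexpr const char* LOG_FILE_NAME_PREFIX = "ib_logfile";
constexpr const char* LOG_FILE_NAME = "ib_logfile0";

// Hypothetical helper: join a log directory and a file name,
// assuming '/' as the path separator.
inline std::string get_log_file_path_sketch(const std::string& dir,
                                            const std::string& name = LOG_FILE_NAME) {
  std::string path = dir;
  if (!path.empty() && path.back() != '/') path += '/';
  return path + name;
}

// Name of an extra file left over from the old multi-file layout
// (ib_logfile0 is the main file; extra files are numbered from 1).
inline std::string legacy_log_file_name(unsigned n) {
  assert(n > 0);
  return LOG_FILE_NAME_PREFIX + std::to_string(n);
}

// In the spirit of redo_file_sizes_are_correct(): true if all sizes match.
inline bool log_file_sizes_equal(const std::vector<uint64_t>& sizes) {
  for (uint64_t s : sizes)
    if (s != sizes.front()) return false;
  return true;
}
```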
  30. 13 Feb, 2020 2 commits
    • MDEV-12353: Change the redo log encoding · 7ae21b18
      Marko Mäkelä authored
      log_t::FORMAT_10_5: physical redo log format tag
      
      log_phys_t: Buffered records in the physical format.
      The log record bytes will follow the last data field,
      making use of alignment padding that would otherwise be wasted.
      If there are multiple records for the same page, also those
      may be appended to an existing log_phys_t object if the memory
      is available.
      
      In the physical format, the first byte of a record identifies the
      record and its length (up to 15 bytes). For longer records, the
      immediately following bytes will encode the remaining length
      in a variable-length encoding. Usually, a variable-length-encoded
      page identifier will follow, followed by optional payload, whose
      length is included in the initially encoded total record length.
      
      When a mini-transaction is updating multiple fields in a page,
      it can avoid repeating the tablespace identifier and page number
      by setting the same_page flag (most significant bit) in the first
      byte of the log record. The byte offset of the record will be
      relative to where the previous record for that page ended.
      
      Until MDEV-14425 introduces a separate file-level log for
      redo log checkpoints and file operations, we will write the
      file-level records in the page-level redo log file.
      The record FILE_CHECKPOINT (which replaces MLOG_CHECKPOINT)
      will be removed in MDEV-14425, and one sequential scan of the
      page recovery log will suffice.
      
      Compared to MLOG_FILE_CREATE2, FILE_CREATE will not include any flags.
      If the information is needed, it can be parsed from WRITE records that
      modify FSP_SPACE_FLAGS.
      
      MLOG_ZIP_WRITE_STRING: Remove. The record was only introduced temporarily
      as part of this work, before being replaced with WRITE (along with
      MLOG_WRITE_STRING, MLOG_1BYTE, MLOG_nBYTES).
      
      mtr_buf_t::empty(): Check if the buffer is empty.
      
      mtr_t::m_n_log_recs: Remove. It suffices to check if m_log is empty.
      
      mtr_t::m_last, mtr_t::m_last_offset: End of the latest m_log record,
      for the same_page encoding.
      
      page_recv_t::last_offset: Reflects mtr_t::m_last_offset.
      
      Valid values for last_offset during recovery should be 0 or above 8.
      (The first 8 bytes of a page are the checksum and the page number,
      and neither are ever updated directly by log records.)
      Internally, the special value 1 indicates that the same_page form
      will not be allowed for the subsequent record.
      
      mtr_t::page_create(): Take the block descriptor as parameter,
      so that it can be compared to mtr_t::m_last. The INIT_INDEX_PAGE
      record will always be followed by a subtype byte, because same_page
      records must be longer than 1 byte.
      
      trx_undo_page_init(): Combine the writes into a WRITE record.
      
      trx_undo_header_create(): Write 4 bytes using a special MEMSET
      record that includes 1 byte of length and 2 bytes of payload.
      
      flst_write_addr(): Define as a static function. Combine the writes.
      
      flst_zero_both(): Replaces two flst_zero_addr() calls.
      
      flst_init(): Do not inline the function.
      
      fsp_free_seg_inode(): Zerofill the whole inode.
      
      fsp_apply_init_file_page(): Initialize FIL_PAGE_PREV and FIL_PAGE_NEXT
      to FIL_NULL when using the physical format.
      
      btr_create(): Assert !page_has_siblings() because fsp_apply_init_file_page()
      must have been invoked.
      
      fil_ibd_create(): Do not write FILE_MODIFY after FILE_CREATE.
      
      fil_names_dirty_and_write(): Remove the parameter mtr.
      Write the records using a separate mini-transaction object,
      because any FILE_ records must be at the start of a mini-transaction log.
      
      recv_recover_page(): Add a fil_space_t* parameter.
      After applying log to a ROW_FORMAT=COMPRESSED page,
      invoke buf_zip_decompress() to restore the uncompressed page.
      
      buf_page_io_complete(): Remove the temporary hack to discard the
      uncompressed page of a ROW_FORMAT=COMPRESSED page.
      
      page_zip_write_header(): Remove. Use mtr_t::write() or
      mtr_t::memset() instead, and update the compressed page frame
      separately.
      
      trx_undo_header_add_space_for_xid(): Remove.
      
      trx_undo_seg_create(): Perform the changes that were previously
      made by trx_undo_header_add_space_for_xid().
      
      btr_reset_instant(): New function: Reset the table to MariaDB 10.2
      or 10.3 format when rolling back an instant ALTER TABLE operation.
      
      page_rec_find_owner_rec(): Merge with the only callers.
      
      page_cur_insert_rec_low(): Combine writes by using a local buffer.
      MEMMOVE data from the preceding record whenever feasible
      (copying at least 3 bytes).
      
      page_cur_insert_rec_zip(): Combine writes to page header fields.
      
      PageBulk::insertPage(): Issue MEMMOVE records to copy a matching
      part from the preceding record.
      
      PageBulk::finishPage(): Combine the writes to the page header
      and to the sparse page directory slots.
      
      mtr_t::write(): Only log the least significant (last) bytes
      of multi-byte fields that actually differ.
      
      For updating FSP_SIZE, we must always write all 4 bytes to the
      redo log, so that the fil_space_set_recv_size() logic in
      recv_sys_t::parse() will work.
      
      mtr_t::memcpy(), mtr_t::zmemcpy(): Take a pointer argument
      instead of a numeric offset to the page frame. Only log the
      last bytes of multi-byte fields that actually differ.
      
      In fil_space_crypt_t::write_page0(), we must also log any
      unchanged bytes, so that recovery will recognize the record
      and invoke fil_crypt_parse().
      
      Future work:
      MDEV-21724 Optimize page_cur_insert_rec_low() redo logging
      MDEV-21725 Optimize btr_page_reorganize_low() redo logging
      MDEV-21727 Optimize redo logging for ROW_FORMAT=COMPRESSED
      7ae21b18
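The record header and the rule of logging only the bytes that differ can be sketched in C++ as follows. The commit message states only that the most significant bit is the same_page flag and that short records encode a length of up to 15 bytes. The 3-bit type field, the use of 0 to mean "length follows", and the sample type value 0x30 are assumptions for illustration. Check mtr0types.h in the MariaDB source for the authoritative layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// First byte of a log record, ASSUMED layout for illustration only:
//   bit 7     same_page flag (stated in the commit message)
//   bits 6..4 record type (assumption)
//   bits 3..0 length 1..15, or 0 if the length follows (assumption)
constexpr uint8_t SAME_PAGE_FLAG = 0x80;
constexpr uint8_t TYPE_MASK = 0x70;
constexpr uint8_t LEN_MASK = 0x0f;

struct rec_first_byte {
  bool same_page;
  uint8_t type;       // type bits kept in place, e.g. 0x30
  uint8_t short_len;  // 0 means the length is encoded in later bytes
};

constexpr uint8_t encode_first_byte(bool same_page, uint8_t type, std::size_t len) {
  return static_cast<uint8_t>((same_page ? SAME_PAGE_FLAG : 0) | (type & TYPE_MASK) |
                              (len <= LEN_MASK ? len : 0));
}

inline rec_first_byte decode_first_byte(uint8_t b) {
  return rec_first_byte{(b & SAME_PAGE_FLAG) != 0,
                        static_cast<uint8_t>(b & TYPE_MASK),
                        static_cast<uint8_t>(b & LEN_MASK)};
}

// The idea behind mtr_t::write() logging only differing bytes: for a
// big-endian field, log from the first differing byte to the end.
// Returns how many trailing bytes must be logged (0 if nothing changed).
inline std::size_t changed_suffix_len(const uint8_t* old_val,
                                      const uint8_t* new_val, std::size_t n) {
  for (std::size_t i = 0; i < n; i++)
    if (old_val[i] != new_val[i]) return n - i;
  return 0;
}
```

changed_suffix_len() assumes big-endian fields, which is how InnoDB stores them, so the least significant bytes come last. As noted above, some writes must still be logged in full. Examples are FSP_SIZE and the page 0 record written by fil_space_crypt_t::write_page0().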
    • MDEV-12353: Write log by mtr_t member functions only · f37a29dd
      Marko Mäkelä authored
      mtr_t::log_write_low(): Replaces mlog_write_initial_log_record_low().
      
      mtr_t::log_file_op(): Replaces fil_op_write_log().
      
      mtr_t::free(): Write MLOG_INIT_FREE_PAGE.
      
      mtr_t::init(): Write MLOG_INIT_FILE_PAGE2.
      
      mtr_t::page_create(): Write record about the partial initialization
      of an index page.
      
      mlog_catenate_ulint(), mlog_catenate_string(),
      mlog_open(), mlog_close(): Remove.
      f37a29dd
  31. 12 Feb, 2020 1 commit
    • MDEV-15058: Deprecate and ignore innodb_buffer_pool_instances · 1a6f708e
      Marko Mäkelä authored
      Our benchmarking efforts indicate that the reasons for splitting the
      buf_pool in commit c18084f7
      have mostly gone away, possibly as a result of
      mysql/mysql-server@ce6109ebfdedfdf185e391a0c97dc6d33867ed78
      or similar work.
      
      Only in one write-heavy benchmark, where the working set size was
      ten times the buffer pool size, was buf_pool->mutex in
      buf_page_io_complete() less contended with 4 buffer pool instances
      than with 1 instance. That contention could be reduced by making
      more use of std::atomic and by further splitting
      buf_pool_t::mutex (MDEV-15053).
      
      We will deprecate and ignore the following parameters:
      
      	innodb_buffer_pool_instances
      	innodb_page_cleaners
      
      There will be only one buffer pool and one page cleaner task.
      
      In a number of INFORMATION_SCHEMA views, columns that indicated
      the buffer pool instance will be removed:
      
      	information_schema.innodb_buffer_page.pool_id
      	information_schema.innodb_buffer_page_lru.pool_id
      	information_schema.innodb_buffer_pool_stats.pool_id
      	information_schema.innodb_cmpmem.buffer_pool_instance
      	information_schema.innodb_cmpmem_reset.buffer_pool_instance
      1a6f708e
  32. 11 Feb, 2020 1 commit
    • MDEV-19747: Deprecate and ignore innodb_log_optimize_ddl · fc2f2fa8
      Marko Mäkelä authored
      During native table rebuild or index creation, InnoDB used to skip
      redo logging and write MLOG_INDEX_LOAD records to inform crash recovery
      and Mariabackup of the gaps in the redo log. This is fragile and prohibits
      some optimizations, such as skipping the doublewrite buffer for
      newly (re)initialized pages (MDEV-19738).
      
      row_merge_write_redo(): Remove. We do not write MLOG_INDEX_LOAD
      records any more. Instead, we write full redo log.
      
      FlushObserver: Remove.
      
      fseg_free_page_func(): Remove the parameter log. Redo logging
      cannot be disabled.
      
      fil_space_t::redo_skipped_count: Remove.
      
      We cannot remove buf_block_t::skip_flush_check, because PageBulk
      will temporarily generate invalid B-tree pages in the buffer pool.
      fc2f2fa8
  33. 01 Feb, 2020 1 commit
    • clean up redo log · 691c691a
      Eugene Kosov authored
      Main change: rename the first redo log file without closing it.
      
      Second change: use os_offset_t to represent offsets in a file.
      
      Third change: fix the texts of log messages.
      691c691a
  34. 24 Jan, 2020 1 commit
    • cleanup redo log · b534a667
      Eugene Kosov authored
      class log_file_t: A more or less sane RAII wrapper around a redo log file
      descriptor and its path.
      
      This change is motivated by the need to use log_file_t elsewhere.
      b534a667
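Here is a hypothetical RAII wrapper in the spirit of log_file_t, which owns a file descriptor together with its path. The class name, the open mode and the use of POSIX open(2)/close(2) are assumptions. The real class has its own interface.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <utility>

// Hypothetical RAII wrapper; POSIX-only. Not InnoDB's log_file_t.
class log_file_sketch_t {
 public:
  explicit log_file_sketch_t(std::string path) : path_(std::move(path)) {}
  ~log_file_sketch_t() { close(); }  // the descriptor never leaks

  // Owns a descriptor: forbid copies, allow moves.
  log_file_sketch_t(const log_file_sketch_t&) = delete;
  log_file_sketch_t& operator=(const log_file_sketch_t&) = delete;
  log_file_sketch_t(log_file_sketch_t&& other) noexcept
      : path_(std::move(other.path_)), fd_(std::exchange(other.fd_, -1)) {}
  log_file_sketch_t& operator=(log_file_sketch_t&& other) noexcept {
    if (this != &other) {
      close();
      path_ = std::move(other.path_);
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }

  // Opens the file for reading and writing; returns false on failure.
  bool open() {
    if (fd_ >= 0) return true;
    fd_ = ::open(path_.c_str(), O_RDWR);
    return fd_ >= 0;
  }

  void close() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

  bool is_opened() const { return fd_ >= 0; }
  const std::string& path() const { return path_; }
  int fd() const { return fd_; }

 private:
  std::string path_;
  int fd_ = -1;
};
```

Making the wrapper move-only keeps exactly one owner per descriptor. That is what allows such an object to be handed to other code safely, which is the motivation the commit message gives.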