1. 20 Feb, 2021 1 commit
    • Marko Mäkelä's avatar
      MDEV-24854: Change innodb_flush_method=O_DIRECT by default · 420f8e24
      Marko Mäkelä authored
      We have innodb_use_native_aio=ON by default since the introduction of
      that parameter in commit 2f9fb41b
      (MySQL 5.5 and MariaDB 5.5).
      
      However, to really benefit from the setting, the files should be
      opened in O_DIRECT mode, to bypass the file system cache.
      In this way, the reads and writes can be submitted with DMA, using
      the InnoDB buffer pool directly, and no processor cycles need to be
      used for copying data. The use of O_DIRECT benefits not only the
      current libaio implementation, but also liburing.
      
      os_file_set_nocache(): Test innodb_flush_method in the function,
      not in the callers.
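
For servers that predate this default, the same behaviour can be requested explicitly. A minimal configuration sketch, assuming a standard `[mysqld]` section in the option file (the file location is installation-specific and not stated here):

```ini
[mysqld]
# Open InnoDB files with O_DIRECT, bypassing the file system cache.
innodb_flush_method = O_DIRECT
# Already ON by default; listed only to show the pairing described above.
innodb_use_native_aio = ON
```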
      420f8e24
  2. 18 Feb, 2021 2 commits
    • Marko Mäkelä's avatar
      MDEV-24915 Galera conflict resolution is unnecessarily complex · 43b239a0
      Marko Mäkelä authored
      The fix of MDEV-23328 introduced a background thread for
      killing conflicting transactions.
      Thanks to the refactoring that was conducted in MDEV-24671,
      the high-priority ("brute-force") applier thread can kill the
      conflicting transactions itself, before waiting for the
      locks to be finally released (after the conflicting transactions
      have been rolled back).
      
      This also allows us to remove the hack LockGGuard that had to
      be added in MDEV-20612, and remove Galera-related function
      parameters from lock creation.
      43b239a0
    • Marko Mäkelä's avatar
      MDEV-20612 fixup: Remove a redundant check · 18dc5b01
      Marko Mäkelä authored
      lock_wait_rpl_report(): Only reload trx->lock.wait_lock
      if lock_sys.wait_mutex had to be released and reacquired.
      18dc5b01
  3. 17 Feb, 2021 11 commits
    • Marko Mäkelä's avatar
      Merge 10.5 into 10.6 · 94b45787
      Marko Mäkelä authored
      94b45787
    • Kartik Soneji's avatar
      MDEV-19168: Add ssl-flush command. (#1749) · 66b8edf8
      Kartik Soneji authored
      * MDEV-19168: Add ssl-flush command.
      Improve flush error messages and move error printing into the `flush` function.
      66b8edf8
    • Marko Mäkelä's avatar
      MDEV-24738 fixup: heap-use-after-poison in lock_sys_t::deadlock_check() · 9f136700
      Marko Mäkelä authored
      Deadlock::report(): Require the caller to acquire lock_sys.latch
      if invoking on a transaction that is not owned by the current thread.
      9f136700
    • Vladislav Vaintroub's avatar
      Keep old GCC quiet. · e92c34ce
      Vladislav Vaintroub authored
      e92c34ce
    • Marko Mäkelä's avatar
      Merge mariadb-10.5.9 · 16388f39
      Marko Mäkelä authored
      16388f39
    • Marko Mäkelä's avatar
      MDEV-24848 Assertion rlen<llen failed when applying MEMSET · d82386b6
      Marko Mäkelä authored
      btr_cur_upd_rec_in_place(): Prefer WRITE to MEMSET for a single-byte
      operation.
      
      log_phys_t::apply(): Relax the assertion to allow a single-byte MEMSET,
      even though it is 1 byte longer than a WRITE record.
      d82386b6
    • Marko Mäkelä's avatar
      MDEV-24738 Improve the InnoDB deadlock checker · c68007d9
      Marko Mäkelä authored
      A new configuration parameter innodb_deadlock_report is introduced:
      * innodb_deadlock_report=off: Do not report any details of deadlocks.
      * innodb_deadlock_report=basic: Report transactions and waiting locks.
      * innodb_deadlock_report=full (default): Report also the blocking locks.
      
      The improved deadlock checker will consider all involved transactions
      in one loop, even if the deadlock loop includes several transactions.
      The theoretical maximum number of transactions that can be involved in
      a deadlock is `innodb_page_size` * 8, limited by the persistent data
      structures.
      
      Note: Similar to
      mysql/mysql-server@3859219875b62154b921e8c6078c751198071b9c
      our deadlock checker will consider at most one blocking transaction
      for each waiting transaction. The new field trx->lock.wait_trx will be
      nullptr if and only if trx->lock.wait_lock is nullptr. Note that
      trx->lock.wait_lock->trx == trx (the waiting transaction), while
      trx->lock.wait_trx points to one of the transactions whose lock is
      conflicting with trx->lock.wait_lock.
      
      Considering only one blocking transaction will greatly simplify
      our deadlock checker, but it may also make the deadlock checker
      blind to some deadlocks where the deadlock cycle is 'hidden' by
      the fact that the registered trx->lock.wait_trx is not actually
      waiting for any InnoDB lock, but something else. So, instead of
      deadlocks, sometimes lock wait timeout may be reported.
      
      To improve on this, whenever trx->lock.wait_trx is changed, we
      will register further 'candidate' transactions in Deadlock::to_check(),
      and check for 'revealed' deadlocks as soon as possible, in lock_release()
      and innobase_kill_query().
      
      The old DeadlockChecker was holding lock_sys.latch, even though using
      lock_sys.wait_mutex should be less contended (and thus preferred)
      in the likely case that no deadlock is present.
      
      lock_wait(): Defer the deadlock check to this function, instead of
      executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().
      
      DeadlockChecker: Complete rewrite:
      (1) Explicitly keep track of transactions that are being waited for,
      in trx->lock.wait_trx, protected by lock_sys.wait_mutex. Previously,
      we were painstakingly traversing the lock heaps while blocking
      concurrent registration or removal of any locks (even uncontended ones).
      (2) Use Brent's cycle-detection algorithm for deadlock detection,
      traversing each trx->lock.wait_trx edge at most 2 times.
      (3) If a deadlock is detected, release lock_sys.wait_mutex,
      acquire LockMutexGuard, re-acquire lock_sys.wait_mutex and re-invoke
      find_cycle() to find out whether the deadlock is still present.
      (4) Display information on all transactions that are involved in the
      deadlock, and choose a victim to be rolled back.
      
      lock_sys.deadlocks: Replaces lock_deadlock_found. Protected by wait_mutex.
      
      Deadlock::find_cycle(): Quickly find a cycle of trx->lock.wait_trx...
      using Brent's cycle detection algorithm.
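
Because each waiting transaction records at most one blocking transaction, the waits-for graph is a set of singly linked chains, which is the setting Brent's algorithm expects. The following is a minimal sketch of that idea, not the InnoDB code: the `Trx` struct and this free function `find_cycle()` are hypothetical stand-ins, and no locking is shown.

```cpp
#include <cstddef>

// Hypothetical stand-in for a transaction: at most one waits-for edge.
struct Trx {
  int id;
  Trx *wait_trx = nullptr;  // nullptr means "not waiting for anyone"
};

// Brent's cycle detection over the wait_trx chain starting at `start`.
// Returns a transaction that lies on a cycle, or nullptr if the chain ends.
Trx *find_cycle(Trx *start) {
  if (!start) return nullptr;
  Trx *tortoise = start;
  Trx *hare = start->wait_trx;
  std::size_t power = 1;   // current search window, doubled on each restart
  std::size_t length = 1;  // steps taken by the hare since the last restart
  while (hare && hare != tortoise) {
    if (length == power) {
      // Window exhausted: teleport the tortoise to the hare and double it.
      tortoise = hare;
      power *= 2;
      length = 0;
    }
    hare = hare->wait_trx;
    ++length;
  }
  return hare;  // nullptr: no cycle reachable; otherwise a node on the cycle
}
```

If the starting transaction merely waits for a cycle that it is not part of, the returned node is a member of that cycle rather than the starting transaction.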
      
      Deadlock::report(): Report a deadlock cycle that was found by
      Deadlock::find_cycle(), and choose a victim with the least weight.
      Altogether, we may traverse each trx->lock.wait_trx edge up to 5
      times (up to 2 in each of the 2 find_cycle() invocations, plus 1 for
      reporting and choosing the victim).
      
      Deadlock::check_and_resolve(): Find and resolve a deadlock.
      
      lock_wait_rpl_report(): Report the waits-for information to
      replication. This used to be executed as part of DeadlockChecker.
      Replication must know the waits-for relations even if no deadlocks
      are present in InnoDB.
      
      Reviewed by: Vladislav Vaintroub
      c68007d9
    • Marko Mäkelä's avatar
      3ddb4fdd
    • Marko Mäkelä's avatar
      MDEV-24884 Hang in ssux_lock_low::write_lock() · 272a1289
      Marko Mäkelä authored
      ssux_lock_low::write_lock(): Before invoking writer_wait(), keep
      attempting write_lock_wait_try() as long as no conflict exists.
      
      rw_lock::upgrade_trylock(): Relax a bogus assertion and correct
      the acquisition operation. Another thread may be executing in
      ssux_lock_low::write_lock() on the same latch. Because we are the
      only thread that can make progress on that latch, we must become
      the writer. Any waiting thread will be eventually woken up by
      ssux_lock_low::u_unlock() or ssux_lock_low::wr_unlock(), but not
      by wr_u_downgrade() because the upgrade is a very rare operation.
      272a1289
    • Marko Mäkelä's avatar
      584e5211
  4. 16 Feb, 2021 4 commits
  5. 15 Feb, 2021 7 commits
  6. 14 Feb, 2021 3 commits
    • Monty's avatar
      MDEV-24855 ER_CRASHED_ON_USAGE or Assertion `length <= column->length' · 34c65402
      Monty authored
      When creating a summary temporary table with bit fields used in the sum
      expression with several parameters, like GROUP_CONCAT(), the counting of
      bits needed in the record was wrong.
      
      The reason we got an assert in Aria was because the bug caused a memory
      overwrite in the record and Aria noticed that the data was 'impossible'.
      34c65402
    • Sergei Golubchik's avatar
      updating @@wsrep_cluster_address deadlocks · 26965387
      Sergei Golubchik authored
      wsrep_cluster_address_update() causes LOCK_wsrep_slave_threads
      to be locked under LOCK_wsrep_cluster_config, while normally
      the order should be the opposite.
      
      Fix: don't protect @@wsrep_cluster_address value with the
      LOCK_wsrep_cluster_config, LOCK_global_system_variables is enough.
      
      Only protect wsrep reinitialization with the LOCK_wsrep_cluster_config.
      And make it use a local copy of the global @@wsrep_cluster_address.
      
      Also, introduce a helper function that checks whether
      wsrep_cluster_address is set and also asserts that it can be safely
      read by the caller.
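
The locking scheme described in the fix can be sketched as follows. This is only an illustration under assumptions: `ClusterConfig`, its members, and the callback are hypothetical names, with `vars_mutex` standing in for LOCK_global_system_variables and `config_mutex` for LOCK_wsrep_cluster_config.

```cpp
#include <mutex>
#include <string>

// Hypothetical model of the two mutexes and the protected value.
struct ClusterConfig {
  std::mutex vars_mutex;    // stand-in for LOCK_global_system_variables
  std::mutex config_mutex;  // stand-in for LOCK_wsrep_cluster_config
  std::string address;      // stand-in for @@wsrep_cluster_address

  // The value itself is protected by vars_mutex only.
  void set_address(const std::string &value) {
    std::lock_guard<std::mutex> guard(vars_mutex);
    address = value;
  }

  std::string snapshot() {
    std::lock_guard<std::mutex> guard(vars_mutex);
    return address;
  }

  // Reinitialization works on a local copy, so vars_mutex is already
  // released by the time config_mutex is acquired.
  template <typename Reinit>
  void reinitialize(Reinit reinit) {
    const std::string local = snapshot();
    std::lock_guard<std::mutex> guard(config_mutex);
    reinit(local);
  }
};
```

In this sketch a concurrent update of the value during reinitialization cannot block on `config_mutex`, because the value is never read or written under it.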
      26965387
  7. 12 Feb, 2021 11 commits
    • Jan Lindström's avatar
      MDEV-24833 : Signal 11 on wsrep_can_run_in_toi at wsrep_mysqld.cc:1994 · b3df194e
      Jan Lindström authored
      The problem was that when engine substitution is allowed (e.g. sql_mode=''),
      we must also check db_type. Additionally, we did not resolve the
      default storage engine in that case and use it to check whether
      TOI is possible.
      b3df194e
    • Sergei Golubchik's avatar
      fix a 3-way deadlock in galera_sr.galera-features#56 · b91e77cf
      Sergei Golubchik authored
      rarely (try --repeat 1000), the following happens:
      
      * from wsrep_bf_abort (when a thread is being killed), wsrep-lib
      starts streaming_rollback that wants to
      convert_streaming_client_to_applier. wsrep_create_streaming_applier
      creates a new THD(). All while the other THD is being killed,
      so under LOCK_thd_kill and LOCK_thd_data. In particular, THD::init()
      takes LOCK_global_system_variables under LOCK_thd_kill.
      
      * updating @@wsrep_slave_threads takes LOCK_global_system_variables
      and LOCK_wsrep_cluster_config (in that order) and invokes
      wsrep_slave_threads_update() that takes LOCK_wsrep_slave_threads
      
      * wsrep_replication_process() takes LOCK_wsrep_slave_threads and
      invokes wsrep_close_applier(), that does thd->set_killed() which
      takes LOCK_thd_kill.
      
      et voilà.
      
      As a fix I copied a workaround from wsrep_cluster_address_update()
      to wsrep_slave_threads_update(). It seems to be safe: without mutexes
      a race condition is possible and a concurrent SET might change
      wsrep_slave_threads, but wsrep_slave_threads_update() always verifies
      if there's a need to do something, so it will not run twice in this case;
      it'll be a no-op.
      b91e77cf
    • Sergei Golubchik's avatar
      remove find_thread_with_thd_data_lock_callback · 259b9452
      Sergei Golubchik authored
      let the caller take the lock if needed
      259b9452
    • Sergei Golubchik's avatar
      MDEV-23328 Server hang due to Galera lock conflict resolution · eac8341d
      Sergei Golubchik authored
      adaptation of 29bbcac0 for 10.4
      eac8341d
    • Sergei Golubchik's avatar
      don't take mutexes conditionally · 9703cffa
      Sergei Golubchik authored
      9703cffa
    • Sergei Golubchik's avatar
      cleanup: THD::abort_current_cond_wait() · 259a1902
      Sergei Golubchik authored
      * reuse the loop in THD::abort_current_cond_wait, don't duplicate it
      * find_thread_by_id should return whatever it has found, it's the
        caller's task not to kill COM_DAEMON (if the caller's a killer)
      
      and other minor changes
      259a1902
    • Elena Stepanova's avatar
      List of unstable tests for 10.4.18 release · cbbcc8fa
      Elena Stepanova authored
      Test code modifications and new failures from buildbot registered
      only for the main suite. The rest was updated partially,
      based on the status of existing JIRA items
      cbbcc8fa
    • Sergei Golubchik's avatar
      Merge branch 'bb-10.3-release' into bb-10.4-release · 00a313ec
      Sergei Golubchik authored
      Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution"
      was null-merged. 10.4 version of the fix is coming up separately
      00a313ec
    • Marko Mäkelä's avatar
      MDEV-24643: Assertion failed in rw_lock::update_unlock() · a1542f8a
      Marko Mäkelä authored
      mtr_defer_drop_ahi(): Upgrade the U lock to X lock and downgrade
      it back to U lock in case the adaptive hash index needs to be dropped.
      
      This regression was introduced in
      commit 03ca6495 (MDEV-24142).
      a1542f8a
    • Marko Mäkelä's avatar
      MDEV-20612: Enable concurrent lock_release() · 26d6224d
      Marko Mäkelä authored
      lock_release_try(): Try to release locks while only holding
      shared lock_sys.latch.
      
      lock_release(): If 5 attempts of lock_release_try() fail,
      proceed to acquire exclusive lock_sys.latch.
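
The retry-then-escalate pattern above can be sketched with standard C++ primitives. This is an assumption-laden illustration rather than the InnoDB implementation: `release_with_fallback()` and both callbacks are hypothetical, and `std::shared_mutex` stands in for lock_sys.latch.

```cpp
#include <mutex>
#include <shared_mutex>

// Try the optimistic path up to max_attempts times under a shared latch;
// if every attempt fails, do the work under the exclusive latch.
// Returns the number of the successful attempt, or 0 for the fallback.
template <typename TryRelease, typename ForceRelease>
unsigned release_with_fallback(std::shared_mutex &latch,
                               TryRelease try_release,
                               ForceRelease force_release,
                               unsigned max_attempts = 5) {
  for (unsigned attempt = 1; attempt <= max_attempts; ++attempt) {
    // Shared mode: other threads may hold the latch in shared mode too.
    std::shared_lock<std::shared_mutex> shared(latch);
    if (try_release()) return attempt;
  }
  // Exclusive mode: no other holder exists, so the release cannot fail.
  std::unique_lock<std::shared_mutex> exclusive(latch);
  force_release();
  return 0;
}
```

Bounding the number of optimistic attempts keeps the common case concurrent while still guaranteeing that the release eventually completes.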
      26d6224d
    • Marko Mäkelä's avatar
      MDEV-20612: Partition lock_sys.latch · b08448de
      Marko Mäkelä authored
      We replace the old lock_sys.mutex (which was renamed to lock_sys.latch)
      with a combination of a global lock_sys.latch and table or page hash lock
      mutexes.
      
      The global lock_sys.latch can be acquired in exclusive mode, or
      it can be acquired in shared mode and another mutex will be acquired
      to protect the locks for a particular page or a table.
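
A minimal sketch of this two-level scheme, using hypothetical names (`ShardedLatch`, `ShardGuard`, `ExclusiveGuard`) and an assumed shard count of 16; the real code keys shards by page or table and uses its own latch types. The member order in `ShardGuard` makes the shard mutex be released before the global latch, in line with the rule stated at the end of this commit message.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <shared_mutex>

class ShardedLatch {
  static constexpr std::size_t n_shards = 16;  // assumed, for illustration
  std::shared_mutex global_;                   // models lock_sys.latch
  std::array<std::mutex, n_shards> shards_;    // models per-page/table mutexes

public:
  static std::size_t shard_of(std::size_t key) { return key % n_shards; }

  // Shared global latch plus the mutex of one shard.
  class ShardGuard {
    // Members are destroyed in reverse order: shard_ first, then global_.
    std::shared_lock<std::shared_mutex> global_;
    std::unique_lock<std::mutex> shard_;

  public:
    ShardGuard(ShardedLatch &latch, std::size_t key)
        : global_(latch.global_), shard_(latch.shards_[shard_of(key)]) {}
  };

  // Exclusive global latch: excludes every ShardGuard holder at once.
  class ExclusiveGuard {
    std::unique_lock<std::shared_mutex> global_;

  public:
    explicit ExclusiveGuard(ShardedLatch &latch) : global_(latch.global_) {}
  };
};
```

Threads that touch different shards only share the global latch in shared mode, so they do not serialize on each other; an operation that must see all shards takes `ExclusiveGuard` instead.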
      
      This is inspired by
      mysql/mysql-server@1d259b87a63defa814e19a7534380cb43ee23c48
      but the optimization of lock_release() will be done in the next commit.
      Also, we will interleave mutexes with the hash table elements, similar
      to how buf_pool.page_hash was optimized
      in commit 5155a300 (MDEV-22871).
      
      dict_table_t::autoinc_trx: Use Atomic_relaxed.
      
      dict_table_t::autoinc_mutex: Use srw_mutex in order to reduce the
      memory footprint. On 64-bit Linux or OpenBSD, both this and the new
      dict_table_t::lock_mutex should be 32 bits and be stored in the same
      64-bit word. On Microsoft Windows, the underlying SRWLOCK is 32 or 64
      bits, and on other systems, sizeof(pthread_mutex_t) can be much larger.
      
      ib_lock_t::trx_locks, trx_lock_t::trx_locks: Document the new rules.
      Writers must assert lock_sys.is_writer() || trx->mutex_is_owner().
      
      LockGuard: A RAII wrapper for acquiring a page hash table lock.
      
      LockGGuard: Like LockGuard, but when Galera Write-Set Replication
      is enabled, we must acquire all shards, for updating arbitrary trx_locks.
      
      LockMultiGuard: A RAII wrapper for acquiring two page hash table locks.
      
      lock_rec_create_wsrep(), lock_table_create_wsrep(): Special
      Galera conflict resolution in non-inlined functions in order
      to keep the common code paths shorter.
      
      lock_sys_t::prdt_page_free_from_discard(): Refactored from
      lock_prdt_page_free_from_discard() and
      lock_rec_free_all_from_discard_page().
      
      trx_t::commit_tables(): Replaces trx_update_mod_tables_timestamp().
      
      lock_release(): Let trx_t::commit_tables() invalidate the query cache
      for those tables that were actually modified by the transaction.
      Merge lock_check_dict_lock() to lock_release().
      
      We must never release lock_sys.latch while holding any
      lock_sys_t::hash_latch. Violating this rule could lead to
      memory corruption if the buffer pool is resized between
      the time lock_sys.latch is released and the hash_latch is released.
      b08448de
  8. 11 Feb, 2021 1 commit