1. 17 Feb, 2021 4 commits
    • MDEV-24738 Improve the InnoDB deadlock checker · c68007d9
      Marko Mäkelä authored
      A new configuration parameter innodb_deadlock_report is introduced:
      * innodb_deadlock_report=off: Do not report any details of deadlocks.
      * innodb_deadlock_report=basic: Report transactions and waiting locks.
      * innodb_deadlock_report=full (default): Report also the blocking locks.
      
      The improved deadlock checker will consider all involved transactions
      in one loop, even if the deadlock loop includes several transactions.
      The theoretical maximum number of transactions that can be involved in
      a deadlock is `innodb_page_size` * 8, limited by the persistent data
      structures.
      
      Note: Similar to
      mysql/mysql-server@3859219875b62154b921e8c6078c751198071b9c
      our deadlock checker will consider at most one blocking transaction
      for each waiting transaction. The new field trx->lock.wait_trx will be
      nullptr if and only if trx->lock.wait_lock is nullptr. Note that
      trx->lock.wait_lock->trx == trx (the waiting transaction), while
      trx->lock.wait_trx points to one of the transactions whose lock is
      conflicting with trx->lock.wait_lock.
      
      Considering only one blocking transaction will greatly simplify
      our deadlock checker, but it may also make the deadlock checker
      blind to some deadlocks where the deadlock cycle is 'hidden' by
      the fact that the registered trx->lock.wait_trx is not actually
      waiting for any InnoDB lock, but for something else. As a result, a lock
      wait timeout may sometimes be reported instead of a deadlock.
      
      To improve on this, whenever trx->lock.wait_trx is changed, we
      will register further 'candidate' transactions in Deadlock::to_check(),
      and check for 'revealed' deadlocks as soon as possible, in lock_release()
      and innobase_kill_query().
      
      The old DeadlockChecker held lock_sys.latch, even though acquiring
      lock_sys.wait_mutex should be less contended (and thus preferable)
      in the likely case that no deadlock is present.
      
      lock_wait(): Defer the deadlock check to this function, instead of
      executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().
      
      DeadlockChecker: Complete rewrite:
      (1) Explicitly keep track of transactions that are being waited for,
      in trx->lock.wait_trx, protected by lock_sys.wait_mutex. Previously,
      we were painstakingly traversing the lock heaps while blocking
      concurrent registration or removal of any locks (even uncontended ones).
      (2) Use Brent's cycle-detection algorithm for deadlock detection,
      traversing each trx->lock.wait_trx edge at most 2 times.
      (3) If a deadlock is detected, release lock_sys.wait_mutex,
      acquire LockMutexGuard, re-acquire lock_sys.wait_mutex and re-invoke
      find_cycle() to find out whether the deadlock is still present.
      (4) Display information on all transactions that are involved in the
      deadlock, and choose a victim to be rolled back.
      
      lock_sys.deadlocks: Replaces lock_deadlock_found. Protected by wait_mutex.
      
      Deadlock::find_cycle(): Quickly find a cycle of trx->lock.wait_trx...
      using Brent's cycle detection algorithm.
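
      As an illustration only, Brent's cycle detection over a waits-for graph
      in which every transaction waits for at most one other transaction can
      be sketched in C++ as follows. The struct Trx and its wait_trx member
      are hypothetical simplifications of trx_t and trx->lock.wait_trx; the
      sketch ignores lock_sys.wait_mutex and everything else that the real
      Deadlock::find_cycle() has to deal with.

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Hypothetical stand-in for trx_t: each transaction waits for at most
      // one other transaction (nullptr if it is not waiting).
      struct Trx {
        Trx *wait_trx = nullptr;
      };

      // Brent's cycle detection on the wait_trx chain starting at `start`.
      // Returns a transaction that lies on a cycle, or nullptr if the chain
      // ends at a transaction that is not waiting (no deadlock reachable).
      inline Trx *find_cycle(Trx *start)
      {
        Trx *tortoise = start;
        Trx *hare = start->wait_trx;
        std::size_t power = 1, lambda = 1;
        while (hare != tortoise) {
          if (!hare)
            return nullptr;          // end of chain: no cycle
          if (power == lambda) {     // begin the next power-of-two segment
            tortoise = hare;
            power *= 2;
            lambda = 0;
          }
          hare = hare->wait_trx;
          ++lambda;
        }
        return hare;                 // the pointers met on the cycle
      }
      ```

      Only two pointers are kept while walking the chain, so no visited
      flags have to be stored in the transactions.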
      
      Deadlock::report(): Report a deadlock cycle that was found by
      Deadlock::find_cycle(), and choose a victim with the least weight.
      Altogether, we may traverse each trx->lock.wait_trx edge up to 5
      times (up to 2 times in each of the 2 find_cycle() calls, plus 1 time
      for reporting and choosing the victim).
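
      The victim choice can be sketched as one walk around a cycle that is
      already known to exist. The weight member is a hypothetical placeholder
      for whatever Deadlock::report() actually compares.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical stand-in for trx_t; `weight` approximates the cost of
      // rolling the transaction back (the real weighting may differ).
      struct Trx {
        Trx *wait_trx = nullptr;
        std::uint64_t weight = 0;
      };

      // Precondition: `on_cycle` lies on a wait_trx cycle (e.g. as returned
      // by a cycle finder). Visits each cycle member once and returns the
      // one with the least weight (the first one found on ties).
      inline Trx *choose_victim(Trx *on_cycle)
      {
        Trx *victim = on_cycle;
        for (Trx *t = on_cycle->wait_trx; t != on_cycle; t = t->wait_trx)
          if (t->weight < victim->weight)
            victim = t;
        return victim;
      }
      ```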
      
      Deadlock::check_and_resolve(): Find and resolve a deadlock.
      
      lock_wait_rpl_report(): Report the waits-for information to
      replication. This used to be executed as part of DeadlockChecker.
      Replication must know the waits-for relations even if no deadlocks
      are present in InnoDB.
      
      Reviewed by: Vladislav Vaintroub
    • 3ddb4fdd (Marko Mäkelä)
    • MDEV-24884 Hang in ssux_lock_low::write_lock() · 272a1289
      Marko Mäkelä authored
      ssux_lock_low::write_lock(): Before invoking writer_wait(), keep
      attempting write_lock_wait_try() as long as no conflict exists.
      
      rw_lock::upgrade_trylock(): Relax a bogus assertion and correct
      the acquisition operation. Another thread may be executing in
      ssux_lock_low::write_lock() on the same latch. Because we are the
      only thread that can make progress on that latch, we must become
      the writer. Any waiting thread will be eventually woken up by
      ssux_lock_low::u_unlock() or ssux_lock_low::wr_unlock(), but not
      by wr_u_downgrade() because the upgrade is a very rare operation.
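
      The "keep trying while no conflict exists, then wait" idea can be
      illustrated with a hypothetical exclusive-only lock built from
      std::atomic and std::condition_variable. This is not how ssux_lock_low
      is implemented (it also supports shared and update modes and has its
      own waiting mechanism); retry_then_wait_lock and its members are names
      made up for this sketch.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <condition_variable>
      #include <mutex>

      // Hypothetical lock: retry the fast path while the lock word shows no
      // conflict, and block only after a real conflict has been observed.
      class retry_then_wait_lock {
        std::atomic<bool> locked_{false};
        std::mutex m_;
        std::condition_variable cv_;

        bool try_lock() {
          bool expected = false;
          return locked_.compare_exchange_strong(expected, true,
                                                 std::memory_order_acquire,
                                                 std::memory_order_relaxed);
        }

      public:
        void lock() {
          // A failed attempt only means another thread got there first; if
          // the lock has already been released again, retry instead of sleeping.
          while (!locked_.load(std::memory_order_relaxed))
            if (try_lock())
              return;
          // A conflict was observed: sleep until an unlock() lets us win.
          std::unique_lock<std::mutex> lk(m_);
          cv_.wait(lk, [this] { return try_lock(); });
        }

        void unlock() {
          {
            // Clearing the flag under m_ ensures that a thread which has just
            // evaluated the wait predicate cannot miss the notification.
            std::lock_guard<std::mutex> lk(m_);
            locked_.store(false, std::memory_order_release);
          }
          cv_.notify_all();
        }

        bool is_locked() const { return locked_.load(std::memory_order_relaxed); }
      };
      ```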
    • 584e5211 (Marko Mäkelä)
  2. 16 Feb, 2021 2 commits
  3. 15 Feb, 2021 1 commit
  4. 14 Feb, 2021 1 commit
  5. 12 Feb, 2021 3 commits
    • MDEV-24643: Assertion failed in rw_lock::update_unlock() · a1542f8a
      Marko Mäkelä authored
      mtr_defer_drop_ahi(): Upgrade the U lock to X lock and downgrade
      it back to U lock in case the adaptive hash index needs to be dropped.
      
      This regression was introduced in
      commit 03ca6495 (MDEV-24142).
    • MDEV-20612: Enable concurrent lock_release() · 26d6224d
      Marko Mäkelä authored
      lock_release_try(): Try to release locks while only holding
      shared lock_sys.latch.
      
      lock_release(): If 5 attempts at lock_release_try() fail,
      proceed to acquire lock_sys.latch in exclusive mode.
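
      A minimal sketch of this bounded-retry pattern, assuming
      std::shared_mutex as a stand-in for lock_sys.latch and two hypothetical
      callbacks: release_try, playing the role of lock_release_try() (it may
      give up and return false), and release_exclusive, which releases the
      locks while the latch is held exclusively.

      ```cpp
      #include <cassert>
      #include <functional>
      #include <mutex>
      #include <shared_mutex>

      // Hypothetical sketch: try up to max_attempts times under a shared
      // latch, then fall back to the exclusive latch, which cannot fail.
      inline void release_locks(std::shared_mutex &latch,
                                const std::function<bool()> &release_try,
                                const std::function<void()> &release_exclusive,
                                int max_attempts = 5)
      {
        for (int i = 0; i < max_attempts; ++i) {
          // Shared mode: other threads may release their locks concurrently.
          std::shared_lock<std::shared_mutex> s(latch);
          if (release_try())
            return;
          // The shared latch is dropped at the end of each failed attempt.
        }
        std::unique_lock<std::shared_mutex> x(latch);
        release_exclusive();
      }
      ```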
    • MDEV-20612: Partition lock_sys.latch · b08448de
      Marko Mäkelä authored
      We replace the old lock_sys.mutex (which was renamed to lock_sys.latch)
      with a combination of a global lock_sys.latch and table or page hash lock
      mutexes.
      
      The global lock_sys.latch can be acquired in exclusive mode, or
      it can be acquired in shared mode and another mutex will be acquired
      to protect the locks for a particular page or a table.
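
      The two acquisition modes can be modelled with a hypothetical class
      that pairs a global std::shared_mutex with an array of per-shard
      mutexes. The shard count and the hashing by modulo are assumptions for
      this sketch; the commit message says that the mutexes will instead be
      interleaved with the hash table elements.

      ```cpp
      #include <array>
      #include <cassert>
      #include <cstddef>
      #include <mutex>
      #include <shared_mutex>

      // Hypothetical model of a partitioned lock-system latch.
      class PartitionedLatch {
      public:
        static constexpr std::size_t N_SHARDS = 64;

        // Protect the locks of one page or table: global latch in shared
        // mode plus the mutex of the shard that the object hashes to.
        template <typename F> void with_shard(std::size_t hash, F &&f) {
          std::shared_lock<std::shared_mutex> g(latch_);
          std::lock_guard<std::mutex> s(shards_[hash % N_SHARDS]);
          f();
        } // destructors run in reverse order: shard mutex first, latch last

        // Operations on arbitrary shards hold the global latch exclusively;
        // no shard mutex is needed then.
        template <typename F> void with_all(F &&f) {
          std::unique_lock<std::shared_mutex> g(latch_);
          f();
        }

      private:
        std::shared_mutex latch_;
        std::array<std::mutex, N_SHARDS> shards_;
      };
      ```

      Because the shared lock is constructed before the shard lock, the
      shard mutex is always released before the global latch, matching the
      rule that lock_sys.latch must never be released while a hash_latch is
      still held.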
      
      This is inspired by
      mysql/mysql-server@1d259b87a63defa814e19a7534380cb43ee23c48
      but the optimization of lock_release() will be done in the next commit.
      Also, we will interleave mutexes with the hash table elements, similar
      to how buf_pool.page_hash was optimized
      in commit 5155a300 (MDEV-22871).
      
      dict_table_t::autoinc_trx: Use Atomic_relaxed.
      
      dict_table_t::autoinc_mutex: Use srw_mutex in order to reduce the
      memory footprint. On 64-bit Linux or OpenBSD, both this and the new
      dict_table_t::lock_mutex should be 32 bits and be stored in the same
      64-bit word. On Microsoft Windows, the underlying SRWLOCK is 32 or 64
      bits, and on other systems, sizeof(pthread_mutex_t) can be much larger.
      
      ib_lock_t::trx_locks, trx_lock_t::trx_locks: Document the new rules.
      Writers must assert lock_sys.is_writer() || trx->mutex_is_owner().
      
      LockGuard: A RAII wrapper for acquiring a page hash table lock.
      
      LockGGuard: Like LockGuard, but when Galera Write-Set Replication
      is enabled, we must acquire all shards, for updating arbitrary trx_locks.
      
      LockMultiGuard: A RAII wrapper for acquiring two page hash table locks.
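
      Acquiring two shard locks safely needs two precautions, shown here in
      a hypothetical TwoShardGuard (not the actual LockMultiGuard code): a
      fixed acquisition order, so that two threads locking the same pair
      cannot deadlock, and a check for the case where both pages map to the
      same shard, because locking a non-recursive std::mutex twice is
      undefined behavior.

      ```cpp
      #include <cassert>
      #include <functional>
      #include <mutex>

      // Hypothetical RAII guard for the mutexes of two shards, which may be
      // the same shard.
      class TwoShardGuard {
      public:
        TwoShardGuard(std::mutex &a, std::mutex &b)
          : first_(std::less<std::mutex *>()(&b, &a) ? b : a),    // lower address first
            second_(&a == &b ? nullptr : (&first_ == &a ? &b : &a))
        {
          first_.lock();
          if (second_)                 // skip when both map to one shard
            second_->lock();
        }
        ~TwoShardGuard() {
          if (second_)
            second_->unlock();
          first_.unlock();
        }
        TwoShardGuard(const TwoShardGuard &) = delete;
        TwoShardGuard &operator=(const TwoShardGuard &) = delete;

      private:
        std::mutex &first_;
        std::mutex *second_;
      };
      ```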
      
      lock_rec_create_wsrep(), lock_table_create_wsrep(): Special
      Galera conflict resolution in non-inlined functions in order
      to keep the common code paths shorter.
      
      lock_sys_t::prdt_page_free_from_discard(): Refactored from
      lock_prdt_page_free_from_discard() and
      lock_rec_free_all_from_discard_page().
      
      trx_t::commit_tables(): Replaces trx_update_mod_tables_timestamp().
      
      lock_release(): Let trx_t::commit_tables() invalidate the query cache
      for those tables that were actually modified by the transaction.
      Merge lock_check_dict_lock() into lock_release().
      
      We must never release lock_sys.latch while holding any
      lock_sys_t::hash_latch. Doing so could lead to memory corruption
      if the buffer pool is resized between the time lock_sys.latch is
      released and the time the hash_latch is released.
  6. 11 Feb, 2021 5 commits
  7. 10 Feb, 2021 2 commits
  8. 09 Feb, 2021 1 commit
  9. 08 Feb, 2021 4 commits
    • MDEV-24087 s3.replication_partition fails in buildbot with replication failure · ffc5d064
      Monty authored
      A few of the failures were caused by a missing sync_slave_to_master in
      the test suite.
      
      However, the main cause of most failures was that, in the case of
      ALTER PARTITION, the master writes the query to the binary log before
      it has updated the .frm and .par files. This causes a problem for an
      S3 slave, as it will start executing the ALTER PARTITION but get old .frm
      and .par files from S3, which causes "open table" to fail, either with an
      error or in some cases with a crash.
      Fixed.
    • Make maria_data_root const char* · bd5ac038
      Monty authored
      This allows one to remove some casts like:
      maria_data_root= (char *)".";
      
      It also removes warnings from icc.
    • Added 'const' to arguments in get_one_option and find_typeset() · 5d6ad2ad
      Monty authored
      One should not change the program arguments!
      This change also reduces warnings from the icc compiler.
      
      Almost all changes are just syntax changes (adding const to
      'get_one_option function' declarations).
      
      Other changes:
      - Added a few casts of 'argument' from 'const char*' to 'char *'. This
        was mainly in calls to 'external' functions we don't have control of.
      - Ensure that all resets of the 'password command line argument' are
        done the same way. (In almost all cases it was just adding a comment
        and a cast.)
      - In mysqlbinlog.cc and mysqld.cc there were a few cases that changed
        the command line argument. These places were changed to instead allocate
        the option in a MEM_ROOT to avoid changing the argument. Some of this
        code was changed to ensure that different programs parse options the
        same way. Added a test case for the changes in mysqlbinlog.cc.
      - Changed a few variables that took their value from command line options
        from 'char *' to 'const char *'.
    • Ensure that mysqlbinlog frees all memory at exit · e30a3048
      Monty authored
  10. 07 Feb, 2021 5 commits
  11. 05 Feb, 2021 7 commits
    • MDEV-21452 fixup: Introduce trx_t::mutex_is_owner() · 487fbc2e
      Marko Mäkelä authored
      When we replaced trx_t::mutex with srw_mutex
      in commit 38fd7b7d
      we lost the SAFE_MUTEX instrumentation.
      Let us introduce a replacement and restore the assertions.
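
      A minimal sketch of ownership tracking for assertions, assuming a
      hypothetical owned_mutex wrapper around std::mutex; the real
      trx_t::mutex_is_owner() may be implemented differently (for example,
      only in debug builds).

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <mutex>
      #include <thread>

      // Hypothetical mutex wrapper that records its owner, so that code can
      // write assert(m.is_owner()) the way SAFE_MUTEX-instrumented code could.
      class owned_mutex {
        std::mutex m_;
        std::atomic<std::thread::id> owner_{std::thread::id()};

      public:
        void lock() {
          m_.lock();
          owner_.store(std::this_thread::get_id(), std::memory_order_relaxed);
        }
        void unlock() {
          owner_.store(std::thread::id(), std::memory_order_relaxed);
          m_.unlock();
        }
        // Relaxed loads suffice: only the calling thread itself can have
        // stored its own id, and it cleared that id before unlocking.
        bool is_owner() const {
          return owner_.load(std::memory_order_relaxed) == std::this_thread::get_id();
        }
      };
      ```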
    • 455514c8 (Marko Mäkelä)
    • MDEV-24789: Reduce sizeof(trx_lock_t) · 3e45f8e3
      Marko Mäkelä authored
      trx_lock_t::cond: Use pthread_cond_t directly, because no instrumentation
      will ever be used. This saves sizeof(void*) and removes some duplicated
      inline code.
      
      trx_lock_t::was_chosen_as_wsrep_victim: Fold into
      trx_lock_t::was_chosen_as_deadlock_victim.
      
      trx_lock_t::cancel, trx_lock_t::rec_cached, trx_lock_t::table_cached:
      Use only one byte of storage, reducing memory alignment waste.
      
      On AMD64 GNU/Linux, MDEV-24671 caused a sizeof(trx_lock_t) increase
      of 48 bytes (plus the PLUGIN_PERFSCHEMA overhead of trx_lock_t::cond).
      These changes should save 32 bytes.
    • Cleanup: Reduce some lock_sys.mutex contention · 465bdabb
      Marko Mäkelä authored
      lock_table(): Remove the constant parameter flags=0.
      
      lock_table_resurrect(): Merge lock_table_ix_resurrect() and
      lock_table_x_resurrect().
      
      lock_rec_lock(): Only acquire LockMutexGuard if lock_table_has()
      does not hold.
    • MDEV-24731 fixup: bogus assertion · de407e7c
      Marko Mäkelä authored
      DeadlockChecker::search(): Move a bogus assertion into a condition.
      If the current transaction is waiting for a table lock (on something
      other than an auto-increment lock), it is entirely possible that other
      transactions are holding not only a conflicting lock, but also an
      auto-increment lock.
      
      This mistake was noticed during the testing of MDEV-24731, but it was
      accidentally introduced in commit 5f463857.
      
      lock_wait_end(): Remove an unused variable, and add an assertion.
    • MDEV-24781 fixup: Adjust innodb.innodb-index-debug · c42ee8a7
      Marko Mäkelä authored
      Now that an INSERT into an empty table is replicated more efficiently
      during online ALTER, an old test case started to fail. Let us disable
      the MDEV-515 logic for the critical INSERT statement.
    • MDEV-24781 Assertion `mode == 16 || mode == 12 || fix_block->page.status != buf_page_t::FREED' failed in buf_page_get_low · 597510ad
      Thirunarayanan Balathandayuthapani authored
      This is caused by commit 3cef4f8f
      (MDEV-515). dict_table_t::clear() frees all the BLOBs during
      rollback of a bulk insert, but the online log tries to read the
      freed BLOBs while applying the log. It can be fixed by truncating
      the online log during rollback of the bulk insert operation.
  12. 04 Feb, 2021 1 commit
    • MDEV-24731 Excessive mutex contention in DeadlockChecker::check_and_resolve() · 5f463857
      Marko Mäkelä authored
      The DeadlockChecker expects to be able to freeze the waits-for graph.
      Hence, it is best executed somewhere where we are not holding any
      additional mutexes.
      
      lock_wait(): Defer the deadlock check to this function, instead
      of executing it in lock_rec_enqueue_waiting(), lock_table_enqueue_waiting().
      
      DeadlockChecker::trx_rollback(): Merge with the only caller,
      check_and_resolve().
      
      LockMutexGuard: RAII accessor for lock_sys.mutex.
      
      lock_sys.deadlocks: Replaces lock_deadlock_found.
      
      trx_t: Clean up some comments.
  13. 03 Feb, 2021 1 commit
    • MDEV-24750 Various corruptions caused by Aria subsystem... · eacefbca
      Monty authored
      The test case was setting aria_sort_buffer_size to MAX_ULONGLONG-1,
      which was not handled gracefully by my_malloc() or safemalloc().
      Fixed by ensuring that the malloc functions return 0 if the size
      is too big.
      I also added some protection to Aria repair:
      - Limit sort_buffer_size to 16G (beyond that, a bigger sort buffer will
        not help much anyway)
      - Limit sort_buffer_size also according to the sort file size. This
        helps by allocating less memory if someone sets the buffer size too
        high.
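
      Both protections can be sketched as small C++ helpers.
      guarded_malloc() and effective_sort_buffer_size() are hypothetical
      names, and the overhead parameter stands in for the bookkeeping that
      safemalloc() adds; the exact limit that Aria derives from the sort file
      size may differ from the simple minimum used here.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <cstdlib>
      #include <limits>

      // Hypothetical allocator guard: a request that cannot be satisfied
      // together with the allocator's own bookkeeping overhead fails with
      // nullptr instead of letting size + overhead wrap around.
      inline void *guarded_malloc(std::uint64_t size, std::size_t overhead)
      {
        if (size > std::numeric_limits<std::size_t>::max() - overhead)
          return nullptr;
        return std::malloc(static_cast<std::size_t>(size) + overhead);
      }

      // Hypothetical clamp for a repair sort buffer: never more than 16 GiB
      // and never more than the data that has to be sorted.
      inline std::uint64_t effective_sort_buffer_size(std::uint64_t requested,
                                                      std::uint64_t sort_data_size)
      {
        const std::uint64_t cap = std::uint64_t{16} << 30; // 16 GiB
        return std::min({requested, cap, sort_data_size});
      }
      ```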
  14. 02 Feb, 2021 3 commits