- 03 Dec, 2020 7 commits
-
-
Marko Mäkelä authored
InnoDB buffer pool block and index tree latches depend on a special kind of read-update-write lock that allows reentrant (recursive) acquisition of the 'update' and 'write' locks as well as an upgrade from 'update' lock to 'write' lock. The 'update' lock allows any number of reader locks from other threads, but no concurrent 'update' or 'write' lock.

If there were no requirement to support an upgrade from 'update' to 'write', we could compose the lock out of two srw_lock (implemented as any type of native rw-lock, such as SRWLOCK on Microsoft Windows). Removing this requirement is very difficult, so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we implemented an 'update' mode in our srw_lock.

Re-entrant or recursive locking is mostly needed when writing or freeing BLOB pages, but also in crash recovery or when merging buffered changes to an index page. The re-entrancy allows us to attach a previously acquired page to a sub-mini-transaction that will be committed before whatever else is holding the page latch.

The SUX lock supports Shared ('read'), Update, and eXclusive ('write') locking modes. The S latches are not re-entrant, but a single S latch may be acquired even if the thread already holds a U latch. The idea of the U latch is to allow a write of something that concurrent readers do not care about (such as the contents of BTR_SEG_LEAF, BTR_SEG_TOP and other page allocation metadata structures, or the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field is only updated when a dict_table_t for the table exists, and only read when a dict_table_t for the table is being added to dict_sys.)

block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page() to allow concurrent readers but no concurrent modifications while the page is being written to the data file. That latch will be released by buf_page_write_complete() in a different thread. Hence, we use the special lock owner value FOR_IO.
The index_lock::u_lock() improves concurrency on operations that involve non-leaf index pages.

The interface has been cleaned up a little. We will use x_lock_recursive() instead of x_lock() when we know that a lock is already held by the current thread. Similarly, a lock upgrade from U to X is only allowed via u_x_upgrade() or x_lock_upgraded() but not via x_lock().

We will disable the LatchDebug and sync_array interfaces to InnoDB rw-locks. The SEMAPHORES section of SHOW ENGINE INNODB STATUS output will no longer include any information about InnoDB rw-locks, only TTASEventMutex (cmake -DMUTEXTYPE=event) waits. This will make a part of the 'innotop' script dead code.

The block_lock buf_block_t::lock will not be covered by any PERFORMANCE_SCHEMA instrumentation.

SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES will no longer output source code file names or line numbers. The dict_index_t::lock will be identified by index and table names, which should be much more useful. PERFORMANCE_SCHEMA is lumping information about all dict_index_t::lock together as event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'.

buf_page_free(): Remove the file,line parameters. The sux_lock will not store such diagnostic information.

buf_block_dbg_add_level(): Define as empty macro, to be removed in a subsequent commit.

Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO the index_lock dict_index_t::lock will be instrumented via PERFORMANCE_SCHEMA. Similar to commit 1669c889 we will distinguish lock waits by registering shared_lock,exclusive_lock events instead of try_shared_lock,try_exclusive_lock. Actual 'try' operations will not be instrumented at all.

rw_lock_list: Remove. After MDEV-24167, this only covered buf_block_t::lock and dict_index_t::lock. We will output their information by traversing buf_pool or dict_sys.
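
The compatibility rules between the S, U and X modes described in this commit message can be illustrated with a small sketch. This is not the InnoDB sux_lock: the class name sux_sketch, its members, and the mutex-based design are assumptions made for illustration, and recursion, the FOR_IO owner, and an S latch held by the upgrading thread itself are deliberately left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical, non-recursive sketch of S/U/X compatibility.
// Names and layout are assumptions, not the InnoDB implementation.
class sux_sketch
{
  std::mutex m;
  std::condition_variable cv;
  uint32_t readers= 0;  // number of S holders
  bool updater= false;  // one thread holds U
  bool writer= false;   // one thread holds X, or is upgrading to it
public:
  // S conflicts only with X.
  bool s_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (writer) return false;
    ++readers;
    return true;
  }
  // U conflicts with U and X, but not with S.
  bool u_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (updater || writer) return false;
    updater= true;
    return true;
  }
  // X conflicts with everything.
  bool x_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (readers || updater || writer) return false;
    writer= true;
    return true;
  }
  // Upgrade U to X: refuse new S requests, then wait for readers to leave.
  void u_x_upgrade()
  {
    std::unique_lock<std::mutex> g(m);
    assert(updater && !writer);
    updater= false;
    writer= true;
    cv.wait(g, [this] { return readers == 0; });
  }
  void s_unlock()
  {
    std::lock_guard<std::mutex> g(m);
    assert(readers);
    if (!--readers) cv.notify_all();  // a pending upgrade may proceed
  }
  void u_unlock()
  {
    std::lock_guard<std::mutex> g(m);
    assert(updater);
    updater= false;
  }
  void x_unlock()
  {
    std::lock_guard<std::mutex> g(m);
    assert(writer);
    writer= false;
  }
};
```

In this sketch a reader can join while U is held, a second U or an X request is refused, and the upgrade blocks only until the readers that were already inside have left.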
-
Marko Mäkelä authored
The PERFORMANCE_SCHEMA insists on distinguishing read-update-write locks from read-write locks, so we must add template<bool support_u_lock> in rd_lock() and wr_lock() operations. rw_lock::read_trylock(): Add template<bool prioritize_updater=false> which is used by the srw_lock_low::read_lock() loop. As long as an UPDATE lock has already been granted to some thread, we will grant subsequent READ lock requests even if a waiting WRITE lock request exists. This will be necessary to be compatible with the existing usage pattern of InnoDB rw_lock_t where the holder of an SX-latch (which we will rename to UPDATE latch) may acquire an additional S-latch on the same object. For normal read-write locks without update operations this should make no difference at all, because the rw_lock::UPDATER flag would never be set.
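
The reader admission rule described here can be sketched on a single atomic lock word. The bit assignments below and the free function form are assumptions chosen for illustration; only the names WRITER_WAITING-style flags, UPDATER and prioritize_updater echo the commit message, and the real rw_lock may differ in every detail.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical lock word layout, for illustration only.
constexpr uint32_t WRITER= 1U << 31;          // an X lock is held
constexpr uint32_t WRITER_WAITING= 1U << 30;  // an X request is waiting
constexpr uint32_t UPDATER= 1U << 29;         // a U lock is held
// The remaining low-order bits count the S holders.

// Try to register one more reader without blocking.
template<bool prioritize_updater= false>
bool read_trylock(std::atomic<uint32_t> &word)
{
  uint32_t l= word.load(std::memory_order_relaxed);
  for (;;)
  {
    if (l & WRITER)
      return false;  // never share with an X holder
    // Normally a waiting writer blocks new readers. With
    // prioritize_updater, readers are still admitted while U is held.
    if ((l & WRITER_WAITING) && !(prioritize_updater && (l & UPDATER)))
      return false;
    // On failure, compare_exchange_weak reloads l and we re-check.
    if (word.compare_exchange_weak(l, l + 1, std::memory_order_acquire,
                                   std::memory_order_relaxed))
      return true;
  }
}
```

With the UPDATER bit clear, both instantiations behave identically, which matches the statement that plain read-write locks see no difference.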
-
Marko Mäkelä authored
The extension of the test perfschema.sxlock_func in commit 1669c889 turned out to be unstable. Let us filter out purge_sys.latch (trx_purge_latch) from the output, because it might happen that the purge tasks will not be executed during the test execution.
-
Marko Mäkelä authored
Let us try to avoid code bloat for the common case that performance_schema is disabled at runtime, and use ATTRIBUTE_NOINLINE member functions for instrumented latch acquisition. Also, let us distinguish lock waits from non-contended lock requests by using write_lock,read_lock for the requests that lead to waits, and try_write_lock,try_read_lock for the wait-free lock acquisitions. Actual 'try' operations are not being instrumented at all.
-
Marko Mäkelä authored
In commit 1fdc161d we introduced a mutex-and-condition-variable based fallback implementation for platforms that lack a futex system call. That implementation is prone to hangs. Let us use separate condition variables for shared and exclusive requests.
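
As a rough illustration of the idea of separate condition variables, the following sketch keeps readers and writers on different wait queues so that a wake-up always reaches a waiter of the intended kind. It is not the code from this commit: the class name rw_two_cv, its members, and the writer-preferring policy are assumptions.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical rw-lock with one condition variable per request type.
class rw_two_cv
{
  std::mutex m;
  std::condition_variable cv_shared;     // readers wait here
  std::condition_variable cv_exclusive;  // writers wait here
  uint32_t readers= 0;
  uint32_t waiting_writers= 0;
  bool writer= false;
public:
  bool rd_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (writer || waiting_writers) return false;
    ++readers;
    return true;
  }
  void rd_lock()
  {
    std::unique_lock<std::mutex> g(m);
    cv_shared.wait(g, [this] { return !writer && !waiting_writers; });
    ++readers;
  }
  void rd_unlock()
  {
    std::lock_guard<std::mutex> g(m);
    assert(readers);
    // The last reader hands over to exactly one waiting writer.
    if (!--readers && waiting_writers) cv_exclusive.notify_one();
  }
  bool wr_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (writer || readers) return false;
    writer= true;
    return true;
  }
  void wr_lock()
  {
    std::unique_lock<std::mutex> g(m);
    ++waiting_writers;
    cv_exclusive.wait(g, [this] { return !writer && !readers; });
    --waiting_writers;
    writer= true;
  }
  void wr_unlock()
  {
    std::lock_guard<std::mutex> g(m);
    assert(writer);
    writer= false;
    // Prefer the next writer; otherwise release all waiting readers.
    if (waiting_writers) cv_exclusive.notify_one();
    else cv_shared.notify_all();
  }
};
```

Because every state change and every notification happens under the same mutex, a waiter cannot miss a wake-up between checking its predicate and going to sleep.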
-
Marko Mäkelä authored
-
Marko Mäkelä authored
The clang++ -stdlib=libc++ header file <fstream> depends on <filesystem> that defines a member function path::root_name(), which conflicts with the rather unused #define root_name() that had been introduced in commit 7c58e97b. Because an instrumented -stdlib=libc++ (rather than the default -stdlib=libstdc++) is easier to build for a working -fsanitize=memory (cmake -DWITH_MSAN=ON), let us remove the conflicting #define for now.
-
- 02 Dec, 2020 5 commits
-
-
Marko Mäkelä authored
Sorry, only tested commit 4174fc1a on clang. Other compilers do not define __has_feature().
-
Marko Mäkelä authored
For some reason, commit 5bb5d4ad made clang++-11 unhappy about a constexpr declaration.
-
Marko Mäkelä authored
For some reason, the test was never adjusted for commit e6a50e41.
-
Marko Mäkelä authored
-
Marko Mäkelä authored
The Galera tests were massively failing with debug assertions.
-
- 01 Dec, 2020 8 commits
-
-
Marko Mäkelä authored
-
Vlad Lesin authored
Post-push Windows compilation errors fix.
-
Marko Mäkelä authored
-
Monty authored
Change thd->mdl_context.release_transactional_locks() to thd->mdl_release_transactional_locks()
-
Marko Mäkelä authored
row_undo_ins_parse_undo_rec(): Do not try to read non-existing virtual column information for the metadata record.
-
Marko Mäkelä authored
-
Marko Mäkelä authored
The replacement is buf_pool.contains_zip().
-
Vlad Lesin authored
The new option --log-innodb-page-corruption is introduced. When this option is set, the backup is not interrupted if a corrupted InnoDB page is detected. Instead, all corrupted pages found are logged in the innodb_corrupted_pages file in the backup directory, and the backup finishes with an error. For an incremental backup, corrupted pages are also copied to the .delta file, because we cannot do an LSN check for such pages during backup; innodb_corrupted_pages will also be created in the incremental backup directory.

During --prepare, the list of corrupted pages is read from the file just after the redo log is applied, and each page in the list is checked for whether it is allocated in its tablespace. If it is not allocated, it is zeroed out, flushed to the tablespace, and removed from the list. If all pages are removed from the list, --prepare finishes successfully and the innodb_corrupted_pages file is removed from the backup directory. Otherwise, --prepare finishes with an error message, and innodb_corrupted_pages contains the list of pages that were detected as corrupted during backup and are allocated in their tablespaces. This means that the backup directory contains corrupted InnoDB pages, and the backup cannot be considered consistent.

For incremental --prepare, corrupted pages from .delta files are applied to the base backup, innodb_corrupted_pages is read from both the base and incremental directories, and the corrupted pages list is processed in the same way as for full --prepare. The innodb_corrupted_pages file is modified or removed only in the base directory.

If DDL happens during backup, it is also processed at the end of the backup so that innodb_corrupted_pages has correct tablespace names.
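
The --prepare decision described above can be sketched as a filter over the corrupted pages list. Everything named here (page_id, process_corrupted_pages, and the two callbacks) is hypothetical and stands in for the real tablespace checks and page writes, which are not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical page identifier: tablespace id plus page number.
struct page_id
{
  uint32_t space;
  uint32_t page_no;
};

// Walk the list read from innodb_corrupted_pages. Pages that are not
// allocated are zeroed and flushed via the callback and dropped from
// the list; pages that are allocated stay and make --prepare fail.
std::vector<page_id> process_corrupted_pages(
  const std::vector<page_id> &corrupted,
  const std::function<bool(const page_id&)> &is_allocated,
  const std::function<void(const page_id&)> &zero_and_flush)
{
  std::vector<page_id> remaining;
  for (const page_id &p : corrupted)
  {
    if (is_allocated(p))
      remaining.push_back(p);  // genuinely corrupted data
    else
      zero_and_flush(p);       // harmless: the page is unused
  }
  return remaining;
}
```

Under this sketch, --prepare would succeed and remove the file exactly when the returned list is empty.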
-
- 30 Nov, 2020 15 commits
-
-
Monty authored
The reason for the failure is that thd->mdl_context.release_transactional_locks() was called after commit & rollback even in cases where the current transaction is still active.

For 10.2, 10.3 and 10.4 the fix is simple:
  - Replace all calls to thd->mdl_context.release_transactional_locks() with thd->release_transactional_locks(). The thd function will only call the mdl_context function if there are no active transactional locks.

In 10.6 we will make a better fix, in which the return value of some trans_xxx() functions is changed to indicate whether the function closed the transaction. This will avoid the need for the indirect call.

Other things:
  - trans_xa_commit() and trans_xa_rollback() will automatically call release_transactional_locks() if the transaction is closed.
  - We can't do that for the other functions, as the callers of many of them do additional work (like close_thread_tables) before calling release_transactional_locks().
  - Added missing abort_result_set() and missing DBUG_RETURN in select_create::send_eof()
  - Fixed wrong indentation in injector::transaction::commit()
-
Monty authored
-
Monty authored
The real fix for MDEV-15532 will be pushed into 10.2 and 10.6. This is an additional fix for 10.4. In 10.4, trans_xa_detach() was introduced. However, THD::cleanup() assumes that after trans_xa_detach() is done, there are no registered transactions anymore. In the 10.2 patch there will be an assert to ensure this, which will cause 10.4 to fail. The fix used is to reset the transaction flags in trans_xa_detach().
-
Monty authored
-
Vladislav Vaintroub authored
  - The intention of the my_getevents syscall is now better explained: we use it to be able to interrupt the io_getevents syscall via io_destroy().
  - Fix the comment for MAX_EVENTS in getevent_thread_routine. MAX_EVENTS is a more or less arbitrary constant, chosen such that the events array is big enough to receive multiple simultaneous I/O completions, but small enough that it does not blow the thread's stack.
-
Vladislav Vaintroub authored
If the maintenance timer does not do much for a prolonged time, it will wake up less frequently: once every 4 seconds instead of once every 0.4 seconds. It will wake up more often if thread creation is throttled, to avoid stalls.
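
A minimal sketch of such an adaptive period follows. The two periods come from the commit message; the type name, the idle-tick counter, and especially the threshold of 10 idle ticks are assumptions, since the message does not say how "prolonged" is measured.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

constexpr milliseconds NORMAL_PERIOD{400};   // 0.4 seconds
constexpr milliseconds IDLE_PERIOD{4000};    // 4 seconds
constexpr int IDLE_TICKS_THRESHOLD= 10;      // hypothetical value

// Hypothetical helper that chooses the next timer period.
struct maintenance_timer_sketch
{
  int idle_ticks= 0;  // consecutive ticks without useful work

  milliseconds next_period(bool did_work, bool creation_throttled)
  {
    if (did_work || creation_throttled)
    {
      // Stay responsive while busy or while thread creation is throttled.
      idle_ticks= 0;
      return NORMAL_PERIOD;
    }
    if (idle_ticks < IDLE_TICKS_THRESHOLD)
      ++idle_ticks;
    return idle_ticks >= IDLE_TICKS_THRESHOLD ? IDLE_PERIOD : NORMAL_PERIOD;
  }
};
```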
-
Monty authored
-
Sergei Petrunia authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
For some reason, InnoDB debug tests on Windows fail due to rw_lock_t if the function call overhead for some os_thread_ code is removed. This change worked fine on Windows in combination with MDEV-24142.
-
Varun Gupta authored
MDEV-21265: IN predicate conversion to IN subquery should be allowed for a broader set of datatype comparison

Allow the materialization strategy when the collations on the inner and outer sides of an IN subquery are the same and the character set of the inner side is a proper subset of the character set on the outer side. This allows conversion from utf8mb3 to utf8mb4, as the former is a subset of the latter. This is only allowed when an IN predicate is converted to an IN subquery.

Backported part of the patch (d6a00d9b) of MDEV-17905.
-
Marko Mäkelä authored
-
Marko Mäkelä authored
Let us always base srw_lock on our own std::atomic<uint32_t> based rw_lock. In this way, we can extend the locks in a portable way across all platforms. We will use futex system calls where available: Linux, OpenBSD, and Microsoft Windows. Elsewhere, we will emulate futex with a mutex and a condition variable. Thanks to Daniel Black for testing this on OpenBSD.
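
The fallback mentioned at the end of this message, emulating futex with a mutex and a condition variable, could look roughly like the sketch below. The struct futex_emulation and its member names are hypothetical. Unlike a real futex wait, which may return on any wake-up, this wait() returns only once the word differs from the expected value.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical futex-like wait/wake built on a mutex and a
// condition variable, for platforms without a futex system call.
struct futex_emulation
{
  std::mutex m;
  std::condition_variable cv;

  // Sleep for as long as the word still holds the expected value.
  void wait(const std::atomic<uint32_t> &word, uint32_t expected)
  {
    std::unique_lock<std::mutex> g(m);
    while (word.load(std::memory_order_acquire) == expected)
      cv.wait(g);
  }

  // Call after changing the word. Taking the mutex orders this wake-up
  // after any waiter's value check, so the wake-up cannot be lost.
  void wake_all()
  {
    std::lock_guard<std::mutex> g(m);
    cv.notify_all();
  }
};
```

A lock built on this would store its state in the std::atomic<uint32_t> word, call wait() when an acquisition attempt fails, and call wake_all() after releasing.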
-
Marko Mäkelä authored
-
Marko Mäkelä authored
os_thread_pf(): Remove. os_thread_eq(), os_thread_yield(), os_thread_get_curr_id(): Define as macros. ut_print_timestamp(), ut_sprintf_timestamp(): Simplify.
-
- 28 Nov, 2020 1 commit
-
-
Marko Mäkelä authored
-
- 27 Nov, 2020 1 commit
-
-
Igor Babaev authored
When executing set operations in a pipeline using only one temporary table, additional scans of intermediate results may be needed. The scans are performed using the rnd_next() handler function, which might leave the record buffers used for the temporary table in a state that is not suitable for subsequent writes into the table. For example, this happens for the Aria engine when the last call of rnd_next() encounters only deleted records. Thus a cleanup of record buffers is needed after each such scan of the temporary table.

Approved by Oleksandr Byelkin <sanja@mariadb.com>
-
- 26 Nov, 2020 3 commits