- 21 Dec, 2020 8 commits
-
-
Marko Mäkelä authored
-
Marko Mäkelä authored
srv_monitor_task(): Make the innodb_fatal_semaphore_wait_threshold watchdog tolerate a non-monotonic clock. On NUMA systems, the my_hrtime_coarse() values observed on different NUMA nodes are not in sync, and the clock could appear to run backwards. We must treat negative time durations as zero, just like we did in commit ff5d306e in dict_sys_t::mutex_lock_wait(). The wrong logic caused occasional crashes of the test mariabackup.apply-log-only-incr when many instances of it were run concurrently.
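The "treat negative time durations as zero" rule can be illustrated with a minimal sketch. This is not the actual srv_monitor_task() code; the helper names elapsed_or_zero() and wait_exceeds_threshold() and the microsecond units are assumptions made for illustration only.

```cpp
#include <cstdint>

// Hypothetical helper: elapsed microseconds between two clock samples.
// If the clock appears to have run backwards (now < start), report zero
// instead of letting the unsigned subtraction wrap around to a huge value.
inline uint64_t elapsed_or_zero(uint64_t start_us, uint64_t now_us)
{
  return now_us > start_us ? now_us - start_us : 0;
}

// Hypothetical watchdog predicate: has a wait that started at start_us
// lasted at least threshold_s seconds?
inline bool wait_exceeds_threshold(uint64_t start_us, uint64_t now_us,
                                   uint64_t threshold_s)
{
  return elapsed_or_zero(start_us, now_us) >= threshold_s * 1000000ULL;
}
```

Without the clamp, a sample taken slightly "before" the start time would wrap to an enormous unsigned duration and trip the watchdog immediately.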
-
Marko Mäkelä authored
We are seeing !buf_pool.any_io_pending() assertion failures in srv_start() ever since MDEV-21452 in 10.6. But, the problem appears to be older. In 10.5 since MDEV-19514 removed writes from the precursor of buf_page_read_complete(), it seems that the debug assertion failure could have been harmless. recv_sys_t::apply(): At the end of each batch, wait not only for all log records to have been processed, but also for all pending reads to complete, so that the buffer pool will be in an idle state.
-
Sergei Golubchik authored
List all supported options in the comment. Remove the wsrep-specific hack of EXPORT_SYMBOLS; wsrep-specific hacks belong in wsrep.
-
Sergei Golubchik authored
Reverts 10.5 commit 6033cc85. The fix a587ded2 will be merged from 10.2.
-
Sergei Golubchik authored
-
Sergei Golubchik authored
-
Marko Mäkelä authored
In commit 0c23e32d (MDEV-24445) we forgot to keep m_freed_space in sync with m_freed_pages in one case.
-
- 19 Dec, 2020 1 commit
-
-
Marko Mäkelä authored
-
- 18 Dec, 2020 4 commits
-
-
Marko Mäkelä authored
In the rewrite of MDEV-8139 (based on MDEV-15528), we introduced a wrong assumption that any persistent tablespace that is not an .ibd file is the system tablespace. This assumption is broken when innodb_undo_tablespaces (files undo001, undo002, ...) are being used. By default, we have innodb_undo_tablespaces=0 (the persistent undo log is being stored in the system tablespace). In MDEV-15528 and MDEV-8139 we rewrote the page scrubbing logic so that it will follow the tried-and-true write-ahead logging protocol, first writing FREE_PAGE records and then, during page flushing, zero-filling or hole-punching the freed pages. Unfortunately, the implementation included a wrong assumption that anything that is not in an .ibd file must be the system tablespace. This wrong assumption would cause overwrites of valid data pages in the system tablespace. mtr_t::m_freed_in_system_tablespace: Remove. mtr_t::m_freed_space: The tablespace associated with m_freed_pages. buf_page_free(): Take the tablespace and page number as parameters, instead of taking a page identifier.
-
Marko Mäkelä authored
A race condition between deleting an .ibd file and fil_crypt_thread marking pages dirty was introduced in commit 118e258a (part of MDEV-23855). fil_space_t::acquire_if_not_stopped(): Correctly return false if the STOPPING flag is set, indicating that any further activity on the tablespace must be avoided. Also, remove the constant parameter have_mutex=true and move the function declaration to the same compilation unit with the only callers. fil_crypt_flush_space(): Remove an unused variable.
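The intended contract of acquire_if_not_stopped() (refuse new references once the STOPPING flag is set) can be sketched as follows. This is an assumption-laden illustration, not the fil_space_t implementation: the class name space_ref, the flag position, and the helper methods are all hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical reference counter whose top bit acts as a STOPPING flag.
class space_ref
{
  static constexpr uint32_t STOPPING= 1U << 31;
  std::atomic<uint32_t> n{0};
public:
  // Acquire a reference unless STOPPING is set.
  // Returns false if the object is being stopped and must not be used.
  bool acquire_if_not_stopped()
  {
    uint32_t v= n.load(std::memory_order_relaxed);
    while (!(v & STOPPING))
      // On failure, v is reloaded and the flag is checked again.
      if (n.compare_exchange_weak(v, v + 1, std::memory_order_acquire,
                                  std::memory_order_relaxed))
        return true;
    return false;
  }
  // Release a previously acquired reference.
  void release() { n.fetch_sub(1, std::memory_order_release); }
  // Forbid any further acquisitions; existing references stay valid.
  void set_stopping() { n.fetch_or(STOPPING, std::memory_order_relaxed); }
  // Number of references currently held (flag bit masked out).
  uint32_t references() const
  { return n.load(std::memory_order_relaxed) & ~STOPPING; }
};
```

A caller such as an encryption thread would skip the tablespace whenever acquire_if_not_stopped() returns false, so that it never marks pages dirty in a file that is being deleted.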
-
Marko Mäkelä authored
-
Marko Mäkelä authored
fil_crypt_find_space_to_rotate(): Always handle the sentinel value that indicates that we have run out of work, even if the thread should also shut down for other reasons at the same time. Thanks to Matthias Leich for reproducing this bug with RQG.
-
- 17 Dec, 2020 2 commits
-
-
Marko Mäkelä authored
-
Marko Mäkelä authored
After MDEV-15528, two modes of operation in the fil_crypt_thread remain, depending on whether innodb_encryption_rotate_key_age=0 (whether key rotation is disabled). If key rotation is disabled, the fil_crypt_thread misses the opportunity to sleep, which results in a lot of wasted CPU usage. fil_crypt_return_iops(): Add a parameter to specify whether other fil_crypt_thread should be woken up. fil_system_t::keyrotate_next(): Return the special value fil_system.temp_space to indicate that no work is to be done. fil_space_t::next(): Propagate the special value fil_system.temp_space to the caller. fil_crypt_find_space_to_rotate(): If no work is to be done, do not wake up other threads.
-
- 16 Dec, 2020 2 commits
-
-
Marko Mäkelä authored
With system mutexes, contention can be very expensive. Let us configure innodb_encryption_threads=1 to minimize contention. The actual work is being done in buf_flush_page_cleaner thread anyway.
-
Marko Mäkelä authored
It turns out that the hang that was fixed in commit 43d3dad1 for the SRW_LOCK_DUMMY implementation is also possible in the futex implementation. We have observed hangs of ssux_lock_low::u_unlock() on Windows where the undesirable value is rw_lock::UPDATER, in the test mariabackup.xb_compressed_encrypted. The exact sequence of events leading to the hang is not known, but it seems that u_unlock() had better always wake up one thread. Possibly, the case involves multiple blocked u_unlock(). On a busy server, the hang might be 'rescued' by a subsequent lock acquisition and release that is executed by another thread. rw_lock::update_unlock(): Change the return type to void. ssux_lock_low::u_unlock(): Always invoke readers_wake() [sic], to wake up any pending update_lock() or write_lock(). On the futex implementation, this will wake up all waiters. On SRW_LOCK_DUMMY, writer_wake() and readers_wake() do the same thing: wake up one write_lock(), or all update_lock() waiters.
-
- 15 Dec, 2020 19 commits
-
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Etienne Guesnet authored
-
Marko Mäkelä authored
Most hangs seem to involve dict_sys.mutex. While holding lock_sys.mutex we rarely acquire any buffer pool page latches, which are a frequent source of potential hangs.
-
Marko Mäkelä authored
SHOW ENGINE INNODB MUTEX functionality is completely removed, as are the InnoDB latching order checks. We will enforce innodb_fatal_semaphore_wait_threshold only for dict_sys.mutex and lock_sys.mutex. dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex. lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex. FIXME: srv_sys should be removed altogether; it is duplicating tpool functionality. fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must not hold fil_system.mutex. fil_close_all_files(): To prevent SAFE_MUTEX warnings for fil_space_destroy_crypt_data(), we must not hold fil_system.mutex while invoking fil_space_free_low() on a detached tablespace.
-
Marko Mäkelä authored
We will default to MUTEXTYPE=sys (using OSTrackMutex) for those ib_mutex_t that have not been replaced yet. The view INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS is removed. The parameter innodb_sync_array_size is removed. FIXME: innodb_fatal_semaphore_wait_threshold will no longer be enforced. We should enforce it for lock_sys.mutex and dict_sys.mutex somehow! innodb_sync_debug=ON might still cover ib_mutex_t.
-
Marko Mäkelä authored
Let us replace os_event_t with mysql_cond_t, and replace the necessary ib_mutex_t with mysql_mutex_t so that they can be used with condition variables. Also, let us replace polling (os_thread_sleep() or timed waits) with plain mysql_cond_wait() wherever possible. Furthermore, we will use the lightweight srw_mutex for trx_t::mutex, to hopefully reduce contention on lock_sys.mutex. FIXME: Add test coverage of mariabackup --backup --kill-long-queries-timeout
-
Marko Mäkelä authored
srw_lock_low: Declare the member functions public when wrapping rw_lock_t
-
Marko Mäkelä authored
In commit 43d3dad1 we forgot to invert the return values of rw_tryrdlock() and rw_trywrlock(), causing strange failures.
-
Marko Mäkelä authored
This conceptually reverts commit 1fdc161d and reintroduces an option for srw_lock to wrap a native implementation. The srw_lock and srw_lock_low differ from ssux_lock and ssux_lock_low in that Slim SUX locks support three modes (Shared, Update, eXclusive) while Slim RW locks support only two (Read, Write). On Microsoft Windows, the srw_lock will be implemented by SRWLOCK. On Linux and OpenBSD, it will be implemented by rw_lock and the futex system call, just like earlier. On other systems, or if SRW_LOCK_DUMMY is defined on anything other than Microsoft Windows, rw_lock_t will be used. ssux_lock_low::read_lock(), ssux_lock_low::update_lock(): Correct the SRW_LOCK_DUMMY implementation to prevent hangs. The intention of commit 1fdc161d seems to have been do ... while loops, but the 'do' keyword was missing. This total breakage was missed in commit 260161fc which did reduce the probability of the hangs. ssux_lock_low::u_unlock(): In the SRW_LOCK_DUMMY implementation (based on a mutex and two condition variables), always invoke writer_wake() in order to ensure that a waiting update_lock() will be woken up. ssux_lock_low::writer_wait(), ssux_lock_low::readers_wait(): In the SRW_LOCK_DUMMY implementation, keep waiting for the signal until the lock word has changed. The "while" had been changed to "if" in order to avoid hangs.
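The "keep waiting for the signal until the lock word has changed" pattern can be sketched with the standard library. This is only an illustration of the general mutex-plus-condition-variable idiom, not the SRW_LOCK_DUMMY code; the class lock_word and its methods are hypothetical names.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical lock word protected by a mutex and a condition variable.
class lock_word
{
  std::mutex m;
  std::condition_variable cv;
  uint32_t word= 0;
public:
  // Block until the word differs from the value the caller last saw,
  // and return the new value.
  uint32_t wait_for_change(uint32_t seen)
  {
    std::unique_lock<std::mutex> lk(m);
    // "while", not "if": being woken up does not by itself prove that
    // the word changed (wakeups may be spurious or meant for others).
    while (word == seen)
      cv.wait(lk);
    return word;
  }
  // Store a new value and wake up every waiter so each can recheck.
  void set(uint32_t value)
  {
    {
      std::lock_guard<std::mutex> g(m);
      word= value;
    }
    cv.notify_all();
  }
};
```

Looping on the predicate is only safe if every state change is followed by a wakeup, which is why the unlock path must always signal waiters.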
-
zhaorenhai authored
Move the S3 test case variables to suite.pm to use environment variables. Use minio credentials if a TCP connection to localhost:9000 is accepted, so that the current build works correctly. Reviewer: Daniel Black. Closes #1711
-
- 14 Dec, 2020 4 commits
-
-
Stepan Patryshev authored
-
Stepan Patryshev authored
-
Marko Mäkelä authored
-
Marko Mäkelä authored
-