Commits · c36a2a0d1c03900796dc35c01a745bec8b1b54e2 · nexedi / MariaDB

17 Dec, 2020 2 commits

Merge 10.5 into 10.6 · c36a2a0d
Marko Mäkelä authored Dec 17, 2020

c36a2a0d

MDEV-24426 fil_crypt_thread keep spinning even if innodb_encryption_rotate_key_age=0 · 1fe3dd00

Marko Mäkelä authored Dec 17, 2020

After MDEV-15528, two modes of operation remain in the fil_crypt_thread,
depending on whether innodb_encryption_rotate_key_age=0
(whether key rotation is disabled). If key rotation is disabled,
the fil_crypt_thread misses the opportunity to sleep, which results
in a lot of wasted CPU time.

fil_crypt_return_iops(): Add a parameter to specify whether other
fil_crypt_thread should be woken up.

fil_system_t::keyrotate_next(): Return the special value
fil_system.temp_space to indicate that no work is to be done.

fil_space_t::next(): Propagate the special value fil_system.temp_space
to the caller.

fil_crypt_find_space_to_rotate(): If no work is to be done,
do not wake up other threads.
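
The sentinel idea described above can be sketched roughly as follows. This is only an illustration, not the InnoDB code: the `Space` struct, `no_work_sentinel`, `next_to_rotate()` and `should_wake_others()` are hypothetical names standing in for fil_space_t, fil_system.temp_space and the functions named in the commit message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a tablespace object; not the InnoDB type.
struct Space { int id; bool needs_rotation; };

// Hypothetical sentinel object, playing the role that the commit message
// assigns to fil_system.temp_space: "no work is to be done".
static Space no_work_sentinel{-1, false};

// Return the next space that needs key rotation, nullptr when nothing
// in the list needs it, or &no_work_sentinel when rotation is disabled.
Space *next_to_rotate(std::vector<Space> &spaces, bool rotation_enabled)
{
  if (!rotation_enabled)
    return &no_work_sentinel;
  for (Space &s : spaces)
    if (s.needs_rotation)
      return &s;
  return nullptr;
}

// A caller wakes up sibling threads only when real work was found.
bool should_wake_others(const Space *found)
{
  return found != nullptr && found != &no_work_sentinel;
}
```

With a distinct sentinel, the caller can tell "nothing left in this pass" apart from "nothing to do at all" and go to sleep in the latter case instead of waking other threads.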

1fe3dd00

16 Dec, 2020 2 commits

Speed up mariabackup.xb_compressed_encrypted · af1335c2

Marko Mäkelä authored Dec 16, 2020

With system mutexes, contention can be very expensive.
Let us configure innodb_encryption_threads=1 to minimize contention.
The actual work is being done in buf_flush_page_cleaner thread anyway.

af1335c2

MDEV-24167 fixup: Wake up all update_lock() in u_unlock() · 07e4b6b2

Marko Mäkelä authored Dec 16, 2020

It turns out that the hang that was fixed in
commit 43d3dad1
for the SRW_LOCK_DUMMY implementation is also possible in the futex
implementation. We have observed hangs of ssux_lock_low::u_unlock()
on Windows where the undesirable value is rw_lock::UPDATER, in the
test mariabackup.xb_compressed_encrypted.

The exact sequence of events leading to the hang is not known, but
it seems that u_unlock() had better always wake up one thread.
Possibly, the case involves multiple blocked u_unlock().

On a busy server, the hang might be 'rescued' by a subsequent
lock acquisition and release that is executed by another thread.

rw_lock::update_unlock(): Change the return type to void.

ssux_lock_low::u_unlock(): Always invoke readers_wake() [sic],
to wake up any pending update_lock() or write_lock().
On futex implementation, this will wake up all waiters.
On SRW_LOCK_DUMMY, writer_wake() and readers_wake() do the same
thing: wake up one write_lock(), or all update_lock() waiters.

07e4b6b2

15 Dec, 2020 19 commits

Contain AIX perror · 6bb3949e
Etienne Guesnet authored Oct 26, 2020

6bb3949e
Fix build on GCC 5 · 2ce48f06
Etienne Guesnet authored Oct 26, 2020

2ce48f06
Add LARGE_FILES flag for GCC AIX build · a6e90992
Etienne Guesnet authored Sep 14, 2020

a6e90992
Add -berok for head test on AIX · 4fade4da
Etienne Guesnet authored Sep 11, 2020

4fade4da
Parse GSSAPI flags on AIX · 2dee6a74
Etienne Guesnet authored Sep 11, 2020

2dee6a74
Add flags for AIX build · 1d7fc728
Etienne Guesnet authored Sep 11, 2020

1d7fc728
Remove -Werror for AIX · b23e5457
Etienne Guesnet authored Sep 11, 2020

b23e5457
AIX workaround for GCC include bug · 1a49619a
Etienne Guesnet authored Sep 11, 2020

1a49619a
AIX workaround for GCC TOC bug · 2c724762
Etienne Guesnet authored Sep 11, 2020

2c724762
Support of AIX for auth_socket plugin · 77d7de8d
Etienne Guesnet authored Sep 11, 2020

77d7de8d
Add build on AIX · 2f5d3724
Etienne Guesnet authored Jan 31, 2020

2f5d3724

MDEV-21452: Retain the watchdog only on dict_sys.mutex, for performance · cf2480dd

Marko Mäkelä authored Dec 15, 2020

Most hangs seem to involve dict_sys.mutex. While holding lock_sys.mutex
we rarely acquire any buffer pool page latches, which are a frequent
source of potential hangs.

cf2480dd

MDEV-21452: Replace ib_mutex_t with mysql_mutex_t · ff5d306e

Marko Mäkelä authored Dec 04, 2020

SHOW ENGINE INNODB MUTEX functionality is completely removed,
as are the InnoDB latching order checks.

We will enforce innodb_fatal_semaphore_wait_threshold
only for dict_sys.mutex and lock_sys.mutex.

dict_sys_t::mutex_lock(): A single entry point for dict_sys.mutex.

lock_sys_t::mutex_lock(): A single entry point for lock_sys.mutex.

FIXME: srv_sys should be removed altogether; it is duplicating tpool
functionality.

fil_crypt_threads_init(): To prevent SAFE_MUTEX warnings, we must
not hold fil_system.mutex.

fil_close_all_files(): To prevent SAFE_MUTEX warnings for
fil_space_destroy_crypt_data(), we must not hold fil_system.mutex
while invoking fil_space_free_low() on a detached tablespace.

ff5d306e

MDEV-21452: Remove os_event_t, MUTEX_EVENT, TTASEventMutex, sync_array · db006a9a

Marko Mäkelä authored Dec 04, 2020

We will default to MUTEXTYPE=sys (using OSTrackMutex) for those
ib_mutex_t that have not been replaced yet.

The view INFORMATION_SCHEMA.INNODB_SYS_SEMAPHORE_WAITS is removed.

The parameter innodb_sync_array_size is removed.

FIXME: innodb_fatal_semaphore_wait_threshold will no longer be enforced.
We should enforce it for lock_sys.mutex and dict_sys.mutex somehow!

innodb_sync_debug=ON might still cover ib_mutex_t.

db006a9a

MDEV-21452: Replace all direct use of os_event_t · 38fd7b7d

Marko Mäkelä authored Dec 04, 2020

Let us replace os_event_t with mysql_cond_t, and replace the
necessary ib_mutex_t with mysql_mutex_t so that they can be
used with condition variables.

Also, let us replace polling (os_thread_sleep() or timed waits)
with plain mysql_cond_wait() wherever possible.

Furthermore, we will use the lightweight srw_mutex for trx_t::mutex,
to hopefully reduce contention on lock_sys.mutex.

FIXME: Add test coverage of
mariabackup --backup --kill-long-queries-timeout

38fd7b7d

Fix the SRW_LOCK_DUMMY build with PLUGIN_PERFSCHEMA=NO · 59b2848a
Marko Mäkelä authored Dec 15, 2020

srw_lock_low: Declare the member functions public when wrapping rw_lock_t

59b2848a

MDEV-24410: Bug in SRW_LOCK_DUMMY rw_lock_t wrapper · 20da7b22

Marko Mäkelä authored Dec 15, 2020

In commit 43d3dad1 we forgot to
invert the return values of rw_tryrdlock() and rw_trywrlock(),
causing strange failures.
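
A minimal sketch of this kind of bug, assuming that the wrapped try-functions follow the pthread-style convention of returning 0 on success while the wrapper exposes a bool. `NativeLock`, `native_tryrdlock()`, `native_trywrlock()` and `Wrapper` are hypothetical names for illustration, not the MariaDB ones.

```cpp
#include <cassert>

// Hypothetical native-style lock whose try-functions return 0 on
// success and nonzero on failure (the assumed C convention).
struct NativeLock { bool write_locked = false; int readers = 0; };

int native_tryrdlock(NativeLock &l)
{
  if (l.write_locked) return 1;
  ++l.readers;
  return 0;
}

int native_trywrlock(NativeLock &l)
{
  if (l.write_locked || l.readers) return 1;
  l.write_locked = true;
  return 0;
}

// Wrapper with a bool interface: true means "acquired".
// Returning the native result unchanged would report the opposite.
struct Wrapper
{
  NativeLock lock;
  bool rd_lock_try() { return native_tryrdlock(lock) == 0; }
  bool wr_lock_try() { return native_trywrlock(lock) == 0; }
};
```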

20da7b22

MDEV-24142/MDEV-24167 fixup: Split ssux_lock and srw_lock · 43d3dad1

Marko Mäkelä authored Dec 15, 2020

This conceptually reverts commit 1fdc161d
and reintroduces an option for srw_lock to wrap a native implementation.

The srw_lock and srw_lock_low differ from ssux_lock and ssux_lock_low
in that Slim SUX locks support three modes (Shared, Update, eXclusive)
while Slim RW locks support only two (Read, Write).

On Microsoft Windows, the srw_lock will be implemented by SRWLOCK.
On Linux and OpenBSD, it will be implemented by rw_lock and the
futex system call, just like earlier.
On other systems, or if SRW_LOCK_DUMMY is defined on anything other
than Microsoft Windows, rw_lock_t will be used.

ssux_lock_low::read_lock(), ssux_lock_low::update_lock(): Correct
the SRW_LOCK_DUMMY implementation to prevent hangs. The intention of
commit 1fdc161d seems to have been
do ... while loops, but the 'do' keyword was missing. This total
breakage was missed in commit 260161fc
which did reduce the probability of the hangs.

ssux_lock_low::u_unlock(): In the SRW_LOCK_DUMMY implementation
(based on a mutex and two condition variables), always invoke
writer_wake() in order to ensure that a waiting update_lock()
will be woken up.

ssux_lock_low::writer_wait(), ssux_lock_low::readers_wait():
In the SRW_LOCK_DUMMY implementation, keep waiting for the signal
until the lock word has changed. The "while" had been changed to "if"
in order to avoid hangs.
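
The "keep waiting until the lock word has changed" pattern is the usual predicate loop around a condition variable. A rough sketch using the C++ standard library rather than the mysys primitives; the `Gate` type and its members are hypothetical:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct Gate
{
  std::mutex m;
  std::condition_variable cv;
  unsigned word = 1;  // hypothetical "lock word"; nonzero means busy

  void wait_until_free()
  {
    std::unique_lock<std::mutex> lk(m);
    while (word != 0)  // "while", not "if": re-check after every wakeup
      cv.wait(lk);
  }

  void release()
  {
    {
      std::lock_guard<std::mutex> lk(m);
      word = 0;
    }
    cv.notify_all();  // wake every waiter; each one re-checks the word
  }
};
```

Because the word is checked under the mutex both before and after each wait, a waiter neither returns early on a spurious wakeup nor misses a release that happened before it started waiting.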

43d3dad1

MDEV-24366 Use environment variables as S3 test case variables · ee69c153

zhaorenhai authored Dec 08, 2020

Move the S3 test case variables to suite.pm to use environment variables.

Use minio credentials if a TCP connection to localhost:9000 is accepted,
so that the current build works correctly.

Reviewer: Daniel Black

closes #1711

ee69c153

14 Dec, 2020 7 commits

MDEV-23659 Update Galera disabled.def file · e4c25895
Stepan Patryshev authored Dec 14, 2020

e4c25895
MDEV-23659 Update Galera disabled.def file · 1c660211
Stepan Patryshev authored Dec 14, 2020

1c660211
Merge 10.5 into 10.6 · 9ecd7665
Marko Mäkelä authored Dec 14, 2020

9ecd7665
MDEV-24313 fixup: GCC 8 -Wconversion · e8217d07
Marko Mäkelä authored Dec 14, 2020

e8217d07
MDEV-24313 fixup: GCC -Wparentheses · 2c226e01
Marko Mäkelä authored Dec 14, 2020

2c226e01

MDEV-24313 (2 of 2): Silently ignored innodb_use_native_aio=1 · f24b7383

Marko Mäkelä authored Dec 14, 2020

In commit 5e62b6a5 (MDEV-16264)
the logic of os_aio_init() was changed so that it will never fail,
but instead automatically disable innodb_use_native_aio (which is
enabled by default) if the io_setup() system call would fail due
to resource limits being exceeded. This is questionable, especially
because falling back to simulated AIO may lead to significantly
reduced performance.

srv_n_file_io_threads, srv_n_read_io_threads, srv_n_write_io_threads:
Change the data type from ulong to uint.

os_aio_init(): Remove the parameters, and actually return an error code.

thread_pool::configure_aio(): Do not silently fall back to simulated AIO.

Reviewed by: Vladislav Vaintroub

f24b7383

MDEV-24313 (1 of 2): Hang with innodb_write_io_threads=1 · 17d3f856

Marko Mäkelä authored Dec 14, 2020

After commit a5a2ef07 (part of MDEV-23855)
implemented asynchronous doublewrite, it is possible that the server will
hang when the following parameters are in effect:

    innodb_doublewrite=1 (default)
    innodb_write_io_threads=1
    innodb_use_native_aio=0

Note: In commit 5e62b6a5 (MDEV-16264)
the logic of os_aio_init() was changed so that it will never fail,
but instead automatically disable innodb_use_native_aio (which is
enabled by default) if the io_setup() system call would fail due
to resource limits being exceeded.

Before commit a5a2ef07, we used
a synchronous write for the doublewrite buffer batches, always at
most 64 pages at a time. So, upon completing a doublewrite batch,
a single thread would submit at most 64 page writes (for the
individual pages that were first written to the doublewrite buffer).
With that commit, we may submit up to 128 page writes at a time.

The maximum number of outstanding requests per thread is 256.
Because the maximum number of asynchronous write submissions per
thread was roughly doubled, it is now possible that
buf_dblwr_t::flush_buffered_writes_completed() will hang in
io_slots::acquire(), called via os_aio() and fil_space_t::io(),
when submitting writes of the individual blocks.

We will prevent this type of hang by increasing the minimum number
of innodb_write_io_threads from 1 to 2, so that this type of hang
would only become possible when 512 outstanding write requests
are exceeded.
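
The arithmetic behind the new minimum can be written out as a small compile-time check. The numbers come from the commit message above; the constant and function names are made up for this sketch.

```cpp
#include <cassert>

// Figures taken from the commit message.
constexpr unsigned slots_per_write_thread = 256;  // max outstanding requests per thread
constexpr unsigned batch_before = 64;             // page writes per doublewrite batch, before
constexpr unsigned batch_after = 128;             // page writes per batch, after

// Total write request slots for a given innodb_write_io_threads value.
constexpr unsigned total_slots(unsigned write_io_threads)
{
  return write_io_threads * slots_per_write_thread;
}

static_assert(batch_after == 2 * batch_before, "batch size doubled");
static_assert(total_slots(1) == 256, "old minimum: 256 slots");
static_assert(total_slots(2) == 512, "new minimum: 512 slots");
```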

17d3f856

11 Dec, 2020 2 commits

MDEV-24353: Adding GROUP BY slows down a query · d79c3f32

Varun Gupta authored Dec 09, 2020

A heuristic in best_access_path says that if ref access on an index
uses at least as many key parts as range access on the same index,
then range access should not be considered.
The assumption made by this heuristic does not hold when
the range optimizer opted to use the group-by min-max optimization.
So the fix here would be to not consider the heuristic if
the range optimizer picked the usage of group-by min-max
optimization.
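
In sketch form, the adjusted heuristic might look like the following. `IndexPlan` and `prefer_ref_over_range()` are hypothetical names for illustration; the real logic lives in best_access_path and is more involved.

```cpp
#include <cassert>

// Hypothetical summary of what the optimizer knows about one index.
struct IndexPlan
{
  unsigned ref_key_parts;         // key parts usable by ref access
  unsigned range_key_parts;       // key parts usable by range access
  bool range_uses_group_min_max;  // range optimizer chose group-by min-max
};

// Returns true if range access may be dropped in favour of ref access.
bool prefer_ref_over_range(const IndexPlan &p)
{
  if (p.range_uses_group_min_max)
    return false;  // the heuristic's assumption does not hold here
  return p.ref_key_parts >= p.range_key_parts;
}
```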

d79c3f32

MDEV-24391 heap-use-after-free in fil_space_t::flush_low() · 8677c14e

Marko Mäkelä authored Dec 11, 2020

We observed a race condition that involved two threads
executing fil_flush_file_spaces() and one thread
executing fil_delete_tablespace(). After one of the
fil_flush_file_spaces() observed that
space.needs_flush_not_stopping() is set and was
releasing the fil_system.mutex, the other fil_flush_file_spaces()
would complete the execution of fil_space_t::flush_low() on
the same tablespace. Then, fil_delete_tablespace() would
destroy the object, because the value of fil_space_t::n_pending
did not prevent that. Finally, the fil_flush_file_spaces() would
resume execution and invoke fil_space_t::flush_low() on the freed
object.

This race condition was introduced in
commit 118e258a of MDEV-23855.

fil_space_t::flush(): Add a template parameter that indicates
whether the caller is holding a reference to prevent the
tablespace from being freed.

buf_dblwr_t::flush_buffered_writes_completed(),
row_quiesce_table_start(): Acquire a reference for the duration
of the fil_space_t::flush_low() operation. It should be impossible
for the object to be freed in these code paths, but we want to
satisfy the debug assertions.

fil_space_t::flush_low(): Do not increment or decrement the
reference count, but instead assert that the caller is holding
a reference.

fil_space_extend_must_retry(), fil_flush_file_spaces():
Acquire a reference before releasing fil_system.mutex.
This is what will fix the race condition.
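
The essence of the fix, pinning the object while the mutex is still held, can be illustrated with a simplified model. All names here (`Space`, `system_mutex`, `acquire()`, `release()`, `try_free()`) are hypothetical stand-ins, not the fil_space_t interface.

```cpp
#include <cassert>
#include <mutex>

struct Space
{
  unsigned n_pending = 0;  // reference count guarding against deletion
  bool freed = false;
};

std::mutex system_mutex;  // hypothetical stand-in for fil_system.mutex

// Pin the object while still holding the mutex, so that a concurrent
// deletion cannot free it after the mutex is released.
bool acquire(Space &s)
{
  std::lock_guard<std::mutex> g(system_mutex);
  if (s.freed)
    return false;
  ++s.n_pending;
  return true;
}

void release(Space &s)
{
  std::lock_guard<std::mutex> g(system_mutex);
  assert(s.n_pending > 0);
  --s.n_pending;
}

// Deletion only succeeds when nobody holds a reference.
bool try_free(Space &s)
{
  std::lock_guard<std::mutex> g(system_mutex);
  if (s.n_pending)
    return false;
  s.freed = true;
  return true;
}
```

A flush would then run between `acquire()` and `release()` without holding the mutex, and deletion has to wait until the count drops back to zero.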

8677c14e

09 Dec, 2020 4 commits

Merge 10.5 into 10.6 · be4d2665
Marko Mäkelä authored Dec 09, 2020

be4d2665

Remove unused DBUG_EXECUTE_IF "ignore_punch_hole" · 0c7c4492

Marko Mäkelä authored Dec 09, 2020

Since commit ea21d630 we
conditionally define a variable that only plays a role on
systems that support hole-punching (explicit creation of sparse files).
However, that broke debug builds on such systems.

It turns out that the debug_dbug label "ignore_punch_hole" is
not at all used in MariaDB server. It would be covered by
the MySQL 5.7 test innodb.table_compress. (Note: MariaDB 10.1
implemented page_compressed tables before something comparable
appeared in MySQL 5.7.)

0c7c4492

Merge 10.5 into 10.6 · ca821692
Marko Mäkelä authored Dec 09, 2020

ca821692

MDEV-12227 Defer writes to the InnoDB temporary tablespace · 5eb53955

Marko Mäkelä authored Dec 09, 2020

The flushing of the InnoDB temporary tablespace is unnecessarily
tied to the write-ahead redo logging and redo log checkpoints,
which must be tied to the page writes of persistent tablespaces.

Let us simply omit any pages of temporary tables from buf_pool.flush_list.
In this way, log checkpoints will never incur any 'collateral damage' of
writing out unmodified changes for temporary tables.

After this change, pages of the temporary tablespace can only be written
out by buf_flush_lists(n_pages,0) as part of LRU eviction. Hopefully,
most of the time, that code will never be executed, and instead, the
temporary pages will be evicted by buf_release_freed_page() without
ever being written back to the temporary tablespace file.

This should improve the efficiency of the checkpoint flushing and
the buf_flush_page_cleaner thread.

Reviewed by: Vladislav Vaintroub

5eb53955

08 Dec, 2020 3 commits

Fix -Wunused-but-set-variable · ea21d630
Marko Mäkelä authored Dec 08, 2020

ea21d630

MDEV-24369 Page cleaner sleeps despite innodb_max_dirty_pages_pct_lwm being exceeded · f0c295e2

Marko Mäkelä authored Dec 08, 2020

MDEV-24278 improved the page cleaner so that it will no longer wake up
once per second on an idle server. However, with innodb_adaptive_flushing
(the default) the function page_cleaner_flush_pages_recommendation()
could initially return 0 even if there is work to do.

af_get_pct_for_dirty(): Remove. Based on a comment here, it appears
that an initial intention of innodb_max_dirty_pages_pct_lwm=0.0
(the default value) was to disable something. That ceased to hold in
MDEV-23855: the value is a pure threshold; the page cleaner will not
perform any work unless the threshold is exceeded.

page_cleaner_flush_pages_recommendation(): Add the parameter dirty_blocks
to ensure that buf_pool.flush_list will eventually be emptied.

f0c295e2

MDEV-24351: S3, same-backend replication: Dropping a table on master... · 6859e80d

Sergei Petrunia authored Dec 08, 2020

..causes error on slave.
Cause: if the master doesn't have the frm file for the table,
DROP TABLE code will call ha_delete_table_force() to drop the table
in all available storage engines.
The issue was that this code path didn't check for
HTON_TABLE_MAY_NOT_EXIST_ON_SLAVE flag for the storage engine,
and so did not add "... IF EXISTS" to the statement that's written
to the binary log.  This can cause an error on the slave when it tries to
drop a table that's already gone.
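
The intended behaviour can be sketched as a small helper. The function name and its boolean parameter are hypothetical; in the server, the decision depends on the HTON_TABLE_MAY_NOT_EXIST_ON_SLAVE flag of the storage engine.

```cpp
#include <cassert>
#include <string>

// Build the DROP TABLE text to be written to the binary log.
// may_not_exist_on_slave models the engine flag mentioned above.
std::string binlog_drop_statement(const std::string &table,
                                  bool may_not_exist_on_slave)
{
  std::string stmt = "DROP TABLE ";
  if (may_not_exist_on_slave)
    stmt += "IF EXISTS ";  // tolerate a table that is already gone
  return stmt + table;
}
```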

6859e80d

07 Dec, 2020 1 commit
Simplify clang workarounds. · 3ee24b23
Vladislav Vaintroub authored Dec 07, 2020

3ee24b23