Commits · 8274c207dfd1bdac10e71f17ce3acccf330c8db6 · nexedi / MariaDB

12 Feb, 2022 1 commit
- bump the VERSION · 8274c207
  Daniel Bartholomew authored Feb 12, 2022
  
  8274c207
11 Feb, 2022 1 commit

MDEV-27774 fixup: Replace sspin_lock with srw_lock · bd70ae05

Marko Mäkelä authored Feb 11, 2022

srw_lock in log_sys.append_prepare() turned out to yield best throughput.
We might try a NUMA aware spin lock implementation later.

bd70ae05

10 Feb, 2022 3 commits

MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit() · a635c406

Marko Mäkelä authored Feb 10, 2022

A prominent bottleneck in mtr_t::commit() is log_sys.mutex between
log_sys.append_prepare() and log_close().

User-visible change: The minimum innodb_log_file_size will be
increased from 1MiB to 4MiB so that some conditions can be
trivially satisfied.

log_sys.latch (log_latch): Replaces log_sys.mutex and
log_sys.flush_order_mutex. Copying mtr_t::m_log to
log_sys.buf is protected by a shared log_sys.latch.
Writes from log_sys.buf to the file system will be protected
by an exclusive log_sys.latch.

log_sys.lsn_lock: Protects the allocation of log buffer
in log_sys.append_prepare().

sspin_lock: A simple spin lock, for log_sys.lsn_lock.

Thanks to Vladislav Vaintroub for suggesting this idea, and for
reviewing these changes.

mariadb-backup: Replace some use of log_sys.mutex with recv_sys.mutex.

buf_pool_t::insert_into_flush_list(): Implement sorting of flush_list
because ordering is otherwise no longer guaranteed. Ordering by LSN
is needed for the proper operation of redo log checkpoints.

log_sys.append_prepare(): Advance log_sys.lsn and log_sys.buf_free by
the length, and return the old values. Also increment write_to_buf,
which was previously done in log_close().

mtr_t::finish_write(): Obtain the buffer pointer from
log_sys.append_prepare().

log_sys.buf_free: Make the field Atomic_relaxed,
to simplify log_flush_margin(). Use only loads and stores
to avoid costly read-modify-write atomic operations.

buf_pool.flush_list_requests: Replaces
export_vars.innodb_buffer_pool_write_requests
and srv_stats.buf_pool_write_requests.
Protected by buf_pool.flush_list_mutex.

buf_pool_t::insert_into_flush_list(): Do not invoke page_cleaner_wakeup().
Let the caller do that after a batch of calls.

recv_recover_page(): Invoke a minimal part of
buf_pool.insert_into_flush_list().

ReleaseBlocks::modified: A number of pages added to buf_pool.flush_list.

ReleaseBlocks::operator(): Merge buf_flush_note_modification() here.

log_t::set_capacity(): Renamed from log_set_capacity().

a635c406
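
The locking scheme described in MDEV-27774 can be illustrated with a small sketch. This is not the InnoDB implementation: `LogSketch`, its members, and the use of `std::shared_mutex` and `std::mutex` are assumptions chosen for illustration, and the sketch has no handling for a full buffer. It only shows the idea that space is reserved under a small lock while the copy itself runs under a shared latch, and that draining the buffer takes the latch exclusively.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the redo log; not the InnoDB code.
class LogSketch {
public:
  explicit LogSketch(std::size_t capacity) : buf_(capacity) {}

  // Copy one mini-transaction log into the buffer under a SHARED latch,
  // so that several threads may copy at the same time.
  // Returns the LSN at which the copied record starts.
  std::uint64_t append(const void *data, std::size_t len) {
    std::shared_lock<std::shared_mutex> shared(latch_);
    const std::pair<std::uint64_t, std::size_t> slot = append_prepare(len);
    std::memcpy(buf_.data() + slot.second, data, len);
    return slot.first;
  }

  // Drain the buffer under an EXCLUSIVE latch, so that no copy is in
  // progress. Returns the number of bytes that would be written out.
  std::size_t write_out() {
    std::unique_lock<std::shared_mutex> exclusive(latch_);
    const std::size_t n = buf_free_;
    buf_free_ = 0;
    return n;
  }

  std::uint64_t lsn() {
    std::lock_guard<std::mutex> guard(lsn_lock_);
    return lsn_;
  }

private:
  // Advance lsn_ and buf_free_ by len and return the OLD values.
  // Only this short reservation step is serialized by lsn_lock_.
  std::pair<std::uint64_t, std::size_t> append_prepare(std::size_t len) {
    std::lock_guard<std::mutex> guard(lsn_lock_);
    assert(buf_free_ + len <= buf_.size());  // sketch: no buffer-full handling
    const std::pair<std::uint64_t, std::size_t> old{lsn_, buf_free_};
    lsn_ += len;
    buf_free_ += len;
    return old;
  }

  std::shared_mutex latch_;   // plays the role of log_sys.latch
  std::mutex lsn_lock_;       // plays the role of log_sys.lsn_lock
  std::vector<char> buf_;     // plays the role of log_sys.buf
  std::uint64_t lsn_ = 0;
  std::size_t buf_free_ = 0;
};
```

Because each caller receives a distinct offset from `append_prepare()`, the `memcpy()` calls of concurrent appenders never overlap, which is why a shared latch is enough for the copy.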

MDEV-27787 mariadb-backup --backup is allocating extra memory for log records · 8c7c92ad

Marko Mäkelä authored Feb 10, 2022

In commit 685d958e (MDEV-14425),
the log parsing in mariadb-backup --backup was rewritten.
The parameter STORE_IF_EXISTS that is being passed to recv_sys.parse_mtr()
or recv_sys.parse_pmem() instead of STORE_NO caused unnecessary additional
memory allocation for redo log records.

8c7c92ad

MDEV-27790: Fix mis-matched braces for non-Linux targets · e375f519

Vincent Milum Jr authored Feb 09, 2022

Ran into this while compiling on FreeBSD 13.0-RELEASE.

After this one change, it compiles and runs just fine on my FreeBSD Aarch64 server.

e375f519

09 Feb, 2022 9 commits

Merge 10.7 into 10.8 · c75e3770
Marko Mäkelä authored Feb 09, 2022

c75e3770
Merge 10.6 into 10.7 · 70a88755
Marko Mäkelä authored Feb 09, 2022

70a88755
Merge 10.5 into 10.6 · cce99405
Marko Mäkelä authored Feb 09, 2022

cce99405

MDEV-27716 mtr_t::commit() acquires log_sys.mutex when writing no log · fd101daa

Marko Mäkelä authored Feb 09, 2022

mtr_t::is_block_dirtied(), mtr_t::memo_push(): Never set m_made_dirty
for pages of the temporary tablespace. Ever since
commit 5eb53955
we never add those pages to buf_pool.flush_list.

mtr_t::commit(): Implement part of mtr_t::prepare_write() here,
and avoid acquiring log_sys.mutex if no log is written.
During IMPORT TABLESPACE fixup, we do not write log, but we must
add pages to buf_pool.flush_list and for that, be prepared
to acquire log_sys.flush_order_mutex.

mtr_t::do_write(): Replaces mtr_t::prepare_write().

fd101daa

Merge branch '10.8' into bb-10.8-release · 12cd3dc7
Oleksandr Byelkin authored Feb 09, 2022

12cd3dc7
Merge branch '10.7' into bb-10.7-release · bbd4837f
Oleksandr Byelkin authored Feb 09, 2022

bbd4837f
Merge branch '10.6' into bb-10.6-release · 1bed5640
Oleksandr Byelkin authored Feb 09, 2022

1bed5640
Merge branch '10.5' into bb-10.5-release · 34c50196
Oleksandr Byelkin authored Feb 09, 2022

34c50196

MDEV-27734 Set innodb_change_buffering=none by default · 5c46751f

Marko Mäkelä authored Feb 09, 2022

The aim of the InnoDB change buffer is to avoid delays when a leaf page
of a secondary index is not present in the buffer pool, and a record needs
to be inserted, delete-marked, or purged. Instead of reading the page into
the buffer pool for making such a modification, we may insert a record to
the change buffer (a special index tree in the InnoDB system tablespace).
The buffered changes are guaranteed to be merged if the index page
actually needs to be read later.

The change buffer could be useful when the database is stored on a
rotational medium (hard disk) where random seeks are slower than
sequential reads or writes.

Obviously, the change buffer will cause write amplification, due to
the potentially large amount of metadata that is being written to the
change buffer. We will have to write redo log records for modifying
the change buffer tree as well as the user tablespace. Furthermore,
in the user tablespace, we must maintain a change buffer bitmap page
that uses 2 bits for estimating the amount of free space in pages,
and 1 bit to specify whether buffered changes exist. This bitmap needs
to be updated on every operation, which could reduce performance.

Even if the change buffer were free of bugs such as MDEV-24449
(potentially causing the corruption of any page in the system tablespace)
or MDEV-26977 (corruption of secondary indexes due to a currently
unknown reason), it will make diagnosis of other data corruption harder.

Because of all this, it is best to disable the change buffer by default.

5c46751f
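
To make the bitmap bookkeeping mentioned in MDEV-27734 concrete, here is a minimal sketch of per-page bits: 2 bits for a free-space estimate and 1 bit for "buffered changes exist". The bit positions and all function names are assumptions for illustration and do not claim to match the on-disk InnoDB format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout for illustration; not the on-disk InnoDB format.
constexpr std::uint8_t FREE_MASK = 0x3;     // bits 0-1: free space estimate
constexpr std::uint8_t BUFFERED_BIT = 0x4;  // bit 2: buffered changes exist

// Store a free-space estimate (0..3) without touching the other bits.
inline std::uint8_t set_free(std::uint8_t bits, unsigned level) {
  assert(level <= 3);
  return static_cast<std::uint8_t>((bits & ~FREE_MASK) | level);
}

// Set or clear the "buffered changes exist" flag.
inline std::uint8_t set_buffered(std::uint8_t bits, bool buffered) {
  return static_cast<std::uint8_t>(buffered ? (bits | BUFFERED_BIT)
                                            : (bits & ~BUFFERED_BIT));
}

inline unsigned free_level(std::uint8_t bits) { return bits & FREE_MASK; }

inline bool has_buffered(std::uint8_t bits) {
  return (bits & BUFFERED_BIT) != 0;
}
```

Every modification of a secondary index leaf page may change the free-space estimate, so an update like `set_free()` has to be applied, and logged, alongside the page change itself.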

08 Feb, 2022 10 commits

bump the VERSION · ac077490
Daniel Bartholomew authored Feb 08, 2022

ac077490
bump the VERSION · 9055db2f
Daniel Bartholomew authored Feb 08, 2022

9055db2f
bump the VERSION · fa73117b
Daniel Bartholomew authored Feb 08, 2022

fa73117b
bump the VERSION · f7704d74
Daniel Bartholomew authored Feb 08, 2022

f7704d74

MDEV-26585 Wrong query results when `using index for group-by` · 38058c04

Monty authored Feb 02, 2022

The problem was that "group_min_max optimization" does not work if
some aggregate functions, like COUNT(*), are used.
The function get_best_group_min_max() is using the join->sum_funcs
array to check which aggregate functions are used.
The bug was that aggregates in HAVING were not yet added to
join->sum_funcs at the time get_best_group_min_max() was called.

Fixed by populating join->sum_funcs already in prepare, which means that
all sum functions will be in join->sum_funcs in get_best_group_min_max().
A benefit of this approach is that we can remove several calls to
make_sum_func_list() from the code and simplify the function.

I removed some wrong setting of 'sort_and_group'.
This variable is set when alloc_group_fields() is called, as part
of allocating the cache needed by end_send_group() and does not need
to be set by other functions.

One problematic thing was that Spider is using *join->sum_funcs to detect
at which stage the optimizer is and do internal calculations of aggregate
functions. Updating join->sum_funcs early caused Spider to fail when trying
to find min/max values in opt_sum_query().
Fixed by temporarily resetting sum_funcs during opt_sum_query().

Reviewer: Sergei Petrunia

38058c04

MDEV-27442 Wrong result upon query with DISTINCT and EXISTS subquery · d314bd26

Monty authored Feb 02, 2022

The problem was that get_best_group_min_max() did not check if fields used
by the "group_min_max optimization" were used in sub queries.
Because of this, it did not detect that a key (b,a) was used in the WHERE
clause for the statement:
SELECT DISTINCT b FROM t1 WHERE EXISTS ( SELECT 1 FROM DUAL WHERE a > 1 ).

Fixed by also traversing the sub queries when checking if a field is used.
This disables group_min_max_optimization for the above query.

Reviewer: Sergei Petrunia

d314bd26

MENT-328 Retry BACKUP STAGE BLOCK DDL in case of deadlocks · a1c23807

Monty authored Feb 06, 2022

MENT-328 wrongly assumed that the backup failed because of warnings from
mariabackup about files that were not found. This is normal (and the error
message should be deleted).

randgen failed because mariabackup didn't retry BACKUP STAGE BLOCK DDL
if it failed with a deadlock.

To simplify things, I implemented the retry loop in the server as
this particular deadlock should be quickly resolved.

a1c23807
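
The retry described in MENT-328 follows a common pattern, sketched below. The names `LockResult` and `retry_on_deadlock`, and the bound on the number of attempts, are hypothetical; the commit message does not say how the server-side loop is bounded or whether it waits between attempts.

```cpp
#include <cassert>
#include <functional>

enum class LockResult { Ok, Deadlock, Error };

// Hypothetical helper: call `attempt` again while it reports a deadlock,
// for at most max_attempts calls in total. Any other result (success or
// a different error) is returned to the caller immediately.
inline LockResult retry_on_deadlock(const std::function<LockResult()> &attempt,
                                    unsigned max_attempts) {
  assert(max_attempts > 0);
  LockResult result = LockResult::Deadlock;
  for (unsigned i = 0; i < max_attempts && result == LockResult::Deadlock; ++i)
    result = attempt();
  return result;
}
```

Retrying only on the deadlock result keeps genuine errors visible to the caller instead of hiding them behind repeated attempts.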

Don't run innodb_defragment under valgrind (too slow) · 0ec27d7b
Monty authored Feb 02, 2022

0ec27d7b
Fixes some compiler issues on AIX ( · 88fb89ac
Monty authored Feb 02, 2022

88fb89ac

Fixed my_addr_resolve (cherry picked from 10.6) · df02de68

Monty authored Aug 17, 2020

When a server is compiled with -fPIE, my_addr_resolve needs to
subtract the info.dli_fbase from symbol addresses in memory for
addr2line to recognize them.  When a server is compiled without -fPIE,
my_addr_resolve should not do it.  Unfortunately not all compilers
define __PIE__ when -fPIE was used (e.g. older gcc doesn't), so we
have to resort to run-time detection.

df02de68
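
The address adjustment described for my_addr_resolve can be sketched as a small pure function. The name `symbolizer_address` and the `is_pie` parameter are assumptions for illustration; how the run-time detection of a position-independent build is actually done is not stated in the commit message and is left out here.

```cpp
#include <cassert>
#include <cstdint>

// Compute the address to pass to a tool such as addr2line.
// For a position-independent executable, subtract the load base (the
// value that dladdr() reports as dli_fbase). Otherwise the run-time
// address already matches the address recorded in the binary.
inline std::uintptr_t symbolizer_address(std::uintptr_t runtime_addr,
                                         std::uintptr_t load_base,
                                         bool is_pie) {
  if (!is_pie)
    return runtime_addr;
  assert(runtime_addr >= load_base);
  return runtime_addr - load_base;
}
```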

07 Feb, 2022 2 commits

MDEV-27754 : Assertion with innodb_flush_method=O_DSYNC · 881918bf

Vladislav Vaintroub authored Feb 07, 2022

If innodb_flush_method=O_DSYNC, log_sys.flushed_to_disk_lsn is changed
without 'flush_lock' protection inside log_write().

This leads to a race condition, if there are 2 threads running in parallel,
doing log_write_up_to() with different values for 'flush_to_disk'.

In this case, log_write() and log_write_flush_to_disk_low() can execute at
the same time, and both would change flushed_lsn.

The fix is to remove special treatment of durable writes from log_write().
There is no apparent reason for this special treatment, because
log_write_flush_to_disk_low() is already optimized for durable writes.

Nor is there an apparent reason to call log_flush_notify() more often
for O_DSYNC.

881918bf

Fix JSON statistics time format and added tests for it and server version. · 307b2991
Oleksandr Byelkin authored Feb 07, 2022

307b2991

06 Feb, 2022 4 commits
- update test result · a319220e
  Sergei Golubchik authored Feb 06, 2022
  
  a319220e
- Merge branch '10.7' into 10.8 · 34564587
  Sergei Golubchik authored Feb 06, 2022
  
  34564587
- wrong merge · cb1316b8
  Sergei Golubchik authored Feb 06, 2022
  
  cb1316b8
- Merge branch '10.6' into 10.7 · 2150ad3f
  Sergei Golubchik authored Feb 06, 2022
  
  2150ad3f
05 Feb, 2022 4 commits
- update columnstore · 4ffffd98
  Sergei Golubchik authored Feb 05, 2022
  
  4ffffd98
- Remove incorrect narrowing size_t->ulong casts. Fix printf format error. · 5ded88eb
  Vladislav Vaintroub authored Feb 04, 2022
  
  5ded88eb
- enable main.mysqldump-system test · 4c2c1e61
  Sergei Golubchik authored Feb 04, 2022
  
  4c2c1e61
- make zstd in C/C optional and disable it for now in RPM/DEB · 6009f9b8
  Sergei Golubchik authored Feb 04, 2022
  
  6009f9b8
04 Feb, 2022 6 commits
- .gitignore · e70bd5f6
  Sergei Golubchik authored Feb 04, 2022
  
  e70bd5f6
- Merge branch '10.7' into 10.8 · 2f29d0ea
  Oleksandr Byelkin authored Feb 04, 2022
  
  2f29d0ea
- Merge branch '10.6' into 10.7 · 47f42ce1
  Oleksandr Byelkin authored Feb 04, 2022
  
  47f42ce1
- Revert "don't build with OpenSSL 3.0, it doesn't work before MDEV-25785" · 64e35882
  Oleksandr Byelkin authored Feb 04, 2022
  This reverts commit c9beef43, because
  we have OpenSSL 3.0 support here.
  64e35882
- Merge branch '10.7' into 10.8 · 4fb2cb1a
  Oleksandr Byelkin authored Feb 04, 2022
  
  4fb2cb1a
- Fix for compiling under clang. · a806c993
  Oleksandr Byelkin authored Feb 04, 2022
  
  a806c993