Commits · 8684af76e34641331695860fc13eb9fc2ff94215 · nexedi / MariaDB

24 Mar, 2022 1 commit

MDEV-28137 Some memory transactions are unnecessarily complex · 8684af76

Marko Mäkelä authored Mar 24, 2022

buf_page_get_zip(): Do not perform a system call inside a
memory transaction. Instead, if the page latch is unavailable,
abort the memory transaction and let the fall-back code path
wait for the page latch.

buf_pool_t::watch_remove(): Return the previous state of the block.

buf_page_init_for_read(): Use regular stores for moving the
buffer fix count of watch_remove() to the new block descriptor.

A more extensive version of this was reviewed by Daniel Black
and tested with Intel TSX-NI by Axel Schwenke and Matthias Leich.
My assumption that regular loads and stores would execute faster
in a memory transaction than operations like std::atomic::fetch_add()
turned out to be incorrect.
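Here is a minimal sketch of the control flow described above. The real buf_page_get_zip() runs inside a hardware memory transaction (Intel TSX-NI), which cannot be shown portably, so in this sketch a non-blocking try_lock_shared() stands in for the transactional fast path. The types and functions page_sketch, read_fast() and read_page() are hypothetical. The point is that the fast path never waits for a busy latch. It gives up, and the fall-back path does the waiting.

```cpp
#include <cassert>
#include <shared_mutex>

// Hypothetical page descriptor: only the latch and a payload matter here.
struct page_sketch {
  std::shared_mutex latch;
  int data = 0;
};

// Stand-in for the transactional fast path. It must never wait:
// if the latch is busy, it gives up (like aborting the transaction).
static bool read_fast(page_sketch &p, int &out) {
  if (!p.latch.try_lock_shared())
    return false;
  out = p.data;
  p.latch.unlock_shared();
  return true;
}

// Fall-back path: allowed to block until the latch becomes available.
static int read_page(page_sketch &p) {
  int value;
  if (read_fast(p, value))
    return value;
  std::shared_lock<std::shared_mutex> guard(p.latch);  // may wait here
  return p.data;
}
```

Keeping every potentially blocking step in the fall-back path matters because a system call inside a real memory transaction would typically abort it anyway.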

8684af76

22 Mar, 2022 2 commits

MDEV-27760 event may non stop replicate in circular semisync setup · 5ccd845d

Andrei authored Feb 10, 2022

MDEV-21117 had to relax the acceptance condition for a server's own events
for the case when a former semisync master recovers after a crash as a
semisync slave. That, however, made endless event "orbiting" possible in
the non-strict slave GTID mode of a circular semisync setup.

Termination of events with the same server-id is now restored for the
non-strict GTID mode, following the regular rules (that is, such an event
is ignored unless @@global.replicate_same_server_id allows it).

To address the MDEV-21117 recovery agenda, a transaction with the same
server-id is accepted in the strict GTID mode when its GTID is ordered
strictly greater than the current slave GTID state.

The strict GTID mode can safely accept such transactions even if the
user did not set the slave state correctly, e.g. at the former master.
An added test shows that a typical out-of-order error is raised at
execution, so it is guaranteed that no data corruption occurs in such a case.
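The acceptance rule for a server's own events can be sketched as a single predicate. This is a simplified, hypothetical model and not the actual replication code. gtid_sketch, accept_event() and its parameters are invented names, and last_applied_seq_no stands for the slave's current GTID state in the event's domain.

```cpp
#include <cassert>
#include <cstdint>

// Simplified GTID: domain-server-sequence.
struct gtid_sketch {
  uint32_t domain_id;
  uint32_t server_id;
  uint64_t seq_no;
};

// Hypothetical decision function for an incoming event.
bool accept_event(const gtid_sketch &ev, uint32_t my_server_id,
                  bool gtid_strict_mode, bool replicate_same_server_id,
                  uint64_t last_applied_seq_no) {
  if (ev.server_id != my_server_id)
    return true;                  // foreign events: usual rules apply
  if (replicate_same_server_id)
    return true;                  // explicitly allowed by the user
  // Own event: accept only in strict mode, and only if it is strictly
  // newer than what the slave has already applied in that domain.
  return gtid_strict_mode && ev.seq_no > last_applied_seq_no;
}
```

In the non-strict mode an own event is dropped, which is what stops the endless orbiting.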

5ccd845d

MDEV-27524 addendum: fix for bug introduced by automatic migration · 35725df6
Julius Goryavsky authored Mar 22, 2022

35725df6

18 Mar, 2022 2 commits

MDEV-27909 InnoDB: Failing assertion: state == TRX_STATE_NOT_STARTED ... on DDL · 8840583a

Marko Mäkelä authored Mar 18, 2022

The fix in commit 6e390a62 (MDEV-26772)
was a step in the right direction, but it was implemented incorrectly.
When an InnoDB persistent statistics table cannot be locked immediately,
we must not let row_mysql_handle_errors() roll back the transaction.

lock_table_for_trx(): Add the parameter no_wait (default false)
for an immediate return of DB_LOCK_WAIT in case of a conflict.

ha_innobase::delete_table(), ha_innobase::rename_table():
Pass no_wait=true to lock_table_for_trx() when needed,
instead of temporarily setting THDVAR(thd, lock_wait_timeout) to 0.
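Here is a hypothetical sketch of the no_wait idea. The table lock is modelled as a std::mutex rather than an InnoDB table lock owned by a transaction. lock_table_sketch() and dberr_sketch are invented names. The sketch only mirrors the behaviour described above: with no_wait set, a conflict is reported immediately instead of waiting.

```cpp
#include <cassert>
#include <mutex>

enum class dberr_sketch { SUCCESS, LOCK_WAIT };

// Hypothetical table lock acquisition. With no_wait=true a conflict is
// returned to the caller at once, so the caller (not an error handler
// that might roll back the transaction) decides what to do next.
dberr_sketch lock_table_sketch(std::mutex &table_lock, bool no_wait = false) {
  if (no_wait)
    return table_lock.try_lock() ? dberr_sketch::SUCCESS
                                 : dberr_sketch::LOCK_WAIT;
  table_lock.lock();  // may block until the conflicting holder releases it
  return dberr_sketch::SUCCESS;
}
```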

8840583a

Merge branch 10.5 into 10.6 · 065f995e
Daniel Black authored Mar 18, 2022

065f995e

17 Mar, 2022 2 commits

MDEV-17841 fixup: GCC -Wmaybe-uninitialized · 06e3bc43

Marko Mäkelä authored Mar 17, 2022

In commit ab38b751
an added "goto err" would seemingly cause a read of
an uninitialized variable old_info if errpos>=5.

However, because we would have errpos=0 at that point,
there was no real error.
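The errpos idiom that misled the compiler can be illustrated with a hypothetical function. open_sketch() is invented, and its resource handling is simplified. errpos records how far initialization has progressed, and the cleanup code reads old_info only when errpos >= 5. An early goto err with errpos == 0 therefore never reads the variable, even though GCC may be unable to prove that.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical acquisition sequence using the errpos cleanup idiom.
int open_sketch(bool fail_early, bool fail_late, int &freed) {
  int errpos = 0;
  char *old_info;           // assigned only once errpos >= 5
  char *buf = nullptr;
  freed = 0;
  if (fail_early)
    goto err;               // errpos == 0 here, so old_info is never read
  buf = static_cast<char *>(std::malloc(16));
  errpos = 5;
  old_info = buf;
  if (fail_late)
    goto err;
  std::free(buf);
  return 0;
err:
  if (errpos >= 5) {        // GCC may still warn that old_info is
    std::free(old_info);    // "maybe uninitialized" on this path
    freed = 1;
  }
  return -1;
}
```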

06e3bc43

Merge 10.4 to 10.5 · b73d8527
Daniel Black authored Mar 16, 2022

b73d8527

16 Mar, 2022 5 commits

MDEV-26551 InnoDB crash on multiple concurrent SHOW TABLE STATUS · ee80c196

Marko Mäkelä authored Mar 16, 2022

dict_get_and_save_data_dir_path(): Protect the operation with
dict_table_t::lock_mutex and avoid unnecessary memory allocation.
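Here is a minimal sketch of the pattern, based on a simplified table descriptor. Concurrent callers are serialized by a per-table mutex, and the path is built only if it has not been saved yet. table_sketch, lookup_path() and get_and_save_data_dir_path() are hypothetical stand-ins, not the InnoDB data structures.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical, simplified table descriptor.
struct table_sketch {
  std::mutex lock_mutex;
  std::string data_dir_path;   // empty until first computed
  int computed = 0;            // how often the path was actually built
};

// Stand-in for reading the path from the data dictionary.
static std::string lookup_path(const std::string &name) {
  return "/data/" + name;
}

// Many threads (e.g. concurrent SHOW TABLE STATUS) may call this at once.
// The mutex serializes them, and the path is built only if it is missing.
void get_and_save_data_dir_path(table_sketch &t, const std::string &name) {
  std::lock_guard<std::mutex> guard(t.lock_mutex);
  if (!t.data_dir_path.empty())
    return;                    // already saved: no new allocation
  t.data_dir_path = lookup_path(name);
  ++t.computed;
}
```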

ee80c196

MDEV-28079 Shutdown hangs after altering innodb partition fts table · 31ad9277

Thirunarayanan Balathandayuthapani authored Mar 16, 2022

- InnoDB purge waits at resume_FTS() while shutting down.
This happens after altering a partitioned InnoDB table that has an FTS index.
stop_FTS() was called for each partition, but resume_FTS() was called
only once, which led to a hang during shutdown.
This issue was introduced by
commit 1bd681c8 (MDEV-25506).
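The imbalance can be modelled with a counter of pending stop requests. This is a hypothetical simplification: purge_sketch and alter_partitions() are invented, and the real InnoDB mechanism may differ. In this model, purge can proceed only when every stop_FTS() has been matched by a resume_FTS().

```cpp
#include <cassert>

// Hypothetical model: purge may proceed only when no FTS stop is pending.
struct purge_sketch {
  int fts_stop_count = 0;
  void stop_FTS() { ++fts_stop_count; }
  void resume_FTS() { assert(fts_stop_count > 0); --fts_stop_count; }
  bool can_proceed() const { return fts_stop_count == 0; }
};

// Each per-partition stop_FTS() needs its own resume_FTS().
// buggy=true reproduces the single resume described above.
void alter_partitions(purge_sketch &p, int n_partitions, bool buggy) {
  for (int i = 0; i < n_partitions; i++)
    p.stop_FTS();
  int resumes = buggy ? 1 : n_partitions;
  for (int i = 0; i < resumes; i++)
    p.resume_FTS();
}
```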

31ad9277

Merge 10.3 to 10.4 · 069139a5

Daniel Black authored Mar 16, 2022

extra2_read_len was resolved by keeping the implementation
in sql/table.cc and exposing it for use by ha_partition.cc.

Remove identical implementation in unireg.h
(ref: bfed2c7d)

069139a5

Merge 10.2 to 10.3 · 6a2d88c1
Daniel Black authored Mar 16, 2022

6a2d88c1
Merge branch 10.2 into 10.3 · 0e63023c
Alexander Barkov authored Mar 15, 2022

0e63023c

15 Mar, 2022 10 commits

MDEV-27955 main.func_json_notembedded test fails on out-of-memory · b2c81e06

Daniel Black authored Feb 28, 2022

The test used 500M+ of memory by repeating an 8-byte sequence 62.5M times.

Reduce the number of repeats by a factor of 100.

Tested by applying the change against code with the MDEV-24909 fix
reverted. A 1000-fold reduction was too much, but a 100-fold reduction
still triggered the bug.
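The memory figures above follow from simple arithmetic, using decimal megabytes:

```latex
8\,\mathrm{B} \times 62.5\times10^{6} = 5\times10^{8}\,\mathrm{B} \approx 500\,\mathrm{MB}
\qquad
\frac{62.5\times10^{6}}{100} = 625\,000 \text{ repeats}
\;\Rightarrow\;
8\,\mathrm{B} \times 625\,000 = 5\times10^{6}\,\mathrm{B} \approx 5\,\mathrm{MB}
```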

b2c81e06

MDEV-23915 ER_KILL_DENIED_ERROR not passed a thread id (part 2) · 57dbe878

Daniel Black authored Mar 15, 2022

Per Marko's comment in JIRA, sql_kill is passing the thread id
as long long. We change the format of the error messages to match,
and cast the thread id to long long in sql_kill_user.
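Here is a sketch of keeping the format and the argument type in agreement. With %lld in the message, the thread id is cast to long long before the variadic call. format_kill_denied() is a hypothetical helper, and the message text is illustrative rather than copied from MariaDB's error message file.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper formatting a kill-denied message.
std::string format_kill_denied(unsigned long thread_id) {
  char buf[64];
  // %lld expects a long long, so the argument is cast to exactly that type;
  // passing an unsigned long here would be a format/argument mismatch.
  std::snprintf(buf, sizeof buf, "You are not owner of thread %lld",
                static_cast<long long>(thread_id));
  return buf;
}
```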

57dbe878

MDEV-23915 ER_KILL_DENIED_ERROR not passed a thread id · 99837c61

Daniel Black authored Feb 23, 2022

The 10.5 test failure in main.grant_kill showed an incorrect
thread id on a big-endian architecture.

The cause was that the sql_kill_user function assumed the
error was ER_OUT_OF_RESOURCES, when the actual error was
ER_KILL_DENIED_ERROR. The ER_KILL_DENIED_ERROR message
requires a thread id to be passed as unsigned long; however, a
user/host was passed.

ER_OUT_OF_RESOURCES doesn't even take a user/host, despite
the optimistic comment. We remove this being passed as an
argument to the function so that when MDEV-21978 is implemented
one less compiler format warning is generated (which would
have caught this error sooner).
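A compile-time check of the kind that would have caught this earlier can be sketched with the GCC/clang format attribute. This attribute makes -Wformat verify variadic arguments against the format string. report_error() is a hypothetical helper, and this sketch does not assume that MDEV-21978 relies on this exact mechanism.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical error reporter. Argument 1 is a printf-style format and
// checking of the variadic arguments starts at argument 2.
std::string report_error(const char *fmt, ...)
#if defined(__GNUC__)
    __attribute__((format(printf, 1, 2)))
#endif
    ;

std::string report_error(const char *fmt, ...) {
  char buf[128];
  va_list ap;
  va_start(ap, fmt);
  std::vsnprintf(buf, sizeof buf, fmt, ap);
  va_end(ap);
  return buf;
}
```

With this declaration, a call that passes a user/host string where the format says %lu is diagnosed at compile time instead of printing a garbage thread id.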

Thanks Otto for reporting and Marko for analysis.

99837c61

Merge 10.5 into 10.6 · 4ef44cc2
Marko Mäkelä authored Mar 15, 2022

4ef44cc2

MDEV-27985 buf_flush_freed_pages() causes InnoDB to hang · 73fee39e

Marko Mäkelä authored Mar 15, 2022

buf_flush_freed_pages(): Assert that neither buf_pool.mutex
nor buf_pool.flush_list_mutex is held. Simplify the loops.
Return the tablespace and the number of pages written or punched.

buf_flush_LRU_list_batch(), buf_do_flush_list_batch():
Release buf_pool.mutex before invoking buf_flush_space().

buf_flush_list_space(): Acquire the mutexes only after invoking
buf_flush_freed_pages().
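The locking change can be sketched as follows: collect the work under the mutex, then perform the I/O with the mutex released. pool_sketch, write_pages() and flush_freed_pages() are hypothetical. Standard C++ mutexes cannot report whether the caller holds them, so the "mutex not held" precondition appears only as a comment here.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical pool: freed page numbers are queued under a mutex.
struct pool_sketch {
  std::mutex mutex;
  std::vector<unsigned> freed_pages;
};

// Stand-in for the slow write or hole-punching I/O.
static unsigned write_pages(const std::vector<unsigned> &pages) {
  return static_cast<unsigned>(pages.size());
}

// Precondition: the caller does not hold pool.mutex.
// The batch is taken under the mutex, and the I/O runs without it,
// so threads waiting for the mutex are not blocked behind the I/O.
unsigned flush_freed_pages(pool_sketch &pool) {
  std::vector<unsigned> batch;
  {
    std::lock_guard<std::mutex> guard(pool.mutex);
    batch.swap(pool.freed_pages);
  }
  return write_pages(batch);  // no mutex held here
}
```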

Reviewed by: Thirunarayanan Balathandayuthapani

73fee39e

MDEV-25214 Crash in fil_space_t::try_to_close · 00896db1

Marko Mäkelä authored Mar 15, 2022

fil_space_t::try_to_close(): Tolerate a tablespace that has no
data files attached. The function fil_ibd_create() initially
creates and attaches a tablespace with no files, and invokes
fil_space_t::add() later.

fil_node_open_file(): After releasing and reacquiring fil_system.mutex,
check if the file was already opened by another thread. This avoids
an assertion failure !node->is_open() in fil_node_open_file_low().
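The re-check after reacquiring the mutex can be sketched as follows. file_node_sketch, open_low() and node_open_file() are hypothetical. The assertion in open_low() mirrors the !node->is_open() check mentioned above.

```cpp
#include <cassert>
#include <mutex>

struct file_node_sketch {
  bool is_open = false;
  int open_calls = 0;
};

// Stand-in for the low-level open; it must not be called twice.
static void open_low(file_node_sketch &node) {
  assert(!node.is_open);   // the assertion that used to fail
  node.is_open = true;
  ++node.open_calls;
}

// Called with the system mutex held. The mutex is released around some
// slow work; after reacquiring it, another thread may already have
// opened the file, so the state is checked again.
void node_open_file(std::unique_lock<std::mutex> &sys_lock,
                    file_node_sketch &node) {
  sys_lock.unlock();
  // ... slow work without the mutex ...
  sys_lock.lock();
  if (node.is_open)
    return;                // opened by someone else in the meantime
  open_low(node);
}
```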

These failures were reproduced with the test
innodb.table_definition_cache_debug and the fix of MDEV-27985.

00896db1

Merge 10.4 into 10.5 · e1246775
Marko Mäkelä authored Mar 15, 2022

e1246775
Merge 10.3 into 10.4 · 9c6135e8
Marko Mäkelä authored Mar 15, 2022

9c6135e8

Merge 10.2 (part) into 10.3 · a9500860

Daniel Black authored Mar 15, 2022

commit '6de482a6'

10.3 no longer errors in truncate_notembedded.test
but per comments, a non-crash is all that we are after.

a9500860

MDEV-27342: Fix issue of recovery failure using new server id · dafc5fb9

Hugo Wen authored Feb 04, 2022

Commit 6c39eaeb made the crash recovery dependent on server_id.
The crash recovery could fail when a new instance was restored from the
original crashed data directory using a new server id.

The issue does not exist in major versions before 10.6.

The root cause is that when the input XID to be searched in the hash is
generated, its server id is populated with the current server id.
So, if the server id changed before recovery, the XID could not be found
in the hash because the server id did not match.

This fix is to use original server id when creating the input XID
object in function `xarecover_do_commit_or_rollback`.
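Here is a simplified sketch of why the lookup key must carry the original server id. The XID is reduced to a hypothetical (server id, transaction number) pair, and recover_one() is an invented name. The sketch assumes that the original server id is available from the logged event, as the fix above implies.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical XID key: (server_id, transaction number).
using xid_key = std::pair<uint32_t, uint64_t>;

// prepared: transactions found in the crashed data directory.
// The key must be built with the server id recorded when the transaction
// was logged, not with the server id of the restarted instance.
bool recover_one(const std::set<xid_key> &prepared, uint32_t logged_server_id,
                 uint64_t trx_no) {
  return prepared.count(xid_key{logged_server_id, trx_no}) != 0;
}
```

In the test below, looking up with the new server id (2) misses the transaction, which is the failure described above.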

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.

dafc5fb9

14 Mar, 2022 9 commits

MDEV-28060 Online DDL fails while checking for instant alter condition · 1c43660a

Thirunarayanan Balathandayuthapani authored Mar 14, 2022

- InnoDB fails to skip a newly created column while checking for a
changed column when the table is in the REDUNDANT row format. This issue
was caused by MDEV-18035 (ccb1acbd).

1c43660a

MDEV-23210 Assertion `(length % 4) == 0' failed in my_lengthsp_utf32 on ALTER... · 03c3dc63

Alexander Barkov authored Mar 12, 2022

MDEV-23210 Assertion `(length % 4) == 0' failed in my_lengthsp_utf32 on ALTER TABLE, SELECT and INSERT

Problem:
Parse-time conversion from binary to tricky character sets like utf32
produced ill-formed strings. As a result, a crash later happened in debug
builds, or a wrong SHOW CREATE TABLE was returned in release builds.

Fix:

1. Backporting a few methods from 10.3:
  - THD::check_string_for_wellformedness()
  - THD::convert_string() overloads
  - THD::make_text_string_connection()

2. Adding a new method THD::reinterpret_string_from_binary(),
   which either returns a well-formed string
   (optionally prepended with zero bytes) or returns an error.
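Here is a hypothetical sketch of reinterpreting binary data as utf32, assuming big-endian 4-byte code units. The input is left-padded with zero bytes to a multiple of 4, and any code point beyond U+10FFFF or in the surrogate range is rejected. binary_to_utf32() is an invented name, not THD::reinterpret_string_from_binary().

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Returns a well-formed UTF-32BE string, or std::nullopt on error.
std::optional<std::string> binary_to_utf32(const std::string &bin) {
  std::string s(bin);
  if (size_t rem = s.size() % 4)
    s = std::string(4 - rem, '\0') + s;      // prepend zero bytes
  for (size_t i = 0; i < s.size(); i += 4) {
    uint32_t cp = 0;
    for (size_t j = 0; j < 4; j++)           // big-endian decode
      cp = (cp << 8) | static_cast<unsigned char>(s[i + j]);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return std::nullopt;                   // ill-formed: report an error
  }
  return s;
}
```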

03c3dc63

Merge 10.5 into 10.6 · 572e3430
Marko Mäkelä authored Mar 14, 2022

572e3430
MDEV-28050: clang -Wtypedef-redefinition when PLUGIN_S3=NO · 258c34f1
Marko Mäkelä authored Mar 14, 2022
```
Let us remove the redundant typedef.
This problem was revealed by
commit 77c184df
```
258c34f1

MDEV-24841: More workarounds · c2146ce7

Marko Mäkelä authored Mar 14, 2022

For some reason, the tests of the MemorySanitizer build on 10.5 failed
with both clang 13 and clang 14 with SIGSEGV. On 10.6 where it worked
better, some more places to work around were identified.

c2146ce7

mtr: fix --source lines detection · f217c761
Sergei Golubchik authored Feb 21, 2022
```
mysqltest allows leading spaces before `--`, so mtr should too
```
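Here is an illustration in C++. mtr itself is written in Perl, and this is not its code. A pattern that accepts leading whitespace before a --source directive might look like the following, where is_source_line() is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Recognize "--source <file>" even when the line starts with whitespace.
bool is_source_line(const std::string &line, std::string &file) {
  static const std::regex re(R"(^\s*--source\s+(\S+))");
  std::smatch m;
  if (!std::regex_search(line, m, re))
    return false;
  file = m[1].str();
  return true;
}
```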
f217c761

MDEV-27753 Incorrect ENGINE type of table after crash for CONNECT table · bfed2c7d

Sergei Golubchik authored Mar 11, 2022

whenever possible, partitioning should use the full
partition plugin name, not the one byte legacy code.

Normally, ha_partition can get the engine plugin from
table_share->default_part_plugin.

But in some cases, e.g. in DROP TABLE, the table isn't
opened, table_share is NULL, and ha_partition has to parse
the frm, much like dd_frm_type() does.

temporary_tables.cc, sql_table.cc:

When dropping a table, it must be deleted in the engine
first and only then the frm file, because the frm can be the only true
source of metadata that the engine might need for DROP.

table.cc:

when opening a partitioned table, if the engine for
partitions is not found, do not fallback to MyISAM.
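The DROP ordering can be sketched with std::filesystem. The engine callback runs while the .frm file still exists, and the file is removed only if the engine succeeded. drop_table_sketch() and the engine_drop callback are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>

namespace fs = std::filesystem;

// The engine may still need the .frm, so it is dropped first;
// the .frm is removed only after the engine reported success.
bool drop_table_sketch(
    const fs::path &frm,
    const std::function<bool(const fs::path &)> &engine_drop) {
  if (!engine_drop(frm))
    return false;            // keep the .frm so the DROP can be retried
  return fs::remove(frm);
}
```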

bfed2c7d

MDEV-24841 Build error with MSAN use-of-uninitialized-value in comp_err · 59359fb4

Marko Mäkelä authored Mar 14, 2022

The MemorySanitizer implementation in clang includes some built-in
instrumentation (interceptors) for GNU libc. In GNU libc 2.33, the
interface to the stat() family of functions was changed. Until the
MemorySanitizer interceptors are adjusted, any MSAN builds
will act as if the stat() family of functions failed to initialize
the struct stat.

A fix was applied in
https://reviews.llvm.org/rG4e1a6c07052b466a2a1cd0c3ff150e4e89a6d87a
but it fails to cover the 64-bit variants of the calls.

For now, let us work around the MemorySanitizer bug by defining
and using the macro MSAN_STAT_WORKAROUND().
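Here is a sketch of what such a workaround macro could look like. It relies on clang's MemorySanitizer interface function __msan_unpoison(). MariaDB's actual definition of MSAN_STAT_WORKAROUND() may differ. Outside MSAN builds the macro expands to a no-op, so the code compiles everywhere.

```cpp
#include <cassert>
#include <sys/stat.h>

// Under MemorySanitizer, mark the struct stat as initialized after a
// successful call; otherwise the macro does nothing.
#if defined(__has_feature)
# if __has_feature(memory_sanitizer)
#  include <sanitizer/msan_interface.h>
#  define MSAN_STAT_WORKAROUND(st) __msan_unpoison(st, sizeof *(st))
# endif
#endif
#ifndef MSAN_STAT_WORKAROUND
# define MSAN_STAT_WORKAROUND(st) ((void) 0)
#endif

// Returns the file size, or -1 if stat() failed.
long long file_size(const char *path) {
  struct stat st;
  if (stat(path, &st))
    return -1;
  MSAN_STAT_WORKAROUND(&st);
  return static_cast<long long>(st.st_size);
}
```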

59359fb4

MDEV-28049 Error on compiling trx0purge.cc · 3b499679

Marko Mäkelä authored Mar 14, 2022

In commit 83212632
the trx_rseg_latch was instrumented for performance_schema,
but some acquisitions of rd_lock() were not adjusted.
Thus, the build would fail on platforms where a futex-based
rw-lock is not available (SUX_LOCK_GENERIC) unless the code
was built with cmake -DPLUGIN_PERFSCHEMA=NO.

3b499679

13 Mar, 2022 1 commit
- MDEV-28036 gcol.gcol_supported_sql_funcs_xxx fail in FIPS mode · ed6e271f
  Elena Stepanova authored Mar 14, 2022
  
  ed6e271f
12 Mar, 2022 4 commits
- MDEV-18304 sql_safe_updates does not work with OR clauses · 6789f2cf
  Sergei Golubchik authored Oct 12, 2021
```
not every index-using plan sets bits in table->quick_keys.
QUICK_ROR_INTERSECT_SELECT, for example, doesn't.

Use the fact that select->quick is set instead.

Also allow EXPLAIN to work.
```
  6789f2cf
- MDEV-27753 Incorrect ENGINE type of table after crash for CONNECT table · e0dc22b2
  Sergei Golubchik authored Mar 11, 2022
```
fix two null pointer dereferences
```
  e0dc22b2
- MDEV-27900: aio handle partial reads/writes (uring) · f4fb6cb3
  Daniel Black authored Mar 10, 2022
```
MDEV-27900 continued for uring.

Also spell synchronously correctly in sql_parse.cc.

Reviewed by Wlad.
```
  f4fb6cb3
- Merge branch 10.5 into 10.6 · bd1ba780
  Daniel Black authored Mar 12, 2022
  
  bd1ba780
11 Mar, 2022 4 commits

MDEV-27900: aio handle partial reads/writes · d7817382

Daniel Black authored Mar 10, 2022

As btrfs showed, a partial read of data in AIO/O_DIRECT circumstances can
really confuse MariaDB.

Filipe Manana (SuSE)[1] showed how database programmers can assume
that O_DIRECT I/O is all or nothing.

While a fix was done on the kernel side, we can do better in our code by
requesting that the rest of the block be read/written synchronously if
we only get a partial read/write.

Per the APIs, a partial read/write can occur before an error, so
reattempting the request will leave the caller with a concrete error to
handle.
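Here is a minimal POSIX sketch of finishing a partial transfer synchronously. After an asynchronous read reports that only `done` bytes were transferred, the remainder is read with pread() until the request completes or a concrete error is returned. complete_partial_read() is hypothetical; MariaDB's actual handling lives in its asynchronous I/O layer. Treating end-of-file as EIO is a choice made for this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <sys/types.h>
#include <unistd.h>

// Returns 0 on success, or an errno value the caller can handle.
int complete_partial_read(int fd, char *buf, size_t len, off_t offset,
                          size_t done) {
  while (done < len) {
    ssize_t n = pread(fd, buf + done, len - done, offset + off_t(done));
    if (n < 0) {
      if (errno == EINTR)
        continue;           // interrupted: retry the same range
      return errno;         // a concrete error for the caller
    }
    if (n == 0)
      return EIO;           // unexpected end of file
    done += size_t(n);
  }
  return 0;
}
```

Writes can be completed the same way with pwrite().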

[1] https://lore.kernel.org/linux-btrfs/CABVffENfbsC6HjGbskRZGR2NvxbnQi17gAuW65eOM+QRzsr8Bg@mail.gmail.com/T/#mb2738e675e48e0e0778a2e8d1537dec5ec0d3d3a

Also spell synchronously correctly in other files.

d7817382

Avoid shutdown timeout in innodb.undo_truncate · dc680d21

Marko Mäkelä authored Mar 11, 2022

Let us explicitly wait for purge before invoking a slow shutdown,
so that instrumented builds (such as ASAN or UBSAN) will not
exceed the 60-second timeout during shutdown.

dc680d21

Merge 10.5 into 10.6 · 77c7390f
Marko Mäkelä authored Mar 11, 2022

77c7390f
Merge 10.4 into 10.5 · 4cfb6edd
Marko Mäkelä authored Mar 11, 2022

4cfb6edd