Commit 5b9ee8d8 authored by Marko Mäkelä

MDEV-24449 Corruption of system tablespace or last recovered page

This corresponds to 10.5 commit 39378e13.

With a patched version of the test innodb.ibuf_not_empty (so that
it would trigger crash recovery after using the change buffer),
and with the code patched to change the os_thread_sleep() in
recv_apply_hashed_log_recs() to 1ms and to add a sleep of the same
duration at the end of recv_recover_page() when
recv_sys->n_addrs=0, we can demonstrate a race condition.
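The window being exercised can be illustrated with a small, hypothetical C++ model. It is not InnoDB code: `RecoveryModel`, `io_complete()` and the other names are invented for illustration. It assumes, as the commit message's description of the fix implies, that the read-completion path decrements the count of pages awaiting recovery before it finishes the change buffer merge. A waiter that polls only that count can then return while the merge is still pending. A promise keeps the completion thread paused, standing in for the injected sleep, so the outcome is deterministic:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical, simplified model of the recovery counters; none of these
// names are InnoDB's own.
struct RecoveryModel {
  std::atomic<int> n_addrs{1};          // pages still waiting for redo apply
  std::atomic<int> pending_reads{1};    // page reads not yet fully completed
  std::atomic<bool> merge_done{false};  // stands in for the change buffer merge
};

// Read-completion path: applying the redo log decrements n_addrs first;
// the merge and the end of the read I/O only come afterwards.
void io_complete(RecoveryModel& m, std::shared_future<void> resume) {
  m.n_addrs.fetch_sub(1);        // like the end of recv_recover_page()
  resume.wait();                 // holds the window open, like the injected sleep
  m.merge_done.store(true);      // like ibuf_merge_or_delete_for_page()
  m.pending_reads.fetch_sub(1);  // the read I/O is now really complete
}

// Pre-fix waiting condition: only n_addrs is polled.
// Returns whether the merge had finished when the wait ended.
bool wait_only_n_addrs(const RecoveryModel& m) {
  while (m.n_addrs.load() != 0) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return m.merge_done.load();
}

// Runs one scenario. The completion thread is released only after the
// waiter has returned, so the result does not depend on scheduling.
bool merge_finished_before_wait_returned() {
  RecoveryModel m;
  std::promise<void> release;
  std::thread completer(io_complete, std::ref(m), release.get_future().share());
  const bool merged = wait_only_n_addrs(m);
  release.set_value();
  completer.join();
  return merged;
}
```

In this model, `merge_finished_before_wait_returned()` returns `false`: the waiter observed `n_addrs == 0` while the modelled merge had not yet run.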

After disabling some debug checks in buf_all_freed_instance(),
buf_pool_invalidate_instance() and buf_validate(), we managed to
trigger an assertion failure in fseg_free_step(), on the XDES_FREE_BIT.
In other words, a trx_undo_seg_free() call during
trx_rollback_resurrected() was attempting a double-free of a page.
The failure was reproduced about once in 400 to 500 test runs. With
the fix applied, the test passed 2,000 runs.
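To illustrate what an assertion on a per-page free bit guards against, here is a simplified, hypothetical model of an extent descriptor that tracks one free bit per page. It is not InnoDB's layout: the real descriptor stores its bitmap differently, and the extent size of 64 pages used here is an assumption. Where fseg_free_step() would fail an assertion, this sketch returns `false` instead:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical extent size; not taken from InnoDB.
constexpr std::size_t kPagesPerExtent = 64;

// Hypothetical, simplified extent descriptor: one "free" bit per page.
class ExtentDescriptor {
 public:
  ExtentDescriptor() { free_.set(); }  // every page starts out free

  void allocate(std::size_t page) {
    assert(free_.test(page));  // cannot allocate a page that is in use
    free_.reset(page);
  }

  // Returns false on a double-free; the real code fails an assertion instead.
  bool free_page(std::size_t page) {
    if (free_.test(page)) {
      return false;  // free bit already set: the page was freed before
    }
    free_.set(page);
    return true;
  }

 private:
  std::bitset<kPagesPerExtent> free_;
};
```

Allocating page 5 and then freeing it twice makes the second `free_page(5)` call return `false`, the modelled counterpart of the double-free that trx_undo_seg_free() attempted.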

recv_apply_hashed_log_recs(): Besides waiting for recv_sys->n_addrs
to reach 0, also wait for buf_get_n_pending_read_ios() to reach 0,
to guarantee that buf_page_io_complete() will not be executing
ibuf_merge_or_delete_for_page().

parent 8e3e87d2

@@ -2501,7 +2501,7 @@ void recv_apply_hashed_log_recs(bool last_batch)
 	/* Wait until all the pages have been processed */
-	while (recv_sys->n_addrs != 0) {
+	while (recv_sys->n_addrs || buf_get_n_pending_read_ios()) {
 		const bool abort = recv_sys->found_corrupt_log
 			|| recv_sys->found_corrupt_fs;
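A hypothetical C++ model of the new loop condition follows. `RecoveryCounters` and its fields only mimic recv_sys->n_addrs and buf_get_n_pending_read_ios(); the real loop also releases recv_sys->mutex and calls os_thread_sleep(), which this sketch replaces with plain polling via std::this_thread::sleep_for(). Because the modelled read I/O is retired only after the merge, waiting for both counters guarantees that the merge has finished:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical counters modelled on recv_sys->n_addrs and
// buf_get_n_pending_read_ios(); these are not InnoDB's own types.
struct RecoveryCounters {
  std::atomic<int> n_addrs{0};
  std::atomic<int> pending_reads{0};
  std::atomic<bool> merge_done{false};
};

// Read-completion path: n_addrs drops before the merge has run, and
// pending_reads drops only after it.
void complete_read(RecoveryCounters& c) {
  c.n_addrs.fetch_sub(1);
  std::this_thread::sleep_for(std::chrono::milliseconds(1));  // widen the window
  c.merge_done.store(true);
  c.pending_reads.fetch_sub(1);
}

// Post-fix condition: keep waiting while either counter is nonzero.
void wait_for_batch(const RecoveryCounters& c) {
  while (c.n_addrs.load() != 0 || c.pending_reads.load() != 0) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}

bool merge_finished_before_wait_returned() {
  RecoveryCounters c;
  c.n_addrs = 1;        // one page submitted for recovery...
  c.pending_reads = 1;  // ...through one asynchronous read
  std::thread completer(complete_read, std::ref(c));
  wait_for_batch(c);
  const bool merged = c.merge_done.load();
  completer.join();
  return merged;
}
```

Here `merge_finished_before_wait_returned()` returns `true` regardless of the 1ms delay between the two decrements. The model mirrors the approach of the fix: rather than adding a new signal from the completion path, the waiter also watches a counter that, in the model, the completion path decrements last.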