MDEV-24449 Corruption of system tablespace or last recovered page
This corresponds to 10.5 commit 39378e13. With a patched version of the test innodb.ibuf_not_empty (so that it would trigger crash recovery after using the change buffer), and patched code that would modify the os_thread_sleep() in recv_apply_hashed_log_recs() to be 1ms as well as add a sleep of the same duration to the end of recv_recover_page() when recv_sys->n_addrs=0, we can demonstrate a race condition. After disabling some debug checks in buf_all_freed_instance(), buf_pool_invalidate_instance() and buf_validate(), we managed to trigger an assertion failure in fseg_free_step(), on the XDES_FREE_BIT. In other words, an trx_undo_seg_free() call during trx_rollback_resurrected() was attempting a double-free of a page. This was repeated about once in 400 to 500 test runs. With the fix applied, the test passed 2,000 runs. recv_apply_hashed_log_recs(): Do not only wait for recv_sys->n_addrs to reach 0, but also wait for buf_get_n_pending_read_ios() to reach 0, to guarantee that buf_page_io_complete() will not be executing ibuf_merge_or_delete_for_page().
Showing
Please register or sign in to comment