MDEV-34520 purge_sys_t::wait_FTS sleeps 10ms, even if it does not have to
There were two separate Atomic_counter<uint32_t>, purge_sys.m_SYS_paused and purge_sys.m_FTS_paused. In purge_sys.wait_FTS() we have to read both atomically. We used to use an overkill solution for this, acquiring purge_sys.latch and waiting 10 milliseconds between samples. To make matters worse, the 10-millisecond wait was unconditional, which would unnecessarily suspend the purge_coordinator_task every now and then.

It turns out that we can fold both "reference counts" into a single Atomic_relaxed<uint32_t> and avoid the purge_sys.latch.

To assess whether std::memory_order_relaxed is acceptable, we should consider the operations that read these "reference counts", that is, purge_sys_t::wait_FTS(bool) and purge_sys_t::must_wait_FTS().

Outside debug assertions, purge_sys.must_wait_FTS() is only invoked in trx_purge_table_acquire(), which is covered by a shared dict_sys.latch. We would increment the counter as part of a DDL operation, but before acquiring an exclusive dict_sys.latch. So, a purge_sys_t::close_and_reopen() loop could be triggered slightly prematurely, before a problematic DDL operation is actually executed. Decrementing the counter is less of an issue; purge_sys.resume_FTS() or purge_sys.resume_SYS() would mostly be invoked while holding an exclusive dict_sys.latch; ha_innobase::delete_table() does it outside that critical section. Still, this would only cause some extra wait in the purge_coordinator_task, just like at the start of a DDL operation.

There are two calls to purge_sys_t::wait_FTS(bool): in the above mentioned purge_sys_t::close_and_reopen() and in purge_sys_t::clone_oldest_view(), both invoked by the purge_coordinator_task. There is also a purge_sys.clone_oldest_view<true>() call at startup when no DDL operation can be in progress.

purge_sys_t::m_SYS_paused: Merged into m_FTS_paused, using a new multiplier PAUSED_SYS = 65536.

purge_sys_t::wait_FTS(): Remove an unnecessary sleep as well as the access to purge_sys.latch. It suffices to poll purge_sys.m_FTS_paused.

purge_sys_t::stop_FTS(): Do not acquire purge_sys.latch.

Reviewed by: Debarun Banerjee
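
The folding of both "reference counts" into one 32-bit word, as described above, can be illustrated with a minimal standalone sketch. It is not the actual InnoDB code: the class name PausedCounter, the member paused_, the helper must_wait_SYS(), and the meaning of the bool parameter of wait() (whether to also wait for the SYS count) are assumptions made for illustration, and plain std::atomic with std::memory_order_relaxed stands in for Atomic_relaxed. Only the multiplier PAUSED_SYS = 65536 is taken from the description.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the merged counter; names are illustrative only.
class PausedCounter {
public:
  // One SYS pause adds 65536, so the low 16 bits hold the FTS count
  // and the high 16 bits hold the SYS count.
  static constexpr uint32_t PAUSED_SYS = 65536;

  void stop_FTS()   { paused_.fetch_add(1, std::memory_order_relaxed); }
  void resume_FTS() { paused_.fetch_sub(1, std::memory_order_relaxed); }
  void stop_SYS()   { paused_.fetch_add(PAUSED_SYS, std::memory_order_relaxed); }
  void resume_SYS() { paused_.fetch_sub(PAUSED_SYS, std::memory_order_relaxed); }

  // True if at least one FTS pause is outstanding.
  bool must_wait_FTS() const {
    return (paused_.load(std::memory_order_relaxed) & (PAUSED_SYS - 1)) != 0;
  }

  // True if at least one SYS pause is outstanding.
  bool must_wait_SYS() const {
    return paused_.load(std::memory_order_relaxed) >= PAUSED_SYS;
  }

  // Poll until the selected counts are zero. A single load observes both
  // counts at once, so no latch is needed, and the sleep only happens
  // when there really is something to wait for.
  void wait(bool also_sys) const {
    const uint32_t mask = also_sys ? ~uint32_t{0} : PAUSED_SYS - 1;
    while (paused_.load(std::memory_order_relaxed) & mask)
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }

private:
  std::atomic<uint32_t> paused_{0};
};
```

With this layout, a waiter that finds both counts at zero returns immediately instead of sleeping unconditionally. The sketch assumes that fewer than 65536 FTS pauses are ever outstanding at the same time; otherwise the low half would overflow into the SYS count.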