Commit d58734d7 authored by Marko Mäkelä

MDEV-34520 purge_sys_t::wait_FTS sleeps 10ms, even if it does not have to

There were two separate Atomic_counter<uint32_t>, purge_sys.m_SYS_paused
and purge_sys.m_FTS_paused. In purge_sys.wait_FTS() we have to read both
atomically. We used to use an overkill solution for this, acquiring
purge_sys.latch and waiting 10 milliseconds between samples. To make
matters worse, the 10-millisecond wait was unconditional, which would
unnecessarily suspend the purge_coordinator_task every now and then.

It turns out that we can fold both "reference counts" into a single
Atomic_relaxed<uint32_t> and avoid the purge_sys.latch.
To assess whether std::memory_order_relaxed is acceptable, we should
consider the operations that read these "reference counts", that is,
purge_sys_t::wait_FTS(bool) and purge_sys_t::must_wait_FTS().
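
As a rough illustration of folding two "reference counts" into one word, here is a minimal sketch, not the MariaDB code. std::atomic<uint32_t> with explicit std::memory_order_relaxed stands in for Atomic_relaxed<uint32_t>, and the class PausedCounter with its snapshot() helper is hypothetical. The sketch isolates the low half with PAUSED_SYS - 1; the diff itself masks with ~PAUSED_SYS.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the merged counter; names are illustrative only.
class PausedCounter
{
public:
  // Multiplier for the "SYS" count, as in the commit (1U << 16 = 65536).
  static constexpr uint32_t PAUSED_SYS= 1U << 16;
  // Assumption of this sketch: the "FTS" count lives in the low 16 bits.
  static constexpr uint32_t FTS_MASK= PAUSED_SYS - 1;

  struct Counts { uint32_t sys; uint32_t fts; };

  void stop_SYS()
  {
    const uint32_t old= packed.fetch_add(PAUSED_SYS, std::memory_order_relaxed);
    assert(old < old + PAUSED_SYS);        // the high half must not wrap around
    (void) old;
  }
  void resume_SYS()
  {
    const uint32_t old= packed.fetch_sub(PAUSED_SYS, std::memory_order_relaxed);
    assert(old >= PAUSED_SYS);             // there was a stop_SYS() to undo
    (void) old;
  }
  void stop_FTS()
  {
    const uint32_t old= packed.fetch_add(1, std::memory_order_relaxed);
    assert((old & FTS_MASK) != FTS_MASK);  // the low half must not spill over
    (void) old;
  }
  void resume_FTS()
  {
    const uint32_t old= packed.fetch_sub(1, std::memory_order_relaxed);
    assert(old & FTS_MASK);                // there was a stop_FTS() to undo
    (void) old;
  }

  // A single load yields both counts from the same instant, with no latch.
  Counts snapshot() const
  {
    const uint32_t v= packed.load(std::memory_order_relaxed);
    return {v / PAUSED_SYS, v & FTS_MASK};
  }

private:
  std::atomic<uint32_t> packed{0};
};
```

Because both counts come from one load, a reader can never pair a fresh value of one count with a stale value of the other, which is what the two separate counters required a latch for.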

Outside debug assertions, purge_sys.must_wait_FTS() is only invoked in
trx_purge_table_acquire(), which is covered by a shared dict_sys.latch.
We would increment the counter as part of a DDL operation, but before
acquiring an exclusive dict_sys.latch. So, a
purge_sys_t::close_and_reopen() loop could be triggered slightly
prematurely, before a problematic DDL operation is actually executed.
Decrementing the counter is less of an issue; purge_sys.resume_FTS()
or purge_sys.resume_SYS() would mostly be invoked while holding an
exclusive dict_sys.latch; ha_innobase::delete_table() does it outside
that critical section. Still, this would only cause some extra wait in
the purge_coordinator_task, just like at the start of a DDL operation.

There are two calls to purge_sys_t::wait_FTS(bool): in the above-mentioned
purge_sys_t::close_and_reopen() and in purge_sys_t::clone_oldest_view(),
both invoked by the purge_coordinator_task. There is also a
purge_sys.clone_oldest_view<true>() call at startup when no DDL operation
can be in progress.

purge_sys_t::m_SYS_paused: Merged into m_FTS_paused, using a new
multiplier PAUSED_SYS = 65536.

purge_sys_t::wait_FTS(): Remove an unnecessary sleep as well as the
access to purge_sys.latch. It suffices to poll purge_sys.m_FTS_paused.

purge_sys_t::stop_FTS(): Do not acquire purge_sys.latch.
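
The polling described for purge_sys_t::wait_FTS() can be sketched in isolation as follows. This is a hedged illustration, not the actual implementation: the global paused_counts and the helpers must_wait(), wait_unpaused() and resume_sys() are hypothetical names, std::atomic<uint32_t> stands in for Atomic_relaxed<uint32_t>, and the "FTS only" mask is taken as the low 16 bits (PAUSED_SYS - 1), whereas the diff uses ~PAUSED_SYS.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

constexpr uint32_t PAUSED_SYS= 1U << 16;      // multiplier from the commit
constexpr uint32_t FTS_MASK= PAUSED_SYS - 1;  // assumption: low 16 bits

// Hypothetical global standing in for purge_sys.m_FTS_paused.
std::atomic<uint32_t> paused_counts{0};

// Whether a waiter has to keep waiting for the given packed value.
constexpr bool must_wait(uint32_t value, bool also_sys)
{
  return (value & (also_sys ? ~0U : FTS_MASK)) != 0;
}

// Drop one "SYS" pause, mirroring the debug assertion in the diff.
void resume_sys()
{
  const uint32_t old=
    paused_counts.fetch_sub(PAUSED_SYS, std::memory_order_relaxed);
  assert(old >= PAUSED_SYS);
  (void) old;
}

// Poll until no relevant pause is in effect; sleep only while one is.
// Returns the number of 10 ms sleeps that were needed.
unsigned wait_unpaused(bool also_sys)
{
  unsigned sleeps= 0;
  while (must_wait(paused_counts.load(std::memory_order_relaxed), also_sys))
  {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    ++sleeps;
  }
  return sleeps;
}
```

With only a "SYS" pause outstanding, wait_unpaused(false) returns without sleeping, while wait_unpaused(true) would keep polling until the matching resume_sys() call.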

Reviewed by: Debarun Banerjee
parent 9020baf1
@@ -149,10 +149,11 @@ class purge_sys_t
 private:
   /** number of pending stop() calls without resume() */
   Atomic_counter<uint32_t> m_paused;
-  /** number of stop_SYS() calls without resume_SYS() */
-  Atomic_counter<uint32_t> m_SYS_paused;
-  /** number of stop_FTS() calls without resume_FTS() */
-  Atomic_counter<uint32_t> m_FTS_paused;
+  /** PAUSED_SYS * number of stop_SYS() calls without resume_SYS() +
+  number of stop_FTS() calls without resume_FTS() */
+  Atomic_relaxed<uint32_t> m_FTS_paused;
+  /** The stop_SYS() multiplier in m_FTS_paused */
+  static constexpr const uint32_t PAUSED_SYS= 1U << 16;
   /** latch protecting end_view */
   alignas(CPU_LEVEL1_DCACHE_LINESIZE) srw_spin_lock_low end_latch;
@@ -321,16 +322,21 @@ class purge_sys_t
   void wait_FTS(bool also_sys);
 public:
   /** Suspend purge in data dictionary tables */
-  void stop_SYS() { m_SYS_paused++; }
+  void stop_SYS()
+  {
+    ut_d(const auto p=) m_FTS_paused.fetch_add(PAUSED_SYS);
+    ut_ad(p < p + PAUSED_SYS);
+  }
   /** Resume purge in data dictionary tables */
   static void resume_SYS(void *);
   /** Pause purge during a DDL operation that could drop FTS_ tables. */
   void stop_FTS();
   /** Resume purge after stop_FTS(). */
-  void resume_FTS() { ut_d(const auto p=) m_FTS_paused--; ut_ad(p); }
+  void resume_FTS()
+  { ut_d(const auto p=) m_FTS_paused.fetch_sub(1); ut_ad(p & ~PAUSED_SYS); }
   /** @return whether stop_SYS() is in effect */
-  bool must_wait_FTS() const { return m_FTS_paused; }
+  bool must_wait_FTS() const { return m_FTS_paused & ~PAUSED_SYS; }
 private:
   /**
......
@@ -1298,10 +1298,9 @@ bool purge_sys_t::running()
 void purge_sys_t::stop_FTS()
 {
-  latch.rd_lock(SRW_LOCK_CALL);
-  m_FTS_paused++;
-  latch.rd_unlock();
-  while (m_active)
+  ut_d(const auto paused=) m_FTS_paused.fetch_add(1);
+  ut_ad(paused < PAUSED_SYS);
+  while (m_active.load(std::memory_order_acquire))
     std::this_thread::sleep_for(std::chrono::seconds(1));
 }
@@ -1335,8 +1334,8 @@ void purge_sys_t::stop()
 /** Resume purge in data dictionary tables */
 void purge_sys_t::resume_SYS(void *)
 {
-  ut_d(auto paused=) purge_sys.m_SYS_paused--;
-  ut_ad(paused);
+  ut_d(auto paused=) purge_sys.m_FTS_paused.fetch_sub(PAUSED_SYS);
+  ut_ad(paused >= PAUSED_SYS);
 }
 /** Resume purge at UNLOCK TABLES after FLUSH TABLES FOR EXPORT */
......
@@ -1065,15 +1065,8 @@ static void trx_purge_close_tables(purge_node_t *node, THD *thd)
 void purge_sys_t::wait_FTS(bool also_sys)
 {
-  bool paused;
-  do
-  {
-    latch.wr_lock(SRW_LOCK_CALL);
-    paused= m_FTS_paused || (also_sys && m_SYS_paused);
-    latch.wr_unlock();
+  for (const uint32_t mask= also_sys ? ~0U : ~PAUSED_SYS; m_FTS_paused & mask;)
     std::this_thread::sleep_for(std::chrono::milliseconds(10));
-  }
-  while (paused);
 }
 __attribute__((nonnull))
......