Commit 49406b97 authored by Dmitry Lenev's avatar Dmitry Lenev

A better fix for bug #56405 "Deadlock in the MDL deadlock

detector" that doesn't introduce bug #56715 "Concurrent
transactions + FLUSH result in sporadical unwarranted
deadlock errors".

Deadlock could have occurred when workload containing a mix
of DML, DDL and FLUSH TABLES statements affecting the same
set of tables was executed in a heavily concurrent environment.

This deadlock occurred when several connections tried to
perform deadlock detection in the metadata locking subsystem.
The first connection started traversing wait-for graph,
encountered a sub-graph representing a wait for flush, acquired
LOCK_open and dived into sub-graph inspection. Then it
encountered sub-graph corresponding to wait for metadata lock
and blocked while trying to acquire a rd-lock on
MDL_lock::m_rwlock, since some,other thread had a wr-lock on it.
When this wr-lock was released it could have happened (if there
was another pending wr-lock against this rwlock) that the rd-lock
from the first connection was left unsatisfied but at the same
time the new rd-lock request from the second connection sneaked
in and was satisfied (for this to be possible the second
rd-request should come exactly after the wr-lock is released but
before pending the wr-lock manages to grab rwlock, which is
possible both on Linux and in our own rwlock implementation).
If this second connection continued traversing the wait-for graph
and encountered a sub-graph representing a wait for flush it tried
to acquire LOCK_open and thus the deadlock was created.

The previous patch tried to workaround this problem by not
allowing the deadlock detector to lock LOCK_open mutex if
some other thread doing deadlock detection already owns it
and current search depth is greater than 0. Instead deadlock
was reported. As a result it has introduced bug #56715.

This patch solves this problem in a different way.
It introduces a new rw_pr_lock_t implementation to be used
by MDL subsystem instead of one based on Linux rwlocks or
our own rwlock implementation. This new implementation
never allows situation in which an rwlock is rd-locked and
there is a blocked pending rd-lock. Thus the situation which
has caused this bug becomes impossible with this implementation.

Due to fact that this implementation is optimized for
wr-lock/unlock scenario which is most common in the MDL
subsystem it doesn't introduce noticeable performance
regressions in sysbench tests. Moreover it significantly
improves situation for POINT_SELECT test when many
connections are used.

No test case is provided as this bug is very hard to repeat
in MTR environment but is repeatable with the help of RQG
tests.
This patch also doesn't include a test for bug #56715
"Concurrent transactions + FLUSH result in sporadical
unwarranted deadlock errors" as it takes too much time to
be run as part of normal test-suite runs.
parent 4228b8bb
......@@ -220,7 +220,6 @@
#cmakedefine HAVE_PTHREAD_KEY_DELETE 1
#cmakedefine HAVE_PTHREAD_KILL 1
#cmakedefine HAVE_PTHREAD_RWLOCK_RDLOCK 1
#cmakedefine HAVE_PTHREAD_RWLOCKATTR_SETKIND_NP 1
#cmakedefine HAVE_PTHREAD_SETPRIO_NP 1
#cmakedefine HAVE_PTHREAD_SETSCHEDPARAM 1
#cmakedefine HAVE_PTHREAD_SIGMASK 1
......
......@@ -346,7 +346,6 @@ CHECK_FUNCTION_EXISTS (pthread_condattr_setclock HAVE_PTHREAD_CONDATTR_SETCLOCK)
CHECK_FUNCTION_EXISTS (pthread_init HAVE_PTHREAD_INIT)
CHECK_FUNCTION_EXISTS (pthread_key_delete HAVE_PTHREAD_KEY_DELETE)
CHECK_FUNCTION_EXISTS (pthread_rwlock_rdlock HAVE_PTHREAD_RWLOCK_RDLOCK)
CHECK_FUNCTION_EXISTS (pthread_rwlockattr_setkind_np HAVE_PTHREAD_RWLOCKATTR_SETKIND_NP)
CHECK_FUNCTION_EXISTS (pthread_sigmask HAVE_PTHREAD_SIGMASK)
CHECK_FUNCTION_EXISTS (pthread_threadmask HAVE_PTHREAD_THREADMASK)
CHECK_FUNCTION_EXISTS (pthread_yield_np HAVE_PTHREAD_YIELD_NP)
......
......@@ -2169,7 +2169,7 @@ AC_CHECK_FUNCS(alarm bfill bmove bsearch bzero \
mkstemp mlockall perror poll pread pthread_attr_create mmap mmap64 getpagesize \
pthread_attr_getstacksize pthread_attr_setstacksize pthread_condattr_create \
pthread_getsequence_np pthread_key_delete pthread_rwlock_rdlock \
pthread_rwlockattr_setkind_np pthread_sigmask \
pthread_sigmask \
readlink realpath rename rint rwlock_init setupterm \
shmget shmat shmdt shmctl sigaction sigemptyset sigaddset \
sighold sigset sigthreadmask port_create sleep thr_yield \
......
......@@ -594,7 +594,7 @@ int my_pthread_fastmutex_lock(my_pthread_fastmutex_t *mp);
/* Use our own version of read/write locks */
#define NEED_MY_RW_LOCK 1
#define rw_lock_t my_rw_lock_t
#define my_rwlock_init(A,B) my_rw_init((A), 0)
#define my_rwlock_init(A,B) my_rw_init((A))
#define rw_rdlock(A) my_rw_rdlock((A))
#define rw_wrlock(A) my_rw_wrlock((A))
#define rw_tryrdlock(A) my_rw_tryrdlock((A))
......@@ -606,49 +606,82 @@ int my_pthread_fastmutex_lock(my_pthread_fastmutex_t *mp);
#endif /* USE_MUTEX_INSTEAD_OF_RW_LOCKS */
/*
Portable read-write locks which prefer readers.
Required by some algorithms in order to provide correctness.
/**
Portable implementation of special type of read-write locks.
These locks have two properties which are unusual for rwlocks:
1) They "prefer readers" in the sense that they do not allow
situations in which rwlock is rd-locked and there is a
pending rd-lock which is blocked (e.g. due to pending
request for wr-lock).
This is a stronger guarantee than one which is provided for
PTHREAD_RWLOCK_PREFER_READER_NP rwlocks in Linux.
MDL subsystem deadlock detector relies on this property for
its correctness.
2) They are optimized for uncontended wr-lock/unlock case.
This is scenario in which they are most oftenly used
within MDL subsystem. Optimizing for it gives significant
performance improvements in some of tests involving many
connections.
Another important requirement imposed on this type of rwlock
by the MDL subsystem is that it should be OK to destroy rwlock
object which is in unlocked state even though some threads might
have not yet fully left unlock operation for it (of course there
is an external guarantee that no thread will try to lock rwlock
which is destroyed).
Putting it another way the unlock operation should not access
rwlock data after changing its state to unlocked.
TODO/FIXME: We should consider alleviating this requirement as
it blocks us from doing certain performance optimizations.
*/
#if defined(HAVE_PTHREAD_RWLOCK_RDLOCK) && defined(HAVE_PTHREAD_RWLOCKATTR_SETKIND_NP)
/*
On systems which have a way to specify that readers should
be preferred through attribute mechanism (e.g. Linux) we use
system implementation of read/write locks.
*/
#define rw_pr_lock_t pthread_rwlock_t
typedef struct st_rw_pr_lock_t {
/**
Lock which protects the structure.
Also held for the duration of wr-lock.
*/
pthread_mutex_t lock;
/**
Condition variable which is used to wake-up
writers waiting for readers to go away.
*/
pthread_cond_t no_active_readers;
/** Number of active readers. */
uint active_readers;
/** Number of writers waiting for readers to go away. */
uint writers_waiting_readers;
/** Indicates whether there is an active writer. */
my_bool active_writer;
#ifdef SAFE_MUTEX
/** Thread holding wr-lock (for debug purposes only). */
pthread_t writer_thread;
#endif
} rw_pr_lock_t;
extern int rw_pr_init(rw_pr_lock_t *);
#define rw_pr_rdlock(A) pthread_rwlock_rdlock(A)
#define rw_pr_wrlock(A) pthread_rwlock_wrlock(A)
#define rw_pr_tryrdlock(A) pthread_rwlock_tryrdlock(A)
#define rw_pr_trywrlock(A) pthread_rwlock_trywrlock(A)
#define rw_pr_unlock(A) pthread_rwlock_unlock(A)
#define rw_pr_destroy(A) pthread_rwlock_destroy(A)
extern int rw_pr_rdlock(rw_pr_lock_t *);
extern int rw_pr_wrlock(rw_pr_lock_t *);
extern int rw_pr_unlock(rw_pr_lock_t *);
extern int rw_pr_destroy(rw_pr_lock_t *);
#ifdef SAFE_MUTEX
#define rw_pr_lock_assert_write_owner(A) \
DBUG_ASSERT((A)->active_writer && pthread_equal(pthread_self(), \
(A)->writer_thread))
#define rw_pr_lock_assert_not_write_owner(A) \
DBUG_ASSERT(! (A)->active_writer || ! pthread_equal(pthread_self(), \
(A)->writer_thread))
#else
#define rw_pr_lock_assert_write_owner(A)
#define rw_pr_lock_assert_not_write_owner(A)
#else
/* Otherwise we have to use our own implementation of read/write locks. */
#define NEED_MY_RW_LOCK 1
struct st_my_rw_lock_t;
#define rw_pr_lock_t my_rw_lock_t
extern int rw_pr_init(struct st_my_rw_lock_t *);
#define rw_pr_rdlock(A) my_rw_rdlock((A))
#define rw_pr_wrlock(A) my_rw_wrlock((A))
#define rw_pr_tryrdlock(A) my_rw_tryrdlock((A))
#define rw_pr_trywrlock(A) my_rw_trywrlock((A))
#define rw_pr_unlock(A) my_rw_unlock((A))
#define rw_pr_destroy(A) my_rw_destroy((A))
#define rw_pr_lock_assert_write_owner(A) my_rw_lock_assert_write_owner((A))
#define rw_pr_lock_assert_not_write_owner(A) my_rw_lock_assert_not_write_owner((A))
#endif /* defined(HAVE_PTHREAD_RWLOCK_RDLOCK) && defined(HAVE_PTHREAD_RWLOCKATTR_SETKIND_NP) */
#endif /* SAFE_MUTEX */
#ifdef NEED_MY_RW_LOCK
/*
On systems which don't support native read/write locks, or don't support
read/write locks which prefer readers we have to use own implementation.
On systems which don't support native read/write locks we have
to use own implementation.
*/
typedef struct st_my_rw_lock_t {
pthread_mutex_t lock; /* lock for structure */
......@@ -656,13 +689,12 @@ typedef struct st_my_rw_lock_t {
pthread_cond_t writers; /* waiting writers */
int state; /* -1:writer,0:free,>0:readers */
int waiters; /* number of waiting writers */
my_bool prefer_readers;
#ifdef SAFE_MUTEX
pthread_t write_thread;
#endif
} my_rw_lock_t;
extern int my_rw_init(my_rw_lock_t *, my_bool *);
extern int my_rw_init(my_rw_lock_t *);
extern int my_rw_destroy(my_rw_lock_t *);
extern int my_rw_rdlock(my_rw_lock_t *);
extern int my_rw_wrlock(my_rw_lock_t *);
......
......@@ -141,9 +141,7 @@ typedef struct st_mysql_rwlock mysql_rwlock_t;
@c mysql_prlock_t is a drop-in replacement for @c rw_pr_lock_t.
@sa mysql_prlock_init
@sa mysql_prlock_rdlock
@sa mysql_prlock_tryrdlock
@sa mysql_prlock_wrlock
@sa mysql_prlock_trywrlock
@sa mysql_prlock_unlock
@sa mysql_prlock_destroy
*/
......@@ -420,20 +418,6 @@ typedef struct st_mysql_cond mysql_cond_t;
inline_mysql_rwlock_tryrdlock(RW)
#endif
/**
@def mysql_prlock_tryrdlock(RW)
Instrumented rw_pr_tryrdlock.
@c mysql_prlock_tryrdlock is a drop-in replacement
for @c rw_pr_tryrdlock.
*/
#ifdef HAVE_PSI_INTERFACE
#define mysql_prlock_tryrdlock(RW) \
inline_mysql_prlock_tryrdlock(RW, __FILE__, __LINE__)
#else
#define mysql_prlock_tryrdlock(RW) \
inline_mysql_prlock_tryrdlock(RW)
#endif
/**
@def mysql_rwlock_trywrlock(RW)
Instrumented rwlock_trywrlock.
......@@ -448,20 +432,6 @@ typedef struct st_mysql_cond mysql_cond_t;
inline_mysql_rwlock_trywrlock(RW)
#endif
/**
@def mysql_prlock_trywrlock(RW)
Instrumented rw_pr_trywrlock.
@c mysql_prlock_trywrlock is a drop-in replacement
for @c rw_pr_trywrlock.
*/
#ifdef HAVE_PSI_INTERFACE
#define mysql_prlock_trywrlock(RW) \
inline_mysql_prlock_trywrlock(RW, __FILE__, __LINE__)
#else
#define mysql_prlock_trywrlock(RW) \
inline_mysql_prlock_trywrlock(RW)
#endif
/**
@def mysql_rwlock_unlock(RW)
Instrumented rwlock_unlock.
......@@ -905,35 +875,6 @@ static inline int inline_mysql_rwlock_tryrdlock(
return result;
}
#ifndef DISABLE_MYSQL_PRLOCK_H
static inline int inline_mysql_prlock_tryrdlock(
mysql_prlock_t *that
#ifdef HAVE_PSI_INTERFACE
, const char *src_file, uint src_line
#endif
)
{
int result;
#ifdef HAVE_PSI_INTERFACE
struct PSI_rwlock_locker *locker= NULL;
PSI_rwlock_locker_state state;
if (likely(PSI_server && that->m_psi))
{
locker= PSI_server->get_thread_rwlock_locker(&state, that->m_psi,
PSI_RWLOCK_TRYREADLOCK);
if (likely(locker != NULL))
PSI_server->start_rwlock_rdwait(locker, src_file, src_line);
}
#endif
result= rw_pr_tryrdlock(&that->m_prlock);
#ifdef HAVE_PSI_INTERFACE
if (likely(locker != NULL))
PSI_server->end_rwlock_rdwait(locker, result);
#endif
return result;
}
#endif
static inline int inline_mysql_rwlock_trywrlock(
mysql_rwlock_t *that
#ifdef HAVE_PSI_INTERFACE
......@@ -961,35 +902,6 @@ static inline int inline_mysql_rwlock_trywrlock(
return result;
}
#ifndef DISABLE_MYSQL_PRLOCK_H
static inline int inline_mysql_prlock_trywrlock(
mysql_prlock_t *that
#ifdef HAVE_PSI_INTERFACE
, const char *src_file, uint src_line
#endif
)
{
int result;
#ifdef HAVE_PSI_INTERFACE
struct PSI_rwlock_locker *locker= NULL;
PSI_rwlock_locker_state state;
if (likely(PSI_server && that->m_psi))
{
locker= PSI_server->get_thread_rwlock_locker(&state, that->m_psi,
PSI_RWLOCK_TRYWRITELOCK);
if (likely(locker != NULL))
PSI_server->start_rwlock_wrwait(locker, src_file, src_line);
}
#endif
result= rw_pr_trywrlock(&that->m_prlock);
#ifdef HAVE_PSI_INTERFACE
if (likely(locker != NULL))
PSI_server->end_rwlock_wrwait(locker, result);
#endif
return result;
}
#endif
static inline int inline_mysql_rwlock_unlock(
mysql_rwlock_t *that)
{
......
......@@ -59,7 +59,7 @@
* Mountain View, California 94043
*/
int my_rw_init(my_rw_lock_t *rwp, my_bool *prefer_readers_attr)
int my_rw_init(my_rw_lock_t *rwp)
{
pthread_condattr_t cond_attr;
......@@ -74,8 +74,6 @@ int my_rw_init(my_rw_lock_t *rwp, my_bool *prefer_readers_attr)
#ifdef SAFE_MUTEX
rwp->write_thread = 0;
#endif
/* If attribute argument is NULL use default value - prefer writers. */
rwp->prefer_readers= prefer_readers_attr ? *prefer_readers_attr : FALSE;
return(0);
}
......@@ -96,8 +94,7 @@ int my_rw_rdlock(my_rw_lock_t *rwp)
pthread_mutex_lock(&rwp->lock);
/* active or queued writers */
while (( rwp->state < 0 ) ||
(rwp->waiters && ! rwp->prefer_readers))
while (( rwp->state < 0 ) || rwp->waiters)
pthread_cond_wait( &rwp->readers, &rwp->lock);
rwp->state++;
......@@ -109,8 +106,7 @@ int my_rw_tryrdlock(my_rw_lock_t *rwp)
{
int res;
pthread_mutex_lock(&rwp->lock);
if ((rwp->state < 0 ) ||
(rwp->waiters && ! rwp->prefer_readers))
if ((rwp->state < 0 ) || rwp->waiters)
res= EBUSY; /* Can't get lock */
else
{
......@@ -192,30 +188,127 @@ int my_rw_unlock(my_rw_lock_t *rwp)
return(0);
}
#endif /* defined(NEED_MY_RW_LOCK) */
int rw_pr_init(struct st_my_rw_lock_t *rwlock)
int rw_pr_init(rw_pr_lock_t *rwlock)
{
my_bool prefer_readers_attr= TRUE;
return my_rw_init(rwlock, &prefer_readers_attr);
pthread_mutex_init(&rwlock->lock, NULL);
pthread_cond_init(&rwlock->no_active_readers, NULL);
rwlock->active_readers= 0;
rwlock->writers_waiting_readers= 0;
rwlock->active_writer= FALSE;
#ifdef SAFE_MUTEX
rwlock->writer_thread= 0;
#endif
return 0;
}
#else
/*
We are on system which has native read/write locks which support
preferring of readers.
*/
int rw_pr_destroy(rw_pr_lock_t *rwlock)
{
pthread_cond_destroy(&rwlock->no_active_readers);
pthread_mutex_destroy(&rwlock->lock);
return 0;
}
int rw_pr_init(rw_pr_lock_t *rwlock)
int rw_pr_rdlock(rw_pr_lock_t *rwlock)
{
pthread_mutex_lock(&rwlock->lock);
/*
The fact that we were able to acquire 'lock' mutex means
that there are no active writers and we can acquire rd-lock.
Increment active readers counter to prevent requests for
wr-lock from succeeding and unlock mutex.
*/
rwlock->active_readers++;
pthread_mutex_unlock(&rwlock->lock);
return 0;
}
int rw_pr_wrlock(rw_pr_lock_t *rwlock)
{
pthread_rwlockattr_t rwlock_attr;
pthread_mutex_lock(&rwlock->lock);
if (rwlock->active_readers != 0)
{
/* There are active readers. We have to wait until they are gone. */
rwlock->writers_waiting_readers++;
pthread_rwlockattr_init(&rwlock_attr);
pthread_rwlockattr_setkind_np(&rwlock_attr, PTHREAD_RWLOCK_PREFER_READER_NP);
pthread_rwlock_init(rwlock, NULL);
pthread_rwlockattr_destroy(&rwlock_attr);
while (rwlock->active_readers != 0)
pthread_cond_wait(&rwlock->no_active_readers, &rwlock->lock);
rwlock->writers_waiting_readers--;
}
/*
We own 'lock' mutex so there is no active writers.
Also there are no active readers.
This means that we can grant wr-lock.
Not releasing 'lock' mutex until unlock will block
both requests for rd and wr-locks.
Set 'active_writer' flag to simplify unlock.
Thanks to the fact wr-lock/unlock in the absence of
contention from readers is essentially mutex lock/unlock
with a few simple checks make this rwlock implementation
wr-lock optimized.
*/
rwlock->active_writer= TRUE;
#ifdef SAFE_MUTEX
rwlock->writer_thread= pthread_self();
#endif
return 0;
}
#endif /* defined(NEED_MY_RW_LOCK) */
int rw_pr_unlock(rw_pr_lock_t *rwlock)
{
if (rwlock->active_writer)
{
/* We are unlocking wr-lock. */
#ifdef SAFE_MUTEX
rwlock->writer_thread= 0;
#endif
rwlock->active_writer= FALSE;
if (rwlock->writers_waiting_readers)
{
/*
Avoid expensive cond signal in case when there is no contention
or it is wr-only.
Note that from view point of performance it would be better to
signal on the condition variable after unlocking mutex (as it
reduces number of contex switches).
Unfortunately this would mean that such rwlock can't be safely
used by MDL subsystem, which relies on the fact that it is OK
to destroy rwlock once it is in unlocked state.
*/
pthread_cond_signal(&rwlock->no_active_readers);
}
pthread_mutex_unlock(&rwlock->lock);
}
else
{
/* We are unlocking rd-lock. */
pthread_mutex_lock(&rwlock->lock);
rwlock->active_readers--;
if (rwlock->active_readers == 0 &&
rwlock->writers_waiting_readers)
{
/*
If we are last reader and there are waiting
writers wake them up.
*/
pthread_cond_signal(&rwlock->no_active_readers);
}
pthread_mutex_unlock(&rwlock->lock);
}
return 0;
}
#endif /* defined(THREAD) */
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment