Dmitry Lenev authored ac351578

    A temporary workaround for bug #56405 "Deadlock in the
    MDL deadlock detector".
    
    A deadlock could occur when a workload containing a mix
    of DML, DDL and FLUSH TABLES statements affecting the same
    set of tables was executed in a heavily concurrent environment.
    
    This deadlock occurred when several connections tried to
    perform deadlock detection in the metadata locking subsystem
    at the same time. The first connection started traversing the
    wait-for graph, encountered a sub-graph representing a wait
    for flush, acquired LOCK_open and dived into inspecting that
    sub-graph. It then encountered a sub-graph corresponding to a
    wait for a metadata lock and blocked while trying to acquire
    an rd-lock on MDL_lock::m_rwlock (*) protecting this
    sub-graph, since some other thread held a wr-lock on it.
    When this wr-lock was released, it could happen (if another
    wr-lock request against this rwlock was pending) that the
    rd-lock request from the first connection was left
    unsatisfied while a new rd-lock request from a second
    connection sneaked in and was satisfied. For this to be
    possible, the second rd-lock request had to arrive exactly
    after the wr-lock was released but before the pending wr-lock
    managed to grab the rwlock, which can happen both with Linux
    rwlocks and with our own rwlock implementation. If this second
    connection continued traversing the wait-for graph and
    encountered a sub-graph representing a wait for flush, it
    tried to acquire LOCK_open, and thus a deadlock was created:
    the first connection's rd-lock request could not be granted
    while the wr-lock request was pending, the pending wr-lock
    waited for the second connection to release its rd-lock, and
    the second connection waited for LOCK_open, which was held by
    the first connection.
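
    The cycle can be checked mechanically. The following is a
    minimal C++ sketch, not code from the server: it models the
    waits described above as a wait-for graph (an edge a -> b means
    "a waits for b") and looks for a cycle with a depth-first
    search. The node names conn1, writer and conn2 and the functions
    has_cycle() and bug_56405_scenario() are hypothetical; the model
    records only who waits for whom, not the rwlock or mutex
    internals.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical wait-for graph: an edge "a" -> "b" means "a waits for b".
using WaitForGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first search. `on_path` holds the nodes on the current DFS path;
// `done` holds nodes already fully explored without finding a cycle.
static bool dfs(const WaitForGraph& g, const std::string& node,
                std::set<std::string>& on_path, std::set<std::string>& done) {
  if (on_path.count(node)) return true;  // back edge: cycle found
  if (done.count(node)) return false;    // already known to be cycle-free
  on_path.insert(node);
  auto it = g.find(node);
  if (it != g.end())
    for (const auto& next : it->second)
      if (dfs(g, next, on_path, done)) return true;
  on_path.erase(node);
  done.insert(node);
  return false;
}

// Returns true if any cycle exists in the graph.
bool has_cycle(const WaitForGraph& g) {
  std::set<std::string> on_path, done;
  for (const auto& entry : g)
    if (dfs(g, entry.first, on_path, done)) return true;
  return false;
}

// The scenario described in the commit message (hypothetical names):
//   conn1  holds LOCK_open; its rd-lock request on m_rwlock cannot be
//          granted while the wr-lock request is pending;
//   writer (the pending wr-lock) waits for conn2 to release its rd-lock;
//   conn2  holds the rd-lock and waits for LOCK_open, held by conn1.
WaitForGraph bug_56405_scenario() {
  return {
      {"conn1", {"writer"}},
      {"writer", {"conn2"}},
      {"conn2", {"conn1"}},
  };
}
```

    In this model, dropping the conn2 -> conn1 edge, i.e. the second
    connection not waiting for LOCK_open, is enough to break the
    cycle.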
    
    This patch tries to work around this problem by not allowing
    the deadlock detector to lock the LOCK_open mutex if some other
    thread doing deadlock detection already owns it and the current
    search depth is greater than 0. Instead, a deadlock is reported.
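
    The rule can be sketched as follows. This is a hypothetical,
    self-contained C++ model, not the code from this patch: it
    assumes LOCK_open can be represented by a lock that records
    whether its current owner is a thread doing deadlock detection.
    The class DetectorAwareLock and its methods are illustrative
    names only.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for LOCK_open that remembers whether its current
// owner is a thread performing deadlock detection.
class DetectorAwareLock {
 public:
  // Ordinary acquisition (code not doing deadlock detection):
  // waits until the lock is free.
  void lock() {
    std::unique_lock<std::mutex> guard(m_state_mutex);
    m_cond.wait(guard, [this] { return !m_held; });
    m_held = true;
    m_held_by_detector = false;
  }

  // Acquisition from inside the deadlock detector. Returns false, meaning
  // "report a deadlock", instead of waiting when the search depth is
  // greater than 0 and another deadlock-detecting thread owns the lock.
  bool lock_for_detector(unsigned search_depth) {
    std::unique_lock<std::mutex> guard(m_state_mutex);
    while (m_held) {
      if (search_depth > 0 && m_held_by_detector) return false;
      m_cond.wait(guard);
    }
    m_held = true;
    m_held_by_detector = true;
    // Wake the other waiters so that detectors at depth > 0 re-check
    // the new owner and give up instead of continuing to sleep.
    m_cond.notify_all();
    return true;
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> guard(m_state_mutex);
      m_held = false;
      m_held_by_detector = false;
    }
    m_cond.notify_all();
  }

 private:
  std::mutex m_state_mutex;       // protects the two flags below
  std::condition_variable m_cond;
  bool m_held = false;
  bool m_held_by_detector = false;
};
```

    In this sketch a false return from lock_for_detector() stands for
    reporting a deadlock. Because the detector gives up without
    proving that a real cycle exists, a deadlock may be reported
    where none exists.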
    
    Other possible solutions are either known to have negative
    effects on performance or require much more time for proper
    implementation and testing.
    
    No test case is provided, as this bug is very hard to reproduce
    in the MTR environment, but it is repeatable with the help of
    RQG tests.