    Backport of: · a3814e36
    Konstantin Osipov authored
    -----------------------------------------------------------
    2630.28.28 Magne Mahre  2008-12-05
    Bug #38661 'all threads hang in "opening tables" or "waiting for table"
                and cpu is at 100%'
                          
    Concurrent execution of a FLUSH TABLES statement and at least two
    statements using the same table could lead to a livelock in which all
    three connections stalled while consuming 100% of CPU.
            
    tdc_wait_for_old_versions() wrongly assumed that a share with an old
    version always has used TABLE instances, and so it failed to wait when
    such an old share was kept cached by the MDL subsystem because of a
    still-active metadata lock on the table. As a result, two or more
    connections simultaneously executing statements that involved the table
    being flushed could prevent each other from waiting in this function by
    keeping a shared metadata lock on the table constantly active (i.e. one
    of the statements took or held the lock while the others were calling
    tdc_wait_for_old_versions()). They thus forced each other to loop
    indefinitely in the open_tables() - close_thread_tables_for_reopen() -
    tdc_wait_for_old_versions() cycle, hogging the CPU.
            
    This patch fixes this problem by removing this false assumption from
    tdc_wait_for_old_versions().
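
The difference between the two wait conditions can be illustrated with a
small standalone model. This is only a sketch under stated assumptions, not
the server code: ShareModel, must_wait_old() and must_wait_fixed() are
hypothetical names invented for this illustration, and the real
tdc_wait_for_old_versions() works on the table definition cache rather than
on a plain vector.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for a cached table share.
// None of these names exist in the server source.
struct ShareModel {
    unsigned long version;  // version the share was created with
    int used_tables;        // number of TABLE instances currently in use
    bool mdl_cached;        // kept alive only by an active metadata lock
};

// Model of the pre-fix condition: an old share counts only if it also has
// used TABLE instances. An old share with no used instances is skipped,
// so the caller does not wait and goes back to reopening tables.
bool must_wait_old(const std::vector<ShareModel>& cache,
                   unsigned long current_version) {
    for (const ShareModel& s : cache) {
        if (s.version != current_version && s.used_tables > 0)
            return true;
    }
    return false;
}

// Model of the post-fix condition: any share with an old version forces a
// wait, whether or not it has used TABLE instances.
bool must_wait_fixed(const std::vector<ShareModel>& cache,
                     unsigned long current_version) {
    for (const ShareModel& s : cache) {
        if (s.version != current_version)
            return true;
    }
    return false;
}
```

With a cache holding one old share that has no used TABLE instances but is
still pinned by a metadata lock, must_wait_old() reports that no wait is
needed while must_wait_fixed() reports that one is. That is the case in
which the connections kept retrying instead of blocking.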
     
    Note that the problem is specific to server versions >= 6.0.
            
    No test case is submitted for this bug, as the test infrastructure
    lacks the primitives needed to test the behaviour. The bug manifests
    as throughput decreasing to a low level (possibly 0) after some time
    and staying at that level. Several transactions will not complete.
            
    Manual testing can be done by running the code submitted by Shane
    Bester and attached to the bug report. If the bug persists, the
    transaction throughput will almost immediately drop to near zero
    (shown by the transaction count printed by the test program staying
    at a nearly constant value instead of increasing rapidly).
sql_base.cc 276 KB