Yet another attempt at fixing random failures in test case main.myisam-metadata
I think I finally found the problem. I managed to reproduce it locally by adding a sleep to the test case, simulating the particular race condition that makes the test fail often in Buildbot.

The test starts an ALTER TABLE that does repair by sort in one thread. Another thread waits for the sort to become visible in SHOW PROCESSLIST and then runs a SHOW statement in parallel. The problem happens when the sort runs to completion before the other thread has had time to look at SHOW PROCESSLIST. In that case the wait times out, because the state it is looking for has already passed.

Earlier I added some DEBUG_SYNC to prevent this race, but it turns out that DEBUG_SYNC itself changes the state shown in the processlist. So when the debug sync point was hit, the processlist no longer showed the expected state, and the wait would still time out.

Fixed now by waiting for the processlist to contain either the "Repair by sorting" state or the debug sync wait stage.

Also clean up previous attempts to fix it: set the wait timeout back to a reasonable 60 seconds, and simplify the DEBUG_SYNC operations to work closer to how the original test case was intended.
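To illustrate the idea behind the fix, here is a minimal Python sketch of a polling wait that accepts either of two states. It is not the actual test code: the helper names (`wait_for_state`, `fake_processlist`) and the exact text of the debug sync state string are assumptions made for illustration only.

```python
import time

REPAIR = "Repair by sorting"
# Assumed wording for the state shown while a thread waits at a sync point;
# the sync point name is hypothetical.
SYNC_WAIT = "debug sync point: hypothetical_sync_point"


def wait_for_state(get_state, accepted, timeout=60.0, interval=0.1):
    """Poll get_state() until it returns a state in `accepted`.

    Returns True if an accepted state was seen, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_state() in accepted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def fake_processlist(states):
    """Return a callable that yields each state once, then repeats the last one."""
    it = iter(states)
    last = [None]

    def get_state():
        last[0] = next(it, last[0])
        return last[0]

    return get_state


# The worker is parked at the sync point, so "Repair by sorting" is never shown.
old_wait = wait_for_state(fake_processlist(["init", SYNC_WAIT]),
                          {REPAIR}, timeout=0.05, interval=0.01)
new_wait = wait_for_state(fake_processlist(["init", SYNC_WAIT]),
                          {REPAIR, SYNC_WAIT}, timeout=0.05, interval=0.01)
```

In this sketch `old_wait` models the earlier behaviour, where the wait only matches the repair state and times out, while `new_wait` models the fixed wait, which also matches the sync point state.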