    Bug#37780: Make KILL reliable (main.kill fails randomly) · ae6801eb
    Davi Arnaut authored
    - A prerequisite cleanup patch for making KILL reliable.
    
    The test case main.kill did not work reliably.
    
    The following problems have been identified:
    
    1. A kill signal could get lost if it arrived shortly before a
    thread started reading from the client connection.

    2. A kill signal could get lost if it arrived shortly before a
    thread started waiting on a condition variable.
    
    These problems have been addressed as follows. See also the
    added code comments for more details.
    
    1. There is no safe way to detect when a thread enters the
    blocking state of a read(2) or recv(2) system call, in which
    it can be interrupted by a signal. Hence it is not possible to
    wait for the right moment to send a kill signal. It was
    decided not to fix this in the code. Instead, the test case
    repeats the KILL statement until the connection terminates.
    
    2. Before waiting on a condition variable, we register it
    together with its synchronizing mutex in THD::mysys_var. After
    this, we need to test THD::killed again. At some places we only
    tested it in a loop condition before the registration. When
    THD::killed was set between this test and the registration,
    we started waiting without noticing the killed flag. Additional
    checks have been introduced where required.
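    The register-then-recheck rule from item 2 can be sketched in C
    with POSIX threads. This is a minimal sketch under stated
    assumptions, not the server's code: `struct session`,
    `kill_session()`, `wait_for_data()` and `run_scenario()` are
    hypothetical stand-ins for THD, THD::mysys_var and the KILL
    path, and the awaited data never arrives, so only a kill can end
    the wait. The essential point is that the killed flag is tested
    in the wait loop after the condition variable and its mutex have
    been registered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for THD and THD::mysys_var. */
struct session {
  pthread_mutex_t lock;           /* guards killed and the registration */
  bool killed;                    /* the kill flag */
  pthread_mutex_t *current_mutex; /* registered mutex, or NULL */
  pthread_cond_t *current_cond;   /* registered condition, or NULL */
};

/* Data the victim waits for; it never becomes ready in this sketch. */
static pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_cond = PTHREAD_COND_INITIALIZER;
static bool data_ready = false;

static bool is_killed(struct session *s) {
  pthread_mutex_lock(&s->lock);
  bool k = s->killed;
  pthread_mutex_unlock(&s->lock);
  return k;
}

/* Killer: set the flag, then wake whatever the victim has registered. */
static void kill_session(struct session *s) {
  pthread_mutex_lock(&s->lock);
  s->killed = true;
  pthread_mutex_t *m = s->current_mutex;
  pthread_cond_t *c = s->current_cond;
  pthread_mutex_unlock(&s->lock); /* never hold both locks here */
  if (m != NULL) {
    /* The victim registered while holding m, so m is free only while it
       sleeps in pthread_cond_wait() or after it has left the loop; the
       broadcast therefore cannot be lost. */
    pthread_mutex_lock(m);
    pthread_cond_broadcast(c);
    pthread_mutex_unlock(m);
  }
}

/* Victim: returns true if the wait ended because of a kill. */
static bool wait_for_data(struct session *s) {
  pthread_mutex_lock(&data_mutex);

  pthread_mutex_lock(&s->lock); /* register cond + mutex */
  assert(s->current_cond == NULL);
  s->current_mutex = &data_mutex;
  s->current_cond = &data_cond;
  pthread_mutex_unlock(&s->lock);

  /* The flag is tested here, AFTER registration. A kill that came in
     before the registration sent no broadcast; only this check sees it. */
  while (!data_ready && !is_killed(s))
    pthread_cond_wait(&data_cond, &data_mutex);
  bool killed = is_killed(s);

  pthread_mutex_lock(&s->lock); /* unregister */
  s->current_mutex = NULL;
  s->current_cond = NULL;
  pthread_mutex_unlock(&s->lock);
  pthread_mutex_unlock(&data_mutex);
  return killed;
}

struct victim {
  struct session sess;
  bool was_killed;
};

static void *victim_main(void *arg) {
  struct victim *v = arg;
  v->was_killed = wait_for_data(&v->sess);
  return NULL;
}

/* kill_first == true reproduces the window from item 2: the kill lands
   before the victim has registered anything. */
static bool run_scenario(bool kill_first) {
  struct victim v = { .was_killed = false };
  pthread_mutex_init(&v.sess.lock, NULL);
  v.sess.killed = false;
  v.sess.current_mutex = NULL;
  v.sess.current_cond = NULL;

  pthread_t t;
  if (kill_first)
    kill_session(&v.sess);
  if (pthread_create(&t, NULL, victim_main, &v) != 0) {
    pthread_mutex_destroy(&v.sess.lock);
    return false;
  }
  if (!kill_first)
    kill_session(&v.sess); /* may land before or after registration */
  pthread_join(t, NULL);
  pthread_mutex_destroy(&v.sess.lock);
  return v.was_killed;
}
```

    `run_scenario(true)` sends the kill before the victim registers
    anything, which is exactly the window described in item 2; it
    returns `true` only because the flag is re-checked after the
    registration. The victim always takes the data mutex before the
    session lock, and the killer releases the session lock before
    taking the registered mutex, so the sketch has no lock-order
    cycle.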
    
    In addition to the above, the main.kill test case has been
    rewritten. All sleeps have been replaced by Debug Sync Facility
    synchronization, and a couple of sync points have been added to
    the server code.
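    Why a sync point beats a sleep can be illustrated with a small C
    sketch. It is loosely modeled on the idea of one thread emitting
    a named signal at a sync point while another waits for it; the
    functions `sync_signal()` and `sync_wait_for()`, the signal
    names, and the event log are hypothetical and are not the Debug
    Sync Facility's interface or implementation. The driver orders
    "victim reaches the sync point", "KILL is sent" and "victim
    resumes" deterministically, where a sleep only makes that order
    likely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* HYPOTHETICAL one-slot signal board: sync_signal() publishes a name,
   sync_wait_for() blocks until that name is the current signal. */
static pthread_mutex_t sync_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sync_cond = PTHREAD_COND_INITIALIZER;
static char current_signal[64];

static void sync_signal(const char *name) {
  assert(strlen(name) < sizeof current_signal);
  pthread_mutex_lock(&sync_mutex);
  strncpy(current_signal, name, sizeof current_signal - 1);
  current_signal[sizeof current_signal - 1] = '\0';
  pthread_cond_broadcast(&sync_cond);
  pthread_mutex_unlock(&sync_mutex);
}

static void sync_wait_for(const char *name) {
  pthread_mutex_lock(&sync_mutex);
  while (strcmp(current_signal, name) != 0)
    pthread_cond_wait(&sync_cond, &sync_mutex);
  pthread_mutex_unlock(&sync_mutex);
}

/* Event log used to check the resulting order. */
enum { EV_AT_SYNC_POINT = 1, EV_KILL_SENT, EV_RESUMED };
static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;
static int events[3];
static int n_events;

static void log_event(int e) {
  pthread_mutex_lock(&log_mutex);
  if (n_events < 3)
    events[n_events++] = e;
  pthread_mutex_unlock(&log_mutex);
}

/* Victim: reaches a sync point, signals, then waits to be released. */
static void *victim(void *arg) {
  (void)arg;
  log_event(EV_AT_SYNC_POINT);
  sync_signal("victim_ready");
  sync_wait_for("kill_sent");
  log_event(EV_RESUMED);
  return NULL;
}

/* Test driver: waits for the sync point instead of sleeping. */
static bool run_ordered_kill(void) {
  pthread_mutex_lock(&sync_mutex);
  current_signal[0] = '\0';
  pthread_mutex_unlock(&sync_mutex);
  pthread_mutex_lock(&log_mutex);
  n_events = 0;
  pthread_mutex_unlock(&log_mutex);

  pthread_t t;
  if (pthread_create(&t, NULL, victim, NULL) != 0)
    return false;
  sync_wait_for("victim_ready"); /* replaces a sleep */
  log_event(EV_KILL_SENT);       /* stand-in for issuing KILL */
  sync_signal("kill_sent");
  pthread_join(t, NULL);
  return n_events == 3 && events[0] == EV_AT_SYNC_POINT &&
         events[1] == EV_KILL_SENT && events[2] == EV_RESUMED;
}
```

    Because the driver blocks until `victim_ready` has actually been
    signalled, `run_ordered_kill()` returns `true` on every run
    regardless of scheduling. A single signal slot suffices here
    because each signal is consumed before the next one is sent.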
    
    To avoid further problems in case the test case still fails
    despite these fixes, it has been added to the "experimental"
    list for now.
    
    - Most of the work on this patch is authored by Ingo Struewing