    [PATCH] sys_exit_group(), threading, 2.5.34 · b62bf732
    Ingo Molnar authored
    This is another step towards better threading support under Linux: it
    implements the sys_exit_group() system call.
    
    It's a straightforward extension of the generic 'thread group' concept,
    and the extension also comes in handy for solving a number of problems
    in implementing POSIX threads.
    
    POSIX exit() [the C library function] has the following semantics: all
    threads have to exit and the waiting parent has to get the exit code that
    was specified for the exit() function.  It also has to be ensured that
    every thread has truly finished its work by the time the parent gets the
    notification.  The exit code has to be propagated properly to the parent
    even if a thread other than the thread group leader calls exit().
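
    The semantics above can be observed from userspace. The following is a
    minimal sketch, not code from the patch: the helper names
    worker_calls_exit_group() and exit_code_seen_by_parent() are made up for
    illustration, and the example assumes Linux with a glibc that exposes
    SYS_exit_group. A non-leader thread invokes the syscall directly while
    the group leader sleeps, and the parent still receives the exit code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in a secondary (non-leader) thread and ends the whole thread group. */
static void *worker_calls_exit_group(void *arg)
{
    int code = *(int *)arg;

    syscall(SYS_exit_group, code);
    return NULL; /* not reached */
}

/*
 * Hypothetical helper: fork a child whose group leader never exits on its
 * own, let a worker thread call exit_group(code), and return the exit
 * status the parent observes (or -1 on error).
 */
static int exit_code_seen_by_parent(int code)
{
    int status;
    pid_t pid;

    assert(code >= 0 && code <= 255); /* only 8 bits reach the parent */

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pthread_t tid;

        if (pthread_create(&tid, NULL, worker_calls_exit_group, &code) != 0)
            _exit(127);
        for (;;)
            pause(); /* the leader itself never calls exit() */
    }

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Calling exit_code_seen_by_parent(42) should give the parent the value
    the worker passed in, even though the leader was still blocked in
    pause() when the group went away.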
    
    Normal single-thread exit is done via the pthread_exit() function, which
    calls sys_exit().
    
    Previous Linux POSIX threads implementations chose the following
    solution: send a 'thread management' signal to the thread group leader
    via tkill(); the leader then goes around and kills every thread in the
    group (except itself), and finally calls sys_exit() with the proper
    exit code.  Both old libpthreads and NGPT use this solution.
    
    This works to a certain degree, unless a userspace threading library
    uses the initial thread for normal thread work [like the new
    libpthreads], since that work can cause the initial thread to exit
    prematurely.
    
    At this point the threading library has to catch the group leader in
    pthread_exit() and has to keep the management thread 'hanging around'
    artificially, waiting for the management signal. Besides being slightly
    confusing to users ('why is this thread still around?'), even this
    variant is not robust: if the initial thread is killed by the kernel
    (SIGSEGV or any other thread-specific event that triggers do_exit())
    then the thread goes away without the thread library having a chance to
    intervene.
    
    The sys_exit_group() syscall implements the mechanism within the kernel,
    which, besides being more robust, is also *much* faster. Instead of the
    threading library having to tkill() every thread available, the kernel
    can use the already existing 'broadcast signal' capability. (The
    threading library cannot use broadcast signals because that would kill
    the initial thread as well.)
    
    As a side-effect of the completion mechanism used by sys_exit_group() it
    was also possible to make the initial thread hang around as a zombie
    until every other thread in the group has exited. A 'Z' state thread is
    much easier for users to understand - it's around because it has to wait
    for all other threads to exit first.
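
    The zombie leader can be seen through procfs. The sketch below is an
    illustration under assumptions, not part of the patch: proc_state() and
    leader_state_after_pthread_exit() are hypothetical helpers, the example
    assumes /proc is mounted, and it relies on the state character being the
    field right after the parenthesised command name in /proc/<pid>/stat.
    The leader calls pthread_exit() while a worker thread stays alive for
    about a second, and the parent samples the leader's state in between.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker that outlives the leader for a while, then ends the group. */
static void *late_worker(void *arg)
{
    (void)arg;
    sleep(1);
    syscall(SYS_exit_group, 0);
    return NULL; /* not reached */
}

/* Return the state character from /proc/<pid>/stat, or '?' on failure. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    size_t n;
    char *p;
    FILE *f;

    assert(pid > 0);
    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return '?';
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The command name may contain ')', so search from the end. */
    p = strrchr(buf, ')');
    return (p && p[1] == ' ' && p[2] != '\0') ? p[2] : '?';
}

/* Hypothetical helper: sample the leader's state after it has exited. */
static char leader_state_after_pthread_exit(void)
{
    char state;
    pid_t pid = fork();

    if (pid < 0)
        return '?';

    if (pid == 0) {
        pthread_t tid;

        if (pthread_create(&tid, NULL, late_worker, NULL) != 0)
            _exit(127);
        pthread_exit(NULL); /* leader exits, the worker lives on */
    }

    usleep(300 * 1000); /* give the leader time to exit */
    state = proc_state(pid);
    waitpid(pid, NULL, 0); /* reap once the whole group is gone */
    return state;
}
```

    On a kernel that keeps the exited leader around as described above, the
    sampled state is expected to be 'Z'. The 300 ms delay is an arbitrary
    choice for the sketch, not a guarantee about scheduling.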
    
    And because the initial thread is now guaranteed to hang around, there
    are three advantages:
    
     - signals sent to the thread group via sys_kill() work again. Previously
       if the initial thread exited then all subsequent sys_kill() calls to
       the group PID failed with a -ESRCH.
    
     - the get_pid() function got faster: it does not have to check for tgid
       collision anymore.
    
     - procps has an easier job displaying threaded applications - since the
       thread group leader is always around, no thread group can 'hide' from
       procps just because the thread group leader has exited.
    
     [ - NOTE: the same mechanism can/will also be used by the upcoming
         threaded-coredumps patch. ]
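
    The first point in the list above hinges on whether the group PID is
    still present when kill() looks it up. The following sketch shows the
    same lookup behaviour with an ordinary child process rather than a
    thread group leader, which is a deliberate simplification: a zombie PID
    can still be signalled, while a fully released PID yields ESRCH. The
    names probe_pid(), struct probe_result and probe_zombie_then_reaped()
    are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Probe a PID with signal 0: returns 0 if it can be signalled, else errno. */
static int probe_pid(pid_t pid)
{
    assert(pid > 0); /* pid <= 0 would address process groups instead */
    return kill(pid, 0) == 0 ? 0 : errno;
}

struct probe_result {
    int while_zombie; /* result while the child is an unreaped zombie */
    int after_reap;   /* result after the parent has reaped it */
};

/* Hypothetical helper: probe a child PID before and after it is reaped. */
static struct probe_result probe_zombie_then_reaped(void)
{
    struct probe_result r = { -1, -1 };
    siginfo_t info;
    pid_t pid = fork();

    if (pid < 0)
        return r;
    if (pid == 0)
        _exit(0);

    /* WNOWAIT blocks until the child has exited but leaves it a zombie. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return r;
    r.while_zombie = probe_pid(pid);

    waitpid(pid, NULL, 0); /* now release the PID */
    r.after_reap = probe_pid(pid);
    return r;
}
```

    Barring immediate PID reuse, while_zombie should be 0 and after_reap
    should be ESRCH, which mirrors why a leader that stays around as a
    zombie keeps sys_kill() on the group PID working.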
    
    There's also another (small) advantage for threading libraries: e.g. the
    new libpthreads does not even have any notion of 'group of threads'
    anymore - it does not maintain any global list of threads. Via this
    syscall it can rely purely on the kernel to manage thread groups.
    
    The patch itself makes some internal changes to the way a thread exits:
    now the unhashing of the PID and the signal-freeing are done atomically.
    This is needed to make sure the thread group leader unhashes itself
    precisely when the last thread group member has exited.
    
    (The sys_exit_group() syscall has been used by glibc's new libpthreads
    code for the past couple of weeks and the concept is working just fine.)