    [PATCH] fix stop signal race · 0b4eff5d
    Roland McGrath authored
    The `sig_avoid_stop_race' checks fail to catch a related race scenario that
    can happen.  I don't think this has been seen in nature, but it could
    happen in the same sorts of situations that cause the observed problems
    those checks work around.  This patch takes a different approach to
    catching this race condition.  The new approach plugs the hole, and I think
    it is also cleaner.
    
    The issue is a race between one CPU processing a stop signal while another
    CPU processes a SIGCONT or SIGKILL.  There is a window in stop-signal
    processing where the siglock must be released.  If a SIGCONT or SIGKILL
    comes along here on another CPU, then the stop signal in the midst of being
    processed needs to be discarded rather than having the stop take place
    after the SIGCONT or SIGKILL has been generated.  The existing workaround
    checks for this case explicitly by looking for a pending SIGCONT or SIGKILL
    after reacquiring the lock.
    
    However, there is another problem related to the same race issue.  In the
    window where the processing of the stop signal has released the siglock,
    the stop signal is not represented in the pending set any more, but it is
    still "pending" and not "delivered" in POSIX terms.  The SIGCONT coming in
    this window is required to clear all pending stop signals.  But, if a stop
    signal has been dequeued but not yet processed, the SIGCONT generation will
    fail to clear it (in handle_stop_signal).  Likewise, a SIGKILL coming here
    should prevent the stop processing and make the thread die immediately
    instead.  The `sig_avoid_stop_race' code checks for this by examining the
    pending set to see if SIGCONT or SIGKILL is in it.  But this fails to
    handle the case where another CPU running another thread in the same
    process has already dequeued the signal (so it no longer can be found in
    the pending set).  We must catch this as well, so that the same problems do
    not arise when another thread on another CPU acted real fast.
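
    To make the hole concrete, here is a minimal userspace sketch of a
    pending-set check in the spirit of the workaround described above.  It is
    not the kernel code: `toy_pending' and the `toy_*' functions are
    hypothetical names, and the pending set is modelled as a plain bitmask.
    It only shows that once some other thread has dequeued the SIGCONT, a
    check that looks in the pending set can no longer see it.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical helper: one bit per signal number in a toy pending set. */
#define TOY_MASK(sig) (1ULL << (sig))

/* Toy stand-in for a shared pending set; not a kernel structure. */
struct toy_pending {
    unsigned long long set;
};

/* Generate a signal: mark it pending. */
static void toy_generate(struct toy_pending *p, int sig)
{
    p->set |= TOY_MASK(sig);
}

/* Dequeue a signal: remove it from the pending set if it is there. */
static bool toy_dequeue(struct toy_pending *p, int sig)
{
    if (!(p->set & TOY_MASK(sig)))
        return false;
    p->set &= ~TOY_MASK(sig);
    return true;
}

/*
 * Assumed shape of the old check: after reacquiring the lock, abandon the
 * stop only if SIGCONT or SIGKILL is still visible in the pending set.
 */
static bool toy_old_check_abandons_stop(const struct toy_pending *p)
{
    return (p->set & (TOY_MASK(SIGCONT) | TOY_MASK(SIGKILL))) != 0;
}
```

    In this model, calling toy_generate() with SIGCONT makes the check
    report that the stop must be abandoned.  If another thread then calls
    toy_dequeue() for that SIGCONT before the check runs, the check reports
    nothing, and the stale stop would go ahead.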
    
    I've fixed this by dumping the `sig_avoid_stop_race' kludge in favor of a
    little explicit bookkeeping.  Now, dequeuing any stop signal sets a flag
    saying that a pending stop signal has been taken on by some CPU since the
    last time all pending stop signals were cleared due to SIGCONT/SIGKILL. 
    The processing of stop signals checks the flag after the window where it
    released the lock, and abandons the signal if the flag has been cleared.  The
    code that clears pending stop signals on SIGCONT generation also clears
    this flag.  The various places that are trying to ensure the process dies
    quickly (SIGKILL or other unhandled signals) also clear the flag.  I've
    made this a general flags word in signal_struct, and replaced the
    stop_state field with flag bits in this word.
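
    The bookkeeping can be sketched in userspace roughly as follows.  This is
    an illustration under stated assumptions, not the patch itself: the flag
    names, `toy_signal', and every `toy_*' function are hypothetical, the
    siglock is modelled as a boolean, and the other CPU is modelled as a
    callback that runs inside the unlocked window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the names are illustrative, not from the patch. */
#define TOY_STOP_DEQUEUED 0x1u /* a stop was dequeued since the last clear */
#define TOY_GROUP_EXIT    0x2u /* the process is being killed */

/* Toy stand-in for the shared per-process signal state. */
struct toy_signal {
    unsigned int flags;
    bool pending_stop; /* a stop signal sits in the pending set */
    bool siglock_held; /* models the siglock, for sanity checks only */
};

static void toy_lock(struct toy_signal *s)
{
    assert(!s->siglock_held);
    s->siglock_held = true;
}

static void toy_unlock(struct toy_signal *s)
{
    assert(s->siglock_held);
    s->siglock_held = false;
}

/* Dequeuing a stop signal sets the flag. Caller holds the lock. */
static bool toy_dequeue_stop(struct toy_signal *s)
{
    assert(s->siglock_held);
    if (!s->pending_stop)
        return false;
    s->pending_stop = false;
    s->flags |= TOY_STOP_DEQUEUED;
    return true;
}

/* SIGCONT generation clears pending stops and the dequeued flag. */
static void toy_generate_sigcont(struct toy_signal *s)
{
    toy_lock(s);
    s->pending_stop = false;
    s->flags &= ~TOY_STOP_DEQUEUED;
    toy_unlock(s);
}

/* SIGKILL generation clears the same state and marks the group exit. */
static void toy_generate_sigkill(struct toy_signal *s)
{
    toy_lock(s);
    s->pending_stop = false;
    s->flags &= ~TOY_STOP_DEQUEUED;
    s->flags |= TOY_GROUP_EXIT;
    toy_unlock(s);
}

typedef void (*toy_other_cpu)(struct toy_signal *);

/*
 * Stop processing with an unlocked window in the middle. Returns true if
 * the stop should take place, false if it was abandoned or none was pending.
 */
static bool toy_process_stop(struct toy_signal *s, toy_other_cpu other)
{
    bool do_stop;

    toy_lock(s);
    if (!toy_dequeue_stop(s)) {
        toy_unlock(s);
        return false;
    }
    toy_unlock(s); /* the window where the lock must be released */

    if (other != NULL)
        other(s); /* another CPU generates SIGCONT or SIGKILL here */

    toy_lock(s);
    do_stop = (s->flags & TOY_STOP_DEQUEUED) != 0; /* recheck the flag */
    toy_unlock(s);
    return do_stop;
}
```

    With a pending stop and no interference, toy_process_stop() reports that
    the stop goes ahead.  Passing toy_generate_sigcont or toy_generate_sigkill
    as the callback clears the flag inside the window, so the stop is
    abandoned even though nothing is left in the pending set to find.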
    Signed-off-by: Roland McGrath <roland@redhat.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
exit.c 38.3 KB