    [PATCH] shared thread signals · 6dfc8897
    Ingo Molnar authored
    Support POSIX-compliant thread signals at the kernel level, with usable
    debugging (broadcast SIGSTOP, SIGCONT) and thread group management
    (broadcast SIGKILL), and load-balance 'process' signals between
    threads for better signal performance.
    
    Changes:
    
    - POSIX thread semantics for signals
    
    there are 7 'types' of actions a signal can take: specific, load-balance,
    kill-all, kill-all+core, stop-all, continue-all and ignore. Depending on
    the POSIX specifications each signal has one of the types defined for both
    the 'handler defined' and the 'handler not defined (kernel default)' case.  
    Here is the table:
    
     ----------------------------------------------------------
     |                    |  userspace       |  kernel        |
     ----------------------------------------------------------
     |  SIGHUP            |  load-balance    |  kill-all      |
     |  SIGINT            |  load-balance    |  kill-all      |
     |  SIGQUIT           |  load-balance    |  kill-all+core |
     |  SIGILL            |  specific        |  kill-all+core |
     |  SIGTRAP           |  specific        |  kill-all+core |
     |  SIGABRT/SIGIOT    |  specific        |  kill-all+core |
     |  SIGBUS            |  specific        |  kill-all+core |
     |  SIGFPE            |  specific        |  kill-all+core |
     |  SIGKILL           |  n/a             |  kill-all      |
     |  SIGUSR1           |  load-balance    |  kill-all      |
     |  SIGSEGV           |  specific        |  kill-all+core |
     |  SIGUSR2           |  load-balance    |  kill-all      |
     |  SIGPIPE           |  specific        |  kill-all      |
     |  SIGALRM           |  load-balance    |  kill-all      |
     |  SIGTERM           |  load-balance    |  kill-all      |
     |  SIGCHLD           |  load-balance    |  ignore        |
     |  SIGCONT           |  load-balance    |  continue-all  |
     |  SIGSTOP           |  n/a             |  stop-all      |
     |  SIGTSTP           |  load-balance    |  stop-all      |
     |  SIGTTIN           |  load-balance    |  stop-all      |
     |  SIGTTOU           |  load-balance    |  stop-all      |
     |  SIGURG            |  load-balance    |  ignore        |
     |  SIGXCPU           |  specific        |  kill-all+core |
     |  SIGXFSZ           |  specific        |  kill-all+core |
     |  SIGVTALRM         |  load-balance    |  kill-all      |
     |  SIGPROF           |  specific        |  kill-all      |
     |  SIGPOLL/SIGIO     |  load-balance    |  kill-all      |
     |  SIGSYS/SIGUNUSED  |  specific        |  kill-all+core |
     |  SIGSTKFLT         |  specific        |  kill-all      |
     |  SIGWINCH          |  load-balance    |  ignore        |
     |  SIGPWR            |  load-balance    |  kill-all      |
     |  SIGRTMIN-SIGRTMAX |  load-balance    |  kill-all      |
     ----------------------------------------------------------
    
    as you can see from the table, signals that have handlers defined are
    never broadcast - they are either specific or load-balanced.
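
    The table can be read as two lookups keyed by signal number. Below is a
    minimal userspace sketch of that reading, not the patch's code: the enum,
    the ACT_NONE sentinel for the 'n/a' cells, and the function names
    user_action() and kernel_default_action() are all hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* The seven action types from the table, plus a sentinel for 'n/a'. */
enum sig_action {
    ACT_NONE,          /* hypothetical: signal cannot have a handler */
    ACT_SPECIFIC,
    ACT_LOAD_BALANCE,
    ACT_KILL_ALL,
    ACT_KILL_ALL_CORE,
    ACT_STOP_ALL,
    ACT_CONTINUE_ALL,
    ACT_IGNORE
};

/* 'userspace' column: action taken when a handler is defined. */
enum sig_action user_action(int sig)
{
    assert(sig >= 1);
    switch (sig) {
    case SIGKILL: case SIGSTOP:
        return ACT_NONE;               /* cannot be caught */
    case SIGILL: case SIGTRAP: case SIGABRT: case SIGBUS: case SIGFPE:
    case SIGSEGV: case SIGPIPE: case SIGXCPU: case SIGXFSZ: case SIGPROF:
    case SIGSYS:
#ifdef SIGSTKFLT
    case SIGSTKFLT:
#endif
        return ACT_SPECIFIC;           /* delivered to one given thread */
    default:
        return ACT_LOAD_BALANCE;       /* every other row, incl. realtime */
    }
}

/* 'kernel' column: action taken when no handler is defined. */
enum sig_action kernel_default_action(int sig)
{
    assert(sig >= 1);
    switch (sig) {
    case SIGQUIT: case SIGILL: case SIGTRAP: case SIGABRT: case SIGBUS:
    case SIGFPE: case SIGSEGV: case SIGXCPU: case SIGXFSZ: case SIGSYS:
        return ACT_KILL_ALL_CORE;
    case SIGCHLD: case SIGURG: case SIGWINCH:
        return ACT_IGNORE;
    case SIGCONT:
        return ACT_CONTINUE_ALL;
    case SIGSTOP: case SIGTSTP: case SIGTTIN: case SIGTTOU:
        return ACT_STOP_ALL;
    default:
        return ACT_KILL_ALL;           /* every other row, incl. realtime */
    }
}
```

    Note that user_action() never returns a broadcast type, which is the
    property stated above.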
    
    - CLONE_THREAD implies CLONE_SIGHAND
    
    It does not make much sense to have a thread group that does not share
    signal handlers. In fact in the patch i'm using the signal spinlock to
    lock access to the thread group. I made the siglock IRQ-safe, thus we can
    load-balance signals from interrupt contexts as well. (we cannot take the
    tasklist lock in write mode from IRQ handlers.)
    
    this is not as clean as i'd like it to be, but it's the best i could come
    up with so far.
    
    - thread group list management reworked.
    
    threads are now removed from the group if the thread is unhashed from the
    PID table. This makes the most sense. This also helps with another feature 
    that relies on an intact thread group list: multithreaded coredumps.
    
    - child reparenting reworked.
    
    the O(N) algorithm in forget_original_parent() causes massive performance
    problems if a large number of threads exit from the group. Performance 
    improves more than 10-fold if the following simple rules are followed 
    instead:
    
     - reparent children to the *previous* thread [exiting or not]
     - if a thread is detached then reparent to init.
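
    The two rules can be sketched as a constant-time choice of the new
    parent. This is an illustration only, under these assumptions: 'detached'
    refers to the exiting thread, the group is a circular list reachable via
    a 'prev' pointer, and a thread that is alone in its group falls back to
    init. struct reparent_thread and pick_reaper() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task in a circular thread-group list. */
struct reparent_thread {
    struct reparent_thread *prev;  /* previous thread in the group */
    int detached;                  /* nonzero if the thread is detached */
};

/*
 * Choose who inherits the children of 'exiting'. No walk over the
 * group is needed: the answer is either the previous thread or init.
 */
struct reparent_thread *pick_reaper(struct reparent_thread *exiting,
                                    struct reparent_thread *init_task)
{
    assert(exiting != NULL && init_task != NULL);

    if (exiting->detached)
        return init_task;          /* rule 2: detached -> init */

    if (exiting->prev == exiting)
        return init_task;          /* assumption: no other thread left */

    return exiting->prev;          /* rule 1: previous thread, exiting or not */
}
```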
    
    - fast broadcasting of kernel-internal SIGSTOP, SIGCONT, SIGKILL, etc.
    
    kernel-internal broadcasted signals are a potential DoS problem, since
    they might generate massive amounts of GFP_ATOMIC allocations of siginfo
    structures. The important thing to note is that the siginfo structure does
    not actually have to be allocated and queued - the signal processing code
    has all the information it needs, since none of these signals carries any
    information in the siginfo structure. This makes a broadcast SIGKILL a
    very simple operation: all threads get the bit 9 set in their pending
    bitmask. The speedup due to this was significant - and the robustness win
    is invaluable.
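
    A minimal sketch of the allocation-free broadcast, assuming a 64-bit
    pending mask per thread, a circular group list, and the usual layout
    where signal N occupies bit position N-1. struct bcast_thread,
    sig_mask() and broadcast_signal() are hypothetical names, not the
    kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SIGKILL 9           /* signal number of SIGKILL on Linux */

/* Hypothetical stand-in for a thread with a pending-signal bitmask. */
struct bcast_thread {
    uint64_t pending;              /* one bit per pending signal */
    struct bcast_thread *next;     /* next thread in the circular group */
};

/* Bit for signal 'sig'; assumes signal N is stored at position N-1. */
uint64_t sig_mask(int sig)
{
    assert(sig >= 1 && sig <= 64);
    return (uint64_t)1 << (sig - 1);
}

/*
 * Mark 'sig' pending on every thread in the group. Nothing is
 * allocated or queued; the pending bit is the whole message.
 * Returns the number of threads visited.
 */
size_t broadcast_signal(struct bcast_thread *leader, int sig)
{
    struct bcast_thread *t = leader;
    size_t visited = 0;

    assert(leader != NULL);
    do {
        t->pending |= sig_mask(sig);
        visited++;
        t = t->next;
    } while (t != leader);

    return visited;
}
```

    Because the operation is a single OR per thread, repeating the broadcast
    costs no extra memory, which is the robustness point made above.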
    
    - sys_execve() should not kill off 'all other' threads.
    
    the 'exec kills all threads if the master thread does the exec()' rule is
    a POSIX(-ish) thing that should not be hardcoded in the kernel in this
    case.
    
    to handle POSIX exec() semantics, glibc uses a special syscall, which
    kills 'all but self' threads: sys_exit_allbutself().
    
    the straightforward exec() implementation just calls sys_exit_allbutself()  
    and then sys_execve().
    
    (this syscall is also used internally if the thread group leader
    thread sys_exit()s or sys_exec()s, to ensure the integrity of the thread
    group.)