Commit 6dfc8897 authored by Ingo Molnar, committed by Linus Torvalds

[PATCH] shared thread signals

Support POSIX-compliant thread signals at the kernel level, with usable
debugging (broadcast SIGSTOP, SIGCONT) and thread group management
(broadcast SIGKILL), and load-balance 'process' signals between
threads for better signal performance.

Changes:

- POSIX thread semantics for signals

there are 7 'types' of actions a signal can take: specific, load-balance,
kill-all, kill-all+core, stop-all, continue-all and ignore. Per the POSIX
specification, each signal has one of these types defined for both the
'handler defined' and the 'handler not defined (kernel default)' case.
Here is the table:

 ----------------------------------------------------------
 |                    |  userspace       |  kernel        |
 ----------------------------------------------------------
 |  SIGHUP            |  load-balance    |  kill-all      |
 |  SIGINT            |  load-balance    |  kill-all      |
 |  SIGQUIT           |  load-balance    |  kill-all+core |
 |  SIGILL            |  specific        |  kill-all+core |
 |  SIGTRAP           |  specific        |  kill-all+core |
 |  SIGABRT/SIGIOT    |  specific        |  kill-all+core |
 |  SIGBUS            |  specific        |  kill-all+core |
 |  SIGFPE            |  specific        |  kill-all+core |
 |  SIGKILL           |  n/a             |  kill-all      |
 |  SIGUSR1           |  load-balance    |  kill-all      |
 |  SIGSEGV           |  specific        |  kill-all+core |
 |  SIGUSR2           |  load-balance    |  kill-all      |
 |  SIGPIPE           |  specific        |  kill-all      |
 |  SIGALRM           |  load-balance    |  kill-all      |
 |  SIGTERM           |  load-balance    |  kill-all      |
 |  SIGCHLD           |  load-balance    |  ignore        |
 |  SIGCONT           |  load-balance    |  continue-all  |
 |  SIGSTOP           |  n/a             |  stop-all      |
 |  SIGTSTP           |  load-balance    |  stop-all      |
 |  SIGTTIN           |  load-balance    |  stop-all      |
 |  SIGTTOU           |  load-balance    |  stop-all      |
 |  SIGURG            |  load-balance    |  ignore        |
 |  SIGXCPU           |  specific        |  kill-all+core |
 |  SIGXFSZ           |  specific        |  kill-all+core |
 |  SIGVTALRM         |  load-balance    |  kill-all      |
 |  SIGPROF           |  specific        |  kill-all      |
 |  SIGPOLL/SIGIO     |  load-balance    |  kill-all      |
 |  SIGSYS/SIGUNUSED  |  specific        |  kill-all+core |
 |  SIGSTKFLT         |  specific        |  kill-all      |
 |  SIGWINCH          |  load-balance    |  ignore        |
 |  SIGPWR            |  load-balance    |  kill-all      |
 |  SIGRTMIN-SIGRTMAX |  load-balance    |  kill-all      |
 ----------------------------------------------------------

as you can see from the table, signals that have handlers defined never
get broadcast - they are either specific or load-balanced.

- CLONE_THREAD implies CLONE_SIGHAND

It does not make much sense to have a thread group that does not share
signal handlers. In fact, in the patch I'm using the signal spinlock to
lock access to the thread group. I made the siglock IRQ-safe, so we can
load-balance signals from interrupt contexts as well. (We cannot take the
tasklist lock in write mode from IRQ handlers.)

This is not as clean as I'd like it to be, but it's the best I could come
up with so far.

- thread group list management reworked.

Threads are now removed from the group when the thread is unhashed from the
PID table. This makes the most sense, and it also helps another feature
that relies on an intact thread group list: multithreaded coredumps.

- child reparenting reworked.

The O(N) algorithm in forget_original_parent() causes massive performance
problems if a large number of threads exit from the group. Performance
improves more than 10-fold if the following simple rules are used
instead:

 - reparent children to the *previous* thread [exiting or not]
 - if a thread is detached then reparent to init.

- fast broadcasting of kernel-internal SIGSTOP, SIGCONT, SIGKILL, etc.

Kernel-internal broadcast signals are a potential DoS problem, since
they could generate massive numbers of GFP_ATOMIC allocations of siginfo
structures. The important observation is that the siginfo structure does
not actually have to be allocated and queued: the signal processing code
has all the information it needs, and none of these signals carries any
payload in the siginfo structure. This makes a broadcast SIGKILL a
very simple operation: every thread gets bit 9 (SIGKILL) set in its
pending bitmask. The speedup due to this was significant - and the
robustness win is invaluable.

- sys_execve() should not kill off 'all other' threads.

The 'exec kills all threads if the master thread does the exec()' rule is a
POSIX(-ish) thing that should not be hardcoded in the kernel in this case.

To handle POSIX exec() semantics, glibc uses a special syscall that
kills 'all but self' threads: sys_exit_allbutself().

The straightforward exec() implementation just calls sys_exit_allbutself()
and then sys_execve().

(This syscall is also used internally when the thread group leader
sys_exit()s or sys_execve()s, to ensure the integrity of the thread
group.)
parent 36780249
......@@ -504,6 +504,8 @@ static inline int make_private_signals(void)
{
struct signal_struct * newsig;
remove_thread_group(current, current->sig);
if (atomic_read(&current->sig->count) <= 1)
return 0;
newsig = kmem_cache_alloc(sigact_cachep, GFP_KERNEL);
......@@ -575,42 +577,10 @@ static inline void flush_old_files(struct files_struct * files)
*/
static void de_thread(struct task_struct *tsk)
{
struct task_struct *sub;
struct list_head *head, *ptr;
struct siginfo info;
int pause;
write_lock_irq(&tasklist_lock);
if (tsk->tgid != tsk->pid) {
/* subsidiary thread - just escapes the group */
list_del_init(&tsk->thread_group);
tsk->tgid = tsk->pid;
pause = 0;
}
else {
/* master thread - kill all subsidiary threads */
info.si_signo = SIGKILL;
info.si_errno = 0;
info.si_code = SI_DETHREAD;
info.si_pid = current->pid;
info.si_uid = current->uid;
head = tsk->thread_group.next;
list_del_init(&tsk->thread_group);
list_for_each(ptr,head) {
sub = list_entry(ptr,struct task_struct,thread_group);
send_sig_info(SIGKILL,&info,sub);
}
pause = 1;
}
write_unlock_irq(&tasklist_lock);
/* give the subsidiary threads a chance to clean themselves up */
if (pause) yield();
if (!list_empty(&tsk->thread_group))
BUG();
/* An exec() starts a new thread group: */
tsk->tgid = tsk->pid;
}
int flush_old_exec(struct linux_binprm * bprm)
......@@ -633,6 +603,8 @@ int flush_old_exec(struct linux_binprm * bprm)
if (retval) goto mmap_failed;
/* This is the point of no return */
de_thread(current);
release_old_signals(oldsig);
current->sas_ss_sp = current->sas_ss_size = 0;
......@@ -651,9 +623,6 @@ int flush_old_exec(struct linux_binprm * bprm)
flush_thread();
if (!list_empty(&current->thread_group))
de_thread(current);
if (bprm->e_uid != current->euid || bprm->e_gid != current->egid ||
permission(bprm->file->f_dentry->d_inode,MAY_READ))
current->mm->dumpable = 0;
......
......@@ -158,6 +158,8 @@ typedef struct {
#define rwlock_init(x) do { *(x) = RW_LOCK_UNLOCKED; } while(0)
#define rwlock_is_locked(x) ((x)->lock != RW_LOCK_BIAS)
/*
* On x86, we implement read-write locks as a 32-bit counter
* with the high bit (sign) being the "contended" bit.
......
......@@ -211,6 +211,11 @@ struct signal_struct {
atomic_t count;
struct k_sigaction action[_NSIG];
spinlock_t siglock;
/* current thread group signal load-balancing target: */
task_t *curr_target;
struct sigpending shared_pending;
};
/*
......@@ -356,7 +361,7 @@ struct task_struct {
spinlock_t sigmask_lock; /* Protects signal and blocked */
struct signal_struct *sig;
sigset_t blocked;
sigset_t blocked, real_blocked, shared_unblocked;
struct sigpending pending;
unsigned long sas_ss_sp;
......@@ -431,6 +436,7 @@ extern void set_cpus_allowed(task_t *p, unsigned long new_mask);
extern void set_user_nice(task_t *p, long nice);
extern int task_prio(task_t *p);
extern int task_nice(task_t *p);
extern int task_curr(task_t *p);
extern int idle_cpu(int cpu);
void yield(void);
......@@ -535,7 +541,7 @@ extern void proc_caches_init(void);
extern void flush_signals(struct task_struct *);
extern void flush_signal_handlers(struct task_struct *);
extern void sig_exit(int, int, struct siginfo *);
extern int dequeue_signal(sigset_t *, siginfo_t *);
extern int dequeue_signal(struct sigpending *pending, sigset_t *mask, siginfo_t *info);
extern void block_all_signals(int (*notifier)(void *priv), void *priv,
sigset_t *mask);
extern void unblock_all_signals(void);
......@@ -654,6 +660,7 @@ extern void exit_thread(void);
extern void exit_mm(struct task_struct *);
extern void exit_files(struct task_struct *);
extern void exit_sighand(struct task_struct *);
extern void remove_thread_group(struct task_struct *tsk, struct signal_struct *sig);
extern void reparent_to_init(void);
extern void daemonize(void);
......@@ -786,8 +793,29 @@ static inline struct task_struct *younger_sibling(struct task_struct *p)
#define for_each_thread(task) \
for (task = next_thread(current) ; task != current ; task = next_thread(task))
#define next_thread(p) \
list_entry((p)->thread_group.next, struct task_struct, thread_group)
static inline task_t *next_thread(task_t *p)
{
if (!p->sig)
BUG();
#if CONFIG_SMP
if (!spin_is_locked(&p->sig->siglock) &&
!rwlock_is_locked(&tasklist_lock))
BUG();
#endif
return list_entry((p)->thread_group.next, task_t, thread_group);
}
static inline task_t *prev_thread(task_t *p)
{
if (!p->sig)
BUG();
#if CONFIG_SMP
if (!spin_is_locked(&p->sig->siglock) &&
!rwlock_is_locked(&tasklist_lock))
BUG();
#endif
return list_entry((p)->thread_group.prev, task_t, thread_group);
}
#define thread_group_leader(p) (p->pid == p->tgid)
......@@ -903,21 +931,8 @@ static inline void cond_resched(void)
This is required every time the blocked sigset_t changes.
All callers should hold t->sigmask_lock. */
static inline void recalc_sigpending_tsk(struct task_struct *t)
{
if (has_pending_signals(&t->pending.signal, &t->blocked))
set_tsk_thread_flag(t, TIF_SIGPENDING);
else
clear_tsk_thread_flag(t, TIF_SIGPENDING);
}
static inline void recalc_sigpending(void)
{
if (has_pending_signals(&current->pending.signal, &current->blocked))
set_thread_flag(TIF_SIGPENDING);
else
clear_thread_flag(TIF_SIGPENDING);
}
extern FASTCALL(void recalc_sigpending_tsk(struct task_struct *t));
extern void recalc_sigpending(void);
/*
* Wrappers for p->thread_info->cpu access. No-op on UP.
......
......@@ -36,7 +36,6 @@ static inline void __unhash_process(struct task_struct *p)
nr_threads--;
unhash_pid(p);
REMOVE_LINKS(p);
list_del(&p->thread_group);
p->pid = 0;
proc_dentry = p->proc_dentry;
if (unlikely(proc_dentry != NULL)) {
......@@ -73,6 +72,7 @@ static void release_task(struct task_struct * p)
}
BUG_ON(!list_empty(&p->ptrace_list) || !list_empty(&p->ptrace_children));
unhash_process(p);
exit_sighand(p);
release_thread(p);
if (p != current) {
......@@ -244,7 +244,8 @@ void daemonize(void)
static void reparent_thread(task_t *p, task_t *reaper, task_t *child_reaper)
{
/* We dont want people slaying init */
p->exit_signal = SIGCHLD;
if (p->exit_signal != -1)
p->exit_signal = SIGCHLD;
p->self_exec_id++;
/* Make sure we're not reparenting to ourselves */
......@@ -412,18 +413,15 @@ void exit_mm(struct task_struct *tsk)
*/
static inline void forget_original_parent(struct task_struct * father)
{
struct task_struct *p, *reaper;
struct task_struct *p, *reaper = father;
struct list_head *_p;
read_lock(&tasklist_lock);
write_lock_irq(&tasklist_lock);
/* Next in our thread group, if they're not already exiting */
reaper = father;
do {
reaper = next_thread(reaper);
if (!(reaper->flags & PF_EXITING))
break;
} while (reaper != father);
if (father->exit_signal != -1)
reaper = prev_thread(reaper);
else
reaper = child_reaper;
if (reaper == father)
reaper = child_reaper;
......@@ -444,7 +442,7 @@ static inline void forget_original_parent(struct task_struct * father)
p = list_entry(_p,struct task_struct,ptrace_list);
reparent_thread(p, reaper, child_reaper);
}
read_unlock(&tasklist_lock);
write_unlock_irq(&tasklist_lock);
}
static inline void zap_thread(task_t *p, task_t *father, int traced)
......@@ -604,7 +602,6 @@ NORET_TYPE void do_exit(long code)
__exit_files(tsk);
__exit_fs(tsk);
exit_namespace(tsk);
exit_sighand(tsk);
exit_thread();
if (current->leader)
......@@ -763,6 +760,8 @@ asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struc
if (options & __WNOTHREAD)
break;
tsk = next_thread(tsk);
if (tsk->sig != current->sig)
BUG();
} while (tsk != current);
read_unlock(&tasklist_lock);
if (flag) {
......
......@@ -630,6 +630,9 @@ static inline int copy_sighand(unsigned long clone_flags, struct task_struct * t
spin_lock_init(&sig->siglock);
atomic_set(&sig->count, 1);
memcpy(tsk->sig->action, current->sig->action, sizeof(tsk->sig->action));
sig->curr_target = NULL;
init_sigpending(&sig->shared_pending);
return 0;
}
......@@ -664,6 +667,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
return ERR_PTR(-EINVAL);
/*
* Thread groups must share signals as well:
*/
if (clone_flags & CLONE_THREAD)
clone_flags |= CLONE_SIGHAND;
retval = security_ops->task_create(clone_flags);
if (retval)
goto fork_out;
......@@ -843,8 +852,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->parent = p->real_parent;
if (clone_flags & CLONE_THREAD) {
spin_lock(&current->sig->siglock);
p->tgid = current->tgid;
list_add(&p->thread_group, &current->thread_group);
spin_unlock(&current->sig->siglock);
}
SET_LINKS(p);
......
......@@ -1335,6 +1335,15 @@ int task_nice(task_t *p)
return TASK_NICE(p);
}
/**
* task_curr - is this task currently executing on a CPU?
* @p: the task in question.
*/
int task_curr(task_t *p)
{
return cpu_curr(task_cpu(p)) == p;
}
/**
* idle_cpu - is a given cpu idle currently?
* @cpu: the processor in question.
......