[PATCH] signal-fixes-2.5.59-A4

this is the current threading patchset, which accumulated over the past
two weeks. Its biggest part is a set of changes from Roland, to make
threaded signals work. There were still tons of testcases and boundary
conditions (mostly in the signal/exit/ptrace area) that we did not handle
correctly.

Roland's thread-signal semantics/behavior/ptrace fixes:

 - fix signal delivery race with do_exit() => signals are re-queued to the
   'process' if do_exit() finds pending unhandled ones. This prevents
   signals getting lost upon thread-sys_exit().

 - a non-main thread has died on one processor and gone to TASK_ZOMBIE,
   but before it's gotten to release_task a sys_wait4 on the other
   processor reaps it. It's only because it's ptraced that this gets
   through eligible_child. Somewhere in there the main thread is also
   dying so it reparents the child thread to hit that case. This means
   that there is a race where P might be totally invalid.

 - forget_original_parent is not doing the right thing when the group
   leader dies, i.e. reparenting threads to init when there is a zombie
   group leader. Perhaps it doesn't matter for any practical purpose
   without ptrace, though it makes for ppid=1 for each thread in core
   dumps, which looks funny. Incidentally, SIGCHLD here really should be
   p->exit_signal.

 - one of the gdb tests makes a questionable assumption about what kill
   will do when it has some threads stopped by ptrace and others running.

exit races:

1. Processor A is in sys_wait4 case TASK_STOPPED considering task P.
   Processor B is about to resume P and then switch to it.

   While A is inside that case block, B starts running P and it clears
   P->exit_code, or takes a pending fatal signal and sets it to a new
   value. Depending on the interleaving, the possible failure modes are:

   a. A gets to its put_user after B has cleared P->exit_code
      => returns with WIFSTOPPED, WSTOPSIG==0

   b. A gets to its put_user after B has set P->exit_code anew
      => returns with e.g. WIFSTOPPED, WSTOPSIG==SIGKILL

   A can spend an arbitrarily long time in that case block, because
   there's getrusage and put_user that can take page faults, and
   write_lock'ing of the tasklist_lock that can block. But even if it's
   short the race is there in principle.

2. This is new with NPTL, i.e. CLONE_THREAD.

   Two processors A and B are both in sys_wait4 case TASK_STOPPED
   considering task P.

   Both get through their tests and fetches of P->exit_code before either
   gets to P->exit_code = 0. => two threads return the same pid from
   waitpid.

   In other interleavings where one processor gets to its put_user after
   the other has cleared P->exit_code, it's like case 1(a).

3. SMP races with stop/cont signals

   First, take:

      kill(pid, SIGSTOP);
      kill(pid, SIGCONT);

   or:

      kill(pid, SIGSTOP);
      kill(pid, SIGKILL);

   It's possible for this to leave the process stopped with a pending
   SIGCONT/SIGKILL. That's a state that should never be possible.
   Moreover, kill(pid, SIGKILL) without any repetition should always be
   enough to kill a process. (Likewise SIGCONT, when you know it's
   sequenced after the last stop signal, must be sufficient to resume a
   process.)

4. take:

      kill(pid, SIGKILL); // or any fatal signal
      kill(pid, SIGCONT); // or SIGKILL

   it's possible for this to cause pid to be reaped with status 0 instead
   of its true termination status. The equivalent scenario happens when
   the process being killed is in an _exit call or a trap-induced fatal
   signal before the kills.

plus i've done stability fixes for bugs that popped up during
beta-testing, and minor tidying of Roland's changes:

 - a rare tasklist corruption during exec, causing some very spurious and
   colorful crashes.

 - a copy_process()-related dereference of an already freed thread
   structure if hit with a SIGKILL at the wrong moment.

 - SMP spinlock deadlocks in the signal code

this patchset has been tested quite well in the 2.4 backport of the
threading changes - and i've done some stresstesting on 2.5.59 SMP as
well, and did an x86 UP testcompile + testboot as well.
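
Editorial sketch (not part of the patch): race 3 above comes down to one invariant, namely that a stop signal and a SIGCONT must never be pending together. The comment on handle_stop_signal() in this commit says those effects are applied at signal-generation time, with the actual stop done later as a signal action. The user-space model below illustrates only that rule. The names toy_process, toy_generate and toy_deliver_stops are hypothetical, a single bitmask stands in for the per-thread and shared queues, and SIGKILL handling is left out.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Bit for a classic (non-realtime) signal: signal n lives in bit n-1. */
#define SIG_BIT(sig) (1UL << ((sig) - 1))

/* The four job-control stop signals. */
#define STOP_MASK (SIG_BIT(SIGSTOP) | SIG_BIT(SIGTSTP) | \
                   SIG_BIT(SIGTTIN) | SIG_BIT(SIGTTOU))

/* Hypothetical stand-in for one process's signal state. */
struct toy_process {
    unsigned long pending;  /* set of pending signals */
    bool stopped;           /* true while the process is stopped */
};

/* Generation-time side effects, applied before the signal is queued. */
static void toy_generate(struct toy_process *p, int sig)
{
    assert(sig >= 1 && sig <= 31);
    if (STOP_MASK & SIG_BIT(sig)) {
        /* A new stop request cancels any pending SIGCONT. */
        p->pending &= ~SIG_BIT(SIGCONT);
    } else if (sig == SIGCONT) {
        /* SIGCONT cancels pending stops and resumes immediately. */
        p->pending &= ~STOP_MASK;
        p->stopped = false;
    }
    p->pending |= SIG_BIT(sig);
}

/* Default action of a pending stop signal: consume it and stop. */
static void toy_deliver_stops(struct toy_process *p)
{
    if (p->pending & STOP_MASK) {
        p->pending &= ~STOP_MASK;
        p->stopped = true;
    }
}
```

In this model, toy_generate(&p, SIGSTOP) followed by toy_generate(&p, SIGCONT) leaves only SIGCONT pending, so a later toy_deliver_stops(&p) cannot leave the process stopped.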

Commit ebf5ebe3 authored Feb 05, 2003 by Ingo Molnar Committed by Linus Torvalds Feb 05, 2003
6 changed files
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -587,7 +587,7 @@ static inline int de_thread(struct signal_struct *oldsig)
 		return -EAGAIN;
 	}
 	oldsig->group_exit = 1;
-	__broadcast_thread_group(current, SIGKILL);
+	zap_other_threads(current);

 	/*
 	 * Account for the thread group leader hanging around:
@@ -660,6 +660,7 @@ static inline int de_thread(struct signal_struct *oldsig)
 			__ptrace_link(current, parent);
 		}

+		list_del(&current->tasks);
 		list_add_tail(&current->tasks, &init_task.tasks);
 		current->exit_signal = SIGCHLD;
 		state = leader->state;
@@ -680,6 +681,7 @@ static inline int de_thread(struct signal_struct *oldsig)
 	newsig->group_exit = 0;
 	newsig->group_exit_code = 0;
 	newsig->group_exit_task = NULL;
+	newsig->group_stop_count = 0;
 	memcpy(newsig->action, current->sig->action, sizeof(newsig->action));
 	init_sigpending(&newsig->shared_pending);


--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -235,6 +235,9 @@ struct signal_struct {
 	int			group_exit;
 	int			group_exit_code;
 	struct task_struct	*group_exit_task;
+
+	/* thread group stop support, overloads group_exit_code too */
+	int			group_stop_count;
 };

 /*
@@ -508,7 +511,6 @@ extern int in_egroup_p(gid_t);
 extern void proc_caches_init(void);
 extern void flush_signals(struct task_struct *);
 extern void flush_signal_handlers(struct task_struct *);
-extern void sig_exit(int, int, struct siginfo *);
 extern int dequeue_signal(sigset_t *mask, siginfo_t *info);
 extern void block_all_signals(int (*notifier)(void *priv), void *priv,
 			      sigset_t *mask);
@@ -525,7 +527,7 @@ extern void do_notify_parent(struct task_struct *, int);
 extern void force_sig(int, struct task_struct *);
 extern void force_sig_specific(int, struct task_struct *);
 extern int send_sig(int, struct task_struct *, int);
-extern int __broadcast_thread_group(struct task_struct *p, int sig);
+extern void zap_other_threads(struct task_struct *p);
 extern int kill_pg(pid_t, int, int);
 extern int kill_sl(pid_t, int, int);
 extern int kill_proc(pid_t, int, int);
@@ -590,6 +592,8 @@ extern void exit_files(struct task_struct *);
 extern void exit_sighand(struct task_struct *);
 extern void __exit_sighand(struct task_struct *);

+extern NORET_TYPE void do_group_exit(int);
+
 extern void reparent_to_init(void);
 extern void daemonize(void);
 extern task_t *child_reaper;
@@ -762,6 +766,8 @@ static inline void cond_resched_lock(spinlock_t * lock)
 extern FASTCALL(void recalc_sigpending_tsk(struct task_struct *t));
 extern void recalc_sigpending(void);

+extern void signal_wake_up(struct task_struct *t, int resume_stopped);
+
 /*
 * Wrappers for p->thread_info->cpu access. No-op on UP.
 */

--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -647,7 +647,7 @@ NORET_TYPE void do_exit(long code)
 	exit_namespace(tsk);
 	exit_thread();

-	if (current->leader)
+	if (tsk->leader)
 		disassociate_ctty(1);

 	module_put(tsk->thread_info->exec_domain->module);
@@ -657,8 +657,31 @@ NORET_TYPE void do_exit(long code)
 	tsk->exit_code = code;
 	exit_notify();
 	preempt_disable();
-	if (current->exit_signal == -1)
-		release_task(current);
+	if (signal_pending(tsk) && !tsk->sig->group_exit
+	    && !thread_group_empty(tsk)) {
+		/*
+		 * This occurs when there was a race between our exit
+		 * syscall and a group signal choosing us as the one to
+		 * wake up.  It could be that we are the only thread
+		 * alerted to check for pending signals, but another thread
+		 * should be woken now to take the signal since we will not.
+		 * Now we'll wake all the threads in the group just to make
+		 * sure someone gets all the pending signals.
+		 */
+		struct task_struct *t;
+		read_lock(&tasklist_lock);
+		spin_lock_irq(&tsk->sig->siglock);
+		for (t = next_thread(tsk); t != tsk; t = next_thread(t))
+			if (!signal_pending(t) && !(t->flags & PF_EXITING)) {
+				recalc_sigpending_tsk(t);
+				if (signal_pending(t))
+					signal_wake_up(t, 0);
+			}
+		spin_unlock_irq(&tsk->sig->siglock);
+		read_unlock(&tasklist_lock);
+	}
+	if (tsk->exit_signal == -1)
+		release_task(tsk);
 	schedule();
 	BUG();
 /*
@@ -710,31 +733,44 @@ task_t *next_thread(task_t *p)
 }

 /*
- * this kills every thread in the thread group. Note that any externally
- * wait4()-ing process will get the correct exit code - even if this 
- * thread is not the thread group leader.
+ * Take down every thread in the group.  This is called by fatal signals
+ * as well as by sys_exit_group (below).
 */
-asmlinkage long sys_exit_group(int error_code)
+NORET_TYPE void
+do_group_exit(int exit_code)
 {
-	unsigned int exit_code = (error_code & 0xff) << 8;
-
-	if (!thread_group_empty(current)) {
-		struct signal_struct *sig = current->sig;
+	BUG_ON(exit_code & 0x80); /* core dumps don't get here */

+	if (current->sig->group_exit)
+		exit_code = current->sig->group_exit_code;
+	else if (!thread_group_empty(current)) {
+		struct signal_struct *const sig = current->sig;
+		read_lock(&tasklist_lock);
 		spin_lock_irq(&sig->siglock);
-		if (sig->group_exit) {
-			spin_unlock_irq(&sig->siglock);
-
-			/* another thread was faster: */
-			do_exit(sig->group_exit_code);
-		}
+		if (sig->group_exit)
+			/* Another thread got here before we took the lock.  */
+			exit_code = sig->group_exit_code;
+		else {
 		sig->group_exit = 1;
 		sig->group_exit_code = exit_code;
-		__broadcast_thread_group(current, SIGKILL);
+			zap_other_threads(current);
+		}
 		spin_unlock_irq(&sig->siglock);
+		read_unlock(&tasklist_lock);
 	}

 	do_exit(exit_code);
+	/* NOTREACHED */
+}
+
+/*
+ * this kills every thread in the thread group. Note that any externally
+ * wait4()-ing process will get the correct exit code - even if this
+ * thread is not the thread group leader.
+ */
+asmlinkage long sys_exit_group(int error_code)
+{
+	do_group_exit((error_code & 0xff) << 8);
 }

 static int eligible_child(pid_t pid, int options, task_t *p)
@@ -800,6 +836,8 @@ asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struc
 		int ret;

 		list_for_each(_p,&tsk->children) {
+			int exit_code;
+
 			p = list_entry(_p,struct task_struct,sibling);

 			ret = eligible_child(pid, options, p);
@@ -813,20 +851,69 @@ asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struc
 					continue;
 				if (!(options & WUNTRACED) && !(p->ptrace & PT_PTRACED))
 					continue;
+				if (ret == 2 && !(p->ptrace & PT_PTRACED) &&
+				    p->sig && p->sig->group_stop_count > 0)
+					/*
+					 * A group stop is in progress and
+					 * we are the group leader.  We won't
+					 * report until all threads have
+					 * stopped.
+					 */
+					continue;
 				read_unlock(&tasklist_lock);

 				/* move to end of parent's list to avoid starvation */
 				write_lock_irq(&tasklist_lock);
 				remove_parent(p);
 				add_parent(p, p->parent);
+
+				/*
+				 * This uses xchg to be atomic with
+				 * the thread resuming and setting it.
+				 * It must also be done with the write
+				 * lock held to prevent a race with the
+				 * TASK_ZOMBIE case (below).
+				 */
+				exit_code = xchg(&p->exit_code, 0);
+				if (unlikely(p->state > TASK_STOPPED)) {
+					/*
+					 * The task resumed and then died.
+					 * Let the next iteration catch it
+					 * in TASK_ZOMBIE.  Note that
+					 * exit_code might already be zero
+					 * here if it resumed and did
+					 * _exit(0).  The task itself is
+					 * dead and won't touch exit_code
+					 * again; other processors in
+					 * this function are locked out.
+					 */
+					p->exit_code = exit_code;
+					exit_code = 0;
+				}
+				if (unlikely(exit_code == 0)) {
+					/*
+					 * Another thread in this function
+					 * got to it first, or it resumed,
+					 * or it resumed and then died.
+					 */
+					write_unlock_irq(&tasklist_lock);
+					continue;
+				}
+				/*
+				 * Make sure this doesn't get reaped out from
+				 * under us while we are examining it below.
+				 * We don't want to keep holding onto the
+				 * tasklist_lock while we call getrusage and
+				 * possibly take page faults for user memory.
+				 */
+				get_task_struct(p);
 				write_unlock_irq(&tasklist_lock);
 				retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0; 
 				if (!retval && stat_addr) 
-					retval = put_user((p->exit_code << 8) | 0x7f, stat_addr);
-				if (!retval) {
-					p->exit_code = 0;
+					retval = put_user((exit_code << 8) | 0x7f, stat_addr);
+				if (!retval)
 					retval = p->pid;
-				}
+				put_task_struct(p);
 				goto end_wait4;
 			case TASK_ZOMBIE:
 				/*
@@ -841,6 +928,13 @@ asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struc
 				state = xchg(&p->state, TASK_DEAD);
 				if (state != TASK_ZOMBIE)
 					continue;
+				if (unlikely(p->exit_signal == -1))
+					/*
+					 * This can only happen in a race with
+					 * a ptraced thread dying on another
+					 * processor.
+					 */
+					continue;
 				read_unlock(&tasklist_lock);

 				retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
@@ -857,11 +951,17 @@ asmlinkage long sys_wait4(pid_t pid,unsigned int * stat_addr, int options, struc
 				retval = p->pid;
 				if (p->real_parent != p->parent) {
 					write_lock_irq(&tasklist_lock);
+					/* Double-check with lock held.  */
+					if (p->real_parent != p->parent) {
 					__ptrace_unlink(p);
-					do_notify_parent(p, SIGCHLD);
+						do_notify_parent(
+							p, p->exit_signal);
 					p->state = TASK_ZOMBIE;
+						p = NULL;
+					}
 					write_unlock_irq(&tasklist_lock);
-				} else
+				}
+				if (p != NULL)
 					release_task(p);
 				goto end_wait4;
 			default:
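
Editorial sketch (not part of the patch): the TASK_STOPPED changes to sys_wait4 above replace "read exit_code, report it, then clear it" with a single xchg(&p->exit_code, 0), so that at most one waiter can see a non-zero stop code. The user-space model below shows the same claim-once idea with C11 atomics. The names toy_task, toy_claim_stop_code and toy_stopped_status are hypothetical, and the tasklist_lock and task-state checks from the patch are not modeled.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/wait.h>

/* Hypothetical stand-in for the one task field that matters here. */
struct toy_task {
    atomic_int exit_code;   /* stop signal number while stopped, else 0 */
};

/*
 * Atomically take the stop code and leave 0 behind.  Returns 0 when
 * another waiter (or the resuming task) already consumed it.
 */
static int toy_claim_stop_code(struct toy_task *p)
{
    return atomic_exchange(&p->exit_code, 0);
}

/* Build the wait status for a stopped child: (code << 8) | 0x7f. */
static int toy_stopped_status(int stop_code)
{
    assert(stop_code > 0 && stop_code < 0x80);
    return (stop_code << 8) | 0x7f;
}
```

A waiter that gets 0 back from toy_claim_stop_code() should skip the task, as the patched loop does with continue. A waiter that gets a signal number back owns the report and can pass toy_stopped_status() of that value to the WIFSTOPPED and WSTOPSIG macros.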

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -680,6 +680,7 @@ static inline int copy_sighand(unsigned long clone_flags, struct task_struct * t
 	sig->group_exit = 0;
 	sig->group_exit_code = 0;
 	sig->group_exit_task = NULL;
+	sig->group_stop_count = 0;
 	memcpy(sig->action, current->sig->action, sizeof(sig->action));
 	sig->curr_target = NULL;
 	init_sigpending(&sig->shared_pending);
@@ -801,7 +802,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	spin_lock_init(&p->alloc_lock);
 	spin_lock_init(&p->switch_lock);

-	clear_tsk_thread_flag(p,TIF_SIGPENDING);
+	clear_tsk_thread_flag(p, TIF_SIGPENDING);
 	init_sigpending(&p->pending);

 	p->it_real_value = p->it_virt_value = p->it_prof_value = 0;
@@ -910,6 +911,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	 */
 	if (sigismember(&current->pending.signal, SIGKILL)) {
 		write_unlock_irq(&tasklist_lock);
+		retval = -EINTR;
 		goto bad_fork_cleanup_namespace;
 	}

@@ -934,6 +936,17 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		}
 		p->tgid = current->tgid;
 		p->group_leader = current->group_leader;
+
+		if (current->sig->group_stop_count > 0) {
+			/*
+			 * There is an all-stop in progress for the group.
+			 * We ourselves will stop as soon as we check signals.
+			 * Make the new thread part of that group stop too.
+			 */
+			current->sig->group_stop_count++;
+			set_tsk_thread_flag(p, TIF_SIGPENDING);
+		}
+
 		spin_unlock(&current->sig->siglock);
 	}

@@ -1036,8 +1049,13 @@ struct task_struct *do_fork(unsigned long clone_flags,
 			init_completion(&vfork);
 		}

-		if (p->ptrace & PT_PTRACED)
-			send_sig(SIGSTOP, p, 1);
+		if (p->ptrace & PT_PTRACED) {
+			/*
+			 * We'll start up with an immediate SIGSTOP.
+			 */
+			sigaddset(&p->pending.signal, SIGSTOP);
+			set_tsk_thread_flag(p, TIF_SIGPENDING);
+		}

 		wake_up_forked_process(p);		/* do this last */
 		++total_forks;
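
Editorial sketch (not part of the patch): the kernel/signal.c diff that follows changes M(sig) from 1UL << (sig) to 1UL << ((sig)-1). The old form was self-consistent while masks were only tested through T(), but the patch now hands SIG_KERNEL_STOP_MASK to rm_from_queue(), which tests it with sigtestsetmask() and sigmask(), where signal n occupies bit n-1. The snippet below illustrates the off-by-one. M_OLD, M_NEW, toy_sigset_word and toy_mask_selects are hypothetical names, and only the first 31 signals are modeled.

```c
#include <assert.h>
#include <signal.h>

/* Pre-patch definition: signal n mapped to bit n. */
#define M_OLD(sig) (1UL << (sig))
/* Post-patch definition: signal n maps to bit n-1, like sigmask(). */
#define M_NEW(sig) (1UL << ((sig) - 1))

/* First word of a kernel-style sigset: signal n is stored in bit n-1. */
static unsigned long toy_sigset_word(int sig)
{
    assert(sig >= 1 && sig <= 31);
    return 1UL << (sig - 1);
}

/* Does `mask` select `sig` when ANDed against the raw sigset word? */
static int toy_mask_selects(unsigned long mask, int sig)
{
    return (mask & toy_sigset_word(sig)) != 0;
}
```

With the old definition, a mask built for SIGCONT would select the signal numbered SIGCONT + 1 when applied to the raw word; the new definition selects SIGCONT itself.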

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -55,7 +55,7 @@ int max_queued_signals = 1024;
 |  SIGALRM           |  load-balance    |  kill-all      |
 |  SIGTERM           |  load-balance    |  kill-all      |
 |  SIGCHLD           |  load-balance    |  ignore        |
-|  SIGCONT           |  specific        |  continue-all  |
+|  SIGCONT           |  load-balance    |  ignore        |
 |  SIGSTOP           |  n/a             |  stop-all      |
 |  SIGTSTP           |  load-balance    |  stop-all      |
 |  SIGTTIN           |  load-balance    |  stop-all      |
@@ -98,26 +98,11 @@ int max_queued_signals = 1024;
 #endif

 #if SIGRTMIN > BITS_PER_LONG
-#define M(sig) (1ULL << (sig))
+#define M(sig) (1ULL << ((sig)-1))
 #else
-#define M(sig) (1UL << (sig))
+#define M(sig) (1UL << ((sig)-1))
 #endif
-#define T(sig, mask) (M(sig) & mask)
-
-#define SIG_USER_SPECIFIC_MASK (\
-	M(SIGILL)    |  M(SIGTRAP)   |  M(SIGABRT)   |  M(SIGBUS)    | \
-	M(SIGFPE)    |  M(SIGSEGV)   |  M(SIGPIPE)   |  M(SIGXFSZ)   | \
-	M(SIGPROF)   |  M(SIGSYS)    |  M_SIGSTKFLT  |  M(SIGCONT)   | \
-        M_SIGEMT )
-
-#define SIG_USER_LOAD_BALANCE_MASK (\
-        M(SIGHUP)    |  M(SIGINT)    |  M(SIGQUIT)   |  M(SIGUSR1)   | \
-        M(SIGUSR2)   |  M(SIGALRM)   |  M(SIGTERM)   |  M(SIGCHLD)   | \
-        M(SIGURG)    |  M(SIGVTALRM) |  M(SIGPOLL)   |  M(SIGWINCH)  | \
-        M(SIGPWR)    |  M(SIGTSTP)   |  M(SIGTTIN)   |  M(SIGTTOU)   )
-
-#define SIG_KERNEL_SPECIFIC_MASK (\
-        M(SIGCHLD)   |   M(SIGURG)   |  M(SIGWINCH)                  )
+#define T(sig, mask) (M(sig) & (mask))

 #define SIG_KERNEL_BROADCAST_MASK (\
 	M(SIGHUP)    |  M(SIGINT)    |  M(SIGQUIT)   |  M(SIGILL)    | \
@@ -132,34 +117,37 @@ int max_queued_signals = 1024;
 #define SIG_KERNEL_ONLY_MASK (\
 	M(SIGKILL)   |  M(SIGSTOP)                                   )

+#define SIG_KERNEL_STOP_MASK (\
+	M(SIGSTOP)   |  M(SIGTSTP)   |  M(SIGTTIN)   |  M(SIGTTOU)   )
+
 #define SIG_KERNEL_COREDUMP_MASK (\
        M(SIGQUIT)   |  M(SIGILL)    |  M(SIGTRAP)   |  M(SIGABRT)   | \
        M(SIGFPE)    |  M(SIGSEGV)   |  M(SIGBUS)    |  M(SIGSYS)    | \
        M(SIGXCPU)   |  M(SIGXFSZ)   |  M_SIGEMT                     )

-#define sig_user_specific(sig) \
-		(((sig) < SIGRTMIN)  && T(sig, SIG_USER_SPECIFIC_MASK))
-#define sig_user_load_balance(sig) \
-		(((sig) >= SIGRTMIN) || T(sig, SIG_USER_LOAD_BALANCE_MASK))
-#define sig_kernel_specific(sig) \
-		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_SPECIFIC_MASK))
-#define sig_kernel_broadcast(sig) \
-		(((sig) >= SIGRTMIN) || T(sig, SIG_KERNEL_BROADCAST_MASK))
+#define SIG_KERNEL_IGNORE_MASK (\
+        M(SIGCONT)   |  M(SIGCHLD)   |  M(SIGWINCH)  |  M(SIGURG)    )
+
 #define sig_kernel_only(sig) \
 		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_ONLY_MASK))
 #define sig_kernel_coredump(sig) \
 		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_COREDUMP_MASK))
+#define sig_kernel_ignore(sig) \
+		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_IGNORE_MASK))
+#define sig_kernel_stop(sig) \
+		(((sig) < SIGRTMIN)  && T(sig, SIG_KERNEL_STOP_MASK))

-#define sig_user_defined(t, sig) \
-	(((t)->sig->action[(sig)-1].sa.sa_handler != SIG_DFL) &&	\
-	 ((t)->sig->action[(sig)-1].sa.sa_handler != SIG_IGN))
+#define sig_user_defined(t, signr) \
+	(((t)->sig->action[(signr)-1].sa.sa_handler != SIG_DFL) &&	\
+	 ((t)->sig->action[(signr)-1].sa.sa_handler != SIG_IGN))

-#define sig_ignored(t, sig) \
-	(((sig) != SIGCHLD) && \
-		((t)->sig->action[(sig)-1].sa.sa_handler == SIG_IGN))
+#define sig_ignored(t, signr) \
+	(!((t)->ptrace & PT_PTRACED) && \
+	 (t)->sig->action[(signr)-1].sa.sa_handler == SIG_IGN)

-static int
-__send_sig_info(int sig, struct siginfo *info, struct task_struct *p);
+#define sig_fatal(t, signr) \
+	(!T(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \
+	 (t)->sig->action[(signr)-1].sa.sa_handler == SIG_DFL)

 /*
 * Re-calculate pending state from the set of locally pending
@@ -193,9 +181,10 @@ static inline int has_pending_signals(sigset_t *signal, sigset_t *blocked)

 #define PENDING(p,b) has_pending_signals(&(p)->signal, (b))

-void recalc_sigpending_tsk(struct task_struct *t)
+inline void recalc_sigpending_tsk(struct task_struct *t)
 {
-	if (PENDING(&t->pending, &t->blocked) ||
+	if (t->sig->group_stop_count > 0 ||
+	    PENDING(&t->pending, &t->blocked) ||
 			PENDING(&t->sig->shared_pending, &t->blocked))
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
 	else
@@ -204,11 +193,7 @@ void recalc_sigpending_tsk(struct task_struct *t)

 void recalc_sigpending(void)
 {
-	if (PENDING(&current->pending, &current->blocked) ||
-		    PENDING(&current->sig->shared_pending, &current->blocked))
-		set_thread_flag(TIF_SIGPENDING);
-	else
-		clear_thread_flag(TIF_SIGPENDING);
+	recalc_sigpending_tsk(current);
 }

 /* Given the mask, find the first available signal that should be serviced. */
@@ -337,23 +322,6 @@ flush_signal_handlers(struct task_struct *t)
 	}
 }

-/*
- * sig_exit - cause the current task to exit due to a signal.
- */
-
-void
-sig_exit(int sig, int exit_code, struct siginfo *info)
-{
-	sigaddset(&current->pending.signal, sig);
-	recalc_sigpending();
-	current->flags |= PF_SIGNALED;
-
-	if (current->sig->group_exit)
-		exit_code = current->sig->group_exit_code;
-
-	do_exit(exit_code);
-	/* NOTREACHED */
-}

 /* Notify the system that a driver wants to block all signals for this
 * process, and wants to be notified if any signals at all were to be
@@ -473,32 +441,74 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
 */
 int dequeue_signal(sigset_t *mask, siginfo_t *info)
 {
+	int signr = __dequeue_signal(&current->pending, mask, info);
+	if (!signr)
+		signr = __dequeue_signal(&current->sig->shared_pending,
+					 mask, info);
+	return signr;
+}
+
+/*
+ * Tell a process that it has a new active signal..
+ *
+ * NOTE! we rely on the previous spin_lock to
+ * lock interrupts for us! We can only be called with
+ * "siglock" held, and the local interrupt must
+ * have been disabled when that got acquired!
+ *
+ * No need to set need_resched since signal event passing
+ * goes through ->blocked
+ */
+inline void signal_wake_up(struct task_struct *t, int resume)
+{
+	set_tsk_thread_flag(t,TIF_SIGPENDING);
+
+	/*
+	 * If the task is running on a different CPU
+	 * force a reschedule on the other CPU to make
+	 * it notice the new signal quickly.
+	 *
+	 * The code below is a tad loose and might occasionally
+	 * kick the wrong CPU if we catch the process in the
+	 * process of changing - but no harm is done by that
+	 * other than doing an extra (lightweight) IPI interrupt.
+	 */
+	if (t->state == TASK_RUNNING)
+		kick_if_running(t);
 	/*
-	 * Here we handle shared pending signals. To implement the full
-	 * semantics we need to unqueue and resend them. It will likely
-	 * get into our own pending queue.
+	 * If resume is set, we want to wake it up in the TASK_STOPPED case.
+	 * We don't check for TASK_STOPPED because there is a race with it
+	 * executing another processor and just now entering stopped state.
+	 * By calling wake_up_process any time resume is set, we ensure
+	 * the process will wake up and handle its stop or death signal.
 	 */
-	if (current->sig->shared_pending.head) {
-		int signr = __dequeue_signal(&current->sig->shared_pending, mask, info);
-		if (signr)
-			__send_sig_info(signr, info, current);
+	if ((t->state & TASK_INTERRUPTIBLE) ||
+	    (resume && t->state < TASK_ZOMBIE)) {
+		wake_up_process(t);
+		return;
 	}
-	return __dequeue_signal(&current->pending, mask, info);
 }

-static int rm_from_queue(int sig, struct sigpending *s)
+/*
+ * Remove signals in mask from the pending set and queue.
+ * Returns 1 if any signals were found.
+ *
+ * All callers must be holding the siglock.
+ */
+static int rm_from_queue(unsigned long mask, struct sigpending *s)
 {
 	struct sigqueue *q, **pp;

-	if (!sigismember(&s->signal, sig))
+	if (!sigtestsetmask(&s->signal, mask))
 		return 0;

-	sigdelset(&s->signal, sig);
+	sigdelsetmask(&s->signal, mask);

 	pp = &s->head;

 	while ((q = *pp) != NULL) {
-		if (q->info.si_signo == sig) {
+		if (q->info.si_signo < SIGRTMIN &&
+		    (mask & sigmask (q->info.si_signo))) {
 			if ((*pp = q->next) == NULL)
 				s->tail = pp;
 			kmem_cache_free(sigqueue_cachep,q);
@@ -510,112 +520,101 @@ static int rm_from_queue(int sig, struct sigpending *s)
 	return 1;
 }

-/*
- * Remove signal sig from t->pending.
- * Returns 1 if sig was found.
- *
- * All callers must be holding the siglock.
- */
-static int rm_sig_from_queue(int sig, struct task_struct *t)
-{
-	return rm_from_queue(sig, &t->pending);
-}
-
 /*
 * Bad permissions for sending the signal
 */
-static inline int bad_signal(int sig, struct siginfo *info, struct task_struct *t)
+static inline int check_kill_permission(int sig, struct siginfo *info,
+					struct task_struct *t)
 {
-	return (!info || ((unsigned long)info != 1 &&
+	int error = -EINVAL;
+	if (sig < 0 || sig > _NSIG)
+		return error;
+	error = -EPERM;
+	if ((!info || ((unsigned long)info != 1 &&
 			(unsigned long)info != 2 && SI_FROMUSER(info)))
 	    && ((sig != SIGCONT) || (current->session != t->session))
 	    && (current->euid ^ t->suid) && (current->euid ^ t->uid)
 	    && (current->uid ^ t->suid) && (current->uid ^ t->uid)
-	    && !capable(CAP_KILL);
+	    && !capable(CAP_KILL))
+		return error;
+	return security_task_kill(t, info, sig);
 }

+/* forward decl */
+static void do_notify_parent_cldstop(struct task_struct *tsk,
+				     struct task_struct *parent);
+
 /*
- * Signal type:
- *    < 0 : global action (kill - spread to all non-blocked threads)
- *    = 0 : ignored
- *    > 0 : wake up.
+ * Handle magic process-wide effects of stop/continue signals, and SIGKILL.
+ * Unlike the signal actions, these happen immediately at signal-generation
+ * time regardless of blocking, ignoring, or handling.  This does the
+ * actual continuing for SIGCONT, but not the actual stopping for stop
+ * signals.  The process stop is done as a signal action for SIG_DFL.
 */
-static int signal_type(int sig, struct signal_struct *signals)
+static void handle_stop_signal(int sig, struct task_struct *p)
 {
-	unsigned long handler;
-
-	if (!signals)
-		return 0;
-	
-	handler = (unsigned long) signals->action[sig-1].sa.sa_handler;
-	if (handler > 1)
-		return 1;
-
-	/* "Ignore" handler.. Illogical, but that has an implicit handler for SIGCHLD */
-	if (handler == 1)
-		return sig == SIGCHLD;
-
-	/* Default handler. Normally lethal, but.. */
-	switch (sig) {
-
-	/* Ignored */
-	case SIGCONT: case SIGWINCH:
-	case SIGCHLD: case SIGURG:
-		return 0;
-
-	/* Implicit behaviour */
-	case SIGTSTP: case SIGTTIN: case SIGTTOU:
-		return 1;
+	struct task_struct *t;

-	/* Implicit actions (kill or do special stuff) */
-	default:
-		return -1;
+	if (sig_kernel_stop(sig)) {
+		/*
+		 * This is a stop signal.  Remove SIGCONT from all queues.
+		 */
+		rm_from_queue(sigmask(SIGCONT), &p->sig->shared_pending);
+		t = p;
+		do {
+			rm_from_queue(sigmask(SIGCONT), &t->pending);
+			t = next_thread(t);
+		} while (t != p);
 	}
-}
-		
-
-/*
- * Determine whether a signal should be posted or not.
- *
- * Signals with SIG_IGN can be ignored, except for the
- * special case of a SIGCHLD. 
- *
- * Some signals with SIG_DFL default to a non-action.
+	else if (sig == SIGCONT) {
+		/*
+		 * Remove all stop signals from all queues,
+		 * and wake all threads.
 */
-static int ignored_signal(int sig, struct task_struct *t)
-{
-	/* Don't ignore traced or blocked signals */
-	if ((t->ptrace & PT_PTRACED) || sigismember(&t->blocked, sig))
-		return 0;
-
-	return signal_type(sig, t->sig) == 0;
-}
-
-/*
- * Handle TASK_STOPPED cases etc implicit behaviour
- * of certain magical signals.
- *
- * SIGKILL gets spread out to every thread. 
+		if (unlikely(p->sig->group_stop_count > 0)) {
+			/*
+			 * There was a group stop in progress.  We'll
+			 * pretend it finished before we got here.  We are
+			 * obliged to report it to the parent: if the
+			 * SIGSTOP happened "after" this SIGCONT, then it
+			 * would have cleared this pending SIGCONT.  If it
+			 * happened "before" this SIGCONT, then the parent
+			 * got the SIGCHLD about the stop finishing before
+			 * the continue happened.  We do the notification
+			 * now, and it's as if the stop had finished and
+			 * the SIGCHLD was pending on entry to this kill.
 			 */
-static void handle_stop_signal(int sig, struct task_struct *t)
-{
-	switch (sig) {
-	case SIGKILL: case SIGCONT:
-		/* Wake up the process if stopped.  */
-		if (t->state == TASK_STOPPED)
+			p->sig->group_stop_count = 0;
+			if (p->ptrace & PT_PTRACED)
+				do_notify_parent_cldstop(p, p->parent);
+			else
+				do_notify_parent_cldstop(
+					p->group_leader,
+					p->group_leader->real_parent);
+		}
+		rm_from_queue(SIG_KERNEL_STOP_MASK, &p->sig->shared_pending);
+		t = p;
+		do {
+			rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
+			if (t->state == TASK_STOPPED) {
+				/*
+				 * If there is a handler for SIGCONT, we
+				 * must make sure that no thread returns to
+				 * user mode before we post the signal, in
+				 * case it was the only thread eligible to
+				 * run the signal handler--then it must not
+				 * do anything between resuming and running
+				 * the handler.  With the TIF_SIGPENDING flag
+				 * set, the thread will pause and acquire the
+				 * siglock that we hold now and until we've
+				 * queued the pending signal.
+				 */
+				if (sig_user_defined(p, SIGCONT))
+					set_tsk_thread_flag(t, TIF_SIGPENDING);
 				wake_up_process(t);
-		t->exit_code = 0;
-		rm_sig_from_queue(SIGSTOP, t);
-		rm_sig_from_queue(SIGTSTP, t);
-		rm_sig_from_queue(SIGTTOU, t);
-		rm_sig_from_queue(SIGTTIN, t);
-		break;
-
-	case SIGSTOP: case SIGTSTP:
-	case SIGTTIN: case SIGTTOU:
-		/* If we're stopping again, cancel SIGCONT */
-		rm_sig_from_queue(SIGCONT, t);
-		break;
+			}
+			t = next_thread(t);
+		} while (t != p);
 	}
 }

@@ -678,51 +677,12 @@ static int send_signal(int sig, struct siginfo *info, struct sigpending *signals
 	return 0;
 }

-/*
- * Tell a process that it has a new active signal..
- *
- * NOTE! we rely on the previous spin_lock to
- * lock interrupts for us! We can only be called with
- * "siglock" held, and the local interrupt must
- * have been disabled when that got acquired!
- *
- * No need to set need_resched since signal event passing
- * goes through ->blocked
- */
-inline void signal_wake_up(struct task_struct *t)
-{
-	set_tsk_thread_flag(t,TIF_SIGPENDING);
-
-	/*
-	 * If the task is running on a different CPU 
-	 * force a reschedule on the other CPU to make
-	 * it notice the new signal quickly.
-	 *
-	 * The code below is a tad loose and might occasionally
-	 * kick the wrong CPU if we catch the process in the
-	 * process of changing - but no harm is done by that
-	 * other than doing an extra (lightweight) IPI interrupt.
-	 */
-	if (t->state == TASK_RUNNING)
-		kick_if_running(t);
-	if (t->state & TASK_INTERRUPTIBLE) {
-		wake_up_process(t);
-		return;
-	}
-}
-
-static int deliver_signal(int sig, struct siginfo *info, struct task_struct *t)
-{
-	int retval = send_signal(sig, info, &t->pending);
-
-	if (!retval && !sigismember(&t->blocked, sig))
-		signal_wake_up(t);
+#define LEGACY_QUEUE(sigptr, sig) \
+	(((sig) < SIGRTMIN) && sigismember(&(sigptr)->signal, (sig)))

-	return retval;
-}

 static int
-specific_send_sig_info(int sig, struct siginfo *info, struct task_struct *t, int shared)
+specific_send_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 {
 	int ret;

@@ -732,49 +692,21 @@ specific_send_sig_info(int sig, struct siginfo *info, struct task_struct *t, int
 	if (!spin_is_locked(&t->sig->siglock))
 		BUG();
 #endif
-	ret = -EINVAL;
-	if (sig < 0 || sig > _NSIG)
-		goto out;
-	/* The somewhat baroque permissions check... */
-	ret = -EPERM;
-	if (bad_signal(sig, info, t))
-		goto out;
-	ret = security_task_kill(t, info, sig);
-	if (ret)
-		goto out;

-	/* The null signal is a permissions and process existence probe.
-	   No signal is actually delivered.  Same goes for zombies. */
-	ret = 0;
-	if (!sig || !t->sig)
-		goto out;
-
-	handle_stop_signal(sig, t);
-
-	/* Optimize away the signal, if it's a signal that can be
-	   handled immediately (ie non-blocked and untraced) and
-	   that is ignored (either explicitly or by default).  */
-
-	if (ignored_signal(sig, t))
-		goto out;
-
-#define LEGACY_QUEUE(sigptr, sig) \
-	(((sig) < SIGRTMIN) && sigismember(&(sigptr)->signal, (sig)))
+	/* Short-circuit ignored signals.  */
+	if (sig_ignored(t, sig))
+		return 0;

-	if (!shared) {
 		/* Support queueing exactly one non-rt signal, so that we
 		   can get more detailed information about the cause of
 		   the signal. */
 		if (LEGACY_QUEUE(&t->pending, sig))
-			goto out;
+		return 0;
+
+	ret = send_signal(sig, info, &t->pending);
+	if (!ret && !sigismember(&t->blocked, sig))
+		signal_wake_up(t, sig == SIGKILL);

-		ret = deliver_signal(sig, info, t);
-	} else {
-		if (LEGACY_QUEUE(&t->sig->shared_pending, sig))
-			goto out;
-		ret = send_signal(sig, info, &t->sig->shared_pending);
-	}
-out:
 	return ret;
 }

@@ -794,26 +726,12 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 		t->sig->action[sig-1].sa.sa_handler = SIG_DFL;
 	sigdelset(&t->blocked, sig);
 	recalc_sigpending_tsk(t);
-	ret = __send_sig_info(sig, info, t);
+	ret = specific_send_sig_info(sig, info, t);
 	spin_unlock_irqrestore(&t->sig->siglock, flags);

 	return ret;
 }

-static int
-__specific_force_sig_info(int sig, struct task_struct *t)
-{
-	if (!t->sig)
-		return -ESRCH;
-
-	if (t->sig->action[sig-1].sa.sa_handler == SIG_IGN)
-		t->sig->action[sig-1].sa.sa_handler = SIG_DFL;
-	sigdelset(&t->blocked, sig);
-	recalc_sigpending_tsk(t);
-
-	return specific_send_sig_info(sig, (void *)2, t, 0);
-}
-
 void
 force_sig_specific(int sig, struct task_struct *t)
 {
@@ -824,157 +742,182 @@ force_sig_specific(int sig, struct task_struct *t)
 		t->sig->action[sig-1].sa.sa_handler = SIG_DFL;
 	sigdelset(&t->blocked, sig);
 	recalc_sigpending_tsk(t);
-	specific_send_sig_info(sig, (void *)2, t, 0);
+	specific_send_sig_info(sig, (void *)2, t);
 	spin_unlock_irqrestore(&t->sig->siglock, flags);
 }

-#define can_take_signal(p, sig)	\
-	(((unsigned long) p->sig->action[sig-1].sa.sa_handler > 1) && \
-	!sigismember(&p->blocked, sig) && (task_curr(p) || !signal_pending(p)))
+/*
+ * Test if P wants to take SIG.  After we've checked all threads with this,
+ * it's equivalent to finding no threads not blocking SIG.  Any threads not
+ * blocking SIG were ruled out because they are not running and already
+ * have pending signals.  Such threads will dequeue from the shared queue
+ * as soon as they're available, so putting the signal on the shared queue
+ * will be equivalent to sending it to one such thread.
+ */
+#define wants_signal(sig, p)	(!sigismember(&(p)->blocked, sig) \
+				 && (p)->state < TASK_STOPPED \
+				 && !((p)->flags & PF_EXITING) \
+				 && (task_curr(p) || !signal_pending(p)))

-static inline
-int load_balance_thread_group(struct task_struct *p, int sig,
-				struct siginfo *info)
+static inline int
+__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
-	struct task_struct *tmp;
+	struct task_struct *t;
 	int ret;

+#if CONFIG_SMP
+	if (!spin_is_locked(&p->sig->siglock))
+		BUG();
+#endif
+	handle_stop_signal(sig, p);
+
+	/* Short-circuit ignored signals.  */
+	if (sig_ignored(p, sig))
+		return 0;
+
+	if (LEGACY_QUEUE(&p->sig->shared_pending, sig))
+		/* This is a non-RT signal and we already have one queued.  */
+		return 0;
+
 	/*
-	 * if the specified thread is not blocking this signal
-	 * then deliver it.
+	 * Put this signal on the shared-pending queue, or fail with EAGAIN.
+	 * We always use the shared queue for process-wide signals,
+	 * to avoid several races.
 	 */
-	if (can_take_signal(p, sig))
-		return specific_send_sig_info(sig, info, p, 0);
+	ret = send_signal(sig, info, &p->sig->shared_pending);
+	if (unlikely(ret))
+		return ret;

 	/*
+	 * Now find a thread we can wake up to take the signal off the queue.
+	 *
+	 * If the main thread wants the signal, it gets first crack.
+	 * Probably the least surprising to the average bear.
+	 */
+	if (p->state < TASK_ZOMBIE &&
+	    (sig_kernel_only(sig) || wants_signal(sig, p)))
+		t = p;
+	else if (thread_group_empty(p))
+		/*
+		 * There is just one thread and it does not need to be woken.
+		 * It will dequeue unblocked signals before it runs again.
+		 */
+		return 0;
+	else {
+		/*
 	 * Otherwise try to find a suitable thread.
-	 * If no such thread is found then deliver to
-	 * the original thread.
 	 */
-
-	tmp = p->sig->curr_target;
-
-	if (!tmp || tmp->tgid != p->tgid)
+		t = p->sig->curr_target;
+		if (t == NULL)
 		/* restart balancing at this thread */
-		p->sig->curr_target = p;
-
-	else for (;;) {
-		if (thread_group_empty(p))
-			BUG();
-		if (!tmp || tmp->tgid != p->tgid)
-			BUG();
+			t = p->sig->curr_target = p;
+		BUG_ON(t->tgid != p->tgid);

+		while (!wants_signal(sig, t)) {
+			t = next_thread(t);
+			if (t == p->sig->curr_target)
 		/*
-		 * Do not send signals that are ignored or blocked,
-		 * or to not-running threads that are overworked:
+				 * No thread needs to be woken.
+				 * Any eligible threads will see
+				 * the signal in the queue soon.
 		 */
-		if (!can_take_signal(tmp, sig)) {
-			tmp = next_thread(tmp);
-			p->sig->curr_target = tmp;
-			if (tmp == p)
-				break;
-			continue;
+				return 0;
 		}
-		ret = specific_send_sig_info(sig, info, tmp, 0);
-		return ret;
+		p->sig->curr_target = t;
 	}
+
 	/*
-	 * No suitable thread was found - put the signal
-	 * into the shared-pending queue.
+	 * Found a killable thread.  If the signal will be fatal,
+	 * then start taking the whole group down immediately.
 	 */
-	return specific_send_sig_info(sig, info, p, 1);
-}
-
-int __broadcast_thread_group(struct task_struct *p, int sig)
-{
-	struct task_struct *tmp;
-	struct list_head *l;
-	struct pid *pid;
-	int err = 0;
-
-	for_each_task_pid(p->tgid, PIDTYPE_TGID, tmp, l, pid)
-		err = __specific_force_sig_info(sig, tmp);
-
-	return err;
-}
+	if (sig_fatal(p, sig) && !p->sig->group_exit &&
+	    !sigismember(&t->real_blocked, sig) &&
+	    (sig == SIGKILL || !(t->ptrace & PT_PTRACED))) {
+		/*
+		 * This signal will be fatal to the whole group.
+		 */
+		if (!sig_kernel_coredump(sig)) {
+			/*
+			 * Start a group exit and wake everybody up.
+			 * This way we don't have other threads
+			 * running and doing things after a slower
+			 * thread has the fatal signal pending.
+			 */
+			p->sig->group_exit = 1;
+			p->sig->group_exit_code = sig;
+			p->sig->group_stop_count = 0;
+			t = p;
+			do {
+				sigaddset(&t->pending.signal, SIGKILL);
+				signal_wake_up(t, 1);
+				t = next_thread(t);
+			} while (t != p);
+			return 0;
+		}

-struct task_struct * find_unblocked_thread(struct task_struct *p, int signr)
-{
-	struct task_struct *tmp;
-	struct list_head *l;
-	struct pid *pid;
+		/*
+		 * There will be a core dump.  We make all threads other
+		 * than the chosen one go into a group stop so that nothing
+		 * happens until it gets scheduled, takes the signal off
+		 * the shared queue, and does the core dump.  This is a
+		 * little more complicated than strictly necessary, but it
+		 * keeps the signal state that winds up in the core dump
+		 * unchanged from the death state, e.g. which thread had
+		 * the core-dump signal unblocked.
+		 */
+		rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
+		rm_from_queue(SIG_KERNEL_STOP_MASK, &p->sig->shared_pending);
+		p->sig->group_stop_count = 0;
+		p->sig->group_exit_task = t;
+		t = p;
+		do {
+			p->sig->group_stop_count++;
+			signal_wake_up(t, 0);
+			t = next_thread(t);
+		} while (t != p);
+		wake_up_process(p->sig->group_exit_task);
+		return 0;
+	}

-	for_each_task_pid(p->tgid, PIDTYPE_TGID, tmp, l, pid)
-		if (!sigismember(&tmp->blocked, signr))
-			return tmp;
-	return NULL;
+	/*
+	 * The signal is already in the shared-pending queue.
+	 * Tell the chosen thread to wake up and dequeue it.
+	 */
+	signal_wake_up(t, sig == SIGKILL);
+	return 0;
 }

-static int
-__send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
+/*
+ * Nuke all other threads in the group.
+ */
+void zap_other_threads(struct task_struct *p)
 {
 	struct task_struct *t;
-	int ret = 0;

-#if CONFIG_SMP
-	if (!spin_is_locked(&p->sig->siglock))
-		BUG();
-#endif
-	/* not a thread group - normal signal behavior */
-	if (thread_group_empty(p) || !sig)
-		goto out_send;
+	p->sig->group_stop_count = 0;

-	if (sig_user_defined(p, sig)) {
-		if (sig_user_specific(sig))
-			goto out_send;
-		if (sig_user_load_balance(sig)) {
-			ret = load_balance_thread_group(p, sig, info);
-			goto out_unlock;
-		}
-
-		/* must not happen */
-		BUG();
-	}
-	/* optimize away ignored signals: */
-	if (sig_ignored(p, sig))
-		goto out_unlock;
-
-	if (sig_kernel_specific(sig) ||
-		       ((p->ptrace & PT_PTRACED) && !sig_kernel_only(sig)))
-		goto out_send;
+	if (thread_group_empty(p))
+		return;

-	/* Does any of the threads unblock the signal? */
-	t = find_unblocked_thread(p, sig);
-	if (!t) {
-		ret = specific_send_sig_info(sig, info, p, 1);
-		goto out_unlock;
-	}
-	if (sigismember(&t->real_blocked,sig)) {
-		ret = specific_send_sig_info(sig, info, t, 0);
-		goto out_unlock;
+	for (t = next_thread(p); t != p; t = next_thread(t)) {
+		sigaddset(&t->pending.signal, SIGKILL);
+		rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
+		signal_wake_up(t, 1);
 	}
-	if (sig_kernel_broadcast(sig) || sig_kernel_coredump(sig)) {
-		ret = __broadcast_thread_group(p, sig);
-		goto out_unlock;
-	}
-
-	/* must not happen */
-	BUG();
-out_send:
-	ret = specific_send_sig_info(sig, info, p, 0);
-out_unlock:
-	return ret;
 }

 int
-send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
+group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
 {
 	unsigned long flags;
 	int ret;

+	ret = check_kill_permission(sig, info, p);
+	if (!ret && sig && p->sig) {
 		spin_lock_irqsave(&p->sig->siglock, flags);
-	ret = __send_sig_info(sig, info, p);
+		ret = __group_send_sig_info(sig, info, p);
 		spin_unlock_irqrestore(&p->sig->siglock, flags);
+	}

 	return ret;
 }
@@ -995,7 +938,7 @@ int __kill_pg_info(int sig, struct siginfo *info, pid_t pgrp)
 		return -EINVAL;

 	for_each_task_pid(pgrp, PIDTYPE_PGID, p, l, pid) {
-		err = send_sig_info(sig, info, p);
+		err = group_send_sig_info(sig, info, p);
 		if (retval)
 			retval = err;
 	}
@@ -1037,7 +980,7 @@ kill_sl_info(int sig, struct siginfo *info, pid_t sid)
 	for_each_task_pid(sid, PIDTYPE_SID, p, l, pid) {
 		if (!p->leader)
 			continue;
-		err = send_sig_info(sig, info, p);
+		err = group_send_sig_info(sig, info, p);
 		if (retval)
 			retval = err;
 	}
@@ -1056,7 +999,7 @@ kill_proc_info(int sig, struct siginfo *info, pid_t pid)
 	p = find_task_by_pid(pid);
 	error = -ESRCH;
 	if (p)
-		error = send_sig_info(sig, info, p);
+		error = group_send_sig_info(sig, info, p);
 	read_unlock(&tasklist_lock);
 	return error;
 }
@@ -1079,8 +1022,8 @@ static int kill_something_info(int sig, struct siginfo *info, int pid)

 		read_lock(&tasklist_lock);
 		for_each_process(p) {
-			if (p->pid > 1 && p != current) {
-				int err = send_sig_info(sig, info, p);
+			if (p->pid > 1 && p->tgid != current->tgid) {
+				int err = group_send_sig_info(sig, info, p);
 				++count;
 				if (err != -EPERM)
 					retval = err;
@@ -1099,6 +1042,22 @@ static int kill_something_info(int sig, struct siginfo *info, int pid)
 * These are for backward compatibility with the rest of the kernel source.
 */

+int
+send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
+{
+	/* XXX should nix these interfaces and update the kernel */
+	if (T(sig, SIG_KERNEL_BROADCAST_MASK))
+		/* XXX do callers really always hold the tasklist_lock?? */
+		return group_send_sig_info(sig, info, p);
+	else {
+		int error;
+		spin_lock_irq(&p->sig->siglock);
+		error = specific_send_sig_info(sig, info, p);
+		spin_unlock_irq(&p->sig->siglock);
+		return error;
+	}
+}
+
 int
 send_sig(int sig, struct task_struct *p, int priv)
 {
@@ -1133,9 +1092,10 @@ kill_proc(pid_t pid, int sig, int priv)
 * Joy. Or not. Pthread wants us to wake up every thread
 * in our parent group.
 */
-static inline void __wake_up_parent(struct task_struct *p)
+static inline void __wake_up_parent(struct task_struct *p,
+				    struct task_struct *parent)
 {
-	struct task_struct *parent = p->parent, *tsk = parent;
+	struct task_struct *tsk = parent;

 	/*
 	 * Fortunately this is not necessary for thread groups:
@@ -1162,6 +1122,7 @@ void do_notify_parent(struct task_struct *tsk, int sig)
 	struct siginfo info;
 	unsigned long flags;
 	int why, status;
+	struct signal_struct *psig;

 	if (sig == -1)
 		BUG();
@@ -1200,10 +1161,34 @@ void do_notify_parent(struct task_struct *tsk, int sig)
 	info.si_code = why;
 	info.si_status = status;

-	spin_lock_irqsave(&tsk->parent->sig->siglock, flags);
-	__send_sig_info(sig, &info, tsk->parent);
-	__wake_up_parent(tsk);
-	spin_unlock_irqrestore(&tsk->parent->sig->siglock, flags);
+	psig = tsk->parent->sig;
+	spin_lock_irqsave(&psig->siglock, flags);
+	if (sig == SIGCHLD && tsk->state != TASK_STOPPED &&
+	    (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||
+	     (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT))) {
+		/*
+		 * We are exiting and our parent doesn't care.  POSIX.1
+		 * defines special semantics for setting SIGCHLD to SIG_IGN
+		 * or setting the SA_NOCLDWAIT flag: we should be reaped
+		 * automatically and not left for our parent's wait4 call.
+		 * Rather than having the parent do it as a magic kind of
+		 * signal handler, we just set this to tell do_exit that we
+		 * can be cleaned up without becoming a zombie.  Note that
+		 * we still call __wake_up_parent in this case, because a
+		 * blocked sys_wait4 might now return -ECHILD.
+		 *
+		 * Whether we send SIGCHLD or not for SA_NOCLDWAIT
+		 * is implementation-defined: we do (if you don't want
+		 * it, just use SIG_IGN instead).
+		 */
+		tsk->exit_signal = -1;
+		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
+			sig = 0;
+	}
+	if (sig > 0 && sig <= _NSIG)
+		__group_send_sig_info(sig, &info, tsk->parent);
+	__wake_up_parent(tsk, tsk->parent);
+	spin_unlock_irqrestore(&psig->siglock, flags);
 }


@@ -1224,6 +1209,149 @@ notify_parent(struct task_struct *tsk, int sig)
 	}
 }

+static void
+do_notify_parent_cldstop(struct task_struct *tsk, struct task_struct *parent)
+{
+	struct siginfo info;
+	unsigned long flags;
+
+	info.si_signo = SIGCHLD;
+	info.si_errno = 0;
+	info.si_pid = tsk->pid;
+	info.si_uid = tsk->uid;
+
+	/* FIXME: find out whether or not this is supposed to be c*time. */
+	info.si_utime = tsk->utime;
+	info.si_stime = tsk->stime;
+
+	info.si_status = tsk->exit_code & 0x7f;
+	info.si_code = CLD_STOPPED;
+
+	spin_lock_irqsave(&parent->sig->siglock, flags);
+	if (parent->sig->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
+	    !(parent->sig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
+		__group_send_sig_info(SIGCHLD, &info, parent);
+	/*
+	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
+	 */
+	__wake_up_parent(tsk, parent);
+	spin_unlock_irqrestore(&parent->sig->siglock, flags);
+}
+
+static void
+finish_stop(int stop_count)
+{
+	/*
+	 * If there are no other threads in the group, or if there is
+	 * a group stop in progress and we are the last to stop,
+	 * report to the parent.  When ptraced, every thread reports itself.
+	 */
+	if (stop_count < 0 || (current->ptrace & PT_PTRACED)) {
+		read_lock(&tasklist_lock);
+		do_notify_parent_cldstop(current, current->parent);
+		read_unlock(&tasklist_lock);
+	}
+	else if (stop_count == 0) {
+		read_lock(&tasklist_lock);
+		do_notify_parent_cldstop(current->group_leader,
+					 current->group_leader->real_parent);
+		read_unlock(&tasklist_lock);
+	}
+
+	schedule();
+	/*
+	 * Now we don't run again until continued.
+	 */
+	current->exit_code = 0;
+}
+
+/*
+ * This performs the stopping for SIGSTOP and other stop signals.
+ * We have to stop all threads in the thread group.
+ */
+static void
+do_signal_stop(int signr)
+{
+	struct signal_struct *sig = current->sig;
+	int stop_count = -1;
+
+	if (sig->group_stop_count > 0) {
+		/*
+		 * There is a group stop in progress.  We don't need to
+		 * start another one.
+		 */
+		spin_lock_irq(&sig->siglock);
+		if (unlikely(sig->group_stop_count == 0)) {
+			BUG_ON(!sig->group_exit);
+			spin_unlock_irq(&sig->siglock);
+			return;
+		}
+		signr = sig->group_exit_code;
+		stop_count = --sig->group_stop_count;
+		current->exit_code = signr;
+		set_current_state(TASK_STOPPED);
+		spin_unlock_irq(&sig->siglock);
+	}
+	else if (thread_group_empty(current)) {
+		/*
+		 * No locks needed in this case.
+		 */
+		current->exit_code = signr;
+		set_current_state(TASK_STOPPED);
+	}
+	else {
+		/*
+		 * There is no group stop already in progress.
+		 * We must initiate one now.
+		 */
+		struct task_struct *t;
+		read_lock(&tasklist_lock);
+		spin_lock_irq(&sig->siglock);
+
+		if (unlikely(sig->group_exit)) {
+			/*
+			 * There is a group exit in progress now.
+			 * We'll just ignore the stop and process the
+			 * associated fatal signal.
+			 */
+			spin_unlock_irq(&sig->siglock);
+			read_unlock(&tasklist_lock);
+			return;
+		}
+
+		if (sig->group_stop_count == 0) {
+			sig->group_exit_code = signr;
+			stop_count = 0;
+			for (t = next_thread(current); t != current;
+			     t = next_thread(t))
+				/*
+				 * Setting state to TASK_STOPPED for a group
+				 * stop is always done with the siglock held,
+				 * so this check has no races.
+				 */
+				if (t->state < TASK_STOPPED) {
+					stop_count++;
+					signal_wake_up(t, 0);
+				}
+			sig->group_stop_count = stop_count;
+		}
+		else {
+			/* A race with another thread while unlocked.  */
+			signr = sig->group_exit_code;
+			stop_count = --sig->group_stop_count;
+		}
+
+		current->exit_code = signr;
+		set_current_state(TASK_STOPPED);
+
+		spin_unlock_irq(&sig->siglock);
+		read_unlock(&tasklist_lock);
+	}
+
+	finish_stop(stop_count);
+}
+
+
 #ifndef HAVE_ARCH_GET_SIGNAL_TO_DELIVER

 int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs)
@@ -1235,6 +1363,28 @@ int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs)
 		struct k_sigaction *ka;

 		spin_lock_irq(&current->sig->siglock);
+		if (unlikely(current->sig->group_stop_count > 0)) {
+			int stop_count;
+			if (current->sig->group_exit_task == current) {
+				/*
+				 * Group stop is so we can do a core dump.
+				 */
+				current->sig->group_exit_task = NULL;
+				goto dequeue;
+			}
+			/*
+			 * There is a group stop in progress.  We stop
+			 * without any associated signal being in our queue.
+			 */
+			stop_count = --current->sig->group_stop_count;
+			signr = current->sig->group_exit_code;
+			current->exit_code = signr;
+			set_current_state(TASK_STOPPED);
+			spin_unlock_irq(&current->sig->siglock);
+			finish_stop(stop_count);
+			continue;
+		}
+	dequeue:
 		signr = dequeue_signal(mask, info);
 		spin_unlock_irq(&current->sig->siglock);

@@ -1242,6 +1392,16 @@ int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs)
 			break;

 		if ((current->ptrace & PT_PTRACED) && signr != SIGKILL) {
+			/*
+			 * If there is a group stop in progress,
+			 * we must participate in the bookkeeping.
+			 */
+			if (current->sig->group_stop_count > 0) {
+				spin_lock_irq(&current->sig->siglock);
+				--current->sig->group_stop_count;
+				spin_unlock_irq(&current->sig->siglock);
+			}
+
 			/* Let the debugger run.  */
 			current->exit_code = signr;
 			set_current_state(TASK_STOPPED);
@@ -1254,10 +1414,6 @@ int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs)
 				continue;
 			current->exit_code = 0;

-			/* The debugger continued.  Ignore SIGSTOP.  */
-			if (signr == SIGSTOP)
-				continue;
-
 			/* Update the siginfo structure.  Is this good?  */
 			if (signr != info->si_signo) {
 				info->si_signo = signr;
@@ -1269,61 +1425,69 @@ int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs)

 			/* If the (new) signal is now blocked, requeue it.  */
 			if (sigismember(&current->blocked, signr)) {
-				send_sig_info(signr, info, current);
+				spin_lock_irq(&current->sig->siglock);
+				specific_send_sig_info(signr, info, current);
+				spin_unlock_irq(&current->sig->siglock);
 				continue;
 			}
 		}

 		ka = &current->sig->action[signr-1];
-		if (ka->sa.sa_handler == SIG_IGN) {
-			if (signr != SIGCHLD)
-				continue;
-			/* Check for SIGCHLD: it's special.  */
-			while (sys_wait4(-1, NULL, WNOHANG, NULL) > 0)
-				/* nothing */;
+		if (ka->sa.sa_handler == SIG_IGN) /* Do nothing.  */
 			continue;
-		}
+		if (ka->sa.sa_handler != SIG_DFL) /* Run the handler.  */
+			return signr;

-		if (ka->sa.sa_handler == SIG_DFL) {
-			int exit_code = signr;
+		/*
+		 * Now we are doing the default action for this signal.
+		 */
+		if (sig_kernel_ignore(signr)) /* Default is nothing. */
+			continue;

 			/* Init gets no signals it doesn't want.  */
 			if (current->pid == 1)
 				continue;

-			switch (signr) {
-			case SIGCONT: case SIGCHLD: case SIGWINCH: case SIGURG:
-				continue;
-
-			case SIGTSTP: case SIGTTIN: case SIGTTOU:
-				if (is_orphaned_pgrp(current->pgrp))
-					continue;
-				/* FALLTHRU */
-
-			case SIGSTOP: {
-				struct signal_struct *sig;
-				set_current_state(TASK_STOPPED);
-				current->exit_code = signr;
-				sig = current->parent->sig;
-				if (sig && !(sig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
-					notify_parent(current, SIGCHLD);
-				schedule();
+		if (sig_kernel_stop(signr)) {
+			/*
+			 * The default action is to stop all threads in
+			 * the thread group.  The job control signals
+			 * do nothing in an orphaned pgrp, but SIGSTOP
+			 * always works.
+			 */
+			if (signr == SIGSTOP ||
+			    !is_orphaned_pgrp(current->pgrp))
+				do_signal_stop(signr);
 				continue;
 			}

-			case SIGQUIT: case SIGILL: case SIGTRAP:
-			case SIGABRT: case SIGFPE: case SIGSEGV:
-			case SIGBUS: case SIGSYS: case SIGXCPU: case SIGXFSZ:
-				if (do_coredump(signr, exit_code, regs))
-					exit_code |= 0x80;
-				/* FALLTHRU */
-
-			default:
-				sig_exit(signr, exit_code, info);
+		/*
+		 * Anything else is fatal, maybe with a core dump.
+		 */
+		current->flags |= PF_SIGNALED;
+		if (sig_kernel_coredump(signr) &&
+		    do_coredump(signr, signr, regs)) {
+			/*
+			 * That killed all other threads in the group and
+			 * synchronized with their demise, so there can't
+			 * be any more left to kill now.  The group_exit
+			 * flags are set by do_coredump.  Note that
+			 * thread_group_empty won't always be true yet,
+			 * because those threads were blocked in __exit_mm
+			 * and we just let them go to finish dying.
+			 */
+			const int code = signr | 0x80;
+			BUG_ON(!current->sig->group_exit);
+			BUG_ON(current->sig->group_exit_code != code);
+			do_exit(code);
 				/* NOTREACHED */
 			}
-		}
-		return signr;
+
+		/*
+		 * Death signals, no core dump.
+		 */
+		do_group_exit(signr);
+		/* NOTREACHED */
 	}
 	return 0;
 }
@@ -1435,12 +1599,17 @@ long do_sigpending(void *set, unsigned long sigsetsize)
 		goto out;

 	spin_lock_irq(&current->sig->siglock);
-	sigandsets(&pending, &current->blocked, &current->pending.signal);
+	sigorsets(&pending, &current->pending.signal,
+		  &current->sig->shared_pending.signal);
 	spin_unlock_irq(&current->sig->siglock);

+	/* Outside the lock because only this thread touches it.  */
+	sigandsets(&pending, &current->blocked, &pending);
+
 	error = -EFAULT;
 	if (!copy_to_user(set, &pending, sigsetsize))
 		error = 0;
+
 out:
 	return error;
 }	
@@ -1628,10 +1797,18 @@ sys_tkill(int pid, int sig)
 	p = find_task_by_pid(pid);
 	error = -ESRCH;
 	if (p) {
+		error = check_kill_permission(sig, &info, p);
+		/*
+		 * The null signal is a permissions and process existence
+		 * probe.  No signal is actually delivered.
+		 */
+		if (!error && sig && p->sig) {
 			spin_lock_irq(&p->sig->siglock);
-		error = specific_send_sig_info(sig, &info, p, 0);
+			handle_stop_signal(sig, p);
+			error = specific_send_sig_info(sig, &info, p);
 			spin_unlock_irq(&p->sig->siglock);
 		}
+	}
 	read_unlock(&tasklist_lock);
 	return error;
 }
@@ -1664,7 +1841,17 @@ do_sigaction(int sig, const struct k_sigaction *act, struct k_sigaction *oact)

 	k = &current->sig->action[sig-1];

+	read_lock(&tasklist_lock);
 	spin_lock_irq(&current->sig->siglock);
+	if (signal_pending(current)) {
+		/*
+		 * If there might be a fatal signal pending on multiple
+		 * threads, make sure we take it before changing the action.
+		 */
+		spin_unlock_irq(&current->sig->siglock);
+		read_unlock(&tasklist_lock);
+		return -ERESTARTSYS;
+	}

 	if (oact)
 		*oact = *k;
@@ -1683,25 +1870,22 @@ do_sigaction(int sig, const struct k_sigaction *act, struct k_sigaction *oact)
 		 *   pending and whose default action is to ignore the signal
 		 *   (for example, SIGCHLD), shall cause the pending signal to
 		 *   be discarded, whether or not it is blocked"
-		 *
-		 * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
-		 * signal isn't actually ignored, but does automatic child
-		 * reaping, while SIG_DFL is explicitly said by POSIX to force
-		 * the signal to be ignored.
-		 */
-
-		if (k->sa.sa_handler == SIG_IGN
-		    || (k->sa.sa_handler == SIG_DFL
-			&& (sig == SIGCONT ||
-			    sig == SIGCHLD ||
-			    sig == SIGWINCH ||
-			    sig == SIGURG))) {
-			if (rm_sig_from_queue(sig, current))
-				recalc_sigpending();
+		 */
+
+		if (k->sa.sa_handler == SIG_IGN ||
+		    (k->sa.sa_handler == SIG_DFL && sig_kernel_ignore(sig))) {
+			struct task_struct *t = current;
+			rm_from_queue(sigmask(sig), &t->sig->shared_pending);
+			do {
+				rm_from_queue(sigmask(sig), &t->pending);
+				recalc_sigpending_tsk(t);
+				t = next_thread(t);
+			} while (t != current);
 		}
 	}
-
 	spin_unlock_irq(&current->sig->siglock);
+	read_unlock(&tasklist_lock);
+
 	return 0;
 }


--- a/kernel/suspend.c
+++ b/kernel/suspend.c
@@ -65,7 +65,6 @@
 #include <asm/pgtable.h>
 #include <asm/io.h>

-extern void signal_wake_up(struct task_struct *t);
 extern int sys_sync(void);

 unsigned char software_suspend_enabled = 0;
@@ -220,7 +219,7 @@ int freeze_processes(void)
 			   without locking */
 			p->flags |= PF_FREEZE;
 			spin_lock_irqsave(&p->sig->siglock, flags);
-			signal_wake_up(p);
+			signal_wake_up(p, 0);
 			spin_unlock_irqrestore(&p->sig->siglock, flags);
 			todo++;
 		} while_each_thread(g, p);