- 08 Feb, 2008 40 commits
-
-
Eric W. Biederman authored
pid_vnr returns the user space pid with respect to the pid namespace the struct pid was allocated in. What we want before we return a pid to user space is the user space pid with respect to the pid namespace of current. pid_vnr is a very nice optimization but because it isn't quite what we want it is easy to use pid_vnr at times when we aren't certain the struct pid was allocated in our pid namespace. Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c the parent process reported in the core dumps and the parent process in get_signal_to_deliver. So unless the performance impact is huge having an interface that does what we want instead of always what we want should be much more reliable and much less error prone. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Eric W. Biederman authored
This modifies do_wait and eligible child to take a pair of enum pid_type and struct pid *pid to precisely specify what set of processes are eligible to be waited for, instead of the raw pid_t value from sys_wait4. This fixes a bug in sys_waitid where you could not wait for children in just process group 1. This fixes a pid namespace crossing case in eligible_child. Allowing us to wait for a processes in our current process group even if our current process group == 0. This allows the no child with this pid case to be optimized. This allows us to optimize the pid membership test in eligible child to be optimized. This even closes a theoretical pid wraparound race where in a threaded parent if two threads are waiting for the same child and one thread picks up the child and the pid numbers wrap around and generate another child with that same pid before the other thread is scheduled (teribly insanely unlikely) we could end up waiting on the second child with the same pid# and not discover that the specific child we were waiting for has exited. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
The previous bugfix was not optimal, we shouldn't care about group stop when we are the only thread or the group stop is in progress. In that case nothing special is needed, just set PF_EXITING and return. Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify(). So, from the performance POV the only difference is that we don't trust !signal_pending() until we take ->siglock. But this in fact fixes another ___pure___ theoretical minor race. __group_complete_signal() finds the task without PF_EXITING and chooses it as the target for signal_wake_up(). But nothing prevents this task from exiting in between without noticing the pending signal and thus unpredictably delaying the actual delivery. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Eric's "fix clone(CLONE_NEWPID)" eliminated the last reason for this hack. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
do_signal_stop() counts all sub-thread and sets ->group_stop_count accordingly. Every thread should decrement ->group_stop_count and stop, the last one should notify the parent. However a sub-thread can exit before it notices the signal_pending(), or it may be somewhere in do_exit() already. In that case the group stop never finishes properly. Note: this is a minimal fix, we can add some optimizations later. Say we can return quickly if thread_group_empty(). Also, we can move some signal related code from exit_notify() to exit_signals(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
As Eric pointed out, there is no problem with init starting with sid == pgid == 0, and this was historical linux behavior changed in 2.6.18. Remove kernel_init()->__set_special_pids(), this is unneeded and complicates the rules for sys_setsid(). This change and the previous change in daemonize() mean that /sbin/init does not need the special "session != 1" hack in sys_setsid() any longer. We can't remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so update the comment only. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Daemonized kernel threads run in the init's session. This doesn't match the behaviour of kthread_create()'ed threads, and this is one of the 2 reasons why we need a special hack in sys_setsid(). Now that set_special_pids() was changed to use struct pid, not pid_t, we can use init_struct_pid and set 0,0 special pids. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Change set_special_pids() to work with struct pid, not pid_t from global name space. This again speedups and imho cleanups the code, also a preparation for the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
sys_setsid() still deals with pid_t's from the global namespace. This means that the "session > 1" check can't help for sub-namespace init, setsid() can't succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links. Remove the usage of task_struct->pid and convert the code to use "struct pid". This also simplifies and speedups the code, saves one find_pid(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice versa. Use "struct pid" more consistently. Saves one find_vpid() and eliminates the explicit usage of ->nsproxy->pid_ns. Imho, cleanups the code. Also use the same_thread_group() helper. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense. The exit_state was EXIT_ZOMBIE when the function was called, and another thread can change it to EXIT_DEAD right after the check. The second condition is not possible, detached non-traced threads were already filtered out by eligible_child(), we didn't drop tasklist since then. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Surprise, the other two wait_task_*() functions also abuse the task_pid_nr_ns() function, and may cause read-after-free or report nr == 0 in wait_task_continued(). wait_task_zombie() doesn't have this problem, but it is still better to cache pid_t rather than call task_pid_nr_ns() three times on the saved pid_namespace. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Imho, the current usage of security_task_wait() is not logical. Suppose we have the single child p, and security_task_wait(p) return -EANY. In that case waitpid(-1) returns this error. Why? Isn't it better to return ECHLD? We don't really have reapable children. Now suppose that child was stolen by gdb. In that case we find this child on ->ptrace_children and set flag = 1, but we don't check that the child was denied. So, do_wait(..., WNOHANG) returns 0, this doesn't match the behaviour above. Without WNOHANG do_wait() blocks only to return the error later, when the child will be untraced. Inho, really strange. I think eligible_child() should return the error only if the child's pid was requested explicitly, otherwise we should silently ignore the tasks which were nacked by security_task_wait(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
eligible_child() == 2 means delay_group_leader(). With the previous patch this only matters for EXIT_ZOMBIE task, we can move that special check to the only place it is really needed. Also, with this patch we don't skip security_task_wait() for the group leaders in a non-empty thread group. I don't really understand the exact semantics of security_task_wait(), but imho this change is a bugfix. Also rearrange the code a bit to kill an ugly "check_continued" backdoor. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Roland McGrath <roland@redhat.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
wait_task_stopped() doesn't need the "delay_group_leader" parameter. If the child is not traced it must be a group leader. With or without subthreads ->group_stop_count == 0 when the whole task is stopped. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Mika Penttila <mika.penttila@kolumbus.fi> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
If the tracer is gone and we are not going to stop, ptrace_stop() sets ->exit_code = nostop_code. However, the tracer could actually clear the exit code before detaching. In that case get_signal_to_deliver() "resends" the signal which was cancelled by the debugger. For example, it is possible that a quick PTRACE_ATTACH + PTRACE_DETACH can leave the tracee in STOPPED state. Change the behaviour of ptrace_stop(). If the caller is ptrace notify(), we should always clear ->exit_code. If the caller is get_signal_to_deliver(), we should not touch it at all. To do so, change the nonstop_code parameter to "bool clear_code" and change the callers accordingly. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Every branch if the main "if" statement does the same code at the end. Move it down. Also, fix the indentation. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
wait_task_stopped() has multiple races with SIGCONT/SIGKILL. tasklist_lock does not pin the child in TASK_TRACED/TASK_STOPPED stated, almost all info reported (including exit_code) may be wrong. In fact, the code under write_lock_irq(tasklist_lock) is not safe. The child may be PTRACE_DETACH'ed at this time by another subthread, in that case it is possible we are no longer its ->parent. Change wait_task_stopped() to take ->siglock before inspecting the task. This guarantees that the child can't resume and (for example) clear its ->exit_code, so we don't need to use xchg(&p->exit_code) and re-check. The only exception is ptrace_stop() which changes ->state and ->exit_code without ->siglock held during abort. But this can only happen if both the tracer and the tracee are dying (coredump is in progress), we don't care. With this patch wait_task_stopped() doesn't move the child to the end of the ->parent list on success. This optimization could be restored, but in that case we have to take write_lock(tasklist) and do some nasty checks. Also change the do_wait() since we don't return EAGAIN any longer. [akpm@linux-foundation.org: fix up after Willy renamed everything] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
If the tracer went away (may_ptrace_stop() failed), ptrace_stop() drops tasklist and then changes the ->state from TASK_TRACED to TASK_RUNNING. This can fool another tracer which attaches to us in between. Change the ->state under tasklist_lock to ensure that ptrace_check_attach() can't wrongly succeed. Also, remove the unnecessary mb(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
It is not possible to see the PT_PTRACED task without ->signal/sighand under tasklist_lock, release_task() does ptrace_unlink() first. If the task was already released before, ptrace_attach() can't succeed and set PT_PTRACED. Remove this check. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Now that my_ptrace_child() is trivial we can use the "p->ptrace & PT_PTRACED" inline and simplify the corresponding logic in do_wait: we can't find the child in TASK_TRACED state without PT_PTRACED flag set, ptrace_untrace() either sets TASK_STOPPED or wakes up the tracee. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Since the patch "Fix ptrace_attach()/ptrace_traceme()/de_thread() race" commit f5b40e36 we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock. This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer" which were needed to catch the "ptrace attach is in progress" case. We can also remove the flag itself since nobody else uses it. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pierre Peiffer authored
sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an ipc_namespace is released to free all ipcs of each type. But in fact, they do the same thing: they loop around all ipcs to free them individually by calling a specific routine. This patch proposes to consolidate this by introducing a common function, free_ipcs(), that do the job. The specific routine to call on each individual ipcs is passed as parameter. For this, these ipc-specific 'free' routines are reworked to take a generic 'struct ipc_perm' as parameter. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pierre Peiffer authored
Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids' are dynamically allocated for each icp_namespace as the ipc_namespace itself (for the init namespace, they are initialized with pointers to static variables instead) It is so for historical reason: in fact, before the use of idr to store the ipcs, the ipcs were stored in tables of variable length, depending of the maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed size. As they are allocated in any cases for each new ipc_namespace, there is no gain of memory in having them allocated separately of the struct ipc_namespace. This patch proposes to make this table static in the struct ipc_namespace. Thus, we can allocate all in once and get rid of all the code needed to allocate and free these ipc_ids separately. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pierre Peiffer authored
These commands (SEM_STAT and IPC_STAT) are rather doing the same things (only the meaning of the id given as input and the return value differ). However, for the semaphores, they are handled in two different places (two different functions). This patch consolidates this for clarification by handling these both commands in the same place in semctl_nolock(). It also removes one unused parameter for this function. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pavel Emelyanov authored
ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be inline. Besides, they give no optimization being inline as they perform calls inside in any case. Moving them into ipc/util.c saves 500 bytes of vmlinux and shortens IPC internal API. $ ./scripts/bloat-o-meter vmlinux-orig vmlinux add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499) function old new delta ipcget - 392 +392 ipc_lock_check_down - 49 +49 ipc_lock_check - 49 +49 sys_semget 119 105 -14 sys_shmget 108 86 -22 sys_msgget 100 78 -22 do_msgsnd 665 631 -34 do_msgrcv 680 644 -36 do_shmat 771 733 -38 sys_msgctl 1302 1229 -73 ipcget_new 80 - -80 sys_semtimedop 1534 1452 -82 sys_semctl 2034 1922 -112 sys_shmctl 1919 1765 -154 ipcget_public 322 - -322 The ipcget() growth is the result of gcc inlining of currently static ipcget_new/_public. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Also removes a cflag comparison that caused some mode changes to get wrongly ignored Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alan Cox authored
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-