- 14 Sep, 2002 1 commit
-
-
Ingo Molnar authored
I fixed up the 'remove thread group inferiors from the tasklist' patch. I think I managed to find a reasonably good construct to iterate over all threads:

    do_each_thread(g, p) { ... } while_each_thread(g, p);

The only caveat with this is that the construct suggests a single loop - while it's two loops internally - and 'break' will not work. I added a comment to sched.h that warns about this, but perhaps it would help more to have naming that suggests two loops:

    for_each_process_do_each_thread(g, p) { ... } while_each_thread(g, p);

but this looks a bit too long. I don't know. We might as well use it all unrolled, with no helper macros - although with the above construct it's pretty straightforward to iterate over all threads in the system.
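For illustration, here is a minimal sketch of how such a pair of macros can hide two nested loops behind a single-looking construct (the list fields here are simplified stand-ins, not the actual sched.h definitions):

    /* sketch: iterate over every thread of every thread group */
    #define do_each_thread(g, p) \
        for (g = first_task; g; g = g->next_task) { \
            p = g; \
            do {

    #define while_each_thread(g, p) \
            } while ((p = p->thread_group_next) != g); \
        }

    /* a 'break' in the body exits only the inner do/while, not the for */

This is why 'break' silently does the wrong thing: it ends the thread walk for one process, but the outer walk over processes continues.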
-
- 13 Sep, 2002 3 commits
-
-
Andrew Morton authored
This adds support for synchronous iocbs and converts generic_file_read to use a sync iocb to call into generic_file_aio_read. The tests I've run with lmbench on a piii-866 showed no difference in file re-read speed when forced to use a completion path via aio_complete and an -EIOCBQUEUED return from generic_file_aio_read -- people with slower machines might want to test this to see if we can tune it any better. Also included is a bug fix for a missing call into the aio code from the fork code. This patch sets things up for making generic_file_aio_read actually asynchronous.
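A condensed sketch of what the converted read path looks like (helper names follow the aio work of this period; treat the details as illustrative rather than exact):

    /* sketch: synchronous read implemented on top of the aio path */
    ssize_t generic_file_read(struct file *filp, char *buf,
                              size_t count, loff_t *ppos)
    {
        struct kiocb kiocb;
        ssize_t ret;

        init_sync_kiocb(&kiocb, filp);          /* mark the iocb synchronous */
        ret = generic_file_aio_read(&kiocb, buf, count, *ppos);
        if (ret == -EIOCBQUEUED)
            ret = wait_on_sync_kiocb(&kiocb);   /* wait for aio_complete() */
        return ret;
    }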
-
Ingo Molnar authored
This implements the 'keep the initial thread around until every thread in the group exits' concept in a different, less intrusive way, along the lines of your suggestions. There is no exit_done completion handling anymore; freeing of the task is still done by wait4(). This has the following side-effect: detached threads/processes can only be started within a thread group, not in a standalone way. (This also fixes the bugs introduced by the ->exit_done code, which made it possible for a zombie task to be reactivated.) I've introduced the p->group_leader pointer, which can/will be used for other purposes in the future as well - since from now on the thread group leader always exists. Right now it's used to notify the parent of the thread group leader from the last non-leader thread that exits [if the thread group leader is already a zombie].
-
Ingo Molnar authored
I distilled the attached fix-patch from Daniel's bigger patch - it includes fixes for all currently known ptrace-related breakages, including bad behavior (a crash) if the tracer process dies unexpectedly.
-
- 11 Sep, 2002 1 commit
-
-
Ingo Molnar authored
This is another step toward better threading support under Linux: it implements the sys_exit_group() system call. It's a straightforward extension of the generic 'thread group' concept, an extension that also comes in handy for solving a number of problems when implementing POSIX threads.

POSIX exit() [the C library function] has the following semantics: all threads have to exit, and the waiting parent has to get the exit code that was specified for the exit() function. It also has to be ensured that every thread has truly finished its work by the time the parent gets the notification. The exit code has to be propagated properly to the parent even if a thread other than the group leader calls exit(). Normal single-thread exit is done via the pthread_exit() function, which calls sys_exit().

Previous incarnations of Linux POSIX threads implementations chose the following solution: send a 'thread management' signal to the thread group leader via tkill(), which then goes around and kills every thread in the group (except itself), then calls sys_exit() with the proper exit code. Both old libpthreads and NGPT use this solution. This works to a certain degree, unless a userspace threading library uses the initial thread for normal thread work [like the new libpthreads], which can cause the initial thread to exit prematurely. At that point the threading library has to catch the group leader in pthread_exit() and keep the management thread 'hanging around' artificially, waiting for the management signal. Besides being slightly confusing to users ('why is this thread still around?'), even this variant is not robust: if the initial thread is killed by the kernel (SIGSEGV or any other thread-specific event that triggers do_exit()) then the thread goes away without the thread library having a chance to intervene.

The sys_exit_group() syscall implements the mechanism within the kernel, which, besides robustness, is also *much* faster. Instead of the threading library having to tkill() every thread available, the kernel can use the already existing 'broadcast signal' capability. (The threading library cannot use broadcast signals because that would kill the initial thread as well.)

As a side-effect of the completion mechanism used by sys_exit_group(), it was also possible to make the initial thread hang around as a zombie until every other thread in the group has exited. A 'Z' state thread is much easier for users to understand - it's around because it has to wait for all other threads to exit first. And with the initial thread hanging around in a guaranteed way, there are three advantages:

- signals sent to the thread group via sys_kill() work again. Previously, if the initial thread exited then all subsequent sys_kill() calls to the group PID failed with -ESRCH.
- the get_pid() function got faster: it does not have to check for tgid collision anymore.
- procps has an easier job displaying threaded applications - since the thread group leader is always around, no thread group can 'hide' from procps just because the thread group leader has exited.

[ NOTE: the same mechanism can/will also be used by the upcoming threaded-coredumps patch. ]

There's also another (small) advantage for threading libraries: e.g. the new libpthreads does not even have any notion of a 'group of threads' anymore - it does not maintain any global list of threads. Via this syscall it can rely purely on the kernel to manage thread groups.

The patch itself makes some internal changes to the way a thread exits: the unhashing of the PID and the signal-freeing are now done atomically. This is needed to make sure the thread group leader unhashes itself precisely when the last thread group member has exited. (The sys_exit_group() syscall has been used by glibc's new libpthreads code for the past couple of weeks and the concept is working just fine.)
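A rough sketch of the shape of such a syscall; zap_other_threads() stands in for whatever broadcasts the group-wide kill, so read this as an outline rather than the actual implementation:

    /* sketch: kill every other thread in the group, then exit */
    asmlinkage long sys_exit_group(int error_code)
    {
        unsigned int exit_code = (error_code & 0xff) << 8;

        zap_other_threads(current); /* broadcast kill to the thread group */
        do_exit(exit_code);         /* leader lingers as 'Z' until last exit */
    }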
-
- 08 Sep, 2002 1 commit
-
-
Ingo Molnar authored
Support POSIX-compliant thread signals at the kernel level, with usable debugging (broadcast SIGSTOP, SIGCONT) and thread group management (broadcast SIGKILL), plus load-balancing of 'process' signals between threads for better signal performance. Changes:

- POSIX thread semantics for signals

There are 7 'types' of actions a signal can take: specific, load-balance, kill-all, kill-all+core, stop-all, continue-all and ignore. Depending on the POSIX specifications, each signal has one of the types defined for both the 'handler defined' and the 'handler not defined (kernel default)' case. Here is the table:

 ------------------------------------------------------------
 | signal            | userspace     | kernel        |
 ------------------------------------------------------------
 | SIGHUP            | load-balance  | kill-all      |
 | SIGINT            | load-balance  | kill-all      |
 | SIGQUIT           | load-balance  | kill-all+core |
 | SIGILL            | specific      | kill-all+core |
 | SIGTRAP           | specific      | kill-all+core |
 | SIGABRT/SIGIOT    | specific      | kill-all+core |
 | SIGBUS            | specific      | kill-all+core |
 | SIGFPE            | specific      | kill-all+core |
 | SIGKILL           | n/a           | kill-all      |
 | SIGUSR1           | load-balance  | kill-all      |
 | SIGSEGV           | specific      | kill-all+core |
 | SIGUSR2           | load-balance  | kill-all      |
 | SIGPIPE           | specific      | kill-all      |
 | SIGALRM           | load-balance  | kill-all      |
 | SIGTERM           | load-balance  | kill-all      |
 | SIGCHLD           | load-balance  | ignore        |
 | SIGCONT           | load-balance  | continue-all  |
 | SIGSTOP           | n/a           | stop-all      |
 | SIGTSTP           | load-balance  | stop-all      |
 | SIGTTIN           | load-balance  | stop-all      |
 | SIGTTOU           | load-balance  | stop-all      |
 | SIGURG            | load-balance  | ignore        |
 | SIGXCPU           | specific      | kill-all+core |
 | SIGXFSZ           | specific      | kill-all+core |
 | SIGVTALRM         | load-balance  | kill-all      |
 | SIGPROF           | specific      | kill-all      |
 | SIGPOLL/SIGIO     | load-balance  | kill-all      |
 | SIGSYS/SIGUNUSED  | specific      | kill-all+core |
 | SIGSTKFLT         | specific      | kill-all      |
 | SIGWINCH          | load-balance  | ignore        |
 | SIGPWR            | load-balance  | kill-all      |
 | SIGRTMIN-SIGRTMAX | load-balance  | kill-all      |
 ------------------------------------------------------------

As you can see from the table, signals that have handlers defined never get broadcast - they are either specific or load-balanced.

- CLONE_THREAD implies CLONE_SIGHAND

It does not make much sense to have a thread group that does not share signal handlers. In fact, in the patch I'm using the signal spinlock to lock access to the thread group. I made the siglock IRQ-safe, thus we can load-balance signals from interrupt contexts as well. (We cannot take the tasklist lock in write mode from IRQ handlers.) This is not as clean as I'd like it to be, but it's the best I could come up with so far.

- thread group list management reworked

Threads are now removed from the group when the thread is unhashed from the PID table. This makes the most sense. It also helps another feature that relies on an intact thread group list: multithreaded coredumps.

- child reparenting reworked

The O(N) algorithm in forget_original_parent() causes massive performance problems if a large number of threads exit from the group. Performance improves more than 10-fold if the following simple rules are followed instead:
  - reparent children to the *previous* thread [exiting or not]
  - if a thread is detached, then reparent to init.

- fast broadcasting of kernel-internal SIGSTOP, SIGCONT, SIGKILL, etc.

Kernel-internal broadcast signals are a potential DoS problem, since they might generate massive amounts of GFP_ATOMIC allocations of siginfo structures. The important thing to note is that the siginfo structure does not actually have to be allocated and queued: the signal processing code has all the information it needs, and none of these signals carries any payload in the siginfo structure. This makes a broadcast SIGKILL a very simple operation: all threads get bit 9 (SIGKILL) set in their pending bitmask. The speedup due to this was significant - and the robustness win is invaluable.

- sys_execve() should not kill off 'all other' threads

The 'exec kills all threads if the master thread does the exec()' rule is a POSIX(-ish) thing that should not be hardcoded in the kernel in this case. To handle POSIX exec() semantics, glibc uses a special syscall which kills 'all but self' threads: sys_exit_allbutself(). The straightforward exec() implementation just calls sys_exit_allbutself() and then sys_execve(). (This syscall is also used internally if the thread group leader sys_exit()s or sys_exec()s, to ensure the integrity of the thread group.)
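A minimal sketch of the fast broadcast path described above - no siginfo is allocated or queued, the pending bit alone is enough (next_thread() and signal_wake_up() are assumed helpers):

    /* sketch: broadcast SIGKILL by setting the pending bit directly */
    static void broadcast_sigkill(struct task_struct *leader)
    {
        struct task_struct *t = leader;

        do {
            sigaddset(&t->pending.signal, SIGKILL); /* set bit 9 */
            signal_wake_up(t);      /* pull the thread out of any sleep */
        } while ((t = next_thread(t)) != leader);
    }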
-
- 07 Sep, 2002 1 commit
-
-
Daniel Jacobowitz authored
Here are the changes I have:
- Fix some bugs I introduced in zap_thread
- Improve the check for traced children in sys_wait4
- Fix parent links when using CLONE_PTRACE

My thanks to OGAWA Hirofumi for pointing out the first bit. The only other issue I know of is something else Hirofumi pointed out earlier: there are problems when a tracing process dies unexpectedly. I'll come back to that later.
-
- 05 Sep, 2002 1 commit
-
-
Ingo Molnar authored
This is the pid-max patch; the one I sent for 2.5.31 was botched. I have removed the 'once' debugging stupidity - now PIDs start at 0 again. Also, for an unknown reason the previous patch missed the hunk with the declaration of DEFAULT_PID_MAX, which made it not compile ...
-
- 30 Aug, 2002 1 commit
-
-
Ingo Molnar authored
This moves CLONE_SETTID and CLONE_CLEARTID handling into kernel/fork.c, where it belongs. [CLONE_SETTLS is x86-specific and thus remains in the per-arch process.c] This makes support for these two new flags much easier: architectures only have to pass in the user_tid pointer.
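In rough terms the generic part in copy_process() reduces to a few lines (user_tid is the pointer named above; placement and error handling are omitted in this sketch):

    /* sketch: generic CLONE_SETTID / CLONE_CLEARTID handling in fork.c */
    if (clone_flags & CLONE_SETTID)
        put_user(p->pid, user_tid);  /* report the child TID to userspace */
    if (clone_flags & CLONE_CLEARTID)
        p->user_tid = user_tid;      /* remembered, cleared when p exits */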
-
- 23 Aug, 2002 1 commit
-
-
David S. Miller authored
- futex uses int as its atomic word type; we pass user_tid in to the futex routines, so the types must match.
-
- 19 Aug, 2002 1 commit
-
-
Ingo Molnar authored
The attached patch updates a number of items:
- adds cleanups suggested by Christoph Hellwig: needed unlikely() statements, a superfluous #define, and line-length problems.
- splits up the global ptrace list into per-task ptrace lists. This was pretty straightforward, and it makes the worst-case exit() latency O(nr_children).

The per-task ptrace lists unearthed a bug that the previous code did not take care of: tasks on the ptrace list have to be correctly reparented as well. This patch passed my stress-tests as well.
-
- 17 Aug, 2002 1 commit
-
-
Ingo Molnar authored
This updates the CLONE_CLEARTID case to use futexes, making it easier to wait for a thread's exit. glibc/pthreads has been updated to use the TID futex; this removes an extra system call and also simplifies the pthread_join() code. The pthreads test code works just fine with the new kernel and does not work with a kernel that lacks the futex wakeup, so the new path is demonstrably being exercised.
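The exit side can be pictured like this (a sketch, assuming the pointer is stashed in the task struct as user_tid and that a futex_wake() helper is available):

    /* sketch: in mm_release(), run when an exiting thread drops its mm */
    if (tsk->user_tid) {
        put_user(0, tsk->user_tid);   /* clear the TID in userspace... */
        futex_wake(tsk->user_tid, 1); /* ...and wake a pthread_join()er */
    }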
-
- 15 Aug, 2002 2 commits
-
-
Ingo Molnar authored
You have applied my independent-pointer patch already, but I think your CLEARTID variant is the most elegant solution: it reuses a clone argument, thus reducing the number of arguments, and it's also a nice conceptual pair to the existing SETTID call. And the TID field can double as a 'usage' field, because the TID (PID) can never be 0, reducing the number of fields in the TCB. And we can change the userspace locking code to use the TID field, no problem.
-
Paul Larson authored
Include tgid when finding next_safe in get_pid()
-
- 13 Aug, 2002 3 commits
-
-
Andrew Morton authored
- I changed the sector_t thing in max_block to use davem's approach. I agree with Anton, but making it explicit doesn't hurt.
- Remove a dead comment in copy_strings.

Old stuff:

- Remove the IO error warning in end_buffer_io_sync(). Failed READA attempts trigger it.
- Emit a warning when an ext3 filesystem is being mounted as ext2. We have had quite a few problem reports related to this, mainly arising from initrd problems. And mount(8) tends to report the fstype from /etc/fstab rather than reporting what has really happened.
- Fix some bogosity which I added to max_block():
  - `size' doesn't need to be sector_t
  - `retval' should not be initialised to "~0UL" because that is 0x00000000ffffffff with a 64-bit sector_t.
- Allocate task_structs with GFP_KERNEL, as discussed.
- Convert the EXPORT_SYMBOL for generic_file_direct_IO() to EXPORT_SYMBOL_GPL. That was only exported as a practicality for the raw driver.
- Make the loop thread run balance_dirty_pages() after dirtying the backing file, so it will perform writeback of the backing file when dirty memory levels are high. Export balance_dirty_pages to GPL modules for this. This makes loop work a lot better - I suspect it broke when callers of balance_dirty_pages() started writing back only their own queue.
- There are many page allocation failures under heavy loop writeout, coming from blk_queue_bounce()'s allocation from the page_pool mempool. So: disable page allocation warnings around the initial atomic allocation attempt in mempool_alloc() - the one where __GFP_WAIT and __GFP_IO were turned off. That one can easily fail.
- Add some commentary in block_write_full_page()
-
Ingo Molnar authored
This implements CLONE_VM_RELEASE, which lets the child release the 'user VM' at mm_release() time.
-
Ingo Molnar authored
The attached patch implements the per-CPU thread-structure cache to do detached exit, used when the parent does not want to be notified of the child's exit via a signal.
-
- 08 Aug, 2002 1 commit
-
-
Linus Torvalds authored
-
- 30 Jul, 2002 1 commit
-
-
Linus Torvalds authored
copy_process() just copies the process; it doesn't actually start it. This is in preparation for doing an "atomically start process on CPU X" operation, or other cases where we want to change the state of the process before we actually start running it.
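Schematically, do_fork() then becomes a copy step plus an explicit start (argument lists abbreviated; wake_up_forked_process() stands in for whatever makes the child runnable):

    /* sketch: copy first, adjust state, only then start the child */
    p = copy_process(clone_flags, stack_start, regs, stack_size);
    if (!IS_ERR(p)) {
        /* child is fully constructed but not yet running; its state
         * (e.g. target CPU) can still be changed safely here */
        wake_up_forked_process(p);
    }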
-
- 29 Jul, 2002 1 commit
-
-
Hugh Dickins authored
Update the documentation and remove the FIXME comment from fork.c, now that the accounting is right.
-
- 28 Jul, 2002 1 commit
-
-
Andrew Morton authored
Alan's overcommit patch, brought to 2.5 by Robert Love. Can't say I've tested its functionality at all, but it doesn't crash; it has been in -ac and RH kernels for some time and I haven't observed any of its functions on profiles. "So what is strict VM overcommit? We introduce new overcommit policies that attempt to never succeed an allocation that cannot be fulfilled by the backing store, and consequently never OOM. This is achieved through strict accounting of the committed address space and a policy to allow/refuse allocations based on that accounting. In the strictest of modes, it should be impossible to allocate more memory than is available, and impossible to OOM. All memory failures should be pushed down to the allocation routines -- malloc, mmap, etc. The new modes are available via sysctl (same as before). See Documentation/vm/overcommit-accounting for more information."
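The strict mode boils down to an accounting check of roughly this shape (a sketch; the field names and the mode numbering are assumptions modeled on the description above):

    /* sketch: refuse an allocation the backing store cannot cover */
    int vm_enough_memory(long pages)
    {
        if (sysctl_overcommit_memory == 2) {    /* strict accounting mode */
            unsigned long allowed = total_swap_pages +
                (totalram_pages * sysctl_overcommit_ratio) / 100;

            if (atomic_read(&vm_committed_space) + pages > allowed)
                return 0;   /* fail the malloc/mmap instead of OOMing */
        }
        atomic_add(pages, &vm_committed_space);
        return 1;
    }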
-
- 24 Jul, 2002 2 commits
-
-
Ingo Molnar authored
- introduce a new type of context-switch locking; this is a must-have for ia64 and sparc64.
- load_balance() bug noticed by Scott Rhine and myself: scan the whole list to find the imbalance number of tasks, not just the tail of the list.
- sched_yield() fix: use current->array, not rq->active.
-
Ingo Molnar authored
- init thread needs to have a preempt_count of 1 until sched_init(). (William Lee Irwin III)
- clean up the irq-mask macros. (Linus)
- add barrier() to irq_enter() and irq_exit(). (based on Oleg Nesterov's comment)
- move the irqs-off check into preempt_schedule() and remove CONFIG_DEBUG_IRQ_SCHEDULE.
- remove spin_unlock_no_resched() and comment the affected places more aggressively.
- slab.c needs to spin_unlock_no_resched(), instead of spin_unlock(). (It also has to check for preemption in the right spot.) This should fix the memory corruption.
- irq_exit() needs to run softirqs if interrupts are not active - the previous patch ran them when preempt_count() was 0, which is incorrect.
- spinlock macros are updated to enable preemption after enabling interrupts, besides avoiding false positive warnings.
- fork.c has to call scheduler_tick() with preemption disabled - otherwise scheduler_tick()'s spin_unlock can preempt!
- irqs_disabled() macro introduced.
- [ all other local_irq_enable() or sti instances conditional on CONFIG_DEBUG_IRQ_SCHEDULE are to fix false positive warnings. ]
- fix buggy in_softirq(). Fortunately the bug made the test broader, which didn't result in algorithmic breakage, just suboptimal performance.
- move do_softirq() processing into irq_exit() => this also fixes the softirq processing bugs present in apic.c IRQ handlers that did not test for softirqs after irq_exit().
- simplify local_bh_enable().
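The "move do_softirq() processing into irq_exit()" item can be pictured as follows (close in spirit to the code of this era, but a sketch, not the verbatim macro):

    /* sketch: run softirqs once we are fully out of hard-irq context */
    #define irq_exit()                                                  \
    do {                                                                \
        preempt_count() -= IRQ_EXIT_OFFSET;                             \
        if (!in_interrupt() && softirq_pending(smp_processor_id()))    \
            do_softirq();                                               \
        preempt_enable_no_resched();                                    \
    } while (0)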
-
- 22 Jul, 2002 1 commit
-
-
Ingo Molnar authored
Make people use the proper cli/sti replacements
-
- 19 Jul, 2002 2 commits
-
-
Greg Kroah-Hartman authored
-
Andrew Morton authored
This is the "minimal rmap" patch, writen by Rik, ported to 2.5 by Craig Kulsea. Basically, before: When the page reclaim code decides that is has scanned too many unreclaimable pages on the LRU it does a scan of process virtual address spaces for pages to add to swapcache. ptes pointing at the page are unmapped as the scan proceeds. When all ptes referring to a page have been unmapped and it has been written to swap the page is reclaimable. after: When an anonymous page is encountered on the tail of the LRU we use the rmap to see if it hasn't been referenced lately. If so then add it to swapcache. When the page is again encountered on the LRU, if it is still unreferenced then try to unmap all ptes which refer to it in one hit, and if it is clean (ie: on swap) then free it. The rest of the VM - list management, the classzone concept, etc remains unchanged. There are a number of things which the per-page pte chain could be used for. Bill Irwin has identified the following. (1) page replacement no longer goes around randomly unmapping things (2) referenced bits are more accurate because there aren't several ms or even seconds between find the multiple pte's mapping a page (3) reduces page replacement from O(total virtually mapped) to O(physical) (4) enables defragmentation of physical memory (5) enables cooperative offlining of memory for friendly guest instance behavior in UML and/or LPAR settings (6) demonstrable benefit in performance of swapping which is common in end-user interactive workstation workloads (I don't like the word "desktop"). c.f. Craig Kulesa's post wrt. swapping performance (7) evidence from 2.4-based rmap trees indicates approximate parity with mainline in kernel compiles with appropriate locking bits (8) partitioning of physical memory can reduce the complexity of page replacement searches by scanning only the "interesting" zones implemented and merged in 2.4-based rmap (9) partitioning of physical memory can increase the parallelism of page replacement searches by independently processing different zones implemented, but not merged in 2.4-based rmap (10) the reverse mappings may be used for efficiently keeping pte cache attributes coherent (11) they may be used for virtual cache invalidation (with changes) (12) the reverse mappings enable proper RSS limit enforcement implemented and merged in 2.4-based rmap The code adds a pointer to struct page, consumes additional storage for the pte chains and adds computational expense to the page reclaim code (I measured it at 3% additional load during streaming I/O). The benefits which we get back for all this are, I must say, theoretical and unproven. If it has real advantages (or, indeed, disadvantages) then why has nobody demonstrated them? There are a number of things remaining to be done: 1: Demonstrate the above advantages. 2: Make it work with pte-highmem (Bill Irwin is signed up for this) 3: Don't add pte_chains to non-shared pages optimisation (Dave McCracken's patch does this) 4: Move the pte_chains into highmem too (Bill, I guess) 5: per-cpu pte_chain freelists (Rik?) 6: maybe GC the pte_chain backing pages. (Seems unavoidable. Rik?) 7: multithread the page reclaim code. (I have patches). 8: clustered add-to-swap. Not sure if I buy this. anon pages are often well-ordered-by-virtual-address on the LRU, so it "just works" for benchmarky loads. But there may be some other loads... 9: Fix bad IO latency in page reclaim (I have lame patches) 10: Develop tuning tools, use them. 
11: The nightly updatedb run is still evicting everything.
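The central data structure is small; a sketch (simplified relative to the real patch):

    /* sketch: per-page chain of all ptes currently mapping the page */
    struct pte_chain {
        struct pte_chain *next;
        pte_t *ptep;            /* one pte that maps this page */
    };

    /* page_add_rmap() links a pte into page->pte_chain at fault time;
     * try_to_unmap() walks the chain and clears every pte in one hit */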
-
- 01 Jul, 2002 1 commit
-
-
Linus Torvalds authored
Stop using "struct tms" internally - always use timer ticks (or one of the sane timeval/timespec types) instead. Explicitly convert to clock_t when copying to user space for the old broken interfaces that still use "clock_t". Clean up and unify jiffies<->timeval conversion.
-
- 20 Jun, 2002 1 commit
-
-
Stephen Rothwell authored
dup_task_struct is defined and used only in kernel/fork.c.
-
- 17 Jun, 2002 1 commit
-
-
Rusty Russell authored
This patch removes the concept of "logical" CPU numbers, in preparation for CPU hotplugging.
-
- 09 Jun, 2002 1 commit
-
-
Russell King authored
Since namespace.h needs the contents of dcache, task struct and semaphores, it seems sensible to include those files in namespace.h. For the future: if the task_struct in sched.h is split into its own include file, namespace.h could include that file, but namespace.h will also need asm/semaphore.h.
-
- 03 Jun, 2002 1 commit
-
-
Robert Love authored
This patch removes the whole wq_lock_t abstraction, forcing the behavior to be that of a standard spinlock and changes all the wq_lock code in the tree appropriately. Removes lots of code - always a Good Thing to me. New behavior is same as previous behavior (USE_RW_WAIT_QUEUE_SPINLOCK unset).
-
- 13 May, 2002 1 commit
-
-
Rusty Russell authored
This changes do_fork() to return the task struct, rather than the PID. Also changes CLONE_PID ("if my pid is 0, copy it") to CLONE_IDLETASK ("set child's pid to zero"), and disallows access to the flag from user mode.
-
- 07 May, 2002 1 commit
-
-
Colin Gibbs authored
- If dup_mmap fails we will try to destroy_context before init_new_context occurs. Platforms with non-trivial init_new_context can explode because of this. The fix is to invoke init_new_context before dup_mmap.
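Schematically, the fix is just an ordering change in copy_mm() (labels and error paths abbreviated in this sketch):

    /* sketch: initialise the MMU context before copying the mappings */
    if (init_new_context(tsk, mm))
        goto fail_nomem;    /* nothing to tear down yet */
    if (dup_mmap(mm))
        goto free_pt;       /* destroy_context() now sees a valid context */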
-
- 28 Apr, 2002 1 commit
-
-
Dave Jones authored
Originally from Manfred Spraul:
* dynamically grow the LDT

Every app that's linked against libpthread right now allocates a full 64 kB LDT, without proper error handling, and always from the vmalloc area.
-
- 23 Apr, 2002 2 commits
-
-
Dave Olien authored
As we discussed some time ago, here is a patch for the SEM_UNDO change that can be applied to linux-2.5.9.
-
Kanoj Sarcar authored
Make sure that flush_tlb_range is called with PTL held. Also, make sure no new threads can start up in user mode while a tlb_gather_mmu is in progress.
-
- 22 Apr, 2002 1 commit
-
-
Alexander Viro authored
- sane dentry retention. Namely, we don't kill /proc/<pid> dentries at the first opportunity (as the current tree does). Instead we do the following:
  * ->d_delete() kills it only if the process is already dead.
  * all ->lookup() in proc/base.c end with checking if the process is still alive, and unhash if it isn't.
  * proc_pid_lookup() (lookup for /proc/<pid>) caches a reference to the dentry in task_struct. It's _not_ counted in ->d_count.
  * ->d_iput() resets said reference to NULL.
  * release_task() (burying a zombie) checks if there is a cached reference and, if there is, shrinks the subtree.
  * tasklist_lock is used for exclusion.

That way we are guaranteed that after release_task() all dentries in /proc/<pid> will go away as soon as possible; OTOH, before release_task() we have the normal retention policy - they go away under memory pressure, with the same rules as for dentries on any other fs.
-
- 24 Mar, 2002 1 commit
-
-
Richard Henderson authored
Move the cache flushing routines from asm/pgtable.h and/or asm/pgalloc.h to asm/cacheflush.h, and the tlb flushing routines to asm/tlbflush.h.
-
- 15 Mar, 2002 1 commit
-
-
David Howells authored
This patch (#1) just converts the task_struct to use struct list_head rather than direct pointers for maintaining the children list.
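With list.h lists, the linkage looks roughly like this (a sketch; traversal uses the standard list_for_each()/list_entry() helpers):

    /* sketch: in struct task_struct */
    struct list_head children;  /* head of this task's list of children */
    struct list_head sibling;   /* this task's entry in its parent's list */

    /* walking a parent's children */
    struct list_head *entry;
    list_for_each(entry, &parent->children) {
        struct task_struct *child =
            list_entry(entry, struct task_struct, sibling);
        /* operate on child */
    }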
-
- 21 Feb, 2002 1 commit
-
-
Ingo Molnar authored
- make vma->vm_next_share and vma->vm_pprev_share a proper list.h list as well.
-