- 15 Sep, 2002 7 commits
-
-
Andrew Morton authored
The patch adds a "Mapped" field to /proc/meminfo - the amount of memory which is mapped into pagetables. This is a useful statistic to monitor when testing and observing the virtual memory system.
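A minimal userspace sketch of how such a field could be read back, assuming the usual "Name: value kB" line layout of /proc/meminfo. The helper names `meminfo_parse_line` and `meminfo_read_kb` are hypothetical and not part of the patch; on kernels that do not export "Mapped" the lookup simply reports that the field is absent.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/meminfo-style line such as "Mapped:  1234 kB".
 * Returns 1 and stores the number in *kb when the line starts with
 * the requested field name followed by ':', otherwise returns 0.
 * (Hypothetical helper, for illustration only.)
 */
static int meminfo_parse_line(const char *line, const char *field,
                              unsigned long *kb)
{
    size_t n = strlen(field);

    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return 0;
    return sscanf(line + n + 1, "%lu", kb) == 1;
}

/*
 * Scan /proc/meminfo for a field. Returns 1 on success, 0 when the
 * file cannot be opened or the running kernel does not export the field.
 */
static int meminfo_read_kb(const char *field, unsigned long *kb)
{
    char line[256];
    int found = 0;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return 0;
    while (!found && fgets(line, sizeof(line), f))
        found = meminfo_parse_line(line, field, kb);
    fclose(f);
    return found;
}
```

Calling `meminfo_read_kb("Mapped", &kb)` before and after a test run would give the kind of before/after comparison the message describes.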
-
Andrew Morton authored
From Hugh Dickins. Fix a leak in the /proc/meminfo:ReverseMaps accounting.
-
Andrew Morton authored
Rohit Seth's ia32 huge tlb pages patch. Anton Blanchard took a look at this today; he seemed happy with it and said he could borrow bits.
-
Andrew Morton authored
The /proc/meminfo:Buffers statistic is quite useful - it tells us how effective we are being at caching filesystem metadata. For example, increases in this figure are a measure of success of the slablru and buffer_head-limitation patches. The patch resurrects buffermem accounting. The metric is calculated on-demand, via a walk of the blockdev hashtable.
-
Andrew Morton authored
zap_page_range and truncate are the two main latency problems in the VM/VFS. The radix-tree-based truncate grinds that into the dust, but no algorithmic fixes for pagetable takedown have presented themselves... Patch from Robert Love. Attached patch implements a low latency version of "zap_page_range()". Calls with even moderately large page ranges result in very long lock held times and consequently very long periods of non-preemptibility. This function is in my list of the top 3 worst offenders. It is gross. This new version reimplements zap_page_range() as a loop over ZAP_BLOCK_SIZE chunks. After each iteration, if a reschedule is pending, we drop page_table_lock and automagically preempt. Note we cannot blindly drop the locks and reschedule (e.g. for the non-preempt case) since there is a possibility to enter this codepath holding other locks. ... I am sure you are familiar with all this, it's the same deal as your low-latency work. This patch implements the "cond_resched_lock()" as we discussed sometime back. I think this solution should be acceptable to you and Linus. There are other misc. cleanups, too. This new zap_page_range() yields latency too-low-to-benchmark: <<1ms.
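The chunked loop described above can be sketched in userspace roughly as follows. This is a toy model, not the kernel code: `struct toy_lock` stands in for page_table_lock, `sched_yield()` stands in for the scheduler, the `ZAP_BLOCK_SIZE` value is an assumed placeholder, and every `toy_` name is hypothetical.

```c
#include <sched.h>

#define ZAP_BLOCK_SIZE (256UL * 4096UL) /* assumed value, illustration only */

/* Toy stand-in for a spinlock; records how often it was dropped. */
struct toy_lock {
    int held;
    unsigned long breaks;
};

static void toy_lock_acquire(struct toy_lock *l) { l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { l->held = 0; }

/* Drop the lock, yield, and retake it, but only if a reschedule is pending. */
static void toy_cond_resched_lock(struct toy_lock *l, int resched_pending)
{
    if (!resched_pending)
        return;
    toy_lock_release(l);
    sched_yield();
    toy_lock_acquire(l);
    l->breaks++;
}

/* Example predicates for "is a reschedule pending?" */
static int always_pending(void) { return 1; }
static int never_pending(void) { return 0; }

/*
 * Walk [address, address + size) in ZAP_BLOCK_SIZE chunks, offering a
 * lock break after each chunk. Returns the number of chunks processed.
 */
static unsigned long toy_zap_range(struct toy_lock *l, unsigned long address,
                                   unsigned long size,
                                   int (*resched_pending)(void))
{
    unsigned long chunks = 0;

    toy_lock_acquire(l);
    while (size) {
        unsigned long block = size < ZAP_BLOCK_SIZE ? size : ZAP_BLOCK_SIZE;

        /* Real code would unmap [address, address + block) here. */
        chunks++;
        toy_cond_resched_lock(l, resched_pending());
        address += block;
        size -= block;
    }
    toy_lock_release(l);
    return chunks;
}
```

The point of the structure is that the lock hold time is bounded by the work for one chunk rather than by the size of the whole range.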
-
Linus Torvalds authored
-
Linus Torvalds authored
bk://ppc.bkbits.net/for-linus-ppc into home.transmeta.com:/home/torvalds/v2.5/linux
-
- 16 Sep, 2002 10 commits
-
-
Paul Mackerras authored
-
Paul Mackerras authored
This gets rid of ide_request/free_irq, ide_get/release_lock, ide_check/request/release_region etc.
-
Paul Mackerras authored
-
Paul Mackerras authored
There is a perfectly good one in drivers/ide/ide-iops.c now.
-
Paul Mackerras authored
and add exit_group to the syscall table.
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
into samba.org:/home/paulus/kernel/for-linus-ppc
-
- 15 Sep, 2002 10 commits
-
-
Ingo Molnar authored
The broadcast SIGKILL kept pending in the new thread as well, and killed it prematurely ...
-
Linus Torvalds authored
-
Ingo Molnar authored
This implements one of the last missing POSIX threading details - exec() semantics. Previous kernels had code that tried to handle it, but that code had a number of disadvantages: - it only worked if the exec()-ing thread was the thread group leader, creating an asymmetry. This does not work if the thread group leader has exited already. - it was racy: it sent a SIGKILL to every thread in the group but did not wait for them to actually process the SIGKILL. It did a yield() but that is not enough. All 'other' threads have to finish processing before we can continue with the exec(). This adds the same logic, but extended with the following enhancements: - works from non-leader threads just as much as the thread group leader. - waits for all other threads to exit before continuing with the exec(). - reuses the PID of the group. It would perhaps be a more generic approach to add a new syscall, sys_ungroup() - which would do largely what de_thread() does in this patch. But it's not really needed now - posix_spawn() is currently implemented via starting a non-CLONE_THREAD helper thread that does a sys_exec(). There's no API currently that needs a direct exec() from a thread - but it could be created (such as pthread_exec_np()). It would have the advantage of not having to go through a helper thread, but the difference is minimal.
-
Ingo Molnar authored
This fixes one more exit-time resource accounting issue - and it's also a speedup and a thread-tree (to-be thread-aware pstree) visual improvement. In the current code we reparent detached threads to the init thread. This worked but was not very nice in ps output: threads showed up as being related to init. There was also a resource-accounting issue, upon exit they update their parent's (ie. init's) rusage fields - effectively losing these statistics. Eg. 'time' under-reports CPU usage if the threaded app is Ctrl-C-ed prematurely. The solution is to reparent threads to the group leader - this is now very easy since we have p->group_leader cached and it's also valid all the time. It's also somewhat faster for applications that use CLONE_THREAD but do not use the CLONE_DETACHED feature.
-
Ingo Molnar authored
This fixes a number of bugs that broke ptrace: - wait4 must not inhibit TASK_STOPPED processes even for thread group leaders. - do_notify_parent() should not delay the notification of parents if the thread in question is ptraced. strace now works as expected for CLONE_THREAD applications as well.
-
Ingo Molnar authored
This optimizes sys_exit_group() to only take the siglock if it's a true thread group. Boots & works fine.
-
Ingo Molnar authored
This fixes three resource accounting related bugs introduced by detached threads: - the 'child CPU usage' fields were updated in wait4 until now - this was slightly buggy for a number of reasons, eg. if the exit_code writeout faults then it's possible to trigger this code multiple times. - those threads that do not go through wait4 were not properly accounted. - sched_exit() was incorrectly assuming that current == parent. In the detached case p->parent is the real parent. With this patch applied things like 'time' work again for new-style threaded apps.
-
Ingo Molnar authored
This fixes a clone-flags bug noticed by Roland McGrath. The current CLONE_DETACHED & CLONE_THREAD forcing code did things in the wrong order, which makes it possible to force an oops the following way: main () { syscall(120, 0x00400000); } Instead of changing the order of CLONE_SIGHAND and CLONE_THREAD flag forcing (which would fix the bug), the proper approach is to fail with -EINVAL if invalid combinations of clone flags are detected. This change does not affect existing applications.
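A sketch of what "fail with -EINVAL on invalid combinations" could look like. The flag values and the two dependency rules below (detached implies thread, thread implies shared signal handlers) are assumptions recalled from 2.5-era headers rather than taken from this message, and the `X_` prefixed names and `check_clone_flags` are hypothetical.

```c
#include <errno.h>

/* Assumed flag values; check the real headers before relying on them. */
#define X_CLONE_SIGHAND  0x00000800UL
#define X_CLONE_THREAD   0x00010000UL
#define X_CLONE_DETACHED 0x00400000UL

/*
 * Reject inconsistent flag combinations up front instead of silently
 * forcing the missing flags on. Returns 0 if the combination is
 * acceptable under the assumed rules, -EINVAL otherwise.
 */
static int check_clone_flags(unsigned long flags)
{
    /* Assumed rule: a detached task must be part of a thread group. */
    if ((flags & X_CLONE_DETACHED) && !(flags & X_CLONE_THREAD))
        return -EINVAL;

    /* Assumed rule: threads in a group must share signal handlers. */
    if ((flags & X_CLONE_THREAD) && !(flags & X_CLONE_SIGHAND))
        return -EINVAL;

    return 0;
}
```

Under these assumptions the reproducer's flag word, 0x00400000 on its own, is rejected by the first check before any flag forcing can run.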
-
Ingo Molnar authored
The attached patch (against BK-curr) fixes a sys_wait4() bug noticed by Ulrich Drepper. The kernel would not block properly if there are eligible children delayed due to the new delayed thread-group-leader logic. The solution is to introduce a new 'eligible child' type - and skip over delayed children but set the wait4 flag nevertheless. The libpthreads testcase that failed because of it now works fine.
-
Paul Mackerras authored
into samba.org:/home/paulus/kernel/for-linus-ppc
-
- 14 Sep, 2002 10 commits
-
-
Paul Mackerras authored
into samba.org:/home/paulus/kernel/for-linus-ppc
-
Linus Torvalds authored
- HT CPUs can share the MTRR state between cores - the code uses static variables that are shared
-
Linus Torvalds authored
into home.transmeta.com:/home/torvalds/v2.5/linux
-
Ingo Molnar authored
I fixed up the 'remove thread group inferiors from the tasklist' patch. I think I managed to find a reasonably good construct to iterate over all threads: do_each_thread(g, p) { ... } while_each_thread(g, p); the only caveat with this is that the construct suggests a single-loop - while it's two loops internally - and 'break' will not work. I added a comment to sched.h that warns about this, but perhaps it would help more to have naming that suggests two loops: for_each_process_do_each_thread(g, p) { ... } while_each_thread(g, p); but this looks a bit too long. I don't know. We might as well use it all unrolled and no helper macros - although with the above construct it's pretty straightforward to iterate over all threads in the system.
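The two-loops-in-one-construct caveat can be illustrated with a toy userspace analogue. The data structure and the `toy_` macros below are hypothetical and only mimic the shape of the construct (an outer `for` over groups whose body is a `do { } while` over threads); they are not the definitions from sched.h.

```c
#include <stddef.h>

struct toy_task {
    int id;
    struct toy_task *next_thread; /* circular list of threads in one group */
    struct toy_task *next_group;  /* NULL-terminated list of group leaders */
};

/* Outer loop over group leaders; its body is the do/while that follows. */
#define toy_do_each_thread(head, g, t) \
    for ((g) = (t) = (head); (g) != NULL; (g) = (t) = (g)->next_group) do

/* Inner loop: advance through the group until we are back at the leader. */
#define toy_while_each_thread(g, t) \
    while (((t) = (t)->next_thread) != (g))

/* Visits every thread of every group exactly once. */
static int count_all_threads(struct toy_task *head)
{
    struct toy_task *g, *t;
    int n = 0;

    toy_do_each_thread(head, g, t) {
        n++;
    } toy_while_each_thread(g, t);
    return n;
}

/*
 * 'break' only leaves the inner do/while, so the outer loop carries on
 * with the next group: this counts one visit per group, not one in total.
 */
static int count_with_break(struct toy_task *head)
{
    struct toy_task *g, *t;
    int n = 0;

    toy_do_each_thread(head, g, t) {
        n++;
        break;
    } toy_while_each_thread(g, t);
    return n;
}
```

With two groups, `count_with_break` visits two tasks even though the body looks like it stops after the first, which is the surprise the message warns about.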
-
Petr Vandrovec authored
This fixes an endless loop without schedule which happens as soon as smbd invokes fcntl64(7, F_SETLK64, ...). fcntl_setlk64 gets cmd F_SETLK64, not the F_SETLK tested in the loop. Maybe the return value from posix_lock_file should be changed to -EINPROGRESS or -EJUKEBOX instead of testing the passed cmd in callers, but this one-liner works too. If you prefer changing the posix_lock_file return value to clearly distinguish between -EAGAIN and lock request queued, I'll do that.
-
Ingo Molnar authored
On 13 Sep 2002, Paul Larson wrote:
> The nightly LTP test against the 2.5 kernel bk tree last night turned up
> some test failures we don't normally see. These failures did not show
> up in the run from the previous night.
[...]
> I found what was breaking this, looks like it was this change from your
> shared thread signals patch:
> - if (sig < 1 || sig > _NSIG ||
> -     (act && (sig == SIGKILL || sig == SIGSTOP)))
> + if (sig < 1 || sig > _NSIG || (act && sig_kernel_only(sig)))
This fixes this bug and a number of others in the same class - the signal behavior bitmasks should never be consulted before making sure that the signal is in the word range.
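A small userspace sketch of the "range check before bitmask" rule. The `TOY_` macros and `toy_sig_kernel_only` are hypothetical names, and the mask contents (SIGKILL and SIGSTOP) are taken from the quoted diff; the actual kernel macros may be shaped differently.

```c
#include <limits.h>
#include <signal.h>

#define TOY_WORD_BITS        (sizeof(unsigned long) * CHAR_BIT)
#define TOY_SIGMASK(sig)     (1UL << (sig))
#define TOY_KERNEL_ONLY_MASK (TOY_SIGMASK(SIGKILL) | TOY_SIGMASK(SIGSTOP))

/*
 * Returns 1 for signals that only the kernel may handle, 0 otherwise.
 * The range test must come first: shifting 1UL by a count that is
 * negative or not smaller than the word width is undefined in C, so
 * an out-of-range signal number must never reach the bitmask.
 */
static int toy_sig_kernel_only(int sig)
{
    if (sig < 1 || (unsigned int)sig >= TOY_WORD_BITS)
        return 0;
    return (TOY_SIGMASK(sig) & TOY_KERNEL_ONLY_MASK) != 0;
}
```

Folding the bounds check into the predicate itself means callers cannot forget it, which matches the "whole class of bugs" framing in the message.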
-
Ingo Molnar authored
This fixes the Mozilla SMP lockup in the exit path.
-
Neil Brown authored
The partition changes shifted a lot of indexes down one, but this one shouldn't have been shifted...
-
Paul Mackerras authored
into au1.ibm.com:/home/paulus/kernel/for-linus-ppc
-
Arnaldo Carvalho de Melo authored
. No need for the timer_running member on llc_timer, we only need it in one place, and timer_pending is equivalent. One more procom OS generalisation killed. . Move the skb->protocol assignment in llc_build_and_send_pkt routines and llc_ui_send_data to the caller, this is the common practice in Linux networking code (think netif_rx) and required to keep the request functions in psnap and p8022 simple. . Remove the rpt_status (report status) ev members, not used at all, not even in the original procom code. . Convert psnap and p8022 request functions to use llc_ui_build_and_send_ui_pkt, removing all the prim cruft.
-
- 13 Sep, 2002 3 commits
-
-
Andrew Morton authored
This adds support for synchronous iocbs and converts generic_file_read to use a sync iocb to call into generic_file_aio_read. The tests I've run with lmbench on a piii-866 showed no difference in file re-read speed when forced to use a completion path via aio_complete and an -EIOCBQUEUED return from generic_file_aio_read -- people with slower machines might want to test this to see if we can tune it any better. Also, a bug fix to correct a missing call into the aio code from the fork code is present. This patch sets things up for making generic_file_aio_read actually asynchronous.
-
Andrew Morton authored
This is Janet Morgan's patch which converts the readv/writev code to submit all segments for IO before waiting on them, rather than submitting each segment separately. This is a critical performance fix for O_DIRECT reads and writes. Prior to this change, O_DIRECT vectored IO was forced to wait for completion against each segment of the iovec rather than submitting all segments and waiting on the lot. ie: for ten segments, this code will be ten times faster. There will also be moderate improvements for buffered IO - smaller code paths, plus writev() only takes i_sem once. The patch ended up quite large unfortunately - turned out that the only sane way to implement this without duplicating significant amounts of code (the generic_file_write() bounds checking, all the O_DIRECT handling, etc) was to redo generic_file_read() and generic_file_write() to take an iovec/nr_segs pair rather than `buf, count'. New exported functions generic_file_readv() and generic_file_writev() have been added: ssize_t generic_file_readv(struct file *filp, const struct iovec *iov, unsigned long nr_segs, loff_t *ppos); ssize_t generic_file_writev(struct file *file, const struct iovec *iov, unsigned long nr_segs, loff_t * ppos); If a driver does not use these in their file_operations then they will continue to use the old readv/writev code, which sits in a loop calling fops->read() or fops->write(). ext2, ext3, JFS and the blockdev driver are currently using this capability. Some coding cleanups were made in fs/read_write.c. Mainly: - pass "READ" or "WRITE" around to indicate the direction of the operation, rather than the (confusing, inverted) VERIFY_READ/VERIFY_WRITE. - Use the identifier `nr_segs' everywhere to indicate the iovec length rather than `count', which is often used to indicate the number of bytes in the syscall. It was confusing the heck out of me. - Some cleanups to the raw driver.
- Some additional generality in fs/direct_io.c: the core `struct dio' used to be a "populate-and-go" thing. Janet has broken that up so you can initialise a struct dio once, then loop around feeding it more file segments, then wait on completion against everything. - In a couple of places we needed to handle the situation where we knew, a-priori, that the user was going to get a short read or write. File size limit exceeded, read past i_size, etc. We handled that by shortening the iovec in-place with iov_shorten(). Which is not particularly pretty, but neither were the alternatives.
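For readers less familiar with the userspace side of this interface, here is a minimal sketch of a vectored write: two buffers go out in one writev() call, with the segment count passed separately from the byte lengths (the same distinction the `nr_segs` rename is about). It uses a pipe rather than an O_DIRECT file, so it shows only the API shape, not the performance effect; `writev_roundtrip` is a hypothetical helper name.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Write two separate buffers with a single writev() call, then read
 * them back through a pipe into 'out'. Returns the number of bytes
 * read back, or -1 on error.
 */
static ssize_t writev_roundtrip(char *out, size_t outlen)
{
    char head[] = "hello, ";
    char tail[] = "iovec";
    struct iovec iov[2];
    int fds[2];
    ssize_t n;

    iov[0].iov_base = head;
    iov[0].iov_len = strlen(head);
    iov[1].iov_base = tail;
    iov[1].iov_len = strlen(tail);

    if (pipe(fds) != 0)
        return -1;

    /* Second argument is the iovec array, third is the segment count. */
    n = writev(fds[1], iov, 2);
    close(fds[1]);

    if (n >= 0)
        n = read(fds[0], out, outlen);
    close(fds[0]);
    return n;
}
```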
-
Ingo Molnar authored
This makes NMIs work - otherwise they go to CPU 0 only and any hard lockup on the other CPUs will not be detected by the nmi_watchdog.
-