- 08 Jan, 2005 40 commits
-
-
Ingo Molnar authored
This is the current remove-BKL patch. I test-booted it on x86 and x64, trying every conceivable combination of SMP, PREEMPT and PREEMPT_BKL. All other architectures should compile as well. (most of the testing was done with the zaphod patch undone but it applies cleanly on vanilla -mm3 as well and should work fine.) this is the debugging-enabled variant of the patch which has two main debugging features: - debug potentially illegal smp_processor_id() use. Has caught a number of real bugs - e.g. look at the printk.c fix in the patch. - make it possible to enable/disable the BKL via a .config. If this goes upstream we dont want this of course, but for now it gives people a chance to find out whether any particular problem was caused by this patch. This patch has one important fix over the previous BKL patch: on PREEMPT kernels if we preempted BKL-using code then the code still auto-dropped the BKL by mistake. This caused a number of breakages for testers, which breakages went away once this bug was fixed. Also the debugging mechanism has been improved alot relative to the previous BKL patch. Would be nice to test-drive this in -mm. There will likely be some more smp_processor_id() false positives but they are 1) harmless 2) easy to fix up. We could as well find more real smp_processor_id() related breakages as well. The most noteworthy fact is that no BKL-using code was found yet that relied on smp_processor_id(), which is promising from a compatibility POV. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
Despite its restart_pgoff pretentions, unmap_mapping_range_vma was fatally unable to distinguish a vma to be restarted from the case where that vma has been freed, and its vm_area_struct reused for the top part of a !new_below split of an isomorphic vma yet to be scanned. The obvious answer is to note restart_vma in the struct address_space, and cancel it when that vma is freed; but I'm reluctant to enlarge every struct inode just for this. Another answer is to flag valid restart in the vm_area_struct; but vm_flags is protected by down_write of mmap_sem, which we cannot take within down_write of i_sem. If we're going to need yet another field, better to record the restart_addr itself: restart_vma only recorded the last restart, but a busy tree could well use more. Actually, we don't need another field: we can neatly (though naughtily) keep restart_addr in vm_truncate_count, provided mapping->truncate_count leaps over those values which look like a page-aligned address. Zero remains good for forcing a scan (though now interpreted as restart_addr 0), and it turns out no change is needed to any of the vm_truncate_count settings in dup_mmap, vma_link, vma_adjust, move_one_page. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
If unmap_mapping_range (and mapping->truncate_count) are doing their jobs right, truncate_complete_page should never find the page mapped: add BUG_ON for our immediate testing, but this patch should probably not go to mainline - a mapped page here is not a catastrophe. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
Fix some unlikely races in respect of vm_truncate_count. Firstly, it's supposed to be guarded by i_mmap_lock, but some places copy a vma structure by *new_vma = *old_vma: if the compiler implements that with a bytewise copy, new_vma->vm_truncate_count could be munged, and new_vma later appear up-to-date when it's not; so set it properly once under lock. vma_link set vm_truncate_count to mapping->truncate_count when adding an empty vma: if new vmas are being added profusely while vmtruncate is in progess, this lets them be skipped without scanning. vma_adjust has vm_truncate_count problem much like it had with anon_vma under mprotect merge: when merging be careful not to leave vma marked as up-to-date when it might not be, lest unmap_mapping_range in progress - set vm_truncate_count 0 when in doubt. Similarly when mremap moving ptes from one vma to another. Cut a little code from __anon_vma_merge: now vma_adjust sets "importer" in the remove_next case (to get its vm_truncate_count right), its anon_vma is already linked by the time __anon_vma_merge is called. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
vmtruncate (or more generally, unmap_mapping_range) has been observed responsible for very high latencies: the lockbreak work in unmap_vmas is good for munmap or exit_mmap, but no use while mapping->i_mmap_lock is held, to keep our place in the prio_tree (or list) of a file's vmas. Extend the zap_details block with i_mmap_lock pointer, so unmap_vmas can detect if that needs lockbreak, and break_addr so it can notify where it left off. Add unmap_mapping_range_vma, used from both prio_tree and nonlinear list handlers. This is what now calls zap_page_range (above unmap_vmas), but handles the lockbreak and restart issues: letting unmap_mapping_range_ tree or list know when they need to start over because lock was dropped. When restarting, of course there's a danger of never making progress. Add vm_truncate_count field to vm_area_struct, update that to mapping-> truncate_count once fully scanned, skip up-to-date vmas without a scan (and without dropping i_mmap_lock). Further danger of never making progress if a vma is very large: when breaking out, save restart_vma and restart_addr (and restart_pgoff to confirm, in case vma gets reused), to help continue where we left off. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
Move unmap_mapping_range's nonlinear vma handling out to its own inline, parallel to the prio_tree handling; unmap_mapping_range_list is a better name for the nonlinear list, rename the other unmap_mapping_range_tree. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
The low-latency unmap_vmas patch silently moved the zap_bytes test after the TLB finish and lockbreak and regather: why? That not only makes zap_bytes redundant (might as well use ZAP_BLOCK_SIZE), it makes the unmap_vmas level redundant too - it's all about saving TLB flushes when unmapping a series of small vmas. Move zap_bytes test back before the lockbreak, and delete the curious comment that a small zap block size doesn't matter: it's true need_flush prevents TLB flush when no page has been unmapped, but unmapping pages in small blocks involves many more TLB flushes than in large blocks. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
Why is mapping->truncate_count atomic? It's incremented inside i_mmap_lock (and i_sem), and the reads don't need it to be atomic. And why smp_rmb() before call to ->nopage? The compiler cannot reorder the initial assignment of sequence after the call to ->nopage, and no cpu (yet!) can read from the future, which is all that matters there. And delete totally bogus reset of truncate_count from blkmtd add_device. truncate_count is all about detecting i_size changes: i_size does not change there; and if it did, the count should be incremented not reset. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
blockmtd doesn't need to initialise address_space.truncate_count: open_bdev_excl did that. Plus I have a patch queued up which removes ->truncate_count. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
From: William Lee Irwin III <wli@holomorphy.com> Without passing this parameter by reference, the changes to used_node_mask are meaningless and do not affect the caller's copy. This leads to boot-time failure. This proposed fix passes it by reference. Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
From: Jesse Barnes <jbarnes@engr.sgi.com> Here are some compile fixes for this patch. Looks like simple typos. Signed-off-by: Jesse Barnes <jbarnes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Matthew Dobson authored
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Nick Piggin authored
Change the sched-domain debug routine to be called on a per-CPU basis, and executed before the domain is actually attached to the CPU. Previously, all CPUs would have their new domains attached, and then the debug routine would loop over all of them. This has two advantages: First, there is no longer any theoretical races: we are running the debug routine on a domain that isn't yet active, and should have no racing access from another CPU. Second, if there is a problem with a domain, the validator will have a better chance to catch the error and print a diagnostic _before_ the domain is attached, which may take down the system. Also, change reporting of detected error conditions to KERN_ERR instead of KERN_DEBUG, so they have a better chance of being seen in a hang on boot situation. The patch also does an unrelated (and harmless) cleanup in migration_thread(). Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The patch below fixes smp_processor_id() warnings that are triggered by numa_node_id(). All uses of numa_node_id() in mm/mempolicy.c seem to use it as a 'hint' only, not as a correctness number. Once a node is established, it's used in a preemption-safe way. So the simple fix is to disable the checking for numa_node_id(). But additional review would be more than welcome, because this patch turns off the preemption-checking of numa_node_id() permanently. Tested on amd64. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
Clean up a few suspicious-looking uses of smp_processor_id() in preemptible code. The current_cpu_data use is unclean but most likely safe. I haven't seen any outright bugs. Since oprofile does not seem to be ready for different-type CPUs (do we even care?), the patch below documents this property by using boot_cpu_data. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The early bootup stage is pretty fragile because the idle thread is not yet functioning as such and so we need preemption disabled. Whether the bootup fails or not seems to depend on timing details so e.g. the presence of SCHED_SMT makes it go away. Disabling preemption explicitly has another advantage: the atomicity check in schedule() will catch early-bootup schedule() calls from now on. The patch also fixes another preempt-bkl buglet: interrupt-driven forced-preemption didnt go through preempt_schedule() so it resulted in auto-dropping of the BKL. Now we go through preempt_schedule() which properly deals with the BKL. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
This patch adds a handful of cond_resched() points to a number of key, scheduling-latency related non-inlined functions. This reduces preemption latency for !PREEMPT kernels. These are scheduling points complementary to PREEMPT_VOLUNTARY scheduling points (might_sleep() places) - i.e. these are all points where an explicit cond_resched() had to be added. Has been tested as part of the -VP patchset. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
This patch fixes scheduling latencies in vgacon_do_font_op(). The code is protected by vga_lock already so it's safe to drop (and re-acquire) the BKL. Has been tested in the -VP patchset. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
Fix scheduling latencies in the MTRR-setting codepath. Also, fix bad bug: MTRR's _must_ be set with interrupts disabled! From: Bernard Blackham <bernard@blackham.com.au> The patch sched-fix-scheduling-latencies-in-mttr in recent -mm kernels has the bad side-effect of re-enabling interrupts even if they were disabled. This caused bugs in Software Suspend 2 which reenabled MTRRs whilst interrupts were already disabled. Attached is a replacement patch which uses spin_lock_irqsave instead of spin_lock_irq. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
We dont want to execute off keventd since it might hold a semaphore our callers hold too. This can happen when kthread_create() is called from within keventd. This happened due to the IRQ threading patches but it could happen with other code too. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch, written by Andrew Morton, fixes long scheduling latencies in filemap_sync(). Has been tested as part of the -VP patchset. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies in get_user_pages(). Has been tested as part of the -VP patchset. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long latencies in unmap_vmas(). We had lockbreak code in that function already but it did not take delayed effects of TLB-gather into account. Has been tested as part of the -VP patchset. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies caused by backlog triggered by __release_sock(). That code only executes in process context, and we've made the backlog queue private already at this point so it is safe to do a cond_resched_softirq(). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies caused by access to the /proc/net/tcp file. The seqfile functions keep softirqs disabled for a very long time (i've seen reports of 20+ msecs, if there are enough sockets in the system). With the attached patch it's below 100 usecs. The cond_resched_softirq() relies on the implicit knowledge that this code executes in process context and runs with softirqs disabled. Potentially enabling softirqs means that the socket list might change between buckets - but this is not an issue since seqfiles have a 4K iteration granularity anyway and /proc/net/tcp is often (much) larger than that. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies in select_parent() and prune_dcache(). The prune_dcache() lock-break is easy, but for select_parent() the only viable solution i found was to break out if there's a resched necessary - the reordering is not necessary and the dcache scanning/shrinking will later on do it anyway. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
break latency in invalidate_list(). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
The attached patch fixes long scheduling latencies in the ext3 code, and it also cleans up the existing lock-break functionality to use the new primitives. This patch has been in the -VP patchset for quite some time. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
It adds cond_resched_softirq() which can be used by _process context_ softirqs-disabled codepaths to preempt if necessary. The function will enable softirqs before scheduling. (Later patches will use this primitive.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
Add lock_need_resched() which is to check for the necessity of lock-break in a critical section. Used by later latency-break patches. tested on x86, should work on all architectures. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
This is another generic fallout from the voluntary-preempt patchset: a cleanup of the cond_resched() infrastructure, in preparation of the latency reduction patches. The changes: - uninline cond_resched() - this makes the footprint smaller, especially once the number of cond_resched() points increase. - add a 'was rescheduled' return value to cond_resched. This makes it symmetric to cond_resched_lock() and later latency reduction patches rely on the ability to tell whether there was any preemption. - make cond_resched() more robust by using the same mechanism as preempt_kernel(): by using PREEMPT_ACTIVE. This preserves the task's state - e.g. if the task is in TASK_ZOMBIE but gets preempted via cond_resched() just prior scheduling off then this approach preserves TASK_ZOMBIE. - the patch also adds need_lockbreak() which critical sections can use to detect lock-break requests. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ingo Molnar authored
SMP locking latencies are one of the last architectural problems that cause millisec-category scheduling delays. CONFIG_PREEMPT tries to solve some of the SMP issues but there are still lots of problems remaining: spinlocks nested at multiple levels, spinning with irqs turned off, and non-nested spinning with preemption turned off permanently. The nesting problem goes like this: if a piece of kernel code (e.g. the MM or ext3's journalling code) does the following: spin_lock(&spinlock_1); ... spin_lock(&spinlock_2); ... then even with CONFIG_PREEMPT enabled, current kernels may spin on spinlock_2 indefinitely. A number of critical sections break their long paths by using cond_resched_lock(), but this does not break the path on SMP, because need_resched() *of the other CPU* is not set so cond_resched_lock() doesnt notice that a reschedule is due. to solve this problem i've introduced a new spinlock field, lock->break_lock, which signals towards the holding CPU that a spinlock-break is requested by another CPU. This field is only set if a CPU is spinning in a spinlock function [at any locking depth], so the default overhead is zero. I've extended cond_resched_lock() to check for this flag - in this case we can also save a reschedule. I've added the lock_need_resched(lock) and need_lockbreak(lock) methods to check for the need to break out of a critical section. Another latency problem was that the stock kernel, even with CONFIG_PREEMPT enabled, didnt have any spin-nicely preemption logic for the following, commonly used SMP locking primitives: read_lock(), spin_lock_irqsave(), spin_lock_irq(), spin_lock_bh(), read_lock_irqsave(), read_lock_irq(), read_lock_bh(), write_lock_irqsave(), write_lock_irq(), write_lock_bh(). Only spin_lock() and write_lock() [the two simplest cases] where covered. In addition to the preemption latency problems, the _irq() variants in the above list didnt do any IRQ-enabling while spinning - possibly resulting in excessive irqs-off sections of code! preempt-smp.patch fixes all these latency problems by spinning irq-nicely (if possible) and by requesting lock-breaks if needed. Two architecture-level changes were necessary for this: the addition of the break_lock field to spinlock_t and rwlock_t, and the addition of the _raw_read_trylock() function. Testing done by Mark H Johnson and myself indicate SMP latencies comparable to the UP kernel - while they were basically indefinitely high without this patch. i successfully test-compiled and test-booted this patch ontop of BK-curr using the following .config combinations: SMP && PREEMPT, !SMP && PREEMPT, SMP && !PREEMPT and !SMP && !PREEMPT on x86, !SMP && !PREEMPT and SMP && PREEMPT on x64. I also test-booted x86 with the generic_read_trylock function to check that it works fine. Essentially the same patch has been in testing as part of the voluntary-preempt patches for some time already. NOTE to architecture maintainers: generic_raw_read_trylock() is a crude version that should be replaced with the proper arch-optimized version ASAP. From: Hugh Dickins <hugh@veritas.com> The i386 and x86_64 _raw_read_trylocks in preempt-smp.patch are too successful: atomic_read() returns a signed integer. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Nathan Lynch authored
Call idle_task_exit from cpu_die to avoid mm_struct leak. Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-