- 10 Jul, 2003 40 commits
-
Jens Axboe authored
We should only account file system requests, i.e. those originating from __make_request(). Otherwise the counters get skewed and go negative really fast.
-
Andrew Morton authored
From: Davide Libenzi <davidel@xmailserver.org> Fix epoll to allow adding multiple file descriptors that share the same kernel struct file *.
-
Andrew Morton authored
From: Andrey Borzenkov <arvidjaar@mail.ru> I finally hit a painfully trivial way to reproduce another long-standing devfs problem: a deadlock between devfs_lookup and devfs_d_revalidate_wait. When devfs_lookup releases the directory i_sem, devfs_d_revalidate_wait grabs it (this does not happen for every path) and goes to sleep waiting to be woken up. Unfortunately, devfs_lookup attempts to re-acquire the directory i_sem before ever waking it up... To reproduce (2.5.74, UP or SMP does not matter; single-CPU system): "ls /dev/foo & rm -f /dev/foo &", or possibly in a loop, but then it easily fills up the process table. In my case it hangs 100% reliably, on both 2.5 and 2.4. The current fix moves the re-acquisition of i_sem to after all devfs_d_revalidate_wait waiters have been woken up. A much better fix would be to ensure that ->d_revalidate is either always called under i_sem or always called without it. But that means touching the very heart of the VFS, and I do not dare to. The fix has been tested on 2.4 (and is part of the unofficial Mandrake Club kernel); I expected the same bug to be in 2.5, I just had not found a way to reproduce it before.
-
Andrew Morton authored
From: Andrey Borzenkov <arvidjaar@mail.ru> Doing concurrent lookups for the same name in devfs with devfsd and modules enabled may result in stack corruption. When devfs_lookup needs to call devfsd, it arranges for other lookups of the same name to wait, using a local variable as the wait queue head. After devfsd returns, devfs_lookup wakes up all waiters and returns. Unfortunately, there is no guarantee that all waiters actually get a chance to run and clean up before devfs_lookup returns, so some of them attempt to access already-freed storage on the stack. It is trivial to trigger with an SMP kernel (I have a single-CPU system, if it matters) by running "while true; do ls /dev/foo & done". Without spinlock debugging, the system usually hung dead, with the reset button as the only way out. I was not able to reproduce it on 2.4 on a single-CPU system: in 2.4, devfs_d_revalidate_wait does not attempt to remove itself from the wait queue, so it appears to be safe. The patch allocates the lookup struct from the heap and adds a reference counter so it is freed when no longer needed.
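A minimal userspace sketch of the lifetime rule described above: the record lives on the heap, every waiter holds its own reference, and whoever drops the last reference frees it. The names struct lookup_waiter, lookup_alloc(), lookup_get() and lookup_put() are hypothetical, the wait queue itself is omitted, and C11 atomics stand in for whatever counter the actual patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-name lookup record. The real one also
 * carries a wait queue head; only its lifetime is modelled here. */
struct lookup_waiter {
    atomic_int refcount;   /* one reference per party that may still touch it */
    int done;              /* set once the devfsd upcall has finished */
};

static int lookup_frees;   /* counts frees so the lifetime can be checked */

/* The initiating lookup allocates the record on the heap (not the stack)
 * and holds the first reference. */
static struct lookup_waiter *lookup_alloc(void)
{
    struct lookup_waiter *lw = malloc(sizeof(*lw));
    if (!lw)
        return NULL;
    atomic_init(&lw->refcount, 1);
    lw->done = 0;
    return lw;
}

/* Each waiter takes its own reference before sleeping on the record. */
static void lookup_get(struct lookup_waiter *lw)
{
    atomic_fetch_add(&lw->refcount, 1);
}

/* Whoever drops the last reference frees the record, so it stays valid
 * until the slowest waiter has finished with it, even if the initiating
 * lookup has already returned. */
static void lookup_put(struct lookup_waiter *lw)
{
    int old = atomic_fetch_sub(&lw->refcount, 1);
    assert(old > 0);
    if (old == 1) {
        free(lw);
        lookup_frees++;
    }
}
```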
-
Andrew Morton authored
From: Russell King <rmk@arm.linux.org.uk> Interrupts can sometimes occur before the socket thread is started.
-
Andrew Morton authored
From: Daniel Ritz <daniel.ritz@gmx.ch> init_socket() enables interrupts, and the interrupt handler does a wakeup. Let's initialise that waitqueue head before turning on the interrupts.
-
Andrew Morton authored
From: William Lee Irwin III <wli@holomorphy.com> There are reports of kernel threads being killed by the oom killer. We think this is because the oom killer tries to kill a task after it has exited and set its ->mm to NULL. The oom killer will then try to kill all other tasks which have a NULL ->mm, which includes kernel threads. Attempt to detect that case and fix it up.
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru> fsync_super() calls ->sync_fs() just after ->write_super(). But write_super() will start a commit. In this case, ext3_sync_fs() will not itself start a commit, and it hence forgets to wait on the commit which ext3_write_super() started. Fix that up by making journal_start_commit() return the transaction ID of any currently-running transaction.
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru> start_this_handle() takes t_outstanding_credits into account when calculating log free space, but journal_next_log_block() also accounts for blocks being logged. Hence, blocks are accounted for twice. This effectively reduces the amount of log space available to transactions and forces more commits. Fix it by decrementing t_outstanding_credits each time we allocate a new journal block.
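A deliberately simplified model of the double counting, assuming log space is tracked as a count of free log blocks plus the running transaction's outstanding credits. The struct and function names are hypothetical, not JBD's. Blocks that were already reserved as credits should not shrink the space seen by new handles a second time when they are actually logged; without the fix they do.

```c
#include <assert.h>

/* Hypothetical, heavily simplified model of JBD log-space accounting.
 * log_free:    blocks not yet used in the on-disk log.
 * outstanding: credits reserved by the running transaction
 *              (the role of t_outstanding_credits). */
struct jmodel {
    int log_free;
    int outstanding;
};

/* Space a new handle may still claim, in the spirit of start_this_handle():
 * free log space minus credits already promised to the transaction. */
static int space_left(const struct jmodel *j)
{
    return j->log_free - j->outstanding;
}

/* Log one block, as journal_next_log_block() does. With fixed == 0 the block
 * is charged to log_free while its reservation is kept, so it is counted
 * twice; with fixed != 0 the reservation is released as it is consumed. */
static void log_one_block(struct jmodel *j, int fixed)
{
    assert(j->log_free > 0);
    j->log_free--;
    if (fixed) {
        assert(j->outstanding > 0);
        j->outstanding--;
    }
}
```

Starting from 100 free blocks and 10 reserved credits, logging those 10 blocks leaves 80 blocks of apparent space without the fix and 90 with it.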
-
Andrew Morton authored
From: Alex Tomas <bzzz@tmi.comex.ru> Some transaction checkpointing improvements for the JBD commit phase. Decent speedups:
creation of 500K files in a single dir (with htree, of course):
  before: 4m16.094s, 4m12.035s, 4m11.911s
  after:  1m41.364s, 1m43.461s, 1m45.189s
removal of 500K files in a single dir:
  before: 43m50.161s
  after:  38m45.510s
Changes:
  - Make __log_wait_for_space() recalculate the needed blocks, because journal free space changes during commit.
  - Make log_do_checkpoint() start scanning from the oldest transaction.
  - Make log_do_checkpoint() stop scanning if a transaction gets dropped. The caller will reevaluate the transaction state and decide whether more space needs to be generated in the log.
The effect of this is to smooth out the I/O patterns and avoid the huge stop-and-go which currently happens when forced checkpointing writes out and waits upon 3/4 of the journal's size worth of data.
-
Andrew Morton authored
From: Paul Clements <Paul.Clements@SteelEye.com> Ensure that nbd and the block layer agree about device block sizes and total device sizes.
-
Andrew Morton authored
From: Paul Clements <Paul.Clements@SteelEye.com> Remove the unneeded nbd_open and nbd_release functions.
-
Andrew Morton authored
From: Paul Clements <Paul.Clements@SteelEye.com> Modernise nbd.txt a bit.
-
Andrew Morton authored
From: Lou Langholtz <ldl@aros.net> This fifth patch cleans up usage of the PARANOIA sanity-checking macro and code. This patch modifies both drivers/block/nbd.c and include/linux/nbd.h. It's intended to be applied incrementally on top of my fourth patch (4.1 really, if you count the memset addition as .1's worth) that simply removed the unneeded blksize_bits field. Again, I wanted to get this smaller change out of the way before my next patch, which is much more major.
-
Andrew Morton authored
From: Lou Langholtz <ldl@aros.net> Fixes the NBD oopses which people have been reporting.
-
Andrew Morton authored
From: Lou Langholtz <ldl@aros.net> This fourth patch simply removes the blksize_bits field from the nbd_device struct and driver implementation. How this field made it into this driver to begin with is a mystery (where was Al Viro when that patch was submitted??). :-) This patch modifies both drivers/block/nbd.c and include/linux/nbd.h files. It's intended to be applied incrementally on top of my third patch (for enhanced diagnostics support).
-
Andrew Morton authored
From: Lou Langholtz <ldl@aros.net> This third patch (for enhancing diagnostics support) applies incrementally after my last LKML'd patch (for cosmetic changes). These changes introduce configurable KERN_DEBUG level printk output for a variety of different things that the driver does and provides the framework for enhanced future debugging support as well.
-
Andrew Morton authored
From: Lou Langholtz <ldl@aros.net> It's a helpful step towards being better able to identify code inefficiencies and problems, particularly w.r.t. locking. It also modifies some of the output messages for greater consistency and better diagnostic support. This second patch is a lead-in to the third patch, which will simply introduce the dprintk() debugging facility that my jumbo patch originally had. With the cosmetics patch and the debugging enhancement patch, it will be easier to fix, or at least improve, the locking bugs/races in NBD (which will likely make up the fourth patch in my envisioned roadmap).
-
Andrew Morton authored
From: Ingo Molnar <mingo@elte.hu> It makes hot-balancing happen in the 'busy tick' case as well, which should spread out processes more aggressively.
-
Andrew Morton authored
From: Maneesh Soni <maneesh@in.ibm.com> While path walking we do follow_mount or follow_down, which use dcache_lock for serialisation. vfsmount-related operations also use dcache_lock for all updates. I think we can use a separate lock for vfsmount-related work and so improve path walking. The following two patches do that. The first one replaces dcache_lock with a new vfsmount_lock in namespace.c. The lock is local to namespace.c and is not required outside it. The second patch uses RCU to make lookup_mnt() lock-free. The patches are quite simple and straightforward. The lockmeter results show reduced contention and fewer lock acquisitions for dcache_lock while running dcachebench* on a 4-way SMP box:

  SPINLOCKS         HOLD            WAIT
    UTIL  CON    MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT  SPIN RJECT  NAME
  baselkm-2569:
   20.7% 20.9%  0.5us( 146us)  2.9us( 144us)(0.81%)  31590840  79.1% 20.9%    0%  dcache_lock
  mntlkm-2569:
   14.3% 13.6%  0.4us( 170us)  2.9us( 187us)(0.42%)  23071746  86.4% 13.6%    0%  dcache_lock

We get more than 8% improvement on 4-way SMP and 44% improvement on 16-way NUMAQ while running dcachebench*:

                      Average (usecs/iteration)   Std. Deviation
                      (lower is better)
  4-way SMP
    2.5.69                  15739.3                   470.90
    2.5.69-mnt              14459.6                   298.51
  16-way NUMAQ
    2.5.69                 120426.5                   363.78
    2.5.69-mnt              63225.8                   427.60

*dcachebench is a microbenchmark written by Bill Hartner and is available at http://www-124.ibm.com/developerworks/opensource/linuxperf/dcachebench/dcachebench.html

vfsmount_lock.patch
-------------------
  - Patch for replacing dcache_lock with the new vfsmount_lock for all mount-related operations. This removes the need to take dcache_lock while doing follow_mount or follow_down operations in path walking.

I re-ran dcachebench with 2.5.70 as the base on the 16-way NUMAQ box:

                              Average (usecs/iteration)   Std. Deviation
                              (lower is better)
  16-way NUMAQ
    2.5.70                          120710.9                  230.67
    + vfsmount_lock.patch            65209.6                  242.97
    + lookup_mnt-rcu.patch           64042.3                  416.61

So the lock splitting alone (vfsmount_lock.patch) gives almost the same benefit.
-
Andrew Morton authored
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com> We hit a memory-ordering race on the AIO ring buffer tail pointer between aio_complete() and aio_read_evt(). On an architecture with a relaxed memory-ordering model such as IPF (ia64), an explicit memory barrier is required in an SMP environment. Consider the following case: one CPU is executing a tight loop of aio_read_evt(), pulling events off the ring buffer. During that loop, another CPU is executing aio_complete(), putting an event into the ring buffer and then updating the tail pointer. However, because of the relaxed memory ordering, the tail pointer update can become visible before the event data itself. The reading CPU then sees the updated tail pointer but picks up stale event data. A memory barrier is required between the event data update and the tail pointer update. The same is true for the head pointer, although there the race window is nil; for correctness, it is fixed here as well. By the way, this bug has been fixed in major distributors' 2.4.x kernels for a while, but somehow has not been propagated to the 2.5 kernel yet.
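A userspace sketch of the ordering requirement on a single-producer, single-consumer ring. This is not the fs/aio.c code: the kernel expresses the ordering with explicit SMP memory barriers, while this sketch swaps in C11 release/acquire atomics, and struct ring, ring_put(), ring_get() and RING_SIZE are hypothetical names. The producer writes the event before publishing the new tail with release ordering; the consumer reads the tail with acquire ordering before reading the event, so it can never see a new tail paired with stale event data. The head pointer gets the same treatment in the opposite direction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u                 /* hypothetical; a power of two so wrap-around works */

struct event { long data; };

struct ring {
    struct event ev[RING_SIZE];
    atomic_uint head;                /* next slot the consumer reads */
    atomic_uint tail;                /* next slot the producer fills */
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer (the aio_complete() role): store the event first, then publish
 * it by advancing tail with release ordering, so the event store cannot
 * become visible after the tail store. */
static bool ring_put(struct ring *r, long data)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_SIZE)
        return false;                /* full */
    r->ev[tail % RING_SIZE].data = data;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer (the aio_read_evt() role): observe tail with acquire ordering
 * before reading the slot it covers, then hand the slot back by advancing
 * head with release ordering. */
static bool ring_get(struct ring *r, long *data)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;                /* empty */
    *data = r->ev[head % RING_SIZE].data;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```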
-
Andrew Morton authored
From: "Chen, Kenneth W" <kenneth.w.chen@intel.com> We hit this bug in the following scenario: one process initializes an AIO context and then forks many child processes. When those child processes exit, many BUG checks (effectively kernel oopses) are triggered from put_ioctx(ctx) in exit_aio(). The issue was that the AIO context was incorrectly copied on fork, misleading all child processes into thinking they had an I/O context and trying to free one they did not own. The following patch fixes the issue.
-
Andrew Morton authored
From: Andreas Gruenbacher <agruen@suse.de> Without ACLs, the umask is applied directly in the VFS when creating files. ACLs require that the umask is applied at the file system level, depending on whether or not the containing directory has a default ACL. The daemonize() function makes kernel threads share their fs_struct structure with the init process. Among other things, fs_struct contains the umask, so all kernel threads share their umask with init. The kernel nfsd needs to create files with a umask of 0. Init's umask cannot simply be changed to 0: this would have side effects on init, and init would have side effects on nfsd. So this patch creates a separate fs_struct structure for nfsd kernel threads and sets its umask to 0. This fixes bug #721, <http://www.osdl.net/show_bug.cgi?id=721>.
-
Andrew Morton authored
- remove accidental debug code from ext3 commit. - /proc/profile documentation fix (Randy Dunlap) - use sb_breadahead() in ext2_preread_inode() - unused var in mpage_writepages()
-
Andrew Morton authored
From: Diego Calleja Garcia <diegocg@teleline.es> Move CONFIG_KALLSYMS out of the arch directory and into init/. It defaults to "on" unless the user explicitly turns it off in the "embedded systems" menu.
-
Andrew Morton authored
replace a kmap() with kmap_atomic()
-
Andrew Morton authored
From: Daniel McNeil <daniel@osdl.org> This adds i_seqcount to the inode structure and then uses i_size_read() and i_size_write() to provide atomic access to i_size. This is a port of Andrea Arcangeli's i_size atomic access patch from 2.4. This only uses the generic reader/writer consistent mechanism.
Before:
  mnm:/usr/src/25> size vmlinux
     text    data     bss     dec     hex filename
  2229582 1027683  162436 3419701  342e35 vmlinux
After:
  mnm:/usr/src/25> size vmlinux
     text    data     bss     dec     hex filename
  2225642 1027655  162436 3415733  341eb5 vmlinux
3.9k more text, a lot of it fastpath :( It's a very minor bug, and the fix has a fairly non-minor cost. The most compelling reason for fixing this is that writepage() checks i_size. If it sees a transient value it may decide that the page is outside i_size and will refuse to write it. Lost user data.
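A userspace sketch of the sequence-counter scheme, assuming a single writer. The names inode_model, isize_read() and isize_write() are hypothetical and the kernel helpers differ in detail. The 64-bit size is deliberately split into two 32-bit halves to mimic a 32-bit CPU, where a plain load can observe a torn value; the reader retries until it sees the same even sequence number before and after reading both halves. The fence placement follows the usual C11 seqlock pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct inode_model {
    atomic_uint seq;            /* even: stable; odd: a write is in progress */
    _Atomic uint32_t size_lo;   /* 64-bit size held as two 32-bit halves, */
    _Atomic uint32_t size_hi;   /* as a 32-bit CPU would have to access it */
};

static void isize_init(struct inode_model *in)
{
    atomic_init(&in->seq, 0);
    atomic_init(&in->size_lo, 0);
    atomic_init(&in->size_hi, 0);
}

/* Writer side. Only one writer is assumed; writers must be serialised by
 * some other lock. */
static void isize_write(struct inode_model *in, uint64_t size)
{
    unsigned s = atomic_load_explicit(&in->seq, memory_order_relaxed);

    atomic_store_explicit(&in->seq, s + 1, memory_order_relaxed);   /* now odd */
    atomic_thread_fence(memory_order_release);   /* odd seq ordered before the data stores */
    atomic_store_explicit(&in->size_lo, (uint32_t)size, memory_order_relaxed);
    atomic_store_explicit(&in->size_hi, (uint32_t)(size >> 32), memory_order_relaxed);
    atomic_store_explicit(&in->seq, s + 2, memory_order_release);   /* even again */
}

/* Reader side: retry until both halves were read inside one stable, even
 * sequence window, so a half-updated size is never returned. */
static uint64_t isize_read(struct inode_model *in)
{
    unsigned s1, s2;
    uint32_t lo, hi;

    do {
        s1 = atomic_load_explicit(&in->seq, memory_order_acquire);
        lo = atomic_load_explicit(&in->size_lo, memory_order_relaxed);
        hi = atomic_load_explicit(&in->size_hi, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);   /* data reads before the re-check */
        s2 = atomic_load_explicit(&in->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);

    return ((uint64_t)hi << 32) | lo;
}
```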
-
Andrew Morton authored
From: Daniel McNeil <daniel@osdl.org> This adds a sequence-counter-only version of the reader/writer consistent mechanism to seqlock.h. This is used in the second part of this patch to give atomic access to i_size.
-
Andrew Morton authored
From: Tim Schmielau <tim@physik3.uni-rostock.de> This patch adds (or fixes) initialization of wall_to_monotonic for a few more architectures. This should get rid of the strange uptime>14600 days reports, except on arm whose arch file layout is too unfamiliar to me. The patch is blessed by George Anzinger, but untested due to lack of hardware.
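A sketch of the kind of initialization being added, assuming the relation monotonic = xtime + wall_to_monotonic and that uptime is derived from the monotonic clock. Setting wall_to_monotonic to the negated boot-time wall clock makes monotonic time start at zero; leaving it at zero makes the monotonic clock, and hence uptime, equal the wall-clock time since the 1970 epoch. The helper below is a userspace re-implementation modelled on the kernel's set_normalized_timespec(); monotonic() is a hypothetical name.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Store sec/nsec in *ts with 0 <= tv_nsec < NSEC_PER_SEC, carrying or
 * borrowing whole seconds as needed. */
static void set_normalized_timespec(struct timespec *ts, time_t sec, long nsec)
{
    while (nsec >= NSEC_PER_SEC) {
        nsec -= NSEC_PER_SEC;
        ++sec;
    }
    while (nsec < 0) {
        nsec += NSEC_PER_SEC;
        --sec;
    }
    ts->tv_sec = sec;
    ts->tv_nsec = nsec;
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
}

/* Monotonic time as wall time plus the wall_to_monotonic offset. */
static struct timespec monotonic(struct timespec xtime, struct timespec w2m)
{
    struct timespec m;

    set_normalized_timespec(&m, xtime.tv_sec + w2m.tv_sec,
                            xtime.tv_nsec + w2m.tv_nsec);
    return m;
}
```

At boot the offset would be initialized as set_normalized_timespec(&wall_to_monotonic, -xtime.tv_sec, -xtime.tv_nsec), after which monotonic() returns zero for the boot instant.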
-
Andrew Morton authored
From: Oleg Drokin <green@namesys.com> Ever since reiserfs_file_write was merged, reiserfs has not worked on any 64-bit architecture, for a pretty stupid reason: an incorrect "unsigned long" definition of the block number type. This fixes the problem.
-
Andrew Morton authored
The ClearPageDirty() in there is wrong - it doesn't adjust the VM's dirty memory accounting. The system thinks it's full of dirty memory and stops.
-
Andrew Morton authored
From: Christoph Hellwig <hch@lst.de> It's not used anymore since ALSA switched to traditional devices, and device nodes in procfs are a bad idea in general. Also update the docs.
-
Andrew Morton authored
From: rwhron@earthlink.net It returns sizeof(compat_ulong_t) even if put_user() faulted.
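The bug pattern and its fix, as a hypothetical userspace sketch: fake_put_user() stands in for put_user() (returning 0 or -EFAULT, with a NULL destination modelling a fault), compat_ulong_t is assumed to be 32 bits, and copy_out_buggy()/copy_out_fixed() are illustrative names rather than the function the patch touches. The fix simply propagates the fault instead of unconditionally reporting that sizeof(compat_ulong_t) bytes were written.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint32_t compat_ulong_t;   /* assumption: 32-bit compat type */

/* Hypothetical stand-in for put_user(): 0 on success, -EFAULT when the
 * destination is not writable (modelled here as a NULL pointer). */
static int fake_put_user(compat_ulong_t val, compat_ulong_t *uaddr)
{
    if (!uaddr)
        return -EFAULT;
    *uaddr = val;
    return 0;
}

/* Buggy shape: the put_user() result is ignored, so a fault still
 * reports success. */
static long copy_out_buggy(compat_ulong_t val, compat_ulong_t *uaddr)
{
    fake_put_user(val, uaddr);
    return sizeof(compat_ulong_t);
}

/* Fixed shape: return the error from put_user() if it faulted. */
static long copy_out_fixed(compat_ulong_t val, compat_ulong_t *uaddr)
{
    int err = fake_put_user(val, uaddr);

    if (err)
        return err;
    return sizeof(compat_ulong_t);
}
```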
-
Linus Torvalds authored
-
Linus Torvalds authored
(version 1.8.0 -> 1.9.0)
-
Linus Torvalds authored
(version 2.3.0 -> 2.4.0)
-
Linus Torvalds authored
(version 1.2.1 -> 1.3.0)
-
Linus Torvalds authored
-
Linus Torvalds authored
bk://kernel.bkbits.net/davem/net-2.5 into home.osdl.org:/home/torvalds/v2.5/linux
-
Matthew Wilcox authored
Update gsc_ps2 for recent changes.
-