- 03 Jan, 2005 23 commits
-
-
Alex Williamson authored
I noticed the function __read_page_state() curiously high in a q-tools profile of a write to a software raid0 device. Seems this is because we're checking page_states for all possible cpus and we have NR_CPUS possible when CONFIG_HOTPLUG_CPU=y. The default config for ia64 is now NR_CPUS=512, so on a little 8-way box, this is a significant waste of time. The patch below updates __read_page_state() and __get_page_state() to only count page_state info for online cpus. To keep the stats consistent, the page_alloc notifier is updated to move page_states off of the cpu going offline. On my profile, this dropped __read_page_state() back into the noise and boosted block write performance by 5% (as measured by spew - http://spew.berlios.de). Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
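The effect described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: the class `PageStateModel` and its method names are hypothetical, and the one counter per CPU stands in for the whole page_state structure. It shows why walking only online CPUs is cheaper, and why the counts of a CPU going offline have to be folded into a CPU that stays online.

```python
from dataclasses import dataclass, field

NR_CPUS = 512  # compile-time maximum, as in the ia64 default mentioned above


@dataclass
class PageStateModel:
    """Toy model of per-CPU page_state counters (hypothetical, not kernel code)."""
    online: set = field(default_factory=set)
    counters: list = field(default_factory=lambda: [0] * NR_CPUS)

    def read_all_possible(self):
        # Old behaviour: visit every possible CPU slot.
        return sum(self.counters), NR_CPUS

    def read_online(self):
        # New behaviour: visit only the CPUs that are online.
        return sum(self.counters[c] for c in self.online), len(self.online)

    def cpu_offline(self, cpu, target):
        # Fold the departing CPU's count into `target` (assumed to stay
        # online) so that read_online() keeps returning the same total.
        self.counters[target] += self.counters[cpu]
        self.counters[cpu] = 0
        self.online.discard(cpu)
```

Both read methods return a `(total, slots_visited)` pair, so the difference between visiting 8 slots and visiting 512 is visible directly.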
-
Manfred Spraul authored
Add ARCH_SLAB_MINALIGN and document ARCH_KMALLOC_MINALIGN: The flags allow the arch code to override the default minimum object alignment (BYTES_PER_WORD). Signed-Off-By: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
mark_page_accessed() is more heavyweight than we need: the page is already headed for the active list, so setting the software-referenced bit is equivalent. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Miquel van Smoorenburg authored
When reading a (partial) page from disk using read(), the kernel only marks the page as "accessed" if the read started at a page boundary. This means that files that are accessed randomly at non-page boundaries (usually database style files) will not be cached properly. The patch below uses the readahead state instead. If a page is read(), it is marked as "accessed" if the previous read() was for a different page, whatever the offset in the page.

Testing results:

  - Boot kernel with mem=128M
  - Create a testfile of size 8 MB on a partition. Unmount/mount.
  - Then generate about 10 MB/sec streaming writes:

        for i in `seq 1 1000`
        do
            dd if=/dev/zero of=junkfile.$i bs=1M count=10
            sync
            cat junkfile.$i > /dev/null
            sleep 1
        done

  - Use an application that reads 128 bytes 64000 times from a random offset in the 64 MB testfile.

  1. Linux 2.6.10-rc3 vanilla, no streaming writes:

        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.03s user 0.22s system 5% cpu 4.456 total

  2. Linux 2.6.10-rc3 vanilla, streaming writes:

        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.03s user 0.16s system 2% cpu 7.667 total
        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.03s user 0.37s system 1% cpu 23.294 total
        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.02s user 0.99s system 1% cpu 1:11.52 total
        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.03s user 0.21s system 2% cpu 10.273 total

  3. Linux 2.6.10-rc3 with read-page-access.patch, streaming writes:

        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.02s user 0.21s system 3% cpu 7.634 total
        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.04s user 0.22s system 2% cpu 9.588 total
        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.02s user 0.12s system 24% cpu 0.563 total
        # time ~/rr testfile
        Read 128 bytes 64000 times
        ~/rr testfile 0.03s user 0.13s system 98% cpu 0.163 total

As expected, with the read-page-access.patch the kernel keeps the 8 MB testfile cached, while without it, it doesn't. So this is useful for workloads where one smallish (wrt RAM) file is read randomly over and over again (like heavily used database indexes), while other I/O is going on. Plain 2.6 caches those files poorly, if the app uses plain read(). Signed-Off-By: Miquel van Smoorenburg <miquels@cistron.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
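The difference between the two marking rules can be sketched in a few lines. This is a simplified model under stated assumptions, not the kernel's read path: `PAGE_SIZE` is assumed to be 4096, every read is assumed to stay within a single page, and the function names `marks_old` and `marks_new` are made up for the illustration.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def marks_old(reads):
    """Count 'accessed' marks under the old rule: the read starts on a page boundary."""
    return sum(1 for pos, _length in reads if pos % PAGE_SIZE == 0)


def marks_new(reads):
    """Count 'accessed' marks under the new rule: the previous read hit a different page."""
    marks = 0
    prev_page = None  # stands in for the readahead state's previous page
    for pos, _length in reads:
        page = pos // PAGE_SIZE
        if page != prev_page:
            marks += 1
        prev_page = page
    return marks
```

For a random-offset workload such as `[(100, 128), (5000, 128), (5200, 128), (8192, 128)]`, the old rule marks only the read that happens to start at offset 8192, while the new rule marks every read that moves to a different page than the one before it.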
-
Dave Hansen authored
When CONFIG_HIGHMEM=y, but ZONE_NORMAL isn't quite full, there is, of course, no actual memory at *high_memory. This isn't a problem with normal virt<->phys translations because it's never dereferenced, but CONFIG_NONLINEAR is a bit more finicky. So, don't do virt_to_phys() to non-existent addresses. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Dave Hansen authored
People love to do comparisons with highmem_start_page. However, where CONFIG_HIGHMEM=y and there is no actual highmem, there's no real page at *highmem_start_page. That's usually not a problem, but CONFIG_NONLINEAR is a bit more strict and catches the bogus address translations. There are about a gillion different ways to find out if a 'struct page' is highmem or not. Why not just check page_flags? Just use PageHighMem() wherever there used to be a highmem_start_page comparison. Then, kill off highmem_start_page. This removes more code than it adds, and gets rid of some nasty #ifdefs in .c files. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andries E. Brouwer authored
Alan made overcommit mode 2 and it doesn't work at all. A process passing the limit often does so at a moment of stack extension, and is killed by a segfault, which is no better than being OOM-killed. Another problem is that close to the edge no other processes can be started, so that a sysadmin has problems logging in and investigating. Below is a patch that does 3 things: (1) It reserves a reasonable amount of virtual stack space (amount randomly chosen, no guarantees given) when the process is started, so that the common utilities will not be killed by segfault on stack extension. (2) It reserves a reasonable amount of virtual memory for root, so that root can do things when the system is out-of-memory. (3) It limits a single process to 97% of what is left, so that an ordinary user is also able to use getty, login, bash, ps, kill and similar things when one of her processes got out of control. Since the current overcommit mode 2 is not really useful, I did not give this a new number. The patch is just for playing, not to be applied by Linus. But, Andrew, I hope that you would be willing to put this in -mm so that people can experiment. Of course it only does something if one sets overcommit mode to 2. The past month I have pressured people asking for feedback, and now have about a dozen reports, mostly positive, one very positive. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrea Arcangeli authored
Some optimizations in mempolicy.c (such as avoiding rebalancing the tree while destroying it, breaking out of loops early, and not checking for invariant conditions in the replace operation). Signed-off-by: Andrea Arcangeli <andrea@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Ram Pai authored
Reinstate the feature wherein readahead will be bypassed if the underlying queue is read-congested. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Steven Pratt authored
With Ram Pai <linuxram@us.ibm.com> - request size is now passed into page_cache_readahead. This allows the removal of the size averaging code in the current readahead logic. - readahead rampup is now faster (especially for larger request sizes) - No longer "slow read path". Readahead is turned off at first random access, turned back on at first sequential access. - Code now handles thrashing, slowly reducing readahead window until thrashing stops, or min size reached. - Returned to old behavior where first access is assumed sequential only if at offset 0. - designed to handle larger (1M or above) window sizes efficiently Benchmark results: machine 1: 8 way pentiumIV 1GB memory, tests run to 36GB SCSI disk (Similar results were seen on a 1 way 866Mhz box with IDE disk.) TioBench: tiobench.pl --dir /mnt/tmp --block 4096 --size 4000 --numruns 2 --threads 1(4,16,64) 4k request size sequential read results in MB/sec Threads 2.6.9 w/patches %diff diff
-
Nick Piggin authored
Teach kswapd to free memory on behalf of higher order allocators. This could be important for higher order atomic allocations because they otherwise have no means to free the memory themselves. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Nick Piggin authored
Move the watermark checking code into a single function. Extend it to account for the order of the allocation and the number of free pages that could satisfy such a request. From: Marcelo Tosatti <marcelo.tosatti@cyclades.com> Fix typo in Nick's kswapd-high-order awareness patch Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
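An order-aware watermark check along the lines described above might look like the following user-space sketch. It is an approximation under assumptions, not the kernel function: the name `watermark_ok`, the list-based `nr_free` input, and the choice to halve the minimum at each order are all illustrative.

```python
def watermark_ok(nr_free, order, min_pages):
    """Return True if an allocation of `order` would leave the zone above its watermark.

    nr_free[o] is the number of free blocks of order o (each 2**o pages).
    """
    free_pages = sum(n << o for o, n in enumerate(nr_free))
    # Account for the pages the allocation itself would consume.
    free_pages -= (1 << order) - 1
    if free_pages <= min_pages:
        return False
    for o in range(order):
        # Blocks smaller than the request cannot satisfy it, so they no
        # longer count as free for the purposes of this check.
        free_pages -= nr_free[o] << o
        # Require fewer free pages at each higher order (assumed policy).
        min_pages >>= 1
        if free_pages <= min_pages:
            return False
    return True
```

With 100 free order-0 pages and nothing else, `watermark_ok([100, 0, 0, 0], 0, 10)` passes, but the same zone fails for `order=2`, because none of the free memory is in blocks large enough to satisfy the request.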
-
Nick Piggin authored
Keep track of the number of free pages of each order in the buddy allocator. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
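As a rough illustration of per-order accounting, the toy buddy free list below keeps an `nr_free` counter for each order and updates it whenever a block is added, removed or split. The class `FreeAreas`, the constant `MAX_ORDER = 4` and the method names are hypothetical; merging of buddies on free is deliberately left out to keep the sketch short.

```python
MAX_ORDER = 4  # small value chosen for the illustration


class FreeAreas:
    """Toy buddy free lists with a per-order free-block counter."""

    def __init__(self):
        self.blocks = [[] for _ in range(MAX_ORDER)]  # start pfns of free blocks
        self.nr_free = [0] * MAX_ORDER

    def add(self, pfn, order):
        # Put a free block on the list for its order and count it.
        self.blocks[order].append(pfn)
        self.nr_free[order] += 1

    def take(self, order):
        """Allocate a block of `order`, splitting a larger block if needed."""
        for o in range(order, MAX_ORDER):
            if self.blocks[o]:
                pfn = self.blocks[o].pop()
                self.nr_free[o] -= 1
                # While splitting down, return the unused upper halves.
                while o > order:
                    o -= 1
                    self.add(pfn + (1 << o), o)
                return pfn
        return None  # nothing large enough is free
```

Starting from a single free order-3 block, one order-0 allocation leaves exactly one free block at each of orders 0, 1 and 2, and the counters reflect that without walking the lists.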
-
Ron Murray authored
With Cal Peake <cp@absolutedigital.net> I've found a typo in drivers/input/gameport/Makefile in kernel 2.6.9 which effectively prevents the CS461x gameport code from being included. Signed-off-by: Ron Murray <rjmx@rjmx.net> Signed-off-by: Cal Peake <cp@absolutedigital.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
We haven't been incrementing local variable total_scanned since the scan_control stuff went in. That broke kswapd throttling. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jan Kara authored
Allow disabling of quota messages to console (they can disturb other output). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jan Kara authored
Implement quota journaling and quota reading and writing functions for reiserfs. Solves also several other deadlocks possible for reiserfs due to the lock inversion on journal_begin and quota locks. From: Vladimir Saveliev <vs@namesys.com> When CONFIG_QUOTA is defined reiserfs's finish_unfinished sets and clears MS_ACTIVE bit in s_flags field of super block. If that bit was set already it should not be set. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jan Kara authored
Implementation of quota reading and writing functions for ext3. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jan Kara authored
Implementation of quota reading and writing functions for ext2. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jan Kara authored
Fix possible races between umount and quota on/off. Finally I decided to take a reference to vfsmount during vfs_quota_on() and to drop it after the final cleanup in vfs_quota_off(). This way we should be guarded against umount at all times. The old code, which used filp_open() for opening quota files, was protected the same way. I was also thinking about other ways of protection, but there would always be a window (provided I don't want to play much with namespace locks) where vfs_quota_on() could be called while umount() is in progress, resulting in the "Busy inodes after unmount" messages... Get a reference to vfsmount during quotaon() so that we are guarded against umount (as was the old code using filp_open()). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jan Kara authored
The four patches in this series fix PageLock deadlocks in the quota code (the problem was lock inversion on PageLock and transaction start: the quota code needed to first start a transaction and then write the data, which subsequently needed to acquire PageLock, while the standard ordering, PageLock first and transaction start later, was used e.g. by pdflush). They implement a new way of quota access to disk: every filesystem that would like to implement quotas now has to provide quota_read() and quota_write() functions. These functions must obey quota lock ordering (in particular they should not take PageLock inside a transaction).

The first patch implements the changes in the quota core; the other three patches implement the needed functions in ext2, ext3 and reiserfs. The patch for reiserfs also fixes several other lock inversion problems (similar to those ext3 had) and implements the journaled quota functionality (which comes almost for free after the locking fixes...).

The quota core patch makes quota support in other filesystems (except XFS, which implements everything on its own ;)) nonfunctional (quotaon() will refuse to turn on quotas on them). When the patches have had reasonably wide testing and it seems that no major changes will be needed, I can make fixes for the other filesystems as well (JFS, UDF, UFS).

This patch: The patch implements the new way of quota I/O in the quota core. Every filesystem wanting to support quotas has to provide functions quota_read() and quota_write() obeying quota locking rules. As the writes and reads bypass the pagecache there is some ugly stuff ensuring that userspace can see all the data after quotaoff() (or Q_SYNC quotactl). In future I plan to make quota files inaccessible from userspace (with the exception of quotacheck(8), which will take care of the cache flushing and such stuff itself) so that this synchronization stuff can be removed...

The rewrite of the quota core. Quota no longer uses the filesystem read() and write() functions, to avoid possible deadlocks on PageLock. From now on every filesystem supporting quotas must provide functions quota_read() and quota_write() which obey the quota locking rules (e.g. they cannot acquire the PageLock). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jan Kara authored
Attached patch fixes debug messages of quota code in reiserfs so that they compile. Chris Mason agreed to the patch. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Jan Kara authored
Attached patch exposes reiserfs_sync_fs(). This call is needed by the new quota code to write data to disk on quotaoff so that userspace can see them afterwards. Chris Mason agrees with the patch. Make reiserfs provide the sync_fs() function so that the quota code has a way to reliably force a transaction to disk. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- 02 Jan, 2005 17 commits
-
-
Nick Piggin authored
Fix a 4-level page table bug that slipped through (introduced by me, not Andi). Compiles and boots on ia64 and 2-level i386. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Linus Torvalds authored
Pretty much all the TF-related comments were stale, and had been for a long time. Fix them up, clean up code.
-
Linus Torvalds authored
PT_DTRACE without PT_PTRACED. Long ago, the "D" in PT_DTRACE meant "Delayed", and it was used as a flag to mark that we had ptrace'd the process but no longer did so. That hasn't been true in a while now, and the flag should probably be renamed, but in the meantime the test for PT_PTRACED being cleared had been corrupted into something totally nonsensical. Pointed out by Andi Kleen.
-
Linus Torvalds authored
Merge bk://bk.arm.linux.org.uk/linux-2.6-mmc into ppc970.osdl.org:/home/torvalds/v2.6/linux
-
Russell King authored
Quieten down compiler warnings, and fix an off-by-one bug when deciding whether to include the next word.
-
Linus Torvalds authored
Merge bk://bk.arm.linux.org.uk/linux-2.6-rmk into ppc970.osdl.org:/home/torvalds/v2.6/linux
-
Linus Torvalds authored
It didn't allocate space for the final terminating entry, which caused it to overwrite the next slab entry, which in turn sometimes ended up being a slab array cache pointer. End result: total slab cache corruption at a random time afterwards. Very nasty.
-
Alexander Viro authored
some trivial iomem annotations were still missing Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Alexander Viro authored
local variable used to store flags after spin_lock_irqsave() should be unsigned long, not u32. That should complete the 64bit cleanups in there. Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Alexander Viro authored
- get_user() __gu_val should be unsigned long (same as with i386 patch) - __copy_to_user() et al. didn't have proper type checking - documented the casts in __copy_tofrom_user() calls with __force. Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Rusty Russell authored
nfsim gains sysctl support, and sure enough, --failtest uncovered an unregister when the registration had failed. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Rusty Russell authored
Someone thought it would be clever if proc code ignores removal of non-existent entries. Hence, we missed that /proc/net/stat/ip_conntrack is never removed on module removal or init failure. Found by nfsim. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Randy Dunlap authored
Somehow parport_pc.c ended up with mixed old-style and new-style module parameters, but mixing them is not allowed. Use module_param() instead of MODULE_PARM() -- cannot be mixed. Signed-off-by: Randy Dunlap <rddunlap@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
William Lee Irwin III authored
Although the CG6 framebuffer is detected and initialized, without this patch all it displays is a blank screen. Tested on an Ultra 1 with a TGX+. Originally from Bob Breuer for the CG14. Signed-off-by: Adam Kropelin <akropel1@rochester.rr.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
William Lee Irwin III authored
sparc32 had a conflicting _exit, removed the line from asm-sparc/unistd.h. This is the same change that DaveM made to sparc64 here: http://linux.bkbits.net:8080/linux-2.6/diffs/include/asm-sparc64/unistd.h@1.33 Warning was: In file included from include/linux/unistd.h:9, from init/main.c:45: include/asm/unistd.h:489: warning: conflicting types for built-in function '_exit' Signed-off-by: Tom 'spot' Callaway <tcallawa@redhat.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
William Lee Irwin III authored
Squelch the floppy compile warning: include/asm/floppy.h: In function `sun_fd_request_irq': include/asm/floppy.h:276: warning: passing arg 2 of `request_fast_irq' from incompatible pointer type Signed-off-by: Tom 'spot' Callaway <tcallawa@redhat.com> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
William Lee Irwin III authored
Fix missing cases for vm fault codes in sparc32 fault handling, and convert the entire file to using symbolic fault codes. This fixes a latent bug where an allocation failure returns to the kernel instead of delivering an error as expected. Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-