- 03 Apr, 2003 40 commits
-
David S. Miller authored
into nuts.ninka.net:/home/davem/src/BK/sparc-2.5
-
David S. Miller authored
-
David S. Miller authored
1) Use PTREGS_OFF consistently
2) Define it to allocate STACKFRAME_SZ instead of REGWIN_SZ
3) Kill off REGWIN_SZ, replace with sizeof(struct reg_window).
-
David S. Miller authored
-
David S. Miller authored
-
Ben Collins authored
-
Linus Torvalds authored
bk://kernel.bkbits.net/davem/sparc-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Linus Torvalds authored
bk://kernel.bkbits.net/davem/net-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
David S. Miller authored
-
David S. Miller authored
-
Andrew Morton authored
From: Hua Zhong <hzhong@cisco.com> Currently ext3 totally ignores I/O errors that happen during journal_force_commit, so user space falsely believes the commit succeeded when it actually did not. This patch checks for I/O errors in journal_commit_transaction and aborts the journal when one occurs. Originally I thought about reporting the error without aborting the journal, but that probably needs a new flag. Aborting the journal seems to be the easy way to signal "hey, something is wrong".
-
Andrew Morton authored
ext3 allocates and frees at least one handle structure for each system call. kmalloc and kfree are apparent in the profiles. Adding a slab cache for these objects takes the overhead for a write() from 1.63 microseconds down to 1.56.
-
Andrew Morton authored
For an appending write, ext3_commit_write() will call the expensive ext3_mark_inode_dirty() twice. Once in generic_commit_write()'s extension of i_size and once in ext3_commit_write() itself where i_disksize is updated. But by updating i_disksize _before_ calling generic_commit_write() these can be piggybacked. The patch takes the overhead of a write() from 1.96 microseconds down to 1.63.
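A minimal Python sketch of the piggybacking idea, not the kernel code. The `Inode` class, the `dirty_calls` counter, and the two `commit_write_*` functions are hypothetical stand-ins that only count how often the expensive dirtying step runs for one appending write.

```python
class Inode:
    """Hypothetical stand-in for an ext3 inode; only the fields the sketch needs."""
    def __init__(self):
        self.i_size = 0
        self.i_disksize = 0
        self.dirty_calls = 0  # how many times the expensive path ran


def mark_inode_dirty(inode):
    # Stands in for the expensive ext3_mark_inode_dirty().
    inode.dirty_calls += 1


def generic_commit_write(inode, new_size):
    # Extending i_size dirties the inode once.
    if new_size > inode.i_size:
        inode.i_size = new_size
        mark_inode_dirty(inode)


def commit_write_before(inode, new_size):
    # Old ordering: i_disksize is updated afterwards and dirties the inode again.
    generic_commit_write(inode, new_size)
    if new_size > inode.i_disksize:
        inode.i_disksize = new_size
        mark_inode_dirty(inode)


def commit_write_after(inode, new_size):
    # New ordering: update i_disksize first, so the single dirtying done by
    # generic_commit_write() carries both fields.
    if new_size > inode.i_disksize:
        inode.i_disksize = new_size
    generic_commit_write(inode, new_size)
```

In this simplified model an appending write costs two dirtying calls with the old ordering and one with the new ordering.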
-
Andrew Morton authored
ext3_mark_inode_dirty() (and several other callers) use the ext3_reserve_inode_write() and ext3_mark_iloc_dirty() pair for journalling an inode's backing block. Because ext3_reserve_inode_write() gets journalling access to the block there is no need for ext3_mark_iloc_dirty() to do it as well. This change reduces the overhead of a write() from 2.7 microseconds to 1.95 on a 2.7G P4.
-
Andrew Morton authored
From: Stephen Tweedie <sct@redhat.com> The buffer_uptodate flag gets cleared on IO failure, and this can panic jbd when it tries to write such a buffer. Relax the panic to be just a warning.
-
Andrew Morton authored
From: Stephen Tweedie <sct@redhat.com> Add a new form of assert failure in ext3 which allows us to flag events which are *usually* bugs, but which can be legally triggered in the presence of IO failures. Don't panic the kernel on such errors unless we've defined #JBD_PARANOID_IOFAIL, which will normally be set only for testing purposes.
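The "usually a bug, but legal after an IO failure" check can be sketched in Python as follows. The function name `expect` and the `paranoid` parameter are assumptions made for illustration (the parameter plays the role of JBD_PARANOID_IOFAIL); this is not the actual ext3/jbd macro.

```python
import logging

log = logging.getLogger("jbd-sketch")


def expect(condition, message, paranoid=False):
    """Soft assertion: warn and report failure unless paranoid mode is on.

    Returns True when the condition holds and False when it does not,
    so the caller can take an error path instead of crashing.
    """
    if condition:
        return True
    if paranoid:
        # Testing configuration: treat the event as a hard failure.
        raise AssertionError(message)
    log.warning("expectation failed: %s", message)
    return False
```

A caller would typically write `if not expect(buffer_ok, "buffer not uptodate"): return error` so that normal builds degrade to an error return.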
-
Andrew Morton authored
The big SMP machines are seeing quite some contention in dnotify_parent() (via vfs_write). This function is hammering the global dparent_lock. However we don't actually need a global dparent_lock for pinning down dentry->d_parent. We can use dentry->d_lock for this. That is already being held across d_move. This patch speeds up SDET on the 16-way by 5% and wipes dnotify_parent() off the profiles. It also uninlines dnotify_parent(). It also uses spin_lock(), which is faster than read_lock(). I'm not sure that we need to take both the source and target dentry's d_lock in d_move. The patch also does lots of s/__inline__/inline/ in dcache.h
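As a rough illustration of replacing one global lock with a per-object lock, here is a Python sketch. The `Dentry` class and the helper names are hypothetical, and the sketch takes only the moved entry's lock in `d_move`, which is a simplification of what the patch describes.

```python
import threading


class Dentry:
    """Hypothetical directory entry carrying its own lock."""
    def __init__(self, name, parent=None):
        self.name = name
        # A root entry is its own parent in this sketch.
        self.d_parent = self if parent is None else parent
        self.d_lock = threading.Lock()


def pinned_parent(dentry):
    # Readers take only this entry's lock, so lookups on unrelated
    # entries no longer contend on a single global lock.
    with dentry.d_lock:
        return dentry.d_parent


def d_move(dentry, new_parent):
    # The writer holds the same per-entry lock while changing the parent.
    with dentry.d_lock:
        dentry.d_parent = new_parent
```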
-
Andrew Morton authored
From: Maneesh Soni <maneesh@in.ibm.com> Here is a patch to use seqlock for the real_lookup race with d_lookup, as suggested by Linus. The race condition can result in a duplicate dentry when d_lookup fails due to a concurrent d_move in some unrelated directory. Apart from real_lookup, lookup_hash()->cached_lookup() can also fail for the same reason, so for that I am doing the d_lookup again. Now we have __d_lookup (called from do_lookup() during pathwalk) and d_lookup, which uses seqlock to protect against the rename race.

dcachebench numbers (lower is better) don't differ much on a 4-way PIII Xeon SMP box:

base-2565: Average usec/iteration 19059.4, Standard Deviation 503.07
base-2565 + seq_lock: Average usec/iteration 18843.2, Standard Deviation 450.57
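The retry-on-miss pattern can be sketched in Python as below. This is an illustration under assumptions: `SeqLock` is a toy sequence counter, the dictionary stands in for the dentry hash, and `d_lookup` here is a hypothetical function that mirrors only the idea of repeating a failed lookup if a rename ran concurrently.

```python
import threading


class SeqLock:
    """Toy sequence lock: writers bump the counter before and after a change."""
    def __init__(self):
        self._seq = 0
        self._lock = threading.Lock()

    def write_begin(self):
        self._lock.acquire()
        self._seq += 1  # odd value: a write is in progress

    def write_end(self):
        self._seq += 1  # even value again: write finished
        self._lock.release()

    def read_begin(self):
        return self._seq

    def read_retry(self, start):
        # Retry if a write was in progress at the start or happened since.
        return bool(start & 1) or self._seq != start


def d_lookup(table, rename_lock, name):
    """Look up name; repeat a miss if a rename raced with the lookup."""
    while True:
        start = rename_lock.read_begin()
        entry = table.get(name)
        if entry is not None or not rename_lock.read_retry(start):
            return entry
```

A hit is returned immediately; only a miss that overlapped a writer is repeated, which keeps the common path free of lock traffic.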
-
Andrew Morton authored
From: Neil Brown and myself. Don't do dput() inside read_lock(). It can sleep.
-
Andrew Morton authored
From: Andries.Brouwer@cwi.nl Make sure structs used by umsdos ioctls do not change size when the size of dev_t is changed.
-
Andrew Morton authored
From: Andre Landwehr <andre.landwehr@gmx.net> With / on an IDE hard disk, the disk's partitions do not appear in devfs, only the disc device. This is due to rescan_partitions being called twice and deleting but not re-creating the entries during the second call. hch has acked this.
-
Andrew Morton authored
From: Samuel Thibault <Samuel.Thibault@ens-lyon.fr> There is no way for a braille device driven by brltty (userland root-owned daemon) to scroll back the virtual console; the only way is to use the PC keyboard. A very simple TIOCLINUX ioctl meets this need (tested). Also add a command for bringing the last console to the top, as keyboard.c's lastcons() does when pressing alt - down arrow.
-
Andrew Morton authored
This changes the fadvise(FADV_DONTNEED) operation to start async writeout of any dirty pages in the file. The thinking is that if the application doesn't want to use those pages in the future, we may as well get IO underway against them so they can be freed up on the next call to fadvise(). The POSIX spec does not go into any detail as to whether this is the right or wrong behaviour. This provides a nice way for applications which are writing streaming data (the main users of fadvise) to keep the amount of dirty pagecache under control without having to resort to system-wide VM tuning. It also provides an "async fsync()". If the application passes in a length of zero, fadvise will start async writeout of the pages, but will not invalidate any of the file's pagecache.
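From user space, a streaming writer might use the advice roughly like this. The sketch uses Python's `os.posix_fadvise` wrapper; the function name `write_streaming` and the per-chunk calling pattern are assumptions for illustration, and the advice remains only a hint whose effect depends on the kernel.

```python
import os


def write_streaming(path, chunks):
    """Write chunks to path, advising the kernel after each one that the
    just-written range will not be needed again. Returns bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        offset = 0
        for chunk in chunks:
            view = memoryview(chunk)
            start = offset
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
                offset += written
            if hasattr(os, "posix_fadvise"):  # not available on every platform
                try:
                    os.posix_fadvise(fd, start, offset - start,
                                     os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass  # the advice is only a hint; ignore refusal
        return offset
    finally:
        os.close(fd)
```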
-
Andrew Morton authored
From: "J.A. Magallon" <jamagallon@able.es> Adds support for a couple of 3c980 variants which are in pci.ids, but not in the driver.
-
Andrew Morton authored
From: Zwane Mwaikambo <zwane@linuxpower.ca> ahc_linux_free_device() needs to use del_timer_sync(). slab corruption has been observed due to the timer handler running after the containing object was freed.
-
Andrew Morton authored
- Fix warning in sound/pci/cs46xx/cs46xx_lib.c (Martin Bligh)
- pte_file() comment fix (Pete Zaitcev)
- _PAGE_FILE comment clarifications
- copy_to_user() check in do_proc_readlink()
-
Andrew Morton authored
From: Andries.Brouwer@cwi.nl Below is a patch that changes struct stat for a number of architectures. Maintainers, please watch carefully.

Struct stat is used to transfer information from kernel to user space on a stat() system call. It has fields st_dev, st_rdev. The size of these fields is in principle unrelated to the size of a dev_t in user space or the size of a dev_t or kdev_t in kernel space. It is just the "capacity" of the channel. The actual amount of useful information is the minimum of the four sizes (kernel dev_t, kernel kdev_t, user dev_t, width of stat st_dev, st_rdev fields).

The goal of this patch is to make sure that the stat() and stat64() system calls transmit at least 32 and 64 bits, respectively. This is achieved by using the padding that was present already. We fail when no padding was present, or when the padding is on the wrong side (after the field, while the machine is big-endian).

alpha:
    stat: uses unsigned int, 32 bits
arm:
    stat: uses unsigned short - bad. The padding is on one side, which means that this can be made into unsigned long only on little endian systems. FIXED - unless __ARMEB__.
    stat64: used unsigned short - FIXED, now unsigned long long.
cris:
    stat: used unsigned short - FIXED, now unsigned long
    stat64: used unsigned short - FIXED, now unsigned long long.
i386:
    stat: used unsigned short - FIXED, now unsigned long
    stat64: used unsigned short - FIXED, now unsigned long long.
ia64:
    stat: uses unsigned long, 64 bits
m68k:
    stat: used unsigned short - bad, but this cannot be fixed since m68k is big-endian, and the available padding is on the wrong side. NOT FIXED.
    stat64: used unsigned short - FIXED, now unsigned long long.
mips:
    stat: uses dev_t which is unsigned int, 32 bits
    stat64: used unsigned long, 32 bits. NOT FIXED. (There is padding on one side, so this can be fixed if __MIPSEL__.)
mips64:
    stat: uses dev_t which is unsigned int, 32 bits
parisc:
    stat: uses dev_t, 32 bits
    stat64: uses unsigned long long, 64 bits
ppc:
    stat: uses dev_t which is unsigned int, 32 bits
    stat64: unsigned long long, 64 bits
ppc64:
    stat: uses dev_t which is unsigned long, 64 bits
    stat64: uses unsigned long, 64 bits
sparc:
    stat: uses unsigned short, no padding. NOT FIXED.
    stat64: used unsigned short - FIXED, now unsigned long long.
sparc64:
    stat: uses dev_t which is unsigned int, 32 bits
    stat64: used unsigned short - FIXED, now unsigned long long.
s390:
    stat: used unsigned short, big-endian, padding on the wrong side, NOT FIXED.
    stat64: used unsigned short - FIXED, now unsigned long long.
s390x:
    stat: uses unsigned long, 64 bits
sh:
    stat: used unsigned short, but padding maybe on wrong side. NOT FIXED.
    stat64: used unsigned short - FIXED, now unsigned long long.
v850:
    stat: used __kernel_dev_t. BUG: NEVER use __kernel types in a user space interface. Replaced the types. FIXED - now unsigned int - 32 bits.
    stat64: FIXED - now unsigned long long - 64 bits.
x86_64:
    stat: uses unsigned long, 64 bits

So, on most architectures we achieve the aim of 32 bits for stat, 64 bits for stat64. On all architectures we achieve at least 16 bits for stat, 32 bits for stat64.
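Two points from this description can be checked with a short Python sketch: the useful width is the minimum of the four sizes, and widening a 16-bit field into padding that follows it preserves old values only on little-endian layouts. The function names and the 16-bit-field-plus-16-bit-padding layout are assumptions chosen for illustration, not any architecture's actual struct stat.

```python
import struct


def useful_bits(kernel_dev_t, kernel_kdev_t, user_dev_t, stat_field):
    """Width of device-number information that survives the whole path."""
    return min(kernel_dev_t, kernel_kdev_t, user_dev_t, stat_field)


def read_widened(dev, byteorder):
    """Store dev in the old layout (16-bit field, then 16 bits of zero
    padding) and read the same four bytes back as one 32-bit field."""
    prefix = "<" if byteorder == "little" else ">"
    old_layout = struct.pack(prefix + "HH", dev, 0)
    (seen,) = struct.unpack(prefix + "I", old_layout)
    return seen
```

On a little-endian layout the widened read returns the original value, so the field can grow into the padding. On a big-endian layout the value lands in the high half, which is why padding after the field is called "the wrong side" above.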
-
Andrew Morton authored
From: CaT <cat@zip.com.au> What this patch does is allow you to specify the max amount of memory tmpfs can use as a percentage of available real ram. This (in my eyes) is useful so that you do not have to remember to change the setting if you want something other than 50% and some of your ram goes. Hugh redid the arithmetic to not overflow at 4GB; the particular order of lines helps RH's gcc-2.96-110 not to get confused in the do_div. 2.5 can use totalram_pages. Update mount options in tmpfs Doc. There's an argument that the percentage should be of ram+swap, that's what Christoph originally intended. But we set the default at 50% of ram only, so I believe it's more consistent to follow that precedent.
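The percentage arithmetic can be sketched in Python as follows. The function name `tmpfs_limit_pages`, the 4096-byte default page size, and the restriction to plain byte counts (no k/m/g suffixes) are assumptions of this sketch; it works in pages so that no intermediate byte count is needed.

```python
def tmpfs_limit_pages(size_option, totalram_pages, page_size=4096):
    """Translate a size option into a page limit.

    "NN%" means NN percent of physical RAM, computed in pages so the
    intermediate value stays small. A plain number is a byte count.
    """
    text = size_option.strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        return totalram_pages * percent // 100
    return int(text) // page_size
```

For example, with 1048576 pages of RAM (4GB at 4096 bytes per page), `tmpfs_limit_pages("50%", 1048576)` yields 524288 pages.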
-
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com> cond_resched each time around the loop in shmem_file_write and do_shmem_file_read, matching filemap.c.
-
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com> tmpfs pages should be surfing the LRUs in the company of their filemap friends: I was expecting the rules to change, but they've been stable so long, let's sprinkle mark_page_accessed in the equivalent places here; but (don't ask me why) SetPageReferenced in shmem_file_write. Ooh, and shmem_populate was missing a flush_page_to_ram.
-
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com> default_llseek's use of BKL and not i_sem was recently exposed: tmpfs should be using generic_file_llseek which guards with i_sem.
-
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com> shmem_readpage was created to give tmpfs sendfile and loop ability; but they're both using shmem_file_sendfile now, so remove shmem_readpage.
-
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com> Remove the duplicated checks in shmem_file_write(), use generic_write_checks() instead.
-
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com> When handling rlimit != RLIM_INFINITY, generic_write_checks tests file position against 0xFFFFFFFFULL, and casts it to a u32. This code is carried forward from 2.4.4, and the 2.4-ac tree contains an apparently obvious fix to one part of it (should set count to 0 not to a negative). But when you think it through, it all turns out to be bogus. On a 32-bit architecture: limit is a 32-bit unsigned long, we've already handled *pos < 0 and *pos >= limit, so *pos here has no way of being > 0xFFFFFFFFULL, and thus casting it to u32 won't truncate it. And on a 64-bit architecture: limit is a 64-bit unsigned long, but this code is disallowing file position beyond the 32 bits; or if there's some userspace compatibility issue, with limit having to fit into 32 bits, the 32-bit architecture argument applies and they're still irrelevant. So just remove the 0xFFFFFFFFULL test; and in place of the u32, cast to typeof(limit) so it's right even if rlimits get wider. And there's no way we'd want to send SIGXFSZ below the limit: remove send_sig comment. There's a similarly suspicious u32 cast a little further down, when checking MAX_NON_LFS. Given its definition, that does no harm on any arch: but it's better changed to unsigned long, the type of MAX_NON_LFS.
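The rlimit handling that remains after removing the 32-bit test can be sketched in Python. The function name `clamp_to_rlimit` and the use of `None` for an unlimited rlimit are assumptions for illustration; the kernel additionally sends SIGXFSZ at the limit, which the sketch represents only by raising EFBIG.

```python
import errno
import os

UNLIMITED = None  # hypothetical stand-in for RLIM_INFINITY


def clamp_to_rlimit(pos, count, limit):
    """Return how many bytes may be written at pos under a file-size limit.

    Raises OSError(EFBIG) when pos is already at or past the limit.
    Below the limit the write is shortened, never refused.
    """
    if limit is UNLIMITED:
        return count
    if pos >= limit:
        raise OSError(errno.EFBIG, os.strerror(errno.EFBIG))
    # pos < limit here, so limit - pos is positive and no narrowing
    # cast is needed to compare it with count.
    return min(count, limit - pos)
```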
-
Andrew Morton authored
RAID5 is calling copy_data() under sh->lock. But copy_data() does kmap(), which can sleep. The best fix is to use kmap_atomic() in there. It is faster than kmap() and does not block. The patch removes the unused bio_kmap() and replaces __bio_kmap() with __bio_kmap_atomic(). I think it's best to withdraw the sleeping-and-slow bio_kmap() from the kernel API before someone else tries to use it. Also, I notice that bio_kmap_irq() was using local_save_flags(). This is a bug - local_save_flags() does not disable interrupts. Converted that to local_irq_save(). These names are terribly chosen. This patch was acked by Jens and Neil.
-
Andrew Morton authored
From: "Martin J. Bligh" <mbligh@aracnet.com> Fix a couple of instances of "warning: suggest parentheses around assignment used as truth value".
-
Andrew Morton authored
From: john stultz <johnstul@us.ibm.com> This patch, written with the advice of Joel Becker, addresses a problem with the hangcheck-timer. The basic problem is that the hangcheck-timer code (required for Oracle) needs an accurate hard clock which can be used to detect OS stalls (due to udelay() or pci bus hangs) that would cause system time to skew (it's sort of a sanity check that ensures the system's notion of time is accurate). However, currently they are using get_cycles() to fetch the cpu's TSC register, thus this does not work on systems w/o a synced TSC. As suggested by Andi Kleen (see thread here: http://www.uwsg.iu.edu/hypermail/linux/kernel/0302.0/1234.html ) I've worked with Joel and others to implement the monotonic_clock() interface. Some of the major considerations made when writing this patch were:

o Needs to be able to return accurate time in the absence of multiple timer interrupts
o Needs to be abstracted out from the hardware
o Avoids impacting gettimeofday() performance

This interface returns an unsigned long long representing the number of nanoseconds that have passed since time_init().
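A user-space analogue of stall detection with a monotonic nanosecond clock, sketched in Python. It uses `time.monotonic_ns()` rather than the kernel's monotonic_clock(); the function name `detect_stall` and its parameters are hypothetical, and the `clock` argument exists so the check can be exercised with a fake clock.

```python
import time


def detect_stall(work, expected_ns, margin_ns, clock=time.monotonic_ns):
    """Run work() and report whether it overran its expected duration.

    Returns (stalled, elapsed_ns). The clock must be monotonic so that
    adjustments to wall-clock time cannot hide or fake a stall.
    """
    start = clock()
    work()
    elapsed = clock() - start
    return elapsed > expected_ns + margin_ns, elapsed
```

Passing `clock=lambda: next(ticks)` with a prepared iterator of timestamps gives a deterministic way to test both the stalled and the healthy case.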
-
Andrew Morton authored
From: "J. Bruce Fields" <bfields@fieldses.org> If the NFS daemon is presented with a filehandle for a file that has been deleted, it does an iget() in fs/exportfs/expfs.c:export_iget() and gets a bad inode back. When it subsequently iput()s the inode, the result is:

Mar 27 12:53:40 snoopy kernel: EXT2-fs error (device ide0(3,3)): ext2_free_blocks: Freeing blocks not in datazone - block = 1802201963, count = 27499
Mar 27 12:53:40 snoopy kernel: Remounting filesystem read-only

The same can happen if ext2_get_inode() returns an error - ext2_read_inode() will return an uninitialised inode and ext2_put_inode() is not allowed to go looking inside the bad inode.
-
Andrew Morton authored
From: Hugh Dickins <hugh@veritas.com> The blk_congestion_waits in shmem_getpage are appropriate when the error is -ENOMEM, but not when the error is -EEXIST. So add that test in the first instance, but omit it all in the second instance.
-
Ben Collins authored
-