Commits · c600751de91ae5c3f4310a0c1501d4dfd98bf89b · nexedi / linux

03 Apr, 2003 40 commits

[PATCH] md: Cleanups for md to move device size calculations into personalities · c600751d
Neil Brown authored Apr 03, 2003

c600751d
[PATCH] md: Fix stupid oops in recent md.c module changes · becf91fc
Neil Brown authored Apr 03, 2003

becf91fc
Merge nuts.ninka.net:/home/davem/src/BK/sparcwork-2.5 · bd4efa73
David S. Miller authored Apr 02, 2003

into nuts.ninka.net:/home/davem/src/BK/sparc-2.5

bd4efa73
[MODULE]: On sparc, ignore undefined symbols of type STT_REGISTER. · 8b4d8f66
David S. Miller authored Apr 02, 2003

8b4d8f66

[SPARC64]: Fix trap stack allocations so gcc-3.x builds work. · 11bef0bb

David S. Miller authored Apr 02, 2003

1) Use PTREGS_OFF consistently
2) Define it to allocate STACKFRAME_SZ instead of REGWIN_SZ
3) Kill off REGWIN_SZ, replace with sizeof(struct reg_window).

11bef0bb

[SPARC64]: Missing break; statement in module reloc code. · aadd0e95
David S. Miller authored Apr 02, 2003

aadd0e95
[SPARC64]: Support OLO10 relocations for modules. · 54f05dc4
David S. Miller authored Apr 02, 2003

54f05dc4
[SPARC64]: Fix boot target deps. · afa82d46
Ben Collins authored Apr 02, 2003

afa82d46
Merge bk://kernel.bkbits.net/davem/sparc-2.5 · 513230c4
Linus Torvalds authored Apr 02, 2003

into home.transmeta.com:/home/torvalds/v2.5/linux

513230c4
Merge bk://kernel.bkbits.net/davem/net-2.5 · 5148292f
Linus Torvalds authored Apr 02, 2003

into home.transmeta.com:/home/torvalds/v2.5/linux

5148292f
[SPARC64]: Update defconfig. · 588a4b4c
David S. Miller authored Apr 02, 2003

588a4b4c
[SPARC64]: Dont transition in us2e drivers if divisor does not change. · 66395803
David S. Miller authored Apr 02, 2003

66395803

[PATCH] ext3 journal commit I/O error fix · 68569684

Andrew Morton authored Apr 02, 2003

From: Hua Zhong <hzhong@cisco.com>

The current ext3 totally ignores I/O errors that happen during
journal_force_commit, causing user space to falsely believe the commit
succeeded when it actually did not.

This patch checks for I/O errors during journal_commit_transaction and aborts
the journal when there is an I/O error.

Originally I thought about reporting the error without aborting the
journal, but it probably needs a new flag. Aborting the journal seems to be
the easy way to signal "hey, something is wrong".

68569684

[PATCH] ext3: create a slab cache for transaction handles · c20fb5f1

Andrew Morton authored Apr 02, 2003

ext3 allocates and frees at least one handle structure for each system call.
kmalloc and kfree are apparent in the profiles.

Adding a slab cache for these objects takes the overhead for a write() from
1.63 microseconds down to 1.56.

c20fb5f1

[PATCH] ext3_commit_write speedup · 9aabee2e

Andrew Morton authored Apr 02, 2003

For an appending write, ext3_commit_write() will call the expensive
ext3_mark_inode_dirty() twice.  Once in generic_commit_write()'s extension of
i_size and once in ext3_commit_write() itself where i_disksize is updated.

But by updating i_disksize _before_ calling generic_commit_write() these can
be piggybacked.

The patch takes the overhead of a write() from 1.96 microseconds down to
1.63.

9aabee2e

[PATCH] ext3_mark_inode_dirty() speedup · f0f46afd

Andrew Morton authored Apr 02, 2003

ext3_mark_inode_dirty() (and several other callers) use the
ext3_reserve_inode_write() and ext3_mark_iloc_dirty() pair for journalling an
inode's backing block.

Because ext3_reserve_inode_write() gets journalling access to the block there
is no need for ext3_mark_iloc_dirty() to do it as well.

This change reduces the overhead of a write() from 2.7 microseconds to 1.95
on a 2.7G P4.

f0f46afd

[PATCH] Fix jbd assert failure on IO error. · 7ba93ca7

Andrew Morton authored Apr 02, 2003

From: Stephen Tweedie <sct@redhat.com>

The buffer_uptodate flag gets cleared on IO failure, and this can panic jbd
when it tries to write such a buffer. Relax the panic to be just a warning.

7ba93ca7

[PATCH] Add less-severe assert-failure form for ext3. · bdf6c6a6

Andrew Morton authored Apr 02, 2003

From: Stephen Tweedie <sct@redhat.com>

Add a new form of assert failure in ext3 which allows us to flag events which
are *usually* bugs, but which can be legally triggered in the presence of IO
failures.  Don't panic the kernel on such errors unless we've defined
#JBD_PARANOID_IOFAIL, which will normally be set only for testing purposes.
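
A minimal userspace sketch of the "warn by default, abort only when paranoid" idea described above. It is not the jbd code: the names `SOFT_ASSERT`, `soft_assert_report`, `soft_assert_failures` and `PARANOID_IOFAIL` are hypothetical and chosen only for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter so callers can observe how many soft asserts fired. */
static int soft_assert_failures = 0;

/* Log a failed condition; abort only in paranoid (testing) builds. */
static int soft_assert_report(const char *expr, const char *file, int line)
{
    fprintf(stderr, "soft assert failed: %s at %s:%d\n", expr, file, line);
    soft_assert_failures++;
#ifdef PARANOID_IOFAIL          /* hypothetical stand-in for a testing flag */
    abort();
#endif
    return 0;
}

/* Evaluates to 1 if cond holds; otherwise logs a warning and evaluates to 0. */
#define SOFT_ASSERT(cond) \
    ((cond) ? 1 : soft_assert_report(#cond, __FILE__, __LINE__))
```

A caller can then bail out gracefully, for example `if (!SOFT_ASSERT(buffer_ok)) return -1;`, instead of taking the whole system down on an I/O failure.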

bdf6c6a6

[PATCH] remove dparent_lock · 723c6e83

Andrew Morton authored Apr 02, 2003

The big SMP machines are seeing quite some contention in dnotify_parent()
(via vfs_write).  This function is hammering the global dparent_lock.

However we don't actually need a global dparent_lock for pinning down
dentry->d_parent.  We can use dentry->d_lock for this.  That is already being
held across d_move.

This patch speeds up SDET on the 16-way by 5% and wipes dnotify_parent() off
the profiles.

It also uninlines dnotify_parent().

It also uses spin_lock(), which is faster than read_lock().

I'm not sure that we need to take both the source and target dentry's d_lock
in d_move.

The patch also does lots of s/__inline__/inline/ in dcache.h

723c6e83

[PATCH] real_lookup race fix · 1b8910cf

Andrew Morton authored Apr 02, 2003

From: Maneesh Soni <maneesh@in.ibm.com>

Here is a patch that uses a seqlock to fix the real_lookup race with d_lookup,
as suggested by Linus. The race condition can result in a duplicate dentry when
d_lookup fails due to a concurrent d_move in some unrelated directory.

Apart from real_lookup, lookup_hash()->cached_lookup() can also fail for
the same reason. So, for that I am doing the d_lookup again.

Now we have __d_lookup (called from do_lookup() during pathwalk) and
d_lookup, which uses the seqlock to protect against the rename race.

dcachebench numbers (lower is better) don't have much difference on a 4-way
PIII xeon SMP box.

base-2565
Average usec/iteration  19059.4
Standard Deviation      503.07

base-2565 + seq_lock
Average usec/iteration  18843.2
Standard Deviation      450.57
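
The retry pattern behind a sequence lock can be sketched in userspace with C11 atomics. This is only an illustration under simplifying assumptions, not the kernel's seqlock: it assumes a single writer (the kernel variant also serialises writers with a spinlock), uses sequentially consistent atomics throughout for simplicity, and the names `seq_pair`, `pair_write` and `pair_read` are hypothetical.

```c
#include <stdatomic.h>

struct seq_pair {
    atomic_uint seq;   /* odd while the writer is in the middle of an update */
    atomic_int a, b;   /* data that readers must see as a consistent pair */
};

/* Single writer: bump the counter before and after touching the data. */
static void pair_write(struct seq_pair *p, int a, int b)
{
    atomic_fetch_add(&p->seq, 1);   /* now odd: update in progress */
    atomic_store(&p->a, a);
    atomic_store(&p->b, b);
    atomic_fetch_add(&p->seq, 1);   /* now even: update complete */
}

/* Reader: take no lock, but retry if a write overlapped the read. */
static void pair_read(struct seq_pair *p, int *a, int *b)
{
    unsigned start;

    do {
        start = atomic_load(&p->seq);
        *a = atomic_load(&p->a);
        *b = atomic_load(&p->b);
    } while ((start & 1u) || atomic_load(&p->seq) != start);
}
```

Readers never block the writer; they only pay for a retry when an update actually raced with them, which is consistent with the small difference in the dcachebench numbers above.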

1b8910cf

[PATCH] exp_parent locking fixes · ec1d26ec
Andrew Morton authored Apr 02, 2003

From: Neil Brown and myself.

Don't do dput() inside read_lock().  It can sleep.

ec1d26ec

[PATCH] umsdos fixes · ca88b8e8

Andrew Morton authored Apr 02, 2003

From: Andries.Brouwer@cwi.nl

Make sure structs used by umsdos ioctls do not change size when the size of
dev_t is changed.

ca88b8e8

[PATCH] Fix devfs' partition handling · 7ceef18f

Andrew Morton authored Apr 02, 2003

From: Andre Landwehr <andre.landwehr@gmx.net>

With / on an IDE hard disk, the disk's partitions do not appear in
devfs, only the disc device. This is due to rescan_partitions
being called twice and deleting but not re-creating the entries
during the second call.

hch has acked this.

7ceef18f

[PATCH] add vt console scrollback ioctl · 8a8e9c88

Andrew Morton authored Apr 02, 2003

From: Samuel Thibault <Samuel.Thibault@ens-lyon.fr>

There is no way for a braille device driven by brltty (userland root-owned
daemon) to scroll back the virtual console; the only way is to use the PC
keyboard. A very simple TIOCLINUX ioctl meets this need (tested).

Also add a command for bringing the last console to the top, as keyboard.c's
lastcons() does when pressing alt - down arrow.

8a8e9c88

[PATCH] sync dirty pages in fadvise(FADV_DONTNEED) · 3bc17e74

Andrew Morton authored Apr 02, 2003

This changes the fadvise(FADV_DONTNEED) operation to start async writeout of
any dirty pages in the file.

The thinking is that if the application doesn't want to use those pages in
the future, we may as well get IO underway against them so they can be freed
up on the next call to fadvise().

The POSIX spec does not go into any detail as to whether this is the right or
wrong behaviour.

This provides a nice way for applications which are writing streaming data
(the main users of fadvise) to keep the amount of dirty pagecache under
control without having to resort to system-wide VM tuning.

It also provides an "async fsync()".  If the application passes in a length
of zero, fadvise will start async writeout of the pages, but will not
invalidate any of the file's pagecache.
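
From the application side, the streaming-writer pattern might look like the sketch below. It uses the standard `posix_fadvise()` call with an explicit byte range; the zero-length behaviour described above is specific to this patch and is deliberately not relied on here. The helper name `write_and_drop` is hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* for pwrite() and posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one chunk of streaming data at the given offset, then tell the
 * kernel that the application will not need that range again.
 * Returns 0 on success, -1 on a failed or short write, or the positive
 * error number reported by posix_fadvise().
 */
static int write_and_drop(int fd, off_t offset, const void *buf, size_t len)
{
    ssize_t n = pwrite(fd, buf, len, offset);

    if (n < 0 || (size_t)n != len)
        return -1;
    return posix_fadvise(fd, offset, (off_t)len, POSIX_FADV_DONTNEED);
}
```

Calling this once per chunk is one way to keep a long-running writer from accumulating dirty pagecache, without touching system-wide VM settings.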

3bc17e74

[PATCH] Additional 3c980 device support · 89ef9495

Andrew Morton authored Apr 02, 2003

From: "J.A. Magallon" <jamagallon@able.es>

Adds support for a couple of 3c980 variants which are in pci.ids, but not in
the driver.

89ef9495

[PATCH] aic7xxx timer deletion fix · 93bd249f

Andrew Morton authored Apr 02, 2003

From: Zwane Mwaikambo <zwane@linuxpower.ca>

ahc_linux_free_device() needs to use del_timer_sync().  slab corruption has
been observed due to the timer handler running after the containing object
was freed.

93bd249f

[PATCH] misc fixes · 98c20bf4

Andrew Morton authored Apr 02, 2003

- Fix warning in sound/pci/cs46xx/cs46xx_lib.c (Martin Bligh)

- pte_file() comment fix (Pete Zaitcev)

- _PAGE_FILE comment clarifications

- copy_to_user() check in do_proc_readlink()

98c20bf4

[PATCH] struct stat - support larger dev_t · e95b2065

Andrew Morton authored Apr 02, 2003

From: Andries.Brouwer@cwi.nl

Below is a patch that changes struct stat for a number of
architectures. Maintainers, please watch carefully.

Struct stat is used to transfer information from kernel
to user space on a stat() system call.
It has fields st_dev, st_rdev.

The size of these fields is in principle unrelated to
the size of a dev_t in user space or the size of a
dev_t or kdev_t in kernel space.

It is just the "capacity" of the channel.
The actual amount of useful information is the minimum
of the four sizes (kernel dev_t, kernel kdev_t,
user dev_t, width of stat st_dev, st_rdev fields).

The goal of this patch is to make sure that the stat() and stat64()
system calls transmit at least 32 and 64 bits, respectively.
This is achieved by using the padding that was present already.
We fail when no padding was present, or when the padding is on
the wrong side (after the field, while the machine is big-endian).

alpha:	stat: uses unsigned int, 32 bits
arm:	stat: uses unsigned short - bad.
	The padding is on one side, which means that this can
	be made into unsigned long only on little endian systems.
	FIXED - unless __ARMEB__.
	stat64: used unsigned short - FIXED, now unsigned long long.
cris:	stat: used unsigned short - FIXED, now unsigned long
	stat64: used unsigned short - FIXED, now unsigned long long.
i386:	stat: used unsigned short - FIXED, now unsigned long
	stat64: used unsigned short - FIXED, now unsigned long long.
ia64:	stat: uses unsigned long, 64 bits
m68k:	stat: used unsigned short - bad, but this cannot be fixed
	since m68k is big-endian, and the available padding is on
	the wrong side. NOT FIXED.
	stat64: used unsigned short - FIXED, now unsigned long long.
mips:	stat: uses dev_t which is unsigned int, 32 bits
	stat64: used unsigned long, 32 bits. NOT FIXED.
	(There is padding on one side, so this can be fixed if __MIPSEL__.)
mips64:	stat: uses dev_t which is unsigned int, 32 bits
parisc:	stat: uses dev_t, 32 bits
	stat64: uses unsigned long long, 64 bits
ppc:	stat: uses dev_t which is unsigned int, 32 bits
	stat64: unsigned long long, 64 bits
ppc64:	stat: uses dev_t which is unsigned long, 64 bits
	stat64: uses unsigned long, 64 bits
sparc:	stat: uses unsigned short, no padding. NOT FIXED.
	stat64: used unsigned short - FIXED, now unsigned long long.
sparc64:stat: uses dev_t which is unsigned int, 32 bits
	stat64: used unsigned short - FIXED, now unsigned long long.
s390:	stat: used unsigned short, big-endian, padding on the wrong side,
	NOT FIXED.
	stat64: used unsigned short - FIXED, now unsigned long long.
s390x:	stat: uses unsigned long, 64 bits
sh:	stat: used unsigned short, but padding maybe on wrong side.
	NOT FIXED.
	stat64: used unsigned short - FIXED, now unsigned long long.
v850:	stat: used __kernel_dev_t.
	BUG: NEVER use __kernel types in a user space interface.
	Replaced the types. FIXED - now unsigned int - 32 bits.
	stat64: FIXED - now unsigned long long - 64 bits.
x86_64:	stat: uses unsigned long, 64 bits

So, on most architectures we achieve the aim of 32 bits for stat,
64 bits for stat64. On all architectures we achieve at least
16 bits for stat, 32 bits for stat64.
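
Why the side of the padding matters can be shown with a small userspace sketch. The struct names `old_layout` and `new_layout` are hypothetical and far simpler than any real struct stat; the point is only that widening a 16-bit field into trailing zero padding preserves the value on little-endian machines and not on big-endian ones.

```c
#include <stdint.h>
#include <string.h>

/* Old layout: 16-bit device number followed by 16 bits of zero padding. */
struct old_layout { uint16_t st_dev; uint16_t pad; };

/* New layout: the same four bytes treated as a single 32-bit field. */
struct new_layout { uint32_t st_dev; };

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
static int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Fill in the old layout, then read the same bytes through the new one. */
static uint32_t read_old_as_new(uint16_t dev)
{
    struct old_layout o = { dev, 0 };
    struct new_layout n;

    memcpy(&n, &o, sizeof n);
    return n.st_dev;
}
```

On a little-endian host `read_old_as_new(0x0301)` yields `0x0301`, so old and new users of the structure agree. On a big-endian host the same bytes read as `0x03010000`, which is the "padding on the wrong side" case listed above.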

e95b2065

[PATCH] tmpfs 6/6: percentile sizing of tmpfs · 65aaef27

Andrew Morton authored Apr 02, 2003

From: CaT <cat@zip.com.au>

What this patch does is allow you to specify the max amount of memory tmpfs
can use as a percentage of available real ram. This (in my eyes) is useful
so that you do not have to remember to change the setting if you want
something other than 50% and some of your ram goes.

Hugh redid the arithmetic to not overflow at 4GB; the particular order of
lines helps RH's gcc-2.96-110 not to get confused in the do_div. 2.5 can use
totalram_pages. Update mount options in tmpfs Doc.

There's an argument that the percentage should be of ram+swap, that's what
Christoph originally intended. But we set the default at 50% of ram only, so
I believe it's more consistent to follow that precedent.
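
The 4GB overflow concern can be illustrated with a short sketch. This is not the arithmetic from the patch (which uses do_div in the kernel); it only shows one way a byte count computed in 32 bits wraps, and how widening first avoids it. `DEMO_PAGE_SHIFT` and both function names are assumptions made for the example.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumed 4KB pages, for illustration only */

/* Widen to 64 bits before shifting, so 4GB and above do not wrap. */
static uint64_t percent_of_ram_bytes(uint32_t ram_pages, unsigned percent)
{
    uint64_t bytes = (uint64_t)ram_pages << DEMO_PAGE_SHIFT;

    return bytes * percent / 100;
}

/* Same calculation done entirely in 32 bits: wraps once RAM reaches 4GB. */
static uint32_t percent_of_ram_bytes_32bit(uint32_t ram_pages, unsigned percent)
{
    uint32_t bytes = ram_pages << DEMO_PAGE_SHIFT;

    return bytes / 100 * percent;
}
```

With 1048576 pages (4GB of 4KB pages) and 50 percent, the 64-bit version gives 2147483648 bytes, while the 32-bit version wraps to 0.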

65aaef27

[PATCH] tmpfs 5/6: use cond_resched · 548ac1de

Andrew Morton authored Apr 02, 2003

From: Hugh Dickins <hugh@veritas.com>

cond_resched each time around the loop in shmem_file_write
and do_shmem_file_read, matching filemap.c.

548ac1de

[PATCH] tmpfs 4/6: use mark_page_accessed · 5d86cc8b

Andrew Morton authored Apr 02, 2003

From: Hugh Dickins <hugh@veritas.com>

tmpfs pages should be surfing the LRUs in the company of their filemap
friends: I was expecting the rules to change, but they've been stable so
long, let's sprinkle mark_page_accessed in the equivalent places here; but
(don't ask me why) SetPageReferenced in shmem_file_write.  Ooh, and
shmem_populate was missing a flush_page_to_ram.

5d86cc8b

[PATCH] tmpfs 3/6: use generic_file_llseek · f56453c9

Andrew Morton authored Apr 02, 2003

From: Hugh Dickins <hugh@veritas.com>

default_llseek's use of BKL and not i_sem was recently exposed:
tmpfs should be using generic_file_llseek which guards with i_sem.

f56453c9

[PATCH] tmpfs 2/6 remove shmem_readpage · 2927b748

Andrew Morton authored Apr 02, 2003

From: Hugh Dickins <hugh@veritas.com>

shmem_readpage was created to give tmpfs sendfile and loop ability; but
they're both using shmem_file_sendfile now, so remove shmem_readpage.

2927b748

[PATCH] tmpfs 1/6 use generic_write_checks · acad2c18

Andrew Morton authored Apr 02, 2003

From: Hugh Dickins <hugh@veritas.com>

Remove the duplicated checks in shmem_file_write(), use
generic_write_checks() instead.

acad2c18

[PATCH] file limit checking simplification · d80bbda5

Andrew Morton authored Apr 02, 2003

From: Hugh Dickins <hugh@veritas.com>

When handling rlimit != RLIM_INFINITY, generic_write_checks tests file
position against 0xFFFFFFFFULL, and casts it to a u32. This code is
carried forward from 2.4.4, and the 2.4-ac tree contains an apparently
obvious fix to one part of it (should set count to 0 not to a negative).
But when you think it through, it all turns out to be bogus.

On a 32-bit architecture: limit is a 32-bit unsigned long, we've
already handled *pos < 0 and *pos >= limit, so *pos here has no way
of being > 0xFFFFFFFFULL, and thus casting it to u32 won't truncate it.
And on a 64-bit architecture: limit is a 64-bit unsigned long, but this
code is disallowing file position beyond the 32 bits; or if there's some
userspace compatibility issue, with limit having to fit into 32 bits,
the 32-bit architecture argument applies and they're still irrelevant.

So just remove the 0xFFFFFFFFULL test; and in place of the u32, cast to
typeof(limit) so it's right even if rlimits get wider. And there's no
way we'd want to send SIGXFSZ below the limit: remove send_sig comment.

There's a similarly suspicious u32 cast a little further down, when
checking MAX_NON_LFS. Given its definition, that does no harm on any
arch: but it's better changed to unsigned long, the type of MAX_NON_LFS.
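
The simplified check can be sketched as follows. This is a userspace illustration of the reasoning, not the body of generic_write_checks: the function name `clamp_to_limit` is hypothetical, RLIM_INFINITY handling is omitted, and a plain -1 stands in for the SIGXFSZ / EFBIG path.

```c
#include <stdint.h>

/*
 * Clamp a write of *count bytes at offset pos so that it does not cross
 * limit. Returns 0 on success, or -1 when pos is negative or already at
 * or beyond the limit.
 */
static int clamp_to_limit(int64_t pos, uint64_t *count, unsigned long limit)
{
    if (pos < 0)
        return -1;
    if ((uint64_t)pos >= limit)
        return -1;
    /*
     * pos < limit here, so casting pos to the type of limit cannot
     * truncate it, whatever the width of unsigned long.
     */
    if (*count > limit - (unsigned long)pos)
        *count = limit - (unsigned long)pos;
    return 0;
}
```

Because the earlier tests already guarantee `pos < limit`, no separate 0xFFFFFFFF comparison is needed, which is the argument the commit message makes.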

d80bbda5

[PATCH] bio kmapping changes · 240d3e2d

Andrew Morton authored Apr 02, 2003

RAID5 is calling copy_data() under sh->lock.  But copy_data() does kmap(),
which can sleep.

The best fix is to use kmap_atomic() in there.  It is faster than kmap() and
does not block.

The patch removes the unused bio_kmap() and replaces __bio_kmap() with
__bio_kmap_atomic().  I think it's best to withdraw the sleeping-and-slow
bio_kmap() from the kernel API before someone else tries to use it.


Also, I notice that bio_kmap_irq() was using local_save_flags().  This is a
bug - local_save_flags() does not disable interrupts.  Converted that to
local_irq_save().  These names are terribly chosen.

This patch was acked by Jens and Neil.

240d3e2d

[PATCH] Fix some compile warnings · d597f71b

Andrew Morton authored Apr 02, 2003

From: "Martin J. Bligh" <mbligh@aracnet.com>

Fix a couple of instances of "warning: suggest parentheses around assignment
used as truth value".
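
For readers unfamiliar with this gcc warning, here is a small illustrative example of the usual fix. It is not taken from the patch; the function `count_nonspace` is hypothetical.

```c
#include <stddef.h>

/* Count the characters in s that are not spaces. */
static size_t count_nonspace(const char *s)
{
    size_t n = 0;
    char c;

    /*
     * Writing "while (c = *s++)" compiles but triggers the warning.
     * The doubled parentheses tell the compiler that the assignment
     * inside the condition is intentional.
     */
    while ((c = *s++)) {
        if (c != ' ')
            n++;
    }
    return n;
}
```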

d597f71b

[PATCH] monotonic clock source for hangcheck timer · 92525be5

Andrew Morton authored Apr 02, 2003

From: john stultz <johnstul@us.ibm.com>

This patch, written with the advice of Joel Becker, addresses a problem with
the hangcheck-timer.

The basic problem is that the hangcheck-timer code (required for Oracle)
needs an accurate hard clock which can be used to detect OS stalls (due to
udelay() or PCI bus hangs) that would cause system time to skew (it's sort of
a sanity check that ensures the system's notion of time is accurate).
However, it currently uses get_cycles() to fetch the CPU's TSC
register, so it does not work on systems without a synced TSC.

As suggested by Andi Kleen (see thread here:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0302.0/1234.html ) I've worked
with Joel and others to implement the monotonic_clock() interface.  Some of
the major considerations made when writing this patch were

o Needs to be able to return accurate time in the absence of multiple timer
  interrupts

o Needs to be abstracted out from the hardware

o Avoids impacting gettimeofday() performance

This interface returns an unsigned long long representing the number of
nanoseconds that have passed since time_init().
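
A userspace analogue of such an interface can be built on the POSIX `clock_gettime()` call. This is only a sketch of the shape of the API: it is not the kernel's monotonic_clock(), it counts from an unspecified starting point rather than from time_init(), and the name `monotonic_ns` is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <time.h>

/*
 * Nanoseconds on a clock that never jumps backwards.
 * Returns 0 if the clock cannot be read.
 */
static unsigned long long monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (unsigned long long)ts.tv_sec * 1000000000ULL
         + (unsigned long long)ts.tv_nsec;
}
```

A stall detector in the spirit of the hangcheck timer would sample this value twice and compare the difference with the interval it expected to elapse.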

92525be5

[PATCH] handle bad inodes in put_inode · 68fa8120

Andrew Morton authored Apr 02, 2003

From: "J. Bruce Fields" <bfields@fieldses.org>

If the NFS daemon is presented with a filehandle for a file that has
been deleted, it does an iget() in fs/exportfs/expfs.c:export_iget() and
gets a bad inode back. When it subsequently iput()s the inode, the
result is:

Mar 27 12:53:40 snoopy kernel: EXT2-fs error (device ide0(3,3)): ext2_free_blocks: Freeing blocks not in datazone - block = 1802201963, count = 27499
Mar 27 12:53:40 snoopy kernel: Remounting filesystem read-only

The same can happen if ext2_get_inode() returns an error - ext2_read_inode()
will return an uninitialised inode and ext2_put_inode() is not allowed to go
looking inside the bad inode.

68fa8120