Commits · 673263406b941fd2bb3ec1353263fa2f455bba37 · Kirill Smelkov / linux

11 Jul, 2003 8 commits
- [PATCH] axnet can unload with timers live · 67326340
  Alan Cox authored Jul 11, 2003
  
- [PATCH] isurf compile fix · c8b7ecf1
  Alan Cox authored Jul 11, 2003
  
- [PATCH] dtlk comment fix · 09b4f821
  Alan Cox authored Jul 11, 2003
  
- [PATCH] clean up floppy98 a bit · 103ba0b8
  Alan Cox authored Jul 11, 2003
  
- [PATCH] Remove bogus printk in microcode.c · f5ccc046
  Alan Cox authored Jul 11, 2003
  
- [PATCH] genrtc sets owner fields so.. · a5b484fd
  Alan Cox authored Jul 11, 2003
  
- fix cifs distributed caching - send oplock release immediately after flush of... · ecfe832e
  Steve French authored Jul 11, 2003
```
fix cifs distributed caching - send oplock release immediately after flush of writebehind data on oplock break from server
```
- Merge bk://linux.bkbits.net/linux-2.5 · bed8ce47
  Steve French authored Jul 10, 2003
```
into hostme.bitkeeper.com:/repos/c/cifs/linux-2.5cifs
```
10 Jul, 2003 32 commits

[PATCH] show_stack changes for v850 · ccfd6724
Miles Bader authored Jul 10, 2003

[PATCH] More irqreturn_t changes for v850 · ab48a939
Miles Bader authored Jul 10, 2003

[PATCH] Use <asm-generic/statsfs.h> on v850 · 599cd887
Miles Bader authored Jul 10, 2003

[PATCH] disk stats accounting fix · b8ac7066

Jens Axboe authored Jul 10, 2003

We should only account file system requests, ones originating from
__make_request(). Otherwise it skews the counters and they go negative
really fast.
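The following is a minimal sketch of the invariant the fix restores. It uses hypothetical `struct request_sketch` and `struct disk_stats` types rather than the block layer's real ones. Whatever filter decides that a request is counted on submission must also be applied on completion. Otherwise the in-flight counter drifts and eventually goes negative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a request and the per-disk counters. */
struct request_sketch {
	bool is_fs_request;   /* true if it originated from __make_request() */
};

struct disk_stats {
	long in_flight;       /* requests currently outstanding */
	unsigned long ios;    /* completed, accounted requests */
};

/* Submission: only file system requests are accounted. */
static void account_start(struct disk_stats *ds, const struct request_sketch *rq)
{
	if (!rq->is_fs_request)
		return;
	ds->in_flight++;
}

/* Completion: must use the same filter as account_start(), or
 * in_flight is decremented for requests that were never counted. */
static void account_done(struct disk_stats *ds, const struct request_sketch *rq)
{
	if (!rq->is_fs_request)
		return;
	assert(ds->in_flight > 0);
	ds->in_flight--;
	ds->ios++;
}
```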

[PATCH] epoll-per-fd fix · 9cbdaa44

Andrew Morton authored Jul 10, 2003

From: Davide Libenzi <davidel@xmailserver.org>

Fix epoll to allow adding multiple file descriptors that share the same
kernel file*.
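Below is a small user-space check of the behaviour this fix enables. It assumes a current Linux system, since it uses `epoll_create1()`, which is newer than this commit. A pipe's read end and a `dup()` of it are two descriptors sharing one kernel file*, and both are registered with the same epoll instance. According to the commit text, the second registration was not allowed before this fix. The helper name `add_fd_and_its_dup()` is illustrative.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register a pipe's read end and a dup() of it (two descriptors that
 * share one kernel struct file) with the same epoll instance.
 * Returns 0 if both EPOLL_CTL_ADD calls succeed, -1 otherwise. */
static int add_fd_and_its_dup(void)
{
	int p[2];
	int ret = -1;

	if (pipe(p) != 0)
		return -1;

	int dupfd = dup(p[0]);          /* same file*, different descriptor */
	int ep = epoll_create1(0);

	if (dupfd >= 0 && ep >= 0) {
		struct epoll_event ev = { .events = EPOLLIN };

		ev.data.fd = p[0];
		int r1 = epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev);
		ev.data.fd = dupfd;
		int r2 = epoll_ctl(ep, EPOLL_CTL_ADD, dupfd, &ev);
		ret = (r1 == 0 && r2 == 0) ? 0 : -1;
	}

	if (ep >= 0)
		close(ep);
	if (dupfd >= 0)
		close(dupfd);
	close(p[0]);
	close(p[1]);
	return ret;
}
```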

[PATCH] devfs deadlock fix · 1cf2ec10

Andrew Morton authored Jul 10, 2003

From: Andrey Borzenkov <arvidjaar@mail.ru>

I finally hit a painfully trivial way to reproduce another long-standing devfs
problem: a deadlock between devfs_lookup and devfs_d_revalidate_wait. When
devfs_lookup releases the directory i_sem, devfs_d_revalidate_wait grabs it
(this does not happen on every path) and goes to sleep waiting to be woken up.
Unfortunately, devfs_lookup attempts to re-acquire the directory i_sem before
ever waking it up...

To reproduce (2.5.74, UP or SMP does not matter; single-CPU system):

ls /dev/foo & rm -f /dev/foo &

or possibly in a loop, but then it easily fills up the process table. In my
case it hangs 100% reliably, on 2.5 or 2.4.

The current fix moves the re-acquisition of i_sem to after all
devfs_d_revalidate_wait waiters have been woken up. A much better fix would
be to ensure that ->d_revalidate is either always called under i_sem or always
called without it. But that means touching the very heart of the VFS, and I
do not dare to.

The fix has been tested on 2.4 (and is part of the unofficial Mandrake Club
kernel); I expected the same bug to be in 2.5, I was just too stupid to see
the way to reproduce it before.

[PATCH] devfs oops fix · 934acf6c

Andrew Morton authored Jul 10, 2003

From: Andrey Borzenkov <arvidjaar@mail.ru>

Doing concurrent lookups for the same name in devfs with devfsd and modules
enabled may result in stack corruption.

When devfs_lookup needs to call devfsd, it arranges for other lookups for the
same name to wait. It uses a local variable as the wait queue head. After
devfsd returns, devfs_lookup wakes up all waiters and returns. Unfortunately,
there is no guarantee that all waiters will actually get a chance to run and
clean up before devfs_lookup returns, so some of them attempt to access
already-freed storage on the stack.

It is trivial to trigger with an SMP kernel (I have a single-CPU system, if
that matters) by running

while true
do
  ls /dev/foo &
done

Without spinlock debugging, the system usually hung dead, with the reset
button as the only way out.

I was not able to reproduce it on 2.4 on a single-CPU system: in 2.4,
devfs_d_revalidate_wait does not attempt to remove itself from the wait queue,
so it appears to be safe.

The patch allocates the lookup struct from the heap and adds a reference
counter so that it is freed when no longer needed.
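The heap-allocated, reference-counted lookup structure the patch describes can be sketched in user-space C as follows. `struct lookup_wait` and its helpers are hypothetical names. The `done` flag stands in for the real wait queue, and C11 atomics replace whatever the kernel patch uses. The only point being made is that the last holder frees the object, not the first lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared state for one in-progress lookup. It lives on the
 * heap, so waiters that run late never touch a dead stack frame. */
struct lookup_wait {
	atomic_int refcount;
	int done;              /* stands in for the wait queue / wakeup */
};

static struct lookup_wait *lookup_wait_alloc(void)
{
	struct lookup_wait *lw = malloc(sizeof(*lw));

	if (lw) {
		atomic_init(&lw->refcount, 1);   /* reference held by the creator */
		lw->done = 0;
	}
	return lw;
}

/* A waiter takes a reference before it sleeps on the object. */
static struct lookup_wait *lookup_wait_get(struct lookup_wait *lw)
{
	atomic_fetch_add(&lw->refcount, 1);
	return lw;
}

/* Drop a reference; whoever drops the last one frees the object.
 * Returns 1 if the object was freed. */
static int lookup_wait_put(struct lookup_wait *lw)
{
	if (atomic_fetch_sub(&lw->refcount, 1) == 1) {
		free(lw);
		return 1;
	}
	return 0;
}
```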

[PATCH] Fix yenta-socket oops · e59d9afb

Andrew Morton authored Jul 10, 2003

From: Russell King <rmk@arm.linux.org.uk>

Interrupts can sometimes occur before the socket thread is started.

[PATCH] yenta-socket initialisation fix · 85cea662

Andrew Morton authored Jul 10, 2003

From: Daniel Ritz <daniel.ritz@gmx.ch>

init_socket() enables interrupts, and the interrupt handler does a wakeup.
Let's initialise that waitqueue head before turning on the interrupts.

[PATCH] oom killer fixes · 21f8b968

Andrew Morton authored Jul 10, 2003

From: William Lee Irwin III <wli@holomorphy.com>

There are reports of kernel threads being killed by the OOM killer.  We
think this is because the OOM killer tries to kill a task after it has
exited and set its ->mm to NULL.  The OOM killer will then try to kill all
other tasks which have a NULL ->mm.

Attempt to detect that case and fix it up.
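Here is a sketch of the failure mode and a guard against it, using hypothetical `struct task_sketch` and `struct mm_struct_sketch` types. Suppose the chosen victim has already exited and its ->mm is NULL. A "kill everything sharing this mm" loop then matches every kernel thread. The sketch simply refuses in that case; the actual patch's detection logic may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct mm_struct_sketch { unsigned long total_vm; };

struct task_sketch {
	int pid;
	struct mm_struct_sketch *mm;  /* NULL for kernel threads and exited tasks */
	int killed;
};

/* After a victim is chosen, kill every task sharing its mm. If the victim
 * has already exited (mm == NULL), "same mm" would match every kernel
 * thread, so do nothing instead. Returns the number of tasks killed. */
static int kill_tasks_sharing_mm(struct task_sketch *tasks, size_t n,
				 const struct task_sketch *victim)
{
	int killed = 0;

	assert(victim != NULL);
	if (victim->mm == NULL)
		return 0;          /* caller should pick another victim */

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].mm == victim->mm) {
			tasks[i].killed = 1;
			killed++;
		}
	}
	return killed;
}
```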

[PATCH] ext3: sync_fs() fix · af738c8a

Andrew Morton authored Jul 10, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>

fsync_super() calls ->sync_fs() just after ->write_super().  But
write_super() will start a commit.  In this case, ext3_sync_fs() will not
itself start a commit, and it hence forgets to wait on the commit which
ext3_write_super() started.

Fix that up by making journal_start_commit() return the transaction ID of
any currently-running transaction.
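The ordering problem and its fix can be modelled with a toy journal. This assumes hypothetical `struct journal_sketch`, `start_commit()` and `wait_commit()` in place of JBD's real API. The key behaviour is that sync_fs gets back the ID of a commit that write_super() already started, so it waits for that commit.

```c
#include <assert.h>

typedef unsigned int txn_id;   /* 0 means "none" in this sketch */

/* Hypothetical journal state, far simpler than JBD's journal_t. */
struct journal_sketch {
	txn_id running;      /* open transaction, or 0 */
	txn_id committing;   /* transaction whose commit is in flight, or 0 */
	txn_id committed;    /* last transaction known to be on disk */
};

/* Start a commit of the open transaction, if any, and report the ID of
 * whatever commit is now in flight, including one that someone else
 * (e.g. write_super()) already started. Returns 1 if *tid was set. */
static int start_commit(struct journal_sketch *j, txn_id *tid)
{
	if (j->running) {
		j->committing = j->running;
		j->running = 0;
	}
	if (j->committing) {
		*tid = j->committing;
		return 1;
	}
	return 0;
}

/* Stand-in for sleeping until the commit finishes: completes it here. */
static void wait_commit(struct journal_sketch *j, txn_id tid)
{
	assert(j->committing == tid);
	j->committed = tid;
	j->committing = 0;
}

/* sync_fs(): wait for the commit even if we did not start it ourselves. */
static void sync_fs_sketch(struct journal_sketch *j)
{
	txn_id tid;

	if (start_commit(j, &tid))
		wait_commit(j, tid);
}
```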

[PATCH] JBD: transaction buffer accounting fix · 4152cdfa

Andrew Morton authored Jul 10, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>

start_this_handle() takes t_outstanding_credits into account when calculating
log free space, but journal_next_log_block() also accounts for blocks being
logged.  Hence, blocks are accounted twice.  This effectively reduces the
amount of log space available to transactions and forces more commits.

Fix it by decrementing t_outstanding_credits each time we allocate a new
journal block.
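Below is a toy model of the double accounting. The hypothetical `struct log_space` fields stand in for the journal's free-block count and t_outstanding_credits. Each time a reserved block is actually written, the sketch removes it from both counts. That way the space offered to new handles does not shrink twice.

```c
#include <assert.h>

/* Hypothetical model of the log-space check in start_this_handle(). */
struct log_space {
	int free_blocks;          /* log blocks not yet written */
	int outstanding_credits;  /* t_outstanding_credits: reserved by handles */
};

/* Space a new handle may still reserve. */
static int available_space(const struct log_space *ls)
{
	return ls->free_blocks - ls->outstanding_credits;
}

/* journal_next_log_block(): a reserved block is now really written.
 * Dropping the credit here keeps the block from being charged both as
 * used log space and as an outstanding reservation. */
static void next_log_block(struct log_space *ls)
{
	assert(ls->free_blocks > 0 && ls->outstanding_credits > 0);
	ls->free_blocks--;
	ls->outstanding_credits--;
}
```

In the test, writing 4 of the 10 reserved blocks leaves 90 blocks available, the same as before. Decrementing only the free count would have reported 96 - 10 = 86.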

[PATCH] JBD: checkpointing optimisations · a2df663d

Andrew Morton authored Jul 10, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>

Some transaction checkpointing improvements for the JBD commit phase.  Decent
speedups:

creation of 500K files in a single dir (with htree, of course):
 before: 4m16.094s, 4m12.035s, 4m11.911s
 after:  1m41.364s, 1m43.461s, 1m45.189s

removal of 500K files in a single dir:
 before: 43m50.161s
 after:  38m45.510s


- Make __log_wait_for_space() recalculate the needed blocks because journal
  free space changes during commit

- Make log_do_checkpoint() start scanning from the oldest transaction

- Make log_do_checkpoint() stop scanning if a transaction gets dropped.
  The caller will reevaluate the transaction state and decide whether more
  space needs to be generated in the log.

  The effect of this is to smooth out the I/O patterns and avoid the huge
  stop-and-go which currently happens when forced checkpointing writes out,
  and waits upon, 3/4 of the journal's size worth of data.

[PATCH] nbd: make nbd and block layer agree about device and · 20c52ab8

Andrew Morton authored Jul 10, 2003

From: Paul Clements <Paul.Clements@SteelEye.com>

Ensure that nbd and the block layer agree about device block sizes and total
device sizes.

[PATCH] nbd: remove unneeded nbd_open/nbd_release and refcnt · 627c0412
Andrew Morton authored Jul 10, 2003
```
From: Paul Clements <Paul.Clements@SteelEye.com>

Remove the unneeded nbd_open and nbd_release functions.
```
[PATCH] NBD documentation update · f4c39f4b
Andrew Morton authored Jul 10, 2003
```
From: Paul Clements <Paul.Clements@SteelEye.com>

Modernise nbd.txt a bit.
```

[PATCH] nbd: cleanup PARANOIA usage & code · d7b92e1d

Andrew Morton authored Jul 10, 2003

From: Lou Langholtz <ldl@aros.net>

This fifth patch cleans up usage of the PARANOIA sanity checking macro and
code.  This patch modifies both drivers/block/nbd.c and
include/linux/nbd.h.  It's intended to be applied incrementally on top of
my fourth patch (4.1 really, if you count the memset addition as .1's worth),
which simply removed the unneeded blksize_bits field.  Again, I wanted to get
this smaller change out of the way before my next patch, which is much more
major.

[PATCH] nbd: initialise the embedded kobject · 4f9420c6
Andrew Morton authored Jul 10, 2003
```
From: Lou Langholtz <ldl@aros.net>

Fixes the NBD oopses which people have been reporting.
```

[PATCH] nbd: remove unneeded blksize_bits field · 49e57bfc

Andrew Morton authored Jul 10, 2003

From: Lou Langholtz <ldl@aros.net>

This fourth patch simply removes the blksize_bits field from the nbd_device
struct and driver implementation.  How this field made it into this driver
to begin with is a mystery (where was Al Viro when that patch was
submitted??).  :-)

This patch modifies both drivers/block/nbd.c and include/linux/nbd.h.
It's intended to be applied incrementally on top of my third patch (for
enhanced diagnostics support).

[PATCH] nbd: enhanced diagnostics support · 9c976399

Andrew Morton authored Jul 10, 2003

From: Lou Langholtz <ldl@aros.net>

This third patch (for enhancing diagnostics support) applies incrementally
after my last LKML'd patch (for cosmetic changes).  These changes introduce
configurable KERN_DEBUG level printk output for a variety of different
things that the driver does and provides the framework for enhanced future
debugging support as well.
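The following user-space sketch shows a run-time configurable debug printer in the spirit of the dprintk() facility described here. The category names, the `debugflags` mask, and the use of stderr in place of printk(KERN_DEBUG ...) are all assumptions made for illustration. They are not the nbd driver's actual definitions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug categories, selectable at run time. */
enum {
	DBG_IOCTL   = 0x1,
	DBG_REQUEST = 0x2,
	DBG_QUEUE   = 0x4,
};

static unsigned int debugflags = DBG_IOCTL;   /* hypothetical default */

/* Print only if one of the given categories is enabled.
 * Returns the number of characters written, or 0 if suppressed. */
static int dprintk(unsigned int flags, const char *fmt, ...)
{
	assert(fmt != NULL);
	if (!(debugflags & flags))
		return 0;

	va_list ap;
	va_start(ap, fmt);
	int n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}
```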

[PATCH] NBD: cosmetic cleanups · 52fa6e21

Andrew Morton authored Jul 10, 2003

From: Lou Langholtz <ldl@aros.net>

It's a helpful step toward being better able to identify code inefficiencies
and problems, particularly w.r.t. locking.  It also modifies some of the
output messages for greater consistency and better diagnostic support.

This second patch is a lead-in to the third patch, which will simply
introduce the dprintk() debugging facility that my jumbo patch originally
had.

Together, the cosmetics patch and the debugging enhancement patch will make
it easier to fix, or at least improve, the locking bugs/races in NBD (which
will likely make up the fourth patch in my envisioned roadmap).

[PATCH] fix for CPU scheduler load distribution · e0a3db1a

Andrew Morton authored Jul 10, 2003

From: Ingo Molnar <mingo@elte.hu>

It makes hot-balancing happen in the 'busy tick' case as well, which should
spread out processes more aggressively.

[PATCH] separate locking for vfsmounts · 91b79ba7

Andrew Morton authored Jul 10, 2003

From: Maneesh Soni <maneesh@in.ibm.com>

While path walking we call follow_mount or follow_down, which use
dcache_lock for serialisation.  vfsmount-related operations also use
dcache_lock for all updates. I think we can use a separate lock for
vfsmount-related work and thereby improve path walking.

The following two patches do this. The first one replaces dcache_lock
with a new vfsmount_lock in namespace.c. The lock is local to
namespace.c and is not required outside. The second patch uses RCU to
make lookup_mnt() lock-free. The patches are quite simple and
straightforward.

The lockmeter results show reduced contention on, and fewer acquisitions of,
dcache_lock while running dcachebench* on a 4-way SMP box:

```
    SPINLOCKS         HOLD            WAIT
    UTIL  CON    MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT SPIN RJECT  NAME

  baselkm-2569:
    20.7% 20.9%  0.5us( 146us)  2.9us( 144us)(0.81%)  31590840 79.1% 20.9%    0%  dcache_lock
  mntlkm-2569:
    14.3% 13.6%  0.4us( 170us)  2.9us( 187us)(0.42%)  23071746 86.4% 13.6%    0%  dcache_lock
```

We get more than 8% improvement on 4-way SMP and 44% improvement on 16-way
NUMAQ while running dcachebench*.

```
                 Average (usecs/iteration)   Std. Deviation
                 (lower is better)
4-way SMP
  2.5.69         15739.3                     470.90
  2.5.69-mnt     14459.6                     298.51

16-way NUMAQ
  2.5.69         120426.5                    363.78
  2.5.69-mnt     63225.8                     427.60
```

*dcachebench is a microbenchmark written by Bill Hartner and is available at
http://www-124.ibm.com/developerworks/opensource/linuxperf/dcachebench/dcachebench.html

vfsmount_lock.patch
-------------------
- Patch replacing dcache_lock with the new vfsmount_lock for all
  mount-related operations. This removes the need to take dcache_lock
  while doing follow_mount or follow_down operations in path walking.

I re-ran dcachebench with 2.5.70 as the base on a 16-way NUMAQ box.

```
                            Average (usecs/iteration)   Std. Deviation
                            (lower is better)
16-way NUMAQ
2.5.70                      120710.9                    230.67
 + vfsmount_lock.patch      65209.6                     242.97
    + lookup_mnt-rcu.patch  64042.3                     416.61
```

So the lock splitting alone (vfsmount_lock.patch) gives almost the same benefit.
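The lock splitting described in this commit can be sketched as a mount hash table guarded by its own lock. This is a user-space approximation with hypothetical names. A pthread mutex stands in for the kernel spinlock. The RCU-based lock-free lookup_mnt() from the second patch is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MOUNT_HASH_SIZE 64

/* Hypothetical mount record keyed by (parent mount, mountpoint dentry). */
struct mount_sketch {
	const void *parent;
	const void *mountpoint;
	struct mount_sketch *next;
};

/* A lock private to the mount table: crossing a mount point during a
 * path walk takes this lock instead of the global dcache lock. */
static pthread_mutex_t vfsmount_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mount_sketch *mount_hash[MOUNT_HASH_SIZE];

static size_t mount_hashfn(const void *parent, const void *mp)
{
	return (((uintptr_t)parent >> 4) ^ ((uintptr_t)mp >> 4)) % MOUNT_HASH_SIZE;
}

static void attach_mnt_sketch(struct mount_sketch *m)
{
	assert(m != NULL);
	size_t h = mount_hashfn(m->parent, m->mountpoint);

	pthread_mutex_lock(&vfsmount_lock);
	m->next = mount_hash[h];
	mount_hash[h] = m;
	pthread_mutex_unlock(&vfsmount_lock);
}

static struct mount_sketch *lookup_mnt_sketch(const void *parent, const void *mp)
{
	struct mount_sketch *found = NULL;
	size_t h = mount_hashfn(parent, mp);

	pthread_mutex_lock(&vfsmount_lock);
	for (struct mount_sketch *m = mount_hash[h]; m != NULL; m = m->next) {
		if (m->parent == parent && m->mountpoint == mp) {
			found = m;
			break;
		}
	}
	pthread_mutex_unlock(&vfsmount_lock);
	return found;
}
```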

[PATCH] Fix race condition between aio_complete and · 679c40a8

Andrew Morton authored Jul 10, 2003

From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>

We hit a memory ordering race condition on the AIO ring buffer tail pointer
between aio_complete() and aio_read_evt().

On an architecture with a relaxed memory ordering model, such as IPF (ia64),
an explicit memory barrier is required in an SMP execution environment.
Consider the following case:

One CPU is executing a tight loop of aio_read_evt(), pulling events off the
ring buffer. During that loop, another CPU is executing aio_complete(),
where it puts an event into the ring buffer and then updates the tail
pointer. However, due to the relaxed memory ordering model, the new tail
pointer can become visible before the event itself has been updated. So the
reading CPU sees the updated tail pointer but picks up stale event data.

A memory barrier is required in this case between the event data update and
the tail pointer update. The same is true for the head pointer, but there the
window of the race condition is nil. For functional correctness, it is fixed
here as well.

By the way, this bug has been fixed in the major distributors' 2.4.x kernels
for a while, but somehow hasn't been propagated to the 2.5 kernel yet.
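Below is a single-producer/single-consumer ring that shows the required ordering. The kernel fix inserts explicit memory barriers. This sketch substitutes C11 release/acquire atomics, which give the same guarantee in portable user-space C. The event slot is written before the tail store becomes visible, and the reader reads the slot only after observing the new tail. `struct ring` and the ring functions are hypothetical, not the fs/aio.c code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u   /* hypothetical; a power of two */

struct io_event_s { long data; long res; };   /* simplified event */

struct ring {
	struct io_event_s events[RING_SIZE];
	atomic_uint head;   /* next slot the consumer reads */
	atomic_uint tail;   /* next slot the producer fills */
};

/* Producer side (the role of aio_complete()). */
static bool ring_put(struct ring *r, struct io_event_s ev)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RING_SIZE)
		return false;                        /* ring full */
	r->events[tail % RING_SIZE] = ev;            /* 1: write the event */
	atomic_store_explicit(&r->tail, tail + 1,    /* 2: then publish it */
			      memory_order_release);
	return true;
}

/* Consumer side (the role of aio_read_evt()). */
static bool ring_get(struct ring *r, struct io_event_s *out)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return false;                        /* ring empty */
	*out = r->events[head % RING_SIZE];          /* ordered after the tail load */
	atomic_store_explicit(&r->head, head + 1,    /* hand the slot back */
			      memory_order_release);
	return true;
}
```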

[PATCH] Bug fix in AIO initialization · b1648ead

Andrew Morton authored Jul 10, 2003

From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>

We hit this bug when we have the following scenario:

One process initializes an AIO context and then forks out many child
processes.  When those child processes exit, many BUG checks
(effectively kernel oops) were triggered from put_ioctx(ctx) in function
exit_aio().

The issue was that the AIO context was incorrectly copied upon forking,
which misled all child processes into thinking they had an IO context and
trying to free one they did not own.  The following patch fixes the
issue.
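One way to express the fix is shown below. The child's memory descriptor starts as a copy of the parent's, but the list of AIO contexts is reset rather than inherited. As a result, exit_aio() in the child has nothing to release. `struct mm_sketch` and `dup_mm_sketch()` are hypothetical simplifications, not the kernel/fork.c code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct kioctx;                    /* opaque AIO context */

/* Hypothetical, simplified per-process memory descriptor. */
struct mm_sketch {
	unsigned long total_vm;
	struct kioctx *ioctx_list;    /* AIO contexts owned by this mm */
};

/* fork(): the child starts as a copy of the parent's descriptor but must
 * not inherit the parent's AIO contexts, or its exit path would try to
 * release contexts it never owned. */
static struct mm_sketch *dup_mm_sketch(const struct mm_sketch *parent)
{
	assert(parent != NULL);
	struct mm_sketch *child = malloc(sizeof(*child));

	if (!child)
		return NULL;
	memcpy(child, parent, sizeof(*child));
	child->ioctx_list = NULL;     /* reset instead of copying */
	return child;
}
```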

[PATCH] Set umask correctly for nfsd kernel threads · b14241c4

Andrew Morton authored Jul 10, 2003

From: Andreas Gruenbacher <agruen@suse.de>

Without ACLs, the umask is applied directly in the VFS when creating files.
ACLs require that the umask is applied at the file system level, depending on
whether or not the containing directory has a default ACL. The daemonize()
function makes kernel threads share their fs_struct structure with the init
process. Among other things, fs_struct contains the umask, so all kernel
threads share their umask with init.

The kernel nfsd needs to create files with a umask of 0. Init's umask cannot
simply be changed to 0: this would have side effects on init, and init
would have side effects on nfsd. So this patch recreates an fs_struct
structure for nfsd kernel threads and sets its umask to 0.
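The effect of a umask on a new file's permission bits is a simple mask operation, sketched below with a hypothetical helper. A umask of 022, a common value assumed here for init, strips the group and other write bits. A umask of 0 lets the requested mode pass through unchanged, which is what nfsd needs.

```c
#include <assert.h>
#include <sys/types.h>

/* Permission bits a newly created file ends up with: the requested mode
 * with every bit that is set in the umask cleared. */
static mode_t apply_umask(mode_t requested, mode_t umask_bits)
{
	return requested & ~umask_bits & 07777;
}
```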

This fixes bug #721, <http://www.osdl.net/show_bug.cgi?id=721>.

[PATCH] misc fixes · ecbaa730

Andrew Morton authored Jul 10, 2003

- remove accidental debug code from ext3 commit.

- /proc/profile documentation fix (Randy Dunlap)

- use sb_breadahead() in ext2_preread_inode()

- unused var in mpage_writepages()

[PATCH] make CONFIG_KALLSYMS default to "on" · f3eee922

Andrew Morton authored Jul 10, 2003

From: Diego Calleja Garcia <diegocg@teleline.es>

Move CONFIG_KALLSYMS out of the arch directory and into init/.

It defaults to "on" unless the user explicitly turns it off in the
"embedded systems" menu.

[PATCH] kmap() -> kmap_atomic() in fs/exec.c · 9f1ed86f
Andrew Morton authored Jul 10, 2003
```
replace a kmap() with kmap_atomic()
```

[PATCH] i_size atomic access · eafe5916

Andrew Morton authored Jul 10, 2003

From: Daniel McNeil <daniel@osdl.org>

This adds i_seqcount to the inode structure and then uses i_size_read() and
i_size_write() to provide atomic access to i_size.  This is a port of
Andrea Arcangeli's i_size atomic access patch from 2.4.  This only uses the
generic reader/writer consistent mechanism.

Before:
```
mnm:/usr/src/25> size vmlinux
   text    data     bss     dec     hex filename
2229582 1027683  162436 3419701  342e35 vmlinux
```

After:
```
mnm:/usr/src/25> size vmlinux
   text    data     bss     dec     hex filename
2225642 1027655  162436 3415733  341eb5 vmlinux
```

3.9k more text, a lot of it fastpath :(

It's a very minor bug, and the fix has a fairly non-minor cost.  The most
compelling reason for fixing this is that writepage() checks i_size.  If it
sees a transient value it may decide that the page is outside i_size and will
refuse to write it.  Lost user data.
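The following sketches the sequence-counter technique behind i_size_read() and i_size_write(). It is written with C11 atomics and fences rather than the kernel's seqlock.h helpers. The 64-bit size is stored as two 32-bit halves to mimic a 32-bit machine, where a plain 64-bit load can tear. `struct inode_sketch` and the `_sketch` functions are hypothetical. As in the kernel, writers must be serialised by some other lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct inode_sketch {
	atomic_uint seq;            /* even: stable, odd: write in progress */
	_Atomic uint32_t size_lo;   /* low 32 bits of the size */
	_Atomic uint32_t size_hi;   /* high 32 bits of the size */
};

/* Writer: make the counter odd, store both halves, make it even again. */
static void i_size_write_sketch(struct inode_sketch *inode, uint64_t size)
{
	unsigned int s = atomic_load_explicit(&inode->seq, memory_order_relaxed);

	atomic_store_explicit(&inode->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&inode->size_lo, (uint32_t)size, memory_order_relaxed);
	atomic_store_explicit(&inode->size_hi, (uint32_t)(size >> 32),
			      memory_order_relaxed);
	atomic_store_explicit(&inode->seq, s + 2, memory_order_release);
}

/* Reader: retry until both halves were read under one even, unchanged
 * counter value, so a torn (half old, half new) size is never returned. */
static uint64_t i_size_read_sketch(struct inode_sketch *inode)
{
	unsigned int s1, s2;
	uint32_t lo, hi;

	do {
		s1 = atomic_load_explicit(&inode->seq, memory_order_acquire);
		lo = atomic_load_explicit(&inode->size_lo, memory_order_relaxed);
		hi = atomic_load_explicit(&inode->size_hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&inode->seq, memory_order_relaxed);
	} while ((s1 & 1u) || s1 != s2);

	return ((uint64_t)hi << 32) | lo;
}
```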

[PATCH] i_size atomic access: infrastructure · e9b94f6a

Andrew Morton authored Jul 10, 2003

From: Daniel McNeil <daniel@osdl.org>

This adds a sequence-counter-only version of the reader/writer consistent
mechanism to seqlock.h. This is used in the second part of this patch to give
atomic access to i_size.

[PATCH] wall_to_monotonic initialization fixes for · 1ac38088

Andrew Morton authored Jul 10, 2003

From: Tim Schmielau <tim@physik3.uni-rostock.de>

This patch adds (or fixes) initialization of wall_to_monotonic for a few
more architectures.

This should get rid of the strange uptime>14600 days reports, except on ARM,
whose arch file layout is too unfamiliar to me.

The patch is blessed by George Anzinger, but untested due to lack of
hardware.
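Here is a simplified model of what this initialization is for. It assumes the monotonic clock is computed as wall time plus wall_to_monotonic. Setting the offset to minus the boot-time wall clock makes uptime start near zero. An uninitialised (zero) offset would instead make "uptime" equal the full wall-clock time since the epoch. `struct ts` and the helpers are hypothetical.

```c
#include <assert.h>

/* Simplified timespec-like value. */
struct ts { long sec; long nsec; };

/* Normalise so that 0 <= nsec < 1e9. */
static struct ts ts_norm(long sec, long nsec)
{
	while (nsec < 0) { nsec += 1000000000L; sec--; }
	while (nsec >= 1000000000L) { nsec -= 1000000000L; sec++; }
	return (struct ts){ sec, nsec };
}

/* At boot: wall_to_monotonic = -xtime, so monotonic time starts at ~0. */
static struct ts init_wall_to_monotonic(struct ts xtime)
{
	assert(xtime.nsec >= 0 && xtime.nsec < 1000000000L);
	return ts_norm(-xtime.sec, -xtime.nsec);
}

/* Monotonic time (and hence uptime) = wall time + wall_to_monotonic. */
static struct ts monotonic(struct ts xtime, struct ts wtm)
{
	return ts_norm(xtime.sec + wtm.sec, xtime.nsec + wtm.nsec);
}
```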
