Commits · a51ae74343bbf3dcec319a2637bf56c4b45ea082 · Kirill Smelkov / linux

01 Aug, 2003 40 commits

Andrew Morton authored Jul 31, 2003

From: Adrian Bunk <bunk@fs.tum.de>

Fix ACPI compile error if doing processor probing only: acpi_find_bmc is
only available #ifdef CONFIG_ACPI_INTERPRETER.

a51ae743

[PATCH] export agp_memory_reserved · 4c0c1725

Andrew Morton authored Jul 31, 2003

nvidia-agp needs it.

4c0c1725

[PATCH] i810fb oops fix · 3dd9a2ff

Andrew Morton authored Jul 31, 2003

The module device table is not NULL-terminated, so we run off the end during
probing and oops.

Also, move all those static decls out of .h and into .c
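
The NULL-termination problem described above can be illustrated with a minimal
userspace sketch.  The struct, the table contents and the demo_match() helper
are hypothetical stand-ins, not the i810fb code: the point is only that a
table walker which stops at an all-zero sentinel runs off the end of the array
when that sentinel is missing.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a device ID table entry. */
struct demo_device_id {
	unsigned int vendor;
	unsigned int device;
};

/* Illustrative values only.  The final all-zero entry is the terminator. */
static const struct demo_device_id demo_ids[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
	{ 0, 0 },	/* sentinel: without it the loop below never stops */
};

/* Walk the table until the sentinel; return the match or NULL. */
static const struct demo_device_id *
demo_match(const struct demo_device_id *ids,
	   unsigned int vendor, unsigned int device)
{
	for (; ids->vendor != 0; ids++) {
		if (ids->vendor == vendor && ids->device == device)
			return ids;
	}
	return NULL;
}
```

A lookup for an ID that is not in the table only returns NULL because the
loop eventually reaches the zeroed entry.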

3dd9a2ff

[PATCH] quota typo fix · efc0596c

Andrew Morton authored Jul 31, 2003

From: Herbert Potzl <herbert@13thfloor.at>

quota.h typo fix

efc0596c

[PATCH] Fix dac960 for devfs · 6a740541

Andrew Morton authored Jul 31, 2003

From: Dave Olien <dmo@osdl.org>

It wasn't initializing the devfs_name member of the gendisk structures to
contain the root name of the logical disk.

6a740541

[PATCH] binfmt_script argv[0] fix · 867c7a0a

Andrew Morton authored Jul 31, 2003

From: Arun Sharma <arun.sharma@intel.com>

A script such as

  #!/bin/foo.bar
  ...

where /bin/foo.bar is handled by binfmt_misc, is not handled correctly, i.e.
the interpreter of foo.bar doesn't receive the correct arguments.

The binfmt_misc handler requires that bprm->filename is appropriately
filled so that argv[1] can be correctly passed to the interpreter.

However, binfmt_script, as it exists today, doesn't populate bprm->filename
correctly.

Another motivation for this patch is the output of ps.  Emulators which use
binfmt_misc may want to keep the output of ps consistent with native
execution.  This requires preserving bprm->filename.  The attached patch
guarantees this even if we have to go through several binfmt handlers
(think of finite loops involving binfmt_script and binfmt_misc).

867c7a0a

[PATCH] pc300_drv build fix · f4f980fb

Andrew Morton authored Jul 31, 2003

From: Adrian Bunk <bunk@fs.tum.de>

Fix compile error introduced by the HDLC update.

f4f980fb

[PATCH] com20020_cs.c doesn't compile · 10aeaa6e

Andrew Morton authored Jul 31, 2003

From: Adrian Bunk <bunk@fs.tum.de>

drivers/net/pcmcia/com20020_cs.c wasn't updated for the module owner
field changes.

10aeaa6e

[PATCH] uidhash init-time locking · 47f7edc1

Andrew Morton authored Jul 31, 2003

From: <ffrederick@prov-liege.be>

Add the necessary locking around uid_hash_insert() in uid_cache_init().

(It's an initcall, and the chances of another CPU racing with us here are
basically zero.  But it's good for documentary purposes and the code gets
dropped later anyway...)

47f7edc1

[PATCH] Move the special_file() definition · cbc8594c

Andrew Morton authored Jul 31, 2003

From: <ffrederick@prov-liege.be>

The special_file() macro is being duplicated in JFS.  Move it to fs.h.

cbc8594c

[PATCH] fix read_dir() · 7a6fbd69

Andrew Morton authored Jul 31, 2003

This function tries to allocate increasingly large buffers, but it gets the
bounds wrong by a factor of PAGE_SIZE.  It causes boot-time devfs mounting
to fail.

7a6fbd69

[PATCH] ext3: fix commit assertion failure · b84ee08e

Andrew Morton authored Jul 31, 2003

We're getting assertion failures in commit in data=journal mode.

journal_unmap_buffer() has unexpectedly donated this buffer to the committing
transaction, and the commit-time assertion doesn't expect that to happen. It
doesn't happen in 2.4 because both paths are under lock_journal().

Simply remove the assertion: the commit code will uncheckpoint the buffer and
then recheckpoint it if needed.

b84ee08e

[PATCH] fix ip_conntrack_core.h compile error · 01b1174f

Andrew Morton authored Jul 31, 2003

From: Felipe Alfaro Solana <felipe_alfaro@linuxmail.org>

Fix compile error in 2.6.0-test2 when Netfilter IP connection tracking
is enabled.

01b1174f

[PATCH] fix select() with an xoffed tty · 791c4b04

Andrew Morton authored Jul 31, 2003

From: Manfred Spraul <manfred@colorfullife.com>

Eli Barzilay noticed that select() for tty devices is broken: For
stopped tty devices, select says POLLOUT and write fails with -EAGAIN.

    http://marc.theaimsgroup.com/?l=linux-kernel&m=105902461110282&w=2

I've tracked this back to normal_poll in drivers/char/n_tty.c:

 > if (tty->driver->chars_in_buffer(tty) < WAKEUP_CHARS)
 >                mask |= POLLOUT | POLLWRNORM;

It assumes that a following write will succeed if less than 256 bytes
are in the write buffer right now. This assumption is wrong for
con_write_room: if the console is stopped, it returns 0 bytes buffer
size (con_write_room()). Ditto for pty_write_room.
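
A minimal sketch of the mismatch, assuming a simplified tty model: the struct,
the DEMO_ constants and both functions are hypothetical, and the second
function is one plausible way to close the gap, not necessarily what the patch
does.

```c
#define DEMO_POLLOUT		0x004
#define DEMO_WAKEUP_CHARS	256

/* Hypothetical, simplified view of the driver state that poll looks at. */
struct demo_tty {
	int chars_in_buffer;	/* bytes queued for output */
	int write_room;		/* bytes a write would accept; 0 if stopped */
};

/* Old behaviour: only the amount of queued output is considered. */
static unsigned int demo_poll_old(const struct demo_tty *t)
{
	unsigned int mask = 0;

	if (t->chars_in_buffer < DEMO_WAKEUP_CHARS)
		mask |= DEMO_POLLOUT;
	return mask;
}

/* Sketch of a check that also requires the driver to accept data. */
static unsigned int demo_poll_checked(const struct demo_tty *t)
{
	unsigned int mask = 0;

	if (t->chars_in_buffer < DEMO_WAKEUP_CHARS && t->write_room > 0)
		mask |= DEMO_POLLOUT;
	return mask;
}
```

For a stopped tty with an empty buffer, demo_poll_old() reports writability
even though a write would accept nothing, while demo_poll_checked() does not.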

791c4b04

[PATCH] Fix ipt_helper compilation · a74ccc26

Andrew Morton authored Jul 31, 2003

From: florin@iucha.net (Florin Iucha)

Fix compilation of net/ipv4/netfilter/ipt_helper.c by including the
proper header files.

a74ccc26

[PATCH] direct-io support for XFS unwritten extents · 359a5de1

Andrew Morton authored Jul 31, 2003

From: Nathan Scott <nathans@sgi.com>

This patch adds a mechanism by which a filesystem can register an interest in
the completion of direct I/O. The completion routine will be given the
inode, an offset and a length, and an optional filesystem-private field.

We have extended the use of the buffer_head-based interface (i.e.
get_block_t) for direct I/O such that the b_private field is now utilised.
It is defined to be initially zero at the start of I/O, and will be passed
into the filesystem unmodified by the VFS with each map request, while
setting up the direct I/O. Once I/O has completed the final value of this
pointer will be passed into a filesystem's I/O completion handler. This
mechanism can be used to keep track of all of the mapping requests which
encompass an individual direct I/O request.

This has been implemented specifically for XFS, but is done so as to be as
generic as possible. XFS uses this mechanism to provide support for
unwritten extents - these are file extents which have been pre-allocated
on-disk, but not yet written to (once written, these become regular file
extents, but only once I/O is complete).

359a5de1

[PATCH] vmscan: use zone_pressure for page unmapping · 14d927a3

Andrew Morton authored Jul 31, 2003

From: Nikita Danilov <Nikita@Namesys.COM>

Use zone->pressure (rather than scanning priority) to determine when to
start reclaiming mapped pages in refill_inactive_zone(). When using
priority every call to try_to_free_pages() starts with scanning parts of
active list and skipping mapped pages (because reclaim_mapped evaluates to
0 on low priorities) no matter how high memory pressure is.

14d927a3

[PATCH] vmscan: decaying average of zone pressure · ecbeb4b2

Andrew Morton authored Jul 31, 2003

From: Nikita Danilov <Nikita@Namesys.COM>

The vmscan logic at present will scan the inactive list with increasing
priority until a threshold is triggered.  At that threshold we start
unmapping pages from pagetables.

The problem is that each time someone calls into this code, the priority is
initially low, so some mapped pages will be refiled even though we really
should be unmapping them now.

Nikita's patch adds the `pressure' field to struct zone.  It is a decaying
average of the zone's memory pressure and allows us to start unmapping pages
immediately on entry to page reclaim, based on measurements which were made
in earlier reclaim attempts.
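
As a rough illustration of a decaying average, assuming a simple "halve the
history on every sample" rule: the function names, the decay factor and the
threshold parameter are all assumptions made for this sketch, not the values
used by the patch.

```c
/*
 * Hypothetical decaying average: each new sample counts for half,
 * and the accumulated history for the other half.
 */
static unsigned int demo_decay(unsigned int pressure, unsigned int sample)
{
	return (pressure + sample) / 2;
}

/*
 * Decide on entry to reclaim whether mapped pages should be unmapped,
 * based on history rather than on the (initially low) scan priority.
 */
static int demo_reclaim_mapped(unsigned int pressure, unsigned int threshold)
{
	return pressure >= threshold;
}
```

Because the value carries over between calls, a zone that was under heavy
pressure in recent reclaim attempts crosses the threshold immediately on the
next entry, and the value drifts back down once the samples drop.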

ecbeb4b2

[PATCH] fix kswapd throttling · 00401a44

Andrew Morton authored Jul 31, 2003

kswapd currently takes a throttling nap even if it freed all the pages it
was asked to free.

Change it so we only throttle if reclaim is not being sufficiently
successful.

00401a44

[PATCH] use mark_page_accessed() in the write() path · ed8ff7a4

Andrew Morton authored Jul 31, 2003

We're currently just setting the referenced bit when modifying pagecache in
write().

Consequently overwritten (and redirtied) pages are remaining on the inactive
list. The net result is that a lot of dirty pages are reaching the tail of
the LRU in page reclaim and are getting written via the writepage() in there.

But a core design objective is to minimise the amount of IO via that path,
and to maximise the amount of IO via balance_dirty_pages(). Because the
latter has better IO patterns.

This may explain the bad IO patterns which Gerrit talked about at KS.

ed8ff7a4

[PATCH] devfs_lookup stack corruption fix rework · 9f49f9f3

Andrew Morton authored Jul 31, 2003

From: Andrey Borzenkov <arvidjaar@mail.ru>

A while back Andrey fixed a devfs bug in which we were running
remove_wait_queue() against a wait_queue_head which was on another process's
stack, and which had gone out of scope.

The patch reverts that fix and does it the same way as 2.4: just leave the
waitqueue struct dangling on the waitqueue_head: there is no need to touch it
at all.

It adds a big comment explaining why we are doing this nasty thing.

9f49f9f3

[PATCH] 6PACK assumes HZ=100 · 74b0ab1b

Andrew Morton authored Jul 31, 2003

From: Hans-Joachim Hetscher <me@privacy.net>

The Hamradio 6pack driver wasn't modified to work with the 1000 HZ
internal kernel timebase.
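
To see why a hard-coded tick count breaks when the timebase changes, here is
a small sketch.  demo_ms_to_ticks() is a hypothetical helper, not a kernel or
driver function: a delay written as a bare "10 ticks" means 100 ms at HZ=100
but only 10 ms at HZ=1000, whereas a delay expressed in milliseconds and
scaled by the tick rate stays the same.

```c
/* Convert a delay in milliseconds to timer ticks for a given tick rate. */
static unsigned long demo_ms_to_ticks(unsigned long ms, unsigned long hz)
{
	return (ms * hz) / 1000;
}
```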

74b0ab1b

[PATCH] soundcard.c devfs fix · 6107a647

Andrew Morton authored Jul 31, 2003

It is using "snd".  It should be using "sound".

6107a647

[PATCH] fix alloc_bootmem_low_pages · 7821370f

Andrew Morton authored Jul 31, 2003

From: jbarnes@sgi.com (Jesse Barnes)

This patch is needed for some discontig boxes since the memory maps may
be built out-of-order.

7821370f

[PATCH] ext3: don't start a commit in write_super() · 9f280843

Andrew Morton authored Jul 31, 2003

From: bzzz@tmi.comex.ru

Now we have sync_fs(), the kludge of using write_super() to detect when the
VFS is trying to sync the fs is unneeded.

With this change we don't accidentally run commits in response to kupdate
and bdflush activity and it speeds up some heavy workloads significantly.

9f280843

[PATCH] Fix race in ext3_getblk · 77b070cb

Andrew Morton authored Jul 31, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>

ext3_getblk() memsets a newly allocated buffer, but forgets to check
whether a different thread brought it uptodate while we waited for the
buffer lock.

It's OK normally because we're serialised by the page lock.  But lustre
apparently is doing something different with getblk and hits this race.

Plus I suspect it's racy with competing O_DIRECT writes.
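
The shape of the fix can be sketched in userspace as follows.  The struct and
function are hypothetical and the locking is only indicated by comments: the
point is that the uptodate state must be re-checked after the lock has been
taken, before the buffer is zeroed.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified buffer: a flag plus some data. */
struct demo_buffer {
	bool uptodate;
	char data[16];
};

/* Zero a newly allocated buffer unless someone else already filled it. */
static void demo_init_new_buffer(struct demo_buffer *bh)
{
	/* lock_buffer() would go here; we may have slept waiting for it */
	if (!bh->uptodate) {
		memset(bh->data, 0, sizeof(bh->data));
		bh->uptodate = true;
	}
	/* unlock_buffer() would go here */
}
```

Without the check, a buffer brought uptodate by another thread while we
waited for the lock would have its contents wiped by the memset.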

77b070cb

[PATCH] ext3: avoid reading empty inode blocks · bca17d03

Andrew Morton authored Jul 31, 2003

From: Alex Tomas <bzzz@tmi.comex.ru>

ext3_get_inode_loc() reads the inode's block only if:

  1) this inode has no copy in memory
  2) the inode's block contains other valid inode(s)

This optimization avoids needless I/O in two cases:

1) a just-allocated inode is the first valid inode in its block

2) the kernel wants to write an inode, but the buffer the inode
   belongs to has been freed by the VM

bca17d03

[PATCH] rework readahead for congested queues · 12affe8f

Andrew Morton authored Jul 31, 2003

Since Jens changed the block layer to fail readahead if the queue has no
requests free, a few changes suggest themselves.

- It's a bit silly to go and allocate a bunch of pages, build BIOs for them,
  submit the IO only to have it fail, forcing us to free the pages again.

  So the patch changes do_page_cache_readahead() to peek at the queue's
  read_congested state.  If the queue is read-congested we abandon the entire
  readahead up-front without doing all that work.

- If the queue is not read-congested, we go ahead and do the readahead,
  after having set PF_READAHEAD.

  The backing_dev_info's read-congested threshold cuts in when 7/8ths of
  the queue's requests are in flight, so it is probable that the readahead
  abandonment code in __make_request will now almost never trigger.

- The above changes make do_page_cache_readahead() "unreliable", in that it
  may do nothing at all.

  However there are some system calls:

	- fadvise(POSIX_FADV_WILLNEED)
	- madvise(MADV_WILLNEED)
	- sys_readahead()

  In which the user has an expectation that the kernel will actually
  perform the IO.

  So the patch creates a new "force_page_cache_readahead()" which will
  perform the IO regardless of the queue's congestion state.

  Arguably, this is the wrong thing to do: even though the application
  requested readahead it could be that the kernel _should_ abandon the user's
  request because the disk is so busy.

  I don't know.  But for now, let's keep the above syscalls' behaviour
  unchanged.  It is trivial to switch back to do_page_cache_readahead()
  later.
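
The split between the two entry points described above can be sketched like
this.  Everything prefixed demo_ is hypothetical, and demo_issue_reads()
merely stands in for allocating pages and submitting the IO.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the queue's congestion state. */
struct demo_queue {
	bool read_congested;
};

/* Stand-in for "allocate pages, build BIOs, submit": returns pages read. */
static int demo_issue_reads(int nr_pages)
{
	return nr_pages;
}

/* Opportunistic readahead: give up before doing any work if congested. */
static int demo_do_readahead(const struct demo_queue *q, int nr_pages)
{
	if (q->read_congested)
		return 0;
	return demo_issue_reads(nr_pages);
}

/* Forced readahead for explicit user requests: ignore congestion. */
static int demo_force_readahead(const struct demo_queue *q, int nr_pages)
{
	(void)q;	/* congestion state deliberately not consulted */
	return demo_issue_reads(nr_pages);
}
```

Switching the explicit-request paths back to the opportunistic behaviour
would then be a one-line change in each caller.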

12affe8f

[PATCH] fix bogus IO error messages · d49ceaba

Andrew Morton authored Jul 31, 2003

Since Jens added the pagecache readahead support in the block layer we've
been getting bogus IO error messages from buffer.c due to __make_request
calling end_io against a non-uptodate buffer.

We can just use PF_READAHEAD to shut that up. But really, we shouldn't even
have allocated all those pages and submitted the readahead IO if the queue
was congested. We have the infrastructure to do that now.

d49ceaba

[PATCH] Fix vmtruncate race and distributed filesystem race · 3e63f0be

Andrew Morton authored Jul 31, 2003

From: Dave McCracken <dmccr@us.ibm.com>

This patch solves the race between truncate and page-in, which can cause stray
anon pages to appear in the truncated region.

The race occurs when a process is sleeping in pagein IO during the truncate:
there's a window after checking i_size in which the paging-in process decides
that the page was an OK one.

This leaves an anon page in the pagetables, and if the file is subsequently
extended we have an anon page floating about inside a file-backed mmap - user
modifications will not be written out.

Apparently this is also needed for the implementation of POSIX semantics for
distributed filesystems.

We use a generation counter in the address_space so the paging-in process can
determine whether there was a truncate which might have shot the new page
down.

It's a bit grubby to be playing with files and inodes in do_no_page(), but we
do need the page_table_lock coverage for this, and rearranging things to
provide that coverage to filemap_nopage wasn't very nice either.
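
A minimal sketch of the generation-counter idea, with hypothetical names and
no real locking: the fault path samples the counter before it sleeps in IO and
compares it again before installing the page.

```c
#include <stdbool.h>

/* Hypothetical, simplified address_space with a truncate generation. */
struct demo_mapping {
	unsigned int truncate_count;
	long size_pages;
};

/* Truncate bumps the generation so that in-flight faults can notice. */
static void demo_truncate(struct demo_mapping *m, long new_size_pages)
{
	m->truncate_count++;
	m->size_pages = new_size_pages;
}

/*
 * Called by the fault path after the page-in IO has completed, under
 * whatever lock also covers installing the page.  Returns true if no
 * truncate ran since 'seen' was sampled; false means retry the fault.
 */
static bool demo_fault_may_install(const struct demo_mapping *m,
				   unsigned int seen)
{
	return m->truncate_count == seen;
}
```

A fault that sampled the counter, slept in IO, and then finds the counter
changed knows its page may already have been shot down and must start over.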

3e63f0be

[PATCH] Interface to invalidate regions of mmaps · 5096494f

Andrew Morton authored Jul 31, 2003

From: "Paul E. McKenney" <paulmck@us.ibm.com>

The patch reworks and generalises vmtruncate_list() a bit to create an API
which invalidates a specified portion of an address_space, permitting
distributed filesystems to maintain POSIX semantics when a file mmap()ed on
one client is modified on another client.

5096494f

[PATCH] unlock_buffer() needs a barrier · 6e20adb2

Andrew Morton authored Jul 31, 2003

From: Chris Mason <mason@suse.com>

unlock_buffer() needs a barrier before the waitqueue_active() optimisation.

6e20adb2

[PATCH] kswapd can free too much memory · f76a4338

Andrew Morton authored Jul 31, 2003

We need to subtract the number of freed slab pages from the number of pages
to free, not add it.
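
In other words, pages already reclaimed from slab count towards the target.
A tiny sketch with a hypothetical helper name (the clamp at zero is an
assumption added here for safety, not something the patch is known to do):

```c
/* Pages still to be freed once slab reclaim has been accounted for. */
static int demo_pages_left(int to_free, int freed_slab)
{
	int left = to_free - freed_slab;	/* subtract, don't add */

	return left > 0 ? left : 0;
}
```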

f76a4338

[PATCH] non-MII 3c59x fix · 6f3a72d6

Andrew Morton authored Jul 31, 2003

From: Marc Zyngier <mzyngier@freesurf.fr>

The following patch tries to fix a small bug that crept in at some
point during 2.5.

None of my 3c592 or 3c597 would work if I didn't force media
type. Instead, it would try to probe MII, looking for a suitable
transceiver, and finally give up, because these cards really do not
have any sort of MII... :

  EISA: Probing bus 0 at Intel Corp. 82375EB
  EISA: Mainboard DEC5000 detected.
  EISA: slot 2 : ADP0001 detected.
  EISA: slot 3 : ADP7771 detected.
  EISA: slot 4 : DPTA401 detected.
  EISA: slot 5 : TCM5920 detected.
  3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
  00:05: 3Com EISA 3c592 EISA 10Mbps Demon/Vortex at 0x5000. Vers LK1.1.19
    ***WARNING*** No MII transceivers found!
  EISA: Detected 4 cards.

With the enclosed patch, it just works, at least on my setup (3c592 on
Alpha, and 3c597 on x86). I haven't been able to test it didn't break
cards with MII, because I do not have such cards in my test boxes...

The patch also removes two useless EISA-only #define I introduced some
time ago.

6f3a72d6

[PATCH] dev_t printing · 482a9473

Andrew Morton authored Jul 31, 2003

From: Greg KH <greg@kroah.com>

Different architectures use different types for dev_t, so it is hard to
print dev_t variables out correctly. Quite a lot of code is wrong now, and
will continue to be wrong when 64-bit dev_t is merged.

Greg's patch introduces a little wrapper function which can be used to
safely form a dev_t for printing. I added the format_dev_t function as
well, which is needed for direct insertion in a printk statement.
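
The general approach can be sketched in userspace as below.  The demo_
helpers are hypothetical and are not the functions added by the patch, and
the 8-bit major/minor split is an assumption made only to keep the example
concrete.  The idea is to widen the value to one fixed type before
formatting, so the same format string is correct on every architecture.

```c
#include <stdio.h>

/* Assumed encoding for this sketch only: minor in the low 8 bits. */
static unsigned int demo_major(unsigned long long dev)
{
	return (unsigned int)(dev >> 8);
}

static unsigned int demo_minor(unsigned long long dev)
{
	return (unsigned int)(dev & 0xff);
}

/*
 * Format a device number as "major:minor" into buf and return buf, so
 * that the call can be used directly as a printf-style argument.
 */
static char *demo_format_dev(char *buf, size_t len, unsigned long long dev)
{
	snprintf(buf, len, "%u:%u", demo_major(dev), demo_minor(dev));
	return buf;
}
```

Returning the buffer is what allows the helper to be dropped straight into
an argument list together with a plain %s conversion.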

482a9473

[PATCH] 3c59x suspend/resume fix · 81e99e7f

Andrew Morton authored Jul 31, 2003

Currently, all of the 3c59x power management code is disabled unless the
`enable_wol' module parameter is provided.  This was done because the PM
support was added quite late in the 2.4 cycle.

It was always intended that this conditionality be removed in 2.5.

81e99e7f

[PATCH] update to speedstep-centrino.c · 1a14aeea

Andrew Morton authored Jul 31, 2003

From: Jeremy Fitzhardinge <jeremy@goop.org>

The 900MHz Pentium M has two spaces before the frequency:
"Intel(R) Pentium(R) M processor  900MHz"

This patch adds a 2nd CPU macro (_CPU) which also takes the
stringified speed so that extra spacing can be added.
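
A sketch of the idea using string-literal concatenation.  DEMO_CPU_NAME is a
hypothetical macro, not the one in the driver, and the 1300 entry is purely
illustrative: passing the speed as a string lets the caller pad it.

```c
/* Build a model name from a speed that is passed as a string literal. */
#define DEMO_CPU_NAME(speed_str) \
	"Intel(R) Pentium(R) M processor " speed_str "MHz"

/* One space before the frequency. */
static const char demo_name_1300[] = DEMO_CPU_NAME("1300");

/* Leading space in the argument gives the two spaces the 900MHz part has. */
static const char demo_name_900[] = DEMO_CPU_NAME(" 900");
```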

1a14aeea

[PATCH] buffer.c debugging · e6238ac5

Andrew Morton authored Jul 31, 2003

We get a bug report about once per month wherein find_get_block_slow() spits
an error message.  For some reason we have buffers against a blockdev page
which have the incorrect b_size.

Probably, an earlier set_blocksize() failed to invalidate all the pages for
some reason.  I just don't know.

The patch adds a bit of extra debug info to aid in diagnosing this.

e6238ac5

[PATCH] re-slabify i386 pgd's and pmd's · 6beadb3b

Andrew Morton authored Jul 31, 2003

From: William Lee Irwin III <wli@holomorphy.com>

The original pgd/pmd slabification patches had a critical bug on
non-PAE where both modifications of pgd entries to remove pagetables
attached for non-PSE mappings back to a PSE state and modifications of
pgd entries to attach pagetables to bring PSE mappings into a non-PSE
state were not propagated to cached pgd's. PAE was immune to it owing
to the shared kernel pmd.

The following patch vs. 2.5.69 restores the slabification done to cache
preconstructed pagetables with the proper propagation of conversions
to and from PSE mappings to cached pgd's for the non-PAE case.

This is an optimization to reduce the bitblitting overhead for spawning
small tasks (for larger ones, bottom-level pagetable copies dominate)
primarily on non-PAE; the PAE code change is largely to remove #ifdefs
and to treat the two cases uniformly, though some positive but small
performance improvement has been observed for PAE in one of mbligh's
posts. The non-PAE performance improvement has been observed on a box
running a script-heavy end-user workload as a large long-term profile
hit count reduction for pgd_alloc() and relatives thereof.

I would very much appreciate outside testers. Even though I've been
able to verify this boots and runs properly and survives several cycles
of restarting X on my non-PAE Thinkpad T21, that environment has never
been able to reproduce the bug. Those with the proper graphics hardware
to prod the affected codepaths into action are the ones best suited to
verify proper functionality. There is also some locking introduced; if
some performance verification on non-PAE SMP i386 targets (my SMP
targets unfortunately all require PAE due to arch code dependencies)
that also have the proper hardware could be done, that would help
determine whether alternative locking schemes that competed against
the one shown here are preferable (in particular, the ticket-based
scheme mentioned in the comments).

6beadb3b

[PATCH] selinux merge · 7bbf0e05

Andrew Morton authored Jul 31, 2003

From: Stephen Smalley <sds@epoch.ncsc.mil>

This has been in -mm for a few weeks and James Morris has been
regression testing each release.

7bbf0e05