Commits · a27efcaff9ffd5ad05f4e111751da41a8820f7ab · nexedi / linux

05 Oct, 2002 20 commits

Andrew Morton authored Oct 04, 2002

The patch removes page->virtual for all architectures which do not
define WANT_PAGE_VIRTUAL.  Hash for it instead.

Possibly we could define WANT_PAGE_VIRTUAL for CONFIG_HIGHMEM4G, but it
seems unlikely.

A lot of the pressure went off kmap() and page_address() as a result of
the move to kmap_atomic().  That should be the preferred way to address
CPU load in the set_page_address() and page_address() hashing and
locking.

If kmap_atomic is not usable then the next best approach is for users
to cache the result of kmap() in a local rather than calling
page_address() repeatedly.
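Below is a minimal userspace sketch of the idea. It assumes a simple chained hash keyed by the struct page pointer. The names set_page_address() and page_address() follow the text above. The bodies, the bucket count and the struct page_address_map type are illustrative assumptions, not the kernel's own code. The locking whose cost is measured below is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct page: note there is no ->virtual member. */
struct page {
	int flags;
};

/* One page -> kernel virtual address association, chained per bucket. */
struct page_address_map {
	struct page *page;
	void *virtual;
	struct page_address_map *next;
};

#define PA_HASH_SIZE 128u   /* assumption: arbitrary bucket count */

static struct page_address_map *pa_hash[PA_HASH_SIZE];

/* Hash the page pointer itself (struct pages live in one big array). */
static unsigned int pa_hash_fn(const struct page *page)
{
	return (unsigned int)(((uintptr_t)page / sizeof(struct page)) % PA_HASH_SIZE);
}

/* Record (virtual != NULL) or forget (virtual == NULL) a page's address.
   Returns 0 on success, -1 if memory for a new entry is unavailable. */
static int set_page_address(struct page *page, void *virtual)
{
	struct page_address_map **pp = &pa_hash[pa_hash_fn(page)];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->page != page)
			continue;
		if (virtual) {
			(*pp)->virtual = virtual;      /* update the existing entry */
		} else {
			struct page_address_map *dead = *pp;
			*pp = dead->next;              /* unlink and free it */
			free(dead);
		}
		return 0;
	}
	if (!virtual)
		return 0;                              /* nothing to forget */

	struct page_address_map *m = malloc(sizeof(*m));
	if (!m)
		return -1;
	m->page = page;
	m->virtual = virtual;
	m->next = NULL;
	*pp = m;                                       /* append at the chain tail */
	return 0;
}

/* Walk the page's bucket; NULL means "no address recorded". */
static void *page_address(const struct page *page)
{
	for (struct page_address_map *m = pa_hash[pa_hash_fn(page)]; m; m = m->next)
		if (m->page == page)
			return m->virtual;
	return NULL;
}
```

Every page_address() call pays for a bucket walk, which is why a caller that needs the address repeatedly is better off keeping the result in a local.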

One heavy user of kmap() and page_address() is the ext2 directory code.

On a 7G Quad PIII, running four concurrent instances of

	while true
	do
		find /usr/src/linux > /dev/null
	done

on ext2 with everything cached, profiling shows that the new hashed
set_page_address() and page_address() implementations consume 0.4% and
1.3% of CPU time respectively.   I think that's OK.

a27efcaf

[PATCH] use buffer_boundary() for writeback scheduling hints · 343893e6

Andrew Morton authored Oct 04, 2002

This is the replacement for write_mapping_buffers().

Whenever the mpage code sees that it has just written a block which had
buffer_boundary() set, it assumes that the next block is dirty
filesystem metadata.  (This is a good assumption - that's what
buffer_boundary is for).

So we look up the next block in the blockdev mapping and, if it is
present and dirty, schedule it for IO.

So the indirect blocks in the blockdev mapping get merged with the data
blocks in the file mapping.
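The boundary hint can be modelled in a few lines. This is a sketch under stated assumptions. The blockdev mapping is reduced to an array of hypothetical struct bdev_buf entries indexed by device block number. The hypothetical boundary_writeback_hint() stands in for the lookup the mpage code does after writing a data block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer in the blockdev mapping,
   indexed by device block number. */
struct bdev_buf {
	bool present;       /* cached in the blockdev mapping */
	bool dirty;         /* e.g. an indirect block with unwritten changes */
	bool io_scheduled;  /* submitted for writeback */
};

/* Called after the data block at device block `blocknr` was submitted.
   If it carried buffer_boundary(), look up blocknr + 1 in the blockdev
   mapping and schedule it if it is present and dirty, so its IO can be
   merged with the data IO. Returns true if the neighbour was scheduled. */
static bool boundary_writeback_hint(struct bdev_buf *bdev, size_t nr_blocks,
				    size_t blocknr, bool boundary)
{
	struct bdev_buf *next;

	if (!boundary || blocknr + 1 >= nr_blocks)
		return false;
	next = &bdev[blocknr + 1];
	if (!next->present || !next->dirty)
		return false;
	next->dirty = false;
	next->io_scheduled = true;
	return true;
}
```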

This is a bit more general than the write_mapping_buffers() approach.
write_mapping_buffers() required that the fs carefully maintain the
correct buffers on the mapping->private_list, and that the fs call
write_mapping_buffers(), and the implementation was generally rather
yuk.

This version will "just work" for filesystems which implement
buffer_boundary correctly.  Currently this is ext2, ext3 and some
not-yet-merged reiserfs patches.  JFS implements buffer_boundary() but
does not use ext2-like layouts - so there will be no change there.

Works nicely.

343893e6

[PATCH] remove write_mapping_buffers() · 4ac833da

Andrew Morton authored Oct 04, 2002

When the global buffer LRU was present, dirty ext2 indirect blocks were
automatically scheduled for writeback alongside their data.

I added write_mapping_buffers() to replace this - the idea was to
schedule the indirects close in time to the scheduling of their data.

It works OK for small-to-medium sized files but for large, linear writes
it doesn't work: the request queue is completely full of file data and
when we later come to scheduling the indirects, their neighbouring data
has already been written.

So writeback of really huge files tends to be a bit seeky.

So. Kill it. Will fix this problem by other means.

4ac833da

[PATCH] use bio_get_nr_vecs() for sizing direct-io BIOs · e3b12fc1

Andrew Morton authored Oct 04, 2002

From Badari Pulavarty.

Rather than allocating maximum-sized BIOs, use the new
bio_get_nr_vecs() hint when sizing the BIOs.

Also keep track of the approximate upper-bound on the number of pages
remaining to do, so we can again avoid allocating excessively-sized
BIOs.
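Below is a sketch of the sizing rule. It assumes 4096-byte pages and treats the device hint as a plain number, which in the kernel would come from bio_get_nr_vecs(). pages_remaining() and bio_nr_vecs() are hypothetical helpers.

```c
#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KB pages */

/* Upper bound on pages still to transfer: bytes left plus the offset
   into the first page, rounded up to whole pages. */
static unsigned long pages_remaining(unsigned long offset_in_page,
				     unsigned long bytes_left)
{
	return (offset_in_page + bytes_left + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Size the next BIO: the device's preferred vec count, but never more
   vecs than there are pages left to do. */
static unsigned long bio_nr_vecs(unsigned long device_hint,
				 unsigned long pages_left)
{
	return device_hint < pages_left ? device_hint : pages_left;
}
```

For example, a 4096-byte transfer starting 512 bytes into a page touches two pages. A BIO for it needs two vecs even if the device would accept sixteen.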

e3b12fc1

[PATCH] Documentation/filesystems/ext3.txt · 6fb75ca4

Andrew Morton authored Oct 04, 2002

By Vincent Hanquez <tab@tuxfamily.org>

6fb75ca4

[PATCH] use bio_get_nr_vecs() hint for pagecache writeback · f2b01f8b

Andrew Morton authored Oct 04, 2002

Use the bio_get_nr_vecs() hint for sizing the BIOs which writeback
allocates.

f2b01f8b

[PATCH] fix reclaim for higher-order allocations · 3209a954

Andrew Morton authored Oct 04, 2002

The page reclaim logic will bail out if all zones are at pages_high.
But if the caller is requesting a higher-order allocation we need to go
on and free more memory anyway.  That's the only way we have of
addressing buddy fragmentation.
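Below is a simplified model of the bail-out decision. struct zone_model and reclaim_can_bail() are hypothetical, and real reclaim weighs more than the pages_high watermark. The model shows only one thing: an order > 0 request no longer stops reclaim just because every zone is at pages_high.

```c
#include <stdbool.h>
#include <stddef.h>

struct zone_model {
	unsigned long free_pages;
	unsigned long pages_high;
};

/* May reclaim stop early? Only for order-0 requests: a higher-order
   request keeps freeing even when every zone is at pages_high, since
   freeing more is the only way to address buddy fragmentation. */
static bool reclaim_can_bail(const struct zone_model *zones, size_t nr,
			     unsigned int order)
{
	if (order > 0)
		return false;
	for (size_t i = 0; i < nr; i++)
		if (zones[i].free_pages < zones[i].pages_high)
			return false;
	return true;
}
```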

3209a954

[PATCH] separation of direct-reclaim and kswapd functions · bf3f607a

Andrew Morton authored Oct 04, 2002

There is some lack of clarity in what kswapd does and what
direct-reclaim tasks do; try_to_free_pages() tries to service both
functions, and they are different.

- kswapd's role is to keep all zones on its node at

	zone->free_pages >= zone->pages_high.

  and to never stop as long as any zones do not meet that condition.

- A direct reclaimer's role is to try to free some pages from the
  zones which are suitable for this particular allocation request, and
  to return when that has been achieved, or when all the relevant zones
  are at

	zone->free_pages >= zone->pages_high.

The patch explicitly separates these two code paths; kswapd does not
run try_to_free_pages() any more.  kswapd should not be aware of zone
fallbacks.
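The two roles can be sketched side by side. Everything here is a hypothetical model, not kernel code: struct zone_model, shrink_zone_model(), kswapd_balance_node() and direct_reclaim(). shrink_zone_model() simply moves pages from a `reclaimable` counter to `free_pages`. The kswapd model deviates from the text in one way: it also stops once a zone has nothing left to free, so that the loop terminates.

```c
#include <stddef.h>

struct zone_model {
	unsigned long free_pages;
	unsigned long pages_high;
	unsigned long reclaimable;  /* assumption: pages we could still free */
};

/* Pretend to reclaim up to `batch` pages from a zone; returns pages freed. */
static unsigned long shrink_zone_model(struct zone_model *z, unsigned long batch)
{
	unsigned long n = z->reclaimable < batch ? z->reclaimable : batch;

	z->reclaimable -= n;
	z->free_pages += n;
	return n;
}

/* kswapd: work through every zone of the node until each one has
   free_pages >= pages_high. It knows nothing about zone fallbacks. */
static void kswapd_balance_node(struct zone_model *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		while (zones[i].free_pages < zones[i].pages_high &&
		       shrink_zone_model(&zones[i], 32) > 0)
			;
}

/* Direct reclaim: touch only the zones suitable for this request (its
   fallback list) and return once `target` pages are freed, or when every
   such zone is already at pages_high. */
static unsigned long direct_reclaim(struct zone_model **fallback, size_t nr,
				    unsigned long target)
{
	unsigned long freed = 0;

	for (size_t i = 0; i < nr && freed < target; i++) {
		struct zone_model *z = fallback[i];

		if (z->free_pages >= z->pages_high)
			continue;
		freed += shrink_zone_model(z, target - freed);
	}
	return freed;
}
```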

bf3f607a

[PATCH] mempool wakeup fix · fe66ad33

Andrew Morton authored Oct 04, 2002

When the mempool is empty, tasks wait on the waitqueue in "exclusive
mode".  So one task is woken for each returned element.

But if the number of tasks which are waiting exceeds the mempool's
specified size (min_nr), mempool_free() ends up deciding that as the
pool is fully replenished, there cannot possibly be anyone waiting for
more elements.

But with 16384 threads running tiobench, it happens.

We could fix this with a waitqueue_active() test in mempool_free().
But rather than adding that test to this fastpath I changed the wait to
be non-exclusive, and used the prepare_to_wait/finish_wait API, which
will be quite beneficial in this case.

Also, convert the schedule() in mempool_alloc() to an io_schedule(), so
this sleep time is accounted as "IO wait".  Which is a bit approximate
- we don't _know_ that the caller is really waiting for IO completion.
But for most current users of mempools, io_schedule() is more accurate
than schedule() here.
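A single-threaded model of the wakeup decision shows why exclusive waits strand tasks once waiters outnumber min_nr. The model assumes woken tasks do not run, and so do not take an element, until all the frees have happened. struct pool_model, free_old() and free_new() are hypothetical. They only mimic the accounting described above for mempool_free().

```c
/* Hypothetical model: woken tasks are assumed not to run (and take an
   element) until all the frees have happened. */
struct pool_model {
	int curr_nr;    /* elements sitting in the pool */
	int min_nr;     /* pool size */
	int sleepers;   /* tasks asleep in mempool_alloc() */
	int woken;      /* tasks woken but not yet running */
};

/* Old rule: exclusive wait, one wakeup per element put back; once the
   pool is full, assume nobody can be waiting. */
static void free_old(struct pool_model *p)
{
	if (p->curr_nr < p->min_nr) {
		p->curr_nr++;
		if (p->sleepers > 0) {
			p->sleepers--;
			p->woken++;
		}
	}
	/* else: element returns to the underlying allocator; nobody is woken */
}

/* New rule: non-exclusive wait, so a wakeup reaches every sleeper and
   each one re-checks the pool for itself. */
static void free_new(struct pool_model *p)
{
	if (p->curr_nr < p->min_nr) {
		p->curr_nr++;
		p->woken += p->sleepers;
		p->sleepers = 0;
	}
}
```

With min_nr = 2 and five sleepers, five frees under the old rule wake only two tasks. The full pool is then taken as proof that nobody is waiting. The new rule wakes all five on the first free.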

fe66ad33

[PATCH] O_DIRECT invalidation fix · a7634cff

Andrew Morton authored Oct 04, 2002

If the alignment checks in generic_direct_IO() fail, we end up not
forcing writeback of dirty pagecache pages, but we still run
invalidate_inode_pages2().  The net result is that dirty pagecache gets
incorrectly removed.  I guess this will expose unwritten disk blocks.

So move the sync up into generic_file_direct_IO(), where we perform the
invalidation.  So we know that pagecache and disk are in sync before we
do anything else.
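The ordering problem can be shown with a toy model of one file. struct file_model, direct_io_buggy() and direct_io_fixed() are hypothetical. They capture only the order of the sync, the alignment check and the invalidation.

```c
#include <errno.h>
#include <stdbool.h>

struct file_model {
	bool cache_dirty;    /* pagecache holds data not yet on disk */
	bool cache_present;  /* pagecache holds the page at all */
	int cache_data;
	int disk_data;
};

static void sync_pagecache(struct file_model *f)
{
	if (f->cache_dirty) {
		f->disk_data = f->cache_data;
		f->cache_dirty = false;
	}
}

static void invalidate_pagecache(struct file_model *f)
{
	f->cache_present = false;   /* anything still dirty is simply dropped */
	f->cache_dirty = false;
}

/* Old order: sync only once the alignment checks pass, but invalidate
   regardless. */
static int direct_io_buggy(struct file_model *f, bool aligned)
{
	int ret = -EINVAL;

	if (aligned) {
		sync_pagecache(f);
		ret = 0;
	}
	invalidate_pagecache(f);
	return ret;
}

/* New order: sync first, so pagecache and disk agree before anything else. */
static int direct_io_fixed(struct file_model *f, bool aligned)
{
	sync_pagecache(f);
	int ret = aligned ? 0 : -EINVAL;
	invalidate_pagecache(f);
	return ret;
}
```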

a7634cff

[PATCH] truncate fixes · 911ceab5

Andrew Morton authored Oct 04, 2002

The new truncate code needs to check page->mapping after acquiring the
page lock.  Because the page could have been unmapped by page reclaim
or by invalidate_inode_pages() while we waited for the page lock.

Also, the page may have been moved between a tmpfs inode and
swapper_space.  Because we don't hold the mapping->page_lock across the
entire truncate operation any more.

Also, change the initial truncate scan (the non-blocking one which is
there to stop as much writeout as possible) so that it is immune to
other CPUs decreasing page->index.

Also fix negated test in invalidate_inode_pages2().  Not sure how that
got in there.
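Below is a toy model of the re-check after taking the page lock. struct page_model, struct mapping_model, reclaim_race() and truncate_page_sketch() are all hypothetical. The callback simulates whatever happens while truncate sleeps on the lock.

```c
#include <stdbool.h>
#include <stddef.h>

struct mapping_model {
	int id;
};

struct page_model {
	struct mapping_model *mapping;  /* NULL once the page left the pagecache */
	bool locked;
};

/* Hypothetical racer: what reclaim or invalidate_inode_pages() might do
   while truncate waits for the page lock. */
static void reclaim_race(struct page_model *page)
{
	page->mapping = NULL;
}

/* One truncate step: lock the page, then re-check that it still belongs
   to this mapping before touching it. Returns true if it was truncated. */
static bool truncate_page_sketch(struct page_model *page,
				 struct mapping_model *mapping,
				 void (*race)(struct page_model *))
{
	if (race)
		race(page);             /* the window while we sleep on the lock */
	page->locked = true;            /* stands in for lock_page() */
	bool still_ours = (page->mapping == mapping);
	if (still_ours)
		page->mapping = NULL;   /* remove it from the pagecache */
	page->locked = false;           /* stands in for unlock_page() */
	return still_ours;
}
```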

911ceab5

[PATCH] distinguish between address span of a zone and the number · d3975580

Andrew Morton authored Oct 04, 2002

From David Mosberger

The patch below fixes a bug in nr_free_zone_pages() which shows up when a
zone has holes.  The problem is due to the fact that "struct zone"
didn't keep track of the amount of real memory in a zone.  Because of
this, nr_free_zone_pages() simply assumed that a zone consists entirely
of real memory.  On machines with large holes, this has catastrophic
effects on VM performance, because the VM system ends up thinking that
there is plenty of memory left over in a zone, when in fact it may be
completely full.

The patch below fixes the problem by replacing the "size" member in
"struct zone" with "spanned_pages" and "present_pages" and updating
page_alloc.c.
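Below is a sketch of the corrected accounting. It assumes the per-zone contribution is the zone size minus pages_high, counted only when positive. struct zone_model and nr_free_zone_pages_sketch() are simplified stand-ins. The point is that the sum uses present_pages, so holes counted in spanned_pages no longer look like memory.

```c
#include <stddef.h>

/* Hypothetical cut-down zone after the patch. */
struct zone_model {
	unsigned long spanned_pages;  /* size of the address range, holes included */
	unsigned long present_pages;  /* pages of real memory */
	unsigned long pages_high;
};

/* Sum the pages above the high watermark over the given zones.
   Using spanned_pages here would count holes as usable memory. */
static unsigned long nr_free_zone_pages_sketch(const struct zone_model *zones,
					       size_t nr)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < nr; i++)
		if (zones[i].present_pages > zones[i].pages_high)
			sum += zones[i].present_pages - zones[i].pages_high;
	return sum;
}
```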

d3975580

[PATCH] remove debug code from list_del() · 9d66d9e9

Andrew Morton authored Oct 04, 2002

It hasn't caught any bugs, and it is causing confusion over whether
this is a permanent part of list_del() behaviour.

9d66d9e9

[PATCH] hugetlb kmap fix · db12b88f

Andrew Morton authored Oct 04, 2002

From Bill Irwin

This patch makes alloc_hugetlb_page() kmap() the memory it's zeroing,
and cleans up a tiny bit of list handling on the side.  Without this
fix, it oopses every time it's called.

db12b88f

[PATCH] fix /proc/vmstat:pgpgout/pgpgin · 908325dc

Andrew Morton authored Oct 04, 2002

These numbers are being sent to userspace as a number of sectors,
whereas they should be a number of kilobytes.
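The conversion itself is a one-liner. It assumes 512-byte sectors, so that one kilobyte is two sectors. sectors_to_kb() is a hypothetical helper, not the name used in the patch.

```c
/* /proc/vmstat wants kilobytes; block IO is counted in 512-byte sectors. */
static unsigned long sectors_to_kb(unsigned long sectors)
{
	return sectors / 2;   /* 2 x 512 bytes = 1 KB; an odd trailing sector rounds down */
}
```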

908325dc

[PATCH] struct super_block cleanup - ext3 · 5868a499

Brian Gerst authored Oct 04, 2002

Removes the last member of the union, ext3.

5868a499

[PATCH] struct super_block cleanup - hpfs · 40f51070

Brian Gerst authored Oct 04, 2002

Remove hpfs_sb from struct super_block.

40f51070

[PATCH] SCSI tape devfs & driverfs fix · 9709ae9f

Kai Mäkisara authored Oct 04, 2002

Fix device numbering in driverfs and devfs, which was broken by the
previous patch (bug found by Bjoern A. Zeeb, bz@zabbadoz.net).

9709ae9f

[PATCH] Updated NatSemi SCx200 patches for Linux-2.5 · 3900abd5

Christer Weinigel authored Oct 04, 2002

This patch adds support for the National Semiconductor SCx200
processor family to Linux 2.5.

The patch consists of the following drivers:

  arch/i386/kernel/scx200.c -- give kernel access to the GPIO pins

  drivers/char/scx200_gpio.c -- give userspace access to the GPIO pins
  drivers/char/scx200_wdt.c -- watchdog timer driver

  drivers/i2c/scx200_i2c.c -- use any two GPIO pins as an I2C bus
  drivers/i2c/scx200_acb.c -- driver for the Access.BUS hardware

  drivers/mtd/maps/scx200_docflash.c -- driver for a CFI flash connected
                                      to the DOCCS pin

3900abd5

[PATCH] FAT/VFAT memory corruption during mount() · 10d033f7

Petr Vandrovec authored Oct 04, 2002

This patch fixes memory corruption during vfat mount: one byte
before the mount options is overwritten by ',' since the strtok->strsep
conversion happened.

This patch also fixes another problem introduced by the strtok->strsep
conversion: VFAT requires that FAT does not modify the passed options,
but unfortunately the FAT driver fails to preserve the options string if
there is more than one consecutive comma in it.
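Below is a userspace sketch of non-destructive option parsing. It assumes the parser is free to work on a private copy of the string. next_token() is a hypothetical stand-in for strsep(), with the same in-place NUL-termination and the same empty-token behaviour. count_mount_options() is illustrative only.

```c
#include <stdlib.h>
#include <string.h>

/* strsep()-like: return the next token and NUL-terminate it in place;
   consecutive delimiters yield empty tokens. */
static char *next_token(char **cursor, char delim)
{
	char *start = *cursor, *end;

	if (!start)
		return NULL;
	end = strchr(start, delim);
	if (end) {
		*end = '\0';
		*cursor = end + 1;
	} else {
		*cursor = NULL;
	}
	return start;
}

/* Count options without modifying the caller's string. Returns -1 if
   the private copy cannot be allocated. */
static int count_mount_options(const char *options)
{
	size_t len = strlen(options);
	char *copy = malloc(len + 1);
	char *cursor, *tok;
	int n = 0;

	if (!copy)
		return -1;
	memcpy(copy, options, len + 1);   /* only the copy gets NULs written into it */
	cursor = copy;
	while ((tok = next_token(&cursor, ',')) != NULL) {
		if (*tok == '\0')
			continue;                 /* ",," or a trailing ',' gives an empty token */
		n++;
	}
	free(copy);
	return n;
}
```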

10d033f7

04 Oct, 2002 20 commits

Use dump_stack() for the USB storage buffer size checking, · eff95566

Linus Torvalds authored Oct 04, 2002

to make it possible to track down.

eff95566

Undo due to weird behaviour on various boxes · 182d090b

Linus Torvalds authored Oct 04, 2002

Cset exclude: ink@jurassic.park.msu.ru|ChangeSet|20021003201553|58706

182d090b

[PATCH] NFS: readdir reply truncated! · d08a0a0e

Trond Myklebust authored Oct 04, 2002

Duh... Even a simple one-liner test can be wrong. The really sad bit
is that I made the same mistake 3 weeks ago, fixed it, and then lost
track of the fix...

To recap the fix to the fix: a valid end-of-directory marker has to read
(entry[0]==0 && entry[1]!=0). Here is the final, correct (I hope) patch.
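Below is the corrected test as a C predicate. It assumes p points at the XDR word holding the next entry's "value follows" flag, with the eof flag in the following word, as in an NFS READDIR reply. Comparing against zero gives the same answer in either byte order. nfs_readdir_is_eof() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* p[0]: "value follows" flag of the next entry (0 = no more entries here)
   p[1]: eof flag (non-zero = the directory really ends here)
   p[1] is only read when p[0] == 0, thanks to short-circuit evaluation. */
static bool nfs_readdir_is_eof(const uint32_t *p)
{
	return p[0] == 0 && p[1] != 0;
}
```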

d08a0a0e

[PATCH] 64-bit timer fix · 2f210ce0

Anton Blanchard authored Oct 04, 2002

I think I have found it, and it only hits on a 64-bit machine.

If the timeout is big enough we still need to initialise timer->entry.
Otherwise bad things happen when we hit del_timer().
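Below is a sketch of the fix using a minimal doubly linked list. It assumes the timer code refuses to queue a timer whose distance from now does not fit in 32 bits. struct timer_sketch, internal_add_timer_sketch() and del_timer_sketch() are hypothetical. The list helpers are reimplemented here rather than taken from kernel headers.

```c
/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink e and leave it self-linked; harmless on an entry that was only
   initialised and never queued. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

struct timer_sketch {
	struct list_head entry;
};

static void internal_add_timer_sketch(struct timer_sketch *t,
				      struct list_head *bucket,
				      unsigned long long delta)
{
	if (delta > 0xffffffffULL) {
		/* Too far out to queue (only possible with 64-bit jiffies).
		   Without this, entry holds garbage and a later delete
		   writes through wild pointers. */
		INIT_LIST_HEAD(&t->entry);
		return;
	}
	list_add(&t->entry, bucket);
}

static void del_timer_sketch(struct timer_sketch *t)
{
	list_del_init(&t->entry);
}
```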

2f210ce0

Merge http://linuxusb.bkbits.net/pci-2.5 · 8737bd66

Linus Torvalds authored Oct 04, 2002

into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux

8737bd66

Export the gdt table GPL-only for APM. · 3d2251c0

Linus Torvalds authored Oct 03, 2002

3d2251c0

Merge kroah.com:/home/greg/linux/BK/bleeding_edge-2.5 · 0f6c515f

Greg Kroah-Hartman authored Oct 03, 2002

into kroah.com:/home/greg/linux/BK/pci-2.5

0f6c515f

PCI: remove pcibios_find_device() from the 53c7,8xx.c SCSI driver · c4e4d47d

Greg Kroah-Hartman authored Oct 03, 2002

c4e4d47d

Merge s390 update into current tree · b3819ec5

Linus Torvalds authored Oct 03, 2002

b3819ec5

[PATCH] s390 update (27/27): control characters. · 5cc974f5

Martin Schwidefsky authored Oct 03, 2002

Replace the IMMEDIATE_BH bottom half with tasklets in helper functions for
console control characters. Fix a race condition and make it look nicer.

5cc974f5

[PATCH] s390 update (26/27): /proc/interrupts. · 2b46c627

Martin Schwidefsky authored Oct 03, 2002

Don't create /proc/interrupts on s390.

2b46c627

[PATCH] s390 update (25/27): init call. · 4eef3a34

Martin Schwidefsky authored Oct 03, 2002

Remove call to s390_init_machine_check in init/main.c, the new boot code
on s390 calls it via arch_initcall.

4eef3a34

[PATCH] s390 update (24/27): boot sequence. · ebbde003

Martin Schwidefsky authored Oct 03, 2002

Rework boot sequence on s390:

Traditionally, device detection on s390 is done completely
at a _very_ early stage during bootup (from init_irq(),
i.e. before memory management or the console are there).

This has always been a bad idea, but now it is even more broken,
since the Linux driver model requires device detection
to take place after the core_initcalls are done.

We now do only a small amount of scanning (probably
less in the future) at the early stage, the bulk of it
is done from a proper subsys_initcall(). This requires
some changes in related areas:

- the machine check handler initialization is split in
  two halves, since we want to catch major machine malfunctions
  as early as possible, but device machine checks can only
  be caught after the channel subsystem is up.

- some functions that are called from the css initialization
  made some assumptions about when to use kmalloc or bootmem_alloc,
  which were broken anyway. We fix this here and hopefully
  can get rid of bootmem_alloc for the css completely in the future.

- the debug logging feature for s390 was not used for functions
  in the initialization before, since it requires the memory
  management to be working. Now that we can be sure that it
  works, some special cases can be removed.

Now that these changes are done, a partial implementation of the
device model for the channel subsystem is possible, but at this
point, none of the device drivers make use of that yet.

ebbde003

[PATCH] s390 update (23/27): channel paths. · 2abb6c50

Martin Schwidefsky authored Oct 03, 2002

Check if defined chpids are available. Some code simplification.

2abb6c50

[PATCH] s390 update (22/27): s390_process_IRQ. · 09212816

Martin Schwidefsky authored Oct 03, 2002

Clean up s390_process_IRQ a little; the ending_status argument is never
really used.

09212816

[PATCH] s390 update (21/27): sync i/o bug. · 9c6615ff

Martin Schwidefsky authored Oct 03, 2002

Remove bogus sanity check from {en,dis}able_sync_isc() and really disable all
interrupt sub classes except isc 7 in wait_cons_dev.

9c6615ff

[PATCH] s390 update (20/27): signal quiesce. · 9ef3b8ca

Martin Schwidefsky authored Oct 03, 2002

Add 'signal quiesce' feature to the s390 hardware console. A signal quiesce
is sent from VM or the service element every time the system should shut
down. We receive the quiesce signal and call ctrl_alt_del(). Finally the
mainframes have ctrl-alt-del as well :-)

9ef3b8ca

[PATCH] s390 update (19/27): ptrace cleanup. · f90dc9f3

Martin Schwidefsky authored Oct 03, 2002

Rewrite s390 ptrace code in a more readable and less buggy way. As a part of
this, all psw related definitions are moved into ptrace.h from a number of
different locations.

f90dc9f3

[PATCH] s390 update (18/27): fpu registers. · 91b9f2e4

Martin Schwidefsky authored Oct 03, 2002

Clean up load/store of FPU registers on s390.

91b9f2e4

[PATCH] s390 update (17/27): beautification. · fbd32c90

Martin Schwidefsky authored Oct 03, 2002

Remove bogus sanity checks and clean up the code.

fbd32c90