Commits · bebff73ce59d7b28a85b6ba63ed81d378d03223c · nexedi / linux

30 Aug, 2002 40 commits

Andrew Morton authored Aug 30, 2002

mpage_writepages() does a lock_page() on pages to be written back, even
when it is being used for page reclaim writeback.

This is normally OK, because the page is unlocked quickly - pages are
unlocked during writeback and nobody should be performing __GFP_FS
allocations inside lock_page().

But it has introduced a ranking problem in ext3:

```
generic_file_write
->lock_page
  ->ext3_prepare_write
    ->journal_start	(waits for a commit)
```

versus

```
ext3_create()
->journal_start()
  ->ext3_new_inode(GFP_KERNEL)
    ->page reclaim
      ->mpage_writepages
        ->lock_page	(locks up, transaction is held open)
```

Maybe sometime, I'll have to turn mpage_writepages' lock_page into a
trylock if the caller is PF_MEMALLOC.  But for now, let's make ext3's
inside-transaction allocations use GFP_NOFS.  There is only one of them.
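The fix depends on reclaim honouring the caller's gfp mask: an allocation that lacks the "may re-enter the filesystem" bit should never reach mpage_writepages() and its lock_page(). The userspace sketch below models only that decision. The `SKETCH_*` flag values and every `sketch_*` function are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch only; the kernel's real
 * __GFP_* values live in its own headers and are not reproduced here. */
#define SKETCH_GFP_WAIT 0x01u /* allocation may sleep */
#define SKETCH_GFP_IO   0x02u /* reclaim may start block I/O */
#define SKETCH_GFP_FS   0x04u /* reclaim may re-enter the filesystem */

#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOFS   (SKETCH_GFP_WAIT | SKETCH_GFP_IO)

static int fs_writebacks; /* how often reclaim called into the fs */

/* Stand-in for mpage_writepages(), which may lock_page() and so block
 * behind a task that holds a page lock and is waiting on the journal. */
static void sketch_fs_writeback(void)
{
    fs_writebacks++;
}

/* Simplified reclaim: write back filesystem pages only when the
 * allocation's gfp mask permits re-entering the filesystem. */
static void sketch_reclaim(unsigned int gfp_mask)
{
    if (gfp_mask & SKETCH_GFP_FS)
        sketch_fs_writeback();
    /* otherwise leave dirty filesystem pages alone */
}

/* Models the allocation in ext3_new_inode(), made while a transaction
 * is open; assumes memory is tight, so reclaim always runs. */
static void sketch_alloc_in_transaction(unsigned int gfp_mask)
{
    sketch_reclaim(gfp_mask);
}
```

In this model a GFP_NOFS allocation may still sleep and start I/O, but it never reaches the page lock, which is what breaks the cycle shown in the two call chains.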

bebff73c

[PATCH] writeback correctness and efficiency changes · ec12ac49

Andrew Morton authored Aug 30, 2002

This is a performance and correctness fix against the writeback paths.

The writeback code has competing requirements.  Sometimes it is used
for "memory cleansing": kupdate, bdflush, writer throttling, page
allocator writeback, etc.  And sometimes this same code is used for
data integrity purposes: fsync, msync, fdatasync, sync, umount, various
other kernel-internal uses.

The problem is: how to handle a dirty buffer or page which is currently
under writeback.

For memory cleansing, we just want to skip that buffer/page and go onto
the next one.  But for sync, we must wait on the old writeback and then
start new writeback.

mpage_writepages() is currently correct for cleansing, but incorrect for
sync.  block_write_full_page() is currently correct for sync, but
inefficient for cleansing.

The fix is fairly simple.

- In mpage_writepages(), don't skip the page if it's a sync
operation.

- In block_write_full_page(), skip the buffer if it is not a sync
operation, i.e. when cleansing.  And return -EAGAIN to tell the caller
that the writeout didn't work out.  The caller must then set the page
dirty again and move it back onto mapping->dirty_pages.

This is an extension of the writepage API: writepage can now return
-EAGAIN.  There are only three callers, and they have been updated.

fail_writepage() and ext3_writepage() were actually doing this by
hand.  They have been changed to return -EAGAIN.  NTFS will want to
be able to return -EAGAIN from its writepage as well.

- A sticky question is: how to tell the writeout code which mode it
is operating in?  Cleansing or sync?

It's such a tiny code change that I didn't have the heart to go and
propagate a `mode' argument down every instance of writepages() and
writepage() in the kernel.  So I passed it in via current->flags.

Incidentally, the occurrence of a locked-and-dirty buffer in
block_write_full_page() is fairly rare: normally the collision avoidance
happens at the address_space level, via PageWriteback.  But some
mappings (blockdevs, ext3 files, etc) have their dirty buffers written
out via submit_bh().  It is these buffers which can stall
block_write_full_page().

This wart will be pretty intrusive to fix.  ext3 needs to become fully
page-based (ugh.  It's a block-based journalling filesystem, and pages
are unnatural).  blockdev mappings are still written out by buffers
because that's how filesystems use them.  Putting _all_ metadata
(indirects, inodes, superblocks, etc) into standalone address_spaces
would fix that up.

- filemap_fdatawrite() sets PF_SYNC.  So filemap_fdatawrite() is the
kernel function which will start writeback against a mapping for
"data integrity" purposes, whereas the unexported, internal-only
do_writepages() is the writeback function which is used for memory
cleansing.  This difference is the reason why I didn't consolidate
those functions ages ago...

- Lots of code paths had a bogus extra call to filemap_fdatawait(),
which I previously added in a moment of weak-headedness.  They have
all been removed.
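The resulting contract (the mode travels in the task flags, a cleansing writepage may give up with -EAGAIN on a page whose old writeback is still in flight, and the caller then redirties the page) can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: `SKETCH_PF_SYNC`, the global `current_flags` and all `sketch_*` functions are hypothetical stand-ins for PF_SYNC, current->flags and the writepage paths, and waiting on the old I/O is reduced to a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit; the commit passes the mode as PF_SYNC in
 * current->flags, but this value is illustrative only. */
#define SKETCH_PF_SYNC 0x1u

struct sketch_page {
    bool dirty;
    bool under_writeback; /* old I/O still in flight */
    bool on_dirty_list;   /* models mapping->dirty_pages */
};

static unsigned int current_flags; /* stands in for current->flags */
static int waits;                  /* times we waited on old I/O */

/* Models block_write_full_page() after the fix: a sync caller waits
 * for the old writeback, a cleansing caller gives up with -EAGAIN. */
static int sketch_writepage(struct sketch_page *page)
{
    if (page->under_writeback) {
        if (!(current_flags & SKETCH_PF_SYNC))
            return -EAGAIN;
        waits++;                        /* wait for the old I/O */
        page->under_writeback = false;
    }
    page->under_writeback = true;       /* start new I/O */
    return 0;
}

/* Models a writepage caller: in this sketch it clears the dirty state
 * first, and on -EAGAIN must redirty the page and requeue it. */
static int sketch_write_one(struct sketch_page *page)
{
    page->dirty = false;
    page->on_dirty_list = false;
    int ret = sketch_writepage(page);
    if (ret == -EAGAIN) {
        page->dirty = true;             /* set the page dirty again */
        page->on_dirty_list = true;     /* back onto dirty_pages */
    }
    return ret;
}

/* Models filemap_fdatawrite(): run the same writeout in sync mode. */
static int sketch_fdatawrite(struct sketch_page *page)
{
    unsigned int saved = current_flags;
    current_flags |= SKETCH_PF_SYNC;
    int ret = sketch_write_one(page);
    current_flags = saved;
    return ret;
}
```

Passing the mode through the task flags, as the commit does, means neither function signature carries it; the price is that every caller has to set and restore the flag around the call.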

ec12ac49

[PATCH] batched freeing of anon pages · 8fd3d458

Andrew Morton authored Aug 30, 2002

A reworked version of the batched page freeing and lock amortisation
for VMA teardown.

It walks the existing 507-page list in the mmu_gather_t in 16-page
chunks, drops their refcounts in 16-page chunks, and de-LRUs and
frees any resulting zero-count pages in up-to-16 page chunks.
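The chunked walk can be sketched in userspace C as follows. The 16-page chunk size comes from the commit; the `sketch_*` names are hypothetical, and taking the LRU lock once per batch is modelled by a counter rather than a real lock.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CHUNK 16 /* chunk size named in the commit */

struct sketch_page {
    int count; /* reference count */
    int freed; /* set when the page is released */
};

static int lock_round_trips; /* one per non-empty batch */

/* Frees up to SKETCH_CHUNK zero-count pages under one lock hold. */
static void sketch_free_batch(struct sketch_page **batch, size_t n)
{
    if (n == 0)
        return;
    lock_round_trips++;          /* take the LRU lock once ... */
    for (size_t i = 0; i < n; i++)
        batch[i]->freed = 1;     /* ... de-LRU and free each page */
}

/* Walks the gathered pages in SKETCH_CHUNK-sized chunks, drops one
 * reference from each, and frees those whose count reached zero. */
static void sketch_release_pages(struct sketch_page **pages, size_t nr)
{
    for (size_t start = 0; start < nr; start += SKETCH_CHUNK) {
        struct sketch_page *zero[SKETCH_CHUNK];
        size_t nzero = 0;
        size_t end = start + SKETCH_CHUNK < nr ? start + SKETCH_CHUNK : nr;

        for (size_t i = start; i < end; i++)
            if (--pages[i]->count == 0)
                zero[nzero++] = pages[i];
        sketch_free_batch(zero, nzero);
    }
}
```

For 40 pages this takes the lock three times instead of once per freed page, which is the amortisation the commit is after.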

8fd3d458

[PATCH] put_page() consolidation · 2b341443

Andrew Morton authored Aug 30, 2002

Clean up put_page() and page_cache_release().  It's pretty simple now:

```
#define page_cache_get(page)           get_page(page)
#define page_cache_release(page)       put_page(page)
```
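After this cleanup a page cache reference is just an ordinary page reference. A minimal userspace model of that equivalence, with hypothetical `sketch_*` names and a plain (non-atomic) counter standing in for the kernel's refcount, might look like this:

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_page {
    int count;  /* reference count */
    bool freed; /* set when the last reference is dropped */
};

static void sketch_get_page(struct sketch_page *page)
{
    page->count++;
}

static void sketch_put_page(struct sketch_page *page)
{
    if (--page->count == 0)
        page->freed = true; /* the real kernel frees the page here */
}

/* The page-cache helpers are pure aliases of the generic ones. */
#define sketch_page_cache_get(page)     sketch_get_page(page)
#define sketch_page_cache_release(page) sketch_put_page(page)
```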

2b341443

[PATCH] remove pagevec_lru_del() · e035a047

Andrew Morton authored Aug 30, 2002

It was only being used in invalidate_inode_pages(), and there
pagevec_release() does the same thing.

e035a047

[PATCH] debug check in put_page_testzero() · c99b0372

Andrew Morton authored Aug 30, 2002

As suggested by Daniel - it's a bug to run put_page_testzero()
against a zero-ref page.

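The check amounts to asserting that the count is non-zero before decrementing it. A userspace sketch, assuming a hypothetical `SKETCH_BUG_ON()` that reports and aborts in place of the kernel's BUG_ON(), and a plain counter in place of the kernel's atomic decrement-and-test:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct sketch_page {
    int count;
};

/* Hypothetical stand-in for BUG_ON(): report and abort. */
#define SKETCH_BUG_ON(cond)                                          \
    do {                                                             \
        if (cond) {                                                  \
            fprintf(stderr, "BUG: %s at %s:%d\n", #cond,             \
                    __FILE__, __LINE__);                             \
            abort();                                                 \
        }                                                            \
    } while (0)

/* Drops a reference and returns 1 if it was the last one. A page whose
 * count is already zero may already be free, so dropping another
 * reference is a bug; catch it before the count goes negative. */
static int sketch_put_page_testzero(struct sketch_page *page)
{
    SKETCH_BUG_ON(page->count == 0);
    return --page->count == 0;
}
```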
c99b0372

[PATCH] MAINTAINERS patch · cdf2f98b

Ingo Molnar authored Aug 30, 2002

Please apply this patch (Robert ACK-ed it). While there is a preemptible
kernel entry already, I think listing this under the scheduler entry is
justified, since preemption has a number of scheduler interactions.

cdf2f98b

[PATCH] ldt-fix-2.5.32-A3 · 89d637a8

Ingo Molnar authored Aug 30, 2002

This is an updated version of the LDT fixes. It fixes the following kinds
of problems:

 - fix a race, made possible by gcc reordering, that could cause a corrupt
   LDT descriptor to be loaded on context switch. [This fix got simplified
   over previous versions.]

 - remove an unconditional OOM printk, and stop setting ->size in the OOM
   path, where it is not needed.

 - fix preemption bugs: load_LDT()/clear_LDT() were not preemption-safe
   when used outside of spinlocks.

The context-switch race is the following. An 'LDT modification' consists
of two steps: the pc->ldt pointer is modified, then pc->size is modified.
In theory gcc is free to reorder the two stores, modifying ->size first
and ->ldt second. Thus, if the modification is not synchronized with
context switches, another thread might see a transient state: the new
(increased) ->size but still the old pointer. I.e.:

	CPU0				CPU1

	pc->size = newsize;
					load_LDT(); // (oldptr, newsize)
	pc->ldt = newptr;

The corrupt LDT stays loaded until the SMP cross-call is sent, leaving the
window open for many microseconds.

The fix is to put a wmb() after ->ldt modifications. [This is also in
preparation for SMP x86 designs that are not write-ordered.]
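A userspace analogue of the ordering fix can be written with C11 atomics. This is not the kernel code: the `sketch_*` names are hypothetical, a release store on ->size stands in for the wmb() placed after the ->ldt store, and the reader loads ->size with acquire ordering before reading ->ldt, which models the read-side ordering that x86's load ordering provides implicitly. The test only exercises a single thread; the guarantee matters when writer and reader run on different CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of the per-mm LDT state: table pointer + size. */
struct sketch_ldt_ctx {
    _Atomic(int *) ldt;
    _Atomic size_t size;
};

/* Grow the table: store the new pointer, then publish the new size.
 * The release store orders the ->ldt store before the ->size store,
 * playing the role of the wmb() the commit adds. */
static void sketch_ldt_grow(struct sketch_ldt_ctx *pc, int *newptr,
                            size_t newsize)
{
    atomic_store_explicit(&pc->ldt, newptr, memory_order_relaxed);
    atomic_store_explicit(&pc->size, newsize, memory_order_release);
}

/* Context-switch side: read ->size (acquire), then ->ldt. A reader that
 * sees the new size is guaranteed to see the new pointer, so the bad
 * (oldptr, newsize) pair cannot be observed. (newptr, oldsize) can
 * still occur; the sketch assumes the new table holds a copy of the
 * old entries, which makes that pair harmless. */
static void sketch_load_ldt(struct sketch_ldt_ctx *pc, int **ptr,
                            size_t *size)
{
    *size = atomic_load_explicit(&pc->size, memory_order_acquire);
    *ptr = atomic_load_explicit(&pc->ldt, memory_order_relaxed);
}
```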

89d637a8

Merge bk://linux-input.bkbits.net/linux-input into home.transmeta.com:/home/torvalds/v2.5/linux · e5d588fe

Linus Torvalds authored Aug 30, 2002

e5d588fe

Ignore error 0xff - 'general error' in AUX wire test in i8042.c, · ed0a0a9c

Vojtech Pavlik authored Aug 30, 2002

because some mainboards (Andrew Morton's Dell) report it even when
everything is okay with AUX. Also remove a check for very old AMI i8042s,
which could generate false positives on modern buggy mainboards.
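The new rule for interpreting the AUX wire test result can be sketched as a small classifier. The function name is hypothetical, and treating every value other than 0x00 and 0xff as a genuine failure is an assumption of this sketch, not a reproduction of i8042.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: classify the byte returned by the i8042 AUX
 * interface test. 0x00 means the test passed; 0xff is the 'general
 * error' some boards report even though AUX works, so it is ignored.
 * Any other value is assumed here to be a real line fault. */
static bool sketch_aux_test_ok(unsigned char result)
{
    switch (result) {
    case 0x00: /* test passed */
    case 0xff: /* 'general error': ignored, per the commit */
        return true;
    default:   /* e.g. AUX clock or data line problem */
        return false;
    }
}
```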

ed0a0a9c

Merge bk://jfs.bkbits.net/linux-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux · c71a4337

Linus Torvalds authored Aug 30, 2002

c71a4337
[PATCH] oss/gus_card.c - convert cli to spinlocks · 652cbb16
Peter Wächtler authored Aug 30, 2002

652cbb16
[PATCH] oss/nm256.h - convert cli to spinlocks · f7dc2012
Peter Wächtler authored Aug 30, 2002

f7dc2012
[PATCH] oss/pas2_card.c - convert cli to spinlocks · 81b1edf0
Peter Wächtler authored Aug 30, 2002

81b1edf0
[PATCH] oss/vwsnd.c - convert cli to spinlocks · 2fcfdf56
Peter Wächtler authored Aug 30, 2002

2fcfdf56
[PATCH] oss/trident.c - convert cli to spinlocks · 7a6316fd
Peter Wächtler authored Aug 30, 2002

7a6316fd
[PATCH] oss/midi_synth.c - convert cli to spinlocks · e8342e87
Peter Wächtler authored Aug 30, 2002

e8342e87
[PATCH] oss/sonicvibes.c - convert cli to spinlocks · 4cbc061a
Peter Wächtler authored Aug 30, 2002

4cbc061a
[PATCH] oss/esssolo1.c - convert cli to spinlocks · db0abdb5
Peter Wächtler authored Aug 30, 2002

db0abdb5
[PATCH] oss/rme96xx.c - convert cli to spinlocks · 256de87c
Peter Wächtler authored Aug 30, 2002

256de87c
[PATCH] oss/cmpci.c - convert cli to spinlocks · 8a8ce17b
Peter Wächtler authored Aug 30, 2002

8a8ce17b
[PATCH] oss/waveartist.c - convert cli to spinlocks · 69f5f47a
Peter Wächtler authored Aug 30, 2002

69f5f47a
[PATCH] oss/soundcard.c - convert cli to spinlocks · a5154ee9
Peter Wächtler authored Aug 30, 2002

a5154ee9
[PATCH] oss/wavfront.c - convert cli to spinlocks · a4826ccd
Peter Wächtler authored Aug 30, 2002

a4826ccd
[PATCH] oss/opl3sa2.c - convert cli to spinlocks · 4c6c6e5c
Peter Wächtler authored Aug 30, 2002

4c6c6e5c
[PATCH] oss/opl3sa.c - convert cli to spinlocks · 0b5bb847
Peter Wächtler authored Aug 30, 2002

0b5bb847
[PATCH] oss/dev_table.h - convert cli to spinlocks · 386c8a8e
Peter Wächtler authored Aug 30, 2002

386c8a8e
[PATCH] oss/sys_timer.c - convert cli to spinlocks · 0a4d98b4
Peter Wächtler authored Aug 30, 2002

0a4d98b4
[PATCH] oss/mad16.c - convert cli to spinlocks · e1f63f69
Peter Wächtler authored Aug 30, 2002

e1f63f69
[PATCH] oss/nec_vrc5477.c - convert cli to spinlocks · a74ebe7f
Peter Wächtler authored Aug 30, 2002

a74ebe7f
[PATCH] oss/sound_timer.c - convert cli to spinlocks · 6cadcfe1
Peter Wächtler authored Aug 30, 2002

6cadcfe1
[PATCH] oss/msnd_pinnacle.c - convert cli to spinlocks · 8531bffd
Peter Wächtler authored Aug 30, 2002

8531bffd
[PATCH] oss/es1370.c - convert cli to spinlocks · 01189cd5
Peter Wächtler authored Aug 30, 2002

01189cd5
[PATCH] oss/ite8172.c - convert cli to spinlocks · 64a39157
Peter Wächtler authored Aug 30, 2002

64a39157
[PATCH] oss/maestro.c - convert cli to spinlocks · 9781bbe6
Peter Wächtler authored Aug 30, 2002

9781bbe6
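Each of the OSS patches above replaces a save_flags()/cli()/restore_flags() critical section with a spinlock; in the kernel, spin_lock_irqsave()/spin_unlock_irqrestore() is the usual replacement. A minimal userspace model of the resulting shape, assuming one lock per card, is sketched below with a C11 `atomic_flag` spinlock. The `sketch_*` types and functions are hypothetical, and the interrupt-disabling half of spin_lock_irqsave() is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void sketch_spin_lock(sketch_spinlock_t *l)
{
    /* Spin until this caller flips the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        /* busy-wait */
    }
}

static void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical card state: each card carries its own lock, so two
 * cards no longer serialise against each other the way a global
 * cli() section did. */
struct sketch_card {
    sketch_spinlock_t lock;
    int irq_count; /* data shared by the IRQ handler and ioctls */
};

static void sketch_card_irq(struct sketch_card *card)
{
    sketch_spin_lock(&card->lock);   /* was: save_flags(); cli(); */
    card->irq_count++;
    sketch_spin_unlock(&card->lock); /* was: restore_flags(); */
}
```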

Proper implementation of jfs_get_blocks · ee1aaacd

Dave Kleikamp authored Aug 30, 2002

jfs_get_blocks should return up to the number of blocks in the
extent rather than limiting itself to one block, as the initial,
trivial implementation did.  This greatly reduces the overhead of
O_DIRECT reads and writes.

Submitted by Badari Pulavarty (pbadari@us.ibm.com)
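The difference between the trivial and the proper implementation is how many blocks one call may map. A sketch, assuming a hypothetical single-extent descriptor in place of JFS's real extent lookup, returns the smaller of the blocks remaining in the extent and the number requested, where the initial version would always have returned 1:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent: `len` logically contiguous blocks starting at
 * logical block `lblk`, stored at physical block `pblk`. */
struct sketch_extent {
    uint64_t lblk;
    uint64_t pblk;
    uint64_t len;
};

/* Map `lblock` and return how many blocks, up to `max_blocks`, are
 * contiguous from there within the extent. Returns 0 if `lblock` is
 * not covered by the extent (a hole, in this model). */
static uint64_t sketch_get_blocks(const struct sketch_extent *ext,
                                  uint64_t lblock, uint64_t max_blocks,
                                  uint64_t *pblock)
{
    if (lblock < ext->lblk || lblock >= ext->lblk + ext->len)
        return 0;
    uint64_t offset = lblock - ext->lblk;
    uint64_t avail = ext->len - offset; /* blocks left in the extent */
    *pblock = ext->pblk + offset;
    return avail < max_blocks ? avail : max_blocks;
}
```

An O_DIRECT request for 64 blocks starting 20 blocks into a 32-block extent is then satisfied with one mapping of 12 blocks instead of 12 single-block calls.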

ee1aaacd

Merge http://linuxusb.bkbits.net/pci-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux · 55ff4d61

Linus Torvalds authored Aug 30, 2002

55ff4d61
JFS: Add write_super_lockfs() and unlock_fs() for snapshot. · 68067d0e

Dave Kleikamp authored Aug 30, 2002

Submitted by Steve Best.
68067d0e
PCI: compile time fix for the pci pool patch. · 17aa2fe5
Greg Kroah-Hartman authored Aug 30, 2002

17aa2fe5

[PATCH] show pci_pool stats in driverfs · 1e31bbe1

David Brownell authored Aug 30, 2002

This patch exposes basic allocation statistics for pci pools,
very much like /proc/slabinfo but applying to DMA-consistent
memory.  A file "pools" is created in the driverfs directory
for the relevant pci device when the first pool is created, and
removed when the last pool is destroyed.

Please merge to 2.5.latest.  If it matters, DaveM said it
looks fine.  It produces sane output for all the 2.5.30
USB host controller drivers.
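The lifetime rule for the "pools" file (created with the first pool, removed with the last) is a simple counted transition. A sketch with hypothetical `sketch_*` names, where a boolean stands in for the driverfs file, might look like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state: number of live pools and whether the
 * "pools" attribute file currently exists. */
struct sketch_pci_dev {
    int npools;
    bool pools_file;
};

static void sketch_pool_create(struct sketch_pci_dev *dev)
{
    if (dev->npools++ == 0)
        dev->pools_file = true;  /* first pool: create "pools" */
}

static void sketch_pool_destroy(struct sketch_pci_dev *dev)
{
    if (--dev->npools == 0)
        dev->pools_file = false; /* last pool gone: remove "pools" */
}
```

A real implementation would also need to serialise these transitions against concurrent pool creation and destruction; the sketch leaves locking out.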

1e31bbe1