Commits · e386771cbbc2a90b46df2ace02214f94cc23cb50 · Kirill Smelkov / linux

21 Dec, 2002 16 commits

[PATCH] Give kswapd writeback higher priority than pdflush · e386771c

Andrew Morton authored Dec 21, 2002

The `low latency page reclaim' design works by preventing page
allocators from blocking on request queues (and by preventing them from
blocking against writeback of individual pages, but that is immaterial
here).

This causes problems in some situations.  pdflush (or a write(2)
caller) could be saturating the queue with highmem pages.  This
prevents anyone from writing back ZONE_NORMAL pages.  We end up doing
enormous amounts of scanning.

A test case is to mmap(MAP_SHARED) almost all of a 4G machine's memory,
then kill the mmapping applications.  The machine instantly goes from
0% of memory dirty to 95% or more.  pdflush kicks in and starts writing
the least-recently-dirtied pages, which are all highmem.  The queue is
congested so nobody will write back ZONE_NORMAL pages.  kswapd chews
50% of the CPU scanning past dirty ZONE_NORMAL pages and page reclaim
efficiency (pages_reclaimed/pages_scanned) falls to 2%.

So this patch changes the policy for kswapd.  kswapd may use all of a
request queue, and is prepared to block on request queues.

What will now happen in the above scenario is:

1: The page allocator scans some pages, fails to reclaim enough
   memory and takes a nap in blk_congestion_wait().

2: kswapd() will scan the ZONE_NORMAL LRU and will start writing
   back pages.  (These pages will be rotated to the tail of the
   inactive list at IO-completion interrupt time).

   This writeback will saturate the queue with ZONE_NORMAL pages.
   Conveniently, pdflush will avoid the congested queues.  So we end up
   writing the correct pages.
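The new policy can be modelled in a few lines. This is a minimal user-space sketch, not the vmscan.c code: `struct model_queue`, `enum writer`, `may_submit()` and the threshold values are hypothetical names chosen for illustration. Allocators and pdflush back off once a queue is congested, while kswapd may fill the queue to capacity and only has to sleep (here: gets `false`) when no request slot is left.

```c
#include <stdbool.h>

/* Hypothetical model of a block device request queue. */
struct model_queue {
	int nr_requests;   /* requests currently queued */
	int max_requests;  /* queue capacity */
	int congested_at;  /* "congested" threshold, below capacity */
};

/* Hypothetical writer classes taken from the commit message. */
enum writer { WRITER_ALLOCATOR, WRITER_PDFLUSH, WRITER_KSWAPD };

static bool queue_congested(const struct model_queue *q)
{
	return q->nr_requests >= q->congested_at;
}

/*
 * Policy sketch: page allocators and pdflush avoid congested queues,
 * while kswapd may use the whole queue.  For kswapd, false means it
 * would block waiting for a free request rather than skip the page.
 */
static bool may_submit(enum writer w, const struct model_queue *q)
{
	if (w == WRITER_KSWAPD)
		return q->nr_requests < q->max_requests;
	return !queue_congested(q);
}
```

In the scenario above, once pdflush has pushed the queue past the congestion threshold, only kswapd can still add (ZONE_NORMAL) writeback to it.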

In this test, kswapd CPU utilisation falls from 50% to 2%, page reclaim
efficiency rises from 2% to 40% and things are generally a lot happier.


The downside is that kswapd may now do a lot less page reclaim,
increasing page allocation latency, causing more direct reclaim,
increasing lock contention in the VM, etc.  But I have not been able to
demonstrate that in testing.


The other problem is that there is only one kswapd, and there are lots
of disks.  That is a generic problem - without being able to co-opt
user processes we don't have enough threads to keep lots of disks saturated.

One fix for this would be to add an additional "really congested"
threshold in the request queues, so kswapd can still perform
nonblocking writeout.  This gives kswapd priority over pdflush while
allowing kswapd to feed many disk queues.  I doubt that this will be
needed.
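The "really congested" idea could look roughly like this. It is a sketch of the proposal only (the commit does not implement it), and `really_congested_at`, `may_submit_nonblocking()` and `kswapd_feedable()` are hypothetical names. kswapd gets a higher, nonblocking threshold, so it skips a queue only when that queue is really full and moves on to the next disk instead of sleeping on one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue model with two thresholds. */
struct model_queue {
	int nr_requests;
	int congested_at;        /* pdflush and allocators back off here */
	int really_congested_at; /* higher threshold reserved for kswapd */
};

/* Nonblocking check: kswapd is allowed further into the queue. */
static bool may_submit_nonblocking(bool is_kswapd, const struct model_queue *q)
{
	int limit = is_kswapd ? q->really_congested_at : q->congested_at;

	return q->nr_requests < limit;
}

/* Count the queues kswapd could feed in one pass without ever blocking. */
static size_t kswapd_feedable(const struct model_queue *qs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (may_submit_nonblocking(true, &qs[i]))
			count++;
	return count;
}
```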

e386771c

[PATCH] Remove PF_NOWARN · 833cb2a6

Andrew Morton authored Dec 21, 2002

We keep getting in a mess with the current->flags setting and
unsetting.

Remove current->flags:PF_NOWARN and create __GFP_NOWARN instead.
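The difference between the two styles is where the "don't warn" state lives. Below is a minimal user-space sketch with a hypothetical `model_alloc()` and a made-up flag value (the kernel's real `__GFP_NOWARN` bit differs): the decision is carried in the per-call gfp mask, so there is no task-wide flag that a caller has to remember to clear again.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag value; the kernel's real bit assignment differs. */
#define MODEL_GFP_NOWARN 0x1u

static unsigned int warnings_printed;

/*
 * Old style (sketch):  current->flags |= PF_NOWARN;
 *                      p = alloc(...);
 *                      current->flags &= ~PF_NOWARN;   <- easy to get wrong
 * New style: the caller passes the NOWARN bit with this one request.
 */
static void *model_alloc(size_t size, unsigned int gfp_mask, int simulate_failure)
{
	void *p = simulate_failure ? NULL : malloc(size);

	if (!p && !(gfp_mask & MODEL_GFP_NOWARN)) {
		warnings_printed++;
		fprintf(stderr, "allocation failure: %zu bytes\n", size);
	}
	return p;
}
```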

833cb2a6

[PATCH] misc fixes · 72c36b7d

Andrew Morton authored Dec 21, 2002

- A C99 initialiser in drivers/char/mem.c

- Remove unneeded deref in madvise_willneed()
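For reference, a C99 designated initialiser names each member explicitly, replacing the older GNU `member: value` syntax. The struct below is a hypothetical, cut-down stand-in for a file_operations table, not the actual contents of drivers/char/mem.c; members that are not named are zero-initialised, i.e. NULL for function pointers.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct file_operations. */
struct model_fops {
	int (*open)(int minor);
	long (*read)(char *buf, size_t len);
	long (*write)(const char *buf, size_t len);
};

static int model_open(int minor)
{
	return minor >= 0 ? 0 : -1;
}

static long model_write_null(const char *buf, size_t len)
{
	(void)buf;
	return (long)len; /* like /dev/null: accept and discard everything */
}

/* Old GNU style would be:  open: model_open,  write: model_write_null,
 * C99 style names members with a leading dot; .read stays NULL. */
static const struct model_fops model_null_fops = {
	.open  = model_open,
	.write = model_write_null,
};
```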

72c36b7d

[PATCH] Add generic_file_readonly_mmap() for nommu · 503c99ef
Andrew Morton authored Dec 21, 2002
```
Add a generic_file_readonly_mmap() for !CONFIG_MMU.
```
503c99ef

[PATCH] more informative slab poisoning · 4f781c84

Andrew Morton authored Dec 21, 2002

slab poisons objects with 0x5a both when they are constructed and when
they are freed.  So it is not possible to tell whether a deref of
0x5a5a5a5a was a use-before-initialisation bug or a use-after-free bug.

The patch changes it so that

1) A deref of 0x5a5a5a5a means use-of-uninitialised-memory

2) A deref of 0x6b6b6b6b means use-of-freed-memory.
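The scheme can be illustrated with a small user-space model. `POISON_UNINIT`, `POISON_FREED`, `poison_object()` and `classify_bad_pointer()` are hypothetical names; only the two byte patterns come from the commit message. Because each pattern is a single repeated byte, the resulting 32-bit word is the same on either endianness.

```c
#include <stdint.h>
#include <string.h>

/* Byte patterns from the commit message; the macro names are hypothetical. */
#define POISON_UNINIT 0x5a  /* written when an object is constructed */
#define POISON_FREED  0x6b  /* written when an object is freed */

/* Fill an object with a poison pattern, as slab debugging would. */
static void poison_object(void *obj, size_t size, unsigned char pattern)
{
	memset(obj, pattern, size);
}

/* Interpret a bad pointer value seen in an oops. */
static const char *classify_bad_pointer(uint32_t value)
{
	if (value == 0x5a5a5a5aU)
		return "use of uninitialised memory";
	if (value == 0x6b6b6b6bU)
		return "use of freed memory";
	return "unknown";
}
```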

4f781c84

[PATCH] fix use-after-free bug in move_vma() · 5446f21e

Andrew Morton authored Dec 21, 2002

move_vma() calls do_munmap() and then uses the memory at *new_vma.

But when starting X11 it just happens that the memory which do_munmap
unmapped had the same start address as the range at *new_vma.  So new_vma
is freed by do_munmap().

This was never noticed before because (vm_flags & VM_LOCKED) evaluates
false when vm_flags is 0x5a5a5a5a.  But I just changed that to 0x6b6b6b6b
and boom - we call make_pages_present() with start == end == 0x6b6b6b6b and
it goes BUG.

So I think the right fix here is for move_vma() to not inspect the values
of any vma's after it has called do_munmap().

The patch does that, for `new_vma'.

The local variable `vma' is also being used after the call to do_munmap(),
and this may also be a bug.  Proving that this is not so, and adding a
comment to explain why, is hereby added to Hugh's todo list ;)
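The general rule ("don't inspect a vma after do_munmap()") amounts to copying whatever is still needed before making the call that may free the object. A minimal sketch with hypothetical names (`struct model_vma`, `MODEL_VM_LOCKED`, `locked_bytes_after_move()`); it is not the actual move_vma() code.

```c
#include <stdlib.h>

/* Hypothetical, cut-down VMA. */
struct model_vma {
	unsigned long start, end, flags;
};
#define MODEL_VM_LOCKED 0x2000UL

/* Stand-in for do_munmap() freeing the very VMA the caller still holds. */
static void free_unmap(struct model_vma *vma)
{
	free(vma);
}

/* Returns how many bytes would need make_pages_present()-style faulting. */
static unsigned long locked_bytes_after_move(struct model_vma *new_vma,
					     void (*unmap)(struct model_vma *))
{
	/* Snapshot everything needed BEFORE the call that may free new_vma. */
	unsigned long flags = new_vma->flags;
	unsigned long start = new_vma->start;
	unsigned long end = new_vma->end;

	unmap(new_vma); /* new_vma may be dangling from here on */

	/* Only the local copies are used after the unmap. */
	if (flags & MODEL_VM_LOCKED)
		return end - start;
	return 0;
}
```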

5446f21e

[PATCH] fix a page dirtying race in vmscan.c · 985babe8

Andrew Morton authored Dec 21, 2002

There's a small window in which another CPU could dirty the page after
we've cleaned it, and before we've moved it to mapping->dirty_pages.
The end result is a dirty page on mapping->locked_pages, which is
wrong.

So take mapping->page_lock before clearing the dirty bit.
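The fix makes "clear the dirty bit" and "move the page to the locked list" one critical section under the same lock that the dirtying side takes. A user-space model using a C11 `atomic_flag` spinlock; `model_spinlock`, `struct model_page` and the list enum are hypothetical stand-ins for mapping->page_lock and the mapping's page lists.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal spinlock standing in for mapping->page_lock. */
typedef struct {
	atomic_flag held;
} model_spinlock;

static void model_spin_lock(model_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the holder releases it */
}

static void model_spin_unlock(model_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

enum model_list { ON_DIRTY_PAGES, ON_LOCKED_PAGES };

struct model_page {
	model_spinlock *page_lock;
	bool dirty;
	enum model_list list;
};

/* Fixed ordering: take the lock BEFORE clearing the dirty bit, so no
 * other CPU can redirty the page between the two steps. */
static void model_start_writeback(struct model_page *pg)
{
	model_spin_lock(pg->page_lock);
	pg->dirty = false;
	pg->list = ON_LOCKED_PAGES;
	model_spin_unlock(pg->page_lock);
}

/* Dirtying side: also under the lock, and moves the page back. */
static void model_redirty(struct model_page *pg)
{
	model_spin_lock(pg->page_lock);
	pg->dirty = true;
	pg->list = ON_DIRTY_PAGES;
	model_spin_unlock(pg->page_lock);
}

/* Invariant restored by the patch: no dirty page on the locked list. */
static bool model_page_consistent(struct model_page *pg)
{
	model_spin_lock(pg->page_lock);
	bool ok = !(pg->dirty && pg->list == ON_LOCKED_PAGES);
	model_spin_unlock(pg->page_lock);
	return ok;
}
```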

985babe8

[PATCH] sync_fs deadlock fix · e101875d

Andrew Morton authored Dec 21, 2002

Running a `mount -o remount' against ext3 deadlocks if there is heavy
write activity. It's a sort of AB/BA deadlock caused by calling
log_wait_commit() under lock_super(). The caller holds lock_super()
and is waiting for a commit, but the commit cannot complete because
lock_super() is also used in the block allocator.

The way we fixed this in the past was to drop the superblock lock inside
ext3. The way this patch fixes it is to arrange for lock_super() to
not be held around the ->sync_fs() call.

Also: sync_filesystems is on the sys_sync() path and is racy wrt
unmount. Check sb->s_root after taking sb->s_umount.
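The lock-ordering problem can be shown with a single-threaded model in which the waiter drives the commit itself. Everything here is hypothetical (`struct model_sb`, `try_run_commit()`, `sync_old()`, `sync_new()`), and the wait is bounded so the model reports the deadlock instead of hanging: a commit that needs the superblock lock can never finish while ->sync_fs() waits for it with that lock held.

```c
#include <stdbool.h>

/* Hypothetical superblock model. */
struct model_sb {
	bool s_lock_held;     /* stands in for lock_super() */
	bool commit_pending;  /* a journal commit is in flight */
};

/* The commit needs the superblock lock (the block allocator takes it). */
static bool try_run_commit(struct model_sb *sb)
{
	if (sb->s_lock_held)
		return false;  /* would block: the commit cannot progress */
	sb->commit_pending = false;
	return true;
}

/* ->sync_fs(): waits for the commit, like log_wait_commit().  The wait
 * is bounded so that the model returns false instead of deadlocking. */
static bool model_sync_fs(struct model_sb *sb)
{
	for (int tries = 0; tries < 3 && sb->commit_pending; tries++)
		try_run_commit(sb);
	return !sb->commit_pending;
}

/* Old ordering: ->sync_fs() called with lock_super() held. */
static bool sync_old(struct model_sb *sb)
{
	sb->s_lock_held = true;
	bool ok = model_sync_fs(sb);
	sb->s_lock_held = false;
	return ok;
}

/* New ordering: the superblock lock is not held around ->sync_fs(). */
static bool sync_new(struct model_sb *sb)
{
	return model_sync_fs(sb);
}
```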

e101875d

Sysenter cleanups (originals by Brian Gerst, updated and expanded by me): · d8ce4c5f

Linus Torvalds authored Dec 21, 2002

 - set up kernel stack pointer for sysenter at each context switch.
 - disable sysenter while in vm86 mode.
 - clean up mtrr number defines and SEP feature testing

d8ce4c5f

Get rid of silly printk's in recent mtrr driver changes. · 5909af06
Linus Torvalds authored Dec 20, 2002

5909af06

[PATCH] PCI: setup-xx fixes · 2ce208e5

Ivan Kokshaysky authored Dec 20, 2002

Don't disable PCI devices before changing the BARs, as discussed
recently. Disabling the PCI_COMMAND_MASTER bit is an obvious bug.

Further, pdev_enable_device() is a leftover from very old (2.0, I guess)
alpha PCI code. It's used in pci_assign_unassigned_resources() to
enable *every* PCI device in the system. So, if we have two graphic
cards on the same bus, both with legacy VGA IO... oops.

Actually, only alpha relied on that due to the lack of
pcibios_enable_device (which has already been fixed).

2ce208e5

[PATCH] new attempt at sys_poll allocation (was: Re: Poll patches..) · 9dd405aa

Manfred Spraul authored Dec 20, 2002

This replaces the dynamically allocated two-level array in sys_poll with
a dynamically allocated linked list. The current implementation causes
at least two alloc/free calls, even if only one or two descriptors are
polled. This reduces that to one alloc/free, and the .text segment is
around 220 bytes shorter. The microbenchmark that polls one pipe fd
takes around 30% fewer cycles [1140 cycles instead of 1604 cycles,
Celeron mobile 1.13 GHz].
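A linked list of variable-length chunks, each carrying its descriptors in a flexible array member, needs only one allocation when few descriptors are polled. This is a user-space sketch of that data structure, not the sys_poll code itself: `struct poll_chunk`, `build_poll_list()`, `free_poll_list()` and the chunk size of 4 are hypothetical choices made for illustration.

```c
#include <poll.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk size; the real patch sizes its chunks differently. */
#define POLL_CHUNK 4

/* One list node holds up to POLL_CHUNK descriptors inline. */
struct poll_chunk {
	struct poll_chunk *next;
	int len;
	struct pollfd entries[];  /* flexible array member */
};

static void free_poll_list(struct poll_chunk *c)
{
	while (c) {
		struct poll_chunk *next = c->next;

		free(c);
		c = next;
	}
}

/* Copy fds into a chunk list, counting allocations in *allocs. */
static struct poll_chunk *build_poll_list(const struct pollfd *fds, int nfds,
					  int *allocs)
{
	struct poll_chunk *head = NULL, **tail = &head;

	*allocs = 0;
	while (nfds > 0) {
		int n = nfds < POLL_CHUNK ? nfds : POLL_CHUNK;
		struct poll_chunk *c =
			malloc(sizeof(*c) + (size_t)n * sizeof(struct pollfd));

		if (!c) {
			free_poll_list(head);
			return NULL;
		}
		(*allocs)++;
		c->next = NULL;
		c->len = n;
		memcpy(c->entries, fds, (size_t)n * sizeof(struct pollfd));
		*tail = c;
		tail = &c->next;
		fds += n;
		nfds -= n;
	}
	return head;
}
```

Polling a single pipe fd produces one node and therefore a single malloc/free pair.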

9dd405aa

Merge bk://linux-dj.bkbits.net/agpgart · 564dede9
Linus Torvalds authored Dec 20, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
564dede9
Merge tetrachloride.(none):/mnt/stuff/kernel/2.5/bk-linus · 01d8392d
Dave Jones authored Dec 21, 2002
```
into tetrachloride.(none):/mnt/stuff/kernel/2.5/agpgart
```
01d8392d
[AGP] Make things compile again if AGP3=n · add4c230
Dave Jones authored Dec 21, 2002

add4c230
Merge http://lia64.bkbits.net/to-linus-2.5 · c6bb6a89
Linus Torvalds authored Dec 20, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
c6bb6a89

20 Dec, 2002 24 commits

[AGP] Make i845g use correct initialisation routine. · 7f8d65c4
Michael Milligan authored Dec 20, 2002

7f8d65c4
ia64: Fix printing of memory attributes. · 0b6e72b3
David Mosberger authored Dec 20, 2002

0b6e72b3
ia64: Finish 2.5.52+ merge. · 51bba81a
David Mosberger authored Dec 20, 2002

51bba81a
Ignore ".ko" files - kernel module objects. · 6521e426
Linus Torvalds authored Dec 20, 2002

6521e426
Make NFS compile even without NFS_V4 support · b063d7d5
Linus Torvalds authored Dec 20, 2002

b063d7d5
Merge bk://lsm.bkbits.net/linus-2.5 · f803e090
Linus Torvalds authored Dec 20, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
f803e090
Merge clashes between the req_offset() and the XDR cleanups · cbfe51cb
Linus Torvalds authored Dec 20, 2002

cbfe51cb

[PATCH] cleanup: simplify req_offset function in NFS client · 0607be17

Chuck Lever authored Dec 20, 2002

Description:
  Everywhere the NFS client uses the req_offset() function today, it adds
  req->wb_offset to the result.  This patch simply makes "+req->wb_offset"
  a part of the req_offset() function.

Test status:
  Passes all Connectathon '02 tests with v2, v3, UDP and TCP.  Passes
  NFS torture tests on an x86 UP highmem system.
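The refactor, sketched with a hypothetical cut-down request struct. The field names wb_index and wb_offset, the page-index-based offset and the 4 KiB page shift are assumptions for illustration; the point is only that the addition every caller used to do moves into the helper.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Hypothetical, cut-down stand-in for struct nfs_page. */
struct model_nfs_page {
	unsigned long wb_index;  /* page index within the file */
	unsigned int wb_offset;  /* byte offset of the request in the page */
};

/* Before: every caller wrote  req_offset_old(req) + req->wb_offset. */
static int64_t req_offset_old(const struct model_nfs_page *req)
{
	return (int64_t)req->wb_index << MODEL_PAGE_SHIFT;
}

/* After: the "+ req->wb_offset" is part of the helper itself. */
static int64_t req_offset(const struct model_nfs_page *req)
{
	return ((int64_t)req->wb_index << MODEL_PAGE_SHIFT) + req->wb_offset;
}
```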

0607be17

[PATCH] give NFS client a "set_page_dirty" address space op. · 756e3174

Chuck Lever authored Dec 20, 2002

Description:
  The default set_page_dirty address space op is too heavyweight for NFS,
  which doesn't use buffers.
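An address space op lets a filesystem override the generic behaviour. The model below is hypothetical (`struct model_aops`, `struct model_page` and both dirtying functions are illustrative names, not the kernel's): the default walks the page's attached buffers, while a filesystem without buffers, such as the NFS client, supplies a hook that just marks the page.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_page;

/* Hypothetical, cut-down address_space_operations with one hook. */
struct model_aops {
	int (*set_page_dirty)(struct model_page *page);
};

struct model_page {
	bool dirty;
	int nr_buffers;       /* buffers attached (0 for NFS pages) */
	int buffers_dirtied;
	const struct model_aops *a_ops;
};

/* Buffer-aware default: also dirties every attached buffer. */
static int default_set_page_dirty(struct model_page *page)
{
	for (int i = 0; i < page->nr_buffers; i++)
		page->buffers_dirtied++;
	page->dirty = true;
	return 1;
}

/* Lightweight variant for a filesystem without buffers. */
static int nobuffer_set_page_dirty(struct model_page *page)
{
	page->dirty = true;
	return 1;
}

/* Generic entry point: use the filesystem's hook if it provides one. */
static int model_set_page_dirty(struct model_page *page)
{
	if (page->a_ops && page->a_ops->set_page_dirty)
		return page->a_ops->set_page_dirty(page);
	return default_set_page_dirty(page);
}

static const struct model_aops model_nfs_aops = {
	.set_page_dirty = nobuffer_set_page_dirty,
};
```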

756e3174

[PATCH] use kmap_atomic instead of kmap in NFS client · 28865d68

Chuck Lever authored Dec 20, 2002

Description:
  Andrew Morton suggested there are places in the NFS client that could
  make use of kmap_atomic instead of vanilla kmap in order to improve
  scalability on 8-way and higher SMP systems.

Test status:
  Passes all Connectathon '02 tests with v2 and v3, UDP and TCP; passes
  NFS torture tests on a UP HIGHMEM x86 system.

28865d68

[PATCH] Reduce redundancy in v850 linker scripts · e7e4d66f

Miles Bader authored Dec 20, 2002

This moves most of the duplicated text in the various v850 platform-
specific linker scripts (each of which was previously completely
standalone) into cpp macros in vmlinux.lds.S, which are then used by the
platform linker scripts as appropriate.  This should make the scripts a
lot easier to maintain.

Also, a number of linker-script bugs are fixed.

e7e4d66f

[PATCH] Pass extra signal handler args correctly on the v850 · fb2fde15

Miles Bader authored Dec 20, 2002

The old code seems completely wrong; I guess it was just left over from
whichever architecture this code was copied from.

fb2fde15

[PATCH] Add some v850 elf constants · 0c907d80

Miles Bader authored Dec 20, 2002

These are used for the new in-kernel module loader (actually not all the
relocation types are used right now, but are included for completeness).

Only the EM_CYGNUS_V850 macro, which is in a global namespace, is added
to <linux/elf.h>; the relocation types, which are private to the v850,
are added to <asm-v850/elf.h>. [Perhaps some other archs can do a
similar split, to reduce the bloat in <linux/elf.h>]

0c907d80

[PATCH] Add v850 support for `sys_restart_syscall' · 285a7c9f
Miles Bader authored Dec 20, 2002

285a7c9f

[PATCH] Make some symbol exports conditional on CONFIG_MMU · 31c9fa59

Miles Bader authored Dec 20, 2002

A few symbols are only defined when CONFIG_MMU=y, but are exported
(by kernel/ksyms.c) unconditionally.  This patch makes them conditional.
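The pattern is simply to wrap the export in the same preprocessor condition as the definition. This is a sketch with a hypothetical export table and made-up symbol names (the commit does not list which symbols were affected); CONFIG_MMU is left undefined here to model a !CONFIG_MMU build such as the v850.

```c
#include <stddef.h>
#include <string.h>

/* CONFIG_MMU would normally come from the kernel config; it is left
 * undefined here to model a !CONFIG_MMU build. */

/* Hypothetical export table standing in for EXPORT_SYMBOL() entries. */
static const char *const exported[] = {
	"always_built_fn",
#ifdef CONFIG_MMU
	"mmu_only_fn",  /* only exported when it is actually compiled */
#endif
};
#define NR_EXPORTED (sizeof(exported) / sizeof(exported[0]))

static int is_exported(const char *name)
{
	for (size_t i = 0; i < NR_EXPORTED; i++)
		if (strcmp(exported[i], name) == 0)
			return 1;
	return 0;
}
```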

31c9fa59

[PATCH] Update v850 includes for slimmed-down sched.h · bacb63a9

Miles Bader authored Dec 20, 2002

Adds extra includes needed because sched.h doesn't include them anymore,
and removes includes of sched.h where they're not really necessary.

bacb63a9

[PATCH] Fix CPU bitmask truncation · 8f309a3f

William Lee Irwin III authored Dec 20, 2002

Fix task->cpus_allowed bitmask truncations on 64-bit architectures.

Originally by Bjorn Helgaas for 2.4.x.
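The usual source of this kind of truncation is doing mask arithmetic in a 32-bit type. A small sketch with hypothetical helper names, assuming an LP64 platform (64-bit unsigned long, 32-bit unsigned int): building the bit in unsigned long keeps CPUs 32 and above, while squeezing the mask through an unsigned int silently drops them.

```c
/* Build a CPU bit in unsigned long; the caller must keep cpu below the
 * number of bits in a long.  (1 << cpu) on a plain int would be
 * undefined for cpu >= 31 and cannot represent CPUs >= 32 at all. */
static unsigned long cpu_bit(unsigned int cpu)
{
	return 1UL << cpu;
}

/* What happens when a 64-bit mask is stored in a 32-bit variable. */
static unsigned int truncated_mask(unsigned long cpus_allowed)
{
	return (unsigned int)cpus_allowed;
}
```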

8f309a3f

Merge master.kernel.org:/home/hch/BK/xfs/linux-2.5 · c2e95c3f
Linus Torvalds authored Dec 20, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
c2e95c3f
[XFS] "merge" the 2.4 fsx fix for block size < page size to 2.5. This needed · b33cc8f7
Russell Cattelan authored Dec 20, 2002
```
major changes to actually fit.

SGI Modid: 2.5.x-xfs:slinx:132210a
```
b33cc8f7
[XFS] Change some %x formats to %p for pointers · 3b1b949f
Eric Sandeen authored Dec 20, 2002
```
SGI Modid: 2.5.x-xfs:slinx:135454a
```
3b1b949f
[XFS] Fix some setxattr compiler warnings (const). · 8920b3cc
Nathan Scott authored Dec 20, 2002
```
SGI Modid: 2.5.x-xfs:slinx:135453a
```
8920b3cc

[XFS] Fix up setting up of sector size for the superblock buffer after the · 82079c70

Nathan Scott authored Dec 20, 2002

very first read on mount.  Make some of the surrounding code dealing
with buffers consistent.

SGI Modid: 2.5.x-xfs:slinx:135452a

82079c70

[XFS] fix an out-of-date comment · 073f32e7
Christoph Hellwig authored Dec 20, 2002
```
SGI Modid: 2.5.x-xfs:slinx:135307a
```
073f32e7
[XFS] remove references to i_dev, it's gone in recent kernels · 84a9a256
Christoph Hellwig authored Dec 20, 2002
```
SGI Modid: 2.5.x-xfs:slinx:135308a
```
84a9a256