Commits · 18020d06801fc477eb1373ed3ace0855d8301c69 · Kirill Smelkov / linux

17 Aug, 2002 3 commits
- ppc64: missing include · 18020d06
  Anton Blanchard authored Aug 17, 2002
  
  18020d06
- Merge samba.org:/scratch/anton/linux-2.5 · 142543df
  Anton Blanchard authored Aug 17, 2002
```
into samba.org:/scratch/anton/linux-2.5_work
```
  142543df
- ppc64: Fix breakage when I added sys_readahead · 4cedf231
  Anton Blanchard authored Aug 17, 2002
  
  4cedf231
16 Aug, 2002 2 commits
- Merge samba.org:/scratch/anton/linux-2.5 · 545d1ac3
  Anton Blanchard authored Aug 17, 2002
```
into samba.org:/scratch/anton/linux-2.5_work
```
  545d1ac3
- Merge samba.org:/scratch/anton/linux-2.5 · ca5c2cf6
  Anton Blanchard authored Aug 16, 2002
```
into samba.org:/scratch/anton/linux-2.5_work
```
  ca5c2cf6
15 Aug, 2002 35 commits

Missed prototype for 'system_running' fix. · c2480c85
Linus Torvalds authored Aug 15, 2002

c2480c85

[PATCH] memory leak in current BK · 2329a4f6

Andrew Morton authored Aug 15, 2002

Well I didn't test that very well.  __page_cache_release() is doing a
__free_page() on a zero-ref page, so __free_pages() sends the refcount
negative and doesn't free it.  With patch #8, page_cache_release()
almost never frees pages, but it must have been leaking a little bit.
Lucky it showed up.

This fixes it, and also adds a missing PageReserved test in put_page().
Which makes put_page() identical to page_cache_release(), but there are
header file woes.  I'll fix that up later.

2329a4f6

[PATCH] Reorder unlocking in rq_unlock · 0016745e

Brad Heilbrun authored Aug 15, 2002

This trivial patch reorders the unlocking in rq_unlock()... I was
tired of getting stack dumps in my messages file.

0016745e

Don't allow user-level helpers to be run when our infrastructure · 0704298b
Linus Torvalds authored Aug 15, 2002
```
isn't ready for it (either during early boot, or at shutdown)
```
0704298b
Merge http://linux-isdn.bkbits.net/linux-2.5.isdn · 41421468
Linus Torvalds authored Aug 15, 2002
```
into home.transmeta.com:/home/torvalds/v2.5/linux
```
41421468
ISDN: Remove debugging code · 0ccca8d5
Kai Germaschewski authored Aug 15, 2002

0ccca8d5
ISDN: Fix BC_BUSY problem · 4f609391
Kai Germaschewski authored Aug 15, 2002
```
Make sure to properly reset the state after disconnect

(Karsten Keil)
```
4f609391
ISDN: Change Christian Mock's email address · 992bbca5
Kai Germaschewski authored Aug 15, 2002

992bbca5

ISDN: __FUNCTION__ cleanup · 3baba482

Kai Germaschewski authored Aug 15, 2002

Newer gcc's don't like string concat with __FUNCTION__, so
use %s and __FUNCTION__ as argument.
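
As an illustration of the pattern described above, here is a minimal sketch in plain user-space C. The helper `format_debug()` and the caller `report_link_down()` are hypothetical names, not taken from the ISDN drivers. The old style, which relied on `__FUNCTION__` behaving like a string literal, appears only in a comment.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Old style, which depended on __FUNCTION__ acting like a string literal
 * that could be pasted next to other literals:
 *
 *     printk(KERN_DEBUG __FUNCTION__ ": link down\n");
 *
 * The replacement passes __FUNCTION__ as an argument to a %s conversion.
 */

/* Hypothetical helper: formats "<function>: <message>\n" into buf. */
static int format_debug(char *buf, size_t len, const char *func, const char *msg)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s: %s\n", func, msg);
}

/* Hypothetical caller showing the "%s" plus __FUNCTION__ pattern. */
static int report_link_down(char *buf, size_t len)
{
	return format_debug(buf, len, __FUNCTION__, "link down");
}
```

`__FUNCTION__` is a GCC extension; C99 code can use the standard `__func__` in the same way.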

3baba482

ISDN: Use C99 initializers · 6f124a96
Kai Germaschewski authored Aug 15, 2002
```
Thanks to Rusty for posting the script...
```
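
For readers unfamiliar with the conversion, a small sketch of what a C99 designated initializer looks like next to the older GNU syntax. The structure `demo_ops` and its members are hypothetical and are not taken from the ISDN code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operations table, for illustration only. */
struct demo_ops {
	int (*open)(void);
	int (*close)(void);
	const char *name;
};

static int demo_open(void)
{
	return 0;
}

/*
 * Old GNU extension syntax, shown as a comment:
 *
 *     static struct demo_ops ops = { open: demo_open, name: "demo" };
 *
 * C99 designated initializers follow. Members that are not named
 * (here: close) are zero-initialized.
 */
static struct demo_ops ops = {
	.open = demo_open,
	.name = "demo",
};

/* Returns nonzero when the mandatory members are filled in. */
static int demo_ops_valid(const struct demo_ops *o)
{
	assert(o != NULL);
	return o->open != NULL && o->name != NULL;
}
```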
6f124a96

ISDN: Fix Config.in problem · 7e7f7ea3

Kai Germaschewski authored Aug 15, 2002

drivers/isdn/hysdn/Config.in was referring to
CONFIG_ISDN_CAPI before it was defined.

Noticed by Greg Banks.

7e7f7ea3

[PATCH] thread management - take three · 496084cb

Ingo Molnar authored Aug 15, 2002

you have applied my independent-pointer patch already, but i think your
CLEARTID variant is the most elegant solution: it reuses a clone argument,
thus reduces the number of arguments and it's also a nice conceptual pair
to the existing SETTID call. And the TID field can be used as a 'usage'
field as well, because the TID (PID) can never be 0, reducing the number
of fields in the TCB. And we can change the userspace locking code to use
the TID field no problem.

496084cb

[PATCH] Include tgid when finding next_safe in get_pid() · eb2e58fd
Paul Larson authored Aug 15, 2002
eb2e58fd

[PATCH] reduce stack usage of sanitize_e820_map · 270ebb5c

Benjamin LaHaise authored Aug 15, 2002

Currently, sanitize_e820_map uses 0x738 bytes of stack.  The patch below
moves the arrays into __initdata, reducing stack usage to 0x34 bytes.
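
The figures above amount to 0x738 - 0x34 = 0x704, that is 1848 - 52 = 1796 bytes of stack saved. The general technique can be sketched in user-space C as follows. All names here (`MAX_ENTRIES`, `struct change_point`, `collect_points()`) are hypothetical, and plain `static` storage stands in for the kernel's `__initdata` annotation, which is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 32	/* hypothetical table size */

struct change_point {
	unsigned long long addr;
	int entry;
};

/*
 * Scratch space with static storage duration instead of a large local
 * array. The function below is therefore not reentrant, which is
 * acceptable for code that runs once during early boot.
 */
static struct change_point scratch_points[2 * MAX_ENTRIES];

/* Records a start and an end point for every non-empty region. */
static size_t collect_points(const unsigned long long *starts,
			     const unsigned long long *sizes, size_t n)
{
	size_t i, used = 0;

	assert(n <= MAX_ENTRIES);
	for (i = 0; i < n; i++) {
		if (sizes[i] == 0)
			continue;
		scratch_points[used].addr = starts[i];
		scratch_points[used++].entry = (int)i;
		scratch_points[used].addr = starts[i] + sizes[i];
		scratch_points[used++].entry = (int)i;
	}
	return used;
}
```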

270ebb5c

[PATCH] uninitialised local in generic_file_write · 7dd294f7

Andrew Morton authored Aug 14, 2002

generic_file_write_nolock() is initialising the pagevec too late,
so if we take an early `goto out' the kernel oopses.  O_DIRECT writes
take that path.
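
The class of bug described above can be sketched in a few lines of user-space C. This is not the kernel code: `struct batch`, `batch_init()`, `batch_flush()` and `write_items()` are hypothetical stand-ins. The point is only that cleanup at the `out` label touches the local, so its initialisation has to come before the first jump.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 16

/* Hypothetical stand-in for a small on-stack batch of work items. */
struct batch {
	unsigned nr;
	int items[BATCH_SIZE];
};

static void batch_init(struct batch *b)
{
	b->nr = 0;
}

/* Empties the batch and reports how many items it held. */
static unsigned batch_flush(struct batch *b)
{
	unsigned n = b->nr;

	b->nr = 0;
	return n;
}

/* Returns the number of items that went through the batch. */
static unsigned write_items(const int *src, size_t count, int direct)
{
	struct batch b;
	unsigned flushed = 0;
	size_t i;

	assert(src != NULL);
	batch_init(&b);		/* must precede the first `goto out' */

	if (direct)
		goto out;	/* early exit still reaches batch_flush() */

	for (i = 0; i < count; i++) {
		b.items[b.nr++] = src[i];
		if (b.nr == BATCH_SIZE)
			flushed += batch_flush(&b);
	}
out:
	flushed += batch_flush(&b);	/* reads b.nr: garbage if uninitialised */
	return flushed;
}
```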

7dd294f7

[PATCH] PCI ID's for 2.5.31 · 75754eb4

Martin Mares authored Aug 14, 2002

I've filtered all submissions to the ID database, merged new ID's from
both 2.4.x and 2.5.x kernels and here is the result -- patch to 2.5.31
pci.ids with all the new stuff. Could you please send it to Linus?
(I would do it myself, but it seems I'll have a lot of work with the
floods in Prague very soon.)

75754eb4

[PATCH] for i386 SETUP CODE · 9cbec887

Keith Mannthey authored Aug 14, 2002

   The following is a simple fix for an array overrun problem in
mpparse.c.  I am working on a multiquad box which has an EISA bus in it
for its service processor.  Its local bus number is 18, which is > 3
(see quad_local_to_mp_bus_id).  When NR_CPUS is close to the real
number of cpus, adding EISA bus #18 to the array stomps all over
various things in memory.  The EISA bus does not need to be mapped
anywhere in the kernel for anything.  This patch will not affect non
clustered apic (multiquad) kernels.

9cbec887

[PATCH] Clean up the RPC socket slot allocation code [2/2] · fb9100d0

Trond Myklebust authored Aug 14, 2002

Patch by Chuck Lever. Remove the timeout logic from call_reserve.
This improves the overall RPC call ordering, and ensures that soft
tasks don't time out and give up before they have attempted to send
their message down the socket.

fb9100d0

[PATCH] Clean up the RPC socket slot allocation code [1/2] · 7a72fa16
Trond Myklebust authored Aug 14, 2002
```
Another patch by Chuck Lever. Fixes up some nasty logic in
call_reserveresult().
```
7a72fa16

[PATCH] cleanup RPC accounting · be6dd3ef

Trond Myklebust authored Aug 14, 2002

The following patch is by Chuck Lever, and fixes an accounting
error in the 'rpc' field in /proc/net/rpc/nfs.

be6dd3ef

[PATCH] Fix typo in the RPC reconnect code... · 0e6a8740
Trond Myklebust authored Aug 14, 2002
```
The following patch fixes a typo that appears both in kernel 2.4.19
and 2.5.31
```
0e6a8740
[PATCH] 2.5.31 reverse spin_lock_irq for i2c-elektor.c · 2e2fa887
Albert Cranford authored Aug 14, 2002
```
Please reverse deadlocking change to i2c-elektor.c
```
2e2fa887

[PATCH] deferred and batched addition of pages to the LRU · 44260240

Andrew Morton authored Aug 14, 2002

The remaining source of page-at-a-time activity against
pagemap_lru_lock is the anonymous pagefault path, which cannot be
changed to operate against multiple pages at a time.

But what we can do is to batch up just its adding of pages to the LRU,
via buffering and deferral.

This patch is based on work from Bill Irwin.

The patch changes lru_cache_add to put the pages into a per-CPU
pagevec.  They are added to the LRU 16-at-a-time.

And in the page reclaim code, purge the local CPU's buffer before
starting.  This is mainly to decrease the chances of pages staying off
the LRU for very long periods: if the machine is under memory pressure,
CPUs will spill their pages onto the LRU promptly.

A consequence of this change is that we can have up to 15*num_cpus
pages which are not on the LRU.  Which could have a slight effect on VM
accuracy, but I find that doubtful.  If the system is under memory
pressure the pages will be added to the LRU promptly, and these pages
are the most-recently-touched ones - the VM isn't very interested in
them anyway.

This optimisation could be made SMP-specific, but I felt it best to
turn it on for UP as well for consistency and better testing coverage.
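
The 15*num_cpus bound follows from the buffer size: a per-CPU buffer is spilled as soon as it holds 16 pages, so between spills it holds at most 15. A toy user-space model of that arithmetic is sketched below. The names (`NR_CPUS_DEMO`, `deferred_add()`, `drain_cpu()`, `off_lru()`) are hypothetical, and plain counters stand in for the real per-CPU pagevecs and the LRU list.

```c
#include <assert.h>

#define NR_CPUS_DEMO 4	/* hypothetical CPU count */
#define BATCH 16	/* pages buffered per CPU before a spill */

static unsigned pending[NR_CPUS_DEMO];	/* pages buffered, not yet on the LRU */
static unsigned long on_lru;		/* pages that have reached the LRU */

/* Buffers one page on the given CPU; spills a full batch to the LRU. */
static void deferred_add(unsigned cpu)
{
	assert(cpu < NR_CPUS_DEMO);
	if (++pending[cpu] == BATCH) {
		on_lru += BATCH;
		pending[cpu] = 0;
	}
}

/* Models the purge done before page reclaim starts on this CPU. */
static void drain_cpu(unsigned cpu)
{
	assert(cpu < NR_CPUS_DEMO);
	on_lru += pending[cpu];
	pending[cpu] = 0;
}

/* Total number of pages currently held back from the LRU. */
static unsigned long off_lru(void)
{
	unsigned long total = 0;
	unsigned cpu;

	for (cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
		total += pending[cpu];
	return total;
}
```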

44260240

[PATCH] pagemap_lru_lock wrapup · eed29d66

Andrew Morton authored Aug 14, 2002

Some fallout from the pagemap_lru_lock changes:

- lru_cache_del() is no longer used.  Kill it.

- page_cache_release() almost never actually frees pages.  So inline
  page_cache_release() and move its rarely-called slow path into (the
  misnamed) mm/swap.c

- update the locking comment in filemap.c.  pagemap_lru_lock used to
  be one of the outermost locks in the VM locking hierarchy.  Now, we
  never take any other locks while holding pagemap_lru_lock.  So it
  doesn't have any relationship with anything.

- put_page() now removes pages from the LRU on the final put.  The
  lock is interrupt safe.

eed29d66

[PATCH] make pagemap_lru_lock irq-safe · aaba9265

Andrew Morton authored Aug 14, 2002

It is expensive for a CPU to take an interrupt while holding the page
LRU lock, because other CPUs will pile up on the lock while the
interrupt runs.

Disabling interrupts while holding the lock reduces contention by an
additional 30% on 4-way.  This is when the only source of interrupts is
disk completion.  The improvement will be higher with more CPUs and it
will be higher if there is networking happening.

The maximum hold time of this lock is 17 microseconds on 500 MHz PIII,
which is well inside the kernel's maximum interrupt latency (which was
100 usecs when I last looked, a year ago).

This optimisation is not needed on uniprocessor, but the patch disables
IRQs while holding pagemap_lru_lock anyway, so it becomes an irq-safe
spinlock, and pages can be moved from the LRU in interrupt context.

pagemap_lru_lock has been renamed to _pagemap_lru_lock to pick up any
missed uses, and to reliably break any out-of-tree patches which may be
using the old semantics.

aaba9265

[PATCH] batched removal of pages from the LRU · 008f707c

Andrew Morton authored Aug 14, 2002

Convert all the bulk callers of lru_cache_del() to use the batched
pagevec_lru_del() function.

Change truncate_complete_page() to not delete the page from the LRU.
Do it in page_cache_release() instead.  (This reintroduces the problem
with final-release-from-interrupt.  That gets fixed further on).

This patch changes the truncate locking somewhat.  The removal from the
LRU now happens _after_ the page has been removed from the
address_space and has been unlocked.  So there is now a window where
the shrink_cache code can discover the to-be-freed page via the LRU
list.  But that's OK - the page is clean, its buffers (if any) are
clean.  It's not attached to any mapping.

008f707c

[PATCH] batched addition of pages to the LRU · 9eb76ee2

Andrew Morton authored Aug 14, 2002

The patch goes through the various places which were calling
lru_cache_add() against bulk pages and batches them up.

Also.  This whole patch series improves the behaviour of the system
under heavy writeback load.  There is a reduction in page allocation
failures, some reduction in loss of interactivity due to page
allocators getting stuck on writeback from the VM.  (This is still bad
though).

I think it's due to the change here in mpage_writepages().  That
function was originally unconditionally refiling written-back pages to
the head of the inactive list.  The theory being that they should be
moved out of the way of page allocators, who would end up waiting on
them.

It appears that this simply had the effect of pushing dirty, unwritten
data closer to the tail of the inactive list, making things worse.

So instead, if the caller is (typically) balance_dirty_pages() then
leave the pages where they are on the LRU.

If the caller is PF_MEMALLOC then the pages *have* to be refiled.  This
is because VM writeback is clustered along mapping->dirty_pages, and
it's almost certain that the pages which are being written are near the
tail of the LRU.  If they were left there, page allocators would block
on them too soon.  It would effectively become a synchronous write.

9eb76ee2

[PATCH] batched movement of lru pages in writeback · 823e0df8
Andrew Morton authored Aug 14, 2002
```
Makes mpage_writepages() move pages around on the LRU sixteen-at-a-time
rather than one-at-a-time.
```
823e0df8

[PATCH] multithread page reclaim · 3aa1dc77

Andrew Morton authored Aug 14, 2002

This patch multithreads the main page reclaim function, shrink_cache().

This function used to run under pagemap_lru_lock.  Instead, we grab
that lock, put 32 pages from the LRU into a private list, drop the
pagemap_lru_lock and then proceed to attempt to free those pages.

Any pages which were successfully reclaimed are batch-freed.  Pages
which were not reclaimed are re-added to the LRU.

This patch reduces pagemap_lru_lock contention on the 4-way by a factor
of thirty.

The shrink_cache() code has been simplified somewhat.

refill_inactive() was being called too often - often just to process
two or three pages.  Fiddled with that so it processes pages at the
same rate, but works on 32 pages at a time.

Added a couple of mark_page_accessed() calls into mm/memory.c from 2.4.
They seem appropriate.

Change the shrink_caches() logic so that it will still trickle through
the active list (via refill_inactive) even if the inactive list is much
larger than the active list.
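
The locking pattern described in this message (take the lock, move a batch to a private list, drop the lock, do the expensive work, then re-add the leftovers) can be sketched as below. This is a simplified user-space model, not the kernel's shrink_cache(): `struct lru`, `lru_lock()`, `lru_unlock()` and `shrink_batch()` are hypothetical, the "lock" only counts acquisitions, and a negative page id stands for a page that cannot be reclaimed.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 32	/* pages moved to the private list per pass */
#define LRU_MAX 256

struct lru {
	int pages[LRU_MAX];	/* page ids; negative means "cannot reclaim" */
	size_t count;
	unsigned lock_taken;	/* stand-in for a spinlock: counts acquisitions */
};

static void lru_lock(struct lru *l)
{
	l->lock_taken++;
}

static void lru_unlock(struct lru *l)
{
	(void)l;
}

/* One reclaim pass; returns the number of pages freed. */
static size_t shrink_batch(struct lru *l)
{
	int batch[BATCH];
	size_t n = 0, kept = 0, freed = 0, i;

	assert(l != NULL && l->count <= LRU_MAX);

	/* Short critical section: only detach pages from the shared list. */
	lru_lock(l);
	while (n < BATCH && l->count > 0)
		batch[n++] = l->pages[--l->count];
	lru_unlock(l);

	/* Expensive reclaim work happens without the lock held. */
	for (i = 0; i < n; i++) {
		if (batch[i] >= 0)
			freed++;
		else
			batch[kept++] = batch[i];
	}

	/* Re-add the pages which could not be reclaimed, again in one go. */
	lru_lock(l);
	for (i = 0; i < kept; i++)
		l->pages[l->count++] = batch[i];
	lru_unlock(l);

	return freed;
}
```

In this model the lock is taken twice per 32 pages regardless of how long the reclaim work in the middle takes.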

3aa1dc77

[PATCH] pagevec infrastructure · 6a952840

Andrew Morton authored Aug 14, 2002

This is the first patch in a series of eight which address
pagemap_lru_lock contention, and which simplify the VM locking
hierarchy.

Most testing has been done with all eight patches applied, so it would
be best not to cherrypick, please.

The workload which was optimised was: 4x500MHz PIII CPUs, mem=512m, six
disks, six filesystems, six processes each flat-out writing a large
file onto one of the disks.  ie: heavy page replacement load.

The frequency with which pagemap_lru_lock is taken is reduced by 90%.

Lockmeter claims that pagemap_lru_lock contention on the 4-way has been
reduced by 98%.  Total amount of system time lost to lock spinning went
from 2.5% to 0.85%.

Anton ran a similar test on 8-way PPC, the reduction in system time was
around 25%, and the reduction in time spent playing with
pagemap_lru_lock was 80%.

	http://samba.org/~anton/linux/2.5.30/standard/
versus
	http://samba.org/~anton/linux/2.5.30/akpm/

Throughput changes on uniprocessor are modest: a 1% speedup with this
workload due to shortened code paths and improved cache locality.

The patches do two main things:

1: In almost all places where the kernel was doing something with
   lots of pages one-at-a-time, convert the code to do the same thing
   sixteen-pages-at-a-time.  Take the lock once rather than sixteen
   times.  Take the lock for the minimum possible time.

2: Multithread the pagecache reclaim function: don't hold
   pagemap_lru_lock while reclaiming pagecache pages.  That function
   was massively expensive.

One fallout from this work is that we never take any other locks while
holding pagemap_lru_lock.  So this lock conceptually disappears from
the VM locking hierarchy.


So.  This is all basically a code tweak to improve kernel scalability.
It does it by optimising the existing design, rather than by redesign.
There is little conceptual change to how the VM works.

This is as far as I can tweak it.  It seems that the results are now
acceptable on SMP.  But things are still bad on NUMA.  It is expected
that the per-zone LRU and per-zone LRU lock patches will fix NUMA as
well, but that has yet to be tested.


This first patch introduces `struct pagevec', which is the basic unit
of batched work.  It is simply:

struct pagevec {
	unsigned nr;
	struct page *pages[16];
};

pagevecs are used in the following patches to get the VM away from
page-at-a-time operations.

This patch includes all the pagevec library functions which are used in
later patches.
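
To show how such a structure turns page-at-a-time work into batched work, here is a minimal user-space sketch. Only the struct layout comes from the message above; the helpers `pvec_init()`, `pvec_add()`, `pvec_flush()` and `process_pages()` are hypothetical and are not the library functions this patch adds. A flush stands for one lock acquisition that handles up to sixteen pages.

```c
#include <assert.h>
#include <stddef.h>

struct page;	/* opaque in this sketch */

#define PAGEVEC_SIZE 16

struct pagevec {
	unsigned nr;
	struct page *pages[PAGEVEC_SIZE];
};

static void pvec_init(struct pagevec *pvec)
{
	pvec->nr = 0;
}

/* Adds a page; returns the space left (0 means the caller should flush). */
static unsigned pvec_add(struct pagevec *pvec, struct page *page)
{
	assert(pvec->nr < PAGEVEC_SIZE);
	pvec->pages[pvec->nr++] = page;
	return PAGEVEC_SIZE - pvec->nr;
}

/* Models "take the lock once, handle the whole batch"; returns 1 if it ran. */
static unsigned pvec_flush(struct pagevec *pvec)
{
	if (pvec->nr == 0)
		return 0;
	/* lock; operate on pvec->pages[0..nr-1]; unlock */
	pvec->nr = 0;
	return 1;
}

/* Returns how many batched (locked) operations were needed for n pages. */
static unsigned process_pages(struct page **pages, size_t n)
{
	struct pagevec pvec;
	unsigned batches = 0;
	size_t i;

	pvec_init(&pvec);
	for (i = 0; i < n; i++) {
		if (pvec_add(&pvec, pages[i]) == 0)
			batches += pvec_flush(&pvec);
	}
	batches += pvec_flush(&pvec);	/* leftover partial batch */
	return batches;
}
```

With this model, 40 pages need three locked operations (16 + 16 + 8) instead of 40.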

6a952840

[PATCH] lockd shouldn't call posix_unblock_lock here · ecc9d325

Matthew Wilcox authored Aug 14, 2002

nlmsvc_notify_blocked() is only called via the fl_notify() pointer which
is only called immediately after we already did a locks_delete_block(),
so calling posix_unblock_lock() here is always a NOP.

ecc9d325

[PATCH] Modular x86 MTRR driver. · 6a85ced0

Dave Jones authored Aug 14, 2002

This patch from Pat Mochel cleans up the hell that was mtrr.c
into something a lot more modular and easy to understand, by
doing the implementation-per-file as has been done to various
other things by Pat and myself over the last months.

It's functionally identical from a kernel internal point of view,
and a userspace point of view, and is basically just a very large
code clean up.

6a85ced0

[PATCH] stale thread detach debugging removal · 3b307fd5

Ingo Molnar authored Aug 14, 2002

one of the debugging tests triggered a false-positive BUG() when a
detached thread was straced.

3b307fd5

[PATCH] thread release infrastructure · d2b7244f

Ingo Molnar authored Aug 14, 2002

it is much cleaner to pass in the address of the user-space VM lock -
this will also enable arbitrary implementations of the stack-unlock, as
the fifth clone() parameter.

d2b7244f

[PATCH] init_tasks is not defined anywhere. · 86ae817e

Rusty Russell authored Aug 14, 2002

It's referenced by mips and mips64 (both far out of date), but never
actually defined anywhere.

86ae817e