- 16 Sep, 2002 12 commits
-
-
Jens Axboe authored
elevator_linus is seriously broken wrt accounting. Marcelo recently took the patch to fix it in 2.4.20-pre; here's the 2.5 equiv.

Right now, we account merges as costly and seeks as not. The only thing that prevents seek starvation is the aging scan, and that is broken, very much so. This patch fixes that to account merges and inserts differently. A seek is ELV_LINUS_SEEK_COST times more costly than a merge; currently that define is at '16'. Doing the math on a disk, this sort of makes sense. Defaults are a read latency of 1024, which means 1024 merges or 64 seeks. Writes are double that.
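The math above, as a compilable toy (ELV_LINUS_SEEK_COST is the define named in the commit; the two latency macros are stand-ins for the described defaults):

    #include <stdio.h>

    #define ELV_LINUS_SEEK_COST 16                  /* one seek costs 16 merges */
    #define READ_LATENCY        1024                /* default budget, in merge units */
    #define WRITE_LATENCY       (2 * READ_LATENCY)  /* writes are double */

    int main(void)
    {
        printf("read budget:  %d merges or %d seeks\n",
               READ_LATENCY, READ_LATENCY / ELV_LINUS_SEEK_COST);
        printf("write budget: %d merges or %d seeks\n",
               WRITE_LATENCY, WRITE_LATENCY / ELV_LINUS_SEEK_COST);
        return 0;
    }

which prints the 1024-merges-or-64-seeks read budget described above.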
-
Jens Axboe authored
We are currently wasting ~2MiB on the bio pools. This is ok on systems with plenty of ram, but it's too much for a 16mb system, for instance. This patch scales the bio_vec mempool sizes a bit. The logic is mainly:

+	megabytes = nr_free_pages() >> (20 - PAGE_SHIFT);
+	if (megabytes <= 16)
+		scale = 0;
+	else if (megabytes <= 32)
+		scale = 1;
+	else if (megabytes <= 64)
+		scale = 2;
+	else if (megabytes <= 96)
+		scale = 3;
+	else if (megabytes <= 128)
+		scale = 4;

and then for mempool setup:

+	if (i >= scale)
+		pool_entries >>= 1;
+
+	bp->pool = mempool_create(pool_entries, slab_pool_alloc, slab_pool_free, bp->slab);

So we allocate fewer and fewer entries for the bigger sized pools. It doesn't make too much sense to fill the memory with sg tables for 256 page entries on a 16mb system. In addition, we select a starting nr_pool_entries point based on the amount of ram as well:

+	pool_entries = megabytes * 2;
+	if (pool_entries > 256)
+		pool_entries = 256;

The end result is that on a 128mb system, it looks like:

BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 244 entries (12 bytes)
biovec pool[1]: 4 bvecs: 244 entries (48 bytes)
biovec pool[2]: 16 bvecs: 244 entries (192 bytes)
biovec pool[3]: 64 bvecs: 244 entries (768 bytes)
biovec pool[4]: 128 bvecs: 122 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 61 entries (3072 bytes)

ie a total of ~620KiB used. Booting with mem=32m gives us:

BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 56 entries (12 bytes)
biovec pool[1]: 4 bvecs: 28 entries (48 bytes)
biovec pool[2]: 16 bvecs: 14 entries (192 bytes)
biovec pool[3]: 64 bvecs: 7 entries (768 bytes)
biovec pool[4]: 128 bvecs: 3 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 1 entries (3072 bytes)

ie a total of ~31KiB. Booting with 512mb makes it:

BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 256 entries (12 bytes)
biovec pool[1]: 4 bvecs: 256 entries (48 bytes)
biovec pool[2]: 16 bvecs: 256 entries (192 bytes)
biovec pool[3]: 64 bvecs: 256 entries (768 bytes)
biovec pool[4]: 128 bvecs: 256 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 256 entries (3072 bytes)

which is the same as before. The cut-off point is somewhere a bit over 256mb. Andrew suggested we may want to 'cheat' a bit here and leave the busy pools alone. We know that mpage is going to be heavy on the 16-entry pool, so it might make sense to make such a pool and not scale that. We can deal with that later, though.
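As a worked check of the scaling: assuming roughly 28 free megabytes at boot on the mem=32m system (the exact figure depends on early reservations), pool_entries starts at 28 * 2 = 56 and scale = 1, so every pool from index 1 on halves the count. This toy reproduces the mem=32m figures shown above:

    #include <stdio.h>

    int main(void)
    {
        int megabytes = 28;                 /* assumed free mb at boot */
        int scale = 1;                      /* 16 < megabytes <= 32 */
        int pool_entries = megabytes * 2;   /* 56, below the 256 cap */

        for (int i = 0; i < 6; i++) {
            if (i >= scale)
                pool_entries >>= 1;
            printf("biovec pool[%d]: %d entries\n", i, pool_entries);
        }
        return 0;
    }

which prints 56, 28, 14, 7, 3, 1 - matching the boot output.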
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
old pdc202xx.c
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
Jens Axboe authored
-
- 15 Sep, 2002 14 commits
-
-
David Gibson authored
Linus, please apply. This defines wait_task_inactive() to be a no-op on UP machines and removes the #ifdef CONFIG_SMP guards around its current call sites. It also fixes the UP compile, which was broken by a call to wait_task_inactive() added in fs/exec.c without an #ifdef.
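A minimal compilable sketch of the pattern (the function name is from the commit; the scaffolding around it is invented):

    #include <stdio.h>

    #ifdef CONFIG_SMP
    void wait_task_inactive(struct task_struct *p);
    #else
    #define wait_task_inactive(p)   do { } while (0)   /* UP: no-op */
    #endif

    int main(void)
    {
        /* Callers no longer need their own #ifdef CONFIG_SMP guards:
         * on UP the call compiles away entirely. */
        wait_task_inactive(NULL);
        printf("UP build: wait_task_inactive() compiled to nothing\n");
        return 0;
    }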
-
Andrew Morton authored
- Remove defunct active_list/inactive_list declarations (wli)

- Update an obsolete comment (wli)

- "mm/slab.c contains one leftover from the initial version with 'unsigned short' bufctl entries. The attached patch replaces '2' with the correct sizeof [which is now 4]" - Manfred Spraul

- BUG checks for vfree/vunmap being called in interrupt context (because they take irq-unsafe spinlocks, I guess?) - davej

- Simplify some coding in one_highpage_init() (Christoph Hellwig).
-
Andrew Morton authored
From Christoph Hellwig, also present in 2.4.

Create an arch-independent `dump_stack()' function. So we don't need to do

#ifdef CONFIG_X86
	show_stack(0);		/* No prototype in scope! */
#endif

any more. The whole dump_stack() implementation is delegated to the architecture. If it doesn't provide one, there is a default do-nothing library function.
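A user-space analogue of that arrangement (illustrative only: the kernel gets the override-or-default effect from library link semantics; a weak symbol is used here to the same end):

    #include <stdio.h>

    /* Default do-nothing dump_stack(); an "architecture" may override
     * it with a real backtrace implementation. */
    void __attribute__((weak)) dump_stack(void)
    {
        /* no backtrace support: do nothing */
    }

    int main(void)
    {
        /* Callers need no arch #ifdef guards any more. */
        dump_stack();
        printf("dump_stack() is always safe to call\n");
        return 0;
    }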
-
Andrew Morton authored
- Remove the temp /proc/meminfo stats

- Make the mmu_gather_t be 2048 bytes again

- Remove an unused variable (Oleg Nesterov)
-
Andrew Morton authored
If a GFP_NOFS allocation is made when the ZONE_NORMAL inactive list is full of dirty or under-writeback pages, there is nothing the caller can do to force some page reclaim. The caller ends up getting oom-killed.

- In mempool_alloc(), don't try to perform page reclaim again. Just go to sleep and wait for some elements to be returned to the pool.

- In try_to_free_pages(): perform a single, short scan of the LRU and if that doesn't work, fail the allocation. GFP_NOFS allocators know how to handle that.
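A user-space sketch of the mempool_alloc() behavior described above (all names here are invented and malloc() stands in for the page allocator; the real code lives in mm/mempool.c):

    #include <pthread.h>
    #include <stdlib.h>

    struct pool {
        pthread_mutex_t lock;
        pthread_cond_t  returned;
        void           *elements[64];
        int             curr_nr;
    };

    void *pool_alloc(struct pool *p)
    {
        void *elem = malloc(128);   /* stands in for the real allocator */
        if (elem)
            return elem;

        pthread_mutex_lock(&p->lock);
        while (p->curr_nr == 0)     /* no reclaim retry: just sleep */
            pthread_cond_wait(&p->returned, &p->lock);
        elem = p->elements[--p->curr_nr];
        pthread_mutex_unlock(&p->lock);
        return elem;
    }

    void pool_free(struct pool *p, void *elem)
    {
        pthread_mutex_lock(&p->lock);
        if (p->curr_nr < 64)
            p->elements[p->curr_nr++] = elem;   /* give it back */
        else
            free(elem);                         /* pool full */
        pthread_cond_signal(&p->returned);      /* wake one allocator */
        pthread_mutex_unlock(&p->lock);
    }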
-
Andrew Morton authored
read_pages() is dropping the page refcount before running ->readpage(). Which just happens to work, because the page is in pagecache and locked. But it breaks under some unconventional things which reiser4 is doing, and it's better/safer/saner this way anyway.
-
Andrew Morton authored
Patch from Jani Monoses <jani@iv.ro> "This converts the remaining parts of ext3 to EXT3_SB and turns the latter from a macro into an inline function which returns the generic_sbp field of u. linux/fs.h is not touched by this patch, though. Intermezzo's three uses of ext3_sb are also not changed."
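A minimal sketch of that conversion, with stand-in struct layouts (the 'before' macro body is assumed, not quoted from the patch; the real definitions live in ext3's headers and linux/fs.h):

    struct ext3_sb_info {
        unsigned long s_mount_opt;
        /* ... */
    };

    struct super_block {
        union {
            void *generic_sbp;  /* generic pointer for fs-private data */
        } u;
    };

    /* Before: a macro, e.g. #define EXT3_SB(sb) (&((sb)->u.ext3_sb))
     * After: a type-checked inline function returning u.generic_sbp. */
    static inline struct ext3_sb_info *EXT3_SB(struct super_block *sb)
    {
        return (struct ext3_sb_info *)sb->u.generic_sbp;
    }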
-
Andrew Morton authored
The patch adds a "Mapped" field to /proc/meminfo - the amount of memory which is mapped into pagetables. This is a useful statistic to monitor when testing and observing the virtual memory system.
-
Andrew Morton authored
From Hugh Dickins. Fix a leak in the /proc/meminfo:ReverseMaps accounting.
-
Andrew Morton authored
Rohit Seth's ia32 huge tlb pages patch. Anton Blanchard took a look at this today; he seemed happy with it and said he could borrow bits.
-
Andrew Morton authored
The /proc/meminfo:Buffers statistic is quite useful - it tells us how effective we are being at caching filesystem metadata. For example, increases in this figure are a measure of success of the slablru and buffer_head-limitation patches. The patch resurrects buffermem accounting. The metric is calculated on-demand, via a walk of the blockdev hashtable.
-
Andrew Morton authored
zap_page_range and truncate are the two main latency problems in the VM/VFS. The radix-tree-based truncate grinds that into the dust, but no algorithmic fixes for pagetable takedown have presented themselves...

Patch from Robert Love:

Attached patch implements a low latency version of "zap_page_range()". Calls with even moderately large page ranges result in very long lock held times and consequently very long periods of non-preemptibility. This function is in my list of the top 3 worst offenders. It is gross.

This new version reimplements zap_page_range() as a loop over ZAP_BLOCK_SIZE chunks. After each iteration, if a reschedule is pending, we drop page_table_lock and automagically preempt. Note we cannot blindly drop the locks and reschedule (e.g. for the non-preempt case) since there is a possibility to enter this codepath holding other locks. ... I am sure you are familiar with all this, it's the same deal as your low-latency work.

This patch implements the "cond_resched_lock()" as we discussed some time back. I think this solution should be acceptable to you and Linus. There are other misc. cleanups, too.

This new zap_page_range() yields latency too-low-to-benchmark: <<1ms.
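A user-space sketch of the chunked-loop pattern (only ZAP_BLOCK_SIZE is named in the patch; a mutex and sched_yield() stand in for page_table_lock and cond_resched_lock(), and the rest is invented):

    #include <pthread.h>
    #include <sched.h>

    #define ZAP_BLOCK_SIZE  (256 * 4096)    /* illustrative chunk size */

    static void zap_range_lowlatency(pthread_mutex_t *lock,
                                     unsigned long address, unsigned long size)
    {
        while (size) {
            unsigned long block = size > ZAP_BLOCK_SIZE ?
                                  ZAP_BLOCK_SIZE : size;

            pthread_mutex_lock(lock);
            /* ... tear down mappings in [address, address + block) ... */
            pthread_mutex_unlock(lock);     /* latency break point */
            sched_yield();                  /* let a pending resched run */

            address += block;
            size -= block;
        }
    }

    int main(void)
    {
        pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        zap_range_lowlatency(&lock, 0, 10 * ZAP_BLOCK_SIZE + 123);
        return 0;
    }

The key point is that lock hold time is now bounded by one chunk's work, no matter how large the range.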
-
Linus Torvalds authored
-
Linus Torvalds authored
Merge bk://ppc.bkbits.net/for-linus-ppc
into home.transmeta.com:/home/torvalds/v2.5/linux
-
- 16 Sep, 2002 10 commits
-
-
Paul Mackerras authored
-
Paul Mackerras authored
This gets rid of ide_request/free_irq, ide_get/release_lock, ide_check/request/release_region etc.
-
Paul Mackerras authored
-
Paul Mackerras authored
There is a perfectly good one in drivers/ide/ide-iops.c now.
-
Paul Mackerras authored
and add exit_group to the syscall table.
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
-
Paul Mackerras authored
into samba.org:/home/paulus/kernel/for-linus-ppc
-
- 15 Sep, 2002 4 commits
-
-
Ingo Molnar authored
The broadcast SIGKILL kept pending in the new thread as well, and killed it prematurely ...
-
Linus Torvalds authored
-
Ingo Molnar authored
This implements one of the last missing POSIX threading details - exec() semantics. Previous kernels had code that tried to handle it, but that code had a number of disadvantages:

- it only worked if the exec()-ing thread was the thread group leader, creating an asymmetry. This does not work if the thread group leader has exited already.

- it was racy: it sent a SIGKILL to every thread in the group but did not wait for them to actually process the SIGKILL. It did a yield() but that is not enough. All 'other' threads have to finish processing before we can continue with the exec().

This adds the same logic, but extended with the following enhancements:

- works from non-leader threads just as much as the thread group leader.

- waits for all other threads to exit before continuing with the exec().

- reuses the PID of the group.

It would perhaps be a more generic approach to add a new syscall, sys_ungroup() - which would do largely what de_thread() does in this patch. But it's not really needed now - posix_spawn() is currently implemented via starting a non-CLONE_THREAD helper thread that does a sys_exec(). There's no API currently that needs a direct exec() from a thread - but it could be created (such as pthread_exec_np()). It would have the advantage of not having to go through a helper thread, but the difference is minimal.
-
Ingo Molnar authored
This fixes one more exit-time resource accounting issue - and it's also a speedup and a thread-tree (to-be thread-aware pstree) visual improvement.

In the current code we reparent detached threads to the init thread. This worked but was not very nice in ps output: threads showed up as being related to init. There was also a resource-accounting issue: upon exit they update their parent's (ie. init's) rusage fields - effectively losing these statistics. Eg. 'time' under-reports CPU usage if the threaded app is Ctrl-C-ed prematurely.

The solution is to reparent threads to the group leader - this is now very easy since we have p->group_leader cached and it's also valid all the time. It's also somewhat faster for applications that use CLONE_THREAD but do not use the CLONE_DETACHED feature.
-