Commits · 3dfa303d9839496fc21d3ac47d10ff017dbb1c3a · Kirill Smelkov / linux

10 May, 2004 5 commits

[PATCH] scheduler domain balancing improvements · 3dfa303d

Andrew Morton authored May 09, 2004

From: Nick Piggin <piggin@cyberone.com.au>

This patch gets the sched_domain scheduler working better WRT balancing.
It's been tested on the NUMAQ. Among other things, it changes the way SMT
load calculation works so that it does not do active load balancing when it
shouldn't.

It still has a problem with SMT and NUMA: it will put a task on each
sibling in a node before moving tasks to another node. It should probably
start moving tasks after each *physical* CPU is filled.

To fix, you need "how much CPU power in this domain?" At the moment we
approximate # runqueues == CPU power, and hack around it at the CPU
physical domain by counting all sibling runqueues as 1.

It isn't hard to correctly work the CPU power out, but once CPU hotplug is
in the equation it becomes much more work, because hotplug events have to be
handled. If anyone is actually interested in getting this fixed, that is.

3dfa303d

[PATCH] sched_domain debugging · b45bb339

Andrew Morton authored May 09, 2004

From: Nick Piggin <piggin@cyberone.com.au>

Anton was attempting to make a sched domain topology for his POWER5 and was
having some trouble.

This patch only includes code which is ifdefed out, but hopefully it will
be of some use to implementors.

b45bb339

[PATCH] sched: scheduler domain support · 8c136f71

Andrew Morton authored May 09, 2004

From: Nick Piggin <piggin@cyberone.com.au>

This is the core sched domains patch.  It can handle any number of levels
in a scheduling hierarchy, and allows architectures to easily customize how
the scheduler behaves.  It also provides progressive balancing backoff
needed by SGI on their large systems (although they have not yet tested
it).

It is built on top of (well, uses ideas from) my previous SMP/NUMA work, and
gets results very similar to them when using the default scheduling
description.

Benchmarks
==========

Martin was seeing, I think, 10-20% better system times in kernbench on the
32-way.  I was seeing improvements in dbench, tbench, kernbench, reaim, and
hackbench on a 16-way NUMAQ.  Hackbench in fact had a nonlinear element
which is all but eliminated.  Large improvements in volanomark.

Cross node task migration was decreased in all above benchmarks, sometimes by
a factor of 100!!  Cross CPU migration was also generally decreased.  See
this post:
http://groups.google.com.au/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=a406c910b30cbac4&seekm=UAdQ.3hj.5%40gated-at.bofh.it#link2

Results on a hyperthreading P4 are equivalent to Ingo's shared runqueues
patch (which is a big improvement).

Some examples on the 16-way NUMAQ (this is slightly older sched domain code):

 http://www.kerneltrap.org/~npiggin/w26/hbench.png
 http://www.kerneltrap.org/~npiggin/w26/vmark.html

From: Jes Sorensen <jes@wildopensource.com>

   Tiny patch to make -mm3 compile on a NUMA box with NR_CPUS >
   BITS_PER_LONG.

From: "Martin J. Bligh" <mbligh@aracnet.com>

   Fix a minor nit with the find_busiest_group code.  No functional change,
   but makes the code simpler and clearer.  This patch does two things ... 
   adds some more expansive comments, and removes this if clause:

	if (*imbalance < SCHED_LOAD_SCALE
			&& max_load - this_load > SCHED_LOAD_SCALE)
		*imbalance = SCHED_LOAD_SCALE;

   If we remove the scaling factor, we're basically conditionally doing:

	if (*imbalance < 1)
		*imbalance = 1;

   Which is pointless, as the very next thing we do is to remove the
   scaling factor, rounding up to the nearest integer as we do:

	*imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT;

   Thus the if statement is redundant, and only makes the code harder to
   read ;-)
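
The redundancy argument above can be checked with a small stand-alone sketch. It assumes `SCHED_LOAD_SHIFT` is 7 (so `SCHED_LOAD_SCALE` is 128), which the message does not state, and the helper names are hypothetical; this is not the kernel code itself. It compares the result with and without the removed clause for every nonzero imbalance below the scale, in the case where `max_load - this_load > SCHED_LOAD_SCALE` holds.

```c
#include <assert.h>

/* Assumed values for illustration; the commit message does not give them. */
#define SCHED_LOAD_SHIFT 7
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/* Remove the scaling factor, rounding up to the nearest integer. */
unsigned long round_up_unscaled(unsigned long imbalance)
{
	return (imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT;
}

/* Old behaviour: apply the removed clause first, then unscale. */
unsigned long with_clamp(unsigned long imbalance,
			 unsigned long max_load, unsigned long this_load)
{
	if (imbalance < SCHED_LOAD_SCALE
			&& max_load - this_load > SCHED_LOAD_SCALE)
		imbalance = SCHED_LOAD_SCALE;
	return round_up_unscaled(imbalance);
}

/* New behaviour: unscale directly. */
unsigned long without_clamp(unsigned long imbalance)
{
	return round_up_unscaled(imbalance);
}

/* Every nonzero imbalance below the scale ends up as 1 either way. */
void demo_clamp_is_redundant(void)
{
	unsigned long imbalance;

	for (imbalance = 1; imbalance < SCHED_LOAD_SCALE; imbalance++)
		assert(with_clamp(imbalance, 4 * SCHED_LOAD_SCALE, SCHED_LOAD_SCALE)
		       == without_clamp(imbalance));
}
```

An imbalance of exactly zero is left out of the loop: there the old clause would yield 1, while rounding alone yields 0.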

From: Rick Lindsley <ricklind@us.ibm.com>

   In find_busiest_group(), after we exit the do/while, we select our
   imbalance.  But max_load, avg_load, and this_load are all unsigned, so
   min(x,y) will make a bad choice if max_load < avg_load < this_load (that
   is, a choice between two negative [very large] numbers).

   Unfortunately, there is a bug when max_load never gets changed from zero
   (look in the loop and think what happens if the only load on the machine is
   being created by cpu groups of which we are a member).  And you have a
   recipe for some really bogus values for imbalance.

   Even if you fix the max_load == 0 bug, there will still be times when
   avg_load - this_load will be negative (thus very large) and you'll make the
   decision to move stuff when you shouldn't have.

   This patch allows for this_load to set max_load, which if I understand
   the logic properly is correct.  With this patch applied, the algorithm is
   *much* more conservative ...  maybe *too* conservative but that's for
   another round of testing ...
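
The unsigned wrap-around described above can be reproduced in a few lines. This is a sketch with hypothetical helper names. The guarded variant is only one possible way to avoid the bogus value and is not the fix in this patch, which instead lets this_load set max_load.

```c
#include <assert.h>
#include <limits.h>

/* min() as it behaves for unsigned long operands. */
unsigned long min_ul(unsigned long x, unsigned long y)
{
	return x < y ? x : y;
}

/* Naive selection: both differences are computed in unsigned arithmetic. */
unsigned long naive_imbalance(unsigned long max_load,
			      unsigned long avg_load, unsigned long this_load)
{
	return min_ul(max_load - avg_load, avg_load - this_load);
}

/* Hypothetical guard: a would-be negative difference means "no imbalance". */
unsigned long guarded_imbalance(unsigned long max_load,
				unsigned long avg_load, unsigned long this_load)
{
	if (max_load <= avg_load || avg_load <= this_load)
		return 0;
	return min_ul(max_load - avg_load, avg_load - this_load);
}

void demo_unsigned_wrap(void)
{
	/* max_load < avg_load < this_load: both differences wrap around. */
	assert(naive_imbalance(1, 2, 3) == ULONG_MAX);
	assert(guarded_imbalance(1, 2, 3) == 0);

	/* Ordinary case: both variants agree. */
	assert(naive_imbalance(10, 6, 4) == 2);
	assert(guarded_imbalance(10, 6, 4) == 2);
}
```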

From: Ingo Molnar <mingo@elte.hu>

   sched-find-busiest-fix

8c136f71

[PATCH] sched: improved resolution in find_busiest_node · 067e0480

Andrew Morton authored May 09, 2004

From: Nick Piggin <piggin@cyberone.com.au>

From: Frank Cornelis <frank.cornelis@elis.ugent.be>

In order to get the best possible resolution we need to use NR_CPUS instead
of the constant value 10.  load is an int, so no need to worry about
overflows...
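
A small sketch of why the multiplier matters. It assumes the per-node load is computed as running tasks times a constant, divided by the number of CPUs in the node, using integer division; that formula, the function name, and the NR_CPUS value of 32 are assumptions for illustration and are not quoted from the patch.

```c
#include <assert.h>

#define NR_CPUS 32	/* assumed configuration value */

/* Scaled per-node load with integer division (hypothetical helper). */
int node_load(int multiplier, int nr_running, int cpus_in_node)
{
	return multiplier * nr_running / cpus_in_node;
}

void demo_resolution(void)
{
	/* With the constant 10, two and three tasks on a 16-CPU node look equal. */
	assert(node_load(10, 2, 16) == 1);
	assert(node_load(10, 3, 16) == 1);

	/* With NR_CPUS as the multiplier, the two loads are distinguishable. */
	assert(node_load(NR_CPUS, 2, 16) == 4);
	assert(node_load(NR_CPUS, 3, 16) == 6);
}
```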

067e0480

[PATCH] small scheduler cleanup · 4f20771c

Andrew Morton authored May 09, 2004

From: Ingo Molnar <mingo@elte.hu>

From: Nick Piggin <piggin@cyberone.com.au> wrote:

It removes the last place where we mess with run_list open coded.

4f20771c

09 May, 2004 13 commits

Linux 2.6.6 · 3dc567d8
Linus Torvalds authored May 09, 2004

3dc567d8

Mark the ACPI CPU throttle and timer IO regions busy. · 7e941d4d

Linus Torvalds authored May 09, 2004

This should help some laptops where the generic PCI
code might otherwise believe that this range is unused.
The ACPI IO range is usually not visible as a standard
BAR.

7e941d4d

[PATCH] Fix x86-64 compilation without iommu for 2.6.6rc3 · c99ae253

Andi Kleen authored May 09, 2004

Various people hit this in earlier kernels. The x86-64 kernel did not compile
without CONFIG_IOMMU_GART in various configurations. Just add the missing symbol
and export it. Also export iommu_merge while I am at it.

c99ae253

[PATCH] Fix machine check handler on x86-64 · 57bfa2c5

Andi Kleen authored May 09, 2004

This fixes a bug in the new machine check handler on x86-64.

One nasty part was that when you got an MCE during boot up
then it would not always print it on the screen, but still
panic because it attempted to kill the idle task.

This patch does:
 - Always use KERN_EMERG when printing MCEs
 - Always panic and print on screen before killing idle loop
   or init.

57bfa2c5

Merge bk://bk.arm.linux.org.uk/linux-2.6-rmk · b81346bc
Linus Torvalds authored May 09, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
b81346bc
Merge flint.arm.linux.org.uk:/usr/src/bk/linux-2.6-sharp · 1d89057b
Russell King authored May 09, 2004
```
into flint.arm.linux.org.uk:/usr/src/bk/linux-2.6-rmk
```
1d89057b
[ARM PATCH] 1818/1: lh7a40x #2 (3/7) doc · f42083cc
Marc Singer authored May 09, 2004
```
Patch from Marc Singer

Documentation for the Sharp-LH machines.
```
f42083cc

[ARM PATCH] 1817/1: lh7a40x #2 (2/7) core-include · a7c57d4a

Marc Singer authored May 09, 2004

Patch from Marc Singer

Include files for this updated lh7a40x patch set.  The changes in this
set from the previous are mostly cosmetic.  The memory macros were
reworked in order to be more similar to the other ARM versions.  The
previous versions produced the same results, but the forms are
slightly different.

a7c57d4a

[ARM PATCH] 1816/1: lh7a40x #2 (1/7) core · 1c0c2783

Marc Singer authored May 09, 2004

Patch from Marc Singer

Updated change set for the 2.6.5 kernel *and* for the April 8th arm
patch.  Also included are changes suggested by Russell that merge
several of the files in the mach- directory.  I have also endeavored
to remove all unnecessary whitespace additions.

Note that since I've found the cause of an annoying user-space crash,
I believe that this patch is OK.  The crash appears to have nothing to
do with the system setup.

1c0c2783

[ARM PATCH] 1847/1: OMAP update 2/2: include files · a263e250

Tony Lindgren authored May 09, 2004

Patch from Tony Lindgren

This patch syncs the mainline kernel with the linux-omap tree. The
patch contains the following updates:
- Move virtual IO area to 0xfefb0000 from 0xfffb0000 to fix parts of
  IO area overlapping with ARM Linux reserved memory area
- Add support for OMAP-730, OMAP-5912, and OMAP-1710 processors
- Reorganize board support
- Add OMAP core detection
This patch requires ARM Linux patch 1844/1 to be applied to compile
OMAP-730 and OMAP-5912

a263e250

[ARM PATCH] 1846/1: OMAP update 1/2: arch files · 62b2119f

Tony Lindgren authored May 09, 2004

Patch from Tony Lindgren

This patch syncs the mainline kernel with the linux-omap tree. The
patch contains the following updates:
- Move virtual IO area to 0xfefb0000 from 0xfffb0000 to fix parts of
  IO area overlapping with ARM Linux reserved memory area
- Add support for OMAP-730, OMAP-5912, and OMAP-1710 processors
- Reorganize board support
- Add OMAP core detection
This patch requires ARM Linux patch 1844/1 to be applied to compile
OMAP-730 and OMAP-5912

62b2119f

[ARM PATCH] 1844/1: Allow OMAP-730 and OMAP-5910 to use ARM926 in mm/Kconfig · 78c4d584
Tony Lindgren authored May 09, 2004
```
Patch from Tony Lindgren

Adds OMAP-730 and OMAP-5910 support
```
78c4d584

[PATCH] ISDN Eicon driver: fix idi cleanup deadlock · 8f555e6d

Armin Schindler authored May 08, 2004

   On IDI module cleanup, the freed card must be removed from the list.
   Use list_empty() instead of list_for_each() loop. Thanks Linus.

8f555e6d

08 May, 2004 18 commits

Waste less memory in dentries. · 293889f5

Linus Torvalds authored May 08, 2004

We don't bother aligning them on a cacheline boundary, since
that is totally excessive in some configurations (especially
P4's with 128-byte cachelines).

Instead, we make the minimum inline string size a bit longer,
and re-order a few fields that allow for better packing on
64-bit architectures, for better memory utilization.
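
The effect of re-ordering fields on packing can be seen with two toy structures. These are hypothetical and unrelated to the real dentry layout; they only show how placing a pointer between two small fields costs padding on 64-bit targets.

```c
#include <assert.h>

/* Small fields on either side of a pointer: padding before and after. */
struct loose {
	char flag_a;
	void *ptr;
	char flag_b;
};

/* Same fields, pointer first: the small fields share one padded slot. */
struct tight {
	void *ptr;
	char flag_a;
	char flag_b;
};

unsigned long loose_size(void)
{
	return (unsigned long)sizeof(struct loose);
}

unsigned long tight_size(void)
{
	return (unsigned long)sizeof(struct tight);
}

void demo_field_order(void)
{
	/* The re-ordered layout is never larger than the original one. */
	assert(tight_size() <= loose_size());
}
```

On a typical LP64 ABI such as x86-64, this is 24 bytes versus 16.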

293889f5

Merge bk://kernel.bkbits.net/davem/sparc-2.6 · 82f1671a
Linus Torvalds authored May 08, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
82f1671a
Merge bk://kernel.bkbits.net/davem/net-2.6 · 0ef8ced2
Linus Torvalds authored May 08, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
0ef8ced2

[PATCH] run populate_rootfs() before initcalls · 25714ddf

Andrew Morton authored May 08, 2004

I moved this a little too late - we need to run populate_rootfs() before
running initcalls because some driver initcalls need to open files for
firmware.

The populate_rootfs() call is still coming after init_idle(), so it won't
knock the scheduler over.

25714ddf

[SPARC64]: hugetlbpage.c needs linux/module.h · 2b308273
David S. Miller authored May 08, 2004

2b308273
[NET]: Undo marking sock_alloc() as static, still exported to modules. · 63206b3b
David S. Miller authored May 08, 2004

63206b3b

[TCP]: BIC TCP for Linux 2.6.6 · 54d05783

Stephen Hemminger authored May 08, 2004

This is a version of Binary Increase Control (BIC) TCP
developed by NCSU. It is yet another TCP congestion control
algorithm for handling big fat pipes. For normal size congestion
windows it behaves the same as existing TCP Reno, but when the window
is large it uses additive increase to ensure fairness and when the
window is small it uses binary search increase.

For more details see the BIC TCP web page
 http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/

The original code was for web100 (2.4); this version is pretty
much the same but targeted for 2.6 with fewer sysctl parameters
and more constants.

I don't have a real high speed long haul network to test on, but
when running over 1G links with delays, the performance is more stable
(i.e. tests are repeatable) and as fast as existing Reno.
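
A much simplified sketch of a single binary search increase step, to make the idea concrete. It is not the code in this patch: the function name, the parameter names, and the clamping bounds `s_min` and `s_max` are assumptions, and the behaviour once the window passes the previous maximum is reduced to a slow probe.

```c
#include <assert.h>

/*
 * One window update of a simplified binary search increase.
 *   w_min: current window, known to be loss-free
 *   w_max: window size at which the last loss occurred
 *   s_min, s_max: hypothetical bounds on the per-step increment
 */
unsigned int bic_next_window(unsigned int w_min, unsigned int w_max,
			     unsigned int s_min, unsigned int s_max)
{
	unsigned int dist, inc;

	assert(s_min <= s_max);

	if (w_max <= w_min)
		return w_min + s_min;	/* nothing left to search: probe slowly */

	dist = (w_max - w_min) / 2;	/* distance to the midpoint */
	if (dist > s_max)
		inc = s_max;		/* far from the target: additive increase */
	else if (dist < s_min)
		inc = s_min;		/* always make some progress */
	else
		inc = dist;		/* close enough: jump to the midpoint */

	return w_min + inc;
}
```

For example, `bic_next_window(100, 1000, 1, 32)` moves by the capped increment to 132, while `bic_next_window(100, 120, 1, 32)` jumps straight to the midpoint, 110.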

54d05783

[SCTP]: Fix multihomed connection failures on 64-bit systems. · fbb3aa0d

Sridhar Samudrala authored May 08, 2004

Avoid the use of sizeof() and pointer arithmetic to get to the end
of sctp_cookie structure. Instead use the last element peer_init which
is a zero-sized array as the offset.
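
The difference between the two ways of finding the end of a structure can be illustrated with a toy type. This is a sketch: `struct toy_cookie` is hypothetical and is not the real sctp_cookie, and it uses a C99 flexible array member where the message describes a zero-sized array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cookie structure with a trailing array. */
struct toy_cookie {
	uint64_t expiry;
	uint32_t tag;
	uint32_t peer_init[];	/* marks the end of the fixed part */
};

/* End of the fixed part as computed with sizeof(): includes tail padding. */
size_t end_by_sizeof(void)
{
	return sizeof(struct toy_cookie);
}

/* End of the fixed part taken from the offset of the last member. */
size_t end_by_member(void)
{
	return offsetof(struct toy_cookie, peer_init);
}

void demo_end_offset(void)
{
	/* The member offset never exceeds sizeof(); with padding it is smaller. */
	assert(end_by_member() <= end_by_sizeof());
	assert(end_by_member() == sizeof(uint64_t) + sizeof(uint32_t));
}
```

On a typical LP64 ABI such as x86-64, `sizeof` gives 16 because of 8-byte alignment while the member offset is 12, so data written at `sizeof` would land 4 bytes past where the trailing array actually begins.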

fbb3aa0d

[IPV4]: Use time_after() in override ARP calculation. · 6619be03
David Stevens authored May 08, 2004

6619be03

[NET]: Add sock_create_lite() · 398b3c44

James Morris authored May 08, 2004

The purpose of this is to allow sockets created by the kernel in this way
to be passed through the LSM socket creation hooks and be labeled and
mediated in the same manner as other sockets.

This patch addresses a class of potential issues with LSMs, where such
sockets will not be labeled correctly (if at all), or mediated during
creation. Under SELinux, it fixes a specific bug where RPC sockets
created by the kernel during TCP NFS serving are unlabeled.

398b3c44

[NET]: Add sock_create_kern() · e2943dca

James Morris authored May 08, 2004

Under SELinux, and potentially other LSMs, we need to be able to
distinguish between user sockets and kernel sockets. For SELinux
specifically, kernel sockets need to be specially labeled during creation,
then bypass access control checks (they are controlled by the kernel
itself and not subject to SELinux mediation).

This addresses a class of potential issues in SELinux where, for example,
a TCP NFS session times out, then the kernel re-establishes an RPC
connection upon further user activity. We do not want such kernel
created sockets to be labeled with user security contexts.

sock_create() and sock_create_kern() are wrapper functions, which seems
semantically clearer to me than e.g. adding a flag to sock_create(). If
you prefer the latter, then let me know.

The patch also adds an argument to the LSM socket creation functions
indicating whether the socket being created is a kernel socket or not.
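
The wrapper arrangement described above can be sketched in user-space C. Every name here (`toy_socket`, `toy_lsm_socket_create`, `toy_sock_create_common`, and the two wrappers) is hypothetical and only mirrors the shape of the design: one common creation routine takes a kernel flag, passes it to the security hook, and two thin wrappers fix the flag.

```c
#include <assert.h>
#include <stddef.h>

/* Toy socket object; records who created it. */
struct toy_socket {
	int family;
	int kern;	/* 1 if created on behalf of the kernel */
};

/* Hypothetical stand-in for an LSM creation hook that sees the flag. */
int toy_lsm_socket_create(int family, int kern)
{
	(void)family;
	(void)kern;
	return 0;	/* always allow in this sketch */
}

/* Common creation path shared by both wrappers. */
int toy_sock_create_common(int family, int kern, struct toy_socket *sock)
{
	int err;

	if (sock == NULL)
		return -1;
	err = toy_lsm_socket_create(family, kern);
	if (err)
		return err;
	sock->family = family;
	sock->kern = kern;
	return 0;
}

/* User-socket wrapper: kernel flag is 0. */
int toy_sock_create(int family, struct toy_socket *sock)
{
	return toy_sock_create_common(family, 0, sock);
}

/* Kernel-socket wrapper: kernel flag is 1. */
int toy_sock_create_kern(int family, struct toy_socket *sock)
{
	return toy_sock_create_common(family, 1, sock);
}
```

Callers never pass the flag themselves, so the call site states which kind of socket is intended.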

e2943dca

Merge nuts.davemloft.net:/disk1/BK/network-2.6 · 49a1f4d4
David S. Miller authored May 07, 2004
```
into nuts.davemloft.net:/disk1/BK/net-2.6
```
49a1f4d4
[SPARC64]: Use $(CC) in NEW_GCC checks. · 812b724d
Joshua Kwan authored May 07, 2004

812b724d
[SUNZILOG]: Fix DCD/CTS change tests, just like in pmac_zilog. · c0251be5
Benjamin Herrenschmidt authored May 07, 2004

c0251be5

[PATCH] fix WARN_ON on XFS module unload · 08faf52b

Andrew Morton authored May 07, 2004

From: Christoph Hellwig <hch@lst.de>

This one is a little funny.  The SGI trees don't show this issue because dmapi
and quota are separate modules so they must be unloaded before xfs_fs_exit can
be called at all.

So let's move the exitcalls for them in mainline first to simulate that
behaviour.

08faf52b

[PATCH] Fix CTS handling in pmac-zilog.c · 6032402c

Benjamin Herrenschmidt authored May 07, 2004

From: Paul Mackerras <paulus@samba.org>

This patch fixes a bug in the pmac-zilog driver where if you enable
CRTSCTS mode, it won't output data when CTS is asserted.  On
powermacs, the CTS input is inverted.  It also fixes a logic bug in
testing for CTS and DCD changes.

6032402c

[PATCH] ISDN Eicon driver: fix empty queue check · d2ac9ae6
Armin Schindler authored May 07, 2004
```
   Check for last adapter link is done by next member,
   because entries are not removed yet.
```
d2ac9ae6

All the Intel LPC bridges have the same PCI quirks. · 16f16e70

Linus Torvalds authored May 07, 2004

They all have 128 bytes of ACPI/TCO IO space pointed to
by config space register 0x40, and 64 bytes of GPIO space
pointed to by 0x58.

Thanks to Jun Nakajima for the full list.

16f16e70

07 May, 2004 4 commits
- Merge bk://bk.arm.linux.org.uk/linux-2.6-rmk · 60774082
  Linus Torvalds authored May 07, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
  60774082
- [SERIAL] Remove unmerged 'clk' subsystem from PL011 driver. · d9e7bd2f
  Russell King authored May 08, 2004
  
  d9e7bd2f
- [ARM] Enclose MMC-related code in #ifdef CONFIG_MMC .. #endif · abda8bc7
  Russell King authored May 08, 2004
  
  abda8bc7
- [ARM] Remove DMA support in Versatile · fbfb0222
  Russell King authored May 08, 2004
```
We don't have DMA support for AMBA devices yet.
```
  fbfb0222