Commits · v2.6.16.32-rc1 · Kirill Smelkov / linux

11 Nov, 2006 4 commits

Linux 2.6.16.32-rc1 · ae92a0d0
Adrian Bunk authored Nov 11, 2006

ae92a0d0

Fix longstanding load balancing bug in the scheduler · ef147950

Christoph Lameter authored Nov 11, 2006

The scheduler will stop load balancing if the most busy processor contains
processes pinned via processor affinity.

The scheduler currently only does one search for busiest cpu. If it cannot
pull any tasks away from the busiest cpu because they were pinned then the
scheduler goes into a corner and sulks leaving the idle processors idle.

F.e. If you have processor 0 busy running four tasks pinned via taskset,
there are none on processor 1 and one just started two processes on
processor 2 then the scheduler will not move one of the two processes away
from processor 2.

This patch fixes that issue by forcing the scheduler to come out of its
corner and retrying the load balancing by considering other processors for
load balancing.

This patch was originally developed by John Hawkes and discussed at

http://marc.theaimsgroup.com/?l=linux-kernel&m=113901368523205&w=2.

I have removed extraneous material and gone back to equipping struct rq
with the cpu the queue is associated with since this makes the patch much
easier and it is likely that others in the future will have the same
difficulty of figuring out which processor owns which runqueue.

The overhead added through these patches is a single word on the stack if
the kernel is configured to support 32 cpus or less (32 bit). For 32 bit
environments the maximum number of cpus that can be configued is 255 which
would result in the use of 32 bytes additional on the stack. On IA64 up to
1k cpus can be configured which will result in the use of 128 additional
bytes on the stack. The maximum additional cache footprint is one
cacheline. Typically memory use will be much less than a cacheline and the
additional cpumask will be placed on the stack in a cacheline that already
contains other local variable.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

ef147950

sata_sil24: add a new PCI ID for SiI 3124 · f9a198cc

Tejun Heo authored Nov 11, 2006

Add a new PCI ID for SiI 3124.  Reported by Silicon Image.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

f9a198cc

ia64/sparc: fix local DoS with corrupted ELFs (CVE-2006-4538) · 05c19c43

Kirill Korotaev authored Nov 11, 2006

This patch prevents cross-region mappings
on IA64 and SPARC which could lead to system crash.

Adrian Bunk:
Adapted to 2.6.16.
Signed-Off-By: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

05c19c43

10 Nov, 2006 5 commits

nvidia fbdev: fix powerpc xmon scribbles · 567e0e32

Paul Mackerras authored Nov 11, 2006

xmon writes garbage on the screen because the nvidia console driver has
changed the line pitch from what the firmware set it to. Fix it by making
the nvidia driver inform the btext engine (which xmon uses if the screen is
its output device) about changes to display resolution.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

567e0e32

[POWERPC] Fix return value from memcpy · 582b1199

Paul Mackerras authored Nov 11, 2006

As pointed out by Herbert Xu <herbert@gondor.apana.org.au>, our
memcpy implementation didn't return the destination pointer as its
return value, and there is code in the kernel that expects that.
This fixes it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

582b1199

[NET]: Update frag_list in pskb_trim · e1dc7abb

Herbert Xu authored Nov 11, 2006

When pskb_trim has to defer to ___pksb_trim to trim the frag_list part of
the packet, the frag_list is not updated to reflect the trimming.  This
will usually work fine until you hit something that uses the packet length
or tail from the frag_list.

Examples include esp_output and ip_fragment.

Another problem caused by this is that you can end up with a linear packet
with a frag_list attached.

It is possible to get away with this if we audit everything to make sure
that they always consult skb->len before going down onto frag_list.  In
fact we can do the samething for the paged part as well to avoid copying
the data area of the skb.  For now though, let's do the conservative fix
and update frag_list.

Many thanks to Marco Berizzi for helping me to track down this bug.

This 4-year old bug took 3 months to track down.  Marco was very patient
indeed :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

e1dc7abb

scx200_acb: Fix the block transactions · c04f8dca

Jean Delvare authored Nov 11, 2006

The scx200_acb i2c bus driver pretends to support SMBus block
transactions, but in fact it implements the more simple I2C block
transactions. Additionally, it lacks sanity checks on the length
of the block transactions, which could lead to a buffer overrun.

This fixes an oops reported by Alexander Atanasov:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114970382125094

Thanks to Ben Gardner for fixing my bugs :)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

c04f8dca

Fix the scx200_acb state machine: · b88acf65

Thomas Andrews authored Nov 11, 2006

* Nack was sent one byte too late on reads >= 2 bytes.
* Stop bit was set one byte too late on reads.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

b88acf65

09 Nov, 2006 8 commits

drivers/video/nvidia/nvidia.c: Add ID for Quadro NVS280 · b235e283

Pavel Roskin authored Nov 09, 2006

Quadro NVS280 is a dual-head PCIe card with PCI ID 10de:00fd and subsystem I
10de:0215.
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

b235e283

[DISKLABEL] SUN: Fix signed int usage for sector count · d2b85a5a

Jeff Mahoney authored Nov 09, 2006

The current sun disklabel code uses a signed int for the sector count.
When partitions larger than 1 TB are used, the cast to a sector_t causes
the partition sizes to be invalid:

 # cat /proc/paritions | grep sdan
   66   112 2146435072 sdan
   66   115 9223372036853660736 sdan3
   66   120 9223372036853660736 sdan8

This patch switches the sector count to an unsigned int to fix this.

Eric Sandeen also submitted the same patch.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

d2b85a5a

[NET]: __alloc_pages() failures reported due to fragmentation · 4504530e

Larry Woodman authored Nov 09, 2006

We have seen a couple of __alloc_pages() failures due to
fragmentation, there is plenty of free memory but no large order pages
available.  I think the problem is in sock_alloc_send_pskb(), the
gfp_mask includes __GFP_REPEAT but its never used/passed to the page
allocator.  Shouldnt the gfp_mask be passed to alloc_skb() ?
Signed-off-by: Larry Woodman <lwoodman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

4504530e

[NET]: Set truesize in pskb_copy · 5f61e927

Herbert Xu authored Nov 09, 2006

Since pskb_copy tacks on the non-linear bits from the original
skb, it needs to count them in the truesize field of the new skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

5f61e927

[TCP]: Don't use highmem in tcp hash size calculation. · 289b5dce

John Heffner authored Nov 09, 2006

This patch removes consideration of high memory when determining TCP
hash table sizes.  Taking into account high memory results in tcp_mem
values that are too large.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

289b5dce

[AGPGART] remove unused variable · c3fe9b53

Adrian Bunk authored Nov 09, 2006

This patch removes an unused variable.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>

c3fe9b53

[AGPGART] Suspend/Resume support for nVidia nForce AGP. · 8a3decfe

Dave Jones authored Nov 09, 2006

Based on a patch from the Ubuntu kernel.
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

8a3decfe

drivers/telephony/ixj: fix an array overrun · 9e914f50

Adrian Bunk authored Nov 09, 2006

The Coverity checker noted that in
drivers/telephony/ixj.c:ixj_build_filter_cadence(), filter_en[4] or
filter_en[5] could be written to.
Signed-off-by: Adrian Bunk <bunk@stusta.de>

9e914f50

08 Nov, 2006 7 commits

nvidiafb: Add support for Geforce 6100 and related chipsets · 0ed0aa77

Antonino Daplas authored Nov 08, 2006

Add support for Geforce 6100 and related chipsets (PCI device id 0x024x)
Signed-off-by: Antonino Daplas <adaplas@pol.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

0ed0aa77

drivers/md/md.c: update START_ARRAY printk · f9196433

Adrian Bunk authored Nov 08, 2006

START_ARRAY will not be removed in 2.6.16, therefore replace the date
reference with a kernel version reference.
Signed-off-by: Adrian Bunk <bunk@stusta.de>

f9196433

remove Documentation/feature-removal-schedule.txt · aac3c5dc

Adrian Bunk authored Nov 08, 2006

The information in Documentation/feature-removal-schedule.txt
doesn't apply to the 2.6.16 branch (and most dates were already
in the past.
Signed-off-by: Adrian Bunk <bunk@stusta.de>

aac3c5dc

[IPV4]: Limit rt cache size properly. · a38c4343

Kirill Korotaev authored Nov 08, 2006

During OpenVZ stress testing we found that UDP traffic with random src
can generate too much excessive rt hash growing leading finally to OOM
and kernel panics.

It was found that for 4GB i686 system (having 1048576 total pages and
225280 normal zone pages) kernel allocates the following route hash:
syslog: IP route cache hash table entries: 262144 (order: 8, 1048576
bytes) => ip_rt_max_size = 4194304 entries, i.e.  max rt size is
4194304 * 256b = 1Gb of RAM > normal_zone

Attached the patch which removes HASH_HIGHMEM flag from
alloc_large_system_hash() call.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

a38c4343

Don't allow chmod() on the /proc/<pid>/ files · 5db60db6

Marcel Holtmann authored Nov 08, 2006

This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.

The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up..

This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.
Signed-off-by: Eugene Teo <eteo@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

5db60db6

fbdev: correct buffer size limit in fbmem_read_proc() · 5f4b6b03

Geert Uytterhoeven authored Nov 08, 2006

Address http://bugzilla.kernel.org/show_bug.cgi?id=7189

It should check `clen', not `len'.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

5f4b6b03

[NET]: Add missing UFO initialisations · 41dc00ec

Herbert Xu authored Nov 08, 2006

This bug was unknowingly fixed the GSO patches (or rather, its effect was
unknown at the time).

Thanks to Marco Berizzi's persistence which is documented in the thread
"ipsec tunnel asymmetrical mtu", we now know that it can have highly
non-obvious symptoms.

What happens is that uninitialised uso_size fields can cause packets to
be incorrectly identified as UFO, which means that it does not get
fragmented even if it's over the MTU.

The fix is simple enough.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

41dc00ec

07 Nov, 2006 10 commits

from mm/memory.c: · 8bd3ff1d

Dmitriy Monakhov authored Nov 07, 2006

  1434  static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va)
  1435  {
  1436          /*
  1437           * If the source page was a PFN mapping, we don't have
  1438           * a "struct page" for it. We do a best-effort copy by
  1439           * just copying from the original user address. If that
  1440           * fails, we just zero-fill it. Live with it.
  1441           */
  1442          if (unlikely(!src)) {
  1443                  void *kaddr = kmap_atomic(dst, KM_USER0);
  1444                  void __user *uaddr = (void __user *)(va & PAGE_MASK);
  1445
  1446                  /*
  1447                   * This really shouldn't fail, because the page is there
  1448                   * in the page tables. But it might just be unreadable,
  1449                   * in which case we just give up and fill the result with
  1450                   * zeroes.
  1451                   */
  1452                  if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
  1453                          memset(kaddr, 0, PAGE_SIZE);
  1454                  kunmap_atomic(kaddr, KM_USER0);
  #### D-cache have to be flushed here.
  #### It seems it is just forgotten.

  1455                  return;
  1456
  1457          }
  1458          copy_user_highpage(dst, src, va);
  #### Ok here. flush_dcache_page() called from this func if arch need it
  1459  }
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

8bd3ff1d

[MAINTAINERS]: Add proper entry for TC classifier · de4fdc93

Stephen Hemminger authored Nov 07, 2006

Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

de4fdc93

[PKT_SCHED]: act_api: Fix module leak while flushing actions · aa459cd3

Thomas Graf authored Nov 07, 2006

Module reference needs to be given back if message header
construction fails.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

aa459cd3

PKT_SCHED: Return ENOENT if action module is unavailable · ed382a2a

Thomas Graf authored Nov 07, 2006

Return ENOENT if action module is unavailable
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

ed382a2a

PKT_SCHED: Fix illegal memory dereferences when dumping actions · f6dde50d

Thomas Graf authored Nov 07, 2006

The TCA_ACT_KIND attribute is used without checking its
availability when dumping actions therefore leading to a
value of 0x4 being dereferenced.

The use of strcmp() in tc_lookup_action_n() isn't safe
when fed with string from an attribute without enforcing
proper NUL termination.

Both bugs can be triggered with malformed netlink message
and don't require any privileges.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

f6dde50d

PKT_SCHED: Fix error handling while dumping actions · 118d32a5

Thomas Graf authored Nov 07, 2006

"return -err" and blindly inheriting the error code in the netlink
failure exception handler causes errors codes to be returned as
positive value therefore making them being ignored by the caller.

May lead to sending out incomplete netlink messages.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

118d32a5

[PATCH] md: Make sure bi_max_vecs is set properly in bio_split · f0524af0

Neil Brown authored Nov 07, 2006

Else a subsequent bio_clone might make a mess.
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

f0524af0

[CPUFREQ] Fix powernow-k8 SMP kernel on UP hardware bug. · 6f5bccdd

Randy Dunlap authored Nov 07, 2006

Fix powernow-k8 doesn't load bug.
Reference:
https://launchpad.net/distros/ubuntu/+source/linux-source-2.6.15/+bug/35145Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Dave Jones <davej@redhat.com>

6f5bccdd

[CPUFREQ] Make powernow-k7 work on SMP kernels. · 22d66b36

Dave Jones authored Nov 07, 2006

Even though powernow-k7 doesn't work in SMP environments,
it can work on an SMP configured kernel if there's only
one CPU present, however recalibrate_cpu_khz was returning
-EINVAL on such kernels, so we failed to init the cpufreq driver.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

22d66b36

Linux 2.6.16.31 · ba9cf572
Adrian Bunk authored Nov 07, 2006

ba9cf572

05 Nov, 2006 6 commits

Linux 2.6.16.31-rc1 · afaa018c
Adrian Bunk authored Nov 05, 2006

afaa018c

[NETFILTER]: Fix ip6_tables extension header bypass bug (CVE-2006-4572) · 0ddfcc96

Patrick McHardy authored Nov 05, 2006

As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible
to a fragmentation attack causing false negatives on extension header
matches.

When extension headers occur in the non-first fragment after the fragment
header (possibly with an incorrect nexthdr value in the fragment header)
a rule looking for this extension header will never match.

Drop fragments that are at offset 0 and don't contain the final protocol
header regardless of the ruleset, since this should not happen normally.
Since all extension headers are before the protocol header this makes sure
an extension header is either not present or in the first fragment, where
we can properly parse it.

With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

0ddfcc96

[NETFILTER]: Fix ip6_tables protocol bypass bug (CVE-2006-4572) · 6ac62be8

Patrick McHardy authored Nov 05, 2006

As reported by Mark Dowd <Mark_Dowd@McAfee.com>, ip6_tables is susceptible
to a fragmentation attack causing false negatives on protocol matches.

When the protocol header doesn't follow the fragment header immediately,
the fragment header contains the protocol number of the next extension
header. When the extension header and the protocol header are sent in
a second fragment a rule like "ip6tables .. -p udp -j DROP" will never
match.

Drop fragments that are at offset 0 and don't contain the final protocol
header regardless of the ruleset, since this should not happen normally.

With help from Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

6ac62be8

knfsd: Fix race that can disable NFS server. · 0ac0a208

Neil Brown authored Nov 05, 2006

This is a long standing bug that seems to have only recently become
apparent, presumably due to increasing use of NFS over TCP - many
distros seem to be making it the default.

The SK_CONN bit gets set when a listening socket may be ready
for an accept, just as SK_DATA is set when data may be available.

It is entirely possible for svc_tcp_accept to be called with neither
of these set.  It doesn't happen often but there is a small race in
svc_sock_enqueue as SK_CONN and SK_DATA are tested outside the
spin_lock.  They could be cleared immediately after the test and
before the lock is gained.

This normally shouldn't be a problem.  The sockets are non-blocking so
trying to read() or accept() when ther is nothing to do is not a problem.

However: svc_tcp_recvfrom makes the decision "Should I accept() or
should I read()" based on whether SK_CONN is set or not.  This usually
works but is not safe.  The decision should be based on whether it is
a TCP_LISTEN socket or a TCP_CONNECTED socket.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

0ac0a208

posix-cpu-timers: prevent signal delivery starvation · fe8187b8

Thomas Gleixner authored Nov 05, 2006

The integer divisions in the timer accounting code can round the result
down to 0. Adding 0 is without effect and the signal delivery stops.

Clamp the division result to minimum 1 to avoid this.

Problem was reported by Seongbae Park <spark@google.com>, who provided
also an inital patch.

Roland sayeth:

I have had some more time to think about the problem, and to reproduce it
using Toyo's test case. For the record, if my understanding of the problem
is correct, this happens only in one very particular case. First, the
expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks)
it's < nthreads so the division yields zero. Second, it only affects each
thread that is so new that its CPU time accumulation is zero so now+0 is
still zero and ->it_*_expires winds up staying zero. For the VIRT and PROF
clocks when cputime_t is tick granularity (or the SCHED clock on
configurations where sched_clock's value only advances on clock ticks), this
is not hard to arrange with new threads starting up and blocking before they
accumulate a whole tick of CPU time. That's what happens in Toyo's test
case.

Note that in general it is fine for that division to round down to zero,
and set each thread's expiry time to its "now" time. The problem only
arises with thread's whose "now" value is still zero, so that now+0 winds up
0 and is interpreted as "not set" instead of ">= now". So it would be a
sufficient and more precise fix to just use max(ticks, 1) inside the loop
when setting each it_*_expires value.

But, it does no harm to round the division up to one and always advance
every thread's expiry time. If the thread didn't already fire timers for
the expiry time of "now", there is no expectation that it will do so before
the next tick anyway. So I followed Thomas's patch in lifting the max out
of the loops.

This patch also covers the reload cases, which are harder to write a test
for (and I didn't try). I've tested it with Toyo's case and it fixes that.

[toyoa@mvista.com: fix: min_t -> max_t]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

fe8187b8

[IPV6]: fix lockup via /proc/net/ip6_flowlabel (CVE-2006-5619) · d1ce361a

James Morris authored Nov 05, 2006

There's a bug in the seqfile handling for /proc/net/ip6_flowlabel, where,
after finding a flowlabel, the code will loop forever not finding any
further flowlabels, first traversing the rest of the hash bucket then just
looping.

This patch fixes the problem by breaking after the hash bucket has been
traversed.

Note that this bug can cause lockups and oopses, and is trivially invoked
by an unpriveleged user.
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

d1ce361a