Commits · 7d778ac64b7d42ff3814348a25e99c778f8473f5 · Kirill Smelkov / linux

22 Nov, 2010 9 commits

x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit · 7d778ac6

Andre Przywara authored Sep 06, 2010

commit 7ef8aa72 upstream.

The AMD SSE5 feature set as-it has been replaced by some extensions
to the AVX instruction set. Thus the bit formerly advertised as SSE5
is re-used for one of these extensions (XOP).
Although this changes the /proc/cpuinfo output, it is not user visible, as
there are no CPUs (yet) having this feature.
To avoid confusion this should be added to the stable series, too.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-2-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

7d778ac6

mm, x86: Saving vmcore with non-lazy freeing of vmas · 9809892e

Cliff Wickman authored Sep 16, 2010

commit 3ee48b6a upstream.

During the reading of /proc/vmcore the kernel is doing
ioremap()/iounmap() repeatedly. And the buildup of un-flushed
vm_area_struct's is causing a great deal of overhead. (rb_next()
is chewing up most of that time).

This solution is to provide function set_iounmap_nonlazy(). It
causes a subsequent call to iounmap() to immediately purge the
vma area (with try_purge_vmap_area_lazy()).

With this patch we have seen the time for writing a 250MB
compressed dump drop from 71 seconds to 44 seconds.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: kexec@lists.infradead.org
LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9809892e

futex: Fix errors in nested key ref-counting · 183308db

Darren Hart authored Oct 17, 2010

commit 7ada876a upstream.

futex_wait() is leaking key references due to futex_wait_setup()
acquiring an additional reference via the queue_lock() routine. The
nested key ref-counting has been masking bugs and complicating code
analysis. queue_lock() is only called with a previously ref-counted
key, so remove the additional ref-counting from the queue_(un)lock()
functions.

Also futex_wait_requeue_pi() drops one key reference too many in
unqueue_me_pi(). Remove the key reference handling from
unqueue_me_pi(). This was paired with a queue_lock() in
futex_lock_pi(), so the count remains unchanged.

Document remaining nested key ref-counting sites.
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-and-tested-by: Matthieu Fertré<matthieu.fertre@kerlabs.com>
Reported-by: Louis Rilling<louis.rilling@kerlabs.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <4CBB17A8.70401@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

183308db

bluetooth: Fix missing NULL check · 3502405e

Alan Cox authored Oct 22, 2010

commit c19483cc upstream.

Fortunately this is only exploitable on very unusual hardware.

[Reported a while ago but nothing happened so just fixing it]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

3502405e

sched: Fix string comparison in /proc/sched_features · ebf7608d

Mathieu Desnoyers authored Sep 13, 2010

commit 7740191c upstream.

Fix incorrect handling of the following case:

 INTERACTIVE
 INTERACTIVE_SOMETHING_ELSE

The comparison only checks up to each element's length.

Changelog since v1:
 - Embellish using some Rostedtisms.
  [ mingo:                 ^^ == smaller and cleaner ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Lindgren <tony@atomide.com>
LKML-Reference: <20100913214700.GB16118@Krystal>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

ebf7608d

pcmcia: synclink_cs: fix information leak to userland · f055e6df

Vasiliy Kulikov authored Oct 17, 2010

commit 5b917a14 upstream.

Structure new_line is copied to userland with some padding fields unitialized.
It leads to leaking of stack memory.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f055e6df

powerpc/perf: Fix sampling enable for PPC970 · 89323c1e

Paul Mackerras authored Sep 09, 2010

commit 9f5f9ffe upstream.

The logic to distinguish marked instruction events from ordinary events
on PPC970 and derivatives was flawed.  The result is that instruction
sampling didn't get enabled in the PMU for some marked instruction
events, so they would never trigger.  This fixes it by adding the
appropriate break statements in the switch statement.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

89323c1e

staging: usbip: Process event flags without delay · f37563af

Max Vozeler authored Sep 21, 2010

commit 584c5b7c upstream.

The way the event handler works can cause it to delay
events until eventual wakeup for another event.

For example, on device detach (vhci):

 - Write to sysfs detach file
    -> usbip_event_add(VDEV_EVENT_DOWN)
      -> wakeup()

#define VDEV_EVENT_DOWN (USBIP_EH_SHUTDOWN | USBIP_EH_RESET).

 - Event thread wakes up and passes the event to
   event_handler() to process.

 - It processes and clears the USBIP_EH_SHUTDOWN
   flag then returns.

 - The outer event loop (event_handler_loop()) calls
   wait_event_interruptible().

The processing of the second flag which is part of
VDEV_EVENT_DOWN (USBIP_EH_RESET) did not happen yet.
It is delayed until the next event.

This means the ->reset callback may not happen for
a long time (if ever), leaving the usbip port in a
weird state which prevents its reuse.

This patch changes the handler to process all flags
before waiting for another wakeup.

I have verified this change to fix a problem which
prevented reattach of a usbip device. It also helps
for socket errors which missed the RESET as well.

The delayed event processing also affects the stub
side of usbip and the error handling there.
Signed-off-by: Max Vozeler <mvz@vozeler.com>
Reported-by: Marco Lancione <marco@optikam.com>
Tested-by: Luc Jalbert <ljalbert@optikam.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f37563af

staging: usbip: Notify usb core of port status changes · 7f04bf4d

Max Vozeler authored Sep 21, 2010

commit 0c9a32f0 upstream.

This patch changes vhci to behave like dummy and
other hcds when disconnecting a device.

Previously detaching a device from the root hub
did not notify the usb core of the disconnect and
left the device visible.
Signed-off-by: Max Vozeler <mvz@vozeler.com>
Reported-by: Marco Lancione <marco@optikam.com>
Tested-by: Luc Jalbert <ljalbert@optikam.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

7f04bf4d

29 Oct, 2010 31 commits

Linux 2.6.32.25 · 80630134
Greg Kroah-Hartman authored Oct 28, 2010

80630134

mm: Move vma_stack_continue into mm.h · 78b8fca4

Stefan Bader authored Aug 31, 2010

commit 39aa3cb3 upstream.

So it can be used by all that need to check for that.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

78b8fca4

Phonet: disable network namespace support · 8e983dfe

Rémi Denis-Courmont authored Sep 18, 2010

[Solved differently upstream]

Network namespace in the Phonet socket stack causes an OOPS when a
namespace is destroyed. This occurs as the loopback exit_net handler is
called after the Phonet exit_net handler, and re-enters the Phonet
stack. I cannot think of any nice way to fix this in kernel <= 2.6.32.

For lack of a better solution, disable namespace support completely.
If you need that, upgrade to a newer kernel.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>

8e983dfe

execve: make responsive to SIGKILL with large arguments · b9d023fb

Roland McGrath authored Sep 07, 2010

commit 9aea5a65 upstream.

An execve with a very large total of argument/environment strings
can take a really long time in the execve system call.  It runs
uninterruptibly to count and copy all the strings.  This change
makes it abort the exec quickly if sent a SIGKILL.

Note that this is the conservative change, to interrupt only for
SIGKILL, by using fatal_signal_pending().  It would be perfectly
correct semantics to let any signal interrupt the string-copying in
execve, i.e. use signal_pending() instead of fatal_signal_pending().
We'll save that change for later, since it could have user-visible
consequences, such as having a timer set too quickly make it so that
an execve can never complete, though it always happened to work before.
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b9d023fb

execve: improve interactivity with large arguments · bd720c92

Roland McGrath authored Sep 07, 2010

commit 7993bc1f upstream.

This adds a preemption point during the copying of the argument and
environment strings for execve, in copy_strings().  There is already
a preemption point in the count() loop, so this doesn't add any new
points in the abstract sense.

When the total argument+environment strings are very large, the time
spent copying them can be much more than a normal user time slice.
So this change improves the interactivity of the rest of the system
when one process is doing an execve with very large arguments.
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

bd720c92

setup_arg_pages: diagnose excessive argument size · 8816b5d0

Roland McGrath authored Sep 07, 2010

commit 1b528181 upstream.

The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
check the size of the argument/environment area on the stack.
When it is unworkably large, shift_arg_pages() hits its BUG_ON.
This is exploitable with a very large RLIMIT_STACK limit, to
create a crash pretty easily.

Check that the initial stack is not too large to make it possible
to map in any executable.  We're not checking that the actual
executable (or intepreter, for binfmt_elf) will fit.  So those
mappings might clobber part of the initial stack mapping.  But
that is just userland lossage that userland made happen, not a
kernel problem.
Signed-off-by: Roland McGrath <roland@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

8816b5d0

fix 2.6.32.23 suspend regression caused by commit · c6e6bb49

Mike Galbraith authored Oct 01, 2010

[Not upstream in the same way, as it was fixed differently there]

6f6198a7 sched: kill migration thread in CPU_POST_DEAD instead of CPU_DEAD
leaves migration threads lying about.  Mask out CPU_TASKS_FROZEN.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c6e6bb49

x86: detect scattered cpuid features earlier · 3a70a506

Jacob Pan authored May 19, 2010

commit 1dedefd1 upstream.

Some extra CPU features such as ARAT is needed in early boot so
that x86_init function pointers can be set up properly.
http://lkml.org/lkml/2010/5/18/519
At start_kernel() level, this patch moves init_scattered_cpuid_features()
from check_bugs() to setup_arch() -> early_cpu_init() which is earlier than
platform specific x86_init layer setup. Suggested by HPA.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1274295685-6774-2-git-send-email-jacob.jun.pan@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

3a70a506

ACPI: Disable Windows Vista compatibility for Toshiba P305D · a58d4d9d

Zhang Rui authored Sep 28, 2010

commit 337279ce upstream.

Disable the Windows Vista (SP1) compatibility for Toshiba P305D.

http://bugzilla.kernel.org/show_bug.cgi?id=14736Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

a58d4d9d

ACPI: delete ZEPTO idle=nomwait DMI quirk · 47de918e

Len Brown authored Sep 28, 2010

commit 64a32307 upstream.

per comments in the bug report, this entry
seems to hurt at much as it helps.

https://bugzilla.kernel.org/show_bug.cgi?id=10807Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

47de918e

ACPI: EC: add Vista incompatibility DMI entry for Toshiba Satellite L355 · 4caeea1a

Len Brown authored Sep 28, 2010

commit 7a1d602f upstream.

https://bugzilla.kernel.org/show_bug.cgi?id=12641Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

4caeea1a

intel_idle: PCI quirk to prevent Lenovo Ideapad s10-3 boot hang · e6397535

Len Brown authored Sep 24, 2010

commit 4731fdcf upstream.

When the Lenovo Ideapad S10-3 is booted with HT enabled,
it hits a boot hang in the intel_idle driver.

This occurs when entering ATM-C4 for the first time,
unless BM_STS is first cleared.

acpi_idle doesn't see this because it first checks
and clears BM_STS, but it would hit the same hang
if that check were disabled.

http://bugs.meego.com/show_bug.cgi?id=7093
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/634702Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e6397535

ACPI: enable repeated PCIEXP wakeup by clearing PCIEXP_WAKE_STS on resume · 2c41bbda

Colin Ian King authored Aug 02, 2010

commit 573b6381 upstream.

Section 4.7.3.1.1 (PM1 Status Registers) of version 4.0 of
the ACPI spec concerning PCIEXP_WAKE_STS points out in
in the final note field in table 4-11 that if this bit is
set to 1 and the system is put into a sleeping state then
the system will not automatically wake.

This bit gets set by hardware to indicate that the system
woke up due to a PCI Express wakeup event, so clear it during
acpi_hw_clear_acpi_status() calls to enable subsequent
resumes to work.

BugLink: http://bugs.launchpad.net/bugs/613381Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2c41bbda

b44: fix carrier detection on bind · c1faa695

Paul Fertser authored Oct 11, 2010

commit bcf64aa3 upstream.

For carrier detection to work properly when binding the driver with a cable
unplugged, netif_carrier_off() should be called after register_netdev(),
not before.
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

c1faa695

powerpc: Don't use kernel stack with translation off · 84d7536c

Michael Neuling authored Aug 25, 2010

commit 54a83404 upstream.

In f761622e we changed
early_setup_secondary so it's called using the proper kernel stack
rather than the emergency one.

Unfortunately, this stack pointer can't be used when translation is off
on PHYP as this stack pointer might be outside the RMO.  This results in
the following on all non zero cpus:
  cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10]
      pc: 000000000001c50c
      lr: 000000000000821c
      sp: c00000001639ff90
     msr: 8000000000001000
     dar: c00000001639ffa0
   dsisr: 42000000
    current = 0xc000000016393540
    paca    = 0xc000000006e00200
      pid   = 0, comm = swapper

The original patch was only tested on bare metal system, so it never
caught this problem.

This changes __secondary_start so that we calculate the new stack
pointer but only start using it after we've called early_setup_secondary.

With this patch, the above problem goes away.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

84d7536c

powerpc: Initialise paca->kstack before early_setup_secondary · f9ca496a

Matt Evans authored Aug 12, 2010

commit f761622e upstream.

As early setup calls down to slb_initialize(), we must have kstack
initialised before checking "should we add a bolted SLB entry for our kstack?"

Failing to do so means stack access requires an SLB miss exception to refill
an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text
& static data).  It's not always allowable to take such a miss, and
intermittent crashes will result.

Primary CPUs don't have this issue; an SLB entry is not bolted for their
stack anyway (as that lives within SLB(0)).  This patch therefore only
affects the init of secondaries.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f9ca496a

r6040: Fix multicast list iteration when hash filter is used · e25a2ee6

Ben Hutchings authored Oct 15, 2010

This was fixed in mainline by the interface change made in commit
f9dcbcc9.

After walking the multicast list to set up the hash filter, this
function will walk off the end of the list when filling the
exact-match entries.  This was fixed in mainline by the interface
change made in commit f9dcbcc9.

Reported-by: spamalot@hispeed.ch
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=15355Reported-by: Jason Heeris <jason.heeris@gmail.com>
Reference: http://bugs.debian.org/600155Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e25a2ee6

r6040: fix r6040_multicast_list · e3ddbfba

Florian Fainelli authored Apr 07, 2010

commit 3bcf8229 upstream.

As reported in <https://bugzilla.kernel.org/show_bug.cgi?id=15355>, r6040_
multicast_list currently crashes. This is due a wrong maximum of multicast
entries. This patch fixes the following issues with multicast:

- number of maximum entries if off-by-one (4 instead of 3)

- the writing of the hash table index is not necessary and leads to invalid
values being written into the MCR1 register, so the MAC is simply put in a non
coherent state

- when we exceed the maximum number of mutlticast address, writing the
broadcast address should be done in registers MID_1{L,M,H} instead of
MID_O{L,M,H}, otherwise we would loose the adapter's MAC address

[bwh: Adjust for 2.6.32; should also apply to 2.6.27]
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

e3ddbfba

bsg: fix incorrect device_status value · 6cd36d86

FUJITA Tomonori authored Sep 17, 2010

commit 47897160 upstream.

bsg incorrectly returns sg's masked_status value for device_status.

[jejb: fix up expression logic]
Reported-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

6cd36d86

r8169: allocate with GFP_KERNEL flag when able to sleep · f462e0c4

Stanislaw Gruszka authored Oct 08, 2010

commit aeb19f60 upstream.

We have fedora bug report where driver fail to initialize after
suspend/resume because of memory allocation errors:
https://bugzilla.redhat.com/show_bug.cgi?id=629158

To fix use GFP_KERNEL allocation where possible.
Tested-by: Neal Becker <ndbecker2@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

f462e0c4

skge: add quirk to limit DMA · 1198008f

Stanislaw Gruszka authored Oct 05, 2010

commit 392bd0cb upstream.

Skge devices installed on some Gigabyte motherboards are not able to
perform 64 dma correctly due to board PCI implementation, so limit
DMA to 32bit if such boards are detected.

Bug was reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=447489Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Tested-by: Luya Tshimbalanga <luya@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

1198008f

net: blackhole route should always be recalculated · 2aff10f5

Jianzhao Wang authored Sep 08, 2010

[ Upstream commit ae2688d5 ]

Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error
triggered by IKE for example), hence this kind of route is always
temporary and so we should check if a better route exists for next
packets.
Bug has been introduced by commit d11a4dc1.
Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

2aff10f5

rose: Fix signedness issues wrt. digi count. · be743887

David S. Miller authored Sep 20, 2010

[ Upstream commit 9828e6e6 ]

Just use explicit casts, since we really can't change the
types of structures exported to userspace which have been
around for 15 years or so.
Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

be743887

netxen: dont set skb->truesize · 6a20bf09

Eric Dumazet authored Sep 21, 2010

[ Upstream commit 7e96dc70 ]

skb->truesize is set in core network.

Dont change it unless dealing with fragments.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

6a20bf09

tcp: Fix race in tcp_poll · 898e8970

Tom Marshall authored Sep 20, 2010

[ Upstream commit a4d25803 ]

If a RST comes in immediately after checking sk->sk_err, tcp_poll will
return POLLIN but not POLLOUT.  Fix this by checking sk->sk_err at the end
of tcp_poll.  Additionally, ensure the correct order of operations on SMP
machines with memory barriers.
Signed-off-by: Tom Marshall <tdm.code@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

898e8970

net: clear heap allocations for privileged ethtool actions · 3765ba02

Kees Cook authored Oct 11, 2010

[ Upstream commit b00916b1 ]

Several other ethtool functions leave heap uncleared (potentially) by
drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
are well controlled. In some situations (e.g. unchecked error conditions),
the heap will remain unchanged in areas before copying back to userspace.
Note that these are less of an issue since these all require CAP_NET_ADMIN.

Cc: stable@kernel.org
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

3765ba02

ip: fix truesize mismatch in ip fragmentation · cea06e60

Eric Dumazet authored Sep 21, 2010

[ Upstream commit 3d13008e ]

Special care should be taken when slow path is hit in ip_fragment() :

When walking through frags, we transfert truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.

Many thanks to Nick Bowler, who provided a very clean bug report and
test program.

Thanks to Jarek for reviewing my first patch and providing a V2

While Nick bisection pointed to commit 2b85a34e (net: No more
expensive sock_hold()/sock_put() on each tx), underlying bug is older
(2.6.12-rc5)

A side effect is to extend work done in commit b2722b1c
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

cea06e60

net: Fix IPv6 PMTU disc. w/ asymmetric routes · da9d9968

Maciej Żenczykowski authored Oct 03, 2010

[ Upstream commit ae878ae2 ]
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

da9d9968

Phonet: Correct header retrieval after pskb_may_pull · 81f9ffe4

Kumar Sanghvi authored Sep 27, 2010

[ Upstream commit a91e7d47 ]

Retrieve the header after doing pskb_may_pull since, pskb_may_pull
could change the buffer structure.

This is based on the comment given by Eric Dumazet on Phonet
Pipe controller patch for a similar problem.
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

81f9ffe4

net: Fix the condition passed to sk_wait_event() · a39fcb13

Nagendra Tomar authored Oct 02, 2010

[ Upstream commit 482964e5 ]

This patch fixes the condition (3rd arg) passed to sk_wait_event() in
sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
causes the following soft lockup in tcp_sendmsg() when the global tcp
memory pool has exhausted.

>>> snip <<<

localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200]  [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel:
localhost kernel: Call Trace:
localhost kernel:  [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel:  [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel:  [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel:  [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel:  [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel:  [system_call+0x7e/0x83] system_call+0x7e/0x83

>>> snip <<<

What is happening is, that the sk_wait_event() condition passed from
sk_stream_wait_memory() evaluates to true for the case of tcp global memory
exhaustion. This is because both sk_stream_memory_free() and vm_wait are true
which causes sk_wait_event() to *not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping.
This causes the caller to again try allocation, which again fails and again
calls sk_stream_wait_memory(), and so on.

[ Bug introduced by commit c1cbe4b7
  ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

a39fcb13

tcp: Fix >4GB writes on 64-bit. · 237a9f8f

David S. Miller authored Sep 27, 2010

[ Upstream commit 01db403c ]

Fixes kernel bugzilla #16603

tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write
zero bytes, for example.

There is also the problem higher up of how verify_iovec() works.  It
wants to prevent the total length from looking like an error return
value.

However it does this using 'int', but syscalls return 'long' (and
thus signed 64-bit on 64-bit machines).  So it could trigger
false-positives on 64-bit as written.  So fix it to use 'long'.
Reported-by: Olaf Bonorden <bono@onlinehome.de>
Reported-by: Daniel Büse <dbuese@gmx.de>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

237a9f8f