- 22 Nov, 2010 9 commits
-
-
Andre Przywara authored
commit 7ef8aa72 upstream. The AMD SSE5 feature set as-it has been replaced by some extensions to the AVX instruction set. Thus the bit formerly advertised as SSE5 is re-used for one of these extensions (XOP). Although this changes the /proc/cpuinfo output, it is not user visible, as there are no CPUs (yet) having this feature. To avoid confusion this should be added to the stable series, too. Signed-off-by:
Andre Przywara <andre.przywara@amd.com> LKML-Reference: <1283778860-26843-2-git-send-email-andre.przywara@amd.com> Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Cliff Wickman authored
commit 3ee48b6a upstream. During the reading of /proc/vmcore the kernel is doing ioremap()/iounmap() repeatedly. And the buildup of un-flushed vm_area_struct's is causing a great deal of overhead. (rb_next() is chewing up most of that time). This solution is to provide function set_iounmap_nonlazy(). It causes a subsequent call to iounmap() to immediately purge the vma area (with try_purge_vmap_area_lazy()). With this patch we have seen the time for writing a 250MB compressed dump drop from 71 seconds to 44 seconds. Signed-off-by:
Cliff Wickman <cpw@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: kexec@lists.infradead.org LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Darren Hart authored
commit 7ada876a upstream. futex_wait() is leaking key references due to futex_wait_setup() acquiring an additional reference via the queue_lock() routine. The nested key ref-counting has been masking bugs and complicating code analysis. queue_lock() is only called with a previously ref-counted key, so remove the additional ref-counting from the queue_(un)lock() functions. Also futex_wait_requeue_pi() drops one key reference too many in unqueue_me_pi(). Remove the key reference handling from unqueue_me_pi(). This was paired with a queue_lock() in futex_lock_pi(), so the count remains unchanged. Document remaining nested key ref-counting sites. Signed-off-by:
Darren Hart <dvhart@linux.intel.com> Reported-and-tested-by: Matthieu Fertré<matthieu.fertre@kerlabs.com> Reported-by: Louis Rilling<louis.rilling@kerlabs.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <4CBB17A8.70401@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Alan Cox authored
commit c19483cc upstream. Fortunately this is only exploitable on very unusual hardware. [Reported a while ago but nothing happened so just fixing it] Signed-off-by:
Alan Cox <alan@linux.intel.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Mathieu Desnoyers authored
commit 7740191c upstream. Fix incorrect handling of the following case: INTERACTIVE INTERACTIVE_SOMETHING_ELSE The comparison only checks up to each element's length. Changelog since v1: - Embellish using some Rostedtisms. [ mingo: ^^ == smaller and cleaner ] Signed-off-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by:
Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Lindgren <tony@atomide.com> LKML-Reference: <20100913214700.GB16118@Krystal> Signed-off-by:
Ingo Molnar <mingo@elte.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Vasiliy Kulikov authored
commit 5b917a14 upstream. Structure new_line is copied to userland with some padding fields unitialized. It leads to leaking of stack memory. Signed-off-by:
Vasiliy Kulikov <segooon@gmail.com> Signed-off-by:
Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Paul Mackerras authored
commit 9f5f9ffe upstream. The logic to distinguish marked instruction events from ordinary events on PPC970 and derivatives was flawed. The result is that instruction sampling didn't get enabled in the PMU for some marked instruction events, so they would never trigger. This fixes it by adding the appropriate break statements in the switch statement. Reported-by:
David Binderman <dcb314@hotmail.com> Signed-off-by:
Paul Mackerras <paulus@samba.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Max Vozeler authored
commit 584c5b7c upstream. The way the event handler works can cause it to delay events until eventual wakeup for another event. For example, on device detach (vhci): - Write to sysfs detach file -> usbip_event_add(VDEV_EVENT_DOWN) -> wakeup() #define VDEV_EVENT_DOWN (USBIP_EH_SHUTDOWN | USBIP_EH_RESET). - Event thread wakes up and passes the event to event_handler() to process. - It processes and clears the USBIP_EH_SHUTDOWN flag then returns. - The outer event loop (event_handler_loop()) calls wait_event_interruptible(). The processing of the second flag which is part of VDEV_EVENT_DOWN (USBIP_EH_RESET) did not happen yet. It is delayed until the next event. This means the ->reset callback may not happen for a long time (if ever), leaving the usbip port in a weird state which prevents its reuse. This patch changes the handler to process all flags before waiting for another wakeup. I have verified this change to fix a problem which prevented reattach of a usbip device. It also helps for socket errors which missed the RESET as well. The delayed event processing also affects the stub side of usbip and the error handling there. Signed-off-by:
Max Vozeler <mvz@vozeler.com> Reported-by:
Marco Lancione <marco@optikam.com> Tested-by:
Luc Jalbert <ljalbert@optikam.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Max Vozeler authored
commit 0c9a32f0 upstream. This patch changes vhci to behave like dummy and other hcds when disconnecting a device. Previously detaching a device from the root hub did not notify the usb core of the disconnect and left the device visible. Signed-off-by:
Max Vozeler <mvz@vozeler.com> Reported-by:
Marco Lancione <marco@optikam.com> Tested-by:
Luc Jalbert <ljalbert@optikam.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
- 29 Oct, 2010 31 commits
-
-
Greg Kroah-Hartman authored
-
Stefan Bader authored
commit 39aa3cb3 upstream. So it can be used by all that need to check for that. Signed-off-by:
Stefan Bader <stefan.bader@canonical.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Rémi Denis-Courmont authored
[Solved differently upstream] Network namespace in the Phonet socket stack causes an OOPS when a namespace is destroyed. This occurs as the loopback exit_net handler is called after the Phonet exit_net handler, and re-enters the Phonet stack. I cannot think of any nice way to fix this in kernel <= 2.6.32. For lack of a better solution, disable namespace support completely. If you need that, upgrade to a newer kernel. Signed-off-by:
Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ben Hutchings <ben@decadent.org.uk> Acked-by:
David S. Miller <davem@davemloft.net>
-
Roland McGrath authored
commit 9aea5a65 upstream. An execve with a very large total of argument/environment strings can take a really long time in the execve system call. It runs uninterruptibly to count and copy all the strings. This change makes it abort the exec quickly if sent a SIGKILL. Note that this is the conservative change, to interrupt only for SIGKILL, by using fatal_signal_pending(). It would be perfectly correct semantics to let any signal interrupt the string-copying in execve, i.e. use signal_pending() instead of fatal_signal_pending(). We'll save that change for later, since it could have user-visible consequences, such as having a timer set too quickly make it so that an execve can never complete, though it always happened to work before. Signed-off-by:
Roland McGrath <roland@redhat.com> Reviewed-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Roland McGrath authored
commit 7993bc1f upstream. This adds a preemption point during the copying of the argument and environment strings for execve, in copy_strings(). There is already a preemption point in the count() loop, so this doesn't add any new points in the abstract sense. When the total argument+environment strings are very large, the time spent copying them can be much more than a normal user time slice. So this change improves the interactivity of the rest of the system when one process is doing an execve with very large arguments. Signed-off-by:
Roland McGrath <roland@redhat.com> Reviewed-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Roland McGrath authored
commit 1b528181 upstream. The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not check the size of the argument/environment area on the stack. When it is unworkably large, shift_arg_pages() hits its BUG_ON. This is exploitable with a very large RLIMIT_STACK limit, to create a crash pretty easily. Check that the initial stack is not too large to make it possible to map in any executable. We're not checking that the actual executable (or intepreter, for binfmt_elf) will fit. So those mappings might clobber part of the initial stack mapping. But that is just userland lossage that userland made happen, not a kernel problem. Signed-off-by:
Roland McGrath <roland@redhat.com> Reviewed-by:
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Mike Galbraith authored
[Not upstream in the same way, as it was fixed differently there] 6f6198a7 sched: kill migration thread in CPU_POST_DEAD instead of CPU_DEAD leaves migration threads lying about. Mask out CPU_TASKS_FROZEN. Signed-off-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jacob Pan authored
commit 1dedefd1 upstream. Some extra CPU features such as ARAT is needed in early boot so that x86_init function pointers can be set up properly. http://lkml.org/lkml/2010/5/18/519 At start_kernel() level, this patch moves init_scattered_cpuid_features() from check_bugs() to setup_arch() -> early_cpu_init() which is earlier than platform specific x86_init layer setup. Suggested by HPA. Signed-off-by:
Jacob Pan <jacob.jun.pan@linux.intel.com> LKML-Reference: <1274295685-6774-2-git-send-email-jacob.jun.pan@linux.intel.com> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Zhang Rui authored
commit 337279ce upstream. Disable the Windows Vista (SP1) compatibility for Toshiba P305D. http://bugzilla.kernel.org/show_bug.cgi?id=14736Signed-off-by:
Zhang Rui <rui.zhang@intel.com> Signed-off-by:
Len Brown <len.brown@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Len Brown authored
commit 64a32307 upstream. per comments in the bug report, this entry seems to hurt at much as it helps. https://bugzilla.kernel.org/show_bug.cgi?id=10807Signed-off-by:
Len Brown <len.brown@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Len Brown authored
commit 7a1d602f upstream. https://bugzilla.kernel.org/show_bug.cgi?id=12641Signed-off-by:
Len Brown <len.brown@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Len Brown authored
commit 4731fdcf upstream. When the Lenovo Ideapad S10-3 is booted with HT enabled, it hits a boot hang in the intel_idle driver. This occurs when entering ATM-C4 for the first time, unless BM_STS is first cleared. acpi_idle doesn't see this because it first checks and clears BM_STS, but it would hit the same hang if that check were disabled. http://bugs.meego.com/show_bug.cgi?id=7093 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/634702Signed-off-by:
Len Brown <len.brown@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Colin Ian King authored
commit 573b6381 upstream. Section 4.7.3.1.1 (PM1 Status Registers) of version 4.0 of the ACPI spec concerning PCIEXP_WAKE_STS points out in in the final note field in table 4-11 that if this bit is set to 1 and the system is put into a sleeping state then the system will not automatically wake. This bit gets set by hardware to indicate that the system woke up due to a PCI Express wakeup event, so clear it during acpi_hw_clear_acpi_status() calls to enable subsequent resumes to work. BugLink: http://bugs.launchpad.net/bugs/613381Signed-off-by:
Colin Ian King <colin.king@canonical.com> Signed-off-by:
Len Brown <len.brown@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Paul Fertser authored
commit bcf64aa3 upstream. For carrier detection to work properly when binding the driver with a cable unplugged, netif_carrier_off() should be called after register_netdev(), not before. Signed-off-by:
Paul Fertser <fercerpav@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Michael Neuling authored
commit 54a83404 upstream. In f761622e we changed early_setup_secondary so it's called using the proper kernel stack rather than the emergency one. Unfortunately, this stack pointer can't be used when translation is off on PHYP as this stack pointer might be outside the RMO. This results in the following on all non zero cpus: cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10] pc: 000000000001c50c lr: 000000000000821c sp: c00000001639ff90 msr: 8000000000001000 dar: c00000001639ffa0 dsisr: 42000000 current = 0xc000000016393540 paca = 0xc000000006e00200 pid = 0, comm = swapper The original patch was only tested on bare metal system, so it never caught this problem. This changes __secondary_start so that we calculate the new stack pointer but only start using it after we've called early_setup_secondary. With this patch, the above problem goes away. Signed-off-by:
Michael Neuling <mikey@neuling.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Matt Evans authored
commit f761622e upstream. As early setup calls down to slb_initialize(), we must have kstack initialised before checking "should we add a bolted SLB entry for our kstack?" Failing to do so means stack access requires an SLB miss exception to refill an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text & static data). It's not always allowable to take such a miss, and intermittent crashes will result. Primary CPUs don't have this issue; an SLB entry is not bolted for their stack anyway (as that lives within SLB(0)). This patch therefore only affects the init of secondaries. Signed-off-by:
Matt Evans <matt@ozlabs.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Ben Hutchings authored
This was fixed in mainline by the interface change made in commit f9dcbcc9. After walking the multicast list to set up the hash filter, this function will walk off the end of the list when filling the exact-match entries. This was fixed in mainline by the interface change made in commit f9dcbcc9. Reported-by: spamalot@hispeed.ch Reference: https://bugzilla.kernel.org/show_bug.cgi?id=15355Reported-by:
Jason Heeris <jason.heeris@gmail.com> Reference: http://bugs.debian.org/600155Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Florian Fainelli authored
commit 3bcf8229 upstream. As reported in <https://bugzilla.kernel.org/show_bug.cgi?id=15355>, r6040_ multicast_list currently crashes. This is due a wrong maximum of multicast entries. This patch fixes the following issues with multicast: - number of maximum entries if off-by-one (4 instead of 3) - the writing of the hash table index is not necessary and leads to invalid values being written into the MCR1 register, so the MAC is simply put in a non coherent state - when we exceed the maximum number of mutlticast address, writing the broadcast address should be done in registers MID_1{L,M,H} instead of MID_O{L,M,H}, otherwise we would loose the adapter's MAC address [bwh: Adjust for 2.6.32; should also apply to 2.6.27] Signed-off-by:
Florian Fainelli <florian@openwrt.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
FUJITA Tomonori authored
commit 47897160 upstream. bsg incorrectly returns sg's masked_status value for device_status. [jejb: fix up expression logic] Reported-by:
Douglas Gilbert <dgilbert@interlog.com> Signed-off-by:
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by:
James Bottomley <James.Bottomley@suse.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Stanislaw Gruszka authored
commit aeb19f60 upstream. We have fedora bug report where driver fail to initialize after suspend/resume because of memory allocation errors: https://bugzilla.redhat.com/show_bug.cgi?id=629158 To fix use GFP_KERNEL allocation where possible. Tested-by:
Neal Becker <ndbecker2@gmail.com> Signed-off-by:
Stanislaw Gruszka <sgruszka@redhat.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Stanislaw Gruszka authored
commit 392bd0cb upstream. Skge devices installed on some Gigabyte motherboards are not able to perform 64 dma correctly due to board PCI implementation, so limit DMA to 32bit if such boards are detected. Bug was reported here: https://bugzilla.redhat.com/show_bug.cgi?id=447489Signed-off-by:
Stanislaw Gruszka <sgruszka@redhat.com> Tested-by:
Luya Tshimbalanga <luya@fedoraproject.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Jianzhao Wang authored
[ Upstream commit ae2688d5 ] Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error triggered by IKE for example), hence this kind of route is always temporary and so we should check if a better route exists for next packets. Bug has been introduced by commit d11a4dc1. Signed-off-by:
Jianzhao Wang <jianzhao.wang@6wind.com> Signed-off-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
David S. Miller authored
[ Upstream commit 9828e6e6 ] Just use explicit casts, since we really can't change the types of structures exported to userspace which have been around for 15 years or so. Reported-by:
Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric Dumazet authored
[ Upstream commit 7e96dc70 ] skb->truesize is set in core network. Dont change it unless dealing with fragments. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Tom Marshall authored
[ Upstream commit a4d25803 ] If a RST comes in immediately after checking sk->sk_err, tcp_poll will return POLLIN but not POLLOUT. Fix this by checking sk->sk_err at the end of tcp_poll. Additionally, ensure the correct order of operations on SMP machines with memory barriers. Signed-off-by:
Tom Marshall <tdm.code@gmail.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Kees Cook authored
[ Upstream commit b00916b1 ] Several other ethtool functions leave heap uncleared (potentially) by drivers. Some interfaces appear safe (eeprom, etc), in that the sizes are well controlled. In some situations (e.g. unchecked error conditions), the heap will remain unchanged in areas before copying back to userspace. Note that these are less of an issue since these all require CAP_NET_ADMIN. Cc: stable@kernel.org Signed-off-by:
Kees Cook <kees.cook@canonical.com> Acked-by:
Ben Hutchings <bhutchings@solarflare.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Eric Dumazet authored
[ Upstream commit 3d13008e ] Special care should be taken when slow path is hit in ip_fragment() : When walking through frags, we transfert truesize ownership from skb to frags. Then if we hit a slow_path condition, we must undo this or risk uncharging frags->truesize twice, and in the end, having negative socket sk_wmem_alloc counter, or even freeing socket sooner than expected. Many thanks to Nick Bowler, who provided a very clean bug report and test program. Thanks to Jarek for reviewing my first patch and providing a V2 While Nick bisection pointed to commit 2b85a34e (net: No more expensive sock_hold()/sock_put() on each tx), underlying bug is older (2.6.12-rc5) A side effect is to extend work done in commit b2722b1c (ip_fragment: also adjust skb->truesize for packets not owned by a socket) to ipv6 as well. Reported-and-bisected-by:
Nick Bowler <nbowler@elliptictech.com> Tested-by:
Nick Bowler <nbowler@elliptictech.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Maciej Żenczykowski authored
[ Upstream commit ae878ae2 ] Signed-off-by:
Maciej Żenczykowski <maze@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Kumar Sanghvi authored
[ Upstream commit a91e7d47 ] Retrieve the header after doing pskb_may_pull since, pskb_may_pull could change the buffer structure. This is based on the comment given by Eric Dumazet on Phonet Pipe controller patch for a similar problem. Signed-off-by:
Kumar Sanghvi <kumar.sanghvi@stericsson.com> Acked-by:
Linus Walleij <linus.walleij@stericsson.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Acked-by:
Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
Nagendra Tomar authored
[ Upstream commit 482964e5 ] This patch fixes the condition (3rd arg) passed to sk_wait_event() in sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory() causes the following soft lockup in tcp_sendmsg() when the global tcp memory pool has exhausted. >>> snip <<< localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429] localhost kernel: CPU 3: localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200 localhost kernel: localhost kernel: Call Trace: localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0 localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140 localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170 localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190 localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90 localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83 >>> snip <<< What is happening is, that the sk_wait_event() condition passed from sk_stream_wait_memory() evaluates to true for the case of tcp global memory exhaustion. This is because both sk_stream_memory_free() and vm_wait are true which causes sk_wait_event() to *not* call schedule_timeout(). Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping. This causes the caller to again try allocation, which again fails and again calls sk_stream_wait_memory(), and so on. [ Bug introduced by commit c1cbe4b7 ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ] Signed-off-by:
Nagendra Singh Tomar <tomer_iisc@yahoo.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-
David S. Miller authored
[ Upstream commit 01db403c ] Fixes kernel bugzilla #16603 tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write zero bytes, for example. There is also the problem higher up of how verify_iovec() works. It wants to prevent the total length from looking like an error return value. However it does this using 'int', but syscalls return 'long' (and thus signed 64-bit on 64-bit machines). So it could trigger false-positives on 64-bit as written. So fix it to use 'long'. Reported-by:
Olaf Bonorden <bono@onlinehome.de> Reported-by:
Daniel Büse <dbuese@gmx.de> Reported-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@suse.de>
-