- 07 Mar, 2014 40 commits
-
-
Simon Wunderlich authored
[ Upstream commit b2262df7 ] Since batadv_orig_node_new() sets the refcount to two, assuming that the calling function will use a reference for putting the orig_node into a hash or similar, both references must be freed if initialization of the orig_node fails. Otherwise that object may be leaked in that error case. Reported-by:
Antonio Quartulli <antonio@meshcoding.com> Signed-off-by:
Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by:
Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by:
Antonio Quartulli <antonio@meshcoding.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Antonio Quartulli authored
[ Upstream commit 08bf0ed2 ] When adding a new neighbour it is important to atomically perform the following: - check if the neighbour already exists - append the neighbour to the proper list If the two operations are not performed in an atomic context it is possible that two concurrent insertions add the same neighbour twice. Signed-off-by:
Antonio Quartulli <antonio@open-mesh.com> Signed-off-by:
Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Antonio Quartulli authored
[ Upstream commit f1791425 ] pskb_may_pull() returns 1 on success and 0 in case of failure, therefore checking for the return value being negative does not make sense at all. This way if the function fails we will probably read beyond the current skb data buffer. Fix this by doing the proper check. Signed-off-by:
Antonio Quartulli <antonio@meshcoding.com> Signed-off-by:
Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Antonio Quartulli authored
[ Upstream commit 91c2b1a9 ] There is a refcounter unbalance in the CRC checking routine invoked on OGM reception. A vlan object is retrieved (thus its refcounter is increased by one) but it is never properly released. This leads to a memleak because the vlan object will never be free'd. Fix this by releasing the vlan object after having read the CRC. Reported-by:
Russell Senior <russell@personaltelco.net> Reported-by:
Daniel <daniel@makrotopia.org> Reported-by:
cmsv <cmsv@wirelesspt.net> Signed-off-by:
Antonio Quartulli <antonio@meshcoding.com> Signed-off-by:
Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Antonio Quartulli authored
[ Upstream commit e889241f ] When accessing a TT-TVLV container in the OGM RX path the variable pointing to the list of changes to apply is altered by mistake. This makes the TT component read data at the wrong position in the OGM packet buffer. Fix it by removing the bogus pointer alteration. Signed-off-by:
Antonio Quartulli <antonio@meshcoding.com> Signed-off-by:
Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Antonio Quartulli authored
[ Upstream commit 930cd6e4 ] The current MTU computation always returns a value smaller than 1500bytes even if the real interfaces have an MTU large enough to compensate the batman-adv overhead. Fix the computation by properly returning the highest admitted value. Introduced by a19d3d85 ("batman-adv: limit local translation table max size") Reported-by:
Russell Senior <russell@personaltelco.net> Signed-off-by:
Antonio Quartulli <antonio@meshcoding.com> Signed-off-by:
Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit ed98df33 ] sock_alloc_send_pskb() & sk_page_frag_refill() have a loop trying high order allocations to prepare skb with low number of fragments as this increases performance. Problem is that under memory pressure/fragmentation, this can trigger OOM while the intent was only to try the high order allocations, then fallback to order-0 allocations. We had various reports from unexpected regressions. According to David, setting __GFP_NORETRY should be fine, as the asynchronous compaction is still enabled, and this will prevent OOM from kicking as in : CFSClientEventm invoked oom-killer: gfp_mask=0x42d0, order=3, oom_adj=0, oom_score_adj=0, oom_score_badness=2 (enabled),memcg_scoring=disabled CFSClientEventm Call Trace: [<ffffffff8043766c>] dump_header+0xe1/0x23e [<ffffffff80437a02>] oom_kill_process+0x6a/0x323 [<ffffffff80438443>] out_of_memory+0x4b3/0x50d [<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7 [<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0 [<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0 [<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160 [<ffffffff80295fa0>] tcp_sendmsg+0x560/0xee0 [<ffffffff802a5037>] inet_sendmsg+0x67/0x100 [<ffffffff80283c9c>] __sock_sendmsg_nosec+0x6c/0x90 [<ffffffff80283e85>] sock_sendmsg+0xc5/0xf0 [<ffffffff802847b6>] __sys_sendmsg+0x136/0x430 [<ffffffff80284ec8>] sys_sendmsg+0x88/0x110 [<ffffffff80711472>] system_call_fastpath+0x16/0x1b Out of Memory: Kill process 2856 (bash) score 9999 or sacrifice child Signed-off-by:
Eric Dumazet <edumazet@google.com> Acked-by:
David Rientjes <rientjes@google.com> Acked-by:
"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
willy tarreau authored
[ Upstream commit 71f6d1b3 ] Right now the mvneta driver doesn't handle Tx IRQ, and relies on two mechanisms to flush Tx descriptors : a flush at the end of mvneta_tx() and a timer. If a burst of packets is emitted faster than the device can send them, then the queue is stopped until next wake-up of the timer 10ms later. This causes jerky output traffic with bursts and pauses, making it difficult to reach line rate with very few streams. A test on UDP traffic shows that it's not possible to go beyond 134 Mbps / 12 kpps of outgoing traffic with 1500-bytes IP packets. Routed traffic tends to observe pauses as well if the traffic is bursty, making it even burstier after the wake-up. It seems that this feature was inherited from the original driver but nothing there mentions any reason for not using the interrupt instead, which the chip supports. Thus, this patch enables Tx interrupts and removes the timer. It does the two at once because it's not really possible to make the two mechanisms coexist, so a split patch doesn't make sense. First tests performed on a Mirabox (Armada 370) show that less CPU seems to be used when sending traffic. One reason might be that we now call the mvneta_tx_done_gbe() with a mask indicating which queues have been done instead of looping over all of them. The same UDP test above now happily reaches 987 Mbps / 87.7 kpps. Single-stream TCP traffic can now more easily reach line rate. HTTP transfers of 1 MB objects over a single connection went from 730 to 840 Mbps. It is even possible to go significantly higher (>900 Mbps) by tweaking tcp_tso_win_divisor. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Cc: Arnaud Ebalard <arno@natisbad.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Tested-by:
Arnaud Ebalard <arno@natisbad.org> Signed-off-by:
Willy Tarreau <w@1wt.eu> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
willy tarreau authored
[ Upstream commit 40ba35e7 ] Marvell has not published the chip's datasheet yet, so it's very hard to find the relevant bits to manipulate to change the IRQ behaviour. Fortunately, these bits are described in the proprietary LSP patch set which is publicly available here : http://www.plugcomputer.org/downloads/mirabox/ So let's put them back in the driver in order to reduce the burden of current and future maintenance. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by:
Arnaud Ebalard <arno@natisbad.org> Signed-off-by:
Willy Tarreau <w@1wt.eu> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
willy tarreau authored
[ Upstream commit 29021366 ] If a queue timeout is reported, we can oops because of some schedules while the caller is atomic, as shown below : mvneta d0070000.ethernet eth0: tx timeout BUG: scheduling while atomic: bash/1528/0x00000100 Modules linked in: slhttp_ethdiv(C) [last unloaded: slhttp_ethdiv] CPU: 2 PID: 1528 Comm: bash Tainted: G WC 3.13.0-rc4-mvebu-nf #180 [<c0011bd9>] (unwind_backtrace+0x1/0x98) from [<c000f1ab>] (show_stack+0xb/0xc) [<c000f1ab>] (show_stack+0xb/0xc) from [<c02ad323>] (dump_stack+0x4f/0x64) [<c02ad323>] (dump_stack+0x4f/0x64) from [<c02abe67>] (__schedule_bug+0x37/0x4c) [<c02abe67>] (__schedule_bug+0x37/0x4c) from [<c02ae261>] (__schedule+0x325/0x3ec) [<c02ae261>] (__schedule+0x325/0x3ec) from [<c02adb97>] (schedule_timeout+0xb7/0x118) [<c02adb97>] (schedule_timeout+0xb7/0x118) from [<c0020a67>] (msleep+0xf/0x14) [<c0020a67>] (msleep+0xf/0x14) from [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194) [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194) from [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24) [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24) from [<c024afc7>] (dev_watchdog+0x18b/0x1c4) [<c024afc7>] (dev_watchdog+0x18b/0x1c4) from [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c) [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c) from [<c0020cad>] (run_timer_softirq+0x115/0x170) [<c0020cad>] (run_timer_softirq+0x115/0x170) from [<c001ccb9>] (__do_softirq+0xbd/0x1a8) [<c001ccb9>] (__do_softirq+0xbd/0x1a8) from [<c001cfad>] (irq_exit+0x61/0x98) [<c001cfad>] (irq_exit+0x61/0x98) from [<c000d4bf>] (handle_IRQ+0x27/0x60) [<c000d4bf>] (handle_IRQ+0x27/0x60) from [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8) [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8) from [<c000fba9>] (__irq_usr+0x49/0x60) Ben Hutchings attempted to propose a better fix consisting in using a scheduled work for this, but while it fixed this panic, it caused other random freezes and panics proving that the reset sequence in the driver is unreliable and that additional fixes should be investigated. When sending multiple streams over a link limited to 100 Mbps, Tx timeouts happen from time to time, and the driver correctly recovers only when the function is disabled. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Cc: Ben Hutchings <ben@decadent.org.uk> Tested-by:
Arnaud Ebalard <arno@natisbad.org> Signed-off-by:
Willy Tarreau <w@1wt.eu> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
willy tarreau authored
[ Upstream commit 74c41b04 ] Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything when they update the stats, and as a result, it randomly happens that the stats freeze on SMP if two updates happen during stats retrieval. This is very easily reproducible by starting two HTTP servers and binding each of them to a different CPU, then consulting /proc/net/dev in loops during transfers, the interface should immediately lock up. This issue also randomly happens upon link state changes during transfers, because the stats are collected in this situation, but it takes more attempts to reproduce it. The comments in netdevice.h suggest using per_cpu stats instead to get rid of this issue. This patch implements this. It merges both rx_stats and tx_stats into a single "stats" member with a single syncp. Both mvneta_rx() and mvneta_rx() now only update the a single CPU's counters. In turn, mvneta_get_stats64() does the summing by iterating over all CPUs to get their respective stats. With this change, stats are still correct and no more lockup is encountered. Note that this bug was present since the first import of the mvneta driver. It might make sense to backport it to some stable trees. If so, it depends on "d33dc73 net: mvneta: increase the 64-bit rx/tx stats out of the hot path". Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Tested-by:
Arnaud Ebalard <arno@natisbad.org> Signed-off-by:
Willy Tarreau <w@1wt.eu> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
willy tarreau authored
[ Upstream commit dc4277dd ] Better count packets and bytes in the stack and on 32 bit then accumulate them at the end for once. This saves two memory writes and two memory barriers per packet. The incoming packet rate was increased by 4.7% on the Openblocks AX3 thanks to this. Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Tested-by:
Arnaud Ebalard <arno@natisbad.org> Signed-off-by:
Willy Tarreau <w@1wt.eu> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Westphal authored
commit fe6cc55f upstream. Marcelo Ricardo Leitner reported problems when the forwarding link path has a lower mtu than the incoming one if the inbound interface supports GRO. Given: Host <mtu1500> R1 <mtu1200> R2 Host sends tcp stream which is routed via R1 and R2. R1 performs GRO. In this case, the kernel will fail to send ICMP fragmentation needed messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu checks in forward path. Instead, Linux tries to send out packets exceeding the mtu. When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does not fragment the packets when forwarding, and again tries to send out packets exceeding R1-R2 link mtu. This alters the forwarding dstmtu checks to take the individual gso segment lengths into account. For ipv6, we send out pkt too big error for gso if the individual segments are too big. For ipv4, we either send icmp fragmentation needed, or, if the DF bit is not set, perform software segmentation and let the output path create fragments when the packet is leaving the machine. It is not 100% correct as the error message will contain the headers of the GRO skb instead of the original/segmented one, but it seems to work fine in my (limited) tests. Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid sofware segmentation. However it turns out that skb_segment() assumes skb nr_frags is related to mss size so we would BUG there. I don't want to mess with it considering Herbert and Eric disagree on what the correct behavior should be. Hannes Frederic Sowa notes that when we would shrink gso_size skb_segment would then also need to deal with the case where SKB_MAX_FRAGS would be exceeded. This uses sofware segmentation in the forward path when we hit ipv4 non-DF packets and the outgoing link mtu is too small. Its not perfect, but given the lack of bug reports wrt. GRO fwd being broken this is a rare case anyway. Also its not like this could not be improved later once the dust settles. Acked-by:
Herbert Xu <herbert@gondor.apana.org.au> Reported-by:
Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Westphal authored
commit d2069403 upstream. Will be used by upcoming ipv4 forward path change that needs to determine feature mask using skb->dst->dev instead of skb->dev. Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Florian Westphal authored
commit de960aa9 upstream. This moves part of Eric Dumazets skb_gso_seglen helper from tbf sched to skbuff core so it may be reused by upcoming ip forwarding path patch. Signed-off-by:
Florian Westphal <fw@strlen.de> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Borkmann authored
[ Upstream commit ffd59393 ] SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of 'struct sctp_getaddrs_old' which includes a struct sockaddr pointer, sizeof(param) check will always fail in kernel as the structure in 64bit kernel space is 4bytes larger than for user binaries compiled in 32bit mode. Thus, applications making use of sctp_connectx() won't be able to run under such circumstances. Introduce a compat interface in the kernel to deal with such situations by using a 'struct compat_sctp_getaddrs_old' structure where user data is copied into it, and then sucessively transformed into a 'struct sctp_getaddrs_old' structure with the help of compat_ptr(). That fixes sctp_connectx() abi without any changes needed in user space, and lets the SCTP test suite pass when compiled in 32bit and run on 64bit kernels. Fixes: f9c67811 ("sctp: Fix regression introduced by new sctp_connectx api") Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Acked-by:
Neil Horman <nhorman@tuxdriver.com> Acked-by:
Vlad Yasevich <vyasevich@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Duan Jiong authored
[ Upstream commit a6254864 ] since commit 89aef892("ipv4: Delete routing cache."), the counter in_slow_tot can't work correctly. The counter in_slow_tot increase by one when fib_lookup() return successfully in ip_route_input_slow(), but actually the dst struct maybe not be created and cached, so we can increase in_slow_tot after the dst struct is created. Signed-off-by:
Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Bohac authored
[ Upstream commit 163c8ff3 ] aggregator_identifier is used to assign unique aggregator identifiers to aggregators of a bond during device enslaving. aggregator_identifier is currently a global variable that is zeroed in bond_3ad_initialize(). This sequence will lead to duplicate aggregator identifiers for eth1 and eth3: create bond0 change bond0 mode to 802.3ad enslave eth0 to bond0 //eth0 gets agg id 1 enslave eth1 to bond0 //eth1 gets agg id 2 create bond1 change bond1 mode to 802.3ad enslave eth2 to bond1 //aggregator_identifier is reset to 0 //eth2 gets agg id 1 enslave eth3 to bond0 //eth3 gets agg id 2 Fix this by making aggregator_identifier private to the bond. Signed-off-by:
Jiri Bohac <jbohac@suse.cz> Acked-by:
Veaceslav Falico <vfalico@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Emil Goode authored
[ Upstream commit eb85569f ] This patch removes a generic hard_header_len check from the usbnet module that is causing dropped packages under certain circumstances for devices that send rx packets that cross urb boundaries. One example is the AX88772B which occasionally send rx packets that cross urb boundaries where the remaining partial packet is sent with no hardware header. When the buffer with a partial packet is of less number of octets than the value of hard_header_len the buffer is discarded by the usbnet module. With AX88772B this can be reproduced by using ping with a packet size between 1965-1976. The bug has been reported here: https://bugzilla.kernel.org/show_bug.cgi?id=29082 This patch introduces the following changes: - Removes the generic hard_header_len check in the rx_complete function in the usbnet module. - Introduces a ETH_HLEN check for skbs that are not cloned from within a rx_fixup callback. - For safety a hard_header_len check is added to each rx_fixup callback function that could be affected by this change. These extra checks could possibly be removed by someone who has the hardware to test. - Removes a call to dev_kfree_skb_any() and instead utilizes the dev->done list to queue skbs for cleanup. The changes place full responsibility on the rx_fixup callback functions that clone skbs to only pass valid skbs to the usbnet_skb_return function. Signed-off-by:
Emil Goode <emilgoode@gmail.com> Reported-by:
Igor Gnatenko <i.gnatenko.brain@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nicolas Dichtel authored
[ Upstream commit 08b44656 ] This bug was reported by Steinar H. Gunderson and was introduced by commit f7cb8886 ("sit/gre6: don't try to add the same route two times"). root@morgental:~# ip tunnel add foo mode gre remote 1.2.3.4 ttl 64 root@morgental:~# ip link set foo up mtu 1468 root@morgental:~# ip -6 route show dev foo fe80::/64 proto kernel metric 256 but after the above commit, no such route shows up. There is no link local route because dev->dev_addr is 0 (because local ipv4 address is 0), hence no link local address is configured. In this scenario, the link local address is added manually: 'ip -6 addr add fe80::1 dev foo' and because prefix is /128, no link local route is added by the kernel. Even if the right things to do is to add the link local address with a /64 prefix, we need to restore the previous behavior to avoid breaking userpace. Reported-by:
Steinar H. Gunderson <sesse@samfundet.no> Signed-off-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Emil Goode authored
[ Upstream commit d43ff4cd ] The struct driver_info ax88178_info is assigned the function asix_rx_fixup_common as it's rx_fixup callback. This means that FLAG_MULTI_PACKET must be set as this function is cloning the data and calling usbnet_skb_return. Not setting this flag leads to usbnet_skb_return beeing called a second time from within the rx_process function in the usbnet module. Signed-off-by:
Emil Goode <emilgoode@gmail.com> Reported-by:
Bjørn Mork <bjorn@mork.no> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Haiyang Zhang authored
[ Upstream commit 891de74d ] Without this patch, the "cat /sys/class/net/ethN/operstate" shows "unknown", and "ethtool ethN" shows "Link detected: yes", when VM boots up with or without vNIC connected. This patch fixed the problem. Signed-off-by:
Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by:
K. Y. Srinivasan <kys@microsoft.com> Acked-by:
Jason Wang <jasowang@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michael S. Tsirkin authored
[ Upstream commit 0ad8b480 ] vhost checked the counter within the refcnt before decrementing. It really wanted to know that it is the one that has the last reference, as a way to batch freeing resources a bit more efficiently. Note: we only let refcount go to 0 on device release. This works well but we now access the ref counter twice so there's a race: all users might see a high count and decide to defer freeing resources. In the end no one initiates freeing resources until the last reference is gone (which is on VM shotdown so might happen after a looooong time). Let's do what we probably should have done straight away: switch from kref to plain atomic, documenting the semantics, return the refcount value atomically after decrement, then use that to avoid the deadlock. Reported-by:
Qin Chuanyu <qinchuanyu@huawei.com> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Acked-by:
Jason Wang <jasowang@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nithin Sujir authored
[ Upstream commit c6993dfd ] Quoting David Vrabel - "5780 cards cannot have jumbo frames and TSO enabled together. When jumbo frames are enabled by setting the MTU, the TSO feature must be cleared. This is done indirectly by calling netdev_update_features() which will call tg3_fix_features() to actually clear the flags. netdev_update_features() will also trigger a new netlink message for the feature change event which will result in a call to tg3_get_stats64() which deadlocks on the tg3 lock." tg3_set_mtu() does not need to be under the tg3 lock since converting the flags to use set_bit(). Move it out to after tg3_netif_stop(). Reported-by:
David Vrabel <david.vrabel@citrix.com> Tested-by:
David Vrabel <david.vrabel@citrix.com> Signed-off-by:
Michael Chan <mchan@broadcom.com> Signed-off-by:
Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
John Ogness authored
[ Upstream commit bf06200e ] Commit 46d3ceab ("tcp: TCP Small Queues") introduced a possible regression for applications using TCP_NODELAY. If TCP session is throttled because of tsq, we should consult tp->nonagle when TX completion is done and allow us to send additional segment, especially if this segment is not a full MSS. Otherwise this segment is sent after an RTO. [edumazet] : Cooked the changelog, added another fix about testing sk_wmem_alloc twice because TX completion can happen right before setting TSQ_THROTTLED bit. This problem is particularly visible with recent auto corking, but might also be triggered with low tcp_limit_output_bytes values or NIC drivers delaying TX completion by hundred of usec, and very low rtt. Thomas Glanzmann for example reported an iscsi regression, caused by tcp auto corking making this bug quite visible. Fixes: 46d3ceab ("tcp: TCP Small Queues") Signed-off-by:
John Ogness <john.ogness@linutronix.de> Signed-off-by:
Eric Dumazet <edumazet@google.com> Reported-by:
Thomas Glanzmann <thomas@glanzmann.de> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Bjørn Mork authored
[ Upstream commit fbd3a77d ] This device was mentioned in an OpenWRT forum. Seems to have a "standard" Sierra Wireless ifnumber to function layout: 0: qcdm 2: nmea 3: modem 8: qmi 9: storage Signed-off-by:
Bjørn Mork <bjorn@mork.no> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sabrina Dubroca authored
[ Upstream commit 00fe11b3 ] Currently, to make netconsole start over IPv6, the source address needs to be specified. Without a source address, netpoll_parse_options assumes we're setting up over IPv4 and the destination IPv6 address is rejected. Check if the IP version has been forced by a source address before checking for a version mismatch when parsing the destination address. Signed-off-by:
Sabrina Dubroca <sd@queasysnail.net> Acked-by:
Cong Wang <cwang@twopensource.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Maciej Żenczykowski authored
[ Upstream commit 946c032e ] ip rules with iif/oif references do not update: (detach/attach) across interface renames. Signed-off-by:
Maciej Żenczykowski <maze@google.com> CC: Willem de Bruijn <willemb@google.com> CC: Eric Dumazet <edumazet@google.com> CC: Chris Davis <chrismd@google.com> CC: Carlo Contavalli <ccontavalli@google.com> Google-Bug-Id: 12936021 Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geert Uytterhoeven authored
[ Upstream commit 63b5f152 ] On m68k/ARAnyM: WARNING: CPU: 0 PID: 407 at net/ipv4/devinet.c:1599 0x316a99() Modules linked in: CPU: 0 PID: 407 Comm: ifconfig Not tainted 3.13.0-atari-09263-g0c71d68014d1 #1378 Stack from 10c4fdf0: 10c4fdf0 002ffabb 000243e8 00000000 008ced6c 00024416 00316a99 0000063f 00316a99 00000009 00000000 002501b4 00316a99 0000063f c0a86117 00000080 c0a86117 00ad0c90 00250a5a 00000014 00ad0c90 00000000 00000000 00000001 00b02dd0 00356594 00000000 00356594 c0a86117 eff6c9e4 008ced6c 00000002 008ced60 0024f9b4 00250b52 00ad0c90 00000000 00000000 00252390 00ad0c90 eff6c9e4 0000004f 00000000 00000000 eff6c9e4 8000e25c eff6c9e4 80001020 Call Trace: [<000243e8>] warn_slowpath_common+0x52/0x6c [<00024416>] warn_slowpath_null+0x14/0x1a [<002501b4>] rtmsg_ifa+0xdc/0xf0 [<00250a5a>] __inet_insert_ifa+0xd6/0x1c2 [<0024f9b4>] inet_abc_len+0x0/0x42 [<00250b52>] inet_insert_ifa+0xc/0x12 [<00252390>] devinet_ioctl+0x2ae/0x5d6 Adding some debugging code reveals that net_fill_ifaddr() fails in put_cacheinfo(skb, ifa->ifa_cstamp, ifa->ifa_tstamp, preferred, valid)) nla_put complains: lib/nlattr.c:454: skb_tailroom(skb) = 12, nla_total_size(attrlen) = 20 Apparently commit 5c766d64 ("ipv4: introduce address lifetime") forgot to take into account the addition of struct ifa_cacheinfo in inet_nlmsg_size(). Hence add it, like is already done for ipv6. Suggested-by:
Cong Wang <cwang@twopensource.com> Signed-off-by:
Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by:
Cong Wang <cwang@twopensource.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Oliver Hartkopp authored
[ Upstream commit 0ae89beb ] Self generated skbuffs in net/can/bcm.c are setting a skb->sk reference but no explicit destructor which is enforced since Linux 3.11 with commit 376c7311 (net: add a temporary sanity check in skb_orphan()). This patch adds some helper functions to make sure that a destructor is properly defined when a sock reference is assigned to a CAN related skb. To create an unshared skb owned by the original sock a common helper function has been introduced to replace open coded functions to create CAN echo skbs. Signed-off-by:
Oliver Hartkopp <socketcan@hartkopp.net> Tested-by:
Andre Naujoks <nautsch2@gmail.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Cong Wang authored
[ Upstream commit dbe17307 ] Commit 93d8bf9f ("bridge: cleanup netpoll code") introduced a check in br_netpoll_enable(), but this check is incorrect for br_netpoll_setup(). This patch moves the code after the check into __br_netpoll_enable() and calls it in br_netpoll_setup(). For br_add_if(), the check is still needed. Fixes: 93d8bf9f ("bridge: cleanup netpoll code") Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by:
Cong Wang <cwang@twopensource.com> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Acked-by:
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Tested-by:
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Richard Yao authored
[ Upstream commit b6f52ae2 ] The 9p-virtio transport does zero copy on things larger than 1024 bytes in size. It accomplishes this by returning the physical addresses of pages to the virtio-pci device. At present, the translation is usually a bit shift. That approach produces an invalid page address when we read/write to vmalloc buffers, such as those used for Linux kernel modules. Any attempt to load a Linux kernel module from 9p-virtio produces the following stack. [<ffffffff814878ce>] p9_virtio_zc_request+0x45e/0x510 [<ffffffff814814ed>] p9_client_zc_rpc.constprop.16+0xfd/0x4f0 [<ffffffff814839dd>] p9_client_read+0x15d/0x240 [<ffffffff811c8440>] v9fs_fid_readn+0x50/0xa0 [<ffffffff811c84a0>] v9fs_file_readn+0x10/0x20 [<ffffffff811c84e7>] v9fs_file_read+0x37/0x70 [<ffffffff8114e3fb>] vfs_read+0x9b/0x160 [<ffffffff81153571>] kernel_read+0x41/0x60 [<ffffffff810c83ab>] copy_module_from_fd.isra.34+0xfb/0x180 Subsequently, QEMU will die printing: qemu-system-x86_64: virtio: trying to map MMIO memory This patch enables 9p-virtio to correctly handle this case. This not only enables us to load Linux kernel modules off virtfs, but also enables ZFS file-based vdevs on virtfs to be used without killing QEMU. Special thanks to both Avi Kivity and Alexander Graf for their interpretation of QEMU backtraces. Without their guidence, tracking down this bug would have taken much longer. Also, special thanks to Linus Torvalds for his insightful explanation of why this should use is_vmalloc_addr() instead of is_vmalloc_or_module_addr(): https://lkml.org/lkml/2014/2/8/272Signed-off-by:
Richard Yao <ryao@gentoo.org> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Dumazet authored
[ Upstream commit 20e7c4e8 ] When a device ndo_start_xmit() calls again dev_queue_xmit(), lockdep can complain because dev_queue_xmit() is re-entered and the spinlocks protecting tx queues share a common lockdep class. Same issue was fixed for bonding/l2tp/ppp in commits 0daa2303 ("[PATCH] bonding: lockdep annotation") 49ee4920 ("bonding: set qdisc_tx_busylock to avoid LOCKDEP splat") 23d3b8bf ("net: qdisc busylock needs lockdep annotations ") 303c07db ("ppp: set qdisc_tx_busylock to avoid LOCKDEP splat ") Reported-by:
Alexander Aring <alex.aring@gmail.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Tested-by:
Alexander Aring <alex.aring@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andy Adamson authored
commit 146d70ca upstream. Do not return an error when nfs4_copy_delegation_stateid succeeds. Signed-off-by:
Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com Fixes: ef1820f9 (NFSv4: Don't try to recover NFSv4 locks when...) Cc: NeilBrown <neilb@suse.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Trond Myklebust authored
commit fd1defc2 upstream. Commit aa9c2669 (NFS: Client implementation of Labeled-NFS) introduces a performance regression. When nfs_zap_caches_locked is called, it sets the NFS_INO_INVALID_LABEL flag irrespectively of whether or not the NFS server supports security labels. Since that flag is never cleared, it means that all calls to nfs_revalidate_inode() will now trigger an on-the-wire GETATTR call. This patch ensures that we never set the NFS_INO_INVALID_LABEL unless the server advertises support for labeled NFS. It also causes nfs_setsecurity() to clear NFS_INO_INVALID_LABEL when it has successfully set the security label for the inode. Finally it gets rid of the NFS_INO_INVALID_LABEL cruft from nfs_update_inode, which has nothing to do with labeled NFS. Reported-by:
Neil Brown <neilb@suse.de> Tested-by:
Neil Brown <neilb@suse.de> Signed-off-by:
Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Olivier Langlois authored
commit f78bccd7 upstream. rtl8192ce is disabling for too long the local interrupts during hw initiatialisation when performing scans The observable symptoms in dmesg can be: - underruns from ALSA playback - clock freezes (tstamps do not change for several dmesg entries until irqs are finaly reenabled): [ 250.817669] rtlwifi:rtl_op_config():<0-0-0> 0x100 [ 250.817685] rtl8192ce:_rtl92ce_phy_set_rf_power_state():<0-1-0> IPS Set eRf nic enable [ 250.817732] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.817796] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.817910] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818024] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818139] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818253] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818367] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11 [ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:98053f15:10 [ 250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1 [ 250.818472] rtl8192c_common:rtl92c_download_fw():<0-1-0> Firmware Version(49), Signature(0x88c1),Size(32) [ 250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> PairwiseEncAlgorithm = 0 GroupEncAlgorithm = 0 [ 250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> The SECR-value cc [ 250.818472] rtl8192c_common:rtl92c_dm_check_txpower_tracking_thermal_meter():<0-1-0> Schedule TxPowerTracking direct call!! [ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> rtl92c_dm_txpower_tracking_callback_thermalmeter [ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf [ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Initial pathA ele_d reg0xc80 = 0x40000000, ofdm_index=0xc [ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Initial reg0xa24 = 0x90e1317, cck_index=0xc, ch14 0 [ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf delta 0x1 delta_lck 0x0 delta_iqk 0x0 [ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> <=== [ 250.818472] rtl8192c_common:rtl92c_dm_initialize_txpower_tracking_thermalmeter():<0-1-0> pMgntInfo->txpower_tracking = 1 [ 250.818472] rtl8192ce:rtl92ce_led_control():<0-1-0> ledaction 3 [ 250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1 [ 250.818472] rtlwifi:rtl_ips_nic_on():<0-1-0> before spin_unlock_irqrestore [ 251.154656] PCM: Lost interrupts? [Q]-0 (stream=0, delta=15903, new_hw_ptr=293408, old_hw_ptr=277505) The exact code flow that causes that is: 1. wpa_supplicant send a start_scan request to the nl80211 driver 2. mac80211 module call rtl_op_config with IEEE80211_CONF_CHANGE_IDLE 3. rtl_ips_nic_on is called which disable local irqs 4. rtl92c_phy_set_rf_power_state() is called 5. rtl_ps_enable_nic() is called and hw_init()is executed and then the interrupts on the device are enabled A good solution could be to refactor the code to avoid calling rtl92ce_hw_init() with the irqs disabled but a quick and dirty solution that has proven to work is to reenable the irqs during the function rtl92ce_hw_init(). I think that it is safe doing so since the device interrupt will only be enabled after the init function succeed. Signed-off-by:
Olivier Langlois <olivier@trillion01.com> Acked-by:
Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by:
John W. Linville <linville@tuxdriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Olivier Langlois authored
commit 2e8c5e56 upstream. rtl_ps_enable_nic() is called from loops that will loop until this function returns true or a maximum number of retries is performed. hw_init() returns non-zero on error. In that situation return false to restore the original design intent to retry hw init when it fails. Signed-off-by:
Olivier Langlois <olivier@trillion01.com> Acked-by:
Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by:
John W. Linville <linville@tuxdriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stanislaw Gruszka authored
commit b6213e41 upstream. This patch fixes regression caused by commit a16dad77 "MIPS: Fix potencial corruption". That commit fixes one corruption scenario in cost of adding another one, which actually start to cause crashes on Yeeloong laptop when rtl8187 driver is used. For correct DMA read operation on machines without DMA coherence, kernel have to invalidate cache, such it will refill later with new data that device wrote to memory, when that data is needed to process. We can only invalidate full cache line. Hence when cache line includes both dma buffer and some other data (written in cache, but not yet in main memory), the other data can not hit memory due to invalidation. That happen on rtl8187 where struct rtl8187_priv fields are located just before and after small buffers that are passed to USB layer and DMA is performed on them. To fix the problem we align buffers and reserve space after them to make them match cache line. This patch does not resolve all possible MIPS problems entirely, for that we have to assure that we always map cache aligned buffers for DMA, what can be complex or even not possible. But patch fixes visible and reproducible regression and seems other possible corruptions do not happen in practice, since Yeeloong laptop works stable without rtl8187 driver. Bug report: https://bugzilla.kernel.org/show_bug.cgi?id=54391Reported-by:
Petr Pisar <petr.pisar@atlas.cz> Bisected-by:
Tom Li <biergaizi2009@gmail.com> Reported-and-tested-by:
Tom Li <biergaizi2009@gmail.com> Signed-off-by:
Stanislaw Gruszka <stf_xl@wp.pl> Acked-by:
Larry Finger <Larry.Finger@lwfinger.next> Acked-by:
Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by:
John W. Linville <linville@tuxdriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Shilovsky authored
commit 2365c4ea upstream. SMB3 servers can respond with MaxTransactSize of more than 4M that can cause a memory allocation error returned from kmalloc in a lock codepath. Also the client doesn't support multicredit requests now and allows buffer sizes of 65536 bytes only. Set MaxTransactSize to this maximum supported value. Signed-off-by:
Pavel Shilovsky <piastry@etersoft.ru> Acked-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jeff Layton authored
commit 5d81de8e upstream. It's possible for userland to pass down an iovec via writev() that has a bogus user pointer in it. If that happens and we're doing an uncached write, then we can end up getting less bytes than we expect from the call to iov_iter_copy_from_user. This is CVE-2014-0069 cifs_iovec_write isn't set up to handle that situation however. It'll blindly keep chugging through the page array and not filling those pages with anything useful. Worse yet, we'll later end up with a negative number in wdata->tailsz, which will confuse the sending routines and cause an oops at the very least. Fix this by having the copy phase of cifs_iovec_write stop copying data in this situation and send the last write as a short one. At the same time, we want to avoid sending a zero-length write to the server, so break out of the loop and set rc to -EFAULT if that happens. This also allows us to handle the case where no address in the iovec is valid. [Note: Marking this for stable on v3.4+ kernels, but kernels as old as v2.6.38 may have a similar problem and may need similar fix] Reviewed-by:
Pavel Shilovsky <piastry@etersoft.ru> Reported-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Steve French <smfrench@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-