- 30 Jan, 2016 15 commits
-
-
David S. Miller authored
Paolo Abeni says: ==================== ipv6: fix sticky pktinfo behaviour Currently: ip addr add dev eth0 2001:0010::1/64 ip addr add dev eth1 2001:0020::1/64 ping6 -I eth0 2001:0020::2 do not lead to the expected results, i.e. eth1 is used as the egress interface. This is due to two related issues in handling sticky pktinfo, used by ping6 to enforce the device binding: - ip6_dst_lookup_flow()/ip6_dst_lookup_tail() do not really enforce flowi6_oif match - ipv6 udp connect() just ignore flowi6_oif These patches address each issue individually. The kernel has never enforced the egress interface specified via the sticky pktinfo, except briefly between the commits 741a11d9 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") and d46a9d67 ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set"), but the ping6 tools was unaffected up to iputils-20100214, since before it used SO_BINDTODEVICE to enforce the egress interface. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
Currently, the egress interface index specified via IPV6_PKTINFO is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7 can be subverted when the user space application calls connect() before sendmsg(). Fix it by initializing properly flowi6_oif in connect() before performing the route lookup. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The current implementation of ip6_dst_lookup_tail basically ignore the egress ifindex match: if the saddr is set, ip6_route_output() purposefully ignores flowi6_oif, due to the commit d46a9d67 ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set"), if the saddr is 'any' the first route lookup in ip6_dst_lookup_tail fails, but upon failure a second lookup will be performed with saddr set, thus ignoring the ifindex constraint. This commit adds an output route lookup function variant, which allows the caller to specify lookup flags, and modify ip6_dst_lookup_tail() to enforce the ifindex match on the second lookup via said helper. ip6_route_output() becames now a static inline function build on top of ip6_route_output_flags(); as a side effect, out-of-tree modules need now a GPL license to access the output route lookup functionality. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'wireless-drivers-for-davem-2016-01-29' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== iwlwifi * Fix support for 3168 device: * NVM version * firmware file name * device IDs * Fix a compilation warning in dvm calibration code * Fix the TPC (reduced Tx Power) code. This fixes performance issues * Add device IDs for 8265 rtx2x00 * fix monitor mode regression dating back to 4.1 brcmfmac * fix sdio initialisation related crash rtlwifi * rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded ath9k * ignore eeprom magic mismatch on flash based devices ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ken-ichirou MATSUZAWA authored
We should not trim skb for mmaped socket since its buf size is fixed and userspace will read as frame which data equals head. mmaped socket will not call recvmsg, means max_recvmsg_len is 0, skb_reserve was not called before commit: db65a3aa. Fixes: db65a3aa (netlink: Trim skb to alloc size to avoid MSG_TRUNC) Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Li RongQing authored
The size of all_zeros_mac is 6 byte, but eth_hash() will access the 8 byte, and KASan reported the below bug: [ 8596.479031] BUG: KASan: out of bounds access in __vxlan_find_mac+0x24/0x100 at addr ffffffff841514c0 [ 8596.487647] Read of size 8 by task ip/52820 [ 8596.490818] Address belongs to variable all_zeros_mac+0x0/0x40 [ 8596.496051] CPU: 0 PID: 52820 Comm: ip Tainted: G WC 4.1.15 #1 [ 8596.503520] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014 [ 8596.509365] ffffffff841514c0 ffff88007450f0b8 ffffffff822fa5e1 0000000000000032 [ 8596.516112] ffff88007450f150 ffff88007450f138 ffffffff812dd58c ffff88007450f1d8 [ 8596.522856] ffffffff81113b80 0000000000000282 0000000000000001 ffffffff8101ee4d [ 8596.529599] Call Trace: [ 8596.530858] [<ffffffff822fa5e1>] dump_stack+0x4f/0x7b [ 8596.535080] [<ffffffff812dd58c>] kasan_report_error+0x3bc/0x3f0 [ 8596.540258] [<ffffffff81113b80>] ? __lock_acquire+0x90/0x2140 [ 8596.545245] [<ffffffff8101ee4d>] ? save_stack_trace+0x2d/0x80 [ 8596.550234] [<ffffffff812dda70>] kasan_report+0x40/0x50 [ 8596.554647] [<ffffffff81b211e4>] ? __vxlan_find_mac+0x24/0x100 [ 8596.559729] [<ffffffff812dc399>] __asan_load8+0x69/0xa0 [ 8596.564141] [<ffffffff81b211e4>] __vxlan_find_mac+0x24/0x100 [ 8596.569033] [<ffffffff81b2683d>] vxlan_fdb_create+0x9d/0x570 it can be fixed by enlarging the all_zeros_mac to 8 byte, although it is harmless; eth_hash() will be called in other place with the memory which is larger and equal to 8 byte. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
Currently the port based VLAN maps should be configured to allow every port to egress frames on all other ports, except themselves. The debugfs interface shows that they are misconfigured. For instance, a 7-port switch has the following content in the related register 0x06: GLOBAL GLOBAL2 SERDES 0 1 2 3 4 5 6 ... 6: 1fa4 1f0f 4 7f 7e 7d 7c 7b 7a 79 ... This means that port 3 is allowed to talk to port 2-6, but cannot talk to ports 0 and 1. With this fix, port 3 can correctly talk to all ports except 3 itself: GLOBAL GLOBAL2 SERDES 0 1 2 3 4 5 6 ... 6: 1fa4 1f0f 4 7e 7d 7b 77 6f 5f 3f ... Fixes: ede8098d ("net: dsa: mv88e6xxx: bridges do not need an FID") Reported-by: Kevin Smith <kevin.smith@elecsyscorp.com> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Kevin Smith <kevin.smith@elecsyscorp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
The fib_table_lookup function had a shift by 32 that triggered a UBSAN warning. This was due to the fact that I had placed the shift first and then followed it with the check for the suffix length to ignore the undefined behavior. If we reorder this so that we verify the suffix is less than 32 before shifting the value we can avoid the issue. Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
The moxart ethernet driver confuses coherent DMA buffers with MMIO registers. moxart_ether.c: In function 'moxart_mac_setup_desc_ring': moxart_ether.c:146:428: error: passing argument 1 of '__fswab32' makes integer from pointer without a cast [-Werror=int-conversion] moxart_ether.c:74:39: warning: incorrect type in argument 3 (different address spaces) moxart_ether.c:74:39: expected void *cpu_addr moxart_ether.c:74:39: got void [noderef] <asn:2>*tx_desc_base This leaves the basic logic alone and uses normal pointers for the virtual address of the descriptor. As we cannot use readl/writel to access them, we also introduce our own moxart_desc_read moxart_desc_write helpers that perform the same endianess swap as the original code, but without the address space conversion. The barriers are made explicit here where needed: Even in the worst-case scenario, we just have to use a rmb() after checking ownership so we don't read any input data before we are sure it is value, and we use wmb() before transferring ownership back to the device. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
When CONFIG_PROC_FS, CONFIG_IP_PNP_BOOTP, CONFIG_IP_PNP_DHCP and CONFIG_IP_PNP_RARP are all disabled, we get a warning about the ic_proto_used variable being unused: net/ipv4/ipconfig.c:146:12: error: 'ic_proto_used' defined but not used [-Werror=unused-variable] This avoids the warning, by making the definition conditional on whether a dynamic IP configuration protocol is configured. If not, we know that the value is always zero, so we can optimize away the variable and all code that depends on it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: Bug fixes. 3 small bug fix patches for net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
The ring index j is not wrapped properly at the end of the ring, causing it to reference pointers past the end of the ring. For proper loop termination and to access the ring properly, we need to increment j and mask it before referencing the ring entry. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
This hardware counter is misleading as it counts dropped packets that don't match the hardware filters for unicast/broadcast/multicast. We will still report this counter in ethtool -S for diagnostics purposes. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Prashant Sreedharan authored
Use completion ring for ring free response from firmware. The response will be the last entry in the ring and we can free the ring after getting the response. This will guarantee no spurious DMA to freed memory. Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bernie Harris authored
There are cases where qdisc_dequeue_peeked can return NULL, and the result is dereferenced later on in the function. Similarly to the other qdisc dequeue functions, check whether the skb pointer is NULL and if it is, goto out. Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 29 Jan, 2016 22 commits
-
-
Kefeng Wang authored
Convert the driver to use ns_to_timespec64() and timespec64_to_ns() instead of open coding the same logic. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jörg Thalheim authored
Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Currently when a macvlan is being initialized and the lower device is netif_carrier_ok(), the macvlan device doesn't run through rfc2863_policy() and is left with UNKNOWN operstate. Fix it by adding an unconditional linkwatch event for the new macvlan device. Similar fix is already used by the 8021q device (see register_vlan_dev()). Also fix the inconsistent state when the lower device has been down and its carrier was changed (when a device is down NETDEV_CHANGE doesn't get generated). The second issue can be seen f.e. when we have a macvlan on top of a 8021q device which has been down and its real device has been changing carrier states, after setting the 8021q device up, the macvlan device will have the same carrier state as it was before even though the 8021q can now have a different state. Example for case 1: 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 $ ip l add l eth2 macvl0 type macvlan $ ip l set macvl0 up $ ip l sh macvl0 72: macvl0@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default link/ether f6:0b:54:0a:9d:a3 brd ff:ff:ff:ff:ff:ff Example for case 2 (order is important): Prestate: eth2 UP/CARRIER, vlan1 down, vlan1-macvlan down $ ip l set vlan1-macvlan up $ ip l sh vlan1-macvlan 71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff [ eth2 loses CARRIER before vlan1 has been UP-ed ] $ ip l sh eth2 4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff $ ip l sh vlan1-macvlan 71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff $ ip l set vlan1 up $ ip l sh vlan1 70: vlan1@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff $ ip l sh vlan1-macvlan 71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff vlan1-macvlan is still UP, still has carrier and is still in the same operstate as before. After the patch in case 1 macvl0 has state UP as it should and in case 2 vlan1-macvlan has state LOWERLAYERDOWN again as it should. Note that while the lower macvlan device is down their carrier and thus operstate can go out of sync but that will be fixed once the lower device goes up again. This behaviour seems to have been present since beginning of git history. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parthasarathy Bhuvaragan authored
In 'commit 7fe8097c ("tipc: fix nullpointer bug when subscribing to events")', we terminate the connection if the subscription creation fails. In the same commit, the subscription creation result was based on the value of the subscription pointer (set in the function) instead of the return code. Unfortunately, the same function tipc_subscrp_create() handles subscription cancel request. For a subscription cancellation request, the subscription pointer cannot be set. Thus if a subscriber has several subscriptions and cancels any of them, the connection is terminated. In this commit, we terminate the connection based on the return value of tipc_subscrp_create(). Fixes: commit 7fe8097c ("tipc: fix nullpointer bug when subscribing to events") Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kefeng Wang authored
Convert the driver to use ns_to_timespec64() to keep consistency with timespec64_to_ns() instead of open coding the same logic. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We should not assume a valid protocol header is present, as this is not the case for IPv4 fragments. Lets avoid extra cache line misses and potential bugs if we actually find a socket and incorrectly uses its dst. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Andrew Lunn says: ==================== Part 2 of v4.5-rc1 phylib regression White list PHY compatible values which indicate PHYs. Issue a warning when one is encountered. Update the documentation to make it clear what is expected in the compatible string. v2: Fix Grammar, reword changelog, add Tested-by and Acked-by. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
PHY devices may only list compatibility with clause 22, 45, and if they need to be more specific, their PHY identifier values. No other compatible strings are allowed. Make this clear in the documentation, and remove examples where make/model compatible strings are listed. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
Some phy nodes list a compatible value indicating the PHY make/model. This is never used to match the device to the driver. However it does confuse the code to separate a PHY from a generic MDIO device like a switch. Generic MDIO devices must have a compatible value, PHYs can list clause 22 or 45, but nothing else. Issue a warning if we find a compatible value known on the whitelist, and say it is a PHY. Fixes: a9049e0c ("mdio: Add support for mdio drivers.") Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com> Reported-by: Olof Johansson <olof@lixom.net> Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Woojung Huh says: ==================== lan78xx: update and fixes lan78xx: change to use updated phy-ignore-interrupts lan78xx: Add to handle mux control per chip id lan78xx: throttle TX path at slower than SuperSpeed USB ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Woojung.Huh@microchip.com authored
Throttle TX path only at slower than SuperSpeed USB. SuperSpeed USB has enough bandwidth to maintain GigE. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Woojung.Huh@microchip.com authored
Depends on chip, some EEPROM pins are muxed with LED function. Disable & restore LED function to access EEPROM. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Woojung.Huh@microchip.com authored
Update lan78xx to use patch of commit 4f2aaf7d ("Merge branch 'fix-phy-ignore-interrupts'"). Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
With some combinations of user provided flags in netlink command, it is possible to call tcp_get_info() with a buffer that is not 8-bytes aligned. It does matter on some arches, so we need to use put_unaligned() to store the u64 fields. Current iproute2 package does not trigger this particular issue. Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info") Fixes: 977cb0ec ("tcp: add pacing_rate information into tcp_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ido Schimmel authored
When switchdev drivers process FDB notifications from the underlying device they resolve the netdev to which the entry points to and notify the bridge using the switchdev notifier. However, since the RTNL mutex is not held there is nothing preventing the netdev from disappearing in the middle, which will cause br_switchdev_event() to dereference a non-existing netdev. Make switchdev drivers hold the lock at the beginning of the notification processing session and release it once it ends, after notifying the bridge. Also, remove switchdev_mutex and fdb_lock, as they are no longer needed when RTNL mutex is held. Fixes: 03bf0c28 ("switchdev: introduce switchdev notifier") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Malcolm Crossley authored
Trying to batch Tx response events results in poor performance because this delays freeing the transmitted skbs. Instead use the standard RING_FINAL_CHECK_FOR_RESPONSES() macro to be notified once the next Tx response is placed on the ring. Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Nicolas Schichan authored
The code in txq_put_data() would use txq->tx_curr_desc to index the tso_hdrs/tso_hdrs_dma buffers, for less than 8 bytes unaligned fragments, which is already moved to the next descriptor at the beginning of the function. If that fragment was the last of the the skb, the next skb would use that same space to place the ip headers, overwritting that small fragment data. Fixes: 91986fd3 (net: mv643xx_eth: Ensure proper data alignment in TSO TX path) Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Reviewed-by: Philipp Kirchhofer <philipp@familie-kirchhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
of_phy_find_device() is used to find the phy device associated with a device node. It is expected the node is for a PHY device, but in fact it could of been probed as a generic MDIO device. Ensure the device is a PHY before returning it. Fixes: a9049e0c ("mdio: Add support for mdio drivers.") Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com> Reported-by: Olof Johansson <olof@lixom.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'mac80211-for-davem-2016-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Here's a first set of fixes for the 4.5-rc cycle: * make regulatory messages much less verbose by default * various remain-on-channel fixes * scheduled scanning fixes with hardware restart * a PS-Poll handling fix; was broken just recently * bugfix to avoid buffering non-bufferable MMPDUs * world regulatory domain data fix * a fix for scanning causing other work to get stuck * hwsim: revert an older problematic patch that caused some userspace tools to have issues - not that big a deal as it's a debug only driver though ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Richard Weinberger authored
Not every arch has io memory. So, unbreak the build by fixing the dependencies. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Neal Cardwell authored
This commit fixes a corner case in tcp_mark_head_lost() which was causing the WARN_ON(len > skb->len) in tcp_fragment() to fire. tcp_mark_head_lost() was assuming that if a packet has tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of M*mss bytes, for any M < N. But with the tricky way TCP pcounts are maintained, this is not always true. For example, suppose the sender sends 4 1-byte packets and have the last 3 packet sacked. It will merge the last 3 packets in the write queue into an skb with pcount = 3 and len = 3 bytes. If another recovery happens after a sack reneging event, tcp_mark_head_lost() may attempt to split the skb assuming it has more than 2*MSS bytes. This sounds very counterintuitive, but as the commit description for the related commit c0638c24 ("tcp: don't fragment SACKed skbs in tcp_mark_head_lost()") notes, this is because tcp_shifted_skb() coalesces adjacent regions of SACKed skbs, and when doing this it preserves the sum of their packet counts in order to reflect the real-world dynamics on the wire. The c0638c24 commit tried to avoid problems by not fragmenting SACKed skbs, since SACKed skbs are where the non-proportionality between pcount and skb->len/mss is known to be possible. However, that commit did not handle the case where during a reneging event one of these weird SACKed skbs becomes an un-SACKed skb, which tcp_mark_head_lost() can then try to fragment. The fix is to simply mark the entire skb lost when this happens. This makes the recovery slightly more aggressive in such corner cases before we detect reordering. But once we detect reordering this code path is by-passed because FACK is disabled. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Joe Stringer authored
Later parts of the stack (including fragmentation) expect that there is never a socket attached to frag in a frag_list, however this invariant was not enforced on all defrag paths. This could lead to the BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the end of this commit message. While the call could be added to openvswitch to fix this particular error, the head and tail of the frags list are already orphaned indirectly inside ip_defrag(), so it seems like the remaining fragments should all be orphaned in all circumstances. kernel BUG at net/ipv4/ip_output.c:586! [...] Call Trace: <IRQ> [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch] [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch] [<ffffffff81667830>] ? dst_discard_out+0x20/0x20 [<ffffffff81667810>] ? dst_ifdown+0x80/0x80 [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch] [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack] [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch] [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch] [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch] [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch] [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch] [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch] [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch] [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch] [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch] [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch] [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch] [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0 [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0 [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930 [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90 [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220 [<ffffffff81661060>] dev_queue_xmit+0x10/0x20 [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220 [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590 [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590 [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190 [<ffffffff816ab4c0>] ip_output+0x70/0x110 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70 [<ffffffff816abf89>] ip_send_skb+0x19/0x40 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40 [<ffffffff816df21a>] icmp_push_reply+0xea/0x120 [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230 [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70 [<ffffffff816dfa06>] icmp_echo+0x36/0x70 [<ffffffff816e0d11>] icmp_rcv+0x271/0x450 [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0 [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0 [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0 [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560 [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560 [<ffffffff816a5453>] ip_rcv+0x283/0x3e0 [<ffffffff810b6302>] ? match_held_lock+0x192/0x200 [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40 [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0 [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90 [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60 [<ffffffff8165e678>] process_backlog+0x78/0x230 [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230 [<ffffffff8165e355>] net_rx_action+0x155/0x400 [<ffffffff8106b48c>] __do_softirq+0xcc/0x420 [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590 [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff8106b88e>] do_softirq+0x4e/0x60 [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0 [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590 [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190 [<ffffffff816ab4c0>] ip_output+0x70/0x110 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70 [<ffffffff816abf89>] ip_send_skb+0x19/0x40 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40 [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90 [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70 [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0 [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0 [<ffffffff8163e398>] sock_sendmsg+0x38/0x50 [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270 [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320 [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460 [<ffffffff81204886>] ? __fget_light+0x66/0x90 [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80 [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20 [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 RIP [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0 RSP <ffff88006d603170> Fixes: 7f8a436e ("openvswitch: Add conntrack action") Signed-off-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 28 Jan, 2016 3 commits
-
-
David S. Miller authored
Xin Long says: ==================== fix the transport dead race check by using atomic_add_unless on refcnt sctp: fix the transport dead race check by using atomic_add_unless on refcnt sctp: hold transport before we access t->asoc in sctp proc sctp: remove the dead field of sctp_transport ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
After we use refcnt to check if transport is alive, the dead can be removed from sctp_transport. The traversal of transport_addr_list in procfs dump is using list_for_each_entry_rcu, no need to check if it has been freed. sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event is protected by sock lock, it's not necessary to check dead, either. also, the timers are cancelled when sctp_transport_free() is called, that it doesn't wait for refcnt to reach 0 to cancel them. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Previously, before rhashtable, /proc assoc listing was done by read-locking the entire hash entry and dumping all assocs at once, so we were sure that the assoc wasn't freed because it wouldn't be possible to remove it from the hash meanwhile. Now we use rhashtable to list transports, and dump entries one by one. That is, now we have to check if the assoc is still a good one, as the transport we got may be being freed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-