- 21 Jun, 2021 10 commits
-
-
David S. Miller authored
Mat Martineau says: ==================== mptcp: 32-bit sequence number improvements MPTCP-level sequence numbers are 64 bits, but RFC 8684 allows use of 32-bit sequence numbers in the DSS option to save header space. Those 32-bit numbers are the least significant bits of the full 64-bit sequence number, so the receiver must infer the correct upper 32 bits. These two patches improve the logic for determining the full 64-bit sequence numbers when the 32-bit truncated version has wrapped around. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
The current implementation of 32 bit DSN expansion is buggy. After the previous patch, we can simply reuse the newly introduced helper to do the expansion safely. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/120 Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni authored
When receiving 32 bits DSS ack from the peer, the MPTCP need to expand them to 64 bits value. The current code is buggy WRT detecting 32 bits ack wrap-around: when the wrap-around happens the current unsigned 32 bit ack value is lower than the previous one. Additionally check for possible reverse wrap and make the helper visible, so that we could re-use it for the next patch. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/204 Fixes: cc9d2566 ("mptcp: update per unacked sequence on pkt reception") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
We got multiple reports that multi_chunk_sendfile test case from tls selftest fails. This was sort of expected, as the original fix was never applied (see it in the first Link:). The test in question uses sendfile() with count larger than the size of the underlying file. This will make splice set MSG_MORE on all sendpage calls, meaning TLS will never close and flush the last partial record. Eric seem to have addressed a similar problem in commit 35f9c09f ("tcp: tcp_sendpages() should call tcp_push() once") by introducing MSG_SENDPAGE_NOTLAST. Unlike MSG_MORE MSG_SENDPAGE_NOTLAST is not set on the last call of a "pipefull" of data (PIPE_DEF_BUFFERS == 16, so every 16 pages or whenever we run out of data). Having a break every 16 pages should be fine, TLS can pack exactly 4 pages into a record, so for aligned reads there should be no difference, unaligned may see one extra record per sendpage(). Sticking to TCP semantics seems preferable to modifying splice, but we can revisit it if real life scenarios show a regression. Reported-by: Vadim Fedorenko <vfedorenko@novek.ru> Reported-by: Seth Forshee <seth.forshee@canonical.com> Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ Fixes: 3c4d7559 ("tls: kernel TLS support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'linux-can-fixes-for-5.13-20210619' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-06-19 this is a pull request of 5 patches for net/master. The first patch is by Thadeu Lima de Souza Cascardo and fixes a potential use-after-free in the CAN broadcast manager socket, by delaying the release of struct bcm_op after synchronize_rcu(). Oliver Hartkopp's patch fixes a similar potential user-after-free in the CAN gateway socket by synchronizing RCU operations before removing gw job entry. Another patch by Oliver Hartkopp fixes a potential use-after-free in the ISOTP socket by omitting unintended hrtimer restarts on socket release. Oleksij Rempel's patch for the j1939 socket fixes a potential use-after-free by setting the SOCK_RCU_FREE flag on the socket. The last patch is by Pavel Skripkin and fixes a use-after-free in the ems_usb CAN driver. All patches are intended for stable and have stable@v.k.o on Cc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'wireless-drivers-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.13 Only one important fix for an mwifiex regression. mwifiex * fix deadlock during rmmod or firmware reset, regression from cfg80211 RTNL changes in v5.12-rc1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Haiyang Zhang authored
Set needed_headroom according to VF if VF needs a bigger headroom. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sebastian Andrzej Siewior authored
The preempt disable around do_xdp_generic() has been introduced in commit bbbe211c ("net: rcu lock and preempt disable missing around generic xdp") For BPF it is enough to use migrate_disable() and the code was updated as it can be seen in commit 3c58482a ("bpf: Provide bpf_prog_run_pin_on_cpu() helper") This is a leftover which was not converted. Use migrate_disable() before invoking do_xdp_generic(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yunsheng Lin authored
The spin_trylock() was assumed to contain the implicit barrier needed to ensure the correct ordering between STATE_MISSED setting/clearing and STATE_MISSED checking in commit a90c57f2 ("net: sched: fix packet stuck problem for lockless qdisc"). But it turns out that spin_trylock() only has load-acquire semantic, for strongly-ordered system(like x86), the compiler barrier implicitly contained in spin_trylock() seems enough to ensure the correct ordering. But for weakly-orderly system (like arm64), the store-release semantic is needed to ensure the correct ordering as clear_bit() and test_bit() is store operation, see queued_spin_lock(). So add the explicit barrier to ensure the correct ordering for the above case. Fixes: a90c57f2 ("net: sched: fix packet stuck problem for lockless qdisc") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Antoine Tenart authored
Non-ND strict packets with a source LLA go through the packet taps again, while non-ND strict packets with other source addresses do not, and we can see a clone of those packets on the vrf interface (we should not). This is due to a series of changes: Commit 6f12fa77[1] made non-ND strict packets not being pushed again in the packet taps. This changed with commit 205704c6[2] for those packets having a source LLA, as they need a lookup with the orig_iif. The issue now is those packets do not skip the 'vrf_ip6_rcv' function to the end (as the ones without a source LLA) and go through the check to call packet taps again. This check was changed by commit 6f12fa77[1] and do not exclude non-strict packets anymore. Packets matching 'need_strict && !is_ndisc && is_ll_src' are now being sent through the packet taps again. This can be seen by dumping packets on the vrf interface. Fix this by having the same code path for all non-ND strict packets and selectively lookup with the orig_iif for those with a source LLA. This has the effect to revert to the pre-205704c6[2] condition, which should also be easier to maintain. [1] 6f12fa77 ("vrf: mark skb for multicast or link-local as enslaved to VRF") [2] 205704c6 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict") Fixes: 205704c6 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict") Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Jun, 2021 11 commits
-
-
Pavel Skripkin authored
In ems_usb_disconnect() dev pointer, which is netdev private data, is used after free_candev() call: | if (dev) { | unregister_netdev(dev->netdev); | free_candev(dev->netdev); | | unlink_all_urbs(dev); | | usb_free_urb(dev->intr_urb); | | kfree(dev->intr_in_buffer); | kfree(dev->tx_msg_buffer); | } Fix it by simply moving free_candev() at the end of the block. Fail log: | BUG: KASAN: use-after-free in ems_usb_disconnect | Read of size 8 at addr ffff88804e041008 by task kworker/1:2/2895 | | CPU: 1 PID: 2895 Comm: kworker/1:2 Not tainted 5.13.0-rc5+ #164 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.4 | Workqueue: usb_hub_wq hub_event | Call Trace: | dump_stack (lib/dump_stack.c:122) | print_address_description.constprop.0.cold (mm/kasan/report.c:234) | kasan_report.cold (mm/kasan/report.c:420 mm/kasan/report.c:436) | ems_usb_disconnect (drivers/net/can/usb/ems_usb.c:683 drivers/net/can/usb/ems_usb.c:1058) Fixes: 702171ad ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface") Link: https://lore.kernel.org/r/20210617185130.5834-1-paskripkin@gmail.com Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Oleksij Rempel authored
Set SOCK_RCU_FREE to let RCU to call sk_destruct() on completion. Without this patch, we will run in to j1939_can_recv() after priv was freed by j1939_sk_release()->j1939_sk_sock_destruct() Fixes: 25fe97cb ("can: j1939: move j1939_priv_put() into sk_destruct callback") Link: https://lore.kernel.org/r/20210617130623.12705-1-o.rempel@pengutronix.de Cc: linux-stable <stable@vger.kernel.org> Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reported-by: syzbot+bdf710cfc41c186fdff3@syzkaller.appspotmail.com Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Oliver Hartkopp authored
When closing the isotp socket, the potentially running hrtimers are canceled before removing the subscription for CAN identifiers via can_rx_unregister(). This may lead to an unintended (re)start of a hrtimer in isotp_rcv_cf() and isotp_rcv_fc() in the case that a CAN frame is received by isotp_rcv() while the subscription removal is processed. However, isotp_rcv() is called under RCU protection, so after calling can_rx_unregister, we may call synchronize_rcu in order to wait for any RCU read-side critical sections to finish. This prevents the reception of CAN frames after hrtimer_cancel() and therefore the unintended (re)start of the hrtimers. Link: https://lore.kernel.org/r/20210618173713.2296-1-socketcan@hartkopp.net Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Oliver Hartkopp authored
can_can_gw_rcv() is called under RCU protection, so after calling can_rx_unregister(), we have to call synchronize_rcu in order to wait for any RCU read-side critical sections to finish before removing the kmem_cache entry with the referenced gw job entry. Link: https://lore.kernel.org/r/20210618173645.2238-1-socketcan@hartkopp.net Fixes: c1aabdf3 ("can-gw: add netlink based CAN routing") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Thadeu Lima de Souza Cascardo authored
can_rx_register() callbacks may be called concurrently to the call to can_rx_unregister(). The callbacks and callback data, though, are protected by RCU and the struct sock reference count. So the callback data is really attached to the life of sk, meaning that it should be released on sk_destruct. However, bcm_remove_op() calls tasklet_kill(), and RCU callbacks may be called under RCU softirq, so that cannot be used on kernels before the introduction of HRTIMER_MODE_SOFT. However, bcm_rx_handler() is called under RCU protection, so after calling can_rx_unregister(), we may call synchronize_rcu() in order to wait for any RCU read-side critical sections to finish. That is, bcm_rx_handler() won't be called anymore for those ops. So, we only free them, after we do that synchronize_rcu(). Fixes: ffd980f9 ("[CAN]: Add broadcast manager (bcm) protocol") Link: https://lore.kernel.org/r/20210619161813.2098382-1-cascardo@canonical.com Cc: linux-stable <stable@vger.kernel.org> Reported-by: syzbot+0f7e7e5e2f4f40fa89c0@syzkaller.appspotmail.com Reported-by: Norbert Slusarek <nslusarek@gmx.net> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
David S. Miller authored
Pavel Skripkin says: ==================== net: ethernat: ezchip: bug fixing and code improvments While manual code reviewing, I found some error in ezchip driver. Two of them looks very dangerous: 1. use-after-free in nps_enet_remove Accessing netdev private data after free_netdev() 2. wrong error handling of platform_get_irq() It can cause passing negative irq to request_irq() Also, in 2nd patch I removed redundant check to increase execution speed and make code more straightforward. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Skripkin authored
As documented at drivers/base/platform.c for platform_get_irq: * Gets an IRQ for a platform device and prints an error message if finding the * IRQ fails. Device drivers should check the return value for errors so as to * not pass a negative integer value to the request_irq() APIs. So, the driver should check that platform_get_irq() return value is _negative_, not that it's equal to zero, because -ENXIO (return value from request_irq() if irq was not found) will pass this check and it leads to passing negative irq to request_irq() Fixes: 0dd07709 ("NET: Add ezchip ethernet driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Skripkin authored
err varibale will be set everytime, when code gets into this path. This check will just slowdown the execution and that's all. Fixes: 0dd07709 ("NET: Add ezchip ethernet driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Skripkin authored
priv is netdev private data, but it is used after free_netdev(). It can cause use-after-free when accessing priv pointer. So, fix it by moving free_netdev() after netif_napi_del() call. Fixes: 0dd07709 ("NET: Add ezchip ethernet driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Skripkin authored
static int greth_of_remove(struct platform_device *of_dev) { ... struct greth_private *greth = netdev_priv(ndev); ... unregister_netdev(ndev); free_netdev(ndev); of_iounmap(&of_dev->resource[0], greth->regs, resource_size(&of_dev->resource[0])); ... } greth is netdev private data, but it is used after free_netdev(). It can cause use-after-free when accessing greth pointer. So, fix it by moving free_netdev() after of_iounmap() call. Fixes: d4c41139 ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.13-rc7, including fixes from wireless, bpf, bluetooth, netfilter and can. Current release - regressions: - mlxsw: spectrum_qdisc: Pass handle, not band number to find_class() to fix modifying offloaded qdiscs - lantiq: net: fix duplicated skb in rx descriptor ring - rtnetlink: fix regression in bridge VLAN configuration, empty info is not an error, bot-generated "fix" was not needed - libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem creation Current release - new code bugs: - ethtool: fix NULL pointer dereference during module EEPROM dump via the new netlink API - mlx5e: don't update netdev RQs with PTP-RQ, the special purpose queue should not be visible to the stack - mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs - mlx5e: verify dev is present in get devlink port ndo, avoid a panic Previous releases - regressions: - neighbour: allow NUD_NOARP entries to be force GCed - further fixes for fallout from reorg of WiFi locking (staging: rtl8723bs, mac80211, cfg80211) - skbuff: fix incorrect msg_zerocopy copy notifications - mac80211: fix NULL ptr deref for injected rate info - Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs Previous releases - always broken: - bpf: more speculative execution fixes - netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local - udp: fix race between close() and udp_abort() resulting in a panic - fix out of bounds when parsing TCP options before packets are validated (in netfilter: synproxy, tc: sch_cake and mptcp) - mptcp: improve operation under memory pressure, add missing wake-ups - mptcp: fix double-lock/soft lookup in subflow_error_report() - bridge: fix races (null pointer deref and UAF) in vlan tunnel egress - ena: fix DMA mapping function issues in XDP - rds: fix memory leak in rds_recvmsg Misc: - vrf: allow larger MTUs - icmp: don't send out ICMP messages with a source address of 0.0.0.0 - cdc_ncm: switch to eth%d interface naming" * tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits) net: ethernet: fix potential use-after-free in ec_bhf_remove selftests/net: Add icmp.sh for testing ICMP dummy address responses icmp: don't send out ICMP messages with a source address of 0.0.0.0 net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY net: ll_temac: Fix TX BD buffer overwrite net: ll_temac: Add memory-barriers for TX BD access net: ll_temac: Make sure to free skb when it is completely used MAINTAINERS: add Guvenc as SMC maintainer bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path bnxt_en: Fix TQM fastpath ring backing store computation bnxt_en: Rediscover PHY capabilities after firmware reset cxgb4: fix wrong shift. mac80211: handle various extensible elements correctly mac80211: reset profile_periodicity/ema_ap cfg80211: avoid double free of PMSR request cfg80211: make certificate generation more robust mac80211: minstrel_ht: fix sample time check net: qed: Fix memcpy() overflow of qed_dcbx_params() net: cdc_eem: fix tx fixup skb leak net: hamradio: fix memory leak in mkiss_close ...
-
- 18 Jun, 2021 19 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fix from David Sterba: "One more fix, for a space accounting bug in zoned mode. It happens when a block group is switched back rw->ro and unusable bytes (due to zoned constraints) are subtracted twice. It has user visible effects so I consider it important enough for late -rc inclusion and backport to stable" * tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix negative space_info->bytes_readonly
-
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds authored
Pull PCI fixes from Bjorn Helgaas: - Clear 64-bit flag for host bridge windows below 4GB to fix a resource allocation regression added in -rc1 (Punit Agrawal) - Fix tegra194 MCFG quirk build regressions added in -rc1 (Jon Hunter) - Avoid secondary bus resets on TI KeyStone C667X devices (Antti Järvinen) - Avoid secondary bus resets on some NVIDIA GPUs (Shanker Donthineni) - Work around FLR erratum on Huawei Intelligent NIC VF (Chiqijun) - Avoid broken ATS on AMD Navi14 GPU (Evan Quan) - Trust Broadcom BCM57414 NIC to isolate functions even though it doesn't advertise ACS support (Sriharsha Basavapatna) - Work around AMD RS690 BIOSes that don't configure DMA above 4GB (Mikel Rychliski) - Fix panic during PIO transfer on Aardvark controller (Pali Rohár) * tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: aardvark: Fix kernel panic during PIO transfer PCI: Add AMD RS690 quirk to enable 64-bit DMA PCI: Add ACS quirk for Broadcom BCM57414 NIC PCI: Mark AMD Navi14 GPU ATS as broken PCI: Work around Huawei Intelligent NIC VF FLR erratum PCI: Mark some NVIDIA GPUs to avoid bus reset PCI: Mark TI C667X to avoid bus reset PCI: tegra194: Fix MCFG quirk build regressions PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB
-
Matthew Wilcox (Oracle) authored
If a task is killed during a page fault, it does not currently call sb_end_pagefault(), which means that the filesystem cannot be frozen at any time thereafter. This may be reported by lockdep like this: ==================================== WARNING: fsstress/10757 still has locks held! 5.13.0-rc4-build4+ #91 Not tainted ------------------------------------ 1 lock held by fsstress/10757: #0: ffff888104eac530 ( sb_pagefaults as filesystem freezing is modelled as a lock. Fix this by removing all the direct returns from within the function, and using 'ret' to indicate whether we were interrupted or successful. Fixes: 1cf7a151 ("afs: Implement shared-writeable mmap") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Pavel Skripkin authored
static void ec_bhf_remove(struct pci_dev *dev) { ... struct ec_bhf_priv *priv = netdev_priv(net_dev); unregister_netdev(net_dev); free_netdev(net_dev); pci_iounmap(dev, priv->dma_io); pci_iounmap(dev, priv->io); ... } priv is netdev private data, but it is used after free_netdev(). It can cause use-after-free when accessing priv pointer. So, fix it by moving free_netdev() after pci_iounmap() calls. Fixes: 6af55ff5 ("Driver for Beckhoff CX5020 EtherCAT master module.") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Merge tag 'mac80211-for-net-2021-06-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of straggler fixes: * a minstrel HT sample check fix * peer measurement could double-free on races * certificate file generation at build time could sometimes hang * some parameters weren't reset between connections in mac80211 * some extensible elements were treated as non- extensible, possibly causuing bad connections (or failures) if the AP adds data ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Toke Høiland-Jørgensen authored
This adds a new icmp.sh selftest for testing that the kernel will respond correctly with an ICMP unreachable message with the dummy (192.0.0.8) source address when there are no IPv4 addresses configured to use as source addresses. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Toke Høiland-Jørgensen authored
When constructing ICMP response messages, the kernel will try to pick a suitable source address for the outgoing packet. However, if no IPv4 addresses are configured on the system at all, this will fail and we end up producing an ICMP message with a source address of 0.0.0.0. This can happen on a box routing IPv4 traffic via v6 nexthops, for instance. Since 0.0.0.0 is not generally routable on the internet, there's a good chance that such ICMP messages will never make it back to the sender of the original packet that the ICMP message was sent in response to. This, in turn, can create connectivity and PMTUd problems for senders. Fortunately, RFC7600 reserves a dummy address to be used as a source for ICMP messages (192.0.0.8/32), so let's teach the kernel to substitute that address as a last resort if the regular source address selection procedure fails. Below is a quick example reproducing this issue with network namespaces: ip netns add ns0 ip l add type veth peer netns ns0 ip l set dev veth0 up ip a add 10.0.0.1/24 dev veth0 ip a add fc00:dead:cafe:42::1/64 dev veth0 ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2 ip -n ns0 l set dev veth0 up ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0 ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1 ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0 ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1 tcpdump -tpni veth0 -c 2 icmp & ping -w 1 10.1.0.1 > /dev/null tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64 IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 2 packets captured 2 packets received by filter 0 packets dropped by kernel With this patch the above capture changes to: IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64 IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 Fixes: 1da177e4 ("Linux-2.6.12-rc2") Reported-by: Juliusz Chroboczek <jch@irif.fr> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Esben Haabendal authored
As documented in Documentation/networking/driver.rst, the ndo_start_xmit method must not return NETDEV_TX_BUSY under any normal circumstances, and as recommended, we simply stop the tx queue in advance, when there is a risk that the next xmit would cause a NETDEV_TX_BUSY return. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Esben Haabendal authored
Just as the initial check, we need to ensure num_frag+1 buffers available, as that is the number of buffers we are going to use. This fixes a buffer overflow, which might be seen during heavy network load. Complete lockup of TEMAC was reproducible within about 10 minutes of a particular load. Fixes: 84823ff8 ("net: ll_temac: Fix race condition causing TX hang") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Esben Haabendal authored
Add a couple of memory-barriers to ensure correct ordering of read/write access to TX BDs. In xmit_done, we should ensure that reading the additional BD fields are only done after STS_CTRL_APP0_CMPLT bit is set. When xmit_done marks the BD as free by setting APP0=0, we need to ensure that the other BD fields are reset first, so we avoid racing with the xmit path, which writes to the same fields. Finally, making sure to read APP0 of next BD after the current BD, ensures that we see all available buffers. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Esben Haabendal authored
With the skb pointer piggy-backed on the TX BD, we have a simple and efficient way to free the skb buffer when the frame has been transmitted. But in order to avoid freeing the skb while there are still fragments from the skb in use, we need to piggy-back on the TX BD of the skb, not the first. Without this, we are doing use-after-free on the DMA side, when the first BD of a multi TX BD packet is seen as completed in xmit_done, and the remaining BDs are still being processed. Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Karsten Graul authored
Add Guvenc as maintainer for Shared Memory Communications (SMC) Sockets. Cc: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Michael Chan says: ==================== bnxt_en: Bug fixes This patchset includes 3 small bug fixes to reinitialize PHY capabilities after firmware reset, setup the chip's internal TQM fastpath ring backing memory properly for RoCE traffic, and to free ethtool related memory if driver probe fails. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Somnath Kotur authored
bnxt_ethtool_init() may have allocated some memory and we need to call bnxt_ethtool_free() to properly unwind if bnxt_init_one() fails. Fixes: 7c380918 ("bnxt_en: Refactor bnxt_init_one() and turn on TPA support on 57500 chips.") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Rukhsana Ansari authored
TQM fastpath ring needs to be sized to store both the requester and responder side of RoCE QPs in TQM for supporting bi-directional tests. Fix bnxt_alloc_ctx_mem() to multiply the RoCE QPs by a factor of 2 when computing the number of entries for TQM fastpath ring. This fixes an RX pipeline stall issue when running bi-directional max RoCE QP tests. Fixes: c7dd7ab4 ("bnxt_en: Improve TQM ring context memory sizing formulas.") Signed-off-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Michael Chan authored
There is a missing bnxt_probe_phy() call in bnxt_fw_init_one() to rediscover the PHY capabilities after a firmware reset. This can cause some PHY related functionalities to fail after a firmware reset. For example, in multi-host, the ability for any host to configure the PHY settings may be lost after a firmware reset. Fixes: ec5d31e3 ("bnxt_en: Handle firmware reset status during IF_UP.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Pavel Machek authored
While fixing coverity warning, commit dd2c7967 introduced typo in shift value. Fix that. Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Fixes: dd2c7967 ("cxgb4: Fix unintentional sign extension issues") Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arcLinus Torvalds authored
Pull ARC fixes from Vineet Gupta: - ARCv2 userspace ABI not populating a few registers - Unbork CONFIG_HARDENED_USERCOPY for ARC * tag 'arc-5.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: fix CONFIG_HARDENED_USERCOPY ARCv2: save ABI registers across signal handling
-
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds authored
Pull tracing fixes from Steven Rostedt: - Have recordmcount check for valid st_shndx otherwise some archs may have invalid references for the mcount location. - Two fixes done for mapping pids to task names. Traces were not showing the names of tasks when they should have. - Fix to trace_clock_global() to prevent it from going backwards * tag 'trace-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Do no increment trace_clock_global() by one tracing: Do not stop recording comms if the trace file is being read tracing: Do not stop recording cmdlines when tracing is off recordmcount: Correct st_shndx handling
-