- 29 Oct, 2022 10 commits
-
-
Russell King (Oracle) authored
Add support for forcing the link speed and duplex setting in the pcs_link_up() method for out of band modes, which will be useful when we finish converting the pcs_config() method. Until then, we still have to force duplex for 802.3z modes to work correctly. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
mtk_sgmii does a lot of read-modify-write operations, for which there is a specific regmap function. Use this function instead of open-coding the operations. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Add a pcs_get_state() implementation which uses the advertisements to compute the resulting link modes, and BMSR contents to determine negotiation and link status. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
The functions called by the pcs_config() method always return zero, so there is no point trying to handle an error from these functions. Make these functions void, eliminate the "err" variable and simply return zero from the pcs_config() function itself. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
As a result of help from Frank Wunderlich to investigate and test, we know a bit more about the PCS on the Mediatek platforms. Update the definitions from this investigation. This PCS appears similar, but not identical to the Lynx PCS. Although not included in this patch, but for future reference, the PHY ID registers at offset 4 read as 0x4d544950 'MTIP'. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Add a helper to convert the PHY interface mode to the required link timer setting as stated by the appropriate standard. Inappropriate interface modes return an error. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Sebastian Andrzej Siewior says: ==================== net: Remove the obsolte u64_stats_fetch_*_irq() This is the removal of u64_stats_fetch_*_irq() users in networking. The prerequisites are part of v6.1-rc1. The spi and bpf bits are not part of the series and have been routed directly. ==================== Link: https://lore.kernel.org/r/20221026132215.696950-1-bigeasy@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Thomas Gleixner authored
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Thomas Gleixner authored
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
==================== pull-request: wireless-next-2022-10-28 First set of patches v6.2. mac80211 refactoring continues for Wi-Fi 7. All mac80211 driver are now converted to use internal TX queues, this might cause some regressions so we wanted to do this early in the cycle. Note: wireless tree was merged[1] to wireless-next to avoid some conflicts with mac80211 patches between the trees. Unfortunately there are still two smaller conflicts in net/mac80211/util.c which Stephen also reported[2]. In the first conflict initialise scratch_len to "params->scratch_len ?: 3 * params->len" (note number 3, not 2!) and in the second conflict take the version which uses elems->scratch_pos. [1] https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/commit/?id=dfd2d876b3fda1790bc0239ba4c6967e25d16e91 [2] https://lore.kernel.org/all/20221020032340.5cf101c0@canb.auug.org.au/ mac80211 - preparation for Wi-Fi 7 Multi-Link Operation (MLO) continues - add API to show the link STAs in debugfs - all mac80211 drivers are now using mac80211 internal TX queues (iTXQs) rtw89 - support 8852BE rtl8xxxu - support RTL8188FU brmfmac - support two station interfaces concurrently bcma - support SPROM rev 11 ==================== Link: https://lore.kernel.org/r/20221028132943.304ECC433B5@smtp.kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 28 Oct, 2022 22 commits
-
-
Dmitry Torokhov authored
Because we enable the clock immediately after acquiring it in probe, we can combine the 2 operations and use devm_clk_get_optional_enabled() helper. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jiawen Wu says: ==================== net: WangXun txgbe ethernet driver This patch series adds support for WangXun 10 gigabit NIC, to initialize hardware, set mac address, and register netdev. Change log: v6: address comments: Jakub Kicinski: check with scripts/kernel-doc v5: address comments: Jakub Kicinski: clean build with W=1 C=1 v4: address comments: Andrew Lunn: https://lore.kernel.org/all/YzXROBtztWopeeaA@lunn.ch/ v3: address comments: Andrew Lunn: remove hw function ops, reorder functions, use BIT(n) for register bit offset, move the same code of txgbe and ngbe to libwx v2: address comments: Andrew Lunn: https://lore.kernel.org/netdev/YvRhld5rD%2FxgITEg@lunn.ch/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiawen Wu authored
Add MAC address related operations, and register netdev. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiawen Wu authored
Reset and initialize the hardware by configuring the MAC layer. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jiawen Wu authored
Get PCI config space info, set LAN id and check flash status. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Mubashir Adnan Qureshi says: ==================== net: Add PLB functionality to TCP This patch series adds PLB (Protective Load Balancing) to TCP and hooks it up to DCTCP. PLB is disabled by default and can be enabled using relevant sysctls and support from underlying CC. PLB (Protective Load Balancing) is a host based mechanism for load balancing across switch links. It leverages congestion signals(e.g. ECN) from transport layer to randomly change the path of the connection experiencing congestion. PLB changes the path of the connection by changing the outgoing IPv6 flow label for IPv6 connections (implemented in Linux by calling sk_rethink_txhash()). Because of this implementation mechanism, PLB can currently only work for IPv6 traffic. For more information, see the SIGCOMM 2022 paper: https://doi.org/10.1145/3544216.3544226 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mubashir Adnan Qureshi authored
rcv_wnd can be useful to diagnose TCP performance where receiver window becomes the bottleneck. rehash reports the PLB and timeout triggered rehash attempts by the TCP connection. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mubashir Adnan Qureshi authored
A u32 counter is added to tcp_sock for counting the number of PLB triggered rehashes for a TCP connection. An SNMP counter is also added to count overall PLB triggered rehash events for a host. These counters are hooked up to PLB implementation for DCTCP. TCP_NLA_REHASH is added to SCM_TIMESTAMPING_OPT_STATS that reports the rehash attempts triggered due to PLB or timeouts. This gives a historical view of sustained congestion or timeouts experienced by the TCP connection. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mubashir Adnan Qureshi authored
PLB support is added to TCP DCTCP code. As DCTCP uses ECN as the congestion signal, PLB also uses ECN to make decisions whether to change the path or not upon sustained congestion. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mubashir Adnan Qureshi authored
Congestion control algorithms track PLB state and cause the connection to trigger a path change when either of the 2 conditions is satisfied: - No packets are in flight and (# consecutive congested rounds >= sysctl_tcp_plb_idle_rehash_rounds) - (# consecutive congested rounds >= sysctl_tcp_plb_rehash_rounds) A round (RTT) is marked as congested when congestion signal (ECN ce_ratio) over an RTT is greater than sysctl_tcp_plb_cong_thresh. In the event of RTO, PLB (via tcp_write_timeout()) triggers a path change and disables congestion-triggered path changes for random time between (sysctl_tcp_plb_suspend_rto_sec, 2*sysctl_tcp_plb_suspend_rto_sec) to avoid hopping onto the "connectivity blackhole". RTO-triggered path changes can still happen during this cool-off period. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Mubashir Adnan Qureshi authored
PLB (Protective Load Balancing) is a host based mechanism for load balancing across switch links. It leverages congestion signals(e.g. ECN) from transport layer to randomly change the path of the connection experiencing congestion. PLB changes the path of the connection by changing the outgoing IPv6 flow label for IPv6 connections (implemented in Linux by calling sk_rethink_txhash()). Because of this implementation mechanism, PLB can currently only work for IPv6 traffic. For more information, see the SIGCOMM 2022 paper: https://doi.org/10.1145/3544216.3544226 This commit adds new sysctl knobs and sets their default values for TCP PLB. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Raju Lakkaraju says: ==================== net: phy: mxl-gpy: Add MDI-X This patch series add the MDI-X feature to GPY211 PHYs and Also Change return type to gpy_update_interface() function ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
Add support for MDI-X status and configuration for GPY211 chips Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raju Lakkaraju authored
gpy_update_interface() is called from gpy_read_status() which does return error codes. gpy_read_status() would benefit from returning -EINVAL, etc. Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski authored
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Move struct nft_payload_set definition to .c file where it is only used. 2) Shrink transport and inner header offset fields in the nft_pktinfo structure to 16-bits, from Florian Westphal. 3) Get rid of nft_objref Kbuild toggle, make it built-in into nf_tables. This expression is used to instantiate conntrack helpers in nftables. After removing the conntrack helper auto-assignment toggle it this feature became more important so move it to the nf_tables core module. Also from Florian. 4) Extend the existing function to calculate payload inner header offset to deal with the GRE and IPIP transport protocols. 6) Add inner expression support for nf_tables. This new expression provides a packet parser for tunneled packets which uses a userspace description of the expected inner headers. The inner expression invokes the payload expression (via direct call) to match on the inner header protocol fields using the inner link, network and transport header offsets. An example of the bytecode generated from userspace to match on IP source encapsulated in a VxLAN packet: # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4 netdev x y [ meta load l4proto => reg 1 ] [ cmp eq reg 1 0x00000011 ] [ payload load 2b @ transport header + 2 => reg 1 ] [ cmp eq reg 1 0x0000b512 ] [ inner type vxlan hdrsize 8 flags f [ meta load protocol => reg 1 ] ] [ cmp eq reg 1 0x00000008 ] [ inner type vxlan hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ] [ cmp eq reg 1 0x04030201 ] 7) Store inner link, network and transport header offsets in percpu area to parse inner packet header once only. Matching on a different tunnel type invalidates existing offsets in the percpu area and it invokes the inner tunnel parser again. 8) Add support for inner meta matching. This support for NFTA_META_PROTOCOL, which specifies the inner ethertype, and NFT_META_L4PROTO, which specifies the inner transport protocol. 9) Extend nft_inner to parse GENEVE optional fields to calculate the link layer offset. 10) Update inner expression so tunnel offset points to GRE header to normalize tunnel header handling. This also allows to perform different interpretations of the GRE header from userspace. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nft_inner: set tunnel offset to GRE header offset netfilter: nft_inner: add geneve support netfilter: nft_meta: add inner match support netfilter: nft_inner: add percpu inner context netfilter: nft_inner: support for inner tunnel header matching netfilter: nft_payload: access ipip payload for inner offset netfilter: nft_payload: access GRE payload via inner offset netfilter: nft_objref: make it builtin netfilter: nf_tables: reduce nft_pktinfo by 8 bytes netfilter: nft_payload: move struct nft_payload_set definition where it belongs ==================== Link: https://lore.kernel.org/r/20221026132227.3287-1-pablo@netfilter.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yang Li authored
./drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c:453:42-47: WARNING: conversion to bool not needed here Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2577Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20221026051824.38730-1-yang.lee@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Shannon Nelson says: ==================== ionic: VF attr replay and other updates For better VF management when a FW update restart or a FW crash recover is detected, the PF now will replay any user specified VF attributes to be sure the FW hasn't lost them in the restart. Newer FW offers more packet processing offloads, so we now support them in the driver. A small refactor of the Rx buffer fill cleans a bit of code and will help future work on buffer caching. ==================== Link: https://lore.kernel.org/r/20221026143744.11598-1-snelson@pensando.ioSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Neel Patel authored
The same pre-work code is used before each call to ionic_rx_fill(), so bring it in and make it a part of the routine. Signed-off-by: Neel Patel <neel@pensando.io> Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Neel Patel authored
Support stateless offloads for GRE, VXLAN, GENEVE, IPXIP4 and IPXIP6 when the FW supports them. Signed-off-by: Neel Patel <neel@pensando.io> Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
A new ionic dev_cmd is added to the interface in ionic_if.h, with a new capabilities field in the ionic device identity to signal its availability in the FW. The identity level code is incremented to '2' to show support for this new capabilities bitfield. If the driver has indicated with the new identity level that it has the VF_CTRL command, newer FW will wait for the start command before starting the VFs after a FW update or crash recovery. This patch updates the driver to make use of the new VF start control in fw_up path to be sure that the PF has set the user attributes on the VF before the FW allows the VFs to restart. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
Report the current FW values for the VF attributes, but don't save the FW values locally, only save the vf attributes that are given to us from the user. This allows us to replay user data, and doesn't end up confusing things like "who set the mac address". Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Shannon Nelson authored
The VF attributes that the user has set into the FW through the PF can be lost over a FW crash recovery. Much like we already replay the PF mac/vlan filters, we now add a replay in the recovery path to be sure the FW has the up-to-date VF configurations. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 27 Oct, 2022 8 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c 2871edb3 ("can: kvaser_usb: Fix possible completions during init_completion") abb86709 ("can: kvaser_usb_leaf: Ignore stale bus-off after start") 8d21f592 ("can: kvaser_usb_leaf: Fix improved state not being reported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Including fixes from 802.15.4 (Zigbee et al). Current release - regressions: - ipa: fix bugs in the register conversion for IPA v3.1 and v3.5.1 Current release - new code bugs: - mptcp: fix abba deadlock on fastopen - eth: stmmac: rk3588: allow multiple gmac controllers in one system Previous releases - regressions: - ip: rework the fix for dflt addr selection for connected nexthop - net: couple more fixes for misinterpreting bits in struct page after the signature was added Previous releases - always broken: - ipv6: ensure sane device mtu in tunnels - openvswitch: switch from WARN to pr_warn on a user-triggerable path - ethtool: eeprom: fix null-deref on genl_info in dump - ieee802154: more return code fixes for corner cases in dgram_sendmsg - mac802154: fix link-quality-indicator recording - eth: mlx5: fixes for IPsec, PTP timestamps, OvS and conntrack offload - eth: fec: limit register access on i.MX6UL - eth: bcm4908_enet: update TX stats after actual transmission - can: rcar_canfd: improve IRQ handling for RZ/G2L Misc: - genetlink: piggy back on the newly added resv_op_start to enforce more sanity checks on new commands" * tag 'net-6.1-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) net: enetc: survive memory pressure without crashing kcm: do not sense pfmemalloc status in kcm_sendpage() net: do not sense pfmemalloc status in skb_append_pagefrags() net/mlx5e: Fix macsec sci endianness at rx sa update net/mlx5e: Fix wrong bitwise comparison usage in macsec_fs_rx_add_rule function net/mlx5e: Fix macsec rx security association (SA) update/delete net/mlx5e: Fix macsec coverity issue at rx sa update net/mlx5: Fix crash during sync firmware reset net/mlx5: Update fw fatal reporter state on PCI handlers successful recover net/mlx5e: TC, Fix cloned flow attr instance dests are not zeroed net/mlx5e: TC, Reject forwarding from internal port to internal port net/mlx5: Fix possible use-after-free in async command interface net/mlx5: ASO, Create the ASO SQ with the correct timestamp format net/mlx5e: Update restore chain id for slow path packets net/mlx5e: Extend SKB room check to include PTP-SQ net/mlx5: DR, Fix matcher disconnect error flow net/mlx5: Wait for firmware to enable CRS before pci_restore_state net/mlx5e: Do not increment ESN when updating IPsec ESN state netdevsim: remove dir in nsim_dev_debugfs_init() when creating ports dir failed netdevsim: fix memory leak in nsim_drv_probe() when nsim_dev_resources_register() failed ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds authored
Pull execve fixes from Kees Cook: - Fix an ancient signal action copy race (Bernd Edlinger) - Fix a memory leak in ELF loader, when under memory pressure (Li Zetao) * tag 'execve-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fs/binfmt_elf: Fix memory leak in load_elf_binary() exec: Copy oldsighand->action under spin-lock
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds authored
Pull hardening fixes from Kees Cook: - Fix older Clang vs recent overflow KUnit test additions (Nick Desaulniers, Kees Cook) - Fix kern-doc visibility for overflow helpers (Kees Cook) * tag 'hardening-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: overflow: Refactor test skips for Clang-specific issues overflow: disable failing tests for older clang versions overflow: Fix kern-doc markup for functions
-
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-mediaLinus Torvalds authored
Pull media fixes from Mauro Carvalho Chehab: "A bunch of patches addressing issues in the vivid driver and adding new checks in V4L2 to validate the input parameters from some ioctls" * tag 'media/v6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: vivid.rst: loop_video is set on the capture devnode media: vivid: set num_in/outputs to 0 if not supported media: vivid: drop GFP_DMA32 media: vivid: fix control handler mutex deadlock media: videodev2.h: V4L2_DV_BT_BLANKING_HEIGHT should check 'interlaced' media: v4l2-dv-timings: add sanity checks for blanking values media: vivid: dev->bitmap_cap wasn't freed in all cases media: vivid: s_fbuf: add more sanity checks
-
git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds authored
Pull fscrypt fix from Eric Biggers: "Fix a memory leak that was introduced by a change that went into -rc1" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: fix keyring memory leak on mount failure
-
Vladimir Oltean authored
Under memory pressure, enetc_refill_rx_ring() may fail, and when called during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not checked for. An extreme case of memory pressure will result in exactly zero buffers being allocated for the RX ring, and in such a case it is expected that hardware drops all RX packets due to lack of buffers. This does not happen, because the reset-default value of the consumer and produces index is 0, and this makes the ENETC think that all buffers have been initialized and that it owns them (when in reality none were). The hardware guide explains this best: | Configure the receive ring producer index register RBaPIR with a value | of 0. The producer index is initially configured by software but owned | by hardware after the ring has been enabled. Hardware increments the | index when a frame is received which may consume one or more BDs. | Hardware is not allowed to increment the producer index to match the | consumer index since it is used to indicate an empty condition. The ring | can hold at most RBLENR[LENGTH]-1 received BDs. | | Configure the receive ring consumer index register RBaCIR. The | consumer index is owned by software and updated during operation of the | of the BD ring by software, to indicate that any receive data occupied | in the BD has been processed and it has been prepared for new data. | - If consumer index and producer index are initialized to the same | value, it indicates that all BDs in the ring have been prepared and | hardware owns all of the entries. | - If consumer index is initialized to producer index plus N, it would | indicate N BDs have been prepared. Note that hardware cannot start if | only a single buffer is prepared due to the restrictions described in | (2). | - Software may write consumer index to match producer index anytime | while the ring is operational to indicate all received BDs prior have | been processed and new BDs prepared for hardware. Normally, the value of rx_ring->rcir (consumer index) is brought in sync with the rx_ring->next_to_use software index, but this only happens if page allocation ever succeeded. When PI==CI==0, the hardware appears to receive frames and write them to DMA address 0x0 (?!), then set the READY bit in the BD. The enetc_clean_rx_ring() function (and its XDP derivative) is naturally not prepared to handle such a condition. It will attempt to process those frames using the rx_swbd structure associated with index i of the RX ring, but that structure is not fully initialized (enetc_new_page() does all of that). So what happens next is undefined behavior. To operate using no buffer, we must initialize the CI to PI + 1, which will block the hardware from advancing the CI any further, and drop everything. The issue was seen while adding support for zero-copy AF_XDP sockets, where buffer memory comes from user space, which can even decide to supply no buffers at all (example: "xdpsock --txonly"). However, the bug is present also with the network stack code, even though it would take a very determined person to trigger a page allocation failure at the perfect time (a series of ifup/ifdown under memory pressure should eventually reproduce it given enough retries). Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20221027182925.3256653-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Similar to changes done in TCP in blamed commit. We should not sense pfmemalloc status in sendpage() methods. Fixes: 32614006 ("tcp: TX zerocopy should not sense pfmemalloc status") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221027040637.1107703-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-