- 22 Sep, 2022 8 commits
-
-
Jakub Kicinski authored
Jonathan Toppins says: ==================== bonding: fix NULL deref in bond_rr_gen_slave_id Fix a NULL dereference of the struct bonding.rr_tx_counter member because if a bond is initially created with an initial mode != zero (Round Robin) the memory required for the counter is never created and when the mode is changed there is never any attempt to verify the memory is allocated upon switching modes. ==================== Link: https://lore.kernel.org/r/cover.1663694476.git.jtoppins@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jonathan Toppins authored
This bonding selftest used to cause a kernel oops on aarch64 and should be architectures agnostic. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jonathan Toppins authored
Fix a NULL dereference of the struct bonding.rr_tx_counter member because if a bond is initially created with an initial mode != zero (Round Robin) the memory required for the counter is never created and when the mode is changed there is never any attempt to verify the memory is allocated upon switching modes. This causes the following Oops on an aarch64 machine: [ 334.686773] Unable to handle kernel paging request at virtual address ffff2c91ac905000 [ 334.694703] Mem abort info: [ 334.697486] ESR = 0x0000000096000004 [ 334.701234] EC = 0x25: DABT (current EL), IL = 32 bits [ 334.706536] SET = 0, FnV = 0 [ 334.709579] EA = 0, S1PTW = 0 [ 334.712719] FSC = 0x04: level 0 translation fault [ 334.717586] Data abort info: [ 334.720454] ISV = 0, ISS = 0x00000004 [ 334.724288] CM = 0, WnR = 0 [ 334.727244] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000008044d662000 [ 334.733944] [ffff2c91ac905000] pgd=0000000000000000, p4d=0000000000000000 [ 334.740734] Internal error: Oops: 96000004 [#1] SMP [ 334.745602] Modules linked in: bonding tls veth rfkill sunrpc arm_spe_pmu vfat fat acpi_ipmi ipmi_ssif ixgbe igb i40e mdio ipmi_devintf ipmi_msghandler arm_cmn arm_dsu_pmu cppc_cpufreq acpi_tad fuse zram crct10dif_ce ast ghash_ce sbsa_gwdt nvme drm_vram_helper drm_ttm_helper nvme_core ttm xgene_hwmon [ 334.772217] CPU: 7 PID: 2214 Comm: ping Not tainted 6.0.0-rc4-00133-g64ae13ed #4 [ 334.779950] Hardware name: GIGABYTE R272-P31-00/MP32-AR1-00, BIOS F18v (SCP: 1.08.20211002) 12/01/2021 [ 334.789244] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 334.796196] pc : bond_rr_gen_slave_id+0x40/0x124 [bonding] [ 334.801691] lr : bond_xmit_roundrobin_slave_get+0x38/0xdc [bonding] [ 334.807962] sp : ffff8000221733e0 [ 334.811265] x29: ffff8000221733e0 x28: ffffdbac8572d198 x27: ffff80002217357c [ 334.818392] x26: 000000000000002a x25: ffffdbacb33ee000 x24: ffff07ff980fa000 [ 334.825519] x23: ffffdbacb2e398ba x22: ffff07ff98102000 x21: ffff07ff981029c0 [ 334.832646] x20: 0000000000000001 x19: ffff07ff981029c0 x18: 0000000000000014 [ 334.839773] x17: 0000000000000000 x16: ffffdbacb1004364 x15: 0000aaaabe2f5a62 [ 334.846899] x14: ffff07ff8e55d968 x13: ffff07ff8e55db30 x12: 0000000000000000 [ 334.854026] x11: ffffdbacb21532e8 x10: 0000000000000001 x9 : ffffdbac857178ec [ 334.861153] x8 : ffff07ff9f6e5a28 x7 : 0000000000000000 x6 : 000000007c2b3742 [ 334.868279] x5 : ffff2c91ac905000 x4 : ffff2c91ac905000 x3 : ffff07ff9f554400 [ 334.875406] x2 : ffff2c91ac905000 x1 : 0000000000000001 x0 : ffff07ff981029c0 [ 334.882532] Call trace: [ 334.884967] bond_rr_gen_slave_id+0x40/0x124 [bonding] [ 334.890109] bond_xmit_roundrobin_slave_get+0x38/0xdc [bonding] [ 334.896033] __bond_start_xmit+0x128/0x3a0 [bonding] [ 334.901001] bond_start_xmit+0x54/0xb0 [bonding] [ 334.905622] dev_hard_start_xmit+0xb4/0x220 [ 334.909798] __dev_queue_xmit+0x1a0/0x720 [ 334.913799] arp_xmit+0x3c/0xbc [ 334.916932] arp_send_dst+0x98/0xd0 [ 334.920410] arp_solicit+0xe8/0x230 [ 334.923888] neigh_probe+0x60/0xb0 [ 334.927279] __neigh_event_send+0x3b0/0x470 [ 334.931453] neigh_resolve_output+0x70/0x90 [ 334.935626] ip_finish_output2+0x158/0x514 [ 334.939714] __ip_finish_output+0xac/0x1a4 [ 334.943800] ip_finish_output+0x40/0xfc [ 334.947626] ip_output+0xf8/0x1a4 [ 334.950931] ip_send_skb+0x5c/0x100 [ 334.954410] ip_push_pending_frames+0x3c/0x60 [ 334.958758] raw_sendmsg+0x458/0x6d0 [ 334.962325] inet_sendmsg+0x50/0x80 [ 334.965805] sock_sendmsg+0x60/0x6c [ 334.969286] __sys_sendto+0xc8/0x134 [ 334.972853] __arm64_sys_sendto+0x34/0x4c [ 334.976854] invoke_syscall+0x78/0x100 [ 334.980594] el0_svc_common.constprop.0+0x4c/0xf4 [ 334.985287] do_el0_svc+0x38/0x4c [ 334.988591] el0_svc+0x34/0x10c [ 334.991724] el0t_64_sync_handler+0x11c/0x150 [ 334.996072] el0t_64_sync+0x190/0x194 [ 334.999726] Code: b9001062 f9403c02 d53cd044 8b040042 (b8210040) [ 335.005810] ---[ end trace 0000000000000000 ]--- [ 335.010416] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 335.017279] SMP: stopping secondary CPUs [ 335.021374] Kernel Offset: 0x5baca8eb0000 from 0xffff800008000000 [ 335.027456] PHYS_OFFSET: 0x80000000 [ 335.030932] CPU features: 0x0000,0085c029,19805c82 [ 335.035713] Memory Limit: none [ 335.038756] Rebooting in 180 seconds.. The fix is to allocate the memory in bond_open() which is guaranteed to be called before any packets are processed. Fixes: 848ca918 ("net: bonding: Use per-cpu rr_tx_counter") CC: Jussi Maki <joamaki@gmail.com> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Michael Walle authored
Since commit ece19502 ("net: phy: micrel: 1588 support for LAN8814 phy") the handler always returns IRQ_HANDLED, except in an error case. Before that commit, the interrupt status register was checked and if it was empty, IRQ_NONE was returned. Restore that behavior to play nice with the interrupt line being shared with others. Fixes: ece19502 ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Divya Koppera <Divya.Koppera@microchip.com> Link: https://lore.kernel.org/r/20220920141619.808117-1-michael@walle.ccSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Wen Gu authored
There might be a potential race between SMC-R buffer map and link group termination. smc_smcr_terminate_all() | smc_connect_rdma() -------------------------------------------------------------- | smc_conn_create() for links in smcibdev | schedule links down | | smc_buf_create() | \- smcr_buf_map_usable_links() | \- no usable links found, | (rmb->mr = NULL) | | smc_clc_send_confirm() | \- access conn->rmb_desc->mr[]->rkey | (panic) During reboot and IB device module remove, all links will be set down and no usable links remain in link groups. In such situation smcr_buf_map_usable_links() should return an error and stop the CLC flow accessing to uninitialized mr. Fixes: b9247544 ("net/smc: convert static link ID instances to support multiple links") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Link: https://lore.kernel.org/r/1663656189-32090-1-git-send-email-guwen@linux.alibaba.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueJakub Kicinski authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-20 (ice) Michal re-sets TC configuration when changing number of queues. Mateusz moves the check and call for link-down-on-close to the specific path for downing/closing the interface. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix interface being down after reset with link-down-on-close flag on ice: config netdev tc before setting queues number ==================== Link: https://lore.kernel.org/r/20220920205344.1860934-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Larysa Zaremba authored
The original patch added the static branch to handle the situation, when assigning an XDP TX queue to every CPU is not possible, so they have to be shared. However, in the XDP transmit handler ice_xdp_xmit(), an error was returned in such cases even before static condition was checked, thus making queue sharing still impossible. Fixes: 22bf877e ("ice: introduce XDP_TX fallback path") Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/r/20220919134346.25030-1-larysa.zaremba@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueJakub Kicinski authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-19 (iavf, i40e) Norbert adds checking of buffer size for Rx buffer checks in iavf. Michal corrects setting of max MTU in iavf to account for MTU data provided by PF, fixes i40e to set VF max MTU, and resolves lack of rate limiting when value was less than divisor for i40e. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: Fix set max_tx_rate when it is lower than 1 Mbps i40e: Fix VF set max MTU size iavf: Fix set max MTU size with port VLAN and jumbo frames iavf: Fix bad page state ==================== Link: https://lore.kernel.org/r/20220919223428.572091-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 21 Sep, 2022 8 commits
-
-
Jakub Kicinski authored
Merge tag 'linux-can-fixes-for-6.0-20220921' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-09-21 The 1st patch is by me, targets the flexcan driver and fixes a potential system hang on single core systems under high CAN packet rate. The next 2 patches are also by me and target the gs_usb driver. A potential race condition during the ndo_open callback as well as the return value if the ethtool identify feature is not supported are fixed. * tag 'linux-can-fixes-for-6.0-20220921' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported can: gs_usb: gs_can_open(): fix race dev->can.state condition can: flexcan: flexcan_mailbox_read() fix return value for drop = true ==================== Link: https://lore.kernel.org/r/20220921083609.419768-1-mkl@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jianglei Nie authored
If aq_nic_stop() fails, aq_ndev_close() returns err without calling aq_nic_deinit() to release the relevant memory and resource, which will lead to a memory leak. We can fix it by deleting the if condition judgment and goto statement to call aq_nic_deinit() directly after aq_nic_stop() to fix the memory leak. Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller authored
Florian Westphal says: ==================== netfilter: bugfixes for net The following set contains netfilter fixes for the *net* tree. Regressions (rc only): recent ebtables crash fix was incomplete, it added a memory leak. The patch to fix possible buffer overrun for BIG TCP in ftp conntrack tried to be too clever, we cannot re-use ct->lock: NAT engine might grab it again -> deadlock. Revert back to a global spinlock. Both from myself. Remove the documentation for the recently removed 'nf_conntrack_helper' sysctl as well, from Pablo Neira. The static_branch_inc() that guards the 'chain stats enabled' path needs to be deferred further, until the entire transaction was created. From Tetsuo Handa. Older bugs: Since 5.3: nf_tables_addchain may leak pcpu memory in error path when offloading fails. Also from Tetsuo Handa. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Marc Kleine-Budde authored
Until commit 409c188c ("can: tree-wide: advertise software timestamping capabilities") the ethtool_ops was only assigned for devices which support the GS_CAN_FEATURE_IDENTIFY feature. That commit assigns ethtool_ops unconditionally. This results on controllers without GS_CAN_FEATURE_IDENTIFY support for the following ethtool error: | $ ethtool -p can0 1 | Cannot identify NIC: Broken pipe Restore the correct error value by checking for GS_CAN_FEATURE_IDENTIFY in the gs_usb_set_phys_id() function. | $ ethtool -p can0 1 | Cannot identify NIC: Operation not supported While there use the variable "netdev" for the "struct net_device" pointer and "dev" for the "struct gs_can" pointer as in the rest of the driver. Fixes: 409c188c ("can: tree-wide: advertise software timestamping capabilities") Link: http://lore.kernel.org/all/20220818143853.2671854-1-mkl@pengutronix.de Cc: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Marc Kleine-Budde authored
The dev->can.state is set to CAN_STATE_ERROR_ACTIVE, after the device has been started. On busy networks the CAN controller might receive CAN frame between and go into an error state before the dev->can.state is assigned. Assign dev->can.state before starting the controller to close the race window. Fixes: d08e973a ("can: gs_usb: Added support for the GS_USB CAN devices") Link: https://lore.kernel.org/all/20220920195216.232481-1-mkl@pengutronix.deSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Marc Kleine-Budde authored
The following happened on an i.MX25 using flexcan with many packets on the bus: The rx-offload queue reached a length more than skb_queue_len_max. In can_rx_offload_offload_one() the drop variable was set to true which made the call to .mailbox_read() (here: flexcan_mailbox_read()) to _always_ return ERR_PTR(-ENOBUFS) and drop the rx'ed CAN frame. So can_rx_offload_offload_one() returned ERR_PTR(-ENOBUFS), too. can_rx_offload_irq_offload_fifo() looks as follows: | while (1) { | skb = can_rx_offload_offload_one(offload, 0); | if (IS_ERR(skb)) | continue; | if (!skb) | break; | ... | } The flexcan driver wrongly always returns ERR_PTR(-ENOBUFS) if drop is requested, even if there is no CAN frame pending. As the i.MX25 is a single core CPU, while the rx-offload processing is active, there is no thread to process packets from the offload queue. So the queue doesn't get any shorter and this results is a tight loop. Instead of always returning ERR_PTR(-ENOBUFS) if drop is requested, return NULL if no CAN frame is pending. Changes since v1: https://lore.kernel.org/all/20220810144536.389237-1-u.kleine-koenig@pengutronix.de - don't break in can_rx_offload_irq_offload_fifo() in case of an error, return NULL in flexcan_mailbox_read() in case of no pending CAN frame instead Fixes: 4e9c9484 ("can: rx-offload: Prepare for CAN FD support") Link: https://lore.kernel.org/all/20220811094254.1864367-1-mkl@pengutronix.de Cc: stable@vger.kernel.org # v5.5 Suggested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Tested-by: Thorsten Scherer <t.scherer@eckelmann.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Geert Uytterhoeven authored
Since commit 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state"), a warning splat is printed during system resume with Wake-on-LAN disabled: WARNING: CPU: 0 PID: 626 at drivers/net/phy/phy_device.c:323 mdio_bus_phy_resume+0xbc/0xe4 As the Renesas SuperH Ethernet driver already calls phy_{stop,start}() in its suspend/resume callbacks, it is sufficient to just mark the MAC responsible for managing the power state of the PHY. Fixes: fba863b8 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/c6e1331b9bef61225fa4c09db3ba3e2e7214ba2d.1663598886.git.geert+renesas@glider.beSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geert Uytterhoeven authored
Since commit 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state"), a warning splat is printed during system resume with Wake-on-LAN disabled: WARNING: CPU: 0 PID: 1197 at drivers/net/phy/phy_device.c:323 mdio_bus_phy_resume+0xbc/0xc8 As the Renesas Ethernet AVB driver already calls phy_{stop,start}() in its suspend/resume callbacks, it is sufficient to just mark the MAC responsible for managing the power state of the PHY. Fixes: fba863b8 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/8ec796f47620980fdd0403e21bd8b7200b4fa1d4.1663598796.git.geert+renesas@glider.beSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 20 Sep, 2022 24 commits
-
-
Florian Westphal authored
We can't use ct->lock, this is already used by the seqadj internals. When using ftp helper + nat, seqadj will attempt to acquire ct->lock again. Revert back to a global lock for now. Fixes: c783a29c ("netfilter: nf_ct_ftp: prefer skb_linearize") Reported-by: Bruno de Paula Larini <bruno.larini@riosoft.com.br> Signed-off-by: Florian Westphal <fw@strlen.de>
-
Florian Westphal authored
The bug fix was incomplete, it "replaced" crash with a memory leak. The old code had an assignment to "ret" embedded into the conditional, restore this. Fixes: 7997eff8 ("netfilter: ebtables: reject blobs that don't provide all entry points") Reported-and-tested-by: syzbot+a24c5252f3e3ab733464@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de>
-
Tetsuo Handa authored
It seems to me that percpu memory for chain stats started leaking since commit 3bc158f8 ("netfilter: nf_tables: map basechain priority to hardware priority") when nft_chain_offload_priority() returned an error. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 3bc158f8 ("netfilter: nf_tables: map basechain priority to hardware priority") Signed-off-by: Florian Westphal <fw@strlen.de>
-
Tetsuo Handa authored
syzbot is reporting underflow of nft_counters_enabled counter at nf_tables_addchain() [1], for commit 43eb8949 ("netfilter: nf_tables: do not leave chain stats enabled on error") missed that nf_tables_chain_destroy() after nft_basechain_init() in the error path of nf_tables_addchain() decrements the counter because nft_basechain_init() makes nft_is_base_chain() return true by setting NFT_CHAIN_BASE flag. Increment the counter immediately after returning from nft_basechain_init(). Link: https://syzkaller.appspot.com/bug?extid=b5d82a651b71cd8a75ab [1] Reported-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com> Fixes: 43eb8949 ("netfilter: nf_tables: do not leave chain stats enabled on error") Signed-off-by: Florian Westphal <fw@strlen.de>
-
Pablo Neira Ayuso authored
This toggle has been already remove by b1185090 ("netfilter: remove nf_conntrack_helper sysctl and modparam toggles"). Remove the documentation entry for this toggle too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
-
Bhupesh Sharma authored
As suggested by Vinod, adding myself as the reviewer for the Qualcomm ETHQOS Ethernet driver. Recently I have enabled this driver on a few Qualcomm SoCs / boards and hence trying to keep a close eye on it. Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Link: https://lore.kernel.org/r/20220915112804.3950680-1-bhupesh.sharma@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mateusz Palczewski authored
When performing a reset on ice driver with link-down-on-close flag on interface would always stay down. Fix this by moving a check of this flag to ice_stop() that is called only when user wants to bring interface down. Fixes: ab4ab73f ("ice: Add ethtool private flag to make forcing link down optional") Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Petr Oros <poros@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Michal Swiatkowski authored
After lowering number of tx queues the warning appears: "Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!" Example command to reproduce: ethtool -L enp24s0f0 tx 36 rx 36 Fix this by setting correct tc mapping before setting real number of queues on netdev. Fixes: 0754d65b ("ice: Add infrastructure for mqprio support via ndo_setup_tc") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Jakub Kicinski authored
Vladimir Oltean says: ==================== Fixes for tc-taprio software mode While working on some new features for tc-taprio, I found some strange behavior which looked like bugs. I was able to eventually trigger a NULL pointer dereference. This patch set fixes 2 issues I saw. Detailed explanation in patches. ==================== Link: https://lore.kernel.org/r/20220915100802.2308279-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
taprio can only operate as root qdisc, and to that end, there exists the following check in taprio_init(), just as in mqprio: if (sch->parent != TC_H_ROOT) return -EOPNOTSUPP; And indeed, when we try to attach taprio to an mqprio child, it fails as expected: $ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0 $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \ flags 0x0 clockid CLOCK_TAI Error: sch_taprio: Can only be attached as root qdisc. (extack message added by me) But when we try to attach a taprio child to a taprio root qdisc, surprisingly it doesn't fail: $ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \ map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \ flags 0x0 clockid CLOCK_TAI $ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \ flags 0x0 clockid CLOCK_TAI This is because tc_modify_qdisc() behaves differently when mqprio is root, vs when taprio is root. In the mqprio case, it finds the parent qdisc through p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is ignored according to the comment right below ("It may be default qdisc, ignore it"). As a result, tc_modify_qdisc() goes through the qdisc_create() code path, and this gives taprio_init() a chance to check for sch_parent != TC_H_ROOT and error out. Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is different. It is not the default qdisc created for each netdev queue (both taprio and mqprio call qdisc_create_dflt() and keep them in a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio makes qdisc_leaf() return the _root_ qdisc, aka itself. When taprio does that, tc_modify_qdisc() goes through the qdisc_change() code path, because the qdisc layer never finds out about the child qdisc of the root. And through the ->change() ops, taprio has no reason to check whether its parent is root or not, just through ->init(), which is not called. The problem is the taprio_leaf() implementation. Even though code wise, it does the exact same thing as mqprio_leaf() which it is copied from, it works with different input data. This is because mqprio does not attach itself (the root) to each device TX queue, but one of the default qdiscs from its private array. In fact, since commit 13511704 ("net: taprio offload: enforce qdisc to netdev queue mapping"), taprio does this too, but just for the full offload case. So if we tried to attach a taprio child to a fully offloaded taprio root qdisc, it would properly fail too; just not to a software root taprio. To fix the problem, stop looking at the Qdisc that's attached to the TX queue, and instead, always return the default qdiscs that we've allocated (and to which we privately enqueue and dequeue, in software scheduling mode). Since Qdisc_class_ops :: leaf is only called from tc_modify_qdisc(), the risk of unforeseen side effects introduced by this change is minimal. Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
In an incredibly strange API design decision, qdisc->destroy() gets called even if qdisc->init() never succeeded, not exclusively since commit 87b60cfa ("net_sched: fix error recovery at qdisc creation"), but apparently also earlier (in the case of qdisc_create_dflt()). The taprio qdisc does not fully acknowledge this when it attempts full offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS parsed from netlink (in taprio_change(), tail called from taprio_init()). But in taprio_destroy(), we call taprio_disable_offload(), and this determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags). But looking at the implementation of FULL_OFFLOAD_IS_ENABLED() (a bitwise check of bit 1 in q->flags), it is invalid to call this macro on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on an invalid set of flags. As a result, it is possible to crash the kernel if user space forces an error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling of taprio_enable_offload(). This is because drivers do not expect the offload to be disabled when it was never enabled. The error that we force here is to attach taprio as a non-root qdisc, but instead as child of an mqprio root qdisc: $ tc qdisc add dev swp0 root handle 1: \ mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0 $ tc qdisc replace dev swp0 parent 1:1 \ taprio num_tc 8 map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \ sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \ flags 0x0 clockid CLOCK_TAI Unable to handle kernel paging request at virtual address fffffffffffffff8 [fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Call trace: taprio_dump+0x27c/0x310 vsc9959_port_setup_tc+0x1f4/0x460 felix_port_setup_tc+0x24/0x3c dsa_slave_setup_tc+0x54/0x27c taprio_disable_offload.isra.0+0x58/0xe0 taprio_destroy+0x80/0x104 qdisc_create+0x240/0x470 tc_modify_qdisc+0x1fc/0x6b0 rtnetlink_rcv_msg+0x12c/0x390 netlink_rcv_skb+0x5c/0x130 rtnetlink_rcv+0x1c/0x2c Fix this by keeping track of the operations we made, and undo the offload only if we actually did it. I've added "bool offloaded" inside a 4 byte hole between "int clockid" and "atomic64_t picos_per_byte". Now the first cache line looks like below: $ pahole -C taprio_sched net/sched/sch_taprio.o struct taprio_sched { struct Qdisc * * qdiscs; /* 0 8 */ struct Qdisc * root; /* 8 8 */ u32 flags; /* 16 4 */ enum tk_offsets tk_offset; /* 20 4 */ int clockid; /* 24 4 */ bool offloaded; /* 28 1 */ /* XXX 3 bytes hole, try to pack */ atomic64_t picos_per_byte; /* 32 0 */ /* XXX 8 bytes hole, try to pack */ spinlock_t current_entry_lock; /* 40 0 */ /* XXX 8 bytes hole, try to pack */ struct sched_entry * current_entry; /* 48 8 */ struct sched_gate_list * oper_sched; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ Fixes: 9c66d156 ("taprio: Add support for hardware offloading") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
The global 'raw_v6_hashinfo' variable can be accessed even when IPv6 is administratively disabled via the 'ipv6.disable=1' kernel command line option, leading to a crash [1]. Fix by restoring the original behavior and always initializing the variable, regardless of IPv6 support being administratively disabled or not. [1] BUG: unable to handle page fault for address: ffffffffffffffc8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 173e18067 P4D 173e18067 PUD 173e1a067 PMD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 3 PID: 271 Comm: ss Not tainted 6.0.0-rc4-custom-00136-g0727a9a5 #1396 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 RIP: 0010:raw_diag_dump+0x310/0x7f0 [...] Call Trace: <TASK> __inet_diag_dump+0x10f/0x2e0 netlink_dump+0x575/0xfd0 __netlink_dump_start+0x67b/0x940 inet_diag_handler_cmd+0x273/0x2d0 sock_diag_rcv_msg+0x317/0x440 netlink_rcv_skb+0x15e/0x430 sock_diag_rcv+0x2b/0x40 netlink_unicast+0x53b/0x800 netlink_sendmsg+0x945/0xe60 ____sys_sendmsg+0x747/0x960 ___sys_sendmsg+0x13a/0x1e0 __sys_sendmsg+0x118/0x1e0 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 1da177e4 ("Linux-2.6.12-rc2") Fixes: 0daf07e5 ("raw: convert raw sockets to RCU") Reported-by: Roberto Ricci <rroberto2r@gmail.com> Tested-by: Roberto Ricci <rroberto2r@gmail.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220916084821.229287-1-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
TSN features on the ENETC (taprio, cbs, gate, police) are configured through a mix of command BD ring messages and port registers: enetc_port_rd(), enetc_port_wr(). Port registers are a region of the ENETC memory map which are only accessible from the PCIe Physical Function. They are not accessible from the Virtual Functions. Moreover, attempting to access these registers crashes the kernel: $ echo 1 > /sys/bus/pci/devices/0000\:00\:00.0/sriov_numvfs pci 0000:00:01.0: [1957:ef00] type 00 class 0x020001 fsl_enetc_vf 0000:00:01.0: Adding to iommu group 15 fsl_enetc_vf 0000:00:01.0: enabling device (0000 -> 0002) fsl_enetc_vf 0000:00:01.0 eno0vf0: renamed from eth0 $ tc qdisc replace dev eno0vf0 root taprio num_tc 8 map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \ sched-entry S 0x7f 900000 sched-entry S 0x80 100000 flags 0x2 Unable to handle kernel paging request at virtual address ffff800009551a08 Internal error: Oops: 96000007 [#1] PREEMPT SMP pc : enetc_setup_tc_taprio+0x170/0x47c lr : enetc_setup_tc_taprio+0x16c/0x47c Call trace: enetc_setup_tc_taprio+0x170/0x47c enetc_setup_tc+0x38/0x2dc taprio_change+0x43c/0x970 taprio_init+0x188/0x1e0 qdisc_create+0x114/0x470 tc_modify_qdisc+0x1fc/0x6c0 rtnetlink_rcv_msg+0x12c/0x390 Split enetc_setup_tc() into separate functions for the PF and for the VF drivers. Also remove enetc_qos.o from being included into enetc-vf.ko, since it serves absolutely no purpose there. Fixes: 34c6adf1 ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220916133209.3351399-2-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
The VF netdev driver shouldn't respond to changes in the NETIF_F_HW_TC flag; only PFs should. Moreover, TSN-specific code should go to enetc_qos.c, which should not be included in the VF driver. Fixes: 79e49982 ("net: enetc: add hw tc hw offload features for PSPF capability") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220916133209.3351399-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jason A. Donenfeld says: ==================== wireguard patches for 6.0-rc6 1) The ratelimiter timing test doesn't help outside of development, yet it is currently preventing the module from being inserted on some kernels when it flakes at insertion time. So we disable it. 2) A fix for a build error on UML, caused by a recent change in a different tree. 3) A WARN_ON() is triggered by Kees' new fortified memcpy() patch, due to memcpy()ing over a sockaddr pointer with the size of a sockaddr_in[6]. The type safe fix is pretty simple. Given how classic of a thing sockaddr punning is, I suspect this may be the first in a few patches like this throughout the net tree, once Kees' fortify series is more widely deployed (current it's just in next). ==================== Link: https://lore.kernel.org/r/20220916143740.831881-1-Jason@zx2c4.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason A. Donenfeld authored
Doing a variable-sized memcpy is slower, and the compiler isn't smart enough to turn this into a constant-size assignment. Further, Kees' latest fortified memcpy will actually bark, because the destination pointer is type sockaddr, not explicitly sockaddr_in or sockaddr_in6, so it thinks there's an overflow: memcpy: detected field-spanning write (size 28) of single field "&endpoint.addr" at drivers/net/wireguard/netlink.c:446 (size 16) Fix this by just assigning by using explicit casts for each checked case. Fixes: e7096c13 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reported-by: syzbot+a448cda4dba2dac50de5@syzkaller.appspotmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason A. Donenfeld authored
Since 1b620d53 ("kbuild: disable header exports for UML in a straightforward way"), installing headers fails on UML, so just disable installing them, since they're not needed anyway on the architecture. Fixes: b438b3b8 ("wireguard: selftests: support UML") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jason A. Donenfeld authored
A previous commit tried to make the ratelimiter timings test more reliable but in the process made it less reliable on other configurations. This is an impossible problem to solve without increasingly ridiculous heuristics. And it's not even a problem that actually needs to be solved in any comprehensive way, since this is only ever used during development. So just cordon this off with a DEBUG_ ifdef, just like we do for the trie's randomized tests, so it can be enabled while hacking on the code, and otherwise disabled in CI. In the process we also revert 151c8e49. Fixes: 151c8e49 ("wireguard: ratelimiter: use hrtimer in selftest") Fixes: e7096c13 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Íñigo Huguet authored
Like in previous patch for sfc, prevent potential (but unlikely) NULL pointer dereference. Fixes: 12804793 ("sfc: decouple TXQ type from label") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Link: https://lore.kernel.org/r/20220915141958.16458-1-ihuguet@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Íñigo Huguet authored
As in previous commit for sfc, fix TX channels offset when efx_siena_separate_tx_channels is false (the default) Fixes: 25bde571 ("sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels") Reported-by: Tianhao Zhao <tizhao@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Link: https://lore.kernel.org/r/20220915141653.15504-1-ihuguet@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tetsuo Handa authored
syzbot is still complaining uninit-value in tcp_recvmsg(), for commit 1228b34c ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()") missed that __get_compat_msghdr() is called instead of copy_msghdr_from_user() when MSG_CMSG_COMPAT is specified. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 1228b34c ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()") Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/d06d0f7f-696c-83b4-b2d5-70b5f2730a37@I-love.SAKURA.ne.jpSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Ido Schimmel says: ==================== ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section Patch #1 fixes a bug in ipmr code. Patch #2 adds corresponding test cases. ==================== Link: https://lore.kernel.org/r/20220914075339.4074096-1-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
Add IPv4 and IPv6 test cases for unresolved multicast routes, testing that queued packets are forwarded after installing a matching (S, G) route. The test cases can be used to reproduce the bugs fixed in "ipmr: Always call ip{,6}_mr_forward() from RCU read-side critical section". Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Ido Schimmel authored
These functions expect to be called from RCU read-side critical section, but this only happens when invoked from the data path via ip{,6}_mr_input(). They can also be invoked from process context in response to user space adding a multicast route which resolves a cache entry with queued packets [1][2]. Fix by adding missing rcu_read_lock() / rcu_read_unlock() in these call paths. [1] WARNING: suspicious RCU usage 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted ----------------------------- net/ipv4/ipmr.c:84 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by smcrouted/246: #0: ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip_mroute_setsockopt+0x11c/0x1420 stack backtrace: CPU: 0 PID: 246 Comm: smcrouted Not tainted 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x91/0xb9 vif_dev_read+0xbf/0xd0 ipmr_queue_xmit+0x135/0x1ab0 ip_mr_forward+0xe7b/0x13d0 ipmr_mfc_add+0x1a06/0x2ad0 ip_mroute_setsockopt+0x5c1/0x1420 do_ip_setsockopt+0x23d/0x37f0 ip_setsockopt+0x56/0x80 raw_setsockopt+0x219/0x290 __sys_setsockopt+0x236/0x4d0 __x64_sys_setsockopt+0xbe/0x160 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [2] WARNING: suspicious RCU usage 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Not tainted ----------------------------- net/ipv6/ip6mr.c:69 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by smcrouted/246: #0: ffffffff862389b0 (rtnl_mutex){+.+.}-{3:3}, at: ip6_mroute_setsockopt+0x6b9/0x2630 stack backtrace: CPU: 1 PID: 246 Comm: smcrouted Not tainted 6.0.0-rc3-custom-15969-g049d233c8bcc-dirty #1387 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x91/0xb9 vif_dev_read+0xbf/0xd0 ip6mr_forward2.isra.0+0xc9/0x1160 ip6_mr_forward+0xef0/0x13f0 ip6mr_mfc_add+0x1ff2/0x31f0 ip6_mroute_setsockopt+0x1825/0x2630 do_ipv6_setsockopt+0x462/0x4440 ipv6_setsockopt+0x105/0x140 rawv6_setsockopt+0xd8/0x690 __sys_setsockopt+0x236/0x4d0 __x64_sys_setsockopt+0xbe/0x160 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: ebc31979 ("ipmr: add rcu protection over (struct vif_device)->dev") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-