- 18 Dec, 2022 1 commit
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueDavid S. Miller authored
Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-12-15 (igc) Muhammad Husaini Zulkifli says: This patch series fixes bugs for the Time-Sensitive Networking(TSN) Qbv Scheduling features. An overview of each patch series is given below: Patch 1: Using a first flag bit to schedule a packet to the next cycle if packet cannot fit in current Qbv cycle. Patch 2: Enable strict cycle for Qbv scheduling. Patch 3: Prevent user to set basetime less than zero during tc config. Patch 4: Allow the basetime enrollment with zero value. Patch 5: Calculate the new end time value to exclude the time interval that exceed the cycle time as user can specify the cycle time in tc config. Patch 6: Resolve the HW bugs where the gate is not fully closed. --- This contains the net patches from this original pull request: https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Dec, 2022 3 commits
-
-
Subash Abhinov Kasiviswanathan authored
Extending the tail can have some unexpected side effects if a program uses a helper like BPF_FUNC_skb_pull_data to read partial content beyond the head skb headlen when all the skbs in the gso frag_list are linear with no head_frag - kernel BUG at net/core/skbuff.c:4219! pc : skb_segment+0xcf4/0xd2c lr : skb_segment+0x63c/0xd2c Call trace: skb_segment+0xcf4/0xd2c __udp_gso_segment+0xa4/0x544 udp4_ufo_fragment+0x184/0x1c0 inet_gso_segment+0x16c/0x3a4 skb_mac_gso_segment+0xd4/0x1b0 __skb_gso_segment+0xcc/0x12c udp_rcv_segment+0x54/0x16c udp_queue_rcv_skb+0x78/0x144 udp_unicast_rcv_skb+0x8c/0xa4 __udp4_lib_rcv+0x490/0x68c udp_rcv+0x20/0x30 ip_protocol_deliver_rcu+0x1b0/0x33c ip_local_deliver+0xd8/0x1f0 ip_rcv+0x98/0x1a4 deliver_ptype_list_skb+0x98/0x1ec __netif_receive_skb_core+0x978/0xc60 Fix this by marking these skbs as GSO_DODGY so segmentation can handle the tail updates accordingly. Fixes: 3dcbdb13 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list") Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/1671084718-24796-1-git-send-email-quic_subashab@quicinc.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Take the instance lock around devlink_nl_fill() when dumping, doit takes it already. We are only dumping basic info so in the worst case we were risking data races around the reload statistics. Until the big devlink mutex was removed all relevant code was protected by it, so the missing instance lock was not exposed. Fixes: d3efc2a6 ("net: devlink: remove devlink_mutex") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221216044122.1863550-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Arnd Bergmann authored
The #ifdef check is incorrect and leads to a warning: drivers/net/ethernet/ti/am65-cpsw-nuss.c:1679:13: error: 'am65_cpsw_nuss_remove_rx_chns' defined but not used [-Werror=unused-function] 1679 | static void am65_cpsw_nuss_remove_rx_chns(void *data) It's better to remove the #ifdef here and use the modern SYSTEM_SLEEP_PM_OPS() macro instead. Fixes: 24bc19b0 ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20221215163918.611609-1-arnd@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 16 Dec, 2022 7 commits
-
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf 2022-12-16 We've added 7 non-merge commits during the last 2 day(s) which contain a total of 9 files changed, 119 insertions(+), 36 deletions(-). 1) Fix for recent syzkaller XDP dispatcher update splat, from Jiri Olsa. 2) Fix BPF program refcount leak in LSM attachment failure path, from Milan Landaverde. 3) Fix BPF program type in map compatibility check for fext, from Toke Høiland-Jørgensen. 4) Fix a BPF selftest compilation error under !CONFIG_SMP config, from Yonghong Song. 5) Fix CI to enable CONFIG_FUNCTION_ERROR_INJECTION after it got changed to a prompt, from Song Liu. 6) Various BPF documentation fixes for socket local storage, from Donald Hunter. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program bpf: Resolve fext program type when checking map compatibility bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func bpf: prevent leak of lsm program after failed attach selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTION selftests/bpf: Fix a selftest compilation error with CONFIG_SMP=n docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE ==================== Link: https://lore.kernel.org/r/20221216174540.16598-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eelco Chaudron authored
The commit mentioned below causes the ovs_flow_tbl_lookup() function to be called with the masked key. However, it's supposed to be called with the unmasked key. This due to the fact that the datapath supports installing wider flows, and OVS relies on this behavior. For example if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/ 128.0.0.0) is allowed to be added. However, if we try to add a wildcard rule, the installation fails: $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2 $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2 ovs-vswitchd: updating flow table (File exists) The reason is that the key used to determine if the flow is already present in the system uses the original key ANDed with the mask. This results in the IP address not being part of the (miniflow) key, i.e., being substituted with an all-zero value. When doing the actual lookup, this results in the key wrongfully matching the first flow, and therefore the flow does not get installed. This change reverses the commit below, but rather than having the key on the stack, it's allocated. Fixes: 190aa3e7 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Jakub Kicinski says: ==================== devlink: region snapshot locking fix and selftest adjustments Minor fix for region snapshot locking and adjustments to selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
NetworkManager (and other daemons) may bring the interface up and cause failures in quiescence checks. Print a helpful warning, and take the interface down again. I seem to forget about this every time I run these tests on a new VM. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
$number + > bash means redirect FD $number, e.g. commonly used 2> redirects stderr (fd 2). The test uses 8192> to write the number 8192 to a file, this results in: ./devlink.sh: line 499: 8192: Bad file descriptor Oddly the test also papers over this issue by checking for failure (expecting an error rather than success) so it passes, anyway. Fixes: ff18176a ("selftests: Add a test of large binary to devlink health test") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Netdevsim triggers a splat on reload, when it destroys regions with snapshots pending: WARNING: CPU: 1 PID: 787 at net/core/devlink.c:6291 devlink_region_snapshot_del+0x12e/0x140 CPU: 1 PID: 787 Comm: devlink Not tainted 6.1.0-07460-g7ae9888d #580 RIP: 0010:devlink_region_snapshot_del+0x12e/0x140 Call Trace: <TASK> devl_region_destroy+0x70/0x140 nsim_dev_reload_down+0x2f/0x60 [netdevsim] devlink_reload+0x1f7/0x360 devlink_nl_cmd_reload+0x6ce/0x860 genl_family_rcv_msg_doit.isra.0+0x145/0x1c0 This is the locking assert in devlink_region_snapshot_del(), we're supposed to be holding the region->snapshot_lock here. Fixes: 2dec18ad ("net: devlink: remove region snapshots list dependency on devlink->lock") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Golle authored
Russell King correctly pointed out that the MAC_2500FD capability is already added for port 5 (if not in RGMII mode) and port 6 (which only supports SGMII) by mt7531_mac_port_get_caps. Remove the reduntant setting of this capability flag which was added by a previous commit. Fixes: e19de30d ("net: dsa: mt7530: add support for in-band link status") Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/Y5qY7x6la5TxZxzX@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 15 Dec, 2022 15 commits
-
-
Tan Tee Min authored
The default setting of end_time minus start_time is whole 1 second. Thus, if it's not being configured in any GCL entry then it will be staying at original 1 second. This patch is changing the start_time and end_time to be end_time as if setting zero will be having weird HW behavior where the gate will not be fully closed. Fixes: ec50a9d4 ("igc: Add support for taprio offloading") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Tan Tee Min authored
Qbv users can specify a cycle time that is not equal to the total GCL intervals. Hence, recalculation is necessary here to exclude the time interval that exceeds the cycle time. As those GCL which exceeds the cycle time will be truncated. According to IEEE Std. 802.1Q-2018 section 8.6.9.2, once the end of the list is reached, it will switch to the END_OF_CYCLE state and leave the gates in the same state until the next cycle is started. Fixes: ec50a9d4 ("igc: Add support for taprio offloading") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Tan Tee Min authored
Introduce qbv_enable flag in igc_adapter struct to store the Qbv on/off. So this allow the BaseTime to enroll with zero value. Fixes: 61572d5f ("igc: Simplify TSN flags handling") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Muhammad Husaini Zulkifli authored
Using the tc qdisc command, the user can set basetime to any value. Checking should be done on the driver's side to prevent registering basetime values that are less than zero. Fixes: ec50a9d4 ("igc: Add support for taprio offloading") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Vinicius Costa Gomes authored
Configuring strict cycle mode in the controller forces more well behaved transmissions when taprio is offloaded. When set this strict_cycle and strict_end, transmission is not enabled if the whole packet cannot be completed before end of the Qbv cycle. Fixes: 82faa9b7 ("igc: Add support for ETF offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Vinicius Costa Gomes authored
The I225 hardware has a limitation that packets can only be scheduled in the [0, cycle-time] interval. So, scheduling a packet to the start of the next cycle doesn't usually work. To overcome this, we use the Transmit Descriptor first flag to indicates that a packet should be the first packet (from a queue) in a cycle according to the section 7.5.2.9.3.4 The First Packet on Each QBV Cycle in Intel Discrete I225/6 User Manual. But this only works if there was any packet from that queue during the current cycle, to avoid this issue, we issue an empty packet if that's not the case. Also require one more descriptor to be available, to take into account the empty packet that might be issued. Test Setup: Talker: Use l2_tai to generate the launchtime into packet load. Listener: Use timedump.c to compute the delta between packet arrival and LaunchTime packet payload. Test Result: Before: 1666000610127300000,1666000610127300096,96,621273 1666000610127400000,1666000610127400192,192,621274 1666000610127500000,1666000610127500032,32,621275 1666000610127600000,1666000610127600128,128,621276 1666000610127700000,1666000610127700224,224,621277 1666000610127800000,1666000610127800064,64,621278 1666000610127900000,1666000610127900160,160,621279 1666000610128000000,1666000610128000000,0,621280 1666000610128100000,1666000610128100096,96,621281 1666000610128200000,1666000610128200192,192,621282 1666000610128300000,1666000610128300032,32,621283 1666000610128400000,1666000610128301056,-98944,621284 1666000610128500000,1666000610128302080,-197920,621285 1666000610128600000,1666000610128302848,-297152,621286 1666000610128700000,1666000610128303872,-396128,621287 1666000610128800000,1666000610128304896,-495104,621288 1666000610128900000,1666000610128305664,-594336,621289 1666000610129000000,1666000610128306688,-693312,621290 1666000610129100000,1666000610128307712,-792288,621291 1666000610129200000,1666000610128308480,-891520,621292 1666000610129300000,1666000610128309504,-990496,621293 1666000610129400000,1666000610128310528,-1089472,621294 1666000610129500000,1666000610128311296,-1188704,621295 1666000610129600000,1666000610128312320,-1287680,621296 1666000610129700000,1666000610128313344,-1386656,621297 1666000610129800000,1666000610128314112,-1485888,621298 1666000610129900000,1666000610128315136,-1584864,621299 1666000610130000000,1666000610128316160,-1683840,621300 1666000610130100000,1666000610128316928,-1783072,621301 1666000610130200000,1666000610128317952,-1882048,621302 1666000610130300000,1666000610128318976,-1981024,621303 1666000610130400000,1666000610128319744,-2080256,621304 1666000610130500000,1666000610128320768,-2179232,621305 1666000610130600000,1666000610128321792,-2278208,621306 1666000610130700000,1666000610128322816,-2377184,621307 1666000610130800000,1666000610128323584,-2476416,621308 1666000610130900000,1666000610128324608,-2575392,621309 1666000610131000000,1666000610128325632,-2674368,621310 1666000610131100000,1666000610128326400,-2773600,621311 1666000610131200000,1666000610128327424,-2872576,621312 1666000610131300000,1666000610128328448,-2971552,621313 1666000610131400000,1666000610128329216,-3070784,621314 1666000610131500000,1666000610131500032,32,621315 1666000610131600000,1666000610131600128,128,621316 1666000610131700000,1666000610131700224,224,621317 After: 1666073510646200000,1666073510646200064,64,2676462 1666073510646300000,1666073510646300160,160,2676463 1666073510646400000,1666073510646400256,256,2676464 1666073510646500000,1666073510646500096,96,2676465 1666073510646600000,1666073510646600192,192,2676466 1666073510646700000,1666073510646700032,32,2676467 1666073510646800000,1666073510646800128,128,2676468 1666073510646900000,1666073510646900224,224,2676469 1666073510647000000,1666073510647000064,64,2676470 1666073510647100000,1666073510647100160,160,2676471 1666073510647200000,1666073510647200256,256,2676472 1666073510647300000,1666073510647300096,96,2676473 1666073510647400000,1666073510647400192,192,2676474 1666073510647500000,1666073510647500032,32,2676475 1666073510647600000,1666073510647600128,128,2676476 1666073510647700000,1666073510647700224,224,2676477 1666073510647800000,1666073510647800064,64,2676478 1666073510647900000,1666073510647900160,160,2676479 1666073510648000000,1666073510648000000,0,2676480 1666073510648100000,1666073510648100096,96,2676481 1666073510648200000,1666073510648200192,192,2676482 1666073510648300000,1666073510648300032,32,2676483 1666073510648400000,1666073510648400128,128,2676484 1666073510648500000,1666073510648500224,224,2676485 1666073510648600000,1666073510648600064,64,2676486 1666073510648700000,1666073510648700160,160,2676487 1666073510648800000,1666073510648800000,0,2676488 1666073510648900000,1666073510648900096,96,2676489 1666073510649000000,1666073510649000192,192,2676490 1666073510649100000,1666073510649100032,32,2676491 1666073510649200000,1666073510649200128,128,2676492 1666073510649300000,1666073510649300224,224,2676493 1666073510649400000,1666073510649400064,64,2676494 1666073510649500000,1666073510649500160,160,2676495 1666073510649600000,1666073510649600000,0,2676496 1666073510649700000,1666073510649700096,96,2676497 1666073510649800000,1666073510649800192,192,2676498 1666073510649900000,1666073510649900032,32,2676499 1666073510650000000,1666073510650000128,128,2676500 Fixes: 82faa9b7 ("igc: Add support for ETF offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Co-developed-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Co-developed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Malli C <mallikarjuna.chilakala@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Vladimir Oltean authored
In the blamed commit, it was not noticed that one implementation of chip->info->ops->phylink_get_caps(), called by mv88e6xxx_get_caps(), may access hardware registers, and in doing so, it takes the mv88e6xxx_reg_lock(). Namely, this is mv88e6352_phylink_get_caps(). This is a problem because mv88e6xxx_get_caps(), apart from being a top-level function (method invoked by dsa_switch_ops), is now also directly called from mv88e6xxx_setup_port(), which runs under the mv88e6xxx_reg_lock() taken by mv88e6xxx_setup(). Therefore, when running on mv88e6352, the reg_lock would be acquired a second time and the system would deadlock on driver probe. The things that mv88e6xxx_setup() can compete with in terms of register access with are the IRQ handlers and MDIO bus operations registered by mv88e6xxx_probe(). So there is a real need to acquire the register lock. The register lock can, in principle, be dropped and re-acquired pretty much at will within the driver, as long as no operations that involve waiting for indirect access to complete (essentially, callers of mv88e6xxx_smi_direct_wait() and mv88e6xxx_wait_mask()) are interrupted with the lock released. However, I would guess that in mv88e6xxx_setup(), the critical section is kept open for such a long time just in order to optimize away multiple lock/unlock operations on the registers. We could, in principle, drop the reg_lock right before the mv88e6xxx_setup_port() -> mv88e6xxx_get_caps() call, and re-acquire it immediately afterwards. But this would look ugly, because mv88e6xxx_setup_port() would release a lock which it didn't acquire, but the caller did. A cleaner solution to this issue comes from the observation that struct mv88e6xxxx_ops methods generally assume they are called with the reg_lock already acquired. Whereas mv88e6352_phylink_get_caps() is more the exception rather than the norm, in that it acquires the lock itself. Let's enforce the same locking pattern/convention for chip->info->ops->phylink_get_caps() as well, and make mv88e6xxx_get_caps(), the top-level function, acquire the register lock explicitly, for this one implementation that will access registers for port 4 to work properly. This means that mv88e6xxx_setup_port() will no longer call the top-level function, but the low-level mv88e6xxx_ops method which expects the correct calling context (register lock held). Compared to chip->info->ops->phylink_get_caps(), mv88e6xxx_get_caps() also fixes up the supported_interfaces bitmap for internal ports, since that can be done generically and does not require per-switch knowledge. That's code which will no longer execute, however mv88e6xxx_setup_port() doesn't need that. It just needs to look at the mac_capabilities bitmap. Fixes: cc1049cc ("net: dsa: mv88e6xxx: fix speed setting for CPU/DSA ports") Reported-by: Maksim Kiselev <bigunclemax@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Maksim Kiselev <bigunclemax@gmail.com> Link: https://lore.kernel.org/r/20221214110120.3368472-1-vladimir.oltean@nxp.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Biju Das authored
This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch device to config mode" during unbind. We are doing register access after pm_runtime_put_sync(). We usually do cleanup in reverse order of init. Currently in remove(), the "pm_runtime_put_sync" is not in reverse order. Probe reset_control_deassert(rstc); pm_runtime_enable(&pdev->dev); pm_runtime_get_sync(&pdev->dev); remove pm_runtime_put_sync(&pdev->dev); unregister_netdev(ndev); .. ravb_mdio_release(priv); pm_runtime_disable(&pdev->dev); Consider the call to unregister_netdev() unregister_netdev->unregister_netdevice_queue->rollback_registered_many that calls the below functions which access the registers after pm_runtime_put_sync() 1) ravb_get_stats 2) ravb_close Fixes: c156633f ("Renesas Ethernet AVB driver proper") Cc: stable@vger.kernel.org Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221214105118.2495313-1-biju.das.jz@bp.renesas.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Gaosheng Cui authored
We should set the return value to -ENOMEM explicitly when create_singlethread_workqueue() fails in stmmac_dvr_probe(), otherwise we'll lose the error value. Fixes: a137f3f2 ("net: stmmac: fix possible memory leak in stmmac_dvr_probe()") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221214080117.3514615-1-cuigaosheng1@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Li Zetao authored
There is a memory leaks reported by kmemleak: unreferenced object 0xffff888116111000 (size 2048): comm "modprobe", pid 817, jiffies 4294759745 (age 76.502s) hex dump (first 32 bytes): 00 c4 0a 04 81 88 ff ff 08 10 11 16 81 88 ff ff ................ 08 10 11 16 81 88 ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815bcd82>] kmalloc_trace+0x22/0x60 [<ffffffff827e20ee>] phy_device_create+0x4e/0x90 [<ffffffff827e6072>] get_phy_device+0xd2/0x220 [<ffffffff827e7844>] mdiobus_scan+0xa4/0x2e0 [<ffffffff827e8be2>] __mdiobus_register+0x482/0x8b0 [<ffffffffa01f5d24>] r6040_init_one+0x714/0xd2c [r6040] ... The problem occurs in probe process as follows: r6040_init_one: mdiobus_register mdiobus_scan <- alloc and register phy_device, the reference count of phy_device is 3 r6040_mii_probe phy_connect <- connect to the first phy_device, so the reference count of the first phy_device is 4, others are 3 register_netdev <- fault inject succeeded, goto error handling path // error handling path err_out_mdio_unregister: mdiobus_unregister(lp->mii_bus); err_out_mdio: mdiobus_free(lp->mii_bus); <- the reference count of the first phy_device is 1, it is not released and other phy_devices are released // similarly, the remove process also has the same problem The root cause is traced to the phy_device is not disconnected when removes one r6040 device in r6040_remove_one() or on error handling path after r6040_mii probed successfully. In r6040_mii_probe(), a net ethernet device is connected to the first PHY device of mii_bus, in order to notify the connected driver when the link status changes, which is the default behavior of the PHY infrastructure to handle everything. Therefore the phy_device should be disconnected when removes one r6040 device or on error handling path. Fix it by adding phy_disconnect() when removes one r6040 device or on error handling path after r6040_mii probed successfully. Fixes: 3831861b ("r6040: implement phylib") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221213125614.927754-1-lizetao1@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Kirill Tkhai authored
There is a race resulting in alive SOCK_SEQPACKET socket may change its state from TCP_ESTABLISHED to TCP_CLOSE: unix_release_sock(peer) unix_dgram_sendmsg(sk) sock_orphan(peer) sock_set_flag(peer, SOCK_DEAD) sock_alloc_send_pskb() if !(sk->sk_shutdown & SEND_SHUTDOWN) OK if sock_flag(peer, SOCK_DEAD) sk->sk_state = TCP_CLOSE sk->sk_shutdown = SHUTDOWN_MASK After that socket sk remains almost normal: it is able to connect, listen, accept and recvmsg, while it can't sendmsg. Since this is the only possibility for alive SOCK_SEQPACKET to change the state in such way, we should better fix this strange and potentially danger corner case. Note, that we will return EPIPE here like this is normally done in sock_alloc_send_pskb(). Originally used ECONNREFUSED looks strange, since it's strange to return a specific retval in dependence of race in kernel, when user can't affect on this. Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock to fix race with unix_dgram_connect(): unix_dgram_connect(other) unix_dgram_sendmsg(sk) unix_peer(sk) = NULL unix_state_unlock(sk) unix_state_double_lock(sk, other) sk->sk_state = TCP_ESTABLISHED unix_peer(sk) = other unix_state_double_unlock(sk, other) sk->sk_state = TCP_CLOSED This patch fixes both of these races. Fixes: 83301b53 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Kirill Tkhai <tkhai@ya.ru> Link: https://lore.kernel.org/r/135fda25-22d5-837a-782b-ceee50e19844@ya.ruSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Toke Høiland-Jørgensen authored
This adds a simple test for inserting an XDP program into a cpumap that is "owned" by an XDP program that was loaded as PROG_TYPE_EXT (as libxdp does). Prior to the kernel fix this would fail because the map type ownership would be set to PROG_TYPE_EXT instead of being resolved to PROG_TYPE_XDP. v5: - Fix a few nits from Andrii, add his ACK v4: - Use skeletons for selftest v3: - Update comment to better explain the cause - Add Yonghong's ACK Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221214230254.790066-2-toke@redhat.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Toke Høiland-Jørgensen authored
The bpf_prog_map_compatible() check makes sure that BPF program types are not mixed inside BPF map types that can contain programs (tail call maps, cpumaps and devmaps). It does this by setting the fields of the map->owner struct to the values of the first program being checked against, and rejecting any subsequent programs if the values don't match. One of the values being set in the map owner struct is the program type, and since the code did not resolve the prog type for fext programs, the map owner type would be set to PROG_TYPE_EXT and subsequent loading of programs of the target type into the map would fail. This bug is seen in particular for XDP programs that are loaded as PROG_TYPE_EXT using libxdp; these cannot insert programs into devmaps and cpumaps because the check fails as described above. Fix the bug by resolving the fext program type to its target program type as elsewhere in the verifier. v3: - Add Yonghong's ACK Fixes: f45d5b6c ("bpf: generalise tail call map compatibility check") Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221214230254.790066-1-toke@redhat.comSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Minsuk Kang authored
Fix a slab-out-of-bounds read that occurs in nla_put() called from nfc_genl_send_target() when target->sensb_res_len, which is duplicated from an nfc_target in pn533, is too large as the nfc_target is not properly initialized and retains garbage values. Clear nfc_targets with memset() before they are used. Found by a modified version of syzkaller. BUG: KASAN: slab-out-of-bounds in nla_put Call Trace: memcpy nla_put nfc_genl_dump_targets genl_lock_dumpit netlink_dump __netlink_dump_start genl_family_rcv_msg_dumpit genl_rcv_msg netlink_rcv_skb genl_rcv netlink_unicast netlink_sendmsg sock_sendmsg ____sys_sendmsg ___sys_sendmsg __sys_sendmsg do_syscall_64 Fixes: 673088fb ("NFC: pn533: Send ATR_REQ directly for active device detection") Fixes: 361f3cb7 ("NFC: DEP link hook implementation for pn533") Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20221214015139.119673-1-linuxlovemin@yonsei.ac.krSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir Oltean authored
Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software BD in the RX ring between index orig_i and i can have one of 2 refcount values on its page. We are the owner of the current buffer that is being processed, so the refcount will be at least 1. If the current owner of the buffer at the diametrically opposed index in the RX ring (i.o.w, the other half of this page) has not yet called kfree(), this page's refcount could even be 2. enetc_page_reusable() in enetc_flip_rx_buff() tests for the page refcount against 1, and [ if it's 2 ] does not attempt to reuse it. But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call, the page refcount can have one of 3 values. It can also be 0, if there is no owner of the other page half, and xdp_do_redirect() for this buffer ran so far that it triggered a flush of the devmap/cpumap bulk queue, and the consumers of those bulk queues also freed the buffer, all by the time xdp_do_redirect() returns the execution back to enetc. This is the reason why enetc_flip_rx_buff() is called before xdp_do_redirect(), but there is a big flaw with that reasoning: enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the enetc_page_reusable() branch, and if xdp_do_redirect() returns an error, we call enetc_xdp_free(), which does not deal gracefully with that. In fact, what happens is quite special. The page refcounts start as 1. enetc_flip_rx_buff() figures they're reusable, transfers these rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and bumps the refcount to 2. When xdp_do_redirect() later returns an error, we call the no-op enetc_xdp_free(), but we still haven't lost the reference to that page. A copy of it is still at rx_ring->next_to_alloc, but that has refcount 2 (and there are no concurrent owners of it in flight, to drop the refcount). What really kills the system is when we'll flip the rx_swbd->page the second time around. With an updated refcount of 2, the page will not be reusable and we'll really leak it. Then enetc_new_page() will have to allocate more pages, which will then eventually leak again on further errors from xdp_do_redirect(). The problem, summarized, is that we zeroize rx_swbd->page before we're completely done with it, and this makes it impossible for the error path to do something with it. Since the packet is potentially multi-buffer and therefore the rx_swbd->page is potentially an array, manual passing of the old pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit difficult. For the sake of going with a simple solution, we accept the possibility of racing with xdp_do_redirect(), and we move the flip procedure to execute only on the redirect success path. By racing, I mean that the page may be deemed as not reusable by enetc (having a refcount of 0), but there will be no leak in that case, either. Once we accept that, we have something better to do with buffers on XDP_REDIRECT failure. Since we haven't performed half-page flipping yet, we won't, either (and this way, we can avoid enetc_xdp_free() completely, which gives the entire page to the slab allocator). Instead, we'll call enetc_xdp_drop(), which will recycle this half of the buffer back to the RX ring. Fixes: 9d2b68cc ("net: enetc: add support for XDP_REDIRECT") Suggested-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 14 Dec, 2022 14 commits
-
-
Jiri Olsa authored
Hao Sun reported crash in dispatcher image [1]. Currently we don't have any sync between bpf_dispatcher_update and bpf_dispatcher_xdp_func, so following race is possible: cpu 0: cpu 1: bpf_prog_run_xdp ... bpf_dispatcher_xdp_func in image at offset 0x0 bpf_dispatcher_update update image at offset 0x800 bpf_dispatcher_update update image at offset 0x0 in image at offset 0x0 -> crash Fixing this by synchronizing dispatcher image update (which is done in bpf_dispatcher_update function) with bpf_dispatcher_xdp_func that reads and execute the dispatcher image. Calling synchronize_rcu after updating and installing new image ensures that readers leave old image before it's changed in the next dispatcher update. The update itself is locked with dispatcher's mutex. The bpf_prog_run_xdp is called under local_bh_disable and synchronize_rcu will wait for it to leave [2]. [1] https://lore.kernel.org/bpf/Y5SFho7ZYXr9ifRn@krava/T/#m00c29ece654bc9f332a17df493bbca33e702896c [2] https://lore.kernel.org/bpf/0B62D35A-E695-4B7A-A0D4-774767544C1A@gmail.com/T/#mff43e2c003ae99f4a38f353c7969be4c7162e877Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20221214123542.1389719-1-jolsa@kernel.orgSigned-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-
Milan Landaverde authored
In [0], we added the ability to bpf_prog_attach LSM programs to cgroups, but in our validation to make sure the prog is meant to be attached to BPF_LSM_CGROUP, we return too early if the check fails. This results in lack of decrementing prog's refcnt (through bpf_prog_put) leaving the LSM program alive past the point of the expected lifecycle. This fix allows for the decrement to take place. [0] https://lore.kernel.org/all/20220628174314.1216643-4-sdf@google.com/ Fixes: 69fd337a ("bpf: per-cgroup lsm flavor") Signed-off-by: Milan Landaverde <milan@mdaverde.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221213175714.31963-1-milan@mdaverde.com
-
Song Liu authored
BPF selftests require CONFIG_FUNCTION_ERROR_INJECTION to work. However, CONFIG_FUNCTION_ERROR_INJECTION is no longer 'y' by default after recent changes. As a result, we are seeing errors like the following from BPF CI: bpf_testmod_test_read() is not modifiable __x64_sys_setdomainname is not sleepable __x64_sys_getpgid is not sleepable Fix this by explicitly selecting CONFIG_FUNCTION_ERROR_INJECTION in the selftest config. Fixes: a4412fdd ("error-injection: Add prompt for function error injection") Reported-by: Daniel Müller <deso@posteo.net> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Müller <deso@posteo.net> Link: https://lore.kernel.org/bpf/20221213220500.3427947-1-song@kernel.org
-
Yonghong Song authored
Kernel test robot reported bpf selftest build failure when CONFIG_SMP is not set. The error message looks below: >> progs/rcu_read_lock.c:256:34: error: no member named 'last_wakee' in 'struct task_struct' last_wakee = task->real_parent->last_wakee; ~~~~~~~~~~~~~~~~~ ^ 1 error generated. When CONFIG_SMP is not set, the field 'last_wakee' is not available in struct 'task_struct'. Hence the above compilation failure. To fix the issue, let us choose another field 'group_leader' which is available regardless of CONFIG_SMP set or not. Fixes: fe147956 ("bpf/selftests: Add selftests for new task kfuncs") Fixes: 48671232 ("selftests/bpf: Add tests for bpf_rcu_read_lock()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20221213012224.379581-1-yhs@fb.com
-
Donald Hunter authored
Improve the grammar of the function descriptions and highlight that the key is a socket fd. Fixes: f3212ad5 ("docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE") Reported-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20221212101600.56026-1-donald.hunter@gmail.com
-
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski authored
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net 1) Fix NAT IPv6 flowtable hardware offload, from Qingfang DENG. 2) Add a safety check to IPVS socket option interface report a warning if unsupported command is seen, this. From Li Qiong. 3) Document SCTP conntrack timeouts, from Sriram Yagnaraman. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: conntrack: document sctp timeouts ipvs: add a 'default' case in do_ip_vs_set_ctl() netfilter: flowtable: really fix NAT IPv6 offload ==================== Link: https://lore.kernel.org/r/20221213140923.154594-1-pablo@netfilter.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jiri Slaby (SUSE) authored
Since gcc13, each member of an enum has the same type as the enum. And that is inherited from its members. Provided "REKEY_AFTER_MESSAGES = 1ULL << 60", the named type is unsigned long. This generates warnings with gcc-13: error: format '%d' expects argument of type 'int', but argument 6 has type 'long unsigned int' Cast those particular enum members to int when printing them. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113 Cc: Martin Liska <mliska@suse.cz> Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/all/20221213225208.3343692-2-Jason@zx2c4.com/Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Tony Nguyen authored
When a MAC address is not assigned to the VF, that portion of the message sent to the VF is not set. The memory, however, is allocated from the stack meaning that information may be leaked to the VM. Initialize the message buffer to 0 so that no information is passed to the VM in this case. Fixes: 6ddbc4cf ("igb: Indicate failure on vf reset for empty mac address") Reported-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221212190031.3983342-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Yang Yingliang says: ==================== mISDN: don't call dev_kfree_skb/kfree_skb() under spin_lock_irqsave() It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. This pachset try to avoid calling dev_kfree_skb/kfree_skb()() under spin_lock_irqsave(). ==================== Link: https://lore.kernel.org/r/20221212084139.3277913-1-yangyingliang@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yang Yingliang authored
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. skb_queue_purge() is called under spin_lock_irqsave() in handle_dmsg() and hfcm_l1callback(), kfree_skb() is called in them, to fix this, use skb_queue_splice_init() to move the dch->squeue to a free queue, also enqueue the tx_skb and rx_skb, at last calling __skb_queue_purge() to free the SKBs afer unlock. Fixes: af69fb3a ("Add mISDN HFC multiport driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yang Yingliang authored
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. skb_queue_purge() is called under spin_lock_irqsave() in hfcpci_l2l1D(), kfree_skb() is called in it, to fix this, use skb_queue_splice_init() to move the dch->squeue to a free queue, also enqueue the tx_skb and rx_skb, at last calling __skb_queue_purge() to free the SKBs afer unlock. Fixes: 1700fe1a ("Add mISDN HFC PCI driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yang Yingliang authored
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead. The difference between them is free reason, dev_kfree_skb_irq() means the SKB is dropped in error and dev_consume_skb_irq() means the SKB is consumed in normal. skb_queue_purge() is called under spin_lock_irqsave() in hfcusb_l2l1D(), kfree_skb() is called in it, to fix this, use skb_queue_splice_init() to move the dch->squeue to a free queue, also enqueue the tx_skb and rx_skb, at last calling __skb_queue_purge() to free the SKBs afer unlock. In tx_iso_complete(), dev_kfree_skb() is called to consume the transmitted SKB, so replace it with dev_consume_skb_irq(). Fixes: 69f52adb ("mISDN: Add HFC USB driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Hangbin Liu says: ==================== Bonding: fix high prio not effect issue When a high prio link up, if there has current link, it will not do failover as we missed the check in link up event. Fix it in this patchset and add a prio option test case. ==================== Link: https://lore.kernel.org/all/20221212035647.1053865-1-liuhangbin@gmail.com/Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Liang Li authored
Add a test for bonding prio option. Here is the test result: ]# ./option_prio.sh TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=0) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=1) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=2) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=0) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=1) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=2) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=0) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=1) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=2) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=0) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=1) [ OK ] TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=2) [ OK ] Signed-off-by: Liang Li <liali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-