- 24 Aug, 2023 17 commits
-
-
Thomas Weißschuh authored
Remove the necessity to modify skb_ext_total_length() when new extension types are added. Also reduces the line count a bit. With optimizations enabled the function is folded down to the same constant value as before during compilation. This has been validated on x86 with GCC 6.5.0 and 13.2.1. Also a similar construct has been validated on godbolt.org with GCC 5.1. In any case the compiler has to be able to evaluate the construct at compile-time for the BUILD_BUG_ON() in skb_extensions_init(). Even if not evaluated at compile-time this function would only ever be executed once at run-time, so the overhead would be very minuscule. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230823-skb_ext-simplify-v2-1-66e26cd66860@weissschuh.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski authored
Cross-merge networking fixes after downstream PR. Conflicts: include/net/inet_sock.h f866fbc8 ("ipv4: fix data-races around inet->inet_id") c274af22 ("inet: introduce inet->inet_flags") https://lore.kernel.org/all/679ddff6-db6e-4ff6-b177-574e90d0103d@tessares.net/ Adjacent changes: drivers/net/bonding/bond_alb.c e74216b8 ("bonding: fix macvlan over alb bond support") f11e5bd1 ("bonding: support balance-alb with openvswitch") drivers/net/ethernet/broadcom/bgmac.c d6499f0b ("net: bgmac: Return PTR_ERR() for fixed_phy_register()") 23a14488 ("net: bgmac: Fix return value check for fixed_phy_register()") drivers/net/ethernet/broadcom/genet/bcmmii.c 32bbe64a ("net: bcmgenet: Fix return value check for fixed_phy_register()") acf50d1a ("net: bcmgenet: Return PTR_ERR() for fixed_phy_register()") net/sctp/socket.c f866fbc8 ("ipv4: fix data-races around inet->inet_id") b09bde5c ("inet: move inet->mc_loop to inet->inet_frags") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Paolo Abeni: "Including fixes from wifi, can and netfilter. Fixes to fixes: - nf_tables: - GC transaction race with abort path - defer gc run if previous batch is still pending Previous releases - regressions: - ipv4: fix data-races around inet->inet_id - phy: fix deadlocking in phy_error() invocation - mdio: fix C45 read/write protocol - ipvlan: fix a reference count leak warning in ipvlan_ns_exit() - ice: fix NULL pointer deref during VF reset - i40e: fix potential NULL pointer dereferencing of pf->vf in i40e_sync_vsi_filters() - tg3: use slab_build_skb() when needed - mtk_eth_soc: fix NULL pointer on hw reset Previous releases - always broken: - core: validate veth and vxcan peer ifindexes - sched: fix a qdisc modification with ambiguous command request - devlink: add missing unregister linecard notification - wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning - batman: - do not get eth header before batadv_check_management_packet - fix batadv_v_ogm_aggr_send memory leak - bonding: fix macvlan over alb bond support - mlxsw: set time stamp fields also when its type is MIRROR_UTC" * tag 'net-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits) selftests: bonding: add macvlan over bond testing selftest: bond: add new topo bond_topo_2d1c.sh bonding: fix macvlan over alb bond support rtnetlink: Reject negative ifindexes in RTM_NEWLINK netfilter: nf_tables: defer gc run if previous batch is still pending netfilter: nf_tables: fix out of memory error handling netfilter: nf_tables: use correct lock to protect gc_list netfilter: nf_tables: GC transaction race with abort path netfilter: nf_tables: flush pending destroy work before netlink notifier netfilter: nf_tables: validate all pending tables ibmveth: Use dcbf rather than dcbfl i40e: fix potential NULL pointer dereferencing of pf->vf i40e_sync_vsi_filters() net/sched: fix a qdisc modification with ambiguous command request igc: Fix the typo in the PTM Control macro batman-adv: Hold rtnl lock during MTU update via netlink igb: Avoid starting unnecessary workqueues can: raw: add missing refcount for memory leak fix can: isotp: fix support for transmission of SF without flow control bnx2x: new flag for track HW resource allocation sfc: allocate a big enough SKB for loopback selftest packet ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxPaolo Abeni authored
Saeed Mahameed says: ==================== mlx5-updates-2023-08-22 1) Patches #1..#13 From Jiri: The goal of this patchset is to make the SF code cleaner. Benefit from previously introduced devlink_port struct containerization to avoid unnecessary lookups in devlink port ops. Also, benefit from the devlink locking changes and avoid unnecessary reference counting. 2) Patches #14,#15: Add ability to configure proto both UDP and TCP selectors in RX and TX directions. * tag 'mlx5-updates-2023-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Support IPsec upper TCP protocol selector net/mlx5e: Support IPsec upper protocol selector field offload for RX net/mlx5: Store vport in struct mlx5_devlink_port and use it in port ops net/mlx5: Check vhca_resource_manager capability in each op and add extack msg net/mlx5: Relax mlx5_devlink_eswitch_get() return value checking net/mlx5: Return -EOPNOTSUPP in mlx5_devlink_port_fn_migratable_set() directly net/mlx5: Reduce number of vport lookups passing vport pointer instead of index net/mlx5: Embed struct devlink_port into driver structure net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in ops net/mlx5: Remove no longer used mlx5_esw_offloads_sf_vport_enable/disable() net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF code net/mlx5: Allow mlx5_esw_offloads_devlink_port_register() to register SFs net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister() net/mlx5: Push out SF devlink port init and cleanup code to separate helpers net/mlx5: Rework devlink port alloc/free into init/cleanup ==================== Link: https://lore.kernel.org/all/20230823051012.162483-1-saeed@kernel.org/Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nfPaolo Abeni authored
Florian Westphal says: ==================== netfilter updates for net This PR contains nf_tables updates for your *net* tree. First patch fixes table validation, I broke this in 6.4 when tracking validation state per table, reported by Pablo, fixup from myself. Second patch makes sure objects waiting for memory release have been released, this was broken in 6.1, patch from Pablo Neira Ayuso. Patch three is a fix-for-fix from previous PR: In case a transaction gets aborted, gc sequence counter needs to be incremented so pending gc requests are invalidated, from Pablo. Same for patch 4: gc list needs to use gc list lock, not destroy lock, also from Pablo. Patch 5 fixes a UaF in a set backend, but this should only occur when failslab is enabled for GFP_KERNEL allocations, broken since feature was added in 5.6, from myself. Patch 6 fixes a double-free bug that was also added via previous PR: We must not schedule gc work if the previous batch is still queued. netfilter pull request 2023-08-23 * tag 'nf-23-08-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: defer gc run if previous batch is still pending netfilter: nf_tables: fix out of memory error handling netfilter: nf_tables: use correct lock to protect gc_list netfilter: nf_tables: GC transaction race with abort path netfilter: nf_tables: flush pending destroy work before netlink notifier netfilter: nf_tables: validate all pending tables ==================== Link: https://lore.kernel.org/r/20230823152711.15279-1-fw@strlen.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Hangbin Liu says: ==================== fix macvlan over alb bond support Currently, the macvlan over alb bond is broken after commit 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode bonds"). Fix this and add relate tests. ==================== Link: https://lore.kernel.org/r/20230823071907.3027782-1-liuhangbin@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Hangbin Liu authored
Add a macvlan over bonding test with mode active-backup, balance-tlb and balance-alb. ]# ./bond_macvlan.sh TEST: active-backup: IPv4: client->server [ OK ] TEST: active-backup: IPv6: client->server [ OK ] TEST: active-backup: IPv4: client->macvlan_1 [ OK ] TEST: active-backup: IPv6: client->macvlan_1 [ OK ] TEST: active-backup: IPv4: client->macvlan_2 [ OK ] TEST: active-backup: IPv6: client->macvlan_2 [ OK ] TEST: active-backup: IPv4: macvlan_1->macvlan_2 [ OK ] TEST: active-backup: IPv6: macvlan_1->macvlan_2 [ OK ] TEST: active-backup: IPv4: server->client [ OK ] TEST: active-backup: IPv6: server->client [ OK ] TEST: active-backup: IPv4: macvlan_1->client [ OK ] TEST: active-backup: IPv6: macvlan_1->client [ OK ] TEST: active-backup: IPv4: macvlan_2->client [ OK ] TEST: active-backup: IPv6: macvlan_2->client [ OK ] TEST: active-backup: IPv4: macvlan_2->macvlan_2 [ OK ] TEST: active-backup: IPv6: macvlan_2->macvlan_2 [ OK ] [...] TEST: balance-alb: IPv4: client->server [ OK ] TEST: balance-alb: IPv6: client->server [ OK ] TEST: balance-alb: IPv4: client->macvlan_1 [ OK ] TEST: balance-alb: IPv6: client->macvlan_1 [ OK ] TEST: balance-alb: IPv4: client->macvlan_2 [ OK ] TEST: balance-alb: IPv6: client->macvlan_2 [ OK ] TEST: balance-alb: IPv4: macvlan_1->macvlan_2 [ OK ] TEST: balance-alb: IPv6: macvlan_1->macvlan_2 [ OK ] TEST: balance-alb: IPv4: server->client [ OK ] TEST: balance-alb: IPv6: server->client [ OK ] TEST: balance-alb: IPv4: macvlan_1->client [ OK ] TEST: balance-alb: IPv6: macvlan_1->client [ OK ] TEST: balance-alb: IPv4: macvlan_2->client [ OK ] TEST: balance-alb: IPv6: macvlan_2->client [ OK ] TEST: balance-alb: IPv4: macvlan_2->macvlan_2 [ OK ] TEST: balance-alb: IPv6: macvlan_2->macvlan_2 [ OK ] Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Hangbin Liu authored
Add a new testing topo bond_topo_2d1c.sh which is used more commonly. Make bond_topo_3d1c.sh just source bond_topo_2d1c.sh and add the extra link. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Hangbin Liu authored
The commit 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode bonds") aims to enable the use of macvlans on top of rlb bond mode. However, the current rlb bond mode only handles ARP packets to update remote neighbor entries. This causes an issue when a macvlan is on top of the bond, and remote devices send packets to the macvlan using the bond's MAC address as the destination. After delivering the packets to the macvlan, the macvlan will rejects them as the MAC address is incorrect. Consequently, this commit makes macvlan over bond non-functional. To address this problem, one potential solution is to check for the presence of a macvlan port on the bond device using netif_is_macvlan_port(bond->dev) and return NULL in the rlb_arp_xmit() function. However, this approach doesn't fully resolve the situation when a VLAN exists between the bond and macvlan. So let's just do a partial revert for commit 14af9963 in rlb_arp_xmit(). As the comment said, Don't modify or load balance ARPs that do not originate locally. Fixes: 14af9963 ("bonding: Support macvlans on top of tlb/rlb mode bonds") Reported-by: susan.zheng@veritas.com Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2117816Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Ido Schimmel authored
Negative ifindexes are illegal, but the kernel does not validate the ifindex in the ancillary header of RTM_NEWLINK messages, resulting in the kernel generating a warning [1] when such an ifindex is specified. Fix by rejecting negative ifindexes. [1] WARNING: CPU: 0 PID: 5031 at net/core/dev.c:9593 dev_index_reserve+0x1a2/0x1c0 net/core/dev.c:9593 [...] Call Trace: <TASK> register_netdevice+0x69a/0x1490 net/core/dev.c:10081 br_dev_newlink+0x27/0x110 net/bridge/br_netlink.c:1552 rtnl_newlink_create net/core/rtnetlink.c:3471 [inline] __rtnl_newlink+0x115e/0x18c0 net/core/rtnetlink.c:3688 rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3701 rtnetlink_rcv_msg+0x439/0xd30 net/core/rtnetlink.c:6427 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0x536/0x810 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:728 [inline] sock_sendmsg+0xd9/0x180 net/socket.c:751 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2538 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2592 __sys_sendmsg+0x117/0x1e0 net/socket.c:2621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 38f7b870 ("[RTNETLINK]: Link creation API") Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230823064348.2252280-1-idosch@nvidia.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jakub Kicinski authored
Daniel Golle says: ==================== net: ethernet: mtk_eth_soc: improve support for MT7988 This series fixes and completes commit 445eb644 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC") and also adds support for using the in-SoC SRAM to previous MT7986 and MT7981 SoCs. ==================== Link: https://lore.kernel.org/r/cover.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Golle authored
Systems having 4 GiB of RAM and more require DMA addressing beyond the current 32-bit limit. Starting from MT7988 the hardware now supports 36-bit DMA addressing, let's use that new capability in the driver to avoid running into swiotlb on systems with 4 GiB of RAM or more. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/95b919c98876c9e49761e44662e7c937479eecb8.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Golle authored
MT7981, MT7986 and MT7988 come with in-SoC SRAM dedicated for Ethernet DMA rings. Support using the SRAM without breaking existing device tree bindings, ie. only new SoC starting from MT7988 will have the SRAM declared as additional resource in device tree. For MT7981 and MT7986 an offset on top of the main I/O base is used. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/e45e0f230c63ad58869e8fe35b95a2fb8925b625.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Golle authored
Add bits needed to reset the frame engine on MT7988. Fixes: 445eb644 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/89b6c38380e7a3800c1362aa7575600717bc7543.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Daniel Golle authored
More register macros need to be adjusted for the 3rd GMAC on MT7988. Account for added bit in SYSCFG0_SGMII_MASK. Fixes: 445eb644 ("net: ethernet: mtk_eth_soc: add basic support for MT7988 SoC") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1c8da012e2ca80939906d85f314138c552139f0f.1692721443.git.daniel@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Wei Fang authored
As we already added the exception tracing for XDP_TX, I think it is necessary to add the exception tracing for other XDP actions, such as XDP_REDIRECT, XDP_ABORTED and unknown error actions. Signed-off-by: Wei Fang <wei.fang@nxp.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230822065255.606739-1-wei.fang@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yu Liao authored
Use the standard error pointer macro to shorten the code and simplify. Signed-off-by: Yu Liao <liaoyu15@huawei.com> Link: https://lore.kernel.org/r/20230822021455.205101-2-liaoyu15@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 23 Aug, 2023 23 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull ACPI fix from Rafael Wysocki: "Make an existing ACPI IRQ override quirk for PCSpecialist Elimina Pro 16 M work as intended (Hans de Goede)" * tag 'acpi-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: resource: Fix IRQ override quirk for PCSpecialist Elimina Pro 16 M
-
Linus Torvalds authored
Merge tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "Final set of three small fixes for 6.5" * tag 'platform-drivers-x86-v6.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/mellanox: Fix mlxbf-tmfifo not handling all virtio CONSOLE notifications platform/x86: ideapad-laptop: Add support for new hotkeys found on ThinkBook 14s Yoga ITL platform/x86: lenovo-ymc: Add Lenovo Yoga 7 14ACN6 to ec_trigger_quirk_dmi_table
-
Shih-Yi Chen authored
rshim console does not show all entries of dmesg. Fixed by setting MLXBF_TM_TX_LWM_IRQ for every CONSOLE notification. Signed-off-by: Shih-Yi Chen <shihyic@nvidia.com> Reviewed-by: Liming Sung <limings@nvidia.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/20230821150627.26075-1-shihyic@nvidia.comReviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
-
Florian Westphal authored
Don't queue more gc work, else we may queue the same elements multiple times. If an element is flagged as dead, this can mean that either the previous gc request was invalidated/discarded by a transaction or that the previous request is still pending in the system work queue. The latter will happen if the gc interval is set to a very low value, e.g. 1ms, and system work queue is backlogged. The sets refcount is 1 if no previous gc requeusts are queued, so add a helper for this and skip gc run if old requests are pending. Add a helper for this and skip the gc run in this case. Fixes: f6c383b8 ("netfilter: nf_tables: adapt set backend to use GC transaction API") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Florian Westphal authored
Several instances of pipapo_resize() don't propagate allocation failures, this causes a crash when fault injection is enabled for gfp_kernel slabs. Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
-
Pablo Neira Ayuso authored
Use nf_tables_gc_list_lock spinlock, not nf_tables_destroy_list_lock to protect the gc list. Fixes: 5f68718b ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
-
Pablo Neira Ayuso authored
Abort path is missing a synchronization point with GC transactions. Add GC sequence number hence any GC transaction losing race will be discarded. Fixes: 5f68718b ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
-
Pablo Neira Ayuso authored
Destroy work waits for the RCU grace period then it releases the objects with no mutex held. All releases objects follow this path for transactions, therefore, order is guaranteed and references to top-level objects in the hierarchy remain valid. However, netlink notifier might interfer with pending destroy work. rcu_barrier() is not correct because objects are not release via RCU callback. Flush destroy work before releasing objects from netlink notifier path. Fixes: d4bc8271 ("netfilter: nf_tables: netlink notifier might race to release objects") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
-
Florian Westphal authored
We have to validate all tables in the transaction that are in VALIDATE_DO state, the blamed commit below did not move the break statement to its right location so we only validate one table. Moreover, we can't init table->validate to _SKIP when a table object is allocated. If we do, then if a transcaction creates a new table and then fails the transaction, nfnetlink will loop and nft will hang until user cancels the command. Add back the pernet state as a place to stash the last state encountered. This is either _DO (we hit an error during commit validation) or _SKIP (transaction passed all checks). Fixes: 00c320f9 ("netfilter: nf_tables: make validation state per table") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
-
Michael Ellerman authored
When building for power4, newer binutils don't recognise the "dcbfl" extended mnemonic. dcbfl RA, RB is equivalent to dcbf RA, RB, 1. Switch to "dcbf" to avoid the build error. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrii Staikov authored
Add check for pf->vf not being NULL before dereferencing pf->vf[vsi->vf_id] in updating VSI filter sync. Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted in the condition for clearing promisc mode bit. Fixes: c87c938f ("i40e: Add VF VLAN pruning") Signed-off-by: Andrii Staikov <andrii.staikov@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
All callers of build_skb() (*) in bnxt are in NAPI context. The budget checking is somewhat convoluted because in the shared completion queue cases Rx packets are discarded by netpoll by forcing an error (E). But that happens before skb allocation. Only a call chain starting at __bnxt_poll_work() can lead to an skb allocation and it checks budget (b). * bnxt_rx_multi_page_skb * bnxt_rx_skb ` bp->rx_skb_func * bnxt_tpa_end ` bnxt_rx_pkt E bnxt_force_rx_discard E bnxt_poll_nitroa0 b __bnxt_poll_work Use napi_build_skb() to take advantage of the skb cache. In iperf tests with HW-GRO enabled it barely makes a difference but in cases where HW-GRO is not as effective (or disabled) it can give even a >10% boost (20.7Gbps -> 23.1Gbps). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jamal Hadi Salim authored
When replacing an existing root qdisc, with one that is of the same kind, the request boils down to essentially a parameterization change i.e not one that requires allocation and grafting of a new qdisc. syzbot was able to create a scenario which resulted in a taprio qdisc replacing an existing taprio qdisc with a combination of NLM_F_CREATE, NLM_F_REPLACE and NLM_F_EXCL leading to create and graft scenario. The fix ensures that only when the qdisc kinds are different that we should allow a create and graft, otherwise it goes into the "change" codepath. While at it, fix the code and comments to improve readability. While syzbot was able to create the issue, it did not zone on the root cause. Analysis from Vladimir Oltean <vladimir.oltean@nxp.com> helped narrow it down. v1->V2 changes: - remove "inline" function definition (Vladmir) - remove extrenous braces in branches (Vladmir) - change inline function names (Pedro) - Run tdc tests (Victor) v2->v3 changes: - dont break else/if (Simon) Fixes: 1da177e4 ("Linux-2.6.12-rc2") Reported-by: syzbot+a3618a167af2021433cd@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/20230816225759.g25x76kmgzya2gei@skbuf/T/Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexis Lothoré authored
Remove debug logs in port vlan management, since there are already multiple tracepoints defined for those operations in DSA Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jordan Rife authored
BPF programs that run on connect can rewrite the connect address. For the connect system call this isn't a problem, because a copy of the address is made when it is moved into kernel space. However, kernel_connect simply passes through the address it is given, so the caller may observe its address value unexpectedly change. A practical example where this is problematic is where NFS is combined with a system such as Cilium which implements BPF-based load balancing. A common pattern in software-defined storage systems is to have an NFS mount that connects to a persistent virtual IP which in turn maps to an ephemeral server IP. This is usually done to achieve high availability: if your server goes down you can quickly spin up a replacement and remap the virtual IP to that endpoint. With BPF-based load balancing, mounts will forget the virtual IP address when the address rewrite occurs because a pointer to the only copy of that address is passed down the stack. Server failover then breaks, because clients have forgotten the virtual IP address. Reconnects fail and mounts remain broken. This patch was tested by setting up a scenario like this and ensuring that NFS reconnects worked after applying the patch. Signed-off-by: Jordan Rife <jrife@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Feng Liu authored
The virtio_net driver currently deals with different versions and types of virtio net headers, such as virtio_net_hdr_mrg_rxbuf, virtio_net_hdr_v1_hash, etc. Due to these variations, the code relies on multiple type casts to convert memory between different structures, potentially leading to bugs when there are changes in these structures. Introduces the "struct skb_vnet_common_hdr" as a unifying header structure using a union. With this approach, various virtio net header structures can be converted by accessing different members of this structure, thus eliminating the need for type casting and reducing the risk of potential bugs. For example following code: static struct sk_buff *page_to_skb(struct virtnet_info *vi, struct receive_queue *rq, struct page *page, unsigned int offset, unsigned int len, unsigned int truesize, unsigned int headroom) { [...] struct virtio_net_hdr_mrg_rxbuf *hdr; [...] hdr_len = vi->hdr_len; [...] ok: hdr = skb_vnet_hdr(skb); memcpy(hdr, hdr_p, hdr_len); [...] } When VIRTIO_NET_F_HASH_REPORT feature is enabled, hdr_len = 20 But the sizeof(*hdr) is 12, memcpy(hdr, hdr_p, hdr_len); will copy 20 bytes to the hdr, which make a potential risk of bug. And this risk can be avoided by introducing struct skb_vnet_common_hdr. Change log v1->v2 feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com> feedback from Simon Horman <horms@kernel.org> 1. change to use net-next tree. 2. move skb_vnet_common_hdr inside kernel file instead of the UAPI header. v2->v3 feedback from Willem de Bruijn <willemdebruijn.kernel@gmail.com> 1. fix typo in commit message. 2. add original struct virtio_net_hdr into union 3. remove virtio_net_hdr_mrg_rxbuf variable in receive_buf; Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jinjie Ruan authored
Convert list_for_each() to list_for_each_entry() where applicable. No functional changed. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Petr Pavlu says: ==================== Convert mlx4 to use auxiliary bus This series converts the mlx4 drivers to use auxiliary bus, similarly to how mlx5 was converted [1]. The first 6 patches are preparatory changes, the remaining 4 are the final conversion. Initial motivation for this change was to address a problem related to loading mlx4_en/mlx4_ib by mlx4_core using request_module_nowait(). When doing such a load in initrd, the operation is asynchronous to any init control and can get unexpectedly affected/interrupted by an eventual root switch. Using an auxiliary bus leaves these module loads to udevd which better integrates with systemd processing. [2] General benefit is to get rid of custom interface logic and instead use a common facility available for this task. An obvious risk is that some new bug is introduced by the conversion. Leon Romanovsky was kind enough to check for me that the series passes their verification tests. Changes since v2 [3]: * Use 'void *' as the event param of mlx4_dispatch_event(). Changes since v1 [4]: * Fix a missing definition of the err variable in mlx4_en_add(). * Remove not needed comments about the event type in mlx4_en_event() and mlx4_ib_event(). [1] https://lore.kernel.org/netdev/20201101201542.2027568-1-leon@kernel.org/ [2] https://lore.kernel.org/netdev/0a361ac2-c6bd-2b18-4841-b1b991f0635e@suse.com/ [3] https://lore.kernel.org/netdev/20230813145127.10653-1-petr.pavlu@suse.com/ [4] https://lore.kernel.org/netdev/20230804150527.6117-1-petr.pavlu@suse.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Pavlu authored
After the conversion to use the auxiliary bus, the custom device management is not needed anymore and can be deleted. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Pavlu authored
Use the auxiliary bus to perform device management of the infiniband part of the mlx4 driver. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Pavlu authored
Use the auxiliary bus to perform device management of the ethernet part of the mlx4 driver. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Pavlu authored
Add an auxiliary virtual bus to model the mlx4 driver structure. The code is added along the current custom device management logic. Subsequent patches switch mlx4_en and mlx4_ib to the auxiliary bus and the old interface is then removed. Structure mlx4_priv gains a new adev dynamic array to keep track of its auxiliary devices. Access to the array is protected by the global mlx4_intf mutex. Functions mlx4_register_device() and mlx4_unregister_device() are updated to expose auxiliary devices on the bus in order to load mlx4_en and/or mlx4_ib. Functions mlx4_register_auxiliary_driver() and mlx4_unregister_auxiliary_driver() are added to substitute mlx4_register_interface() and mlx4_unregister_interface(), respectively. Function mlx4_do_bond() is adjusted to walk over the adev array and re-adds a specific auxiliary device if its driver sets the MLX4_INTFF_BONDING flag. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Pavlu authored
The mlx4_core driver has a logic that allows a sub-driver to set the MLX4_INTFF_BONDING flag which then causes that function mlx4_do_bond() asks the sub-driver to fully re-probe a device when its bonding configuration changes. Performing this operation is disallowed in mlx4_register_interface() when it is detected that any mlx4 device is multifunction (SRIOV). The code then resets MLX4_INTFF_BONDING in the driver flags. Move this check directly into mlx4_do_bond(). It provides a better separation as mlx4_core no longer directly modifies the sub-driver flags and it will allow to get rid of explicitly keeping track of all mlx4 devices by the intf.c code when it is switched to an auxiliary bus. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-