- 10 Apr, 2024 11 commits
-
-
Jianbo Liu authored
The filter counter is updated under the protection of cb_lock in the cited commit. While waiting for the lock, it's possible the filter is being deleted by other thread, and thus causes UAF when dump it. Fix this issue by moving tcf_block_filter_cnt_update() after tfilter_put(). ================================================================== BUG: KASAN: slab-use-after-free in fl_dump_key+0x1d3e/0x20d0 [cls_flower] Read of size 4 at addr ffff88814f864000 by task tc/2973 CPU: 7 PID: 2973 Comm: tc Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_02_12_41 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x7e/0xc0 print_report+0xc1/0x600 ? __virt_addr_valid+0x1cf/0x390 ? fl_dump_key+0x1d3e/0x20d0 [cls_flower] ? fl_dump_key+0x1d3e/0x20d0 [cls_flower] kasan_report+0xb9/0xf0 ? fl_dump_key+0x1d3e/0x20d0 [cls_flower] fl_dump_key+0x1d3e/0x20d0 [cls_flower] ? lock_acquire+0x1c2/0x530 ? fl_dump+0x172/0x5c0 [cls_flower] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? fl_dump_key_options.part.0+0x10f0/0x10f0 [cls_flower] ? do_raw_spin_lock+0x12d/0x270 ? spin_bug+0x1d0/0x1d0 fl_dump+0x21d/0x5c0 [cls_flower] ? fl_tmplt_dump+0x1f0/0x1f0 [cls_flower] ? nla_put+0x15f/0x1c0 tcf_fill_node+0x51b/0x9a0 ? tc_skb_ext_tc_enable+0x150/0x150 ? __alloc_skb+0x17b/0x310 ? __build_skb_around+0x340/0x340 ? down_write+0x1b0/0x1e0 tfilter_notify+0x1a5/0x390 ? fl_terse_dump+0x400/0x400 [cls_flower] tc_new_tfilter+0x963/0x2170 ? tc_del_tfilter+0x1490/0x1490 ? print_usage_bug.part.0+0x670/0x670 ? lock_downgrade+0x680/0x680 ? security_capable+0x51/0x90 ? tc_del_tfilter+0x1490/0x1490 rtnetlink_rcv_msg+0x75e/0xac0 ? if_nlmsg_stats_size+0x4c0/0x4c0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? __netlink_lookup+0x35e/0x6e0 netlink_rcv_skb+0x12c/0x360 ? if_nlmsg_stats_size+0x4c0/0x4c0 ? netlink_ack+0x15e0/0x15e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? netlink_deliver_tap+0xcd/0xa60 ? netlink_deliver_tap+0xcd/0xa60 ? netlink_deliver_tap+0x1c9/0xa60 netlink_unicast+0x43e/0x700 ? netlink_attachskb+0x750/0x750 ? lock_acquire+0x1c2/0x530 ? __might_fault+0xbb/0x170 netlink_sendmsg+0x749/0xc10 ? netlink_unicast+0x700/0x700 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x700/0x700 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x534/0x6b0 ? import_iovec+0x7/0x10 ? kernel_sendmsg+0x30/0x30 ? __copy_msghdr+0x3c0/0x3c0 ? entry_SYSCALL_64_after_hwframe+0x46/0x4e ? lock_acquire+0x1c2/0x530 ? __virt_addr_valid+0x116/0x390 ___sys_sendmsg+0xeb/0x170 ? __virt_addr_valid+0x1ca/0x390 ? copy_msghdr_from_user+0x110/0x110 ? __delete_object+0xb8/0x100 ? __virt_addr_valid+0x1cf/0x390 ? do_sys_openat2+0x102/0x150 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? do_sys_openat2+0x102/0x150 ? __fget_light+0x53/0x1d0 ? sockfd_lookup_light+0x1a/0x150 __sys_sendmsg+0xb5/0x140 ? __sys_sendmsg_sock+0x20/0x20 ? lock_downgrade+0x680/0x680 do_syscall_64+0x70/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7f98e3713367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffc74a64608 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000000000047eae0 RCX: 00007f98e3713367 RDX: 0000000000000000 RSI: 00007ffc74a64670 RDI: 0000000000000003 RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f98e360c5e8 R11: 0000000000000246 R12: 00007ffc74a6a508 R13: 00000000660d518d R14: 0000000000484a80 R15: 00007ffc74a6a50b </TASK> Allocated by task 2973: kasan_save_stack+0x20/0x40 kasan_save_track+0x10/0x30 __kasan_kmalloc+0x77/0x90 fl_change+0x27a6/0x4540 [cls_flower] tc_new_tfilter+0x879/0x2170 rtnetlink_rcv_msg+0x75e/0xac0 netlink_rcv_skb+0x12c/0x360 netlink_unicast+0x43e/0x700 netlink_sendmsg+0x749/0xc10 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x534/0x6b0 ___sys_sendmsg+0xeb/0x170 __sys_sendmsg+0xb5/0x140 do_syscall_64+0x70/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e Freed by task 283: kasan_save_stack+0x20/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x37/0x50 poison_slab_object+0x105/0x190 __kasan_slab_free+0x11/0x30 kfree+0x111/0x340 process_one_work+0x787/0x1490 worker_thread+0x586/0xd30 kthread+0x2df/0x3b0 ret_from_fork+0x2d/0x70 ret_from_fork_asm+0x11/0x20 Last potentially related work creation: kasan_save_stack+0x20/0x40 __kasan_record_aux_stack+0x9b/0xb0 insert_work+0x25/0x1b0 __queue_work+0x640/0xc90 rcu_work_rcufn+0x42/0x70 rcu_core+0x6a9/0x1850 __do_softirq+0x264/0x88f Second to last potentially related work creation: kasan_save_stack+0x20/0x40 __kasan_record_aux_stack+0x9b/0xb0 __call_rcu_common.constprop.0+0x6f/0xac0 queue_rcu_work+0x56/0x70 fl_mask_put+0x20d/0x270 [cls_flower] __fl_delete+0x352/0x6b0 [cls_flower] fl_delete+0x97/0x160 [cls_flower] tc_del_tfilter+0x7d1/0x1490 rtnetlink_rcv_msg+0x75e/0xac0 netlink_rcv_skb+0x12c/0x360 netlink_unicast+0x43e/0x700 netlink_sendmsg+0x749/0xc10 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x534/0x6b0 ___sys_sendmsg+0xeb/0x170 __sys_sendmsg+0xb5/0x140 do_syscall_64+0x70/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e Fixes: 2081fd34 ("net: sched: cls_api: add filter counter") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Tested-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Mina Almasry says: ==================== Minor cleanups to skb frag ref/unref (part) This series is largely motivated by a recent discussion where there was some confusion on how to properly ref/unref pp pages vs non pp pages: https://lore.kernel.org/netdev/CAHS8izOoO-EovwMwAm9tLYetwikNPxC0FKyVGu1TPJWSz4bGoA@mail.gmail.com/T/#t There is some subtely there because pp uses page->pp_ref_count for refcounting, while non-pp uses get_page()/put_page() for ref counting. Getting the refcounting pairs wrong can lead to kernel crash. [...] https://lore.kernel.org/lkml/CAHS8izN436pn3SndrzsCyhmqvJHLyxgCeDpWXA4r1ANt3RCDLQ@mail.gmail.com/T/ ==================== Link: https://lore.kernel.org/r/20240408153000.2152844-1-almasrymina@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
With the changes in the last patches, napi_frag_unref() is now reduandant. Remove it and use skb_page_unref directly. Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240408153000.2152844-4-almasrymina@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Mina Almasry authored
The implementations of these 2 functions are almost identical. Remove the implementation of napi_frag_unref, and make it a call into skb_page_unref so we don't duplicate the implementation. Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240408153000.2152844-2-almasrymina@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski authored
Tony Nguyen says: ==================== net/e1000e, igb, igc: Remove redundant runtime resume Bjorn Helgaas says: e1000e, igb, and igc all have code to runtime resume the device during ethtool operations. Since f32a2137 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops"), dev_ethtool() does this for us, so remove it from the individual drivers. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Remove redundant runtime resume for ethtool ops igb: Remove redundant runtime resume for ethtool_ops e1000e: Remove redundant runtime resume for ethtool_ops ==================== Link: https://lore.kernel.org/r/20240408210849.3641172-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Eric Dumazet says: ==================== bonding: remove RTNL from three sysfs files First patch might fix a potential deadlock. sysfs handlers should use rtnl_trylock() instead of rtnl_lock(). Following files can be read without acquiring RTNL : - /sys/class/net/bonding_masters - /sys/class/net/<name>/bonding/slaves - /sys/class/net/<name>/bonding/queue_id ==================== Link: https://lore.kernel.org/r/20240408190437.2214473-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Annotate lockless reads of slave->queue_id. Annotate writes of slave->queue_id. Switch bonding_show_queue_id() to rcu_read_lock() and bond_for_each_slave_rcu(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-4-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Slave devices are already RCU protected, simply switch to bond_for_each_slave_rcu(), Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-3-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
netdev structures are already RCU protected. Change bond_init() and bond_uninit() to use RCU enabled list_add_tail_rcu() and list_del_rcu(). Then bonding_show_bonds() can use rcu_read_lock() while iterating through bn->dev_list. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-2-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kuan-Wei Chiu authored
When constructing a heap, heapify operations are required on all non-leaf nodes. Thus, determining the index of the first non-leaf node is crucial. In a heap, the left child's index of node i is 2 * i + 1 and the right child's index is 2 * i + 2. Node CAKE_MAX_TINS * CAKE_QUEUES / 2 has its left and right children at indexes CAKE_MAX_TINS * CAKE_QUEUES + 1 and CAKE_MAX_TINS * CAKE_QUEUES + 2, respectively, which are beyond the heap's range, indicating it as a leaf node. Conversely, node CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1 has a left child at index CAKE_MAX_TINS * CAKE_QUEUES - 1, confirming its non-leaf status. The loop should start from it since it's not a leaf node. By starting the loop from CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1, we minimize function calls and branch condition evaluations. This adjustment theoretically reduces two function calls (one for cake_heapify() and another for cake_heap_get_backlog()) and five branch evaluations (one for iterating all non-leaf nodes, one within cake_heapify()'s while loop, and three more within the while loop with if conditions). Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20240408174716.751069-1-visitorckw@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Asbjørn Sloth Tønnesen authored
Replace netdev_{warn,err} with NL_SET_ERR_MSG_{FMT_,}MOD to better inform the user about the problem. Only compile-tested, no access to HW. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Link: https://lore.kernel.org/r/20240408165506.94483-1-ast@fiberby.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 09 Apr, 2024 24 commits
-
-
Catalin Popescu authored
Unlike other ethernet PHYs from TI, PHY dp8382x has WOL enabled at reset. The driver explicitly disables WOL in config_init callback which is called during init and during resume from suspend. Hence, WOL is unconditionally disabled during resume, even if it was enabled before the suspend. We make sure that WOL configuration is persistent across suspends. Signed-off-by: Catalin Popescu <catalin.popescu@leica-geosystems.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240408082602.3654090-1-catalin.popescu@leica-geosystems.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Horatiu Vultur says: ==================== net: phy: micrel: lan8814: Enable PTP_PF_PEROUT Add support for PTP_PF_PEROUT to lan8814. First patch just enables the LTC at probe time, such that it is not required to enable timestamping to have the LTC enabled. While the second patch actually adds support for PTP_PF_PEROUT. ==================== Link: https://lore.kernel.org/r/20240408064432.3881636-1-horatiu.vultur@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Horatiu Vultur authored
Lan8814 has 24 GPIOs but only 2 GPIOs (GPIO 0 and GPIO 1) can be configured to generate period signals. And there are 2 events (EVENT_A and EVENT_B) but these events are hardcoded to the GPIO 0 and GPIO 1. These events are used to generate period signals. It is possible to configure the length, the start time and the period of the signal by configuring the event. These events are generated by comparing the target time with the PHC time. In case the PHC time is changed to a value bigger than the target time + reload time, then it would generate only 1 event and then it would stop because target time + reload time is smaller than PHC time. Therefore it is required to change also the target time every time when the PHC is changed. The same will apply also when the PHC time is changed to a smaller value. This was tested using: testptp -i 1 -L 1,2 testptp -i 1 -p 1000000000 -w 200000000 Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Divya Koppera <divya.koppera@microchip.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Horatiu Vultur authored
The LTC for lan8814 was enabled only if timestamping was enabled, otherwise it would be stopped. Meaning that LTC will not increase by itself. This might break other features that don't required timestamping like generating 1PPS. Therefore enable the LTC at probe time. Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Sascha Hauer authored
The dwmac supports specifying the RGMII clock delays, but it is recommended to use rgmii-id and to specify the delays in the phy node instead [1]. Change the example accordingly to no longer promote this undesired setting. [1] https://lore.kernel.org/all/1a0de7b4-f0f7-4080-ae48-f5ffa9e76be3@lunn.ch/Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Dragan Simic <dsimic@manjaro.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240408-rockchip-dwmac-rgmii-id-binding-v1-1-3886d1a8bd54@pengutronix.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Eric Dumazet says: ==================== tcp: fix ISN selection in TIMEWAIT -> SYN_RECV TCP can transform a TIMEWAIT socket into a SYN_RECV one from a SYN packet, and the ISN of the SYNACK packet is normally generated using TIMEWAIT tw_snd_nxt. This SYN packet also bypasses normal checks against listen queue being full or not. Unfortunately this has been broken almost one decade ago. This series fixes the issue, in two patches. First patch refactors code to add tcp_tw_isn as a parameter to ->route_req(), to make the second patch smaller. Second patch fixes the issue, by no longer using TCP_SKB_CB(skb) to store the tcp_tw_isn. Following packetdrill test passes after this series: // Set up a server listening socket. 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // Establish connection +0 < S 0:0(0) win 32792 <mss 1460,nop,nop,sackOK> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK> +.01 < . 1:1(0) ack 1 win 32792 +0 accept(3, ..., ...) = 4 // We close(), send a FIN, and get an ACK and FIN, in order to get into TIME_WAIT. +.01 close(4) = 0 +0 > F. 1:1(0) ack 1 +.01 < F. 1:1(0) ack 2 win 32792 +0 > . 2:2(0) ack 2 // SYN hitting a TIME_WAIT -> should use an ISN based on TIMEWAIT tw_snd_nxt +.01 < S 1000:1000(0) win 65535 <mss 1460,nop,nop,sackOK> +0 > S. 65539:65539(0) ack 1001 <mss 1460,nop,nop,sackOK> ==================== Link: https://lore.kernel.org/r/20240407093322.3172088-1-edumazet@google.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Eric Dumazet authored
TCP can transform a TIMEWAIT socket into a SYN_RECV one from a SYN packet, and the ISN of the SYNACK packet is normally generated using TIMEWAIT tw_snd_nxt : tcp_timewait_state_process() ... u32 isn = tcptw->tw_snd_nxt + 65535 + 2; if (isn == 0) isn++; TCP_SKB_CB(skb)->tcp_tw_isn = isn; return TCP_TW_SYN; This SYN packet also bypasses normal checks against listen queue being full or not. tcp_conn_request() ... __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn; ... /* TW buckets are converted to open requests without * limitations, they conserve resources and peer is * evidently real one. */ if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) { want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name); if (!want_cookie) goto drop; } This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb. Unfortunately this field has been accidentally cleared after the call to tcp_timewait_state_process() returning TCP_TW_SYN. Using a field in TCP_SKB_CB(skb) for a temporary state is overkill. Switch instead to a per-cpu variable. As a bonus, we do not have to clear tcp_tw_isn in TCP receive fast path. It is temporarily set then cleared only in the TCP_TW_SYN dance. Fixes: 4ad19de8 ("net: tcp6: fix double call of tcp_v6_fill_cb()") Fixes: eeea10b8 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Eric Dumazet authored
tcp_v6_init_req() reads TCP_SKB_CB(skb)->tcp_tw_isn to find out if the request socket is created by a SYN hitting a TIMEWAIT socket. This has been buggy for a decade, lets directly pass the information from tcp_conn_request(). This is a preparatory patch to make the following one easier to review. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Daniel Machon says: ==================== Add support for flower actions mirred and redirect This series adds support for the two tc flower actions mirred and redirect. Both actions are implemented by means of a port mask and a mask mode. The mask mode controls how the mask is applied, and together they are used by the switch to make a forwarding decision. Both actions are configurable via the IS0 or IS2 VCAP's (ingress stage 0 and 2, respectively). Patch #1: adds support for tc flower mirred action. Patch #2: adds support for tc flower redirect action. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> ==================== Link: https://lore.kernel.org/r/20240405-mirror-redirect-actions-v2-0-875d4c1927c8@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Daniel Machon authored
Add support for the flower redirect action. Two VCAP actions are encoded in the rule - one for the port mask, and one for the port mask mode. When the rule is hit, the port mask is used as the final destination set, replacing all other port masks. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Daniel Machon authored
Add support for tc flower mirred action. Two VCAP actions are encoded in the rule - one for the port mask, and one for the port mask mode. When the rule is hit, the destination mask is OR'ed with the port mask. Also add new VCAP function for supporting 72-bit wide actions, and a tc helper for setting the port forwarding mask. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Diogo Ivo says: ==================== Support ICSSG-based Ethernet on AM65x SR1.0 devices This series extends the current ICSSG-based Ethernet driver to support AM65x Silicon Revision 1.0 devices. Notable differences between the Silicon Revisions are that there is no TX core in SR1.0 with this being handled by the firmware, requiring extra DMA channels to manage communication with the firmware (with the firmware being different as well) and in the packet classifier. The motivation behind it is that a significant number of Siemens devices containing SR1.0 silicon have been deployed in the field and need to be supported and updated to newer kernel versions without losing functionality. This series is based on TI's 5.10 SDK [1]. The fifth version of this patch series can be found in [2]. Compared to the last version of the patch set there are only changes in patch 05/10, where the fields of a struct are now explicitly declared as __le32 so that we can properly interpret them. Both of the problems mentioned in v4 have been addressed by disabling those functionalities, meaning that this driver currently only supports one TX queue and does not support a 100Mbit/s half-duplex connection. The removal of these features has been commented in the appropriate locations in the code. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y [2]: https://lore.kernel.org/netdev/20240326110709.26165-1-diogo.ivo@siemens.com/ ==================== Link: https://lore.kernel.org/r/20240403104821.283832-1-diogo.ivo@siemens.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Add the PRUeth driver for the ICSSG subsystem found in AM65x SR1.0 devices. The main differences that set SR1.0 and SR2.0 apart are the missing TXPRU core in SR1.0, two extra DMA channels for management purposes and different firmware that needs to be configured accordingly. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Some parts of the logic differ only slightly between Silicon Revisions. In these cases add the bits that differ to a common function that executes those bits conditionally based on the Silicon Revision. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Add the functions to configure the SR1.0 packet classifier. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
As SR1.0 uses the current higher priority channel to send commands to the firmware, take this into account when setting/getting the number of channels to/from the user. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Correctly adjust the IPG based on the Silicon Revision. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Add a field to distinguish between SR1.0 and SR2.0 in the driver as well as the necessary structures to program SR1.0. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Define the firmware configuration structure and commands needed to communicate with SR1.0 firmware, as well as SR1.0 buffer information where it differs from SR2.0. Based on the work of Roger Quadros, Murali Karicheri and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
In order to allow code sharing between Silicon Revisions 1.0 and 2.0 move all functions that can be shared into a common file. This commit introduces no functional changes. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
As these addresses can be useful outside of checking if an address is a multicast address (for example in device drivers) make them accessible to users of etherdevice.h to avoid code duplication. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Silicon Revision 1.0 of the AM65x came with a slightly different ICSSG support: Only 2 PRUs per slice are available and instead 2 additional DMA channels are used for management purposes. We have no restrictions on specified PRUs, but the DMA channels need to be adjusted. Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Dan Carpenter authored
These error paths accidentally return "ret" which is zero/success instead of the correct error code. Fixes: 71e79430 ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7ef2e230-dfb7-4a77-8973-9e5be1a99fc2@moroto.mountainSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Allen Pais authored
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts drivers/net/archnet/* from tasklet to BH workqueue. Based on the work done by Tejun Heo <tj@kernel.org> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20240403162306.20258-1-apais@linux.microsoft.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 08 Apr, 2024 5 commits
-
-
Bjorn Helgaas authored
8c5ad0da ("igc: Add ethtool support") added ethtool_ops.begin() and .complete(), which used pm_runtime_get_sync() to resume suspended devices before any ethtool_ops callback and allow suspend after it completed. Subsequently, f32a2137 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path, so the device is resumed before any ethtool_ops callback even if the driver didn't supply a .begin() callback. Remove the .begin() and .complete() callbacks, which are now redundant because dev_ethtool() already resumes the device. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Bjorn Helgaas authored
749ab2cd ("igb: add basic runtime PM support") added ethtool_ops.begin() and .complete(), which used pm_runtime_get_sync() to resume suspended devices before any ethtool_ops callback and allow suspend after it completed. Subsequently, f32a2137 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path, so the device is resumed before any ethtool_ops callback even if the driver didn't supply a .begin() callback. Remove the .begin() and .complete() callbacks, which are now redundant because dev_ethtool() already resumes the device. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Bjorn Helgaas authored
e60b22c5 ("e1000e: fix accessing to suspended device") added ethtool_ops.begin() and .complete(), which used pm_runtime_get_sync() to resume suspended devices before any ethtool_ops callback and allow suspend after it completed. 3ef672ab ("e1000e: ethtool unnecessarily takes device out of RPM suspend") removed ethtool_ops.begin() and .complete() and instead did pm_runtime_get_sync() only in the individual ethtool_ops callbacks that access device registers. Subsequently, f32a2137 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops") added pm_runtime_get_sync() in the dev_ethtool() path, so the device is resumed before *any* ethtool_ops callback, as it was before 3ef672ab. Remove most runtime resumes from ethtool_ops, which are now redundant because the resume has already been done by dev_ethtool(). This is essentially a revert of 3ef672ab ("e1000e: ethtool unnecessarily takes device out of RPM suspend"). There are a couple subtleties: - Prior to 3ef672ab, the device was resumed only for the duration of a single ethtool callback. 3ef672ab changed e1000_set_phys_id() so the device was resumed for ETHTOOL_ID_ACTIVE and remained resumed until a subsequent callback for ETHTOOL_ID_INACTIVE. Preserve that part of 3ef672ab so the device will not be runtime suspended while in the ETHTOOL_ID_ACTIVE state. - 3ef672ab added "if (!pm_runtime_suspended())" in before reading the STATUS register in e1000_get_settings(). This was racy and is now unnecessary because dev_ethtool() has resumed the device already, so revert that. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-
Heiner Kallweit authored
A user reported an unknown chip version. According to the r8168 vendor driver it's called RTL8168M, but handling is identical to RTL8168H. So let's simply treat it as RTL8168H. Tested-by: Евгений <octobergun@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Parav Pandit says: ==================== devlink: Add port function attribute for IO EQs Currently, PCI SFs and VFs use IO event queues to deliver netdev per channel events. The number of netdev channels is a function of IO event queues. In the second scenario of an RDMA device, the completion vectors are also a function of IO event queues. Currently, an administrator on the hypervisor has no means to provision the number of IO event queues for the SF device or the VF device. Device/firmware determines some arbitrary value for these IO event queues. Due to this, the SF netdev channels are unpredictable, and consequently, the performance is too. This short series introduces a new port function attribute: max_io_eqs. The goal is to provide administrators at the hypervisor level with the ability to provision the maximum number of IO event queues for a function. This gives the control to the administrator to provision right number of IO event queues and have predictable performance. Examples of when an administrator provisions (set) maximum number of IO event queues when using switchdev mode: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10 $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20 $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20 This sets the corresponding maximum IO event queues of the function before it is enumerated. Thus, when the VF/SF driver reads the capability from the device, it sees the value provisioned by the hypervisor. The driver is then able to configure the number of channels for the net device, as well as the number of completion vectors for the RDMA device. The device/firmware also honors the provisioned value, hence any VF/SF driver attempting to create IO EQs beyond provisioned value results in an error. With above setting now, the administrator is able to achieve the 2x performance on SFs with 20 channels. In second example when SF was provisioned for a container with 2 cpus, the administrator provisioned only 2 IO event queues, thereby saving device resources. With the above settings now in place, the administrator achieved 2x performance with the SF device with 20 channels. In the second example, when the SF was provisioned for a container with 2 CPUs, the administrator provisioned only 2 IO event queues, thereby saving device resources. changelog: v2->v3: - limited to 80 chars per line in devlink - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock on error path v1->v2: - limited comment to 80 chars per line in header file - fixed set function variables for reverse christmas tree - fixed comments from Kalesh - fixed missing kfree in get call - returning error code for get cmd failure - fixed error msg copy paste error in set on cmd failure ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-