- 10 Apr, 2024 6 commits
-
-
Jakub Kicinski authored
Eric Dumazet says: ==================== bonding: remove RTNL from three sysfs files First patch might fix a potential deadlock. sysfs handlers should use rtnl_trylock() instead of rtnl_lock(). Following files can be read without acquiring RTNL : - /sys/class/net/bonding_masters - /sys/class/net/<name>/bonding/slaves - /sys/class/net/<name>/bonding/queue_id ==================== Link: https://lore.kernel.org/r/20240408190437.2214473-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Annotate lockless reads of slave->queue_id. Annotate writes of slave->queue_id. Switch bonding_show_queue_id() to rcu_read_lock() and bond_for_each_slave_rcu(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-4-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
Slave devices are already RCU protected, simply switch to bond_for_each_slave_rcu(), Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-3-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
netdev structures are already RCU protected. Change bond_init() and bond_uninit() to use RCU enabled list_add_tail_rcu() and list_del_rcu(). Then bonding_show_bonds() can use rcu_read_lock() while iterating through bn->dev_list. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240408190437.2214473-2-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Kuan-Wei Chiu authored
When constructing a heap, heapify operations are required on all non-leaf nodes. Thus, determining the index of the first non-leaf node is crucial. In a heap, the left child's index of node i is 2 * i + 1 and the right child's index is 2 * i + 2. Node CAKE_MAX_TINS * CAKE_QUEUES / 2 has its left and right children at indexes CAKE_MAX_TINS * CAKE_QUEUES + 1 and CAKE_MAX_TINS * CAKE_QUEUES + 2, respectively, which are beyond the heap's range, indicating it as a leaf node. Conversely, node CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1 has a left child at index CAKE_MAX_TINS * CAKE_QUEUES - 1, confirming its non-leaf status. The loop should start from it since it's not a leaf node. By starting the loop from CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1, we minimize function calls and branch condition evaluations. This adjustment theoretically reduces two function calls (one for cake_heapify() and another for cake_heap_get_backlog()) and five branch evaluations (one for iterating all non-leaf nodes, one within cake_heapify()'s while loop, and three more within the while loop with if conditions). Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/20240408174716.751069-1-visitorckw@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Asbjørn Sloth Tønnesen authored
Replace netdev_{warn,err} with NL_SET_ERR_MSG_{FMT_,}MOD to better inform the user about the problem. Only compile-tested, no access to HW. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Link: https://lore.kernel.org/r/20240408165506.94483-1-ast@fiberby.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 09 Apr, 2024 24 commits
-
-
Catalin Popescu authored
Unlike other ethernet PHYs from TI, PHY dp8382x has WOL enabled at reset. The driver explicitly disables WOL in config_init callback which is called during init and during resume from suspend. Hence, WOL is unconditionally disabled during resume, even if it was enabled before the suspend. We make sure that WOL configuration is persistent across suspends. Signed-off-by: Catalin Popescu <catalin.popescu@leica-geosystems.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240408082602.3654090-1-catalin.popescu@leica-geosystems.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Paolo Abeni authored
Horatiu Vultur says: ==================== net: phy: micrel: lan8814: Enable PTP_PF_PEROUT Add support for PTP_PF_PEROUT to lan8814. First patch just enables the LTC at probe time, such that it is not required to enable timestamping to have the LTC enabled. While the second patch actually adds support for PTP_PF_PEROUT. ==================== Link: https://lore.kernel.org/r/20240408064432.3881636-1-horatiu.vultur@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Horatiu Vultur authored
Lan8814 has 24 GPIOs but only 2 GPIOs (GPIO 0 and GPIO 1) can be configured to generate period signals. And there are 2 events (EVENT_A and EVENT_B) but these events are hardcoded to the GPIO 0 and GPIO 1. These events are used to generate period signals. It is possible to configure the length, the start time and the period of the signal by configuring the event. These events are generated by comparing the target time with the PHC time. In case the PHC time is changed to a value bigger than the target time + reload time, then it would generate only 1 event and then it would stop because target time + reload time is smaller than PHC time. Therefore it is required to change also the target time every time when the PHC is changed. The same will apply also when the PHC time is changed to a smaller value. This was tested using: testptp -i 1 -L 1,2 testptp -i 1 -p 1000000000 -w 200000000 Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Divya Koppera <divya.koppera@microchip.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Horatiu Vultur authored
The LTC for lan8814 was enabled only if timestamping was enabled, otherwise it would be stopped. Meaning that LTC will not increase by itself. This might break other features that don't required timestamping like generating 1PPS. Therefore enable the LTC at probe time. Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Sascha Hauer authored
The dwmac supports specifying the RGMII clock delays, but it is recommended to use rgmii-id and to specify the delays in the phy node instead [1]. Change the example accordingly to no longer promote this undesired setting. [1] https://lore.kernel.org/all/1a0de7b4-f0f7-4080-ae48-f5ffa9e76be3@lunn.ch/Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Dragan Simic <dsimic@manjaro.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240408-rockchip-dwmac-rgmii-id-binding-v1-1-3886d1a8bd54@pengutronix.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Eric Dumazet says: ==================== tcp: fix ISN selection in TIMEWAIT -> SYN_RECV TCP can transform a TIMEWAIT socket into a SYN_RECV one from a SYN packet, and the ISN of the SYNACK packet is normally generated using TIMEWAIT tw_snd_nxt. This SYN packet also bypasses normal checks against listen queue being full or not. Unfortunately this has been broken almost one decade ago. This series fixes the issue, in two patches. First patch refactors code to add tcp_tw_isn as a parameter to ->route_req(), to make the second patch smaller. Second patch fixes the issue, by no longer using TCP_SKB_CB(skb) to store the tcp_tw_isn. Following packetdrill test passes after this series: // Set up a server listening socket. 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 // Establish connection +0 < S 0:0(0) win 32792 <mss 1460,nop,nop,sackOK> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK> +.01 < . 1:1(0) ack 1 win 32792 +0 accept(3, ..., ...) = 4 // We close(), send a FIN, and get an ACK and FIN, in order to get into TIME_WAIT. +.01 close(4) = 0 +0 > F. 1:1(0) ack 1 +.01 < F. 1:1(0) ack 2 win 32792 +0 > . 2:2(0) ack 2 // SYN hitting a TIME_WAIT -> should use an ISN based on TIMEWAIT tw_snd_nxt +.01 < S 1000:1000(0) win 65535 <mss 1460,nop,nop,sackOK> +0 > S. 65539:65539(0) ack 1001 <mss 1460,nop,nop,sackOK> ==================== Link: https://lore.kernel.org/r/20240407093322.3172088-1-edumazet@google.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Eric Dumazet authored
TCP can transform a TIMEWAIT socket into a SYN_RECV one from a SYN packet, and the ISN of the SYNACK packet is normally generated using TIMEWAIT tw_snd_nxt : tcp_timewait_state_process() ... u32 isn = tcptw->tw_snd_nxt + 65535 + 2; if (isn == 0) isn++; TCP_SKB_CB(skb)->tcp_tw_isn = isn; return TCP_TW_SYN; This SYN packet also bypasses normal checks against listen queue being full or not. tcp_conn_request() ... __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn; ... /* TW buckets are converted to open requests without * limitations, they conserve resources and peer is * evidently real one. */ if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) { want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name); if (!want_cookie) goto drop; } This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb. Unfortunately this field has been accidentally cleared after the call to tcp_timewait_state_process() returning TCP_TW_SYN. Using a field in TCP_SKB_CB(skb) for a temporary state is overkill. Switch instead to a per-cpu variable. As a bonus, we do not have to clear tcp_tw_isn in TCP receive fast path. It is temporarily set then cleared only in the TCP_TW_SYN dance. Fixes: 4ad19de8 ("net: tcp6: fix double call of tcp_v6_fill_cb()") Fixes: eeea10b8 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Eric Dumazet authored
tcp_v6_init_req() reads TCP_SKB_CB(skb)->tcp_tw_isn to find out if the request socket is created by a SYN hitting a TIMEWAIT socket. This has been buggy for a decade, lets directly pass the information from tcp_conn_request(). This is a preparatory patch to make the following one easier to review. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Daniel Machon says: ==================== Add support for flower actions mirred and redirect This series adds support for the two tc flower actions mirred and redirect. Both actions are implemented by means of a port mask and a mask mode. The mask mode controls how the mask is applied, and together they are used by the switch to make a forwarding decision. Both actions are configurable via the IS0 or IS2 VCAP's (ingress stage 0 and 2, respectively). Patch #1: adds support for tc flower mirred action. Patch #2: adds support for tc flower redirect action. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> ==================== Link: https://lore.kernel.org/r/20240405-mirror-redirect-actions-v2-0-875d4c1927c8@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Daniel Machon authored
Add support for the flower redirect action. Two VCAP actions are encoded in the rule - one for the port mask, and one for the port mask mode. When the rule is hit, the port mask is used as the final destination set, replacing all other port masks. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Daniel Machon authored
Add support for tc flower mirred action. Two VCAP actions are encoded in the rule - one for the port mask, and one for the port mask mode. When the rule is hit, the destination mask is OR'ed with the port mask. Also add new VCAP function for supporting 72-bit wide actions, and a tc helper for setting the port forwarding mask. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Paolo Abeni authored
Diogo Ivo says: ==================== Support ICSSG-based Ethernet on AM65x SR1.0 devices This series extends the current ICSSG-based Ethernet driver to support AM65x Silicon Revision 1.0 devices. Notable differences between the Silicon Revisions are that there is no TX core in SR1.0 with this being handled by the firmware, requiring extra DMA channels to manage communication with the firmware (with the firmware being different as well) and in the packet classifier. The motivation behind it is that a significant number of Siemens devices containing SR1.0 silicon have been deployed in the field and need to be supported and updated to newer kernel versions without losing functionality. This series is based on TI's 5.10 SDK [1]. The fifth version of this patch series can be found in [2]. Compared to the last version of the patch set there are only changes in patch 05/10, where the fields of a struct are now explicitly declared as __le32 so that we can properly interpret them. Both of the problems mentioned in v4 have been addressed by disabling those functionalities, meaning that this driver currently only supports one TX queue and does not support a 100Mbit/s half-duplex connection. The removal of these features has been commented in the appropriate locations in the code. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y [2]: https://lore.kernel.org/netdev/20240326110709.26165-1-diogo.ivo@siemens.com/ ==================== Link: https://lore.kernel.org/r/20240403104821.283832-1-diogo.ivo@siemens.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Add the PRUeth driver for the ICSSG subsystem found in AM65x SR1.0 devices. The main differences that set SR1.0 and SR2.0 apart are the missing TXPRU core in SR1.0, two extra DMA channels for management purposes and different firmware that needs to be configured accordingly. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Some parts of the logic differ only slightly between Silicon Revisions. In these cases add the bits that differ to a common function that executes those bits conditionally based on the Silicon Revision. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Add the functions to configure the SR1.0 packet classifier. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
As SR1.0 uses the current higher priority channel to send commands to the firmware, take this into account when setting/getting the number of channels to/from the user. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Correctly adjust the IPG based on the Silicon Revision. Based on the work of Roger Quadros, Vignesh Raghavendra and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Add a field to distinguish between SR1.0 and SR2.0 in the driver as well as the necessary structures to program SR1.0. Based on the work of Roger Quadros in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Define the firmware configuration structure and commands needed to communicate with SR1.0 firmware, as well as SR1.0 buffer information where it differs from SR2.0. Based on the work of Roger Quadros, Murali Karicheri and Grygorii Strashko in TI's 5.10 SDK [1]. [1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
In order to allow code sharing between Silicon Revisions 1.0 and 2.0 move all functions that can be shared into a common file. This commit introduces no functional changes. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
As these addresses can be useful outside of checking if an address is a multicast address (for example in device drivers) make them accessible to users of etherdevice.h to avoid code duplication. Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Diogo Ivo authored
Silicon Revision 1.0 of the AM65x came with a slightly different ICSSG support: Only 2 PRUs per slice are available and instead 2 additional DMA channels are used for management purposes. We have no restrictions on specified PRUs, but the DMA channels need to be adjusted. Co-developed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: MD Danish Anwar <danishanwar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Dan Carpenter authored
These error paths accidentally return "ret" which is zero/success instead of the correct error code. Fixes: 71e79430 ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/7ef2e230-dfb7-4a77-8973-9e5be1a99fc2@moroto.mountainSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Allen Pais authored
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This patch converts drivers/net/archnet/* from tasklet to BH workqueue. Based on the work done by Tejun Heo <tj@kernel.org> Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10 Signed-off-by: Allen Pais <allen.lkml@gmail.com> Link: https://lore.kernel.org/r/20240403162306.20258-1-apais@linux.microsoft.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 08 Apr, 2024 10 commits
-
-
Heiner Kallweit authored
A user reported an unknown chip version. According to the r8168 vendor driver it's called RTL8168M, but handling is identical to RTL8168H. So let's simply treat it as RTL8168H. Tested-by: Евгений <octobergun@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Parav Pandit says: ==================== devlink: Add port function attribute for IO EQs Currently, PCI SFs and VFs use IO event queues to deliver netdev per channel events. The number of netdev channels is a function of IO event queues. In the second scenario of an RDMA device, the completion vectors are also a function of IO event queues. Currently, an administrator on the hypervisor has no means to provision the number of IO event queues for the SF device or the VF device. Device/firmware determines some arbitrary value for these IO event queues. Due to this, the SF netdev channels are unpredictable, and consequently, the performance is too. This short series introduces a new port function attribute: max_io_eqs. The goal is to provide administrators at the hypervisor level with the ability to provision the maximum number of IO event queues for a function. This gives the control to the administrator to provision right number of IO event queues and have predictable performance. Examples of when an administrator provisions (set) maximum number of IO event queues when using switchdev mode: $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10 $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20 $ devlink port show pci/0000:06:00.0/1 pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0 function: hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20 This sets the corresponding maximum IO event queues of the function before it is enumerated. Thus, when the VF/SF driver reads the capability from the device, it sees the value provisioned by the hypervisor. The driver is then able to configure the number of channels for the net device, as well as the number of completion vectors for the RDMA device. The device/firmware also honors the provisioned value, hence any VF/SF driver attempting to create IO EQs beyond provisioned value results in an error. With above setting now, the administrator is able to achieve the 2x performance on SFs with 20 channels. In second example when SF was provisioned for a container with 2 cpus, the administrator provisioned only 2 IO event queues, thereby saving device resources. With the above settings now in place, the administrator achieved 2x performance with the SF device with 20 channels. In the second example, when the SF was provisioned for a container with 2 CPUs, the administrator provisioned only 2 IO event queues, thereby saving device resources. changelog: v2->v3: - limited to 80 chars per line in devlink - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock on error path v1->v2: - limited comment to 80 chars per line in header file - fixed set function variables for reverse christmas tree - fixed comments from Kalesh - fixed missing kfree in get call - returning error code for get cmd failure - fixed error msg copy paste error in set on cmd failure ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parav Pandit authored
Implement get and set for the maximum IO event queues for SF and VF. This enables administrator on the hypervisor to control the maximum IO event queues which are typically used to derive the maximum and default number of net device channels or rdma device completion vectors. Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Parav Pandit authored
Many devices send event notifications for the IO queues, such as tx and rx queues, through event queues. Enable a privileged owner, such as a hypervisor PF, to set the number of IO event queues for the VF and SF during the provisioning stage. example: Get maximum IO event queues of the VF device:: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10 Set maximum IO event queues of the VF device:: $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32 Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Print these additional fields in skb_dump() to ease debugging. - mac_len - csum_start (in v2, at Willem suggestion) - csum_offset (in v2, at Willem suggestion) - priority - mark - alloc_cpu - vlan_all - encapsulation - inner_protocol - inner_mac_header - inner_network_header - inner_transport_header Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Andrew Lunn says: ==================== net: Clean up some EEE code Previous patches have reworked the API between phylib and MAC drivers with respect to EEE, pushing most of the work into phylib. These two patches rework two drivers to make use of the new API, and fix their EEE implementation, so that EEE is configured in the MAC based on what is actually negotiated during autoneg. Compile tested only. ==================== Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The enabling/disabling of EEE in the MAC should happen as a result of auto negotiation. So move the enable/disable into lan743x_phy_link_status_change() which gets called by phylib when there is a change in link status. lan743x_ethtool_set_eee() now just programs the hardware with the LTI timer value, and passed everything else to phylib, so it can correctly setup the PHY. lan743x_ethtool_get_eee() relies on phylib doing most of the work, the MAC driver just adds the LTI timer value. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Andrew Lunn authored
The enabling/disabling of EEE in the MAC should happen as a result of auto negotiation. So move the enable/disable into lan783xx_phy_link_status_change() which gets called by phylib when there is a change in link status. lan78xx_set_eee() now just programs the hardware with the LPI timer value, and passed everything else to phylib, so it can correctly setup the PHY. lan743x_get_eee() relies on phylib doing most of the work, the MAC driver just adds the LPI timer value. Call phy_support_eee() to indicate the MAC does actually support EEE. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jason Xing authored
The reason codes are handled in two ways nowadays (quoting Mat Martineau): 1. Sending in the MPTCP option on RST packets when there is no subflow context available (these use subflow_add_reset_reason() and directly call a TCP-level send_reset function) 2. The "normal" way via subflow->reset_reason. This will propagate to both the outgoing reset packet and to a local path manager process via netlink in mptcp_event_sub_closed() RFC 8684 defines the skb reset reason behaviour which is not required even though in some places: A host sends a TCP RST in order to close a subflow or reject an attempt to open a subflow (MP_JOIN). In order to let the receiving host know why a subflow is being closed or rejected, the TCP RST packet MAY include the MP_TCPRST option (Figure 15). The host MAY use this information to decide, for example, whether it tries to re-establish the subflow immediately, later, or never. Since the commit dc87efdb ("mptcp: add mptcp reset option support") introduced this feature about three years ago, we can fully use it. There remains some places where we could insert reason into skb as we can see in this patch. Many thanks to Mat and Paolo for help:) Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Guillaume Nault authored
Add a "scope" parameter to ip_route_output() so that callers don't have to override the tos parameter with the RTO_ONLINK flag if they want a local scope. This will allow converting flowi4_tos to dscp_t in the future, thus allowing static analysers to flag invalid interactions between "tos" (the DSCP bits) and ECN. Only three users ask for local scope (bonding, arp and atm). The others continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn users about the limitations of ip_route_output(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband Signed-off-by: David S. Miller <davem@davemloft.net>
-