- 27 Aug, 2024 1 commit
Shradha Gupta authored
Currently the WQ sizes for the RX and TX queues of MANA devices are hardcoded to defaults. Allow these values to be configured as ringparam (get/set) settings through ethtool_ops. Pre-allocate buffers at the beginning of this operation to prevent complete network loss in low-memory conditions.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://patch.msgid.link/1724688461-12203-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
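
A hedged sketch of the ethtool_ops pair this change describes. The hook signatures are the current kernel API; the mana_* helpers, context fields, and the 8192 limits are assumptions, not the exact patch:

```c
#include <linux/ethtool.h>
#include <linux/netdevice.h>

static void mana_get_ringparam(struct net_device *ndev,
			       struct ethtool_ringparam *ring,
			       struct kernel_ethtool_ringparam *kring,
			       struct netlink_ext_ack *extack)
{
	struct mana_port_context *apc = netdev_priv(ndev);

	ring->rx_pending = apc->rx_queue_size;
	ring->tx_pending = apc->tx_queue_size;
	ring->rx_max_pending = 8192;	/* assumed HW limits */
	ring->tx_max_pending = 8192;
}

static int mana_set_ringparam(struct net_device *ndev,
			      struct ethtool_ringparam *ring,
			      struct kernel_ethtool_ringparam *kring,
			      struct netlink_ext_ack *extack)
{
	struct mana_port_context *apc = netdev_priv(ndev);
	int err;

	/* Pre-allocate buffers for the new size first: if this fails in a
	 * low-memory condition, the port keeps running on the old queues. */
	err = mana_pre_alloc_rxbufs(apc, ring->rx_pending);	/* assumed */
	if (err)
		return err;

	apc->rx_queue_size = ring->rx_pending;
	apc->tx_queue_size = ring->tx_pending;

	return mana_restart(apc);	/* assumed: re-create the queues */
}
```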
- 13 Aug, 2024 1 commit
Long Li authored
After napi_complete_done() is called when NAPI is polling in the current process context, another NAPI may be scheduled and start running in softirq on another CPU, and may ring the doorbell before the current CPU does. Combined with unnecessary rings when there is no need to arm the CQ, this triggers error paths in the hardware. Fix this by calling napi_complete_done() after the doorbell rings, and limit the number of unnecessary rings when arming is not needed. The MANA hardware specifies that there must be one doorbell ring every 8 CQ wraparounds; this driver guarantees one as soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical workloads, 4 CQ wraparounds proves to be big enough that the limit is rarely exceeded before all the NAPI weight is consumed. To implement this, add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is armed as soon as it passes 4 wraparounds of the CQ.

Cc: stable@vger.kernel.org
Fixes: e1b5683f ("net: mana: Move NAPI from EQ to CQ")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
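
A hedged sketch of the completion-path ordering described above; work_done_since_doorbell and the 4-wraparound threshold come from the commit text, while mana_gd_ring_cq(), SET_ARM_BIT, and the cq fields are assumed names:

```c
static void mana_cq_finish(struct mana_cq *cq, int budget, int w)
{
	cq->work_done_since_doorbell += w;

	if (w < budget) {
		/* Ring-and-arm *before* napi_complete_done(): once NAPI is
		 * marked complete, another CPU may start polling this CQ in
		 * softirq and must not ring the doorbell ahead of us. */
		mana_gd_ring_cq(cq->gdma_cq, SET_ARM_BIT);
		cq->work_done_since_doorbell = 0;
		napi_complete_done(&cq->napi, w);
	} else if (cq->work_done_since_doorbell > cq->num_cqes * 4) {
		/* HW requires one ring per 8 CQ wraparounds; ringing at 4
		 * leaves ample margin. No arming, since polling continues. */
		mana_gd_ring_cq(cq->gdma_cq, 0);
		cq->work_done_since_doorbell = 0;
	}
}
```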
- 12 Aug, 2024 1 commit
Haiyang Zhang authored
The MANA driver's RX buffer alloc_size is passed into napi_build_skb() to create the SKB. skb_shinfo(skb) is located at the end of the skb, and its alignment is affected by the alloc_size passed into napi_build_skb(). The size needs to be aligned properly for better performance and for atomic operations; otherwise, on ARM64 CPUs, for certain MTU settings such as 4000, atomic operations may panic on skb_shinfo(skb)->dataref due to an alignment fault. To fix this bug, add proper alignment to the alloc_size calculation.

Sample panic info:
[  253.298819] Unable to handle kernel paging request at virtual address ffff000129ba5cce
[  253.300900] Mem abort info:
[  253.301760]   ESR = 0x0000000096000021
[  253.302825]   EC = 0x25: DABT (current EL), IL = 32 bits
[  253.304268]   SET = 0, FnV = 0
[  253.305172]   EA = 0, S1PTW = 0
[  253.306103]   FSC = 0x21: alignment fault
Call trace:
 __skb_clone+0xfc/0x198
 skb_clone+0x78/0xe0
 raw6_local_deliver+0xfc/0x228
 ip6_protocol_deliver_rcu+0x80/0x500
 ip6_input_finish+0x48/0x80
 ip6_input+0x48/0xc0
 ip6_sublist_rcv_finish+0x50/0x78
 ip6_sublist_rcv+0x1cc/0x2b8
 ipv6_list_rcv+0x100/0x150
 __netif_receive_skb_list_core+0x180/0x220
 netif_receive_skb_list_internal+0x198/0x2a8
 __napi_poll+0x138/0x250
 net_rx_action+0x148/0x330
 handle_softirqs+0x12c/0x3a0

Cc: stable@vger.kernel.org
Fixes: 80f6215b ("net: mana: Add support for jumbo frame")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
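
A hedged sketch of the fix; MANA_RX_DATA_ALIGN, MANA_RXBUF_PAD, and the helper shape are illustrative, not the exact patch:

```c
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <linux/bpf.h>	/* XDP_PACKET_HEADROOM */

#define MANA_RX_DATA_ALIGN	64	/* assumed alignment unit */
#define MANA_RXBUF_PAD	(SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \
			 ETH_HLEN)	/* illustrative */

static void mana_get_rxbuf_cfg(int mtu, u32 *datasize, u32 *alloc_size,
			       u32 *headroom)
{
	*headroom = XDP_PACKET_HEADROOM;
	*datasize = mtu + ETH_HLEN;

	/* Align the total so that skb_shinfo(), which napi_build_skb()
	 * places right after the data area, lands on an aligned address;
	 * without this, e.g. mtu=4000 faults on ARM64 when atomic ops
	 * touch skb_shinfo(skb)->dataref. */
	*alloc_size = ALIGN(mtu + MANA_RXBUF_PAD + *headroom,
			    MANA_RX_DATA_ALIGN);
}
```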
- 14 Jul, 2024 1 commit
Konstantin Taranov authored
Add a mana_get_primary_netdev_rcu helper to get the primary netdevice for a given port. When mana is used with netvsc, the VF netdev is controlled by an upper netvsc device; in the bare-metal case, the VF netdev itself is the primary device. Use the mana_get_primary_netdev_rcu() helper in mana_ib to get the correct device for querying network states.

Fixes: 8b184e4f ("RDMA/mana_ib: Enable RoCE on port 1")
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://lore.kernel.org/r/1720705077-322-1-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
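
A hedged sketch of the helper's intent; netdev_master_upper_dev_get_rcu() is the real kernel API, while the mana_context layout (num_ports, ports[]) is an assumption:

```c
#include <linux/netdevice.h>
#include <linux/rcupdate.h>

/* Caller holds rcu_read_lock(). */
struct net_device *mana_get_primary_netdev_rcu(struct mana_context *ac,
					       u32 port_index)
{
	struct net_device *ndev, *upper;

	if (port_index >= ac->num_ports)
		return NULL;

	ndev = ac->ports[port_index];	/* assumed per-port netdev array */

	/* With netvsc, the VF netdev sits under a master netvsc device;
	 * bare metal has no upper, so the VF itself is primary. */
	upper = netdev_master_upper_dev_get_rcu(ndev);
	return upper ? upper : ndev;
}
```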
- 27 Jun, 2024 1 commit
Ma Ke authored
When auxiliary_device_add() returns an error and auxiliary_device_uninit() is then called, the adev_release callback already calls kfree(madev); we shouldn't call kfree(madev) again in the error-handling path. Set 'madev' to NULL so the duplicate kfree() becomes a no-op.

Fixes: a69839d4 ("net: mana: Add support for auxiliary device")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Link: https://patch.msgid.link/20240625130314.2661257-1-make24@iscas.ac.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
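
A hedged sketch of the error path in question; the id handling is elided and the struct/callback names approximate the driver's add_adev():

```c
#include <linux/auxiliary_bus.h>
#include <linux/slab.h>

static int add_adev(struct gdma_dev *gd)
{
	struct mana_adev *madev;	/* wraps struct auxiliary_device */
	struct auxiliary_device *adev;
	int ret;

	madev = kzalloc(sizeof(*madev), GFP_KERNEL);
	if (!madev)
		return -ENOMEM;

	adev = &madev->adev;
	adev->name = "rdma";
	adev->dev.parent = gd->gdma_context->dev;
	adev->dev.release = adev_release;	/* kfree()s madev */

	ret = auxiliary_device_init(adev);
	if (ret)
		goto free_madev;

	ret = auxiliary_device_add(adev);
	if (ret) {
		/* uninit drops the last ref and adev_release() frees madev;
		 * forget the pointer so the common path can't free it twice */
		auxiliary_device_uninit(adev);
		madev = NULL;
		goto free_madev;
	}

	gd->adev = adev;
	return 0;

free_madev:
	kfree(madev);	/* kfree(NULL) is a no-op after the add failure */
	return ret;
}
```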
- 19 Jun, 2024 1 commit
Haiyang Zhang authored
As defined by the MANA hardware spec, the queue size for DMA is 4KB minimum and must be a power of 2, and the HWC queue size has to be exactly 4KB. To support page sizes other than 4KB on ARM64, define the minimal queue size as a macro separate from PAGE_SIZE, which we had always assumed to be 4KB before supporting ARM64. Also add MANA-specific macros and update the code related to size alignment, DMA region calculations, etc.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1718655446-6576-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
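
A hedged sketch of the decoupling; the macro names mirror the description rather than the final code:

```c
#include <linux/bits.h>

/* MANA hardware pages are always 4KB, regardless of the CPU PAGE_SIZE
 * (which may be 16KB or 64KB on ARM64). */
#define MANA_PAGE_SHIFT		12
#define MANA_PAGE_SIZE		BIT(MANA_PAGE_SHIFT)
#define MANA_PAGE_ALIGN(x)	ALIGN((x), MANA_PAGE_SIZE)

/* DMA queues: at least 4KB and a power of 2. */
#define MANA_MIN_QSIZE		MANA_PAGE_SIZE

/* The HWC queue must be exactly one hardware page. */
#define HWC_QUEUE_SIZE		MANA_PAGE_SIZE
```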
- 18 Jun, 2024 1 commit
Shradha Gupta authored
To clean up the rxqs in the port context structures, use the existing function mana_cleanup_port_context(), which does exactly the cleanup that's needed, instead of duplicating the code.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
Link: https://lore.kernel.org/r/1718349548-28697-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- 12 Jun, 2024 1 commit
Shradha Gupta authored
Allow variable-size indirection table allocation in MANA instead of using the constant MANA_INDIRECT_TABLE_SIZE. The size is now derived from the MANA_QUERY_VPORT_CONFIG response, and the indirection table is allocated dynamically.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Link: https://lore.kernel.org/r/1718015319-9609-1-git-send-email-shradhagupta@linux.microsoft.com
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
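
A hedged sketch of the dynamic allocation; the field names (indir_table_sz, indir_table, rxobj_table) are assumptions based on the commit text:

```c
#include <linux/slab.h>

static int mana_alloc_indir_tables(struct mana_port_context *apc,
				   u32 num_indir_entries)
{
	/* num_indir_entries comes from the MANA_QUERY_VPORT_CONFIG
	 * response instead of the old MANA_INDIRECT_TABLE_SIZE constant. */
	apc->indir_table_sz = num_indir_entries;

	apc->indir_table = kcalloc(num_indir_entries,
				   sizeof(*apc->indir_table), GFP_KERNEL);
	if (!apc->indir_table)
		return -ENOMEM;

	apc->rxobj_table = kcalloc(num_indir_entries,
				   sizeof(*apc->rxobj_table), GFP_KERNEL);
	if (!apc->rxobj_table) {
		kfree(apc->indir_table);
		apc->indir_table = NULL;
		return -ENOMEM;
	}
	return 0;
}
```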
- 07 May, 2024 1 commit
Eric Dumazet authored
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu), as hinted in commit 501a90c9 ("inet: protect against too small mtu values."). We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time for ndo_change_mtu() methods to use the corresponding WRITE_ONCE().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Simon Horman <horms@kernel.org>
Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
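
A hedged sketch of the pattern the series applies to each driver's ndo_change_mtu(); mana_reconfigure_queues() stands in for the driver-specific reconfiguration:

```c
#include <linux/netdevice.h>

static int mana_change_mtu(struct net_device *ndev, int new_mtu)
{
	int err;

	err = mana_reconfigure_queues(ndev, new_mtu);	/* assumed helper */
	if (err)
		return err;

	/* Pairs with the lockless READ_ONCE(dev->mtu) readers that do not
	 * hold RTNL. */
	WRITE_ONCE(ndev->mtu, new_mtu);
	return 0;
}
```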
- 11 Apr, 2024 1 commit
Erick Archer authored
This is an effort to get rid of all multiplications from allocation functions in order to prevent integer overflows [1][2]. As the "req" variable is a pointer to "struct mana_cfg_rx_steer_req_v2" and this structure ends in a flexible array:

struct mana_cfg_rx_steer_req_v2 {
	[...]
	mana_handle_t indir_tab[] __counted_by(num_indir_entries);
};

the preferred way in the kernel is to use the struct_size() helper to do the arithmetic instead of the calculation "size + size * count" in the kzalloc() function. Moreover, use the "offsetof" helper to get the indirection table offset instead of the "sizeof" operator, and avoid open-coded pointer arithmetic by using the new flex member. This new structure member also allows us to remove the "req_indir_tab" variable, since it is no longer needed. It is now also possible to use the "flex_array_size" helper to compute the size of the trailing elements in the "memcpy" call. This way, the code is more readable and safer. This code was detected with the help of Coccinelle, and audited and modified manually.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
Signed-off-by: Erick Archer <erick.archer@outlook.com>
Link: https://lore.kernel.org/r/AS8PR02MB7237A21355C86EC0DCC0D83B8B022@AS8PR02MB7237.eurprd02.prod.outlook.com
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
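
An excerpt-style sketch of the three helpers at work; struct_size(), offsetof(), and flex_array_size() are real kernel helpers, while rss_table stands in for the caller's entries and the surrounding function is elided:

```c
#include <linux/overflow.h>	/* struct_size(), flex_array_size() */
#include <linux/slab.h>
#include <linux/string.h>

	struct mana_cfg_rx_steer_req_v2 *req;

	/* One saturating computation instead of
	 * "sizeof(*req) + num_indir_entries * sizeof(mana_handle_t)": */
	req = kzalloc(struct_size(req, indir_tab, num_indir_entries),
		      GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	/* offsetof() on the flex member replaces sizeof-based pointer math: */
	req->indir_tab_offset = offsetof(struct mana_cfg_rx_steer_req_v2,
					 indir_tab);

	/* flex_array_size() sizes the trailing array for the copy: */
	memcpy(req->indir_tab, rss_table,
	       flex_array_size(req, indir_tab, num_indir_entries));
```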
- 04 Apr, 2024 1 commit
Haiyang Zhang authored
mana_get_rxbuf_cfg() aligns the RX buffer's DMA datasize to a multiple of 64. So a packet slightly bigger than mtu+14, say 1536, can be received and cause skb_over_panic.

Sample dmesg:
[ 5325.237162] skbuff: skb_over_panic: text:ffffffffc043277a len:1536 put:1536 head:ff1100018b517000 data:ff1100018b517100 tail:0x700 end:0x6ea dev:<NULL>
[ 5325.243689] ------------[ cut here ]------------
[ 5325.245748] kernel BUG at net/core/skbuff.c:192!
[ 5325.247838] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 5325.258374] RIP: 0010:skb_panic+0x4f/0x60
[ 5325.302941] Call Trace:
[ 5325.304389]  <IRQ>
[ 5325.315794]  ? skb_panic+0x4f/0x60
[ 5325.317457]  ? asm_exc_invalid_op+0x1f/0x30
[ 5325.319490]  ? skb_panic+0x4f/0x60
[ 5325.321161]  skb_put+0x4e/0x50
[ 5325.322670]  mana_poll+0x6fa/0xb50 [mana]
[ 5325.324578]  __napi_poll+0x33/0x1e0
[ 5325.326328]  net_rx_action+0x12e/0x280

As discussed internally, this alignment is not necessary. To fix this bug, remove it from the code, so oversized packets will be marked as CQE_RX_TRUNCATED by the NIC and dropped.

Cc: stable@vger.kernel.org
Fixes: 2fbbd712 ("net: mana: Enable RX path to handle various MTU sizes")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1712087316-20886-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
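
A hedged before/after sketch of the datasize computation; the variable and macro names are illustrative:

```c
	/* before (buggy): the NIC could DMA up to the rounded-up size,
	 * e.g. 1536 bytes for MTU 1500, past the end of the skb data area */
	*datasize = ALIGN(mtu + ETH_HLEN, 64);

	/* after: exact fit; anything larger is flagged CQE_RX_TRUNCATED by
	 * the NIC and dropped by the driver */
	*datasize = mtu + ETH_HLEN;
```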
- 15 Dec, 2023 1 commit
Konstantin Taranov authored
Allow assigning and polling more than one EQ on the same MSI-X index. This is achieved by introducing a list of attached EQs in each IRQ context, and removing the existing msix_index map that tried to ensure there was only one EQ per MSI-X index. This patch also exports symbols for creating EQs from other MANA kernel modules.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
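
A hedged sketch of the per-IRQ EQ list; the struct layout, the 'rq_node' list member, and mana_gd_process_eq_events() are assumed names:

```c
#include <linux/interrupt.h>
#include <linux/list.h>
#include <linux/rcupdate.h>

struct gdma_irq_context {
	struct list_head eq_list;	/* all EQs sharing this MSI-X vector */
	spinlock_t lock;		/* protects list mutation */
};

static irqreturn_t mana_gd_intr(int irq, void *arg)
{
	struct gdma_irq_context *gic = arg;
	struct gdma_queue *eq;

	rcu_read_lock();
	/* 'rq_node' is an assumed list node embedded in each EQ */
	list_for_each_entry_rcu(eq, &gic->eq_list, rq_node)
		mana_gd_process_eq_events(eq);	/* assumed per-EQ handler */
	rcu_read_unlock();

	return IRQ_HANDLED;
}
```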
- 30 Nov, 2023 1 commit
Colin Ian King authored
There is a spelling mistake in the struct field hc_tx_err_sqpdid_enforecement. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Link: https://lore.kernel.org/r/20231128095304.515492-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- 28 Nov, 2023 1 commit
Jakub Kicinski authored
Link page pool instances to the netdev for drivers that already link them to NAPI. Unless the driver is doing something very weird, per-NAPI should imply per-netdev. Add netsec as well; Ilias indicates that it fits the mold.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
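
A hedged sketch of the driver-side change; page_pool_params and page_pool_create() are real API, while the pool_size and the rxq field names are assumptions:

```c
#include <net/page_pool/types.h>

	struct page_pool_params pp = {
		.pool_size	= 256,			/* assumed */
		.nid		= NUMA_NO_NODE,
		.napi		= &rxq->rx_cq.napi,	/* already set today */
		.netdev		= rxq->ndev,		/* new: owning netdev */
	};

	rxq->page_pool = page_pool_create(&pp);
```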
- 27 Nov, 2023 1 commit
Shradha Gupta authored
Extend the performance counter stats in 'ethtool -S <interface>' for the MANA VF to include all GDMA stat counters.

Tested-on: Ubuntu22
Testcases:
1. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic
2. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Link: https://lore.kernel.org/r/1700830950-803-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
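
A hedged sketch of how such counters typically surface through ethtool_ops; the stat table, struct fields, and mana_query_gf_stats() refresh helper are assumptions, and the matching get_strings/get_sset_count hooks are elided:

```c
#include <linux/ethtool.h>
#include <linux/netdevice.h>

static const struct {
	char name[ETH_GSTRING_LEN];
	u16 offset;
} mana_gdma_stats[] = {
	{ "hc_rx_bytes", offsetof(struct mana_ethtool_stats, hc_rx_bytes) },
	{ "hc_tx_bytes", offsetof(struct mana_ethtool_stats, hc_tx_bytes) },
	/* ... one entry per GDMA counter ... */
};

static void mana_get_ethtool_stats(struct net_device *ndev,
				   struct ethtool_stats *e_stats, u64 *data)
{
	struct mana_port_context *apc = netdev_priv(ndev);
	void *base = &apc->eth_stats;
	int i;

	mana_query_gf_stats(apc);	/* assumed: refresh HW counters */

	for (i = 0; i < ARRAY_SIZE(mana_gdma_stats); i++)
		data[i] = *(u64 *)(base + mana_gdma_stats[i].offset);
}
```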
- 27 Oct, 2023 1 commit
Konstantin Taranov authored
Use a helper function for the assignment of xdp_features. This change simplifies backports.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1698430011-21562-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
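
A hedged sketch of centralizing the assignment; xdp_set_features_flag() and the NETDEV_XDP_ACT_* bits are real kernel API, the helper name and the exact flag set are assumptions:

```c
#include <net/xdp.h>

static void mana_xdp_set_features(struct net_device *ndev)
{
	/* One place to touch when a backport needs a different flag set. */
	xdp_features_t val = NETDEV_XDP_ACT_BASIC |
			     NETDEV_XDP_ACT_REDIRECT |
			     NETDEV_XDP_ACT_NDO_XMIT;

	xdp_set_features_flag(ndev, val);
}
```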
- 05 Oct, 2023 3 commits
Haiyang Zhang authored
Handle the case when the GSO SKB linear length is too large. The MANA NIC requires GSO packets to put only the header part in SGE0, otherwise the TX queue may stop at the HW level. So, use 2 SGEs for the skb linear part when it contains more than the packet headers.

Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
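
A hedged fragment showing the split; mana_map_sge() and header_len stand in for the driver's DMA-map helper and computed header size:

```c
	if (skb_is_gso(skb) && skb_headlen(skb) > header_len) {
		/* SGE0 may carry only the protocol headers, or the HW TX
		 * queue can stall; the rest of the linear area goes into a
		 * second SGE. */
		mana_map_sge(tp, skb->data, header_len);
		mana_map_sge(tp, skb->data + header_len,
			     skb_headlen(skb) - header_len);
	} else {
		mana_map_sge(tp, skb->data, skb_headlen(skb));
	}
```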
Haiyang Zhang authored
sizeof(struct hop_jumbo_hdr) is not part of tso_bytes, so remove the subtraction from the header size.

Cc: stable@vger.kernel.org
Fixes: bd7fc6e1 ("net: mana: Add new MANA VF performance counters for easier troubleshooting")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Haiyang Zhang authored
For an unknown TX CQE error type (probably from newer hardware), still free the SKB and update the queue tail, etc., otherwise the accounting will be wrong. Also, TX errors can be triggered by injecting corrupted packets, so replace the WARN_ONCE with rate-limited error logging.

Cc: stable@vger.kernel.org
Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
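
A hedged fragment of the default-case handling; the CQE field names are assumptions, net_ratelimit() and netdev_err() are the real kernel facilities:

```c
	switch (cqe_oob->cqe_hdr.cqe_type) {
	case CQE_TX_OKAY:
		break;
	/* ... known error types ... */
	default:
		/* Unknown type (possibly newer HW): log rate-limited instead
		 * of WARN_ONCE (corrupted packets can trigger this), then
		 * fall through to the common completion path so the SKB is
		 * freed and the queue tail still advances. */
		if (net_ratelimit())
			netdev_err(ndev, "TX: unknown CQE type %d\n",
				   cqe_oob->cqe_hdr.cqe_type);
		break;
	}
```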
- 11 Aug, 2023 1 commit
Shradha Gupta authored
Extend the performance counter stats in 'ethtool -S <interface>' for the MANA VF to include the GDMA TX LSO packet and byte counts.

Tested-on: Ubuntu22
Testcases:
1. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic
2. LISA testcase: PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV
3. Validated the GDMA stat packet and byte counters

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- 10 Aug, 2023 1 commit
Souradeep Chakrabarti authored
When unloading the MANA driver, mana_dealloc_queues() waits for the MANA hardware to complete any inflight packets and set the pending send count to zero. But if the hardware has failed, mana_dealloc_queues() could wait forever. Fix this by adding a timeout to the wait. Set the timeout to 120 seconds, a somewhat arbitrary value that is more than long enough for functional hardware to complete any sends.

Cc: stable@vger.kernel.org
Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Link: https://lore.kernel.org/r/1691576525-24271-1-git-send-email-schakrabarti@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
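
A hedged fragment of the bounded wait; the pending_sends field location is an assumption, the jiffies/time_before() pattern is standard kernel practice:

```c
#include <linux/jiffies.h>
#include <linux/delay.h>

	unsigned long timeout = jiffies + 120 * HZ;	/* generous bound */

	/* Previously this loop had no exit if failed HW never drained. */
	while (atomic_read(&apc->pending_sends) > 0 &&
	       time_before(jiffies, timeout))
		usleep_range(1000, 2000);

	if (atomic_read(&apc->pending_sends) > 0)
		netdev_warn(ndev, "%d pending sends not drained before timeout\n",
			    atomic_read(&apc->pending_sends));
```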
- 07 Aug, 2023 1 commit
Yunsheng Lin authored
Split the types and pure function declarations out of page_pool.h and add them to page_pool/types.h, so that C sources can include page_pool.h while headers generally only include page_pool/types.h, as suggested by Jakub. Rename page_pool.h to page_pool/helpers.h to have both in one place.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com
[Jakub: change microsoft/mana, fix kdoc paths in Documentation]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
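
The include rule after the split, as a small sketch:

```c
/* In a .c file that calls the inline helpers: */
#include <net/page_pool/helpers.h>

/* In a header that only needs struct page_pool and the API types: */
#include <net/page_pool/types.h>
```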
- 06 Aug, 2023 1 commit
Haiyang Zhang authored
Add a page pool for RX buffers, for a faster buffer cycle and reduced CPU usage. The standard page pool API is used. With an iperf test using 128 threads, this patch improved throughput by 12-15% and decreased the IRQ-associated CPU's usage from 99-100% to 10-50%.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
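
A hedged sketch of the RX buffer cycle with the standard page pool API (shown with the current tree's split headers); the pool_size and rxq fields are assumptions:

```c
#include <net/page_pool/helpers.h>

	struct page_pool_params pprm = {
		.pool_size = 512,	/* assumed: match RX ring depth */
		.nid = NUMA_NO_NODE,
	};
	struct page *page;

	/* setup, once per RX queue */
	rxq->page_pool = page_pool_create(&pprm);
	if (IS_ERR(rxq->page_pool))
		return PTR_ERR(rxq->page_pool);

	/* buffer refill: served from the pool's caches, not the allocator */
	page = page_pool_dev_alloc_pages(rxq->page_pool);

	/* drop/recycle path inside NAPI: return the page to the pool */
	page_pool_put_full_page(rxq->page_pool, page, true);
```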
- 03 Aug, 2023 1 commit
Jakub Kicinski authored
A handful of drivers currently expect to get xdp.h by virtue of including netdevice.h. This will soon no longer be the case, so add explicit includes.

Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230803010230.1755386-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
- 19 Jul, 2023 1 commit
Long Li authored
It's inefficient to ring the doorbell page every time a WQE is posted to the receive queue: excessive MMIO writes result in the CPU spending more time waiting on LOCK instructions (atomic operations), hurting scaling. Move the doorbell-ringing code so it runs after all WQEs have been posted to the receive queue, during the callback from napi_poll(). With this change, tests showed an improvement from 120G/s to 160G/s on a 200G physical link with 16 or 32 hardware queues, and no regression in network latency benchmarks on single connections.

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1689622539-5334-2-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
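
A hedged fragment of the batched shape; all helper names are assumptions that follow the commit text:

```c
	while (budget--) {
		cqe = mana_next_rx_cqe(rxq);	/* NULL when CQ is empty */
		if (!cqe)
			break;
		mana_process_rx_cqe(rxq, cqe);
		mana_post_rx_wqe(rxq);		/* repost buffer, no MMIO here */
	}

	/* One doorbell write for the whole batch; per-WQE MMIO writes kept
	 * CPUs stalled on LOCK'd operations and capped throughput. */
	mana_gd_wq_ring_doorbell(gc, rxq->gdma_rq);
```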
- 12 Jun, 2023 1 commit
Haiyang Zhang authored
To support vlan, use MANA_LONG_PKT_FMT if a vlan tag is present in the TX skb, then extract the vlan tag from the skb struct and save it to tx_oob for the NIC to transmit. Vlan tags on the payload are accepted by the NIC too. For RX, extract the vlan tag from the CQE and put it into the skb.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
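
A hedged fragment of both directions; the skb_vlan_tag_* accessors and __vlan_hwaccel_put_tag() are real kernel API, while the tx_oob/CQE field names are assumptions:

```c
#include <linux/if_vlan.h>

	/* TX: mark LONG_PKT_FMT and place the tag in the OOB for the NIC */
	if (skb_vlan_tag_present(skb)) {
		pkg.tx_oob.s_oob.pkt_fmt = MANA_LONG_PKT_FMT;
		pkg.tx_oob.l_oob.inject_vlan_pri_tag = 1;
		pkg.tx_oob.l_oob.pcp = skb_vlan_tag_get_prio(skb);
		pkg.tx_oob.l_oob.dei = skb_vlan_tag_get_cfi(skb);
		pkg.tx_oob.l_oob.vlan_id = skb_vlan_tag_get_id(skb);
	}

	/* RX: lift the tag from the CQE into the skb */
	if (oob->rx_vlantag_present)
		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
				       oob->rx_vlan_id);
```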
- 01 Jun, 2023 1 commit
Long Li authored
With RX coalescing, one CQE entry can be used to indicate multiple packets on the receive queue, saving processing time and PCI bandwidth over the CQ. The MANA Ethernet driver also uses the v2 version of the protocol; it doesn't use RX coalescing, and its behavior is not changed.

Link: https://lore.kernel.org/r/1684045095-31228-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
- 30 May, 2023 1 commit
Haiyang Zhang authored
The apc->eth_stats.rx_cqes counter is one per NIC (vport), and it sits on the frequent and parallel code path of all queues. Reads and writes to this single shared variable by many threads on different CPUs create a lot of caching and memory overhead, hence a perf regression; the value is also inaccurate due to the high volume of concurrent accesses. For example, with a workload of iperf with 128 threads and RPS enabled, we saw a perf regression of 25% from the previous patch adding the counters; this patch eliminates that regression. Since the error path of mana_poll_rx_cq() already has warnings, keeping the counter and converting it to a per-queue variable is not necessary, so just remove this counter from the high-frequency code path. Also remove the tx_cqes counter for the same reason: we have warnings and other counters for errors on that path, and don't need to count every normal CQE processed.

Cc: stable@vger.kernel.org
Fixes: bd7fc6e1 ("net: mana: Add new MANA VF performance counters for easier troubleshooting")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/1685115537-31675-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
- 25 Apr, 2023 2 commits
Haiyang Zhang authored
netdev/napi_alloc_frag() may fall back to a single page that is smaller than the requested size. Add error checking to avoid memory being overwritten.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
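
A hedged sketch of the check; compound_order()/get_order() are real kernel helpers, the function shape approximates the driver's RX frag getter:

```c
#include <linux/mm.h>
#include <linux/skbuff.h>

static void *mana_get_rxfrag_checked(struct mana_rxq *rxq)
{
	void *va = napi_alloc_frag(rxq->alloc_size);
	struct page *page;

	if (!va)
		return NULL;

	/* napi_alloc_frag() may have fallen back to a single page smaller
	 * than alloc_size; using it would overwrite adjacent memory. */
	page = virt_to_head_page(va);
	if (compound_order(page) < get_order(rxq->alloc_size)) {
		put_page(page);
		return NULL;
	}

	return va;
}
```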
Haiyang Zhang authored
Rename mana_refill_rxoob for naming consistency, and remove some empty lines between function calls and their error checking.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- 14 Apr, 2023 4 commits
Haiyang Zhang authored
During probe, get the hardware-allowed max MTU by querying the device configuration, so users can select an MTU up to the device limit. When XDP is in use, limit MTU settings so the buffer size stays within one page, and disallow running XDP when the MTU is set too large. Also, to prevent an MTU change from failing partway and leaving the NIC in a bad state, pre-allocate all buffers before starting the change, so that a low-memory condition produces an error without affecting the NIC.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
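
A hedged fragment of the XDP/MTU interlock; the overhead macro and the field names are assumptions:

```c
/* assumed: headroom + Ethernet header + shared_info must fit in one page
 * together with the MTU-sized payload for XDP to be usable */
#define MANA_XDP_MTU_OVERHEAD (XDP_PACKET_HEADROOM + ETH_HLEN + \
			       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

	/* in ndo_change_mtu(): reject sizes a loaded XDP prog can't take */
	if (apc->bpf_prog && new_mtu + MANA_XDP_MTU_OVERHEAD > PAGE_SIZE)
		return -EOPNOTSUPP;

	/* in XDP attach: reject the prog when the current MTU is too big */
	if (prog && ndev->mtu + MANA_XDP_MTU_OVERHEAD > PAGE_SIZE)
		return -EOPNOTSUPP;
```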
Haiyang Zhang authored
Update the RX data path to allocate and use RX queue DMA buffers with the proper size, based on the potentially varying MTU sizes.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang authored
Move the common buffer allocation code out of mana_process_rx_cqe() and mana_alloc_rx_wqe() into helper functions. Refactor the related variables so they can be changed in one place and the buffer sizes stay in sync.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang authored
Use napi_build_skb() instead of build_skb() to take advantage of the NAPI per-CPU caches when obtaining the skbuff head.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
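
The swap itself is a one-liner; napi_build_skb() must only be called from NAPI/softirq context, since it draws sk_buff heads from the per-CPU NAPI cache:

```c
	/* was: skb = build_skb(buf_va, rxq->alloc_size); */
	skb = napi_build_skb(buf_va, rxq->alloc_size);
```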
- 17 Mar, 2023 1 commit
Shradha Gupta authored
Extend the performance counter stats in 'ethtool -S <interface>' output for the MANA VF to facilitate troubleshooting.

Tested-on: Ubuntu22

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- 03 Feb, 2023 1 commit
Marek Majtyka authored
A summary of the flags being set for various drivers is given below. Note that XDP_F_REDIRECT_TARGET and XDP_F_FRAG_TARGET are features that can be turned off and on at runtime. This means that these flags may be set and unset under RTNL lock protection by the driver. Hence, READ_ONCE must be used by code loading the flag value. Also, these flags are not used for synchronization against the availability of XDP resources on a device; they are merely a hint, and hence the read may race with the actual teardown of XDP resources on the device. This may change in the future, e.g. with operations taking a reference on the XDP resources of the driver and in turn inhibiting turning off this flag. However, for now, it can only be used as a hint to check whether the device supports becoming a redirection target.

Turn the 'hw-offload' feature flag on for:
- netronome (nfp)
- netdevsim

Turn the 'native' and 'zerocopy' feature flags on for:
- intel (i40e, ice, ixgbe, igc)
- mellanox (mlx5)
- stmmac
- netronome (nfp)

Turn the 'native' feature flag on for:
- amazon (ena)
- broadcom (bnxt)
- freescale (dpaa, dpaa2, enetc)
- funeth
- intel (igb)
- marvell (mvneta, mvpp2, octeontx2)
- mellanox (mlx4)
- mtk_eth_soc
- qlogic (qede)
- sfc
- socionext (netsec)
- ti (cpsw)
- tap
- tsnep
- veth
- xen
- virtio_net

Turn the 'basic' (tx, pass, aborted and drop) feature flags on for:
- netronome (nfp)
- cavium (thunder)
- hyperv

Turn the 'redirect_target' feature flag on for:
- amazon (ena)
- broadcom (bnxt)
- freescale (dpaa, dpaa2)
- intel (i40e, ice, igb, ixgbe)
- ti (cpsw)
- marvell (mvneta, mvpp2)
- sfc
- socionext (netsec)
- qlogic (qede)
- mellanox (mlx5)
- tap
- veth
- virtio_net
- xen

Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Marek Majtyka <alardam@gmail.com>
Link: https://lore.kernel.org/r/3eca9fafb308462f7edb1f58e451d59209aa07eb.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
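
A hedged sketch of the set/read discipline; xdp_features_set_redirect_target(), xdp_features_clear_redirect_target(), and dev->xdp_features are real kernel API, while the consumer function is hypothetical:

```c
#include <net/xdp.h>

/* Driver side, under RTNL; may be toggled at runtime: */
static void dev_enable_redirect(struct net_device *dev)
{
	xdp_features_set_redirect_target(dev, false);	/* no SG support */
}

static void dev_disable_redirect(struct net_device *dev)
{
	xdp_features_clear_redirect_target(dev);
}

/* Consumer side: the flag is only a hint, read it locklessly. */
static bool dev_can_be_redirect_target(const struct net_device *dev)
{
	return READ_ONCE(dev->xdp_features) & NETDEV_XDP_ACT_NDO_XMIT;
}
```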
- 06 Dec, 2022 1 commit
Haiyang Zhang authored
After calling napi_complete_done(), the NAPIF_STATE_SCHED bit may be cleared, and another CPU can start the napi thread and access the per-CQ variable cq->work_done. If the other thread (for example, from busy_poll) sets it to a value >= budget, this thread will continue to run when it should stop, causing memory corruption and a panic. To fix this issue, save the per-CQ work_done variable in a local variable before napi_complete_done(), so it won't be corrupted by a possible concurrent thread after napi_complete_done(). Also, add a flag bit to advertise to the NIC firmware that the NAPI work_done variable race is fixed, so the driver is able to reliably support features like busy_poll.

Cc: stable@vger.kernel.org
Fixes: e1b5683f ("net: mana: Move NAPI from EQ to CQ")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1670010190-28595-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
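
A hedged fragment of the snapshot fix; mana_gd_ring_cq() and SET_ARM_BIT are assumed names, the ordering is the point:

```c
	int w, arm_bit;

	/* Snapshot before napi_complete_done(): afterwards another CPU
	 * (e.g. via busy_poll) may re-schedule NAPI and overwrite
	 * cq->work_done concurrently. */
	w = cq->work_done;

	if (w < budget && napi_complete_done(napi, w))
		arm_bit = SET_ARM_BIT;
	else
		arm_bit = 0;

	mana_gd_ring_cq(gdma_queue, arm_bit);

	return min(w, budget);
```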
- 10 Nov, 2022 3 commits
Nathan Huckleberry authored
The ndo_start_xmit field in net_device_ops is expected to be of type netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev). The mismatched return type breaks forward-edge kCFI, since the underlying function definition does not match the function hook definition. A new warning in clang will catch this at compile time:

drivers/net/ethernet/microsoft/mana/mana_en.c:382:21: error: incompatible function pointer types initializing 'netdev_tx_t (*)(struct sk_buff *, struct net_device *)' (aka 'enum netdev_tx (*)(struct sk_buff *, struct net_device *)') with an expression of type 'int (struct sk_buff *, struct net_device *)' [-Werror,-Wincompatible-function-pointer-types-strict]
  .ndo_start_xmit = mana_start_xmit,
                    ^~~~~~~~~~~~~~~
1 error generated.

The return type of mana_start_xmit should be changed from int to netdev_tx_t.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1703
Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
[nathan: Rebase on net-next and resolve conflicts; add note about new clang warning]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20221109002629.1446680-1-nathan@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
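
The shape of the fix, as a short sketch:

```c
#include <linux/netdevice.h>

/* The definition now matches the .ndo_start_xmit hook type exactly,
 * which forward-edge kCFI enforces at the indirect call site. */
netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev);

static const struct net_device_ops mana_devops = {
	.ndo_start_xmit = mana_start_xmit,	/* was int (*)(...) */
};
```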
Ajay Sharma authored
The MANA hardware supports protection domains and memory registration for use in an RDMA environment. Add those definitions and expose them for use by the RDMA driver.

Signed-off-by: Ajay Sharma <sharmaajay@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-12-git-send-email-longli@linuxonhyperv.com
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Long Li authored
The maximum number of SGL entries should be computed from the maximum WQE size for the intended queue type and the corresponding OOB data size. This guarantees the hardware queue can successfully queue requests up to the queue depth exposed to the upper layer.

Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1667502990-2559-9-git-send-email-longli@linuxonhyperv.com
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
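
A hedged sketch of the sizing rule; the concrete sizes are assumptions, the principle is that SGEs get whatever space the maximum WQE leaves after the out-of-band area:

```c
#define MAX_TX_WQE_SIZE		512	/* assumed per-queue-type maximum */
#define INLINE_OOB_SMALL_SIZE	8	/* assumed inline OOB bytes */

struct gdma_sge {
	u64 address;
	u32 mem_key;
	u32 size;
};	/* 16 bytes per entry */

#define MAX_TX_WQE_SGL_ENTRIES \
	((MAX_TX_WQE_SIZE - INLINE_OOB_SMALL_SIZE) / sizeof(struct gdma_sge))
```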