- 10 Jun, 2021 11 commits
-
-
Vlad Buslov authored
Dynamic FDB entries require capability to age out unused entries. Such entries are either aged out by kernel software bridge implementation or by hardware switch that offloaded them (and notified the kernel to mark them as SWITCHDEV_FDB_ADD_TO_BRIDGE). Leaving ageing to kernel bridge would result it deleting offloaded dynamic FDB entries every ageing_time period due to packets being processed by hardware and, consecutively, 'used' timestamp for FDB entry not being updated. However, since hardware doesn't support ageing, software solution inside the driver is required. In order to emulate hardware ageing in driver, extend bridge FDB ingress flows with counter and create delayed br_offloads->update_work task on bridge offloads workqueue. Run the task every second, update 'used' timestamp in software bridge dynamic entry by sending SWITCHDEV_FDB_ADD_TO_BRIDGE for the entry, if it flow hardware counter lastuse field was changed since last update. If lastuse wasn't changed for ageing_time period, then delete the FDB entry and notify kernel bridge by sending SWITCHDEV_FDB_DEL_TO_BRIDGE notification. Register blocking switchdev notifier callback and handle attribute set SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME event to allow user to dynamically configure bridge FDB entry ageing timeout. Save the value per-bridge in struct mlx5_esw_bridge. Silently ignore SWITCHDEV_ATTR_ID_PORT_{PRE_}BRIDGE_FLAGS switchdev event since mlx5 bridge implementation relies on software bridge for implementing necessary behavior for all of these flags. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Vlad Buslov authored
Hardware supported by mlx5 driver doesn't provide learning and requires the driver to emulate all switch-like behavior in software. As such, all packets by default go through miss path, appear on representor and get to software bridge, if it is the upper device of the representor. This causes bridge to process packet in software, learn the MAC address to FDB and send SWITCHDEV_FDB_ADD_TO_DEVICE event to all subscribers. In order to offload FDB entries in mlx5, register switchdev notifier callback and implement support for both 'added_by_user' and dynamic FDB entry SWITCHDEV_FDB_ADD_TO_DEVICE events asynchronously using new mlx5_esw_bridge_offloads->wq ordered workqueue. In workqueue callback offload the ingress rule (matching FDB entry MAC as packet source MAC) and egress table rule (matching FDB entry MAC as destination MAC). For ingress table rule also match source vport to ensure that only traffic coming from expected bridge port is matched by offloaded rule. Save all the relevant FDB entry data in struct mlx5_esw_bridge_fdb_entry instance and insert the instance in new mlx5_esw_bridge->fdb_list list (for traversing all entries by software ageing implementation in following patch) and in new mlx5_esw_bridge->fdb_ht hash table for fast retrieval. Notify the bridge that FDB entry has been offloaded by sending SWITCHDEV_FDB_OFFLOADED notification. Delete FDB entry on reception of SWITCHDEV_FDB_DEL_TO_DEVICE event. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Vlad Buslov authored
Create new files bridge.{c|h} in en/rep directory that implement bridge interaction with representor netdevices and handle required events/notifications, bridge.{c|h} in esw directory that implement all necessary eswitch offloading infrastructure and works on vport/eswitch level. Provide new kconfig MLX5_BRIDGE which is automatically selected when both kernel bridge and mlx5 eswitch configs are enabled. Provide basic infrastructure for bridge offloads: - struct mlx5_esw_bridge_offloads - per-eswitch bridge offload structure that encapsulates generic bridge-offloads data (notifier blocks, ingress flow table/group, etc.) that is created/deleted on enable/disable eswitch offloads. - struct mlx5_esw_bridge - per-bridge structure that encapsulates per-bridge data (reference counter, FDB, egress flow table/group, etc.) that is created when first eswitch represetor is attached to new bridge and deleted when last representor is removed from the bridge as a result of NETDEV_CHANGEUPPER event. The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB namespace. The new priority is between tc-miss and slow path priorities. Priority consist of two levels: the ingress table that is global per eswitch and matches incoming packets by src_mac/vid and redirects them to next level (egress table) that is chosen according to ingress port bridge membership and matches on dst_mac/vid in order to redirect packet to vport according to the following diagram: + | +---------v----------+ | | | FDB_TC_OFFLOAD | | | +---------+----------+ | | +---------v----------+ | | | FDB_FT_OFFLOAD | | | +---------+----------+ | | +---------v----------+ | | | FDB_TC_MISS | | | +---------+----------+ | +--------------------------------------+ | | | | +------+ | | | | | +------v--------+ FDB_BR_OFFLOAD | | | INGRESS_TABLE | | | +------+---+----+ | | | | match | | | +---------+ | | | | | +-------+ | | +-------v-------+ match | | | | | | EGRESS_TABLE +------------> vport | | | +-------+-------+ | | | | | | | +-------+ | | miss | | | +------+------+ | | | | +--------------------------------------+ | | +---------v----------+ | | | FDB_SLOW_PATH | | | +---------+----------+ | v Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Vlad Buslov authored
Change the helper to functions to accept constant pointer to struct net_device. This is necessary for following patches in series that pass mlx5e_eswitch_rep() as a callback to kernel bridge infrastructure code. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Vlad Buslov authored
In order to adhere to kernel software datapath model bridge offloads must come after TC and NF FDBs. Following patches in this series add new FDB priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload is implemented with unmanaged tables, its miss path is not automatically connected to next priority and requires the code to manually connect with slow table. To keep bridge offloads encapsulated and not mix it with eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD and FDB_SLOW_PATH: + | +---------v----------+ | | | FDB_TC_OFFLOAD | | | +---------+----------+ | | | +---------v----------+ | | | FDB_FT_OFFLOAD | | | +---------+----------+ | | | +---------v----------+ | | | FDB_TC_MISS | | | +---------+----------+ | | | +---------v----------+ | | | FDB_SLOW_PATH | | | +---------+----------+ | v Initialize the new priority with single default empty managed table and use the table as TC/NF miss patch instead of slow table. This approach allows bridge offloads to be created as new FDB namespace priority between FDB_TC_MISS and FDB_SLOW_PATH without exposing its internal tables to any other modules since miss path of managed TC-miss table is automatically wired to next priority. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Add support for EMD tag in modify header set/copy actions on device that supports STEv1. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Add support for INSERT_HEADER packet reformat context type Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Adding new reformat context type (INSERT_HEADER) requires adding two new parameters to reformat context - reformat_param_0 and reformat_param_1. As defined by HW spec, these parameters have different meaning for different reformat context type. The first parameter (reformat_param_0) is not new to HW spec, but it wasn't used by any of the supported reformats. The second parameter (reformat_param_1) is new to the HW spec - it was added to allow supporting INSERT_HEADER. For NSERT_HEADER, reformat_param_0 indicates the header used to reference the location of the inserted header, and reformat_param_1 indicates the offset of the inserted header from the reference point defined by reformat_param_0. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Encap actions on RX flow were not supported on older devices. However, this is no longer the case in devices that support STEv1. This patch adds support for encap l3/l2 on RX flow for supported devices: update actions state machine by adding the newely supported transitions and add the required support in STEv0/1 files. The new transitions that are supported are: - from decap/modify-header/pop-vlan to encap - from encap to termination table Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Split single reformat state into two separate states for encap and decap. This will allow adding actions to the specific domain, such as encap on RX. Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Yevgeny Kliteynik authored
Add support for HCA caps 2 that contains capabilities for the new insert/remove header actions. Added the required definitions for supporting the new reformat type: added packet reformat parameters, reformat anchors and definitions to allow copy/set into the inserted EMD (Embedded MetaData) tag. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 09 Jun, 2021 29 commits
-
-
David S. Miller authored
Alex Elder says: ==================== net: ipa: memory region rework, part 1 This is the first portion of a very long series of patches that has been split in two. Once these patches are accepted, I'll post the remaining patches. The combined series reworks the way memory regions are defined in the configuration data, and in the process solidifies code that ensures configurations are valid. In this portion (part 1), most of the focus is on improving validation of code. This validation is now done unconditionally (something I promised Leon Romanovsky I would work on). Validation will occur earlier than before, catching configuration problems as early as possible and permitting the rest of the driver to avoid needing to do some error checking. There will now be checks to ensure all defined regions are supported by the hardware, that required regions are all defined, and that there are no duplicate regions. The second portion (part 2) is mainly a set of small but pervasive changes whose result is to have the memory region array not be indexed by region ID. I'll provide further explanation when I post that series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
In ipa_mem_valid(), wait until regions have been marked in the memory region bitmap, and check all that are not found there to ensure they are not required. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Add a test in ipa_mem_valid() to ensure no memory region is defined more than once, using a bitmap to record each defined memory region. Skip over undefined regions when checking (we can have any number of those). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Introduce ipa_mem_id_valid(), and use it to check defined memory regions to ensure they are valid for a given version of IPA. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Introduce a new function that indicates whether a given memory region is required for a given version of IPA hardware. Use it to verify that all required regions are present during initialization. Reorder the definitions of the memory region IDs to be based on the version in which they're first defined. Use "+" rather than "and above" where defining the IPA versions in which memory IDs are used, and indicate which regions are optional (many are not). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Pass the memory configuration data array to ipa_mem_valid() for validation, and use that rather than assuming it's already been recorded in the IPA structure. Move the memory data array size check into ipa_mem_valid(). Call ipa_mem_valid() early in ipa_mem_init(), and only proceed with assigning the memory array pointer and size if it is found to be valid. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Move the memory region validation check so it happens earlier when initializing the driver, at init time rather than config time (i.e., before access to hardware is required). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
The only thing done by ipa_mem_valid_one() that requires hardware access is the check for whether all regions fit within the size of IPA local memory specified by an IPA register. Introduce ipa_mem_size_valid() to implement this verification and stop doing so in ipa_mem_valid_one(). Call the new function from ipa_mem_config() (which is also the caller of ipa_mem_valid()). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Currently, memory regions are validated in the loop that initializes them. Instead, validate them separately. Rename ipa_mem_valid() to be ipa_mem_valid_one(). Define a *new* function named ipa_mem_valid() that performs validation of the array of memory regions provided. This function calls ipa_mem_valid_one() for each region in turn. Skip validation for any "empty" region descriptors, which have zero size and are not preceded by any canary values. Issue a warning for such descriptors if the offset is non-zero. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Do memory region descriptor validation unconditionally, rather than having it depend on IPA_VALIDATION being defined. Pass the address of a memory region descriptor rather than a memory ID to ipa_mem_valid(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Store the memory region ID in the memory descriptor structure. This is a move toward *not* indexing the array by the ID, but for now we must still specify those index values. Define an explicitly undefined region ID, value 0, so uninitialized entries in the array won't use an otherwise valid ID. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alex Elder authored
Define a new pseudo memory region identifer that specifies the offset at the end of IPA resident memory. Use it instead of IPA_MEM_UC_EVENT_RING in places where the size of that region was defined to be 0. The size of the IPA_MEM_END_MARKER pseudo region must be zero. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The return code variable rc is being set to return error values in two places in sja1105_mdiobus_base_tx_register and yet it is not being returned, the function always returns 0 instead. Fix this by replacing the return 0 with the return code rc. Addresses-Coverity: ("Unused value") Fixes: 5a8f0974 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matteo Croce authored
To support XDP, a headroom is prepended to the packet data. Consider this offset when doing a prefetch. Fixes: da5ec7f2 ("net: stmmac: refactor stmmac_init_rx_buffers for stmmac_reinit_rx_buffers") Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The call to mlxsw_thermal_module_temp_and_thresholds_get passes a NULL pointer for the temperature and this can be dereferenced in this function if the mlxsw_reg_query call fails. The simplist fix is to pass the address of dummy temperature variable instead of a NULL pointer. Addresses-Coverity: ("Explicit null dereferenced") Fixes: 72a64c2f ("mlxsw: thermal: Read module temperature thresholds using MTMP register") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Kristian Evensen authored
Commit e1d9a90a ("net: ethernet: rmnet: Support for ingress MAPv5 checksum offload") broke ingress handling for devices where RMNET_FLAGS_INGRESS_MAP_CKSUMV5 or RMNET_FLAGS_INGRESS_MAP_CKSUMV4 are not set. Unless either of these flags are set, the MAP header is not removed. This commit restores the original logic by ensuring that the MAP header is removed for all MAP packets. Fixes: e1d9a90a ("net: ethernet: rmnet: Support for ingress MAPv5 checksum offload") Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
The sparse tool complains as follows: drivers/net/ethernet/amazon/ena/ena_netdev.c:978:13: warning: symbol 'ena_alloc_map_page' was not declared. Should it be static? This symbol is not used outside of ena_netdev.c, so marks it static. Fixes: 947c54c3 ("net: ena: Use dev_alloc() in RX buffer allocation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The comparisons of the u16 values priv->phycr1 and priv->phycr2 to less than zero always false because they are unsigned. Fix this by using an int for the assignment and less than zero check. Addresses-Coverity: ("Unsigned compared against 0") Fixes: 0a4355c2 ("net: phy: realtek: add dt property to disable CLKOUT clock") Fixes: d90db36a ("net: phy: realtek: add dt property to enable ALDPS mode") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
There are missing { } around a block of code on an if statement. Fix this by adding them in. Addresses-Coverity: ("Nesting level does not match indentation") Fixes: 46682cb8 ("net: stmmac: enable Intel mGbE 2.5Gbps link speed") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Yingliang authored
Use the devm_platform_ioremap_resource_byname() helper instead of calling platform_get_resource_byname() and devm_ioremap_resource() separately. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Matteo Croce says: ==================== mvpp2: prefetch data early These two patches prefetch some data from RAM so to reduce stall and speedup the packet processing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matteo Croce authored
Most of the time during the RX is caused by the compound_head() call done at the end of the RX loop: │ build_skb(): [...] │ static inline struct page *compound_head(struct page *page) │ { │ unsigned long head = READ_ONCE(page->compound_head); 65.23 │ ldr x2, [x1, #8] Prefetch the page struct as soon as possible, to speedup the RX path noticeabily by a ~3-4% packet rate in a drop test. │ build_skb(): [...] │ static inline struct page *compound_head(struct page *page) │ { │ unsigned long head = READ_ONCE(page->compound_head); 17.92 │ ldr x2, [x1, #8] Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Matteo Croce authored
In the RX buffer, the received data starts after a headroom used to align the IP header and to allow prepending headers efficiently. The prefetch() should take this into account, and prefetch from the very start of the received data. We can see that ether_addr_equal_64bits(), which is the first function to access the data, drops from the top of the perf top output. prefetch(data): Overhead Shared Object Symbol 11.64% [kernel] [k] eth_type_trans prefetch(data + MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM): Overhead Shared Object Symbol 13.42% [kernel] [k] build_skb 10.35% [mvpp2] [k] mvpp2_rx 9.35% [kernel] [k] __netif_receive_skb_core 8.24% [kernel] [k] kmem_cache_free 7.97% [kernel] [k] dev_gro_receive 7.68% [kernel] [k] page_pool_put_page 7.32% [kernel] [k] kmem_cache_alloc 7.09% [mvpp2] [k] mvpp2_bm_pool_put 3.36% [kernel] [k] eth_type_trans Also, move the eth_type_trans() call a bit down, to give the RAM more time to prefetch the data. Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Yingliang authored
Use the devm_platform_ioremap_resource_byname() helper instead of calling platform_get_resource_byname() and devm_ioremap_resource() separately. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Yingliang authored
Use the devm_platform_ioremap_resource_byname() helper instead of calling platform_get_resource_byname() and devm_ioremap_resource() separately. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yang Yingliang authored
It will cause null-ptr-deref if platform_get_resource() returns NULL, we need check the return value. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Shai Malin authored
This reverts commits: - 76241154 nvme: NVME_TCP_OFFLOAD should not default to m - 5ff5622e: Merge branch 'NVMeTCP-Offload-ULP' As requested on the mailing-list: https://lore.kernel.org/netdev/SJ0PR18MB3882C20793EA35A3E8DAE300CC379@SJ0PR18MB3882.namprd18.prod.outlook.com/ This patch will revert the nvme-tcp-offload ULP from net-next. The nvme-tcp-offload ULP series will continue to be considered only on linux-nvme@lists.infradead.org. Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The comparison of the u16 priv->phy_addr < 0 is always false because phy_addr is unsigned. Fix this by assigning the return from the call to function asix_read_phy_addr to int ret and using this for the less than zero error check comparison. Fixes: 7e88b11a ("net: usb: asix: refactor asix_read_phy_addr() and handle errors on return") Addresses-Coverity: ("Unsigned compared against 0") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
The comparison of the u16 priv->phy_addr < 0 is always false because phy_addr is unsigned. Fix this by assigning the return from the call to function asix_read_phy_addr to int ret and using this for the less than zero error check comparison. Addresses-Coverity: ("Unsigned compared against 0") Fixes: e532a096 ("net: usb: asix: ax88772: add phylib support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-