Merge tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-10-24

SW steering updates from Yevgeny Kliteynik:

1) 1st Four patches: small fixes / optimizations for SW steering:

 - Patch 1: Don't abort destroy flow if failed to destroy table - continue
   and free everything else.
 - Patches 2 and 3 deal with fast teardown:
    + Skip sync during fast teardown, as PCI device is not there any more.
    + Check device state when polling CQ - otherwise SW steering keeps
      polling the CQ forever, because nobody is there to flush it.
 - Patch 4: Remove an unneeded function argument.

2) Deal with the hiccups that we get during rules insertion/deletion,
which sometimes reach 1/4 of a second. While insertion/deletion rate
improvement was not the focus here, it still is a by-product of removing
these hiccups.

Another by-product is the reduced standard deviation in measuring the
duration of rules insertion/deletion bursts.

In the testing we add K rules (warm-up phase), and then continuously do
insertion/deletion bursts of N rules.
During the test execution, the driver measures hiccups (amount and duration)
and total time for insertion/deletion of a batch of rules.

Here are some numbers, before and after these patches:

+--------------------------------------------+-----------------+----------------+
|                                            |  Create rules   |  Delete rules  |
|                                            +--------+--------+--------+-------+
|                                            | Before |  After | Before | After |
+--------------------------------------------+--------+--------+--------+-------+
| Max hiccup [msec]                          |    253 |     42 |    254 |    68 |
+--------------------------------------------+--------+--------+--------+-------+
| Avg duration of 10K rules add/remove [msec]| 140.07 | 124.32 | 106.99 | 99.51 |
+--------------------------------------------+--------+--------+--------+-------+
| Num of hiccups per 100K rules add/remove   |   7.77 |   7.97 |  12.60 | 11.57 |
+--------------------------------------------+--------+--------+--------+-------+
| Avg hiccup duration [msec]                 |  36.92 |  33.25 |  36.15 | 33.74 |
+--------------------------------------------+--------+--------+--------+-------+

 - Patch 5: Allocate a short array on stack instead of dynamically - it is
   destroyed at the end of the function.
 - Patch 6: Rather than cleaning the corresponding chunk's section of
   ste_arrays on chunk deletion, initialize these areas upon chunk creation.
   Chunk destruction tends to come in large batches (during pool syncing),
   so instead of doing huge memory initialization during pool sync, we
   amortize this by doing small initializations on chunk creation.
 - Patch 7: To simplify the error flow and allow cleaner addition of new
   pools, handle creation/destruction of all the domain's memory pools and
   other memory-related fields in separate init/uninit functions.
 - Patch 8: During rehash, write each table row immediately instead of
   waiting for the whole table to be ready and writing it all - saves
   allocations of ste_send_info structures and improves performance.
 - Patch 9: Instead of allocating/freeing send info objects dynamically,
   manage them in a pool. The number of send info objects doesn't depend on
   the number of rules, so after pre-populating the pool with an initial
   batch of send info objects, the pool is not expected to grow.
   This way we save alloc/free during writing STEs to ICM, which by itself
   can sometimes take up to 40 msec.
 - Patch 10: Allocate icm_chunks from their own slab allocator, which
   lowered the alloc/free "hiccups" frequency.
 - Patch 11: Similar to patch 10, allocate htbl from its own slab allocator.
 - Patch 12: Lower sync threshold for ICM hot memory - set the threshold for
   sync to 1/4 of the pool instead of 1/2 of the pool. Although we will have
   more syncs, each sync will be shorter and will help with insertion rate
   stability. Also, notice that the overall number of hiccups wasn't
   increased due to all the other patches.
 - Patch 13: Keep track of hot ICM chunks in an array instead of a list.
   After steering sync, we traverse the hot list and finally free all the
   chunks. It appears that traversing a long list takes an unusually long
   time due to cache misses on many entries, which causes a big "hiccup"
   during rule insertion. This patch replaces the list with a pre-allocated
   array that stores only the bookkeeping information that is needed to
   later free the chunks in their buddy allocator.
 - Patch 14: Remove the unneeded buddy used_list - we don't need to have
   the list of used chunks, we only need the total amount of used memory.

* tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: DR, Remove the buddy used_list
  net/mlx5: DR, Keep track of hot ICM chunks in an array instead of list
  net/mlx5: DR, Lower sync threshold for ICM hot memory
  net/mlx5: DR, Allocate htbl from its own slab allocator
  net/mlx5: DR, Allocate icm_chunks from their own slab allocator
  net/mlx5: DR, Manage STE send info objects in pool
  net/mlx5: DR, In rehash write the line in the entry immediately
  net/mlx5: DR, Handle domain memory resources init/uninit separately
  net/mlx5: DR, Initialize chunk's ste_arrays at chunk creation
  net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically
  net/mlx5: DR, Remove unneeded argument from dr_icm_chunk_destroy
  net/mlx5: DR, Check device state when polling CQ
  net/mlx5: DR, Fix the SMFS sync_steering for fast teardown
  net/mlx5: DR, In destroy flow, free resources even if FW command failed
====================

Link: https://lore.kernel.org/r/20221027145643.6618-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
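
The pooling idea described for patch 9 can be illustrated with a small userspace sketch. This is not the driver's implementation: `struct send_info`, `pool_fill()`, `pool_alloc()`, `pool_free()` and `pool_destroy()` are hypothetical names, and `calloc()` stands in for the kernel allocator. It only shows why a pre-populated free list keeps the allocator off the hot path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mlx5dr_ste_send_info. */
struct send_info {
	struct send_info *next_free;	/* link used only while on the free list */
	unsigned char payload[64];
};

struct send_info_pool {
	struct send_info *free_list;
	size_t num_free;
	size_t num_total;
};

/* Add 'count' fresh objects to the free list; returns 0, or -1 on failure. */
static int pool_fill(struct send_info_pool *pool, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct send_info *obj = calloc(1, sizeof(*obj));

		if (!obj)
			return -1;
		obj->next_free = pool->free_list;
		pool->free_list = obj;
		pool->num_free++;
		pool->num_total++;
	}
	return 0;
}

/* Pop an object; the allocator is touched only when the pool runs dry. */
static struct send_info *pool_alloc(struct send_info_pool *pool, size_t batch)
{
	struct send_info *obj;

	if (!pool->free_list)
		pool_fill(pool, batch);
	if (!pool->free_list)
		return NULL;

	obj = pool->free_list;
	pool->free_list = obj->next_free;
	obj->next_free = NULL;
	pool->num_free--;
	return obj;
}

/* Return an object to the pool instead of freeing it. */
static void pool_free(struct send_info_pool *pool, struct send_info *obj)
{
	assert(obj);
	obj->next_free = pool->free_list;
	pool->free_list = obj;
	pool->num_free++;
}

/* Release every object currently sitting on the free list. */
static void pool_destroy(struct send_info_pool *pool)
{
	while (pool->free_list) {
		struct send_info *obj = pool->free_list;

		pool->free_list = obj->next_free;
		free(obj);
	}
	pool->num_free = 0;
	pool->num_total = 0;
}
```

A caller would fill the pool once at setup, then pair each `pool_alloc()` with a `pool_free()`; in steady state `num_total` stays constant, which matches the message's expectation that the pool does not grow.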

Commit 02a97e02 authored Oct 28, 2022 by Jakub Kicinski
10 changed files
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_buddy.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_buddy.c
@@ -15,8 +15,6 @@ int mlx5dr_buddy_init(struct mlx5dr_icm_buddy_mem *buddy,
 	buddy->max_order = max_order;

 	INIT_LIST_HEAD(&buddy->list_node);
-	INIT_LIST_HEAD(&buddy->used_list);
-	INIT_LIST_HEAD(&buddy->hot_list);

 	buddy->bitmap = kcalloc(buddy->max_order + 1,
 				sizeof(*buddy->bitmap),

--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c
@@ -271,6 +271,13 @@ int mlx5dr_cmd_sync_steering(struct mlx5_core_dev *mdev)
 {
 	u32 in[MLX5_ST_SZ_DW(sync_steering_in)] = {};

+	/* Skip SYNC in case the device is internal error state.
+	 * Besides a device error, this also happens when we're
+	 * in fast teardown
+	 */
+	if (mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
+		return 0;
+
 	MLX5_SET(sync_steering_in, in, opcode, MLX5_CMD_OP_SYNC_STEERING);

 	return mlx5_cmd_exec_in(mdev, sync_steering, in);
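
The two fast-teardown fixes share one idea: once the device is in an error state, stop waiting on it. The userspace sketch below models that under stated assumptions. `struct device`, `sync_steering()` and `poll_completions()` are hypothetical, and the "hardware" is simulated by a counter that flips the device into the error state when it runs out.

```c
#include <assert.h>

enum dev_state { DEV_STATE_UP, DEV_STATE_INTERNAL_ERROR };

/* Hypothetical device model, not the mlx5 structures. */
struct device {
	enum dev_state state;
	int completions_left;	/* completions delivered before the device "dies" */
	int sync_cmds_sent;	/* commands actually issued to the device */
};

/* Skip the command when the device cannot answer; report success. */
static int sync_steering(struct device *dev)
{
	if (dev->state == DEV_STATE_INTERNAL_ERROR)
		return 0;

	dev->sync_cmds_sent++;
	return 0;
}

/* Poll for 'wanted' completions; return -1 if the device went away. */
static int poll_completions(struct device *dev, int wanted)
{
	int polled = 0;

	assert(wanted >= 0);

	while (polled < wanted) {
		/* Without this check the loop would spin forever. */
		if (dev->state == DEV_STATE_INTERNAL_ERROR)
			return -1;

		/* Simulated hardware: deliver one completion or fail. */
		if (dev->completions_left == 0) {
			dev->state = DEV_STATE_INTERNAL_ERROR;
			continue;
		}
		dev->completions_left--;
		polled++;
	}

	return polled;
}
```

Returning success from the skipped sync is a design choice that lets teardown continue freeing software state even though no command reached the device.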

--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c
@@ -56,6 +56,70 @@ int mlx5dr_domain_get_recalc_cs_ft_addr(struct mlx5dr_domain *dmn,
 	return 0;
 }

+static int dr_domain_init_mem_resources(struct mlx5dr_domain *dmn)
+{
+	int ret;
+
+	dmn->chunks_kmem_cache = kmem_cache_create("mlx5_dr_chunks",
+						   sizeof(struct mlx5dr_icm_chunk), 0,
+						   SLAB_HWCACHE_ALIGN, NULL);
+	if (!dmn->chunks_kmem_cache) {
+		mlx5dr_err(dmn, "Couldn't create chunks kmem_cache\n");
+		return -ENOMEM;
+	}
+
+	dmn->htbls_kmem_cache = kmem_cache_create("mlx5_dr_htbls",
+						  sizeof(struct mlx5dr_ste_htbl), 0,
+						  SLAB_HWCACHE_ALIGN, NULL);
+	if (!dmn->htbls_kmem_cache) {
+		mlx5dr_err(dmn, "Couldn't create hash tables kmem_cache\n");
+		ret = -ENOMEM;
+		goto free_chunks_kmem_cache;
+	}
+
+	dmn->ste_icm_pool = mlx5dr_icm_pool_create(dmn, DR_ICM_TYPE_STE);
+	if (!dmn->ste_icm_pool) {
+		mlx5dr_err(dmn, "Couldn't get icm memory\n");
+		ret = -ENOMEM;
+		goto free_htbls_kmem_cache;
+	}
+
+	dmn->action_icm_pool = mlx5dr_icm_pool_create(dmn, DR_ICM_TYPE_MODIFY_ACTION);
+	if (!dmn->action_icm_pool) {
+		mlx5dr_err(dmn, "Couldn't get action icm memory\n");
+		ret = -ENOMEM;
+		goto free_ste_icm_pool;
+	}
+
+	ret = mlx5dr_send_info_pool_create(dmn);
+	if (ret) {
+		mlx5dr_err(dmn, "Couldn't create send info pool\n");
+		goto free_action_icm_pool;
+	}
+
+	return 0;
+
+free_action_icm_pool:
+	mlx5dr_icm_pool_destroy(dmn->action_icm_pool);
+free_ste_icm_pool:
+	mlx5dr_icm_pool_destroy(dmn->ste_icm_pool);
+free_htbls_kmem_cache:
+	kmem_cache_destroy(dmn->htbls_kmem_cache);
+free_chunks_kmem_cache:
+	kmem_cache_destroy(dmn->chunks_kmem_cache);
+
+	return ret;
+}
+
+static void dr_domain_uninit_mem_resources(struct mlx5dr_domain *dmn)
+{
+	mlx5dr_send_info_pool_destroy(dmn);
+	mlx5dr_icm_pool_destroy(dmn->action_icm_pool);
+	mlx5dr_icm_pool_destroy(dmn->ste_icm_pool);
+	kmem_cache_destroy(dmn->htbls_kmem_cache);
+	kmem_cache_destroy(dmn->chunks_kmem_cache);
+}
+
 static int dr_domain_init_resources(struct mlx5dr_domain *dmn)
 {
 	int ret;
@@ -79,32 +143,22 @@ static int dr_domain_init_resources(struct mlx5dr_domain *dmn)
 		goto clean_pd;
 	}

-	dmn->ste_icm_pool = mlx5dr_icm_pool_create(dmn, DR_ICM_TYPE_STE);
-	if (!dmn->ste_icm_pool) {
-		mlx5dr_err(dmn, "Couldn't get icm memory\n");
-		ret = -ENOMEM;
+	ret = dr_domain_init_mem_resources(dmn);
+	if (ret) {
+		mlx5dr_err(dmn, "Couldn't create domain memory resources\n");
 		goto clean_uar;
 	}

-	dmn->action_icm_pool = mlx5dr_icm_pool_create(dmn, DR_ICM_TYPE_MODIFY_ACTION);
-	if (!dmn->action_icm_pool) {
-		mlx5dr_err(dmn, "Couldn't get action icm memory\n");
-		ret = -ENOMEM;
-		goto free_ste_icm_pool;
-	}
-
 	ret = mlx5dr_send_ring_alloc(dmn);
 	if (ret) {
 		mlx5dr_err(dmn, "Couldn't create send-ring\n");
-		goto free_action_icm_pool;
+		goto clean_mem_resources;
 	}

 	return 0;

-free_action_icm_pool:
-	mlx5dr_icm_pool_destroy(dmn->action_icm_pool);
-free_ste_icm_pool:
-	mlx5dr_icm_pool_destroy(dmn->ste_icm_pool);
+clean_mem_resources:
+	dr_domain_uninit_mem_resources(dmn);
 clean_uar:
 	mlx5_put_uars_page(dmn->mdev, dmn->uar);
 clean_pd:
@@ -116,8 +170,7 @@ static int dr_domain_init_resources(struct mlx5dr_domain *dmn)
 static void dr_domain_uninit_resources(struct mlx5dr_domain *dmn)
 {
 	mlx5dr_send_ring_free(dmn, dmn->send_ring);
-	mlx5dr_icm_pool_destroy(dmn->action_icm_pool);
-	mlx5dr_icm_pool_destroy(dmn->ste_icm_pool);
+	dr_domain_uninit_mem_resources(dmn);
 	mlx5_put_uars_page(dmn->mdev, dmn->uar);
 	mlx5_core_dealloc_pd(dmn->mdev, dmn->pdn);
 }
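
The init/uninit split above follows the usual goto-unwind pattern: each failure label releases exactly what was created before it, in reverse order. Here is a minimal userspace sketch of the same shape. `struct domain`, `res_create()` and `res_destroy()` are hypothetical, and the `fail_at` field exists only to inject failures.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the domain; each pointer models one resource. */
struct domain {
	void *chunks_cache;
	void *htbls_cache;
	void *ste_pool;
	int fail_at;	/* test hook: step number that should fail, 0 for none */
	int live;	/* resources currently allocated */
};

static void *res_create(struct domain *d, int step)
{
	void *res = (d->fail_at == step) ? NULL : malloc(1);

	if (res)
		d->live++;
	return res;
}

static void res_destroy(struct domain *d, void *res)
{
	assert(res);
	free(res);
	d->live--;
}

static int domain_init_mem(struct domain *d)
{
	d->chunks_cache = res_create(d, 1);
	if (!d->chunks_cache)
		return -ENOMEM;

	d->htbls_cache = res_create(d, 2);
	if (!d->htbls_cache)
		goto free_chunks_cache;

	d->ste_pool = res_create(d, 3);
	if (!d->ste_pool)
		goto free_htbls_cache;

	return 0;

	/* Unwind in reverse order of creation. */
free_htbls_cache:
	res_destroy(d, d->htbls_cache);
free_chunks_cache:
	res_destroy(d, d->chunks_cache);
	return -ENOMEM;
}

/* Mirror image of a fully successful init. */
static void domain_uninit_mem(struct domain *d)
{
	res_destroy(d, d->ste_pool);
	res_destroy(d, d->htbls_cache);
	res_destroy(d, d->chunks_cache);
}
```

Because the caller sees a single init and a single uninit, adding another pool later touches only these two functions, which is the stated motivation for patch 7.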

--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c
@@ -4,14 +4,30 @@
 #include "dr_types.h"

 #define DR_ICM_MODIFY_HDR_ALIGN_BASE 64
+#define DR_ICM_POOL_HOT_MEMORY_FRACTION 4
+
+struct mlx5dr_icm_hot_chunk {
+	struct mlx5dr_icm_buddy_mem *buddy_mem;
+	unsigned int seg;
+	enum mlx5dr_icm_chunk_size size;
+};

 struct mlx5dr_icm_pool {
 	enum mlx5dr_icm_type icm_type;
 	enum mlx5dr_icm_chunk_size max_log_chunk_sz;
 	struct mlx5dr_domain *dmn;
+	struct kmem_cache *chunks_kmem_cache;
+
 	/* memory management */
 	struct mutex mutex; /* protect the ICM pool and ICM buddy */
 	struct list_head buddy_mem_list;
+
+	/* Hardware may be accessing this memory but at some future,
+	 * undetermined time, it might cease to do so.
+	 * sync_ste command sets them free.
+	 */
+	struct mlx5dr_icm_hot_chunk *hot_chunks_arr;
+	u32 hot_chunks_num;
 	u64 hot_memory_size;
 };

@@ -177,46 +193,20 @@ static int dr_icm_buddy_get_ste_size(struct mlx5dr_icm_buddy_mem *buddy)

 static void dr_icm_chunk_ste_init(struct mlx5dr_icm_chunk *chunk, int offset)
 {
+	int num_of_entries = mlx5dr_icm_pool_get_chunk_num_of_entries(chunk);
 	struct mlx5dr_icm_buddy_mem *buddy = chunk->buddy_mem;
+	int ste_size = dr_icm_buddy_get_ste_size(buddy);
 	int index = offset / DR_STE_SIZE;

 	chunk->ste_arr = &buddy->ste_arr[index];
 	chunk->miss_list = &buddy->miss_list[index];
-	chunk->hw_ste_arr = buddy->hw_ste_arr +
-			    index * dr_icm_buddy_get_ste_size(buddy);
-}
+	chunk->hw_ste_arr = buddy->hw_ste_arr + index * ste_size;

-static void dr_icm_chunk_ste_cleanup(struct mlx5dr_icm_chunk *chunk)
-{
-	int num_of_entries = mlx5dr_icm_pool_get_chunk_num_of_entries(chunk);
-	struct mlx5dr_icm_buddy_mem *buddy = chunk->buddy_mem;
-
-	memset(chunk->hw_ste_arr, 0,
-	       num_of_entries * dr_icm_buddy_get_ste_size(buddy));
+	memset(chunk->hw_ste_arr, 0, num_of_entries * ste_size);
 	memset(chunk->ste_arr, 0,
 	       num_of_entries * sizeof(chunk->ste_arr[0]));
 }

-static enum mlx5dr_icm_type
-get_chunk_icm_type(struct mlx5dr_icm_chunk *chunk)
-{
-	return chunk->buddy_mem->pool->icm_type;
-}
-
-static void dr_icm_chunk_destroy(struct mlx5dr_icm_chunk *chunk,
-				 struct mlx5dr_icm_buddy_mem *buddy)
-{
-	enum mlx5dr_icm_type icm_type = get_chunk_icm_type(chunk);
-
-	buddy->used_memory -= mlx5dr_icm_pool_get_chunk_byte_size(chunk);
-	list_del(&chunk->chunk_list);
-
-	if (icm_type == DR_ICM_TYPE_STE)
-		dr_icm_chunk_ste_cleanup(chunk);
-
-	kvfree(chunk);
-}
-
 static int dr_icm_buddy_init_ste_cache(struct mlx5dr_icm_buddy_mem *buddy)
 {
 	int num_of_entries =
@@ -296,14 +286,6 @@ static int dr_icm_buddy_create(struct mlx5dr_icm_pool *pool)

 static void dr_icm_buddy_destroy(struct mlx5dr_icm_buddy_mem *buddy)
 {
-	struct mlx5dr_icm_chunk *chunk, *next;
-
-	list_for_each_entry_safe(chunk, next, &buddy->hot_list, chunk_list)
-		dr_icm_chunk_destroy(chunk, buddy);
-
-	list_for_each_entry_safe(chunk, next, &buddy->used_list, chunk_list)
-		dr_icm_chunk_destroy(chunk, buddy);
-
 	dr_icm_pool_mr_destroy(buddy->icm_mr);

 	mlx5dr_buddy_cleanup(buddy);
@@ -314,53 +296,62 @@ static void dr_icm_buddy_destroy(struct mlx5dr_icm_buddy_mem *buddy)
 	kvfree(buddy);
 }

-static struct mlx5dr_icm_chunk *
-dr_icm_chunk_create(struct mlx5dr_icm_pool *pool,
-		    enum mlx5dr_icm_chunk_size chunk_size,
-		    struct mlx5dr_icm_buddy_mem *buddy_mem_pool,
-		    unsigned int seg)
+static void
+dr_icm_chunk_init(struct mlx5dr_icm_chunk *chunk,
+		  struct mlx5dr_icm_pool *pool,
+		  enum mlx5dr_icm_chunk_size chunk_size,
+		  struct mlx5dr_icm_buddy_mem *buddy_mem_pool,
+		  unsigned int seg)
 {
-	struct mlx5dr_icm_chunk *chunk;
 	int offset;

-	chunk = kvzalloc(sizeof(*chunk), GFP_KERNEL);
-	if (!chunk)
-		return NULL;
-
-	offset = mlx5dr_icm_pool_dm_type_to_entry_size(pool->icm_type) * seg;
-
 	chunk->seg = seg;
 	chunk->size = chunk_size;
 	chunk->buddy_mem = buddy_mem_pool;

-	if (pool->icm_type == DR_ICM_TYPE_STE)
+	if (pool->icm_type == DR_ICM_TYPE_STE) {
+		offset = mlx5dr_icm_pool_dm_type_to_entry_size(pool->icm_type) * seg;
 		dr_icm_chunk_ste_init(chunk, offset);
+	}

 	buddy_mem_pool->used_memory += mlx5dr_icm_pool_get_chunk_byte_size(chunk);
-	INIT_LIST_HEAD(&chunk->chunk_list);
-
-	/* chunk now is part of the used_list */
-	list_add_tail(&chunk->chunk_list, &buddy_mem_pool->used_list);
-
-	return chunk;
 }

 static bool dr_icm_pool_is_sync_required(struct mlx5dr_icm_pool *pool)
 {
 	int allow_hot_size;

-	/* sync when hot memory reaches half of the pool size */
+	/* sync when hot memory reaches a certain fraction of the pool size */
 	allow_hot_size =
 		mlx5dr_icm_pool_chunk_size_to_byte(pool->max_log_chunk_sz,
-						   pool->icm_type) / 2;
+						   pool->icm_type) /
+		DR_ICM_POOL_HOT_MEMORY_FRACTION;

 	return pool->hot_memory_size > allow_hot_size;
 }

+static void dr_icm_pool_clear_hot_chunks_arr(struct mlx5dr_icm_pool *pool)
+{
+	struct mlx5dr_icm_hot_chunk *hot_chunk;
+	u32 i, num_entries;
+
+	for (i = 0; i < pool->hot_chunks_num; i++) {
+		hot_chunk = &pool->hot_chunks_arr[i];
+		num_entries = mlx5dr_icm_pool_chunk_size_to_entries(hot_chunk->size);
+		mlx5dr_buddy_free_mem(hot_chunk->buddy_mem,
+				      hot_chunk->seg, ilog2(num_entries));
+		hot_chunk->buddy_mem->used_memory -=
+			mlx5dr_icm_pool_chunk_size_to_byte(hot_chunk->size,
+							   pool->icm_type);
+	}
+
+	pool->hot_chunks_num = 0;
+	pool->hot_memory_size = 0;
+}
+
 static int dr_icm_pool_sync_all_buddy_pools(struct mlx5dr_icm_pool *pool)
 {
 	struct mlx5dr_icm_buddy_mem *buddy, *tmp_buddy;
-	u32 num_entries;
 	int err;

 	err = mlx5dr_cmd_sync_steering(pool->dmn->mdev);
@@ -369,16 +360,9 @@ static int dr_icm_pool_sync_all_buddy_pools(struct mlx5dr_icm_pool *pool)
 		return err;
 	}

-	list_for_each_entry_safe(buddy, tmp_buddy, &pool->buddy_mem_list, list_node) {
-		struct mlx5dr_icm_chunk *chunk, *tmp_chunk;
-
-		list_for_each_entry_safe(chunk, tmp_chunk, &buddy->hot_list, chunk_list) {
-			num_entries = mlx5dr_icm_pool_get_chunk_num_of_entries(chunk);
-			mlx5dr_buddy_free_mem(buddy, chunk->seg, ilog2(num_entries));
-			pool->hot_memory_size -= mlx5dr_icm_pool_get_chunk_byte_size(chunk);
-			dr_icm_chunk_destroy(chunk, buddy);
-		}
+	dr_icm_pool_clear_hot_chunks_arr(pool);

+	list_for_each_entry_safe(buddy, tmp_buddy, &pool->buddy_mem_list, list_node) {
 		if (!buddy->used_memory && pool->icm_type == DR_ICM_TYPE_STE)
 			dr_icm_buddy_destroy(buddy);
 	}
@@ -452,10 +436,12 @@ mlx5dr_icm_alloc_chunk(struct mlx5dr_icm_pool *pool,
 	if (ret)
 		goto out;

-	chunk = dr_icm_chunk_create(pool, chunk_size, buddy, seg);
+	chunk = kmem_cache_alloc(pool->chunks_kmem_cache, GFP_KERNEL);
 	if (!chunk)
 		goto out_err;

+	dr_icm_chunk_init(chunk, pool, chunk_size, buddy, seg);
+
 	goto out;

 out_err:
@@ -469,12 +455,23 @@ void mlx5dr_icm_free_chunk(struct mlx5dr_icm_chunk *chunk)
 {
 	struct mlx5dr_icm_buddy_mem *buddy = chunk->buddy_mem;
 	struct mlx5dr_icm_pool *pool = buddy->pool;
+	struct mlx5dr_icm_hot_chunk *hot_chunk;
+	struct kmem_cache *chunks_cache;
+
+	chunks_cache = pool->chunks_kmem_cache;

-	/* move the memory to the waiting list AKA "hot" */
+	/* move the chunk to the waiting chunks array, AKA "hot" memory */
 	mutex_lock(&pool->mutex);
-	list_move_tail(&chunk->chunk_list, &buddy->hot_list);
+
 	pool->hot_memory_size += mlx5dr_icm_pool_get_chunk_byte_size(chunk);

+	hot_chunk = &pool->hot_chunks_arr[pool->hot_chunks_num++];
+	hot_chunk->buddy_mem = chunk->buddy_mem;
+	hot_chunk->seg = chunk->seg;
+	hot_chunk->size = chunk->size;
+
+	kmem_cache_free(chunks_cache, chunk);
+
 	/* Check if we have chunks that are waiting for sync-ste */
 	if (dr_icm_pool_is_sync_required(pool))
 		dr_icm_pool_sync_all_buddy_pools(pool);
@@ -482,9 +479,20 @@ void mlx5dr_icm_free_chunk(struct mlx5dr_icm_chunk *chunk)
 	mutex_unlock(&pool->mutex);
 }

+struct mlx5dr_ste_htbl *mlx5dr_icm_pool_alloc_htbl(struct mlx5dr_icm_pool *pool)
+{
+	return kmem_cache_alloc(pool->dmn->htbls_kmem_cache, GFP_KERNEL);
+}
+
+void mlx5dr_icm_pool_free_htbl(struct mlx5dr_icm_pool *pool, struct mlx5dr_ste_htbl *htbl)
+{
+	kmem_cache_free(pool->dmn->htbls_kmem_cache, htbl);
+}
+
 struct mlx5dr_icm_pool *mlx5dr_icm_pool_create(struct mlx5dr_domain *dmn,
 					       enum mlx5dr_icm_type icm_type)
 {
+	u32 num_of_chunks, entry_size, max_hot_size;
 	enum mlx5dr_icm_chunk_size max_log_chunk_sz;
 	struct mlx5dr_icm_pool *pool;

@@ -500,21 +508,43 @@ struct mlx5dr_icm_pool *mlx5dr_icm_pool_create(struct mlx5dr_domain *dmn,
 	pool->dmn = dmn;
 	pool->icm_type = icm_type;
 	pool->max_log_chunk_sz = max_log_chunk_sz;
+	pool->chunks_kmem_cache = dmn->chunks_kmem_cache;

 	INIT_LIST_HEAD(&pool->buddy_mem_list);

 	mutex_init(&pool->mutex);

+	entry_size = mlx5dr_icm_pool_dm_type_to_entry_size(pool->icm_type);
+
+	max_hot_size = mlx5dr_icm_pool_chunk_size_to_byte(pool->max_log_chunk_sz,
+							  pool->icm_type) /
+		       DR_ICM_POOL_HOT_MEMORY_FRACTION;
+
+	num_of_chunks = DIV_ROUND_UP(max_hot_size, entry_size) + 1;
+
+	pool->hot_chunks_arr = kvcalloc(num_of_chunks,
+					sizeof(struct mlx5dr_icm_hot_chunk),
+					GFP_KERNEL);
+	if (!pool->hot_chunks_arr)
+		goto free_pool;
+
 	return pool;
+
+free_pool:
+	kvfree(pool);
+	return NULL;
 }

 void mlx5dr_icm_pool_destroy(struct mlx5dr_icm_pool *pool)
 {
 	struct mlx5dr_icm_buddy_mem *buddy, *tmp_buddy;

+	dr_icm_pool_clear_hot_chunks_arr(pool);
+
 	list_for_each_entry_safe(buddy, tmp_buddy, &pool->buddy_mem_list, list_node)
 		dr_icm_buddy_destroy(buddy);

+	kvfree(pool->hot_chunks_arr);
 	mutex_destroy(&pool->mutex);
 	kvfree(pool);
 }
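
The hot-chunk bookkeeping in this file can be modeled in a few lines. The sketch below is a simplified userspace model with hypothetical names (`struct pool`, `pool_free_chunk()`, `pool_sync()`); it assumes every freed chunk is at least one entry in size, and it replaces the buddy allocator with a plain byte counter. It shows how the array capacity follows from the 1/4 threshold: the sync fires only after the threshold is exceeded, hence the extra slot.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HOT_MEMORY_FRACTION 4	/* mirrors DR_ICM_POOL_HOT_MEMORY_FRACTION */

/* Only the bookkeeping needed to release the chunk later. */
struct hot_chunk {
	uint32_t seg;
	uint32_t byte_size;
};

/* Hypothetical, simplified pool; not the driver's struct. */
struct pool {
	uint64_t max_hot_bytes;
	uint32_t entry_size;
	struct hot_chunk *hot_arr;
	uint32_t hot_cap;
	uint32_t hot_num;
	uint64_t hot_bytes;
	uint64_t used_bytes;
	unsigned int syncs;
};

static int pool_init(struct pool *p, uint64_t pool_size, uint32_t entry_size)
{
	assert(entry_size > 0);

	p->max_hot_bytes = pool_size / HOT_MEMORY_FRACTION;
	p->entry_size = entry_size;
	/* Worst case: all hot chunks are one entry, plus the one that trips the sync. */
	p->hot_cap = (uint32_t)((p->max_hot_bytes + entry_size - 1) / entry_size) + 1;
	p->hot_num = 0;
	p->hot_bytes = 0;
	p->used_bytes = 0;
	p->syncs = 0;
	p->hot_arr = calloc(p->hot_cap, sizeof(*p->hot_arr));

	return p->hot_arr ? 0 : -1;
}

/* Release all hot chunks in one linear pass over a contiguous array. */
static void pool_sync(struct pool *p)
{
	for (uint32_t i = 0; i < p->hot_num; i++)
		p->used_bytes -= p->hot_arr[i].byte_size;

	p->hot_num = 0;
	p->hot_bytes = 0;
	p->syncs++;
}

/* Record a freed chunk as hot; sync once hot memory exceeds the threshold. */
static void pool_free_chunk(struct pool *p, uint32_t seg, uint32_t byte_size)
{
	assert(byte_size >= p->entry_size);
	assert(p->hot_num < p->hot_cap);

	p->hot_arr[p->hot_num++] = (struct hot_chunk){ seg, byte_size };
	p->hot_bytes += byte_size;

	if (p->hot_bytes > p->max_hot_bytes)
		pool_sync(p);
}

static void pool_destroy(struct pool *p)
{
	free(p->hot_arr);
	p->hot_arr = NULL;
}
```

With a 1024-byte pool and 64-byte entries, the threshold is 256 bytes and the array holds five slots: four chunks fit without a sync, and the fifth triggers it.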
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_rule.c
@@ -3,13 +3,16 @@

 #include "dr_types.h"

-#define DR_RULE_MAX_STE_CHAIN (DR_RULE_MAX_STES + DR_ACTION_MAX_STES)
+#define DR_RULE_MAX_STES_OPTIMIZED 5
+#define DR_RULE_MAX_STE_CHAIN_OPTIMIZED (DR_RULE_MAX_STES_OPTIMIZED + DR_ACTION_MAX_STES)

-static int dr_rule_append_to_miss_list(struct mlx5dr_ste_ctx *ste_ctx,
+static int dr_rule_append_to_miss_list(struct mlx5dr_domain *dmn,
+				       enum mlx5dr_domain_nic_type nic_type,
 				       struct mlx5dr_ste *new_last_ste,
 				       struct list_head *miss_list,
 				       struct list_head *send_list)
 {
+	struct mlx5dr_ste_ctx *ste_ctx = dmn->ste_ctx;
 	struct mlx5dr_ste_send_info *ste_info_last;
 	struct mlx5dr_ste *last_ste;

@@ -17,7 +20,7 @@ static int dr_rule_append_to_miss_list(struct mlx5dr_ste_ctx *ste_ctx,
 	last_ste = list_last_entry(miss_list, struct mlx5dr_ste, miss_list_node);
 	WARN_ON(!last_ste);

-	ste_info_last = kzalloc(sizeof(*ste_info_last), GFP_KERNEL);
+	ste_info_last = mlx5dr_send_info_alloc(dmn, nic_type);
 	if (!ste_info_last)
 		return -ENOMEM;

@@ -120,7 +123,7 @@ dr_rule_handle_one_ste_in_update_list(struct mlx5dr_ste_send_info *ste_info,
 		goto out;

 out:
-	kfree(ste_info);
+	mlx5dr_send_info_free(ste_info);
 	return ret;
 }

@@ -191,8 +194,8 @@ dr_rule_rehash_handle_collision(struct mlx5dr_matcher *matcher,
 	new_ste->htbl->chunk->miss_list = mlx5dr_ste_get_miss_list(col_ste);

 	/* Update the previous from the list */
-	ret = dr_rule_append_to_miss_list(dmn->ste_ctx, new_ste,
-					  mlx5dr_ste_get_miss_list(col_ste),
+	ret = dr_rule_append_to_miss_list(dmn, nic_matcher->nic_tbl->nic_dmn->type,
+					  new_ste, mlx5dr_ste_get_miss_list(col_ste),
 					  update_list);
 	if (ret) {
 		mlx5dr_dbg(dmn, "Failed update dup entry\n");
@@ -278,7 +281,8 @@ dr_rule_rehash_copy_ste(struct mlx5dr_matcher *matcher,
 	new_htbl->ctrl.num_of_valid_entries++;

 	if (use_update_list) {
-		ste_info = kzalloc(sizeof(*ste_info), GFP_KERNEL);
+		ste_info = mlx5dr_send_info_alloc(dmn,
+						  nic_matcher->nic_tbl->nic_dmn->type);
 		if (!ste_info)
 			goto err_exit;

@@ -357,6 +361,15 @@ static int dr_rule_rehash_copy_htbl(struct mlx5dr_matcher *matcher,
 						    update_list);
 		if (err)
 			goto clean_copy;
+
+		/* In order to decrease the number of allocated ste_send_info
+		 * structs, send the current table row now.
+		 */
+		err = dr_rule_send_update_list(update_list, matcher->tbl->dmn, false);
+		if (err) {
+			mlx5dr_dbg(matcher->tbl->dmn, "Failed updating table to HW\n");
+			goto clean_copy;
+		}
 	}

 clean_copy:
@@ -387,7 +400,8 @@ dr_rule_rehash_htbl(struct mlx5dr_rule *rule,
 	nic_matcher = nic_rule->nic_matcher;
 	nic_dmn = nic_matcher->nic_tbl->nic_dmn;

-	ste_info = kzalloc(sizeof(*ste_info), GFP_KERNEL);
+	ste_info = mlx5dr_send_info_alloc(dmn,
+					  nic_matcher->nic_tbl->nic_dmn->type);
 	if (!ste_info)
 		return NULL;

@@ -473,13 +487,13 @@ dr_rule_rehash_htbl(struct mlx5dr_rule *rule,
 	list_for_each_entry_safe(del_ste_info, tmp_ste_info,
 				 &rehash_table_send_list, send_list) {
 		list_del(&del_ste_info->send_list);
-		kfree(del_ste_info);
+		mlx5dr_send_info_free(del_ste_info);
 	}

 free_new_htbl:
 	mlx5dr_ste_htbl_free(new_htbl);
 free_ste_info:
-	kfree(ste_info);
+	mlx5dr_send_info_free(ste_info);
 	mlx5dr_info(dmn, "Failed creating rehash table\n");
 	return NULL;
 }
@@ -512,11 +526,11 @@ dr_rule_handle_collision(struct mlx5dr_matcher *matcher,
 			 struct list_head *send_list)
 {
 	struct mlx5dr_domain *dmn = matcher->tbl->dmn;
-	struct mlx5dr_ste_ctx *ste_ctx = dmn->ste_ctx;
 	struct mlx5dr_ste_send_info *ste_info;
 	struct mlx5dr_ste *new_ste;

-	ste_info = kzalloc(sizeof(*ste_info), GFP_KERNEL);
+	ste_info = mlx5dr_send_info_alloc(dmn,
+					  nic_matcher->nic_tbl->nic_dmn->type);
 	if (!ste_info)
 		return NULL;

@@ -524,8 +538,8 @@ dr_rule_handle_collision(struct mlx5dr_matcher *matcher,
 	if (!new_ste)
 		goto free_send_info;

-	if (dr_rule_append_to_miss_list(ste_ctx, new_ste,
-					miss_list, send_list)) {
+	if (dr_rule_append_to_miss_list(dmn, nic_matcher->nic_tbl->nic_dmn->type,
+					new_ste, miss_list, send_list)) {
 		mlx5dr_dbg(dmn, "Failed to update prev miss_list\n");
 		goto err_exit;
 	}
@@ -541,7 +555,7 @@ dr_rule_handle_collision(struct mlx5dr_matcher *matcher,
 err_exit:
 	mlx5dr_ste_free(new_ste, matcher, nic_matcher);
 free_send_info:
-	kfree(ste_info);
+	mlx5dr_send_info_free(ste_info);
 	return NULL;
 }

@@ -721,8 +735,8 @@ static int dr_rule_handle_action_stes(struct mlx5dr_rule *rule,
 		list_add_tail(&action_ste->miss_list_node,
 			      mlx5dr_ste_get_miss_list(action_ste));

-		ste_info_arr[k] = kzalloc(sizeof(*ste_info_arr[k]),
-					  GFP_KERNEL);
+		ste_info_arr[k] = mlx5dr_send_info_alloc(dmn,
+							 nic_matcher->nic_tbl->nic_dmn->type);
 		if (!ste_info_arr[k])
 			goto err_exit;

@@ -772,7 +786,8 @@ static int dr_rule_handle_empty_entry(struct mlx5dr_matcher *matcher,

 	ste->ste_chain_location = ste_location;

-	ste_info = kzalloc(sizeof(*ste_info), GFP_KERNEL);
+	ste_info = mlx5dr_send_info_alloc(dmn,
+					  nic_matcher->nic_tbl->nic_dmn->type);
 	if (!ste_info)
 		goto clean_ste_setting;

@@ -793,7 +808,7 @@ static int dr_rule_handle_empty_entry(struct mlx5dr_matcher *matcher,
 	return 0;

 clean_ste_info:
-	kfree(ste_info);
+	mlx5dr_send_info_free(ste_info);
 clean_ste_setting:
 	list_del_init(&ste->miss_list_node);
 	mlx5dr_htbl_put(cur_htbl);
@@ -1089,6 +1104,7 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 			size_t num_actions,
 			struct mlx5dr_action *actions[])
 {
+	u8 hw_ste_arr_optimized[DR_RULE_MAX_STE_CHAIN_OPTIMIZED * DR_STE_SIZE] = {};
 	struct mlx5dr_ste_send_info *ste_info, *tmp_ste_info;
 	struct mlx5dr_matcher *matcher = rule->matcher;
 	struct mlx5dr_domain *dmn = matcher->tbl->dmn;
@@ -1098,6 +1114,7 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 	struct mlx5dr_ste_htbl *cur_htbl;
 	struct mlx5dr_ste *ste = NULL;
 	LIST_HEAD(send_ste_list);
+	bool hw_ste_arr_is_opt;
 	u8 *hw_ste_arr = NULL;
 	u32 new_hw_ste_arr_sz;
 	int ret, i;
@@ -1109,9 +1126,23 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 			 rule->flow_source))
 		return 0;

-	hw_ste_arr = kzalloc(DR_RULE_MAX_STE_CHAIN * DR_STE_SIZE, GFP_KERNEL);
-	if (!hw_ste_arr)
-		return -ENOMEM;
+	ret = mlx5dr_matcher_select_builders(matcher,
+					     nic_matcher,
+					     dr_rule_get_ipv(&param->outer),
+					     dr_rule_get_ipv(&param->inner));
+	if (ret)
+		return ret;
+
+	hw_ste_arr_is_opt = nic_matcher->num_of_builders <= DR_RULE_MAX_STES_OPTIMIZED;
+	if (likely(hw_ste_arr_is_opt)) {
+		hw_ste_arr = hw_ste_arr_optimized;
+	} else {
+		hw_ste_arr = kzalloc((nic_matcher->num_of_builders + DR_ACTION_MAX_STES) *
+				     DR_STE_SIZE, GFP_KERNEL);
+
+		if (!hw_ste_arr)
+			return -ENOMEM;
+	}

 	mlx5dr_domain_nic_lock(nic_dmn);

@@ -1119,13 +1150,6 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 	if (ret)
 		goto free_hw_ste;

-	ret = mlx5dr_matcher_select_builders(matcher,
-					     nic_matcher,
-					     dr_rule_get_ipv(&param->outer),
-					     dr_rule_get_ipv(&param->inner));
-	if (ret)
-		goto remove_from_nic_tbl;
-
 	/* Set the tag values inside the ste array */
 	ret = mlx5dr_ste_build_ste_arr(matcher, nic_matcher, param, hw_ste_arr);
 	if (ret)
@@ -1187,7 +1211,8 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,

 	mlx5dr_domain_nic_unlock(nic_dmn);

-	kfree(hw_ste_arr);
+	if (unlikely(!hw_ste_arr_is_opt))
+		kfree(hw_ste_arr);

 	return 0;

@@ -1196,7 +1221,7 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,
 	/* Clean all ste_info's */
 	list_for_each_entry_safe(ste_info, tmp_ste_info, &send_ste_list, send_list) {
 		list_del(&ste_info->send_list);
-		kfree(ste_info);
+		mlx5dr_send_info_free(ste_info);
 	}

 remove_from_nic_tbl:
@@ -1205,7 +1230,10 @@ dr_rule_create_rule_nic(struct mlx5dr_rule *rule,

 free_hw_ste:
 	mlx5dr_domain_nic_unlock(nic_dmn);
-	kfree(hw_ste_arr);
+
+	if (unlikely(!hw_ste_arr_is_opt))
+		kfree(hw_ste_arr);
+
 	return ret;
 }


--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
@@ -7,6 +7,7 @@
 #define QUEUE_SIZE 128
 #define SIGNAL_PER_DIV_QUEUE 16
 #define TH_NUMS_TO_DRAIN 2
+#define DR_SEND_INFO_POOL_SIZE 1000

 enum { CQ_OK = 0, CQ_EMPTY = -1, CQ_POLL_ERR = -2 };

@@ -49,6 +50,136 @@ struct dr_qp_init_attr {
 	u8 isolate_vl_tc:1;
 };

+struct mlx5dr_send_info_pool_obj {
+	struct mlx5dr_ste_send_info ste_send_info;
+	struct mlx5dr_send_info_pool *pool;
+	struct list_head list_node;
+};
+
+struct mlx5dr_send_info_pool {
+	struct list_head free_list;
+};
+
+static int dr_send_info_pool_fill(struct mlx5dr_send_info_pool *pool)
+{
+	struct mlx5dr_send_info_pool_obj *pool_obj, *tmp_pool_obj;
+	int i;
+
+	for (i = 0; i < DR_SEND_INFO_POOL_SIZE; i++) {
+		pool_obj = kzalloc(sizeof(*pool_obj), GFP_KERNEL);
+		if (!pool_obj)
+			goto clean_pool;
+
+		pool_obj->pool = pool;
+		list_add_tail(&pool_obj->list_node, &pool->free_list);
+	}
+
+	return 0;
+
+clean_pool:
+	list_for_each_entry_safe(pool_obj, tmp_pool_obj, &pool->free_list, list_node) {
+		list_del(&pool_obj->list_node);
+		kfree(pool_obj);
+	}
+
+	return -ENOMEM;
+}
+
+static void dr_send_info_pool_destroy(struct mlx5dr_send_info_pool *pool)
+{
+	struct mlx5dr_send_info_pool_obj *pool_obj, *tmp_pool_obj;
+
+	list_for_each_entry_safe(pool_obj, tmp_pool_obj, &pool->free_list, list_node) {
+		list_del(&pool_obj->list_node);
+		kfree(pool_obj);
+	}
+
+	kfree(pool);
+}
+
+void mlx5dr_send_info_pool_destroy(struct mlx5dr_domain *dmn)
+{
+	dr_send_info_pool_destroy(dmn->send_info_pool_tx);
+	dr_send_info_pool_destroy(dmn->send_info_pool_rx);
+}
+
+static struct mlx5dr_send_info_pool *dr_send_info_pool_create(void)
+{
+	struct mlx5dr_send_info_pool *pool;
+	int ret;
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return NULL;
+
+	INIT_LIST_HEAD(&pool->free_list);
+
+	ret = dr_send_info_pool_fill(pool);
+	if (ret) {
+		kfree(pool);
+		return NULL;
+	}
+
+	return pool;
+}
+
+int mlx5dr_send_info_pool_create(struct mlx5dr_domain *dmn)
+{
+	dmn->send_info_pool_rx = dr_send_info_pool_create();
+	if (!dmn->send_info_pool_rx)
+		return -ENOMEM;
+
+	dmn->send_info_pool_tx = dr_send_info_pool_create();
+	if (!dmn->send_info_pool_tx) {
+		dr_send_info_pool_destroy(dmn->send_info_pool_rx);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+struct mlx5dr_ste_send_info
+*mlx5dr_send_info_alloc(struct mlx5dr_domain *dmn,
+			enum mlx5dr_domain_nic_type nic_type)
+{
+	struct mlx5dr_send_info_pool_obj *pool_obj;
+	struct mlx5dr_send_info_pool *pool;
+	int ret;
+
+	pool = nic_type == DR_DOMAIN_NIC_TYPE_RX ? dmn->send_info_pool_rx :
+						   dmn->send_info_pool_tx;
+
+	if (unlikely(list_empty(&pool->free_list))) {
+		ret = dr_send_info_pool_fill(pool);
+		if (ret)
+			return NULL;
+	}
+
+	pool_obj = list_first_entry_or_null(&pool->free_list,
+					    struct mlx5dr_send_info_pool_obj,
+					    list_node);
+
+	if (likely(pool_obj)) {
+		list_del_init(&pool_obj->list_node);
+	} else {
+		WARN_ONCE(!pool_obj, "Failed getting ste send info obj from pool");
+		return NULL;
+	}
+
+	return &pool_obj->ste_send_info;
+}
+
+void mlx5dr_send_info_free(struct mlx5dr_ste_send_info *ste_send_info)
+{
+	struct mlx5dr_send_info_pool_obj *pool_obj;
+
+	pool_obj = container_of(ste_send_info,
+				struct mlx5dr_send_info_pool_obj,
+				ste_send_info);
+
+	list_add(&pool_obj->list_node, &pool_obj->pool->free_list);
+}
+
 static int dr_parse_cqe(struct mlx5dr_cq *dr_cq, struct mlx5_cqe64 *cqe64)
 {
 	unsigned int idx;
@@ -78,8 +209,15 @@ static int dr_cq_poll_one(struct mlx5dr_cq *dr_cq)
 	int err;

 	cqe64 = mlx5_cqwq_get_cqe(&dr_cq->wq);
-	if (!cqe64)
+	if (!cqe64) {
+		if (unlikely(dr_cq->mdev->state ==
+			     MLX5_DEVICE_STATE_INTERNAL_ERROR)) {
+			mlx5_core_dbg_once(dr_cq->mdev,
+					   "Polling CQ while device is shutting down\n");
+			return CQ_POLL_ERR;
+		}
 		return CQ_EMPTY;
+	}

 	mlx5_cqwq_pop(&dr_cq->wq);
 	err = dr_parse_cqe(dr_cq, cqe64);
@@ -833,6 +971,7 @@ static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev,

 	cq->mcq.vector = 0;
 	cq->mcq.uar = uar;
+	cq->mdev = mdev;

 	return cq;


--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c
@@ -491,7 +491,7 @@ struct mlx5dr_ste_htbl *mlx5dr_ste_htbl_alloc(struct mlx5dr_icm_pool *pool,
 	u32 num_entries;
 	int i;

-	htbl = kzalloc(sizeof(*htbl), GFP_KERNEL);
+	htbl = mlx5dr_icm_pool_alloc_htbl(pool);
 	if (!htbl)
 		return NULL;

@@ -503,6 +503,9 @@ struct mlx5dr_ste_htbl *mlx5dr_ste_htbl_alloc(struct mlx5dr_icm_pool *pool,
 	htbl->lu_type = lu_type;
 	htbl->byte_mask = byte_mask;
 	htbl->refcount = 0;
+	htbl->pointing_ste = NULL;
+	htbl->ctrl.num_of_valid_entries = 0;
+	htbl->ctrl.num_of_collisions = 0;
 	num_entries = mlx5dr_icm_pool_get_chunk_num_of_entries(chunk);

 	for (i = 0; i < num_entries; i++) {
@@ -517,17 +520,20 @@ struct mlx5dr_ste_htbl *mlx5dr_ste_htbl_alloc(struct mlx5dr_icm_pool *pool,
 	return htbl;

 out_free_htbl:
-	kfree(htbl);
+	mlx5dr_icm_pool_free_htbl(pool, htbl);
 	return NULL;
 }

 int mlx5dr_ste_htbl_free(struct mlx5dr_ste_htbl *htbl)
 {
+	struct mlx5dr_icm_pool *pool = htbl->chunk->buddy_mem->pool;
+
 	if (htbl->refcount)
 		return -EBUSY;

 	mlx5dr_icm_free_chunk(htbl->chunk);
-	kfree(htbl);
+	mlx5dr_icm_pool_free_htbl(pool, htbl);
+
 	return 0;
 }


--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_table.c
@@ -292,7 +292,7 @@ int mlx5dr_table_destroy(struct mlx5dr_table *tbl)
 	mlx5dr_dbg_tbl_del(tbl);
 	ret = dr_table_destroy_sw_owned_tbl(tbl);
 	if (ret)
-		return ret;
+		mlx5dr_err(tbl->dmn, "Failed to destroy sw owned table\n");

 	dr_table_uninit(tbl);


--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h
@@ -146,6 +146,8 @@ struct mlx5dr_cmd_caps;
 struct mlx5dr_rule_rx_tx;
 struct mlx5dr_matcher_rx_tx;
 struct mlx5dr_ste_ctx;
+struct mlx5dr_send_info_pool;
+struct mlx5dr_icm_hot_chunk;

 struct mlx5dr_ste {
 	/* refcount: indicates the num of rules that using this ste */
@@ -912,6 +914,10 @@ struct mlx5dr_domain {
 	refcount_t refcount;
 	struct mlx5dr_icm_pool *ste_icm_pool;
 	struct mlx5dr_icm_pool *action_icm_pool;
+	struct mlx5dr_send_info_pool *send_info_pool_rx;
+	struct mlx5dr_send_info_pool *send_info_pool_tx;
+	struct kmem_cache *chunks_kmem_cache;
+	struct kmem_cache *htbls_kmem_cache;
 	struct mlx5dr_send_ring *send_ring;
 	struct mlx5dr_domain_info info;
 	struct xarray csum_fts_xa;
@@ -1105,7 +1111,6 @@ int mlx5dr_rule_get_reverse_rule_members(struct mlx5dr_ste **ste_arr,

 struct mlx5dr_icm_chunk {
 	struct mlx5dr_icm_buddy_mem *buddy_mem;
-	struct list_head chunk_list;

 	/* indicates the index of this chunk in the whole memory,
 	 * used for deleting the chunk from the buddy
@@ -1158,6 +1163,9 @@ u32 mlx5dr_icm_pool_get_chunk_num_of_entries(struct mlx5dr_icm_chunk *chunk);
 u32 mlx5dr_icm_pool_get_chunk_byte_size(struct mlx5dr_icm_chunk *chunk);
 u8 *mlx5dr_ste_get_hw_ste(struct mlx5dr_ste *ste);

+struct mlx5dr_ste_htbl *mlx5dr_icm_pool_alloc_htbl(struct mlx5dr_icm_pool *pool);
+void mlx5dr_icm_pool_free_htbl(struct mlx5dr_icm_pool *pool, struct mlx5dr_ste_htbl *htbl);
+
 static inline int
 mlx5dr_icm_pool_dm_type_to_entry_size(enum mlx5dr_icm_type icm_type)
 {
@@ -1404,6 +1412,12 @@ int mlx5dr_send_postsend_formatted_htbl(struct mlx5dr_domain *dmn,
 int mlx5dr_send_postsend_action(struct mlx5dr_domain *dmn,
 				struct mlx5dr_action *action);

+int mlx5dr_send_info_pool_create(struct mlx5dr_domain *dmn);
+void mlx5dr_send_info_pool_destroy(struct mlx5dr_domain *dmn);
+struct mlx5dr_ste_send_info *mlx5dr_send_info_alloc(struct mlx5dr_domain *dmn,
+						    enum mlx5dr_domain_nic_type nic_type);
+void mlx5dr_send_info_free(struct mlx5dr_ste_send_info *ste_send_info);
+
 struct mlx5dr_cmd_ft_info {
 	u32 id;
 	u16 vport;

--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h
@@ -164,16 +164,9 @@ struct mlx5dr_icm_buddy_mem {
 	struct mlx5dr_icm_mr	*icm_mr;
 	struct mlx5dr_icm_pool	*pool;

-	/* This is the list of used chunks. HW may be accessing this memory */
-	struct list_head	used_list;
+	/* Amount of memory in used chunks - HW may be accessing this memory */
 	u64			used_memory;

-	/* Hardware may be accessing this memory but at some future,
-	 * undetermined time, it might cease to do so.
-	 * sync_ste command sets them free.
-	 */
-	struct list_head	hot_list;
-
 	/* Memory optimisation */
 	struct mlx5dr_ste	*ste_arr;
 	struct list_head	*miss_list;