1. 26 Aug, 2021 7 commits
  2. 25 Aug, 2021 8 commits
  3. 24 Aug, 2021 3 commits
  4. 23 Aug, 2021 5 commits
  5. 22 Aug, 2021 8 commits
  6. 19 Aug, 2021 5 commits
  7. 05 Aug, 2021 4 commits
    • net/mlx5: Lag, Create shared FDB when in switchdev mode · 598fe77d
      Mark Bloch authored
      If both eswitches are in switchdev mode and the uplink representors
      are enslaved to the same bond device create a shared FDB configuration.
      
      When moving to shared FDB mode, not only does the hardware need to be
      configured, but the RDMA driver also needs to reconfigure itself.
      
      When such change is done, unload the RDMA devices, configure the hardware
      and load the RDMA representors.
      
      When destroying the lag (which can happen if a PCI function is unbound,
      the driver is unloaded, or a netdev is simply removed from the bond),
      make sure to restore the system to the previous state, but only when
      possible.
      
      For example, if a PCI function is unbound there is no need to load the
      representors, as the device is going away.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
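The unload/configure/load ordering described above can be sketched as a toy userspace model. All names and stubs below are hypothetical stand-ins for the mlx5 core and RDMA (mlx5_ib) driver hooks, not the real API:

```c
#include <assert.h>

/* Illustrative state flags standing in for driver/hardware state. */
static int rdma_loaded = 1;
static int shared_fdb_active = 0;

/* Hypothetical stubs; the real driver calls into mlx5 core and mlx5_ib. */
static void rdma_unload_representors(void) { rdma_loaded = 0; }
static int  hw_configure_shared_fdb(void)  { shared_fdb_active = 1; return 0; }
static void rdma_load_representors(void)   { rdma_loaded = 1; }

/* The ordering from the commit message: unload the RDMA devices first,
 * configure the hardware, then load the RDMA representors again. */
static int lag_create_shared_fdb(void)
{
    rdma_unload_representors();
    if (hw_configure_shared_fdb())
        return -1;
    rdma_load_representors();
    return 0;
}
```

The point of the ordering is that the RDMA side must never be live while the FDB layout underneath it changes; on the unbind path the final reload step is simply skipped.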
    • net/mlx5: E-Switch, add logic to enable shared FDB · db202995
      Mark Bloch authored
      Shared FDB allows directing traffic from all the vports in the HCA to a
      single eswitch. In order to do that, three things are needed.
      
      1) Point the ingress ACL of the slave uplink to that of the master.
         With this, wire traffic from both uplinks will reach the same eswitch
         with the same metadata where a single steering rule can catch traffic
         from both ports.
      
      2) Set the FDB root flow table of the slave's eswitch to that of the
         master. As this flow table can change dynamically make sure to
         sync it on any set root flow table FDB command.
         This will make sure traffic from SFs, VFs, ECPFs and PFs reach the
         master eswitch.
      
      3) Split wire traffic at the eswitch manager egress ACL so that it's
         directed to the native eswitch manager. We only treat wire traffic
         from both ports the same at the eswitch level. If such traffic wasn't
         handled in the eswitch it needs to reach the right representor to be
         processed by software. For example LACP packets should *always*
         reach the right uplink representor for correct operation.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Mark Zhang <markzhang@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
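The three steps above can be modeled as a minimal sketch. The struct fields and function names are invented for illustration and do not match the driver's real flow-table structures:

```c
#include <assert.h>

/* Toy model of an eswitch; fields stand in for hardware objects. */
struct eswitch {
    int ingress_acl_target; /* where the uplink ingress ACL points */
    int fdb_root;           /* id of the FDB root flow table */
    int egress_split;       /* wire traffic split at egress ACL */
};

/* Steps 1-3: point the slave's ingress ACL and FDB root at the master's,
 * and split wire traffic at the egress ACL on both sides so it reaches
 * the native uplink representor. */
static void enable_shared_fdb(struct eswitch *master, struct eswitch *slave)
{
    slave->ingress_acl_target = master->ingress_acl_target; /* step 1 */
    slave->fdb_root = master->fdb_root;                     /* step 2 */
    master->egress_split = 1;                               /* step 3 */
    slave->egress_split = 1;
}

/* Step 2, dynamic part: a set-root-flow-table command on the master
 * must also be synced to the slave. */
static void set_fdb_root(struct eswitch *master, struct eswitch *slave,
                         int root)
{
    master->fdb_root = root;
    slave->fdb_root = root;
}
```

The sync in `set_fdb_root` is what keeps SF/VF/ECPF/PF traffic landing in the master eswitch even as its root flow table changes at runtime.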
    • net/mlx5: Lag, move lag destruction to a workqueue · 63d4a9af
      Mark Bloch authored
      If a netdev is removed from the lag, the lag should be destroyed.
      With downstream patches this might trigger a reconfiguration of
      representors on a different eswitch, and as such we don't have the
      proper locking to do so from this path. Move the destruction to be
      done by the workqueue.
      
      As the destruction won't affect the netdev side, it is okay to do so.
      The RDMA side will be reconfigured, and it is already coded to handle
      such reconfiguration.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Mark Zhang <markzhang@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
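The deferral pattern above can be sketched as follows: the event path only flags the work, and a handler running later (in the kernel, in workqueue context) performs the destruction under the proper locks. All names here are illustrative, not the driver's:

```c
#include <assert.h>

/* Toy state standing in for the lag object. */
static int lag_active = 1;
static int destroy_pending = 0;

/* Called from the netdev event path, where the locking needed to
 * reconfigure representors on another eswitch cannot be taken:
 * only record that destruction is needed. */
static void on_netdev_removed_from_bond(void)
{
    destroy_pending = 1;
}

/* Runs later (workqueue context in the kernel), where it is safe to
 * take the heavier locks and actually destroy the lag. */
static void lag_destroy_work(void)
{
    if (destroy_pending) {
        lag_active = 0;
        destroy_pending = 0;
    }
}
```

Deferring is safe precisely because, as the commit notes, the netdev side does not depend on the destruction completing synchronously.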
    • net/mlx5: Lag, properly lock eswitch if needed · cac1eb2c
      Mark Bloch authored
      Currently, when doing hardware lag we check the eswitch mode, but as
      this isn't done under a lock the check isn't valid.
      
      As the code needs to sync between two different devices, extra care
      is needed.
      
      - When going to change eswitch mode, if hardware lag is active destroy it.
      - While changing eswitch modes block any hardware bond creation.
      - Delay handling bonding events until there are no mode changes in
        progress.
      - When attaching a new mdev to lag, block until there is no mode change
        in progress. In order for the mode change to finish the interface lock
        will have to be taken. Release the lock and sleep for 100ms to
        allow forward progress. As this is a very rare condition (can happen if
        the user unbinds and binds a PCI function while also changing eswitch
        mode of the other PCI function) it has no real world impact.
      
      As taking multiple eswitch mode locks is now required lockdep will
      complain about a possible deadlock. Register a key per eswitch to make
      lockdep happy.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Mark Zhang <markzhang@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
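The release-and-sleep retry described in the last bullet can be sketched with userspace primitives in place of the kernel's `mutex_lock()`/`msleep(100)`. The attach body and all names below are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Userspace stand-ins: intf_lock models the interface lock the mode
 * change must take in order to finish. */
static pthread_mutex_t intf_lock = PTHREAD_MUTEX_INITIALIZER;
static int mode_change_in_progress = 0;
static int lag_members = 0;

static void lag_attach_mdev(void)
{
    pthread_mutex_lock(&intf_lock);
    while (mode_change_in_progress) {
        /* Drop the lock so the in-flight mode change can acquire it
         * and complete, sleep ~100ms, then retry. This is a rare path,
         * so the sleep has no real-world impact. */
        pthread_mutex_unlock(&intf_lock);
        usleep(100 * 1000);
        pthread_mutex_lock(&intf_lock);
    }
    lag_members++; /* safe: no mode change in progress */
    pthread_mutex_unlock(&intf_lock);
}
```

Holding the lock across the wait would deadlock, since the mode change needs that same lock to clear the flag; releasing and sleeping trades a bounded delay for forward progress.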