Commits · 03ccba5c2cf7ec5149c59039a5644233a2f60ca9 · Kirill Smelkov / linux

08 Nov, 2019 3 commits

RDMA/hns: Delete unnecessary uar from hns_roce_cq · 03ccba5c

Yixian Liu authored Nov 05, 2019

The uar information is already recorded in priv_uar of hns_roce_dev, there
is no need to record it in hns_roce_cq again.

Link: https://lore.kernel.org/r/1572952082-6681-4-git-send-email-liweihang@hisilicon.comSigned-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

03ccba5c

RDMA/hns: Remove unnecessary structure hns_roce_sqp · 16a11e0b

Lang Cheng authored Nov 05, 2019

Special QP have no differences with normal qp in data structure, so
definition of struct hns_roce_sqp should be removed and replaced by struct
hns_roce_qp.

Link: https://lore.kernel.org/r/1572952082-6681-3-git-send-email-liweihang@hisilicon.comSigned-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

16a11e0b

RDMA/hns: Delete unnecessary variable max_post · ec6adad0

Yixian Liu authored Nov 05, 2019

There is no need to define max_post in hns_roce_wq, as it does same thing
as wqe_cnt.

Link: https://lore.kernel.org/r/1572952082-6681-2-git-send-email-liweihang@hisilicon.comSigned-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

ec6adad0

06 Nov, 2019 20 commits

RDMA/mlx5: Rewrite MAD processing logic to be readable · ffa2fd13

Leon Romanovsky authored Oct 29, 2019

Multiple "if"s and "||" make extension of process_mad() function
as a tedious task, rewrite that function to be more readable.

Link: https://lore.kernel.org/r/20191029062745.7932-14-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

ffa2fd13

RDMA/ocrdma: Simplify process_mad function · 84b56d57

Leon Romanovsky authored Oct 29, 2019

Change the switch with one case into a simple if statement so the code is
less confusing.

Link: https://lore.kernel.org/r/20191029062745.7932-12-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

84b56d57

RDMA/mad: Do not check MAD sizes in roce and ib drivers · dd0b0159

Leon Romanovsky authored Oct 29, 2019

All callers for process_mad allocate MAD structures with proper sizes,
there is no need to recheck it.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

dd0b0159

RDMA/ocrdma: Make ocrdma_pma_counters() return void · 6a42265c

Leon Romanovsky authored Oct 29, 2019

This function always returns 0, so just use void and remove the bogus
checking at the only call site.

Link: https://lore.kernel.org/r/20191029062745.7932-6-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

6a42265c

RDMA/mad: Allocate zeroed MAD buffer · be4a8d46

Leon Romanovsky authored Oct 29, 2019

Ensure that MAD output buffer is zero-based allocated in all the callers
of process_mad and remove the various memset()'s from the drivers.

Link: https://lore.kernel.org/r/20191029062745.7932-3-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

be4a8d46

RDMA/qib: Delete empty check_cc_key function · 874e476b

Leon Romanovsky authored Oct 29, 2019

Function always returns zero, just delete it.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

874e476b

RDMA/qib: Delete extra line · 688eec9d

Leon Romanovsky authored Oct 29, 2019

Trivial cleanup to fix the following warning:
drivers/infiniband/hw/qib/qib_iba6120.c:1420: warning: bad line:

Fixes: f931551b ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Link: https://lore.kernel.org/r/20191029062745.7932-15-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

688eec9d

RDMA/mad: Delete never implemented functions · 8a80cf93

Leon Romanovsky authored Oct 29, 2019

Delete never implemented and used MAD functions.

Link: https://lore.kernel.org/r/20191029062745.7932-2-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

8a80cf93

Revert "RDMA/srpt: Postpone HCA removal until after configfs directory removal" · 77cf98d4

Bart Van Assche authored Nov 01, 2019

Although the mentioned patch fixes a use-after-free bug, it introduces a
hang during shutdown. Since the latter is worse, revert this patch.

Link: https://lore.kernel.org/r/20191101204756.182162-1-bvanassche@acm.orgReported-by: Honggang Li <honli@redhat.com>
Fixes: 9b64f7d0 ("RDMA/srpt: Postpone HCA removal until after configfs directory removal")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

77cf98d4

RDMA/qedr: Remove unsupported modify_port callback · ad0593ec

Kamal Heib authored Oct 28, 2019

There is no need to return always zero for function which is not
supported.

Fixes: ac1b36e5 ("qedr: Add support for user context verbs")
Link: https://lore.kernel.org/r/20191028155931.1114-5-kamalheib1@gmail.comSigned-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

ad0593ec

RDMA/ocrdma: Remove unsupported modify_port callback · 6135b711

Kamal Heib authored Oct 28, 2019

There is no need to return always zero for function which is not
supported.

Fixes: fe2caefc ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Link: https://lore.kernel.org/r/20191028155931.1114-4-kamalheib1@gmail.comSigned-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

6135b711

RDMA/hns: Remove unsupported modify_port callback · 25f3b49b

Kamal Heib authored Oct 28, 2019

There is no need to return always zero for function which is not
supported.

Fixes: 9a443537 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/20191028155931.1114-3-kamalheib1@gmail.comSigned-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

25f3b49b

RDMA/core: Fix return code when modify_port isn't supported · 55bfe905

Kamal Heib authored Oct 28, 2019

Improve return code from ib_modify_port() by doing the following:
 - Use "-EOPNOTSUPP" instead "-ENOSYS" which is the proper return code

 - Allow only fake IB_PORT_CM_SUP manipulation for RoCE providers that
   didn't implement the modify_port callback, otherwise return
   "-EOPNOTSUPP"

Fixes: 61e0962d ("IB: Avoid ib_modify_port() failure for RoCE devices")
Link: https://lore.kernel.org/r/20191028155931.1114-2-kamalheib1@gmail.comSigned-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

55bfe905

RDMA/qedr: Add iWARP doorbell recovery support · b4bc7660

Michal Kalderon authored Oct 30, 2019

This patch adds the iWARP specific doorbells to the doorbell recovery
mechanism.

Link: https://lore.kernel.org/r/20191030094417.16866-9-michal.kalderon@marvell.comSigned-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

b4bc7660

RDMA/qedr: Add doorbell overflow recovery support · 97f61250

Michal Kalderon authored Oct 30, 2019

Use the doorbell recovery mechanism to register rdma related doorbells
that will be restored in case there is a doorbell overflow attention.

Link: https://lore.kernel.org/r/20191030094417.16866-8-michal.kalderon@marvell.comSigned-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

97f61250

RDMA/qedr: Use the common mmap API · 4c6bb02d

Michal Kalderon authored Oct 30, 2019

Remove all functions related to mmap from qedr and use the common API.

Link: https://lore.kernel.org/r/20191030094417.16866-7-michal.kalderon@marvell.comSigned-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

4c6bb02d

RDMA/siw: Use the common mmap_xa helpers · 11f1a755

Michal Kalderon authored Oct 30, 2019

Remove the functions related to managing the mmap_xa database. This code
is now common in ib_core.

Link: https://lore.kernel.org/r/20191030094417.16866-6-michal.kalderon@marvell.comSigned-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Tested-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

11f1a755

RDMA/efa: Use the common mmap_xa helpers · e84d3c18

Michal Kalderon authored Oct 30, 2019

Remove the functions related to managing the mmap_xa database. This code
was replaced with common code in ib_core.

Link: https://lore.kernel.org/r/20191030094417.16866-5-michal.kalderon@marvell.comSigned-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

e84d3c18

RDMA: Connect between the mmap entry and the umap_priv structure · c043ff2c

Michal Kalderon authored Oct 30, 2019

The rdma_user_mmap_io interface created a common interface for drivers to
correctly map hw resources and zap them once the ucontext is destroyed
enabling the drivers to safely free the hw resources.

However, this meant the drivers need to delay freeing the resource to the
ucontext destroy phase to ensure they were no longer mapped. The new
mechanism for a common way of handling user/driver address mapping enabled
notifying the driver if all umap_priv mappings were removed, and enabled
freeing the hw resources when they are done with and not delay it until
ucontext destroy.

Since not all drivers use the mechanism, NULL can be sent to the
rdma_user_mmap_io interface to continue working as before. Drivers that
use the mmap_xa interface can pass the entry being mapped to the
rdma_user_mmap_io function to be linked together.

Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.comSigned-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

c043ff2c

RDMA/core: Create mmap database and cookie helper functions · 3411f9f0

Michal Kalderon authored Oct 30, 2019

Create some common API's for adding entries to a xa_mmap. Searching for
an entry and freeing one.

The general approach is copied from the EFA driver and improved to be more
general and do more to help the drivers. Integration with the core allows
a reference counted scheme with a free function so that the driver can
know when its mmaps are all gone.

This significant new functionality will be helpful for drivers to have the
correct lifetime model for mmap objects.

Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.comSigned-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

3411f9f0

05 Nov, 2019 1 commit

RDMA/core: Move core content from ib_uverbs to ib_core · b86deba9

Michal Kalderon authored Oct 30, 2019

Move functionality that is called by the driver, which is
related to umap, to a new file that will be linked in ib_core.
This is a first step in later enabling ib_uverbs to be optional.
vm_ops is now initialized in ib_uverbs_mmap instead of
priv_init to avoid having to move all the rdma_umap functions
as well.

Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.comSuggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

b86deba9

31 Oct, 2019 2 commits

IB/mlx5: Test write combining support · 11f552e2

Michael Guralnik authored Jun 10, 2019

Linux can run in all sorts of physical machines and VMs where write
combining may or may not be supported. Currently there is no way to
reliably tell if the system supports WC, or not. The driver uses WC to
optimize posting work to the HCA, and getting this wrong in either
direction can cause a significant performance loss.

Add a test in mlx5_ib initialization process to test whether
write-combining is supported on the machine. The test will run as part of
the enable_driver callback to ensure that the test runs after the device
is setup and can create and modify the QP needed, but runs before the
device is exposed to the users.

The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame
is different from the WQE in memory, requesting CQE only on the BlueFlame
WQE. By checking whether we received a completion on one of these WQEs we
can know if BlueFlame succeeded and this write-combining must be
supported.

Change reporting of BlueFlame support to be dependent on write-combining
support instead of the FW's guess as to what the machine can do.

Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.orgSigned-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

11f552e2

RDMA/mlx5: Return proper error value · 546d3009

Leon Romanovsky authored Oct 29, 2019

Returned value from mlx5_mr_cache_alloc() is checked to be error or real
pointer. Return proper error code instead of NULL which is not checked
later.

Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.orgSigned-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

546d3009

29 Oct, 2019 1 commit

RDMA/hns: Fix build error again · d5b60e26

Arnd Bergmann authored Oct 07, 2019

This is not the first attempt to fix building random configurations,
unfortunately the attempt in commit a07fc0bb ("RDMA/hns: Fix build
error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and
CONFIG_INFINIBAND_HNS_HIP08=y:

drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module'

Revert commits a07fc0bb ("RDMA/hns: Fix build error") and
a3e2d4c7 ("RDMA/hns: remove obsolete Kconfig comment") to get back to
the previous state, then fix the issues described there differently, by
adding more specific dependencies: INFINIBAND_HNS can now only be built-in
if at least one of HNS or HNS3 are built-in, and the individual back-ends
are only available if that code is reachable from the main driver.

Fixes: a07fc0bb ("RDMA/hns: Fix build error")
Fixes: a3e2d4c7 ("RDMA/hns: remove obsolete Kconfig comment")
Fixes: dd74282d ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdb ("RDMA/hns: Split hw v1 driver from hns roce driver")
Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.deSigned-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

d5b60e26

28 Oct, 2019 13 commits

Merge branch 'odp_rework' into rdma.git for-next · bb3dba33

Jason Gunthorpe authored Oct 28, 2019

Jason Gunthorpe says:

====================
In order to hoist the interval tree code out of the drivers and into the
mmu_notifiers it is necessary for the drivers to not use the interval tree
for other things.

This series replaces the interval tree with an xarray and along the way
re-aligns all the locking to use a sensible SRCU model where the 'update'
step is done by modifying an xarray.

The result is overall much simpler and with less locking in the critical
path. Many functions were reworked for clarity and small details like
using 'imr' to refer to the implicit MR make the entire code flow here
more readable.

This also squashes at least two race bugs on its own, and quite possibily
more that haven't been identified.
====================

Merge conflicts with the odp statistics patch resolved.

* branch 'odp_rework':
  RDMA/odp: Remove broken debugging call to invalidate_range
  RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy
  RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray
  RDMA/mlx5: Rework implicit ODP destroy
  RDMA/mlx5: Avoid double lookups on the pagefault path
  RDMA/mlx5: Reduce locking in implicit_mr_get_data()
  RDMA/mlx5: Use an xarray for the children of an implicit ODP
  RDMA/mlx5: Split implicit handling from pagefault_mr
  RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree
  RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it
  RDMA/mlx5: Rework implicit_mr_get_data
  RDMA/mlx5: Delete struct mlx5_priv->mkey_table
  RDMA/mlx5: Use a dedicated mkey xarray for ODP
  RDMA/mlx5: Split sig_err MR data into its own xarray
  RDMA/mlx5: Use SRCU properly in ODP prefetch
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

bb3dba33

RDMA/odp: Remove broken debugging call to invalidate_range · 46870b23

Jason Gunthorpe authored Oct 09, 2019

invalidate_range() also obtains the umem_mutex which is being held at this
point, so if this path were was ever called it would deadlock. Thus
conclude the debugging never triggers and rework it into a simple WARN_ON
and leave things as they are.

While here add a note to explain how we could possibly get inconsistent
page pointers.

Link: https://lore.kernel.org/r/20191009160934.3143-16-jgg@ziepe.caSigned-off-by: Jason Gunthorpe <jgg@mellanox.com>

46870b23

RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy · 09689703

Jason Gunthorpe authored Oct 09, 2019

For creation, as soon as the umem_odp is created the notifier can be
called, however the underlying MR may not have been setup yet. This would
cause problems if mlx5_ib_invalidate_range() runs. There is some
confusing/ulocked/racy code that might by trying to solve this, but
without locks it isn't going to work right.

Instead trivially solve the problem by short-circuiting the invalidation
if there are not yet any DMA mapped pages. By definition there is nothing
to invalidate in this case.

The create code will have the umem fully setup before anything is DMA
mapped, and npages is fully locked by the umem_mutex.

For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap
the pages before destroying the MR. This drives npages to zero and
prevents similar racing with invalidate while the MR is undergoing
destruction.

Arguably it would be better if the umem was created after the MR and
destroyed before, but that would require a big rework of the MR code.

Fixes: 6aec21f6 ("IB/mlx5: Page faults handling infrastructure")
Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

09689703

RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray · d561987f

Jason Gunthorpe authored Oct 09, 2019

These mkeys are entirely internal and are never used by the HW for
page fault. They should also never be used by userspace for prefetch.
Simplify & optimize things by not including them in the xarray.

Since the prefetch path can now never see a child mkey there is no need
for the second synchronize_srcu() during imr destroy.

Link: https://lore.kernel.org/r/20191009160934.3143-14-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

d561987f

RDMA/mlx5: Rework implicit ODP destroy · 5256edcb

Jason Gunthorpe authored Oct 09, 2019

Use SRCU in a sensible way by removing all MRs in the implicit tree from
the two xarrays (the update operation), then a synchronize, followed by a
normal single threaded teardown.

This is only a little unusual from the normal pattern as there can still
be some work pending in the unbound wq that may also require a workqueue
flush. This is tracked with a single atomic, consolidating the redundant
existing atomics and wait queue.

For understand-ability the entire ODP implicit create/destroy flow now
largely exists in a single pair of functions within odp.c, with a few
support functions for tearing down an unused child.

Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

5256edcb

RDMA/mlx5: Avoid double lookups on the pagefault path · b70d785d

Jason Gunthorpe authored Oct 09, 2019

Now that the locking is simplified combine pagefault_implicit_mr() with
implicit_mr_get_data() so that we sweep over the idx range only once,
and do the single xlt update at the end, after the child umems are
setup.

This avoids double iteration/xa_loads plus the sketchy failure path if the
xa_load() fails.

Link: https://lore.kernel.org/r/20191009160934.3143-12-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

b70d785d

RDMA/mlx5: Reduce locking in implicit_mr_get_data() · 3389baa8

Jason Gunthorpe authored Oct 09, 2019

Now that the child MRs are stored in an xarray we can rely on the SRCU
lock to protect the xa_load and use xa_cmpxchg on the slow allocation path
to resolve races with concurrent page fault.

This reduces the scope of the critical section of umem_mutex for implicit
MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if
the child MR is already in the xarray. This makes it consistent with the
normal ODP MR critical section for umem_lock, and the locking approach
used for destroying an unusued implicit child MR.

The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr()
since it is no longer called with any locks.

Link: https://lore.kernel.org/r/20191009160934.3143-11-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

3389baa8

RDMA/mlx5: Use an xarray for the children of an implicit ODP · 423f52d6

Jason Gunthorpe authored Oct 09, 2019

Currently the child leaves are stored in the shared interval tree and
every lookup for a child must be done under the interval tree rwsem.

This is further complicated by dropping the rwsem during iteration (ie the
odp_lookup(), odp_next() pattern), which requires a very tricky an
difficult to understand locking scheme with SRCU.

Instead reserve the interval tree for the exclusive use of the mmu
notifier related code in umem_odp.c and give each implicit MR a xarray
containing all the child MRs.

Since the size of each child is 1GB of VA, a 1 level xarray will index 64G
of VA, and a 2 level will index 2TB, making xarray a much better
data structure choice than an interval tree.

The locking properties of xarray will be used in the next patches to
rework the implicit ODP locking scheme into something simpler.

At this point, the xarray is locked by the implicit MR's umem_mutex, and
read can also be locked by the odp_srcu.

Link: https://lore.kernel.org/r/20191009160934.3143-10-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

423f52d6

RDMA/mlx5: Split implicit handling from pagefault_mr · 54375e73

Jason Gunthorpe authored Oct 09, 2019

The single routine has a very confusing scheme to advance to the next
child MR when working on an implicit parent. This scheme can only be used
when working with an implicit parent and must not be triggered when
working on a normal MR.

Re-arrange things by directly putting all the single-MR stuff into one
function and calling it in a loop for the implicit case. Simplify some of
the error handling in the new pagefault_real_mr() to remove unneeded gotos.

Link: https://lore.kernel.org/r/20191009160934.3143-9-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

54375e73

RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree · 9162420d

Jason Gunthorpe authored Oct 09, 2019

Instead of rewriting all the IOVA's to 0 as things progress down the tree
make the IOVA of the children equal to placement in the tree. This makes
things easier to understand by keeping mmkey.iova == HW configuration.

Link: https://lore.kernel.org/r/20191009160934.3143-8-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

9162420d

RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it · c2edcd69

Jason Gunthorpe authored Oct 09, 2019

This makes the routines easier to understand, particularly with respect
the locking requirements of the entire sequence. The implicit_mr_alloc()
had a lot of ifs specializing it to each of the callers, and only a very
small amount of code was actually shared.

Following patches will cause the flow in the two functions to diverge
further.

Link: https://lore.kernel.org/r/20191009160934.3143-7-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

c2edcd69

RDMA/mlx5: Rework implicit_mr_get_data · 3d5f3c54

Jason Gunthorpe authored Oct 09, 2019

This function is intended to loop across each MTT chunk in the implicit
parent that intersects the range [io_virt, io_virt+bnct).  But it is has a
confusing construction, so:

- Consistently use imr and odp_imr to refer to the implicit parent
  to avoid confusion with the normal mr and odp of the child
- Directly compute the inclusive start/end indexes by shifting. This is
  clearer to understand the intent and avoids any errors from unaligned
  values of addr
- Iterate directly over the range of MTT indexes, do not make a loop
  out of goto
- Follow 'success oriented flow', with goto error unwind
- Directly calculate the range of idx's that need update_xlt
- Ensure that any leaf MR added to the interval tree always results in an
  update to the XLT

Link: https://lore.kernel.org/r/20191009160934.3143-6-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

3d5f3c54

RDMA/mlx5: Delete struct mlx5_priv->mkey_table · 74bddb36

Jason Gunthorpe authored Oct 09, 2019

No users are left, delete it.

Link: https://lore.kernel.org/r/20191009160934.3143-5-jgg@ziepe.caReviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

74bddb36