- 16 Jun, 2021 24 commits
-
Jason Gunthorpe authored
qib should not be creating a mess of kobjects to attach to the port kobject - this is all attributes. The proper API is to create an attribute_group list and create it against the port's kobject. Link: https://lore.kernel.org/r/911e0031e1ed495b0006e8a6efec7b67a702cd5e.1623427137.git.leonro@nvidia.com Tested-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This code is trying to attach a list of counters grouped into 4 groups to the ib_port sysfs. Instead of creating a bunch of kobjects simply express everything naturally as an ib_port_attribute and add a single attribute_groups list. Remove all the naked kobject manipulations. Link: https://lore.kernel.org/r/0d5a7241ee0fe66622de04fcbaafaf6a791d5c7c.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
Other things outside the core code are creating attributes against the port. This patch exposes the basic machinery to do this. The ib_port_attribute type allows creating groups of attributes attached to the port and comes with the usual supporting machinery. Link: https://lore.kernel.org/r/5c4aeae57f6fa7c59a1d6d1c5506069516ae9bbf.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
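As a hedged sketch of how a driver might use the new machinery (the show() signature and the IB_PORT_ATTR_RO() helper follow this series; the attribute name "hello" and the group around it are purely illustrative):

    /* Define one read-only per-port attribute and collect it into an
     * attribute_group the core can register against the port kobject.
     */
    static ssize_t hello_show(struct ib_device *ibdev, u32 port_num,
                              struct ib_port_attribute *attr, char *buf)
    {
            return sysfs_emit(buf, "port %u\n", port_num);
    }
    static IB_PORT_ATTR_RO(hello);

    static struct attribute *hello_attrs[] = {
            &ib_port_attr_hello.attr,
            NULL,
    };
    static const struct attribute_group hello_group = {
            .attrs = hello_attrs,
    };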
-
Jason Gunthorpe authored
This call does nothing because the ib_port kobj is nested under a struct device kobject and the dev_uevent_filter() function of the struct device blocks uevents for any child kobjects that are not themselves struct devices. A uevent for the struct device will be triggered after ib_setup_port_attrs() returns, which causes udev to pick up all the deep "attributes" that are implemented as kobjects nested under a struct device and assign them to the udev object for the struct device:

    $ udevadm info -a /sys/class/infiniband/ibp0s9
    ATTR{ports/1/counters/excessive_buffer_overrun_errors}=="0"

Link: https://lore.kernel.org/r/49231c92c7d4c60686de18f7e20932d0c82160ee.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
Instead of calling device_add_groups() add the group to the existing groups array which is managed through device_add(). This requires setting up the hw_counters before device_add(), so it gets split up from the already split port sysfs flow. Move all the memory freeing to the release function. Link: https://lore.kernel.org/r/666250d937b64f6fdf45da9e2dc0b6e5e4f7abd8.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
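The shape of the change, as a minimal sketch (the function and variable names are stand-ins, not the actual core code):

    /* Attach the groups before device_add(); the driver core then
     * creates and removes the sysfs files as part of the device's own
     * lifecycle, so no device_add_groups()/device_remove_groups()
     * pair is needed.
     */
    static int setup_port_device(struct device *dev,
                                 const struct attribute_group **groups)
    {
            dev->groups = groups;   /* must be set before device_add() */
            return device_add(dev);
    }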
-
Jason Gunthorpe authored
Use the same technique as gid_attrs now uses to manage the port sysfs. Bundle everything into three allocations and use a single sysfs_create_groups() to build everything in one shot. All the memory is always freed in the kobj release function, removing most of the error unwinding. The gid_attr technique and the hw_counters are very similar; merge the two together and combine the sysfs_create_group() call for hw_counters with the single sysfs group setup. Link: https://lore.kernel.org/r/b688f3340694c59f7b44b1bde40e25559ef43cf3.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
Instead of having a whole bunch of different allocations to create the gid_attr kobjects, reduce it to three: one for the kobj struct plus the attributes, and one for the attribute list of each of the two groups. Move the freeing of all allocations to the release function. Reorder the operations so all the allocations happen first, then the kobject & sysfs operations last. This removes the majority of the complicated error unwind, since the release function will always undo all the memory allocations. Freeing the memory is also much simpler since there is a lot less of it. Consolidate creating the "group of array indexes" pattern into one helper function. Ensure kobject_del is used. Link: https://lore.kernel.org/r/f4149d379db7178d37d11d75e3026bf550f818a1.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
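A sketch of the release-does-all-the-freeing pattern described above, with hypothetical structure and field names:

    struct gid_attr_kobj {
            struct kobject kobj;            /* allocated together with... */
            struct attribute **type_attrs;  /* ...these per-group lists */
            struct attribute **ndev_attrs;
    };

    static void gid_attr_release(struct kobject *kobj)
    {
            struct gid_attr_kobj *g =
                    container_of(kobj, struct gid_attr_kobj, kobj);

            /* release() always runs, so no error-unwind path has to
             * free these individually
             */
            kfree(g->type_attrs);
            kfree(g->ndev_attrs);
            kfree(g);
    }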
-
Jason Gunthorpe authored
The gid_attrs directory is a dedicated kobj nested under the port; construct and destruct it with its own pair of functions for understandability. This is much more readable than having it weirdly inlined out of order into the add_port() function. Link: https://lore.kernel.org/r/1c9434111b6770a7aef0e644a88a16eee7e325b8.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This code creates a 'struct hw_stats_attribute' for each sysfs entry that contains a naked 'struct attribute' inside. It then proceeds to attach this same structure to a 'struct device' kobj and a 'struct ib_port' kobj. However, this violates the typing requirements: 'struct device' requires the attribute to be a 'struct device_attribute' and 'struct ib_port' requires the attribute to be a 'struct port_attribute'. This happens to work because the show/store function pointers in all three structures happen to be at the same offset and have nearly the same signature, so when container_of() was used to go between the wrong two types it still managed to work. However, clang's CFI checking notices that the function pointers have a slightly different signature. As with show/store, this was only working because the device and port struct layouts happened to have the kobj at the front. Correct this by having two independent sets of data structures for the port and device cases. The two different attributes correctly include the port/device_attribute struct, and everything from there up is kept split. The show/store function call chains start with device/port unique functions that invoke a common show/store function pointer. Link: https://lore.kernel.org/r/a8b3864b4e722aed3657512af6aa47dc3c5033be.1623427137.git.leonro@nvidia.com Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
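The corrected shape, sketched with hypothetical names: each kobject type gets an attribute structure of the matching type, and both wrappers funnel into one common handler, so every container_of() walks the structure it actually lives in:

    struct hw_stats_data { /* canonical stats state */ };

    static ssize_t common_stat_show(struct hw_stats_data *data, char *buf);

    struct hw_stats_device_attr {
            struct device_attribute attr;   /* correct type for a device */
            struct hw_stats_data *data;
    };

    struct hw_stats_port_attr {
            struct ib_port_attribute attr;  /* correct type for a port */
            struct hw_stats_data *data;
    };

    static ssize_t stat_dev_show(struct device *dev,
                                 struct device_attribute *attr, char *buf)
    {
            struct hw_stats_device_attr *a =
                    container_of(attr, struct hw_stats_device_attr, attr);

            return common_stat_show(a->data, buf);
    }

    static ssize_t stat_port_show(struct ib_device *ibdev, u32 port_num,
                                  struct ib_port_attribute *attr, char *buf)
    {
            struct hw_stats_port_attr *a =
                    container_of(attr, struct hw_stats_port_attr, attr);

            return common_stat_show(a->data, buf);
    }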
-
Jason Gunthorpe authored
It is much saner to store a pointer to the kobject structure that contains the canonical stats pointer than to copy the stats pointers into a public structure. Future patches will require the sysfs pointer for other purposes. Link: https://lore.kernel.org/r/f90551dfd296cde1cb507bbef27cca9891d19871.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
This is being used to implement both the port and device global stats, which is causing some confusion in the drivers. For instance EFA and i40iw both seem to be misusing the device stats. Split it into two ops so drivers that don't support one or the other can leave the op NULL'd, making the calling code a little simpler to understand. Link: https://lore.kernel.org/r/1955c154197b2a159adc2dc97266ddc74afe420c.1623427137.git.leonro@nvidia.com Tested-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
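At the call site the split looks roughly like this (the op names follow the description above; the surrounding code is illustrative):

    /* A driver fills in only the variant(s) it supports; a NULL op
     * simply means that kind of stats is not offered.
     */
    if (device->ops.alloc_hw_device_stats)
            dev_stats = device->ops.alloc_hw_device_stats(device);

    if (device->ops.alloc_hw_port_stats)
            port_stats = device->ops.alloc_hw_port_stats(device, port_num);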
-
Bob Pearson authored
Check that an MR has no bound MWs before allowing a dereg or invalidate operation. Link: https://lore.kernel.org/r/20210608042552.33275-11-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
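In miniature, the guard being added (the num_mw counter name follows rxe's MR structure, but treat the snippet as a sketch):

    /* an MR with windows still bound to it must not go away */
    if (atomic_read(&mr->num_mw) > 0)
            return -EINVAL;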
-
Bob Pearson authored
Add code to implement memory access through memory windows. Link: https://lore.kernel.org/r/20210608042552.33275-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Bob Pearson authored
Implement invalidate MW and clean up the invalidate MR operations. Add code to perform remote invalidation for send with invalidate, add code to perform local invalidation, and delete some blank lines in rxe_loc.h. Link: https://lore.kernel.org/r/20210608042552.33275-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Bob Pearson authored
Add support for bind MW work requests from user space. Since rdma/core does not support bind MW in ib_send_wr, there is no way to support bind MW in kernel space. Add a bind_mw local operation in rxe_req.c, a bind_mw WR operation in rxe_opcode.c, and a bind_mw WC in rxe_comp.c. Add additional fields to rxe_mw in rxe_verbs.h. Add the rxe_do_dealloc_mw() subroutine to clean up an MW when rxe_dealloc_mw is called, and add the code implementing the bind_mw operation in rxe_mw.c. Link: https://lore.kernel.org/r/20210608042552.33275-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Bob Pearson authored
Simplify rxe_requester() by moving the local operations to a subroutine, and add an error return for an illegal send WR opcode. Move next_index ahead of rxe_run_task(), which fixes a small bug where work completions were delayed until after the next wqe, which was not the intended behavior. Let errors return their own WC status; previously all errors were reported as protection errors, which was incorrect. Change the error return path from rxe_do_local_ops() to the err: label, which causes an immediate completion; without this, an error on a last WR may get lost. Rename fill_packet() to finish_packet(), which is more accurate. Fixes: 8700e2e7c485 ("The software RoCE driver") Link: https://lore.kernel.org/r/20210608042552.33275-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Bob Pearson authored
Rxe has two mask bits WR_LOCAL_MASK and WR_REG_MASK with WR_REG_MASK used to indicate any local operation and WR_LOCAL_MASK unused. This patch replaces both of these with one mask bit WR_LOCAL_OP_MASK which is clearer. Link: https://lore.kernel.org/r/20210608042552.33275-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Bob Pearson authored
Add the ib_alloc_mw and ib_dealloc_mw verbs APIs: add a new file, rxe_mw.c, focused on MWs; change the 8-bit random key generator; add a cleanup routine for MWs; and add the verbs routines to ib_device_ops. Link: https://lore.kernel.org/r/20210608042552.33275-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
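Wiring the new verbs into the driver's ops table looks like this sketch (the handler names follow rxe's naming conventions):

    static const struct ib_device_ops rxe_dev_ops = {
            /* ... */
            .alloc_mw = rxe_alloc_mw,       /* implemented in rxe_mw.c */
            .dealloc_mw = rxe_dealloc_mw,
    };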
-
Bob Pearson authored
Currently the rxe driver has a rxe_mw struct object but nothing about memory windows is enabled. This patch enables memory windows and does some minor cleanup: set the device attribute in rxe.c so that max_mw = MAX_MW; change the parameters in rxe_param.h so that MAX_MW is the same as MAX_MR; reduce the number of MRs and MWs from 256K to 4K; add device capability bits for type 2A and 2B memory windows; and remove RXE_MR_TYPE_MW from the rxe_mr_type enum. Link: https://lore.kernel.org/r/20210608042552.33275-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Bob Pearson authored
Modify rxe_add_index() and rxe_add_key() to return an error if the index or key is already present in the pool. Currently they print a warning and silently fail, with bad consequences for the caller. Link: https://lore.kernel.org/r/20210608042552.33275-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Bob Pearson authored
Add fields to struct rxe_send_wr in rdma_user_rxe.h to support bind MW work requests. Link: https://lore.kernel.org/r/20210608042552.33275-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
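The kind of fields such a WR needs, as a hedged sketch; the struct name is hypothetical and the exact layout in rdma_user_rxe.h may differ:

    struct rxe_bind_mw_wr {
            __aligned_u64 addr;     /* start of the window */
            __aligned_u64 length;   /* extent of the window */
            __u32 mr_lkey;          /* MR the window is bound to */
            __u32 mw_rkey;          /* current rkey of the MW */
            __u32 rkey;             /* newly requested rkey */
            __u32 access;           /* remote access rights */
    };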
-
Bob Pearson authored
Currently the rdma_rxe driver attempts to protect atomic responder resources by taking a reference to the qp which is only freed when the resource is recycled for a new read or atomic operation. This means that in normal circumstances there is almost always an extra qp reference once an atomic operation has been executed, which prevents cleaning up the qp and associated pd and cqs when the qp is destroyed. This patch removes the call to rxe_add_ref() in send_atomic_ack() and the call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is destroyed while a peer is retrying an atomic op, it will cause the operation to fail, which is acceptable. Link: https://lore.kernel.org/r/20210604230558.4812-1-rpearsonhpe@gmail.com Reported-by: Zhu Yanjun <zyjzyj2000@gmail.com> Fixes: 86af6176 ("IB/rxe: remove unnecessary skb_clone") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Xi Wang authored
All functions of HIP09's ROCEE share on-chip resources for all QPs, so the driver needs to configure the resource index and number for each function during the init stage. Link: https://lore.kernel.org/r/1622541427-42193-1-git-send-email-liweihang@huawei.com Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Leon Romanovsky authored
The mlx5_ib_bind_slave_port() doesn't remove the multiport device from the unaffiliated list, but mlx5_ib_unbind_slave_port() does. This unbalanced flow led to a situation where mlx5_ib_unaffiliated_port_list was changed during iteration. Fixes: 32f69e4b ("{net, IB}/mlx5: Manage port association for multiport RoCE") Link: https://lore.kernel.org/r/2726e6603b1e6ecfe76aa5a12a063af72173bcf7.1622477058.git.leonro@nvidia.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 10 Jun, 2021 2 commits
-
Shiraz Saleem authored
The level1 PBL info address is stored as u64. This requires casting through a uintptr_t before it is used as a pointer type, and leads to warnings such as the following when the uintptr_t is missing:

    drivers/infiniband/hw/irdma/hw.c: In function 'irdma_destroy_virt_aeq':
    drivers/infiniband/hw/irdma/hw.c:579:23: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      579 | dma_addr_t *pg_arr = (dma_addr_t *)aeq->palloc.level1.addr;

This can be fixed using an intermediate uintptr_t, but it is better to fix the structure irdma_pble_info to store the address as u64 * and the VA it is assigned in irdma_chunk as a void *. This greatly reduces the casting on this address. Fixes: 44d9e529 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20210609234924.938-1-shiraz.saleem@intel.com Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
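The pattern in miniature (illustrative variables, not the irdma code itself):

    static void cast_example(void *va)
    {
            /* old style: a VA stored as u64 forces a double cast */
            u64 as_int = (uintptr_t)va;
            dma_addr_t *pg_arr = (dma_addr_t *)(uintptr_t)as_int;

            /* new style: store the VA as a pointer type instead */
            u64 *as_ptr = va;
            dma_addr_t *pg_arr2 = (dma_addr_t *)as_ptr;
    }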
-
Jason Gunthorpe authored
It turns out this is only being used to store the LID for SIDR mode to search the RB tree for request de-duplication. Store the LID value directly and don't pretend it is a GID. Link: https://lore.kernel.org/r/2e7c87b6f662c90c642fc1838e363ad3e6ef14a4.1623236345.git.leonro@nvidia.com Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 08 Jun, 2021 13 commits
-
Shiraz Saleem authored
Use list_last_entry and list_first_entry instead of using prev and next pointers. Link: https://lore.kernel.org/r/20210608211415.680-1-shiraz.saleem@intel.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
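The idiom in miniature (struct and list names are illustrative; assumes the list is known to be non-empty):

    struct req {
            struct list_head list;
    };
    static LIST_HEAD(queue);

    static void peek(void)
    {
            /* type-safe accessors instead of raw ->next / ->prev */
            struct req *first = list_first_entry(&queue, struct req, list);
            struct req *last = list_last_entry(&queue, struct req, list);
    }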
-
Baokun Li authored
Use list_move() instead of list_del() + list_add(). Link: https://lore.kernel.org/r/20210608031041.2820429-1-libaokun1@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
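For reference, the replaced two-step pattern and its one-call equivalent:

    list_del(&entry->list);
    list_add(&entry->list, &dst);   /* before: delete, then add */

    list_move(&entry->list, &dst);  /* after: one call, same effect */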
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-8-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
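A quick sketch of the API this series converts to (the object and release names are illustrative):

    refcount_set(&obj->ref, 1);     /* object starts with one reference */
    refcount_inc(&obj->ref);        /* WARNs on overflow or inc-from-zero */
    if (refcount_dec_and_test(&obj->ref))
            release_obj(obj);       /* dec WARNs on underflow; free on last put */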
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-13-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-12-git-send-email-liweihang@huawei.com Cc: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-11-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-10-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-9-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-6-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-5-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Jason Gunthorpe authored
The member is never used; delete it. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Increasing a refcount_t from 0 to 1 is regarded as a use-after-free risk, so the counter should be set to 1 directly during initialization. Link: https://lore.kernel.org/r/1622194663-2383-3-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
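The specific anti-pattern being avoided, in two lines (illustrative object):

    refcount_set(&obj->ref, 0);
    refcount_inc(&obj->ref);        /* WARNs: 0 -> 1 looks like use-after-free */

    refcount_set(&obj->ref, 1);     /* correct: live objects start at 1 */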
-
Weihang Li authored
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-2-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 07 Jun, 2021 1 commit
-
Kamal Heib authored
There is a sign typo in the error code returned from irdma_modify_qp() when the attr_mask is not supported; fix it. Fixes: b48c24c2 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20210607221543.254144-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
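The class of bug being fixed, sketched (the condition and errno here are illustrative, not the irdma diff):

    if (attr_mask & ~IB_QP_ATTR_STANDARD_BITS)
            return -EOPNOTSUPP;     /* was returned without the minus sign */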
-