Commits · 090473004b026747a2795244ecd67bd51a24c923 · Kirill Smelkov / linux

30 Jul, 2021 9 commits

RDMA/qed: Use accurate error num in qed_cxt_dynamic_ilt_alloc · 09047300

Prabhakar Kushwaha authored Jul 29, 2021

To have more accurate error return type use -EOPNOTSUPP instead of
-EINVAL.

Link: https://lore.kernel.org/r/20210729151732.30995-1-pkushwaha@marvell.comSigned-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

09047300

RDMA/hfi1: Fix typo in comments · 991c4274

Cai Huoqing authored Jul 29, 2021

Remove the repeated word 'the' from comments

Link: https://lore.kernel.org/r/20210729082346.1882-1-caihuoqing@baidu.comSigned-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

991c4274

docs: Fix infiniband uverbs minor number · 8d7e415d

Leon Romanovsky authored Jul 28, 2021

Starting from the beginning of infiniband subsystem, the uverbs char
devices start from 192 as a minor number, see
commit bc38a6ab ("[PATCH] IB uverbs: core implementation").

This patch updates the admin guide documentation to reflect it.

Fixes: 9d85025b ("docs-rst: create an user's manual book")
Link: https://lore.kernel.org/r/bad03e6bcde45550c01e12908a6fe7dfa4770703.1627477347.git.leonro@nvidia.comSigned-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

8d7e415d

RDMA/iwpm: Rely on the rdma_nl_[un]register() to ensure that requests are valid · bbafcbc2

Leon Romanovsky authored Jul 23, 2021

The core netlink code alread guarentees that no netlink callback can be
running outside the rdma_nl_register/unregister() region and this
registration happens during module init/exit. Thus it is already prevented
that iwpm_valid_client() can ever fail. Remove it.

Link: https://lore.kernel.org/r/a9f05a78f9996bf6ea47099b5e02671bf742f5ab.1627048781.git.leonro@nvidia.comSigned-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

bbafcbc2

RDMA/iwpm: Remove not-needed reference counting · bdb0e4e3

Leon Romanovsky authored Jul 23, 2021

iwpm_init() and iwpm_exit() are called only once during iw_cm module
load. This makes whole reference count implementation not needed at all.

Link: https://lore.kernel.org/r/1778ded873ba58c9fadc5bb25038de1cec843bec.1627048781.git.leonro@nvidia.comSigned-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

bdb0e4e3

RDMA/iwcm: Release resources if iw_cm module initialization fails · e677b72a

Leon Romanovsky authored Jul 23, 2021

The failure during iw_cm module initialization partially left the system
with unreleased memory and other resources. Rewrite the module init/exit
routines in such way that netlink commands will be opened only after
successful initialization.

Fixes: b493d91d ("iwcm: common code for port mapper")
Link: https://lore.kernel.org/r/b01239f99cb1a3e6d2b0694c242d89e6410bcd93.1627048781.git.leonro@nvidia.comSigned-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

e677b72a

RDMA/hfi1: Convert from atomic_t to refcount_t on hfi1_devdata->user_refcount · a0293eb2

Xiyu Yang authored Jul 19, 2021

refcount_t type and corresponding API can protect refcounters from
accidental underflow and overflow and further use-after-free situations.

Link: https://lore.kernel.org/r/1626674454-56075-1-git-send-email-xiyuyang19@fudan.edu.cnSigned-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Tested-by: Josh Fisher <josh.fisher@cornelisnetworks.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

a0293eb2

IB/hfi1: Adjust pkey entry in index 0 · 62004871

Mike Marciniszyn authored Jul 15, 2021

It is possible for the primary IPoIB network device associated with any
RDMA device to fail to join certain multicast groups preventing IPv6
neighbor discovery and possibly other network ULPs from working
correctly. The IPv4 broadcast group is not affected as the IPoIB network
device handles joining that multicast group directly.

This is because the primary IPoIB network device uses the pkey at ndex 0
in the associated RDMA device's pkey table. Anytime the pkey value of
index 0 changes, the primary IPoIB network device automatically modifies
it's broadcast address (i.e. /sys/class/net/[ib0]/broadcast), since the
broadcast address includes the pkey value, and then bounces carrier. This
includes initial pkey assignment, such as when the pkey at index 0
transitions from the opa default of invalid (0x0000) to some value such as
the OPA default pkey for Virtual Fabric 0: 0x8001 or when the fabric
manager is restarted with a configuration change causing the pkey at index
0 to change. Many network ULPs are not sensitive to the carrier bounce and
are not expecting the broadcast address to change including the linux IPv6
stack. This problem does not affect IPoIB child network devices as their
pkey value is constant for all time.

To mitigate this issue, change the default pkey in at index 0 to 0x8001 to
cover the predominant case and avoid issues as ipoib comes up and the FM
sweeps.

At some point, ipoib multicast support should automatically fix
non-broadcast addresses as it does with the primary broadcast address.

Fixes: 77241056 ("IB/hfi1: add driver files")
Link: https://lore.kernel.org/r/20210715160445.142451.47651.stgit@awfm-01.cornelisnetworks.comSuggested-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

62004871

IB/hfi1: Indicate DMA wait when txq is queued for wakeup · e9901043

Mike Marciniszyn authored Jul 15, 2021

There is no counter for dmawait in AIP, which hampers debugging
performance issues.

Add the counter increment when the txq is queued.

Fixes: d99dc602 ("IB/hfi1: Add functions to transmit datagram ipoib packets")
Link: https://lore.kernel.org/r/20210715160440.142451.8278.stgit@awfm-01.cornelisnetworks.comSigned-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

e9901043

20 Jul, 2021 3 commits

Merge branch 'mlx5_dcs' into rdma.git for-next · 07d0f314

Jason Gunthorpe authored Jul 20, 2021

Leon Romanovsky says:

====================
Add ConnectX DCS offload support

This patchset from Lior adds support of DCI stream channel (DCS) support.

DCS is an offload to SW load balancing of DC initiator work requests.

A single DC QP initiator (DCI) can be connected to only one target at the
time and can't start new connection until the previous work request is
completed.

This limitation causes to delays when the initiator process needs to
transfer data to multiple targets at the same time.
====================

* branch 'mlx5_dcs':
  RDMA/mlx5: Add DCS offload support
  RDMA/mlx5: Separate DCI QP creation logic
  net/mlx5: Add DCS caps & fields support

07d0f314

RDMA/mlx5: Add DCS offload support · 11656f59

Lior Nahmanson authored Jun 21, 2021

DCS is an offload to SW load balancing of DC initiator work requests.

A single DCI can be connected to only one target at the time and can't
start new connection until the previous work request is completed. This
limitation will cause to delay when the initiator process needs to
transfer data to multiple targets at the same time. The SW solution is to
use a process that handling and spreading the work request on many DCIs
according to destinations.

This feature is an offload to this process and coming to reduce the load
from the CPU and improve the performance.

Link: https://lore.kernel.org/r/491c2c2afdb5b07de7f03eab3f93cf0704549dbc.1624258894.git.leonro@nvidia.comReviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

11656f59

RDMA/mlx5: Separate DCI QP creation logic · 2013b4d5

Lior Nahmanson authored Jun 21, 2021

This patch isolates DCI QP creation logic to separate function, so this
change will reduce complexity when adding new features to DCI QP without
interfering with other QP types.

The code was copied from create_user_qp() while taking only DCI relevant bits.

Link: https://lore.kernel.org/r/b4530bdd999349c59691224f016ff1efb5dc3b92.1624258894.git.leonro@nvidia.comReviewed-by: Meir Lichtinger <meirl@nvidia.com>
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2013b4d5

18 Jul, 2021 1 commit

net/mlx5: Add DCS caps & fields support · 96cd2dd6

Lior Nahmanson authored Dec 28, 2020

This fields will be needed when adding a support for DCS offload

max_dci_stream_channels - maximum DCI stream channels supported per DCI.
max_dci_errored_streams - maximum DCI error stream channels
supported per DCI before a DCI move to error state.
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

96cd2dd6

16 Jul, 2021 12 commits

RDMA/rxe: Fix types in rxe_icrc.c · 923232bb

Bob Pearson authored Jul 06, 2021

Currently the ICRC is generated as a u32 type and then forced to a __be32
and stored into the ICRC field in the packet. The actual type of the ICRC
is __be32. This patch replaces u32 by __be32 and eliminates the casts.
The computation is exactly the same as the original but the types are more
consistent.

Link: https://lore.kernel.org/r/20210707040040.15434-10-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

923232bb

RDMA/rxe: Add kernel-doc comments to rxe_icrc.c · e4f5c82f

Bob Pearson authored Jul 06, 2021

This patch adds kernel-doc style comments to rxe_icrc.c

Link: https://lore.kernel.org/r/20210707040040.15434-9-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

e4f5c82f

RDMA/rxe: Move crc32 init code to rxe_icrc.c · add2b3b8

Bob Pearson authored Jul 06, 2021

This patch collects the code from rxe_register_device() that sets up the
crc32 calculation into a subroutine rxe_icrc_init() in rxe_icrc.c.

Link: https://lore.kernel.org/r/20210707040040.15434-8-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

add2b3b8

RDMA/rxe: Fixup rxe_icrc_hdr · 63887510

Bob Pearson authored Jul 06, 2021

rxe_icrc_hdr() in rxe_icrc.c is no longer shared. This patch makes it
static and changes the parameter list to match the other routines there.

Link: https://lore.kernel.org/r/20210707040040.15434-7-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

63887510

RDMA/rxe: Move rxe_crc32 to a subroutine · b6c6cc4a

Bob Pearson authored Jul 06, 2021

Move rxe_crc32() from rxe.h to rxe_icrc.c as a static local function.

Link: https://lore.kernel.org/r/20210707040040.15434-6-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

b6c6cc4a

RDMA/rxe: Move ICRC generation to a subroutine · 1117f26e

Bob Pearson authored Jul 06, 2021

Isolate ICRC generation into a single subroutine named rxe_generate_icrc()
in rxe_icrc.c. Remove scattered crc generation code from elsewhere.

Link: https://lore.kernel.org/r/20210707040040.15434-5-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

1117f26e

RDMA/rxe: Fixup rxe_send and rxe_loopback · 13050a0b

Bob Pearson authored Jul 06, 2021

Fixup rxe_send() and rxe_loopback() in rxe_net.c to have the same calling
sequence. This patch makes them static and have the same parameter list
and return value.

Link: https://lore.kernel.org/r/20210707040040.15434-4-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

13050a0b

RDMA/rxe: Move rxe_xmit_packet to a subroutine · 36fbb03d

Bob Pearson authored Jul 06, 2021

rxe_xmit_packet() was an overlong inline subroutine. This patch moves it
into rxe_net.c as an ordinary subroutine.

Link: https://lore.kernel.org/r/20210707040040.15434-3-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

36fbb03d

RDMA/rxe: Move ICRC checking to a subroutine · fe87fb17

Bob Pearson authored Jul 06, 2021

Move the code in rxe_recv() that checks the ICRC on incoming packets to a
subroutine rxe_check_icrc() and move that to rxe_icrc.c.

Link: https://lore.kernel.org/r/20210707040040.15434-2-rpearsonhpe@gmail.comSigned-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

fe87fb17

IB/core: Read subnet_prefix in ib_query_port via cache. · 21bfee9c

Anand Khoje authored Jul 12, 2021

ib_query_port() calls device->ops.query_port() to get the port
attributes. The method of querying is device driver specific. The same
function calls device->ops.query_gid() to get the GID and extract the
subnet_prefix (gid_prefix).

The GID and subnet_prefix are stored in a cache. But they do not get
read from the cache if the device is an Infiniband device. The
following change takes advantage of the cached subnet_prefix.
Testing with RDBMS has shown a significant improvement in performance
with this change.

Link: https://lore.kernel.org/r/20210712122625.1147-4-anand.a.khoje@oracle.comSigned-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

21bfee9c

IB/core: Shifting initialization of device->cache_lock · 36721a6d

Anand Khoje authored Jul 12, 2021

The lock cache_lock of struct ib_device is initialized in function
ib_cache_setup_one(). This is much later than the device initialization in
_ib_alloc_device().

This change shifts initialization of cache_lock in _ib_alloc_device().

Link: https://lore.kernel.org/r/20210712122625.1147-3-anand.a.khoje@oracle.comSuggested-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

36721a6d

IB/core: Updating cache for subnet_prefix in config_non_roce_gid_cache() · 0bc0602a

Anand Khoje authored Jul 12, 2021

Currently, cache for subnet_prefix was getting updated by reading port
attributes via ib_query_port. ib_query_port() calls ops.query_gid() to get
subnet_prefix and returns it via port_attr.

In ib_cache_update(), config_non_roce_gid_cache() obtains GIDs by calling
ops.query_gid(). We utilize this to store subnet_prefix in cache.

Link: https://lore.kernel.org/r/20210712122625.1147-2-anand.a.khoje@oracle.comSuggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Aru Kolappan <aru.kolappan@oracle.com>
Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

0bc0602a

15 Jul, 2021 11 commits

RDMA/efa: Split hardware stats to device and port stats · 8c1b4316

Gal Pressman authored Jul 12, 2021

The hardware stats API distinguishes between device and port statistics,
split the EFA stats accordingly instead of always dumping everything.

Link: https://lore.kernel.org/r/20210712105923.17389-1-galpress@amazon.comReviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

8c1b4316

MAINTAINERS: Update maintainers of HiSilicon RoCE · 91607118

Weihang Li authored Jul 08, 2021

Lijun has moved to work in other technical areas, and Wenpeng will
maintain this modules instead of him.

Link: https://lore.kernel.org/r/1625741958-51363-1-git-send-email-liweihang@huawei.comSigned-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

91607118

RDMA/rxe: Remove the repeated 'mr->umem = umem' · cdbdb772

Xiao Yang authored Jul 02, 2021

Drop duplicated code

Link: https://lore.kernel.org/r/20210702123024.37025-1-ice_yangxiao@163.comSigned-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

cdbdb772

RDMA/siw: Convert siw_tx_hdt() to kmap_local_page() · 9d649d59

Ira Weiny authored Jun 24, 2021

kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]

The use of kmap() in siw_tx_hdt() is all thread local therefore
kmap_local_page() is a sufficient replacement and will work with pgmap
protected pages when those are implemented.

siw_tx_hdt() tracks pages used in a page_array.  It uses that array to
unmap pages which were mapped on function exit.  Not all entries in the
array are mapped and this is tracked in kmap_mask.

kunmap_local() takes a mapped address rather than a page.  Alter
siw_unmap_pages() to take the iov array to reuse the iov_base address of
each mapping.  Use PAGE_MASK to get the proper address for kunmap_local().

kmap_local_page() mappings are tracked in a stack and must be unmapped in
the opposite order they were mapped in.  Because segments are mapped into
the page array in increasing index order, modify siw_unmap_pages() to
unmap pages in decreasing order.

Use kmap_local_page() instead of kmap() to map pages in the page_array.

[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/

Link: https://lore.kernel.org/r/20210624174814.2822896-1-ira.weiny@intel.comSigned-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

9d649d59

RDMA/siw: Remove kmap() · 1ec50dd1

Ira Weiny authored Jun 21, 2021

kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]

These uses of kmap() in the SIW driver are thread local.  Therefore
kmap_local_page() is sufficient to use and will work with pgmap protected
pages when those are implemnted.

There is one more use of kmap() in this driver which is split into its own
patch because kmap_local_page() has strict ordering rules and the use of
the kmap_mask over multiple segments must be handled carefully.
Therefore, that conversion is handled in a stand alone patch.

Use kmap_local_page() instead of kmap() in the 'easy' cases.

[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/

Link: https://lore.kernel.org/r/20210622061422.2633501-4-ira.weiny@intel.comSigned-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

1ec50dd1

RDMA/rtrs: Move sq_wr_avail to rtrs_con · cfcdbd9d

Jack Wang authored Jul 12, 2021

In order to account HB for sq_wr_avail properly, move sq_wr_avail from
rtrs_srv_con to rtrs_con.

Although rtrs-clt do not care sq_wr_avail, but still init it to
max_send_wr.

Fixes: b38041d5 ("RDMA/rtrs: Do not signal for heatbeat")
Link: https://lore.kernel.org/r/20210712060750.16494-7-jinpu.wang@ionos.comSigned-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

cfcdbd9d

RDMA/rtrs: Remove unused flags parameter · 99fac8bf

Jack Wang authored Jul 12, 2021

flags is not used, so remove it from rtrs_post_rdma_write_imm_empty.

Link: https://lore.kernel.org/r/20210712060750.16494-6-jinpu.wang@ionos.comSigned-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

99fac8bf

RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty static · 6ea9b773

Jack Wang authored Jul 12, 2021

It's only used in rtrs.c, so no need to export.

Link: https://lore.kernel.org/r/20210712060750.16494-5-jinpu.wang@ionos.comSigned-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

6ea9b773

RDMA/rtrs: Enable the same selective signal for heartbeat and IO · e2d98504

Jack Wang authored Jul 12, 2021

On idle session, because we do not do signal for heartbeat, it will
overflow the send queue after sometime.

To avoid that, we need to enable the signal for heartbeat. To do that, add
a new member signal_interval in rtrs_path, which will set min of
queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat
and IO, so the sq queue full accounting is correct.

Fixes: b38041d5 ("RDMA/rtrs: Do not signal for heatbeat")
Link: https://lore.kernel.org/r/20210712060750.16494-4-jinpu.wang@ionos.comSigned-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

e2d98504

RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_con · a10431ef

Jack Wang authored Jul 12, 2021

We need to track also the wr used for heatbeat. This is a preparation for
that, will be used in later patch.

The io_cnt in rtrs_clt is removed, use wr_cnt instead.

Link: https://lore.kernel.org/r/20210712060750.16494-3-jinpu.wang@ionos.comSigned-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

a10431ef

RDMA/rtrs: Add error messages for failed operations. · 350ec9bc

Jack Wang authored Jul 12, 2021

It could help debugging in case of error happens.

Link: https://lore.kernel.org/r/20210712060750.16494-2-jinpu.wang@ionos.comSigned-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

350ec9bc

11 Jul, 2021 4 commits

Linux 5.14-rc1 · e73f0f0e
Linus Torvalds authored Jul 11, 2021

e73f0f0e

mm/rmap: try_to_migrate() skip zone_device !device_private · 6c855fce

Hugh Dickins authored Jul 07, 2021

I know nothing about zone_device pages and !device_private pages; but if
try_to_migrate_one() will do nothing for them, then it's better that
try_to_migrate() filter them first, than trawl through all their vmas.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6c855fce

mm/rmap: fix new bug: premature return from page_mlock_one() · 023e1a8d

Hugh Dickins authored Jul 07, 2021

In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
cleared by the time it got page table lock, page_vma_mapped_walk_done()
must be called before returning, either explicitly, or by a final call
to page_vma_mapped_walk() - otherwise the page table remains locked.

Fixes: cd62734c ("mm/rmap: split try_to_munlock from try_to_unmap")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

023e1a8d

mm/rmap: fix old bug: munlocking THP missed other mlocks · d9770fcc

Hugh Dickins authored Jul 07, 2021

The kernel recovers in due course from missing Mlocked pages: but there
was no point in calling page_mlock() (formerly known as
try_to_munlock()) on a THP, because nothing got done even when it was
found to be mapped in another VM_LOCKED vma.

It's true that we need to be careful: Mlocked accounting of pte-mapped
THPs is too difficult (so consistently avoided); but Mlocked accounting
of only-pmd-mapped THPs is supposed to work, even when multiple mappings
are mlocked and munlocked or munmapped.  Refine the tests.

There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
page_mlock_one() does not even have to worry about that complication.

(I said the kernel recovers: but would page reclaim be likely to split
THP before rediscovering that it's VM_LOCKED? I've not followed that up)

Fixes: 9a73f61b ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d9770fcc