Commits · 1d31e9c09f41f9ffa53eaf8457b3e77a1d527f1e · Kirill Smelkov / linux

24 Aug, 2017 2 commits

IB/mlx5: Fix Raw Packet QP event handler assignment · 1d31e9c0

Majd Dibbiny authored Aug 23, 2017

When a Raw Packet QP has both an SQ and an RQ, the SQ's event handler
was never assigned.

Fix this by assigning an event handler to each WQ after creation.

[ 1877.145243] Call Trace:
[ 1877.148644] <IRQ>
[ 1877.150580] [<ffffffffa07987c5>] ? mlx5_rsc_event+0x105/0x210 [mlx5_core]
[ 1877.159581] [<ffffffffa0795bd7>] ? mlx5_cq_event+0x57/0xd0 [mlx5_core]
[ 1877.167137] [<ffffffffa079208e>] mlx5_eq_int+0x53e/0x6c0 [mlx5_core]
[ 1877.174526] [<ffffffff8101a679>] ? sched_clock+0x9/0x10
[ 1877.180753] [<ffffffff810f717e>] handle_irq_event_percpu+0x3e/0x1e0
[ 1877.188014] [<ffffffff810f735d>] handle_irq_event+0x3d/0x60
[ 1877.194567] [<ffffffff810f9fe7>] handle_edge_irq+0x77/0x130
[ 1877.201129] [<ffffffff81014c3f>] handle_irq+0xbf/0x150
[ 1877.207244] [<ffffffff815ed78a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 1877.214829] [<ffffffff815f434f>] do_IRQ+0x4f/0xf0
[ 1877.220498] [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
[ 1877.227025] <EOI>
[ 1877.228967] [<ffffffff814834e2>] ? cpuidle_enter_state+0x52/0xc0
[ 1877.236990] [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200
[ 1877.243676] [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30
[ 1877.249831] [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290
[ 1877.256513] [<ffffffff815cfee1>] start_secondary+0x265/0x27b
[ 1877.263111] Code: Bad RIP value.
[ 1877.267296] RIP [< (null)>] (null)
[ 1877.273264] RSP <ffff88046fd63df8>
[ 1877.277531] CR2: 0000000000000000
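
The trace ends with RIP at (null), which is what a call through an unassigned function pointer looks like. The following is a minimal userspace sketch of the idea of assigning the handler as part of creation; `demo_wq`, `demo_create_wq`, and `demo_dispatch` are hypothetical names for illustration and are not the mlx5 API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a work queue resource with an event callback. */
struct demo_wq {
    int id;
    int last_event;
    void (*event)(struct demo_wq *wq, int type);
};

void demo_event_handler(struct demo_wq *wq, int type)
{
    wq->last_event = type;
}

/* Assign the handler during creation so no WQ is left with a NULL callback. */
void demo_create_wq(struct demo_wq *wq, int id)
{
    assert(wq != NULL);
    wq->id = id;
    wq->last_event = -1;
    wq->event = demo_event_handler;
}

/* Deliver an event; returns 0 on delivery, -1 if no handler was assigned. */
int demo_dispatch(struct demo_wq *wq, int type)
{
    assert(wq != NULL);
    if (wq->event == NULL)
        return -1;
    wq->event(wq, type);
    return 0;
}
```

The guard in `demo_dispatch` exists only to keep the sketch safe to run; the point is that every queue created through `demo_create_wq` has a handler.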

Fixes: 19098df2 ("IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Avoid accessing non-allocated memory when inferring port type · 498ca3c8

Noa Osherovich authored Aug 23, 2017

Commit 44c58487 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
introduced the concept of type in ah_attr:
 * During ib_register_device, each port is checked for its type which
   is stored in ib_device's port_immutable array.
 * During uverbs' modify_qp, the type is inferred using the port number
   in ib_uverbs_qp_dest struct (address vector) by accessing the
   relevant port_immutable array and the type is passed on to
   providers.

The IB spec (version 1.3) enforces a valid port value only in Reset to
Init. During Init to RTR, the address vector must be valid, but the port
number is not listed as a field of the address vector, so its value is
not validated. This leads to access to non-allocated memory when
inferring the port type.

Save the real port number in ib_qp during modify to Init (when the
comp_mask indicates that the port number is valid) and use this value
to infer the port type.

Avoid copying the address vector fields if the matching bit is not set
in the attr_mask. Address vector can't be modified before the port, so
no valid flow is affected.
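
A minimal sketch of the approach described above, assuming a simplified model: the port is saved only when the mask marks it valid, and the type is later looked up from the saved port with a bounds check. All `demo_` names, the mask bits, and the struct layouts are hypothetical and do not mirror the real ib_core definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QP_PORT (1u << 0)  /* hypothetical mask bit: port field is valid */
#define DEMO_QP_AV   (1u << 1)  /* hypothetical mask bit: address vector is valid */

enum demo_ah_type { DEMO_AH_UNDEFINED, DEMO_AH_IB, DEMO_AH_ROCE };

struct demo_device {
    uint8_t num_ports;                   /* ports are numbered 1..num_ports */
    const enum demo_ah_type *port_type;  /* num_ports entries, index = port - 1 */
};

struct demo_qp {
    const struct demo_device *dev;
    uint8_t port;                        /* 0 until a valid port has been saved */
};

/* Remember the port only when the caller's mask says the field is valid. */
int demo_modify_qp_save_port(struct demo_qp *qp, uint32_t mask, uint8_t port)
{
    assert(qp != NULL && qp->dev != NULL);
    if (!(mask & DEMO_QP_PORT))
        return 0;
    if (port < 1 || port > qp->dev->num_ports)
        return -1;
    qp->port = port;
    return 0;
}

/* Infer the type from the saved port, never from an unvalidated field. */
enum demo_ah_type demo_infer_ah_type(const struct demo_qp *qp)
{
    assert(qp != NULL && qp->dev != NULL);
    if (qp->port < 1 || qp->port > qp->dev->num_ports)
        return DEMO_AH_UNDEFINED;
    return qp->dev->port_type[qp->port - 1];
}
```

A call with only `DEMO_QP_AV` set leaves the saved port untouched, so a garbage port value in the address vector never reaches the array index.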

Fixes: 44c58487 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

22 Aug, 2017 1 commit

RDMA/uverbs: Initialize cq_context appropriately · 65159c05

Bharat Potnuri authored Aug 01, 2017

Initializing cq_context with ev_queue in create_cq() leads to a NULL pointer
dereference in ib_uverbs_comp_handler() if the application does not use a
completion channel. This patch fixes the cq_context initialization.
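
One way to picture the problem, as a hedged userspace sketch: if the queue is not the first member of the event file, taking its address through a NULL file pointer yields a small non-NULL value that slips past a NULL check in the handler. The sketch below stores NULL explicitly when there is no completion channel. The `demo_` structs and functions are hypothetical and only loosely modeled on the names in the message.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ev_queue { int pending; };

struct demo_ev_file {
    int fd;                          /* something placed before the queue */
    struct demo_ev_queue ev_queue;
};

struct demo_cq { void *cq_context; };

/* ev_file may be NULL when the application uses no completion channel. */
void demo_create_cq(struct demo_cq *cq, struct demo_ev_file *ev_file)
{
    assert(cq != NULL);
    cq->cq_context = ev_file ? &ev_file->ev_queue : NULL;
}

/* Returns 1 if a completion event was queued, 0 if there is no channel. */
int demo_comp_handler(struct demo_cq *cq)
{
    struct demo_ev_queue *queue = cq->cq_context;

    if (queue == NULL)
        return 0;
    queue->pending++;
    return 1;
}
```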

Fixes: 1e7710f3 ("IB/core: Change completion channel to use the reworked")
Cc: stable@vger.kernel.org # 4.12
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 699a2d5b)

16 Aug, 2017 10 commits

IB/uverbs: Fix NULL pointer dereference during device removal · 870201f9

Maor Gottlieb authored Aug 16, 2017

As part of ib_uverbs_remove_one, which might be triggered by a reset
flow, we trigger an IB_EVENT_DEVICE_FATAL event to the userspace
application.
If the device was removed after the uverbs fd was opened but before
ib_uverbs_get_context was called, the event file will be accessed
before it was allocated, resulting in a NULL pointer dereference:

[ 72.325873] BUG: unable to handle kernel NULL pointer dereference at (null)
...
[ 72.325984] IP: _raw_spin_lock_irqsave+0x22/0x40
[ 72.327123] Call Trace:
[ 72.327168] ib_uverbs_async_handler.isra.8+0x2e/0x160 [ib_uverbs]
[ 72.327216] ? synchronize_srcu_expedited+0x27/0x30
[ 72.327269] ib_uverbs_remove_one+0x120/0x2c0 [ib_uverbs]
[ 72.327330] ib_unregister_device+0xd0/0x180 [ib_core]
[ 72.327373] mlx5_ib_remove+0x74/0x140 [mlx5_ib]
[ 72.327422] mlx5_remove_device+0xfb/0x110 [mlx5_core]
[ 72.327466] mlx5_unregister_interface+0x3c/0xa0 [mlx5_core]
[ 72.327509] mlx5_ib_cleanup+0x10/0x962 [mlx5_ib]
[ 72.327546] SyS_delete_module+0x155/0x230
[ 72.328472] ? exit_to_usermode_loop+0x70/0xa6
[ 72.329370] do_syscall_64+0x54/0xc0
[ 72.330262] entry_SYSCALL64_slow_path+0x25/0x25

Fix it by checking that the user context was allocated before
triggering the event.

Fixes: 036b1063 ('IB/uverbs: Enable device removal when there are active user space applications')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Protect sysfs entry on ib_unregister_device · 06f8174a

Shiraz Saleem authored Jul 17, 2017

ib_unregister_device does not protect the removal of sysfs entries.
A call to ib_register_device in that window can result in a
duplicate sysfs entry warning. Move mutex_unlock to after
ib_device_unregister_sysfs to protect against sysfs entry creation.

This issue is exposed during driver load/unload stress test.

WARNING: CPU: 5 PID: 4445 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5f/0x70
sysfs: cannot create duplicate filename '/class/infiniband/i40iw0'
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H
BIOS F7 01/17/2014
Workqueue: i40e i40e_service_task [i40e]
Call Trace:
dump_stack+0x67/0x98
__warn+0xcc/0xf0
warn_slowpath_fmt+0x4a/0x50
? kernfs_path_from_node+0x4b/0x60
sysfs_warn_dup+0x5f/0x70
sysfs_do_create_link_sd.isra.2+0xb7/0xc0
sysfs_create_link+0x20/0x40
device_add+0x28c/0x600
ib_device_register_sysfs+0x58/0x170 [ib_core]
ib_register_device+0x325/0x570 [ib_core]
? i40iw_register_rdma_device+0x1f4/0x400 [i40iw]
? kmem_cache_alloc_trace+0x143/0x330
? __raw_spin_lock_init+0x2d/0x50
i40iw_register_rdma_device+0x2dc/0x400 [i40iw]
i40iw_open+0x10a6/0x1950 [i40iw]
? i40iw_open+0xeab/0x1950 [i40iw]
? i40iw_make_cm_node+0x9c0/0x9c0 [i40iw]
i40e_client_subtask+0xa4/0x110 [i40e]
i40e_service_task+0xc2d/0x1320 [i40e]
process_one_work+0x203/0x710
? process_one_work+0x16f/0x710
worker_thread+0x126/0x4a0
? trace_hardirqs_on+0xd/0x10
kthread+0x112/0x150
? process_one_work+0x710/0x710
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x2e/0x40
---[ end trace fd11b69e21ea7653 ]---
Couldn't register device i40iw0 with driver model
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

iw_cxgb4: fix misuse of integer variable · d4ba61d2

Steve Wise authored Jul 25, 2017

Fixes: ee30f7d5 ("iw_cxgb4: Max fastreg depth depends on DSGL support")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hns: fix memory leak on ah on error return path · 5b59a396

Colin Ian King authored Aug 08, 2017

When dmac is NULL, ah is not being freed on the error return path. Fix
this by kfree'ing it.

Detected by CoverityScan, CID#1452636 ("Resource Leak")

Fixes: d8966fcd ("IB/core: Use rdma_ah_attr accessor functions")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

i40iw: Fix potential fcn_id_array out of bounds · aa939c12

Christopher N Bednarz authored Aug 08, 2017

Avoid out of bounds error by utilizing I40IW_MAX_STATS_COUNT
instead of I40IW_INVALID_FCN_ID.
Signed-off-by: Christopher N Bednarz <christoper.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

i40iw: Use correct alignment for CQ0 memory · a28f047e

Christopher N Bednarz authored Aug 08, 2017

Utilize correct alignment variable when allocating
DMA memory for CQ0.
Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

i40iw: Fix typecast of tcp_seq_num · 29c2415a

Mustafa Ismail authored Aug 08, 2017

The typecast of tcp_seq_num incorrectly uses u8. Fix by
casting to u32.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
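
To illustrate why the cast width matters, here is a small sketch that is not taken from the i40iw source: a TCP sequence number is 32 bits wide, so a cast to an 8-bit type keeps only the low byte. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Narrowing cast: keeps only the low 8 bits of the sequence number. */
uint32_t demo_seq_cast_u8(uint64_t raw)
{
    return (uint8_t)raw;
}

/* 32-bit cast: preserves the full TCP sequence number. */
uint32_t demo_seq_cast_u32(uint64_t raw)
{
    return (uint32_t)raw;
}

/* Small self-check showing the difference between the two casts. */
void demo_check_seq_cast(void)
{
    assert(demo_seq_cast_u8(0x12345678u) == 0x78u);
    assert(demo_seq_cast_u32(0x12345678u) == 0x12345678u);
}
```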

i40iw: Correct variable names · 8129331f

Mustafa Ismail authored Aug 08, 2017

Fix incorrect naming of status code and struct. Use inline
instead of immediate.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

i40iw: Fix parsing of query/commit FPM buffers · f67ace2d

Chien Tin Tung authored Aug 08, 2017

Parsing of commit/query Host Memory Cache Function Private Memory
does not skip over reserved fields and incorrectly assigns
those values to the object's base/cnt/max_cnt fields. Skip over
reserved fields and set correct values. Also correct the memory
alignment requirement for commit/query FPM buffers.
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/vmw_pvrdma: Report CQ missed events · a7d2e039

Bryan Tan authored Aug 10, 2017

There is a chance of a race between arming the CQ and receiving
completions. By reporting CQ missed events any ULPs should poll
again to get the completions.

Fixes: 29c8d9eb ("IB: Add vmw_pvrdma driver")
Acked-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

07 Aug, 2017 1 commit

Merge tag 'rdma-rc-2017-07-26' of... · 48107c4e

Doug Ledford authored Aug 07, 2017

Merge tag 'rdma-rc-2017-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into leon-ipoib

IPoIB fixes for 4.13

The patchset provides various fixes for IPoIB. It is a combination of
fixes for various issues discovered during verification along with
static checker cleanup patches.

Most of the patches fix issues dating from the pre-git era and hence
lack Fixes lines.

There is one exception in this IPoIB group, the addition of a revert patch:
Revert "IB/core: Allow QP state transition from reset to error", but
it is followed by a proper fix to the annoying print, so I thought it
appropriate to include it.
Signed-off-by: Doug Ledford <dledford@redhat.com>

04 Aug, 2017 5 commits

IB/hns: checking for IS_ERR() instead of NULL · 5db465f2

Dan Carpenter authored Aug 04, 2017

The hns_roce_v1_create_lp_qp() returns NULL on error, not error pointers.
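
The distinction matters because an error-pointer test does not treat NULL as an error. The sketch below re-creates the error-pointer check in userspace to show this; `demo_is_err`, `demo_err_ptr`, `demo_create_lp_qp`, and `demo_setup` are hypothetical helpers written for illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* True only for pointers in the top DEMO_MAX_ERRNO addresses. */
int demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Encode a negative errno value as a pointer. */
void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Hypothetical creator that reports failure by returning NULL. */
void *demo_create_lp_qp(int fail)
{
    static int qp_storage;

    return fail ? NULL : &qp_storage;
}

/* Correct check for a NULL-returning function; returns 0 or -1. */
int demo_setup(int fail)
{
    void *qp = demo_create_lp_qp(fail);

    if (qp == NULL)   /* demo_is_err(qp) would be false here */
        return -1;
    return 0;
}
```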

Fixes: bfcc681b ("IB/hns: Fix the bug when free mr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/mlx5: Fix existence check for extended address vector · 931b3c1a

Leon Romanovsky authored Aug 01, 2017

The extended address vector flag is the highest bit in a be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes the redundant, already declared define.
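
As a hedged illustration of testing a flag in a big-endian field: convert the CPU-order mask to big-endian before the bitwise AND, so that the highest bit is tested on any host. `DEMO_EXT_AV_BIT` and the function names are hypothetical; `htonl` from `<arpa/inet.h>` stands in for the kernel's byte-order helpers.

```c
#include <arpa/inet.h>  /* htonl */
#include <assert.h>
#include <stdint.h>

#define DEMO_EXT_AV_BIT 0x80000000u  /* highest bit, expressed in CPU order */

/* field_be holds a 32-bit value in big-endian byte order. */
int demo_has_ext_av(uint32_t field_be)
{
    return (field_be & htonl(DEMO_EXT_AV_BIT)) != 0;
}

/* Small self-check: only the highest bit of the decoded value counts. */
void demo_check_ext_av(void)
{
    assert(demo_has_ext_av(htonl(0x80000000u)));
    assert(!demo_has_ext_av(htonl(0x00000001u)));
}
```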

Fixes: 17d2f88f ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/uverbs: Fix device cleanup · efdd6f53

Yishai Hadas authored Aug 01, 2017

The uverbs device should be cleaned up only when there is no remaining
potential usage of it.

As part of ib_uverbs_remove_one, which might be triggered by a reset flow,
the device reference count is decreased as expected, leaving the final
cleanup to the FDs that were opened.

The current code increases the reference count upon opening a new command
FD and decreases it upon closing the file. The event FD is opened
internally and relies on the command FD by taking a reference on it.

If the command FD is closed first and the event FD only later, we must
ensure that the device resources, such as srcu, are still alive, as they
are still in use.

Fix the above by moving the reference count decrease to the place
where the command FD is really freed, instead of doing it when the FD
is just closed.

Fixes: 036b1063 ("IB/uverbs: Enable device removal when there are active user space applications")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/uverbs: Prevent leak of reserved field · f7a6cb7b

Leon Romanovsky authored Aug 01, 2017

Initialize the response structure to zero to prevent
leakage of the "resp.reserved" field.

drivers/infiniband/core/uverbs_cmd.c:1178 ib_uverbs_resize_cq() warn:
	check that 'resp.reserved' doesn't leak information
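
A minimal sketch of the pattern, assuming a simplified response layout: zero the whole structure before filling in the meaningful fields, so that a reserved field never carries stale stack contents out to the caller. `demo_resize_cq_resp` and `demo_fill_resp` are hypothetical and do not reproduce the real uverbs structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response with a field the code never sets explicitly. */
struct demo_resize_cq_resp {
    uint32_t cqe;
    uint32_t reserved;
};

/* Zero everything first, then set only the fields that carry data. */
void demo_fill_resp(struct demo_resize_cq_resp *resp, uint32_t cqe)
{
    assert(resp != NULL);
    memset(resp, 0, sizeof(*resp));
    resp->cqe = cqe;
}
```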

Fixes: 33b9b3ee ("IB: Add userspace support for resizing CQs")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/core: Fix race condition in resolving IP to MAC · 5fff41e1

Parav Pandit authored Aug 01, 2017

Currently, while resolving IP addresses to MAC addresses, a single
delayed work item is used to serve multiple resolve requests. This single
work item essentially performs two tasks:
(a) it performs any retry needed to resolve, and
(b) it executes the callback function for all completed requests.

While the work item is executing callbacks, any new work scheduled on
this workqueue is lost, because the work item has finished looking at all
pending requests and is now running callbacks, but it is still under
execution. Any further retry to look at pending requests in
process_req() after executing callbacks would lead to a similar race
condition (it may reduce the probability further but doesn't eliminate
it). Retrying to enqueue work from the queue_req() context is not
something the rest of the kernel modules have followed.

Therefore the fix in this patch uses the kernel facility for enqueuing
multiple work items to a workqueue. This ensures that no such request
gets lost in synchronization. The request list is still maintained so that
rdma_cancel_addr() can unlink the request and get the completion with
an error sooner. Neighbour update event handling continues to be handled
in the same way as before.
Additionally, the process_req() work entry cancels any pending work for a
request that gets completed while processing those requests.

Originally ib_addr was an ST workqueue, but it became an MT workqueue with
the patch in [1]. This patch again makes it similar to ST so that the
neighbour update event handler work item doesn't race with
other work items.

In one such trace below (though on a 4.5-based kernel), it can be seen
that process_req() never executed the callback, which is likely for an
event that was scheduled by queue_req() while the previous callback was
being executed by the workqueue.

 [<ffffffff816b0dde>] schedule+0x3e/0x90
 [<ffffffff816b3c45>] schedule_timeout+0x1b5/0x210
 [<ffffffff81618c37>] ? ip_route_output_flow+0x27/0x70
 [<ffffffffa027f9c9>] ? addr_resolve+0x149/0x1b0 [ib_addr]
 [<ffffffff816b228f>] wait_for_completion+0x10f/0x170
 [<ffffffff810b6140>] ? try_to_wake_up+0x210/0x210
 [<ffffffffa027f220>] ? rdma_copy_addr+0xa0/0xa0 [ib_addr]
 [<ffffffffa0280120>] rdma_addr_find_l2_eth_by_grh+0x1d0/0x278 [ib_addr]
 [<ffffffff81321297>] ? sub_alloc+0x77/0x1c0
 [<ffffffffa02943b7>] ib_init_ah_from_wc+0x3a7/0x5a0 [ib_core]
 [<ffffffffa0457aba>] cm_req_handler+0xea/0x580 [ib_cm]
 [<ffffffff81015982>] ? __switch_to+0x212/0x5e0
 [<ffffffffa04582fd>] cm_work_handler+0x6d/0x150 [ib_cm]
 [<ffffffff810a14c1>] process_one_work+0x151/0x4b0
 [<ffffffff810a1940>] worker_thread+0x120/0x480
 [<ffffffff816b074b>] ? __schedule+0x30b/0x890
 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
 [<ffffffff810a6b1e>] kthread+0xce/0xf0
 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff816b53a2>] ret_from_fork+0x42/0x70
 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
INFO: task kworker/u144:1:156520 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kworker/u144:1  D ffff883ffe1d7600     0 156520      2 0x00000080
Workqueue: ib_addr process_req [ib_addr]
 ffff883f446fbbd8 0000000000000046 ffff881f95280000 ffff881ff24de200
 ffff883f66120000 ffff883f446f8008 ffff881f95280000 ffff883f6f9208c4
 ffff883f6f9208c8 00000000ffffffff ffff883f446fbbf8 ffffffff816b0dde

[1] http://lkml.iu.edu/hypermail/linux/kernel/1608.1/05834.html

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

23 Jul, 2017 11 commits

IB/ipoib: Notify on modify QP failure only when relevant · 5dc78ad1

Erez Shitrit authored Jul 13, 2017

Modify QP can fail, and in some cases that is acceptable, such as when
moving from the RST to the ERR state. All other failures are not
acceptable, and a message should be printed to the log.

The current code prints on all failures, so many messages like
"Failed to modify QP to ERROR state" appear, even when the failure is
allowed by the state machine of the QP object.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

Revert "IB/core: Allow QP state transition from reset to error" · b287b76e

Leon Romanovsky authored Jul 23, 2017

Commit ebc9ca43 ("IB/core: Allow QP state transition from reset to error")
allowed the transition from Reset to Error state for QPs. This behavior
doesn't follow the IBTA specification 1.3, section 10.3.1 "QUEUE PAIR AND
EE CONTEXT STATES".

The quote from the spec:
"An error can be forced from any state, except Reset, with
the Modify QP/EE Verb."
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB/ipoib: Remove double pointer assigning · 1b355094

Leon Romanovsky authored Jul 15, 2017

There is no need to assign "p" pointer twice.

This patch fixes the following smatch warning:
drivers/infiniband/ulp/ipoib/ipoib_cm.c:517 ipoib_cm_rx_handler() warn:
	missing break? reassigning 'p->id'

Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB/ipoib: Clean error paths in add port · dc892e17

Leon Romanovsky authored Jul 13, 2017

Refactor the error paths in the ipoib_add_port() function. The code flow
ensures that the function terminates on every error flow, which makes
all "else" cases redundant.

The functions called during the flow return "result < 0" in
case of error, so there is no need to check it explicitly.

Fixes: 58e9cc90 ("IB/IPoIB: Fix bad error flow in ipoib_add_port()")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

IB/ipoib: Add get statistics support to SRIOV VF · eb54714d

Feras Daoud authored Jul 02, 2017

Add SRIOV VF support to get traffic statistics.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/ipoib: Add multicast packets statistics · 4829d964

Alex Vesker authored Jul 10, 2017

Update the multicast counter when multicast packets are received and
provide this information through ethtool support.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/ipoib: Set IPOIB_NEIGH_TBL_FLUSH after flushed completion initialization · d2e46fcc

Feras Daoud authored Jul 16, 2017

Set IPOIB_NEIGH_TBL_FLUSH bit after initializing the neighbor
flushed completion, otherwise the garbage collector may signal
a completion while it is not initialized yet.

Fixes: b63b70d8 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/ipoib: Prevent setting negative values to max_nonsrq_conn_qp · 11f74b40

Alex Vesker authored Jul 13, 2017

Don't allow negative values for max_nonsrq_conn_qp. A negative value has
no functional impact, but it is logically incorrect.

Fixes: 68e995a2 ("IPoIB/cm: Add connected mode support for devices without SRQs")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/ipoib: Make sure no in-flight joins while leaving that mcast · a08e1120

Erez Shitrit authored Jul 12, 2017

While cleaning neighs, if there is a send-only mcast neigh, the driver
should wait for its join process to finish before trying to remove it.

Without this patch, we will see messages like "ipoib_mcast_leave on an
in-flight join" and unexpected results in join_complete.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/ipoib: Use cancel_delayed_work_sync when needed · 6bdc8de2

Erez Shitrit authored Jul 12, 2017

The mcast_task work can re-queue itself, so instead of doing
cancel && flush_workqueue, which can still leave a task queued,
use cancel_delayed_work_sync.

Also, there is no need to hold the lock over the cancel; the original
lock was needed because of the IPOIB_MCAST_RUN bit assignment, which is
not in use anymore.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

IB/ipoib: Fix race between light events and interface restart · edf3f301

Feras Daoud authored Jul 10, 2017

A potential race between light_event and interface restart
may attach multicast group to an already attached QP.

Scenario:
light_event flow goes through ipoib_mcast_dev_flush function,
if a context switch occurs before calling ipoib_mcast_remove_list,
then we may face a situation where the broadcast of the priv is null
and the corresponding QP is not detached yet.
If an "interface restart" runs during the previous context switch,
the following scenario occurs:
When the device goes up, ipoib_ib_dev_up function will be called,
it will send a new registration request to the broadcast group and then
attach the group to the QP that was not detached before.

     IPOIB_FLUSH_LIGHT                                          INTERFACE RESTART

    __ipoib_ib_dev_flush                                                |
        |                                                               |
        |                                                               |
        |                                                               |
    ipoib_mcast_dev_flush                                               |
    Move mcast list and broadcast to remove_list                        |
        |                                                               |
        |                                                               |
    Context Switch-->                                                   |
        |                                                       ipoib_ib_dev_down
        |                                                               |
        |                                                               |
        |                                                       ipoib_ib_dev_up
        |                                                               |
        |                                                               |
        |                                                       ipoib_mcast_join_task
        |                                                       allocate new broadcast
        |                                                               |
        |                                                               |
        |                                                       Attach QP to multicast group
        |                                                               |
        |                                                               |
        |                                                       <--Context Switch
    ipoib_mcast_leave
    Detach QP from multicast group
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

20 Jul, 2017 10 commits

RDMA/core: Initialize port_num in qp_attr · a62ab66b

Ismail, Mustafa authored Jul 14, 2017

Initialize the port_num for iWARP in rdma_init_qp_attr.

Fixes: 5ecce4c9 ("Check port number supplied by user verbs cmds")
Cc: <stable@vger.kernel.org> # v2.6.14+
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/uverbs: Fix the check for port number · 5a7a88f1

Ismail, Mustafa authored Jul 14, 2017

The port number is only valid if IB_QP_PORT is set in the mask.
So only check port number if it is valid to prevent modify_qp from
failing due to an invalid port number.

Fixes: 5ecce4c9 ("Check port number supplied by user verbs cmds")
Cc: <stable@vger.kernel.org> # v2.6.14+
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/cma: Fix reference count leak when no ipv4 addresses are set · 963916fd

Kalderon, Michal authored Jul 06, 2017

Once in_dev_get is called to receive in_device pointer, the
in_device reference counter is increased, but if there are
no ipv4 addresses configured on the net-device the ifa_list
will be null, resulting in a flow that doesn't call in_dev_put
to decrease the ref_cnt.
This was exposed when running RoCE over ipv6 without any ipv4
addresses configured.
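
The leak pattern can be sketched in userspace as follows: every path that takes a reference must drop it, including the path where the address list is empty. The `demo_` names and the plain integer counter are hypothetical simplifications and are not the kernel's in_device API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device with a plain reference counter and an address list. */
struct demo_in_device {
    int refcnt;
    const int *ifa_list;   /* NULL when no IPv4 address is configured */
    int n_addrs;
};

struct demo_in_device *demo_dev_get(struct demo_in_device *dev)
{
    dev->refcnt++;
    return dev;
}

void demo_dev_put(struct demo_in_device *dev)
{
    dev->refcnt--;
}

/* Returns the first address or 0; the reference is dropped on every path. */
int demo_first_addr(struct demo_in_device *dev)
{
    struct demo_in_device *in_dev;
    int addr = 0;

    assert(dev != NULL);
    in_dev = demo_dev_get(dev);
    if (in_dev->ifa_list != NULL && in_dev->n_addrs > 0)
        addr = in_dev->ifa_list[0];
    demo_dev_put(in_dev);   /* reached even when the list is empty */
    return addr;
}
```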

Fixes: 8e3867310c90 ("IB/cma: Fix a race condition in iboe_addr_get_sgid()")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/iser: don't send an rkey if all data is written as immediate-data · e6e52aec

Sagi Grimberg authored Jul 06, 2017

We might get some bogus error completions in case the target
remotely invalidates the rkey and the HCA needs to retransmit
from this buffer.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

rxe: fix broken receive queue draining · 12171971

Vijay Immanuel authored Jun 27, 2017

If we modified the QP to the ERROR state and
drained the receive queue, post_recv must
trigger the responder task to complete
the drain work request.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/qedr: Prevent memory overrun in verbs' user responses · c75d3ec8

Amrani, Ram authored Jun 26, 2017

Wrap ib_copy_to_udata with a function that ensures that the data
being copied over to user space isn't longer than allowed.
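
A minimal sketch of such a wrapper, assuming the limit is the size of the caller-provided output buffer: copy the smaller of the response length and the buffer length. `demo_udata` and `demo_copy_to_udata` are hypothetical, and `memcpy` stands in for the real copy to user space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the caller's output buffer. */
struct demo_udata {
    void *outbuf;
    size_t outlen;
};

/* Copies at most udata->outlen bytes; returns the number of bytes copied. */
size_t demo_copy_to_udata(struct demo_udata *udata, const void *src, size_t len)
{
    size_t n;

    assert(udata != NULL && udata->outbuf != NULL && src != NULL);
    n = len < udata->outlen ? len : udata->outlen;
    memcpy(udata->outbuf, src, n);
    return n;
}
```

Clamping in one helper means each verb that builds a response does not have to repeat the length check.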

Fixes: cecbcddf ("qedr: Add support for QP verbs")
Fixes: a7efd777 ("qedr: Add support for PD,PKEY and CQ verbs")
Fixes: ac1b36e5 ("qedr: Add support for user context verbs")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

iw_cxgb4: don't use WR keys/addrs for 0 byte reads · 720336c4

Ganesh Goudar authored Jun 21, 2017

Only use the read sge lkey/addr and the remote rkey/addr if the
length of the read is not zero. Otherwise the read response might
be treated as the RTR read response and not delivered to the
application. Or worse, Terminator hardware will fail a 0B read
if the STAG is 0, even if the read length is 0.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/mlx4: Fix CM REQ retries in paravirt mode · 4542e3c7

Håkon Bugge authored Jun 20, 2017

CM REQs cannot be successfully retried, because a new pv_cm_id is
created for each request, without checking if one already exists.

By checking if an id exists before creating one, the bug is fixed.
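
The "look up before creating" idea can be sketched as below, assuming requests are identified by some stable key so that a retry maps to the same entry. The table, the `request_key` field, and `demo_get_or_create_id` are hypothetical; the returned index merely stands in for a pv_cm_id.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_IDS 8

/* Hypothetical table mapping a request key to an allocated id (the index). */
struct demo_id_map {
    unsigned int request_key[DEMO_MAX_IDS];
    int used;
};

/* Returns the existing id for the key, creating one only if none exists. */
int demo_get_or_create_id(struct demo_id_map *map, unsigned int key)
{
    int i;

    assert(map != NULL);
    for (i = 0; i < map->used; i++)
        if (map->request_key[i] == key)
            return i;             /* a retried request reuses its id */
    if (map->used == DEMO_MAX_IDS)
        return -1;                /* table full */
    map->request_key[map->used] = key;
    return map->used++;
}
```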

This bug can be provoked by running an RDMA CM user-land application
and inserting a five-second delay before the rdma_accept() call on
the passive side. This delay is larger than the default CMA timeout,
and triggers a retry from the active side. The retried REQ will use
another pv_cm_id (the cm_id on the wire). This confuses the CM
protocol and two REJs are sent from the passive side.

Here is an excerpt from ibdump running without the patch:

3.285092 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
7.382711 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
7.382861 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject
7.387644 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject

and here is the same with bug fix applied:

3.251010 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
7.349387 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello)
8.258443 LID: 4 -> LID: 4 SDP 290 CM: ConnectReply(SDP Hello)
8.259890 LID: 4 -> LID: 4 InfiniBand 290 CM: ReadyToUse
Suggested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reported-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/rdmavt: Setting of QP timeout can overflow jiffies computation · a25ce427

Kaike Wan authored Jun 17, 2017

The current computation of qp->timeout_jiffies in rvt_modify_qp() will
overflow because the input to the function usecs_to_jiffies() is only
32 bits wide (unsigned int). Overflow will occur when attr->timeout is
equal to or greater than 30. The consequence is unnecessary, excessive
retries and thus degraded system performance.

This patch fixes the problem by limiting the input to 5 bits and calling
usecs_to_jiffies() before multiplying by the scaling factor.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
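
The threshold of 30 can be checked with a short calculation. The sketch below assumes the usual InfiniBand timeout encoding of 4.096 microseconds times 2^timeout, which is not spelled out in the message, and computes the microsecond value in 64 bits to see when it no longer fits in a 32-bit argument. The function names are hypothetical, and clamping to 31 is one reading of "limiting the input to 5 bits".

```c
#include <assert.h>
#include <stdint.h>

/* Timeout in microseconds, 4.096 us * 2^timeout, computed in 64 bits. */
uint64_t demo_timeout_usecs(unsigned int timeout)
{
    assert(timeout < 32);
    return (4096ULL << timeout) / 1000ULL;
}

/* Nonzero if the scaled value would be truncated in a 32-bit argument. */
int demo_overflows_u32(unsigned int timeout)
{
    return demo_timeout_usecs(timeout) > UINT32_MAX;
}

/* Clamp to 5 bits; 2^timeout alone always fits in 32 bits, so converting
 * this value first and scaling afterwards avoids the truncation. */
uint32_t demo_unscaled_usecs(unsigned int timeout)
{
    if (timeout > 31)
        timeout = 31;
    return (uint32_t)1 << timeout;
}
```

With these definitions, a timeout of 29 still fits in 32 bits while 30 and 31 do not, which is consistent with the threshold stated in the message.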

IB/core: Fix sparse warnings · 266098b8

Matan Barak authored Jun 08, 2017

Delete unused variables to prevent sparse warnings.

Fixes: db1b5ddd ("IB/core: Rename uverbs event file structure")
Fixes: fd3c7904 ("IB/core: Change idr objects to use the new schema")
Signed-off-by: Doug Ledford <dledford@redhat.com>
