Commits · 3289025aedc018f8fd9d0e37fb9efa0c6d531ffa · Kirill Smelkov / linux

02 Jan, 2017 40 commits

RDS: add receive message trace used by application · 3289025a

Santosh Shilimkar authored Jul 04, 2016

Add a socket option to tap receive path latency at various stages,
in nanoseconds. It can be enabled on selected sockets using the
SO_RDS_MSG_RXPATH_LATENCY socket option. RDS will return
the data to the application with RDS_CMSG_RXPATH_LATENCY in a defined
format. Room is left to add more trace points in the future
without changing the interface.
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

3289025a
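
As a rough illustration of per-stage latency in nanoseconds, the sketch below derives stage latencies from timestamps taken at consecutive trace points. The function name and the idea of computing deltas in this way are assumptions made for illustration; the commit message does not say how RDS lays out or computes the values it returns.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Given timestamps (in ns) captured at n consecutive receive-path trace
 * points, store the latency of each stage (the difference between
 * adjacent points) into `delta`, which must have room for n - 1 entries.
 * Returns the number of deltas written.
 * Hypothetical helper, not part of the RDS interface.
 */
static size_t stage_latencies(const uint64_t *ts, size_t n, uint64_t *delta)
{
	if (n < 2)
		return 0;
	for (size_t i = 1; i < n; i++)
		delta[i - 1] = ts[i] - ts[i - 1];
	return n - 1;
}
```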

RDS: make message size limit compliant with spec · f9fb69ad

Avinash Repaka authored Feb 29, 2016

RDS supports a maximum message size of 1M, but the code doesn't check this
in all cases. This patch fixes it for RDMA and non-RDMA messages and for the
RDS MR size, and the limit is enforced irrespective of the underlying transport.
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

f9fb69ad
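
A minimal sketch of a transport-independent size check of the kind described above, assuming the 1M limit named in the message means 1 MiB. The macro and function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit: the 1M maximum message size, taken here as 1 MiB. */
#define RDS_MAX_MSG_SIZE_SKETCH ((size_t)1 << 20)

/*
 * Return true when a message of `len` bytes fits within the limit.
 * Nothing here depends on the transport, mirroring the statement that
 * the limit is enforced irrespective of the underlying transport.
 */
static bool rds_msg_size_ok(size_t len)
{
	return len <= RDS_MAX_MSG_SIZE_SKETCH;
}
```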

RDS: add stat for socket recv memory usage · 192a798f

Venkat Venkatsubra authored Jul 09, 2016

Tracks the receive-side memory added to sockets and removed from sockets.
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

192a798f

RDS: IB: fix panic due to handlers running post teardown · cf657269

Santosh Shilimkar authored Sep 29, 2016

The shutdown code's reaping loop takes care of emptying the
CQs before they are destroyed. Once the tasklets are
killed, the handlers are not expected to run.

But because of core tasklet code issues, a tasklet handler could
still run even after tasklet_kill().
The RDS IB shutdown code already reaps the CQs before freeing
cq/qp resources, so the handlers have nothing left
to do post shutdown.

On the other hand, any handler running after teardown and trying
to access already freed qp/cq resources causes issues.
This patch fixes the race by making sure that the handlers return
without any action post teardown.
Reviewed-by: Wengang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

cf657269
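
The guard described above can be sketched as a flag that the handler checks before touching anything. This is a user-space illustration with hypothetical names, not the code from the patch; a real fix also has to order setting the flag against freeing the resources.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection state; field names are illustrative only. */
struct conn_sketch {
	atomic_bool torn_down;   /* set once shutdown has reaped the CQs */
	int completions_handled; /* work the handler has done so far */
};

/* Mark the connection as torn down; resources may be freed after this. */
static void conn_teardown(struct conn_sketch *c)
{
	atomic_store(&c->torn_down, true);
}

/*
 * Completion handler: return without any action if it happens to run
 * after teardown, so freed resources are never touched.
 */
static void conn_handler(struct conn_sketch *c)
{
	if (atomic_load(&c->torn_down))
		return;
	c->completions_handled++;
}
```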

RDS: RDMA: Fix the composite message user notification · 941f8d55

Santosh Shilimkar authored Feb 18, 2016

When an application sends an RDS RDMA composite message consisting of
an RDMA transfer followed by a non-RDMA payload, it expects to
be notified *only* when the full message gets delivered. RDS RDMA
notification doesn't behave this way though.

Thanks to Venkat for debugging and root-causing the issue
where only the first part of the message (RDMA) was
successfully delivered but the remainder payload delivery failed.
In that case, the application should not be notified with
a false positive of message delivery success.

Fix this case by making sure the user gets notified only after
the full message delivery.
Reviewed-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

941f8d55
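
The rule stated above, notify success only after the full message is delivered, can be sketched as a predicate over the two parts of a composite message. The struct and function names are hypothetical and only model the decision, not the RDS data structures.

```c
#include <stdbool.h>

/* Hypothetical per-message delivery state for a composite message. */
struct composite_msg_sketch {
	bool rdma_done; /* RDMA transfer completed successfully */
	bool data_done; /* trailing non-RDMA payload delivered */
	bool failed;    /* any part reported a failure */
};

/* Report success only when both parts completed and nothing failed. */
static bool should_notify_success(const struct composite_msg_sketch *m)
{
	return !m->failed && m->rdma_done && m->data_done;
}
```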

RDS: IB: Add vector spreading for cqs · be2f76ea

Santosh Shilimkar authored Jul 04, 2016

Based on the available device vectors, allocate CQs accordingly to
get a better spread of completion vectors, which helps performance
a great deal.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

be2f76ea
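
One way to spread CQs over completion vectors is to pick the least-loaded vector each time a CQ is allocated. The sketch below assumes that policy and uses a hypothetical per-vector load array; the commit message does not spell out the exact selection rule.

```c
#include <stddef.h>

/*
 * Pick the completion vector with the fewest CQs attached so far and
 * account for the new CQ. `load` holds one counter per device vector.
 * Ties go to the lowest index. Returns the chosen vector index, or -1
 * if there are no vectors.
 */
static int pick_least_loaded_vector(unsigned int *load, size_t nvec)
{
	size_t best = 0;

	if (nvec == 0)
		return -1;
	for (size_t i = 1; i < nvec; i++) {
		if (load[i] < load[best])
			best = i;
	}
	load[best]++;
	return (int)best;
}
```

With three vectors and no prior load, successive calls return 0, 1, 2 and then wrap back to 0.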

RDS: IB: add few useful cache stats · 09b2b8f5

Santosh Shilimkar authored Jul 09, 2016

Tracks the ib receive cache total, incoming and frag allocations.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

09b2b8f5

RDS: IB: track and log active side endpoint in connection · 581d53c9

Santosh Shilimkar authored Jul 09, 2016

Useful to know the active and passive end points in an
RDS IB connection.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

581d53c9

RDS: RDMA: silence the use_once mr log flood · c536a068

Santosh Shilimkar authored Jul 03, 2016

In the absence of extension headers, the message log will keep
flooding the console. Even without use_once we can
clean up the MRs, so it is not really an error case;
make it a debug message.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

c536a068

RDS: IB: split the mr registration and invalidation path · 56012459

Santosh Shilimkar authored Mar 08, 2016

MR invalidation in RDS is done in background thread and not in
data path like registration. So break the dependency between them
which helps to remove the performance bottleneck.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

56012459

RDS: RDMA: return appropriate error on rdma map failures · 584a8279

Santosh Shilimkar authored Jul 04, 2016

The first message to a remote node should prompt a new
connection even if it is an RDMA operation. For an RDMA operation
the MR mapping can fail because the connection is not yet up.

Since connection establishment is asynchronous,
we make sure a map failure caused by an unavailable
connection reaches the user with an appropriate error code.
Before returning to the user, let's trigger the connection
so that it's ready for the next retry.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

584a8279
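
The flow described above can be sketched as follows: when the map fails because the connection is down, request the asynchronous connect and hand a retryable error back to the caller. All names are hypothetical, and `-EAGAIN` is an assumed error code chosen for illustration; the commit message does not name the code it returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical connection state used only for this sketch. */
struct conn_state_sketch {
	bool up;             /* connection is established */
	bool connect_kicked; /* asynchronous connect has been requested */
};

/*
 * Try to map an MR for an RDMA operation. If the connection is not up,
 * trigger the connect so the next retry can succeed and report a
 * retryable error (assumed here to be -EAGAIN).
 */
static int map_mr_sketch(struct conn_state_sketch *c)
{
	if (!c->up) {
		c->connect_kicked = true;
		return -EAGAIN;
	}
	return 0;
}
```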

RDS: RDMA: start rdma listening after init · 8d5d8a5f

Qing Huang authored Jul 04, 2016

This prevents RDS from handling incoming rdma packets before RDS
completes initializing its recv/send components.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

8d5d8a5f

RDS: RDMA: fix the ib_map_mr_sg_zbva() argument · 3e56c2f8

Santosh Shilimkar authored Dec 04, 2016

Fixes warning: Using plain integer as NULL pointer
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

3e56c2f8

RDS: IB: make the transport retry count smallest · fab8688d

Santosh Shilimkar authored Jul 04, 2016

Transport retry is not very useful since it indicates packet loss
in the fabric, so it's better to fail over fast rather than retry longer.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

fab8688d

RDS: IB: include faddr in connection log · ff3f19a2

Santosh Shilimkar authored Mar 14, 2016

Also use pr_* for it.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

ff3f19a2

RDS: mark few internal functions static to make sparse build happy · bb789763

Santosh Shilimkar authored Dec 04, 2016

Fixes below warnings:
warning: symbol 'rds_send_probe' was not declared. Should it be static?
warning: symbol 'rds_send_ping' was not declared. Should it be static?
warning: symbol 'rds_tcp_accept_one_path' was not declared. Should it be static?
warning: symbol 'rds_walk_conn_path_info' was not declared. Should it be static?
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

bb789763

RDS: log the address on bind failure · f69b22e6

Santosh Shilimkar authored Nov 04, 2015

It's useful to know the IP address when RDS fails to bind a
connection. Thus, adding it to the error message.

Orabug: 21894138
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

f69b22e6

Merge branch 'mlx5-odp' · 525dfa2c

David S. Miller authored Jan 02, 2017

Saeed Mahameed says:

====================
Mellanox mlx5 core and ODP updates 2017-01-01

The following eleven patches mainly come from Artemy Kovalyov
who expanded mlx5 on-demand-paging (ODP) support. In addition
there are three cleanup patches which don't change any functionality,
but are needed to align the codebase prior to accepting other patches.

A memory region (MR) in IB can be huge, and the ODP (on-demand paging)
technique allows the use of unpinned memory, which can be consumed and
released on demand. This lets applications avoid pinning down
the underlying physical pages of the address space, and saves them
the need to track the validity of the mappings.

Rather, the HCA requests the latest translations from the OS when pages
are not present, and the OS invalidates translations which are no longer
valid due to either non-present pages or mapping changes.

In the existing ODP implementation, applications need to register
memory buffers for communication, though registered memory regions
need not have valid mappings at registration time.

This patch set performs the following steps to expand
current ODP implementation:

1. It refactors UMR to support large regions, by introducing generic
   function to perform HCA translation table modifications. This
   function supports both atomic and process contexts and is not limited
   by number of modified entries.

   This function makes it possible to enable reallocated memory regions of
   arbitrary size, so MR cache buckets are added to support MRs of up to 16GB.

2. It changes the page fault event format and refactors the page fault
   logic, together with the addition of atomic support.

3. It prepares mlx5 core code to support implicit registration with
   simplified and relaxed semantics.

   Implicit ODP semantics allow applications to provide a special memory
   key that represents their complete address space. Thus all IO accesses
   referencing this key (with proper access rights associated with the key)
   wouldn't need to register any virtual address range.

Thanks,
        Artemy, Ilya and Leon

v1->v2:
  - Don't use 'inline' in .c files
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

525dfa2c

IB/mlx5: Improve MR check · aa8e08d2

Artemy Kovalyov authored Jan 02, 2017

Add "type" field to mlx5_core MKEY struct.
Check whether page fault happens on MKEY corresponding to MR.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa8e08d2

IB/mlx5: Add ODP atomics support · 17d2f88f

Artemy Kovalyov authored Jan 02, 2017

Handle ODP atomic operations. When the initiator of an RDMA atomic
operation uses an ODP MR to provide source data, handle the page fault properly.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17d2f88f

{net,IB}/mlx5: Refactor page fault handling · d9aaed83

Artemy Kovalyov authored Jan 02, 2017

* Update page fault event according to last specification.
* Separate code path for page fault EQ, completion EQ and async EQ.
* Move page fault handling work queue from mlx5_ib static variable
  into mlx5_core page fault EQ.
* Allocate memory to store ODP events dynamically as the
  events arrive; since this happens in atomic context, use a mempool.
* Make mlx5_ib page fault handler run in process context.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9aaed83

net/mlx5: Update PAGE_FAULT_RESUME layout · 223cdc72

Artemy Kovalyov authored Jan 02, 2017

Update PAGE_FAULT_RESUME command layout.

Three bit fields describing a page fault (rdma, rdma_write, req_res) gave 8
possible combinations, of which only a few were legal. Now they
are interpreted as a three-bit type field, where the formerly legal
combinations turn into corresponding types and the unused ones were added as new
types.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

223cdc72
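
To illustrate how three one-bit flags map onto a single three-bit type value, the sketch below packs them into one field. The bit positions are an assumption made for illustration and are not the hardware layout; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Combine three one-bit flags into a single three-bit type value.
 * Assumed positions: rdma = bit 2, rdma_write = bit 1, req_res = bit 0.
 * This ordering is illustrative only.
 */
static uint8_t pf_type_from_flags(unsigned int rdma, unsigned int rdma_write,
				  unsigned int req_res)
{
	return (uint8_t)(((rdma & 1u) << 2) |
			 ((rdma_write & 1u) << 1) |
			 (req_res & 1u));
}
```

All 8 combinations yield distinct values from 0 to 7, which is why every combination, including the formerly unused ones, can serve as a type.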

IB/mlx5: Add MR cache for large UMR regions · 7d0cc6ed

Artemy Kovalyov authored Jan 02, 2017

In this change we turn mlx5_ib_update_mtt() into the generic
mlx5_ib_update_xlt() to perform HCA translation table modifications,
supporting both atomic and process contexts and not limited by the number
of modified entries.
Using this function we increase preallocated MRs up to 16GB.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d0cc6ed
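
To give a sense of scale for a 16GB MR, the arithmetic below counts the translation entries it would need. The 4 KiB page size and the 8-byte entry size are assumptions for illustration and do not come from the commit message; the helper names are hypothetical.

```c
#include <stdint.h>

/* Entries needed to cover `bytes` with pages of `page_size` (rounded up). */
static uint64_t xlt_entries(uint64_t bytes, uint64_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}

/* Table size in bytes for a given entry count and per-entry size. */
static uint64_t xlt_bytes(uint64_t entries, uint64_t entry_size)
{
	return entries * entry_size;
}
```

Under those assumptions, 16 GiB of 4 KiB pages needs 4194304 entries, or 32 MiB of table, which suggests why an update routine that is not limited by the number of modified entries matters here.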

IB/mlx5: Add support for big MRs · c438fde1

Artemy Kovalyov authored Jan 02, 2017

Make use of extended UMR translation offset.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c438fde1

IB/mlx5: Refactor UMR post send format · 31616255

Artemy Kovalyov authored Jan 02, 2017

* Update struct mlx5_wqe_umr_ctrl_seg.
* Currently UMR send_flags target only certain use cases: enabling/disabling
  a cached MR and modifying the XLT for ODP. Making the flags independent makes
  UMR more flexible, allowing arbitrary manipulations.
* Since different UMR formats have different entry sizes UMR request
  should receive exact size of translation table update instead of
  number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr
  and update relevant code accordingly.
* Add support of length64 bit.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31616255

net/mlx5: Support new MR features · bcda1aca

Artemy Kovalyov authored Jan 02, 2017

This patch adds the following items to IFC file.

1. MLX5_MKC_ACCESS_MODE_KSM enum value for creating KSM memory keys.
The KSM access mode is used when an indirect MKey is associated with
entries of fixed memory size.

2. null_mkey field that is used to indicate non-present KLM/KSM
entries; accessing such an entry causes the device to generate a page
fault event.

3. struct mlx5_ifc_cmd_hca_cap_bits capability bits indicating
related value/field is supported:
* fixed_buffer_size - MLX5_MKC_ACCESS_MODE_KSM
* umr_extended_translation_offset - translation_offset_42_16
    in UMR ctrl segment
* null_mkey - null_mkey in QUERY_SPECIAL_CONTEXTS
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcda1aca

IB/mlx5: Add helper mlx5_ib_post_send_wait · d5ea2df9

Binoy Jayan authored Jan 02, 2017

Clean up the following common code (to post a list of work requests to the
send queue of the specified QP) at various places and add a helper function
'mlx5_ib_post_send_wait' to implement the same.

 - Initialize 'mlx5_ib_umr_context' on stack
 - Assign "mlx5_umr_wr:wr:wr_cqe" to umr_context.cqe
 - Acquire the semaphore
 - Call ib_post_send with a single ib_send_wr
 - wait_for_completion()
 - Check for umr_context.status
 - Release the semaphore
Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5ea2df9
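
The sequence of steps listed above can be modelled in user space as shown below. This is a single-threaded sketch with hypothetical types: the semaphore is a plain counter, and the completion is delivered inline by the post callback, so the wait step collapses into the call. It is not the kernel helper itself.

```c
enum umr_status_sketch { UMR_PENDING, UMR_OK, UMR_ERR };

/* Hypothetical stand-in for the on-stack UMR context. */
struct umr_ctx_sketch {
	enum umr_status_sketch status;
};

/* Hypothetical send queue: a slot counter plus a post callback. */
struct sendq_sketch {
	int sem; /* free slots; models the semaphore as a counter */
	int (*post)(struct umr_ctx_sketch *ctx); /* completes inline here */
};

/* Example post callbacks used to exercise the helper. */
static int post_ok(struct umr_ctx_sketch *ctx)
{
	ctx->status = UMR_OK;
	return 0;
}

static int post_bad(struct umr_ctx_sketch *ctx)
{
	ctx->status = UMR_ERR;
	return 0;
}

static int post_send_wait_sketch(struct sendq_sketch *q)
{
	struct umr_ctx_sketch ctx = { .status = UMR_PENDING }; /* init on stack */
	int err;

	if (q->sem <= 0)
		return -1;   /* a real semaphore would block here instead */
	q->sem--;            /* acquire */
	err = q->post(&ctx); /* post a single work request and "wait" */
	if (!err && ctx.status != UMR_OK)
		err = -2;    /* completed, but with a failure status */
	q->sem++;            /* release on every path */
	return err;
}
```

Keeping acquire and release inside one helper means no caller can forget to release the slot on an error path.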

IB/mlx5: Reorder code in query device command · 9f885201

Leon Romanovsky authored Jan 02, 2017

The order of features exposed by private mlx5-abi.h
file is CQE zipping, packet pacing and multi-packet WQE.

The internal order implemented in mlx5_ib_query_device() is
multi-packet WQE, CQE zipping and packet pacing.

This difference hurts code readability, so let's sync them,
taking the order in mlx5-abi.h (exposed to userspace) as the primary
one.

This commit doesn't change any functionality.
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f885201

net/mlx5: Fix offset naming for reserved fields in hca_cap_bits · 7b13558f

Max Gurtovoy authored Jan 02, 2017

Fix offset for reserved fields.

Fixes: 7486216b ("{net,IB}/mlx5: mlx5_ifc updates")
Fixes: b4ff3a36 ("net/mlx5: Use offset based reserved field names in the IFC header file")
Fixes: 7d5e1423 ("net/mlx5: Update mlx5_ifc hardware features")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b13558f

Merge tag 'wireless-drivers-next-for-davem-2017-01-02' of... · 85eb018f

David S. Miller authored Jan 02, 2017

Merge tag 'wireless-drivers-next-for-davem-2017-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.11

The most notable change here is the inclusion of airtime fairness
scheduling to ath9k. It prevents slow clients from hogging all the
airtime and unfairly slowing down faster clients.

Otherwise smaller changes and cleanup.

Major changes:

ath9k

* cleanup eeprom endian handling
* add airtime fairness scheduling

ath10k

* fix issues for new QCA9377 firmware version
* support dev_coredump() for firmware crash dump
* enable channel 169 on 5 GHz band
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

85eb018f

net: stmmac: remove unused duplicate property snps,axi_all · 31b95c9b

Niklas Cassel authored Dec 30, 2016

For core revision 3.x Address-Aligned Beats is available in two registers.
The DT property snps,aal was created for AAL in the DMA bus register,
which is a read/write bit.
The DT property snps,axi_all was created for AXI_AAL in the AXI bus mode
register, which is a read only bit that reflects the value of AAL in the
DMA bus register.

Since the value of snps,axi_all is never used in the driver,
and since the property was created for a bit that is read only,
it should be safe to remove the property.
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31b95c9b

Merge branch 'qed-driver-updates' · 4b64e1a4

David S. Miller authored Jan 01, 2017

Yuval Mintz says:

====================
qed*: Driver updates

The more interesting changes in this series include:
  - Restructuring of the qede files - qede_main.c has grown big and this
    series splits it into 3 parts [patches #2 and #3].
  - Some significant changes in the API through which RSS indirection
    table gets configured [#8].
  - Support for ndo_set_vf_trust() [#9] which would regulate which VFs
    are allowed to use promisc/multi-promisc mode.

It also contains various minor changes to qed/qede, as well as
non-functional changes [#1, #12] to complement other changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4b64e1a4

qed*: Advance driver versions to 8.10.10.20. · ce742922

Mintz, Yuval authored Jan 01, 2017

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce742922

qed: Conserve RDMA resources when !QEDR · 1fe582ec

Ram Amrani authored Jan 01, 2017

If qedr isn't part of the kernel then don't allocate RDMA resources
for it in qed.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fe582ec

qed: Support Multicast on Tx-switching · 88067876

Mintz, Yuval authored Jan 01, 2017

Currently, multicast traffic isn't routed internally to
listeners; instead it is only sent to the network via the
physical carrier.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

88067876

qed*: Add support for ndo_set_vf_trust · f990c82c

Mintz, Yuval authored Jan 01, 2017

Trusted VFs would be allowed to receive promiscuous and
multicast promiscuous data.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f990c82c

qed*: RSS indirection based on queue-handles · f29ffdb6

Mintz, Yuval authored Jan 01, 2017

A step toward having qede agnostic to the queue configurations
in firmware/hardware - let the RSS indirections use queue handles
instead of actual queue indices.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f29ffdb6

qede: Remove unnecessary datapath dereference · 04e0fd00

Mintz, Yuval authored Jan 01, 2017

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04e0fd00

qede - mark SKB as encapsulated · 7ca547bd

Manish Chopra authored Jan 01, 2017

When the driver receives a recognized encapsulated packet it needs
to set the skb->encapsulation field as well.
Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ca547bd

qede: Postpone reallocation until NAPI end · e3eef7ee

Mintz, Yuval authored Jan 01, 2017

During the Rx flow the driver allocates a replacement buffer each time
it consumes an Rx buffer. If the allocation fails, it re-posts the
currently processed buffer on the ring.
As a result, the Rx ring is always completely full [from driver POV].

We now allow the Rx ring to shorten by doing the re-allocations
at the end of the NAPI run. The only limitation is that we still want to
make sure each time we reallocate that we'd still have sufficient
elements in the Rx ring to guarantee that FW would be able to post
additional data and trigger an interrupt.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3eef7ee
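
The idea of deferring refills to the end of a poll run while keeping a minimum number of posted buffers can be sketched with simple counters. The struct, the floor value and the behaviour at the floor are assumptions for illustration; a real ring tracks descriptors and DMA mappings rather than counts.

```c
#include <stddef.h>

/* Hypothetical Rx ring accounting. */
struct rx_ring_sketch {
	size_t capacity;   /* total slots in the ring */
	size_t filled;     /* slots currently holding a posted buffer */
	size_t min_filled; /* floor so firmware can still post data */
};

/*
 * Consume one buffer during the poll loop without replacing it yet.
 * Returns -1 at the floor, where the caller is assumed to re-post the
 * buffer instead of letting the ring shrink further.
 */
static int rx_consume(struct rx_ring_sketch *r)
{
	if (r->filled <= r->min_filled)
		return -1;
	r->filled--;
	return 0;
}

/*
 * End-of-poll refill: post as many buffers as the ring is missing,
 * bounded by `can_alloc`, the number the allocator can supply now.
 * Returns how many buffers were posted.
 */
static size_t rx_refill(struct rx_ring_sketch *r, size_t can_alloc)
{
	size_t want = r->capacity - r->filled;
	size_t got = want < can_alloc ? want : can_alloc;

	r->filled += got;
	return got;
}
```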