Commits · f0e8cb6106da27039cdc23ecf5b5a776d7c7e66e · Kirill Smelkov / linux

03 Jun, 2021 37 commits

nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP · f0e8cb61

Shai Malin authored Jun 02, 2021

This patch will present the structure for the NVMeTCP offload common
layer driver. This module is added under "drivers/nvme/host/" and future
offload drivers which will register to it will be placed under
"drivers/nvme/hw".
This new driver will be enabled by the Kconfig "NVM Express over Fabrics
TCP offload commmon layer".
In order to support the new transport type, for host mode, no change is
needed.

Each new vendor-specific offload driver will register to this ULP during
its probe function, by filling out the nvme_tcp_ofld_dev->ops and
nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev
with the initialized struct.

The internal implementation:
- tcp-offload.h:
  Includes all common structs and ops to be used and shared by offload
  drivers.

- tcp-offload.c:
  Includes the init function which registers as a NVMf transport just
  like any other transport.
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0e8cb61

Merge branch 'tipc-cleanups' · ae1d9cc3

David S. Miller authored Jun 03, 2021

Jon Maloy says:

====================
tipc: some small cleanups

We make some minor code cleanups and improvements.

v2: Changed value of TIPC_ANY_SCOPE macro in patch #3
    to avoid compiler warning
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ae1d9cc3

tipc: simplify handling of lookup scope during multicast message reception · 5ef21325

Jon Maloy authored Jun 02, 2021

We introduce a new macro TIPC_ANY_SCOPE to make the handling of the
lookup scope value more comprehensible during multicast reception.

The (unchanged) rules go as follows:

1) Multicast messages sent from own node are delivered to all matching
   sockets on the own node, irrespective of their binding scope.

2) Multicast messages sent from other nodes arrive here because they
   have found TIPC_CLUSTER_SCOPE bindings emanating from this node.
   Those messages should be delivered to exactly those sockets, but not
   to local sockets bound with TIPC_NODE_SCOPE, since the latter
   obviously were not meant to be visible for those senders.

3) Group multicast/broadcast messages are delivered to the sockets with
   a binding scope matching exactly the lookup scope indicated in the
   message header, and nobody else.
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ef21325

tipc: refactor function tipc_sk_anc_data_recv() · 62633c2f

Jon Maloy authored Jun 02, 2021

We refactor tipc_sk_anc_data_recv() to make it slightly more
comprehensible, but also to facilitate application of some additions
to the code in a future commit.
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62633c2f

tipc: eliminate redundant fields in struct tipc_sock · 14623e00

Jon Maloy authored Jun 02, 2021

We eliminate the redundant fields conn_type and conn_instance in
struct tipc_sock. On the connecting side, this information is already
present in the unused (after the connection is established) part of
the pre-allocated header, and on the accepting side, we put it there
when the new socket is created.
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14623e00

Merge branch 'QED-NVMeTCP-Offload' · eda1bc65

David S. Miller authored Jun 03, 2021

Shai Malin says:

====================
QED NVMeTCP Offload

Intro:
======
This is the qed part of Marvell’s NVMeTCP offload series, shared as
RFC series "NVMeTCP Offload ULP and QEDN Device Drive".
This part is a standalone series, and is not dependent on other parts
of the RFC.
The overall goal is to add qedn as the offload driver for NVMeTCP,
alongside the existing offload drivers (qedr, qedi and qedf for rdma,
iscsi and fcoe respectively).

In this series we are making the necessary changes to qed to enable this
by exposing APIs for FW/HW initializations.

The qedn series (and required changes to NVMe stack) will be sent to the
linux-nvme mailing list.
I have included more details on the upstream plan under section with the
same name below.

The Series Patches:
===================
1. qed: Add TCP_ULP FW resource layout – replacing iSCSI when common
   with NVMeTCP.
2. qed: Add NVMeTCP Offload PF Level FW and HW HSI.
3. qed: Add NVMeTCP Offload Connection Level FW and HW HSI.
4. qed: Add support of HW filter block – enables redirecting NVMeTCP
   traffic to the dedicated PF.
5. qed: Add NVMeTCP Offload IO Level FW and HW HSI.
6. qed: Add NVMeTCP Offload IO Level FW Initializations.
7. qed: Add IP services APIs support –VLAN, IP routing and reserving
   TCP ports for the offload device.

The NVMeTCP Offload:
====================
With the goal of enabling a generic infrastructure that allows NVMe/TCP
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
patch series introduces the nvme-tcp-offload ULP host layer, which will
be a new transport type called "tcp-offload" and will serve as an
abstraction layer to work with vendor specific nvme-tcp offload drivers.

NVMeTCP offload is a full offload of the NVMeTCP protocol, this includes
both the TCP level and the NVMeTCP level.

The nvme-tcp-offload transport can co-exist with the existing tcp and
other transports. The tcp offload was designed so that stack changes are
kept to a bare minimum: only registering new transports.
All other APIs, ops etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows clear and manageable
differentiation between the connections which should use the offload path
and those that are not offloaded (even on the same device).

The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

* NVMe layer: *

       [ nvme/nvme-fabrics/blk-mq ]
             |
        (nvme API and blk-mq API)
             |
             |
* Vendor agnostic transport layer: *

      [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
             |        |             |
           (Verbs)
             |        |             |
             |     (Socket)
             |        |             |
             |        |        (nvme-tcp-offload API)
             |        |             |
             |        |             |
* Vendor Specific Driver: *

             |        |             |
           [ qedr ]
                      |             |
                   [ qede ]
                                    |
                                  [ qedn ]

Performance:
============
With this implementation on top of the Marvell qedn driver (using the
Marvell FastLinQ NIC), we were able to demonstrate the following CPU
utilization improvement:

On AMD EPYC 7402, 2.80GHz, 28 cores:
- For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):
  Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with
  NVMeTCP offload.

On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:
- For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):
  Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with
  NVMeTCP offload.

In addition, we were able to demonstrate the following latency improvement:
- For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
  Improved the average latency from 105 usec with NVMeTCP SW to 39 usec
  with NVMeTCP offload.

  Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec
  with NVMeTCP offload.

The end-to-end offload latency was measured from fio while running against
back end of null device.

The Marvell FastLinQ NIC HW engine:
====================================
The Marvell NIC HW engine is capable of offloading the entire TCP/IP
stack and managing up to 64K connections per PF, already implemented and
upstream use cases for this include iWARP (by the Marvell qedr driver)
and iSCSI (by the Marvell qedi driver).
In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
and is able to manage the IO level also in case of TCP re-transmissions
and OOO events.
The HW engine enables direct data placement (including the data digest CRC
calculation and validation) and direct data transmission (including data
digest CRC calculation).

The Marvell qedn driver:
========================
The new driver will be added under "drivers/nvme/hw" and will be enabled
by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
As part of the qedn init, the driver will register as a pci device driver
and will work with the Marvell fastlinQ NIC.
As part of the probe, the driver will register to the nvme_tcp_offload
(ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
"qed_*_ops" which are used by the qede, qedr, qedf and qedi device
drivers.

Upstream Plan:
=============
The RFC series "NVMeTCP Offload ULP and QEDN Device Driver"
https://lore.kernel.org/netdev/20210531225222.16992-1-smalin@marvell.com/
was designed in a modular way so that part 1 (nvme-tcp-offload) and
part 2 (qed) are independent and part 3 (qedn) depends on both parts 1+2.

- Part 1 (RFC patch 1-8): NVMeTCP Offload ULP
  The nvme-tcp-offload patches, will be sent to
  'linux-nvme@lists.infradead.org'.

- Part 2 (RFC patches 9-15): QED NVMeTCP Offload
  The qed infrastructure, will be sent to 'netdev@vger.kernel.org'.

Once part 1 and 2 are accepted:

- Part 3 (RFC patches 16-27): QEDN NVMeTCP Offload
  The qedn patches, will be sent to 'linux-nvme@lists.infradead.org'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

eda1bc65

qed: Add IP services APIs support · 806ee7f8

Nikolay Assa authored Jun 02, 2021

This patch introduces APIs which the NVMeTCP Offload device (qedn)
will use through the paired net-device (qede).
It includes APIs for:
- ipv4/ipv6 routing
- get VLAN from net-device
- TCP ports reservation
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Nikolay Assa <nassa@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

806ee7f8

qed: Add NVMeTCP Offload IO Level FW Initializations · 826da486

Shai Malin authored Jun 02, 2021

This patch introduces the NVMeTCP FW initializations which is used
to initialize the IO level configuration into a per IO HW
resource ("task") as part of the IO path flow.

This includes:
- Write IO FW initialization
- Read IO FW initialization.
- IC-Req and IC-Resp FW exchange.
- FW Cleanup flow (Flush IO).
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

826da486

qed: Add NVMeTCP Offload IO Level FW and HW HSI · ab47bdfd

Shai Malin authored Jun 02, 2021

This patch introduces the NVMeTCP Offload FW and HW  HSI in order
to initialize the IO level configuration into a per IO HW
resource ("task") as part of the IO path flow.
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab47bdfd

qed: Add support of HW filter block · 203d136e

Prabhakar Kushwaha authored Jun 02, 2021

This patch introduces the functionality of HW filter block.
It adds and removes filters based on source and target TCP port.

It also add functionality to clear all filters at once.
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

203d136e

qed: Add NVMeTCP Offload Connection Level FW and HW HSI · 76684ab8

Shai Malin authored Jun 02, 2021

This patch introduces the NVMeTCP HSI and HSI functionality in order to
initialize and interact with the HW device as part of the connection level
HSI.

This includes:
- Connection offload: offload a TCP connection to the FW.
- Connection update: update the ICReq-ICResp params
- Connection clear SQ: outstanding IOs FW flush.
- Connection termination: terminate the TCP connection and flush the FW.
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

76684ab8

qed: Add NVMeTCP Offload PF Level FW and HW HSI · 897e87a1

Shai Malin authored Jun 02, 2021

This patch introduces the NVMeTCP device and PF level HSI and HSI
functionality in order to initialize and interact with the HW device.
The patch also adds qed NVMeTCP personality.

This patch is based on the qede, qedr, qedi, qedf drivers HSI.
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Dean Balandin <dbalandin@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

897e87a1

qed: Add TCP_ULP FW resource layout · 1bd4f571

Omkar Kulkarni authored Jun 02, 2021

Add TCP_ULP as a storage common TCP offload FW resource layout.
This will be used by the core driver (QED) for both the NVMeTCP and iSCSI.
Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bd4f571

nfc: mrvl: reduce the scope of local variables · 2c95e6c7

Krzysztof Kozlowski authored Jun 02, 2021

In two places the 'ep_desc' and 'skb' local variables are used only
within if() or for() block, so they scope can be reduced which makes the
entire code slightly easier to follow.  No functional change.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c95e6c7

nfc: mrvl: remove useless "continue" at end of loop · a5822404

Krzysztof Kozlowski authored Jun 02, 2021

The "continue" statement at the end of a for loop does not have an
effect.  Entire loop contents can be slightly simplified to increase
code readability.  No functional change.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5822404

Merge branch 'smc-next' · 81ac670a

David S. Miller authored Jun 03, 2021

Karsten Graul says:

====================
net/smc: updates 2021-06-02

Please apply the following patch series for smc to netdev's net-next tree.

Both patches are cleanups and remove unnecessary code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

81ac670a

net/smc: no need to flush smcd_dev's event_wq before destroying it · 5e4a43ce

Julian Wiedmann authored Jun 02, 2021

destroy_workqueue() already calls drain_workqueue(), which is a stronger
variant of flush_workqueue().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e4a43ce

net/smc: avoid possible duplicate dmb unregistration · f8e0a68b

Karsten Graul authored Jun 02, 2021

smc_lgr_cleanup() calls smcd_unregister_all_dmbs() as part of the link
group termination process. This is a leftover from the times when
smc_lgr_cleanup() scheduled a worker to actually free the link group.
Nowadays smc_lgr_cleanup() directly calls smc_lgr_free() without any
delay so an earlier dmb unregistration is no longer needed.
So remove smcd_unregister_all_dmbs() and the call to it.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8e0a68b

Merge branch 'xpcs-phylink_pcs_ops' · c356be05

David S. Miller authored Jun 03, 2021

Vladimir Oltean says:

====================
Convert xpcs to phylink_pcs_ops

Background: the sja1105 DSA driver currently drives a Designware XPCS
for SGMII and 2500base-X, and it would be nice to reuse some code with
the xpcs module. This would also help consolidate the phylink_pcs_ops,
since the only other dedicated PCS driver, currently, is the lynx_pcs.

Therefore, this series makes the xpcs expose the same kind of API that
the lynx_pcs module does. The main changes are getting rid of struct
mdio_xpcs_ops, being compatible with struct phylink_pcs_ops and being
less reliant on the phy_interface_t passed to xpcs_probe (now renamed to
xpcs_create).

This patch series is partially tested (some code paths have been covered
on the NXP SJA1105 and some others with the help of Vee Khee Wong on
Intel Tiger Lake / stmmac) but further testing on 10G setups would be
appreciated, if possible.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c356be05

net: pcs: xpcs: convert to phylink_pcs_ops · 11059740

Vladimir Oltean authored Jun 02, 2021

Since all the remaining members of struct mdio_xpcs_ops have direct
equivalents in struct phylink_pcs_ops, it is about time we remove it
altogether.

Since the phylink ops return void, we need to remove the error
propagation from the various xpcs methods and simply print an error
message where appropriate.

Since xpcs_get_state_c73() detects link faults and attempts to reset the
link on its own by calling xpcs_config(), but xpcs_config() now has a
lot of phylink arguments which are not needed and cannot be simply
fabricated by anybody else except phylink, the actual implementation has
been moved into a smaller xpcs_do_config().

The const struct mdio_xpcs_ops *priv->hw->xpcs has been removed, so we
need to look at the struct mdio_xpcs_args pointer now as an indication
whether the port has an XPCS or not.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11059740

net: pcs: xpcs: convert to mdio_device · 2cac15da

Vladimir Oltean authored Jun 02, 2021

Unify the 2 existing PCS drivers (lynx and xpcs) by doing a similar
thing on probe, which is to have a *_create function that takes a
struct mdio_device * given by the caller, and builds a private PCS
structure around that.

This changes stmmac to hold only a pointer to the xpcs, as opposed to
the full structure. This will be used in the next patch when struct
mdio_xpcs_ops is removed. Currently a pointer to struct mdio_xpcs_ops
is used as a shorthand to determine whether the port has an XPCS or not.
We can do the same now with the mdio_xpcs_args pointer.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2cac15da

net: pcs: xpcs: use mdiobus_c45_addr in xpcs_{read,write} · 679e283e

Vladimir Oltean authored Jun 02, 2021

Use the dedicated helper for abstracting away how the clause 45 address
is packed in reg_addr.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

679e283e

net: pcs: xpcs: export xpcs_probe · 8e2bb956

Vladimir Oltean authored Jun 02, 2021

Similar to the other recently functions, it is not necessary for
xpcs_probe to be a function pointer, so export it so that it can be
called directly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e2bb956

net: pcs: xpcs: export xpcs_config_eee · 14b517cb

Vladimir Oltean authored Jun 02, 2021

There is no good reason why we need to go through:

stmmac_xpcs_config_eee
-> stmmac_do_callback
   -> mdio_xpcs_ops->config_eee
      -> xpcs_config_eee

when we can simply call xpcs_config_eee.

priv->hw->xpcs is of the type "const struct mdio_xpcs_ops *" and is used
as a placeholder/synonym for priv->plat->mdio_bus_data->has_xpcs. It is
done that way because the mdio_bus_data pointer might or might not be
populated in all stmmac instantiations.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14b517cb

net: pcs: xpcs: export xpcs_validate · a1a753ed

Vladimir Oltean authored Jun 02, 2021

Calling a function pointer with a single implementation through
struct mdio_xpcs_ops is clunky, and the stmmac_do_callback system forces
this to return int, even though it always returns zero.

Simply remove the "validate" function pointer from struct mdio_xpcs_ops
and replace it with an exported xpcs_validate symbol which is called
directly by stmmac.

priv->hw->xpcs is of the type "const struct mdio_xpcs_ops *" and is used
as a placeholder/synonym for priv->plat->mdio_bus_data->has_xpcs. It is
done that way because the mdio_bus_data pointer might or might not be
populated in all stmmac instantiations.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1a753ed

net: pcs: xpcs: make the checks related to the PHY interface mode stateless · 9900074e

Vladimir Oltean authored Jun 02, 2021

The operating mode of the driver is currently to populate its
struct mdio_xpcs_args::supported and struct mdio_xpcs_args::an_mode
statically in xpcs_probe(), based on the passed phy_interface_t,
and work with those.

However this is not the operation that phylink expects from a PCS
driver, because the port might be attached to an SFP cage that triggers
changes of the phy_interface_t dynamically as one SFP module is
unpluggged and another is plugged.

To migrate towards that model, the struct mdio_xpcs_args should not
cache anything related to the phy_interface_t, but just look up the
statically defined, const struct xpcs_compat structure corresponding to
the detected PCS OUI/model number.

So we delete the "supported" and "an_mode" members of struct
mdio_xpcs_args, and add the "id" structure there (since the ID is not
expected to change at runtime).

Since xpcs->supported is used deep in the code in _xpcs_config_aneg_c73(),
we need to modify some function headers to pass the xpcs_compat from all
callers. In turn, the xpcs_compat is always supplied externally to the
xpcs module:
- Most of the time by phylink
- In xpcs_probe() it is needed because xpcs_soft_reset() writes to
  MDIO_MMD_PCS or to MDIO_MMD_VEND2 depending on whether an_mode is clause
  37 or clause 73. In order to not introduce functional changes related
  to when the soft reset is issued, we continue to require the initial
  phy_interface_t argument to be passed to xpcs_probe() so we can pass
  this on to xpcs_soft_reset().
- stmmac_open() wants to know whether to call stmmac_init_phy() or not,
  and for that it looks inside xpcs->an_mode, because the clause 73
  (backplane) AN modes supposedly do not have a PHY. Because we moved
  an_mode outside of struct mdio_xpcs_args, this is now no longer
  directly possible, so we introduce a helper function xpcs_get_an_mode()
  which protects the data encapsulation of the xpcs module and requires
  a phy_interface_t to be passed as argument. This function can look up
  the appropriate compat based on the phy_interface_t.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9900074e

net: pcs: xpcs: there is only one PHY ID · a54a8b71

Vladimir Oltean authored Jun 02, 2021

The xpcs driver has an apparently inadequate structure for the actual
hardware it drives.

These defines and the xpcs_probe() function would suggest that there is
one PHY ID per supported PHY interface type, and the driver simply
validates whether the mode it should operate in (the argument of
xpcs_probe) matches what the hardware is capable of:

	#define SYNOPSYS_XPCS_USXGMII_ID	0x7996ced0
	#define SYNOPSYS_XPCS_10GKR_ID		0x7996ced0
	#define SYNOPSYS_XPCS_XLGMII_ID		0x7996ced0
	#define SYNOPSYS_XPCS_SGMII_ID		0x7996ced0
	#define SYNOPSYS_XPCS_MASK		0xffffffff

but that is not the case, because upon closer inspection, all the above
4 PHY ID definitions are in fact equal.

So it is the same XPCS that is compatible with all 4 sets of PHY
interface types.

This change introduces an array of struct xpcs_compat which is populated
by the single struct xpcs_id instance. It also eliminates the bogus
defines for multiple Synopsys XPCS PHY IDs and replaces them with a
single XPCS_ID, which better reflects the way in which the hardware
operates.

Because we are touching this area of the code anyway, the new array of
struct xpcs_compat, as well as the array of xpcs_id, have been moved
towards the end of the file, since they are variable declarations not
definitions. If whichever of struct xpcs_compat or struct xpcs_id need
to gain a function pointer member in the future, it is easier to
reference functions (no forward declarations needed) if we have the
const variable declarations at the end of the file.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a54a8b71

net: pcs: xpcs: delete shim definition for mdio_xpcs_get_ops() · b81017ae

Vladimir Oltean authored Jun 02, 2021

CONFIG_STMMAC_ETH selects CONFIG_PCS_XPCS, so there should be no
situation where the shim should be needed.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b81017ae

Merge branch 'hdlc_cisco-cleanups' · b596ce68

David S. Miller authored Jun 03, 2021

Peng Li says:

====================
net: hdlc_cisco: clean up some code style issues

This patchset clean up some code style issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b596ce68

net: hdlc_cisco: remove redundant space · 4a20f8ec

Peng Li authored Jun 02, 2021

Space prohibited between function name and open parenthesis '('.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a20f8ec

net: hdlc_cisco: add blank line after declaration · 4e38d514

Peng Li authored Jun 02, 2021

This patch fixes the checkpatch error about missing a blank line
after declarations.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e38d514

net: hdlc_cisco: remove unnecessary out of memory message · 05ff5525

Peng Li authored Jun 02, 2021

This patch removes unnecessary out of memory message,
to fix the following checkpatch.pl warning:
"WARNING: Possible unnecessary 'out of memory' message"
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05ff5525

net: hdlc_cisco: add some required spaces · c1300f37

Peng Li authored Jun 02, 2021

Add spaces required after the close parenthesis '}'.
Add spaces required after that ','.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1300f37

net: hdlc_cisco: fix the code style issue about "foo* bar" · 001aa274

Peng Li authored Jun 02, 2021

Fix the checkpatch error as "foo* bar" and should be "foo *bar",
and "(foo*)" should be "(foo *)".
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

001aa274

net: hdlc_cisco: remove redundant blank lines · 5abaf211

Peng Li authored Jun 02, 2021

This patch removes some redundant blank lines.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5abaf211

libceph: Fix spelling mistakes · dd0d91b9

Zheng Yongjun authored Jun 02, 2021

Fix some spelling mistakes in comments:
enconding  ==> encoding
ambigous  ==> ambiguous
orignal  ==> original
encyption  ==> encryption
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd0d91b9

rtnetlink: Fix spelling mistakes · d467d0bc

Zheng Yongjun authored Jun 02, 2021

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d467d0bc

02 Jun, 2021 3 commits

Merge branch 'devlink-rate-objects' · 270d47dc

David S. Miller authored Jun 02, 2021

Dmytro Linkin says:

====================
devlink: rate objects API

Resending without RFC.

Currently kernel provides a way to change tx rate of single VF in
switchdev mode via tc-police action. When lots of VFs are configured
management of theirs rates becomes non-trivial task and some grouping
mechanism is required. Implementing such grouping in tc-police will bring
flow related limitations and unwanted complications, like:
- tc-police is a policer and there is a user request for a traffic
  shaper, so shared tc-police action is not suitable;
- flows requires net device to be placed on, means "groups" wouldn't
  have net device instance itself. Taking into the account previous
  point was reviewed a sollution, when representor have a policer and
  the driver use a shaper if qdisc contains group of VFs - such approach
  ugly, compilated and misleading;
- TC is ingress only, while configuring "other" side of the wire looks
  more like a "real" picture where shaping is outside of the steering
  world, similar to "ip link" command;

According to that devlink is the most appropriate place.

This series introduces devlink API for managing tx rate of single devlink
port or of a group by invoking callbacks (see below) of corresponding
driver. Also devlink port or a group can be added to the parent group,
where driver responsible to handle rates of a group elements. To achieve
all of that new rate object is added. It can be one of the two types:
- leaf - represents a single devlink port; created/destroyed by the
  driver and bound to the devlink port. As example, some driver may
  create leaf rate object for every devlink port associated with VF.
  Since leaf have 1to1 mapping to it's devlink port, in user space it is
  referred as pci/<bus_addr>/<port_index>;
- node - represents a group of rate objects; created/deleted by request
  from the userspace; initially empty (no rate objects added). In
  userspace it is referred as pci/<bus_addr>/<node_name>, where node name
  can be any, except decimal number, to avoid collisions with leafs.

devlink_ops extended with following callbacks:
- rate_{leaf|node}_tx_{share|max}_set
- rate_node_{new|del}
- rate_{leaf|node}_parent_set

KAPI provides:
- creation/destruction of the leaf rate object associated with devlink
  port
- destruction of rate nodes to allow a vendor driver to free allocated
  resources on driver removal or due to the other reasons when nodes
  destruction required

UAPI provides:
- dumping all or single rate objects
- setting tx_{share|max} of rate object of any type
- creating/deleting node rate object
- setting/unsetting parent of any rate object

Added devlink rate object support for netdevsim driver

Issues/open questions:
- Does user need DEVLINK_CMD_RATE_DEL_ALL_CHILD command to clean all
  children of particular parent node? For example:
  $ devlink port function rate flush netdevsim/netdevsim10/group
- priv pointer passed to the callbacks is a source of bugs; in leaf case
  driver can embed rate object into internal structure and use
  container_of() on it; in node case it cannot be done since nodes are
  created from userspace

v1->v2:
- fixed kernel-doc for devlink_rate_leaf_{create|destroy}()
- s/func/function/ for all devlink port command occurences

v2->v3:
- devlink:
  - added devlink_rate_nodes_destroy() function
- netdevsim:
  - added call of devlink_rate_nodes_destroy() function
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

270d47dc

Documentation: devlink rate objects · b62767e7

Dmytro Linkin authored Jun 02, 2021

Add devlink rate objects section at devlink port documentation.
Add devlink rate support info at netdevsim devlink documentation.
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b62767e7

selftest: netdevsim: Add devlink rate grouping test · 1a9c0482

Dmytro Linkin authored Jun 02, 2021

Test verifies that netdevsim correctly implements devlink ops callbacks
that set node as a parent of devlink leaf or node rate object.
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a9c0482