Commits · c9ad20573a91ecfce45404bd0e33913b476613c5 · Kirill Smelkov / linux

20 Apr, 2021 40 commits

Merge branch 'mlxsw-refactor-qdisc-offload' · c9ad2057

David S. Miller authored Apr 20, 2021

Petr Machata says:

====================
mlxsw: Refactor qdisc offload

Currently, mlxsw admits for offload a suitable root qdisc, and its
children. Thus up to two levels of hierarchy are offloaded. Often, this is
enough: one can configure TCs with RED and TCs with a shaper, and can even
see counters for each TC by looking at a qdisc at a sufficiently shallow
position.

While simple, the system has obvious shortcomings. It is not possible to
configure both RED and shaping on one TC. It is not possible to place a
PRIO below root TBF, which would then be offloaded as port shaper. FIFOs
are only offloaded at root or directly below, which is confusing to users,
because RED and TBF of course have their own FIFO.

This patchset is a step towards the end goal of allowing more comprehensive
qdisc tree offload and cleans up the qdisc offload code.

- Patches #1-#4 contain small cleanups.

- Up until now, since mlxsw offloaded only very simple qdisc
  configurations, basically all bookkeeping was done using one container
  for the root qdisc, and 8 containers for its children. Patches #5, #6, #8
  and #9 gradually introduce a more dynamic structure, where parent-child
  relationships are tracked directly at qdiscs, instead of being implicit.

- This tree management assumes only one qdisc is created at a time. In FIFO
  handlers, this condition was enforced simply by asserting RTNL lock. But
  instead of furthering this RTNL dependence, patch #7 converts the whole
  qdisc offload logic to a per-port mutex.

- Patch #10 adds a selftest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: sch_red_ets: Test proper counter cleaning in ETS · 0a4d0cb1

Petr Machata authored Apr 20, 2021

There was a bug introduced during the rework which caused a non-zero
backlog to get stuck at ETS. Introduce a selftest that would have caught
the issue earlier.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Index future FIFOs by band number · 7de85b04

Petr Machata authored Apr 20, 2021

mlxsw used to hold an array of qdiscs indexed by the TC number. In the
previous patch, it was changed to allocate child qdiscs dynamically, and
they are now indexed by band number. Follow suit with the array of future
FIFOs.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Allocate child qdiscs dynamically · 5cbd9602

Petr Machata authored Apr 20, 2021

Instead of keeping qdiscs in globally-preallocated arrays, introduce a
per-qdisc-kind value num_classes, and then allocate the necessary child
qdiscs (if any) based on that value. Since now dynamic allocation is
involved, mlxsw_sp_qdisc_replace() gets messy enough that it is worth it to
split it to two cases: a new qdisc allocation and a change of existing
qdisc. (Note that the change also includes what TC formally calls replace,
if the qdisc kind is the same.)
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Guard all qdisc accesses with a lock · cff99e20

Petr Machata authored Apr 20, 2021

The FIFO handler currently guards accesses to the future FIFO tracking by
asserting RTNL. In the future, the changes to the qdisc state will be more
thorough, so other qdiscs will need this guarding as well. In order
to not further the RTNL infestation, instead convert to a custom lock that
will guard accesses to the qdisc state.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Track children per qdisc · 51d52ed9

Petr Machata authored Apr 20, 2021

mlxsw currently allows a two-level structure of qdiscs: the root and
possibly a number of children. In order to support offloading more general
qdisc trees, introduce to struct mlxsw_sp_qdisc a pointer to child qdiscs.
Refer to the child qdiscs through this pointer, instead of going through
the tclass_qdiscs in qdisc_state. Additionally introduce a field
num_classes, which holds the number of a given qdisc's children.

Also introduce a generic function for walking qdisc trees. Rewrite
mlxsw_sp_qdisc_find() and _find_by_handle() to use the generic walker.

For now, keep the qdisc_state.tclass_qdiscs, and just point root_qdisc's
children to this array. Following patches will make the allocation dynamic.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
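
The child-pointer layout and the generic tree walker described in this
commit can be sketched roughly as below. This is a simplified illustration
and not the driver code: struct sketch_qdisc and every sketch_* name are
hypothetical, and only the child pointer and the num_classes field are
taken from the commit message. The convention that handle 0 marks an unused
slot is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlxsw_sp_qdisc: only a handle plus the
 * child pointer and num_classes field that the commit message mentions. */
struct sketch_qdisc {
	unsigned int handle;         /* 0 marks an unused slot in this sketch */
	struct sketch_qdisc *qdiscs; /* array of num_classes children, or NULL */
	unsigned int num_classes;
};

/* Callback type: return non-NULL to stop the walk with that result. */
typedef struct sketch_qdisc *(*sketch_pre_fn)(struct sketch_qdisc *q,
					      void *data);

/* Generic pre-order walk over the tree rooted at q. */
static struct sketch_qdisc *sketch_qdisc_walk(struct sketch_qdisc *q,
					      sketch_pre_fn pre, void *data)
{
	struct sketch_qdisc *found;
	unsigned int i;

	assert(pre != NULL);
	if (!q)
		return NULL;

	found = pre(q, data);
	if (found)
		return found;

	if (!q->qdiscs)
		return NULL;

	for (i = 0; i < q->num_classes; i++) {
		found = sketch_qdisc_walk(&q->qdiscs[i], pre, data);
		if (found)
			return found;
	}
	return NULL;
}

/* Walker callback: match a used slot whose handle equals *data. */
static struct sketch_qdisc *sketch_match_handle(struct sketch_qdisc *q,
						void *data)
{
	unsigned int handle = *(unsigned int *)data;

	return (q->handle != 0 && q->handle == handle) ? q : NULL;
}

/* Lookup by handle, expressed through the generic walker. */
static struct sketch_qdisc *sketch_find_by_handle(struct sketch_qdisc *root,
						  unsigned int handle)
{
	return sketch_qdisc_walk(root, sketch_match_handle, &handle);
}
```

Because the walker only follows the per-qdisc child pointer, the same
lookup keeps working whether the children live in a shared array or are
allocated per qdisc.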

mlxsw: spectrum_qdisc: Promote backlog reduction to mlxsw_sp_qdisc_destroy() · b21832b5

Petr Machata authored Apr 20, 2021

When a qdisc is removed, it is necessary to update the backlog value at its
parent--unless the qdisc is at root position. RED, TBF and FIFO all do
that, each separately. Since all of them need to do this, just promote the
operation directly to mlxsw_sp_qdisc_destroy(), instead of deferring it to
individual destructors. Since FIFO dtor thus becomes trivial, remove it.

Add struct mlxsw_sp_qdisc.parent to point at the parent qdisc. This will be
handy later as deeper structures are offloaded. Use the parent qdisc to
find the chain of parents whose backlog value needs to be updated.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
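
As a rough sketch of the parent-chain update described above: when a qdisc
goes away, its backlog is subtracted from every ancestor. The type and
function names are hypothetical, and the sketch assumes that each node's
backlog value already includes the backlog of its descendants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced qdisc node: just a parent link and a backlog. */
struct sketch_node {
	struct sketch_node *parent; /* NULL when the node is at root position */
	uint64_t backlog;           /* assumed to include descendants' backlog */
};

/* Subtract q's backlog from every ancestor, walking up to the root. */
static void sketch_reduce_parent_backlog(struct sketch_node *q)
{
	struct sketch_node *p;

	assert(q != NULL);
	for (p = q->parent; p; p = p->parent)
		p->backlog -= q->backlog;
}

/* Common destroy path: the backlog fix-up happens once here rather than
 * in each qdisc kind's own destructor. */
static void sketch_destroy(struct sketch_node *q)
{
	sketch_reduce_parent_backlog(q);
	q->backlog = 0;
	q->parent = NULL;
}
```

For a node at root position the loop body never runs, which matches the
"unless the qdisc is at root position" exception in the message.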

mlxsw: spectrum_qdisc: Track tclass_num as int, not u8 · 017a131c

Petr Machata authored Apr 20, 2021

tclass_num is just a number, a value that would be ordinarily passed around
as an int. (Which is unlike a u8 prio_bitmap.) In several places,
tclass_num already is an int. Convert the remaining instances.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Drop an always-true condition · 549f2aae

Petr Machata authored Apr 20, 2021

The function mlxsw_sp_qdisc_compare() is invoked a couple lines above this
check, which will bounce any requests where this condition does not hold.
Therefore drop it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Simplify mlxsw_sp_qdisc_compare() · 290fe2c5

Petr Machata authored Apr 20, 2021

The purpose of this function is to filter out events that are related to
qdiscs that are not offloaded, or are not offloaded anymore. But the
function is unnecessarily thorough:

- mlxsw_sp_qdisc pointer is never NULL in the context where it is called
- Two qdiscs with the same handle will never have different types. Even
  when replacing one qdisc with another in the same class, Linux will not
  permit handle reuse unless the qdisc type also matches.

Simplify the function by omitting these two unnecessary conditions.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_qdisc: Drop one argument from check_params callback · 17c0e6d1

Petr Machata authored Apr 20, 2021

The mlxsw_sp_qdisc argument is not used in any of the actual callbacks.
Drop it.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

korina: Fix build. · 790aad0e

David S. Miller authored Apr 20, 2021

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'marvell-phy-hwmon' · b015f4ef

David S. Miller authored Apr 20, 2021

Marek Behún says:

====================
net: phy: marvell: some HWMON updates

Here are some updates for Marvell PHY HWMON, mainly
- refactoring for code deduplication
- Amethyst PHY support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: add support for Amethyst internal PHY · a978f7c4

Marek Behún authored Apr 20, 2021

Add support for Amethyst internal PHY.

The only difference from Peridot is HWMON.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: simulate Amethyst PHY model number · c5d015b0

Marek Behún authored Apr 20, 2021

Amethyst internal PHYs also report an empty model number in MII_PHYSID2.

Fill in switch product number, as is done for Topaz and Peridot.
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: use assignment by bitwise AND operator · 00218173

Marek Behún authored Apr 20, 2021

Use the &= operator instead of
  ret = ret & ...
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: fix HWMON enable register for 6390 · 4f920c29

Marek Behún authored Apr 20, 2021

Register 27_6.15:14 has the following description in 88E6393X
documentation:
  Temperature Sensor Enable
    0x0 - Sample every 1s
    0x1 - Sense rate decided by bits 10:8 of this register
    0x2 - Use 26_6.5 (One shot Temperature Sample) to enable
    0x3 - Disable

This is compatible with how the 6390 code uses this register currently,
but the 6390 code handles it as two 1-bit registers (somewhat), instead
of one register with 4 possible values.

(A newer version of the 6390 documentation removed temperature sensor
 section completely. In an older version, the above mentioned register
 is reserved, although it is R/W. Since the code works, I think we can
 assume that it is correct.)

Rename this register and define all 4 values according to 6393X
documentation.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
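
Treating bits 15:14 as one 2-bit field with four values, rather than as two
independent bits, could look like the sketch below. The macro, enum and
function names are hypothetical; only the bit position and the meaning of
the four values come from the register description quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the 2-bit Temperature Sensor Enable field. */
#define SKETCH_TEMP_SENSOR_SHIFT 14
#define SKETCH_TEMP_SENSOR_MASK  (0x3u << SKETCH_TEMP_SENSOR_SHIFT)

enum sketch_temp_sensor_mode {
	SKETCH_TEMP_SAMPLE_1S = 0x0, /* sample every 1s */
	SKETCH_TEMP_RATE_10_8 = 0x1, /* rate decided by bits 10:8 */
	SKETCH_TEMP_ONESHOT   = 0x2, /* enabled through 26_6.5 */
	SKETCH_TEMP_DISABLE   = 0x3, /* sensor disabled */
};

/* Replace only the 15:14 field, leaving the other register bits intact. */
static uint16_t sketch_set_mode(uint16_t reg, enum sketch_temp_sensor_mode mode)
{
	assert((unsigned int)mode <= 0x3u);
	return (uint16_t)((reg & ~SKETCH_TEMP_SENSOR_MASK) |
			  ((unsigned int)mode << SKETCH_TEMP_SENSOR_SHIFT));
}

/* Extract the 15:14 field as one of the four modes. */
static enum sketch_temp_sensor_mode sketch_get_mode(uint16_t reg)
{
	return (enum sketch_temp_sensor_mode)
		((reg & SKETCH_TEMP_SENSOR_MASK) >> SKETCH_TEMP_SENSOR_SHIFT);
}
```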

net: phy: marvell: refactor HWMON OOP style · 41d26bf4

Marek Behún authored Apr 20, 2021

Use a structure of Marvell PHY specific HWMON methods to reduce code
duplication. Store a pointer to this structure into the PHY driver's
driver_data member.
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
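
The general pattern, a table of per-chip methods reached through the
driver's driver_data pointer, can be sketched as follows. All sketch_*
names are hypothetical, the two-entry driver table is invented for
illustration, and the raw-to-temperature conversions are placeholders
rather than real Marvell formulas.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chip method table. */
struct sketch_hwmon_ops {
	long (*raw_to_mdegc)(unsigned int raw); /* raw reading to millidegrees C */
};

/* Reduced stand-in for a PHY driver entry carrying a driver_data pointer. */
struct sketch_phy_driver {
	const char *name;
	const void *driver_data;
};

/* Placeholder conversions, made up for this sketch. */
static long sketch_conv_a(unsigned int raw) { return ((long)raw - 5) * 5000; }
static long sketch_conv_b(unsigned int raw) { return ((long)raw - 75) * 1000; }

static const struct sketch_hwmon_ops sketch_ops_a = {
	.raw_to_mdegc = sketch_conv_a,
};
static const struct sketch_hwmon_ops sketch_ops_b = {
	.raw_to_mdegc = sketch_conv_b,
};

static const struct sketch_phy_driver sketch_drivers[] = {
	{ .name = "sketch-phy-a", .driver_data = &sketch_ops_a },
	{ .name = "sketch-phy-b", .driver_data = &sketch_ops_b },
};

/* One shared read path; chip differences are confined to the ops table. */
static long sketch_hwmon_read_temp(const struct sketch_phy_driver *drv,
				   unsigned int raw)
{
	const struct sketch_hwmon_ops *ops = drv->driver_data;

	assert(ops != NULL && ops->raw_to_mdegc != NULL);
	return ops->raw_to_mdegc(raw);
}
```

Supporting another chip then means adding one ops table and one driver
entry, with no change to the shared read path.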

korina: Fix conflict with global symbol desc_empty on x86. · 56e2e5de

David S. Miller authored Apr 20, 2021

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2021-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ff254dad

David S. Miller authored Apr 20, 2021

Saeed Mahameed says:

====================
mlx5-updates-2021-04-19

This patchset provides some updates to mlx5e and mlx5 SW steering drivers:

1) Tariq and Vladyslav both provide some trivial updates to the mlx5e netdev.

The next 12 patches in the patchset are focused toward mlx5 SW steering:
2) 3 trivial cleanup patches

3) Dynamic Flex parser support:
   Flex parser is a HW parser that can support protocols that are not
    natively supported by the HCA, such as Geneve (TLV options) and GTP-U.
    There are 8 such parsers, and each of them can be assigned to parse a
    specific set of protocols.

4) Enable matching on Geneve TLV options

5) Use Flex parser for MPLS over UDP/GRE

6) Enable matching on tunnel GTP-U and GTP-U first extension
   header using dynamic flex parser

7) Improved QoS for SW steering internal QPair for a better insertion rate
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: felix: disable always guard band bit for TAS config · 316bcffe

Xiaoliang Yang authored Apr 19, 2021

The ALWAYS_GUARD_BAND_SCH_Q bit in the TAS config register is described as
follows:
	0: Guard band is implemented for nonschedule queues to schedule
	   queues transition.
	1: Guard band is implemented for any queue to schedule queue
	   transition.

The driver previously enabled the guard band for any queue to schedule
queue transition, which makes each GCL time slot reserve a guard band long
enough to pass a max SDU frame. Because the guard band time cannot be set
in tc-taprio at present, about 12000ns is used to pass a 1500B max SDU.
This limits each GCL time interval to more than 12000ns.

This patch changes the guard band so that it is implemented only for
nonschedule queues to schedule queues transitions, so there is no need to
reserve a guard band in each GCL entry. Users can manually add guard band
time for each schedule queue in their configuration if they want.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
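
The 12000ns figure is consistent with the wire time of a 1500-byte frame at
1 Gbit/s. The link speed is an assumption here, as the commit message does
not state it, and the hypothetical helper below ignores preamble, headers
and inter-frame gap, in line with the "about" in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time in ns to serialize frame_bytes at link_mbps.
 * bits / (Mbit/s) gives microseconds, so multiply by 1000 for ns. */
static uint64_t sketch_frame_time_ns(uint32_t frame_bytes, uint32_t link_mbps)
{
	assert(link_mbps != 0);
	return (uint64_t)frame_bytes * 8u * 1000u / link_mbps;
}
```

With these assumptions, sketch_frame_time_ns(1500, 1000) evaluates to
12000, so a guard band reserved in every slot puts a floor of roughly that
size under each GCL interval.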

Merge branch 'net-generic-selftest-support' · e655bbf9

David S. Miller authored Apr 20, 2021

Oleksij Rempel says:

====================
provide generic net selftest support

changes v3:
- make more granular tests
- enable loopback for all PHYs by default
- fix allmodconfig build errors
- poll for link status update after switching to the loopback mode

changes v2:
- make generic selftests available for all networking devices.
- make use of net_selftest* on FEC, ag71xx and all DSA switches.
- add loopback support on more PHYs.

This patch set provides diagnostic capabilities for some iMX, ag71xx or
any DSA based devices. For proper functionality, PHY loopback support is
needed.
So far there is only initial infrastructure with basic tests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: enable selftest support for all switches by default · a71acad9

Oleksij Rempel authored Apr 19, 2021

Most of the generic selftests should work with probably all Ethernet
controllers. DSA switches are no exception, so enable the selftests by
default at least for DSA.

This patch was tested with SJA1105 and AR9331.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ag71xx: make use of generic NET_SELFTESTS library · b62a12fc

Oleksij Rempel authored Apr 19, 2021

With this patch the ag71xx on Atheros AR9331 will be able to run generic
net selftests.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: make use of generic NET_SELFTESTS library · 6016ba34

Oleksij Rempel authored Apr 19, 2021

With this patch FEC on iMX will be able to run generic net selftests.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add generic selftest support · 3e1e58d6

Oleksij Rempel authored Apr 19, 2021

Port some parts of the stmmac selftest and reuse them as a basic generic
selftest library. This patch was tested with the following combinations:
- iMX6DL FEC -> AT8035
- iMX6DL FEC -> SJA1105Q switch -> KSZ8081
- iMX6DL FEC -> SJA1105Q switch -> KSZ9031
- AR9331 ag71xx -> AR9331 PHY
- AR9331 ag71xx -> AR9331 switch -> AR9331 PHY
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: genphy_loopback: add link speed configuration · 014068dc

Oleksij Rempel authored Apr 19, 2021

For loopback, in most cases we need to disable autoneg support and force
some speed configuration. Otherwise, depending on the currently active
auto-negotiated link speed, the loopback may or may not work.

This patch was tested with the following PHYs: TJA1102, KSZ8081, KSZ9031,
AT8035, AR9331.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: execute genphy_loopback() per default on all PHYs · f4f86d8d

Oleksij Rempel authored Apr 19, 2021

The generic loopback is really generic and is defined by the 802.3
standard; we should just mandate that drivers implement a custom
loopback if the generic one cannot work.
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: DR, Add support for isolate_vl_tc QP · aeacb52a

Yevgeny Kliteynik authored Nov 03, 2020

When using SW steering, the rule insertion rate depends on the performance
of the RDMA RC QP used for writing to the ICM. Under stress this QP competes
for HW resources with all the other QPs that are used to send data.
To protect the SW steering QP's performance in such cases, we set this QP
to use an isolated VL. The VL number is reserved by FW and is not exposed
to the driver.
Support for this QP on an isolated VL exists only when both the
force-loopback and isolate_vl_tc capabilities are set.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add support for force-loopback QP · 7304d603

Yevgeny Kliteynik authored Nov 02, 2020

When supported by the device, the SW steering RoCE RC QP that is used to
write/read to/from ICM will be created with the force-loopback attribute.
Such a QP doesn't require a GID index upon creation.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add support for matching tunnel GTP-U · df9dd15a

Yevgeny Kliteynik authored Feb 07, 2021

Enable matching on tunnel GTP-U and GTP-U first extension
header using dynamic flex parser.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Set flex parser for TNL_MPLS dynamically · 35ba005d

Yevgeny Kliteynik authored Feb 07, 2021

Query the flex_parser id that's intended for TNL_MPLS
and use an appropriate flex parser for MPLS over UDP/GRE.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add support for matching on geneve TLV option · 3442e033

Yevgeny Kliteynik authored Feb 07, 2021

Enable matching on tunnel geneve TLV option using the flex parser.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Set STEv0 ICMP flex parser dynamically · 4923938d

Yevgeny Kliteynik authored Feb 07, 2021

Set the flex parser ID dynamically for ICMP instead of relying
on hardcoded values.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add support for dynamic flex parser · 160e9cb3

Yevgeny Kliteynik authored Nov 24, 2020

Flex parser is a HW parser that can support protocols that are not
natively supported by the HCA, such as Geneve (TLV options) and GTP-U.
There are 8 such parsers, and each of them can be assigned to parse a
specific set of protocols.
This patch adds misc4 match params, which allow using the correct flex
parser that was programmed for the required protocol.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Remove protocol-specific flex_parser_3 definitions · 323b91ac

Muhammad Sammar authored Oct 21, 2020

Remove MPLS specific fields from flex parser 3 layout.
Flex parser can be used for multiple protocols and should
not be hardcoded to a specific type.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: mlx5_ifc updates for flex parser · 704cfecd

Yevgeny Kliteynik authored Feb 28, 2021

Add the required definitions for supporting more protocols by flex parsers
(GTP-U, Geneve TLV options), and for using the right flex parser that was
configured for a given protocol.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: E-Switch, Improve error messages in term table creation · 25cb3176

Yevgeny Kliteynik authored Feb 06, 2021

Add the error code to the error messages and remove a duplicated message:
if termination table creation fails, we already get an error message
in mlx5_eswitch_termtbl_create, so there is no need for the additional
error print in the calling function.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Fix SQ/RQ in doorbell bitmask · ff1925bb

Yevgeny Kliteynik authored Feb 06, 2021

QP doorbell size is 16 bits.
Fix SW steering's QP doorbell bitmask, which had 20 bits.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
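
To illustrate why the mask width matters: a value masked with 20 bits can
carry bits above the 16-bit doorbell field. The macro and function names
below are hypothetical, and the way the driver actually derives the
doorbell value from its counters is not shown in this message; the sketch
only contrasts the two mask widths.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DB_MASK_16 0xffffu  /* matches the 16-bit doorbell size */
#define SKETCH_DB_MASK_20 0xfffffu /* over-wide mask: 20 bits */

/* Hypothetical helper: truncate a running counter to the doorbell field. */
static uint32_t sketch_doorbell_value(uint32_t counter, uint32_t mask)
{
	assert(mask != 0);
	return counter & mask;
}
```

For a counter of 0x12345, the 16-bit mask yields 0x2345, while the 20-bit
mask leaves 0x12345, which no longer fits in 16 bits.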

net/mlx5: DR, Rename an argument in dr_rdma_segments · 7d22ad73

Yevgeny Kliteynik authored Sep 24, 2020

Rename the argument to better reflect that its meaning is
not the number of records, but whether or not we should
ring the doorbell.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
