Commits · 163a180213afb2d0ec16cd71d0e0669cb940cd08 · Kirill Smelkov / linux

06 Feb, 2021 40 commits

Merge branch 'net-ipa-a-mix-of-small-improvements' · 163a1802

Jakub Kicinski authored Feb 06, 2021

Alex Elder says:

====================
net: ipa: a mix of small improvements

Version 2 of this series restructures a couple of the changed
functions (in patches 1 and 2) to avoid blocks of indented code
by returning early when possible, as suggested by Jakub.  The
description of the first patch was changed as a result, to better
reflect what the updated patch does.  It also fixes one spot I
identified when updating the code, where gsi_channel_stop() was
doing the wrong thing on error.

The original description for this series is below.

This series contains a sort of unrelated set of code cleanups.

The first two are things I wanted to do in a series that updated
some NAPI code recently.  I didn't want to change things in a way
that affected existing testing so I set these aside for later
(i.e., now).

The third makes a change to event ring handling that's similar to
what was done a while back for channels.  There's little benefit to
cacheing the current state of an event ring, so with this we'll just
fetch the state from hardware whenever we need it.

The fourth patch removes the definitions of two unused symbols.

The fifth replaces a count that is always 0 or 1 with a Boolean.

The sixth removes a build-time validation check that doesn't really
provide benefit.

And the last one fixes a problem (in two spots) that could cause a
build-time check to fail "bogusly".
====================

Link: https://lore.kernel.org/r/20210205221100.1738-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

163a1802

net: ipa: avoid field overflow · cd115009

Alex Elder authored Feb 05, 2021

It's possible that the length passed to ipa_header_size_encoded()
is larger than what can be represented by the HDR_LEN field alone
(starting with IPA v4.5).  If we attempted that, u32_encode_bits()
would trigger a build-time error.

Avoid this problem by masking off high-order bits of the value
encoded as the lower portion of the header length.

The same sort of problem exists in ipa_metadata_offset_encoded(),
so implement the same fix there.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cd115009

net: ipa: get rid of status size constraint · 48735374

Alex Elder authored Feb 05, 2021

There is a build-time check that the packet status structure is a
multiple of 4 bytes in size.  It's not clear where that constraint
comes from, but the structure defines what hardware provides so its
definition won't change.  Get rid of the check; it adds no value.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

48735374

net: ipa: use a Boolean rather than count when replenishing · 9af5ccf3

Alex Elder authored Feb 05, 2021

The count argument to ipa_endpoint_replenish() is only ever 0 or 1,
and always will be (because we always handle each receive buffer in
a single transaction).  Rename the argument to be add_one and change
it to be Boolean.

Update the function description to reflect the current code.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9af5ccf3

net: ipa: remove two unused register definitions · d5bc5015

Alex Elder authored Feb 05, 2021

We do not support inter-EE channel or event ring commands.  Inter-EE
interrupts are disabled (and never re-enabled) for all channels and
event rings, so we have no need for the GSI registers that clear
those interrupt conditions.  So remove their definitions.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d5bc5015

net: ipa: do not cache event ring state · 3f77c926

Alex Elder authored Feb 05, 2021

An event ring's state only needs to be known when it is allocated,
reset, or deallocated.  We check an event ring's state both before
and after performing an event ring control command that changes
its state.  These are only issued at startup and shutdown, so there
is very little value in caching the state.

Stop recording a copy of the channel's last known state, and instead
fetch the true state from hardware whenever it's needed.  In such
cases, *do* record the state in a local variable, in case an error
message reports it (so the value reported is the value seen).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3f77c926

net: ipa: synchronize NAPI only for suspend · b1750723

Alex Elder authored Feb 05, 2021

When stopping a channel, gsi_channel_stop() will ensure NAPI
polling is complete when it calls napi_disable().  So there is no
need to call napi_synchronize() in that case.

Move the call to napi_synchronize() out of __gsi_channel_stop()
and into gsi_channel_suspend(), so it's only used where needed.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b1750723

net: ipa: move mutex calls into __gsi_channel_stop() · 63ec9be1

Alex Elder authored Feb 05, 2021

Move the mutex calls out of gsi_channel_stop_retry() and into
__gsi_channel_stop(), to make the latter more semantically similar
to __gsi_channel_start().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

63ec9be1

Merge branch 'lag-offload-for-ocelot-dsa-switches' · bfc213f1

Jakub Kicinski authored Feb 06, 2021

Vladimir Oltean says:

====================
LAG offload for Ocelot DSA switches

This patch series reworks the ocelot switchdev driver such that it could
share the same implementation for LAG offload as the felix DSA driver.

Testing has been done in the following topology:

         +----------------------------------+
         | Board 1         br0              |
         |             +---------+          |
         |            /           \         |
         |            |           |         |
         |            |         bond0       |
         |            |        +-----+      |
         |            |       /       \     |
         |  eno0     swp0    swp1    swp2   |
         +---|--------|-------|-------|-----+
             |        |       |       |
             +--------+       |       |
               Cable          |       |
                         Cable|       |Cable
               Cable          |       |
             +--------+       |       |
             |        |       |       |
         +---|--------|-------|-------|-----+
         |  eno0     swp0    swp1    swp2   |
         |            |       \       /     |
         |            |        +-----+      |
         |            |         bond0       |
         |            |           |         |
         |            \           /         |
         |             +---------+          |
         | Board 2         br0              |
         +----------------------------------+

The same script can be run on both Board 1 and Board 2 to set this up:

ip link del bond0
ip link add bond0 type bond mode balance-xor miimon 1
OR
ip link add bond0 type bond mode 802.3ad
ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0
ip link add br0 type bridge
ip link set bond0 master br0
ip link set swp0 master br0

Then traffic can be tested between eno0 of Board 1 and eno0 of Board 2.
====================

Link: https://lore.kernel.org/r/20210205220221.255646-1-olteanv@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bfc213f1

net: dsa: felix: propagate the LAG offload ops towards the ocelot lib · 8fe6832e

Vladimir Oltean authored Feb 06, 2021

The ocelot switch has been supporting LAG offload since its initial
commit, however felix could not make use of that, due to lack of a LAG
abstraction in DSA. Now that we have that, let's forward DSA's calls
towards the ocelot library, who will deal with setting up the bonding.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8fe6832e

net: dsa: make assisted_learning_on_cpu_port bypass offloaded LAG interfaces · a324d3d4

Vladimir Oltean authored Feb 06, 2021

Given the following topology, and focusing only on Box A:

         Box A
         +----------------------------------+
         | Board 1         br0              |
         |             +---------+          |
         |            /           \         |
         |            |           |         |
         |            |         bond0       |
         |            |        +-----+      |
         |192.168.1.1 |       /       \     |
         |  eno0     swp0    swp1    swp2   |
         +---|--------|-------|-------|-----+
             |        |       |       |
             +--------+       |       |
               Cable          |       |
                         Cable|       |Cable
               Cable          |       |
             +--------+       |       |
             |        |       |       |
         +---|--------|-------|-------|-----+
         |  eno0     swp0    swp1    swp2   |
         |192.168.1.2 |       \       /     |
         |            |        +-----+      |
         |            |         bond0       |
         |            |           |         |
         |            \           /         |
         |             +---------+          |
         | Board 2         br0              |
         +----------------------------------+
         Box B

The assisted_learning_on_cpu_port logic will see that swp0 is bridged
with a "foreign interface" (bond0) and will therefore install all
addresses learnt by the software bridge towards bond0 (including the
address of eno0 on Box B) as static addresses towards the CPU port.

But that's not what we want - bond0 is not really a "foreign interface"
but one we can offload including L2 forwarding from/towards it. So we
need to refine our logic for assisted learning such that, whenever we
see an address learnt on a non-DSA interface, we search through the tree
for any port that offloads that non-DSA interface.

Some confusion might arise as to why we search through the whole tree
instead of just the local switch returned by dsa_slave_dev_lower_find.
Or a different angle of the same confusion: why does
dsa_slave_dev_lower_find(br_dev) return a single dp that's under br_dev
instead of the whole list of bridged DSA ports?

To answer the second question, it should be enough to install the static
FDB entry on the CPU port of a single switch in the tree, because
dsa_port_fdb_add uses DSA_NOTIFIER_FDB_ADD which ensures that all other
switches in the tree get notified of that address, and add the entry
themselves using dsa_towards_port().

This should help understand the answer to the first question: the port
returned by dsa_slave_dev_lower_find may not be on the same switch as
the ports that offload the LAG. Nonetheless, if the driver implements
.crosschip_lag_join and .crosschip_bridge_join as mv88e6xxx does, there
still isn't any reason for trapping addresses learnt on the remote LAG
towards the CPU, and we should prevent that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a324d3d4

net: mscc: ocelot: rebalance LAGs on link up/down events · 23ca3b72

Vladimir Oltean authored Feb 06, 2021

At present there is an issue when ocelot is offloading a bonding
interface, but one of the links of the physical ports goes down. Traffic
keeps being hashed towards that destination, and of course gets dropped
on egress.

Monitor the netdev notifier events emitted by the bonding driver for
changes in the physical state of lower interfaces, to determine which
ports are active and which ones are no longer.

Then extend ocelot_get_bond_mask to return either the configured bonding
interfaces, or the active ones, depending on a boolean argument. The
code that does rebalancing only needs to do so among the active ports,
whereas the bridge forwarding mask and the logical port IDs still need
to look at the permanently bonded ports.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

23ca3b72

net: mscc: ocelot: rename aggr_count to num_ports_in_lag · 21357b61

Vladimir Oltean authored Feb 06, 2021

It makes it a bit easier to read and understand the code that deals with
balancing the 16 aggregation codes among the ports in a certain LAG.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

21357b61

net: mscc: ocelot: drop the use of the "lags" array · 528d3f19

Vladimir Oltean authored Feb 06, 2021

We can now simplify the implementation by always using ocelot_get_bond_mask
to look up the other ports that are offloading the same bonding interface
as us.

In ocelot_set_aggr_pgids, the code had a way to uniquely iterate through
LAGs. We need to achieve the same behavior by marking each LAG as visited,
which we do now by using a temporary 32-bit "visited" bitmask. This is
ok and we do not need dynamic memory allocation, because we know that
this switch architecture will not have more than 32 ports (the PGID port
masks are 32-bit anyway).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

528d3f19

net: mscc: ocelot: set up logical port IDs centrally · 2527f2e8

Vladimir Oltean authored Feb 06, 2021

The setup of logical port IDs is done in two places: from the inconclusively
named ocelot_setup_lag and from ocelot_port_lag_leave, a function that
also calls ocelot_setup_lag (which apparently does an incomplete setup
of the LAG).

To improve this situation, we can rename ocelot_setup_lag into
ocelot_setup_logical_port_ids, and drop the "lag" argument. It will now
set up the logical port IDs of all switch ports, which may be just
slightly more inefficient but more maintainable.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2527f2e8

net: mscc: ocelot: avoid unneeded "lp" variable in LAG join · 2e9f4afa

Vladimir Oltean authored Feb 06, 2021

The index of the LAG is equal to the logical port ID that all the
physical port members have, which is further equal to the index of the
first physical port that is a member of the LAG.

The code gets a bit carried away with logic like this:

	if (a == b)
		c = a;
	else
		c = b;

which can be simplified, of course, into:

	c = b;

(with a being port, b being lp, c being lag)

This further makes the "lp" variable redundant, since we can use "lag"
everywhere where "lp" (logical port) was used. So instead of a "c = b"
assignment, we can do a complete deletion of b. Only one comment here:

		if (bond_mask) {
			lp = __ffs(bond_mask);
			ocelot->lags[lp] = 0;
		}

lp was clobbered before, because it was used as a temporary variable to
hold the new smallest port ID from the bond. Now that we don't have "lp"
any longer, we'll just avoid the temporary variable and zeroize the
bonding mask directly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2e9f4afa

net: mscc: ocelot: set up the bonding mask in a way that avoids a net_device · b80af659

Vladimir Oltean authored Feb 06, 2021

Since this code should be called from pure switchdev as well as from
DSA, we must find a way to determine the bonding mask not by looking
directly at the net_device lowers of the bonding interface, since those
could have different private structures.

We keep a pointer to the bonding upper interface, if present, in struct
ocelot_port. Then the bonding mask becomes the bitwise OR of all ports
that have the same bonding upper interface. This adds a duplication of
functionality with the current "lags" array, but the duplication will be
short-lived, since further patches will remove the latter completely.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b80af659

net: mscc: ocelot: use ipv6 in the aggregation code · f79c20c8

Vladimir Oltean authored Feb 06, 2021

IPv6 header information is not currently part of the entropy source for
the 4-bit aggregation code used for LAG offload, even though it could be.
The hardware reference manual says about these fields:

ANA::AGGR_CFG.AC_IP6_TCPUDP_PORT_ENA
Use IPv6 TCP/UDP port when calculating aggregation code. Configure
identically for all ports. Recommended value is 1.

ANA::AGGR_CFG.AC_IP6_FLOW_LBL_ENA
Use IPv6 flow label when calculating AC. Configure identically for all
ports. Recommended value is 1.

Integration with the xmit_hash_policy of the bonding interface is TBD.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f79c20c8

net: mscc: ocelot: don't refuse bonding interfaces we can't offload · 583cbbe3

Vladimir Oltean authored Feb 06, 2021

Since switchdev/DSA exposes network interfaces that fulfill many of the
same user space expectations that dedicated NICs do, it makes sense to
not deny bonding interfaces with a bonding policy that we cannot offload,
but instead allow the bonding driver to select the egress interface in
software.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

583cbbe3

net: mscc: ocelot: use a switch-case statement in ocelot_netdevice_event · 41e66fa2

Vladimir Oltean authored Feb 06, 2021

Make ocelot's net device event handler more streamlined by structuring
it in a similar way with others. The inspiration here was
dsa_slave_netdevice_event.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

41e66fa2

net: mscc: ocelot: rename ocelot_netdevice_port_event to ocelot_netdevice_changeupper · 662981bb

Vladimir Oltean authored Feb 06, 2021

ocelot_netdevice_port_event treats a single event, NETDEV_CHANGEUPPER.
So we can remove the check for the type of event, and rename the
function to be more suggestive, since there already is a function with a
very similar name of ocelot_netdevice_event.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

662981bb

Merge branch 'automatically-manage-dsa-master-interface-state' · 8d9dbce4

Jakub Kicinski authored Feb 06, 2021

Vladimir Oltean says:

====================
Automatically manage DSA master interface state

This patch series adds code that makes DSA open the master interface
automatically whenever one user interface gets opened, either by the
user, or by various networking subsystems: netconsole, nfsroot.
With that in place, we can remove some of the places in the network
stack where DSA-specific code was sprinkled.
====================

Link: https://lore.kernel.org/r/20210205133713.4172846-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8d9dbce4

Revert "net: ipv4: handle DSA enabled master network devices" · 46acf7bd

Vladimir Oltean authored Feb 05, 2021

This reverts commit 728c0208.

Since 2015 DSA has gained more integration with the network stack, we
can now have the same functionality without explicitly open-coding for
it:
- It now opens the DSA master netdevice automatically whenever a user
  netdevice is opened.
- The master and switch interfaces are coupled in an upper/lower
  hierarchy using the netdev adjacency lists.

In the nfsroot example below, the interface chosen by autoconfig was
swp3, and every interface except that and the DSA master, eth1, was
brought down afterwards:

[    8.714215] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    8.978041] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    9.246134] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    9.486203] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[    9.512827] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
[    9.521047] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control off
[    9.530382] device eth1 entered promiscuous mode
[    9.535452] DSA: tree 0 setup
[    9.539777] printk: console [netcon0] enabled
[    9.544504] netconsole: network logging started
[    9.555047] fsl_enetc 0000:00:00.2 eth1: configuring for fixed/internal link mode
[    9.562790] fsl_enetc 0000:00:00.2 eth1: Link is Up - 1Gbps/Full - flow control off
[    9.564661] 8021q: adding VLAN 0 to HW filter on device bond0
[    9.637681] fsl_enetc 0000:00:00.0 eth0: PHY [0000:00:00.0:02] driver [Qualcomm Atheros AR8031/AR8033] (irq=POLL)
[    9.655679] fsl_enetc 0000:00:00.0 eth0: configuring for inband/sgmii link mode
[    9.666611] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
[    9.676216] 8021q: adding VLAN 0 to HW filter on device swp0
[    9.682086] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
[    9.690700] 8021q: adding VLAN 0 to HW filter on device swp1
[    9.696538] mscc_felix 0000:00:00.5 swp2: configuring for inband/qsgmii link mode
[    9.705131] 8021q: adding VLAN 0 to HW filter on device swp2
[    9.710964] mscc_felix 0000:00:00.5 swp3: configuring for inband/qsgmii link mode
[    9.719548] 8021q: adding VLAN 0 to HW filter on device swp3
[    9.747811] Sending DHCP requests ..
[   12.742899] mscc_felix 0000:00:00.5 swp1: Link is Up - 1Gbps/Full - flow control rx/tx
[   12.743828] mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control off
[   12.747062] IPv6: ADDRCONF(NETDEV_CHANGE): swp1: link becomes ready
[   12.755216] fsl_enetc 0000:00:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   12.766603] IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready
[   12.783188] mscc_felix 0000:00:00.5 swp2: Link is Up - 1Gbps/Full - flow control rx/tx
[   12.785354] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   12.799535] IPv6: ADDRCONF(NETDEV_CHANGE): swp2: link becomes ready
[   13.803141] mscc_felix 0000:00:00.5 swp3: Link is Up - 1Gbps/Full - flow control rx/tx
[   13.811646] IPv6: ADDRCONF(NETDEV_CHANGE): swp3: link becomes ready
[   15.452018] ., OK
[   15.470336] IP-Config: Got DHCP answer from 10.0.0.1, my address is 10.0.0.39
[   15.477887] IP-Config: Complete:
[   15.481330]      device=swp3, hwaddr=00:04:9f:05:de:0a, ipaddr=10.0.0.39, mask=255.255.255.0, gw=10.0.0.1
[   15.491846]      host=10.0.0.39, domain=(none), nis-domain=(none)
[   15.498429]      bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath=
[   15.498481]      nameserver0=8.8.8.8
[   15.627542] fsl_enetc 0000:00:00.0 eth0: Link is Down
[   15.690903] mscc_felix 0000:00:00.5 swp0: Link is Down
[   15.745216] mscc_felix 0000:00:00.5 swp1: Link is Down
[   15.800498] mscc_felix 0000:00:00.5 swp2: Link is Down
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

46acf7bd

Revert "net: Have netpoll bring-up DSA management interface" · ea92000d

Vladimir Oltean authored Feb 05, 2021

This reverts commit 1532b977.

The above commit is good and it works, however it was meant as a bugfix
for stable kernels and now we have more self-contained ways in DSA to
handle the situation where the DSA master must be brought up.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ea92000d

net: dsa: automatically bring user ports down when master goes down · c0a8a9c2

Vladimir Oltean authored Feb 05, 2021

This is not fixing any actual bug that I know of, but having a DSA
interface that is up even when its lower (master) interface is down is
one of those things that just do not sound right.

Yes, DSA checks if the master is up before actually bringing the
user interface up, but nobody prevents bringing the master interface
down immediately afterwards... Then the user ports would attempt
dev_queue_xmit on an interface that is down, and wonder what's wrong.

This patch prevents that from happening. NETDEV_GOING_DOWN is the
notification emitted _before_ the master actually goes down, and we are
protected by the rtnl_mutex, so all is well.

For those of you reading this because you were doing switch testing
such as latency measurements for autonomously forwarded traffic, and you
needed a controlled environment with no extra packets sent by the
network stack, this patch breaks that, because now the user ports go
down too, which may shut down the PHY etc. But please don't do it like
that, just do instead:

tc qdisc add dev eno2 clsact
tc filter add dev eno2 egress flower action drop

Tested with two cascaded DSA switches:
$ ip link set eno2 down
sja1105 spi2.0 sw0p2: Link is Down
mscc_felix 0000:00:00.5 swp0: Link is Down
fsl_enetc 0000:00:00.2 eno2: Link is Down
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c0a8a9c2

net: dsa: automatically bring up DSA master when opening user port · 9d5ef190

Vladimir Oltean authored Feb 05, 2021

DSA wants the master interface to be open before the user port is due to
historical reasons. The promiscuity of interfaces that are down used to
have issues, as referenced Lennert Buytenhek in commit df02c6ff
("dsa: fix master interface allmulti/promisc handling").

The bugfix mentioned there, commit b6c40d68 ("net: only invoke
dev->change_rx_flags when device is UP"), was basically a "don't do
that" approach to working around the promiscuity while down issue.

Further work done by Vlad Yasevich in commit d2615bf4 ("net: core:
Always propagate flag changes to interfaces") has resolved the
underlying issue, and it is strictly up to the DSA and 8021q drivers
now, it is no longer mandated by the networking core that the master
interface must be up when changing its promiscuity.

From DSA's point of view, deciding to error out in dsa_slave_open
because the master isn't up is
(a) a bad user experience and
(b) knocking at an open door.
Even if there still was an issue with promiscuity while down, DSA could
still just open the master and avoid it.

Doing it this way has the additional benefit that user space can now
remove DSA-specific workarounds, like systemd-networkd with BindCarrier:
https://github.com/systemd/systemd/issues/7478

And we can finally remove one of the 2 bullets in the "Common pitfalls
using DSA setups" chapter.

Tested with two cascaded DSA switches:

$ ip link set sw0p2 up
fsl_enetc 0000:00:00.2 eno2: configuring for fixed/internal link mode
fsl_enetc 0000:00:00.2 eno2: Link is Up - 1Gbps/Full - flow control rx/tx
mscc_felix 0000:00:00.5 swp0: configuring for fixed/sgmii link mode
mscc_felix 0000:00:00.5 swp0: Link is Up - 1Gbps/Full - flow control off
8021q: adding VLAN 0 to HW filter on device swp0
sja1105 spi2.0 sw0p2: configuring for phy/rgmii-id link mode
IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): swp0: link becomes ready
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9d5ef190

Merge branch 'net-hns3-updates-for-next' · 78936acc

Jakub Kicinski authored Feb 06, 2021

Huazhong Tan says:

====================
net: hns3: updates for -next

This series adds some code optimizations and compatibility
handlings for the HNS3 ethernet driver.

change log:
V2: refactor #2 as Jukub Kicinski reported and remove the part
    about RSS size which will not be different in different hw.
    updates netdev->max_mtu as well in #4 reported by Jakub Kicinski.

previous version:
V1: https://patchwork.kernel.org/project/netdevbpf/cover/1612269593-18691-1-git-send-email-tanhuazhong@huawei.com/
====================

Link: https://lore.kernel.org/r/1612513969-9278-1-git-send-email-tanhuazhong@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

78936acc

net: hns3: replace macro of max qset number with specification · 3f094bd1

Guangbin Huang authored Feb 05, 2021

The max qset number is a fixed value now and it is defined by a macro.
In order to support other value in different kinds of device, it is
better to use specification queried from firmware to replace macro.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3f094bd1

net: hns3: debugfs add max tm rate specification print · 2783e77b

Guangbin Huang authored Feb 05, 2021

In order to add a method to check the specification of max tm rate
for debugging, function hns3_dbg_dev_specs() adds this value print.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2783e77b

net: hns3: add support for obtaining the maximum frame size · e070c8b9

Yufeng Mo authored Feb 05, 2021

Since the newer hardware may supports different frame size,
so add support to obtain the capability from the firmware
instead of the fixed value.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e070c8b9

net: hns3: optimize the code when update the tc info · 693e4415

GuoJia Liao authored Feb 05, 2021

When update the TC info for NIC, there are some differences
between PF and VF. Currently, four "vport->vport_id" are
used to distinguish PF or VF. So merge them into one to
improve readability and maintainability of code.
Signed-off-by: GuoJia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

693e4415

net: hns3: RSS indirection table use device specification · 87ce161e

Guangbin Huang authored Feb 05, 2021

As RSS indirection table size may be different in different
hardware. Instead of using macro, this value is better to use
device specification which querying from firmware.

BTW, RSS indirection table should be allocated by the queried
size instead the static array.

.get_rss_indir_size in struct hnae3_ae_ops is not used now,
so remove it as well.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

87ce161e

net: hns3: add api capability bits for firmware · 1cef42c8

Jian Shen authored Feb 05, 2021

To improve the compatibility of firmware for driver, help firmware
to deal with different api commands, add api capability bits when
initialize the command queue.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1cef42c8

Merge branch 'mptcp-misc-updates-for-tests-lock-annotation' · 74c05b9f

Jakub Kicinski authored Feb 06, 2021

Mat Martineau says:

====================
mptcp: Misc. updates for tests & lock annotation

Here are two fixes we've collected in the mptcp tree.

Patch 1 refactors a MPTCP selftest script to allow running a subset of
the tests.

Patch 2 adds some locking & might_sleep assertations.
====================

Link: https://lore.kernel.org/r/20210204232330.202441-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

74c05b9f

mptcp: pm: add lockdep assertions · 3abc05d9

Florian Westphal authored Feb 04, 2021

Add a few assertions to make sure functions are called with the needed
locks held.
Two functions gain might_sleep annotations because they contain
conditional calls to functions that sleep.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3abc05d9

selftests: mptcp: add command line arguments for mptcp_join.sh · 1002b89f

Geliang Tang authored Feb 04, 2021

Since the mptcp_join script is becoming too big, this patch splits it
into several smaller chunks, each of them has been defined in a function
as a individual test group for several related testcases.

Using bash getopts function to parse command line arguments, and invoke
each function to do the individual test group.

Here are all the arguments:
  -f subflows_tests
  -s signal_address_tests
  -l link_failure_tests
  -t add_addr_timeout_tests
  -r remove_tests
  -a add_tests
  -6 ipv6_tests
  -4 v4mapped_tests
  -b backup_tests
  -p add_addr_ports_tests
  -c syncookies_tests
  -h help

Run mptcp_join.sh with no argument will execute all testcases.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1002b89f

Merge branch 'dpaa2-add-1000base-x-support' · 502c65af

Jakub Kicinski authored Feb 06, 2021

Russell King says:

====================
dpaa2: add 1000base-X support

This patch series adds 1000base-X support to pcs-lynx and DPAA2,
allowing runtime switching between SGMII and 1000base-X. This is
a pre-requisit for SFP module support on the SolidRun ComExpress 7.

v2: updated with Ioana's r-b's, and comment on backplane support
====================

Link: https://lore.kernel.org/r/20210205103859.GH1463@shell.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>

502c65af

net: dpaa2-mac: add backplane link mode support · 085f1776

Russell King authored Feb 05, 2021

Add support for backplane link mode, which is, according to discussions
with NXP earlier in the year, is a mode where the OS (Linux) is able to
manage the PCS and Serdes itself.

This commit prepares the ground work for allowing 1G fiber connections
to be used with DPAA2 on the SolidRun CEX7 platforms.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

085f1776

net: dpaa2-mac: add 1000BASE-X support · 46c518c8

Russell King authored Feb 05, 2021

Now that pcs-lynx supports 1000BASE-X, add support for this interface
mode to dpaa2-mac. pcs-lynx can be switched at runtime between SGMII
and 1000BASE-X mode, so allow dpaa2-mac to switch between these as
well.

This commit prepares the ground work for allowing 1G fiber connections
to be used with DPAA2 on the SolidRun CEX7 platforms.
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

46c518c8

net: pcs: add pcs-lynx 1000BASE-X support · 694a0006

Russell King authored Feb 05, 2021

Add support for 1000BASE-X to pcs-lynx for the LX2160A.

This commit prepares the ground work for allowing 1G fiber connections
to be used with DPAA2 on the SolidRun CEX7 platforms.
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

694a0006