Commits · 1aac309d32075e73d1f93208b38cd2d5f03e0a5c · Kirill Smelkov / linux

22 Aug, 2021 1 commit

Alex Elder authored Aug 20, 2021

Use runtime power management autosuspend.

Up until this point, we only suspended the IPA hardware for system
suspend; now we'll suspend it aggressively using runtime power
management, setting the initial autosuspend delay to half a second
of inactivity.

Replace pm_runtime_put() calls with pm_runtime_put_autosuspend(),
call pm_runtime_mark_last_busy() before each of those.  In places
where we're shutting things down, or decrementing power references
for errors, use pm_runtime_put_noidle() instead.

Finally, remove ipa_runtime_idle(), so the ->runtime_suspend
callback will occur if idle.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1aac309d

20 Aug, 2021 39 commits

Merge tag 'mac80211-next-for-net-next-2021-08-20' of... · 4af14dba

Jakub Kicinski authored Aug 20, 2021

Merge tag 'mac80211-next-for-net-next-2021-08-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Minor updates:
 * BSS coloring support
 * MEI commands for Intel platforms
 * various fixes/cleanups

* tag 'mac80211-next-for-net-next-2021-08-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next:
  cfg80211: fix BSS color notify trace enum confusion
  mac80211: Fix insufficient headroom issue for AMSDU
  mac80211: add support for BSS color change
  nl80211: add support for BSS coloring
  mac80211: Use flex-array for radiotap header bitmap
  mac80211: radiotap: Use BIT() instead of shifts
  mac80211: Remove unnecessary variable and label
  mac80211: include <linux/rbtree.h>
  mac80211: Fix monitor MTU limit so that A-MSDUs get through
  mac80211: remove unnecessary NULL check in ieee80211_register_hw()
  mac80211: Reject zero MAC address in sta_info_insert_check()
  nl80211: vendor-cmd: add Intel vendor commands for iwlmei usage
====================

Link: https://lore.kernel.org/r/20210820105329.48674-1-johannes@sipsolutions.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4af14dba

Merge branch 'bridge-vlan' · 0ba218e2

David S. Miller authored Aug 20, 2021

Nikolay Aleksandrov says:

====================
net: bridge: mcast: add support for port/vlan router control

This small set adds control over port/vlan mcast router config.
Initially I had added host vlan entry router control via vlan's global
options but that is really unnecessary and we can use a single per-vlan
option to control it both for port/vlan and host/vlan entries. Since
it's all still in net-next we can convert BRIDGE_VLANDB_GOPTS_MCAST_ROUTER
to BRIDGE_VLANDB_ENTRY_MCAST_ROUTER and use it for both. That makes much
more sense and is easier for user-space. Patch 01 prepares the port
router function to be used with port mcast context instead of port and
then patch 02 converts the global vlan mcast router option to per-vlan
mcast router option which directly gives us both host/vlan and port/vlan
mcast router control without any additional changes.

This way we get the following coherent syntax:
 [ port/vlan mcast router]
 $ bridge vlan set vid 100 dev ens20 mcast_router 2

 [ bridge/vlan mcast router ]
 $ bridge vlan set vid 100 dev bridge mcast_router 2
instead of:
 $ bridge vlan set vid 100 dev bridge mcast_router 1 global

The mcast_router should not be regarded as a global option, it controls
the port/vlan and bridge/vlan mcast router behaviour.

This is the last set needed for the initial per-vlan mcast support.
Next patch-sets:
 - iproute2 support
 - selftests
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0ba218e2

net: bridge: vlan: convert mcast router global option to per-vlan entry · 2796d846

Nikolay Aleksandrov authored Aug 20, 2021

The per-vlan router option controls the port/vlan and host vlan entries'
mcast router config. The global option controlled only the host vlan
config, but that is unnecessary and incosistent as it's not really a
global vlan option, but rather bridge option to control host router
config, so convert BRIDGE_VLANDB_GOPTS_MCAST_ROUTER to
BRIDGE_VLANDB_ENTRY_MCAST_ROUTER which can be used to control both host
vlan and port vlan mcast router config.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2796d846

net: bridge: mcast: br_multicast_set_port_router takes multicast context as argument · a53581d5

Nikolay Aleksandrov authored Aug 20, 2021

Change br_multicast_set_port_router to take port multicast context as
its first argument so we can later use it to control port/vlan mcast
router option.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a53581d5

octeontx2-pf: Add check for non zero mcam flows · a515e5b5

Sunil Goutham authored Aug 20, 2021

This patch ensures that mcam flows are allocated
before adding or destroying the flows.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a515e5b5

tools/net: Use bitwise instead of arithmetic operator for flags · fa16ee77

jing yangyang authored Aug 19, 2021

This silences the following coccinelle warning:

"WARNING: sum of probable bitmasks, consider |"
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: jing yangyang <jing.yangyang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa16ee77

Merge branch 'ipa-kill-off-ipa_clock_get' · c1125062

David S. Miller authored Aug 20, 2021

Alex Elder says:

====================
net: ipa: kill off ipa_clock_get()

This series replaces the remaining uses of ipa_clock_get() with
calls to pm_runtime_get_sync() instead.  It replaces all calls to
ipa_clock_put() with calls to pm_runtime_put().

This completes the preparation for enabling automated suspend under
the control of the power management core code.  The next patch (in
an upcoming series) enables that.  Then the "ipa_clock" files and
symbols will switch to using an "ipa_power" naming convention instead.

Additional info

It is possible for pm_runtime_get_sync() to return an error.  There
are really three cases, identified by return value:
  - 1, meaning power was already active
  - 0, meaning power was not previously active, but is now
  - EACCES, meaning runtime PM is disabled
One additional case is EINVAL, meaning a previous suspend or resume
(or idle) call returned an error.  But we have always assumed this
won't happen (we previously didn't even check for an error).

But because we use pm_runtime_force_suspend() to implement system
suspend, there's a chance we'd get an EACCES error (the first thing
that function does is disable runtime suspend).  Individual patches
explain what happens in that case, but generally we just accept that
it could be an unlikely problem (occurring only at startup time).

Similarly, pm_runtime_put() could return an error.  There too, we
ignore EINVAL, assuming the IPA suspend and resume operations won't
produce an error.  EBUSY and EPERM are not applicable, EAGAIN is not
expected (and harmless).  We should never get EACCES (runtime
suspend disabled), because pm_runtime_put() calls match prior
pm_runtime_get_sync() calls, and a system suspend will not be
started while a runtime suspend or resume is underway.  In summary,
the value returned from pm_runtime_put() is not meaningful, so we
explicitly ignore it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c1125062

net: ipa: kill ipa_clock_get() · c3f115aa

Alex Elder authored Aug 19, 2021

The only remaining user of the ipa_clock_{get,put}() interface is
ipa_isr_thread().  Replace calls to ipa_clock_get() there calling
pm_runtime_get_sync() instead.  And call pm_runtime_put() there
rather than ipa_clock_put().  Warn if we ever get an error.

With that, we can get rid of ipa_clock_get() and ipa_clock_put().
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3f115aa

net: ipa: don't use ipa_clock_get() in "ipa_modem.c" · 724c2d74

Alex Elder authored Aug 19, 2021

When we open or close the modem network device we need to ensure the
hardware is powered.  Replace the callers of ipa_clock_get() found
in ipa_open() and ipa_stop() with calls to pm_runtime_get_sync().
If an error is returned, simply return that error to the caller
(without any error or warning message).  This could conceivably
occur if the function was called while the system was suspended,
but that really shouldn't happen.  Replace corresponding calls to
ipa_clock_put() with pm_runtime_put() also.

If the modem crashes we also need to ensure the hardware is powered
to recover.  If getting power returns an error there's not much we
can do, but at least report the error.  (Ideally the remoteproc SSR
code would ensure the AP was not suspended when it sends the
notification, but that is not (yet) the case.)
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

724c2d74

net: ipa: don't use ipa_clock_get() in "ipa_uc.c" · 799c5c24

Alex Elder authored Aug 19, 2021

Replace the ipa_clock_get() call in ipa_uc_clock() when taking the
"proxy" clock reference for the microcontroller with a call to
pm_runtime_get_sync().  Replace calls of ipa_clock_put() for the
microcontroller with pm_runtime_put() calls instead.

There is a chance we get an error when taking the microcontroller
power reference.  This is an unlikely scenario, where system suspend
is initiated just before we learn the modem is booting.  For now
we'll just accept that this could occur, and report it if it does.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

799c5c24

net: ipa: don't use ipa_clock_get() in "ipa_smp2p.c" · c43adc75

Alex Elder authored Aug 19, 2021

If the "modem-init" Device Tree property is present for a platform,
the modem performs early IPA hardware initialization, and signals
this is complete with an "ipa-setup-ready" SMP2P interrupt.  This
triggers a call to ipa_setup(), which requires the hardware to be
powered.

Replace the call to ipa_clock_get() in this case with a call to
pm_runtime_get_sync().  And replace the corresponding calls to
ipa_clock_put() with calls to pm_runtime_put() instead.

There is a chance we get an error when taking this power reference.
This is an unlikely scenario, where system suspend is initiated just
before the modem signals it has finished initializing the IPA
hardware.  For now we'll just accept that this could occur, and
report it if it does.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c43adc75

net: ipa: don't use ipa_clock_get() in "ipa_main.c" · 4c6a4da8

Alex Elder authored Aug 19, 2021

We need the hardware to be powered starting at the config stage of
initialization when the IPA driver probes.  And we need it powered
when the driver is removed, at least until the deconfig stage has
completed.

Replace callers of ipa_clock_get() in ipa_probe() and ipa_exit(),
calling pm_runtime_get_sync() instead.  Replace the corresponding
callers of ipa_clock_put(), calling pm_runtime_put() instead.

The only error we expect when getting power would occur when the
system is suspended.  The ->probe and ->remove driver callbacks
won't be called when suspended, so issue a WARN() call if an error
is seen getting power.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c6a4da8

net: ipa: fix TX queue race · b8e36e13

Alex Elder authored Aug 19, 2021

Jakub Kicinski pointed out a race condition in ipa_start_xmit() in a
recently-accepted series of patches:
  https://lore.kernel.org/netdev/20210812195035.2816276-1-elder@linaro.org/
We are stopping the modem TX queue in that function if the power
state is not active.  We restart the TX queue again once hardware
resume is complete.

  TX path                       Power Management
  -------                       ----------------
  pm_runtime_get(); no power    Start resume
  Stop TX queue                      ...
  pm_runtime_put()              Resume complete
  return NETDEV_TX_BUSY         Start TX queue

  pm_runtime_get()
  Power present, transmit
  pm_runtime_put()              (auto-suspend)

The issue is that the power management (resume) activity and the
network transmit activity can occur concurrently, and there's a
chance the queue will be stopped *after* it has been started again.

  TX path                       Power Management
  -------                       ----------------
                                Resume underway
  pm_runtime_get(); no power         ...
                                Resume complete
                                Start TX queue
  Stop TX queue       <-- No more transmits after this
  pm_runtime_put()
  return NETDEV_TX_BUSY

We address this using a STARTED flag to indicate when the TX queue
has been started from the resume path, and a spinlock to make the
flag and queue updates happen atomically.

  TX path                       Power Management
  -------                       ----------------
                                Resume underway
  pm_runtime_get(); no power    Resume complete
                                start TX queue     \
  If STARTED flag is *not* set:                     > atomic
      Stop TX queue             set STARTED flag   /
  pm_runtime_put()
  return NETDEV_TX_BUSY

A second flag is used to address a different race that involves
another path requesting power.

  TX path            Other path              Power Management
  -------            ----------              ----------------
                     pm_runtime_get_sync()   Resume
                                             Start TX queue   \ atomic
                                             Set STARTED flag /
                     (do its thing)
                     pm_runtime_put()
                                             (auto-suspend)
  pm_runtime_get()                           Mark delayed resume
  STARTED *is* set, so
    do *not* stop TX queue  <-- Queue should be stopped here
  pm_runtime_put()
  return NETDEV_TX_BUSY                      Suspend done, resume
                                             Resume complete
  pm_runtime_get()
  Stop TX queue
    (STARTED is *not* set)                   Start TX queue   \ atomic
  pm_runtime_put()                           Set STARTED flag /
  return NETDEV_TX_BUSY

So a STOPPED flag is set in the transmit path when it has stopped
the TX queue, and this pair of operations is also protected by the
spinlock.  The resume path only restarts the TX queue if the STOPPED
flag is set.  This case isn't a major problem, but it avoids the
"non-trivial amount of useless work" done by the networking stack
when NETDEV_TX_BUSY is returned.

Fixes: 6b51f802 ("net: ipa: ensure hardware has power in ipa_start_xmit()")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8e36e13

Merge branch 'ocelot-vlan' · 6505782c

David S. Miller authored Aug 20, 2021

Vladimir Oltean says:

====================
Small ocelot VLAN improvements

This small series propagates some VLAN restrictions via netlink extack
and creates some helper functions instead of open-coding VLAN table
manipulations from multiple places.

This is split from the larger "DSA FDB isolation" series, hence the v2
tag:
https://patchwork.kernel.org/project/netdevbpf/cover/20210818120150.892647-1-vladimir.oltean@nxp.com/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6505782c

net: mscc: ocelot: use helpers for port VLAN membership · bbf6a2d9

Vladimir Oltean authored Aug 19, 2021

This is a mostly cosmetic patch that creates some helpers for accessing
the VLAN table. These helpers are also a bit more careful in that they
do not modify the ocelot->vlan_mask unless the hardware operation
succeeded.

Not all callers check the return value (the init code doesn't), but anyway.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbf6a2d9

net: mscc: ocelot: transmit the VLAN filtering restrictions via extack · 3b95d1b2

Vladimir Oltean authored Aug 19, 2021

We need to transmit more restrictions in future patches, convert this
one to netlink extack.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b95d1b2

net: mscc: ocelot: transmit the "native VLAN" error via extack · 01af940e

Vladimir Oltean authored Aug 19, 2021

We need to reject some more configurations in future patches, convert
the existing one to netlink extack.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01af940e

Merge branch 'ocelot-phylink-fixes' · f2aea90d

David S. Miller authored Aug 20, 2021

Vladimir Oltean says:

====================
Ocelot phylink fixes

This series addresses a regression reported by Horatiu which introduced
by the ocelot conversion to phylink: there are broken device trees in
the wild, and the driver fails to probe the entire switch when a port
fails to probe, which it previously did not do.

Continue probing even when some ports fail to initialize properly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f2aea90d

net: mscc: ocelot: allow probing to continue with ports that fail to register · 5c8bb71d

Vladimir Oltean authored Aug 19, 2021

The existing ocelot device trees, like ocelot_pcb123.dts for example,
have SERDES ports (ports 4 and higher) that do not have status = "disabled";
but on the other hand do not have a phy-handle or a fixed-link either.

So from the perspective of phylink, they have broken DT bindings.

Since the blamed commit, probing for the entire switch will fail when
such a device tree binding is encountered on a port. There used to be
this piece of code which skipped ports without a phy-handle:

	phy_node = of_parse_phandle(portnp, "phy-handle", 0);
	if (!phy_node)
		continue;

but now it is gone.

Anyway, fixed-link setups are a thing which should work out of the box
with phylink, so it would not be in the best interest of the driver to
add that check back.

Instead, let's look at what other drivers do. Since commit 86f8b1c0
("net: dsa: Do not make user port errors fatal"), DSA continues after a
switch port fails to register, and works only with the ports that
succeeded.

We can achieve the same behavior in ocelot by unregistering the devlink
port for ports where ocelot_port_phylink_create() failed (called via
ocelot_probe_port), and clear the bit in devlink_ports_registered for
that port. This will make the next iteration reconsider the port that
failed to probe as an unused port, and re-register a devlink port of
type UNUSED for it. No other cleanup should need to be performed, since
ocelot_probe_port() should be self-contained when it fails.

Fixes: e6e12df6 ("net: mscc: ocelot: convert to phylink")
Reported-and-tested-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c8bb71d

net: mscc: ocelot: be able to reuse a devlink_port after teardown · b5e33a15

Horatiu Vultur authored Aug 19, 2021

There are cases where we would like to continue probing the switch even
if one port has failed to probe. When that happens, we need to
unregister a devlink_port of type DEVLINK_PORT_FLAVOUR_PHYSICAL and
re-register it of type DEVLINK_PORT_FLAVOUR_UNUSED.

This is fine, except when calling devlink_port_attrs_set on a structure
on which devlink_port_register has been previously called, there is a
WARN_ON in devlink_port_attrs_set that devlink_port->devlink must be
NULL.

So don't assume that the memory behind dlp is clean when calling
ocelot_port_devlink_init, just zero-initialize it.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5e33a15

Merge branch 'dpaa2-switch-phylikn-fixes' · 42edc1fc

David S. Miller authored Aug 20, 2021

Vladimir Oltean says:

====================
dpaa2-switch phylink fixes

This is fixing two regressions introduced by the recent conversion of
the dpaa2-switch driver to phylink.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

42edc1fc

net: dpaa2-switch: call dpaa2_switch_port_disconnect_mac on probe error path · 860fe1f8

Vladimir Oltean authored Aug 19, 2021

Currently when probing returns an error, the netdev is freed but
phylink_disconnect is not called.

Create a common function between the unbind path and the error path,
call it the opposite of dpaa2_switch_probe_port: dpaa2_switch_remove_port,
and call it from both the unbind and the error path.

Fixes: 84cba729 ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

860fe1f8

net: dpaa2-switch: phylink_disconnect_phy needs rtnl_lock · d52ef12f

Vladimir Oltean authored Aug 19, 2021

There is an ASSERT_RTNL in phylink_disconnect_phy which triggers
whenever dpaa2_switch_port_disconnect_mac is called.

To follow the pattern established by dpaa2_eth_disconnect_mac, take the
rtnl_mutex every time we call dpaa2_switch_port_disconnect_mac.

Fixes: 84cba729 ("dpaa2-switch: integrate the MAC endpoint support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d52ef12f

Merge branch 'gmii2rgmii-loopback' · 6985157c

David S. Miller authored Aug 20, 2021

Gerhard Engleder says:

====================
Add Xilinx GMII2RGMII loopback support

The Xilinx GMII2RGMII driver overrides PHY driver functions in order to
configure the device according to the link speed of the PHY attached to
it. This is implemented for a normal link but not for loopback.

Andrew told me to use phy_loopback and this changes make phy_loopback
work in combination with Xilinx GMII2RGMII.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6985157c

net: phy: gmii2rgmii: Support PHY loopback · ceaeaafc

Gerhard Engleder authored Aug 19, 2021

Configure speed if loopback is used. read_status is not called for
loopback.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ceaeaafc

net: phy: Uniform PHY driver access · 3ac8eed6

Gerhard Engleder authored Aug 19, 2021

struct phy_device contains a pointer to the PHY driver and nearly
everywhere this pointer is used to access the PHY driver. Only
mdio_bus_phy_may_suspend() is still using to_phy_driver() instead of the
PHY driver pointer. Uniform PHY driver access by eliminating
to_phy_driver() use in mdio_bus_phy_may_suspend().

Only phy_bus_match() and phy_probe() are still using to_phy_driver(),
because PHY driver pointer is not available there.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ac8eed6

net: phy: Support set_loopback override · 4ed311b0

Gerhard Engleder authored Aug 19, 2021

phy_read_status and various other PHY functions support PHY specific
overriding of driver functions by using a PHY specific pointer to the
PHY driver. Add support of PHY specific override to phy_loopback too.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ed311b0

Merge branch 'sparx5-dma' · 600003a3

David S. Miller authored Aug 20, 2021

Steen Hegelund says:

====================
Adding Frame DMA functionality to Sparx5

v2:
    Removed an unused variable (proc_ctrl) from sparx5_fdma_start.

This add frame DMA functionality to the Sparx5 platform.

Until now the Sparx5 SwitchDev driver has been using register based
injection and extraction when sending frames to/from the host CPU.

With this series the Frame DMA functionality now added.

The Frame DMA is only used if the Frame DMA interrupt is configured in the
device tree; otherwise the existing register based injection and extraction
is used.

The Sparx5 has two ports that can be used for sending and receiving frames,
but there are 8 channels that can be configured: 6 for injection and 2 for
extraction.

The additional channels can be used for more advanced scenarios e.g. where
virtual cores are used, but currently the driver only uses port 0 and
channel 0 and 6 respectively.

DCB (data control block) structures are passed to the Frame DMA with
suitable information about frame start/end etc, as well as pointers to DB
(data blocks) buffers.

The Frame DMA engine can use interrupts to signal back when the frames have
been injected or extracted.

There is a limitation on the DB alignment also for injection: Block must
start on 16byte boundaries, and this is why the driver currently copies the
data to into separate buffers.

The Sparx5 switch core needs a IFH (Internal Frame Header) to pass
information from the port to the switch core, and this header is added
before injection and stripped after extraction.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

600003a3

arm64: dts: sparx5: Add the Sparx5 switch frame DMA support · 920c293a

Steen Hegelund authored Aug 19, 2021

This adds the interrupt for the Sparx5 Frame DMA.

If this configuration is present the Sparx5 SwitchDev driver will use the
Frame DMA feature, and if not it will use register based injection and
extraction for sending and receiving frames to the CPU.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

920c293a

net: sparx5: switchdev: adding frame DMA functionality · 10615907

Steen Hegelund authored Aug 19, 2021

This add frame DMA functionality to the Sparx5 platform.

Ethernet frames can be extracted or injected autonomously to or from the
device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data
structures in memory are used for injecting or extracting Ethernet frames.
The FDMA generates interrupts when frame extraction or injection is done
and when the linked lists need updating.

The FDMA implements two extraction channels, one per switch core port
towards the VCore CPU system and a total of six injection channels.
Extraction channels are mapped one-to-one to the CPU ports, while injection
channels can be individually assigned to any CPU port.

- FDMA channel 0 through 5 corresponds to CPU port 0 injection direction
  FDMA_CH_CFG[channel].CH_INJ_PORT is set to 0.
- FDMA channel 0 through 5 corresponds to CPU port 1 injection direction when
  FDMA_CH_CFG[channel].CH_INJ_PORT is set to 1.
- FDMA channel 6 corresponds to CPU port 0 extraction direction.
- FDMA channel 7 corresponds to CPU port 1 extraction direction.

The FDMA implements a strict priority scheme among channels. Extraction
channels are prioritized over injection channels and secondarily channels
with higher channel number are prioritized over channels with lower number.
On the other hand, ports are being served on an equal-bandwidth principle
both on injection and extraction directions.  The equal-bandwidth principle
will not force an equal bandwidth. Instead, it ensures that the ports
perform at their best considering the operating conditions.

When more than one injection channel is enabled for injection on the same
CPU port, priority determines which channel can inject data. Ownership
is re-arbitrated on frame boundaries.

The FDMA processes linked lists of DMA Control Block Structures (DCBs). The
DCBs have the same basic structure for both injection and extraction. A DCB
must be placed on a 64-bit word-aligned address in memory. Each DCB has a
per-channel configurable amount of associated data blocks in memory, where
the frame data is stored.

The data blocks that are used by extraction channels must be placed on
64-bit word aligned addresses in memory, and their length must be a
multiple of 128 bytes.

A DCB carries the pointer to the next DCB of the linked list, the INFO word
which holds information for the DCB, and a pair of status word and memory
pointer for every data block that it is associated with.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10615907

Merge tag 'batadv-next-pullrequest-20210820' of git://git.open-mesh.org/linux-merge · f402303b

David S. Miller authored Aug 20, 2021

Simon Wunderlich says:

====================
This (updated) cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - update docs about move IRC channel away from freenode,
   by Sven Eckelmann (updated, added missing sign-off)

 - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann

 - Update NULL checks, by Sven Eckelmann (2 patches)

 - remove remaining skb-copy calls for broadcast packets,
   by Linus Lüssing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f402303b

Merge tag 'mlx5-updates-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f96b48c6

David S. Miller authored Aug 20, 2021

Saeed Mahameed says:

====================
mlx5-updates-2021-08-19

This series introduces the support for two new mlx5 features:

1) Sample offload for tunneled traffic
2) devlink rate objects support

1) From Chris Mi: Sample offload for tunneled traffic
=====================================================

Background and solution
-----------------------

Currently the sample offload actions send the encapsulated packet
to software. This series de-capsulates the packet before performing
the sampling and set the tunnel properties on the skb metadata
fields to make the behavior consistent with OVS sFlow.

If de-capsulating first, we can't use the same match like before in
default table. So instantiate a post action instance to continue
processing the action list. If HW can preserve reg_c, also use the
post action instance.

Post action infrastructure
--------------------------

Some tc actions are modeled in hardware using multiple tables
causing a tc action list split. For example, CT action is modeled
by jumping to a ct table which is controlled by nf flow table.
sFlow jumps in hardware to a sample table, which continues to a
"default table" where it should continue processing the action list.

Multi table actions are modeled in hardware using a unique fte_id.
The fte_id is set before jumping to a table. Split actions continue
to a post-action table where the matched fte_id value continues the
execution the tc action list.

This series also introduces post action infrastructure. Both ct and
sample use it.

Sample for tunnel in TC SW
--------------------------

tc filter add dev vxlan1 protocol ip parent ffff: prio 3		\
	flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02	\
	enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13			\
	enc_dst_port 4789 enc_key_id 4					\
	action sample rate 1 group 6					\
	action tunnel_key unset						\
	action mirred egress redirect dev enp4s0f0_1

MLX5 sample HW offload
----------------------

For the following typical flow table:

+-------------------------------+
+       original flow table     +
+-------------------------------+
+         original match        +
+-------------------------------+
+ sample action + other actions +
+-------------------------------+

We translate the tc filter with sample action to the following HW model:

        +---------------------+
        + original flow table +
        +---------------------+
        +   original match    +
        +---------------------+
              | set fte_id (if reg_c preserve cap)
              | do decap
              v
+------------------------------------------------+
+                Flow Sampler Object             +
+------------------------------------------------+
+                    sample ratio                +
+------------------------------------------------+
+    sample table id    |    default table id    +
+------------------------------------------------+
           |                            |
           v                            v
+-----------------------------+  +-------------------+
+        sample table         +  +   default table   +
+-----------------------------+  +-------------------+
+ forward to management vport +             |
+-----------------------------+             |
                                    +-------+------+
                                    |              |reg_c preserve cap
                                    |              |or decap action
                                    v              v
                       +-----------------+   +-------------+
                       + per vport table +   + post action +
                       +-----------------+   +-------------+
                       + original match  +
                       +-----------------+
                       + other actions   +
                       +-----------------+

2) From Dmytro Linkin: devlink rate object support for mlx5_core driver
=======================================================================

HIGH-LEVEL OVERVIEW

Devlink leaf rate objects created per vport (VF/SF, and PF on BlueField)
in switchdev mode on devlink port registration.
Implement devlink ops callbacks to create/destroy rate groups, set TX
rate values of the vport/group, assign vport to the group.
Driver accepts TX rate values as fraction of 1Mbps.

Refactor existing eswitch QoS infrastructure to be accessible by legacy
NDO rate API and new devlink rate API. NDO rate API is not
removed/disabled in switchdev mode to not break existing users. Rate
values configured with NDO rate API are not visible for devlink
infrastructure, therefore APIs should not be used simultaneously.

IMPLEMENTATION DETAILS

Driver provide two level rate hierarchy to manage bandwidth - group
level and vport level. Initially each vport added to internal unlimited
group created by default. Each rate element (vport or group) receive
bandwidth relative to its parent element (for groups the parent is a
physical link itself) in a Round Robin manner, where element get
bandwidth value according to its weight. Example:

Created four rate groups with tx_share limits:

$ devlink port function rate add \
    pci/0000:06:00.0/group_1 tx_share 30gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_2 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_3 tx_share 20gbit
$ devlink port function rate add \
    pci/0000:06:00.0/group_4 tx_share 10gbit

Weights created in HW for each group are relative to the bigest tx_share
value, which is 30gbit:

<group_1> 1.0
<group_2> 0.67
<group_3> 0.67
<group_4> 0.33

Assuming link speed is 50 Gbit/sec and each group can sustain such
amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33)
 = ~18.75 Gbit/sec. Normilized bandwidth values for groups:

<group_1> 18.75 * 1.0  = 18.75 Gbit/sec
<group_2> 18.75 * 0.67 = 12.5 Gbit/sec
<group_3> 18.75 * 0.67 = 12.5 Gbit/sec
<group_4> 18.75 * 0.33 = 6.25 Gbit/sec

If in example above group_1 doesn't produce any traffic, then maximum
bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. Normalized
values:

<group_2> 30.0 * 0.67 = 20.0 Gbit/sec
<group_3> 30.0 * 0.67 = 20.0 Gbit/sec
<group_4> 30.0 * 0.33 = 10.0 Gbit/sec

Same normalization applied to each vport in the group.

Normalized values are internal, therefore driver provides QoS
tracepoints for next events:

* vport rate element creation/deletion:
* vport rate element configuration;
* group rate element creation/deletion;
* group rate element configuration.

PATCHES OVERVIEW

1 - Moving and isolation of eswitch QoS logic in separate file;

2 - Implement devlink leaf rate object support for vports;

3 - Implement rate groups creation/deletion;

4 - Implement TX rate management for the groups;

5 - Implement parent set for vports;

6 - Eswitch QoS tracepoints.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f96b48c6

Merge tag 'for-net-next-2021-08-19' of... · e61fbee7

David S. Miller authored Aug 20, 2021

Merge tag 'for-net-next-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - Add support for Foxconn Mediatek Chip
 - Add support for LG LGSBWAC92/TWCM-K505D
 - hci_h5 flow control fixes and suspend support
 - Switch to use lock_sock for SCO and RFCOMM
 - Various fixes for extended advertising
 - Reword Intel's setup on btusb unifying the supported generations
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e61fbee7

Merge tag 'batadv-next-pullrequest-20210819' of git://git.open-mesh.org/linux-merge · 815cc21d

David S. Miller authored Aug 20, 2021

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - update docs about move IRC channel away from freenode,
   by Sven Eckelmann

 - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann

 - Update NULL checks, by Sven Eckelmann (2 patches)

 - remove remaining skb-copy calls for broadcast packets,
   by Linus Lüssing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

815cc21d

batman-adv: bcast: remove remaining skb-copy calls · a006aa51

Linus Lüssing authored May 17, 2021

We currently have two code paths for broadcast packets:

A) self-generated, via batadv_interface_tx()->
   batadv_send_bcast_packet().
B) received/forwarded, via batadv_recv_bcast_packet()->
   batadv_forw_bcast_packet().

For A), self-generated broadcast packets:

The only modifications to the skb data is the ethernet header which is
added/pushed to the skb in
batadv_send_broadcast_skb()->batadv_send_skb_packet(). However before
doing so, batadv_skb_head_push() is called which calls skb_cow_head() to
unshare the space for the to be pushed ethernet header. So for this
case, it is safe to use skb clones.

For B), received/forwarded packets:

The same applies as in A) for the to be forwarded packets. Only the
ethernet header is added. However after (queueing for) forwarding the
packet in batadv_recv_bcast_packet()->batadv_forw_bcast_packet(), a
packet is additionally decapsulated and is sent up the stack through
batadv_recv_bcast_packet()->batadv_interface_rx().

Protocols higher up the stack are already required to check if the
packet is shared and create a copy for further modifications. When the
next (protocol) layer works correctly, it cannot happen that it tries to
operate on the data behind the skb clone which is still queued up for
forwarding.
Co-authored-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>

a006aa51

batman-adv: Drop NULL check before dropping references · a2b7b148

Sven Eckelmann authored Aug 08, 2021

The check if a batman-adv related object is NULL or not is now directly in
the batadv_*_put functions. It is not needed anymore to perform this check
outside these function:

The changes were generated using a coccinelle semantic patch:

  @@
  expression E;
  @@
  - if (likely(E != NULL))
  (
  batadv_backbone_gw_put
  |
  batadv_claim_put
  |
  batadv_dat_entry_put
  |
  batadv_gw_node_put
  |
  batadv_hardif_neigh_put
  |
  batadv_hardif_put
  |
  batadv_nc_node_put
  |
  batadv_nc_path_put
  |
  batadv_neigh_ifinfo_put
  |
  batadv_neigh_node_put
  |
  batadv_orig_ifinfo_put
  |
  batadv_orig_node_put
  |
  batadv_orig_node_vlan_put
  |
  batadv_softif_vlan_put
  |
  batadv_tp_vars_put
  |
  batadv_tt_global_entry_put
  |
  batadv_tt_local_entry_put
  |
  batadv_tt_orig_list_entry_put
  |
  batadv_tt_req_node_put
  |
  batadv_tvlv_container_put
  |
  batadv_tvlv_handler_put
  )(E);
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>

a2b7b148

batman-adv: Check ptr for NULL before reducing its refcnt · e78783da

Sven Eckelmann authored Aug 08, 2021

The commit b37a4668 ("netdevice: add the case if dev is NULL") changed
the way how the NULL check for net_devices have to be handled when trying
to reduce its reference counter. Before this commit, it was the
responsibility of the caller to check whether the object is NULL or not.
But it was changed to behave more like kfree. Now the callee has to handle
the NULL-case.

The batman-adv code was scanned via cocinelle for similar places. These
were changed to use the paradigm

  @@
  identifier E, T, R, C;
  identifier put;
  @@
   void put(struct T *E)
   {
  +	if (!E)
  +		return;
  	kref_put(&E->C, R);
   }

Functions which were used in other sources files were moved to the header
to allow the compiler to inline the NULL check and the kref_put call.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>

e78783da

batman-adv: Switch to kstrtox.h for kstrtou64 · 55207227

Sven Eckelmann authored Jul 23, 2021

The commit 4c527293 ("kernel.h: split out kstrtox() and simple_strtox()
to a separate header") moved the kstrtou64 function to a new header called
linux/kstrtox.h.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>

55207227

batman-adv: Move IRC channel to hackint.org · 3baa9f52

Sven Eckelmann authored Jun 13, 2021

Due to recent developments around the Freenode.org IRC network, the
opinions about the usage of this service shifted dramatically. The majority
of the still active users of the #batman channel prefers a move to the
hackint.org network.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>

3baa9f52