Commits · 9e855b1fe37f7ed6de7a2767edb49b851ab6c52f · Kirill Smelkov / linux

01 Dec, 2022 38 commits

Merge branch 'fix-rtnl_mutex-deadlock-with-dpaa2-and-sfp-modules' · 9e855b1f

Paolo Abeni authored Dec 01, 2022

Vladimir Oltean says:

====================
Fix rtnl_mutex deadlock with DPAA2 and SFP modules

This patch set deliberately targets net-next and lacks Fixes: tags due
to caution on my part.

While testing some SFP modules on the Solidrun Honeycomb LX2K platform,
I noticed that rebooting causes a deadlock:

============================================
WARNING: possible recursive locking detected
6.1.0-rc5-07010-ga9b9500ffaac-dirty #656 Not tainted
--------------------------------------------
systemd-shutdow/1 is trying to acquire lock:
ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30

but task is already holding lock:
ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(rtnl_mutex);
  lock(rtnl_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

6 locks held by systemd-shutdow/1:
 #0: ffffa62db6863c70 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xd4/0x260
 #1: ffff2f2b0176f100 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xf4/0x260
 #2: ffff2f2b017be900 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x104/0x260
 #3: ffff2f2b017680f0 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x40/0x260
 #4: ffff2f2b0e1608f0 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x40/0x260
 #5: ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30

stack backtrace:
CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 6.1.0-rc5-07010-ga9b9500ffaac-dirty #656
Hardware name: SolidRun LX2160A Honeycomb (DT)
Call trace:
 lock_acquire+0x68/0x84
 __mutex_lock+0x98/0x460
 mutex_lock_nested+0x2c/0x40
 rtnl_lock+0x1c/0x30
 sfp_bus_del_upstream+0x1c/0xac
 phylink_destroy+0x1c/0x50
 dpaa2_mac_disconnect+0x28/0x70
 dpaa2_eth_remove+0x1dc/0x1f0
 fsl_mc_driver_remove+0x24/0x60
 device_remove+0x70/0x80
 device_release_driver_internal+0x1f0/0x260
 device_links_unbind_consumers+0xe0/0x110
 device_release_driver_internal+0x138/0x260
 device_release_driver+0x18/0x24
 bus_remove_device+0x12c/0x13c
 device_del+0x16c/0x424
 fsl_mc_device_remove+0x28/0x40
 __fsl_mc_device_remove+0x10/0x20
 device_for_each_child+0x5c/0xac
 dprc_remove+0x94/0xb4
 fsl_mc_driver_remove+0x24/0x60
 device_remove+0x70/0x80
 device_release_driver_internal+0x1f0/0x260
 device_release_driver+0x18/0x24
 bus_remove_device+0x12c/0x13c
 device_del+0x16c/0x424
 fsl_mc_bus_remove+0x8c/0x10c
 fsl_mc_bus_shutdown+0x10/0x20
 platform_shutdown+0x24/0x3c
 device_shutdown+0x15c/0x260
 kernel_restart+0x40/0xa4
 __do_sys_reboot+0x1e4/0x260
 __arm64_sys_reboot+0x24/0x30

But fixing this appears to be not so simple. The patch set represents my
attempt to address it.

In short, the problem is that dpaa2_mac_connect() and dpaa2_mac_disconnect()
call 2 phylink functions in a row, one takes rtnl_lock() itself -
phylink_create(), and one which requires rtnl_lock() to be held by the
caller - phylink_fwnode_phy_connect(). The existing approach in the
drivers is too simple. We take rtnl_lock() when calling dpaa2_mac_connect(),
which is what results in the deadlock.

Fixing just that creates another problem. The drivers make use of
rtnl_lock() for serializing with other code paths too. I think I've
found all those code paths, and established other mechanisms for
serializing with them.
====================

Link: https://lore.kernel.org/r/20221129141221.872653-1-vladimir.oltean@nxp.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

9e855b1f

net: dpaa2-mac: move rtnl_lock() only around phylink_{,dis}connect_phy() · 87db82cb

Vladimir Oltean authored Nov 29, 2022

After the introduction of a private mac_lock that serializes access to
priv->mac (and port_priv->mac in the switch), the only remaining purpose
of rtnl_lock() is to satisfy the locking requirements of
phylink_fwnode_phy_connect() and phylink_disconnect_phy().

But the functions these live in, dpaa2_mac_connect() and
dpaa2_mac_disconnect(), have contradictory locking requirements.
While phylink_fwnode_phy_connect() wants rtnl_lock() to be held,
phylink_create() wants it to not be held.

Move the rtnl_lock() from top-level (in the dpaa2-eth and dpaa2-switch
drivers) to only surround the phylink calls that require it, in the
dpaa2-mac library code.

This is possible because dpaa2_mac_connect() and dpaa2_mac_disconnect()
run unlocked, and there isn't any danger of an AB/BA deadlock between
the rtnl_mutex and other private locks.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

87db82cb

net: dpaa2-switch: serialize changes to priv->mac with a mutex · 3c7f44fa

Vladimir Oltean authored Nov 29, 2022

The dpaa2-switch driver uses a DPMAC in the same way as the dpaa2-eth
driver, so we need to duplicate the locking solution established by the
previous change to the switch driver as well.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

3c7f44fa

net: dpaa2-eth: serialize changes to priv->mac with a mutex · 2291982e

Vladimir Oltean authored Nov 29, 2022

The dpaa2 architecture permits dynamic connections between objects on
the fsl-mc bus, specifically between a DPNI object (represented by a
struct net_device) and a DPMAC object (represented by a struct phylink).

The DPNI driver is notified when those connections are created/broken
through the dpni_irq0_handler_thread() method. To ensure that ethtool
operations, as well as netdev up/down operations serialize with the
connection/disconnection of the DPNI with a DPMAC,
dpni_irq0_handler_thread() takes the rtnl_lock() to block those other
operations from taking place.

There is code called by dpaa2_mac_connect() which wants to acquire the
rtnl_mutex once again, see phylink_create() -> phylink_register_sfp() ->
sfp_bus_add_upstream() -> rtnl_lock(). So the strategy doesn't quite
work out, even though it's fairly simple.

Create a different strategy, where all code paths in the dpaa2-eth
driver access priv->mac only while they are holding priv->mac_lock.
The phylink instance is not created or connected to the PHY under the
priv->mac_lock, but only assigned to priv->mac then. This will eliminate
the reliance on the rtnl_mutex.

Add lockdep annotations and put comments where holding the lock is not
necessary, and priv->mac can be dereferenced freely.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

2291982e

net: dpaa2-eth: connect to MAC before requesting the "endpoint changed" IRQ · 55f90a4d

Vladimir Oltean authored Nov 29, 2022

dpaa2_eth_connect_mac() is called both from dpaa2_eth_probe() and from
dpni_irq0_handler_thread().

It could happen that the DPNI gets connected to a DPMAC on the fsl-mc
bus exactly during probe, as soon as the "endpoint change" interrupt is
requested in dpaa2_eth_setup_irqs(). This will cause the
dpni_irq0_handler_thread() to register a phylink instance for that DPMAC.

Then, the probing function will also try to register a phylink instance
for the same DPMAC, operation which should fail (and this will fail the
probing of the driver).

Reorder dpaa2_eth_setup_irqs() and dpaa2_eth_connect_mac(), such that
dpni_irq0_handler_thread() never races with the DPMAC-related portion of
the probing path.

Also reorder dpaa2_eth_disconnect_mac() to be in the mirror position of
dpaa2_eth_connect_mac() in the teardown path.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

55f90a4d

net: dpaa2-switch replace direct MAC access with dpaa2_switch_port_has_mac() · bc230671

Vladimir Oltean authored Nov 29, 2022

The helper function will gain a lockdep annotation in a future patch.
Make sure to benefit from it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bc230671

net: dpaa2: publish MAC stringset to ethtool -S even if MAC is missing · 29811d6e

Vladimir Oltean authored Nov 29, 2022

DPNIs and DPSW objects can connect and disconnect at runtime from DPMAC
objects on the same fsl-mc bus. The DPMAC object also holds "ethtool -S"
unstructured counters. Those counters are only shown for the entity
owning the netdev (DPNI, DPSW) if it's connected to a DPMAC.

The ethtool stringset code path is split into multiple callbacks, but
currently, connecting and disconnecting the DPMAC takes the rtnl_lock().
This blocks the entire ethtool code path from running, see
ethnl_default_doit() -> rtnl_lock() -> ops->prepare_data() ->
strset_prepare_data().

This is going to be a problem if we are going to no longer require
rtnl_lock() when connecting/disconnecting the DPMAC, because the DPMAC
could appear between ops->get_sset_count() and ops->get_strings().
If it appears out of the blue, we will provide a stringset into an array
that was dimensioned thinking the DPMAC wouldn't be there => array
accessed out of bounds.

There isn't really a good way to work around that, and I don't want to
put too much pressure on the ethtool framework by playing locking games.
Just make the DPMAC counters be always available. They'll be zeroes if
the DPNI or DPSW isn't connected to a DPMAC.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

29811d6e

net: dpaa2-switch: assign port_priv->mac after dpaa2_mac_connect() call · 88d64367

Vladimir Oltean authored Nov 29, 2022

The dpaa2-switch has the exact same locking requirements when connected
to a DPMAC, so it needs port_priv->mac to always point either to NULL,
or to a DPMAC with a fully initialized phylink instance.

Make the same preparatory change in the dpaa2-switch driver as in the
dpaa2-eth one.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

88d64367

net: dpaa2-eth: assign priv->mac after dpaa2_mac_connect() call · 02d61948

Vladimir Oltean authored Nov 29, 2022

There are 2 requirements for correct code:

- Any time the driver accesses the priv->mac pointer at runtime, it
  either holds NULL to indicate a DPNI-DPNI connection (or unconnected
  DPNI), or a struct dpaa2_mac whose phylink instance was fully
  initialized (created and connected to the PHY). No changes are made to
  priv->mac while it is being used. Currently, rtnl_lock() watches over
  the call to dpaa2_eth_connect_mac(), so it serves the purpose of
  serializing this with all readers of priv->mac.

- dpaa2_mac_connect() should run unlocked, because inside it are 2
  phylink calls with incompatible locking requirements: phylink_create()
  requires that the rtnl_mutex isn't held, and phylink_fwnode_phy_connect()
  requires that the rtnl_mutex is held. The only way to solve those
  contradictory requirements is to let dpaa2_mac_connect() take
  rtnl_lock() when it needs to.

To solve both requirements, we need to identify the writer side of the
priv->mac pointer, which can be wrapped in a mutex private to the driver
in a future patch. The dpaa2_mac_connect() cannot be part of the writer
side critical section, because of an AB/BA deadlock with rtnl_lock().

So the strategy needs to be that where we prepare the DPMAC by calling
dpaa2_mac_connect(), and only make priv->mac point to it once it's fully
prepared. This ensures that the writer side critical section has the
absolute minimum surface it can.

The reverse strategy is adopted in the dpaa2_eth_disconnect_mac() code
path. This makes sure that priv->mac is NULL when we start tearing down
the DPMAC that we disconnected from, and concurrent code will simply not
see it.

No locking changes in this patch (concurrent code is still blocked by
the rtnl_mutex).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

02d61948

net: dpaa2-mac: remove defensive check in dpaa2_mac_disconnect() · ccbd7822

Vladimir Oltean authored Nov 29, 2022

dpaa2_mac_disconnect() will only be called with a NULL mac->phylink if
dpaa2_mac_connect() failed, or was never called.

The callers are these:

dpaa2_eth_disconnect_mac():

	if (dpaa2_eth_is_type_phy(priv))
		dpaa2_mac_disconnect(priv->mac);

dpaa2_switch_port_disconnect_mac():

	if (dpaa2_switch_port_is_type_phy(port_priv))
		dpaa2_mac_disconnect(port_priv->mac);

priv->mac can be NULL, but in that case, dpaa2_eth_is_type_phy() returns
false, and dpaa2_mac_disconnect() is never called. Similar for
dpaa2-switch.

When priv->mac is non-NULL, it means that dpaa2_mac_connect() returned
zero (success), and therefore, priv->mac->phylink is also a valid
pointer.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ccbd7822

net: dpaa2-mac: absorb phylink_start() call into dpaa2_mac_start() · 38533388

Vladimir Oltean authored Nov 29, 2022

The phylink handling is intended to be hidden inside the dpaa2_mac
object. Move the phylink_start() call into dpaa2_mac_start(), and
phylink_stop() into dpaa2_mac_stop().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

38533388

net: dpaa2: replace dpaa2_mac_is_type_fixed() with dpaa2_mac_is_type_phy() · 320fefa9

Vladimir Oltean authored Nov 29, 2022

dpaa2_mac_is_type_fixed() is a header with no implementation and no
callers, which is referenced from the documentation though. It can be
deleted.

On the other hand, it would be useful to reuse the code between
dpaa2_eth_is_type_phy() and dpaa2_switch_port_is_type_phy(). That common
code should be called dpaa2_mac_is_type_phy(), so let's create that.

The removal and the addition are merged into the same patch because,
in fact, is_type_phy() is the logical opposite of is_type_fixed().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

320fefa9

net: dpaa2-eth: don't use -ENOTSUPP error code · 91c71bf1

Vladimir Oltean authored Nov 29, 2022

dpaa2_eth_setup_dpni() is called from the probe path and
dpaa2_eth_set_link_ksettings() is propagated to user space.

include/linux/errno.h says that ENOTSUPP is "Defined for the NFSv3
protocol". Conventional wisdom has it to not use it in networking
drivers. Replace it with -EOPNOTSUPP.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

91c71bf1

net: microchip: sparx5: Fix error handling in vcap_show_admin() · 682f560b

Dan Carpenter authored Nov 29, 2022

If vcap_dup_rule() fails that leads to an error pointer dereference
side the call to vcap_free_rule().  Also it only returns an error if the
very last call to vcap_read_rule() fails and it returns success for
other errors.

I've changed it to just stop printing after the first error and return
an error code.

Fixes: 3a792156 ("net: microchip: sparx5: Add VCAP rule debugFS support for the VCAP API")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Link: https://lore.kernel.org/r/Y4XUUx9kzurBN+BV@kiliSigned-off-by: Paolo Abeni <pabeni@redhat.com>

682f560b

bonding: uninitialized variable in bond_miimon_inspect() · e5214f36

Dan Carpenter authored Nov 28, 2022

The "ignore_updelay" variable needs to be initialized to false.

Fixes: f8a65ab2 ("bonding: fix link recovery in mode 2 when updelay is nonzero")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/Y4SWJlh3ohJ6EPTL@kiliSigned-off-by: Paolo Abeni <pabeni@redhat.com>

e5214f36

Merge tag 'mlx5-updates-2022-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 46115b27

Jakub Kicinski authored Nov 30, 2022

Saeed Mahameed says:

====================
mlx5-updates-2022-11-29

Misc update for mlx5 driver

1) Various trivial cleanups

2) Maor Dickman, Adds support for trap offload with additional actions

3) From Tariq, UMR (device memory registrations) cleanups,
   UMR WQE must be aligned to 64B per device spec, (not a bug fix).

* tag 'mlx5-updates-2022-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Support devlink reload of IPsec core
  net/mlx5e: TC, Add offload support for trap with additional actions
  net/mlx5e: Do early return when setup vports dests for slow path flow
  net/mlx5: Remove redundant check
  net/mlx5e: Delete always true DMA check
  net/mlx5e: Don't access directly DMA device pointer
  net/mlx5e: Don't use termination table when redundant
  net/mlx5: Fix orthography errors in documentation
  net/mlx5: Use generic definition for UMR KLM alignment
  net/mlx5: Generalize name of UMR alignment definition
  net/mlx5: Remove unused UMR MTT definitions
  net/mlx5e: Add padding when needed in UMR WQEs
  net/mlx5: Remove unused ctx variables
  net/mlx5e: Replace zero-length arrays with DECLARE_FLEX_ARRAY() helper
  net/mlx5e: Remove unneeded io-mapping.h #include
====================

Link: https://lore.kernel.org/r/20221130051152.479480-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

46115b27

net: phy: Add link between phy dev and mac dev · bc66fa87

Xiaolei Wang authored Nov 30, 2022

If the external phy used by current mac interface is
managed by another mac interface, it means that this
network port cannot work independently, especially
when the system suspends and resumes, the following
trace may appear, so we should create a device link
between phy dev and mac dev.

  WARNING: CPU: 0 PID: 24 at drivers/net/phy/phy.c:983 phy_error+0x20/0x68
  Modules linked in:
  CPU: 0 PID: 24 Comm: kworker/0:2 Not tainted 6.1.0-rc3-00011-g5aaef24b-dirty #34
  Hardware name: Freescale i.MX6 SoloX (Device Tree)
  Workqueue: events_power_efficient phy_state_machine
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x68/0x90
  dump_stack_lvl from __warn+0xb4/0x24c
  __warn from warn_slowpath_fmt+0x5c/0xd8
  warn_slowpath_fmt from phy_error+0x20/0x68
  phy_error from phy_state_machine+0x22c/0x23c
  phy_state_machine from process_one_work+0x288/0x744
  process_one_work from worker_thread+0x3c/0x500
  worker_thread from kthread+0xf0/0x114
  kthread from ret_from_fork+0x14/0x28
  Exception stack(0xf0951fb0 to 0xf0951ff8)
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221130021216.1052230-1-xiaolei.wang@windriver.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bc66fa87

Merge branch 'net-devlink-return-the-driver-name-in-devlink_nl_info_fill' · 281f8203

Jakub Kicinski authored Nov 30, 2022

Vincent Mailhol says:

====================
net: devlink: return the driver name in devlink_nl_info_fill

The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.

The goal of this series is to have the devlink core to report this
information instead of the drivers.

The first patch fulfills the actual goal of this series: modify
devlink core to report the driver name and clean-up all drivers. Both
have to be done in an atomic change to avoid attribute
duplication. This same patch also removes the
devlink_info_driver_name_put() function to prevent future drivers from
reporting the driver name themselves.

The second patch allows the core to call devlink_nl_info_fill() even
if the devlink_ops::info_get() callback is NULL. This leads to the
third and final patch which cleans up the drivers which have an empty
info_get().
====================

Link: https://lore.kernel.org/r/20221129095140.3913303-1-mailhol.vincent@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>

281f8203

net: devlink: clean-up empty devlink_ops::info_get() · cf4590b9

Vincent Mailhol authored Nov 29, 2022

devlink_ops::info_get() is now optional and devlink will continue to
report information even if that callback gets removed.

Remove all the empty devlink_ops::info_get() callbacks from the
drivers.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Jacob Keller  <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cf4590b9

net: devlink: make the devlink_ops::info_get() callback optional · c5cd7c86

Vincent Mailhol authored Nov 29, 2022

Some drivers only reported the driver name in their
devlink_ops::info_get() callback. Now that the core provides this
information, the callback became empty. For such drivers, just
removing the callback would prevent the core from executing
devlink_nl_info_fill() meaning that "devlink dev info" would not
return anything.

Make the callback function optional by executing
devlink_nl_info_fill() even if devlink_ops::info_get() is NULL.

N.B.: the drivers with devlink support which previously did not
implement devlink_ops::info_get() will now also be able to report
the driver name.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Jacob Keller  <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c5cd7c86

net: devlink: let the core report the driver name instead of the drivers · 226bf980

Vincent Mailhol authored Nov 29, 2022

The driver name is available in device_driver::name. Right now,
drivers still have to report this piece of information themselves in
their devlink_ops::info_get callback function.

In order to factorize code, make devlink_nl_info_fill() add the driver
name attribute.

Now that the core sets the driver name attribute, drivers are not
supposed to call devlink_info_driver_name_put() anymore. Remove
devlink_info_driver_name_put() and clean-up all the drivers using this
function in their callback.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw
Reviewed-by: Jacob Keller  <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

226bf980

dt-bindings: nfc: nxp,nci: Document NQ310 compatible · a933e7f0

Luca Weiss authored Nov 28, 2022

The NQ310 is another NFC chip from NXP, document the compatible in the
bindings.
Signed-off-by: Luca Weiss <luca@z3ntu.xyz>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221128173744.833018-1-luca@z3ntu.xyzSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a933e7f0

Merge branch 'support-direct-read-from-region' · d8621762

Jakub Kicinski authored Nov 30, 2022

Jacob Keller says:

====================
support direct read from region

A long time ago when initially implementing devlink regions in ice I
proposed the ability to allow reading from a region without taking a
snapshot [1]. I eventually dropped this work from the original series due to
size. Then I eventually lost track of submitting this follow up.

This can be useful when interacting with some region that has some
definitive "contents" from which snapshots are made. For example the ice
driver has regions representing the contents of the device flash.

If userspace wants to read the contents today, it must first take a snapshot
and then read from that snapshot. This makes sense if you want to read a
large portion of data or you want to be sure reads are consistently from the
same recording of the flash.

However if user space only wants to read a small chunk, it must first
generate a snapshot of the entire contents, perform a read from the
snapshot, and then delete the snapshot after reading.

For such a use case, a direct read from the region makes more sense. This
can be achieved by allowing the devlink region read command to work without
a snapshot. Instead the portion to be read can be forwarded directly to the
driver via a new .read callback.

This avoids the need to read the entire region contents into memory first
and avoids the software overhead of creating a snapshot and then deleting
it.

This series implements such behavior and hooks up the ice NVM and shadow RAM
regions to allow it.

[1] https://lore.kernel.org/netdev/20200130225913.1671982-1-jacob.e.keller@intel.com/
====================

Link: https://lore.kernel.org/r/20221128203647.1198669-1-jacob.e.keller@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d8621762

ice: implement direct read for NVM and Shadow RAM regions · 3af4b40b

Jacob Keller authored Nov 28, 2022

Implement the .read handler for the NVM and Shadow RAM regions. This
enables user space to read a small chunk of the flash without needing the
overhead of creating a full snapshot.

Update the documentation for ice to detail which regions have direct read
support.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3af4b40b

ice: document 'shadow-ram' devlink region · 2d019784

Jacob Keller authored Nov 28, 2022

78ad87da ("ice: devlink: add shadow-ram region to snapshot Shadow RAM")
added support for the 'shadow-ram' devlink region, but did not document it
in the ice devlink documentation. Fix this.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2d019784

ice: use same function to snapshot both NVM and Shadow RAM · ed23debe

Jacob Keller authored Nov 28, 2022

The ice driver supports a region for both the flat NVM contents as well as
the Shadow RAM which is a layer built on top of the flash during device
initialization.

These regions use an almost identical read function, except that the NVM
needs to set the direct flag when reading, while Shadow RAM needs to read
without the direct flag set. They each call ice_read_flat_nvm with the only
difference being whether to set the direct flash flag.

The NVM region read function also was fixed to read the NVM in blocks to
avoid a situation where the firmware reclaims the lock due to taking too
long.

Note that the region snapshot function takes the ops pointer so the
function can easily determine which region to read. Make use of this and
re-use the NVM snapshot function for both the NVM and Shadow RAM regions.
This makes Shadow RAM benefit from the same block approach as the NVM
region. It also reduces code in the ice driver.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ed23debe

devlink: support directly reading from region memory · af6397c9

Jacob Keller authored Nov 28, 2022

To read from a region, user space must currently request a new snapshot of
the region and then read from that snapshot. This can sometimes be overkill
if user space only reads a tiny portion. They first create the snapshot,
then request a read, then destroy the snapshot.

For regions which have a single underlying "contents", it makes sense to
allow supporting direct reading of the region data.

Extend the DEVLINK_CMD_REGION_READ to allow direct reading from a region if
requested via the new DEVLINK_ATTR_REGION_DIRECT. If this attribute is set,
then perform a direct read instead of using a snapshot. Direct read is
mutually exclusive with DEVLINK_ATTR_REGION_SNAPSHOT_ID, and care is taken
to ensure that we reject commands which provide incorrect attributes.

Regions must enable support for direct read by implementing the .read()
callback function. If a region does not support such direct reads, a
suitable extended error message is reported.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af6397c9

devlink: refactor region_read_snapshot_fill to use a callback function · 2d4caf09

Jacob Keller authored Nov 28, 2022

The devlink_nl_region_read_snapshot_fill is used to copy the contents of
a snapshot into a message for reporting to userspace via the
DEVLINK_CMG_REGION_READ netlink message.

A future change is going to add support for directly reading from
a region. Almost all of the logic for this new capability is identical.

To help reduce code duplication and make this logic more generic,
refactor the function to take a cb and cb_priv pointer for doing the
actual copy.

Add a devlink_region_snapshot_fill implementation that will simply copy
the relevant chunk of the region. This does require allocating some
storage for the chunk as opposed to simply passing the correct address
forward to the devlink_nl_cmg_region_read_chunk_fill function.

A future change to implement support for directly reading from a region
without a snapshot will provide a separate implementation that calls the
newly added devlink region operation.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2d4caf09

devlink: remove unnecessary parameter from chunk_fill function · 284e9d1e

Jacob Keller authored Nov 28, 2022

The devlink parameter of the devlink_nl_cmd_region_read_chunk_fill
function is not used. Remove it, to simplify the function signature.

Once removed, it is also obvious that the devlink parameter is not
necessary for the devlink_nl_region_read_snapshot_fill either.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

284e9d1e

devlink: find snapshot in devlink_nl_cmd_region_read_dumpit · e004ea10

Jacob Keller authored Nov 28, 2022

The snapshot pointer is obtained inside of the function
devlink_nl_region_read_snapshot_fill. Simplify this function by locating
the snapshot upfront in devlink_nl_cmd_region_read_dumpit instead. This
aligns with how other netlink attributes are handled, and allows us to
exit slightly earlier if an invalid snapshot ID is provided.

It also allows us to pass the snapshot pointer directly to the
devlink_nl_region_read_snapshot_fill, and remove the now unused attrs
parameter.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e004ea10

devlink: report extended error message in region_read_dumpit() · 611fd12c

Jacob Keller authored Nov 28, 2022

Report extended error details in the devlink_nl_cmd_region_read_dumpit()
function, by using the extack structure from the netlink_callback.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

611fd12c

devlink: use min_t to calculate data_size · 28e0c250

Jacob Keller authored Nov 28, 2022

The calculation for the data_size in the devlink_nl_read_snapshot_fill
function uses an if statement that is better expressed using the min_t
macro.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

28e0c250

net: microchip: vcap: Change how the rule id is generated · c1d8e3fb

Horatiu Vultur authored Nov 28, 2022

Currently whenever a new rule id is generated, it picks up the next
number bigger than previous id. So it would always be 1, 2, 3, etc.
When the rule with id 1 will be deleted and a new rule will be added,
it will have the id 4 and not id 1.
In theory this can be a problem if at some point a rule will be added
and removed ~0 times. Then no more rules can be added because there
are no more ids.

Change this such that when a new rule is added, search for an empty
rule id starting with value of 1 as value 0 is reserved.

Fixes: c9da1ac1 ("net: microchip: sparx5: Adding initial tc flower support for VCAP API")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20221128142959.8325-1-horatiu.vultur@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c1d8e3fb

octeontx2-af: Simplify a size computation in rvu_npc_exact_init() · 450f0650

Christophe JAILLET authored Nov 25, 2022

We know that table_size = table->mem_table.depth * table->mem_table.ways,
so use it instead, it is less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/5230dabe27f48948a9fd0f50a62e2437b65e6a6e.1669378798.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>

450f0650

octeontx2-af: Fix the size of memory allocated for the 'id_bmap' bitmap · 6d135d9e

Christophe JAILLET authored Nov 25, 2022

This allocation is really spurious.
The size of the bitmap is 'tot_ids' and it is used as such in the driver.

So we could expect something like:
   table->id_bmap = devm_kcalloc(rvu->dev, BITS_TO_LONGS(table->tot_ids),
			         sizeof(long), GFP_KERNEL);

However, when the bitmap is allocated, we allocate:
   BITS_TO_LONGS(table->tot_ids) * table->tot_ids ~=
   table->tot_ids / 32 * table->tot_ids ~=
   table->tot_ids^2 / 32

It is proportional to the square of 'table->tot_ids' which seems to
potentially be big.

Allocate the expected amount of memory, and switch to the bitmap API to
have it more straightforward.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/ce2710771939065d68f95d86a27cf7cea7966365.1669378798.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6d135d9e

octeontx2-af: Use the bitmap API to allocate bitmaps · 05a7b52e

Christophe JAILLET authored Nov 25, 2022

Use devm_bitmap_zalloc() instead of hand-writing it.

This also makes the comment "Allocate bitmap for 32 entry mcam" more
explicit because now 32 is really used in the allocation function, instead
of an obscure 'sizeof(long)'.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/24177a9ee7043259448b735263d9cfd6a70e89a4.1669378798.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>

05a7b52e

octeontx2-af: Slightly simplify rvu_npc_exact_init() · b6a0ecae

Christophe JAILLET authored Nov 25, 2022

Use kzalloc() instead of kmalloc()/memset().
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/60ea220ccf3b61963f7d5a97e3df2c76a5feb837.1669378798.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b6a0ecae

octeontx2-af: Fix a potentially spurious error message · 2450d7d9

Christophe JAILLET authored Nov 25, 2022

When this error message is displayed, we know that the all the bits in the
bitmap are set.

So, bitmap_weight() will return the number of bits of the bitmap, which is
'table->tot_ids'.

It is unlikely that a bit will be cleared between mutex_unlock() and
dev_err(), but, in order to simplify the code and avoid this possibility,
just take 'table->tot_ids'.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/5ce01c402f86412dc57884ff0994b63f0c5b3871.1669378798.git.christophe.jaillet@wanadoo.frSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2450d7d9

30 Nov, 2022 2 commits

selftests/net: add csum offload test · 91a7de85

Willem de Bruijn authored Nov 28, 2022

Test NIC hardware checksum offload:

- Rx + Tx
- IPv4 + IPv6
- TCP + UDP

Optional features:

- zero checksum 0xFFFF
- checksum disable 0x0000
- transport encap headers
- randomization

See file header for detailed comments.

Expected results differ depending on NIC features:

- CHECKSUM_UNNECESSARY vs CHECKSUM_COMPLETE
- NETIF_F_HW_CSUM (csum_start/csum_off) vs NETIF_F_IP(V6)_CSUM
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20221128140210.553391-1-willemdebruijn.kernel@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

91a7de85

net/mlx5e: Support devlink reload of IPsec core · 953d7715

Leon Romanovsky authored Aug 22, 2022

Change IPsec initialization flow to allow future creation of hardware
resources that should be released and allocated during devlink reload
operation. As part of that change, update function signature to be
void as no callers are actually interested in it.
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

953d7715