Commits · cfbad64706c1c9d555fba5dbb545967262bd9f17 · Kirill Smelkov / linux

06 Mar, 2024 21 commits

ravb: Create helper to allocate skb and align it · cfbad647

Niklas Söderlund authored Mar 04, 2024

The EtherAVB device requires the SKB data to be aligned to 128 bytes.
The alignment is done by allocating an skb 128 bytes larger than the
maximum frame size supported by the device and adjusting the headroom to
fit the requirement.

This code has been refactored a few times and small issues have been
added along the way. The issues are not harmful but prevent merging
parts of the Rx code which have been split in two implementations with
the addition of RZ/G2L support, a device that supports larger frame
sizes.

This change removes the need for duplicated and somewhat inaccurate
hardware alignment constrains stored in the hardware information struct
by creating a helper to handle the allocation of an skb and alignment of
an skb data.

For the R-Car class of devices the maximum frame size is 4K and each
descriptor is limited to 2K of data. The current implementation does not
support split descriptors, this limits the frame size to 2K. The
current hardware information however records the descriptor size just
under 2K due to bad understanding of the device when larger MTUs where
added.

For the RZ/G2L device the maximum frame size is 8K and each descriptor
is limited to 4K of data. The current hardware information records this
correctly, but it gets the alignment constrains wrong as just aligns it
by 128, it does not extend it by 128 bytes to allow the full frame to be
stored. This works because the RZ/G2L device supports split descriptors
and allocates each skb to 8K and aligns each 4K descriptor in this
space.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

cfbad647

ravb: Make it clear the information relates to maximum frame size · e82700b8

Niklas Söderlund authored Mar 04, 2024

The struct member rx_max_buf_size was added before split descriptor
support was added. It is unclear if the value describes the full skb
frame buffer or the data descriptor buffer which can be combined into a
single skb.

Rename it to make it clear it referees to the maximum frame size and can
cover multiple descriptors.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

e82700b8

ravb: Group descriptor types used in Rx ring · 4123c3fb

Niklas Söderlund authored Mar 04, 2024

The Rx ring can either be made up of normal or extended descriptors, not
a mix of the two at the same time. Make this explicit by grouping the
two variables in a rx_ring union.

The extension of the storage for more than one queue of normal
descriptors from a single to NUM_RX_QUEUE queues have no practical
effect. But aids in making the code readable as the code that uses it
already piggyback on other members of struct ravb_private that are
arrays of max length NUM_RX_QUEUE, e.g. rx_desc_dma. This will also make
further refactoring easier.

While at it, rename the normal descriptor Rx ring to make it clear it's
not strictly related to the GbEthernet E-MAC IP found in RZ/G2L, normal
descriptors could be used on R-Car SoCs too.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

4123c3fb

Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · dbb0b6ca

David S. Miller authored Mar 06, 2024

From: Tony Nguyen <anthony.l.nguyen@intel.com>
To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	edumazet@google.com, netdev@vger.kernel.org
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>, alan.brady@intel.com
Tony Nguyen says:

====================
idpf: refactor virtchnl messages

Alan Brady says:

The motivation for this series has two primary goals. We want to enable
support of multiple simultaneous messages and make the channel more
robust. The way it works right now, the driver can only send and receive
a single message at a time and if something goes really wrong, it can
lead to data corruption and strange bugs.

To start the series, we introduce an idpf_virtchnl.h file. This reduces
the burden on idpf.h which is overloaded with struct and function
declarations.

The conversion works by conceptualizing a send and receive as a
"virtchnl transaction" (idpf_vc_xn) and introducing a "transaction
manager" (idpf_vc_xn_manager). The vcxn_mngr will init a ring of
transactions from which the driver will pop from a bitmap of free
transactions to track in-flight messages. Instead of needing to handle a
complicated send/recv for every a message, the driver now just needs to
fill out a xn_params struct and hand it over to idpf_vc_xn_exec which
will take care of all the messy bits. Once a message is sent and
receives a reply, we leverage the completion API to signal the received
buffer is ready to be used (assuming success, or an error code
otherwise).

At a low-level, this implements the "sw cookie" field of the virtchnl
message descriptor to enable this. We have 16 bits we can put whatever
we want and the recipient is required to apply the same cookie to the
reply for that message.  We use the first 8 bits as an index into the
array of transactions to enable fast lookups and we use the second 8
bits as a salt to make sure each cookie is unique for that message. As
transactions are received in arbitrary order, it's possible to reuse a
transaction index and the salt guards against index conflicts to make
certain the lookup is correct. As a primitive example, say index 1 is
used with salt 1. The message times out without receiving a reply so
index 1 is renewed to be ready for a new transaction, we report the
timeout, and send the message again. Since index 1 is free to be used
again now, index 1 is again sent but now salt is 2. This time we do get
a reply, however it could be that the reply is _actually_ for the
previous send index 1 with salt 1.  Without the salt we would have no
way of knowing for sure if it's the correct reply, but with we will know
for certain.

Through this conversion we also get several other benefits. We can now
more appropriately handle asynchronously sent messages by providing
space for a callback to be defined. This notably allows us to handle MAC
filter failures better; previously we could potentially have stale,
failed filters in our list, which shouldn't really have a major impact
but is obviously not correct. I also managed to remove fairly
significant more lines than I added which is a win in my book.

Additionally, this converts some variables to use auto-variables where
appropriate. This makes the alloc paths much cleaner and less prone to
memory leaks. We also fix a few virtchnl related bugs while we're here.

====================
Signed-off-by: David S. Miller <davem@davemloft.net>

dbb0b6ca

Merge branch 'netlink-emsgsize' · 784ee615

David S. Miller authored Mar 06, 2024

Jakub Kicinski says:

====================
netlink: handle EMSGSIZE errors in the core

Ido discovered some time back that we usually force NLMSG_DONE
to be delivered in a separate recv() syscall, even if it would
fit into the same skb as data messages. He made nexthop try
to fit DONE with data in commit 8743aeff ("nexthop: Fix
infinite nexthop bucket dump when using maximum nexthop ID"),
and nobody has complained so far.

We have since also tried to follow the same pattern in new
genetlink families, but explaining to people, or even remembering
the correct handling ourselves is tedious.

Let the netlink socket layer consume -EMSGSIZE errors.
Practically speaking most families use this error code
as "dump needs more space", anyway.

v2:
 - init err to 0 in last patch
v1: https://lore.kernel.org/all/20240301012845.2951053-1-kuba@kernel.org/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

784ee615

genetlink: fit NLMSG_DONE into same read() as families · 87d38197

Jakub Kicinski authored Mar 02, 2024

Make sure ctrl_fill_info() returns sensible error codes and
propagate them out to netlink core. Let netlink core decide
when to return skb->len and when to treat the exit as an
error. Netlink core does better job at it, if we always
return skb->len the core doesn't know when we're done
dumping and NLMSG_DONE ends up in a separate read().
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87d38197

netdev: let netlink core handle -EMSGSIZE errors · 0b11b1c5

Jakub Kicinski authored Mar 02, 2024

Previous change added -EMSGSIZE handling to af_netlink, we don't
have to hide these errors any longer.

Theoretically the error handling changes from:
 if (err == -EMSGSIZE)
to
 if (err == -EMSGSIZE && skb->len)

everywhere, but in practice it doesn't matter.
All messages fit into NLMSG_GOODSIZE, so overflow of an empty
skb cannot happen.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b11b1c5

netlink: handle EMSGSIZE errors in the core · b5a89915

Jakub Kicinski authored Mar 02, 2024

Eric points out that our current suggested way of handling
EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will
break if we didn't fit even a single object into the buffer
provided by the user. This should not happen for well behaved
applications, but we can fix that, and free netlink families
from dealing with that completely by moving error handling
into the core.

Let's assume from now on that all EMSGSIZE errors in dumps are
because we run out of skb space. Families can now propagate
the error nla_put_*() etc generated and not worry about any
return value magic. If some family really wants to send EMSGSIZE
to user space, assuming it generates the same error on the next
dump iteration the skb->len should be 0, and user space should
still see the EMSGSIZE.

This should simplify families and prevent mistakes in return
values which lead to DONE being forced into a separate recv()
call as discovered by Ido some time ago.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5a89915

selftests: avoid using SKIP(exit()) in harness fixure setup · e3350ba4

Jakub Kicinski authored Mar 04, 2024

selftest harness uses various exit codes to signal test
results. Avoid calling exit() directly, otherwise tests
may get broken by harness refactoring (like the commit
under Fixes). SKIP() will instruct the harness that the
test shouldn't run, it used to not be the case, but that
has been fixed. So just return, no need to exit.

Note that for hmm-tests this actually changes the result
from pass to skip. Which seems fair, the test is skipped,
after all.
Reported-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/all/05f7bf89-04a5-4b65-bf59-c19456aeb1f0@sirena.org.uk
Fixes: a7247079 ("selftests: kselftest_harness: use KSFT_* exit codes")
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240304233621.646054-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e3350ba4

Merge branch 'net-ethernet-rework-eee' · 6d0f77a0

Jakub Kicinski authored Mar 05, 2024

Oleksij Rempel says:

====================
net: ethernet: Rework EEE

with Andrew's permission I'll continue mainlining this patches:
==============================================================

Most MAC drivers get EEE wrong. The API to the PHY is not very
obvious, which is probably why. Rework the API, pushing most of the
EEE handling into phylib core, leaving the MAC drivers to just
enable/disable support for EEE in there change_link call back.

MAC drivers are now expect to indicate to phylib if they support
EEE. This will allow future patches to configure the PHY to advertise
no EEE link modes when EEE is not supported. The information could
also be used to enable SmartEEE if the PHY supports it.

With these changes, the uAPI configuration eee_enable becomes a global
on/off. tx-lpi must also be enabled before EEE is enabled. This fits
the discussion here:

https://lore.kernel.org/netdev/af880ce8-a7b8-138e-1ab9-8c89e662eecf@gmail.com/T/

This patchset puts in place all the infrastructure, and converts one
MAC driver to the new API. Following patchsets will convert other MAC
drivers, extend support into phylink, and when all MAC drivers are
converted to the new scheme, clean up some unneeded code.
====================

Link: https://lore.kernel.org/r/20240302195306.3207716-1-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6d0f77a0

net: fec: Fixup EEE · 6a2495ad

Andrew Lunn authored Mar 02, 2024

The enabling/disabling of EEE in the MAC should happen as a result of
auto negotiation. So move the enable/disable into
fec_enet_adjust_link() which gets called by phylib when there is a
change in link status.

fec_enet_set_eee() now just stores away the LPI timer value.
Everything else is passed to phylib, so it can correctly setup the
PHY.

fec_enet_get_eee() relies on phylib doing most of the work,
the MAC driver just adds the LPI timer value.

Call phy_support_eee() if the quirk is present to indicate the MAC
actually supports EEE.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> (On iMX8MP debix)
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20240302195306.3207716-8-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6a2495ad

net: fec: Move fec_enet_eee_mode_set() and helper earlier · aff1b8c8

Andrew Lunn authored Mar 02, 2024

FEC is about to get its EEE code re-written. To allow this, move
fec_enet_eee_mode_set() before fec_enet_adjust_link() which will
need to call it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://lore.kernel.org/r/20240302195306.3207716-7-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

aff1b8c8

net: phy: Add phy_support_eee() indicating MAC support EEE · 49168d19

Andrew Lunn authored Mar 02, 2024

In order for EEE to operate, both the MAC and the PHY need to support
it, similar to how pause works. With some exception - a number of PHYs
have SmartEEE or AutoGrEEEn support in order to provide some EEE-like
power savings with non-EEE capable MACs.

Copy the pause concept and add the call phy_support_eee() which the MAC
makes after connecting the PHY to indicate it supports EEE. phylib will
then advertise EEE when auto-neg is performed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240302195306.3207716-6-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

49168d19

net: phy: Immediately call adjust_link if only tx_lpi_enabled changes · 3e43b903

Andrew Lunn authored Mar 02, 2024

The MAC driver changes its EEE hardware configuration in its
adjust_link callback. This is called when auto-neg
completes. Disabling EEE via eee_enabled false will trigger an
autoneg, and as a result the adjust_link callback will be called with
phydev->enable_tx_lpi set to false. Similarly, eee_enabled set to true
and with a change of advertised link modes will result in a new
autoneg, and a call the adjust_link call.

If set_eee is called with only a change to tx_lpi_enabled which does
not trigger an auto-neg, it is necessary to call the adjust_link
callback so that the MAC is reconfigured to take this change into
account.

When setting phydev->enable_tx_lpi, take both eee_enabled and
tx_lpi_enabled into account, so the MAC drivers just needs to act on
phydev->enable_tx_lpi and not the whole EEE configuration.
The same check should be done for tx_lpi_timer too.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240302195306.3207716-5-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3e43b903

net: phy: Keep track of EEE configuration · fe0d4fd9

Andrew Lunn authored Mar 02, 2024

Have phylib keep track of the EEE configuration. This simplifies the
MAC drivers, in that they don't need to store it.

Future patches to phylib will also make use of this information to
further simplify the MAC drivers.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240302195306.3207716-4-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fe0d4fd9

net: phy: Add phydev->enable_tx_lpi to simplify adjust link callbacks · e3b6876a

Andrew Lunn authored Mar 02, 2024

MAC drivers which support EEE need to know the results of the EEE
auto-neg in order to program the hardware to perform EEE or not. The
oddly named phy_init_eee() can be used to determine this, it returns 0
if EEE should be used, or a negative error code,
e.g. -EOPPROTONOTSUPPORT if the PHY does not support EEE or negotiate
resulted in it not being used.

However, many MAC drivers get this wrong. Add phydev->enable_tx_lpi
which indicates the result of the autoneg for EEE, including if EEE is
administratively disabled with ethtool. The MAC driver can then access
this in the same way as link speed and duplex in the adjust link
callback. If enable_tx_lpi is true, the MAC should send low power
indications and does not need to consider anything else with respect
to EEE.
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240302195306.3207716-3-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e3b6876a

net: add helpers for EEE configuration · 6f2fc858

Russell King authored Mar 02, 2024

Add helpers that phylib and phylink can use to manage EEE configuration
and determine whether the MAC should be permitted to use LPI based on
that configuration.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20240302195306.3207716-2-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6f2fc858

ethtool: ignore unused/unreliable fields in set_eee op · 344f7a46

Heiner Kallweit authored Mar 02, 2024

This function is used with the set_eee() ethtool operation. Certain
fields of struct ethtool_keee() are relevant only for the get_eee()
operation. In addition, in case of the ioctl interface, we have no
guarantee that userspace sends sane values in struct ethtool_eee.
Therefore explicitly ignore all fields not needed for set_eee().
This protects from drivers trying to use unchecked and unreliable
data, relying on specific userspace behavior.

Note: Such unsafe driver behavior has been found and fixed in the
tg3 driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/ad7ee11e-eb7a-4975-9122-547e13a161d8@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

344f7a46

sock: Use unsafe_memcpy() for sock_copy() · ff73f834

Kees Cook authored Mar 04, 2024

While testing for places where zero-sized destinations were still showing
up in the kernel, sock_copy() and inet_reqsk_clone() were found, which
are using very specific memcpy() offsets for both avoiding a portion of
struct sock, and copying beyond the end of it (since struct sock is really
just a common header before the protocol-specific allocation). Instead
of trying to unravel this historical lack of container_of(), just switch
to unsafe_memcpy(), since that's effectively what was happening already
(memcpy() wasn't checking 0-sized destinations while the code base was
being converted away from fake flexible arrays).

Avoid the following false positive warning with future changes to
CONFIG_FORTIFY_SOURCE:

  memcpy: detected field-spanning write (size 3068) of destination "&nsk->__sk_common.skc_dontcopy_end" at net/core/sock.c:2057 (size 0)
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240304212928.make.772-kees@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ff73f834

net: tap: Remove generic .ndo_get_stats64 · 4166204d

Breno Leitao authored Mar 04, 2024

Commit 3e2f544d ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.

Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240304183810.1474883-2-leitao@debian.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4166204d

net: tuntap: Leverage core stats allocator · 46f480ec

Breno Leitao authored Mar 04, 2024

With commit 34d21de9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of in this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Remove the allocation in the tun/tap driver and leverage the network
core allocation instead.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240304183810.1474883-1-leitao@debian.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

46f480ec

05 Mar, 2024 19 commits

ptp: fc3: Convert to platform remove callback returning void · 60d06425

Uwe Kleine-König authored Mar 04, 2024

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240304091325.717546-2-u.kleine-koenig@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

60d06425

Merge branch 'net-constify-struct-class-usage' · 73a42f41

Jakub Kicinski authored Mar 05, 2024

Ricardo B. Marliere says:

====================
net: constify struct class usage

This is a simple and straight forward cleanup series that aims to make the
class structures in net constant. This has been possible since 2023 [1].

[1]: https://lore.kernel.org/all/2023040248-customary-release-4aec@gregkh/
====================

Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-0-8fa378595b93@marliere.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

73a42f41

nfc: core: make nfc_class constant · e5560011

Ricardo B. Marliere authored Mar 02, 2024

Since commit 43a7206b ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the nfc_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-6-8fa378595b93@marliere.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e5560011

net: wwan: core: make wwan_class constant · 070bef83

Ricardo B. Marliere authored Mar 02, 2024

Since commit 43a7206b ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the wwan_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-5-8fa378595b93@marliere.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

070bef83

net: wwan: hwsim: make wwan_hwsim_class constant · d9567f21

Ricardo B. Marliere authored Mar 02, 2024

Since commit 43a7206b ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the wwan_hwsim_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-4-8fa378595b93@marliere.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d9567f21

net: ppp: make ppp_class constant · 2ad2018a

Ricardo B. Marliere authored Mar 02, 2024

Since commit 43a7206b ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the ppp_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-3-8fa378595b93@marliere.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2ad2018a

net: wan: framer: make framer_class constant · 63767a76

Ricardo B. Marliere authored Mar 02, 2024

Since commit 43a7206b ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the framer_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-2-8fa378595b93@marliere.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

63767a76

net: hns: make hnae_class constant · b6e3c115

Ricardo B. Marliere authored Mar 02, 2024

Since commit 43a7206b ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the hnae_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240302-class_cleanup-net-next-v1-1-8fa378595b93@marliere.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b6e3c115

Merge branch 'net-phy-micrel-lan8814-erratas' · e8efd372

Jakub Kicinski authored Mar 05, 2024

Horatiu Vultur says:

====================
net: phy: micrel: lan8814 erratas

Add two erratas for lan8814. First one fix the led which might
stay on even that there is no link. The second one improves increases
length of the cable that can be used when used in 1000Base-T.
====================

Link: https://lore.kernel.org/r/20240304091548.1386022-1-horatiu.vultur@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e8efd372

net: phy: micrel: lan8814 cable improvement errata · ad080db4

Horatiu Vultur authored Mar 04, 2024

When the length of the cable is more than 100m and the lan8814 is
configured to run in 1000Base-T Slave then the register of the device
needs to be optimized.

Workaround this by setting the measure time to a value of 0xb. This
value can be set regardless of the configuration.

This issue is described in 'LAN8814 Silicon Errata and Data Sheet
Clarification' and according to that, this will not be corrected in a
future silicon revision.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20240304091548.1386022-3-horatiu.vultur@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ad080db4

net: phy: micrel: lan8814 led errata · e9097f8e

Horatiu Vultur authored Mar 04, 2024

Lan8814 phy led behavior is not correct. It was noticed that the led
still remains ON when the cable is unplugged while there was traffic
passing at that time.

The fix consists in clearing bit 10 of register 0x38, in this way the
led behaviour is correct and gets OFF when there is no link.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240304091548.1386022-2-horatiu.vultur@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e9097f8e

Merge branch 'selftests-forwarding-various-improvements' · 5ad05123

Jakub Kicinski authored Mar 05, 2024

Ido Schimmel says:

====================
selftests: forwarding: Various improvements

This patchset speeds up the multipath tests (patches #1-#2) and makes
other tests more stable (patches #3-#6) so that they will not randomly
fail in the netdev CI.

On my system, after applying the first two patches, the run time of
gre_multipath_nh_res.sh is reduced by over 90%.
====================

Link: https://lore.kernel.org/r/20240304095612.462900-1-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5ad05123

selftests: forwarding: Make {, ip6}gre-inner-v6-multipath tests more robust · 35df2ce8

Ido Schimmel authored Mar 04, 2024

These tests generate various IPv6 flows, encapsulate them in GRE packets
and check that the encapsulated packets are distributed between the
available nexthops according to the configured weights.

Unlike the corresponding IPv4 tests, these tests sometimes fail in the
netdev CI because of large discrepancies between the expected and
measured ratios [1]. This can be explained by the fact that the IPv4
tests generate about 3,600 different flows whereas the IPv6 tests only
generate about 784 different flows (potentially by mistake).

Fix by aligning the IPv6 tests to the IPv4 ones and increase the number
of generated flows.

[1]
 [...]
 # TEST: ping                                                          [ OK ]
 # INFO: Running IPv6 over GRE over IPv4 multipath tests
 # TEST: ECMP                                                          [FAIL]
 # Too large discrepancy between expected and measured ratios
 # INFO: Expected ratio 1.00 Measured ratio 1.18
 [...]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240304095612.462900-7-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

35df2ce8

selftests: forwarding: Make VXLAN ECN encap tests more robust · f0008b04

Ido Schimmel authored Mar 04, 2024

These tests sometimes fail on the netdev CI because the expected number
of packets is larger than expected [1].

Make the tests more robust by specifically matching on VXLAN
encapsulated packets and allowing up to five stray packets instead of
just two.

[1]
 [...]
 # TEST: VXLAN: ECN encap: 0x00->0x00                                  [FAIL]
 # v1: Expected to capture 10 packets, got 13.
 # TEST: VXLAN: ECN encap: 0x01->0x01                                  [ OK ]
 # TEST: VXLAN: ECN encap: 0x02->0x02                                  [ OK ]
 # TEST: VXLAN: ECN encap: 0x03->0x02                                  [ OK ]
 [...]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240304095612.462900-6-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f0008b04

selftests: forwarding: Make vxlan-bridge-1q pass on debug kernels · dfbab740

Ido Schimmel authored Mar 04, 2024

The ageing time used by the test is too short for debug kernels and
results in entries being aged out prematurely [1].

Fix by increasing the ageing time.

[1]
 # ./vxlan_bridge_1q.sh
 [...]
 INFO: learning vlan 10
 TEST: VXLAN: flood before learning                                  [ OK ]
 TEST: VXLAN: show learned FDB entry                                 [ OK ]
 TEST: VXLAN: learned FDB entry                                      [FAIL]
         swp4: Expected to capture 0 packets, got 10.
 RTNETLINK answers: No such file or directory
 TEST: VXLAN: deletion of learned FDB entry                          [ OK ]
 TEST: VXLAN: Ageing of learned FDB entry                            [FAIL]
         swp4: Expected to capture 0 packets, got 10.
 TEST: VXLAN: learning toggling on bridge port                       [ OK ]
 [...]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240304095612.462900-5-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dfbab740

selftests: forwarding: Make tc-police pass on debug kernels · 4aca9eae

Ido Schimmel authored Mar 04, 2024

The test configures a policer with a rate of 80Mbps and expects to
measure a rate close to it. This is a too high rate for debug kernels,
causing the test to fail [1].

Fix by reducing the rate to 10Mbps.

[1]
 # ./tc_police.sh
 TEST: police on rx                                                  [FAIL]
         Expected rate 76.2Mbps, got 29.6Mbps, which is -61% off. Required accuracy is +-10%.
 TEST: police on tx                                                  [FAIL]
         Expected rate 76.2Mbps, got 30.4Mbps, which is -60% off. Required accuracy is +-10%.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240304095612.462900-4-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4aca9eae

selftests: forwarding: Parametrize mausezahn delay · 748d2744

Ido Schimmel authored Mar 04, 2024

The various multipath tests use mausezahn to generate different flows
and check how they are distributed between the available nexthops. The
tool is currently invoked with an hard coded transmission delay of 1 ms.
This is unnecessary when the tests are run with veth pairs and
needlessly prolongs the tests.

Parametrize this delay and default it to 0 us. It can be overridden
using the forwarding.config file. On my system, this reduces the run
time of router_multipath.sh by 93%.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240304095612.462900-3-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

748d2744

selftests: forwarding: Remove IPv6 L3 multipath hash tests · 7b2d64f9

Ido Schimmel authored Mar 04, 2024

The multipath tests currently test both the L3 and L4 multipath hash
policies for IPv6, but only the L4 policy for IPv4. The reason is mostly
historic: When the initial multipath test was added
(router_multipath.sh) the IPv6 L4 policy did not exist and was later
added to the test. The other multipath tests copied this pattern
although there is little value in testing both policies.

Align the IPv4 and IPv6 tests and only test the L4 policy. On my system,
this reduces the run time of router_multipath.sh by 89% because of the
repeated ping6 invocations to randomize the flow label.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240304095612.462900-2-idosch@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7b2d64f9

net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list() · 00af2aa9

Eric Dumazet authored Mar 02, 2024

Many syzbot reports show extreme rtnl pressure, and many of them hint
that smc acquires rtnl in netns creation for no good reason [1]

This patch returns early from smc_pnet_net_init()
if there is no netdevice yet.

I am not even sure why smc_pnet_create_pnetids_list() even exists,
because smc_pnet_netdev_event() is also calling
smc_pnet_add_base_pnetid() when handling NETDEV_UP event.

[1] extract of typical syzbot reports

2 locks held by syz-executor.3/12252:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
2 locks held by syz-executor.4/12253:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
2 locks held by syz-executor.1/12257:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
2 locks held by syz-executor.2/12261:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
2 locks held by syz-executor.0/12265:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
2 locks held by syz-executor.3/12268:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
2 locks held by syz-executor.4/12271:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
2 locks held by syz-executor.1/12274:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
2 locks held by syz-executor.2/12280:
  #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
  #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wenjia Zhang <wenjia@linux.ibm.com>
Cc: Jan Karcher <jaka@linux.ibm.com>
Cc: "D. Wythe" <alibuda@linux.alibaba.com>
Cc: Tony Lu <tonylu@linux.alibaba.com>
Cc: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://lore.kernel.org/r/20240302100744.3868021-1-edumazet@google.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

00af2aa9