Commits · 73f49f8cc1fef29c062bf92b1ea24047cf67bee4 · Kirill Smelkov / linux

12 Jun, 2023 40 commits

Merge branch 'tcp-tx-headless' · 73f49f8c

David S. Miller authored Jun 12, 2023

Eric Dumazet says:

====================
tcp: tx path fully headless

This series completes transition of TCP stack tx path
to headless packets : All payload now reside in page frags,
never in skb->head.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

73f49f8c

tcp: remove size parameter from tcp_stream_alloc_skb() · 5882efff

Eric Dumazet authored Jun 09, 2023

Now all tcp_stream_alloc_skb() callers pass @size == 0, we can
remove this parameter.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5882efff

tcp: remove some dead code · b4a24397

Eric Dumazet authored Jun 09, 2023

Now all skbs in write queue do not contain any payload in skb->head,
we can remove some dead code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4a24397

tcp: let tcp_send_syn_data() build headless packets · fbf93406

Eric Dumazet authored Jun 09, 2023

tcp_send_syn_data() is the last component in TCP transmit
path to put payload in skb->head.

Switch it to use page frags, so that we can remove dead
code later.

This allows to put more payload than previous implementation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbf93406

Merge branch 'ethtool-extack' · f2f069da

David S. Miller authored Jun 12, 2023

Jakub Kicinski says:

====================
net: support extack in dump and simplify ethtool uAPI

Ethtool currently requires header nest to be always present even if
it doesn't have to carry any attr for a given request. This inflicts
unnecessary pain on the users.

What makes it worse is that extack was not working in dump's ->start()
callback. Address both of those issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f2f069da

net: ethtool: don't require empty header nests · 500e1340

Jakub Kicinski authored Jun 09, 2023

Ethtool currently requires a header nest (which is used to carry
the common family options) in all requests including dumps.

  $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
  lib.ynl.NlError: Netlink error: Invalid argument
  nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
	error: -22      extack: {'msg': 'request header missing'}

  $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get \
           --json '{"header":{}}';  )
  [{'combined-count': 1,
    'combined-max': 1,
    'header': {'dev-index': 2, 'dev-name': 'enp1s0'}}]

Requiring the header nest to always be there may seem nice
from the consistency perspective, but it's not serving any
practical purpose. We shouldn't burden the user like this.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

500e1340

netlink: support extack in dump ->start() · 5ab8c41c

Jakub Kicinski authored Jun 09, 2023

Commit 4a19edb6 ("netlink: Pass extack to dump handlers")
added extack support to netlink dumps. It was focused on rtnl
and since rtnl does not use ->start(), ->done() callbacks
it ignored those. Genetlink on the other hand uses ->start()
extensively, for parsing and input validation.

Pass the extact in via struct netlink_dump_control and link
it to cb for the time of ->start(). Both struct netlink_dump_control
and extack itself live on the stack so we can't keep the same
extack for the duration of the dump. This means that the extack
visible in ->start() and each ->dump() callbacks will be different.
Corner cases like reporting a warning message in DONE across dump
calls are still not supported.

We could put the extack (for dumps) in the socket struct,
but layering makes it slightly awkward (extack pointer is decided
before the DO / DUMP split).

The genetlink dump error extacks are now surfaced:

  $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
  lib.ynl.NlError: Netlink error: Invalid argument
  nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
	error: -22	extack: {'msg': 'request header missing'}

Previously extack was missing:

  $ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
  lib.ynl.NlError: Netlink error: Invalid argument
  nl_len = 36 (20) nl_flags = 0x100 nl_type = 2
	error: -22
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ab8c41c

Merge branch 'ynl-ethtool' · 23813168

David S. Miller authored Jun 12, 2023

Jakub Kicinski says:

====================
tools: ynl: generate code for the ethtool family

And finally ethtool support. Thanks to Stan's work the ethtool family
spec is quite complete, so there is a lot of operations to support.

I chickened out of stats-get support, they require at the very least
type-value support on a u64 scalar. Type-value is an arrangement where
a u16 attribute is encoded directly in attribute type. Code gen can
support this if the inside is a nest, we just throw in an extra
field into that nest to carry the attr type. But a little more coding
is needed to for a scalar, because first we need to turn the scalar
into a struct with one member, then we can add the attr type.

Other than that ethtool required event support (notification which
does not share contents with any GET), but the previous series
already added that to the codegen.

I haven't tested all the ops here, and a few I tried seem to work.
====================
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23813168

tools: ynl: add sample for ethtool · f561ff23

Jakub Kicinski authored Jun 09, 2023

Configuring / reading ring sizes and counts is a fairly common
operation for ethtool netlink. Present a sample doing that with
YNL:

$ ./ethtool
Channels:
    enp1s0: combined 1
   eni1np1: combined 1
   eni2np1: combined 1
Rings:
    enp1s0: rx 256 tx 256
   eni1np1: rx 0 tx 0
   eni2np1: rx 0 tx 0
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f561ff23

tools: ynl: generate code for the ethtool family · 2d7be507

Jakub Kicinski authored Jun 09, 2023

Generate the protocol code for ethtool. Skip the stats
for now, they are the only outlier in terms of complexity.
Stats are a sort-of semi-polymorphic (attr space of a nest
depends on value of another attr) or a type-value-scalar,
depending on how one wants to look at it...
A challenge for another time.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d7be507

netlink: specs: ethtool: mark pads as pads · 68335713

Jakub Kicinski authored Jun 09, 2023

Pad is a separate type. Even though in practice they can
only be a u32 the value should be discarded.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

68335713

netlink: specs: ethtool: untangle stats-get · 709d0c3b

Jakub Kicinski authored Jun 09, 2023

Code gen for stats is a bit of a challenge, but from looking
at the attrs I think that the format isn't quite right.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

709d0c3b

netlink: specs: ethtool: untangle UDP tunnels and cable test a bit · 37c85222

Jakub Kicinski authored Jun 09, 2023

UDP tunnel and cable test messages have a lot of nests,
which do not match the names of the enum entries in C uAPI.
Some of the structure / nesting also looks wrong.

Untangle this a little bit based on the names, comments and
educated guesses, I haven't actually tested the results.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

37c85222

netlink: specs: ethtool: add empty enum stringset · 180ad455

Jakub Kicinski authored Jun 09, 2023

C does not allow defining structures and enums with the same name.
Since enum ethtool_stringset exists in the uAPI we need to include
at least a stub of it in the spec. This will trigger name collision
avoidance in the code gen.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

180ad455

tools: ynl-gen: resolve enum vs struct name conflicts · 2c9d47a0

Jakub Kicinski authored Jun 09, 2023

Ethtool has an attribute set called stringset, from which
we'll generate struct ethtool_stringset. Unfortunately,
the old ethtool header declares enum ethtool_stringset
(the same name), to which compilers object.

This seems unavoidable. Check struct names against known
constants and append an underscore if conflict is detected.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c9d47a0

tools: ynl-gen: don't generate enum types if unnamed · dddc9f53

Jakub Kicinski authored Jun 09, 2023

If attr set or enum has empty enum name we need to use u32 or int
as function arguments and struct members.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dddc9f53

netlink: specs: ethtool: add C render hints · d4813b11

Jakub Kicinski authored Jun 09, 2023

Most of the C enum names are guessed correctly, but there
is a handful of corner cases we need to name explicitly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d4813b11

netlink: specs: support setting prefix-name per attribute · ed2042cc

Jakub Kicinski authored Jun 09, 2023

Ethtool's PSE PoDL has a attr nest with different prefixes:

/* Power Sourcing Equipment */
enum {
	ETHTOOL_A_PSE_UNSPEC,
	ETHTOOL_A_PSE_HEADER,			/* nest - _A_HEADER_* */
	ETHTOOL_A_PODL_PSE_ADMIN_STATE,		/* u32 */
	ETHTOOL_A_PODL_PSE_ADMIN_CONTROL,	/* u32 */
	ETHTOOL_A_PODL_PSE_PW_D_STATUS,		/* u32 */

Header has a prefix of ETHTOOL_A_PSE_ and other attrs prefix of
ETHTOOL_A_PODL_PSE_ we can't cover them uniformly.
If PODL was after PSE life would be easy.

Now we either need to add prefixes to attr names which is yucky
or support setting prefix name per attr.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed2042cc

tools: ynl-gen: record extra args for regen · 33eedb00

Jakub Kicinski authored Jun 09, 2023

ynl-regen needs to know the arguments used to generate a file.
Record excluded ops and, while at it, user headers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

33eedb00

tools: ynl-gen: support excluding tricky ops · 008bcd68

Jakub Kicinski authored Jun 09, 2023

The ethtool family has a small handful of quite tricky ops
and a lot of simple very useful ops. Teach ynl-gen to skip
ops so that we can bypass the tricky ones.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

008bcd68

mdio: mdio-mux-mmioreg: Use of_property_read_reg() to parse "reg" · b30a1f30

Rob Herring authored Jun 09, 2023

Use the recently added of_property_read_reg() helper to get the
untranslated "reg" address value.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b30a1f30

dt-bindings: net: drop unneeded quotes · 61ab5a06

Krzysztof Kozlowski authored Jun 09, 2023

Cleanup bindings dropping unneeded quotes. Once all these are fixed,
checking for this can be enabled in yamllint.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

61ab5a06

Merge branch 'SCM_PIDFD-SCM_PEERPIDFD' · ba47545c

David S. Miller authored Jun 12, 2023

Alexander Mikhalitsyn says:

====================
Add SCM_PIDFD and SO_PEERPIDFD

1. Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
but it contains pidfd instead of plain pid, which allows programmers not
to care about PID reuse problem.

2. Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd.
This thing is direct analog of SO_PEERCRED which allows to get plain PID.

3. Add SCM_PIDFD / SO_PEERPIDFD kselftest

Idea comes from UAPI kernel group:
https://uapi-group.org/kernel-features/

Big thanks to Christian Brauner and Lennart Poettering for productive
discussions about this and Luca Boccassi for testing and reviewing this.

=== Motivation behind this patchset

Eric Dumazet raised a question:
> It seems that we already can use pidfd_open() (since linux-5.3), and
> pass the resulting fd in af_unix SCM_RIGHTS message ?

Yes, it's possible, but it means that from the receiver side we need
to trust the sent pidfd (in SCM_RIGHTS),
or always use combination of SCM_RIGHTS+SCM_CREDENTIALS, then we can
extract pidfd from SCM_RIGHTS,
then acquire plain pid from pidfd and after compare it with the pid
from SCM_CREDENTIALS.

A few comments from other folks regarding this.

Christian Brauner wrote:

>Let me try and provide some of the missing background.

>There are a range of use-cases where we would like to authenticate a
>client through sockets without being susceptible to PID recycling
>attacks. Currently, we can't do this as the race isn't fully fixable.
>We can only apply mitigations.

>What this patchset will allows us to do is to get a pidfd without the
>client having to send us an fd explicitly via SCM_RIGHTS. As that's
>already possibly as you correctly point out.

>But for protocols like polkit this is quite important. Every message is
>standalone and we would need to force a complete protocol change where
>we would need to require that every client allocate and send a pidfd via
>SCM_RIGHTS. That would also mean patching through all polkit users.

>For something like systemd-journald where we provide logging facilities
>and want to add metadata to the log we would also immensely benefit from
>being able to get a receiver-side controlled pidfd.

>With the message type we envisioned we don't need to change the sender
>at all and can be safe against pid recycling.

>Link: https://gitlab.freedesktop.org/polkit/polkit/-/merge_requests/154
>Link: https://uapi-group.org/kernel-features

Lennart Poettering wrote:

>So yes, this is of course possible, but it would mean the pidfd would
>have to be transported as part of the user protocol, explicitly sent
>by the sender. (Moreover, the receiver after receiving the pidfd would
>then still have to somehow be able to prove that the pidfd it just
>received actually refers to the peer's process and not some random
>process. – this part is actually solvable in userspace, but ugly)

>The big thing is simply that we want that the pidfd is associated
>*implicity* with each AF_UNIX connection, not explicitly. A lot of
>userspace already relies on this, both in the authentication area
>(polkit) as well as in the logging area (systemd-journald). Right now
>using the PID field from SO_PEERCREDS/SCM_CREDENTIALS is racy though
>and very hard to get right. Making this available as pidfd too, would
>solve this raciness, without otherwise changing semantics of it all:
>receivers can still enable the creds stuff as they wish, and the data
>is then implicitly appended to the connections/datagrams the sender
>initiates.

>Or to turn this around: things like polkit are typically used to
>authenticate arbitrary dbus methods calls: some service implements a
>dbus method call, and when an unprivileged client then issues that
>call, it will take the client's info, go to polkit and ask it if this
>is ok. If we wanted to send the pidfd as part of the protocol we
>basically would have to extend every single method call to contain the
>client's pidfd along with it as an additional argument, which would be
>a massive undertaking: it would change the prototypes of basically
>*all* methods a service defines… And that's just ugly.

>Note that Alex' patch set doesn't expose anything that wasn't exposed
>before, or attach, propagate what wasn't before. All it does, is make
>the field already available anyway (the struct ucred .pid field)
>available also in a better way (as a pidfd), to solve a variety of
>races, with no effect on the protocol actually spoken within the
>AF_UNIX transport. It's a seamless improvement of the status quo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ba47545c

af_unix: Kconfig: make CONFIG_UNIX bool · 97154bcf

Alexander Mikhalitsyn authored Jun 08, 2023

Let's make CONFIG_UNIX a bool instead of a tristate.
We've decided to do that during discussion about SCM_PIDFD patchset [1].

[1] https://lore.kernel.org/lkml/20230524081933.44dc8bea@kernel.org/

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

97154bcf

selftests: net: add SCM_PIDFD / SO_PEERPIDFD test · ec80f488

Alexander Mikhalitsyn authored Jun 08, 2023

Basic test to check consistency between:
- SCM_CREDENTIALS and SCM_PIDFD
- SO_PEERCRED and SO_PEERPIDFD

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ec80f488

net: core: add getsockopt SO_PEERPIDFD · 7b26952a

Alexander Mikhalitsyn authored Jun 08, 2023

Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd.
This thing is direct analog of SO_PEERCRED which allows to get plain PID.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: bpf@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b26952a

scm: add SO_PASSPIDFD and SCM_PIDFD · 5e2ff670

Alexander Mikhalitsyn authored Jun 08, 2023

Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
but it contains pidfd instead of plain pid, which allows programmers not
to care about PID reuse problem.

We mask SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because
it depends on a pidfd_prepare() API which is not exported to the kernel
modules.

Idea comes from UAPI kernel group:
https://uapi-group.org/kernel-features/

Big thanks to Christian Brauner and Lennart Poettering for productive
discussions about this.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Tested-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e2ff670

Merge branch 'mlxsw-cleanups' · 55d7c914

David S. Miller authored Jun 12, 2023

Petr Machata says:

====================
mlxsw: Cleanups in router code

This patchset moves some router-related code from spectrum.c to
spectrum_router.c where it should be. It also simplifies handlers of
netevent notifications.

- Patch #1 caches router pointer in a dedicated variable. This obviates the
  need to access the same as mlxsw_sp->router, making lines shorter, and
  permitting a future patch to add code that fits within 80 character
  limit.

- Patch #2 moves IP / IPv6 validation notifier blocks from spectrum.c
  to spectrum_router, where the handlers are anyway.

- In patch #3, pass router pointer to scheduler of deferred work directly,
  instead of having it deduce it on its own.

- This makes the router pointer available in the handler function
  mlxsw_sp_router_netevent_event(), so in patch #4, use it directly,
  instead of finding it through mlxsw_sp_port.

- In patch #5, extend mlxsw_sp_router_schedule_work() so that the
  NETEVENT_NEIGH_UPDATE handler can use it directly instead of inlining
  equivalent code.

- In patches #6 and #7, add helpers for two common operations involving
  a backing netdev of a RIF. This makes it unnecessary for the function
  mlxsw_sp_rif_dev() to be visible outside of the router module, so in
  patch #8, hide it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

55d7c914

mlxsw: spectrum_router: Privatize mlxsw_sp_rif_dev() · df95ae66

Petr Machata authored Jun 09, 2023

Now that the external users of mlxsw_sp_rif_dev() have been converted in
the preceding patches, make the function static.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

df95ae66

mlxsw: Convert does-RIF-have-this-netdev queries to a dedicated helper · 5374a50f

Petr Machata authored Jun 09, 2023

In a number of places, a netdevice underlying a RIF is obtained only to
compare it to another pointer. In order to clean up the interface between
the router and the other modules, add a new helper to specifically answer
this question, and convert the relevant uses to this new interface.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5374a50f

mlxsw: Convert RIF-has-netdevice queries to a dedicated helper · 0255f748

Petr Machata authored Jun 09, 2023

In a number of places, a netdevice underlying a RIF is obtained only to
check if it a NULL pointer. In order to clean up the interface between the
router and the other modules, add a new helper to specifically answer this
question, and convert the relevant uses to this new interface.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0255f748

mlxsw: spectrum_router: Reuse work neighbor initialization in work scheduler · 151b89f6

Petr Machata authored Jun 09, 2023

After the struct mlxsw_sp_netevent_work.n field initialization is moved
here, the body of code that handles NETEVENT_NEIGH_UPDATE is almost
identical to the one in the helper function. Therefore defer to the helper
instead of inlining the equivalent.

Note that previously, the code took and put a reference of the netdevice.
The new code defers to mlxsw_sp_dev_lower_is_port() to obviate the need for
taking the reference.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

151b89f6

mlxsw: spectrum_router: Use the available router pointer for netevent handling · 14304e70

Petr Machata authored Jun 09, 2023

This code handles NETEVENT_DELAY_PROBE_TIME_UPDATE, which is invoked every
time the delay_probe_time changes. mlxsw router currently only maintains
one timer, so the last delay_probe_time set wins.

Currently, mlxsw uses mlxsw_sp_port_lower_dev_hold() to find a reference to
the router. This is no longer necessary. But as a side effect, this makes
sure that only updates to "interesting netdevices" (ones that have a
physical netdevice lower) are projected.

Retain that side effect by calling mlxsw_sp_port_dev_lower_find_rcu() and
punting if there is none. Then just proceed using the router pointer that's
already at hand in the helper.

Note that previously, the code took and put a reference of the netdevice.
Because the mlxsw_sp pointer is now obtained from the notifier block, the
port pointer (non-) NULL-ness is all that's relevant, and the reference
does not need to be taken anymore.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14304e70

mlxsw: spectrum_router: Pass router to mlxsw_sp_router_schedule_work() directly · 48dde35e

Petr Machata authored Jun 09, 2023

Instead of passing a notifier block and deducing the router pointer from
that in the helper, do that in the caller, and pass the result. In the
following patches, the pointer will also be made useful in the caller.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

48dde35e

mlxsw: spectrum_router: Move here inetaddr validator notifiers · 41b2bd20

Petr Machata authored Jun 09, 2023

The validation logic is already in the router code. Move there the notifier
blocks themselves as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41b2bd20

mlxsw: spectrum_router: mlxsw_sp_router_fini(): Extract a helper variable · 50f6c3d5

Petr Machata authored Jun 09, 2023

Make mlxsw_sp_router_fini() more similar to the _init() function (and more
concise) by extracting the `router' handle to a named variable and using
that throughout. The availability of a dedicated `router' variable will
come in handy in following patches.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50f6c3d5

net: openvswitch: add support for l4 symmetric hashing · e069ba07

Aaron Conole authored Jun 09, 2023

Since its introduction, the ovs module execute_hash action allowed
hash algorithms other than the skb->l4_hash to be used.  However,
additional hash algorithms were not implemented.  This means flows
requiring different hash distributions weren't able to use the
kernel datapath.

Now, introduce support for symmetric hashing algorithm as an
alternative hash supported by the ovs module using the flow
dissector.

Output of flow using l4_sym hash:

    recirc_id(0),in_port(3),eth(),eth_type(0x0800),
    ipv4(dst=64.0.0.0/192.0.0.0,proto=6,frag=no), packets:30473425,
    bytes:45902883702, used:0.000s, flags:SP.,
    actions:hash(sym_l4(0)),recirc(0xd)

Some performance testing with no GRO/GSO, two veths, single flow:

    hash(l4(0)):      4.35 GBits/s
    hash(l4_sym(0)):  4.24 GBits/s
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e069ba07

Merge branch 'taprio-xstats' · 65137877

David S. Miller authored Jun 12, 2023

Vladimir Oltean says:

====================
Fixes for taprio xstats

1. Taprio classes correspond to TXQs, and thus, class stats are TXQ
   stats not TC stats.
2. Drivers reporting taprio xstats should report xstats for *this*
   taprio, not for a previous one.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

65137877

net: enetc: reset taprio stats when taprio is deleted · f1e668d2

Vladimir Oltean authored Jun 09, 2023

Currently, the window_drop stats persist even if an incorrect Qdisc was
removed from the interface and a new one is installed. This is because
the enetc driver keeps the state, and that is persistent across multiple
Qdiscs.

To resolve the issue, clear all win_drop counters from all TX queues
when the currently active Qdisc is removed. These counters are zero
by default. The counters visible in ethtool -S are also affected,
but I don't care very much about preserving those enough to keep them
monotonically incrementing.

Fixes: 4802fca8 ("net: enetc: report statistics counters for taprio")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1e668d2

net/sched: taprio: report class offload stats per TXQ, not per TC · 2b84960f

Vladimir Oltean authored Jun 09, 2023

The taprio Qdisc creates child classes per netdev TX queue, but
taprio_dump_class_stats() currently reports offload statistics per
traffic class. Traffic classes are groups of TXQs sharing the same
dequeue priority, so this is incorrect and we shouldn't be bundling up
the TXQ stats when reporting them, as we currently do in enetc.

Modify the API from taprio to drivers such that they report TXQ offload
stats and not TC offload stats.

There is no change in the UAPI or in the global Qdisc stats.

Fixes: 6c1adb65 ("net/sched: taprio: add netlink reporting for offload statistics counters")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b84960f