Commits · 8dbdf24f4e9e060e4c4a3ad282a4a57f80f48b1f · Kirill Smelkov / linux

26 Jan, 2023 22 commits

selftests: mptcp: userspace: avoid read errors · 8dbdf24f

Matthieu Baerts authored Jan 25, 2023

During the cleanup phase, the server pids were killed with a SIGTERM
directly, not using a SIGUSR1 first to quit safely. As a result, this
test was often ending with two error messages:

  read: Connection reset by peer

While at it, use a for-loop to terminate all the PIDs the same way.

Also the different files are now removed after having killed the PIDs
using them. It makes more sense to do that in this order.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

8dbdf24f

selftests: mptcp: userspace: print error details if any · 10d42734

Matthieu Baerts authored Jan 25, 2023

Before, only '[FAIL]' was printed in case of error during the validation
phase.

Now, in case of failure, the variable name, its value and expected one
are displayed to help understand what was wrong.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

10d42734

selftests: mptcp: userspace: refactor asserts · 1c0b0ee2

Matthieu Baerts authored Jan 25, 2023

Instead of having a long list of conditions to check, it is possible to
give a list of variable names to compare with their 'e_XXX' version.

This will ease the introduction of the following commit which will print
which condition has failed (if any).
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

1c0b0ee2

selftests: mptcp: userspace: print titles · f790ae03

Matthieu Baerts authored Jan 25, 2023

This script is running a few tests after having setup the environment.

Printing titles helps understand what is being tested.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

f790ae03

mptcp: userspace pm: use a single point of exit · 40c71f76

Matthieu Baerts authored Jan 25, 2023

Like in all other functions in this file, a single point of exit is used
when extra operations are needed: unlock, decrement refcount, etc.

There is no functional change for the moment but it is better to do the
same here to make sure all cleanups are done in case of intermediate
errors.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

40c71f76

selftests: mptcp: add test-cases for mixed v4/v6 subflows · ad349374

Paolo Abeni authored Jan 25, 2023

Note that we can't guess the listener family anymore based on the client
target address: always use IPv6.

The fullmesh flag with endpoints from different families is also
validated here.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ad349374

mptcp: propagate sk_ipv6only to subflows · 7e9740e0

Matthieu Baerts authored Jan 25, 2023

Usually, attributes are propagated to subflows as well.

Here, if subflows are created by other ways than the MPTCP path-manager,
it is important to make sure they are in v6 if it is asked by the
userspace.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

7e9740e0

mptcp: let the in-kernel PM use mixed IPv4 and IPv6 addresses · b9d69db8

Paolo Abeni authored Jan 25, 2023

Currently the in-kernel PM arbitrary enforces that created subflow's
family must match the main MPTCP socket while the RFC allows mixing
IPv4 and IPv6 subflows.

This patch changes the in-kernel PM logic to create subflows matching
the currently selected source (or destination) address. IPv4 sockets
can pick only IPv4 addresses (and v4 mapped in v6), while IPv6 sockets
not restricted to V6ONLY can pick either IPv4 and IPv6 addresses as
long as the source and destination matches.

A helper, previously introduced is used to ease family matching checks,
taking care of IPv4 vs IPv4-mapped-IPv6 vs IPv6 only addresses.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/269Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

b9d69db8

icmp: Add counters for rate limits · d0941130

Jamie Bainbridge authored Jan 25, 2023

There are multiple ICMP rate limiting mechanisms:

* Global limits: net.ipv4.icmp_msgs_burst/icmp_msgs_per_sec
* v4 per-host limits: net.ipv4.icmp_ratelimit/ratemask
* v6 per-host limits: net.ipv6.icmp_ratelimit/ratemask

However, when ICMP output is limited, there is no way to tell
which limit has been hit or even if the limits are responsible
for the lack of ICMP output.

Add counters for each of the cases above. As we are within
local_bh_disable(), use the __INC stats variant.

Example output:

 # nstat -sz "*RateLimit*"
 IcmpOutRateLimitGlobal          134                0.0
 IcmpOutRateLimitHost            770                0.0
 Icmp6OutRateLimitHost           84                 0.0
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Suggested-by: Abhishek Rawal <rawal.abhishek92@gmail.com>
Link: https://lore.kernel.org/r/273b32241e6b7fdc5c609e6f5ebc68caf3994342.1674605770.git.jamie.bainbridge@gmail.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

d0941130

Merge branch 'adding-sparx5-is0-vcap-support' · 9f927527

Paolo Abeni authored Jan 26, 2023

Steen Hegelund says:

====================
Adding Sparx5 IS0 VCAP support

This provides the Ingress Stage 0 (IS0) VCAP (Versatile Content-Aware
Processor) support for the Sparx5 platform.

The IS0 VCAP (also known in the datasheet as CLM) is a classifier VCAP that
mainly extracts frame information to metadata that follows the frame in the
Sparx5 processing flow all the way to the egress port.

The IS0 VCAP has 4 lookups and they are accessible with a TC chain id:

- chain 1000000: IS0 Lookup 0
- chain 1100000: IS0 Lookup 1
- chain 1200000: IS0 Lookup 2
- chain 1300000: IS0 Lookup 3
- chain 1400000: IS0 Lookup 4
- chain 1500000: IS0 Lookup 5

Each of these lookups have their own port keyset configuration that decides
which keys will be used for matching on which traffic type.

The IS0 VCAP has these traffic classifications:

- IPv4 frames
- IPv6 frames
- Unicast MPLS frames (ethertype = 0x8847)
- Multicast MPLS frames (ethertype = 0x8847)
- Other frame types than MPLS, IPv4 and IPv6

The IS0 VCAP has an action that allows setting the value of a PAG (Policy
Association Group) key field in the frame metadata, and this can be used
for matching in an IS2 VCAP rule.

This allow rules in the IS0 VCAP to be linked to rules in the IS2 VCAP.

The linking is exposed by using the TC "goto chain" action with an offset
from the IS2 chain ids.

As an example a "goto chain 8000001" will use a PAG value of 1 to chain to
a rule in IS2 Lookup 0.
====================

Link: https://lore.kernel.org/r/20230124104511.293938-1-steen.hegelund@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

9f927527

net: microchip: sparx5: Add support for IS0 VCAP CVLAN TC keys · 52df82cc

Steen Hegelund authored Jan 24, 2023

This adds support for parsing and matching on the CVLAN tags in the Sparx5
IS0 VCAP.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

52df82cc

net: microchip: sparx5: Add support for IS0 VCAP ethernet protocol types · 63e35645

Steen Hegelund authored Jan 24, 2023

This allows the IS0 VCAP to have its own list of supported ethernet
protocol types matching what is supported by the VCAPs port lookup
classification.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

63e35645

net: microchip: sparx5: Add automatic selection of VCAP rule actionset · 81e164c4

Steen Hegelund authored Jan 24, 2023

With more than one possible actionset in a VCAP instance, the VCAP API will
now use the actions in a VCAP rule to select the actionset that fits these
actions the best possible way.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

81e164c4

net: microchip: sparx5: Add TC filter chaining support for IS0 and IS2 VCAPs · 88bd9ea7

Steen Hegelund authored Jan 24, 2023

This allows rules to be chained between VCAP instances, e.g. from IS0
Lookup 0 to IS0 Lookup 1, or from one of the IS0 Lookups to one of the IS2
Lookups.

Chaining from an IS2 Lookup to another IS2 Lookup is not supported in the
hardware.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

88bd9ea7

net: microchip: sparx5: Add TC support for IS0 VCAP · 542e6e2c

Steen Hegelund authored Jan 24, 2023

This enables the TC command to use the Sparx5 IS0 VCAP
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

542e6e2c

net: microchip: sparx5: Add actionset type id information to rule · 7306fcd1

Steen Hegelund authored Jan 24, 2023

This adds the actionset type id to the rule information.  This is needed as
we now have more than one actionset in a VCAP instance (IS0).
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

7306fcd1

net: microchip: sparx5: Add IS0 VCAP keyset configuration for Sparx5 · 545609fd

Steen Hegelund authored Jan 24, 2023

This adds the IS0 VCAP port keyset configuration for Sparx5 and also
updates the debugFS support to show the keyset configuration.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

545609fd

net: microchip: sparx5: Add IS0 VCAP model and updated KUNIT VCAP model · f274a659

Steen Hegelund authored Jan 24, 2023

This provides the IS0 (Ingress Stage 0) or CLM VCAP model for Sparx5.
This VCAP provides classification actions for Sparx5.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

f274a659

Merge branch 'add-ip_local_port_range-socket-option' · 3f17e16f

Jakub Kicinski authored Jan 25, 2023

Jakub Sitnicki says:

====================
Add IP_LOCAL_PORT_RANGE socket option

This patch set is a follow up to the "How to share IPv4 addresses by
partitioning the port space" talk given at LPC 2022 [1].

Please see patch #1 for the motivation & the use case description.
Patch #2 adds tests exercising the new option in various scenarios.

Documentation
-------------

Proposed update to the ip(7) man-page:

       IP_LOCAL_PORT_RANGE (since Linux X.Y)
              Set or get the per-socket default local  port  range.  This
              option  can  be  used  to  clamp down the global local port
              range, defined by the ip_local_port_range  /proc  interface
              described below, for a given socket.

              The  option  takes  an uint32_t value with the high 16 bits
              set to the upper range bound, and the low 16  bits  set  to
              the  lower  range  bound.  Range  bounds are inclusive. The
              16-bit values should be in host byte order.

              The lower bound has to be less than the  upper  bound  when
              both  bounds  are  not  zero. Otherwise, setting the option
              fails with EINVAL.

              If either bound is outside of the global local port  range,
              or is zero, then that bound has no effect.

              To  reset  the setting, pass zero as both the upper and the
              lower bound.

Interaction with SELinux bind() hook
------------------------------------

SELinux bind() hook - selinux_socket_bind() - performs a permission check
if the requested local port number lies outside of the netns ephemeral port
range.

The proposed socket option cannot be used change the ephemeral port range
to extend beyond the per-netns port range, as set by
net.ipv4.ip_local_port_range.

Hence, there is no interaction with SELinux, AFAICT.

RFC -> v1
RFC: https://lore.kernel.org/netdev/20220912225308.93659-1-jakub@cloudflare.com/

 * Allow either the high bound or the low bound, or both, to be zero
 * Add getsockopt support
 * Add selftests

Links:
------

[1]: https://lpc.events/event/16/contributions/1349/
====================

Link: https://lore.kernel.org/r/20221221-sockopt-port-range-v6-0-be255cc0e51f@cloudflare.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3f17e16f

selftests/net: Cover the IP_LOCAL_PORT_RANGE socket option · ae543965

Jakub Sitnicki authored Jan 24, 2023

Exercise IP_LOCAL_PORT_RANGE socket option in various scenarios:

1. pass invalid values to setsockopt
2. pass a range outside of the per-netns port range
3. configure a single-port range
4. exhaust a configured multi-port range
5. check interaction with late-bind (IP_BIND_ADDRESS_NO_PORT)
6. set then get the per-socket port range
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ae543965

inet: Add IP_LOCAL_PORT_RANGE socket option · 91d0b78c

Jakub Sitnicki authored Jan 24, 2023

Users who want to share a single public IP address for outgoing connections
between several hosts traditionally reach for SNAT. However, SNAT requires
state keeping on the node(s) performing the NAT.

A stateless alternative exists, where a single IP address used for egress
can be shared between several hosts by partitioning the available ephemeral
port range. In such a setup:

1. Each host gets assigned a disjoint range of ephemeral ports.
2. Applications open connections from the host-assigned port range.
3. Return traffic gets routed to the host based on both, the destination IP
   and the destination port.

An application which wants to open an outgoing connection (connect) from a
given port range today can choose between two solutions:

1. Manually pick the source port by bind()'ing to it before connect()'ing
   the socket.

   This approach has a couple of downsides:

   a) Search for a free port has to be implemented in the user-space. If
      the chosen 4-tuple happens to be busy, the application needs to retry
      from a different local port number.

      Detecting if 4-tuple is busy can be either easy (TCP) or hard
      (UDP). In TCP case, the application simply has to check if connect()
      returned an error (EADDRNOTAVAIL). That is assuming that the local
      port sharing was enabled (REUSEADDR) by all the sockets.

        # Assume desired local port range is 60_000-60_511
        s = socket(AF_INET, SOCK_STREAM)
        s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        s.bind(("192.0.2.1", 60_000))
        s.connect(("1.1.1.1", 53))
        # Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy
        # Application must retry with another local port

      In case of UDP, the network stack allows binding more than one socket
      to the same 4-tuple, when local port sharing is enabled
      (REUSEADDR). Hence detecting the conflict is much harder and involves
      querying sock_diag and toggling the REUSEADDR flag [1].

   b) For TCP, bind()-ing to a port within the ephemeral port range means
      that no connecting sockets, that is those which leave it to the
      network stack to find a free local port at connect() time, can use
      the this port.

      IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port
      will be skipped during the free port search at connect() time.

2. Isolate the app in a dedicated netns and use the use the per-netns
   ip_local_port_range sysctl to adjust the ephemeral port range bounds.

   The per-netns setting affects all sockets, so this approach can be used
   only if:

   - there is just one egress IP address, or
   - the desired egress port range is the same for all egress IP addresses
     used by the application.

   For TCP, this approach avoids the downsides of (1). Free port search and
   4-tuple conflict detection is done by the network stack:

     system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'")

     s = socket(AF_INET, SOCK_STREAM)
     s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1)
     s.bind(("192.0.2.1", 0))
     s.connect(("1.1.1.1", 53))
     # Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy

  For UDP this approach has limited applicability. Setting the
  IP_BIND_ADDRESS_NO_PORT socket option does not result in local source
  port being shared with other connected UDP sockets.

  Hence relying on the network stack to find a free source port, limits the
  number of outgoing UDP flows from a single IP address down to the number
  of available ephemeral ports.

To put it another way, partitioning the ephemeral port range between hosts
using the existing Linux networking API is cumbersome.

To address this use case, add a new socket option at the SOL_IP level,
named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the
ephemeral port range for each socket individually.

The option can be used only to narrow down the per-netns local port
range. If the per-socket range lies outside of the per-netns range, the
latter takes precedence.

UAPI-wise, the low and high range bounds are passed to the kernel as a pair
of u16 values in host byte order packed into a u32. This avoids pointer
passing.

  PORT_LO = 40_000
  PORT_HI = 40_511

  s = socket(AF_INET, SOCK_STREAM)
  v = struct.pack("I", PORT_HI << 16 | PORT_LO)
  s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v)
  s.bind(("127.0.0.1", 0))
  s.getsockname()
  # Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511),
  # if there is a free port. EADDRINUSE otherwise.

[1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116Reviewed-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

91d0b78c

net: Kconfig: fix spellos · 6a7a2c18

Randy Dunlap authored Jan 24, 2023

Fix spelling in net/ Kconfig files.
(reported by codespell)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: coreteam@netfilter.org
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Link: https://lore.kernel.org/r/20230124181724.18166-1-rdunlap@infradead.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6a7a2c18

25 Jan, 2023 17 commits

net: ethtool: fix NULL pointer dereference in pause_prepare_data() · f5be9caf

Vladimir Oltean authored Jan 24, 2023

In the following call path:

ethnl_default_dumpit
-> ethnl_default_dump_one
   -> ctx->ops->prepare_data
      -> pause_prepare_data

struct genl_info *info will be passed as NULL, and pause_prepare_data()
dereferences it while getting the extended ack pointer.

To avoid that, just set the extack to NULL if "info" is NULL, since the
netlink extack handling messages know how to deal with that.

The pattern "info ? info->extack : NULL" is present in quite a few other
"prepare_data" implementations, so it's clear that it's a more general
problem to be dealt with at a higher level, but the code should have at
least adhered to the current conventions to avoid the NULL dereference.

Fixes: 04692c90 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reported-by: syzbot+9d44aae2720fc40b8474@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>

f5be9caf

net: ethtool: fix NULL pointer dereference in stats_prepare_data() · c96de136

Vladimir Oltean authored Jan 24, 2023

In the following call path:

ethnl_default_dumpit
-> ethnl_default_dump_one
   -> ctx->ops->prepare_data
      -> stats_prepare_data

struct genl_info *info will be passed as NULL, and stats_prepare_data()
dereferences it while getting the extended ack pointer.

To avoid that, just set the extack to NULL if "info" is NULL, since the
netlink extack handling messages know how to deal with that.

The pattern "info ? info->extack : NULL" is present in quite a few other
"prepare_data" implementations, so it's clear that it's a more general
problem to be dealt with at a higher level, but the code should have at
least adhered to the current conventions to avoid the NULL dereference.

Fixes: 04692c90 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c96de136

Merge branch 's390-ism-generalized-interface' · 99db6fb0

David S. Miller authored Jan 25, 2023

Jan Karcher says:

====================
drivers/s390/net/ism: Add generalized interface

Previously, there was no clean separation between SMC-D code and the ISM
device driver.This patch series addresses the situation to make ISM available
for uses outside of SMC-D.
In detail: SMC-D offers an interface via struct smcd_ops, which only the
ISM module implements so far. However, there is no real separation between
the smcd and ism modules, which starts right with the ISM device
initialization, which calls directly into the SMC-D code.
This patch series introduces a new API in the ISM module, which allows
registration of arbitrary clients via include/linux/ism.h: struct ism_client.
Furthermore, it introduces a "pure" struct ism_dev (i.e. getting rid of
dependencies on SMC-D in the device structure), and adds a number of API
calls for data transfers via ISM (see ism_register_dmb() & friends).
Still, the ISM module implements the SMC-D API, and therefore has a number
of internal helper functions for that matter.
Note that the ISM API is consciously kept thin for now (as compared to the
SMC-D API calls), as a number of API calls are only used with SMC-D and
hardly have any meaningful usage beyond SMC-D, e.g. the VLAN-related calls.

v1 -> v2:
  Removed s390x dependency which broke config for other archs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

99db6fb0

net/smc: De-tangle ism and smc device initialization · 8c81ba20

Stefan Raspl authored Jan 23, 2023

The struct device for ISM devices was part of struct smcd_dev. Move to
struct ism_dev, provide a new API call in struct smcd_ops, and convert
existing SMCD code accordingly.
Furthermore, remove struct smcd_dev from struct ism_dev.
This is the final part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c81ba20

s390/ism: Consolidate SMC-D-related code · 820f2100

Stefan Raspl authored Jan 23, 2023

The ism module had SMC-D-specific code sprinkled across the entire module.
We are now consolidating the SMC-D-specific parts into the latter parts
of the module, so it becomes more clear what code is intended for use with
ISM, and which parts are glue code for usage in the context of SMC-D.
This is the fourth part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

820f2100

net/smc: Separate SMC-D and ISM APIs · 9de4df7b

Stefan Raspl authored Jan 23, 2023

We separate the code implementing the struct smcd_ops API in the ISM
device driver from the functions that may be used by other exploiters of
ISM devices.
Note: We start out small, and don't offer the whole breadth of the ISM
device for public use, as many functions are specific to or likely only
ever used in the context of SMC-D.
This is the third part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9de4df7b

net/smc: Register SMC-D as ISM client · 8747716f

Stefan Raspl authored Jan 23, 2023

Register the smc module with the new ism device driver API.
This is the second part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8747716f

net/ism: Add new API for client registration · 89e7d2ba

Stefan Raspl authored Jan 23, 2023

Add a new API that allows other drivers to concurrently access ISM devices.
To do so, we introduce a new API that allows other modules to register for
ISM device usage. Furthermore, we move the GID to struct ism, where it
belongs conceptually, and rename and relocate struct smcd_event to struct
ism_event.
This is the first part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

89e7d2ba

s390/ism: Introduce struct ism_dmb · 1baedb13

Stefan Raspl authored Jan 23, 2023

Conceptually, a DMB is a structure that belongs to ISM devices. However,
SMC currently 'owns' this structure. So future exploiters of ISM devices
would be forced to include SMC headers to work - which is just weird.
Therefore, we switch ISM to struct ism_dmb, introduce a new public header
with the definition (will be populated with further API calls later on),
and, add a thin wrapper to please SMC. Since structs smcd_dmb and ism_dmb
are identical, we can simply convert between the two for now.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1baedb13

net/ism: Add missing calls to disable bus-mastering · 462502ff

Stefan Raspl authored Jan 23, 2023

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

462502ff

net/smc: Terminate connections prior to device removal · c40bff41

Stefan Raspl authored Jan 23, 2023

Removing an ISM device prior to terminating its associated connections
doesn't end well.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c40bff41

virtio-net: Reduce debug name field size to 16 bytes · d0671115

Parav Pandit authored Jan 23, 2023

virtio queue index can be maximum of 65535. 16 bytes are enough to store
the vq name with the existing string prefix.

With this change, send queue struct saves 24 bytes and receive
queue saves whole cache line worth 64 bytes per structure
due to saving in alignment bytes.

Pahole results before:

pahole -s drivers/net/virtio_net.o | \
    grep -e "send_queue" -e "receive_queue"
send_queue      1112    0
receive_queue   1280    1

Pahole results after:
pahole -s drivers/net/virtio_net.o | \
    grep -e "send_queue" -e "receive_queue"
send_queue      1088    0
receive_queue   1216    1
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0671115

devlink: remove a dubious assumption in fmsg dumping · 4373a023

Jakub Kicinski authored Jan 23, 2023

Build bot detects that err may be returned uninitialized in
devlink_fmsg_prepare_skb(). This is not really true because
all fmsgs users should create at least one outer nest, and
therefore fmsg can't be completely empty.

That said the assumption is not trivial to confirm, so let's
follow the bots advice, anyway.

This code does not seem to have changed since its inception in
commit 1db64e87 ("devlink: Add devlink formatted message (fmsg) API")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230124035231.787381-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4373a023

net: mscc: ocelot: fix incorrect verify_enabled reporting in ethtool get_mm() · 28113cfa

Vladimir Oltean authored Jan 23, 2023

We don't read the verify_enabled variable from hardware in the MAC Merge
layer state GET operation, instead we always leave it set to "false".
The user may think something is wrong if they set verify_enabled to
true, then read it back and see it's still false, even though the
configuration took place.

Fixes: 6505b680 ("net: mscc: ocelot: add MAC Merge layer support for VSC9959")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230123184538.3420098-1-vladimir.oltean@nxp.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

28113cfa

nfp: flower: change get/set_eeprom logic and enable for flower reps · 74b4f173

James Hershaw authored Jan 23, 2023

The changes in this patch are as follows:

- Alter the logic of get/set_eeprom functions to use the helper function
nfp_app_from_netdev() which handles differentiating between an nfp_net
and a nfp_repr. This allows us to get an agnostic backpointer to the
pdev.

- Enable the various eeprom commands by adding the 'get_eeprom_len',
'get_eeprom', 'set_eeprom' callbacks to the nfp_port_ethtool_ops struct.
This allows the eeprom commands to work on representor interfaces,
similar to a previous patch which added it to the vnics.
Currently these are being used to configure persistent MAC addresses for
the physical ports on the nfp.
Signed-off-by: James Hershaw <james.hershaw@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230123134135.293278-1-simon.horman@corigine.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

74b4f173

ipv6: Make ip6_route_output_flags_noref() static. · 90317bcd

Guillaume Nault authored Jan 23, 2023

This function is only used in net/ipv6/route.c and has no reason to be
visible outside of it.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/50706db7f675e40b3594d62011d9363dce32b92e.1674495822.git.gnault@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

90317bcd

netlink: fix spelling mistake in dump size assert · ec8f7d49

Jakub Kicinski authored Jan 23, 2023

Commit 2c7bc10d ("netlink: add macro for checking dump ctx size")
misspelled the name of the assert as asset, missing an R.
Reported-by: Ido Schimmel <idosch@idosch.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230123222224.732338-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ec8f7d49

24 Jan, 2023 1 commit

Merge branch 'netlink-protocol-specs' · c554520f

Paolo Abeni authored Jan 24, 2023

Jakub Kicinski says:

====================
Netlink protocol specs

I think the Netlink proto specs are far along enough to merge.
Filling in all attribute types and quirks will be an ongoing
effort but we have enough to cover FOU so it's somewhat complete.

I fully intend to continue polishing the code but at the same
time I'd like to start helping others base their work on the
specs (e.g. DPLL) and need to start working on some new families
myself.

That's the progress / motivation for merging. The RFC [1] has more
of a high level blurb, plus I created a lot of documentation, I'm
not going to repeat it here. There was also the talk at LPC [2].

[1] https://lore.kernel.org/all/20220811022304.583300-1-kuba@kernel.org/
[2] https://youtu.be/9QkXIQXkaQk?t=2562
v2: https://lore.kernel.org/all/20220930023418.1346263-1-kuba@kernel.org/
v3: https://lore.kernel.org/all/20230119003613.111778-1-kuba@kernel.org/1

v4:
 - spec improvements (patch 2)
 - Python cleanup (patch 3)
 - rename auto-gen files and use the right comment style
====================

Link: https://lore.kernel.org/r/20230120175041.342573-1-kuba@kernel.orgSigned-off-by: Paolo Abeni <pabeni@redhat.com>

c554520f