Commits · d1dab4f71d372e00e2d34a9c32bf261623e3a95c · Kirill Smelkov / linux

12 Jun, 2024 22 commits

net: add and use __skb_get_hash_symmetric_net · d1dab4f7

Florian Westphal authored Jun 09, 2024

Similar to previous patch: apply same logic for
__skb_get_hash_symmetric and let callers pass the netns to the dissector
core.

Existing function is turned into a wrapper to avoid adjusting all
callers, nft_hash.c uses new function.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240608221057.16070-3-fw@strlen.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d1dab4f7

net: add and use skb_get_hash_net · b975d3ee

Florian Westphal authored Jun 09, 2024

Years ago flow dissector gained ability to delegate flow dissection
to a bpf program, scoped per netns.

Unfortunately, skb_get_hash() only gets an sk_buff argument instead
of both net+skb.  This means the flow dissector needs to obtain the
netns pointer from somewhere else.

The netns is derived from skb->dev, and if that is not available, from
skb->sk.  If neither is set, we hit a (benign) WARN_ON_ONCE().

Trying both dev and sk covers most cases, but not all, as recently
reported by Christoph Paasch.

In case of nf-generated tcp reset, both sk and dev are NULL:

WARNING: .. net/core/flow_dissector.c:1104
 skb_flow_dissect_flow_keys include/linux/skbuff.h:1536 [inline]
 skb_get_hash include/linux/skbuff.h:1578 [inline]
 nft_trace_init+0x7d/0x120 net/netfilter/nf_tables_trace.c:320
 nft_do_chain+0xb26/0xb90 net/netfilter/nf_tables_core.c:268
 nft_do_chain_ipv4+0x7a/0xa0 net/netfilter/nft_chain_filter.c:23
 nf_hook_slow+0x57/0x160 net/netfilter/core.c:626
 __ip_local_out+0x21d/0x260 net/ipv4/ip_output.c:118
 ip_local_out+0x26/0x1e0 net/ipv4/ip_output.c:127
 nf_send_reset+0x58c/0x700 net/ipv4/netfilter/nf_reject_ipv4.c:308
 nft_reject_ipv4_eval+0x53/0x90 net/ipv4/netfilter/nft_reject_ipv4.c:30
 [..]

syzkaller did something like this:
table inet filter {
  chain input {
    type filter hook input priority filter; policy accept;
    meta nftrace set 1
    tcp dport 42 reject with tcp reset
   }
   chain output {
    type filter hook output priority filter; policy accept;
    # empty chain is enough
   }
}

... then sends a tcp packet to port 42.

Initial attempt to simply set skb->dev from nf_reject_ipv4 doesn't cover
all cases: skbs generated via ipv4 igmp_send_report trigger similar splat.

Moreover, Pablo Neira found that nft_hash.c uses __skb_get_hash_symmetric()
which would trigger same warn splat for such skbs.

Lets allow callers to pass the current netns explicitly.
The nf_trace infrastructure is adjusted to use the new helper.

__skb_get_hash_symmetric is handled in the next patch.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240608221057.16070-2-fw@strlen.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b975d3ee

Merge branch 'tcp-ao-md5-racepoits' · 91579c93

David S. Miller authored Jun 12, 2024

Dmitry Safonov says:

====================
net/tcp: TCP-AO and TCP-MD5 tracepoints

Changes in v2:
- Fix the build with CONFIG_IPV6=m (Eric Dumazet)
- Move unused keyid/rnext/maclen later in the series to the patch
  that uses them (Simon Horman)
- Reworked tcp_ao selftest lib to allow async tracing non-tcp events
  (was working on a stress-test that needs trace_kfree_skb() event,
   not in this series).
- Separated selftest changes from kernel, as they now have a couple
  of unrelated to tracepoints changes
- Wrote a few lines of Documentation/
- Link to v1: https://lore.kernel.org/r/20240224-tcp-ao-tracepoints-v1-0-15f31b7f30a7@arista.com
====================
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91579c93

Documentation/tcp-ao: Add a few lines on tracepoints · efe46fb1

Dmitry Safonov authored Jun 07, 2024

Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

efe46fb1

net/tcp: Remove tcp_hash_fail() · 78b1b27d

Dmitry Safonov authored Jun 07, 2024

Now there are tracepoints, that cover all functionality of
tcp_hash_fail(), but also wire up missing places
They are also faster, can be disabled and provide filtering.

This potentially may create a regression if a userspace depends on dmesg
logs. Fingers crossed, let's see if anyone complains in reality.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

78b1b27d

net/tcp: Add tcp-md5 and tcp-ao tracepoints · 96be3dcd

Dmitry Safonov authored Jun 07, 2024

Instead of forcing userspace to parse dmesg (that's what currently is
happening, at least in codebase of my current company), provide a better
way, that can be enabled/disabled in runtime.

Currently, there are already tcp events, add hashing related ones there,
too. Rasdaemon currently exercises net_dev_xmit_timeout,
devlink_health_report, but it'll be trivial to teach it to deal with
failed hashes. Otherwise, BGP may trace/log them itself. Especially
exciting for possible investigations is key rotation (RNext_key
requests).
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96be3dcd

net/tcp: Move tcp_inbound_hash() from headers · 811efc06

Dmitry Safonov authored Jun 07, 2024

Two reasons:
1. It's grown up enough
2. In order to not do header spaghetti by including
   <trace/events/tcp.h>, which is necessary for TCP tracepoints.

While at it, unexport and make static tcp_inbound_ao_hash().
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

811efc06

net/tcp: Add a helper tcp_ao_hdr_maclen() · 72863087

Dmitry Safonov authored Jun 07, 2024

It's going to be used more in TCP-AO tracepoints.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72863087

net/tcp: Use static_branch_tcp_{md5,ao} to drop ifdefs · 3966a668

Dmitry Safonov authored Jun 07, 2024

It's possible to clean-up some ifdefs by hiding that
tcp_{md5,ao}_needed static branch is defined and compiled only
under related configs, since commit 4c8530dc ("net/tcp: Only produce
AO/MD5 logs if there are any keys").
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3966a668

Merge branch 'selftests-mptcp-use-net-lib-sh-to-manage-netns' · d4cea2ca

Jakub Kicinski authored Jun 11, 2024

Matthieu Baerts says:

====================
selftests: mptcp: use net/lib.sh to manage netns

The goal of this series is to use helpers from net/lib.sh with MPTCP
selftests.

- Patches 1 to 4 are some clean-ups and preparation in net/lib.sh:

  - Patch 1 simplifies the code handling errexit by ignoring possible
    errors instead of disabling errexit temporary.

  - Patch 2 removes the netns from the list after having cleaned it, not
    to try to clean it twice.

  - Patch 3 removes the 'readonly' attribute for the netns variable, to
    allow using the same name in local variables.

  - Patch 4 removes the local 'ns' var, not to conflict with the global
    one it needs to setup.

- Patch 5 uses helpers from net/lib.sh to create and delete netns in
  MPTCP selftests.

- Patch 6 uses wait_local_port_listen helper from net/net_helper.sh.
====================

Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-0-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d4cea2ca

selftests: mptcp: lib: use wait_local_port_listen helper · 1af3bc91

Geliang Tang authored Jun 07, 2024

This patch includes net_helper.sh into mptcp_lib.sh, uses the helper
wait_local_port_listen() defined in it to implement the similar mptcp
helper. This can drop some duplicate code.

It looks like this helper from net_helper.sh was originally coming from
MPTCP, but MPTCP selftests have not been updated to use it from this
shared place.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-6-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

1af3bc91

selftests: mptcp: lib: use setup/cleanup_ns helpers · f265d311

Geliang Tang authored Jun 07, 2024

This patch includes lib.sh into mptcp_lib.sh, uses setup_ns helper
defined in lib.sh to set up namespaces in mptcp_lib_ns_init(), and
uses cleanup_ns to delete namespaces in mptcp_lib_ns_exit().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-5-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f265d311

selftests: net: lib: remove 'ns' var in setup_ns · f8a2d2f8

Geliang Tang authored Jun 07, 2024

The helper setup_ns() doesn't work when a net namespace named "ns" is
passed to it.

For example, in net/mptcp/diag.sh, the name of the namespace is "ns". If
"setup_ns ns" is used in it, diag.sh fails with errors:

Invalid netns name "./mptcp_connect"
Cannot open network namespace "10000": No such file or directory
Cannot open network namespace "10000": No such file or directory

That is because "ns" is also a local variable in setup_ns, and it will
not set the value for the global variable that has been giving in
argument. To solve this, we could rename the variable, but it sounds
better to drop it, as we can resolve the name using the variable passed
in argument instead.

The other local variables -- "ns_list" and "ns_name" -- are more
unlikely to conflict with existing global variables. They don't seem to
be currently used in any other net selftests.
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-4-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f8a2d2f8

selftests: net: lib: do not set ns var as readonly · 577db6bd

Matthieu Baerts (NGI0) authored Jun 07, 2024

It sounds good to mark the global netns variable as 'readonly', but Bash
doesn't allow the creation of local variables with the same name.

Because it looks like 'readonly' is mainly used here to check if a netns
with that name has already been set, it sounds fine to check if a
variable with this name has already been set instead. By doing that, we
avoid having to modify helpers from MPTCP selftests using the same
variable name as the one used to store the created netns name.

While at it, also avoid an unnecessary call to 'eval' to set a local
variable.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-3-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

577db6bd

selftests: net: lib: remove ns from list after clean-up · 92fe5670

Matthieu Baerts (NGI0) authored Jun 07, 2024

Instead of only appending items to the list, removing them when the
netns has been deleted.

By doing that, we can make sure 'cleanup_all_ns()' is not trying to
remove already deleted netns.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-2-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

92fe5670

selftests: net: lib: ignore possible errors · 7e0620bc

Matthieu Baerts (NGI0) authored Jun 07, 2024

No need to disable errexit temporary, simply ignore the only possible
and not handled error.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-1-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7e0620bc

Merge branch 'net-core-unify-dstats-with-tstats-and-lstats-implement-generic-dstats-collection' · bfc65070

Jakub Kicinski authored Jun 11, 2024

Jeremy Kerr says:

====================
net: core: Unify dstats with tstats and lstats, implement generic dstats collection

The struct pcpu_dstats ("dstats") has a few variations from the other
two stats types (struct pcpu_sw_netstats and struct pcpu_lstats), and
doesn't have generic helpers for collecting the per-cpu stats into a
struct rtnl_link_stats64.

This change unifies dstats with the other types, adds a collection
implementation to the core, and updates the single driver (vrf) to use
this generic implementation.

Of course, questions/comments/etc are most welcome!

v3: https://lore.kernel.org/r/20240605-dstats-v2-0-7fae03f813f3@codeconstruct.com.au
v2: v1: https://lore.kernel.org/r/20240605-dstats-v1-0-1024396e1670@codeconstruct.com.au
====================

Link: https://lore.kernel.org/r/20240607-dstats-v3-0-cc781fe116f7@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bfc65070

net: vrf: move to generic dstat helpers · 2202576d

Jeremy Kerr authored Jun 07, 2024

The vrf driver has its own dstats-to-rtnl_link_stats64 collection, but
we now have a generic implementation for dstats collection, so switch to
this.

In doing so, we fix a minor issue where the (non-percpu)
dev->stats->tx_errors value was never collected into rtnl_link_stats64,
as the generic dev_get_dstats64() consumes the starting values from
dev->stats.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607-dstats-v3-3-cc781fe116f7@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2202576d

net: core: Implement dstats-type stats collections · 94b601bc

Jeremy Kerr authored Jun 07, 2024

We currently have dev_get_tstats64() for collecting per-cpu stats of
type pcpu_sw_netstats ("tstats"). However, tstats doesn't allow for
accounting tx/rx drops. We do have a stats variant that does have stats
for dropped packets: struct pcpu_dstats, but there are no core helpers
for using those stats.

The VRF driver uses dstats, by providing its own collation/fetch
functions to do so.

This change adds a common implementation for dstats-type collection,
used when pcpu_stat_type == NETDEV_PCPU_STAT_DSTAT. This is based on the
VRF driver's existing stats collator (plus the unused tx_drops stat from
there). We will switch the VRF driver to use this in the next change.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607-dstats-v3-2-cc781fe116f7@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>

94b601bc

net: core,vrf: Change pcpu_dstat fields to u64_stats_t · fa59dc2f

Jeremy Kerr authored Jun 07, 2024

The pcpu_sw_netstats and pcpu_lstats structs both contain a set of
u64_stats_t fields for individual stats, but pcpu_dstats uses u64s
instead.

Make this consistent by using u64_stats_t across all stats types.

The per-cpu dstats are only used by the vrf driver at present, so update
that driver as part of this change.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607-dstats-v3-1-cc781fe116f7@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fa59dc2f

ip_tunnel: Move stats allocation to core · 45403b12

Breno Leitao authored Jun 07, 2024

With commit 34d21de9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core instead
of this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Move ip_tunnel driver to leverage the core allocation.

All the ip_tunnel_init() users call ip_tunnel_init() as part of their
.ndo_init callback. The .ndo_init callback is called before the stats
allocation in netdev_register(), thus, the allocation will happen before
the netdev is visible.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240607084420.3932875-1-leitao@debian.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

45403b12

net: dsa: Fix typo in NET_DSA_TAG_RTL4_A Kconfig · 983e44f0

Chris Packham authored Jun 07, 2024

Fix a minor typo in the help text for the NET_DSA_TAG_RTL4_A config
option.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Link: https://lore.kernel.org/r/20240607020843.1380735-1-chris.packham@alliedtelesis.co.nzSigned-off-by: Jakub Kicinski <kuba@kernel.org>

983e44f0

11 Jun, 2024 7 commits

Merge branch 'intel-wired-lan-driver-updates-2024-06-03' · bb678f01

Jakub Kicinski authored Jun 10, 2024

Jacob Keller says:

====================
Intel Wired LAN Driver Updates 2024-06-03

This series includes miscellaneous improvements for the ice as well as a
cleanup to the Makefiles for all Intel net drivers.

Andy fixes all of the Intel net driver Makefiles to use the documented
'*-y' syntax for specifying object files to link into kernel driver
modules, rather than the '*-objs' syntax which works but is documented as
reserved for user-space host programs.

Jacob has a cleanup to refactor rounding logic in the ice driver into a
common roundup_u64 helper function.

Michal Schmidt replaces irq_set_affinity_hint() to use
irq_update_affinity_hint() which behaves better with user-applied affinity
settings.

v2: https://lore.kernel.org/r/20240605-next-2024-06-03-intel-next-batch-v2-0-39c23963fa78@intel.com
v1: https://lore.kernel.org/r/20240603-next-2024-06-03-intel-next-batch-v1-0-e0523b28f325@intel.com
====================

Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-0-d1470cee3347@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bb678f01

ice: use irq_update_affinity_hint() · dee55767

Michal Schmidt authored Jun 07, 2024

irq_set_affinity_hint() is deprecated. Use irq_update_affinity_hint()
instead. This removes the side-effect of actually applying the affinity.

The driver does not really need to worry about spreading its IRQs across
CPUs. The core code already takes care of that.
On the contrary, when the driver applies affinities by itself, it breaks
the users' expectations:
 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in
    order to prevent IRQs from being moved to certain CPUs that run a
    real-time workload.
 2. ice reconfigures VSIs at runtime due to a MIB change
    (ice_dcb_process_lldp_set_mib_change). Reopening a VSI resets the
    affinity in ice_vsi_req_irq_msix().
 3. ice has no idea about irqbalance's config, so it may move an IRQ to
    a banned CPU. The real-time workload suffers unacceptable latency.

I am not sure if updating the affinity hints is at all useful, because
irqbalance ignores them since 2016 ([1]), but at least it's harmless.

This ice change is similar to i40e commit d34c54d1 ("i40e: Use
irq_update_affinity_hint()").

[1] https://github.com/Irqbalance/irqbalance/commit/dcc411e7bfddSigned-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-3-d1470cee3347@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dee55767

ice: add and use roundup_u64 instead of open coding equivalent · 1d4ce389

Jacob Keller authored Jun 07, 2024

In ice_ptp_cfg_clkout(), the ice driver needs to calculate the nearest next
second of a current time value specified in nanoseconds. It implements this
using div64_u64, because the time value is a u64. It could use div_u64
since NSEC_PER_SEC is smaller than 32-bits.

Ideally this would be implemented directly with roundup(), but that can't
work on all platforms due to a division which requires using the specific
macros and functions due to platform restrictions, and to ensure that the
most appropriate and fast instructions are used.

The kernel doesn't currently provide any 64-bit equivalents for doing
roundup. Attempting to use roundup() on a 32-bit platform will result in a
link failure due to not having a direct 64-bit division.

The closest equivalent for this is DIV64_U64_ROUND_UP, which does a
division always rounding up. However, this only computes the division, and
forces use of the div64_u64 in cases where the divisor is a 32bit value and
could make use of div_u64.

Introduce DIV_U64_ROUND_UP based on div_u64, and then use it to implement
roundup_u64 which takes a u64 input value and a u32 rounding value.

The name roundup_u64 matches the naming scheme of div_u64, and future
patches could implement roundup64_u64 if they need to round by a multiple
that is greater than 32-bits.

Replace the logic in ice_ptp.c which does this equivalent with the newly
added roundup_u64.
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-2-d1470cee3347@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

1d4ce389

net: intel: Use *-y instead of *-objs in Makefile · a2fe35df

Andy Shevchenko authored Jun 07, 2024

*-objs suffix is reserved rather for (user-space) host programs while
usually *-y suffix is used for kernel drivers (although *-objs works
for that purpose for now).

Let's correct the old usages of *-objs in Makefiles.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-1-d1470cee3347@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a2fe35df

isdn: add missing MODULE_DESCRIPTION() macros · 2ebb87f4

Jeff Johnson authored Jun 07, 2024

make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcpci.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcmulti.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcsusb.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/avmfritz.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/speedfax.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNinfineon.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/w6692.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/netjet.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNipac.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNisar.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/mISDN_core.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/mISDN_dsp.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/l1oip.o

Add the missing invocations of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240607-md-drivers-isdn-v1-1-81fb7001bc3a@quicinc.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2ebb87f4

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · b1156532

Jakub Kicinski authored Jun 10, 2024

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-06-06

We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 50 files changed, 1887 insertions(+), 527 deletions(-).

The main changes are:

1) Add a user space notification mechanism via epoll when a struct_ops
   object is getting detached/unregistered, from Kui-Feng Lee.

2) Big batch of BPF selftest refactoring for sockmap and BPF congctl
   tests, from Geliang Tang.

3) Add BTF field (type and string fields, right now) iterator support
   to libbpf instead of using existing callback-based approaches,
   from Andrii Nakryiko.

4) Extend BPF selftests for the latter with a new btf_field_iter
   selftest, from Alan Maguire.

5) Add new kfuncs for a generic, open-coded bits iterator,
   from Yafang Shao.

6) Fix BPF selftests' kallsyms_find() helper under kernels configured
   with CONFIG_LTO_CLANG_THIN, from Yonghong Song.

7) Remove a bunch of unused structs in BPF selftests,
   from David Alan Gilbert.

8) Convert test_sockmap section names into names understood by libbpf
   so it can deduce program type and attach type, from Jakub Sitnicki.

9) Extend libbpf with the ability to configure log verbosity
   via LIBBPF_LOG_LEVEL environment variable, from Mykyta Yatsenko.

10) Fix BPF selftests with regards to bpf_cookie and find_vma flakiness
    in nested VMs, from Song Liu.

11) Extend riscv32/64 JITs to introduce shift/add helpers to generate Zba
    optimization, from Xiao Wang.

12) Enable BPF programs to declare arrays and struct fields with kptr,
    bpf_rb_root, and bpf_list_head, from Kui-Feng Lee.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
  selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
  selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
  selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
  selftests/bpf: Add start_test helper in bpf_tcp_ca
  selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
  libbpf: Auto-attach struct_ops BPF maps in BPF skeleton
  selftests/bpf: Add btf_field_iter selftests
  selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT
  libbpf: Remove callback-based type/string BTF field visitor helpers
  bpftool: Use BTF field iterator in btfgen
  libbpf: Make use of BTF field iterator in BTF handling code
  libbpf: Make use of BTF field iterator in BPF linker code
  libbpf: Add BTF field iterator
  selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find()
  selftests/bpf: Fix bpf_cookie and find_vma in nested VM
  selftests/bpf: Test global bpf_list_head arrays.
  selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.
  selftests/bpf: Test kptr arrays and kptrs in nested struct fields.
  bpf: limit the number of levels of a nested struct type.
  bpf: look into the types of the fields of a struct type recursively.
  ...
====================

Link: https://lore.kernel.org/r/20240606223146.23020-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b1156532

Merge tag 'wireless-next-2024-06-07' of... · 93d4e8bb

Jakub Kicinski authored Jun 10, 2024

Merge tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.11

The first "new features" pull request for v6.11 with changes both in
stack and in drivers. Nothing out of ordinary, except that we have
two conflicts this time:

net/mac80211/cfg.c
  https://lore.kernel.org/all/20240531124415.05b25e7a@canb.auug.org.au

drivers/net/wireless/microchip/wilc1000/netdev.c
  https://lore.kernel.org/all/20240603110023.23572803@canb.auug.org.au

Major changes:

cfg80211/mac80211
 * parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers

wilc1000
 * read MAC address during probe to make it visible to user space

iwlwifi
 * bump FW API to 91 for BZ/SC devices
 * report 64-bit radiotap timestamp
 * enable P2P low latency by default
 * handle Transmit Power Envelope (TPE) advertised by AP
 * start using guard()

rtlwifi
 * RTL8192DU support

ath12k
 * remove unsupported tx monitor handling
 * channel 2 in 6 GHz band support
 * Spatial Multiplexing Power Save (SMPS) in 6 GHz band support
 * multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA)
   support
 * dynamic VLAN support
 * add panic handler for resetting the firmware state

ath10k
 * add qcom,no-msa-ready-indicator Device Tree property
 * LED support for various chipsets

* tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (194 commits)
  wifi: ath12k: add hw_link_id in ath12k_pdev
  wifi: ath12k: add panic handler
  wifi: rtw89: chan: Use swap() in rtw89_swap_sub_entity()
  wifi: brcm80211: remove unused structs
  wifi: brcm80211: use sizeof(*pointer) instead of sizeof(type)
  wifi: ath12k: do not process consecutive RDDM event
  dt-bindings: net: wireless: ath11k: Drop "qcom,ipq8074-wcss-pil" from example
  wifi: ath12k: fix memory leak in ath12k_dp_rx_peer_frag_setup()
  wifi: rtlwifi: handle return value of usb init TX/RX
  wifi: rtlwifi: Enable the new rtl8192du driver
  wifi: rtlwifi: Add rtl8192du/sw.c
  wifi: rtlwifi: Constify rtl_hal_cfg.{ops,usb_interface_cfg} and rtl_priv.cfg
  wifi: rtlwifi: Add rtl8192du/dm.{c,h}
  wifi: rtlwifi: Add rtl8192du/fw.{c,h} and rtl8192du/led.{c,h}
  wifi: rtlwifi: Add rtl8192du/rf.{c,h}
  wifi: rtlwifi: Add rtl8192du/trx.{c,h}
  wifi: rtlwifi: Add rtl8192du/phy.{c,h}
  wifi: rtlwifi: Add rtl8192du/hw.{c,h}
  wifi: rtlwifi: Add new members to struct rtl_priv for RTL8192DU
  wifi: rtlwifi: Add rtl8192du/table.{c,h}
  ...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

====================

Link: https://lore.kernel.org/r/20240607093517.41394C2BBFC@smtp.kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

93d4e8bb

10 Jun, 2024 11 commits

Merge branch 'fix-changing-dsa-conduit' · 2ba6d157

David S. Miller authored Jun 10, 2024

Marek Behún says:

====================
Fix changing DSA conduit

This series fixes an issue in the DSA code related to host interface UC
address installed into port FDB and port conduit address database when
live-changing port conduit.

The first patch refactores/deduplicates the installation/uninstallation
of the interface's MAC address and the second patch fixes the issue.

Cover letter for v1 and v2:
  https://patchwork.kernel.org/project/netdevbpf/cover/20240429163627.16031-1-kabel@kernel.org/
  https://patchwork.kernel.org/project/netdevbpf/cover/20240502122922.28139-1-kabel@kernel.org/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2ba6d157

net: dsa: update the unicast MAC address when changing conduit · eef8e906

Marek Behún authored Jun 05, 2024

When changing DSA user interface conduit while the user interface is up,
DSA exhibits different behavior in comparison to when the interface is
down. This different behavior concerns the primary unicast MAC address
stored in the port standalone FDB and in the conduit device UC database.

If we put a switch port down while changing the conduit with
  ip link set sw0p0 down
  ip link set sw0p0 type dsa conduit conduit1
  ip link set sw0p0 up
we delete the address in dsa_user_close() and install the (possibly
different) address in dsa_user_open().

But when changing the conduit on the fly, the old address is not
deleted and the new one is not installed.

Since we explicitly want to support live-changing the conduit, uninstall
the old address before calling dsa_port_assign_conduit() and install the
(possibly different) new address after the call.

Because conduit change might also trigger address change (the user
interface is supposed to inherit the conduit interface MAC address if no
address is defined in hardware (dp->mac is a zero address)), move the
eth_hw_addr_inherit() call from dsa_user_change_conduit() to
dsa_port_change_conduit(), just before installing the new address.

Although this is in theory a flaw in DSA core, it needs not be
backported, since there is currently no DSA driver that can be affected
by this. The only DSA driver that supports changing conduit is felix,
and, as explained by Vladimir Oltean [1]:

  There are 2 reasons why with felix the bug does not manifest itself.

  First is because both the 'ocelot' and the alternate 'ocelot-8021q'
  tagging protocols have the 'promisc_on_conduit = true' flag. So the
  unicast address doesn't have to be in the conduit's RX filter -
  neither the old or the new conduit.

  Second, dsa_user_host_uc_install() theoretically leaves behind host
  FDB entries installed towards the wrong (old) CPU port. But in
  felix_fdb_add(), we treat any FDB entry requested towards any CPU port
  as if it was a multicast FDB entry programmed towards _all_ CPU ports.
  For that reason, it is installed towards the port mask of the PGID_CPU
  port group ID:

	if (dsa_port_is_cpu(dp))
		port = PGID_CPU;

Therefore no Fixes tag for this change.

[1] https://lore.kernel.org/netdev/20240507201827.47suw4fwcjrbungy@skbuf/Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eef8e906

net: dsa: deduplicate code adding / deleting the port address to fdb · 77f75412

Marek Behún authored Jun 05, 2024

The sequence
  if (dsa_switch_supports_uc_filtering(ds))
    dsa_port_standalone_host_fdb_add(dp, addr, 0);
  if (!ether_addr_equal(addr, conduit->dev_addr))
    dev_uc_add(conduit, addr);
is executed both in dsa_user_open() and dsa_user_set_mac_addr().

Its reverse is executed both in dsa_user_close() and
dsa_user_set_mac_addr().

Refactor these sequences into new functions dsa_user_host_uc_install()
and dsa_user_host_uc_uninstall().
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

77f75412

Merge branch 'rtnetlink-rtnl_lock' · 395059c5

David S. Miller authored Jun 10, 2024

Jakub Kicinski says:

====================
rtnetlink: move rtnl_lock handling out of af_netlink

With the changes done in commit 5b4b62a1 ("rtnetlink: make
the "split" NLM_DONE handling generic") we can also move the
rtnl locking out of af_netlink.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

395059c5

net: netlink: remove the cb_mutex "injection" from netlink core · 5fbf57a9

Jakub Kicinski authored Jun 06, 2024

Back in 2007, in commit af65bdfc ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it") netlink core was extended to allow
subsystems to replace the dump mutex lock with its own lock.

The mechanism was used by rtnetlink to take rtnl_lock but it isn't
sufficiently flexible for other users. Over the 17 years since
it was added no other user appeared. Since rtnetlink needs conditional
locking now, and doesn't use it either, axe this feature complete.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5fbf57a9

rtnetlink: move rtnl_lock handling out of af_netlink · 5380d64f

Jakub Kicinski authored Jun 06, 2024

Now that we have an intermediate layer of code for handling
rtnl-level netlink dump quirks, we can move the rtnl_lock
taking there.

For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can
avoid taking rtnl_lock just to generate NLM_DONE, once again.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5380d64f

net: dsa: hellcreek: Replace kernel.h with what is used · c917b26e

Andy Shevchenko authored Jun 06, 2024

kernel.h is included solely for some other existing headers.
Include them directly and get rid of kernel.h.

While at it, sort headers alphabetically for easier maintenance.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c917b26e

Merge branch 'tcp-up-pin-tw-timer' · a9522664

David S. Miller authored Jun 10, 2024

Florian Westphal says:

====================
net: tcp: un-pin tw timer

Changes since previous iteration:
 - Patch 1: update a comment, I copied Erics v7 RvB tag.
 - Patch 2: move bh off/on into hashdance_schedule and get rid of
   comment mentioning pinned tw timer.
   I did not copy Erics RvB tag over from v7 because of the change.
 - Patch 3 is unchanged, so I kept Erics RvB tag.

This is v8 of the series where the tw_timer is un-pinned to get rid of
interferences in isolated CPUs setups.

First patch makes necessary preparations, existing code relies on
TIMER_PINNED to avoid races.

Second patch un-pins the TW timer. Could be folded into the first one,
but it might help wrt. bisection.

Third patch is a minor cleanup to move a helper from .h to the only
remaining compilation unit.

Tested with iperf3 and stress-ng socket mode.
====================
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9522664

tcp: move inet_twsk_schedule helper out of header · f81d0dd2

Florian Westphal authored Jun 06, 2024

Its no longer used outside inet_timewait_sock.c, so move it there.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

f81d0dd2

net: tcp: un-pin the tw_timer · c75ad7c7

Florian Westphal authored Jun 06, 2024

After previous patch, even if timer fires immediately on another CPU,
context that schedules the timer now holds the ehash spinlock, so timer
cannot reap tw socket until ehash lock is released.

BH disable is moved into hashdance_schedule.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c75ad7c7

net: tcp/dccp: prepare for tw_timer un-pinning · b334b924

Valentin Schneider authored Jun 06, 2024

The TCP timewait timer is proving to be problematic for setups where
scheduler CPU isolation is achieved at runtime via cpusets (as opposed to
statically via isolcpus=domains).

What happens there is a CPU goes through tcp_time_wait(), arming the
time_wait timer, then gets isolated. TCP_TIMEWAIT_LEN later, the timer
fires, causing interference for the now-isolated CPU. This is conceptually
similar to the issue described in commit e02b9312 ("workqueue: Unbind
kworkers before sending them to exit()")

Move inet_twsk_schedule() to within inet_twsk_hashdance(), with the ehash
lock held. Expand the lock's critical section from inet_twsk_kill() to
inet_twsk_deschedule_put(), serializing the scheduling vs descheduling of
the timer. IOW, this prevents the following race:

			     tcp_time_wait()
			       inet_twsk_hashdance()
  inet_twsk_deschedule_put()
    del_timer_sync()
			       inet_twsk_schedule()

Thanks to Paolo Abeni for suggesting to leverage the ehash lock.

This also restores a comment from commit ec94c269 ("tcp/dccp: avoid
one atomic operation for timewait hashdance") as inet_twsk_hashdance() had
a "Step 1" and "Step 3" comment, but the "Step 2" had gone missing.

inet_twsk_deschedule_put() now acquires the ehash spinlock to synchronize
with inet_twsk_hashdance_schedule().

To ease possible regression search, actual un-pin is done in next patch.

Link: https://lore.kernel.org/all/ZPhpfMjSiHVjQkTk@localhost.localdomain/Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b334b924