- 13 Jun, 2024 2 commits
-
-
Asbjørn Sloth Tønnesen authored
This patch adds two new helper functions: flow_rule_is_supp_enc_control_flags() flow_rule_has_enc_control_flags() They are intended to be used for validating encapsulation control flags, and compliment the similar helpers without "enc_" in the name. The only difference is that they have their own error message, to make it obvious if an unsupported flag error is related to FLOW_DISSECTOR_KEY_CONTROL or FLOW_DISSECTOR_KEY_ENC_CONTROL. flow_rule_has_enc_control_flags() is for drivers supporting FLOW_DISSECTOR_KEY_ENC_CONTROL, but not supporting any encapsulation control flags. (Currently all 4 drivers fits this category) flow_rule_is_supp_enc_control_flags() is currently only used for the above helper, but should also be used by drivers once they implement at least one encapsulation control flag. There is AFAICT currently no need for an "enc_" variant of flow_rule_match_has_control_flags(), as all drivers currently supporting FLOW_DISSECTOR_KEY_ENC_CONTROL, are already calling flow_rule_match_enc_control() directly. Only compile tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/20240609173358.193178-2-ast@fiberby.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Elad Yifee authored
Add the missing pieces to allow multiple PPEs units, one for each GMAC. mtk_gdm_config has been modified to work on targted mac ID, the inner loop moved outside of the function to allow unrelated operations like setting the MAC's PPE index. Introduce a sanity check in flow_offload_replace to account for non-MTK ingress devices. Additional field 'ppe_idx' was added to struct mtk_mac in order to keep track on the assigned PPE unit. Signed-off-by: Elad Yifee <eladwf@gmail.com> Link: https://lore.kernel.org/r/20240607082155.20021-1-eladwf@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 12 Jun, 2024 31 commits
-
-
Jakub Kicinski authored
Petr Machata says: ==================== Allow configuration of multipath hash seed Let me just quote the commit message of patch #2 here to inform the motivation and some of the implementation: When calculating hashes for the purpose of multipath forwarding, both IPv4 and IPv6 code currently fall back on flow_hash_from_keys(). That uses a randomly-generated seed. That's a fine choice by default, but unfortunately some deployments may need a tighter control over the seed used. In this patchset, make the seed configurable by adding a new sysctl key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used specifically for multipath forwarding and not for the other concerns that flow_hash_from_keys() is used for, such as queue selection. Expose the knob as sysctl because other such settings, such as headers to hash, are also handled that way. Despite being placed in the net.ipv4 namespace, the multipath seed sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of TCP variables. Like those, the multipath hash seed is a per-netns variable. The seed used by flow_hash_from_keys() is a 128-bit quantity. However it seems that usually the seed is a much more modest value. 32 bits seem typical (Cisco, Cumulus), some systems go even lower. For that reason, and to decouple the user interface from implementation details, go with a 32-bit quantity, which is then quadruplicated to form the siphash key. One example of use of this interface is avoiding hash polarization, where two ECMP routers, one behind the other, happen to make consistent hashing decisions, and as a result, part of the ECMP space of the latter router is never used. Another is a load balancer where several machines forward traffic to one of a number of leaves, and the forwarding decisions need to be made consistently. (This is a case of a desired hash polarization, mentioned e.g. in chapter 6.3 of [0].) There has already been a proposal to include a hash seed control interface in the past[1]. - Patches #1-#2 contain the substance of the work - Patch #3 is an mlxsw offload - Patches #4 and #5 are a selftest [0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf [1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/ ==================== Link: https://lore.kernel.org/r/20240607151357.421181-1-petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
Add a selftest that exercises the sysctl added in the previous patches. Test that set/get works as expected; that across seeds we eventually hit all NHs (test_mpath_seed_*); and that a given seed keeps hitting the same NHs even across seed changes (test_mpath_seed_stability_*). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-6-petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
In order to be able to save the current value of a sysctl without changing it, split the relevant bit out of sysctl_set() into a new helper. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-5-petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
When Spectrum machines compute hash for the purposes of ECMP routing, they use a seed specified through RECR_v2 (Router ECMP Configuration Register). Up until now mlxsw computed the seed by hashing the machine's base MAC. Now that we can optionally have a user-provided seed, use that if possible. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-4-petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
When calculating hashes for the purpose of multipath forwarding, both IPv4 and IPv6 code currently fall back on flow_hash_from_keys(). That uses a randomly-generated seed. That's a fine choice by default, but unfortunately some deployments may need a tighter control over the seed used. In this patch, make the seed configurable by adding a new sysctl key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used specifically for multipath forwarding and not for the other concerns that flow_hash_from_keys() is used for, such as queue selection. Expose the knob as sysctl because other such settings, such as headers to hash, are also handled that way. Like those, the multipath hash seed is a per-netns variable. Despite being placed in the net.ipv4 namespace, the multipath seed sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of TCP variables. The seed used by flow_hash_from_keys() is a 128-bit quantity. However it seems that usually the seed is a much more modest value. 32 bits seem typical (Cisco, Cumulus), some systems go even lower. For that reason, and to decouple the user interface from implementation details, go with a 32-bit quantity, which is then quadruplicated to form the siphash key. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-3-petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Petr Machata authored
The following patches will add a sysctl to control multipath hash seed. In order to centralize the hash computation, add a helper, fib_multipath_hash_from_keys(), and have all IPv4 and IPv6 route.c invocations of flow_hash_from_keys() go through this helper instead. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-2-petrm@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Sean Anderson authored
This error message can be triggered by userspace. Use NL_SET_ERR_MSG so the message is returned to the user and to avoid polluting the kernel logs. Additionally, change the return value from EFAULT to EBUSY to better reflect the error (which has nothing to do with addressing). Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://lore.kernel.org/r/20240611154116.2643662-1-sean.anderson@linux.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geert Uytterhoeven authored
If CONFIG_PAGE_POOL is not enabled: aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected! aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected! aarch64-linux-gnu-ld: drivers/net/ethernet/renesas/ravb_main.o: in function `ravb_rx_ring_refill': ravb_main.c:(.text+0x8d8): undefined reference to `page_pool_alloc_pages' aarch64-linux-gnu-ld: ravb_main.c:(.text+0x944): undefined reference to `page_pool_alloc_frag' aarch64-linux-gnu-ld: drivers/net/ethernet/renesas/ravb_main.o: in function `ravb_ring_init': ravb_main.c:(.text+0x1d4c): undefined reference to `page_pool_create' Fixes: 96672632 ("net: ravb: Allocate RX buffers via page pool") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/fa61b464ae1aa7630e9024f091991937941d49f1.1718113630.git.geert+renesas@glider.beSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Florian Westphal says: ==================== net: flow dissector: allow explicit passing of netns Change since last version: fix kdoc comment warning reported by kbuild robot, no other changes, thus retaining RvB tags from Eric and Willem. v1: https://lore.kernel.org/netdev/20240607083205.3000-1-fw@strlen.de/ Years ago flow dissector gained ability to delegate flow dissection to a bpf program, scoped per netns. The netns is derived from skb->dev, and if that is not available, from skb->sk. If neither is set, we hit a (benign) WARN_ON_ONCE(). This WARN_ON_ONCE can be triggered from netfilter. Known skb origins are nf_send_reset and ipv4 stack generated IGMP messages. Lets allow callers to pass the current netns explicitly and make nf_tables use those instead. This targets net-next instead of net because the WARN is benign and this is not a regression. ==================== Link: https://lore.kernel.org/r/20240608221057.16070-1-fw@strlen.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
Similar to previous patch: apply same logic for __skb_get_hash_symmetric and let callers pass the netns to the dissector core. Existing function is turned into a wrapper to avoid adjusting all callers, nft_hash.c uses new function. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240608221057.16070-3-fw@strlen.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Westphal authored
Years ago flow dissector gained ability to delegate flow dissection to a bpf program, scoped per netns. Unfortunately, skb_get_hash() only gets an sk_buff argument instead of both net+skb. This means the flow dissector needs to obtain the netns pointer from somewhere else. The netns is derived from skb->dev, and if that is not available, from skb->sk. If neither is set, we hit a (benign) WARN_ON_ONCE(). Trying both dev and sk covers most cases, but not all, as recently reported by Christoph Paasch. In case of nf-generated tcp reset, both sk and dev are NULL: WARNING: .. net/core/flow_dissector.c:1104 skb_flow_dissect_flow_keys include/linux/skbuff.h:1536 [inline] skb_get_hash include/linux/skbuff.h:1578 [inline] nft_trace_init+0x7d/0x120 net/netfilter/nf_tables_trace.c:320 nft_do_chain+0xb26/0xb90 net/netfilter/nf_tables_core.c:268 nft_do_chain_ipv4+0x7a/0xa0 net/netfilter/nft_chain_filter.c:23 nf_hook_slow+0x57/0x160 net/netfilter/core.c:626 __ip_local_out+0x21d/0x260 net/ipv4/ip_output.c:118 ip_local_out+0x26/0x1e0 net/ipv4/ip_output.c:127 nf_send_reset+0x58c/0x700 net/ipv4/netfilter/nf_reject_ipv4.c:308 nft_reject_ipv4_eval+0x53/0x90 net/ipv4/netfilter/nft_reject_ipv4.c:30 [..] syzkaller did something like this: table inet filter { chain input { type filter hook input priority filter; policy accept; meta nftrace set 1 tcp dport 42 reject with tcp reset } chain output { type filter hook output priority filter; policy accept; # empty chain is enough } } ... then sends a tcp packet to port 42. Initial attempt to simply set skb->dev from nf_reject_ipv4 doesn't cover all cases: skbs generated via ipv4 igmp_send_report trigger similar splat. Moreover, Pablo Neira found that nft_hash.c uses __skb_get_hash_symmetric() which would trigger same warn splat for such skbs. Lets allow callers to pass the current netns explicitly. The nf_trace infrastructure is adjusted to use the new helper. __skb_get_hash_symmetric is handled in the next patch. Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240608221057.16070-2-fw@strlen.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
David S. Miller authored
Dmitry Safonov says: ==================== net/tcp: TCP-AO and TCP-MD5 tracepoints Changes in v2: - Fix the build with CONFIG_IPV6=m (Eric Dumazet) - Move unused keyid/rnext/maclen later in the series to the patch that uses them (Simon Horman) - Reworked tcp_ao selftest lib to allow async tracing non-tcp events (was working on a stress-test that needs trace_kfree_skb() event, not in this series). - Separated selftest changes from kernel, as they now have a couple of unrelated to tracepoints changes - Wrote a few lines of Documentation/ - Link to v1: https://lore.kernel.org/r/20240224-tcp-ao-tracepoints-v1-0-15f31b7f30a7@arista.com ==================== Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Safonov authored
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Safonov authored
Now there are tracepoints, that cover all functionality of tcp_hash_fail(), but also wire up missing places They are also faster, can be disabled and provide filtering. This potentially may create a regression if a userspace depends on dmesg logs. Fingers crossed, let's see if anyone complains in reality. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Safonov authored
Instead of forcing userspace to parse dmesg (that's what currently is happening, at least in codebase of my current company), provide a better way, that can be enabled/disabled in runtime. Currently, there are already tcp events, add hashing related ones there, too. Rasdaemon currently exercises net_dev_xmit_timeout, devlink_health_report, but it'll be trivial to teach it to deal with failed hashes. Otherwise, BGP may trace/log them itself. Especially exciting for possible investigations is key rotation (RNext_key requests). Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Safonov authored
Two reasons: 1. It's grown up enough 2. In order to not do header spaghetti by including <trace/events/tcp.h>, which is necessary for TCP tracepoints. While at it, unexport and make static tcp_inbound_ao_hash(). Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Safonov authored
It's going to be used more in TCP-AO tracepoints. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Dmitry Safonov authored
It's possible to clean-up some ifdefs by hiding that tcp_{md5,ao}_needed static branch is defined and compiled only under related configs, since commit 4c8530dc ("net/tcp: Only produce AO/MD5 logs if there are any keys"). Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jakub Kicinski authored
Matthieu Baerts says: ==================== selftests: mptcp: use net/lib.sh to manage netns The goal of this series is to use helpers from net/lib.sh with MPTCP selftests. - Patches 1 to 4 are some clean-ups and preparation in net/lib.sh: - Patch 1 simplifies the code handling errexit by ignoring possible errors instead of disabling errexit temporary. - Patch 2 removes the netns from the list after having cleaned it, not to try to clean it twice. - Patch 3 removes the 'readonly' attribute for the netns variable, to allow using the same name in local variables. - Patch 4 removes the local 'ns' var, not to conflict with the global one it needs to setup. - Patch 5 uses helpers from net/lib.sh to create and delete netns in MPTCP selftests. - Patch 6 uses wait_local_port_listen helper from net/net_helper.sh. ==================== Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-0-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch includes net_helper.sh into mptcp_lib.sh, uses the helper wait_local_port_listen() defined in it to implement the similar mptcp helper. This can drop some duplicate code. It looks like this helper from net_helper.sh was originally coming from MPTCP, but MPTCP selftests have not been updated to use it from this shared place. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-6-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
This patch includes lib.sh into mptcp_lib.sh, uses setup_ns helper defined in lib.sh to set up namespaces in mptcp_lib_ns_init(), and uses cleanup_ns to delete namespaces in mptcp_lib_ns_exit(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-5-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Geliang Tang authored
The helper setup_ns() doesn't work when a net namespace named "ns" is passed to it. For example, in net/mptcp/diag.sh, the name of the namespace is "ns". If "setup_ns ns" is used in it, diag.sh fails with errors: Invalid netns name "./mptcp_connect" Cannot open network namespace "10000": No such file or directory Cannot open network namespace "10000": No such file or directory That is because "ns" is also a local variable in setup_ns, and it will not set the value for the global variable that has been giving in argument. To solve this, we could rename the variable, but it sounds better to drop it, as we can resolve the name using the variable passed in argument instead. The other local variables -- "ns_list" and "ns_name" -- are more unlikely to conflict with existing global variables. They don't seem to be currently used in any other net selftests. Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-4-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
It sounds good to mark the global netns variable as 'readonly', but Bash doesn't allow the creation of local variables with the same name. Because it looks like 'readonly' is mainly used here to check if a netns with that name has already been set, it sounds fine to check if a variable with this name has already been set instead. By doing that, we avoid having to modify helpers from MPTCP selftests using the same variable name as the one used to store the created netns name. While at it, also avoid an unnecessary call to 'eval' to set a local variable. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-3-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
Instead of only appending items to the list, removing them when the netns has been deleted. By doing that, we can make sure 'cleanup_all_ns()' is not trying to remove already deleted netns. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-2-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
No need to disable errexit temporary, simply ignore the only possible and not handled error. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptcp-net-lib-v1-1-e36986faac94@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Jeremy Kerr says: ==================== net: core: Unify dstats with tstats and lstats, implement generic dstats collection The struct pcpu_dstats ("dstats") has a few variations from the other two stats types (struct pcpu_sw_netstats and struct pcpu_lstats), and doesn't have generic helpers for collecting the per-cpu stats into a struct rtnl_link_stats64. This change unifies dstats with the other types, adds a collection implementation to the core, and updates the single driver (vrf) to use this generic implementation. Of course, questions/comments/etc are most welcome! v3: https://lore.kernel.org/r/20240605-dstats-v2-0-7fae03f813f3@codeconstruct.com.au v2: v1: https://lore.kernel.org/r/20240605-dstats-v1-0-1024396e1670@codeconstruct.com.au ==================== Link: https://lore.kernel.org/r/20240607-dstats-v3-0-cc781fe116f7@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jeremy Kerr authored
The vrf driver has its own dstats-to-rtnl_link_stats64 collection, but we now have a generic implementation for dstats collection, so switch to this. In doing so, we fix a minor issue where the (non-percpu) dev->stats->tx_errors value was never collected into rtnl_link_stats64, as the generic dev_get_dstats64() consumes the starting values from dev->stats. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607-dstats-v3-3-cc781fe116f7@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jeremy Kerr authored
We currently have dev_get_tstats64() for collecting per-cpu stats of type pcpu_sw_netstats ("tstats"). However, tstats doesn't allow for accounting tx/rx drops. We do have a stats variant that does have stats for dropped packets: struct pcpu_dstats, but there are no core helpers for using those stats. The VRF driver uses dstats, by providing its own collation/fetch functions to do so. This change adds a common implementation for dstats-type collection, used when pcpu_stat_type == NETDEV_PCPU_STAT_DSTAT. This is based on the VRF driver's existing stats collator (plus the unused tx_drops stat from there). We will switch the VRF driver to use this in the next change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607-dstats-v3-2-cc781fe116f7@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jeremy Kerr authored
The pcpu_sw_netstats and pcpu_lstats structs both contain a set of u64_stats_t fields for individual stats, but pcpu_dstats uses u64s instead. Make this consistent by using u64_stats_t across all stats types. The per-cpu dstats are only used by the vrf driver at present, so update that driver as part of this change. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607-dstats-v3-1-cc781fe116f7@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Breno Leitao authored
With commit 34d21de9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Move ip_tunnel driver to leverage the core allocation. All the ip_tunnel_init() users call ip_tunnel_init() as part of their .ndo_init callback. The .ndo_init callback is called before the stats allocation in netdev_register(), thus, the allocation will happen before the netdev is visible. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240607084420.3932875-1-leitao@debian.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Chris Packham authored
Fix a minor typo in the help text for the NET_DSA_TAG_RTL4_A config option. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Link: https://lore.kernel.org/r/20240607020843.1380735-1-chris.packham@alliedtelesis.co.nzSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 11 Jun, 2024 7 commits
-
-
Jakub Kicinski authored
Jacob Keller says: ==================== Intel Wired LAN Driver Updates 2024-06-03 This series includes miscellaneous improvements for the ice as well as a cleanup to the Makefiles for all Intel net drivers. Andy fixes all of the Intel net driver Makefiles to use the documented '*-y' syntax for specifying object files to link into kernel driver modules, rather than the '*-objs' syntax which works but is documented as reserved for user-space host programs. Jacob has a cleanup to refactor rounding logic in the ice driver into a common roundup_u64 helper function. Michal Schmidt replaces irq_set_affinity_hint() to use irq_update_affinity_hint() which behaves better with user-applied affinity settings. v2: https://lore.kernel.org/r/20240605-next-2024-06-03-intel-next-batch-v2-0-39c23963fa78@intel.com v1: https://lore.kernel.org/r/20240603-next-2024-06-03-intel-next-batch-v1-0-e0523b28f325@intel.com ==================== Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-0-d1470cee3347@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Michal Schmidt authored
irq_set_affinity_hint() is deprecated. Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. On the contrary, when the driver applies affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. ice reconfigures VSIs at runtime due to a MIB change (ice_dcb_process_lldp_set_mib_change). Reopening a VSI resets the affinity in ice_vsi_req_irq_msix(). 3. ice has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. I am not sure if updating the affinity hints is at all useful, because irqbalance ignores them since 2016 ([1]), but at least it's harmless. This ice change is similar to i40e commit d34c54d1 ("i40e: Use irq_update_affinity_hint()"). [1] https://github.com/Irqbalance/irqbalance/commit/dcc411e7bfddSigned-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-3-d1470cee3347@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jacob Keller authored
In ice_ptp_cfg_clkout(), the ice driver needs to calculate the nearest next second of a current time value specified in nanoseconds. It implements this using div64_u64, because the time value is a u64. It could use div_u64 since NSEC_PER_SEC is smaller than 32-bits. Ideally this would be implemented directly with roundup(), but that can't work on all platforms due to a division which requires using the specific macros and functions due to platform restrictions, and to ensure that the most appropriate and fast instructions are used. The kernel doesn't currently provide any 64-bit equivalents for doing roundup. Attempting to use roundup() on a 32-bit platform will result in a link failure due to not having a direct 64-bit division. The closest equivalent for this is DIV64_U64_ROUND_UP, which does a division always rounding up. However, this only computes the division, and forces use of the div64_u64 in cases where the divisor is a 32bit value and could make use of div_u64. Introduce DIV_U64_ROUND_UP based on div_u64, and then use it to implement roundup_u64 which takes a u64 input value and a u32 rounding value. The name roundup_u64 matches the naming scheme of div_u64, and future patches could implement roundup64_u64 if they need to round by a multiple that is greater than 32-bits. Replace the logic in ice_ptp.c which does this equivalent with the newly added roundup_u64. Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-2-d1470cee3347@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Andy Shevchenko authored
*-objs suffix is reserved rather for (user-space) host programs while usually *-y suffix is used for kernel drivers (although *-objs works for that purpose for now). Let's correct the old usages of *-objs in Makefiles. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-1-d1470cee3347@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jeff Johnson authored
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcpci.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcmulti.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/hfcsusb.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/avmfritz.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/speedfax.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNinfineon.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/w6692.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/netjet.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNipac.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/hardware/mISDN/mISDNisar.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/mISDN_core.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/mISDN_dsp.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/isdn/mISDN/l1oip.o Add the missing invocations of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240607-md-drivers-isdn-v1-1-81fb7001bc3a@quicinc.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski authored
Daniel Borkmann says: ==================== pull-request: bpf-next 2024-06-06 We've added 54 non-merge commits during the last 10 day(s) which contain a total of 50 files changed, 1887 insertions(+), 527 deletions(-). The main changes are: 1) Add a user space notification mechanism via epoll when a struct_ops object is getting detached/unregistered, from Kui-Feng Lee. 2) Big batch of BPF selftest refactoring for sockmap and BPF congctl tests, from Geliang Tang. 3) Add BTF field (type and string fields, right now) iterator support to libbpf instead of using existing callback-based approaches, from Andrii Nakryiko. 4) Extend BPF selftests for the latter with a new btf_field_iter selftest, from Alan Maguire. 5) Add new kfuncs for a generic, open-coded bits iterator, from Yafang Shao. 6) Fix BPF selftests' kallsyms_find() helper under kernels configured with CONFIG_LTO_CLANG_THIN, from Yonghong Song. 7) Remove a bunch of unused structs in BPF selftests, from David Alan Gilbert. 8) Convert test_sockmap section names into names understood by libbpf so it can deduce program type and attach type, from Jakub Sitnicki. 9) Extend libbpf with the ability to configure log verbosity via LIBBPF_LOG_LEVEL environment variable, from Mykyta Yatsenko. 10) Fix BPF selftests with regards to bpf_cookie and find_vma flakiness in nested VMs, from Song Liu. 11) Extend riscv32/64 JITs to introduce shift/add helpers to generate Zba optimization, from Xiao Wang. 12) Enable BPF programs to declare arrays and struct fields with kptr, bpf_rb_root, and bpf_list_head, from Kui-Feng Lee. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits) selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca selftests/bpf: Add start_test helper in bpf_tcp_ca selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca libbpf: Auto-attach struct_ops BPF maps in BPF skeleton selftests/bpf: Add btf_field_iter selftests selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT libbpf: Remove callback-based type/string BTF field visitor helpers bpftool: Use BTF field iterator in btfgen libbpf: Make use of BTF field iterator in BTF handling code libbpf: Make use of BTF field iterator in BPF linker code libbpf: Add BTF field iterator selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find() selftests/bpf: Fix bpf_cookie and find_vma in nested VM selftests/bpf: Test global bpf_list_head arrays. selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types. selftests/bpf: Test kptr arrays and kptrs in nested struct fields. bpf: limit the number of levels of a nested struct type. bpf: look into the types of the fields of a struct type recursively. ... ==================== Link: https://lore.kernel.org/r/20240606223146.23020-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Merge tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.11 The first "new features" pull request for v6.11 with changes both in stack and in drivers. Nothing out of ordinary, except that we have two conflicts this time: net/mac80211/cfg.c https://lore.kernel.org/all/20240531124415.05b25e7a@canb.auug.org.au drivers/net/wireless/microchip/wilc1000/netdev.c https://lore.kernel.org/all/20240603110023.23572803@canb.auug.org.au Major changes: cfg80211/mac80211 * parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers wilc1000 * read MAC address during probe to make it visible to user space iwlwifi * bump FW API to 91 for BZ/SC devices * report 64-bit radiotap timestamp * enable P2P low latency by default * handle Transmit Power Envelope (TPE) advertised by AP * start using guard() rtlwifi * RTL8192DU support ath12k * remove unsupported tx monitor handling * channel 2 in 6 GHz band support * Spatial Multiplexing Power Save (SMPS) in 6 GHz band support * multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) support * dynamic VLAN support * add panic handler for resetting the firmware state ath10k * add qcom,no-msa-ready-indicator Device Tree property * LED support for various chipsets * tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (194 commits) wifi: ath12k: add hw_link_id in ath12k_pdev wifi: ath12k: add panic handler wifi: rtw89: chan: Use swap() in rtw89_swap_sub_entity() wifi: brcm80211: remove unused structs wifi: brcm80211: use sizeof(*pointer) instead of sizeof(type) wifi: ath12k: do not process consecutive RDDM event dt-bindings: net: wireless: ath11k: Drop "qcom,ipq8074-wcss-pil" from example wifi: ath12k: fix memory leak in ath12k_dp_rx_peer_frag_setup() wifi: rtlwifi: handle return value of usb init TX/RX wifi: rtlwifi: Enable the new rtl8192du driver wifi: rtlwifi: Add rtl8192du/sw.c wifi: rtlwifi: Constify rtl_hal_cfg.{ops,usb_interface_cfg} and rtl_priv.cfg wifi: rtlwifi: Add rtl8192du/dm.{c,h} wifi: rtlwifi: Add rtl8192du/fw.{c,h} and rtl8192du/led.{c,h} wifi: rtlwifi: Add rtl8192du/rf.{c,h} wifi: rtlwifi: Add rtl8192du/trx.{c,h} wifi: rtlwifi: Add rtl8192du/phy.{c,h} wifi: rtlwifi: Add rtl8192du/hw.{c,h} wifi: rtlwifi: Add new members to struct rtl_priv for RTL8192DU wifi: rtlwifi: Add rtl8192du/table.{c,h} ... Signed-off-by: Jakub Kicinski <kuba@kernel.org> ==================== Link: https://lore.kernel.org/r/20240607093517.41394C2BBFC@smtp.kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>
-