- 25 Oct, 2023 4 commits
-
-
Justin Stitt authored
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.

We expect new_entry->dbf_name to be NUL-terminated based on its use with strcmp():
|       if (strcmp(entry->dbf_name, name) == 0) {

Moreover, NUL-padding is not required as new_entry is kzalloc'd just before this assignment:
|       new_entry = kzalloc(sizeof(struct qeth_dbf_entry), GFP_KERNEL);
... rendering any future NUL-byte assignments (like the ones strncpy() does) redundant.

Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Tested-by: Thorsten Winkler <twinkler@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20231023-strncpy-drivers-s390-net-qeth_core_main-c-v1-1-e7ce65454446@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
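A minimal sketch of this kind of strncpy() to strscpy() conversion; the struct, field and length names below are illustrative stand-ins, not the actual qeth code:

  #include <linux/slab.h>
  #include <linux/string.h>

  #define DBF_NAME_LEN 20    /* illustrative length, not the real qeth define */

  struct dbf_entry {
      char dbf_name[DBF_NAME_LEN];
  };

  static struct dbf_entry *dbf_entry_create(const char *name)
  {
      struct dbf_entry *new_entry;

      /* kzalloc() zero-fills the allocation, so no NUL-padding is needed */
      new_entry = kzalloc(sizeof(*new_entry), GFP_KERNEL);
      if (!new_entry)
          return NULL;

      /* was: strncpy(new_entry->dbf_name, name, DBF_NAME_LEN);
       * strscpy() guarantees NUL-termination and does not pad.
       */
      strscpy(new_entry->dbf_name, name, sizeof(new_entry->dbf_name));
      return new_entry;
  }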
-
Justin Stitt authored
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces.

We expect chid to be NUL-terminated based on its use with format strings:
        CTCM_DBF_TEXT_(SETUP, CTC_DBF_INFO, "%s(%s) %s", CTCM_FUNTAIL, chid, ok ? "OK" : "failed");

Moreover, NUL-padding is not required as it is _only_ used in this one instance with a format string.

Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding.

We can also drop the +1 from chid's declaration as we no longer need to be cautious about leaving a spot for a NUL-byte.

Let's use the more idiomatic strscpy usage of (dest, src, sizeof(dest)) as this more closely ties the destination buffer to the length.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
Tested-by: Thorsten Winkler <twinkler@linux.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20231023-strncpy-drivers-s390-net-ctcm_main-c-v1-1-265db6e78165@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
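A small before/after sketch of the (dest, src, sizeof(dest)) idiom and the dropped +1; the constant and buffer names are illustrative, not taken from the ctcm driver:

  #include <linux/printk.h>
  #include <linux/string.h>

  #define ID_SIZE 20    /* illustrative, not the real CTCM constant */

  /* before: one spare byte so the copy can be manually NUL-terminated */
  static void print_id_old(const char *src)
  {
      char chid[ID_SIZE + 1];

      strncpy(chid, src, ID_SIZE);
      chid[ID_SIZE] = '\0';
      pr_info("%s\n", chid);
  }

  /* after: strscpy() ties the copy length to the destination buffer and
   * always NUL-terminates, so the +1 slot is no longer needed
   */
  static void print_id_new(const char *src)
  {
      char chid[ID_SIZE];

      strscpy(chid, src, sizeof(chid));
      pr_info("%s\n", chid);
  }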
-
Lorenzo Bianconi authored
The wo pointer is no longer used in the wo_r32() and wo_w32() routines, so get rid of it.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/530537db0872f7523deff21f0a5dfdd9b75fdc9d.1698098459.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Lorenzo Bianconi authored
The WED MCU firmware does not contain all the memory regions defined in the dts reserved_memory node (e.g. the MT7986 WED firmware does not contain the cpu-boot region). Reverse the mtk_wed_mcu_run_firmware() logic to check that all the firmware sections are defined in the dts reserved_memory node.

Fixes: c6d961ae ("net: ethernet: mtk_wed: move mem_region array out of mtk_wed_mcu_load_firmware")
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/d983cbfe8ea562fef9264de8f0c501f7d5705bd5.1698098381.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
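A rough sketch of the reversed lookup described above, using hypothetical structure and field names (the real mtk_wed data structures differ): instead of walking the DT memory regions and expecting each one in the firmware, walk the firmware sections and require a matching DT region for each.

  #include <linux/errno.h>
  #include <linux/string.h>

  struct fw_section { const char *name; };
  struct mem_region { const char *name; };

  /* Hypothetical helper: every firmware section must have a matching
   * reserved_memory region; extra regions in the DT are fine.
   */
  static int check_fw_sections(const struct fw_section *secs, int nsecs,
                               const struct mem_region *regions, int nregions)
  {
      int i, j;

      for (i = 0; i < nsecs; i++) {
          for (j = 0; j < nregions; j++)
              if (!strcmp(secs[i].name, regions[j].name))
                  break;
          if (j == nregions)
              return -EINVAL;    /* section has no backing region */
      }
      return 0;
  }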
-
- 24 Oct, 2023 36 commits
-
-
Jakub Kicinski authored
Wolfram Sang says:

====================
net: ethernet: renesas: infrastructure preparations for upcoming driver

Before we upstream a new driver, Niklas and I thought that a few cleanups for Kconfig/Makefile will help readability and maintainability. Here they are, looking forward to comments.
====================

Link: https://lore.kernel.org/r/20231022205316.3209-1-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Wolfram Sang authored
Mentioning SoCs in Kconfig descriptions tends to get stale (e.g. RAVB is missing RZV2M) or imprecise (e.g. SH_ETH is not available on all R8A779x). Drop them instead of providing vague information. Improve the file description a tad while here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/20231022205316.3209-3-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Wolfram Sang authored
A new Renesas driver shall be added soon. Prepare the Makefile by grouping the specific objects to the Kconfig symbol for better readability. Improve the file description a tad while here.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/20231022205316.3209-2-wsa+renesas@sang-engineering.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Swarup Laxman Kotiaklapudi authored
Replace ifconfig with the ip command; on a system where ifconfig is not installed this script will not work correctly.

Test result with this patchset:

sudo make TARGETS="net" kselftest
....
TAP version 13
1..1
timeout set to 1500
selftests: net: route_localnet.sh
run arp_announce test
net.ipv4.conf.veth0.route_localnet = 1
net.ipv4.conf.veth1.route_localnet = 1
net.ipv4.conf.veth0.arp_announce = 2
net.ipv4.conf.veth1.arp_announce = 2
PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84) bytes of data.
64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.068 ms
64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.068 ms
64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.068 ms
64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.068 ms

--- 127.25.3.14 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4073ms
rtt min/avg/max/mdev = 0.038/0.062/0.068/0.012 ms
ok
run arp_ignore test
net.ipv4.conf.veth0.route_localnet = 1
net.ipv4.conf.veth1.route_localnet = 1
net.ipv4.conf.veth0.arp_ignore = 3
net.ipv4.conf.veth1.arp_ignore = 3
PING 127.25.3.14 (127.25.3.14) from 127.25.3.4 veth0: 56(84) bytes of data.
64 bytes from 127.25.3.14: icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from 127.25.3.14: icmp_seq=2 ttl=64 time=0.065 ms
64 bytes from 127.25.3.14: icmp_seq=3 ttl=64 time=0.066 ms
64 bytes from 127.25.3.14: icmp_seq=4 ttl=64 time=0.065 ms
64 bytes from 127.25.3.14: icmp_seq=5 ttl=64 time=0.065 ms

--- 127.25.3.14 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4092ms
rtt min/avg/max/mdev = 0.032/0.058/0.066/0.013 ms
ok
ok 1 selftests: net: route_localnet.sh
...

Signed-off-by: Swarup Laxman Kotiaklapudi <swarupkotikalapudi@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20231023123422.2895-1-swarupkotikalapudi@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Florian Fainelli says:

====================
Switch DSA to inclusive terminology

One of the action items following Netconf'23 is to switch subsystems to use inclusive terminology. DSA has been making extensive use of the "master" and "slave" words which are now replaced by "conduit" and "user" respectively.
====================

Link: https://lore.kernel.org/r/20231023181729.1191071-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Florian Fainelli authored
This preserves the existing IFLA_DSA_MASTER which is part of the uAPI and creates an alias named IFLA_DSA_CONDUIT.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231023181729.1191071-3-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
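One common way to keep an old uAPI name while introducing a new one is an enum entry plus a define alias. The sketch below illustrates the pattern only; it is not a copy of the actual if_link.h change:

  /* New canonical name in the uAPI enum; the old spelling stays valid as
   * an alias so existing userspace keeps compiling.
   */
  enum {
      IFLA_DSA_UNSPEC,
      IFLA_DSA_CONDUIT,
      __IFLA_DSA_MAX,
  };

  #define IFLA_DSA_MAX    (__IFLA_DSA_MAX - 1)

  /* Deprecated spelling, kept for backwards compatibility */
  #define IFLA_DSA_MASTER IFLA_DSA_CONDUIT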
-
Florian Fainelli authored
Use more inclusive terms throughout the DSA subsystem by moving away from "master" which is replaced by "conduit" and "slave" which is replaced by "user". No functional changes.

Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231023181729.1191071-2-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Gerhard Engleder authored
Compiler warns about a possible format-overflow in tsnep_request_irq():

drivers/net/ethernet/engleder/tsnep_main.c:884:55: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
        sprintf(queue->name, "%s-rx-%d", name,
                                      ^
drivers/net/ethernet/engleder/tsnep_main.c:881:55: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
        sprintf(queue->name, "%s-tx-%d", name,
                                      ^
drivers/net/ethernet/engleder/tsnep_main.c:878:49: warning: '-txrx-' directive writing 6 bytes into a region of size between 5 and 25 [-Wformat-overflow=]
        sprintf(queue->name, "%s-txrx-%d", name,
                                ^~~~~~

Actually overflow cannot happen. The name is limited to IFNAMSIZ, because netdev_name() is called during ndo_open(). queue_index is a single char, because less than 10 queues are supported.

Fix the warning with snprintf(). Additionally, increase the buffer to 32 bytes, because those 7 additional bytes were unused anyway.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310182028.vmDthIUa-lkp@intel.com/
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231023183856.58373-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
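A minimal sketch of the snprintf()-based fix described above; the structure, buffer size and helper name are illustrative, not the actual tsnep code:

  #include <linux/kernel.h>

  struct irq_queue {
      /* 32 bytes: roomy enough for "<ifname>-txrx-<digit>" */
      char name[32];
  };

  static void queue_set_name(struct irq_queue *queue, const char *netdev_name,
                             int queue_index)
  {
      /* snprintf() truncates instead of overflowing, which also silences
       * -Wformat-overflow
       */
      snprintf(queue->name, sizeof(queue->name), "%s-txrx-%d",
               netdev_name, queue_index);
  }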
-
Jakub Kicinski authored
Jakub Kicinski says:

====================
net: deduplicate netdev name allocation

After recent fixes we have even more duplicated code in netdev name allocation helpers. There are two complications in this code. First, __dev_alloc_name() clobbers its output arg even if allocation fails, forcing callers to do extra copies. Second, as our experience in commit 55a5ec9b ("Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"") and commit 029b6d14 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"") taught us, user space is very sensitive to the exact error codes. Align the callers of __dev_alloc_name(), and remove some of its complexity.

v1: https://lore.kernel.org/all/20231020011856.3244410-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20231023152346.3639749-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Remove unnecessary else clauses after return. I copied this if / else construct from somewhere; it makes the code harder to read.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
__dev_alloc_name() is only called by dev_prep_valid_name(), which already checks that name is valid.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Prior to restructuring, __dev_alloc_name() handled both printf and non-printf names. In a clever attempt at code reuse it always prints the name into a buffer and checks if it's a duplicate.

Trust the bitmap, and return an error if it's full. This shrinks the possible ID space by one from 32K to 32K - 1, as previously the max value would have been tried as a valid ID. It seems very unlikely that anyone would care, as we heard no requests to increase the max beyond 32k.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
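A simplified sketch of the bitmap-based ID allocation this relies on, using the standard kernel bitmap helpers; the surrounding function is illustrative and not the actual __dev_alloc_name():

  #include <linux/bitmap.h>
  #include <linux/errno.h>

  #define MAX_NETDEVICES (32 * 1024)    /* 32K name slots, as in dev.c */

  /* The caller marks every ID already taken by an existing "eth%d"-style
   * name in 'inuse'; hand out the first free one, and treat a full bitmap
   * as an error instead of retrying the max value as a candidate.
   */
  static int alloc_name_id(unsigned long *inuse)
  {
      int id;

      id = find_first_zero_bit(inuse, MAX_NETDEVICES);
      if (id >= MAX_NETDEVICES)
          return -ENFILE;

      __set_bit(id, inuse);
      return id;
  }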
-
Jakub Kicinski authored
All callers of __dev_valid_name() go thru dev_prep_valid_name() which handles the non-printf case. Focus __dev_alloc_name() on the sprintf case, remove the indentation level.

Minor functional change of returning -EINVAL if % is not found, which should now never happen.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
__dev_alloc_name() handles both the sprintf and non-sprintf target names. This complicates the code.

dev_prep_valid_name() already handles the non-sprintf case before calling __dev_alloc_name(); make the only other caller also go thru dev_prep_valid_name(). This way we can drop the non-sprintf handling in __dev_alloc_name() in one of the next changes.

commit 55a5ec9b ("Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"") and commit 029b6d14 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"") tell us that we can't start returning -EEXIST from dev_alloc_name() on name duplicates. Bite the bullet and pass the expected errno to dev_prep_valid_name().

dev_prep_valid_name() must now propagate out the allocated id for printf names.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Jakub Kicinski authored
Callers of __dev_alloc_name() want to pass dev->name as the output buffer. Make __dev_alloc_name() not clobber that buffer on failure, and remove the workarounds in callers. dev_alloc_name_ns() is now completely unnecessary.

The extra strscpy() added here will be gone by the end of the patch series.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231023152346.3639749-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
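A generic sketch of the non-clobbering pattern described here (not the actual dev.c code; the helper name is hypothetical): format into a scratch buffer first and only copy into the caller's buffer once the allocation has succeeded.

  #include <linux/netdevice.h>    /* IFNAMSIZ */
  #include <linux/string.h>

  /* 'res' (typically dev->name) is only written on success, so a failed
   * allocation leaves the caller's buffer intact.
   */
  static int demo_alloc_name(const char *fmt, char *res, int id)
  {
      char buf[IFNAMSIZ];

      if (id < 0)
          return id;    /* propagate the allocation error */

      snprintf(buf, IFNAMSIZ, fmt, id);
      strscpy(res, buf, IFNAMSIZ);
      return id;
  }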
-
Jakub Kicinski authored
Mat Martineau says:

====================
mptcp: convert Netlink code to use YAML spec

This series from Davide converts most of the MPTCP Netlink interface (plus uAPI bits) to use sources generated by YNL using a YAML spec file. This new YAML file is useful to validate the API and to generate a good documentation page.

Patch 1 modifies YNL spec to support "uns-admin-perm" for genetlink legacy.
Patch 2 adds support for validating exact length of netlink attrs.
Patch 3 converts Netlink structures from small_ops to ops to prepare the switch to YAML.
Patch 4 adds the Netlink YAML spec for MPTCP.
Patch 5 adds and uses a new header file generated from the new YAML spec.
Patch 6 renames some handlers to match the ones generated from the YAML spec.
Patch 7 adds and uses Netlink policies automatically generated from the YAML spec.
====================

Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-0-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
generated with:

$ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
>     --spec Documentation/netlink/specs/mptcp.yaml --source \
>     -o net/mptcp/mptcp_pm_gen.c
$ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
>     --spec Documentation/netlink/specs/mptcp.yaml --header \
>     -o net/mptcp/mptcp_pm_gen.h

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-7-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
so that they will match names generated from YAML spec.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-6-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
generated with:

$ ./tools/net/ynl/ynl-gen-c.py --mode uapi \
>     --spec Documentation/netlink/specs/mptcp.yaml \
>     --header -o include/uapi/linux/mptcp_pm.h

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-5-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
it describes most of the current netlink interface (uAPI definitions, doit/dumpit operations and attributes)

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-4-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Davide Caratti authored
In the current MPTCP control plane, all operations use a netlink attribute of the same type, "MPTCP_PM_ATTR". However, add/del/get/flush operations only parse the first element in the message - the one that describes MPTCP endpoints (that was named MPTCP_PM_ATTR_ADDR and mostly used in ADD_ADDR operations - probably the similarity of "attr", "addr" and "add" might cause some confusion to human readers).

Convert MPTCP from 'small_ops' to 'ops', thus allowing different attributes for each single operation and hopefully making all this clearer to human readers:

- use a separate attribute set for add/del/get/flush address operations, binary compatible with the existing one, to store the endpoint address. MPTCP_PM_ENDPOINT_ADDR is added to the uAPI (with the same value as MPTCP_PM_ATTR_ADDR) for these operations.
- convert mptcp_pm_ops[] and add policy files accordingly.

This prepares the MPTCP control plane to be described as a YAML spec.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-3-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
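An illustrative sketch of the small_ops to ops conversion; the attribute enum, policy array, handler and command number below are simplified stand-ins rather than the real MPTCP definitions. The point is that genl_ops (unlike genl_small_ops) carries a per-operation policy and maxattr, which is what allows a dedicated attribute set for the endpoint operations:

  #include <net/genetlink.h>
  #include <net/netlink.h>

  /* Hypothetical attribute set for the endpoint operations */
  enum {
      DEMO_PM_ATTR_UNSPEC,
      DEMO_PM_ENDPOINT_ADDR,
      __DEMO_PM_ATTR_MAX,
  };
  #define DEMO_PM_ATTR_MAX (__DEMO_PM_ATTR_MAX - 1)

  static const struct nla_policy demo_endpoint_policy[DEMO_PM_ATTR_MAX + 1] = {
      [DEMO_PM_ENDPOINT_ADDR] = { .type = NLA_NESTED },
  };

  static int demo_pm_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
  {
      return 0;
  }

  /* Each genl_ops entry has its own policy and maxattr, so add/del/get/
   * flush can validate against the endpoint-address attribute set only.
   */
  static const struct genl_ops demo_pm_ops[] = {
      {
          .cmd     = 1,    /* e.g. the ADD_ADDR command */
          .doit    = demo_pm_add_addr_doit,
          .policy  = demo_endpoint_policy,
          .maxattr = DEMO_PM_ATTR_MAX,
          .flags   = GENL_UNS_ADMIN_PERM,
      },
  };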
-
Davide Caratti authored
add support for 'exact-len' validation on netlink attributes.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/340
Acked-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-2-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
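On the kernel side an exact-length constraint typically ends up as an NLA_POLICY_EXACT_LEN() entry in the attribute policy; a small hand-written (not generated) illustration, with a hypothetical attribute name:

  #include <net/netlink.h>

  /* Hypothetical attribute: a binary blob that must be exactly 16 bytes
   * long (e.g. an IPv6 address).
   */
  #define DEMO_ATTR_ADDR6 1

  static const struct nla_policy demo_policy[DEMO_ATTR_ADDR6 + 1] = {
      [DEMO_ATTR_ADDR6] = NLA_POLICY_EXACT_LEN(16),
  };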
-
Davide Caratti authored
this flag maps to GENL_UNS_ADMIN_PERM and will be used by future specs.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231023-send-net-next-20231023-1-v2-1-16b1f701f900@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Liu Jian authored
A helper function for printing non-work-conserving alarms was added in commit b00355db ("pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler."). In this commit, use qdisc_warn_nonwc() instead of WARN_ONCE() to handle the non-work-conserving warning in the qfq Qdisc.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Link: https://lore.kernel.org/r/20231023064729.370649-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
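For reference, a minimal sketch of the kind of call site this describes (illustrative function, not the actual qfq code):

  #include <net/pkt_sched.h>

  /* Report a non-work-conserving condition; qdisc_warn_nonwc() prints at
   * most once per qdisc instance, unlike a global WARN_ONCE().
   */
  static void demo_report_nonwc(struct Qdisc *sch, bool nonwc)
  {
      if (nonwc)
          qdisc_warn_nonwc("qfq", sch);
  }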
-
Adam Ford authored
Currently there is a device tree entry called "local-mac-address" which can be filled by the bootloader or manually set. This is useful when the user does not want to use the MAC address programmed into the SoC.

Currently, the davinci_emac reads the MAC from the DT, copies it from pdata->mac_addr to priv->mac_addr, then blindly overwrites it by reading from registers in the SoC, and falls back to a random MAC if it's still not valid. This completely ignores any MAC address in the device tree.

In order to use the local-mac-address, check whether the contents of priv->mac_addr are valid before falling back to reading from the SoC, and only use a random MAC when neither address is valid.

Signed-off-by: Adam Ford <aford173@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231022151911.4279-1-aford173@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
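A simplified sketch of the fallback order described above. The helper below and its parameters are illustrative; is_valid_ether_addr(), eth_hw_addr_set() and eth_hw_addr_random() are the standard kernel helpers:

  #include <linux/etherdevice.h>

  /* Prefer the DT-provided local-mac-address, then the address read from
   * SoC registers, and only then fall back to a random MAC.
   */
  static void demo_pick_mac(struct net_device *ndev, const u8 *dt_mac,
                            const u8 *soc_mac)
  {
      if (is_valid_ether_addr(dt_mac))
          eth_hw_addr_set(ndev, dt_mac);
      else if (is_valid_ether_addr(soc_mac))
          eth_hw_addr_set(ndev, soc_mac);
      else
          eth_hw_addr_random(ndev);
  }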
-
Abel Wu authored
Before sockets became aware of net-memcg's memory pressure since commit e1aab161 ("socket: initial cgroup code."), the memory usage would be granted to raise if below average even when under protocol's pressure. This provides fairness among the sockets of same protocol.

That commit changes this because the heuristic will also be effective when only memcg is under pressure which makes no sense. So revert that behavior.

After reverting, __sk_mem_raise_allocated() no longer considers memcg's pressure. As memcgs are isolated from each other w.r.t. memory accounting, consuming one's budget won't affect others. So except the places where buffer sizes are needed to be tuned, allow workloads to use the memory they are provisioned.

Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231019120026.42215-3-wuyun.abel@bytedance.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Abel Wu authored
There are now two accounting infrastructures for skmem, while the heuristics in __sk_mem_raise_allocated() were actually introduced before memcg was born. Add some comments to clarify whether they can be applied to both infrastructures or not.

Suggested-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231019120026.42215-2-wuyun.abel@bytedance.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Abel Wu authored
Code cleanup for both better simplicity and readability. No functional change intended.

Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231019120026.42215-1-wuyun.abel@bytedance.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Jan Kiszka authored
This helps identify the ports in udev rules, for example.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/895ae9c1-b6dd-4a97-be14-6f2b73c7b2b5@siemens.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
Vishvambar Panth S authored
Currently all RX frames are timestamped which results in a performance penalty when timestamping is not needed. The default is now being changed to not timestamp any Rx frames (HWTSTAMP_FILTER_NONE), but support has been added to allow changing the desired RX timestamping mode (HWTSTAMP_FILTER_ALL, which was the previous setting, and HWTSTAMP_FILTER_PTP_V2_EVENT are now supported) using SIOCSHWTSTAMP. All settings were tested using the hwstamp_ctl application. It is also noted that ptp4l, when started, preconfigures the device to timestamp using HWTSTAMP_FILTER_PTP_V2_EVENT, so this driver continues to work properly "out of the box".

Test setup: x64 PC with LAN7430 ---> x64 PC as partner

iperf3 with - Timestamp all incoming packets:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval        Transfer    Bitrate        Retr
[  5] 0.00-5.05  sec  517 MBytes  859 Mbits/sec  0    sender
[  5] 0.00-5.00  sec  515 MBytes  864 Mbits/sec       receiver

iperf Done.

iperf3 with - Timestamp only PTP packets:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval        Transfer    Bitrate        Retr
[  5] 0.00-5.04  sec  563 MBytes  937 Mbits/sec  0    sender
[  5] 0.00-5.00  sec  561 MBytes  941 Mbits/sec       receiver

Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231020185801.25649-1-vishvambarpanth.s@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
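A generic sketch of the rx_filter handling this describes, not the LAN743x code itself; the hardware enable/disable helpers are hypothetical:

  #include <linux/errno.h>
  #include <linux/net_tstamp.h>

  /* Hypothetical hardware hooks */
  static void demo_hw_rx_tstamp_all(bool on) { }
  static void demo_hw_rx_tstamp_ptp_only(void) { }

  static int demo_set_rx_filter(struct hwtstamp_config *config)
  {
      switch (config->rx_filter) {
      case HWTSTAMP_FILTER_NONE:
          demo_hw_rx_tstamp_all(false);
          break;
      case HWTSTAMP_FILTER_ALL:
          demo_hw_rx_tstamp_all(true);
          break;
      case HWTSTAMP_FILTER_PTP_V2_EVENT:
          demo_hw_rx_tstamp_ptp_only();
          /* report back what is actually applied */
          config->rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
          break;
      default:
          return -ERANGE;
      }
      return 0;
  }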
-
Jakub Kicinski authored
Yunsheng Lin says:

====================
introduce page_pool_alloc() related API

In [1] & [2] & [3], there are usecases for veth and virtio_net to use frag support in page pool to reduce memory usage, and it may request different frag size depending on the head/tail room space for xdp_frame/shinfo and mtu/packet size. When the requested frag size is large enough that a single page can not be split into more than one frag, using frag support only has a performance penalty because of the extra frag count handling for frag support.

So this patchset provides a page pool API for the driver to allocate memory with least memory utilization and performance penalty when it doesn't know the size of memory it needs beforehand.

1. https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://patchwork.kernel.org/project/netdevbpf/patch/20230526054621.18371-3-liangchen.linux@gmail.com/
3. https://github.com/alobakin/linux/tree/iavf-pp-frag
====================

Link: https://lore.kernel.org/r/20231020095952.11055-1-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
Use page_pool_alloc() API to allocate memory with least memory utilization and performance penalty.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-6-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
As more drivers begin to use the fragment API, update the documentation about how a driver author should decide which API to use.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
CC: Dima Tisnek <dimaqq@gmail.com>
Link: https://lore.kernel.org/r/20231020095952.11055-5-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
Currently page pool supports the below use cases:

use case 1: allocate a page without page splitting using the page_pool_alloc_pages() API if the driver knows that the memory it needs is always bigger than half of the page allocated from the page pool.
use case 2: allocate a page frag with page splitting using the page_pool_alloc_frag() API if the driver knows that the memory it needs is always smaller than or equal to half of the page allocated from the page pool.

There are emerging use cases [1] & [2] that are a mix of the above two: the driver doesn't know the size of memory it needs beforehand, so the driver may use something like below to allocate memory with least memory utilization and performance penalty:

        if (size << 1 > max_size)
                page = page_pool_alloc_pages();
        else
                page = page_pool_alloc_frag();

To avoid the driver doing something like the above, add the page_pool_alloc() API to support this use case, and report the true size of the memory that is actually allocated by updating '*size' back to the driver in order to avoid exacerbating the truesize underestimate problem.

Rename page_pool_free(), which is used in the destroy process, to __page_pool_destroy() to avoid confusion with the newly added API.

1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-4-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
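A rough sketch of how a driver might use the new helper; the header path and signature reflect my reading of this series and should be treated as illustrative rather than authoritative:

  #include <net/page_pool/helpers.h>

  /* Ask the pool for '*size' bytes: it decides internally whether to hand
   * out a whole page or a frag, and writes back the true size obtained.
   */
  static struct page *demo_rx_alloc(struct page_pool *pool,
                                    unsigned int *offset, unsigned int *size)
  {
      return page_pool_alloc(pool, offset, size, GFP_ATOMIC);
  }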
-
Yunsheng Lin authored
PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count handling is unified and page_pool_alloc_frag() is supported in 32-bit arch with 64-bit DMA, so remove it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-3-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Yunsheng Lin authored
Currently when page_pool_create() is called with the PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only allowed to be called under the below constraints:

1. page_pool_fragment_page() needs to be called to set up page->pp_frag_count immediately.
2. page_pool_defrag_page() often needs to be called to drain page->pp_frag_count when no more users will be holding on to that page.

Those constraints exist in order to support a page being split into multiple fragments, and they have some overhead because of the cache line dirtying/bouncing and atomic update. The constraints are unavoidable when we need a page to be split into more than one fragment, but there are also cases where we want to avoid them and their overhead when a page can't be split as it can only hold a single fragment as requested by the user, depending on different use cases:

use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting depending on the fragment size.

Currently page pool only provides the page_pool_alloc_pages() and page_pool_alloc_frag() APIs to enable 1 & 2 separately, so we can not use a combination of 1 & 2 to enable 3; it is not possible yet because of the per page_pool flag PP_FLAG_PAGE_FRAG.

So in order to allow allocating unsplit pages without the overhead of split pages, while still allowing split pages to be allocated, we need to remove the per page_pool flag in page_pool_is_last_frag(). As best as I can think of, there are two methods:

1. Add a per page flag/bit to indicate whether a page is split or not, which means we might need to update that flag/bit every time the page is recycled, dirtying the cache line of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and unsplit pages by assuming all pages in the page pool are split into one big fragment initially.

As page pool already supports use case 1 without dirtying the cache line of 'struct page' whenever a page is recyclable, we need to support the above use case 3 with minimal overhead, especially without adding any noticeable overhead for use case 1. And we are already doing an optimization by not updating pp_frag_count in page_pool_defrag_page() for the last fragment user, so this patch chooses to unify the pp_frag_count handling to support the above use case 3.

There is no noticeable performance degradation, and some justification for unifying the frag_count handling with this patch applied, based on micro-benchmark testing in [1].

1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20231020095952.11055-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-