Commits · a2713257ee2be22827d7bc248302d408c91bfb95 · Kirill Smelkov / linux

18 Sep, 2023 11 commits

tls: Use size_add() in call to struct_size() · a2713257

Gustavo A. R. Silva authored Sep 15, 2023

If, for any reason, the open-coded arithmetic causes a wraparound,
the protection that `struct_size()` adds against potential integer
overflows is defeated. Fix this by hardening call to `struct_size()`
with `size_add()`.

Fixes: b89fec54 ("tls: rx: wrap decrypt params in a struct")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a2713257

mlxsw: Use size_mul() in call to struct_size() · e22c6ea0

Gustavo A. R. Silva authored Sep 15, 2023

If, for any reason, the open-coded arithmetic causes a wraparound, the
protection that `struct_size()` adds against potential integer overflows
is defeated. Fix this by hardening call to `struct_size()` with `size_mul()`.

Fixes: 2285ec87 ("mlxsw: spectrum_acl_bloom_filter: use struct_size() in kzalloc()")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e22c6ea0

Merge branch 'kselftest-rtnetlink' · 74fa1a82

David S. Miller authored Sep 18, 2023

Daniel Mendes says:

====================
kselftest: rtnetlink: add additional command line options

Many other tests implement options like verbose, pause, and pause
on failure. These patches just add these options to rtnetlink.sh.
The same conventions are used as the tests that already have this
functionality: eg verbose is 0 or 1 but PAUSE is "yes" or "no".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

74fa1a82

kselftest: rtnetlink: add pause and pause on fail flag · a68eed9f

Daniel Mendes authored Sep 12, 2023

'Pause' prompts the user to press Enter to continue running tests
once one test has finished. Pause on fail on prompts the user to press
enter only when a test fails.

Modifications to kci_test_addrlft() and kci_test_ipsec_offload()
ensure that whenever end_test is called, [$ret -ne 0] indicates
failure. This allows end_test to really easily implement pause on fail
functionality.
Signed-off-by: Daniel Mendes <dmendes@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a68eed9f

kselftest: rtnetlink.sh: add verbose flag · 9c2a19f7

Daniel Mendes authored Sep 12, 2023

Uses a run_cmd helper function similar to other selftests to add
verbose functionality i.e. print executed commands and their outputs

Many commands silence or redirect output. This can be removed since
the verbose helper function captures output anyway and only outputs it
if VERBOSE is true. Similarly, the helper command for pipes to grep
searches stderr and stdout. This makes output redirection unnecessary
in those cases.
Signed-off-by: Daniel Mendes <dmendes@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c2a19f7

Merge branch 'pds_core-pci-reset' · 760554a9

David S. Miller authored Sep 18, 2023

Shannon Nelson says:

====================
pds_core: add PCI reset handling

Make sure pds_core can handle and recover from PCI function resets and
similar PCI bus issues: add detection and handlers for PCI problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

760554a9

pds_core: add attempts to fix broken PCI · 1e18ec3e

Shannon Nelson authored Sep 14, 2023

If we see a 0xff value from a PCI register read, we know that
the PCI connection is broken, possibly by a low level reset that
didn't go through the nice pci_error_handlers path.

Make use of the PCI cleanup code that we already have from the
reset handlers and add some detection and attempted recovery
from a broken PCI connection.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e18ec3e

pds_core: implement pci reset handlers · ffa55858

Shannon Nelson authored Sep 14, 2023

Implement the callbacks for a nice PCI reset.  These get called
when a user is nice enough to use the sysfs PCI reset entry, e.g.
    echo 1 > /sys/bus/pci/devices/0000:2b:00.0/reset
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffa55858

pds_core: keep viftypes table across reset · d557c094

Shannon Nelson authored Sep 14, 2023

Keep the viftypes and the current enable/disable states
across a recovery action.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d557c094

pds_core: check health in devcmd wait · f7b5bd72

Shannon Nelson authored Sep 14, 2023

Similar to what we do in the AdminQ, check for devcmd health
while waiting for an answer.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7b5bd72

octeon_ep: support to fetch firmware info · 8d6198a1

Shinas Rasheed authored Sep 15, 2023

Add support to fetch firmware info such as heartbeat miss count,
heartbeat interval. This shall be used for heartbeat monitor.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d6198a1

17 Sep, 2023 29 commits

gve: Use size_add() in call to struct_size() · d692873c

Gustavo A. R. Silva authored Sep 15, 2023

If, for any reason, `tx_stats_num + rx_stats_num` wraps around, the
protection that struct_size() adds against potential integer overflows
is defeated. Fix this by hardening call to struct_size() with size_add().

Fixes: 691f4077 ("gve: Replace zero-length array with flexible-array member")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d692873c

Merge branch 'vsock-tests' · ebdada9d

David S. Miller authored Sep 17, 2023

Stefano Garzarella says:

====================
vsock/test: add recv_buf()/send_buf() utility functions and some improvements

We recently found that some tests were failing [1].

The problem was that we were not waiting for all the bytes correctly,
so we had a partial read. I had initially suggested using MSG_WAITALL,
but this could have timeout problems.

Since we already had send_byte() and recv_byte() that handled the timeout,
but also the expected return value, I moved that code to two new functions
that we can now use to send/receive generic buffers.

The last commit is just an improvement to a test I found difficult to
understand while using the new functions.

@Arseniy a review and some testing are really appreciated :-)

[1] https://lore.kernel.org/netdev/63xflnwiohdfo6m3vnrrxgv2ulplencpwug5qqacugqh7xxpu3@tsczkuqgwurb/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ebdada9d

vsock/test: track bytes in sk_buff merging test for SOCK_SEQPACKET · bc7bea45

Stefano Garzarella authored Sep 15, 2023

The test was a bit complicated to read.
Added variables to keep track of the bytes read and to be read
in each step. Also some comments.

The test is unchanged.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc7bea45

vsock/test: use send_buf() in vsock_test.c · 2a8548a9

Stefano Garzarella authored Sep 15, 2023

We have a very common pattern used in vsock_test that we can
now replace with the new send_buf().

This allows us to reuse the code we already had to check the
actual return value and wait for all the bytes to be sent with
an appropriate timeout.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a8548a9

vsock/test: add send_buf() utility function · 12329bd5

Stefano Garzarella authored Sep 15, 2023

Move the code of send_byte() out in a new utility function that
can be used to send a generic buffer.

This new function can be used when we need to send a custom
buffer and not just a single 'A' byte.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12329bd5

vsock/test: use recv_buf() in vsock_test.c · a0bcb835

Stefano Garzarella authored Sep 15, 2023

We have a very common pattern used in vsock_test that we can
now replace with the new recv_buf().

This allows us to reuse the code we already had to check the
actual return value and wait for all bytes to be received with
an appropriate timeout.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0bcb835

vsock/test: add recv_buf() utility function · a8ed71a2

Stefano Garzarella authored Sep 15, 2023

Move the code of recv_byte() out in a new utility function that
can be used to receive a generic buffer.

This new function can be used when we need to receive a custom
buffer and not just a single 'A' byte.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8ed71a2

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 685c6d5b

David S. Miller authored Sep 17, 2023

Alexei Starovoitov says:

====================
The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 79 files changed, 5275 insertions(+), 600 deletions(-).

The main changes are:

1) Basic BTF validation in libbpf, from Andrii Nakryiko.

2) bpf_assert(), bpf_throw(), exceptions in bpf progs, from Kumar Kartikeya Dwivedi.

3) next_thread cleanups, from Oleg Nesterov.

4) Add mcpu=v4 support to arm32, from Puranjay Mohan.

5) Add support for __percpu pointers in bpf progs, from Yonghong Song.

6) Fix bpf tailcall interaction with bpf trampoline, from Leon Hwang.

7) Raise irq_work in bpf_mem_alloc while irqs are disabled to improve refill probabablity, from Hou Tao.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot!

Also thanks to reporters, reviewers and testers of commits in this pull-request:

Alan Maguire, Andrey Konovalov, Dave Marchevsky, "Eric W. Biederman",
Jiri Olsa, Maciej Fijalkowski, Quentin Monnet, Russell King (Oracle),
Song Liu, Stanislav Fomichev, Yonghong Song
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

685c6d5b

Merge branch 'phy-stopping-race' · fbb49deb

David S. Miller authored Sep 17, 2023

Russell King says:

====================
net: phy: avoid race when erroring stopping PHY

This series addresses a problem reported by Jijie Shao where the PHY
state machine can race with phy_stop() leading to an incorrect state.

The issue centres around phy_state_machine() dropping the phydev->lock
mutex briefly, which allows phy_stop() to get in half-way through the
state machine, and when the state machine resumes, it overwrites
phydev->state with a value incompatible with a stopped PHY. This causes
a subsequent phy_start() to issue a warning.

We address this firstly by using versions of functions that do not take
tne lock, moving them into the locked region. The only function that
this can't be done with is phy_suspend() which needs to call into the
driver without taking the lock.

For phy_suspend(), we split the state machine into two parts - the
initial part which runs under the phydev->lock, and the second part
which runs without the lock.

We finish off by using the split state machine in phy_stop() which
removes another unnecessary unlock-lock sequence from phylib.

Changes from RFC:
- Added Jijie Shao's tested-by
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

fbb49deb

net: phy: convert phy_stop() to use split state machine · adcbb855

Russell King (Oracle) authored Sep 14, 2023

Convert phy_stop() to use the new locked-section and unlocked-section
parts of the PHY state machine.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adcbb855

net: phy: split locked and unlocked section of phy_state_machine() · 8635c066

Russell King (Oracle) authored Sep 14, 2023

Split out the locked and unlocked sections of phy_state_machine() into
two separate functions which can be called inside the phydev lock and
outside the phydev lock as appropriate, thus allowing us to combine
the locked regions in the caller of phy_state_machine() with the
locked region inside phy_state_machine().

This avoids unnecessarily dropping the phydev lock which may allow
races to occur.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8635c066

net: phy: move phy_state_machine() · c398ef41

Russell King (Oracle) authored Sep 14, 2023

Move phy_state_machine() before phy_stop() to avoid subsequent patches
introducing forward references.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c398ef41

net: phy: move phy_suspend() to end of phy_state_machine() · 6e19b350

Russell King (Oracle) authored Sep 14, 2023

Move the call to phy_suspend() to the end of phy_state_machine() after
we release the lock so that we can combine the locked areas.
phy_suspend() can not be called while holding phydev->lock as it has
caused deadlocks in the past.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e19b350

net: phy: move call to start aneg · ea5968cd

Russell King (Oracle) authored Sep 14, 2023

Move the call to start auto-negotiation inside the lock in the PHYLIB
state machine, calling the locked variant _phy_start_aneg(). This
avoids unnecessarily releasing and re-acquiring the lock.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea5968cd

net: phy: call phy_error_precise() while holding the lock · ef113a60

Russell King (Oracle) authored Sep 14, 2023

Move the locking out of phy_error_precise() and to its only call site,
merging with the locked region that has already been taken.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef113a60

net: phy: always call phy_process_state_change() under lock · 8da77df6

Russell King (Oracle) authored Sep 14, 2023

phy_stop() calls phy_process_state_change() while holding the phydev
lock, so also arrange for phy_state_machine() to do the same, so that
this function is called with consistent locking.
Tested-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8da77df6

net: dsa: microchip: Add partial ACL support for ksz9477 switches · 002841be

Oleksij Rempel authored Sep 14, 2023

This patch adds partial Access Control List (ACL) support for the
ksz9477 family of switches. ACLs enable filtering of incoming layer 2
MAC, layer 3 IP, and layer 4 TCP/UDP packets on each port. They provide
additional capabilities for filtering routed network protocols and can
take precedence over other forwarding functions.

ACLs can filter ingress traffic based on header fields such as
source/destination MAC address, EtherType, IPv4 address, IPv4 protocol,
UDP/TCP ports, and TCP flags. The ACL is an ordered list of up to 16
access control rules programmed into the ACL Table. Each entry specifies
a set of matching conditions and action rules for controlling packet
forwarding and priority.

The ACL also implements a count function, generating an interrupt
instead of a forwarding action. It can be used as a watchdog timer or an
event counter. The ACL consists of three parts: matching rules, action
rules, and processing entries. Multiple match conditions can be either
AND'ed or OR'ed together.

This patch introduces support for a subset of the available ACL
functionality, specifically layer 2 matching and prioritization of
matched packets. For example:

tc qdisc add dev lan2 clsact
tc filter add dev lan2 ingress protocol 0x88f7 flower action skbedit prio 7

tc qdisc add dev lan1 clsact
tc filter add dev lan1 ingress protocol 0x88f7 flower action skbedit prio 7

The hardware offloading implementation was benchmarked against a
configuration without hardware offloading. This latter setup relied on a
software-based Linux bridge. No noticeable differences were observed
between the two configurations. Here is an example of software-based
test:

ip l s dev enu1u1 up
ip l s dev enu1u2 up
ip l s dev enu1u4 up
ethtool -A enu1u1 autoneg off rx off tx off
ethtool -A enu1u2 autoneg off rx off tx off
ethtool -A enu1u4 autoneg off rx off tx off
ip l a name br0 type bridge
ip l s dev br0 up
ip l s enu1u1 master br0
ip l s enu1u2 master br0
ip l s enu1u4 master br0

tc qdisc add dev enu1u1 root handle 1:  ets strict 4 priomap 3 3 2 2 1 1 0 0
tc qdisc add dev enu1u4 root handle 1:  ets strict 4 priomap 3 3 2 2 1 1 0 0
tc qdisc add dev enu1u2 root handle 1:  ets strict 4 priomap 3 3 2 2 1 1 0 0

tc qdisc add dev enu1u1 clsact
tc filter add dev enu1u1 ingress protocol ipv4  flower action skbedit prio 7

tc qdisc add dev enu1u4 clsact
tc filter add dev enu1u4 ingress protocol ipv4  flower action skbedit prio 0

On a system attached to the port enu1u2 I run two iperf3 server
instances:
iperf3 -s -p 5210 &
iperf3 -s -p 5211 &

On systems attached to enu1u4 and enu1u1 I run:
iperf3 -u -c  172.17.0.1 -p 5210 -b100M  -l1472 -t100
and
iperf3 -u -c  172.17.0.1 -p 5211 -b100M  -l1472 -t100

As a result, IP traffic on port enu1u1 will be prioritized and take
precedence over IP traffic on port enu1u4
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

002841be

net: dsa: microchip: Move *_port_setup code to dsa_switch_ops::port_setup() · 15299227

Oleksij Rempel authored Sep 14, 2023

Right now, the *_port_setup code is in dsa_switch_ops::port_enable(),
which is not the best place for it. This patch moves it to a more
suitable place, dsa_switch_ops::port_setup(), to match the function's
purpose and name.

This patch is a preparation for coming ACL support patch.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15299227

Merge branch 'devlink-instances-relationships' · e03f0dfb

David S. Miller authored Sep 17, 2023

Jiri Pirko says:

====================
expose devlink instances relationships

From: Jiri Pirko <jiri@nvidia.com>

Currently, the user can instantiate new SF using "devlink port add"
command. That creates an E-switch representor devlink port.

When user activates this SF, there is an auxiliary device created and
probed for it which leads to SF devlink instance creation.

There is 1:1 relationship between E-switch representor devlink port and
the SF auxiliary device devlink instance.

Also, for example in mlx5, one devlink instance is created for
PCI device and one is created for an auxiliary device that represents
the uplink port. The relation between these is invisible to the user.

Patches #1-#3 and #5 are small preparations.

Patch #4 adds netnsid attribute for nested devlink if that in a
different namespace.

Patch #5 is the main one in this set, introduces the relationship
tracking infrastructure later on used to track SFs, linecards and
devlink instance relationships with nested devlink instances.

Expose the relation to the user by introducing new netlink attribute
DEVLINK_PORT_FN_ATTR_DEVLINK which contains the devlink instance related
to devlink port function. This is done by patch #8.
Patch #9 implements this in mlx5 driver.

Patch #10 converts the linecard nested devlink handling to the newly
introduced rel infrastructure.

Patch #11 benefits from the rel infra and introduces possiblitily to
have relation between devlink instances.
Patch #12 implements this in mlx5 driver.

Examples:
$ devlink dev
pci/0000:08:00.0: nested_devlink auxiliary/mlx5_core.eth.0
pci/0000:08:00.1: nested_devlink auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.0

$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 106
pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
$ devlink port function set pci/0000:08:00.0/32768 state active
$ devlink port show pci/0000:08:00.0/32768
pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state active opstate attached roce enable nested_devlink auxiliary/mlx5_core.sf.2

$ devlink port show pci/0000:08:00.0/32768
pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state active opstate attached roce enable nested_devlink auxiliary/mlx5_core.sf.2 nested_devlink_netns ns1
====================
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>