Commits · 4ec7329517027db28c5683675ab3b3842ad60324 · Kirill Smelkov / linux

03 Jun, 2023 12 commits

net: phylib: fix phy_read*_poll_timeout() · 4ec73295

Russell King (Oracle) authored Jun 01, 2023

Dan Carpenter reported a signedness bug in genphy_loopback(). Andrew
reports that:

"It is common to get this wrong in general with PHY drivers. Dan
regularly posts fixes like this soon after a PHY driver patch it
merged. I really wish we could somehow get the compiler to warn when
the result from phy_read() is stored into a unsigned type. It would
save Dan a lot of work."

Let's make phy_read*_poll_timeout() immune to further issues when "val"
is an unsigned type by storing the read function's result in a signed
int as well as "val", and using the signed variable both to check for
an error and for propagating that error to the caller.

The advantage of this method is we don't change where the cast from
the signed return code to the user's variable occurs - so users will
see no change.

Previously Heiner changed phy_read_poll_timeout() to check for an error
before evaluating the user supplied condition, but didn't update
phy_read_mmd_poll_timeout(). Make that change there too.

Link: https://lore.kernel.org/r/d7bb312e-2428-45f6-b9b3-59ba544e8b94@kili.mountainSigned-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1q4kX6-00BNuM-Mx@rmk-PC.armlinux.org.ukSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4ec73295

Merge branch 'tools-ynl-gen-dust-off-the-user-space-code' · 7fa217d4

Jakub Kicinski authored Jun 02, 2023

Jakub Kicinski says:

====================
tools: ynl-gen: dust off the user space code

Every now and then I wish I finished the user space part of
the netlink specs, Python scripts kind of stole the show but
C is useful for selftests and stuff which needs to be fast.
Recently someone asked me how to access devlink and ethtool
from C++ which pushed me over the edge.

Fix things which bit rotted and finish notification handling.
This series contains code gen changes only. I'll follow up
with the fixed component, samples and docs as soon as it's
merged.
====================

Link: https://lore.kernel.org/r/20230602023548.463441-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7fa217d4

tools: ynl-gen: generate static descriptions of notifications · 59d814f0

Jakub Kicinski authored Jun 01, 2023

Notifications may come in at any time. The family must be always
ready to parse a random incoming notification. Generate notification
table for parsing and tell YNL which request we're processing
to distinguish responses from notifications.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

59d814f0

tools: ynl-gen: switch to family struct · 8cb6afb3

Jakub Kicinski authored Jun 01, 2023

We'll want to store static info about the family soon.
Generate a struct. This changes creation from, e.g.:

	 ys = ynl_sock_create("netdev", &yerr);
to:
	 ys = ynl_sock_create(&ynl_netdev_family, &yerr);

on user's side.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8cb6afb3

tools: ynl-gen: generate alloc and free helpers for req · 5d58f911

Jakub Kicinski authored Jun 01, 2023

We expect user to allocate requests with calloc(),
make things a bit more consistent and provide helpers.
Generate free calls, too.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5d58f911

tools: ynl-gen: move the response reading logic into YNL · dc0956c9

Jakub Kicinski authored Jun 01, 2023

We generate send() and recv() calls and all msg handling for
each operation. It's a lot of repeated code and will only grow
with notification handling. Call back to a helper YNL lib instead.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dc0956c9

tools: ynl-gen: generate enum-to-string helpers · 21b6e302

Jakub Kicinski authored Jun 01, 2023

It's sometimes useful to print the name of an enum value,
flag or name of the op. Python can do it, add C helper
code gen for getting names of things.

Example:

  static const char * const netdev_xdp_act_strmap[] = {
	[0] = "basic",
	[1] = "redirect",
	[2] = "ndo-xmit",
	[3] = "xsk-zerocopy",
	[4] = "hw-offload",
	[5] = "rx-sg",
	[6] = "ndo-xmit-sg",
  };

  const char *netdev_xdp_act_str(enum netdev_xdp_act value)
  {
	value = ffs(value) - 1;
	if (value < 0 || value >= (int)MNL_ARRAY_SIZE(netdev_xdp_act_strmap))
		return NULL;
	return netdev_xdp_act_strmap[value];
  }
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

21b6e302

tools: ynl-gen: add error checking for nested structs · eef9b794

Jakub Kicinski authored Jun 01, 2023

Parsing nested types may return an error, propagate it.
Not marking as a fix, because nothing uses YNL upstream.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eef9b794

tools: ynl-gen: loosen type consistency check for events · 5605f102

Jakub Kicinski authored Jun 01, 2023

Both event and notify types are always consistent. Rewrite
the condition checking if we can reuse reply types to be
less picky and let notify thru.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5605f102

tools: ynl-gen: don't override pure nested struct · 67c65ce7

Jakub Kicinski authored Jun 01, 2023

For pure structs (parsed nested attributes) we track what
forms of the struct exist in request and reply directions.
Make sure we don't overwrite the recorded struct each time,
otherwise the information is lost.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

67c65ce7

tools: ynl-gen: fix unused / pad attribute handling · 6ad49839

Jakub Kicinski authored Jun 01, 2023

Unused and Pad attributes don't carry information.
Unused should never exist, and be rejected.
Pad should be silently skipped.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6ad49839

tools: ynl-gen: add extra headers for user space · 91dfaef2

Jakub Kicinski authored Jun 01, 2023

Make sure all relevant headers are included, we allocate memory,
use memcpy() and Linux types without including the headers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

91dfaef2

02 Jun, 2023 11 commits

ipv4: Drop tos parameter from flowi4_update_output() · 3f06760c

Guillaume Nault authored Jun 01, 2023

Callers of flowi4_update_output() never try to update ->flowi4_tos:

  * ip_route_connect() updates ->flowi4_tos with its own current
    value.

  * ip_route_newports() has two users: tcp_v4_connect() and
    dccp_v4_connect. Both initialise fl4 with ip_route_connect(), which
    in turn sets ->flowi4_tos with RT_TOS(inet_sk(sk)->tos) and
    ->flowi4_scope based on SOCK_LOCALROUTE.

    Then ip_route_newports() updates ->flowi4_tos with
    RT_CONN_FLAGS(sk), which is the same as RT_TOS(inet_sk(sk)->tos),
    unless SOCK_LOCALROUTE is set on the socket. In that case, the
    lowest order bit is set to 1, to eventually inform
    ip_route_output_key_hash() to restrict the scope to RT_SCOPE_LINK.
    This is equivalent to properly setting ->flowi4_scope as
    ip_route_connect() did.

  * ip_vs_xmit.c initialises ->flowi4_tos with memset(0), then calls
    flowi4_update_output() with tos=0.

  * sctp_v4_get_dst() uses the same RT_CONN_FLAGS_TOS() when
    initialising ->flowi4_tos and when calling flowi4_update_output().

In the end, ->flowi4_tos never changes. So let's just drop the tos
parameter. This will simplify the conversion of ->flowi4_tos from __u8
to dscp_t.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f06760c

net: lan743x: Remove extranous gotos · 0f0f5868

Moritz Fischer authored Jun 02, 2023

The gotos for cleanup aren't required, the function
might as well just return the actual error code.
Signed-off-by: Moritz Fischer <moritzf@google.com>
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f0f5868

ip_gre: clean up some inconsistent indenting · a92fb5c0

Jiapeng Chong authored Jun 02, 2023

No functional modification involved.

net/ipv4/ip_gre.c:192 ipgre_err() warn: inconsistent indenting.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5375Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a92fb5c0

selftests: tls: add tests for poll behavior · 23fcb62b

Jakub Kicinski authored May 31, 2023

Make sure we don't generate premature POLLIN events.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

23fcb62b

tls: suppress wakeups unless we have a full record · 121dca78

Jakub Kicinski authored May 31, 2023

TLS does not override .poll() so TLS-enabled socket will generate
an event whenever data arrives at the TCP socket. This leads to
unnecessary wakeups on slow connections.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

121dca78

selftests/tc-testing: replace mq with invalid parent ID · a395b8d1

Zhengchao Shao authored Jun 01, 2023

The test case shown in [1] triggers the kernel to access the null pointer.
Therefore, add related test cases to mq.
The test results are as follows:

./tdc.py -e 0531
1..1
ok 1 0531 - Replace mq with invalid parent ID

./tdc.py -c mq
1..8
ok 1 ce7d - Add mq Qdisc to multi-queue device (4 queues)
ok 2 2f82 - Add mq Qdisc to multi-queue device (256 queues)
ok 3 c525 - Add duplicate mq Qdisc
ok 4 128a - Delete nonexistent mq Qdisc
ok 5 03a9 - Delete mq Qdisc twice
ok 6 be0f - Add mq Qdisc to single-queue device
ok 7 1023 - Show mq class
ok 8 0531 - Replace mq with invalid parent ID

[1] https://lore.kernel.org/all/20230527093747.3583502-1-shaozhengchao@huawei.com/Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230601012250.52738-1-shaozhengchao@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a395b8d1

net: phy: broadcom: Add LPI counter · e8b6f79b

Florian Fainelli authored May 31, 2023

Add the ability to read the PHY maintained LPI counter which is in the
Clause 45 vendor space, device address 7, offset 0x803F. The counter is
cleared on read.
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230531231729.1873932-1-florian.fainelli@broadcom.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e8b6f79b

r8169: use dev_err_probe in all appropriate places in rtl_init_one() · 733b3e27

Heiner Kallweit authored May 31, 2023

In addition to properly handling probe deferrals dev_err_probe()
conveniently combines printing an error message with returning
the errno. So let's use it for every error path in rtl_init_one()
to simplify the code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/f0596a19-d517-e301-b649-304f9247b75a@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

733b3e27

Merge branch 'extend-dt-bindings-for-pse-pd-controllers-and-update-prtt1c-dts' · b345b2a6

Jakub Kicinski authored Jun 01, 2023

Oleksij Rempel says:

====================
Extend dt-bindings for PSE-PD controllers and update prtt1c dts

This patch set comes in response to issues identified while adding PoDL
PSE support to the stm32 prtt1c device tree. The existing pse-pd device
tree bindings did not allow node name patterns like "ethernet-pse-0" and
"ethernet-pse-1", leading to validation failures.

To address these false positives in validation, the device tree bindings
are extended to support these node name patterns. Alongside this, an
example node is added to aid in the improved validation process.
====================

Link: https://lore.kernel.org/r/20230531102113.3353065-1-o.rempel@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b345b2a6

dt-bindings: net: pse-pd: Allow -N suffix for ethernet-pse node names · bd415f6c

Oleksij Rempel authored May 31, 2023

Extend the pattern matching for PSE-PD controller nodes to allow -N
suffixes. This enables the use of multiple "ethernet-pse" nodes without the
need for a "reg" property.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bd415f6c

devlink: bring port new reply back · 5ff9424e

Jiri Pirko authored May 31, 2023

In the offending fixes commit I mistakenly removed the reply message of
the port new command. I was under impression it is a new port
notification, partly due to the "notify" in the name of the helper
function. Bring the code sending reply with new port message back, this
time putting it directly to devlink_nl_cmd_port_new_doit()

Fixes: c496daeb ("devlink: remove duplicate port notification")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531142025.2605001-1-jiri@resnulli.usSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5ff9424e

01 Jun, 2023 17 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · a03a91bd

Jakub Kicinski authored Jun 01, 2023

Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/sfc/tc.c
  622ab656 ("sfc: fix error unwinds in TC offload")
  b6583d5e ("sfc: support TC decap rules matching on enc_src_port")

net/mptcp/protocol.c
  5b825727 ("mptcp: add annotations around msk->subflow accesses")
  e76c8ef5 ("mptcp: refactor mptcp_stream_accept()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a03a91bd

Merge tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 714069da

Linus Torvalds authored Jun 01, 2023

Pull networking fixes from Jakub Kicinski:
 "Happy Wear a Dress Day.

  Fairly standard-sized batch of fixes, accounting for the lack of
  sub-tree submissions this week. The mlx5 IRQ fixes are notable, people
  were complaining about that. No fires burning.

  Current release - regressions:

   - eth: mlx5e:
      - multiple fixes for dynamic IRQ allocation
      - prevent encap offload when neigh update is running

   - eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters

  Current release - new code bugs:

   - eth: mlx5e: DR, add missing mutex init/destroy in pattern manager

  Previous releases - always broken:

   - tcp: deny tcp_disconnect() when threads are waiting

   - sched: prevent ingress Qdiscs from getting installed in random
     locations in the hierarchy and moving around

   - sched: flower: fix possible OOB write in fl_set_geneve_opt()

   - netlink: fix NETLINK_LIST_MEMBERSHIPS length report

   - udp6: fix race condition in udp6_sendmsg & connect

   - tcp: fix mishandling when the sack compression is deferred

   - rtnetlink: validate link attributes set at creation time

   - mptcp: fix connect timeout handling

   - eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked

   - eth: amd-xgbe: fix the false linkup in xgbe_phy_status

   - eth: mlx5e:
      - fix corner cases in internal buffer configuration
      - drain health before unregistering devlink

   - usb: qmi_wwan: set DTR quirk for BroadMobi BM818

  Misc:

   - tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if
     user_mss set"

* tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
  mptcp: fix active subflow finalization
  mptcp: add annotations around sk->sk_shutdown accesses
  mptcp: fix data race around msk->first access
  mptcp: consolidate passive msk socket initialization
  mptcp: add annotations around msk->subflow accesses
  mptcp: fix connect timeout handling
  rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
  rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
  rtnetlink: call validate_linkmsg in rtnl_create_link
  ice: recycle/free all of the fragments from multi-buffer frame
  net: phy: mxl-gpy: extend interrupt fix to all impacted variants
  net: renesas: rswitch: Fix return value in error path of xmit
  net: dsa: mv88e6xxx: Increase wait after reset deactivation
  net: ipa: Use correct value for IPA_STATUS_SIZE
  tcp: fix mishandling when the sack compression is deferred.
  net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
  sfc: fix error unwinds in TC offload
  net/mlx5: Read embedded cpu after init bit cleared
  net/mlx5e: Fix error handling in mlx5e_refresh_tirs
  net/mlx5: Ensure af_desc.mask is properly initialized
  ...

714069da

fork, vhost: Use CLONE_THREAD to fix freezer/ps regression · f9010dbd

Mike Christie authored Jun 01, 2023

When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process.  2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.

To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.

This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.

Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.

Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c.  Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn.  vhost_worker now returns true if work was done.

The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit.  This collection clears
__fatal_signal_pending.  This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.

For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file.  To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec.  The
coredump code and de_thread have been modified to ignore vhost threads.

Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.

Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.

Fixes: 6e890c5d ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f9010dbd

Merge tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · a451b8eb

Jakub Kicinski authored Jun 01, 2023

Saeed Mahameed says:

====================
mlx5 fixes 2023-05-31

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Read embedded cpu after init bit cleared
  net/mlx5e: Fix error handling in mlx5e_refresh_tirs
  net/mlx5: Ensure af_desc.mask is properly initialized
  net/mlx5: Fix setting of irq->map.index for static IRQ case
  net/mlx5: Remove rmap also in case dynamic MSIX not supported
====================

Link: https://lore.kernel.org/r/20230601031051.131529-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

a451b8eb

Merge branch 'mptcp-fixes-for-connect-timeout-access-annotations-and-subflow-init' · 66dd1014

Jakub Kicinski authored Jun 01, 2023

Mat Martineau says:

====================
mptcp: Fixes for connect timeout, access annotations, and subflow init

Patch 1 allows the SO_SNDTIMEO sockopt to correctly change the connect
timeout on MPTCP sockets.

Patches 2-5 add READ_ONCE()/WRITE_ONCE() annotations to fix KCSAN issues.

Patch 6 correctly initializes some subflow fields on outgoing connections.
====================

Link: https://lore.kernel.org/r/20230531-send-net-20230531-v1-0-47750c420571@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

66dd1014

mptcp: fix active subflow finalization · 55b47ca7

Paolo Abeni authored May 31, 2023

Active subflow are inserted into the connection list at creation time.
When the MPJ handshake completes successfully, a new subflow creation
netlink event is generated correctly, but the current code wrongly
avoid initializing a couple of subflow data.

The above will cause misbehavior on a few exceptional events: unneeded
mptcp-level retransmission on msk-level sequence wrap-around and infinite
mapping fallback even when a MPJ socket is present.

Address the issue factoring out the needed initialization in a new helper
and invoking the latter from __mptcp_finish_join() time for passive
subflow and from mptcp_finish_join() for active ones.

Fixes: 0530020a ("mptcp: track and update contiguous data status")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

55b47ca7

mptcp: add annotations around sk->sk_shutdown accesses · 6b9831bf

Paolo Abeni authored May 31, 2023

Christoph reported the mptcp variant of a recently addressed plain
TCP issue. Similar to commit e14cadfd ("tcp: add annotations around
sk->sk_shutdown accesses") add READ/WRITE ONCE annotations to silence
KCSAN reports around lockless sk_shutdown access.

Fixes: 71ba088c ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/401Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6b9831bf

mptcp: fix data race around msk->first access · 1b1b43ee

Paolo Abeni authored May 31, 2023

The first subflow socket is accessed outside the msk socket lock
by mptcp_subflow_fail(), we need to annotate each write access
with WRITE_ONCE, but a few spots still lacks it.

Fixes: 76a13b31 ("mptcp: invoke MP_FAIL response when needed")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1b1b43ee

mptcp: consolidate passive msk socket initialization · 7e8b88ec

Paolo Abeni authored May 31, 2023

When the msk socket is cloned at MPC handshake time, a few
fields are initialized in a racy way outside mptcp_sk_clone()
and the msk socket lock.

The above is due historical reasons: before commit a88d0092
("mptcp: simplify subflow_syn_recv_sock()") as the first subflow socket
carrying all the needed date was not available yet at msk creation
time

We can now refactor the code moving the missing initialization bit
under the socket lock, removing the init race and avoiding some
code duplication.

This will also simplify the next patch, as all msk->first write
access are now under the msk socket lock.

Fixes: 0397c6d8 ("mptcp: keep unaccepted MPC subflow into join list")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7e8b88ec

mptcp: add annotations around msk->subflow accesses · 5b825727

Paolo Abeni authored May 31, 2023

The MPTCP can access the first subflow socket in a few spots
outside the socket lock scope. That is actually safe, as MPTCP
will delete the socket itself only after the msk sock close().

Still the such accesses causes a few KCSAN splats, as reported
by Christoph. Silence the harmless warning adding a few annotation
around the relevant accesses.

Fixes: 71ba088c ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/402Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5b825727

mptcp: fix connect timeout handling · 786fc124

Paolo Abeni authored May 31, 2023

Ondrej reported a functional issue WRT timeout handling on connect
with a nice reproducer.

The problem is that the current mptcp connect waits for both the
MPTCP socket level timeout, and the first subflow socket timeout.
The latter is not influenced/touched by the exposed setsockopt().

Overall the above makes the SO_SNDTIMEO a no-op on connect.

Since mptcp_connect is invoked via inet_stream_connect and the
latter properly handle the MPTCP level timeout, we can address the
issue making the nested subflow level connect always unblocking.

This also allow simplifying a bit the code, dropping an ugly hack
to handle the fastopen and custom proto_ops connect.

The issues predates the blamed commit below, but the current resolution
requires the infrastructure introduced there.

Fixes: 54f1944e ("mptcp: factor out mptcp_connect()")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

786fc124

Merge branch 'rtnetlink-a-couple-of-fixes-in-linkmsg-validation' · 3021dbfe

Jakub Kicinski authored Jun 01, 2023

Xin Long says:

====================
rtnetlink: a couple of fixes in linkmsg validation

validate_linkmsg() was introduced to do linkmsg validation for existing
links. However, the new created links also need this linkmsg validation.

Add validate_linkmsg() check for link creating in Patch 1, and add more
tb checks into validate_linkmsg() in Patch 2 and 3.
====================

Link: https://lore.kernel.org/r/cover.1685548598.git.lucien.xin@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3021dbfe

rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg · 65d6914e

Xin Long authored May 31, 2023

This fixes the issue that dev gro_max_size and gso_ipv4_max_size
can be set to a huge value:

  # ip link add dummy1 type dummy
  # ip link set dummy1 gro_max_size 4294967295
  # ip -d link show dummy1
    dummy addrgenmode eui64 ... gro_max_size 4294967295

Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536")
Fixes: 9eefedd5 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

65d6914e

rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg · fef5b228

Xin Long authored May 31, 2023

These IFLA_GSO_* tb check should also be done for the new created link,
otherwise, they can be set to a huge value when creating links:

  # ip link add dummy1 gso_max_size 4294967295 type dummy
  # ip -d link show dummy1
    dummy addrgenmode eui64 ... gso_max_size 4294967295

Fixes: 46e6b992 ("rtnetlink: allow GSO maximums to be set on device creation")
Fixes: 9eefedd5 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fef5b228

rtnetlink: call validate_linkmsg in rtnl_create_link · b0ad3c17

Xin Long authored May 31, 2023

validate_linkmsg() was introduced by commit 1840bb13 ("[RTNL]:
Validate hardware and broadcast address attribute for RTM_NEWLINK")
to validate tb[IFLA_ADDRESS/BROADCAST] for existing links. The same
check should also be done for newly created links.

This patch adds validate_linkmsg() call in rtnl_create_link(), to
avoid the invalid address set when creating some devices like:

  # ip link add dummy0 type dummy
  # ip link add link dummy0 name mac0 address 01:02 type macsec

Fixes: 0e06877c ("[RTNETLINK]: rtnl_link: allow specifying initial device address")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b0ad3c17

ice: recycle/free all of the fragments from multi-buffer frame · abaf8d51

Maciej Fijalkowski authored May 31, 2023

The ice driver caches next_to_clean value at the beginning of
ice_clean_rx_irq() in order to remember the first buffer that has to be
freed/recycled after main Rx processing loop. The end boundary is
indicated by first descriptor of frame that Rx processing loop has ended
its duties. Note that if mentioned loop ended in the middle of gathering
multi-buffer frame, next_to_clean would be pointing to the descriptor in
the middle of the frame BUT freeing/recycling stage will stop at the
first descriptor. This means that next iteration of ice_clean_rx_irq()
will miss the (first_desc, next_to_clean - 1) entries.

 When running various 9K MTU workloads, such splats were observed:

[  540.780716] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  540.787787] #PF: supervisor read access in kernel mode
[  540.793002] #PF: error_code(0x0000) - not-present page
[  540.798218] PGD 0 P4D 0
[  540.800801] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  540.805231] CPU: 18 PID: 3984 Comm: xskxceiver Tainted: G        W          6.3.0-rc7+ #96
[  540.813619] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[  540.824209] RIP: 0010:ice_clean_rx_irq+0x2b6/0xf00 [ice]
[  540.829678] Code: 74 24 10 e9 aa 00 00 00 8b 55 78 41 31 57 10 41 09 c4 4d 85 ff 0f 84 83 00 00 00 49 8b 57 08 41 8b 4f 1c 65 8b 35 1a fa 4b 3f <48> 8b 02 48 c1 e8 3a 39 c6 0f 85 a2 00 00 00 f6 42 08 02 0f 85 98
[  540.848717] RSP: 0018:ffffc9000f42fc50 EFLAGS: 00010282
[  540.854029] RAX: 0000000000000004 RBX: 0000000000000002 RCX: 000000000000fffe
[  540.861272] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[  540.868519] RBP: ffff88984a05ac00 R08: 0000000000000000 R09: dead000000000100
[  540.875760] R10: ffff88983fffcd00 R11: 000000000010f2b8 R12: 0000000000000004
[  540.883008] R13: 0000000000000003 R14: 0000000000000800 R15: ffff889847a10040
[  540.890253] FS:  00007f6ddf7fe640(0000) GS:ffff88afdf800000(0000) knlGS:0000000000000000
[  540.898465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  540.904299] CR2: 0000000000000000 CR3: 000000010d3da001 CR4: 00000000007706e0
[  540.911542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  540.918789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  540.926032] PKRU: 55555554
[  540.928790] Call Trace:
[  540.931276]  <TASK>
[  540.933418]  ice_napi_poll+0x4ca/0x6d0 [ice]
[  540.937804]  ? __pfx_ice_napi_poll+0x10/0x10 [ice]
[  540.942716]  napi_busy_loop+0xd7/0x320
[  540.946537]  xsk_recvmsg+0x143/0x170
[  540.950178]  sock_recvmsg+0x99/0xa0
[  540.953729]  __sys_recvfrom+0xa8/0x120
[  540.957543]  ? do_futex+0xbd/0x1d0
[  540.961008]  ? __x64_sys_futex+0x73/0x1d0
[  540.965083]  __x64_sys_recvfrom+0x20/0x30
[  540.969155]  do_syscall_64+0x38/0x90
[  540.972796]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  540.977934] RIP: 0033:0x7f6de5f27934

To fix this, set cached_ntc to first_desc so that at the end, when
freeing/recycling buffers, descriptors from first to ntc are not missed.

Fixes: 2fba7dc5 ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230531154457.3216621-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

abaf8d51

net: phy: mxl-gpy: extend interrupt fix to all impacted variants · 519d6487

Xu Liang authored May 31, 2023

The interrupt fix in commit 97a89ed1 should be applied on all variants
of GPY2xx PHY and GPY115C.

Fixes: 97a89ed1 ("net: phy: mxl-gpy: disable interrupts on GPY215 by default")
Signed-off-by: Xu Liang <lxu@maxlinear.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531074822.39136-1-lxu@maxlinear.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

519d6487