Commits · 4be84df38a6f49b81e5909ede78242ba1538c1e6 · Kirill Smelkov / linux

20 Dec, 2022 7 commits

Merge tag 'linux-can-fixes-for-6.2-20221219' of... · 4be84df3

Jakub Kicinski authored Dec 19, 2022

Merge tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-12-19

The first patch is by Vincent Mailhol and adds the etas_es58x
devlink documentation to the index.

Haibo Chen's patch for the flexcan driver fixes a unbalanced
pm_runtime_enable warning.

The last patch is by me, targets the kvaser_usb driver and fixes
an error occurring with gcc-13.

* tag 'linux-can-fixes-for-6.2-20221219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
  can: flexcan: avoid unbalanced pm_runtime_enable warning
  Documentation: devlink: add missing toc entry for etas_es58x devlink doc
====================

Link: https://lore.kernel.org/r/20221219155210.1143439-1-mkl@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4be84df3

Merge branch 'stop-corrupting-socket-s-task_frag' · 918fb1aa

Jakub Kicinski authored Dec 19, 2022

Benjamin Coddington says:

====================
Stop corrupting socket's task_frag

The networking code uses flags in sk_allocation to determine if it can use
current->task_frag, however in-kernel users of sockets may stop setting
sk_allocation when they convert to the preferred memalloc_nofs_save/restore,
as SUNRPC has done in commit a1231fda ("SUNRPC: Set memalloc_nofs_save()
on all rpciod/xprtiod jobs").

This will cause corruption in current->task_frag when recursing into the
network layer for those subsystems during page fault or reclaim.  The
corruption is difficult to diagnose because stack traces may not contain the
offending subsystem at all.  The corruption is unlikely to show up in
testing because it requires memory pressure, and so subsystems that
convert to memalloc_nofs_save/restore are likely to continue to run into
this issue.

Previous reports and proposed fixes:
https://lore.kernel.org/netdev/96a18bd00cbc6cb554603cc0d6ef1c551965b078.1663762494.git.gnault@redhat.com/
https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
https://lore.kernel.org/linux-nfs/de6d99321d1dcaa2ad456b92b3680aa77c07a747.1665401788.git.gnault@redhat.com/

Guilluame Nault has done all of the hard work tracking this problem down and
finding the best fix for this issue.  I'm just taking a turn posting another
fix.
====================

Link: https://lore.kernel.org/r/cover.1671194454.git.bcodding@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

918fb1aa

net: simplify sk_page_frag · 08f65892

Benjamin Coddington authored Dec 16, 2022

Now that in-kernel socket users that may recurse during reclaim have benn
converted to sk_use_task_frag = false, we can have sk_page_frag() simply
check that value.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

08f65892

Treewide: Stop corrupting socket's task_frag · 98123866

Benjamin Coddington authored Dec 16, 2022

Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current->task_frag.  The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code.  I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false.  Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.

CC: Philipp Reisner <philipp.reisner@linbit.com>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Keith Busch <kbusch@kernel.org>
CC: Christoph Hellwig <hch@lst.de>
CC: Sagi Grimberg <sagi@grimberg.me>
CC: Lee Duncan <lduncan@suse.com>
CC: Chris Leech <cleech@redhat.com>
CC: Mike Christie <michael.christie@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>
CC: Valentina Manea <valentina.manea.m@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: David Howells <dhowells@redhat.com>
CC: Marc Dionne <marc.dionne@auristor.com>
CC: Steve French <sfrench@samba.org>
CC: Christine Caulfield <ccaulfie@redhat.com>
CC: David Teigland <teigland@redhat.com>
CC: Mark Fasheh <mark@fasheh.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: Dominique Martinet <asmadeus@codewreck.org>
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Xiubo Li <xiubli@redhat.com>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Jeff Layton <jlayton@kernel.org>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: Anna Schumaker <anna@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

98123866

net: Introduce sk_use_task_frag in struct sock. · fb87bd47

Guillaume Nault authored Dec 16, 2022

Sockets that can be used while recursing into memory reclaim, like
those used by network block devices and file systems, mustn't use
current->task_frag: if the current process is already using it, then
the inner memory reclaim call would corrupt the task_frag structure.

To avoid this, sk_page_frag() uses ->sk_allocation to detect sockets
that mustn't use current->task_frag, assuming that those used during
memory reclaim had their allocation constraints reflected in
->sk_allocation.

This unfortunately doesn't cover all cases: in an attempt to remove all
usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
->sk_allocation, and used memalloc_nofs critical sections instead.
This breaks the sk_page_frag() heuristic since the allocation
constraints are now stored in current->flags, which sk_page_frag()
can't read without risking triggering a cache miss and slowing down
TCP's fast path.

This patch creates a new field in struct sock, named sk_use_task_frag,
which sockets with memory reclaim constraints can set to false if they
can't safely use current->task_frag. In such cases, sk_page_frag() now
always returns the socket's page_frag (->sk_frag). The first user is
sunrpc, which needs to avoid using current->task_frag but can keep
->sk_allocation set to GFP_KERNEL otherwise.

Eventually, it might be possible to simplify sk_page_frag() by only
testing ->sk_use_task_frag and avoid relying on the ->sk_allocation
heuristic entirely (assuming other sockets will set ->sk_use_task_frag
according to their constraints in the future).

The new ->sk_use_task_frag field is placed in a hole in struct sock and
belongs to a cache line shared with ->sk_shutdown. Therefore it should
be hot and shouldn't have negative performance impacts on TCP's fast
path (sk_shutdown is tested just before the while() loop in
tcp_sendmsg_locked()).

Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fb87bd47

mctp: Remove device type check at unregister · b389a902

Matt Johnston authored Dec 15, 2022

The unregister check could be incorrectly triggered if a netdev
changes its type after register. That is possible for a tun device
using TUNSETLINK ioctl, resulting in mctp unregister failing
and the netdev unregister waiting forever.

This was encountered by https://github.com/openthread/openthread/issues/8523

Neither check at register or unregister is required. They were added in
an attempt to track down mctp_ptr being set unexpectedly, which should
not happen in normal operation.

Fixes: 7b1871af ("mctp: Warn if pointer is set for a wrong dev type")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.auSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b389a902

net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq · 62e027fb

Arun Ramadoss authored Dec 13, 2022

KSZ swithes used interrupts for detecting the phy link up and down.
During registering the interrupt handler, it used IRQF_TRIGGER_FALLING
flag. But this flag has to be retrieved from device tree instead of hard
coding in the driver, so removing the flag.

Fixes: ff319a64 ("net: dsa: microchip: move interrupt handling logic from lan937x to ksz_common")
Reported-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20221213101440.24667-1-arun.ramadoss@microchip.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

62e027fb

19 Dec, 2022 19 commits

can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len · f0062291

Marc Kleine-Budde authored Dec 19, 2022

Debian's gcc-13 [1] throws the following error in
kvaser_usb_hydra_cmd_size():

[1] gcc version 13.0.0 20221214 (experimental) [master r13-4693-g512098a3316] (Debian 13-20221214-1)

| drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c:502:65: error:
| array subscript ‘struct kvaser_cmd_ext[0]’ is partly outside array
| bounds of ‘unsigned char[32]’ [-Werror=array-bounds=]
|   502 |                 ret = le16_to_cpu(((struct kvaser_cmd_ext *)cmd)->len);

kvaser_usb_hydra_cmd_size() returns the size of given command. It
depends on the command number (cmd->header.cmd_no). For extended
commands (cmd->header.cmd_no == CMD_EXTENDED) the above shown code is
executed.

Help gcc to recognize that this code path is not taken in all cases,
by calling kvaser_usb_hydra_cmd_size() directly after assigning the
command number.

Fixes: aec5fb22 ("can: kvaser_usb: Add support for Kvaser USB hydra family")
Cc: Jimmy Assarsson <extja@kvaser.com>
Cc: Anssi Hannula <anssi.hannula@bitwise.fi>
Link: https://lore.kernel.org/all/20221219110104.1073881-1-mkl@pengutronix.deTested-by: Jimmy Assarsson <extja@kvaser.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

f0062291

can: flexcan: avoid unbalanced pm_runtime_enable warning · 3bc2afcb

Haibo Chen authored Dec 13, 2022

When do suspend/resume, meet the following warning message:
[   30.028336] flexcan 425b0000.can: Unbalanced pm_runtime_enable!

Balance the pm_runtime_force_suspend() and pm_runtime_force_resume().

Fixes: 8cb53b48 ("can: flexcan: add auto stop mode for IMX93 to support wakeup")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Link: https://lore.kernel.org/all/20221213094351.3023858-1-haibo.chen@nxp.comSigned-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

3bc2afcb

Documentation: devlink: add missing toc entry for etas_es58x devlink doc · 115dd546

Vincent Mailhol authored Dec 13, 2022

toc entry is missing for etas_es58x devlink doc and triggers this warning:

Documentation/networking/devlink/etas_es58x.rst: WARNING: document isn't included in any toctree

Add the missing toc entry.

Fixes: 9f63f96a ("Documentation: devlink: add devlink documentation for the etas_es58x driver")
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20221213051136.721887-1-mailhol.vincent@wanadoo.frReported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

115dd546

mctp: serial: Fix starting value for frame check sequence · 2856a627

Jeremy Kerr authored Dec 16, 2022

RFC1662 defines the start state for the crc16 FCS to be 0xffff, but
we're currently starting at zero.

This change uses the correct start state. We're only early in the
adoption for the serial binding, so there aren't yet any other users to
interface to.

Fixes: a0c2ccd9 ("mctp: Add MCTP-over-serial transport binding")
Reported-by: Harsh Tyagi <harshtya@google.com>
Tested-by: Harsh Tyagi <harshtya@google.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2856a627

nfp: fix unaligned io read of capabilities word · 1b0c84a3

Huanhuan Wang authored Dec 16, 2022

The address of 32-bit extend capability is not qword aligned,
and may cause exception in some arch.

Fixes: 484963ce ("nfp: extend capability and control words")
Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b0c84a3

net: stream: purge sk_error_queue in sk_stream_kill_queues() · e0c8bccd

Eric Dumazet authored Dec 16, 2022

Changheon Lee reported TCP socket leaks, with a nice repro.

It seems we leak TCP sockets with the following sequence:

1) SOF_TIMESTAMPING_TX_ACK is enabled on the socket.

   Each ACK will cook an skb put in error queue, from __skb_tstamp_tx().
   __skb_tstamp_tx() is using skb_clone(), unless
   SOF_TIMESTAMPING_OPT_TSONLY was also requested.

2) If the application is also using MSG_ZEROCOPY, then we put in the
   error queue cloned skbs that had a struct ubuf_info attached to them.

   Whenever an struct ubuf_info is allocated, sock_zerocopy_alloc()
   does a sock_hold().

   As long as the cloned skbs are still in sk_error_queue,
   socket refcount is kept elevated.

3) Application closes the socket, while error queue is not empty.

Since tcp_close() no longer purges the socket error queue,
we might end up with a TCP socket with at least one skb in
error queue keeping the socket alive forever.

This bug can be (ab)used to consume all kernel memory
and freeze the host.

We need to purge the error queue, with proper synchronization
against concurrent writers.

Fixes: 24bcbe1c ("net: stream: don't purge sk_error_queue in sk_stream_kill_queues()")
Reported-by: Changheon Lee <darklight2357@icloud.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0c8bccd

myri10ge: Fix an error handling path in myri10ge_probe() · d83b950d

Christophe JAILLET authored Dec 18, 2022

Some memory allocated in myri10ge_probe_slices() is not released in the
error handling path of myri10ge_probe().

Add the corresponding kfree(), as already done in the remove function.

Fixes: 0dcffac1 ("myri10ge: add multislices support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

d83b950d

net: microchip: vcap: Fix initialization of value and mask · 10073399

Horatiu Vultur authored Dec 19, 2022

Fix the following smatch warning:

smatch warnings:
drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:103 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'value'.
drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:106 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'mask'.

In case the vcap field was VCAP_FIELD_U128 and the key was different
than IP6_S/DIP then the value and mask were not initialized, therefore
initialize them.

Fixes: 610c32b2 ("net: microchip: vcap: Add vcap_get_rule")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10073399

Merge branch 'rxrpc-fixes' · 98dbec0a

David S. Miller authored Dec 19, 2022

David Howells says:

====================
rxrpc: Fixes for I/O thread conversion/SACK table expansion

Here are some fixes for AF_RXRPC:

 (1) Fix missing unlock in rxrpc's sendmsg.

 (2) Fix (lack of) propagation of security settings to rxrpc_call.

 (3) Fix NULL ptr deref in rxrpc_unuse_local().

 (4) Fix problem with kthread_run() not invoking the I/O thread function if
     the kthread gets stopped first.  Possibly this should actually be
     fixed in the kthread code.

 (5) Fix locking problem as putting a peer (which may be done from RCU) may
     now invoke kthread_stop().

 (6) Fix switched parameters in a couple of trace calls.

 (7) Fix I/O thread's checking for kthread stop to make sure it completes
     all outstanding work before returning so that calls are cleaned up.

 (8) Fix an uninitialised var in the new rxperf test server.

 (9) Fix the return value of rxrpc_new_incoming_call() so that the checks
     on it work correctly.

The patches fix at least one syzbot bug[1] and probably some others that
don't have reproducers[2][3][4].  I think it also fixes another[5], but
that showed another failure during testing that was different to the
original.

There's also an outstanding bug in rxrpc_put_peer()[6] that is fixed by a
combination of several patches in my rxrpc-next branch, but I haven't
included that here.
====================
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: kafs-testing+fedora36_64checkkafs-build-164@auristor.com
Signed-off-by: David S. Miller <davem@davemloft.net>

98dbec0a

rxrpc: Fix the return value of rxrpc_new_incoming_call() · 31d35a02

David Howells authored Dec 15, 2022

Dan Carpenter sayeth[1]:

  The patch 5e6ef4f1: "rxrpc: Make the I/O thread take over the
  call and local processor work" from Jan 23, 2020, leads to the
  following Smatch static checker warning:

	net/rxrpc/io_thread.c:283 rxrpc_input_packet()
	warn: bool is not less than zero.

Fix this (for now) by changing rxrpc_new_incoming_call() to return an int
with 0 or error code rather than bool.  Note that the actual return value
of rxrpc_input_packet() is currently ignored.  I have a separate patch to
clean that up.

Fixes: 5e6ef4f1 ("rxrpc: Make the I/O thread take over the call and local processor work")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006123.html [1]
Signed-off-by: David S. Miller <davem@davemloft.net>

31d35a02

rxrpc: rxperf: Fix uninitialised variable · 11e1706b

David Howells authored Dec 15, 2022

Dan Carpenter sayeth[1]:

  The patch 75bfdbf2: "rxrpc: Implement an in-kernel rxperf server
  for testing purposes" from Nov 3, 2022, leads to the following Smatch
  static checker warning:

	net/rxrpc/rxperf.c:337 rxperf_deliver_to_call()
	error: uninitialized symbol 'ret'.

Fix this by initialising ret to 0.  The value is only used for tracing
purposes in the rxperf server.

Fixes: 75bfdbf2 ("rxrpc: Implement an in-kernel rxperf server for testing purposes")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006124.html [1]
Signed-off-by: David S. Miller <davem@davemloft.net>

11e1706b

rxrpc: Fix I/O thread stop · 743d1768

David Howells authored Dec 15, 2022

The rxrpc I/O thread checks to see if there's any work it needs to do, and
if not, checks kthread_should_stop() before scheduling, and if it should
stop, breaks out of the loop and tries to clean up and exit.

This can, however, race with socket destruction, wherein outstanding calls
are aborted and released from the socket and then the socket unuses the
local endpoint, causing kthread_stop() to be issued.  The abort is deferred
to the I/O thread and the event can by issued between the I/O thread
checking if there's any work to be done (such as processing call aborts)
and the stop being seen.

This results in the I/O thread stopping processing of events whilst call
cleanup events are still outstanding, leading to connections or other
objects still being around and uncleaned up, which can result in assertions
being triggered, e.g.:

    rxrpc: AF_RXRPC: Leaked client conn 00000000e8009865 {2}
    ------------[ cut here ]------------
    kernel BUG at net/rxrpc/conn_client.c:64!

Fix this by retrieving the kthread_should_stop() indication, then checking
to see if there's more work to do, and going back round the loop if there
is, and breaking out of the loop only if there wasn't.

This was triggered by a syzbot test that produced some other symptom[1].

Fixes: a275da62 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/0000000000002b4a9f05ef2b616f@google.com/ [1]
Signed-off-by: David S. Miller <davem@davemloft.net>

743d1768

rxrpc: Fix switched parameters in peer tracing · c838f1a7

David Howells authored Dec 15, 2022

Fix the switched parameters on rxrpc_alloc_peer() and rxrpc_get_peer().
The ref argument and the why argument got mixed.

Fixes: 47c810a7 ("rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

c838f1a7

rxrpc: Fix locking issues in rxrpc_put_peer_locked() · 608aecd1

David Howells authored Dec 15, 2022

Now that rxrpc_put_local() may call kthread_stop(), it can't be called
under spinlock as it might sleep.  This can cause a problem in the peer
keepalive code in rxrpc as it tries to avoid dropping the peer_hash_lock
from the point it needs to re-add peer->keepalive_link to going round the
loop again in rxrpc_peer_keepalive_dispatch().

Fix this by just dropping the lock when we don't need it and accepting that
we'll have to take it again.  This code is only called about every 20s for
each peer, so not very often.

This allows rxrpc_put_peer_unlocked() to be removed also.

If triggered, this bug produces an oops like the following, as reproduced
by a syzbot reproducer for a different oops[1]:

BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
...
RCU nest depth: 0, expected: 0
3 locks held by kworker/u9:0/50:
 #0: ffff88810e74a138 ((wq_completion)krxrpcd){+.+.}-{0:0}, at: process_one_work+0x294/0x636
 #1: ffff8881013a7e20 ((work_completion)(&rxnet->peer_keepalive_work)){+.+.}-{0:0}, at: process_one_work+0x294/0x636
 #2: ffff88817d366390 (&rxnet->peer_hash_lock){+.+.}-{2:2}, at: rxrpc_peer_keepalive_dispatch+0x2bd/0x35f
...
Call Trace:
 <TASK>
 dump_stack_lvl+0x4c/0x5f
 __might_resched+0x2cf/0x2f2
 __wait_for_common+0x87/0x1e8
 kthread_stop+0x14d/0x255
 rxrpc_peer_keepalive_dispatch+0x333/0x35f
 rxrpc_peer_keepalive_worker+0x2e9/0x449
 process_one_work+0x3c1/0x636
 worker_thread+0x25f/0x359
 kthread+0x1a6/0x1b5
 ret_from_fork+0x1f/0x30

Fixes: a275da62 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/0000000000002b4a9f05ef2b616f@google.com/ [1]
Signed-off-by: David S. Miller <davem@davemloft.net>

608aecd1

rxrpc: Fix I/O thread startup getting skipped · 8fbcc833

David Howells authored Dec 15, 2022

When starting a kthread, the __kthread_create_on_node() function, as called
from kthread_run(), waits for a completion to indicate that the task_struct
(or failure state) of the new kernel thread is available before continuing.

This does not wait, however, for the thread function to be invoked and,
indeed, will skip it if kthread_stop() gets called before it gets there.

If this happens, though, kthread_run() will have returned successfully,
indicating that the thread was started and returning the task_struct
pointer. The actual error indication is returned by kthread_stop().

Note that this is ambiguous, as the caller cannot tell whether the -EINTR
error code came from kthread() or from the thread function.

This was encountered in the new rxrpc I/O thread, where if the system is
being pounded hard by, say, syzbot, the check of KTHREAD_SHOULD_STOP can be
delayed long enough for kthread_stop() to get called when rxrpc releases a
socket - and this causes an oops because the I/O thread function doesn't
get started and thus doesn't remove the rxrpc_local struct from the
local_endpoints list.

Fix this by using a completion to wait for the thread to actually enter
rxrpc_io_thread(). This makes sure the thread can't be prematurely
stopped and makes sure the relied-upon cleanup is done.

Fixes: a275da62 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Reported-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Hillf Danton <hdanton@sina.com>
Link: https://lore.kernel.org/r/000000000000229f1505ef2b6159@google.com/Signed-off-by: David S. Miller <davem@davemloft.net>

8fbcc833

rxrpc: Fix NULL deref in rxrpc_unuse_local() · eaa02390

David Howells authored Dec 15, 2022

Fix rxrpc_unuse_local() to get the debug_id *after* checking to see if
local is NULL.

Fixes: a2cf3264 ("rxrpc: Fold __rxrpc_unuse_local() into rxrpc_unuse_local()")
Reported-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

eaa02390

rxrpc: Fix security setting propagation · fdb99487

David Howells authored Dec 15, 2022

Fix the propagation of the security settings from sendmsg to the rxrpc_call
struct.

Fixes: f3441d41 ("rxrpc: Copy client call parameters into rxrpc_call earlier")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

fdb99487

rxrpc: Fix missing unlock in rxrpc_do_sendmsg() · 4feb2c44

David Howells authored Dec 15, 2022

One of the error paths in rxrpc_do_sendmsg() doesn't unlock the call mutex
before returning.  Fix it to do this.

Note that this still doesn't get rid of the checker warning:

   ../net/rxrpc/sendmsg.c:617:5: warning: context imbalance in 'rxrpc_do_sendmsg' - wrong count at exit

I think the interplay between the socket lock and the call's user_mutex may
be too complicated for checker to analyse, especially as
rxrpc_new_client_call_for_sendmsg(), which it calls, returns with the
call's user_mutex if successful but unconditionally drops the socket lock.

Fixes: e754eba6 ("rxrpc: Provide a cmsg to specify the amount of Tx data for a call")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

4feb2c44

net_sched: reject TCF_EM_SIMPLE case for complex ematch module · 9cd3fd20

Cong Wang authored Dec 17, 2022

When TCF_EM_SIMPLE was introduced, it is supposed to be convenient
for ematch implementation:

https://lore.kernel.org/all/20050105110048.GO26856@postel.suug.ch/

"You don't have to, providing a 32bit data chunk without TCF_EM_SIMPLE
set will simply result in allocating & copy. It's an optimization,
nothing more."

So if an ematch module provides ops->datalen that means it wants a
complex data structure (saved in its em->data) instead of a simple u32
value. We should simply reject such a combination, otherwise this u32
could be misinterpreted as a pointer.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+4caeae4c7103813598ae@syzkaller.appspotmail.com
Reported-by: Jun Nie <jun.nie@linaro.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9cd3fd20

18 Dec, 2022 1 commit

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 89529367

David S. Miller authored Dec 18, 2022

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-12-15 (igc)

Muhammad Husaini Zulkifli says:

This patch series fixes bugs for the Time-Sensitive Networking(TSN)
Qbv Scheduling features.

An overview of each patch series is given below:

Patch 1: Using a first flag bit to schedule a packet to the next cycle if
packet cannot fit in current Qbv cycle.
Patch 2: Enable strict cycle for Qbv scheduling.
Patch 3: Prevent user to set basetime less than zero during tc config.
Patch 4: Allow the basetime enrollment with zero value.
Patch 5: Calculate the new end time value to exclude the time interval that
exceed the cycle time as user can specify the cycle time in tc config.
Patch 6: Resolve the HW bugs where the gate is not fully closed.
---
This contains the net patches from this original pull request:
https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/

====================
Signed-off-by: David S. Miller <davem@davemloft.net>

89529367

17 Dec, 2022 3 commits

skbuff: Account for tail adjustment during pull operations · 2d7afdcb

Subash Abhinov Kasiviswanathan authored Dec 14, 2022

Extending the tail can have some unexpected side effects if a program uses
a helper like BPF_FUNC_skb_pull_data to read partial content beyond the
head skb headlen when all the skbs in the gso frag_list are linear with no
head_frag -

  kernel BUG at net/core/skbuff.c:4219!
  pc : skb_segment+0xcf4/0xd2c
  lr : skb_segment+0x63c/0xd2c
  Call trace:
   skb_segment+0xcf4/0xd2c
   __udp_gso_segment+0xa4/0x544
   udp4_ufo_fragment+0x184/0x1c0
   inet_gso_segment+0x16c/0x3a4
   skb_mac_gso_segment+0xd4/0x1b0
   __skb_gso_segment+0xcc/0x12c
   udp_rcv_segment+0x54/0x16c
   udp_queue_rcv_skb+0x78/0x144
   udp_unicast_rcv_skb+0x8c/0xa4
   __udp4_lib_rcv+0x490/0x68c
   udp_rcv+0x20/0x30
   ip_protocol_deliver_rcu+0x1b0/0x33c
   ip_local_deliver+0xd8/0x1f0
   ip_rcv+0x98/0x1a4
   deliver_ptype_list_skb+0x98/0x1ec
   __netif_receive_skb_core+0x978/0xc60

Fix this by marking these skbs as GSO_DODGY so segmentation can handle
the tail updates accordingly.

Fixes: 3dcbdb13 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/1671084718-24796-1-git-send-email-quic_subashab@quicinc.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2d7afdcb

devlink: protect devlink dump by the instance lock · 214964a1

Jakub Kicinski authored Dec 15, 2022

Take the instance lock around devlink_nl_fill() when dumping,
doit takes it already.

We are only dumping basic info so in the worst case we were risking
data races around the reload statistics. Until the big devlink mutex
was removed all relevant code was protected by it, so the missing
instance lock was not exposed.

Fixes: d3efc2a6 ("net: devlink: remove devlink_mutex")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20221216044122.1863550-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

214964a1

net: ethernet: ti: am65-cpsw: fix CONFIG_PM #ifdef · 078838f5

Arnd Bergmann authored Dec 15, 2022

The #ifdef check is incorrect and leads to a warning:

drivers/net/ethernet/ti/am65-cpsw-nuss.c:1679:13: error: 'am65_cpsw_nuss_remove_rx_chns' defined but not used [-Werror=unused-function]
1679 | static void am65_cpsw_nuss_remove_rx_chns(void *data)

It's better to remove the #ifdef here and use the modern
SYSTEM_SLEEP_PM_OPS() macro instead.

Fixes: 24bc19b0 ("net: ethernet: ti: am65-cpsw: Add suspend/resume support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20221215163918.611609-1-arnd@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

078838f5

16 Dec, 2022 7 commits

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 13e3c779

Jakub Kicinski authored Dec 16, 2022

Daniel Borkmann says:

====================
pull-request: bpf 2022-12-16

We've added 7 non-merge commits during the last 2 day(s) which contain
a total of 9 files changed, 119 insertions(+), 36 deletions(-).

1) Fix for recent syzkaller XDP dispatcher update splat, from Jiri Olsa.

2) Fix BPF program refcount leak in LSM attachment failure path,
   from Milan Landaverde.

3) Fix BPF program type in map compatibility check for fext,
   from Toke Høiland-Jørgensen.

4) Fix a BPF selftest compilation error under !CONFIG_SMP config,
   from Yonghong Song.

5) Fix CI to enable CONFIG_FUNCTION_ERROR_INJECTION after it got changed
   to a prompt, from Song Liu.

6) Various BPF documentation fixes for socket local storage,
   from Donald Hunter.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program
  bpf: Resolve fext program type when checking map compatibility
  bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func
  bpf: prevent leak of lsm program after failed attach
  selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTION
  selftests/bpf: Fix a selftest compilation error with CONFIG_SMP=n
  docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE
====================

Link: https://lore.kernel.org/r/20221216174540.16598-1-daniel@iogearbox.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

13e3c779

openvswitch: Fix flow lookup to use unmasked key · 68bb1010

Eelco Chaudron authored Dec 15, 2022

The commit mentioned below causes the ovs_flow_tbl_lookup() function
to be called with the masked key. However, it's supposed to be called
with the unmasked key. This due to the fact that the datapath supports
installing wider flows, and OVS relies on this behavior. For example
if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
128.0.0.0) is allowed to be added.

However, if we try to add a wildcard rule, the installation fails:

$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
  ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
  ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
ovs-vswitchd: updating flow table (File exists)

The reason is that the key used to determine if the flow is already
present in the system uses the original key ANDed with the mask.
This results in the IP address not being part of the (miniflow) key,
i.e., being substituted with an all-zero value. When doing the actual
lookup, this results in the key wrongfully matching the first flow,
and therefore the flow does not get installed.

This change reverses the commit below, but rather than having the key
on the stack, it's allocated.

Fixes: 190aa3e7 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

68bb1010

Merge branch 'devlink-fixes' · 3e31d209

David S. Miller authored Dec 16, 2022

Jakub Kicinski says:

====================
devlink: region snapshot locking fix and selftest adjustments

Minor fix for region snapshot locking and adjustments to selftests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3e31d209

selftests: devlink: add a warning for interfaces coming up · d1c4a346

Jakub Kicinski authored Dec 14, 2022

NetworkManager (and other daemons) may bring the interface up
and cause failures in quiescence checks. Print a helpful warning,
and take the interface down again.

I seem to forget about this every time I run these tests on a new VM.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1c4a346

selftests: devlink: fix the fd redirect in dummy_reporter_test · 2fc60e2f

Jakub Kicinski authored Dec 14, 2022

$number + > bash means redirect FD $number, e.g. commonly
used 2> redirects stderr (fd 2). The test uses 8192> to
write the number 8192 to a file, this results in:

  ./devlink.sh: line 499: 8192: Bad file descriptor

Oddly the test also papers over this issue by checking
for failure (expecting an error rather than success)
so it passes, anyway.

Fixes: ff18176a ("selftests: Add a test of large binary to devlink health test")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

2fc60e2f

devlink: hold region lock when flushing snapshots · b4cafb3d

Jakub Kicinski authored Dec 14, 2022

Netdevsim triggers a splat on reload, when it destroys regions
with snapshots pending:

  WARNING: CPU: 1 PID: 787 at net/core/devlink.c:6291 devlink_region_snapshot_del+0x12e/0x140
  CPU: 1 PID: 787 Comm: devlink Not tainted 6.1.0-07460-g7ae9888d #580
  RIP: 0010:devlink_region_snapshot_del+0x12e/0x140
  Call Trace:
   <TASK>
   devl_region_destroy+0x70/0x140
   nsim_dev_reload_down+0x2f/0x60 [netdevsim]
   devlink_reload+0x1f7/0x360
   devlink_nl_cmd_reload+0x6ce/0x860
   genl_family_rcv_msg_doit.isra.0+0x145/0x1c0

This is the locking assert in devlink_region_snapshot_del(),
we're supposed to be holding the region->snapshot_lock here.

Fixes: 2dec18ad ("net: devlink: remove region snapshots list dependency on devlink->lock")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4cafb3d

net: dsa: mt7530: remove redundant assignment · 32f1002e

Daniel Golle authored Dec 15, 2022

Russell King correctly pointed out that the MAC_2500FD capability is
already added for port 5 (if not in RGMII mode) and port 6 (which only
supports SGMII) by mt7531_mac_port_get_caps. Remove the reduntant
setting of this capability flag which was added by a previous commit.

Fixes: e19de30d ("net: dsa: mt7530: add support for in-band link status")
Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/Y5qY7x6la5TxZxzX@makrotopia.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

32f1002e

15 Dec, 2022 3 commits

igc: Set Qbv start_time and end_time to end_time if not being configured in GCL · 72abeedd

Tan Tee Min authored Dec 09, 2022

The default setting of end_time minus start_time is whole 1 second.
Thus, if it's not being configured in any GCL entry then it will be
staying at original 1 second.

This patch is changing the start_time and end_time to be end_time as
if setting zero will be having weird HW behavior where the gate will
not be fully closed.

Fixes: ec50a9d4 ("igc: Add support for taprio offloading")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

72abeedd

igc: recalculate Qbv end_time by considering cycle time · 6d05251d

Tan Tee Min authored Dec 09, 2022

Qbv users can specify a cycle time that is not equal to the total GCL
intervals. Hence, recalculation is necessary here to exclude the time
interval that exceeds the cycle time. As those GCL which exceeds the
cycle time will be truncated.

According to IEEE Std. 802.1Q-2018 section 8.6.9.2, once the end of
the list is reached, it will switch to the END_OF_CYCLE state and
leave the gates in the same state until the next cycle is started.

Fixes: ec50a9d4 ("igc: Add support for taprio offloading")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

6d05251d

igc: allow BaseTime 0 enrollment for Qbv · e17090eb

Tan Tee Min authored Dec 09, 2022

Introduce qbv_enable flag in igc_adapter struct to store the Qbv on/off.
So this allow the BaseTime to enroll with zero value.

Fixes: 61572d5f ("igc: Simplify TSN flags handling")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e17090eb