Commits · c7f75954212b5e64f6b1f2375215b02fd79758ce · Kirill Smelkov / linux

14 Jun, 2024 1 commit

dt-bindings: net: dsa: lantiq,gswip: convert to YAML schema · c7f75954

Martin Schiller authored Jun 11, 2024

Convert the lantiq,gswip bindings to YAML format.

Also add this new file to the MAINTAINERS file.

Furthermore, the CPU port has to specify a phy-mode and either a phy or
a fixed-link. Since GSWIP is connected using a SoC-internal protocol,
there's no PHY involved. Add phy-mode = "internal" and a fixed-link to
the example code to describe the communication between the PMAC
(Ethernet controller) and GSWIP switch.

Signed-off-by: Martin Schiller <ms@dev.tdt.de>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20240611135434.3180973-2-ms@dev.tdt.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

13 Jun, 2024 23 commits

Merge branch 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · cf157f33

Jakub Kicinski authored Jun 13, 2024

Leon Romanovsky says:

====================
net: mana: Allow variable size indirection table

As we discussed, I created a new shared branch for this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=mana-shared

* 'mana-shared' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  net: mana: Allow variable size indirection table
====================

Link: https://lore.kernel.org/all/20240612183051.GE4966@unreal
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 4c7d3d79

Jakub Kicinski authored Jun 13, 2024

Cross-merge networking fixes after downstream PR.

No conflicts, no adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · d20f6b3d

Linus Torvalds authored Jun 13, 2024

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth and netfilter.

  Slim pickings this time, probably a combination of summer, DevConf.cz,
  and the end of the first half of the year at corporations.

  Current release - regressions:

   - Revert "igc: fix a log entry using uninitialized netdev", it traded
     lack of netdev name in a printk() for a crash

  Previous releases - regressions:

   - Bluetooth: L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ

   - geneve: fix incorrectly setting lengths of inner headers in the
     skb, confusing the drivers and causing mangled packets

   - sched: initialize noop_qdisc owner to avoid false-positive
     recursion detection (recursing on CPU 0), which bubbles up to user
     space as a sendmsg() error, while noop_qdisc should silently drop

   - netdevsim: fix backwards compatibility in nsim_get_iflink()

  Previous releases - always broken:

   - netfilter: ipset: fix race between namespace cleanup and gc in the
     list:set type"

* tag 'net-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits)
  bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
  af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
  bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
  gve: Clear napi->skb before dev_kfree_skb_any()
  ionic: fix use after netif_napi_del()
  Revert "igc: fix a log entry using uninitialized netdev"
  net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
  net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
  net/ipv6: Fix the RT cache flush via sysctl using a previous delay
  net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
  gve: ignore nonrelevant GSO type bits when processing TSO headers
  net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
  netfilter: Use flowlabel flow key when re-routing mangled packets
  netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
  netfilter: nft_inner: validate mandatory meta and payload
  tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
  mailmap: map Geliang's new email address
  mptcp: pm: update add_addr counters after connect
  mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
  mptcp: ensure snd_una is properly initialized on connect
  ...

Merge tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · fd88e181

Linus Torvalds authored Jun 13, 2024

Pull NFS client fixes from Trond Myklebust:
 "Bugfixes:
   - NFSv4.2: Fix a memory leak in nfs4_set_security_label
   - NFSv2/v3: abort nfs_atomic_open_v23 if the name is too long.
   - NFS: Add appropriate memory barriers to the sillyrename code
   - Propagate readlink errors in nfs_symlink_filler
   - NFS: don't invalidate dentries on transient errors
   - NFS: fix unnecessary synchronous writes in random write workloads
   - NFSv4.1: enforce rootpath check when deciding whether or not to trunk

  Other:
   - Change email address for Trond Myklebust due to email server concerns"

* tag 'nfs-for-6.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add barriers when testing for NFS_FSDATA_BLOCKED
  SUNRPC: return proper error from gss_wrap_req_priv
  NFSv4.1 enforce rootpath check in fs_location query
  NFS: abort nfs_atomic_open_v23 if name is too long.
  nfs: don't invalidate dentries on transient errors
  nfs: Avoid flushing many pages with NFS_FILE_SYNC
  nfs: propagate readlink errors in nfs_symlink_filler
  MAINTAINERS: Change email address for Trond Myklebust
  NFSv4: Fix memory leak in nfs4_set_security_label

Merge tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock · 3572597c

Linus Torvalds authored Jun 13, 2024

Pull memblock fixes from Mike Rapoport:
 "Fix validation of NUMA coverage.

  memblock_validate_numa_coverage() was checking for an unset node ID
  using NUMA_NO_NODE, but x86 used MAX_NUMNODES when no node ID was
  specified by buggy firmware.

  Update memblock to substitute MAX_NUMNODES with NUMA_NO_NODE in
  memblock_set_node() and use NUMA_NO_NODE in x86::numa_init()"
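
As a rough illustration of the fix, here is a toy Python model. `NUMA_NO_NODE = -1` matches the kernel constant, but `MAX_NUMNODES = 64` is an assumed value (the real one depends on the kernel configuration), and `set_node()` and `uncovered_bytes()` are hypothetical stand-ins for memblock_set_node() and the check in memblock_validate_numa_coverage(), not kernel code.

```python
import warnings

NUMA_NO_NODE = -1   # the kernel's "no node" marker
MAX_NUMNODES = 64   # assumed value; the real one is configuration-dependent

def set_node(regions, start, size, nid):
    """Toy memblock_set_node(): tag [start, start + size) with a node ID."""
    if nid == MAX_NUMNODES:
        # Legacy callers used MAX_NUMNODES to mean "unknown node".
        # Warn, then normalize so later checks only test NUMA_NO_NODE.
        warnings.warn("MAX_NUMNODES is deprecated, use NUMA_NO_NODE")
        nid = NUMA_NO_NODE
    regions.append((start, size, nid))

def uncovered_bytes(regions):
    """Toy coverage check: total size of regions with no node assigned."""
    return sum(size for _, size, nid in regions if nid == NUMA_NO_NODE)
```

Because the substitution happens when the node is set, the coverage check only has to look for a single "unset" marker.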

* tag 'fixes-2024-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node()
  memblock: make memblock_set_node() also warn about use of MAX_NUMNODES

bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send() · a9b97418

Aleksandr Mishin authored Jun 11, 2024

If the token is released because token->state == BNXT_HWRM_DEFERRED,
the released token (set to NULL) is still used in log messages. This is
expected to be prevented by the HWRM_ERR_CODE_PF_UNAVAILABLE error code,
but that error code is only returned by recent firmware, so older
firmware may not return it. This may lead to a NULL pointer dereference.

Fix this by adding a token pointer check.
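
The shape of the fix can be illustrated with a small Python model. `Token`, its `seq_id` field, `hwrm_log_result()` and the message format are all hypothetical; the only point carried over from the commit is that the logging path must tolerate a token that has already been released (`None` here, NULL in the driver).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    seq_id: int  # hypothetical field a log message might print

def hwrm_log_result(req_type: int, rc: int, token: Optional[Token]) -> str:
    """Build a log line for a firmware request result.

    The token may already have been released (None) for a deferred
    request, so it is only dereferenced after an explicit check.
    """
    if token is not None:
        return f"hwrm req_type 0x{req_type:x} seq_id 0x{token.seq_id:x} error {rc}"
    return f"hwrm req_type 0x{req_type:x} error {rc}"
```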

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 8fa4219d ("bnxt_en: add dynamic debug support for HWRM messages")
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: Read with MSG_PEEK loops if the first unread byte is OOB · a6736a0a

Rao Shoaib authored Jun 11, 2024

A read with the MSG_PEEK flag loops if the first byte to read is an OOB
byte. Commit 22dd70eb ("af_unix: Don't peek OOB data without MSG_OOB.")
addresses the loop issue, but not the issue that no data beyond the
OOB byte can be read.

>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'

>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1)
>>> c1.send(b'a', MSG_OOB)
1
>>> c1.send(b'b')
1
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_DONTWAIT)
b'a'
>>> c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'
>>>

Fixes: 314001f0 ("af_unix: Add OOB support")
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response · 7d9df38c

Michael Chan authored Jun 12, 2024

Firmware interface 1.10.2.118 has increased the size of the
HWRM_PORT_PHY_QCFG response beyond the maximum size that can be
forwarded. When the VF's link state is not the default auto state,
the PF will need to forward the response back to the VF to indicate
the forced state. This regression may cause the VF to fail to
initialize.

Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum
96 bytes. The SPEEDS2_SUPPORTED flag needs to be cleared because the
new speeds2 fields are beyond the legacy structure. Also modify
bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96
bytes to make this failure more obvious.
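
A rough Python model of the capping step follows. The 96-byte limit comes from the commit; the offset of the flags byte, the bit value of `SPEEDS2_SUPPORTED` and both function names are assumptions made for illustration, not the firmware layout or the driver code. Refusing an oversized message in `fwd_resp()` is also part of the model; the commit itself only says a warning is printed.

```python
import warnings

MAX_FWD_RESP_LEN = 96        # from the commit: largest forwardable response
FLAGS_OFFSET = 8             # hypothetical offset of the flags byte
SPEEDS2_SUPPORTED = 1 << 3   # hypothetical bit value

def cap_phy_qcfg_resp(resp: bytes) -> bytes:
    """Truncate a PORT_PHY_QCFG response to the legacy size for forwarding."""
    capped = bytearray(resp[:MAX_FWD_RESP_LEN])
    # The speeds2 fields live beyond the legacy structure, so the VF must
    # not be told they are present in the truncated response.
    capped[FLAGS_OFFSET] &= ~SPEEDS2_SUPPORTED & 0xFF
    return bytes(capped)

def fwd_resp(msg: bytes) -> bool:
    """Warn about (and, in this model, refuse) oversized forwarded messages."""
    if len(msg) > MAX_FWD_RESP_LEN:
        warnings.warn(f"fwd response too big ({len(msg)} bytes)")
        return False
    return True
```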

Fixes: 84a911db ("bnxt_en: Update firmware interface to 1.10.2.118")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: Clear napi->skb before dev_kfree_skb_any() · 6f4d93b7

Ziwei Xiao authored Jun 12, 2024

gve_rx_free_skb incorrectly leaves napi->skb referencing an skb after it
is freed with dev_kfree_skb_any(). This can result in a subsequent call
to napi_get_frags returning a dangling pointer.

Fix this by clearing napi->skb before the skb is freed.
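
The ordering can be sketched with a toy Python model. Python has no dangling pointers, so a `freed` flag stands in for memory returned to the allocator; `Skb`, `Napi` and the three functions are simplified, hypothetical stand-ins for the kernel objects and calls named above.

```python
class Skb:
    def __init__(self):
        self.freed = False

class Napi:
    def __init__(self):
        self.skb = None  # skb cached by NAPI for building frag packets

def dev_kfree_skb_any(skb):
    skb.freed = True  # stands in for returning memory to the allocator

def napi_get_frags(napi):
    """Return the cached skb, allocating a fresh one if none is cached."""
    if napi.skb is None:
        napi.skb = Skb()
    return napi.skb

def rx_free_skb(napi, skb):
    # Drop NAPI's reference first, so napi_get_frags() cannot hand out
    # the skb after it has been freed.
    if skb is napi.skb:
        napi.skb = None
    dev_kfree_skb_any(skb)
```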

Fixes: 9b8dd5e5 ("gve: DQO: Add RX path")
Cc: stable@vger.kernel.org
Reported-by: Shailend Chand <shailend@google.com>
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Shailend Chand <shailend@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Link: https://lore.kernel.org/r/20240612001654.923887-1-ziweixiao@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: fix use after netif_napi_del() · 79f18a41

Taehee Yoo authored Jun 12, 2024

When queues are started, netif_napi_add() and napi_enable() are called.
If there are 4 queues and only 3 queues are used for the current
configuration, only those 3 queues' napi should be registered and
enabled. ionic_qcq_enable() checks whether the .poll pointer is
non-NULL so that it enables only the napi of queues in use. An unused
queue's napi is never registered by netif_napi_add(), so its .poll
pointer is NULL. But this check cannot tell whether a napi has since
been unregistered, because netif_napi_del() doesn't reset the .poll
pointer to NULL. So ionic_qcq_enable() calls napi_enable() for a queue
whose napi was already unregistered by netif_napi_del().
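
The failure mode can be reproduced with a toy Python model. `Napi`, its `listed` flag and both `should_enable_*()` helpers are hypothetical; the "fixed" check shows one way to avoid relying on a stale `.poll` value and is not necessarily what the upstream patch does.

```python
class Napi:
    """Toy stand-in for struct napi_struct (only the fields used here)."""
    def __init__(self):
        self.poll = None
        self.listed = False  # hypothetical: True while registered

def netif_napi_add(napi, poll):
    napi.poll = poll
    napi.listed = True

def netif_napi_del(napi):
    # As described above, deletion does not reset .poll.
    napi.listed = False

def should_enable_buggy(napi):
    # Original heuristic: "registered" inferred from a non-None .poll.
    return napi.poll is not None

def should_enable_fixed(napi):
    # Ask whether the napi is still registered instead.
    return napi.listed
```

Mirroring the reproducer, shrinking from 4 queues to 1 leaves three deleted napis whose `.poll` is still set:

```python
queues = [Napi() for _ in range(4)]
for q in queues:
    netif_napi_add(q, poll=lambda budget: 0)
for q in queues[1:]:
    netif_napi_del(q)
```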

Reproducer:
   ethtool -L <interface name> rx 1 tx 1 combined 0
   ethtool -L <interface name> rx 0 tx 0 combined 1
   ethtool -L <interface name> rx 0 tx 0 combined 4

Splat looks like:
kernel BUG at net/core/dev.c:6666!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16
Workqueue: events ionic_lif_deferred_work [ionic]
RIP: 0010:napi_enable+0x3b/0x40
Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f
RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28
RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20
FS:  0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
 <TASK>
 ? die+0x33/0x90
 ? do_trap+0xd9/0x100
 ? napi_enable+0x3b/0x40
 ? do_error_trap+0x83/0xb0
 ? napi_enable+0x3b/0x40
 ? napi_enable+0x3b/0x40
 ? exc_invalid_op+0x4e/0x70
 ? napi_enable+0x3b/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? napi_enable+0x3b/0x40
 ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
 ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
 ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
 ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8]
 process_one_work+0x145/0x360
 worker_thread+0x2bb/0x3d0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xcc/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2d/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

Fixes: 0f3154e6 ("ionic: Add Tx and Rx handling")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Revert "igc: fix a log entry using uninitialized netdev" · 8eef5c3c

Sasha Neftin authored Jun 11, 2024

This reverts commit 86167183.

igc_ptp_init() needs to be called before igc_reset(); otherwise a kernel
crash can occur. Following the corresponding discussions [1] and [2],
revert this commit.

Link: https://lore.kernel.org/all/8fb634f8-7330-4cf4-a8ce-485af9c0a61a@intel.com/ [1]
Link: https://lore.kernel.org/all/87o78rmkhu.fsf@intel.com/ [2]
Fixes: 86167183 ("igc: fix a log entry using uninitialized netdev")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240611162456.961631-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

CDC-NCM: add support for Apple's private interface · 3ec8d757

Ole André Vadla Ravnås authored Jun 07, 2024

Available on iOS/iPadOS >= 17, where this new interface is used by
developer tools using the new RemoteXPC protocol.

This private interface lacks a status endpoint, presumably because there
isn't a physical cable that can be unplugged, nor any speed changes to
be notified about.

Note that NCM interfaces are not exposed until a mode switch is
requested, which macOS does automatically.

The mode switch can be performed like this:

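        #include <stdio.h>
        #include <libusb.h>

        /* device_handle is an already-opened libusb_device_handle. */
        uint8_t status;
        int rc = libusb_control_transfer(device_handle,
                LIBUSB_RECIPIENT_DEVICE | LIBUSB_REQUEST_TYPE_VENDOR |
                LIBUSB_ENDPOINT_IN,
                82, /* bRequest */
                0,  /* wValue   */
                3,  /* wIndex   */
                &status,
                sizeof(status),
                0); /* timeout in ms; 0 means no timeout */
        if (rc < 0)
                fprintf(stderr, "mode switch failed: %s\n",
                        libusb_error_name(rc));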

Newer versions of usbmuxd do this automatically.

Co-developed-by: Håvard Sørbø <havard@hsorbo.no>
Signed-off-by: Håvard Sørbø <havard@hsorbo.no>
Signed-off-by: Ole André Vadla Ravnås <oleavr@frida.re>
Link: https://lore.kernel.org/r/20240607074117.31322-1-oleavr@frida.re
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-bridge-mst-fix-suspicious-rcu-usage-warning' · b60b1bdc

Jakub Kicinski authored Jun 12, 2024

Nikolay Aleksandrov says:

====================
net: bridge: mst: fix suspicious rcu usage warning

This set fixes a suspicious RCU usage warning triggered by syzbot [1] in
the bridge's MST code. After I converted br_mst_set_state to RCU, I
forgot to update the vlan group dereference helper. Fix it by using
the proper helper; to do that, we need to pass in the vlan group, which
the callers already obtain correctly for their respective contexts.
Patch 01 is a requirement for the fix in patch 02.

Note: I did consider rcu_dereference_rtnl(), but the churn is much bigger
and touches every part of the bridge. We can do that as a cleanup in
net-next.

[1] https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
 =============================
 WARNING: suspicious RCU usage
 6.10.0-rc2-syzkaller-00235-g8a929806 #0 Not tainted
 -----------------------------
 net/bridge/br_private.h:1599 suspicious rcu_dereference_protected() usage!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 4 locks held by syz-executor.1/5374:
  #0: ffff888022d50b18 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock include/linux/mmap_lock.h:144 [inline]
  #0: ffff888022d50b18 (&mm->mmap_lock){++++}-{3:3}, at: __mm_populate+0x1b0/0x460 mm/gup.c:2111
  #1: ffffc90000a18c00 ((&p->forward_delay_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x650 kernel/time/timer.c:1789
  #2: ffff88805fb2ccb8 (&br->lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
  #2: ffff88805fb2ccb8 (&br->lock){+.-.}-{2:2}, at: br_forward_delay_timer_expired+0x50/0x440 net/bridge/br_stp_timer.c:86
  #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
  #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
  #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: br_mst_set_state+0x171/0x7a0 net/bridge/br_mst.c:105

 stack backtrace:
 CPU: 1 PID: 5374 Comm: syz-executor.1 Not tainted 6.10.0-rc2-syzkaller-00235-g8a929806 #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
 Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
  lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
  nbp_vlan_group net/bridge/br_private.h:1599 [inline]
  br_mst_set_state+0x29e/0x7a0 net/bridge/br_mst.c:106
  br_set_state+0x28a/0x7b0 net/bridge/br_stp.c:47
  br_forward_delay_timer_expired+0x176/0x440 net/bridge/br_stp_timer.c:88
  call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
  expire_timers kernel/time/timer.c:1843 [inline]
  __run_timers kernel/time/timer.c:2417 [inline]
  __run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
  run_timer_base kernel/time/timer.c:2437 [inline]
  run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
  handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
  __do_softirq kernel/softirq.c:588 [inline]
  invoke_softirq kernel/softirq.c:428 [inline]
  __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
  instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
  sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
  </IRQ>
  <TASK>
====================

Link: https://lore.kernel.org/r/20240609103654.914987-1-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: mst: fix suspicious rcu usage in br_mst_set_state · 546ceb1d

Nikolay Aleksandrov authored Jun 09, 2024

I converted br_mst_set_state to RCU to avoid a vlan use-after-free
but forgot to change the vlan group dereference helper. Switch to the
vlan group RCU dereference helper to fix the suspicious RCU usage warning.

Fixes: 3a7c1661 ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state · 36c92936

Nikolay Aleksandrov authored Jun 09, 2024

Pass the already obtained vlan group pointer to br_mst_vlan_set_state()
instead of dereferencing it again. Each caller has already correctly
dereferenced it for their context. This change is required for the
following suspicious RCU dereference fix. No functional changes
intended.

Fixes: 3a7c1661 ("net: bridge: mst: fix vlan use-after-free")
Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-flower-validate-encapsulation-control-flags' · 6fc1b322

Jakub Kicinski authored Jun 12, 2024

Asbjørn Sloth Tønnesen says:

====================
net: flower: validate encapsulation control flags

Now that all drivers properly reject unsupported flower control flags
used with FLOW_DISSECTOR_KEY_CONTROL, the time has come to add similar
checks to the drivers supporting FLOW_DISSECTOR_KEY_ENC_CONTROL.

There are currently just 4 drivers supporting this key, and
3 of those currently don't validate encapsulated control flags.

Encapsulation control flags may currently be unused, but they should
still be validated by the drivers, so that drivers will properly
reject any new flags when they are introduced.

This series adds some helper functions and uses them in all
4 drivers.

NB: Using encapsulation control flags for tunnel flags, instead of the
new FLOW_DISSECTOR_KEY_ENC_FLAGS, is currently being discussed [1].

[1] https://lore.kernel.org/netdev/ZmFuxElwZiYJzBkh@dcaratti.users.ipa.redhat.com/
====================

Link: https://lore.kernel.org/r/20240609173358.193178-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: flower: validate encapsulation control flags · 5a1b015d

Asbjørn Sloth Tønnesen authored Jun 09, 2024

Encapsulation control flags are currently not used anywhere,
so all flags are currently unsupported by all drivers.

This patch adds validation of this assumption, so that
encapsulation flags may be used in the future.

If any encapsulation control flags are masked,
flow_rule_has_enc_control_flags() sets a netlink extended
error message, and we return -EOPNOTSUPP.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240609173358.193178-6-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfp: flower: validate encapsulation control flags · 34cdd984

Asbjørn Sloth Tønnesen authored Jun 09, 2024

Encapsulation control flags are currently not used anywhere,
so all flags are currently unsupported by all drivers.

This patch adds validation of this assumption, so that
encapsulation flags may be used in the future.

If any encapsulation control flags are masked,
flow_rule_has_enc_control_flags() sets a netlink extended
error message, and we return -EOPNOTSUPP.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240609173358.193178-5-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: flower: validate encapsulation control flags · 28d19ec9

Asbjørn Sloth Tønnesen authored Jun 09, 2024

Encapsulation control flags are currently not used anywhere,
so all flags are currently unsupported by all drivers.

This patch adds validation of this assumption, so that
encapsulation flags may be used in the future.

If any encapsulation control flags are masked,
flow_rule_has_enc_control_flags() sets a netlink extended
error message, and we return -EOPNOTSUPP.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240609173358.193178-4-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: use flow_rule_is_supp_enc_control_flags() · 2ede54f8

Asbjørn Sloth Tønnesen authored Jun 09, 2024

Change the existing check for unsupported encapsulation control flags
to use the new helper flow_rule_is_supp_enc_control_flags().

No functional change, only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240609173358.193178-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

flow_offload: add encapsulation control flag helpers · b48a1540

Asbjørn Sloth Tønnesen authored Jun 09, 2024

This patch adds two new helper functions:
  flow_rule_is_supp_enc_control_flags()
  flow_rule_has_enc_control_flags()

They are intended to be used for validating encapsulation control
flags, and complement the similar helpers without "enc_" in the name.

The only difference is that they have their own error message,
to make it obvious if an unsupported flag error is related to
FLOW_DISSECTOR_KEY_CONTROL or FLOW_DISSECTOR_KEY_ENC_CONTROL.

flow_rule_has_enc_control_flags() is for drivers supporting
FLOW_DISSECTOR_KEY_ENC_CONTROL, but not supporting any
encapsulation control flags.
(Currently, all 4 drivers fit this category.)

flow_rule_is_supp_enc_control_flags() is currently only used
by the above helper, but should also be used by drivers once
they implement at least one encapsulation control flag.
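
The relationship between the two helpers can be sketched in Python. The helper names follow the commit, but the real helpers are C functions taking a struct netlink_ext_ack; `ExtAck`, the exact error-message text and `driver_parse_enc_control()` are assumptions made for illustration.

```python
import errno

class ExtAck:
    """Stand-in for struct netlink_ext_ack: just records the message."""
    def __init__(self):
        self.msg = None

def flow_rule_is_supp_enc_control_flags(supp_enc_flags, enc_ctrl_flags, extack):
    """True if every masked encapsulation control flag is supported."""
    if (enc_ctrl_flags & ~supp_enc_flags) == 0:
        return True
    extack.msg = f"Unsupported match on enc_control.flags {enc_ctrl_flags:#x}"
    return False

def flow_rule_has_enc_control_flags(enc_ctrl_flags, extack):
    """For drivers that support no encapsulation control flags at all."""
    return not flow_rule_is_supp_enc_control_flags(0, enc_ctrl_flags, extack)

def driver_parse_enc_control(enc_ctrl_mask_flags, extack):
    """Hypothetical driver hook: reject any masked enc control flag."""
    if flow_rule_has_enc_control_flags(enc_ctrl_mask_flags, extack):
        return -errno.EOPNOTSUPP
    return 0
```

Expressing "supports nothing" as "supported set is 0" keeps a single code path for the error message, so a driver can later switch to flow_rule_is_supp_enc_control_flags() with a non-zero mask once it implements a flag.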

There is AFAICT currently no need for an "enc_" variant of
flow_rule_match_has_control_flags(), as all drivers currently
supporting FLOW_DISSECTOR_KEY_ENC_CONTROL are already calling
flow_rule_match_enc_control() directly.

Only compile tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/20240609173358.193178-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/ipv6: Fix the RT cache flush via sysctl using a previous delay · 14a20e5b

Petr Pavlu authored Jun 07, 2024

The net.ipv6.route.flush system parameter takes a value which specifies
a delay used during the flush operation for aging exception routes.
However, the written value is not used for the currently requested
flush; it only takes effect in the next one.

The problem is that ipv6_sysctl_rtcache_flush() first reads the old value
of net->ipv6.sysctl.flush_delay into a local delay variable and only then
calls proc_dointvec(), which actually updates the sysctl based on the
provided input.

Fix the problem by switching the order of the two operations.
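
The ordering bug can be modeled in a few lines of Python. `Sysctl`, `proc_dointvec()` and the two flush functions are simplified, hypothetical stand-ins that mirror the names above; each flush function returns the delay that would be passed to the flush.

```python
class Sysctl:
    def __init__(self, flush_delay=0):
        self.flush_delay = flush_delay

def proc_dointvec(sysctl, written):
    """Stand-in for proc_dointvec(): store the value the user wrote."""
    sysctl.flush_delay = written

def rtcache_flush_buggy(sysctl, written):
    delay = sysctl.flush_delay      # reads the *previous* value
    proc_dointvec(sysctl, written)
    return delay                    # stale delay used for this flush

def rtcache_flush_fixed(sysctl, written):
    proc_dointvec(sysctl, written)  # update first...
    return sysctl.flush_delay       # ...then use the value just written
```

With the buggy order, writing 5 and then 7 flushes with delays 0 and 5; with the fixed order the first write already flushes with delay 5.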

Fixes: 4990509f ("[NETNS][IPV6]: Make sysctls route per namespace.")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEs · dee4dd10

Elad Yifee authored Jun 07, 2024

Add the missing pieces to allow multiple PPE units, one for each GMAC.
mtk_gdm_config has been modified to work on a targeted MAC ID; its
inner loop was moved outside of the function to allow unrelated
operations, like setting the MAC's PPE index.
Introduce a sanity check in flow_offload_replace to account for
non-MTK ingress devices.
An additional field, 'ppe_idx', was added to struct mtk_mac in order
to keep track of the assigned PPE unit.

Signed-off-by: Elad Yifee <eladwf@gmail.com>
Link: https://lore.kernel.org/r/20240607082155.20021-1-eladwf@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

12 Jun, 2024 16 commits

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux · 2ccbdf43

Linus Torvalds authored Jun 12, 2024

Pull ARM and clkdev fixes from Russell King:

 - Fix clkdev - erroring out on long strings causes boot failures, so
   don't do this. Still warn about the over-sized strings (which will
   never match and thus their registration with clkdev is useless)

 - Fix for ftrace with frame pointer unwinder with recent GCC changing
   the way frames are stacked.

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9405/1: ftrace: Don't assume stack frames are contiguous in memory
  clkdev: don't fail clkdev_alloc() if over-sized

Merge branch 'allow-configuration-of-multipath-hash-seed' · 05f43db7

Jakub Kicinski authored Jun 12, 2024

Petr Machata says:

====================
Allow configuration of multipath hash seed

Let me just quote the commit message of patch #2 here to explain the
motivation and some of the implementation:

    When calculating hashes for the purpose of multipath forwarding,
    both IPv4 and IPv6 code currently fall back on
    flow_hash_from_keys(). That uses a randomly-generated seed. That's a
    fine choice by default, but unfortunately some deployments may need
    tighter control over the seed used.

    In this patchset, make the seed configurable by adding a new sysctl
    key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed
    is used specifically for multipath forwarding and not for the other
    concerns that flow_hash_from_keys() is used for, such as queue
    selection. Expose the knob as sysctl because other such settings,
    such as headers to hash, are also handled that way.

    Despite being placed in the net.ipv4 namespace, the multipath seed
    sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
    TCP variables. Like those, the multipath hash seed is a per-netns
    variable.

    The seed used by flow_hash_from_keys() is a 128-bit quantity.
    However it seems that usually the seed is a much more modest value.
    32 bits seem typical (Cisco, Cumulus), some systems go even lower.
    For that reason, and to decouple the user interface from
    implementation details, go with a 32-bit quantity, which is then
    quadruplicated to form the siphash key.
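
The quadruplication can be sketched in Python. The commit only says the 32-bit seed is quadruplicated to form the siphash key; the little-endian packing, the function name and the range check are assumptions of this sketch. The kernel's siphash key is two 64-bit words, so with this layout both words end up identical.

```python
import struct

def mp_hash_key_from_seed(seed: int) -> bytes:
    """Expand a 32-bit user seed into a 128-bit (16-byte) hash key.

    The 32-bit value is repeated four times; little-endian packing is
    an assumption of this sketch, not taken from the kernel source.
    """
    if not 0 <= seed <= 0xFFFFFFFF:
        raise ValueError("seed must fit in 32 bits")
    return struct.pack("<4I", seed, seed, seed, seed)
```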

One example of use of this interface is avoiding hash polarization,
where two ECMP routers, one behind the other, happen to make consistent
hashing decisions, and as a result, part of the ECMP space of the latter
router is never used. Another is a load balancer where several machines
forward traffic to one of a number of leaves, and the forwarding
decisions need to be made consistently. (This is a case of a desired
hash polarization, mentioned e.g. in chapter 6.3 of [0].)
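
Hash polarization can be demonstrated with a small Python simulation. It is a sketch under stated assumptions: keyed BLAKE2b (from hashlib) stands in for the kernel's siphash, the flow strings and the two-router topology are made up, and picking a next hop by `hash % n_paths` simplifies how real routers map a hash onto next hops. Router A sends the flows that hash to its path 0 on to router B, which also has 2 paths.

```python
import hashlib

def flow_hash(flow: bytes, seed: int) -> int:
    """Keyed 32-bit flow hash; BLAKE2b stands in for siphash here."""
    key = seed.to_bytes(4, "little") * 4   # 16-byte key from a 32-bit seed
    digest = hashlib.blake2b(flow, key=key, digest_size=4).digest()
    return int.from_bytes(digest, "little")

def pick_nexthop(flow: bytes, seed: int, n_paths: int) -> int:
    return flow_hash(flow, seed) % n_paths

def paths_used_by_second_router(seed_a: int, seed_b: int, n_flows: int = 1000) -> set:
    """Which of router B's 2 paths carry traffic that A sent on its path 0?"""
    flows = [f"10.0.0.{i % 250}:{1024 + i}".encode() for i in range(n_flows)]
    to_b = [f for f in flows if pick_nexthop(f, seed_a, 2) == 0]
    return {pick_nexthop(f, seed_b, 2) for f in to_b}
```

With identical seeds, every flow that reaches B already hashed to 0, so B reuses path 0 for all of it and its path 1 stays idle; giving B a different seed spreads the same flows over both paths.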

There has already been a proposal to include a hash seed control
interface in the past [1].

- Patches #1-#2 contain the substance of the work
- Patch #3 is an mlxsw offload
- Patches #4 and #5 are a selftest

[0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
[1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/
====================

Link: https://lore.kernel.org/r/20240607151357.421181-1-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: router_mpath_hash: Add a new selftest · 5f90d93b

Petr Machata authored Jun 07, 2024

Add a selftest that exercises the sysctl added in the previous patches.

Test that set/get works as expected; that across seeds we eventually hit
all NHs (test_mpath_seed_*); and that a given seed keeps hitting the same
NHs even across seed changes (test_mpath_seed_stability_*).

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607151357.421181-6-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: lib: Split sysctl_save() out of sysctl_set() · 6f51aed3

Petr Machata authored Jun 07, 2024

In order to be able to save the current value of a sysctl without changing
it, split the relevant bit out of sysctl_set() into a new helper.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607151357.421181-5-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_router: Apply user-defined multipath hash seed · 60bcfede

Petr Machata authored Jun 07, 2024

When Spectrum machines compute a hash for the purpose of ECMP routing, they
use a seed specified through RECR_v2 (Router ECMP Configuration Register).
Up until now mlxsw computed the seed by hashing the machine's base MAC.
Now that we can optionally have a user-provided seed, use that if possible.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607151357.421181-4-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

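The seed selection described above amounts to a simple fallback. In this sketch the `have_user_seed` flag and the FNV-1a hash of the base MAC are assumptions: the commit message does not say how the driver detects a user-provided seed or which hash it applies to the MAC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "hashing the machine's base MAC":
 * 32-bit FNV-1a over the six MAC bytes. */
static uint32_t toy_hash_base_mac(const uint8_t mac[6])
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < 6; i++) {
		h ^= mac[i];
		h *= 16777619u;
	}
	return h;
}

/* Seed to program into the ECMP configuration: prefer a seed the user
 * configured, otherwise fall back to the MAC-derived value. */
static uint32_t toy_ecmp_seed(bool have_user_seed, uint32_t user_seed,
			      const uint8_t base_mac[6])
{
	return have_user_seed ? user_seed : toy_hash_base_mac(base_mac);
}
```

Deriving the fallback from the base MAC gives each switch a stable seed that usually differs between machines, while a user-provided seed lets operators make several switches agree when consistent hashing is desired.
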
net: ipv4: Add a sysctl to set multipath hash seed · 4ee2a8ca

Petr Machata authored Jun 07, 2024

When calculating hashes for the purpose of multipath forwarding, both IPv4
and IPv6 code currently fall back on flow_hash_from_keys(), which uses a
randomly generated seed. That is a fine default, but some deployments may
need tighter control over the seed used.

In this patch, make the seed configurable by adding a new sysctl key,
net.ipv4.fib_multipath_hash_seed. This seed is used
specifically for multipath forwarding and not for the other concerns that
flow_hash_from_keys() is used for, such as queue selection. Expose the knob
as a sysctl because related settings, such as the headers to hash, are also
handled that way. Like those, the multipath hash seed is a per-netns
variable.

Despite being placed in the net.ipv4 namespace, the multipath seed sysctl
is used for both IPv4 and IPv6, similarly to e.g. a number of TCP
variables.

The seed used by flow_hash_from_keys() is a 128-bit quantity. However, it
seems that the seed is usually a much more modest value: 32 bits seem
typical (Cisco, Cumulus), and some systems go even lower. For that reason,
and to decouple the user interface from implementation details, go with a
32-bit quantity, which is then quadruplicated to form the siphash key.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607151357.421181-3-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

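Quadruplicating a 32-bit seed into a 128-bit siphash key could look like the sketch below. `struct toy_siphash_key` mirrors the shape of the kernel's `siphash_key_t` (two 64-bit words), but the exact layout the kernel uses for the replicated seed is not stated in the commit message, so placing the seed in both halves of each word is an assumption.

```c
#include <stdint.h>

/* Local stand-in with the same shape as siphash_key_t:
 * 128 bits of key material as two 64-bit words. */
struct toy_siphash_key {
	uint64_t key[2];
};

/* Replicate a 32-bit seed four times to fill all 128 key bits. */
static struct toy_siphash_key toy_key_from_seed(uint32_t seed)
{
	uint64_t word = ((uint64_t)seed << 32) | seed;
	struct toy_siphash_key k = { { word, word } };

	return k;
}
```

The resulting key has only 2^32 distinct values, matching the width of the user-facing sysctl; the replication just spreads that material over the full key width siphash expects.
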
net: ipv4,ipv6: Pass multipath hash computation through a helper · 3e453ca1

Petr Machata authored Jun 07, 2024

The following patches will add a sysctl to control the multipath hash
seed. In order to centralize the hash computation, add a helper,
fib_multipath_hash_from_keys(), and have all IPv4 and IPv6 route.c
invocations of flow_hash_from_keys() go through this helper instead.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240607151357.421181-2-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

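Funnelling every caller through one helper means a later change, such as a configurable seed, touches a single function. The sketch below shows the shape such a helper might take once a per-netns seed exists. `struct toy_net`, `struct toy_flow_keys` and both functions are hypothetical simplifications of `struct net`, `struct flow_keys` and `fib_multipath_hash_from_keys()`, and the FNV-style mixing is not siphash.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins. */
struct toy_net {
	uint32_t mp_seed;	/* per-netns multipath hash seed */
};

struct toy_flow_keys {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Seeded hash over the flow keys (FNV-style mixing of 32-bit words). */
static uint32_t toy_hash_keys_seed(const struct toy_flow_keys *k, uint32_t seed)
{
	const uint32_t words[4] = { k->saddr, k->daddr,
				    ((uint32_t)k->sport << 16) | k->dport,
				    k->proto };
	uint32_t h = 2166136261u ^ seed;

	for (int i = 0; i < 4; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return h;
}

/* Single entry point for all multipath callers: the seed is looked up
 * here, so callers never deal with it directly. */
static uint32_t toy_multipath_hash_from_keys(const struct toy_net *net,
					     const struct toy_flow_keys *k)
{
	return toy_hash_keys_seed(k, net->mp_seed);
}
```
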
Merge tag 'nf-24-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · d92589f8

Jakub Kicinski authored Jun 12, 2024

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 fixes insufficient sanitization of netlink attributes for the
         inner expression, which can trigger a NULL-pointer dereference,
         from Davide Ornaghi.

Patch #2 addresses a report of a race condition between namespace
         cleanup and the garbage collection of the list:set type. The
         patch resolves this issue along with other minor issues,
         from Jozsef Kadlecsik.

Patch #3 fixes ip6_route_me_harder() ignoring flowlabel/dsfield when
         the IP DSCP has been mangled; this unbreaks "ip6 dscp set $v",
         from Florian Westphal.

All of these patches address issues that are present in several releases.

* tag 'nf-24-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: Use flowlabel flow key when re-routing mangled packets
  netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
  netfilter: nft_inner: validate mandatory meta and payload
====================

Link: https://lore.kernel.org/r/20240611220323.413713-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

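Patch #3 concerns two fields that share the first 32-bit word of the IPv6 header. The sketch below only shows where they sit: version (4 bits), traffic class (8 bits, whose upper 6 bits are the DSCP) and flow label (20 bits). The struct and function names are hypothetical; the kernel uses its own helpers, such as `ip6_flowinfo()`.

```c
#include <stdint.h>

/* Fields packed into the first 32-bit word of an IPv6 header:
 * version (4 bits) | traffic class (8 bits) | flow label (20 bits). */
struct toy_ip6_flowinfo {
	uint8_t version;
	uint8_t tclass;
	uint8_t dscp;		/* upper 6 bits of the traffic class */
	uint32_t flowlabel;
};

/* Decode the first four header bytes (network byte order). */
static struct toy_ip6_flowinfo toy_parse_ip6_flowinfo(const uint8_t hdr[4])
{
	uint32_t w = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
		     ((uint32_t)hdr[2] << 8) | hdr[3];
	struct toy_ip6_flowinfo fi;

	fi.version = (uint8_t)(w >> 28);
	fi.tclass = (uint8_t)((w >> 20) & 0xff);
	fi.dscp = fi.tclass >> 2;
	fi.flowlabel = w & 0xfffff;
	return fi;
}
```

For example, the bytes `6b 81 23 45` decode to version 6, traffic class 0xb8 (DSCP 46) and flow label 0x12345. The fix makes the re-route lookup take these possibly rewritten fields into account instead of ignoring them.
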
Merge tag 'bcachefs-2024-06-12' of https://evilpiepirate.org/git/bcachefs · 0b4989eb

Linus Torvalds authored Jun 12, 2024

Pull bcachefs fixes from Kent Overstreet:

 - fix kworker explosion, due to calling submit_bio() (which can block)
   from a multithreaded workqueue

 - fix error handling in btree node scan

 - forward compat fix: kill an old debug assert

 - key cache shrinker fixes

   This is a partial fix for stalls doing multithreaded creates - there
   were various O(n^2) issues the key cache shrinker was hitting [1].

   There's more work coming here; I'm working on a patch to delete the
   key cache lock, which initial testing shows to be a pretty drastic
   performance improvement.

 - assorted syzbot fixes

Link: https://lore.kernel.org/linux-bcachefs/CAGudoHGenxzk0ZqPXXi1_QDbfqQhGHu+wUwzyS6WmfkUZ1HiXA@mail.gmail.com/ [1]

* tag 'bcachefs-2024-06-12' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix rcu_read_lock() leak in drop_extra_replicas
  bcachefs: Add missing bch_inode_info.ei_flags init
  bcachefs: Add missing synchronize_srcu_expedited() call when shutting down
  bcachefs: Check for invalid bucket from bucket_gen(), gc_bucket()
  bcachefs: Replace bucket_valid() asserts in bucket lookup with proper checks
  bcachefs: Fix snapshot_create_lock lock ordering
  bcachefs: Fix refcount leak in check_fix_ptrs()
  bcachefs: Leave a buffer in the btree key cache to avoid lock thrashing
  bcachefs: Fix reporting of freed objects from key cache shrinker
  bcachefs: set sb->s_shrinker->seeks = 0
  bcachefs: increase key cache shrinker batch size
  bcachefs: Enable automatic shrinking for rhashtables
  bcachefs: fix the display format for show-super
  bcachefs: fix stack frame size in fsck.c
  bcachefs: Delete incorrect BTREE_ID_NR assertion
  bcachefs: Fix incorrect error handling found_btree_node_is_readable()
  bcachefs: Split out btree_write_submit_wq

net: xilinx: axienet: Use NL_SET_ERR_MSG instead of netdev_err · 32b06603

Sean Anderson authored Jun 11, 2024

This error message can be triggered by userspace. Use NL_SET_ERR_MSG so
the message is returned to the user and to avoid polluting the kernel
logs. Additionally, change the return value from EFAULT to EBUSY to
better reflect the error (which has nothing to do with addressing).
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/20240611154116.2643662-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ravb: RAVB should select PAGE_POOL · 721478fe

Geert Uytterhoeven authored Jun 11, 2024

If CONFIG_PAGE_POOL is not enabled:

aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected!
aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected!
aarch64-linux-gnu-ld: drivers/net/ethernet/renesas/ravb_main.o: in function `ravb_rx_ring_refill':
ravb_main.c:(.text+0x8d8): undefined reference to `page_pool_alloc_pages'
aarch64-linux-gnu-ld: ravb_main.c:(.text+0x944): undefined reference to `page_pool_alloc_frag'
aarch64-linux-gnu-ld: drivers/net/ethernet/renesas/ravb_main.o: in function `ravb_ring_init':
ravb_main.c:(.text+0x1d4c): undefined reference to `page_pool_create'

Fixes: 96672632 ("net: ravb: Allocate RX buffers via page pool")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Paul Barker <paul.barker.ct@bp.renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/fa61b464ae1aa7630e9024f091991937941d49f1.1718113630.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-flow-dissector-allow-explicit-passing-of-netns' · d2675fe9

Jakub Kicinski authored Jun 12, 2024

Florian Westphal says:

====================
net: flow dissector: allow explicit passing of netns

Changes since last version:
 fixed a kdoc comment warning reported by the kbuild robot; no other changes,
 thus retaining the RvB tags from Eric and Willem.
 v1: https://lore.kernel.org/netdev/20240607083205.3000-1-fw@strlen.de/

Years ago, the flow dissector gained the ability to delegate flow dissection
to a BPF program, scoped per netns.

The netns is derived from skb->dev, and if that is not available, from
skb->sk.  If neither is set, we hit a (benign) WARN_ON_ONCE().

This WARN_ON_ONCE() can be triggered from netfilter.
Known skb origins are nf_send_reset() and IGMP messages generated by the
IPv4 stack.

Let's allow callers to pass the current netns explicitly, and make
nf_tables do so.

This targets net-next instead of net because the WARN is benign and this
is not a regression.
====================

Link: https://lore.kernel.org/r/20240608221057.16070-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

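The netns fallback chain and the explicit-netns variant described in this series can be modelled in userspace. All types and functions below are hypothetical simplifications; they correspond only loosely to the flow dissector's netns lookup and to the new `*_net()` helpers with their backward-compatible wrappers.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, minimal stand-ins for struct net, net_device, sock and
 * sk_buff: just enough to show where a netns can come from. */
struct toy_net { int id; };
struct toy_dev { struct toy_net *net; };
struct toy_sock { struct toy_net *net; };
struct toy_skb {
	struct toy_dev *dev;
	struct toy_sock *sk;
};

/* Implicit lookup: try skb->dev, then skb->sk. With neither set there
 * is no netns to use (the kernel hits a WARN_ON_ONCE() here). */
static struct toy_net *toy_skb_net(const struct toy_skb *skb)
{
	if (skb->dev)
		return skb->dev->net;
	if (skb->sk)
		return skb->sk->net;
	fprintf(stderr, "warning: no netns for skb\n");
	return NULL;
}

/* Explicit variant: the caller passes the netns it already knows.
 * Returns the id of the netns used, or -1 if there is none. */
static int toy_dissect_net(const struct toy_net *net, const struct toy_skb *skb)
{
	(void)skb;
	return net ? net->id : -1;
}

/* The old entry point becomes a thin wrapper, so existing callers keep
 * working unchanged. */
static int toy_dissect(const struct toy_skb *skb)
{
	return toy_dissect_net(toy_skb_net(skb), skb);
}
```

An skb with neither `dev` nor `sk`, like the one built by nf_send_reset(), falls through to the warning path in the wrapper, while a caller that passes its netns explicitly never needs the fallback.
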
net: add and use __skb_get_hash_symmetric_net · d1dab4f7

Florian Westphal authored Jun 09, 2024

Similar to the previous patch: apply the same logic to
__skb_get_hash_symmetric() and let callers pass the netns to the dissector
core.

The existing function is turned into a wrapper to avoid adjusting all
callers; nft_hash.c uses the new function.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240608221057.16070-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: add and use skb_get_hash_net · b975d3ee

Florian Westphal authored Jun 09, 2024

Years ago, the flow dissector gained the ability to delegate flow dissection
to a BPF program, scoped per netns.

Unfortunately, skb_get_hash() only gets an sk_buff argument instead
of both net+skb.  This means the flow dissector needs to obtain the
netns pointer from somewhere else.

The netns is derived from skb->dev, and if that is not available, from
skb->sk.  If neither is set, we hit a (benign) WARN_ON_ONCE().

Trying both dev and sk covers most cases, but not all, as recently
reported by Christoph Paasch.

In the case of an nf-generated TCP reset, both sk and dev are NULL:

WARNING: .. net/core/flow_dissector.c:1104
 skb_flow_dissect_flow_keys include/linux/skbuff.h:1536 [inline]
 skb_get_hash include/linux/skbuff.h:1578 [inline]
 nft_trace_init+0x7d/0x120 net/netfilter/nf_tables_trace.c:320
 nft_do_chain+0xb26/0xb90 net/netfilter/nf_tables_core.c:268
 nft_do_chain_ipv4+0x7a/0xa0 net/netfilter/nft_chain_filter.c:23
 nf_hook_slow+0x57/0x160 net/netfilter/core.c:626
 __ip_local_out+0x21d/0x260 net/ipv4/ip_output.c:118
 ip_local_out+0x26/0x1e0 net/ipv4/ip_output.c:127
 nf_send_reset+0x58c/0x700 net/ipv4/netfilter/nf_reject_ipv4.c:308
 nft_reject_ipv4_eval+0x53/0x90 net/ipv4/netfilter/nft_reject_ipv4.c:30
 [..]

syzkaller did something like this:
table inet filter {
  chain input {
    type filter hook input priority filter; policy accept;
    meta nftrace set 1
    tcp dport 42 reject with tcp reset
  }
  chain output {
    type filter hook output priority filter; policy accept;
    # empty chain is enough
  }
}

... then sends a tcp packet to port 42.

An initial attempt to simply set skb->dev from nf_reject_ipv4 doesn't cover
all cases: skbs generated via the IPv4 igmp_send_report() trigger a similar
splat.

Moreover, Pablo Neira found that nft_hash.c uses __skb_get_hash_symmetric(),
which would trigger the same warning splat for such skbs.

Let's allow callers to pass the current netns explicitly.
The nf_trace infrastructure is adjusted to use the new helper.

__skb_get_hash_symmetric is handled in the next patch.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/494
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240608221057.16070-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mana: Allow variable size indirection table · 7fc45cb6

Shradha Gupta authored Jun 10, 2024

Allow variable size indirection table allocation in MANA instead
of using a constant value MANA_INDIRECT_TABLE_SIZE.
The size is now derived from the MANA_QUERY_VPORT_CONFIG and the
indirection table is allocated dynamically.
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Link: https://lore.kernel.org/r/1718015319-9609-1-git-send-email-shradhagupta@linux.microsoft.com
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

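Sizing an indirection table at run time instead of with a compile-time constant could look like the sketch below. The struct, the function names and the round-robin fill (the common default RSS layout) are assumptions for illustration; the commit only states that the size now comes from MANA_QUERY_VPORT_CONFIG and that the table is allocated dynamically.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical port state: the table size is a run-time value (e.g.
 * reported by the device) instead of a compile-time constant. */
struct toy_port {
	uint32_t *indir_table;
	uint32_t indir_table_sz;
	uint32_t num_queues;
};

/* Allocate the indirection table and spread entries round-robin over
 * the RX queues. Returns 0 on success, -1 on failure. */
static int toy_alloc_indir_table(struct toy_port *port, uint32_t table_sz)
{
	if (table_sz == 0 || port->num_queues == 0)
		return -1;
	port->indir_table = calloc(table_sz, sizeof(*port->indir_table));
	if (!port->indir_table)
		return -1;
	port->indir_table_sz = table_sz;
	for (uint32_t i = 0; i < table_sz; i++)
		port->indir_table[i] = i % port->num_queues;
	return 0;
}

static void toy_free_indir_table(struct toy_port *port)
{
	free(port->indir_table);
	port->indir_table = NULL;
	port->indir_table_sz = 0;
}
```

Keeping the size next to the pointer lets every loop over the table use `indir_table_sz` instead of a constant.
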
mailmap: Add my outdated addresses to the map file · cea2a265

Andy Shevchenko authored Jun 11, 2024

There are a couple of outdated addresses that are still visible
in the Git history; add them to .mailmap.

While at it, replace one in the comment.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>