Commits · 8750539ba3178c2bb0d178a30ce57dae132cbbb8 · Kirill Smelkov / linux

11 Apr, 2024 8 commits

net: team: fix incorrect maxattr · 8750539b

Hangbin Liu authored Apr 09, 2024

The maxattr should be the latest attr value, i.e. array size - 1,
not total array size.

Reported-by: syzbot+ecd7e07b4be038658c9f@syzkaller.appspotmail.com
Fixes: 948dbafc ("net: team: use policy generated by YAML spec")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240409092812.3999785-1-liuhangbin@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8750539b

net: wan: fsl_qmc_hdlc: Convert to platform remove callback returning void · 07409cf7

Uwe Kleine-König authored Apr 09, 2024

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Herve Codina <herve.codina@bootlin.com>
Link: https://lore.kernel.org/r/20240409091203.39062-2-u.kleine-koenig@pengutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

07409cf7

doc/netlink/specs: Add bond support to rt_link.yaml · 4ede4575

Hangbin Liu authored Apr 09, 2024

Add bond support to rt_link.yaml. Here is an example output:

 $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml \
   --do getlink --json '{"ifname": "bond0"}' --output-json | jq '.linkinfo'
 {
   "kind": "bond",
   "data": {
     "mode": 4,
     "miimon": 100,
     ...
     "arp-interval": 0,
     "arp-ip-target": [
       "192.168.1.1",
       "192.168.1.2"
     ],
     "arp-validate": 0,
     "arp-all-targets": 0,
     "ns-ip6-target": [
       "2001::1",
       "2001::2"
     ],
     "primary-reselect": 0,
     ...
     "missed-max": 2,
     "ad-info": {
       "aggregator": 1,
       "num-ports": 1,
       "actor-key": 0,
       "partner-key": 1,
       "partner-mac": "00:00:00:00:00:00"
     }
   }
 }

And here is the downlink info.

 $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_link.yaml \
   --do getlink --json '{"ifname": "dummy0"}' --output-json | jq '.linkinfo'
 {
   "kind": "dummy",
   "slave-kind": "bond",
   "slave-data": {
     "state": 0,
     "mii-status": 0,
     "link-failure-count": 0,
     "perm-hwaddr": "f2:82:f7:cc:47:13",
     "queue-id": 0,
     "prio": 0
   }
 }
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240409083504.3900877-1-liuhangbin@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4ede4575

ethtool: update tsinfo statistics attribute docs with correct type · 65f35aa7

Rahul Rameshbabu authored Apr 09, 2024

nla_put_uint can either write a u32 or u64 netlink attribute value. The
size depends on whether the value can be represented with a u32 or requires
a u64. Use a uint annotation in various documentation to represent this.
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Link: https://lore.kernel.org/r/20240409232520.237613-2-rrameshbabu@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

65f35aa7

Merge branch 'optimise-local-cpu-skb_attempt_defer_free' · 52a85468

Jakub Kicinski authored Apr 10, 2024

Pavel Begunkov says:

====================
optimise local CPU skb_attempt_defer_free

Optimise the case when an skb comes to skb_attempt_defer_free()
on the same CPU it was allocated on. The patch 1 enables skb caches
and gives frags a chance to hit the page pool's fast path.
CPU bound benchmarking with perfect skb_attempt_defer_free()
gives around 1% of extra throughput.
====================

Link: https://lore.kernel.org/r/cover.1712711977.git.asml.silence@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

52a85468

net: use SKB_CONSUMED in skb_attempt_defer_free() · d8415a16

Pavel Begunkov authored Apr 10, 2024

skb_attempt_defer_free() is used to free already processed skbs, so pass
SKB_CONSUMED as the reason in kfree_skb_napi_cache().
Suggested-by: Jason Xing <kerneljasonxing@gmail.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/bcf5dbdda79688b074ab7ae2238535840a6d3fc2.1712711977.git.asml.silence@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d8415a16

net: cache for same cpu skb_attempt_defer_free · 7cb31c46

Pavel Begunkov authored Apr 10, 2024

Optimise skb_attempt_defer_free() when run by the same CPU the skb was
allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
disable softirqs and put the buffer into cpu local caches.

CPU bound TCP ping pong style benchmarking (i.e. netbench) showed a 1%
throughput increase (392.2 -> 396.4 Krps). Cross checking with profiles,
the total CPU share of skb_attempt_defer_free() dropped by 0.6%. Note,
I'd expect the win doubled with rx only benchmarks, as the optimisation
is for the receive path, but the test spends >55% of CPU doing writes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/a887463fb219d973ec5ad275e31194812571f1f5.1712711977.git.asml.silence@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7cb31c46

tcp: tweak tcp_sock_write_txrx size assertion · 9b9fd458

Eric Dumazet authored Apr 09, 2024

I forgot 32bit arches might have 64bit alignment for u64
fields.

tcp_sock_write_txrx group does not contain pointers,
but two u64 fields. It is possible that on 32bit kernel,
a 32bit hole is before tp->tcp_clock_cache.

I will try to remember a group can be bigger on 32bit
kernels in the future.

With help from Vladimir Oltean.

Fixes: d2c3a7eb ("tcp: more struct tcp_sock adjustments")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404082207.HCEdQhUO-lkp@intel.com/Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240409140914.4105429-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9b9fd458

10 Apr, 2024 16 commits

Merge branch 'selftests-move-bpf-offload-test-from-bpf-to-net' · 414e576f

Jakub Kicinski authored Apr 10, 2024

Jakub Kicinski says:

====================
selftests: move bpf-offload test from bpf to net

The test_offload.py test fits in networking and bpf equally
well. We started adding more Python tests in networking
and some of the code in test_offload.py can be reused,
so move it to networking. Looks like it bit rotted over
time and some fixes are needed.

Admittedly more code could be extracted but I only had
the time for a minor cleanup :(
====================

Link: https://lore.kernel.org/r/20240409031549.3531084-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

414e576f

selftests: net: reuse common code in bpf_offload · 6ce2b689

Jakub Kicinski authored Apr 08, 2024

net/lib/py/nsim.py already contains the most useful parts
of the netdevsim wrapper classes. Reuse them.
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240409031549.3531084-5-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6ce2b689

selftests: net: declare section names for bpf_offload · b1c2ce11

Jakub Kicinski authored Apr 08, 2024

Non-ancient ip (iproute2-5.15.0, libbpf 0.7.0) refuses to load
the sample with maps because we don't generate BTF:

   libbpf: BTF is required, but is missing or corrupted.
   ERROR: opening BPF object file failed

Enable BTF by adding -g to clang flags. With that done
neither of the programs load:

  libbpf: prog 'func': error relocating .BTF.ext function info: -22
  libbpf: prog 'func': failed to relocate calls: -22
  libbpf: failed to load object 'ksft-net-drv/net/sample_ret0.bpf.o'

Andrii explains that this is because we don't specify
section names for the code. Add the section names, too.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240409031549.3531084-4-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b1c2ce11

selftests: net: bpf_offload: wait for maps · fc50c698

Jakub Kicinski authored Apr 08, 2024

Maps are removed asynchronously. Either there's a bigger delay
now or the test has always been flaky. Retry waiting in the loop.
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240409031549.3531084-3-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fc50c698

selftests: move bpf-offload test from bpf to net · e59f0e93

Jakub Kicinski authored Apr 08, 2024

We're building more python tests on the netdev side, and some
of the classes from the venerable BPF offload tests can be reused.
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240409031549.3531084-2-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

e59f0e93

net: sched: cls_api: fix slab-use-after-free in fl_dump_key · 2ecd487b

Jianbo Liu authored Apr 08, 2024

The filter counter is updated under the protection of cb_lock in the
cited commit. While waiting for the lock, it's possible the filter is
being deleted by other thread, and thus causes UAF when dump it.

Fix this issue by moving tcf_block_filter_cnt_update() after
tfilter_put().

 ==================================================================
 BUG: KASAN: slab-use-after-free in fl_dump_key+0x1d3e/0x20d0 [cls_flower]
 Read of size 4 at addr ffff88814f864000 by task tc/2973

 CPU: 7 PID: 2973 Comm: tc Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_02_12_41 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x7e/0xc0
  print_report+0xc1/0x600
  ? __virt_addr_valid+0x1cf/0x390
  ? fl_dump_key+0x1d3e/0x20d0 [cls_flower]
  ? fl_dump_key+0x1d3e/0x20d0 [cls_flower]
  kasan_report+0xb9/0xf0
  ? fl_dump_key+0x1d3e/0x20d0 [cls_flower]
  fl_dump_key+0x1d3e/0x20d0 [cls_flower]
  ? lock_acquire+0x1c2/0x530
  ? fl_dump+0x172/0x5c0 [cls_flower]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? fl_dump_key_options.part.0+0x10f0/0x10f0 [cls_flower]
  ? do_raw_spin_lock+0x12d/0x270
  ? spin_bug+0x1d0/0x1d0
  fl_dump+0x21d/0x5c0 [cls_flower]
  ? fl_tmplt_dump+0x1f0/0x1f0 [cls_flower]
  ? nla_put+0x15f/0x1c0
  tcf_fill_node+0x51b/0x9a0
  ? tc_skb_ext_tc_enable+0x150/0x150
  ? __alloc_skb+0x17b/0x310
  ? __build_skb_around+0x340/0x340
  ? down_write+0x1b0/0x1e0
  tfilter_notify+0x1a5/0x390
  ? fl_terse_dump+0x400/0x400 [cls_flower]
  tc_new_tfilter+0x963/0x2170
  ? tc_del_tfilter+0x1490/0x1490
  ? print_usage_bug.part.0+0x670/0x670
  ? lock_downgrade+0x680/0x680
  ? security_capable+0x51/0x90
  ? tc_del_tfilter+0x1490/0x1490
  rtnetlink_rcv_msg+0x75e/0xac0
  ? if_nlmsg_stats_size+0x4c0/0x4c0
  ? lockdep_set_lock_cmp_fn+0x190/0x190
  ? __netlink_lookup+0x35e/0x6e0
  netlink_rcv_skb+0x12c/0x360
  ? if_nlmsg_stats_size+0x4c0/0x4c0
  ? netlink_ack+0x15e0/0x15e0
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? netlink_deliver_tap+0xcd/0xa60
  ? netlink_deliver_tap+0xcd/0xa60
  ? netlink_deliver_tap+0x1c9/0xa60
  netlink_unicast+0x43e/0x700
  ? netlink_attachskb+0x750/0x750
  ? lock_acquire+0x1c2/0x530
  ? __might_fault+0xbb/0x170
  netlink_sendmsg+0x749/0xc10
  ? netlink_unicast+0x700/0x700
  ? __might_fault+0xbb/0x170
  ? netlink_unicast+0x700/0x700
  __sock_sendmsg+0xc5/0x190
  ____sys_sendmsg+0x534/0x6b0
  ? import_iovec+0x7/0x10
  ? kernel_sendmsg+0x30/0x30
  ? __copy_msghdr+0x3c0/0x3c0
  ? entry_SYSCALL_64_after_hwframe+0x46/0x4e
  ? lock_acquire+0x1c2/0x530
  ? __virt_addr_valid+0x116/0x390
  ___sys_sendmsg+0xeb/0x170
  ? __virt_addr_valid+0x1ca/0x390
  ? copy_msghdr_from_user+0x110/0x110
  ? __delete_object+0xb8/0x100
  ? __virt_addr_valid+0x1cf/0x390
  ? do_sys_openat2+0x102/0x150
  ? lockdep_hardirqs_on_prepare+0x284/0x400
  ? do_sys_openat2+0x102/0x150
  ? __fget_light+0x53/0x1d0
  ? sockfd_lookup_light+0x1a/0x150
  __sys_sendmsg+0xb5/0x140
  ? __sys_sendmsg_sock+0x20/0x20
  ? lock_downgrade+0x680/0x680
  do_syscall_64+0x70/0x140
  entry_SYSCALL_64_after_hwframe+0x46/0x4e
 RIP: 0033:0x7f98e3713367
 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
 RSP: 002b:00007ffc74a64608 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 000000000047eae0 RCX: 00007f98e3713367
 RDX: 0000000000000000 RSI: 00007ffc74a64670 RDI: 0000000000000003
 RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000000
 R10: 00007f98e360c5e8 R11: 0000000000000246 R12: 00007ffc74a6a508
 R13: 00000000660d518d R14: 0000000000484a80 R15: 00007ffc74a6a50b
  </TASK>

 Allocated by task 2973:
  kasan_save_stack+0x20/0x40
  kasan_save_track+0x10/0x30
  __kasan_kmalloc+0x77/0x90
  fl_change+0x27a6/0x4540 [cls_flower]
  tc_new_tfilter+0x879/0x2170
  rtnetlink_rcv_msg+0x75e/0xac0
  netlink_rcv_skb+0x12c/0x360
  netlink_unicast+0x43e/0x700
  netlink_sendmsg+0x749/0xc10
  __sock_sendmsg+0xc5/0x190
  ____sys_sendmsg+0x534/0x6b0
  ___sys_sendmsg+0xeb/0x170
  __sys_sendmsg+0xb5/0x140
  do_syscall_64+0x70/0x140
  entry_SYSCALL_64_after_hwframe+0x46/0x4e

 Freed by task 283:
  kasan_save_stack+0x20/0x40
  kasan_save_track+0x10/0x30
  kasan_save_free_info+0x37/0x50
  poison_slab_object+0x105/0x190
  __kasan_slab_free+0x11/0x30
  kfree+0x111/0x340
  process_one_work+0x787/0x1490
  worker_thread+0x586/0xd30
  kthread+0x2df/0x3b0
  ret_from_fork+0x2d/0x70
  ret_from_fork_asm+0x11/0x20

 Last potentially related work creation:
  kasan_save_stack+0x20/0x40
  __kasan_record_aux_stack+0x9b/0xb0
  insert_work+0x25/0x1b0
  __queue_work+0x640/0xc90
  rcu_work_rcufn+0x42/0x70
  rcu_core+0x6a9/0x1850
  __do_softirq+0x264/0x88f

 Second to last potentially related work creation:
  kasan_save_stack+0x20/0x40
  __kasan_record_aux_stack+0x9b/0xb0
  __call_rcu_common.constprop.0+0x6f/0xac0
  queue_rcu_work+0x56/0x70
  fl_mask_put+0x20d/0x270 [cls_flower]
  __fl_delete+0x352/0x6b0 [cls_flower]
  fl_delete+0x97/0x160 [cls_flower]
  tc_del_tfilter+0x7d1/0x1490
  rtnetlink_rcv_msg+0x75e/0xac0
  netlink_rcv_skb+0x12c/0x360
  netlink_unicast+0x43e/0x700
  netlink_sendmsg+0x749/0xc10
  __sock_sendmsg+0xc5/0x190
  ____sys_sendmsg+0x534/0x6b0
  ___sys_sendmsg+0xeb/0x170
  __sys_sendmsg+0xb5/0x140
  do_syscall_64+0x70/0x140
  entry_SYSCALL_64_after_hwframe+0x46/0x4e

Fixes: 2081fd34 ("net: sched: cls_api: add filter counter")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Tested-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ecd487b

Merge branch 'minor-cleanups-to-skb-frag-ref-unref' · 811b8362

Jakub Kicinski authored Apr 09, 2024

Mina Almasry says:

====================
Minor cleanups to skb frag ref/unref (part)

This series is largely motivated by a recent discussion where there was
some confusion on how to properly ref/unref pp pages vs non pp pages:

https://lore.kernel.org/netdev/CAHS8izOoO-EovwMwAm9tLYetwikNPxC0FKyVGu1TPJWSz4bGoA@mail.gmail.com/T/#t

There is some subtely there because pp uses page->pp_ref_count for
refcounting, while non-pp uses get_page()/put_page() for ref counting.
Getting the refcounting pairs wrong can lead to kernel crash.
[...]

https://lore.kernel.org/lkml/CAHS8izN436pn3SndrzsCyhmqvJHLyxgCeDpWXA4r1ANt3RCDLQ@mail.gmail.com/T/
====================

Link: https://lore.kernel.org/r/20240408153000.2152844-1-almasrymina@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

811b8362

net: remove napi_frag_unref · f58f3c95

Mina Almasry authored Apr 08, 2024

With the changes in the last patches, napi_frag_unref() is now
reduandant. Remove it and use skb_page_unref directly.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240408153000.2152844-4-almasrymina@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f58f3c95

net: make napi_frag_unref reuse skb_page_unref · 959fa5c1

Mina Almasry authored Apr 08, 2024

The implementations of these 2 functions are almost identical. Remove
the implementation of napi_frag_unref, and make it a call into
skb_page_unref so we don't duplicate the implementation.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240408153000.2152844-2-almasrymina@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

959fa5c1

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 445e6030

Jakub Kicinski authored Apr 09, 2024

Tony Nguyen says:

====================
net/e1000e, igb, igc: Remove redundant runtime resume

Bjorn Helgaas says:

e1000e, igb, and igc all have code to runtime resume the device during
ethtool operations.

Since f32a2137 ("ethtool: runtime-resume netdev parent before ethtool
ioctl ops"), dev_ethtool() does this for us, so remove it from the
individual drivers.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Remove redundant runtime resume for ethtool ops
  igb: Remove redundant runtime resume for ethtool_ops
  e1000e: Remove redundant runtime resume for ethtool_ops
====================

Link: https://lore.kernel.org/r/20240408210849.3641172-1-anthony.l.nguyen@intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

445e6030

Merge branch 'bonding-remove-rtnl-from-three-sysfs-files' · 91f2210c

Jakub Kicinski authored Apr 09, 2024

Eric Dumazet says:

====================
bonding: remove RTNL from three sysfs files

First patch might fix a potential deadlock.
sysfs handlers should use rtnl_trylock() instead of rtnl_lock().

Following files can be read without acquiring RTNL :

- /sys/class/net/bonding_masters
- /sys/class/net/<name>/bonding/slaves
- /sys/class/net/<name>/bonding/queue_id
====================

Link: https://lore.kernel.org/r/20240408190437.2214473-1-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

91f2210c

bonding: no longer use RTNL in bonding_show_queue_id() · 662e451d

Eric Dumazet authored Apr 08, 2024

Annotate lockless reads of slave->queue_id.

Annotate writes of slave->queue_id.

Switch bonding_show_queue_id() to rcu_read_lock()
and bond_for_each_slave_rcu().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240408190437.2214473-4-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

662e451d

bonding: no longer use RTNL in bonding_show_slaves() · d67fed98

Eric Dumazet authored Apr 08, 2024

Slave devices are already RCU protected, simply
switch to bond_for_each_slave_rcu(),
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240408190437.2214473-3-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d67fed98

bonding: no longer use RTNL in bonding_show_bonds() · 6c5d1714

Eric Dumazet authored Apr 08, 2024

netdev structures are already RCU protected.

Change bond_init() and bond_uninit() to use RCU
enabled list_add_tail_rcu() and list_del_rcu().

Then bonding_show_bonds() can use rcu_read_lock()
while iterating through bn->dev_list.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20240408190437.2214473-2-edumazet@google.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6c5d1714

net: sched: cake: Optimize the number of function calls and branches in heap construction · d034d02d

Kuan-Wei Chiu authored Apr 09, 2024

When constructing a heap, heapify operations are required on all
non-leaf nodes. Thus, determining the index of the first non-leaf node
is crucial. In a heap, the left child's index of node i is 2 * i + 1
and the right child's index is 2 * i + 2. Node CAKE_MAX_TINS *
CAKE_QUEUES / 2 has its left and right children at indexes
CAKE_MAX_TINS * CAKE_QUEUES + 1 and CAKE_MAX_TINS * CAKE_QUEUES + 2,
respectively, which are beyond the heap's range, indicating it as a
leaf node. Conversely, node CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1 has a
left child at index CAKE_MAX_TINS * CAKE_QUEUES - 1, confirming its
non-leaf status. The loop should start from it since it's not a leaf
node.

By starting the loop from CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1, we
minimize function calls and branch condition evaluations. This
adjustment theoretically reduces two function calls (one for
cake_heapify() and another for cake_heap_get_backlog()) and five branch
evaluations (one for iterating all non-leaf nodes, one within
cake_heapify()'s while loop, and three more within the while loop
with if conditions).
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/20240408174716.751069-1-visitorckw@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d034d02d

cxgb4: flower: use NL_SET_ERR_MSG_MOD for validation errors · 545d95e5

Asbjørn Sloth Tønnesen authored Apr 08, 2024

Replace netdev_{warn,err} with NL_SET_ERR_MSG_{FMT_,}MOD
to better inform the user about the problem.

Only compile-tested, no access to HW.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://lore.kernel.org/r/20240408165506.94483-1-ast@fiberby.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

545d95e5

09 Apr, 2024 16 commits

net: phy: dp8382x: keep WOL settings across suspends · 9ef9ecfa

Catalin Popescu authored Apr 08, 2024

Unlike other ethernet PHYs from TI, PHY dp8382x has WOL enabled
at reset. The driver explicitly disables WOL in config_init callback
which is called during init and during resume from suspend. Hence,
WOL is unconditionally disabled during resume, even if it was enabled
before the suspend. We make sure that WOL configuration is persistent
across suspends.
Signed-off-by: Catalin Popescu <catalin.popescu@leica-geosystems.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240408082602.3654090-1-catalin.popescu@leica-geosystems.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9ef9ecfa

Merge branch 'net-phy-micrel-lan8814-enable-ptp_pf_perout' · 6a053f07

Paolo Abeni authored Apr 09, 2024

Horatiu Vultur says:

====================
net: phy: micrel: lan8814: Enable PTP_PF_PEROUT

Add support for PTP_PF_PEROUT to lan8814. First patch just enables
the LTC at probe time, such that it is not required to enable
timestamping to have the LTC enabled. While the second patch actually
adds support for PTP_PF_PEROUT.
====================

Link: https://lore.kernel.org/r/20240408064432.3881636-1-horatiu.vultur@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

6a053f07

net: phy: micrel: lan8814: Add support for PTP_PF_PEROUT · 9e63941b

Horatiu Vultur authored Apr 08, 2024

Lan8814 has 24 GPIOs but only 2 GPIOs (GPIO 0 and GPIO 1) can be
configured to generate period signals. And there are 2 events (EVENT_A
and EVENT_B) but these events are hardcoded to the GPIO 0 and GPIO 1.
These events are used to generate period signals. It is possible to
configure the length, the start time and the period of the signal by
configuring the event.

These events are generated by comparing the target time with the PHC
time. In case the PHC time is changed to a value bigger than the target
time + reload time, then it would generate only 1 event and then it
would stop because target time + reload time is smaller than PHC time.
Therefore it is required to change also the target time every time when
the PHC is changed. The same will apply also when the PHC time is
changed to a smaller value.

This was tested using:
testptp -i 1 -L 1,2
testptp -i 1 -p 1000000000 -w 200000000
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Divya Koppera <divya.koppera@microchip.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

9e63941b

net: phy: micrel: lan8814: Enable LTC at probe time · 9f6b3a49

Horatiu Vultur authored Apr 08, 2024

The LTC for lan8814 was enabled only if timestamping was enabled,
otherwise it would be stopped. Meaning that LTC will not increase by
itself. This might break other features that don't required timestamping
like generating 1PPS. Therefore enable the LTC at probe time.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

9f6b3a49

dt-bindings: net: rockchip-dwmac: use rgmii-id in example · 220d63f2

Sascha Hauer authored Apr 08, 2024

The dwmac supports specifying the RGMII clock delays, but it is
recommended to use rgmii-id and to specify the delays in the phy node
instead [1].

Change the example accordingly to no longer promote this undesired
setting.

[1] https://lore.kernel.org/all/1a0de7b4-f0f7-4080-ae48-f5ffa9e76be3@lunn.ch/Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Dragan Simic <dsimic@manjaro.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240408-rockchip-dwmac-rgmii-id-binding-v1-1-3886d1a8bd54@pengutronix.deSigned-off-by: Paolo Abeni <pabeni@redhat.com>

220d63f2

Merge branch 'tcp-fix-isn-selection-in-timewait-syn_recv' · d2fd6cf3

Paolo Abeni authored Apr 09, 2024

Eric Dumazet says:

====================
tcp: fix ISN selection in TIMEWAIT -> SYN_RECV

TCP can transform a TIMEWAIT socket into a SYN_RECV one from
a SYN packet, and the ISN of the SYNACK packet is normally
generated using TIMEWAIT tw_snd_nxt.

This SYN packet also bypasses normal checks against listen queue
being full or not.

Unfortunately this has been broken almost one decade ago.

This series fixes the issue, in two patches.

First patch refactors code to add tcp_tw_isn as a parameter
to ->route_req(), to make the second patch smaller.

Second patch fixes the issue, by no longer using TCP_SKB_CB(skb)
to store the tcp_tw_isn.

Following packetdrill test passes after this series:

// Set up a server listening socket.
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

// Establish connection
   +0 < S 0:0(0) win 32792 <mss 1460,nop,nop,sackOK>
   +0 > S. 0:0(0) ack 1    <mss 1460,nop,nop,sackOK>
 +.01 < . 1:1(0) ack 1 win 32792

   +0 accept(3, ..., ...) = 4

// We close(), send a FIN, and get an ACK and FIN, in order to get into TIME_WAIT.

 +.01 close(4) = 0
   +0 > F. 1:1(0) ack 1
 +.01 < F. 1:1(0) ack 2 win 32792
   +0 > . 2:2(0) ack 2

// SYN hitting a TIME_WAIT -> should use an ISN based on TIMEWAIT tw_snd_nxt

 +.01 < S 1000:1000(0) win 65535 <mss 1460,nop,nop,sackOK>
   +0 > S. 65539:65539(0) ack 1001 <mss 1460,nop,nop,sackOK>
====================

Link: https://lore.kernel.org/r/20240407093322.3172088-1-edumazet@google.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

d2fd6cf3

tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field · 41eecbd7

Eric Dumazet authored Apr 07, 2024

TCP can transform a TIMEWAIT socket into a SYN_RECV one from
a SYN packet, and the ISN of the SYNACK packet is normally
generated using TIMEWAIT tw_snd_nxt :

tcp_timewait_state_process()
...
    u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
    if (isn == 0)
        isn++;
    TCP_SKB_CB(skb)->tcp_tw_isn = isn;
    return TCP_TW_SYN;

This SYN packet also bypasses normal checks against listen queue
being full or not.

tcp_conn_request()
...
       __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
...
        /* TW buckets are converted to open requests without
         * limitations, they conserve resources and peer is
         * evidently real one.
         */
        if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) {
                want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
                if (!want_cookie)
                        goto drop;
        }

This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb.

Unfortunately this field has been accidentally cleared
after the call to tcp_timewait_state_process() returning
TCP_TW_SYN.

Using a field in TCP_SKB_CB(skb) for a temporary state
is overkill.

Switch instead to a per-cpu variable.

As a bonus, we do not have to clear tcp_tw_isn in TCP receive
fast path.
It is temporarily set then cleared only in the TCP_TW_SYN dance.

Fixes: 4ad19de8 ("net: tcp6: fix double call of tcp_v6_fill_cb()")
Fixes: eeea10b8 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

41eecbd7

tcp: propagate tcp_tw_isn via an extra parameter to ->route_req() · b9e81040

Eric Dumazet authored Apr 07, 2024

tcp_v6_init_req() reads TCP_SKB_CB(skb)->tcp_tw_isn to find
out if the request socket is created by a SYN hitting a TIMEWAIT socket.

This has been buggy for a decade, lets directly pass the information
from tcp_conn_request().

This is a preparatory patch to make the following one easier to review.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

b9e81040

Merge branch 'add-support-for-flower-actions-mirred-and-redirect' · 1c25fe9a

Paolo Abeni authored Apr 09, 2024

Daniel Machon says:

====================
Add support for flower actions mirred and redirect

This series adds support for the two tc flower actions mirred and
redirect. Both actions are implemented by means of a port mask and a
mask mode. The mask mode controls how the mask is applied, and together
they are used by the switch to make a forwarding decision. Both actions
are configurable via the IS0 or IS2 VCAP's (ingress stage 0 and 2,
respectively).

Patch #1: adds support for tc flower mirred action.
Patch #2: adds support for tc flower redirect action.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
====================

Link: https://lore.kernel.org/r/20240405-mirror-redirect-actions-v2-0-875d4c1927c8@microchip.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

1c25fe9a

net: sparx5: add support for tc flower redirect action · 1164b8e0

Daniel Machon authored Apr 05, 2024

Add support for the flower redirect action. Two VCAP actions are encoded
in the rule - one for the port mask, and one for the port mask mode.
When the rule is hit, the port mask is used as the final destination
set, replacing all other port masks.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

1164b8e0

net: sparx5: add support for tc flower mirred action. · 48ba00da

Daniel Machon authored Apr 05, 2024

Add support for tc flower mirred action. Two VCAP actions are encoded in
the rule - one for the port mask, and one for the port mask mode. When
the rule is hit, the destination mask is OR'ed with the port mask.

Also add new VCAP function for supporting 72-bit wide actions, and a tc
helper for setting the port forwarding mask.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

48ba00da

Merge branch 'support-icssg-based-ethernet-on-am65x-sr1-0-devices' · 74bd5dbe

Paolo Abeni authored Apr 09, 2024

Diogo Ivo says:

====================
Support ICSSG-based Ethernet on AM65x SR1.0 devices

This series extends the current ICSSG-based Ethernet driver to support
AM65x Silicon Revision 1.0 devices.

Notable differences between the Silicon Revisions are that there is
no TX core in SR1.0 with this being handled by the firmware, requiring
extra DMA channels to manage communication with the firmware (with the
firmware being different as well) and in the packet classifier.

The motivation behind it is that a significant number of Siemens
devices containing SR1.0 silicon have been deployed in the field
and need to be supported and updated to newer kernel versions
without losing functionality.

This series is based on TI's 5.10 SDK [1].

The fifth version of this patch series can be found in [2].

Compared to the last version of the patch set there are only changes in
patch 05/10, where the fields of a struct are now explicitly declared as
__le32 so that we can properly interpret them.

Both of the problems mentioned in v4 have been addressed by disabling
those functionalities, meaning that this driver currently only supports
one TX queue and does not support a 100Mbit/s half-duplex connection.
The removal of these features has been commented in the appropriate
locations in the code.

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.y
[2]: https://lore.kernel.org/netdev/20240326110709.26165-1-diogo.ivo@siemens.com/
====================

Link: https://lore.kernel.org/r/20240403104821.283832-1-diogo.ivo@siemens.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

74bd5dbe

net: ti: icssg-prueth: Add ICSSG Ethernet driver for AM65x SR1.0 platforms · e654b85a

Diogo Ivo authored Apr 03, 2024

Add the PRUeth driver for the ICSSG subsystem found in AM65x SR1.0 devices.
The main differences that set SR1.0 and SR2.0 apart are the missing TXPRU
core in SR1.0, two extra DMA channels for management purposes and different
firmware that needs to be configured accordingly.

Based on the work of Roger Quadros, Vignesh Raghavendra and
Grygorii Strashko in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

e654b85a

net: ti: icssg-prueth: Modify common functions for SR1.0 · ce95cb4c

Diogo Ivo authored Apr 03, 2024

Some parts of the logic differ only slightly between Silicon Revisions.
In these cases add the bits that differ to a common function that
executes those bits conditionally based on the Silicon Revision.

Based on the work of Roger Quadros, Vignesh Raghavendra and
Grygorii Strashko in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ce95cb4c

net: ti: icssg-prueth: Add functions to configure SR1.0 packet classifier · 0a74a9de

Diogo Ivo authored Apr 03, 2024

Add the functions to configure the SR1.0 packet classifier.

Based on the work of Roger Quadros in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

0a74a9de

net: ti: icssg-prueth: Adjust the number of TX channels for SR1.0 · 604e603d

Diogo Ivo authored Apr 03, 2024

As SR1.0 uses the current higher priority channel to send commands to
the firmware, take this into account when setting/getting the number
of channels to/from the user.

Based on the work of Roger Quadros in TI's 5.10 SDK [1].

[1]: https://git.ti.com/cgit/ti-linux-kernel/ti-linux-kernel/tree/?h=ti-linux-5.10.yCo-developed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

604e603d