1. 16 Oct, 2024 9 commits
    • net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test() · 217a3d98
      Jinjie Ruan authored
      Commit a3c1e451 ("net: microchip: vcap: Fix use-after-free error in
      kunit test") fixed the use-after-free error, but introduced the memory
      leaks below by removing a necessary vcap_free_rule() call. Add the call
      back to fix them.
      
      	unreferenced object 0xffffff80ca58b700 (size 192):
      	  comm "kunit_try_catch", pid 1215, jiffies 4294898264
      	  hex dump (first 32 bytes):
      	    00 12 7a 00 05 00 00 00 0a 00 00 00 64 00 00 00  ..z.........d...
      	    00 00 00 00 00 00 00 00 00 04 0b cc 80 ff ff ff  ................
      	  backtrace (crc 9c09c3fe):
      	    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
      	    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
      	    [<0000000040a01b8d>] vcap_alloc_rule+0x3cc/0x9c4
      	    [<000000003fe86110>] vcap_api_encode_rule_test+0x1ac/0x16b0
      	    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
      	    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
      	    [<00000000c5d82c9a>] kthread+0x2e8/0x374
      	    [<00000000f4287308>] ret_from_fork+0x10/0x20
      	unreferenced object 0xffffff80cc0b0400 (size 64):
      	  comm "kunit_try_catch", pid 1215, jiffies 4294898265
      	  hex dump (first 32 bytes):
      	    80 04 0b cc 80 ff ff ff 18 b7 58 ca 80 ff ff ff  ..........X.....
      	    39 00 00 00 02 00 00 00 06 05 04 03 02 01 ff ff  9...............
      	  backtrace (crc daf014e9):
      	    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
      	    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
      	    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
      	    [<00000000dfdb1e81>] vcap_api_encode_rule_test+0x224/0x16b0
      	    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
      	    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
      	    [<00000000c5d82c9a>] kthread+0x2e8/0x374
      	    [<00000000f4287308>] ret_from_fork+0x10/0x20
      	unreferenced object 0xffffff80cc0b0700 (size 64):
      	  comm "kunit_try_catch", pid 1215, jiffies 4294898265
      	  hex dump (first 32 bytes):
      	    80 07 0b cc 80 ff ff ff 28 b7 58 ca 80 ff ff ff  ........(.X.....
      	    3c 00 00 00 00 00 00 00 01 2f 03 b3 ec ff ff ff  <......../......
      	  backtrace (crc 8d877792):
      	    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
      	    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
      	    [<000000006eadfab7>] vcap_rule_add_action+0x2d0/0x52c
      	    [<00000000323475d1>] vcap_api_encode_rule_test+0x4d4/0x16b0
      	    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
      	    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
      	    [<00000000c5d82c9a>] kthread+0x2e8/0x374
      	    [<00000000f4287308>] ret_from_fork+0x10/0x20
      	unreferenced object 0xffffff80cc0b0900 (size 64):
      	  comm "kunit_try_catch", pid 1215, jiffies 4294898266
      	  hex dump (first 32 bytes):
      	    80 09 0b cc 80 ff ff ff 80 06 0b cc 80 ff ff ff  ................
      	    7d 00 00 00 01 00 00 00 00 00 00 00 ff 00 00 00  }...............
      	  backtrace (crc 34181e56):
      	    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
      	    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
      	    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
      	    [<00000000991e3564>] vcap_val_rule+0xcf0/0x13e8
      	    [<00000000fc9868e5>] vcap_api_encode_rule_test+0x678/0x16b0
      	    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
      	    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
      	    [<00000000c5d82c9a>] kthread+0x2e8/0x374
      	    [<00000000f4287308>] ret_from_fork+0x10/0x20
      	unreferenced object 0xffffff80cc0b0980 (size 64):
      	  comm "kunit_try_catch", pid 1215, jiffies 4294898266
      	  hex dump (first 32 bytes):
      	    18 b7 58 ca 80 ff ff ff 00 09 0b cc 80 ff ff ff  ..X.............
      	    67 00 00 00 00 00 00 00 01 01 74 88 c0 ff ff ff  g.........t.....
      	  backtrace (crc 275fd9be):
      	    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
      	    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
      	    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
      	    [<000000001396a1a2>] test_add_def_fields+0xb0/0x100
      	    [<000000006e7621f0>] vcap_val_rule+0xa98/0x13e8
      	    [<00000000fc9868e5>] vcap_api_encode_rule_test+0x678/0x16b0
      	    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
      	    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
      	    [<00000000c5d82c9a>] kthread+0x2e8/0x374
      	    [<00000000f4287308>] ret_from_fork+0x10/0x20
      	......
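      The backtraces above show three allocation sites: the rule itself
      (vcap_alloc_rule), its keys (vcap_rule_add_key) and its actions
      (vcap_rule_add_action). A minimal userspace sketch of why one missing
      free leaks all of them, assuming a rule that owns linked lists of keys
      and actions. All names below (struct rule, rule_alloc(),
      rule_add_field(), rule_free(), live_allocs) are hypothetical and are
      not the VCAP API.

```c
#include <stdlib.h>

/* Hypothetical model: a rule owns two singly linked lists of fields. */
struct field {
    struct field *next;
    int id;
};

struct rule {
    struct field *keys;
    struct field *actions;
};

/* Crude live-object counter standing in for kmemleak. */
int live_allocs;

void *tracked_alloc(size_t size)
{
    void *p = calloc(1, size);

    if (p)
        live_allocs++;
    return p;
}

void tracked_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

struct rule *rule_alloc(void)
{
    return tracked_alloc(sizeof(struct rule));
}

/* Prepend a key or action; from now on the rule owns the node. */
int rule_add_field(struct field **list, int id)
{
    struct field *f = tracked_alloc(sizeof(*f));

    if (!f)
        return -1;
    f->id = id;
    f->next = *list;
    *list = f;
    return 0;
}

static void free_list(struct field *f)
{
    while (f) {
        struct field *next = f->next;

        tracked_free(f);
        f = next;
    }
}

/* Release the rule together with every key and action it owns. */
void rule_free(struct rule *r)
{
    if (!r)
        return;
    free_list(r->keys);
    free_list(r->actions);
    tracked_free(r);
}
```

      Skipping the final rule_free() in a test leaves three live objects
      (the rule, one key and one action), the same shape as the report.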
      
      Cc: stable@vger.kernel.org
      Fixes: a3c1e451 ("net: microchip: vcap: Fix use-after-free error in kunit test")
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com>
      Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
      Link: https://patch.msgid.link/20241014121922.1280583-1-ruanjinjie@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-phy-mdio-bcm-unimac-add-bcm6846-variant' · 9626c182
      Jakub Kicinski authored
      Linus Walleij says:
      
      ====================
      net: phy: mdio-bcm-unimac: Add BCM6846 variant
      
      As pointed out by Florian:
      https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/
      
      The BCM6846 has a few extra registers and cannot reuse the
      compatible string from other variants of the Unimac
      MDIO block: we need to be able to tell them apart.
      ====================
      
      Link: https://patch.msgid.link/20241012-bcm6846-mdio-v1-0-c703ca83e962@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: mdio-bcm-unimac: Add BCM6846 support · 906b77ca
      Linus Walleij authored
      Add the Unimac MDIO compatible string for the special BCM6846
      variant.
      
      This variant has a few extra registers compared to other
      versions.

      Suggested-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Link: https://patch.msgid.link/20241012-bcm6846-mdio-v1-2-c703ca83e962@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio · 6ed97afd
      Linus Walleij authored
      The MDIO block in the BCM6846 is not identical to any of the
      previous versions: it has extended registers not present in
      the other variants. For this reason we need a new compatible
      specifically for this SoC.

      Suggested-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Rob Herring (Arm) <robh@kernel.org>
      Link: https://patch.msgid.link/20241012-bcm6846-mdio-v1-1-c703ca83e962@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • udp: Compute L4 checksum as usual when not segmenting the skb · d96016a7
      Jakub Sitnicki authored
      If:
      
        1) the user requested USO, but
        2) there is not enough payload for GSO to kick in, and
        3) the egress device doesn't offer checksum offload, then
      
      we want to compute the L4 checksum in software early on.
      
      When GSO has been requested but we are not taking the GSO path, the
      software checksum fallback in skb_segment() never gets a chance to
      compute the full checksum for an egress device that can't do it. As a
      result, we end up sending UDP datagrams with only a partial checksum
      filled in, which the peer will discard.
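      For reference, the L4 checksum that has to be completed in software is
      the 16-bit one's-complement Internet checksum (RFC 1071), computed over
      the UDP pseudo-header, header and payload. A minimal sketch of the core
      summation over a byte buffer; it is not the kernel's csum_* code, the
      function names are hypothetical, and building the pseudo-header is
      left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add a buffer to a running one's-complement accumulator, treating it
 * as big-endian 16-bit words. A 32-bit accumulator cannot overflow for
 * buffers up to the UDP maximum of 65535 bytes.
 */
uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)                 /* odd trailing byte is padded with zero */
        sum += (uint32_t)buf[len - 1] << 8;
    return sum;
}

/* Fold the carries back into 16 bits and take the one's complement. */
uint16_t csum_finish(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

      Summing a datagram together with its correct checksum yields 0, which
      is what the receiver checks; a datagram carrying only the partial
      (pseudo-header) sum fails that check.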
      
      Fixes: 10154dbd ("udp: Allow GSO transmit from devices with no checksum offload")
      Reported-by: Ivan Babrou <ivan@cloudflare.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://patch.msgid.link/20241011-uso-swcsum-fixup-v2-1-6e1ddc199af9@cloudflare.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • genetlink: hold RCU in genlmsg_mcast() · 56440d7e
      Eric Dumazet authored
      While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw
      one lockdep splat [1].
      
      genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU.
      
      Instead of letting all callers guard genlmsg_multicast_allns()
      with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast().
      
      This also means the @flags parameter is useless: we always need to use
      GFP_ATOMIC, since sleeping is not allowed inside an RCU read-side
      critical section.
      
      [1]
      [10882.424136] =============================
      [10882.424166] WARNING: suspicious RCU usage
      [10882.424309] 6.12.0-rc2-virtme #1156 Not tainted
      [10882.424400] -----------------------------
      [10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!!
      [10882.424469]
      other info that might help us debug this:
      
      [10882.424500]
      rcu_scheduler_active = 2, debug_locks = 1
      [10882.424744] 2 locks held by ip/15677:
      [10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219)
      [10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209)
      [10882.426465]
      stack backtrace:
      [10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156
      [10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
      [10882.427046] Call Trace:
      [10882.427131]  <TASK>
      [10882.427244] dump_stack_lvl (lib/dump_stack.c:123)
      [10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
      [10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7))
      [10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink
      [10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink
      [10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115)
      [10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210)
      [10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink
      [10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201)
      [10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551)
      [10882.428069] genl_rcv (net/netlink/genetlink.c:1220)
      [10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357)
      [10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901)
      [10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1))
      
      Fixes: 33f72e6f ("l2tp : multicast notification to the registered listeners")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Chapman <jchapman@katalix.com>
      Cc: Tom Parkin <tparkin@katalix.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361 · 1833d8a2
      Peter Rashleigh authored
      According to the Marvell datasheet, the 88E6361 has two VTU pages
      (4k VIDs per page), so the max_vid should be 8191, not 4095.
      
      In the current implementation mv88e6xxx_vtu_walk() gives unexpected
      results because of this error. I verified that mv88e6xxx_vtu_walk()
      works correctly on the MV88E6361 with this patch in place.
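      The corrected limit is simple arithmetic: two pages of 4096 VIDs cover
      VIDs 0 through 8191. The helper below is hypothetical and only spells
      out that calculation; the driver itself records max_vid as a per-chip
      value rather than computing it.

```c
/* Highest valid VID when the VTU is split into pages of 4096 VIDs. */
unsigned int vtu_max_vid(unsigned int vtu_pages)
{
    const unsigned int vids_per_page = 4096;

    return vtu_pages * vids_per_page - 1;
}
```

      With one page the limit is 4095 (the old value); with the two pages of
      the 88E6361 it is 8191.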
      
      Fixes: 12899f29 ("net: dsa: mv88e6xxx: enable support for 88E6361 switch")
      Signed-off-by: Peter Rashleigh <peter@rashleigh.ca>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20241014204342.5852-1-peter@rashleigh.ca
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink(). · e8c526f2
      Kuniyuki Iwashima authored
      Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
      
        """
        We are seeing a use-after-free from a bpf prog attached to
        trace_tcp_retransmit_synack. The program passes the req->sk to the
        bpf_sk_storage_get_tracing kernel helper which does check for null
        before using it.
        """
      
      Commit 83fccfc3 ("inet: fix potential deadlock in
      reqsk_queue_unlink()") added a timer_pending() check in
      reqsk_queue_unlink() to avoid calling del_timer_sync() from
      reqsk_timer_handler(), but it introduced a small race window.
      
      Before the timer handler runs, expire_timers() calls
      detach_timer(timer, true), which clears timer->entry.pprev and marks
      the timer as not pending.
      
      If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
      calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
      continue running and send multiple SYN+ACKs until it expires.
      
      The reported UAF could happen if req->sk is close()d earlier than the timer
      expiration, which is 63s by default.
      
      The scenario would be
      
        1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
           but del_timer_sync() is missed
      
        2. reqsk timer is executed and scheduled again
      
        3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
           reqsk timer still has another one, and inet_csk_accept() does not
           clear req->sk for non-TFO sockets
      
        4. sk is close()d
      
        5. reqsk timer is executed again, and BPF touches req->sk
      
      Let's stop using timer_pending() and pass the caller context to
      __inet_csk_reqsk_queue_drop() instead.
      
      Note that the reqsk timer is pinned, so the issue does not happen in
      most use cases. [1]
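      The idea of the fix can be pictured with a small userspace model: the
      caller states whether it runs from the timer handler, so the drop path
      no longer has to infer that from timer_pending(), which can already be
      false just before the handler runs. Everything below (struct fake_req,
      mock_del_timer_sync(), reqsk_queue_drop_sketch()) is hypothetical and
      is not the kernel's __inet_csk_reqsk_queue_drop().

```c
#include <stdbool.h>

/* Hypothetical stand-in for a request socket and its timer. */
struct fake_req {
    bool hashed;        /* still linked in the lookup table */
    bool timer_armed;   /* retransmit timer scheduled */
    int refcnt;
};

int del_timer_sync_calls;

/* Mock of del_timer_sync(): true if it deactivated a pending timer. */
static bool mock_del_timer_sync(struct fake_req *req)
{
    bool was_armed = req->timer_armed;

    del_timer_sync_calls++;
    req->timer_armed = false;
    return was_armed;
}

/*
 * Caller context decides: outside the handler we always cancel the
 * timer (and drop the timer's reference); inside it we must not wait
 * for ourselves.
 */
bool reqsk_queue_drop_sketch(struct fake_req *req, bool from_timer)
{
    bool unlinked = req->hashed;

    req->hashed = false;
    if (!from_timer && mock_del_timer_sync(req))
        req->refcnt--;
    return unlinked;
}
```

      In the scenario above, step 1 corresponds to a call with
      from_timer == false, which now always cancels the timer regardless of
      what timer_pending() would have reported.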
      
      [0]
      BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
      
      Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
      bpf_sk_storage_get_tracing+0x2e/0x1b0
      bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
      bpf_trace_run2+0x4c/0xc0
      tcp_rtx_synack+0xf9/0x100
      reqsk_timer_handler+0xda/0x3d0
      run_timer_softirq+0x292/0x8a0
      irq_exit_rcu+0xf5/0x320
      sysvec_apic_timer_interrupt+0x6d/0x80
      asm_sysvec_apic_timer_interrupt+0x16/0x20
      intel_idle_irq+0x5a/0xa0
      cpuidle_enter_state+0x94/0x273
      cpu_startup_entry+0x15e/0x260
      start_secondary+0x8a/0x90
      secondary_startup_64_no_verify+0xfa/0xfb
      
      kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6
      
      allocated by task 0 on cpu 9 at 260507.901592s:
      sk_prot_alloc+0x35/0x140
      sk_clone_lock+0x1f/0x3f0
      inet_csk_clone_lock+0x15/0x160
      tcp_create_openreq_child+0x1f/0x410
      tcp_v6_syn_recv_sock+0x1da/0x700
      tcp_check_req+0x1fb/0x510
      tcp_v6_rcv+0x98b/0x1420
      ipv6_list_rcv+0x2258/0x26e0
      napi_complete_done+0x5b1/0x2990
      mlx5e_napi_poll+0x2ae/0x8d0
      net_rx_action+0x13e/0x590
      irq_exit_rcu+0xf5/0x320
      common_interrupt+0x80/0x90
      asm_common_interrupt+0x22/0x40
      cpuidle_enter_state+0xfb/0x273
      cpu_startup_entry+0x15e/0x260
      start_secondary+0x8a/0x90
      secondary_startup_64_no_verify+0xfa/0xfb
      
      freed by task 0 on cpu 9 at 260507.927527s:
      rcu_core_si+0x4ff/0xf10
      irq_exit_rcu+0xf5/0x320
      sysvec_apic_timer_interrupt+0x6d/0x80
      asm_sysvec_apic_timer_interrupt+0x16/0x20
      cpuidle_enter_state+0xfb/0x273
      cpu_startup_entry+0x15e/0x260
      start_secondary+0x8a/0x90
      secondary_startup_64_no_verify+0xfa/0xfb
      
      Fixes: 83fccfc3 ("inet: fix potential deadlock in reqsk_queue_unlink()")
      Reported-by: Martin KaFai Lau <martin.lau@kernel.org>
      Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.dev/
      Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.dev/ [1]
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bcmasp: fix potential memory leak in bcmasp_xmit() · fed07d3e
      Wang Hai authored
      bcmasp_xmit() returns NETDEV_TX_OK without freeing the skb when the
      mapping fails; add dev_kfree_skb() to fix it.
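      Returning NETDEV_TX_OK tells the stack that the driver has taken
      ownership of the skb, so every exit path that does not hand it to the
      hardware must free it. A userspace sketch of that rule; fake_xmit(),
      fake_dma_map(), struct fake_skb and TX_OK are hypothetical stand-ins,
      not the bcmasp code.

```c
#include <stdbool.h>
#include <stdlib.h>

enum { TX_OK = 0 };  /* stands in for NETDEV_TX_OK */

struct fake_skb {
    int len;
};

int freed_skbs;

static void fake_kfree_skb(struct fake_skb *skb)
{
    freed_skbs++;
    free(skb);
}

/* Hypothetical mapping step that can be forced to fail. */
static bool fake_dma_map(struct fake_skb *skb, bool fail)
{
    (void)skb;
    return !fail;
}

int fake_xmit(struct fake_skb *skb, bool map_fails)
{
    if (!fake_dma_map(skb, map_fails)) {
        /* We still return TX_OK, so we own the skb: free it here. */
        fake_kfree_skb(skb);
        return TX_OK;
    }
    /* Success: the buffer goes to hardware, freed on Tx completion. */
    return TX_OK;
}
```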
      
      Fixes: 490cb412 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://patch.msgid.link/20241014145901.48940-1-wanghai38@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 15 Oct, 2024 18 commits
  3. 14 Oct, 2024 1 commit
  4. 11 Oct, 2024 10 commits
    • selftests: drivers: net: fix name not defined · 174714f0
      Alessandro Zanni authored
      This fixes the following error when running kselftest with
      TARGETS="drivers/net":
      
      File "tools/testing/selftests/net/lib/py/nsim.py", line 64, in __init__
        if e.errno == errno.ENOSPC:
      NameError: name 'errno' is not defined
      
      The error was found by running tests manually with the command:
      make kselftest TARGETS="drivers/net"
      
      Import the errno module, which provides the standard system error
      symbols.

      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
      Link: https://patch.msgid.link/20241010183034.24739-1-alessandro.zanni87@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net/rds: add module not found · 6ea8a1c2
      Alessandro Zanni authored
      This fixes the following error when running kselftest with TARGETS="net/rds":
      
      The error was found by running tests manually with the command:
      make kselftest TARGETS="net/rds"
      
      The patch also explicitly imports the ip() function from the utils
      module.

      Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Link: https://patch.msgid.link/20241010194421.48198-1-alessandro.zanni87@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: add missing static descriptor and inline keyword · 1d7b2ce4
      Wei Fang authored
      Fix the build warnings when CONFIG_FSL_ENETC_MDIO is not enabled.
      The detailed warnings are shown as follows.
      
      include/linux/fsl/enetc_mdio.h:62:18: warning: no previous prototype for function 'enetc_hw_alloc' [-Wmissing-prototypes]
            62 | struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs)
               |                  ^
      include/linux/fsl/enetc_mdio.h:62:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
            62 | struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs)
               | ^
               | static
      8 warnings generated.
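      The usual pattern for a header that must build with an option disabled
      is to give the fallback definitions internal linkage with static inline:
      every translation unit gets its own copy, so there is no duplicate
      symbol and -Wmissing-prototypes has nothing to report. A userspace
      sketch in which HAVE_FEATURE, struct hw and hw_alloc() are hypothetical
      stand-ins for CONFIG_FSL_ENETC_MDIO, struct enetc_hw and
      enetc_hw_alloc(); the stub's NULL return value is also an assumption.

```c
#include <stddef.h>

struct hw;  /* opaque type, as in the real header */

#ifdef HAVE_FEATURE
/* Real implementation lives in a .c file; only the prototype is here. */
struct hw *hw_alloc(void *dev, void *port_regs);
#else
/* Fallback stub: static inline avoids -Wmissing-prototypes. */
static inline struct hw *hw_alloc(void *dev, void *port_regs)
{
    (void)dev;
    (void)port_regs;
    return NULL;
}
#endif
```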
      
      Fixes: 6517798d ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
      Cc: stable@vger.kernel.org
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202410102136.jQHZOcS4-lkp@intel.com/
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241011030103.392362-1-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-enetc-fix-some-issues-of-xdp' · 0af8c8ae
      Jakub Kicinski authored
      Wei Fang says:
      
      ====================
      net: enetc: fix some issues of XDP
      
      We found some bugs when testing the XDP function of the enetc driver,
      and these bugs are easy to reproduce. They not only cause XDP to stop
      working, but also leave the network unable to recover after the XDP
      program exits. This patch set mainly fixes these bugs. For details,
      please see the commit message of each patch.
      
      v1: https://lore.kernel.org/bpf/20240919084104.661180-1-wei.fang@nxp.com/
      v2: https://lore.kernel.org/netdev/20241008224806.2onzkt3gbslw5jxb@skbuf/
      v3: https://lore.kernel.org/imx/20241009090327.146461-1-wei.fang@nxp.com/
      ====================
      
      Link: https://patch.msgid.link/20241010092056.298128-1-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: disable NAPI after all rings are disabled · 6b58fadd
      Wei Fang authored
      When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
      on LS1028A, we found that after the command was re-run several times,
      Rx could no longer receive frames, and xdp-bench reported an rx rate
      of 0.
      
      root@ls1028ardb:~# ./xdp-bench tx eno0
      Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
      Summary                      2046 rx/s                  0 err,drop/s
      Summary                         0 rx/s                  0 err,drop/s
      Summary                         0 rx/s                  0 err,drop/s
      Summary                         0 rx/s                  0 err,drop/s
      
      The Rx PIR and CIR registers show that CIR is always 0x7FF and PIR is
      always 0x7FE, which means that the Rx ring is full and can no longer
      accommodate other Rx frames. Therefore, the problem is caused by the
      Rx BD ring not being cleaned up.
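      The register values can be checked with ring-index arithmetic. Assuming
      PIR is the producer index, CIR the consumer index, and the common
      convention that one slot is always left empty so that "full" and
      "empty" can be told apart, PIR = 0x7FE and CIR = 0x7FF on a 2048-entry
      ring is exactly the full condition. The helpers are hypothetical and
      do not describe the ENETC hardware beyond that arithmetic.

```c
#include <stdbool.h>

/* Number of occupied entries between consumer and producer. */
unsigned int ring_used(unsigned int pi, unsigned int ci, unsigned int size)
{
    return (pi + size - ci) % size;
}

/* Full when advancing the producer would land on the consumer's slot. */
bool ring_full(unsigned int pi, unsigned int ci, unsigned int size)
{
    return (pi + 1) % size == ci;
}
```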
      
      Further analysis of the code revealed that the Rx BD ring will only
      be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
      Therefore, some debug logs were added to the driver and the current
      values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
      BD ring was full. The logs are as follows.
      
      [  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
      [  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
      [  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
      
      From these results, xdp_tx_in_flight reached a maximum of 2140, yet
      the Rx BD ring holds only 2048 entries. So xdp_tx_in_flight did not
      drop to 0 after enetc_stop() was called, and the driver does not reset
      it. The root cause is that NAPI is disabled too aggressively, without
      waiting for the pending XDP_TX frames to be transmitted and their
      buffers recycled, so xdp_tx_in_flight cannot naturally drop to 0.
      Later, enetc_free_tx_ring() does free those stale, unsent XDP_TX
      packets, but it does not also reset xdp_tx_in_flight, hence the
      manifestation of the bug.
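      A simplified model of the "cleaned_cnt > xdp_tx_in_flight" condition
      shows why the ring never recovers: cleaned_cnt is bounded by the ring
      size, so once a stale xdp_tx_in_flight reaches 2048 or more, the
      condition can never become true. The function names are hypothetical
      and the model ignores everything else the driver does on that path.

```c
#include <stdbool.h>

#define RX_RING_SIZE 2048  /* ring size quoted in the commit message */

/* Refill gate: only refill once more BDs are cleaned than are in flight. */
bool rx_refill_allowed(unsigned int cleaned_cnt, unsigned int xdp_tx_in_flight)
{
    return cleaned_cnt > xdp_tx_in_flight;
}

/* Best case: every BD in the ring has been cleaned. */
bool rx_ring_can_ever_refill(unsigned int xdp_tx_in_flight)
{
    return rx_refill_allowed(RX_RING_SIZE, xdp_tx_in_flight);
}
```

      All three logged pairs (1728/2140, 1941/2110, 1792/2110) fail the gate,
      and with 2140 in flight even a fully cleaned ring would.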
      
      One option would be to cover this extra condition in enetc_free_tx_ring(),
      but now that the ENETC_TX_DOWN flag exists, we have created a window at
      the beginning of enetc_stop() where NAPI can still be scheduled, but
      any concurrent enqueue will be blocked. Therefore, enetc_wait_bdrs()
      and enetc_disable_tx_bdrs() can be called with NAPI still scheduled,
      and it is guaranteed that this will not wait indefinitely, but will
      instead tell us that the pending TX frames have drained to zero in an
      orderly way. Only then should we call napi_disable().
      
      This way, enetc_free_tx_ring() becomes entirely redundant and can be
      dropped as part of subsequent cleanup.
      
      The change also refactors enetc_start() so that it looks like the
      mirror opposite procedure of enetc_stop().
      
      Fixes: ff58fda0 ("net: enetc: prioritize ability to go down over packet processing")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-5-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: disable Tx BD rings after they are empty · 0a93f2ca
      Wei Fang authored
      The Tx BD rings are disabled first in enetc_stop(), and only then does
      the driver wait for them to become empty. This is not safe while a
      ring is actively transmitting frames: the ring may never become empty,
      and a hardware exception can occur. As described in the NETC block
      guide, software should only disable an active Tx ring after all
      pending ring entries have been consumed (i.e. when PI = CI).
      Disabling a transmit ring that is actively processing BDs risks
      a HW-SW race hazard whereby a hardware resource becomes assigned
      to work on one or more ring entries only to have those entries be
      removed due to the ring becoming disabled.
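      The required order (wait until PI = CI, then disable) can be sketched
      with a simulated ring whose "hardware" consumes one BD per poll. The
      struct, the polling bound and the function names are hypothetical; the
      real driver polls hardware registers with its own timeout.

```c
#include <stdbool.h>

/* Hypothetical Tx ring; the hardware consumer is simulated. */
struct fake_tx_ring {
    unsigned int pi;    /* producer index: next BD software fills */
    unsigned int ci;    /* consumer index: next BD hardware sends */
    unsigned int size;
    bool enabled;
};

/* Simulated hardware: an enabled ring consumes one pending BD per poll. */
static void fake_hw_step(struct fake_tx_ring *r)
{
    if (r->enabled && r->ci != r->pi)
        r->ci = (r->ci + 1) % r->size;
}

/*
 * Wait (bounded) until every pending BD is consumed and only then
 * disable the ring. On timeout the ring is left enabled.
 */
bool tx_ring_drain_then_disable(struct fake_tx_ring *r, unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls && r->ci != r->pi; i++)
        fake_hw_step(r);

    if (r->ci != r->pi)
        return false;   /* cf. "timeout for tx ring #N clear" */

    r->enabled = false; /* safe: PI == CI, no BD is in flight */
    return true;
}
```

      Disabling first, as the old code did, would stop fake_hw_step() from
      ever advancing ci, so the ring could never drain.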
      
      When testing the XDP_REDIRECT feature, although all frames were
      blocked from being put into Tx rings during ring reconfiguration,
      similar warnings were still encountered:
      
      fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
      fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
      
      The reason is that when there are still unsent frames in the Tx ring,
      disabling the Tx ring prevents the remaining frames from being sent.
      The Tx ring also cannot be restored, which means that even after the
      XDP program is uninstalled, Tx frames can no longer be sent.
      Therefore, correct the operation order in enetc_start() and
      enetc_stop().
      
      Fixes: ff58fda0 ("net: enetc: prioritize ability to go down over packet processing")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-4-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: block concurrent XDP transmissions during ring reconfiguration · c728a95c
      Wei Fang authored
      When testing the XDP_REDIRECT function on the LS1028A platform, we
      found an easily reproducible issue where Tx frames can no longer be
      sent out even after XDP_REDIRECT is turned off. Specifically, if there
      is heavy traffic in the Rx direction when XDP_REDIRECT is turned on,
      the console may display warnings like "timeout for tx ring #6
      clear", and all redirected frames are dropped. The detailed log
      is as follows.
      
      root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2
      Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
      [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
      [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
      [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
      eno0->eno2     1420505 rx/s       1420590 err,drop/s      0 xmit/s
        xmit eno0->eno2    0 xmit/s     1420590 drop/s     0 drv_err/s     15.71 bulk-avg
      eno0->eno2     1420484 rx/s       1420485 err,drop/s      0 xmit/s
        xmit eno0->eno2    0 xmit/s     1420485 drop/s     0 drv_err/s     15.71 bulk-avg
      
      Analysis of the XDP_REDIRECT implementation in the enetc driver shows
      that the driver reconfigures the Tx and Rx BD rings when a BPF program
      is installed or uninstalled, but there is no mechanism to block
      redirected frames while the rings are being reconfigured. Similarly,
      XDP_TX verdicts on received frames can also lead to frames being
      enqueued in the Tx rings. Because XDP ignores the state set by the
      netif_tx_wake_queue() API, introduce the ENETC_TX_DOWN flag to
      suppress transmission of XDP frames.
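      The flag acts as a gate that the XDP transmit paths check before
      touching a Tx ring. A userspace sketch, assuming a single flag bit and
      a transmit function that, like ndo_xdp_xmit(), returns the number of
      frames accepted; all names are hypothetical, and the plain bit
      operations here stand in for the atomic test_bit()/set_bit() a driver
      would use.

```c
#include <stdbool.h>

#define TX_DOWN_BIT 0  /* hypothetical stand-in for ENETC_TX_DOWN */

struct fake_ndev {
    unsigned long flags;
    unsigned int queued;
    unsigned int dropped;
};

/* XDP transmit: refuse new frames while the rings are reconfigured. */
int fake_xdp_xmit(struct fake_ndev *nd, unsigned int nframes)
{
    if (nd->flags & (1UL << TX_DOWN_BIT)) {
        nd->dropped += nframes;
        return 0;          /* nothing enqueued */
    }
    nd->queued += nframes;
    return (int)nframes;
}

void fake_reconfigure_begin(struct fake_ndev *nd)
{
    nd->flags |= 1UL << TX_DOWN_BIT;
}

void fake_reconfigure_end(struct fake_ndev *nd)
{
    nd->flags &= ~(1UL << TX_DOWN_BIT);
}
```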
      
      Fixes: c33bfaf9 ("net: enetc: set up XDP program under enetc_reconfigure()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-3-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: remove xdp_drops statistic from enetc_xdp_drop() · 412950d5
      Wei Fang authored
      The xdp_drops statistic indicates the number of XDP frames dropped in
      the Rx direction. However, enetc_xdp_drop() is also used by the XDP_TX
      and XDP_REDIRECT actions. Frames lost in these two actions should not
      be counted in xdp_drops, because xdp_tx_drops and
      xdp_redirect_failures already count them. So it's better to remove
      the xdp_drops update from enetc_xdp_drop() and increment xdp_drops in
      the XDP_DROP action instead.
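      The resulting accounting rule is that the shared drop helper only
      recycles buffers, while each action site increments its own counter.
      A sketch with hypothetical names; the counter fields mirror the ones
      named above, and recycle_buffers() stands in for enetc_xdp_drop().

```c
/* Per-ring XDP counters named after those in the commit message. */
struct xdp_stats {
    unsigned long xdp_drops;
    unsigned long xdp_tx_drops;
    unsigned long xdp_redirect_failures;
};

enum xdp_outcome { V_DROP, V_TX_FAILED, V_REDIRECT_FAILED };

/* Shared helper: releases buffers but no longer touches any counter. */
static void recycle_buffers(void)
{
}

/* Each action site counts the loss exactly once, in its own counter. */
void account_outcome(struct xdp_stats *s, enum xdp_outcome v)
{
    recycle_buffers();
    switch (v) {
    case V_DROP:
        s->xdp_drops++;
        break;
    case V_TX_FAILED:
        s->xdp_tx_drops++;
        break;
    case V_REDIRECT_FAILED:
        s->xdp_redirect_failures++;
        break;
    }
}
```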
      
      Fixes: 7ed2bc80 ("net: enetc: add support for XDP_TX")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-2-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sparx5: fix source port register when mirroring · 8a6be4bd
      Daniel Machon authored
      When port mirroring is added to a port, the bit position of the source
      port needs to be written to the register ANA_AC_PROBE_PORT_CFG. This
      register is replicated for n_ports > 32, and therefore we need to derive
      the correct register from the port number.

      Before this patch, we wrongly calculated the register as portno /
      BITS_PER_BYTE, where the divisor ought to be 32, causing any port >= 8 to
      be written to the wrong register. We fix this by using do_div(), where
      the quotient is the register, the remainder is the bit position and the
      divisor is now 32.
      
      Fixes: 4e50d72b ("net: sparx5: add port mirroring implementation")
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241009-mirroring-fix-v1-1-9ec962301989@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Xin Long
      ipv4: give an IPv4 dev to blackhole_netdev · 22600596
      Xin Long authored
      After commit 8d7017fd ("blackhole_netdev: use blackhole_netdev to
      invalidate dst entries"), blackhole_netdev was introduced to invalidate
      dst cache entries on the TX path whenever the cache times out or is
      flushed.
      
      When two UDP sockets (sk1 and sk2) send messages to the same destination
      simultaneously, they are using the same dst cache. If the dst cache is
      invalidated on one path (sk2) while the other (sk1) is still transmitting,
      sk1 may try to use the invalid dst entry.
      
               CPU1                   CPU2
      
            udp_sendmsg(sk1)       udp_sendmsg(sk2)
            udp_send_skb()
            ip_output()
                                                   <--- dst timeout or flushed
                                   dst_dev_put()
            ip_finish_output2()
            ip_neigh_for_gw()
      
      This results in a scenario where ip_neigh_for_gw() returns -EINVAL because
      blackhole_dev lacks an in_dev, which is needed to initialize the neigh in
      arp_constructor(). This error is then propagated back to userspace,
      breaking the UDP application.
      
      The patch fixes this issue by assigning an in_dev to blackhole_dev for
      IPv4, similar to what was done for IPv6 in commit e5f80fcf ("ipv6:
      give an IPv6 dev to blackhole_netdev"). This ensures that even when the
      dst entry is invalidated with blackhole_dev, it will not fail to create
      the neigh entry.
      
      As devinet_init() is called earlier than blackhole_netdev_init() during
      system boot, it cannot assign the in_dev to blackhole_dev in
      devinet_init(). As Paolo suggested, add a separate late_initcall() in
      devinet.c to ensure inet_blackhole_dev_init() is called after
      blackhole_netdev_init().
      
      Fixes: 8d7017fd ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/3000792d45ca44e16c785ebe2b092e610e5b3df1.1728499633.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 10 Oct, 2024 2 commits
    • Linus Torvalds
      Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1d227fcc
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth and netfilter.
      
        Current release - regressions:
      
         - dsa: sja1105: fix reception from VLAN-unaware bridges
      
         - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
           enabled"
      
         - eth: fec: don't save PTP state if PTP is unsupported
      
        Current release - new code bugs:
      
         - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref
      
         - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop
      
         - phy: aquantia: AQR115c fix up PMA capabilities
      
        Previous releases - regressions:
      
         - tcp: 3 fixes for retrans_stamp and undo logic
      
        Previous releases - always broken:
      
         - net: do not delay dst_entries_add() in dst_release()
      
         - netfilter: restrict xtables extensions to families that are safe,
           syzbot found a way to combine ebtables with extensions that are
           never used by userspace tools
      
         - sctp: ensure sk_state is set to CLOSED if hashing fails in
           sctp_listen_start
      
         - mptcp: handle consistently DSS corruption, and prevent corruption
           due to large pmtu xmit"
      
      * tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
        MAINTAINERS: Add headers and mailing list to UDP section
        MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
        slip: make slhc_remember() more robust against malicious packets
        net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
        ppp: fix ppp_async_encode() illegal access
        docs: netdev: document guidance on cleanup patches
        phonet: Handle error of rtnl_register_module().
        mpls: Handle error of rtnl_register_module().
        mctp: Handle error of rtnl_register_module().
        bridge: Handle error of rtnl_register_module().
        vxlan: Handle error of rtnl_register_module().
        rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
        net: do not delay dst_entries_add() in dst_release()
        mptcp: pm: do not remove closing subflows
        mptcp: fallback when MPTCP opts are dropped after 1st data
        tcp: fix mptcp DSS corruption due to large pmtu xmit
        mptcp: handle consistently DSS corruption
        net: netconsole: fix wrong warning
        net: dsa: refuse cross-chip mirroring operations
        net: fec: don't save PTP state if PTP is unsupported
        ...
    • Linus Torvalds
      Merge tag 'trace-ringbuffer-v6.12-rc2' of... · 0edab8d1
      Linus Torvalds authored
      Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing fix from Steven Rostedt:
       "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
        callbacks
      
        When a ring buffer is mapped to memory assigned at boot, it also
        splits it up evenly between the possible CPUs. But the allocation code
        still attached a CPU notifier callback to this ring buffer. When a CPU
        is added, the callback will happen and another per-cpu buffer is
        created for the ring buffer.
      
        But for boot mapped buffers, there is no room to add another one (as
        they were all created already). The result of calling the CPU hotplug
        notifier on a boot mapped ring buffer is unpredictable and could lead
        to a system crash.
      
        If the ring buffer is boot mapped, simply do not attach the CPU
        notifier to it"
      
      * tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Do not have boot mapped buffers hook to CPU hotplug