1. 16 Oct, 2024 4 commits
    • genetlink: hold RCU in genlmsg_mcast() · 56440d7e
      Eric Dumazet authored
      While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw
      one lockdep splat [1].
      
      genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU.
      
      Instead of letting all callers guard genlmsg_multicast_allns()
      with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast().
      
      This also means the @flags parameter is useless: allocations made under
      rcu_read_lock() must not sleep, so we always need to use GFP_ATOMIC.
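The locking change can be pictured with a small userspace sketch. It is not kernel code: the nesting counter only stands in for an RCU read-side section, and every `toy_` name is hypothetical. It shows why taking the read lock inside the callee removes the burden from each caller.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an RCU read-side section: just a nesting counter. */
static int toy_read_depth;

static void toy_read_lock(void)
{
    toy_read_depth++;
}

static void toy_read_unlock(void)
{
    assert(toy_read_depth > 0);
    toy_read_depth--;
}

/*
 * Walks a list that may only be traversed inside a read-side section.
 * Returns the number of entries visited, or -1 when called unprotected
 * (the toy analogue of the "suspicious RCU usage" report).
 */
static int toy_walk_protected_list(const int *items, size_t n)
{
    int visited = 0;

    if (toy_read_depth == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        (void)items[i];
        visited++;
    }
    return visited;
}

/* The callee takes the read lock itself, so callers cannot forget it. */
static int toy_mcast_all(const int *items, size_t n)
{
    int visited;

    toy_read_lock();
    visited = toy_walk_protected_list(items, n);
    toy_read_unlock();
    return visited;
}
```

Calling `toy_walk_protected_list()` directly fails, while `toy_mcast_all()` succeeds without any locking in the caller.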
      
      [1]
      [10882.424136] =============================
      [10882.424166] WARNING: suspicious RCU usage
      [10882.424309] 6.12.0-rc2-virtme #1156 Not tainted
      [10882.424400] -----------------------------
      [10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!!
      [10882.424469]
      other info that might help us debug this:
      
      [10882.424500]
      rcu_scheduler_active = 2, debug_locks = 1
      [10882.424744] 2 locks held by ip/15677:
      [10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219)
      [10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209)
      [10882.426465]
      stack backtrace:
      [10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156
      [10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
      [10882.427046] Call Trace:
      [10882.427131]  <TASK>
      [10882.427244] dump_stack_lvl (lib/dump_stack.c:123)
      [10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
      [10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7))
      [10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink
      [10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink
      [10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115)
      [10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210)
      [10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink
      [10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201)
      [10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551)
      [10882.428069] genl_rcv (net/netlink/genetlink.c:1220)
      [10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357)
      [10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901)
      [10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1))
      
      Fixes: 33f72e6f ("l2tp : multicast notification to the registered listeners")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Chapman <jchapman@katalix.com>
      Cc: Tom Parkin <tparkin@katalix.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361 · 1833d8a2
      Peter Rashleigh authored
      According to the Marvell datasheet the 88E6361 has two VTU pages
      (4k VIDs per page) so the max_vid should be 8191, not 4095.
      
      In the current implementation mv88e6xxx_vtu_walk() gives unexpected
      results because of this error. I verified that mv88e6xxx_vtu_walk()
      works correctly on the MV88E6361 with this patch in place.
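The arithmetic behind the new value is small enough to write down. The sketch below only restates the figures from the message (two VTU pages, 4k VIDs per page); the names are hypothetical and nothing here is taken from the driver.

```c
#include <assert.h>

/* "4k VIDs per page", as stated in the commit message. */
#define TOY_VIDS_PER_VTU_PAGE 4096u

/* Highest valid VID for a switch with the given number of VTU pages. */
static unsigned int toy_max_vid(unsigned int vtu_pages)
{
    assert(vtu_pages > 0);
    return vtu_pages * TOY_VIDS_PER_VTU_PAGE - 1;
}
```

With one page this gives 4095, the old value; with two pages it gives 8191.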
      
      Fixes: 12899f29 ("net: dsa: mv88e6xxx: enable support for 88E6361 switch")
      Signed-off-by: Peter Rashleigh <peter@rashleigh.ca>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20241014204342.5852-1-peter@rashleigh.ca
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink(). · e8c526f2
      Kuniyuki Iwashima authored
      Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().
      
        """
        We are seeing a use-after-free from a bpf prog attached to
        trace_tcp_retransmit_synack. The program passes the req->sk to the
        bpf_sk_storage_get_tracing kernel helper which does check for null
        before using it.
        """
      
      The commit 83fccfc3 ("inet: fix potential deadlock in
      reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() not
      to call del_timer_sync() from reqsk_timer_handler(), but it introduced a
      small race window.
      
      Before the timer handler is called, expire_timers() calls
      detach_timer(timer, true), which clears timer->entry.pprev and thereby
      marks the timer as not pending.
      
      If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
      calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
      continue running and send multiple SYN+ACKs until it expires.
      
      The reported UAF could happen if req->sk is close()d earlier than the timer
      expiration, which is 63s by default.
      
      The scenario would be
      
        1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
           but del_timer_sync() is missed
      
        2. reqsk timer is executed and scheduled again
      
        3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
           reqsk timer still has another one, and inet_csk_accept() does not
           clear req->sk for non-TFO sockets
      
        4. sk is close()d
      
        5. reqsk timer is executed again, and BPF touches req->sk
      
      Let's not use timer_pending() by passing the caller context to
      __inet_csk_reqsk_queue_drop().
      
      Note that reqsk timer is pinned, so the issue does not happen in most
      use cases. [1]
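The race window can be replayed deterministically in a userspace toy. This is a heavily simplified sketch: there is no real concurrency, no refcounting and no socket, and every `toy_` name is hypothetical. It only contrasts "cancel if the timer looks pending" with "cancel unless the caller is the timer handler".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timer: 'pending' mirrors what timer_pending() would report. */
struct toy_timer {
    bool pending;    /* queued and not yet picked up for expiry */
    bool cancelled;  /* set once a cancel request reached the timer */
    int fires;       /* how many times the handler body ran */
};

/* Models the timer core detaching the timer right before the handler runs. */
static void toy_detach(struct toy_timer *t)
{
    t->pending = false;
}

/* Handler: does its work and re-arms itself unless it was cancelled. */
static void toy_handler(struct toy_timer *t)
{
    assert(!t->pending);  /* the core always detaches before calling us */
    if (t->cancelled)
        return;
    t->fires++;
    t->pending = true;
}

/* Old behaviour: only cancel when the timer still looks pending. */
static void toy_unlink_check_pending(struct toy_timer *t)
{
    if (t->pending) {
        t->cancelled = true;
        t->pending = false;
    }
}

/* New behaviour: the caller states whether it is the timer handler itself. */
static void toy_unlink_with_context(struct toy_timer *t, bool from_timer)
{
    if (!from_timer) {
        t->cancelled = true;
        t->pending = false;
    }
}

/*
 * Replays the racy interleaving: detach, unlink from another context,
 * then the handler. Returns true if the timer ends up armed again.
 */
static bool toy_replay_race(bool use_context)
{
    struct toy_timer t = { .pending = true };

    toy_detach(&t);
    if (use_context)
        toy_unlink_with_context(&t, false);
    else
        toy_unlink_check_pending(&t);
    toy_handler(&t);
    return t.pending;
}
```

With the pending check, the unlink misses the cancel and the timer re-arms. With the caller context passed in, the timer stays stopped.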
      
      [0]
      BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0
      
      Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
      bpf_sk_storage_get_tracing+0x2e/0x1b0
      bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
      bpf_trace_run2+0x4c/0xc0
      tcp_rtx_synack+0xf9/0x100
      reqsk_timer_handler+0xda/0x3d0
      run_timer_softirq+0x292/0x8a0
      irq_exit_rcu+0xf5/0x320
      sysvec_apic_timer_interrupt+0x6d/0x80
      asm_sysvec_apic_timer_interrupt+0x16/0x20
      intel_idle_irq+0x5a/0xa0
      cpuidle_enter_state+0x94/0x273
      cpu_startup_entry+0x15e/0x260
      start_secondary+0x8a/0x90
      secondary_startup_64_no_verify+0xfa/0xfb
      
      kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6
      
      allocated by task 0 on cpu 9 at 260507.901592s:
      sk_prot_alloc+0x35/0x140
      sk_clone_lock+0x1f/0x3f0
      inet_csk_clone_lock+0x15/0x160
      tcp_create_openreq_child+0x1f/0x410
      tcp_v6_syn_recv_sock+0x1da/0x700
      tcp_check_req+0x1fb/0x510
      tcp_v6_rcv+0x98b/0x1420
      ipv6_list_rcv+0x2258/0x26e0
      napi_complete_done+0x5b1/0x2990
      mlx5e_napi_poll+0x2ae/0x8d0
      net_rx_action+0x13e/0x590
      irq_exit_rcu+0xf5/0x320
      common_interrupt+0x80/0x90
      asm_common_interrupt+0x22/0x40
      cpuidle_enter_state+0xfb/0x273
      cpu_startup_entry+0x15e/0x260
      start_secondary+0x8a/0x90
      secondary_startup_64_no_verify+0xfa/0xfb
      
      freed by task 0 on cpu 9 at 260507.927527s:
      rcu_core_si+0x4ff/0xf10
      irq_exit_rcu+0xf5/0x320
      sysvec_apic_timer_interrupt+0x6d/0x80
      asm_sysvec_apic_timer_interrupt+0x16/0x20
      cpuidle_enter_state+0xfb/0x273
      cpu_startup_entry+0x15e/0x260
      start_secondary+0x8a/0x90
      secondary_startup_64_no_verify+0xfa/0xfb
      
      Fixes: 83fccfc3 ("inet: fix potential deadlock in reqsk_queue_unlink()")
      Reported-by: Martin KaFai Lau <martin.lau@kernel.org>
      Closes: https://lore.kernel.org/netdev/eb6684d0-ffd9-4bdc-9196-33f690c25824@linux.dev/
      Link: https://lore.kernel.org/netdev/b55e2ca0-42f2-4b7c-b445-6ffd87ca74a0@linux.dev/ [1]
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://patch.msgid.link/20241014223312.4254-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bcmasp: fix potential memory leak in bcmasp_xmit() · fed07d3e
      Wang Hai authored
      bcmasp_xmit() returns NETDEV_TX_OK without freeing the skb
      when mapping fails. Add dev_kfree_skb() to fix it.
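The ownership rule behind this fix can be sketched in userspace. The buffer type, the single-slot queue and all `toy_` names below are hypothetical stand-ins, not the driver's code. The point is that a transmit function which reports the buffer as consumed must free it on the error path too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for NETDEV_TX_OK: "the driver took ownership of the buffer". */
#define TOY_TX_OK 0

struct toy_buf {
    size_t len;
};

static int toy_live_bufs;              /* allocated but not yet freed */
static struct toy_buf *toy_in_flight;  /* single-slot "hardware queue" */

static struct toy_buf *toy_buf_alloc(size_t len)
{
    struct toy_buf *b = malloc(sizeof(*b));

    if (b) {
        b->len = len;
        toy_live_bufs++;
    }
    return b;
}

static void toy_buf_free(struct toy_buf *b)
{
    assert(b != NULL);
    free(b);
    toy_live_bufs--;
}

/* mapping_ok simulates the outcome of the mapping step. */
static int toy_xmit(struct toy_buf *b, bool mapping_ok)
{
    if (!mapping_ok) {
        /* The fix: drop the buffer before reporting it as consumed. */
        toy_buf_free(b);
        return TOY_TX_OK;
    }
    assert(toy_in_flight == NULL);
    toy_in_flight = b;  /* freed later by the completion path */
    return TOY_TX_OK;
}

/* Completion path: the "hardware" is done with the queued buffer. */
static void toy_tx_complete(void)
{
    if (toy_in_flight) {
        toy_buf_free(toy_in_flight);
        toy_in_flight = NULL;
    }
}
```

Without the free in the error branch, `toy_live_bufs` would stay at one after a failed mapping, which is the leak the patch describes.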
      
      Fixes: 490cb412 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Link: https://patch.msgid.link/20241014145901.48940-1-wanghai38@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 15 Oct, 2024 18 commits
  3. 14 Oct, 2024 1 commit
  4. 11 Oct, 2024 10 commits
    • selftests: drivers: net: fix name not defined · 174714f0
      Alessandro Zanni authored
      This fix solves this error, when calling kselftest with targets
      "drivers/net":
      
      File "tools/testing/selftests/net/lib/py/nsim.py", line 64, in __init__
        if e.errno == errno.ENOSPC:
      NameError: name 'errno' is not defined
      
      The error was found by running tests manually with the command:
      make kselftest TARGETS="drivers/net"
      
      The errno module provides the standard system error symbols.
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
      Link: https://patch.msgid.link/20241010183034.24739-1-alessandro.zanni87@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net/rds: add module not found · 6ea8a1c2
      Alessandro Zanni authored
      This fix solves this error, when calling kselftest with targets "net/rds":
      
      The error was found by running tests manually with the command:
      make kselftest TARGETS="net/rds"
      
      The patch also imports the ip() function from the utils module.
      Signed-off-by: Alessandro Zanni <alessandro.zanni87@gmail.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Link: https://patch.msgid.link/20241010194421.48198-1-alessandro.zanni87@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: add missing static descriptor and inline keyword · 1d7b2ce4
      Wei Fang authored
      Fix the build warnings when CONFIG_FSL_ENETC_MDIO is not enabled.
      The detailed warnings are shown as follows.
      
      include/linux/fsl/enetc_mdio.h:62:18: warning: no previous prototype for function 'enetc_hw_alloc' [-Wmissing-prototypes]
            62 | struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs)
               |                  ^
      include/linux/fsl/enetc_mdio.h:62:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
            62 | struct enetc_hw *enetc_hw_alloc(struct device *dev, void __iomem *port_regs)
               | ^
               | static
      8 warnings generated.
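The pattern the warning asks for looks roughly like the header fragment below. It is a generic sketch with hypothetical names (`TOY_FEATURE_ENABLED`, `toy_hw_alloc`), not the contents of enetc_mdio.h. When the feature is compiled out, the fallback stub is declared `static inline`, so each file that includes the header gets a private copy instead of an external definition with no prior prototype.

```c
#include <stddef.h>

struct toy_hw;  /* opaque handle, stands in for a hardware descriptor */

#ifdef TOY_FEATURE_ENABLED
/* The real implementation would live in a .c file when the feature is built. */
struct toy_hw *toy_hw_alloc(void *port_regs);
#else
/*
 * Fallback stub defined in the header. Without 'static inline' this would
 * be a non-static function definition, which triggers
 * -Wmissing-prototypes and risks duplicate symbols at link time.
 */
static inline struct toy_hw *toy_hw_alloc(void *port_regs)
{
    (void)port_regs;
    return NULL;
}
#endif
```

Returning `NULL` from the stub is a choice made for this sketch only.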
      
      Fixes: 6517798d ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
      Cc: stable@vger.kernel.org
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202410102136.jQHZOcS4-lkp@intel.com/
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241011030103.392362-1-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-enetc-fix-some-issues-of-xdp' · 0af8c8ae
      Jakub Kicinski authored
      Wei Fang says:
      
      ====================
      net: enetc: fix some issues of XDP
      
      We found some bugs when testing the XDP function of the enetc driver,
      and these bugs are easy to reproduce. They not only cause XDP
      to stop working, but also prevent the network from being restored after
      exiting the XDP program. So the patch set is mainly to fix these bugs. For
      details, please see the commit message of each patch.
      
      v1: https://lore.kernel.org/bpf/20240919084104.661180-1-wei.fang@nxp.com/
      v2: https://lore.kernel.org/netdev/20241008224806.2onzkt3gbslw5jxb@skbuf/
      v3: https://lore.kernel.org/imx/20241009090327.146461-1-wei.fang@nxp.com/
      ====================
      
      Link: https://patch.msgid.link/20241010092056.298128-1-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: disable NAPI after all rings are disabled · 6b58fadd
      Wei Fang authored
      When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
      on LS1028A, it was found that if the command was re-run multiple times,
      Rx could not receive the frames, and the result of xdp-bench showed
      that the rx rate was 0.
      
      root@ls1028ardb:~# ./xdp-bench tx eno0
      Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
      Summary                      2046 rx/s                  0 err,drop/s
      Summary                         0 rx/s                  0 err,drop/s
      Summary                         0 rx/s                  0 err,drop/s
      Summary                         0 rx/s                  0 err,drop/s
      
      By observing the Rx PIR and CIR registers, CIR is always 0x7FF and
      PIR is always 0x7FE, which means that the Rx ring is full and can no
      longer accommodate other Rx frames. Therefore, the problem is caused
      by the Rx BD ring not being cleaned up.
      
      Further analysis of the code revealed that the Rx BD ring will only
      be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
      Therefore, some debug logs were added to the driver and the current
      values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
      BD ring was full. The logs are as follows.
      
      [  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
      [  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
      [  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
      
      From the results, the max value of xdp_tx_in_flight has reached 2140.
      However, the size of the Rx BD ring is only 2048. So xdp_tx_in_flight
      did not drop to 0 after enetc_stop() is called and the driver does not
      clear it. The root cause is that NAPI is disabled too aggressively,
      without having waited for the pending XDP_TX frames to be transmitted,
      and their buffers recycled, so that xdp_tx_in_flight cannot naturally
      drop to 0. Later, enetc_free_tx_ring() does free those stale, unsent
      XDP_TX packets, but it is not coded up to also reset xdp_tx_in_flight,
      hence the manifestation of the bug.
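The stuck state follows directly from the numbers in the debug log. The sketch below restates the quoted refill condition with hypothetical function names, and assumes that cleaned_cnt can never exceed the ring size. Under that assumption, a stale xdp_tx_in_flight at or above the ring size blocks refilling permanently.

```c
#include <stdbool.h>

/* Rx BD ring size quoted in the commit message. */
#define TOY_RX_RING_SIZE 2048

/*
 * The refill gate quoted above: buffers are only given back to the Rx ring
 * when cleaned_cnt exceeds xdp_tx_in_flight.
 */
static bool toy_rx_refill_allowed(int cleaned_cnt, int xdp_tx_in_flight)
{
    return cleaned_cnt > xdp_tx_in_flight;
}

/*
 * Assumption: cleaned_cnt is bounded by the ring size. A stale in-flight
 * counter at or above that bound can then never be exceeded.
 */
static bool toy_refill_blocked_forever(int xdp_tx_in_flight)
{
    return xdp_tx_in_flight >= TOY_RX_RING_SIZE;
}
```

Feeding in the logged pairs (1728, 2140) and (1941, 2110) shows the gate closed in both cases.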
      
      One option would be to cover this extra condition in enetc_free_tx_ring(),
      but now that the ENETC_TX_DOWN exists, we have created a window at
      the beginning of enetc_stop() where NAPI can still be scheduled, but
      any concurrent enqueue will be blocked. Therefore, enetc_wait_bdrs()
      and enetc_disable_tx_bdrs() can be called with NAPI still scheduled,
      and it is guaranteed that this will not wait indefinitely, but instead
      give us an indication that the pending TX frames have orderly dropped
      to zero. Only then should we call napi_disable().
      
      This way, enetc_free_tx_ring() becomes entirely redundant and can be
      dropped as part of subsequent cleanup.
      
      The change also refactors enetc_start() so that it looks like the
      mirror opposite procedure of enetc_stop().
      
      Fixes: ff58fda0 ("net: enetc: prioritize ability to go down over packet processing")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-5-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: disable Tx BD rings after they are empty · 0a93f2ca
      Wei Fang authored
      The Tx BD rings are disabled first in enetc_stop() and the driver
      waits for them to become empty. This operation is not safe while
      the ring is actively transmitting frames: it leaves the ring
      non-empty and causes a hardware exception. As described in the NETC
      block guide, software should only disable an active Tx ring after
      all pending ring entries have been consumed (i.e. when PI = CI).
      Disabling a transmit ring that is actively processing BDs risks
      a HW-SW race hazard whereby a hardware resource becomes assigned
      to work on one or more ring entries only to have those entries be
      removed due to the ring becoming disabled.
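The "drain first, then disable" order can be sketched with a toy ring. The structure, the simulated hardware step and the poll bound are all hypothetical, and the real driver works on device registers rather than struct fields. The sketch only illustrates waiting for PI = CI before the ring is switched off.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_tx_ring {
    unsigned int pi;    /* producer index, advanced by software */
    unsigned int ci;    /* consumer index, advanced by hardware */
    unsigned int size;  /* number of entries in the ring */
    bool enabled;
};

/* One simulated hardware step: consume a single pending entry, if any. */
static void toy_hw_consume_one(struct toy_tx_ring *r)
{
    if (r->enabled && r->ci != r->pi)
        r->ci = (r->ci + 1) % r->size;
}

/*
 * Poll (bounded) until PI == CI and only then disable the ring.
 * Returns true if the ring drained; on timeout the ring is left enabled,
 * which is a choice made for this sketch only.
 */
static bool toy_drain_then_disable(struct toy_tx_ring *r, unsigned int max_polls)
{
    assert(r->size > 0);
    for (unsigned int i = 0; i < max_polls && r->ci != r->pi; i++)
        toy_hw_consume_one(r);
    if (r->ci != r->pi)
        return false;
    r->enabled = false;
    return true;
}
```

Disabling the ring before the loop would make `toy_hw_consume_one()` a no-op, so pending entries would never be consumed.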
      
      When testing the XDP_REDIRECT feature, although all frames were blocked
      from being put into Tx rings during ring reconfiguration, a similar
      warning log was still encountered:
      
      fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
      fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
      
      The reason is that when there are still unsent frames in the Tx ring,
      disabling the Tx ring causes the remaining frames to be unable to be
      sent out. And the Tx ring cannot be restored, which means that even
      if the xdp program is uninstalled, the Tx frames cannot be sent out
      anymore. Therefore, correct the operation order in enetc_start() and
      enetc_stop().
      
      Fixes: ff58fda0 ("net: enetc: prioritize ability to go down over packet processing")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-4-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: block concurrent XDP transmissions during ring reconfiguration · c728a95c
      Wei Fang authored
      When testing the XDP_REDIRECT function on the LS1028A platform, we
      found a very reproducible issue that the Tx frames can no longer be
      sent out even if XDP_REDIRECT is turned off. Specifically, if there
      is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on,
      the console may display some warnings like "timeout for tx ring #6
      clear", and all redirected frames will be dropped. The detailed log
      is as follows.
      
      root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2
      Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc)
      [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear
      [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear
      [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear
      eno0->eno2     1420505 rx/s       1420590 err,drop/s      0 xmit/s
        xmit eno0->eno2    0 xmit/s     1420590 drop/s     0 drv_err/s     15.71 bulk-avg
      eno0->eno2     1420484 rx/s       1420485 err,drop/s      0 xmit/s
        xmit eno0->eno2    0 xmit/s     1420485 drop/s     0 drv_err/s     15.71 bulk-avg
      
      Analysis of the XDP_REDIRECT implementation in the enetc driver shows that
      the driver reconfigures the Tx and Rx BD rings when a bpf program is
      installed or uninstalled, but there is no mechanism to block the
      redirected frames while the rings are being reconfigured. Similarly,
      XDP_TX verdicts on received frames can also lead to frames being
      enqueued in the Tx rings. Because XDP ignores the state set by the
      netif_tx_wake_queue() API, introduce the ENETC_TX_DOWN flag to
      suppress transmission of XDP frames.
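A flag-gated transmit path might look like the sketch below. It is single-threaded and uses a plain bitmask, whereas kernel code would need atomic bit operations. `TOY_TX_DOWN` and the other `toy_` names are hypothetical stand-ins for ENETC_TX_DOWN and the driver's structures.

```c
#include <stdbool.h>

/* Hypothetical bit; stands in for the ENETC_TX_DOWN flag. */
#define TOY_TX_DOWN (1u << 0)

struct toy_port {
    unsigned int flags;
    unsigned int tx_enqueued;
    unsigned int tx_dropped;
};

/*
 * XDP transmit path: refuse to enqueue while the rings are being
 * reconfigured. Returns true if the frame was enqueued.
 */
static bool toy_xdp_xmit(struct toy_port *p)
{
    if (p->flags & TOY_TX_DOWN) {
        p->tx_dropped++;
        return false;
    }
    p->tx_enqueued++;
    return true;
}

/* Mark the start and end of ring reconfiguration. */
static void toy_reconfigure_begin(struct toy_port *p)
{
    p->flags |= TOY_TX_DOWN;
}

static void toy_reconfigure_end(struct toy_port *p)
{
    p->flags &= ~TOY_TX_DOWN;
}
```

Frames offered between `toy_reconfigure_begin()` and `toy_reconfigure_end()` are dropped and counted instead of landing in a ring that is about to change.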
      
      Fixes: c33bfaf9 ("net: enetc: set up XDP program under enetc_reconfigure()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-3-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: remove xdp_drops statistic from enetc_xdp_drop() · 412950d5
      Wei Fang authored
      The xdp_drops statistic indicates the number of XDP frames dropped in
      the Rx direction. However, enetc_xdp_drop() is also used in XDP_TX and
      XDP_REDIRECT actions. Frames lost in these two actions should not be
      counted in xdp_drops, because xdp_tx_drops and xdp_redirect_failures
      already count them. Therefore, remove the xdp_drops statistic from
      enetc_xdp_drop() and increment xdp_drops in the XDP_DROP action instead.
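The intended accounting is one counter per action, with no frame counted twice. The sketch below uses hypothetical types and names that mirror the three statistics named in the message. It is an illustration of the bookkeeping, not the driver's code.

```c
enum toy_xdp_action {
    TOY_XDP_DROP,
    TOY_XDP_TX,
    TOY_XDP_REDIRECT,
};

struct toy_xdp_stats {
    unsigned int xdp_drops;              /* frames dropped by XDP_DROP */
    unsigned int xdp_tx_drops;           /* frames lost in XDP_TX */
    unsigned int xdp_redirect_failures;  /* frames lost in XDP_REDIRECT */
};

/* Account one lost frame against the counter of the action that lost it. */
static void toy_count_lost_frame(struct toy_xdp_stats *s,
                                 enum toy_xdp_action act)
{
    switch (act) {
    case TOY_XDP_DROP:
        s->xdp_drops++;
        break;
    case TOY_XDP_TX:
        s->xdp_tx_drops++;
        break;
    case TOY_XDP_REDIRECT:
        s->xdp_redirect_failures++;
        break;
    }
}
```

A shared drop helper that also bumped `xdp_drops` would count XDP_TX and XDP_REDIRECT losses twice, which is what the patch removes.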
      
      Fixes: 7ed2bc80 ("net: enetc: add support for XDP_TX")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241010092056.298128-2-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sparx5: fix source port register when mirroring · 8a6be4bd
      Daniel Machon authored
      When port mirroring is added to a port, the bit position of the source
      port needs to be written to the register ANA_AC_PROBE_PORT_CFG.  This
      register is replicated for n_ports > 32, and therefore we need to derive
      the correct register from the port number.
      
      Before this patch, we wrongly calculated the register from portno /
      BITS_PER_BYTE, where the divisor ought to be 32, causing any port >= 8 to
      be written to the wrong register. We fix this by using do_div() with a
      divisor of 32: the dividend is updated in place to the quotient, which is
      the register index, and the returned remainder is the bit position.
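The index arithmetic can be checked in plain C. The sketch uses ordinary `/` and `%` rather than do_div(), and the names are hypothetical. It shows that dividing by 8 only gives the right register for ports 0 to 7.

```c
/* Width in bits of one replicated register. */
#define TOY_BITS_PER_REG 32u

struct toy_reg_pos {
    unsigned int reg;  /* which replica of the register */
    unsigned int bit;  /* bit position inside that replica */
};

/* Correct mapping: quotient selects the register, remainder the bit. */
static struct toy_reg_pos toy_port_to_reg_pos(unsigned int portno)
{
    struct toy_reg_pos pos = {
        .reg = portno / TOY_BITS_PER_REG,
        .bit = portno % TOY_BITS_PER_REG,
    };

    return pos;
}

/* The pre-fix register calculation, kept for comparison: divides by 8. */
static unsigned int toy_buggy_reg(unsigned int portno)
{
    return portno / 8u;
}
```

For port 8 the correct result is register 0, bit 8, while the old calculation selects register 1.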
      
      Fixes: 4e50d72b ("net: sparx5: add port mirroring implementation")
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241009-mirroring-fix-v1-1-9ec962301989@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4: give an IPv4 dev to blackhole_netdev · 22600596
      Xin Long authored
      After commit 8d7017fd ("blackhole_netdev: use blackhole_netdev to
      invalidate dst entries"), blackhole_netdev was introduced to invalidate
      dst cache entries on the TX path whenever the cache times out or is
      flushed.
      
      When two UDP sockets (sk1 and sk2) send messages to the same destination
      simultaneously, they are using the same dst cache. If the dst cache is
      invalidated on one path (sk2) while the other (sk1) is still transmitting,
      sk1 may try to use the invalid dst entry.
      
               CPU1                   CPU2
      
            udp_sendmsg(sk1)       udp_sendmsg(sk2)
            udp_send_skb()
            ip_output()
                                                   <--- dst timeout or flushed
                                   dst_dev_put()
            ip_finish_output2()
            ip_neigh_for_gw()
      
      This results in a scenario where ip_neigh_for_gw() returns -EINVAL because
      blackhole_dev lacks an in_dev, which is needed to initialize the neigh in
      arp_constructor(). This error is then propagated back to userspace,
      breaking the UDP application.
      
      The patch fixes this issue by assigning an in_dev to blackhole_dev for
      IPv4, similar to what was done for IPv6 in commit e5f80fcf ("ipv6:
      give an IPv6 dev to blackhole_netdev"). This ensures that even when the
      dst entry is invalidated with blackhole_dev, it will not fail to create
      the neigh entry.
      
      As devinet_init() is called earlier than blackhole_netdev_init() during
      system boot, it cannot assign the in_dev to blackhole_dev in devinet_init().
      As Paolo suggested, add a separate late_initcall() in devinet.c to ensure
      inet_blackhole_dev_init() is called after blackhole_netdev_init().
      
      Fixes: 8d7017fd ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/3000792d45ca44e16c785ebe2b092e610e5b3df1.1728499633.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 10 Oct, 2024 7 commits
    • Merge tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1d227fcc
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth and netfilter.
      
        Current release - regressions:
      
         - dsa: sja1105: fix reception from VLAN-unaware bridges
      
         - Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
           enabled"
      
         - eth: fec: don't save PTP state if PTP is unsupported
      
        Current release - new code bugs:
      
         - smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref
      
         - eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop
      
         - phy: aquantia: AQR115c fix up PMA capabilities
      
        Previous releases - regressions:
      
         - tcp: 3 fixes for retrans_stamp and undo logic
      
        Previous releases - always broken:
      
         - net: do not delay dst_entries_add() in dst_release()
      
         - netfilter: restrict xtables extensions to families that are safe,
           syzbot found a way to combine ebtables with extensions that are
           never used by userspace tools
      
         - sctp: ensure sk_state is set to CLOSED if hashing fails in
           sctp_listen_start
      
         - mptcp: handle consistently DSS corruption, and prevent corruption
           due to large pmtu xmit"
      
      * tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
        MAINTAINERS: Add headers and mailing list to UDP section
        MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
        slip: make slhc_remember() more robust against malicious packets
        net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
        ppp: fix ppp_async_encode() illegal access
        docs: netdev: document guidance on cleanup patches
        phonet: Handle error of rtnl_register_module().
        mpls: Handle error of rtnl_register_module().
        mctp: Handle error of rtnl_register_module().
        bridge: Handle error of rtnl_register_module().
        vxlan: Handle error of rtnl_register_module().
        rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
        net: do not delay dst_entries_add() in dst_release()
        mptcp: pm: do not remove closing subflows
        mptcp: fallback when MPTCP opts are dropped after 1st data
        tcp: fix mptcp DSS corruption due to large pmtu xmit
        mptcp: handle consistently DSS corruption
        net: netconsole: fix wrong warning
        net: dsa: refuse cross-chip mirroring operations
        net: fec: don't save PTP state if PTP is unsupported
        ...
    • Merge tag 'trace-ringbuffer-v6.12-rc2' of... · 0edab8d1
      Linus Torvalds authored
      Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull tracing fix from Steven Rostedt:
       "Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
        callbacks
      
        When a ring buffer is mapped to memory assigned at boot, it also
        splits it up evenly between the possible CPUs. But the allocation code
        still attached a CPU notifier callback to this ring buffer. When a CPU
        is added, the callback will happen and another per-cpu buffer is
        created for the ring buffer.
      
        But for boot mapped buffers, there is no room to add another one (as
        they were all created already). The result of calling the CPU hotplug
        notifier on a boot mapped ring buffer is unpredictable and could lead
        to a system crash.
      
        If the ring buffer is boot mapped simply do not attach the CPU
        notifier to it"
      
      * tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        ring-buffer: Do not have boot mapped buffers hook to CPU hotplug
    • Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · eb952c47
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - update fstrim loop and add more cancellation points, fix reported
         delayed or blocked suspend if there's a huge chunk queued
      
       - fix error handling in recent qgroup xarray conversion
      
       - in zoned mode, fix warning printing device path without RCU
         protection
      
       - again fix invalid extent xarray state (6252690f), lost due to
         refactoring
      
      * tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
        btrfs: zoned: fix missing RCU locking in error message when loading zone info
        btrfs: fix missing error handling when adding delayed ref with qgroups enabled
        btrfs: add cancellation points to trim loops
        btrfs: split remaining space to discard in chunks
    • Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · 5870963f
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Fix NFSD bring-up / shutdown
      
       - Fix a UAF when releasing a stateid
      
      * tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        nfsd: fix possible badness in FREE_STATEID
        nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
        NFSD: Mark filecache "down" if init fails
    • Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 825ec756
      Linus Torvalds authored
      Pull xfs fixes from Carlos Maiolino:
      
       - A few small typo fixes
      
       - fstests xfs/538 DEBUG-only fix
      
       - Performance fix on blockgc on COW'ed files, by skipping trims on
         cowblock inodes currently opened for write
      
       - Prevent cowblocks from being freed under dirty pagecache during unshare
      
       - Update MAINTAINERS file to quote the new maintainer
      
      * tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: fix a typo
        xfs: don't free cowblocks from under dirty pagecache on unshare
        xfs: skip background cowblock trims on inodes open for write
        xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
        xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
        xfs: don't ifdef around the exact minlen allocations
        xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
        xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
        xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
        xfs: return bool from xfs_attr3_leaf_add
        xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
        xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
        xfs: scrub: convert comma to semicolon
        xfs: Remove empty declartion in header file
        MAINTAINERS: add Carlos Maiolino as XFS release manager
    • Merge branch 'maintainers-networking-file-coverage-updates' · 7b43ba65
      Jakub Kicinski authored
      Simon Horman says:
      
      ====================
      MAINTAINERS: Networking file coverage updates
      
      The aim of this proposal is to make the handling of some files
      related to Networking and Wireless more consistent. It does so by:
      
      1. Adding some more headers to the UDP section, making it consistent
         with the TCP section.
      
      2. Excluding some files relating to Wireless from NETWORKING [GENERAL],
         making their handling consistent with other files related to
         Wireless.
      
      The aim of this is to make things more consistent.  And for MAINTAINERS
      to better reflect the situation on the ground.  I am more than happy to
      be told that the current state of affairs is fine. Or for other ideas to
      be discussed.
      
      v1: https://lore.kernel.org/20241004-maint-net-hdrs-v1-0-41fd555aacc5@kernel.org
      ====================
      
      Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-0-f2c86e7309c8@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • MAINTAINERS: Add headers and mailing list to UDP section · 5404b5a2
      Simon Horman authored
      Add netdev mailing list and some more udp.h headers to the UDP section.
      This is now more consistent with the TCP section.
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Simon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>