1. 15 Feb, 2024 18 commits
  2. 14 Feb, 2024 11 commits
    • netfilter: nf_tables: fix bidirectional offload regression · 84443741
      Felix Fietkau authored
      Commit 8f84780b ("netfilter: flowtable: allow unidirectional rules")
      made unidirectional flow offload possible, while completely ignoring (and
      breaking) bidirectional flow offload for nftables.
      Add the missing flag that was left out as an exercise for the reader :)
      
      Cc: Vlad Buslov <vladbu@nvidia.com>
      Fixes: 8f84780b ("netfilter: flowtable: allow unidirectional rules")
      Reported-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nat: restore default DNAT behavior · 0f1ae282
      Kyle Swenson authored
      When a DNAT rule is configured via iptables with different port ranges,
      
      iptables -t nat -A PREROUTING -p tcp -d 10.0.0.2 -m tcp --dport 32000:32010
      -j DNAT --to-destination 192.168.0.10:21000-21010
      
      we seem to be DNATing to some random port on the LAN side. While this is
      expected if --random is passed to the iptables command, it is not
      expected without passing --random.  The expected behavior (and the
      observed behavior prior to the commit in the "Fixes" tag) is the traffic
      will be DNAT'd to 192.168.0.10:21000 unless there is a tuple collision
      with that destination.  In that case, we expect the traffic to be
      instead DNAT'd to 192.168.0.10:21001, and so on until the end of
      the range.
      
      This patch intends to restore the behavior observed prior to the "Fixes"
      tag.
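
The expected selection order described above can be modeled in a few lines. This is a userspace sketch only, not the kernel's NAT code: `pick_dnat_port` and the `in_use` set (standing in for the tuple-collision check) are hypothetical names introduced for illustration.

```python
def pick_dnat_port(first, last, in_use):
    """Return the lowest port in [first, last] that does not collide.

    `in_use` is a hypothetical stand-in for "a conntrack tuple with this
    destination already exists". Returns None if the whole range is taken.
    """
    for port in range(first, last + 1):
        if port not in in_use:
            return port
    return None


# Range taken from the iptables rule in the commit message.
FIRST, LAST = 21000, 21010
```

With nothing in use, `pick_dnat_port(FIRST, LAST, set())` yields the start of the range; a collision on 21000 moves the choice to 21001, which matches the non-random behavior the commit message describes.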
      
      Fixes: 6ed5943f ("netfilter: nat: remove l4 protocol port rovers")
      Signed-off-by: Kyle Swenson <kyle.swenson@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set_pipapo: fix missing : in kdoc · f6374a82
      Pablo Neira Ayuso authored
      Add missing : in kdoc field names.
      
      Fixes: 8683f4b9 ("nft_set_pipapo: Prepare for vectorised implementation: helpers")
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • can: netlink: Fix TDCO calculation using the old data bittiming · 2aa0a5e6
      Maxime Jayat authored
      The TDCO calculation was done using the currently applied data bittiming,
      instead of the newly computed data bittiming, which means that the TDCO
      had an invalid value unless setting the same data bittiming twice.
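
The ordering problem can be reproduced with a small model. This is a sketch under stated assumptions, not the driver code: `DataBittiming`, `Device`, `calc_tdco`, `change_buggy` and `change_fixed` are hypothetical names, and the formula (sample point in clock periods, clamped to a maximum) is a simplification chosen for illustration.

```python
from dataclasses import dataclass

CAN_SYNC_SEG = 1  # one time quantum for the sync segment


@dataclass
class DataBittiming:
    brp: int
    prop_seg: int
    phase_seg1: int


@dataclass
class Device:
    data_bittiming: DataBittiming
    tdco: int = 0


def calc_tdco(dbt, tdco_max):
    # Simplified model: sample point expressed in clock periods, clamped.
    return min((CAN_SYNC_SEG + dbt.prop_seg + dbt.phase_seg1) * dbt.brp, tdco_max)


def change_buggy(dev, new_dbt, tdco_max):
    # Bug pattern: TDCO is derived from the bittiming that is still applied.
    dev.tdco = calc_tdco(dev.data_bittiming, tdco_max)
    dev.data_bittiming = new_dbt


def change_fixed(dev, new_dbt, tdco_max):
    # Fix pattern: TDCO is derived from the newly computed bittiming.
    dev.tdco = calc_tdco(new_dbt, tdco_max)
    dev.data_bittiming = new_dbt
```

In this model the buggy path only produces the right TDCO on the second call with the same bittiming, which mirrors the "setting the same data bittiming twice" symptom.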
      
      Fixes: d99755f7 ("can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)")
      Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
      Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Link: https://lore.kernel.org/all/40579c18-63c0-43a4-8d4c-f3a6c1c0b417@munic.io
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER) · efe7cf82
      Oleksij Rempel authored
      Lock jsk->sk to prevent UAF when setsockopt(..., SO_J1939_FILTER, ...)
      modifies jsk->filters while receiving packets.
      
      The following trace was seen on an affected system:
       ==================================================================
       BUG: KASAN: slab-use-after-free in j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
       Read of size 4 at addr ffff888012144014 by task j1939/350
      
       CPU: 0 PID: 350 Comm: j1939 Tainted: G        W  OE      6.5.0-rc5 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
       Call Trace:
        print_report+0xd3/0x620
        ? kasan_complete_mode_report_info+0x7d/0x200
        ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
        kasan_report+0xc2/0x100
        ? j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
        __asan_load4+0x84/0xb0
        j1939_sk_recv_match_one+0x1af/0x2d0 [can_j1939]
        j1939_sk_recv+0x20b/0x320 [can_j1939]
        ? __kasan_check_write+0x18/0x20
        ? __pfx_j1939_sk_recv+0x10/0x10 [can_j1939]
        ? j1939_simple_recv+0x69/0x280 [can_j1939]
        ? j1939_ac_recv+0x5e/0x310 [can_j1939]
        j1939_can_recv+0x43f/0x580 [can_j1939]
        ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
        ? raw_rcv+0x42/0x3c0 [can_raw]
        ? __pfx_j1939_can_recv+0x10/0x10 [can_j1939]
        can_rcv_filter+0x11f/0x350 [can]
        can_receive+0x12f/0x190 [can]
        ? __pfx_can_rcv+0x10/0x10 [can]
        can_rcv+0xdd/0x130 [can]
        ? __pfx_can_rcv+0x10/0x10 [can]
        __netif_receive_skb_one_core+0x13d/0x150
        ? __pfx___netif_receive_skb_one_core+0x10/0x10
        ? __kasan_check_write+0x18/0x20
        ? _raw_spin_lock_irq+0x8c/0xe0
        __netif_receive_skb+0x23/0xb0
        process_backlog+0x107/0x260
        __napi_poll+0x69/0x310
        net_rx_action+0x2a1/0x580
        ? __pfx_net_rx_action+0x10/0x10
        ? __pfx__raw_spin_lock+0x10/0x10
        ? handle_irq_event+0x7d/0xa0
        __do_softirq+0xf3/0x3f8
        do_softirq+0x53/0x80
        </IRQ>
        <TASK>
        __local_bh_enable_ip+0x6e/0x70
        netif_rx+0x16b/0x180
        can_send+0x32b/0x520 [can]
        ? __pfx_can_send+0x10/0x10 [can]
        ? __check_object_size+0x299/0x410
        raw_sendmsg+0x572/0x6d0 [can_raw]
        ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
        ? apparmor_socket_sendmsg+0x2f/0x40
        ? __pfx_raw_sendmsg+0x10/0x10 [can_raw]
        sock_sendmsg+0xef/0x100
        sock_write_iter+0x162/0x220
        ? __pfx_sock_write_iter+0x10/0x10
        ? __rtnl_unlock+0x47/0x80
        ? security_file_permission+0x54/0x320
        vfs_write+0x6ba/0x750
        ? __pfx_vfs_write+0x10/0x10
        ? __fget_light+0x1ca/0x1f0
        ? __rcu_read_unlock+0x5b/0x280
        ksys_write+0x143/0x170
        ? __pfx_ksys_write+0x10/0x10
        ? __kasan_check_read+0x15/0x20
        ? fpregs_assert_state_consistent+0x62/0x70
        __x64_sys_write+0x47/0x60
        do_syscall_64+0x60/0x90
        ? do_syscall_64+0x6d/0x90
        ? irqentry_exit+0x3f/0x50
        ? exc_page_fault+0x79/0xf0
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
       Allocated by task 348:
        kasan_save_stack+0x2a/0x50
        kasan_set_track+0x29/0x40
        kasan_save_alloc_info+0x1f/0x30
        __kasan_kmalloc+0xb5/0xc0
        __kmalloc_node_track_caller+0x67/0x160
        j1939_sk_setsockopt+0x284/0x450 [can_j1939]
        __sys_setsockopt+0x15c/0x2f0
        __x64_sys_setsockopt+0x6b/0x80
        do_syscall_64+0x60/0x90
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
       Freed by task 349:
        kasan_save_stack+0x2a/0x50
        kasan_set_track+0x29/0x40
        kasan_save_free_info+0x2f/0x50
        __kasan_slab_free+0x12e/0x1c0
        __kmem_cache_free+0x1b9/0x380
        kfree+0x7a/0x120
        j1939_sk_setsockopt+0x3b2/0x450 [can_j1939]
        __sys_setsockopt+0x15c/0x2f0
        __x64_sys_setsockopt+0x6b/0x80
        do_syscall_64+0x60/0x90
        entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Reported-by: Sili Luo <rootlab@huawei.com>
      Suggested-by: Sili Luo <rootlab@huawei.com>
      Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/all/20231020133814.383996-1-o.rempel@pengutronix.de
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: j1939: prevent deadlock by changing j1939_socks_lock to rwlock · 6cdedc18
      Ziqi Zhao authored
      The following 3 locks would race against each other, causing the
      deadlock situation in the Syzbot bug report:
      
      - j1939_socks_lock
      - active_session_list_lock
      - sk_session_queue_lock
      
      A reasonable fix is to change j1939_socks_lock to an rwlock, since in
      the rare situations where a write lock is required for the linked list
      that j1939_socks_lock is protecting, the code does not attempt to
      acquire any more locks. This would break the circular lock dependency,
      where, for example, the current thread already locks j1939_socks_lock
      and attempts to acquire sk_session_queue_lock, and at the same time,
      another thread attempts to acquire j1939_socks_lock while holding
      sk_session_queue_lock.
      
      NOTE: This patch alone does not fix the unregister_netdevice bug
      reported by Syzbot; instead, it solves a deadlock situation to prepare
      for one or more further patches to actually fix the Syzbot bug, which
      appears to be a reference counting problem within the j1939 codebase.
      
      Reported-by: <syzbot+1591462f226d9cbf0564@syzkaller.appspotmail.com>
      Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/all/20230721162226.8639-1-astrajoan@yahoo.com
      [mkl: remove unrelated newline change]
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • ethernet: cpts: fix function pointer cast warnings · 9b23fceb
      Arnd Bergmann authored
      clang-16 warns about the mismatched prototypes for the devm_* callbacks:
      
      drivers/net/ethernet/ti/cpts.c:691:12: error: cast from 'void (*)(struct clk_hw *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
        691 |                                        (void(*)(void *))clk_hw_unregister_mux,
            |                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      include/linux/device.h:406:34: note: expanded from macro 'devm_add_action_or_reset'
        406 |         __devm_add_action_or_reset(dev, action, data, #action)
            |                                         ^~~~~~
      drivers/net/ethernet/ti/cpts.c:703:12: error: cast from 'void (*)(struct device_node *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
        703 |                                        (void(*)(void *))of_clk_del_provider,
            |                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      include/linux/device.h:406:34: note: expanded from macro 'devm_add_action_or_reset'
        406 |         __devm_add_action_or_reset(dev, action, data, #action)
      
      Use separate helper functions for this instead, using the expected prototypes
      with a void* argument.
      
      Fixes: a3047a81 ("net: ethernet: ti: cpts: add support for ext rftclk selection")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnad: fix work_queue type mismatch · 5d07e432
      Arnd Bergmann authored
      clang-16 warns about a function pointer cast:
      
      drivers/net/ethernet/brocade/bna/bnad.c:1995:4: error: cast from 'void (*)(struct delayed_work *)' to 'work_func_t' (aka 'void (*)(struct work_struct *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
       1995 |                         (work_func_t)bnad_tx_cleanup);
      drivers/net/ethernet/brocade/bna/bnad.c:2252:4: error: cast from 'void (*)(void *)' to 'work_func_t' (aka 'void (*)(struct work_struct *)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
       2252 |                         (work_func_t)(bnad_rx_cleanup));
      
      The problem here is mixing up work_struct and delayed_work, which relies
      on the former being the first member of the latter.
      
      Change the code to use consistent types here to address the warning and
      make it more robust against workqueue interface changes.
      
      Side note: the use of a delayed workqueue for cleaning up TX descriptors
      is probably a bad idea since this introduces a noticeable delay. The
      driver currently does not appear to use BQL, but if one wanted to add
      that, this would have to be changed as well.
      
      Fixes: 01b54b14 ("bna: tx rx cleanup fix")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: smc: fix spurious error message from __sock_release() · 6cf9ff46
      Dmitry Antipov authored
      Commit 67f562e3 ("net/smc: transfer fasync_list in case of fallback")
      leaves the socket's fasync list pointer within a container socket as well.
      When the latter is destroyed, '__sock_release()' warns about its non-empty
      fasync list, which is a dangling pointer to previously freed fasync list
      of an underlying TCP socket. Fix this spurious warning by nullifying
      fasync list of a container socket.
      
      Fixes: 67f562e3 ("net/smc: transfer fasync_list in case of fallback")
      Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · d9a31cda
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-02-12 (i40e)
      
      This series contains updates to i40e driver only.
      
      Ivan Vecera corrects the looping value used while waiting for queues to
      be disabled as well as an incorrect mask being used for DCB
      configuration.
      
      Maciej resolves an issue related to XDP traffic; removing a double call to
      i40e_pf_rxq_wait() and accounting for XDP rings when stopping rings.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2-af: Remove the PF_FUNC validation for NPC transmit rules · 858b3113
      Subbaraya Sundeep authored
      NPC transmit side mcam rules can use the pcifunc (in packet metadata
      added by hardware) of transmitting device for mcam lookup similar to
      the channel of receiving device at receive side.
      The commit 18603683 ("octeontx2-af: Remove channel verification
      while installing MCAM rules") removed the receive side channel
      verification to save hardware MCAM filters while switching packets
      across interfaces but missed removing transmit side checks.
      This patch removes transmit side rules validation.
      
      Fixes: 18603683 ("octeontx2-af: Remove channel verification while installing MCAM rules")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 13 Feb, 2024 11 commits
    • Merge branch 'selftests-net-more-pmtu-sh-fixes' · 1e41f11f
      Jakub Kicinski authored
      Paolo Abeni says:
      
      ====================
      selftests: net: more pmtu.sh fixes
      
      The mentioned test is still flaky, unusually enough in 'fast'
      environments.
      
      Patch 2/2 [try to] address the existing issues, while patch 1/2
      introduces more strict tests for the existing net helpers, to hopefully
      prevent future pain.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1707731086.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: more pmtu.sh fixes · 20622dc9
      Paolo Abeni authored
      The netdev CI is reporting failures for the pmtu test:
      
        [  115.929264] br0: port 2(vxlan_a) entered forwarding state
        # 2024/02/08 17:33:22 socat[7871] E bind(7, {AF=10 [0000:0000:0000:0000:0000:0000:0000:0000]:50000}, 28): Address already in use
        # 2024/02/08 17:33:22 socat[7877] E write(7, 0x5598fb6ff000, 8192): Connection refused
        # TEST: IPv6, bridged vxlan4: PMTU exceptions                         [FAIL]
        # File size 0 mismatches exepcted value in locally bridged vxlan test
      
      The root cause is apparently a socket created by a previous iteration
      of the relevant loop still lasting in LAST_ACK state.
      
      Note that even the file size check is racy: the receiver process dumping
      the file could still be running in the background.

      Allow the listener to bind to the same local port via SO_REUSEADDR, and
      collect the output file size only after the listener completes.
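
The socket option involved can be shown outside the test script. The sketch below covers only the SO_REUSEADDR part, and it is a generic userspace illustration, not the pmtu.sh change itself; `open_reusable_listener` is a hypothetical helper name, and the loopback address is an assumption made so the example stays local.

```python
import socket


def open_reusable_listener(port, host="127.0.0.1"):
    """Open a TCP listener that sets SO_REUSEADDR before bind().

    With the option set, bind() is not refused just because an earlier
    connection on the same local port is still winding down.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv
```

The option has to be set before `bind()`; setting it afterwards does not help a bind that has already failed.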
      
      Fixes: 136a1b43 ("selftests: net: test vxlan pmtu exceptions with tcp")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/4f51c11a1ce7ca7a4dabd926cffff63dadac9ba1.1707731086.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: more strict check in net_helper · a71d0908
      Paolo Abeni authored
      The helper waiting for a listener port can match any socket whose
      hexadecimal representation of source or destination addresses
      matches that of the given port.
      
      Additionally, any socket state is accepted.
      
      All the above can let the helper return successfully before the
      relevant listener is actually ready, with unexpected results.
      
      So far I could not find any related failure in the netdev CI, but
      the next patch is going to make the critical event more easily
      reproducible.
      
      Address the issue by matching the port hex only against the relevant
      socket field and additionally checking the socket state for TCP sockets.
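
The difference between the loose and the strict check can be sketched against lines in the /proc/net/tcp layout. This is an illustration, not the helper from the selftests: `loose_match`, `strict_match` and the two sample lines are hypothetical, and the sample lines are shortened to the first few columns.

```python
TCP_LISTEN = "0A"  # hexadecimal state value of a listening TCP socket


def loose_match(line, port):
    # Matches the port's hex form anywhere in the line, in any state.
    return f"{port:04X}" in line


def strict_match(line, port):
    # Columns: "sl", local_address, rem_address, st, ...
    fields = line.split()
    local_port = fields[1].rsplit(":", 1)[1]
    state = fields[3]
    return local_port == f"{port:04X}" and state == TCP_LISTEN


# Hypothetical sample lines; 50000 is 0xC350.
LISTENER = "   0: 00000000:C350 00000000:0000 0A 00000000:00000000"
CLIENT = "   1: 0100007F:A1B2 0100007F:C350 01 00000000:00000000"
```

Here `CLIENT` is an established connection whose remote port is 50000. The loose check accepts it, so a wait loop built on it could return before any listener exists; the strict check accepts only `LISTENER`.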
      
      Fixes: 3bdd9fd2 ("selftests/net: synchronize udpgro tests' tx and rx connection")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/192b3dbc443d953be32991d1b0ca432bd4c65008.1707731086.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: cope with slow env in so_txtime.sh test · a7ee79b9
      Paolo Abeni authored
      The mentioned test is failing in slow environments:
      
        # SO_TXTIME ipv4 clock monotonic
        # ./so_txtime: recv: timeout: Resource temporarily unavailable
        not ok 1 selftests: net: so_txtime.sh # exit=1
      
      Tuning the tolerance in the test binary is error-prone and doomed
      to fail in a slow-enough environment.

      Just suppress any error in such cases. To suppress them, the code
      first needs a small refactor that moves it to explicit error
      handling.
      
      Fixes: af5136f9 ("selftests/net: SO_TXTIME with ETF and FQ")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/2142d9ed4b5c5aa07dd1b455779625d91b175373.1707730902.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: cope with slow env in gro.sh test · e58779f4
      Paolo Abeni authored
      The gro self-test sends the packets to be aggregated with
      multiple write operations.

      When running in a slow environment, it's hard to guarantee that
      the GRO engine will wait for the last packet in an intended
      train.
      
      The above causes almost deterministic failures in our CI for
      the 'large' test-case.
      
      Address the issue by explicitly ignoring failures for this case
      in slow environments (KSFT_MACHINE_SLOW==true).
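
The policy amounts to a small decision on the final exit code. The following is a sketch of that decision only, not the gro.sh logic: `effective_exit_code` is a hypothetical helper, and the string that marks a slow machine is passed in as `slow_value` because the exact value the scripts compare against is an assumption here.

```python
def effective_exit_code(ret, env, slow_value):
    """Return the exit code to report for one test case.

    A failure is downgraded to success only when the environment marks
    the machine as slow; successes and ordinary failures pass through.
    """
    if ret != 0 and env.get("KSFT_MACHINE_SLOW") == slow_value:
        return 0
    return ret
```

A caller would pass its process environment (for example `os.environ`) as `env`, so that a fast machine still reports the failure.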
      
      Fixes: 7d157501 ("selftests/net: GRO coalesce test")
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/97d3ba83f5a2bfeb36f6bc0fb76724eb3dafb608.1707729403.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ti: icssg-prueth: add dependency for PTP · e083dd03
      Randy Dunlap authored
      When CONFIG_PTP_1588_CLOCK=m and CONFIG_TI_ICSSG_PRUETH=y, there are
      kconfig dependency warnings and build errors referencing PTP functions.
      
      Fix these by making TI_ICSSG_PRUETH depend on PTP_1588_CLOCK_OPTIONAL.
      
      Fixes these build errors and warnings:
      
      WARNING: unmet direct dependencies detected for TI_ICSS_IEP
        Depends on [m]: NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PTP_1588_CLOCK_OPTIONAL [=m] && TI_PRUSS [=y]
        Selected by [y]:
        - TI_ICSSG_PRUETH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_TI [=y] && PRU_REMOTEPROC [=y] && ARCH_K3 [=y] && OF [=y] && TI_K3_UDMA_GLUE_LAYER [=y]
      
      aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_get_ptp_clock_idx':
      icss_iep.c:(.text+0x1d4): undefined reference to `ptp_clock_index'
      aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_exit':
      icss_iep.c:(.text+0xde8): undefined reference to `ptp_clock_unregister'
      aarch64-linux-ld: drivers/net/ethernet/ti/icssg/icss_iep.o: in function `icss_iep_init':
      icss_iep.c:(.text+0x176c): undefined reference to `ptp_clock_register'
      
      Fixes: 186734c1 ("net: ti: icssg-prueth: add packet timestamping and ptp support")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Roger Quadros <rogerq@ti.com>
      Cc: Md Danish Anwar <danishanwar@ti.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
      Link: https://lore.kernel.org/r/20240211061152.14696-1-rdunlap@infradead.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • af_unix: Fix task hung while purging oob_skb in GC. · 25236c91
      Kuniyuki Iwashima authored
      syzbot reported a task hung; at the same time, GC was looping infinitely
      in list_for_each_entry_safe() for OOB skb.  [0]
      
      syzbot demonstrated that the list_for_each_entry_safe() was not actually
      safe in this case.
      
      A single skb could have references for multiple sockets.  If we free such
      a skb in the list_for_each_entry_safe(), the current and next sockets could
      be unlinked in a single iteration.
      
      unix_notinflight() uses list_del_init() to unlink the socket, so the
      prefetched next socket forms a loop itself and list_for_each_entry_safe()
      never stops.
      
      Here, we must use while() and make sure we always fetch the first socket.
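
The failure mode can be reproduced with a toy circular list. This is a userspace model under stated assumptions, not the garbage collector code: `Node`, `list_add_tail`, `list_del_init`, `purge_safe_iter`, `purge_fetch_first` and `make_scenario` are hypothetical names, and `max_steps` exists only so the looping case terminates in the demo.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def list_add_tail(node, head):
    node.prev, node.next = head.prev, head
    head.prev.next = node
    head.prev = node


def list_del_init(node):
    # Unlink the node and leave it pointing at itself.
    node.prev.next = node.next
    node.next.prev = node.prev
    node.prev = node.next = node


def purge_safe_iter(head, visit, max_steps=100):
    """Model of a "safe" walk: the next node is fetched before the body."""
    pos, nxt = head.next, head.next.next
    for _ in range(max_steps):
        if pos is head:
            return True
        visit(pos)
        pos, nxt = nxt, nxt.next
    return pos is head  # False: the walk never got back to the head


def purge_fetch_first(head, visit, max_steps=100):
    """Model of the while() form: always re-fetch the first node."""
    for _ in range(max_steps):
        if head.next is head:
            return True
        visit(head.next)
    return head.next is head


def make_scenario():
    head, a, b = Node("head"), Node("A"), Node("B")
    list_add_tail(a, head)
    list_add_tail(b, head)

    def visit(node):
        # Freeing A's skb drops references on both sockets, so both unlink.
        list_del_init(node)
        if node is a:
            list_del_init(b)

    return head, visit
```

In the first walk the prefetched next node B has been unlinked and points at itself, so the cursor never reaches the head again. Re-reading the first entry on every iteration sees the now-empty list and stops.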
      
      [0]:
      Sending NMI from CPU 0 to CPUs 1:
      NMI backtrace for cpu 1
      CPU: 1 PID: 5065 Comm: syz-executor236 Not tainted 6.8.0-rc3-syzkaller-00136-g1f719a2f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
      RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
      RIP: 0010:__sanitizer_cov_trace_pc+0xd/0x60 kernel/kcov.c:207
      Code: cc cc cc cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 65 48 8b 14 25 40 c2 03 00 <65> 8b 05 b4 7c 78 7e a9 00 01 ff 00 48 8b 34 24 74 0f f6 c4 01 74
      RSP: 0018:ffffc900033efa58 EFLAGS: 00000283
      RAX: ffff88807b077800 RBX: ffff88807b077800 RCX: 1ffffffff27b1189
      RDX: ffff88802a5a3b80 RSI: ffffffff8968488d RDI: ffff88807b077f70
      RBP: ffffc900033efbb0 R08: 0000000000000001 R09: fffffbfff27a900c
      R10: ffffffff93d48067 R11: ffffffff8ae000eb R12: ffff88807b077800
      R13: dffffc0000000000 R14: ffff88807b077e40 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000564f4fc1e3a8 CR3: 000000000d57a000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <NMI>
       </NMI>
       <TASK>
       unix_gc+0x563/0x13b0 net/unix/garbage.c:319
       unix_release_sock+0xa93/0xf80 net/unix/af_unix.c:683
       unix_release+0x91/0xf0 net/unix/af_unix.c:1064
       __sock_release+0xb0/0x270 net/socket.c:659
       sock_close+0x1c/0x30 net/socket.c:1421
       __fput+0x270/0xb80 fs/file_table.c:376
       task_work_run+0x14f/0x250 kernel/task_work.c:180
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0xa8a/0x2ad0 kernel/exit.c:871
       do_group_exit+0xd4/0x2a0 kernel/exit.c:1020
       __do_sys_exit_group kernel/exit.c:1031 [inline]
       __se_sys_exit_group kernel/exit.c:1029 [inline]
       __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1029
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xd5/0x270 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x6f/0x77
      RIP: 0033:0x7f9d6cbdac09
      Code: Unable to access opcode bytes at 0x7f9d6cbdabdf.
      RSP: 002b:00007fff5952feb8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9d6cbdac09
      RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
      RBP: 00007f9d6cc552b0 R08: ffffffffffffffb8 R09: 0000000000000006
      R10: 0000000000000006 R11: 0000000000000246 R12: 00007f9d6cc552b0
      R13: 0000000000000000 R14: 00007f9d6cc55d00 R15: 00007f9d6cbabe70
       </TASK>
      
      Reported-by: syzbot+4fa4a2d1f5a5ee06f006@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=4fa4a2d1f5a5ee06f006
      Fixes: 1279f9d9 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240209220453.96053-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • connector/cn_proc: revert "connector: Fix proc_event_num_listeners count not cleared" · 8929f95b
      Keqi Wang authored
      This reverts commit c46bfba1 ("connector: Fix proc_event_num_listeners
      count not cleared").
      
      It is not accurate to reset proc_event_num_listeners according to
      cn_netlink_send_mult() return value -ESRCH.
      
      In the case of stress-ng netlink-proc, -ESRCH will always be returned,
      because netlink_broadcast_filtered will return -ESRCH,
      which may cause stress-ng netlink-proc performance degradation.

      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/oe-lkp/202401112259.b23a1567-oliver.sang@intel.com
      Fixes: c46bfba1 ("connector: Fix proc_event_num_listeners count not cleared")
      Signed-off-by: Keqi Wang <wangkeqi_chris@163.com>
      Link: https://lore.kernel.org/r/20240209091659.68723-1-wangkeqi_chris@163.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net:rds: Fix possible deadlock in rds_message_put · f1acf1ac
      Allison Henderson authored
      Functions rds_still_queued and rds_clear_recv_queue lock a given socket
      in order to safely iterate over the incoming rds messages. However
      calling rds_inc_put while under this lock creates a potential deadlock.
      rds_inc_put may eventually call rds_message_purge, which will lock
      m_rs_lock. This is the incorrect locking order since m_rs_lock is
      meant to be locked before the socket. To fix this, we move the message
      item to a local list or variable that won't need rs_recv_lock protection.
      Then we can safely call rds_inc_put on any item stored locally after
      rs_recv_lock is released.
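
The "detach under the lock, release after unlocking" pattern can be sketched in userspace. This is an illustration of the pattern only, not the RDS code: `Sock`, `Incoming`, `clear_recv_queue` and the `put` method are hypothetical stand-ins, with `put` playing the role of a release call that may take other locks.

```python
import threading


class Sock:
    def __init__(self):
        self.recv_lock = threading.Lock()
        self.recv_queue = []


class Incoming:
    def __init__(self, sock):
        self.sock = sock
        self.released = False

    def put(self):
        # Stand-in for a release that may take another lock; it must not
        # run while the socket's receive lock is held.
        assert not self.sock.recv_lock.locked()
        self.released = True


def clear_recv_queue(sock):
    # Step 1: detach the whole queue while holding the lock.
    with sock.recv_lock:
        local, sock.recv_queue = sock.recv_queue, []
    # Step 2: release the items only after the lock is dropped.
    for inc in local:
        inc.put()
    return len(local)
```

Because the items live on a local list by the time `put` runs, no other thread can reach them through the queue, so the receive lock is no longer needed.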
      
      Fixes: bdbe6fbc ("RDS: recv.c")
      Reported-by: syzbot+f9db6ff27b9bfdcfeca0@syzkaller.appspotmail.com
      Reported-by: syzbot+dcd73ff9291e6d34b3ab@syzkaller.appspotmail.com
      Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
      Link: https://lore.kernel.org/r/20240209022854.200292-1-allison.henderson@oracle.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: add rcu safety to rtnl_prop_list_size() · 9f308313
      Eric Dumazet authored
      rtnl_prop_list_size() can be called while alternative names
      are added or removed concurrently.
      
      if_nlmsg_size() / rtnl_calcit() can indeed be called
      without RTNL held.
      
      Use explicit RCU protection to avoid UAF.
      
      Fixes: 88f4fb0c ("net: rtnetlink: put alternative names to getlink message")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240209181248.96637-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • pds_core: no health-thread in VF path · 3e36031c
      Shannon Nelson authored
      The VFs don't run the health thread, so don't try to
      stop or restart the non-existent timer or work item.
      
      Fixes: d9407ff1 ("pds_core: Prevent health thread from running during reset/remove")
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Link: https://lore.kernel.org/r/20240210002002.49483-1-shannon.nelson@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>