1. 20 Jul, 2023 12 commits
    • netfilter: nf_tables: can't schedule in nft_chain_validate · 314c8284
      Florian Westphal authored
      nft_chain_validate() can be called via nft set element list iteration,
      which may acquire the rcu and/or bh read lock (depending on the set type).
      
      BUG: sleeping function called from invalid context at net/netfilter/nf_tables_api.c:3353
      in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1232, name: nft
      preempt_count: 0, expected: 0
      RCU nest depth: 1, expected: 0
      2 locks held by nft/1232:
       #0: ffff8881180e3ea8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid
       #1: ffffffff83f5f540 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire
      Call Trace:
       nft_chain_validate
       nft_lookup_validate_setelem
       nft_pipapo_walk
       nft_lookup_validate
       nft_chain_validate
       nft_immediate_validate
       nft_chain_validate
       nf_tables_validate
       nf_tables_abort
      
      No choice but to move it to nf_tables_validate().
      
      Fixes: 81ea0106 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nf_tables: fix spurious set element insertion failure · ddbd8be6
      Florian Westphal authored
      On some platforms there is a padding hole in the nft_verdict
      structure, between the verdict code and the chain pointer.
      
      On element insertion, if the new element clashes with an existing one and
      NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as
      the data associated with duplicated element is the same as the existing
      one.  The data equality check uses memcmp.
      
      For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT
      padding area leads to spurious failure even if the verdict data is the
      same.
      
      This then makes the insertion fail with 'already exists' error, even
      though the new "key : data" matches an existing entry and userspace
      told the kernel that it doesn't want to receive an error indication.
      
      Fixes: c016c7e4 ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
      Signed-off-by: Florian Westphal <fw@strlen.de>
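The padding problem described in the commit above can be illustrated with a small userspace sketch. The types demo_verdict and demo_chain are hypothetical stand-ins, not the kernel's definitions, and the field-wise comparison is only one plausible way to keep padding bytes out of an equality check; the actual patch may do it differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct demo_chain {
	int id;
};

struct demo_verdict {
	unsigned int code;         /* 4 bytes */
	struct demo_chain *chain;  /* 8-byte aligned on typical 64-bit ABIs */
};

/* True when the ABI leaves unused bytes between code and chain. */
bool demo_verdict_has_hole(void)
{
	return offsetof(struct demo_verdict, chain) >
	       offsetof(struct demo_verdict, code) + sizeof(unsigned int);
}

/* Field-wise comparison: padding bytes never take part. */
bool demo_verdict_equal(const struct demo_verdict *a,
			const struct demo_verdict *b)
{
	return a->code == b->code && a->chain == b->chain;
}

/* Build two logically identical verdicts over differently filled storage. */
bool demo_same_verdict_compares_equal(void)
{
	static struct demo_chain chain = { .id = 1 };
	struct demo_verdict a, b;

	memset(&a, 0x00, sizeof(a));
	memset(&b, 0xff, sizeof(b));
	a.code = b.code = 2;
	a.chain = b.chain = &chain;

	/* memcmp(&a, &b, sizeof(a)) may be nonzero here if a hole exists. */
	return demo_verdict_equal(&a, &b);
}
```

On an ABI where demo_verdict_has_hole() returns true, a whole-struct memcmp() can report a difference for these two verdicts even though every field matches, which is the kind of spurious mismatch the commit message describes.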
    • Merge branch 'net-support-stp-on-bridge-in-non-root-netns' · ac528649
      Paolo Abeni authored
      Kuniyuki Iwashima says:
      
      ====================
      net: Support STP on bridge in non-root netns.
      
      Currently, STP does not work in non-root netns as llc_rcv() drops
      packets from non-root netns.
      
      This series fixes it by making some protocol handlers netns-aware,
      which are called from llc_rcv() as follows:
      
        llc_rcv()
        |
        |- sap->rcv_func : registered by llc_sap_open()
        |
        |  * functions : registered by register_8022_client()
        |    -> No in-kernel user calls register_8022_client()
        |
        |  * snap_rcv()
        |    |
        |    `- proto->rcvfunc() : registered by register_snap_client()
        |
        |       * aarp_rcv()  : drop packets from non-root netns
        |       * atalk_rcv() : drop packets from non-root netns
        |
        |  * stp_pdu_rcv()
        |    |
        |    `- garp_protos[]->rcv() : registered by stp_proto_register()
        |
        |       * garp_pdu_rcv() : netns-aware
        |       * br_stp_rcv()   : netns-aware
        |
        |- llc_type_handlers[llc_pdu_type(skb) - 1]
        |
        |  * llc_sap_handler()  : NOT netns-aware (Patch 1)
        |  * llc_conn_handler() : NOT netns-aware (Patch 2)
        |
        `- llc_station_handler
      
           * llc_station_rcv() : netns-aware
      
      Patches 1 & 2 convert the not-netns-aware functions, and Patch 3 removes
      the netns restriction in llc_rcv().
      
      Note this series does not namespacify AF_LLC so that these patches
      can be backported to stable without conflicts (at least to 4.14.y).
      
      Another series that adds netns support for AF_LLC will be targeted
      to net-next later.
      ====================
      
      Link: https://lore.kernel.org/r/20230718174152.57408-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Revert "bridge: Add extack warning when enabling STP in netns." · 7ebd00a5
      Kuniyuki Iwashima authored
      This reverts commit 56a16035.
      
      Since the previous commit, STP works on bridge in netns.
      
        # unshare -n
        # ip link add br0 type bridge
        # ip link add veth0 type veth peer name veth1
      
        # ip link set veth0 master br0 up
        [   50.558135] br0: port 1(veth0) entered blocking state
        [   50.558366] br0: port 1(veth0) entered disabled state
        [   50.558798] veth0: entered allmulticast mode
        [   50.564401] veth0: entered promiscuous mode
      
        # ip link set veth1 master br0 up
        [   54.215487] br0: port 2(veth1) entered blocking state
        [   54.215657] br0: port 2(veth1) entered disabled state
        [   54.215848] veth1: entered allmulticast mode
        [   54.219577] veth1: entered promiscuous mode
      
        # ip link set br0 type bridge stp_state 1
        # ip link set br0 up
        [   61.960726] br0: port 2(veth1) entered blocking state
        [   61.961097] br0: port 2(veth1) entered listening state
        [   61.961495] br0: port 1(veth0) entered blocking state
        [   61.961653] br0: port 1(veth0) entered listening state
        [   63.998835] br0: port 2(veth1) entered blocking state
        [   77.437113] br0: port 1(veth0) entered learning state
        [   86.653501] br0: received packet on veth0 with own address as source address (addr:6e:0f:e7:6f:5f:5f, vlan:0)
        [   92.797095] br0: port 1(veth0) entered forwarding state
        [   92.797398] br0: topology change detected, propagating
      
      Let's remove the warning.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • llc: Don't drop packet from non-root netns. · 6631463b
      Kuniyuki Iwashima authored
      Now these upper layer protocol handlers can be called from llc_rcv()
      as sap->rcv_func(), which is registered by llc_sap_open().
      
        * function which is passed to register_8022_client()
          -> no in-kernel user calls register_8022_client().
      
        * snap_rcv()
          `- proto->rcvfunc() : registered by register_snap_client()
             -> aarp_rcv() and atalk_rcv() drop packets from non-root netns
      
        * stp_pdu_rcv()
          `- garp_protos[]->rcv() : registered by stp_proto_register()
             -> garp_pdu_rcv() and br_stp_rcv() are netns-aware
      
      So, we can safely remove the netns restriction in llc_rcv().
      
      Fixes: e730c155 ("[NET]: Make packet reception network namespace safe")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • llc: Check netns in llc_estab_match() and llc_listener_match(). · 97b1d320
      Kuniyuki Iwashima authored
      We will remove this restriction in llc_rcv() in the following patch,
      which means that the protocol handler must be aware of netns.
      
              if (!net_eq(dev_net(dev), &init_net))
                      goto drop;
      
      llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
      if not NULL.
      
      If the PDU type is LLC_DEST_CONN, llc_conn_handler() is called to pass
      skb to the corresponding sockets.  Then, we must look up a proper socket in
      the same netns as skb->dev.
      
      llc_conn_handler() calls __llc_lookup() to look up an established or
      listening socket by __llc_lookup_established() and llc_lookup_listener().
      
      Both functions iterate on a list and call llc_estab_match() or
      llc_listener_match() to check if the socket is the correct destination.
      However, these functions do not check netns.
      
      Also, bind() and connect() call llc_establish_connection(), which
      finally calls __llc_lookup_established(), to check if there is a
      conflicting socket.
      
      Let's test netns in llc_estab_match() and llc_listener_match().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
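The idea of testing the namespace inside the match helper can be sketched in userspace as follows. Everything here (demo_net, demo_addr, demo_sock, demo_estab_match) is a hypothetical model written for illustration; it does not reproduce the kernel's llc structures or the real signature of llc_estab_match().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model; none of these are the kernel's types. */
struct demo_net {
	int id;
};

struct demo_addr {
	unsigned char mac[6];
	unsigned char lsap;
};

struct demo_sock {
	const struct demo_net *net;  /* namespace the socket lives in */
	struct demo_addr laddr;      /* local address */
	struct demo_addr daddr;      /* remote address */
};

bool demo_addr_equal(const struct demo_addr *a, const struct demo_addr *b)
{
	return a->lsap == b->lsap &&
	       memcmp(a->mac, b->mac, sizeof(a->mac)) == 0;
}

/* Established-socket match: namespace first, then both addresses. */
bool demo_estab_match(const struct demo_sock *sk,
		      const struct demo_addr *daddr,
		      const struct demo_addr *laddr,
		      const struct demo_net *net)
{
	return sk->net == net &&
	       demo_addr_equal(&sk->laddr, laddr) &&
	       demo_addr_equal(&sk->daddr, daddr);
}
```

With the namespace comparison in place, a socket bound to the same address pair in a different namespace no longer matches, so a lookup driven by skb->dev cannot hand the packet to a socket in another netns.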
    • llc: Check netns in llc_dgram_match(). · 9b64e93e
      Kuniyuki Iwashima authored
      We will remove this restriction in llc_rcv() soon, which means that the
      protocol handler must be aware of netns.
      
      	if (!net_eq(dev_net(dev), &init_net))
      		goto drop;
      
      llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it
      if not NULL.
      
      If the PDU type is LLC_DEST_SAP, llc_sap_handler() is called to pass skb
      to the corresponding sockets.  Then, we must look up a proper socket in the
      same netns as skb->dev.
      
      If the destination is a multicast address, llc_sap_handler() calls
      llc_sap_mcast().  It calculates a hash based on DSAP and skb->dev->ifindex,
      iterates on a socket list, and calls llc_mcast_match() to check if the
      socket is the correct destination.  Then, llc_mcast_match() checks if
      skb->dev matches with llc_sk(sk)->dev.  So, we need not check netns here.
      
      OTOH, if the destination is a unicast address, llc_sap_handler() calls
      llc_lookup_dgram() to look up a socket, but it does not check the netns.
      
      Therefore, we need to add a netns check in llc_lookup_dgram().
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: mtk_eth_soc: always mtk_get_ib1_pkt_type · 9f9d4c1a
      Daniel Golle authored
      The entries and bind debugfs files would display wrong data on NETSYS_V2 and
      later because, instead of using mtk_get_ib1_pkt_type, the driver would use
      MTK_FOE_IB1_PACKET_TYPE, which corresponds to NETSYS_V1(.x) SoCs.
      Use mtk_get_ib1_pkt_type so entries and bind records display correctly.
      
      Fixes: 03a3180e ("net: ethernet: mtk_eth_soc: introduce flow offloading support for mt7986")
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/c0ae03d0182f4d27b874cbdf0059bc972c317f3c.1689727134.git.daniel@makrotopia.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
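A version-aware field accessor of the kind the commit above relies on might look like the sketch below. The mask positions and widths are assumptions chosen for illustration only and should not be read as the driver's real register layout; demo_get_ib1_pkt_type() is a hypothetical name, not the driver's helper.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative masks, assumed for this sketch only. */
#define DEMO_IB1_PKT_TYPE_V1_SHIFT 25
#define DEMO_IB1_PKT_TYPE_V1_MASK  (UINT32_C(0x7) << DEMO_IB1_PKT_TYPE_V1_SHIFT)
#define DEMO_IB1_PKT_TYPE_V2_SHIFT 23
#define DEMO_IB1_PKT_TYPE_V2_MASK  (UINT32_C(0x1f) << DEMO_IB1_PKT_TYPE_V2_SHIFT)

/* Pick the field layout that matches the hardware generation. */
uint32_t demo_get_ib1_pkt_type(int netsys_version, uint32_t ib1)
{
	if (netsys_version >= 2)
		return (ib1 & DEMO_IB1_PKT_TYPE_V2_MASK) >>
		       DEMO_IB1_PKT_TYPE_V2_SHIFT;

	return (ib1 & DEMO_IB1_PKT_TYPE_V1_MASK) >>
	       DEMO_IB1_PKT_TYPE_V1_SHIFT;
}
```

Decoding a word written in one layout with the other layout's mask silently yields a different packet type, which is how a display path that hardcodes the older mask ends up showing wrong data on newer hardware.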
    • Merge branch 'r8169-revert-two-changes-that-caused-regressions' · 88f2e009
      Jakub Kicinski authored
      Heiner Kallweit says:
      
      ====================
      r8169: revert two changes that caused regressions
      
      This reverts two changes that caused regressions.
      ====================
      
      Link: https://lore.kernel.org/r/ddadceae-19c9-81b8-47b5-a4ff85e2563a@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "r8169: disable ASPM during NAPI poll" · e31a9fed
      Heiner Kallweit authored
      This reverts commit e1ed3e4d.
      
      It turned out that the change causes a performance regression.
      
      Link: https://lore.kernel.org/netdev/20230713124914.GA12924@green245/T/
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/055c6bc2-74fa-8c67-9897-3f658abb5ae7@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • r8169: revert 2ab19de6 ("r8169: remove ASPM restrictions now that ASPM is... · cf2ffdea
      Heiner Kallweit authored
      r8169: revert 2ab19de6 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll")
      
      There have been reports that this change breaks network connectivity
      on a number of systems. Therefore, effectively revert it. Systems where
      the BIOS denies the OS access to ASPM seem to be mainly affected.
      Due to later changes, we can't do a direct revert.
      
      Fixes: 2ab19de6 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll")
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/netdev/e47bac0d-e802-65e1-b311-6acb26d5cf10@freenet.de/T/
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217596
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/57f13ec0-b216-d5d8-363d-5b05528ec5fb@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "tcp: avoid the lookup process failing to get sk in ehash table" · 81b3ade5
      Kuniyuki Iwashima authored
      This reverts commit 3f4ca5fa.
      
      Commit 3f4ca5fa ("tcp: avoid the lookup process failing to get sk in
      ehash table") reversed the order in how a socket is inserted into ehash
      to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are
      swapped.  However, it introduced another lookup failure.
      
      The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU
      and does not have SOCK_RCU_FREE, so the socket could be reused even while
      it is being referenced on another CPU doing RCU lookup.
      
      Let's say a socket is reused and inserted into the same hash bucket during
      lookup.  After the blamed commit, a new socket is inserted at the end of
      the list.  If that happens, we will skip sockets placed after the previous
      position of the reused socket, resulting in ehash lookup failure.
      
      As described in Documentation/RCU/rculist_nulls.rst, we should insert a
      new socket at the head of the list to avoid such an issue.
      
      This issue, the swap-lookup-failure, and another variant reported in [0]
      can all be handled properly by adding a locked ehash lookup suggested by
      Eric Dumazet [1].
      
      However, this issue could occur for every packet and is thus more likely
      than the other two races, so let's revert the change for now.
      
      Link: https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/ [0]
      Link: https://lore.kernel.org/netdev/CANn89iK8snOz8TYOhhwfimC7ykYA78GA3Nyv8x06SZYa1nKdyA@mail.gmail.com/ [1]
      Fixes: 3f4ca5fa ("tcp: avoid the lookup process failing to get sk in ehash table")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717215918.15723-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
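The skipped-socket scenario from the revert above can be replayed with a single-threaded model. This is a deliberately simplified sketch: it uses a plain singly linked list with hypothetical demo_* names, has no RCU, no nulls markers and no key recheck, and only shows where a paused reader ends up after the node it stands on is recycled into the same bucket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_node {
	int key;
	struct demo_node *next;
};

struct demo_list {
	struct demo_node *head;
};

void demo_unlink(struct demo_list *l, struct demo_node *n)
{
	struct demo_node **pp = &l->head;

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp)
		*pp = n->next;
}

void demo_add_head(struct demo_list *l, struct demo_node *n)
{
	n->next = l->head;
	l->head = n;
}

void demo_add_tail(struct demo_list *l, struct demo_node *n)
{
	struct demo_node **pp = &l->head;

	while (*pp)
		pp = &(*pp)->next;
	n->next = NULL;
	*pp = n;
}

/*
 * A reader looking for key 3 is paused on the node with key 1 while a
 * writer recycles that node into the same bucket. Returns whether the
 * reader still reaches key 3 after resuming from the recycled node.
 */
bool demo_lookup_survives_reuse(bool insert_at_head)
{
	struct demo_node n[4];
	struct demo_list l = { NULL };
	struct demo_node *pos;
	int i;

	for (i = 3; i >= 0; i--) {
		n[i].key = i;
		demo_add_head(&l, &n[i]);   /* list: 0 -> 1 -> 2 -> 3 */
	}

	pos = &n[1];                        /* reader stands on key 1 */

	demo_unlink(&l, pos);               /* writer reuses the node */
	if (insert_at_head)
		demo_add_head(&l, pos);     /* list: 1 -> 0 -> 2 -> 3 */
	else
		demo_add_tail(&l, pos);     /* list: 0 -> 2 -> 3 -> 1 */

	for (pos = pos->next; pos; pos = pos->next)
		if (pos->key == 3)
			return true;
	return false;
}
```

With tail insertion the recycled node's next pointer is NULL, so the reader falls off the list without ever seeing keys 2 and 3. With head insertion the reader walks the remaining nodes again and still finds its target, at the cost of revisiting some entries.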
  2. 19 Jul, 2023 16 commits
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e80698b7
      Jakub Kicinski authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2023-07-19
      
      We've added 4 non-merge commits during the last 1 day(s) which contain
      a total of 3 files changed, 55 insertions(+), 10 deletions(-).
      
      The main changes are:
      
      1) Fix stack depth check in presence of async callbacks,
         from Kumar Kartikeya Dwivedi.
      
      2) Fix BTI type used for freplace attached functions,
         from Alexander Duyck.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, arm64: Fix BTI type used for freplace attached functions
        selftests/bpf: Add more tests for check_max_stack_depth bug
        bpf: Repeat check_max_stack_depth for async callbacks
        bpf: Fix subprog idx logic in check_max_stack_depth
      ====================
      
      Link: https://lore.kernel.org/r/20230719174502.74023-1-alexei.starovoitov@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv4: ip_gre: fix return value check in erspan_xmit() · aa7cb378
      Yuanjun Gong authored
      goto free_skb if an unexpected result is returned by pskb_trim()
      in erspan_xmit().
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
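The pattern behind this group of pskb_trim() fixes, checking the trim result and taking the error path that frees the buffer, can be sketched in userspace as follows. demo_buf, demo_trim() and demo_xmit() are hypothetical stand-ins; they do not model sk_buff or the real transmit functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_buf {
	unsigned char *data;
	size_t len;
	int fail_trim;   /* test knob: force the trim helper to fail */
};

/* Hypothetical trim helper that can fail, like a reallocating trim. */
int demo_trim(struct demo_buf *b, size_t len)
{
	if (b->fail_trim)
		return -ENOMEM;
	if (len < b->len)
		b->len = len;
	return 0;
}

/*
 * Returns the number of bytes handed on, or a negative errno.
 * The buffer is released in both cases, as a transmit path would do.
 */
int demo_xmit(struct demo_buf *b, size_t mtu)
{
	int err = 0;
	int sent;

	if (b->len > mtu) {
		err = demo_trim(b, mtu);
		if (err)
			goto free_buf;   /* do not send an untrimmed buffer */
	}

	sent = (int)b->len;
	free(b->data);
	b->data = NULL;
	return sent;

free_buf:
	free(b->data);
	b->data = NULL;
	return err;
}
```

Ignoring the return value would let the oversized buffer continue down the path as if it had been truncated; the explicit check keeps the failure visible to the caller.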
    • ipv4: ip_gre: fix return value check in erspan_fb_xmit() · 02d84f3e
      Yuanjun Gong authored
      goto err_free_skb if an unexpected result is returned by pskb_trim()
      in erspan_fb_xmit().
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers:net: fix return value check in ocelot_fdma_receive_skb · bce56033
      Yuanjun Gong authored
      ocelot_fdma_receive_skb should return false if an unexpected
      value is returned by pskb_trim.
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: fix return value check in emac_tso_csum() · 78a93c31
      Yuanjun Gong authored
      In emac_tso_csum(), return an error code if an unexpected value
      is returned by pskb_trim().
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:ipv6: check return value of pskb_trim() · 4258faa1
      Yuanjun Gong authored
      goto tx_err if an unexpected result is returned by pskb_trim()
      in ip6erspan_tunnel_xmit().
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: Use kfree_sensitive instead of kfree · daa75144
      Wang Ming authored
      The key buffer might contain the private part of the key, so it is
      better to use kfree_sensitive() to free it.
      
      Fixes: 38320c70 ("[IPSEC]: Use crypto_aead and authenc in ESP")
      Signed-off-by: Wang Ming <machel@vivo.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
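What wipe-before-free means can be shown with a small userspace analogue. demo_wipe() and demo_free_sensitive() are hypothetical helpers for illustration, not the kernel implementation of kfree_sensitive(); unlike the kernel function, the caller has to pass the length because malloc() does not expose it portably.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Overwrite len bytes with zeros through a volatile pointer so the
 * compiler cannot discard the stores as dead writes.
 */
void demo_wipe(void *p, size_t len)
{
	volatile unsigned char *v = p;

	while (len--)
		*v++ = 0;
}

/* Hypothetical userspace analogue: wipe the secret, then free it. */
void demo_free_sensitive(void *p, size_t len)
{
	if (!p)
		return;
	demo_wipe(p, len);
	free(p);
}
```

A plain free() leaves the key bytes in memory until the allocator hands the block to someone else; wiping first narrows the window in which stale key material can leak.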
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 7f5acea7
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-17 (iavf)
      
      This series contains updates to iavf driver only.
      
      Ding Hui fixes use-after-free issue by calling netif_napi_del() for all
      allocated q_vectors. He also resolves out-of-bounds issue by not
      updating to new values when timeout is encountered.
      
      Marcin and Ahmed change the way resets are handled so that the callback
      operating under the RTNL lock will wait for the reset to finish. The
      rtnl_lock sensitive functions in the reset flow will schedule the netdev
      update for later in order to remove the circular dependency with the
      critical lock.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: fix reset task race with iavf_remove()
        iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
        Revert "iavf: Do not restart Tx queues after reset task failure"
        Revert "iavf: Detach device during reset task"
        iavf: Wait for reset in callbacks which trigger it
        iavf: use internal state to free traffic IRQs
        iavf: Fix out-of-bounds when setting channels on remove
        iavf: Fix use-after-free in free_netdev
      ====================
      
      Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tcp-annotate-data-races-in-tcp_rsk-req' · e9b2bd96
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      tcp: annotate data-races in tcp_rsk(req)
      
      Small series addressing two syzbot reports around tcp_rsk(req)
      ====================
      
      Link: https://lore.kernel.org/r/20230717144445.653164-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: annotate data-races around tcp_rsk(req)->ts_recent · eba20811
      Eric Dumazet authored
      TCP request sockets are lockless, tcp_rsk(req)->ts_recent
      can change while being read by another cpu as syzbot noticed.
      
      This is harmless, but we should annotate the known races.
      
      Note that tcp_check_req() changes req->ts_recent a bit early,
      we might change this in the future.
      
      BUG: KCSAN: data-race in tcp_check_req / tcp_check_req
      
      write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1:
      tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762
      tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
      ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:468 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      do_softirq+0x7e/0xb0 kernel/softirq.c:472
      __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396
      local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33
      rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline]
      __dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271
      dev_queue_xmit include/linux/netdevice.h:3088 [inline]
      neigh_hh_output include/net/neighbour.h:528 [inline]
      neigh_output include/net/neighbour.h:542 [inline]
      ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317
      NF_HOOK_COND include/linux/netfilter.h:292 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431
      dst_output include/net/dst.h:458 [inline]
      ip_local_out net/ipv4/ip_output.c:126 [inline]
      __ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533
      ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547
      __tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399
      tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
      tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693
      __tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877
      tcp_push_pending_frames include/net/tcp.h:1952 [inline]
      __tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline]
      tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343
      rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52
      rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422
      rds_send_worker+0x42/0x1d0 net/rds/threads.c:200
      process_one_work+0x3e6/0x750 kernel/workqueue.c:2408
      worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0:
      tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622
      tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
      ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:468 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
      smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      value changed: 0x1cd237f1 -> 0x1cd237f2
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717144445.653164-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
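Annotating a lockless field access usually means wrapping each read and write so the compiler performs exactly one access of the full width. The sketch below uses simplified userspace versions of such macros; DEMO_READ_ONCE, DEMO_WRITE_ONCE and demo_request_sock are hypothetical names, the macros rely on the GNU __typeof__ extension, and the kernel's own READ_ONCE()/WRITE_ONCE() carry additional checks not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified single-access wrappers; assumes GCC or Clang. */
#define DEMO_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical request socket with one locklessly shared field. */
struct demo_request_sock {
	uint32_t ts_recent;
};

/* Writer side: one marked store of the new timestamp. */
void demo_update_ts(struct demo_request_sock *req, uint32_t tsval)
{
	DEMO_WRITE_ONCE(req->ts_recent, tsval);
}

/* Reader side: one marked load, possibly racing with the writer. */
uint32_t demo_read_ts(struct demo_request_sock *req)
{
	return DEMO_READ_ONCE(req->ts_recent);
}
```

The annotation does not add ordering or mutual exclusion. It documents that the race is intentional and keeps the compiler from tearing, fusing or repeating the access, which is what lets a data-race detector stop reporting it.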
    • tcp: annotate data-races around tcp_rsk(req)->txhash · 5e526552
      Eric Dumazet authored
      TCP request sockets are lockless, some of their fields
      can change while being read by another cpu as syzbot noticed.
      
      This is usually harmless, but we should annotate the known
      races.
      
      This patch takes care of tcp_rsk(req)->txhash,
      a separate one is needed for tcp_rsk(req)->ts_recent.
      
      BUG: KCSAN: data-race in tcp_make_synack / tcp_rtx_synack
      
      write to 0xffff8881362304bc of 4 bytes by task 32083 on cpu 1:
      tcp_rtx_synack+0x9d/0x2a0 net/ipv4/tcp_output.c:4213
      inet_rtx_syn_ack+0x38/0x80 net/ipv4/inet_connection_sock.c:880
      tcp_check_req+0x379/0xc70 net/ipv4/tcp_minisocks.c:665
      tcp_v6_rcv+0x125b/0x1b20 net/ipv6/tcp_ipv6.c:1673
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      netif_receive_skb_internal net/core/dev.c:5652 [inline]
      netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
      tun_rx_batched+0x3bf/0x400
      tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
      tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
      call_write_iter include/linux/fs.h:1871 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x4ab/0x7d0 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff8881362304bc of 4 bytes by task 32078 on cpu 0:
      tcp_make_synack+0x367/0xb40 net/ipv4/tcp_output.c:3663
      tcp_v6_send_synack+0x72/0x420 net/ipv6/tcp_ipv6.c:544
      tcp_conn_request+0x11a8/0x1560 net/ipv4/tcp_input.c:7059
      tcp_v6_conn_request+0x13f/0x180 net/ipv6/tcp_ipv6.c:1175
      tcp_rcv_state_process+0x156/0x1de0 net/ipv4/tcp_input.c:6494
      tcp_v6_do_rcv+0x98a/0xb70 net/ipv6/tcp_ipv6.c:1509
      tcp_v6_rcv+0x17b8/0x1b20 net/ipv6/tcp_ipv6.c:1735
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      netif_receive_skb_internal net/core/dev.c:5652 [inline]
      netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
      tun_rx_batched+0x3bf/0x400
      tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
      tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
      call_write_iter include/linux/fs.h:1871 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x4ab/0x7d0 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x91d25731 -> 0xe79325cd
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
      
      Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-pf: mcs: Generate hash key using ecb(aes) · e7002b3b
      Subbaraya Sundeep authored
      Hardware-generated encryption and ICV tags are found to
      be wrong when tested with IEEE MACsec test vectors.
      This is because, as per the HRM, the hash key (derived by
      AES-ECB block encryption of an all-zero block with the SAK)
      has to be programmed by software in the
      MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register.
      Hence fix this by generating the hash key in software and
      configuring it in hardware.
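
The derivation described above (encrypt one all-zero block with the SAK in AES-ECB mode) can be illustrated with a small userspace sketch. This is not the driver's code: the commit title indicates the driver uses the kernel's ecb(aes) cipher, whereas the sketch below carries its own compact AES so that it is self-contained. It assumes a 128-bit SAK, and the names gf_mul, sbox, aes128_encrypt_block and macsec_hash_key_128 are hypothetical helpers introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Multiply two elements of GF(2^8) modulo the AES polynomial. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;

    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* AES S-box computed from its definition: field inverse, then affine map.
 * Slow but table-free; fine for a one-off key derivation sketch. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;

    for (int b = 1; b < 256 && x; b++) {
        if (gf_mul(x, (uint8_t)b) == 1) {
            inv = (uint8_t)b;
            break;
        }
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* Encrypt a single 16-byte block with AES-128 (i.e. one ECB block). */
static void aes128_encrypt_block(const uint8_t key[16],
                                 const uint8_t in[16], uint8_t out[16])
{
    uint8_t rk[176], s[16], t[16], rcon = 1;

    /* Key expansion: 11 round keys of 16 bytes each. */
    memcpy(rk, key, 16);
    for (int i = 16; i < 176; i += 4) {
        uint8_t w[4] = { rk[i - 4], rk[i - 3], rk[i - 2], rk[i - 1] };

        if (i % 16 == 0) {
            uint8_t w0 = w[0];

            w[0] = (uint8_t)(sbox(w[1]) ^ rcon);
            w[1] = sbox(w[2]);
            w[2] = sbox(w[3]);
            w[3] = sbox(w0);
            rcon = gf_mul(rcon, 2);
        }
        for (int j = 0; j < 4; j++)
            rk[i + j] = (uint8_t)(rk[i - 16 + j] ^ w[j]);
    }

    for (int i = 0; i < 16; i++)
        s[i] = (uint8_t)(in[i] ^ rk[i]);

    for (int round = 1; round <= 10; round++) {
        /* SubBytes + ShiftRows; byte (row r, column c) is at r + 4*c. */
        for (int c = 0; c < 4; c++)
            for (int r = 0; r < 4; r++)
                t[r + 4 * c] = sbox(s[r + 4 * ((c + r) % 4)]);
        /* MixColumns, skipped in the final round. */
        for (int c = 0; c < 4 && round < 10; c++) {
            uint8_t *a = t + 4 * c, b[4];

            for (int r = 0; r < 4; r++)
                b[r] = (uint8_t)(gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3) ^
                                 a[(r + 2) % 4] ^ a[(r + 3) % 4]);
            memcpy(a, b, 4);
        }
        for (int i = 0; i < 16; i++)
            s[i] = (uint8_t)(t[i] ^ rk[16 * round + i]);
    }
    memcpy(out, s, 16);
}

/* Hash key = AES-ECB encryption of an all-zero block under the SAK
 * (128-bit SAK assumed in this sketch). */
static void macsec_hash_key_128(const uint8_t sak[16], uint8_t hash_key[16])
{
    static const uint8_t zero_block[16] = { 0 };

    aes128_encrypt_block(sak, zero_block, hash_key);
}
```

A caller would pass the SAK and receive the 16-byte hash key, which would then be written to the policy register named above. How the bytes are laid out in that register is hardware-specific and is not modelled here.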
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e7002b3b
    • igc: Prevent garbled TX queue with XDP ZEROCOPY · 78adb4bc
      Florian Kauer authored
      In normal operation, each populated queue item has
      next_to_watch pointing to the last TX desc of the packet,
      while each cleaned item has it set to 0. In particular,
      next_to_use that points to the next (necessarily clean)
      item to use has next_to_watch set to 0.
      
      When the TX queue is used both by an application using
      AF_XDP with ZEROCOPY as well as a second non-XDP application
      generating high traffic, the queue pointers can get in
      an invalid state where next_to_use points to an item
      where next_to_watch is NOT set to 0.
      
      However, the implementation assumes in several places
      that this never happens, so when it does,
      bad things happen. First, within the loop inside
      igc_clean_tx_irq(), next_to_clean can overtake next_to_use,
      which eventually prevents any further transmission via
      this queue: it never gets unblocked or signaled.
      Second, if the queue is in this garbled state,
      the inner loop of igc_clean_tx_ring() will never terminate,
      completely hogging a CPU core.
      
      The reason is that igc_xdp_xmit_zc() reads next_to_use
      before acquiring the lock and writes it back
      (potentially unmodified) later. If another transmit path
      modified it in between, the outdated next_to_use is written
      back, pointing to an item that was already used elsewhere
      (and thus has next_to_watch set).
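
The stale-index problem described above can be reproduced with a deliberately simplified, single-threaded model. It is only a sketch: struct tx_ring, other_path_xmit(), zc_xmit_racy() and zc_xmit_locked() are hypothetical names, next_to_watch is modelled as a 0/nonzero flag rather than a descriptor pointer, and the interleave parameter forces the second sender to run in the window between the read and the lock. Reading the index only after the lock is held is assumed here as the remedy, since the text above identifies the early read as the cause.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8

/* Simplified model of a TX ring shared by two transmit paths. */
struct tx_ring {
    pthread_mutex_t lock;
    uint16_t next_to_use;               /* next (necessarily clean) slot */
    uint16_t next_to_watch[RING_SIZE];  /* 0 = clean, nonzero = populated */
};

static void ring_init(struct tx_ring *r)
{
    pthread_mutex_init(&r->lock, NULL);
    r->next_to_use = 0;
    for (int i = 0; i < RING_SIZE; i++)
        r->next_to_watch[i] = 0;
}

/* Invariant the cleanup code relies on: the slot at next_to_use is clean. */
static bool ring_is_consistent(const struct tx_ring *r)
{
    return r->next_to_watch[r->next_to_use] == 0;
}

/* The second, non-XDP sender: populates one slot under the lock. */
static void other_path_xmit(struct tx_ring *r)
{
    pthread_mutex_lock(&r->lock);
    r->next_to_watch[r->next_to_use] = 1;
    r->next_to_use = (uint16_t)((r->next_to_use + 1) % RING_SIZE);
    pthread_mutex_unlock(&r->lock);
}

/* Racy pattern: the index is sampled before the lock is taken. */
static void zc_xmit_racy(struct tx_ring *r, int budget, bool interleave)
{
    uint16_t ntu = r->next_to_use;      /* stale once the other path runs */

    if (interleave)
        other_path_xmit(r);             /* runs in the unprotected window */

    pthread_mutex_lock(&r->lock);
    while (budget-- > 0) {
        r->next_to_watch[ntu] = 1;
        ntu = (uint16_t)((ntu + 1) % RING_SIZE);
    }
    r->next_to_use = ntu;               /* written back even if unmodified */
    pthread_mutex_unlock(&r->lock);
}

/* Assumed remedy: sample the index only while holding the lock. */
static void zc_xmit_locked(struct tx_ring *r, int budget, bool interleave)
{
    uint16_t ntu;

    if (interleave)
        other_path_xmit(r);             /* same interleaving as above */

    pthread_mutex_lock(&r->lock);
    ntu = r->next_to_use;               /* always current under the lock */
    while (budget-- > 0) {
        r->next_to_watch[ntu] = 1;
        ntu = (uint16_t)((ntu + 1) % RING_SIZE);
    }
    r->next_to_use = ntu;
    pthread_mutex_unlock(&r->lock);
}
```

Calling zc_xmit_racy() with a budget of 0 and the interleaving enabled leaves next_to_use pointing at a slot the other sender has already populated, which is the garbled state described above. The same call sequence through zc_xmit_locked() keeps the invariant checked by ring_is_consistent().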
      
      Fixes: 9acf59a7 ("igc: Enable TX via AF_XDP zero-copy")
      Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      78adb4bc
    • Merge tag 'linux-can-fixes-for-6.5-20230717' of... · 936fd2c5
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2023-07-17
      
      The 1st patch is by Ziyang Xuan and fixes a possible memory leak in
      the receiver handling in the CAN RAW protocol.
      
      YueHaibing contributes a fix for a use after free in bcm_proc_show()
      of the Broadcast Manager (BCM) CAN protocol.
      
      The next 2 patches are by me and fix a possible null pointer
      dereference in the RX path of the gs_usb driver with activated
      hardware timestamps and the candlelight firmware.
      
      The last patch is by Fedor Ross, Marek Vasut and me and targets the
      mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode()
      is increased to fix bus joining on busy CAN buses and very low bit
      rate.
      
      * tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout
        can: gs_usb: fix time stamp counter initialization
        can: gs_usb: gs_can_open(): improve error handling
        can: bcm: Fix UAF in bcm_proc_show()
        can: raw: fix receiver memory leak
      ====================
      
      Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      936fd2c5
    • mailmap: Add entry for old intel email · 195e903b
      John Fastabend authored
      Fix old email to avoid bouncing email from net/drivers and older
      netdev work. Anyways my @intel email hasn't been active for years.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20230717173306.38407-1-john.fastabend@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      195e903b
    • mailmap: add entries for past lives · d1998e50
      Shannon Nelson authored
      Update old emails for my current work email.
      Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
      Link: https://lore.kernel.org/r/20230717193242.43670-1-shannon.nelson@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d1998e50
  3. 18 Jul, 2023 12 commits