1. 24 Feb, 2022 6 commits
  2. 23 Feb, 2022 12 commits
  3. 22 Feb, 2022 2 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 5663b854
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      This fixes a use without proper initialization in patch 5/5.
      
      -o-
      
      Hi,
      
      The following patchset contains Netfilter fixes for net:
      
      1) Missing #ifdef CONFIG_IP6_NF_IPTABLES in recent xt_socket fix.
      
      2) Fix incorrect flow action array size in nf_tables.
      
      3) Unregister flowtable hooks from netns exit path.
      
      4) Fix missing limit object release, from Florian Westphal.
      
      5) Memleak in nf_tables object update path, also from Florian.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_tables: fix memory leak during stateful obj update · dad3bdee
      Florian Westphal authored
      Stateful objects can be updated from the control plane.
      The transaction logic allocates a temporary object for this purpose.
      
      The ->init function was called for this object, so a plain kfree() leaks
      resources. We must call the ->destroy function of the object.
      
      nft_obj_destroy() does this, but it also decrements the module refcount,
      which the update path never increments.
      
      To avoid special-casing the release of the update object, do module_get()
      for the update case too and release it via nft_obj_destroy().
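      A minimal userspace sketch of the refcount balance described above, assuming simplified stand-ins: module_refcnt models the module reference, live_resources counts what ->init allocated, and struct obj, obj_alloc_for_update() and obj_destroy() are hypothetical names, not nf_tables code. Taking the reference on the update path lets a single destroy helper release both the ->init resources and the reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counters: the module reference and the extra resources
 * allocated by ->init. Both must return to zero after an update cycle. */
static int module_refcnt;
static int live_resources;

struct obj;

struct obj_ops {
	int  (*init)(struct obj *o);
	void (*destroy)(struct obj *o);
};

struct obj {
	const struct obj_ops *ops;
	void *extra;                 /* allocated by ->init */
};

static int demo_init(struct obj *o)
{
	o->extra = malloc(16);
	if (!o->extra)
		return -1;
	live_resources++;
	return 0;
}

static void demo_destroy(struct obj *o)
{
	free(o->extra);
	live_resources--;
}

static const struct obj_ops demo_ops = { demo_init, demo_destroy };

/* Plays the role of nft_obj_destroy(): run ->destroy, drop the module
 * reference, free the object. A plain free() would skip both steps. */
static void obj_destroy(struct obj *o)
{
	o->ops->destroy(o);
	module_refcnt--;
	free(o);
}

/* Temporary object for the update path. Taking the module reference here
 * (the module_get() of the fix) keeps obj_destroy() balanced. */
static struct obj *obj_alloc_for_update(const struct obj_ops *ops)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->ops = ops;
	module_refcnt++;
	if (ops->init(o) < 0) {
		module_refcnt--;
		free(o);
		return NULL;
	}
	return o;
}

/* One update cycle; returns 1 if nothing leaked and the refcount is back. */
static int update_cycle_is_balanced(void)
{
	struct obj *tmp = obj_alloc_for_update(&demo_ops);

	if (!tmp)
		return 0;
	obj_destroy(tmp);
	return module_refcnt == 0 && live_resources == 0;
}
```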
      
      Fixes: d62d0ba9 ("netfilter: nf_tables: Introduce stateful object update operation")
      Cc: Fernando Fernandez Mancera <ffmancera@riseup.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  4. 21 Feb, 2022 4 commits
    • netfilter: nft_limit: fix stateful object memory leak · 1a58f84e
      Florian Westphal authored
      We need to provide a destroy callback to release the extra fields.
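      A minimal sketch of why the callback is needed, assuming the moved-out stateful fields sit behind a pointer that is allocated at init time; struct limit_state, limit_init() and limit_destroy() are hypothetical names, not the nft_limit source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical token-bucket state, allocated separately from the
 * expression data instead of being embedded in it. */
struct limit_state {
	unsigned long long tokens;
	unsigned long long last;
};

struct limit_priv {
	struct limit_state *state;   /* allocated by limit_init() */
};

static int outstanding_allocs;   /* sketch-only leak counter */

static int limit_init(struct limit_priv *priv)
{
	priv->state = calloc(1, sizeof(*priv->state));
	if (!priv->state)
		return -1;
	outstanding_allocs++;
	return 0;
}

/* The destroy callback: without it, freeing the object that embeds
 * struct limit_priv would leave priv->state allocated. */
static void limit_destroy(struct limit_priv *priv)
{
	free(priv->state);
	priv->state = NULL;
	outstanding_allocs--;
}

/* Create and tear down one object; returns the number of leaked states. */
static int limit_lifecycle_leaks(void)
{
	struct limit_priv priv;
	int before = outstanding_allocs;

	if (limit_init(&priv) < 0)
		return -1;
	limit_destroy(&priv);
	return outstanding_allocs - before;
}
```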
      
      Fixes: 3b9e2ea6 ("netfilter: nft_limit: move stateful fields out of expression data")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: unregister flowtable hooks on netns exit · 6069da44
      Pablo Neira Ayuso authored
      Unregister flowtable hooks before they are released via
      nf_tables_flowtable_destroy(); otherwise the hook core reports a UAF.
      
      BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
      Read of size 4 at addr ffff8880736f7438 by task syz-executor579/3666
      
      CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       __dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106
       dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 lib/dump_stack.c:106
       print_address_description+0x65/0x380 mm/kasan/report.c:247 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       __kasan_report mm/kasan/report.c:433 [inline] mm/kasan/report.c:450
       kasan_report+0x19a/0x1f0 mm/kasan/report.c:450 mm/kasan/report.c:450
       nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
       __nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429 net/netfilter/core.c:429
       nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571 net/netfilter/core.c:571
       nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232 net/netfilter/nf_tables_api.c:7232
       nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430 net/netfilter/nf_tables_api.c:7430
       nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
       nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] net/netfilter/nfnetlink.c:652
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] net/netfilter/nfnetlink.c:652
       nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652 net/netfilter/nfnetlink.c:652
      
      __nft_release_hook() calls nft_unregister_flowtable_net_hooks(), which
      only unregisters the hooks. After an RCU grace period, no packets can
      add new entries to the flowtable (neither flow offload rules nor
      flowtable hooks are reachable from the packet path), so it is then safe
      to call nf_flow_table_free(), which cleans up the remaining entries from
      the flowtable (both software and hardware) and unbinds the flow_block.
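      The ordering constraint above can be modelled as a tiny state machine. This is a hypothetical userspace sketch: struct flowtable_model and its helpers are not nf_tables APIs, and wait_grace_period() only stands in for an RCU grace period such as synchronize_rcu().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the teardown order; not nf_tables APIs. */
struct flowtable_model {
	bool hooks_registered;      /* reachable from the packet path */
	bool grace_period_elapsed;  /* no reader can still see the hooks */
	bool freed;
};

static void unregister_hooks(struct flowtable_model *ft)
{
	ft->hooks_registered = false;
	ft->grace_period_elapsed = false;  /* in-flight readers may remain */
}

/* Stand-in for an RCU grace period: it only guarantees quiescence for
 * hooks that were already unregistered when it started. */
static void wait_grace_period(struct flowtable_model *ft)
{
	if (!ft->hooks_registered)
		ft->grace_period_elapsed = true;
}

/* Freeing is allowed only after both steps; otherwise the real code
 * risks the kind of use-after-free shown in the KASAN report. */
static bool free_flowtable(struct flowtable_model *ft)
{
	if (ft->hooks_registered || !ft->grace_period_elapsed)
		return false;
	ft->freed = true;
	return true;
}

static bool teardown_in_order(void)
{
	struct flowtable_model ft = { .hooks_registered = true };

	unregister_hooks(&ft);
	wait_grace_period(&ft);
	return free_flowtable(&ft);
}

static bool teardown_skipping_unregister(void)
{
	struct flowtable_model ft = { .hooks_registered = true };

	wait_grace_period(&ft);
	return free_flowtable(&ft);
}
```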
      
      Fixes: ff4bf2f4 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
      Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: mdio-ipq4019: add delay after clock enable · b6ad6261
      Baruch Siach authored
      Experimentation shows that PHY detection might fail when the code
      attempts an MDIO bus read immediately after enabling the clock. Add a
      delay to let the clock stabilize before bus access.
      
      PHY detection failures started to show up after commit 7590fc6f ("net:
      mdio: Demote probed message to debug print"), which removed a
      coincidental delay between clock enable and bus access.
      
      The 10 ms delay is meant to match the time it takes to send the probed
      message over UART at 115200 bps. This might be a large overshoot.
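      As a rough check of the 10 ms figure, assuming 8N1 framing (10 bits on the wire per character, which the commit does not state), 115200 bps carries about 11520 characters per second, so 10 ms covers roughly 115 characters. The helpers below are a hypothetical sketch of that arithmetic:

```c
#include <assert.h>

/* Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop = 10 bits per
 * character on the wire. */
#define UART_BITS_PER_CHAR 10UL

/* Microseconds needed to send n_chars at the given baud rate (truncated). */
static unsigned long uart_tx_time_us(unsigned long n_chars, unsigned long baud)
{
	return n_chars * UART_BITS_PER_CHAR * 1000000UL / baud;
}

/* Whole characters that fit into a delay of `us` microseconds. */
static unsigned long uart_chars_in_us(unsigned long us, unsigned long baud)
{
	return us * baud / (UART_BITS_PER_CHAR * 1000000UL);
}
```

      For example, uart_chars_in_us(10000, 115200) evaluates to 115, and uart_tx_time_us(115, 115200) to 9982, i.e. just under 10 ms.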
      
      Fixes: 23a890d4 ("net: mdio: Add the reset function for IPQ MDIO driver")
      Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gso: do not skip outer ip header in case of ipip and net_failover · cc20cced
      Tao Liu authored
      We encountered a TCP drop issue in our cloud environment. Packets GROed
      in the host are forwarded to a VM's virtio_net NIC with net_failover
      enabled. The VM acts as an IPVS LB with ipip encapsulation. The full
      path is:
      host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
       -> ipip encap -> net_failover tx -> virtio_net tx
      
      When net_failover transmits an ipip packet (gso_type = 0x0103, which
      means SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), no GSO is done
      because the device supports TSO and GSO_IPXIP4. However, network_header
      is left pointing to the inner IP header.
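      The gso_type value can be decoded bit by bit. This sketch copies three of the SKB_GSO_* values from the enum in include/linux/skbuff.h as found in kernels of this era (check your own tree before relying on them); gso_has() is a hypothetical helper.

```c
#include <assert.h>

/* Three SKB_GSO_* values from include/linux/skbuff.h (kernels of this
 * era), copied here so the example stands alone. */
#define SKB_GSO_TCPV4   (1u << 0)
#define SKB_GSO_DODGY   (1u << 1)
#define SKB_GSO_IPXIP4  (1u << 8)

/* Hypothetical helper: is `flag` set in `gso_type`? */
static int gso_has(unsigned int gso_type, unsigned int flag)
{
	return (gso_type & flag) != 0;
}
```

      0x0103 is exactly 0x100 | 0x2 | 0x1, i.e. IPXIP4 | DODGY | TCPV4.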
      
      Call Trace:
       tcp4_gso_segment        ------> return NULL
       inet_gso_segment        ------> inner iph, network_header points to
       ipip_gso_segment
       inet_gso_segment        ------> outer iph
       skb_mac_gso_segment
      
      Afterwards, when virtio_net transmits the packet, only the inner IP
      header is modified and the outer one is left unchanged. The packet is
      then dropped by the remote host.
      
      Call Trace:
       inet_gso_segment        ------> inner iph, outer iph is skipped
       skb_mac_gso_segment
       __skb_gso_segment
       validate_xmit_skb
       validate_xmit_skb_list
       sch_direct_xmit
       __qdisc_run
       __dev_queue_xmit        ------> virtio_net
       dev_hard_start_xmit
       __dev_queue_xmit        ------> net_failover
       ip_finish_output2
       ip_output
       iptunnel_xmit
       ip_tunnel_xmit
       ipip_tunnel_xmit        ------> ipip
       dev_hard_start_xmit
       __dev_queue_xmit
       ip_finish_output2
       ip_output
       ip_forward
       ip_rcv
       __netif_receive_skb_one_core
       netif_receive_skb_internal
       napi_gro_receive
       receive_buf
       virtnet_poll
       net_rx_action
      
      The root cause of this issue is specific to the rare combination of
      SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
      SKB_GSO_DODGY is set from the external virtio_net. We need to reset the
      network header when callbacks.gso_segment() returns NULL.
      
      This patch also updates ipv6_gso_segment(), to cover SIT, etc.
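      A heavily simplified, hypothetical model of the fix: struct skb_model keeps only two offsets, and ip_gso_layer() plays the role of inet_gso_segment() for each nested IP header (inner_ip_layer() stands in for ipip_gso_segment() calling it again). It is not the kernel patch; it only shows how restoring the saved offset on a NULL return leaves the outer header visible to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical skb reduced to two offsets; not a kernel type. */
struct skb_model {
	int data;             /* offset of the header being processed */
	int network_header;   /* offset the stack treats as the IP header */
};

typedef struct skb_model *(*gso_cb)(struct skb_model *skb);

/* Like inet_gso_segment(): point network_header at this IP header, step
 * past it, and hand off to the next layer. If that layer returns NULL
 * (no software segmentation), point network_header back at this header. */
static struct skb_model *ip_gso_layer(struct skb_model *skb, int ip_hdr_len,
				      gso_cb next)
{
	int nhoff = skb->data;
	struct skb_model *segs;

	skb->network_header = nhoff;    /* like skb_reset_network_header() */
	skb->data += ip_hdr_len;        /* like pulling past this header */
	segs = next(skb);
	skb->data = nhoff;              /* simplification: undo the pull */
	if (!segs)
		skb->network_header = nhoff;   /* the fix */
	return segs;
}

static struct skb_model *tcp_no_segmentation(struct skb_model *skb)
{
	(void)skb;
	return NULL;   /* device offloads TSO + IPXIP4 */
}

static struct skb_model *inner_ip_layer(struct skb_model *skb)
{
	return ip_gso_layer(skb, 20, tcp_no_segmentation);
}

/* ipip packet: 14-byte Ethernet header, then two 20-byte IPv4 headers. */
static int network_header_after_ipip_gso(void)
{
	struct skb_model skb = { .data = 14, .network_header = 14 };

	if (ip_gso_layer(&skb, 20, inner_ip_layer) != NULL)
		return -1;
	return skb.network_header;
}
```

      With the restore in place the offset ends at 14 (the outer header); without it, the inner layer's assignment would leave it at 34, the inner header, which mirrors the bug described above.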
      
      Fixes: cb32f511 ("ipip: add GSO/TSO support")
      Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 20 Feb, 2022 9 commits
  6. 19 Feb, 2022 7 commits
    • net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held · 8940e6b6
      Vladimir Oltean authored
      If the DSA master doesn't support IFF_UNICAST_FLT, then the following
      call path is possible:
      
      dsa_slave_switchdev_event_work
      -> dsa_port_host_fdb_add
         -> dev_uc_add
            -> __dev_set_rx_mode
               -> __dev_set_promiscuity
      
      Since the blamed commit, dsa_slave_switchdev_event_work() no longer
      holds rtnl_lock(), which triggers the ASSERT_RTNL() from
      __dev_set_promiscuity().
      
      Taking rtnl_lock() around dev_uc_add() is impossible, because all the
      code paths that call dsa_flush_workqueue() do so from contexts where the
      rtnl_mutex is already held - so this would lead to an instant deadlock.
      
      dev_uc_add() in itself doesn't require the rtnl_mutex for protection.
      There is this comment in __dev_set_rx_mode() which assumes so:
      
      		/* Unicast addresses changes may only happen under the rtnl,
      		 * therefore calling __dev_set_promiscuity here is safe.
      		 */
      
      but it is from commit 4417da66 ("[NET]: dev: secondary unicast
      address support") dated June 2007, and in the meantime, commit
      f1f28aa3 ("netdev: Add addr_list_lock to struct net_device."), dated
      July 2008, has added &dev->addr_list_lock to protect this instead of the
      global rtnl_mutex.
      
      Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection,
      but it is the uncommon path of what we typically expect dev_uc_add()
      to do. So since only the uncommon path requires rtnl_lock(), just check
      ahead of time whether dev_uc_add() would result in a call to
      __dev_set_promiscuity(), and handle that condition separately.
      
      DSA already configures the master interface to be promiscuous if the
      tagger requires this. We can extend this to also cover the case where
      the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT),
      on the premise that we'd end up making it promiscuous during operation
      anyway, whether because a DSA slave has a non-inherited MAC address, or
      because the bridge notifies local FDB entries for its own MAC address,
      the address of a station learned on a foreign port, etc.
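      A hypothetical sketch of the "check ahead of time" idea: struct master_model, MODEL_IFF_UNICAST_FLT and master_setup() are illustrative names, not DSA code, and the flag value is not the kernel's. The decision is made once at setup, assumed here to run under rtnl_mutex, rather than from the unlocked work item.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value only; in the kernel IFF_UNICAST_FLT is a bit
 * in net_device::priv_flags with a different value. */
#define MODEL_IFF_UNICAST_FLT  (1u << 0)

struct master_model {
	unsigned int priv_flags;
	bool tagger_needs_promisc;
	int promisc;               /* promiscuity counter */
};

/* Without unicast filtering, adding a secondary unicast address makes
 * the rx-mode code fall back to promiscuous mode, which needs rtnl. */
static bool uc_add_would_need_promisc(const struct master_model *m)
{
	return !(m->priv_flags & MODEL_IFF_UNICAST_FLT);
}

/* Go promiscuous up front if either the tagger or the lack of unicast
 * filtering would require it later anyway. */
static void master_setup(struct master_model *m)
{
	if (m->tagger_needs_promisc || uc_add_would_need_promisc(m))
		m->promisc++;
}

static int promisc_after_setup(unsigned int priv_flags, bool tagger_needs_promisc)
{
	struct master_model m = { priv_flags, tagger_needs_promisc, 0 };

	master_setup(&m);
	return m.promisc;
}
```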
      
      Fixes: 0faf890f ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
      Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: fix bridging with more than two member ports · 3d00827a
      Svenning Sørensen authored
      Commit b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      plugged a packet leak between ports that were members of different bridges.
      Unfortunately, this broke another use case, namely that of more than two
      ports that are members of the same bridge.
      
      After that commit, when a port is added to a bridge, hardware bridging
      between other member ports of that bridge will be cleared, preventing
      packet exchange between them.
      
      Fix by ensuring that the Port VLAN Membership bitmap includes any existing
      ports in the bridge, not just the port being added.
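      A hypothetical sketch of the membership computation: each port has a bitmap of the ports it may forward to, and joining a bridge rewrites the bitmap of every port already in that bridge rather than only the joining port. NUM_PORTS, CPU_PORT and the arrays are illustrative, not the ksz driver's data structures.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 5
#define CPU_PORT  4   /* hypothetical CPU port index */

/* bridge_of[p]: bridge id of port p, or -1 if standalone.
 * member[p]:    bitmap of ports that port p may forward to. */
static int bridge_of[NUM_PORTS];
static uint8_t member[NUM_PORTS];

static void ports_reset(void)
{
	for (int p = 0; p < NUM_PORTS; p++) {
		bridge_of[p] = -1;
		member[p] = 1u << CPU_PORT;
	}
}

/* Build the mask from every current member, then apply it to all of
 * them, so earlier members keep reaching each other and the new port. */
static void bridge_join(int port, int br)
{
	uint8_t mask = 1u << CPU_PORT;

	bridge_of[port] = br;
	for (int p = 0; p < NUM_PORTS; p++)
		if (p != CPU_PORT && bridge_of[p] == br)
			mask |= 1u << p;

	for (int p = 0; p < NUM_PORTS; p++)
		if (p != CPU_PORT && bridge_of[p] == br)
			member[p] = mask;
}

static int three_port_bridge_ok(void)
{
	ports_reset();
	bridge_join(0, 1);
	bridge_join(1, 1);
	bridge_join(2, 1);

	/* port 0 joined first but must still reach ports 1 and 2,
	 * while standalone port 3 stays isolated from it */
	return (member[0] & (1u << 1)) && (member[0] & (1u << 2)) &&
	       (member[2] & (1u << 0)) && !(member[3] & (1u << 0));
}
```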
      
      Fixes: b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      Signed-off-by: Svenning Sørensen <sss@secomea.com>
      Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Force inlining of checksum functions in net/checksum.h · 5486f5bf
      Christophe Leroy authored
      All functions defined as static inline in net/checksum.h are
      meant to be inlined for performance reasons.
      
      But since commit ac7c3e4f ("compiler: enable
      CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
      uninline functions when it wants.
      
      Fair enough in the general case, but for tiny performance-critical
      checksum helpers that's counter-productive.
      
      The problem mainly arises when selecting CONFIG_CC_OPTIMIZE_FOR_SIZE.
      Because those helpers are 'static inline' in header files, you suddenly
      find them duplicated many times in the resulting vmlinux.
      
      Here is a typical example when building powerpc pmac32_defconfig
      with CONFIG_CC_OPTIMIZE_FOR_SIZE. csum_sub() appears 4 times:
      
      	c04a23cc <csum_sub>:
      	c04a23cc:	7c 84 20 f8 	not     r4,r4
      	c04a23d0:	7c 63 20 14 	addc    r3,r3,r4
      	c04a23d4:	7c 63 01 94 	addze   r3,r3
      	c04a23d8:	4e 80 00 20 	blr
      		...
      	c04a2ce8:	4b ff f6 e5 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d2c:	4b ff f6 a1 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d54:	4b ff f6 79 	bl      c04a23cc <csum_sub>
      		...
      	c04a754c <csum_sub>:
      	c04a754c:	7c 84 20 f8 	not     r4,r4
      	c04a7550:	7c 63 20 14 	addc    r3,r3,r4
      	c04a7554:	7c 63 01 94 	addze   r3,r3
      	c04a7558:	4e 80 00 20 	blr
      		...
      	c04ac930:	4b ff ac 1d 	bl      c04a754c <csum_sub>
      		...
      	c04ad264:	4b ff a2 e9 	bl      c04a754c <csum_sub>
      		...
      	c04e3b08 <csum_sub>:
      	c04e3b08:	7c 84 20 f8 	not     r4,r4
      	c04e3b0c:	7c 63 20 14 	addc    r3,r3,r4
      	c04e3b10:	7c 63 01 94 	addze   r3,r3
      	c04e3b14:	4e 80 00 20 	blr
      		...
      	c04e5788:	4b ff e3 81 	bl      c04e3b08 <csum_sub>
      		...
      	c04e65c8:	4b ff d5 41 	bl      c04e3b08 <csum_sub>
      		...
      	c0512d34 <csum_sub>:
      	c0512d34:	7c 84 20 f8 	not     r4,r4
      	c0512d38:	7c 63 20 14 	addc    r3,r3,r4
      	c0512d3c:	7c 63 01 94 	addze   r3,r3
      	c0512d40:	4e 80 00 20 	blr
      		...
      	c0512dfc:	4b ff ff 39 	bl      c0512d34 <csum_sub>
      		...
      	c05138bc:	4b ff f4 79 	bl      c0512d34 <csum_sub>
      		...
      
      Restore the expected behaviour by using __always_inline for all
      functions defined in net/checksum.h.
      
      vmlinux size is even reduced by 256 bytes with this patch:
      
      	   text	   data	    bss	    dec	    hex	filename
      	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
      	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now
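      A userspace sketch modelled on the generic csum_add()/csum_sub() helpers: plain uint32_t replaces the kernel's __wsum, and __always_inline is defined locally (mirroring the kernel's spelling) only if the C library has not already defined it. It expresses the not/addc/addze sequence from the disassembly as an end-around-carry addition of the complement.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback definition; glibc's <sys/cdefs.h> usually provides one. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* 32-bit ones'-complement add: add, then fold the carry back in
 * (the addc/addze pair on powerpc). */
static __always_inline uint32_t csum_add(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Subtract by adding the complement (the `not` instruction). */
static __always_inline uint32_t csum_sub(uint32_t csum, uint32_t addend)
{
	return csum_add(csum, ~addend);
}
```

      For instance, csum_add(0x12345678, 0x9abcdef0) yields 0xacf13568, and subtracting 0x9abcdef0 again with csum_sub() gives back 0x12345678.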
      
      Fixes: ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 0033fced
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-02-18
      
      This series contains updates to ice driver only.
      
      Wojciech fixes protocol matching for slow-path switchdev so that all
      packets are correctly redirected.
      
      Michal removes an accidental unconditional setting of the L4 port
      filtering flag.
      
      Jake adds locking to protect VF reset and removal to fix various issues
      that can be encountered when they race with each other.
      
      Tom Rix propagates an error and initializes a struct to resolve reported
      Clang issues.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-fixes' · 90141edc
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fix address advertisement races and stabilize tests
      
      Patches 1, 2, and 7 modify two self tests to give consistent, accurate
      results by fixing timing issues and accounting for syncookie behavior.
      
      Patches 3-6 fix two races in overlapping address advertisement send and
      receive. Associated self tests are updated, including addition of two
      MIBs to enable testing and tracking dropped address events.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: be more conservative with cookie MPJ limits · e35f885b
      Paolo Abeni authored
      Since commit 2843ff6f ("mptcp: remote addresses fullmesh"), an
      MPTCP client can attempt to create multiple MPJ subflows simultaneously.
      
      In such a scenario, when syncookies are enabled, the server could end up
      accepting incoming MPJ SYNs even above the configured subflow limit, as
      such a limit can be enforced reliably only after the subflow is
      created; with syncookies, only after the 3rd ACK is received.
      
      As a consequence, the related self-test case sporadically fails, as it
      verifies that the server always accepts the expected number of MPJ SYNs.
      
      Address the issue by relaxing the constraint on the number of MPJ SYNs.
      Note that the check on the number of accepted MPJ 3rd ACKs remains
      intact.
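      A hypothetical sketch of such a relaxed check, written in C for consistency with the rest of this log even though the MPTCP self-tests are shell scripts: the SYN count is treated as a lower bound while the 3rd-ACK count must still match exactly. The function name and the choice of >= are assumptions, not the selftest code.

```c
#include <assert.h>
#include <stdbool.h>

/* With syncookies the server may accept extra MPJ SYNs, because the
 * subflow limit is enforced only once the 3rd ACK arrives. */
static bool mpj_counts_ok(unsigned int syn_rx, unsigned int ack_rx,
			  unsigned int expected)
{
	bool syn_ok = syn_rx >= expected;   /* relaxed: extra SYNs tolerated */
	bool ack_ok = ack_rx == expected;   /* unchanged: exact match */

	return syn_ok && ack_ok;
}
```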
      
      Fixes: 2843ff6f ("mptcp: remote addresses fullmesh")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mptcp: more robust signal race test · 6ef84b15
      Paolo Abeni authored
      The in-kernel MPTCP PM implementation can process only a single
      incoming add address option at any given time. In the mentioned test,
      the server can exceed that limit. Let the setup cope with that by
      allowing a faster add_addr retransmission.
      
      Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors")
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/254
      Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>