1. 28 Feb, 2022 1 commit
    • Eric Dumazet's avatar
      netfilter: fix use-after-free in __nf_register_net_hook() · 56763f12
      Eric Dumazet authored
      We must not dereference @new_hooks after nf_hook_mutex has been released,
      because other threads might have freed our allocated hooks already.
      
      BUG: KASAN: use-after-free in nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
      BUG: KASAN: use-after-free in hooks_validate net/netfilter/core.c:171 [inline]
      BUG: KASAN: use-after-free in __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
      Read of size 2 at addr ffff88801c1a8000 by task syz-executor237/4430
      
      CPU: 1 PID: 4430 Comm: syz-executor237 Not tainted 5.17.0-rc5-syzkaller-00306-g2293be58 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
       nf_hook_entries_get_hook_ops include/linux/netfilter.h:130 [inline]
       hooks_validate net/netfilter/core.c:171 [inline]
       __nf_register_net_hook+0x77a/0x820 net/netfilter/core.c:438
       nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571
       nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587
       nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218
       synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81
       xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038
       check_target net/ipv6/netfilter/ip6_tables.c:530 [inline]
       find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573
       translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735
       do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline]
       do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
       nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
       ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1024
       rawv6_setsockopt+0xd3/0x6a0 net/ipv6/raw.c:1084
       __sys_setsockopt+0x2db/0x610 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f65a1ace7d9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 71 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f65a1a7f308 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f65a1ace7d9
      RDX: 0000000000000040 RSI: 0000000000000029 RDI: 0000000000000003
      RBP: 00007f65a1b574c8 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000020000000 R11: 0000000000000246 R12: 00007f65a1b55130
      R13: 00007f65a1b574c0 R14: 00007f65a1b24090 R15: 0000000000022000
       </TASK>
      
      The buggy address belongs to the page:
      page:ffffea0000706a00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1c1a8
      flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000000 ffffea0001c1b108 ffffea000046dd08 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as freed
      page last allocated via order 2, migratetype Unmovable, gfp_mask 0x52dc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO), pid 4430, ts 1061781545818, free_ts 1061791488993
       prep_new_page mm/page_alloc.c:2434 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389
       __alloc_pages_node include/linux/gfp.h:572 [inline]
       alloc_pages_node include/linux/gfp.h:595 [inline]
       kmalloc_large_node+0x62/0x130 mm/slub.c:4438
       __kmalloc_node+0x35a/0x4a0 mm/slub.c:4454
       kmalloc_node include/linux/slab.h:604 [inline]
       kvmalloc_node+0x97/0x100 mm/util.c:580
       kvmalloc include/linux/slab.h:731 [inline]
       kvzalloc include/linux/slab.h:739 [inline]
       allocate_hook_entries_size net/netfilter/core.c:61 [inline]
       nf_hook_entries_grow+0x140/0x780 net/netfilter/core.c:128
       __nf_register_net_hook+0x144/0x820 net/netfilter/core.c:429
       nf_register_net_hook+0x114/0x170 net/netfilter/core.c:571
       nf_register_net_hooks+0x59/0xc0 net/netfilter/core.c:587
       nf_synproxy_ipv6_init+0x85/0xe0 net/netfilter/nf_synproxy_core.c:1218
       synproxy_tg6_check+0x30d/0x560 net/ipv6/netfilter/ip6t_SYNPROXY.c:81
       xt_check_target+0x26c/0x9e0 net/netfilter/x_tables.c:1038
       check_target net/ipv6/netfilter/ip6_tables.c:530 [inline]
       find_check_entry.constprop.0+0x7f1/0x9e0 net/ipv6/netfilter/ip6_tables.c:573
       translate_table+0xc8b/0x1750 net/ipv6/netfilter/ip6_tables.c:735
       do_replace net/ipv6/netfilter/ip6_tables.c:1153 [inline]
       do_ip6t_set_ctl+0x56e/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
       nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1352 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404
       free_unref_page_prepare mm/page_alloc.c:3325 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3404
       kvfree+0x42/0x50 mm/util.c:613
       rcu_do_batch kernel/rcu/tree.c:2527 [inline]
       rcu_core+0x7b1/0x1820 kernel/rcu/tree.c:2778
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Memory state around the buggy address:
       ffff88801c1a7f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88801c1a7f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff88801c1a8000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                         ^
       ffff88801c1a8080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88801c1a8100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      Fixes: 2420b79f ("netfilter: debug: check for sorted array")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      56763f12
  2. 23 Feb, 2022 4 commits
  3. 22 Feb, 2022 2 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 5663b854
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      This is fixing up the use without proper initialization in patch 5/5
      
      -o-
      
      Hi,
      
      The following patchset contains Netfilter fixes for net:
      
      1) Missing #ifdef CONFIG_IP6_NF_IPTABLES in recent xt_socket fix.
      
      2) Fix incorrect flow action array size in nf_tables.
      
      3) Unregister flowtable hooks from netns exit path.
      
      4) Fix missing limit object release, from Florian Westphal.
      
      5) Memleak in nf_tables object update path, also from Florian.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5663b854
    • Florian Westphal's avatar
      netfilter: nf_tables: fix memory leak during stateful obj update · dad3bdee
      Florian Westphal authored
      stateful objects can be updated from the control plane.
      The transaction logic allocates a temporary object for this purpose.
      
      The ->init function was called for this object, so plain kfree() leaks
      resources. We must call ->destroy function of the object.
      
      nft_obj_destroy does this, but it also decrements the module refcount,
      but the update path doesn't increment it.
      
      To avoid special-casing the update object release, do module_get for
      the update case too and release it via nft_obj_destroy().
      
      Fixes: d62d0ba9 ("netfilter: nf_tables: Introduce stateful object update operation")
      Cc: Fernando Fernandez Mancera <ffmancera@riseup.net>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      dad3bdee
  4. 21 Feb, 2022 4 commits
    • Florian Westphal's avatar
      netfilter: nft_limit: fix stateful object memory leak · 1a58f84e
      Florian Westphal authored
      We need to provide a destroy callback to release the extra fields.
      
      Fixes: 3b9e2ea6 ("netfilter: nft_limit: move stateful fields out of expression data")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1a58f84e
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: unregister flowtable hooks on netns exit · 6069da44
      Pablo Neira Ayuso authored
      Unregister flowtable hooks before they are releases via
      nf_tables_flowtable_destroy() otherwise hook core reports UAF.
      
      BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
      Read of size 4 at addr ffff8880736f7438 by task syz-executor579/3666
      
      CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       __dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106
       dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 lib/dump_stack.c:106
       print_address_description+0x65/0x380 mm/kasan/report.c:247 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       __kasan_report mm/kasan/report.c:433 [inline] mm/kasan/report.c:450
       kasan_report+0x19a/0x1f0 mm/kasan/report.c:450 mm/kasan/report.c:450
       nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
       __nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429 net/netfilter/core.c:429
       nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571 net/netfilter/core.c:571
       nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232 net/netfilter/nf_tables_api.c:7232
       nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430 net/netfilter/nf_tables_api.c:7430
       nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
       nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] net/netfilter/nfnetlink.c:652
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] net/netfilter/nfnetlink.c:652
       nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652 net/netfilter/nfnetlink.c:652
      
      __nft_release_hook() calls nft_unregister_flowtable_net_hooks() which
      only unregisters the hooks, then after RCU grace period, it is
      guaranteed that no packets add new entries to the flowtable (no flow
      offload rules and flowtable hooks are reachable from packet path), so it
      is safe to call nf_flow_table_free() which cleans up the remaining
      entries from the flowtable (both software and hardware) and it unbinds
      the flow_block.
      
      Fixes: ff4bf2f4 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
      Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      6069da44
    • Baruch Siach's avatar
      net: mdio-ipq4019: add delay after clock enable · b6ad6261
      Baruch Siach authored
      Experimentation shows that PHY detect might fail when the code attempts
      MDIO bus read immediately after clock enable. Add delay to stabilize the
      clock before bus access.
      
      PHY detect failure started to show after commit 7590fc6f ("net:
      mdio: Demote probed message to debug print") that removed coincidental
      delay between clock enable and bus access.
      
      10ms is meant to match the time it take to send the probed message over
      UART at 115200 bps. This might be a far overshoot.
      
      Fixes: 23a890d4 ("net: mdio: Add the reset function for IPQ MDIO driver")
      Signed-off-by: default avatarBaruch Siach <baruch.siach@siklu.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6ad6261
    • Tao Liu's avatar
      gso: do not skip outer ip header in case of ipip and net_failover · cc20cced
      Tao Liu authored
      We encounter a tcp drop issue in our cloud environment. Packet GROed in
      host forwards to a VM virtio_net nic with net_failover enabled. VM acts
      as a IPVS LB with ipip encapsulation. The full path like:
      host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
       -> ipip encap -> net_failover tx -> virtio_net tx
      
      When net_failover transmits a ipip pkt (gso_type = 0x0103, which means
      SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), there is no gso
      did because it supports TSO and GSO_IPXIP4. But network_header points to
      inner ip header.
      
      Call Trace:
       tcp4_gso_segment        ------> return NULL
       inet_gso_segment        ------> inner iph, network_header points to
       ipip_gso_segment
       inet_gso_segment        ------> outer iph
       skb_mac_gso_segment
      
      Afterwards virtio_net transmits the pkt, only inner ip header is modified.
      And the outer one just keeps unchanged. The pkt will be dropped in remote
      host.
      
      Call Trace:
       inet_gso_segment        ------> inner iph, outer iph is skipped
       skb_mac_gso_segment
       __skb_gso_segment
       validate_xmit_skb
       validate_xmit_skb_list
       sch_direct_xmit
       __qdisc_run
       __dev_queue_xmit        ------> virtio_net
       dev_hard_start_xmit
       __dev_queue_xmit        ------> net_failover
       ip_finish_output2
       ip_output
       iptunnel_xmit
       ip_tunnel_xmit
       ipip_tunnel_xmit        ------> ipip
       dev_hard_start_xmit
       __dev_queue_xmit
       ip_finish_output2
       ip_output
       ip_forward
       ip_rcv
       __netif_receive_skb_one_core
       netif_receive_skb_internal
       napi_gro_receive
       receive_buf
       virtnet_poll
       net_rx_action
      
      The root cause of this issue is specific with the rare combination of
      SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
      SKB_GSO_DODGY is set from external virtio_net. We need to reset network
      header when callbacks.gso_segment() returns NULL.
      
      This patch also includes ipv6_gso_segment(), considering SIT, etc.
      
      Fixes: cb32f511 ("ipip: add GSO/TSO support")
      Signed-off-by: default avatarTao Liu <thomas.liu@ucloud.cn>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc20cced
  5. 20 Feb, 2022 9 commits
  6. 19 Feb, 2022 15 commits
    • Vladimir Oltean's avatar
      net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held · 8940e6b6
      Vladimir Oltean authored
      If the DSA master doesn't support IFF_UNICAST_FLT, then the following
      call path is possible:
      
      dsa_slave_switchdev_event_work
      -> dsa_port_host_fdb_add
         -> dev_uc_add
            -> __dev_set_rx_mode
               -> __dev_set_promiscuity
      
      Since the blamed commit, dsa_slave_switchdev_event_work() no longer
      holds rtnl_lock(), which triggers the ASSERT_RTNL() from
      __dev_set_promiscuity().
      
      Taking rtnl_lock() around dev_uc_add() is impossible, because all the
      code paths that call dsa_flush_workqueue() do so from contexts where the
      rtnl_mutex is already held - so this would lead to an instant deadlock.
      
      dev_uc_add() in itself doesn't require the rtnl_mutex for protection.
      There is this comment in __dev_set_rx_mode() which assumes so:
      
      		/* Unicast addresses changes may only happen under the rtnl,
      		 * therefore calling __dev_set_promiscuity here is safe.
      		 */
      
      but it is from commit 4417da66 ("[NET]: dev: secondary unicast
      address support") dated June 2007, and in the meantime, commit
      f1f28aa3 ("netdev: Add addr_list_lock to struct net_device."), dated
      July 2008, has added &dev->addr_list_lock to protect this instead of the
      global rtnl_mutex.
      
      Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection,
      but it is the uncommon path of what we typically expect dev_uc_add()
      to do. So since only the uncommon path requires rtnl_lock(), just check
      ahead of time whether dev_uc_add() would result into a call to
      __dev_set_promiscuity(), and handle that condition separately.
      
      DSA already configures the master interface to be promiscuous if the
      tagger requires this. We can extend this to also cover the case where
      the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT),
      and on the premise that we'd end up making it promiscuous during
      operation anyway, either if a DSA slave has a non-inherited MAC address,
      or if the bridge notifies local FDB entries for its own MAC address, the
      address of a station learned on a foreign port, etc.
      
      Fixes: 0faf890f ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work")
      Reported-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8940e6b6
    • Svenning Sørensen's avatar
      net: dsa: microchip: fix bridging with more than two member ports · 3d00827a
      Svenning Sørensen authored
      Commit b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      plugged a packet leak between ports that were members of different bridges.
      Unfortunately, this broke another use case, namely that of more than two
      ports that are members of the same bridge.
      
      After that commit, when a port is added to a bridge, hardware bridging
      between other member ports of that bridge will be cleared, preventing
      packet exchange between them.
      
      Fix by ensuring that the Port VLAN Membership bitmap includes any existing
      ports in the bridge, not just the port being added.
      
      Fixes: b3612ccd ("net: dsa: microchip: implement multi-bridge support")
      Signed-off-by: default avatarSvenning Sørensen <sss@secomea.com>
      Tested-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3d00827a
    • Christophe Leroy's avatar
      net: Force inlining of checksum functions in net/checksum.h · 5486f5bf
      Christophe Leroy authored
      All functions defined as static inline in net/checksum.h are
      meant to be inlined for performance reason.
      
      But since commit ac7c3e4f ("compiler: enable
      CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
      uninline functions when it wants.
      
      Fair enough in the general case, but for tiny performance critical
      checksum helpers that's counter-productive.
      
      The problem mainly arises when selecting CONFIG_CC_OPTIMISE_FOR_SIZE,
      Those helpers being 'static inline' in header files you suddenly find
      them duplicated many times in the resulting vmlinux.
      
      Here is a typical exemple when building powerpc pmac32_defconfig
      with CONFIG_CC_OPTIMISE_FOR_SIZE. csum_sub() appears 4 times:
      
      	c04a23cc <csum_sub>:
      	c04a23cc:	7c 84 20 f8 	not     r4,r4
      	c04a23d0:	7c 63 20 14 	addc    r3,r3,r4
      	c04a23d4:	7c 63 01 94 	addze   r3,r3
      	c04a23d8:	4e 80 00 20 	blr
      		...
      	c04a2ce8:	4b ff f6 e5 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d2c:	4b ff f6 a1 	bl      c04a23cc <csum_sub>
      		...
      	c04a2d54:	4b ff f6 79 	bl      c04a23cc <csum_sub>
      		...
      	c04a754c <csum_sub>:
      	c04a754c:	7c 84 20 f8 	not     r4,r4
      	c04a7550:	7c 63 20 14 	addc    r3,r3,r4
      	c04a7554:	7c 63 01 94 	addze   r3,r3
      	c04a7558:	4e 80 00 20 	blr
      		...
      	c04ac930:	4b ff ac 1d 	bl      c04a754c <csum_sub>
      		...
      	c04ad264:	4b ff a2 e9 	bl      c04a754c <csum_sub>
      		...
      	c04e3b08 <csum_sub>:
      	c04e3b08:	7c 84 20 f8 	not     r4,r4
      	c04e3b0c:	7c 63 20 14 	addc    r3,r3,r4
      	c04e3b10:	7c 63 01 94 	addze   r3,r3
      	c04e3b14:	4e 80 00 20 	blr
      		...
      	c04e5788:	4b ff e3 81 	bl      c04e3b08 <csum_sub>
      		...
      	c04e65c8:	4b ff d5 41 	bl      c04e3b08 <csum_sub>
      		...
      	c0512d34 <csum_sub>:
      	c0512d34:	7c 84 20 f8 	not     r4,r4
      	c0512d38:	7c 63 20 14 	addc    r3,r3,r4
      	c0512d3c:	7c 63 01 94 	addze   r3,r3
      	c0512d40:	4e 80 00 20 	blr
      		...
      	c0512dfc:	4b ff ff 39 	bl      c0512d34 <csum_sub>
      		...
      	c05138bc:	4b ff f4 79 	bl      c0512d34 <csum_sub>
      		...
      
      Restore the expected behaviour by using __always_inline for all
      functions defined in net/checksum.h
      
      vmlinux size is even reduced by 256 bytes with this patch:
      
      	   text	   data	    bss	    dec	    hex	filename
      	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
      	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now
      
      Fixes: ac7c3e4f ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5486f5bf
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 0033fced
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-02-18
      
      This series contains updates to ice driver only.
      
      Wojciech fixes protocol matching for slow-path switchdev so that all
      packets are correctly redirected.
      
      Michal removes accidental unconditional setting of l4 port filtering
      flag.
      
      Jake adds locking to protect VF reset and removal to fix various issues
      that can be encountered when they race with each other.
      
      Tom Rix propagates an error and initializes a struct to resolve reported
      Clang issues.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0033fced
    • David S. Miller's avatar
      Merge branch 'mptcp-fixes' · 90141edc
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fix address advertisement races and stabilize tests
      
      Patches 1, 2, and 7 modify two self tests to give consistent, accurate
      results by fixing timing issues and accounting for syncookie behavior.
      
      Paches 3-6 fix two races in overlapping address advertisement send and
      receive. Associated self tests are updated, including addition of two
      MIBs to enable testing and tracking dropped address events.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      90141edc
    • Paolo Abeni's avatar
      selftests: mptcp: be more conservative with cookie MPJ limits · e35f885b
      Paolo Abeni authored
      Since commit 2843ff6f ("mptcp: remote addresses fullmesh"), an
      MPTCP client can attempt creating multiple MPJ subflow simultaneusly.
      
      In such scenario the server, when syncookies are enabled, could end-up
      accepting incoming MPJ syn even above the configured subflow limit, as
      the such limit can be enforced in a reliable way only after the subflow
      creation. In case of syncookie, only after the 3rd ack reception.
      
      As a consequence the related self-tests case sporadically fails, as it
      verify that the server always accept the expected number of MPJ syn.
      
      Address the issues relaxing the MPJ syn number constrain. Note that the
      check on the accepted number of MPJ 3rd ack still remains intact.
      
      Fixes: 2843ff6f ("mptcp: remote addresses fullmesh")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e35f885b
    • Paolo Abeni's avatar
      selftests: mptcp: more robust signal race test · 6ef84b15
      Paolo Abeni authored
      The in kernel MPTCP PM implementation can process a single
      incoming add address option at any given time. In the
      mentioned test the server can surpass such limit. Let the
      setup cope with that allowing a faster add_addr retransmission.
      
      Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors")
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/254Reported-and-tested-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6ef84b15
    • Paolo Abeni's avatar
      mptcp: add mibs counter for ignored incoming options · f73c1194
      Paolo Abeni authored
      The MPTCP in kernel path manager has some constraints on incoming
      addresses announce processing, so that in edge scenarios it can
      end-up dropping (ignoring) some of such announces.
      
      The above is not very limiting in practice since such scenarios are
      very uncommon and MPTCP will recover due to ADD_ADDR retransmissions.
      
      This patch adds a few MIB counters to account for such drop events
      to allow easier introspection of the critical scenarios.
      
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f73c1194
    • Paolo Abeni's avatar
      mptcp: fix race in incoming ADD_ADDR option processing · 837cf45d
      Paolo Abeni authored
      If an MPTCP endpoint received multiple consecutive incoming
      ADD_ADDR options, mptcp_pm_add_addr_received() can overwrite
      the current remote address value after the PM lock is released
      in mptcp_pm_nl_add_addr_received() and before such address
      is echoed.
      
      Fix the issue caching the remote address value a little earlier
      and always using the cached value after releasing the PM lock.
      
      Fixes: f7efc777 ("mptcp: drop argument port from mptcp_pm_announce_addr")
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      837cf45d
    • Paolo Abeni's avatar
      mptcp: fix race in overlapping signal events · 98247bc1
      Paolo Abeni authored
      After commit a88c9e49 ("mptcp: do not block subflows
      creation on errors"), if a signal address races with a failing
      subflow creation, the subflow creation failure control path
      can trigger the selection of the next address to be announced
      while the current announced is still pending.
      
      The above will cause the unintended suppression of the ADD_ADDR
      announce.
      
      Fix the issue skipping the to-be-suppressed announce before it
      will mark an endpoint as already used. The relevant announce
      will be triggered again when the current one will complete.
      
      Fixes: a88c9e49 ("mptcp: do not block subflows creation on errors")
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      98247bc1
    • Paolo Abeni's avatar
      selftests: mptcp: improve 'fair usage on close' stability · 5b31dda7
      Paolo Abeni authored
      The mentioned test has to wait for a subflow creation failure.
      The current code looks for TCP sockets in TW state and sometimes
      misses the relevant event. Switch to a more stable check, looking
      for the associated mib counter.
      
      Fixes: 46e967d1 ("selftests: mptcp: add tests for subflow creation failure")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/257Reported-and-tested-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b31dda7
    • Paolo Abeni's avatar
      selftests: mptcp: fix diag instability · 0cd33c5f
      Paolo Abeni authored
      Instead of waiting for an arbitrary amount of time for the MPTCP
      MP_CAPABLE handshake to complete, explicitly wait for the relevant
      socket to enter into the established status.
      
      Additionally let the data transfer application use the slowest
      transfer mode available (-r), to cope with very slow host, or
      high jitter caused by hosting VMs.
      
      Fixes: df62f2ec ("selftests/mptcp: add diag interface tests")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/258Reported-and-tested-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0cd33c5f
    • Christophe JAILLET's avatar
      nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac() · 3a14d088
      Christophe JAILLET authored
      ida_simple_get() returns an id between min (0) and max (NFP_MAX_MAC_INDEX)
      inclusive.
      So NFP_MAX_MAC_INDEX (0xff) is a valid id.
      
      In order for the error handling path to work correctly, the 'invalid'
      value for 'ida_idx' should not be in the 0..NFP_MAX_MAC_INDEX range,
      inclusive.
      
      So set it to -1.
      
      Fixes: 20cce886 ("nfp: flower: enable MAC address sharing for offloadable devs")
      Signed-off-by: default avatarChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20220218131535.100258-1-simon.horman@corigine.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3a14d088
    • Subash Abhinov Kasiviswanathan's avatar
    • Jeremy Linton's avatar
      net: mvpp2: always set port pcs ops · 5a2aba71
      Jeremy Linton authored
      Booting a MACCHIATObin with 5.17, the system OOPs with
      a null pointer deref when the network is started. This
      is caused by the pcs->ops structure being null in
      mcpp2_acpi_start() when it tries to call pcs_config().
      
      Hoisting the code which sets pcs_gmac.ops and pcs_xlg.ops,
      assuring they are always set, fixes the problem.
      
      The OOPs looks like:
      [   18.687760] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000010
      [   18.698561] Mem abort info:
      [   18.698564]   ESR = 0x96000004
      [   18.698567]   EC = 0x25: DABT (current EL), IL = 32 bits
      [   18.709821]   SET = 0, FnV = 0
      [   18.714292]   EA = 0, S1PTW = 0
      [   18.718833]   FSC = 0x04: level 0 translation fault
      [   18.725126] Data abort info:
      [   18.729408]   ISV = 0, ISS = 0x00000004
      [   18.734655]   CM = 0, WnR = 0
      [   18.738933] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000111bbf000
      [   18.745409] [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
      [   18.752235] Internal error: Oops: 96000004 [#1] SMP
      [   18.757134] Modules linked in: rfkill ip_set nf_tables nfnetlink qrtr sunrpc vfat fat omap_rng fuse zram xfs crct10dif_ce mvpp2 ghash_ce sbsa_gwdt phylink xhci_plat_hcd ahci_plam
      [   18.773481] CPU: 0 PID: 681 Comm: NetworkManager Not tainted 5.17.0-0.rc3.89.fc36.aarch64 #1
      [   18.781954] Hardware name: Marvell                         Armada 7k/8k Family Board      /Armada 7k/8k Family Board      , BIOS EDK II Jun  4 2019
      [   18.795222] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   18.802213] pc : mvpp2_start_dev+0x2b0/0x300 [mvpp2]
      [   18.807208] lr : mvpp2_start_dev+0x298/0x300 [mvpp2]
      [   18.812197] sp : ffff80000b4732c0
      [   18.815522] x29: ffff80000b4732c0 x28: 0000000000000000 x27: ffffccab38ae57f8
      [   18.822689] x26: ffff6eeb03065a10 x25: ffff80000b473a30 x24: ffff80000b4735b8
      [   18.829855] x23: 0000000000000000 x22: 00000000000001e0 x21: ffff6eeb07b6ab68
      [   18.837021] x20: ffff6eeb07b6ab30 x19: ffff6eeb07b6a9c0 x18: 0000000000000014
      [   18.844187] x17: 00000000f6232bfe x16: ffffccab899b1dc0 x15: 000000006a30f9fa
      [   18.851353] x14: 000000003b77bd50 x13: 000006dc896f0e8e x12: 001bbbfccfd0d3a2
      [   18.858519] x11: 0000000000001528 x10: 0000000000001548 x9 : ffffccab38ad0fb0
      [   18.865685] x8 : ffff80000b473330 x7 : 0000000000000000 x6 : 0000000000000000
      [   18.872851] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000b4732f8
      [   18.880017] x2 : 000000000000001a x1 : 0000000000000002 x0 : ffff6eeb07b6ab68
      [   18.887183] Call trace:
      [   18.889637]  mvpp2_start_dev+0x2b0/0x300 [mvpp2]
      [   18.894279]  mvpp2_open+0x134/0x2b4 [mvpp2]
      [   18.898483]  __dev_open+0x128/0x1e4
      [   18.901988]  __dev_change_flags+0x17c/0x1d0
      [   18.906187]  dev_change_flags+0x30/0x70
      [   18.910038]  do_setlink+0x278/0xa7c
      [   18.913540]  __rtnl_newlink+0x44c/0x7d0
      [   18.917391]  rtnl_newlink+0x5c/0x8c
      [   18.920892]  rtnetlink_rcv_msg+0x254/0x314
      [   18.925006]  netlink_rcv_skb+0x48/0x10c
      [   18.928858]  rtnetlink_rcv+0x24/0x30
      [   18.932449]  netlink_unicast+0x290/0x2f4
      [   18.936386]  netlink_sendmsg+0x1d0/0x41c
      [   18.940323]  sock_sendmsg+0x60/0x70
      [   18.943825]  ____sys_sendmsg+0x248/0x260
      [   18.947762]  ___sys_sendmsg+0x74/0xa0
      [   18.951438]  __sys_sendmsg+0x64/0xcc
      [   18.955027]  __arm64_sys_sendmsg+0x30/0x40
      [   18.959140]  invoke_syscall+0x50/0x120
      [   18.962906]  el0_svc_common.constprop.0+0x4c/0xf4
      [   18.967629]  do_el0_svc+0x30/0x9c
      [   18.970958]  el0_svc+0x28/0xb0
      [   18.974025]  el0t_64_sync_handler+0x10c/0x140
      [   18.978400]  el0t_64_sync+0x1a4/0x1a8
      [   18.982078] Code: 52800004 b9416262 aa1503e0 52800041 (f94008a5)
      [   18.988196] ---[ end trace 0000000000000000 ]---
      
      Fixes: cff05632 ("net: mvpp2: use .mac_select_pcs() interface")
      Suggested-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarJeremy Linton <jeremy.linton@arm.com>
      Reviewed-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Link: https://lore.kernel.org/r/20220214231852.3331430-1-jeremy.linton@arm.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5a2aba71
  7. 18 Feb, 2022 5 commits
    • Tom Rix's avatar
      ice: initialize local variable 'tlv' · 5950bdc8
      Tom Rix authored
      Clang static analysis reports this issues
      ice_common.c:5008:21: warning: The left expression of the compound
        assignment is an uninitialized value. The computed value will
        also be garbage
        ldo->phy_type_low |= ((u64)buf << (i * 16));
        ~~~~~~~~~~~~~~~~~ ^
      
      When called from ice_cfg_phy_fec() ldo is the uninitialized local
      variable tlv.  So initialize.
      
      Fixes: ea78ce4d ("ice: add link lenient and default override support")
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      5950bdc8
    • Tom Rix's avatar
      ice: check the return of ice_ptp_gettimex64 · ed22d9c8
      Tom Rix authored
      Clang static analysis reports this issue
      time64.h:69:50: warning: The left operand of '+'
        is a garbage value
        set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec,
                                             ~~~~~~~~~~ ^
      In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now'
      is set by ice_ptp_gettimex64().  This function can fail
      with -EBUSY, so 'now' can have a gargbage value.
      So check the return.
      
      Fixes: 06c16d89 ("ice: register 1588 PTP clock device object for E810 devices")
      Signed-off-by: default avatarTom Rix <trix@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      ed22d9c8
    • Jacob Keller's avatar
      ice: fix concurrent reset and removal of VFs · fadead80
      Jacob Keller authored
      Commit c503e632 ("ice: Stop processing VF messages during teardown")
      introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
      intended to prevent some issues with concurrently handling messages from
      VFs while tearing down the VFs.
      
      This change was motivated by crashes caused while tearing down and
      bringing up VFs in rapid succession.
      
      It turns out that the fix actually introduces issues with the VF driver
      caused because the PF no longer responds to any messages sent by the VF
      during its .remove routine. This results in the VF potentially removing
      its DMA memory before the PF has shut down the device queues.
      
      Additionally, the fix doesn't actually resolve concurrency issues within
      the ice driver. It is possible for a VF to initiate a reset just prior
      to the ice driver removing VFs. This can result in the remove task
      concurrently operating while the VF is being reset. This results in
      similar memory corruption and panics purportedly fixed by that commit.
      
      Fix this concurrency at its root by protecting both the reset and
      removal flows using the existing VF cfg_lock. This ensures that we
      cannot remove the VF while any outstanding critical tasks such as a
      virtchnl message or a reset are occurring.
      
      This locking change also fixes the root cause originally fixed by commit
      c503e632 ("ice: Stop processing VF messages during teardown"), so we
      can simply revert it.
      
      Note that I kept these two changes together because simply reverting the
      original commit alone would leave the driver vulnerable to worse race
      conditions.
      
      Fixes: c503e632 ("ice: Stop processing VF messages during teardown")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      fadead80
    • Michal Swiatkowski's avatar
      ice: fix setting l4 port flag when adding filter · 932645c2
      Michal Swiatkowski authored
      Accidentally filter flag for none encapsulated l4 port field is always
      set. Even if user wants to add encapsulated l4 port field.
      
      Remove this unnecessary flag setting.
      
      Fixes: 9e300987 ("ice: VXLAN and Geneve TC support")
      Signed-off-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      932645c2
    • Wojciech Drewek's avatar
      ice: Match on all profiles in slow-path · b70bc066
      Wojciech Drewek authored
      In switchdev mode, slow-path rules need to match all protocols, in order
      to correctly redirect unfiltered or missed packets to the uplink. To set
      this up for the virtual function to uplink flow, the rule that redirects
      packets to the control VSI must have the tunnel type set to
      ICE_SW_TUN_AND_NON_TUN. As a result of that new tunnel type being set,
      ice_get_compat_fv_bitmap will select ICE_PROF_ALL. At that point all
      profiles would be selected for this rule, resulting in the desired
      behavior. Without this change slow-path would not work with
      tunnel protocols.
      
      Fixes: 8b032a55 ("ice: low level support for tunnels")
      Signed-off-by: default avatarWojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: default avatarSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      b70bc066