1. 30 Apr, 2022 13 commits
    • mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() · a9384a4c
      Eric Dumazet authored
      Whenever an RCU-protected list replaces an object,
      the pointer to the new object needs to be updated
      _before_ the call to kfree_rcu() or call_rcu().
      
      Also ip6_mc_msfilter() needs to update the pointer
      before releasing the mc_lock mutex.
      
      Note that linux-5.13 already supported kfree_rcu(NULL, rcu),
      so this fix does not need the conditional test I was
      forced to use in the equivalent patch for IPv4.
      
      Fixes: 882ba1f7 ("mld: convert ipv6_mc_socklist->sflist to RCU")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9384a4c
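The ordering rule stated in this commit (publish the new pointer first, only then hand the old object to kfree_rcu() or call_rcu()) can be illustrated with a user-space analogy. This is a minimal sketch, not kernel code: `struct src_filter`, `struct mc_entry`, `struct retire_queue`, `retire_later()` and `replace_filter()` are hypothetical names, a C11 release store stands in for rcu_assign_pointer(), and the retire queue merely records pointers where the kernel would wait for a grace period before freeing.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RETIRE_CAPACITY 8 /* hypothetical, small on purpose */

/* Hypothetical stand-in for a per-socket source filter list. */
struct src_filter {
    int count;
};

/* Hypothetical stand-in for the socket's multicast entry. */
struct mc_entry {
    _Atomic(struct src_filter *) sflist;
};

/* Stand-in for kfree_rcu(): it only records what must be freed later. */
struct retire_queue {
    struct src_filter *pending[RETIRE_CAPACITY];
    size_t n;
};

static void retire_later(struct retire_queue *q, struct src_filter *old)
{
    /* NULL is ignored, like kfree_rcu(NULL, rcu) on recent kernels.
     * A full queue silently drops the entry; acceptable only in a sketch. */
    if (old && q->n < RETIRE_CAPACITY)
        q->pending[q->n++] = old;
}

static void replace_filter(struct mc_entry *e, struct src_filter *newf,
                           struct retire_queue *q)
{
    struct src_filter *old =
        atomic_load_explicit(&e->sflist, memory_order_relaxed);

    /* Step 1: publish the new object so readers can no longer find 'old'. */
    atomic_store_explicit(&e->sflist, newf, memory_order_release);

    /* Step 2: only now queue the old object for deferred freeing. */
    retire_later(q, old);
}
```

If the two steps were swapped, a reader could still load the old pointer after it had been queued for freeing, which is the kind of use-after-free the patch addresses.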
    • net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() · dba5bdd5
      Eric Dumazet authored
      syzbot reported a UAF in ip_mc_sf_allow() [1]
      
      Whenever an RCU-protected list replaces an object,
      the pointer to the new object needs to be updated
      _before_ the call to kfree_rcu() or call_rcu().
      
      Because kfree_rcu(ptr, rcu) got support for NULL ptr
      only recently in commit 12edff04 ("rcu: Make kfree_rcu()
      ignore NULL pointers"), I chose to use the conditional
      to make sure stable backports won't miss this detail.
      
      if (psl)
          kfree_rcu(psl, rcu);
      
      net/ipv6/mcast.c has similar issues, addressed in a separate patch.
      
      [1]
      BUG: KASAN: use-after-free in ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
      Read of size 4 at addr ffff88807d37b904 by task syz-executor.5/908
      
      CPU: 0 PID: 908 Comm: syz-executor.5 Not tainted 5.18.0-rc4-syzkaller-00064-g8f4dd166 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
       raw_v4_input net/ipv4/raw.c:190 [inline]
       raw_local_deliver+0x4d1/0xbe0 net/ipv4/raw.c:218
       ip_protocol_deliver_rcu+0xcf/0xb30 net/ipv4/ip_input.c:193
       ip_local_deliver_finish+0x2ee/0x4c0 net/ipv4/ip_input.c:233
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:254
       dst_input include/net/dst.h:461 [inline]
       ip_rcv_finish+0x1cb/0x2f0 net/ipv4/ip_input.c:437
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_rcv+0xaa/0xd0 net/ipv4/ip_input.c:556
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
       __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
       netif_receive_skb_internal net/core/dev.c:5605 [inline]
       netif_receive_skb+0x13e/0x8e0 net/core/dev.c:5664
       tun_rx_batched.isra.0+0x460/0x720 drivers/net/tun.c:1534
       tun_get_user+0x28b7/0x3e30 drivers/net/tun.c:1985
       tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2015
       call_write_iter include/linux/fs.h:2050 [inline]
       new_sync_write+0x38a/0x560 fs/read_write.c:504
       vfs_write+0x7c0/0xac0 fs/read_write.c:591
       ksys_write+0x127/0x250 fs/read_write.c:644
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f3f12c3bbff
      Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48
      RSP: 002b:00007f3f13ea9130 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f3f12d9bf60 RCX: 00007f3f12c3bbff
      RDX: 0000000000000036 RSI: 0000000020002ac0 RDI: 00000000000000c8
      RBP: 00007f3f12ce308d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000036 R11: 0000000000000293 R12: 0000000000000000
      R13: 00007fffb68dd79f R14: 00007f3f13ea9300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 908:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:45 [inline]
       set_alloc_info mm/kasan/common.c:436 [inline]
       ____kasan_kmalloc mm/kasan/common.c:515 [inline]
       ____kasan_kmalloc mm/kasan/common.c:474 [inline]
       __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:524
       kasan_kmalloc include/linux/kasan.h:234 [inline]
       __do_kmalloc mm/slab.c:3710 [inline]
       __kmalloc+0x209/0x4d0 mm/slab.c:3719
       kmalloc include/linux/slab.h:586 [inline]
       sock_kmalloc net/core/sock.c:2501 [inline]
       sock_kmalloc+0xb5/0x100 net/core/sock.c:2492
       ip_mc_source+0xba2/0x1100 net/ipv4/igmp.c:2392
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1296 [inline]
       ip_setsockopt+0x2312/0x3ab0 net/ipv4/ip_sockglue.c:1432
       raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 753:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:45
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free+0x13d/0x180 mm/kasan/common.c:328
       kasan_slab_free include/linux/kasan.h:200 [inline]
       __cache_free mm/slab.c:3439 [inline]
       kmem_cache_free_bulk+0x69/0x460 mm/slab.c:3774
       kfree_bulk include/linux/slab.h:437 [inline]
       kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3318
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
       kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3595
       ip_mc_msfilter+0x712/0xb60 net/ipv4/igmp.c:2510
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1257 [inline]
       ip_setsockopt+0x32e1/0x3ab0 net/ipv4/ip_sockglue.c:1432
       raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Second to last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
       call_rcu+0x99/0x790 kernel/rcu/tree.c:3074
       mpls_dev_notify+0x552/0x8a0 net/mpls/af_mpls.c:1656
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1938
       call_netdevice_notifiers_extack net/core/dev.c:1976 [inline]
       call_netdevice_notifiers net/core/dev.c:1990 [inline]
       unregister_netdevice_many+0x92e/0x1890 net/core/dev.c:10751
       default_device_exit_batch+0x449/0x590 net/core/dev.c:11245
       ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
       cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
      
      The buggy address belongs to the object at ffff88807d37b900
       which belongs to the cache kmalloc-64 of size 64
      The buggy address is located 4 bytes inside of
       64-byte region [ffff88807d37b900, ffff88807d37b940)
      
      The buggy address belongs to the physical page:
      page:ffffea0001f4dec0 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88807d37b180 pfn:0x7d37b
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 ffff888010c41340 ffffea0001c795c8 ffff888010c40200
      raw: ffff88807d37b180 ffff88807d37b000 000000010000001f 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x342040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_HARDWALL|__GFP_THISNODE), pid 2963, tgid 2963 (udevd), ts 139732238007, free_ts 139730893262
       prep_new_page mm/page_alloc.c:2441 [inline]
       get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408
       __alloc_pages_node include/linux/gfp.h:587 [inline]
       kmem_getpages mm/slab.c:1378 [inline]
       cache_grow_begin+0x75/0x350 mm/slab.c:2584
       cache_alloc_refill+0x27f/0x380 mm/slab.c:2957
       ____cache_alloc mm/slab.c:3040 [inline]
       ____cache_alloc mm/slab.c:3023 [inline]
       __do_cache_alloc mm/slab.c:3267 [inline]
       slab_alloc mm/slab.c:3309 [inline]
       __do_kmalloc mm/slab.c:3708 [inline]
       __kmalloc+0x3b3/0x4d0 mm/slab.c:3719
       kmalloc include/linux/slab.h:586 [inline]
       kzalloc include/linux/slab.h:714 [inline]
       tomoyo_encode2.part.0+0xe9/0x3a0 security/tomoyo/realpath.c:45
       tomoyo_encode2 security/tomoyo/realpath.c:31 [inline]
       tomoyo_encode+0x28/0x50 security/tomoyo/realpath.c:80
       tomoyo_realpath_from_path+0x186/0x620 security/tomoyo/realpath.c:288
       tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
       tomoyo_path_perm+0x21b/0x400 security/tomoyo/file.c:822
       security_inode_getattr+0xcf/0x140 security/security.c:1350
       vfs_getattr fs/stat.c:157 [inline]
       vfs_statx+0x16a/0x390 fs/stat.c:232
       vfs_fstatat+0x8c/0xb0 fs/stat.c:255
       __do_sys_newfstatat+0x91/0x110 fs/stat.c:425
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1356 [inline]
       free_pcp_prepare+0x549/0xd20 mm/page_alloc.c:1406
       free_unref_page_prepare mm/page_alloc.c:3328 [inline]
       free_unref_page+0x19/0x6a0 mm/page_alloc.c:3423
       __vunmap+0x85d/0xd30 mm/vmalloc.c:2667
       __vfree+0x3c/0xd0 mm/vmalloc.c:2715
       vfree+0x5a/0x90 mm/vmalloc.c:2746
       __do_replace+0x16b/0x890 net/ipv6/netfilter/ip6_tables.c:1117
       do_replace net/ipv6/netfilter/ip6_tables.c:1157 [inline]
       do_ip6t_set_ctl+0x90d/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
       nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
       ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1026
       tcp_setsockopt+0x136/0x2520 net/ipv4/tcp.c:3696
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Memory state around the buggy address:
       ffff88807d37b800: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
       ffff88807d37b880: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      >ffff88807d37b900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                         ^
       ffff88807d37b980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff88807d37ba00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: c85bb41e ("igmp: fix ip_mc_sf_allow race [v5]")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Flavio Leitner <fbl@sysclose.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dba5bdd5
    • rxrpc: Enable IPv6 checksums on transport socket · 39cb9faa
      David Howells authored
      AF_RXRPC doesn't currently enable IPv6 UDP Tx checksums on the transport
      socket it opens and the checksums in the packets it generates end up 0.
      
      It probably should also enable IPv6 UDP Rx checksums and IPv4 UDP
      checksums.  The latter only seem to be applied if the socket family is
      AF_INET and don't seem to apply if it's AF_INET6.  IPv4 packets from an
      IPv6 socket seem to have checksums anyway.
      
      What seems to have happened is that the inet_inv_convert_csum() call didn't
      get converted to the appropriate udp_port_cfg parameters - and
      udp_sock_create() disables checksums unless explicitly told not to.
      
      Fix this by enabling the three udp_port_cfg checksum options.
      
      Fixes: 1a9b86c9 ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      cc: Vadim Fedorenko <vfedorenko@novek.ru>
      cc: David S. Miller <davem@davemloft.net>
      cc: linux-afs@lists.infradead.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39cb9faa
    • net: cpsw: add missing of_node_put() in cpsw_probe_dt() · 95098d5a
      Yang Yingliang authored
      'tmp_node' needs to be put before returning from cpsw_probe_dt(),
      so add the missing of_node_put() in the error path.
      
      Fixes: ed3525ed ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95098d5a
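The pattern behind this fix, and behind the other of_node_put() fixes in this batch, is that every reference taken on a node must be dropped on every exit path, including error paths. The following is a user-space sketch under stated assumptions: `struct fake_node`, `fake_node_get()`, `fake_node_put()` and `fake_probe()` are hypothetical stand-ins, not the kernel's device-tree API, and `parse_fails` only simulates a parsing error.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted device-tree node. */
struct fake_node {
    int refcount;
};

/* Takes a reference, like a lookup that returns a node with its
 * refcount incremented. */
static struct fake_node *fake_node_get(struct fake_node *n)
{
    if (n)
        n->refcount++;
    return n;
}

/* Drops a reference; NULL is accepted and ignored. */
static void fake_node_put(struct fake_node *n)
{
    if (n)
        n->refcount--;
}

/* Probe-like function: success and failure share a single exit that
 * drops the reference, so no path can leak it. */
static int fake_probe(struct fake_node *child, int parse_fails)
{
    struct fake_node *tmp_node = fake_node_get(child);
    int ret = 0;

    if (!tmp_node)
        return -ENOENT;

    if (parse_fails) {
        ret = -EINVAL;
        goto out_put; /* the error path must also drop the reference */
    }

    /* ... tmp_node would be used here ... */

out_put:
    fake_node_put(tmp_node);
    return ret;
}
```

Routing the error path through the same label as the success path is one common way to keep the get and put calls balanced.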
    • net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux() · 1a15267b
      Yang Yingliang authored
      The node pointer returned by of_get_child_by_name() has its refcount
      incremented, so add of_node_put() after using it.
      
      Fixes: 634db83b ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220428095716.540452-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1a15267b
    • net: dsa: mt7530: add missing of_node_put() in mt7530_setup() · a9e9b091
      Yang Yingliang authored
      Add of_node_put() if of_get_phy_mode() fails in mt7530_setup().
      
      Fixes: 0c65b2b9 ("net: of_get_phy_mode: Change API to solve int/unit warnings")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220428095317.538829-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a9e9b091
    • net: dsa: ksz9477: port mirror sniffing limited to one port · fee34dd1
      Arun Ramadoss authored
      This patch limits sniffing to a single port when a mirror is added.
      On mirror_del, it checks every port that uses the sniffer port and
      disables sniffing only if no other port still refers to it.
      The code is updated based on the review comments on the LAN937x port
      mirror patch.
      
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210422094257.1641396-8-prasanna.vengateshan@microchip.com/
      Fixes: b987e98e ("dsa: add DSA switch driver for Microchip KSZ9477")
      Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
      Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
      Link: https://lore.kernel.org/r/20220428070709.7094-1-arun.ramadoss@microchip.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fee34dd1
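The add and delete rules described in this commit can be sketched as a small state machine. This is an illustration under stated assumptions, not the driver's code: `struct ex_switch`, `ex_switch_init()`, `ex_mirror_add()`, `ex_mirror_del()` and the four-port limit are hypothetical, and returning -EBUSY when a second sniffer port is requested is an assumption about how such a refusal might be reported.

```c
#include <errno.h>
#include <stdbool.h>

#define EX_NUM_PORTS 4     /* hypothetical port count */
#define EX_NO_SNIFFER (-1) /* no sniffer port configured */

/* Hypothetical software model of the switch's mirroring state. */
struct ex_switch {
    bool mirror_rx[EX_NUM_PORTS];
    bool mirror_tx[EX_NUM_PORTS];
    int sniffer_port;
};

static void ex_switch_init(struct ex_switch *sw)
{
    for (int p = 0; p < EX_NUM_PORTS; p++) {
        sw->mirror_rx[p] = false;
        sw->mirror_tx[p] = false;
    }
    sw->sniffer_port = EX_NO_SNIFFER;
}

/* Add a mirror from 'port' to 'to_port'. Only one sniffer port may be
 * active at a time. Callers are assumed to pass valid port numbers. */
static int ex_mirror_add(struct ex_switch *sw, int port, int to_port,
                         bool ingress)
{
    if (sw->sniffer_port != EX_NO_SNIFFER && sw->sniffer_port != to_port)
        return -EBUSY; /* assumption: refuse a second sniffer port */

    sw->sniffer_port = to_port;
    if (ingress)
        sw->mirror_rx[port] = true;
    else
        sw->mirror_tx[port] = true;
    return 0;
}

/* Remove a mirror. Sniffing is disabled only when no port still
 * mirrors traffic in either direction. */
static void ex_mirror_del(struct ex_switch *sw, int port, bool ingress)
{
    if (ingress)
        sw->mirror_rx[port] = false;
    else
        sw->mirror_tx[port] = false;

    for (int p = 0; p < EX_NUM_PORTS; p++)
        if (sw->mirror_rx[p] || sw->mirror_tx[p])
            return; /* another port still uses the sniffer */

    sw->sniffer_port = EX_NO_SNIFFER;
}
```

Scanning all ports on delete keeps the sniffer alive for the remaining mirrors, which matches the "if and only if no other ports are referring" condition in the commit message.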
    • hinic: fix bug of wq out of bound access · 52b2abef
      Qiao Ma authored
      If the wq has only one page, we need to check whether the wqe rolls over
      the page by comparing end_idx and curr_idx, and then copy the wqe to the
      shadow wqe to avoid an out-of-bounds access.
      This is already done in hinic_get_wqe, but was missed in hinic_read_wqe.
      This patch fixes it and removes the unnecessary MASKED_WQE_IDX().
      
      Fixes: 7dd29ee1 ("hinic: add sriov feature support")
      Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com>
      Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
      Link: https://lore.kernel.org/r/282817b0e1ae2e28fdf3ed8271a04e77f57bf42e.1651148587.git.mqaio@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      52b2abef
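The rollover check described in this commit compares the masked end index with the current index: if the end index is smaller, the entry wraps past the end of the single page and must be copied into a contiguous shadow buffer. The sketch below models that idea on a plain array. `ex_wqe_wraps()` and `ex_copy_to_shadow()` are hypothetical helpers, not the driver's functions, and they assume that `depth` is a power of two, that `curr_idx < depth`, and that `num_wqebbs` is between 1 and `depth`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an entry of 'num_wqebbs' blocks starting at
 * 'curr_idx' runs past the end of a ring with 'depth' slots. */
static bool ex_wqe_wraps(uint16_t curr_idx, uint16_t num_wqebbs,
                         uint16_t depth)
{
    uint16_t mask = (uint16_t)(depth - 1);
    uint16_t end_idx = (uint16_t)((curr_idx + num_wqebbs - 1) & mask);

    /* After masking, a smaller end index means the entry rolled over. */
    return end_idx < curr_idx;
}

/* Copies a possibly wrapped entry into a contiguous shadow buffer so
 * that later reads never step past the end of the ring. */
static void ex_copy_to_shadow(const uint32_t *ring, uint32_t *shadow,
                              uint16_t curr_idx, uint16_t num_wqebbs,
                              uint16_t depth)
{
    uint16_t mask = (uint16_t)(depth - 1);

    for (uint16_t i = 0; i < num_wqebbs; i++)
        shadow[i] = ring[(curr_idx + i) & mask];
}
```

A caller would test `ex_wqe_wraps()` first and read directly from the ring when it returns false, falling back to the shadow copy only for wrapped entries.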
    • net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller · e87f66b3
      Niels Dossche authored
      Error values inside the probe function must be < 0. The ENOMEM return
      value has the wrong sign: it is positive instead of negative.
      Add a minus sign.
      
      Fixes: e2397567 ("net: mdio: Add BCM6368 MDIO mux bus controller")
      Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20220428211931.8130-1-dossche.niels@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e87f66b3
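The sign convention this fix restores can be shown with a small user-space sketch: a probe-style function returns 0 on success and a negative errno value on failure, so callers that test `ret < 0` see the error. `struct ex_mux_state` and `ex_probe()` are hypothetical, and the `simulate_alloc_failure` flag exists only to make the failure path reproducible.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-device state allocated during probe. */
struct ex_mux_state {
    int placeholder;
};

/* Returns 0 on success or a negative errno value on failure. */
static int ex_probe(int simulate_alloc_failure, struct ex_mux_state **out)
{
    struct ex_mux_state *md =
        simulate_alloc_failure ? NULL : calloc(1, sizeof(*md));

    if (!md)
        return -ENOMEM; /* negative: a bare ENOMEM would pass a "< 0" check */

    *out = md;
    return 0;
}
```

With a positive return value, a caller checking `if (ret < 0)` would treat the failed allocation as success, which is why the sign matters.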
    • net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init() · ff5265d4
      Yang Yingliang authored
      The node pointer returned by of_parse_phandle() has its refcount
      incremented, so add of_node_put() after using it in mtk_sgmii_init().
      
      Fixes: 9ffee4a8 ("net: ethernet: mediatek: Extend SGMII related functions")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220428062543.64883-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ff5265d4
    • Merge branch 'selftests-net-add-missing-tests-to-makefile' · 1e4e6904
      Jakub Kicinski authored
      Hangbin Liu says:
      
      ====================
      selftests: net: add missing tests to Makefile
      
      When generating the selftests to another folder, the fixed tests are
      missing as they are not in Makefile. The missing tests are generated
      by command:
      $ for f in $(ls *.sh); do grep -q $f Makefile || echo $f; done
      ====================
      
      Link: https://lore.kernel.org/r/20220428044511.227416-1-liuhangbin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1e4e6904
    • selftests/net/forwarding: add missing tests to Makefile · f62c5acc
      Hangbin Liu authored
      When generating the selftests to another folder, the fixed tests are
      missing as they are not in Makefile, e.g.
      
        make -C tools/testing/selftests/ install \
        	TARGETS="net/forwarding" INSTALL_PATH=/tmp/kselftests
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f62c5acc
    • selftests/net: add missing tests to Makefile · 38dcd957
      Hangbin Liu authored
      When generating the selftests to another folder, the fixed tests are
      missing as they are not in Makefile, e.g.
      
        make -C tools/testing/selftests/ install \
        	TARGETS="net" INSTALL_PATH=/tmp/kselftests
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      38dcd957
  2. 29 Apr, 2022 6 commits
  3. 28 Apr, 2022 21 commits
    • Merge tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 249aca0d
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, bpf and netfilter.
      
        Current release - new code bugs:
      
         - bridge: switchdev: check br_vlan_group() return value
      
         - use this_cpu_inc() to increment net->core_stats, fix preempt-rt
      
        Previous releases - regressions:
      
         - eth: stmmac: fix write to sgmii_adapter_base
      
        Previous releases - always broken:
      
         - netfilter: nf_conntrack_tcp: re-init for syn packets only,
           resolving issues with TCP fastopen
      
         - tcp: md5: fix incorrect tcp_header_len for incoming connections
      
         - tcp: fix F-RTO may not work correctly when receiving DSACK
      
         - tcp: ensure use of most recently sent skb when filling rate samples
      
         - tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
      
         - virtio_net: fix wrong buf address calculation when using xdp
      
         - xsk: fix forwarding when combining copy mode with busy poll
      
         - xsk: fix possible crash when multiple sockets are created
      
         - bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
           bpf_xmit lwt hook
      
         - sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
      
         - wireguard: device: check for metadata_dst with skb_valid_dst()
      
         - netfilter: update ip6_route_me_harder to consider L3 domain
      
         - gre: make o_seqno start from 0 in native mode
      
         - gre: switch o_seqno to atomic to prevent races in collect_md mode
      
        Misc:
      
         - add Eric Dumazet to networking maintainers
      
         - dt: dsa: realtek: remove realtek,rtl8367s string
      
         - netfilter: flowtable: Remove the empty file"
      
      * tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
        tcp: fix F-RTO may not work correctly when receiving DSACK
        Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
        net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
        ixgbe: ensure IPsec VF<->PF compatibility
        MAINTAINERS: Update BNXT entry with firmware files
        netfilter: nft_socket: only do sk lookups when indev is available
        net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
        bnx2x: fix napi API usage sequence
        tls: Skip tls_append_frag on zero copy size
        Add Eric Dumazet to networking maintainers
        netfilter: conntrack: fix udp offload timeout sysctl
        netfilter: nf_conntrack_tcp: re-init for syn packets only
        net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
        net: Use this_cpu_inc() to increment net->core_stats
        Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
        Bluetooth: hci_event: Fix creating hci_conn object on error status
        Bluetooth: hci_event: Fix checking for invalid handle on error status
        ice: fix use-after-free when deinitializing mailbox snapshot
        ice: wait 5 s for EMP reset after firmware flash
        ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
        ...
      249aca0d
    • Merge tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 3c76fe74
      Linus Torvalds authored
      Pull thermal control fixes from Rafael Wysocki:
       "These take back recent changes that started to confuse users and fix up
        an attr.show callback prototype in a driver.
      
        Specifics:
      
         - Stop warning about deprecation of the userspace thermal governor
           and cooling device status interface, because there are cases in
           which user space has to drive thermal management with the help of
           them (Daniel Lezcano)
      
         - Fix attr.show callback prototype in the int340x thermal driver
           (Kees Cook)"
      
      * tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal/governor: Remove deprecated information
        Revert "thermal/core: Deprecate changing cooling device state from userspace"
        thermal: int340x: Fix attr.show callback prototype
      3c76fe74
    • Merge tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 659ed6e2
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix up recent intel_idle driver changes and fix some ARM cpufreq
        driver issues.
      
        Specifics:
      
         - Fix issues with the Qualcomm cpufreq driver (Dmitry Baryshkov,
           Vladimir Zapolskiy).
      
         - Fix memory leak in the sun50i driver (Xiaobing Luo).
      
         - Make intel_idle enable C1E promotion on all CPUs when C1E is
           preferred to C1 (Artem Bityutskiy).
      
         - Make C6 optimization on Sapphire Rapids added recently work as
           expected if both C1E and C1 are "preferred" (Artem Bityutskiy)"
      
      * tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        intel_idle: Fix SPR C6 optimization
        intel_idle: Fix the 'preferred_cstates' module parameter
        cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
        cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
        cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
        cpufreq: qcom-hw: provide online/offline operations
        cpufreq: qcom-hw: fix the opp entries refcounting
        cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
        cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
      659ed6e2
    • Merge tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · f12d31c0
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix up the ACPI processor driver after a change made during the
        5.16 cycle that inadvertently broke falling back to shallower C-states
        when C3 cannot be used.
      
        Specifics:
      
         - Make the ACPI processor driver avoid falling back to C3 type of
           C-states when C3 cannot be requested (Ville Syrjälä)
      
         - Revert a quirk that is not necessary any more after fixing the
           underlying issue properly (Ville Syrjälä)"
      
      * tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
        ACPI: processor: idle: Avoid falling back to C3 type C-states
      f12d31c0
    • Merge tag 'platform-drivers-x86-v5.18-3' of... · 259b897e
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Hans de Goede:
       "Highlights:
      
         - asus-wmi bug-fixes
      
         - intel-sdsi bug-fixes
      
         - build (warning) fixes
      
         - couple of hw-id additions"
      
      * tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86/intel: pmc/core: change pmc_lpm_modes to static
        platform/x86/intel/sdsi: Fix bug in multi packet reads
        platform/x86/intel/sdsi: Poll on ready bit for writes
        platform/x86/intel/sdsi: Handle leaky bucket
        platform/x86: intel-uncore-freq: Prevent driver loading in guests
        platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard
        platform/x86: dell-laptop: Add quirk entry for Latitude 7520
        platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails
        platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()
        tools/power/x86/intel-speed-select: fix build failure when using -Wl,--as-needed
      259b897e
    • Merge tag 'regulator-fix-v5.18-rc4' of... · fd5a4c7d
      Linus Torvalds authored
      Merge tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
      
      Pull regulator fix from Mark Brown:
       "A minor fix for the DT binding documentation of the rt5190a driver"
      
      * tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: dt-bindings: Revise the rt5190a buck/ldo description
      fd5a4c7d
    • tcp: fix F-RTO may not work correctly when receiving DSACK · d9157f68
      Pengcheng Yang authored
      Currently DSACK is regarded as a dupack, which may cause
      F-RTO to incorrectly enter "loss was real" when receiving
      DSACK.
      
      Packetdrill to demonstrate:
      
      // Enable F-RTO and TLP
          0 `sysctl -q net.ipv4.tcp_frto=2`
          0 `sysctl -q net.ipv4.tcp_early_retrans=3`
          0 `sysctl -q net.ipv4.tcp_congestion_control=cubic`
      
      // Establish a connection
         +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
      // RTT 10ms, RTO 210ms
        +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <...>
       +.01 < . 1:1(0) ack 1 win 257
         +0 accept(3, ..., ...) = 4
      
      // Send 2 data segments
         +0 write(4, ..., 2000) = 2000
         +0 > P. 1:2001(2000) ack 1
      
      // TLP
      +.022 > P. 1001:2001(1000) ack 1
      
      // Continue to send 8 data segments
         +0 write(4, ..., 10000) = 10000
         +0 > P. 2001:10001(8000) ack 1
      
      // RTO
      +.188 > . 1:1001(1000) ack 1
      
      // The original data is acked and new data is sent (F-RTO step 2.b)
         +0 < . 1:1(0) ack 2001 win 257
         +0 > P. 10001:12001(2000) ack 1
      
      // D-SACK caused by TLP is regarded as a dupack, which results in
      // the incorrect judgment of "loss was real" (F-RTO step 3.a)
      +.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop>
      
      // Never-retransmitted data (3001:4001) is acked and
      // we expect to switch to open state (F-RTO step 3.b)
         +0 < . 1:1(0) ack 4001 win 257
         +0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }%
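
      The script hinges on how the ACK at +.022 is classified: it repeats the
      cumulative ACK point and carries only a D-SACK block. A minimal sketch
      of the intended classification follows, assuming a simplified two-flag
      model. The flag names and demo_counts_as_dupack() are hypothetical and
      are not the kernel's implementation.

```c
#include <stdbool.h>

/* Hypothetical per-ACK flags for illustration; not the kernel's FLAG_* bits. */
#define DEMO_ACK_ADVANCED (1u << 0) /* cumulative ACK point moved forward */
#define DEMO_ACK_DSACK    (1u << 1) /* ACK carries a D-SACK block */

/*
 * Decide whether an incoming ACK should count as a duplicate ACK for
 * F-RTO purposes. An ACK that only reports a D-SACK says that a segment
 * arrived twice, not that one is missing, so it is not counted here.
 */
static bool demo_counts_as_dupack(unsigned int flags)
{
	if (flags & DEMO_ACK_ADVANCED)
		return false;
	if (flags & DEMO_ACK_DSACK)
		return false;
	return true;
}
```

      Under this model the D-SACK triggered by the TLP retransmission no longer
      pushes F-RTO into "loss was real", while a plain duplicate ACK still does.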
      
      Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
      Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d9157f68
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · c26d0d98
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Fix incorrect TCP connection tracking window reset for non-syn
         packets, from Florian Westphal.
      
      2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk.
      
      3) Fix nft_socket from the output path, from Florian Westphal.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nft_socket: only do sk lookups when indev is available
        netfilter: conntrack: fix udp offload timeout sysctl
        netfilter: nf_conntrack_tcp: re-init for syn packets only
      ====================
      
      Link: https://lore.kernel.org/r/20220428142109.38726-1-pablo@netfilter.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c26d0d98
    • Linus Torvalds's avatar
      Merge tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 4a2316a1
      Linus Torvalds authored
      Pull gfs2 fix from Andreas Gruenbacher:
      
       - No short reads or writes upon glock contention
      
      * tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: No short reads or writes upon glock contention
      4a2316a1
    • Dany Madden's avatar
      Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits" · aeaf59b7
      Dany Madden authored
      This reverts commit 723ad916
      
      When the client requests a channel or ring size larger than what the
      server can support, the server caps the request to the supported
      maximum. So the client cannot successfully request resources that
      exceed the server limit.
      
      Fixes: 723ad916 ("ibmvnic: Add ethtool private flag for driver-defined queue limits")
      Signed-off-by: Dany Madden <drt@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      aeaf59b7
    • Vladimir Oltean's avatar
      net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK · 66a2f5ef
      Vladimir Oltean authored
      The Time-Specified Departure feature is indeed mutually exclusive with
      TX IP checksumming in ENETC, but TX checksumming in itself is broken and
      was removed from this driver in commit 82728b91 ("enetc: Remove Tx
      checksumming offload code").
      
      The blamed commit declared NETIF_F_HW_CSUM in dev->features to comply
      with software TSO's expectations, and still did the checksumming in
      software by calling skb_checksum_help(). So there isn't any restriction
      for the Time-Specified Departure feature.
      
      However, enetc_setup_tc_txtime() doesn't understand that, and blindly
      looks for NETIF_F_CSUM_MASK.
      
      Instead of checking for things which can literally never happen in the
      current code base, just remove the check and let the driver offload
      tc-etf qdiscs.
      
      Fixes: acede3c5 ("net: enetc: declare NETIF_F_HW_CSUM and do it in software")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20220427203017.1291634-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      66a2f5ef
    • Leon Romanovsky's avatar
      ixgbe: ensure IPsec VF<->PF compatibility · f049efc7
      Leon Romanovsky authored
      The VF driver can forward any IPsec flags, which makes the function
      non-extendable and prone to backward/forward incompatibility.
      
      If new software runs on VF, it won't know that PF configured something
      completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag.
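
      One way to read the compatibility concern is that the receiving side
      should reject any request carrying flags it does not understand instead
      of acting on a partial interpretation. A minimal userspace sketch of
      that idea follows. The flag values and pf_validate_sa_flags() are
      hypothetical and are not taken from the ixgbe driver.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define DEMO_OFFLOAD_INBOUND (1u << 0) /* the one flag the PF understands */
#define DEMO_OFFLOAD_FUTURE  (1u << 1) /* a flag added by newer VF software */

/* Every flag this hypothetical PF implementation knows how to handle. */
#define DEMO_PF_SUPPORTED_FLAGS DEMO_OFFLOAD_INBOUND

/*
 * Return 0 when every bit in `flags` is understood by the PF,
 * or -EOPNOTSUPP when the VF sent at least one unknown bit.
 */
static int pf_validate_sa_flags(uint32_t flags)
{
	if (flags & ~DEMO_PF_SUPPORTED_FLAGS)
		return -EOPNOTSUPP;
	return 0;
}
```

      Rejecting unknown bits up front means a newer VF gets an explicit error
      rather than a silently different configuration.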
      
      Fixes: eda0333a ("ixgbe: add VF IPsec management")
      Reviewed-by: Raed Salem <raeds@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Shannon Nelson <snelson@pensando.io>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20220427173152.443102-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f049efc7
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 8061e16e
      Linus Torvalds authored
      Pull xfs fixes from Dave Chinner:
      
       - define buffer bit flags as unsigned to fix gcc-5 + c11 warnings
      
       - remove redundant XFS fields from MAINTAINERS
      
       - fix inode buffer locking order regression
      
      * tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: reorder iunlink remove operation in xfs_ifree
        MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM
        xfs: convert buffer flags to unsigned.
      8061e16e
    • Florian Fainelli's avatar
      MAINTAINERS: Update BNXT entry with firmware files · 126858db
      Florian Fainelli authored
      There appears to be a maintainer gap for BNXT TEE firmware files which
      causes some patches to be missed. Update the entry for the BNXT Ethernet
      controller with its companion firmware files.

      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20220427163606.126154-1-f.fainelli@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      126858db
    • Rafael J. Wysocki's avatar
      Merge branch 'thermal-int340x' · a12475f9
      Rafael J. Wysocki authored
      Merge a fix for the attr.show callback prototype in the int340x thermal
      driver (Kees Cook).
      
      * thermal-int340x:
        thermal: int340x: Fix attr.show callback prototype
      a12475f9
    • Florian Westphal's avatar
      netfilter: nft_socket: only do sk lookups when indev is available · 743b83f1
      Florian Westphal authored
      Check if the incoming interface is available and NFT_BREAK
      in case neither skb->sk nor input device are set.
      
      Because nf_sk_lookup_slow*() assume packet headers are in the
      'in' direction, use in postrouting is not going to yield a meaningful
      result.  Same is true for the forward chain, so restrict the use
      to prerouting, input and output.
      
      Use in output works if a socket is already attached to the skb.
      
      Fixes: 554ced0a ("netfilter: nf_tables: add support for native socket matching")
      Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      743b83f1
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-cpuidle' · edbd9772
      Rafael J. Wysocki authored
      Merge cpuidle fixes for 5.18-rc5:
      
       - Make intel_idle enable C1E promotion on all CPUs when C1E is
         preferred to C1 (Artem Bityutskiy).
      
       - Make C6 optimization on Sapphire Rapids added recently work as
         expected if both C1E and C1 are "preferred" (Artem Bityutskiy).
      
      * pm-cpuidle:
        intel_idle: Fix SPR C6 optimization
        intel_idle: Fix the 'preferred_cstates' module parameter
      edbd9772
    • Andreas Gruenbacher's avatar
      gfs2: No short reads or writes upon glock contention · 296abc0d
      Andreas Gruenbacher authored
      Commit 00bfe02f ("gfs2: Fix mmap + page fault deadlocks for buffered
      I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
      allow dropping the inode glock while faulting in user buffers.  When the
      lock was dropped, a short result was returned to indicate that the
      operation was interrupted.
      
      As pointed out by Linus (see the link below), this behavior is broken
      and the operations should always re-acquire the inode glock and resume
      the operation instead.
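
      The difference between returning a short result and resuming can be
      shown with a toy loop. This is a minimal sketch, assuming that each
      step transfers at most `chunk` bytes before the lock is lost. All names
      are hypothetical and nothing here is gfs2 code.

```c
#include <stddef.h>

/* Toy I/O step: transfer up to `chunk` bytes, then "lose" the lock. */
static size_t demo_io_step(size_t remaining, size_t chunk)
{
	return remaining < chunk ? remaining : chunk;
}

/*
 * Keep going until the whole request is satisfied. Each time a step ends
 * early, count one simulated lock re-acquisition and resume instead of
 * handing a short result back to the caller.
 */
static size_t demo_read_all(size_t len, size_t chunk, unsigned int *reacquires)
{
	size_t done = 0;

	*reacquires = 0;
	while (done < len) {
		size_t n = demo_io_step(len - done, chunk);

		if (n == 0)
			break;		/* no progress possible; stop */
		done += n;
		if (done < len)
			(*reacquires)++;
	}
	return done;
}
```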
      
      Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
      Fixes: 00bfe02f ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      296abc0d
    • Paolo Abeni's avatar
      Merge tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · febb2d2f
      Paolo Abeni authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix regression causing some HCI events to be discarded when they
         shouldn't.
      
      * tag 'for-net-2022-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
        Bluetooth: hci_event: Fix creating hci_conn object on error status
        Bluetooth: hci_event: Fix checking for invalid handle on error status
      ====================
      
      Link: https://lore.kernel.org/r/20220427234031.1257281-1-luiz.dentz@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      febb2d2f
    • Yang Yingliang's avatar
      net: fec: add missing of_node_put() in fec_enet_init_stop_mode() · d2b52ec0
      Yang Yingliang authored
      Put device node in error path in fec_enet_init_stop_mode().
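
      The general pattern is that a reference taken at the top of a function
      has to be dropped on every exit path, including the failure path. A
      minimal userspace sketch with a toy refcount follows. struct demo_node
      and the demo_* helpers are hypothetical stand-ins, not the kernel's
      device-tree API.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a refcounted device node. */
struct demo_node {
	int refcount;
};

static struct demo_node *demo_node_get(struct demo_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void demo_node_put(struct demo_node *n)
{
	if (n)
		n->refcount--;
}

/*
 * Take a reference, attempt a setup step, and drop the reference on
 * every exit path. `setup_fails` simulates an error during setup.
 */
static int demo_init_stop_mode(struct demo_node *parent, int setup_fails)
{
	struct demo_node *n = demo_node_get(parent);
	int ret = 0;

	if (!n)
		return 0;	/* nothing to configure */

	if (setup_fails) {
		ret = -EINVAL;
		goto out;	/* the error path must not skip the put */
	}

	/* Successful setup work would go here. */
out:
	demo_node_put(n);
	return ret;
}
```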
      
      Fixes: 8a448bf8 ("net: ethernet: fec: move GPR register offset and bit into DT")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Link: https://lore.kernel.org/r/20220426125231.375688-1-yangyingliang@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d2b52ec0
    • Manish Chopra's avatar
      bnx2x: fix napi API usage sequence · af68656d
      Manish Chopra authored
      While handling PCI errors (AER flow), the driver tries to
      disable NAPI [napi_disable()] after NAPI has been deleted
      [__netif_napi_del()], which causes an unexpected system
      hang/crash.
      
      System message log shows the following:
      =======================================
      [ 3222.537510] EEH: Detected PCI bus error on PHB#384-PE#800000
      [ 3222.537511] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
      [ 3222.537512] EEH: Notify device drivers to shutdown
      [ 3222.537513] EEH: Beginning: 'error_detected(IO frozen)'
      [ 3222.537514] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->error_detected(IO frozen)
      [ 3222.537516] bnx2x: [bnx2x_io_error_detected:14236(eth14)]IO error detected
      [ 3222.537650] EEH: PE#800000 (PCI 0384:80:00.0): bnx2x driver reports: 'need reset'
      [ 3222.537651] EEH: PE#800000 (PCI 0384:80:00.1): Invoking bnx2x->error_detected(IO frozen)
      [ 3222.537651] bnx2x: [bnx2x_io_error_detected:14236(eth13)]IO error detected
      [ 3222.537729] EEH: PE#800000 (PCI 0384:80:00.1): bnx2x driver reports: 'need reset'
      [ 3222.537729] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
      [ 3222.537890] EEH: Collect temporary log
      [ 3222.583481] EEH: of node=0384:80:00.0
      [ 3222.583519] EEH: PCI device/vendor: 168e14e4
      [ 3222.583557] EEH: PCI cmd/status register: 00100140
      [ 3222.583557] EEH: PCI-E capabilities and status follow:
      [ 3222.583744] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82
      [ 3222.583892] EEH: PCI-E 10: 10820000 00000000 00000000 00000000
      [ 3222.583893] EEH: PCI-E 20: 00000000
      [ 3222.583893] EEH: PCI-E AER capability register set follows:
      [ 3222.584079] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030
      [ 3222.584230] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000
      [ 3222.584378] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
      [ 3222.584416] EEH: PCI-E AER 30: 00000000 00000000
      [ 3222.584416] EEH: of node=0384:80:00.1
      [ 3222.584454] EEH: PCI device/vendor: 168e14e4
      [ 3222.584491] EEH: PCI cmd/status register: 00100140
      [ 3222.584492] EEH: PCI-E capabilities and status follow:
      [ 3222.584677] EEH: PCI-E 00: 00020010 012c8da2 00095d5e 00455c82
      [ 3222.584825] EEH: PCI-E 10: 10820000 00000000 00000000 00000000
      [ 3222.584826] EEH: PCI-E 20: 00000000
      [ 3222.584826] EEH: PCI-E AER capability register set follows:
      [ 3222.585011] EEH: PCI-E AER 00: 13c10001 00000000 00000000 00062030
      [ 3222.585160] EEH: PCI-E AER 10: 00002000 000031c0 000001e0 00000000
      [ 3222.585309] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
      [ 3222.585347] EEH: PCI-E AER 30: 00000000 00000000
      [ 3222.586872] RTAS: event: 5, Type: Platform Error (224), Severity: 2
      [ 3222.586873] EEH: Reset without hotplug activity
      [ 3224.762767] EEH: Beginning: 'slot_reset'
      [ 3224.762770] EEH: PE#800000 (PCI 0384:80:00.0): Invoking bnx2x->slot_reset()
      [ 3224.762771] bnx2x: [bnx2x_io_slot_reset:14271(eth14)]IO slot reset initializing...
      [ 3224.762887] bnx2x 0384:80:00.0: enabling device (0140 -> 0142)
      [ 3224.768157] bnx2x: [bnx2x_io_slot_reset:14287(eth14)]IO slot reset --> driver unload
      
      Uninterruptible tasks
      =====================
      crash> ps | grep UN
           213      2  11  c000000004c89e00  UN   0.0       0      0  [eehd]
           215      2   0  c000000004c80000  UN   0.0       0      0  [kworker/0:2]
          2196      1  28  c000000004504f00  UN   0.1   15936  11136  wickedd
          4287      1   9  c00000020d076800  UN   0.0    4032   3008  agetty
          4289      1  20  c00000020d056680  UN   0.0    7232   3840  agetty
         32423      2  26  c00000020038c580  UN   0.0       0      0  [kworker/26:3]
         32871   4241  27  c0000002609ddd00  UN   0.1   18624  11648  sshd
         32920  10130  16  c00000027284a100  UN   0.1   48512  12608  sendmail
         33092  32987   0  c000000205218b00  UN   0.1   48512  12608  sendmail
         33154   4567  16  c000000260e51780  UN   0.1   48832  12864  pickup
         33209   4241  36  c000000270cb6500  UN   0.1   18624  11712  sshd
         33473  33283   0  c000000205211480  UN   0.1   48512  12672  sendmail
         33531   4241  37  c00000023c902780  UN   0.1   18624  11648  sshd
      
      EEH handler hung while bnx2x sleeping and holding RTNL lock
      ===========================================================
      crash> bt 213
      PID: 213    TASK: c000000004c89e00  CPU: 11  COMMAND: "eehd"
        #0 [c000000004d477e0] __schedule at c000000000c70808
        #1 [c000000004d478b0] schedule at c000000000c70ee0
        #2 [c000000004d478e0] schedule_timeout at c000000000c76dec
        #3 [c000000004d479c0] msleep at c0000000002120cc
        #4 [c000000004d479f0] napi_disable at c000000000a06448
                                              ^^^^^^^^^^^^^^^^
        #5 [c000000004d47a30] bnx2x_netif_stop at c0080000018dba94 [bnx2x]
        #6 [c000000004d47a60] bnx2x_io_slot_reset at c0080000018a551c [bnx2x]
        #7 [c000000004d47b20] eeh_report_reset at c00000000004c9bc
        #8 [c000000004d47b90] eeh_pe_report at c00000000004d1a8
        #9 [c000000004d47c40] eeh_handle_normal_event at c00000000004da64
      
      And the sleeping source code
      ============================
      crash> dis -ls c000000000a06448
      FILE: ../net/core/dev.c
      LINE: 6702
      
         6697  {
         6698          might_sleep();
         6699          set_bit(NAPI_STATE_DISABLE, &n->state);
         6700
         6701          while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
      * 6702                  msleep(1);
         6703          while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
         6704                  msleep(1);
         6705
         6706          hrtimer_cancel(&n->timer);
         6707
         6708          clear_bit(NAPI_STATE_DISABLE, &n->state);
         6709  }
      
      EEH calls into bnx2x twice based on the system log above, first through
      bnx2x_io_error_detected() and then bnx2x_io_slot_reset(), and executes
      the following call chains:
      
      bnx2x_io_error_detected()
        +-> bnx2x_eeh_nic_unload()
             +-> bnx2x_del_all_napi()
                  +-> __netif_napi_del()
      
      bnx2x_io_slot_reset()
        +-> bnx2x_netif_stop()
             +-> bnx2x_napi_disable()
                  +-> napi_disable()
      
      Fix this by correcting the order of the NAPI API calls,
      that is, delete the NAPI instance only after disabling it.
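
      The ordering constraint can be modelled with a toy state machine. This
      is a minimal sketch, assuming that disabling an already-deleted instance
      is simply reported as an error (the report above shows the real call
      sleeping in that situation). struct demo_napi and the demo_* functions
      are hypothetical and are not the kernel's NAPI API.

```c
#include <stdbool.h>

/* Toy model of a NAPI instance. */
struct demo_napi {
	bool registered;	/* still known to the (toy) core */
	bool enabled;
};

/* Returns 0 on success, -1 when called on an already-deleted instance. */
static int demo_napi_disable(struct demo_napi *n)
{
	if (!n->registered)
		return -1;	/* wrong order: delete happened first */
	n->enabled = false;
	return 0;
}

static void demo_napi_del(struct demo_napi *n)
{
	n->registered = false;
}

/* Teardown in the order the commit asks for: disable, then delete. */
static int demo_teardown(struct demo_napi *n)
{
	int ret = demo_napi_disable(n);

	demo_napi_del(n);
	return ret;
}
```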
      
      Fixes: 7fa6f340 ("bnx2x: AER revised")
      Reported-by: David Christensen <drc@linux.vnet.ibm.com>
      Tested-by: David Christensen <drc@linux.vnet.ibm.com>
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Link: https://lore.kernel.org/r/20220426153913.6966-1-manishc@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      af68656d