1. 22 Jun, 2021 12 commits
    • net: broadcom: bcm4908_enet: reset DMA rings sw indexes properly · ddeacc4f
      Rafał Miłecki authored
      Resetting the software indexes in bcm4908_dma_alloc_buf_descs() is not
      enough, as that function is called only during device probe. The
      driver resets DMA on every .ndo_open callback, so the indexes must be
      reset then as well.
      
      This fixes an inconsistent ring state and stalled traffic after an
      interface down & up sequence.
      
      Fixes: 4feffead ("net: broadcom: bcm4908enet: add BCM4908 controller driver")
      Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
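
      The failure mode can be pictured with a small user-space model of a ring whose hardware pointer is reset on every open. This is a sketch under stated assumptions: `struct ring`, `ring_alloc()`, `ring_open()` and `ring_run()` are hypothetical names, not the driver's code; the model only shows why indexes reset once at allocation time go stale after a down & up sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_LEN 8

/* Hypothetical DMA ring model: traffic flows only while the (simulated)
 * hardware position and the driver's software indexes agree. */
struct ring {
	size_t hw_pos;    /* next descriptor the DMA engine will use */
	size_t read_idx;  /* software consumer index */
	size_t write_idx; /* software producer index */
};

/* Probe-time setup: runs once per device. */
static void ring_alloc(struct ring *r)
{
	r->hw_pos = 0;
	r->read_idx = 0;
	r->write_idx = 0;
}

/* Open path: the DMA engine restarts at the beginning of the ring.
 * reset_sw_indexes == false reproduces the bug, true mirrors the fix. */
static void ring_open(struct ring *r, bool reset_sw_indexes)
{
	r->hw_pos = 0;
	if (reset_sw_indexes) {
		r->read_idx = 0;
		r->write_idx = 0;
	}
}

/* Process n descriptors with hardware and software advancing together. */
static void ring_run(struct ring *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		r->hw_pos = (r->hw_pos + 1) % RING_LEN;
		r->read_idx = (r->read_idx + 1) % RING_LEN;
		r->write_idx = (r->write_idx + 1) % RING_LEN;
	}
}

static bool ring_consistent(const struct ring *r)
{
	return r->hw_pos == r->read_idx && r->hw_pos == r->write_idx;
}
```

      Calling `ring_run()` and then `ring_open(r, false)` leaves the model inconsistent, which corresponds to the stalled traffic described above; passing `true` mirrors the fix.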
    • net/ipv4: swap flow ports when validating source · c69f114d
      Miao Wang authored
      When doing source address validation, the flowi4 struct used for
      fib_lookup should be in the reverse direction to the given skb.
      fl4_dport and fl4_sport returned by fib4_rules_early_flow_dissect
      should thus be swapped.
      
      Fixes: 5a847a6e ("net/ipv4: Initialize proto and ports in flow struct")
      Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
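
      A minimal sketch of what "reverse direction" means for the lookup key, assuming a hypothetical simplified `struct flow_key` rather than the kernel's `struct flowi4`: every directional field is mirrored, ports included.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified flow key; not the kernel's struct flowi4. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Build the lookup key for validating the source of a received packet.
 * The lookup describes the reply direction, so addresses and ports are
 * both swapped; the commit adds the port swap. */
static struct flow_key reverse_key(const struct flow_key *rx)
{
	struct flow_key k = {
		.saddr = rx->daddr,
		.daddr = rx->saddr,
		.sport = rx->dport,
		.dport = rx->sport,
		.proto = rx->proto,
	};

	return k;
}
```

      For a packet sent from port 40000 to port 53, the reverse key has source port 53 and destination port 40000; without the swap, fib rules that match on ports would be evaluated against the ports backwards.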
    • bonding: avoid adding slave device with IFF_MASTER flag · 3c9ef511
      Di Zhu authored
      The following steps will definitely cause the kernel to crash:
      	ip link add vrf1 type vrf table 1
      	modprobe bonding.ko max_bonds=1
      	echo "+vrf1" >/sys/class/net/bond0/bonding/slaves
      	rmmod bonding
      
      The root cause is as follows: enslaving the VRF device fails, and some
      cleanup work is done. Because the VRF device has the IFF_MASTER flag,
      the cleanup does not clear the IFF_BONDING flag. Then, when the bonding
      module is unloaded, unregister_netdevice_notifier() treats the VRF
      device as a bond master and interprets its netdev_priv() as struct
      bonding{}, although it is actually struct net_vrf{}.
      
      Analysis of bond_enslave() suggests that enslaving a device with the
      IFF_MASTER flag is not allowed, so add an explicit check for this
      case.
      Signed-off-by: Di Zhu <zhudi21@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: fix GRE6 segmentation · a6e3f298
      Jakub Kicinski authored
      Commit 6c11fbf9 ("ip6_tunnel: add MPLS transmit support")
      moved the assignment of inner_ipproto down from ipxip6_tnl_xmit()
      to its callee ip6_tnl_xmit(). The latter is also used by GRE.
      
      Since commit 38720352 ("gre: Use inner_proto to obtain inner
      header protocol") GRE had been depending on skb->inner_protocol
      during segmentation. It sets it in gre_build_header() and reads
      it in gre_gso_segment(). Changes to ip6_tnl_xmit() overwrite
      the protocol, resulting in GSO skbs getting dropped.
      
      Note that inner_protocol is a union with inner_ipproto: GRE uses
      the former, while the change switched it to the latter (always
      setting it to just IPPROTO_GRE).
      
      Restore the original location of skb_set_inner_ipproto(); it is
      unclear why it was moved in the first place.
      
      Fixes: 6c11fbf9 ("ip6_tunnel: add MPLS transmit support")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Tested-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
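
      The union clash can be reproduced in user space with a stand-in structure. `struct fake_skb` and both helper functions are hypothetical; they only model the two overlapping fields named above, not `struct sk_buff` itself.

```c
#include <assert.h>
#include <netinet/in.h> /* IPPROTO_GRE, htons() */
#include <stdint.h>

#define EXAMPLE_ETHERTYPE_IPV6 0x86DD

/* Stand-in for the two overlapping sk_buff fields: GRE stores an
 * EtherType, the tunnel transmit path stores an IP protocol number. */
struct fake_skb {
	union {
		uint16_t inner_protocol; /* EtherType, network byte order */
		uint8_t inner_ipproto;   /* IP protocol number */
	};
};

/* GRE's side: record the encapsulated EtherType. */
static void gre_set_inner(struct fake_skb *skb)
{
	skb->inner_protocol = htons(EXAMPLE_ETHERTYPE_IPV6);
}

/* Tunnel side after the regression: writes into the same storage. */
static void tnl_set_inner_ipproto(struct fake_skb *skb)
{
	skb->inner_ipproto = IPPROTO_GRE;
}
```

      After `gre_set_inner()` stores an EtherType, `tnl_set_inner_ipproto()` writes `IPPROTO_GRE` into the first byte of the same storage, so the value GRE later reads is no longer the EtherType it stored.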
    • Merge branch 'mptcp-fixes' · e596212e
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for v5.13
      
      Here are two MPTCP fixes from Paolo.
      
      Patch 1 fixes some possible connect-time race conditions with
      MPTCP-level connection state changes.
      
      Patch 2 deletes a duplicate function declaration.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: drop duplicate mptcp_setsockopt() declaration · 597dbae7
      Paolo Abeni authored
      Commit 78962489 ("mptcp: add skeleton to sync msk socket
      options to subflows") introduced a duplicate declaration of
      mptcp_setsockopt(); just drop it.
      Reported-by: Florian Westphal <fw@strlen.de>
      Fixes: 78962489 ("mptcp: add skeleton to sync msk socket options to subflows")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: avoid race on msk state changes · 490274b4
      Paolo Abeni authored
      The msk socket state is currently updated in a few spots without
      owning the msk socket lock itself.
      
      Some of these operations are safe, as they happen before the msk
      socket is exposed to user space and can't race with other changes.
      
      A couple of them, at connect time, can actually race with close()
      or shutdown(), breaking the socket state machine.
      
      This change addresses the issue by moving such updates under the msk
      socket lock, using the usual scheme:
      
      <acquire spinlock>
      <check sk lock owner>
      <eventually defer to release_cb>
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/56
      Fixes: 8fd73804 ("mptcp: fallback in case of simultaneous connect")
      Fixes: c3c123d1 ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
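
      The lock-check-defer scheme can be modeled in user space as follows. Everything here is a hypothetical simplification: `struct model_sock` stands in for the msk socket, a C11 `atomic_flag` plays the role of the socket spinlock, and `model_lock_sock()`/`model_release_sock()` only mark ownership (unlike the kernel's lock_sock(), the model does not wait for a previous owner).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the msk socket and its lock. */
struct model_sock {
	atomic_flag slock; /* plays the role of the socket spinlock */
	bool owned;        /* a process context currently owns the socket */
	bool deferred;     /* a state change is waiting for the owner */
	int pending_state; /* the state to apply on release */
	int state;
};

static void model_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l)) {
		/* spin */
	}
}

static void model_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

static void model_sock_init(struct model_sock *sk, int state)
{
	atomic_flag_clear(&sk->slock);
	sk->owned = false;
	sk->deferred = false;
	sk->pending_state = state;
	sk->state = state;
}

/* <acquire spinlock> <check owner> <apply now or defer to release> */
static void model_set_state(struct model_sock *sk, int new_state)
{
	model_spin_lock(&sk->slock);
	if (!sk->owned) {
		sk->state = new_state;
	} else {
		sk->pending_state = new_state;
		sk->deferred = true;
	}
	model_spin_unlock(&sk->slock);
}

/* Simplified: does not wait if another context already owns the socket. */
static void model_lock_sock(struct model_sock *sk)
{
	model_spin_lock(&sk->slock);
	sk->owned = true;
	model_spin_unlock(&sk->slock);
}

/* Run the deferred work (the release_cb step), then drop ownership. */
static void model_release_sock(struct model_sock *sk)
{
	model_spin_lock(&sk->slock);
	if (sk->deferred) {
		sk->state = sk->pending_state;
		sk->deferred = false;
	}
	sk->owned = false;
	model_spin_unlock(&sk->slock);
}
```

      While a process context owns the socket, a state change requested from elsewhere is only recorded; the owner applies it on release, so it cannot interleave with a concurrent close() or shutdown().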
    • sfc: avoid duplicated code in ef10_sriov · 3ddd6e2f
      Íñigo Huguet authored
      The fail path of efx_ef10_sriov_alloc_vf_vswitching is identical to the
      full content of efx_ef10_sriov_free_vf_vswitching, so replace it with a
      single call to efx_ef10_sriov_free_vf_vswitching.
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: explain that "attached" VFs only refer to Xen · 9a022e76
      Íñigo Huguet authored
      When disabling SRIOV, the driver checks whether any VF is currently
      attached to a guest, using the pci_vfs_assigned function. However, this
      check only works for VFs attached with Xen, not with vfio/KVM. Add
      comments clarifying this point.
      
      Also, replace the manual check of the PCI_DEV_FLAGS_ASSIGNED flag with
      the helper function pci_is_dev_assigned.
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: error code if SRIOV cannot be disabled · 1ebe4feb
      Íñigo Huguet authored
      If SRIOV cannot be disabled during device removal or module unloading,
      return an error code so that it can be logged properly by the calling
      function.
      
      Note that this can only happen if a VF is currently attached to a
      guest using Xen, not with vfio/KVM. Although in that case the VFs won't
      work properly once the PF is removed and/or the module is unloaded, I
      have left it as is, because I don't know what side effects changing it
      might have, and it also seems to be what other drivers do in this
      situation.
      
      When called during SRIOV reconfiguration, the behavior hasn't changed,
      because the function is called with force=false.
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: avoid double pci_remove of VFs · 45423cff
      Íñigo Huguet authored
      If pci_remove was called for a PF with VFs, the VFs were removed twice
      from efx_ef10_sriov_fini: once directly through pci_driver->remove, and
      once implicitly by calling pci_disable_sriov, which also removes the
      VFs. This crashed the kernel on the second attempt.
      
      Given that pci_disable_sriov already calls the PCI remove function, get
      rid of the direct call to pci_driver->remove from the driver.
      
      Two different ways to trigger the bug:
      - Create one or more VFs, then attach the PF to a virtual machine (at
        least with qemu/KVM)
      - Create one or more VFs, then remove the PF with:
        echo 1 > /sys/bus/pci/devices/PF_PCI_ID/remove
      
      Removing the sfc module does not trigger the error, at least for me,
      because it removes the VFs first and then the PF.
      
      Example of a log with the error:
          list_del corruption, ffff967fd20a8ad0->next is LIST_POISON1 (dead000000000100)
          ------------[ cut here ]------------
          kernel BUG at lib/list_debug.c:47!
          [...trimmed...]
          RIP: 0010:__list_del_entry_valid.cold.1+0x12/0x4c
          [...trimmed...]
          Call Trace:
          efx_dissociate+0x1f/0x140 [sfc]
          efx_pci_remove+0x27/0x150 [sfc]
          pci_device_remove+0x3b/0xc0
          device_release_driver_internal+0x103/0x1f0
          pci_stop_bus_device+0x69/0x90
          pci_stop_and_remove_bus_device+0xe/0x20
          pci_iov_remove_virtfn+0xba/0x120
          sriov_disable+0x2f/0xe0
          efx_ef10_pci_sriov_disable+0x52/0x80 [sfc]
          ? pcie_aer_is_native+0x12/0x40
          efx_ef10_sriov_fini+0x72/0x110 [sfc]
          efx_pci_remove+0x62/0x150 [sfc]
          pci_device_remove+0x3b/0xc0
          device_release_driver_internal+0x103/0x1f0
          unbind_store+0xf6/0x130
          kernfs_fop_write+0x116/0x190
          vfs_write+0xa5/0x1a0
          ksys_write+0x4f/0xb0
          do_syscall_64+0x5b/0x1a0
          entry_SYSCALL_64_after_hwframe+0x65/0xca
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
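
      The double removal can be reduced to counting remove callbacks per VF. `struct vf`, `model_disable_sriov()` and `model_sriov_fini()` are hypothetical stand-ins for the PCI core and the driver's teardown; in the real kernel the second call does not merely increment a counter but corrupts a list, as the log shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VFS 2

/* Hypothetical VF model: counts how often its remove callback ran. */
struct vf {
	int remove_calls;
};

static void vf_remove(struct vf *vf)
{
	vf->remove_calls++;
}

/* Stand-in for pci_disable_sriov(): the PCI core removes every VF itself. */
static void model_disable_sriov(struct vf *vfs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		vf_remove(&vfs[i]);
}

/* Stand-in for the driver's SR-IOV teardown. old_behavior == true keeps the
 * explicit per-VF remove that the commit drops. */
static void model_sriov_fini(struct vf *vfs, size_t n, bool old_behavior)
{
	if (old_behavior)
		for (size_t i = 0; i < n; i++)
			vf_remove(&vfs[i]);
	model_disable_sriov(vfs, n);
}
```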
    • vxlan: add missing rcu_read_lock() in neigh_reduce() · 85e8b032
      Eric Dumazet authored
      syzbot complained in neigh_reduce(), because rcu_read_lock_bh()
      is treated differently from rcu_read_lock():
      
      WARNING: suspicious RCU usage
      5.13.0-rc6-syzkaller #0 Not tainted
      -----------------------------
      include/net/addrconf.h:313 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      3 locks held by kworker/0:0/5:
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
       #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2247
       #1: ffffc90000ca7da8 ((work_completion)(&port->wq)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2251
       #2: ffffffff8bf795c0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1da/0x3130 net/core/dev.c:4180
      
      stack backtrace:
      CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.13.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events ipvlan_process_multicast
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       __in6_dev_get include/net/addrconf.h:313 [inline]
       __in6_dev_get include/net/addrconf.h:311 [inline]
       neigh_reduce drivers/net/vxlan.c:2167 [inline]
       vxlan_xmit+0x34d5/0x4c30 drivers/net/vxlan.c:2919
       __netdev_start_xmit include/linux/netdevice.h:4944 [inline]
       netdev_start_xmit include/linux/netdevice.h:4958 [inline]
       xmit_one net/core/dev.c:3654 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3670
       __dev_queue_xmit+0x2133/0x3130 net/core/dev.c:4246
       ipvlan_process_multicast+0xa99/0xd70 drivers/net/ipvlan/ipvlan_core.c:287
       process_one_work+0x98d/0x1600 kernel/workqueue.c:2276
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2422
       kthread+0x3b1/0x4a0 kernel/kthread.c:313
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      Fixes: f564f45c ("vxlan: add ipv6 proxy support")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Jun, 2021 17 commits
    • pkt_sched: sch_qfq: fix qfq_change_class() error path · 0cd58e5c
      Eric Dumazet authored
      If qfq_change_class() is unable to allocate memory for qfq_aggregate,
      it frees the class that has been inserted in the class hash table,
      but does not unhash it.
      
      Defer the insertion until after the problematic allocation.
      
      BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:884 [inline]
      BUG: KASAN: use-after-free in qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731
      Write of size 8 at addr ffff88814a534f10 by task syz-executor.4/31478
      
      CPU: 0 PID: 31478 Comm: syz-executor.4 Not tainted 5.13.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
       __kasan_report mm/kasan/report.c:419 [inline]
       kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
       hlist_add_head include/linux/list.h:884 [inline]
       qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731
       qfq_change_class+0x96c/0x1990 net/sched/sch_qfq.c:489
       tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
       rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
       do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665d9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fdc7b5f0188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9
      RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
      RBP: 00007fdc7b5f01d0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
      R13: 00007ffcf7310b3f R14: 00007fdc7b5f0300 R15: 0000000000022000
      
      Allocated by task 31445:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:428 [inline]
       ____kasan_kmalloc mm/kasan/common.c:507 [inline]
       ____kasan_kmalloc mm/kasan/common.c:466 [inline]
       __kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:516
       kmalloc include/linux/slab.h:556 [inline]
       kzalloc include/linux/slab.h:686 [inline]
       qfq_change_class+0x705/0x1990 net/sched/sch_qfq.c:464
       tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
       rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
       do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 31445:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
       ____kasan_slab_free mm/kasan/common.c:360 [inline]
       ____kasan_slab_free mm/kasan/common.c:325 [inline]
       __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
       kasan_slab_free include/linux/kasan.h:212 [inline]
       slab_free_hook mm/slub.c:1583 [inline]
       slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1608
       slab_free mm/slub.c:3168 [inline]
       kfree+0xe5/0x7f0 mm/slub.c:4212
       qfq_change_class+0x10fb/0x1990 net/sched/sch_qfq.c:518
       tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
       rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
       do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88814a534f00
       which belongs to the cache kmalloc-128 of size 128
      The buggy address is located 16 bytes inside of
       128-byte region [ffff88814a534f00, ffff88814a534f80)
      The buggy address belongs to the page:
      page:ffffea0005294d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14a534
      flags: 0x57ff00000000200(slab|node=1|zone=2|lastcpupid=0x7ff)
      raw: 057ff00000000200 ffffea00004fee00 0000000600000006 ffff8880110418c0
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 29797, ts 604817765317, free_ts 604810151744
       prep_new_page mm/page_alloc.c:2358 [inline]
       get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200
       alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
       alloc_slab_page mm/slub.c:1646 [inline]
       allocate_slab+0x2c5/0x4c0 mm/slub.c:1786
       new_slab mm/slub.c:1849 [inline]
       new_slab_objects mm/slub.c:2595 [inline]
       ___slab_alloc+0x4a1/0x810 mm/slub.c:2758
       __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2798
       slab_alloc_node mm/slub.c:2880 [inline]
       slab_alloc mm/slub.c:2922 [inline]
       __kmalloc+0x315/0x330 mm/slub.c:4050
       kmalloc include/linux/slab.h:561 [inline]
       kzalloc include/linux/slab.h:686 [inline]
       __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318
       mpls_dev_sysctl_register+0x1b7/0x2d0 net/mpls/af_mpls.c:1421
       mpls_add_dev net/mpls/af_mpls.c:1472 [inline]
       mpls_dev_notify+0x214/0x8b0 net/mpls/af_mpls.c:1588
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2121
       call_netdevice_notifiers_extack net/core/dev.c:2133 [inline]
       call_netdevice_notifiers net/core/dev.c:2147 [inline]
       register_netdevice+0x106b/0x1500 net/core/dev.c:10312
       veth_newlink+0x585/0xac0 drivers/net/veth.c:1547
       __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3452
       rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1298 [inline]
       free_pcp_prepare+0x223/0x300 mm/page_alloc.c:1342
       free_unref_page_prepare mm/page_alloc.c:3250 [inline]
       free_unref_page+0x12/0x1d0 mm/page_alloc.c:3298
       __vunmap+0x783/0xb60 mm/vmalloc.c:2566
       free_work+0x58/0x70 mm/vmalloc.c:80
       process_one_work+0x98d/0x1600 kernel/workqueue.c:2276
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2422
       kthread+0x3b1/0x4a0 kernel/kthread.c:313
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      Memory state around the buggy address:
       ffff88814a534e00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88814a534e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88814a534f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
       ffff88814a534f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88814a535000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Fixes: 462dbc91 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
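
      The ordering rule behind the fix (publish an object only after every allocation that can fail) can be sketched with a hypothetical table of classes. `struct cls`, `struct table`, `change_class()` and the forced-failure flag are illustrative assumptions; the real qfq_change_class() has more logic that is not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical class table: a singly linked list standing in for the
 * qdisc class hash. */
struct cls {
	struct cls *next;
	void *agg; /* separately allocated per-class aggregate */
};

struct table {
	struct cls *head;
};

/* Allocator that can be forced to fail, to exercise the error path. */
static void *alloc_agg(bool fail)
{
	return fail ? NULL : malloc(16);
}

/* Do every allocation that can fail first; link the class into the table
 * only once nothing can go wrong any more. */
static int change_class(struct table *t, bool force_agg_failure)
{
	struct cls *c = calloc(1, sizeof(*c));

	if (!c)
		return -ENOMEM;

	c->agg = alloc_agg(force_agg_failure);
	if (!c->agg) {
		free(c); /* safe: c was never reachable from the table */
		return -ENOMEM;
	}

	c->next = t->head; /* publish last */
	t->head = c;
	return 0;
}

static void free_table(struct table *t)
{
	while (t->head) {
		struct cls *c = t->head;

		t->head = c->next;
		free(c->agg);
		free(c);
	}
}
```

      Had the class been linked into `t->head` before `alloc_agg()` failed, the `free(c)` on the error path would leave a dangling pointer in the table, which is the kind of use-after-free KASAN reports above.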
    • net: dsa: mv88e6xxx: Fix adding vlan 0 · b8b79c41
      Eldar Gasanov authored
      The 8021q module adds VLAN 0 to all interfaces when it starts.
      When the 8021q module is loaded, it isn't possible to create a bond
      with mv88e6xxx interfaces: the bonding module displays the error
      "Couldn't add bond vlan ids", because it tries to add VLAN 0
      to the slave interfaces.
      
      The switch behaves unexpectedly here: when a PVID is assigned to a
      port, the switch changes the VID to the PVID in ingress frames with
      VID 0 on that port. The expected behavior would be that the switch
      does not assign the PVID to tagged frames with VID 0, but there is no
      way to change this behavior in the switch.
      
      Fixes: 57e661aa ("net: dsa: mv88e6xxx: Link aggregation support")
      Signed-off-by: Eldar Gasanov <eldargasanov2@gmail.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: notify server to shutdown when client has pending signal · c7ff9cff
      Longpeng(Mike) authored
      The client's sk_state is set to TCP_ESTABLISHED when the server
      replies to the client's connect request.
      
      However, if the client has a pending signal, its sk_state is set
      to TCP_CLOSE without notifying the server, so the server keeps
      holding the corrupt connection.
      
                  client                        server
      
      1. sk_state=TCP_SYN_SENT         |
      2. call ->connect()              |
      3. wait reply                    |
                                       | 4. sk_state=TCP_ESTABLISHED
                                       | 5. insert to connected list
                                       | 6. reply to the client
      7. sk_state=TCP_ESTABLISHED      |
      8. insert to connected list      |
      9. *signal pending* <--------------------- the user kills the client
      10. sk_state=TCP_CLOSE           |
      client is exiting...             |
      11. call ->release()             |
           virtio_transport_close
            if (!(sk->sk_state == TCP_ESTABLISHED ||
      	      sk->sk_state == TCP_CLOSING))
      		return true; *return at here, the server cannot notice the connection is corrupt*
      
      So the client should notify the peer in this case.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: Norbert Slusarek <nslusarek@gmx.net>
      Cc: Andra Paraschiv <andraprs@amazon.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: David Brazdil <dbrazdil@google.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lkml.org/lkml/2021/5/17/418
      Signed-off-by: lixianming <lixianming5@huawei.com>
      Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
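
      The state transitions in the diagram can be modeled with a hypothetical client structure that counts notifications sent to the peer. The names are illustrative only, and the model says nothing about how the actual fix signals the peer; it only captures the rule that a connection the server already considers established must be torn down explicitly.

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { M_SYN_SENT, M_ESTABLISHED, M_CLOSE };

/* Hypothetical client socket that counts notifications sent to the peer. */
struct client {
	enum model_state state;
	int shutdowns_sent;
};

/* Steps 4-8: the server accepted and replied. */
static void on_server_reply(struct client *c)
{
	c->state = M_ESTABLISHED;
}

/* Steps 9-10: a signal interrupts connect(). Without the fix the socket is
 * closed silently; with it, a connection the server already considers
 * established is torn down explicitly first. */
static void on_signal_during_connect(struct client *c, bool with_fix)
{
	if (with_fix && c->state == M_ESTABLISHED)
		c->shutdowns_sent++;
	c->state = M_CLOSE;
}
```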
    • net: mana: Fix a memory leak in an error handling path in 'mana_create_txq()' · b9078845
      Christophe JAILLET authored
      If this check fails, we must free some resources, as all the other
      error handling paths of this function do.
      
      Fixes: ca9c54d2 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
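
      The pattern at stake is the usual single-exit cleanup idiom. The sketch below is hypothetical: `struct txq`, `create_txq()` and the `id >= max_id` condition stand in for the function and the check the commit refers to, which are not shown in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical queue with two separately allocated resources. */
struct txq {
	void *ring;
	void *cq;
};

static int create_txq(struct txq *q, int id, int max_id)
{
	int err;

	q->cq = NULL;
	q->ring = malloc(64);
	if (!q->ring)
		return -ENOMEM; /* nothing to undo yet */

	q->cq = malloc(64);
	if (!q->cq) {
		err = -ENOMEM;
		goto out;
	}

	/* Hypothetical sanity check: like every other failure after the first
	 * allocation, it must go through the cleanup label, not return. */
	if (id >= max_id) {
		err = -EINVAL;
		goto out;
	}
	return 0;

out:
	free(q->cq); /* free(NULL) is a no-op */
	free(q->ring);
	q->cq = NULL;
	q->ring = NULL;
	return err;
}

static void destroy_txq(struct txq *q)
{
	free(q->cq);
	free(q->ring);
	q->cq = NULL;
	q->ring = NULL;
}
```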
    • Merge branch 'nnicstar-fixes' · 4f35dabb
      David S. Miller authored
      Zheyu Ma says:
      
      ====================
      atm: nicstar: fix two bugs about error handling
      
      Zheyu Ma (2):
        atm: nicstar: use 'dma_free_coherent' instead of 'kfree'
        atm: nicstar: register the interrupt handler in the right place
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: nicstar: register the interrupt handler in the right place · 70b639dc
      Zheyu Ma authored
      Because the error handling is sequential, resources should be acquired
      in the order the error handling expects. The interrupt handler must
      therefore be registered earlier, so that the error path never frees an
      interrupt handler that has not been registered.
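
      The ordering rule can be modeled assuming, as the message implies, that the error handler (ns_init_card_error() in the trace) undoes resources according to how far initialization got. In this hypothetical sketch the handler assumes the order memory, IRQ, buffers; `init_card()`, `init_error()` and the `enum res` values are illustrative names, and `bad_frees` counts releases of resources that were never acquired, the analogue of "Trying to free already-free IRQ".

```c
#include <assert.h>
#include <stdbool.h>

/* Order in which the (hypothetical) error handler assumes resources were
 * acquired. */
enum res { RES_MEM, RES_IRQ, RES_BUFS, RES_COUNT };

struct card {
	bool held[RES_COUNT];
	int bad_frees; /* releases of resources that were never acquired */
};

/* Undo the first `done` resources of the assumed order, newest first. */
static void init_error(struct card *c, int done)
{
	for (int i = done - 1; i >= 0; i--) {
		if (!c->held[i])
			c->bad_frees++; /* cf. "Trying to free already-free IRQ" */
		c->held[i] = false;
	}
}

/* Acquire resources in `order`, simulating a failure at step `fail_at`
 * (pass -1 for no failure). */
static void init_card(struct card *c, const enum res *order, int fail_at)
{
	int done = 0;

	for (int i = 0; i < RES_COUNT; i++) {
		if (i == fail_at) {
			init_error(c, done);
			return;
		}
		c->held[order[i]] = true;
		done++;
	}
}
```

      With the IRQ acquired after the buffers, a failure at the third step makes the handler release an IRQ that was never requested; acquiring in the handler's order removes the mismatch.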
      
      This log reveals it:
      
      [    3.438724] Trying to free already-free IRQ 23
      [    3.439060] WARNING: CPU: 5 PID: 1 at kernel/irq/manage.c:1825 free_irq+0xfb/0x480
      [    3.440039] Modules linked in:
      [    3.440257] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #142
      [    3.440793] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
      [    3.441561] RIP: 0010:free_irq+0xfb/0x480
      [    3.441845] Code: 6e 08 74 6f 4d 89 f4 e8 c3 78 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 b4 78 09 00 8b 75 c8 48 c7 c7 a0 ac d5 85 e8 95 d7 f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 87 c5 90 03 48 8b 43 40 4c 8b a0 80
      [    3.443121] RSP: 0000:ffffc90000017b50 EFLAGS: 00010086
      [    3.443483] RAX: 0000000000000000 RBX: ffff888107c6f000 RCX: 0000000000000000
      [    3.443972] RDX: 0000000000000000 RSI: ffffffff8123f301 RDI: 00000000ffffffff
      [    3.444462] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000003
      [    3.444950] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      [    3.444994] R13: ffff888107dc0000 R14: ffff888104f6bf00 R15: ffff888107c6f0a8
      [    3.444994] FS:  0000000000000000(0000) GS:ffff88817bd40000(0000) knlGS:0000000000000000
      [    3.444994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    3.444994] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0
      [    3.444994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    3.444994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    3.444994] Call Trace:
      [    3.444994]  ns_init_card_error+0x18e/0x250
      [    3.444994]  nicstar_init_one+0x10d2/0x1130
      [    3.444994]  local_pci_probe+0x4a/0xb0
      [    3.444994]  pci_device_probe+0x126/0x1d0
      [    3.444994]  ? pci_device_remove+0x100/0x100
      [    3.444994]  really_probe+0x27e/0x650
      [    3.444994]  driver_probe_device+0x84/0x1d0
      [    3.444994]  ? mutex_lock_nested+0x16/0x20
      [    3.444994]  device_driver_attach+0x63/0x70
      [    3.444994]  __driver_attach+0x117/0x1a0
      [    3.444994]  ? device_driver_attach+0x70/0x70
      [    3.444994]  bus_for_each_dev+0xb6/0x110
      [    3.444994]  ? rdinit_setup+0x40/0x40
      [    3.444994]  driver_attach+0x22/0x30
      [    3.444994]  bus_add_driver+0x1e6/0x2a0
      [    3.444994]  driver_register+0xa4/0x180
      [    3.444994]  __pci_register_driver+0x77/0x80
      [    3.444994]  ? uPD98402_module_init+0xd/0xd
      [    3.444994]  nicstar_init+0x1f/0x75
      [    3.444994]  do_one_initcall+0x7a/0x3d0
      [    3.444994]  ? rdinit_setup+0x40/0x40
      [    3.444994]  ? rcu_read_lock_sched_held+0x4a/0x70
      [    3.444994]  kernel_init_freeable+0x2a7/0x2f9
      [    3.444994]  ? rest_init+0x2c0/0x2c0
      [    3.444994]  kernel_init+0x13/0x180
      [    3.444994]  ? rest_init+0x2c0/0x2c0
      [    3.444994]  ? rest_init+0x2c0/0x2c0
      [    3.444994]  ret_from_fork+0x1f/0x30
      [    3.444994] Kernel panic - not syncing: panic_on_warn set ...
      [    3.444994] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #142
      [    3.444994] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
      [    3.444994] Call Trace:
      [    3.444994]  dump_stack+0xba/0xf5
      [    3.444994]  ? free_irq+0xfb/0x480
      [    3.444994]  panic+0x155/0x3ed
      [    3.444994]  ? __warn+0xed/0x150
      [    3.444994]  ? free_irq+0xfb/0x480
      [    3.444994]  __warn+0x103/0x150
      [    3.444994]  ? free_irq+0xfb/0x480
      [    3.444994]  report_bug+0x119/0x1c0
      [    3.444994]  handle_bug+0x3b/0x80
      [    3.444994]  exc_invalid_op+0x18/0x70
      [    3.444994]  asm_exc_invalid_op+0x12/0x20
      [    3.444994] RIP: 0010:free_irq+0xfb/0x480
      [    3.444994] Code: 6e 08 74 6f 4d 89 f4 e8 c3 78 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 b4 78 09 00 8b 75 c8 48 c7 c7 a0 ac d5 85 e8 95 d7 f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 87 c5 90 03 48 8b 43 40 4c 8b a0 80
      [    3.444994] RSP: 0000:ffffc90000017b50 EFLAGS: 00010086
      [    3.444994] RAX: 0000000000000000 RBX: ffff888107c6f000 RCX: 0000000000000000
      [    3.444994] RDX: 0000000000000000 RSI: ffffffff8123f301 RDI: 00000000ffffffff
      [    3.444994] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000003
      [    3.444994] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      [    3.444994] R13: ffff888107dc0000 R14: ffff888104f6bf00 R15: ffff888107c6f0a8
      [    3.444994]  ? vprintk_func+0x71/0x110
      [    3.444994]  ns_init_card_error+0x18e/0x250
      [    3.444994]  nicstar_init_one+0x10d2/0x1130
      [    3.444994]  local_pci_probe+0x4a/0xb0
      [    3.444994]  pci_device_probe+0x126/0x1d0
      [    3.444994]  ? pci_device_remove+0x100/0x100
      [    3.444994]  really_probe+0x27e/0x650
      [    3.444994]  driver_probe_device+0x84/0x1d0
      [    3.444994]  ? mutex_lock_nested+0x16/0x20
      [    3.444994]  device_driver_attach+0x63/0x70
      [    3.444994]  __driver_attach+0x117/0x1a0
      [    3.444994]  ? device_driver_attach+0x70/0x70
      [    3.444994]  bus_for_each_dev+0xb6/0x110
      [    3.444994]  ? rdinit_setup+0x40/0x40
      [    3.444994]  driver_attach+0x22/0x30
      [    3.444994]  bus_add_driver+0x1e6/0x2a0
      [    3.444994]  driver_register+0xa4/0x180
      [    3.444994]  __pci_register_driver+0x77/0x80
      [    3.444994]  ? uPD98402_module_init+0xd/0xd
      [    3.444994]  nicstar_init+0x1f/0x75
      [    3.444994]  do_one_initcall+0x7a/0x3d0
      [    3.444994]  ? rdinit_setup+0x40/0x40
      [    3.444994]  ? rcu_read_lock_sched_held+0x4a/0x70
      [    3.444994]  kernel_init_freeable+0x2a7/0x2f9
      [    3.444994]  ? rest_init+0x2c0/0x2c0
      [    3.444994]  kernel_init+0x13/0x180
      [    3.444994]  ? rest_init+0x2c0/0x2c0
      [    3.444994]  ? rest_init+0x2c0/0x2c0
      [    3.444994]  ret_from_fork+0x1f/0x30
      [    3.444994] Dumping ftrace buffer:
      [    3.444994]    (ftrace buffer empty)
      [    3.444994] Kernel Offset: disabled
      [    3.444994] Rebooting in 1 seconds..
      Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Zheyu Ma
      atm: nicstar: use 'dma_free_coherent' instead of 'kfree' · 6a1e5a4a
      Zheyu Ma authored
      When 'nicstar_init_one' fails, 'ns_init_card_error' is executed for
      error handling, and it must release each buffer with the function
      that matches its allocator; otherwise it triggers a kernel BUG.
      Since 'card->rsq.org' and 'card->tsq.org' are allocated with
      'dma_alloc_coherent', they must be freed with 'dma_free_coherent'
      rather than 'kfree'.
      
      Fix this by using 'dma_free_coherent' instead of 'kfree'.
      
      This log reveals it:
      
      [    3.440294] kernel BUG at mm/slub.c:4206!
      [    3.441059] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [    3.441430] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #141
      [    3.441986] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
      [    3.442780] RIP: 0010:kfree+0x26a/0x300
      [    3.443065] Code: e8 3a c3 b9 ff e9 d6 fd ff ff 49 8b 45 00 31 db a9 00 00 01 00 75 4d 49 8b 45 00 a9 00 00 01 00 75 0a 49 8b 45 08 a8 01 75 02 <0f> 0b 89 d9 b8 00 10 00 00 be 06 00 00 00 48 d3 e0 f7 d8 48 63 d0
      [    3.443396] RSP: 0000:ffffc90000017b70 EFLAGS: 00010246
      [    3.443396] RAX: dead000000000100 RBX: 0000000000000000 RCX: 0000000000000000
      [    3.443396] RDX: 0000000000000000 RSI: ffffffff85d3df94 RDI: ffffffff85df38e6
      [    3.443396] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000001
      [    3.443396] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107dc0000
      [    3.443396] R13: ffffea00001f0100 R14: ffff888101a8bf00 R15: ffff888107dc0160
      [    3.443396] FS:  0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
      [    3.443396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    3.443396] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0
      [    3.443396] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    3.443396] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    3.443396] Call Trace:
      [    3.443396]  ns_init_card_error+0x12c/0x220
      [    3.443396]  nicstar_init_one+0x10d2/0x1130
      [    3.443396]  local_pci_probe+0x4a/0xb0
      [    3.443396]  pci_device_probe+0x126/0x1d0
      [    3.443396]  ? pci_device_remove+0x100/0x100
      [    3.443396]  really_probe+0x27e/0x650
      [    3.443396]  driver_probe_device+0x84/0x1d0
      [    3.443396]  ? mutex_lock_nested+0x16/0x20
      [    3.443396]  device_driver_attach+0x63/0x70
      [    3.443396]  __driver_attach+0x117/0x1a0
      [    3.443396]  ? device_driver_attach+0x70/0x70
      [    3.443396]  bus_for_each_dev+0xb6/0x110
      [    3.443396]  ? rdinit_setup+0x40/0x40
      [    3.443396]  driver_attach+0x22/0x30
      [    3.443396]  bus_add_driver+0x1e6/0x2a0
      [    3.443396]  driver_register+0xa4/0x180
      [    3.443396]  __pci_register_driver+0x77/0x80
      [    3.443396]  ? uPD98402_module_init+0xd/0xd
      [    3.443396]  nicstar_init+0x1f/0x75
      [    3.443396]  do_one_initcall+0x7a/0x3d0
      [    3.443396]  ? rdinit_setup+0x40/0x40
      [    3.443396]  ? rcu_read_lock_sched_held+0x4a/0x70
      [    3.443396]  kernel_init_freeable+0x2a7/0x2f9
      [    3.443396]  ? rest_init+0x2c0/0x2c0
      [    3.443396]  kernel_init+0x13/0x180
      [    3.443396]  ? rest_init+0x2c0/0x2c0
      [    3.443396]  ? rest_init+0x2c0/0x2c0
      [    3.443396]  ret_from_fork+0x1f/0x30
      [    3.443396] Modules linked in:
      [    3.443396] Dumping ftrace buffer:
      [    3.443396]    (ftrace buffer empty)
      [    3.458593] ---[ end trace 3c6f8f0d8ef59bcd ]---
      [    3.458922] RIP: 0010:kfree+0x26a/0x300
      [    3.459198] Code: e8 3a c3 b9 ff e9 d6 fd ff ff 49 8b 45 00 31 db a9 00 00 01 00 75 4d 49 8b 45 00 a9 00 00 01 00 75 0a 49 8b 45 08 a8 01 75 02 <0f> 0b 89 d9 b8 00 10 00 00 be 06 00 00 00 48 d3 e0 f7 d8 48 63 d0
      [    3.460499] RSP: 0000:ffffc90000017b70 EFLAGS: 00010246
      [    3.460870] RAX: dead000000000100 RBX: 0000000000000000 RCX: 0000000000000000
      [    3.461371] RDX: 0000000000000000 RSI: ffffffff85d3df94 RDI: ffffffff85df38e6
      [    3.461873] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000001
      [    3.462372] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107dc0000
      [    3.462871] R13: ffffea00001f0100 R14: ffff888101a8bf00 R15: ffff888107dc0160
      [    3.463368] FS:  0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
      [    3.463949] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    3.464356] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0
      [    3.464856] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    3.465356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    3.465860] Kernel panic - not syncing: Fatal exception
      [    3.466370] Dumping ftrace buffer:
      [    3.466616]    (ftrace buffer empty)
      [    3.466871] Kernel Offset: disabled
      [    3.467122] Rebooting in 1 seconds..
      Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'mptcp-sdeq-fixes' · 0d0f2a36
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: 32-bit sequence number improvements
      
      MPTCP-level sequence numbers are 64 bits, but RFC 8684 allows use of
      32-bit sequence numbers in the DSS option to save header space. Those
      32-bit numbers are the least significant bits of the full 64-bit
      sequence number, so the receiver must infer the correct upper 32 bits.
      
      These two patches improve the logic for determining the full 64-bit
      sequence numbers when the 32-bit truncated version has wrapped around.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Paolo Abeni
      mptcp: fix 32 bit DSN expansion · 5957a890
      Paolo Abeni authored
      The current implementation of 32-bit DSN expansion does not handle
      wrap-around correctly. After the previous patch, we can simply reuse
      the newly introduced helper to do the expansion safely.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/120
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Paolo Abeni
      mptcp: fix bad handling of 32 bit ack wrap-around · 1502328f
      Paolo Abeni authored
      When receiving a 32-bit DSS ack from the peer, MPTCP needs to
      expand it to a 64-bit value. The current code is buggy with respect
      to detecting 32-bit ack wrap-around: when the wrap-around happens,
      the current unsigned 32-bit ack value is lower than the previous one.
      
      Additionally, check for a possible reverse wrap and make the helper
      visible, so that it can be reused by the next patch.
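      
      The expansion described above can be illustrated with a userspace C
      sketch. It infers the upper 32 bits of a truncated sequence number
      from the last known 64-bit value, and it uses 32-bit serial-number
      comparison to handle both forward and reverse wrap-around. The names
      seq32_before() and expand_seq() are hypothetical. This is not the
      kernel helper, only a sketch of the technique, and it assumes the new
      value lies within 2^31 of the reference.

```c
#include <stdint.h>

/* Serial-number comparison (RFC 1982 style): nonzero if a comes
 * before b in 32-bit sequence space, allowing for wrap-around. */
static int seq32_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical helper: expand the truncated value cur32 to 64 bits,
 * using old_seq (the last full 64-bit value seen) as the reference. */
static uint64_t expand_seq(uint64_t old_seq, uint32_t cur32)
{
	uint32_t old32 = (uint32_t)old_seq;
	/* Borrow the upper 32 bits from the reference value. */
	uint64_t cur = (old_seq & 0xFFFFFFFF00000000ULL) | cur32;

	/* Numerically smaller but logically later: the low half wrapped
	 * forward, so bump the upper half. */
	if (cur32 < old32 && seq32_before(old32, cur32))
		return cur + (1ULL << 32);

	/* Numerically larger but logically earlier: a reverse wrap, e.g. a
	 * stale value from before the wrap. (This would underflow if the
	 * upper half of old_seq were 0; not handled in this sketch.) */
	if (cur32 > old32 && seq32_before(cur32, old32))
		return cur - (1ULL << 32);

	return cur;
}
```

      For example, with a reference of 0x1FFFFFFF0 and a received 32-bit
      value of 0x10, the sketch returns 0x200000010 instead of the naive
      0x100000010.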
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/204
      Fixes: cc9d2566 ("mptcp: update per unacked sequence on pkt reception")
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Jakub Kicinski
      tls: prevent oversized sendfile() hangs by ignoring MSG_MORE · d452d48b
      Jakub Kicinski authored
      We got multiple reports that the multi_chunk_sendfile test
      case from the tls selftests fails. This was sort of expected,
      as the original fix was never applied (see it in the first
      Link:). The test in question uses sendfile() with a count
      larger than the size of the underlying file. This makes
      splice set MSG_MORE on all sendpage calls, meaning TLS
      never closes and flushes the last partial record.
      
      Eric seems to have addressed a similar problem in
      commit 35f9c09f ("tcp: tcp_sendpages() should call tcp_push() once")
      by introducing MSG_SENDPAGE_NOTLAST. Unlike MSG_MORE,
      MSG_SENDPAGE_NOTLAST is not set on the last call
      of a "pipeful" of data (PIPE_DEF_BUFFERS == 16,
      so every 16 pages or whenever we run out of data).
      
      Having a break every 16 pages should be fine: TLS can
      pack exactly 4 pages into a record (with 4 KiB pages,
      4 pages equal the 16 KiB maximum record payload), so
      aligned reads should see no difference, while unaligned
      reads may see one extra record per sendpage().
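      
      The record arithmetic above can be checked with a small userspace C
      sketch. It assumes 4 KiB pages and the 16 KiB (2^14 byte) maximum
      TLS plaintext record. It also models each break where
      MSG_SENDPAGE_NOTLAST is not set as a flush that closes any partial
      record. The function names are hypothetical, and the model is only
      an approximation of how the kernel TLS code behaves.

```c
#include <stddef.h>

#define TLS_MAX_PAYLOAD 16384u /* maximum TLS plaintext record: 2^14 bytes */

/* Records needed to carry len bytes if nothing forces an early close. */
static size_t records_for(size_t len)
{
	return (len + TLS_MAX_PAYLOAD - 1) / TLS_MAX_PAYLOAD;
}

/* Records produced when the stream is flushed after every flush_len
 * bytes (e.g. one pipeful), each flush closing the current partial
 * record. flush_len must be nonzero. */
static size_t records_with_flushes(size_t total, size_t flush_len)
{
	size_t n = 0;

	while (total > 0) {
		size_t chunk = total < flush_len ? total : flush_len;

		n += records_for(chunk);
		total -= chunk;
	}
	return n;
}
```

      With page-aligned pipefuls of 16 x 4096 = 65536 bytes, every flush
      lands on a record boundary, so three pipefuls produce 12 records
      either way. With an arbitrary unaligned chunk size of 40000 bytes,
      three flushes produce 9 records instead of 8. That is at most one
      extra record per flush.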
      
      Sticking to TCP semantics seems preferable to modifying
      splice, but we can revisit it if real life scenarios
      show a regression.
      Reported-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Reported-by: Seth Forshee <seth.forshee@canonical.com>
      Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Tested-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge tag 'linux-can-fixes-for-5.13-20210619' of... · d52f9b22
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-5.13-20210619' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-06-19
      
      this is a pull request of 5 patches for net/master.
      
      The first patch is by Thadeu Lima de Souza Cascardo and fixes a
      potential use-after-free in the CAN broadcast manager socket, by
      delaying the release of struct bcm_op after synchronize_rcu().
      
      Oliver Hartkopp's patch fixes a similar potential use-after-free in
      the CAN gateway socket by synchronizing RCU operations before removing
      gw job entry.
      
      Another patch by Oliver Hartkopp fixes a potential use-after-free in
      the ISOTP socket by omitting unintended hrtimer restarts on socket
      release.
      
      Oleksij Rempel's patch for the j1939 socket fixes a potential
      use-after-free by setting the SOCK_RCU_FREE flag on the socket.
      
      The last patch is by Pavel Skripkin and fixes a use-after-free in the
      ems_usb CAN driver.
      
      All patches are intended for stable and have stable@v.k.o on Cc.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge tag 'wireless-drivers-2021-06-19' of... · 0d98ec87
      David S. Miller authored
      Merge tag 'wireless-drivers-2021-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.13
      
      Only one important fix for an mwifiex regression.
      
      mwifiex
      
      * fix deadlock during rmmod or firmware reset, regression from
        cfg80211 RTNL changes in v5.12-rc1
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Haiyang Zhang
      hv_netvsc: Set needed_headroom according to VF · 536ba2e0
      Haiyang Zhang authored
      Set needed_headroom according to the VF if the VF needs a bigger
      headroom.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Sebastian Andrzej Siewior
      net/netif_receive_skb_core: Use migrate_disable() · 2b4cd14f
      Sebastian Andrzej Siewior authored
      The preempt disable around do_xdp_generic() has been introduced in
      commit
         bbbe211c ("net: rcu lock and preempt disable missing around generic xdp")
      
      For BPF, it is enough to use migrate_disable(), and the code was
      updated accordingly, as can be seen in commit
         3c58482a ("bpf: Provide bpf_prog_run_pin_on_cpu() helper")
      
      This is a leftover which was not converted.
      
      Use migrate_disable() before invoking do_xdp_generic().
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Yunsheng Lin
      net: sched: add barrier to ensure correct ordering for lockless qdisc · 89837eb4
      Yunsheng Lin authored
      The spin_trylock() was assumed to contain the implicit
      barrier needed to ensure the correct ordering between
      STATE_MISSED setting/clearing and STATE_MISSED checking
      in commit a90c57f2 ("net: sched: fix packet stuck
      problem for lockless qdisc").
      
      But it turns out that spin_trylock() only has load-acquire
      semantics. For strongly-ordered systems (like x86), the
      compiler barrier implicitly contained in spin_trylock() seems
      to be enough to ensure the correct ordering. But for
      weakly-ordered systems (like arm64), store-release semantics
      are needed to ensure the correct ordering, as clear_bit() and
      test_bit() are store operations; see queued_spin_lock().
      
      So add the explicit barrier to ensure the correct ordering
      for the above case.
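      
      The ordering problem above has the shape of the classic
      store-buffering pattern. A contender stores STATE_MISSED and then
      re-reads the lock, while the owner releases the lock and then reads
      STATE_MISSED. The userspace C11 sketch below is only an analogy,
      not the qdisc code. struct toy_qdisc and its helpers are
      hypothetical, and the atomic_thread_fence(memory_order_seq_cst)
      calls are this sketch's choice of explicit barrier. With a full
      fence on both sides, at least one side is guaranteed to observe the
      other's store.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy model of a lockless qdisc's lock and MISSED flag. */
struct toy_qdisc {
	atomic_bool locked; /* stands in for the qdisc lock */
	atomic_bool missed; /* stands in for __QDISC_STATE_MISSED */
};

/* Like spin_trylock(): acquire ordering on success only. */
static bool toy_trylock(struct toy_qdisc *q)
{
	bool expected = false;

	return atomic_compare_exchange_strong_explicit(&q->locked, &expected,
			true, memory_order_acquire, memory_order_relaxed);
}

/* Contender: try the lock; on failure record MISSED and retry once. */
static bool toy_run_begin(struct toy_qdisc *q)
{
	if (toy_trylock(q))
		return true;

	atomic_store_explicit(&q->missed, true, memory_order_relaxed);
	/* Acquire on the trylock does not stop the MISSED store above from
	 * being reordered after the lock load below; a full fence does. */
	atomic_thread_fence(memory_order_seq_cst);
	return toy_trylock(q);
}

/* Owner: release the lock, then report whether a contender missed it. */
static bool toy_run_end(struct toy_qdisc *q)
{
	atomic_store_explicit(&q->locked, false, memory_order_release);
	/* Orders the unlock store before the MISSED load. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&q->missed, memory_order_relaxed);
}
```

      Run single-threaded, the first toy_run_begin() takes the lock. A
      second call fails and sets MISSED, and toy_run_end() then reports
      the miss, so the owner knows it must reschedule the work.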
      
      Fixes: a90c57f2 ("net: sched: fix packet stuck problem for lockless qdisc")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Antoine Tenart
      vrf: do not push non-ND strict packets with a source LLA through packet taps again · 603113c5
      Antoine Tenart authored
      Non-ND strict packets with a source LLA go through the packet taps
      again, while non-ND strict packets with other source addresses do not,
      and we can see a clone of those packets on the vrf interface (we should
      not). This is due to a series of changes:
      
      Commit 6f12fa77[1] stopped non-ND strict packets from being pushed
      through the packet taps again. This changed with commit 205704c6[2]
      for packets with a source LLA, as they need a lookup with the
      orig_iif.
      
      The issue now is that those packets do not skip to the end of the
      'vrf_ip6_rcv' function (as the ones without a source LLA do) and
      instead go through the check that calls the packet taps again. This
      check was changed by commit 6f12fa77[1] and no longer excludes
      non-strict packets. Packets matching
      'need_strict && !is_ndisc && is_ll_src' are therefore sent through
      the packet taps again, which can be seen by dumping packets on the
      vrf interface.
      
      Fix this by having the same code path for all non-ND strict packets
      and selectively looking up with the orig_iif for those with a source
      LLA. This has the effect of reverting to the pre-205704c6[2]
      condition, which should also be easier to maintain.
      
      [1] 6f12fa77 ("vrf: mark skb for multicast or link-local as enslaved to VRF")
      [2] 205704c6 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")
      
      Fixes: 205704c6 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")
      Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 19 Jun, 2021 11 commits
    • Pavel Skripkin
      net: can: ems_usb: fix use-after-free in ems_usb_disconnect() · ab4a0b8f
      Pavel Skripkin authored
      In ems_usb_disconnect(), the dev pointer, which is netdev private
      data, is used after the free_candev() call:
      | 	if (dev) {
      | 		unregister_netdev(dev->netdev);
      | 		free_candev(dev->netdev);
      |
      | 		unlink_all_urbs(dev);
      |
      | 		usb_free_urb(dev->intr_urb);
      |
      | 		kfree(dev->intr_in_buffer);
      | 		kfree(dev->tx_msg_buffer);
      | 	}
      
      Fix it by simply moving free_candev() to the end of the block.
      
      Fail log:
      | BUG: KASAN: use-after-free in ems_usb_disconnect
      | Read of size 8 at addr ffff88804e041008 by task kworker/1:2/2895
      |
      | CPU: 1 PID: 2895 Comm: kworker/1:2 Not tainted 5.13.0-rc5+ #164
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.4
      | Workqueue: usb_hub_wq hub_event
      | Call Trace:
      |     dump_stack (lib/dump_stack.c:122)
      |     print_address_description.constprop.0.cold (mm/kasan/report.c:234)
      |     kasan_report.cold (mm/kasan/report.c:420 mm/kasan/report.c:436)
      |     ems_usb_disconnect (drivers/net/can/usb/ems_usb.c:683 drivers/net/can/usb/ems_usb.c:1058)
      
      Fixes: 702171ad ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface")
      Link: https://lore.kernel.org/r/20210617185130.5834-1-paskripkin@gmail.com
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • Oleksij Rempel
      can: j1939: j1939_sk_init(): set SOCK_RCU_FREE to call sk_destruct() after RCU is done · 22c696fe
      Oleksij Rempel authored
      Set SOCK_RCU_FREE to let RCU call sk_destruct() on completion.
      Without this patch, we may run into j1939_can_recv() after priv was
      freed by j1939_sk_release()->j1939_sk_sock_destruct().
      
      Fixes: 25fe97cb ("can: j1939: move j1939_priv_put() into sk_destruct callback")
      Link: https://lore.kernel.org/r/20210617130623.12705-1-o.rempel@pengutronix.de
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reported-by: syzbot+bdf710cfc41c186fdff3@syzkaller.appspotmail.com
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • Oliver Hartkopp
      can: isotp: isotp_release(): omit unintended hrtimer restart on socket release · 14a4696b
      Oliver Hartkopp authored
      When closing the isotp socket, the potentially running hrtimers are
      canceled before removing the subscription for CAN identifiers via
      can_rx_unregister().
      
      This may lead to an unintended (re)start of a hrtimer in
      isotp_rcv_cf() and isotp_rcv_fc() in the case that a CAN frame is
      received by isotp_rcv() while the subscription removal is processed.
      
      However, isotp_rcv() is called under RCU protection, so after calling
      can_rx_unregister, we may call synchronize_rcu in order to wait for
      any RCU read-side critical sections to finish. This prevents the
      reception of CAN frames after hrtimer_cancel() and therefore the
      unintended (re)start of the hrtimers.
      
      Link: https://lore.kernel.org/r/20210618173713.2296-1-socketcan@hartkopp.net
      Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • Oliver Hartkopp
      can: gw: synchronize rcu operations before removing gw job entry · fb8696ab
      Oliver Hartkopp authored
      can_can_gw_rcv() is called under RCU protection, so after calling
      can_rx_unregister(), we have to call synchronize_rcu in order to wait
      for any RCU read-side critical sections to finish before removing the
      kmem_cache entry with the referenced gw job entry.
      
      Link: https://lore.kernel.org/r/20210618173645.2238-1-socketcan@hartkopp.net
      Fixes: c1aabdf3 ("can-gw: add netlink based CAN routing")
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • Thadeu Lima de Souza Cascardo
      can: bcm: delay release of struct bcm_op after synchronize_rcu() · d5f9023f
      Thadeu Lima de Souza Cascardo authored
      can_rx_register() callbacks may be called concurrently with the call
      to can_rx_unregister(). The callbacks and callback data, though, are
      protected by RCU and the struct sock reference count.
      
      So the callback data is really attached to the life of sk, meaning
      that it should be released on sk_destruct. However, bcm_remove_op()
      calls tasklet_kill(), and RCU callbacks may be called under RCU
      softirq, so that cannot be used on kernels before the introduction of
      HRTIMER_MODE_SOFT.
      
      However, bcm_rx_handler() is called under RCU protection, so after
      calling can_rx_unregister(), we may call synchronize_rcu() in order to
      wait for any RCU read-side critical sections to finish. That is,
      bcm_rx_handler() won't be called anymore for those ops. So we only
      free them after that synchronize_rcu().
      
      Fixes: ffd980f9 ("[CAN]: Add broadcast manager (bcm) protocol")
      Link: https://lore.kernel.org/r/20210619161813.2098382-1-cascardo@canonical.com
      Cc: linux-stable <stable@vger.kernel.org>
      Reported-by: syzbot+0f7e7e5e2f4f40fa89c0@syzkaller.appspotmail.com
      Reported-by: Norbert Slusarek <nslusarek@gmx.net>
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • David S. Miller
      Merge branch 'ezchip-fixes' · dda2626b
      David S. Miller authored
      Pavel Skripkin says:
      
      ====================
      net: ethernet: ezchip: bug fixing and code improvements
      
      While manually reviewing the code, I found some errors in the ezchip
      driver. Two of them look very dangerous:
        1. use-after-free in nps_enet_remove
            Accessing netdev private data after free_netdev()
      
        2. wrong error handling of platform_get_irq()
            It can cause a negative irq to be passed to request_irq()
      
      Also, in the 2nd patch I removed a redundant check to increase
      execution speed and make the code more straightforward.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Pavel Skripkin
      net: ethernet: ezchip: fix error handling · 0de449d5
      Pavel Skripkin authored
      As documented in drivers/base/platform.c for platform_get_irq():
      
       * Gets an IRQ for a platform device and prints an error message if finding the
       * IRQ fails. Device drivers should check the return value for errors so as to
       * not pass a negative integer value to the request_irq() APIs.
      
      So the driver should check whether the platform_get_irq() return
      value is _negative_, not whether it is equal to zero: -ENXIO
      (returned by platform_get_irq() if the IRQ was not found) passes
      the zero check, which leads to a negative irq being passed to
      request_irq().
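      
      The difference between the two checks can be shown with a minimal
      userspace C sketch. irq_ok_buggy() and irq_ok_fixed() are
      hypothetical helpers. The negative-errno convention is the one
      quoted above from the platform_get_irq() documentation. In the
      driver, the fix amounts to testing "irq < 0" on the value returned
      by platform_get_irq() before it reaches request_irq().

```c
#include <errno.h>
#include <stdbool.h>

/* Buggy pattern: only 0 is treated as failure, so a negative errno such
 * as -ENXIO is accepted as if it were a valid IRQ number. */
static bool irq_ok_buggy(int irq)
{
	return irq != 0;
}

/* Fixed pattern: any negative value is an error and must not be passed
 * on to request_irq(). */
static bool irq_ok_fixed(int irq)
{
	return !(irq < 0);
}
```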
      
      Fixes: 0dd07709 ("NET: Add ezchip ethernet driver")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Pavel Skripkin
      net: ethernet: ezchip: remove redundant check · 4ae85b23
      Pavel Skripkin authored
      The err variable is always set when the code reaches this path,
      so this check only slows down execution and has no other effect.
      
      Fixes: 0dd07709 ("NET: Add ezchip ethernet driver")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Pavel Skripkin
      net: ethernet: ezchip: fix UAF in nps_enet_remove · e4b8700e
      Pavel Skripkin authored
      priv is netdev private data, but it is used after free_netdev(),
      which can cause a use-after-free when accessing the priv pointer.
      So, fix it by moving free_netdev() after the netif_napi_del() call.
      
      Fixes: 0dd07709 ("NET: Add ezchip ethernet driver")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Pavel Skripkin
      net: ethernet: aeroflex: fix UAF in greth_of_remove · e3a5de6d
      Pavel Skripkin authored
      static int greth_of_remove(struct platform_device *of_dev)
      {
      ...
      	struct greth_private *greth = netdev_priv(ndev);
      ...
      	unregister_netdev(ndev);
      	free_netdev(ndev);
      
      	of_iounmap(&of_dev->resource[0], greth->regs, resource_size(&of_dev->resource[0]));
      ...
      }
      
      greth is netdev private data, but it is used after free_netdev(),
      which can cause a use-after-free when accessing the greth pointer.
      So, fix it by moving free_netdev() after the of_iounmap() call.
      
      Fixes: d4c41139 ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver")
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Linus Torvalds
      Merge tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9ed13a17
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.13-rc7, including fixes from wireless, bpf,
        bluetooth, netfilter and can.
      
        Current release - regressions:
      
         - mlxsw: spectrum_qdisc: Pass handle, not band number to find_class()
           to fix modifying offloaded qdiscs
      
         - lantiq: net: fix duplicated skb in rx descriptor ring
      
         - rtnetlink: fix regression in bridge VLAN configuration, empty info
           is not an error, bot-generated "fix" was not needed
      
         - libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem
           creation
      
        Current release - new code bugs:
      
         - ethtool: fix NULL pointer dereference during module EEPROM dump via
           the new netlink API
      
         - mlx5e: don't update netdev RQs with PTP-RQ, the special purpose
           queue should not be visible to the stack
      
         - mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs
      
         - mlx5e: verify dev is present in get devlink port ndo, avoid a panic
      
        Previous releases - regressions:
      
         - neighbour: allow NUD_NOARP entries to be force GCed
      
         - further fixes for fallout from reorg of WiFi locking (staging:
           rtl8723bs, mac80211, cfg80211)
      
         - skbuff: fix incorrect msg_zerocopy copy notifications
      
         - mac80211: fix NULL ptr deref for injected rate info
      
         - Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs
      
        Previous releases - always broken:
      
         - bpf: more speculative execution fixes
      
         - netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local
      
         - udp: fix race between close() and udp_abort() resulting in a panic
      
         - fix out of bounds when parsing TCP options before packets are
           validated (in netfilter: synproxy, tc: sch_cake and mptcp)
      
         - mptcp: improve operation under memory pressure, add missing
           wake-ups
      
         - mptcp: fix double-lock/soft lockup in subflow_error_report()
      
         - bridge: fix races (null pointer deref and UAF) in vlan tunnel
           egress
      
         - ena: fix DMA mapping function issues in XDP
      
         - rds: fix memory leak in rds_recvmsg
      
        Misc:
      
         - vrf: allow larger MTUs
      
         - icmp: don't send out ICMP messages with a source address of 0.0.0.0
      
         - cdc_ncm: switch to eth%d interface naming"
      
      * tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits)
        net: ethernet: fix potential use-after-free in ec_bhf_remove
        selftests/net: Add icmp.sh for testing ICMP dummy address responses
        icmp: don't send out ICMP messages with a source address of 0.0.0.0
        net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
        net: ll_temac: Fix TX BD buffer overwrite
        net: ll_temac: Add memory-barriers for TX BD access
        net: ll_temac: Make sure to free skb when it is completely used
        MAINTAINERS: add Guvenc as SMC maintainer
        bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path
        bnxt_en: Fix TQM fastpath ring backing store computation
        bnxt_en: Rediscover PHY capabilities after firmware reset
        cxgb4: fix wrong shift.
        mac80211: handle various extensible elements correctly
        mac80211: reset profile_periodicity/ema_ap
        cfg80211: avoid double free of PMSR request
        cfg80211: make certificate generation more robust
        mac80211: minstrel_ht: fix sample time check
        net: qed: Fix memcpy() overflow of qed_dcbx_params()
        net: cdc_eem: fix tx fixup skb leak
        net: hamradio: fix memory leak in mkiss_close
        ...