1. 13 Apr, 2023 10 commits
    • Aaron Conole's avatar
      selftests: openvswitch: adjust datapath NL message declaration · 306dc213
      Aaron Conole authored
      The netlink message for creating a new datapath takes an array
      of ports for the PID creation.  This shouldn't cause much issue
      but correct it for future cases where we need to do decode of
      datapath information that could include the per-cpu PID map.
      
      Fixes: 25f16c87 ("selftests: add openvswitch selftest suite")
      Signed-off-by: default avatarAaron Conole <aconole@redhat.com>
      Link: https://lore.kernel.org/r/20230412115828.3991806-1-aconole@redhat.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      306dc213
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-more-fixes-for-6-3' · ecfcc6fb
      Jakub Kicinski authored
      Matthieu Baerts says:
      
      ====================
      mptcp: more fixes for 6.3
      
      Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
      edge cases. It fixes issues that can be visible from v5.11.
      
      Patch 2 makes sure the MPTCP worker doesn't try to manipulate
      disconnected sockets. This is also a fix for an issue that can be
      visible from v5.11.
      
      Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
      and an early fallback is done. A fix for v6.2.
      
      Patch 4 improves the stability of the userspace PM selftest for a
      subtest added in v6.2.
      ====================
      
      Link: https://lore.kernel.org/r/20230411-upstream-net-20230411-mptcp-fixes-v1-0-ca540f3ef986@tessares.netSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ecfcc6fb
    • Matthieu Baerts's avatar
      selftests: mptcp: userspace pm: uniform verify events · 711ae788
      Matthieu Baerts authored
      Simply adding a "sleep" before checking something is usually not a good
      idea because the time that has been picked can not be enough or too
      much. The best is to wait for events with a timeout.
      
      In this selftest, 'sleep 0.5' is used more than 40 times. It is always
      used before calling a 'verify_*' function except for this
      verify_listener_events which has been added later.
      
      At the end, using all these 'sleep 0.5' seems to work: the slow CIs
      don't complain so far. Also because it doesn't take too much time, we
      can just add two more 'sleep 0.5' to uniform what is done before calling
      a 'verify_*' function. For the same reasons, we can also delay a bigger
      refactoring to replace all these 'sleep 0.5' by functions waiting for
      events instead of waiting for a fix time and hope for the best.
      
      Fixes: 6c73008a ("selftests: mptcp: listener test for userspace PM")
      Cc: stable@vger.kernel.org
      Suggested-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      711ae788
    • Paolo Abeni's avatar
      mptcp: fix NULL pointer dereference on fastopen early fallback · c0ff6f6d
      Paolo Abeni authored
      In case of early fallback to TCP, subflow_syn_recv_sock() deletes
      the subflow context before returning the newly allocated sock to
      the caller.
      
      The fastopen path does not cope with the above unconditionally
      dereferencing the subflow context.
      
      Fixes: 36b122ba ("mptcp: add subflow_v(4,6)_send_synack()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c0ff6f6d
    • Paolo Abeni's avatar
      mptcp: stricter state check in mptcp_worker · d6a04437
      Paolo Abeni authored
      As reported by Christoph, the mptcp protocol can run the
      worker when the relevant msk socket is in an unexpected state:
      
      connect()
      // incoming reset + fastclose
      // the mptcp worker is scheduled
      mptcp_disconnect()
      // msk is now CLOSED
      listen()
      mptcp_worker()
      
      Leading to the following splat:
      
      divide error: 0000 [#1] PREEMPT SMP
      CPU: 1 PID: 21 Comm: kworker/1:0 Not tainted 6.3.0-rc1-gde5e8fd0123c #11
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      Workqueue: events mptcp_worker
      RIP: 0010:__tcp_select_window+0x22c/0x4b0 net/ipv4/tcp_output.c:3018
      RSP: 0018:ffffc900000b3c98 EFLAGS: 00010293
      RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff8214ce97 RDI: 0000000000000004
      RBP: 000000000000ffd7 R08: 0000000000000004 R09: 0000000000010000
      R10: 000000000000ffd7 R11: ffff888005afa148 R12: 000000000000ffd7
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000405270 CR3: 000000003011e006 CR4: 0000000000370ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       tcp_select_window net/ipv4/tcp_output.c:262 [inline]
       __tcp_transmit_skb+0x356/0x1280 net/ipv4/tcp_output.c:1345
       tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
       tcp_send_active_reset+0x13e/0x320 net/ipv4/tcp_output.c:3459
       mptcp_check_fastclose net/mptcp/protocol.c:2530 [inline]
       mptcp_worker+0x6c7/0x800 net/mptcp/protocol.c:2705
       process_one_work+0x3bd/0x950 kernel/workqueue.c:2390
       worker_thread+0x5b/0x610 kernel/workqueue.c:2537
       kthread+0x138/0x170 kernel/kthread.c:376
       ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
       </TASK>
      
      This change addresses the issue explicitly checking for bad states
      before running the mptcp worker.
      
      Fixes: e16163b6 ("mptcp: refactor shutdown and close")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/374Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Tested-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d6a04437
    • Paolo Abeni's avatar
      mptcp: use mptcp_schedule_work instead of open-coding it · a5cb752b
      Paolo Abeni authored
      Beyond reducing code duplication this also avoids scheduling
      the mptcp_worker on a closed socket on some edge scenarios.
      
      The addressed issue is actually older than the blamed commit
      below, but this fix needs it as a pre-requisite.
      
      Fixes: ba8f48f7 ("mptcp: introduce mptcp_schedule_work")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a5cb752b
    • Vladimir Oltean's avatar
      net: enetc: workaround for unresponsive pMAC after receiving express traffic · 5b7be2d4
      Vladimir Oltean authored
      I have observed an issue where the RX direction of the LS1028A ENETC pMAC
      seems unresponsive. The minimal procedure to reproduce the issue is:
      
      1. Connect ENETC port 0 with a loopback RJ45 cable to one of the Felix
         switch ports (0).
      
      2. Bring the ports up (MAC Merge layer is not enabled on either end).
      
      3. Send a large quantity of unidirectional (express) traffic from Felix
         to ENETC. I tried altering frame size and frame count, and it doesn't
         appear to be specific to either of them, but rather, to the quantity
         of octets received. Lowering the frame count, the minimum quantity of
         packets to reproduce relatively consistently seems to be around 37000
         frames at 1514 octets (w/o FCS) each.
      
      4. Using ethtool --set-mm, enable the pMAC in the Felix and in the ENETC
         ports, in both RX and TX directions, and with verification on both
         ends.
      
      5. Wait for verification to complete on both sides.
      
      6. Configure a traffic class as preemptible on both ends.
      
      7. Send some packets again.
      
      The issue is at step 5, where the verification process of ENETC ends
      (meaning that Felix responds with an SMD-R and ENETC sees the response),
      but the verification process of Felix never ends (it remains VERIFYING).
      
      If step 3 is skipped or if ENETC receives less traffic than
      approximately that threshold, the test runs all the way through
      (verification succeeds on both ends, preemptible traffic passes fine).
      
      If, between step 4 and 5, the step below is also introduced:
      
      4.1. Disable and re-enable PM0_COMMAND_CONFIG bit RX_EN
      
      then again, the sequence of steps runs all the way through, and
      verification succeeds, even if there was the previous RX traffic
      injected into ENETC.
      
      Traffic sent *by* the ENETC port prior to enabling the MAC Merge layer
      does not seem to influence the verification result, only received
      traffic does.
      
      The LS1028A manual does not mention any relationship between
      PM0_COMMAND_CONFIG and MMCSR, and the hardware people don't seem to
      know for now either.
      
      The bit that is toggled to work around the issue is also toggled
      by enetc_mac_enable(), called from phylink's mac_link_down() and
      mac_link_up() methods - which is how the workaround was found:
      verification would work after a link down/up.
      
      Fixes: c7b9e808 ("net: enetc: add support for MAC Merge layer")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20230411192645.1896048-1-vladimir.oltean@nxp.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      5b7be2d4
    • Xin Long's avatar
      sctp: fix a potential overflow in sctp_ifwdtsn_skip · 32832a2c
      Xin Long authored
      Currently, when traversing ifwdtsn skips with _sctp_walk_ifwdtsn, it only
      checks the pos against the end of the chunk. However, the data left for
      the last pos may be < sizeof(struct sctp_ifwdtsn_skip), and dereference
      it as struct sctp_ifwdtsn_skip may cause coverflow.
      
      This patch fixes it by checking the pos against "the end of the chunk -
      sizeof(struct sctp_ifwdtsn_skip)" in sctp_ifwdtsn_skip, similar to
      sctp_fwdtsn_skip.
      
      Fixes: 0fc2ea92 ("sctp: implement validate_ftsn for sctp_stream_interleave")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/2a71bffcd80b4f2c61fac6d344bb2f11c8fd74f7.1681155810.git.lucien.xin@gmail.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      32832a2c
    • Ziyang Xuan's avatar
      net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() · 64170709
      Ziyang Xuan authored
      Syzbot reported a bug as following:
      
      =====================================================
      BUG: KMSAN: uninit-value in qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230
       qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230
       qrtr_endpoint_post+0xf85/0x11b0 net/qrtr/af_qrtr.c:519
       qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108
       call_write_iter include/linux/fs.h:2189 [inline]
       aio_write+0x63a/0x950 fs/aio.c:1600
       io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
       __do_sys_io_submit fs/aio.c:2078 [inline]
       __se_sys_io_submit+0x293/0x770 fs/aio.c:2048
       __x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:766 [inline]
       slab_alloc_node mm/slub.c:3452 [inline]
       __kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
       __do_kmalloc_node mm/slab_common.c:967 [inline]
       __kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
       kmalloc_reserve net/core/skbuff.c:492 [inline]
       __alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
       __netdev_alloc_skb+0x120/0x7d0 net/core/skbuff.c:630
       qrtr_endpoint_post+0xbd/0x11b0 net/qrtr/af_qrtr.c:446
       qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108
       call_write_iter include/linux/fs.h:2189 [inline]
       aio_write+0x63a/0x950 fs/aio.c:1600
       io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
       __do_sys_io_submit fs/aio.c:2078 [inline]
       __se_sys_io_submit+0x293/0x770 fs/aio.c:2048
       __x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      It is because that skb->len requires at least sizeof(struct qrtr_ctrl_pkt)
      in qrtr_tx_resume(). And skb->len equals to size in qrtr_endpoint_post().
      But size is less than sizeof(struct qrtr_ctrl_pkt) when qrtr_cb->type
      equals to QRTR_TYPE_RESUME_TX in qrtr_endpoint_post() under the syzbot
      scenario. This triggers the uninit variable access bug.
      
      Add size check when qrtr_cb->type equals to QRTR_TYPE_RESUME_TX in
      qrtr_endpoint_post() to fix the bug.
      
      Fixes: 5fdeb0d3 ("net: qrtr: Implement outgoing flow control")
      Reported-by: syzbot+4436c9630a45820fda76@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?id=c14607f0963d27d5a3d5f4c8639b500909e43540Suggested-by: default avatarManivannan Sadhasivam <mani@kernel.org>
      Signed-off-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230410012352.3997823-1-william.xuanziyang@huawei.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      64170709
    • Martin Willi's avatar
      rtnetlink: Restore RTM_NEW/DELLINK notification behavior · 59d3efd2
      Martin Willi authored
      The commits referenced below allows userspace to use the NLM_F_ECHO flag
      for RTM_NEW/DELLINK operations to receive unicast notifications for the
      affected link. Prior to these changes, applications may have relied on
      multicast notifications to learn the same information without specifying
      the NLM_F_ECHO flag.
      
      For such applications, the mentioned commits changed the behavior for
      requests not using NLM_F_ECHO. Multicast notifications are still received,
      but now use the portid of the requester and the sequence number of the
      request instead of zero values used previously. For the application, this
      message may be unexpected and likely handled as a response to the
      NLM_F_ACKed request, especially if it uses the same socket to handle
      requests and notifications.
      
      To fix existing applications relying on the old notification behavior,
      set the portid and sequence number in the notification only if the
      request included the NLM_F_ECHO flag. This restores the old behavior
      for applications not using it, but allows unicasted notifications for
      others.
      
      Fixes: f3a63cce ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link")
      Fixes: d88e136c ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_newlink_create")
      Signed-off-by: default avatarMartin Willi <martin@strongswan.org>
      Acked-by: default avatarGuillaume Nault <gnault@redhat.com>
      Acked-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20230411074319.24133-1-martin@strongswan.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      59d3efd2
  2. 12 Apr, 2023 6 commits
    • Rob Herring's avatar
      net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes · 136f36c7
      Rob Herring authored
      TI CPSW uses of_platform_* functions which are declared in of_platform.h.
      of_platform.h gets implicitly included by of_device.h, but that is going
      to be removed soon. Nothing else depends on of_device.h so it can be
      dropped. of_platform.h also implicitly includes platform_device.h, so
      add an explicit include for it, too.
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      Reviewed-by: default avatarJesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      136f36c7
    • Harshit Mogalapalli's avatar
      net: wwan: iosm: Fix error handling path in ipc_pcie_probe() · a56ef256
      Harshit Mogalapalli authored
      Smatch reports:
      	drivers/net/wwan/iosm/iosm_ipc_pcie.c:298 ipc_pcie_probe()
      	warn: missing unwind goto?
      
      When dma_set_mask fails it directly returns without disabling pci
      device and freeing ipc_pcie. Fix this my calling a correct goto label
      
      As dma_set_mask returns either 0 or -EIO, we can use a goto label, as
      it finally returns -EIO.
      
      Add a set_mask_fail goto label which stands consistent with other goto
      labels in this function..
      
      Fixes: 035e3bef ("net: wwan: iosm: fix driver not working with INTEL_IOMMU disabled")
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarHarshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a56ef256
    • Kuniyuki Iwashima's avatar
      smc: Fix use-after-free in tcp_write_timer_handler(). · 9744d2bf
      Kuniyuki Iwashima authored
      With Eric's ref tracker, syzbot finally found a repro for
      use-after-free in tcp_write_timer_handler() by kernel TCP
      sockets. [0]
      
      If SMC creates a kernel socket in __smc_create(), the kernel
      socket is supposed to be freed in smc_clcsock_release() by
      calling sock_release() when we close() the parent SMC socket.
      
      However, at the end of smc_clcsock_release(), the kernel
      socket's sk_state might not be TCP_CLOSE.  This means that
      we have not called inet_csk_destroy_sock() in __tcp_close()
      and have not stopped the TCP timers.
      
      The kernel socket's TCP timers can be fired later, so we
      need to hold a refcnt for net as we do for MPTCP subflows
      in mptcp_subflow_create_socket().
      
      [0]:
      leaked reference.
       sk_alloc (./include/net/net_namespace.h:335 net/core/sock.c:2108)
       inet_create (net/ipv4/af_inet.c:319 net/ipv4/af_inet.c:244)
       __sock_create (net/socket.c:1546)
       smc_create (net/smc/af_smc.c:3269 net/smc/af_smc.c:3284)
       __sock_create (net/socket.c:1546)
       __sys_socket (net/socket.c:1634 net/socket.c:1618 net/socket.c:1661)
       __x64_sys_socket (net/socket.c:1672)
       do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      ==================================================================
      BUG: KASAN: slab-use-after-free in tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
      Read of size 1 at addr ffff888052b65e0d by task syzrepro/18091
      
      CPU: 0 PID: 18091 Comm: syzrepro Tainted: G        W          6.3.0-rc4-01174-gb5d54eb5 #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack_lvl (lib/dump_stack.c:107)
       print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
       kasan_report (mm/kasan/report.c:538)
       tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
       tcp_write_timer (./include/linux/spinlock.h:390 net/ipv4/tcp_timer.c:643)
       call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701)
       __run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2022)
       run_timer_softirq (kernel/time/timer.c:2037)
       __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
       __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650)
       irq_exit_rcu (kernel/softirq.c:664)
       sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1107 (discriminator 14))
       </IRQ>
      
      Fixes: ac713874 ("smc: establish new socket family")
      Reported-by: syzbot+7e1e1bdb852961150198@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/netdev/000000000000a3f51805f8bcc43a@google.com/Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9744d2bf
    • Denis Plotnikov's avatar
      qlcnic: check pci_reset_function result · 7573099e
      Denis Plotnikov authored
      Static code analyzer complains to unchecked return value.
      The result of pci_reset_function() is unchecked.
      Despite, the issue is on the FLR supported code path and in that
      case reset can be done with pcie_flr(), the patch uses less invasive
      approach by adding the result check of pci_reset_function().
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 7e2cf4fe ("qlcnic: change driver hardware interface mechanism")
      Signed-off-by: default avatarDenis Plotnikov <den-plotnikov@yandex-team.ru>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7573099e
    • Jakub Kicinski's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · adacf21f
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      iavf: fix racing in VLANs
      
      Ahmed Zaki says:
      
      This patchset mainly fixes a racing issue in the iavf where the number of
      VLANs in the vlan_filter_list might be more than the PF limit. To fix that,
      we get rid of the cvlans and svlans bitmaps and keep all the required info
      in the list.
      
      The second patch adds two new states that are needed so that we keep the
      VLAN info while the interface goes DOWN:
          -- DISABLE    (notify PF, but keep the filter in the list)
          -- INACTIVE   (dev is DOWN, filter is removed from PF)
      
      Finally, the current code keeps each state in a separate bit field, which
      is error prone. The first patch refactors that by replacing all bits with
      a single enum. The changes are minimal where each bit change is replaced
      with the new state value.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: remove active_cvlans and active_svlans bitmaps
        iavf: refactor VLAN filter states
      ====================
      
      Link: https://lore.kernel.org/r/20230407210730.3046149-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      adacf21f
    • Jakub Kicinski's avatar
      Merge tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 160c1317
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix not setting Dath Path for broadcast sink
       - Fix not cleaning up on LE Connection failure
       - SCO: Fix possible circular locking dependency
       - L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
       - Fix race condition in hidp_session_thread
       - btbcm: Fix logic error in forming the board name
       - btbcm: Fix use after free in btsdio_remove
      
      * tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
        Bluetooth: Set ISO Data Path on broadcast sink
        Bluetooth: hci_conn: Fix possible UAF
        Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
        Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm
        bluetooth: btbcm: Fix logic error in forming the board name.
        Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
        Bluetooth: Fix race condition in hidp_session_thread
        Bluetooth: Fix printing errors if LE Connection times out
        Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
      ====================
      
      Link: https://lore.kernel.org/r/20230410172718.4067798-1-luiz.dentz@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      160c1317
  3. 11 Apr, 2023 1 commit
  4. 10 Apr, 2023 10 commits
    • Luiz Augusto von Dentz's avatar
      Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp} · a2a9339e
      Luiz Augusto von Dentz authored
      Similar to commit d0be8347 ("Bluetooth: L2CAP: Fix use-after-free
      caused by l2cap_chan_put"), just use l2cap_chan_hold_unless_zero to
      prevent referencing a channel that is about to be destroyed.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: default avatarMin Li <lm0963hack@gmail.com>
      a2a9339e
    • Claudia Draghicescu's avatar
      Bluetooth: Set ISO Data Path on broadcast sink · d2e4f1b1
      Claudia Draghicescu authored
      This patch enables ISO data rx on broadcast sink.
      
      Fixes: eca0ae4a ("Bluetooth: Add initial implementation of BIS connections")
      Signed-off-by: default avatarClaudia Draghicescu <claudia.rosu@nxp.com>
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      d2e4f1b1
    • Luiz Augusto von Dentz's avatar
      Bluetooth: hci_conn: Fix possible UAF · 5dc7d23e
      Luiz Augusto von Dentz authored
      This fixes the following trace:
      
      ==================================================================
      BUG: KASAN: slab-use-after-free in hci_conn_del+0xba/0x3a0
      Write of size 8 at addr ffff88800208e9c8 by task iso-tester/31
      
      CPU: 0 PID: 31 Comm: iso-tester Not tainted 6.3.0-rc2-g991aa4a69a47
       #4716
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc36
      04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x1d/0x70
       print_report+0xce/0x610
       ? __virt_addr_valid+0xd4/0x150
       ? hci_conn_del+0xba/0x3a0
       kasan_report+0xdd/0x110
       ? hci_conn_del+0xba/0x3a0
       hci_conn_del+0xba/0x3a0
       hci_conn_hash_flush+0xf2/0x120
       hci_dev_close_sync+0x388/0x920
       hci_unregister_dev+0x122/0x260
       vhci_release+0x4f/0x90
       __fput+0x102/0x430
       task_work_run+0xf1/0x160
       ? __pfx_task_work_run+0x10/0x10
       ? mark_held_locks+0x24/0x90
       exit_to_user_mode_prepare+0x170/0x180
       syscall_exit_to_user_mode+0x19/0x50
       do_syscall_64+0x4e/0x90
       entry_SYSCALL_64_after_hwframe+0x70/0xda
      
      Fixes: 0f00cd32 ("Bluetooth: Free potentially unfreed SCO connection")
      Link: https://syzkaller.appspot.com/bug?extid=8bb72f86fc823817bc5d
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      5dc7d23e
    • Luiz Augusto von Dentz's avatar
      Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt · 975abc0c
      Luiz Augusto von Dentz authored
      This attempts to fix the following trace:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.3.0-rc2-g68fcb3a7bf97 #4706 Not tainted
      ------------------------------------------------------
      sco-tester/31 is trying to acquire lock:
      ffff8880025b8070 (&hdev->lock){+.+.}-{3:3}, at:
      sco_sock_getsockopt+0x1fc/0xa90
      
      but task is already holding lock:
      ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
      sco_sock_getsockopt+0x104/0xa90
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
             lock_sock_nested+0x32/0x80
             sco_connect_cfm+0x118/0x4a0
             hci_sync_conn_complete_evt+0x1e6/0x3d0
             hci_event_packet+0x55c/0x7c0
             hci_rx_work+0x34c/0xa00
             process_one_work+0x575/0x910
             worker_thread+0x89/0x6f0
             kthread+0x14e/0x180
             ret_from_fork+0x2b/0x50
      
      -> #1 (hci_cb_list_lock){+.+.}-{3:3}:
             __mutex_lock+0x13b/0xcc0
             hci_sync_conn_complete_evt+0x1ad/0x3d0
             hci_event_packet+0x55c/0x7c0
             hci_rx_work+0x34c/0xa00
             process_one_work+0x575/0x910
             worker_thread+0x89/0x6f0
             kthread+0x14e/0x180
             ret_from_fork+0x2b/0x50
      
      -> #0 (&hdev->lock){+.+.}-{3:3}:
             __lock_acquire+0x18cc/0x3740
             lock_acquire+0x151/0x3a0
             __mutex_lock+0x13b/0xcc0
             sco_sock_getsockopt+0x1fc/0xa90
             __sys_getsockopt+0xe9/0x190
             __x64_sys_getsockopt+0x5b/0x70
             do_syscall_64+0x42/0x90
             entry_SYSCALL_64_after_hwframe+0x70/0xda
      
      other info that might help us debug this:
      
      Chain exists of:
        &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
                                     lock(hci_cb_list_lock);
                                     lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
        lock(&hdev->lock);
      
       *** DEADLOCK ***
      
      1 lock held by sco-tester/31:
       #0: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0},
       at: sco_sock_getsockopt+0x104/0xa90
      
      Fixes: 248733e8 ("Bluetooth: Allow querying of supported offload codecs over SCO socket")
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      975abc0c
    • Luiz Augusto von Dentz's avatar
      Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm · 9a8ec9e8
      Luiz Augusto von Dentz authored
      This attempts to fix the following trace:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.3.0-rc2-g0b93eeba4454 #4703 Not tainted
      ------------------------------------------------------
      kworker/u3:0/46 is trying to acquire lock:
      ffff888001fd9130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
      sco_connect_cfm+0x118/0x4a0
      
      but task is already holding lock:
      ffffffff831e3340 (hci_cb_list_lock){+.+.}-{3:3}, at:
      hci_sync_conn_complete_evt+0x1ad/0x3d0
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (hci_cb_list_lock){+.+.}-{3:3}:
             __mutex_lock+0x13b/0xcc0
             hci_sync_conn_complete_evt+0x1ad/0x3d0
             hci_event_packet+0x55c/0x7c0
             hci_rx_work+0x34c/0xa00
             process_one_work+0x575/0x910
             worker_thread+0x89/0x6f0
             kthread+0x14e/0x180
             ret_from_fork+0x2b/0x50
      
      -> #1 (&hdev->lock){+.+.}-{3:3}:
             __mutex_lock+0x13b/0xcc0
             sco_sock_connect+0xfc/0x630
             __sys_connect+0x197/0x1b0
             __x64_sys_connect+0x37/0x50
             do_syscall_64+0x42/0x90
             entry_SYSCALL_64_after_hwframe+0x70/0xda
      
      -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
             __lock_acquire+0x18cc/0x3740
             lock_acquire+0x151/0x3a0
             lock_sock_nested+0x32/0x80
             sco_connect_cfm+0x118/0x4a0
             hci_sync_conn_complete_evt+0x1e6/0x3d0
             hci_event_packet+0x55c/0x7c0
             hci_rx_work+0x34c/0xa00
             process_one_work+0x575/0x910
             worker_thread+0x89/0x6f0
             kthread+0x14e/0x180
             ret_from_fork+0x2b/0x50
      
      other info that might help us debug this:
      
      Chain exists of:
        sk_lock-AF_BLUETOOTH-BTPROTO_SCO --> &hdev->lock --> hci_cb_list_lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(hci_cb_list_lock);
                                     lock(&hdev->lock);
                                     lock(hci_cb_list_lock);
        lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
      
       *** DEADLOCK ***
      
      4 locks held by kworker/u3:0/46:
       #0: ffff8880028d1130 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
       process_one_work+0x4c0/0x910
       #1: ffff8880013dfde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
       at: process_one_work+0x4c0/0x910
       #2: ffff8880025d8070 (&hdev->lock){+.+.}-{3:3}, at:
       hci_sync_conn_complete_evt+0xa6/0x3d0
       #3: ffffffffb79e3340 (hci_cb_list_lock){+.+.}-{3:3}, at:
       hci_sync_conn_complete_evt+0x1ad/0x3d0
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      9a8ec9e8
    • Sasha Finkelstein's avatar
      bluetooth: btbcm: Fix logic error in forming the board name. · b76abe46
      Sasha Finkelstein authored
      This patch fixes an incorrect loop exit condition in code that replaces
      '/' symbols in the board name. There might also be a memory corruption
      issue here, but it is unlikely to be a real problem.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarSasha Finkelstein <fnkl.kernel@gmail.com>
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      b76abe46
    • Zheng Wang's avatar
      Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition · 73f7b171
      Zheng Wang authored
      In btsdio_probe, the data->work is bound with btsdio_work. It will be
      started in btsdio_send_frame.
      
      If the btsdio_remove runs with a unfinished work, there may be a race
      condition that hdev is freed but used in btsdio_work. Fix it by
      canceling the work before do cleanup in btsdio_remove.
      Signed-off-by: default avatarZheng Wang <zyytlz.wz@163.com>
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      73f7b171
    • Min Li's avatar
      Bluetooth: Fix race condition in hidp_session_thread · c95930ab
      Min Li authored
      There is a potential race condition in hidp_session_thread that may
      lead to use-after-free. For instance, the timer is active while
      hidp_del_timer is called in hidp_session_thread(). After hidp_session_put,
      then 'session' will be freed, causing kernel panic when hidp_idle_timeout
      is running.
      
      The solution is to use del_timer_sync instead of del_timer.
      
      Here is the call trace:
      
      ? hidp_session_probe+0x780/0x780
      call_timer_fn+0x2d/0x1e0
      __run_timers.part.0+0x569/0x940
      hidp_session_probe+0x780/0x780
      call_timer_fn+0x1e0/0x1e0
      ktime_get+0x5c/0xf0
      lapic_next_deadline+0x2c/0x40
      clockevents_program_event+0x205/0x320
      run_timer_softirq+0xa9/0x1b0
      __do_softirq+0x1b9/0x641
      __irq_exit_rcu+0xdc/0x190
      irq_exit_rcu+0xe/0x20
      sysvec_apic_timer_interrupt+0xa1/0xc0
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMin Li <lm0963hack@gmail.com>
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      c95930ab
    • Luiz Augusto von Dentz's avatar
      Bluetooth: Fix printing errors if LE Connection times out · b62e7220
      Luiz Augusto von Dentz authored
      This fixes errors like bellow when LE Connection times out since that
      is actually not a controller error:
      
       Bluetooth: hci0: Opcode 0x200d failed: -110
       Bluetooth: hci0: request failed to create LE connection: err -110
      
      Instead the code shall properly detect if -ETIMEDOUT is returned and
      send HCI_OP_LE_CREATE_CONN_CANCEL to give up on the connection.
      
      Link: https://github.com/bluez/bluez/issues/340
      Fixes: 8e8b92ee ("Bluetooth: hci_sync: Add hci_le_create_conn_sync")
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      b62e7220
    • Luiz Augusto von Dentz's avatar
      Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure · 19cf60bf
      Luiz Augusto von Dentz authored
      hci_connect_le_scan_cleanup shall always be invoked to cleanup the
      states and re-enable passive scanning if necessary, otherwise it may
      cause the pending action to stay active causing multiple attempts to
      connect.
      
      Fixes: 9b3628d7 ("Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted")
      Signed-off-by: default avatarLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
      19cf60bf
  5. 09 Apr, 2023 3 commits
  6. 08 Apr, 2023 4 commits
    • Douglas Anderson's avatar
      r8152: Add __GFP_NOWARN to big allocations · 5cc33f13
      Douglas Anderson authored
      When memory is a little tight on my system, it's pretty easy to see
      warnings that look like this.
      
        ksoftirqd/0: page allocation failure: order:3, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
        ...
        Call trace:
         dump_backtrace+0x0/0x1e8
         show_stack+0x20/0x2c
         dump_stack_lvl+0x60/0x78
         dump_stack+0x18/0x38
         warn_alloc+0x104/0x174
         __alloc_pages+0x588/0x67c
         alloc_rx_agg+0xa0/0x190 [r8152 ...]
         r8152_poll+0x270/0x760 [r8152 ...]
         __napi_poll+0x44/0x1ec
         net_rx_action+0x100/0x300
         __do_softirq+0xec/0x38c
         run_ksoftirqd+0x38/0xec
         smpboot_thread_fn+0xb8/0x248
         kthread+0x134/0x154
         ret_from_fork+0x10/0x20
      
      On a fragmented system it's normal that order 3 allocations will
      sometimes fail, especially atomic ones. The driver handles these
      failures fine and the WARN just creates spam in the logs for this
      case. The __GFP_NOWARN flag is exactly for this situation, so add it
      to the allocation.
      
      NOTE: my testing is on a 5.15 system, but there should be no reason
      that this would be fundamentally different on a mainline kernel.
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Acked-by: default avatarHayes Wang <hayeswang@realtek.com>
      Link: https://lore.kernel.org/r/20230406171411.1.I84dbef45786af440fd269b71e9436a96a8e7a152@changeidSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5cc33f13
    • Radu Pirea (OSS)'s avatar
      net: phy: nxp-c45-tja11xx: fix unsigned long multiplication overflow · bdaaecc1
      Radu Pirea (OSS) authored
      Any multiplication between GENMASK(31, 0) and a number bigger than 1
      will be truncated because of the overflow, if the size of unsigned long
      is 32 bits.
      
      Replaced GENMASK with GENMASK_ULL to make sure that multiplication will
      be between 64 bits values.
      
      Cc: <stable@vger.kernel.org> # 5.15+
      Fixes: 514def5d ("phy: nxp-c45-tja11xx: add timestamping support")
      Signed-off-by: default avatarRadu Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20230406095953.75622-1-radu-nicolae.pirea@oss.nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      bdaaecc1
    • Felix Huettner's avatar
      net: openvswitch: fix race on port output · 066b8678
      Felix Huettner authored
      assume the following setup on a single machine:
      1. An openvswitch instance with one bridge and default flows
      2. two network namespaces "server" and "client"
      3. two ovs interfaces "server" and "client" on the bridge
      4. for each ovs interface a veth pair with a matching name and 32 rx and
         tx queues
      5. move the ends of the veth pairs to the respective network namespaces
      6. assign ip addresses to each of the veth ends in the namespaces (needs
         to be the same subnet)
      7. start some http server on the server network namespace
      8. test if a client in the client namespace can reach the http server
      
      when following the actions below the host has a chance of getting a cpu
      stuck in a infinite loop:
      1. send a large amount of parallel requests to the http server (around
         3000 curls should work)
      2. in parallel delete the network namespace (do not delete interfaces or
         stop the server, just kill the namespace)
      
      there is a low chance that this will cause the below kernel cpu stuck
      message. If this does not happen just retry.
      Below there is also the output of bpftrace for the functions mentioned
      in the output.
      
      The series of events happening here is:
      1. the network namespace is deleted calling
         `unregister_netdevice_many_notify` somewhere in the process
      2. this sets first `NETREG_UNREGISTERING` on both ends of the veth and
         then runs `synchronize_net`
      3. it then calls `call_netdevice_notifiers` with `NETDEV_UNREGISTER`
      4. this is then handled by `dp_device_event` which calls
         `ovs_netdev_detach_dev` (if a vport is found, which is the case for
         the veth interface attached to ovs)
      5. this removes the rx_handlers of the device but does not prevent
         packages to be sent to the device
      6. `dp_device_event` then queues the vport deletion to work in
         background as a ovs_lock is needed that we do not hold in the
         unregistration path
      7. `unregister_netdevice_many_notify` continues to call
         `netdev_unregister_kobject` which sets `real_num_tx_queues` to 0
      8. port deletion continues (but details are not relevant for this issue)
      9. at some future point the background task deletes the vport
      
      If after 7. but before 9. a packet is send to the ovs vport (which is
      not deleted at this point in time) which forwards it to the
      `dev_queue_xmit` flow even though the device is unregistering.
      In `skb_tx_hash` (which is called in the `dev_queue_xmit`) path there is
      a while loop (if the packet has a rx_queue recorded) that is infinite if
      `dev->real_num_tx_queues` is zero.
      
      To prevent this from happening we update `do_output` to handle devices
      without carrier the same as if the device is not found (which would
      be the code path after 9. is done).
      
      Additionally we now produce a warning in `skb_tx_hash` if we will hit
      the infinite loop.
      
      bpftrace (first word is function name):
      
      __dev_queue_xmit server: real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
      netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
      dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 2, reg_state: 1
      synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
      synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
      synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
      synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
      dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 6, reg_state: 2
      ovs_netdev_detach_dev server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, reg_state: 2
      netdev_rx_handler_unregister server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
      synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
      netdev_rx_handler_unregister ret server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
      dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 27, reg_state: 2
      dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 22, reg_state: 2
      dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 18, reg_state: 2
      netdev_unregister_kobject: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024
      synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
      ovs_vport_send server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
      __dev_queue_xmit server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
      netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
      broken device server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024
      ovs_dp_detach_port server: real_num_tx_queues: 0 cpu 9, pid: 9124, tid: 9124, reg_state: 2
      synchronize_rcu_expedited: cpu 9, pid: 33604, tid: 33604
      
      stuck message:
      
      watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [curl:1929279]
      Modules linked in: veth pktgen bridge stp llc ip_set_hash_net nft_counter xt_set nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tls binfmt_misc nls_iso8859_1 input_leds joydev serio_raw dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net ahci net_failover crypto_simd cryptd psmouse libahci virtio_blk failover
      CPU: 5 PID: 1929279 Comm: curl Not tainted 5.15.0-67-generic #74-Ubuntu
      Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:netdev_pick_tx+0xf1/0x320
      Code: 00 00 8d 48 ff 0f b7 c1 66 39 ca 0f 86 e9 01 00 00 45 0f b7 ff 41 39 c7 0f 87 5b 01 00 00 44 29 f8 41 39 c7 0f 87 4f 01 00 00 <eb> f2 0f 1f 44 00 00 49 8b 94 24 28 04 00 00 48 85 d2 0f 84 53 01
      RSP: 0018:ffffb78b40298820 EFLAGS: 00000246
      RAX: 0000000000000000 RBX: ffff9c8773adc2e0 RCX: 000000000000083f
      RDX: 0000000000000000 RSI: ffff9c8773adc2e0 RDI: ffff9c870a25e000
      RBP: ffffb78b40298858 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c870a25e000
      R13: ffff9c870a25e000 R14: ffff9c87fe043480 R15: 0000000000000000
      FS:  00007f7b80008f00(0000) GS:ffff9c8e5f740000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7b80f6a0b0 CR3: 0000000329d66000 CR4: 0000000000350ee0
      Call Trace:
       <IRQ>
       netdev_core_pick_tx+0xa4/0xb0
       __dev_queue_xmit+0xf8/0x510
       ? __bpf_prog_exit+0x1e/0x30
       dev_queue_xmit+0x10/0x20
       ovs_vport_send+0xad/0x170 [openvswitch]
       do_output+0x59/0x180 [openvswitch]
       do_execute_actions+0xa80/0xaa0 [openvswitch]
       ? kfree+0x1/0x250
       ? kfree+0x1/0x250
       ? kprobe_perf_func+0x4f/0x2b0
       ? flow_lookup.constprop.0+0x5c/0x110 [openvswitch]
       ovs_execute_actions+0x4c/0x120 [openvswitch]
       ovs_dp_process_packet+0xa1/0x200 [openvswitch]
       ? ovs_ct_update_key.isra.0+0xa8/0x120 [openvswitch]
       ? ovs_ct_fill_key+0x1d/0x30 [openvswitch]
       ? ovs_flow_key_extract+0x2db/0x350 [openvswitch]
       ovs_vport_receive+0x77/0xd0 [openvswitch]
       ? __htab_map_lookup_elem+0x4e/0x60
       ? bpf_prog_680e8aff8547aec1_kfree+0x3b/0x714
       ? trace_call_bpf+0xc8/0x150
       ? kfree+0x1/0x250
       ? kfree+0x1/0x250
       ? kprobe_perf_func+0x4f/0x2b0
       ? kprobe_perf_func+0x4f/0x2b0
       ? __mod_memcg_lruvec_state+0x63/0xe0
       netdev_port_receive+0xc4/0x180 [openvswitch]
       ? netdev_port_receive+0x180/0x180 [openvswitch]
       netdev_frame_hook+0x1f/0x40 [openvswitch]
       __netif_receive_skb_core.constprop.0+0x23d/0xf00
       __netif_receive_skb_one_core+0x3f/0xa0
       __netif_receive_skb+0x15/0x60
       process_backlog+0x9e/0x170
       __napi_poll+0x33/0x180
       net_rx_action+0x126/0x280
       ? ttwu_do_activate+0x72/0xf0
       __do_softirq+0xd9/0x2e7
       ? rcu_report_exp_cpu_mult+0x1b0/0x1b0
       do_softirq+0x7d/0xb0
       </IRQ>
       <TASK>
       __local_bh_enable_ip+0x54/0x60
       ip_finish_output2+0x191/0x460
       __ip_finish_output+0xb7/0x180
       ip_finish_output+0x2e/0xc0
       ip_output+0x78/0x100
       ? __ip_finish_output+0x180/0x180
       ip_local_out+0x5e/0x70
       __ip_queue_xmit+0x184/0x440
       ? tcp_syn_options+0x1f9/0x300
       ip_queue_xmit+0x15/0x20
       __tcp_transmit_skb+0x910/0x9c0
       ? __mod_memcg_state+0x44/0xa0
       tcp_connect+0x437/0x4e0
       ? ktime_get_with_offset+0x60/0xf0
       tcp_v4_connect+0x436/0x530
       __inet_stream_connect+0xd4/0x3a0
       ? kprobe_perf_func+0x4f/0x2b0
       ? aa_sk_perm+0x43/0x1c0
       inet_stream_connect+0x3b/0x60
       __sys_connect_file+0x63/0x70
       __sys_connect+0xa6/0xd0
       ? setfl+0x108/0x170
       ? do_fcntl+0xe8/0x5a0
       __x64_sys_connect+0x18/0x20
       do_syscall_64+0x5c/0xc0
       ? __x64_sys_fcntl+0xa9/0xd0
       ? exit_to_user_mode_prepare+0x37/0xb0
       ? syscall_exit_to_user_mode+0x27/0x50
       ? do_syscall_64+0x69/0xc0
       ? __sys_setsockopt+0xea/0x1e0
       ? exit_to_user_mode_prepare+0x37/0xb0
       ? syscall_exit_to_user_mode+0x27/0x50
       ? __x64_sys_setsockopt+0x1f/0x30
       ? do_syscall_64+0x69/0xc0
       ? irqentry_exit+0x1d/0x30
       ? exc_page_fault+0x89/0x170
       entry_SYSCALL_64_after_hwframe+0x61/0xcb
      RIP: 0033:0x7f7b8101c6a7
      Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89
      RSP: 002b:00007ffffd6b2198 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b8101c6a7
      RDX: 0000000000000010 RSI: 00007ffffd6b2360 RDI: 0000000000000005
      RBP: 0000561f1370d560 R08: 00002795ad21d1ac R09: 0030312e302e302e
      R10: 00007ffffd73f080 R11: 0000000000000246 R12: 0000561f1370c410
      R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000000
       </TASK>
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Co-developed-by: default avatarLuca Czesla <luca.czesla@mail.schwarz>
      Signed-off-by: default avatarLuca Czesla <luca.czesla@mail.schwarz>
      Signed-off-by: default avatarFelix Huettner <felix.huettner@mail.schwarz>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/ZC0pBXBAgh7c76CA@kernel-bug-kernel-bugSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      066b8678
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 029294d0
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2023-04-08
      
      We've added 4 non-merge commits during the last 11 day(s) which contain
      a total of 5 files changed, 39 insertions(+), 6 deletions(-).
      
      The main changes are:
      
      1) Fix BPF TCP socket iterator to use correct helper for dropping
         socket's refcount, that is, sock_gen_put instead of sock_put,
         from Martin KaFai Lau.
      
      2) Fix a BTI exception splat in BPF trampoline-generated code on arm64,
         from Xu Kuohai.
      
      3) Fix a LongArch JIT error from missing BPF_NOSPEC no-op, from George Guo.
      
      4) Fix dynamic XDP feature detection of veth in xdp_redirect selftest,
         from Lorenzo Bianconi.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver
        bpf, arm64: Fixed a BTI error on returning to patched function
        LoongArch, bpf: Fix jit to skip speculation barrier opcode
        bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
      ====================
      
      Link: https://lore.kernel.org/r/20230407224642.30906-1-daniel@iogearbox.netSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      029294d0
  7. 07 Apr, 2023 6 commits
    • Ahmed Zaki's avatar
      iavf: remove active_cvlans and active_svlans bitmaps · 9c85b7fa
      Ahmed Zaki authored
      The VLAN filters info is currently being held in a list and 2 bitmaps
      (active_cvlans and active_svlans). We are experiencing some racing where
      data is not in sync in the list and bitmaps. For example, the VLAN is
      initially added to the list but only when the PF replies, it is added to
      the bitmap. If a user adds many V2 VLANS before the PF responds:
      
          while [ $((i++)) ]
              ip l add l eth0 name eth0.$i type vlan id $i
      
      we might end up with more VLAN list entries than the designated limit.
      Also, The "ip link show" will show more links added than the PF limit.
      
      On the other and, the bitmaps are only used to check the number of VLAN
      filters and to re-enable the filters when the interface goes from DOWN to
      UP.
      
      This patch gets rid of the bitmaps and uses the list only. To do that,
      the states of the VLAN filter are modified:
      1 - IAVF_VLAN_REMOVE: the entry needs to be totally removed after informing
        the PF. This is the "ip link del eth0.$i" path.
      2 - IAVF_VLAN_DISABLE: (new) the netdev went down. The filter needs to be
        removed from the PF and then marked INACTIVE.
      3 - IAVF_VLAN_INACTIVE: (new) no PF filter exists, but the user did not
        delete the VLAN.
      
      Fixes: 48ccc43e ("iavf: Add support VIRTCHNL_VF_OFFLOAD_VLAN_V2 during netdev config")
      Signed-off-by: default avatarAhmed Zaki <ahmed.zaki@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      9c85b7fa
    • Ahmed Zaki's avatar
      iavf: refactor VLAN filter states · 0c0da0e9
      Ahmed Zaki authored
      The VLAN filter states are currently being saved as individual bits.
      This is error prone as multiple bits might be mistakenly set.
      
      Fix by replacing the bits with a single state enum. Also, add an
      "ACTIVE" state for filters that are accepted by the PF.
      Signed-off-by: default avatarAhmed Zaki <ahmed.zaki@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      0c0da0e9
    • David S. Miller's avatar
      Merge branch 'bonding-ns-validation-fixes' · b9881d9a
      David S. Miller authored
      Hangbin Liu says:
      
      ====================
      bonding: fix ns validation on backup slaves
      
      The first patch fixed a ns validation issue on backup slaves. The second
      patch re-format the bond option test and add a test lib file. The third
      patch add the arp validate regression test for the kernel patch.
      
      Here is the new bonding option test without the kernel fix:
      
      ]# ./bond_options.sh
      TEST: prio (active-backup miimon primary_reselect 0)           [ OK ]
      TEST: prio (active-backup miimon primary_reselect 1)           [ OK ]
      TEST: prio (active-backup miimon primary_reselect 2)           [ OK ]
      TEST: prio (active-backup arp_ip_target primary_reselect 0)    [ OK ]
      TEST: prio (active-backup arp_ip_target primary_reselect 1)    [ OK ]
      TEST: prio (active-backup arp_ip_target primary_reselect 2)    [ OK ]
      TEST: prio (active-backup ns_ip6_target primary_reselect 0)    [ OK ]
      TEST: prio (active-backup ns_ip6_target primary_reselect 1)    [ OK ]
      TEST: prio (active-backup ns_ip6_target primary_reselect 2)    [ OK ]
      TEST: prio (balance-tlb miimon primary_reselect 0)             [ OK ]
      TEST: prio (balance-tlb miimon primary_reselect 1)             [ OK ]
      TEST: prio (balance-tlb miimon primary_reselect 2)             [ OK ]
      TEST: prio (balance-tlb arp_ip_target primary_reselect 0)      [ OK ]
      TEST: prio (balance-tlb arp_ip_target primary_reselect 1)      [ OK ]
      TEST: prio (balance-tlb arp_ip_target primary_reselect 2)      [ OK ]
      TEST: prio (balance-tlb ns_ip6_target primary_reselect 0)      [ OK ]
      TEST: prio (balance-tlb ns_ip6_target primary_reselect 1)      [ OK ]
      TEST: prio (balance-tlb ns_ip6_target primary_reselect 2)      [ OK ]
      TEST: prio (balance-alb miimon primary_reselect 0)             [ OK ]
      TEST: prio (balance-alb miimon primary_reselect 1)             [ OK ]
      TEST: prio (balance-alb miimon primary_reselect 2)             [ OK ]
      TEST: prio (balance-alb arp_ip_target primary_reselect 0)      [ OK ]
      TEST: prio (balance-alb arp_ip_target primary_reselect 1)      [ OK ]
      TEST: prio (balance-alb arp_ip_target primary_reselect 2)      [ OK ]
      TEST: prio (balance-alb ns_ip6_target primary_reselect 0)      [ OK ]
      TEST: prio (balance-alb ns_ip6_target primary_reselect 1)      [ OK ]
      TEST: prio (balance-alb ns_ip6_target primary_reselect 2)      [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 0)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 1)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 2)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 3)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 4)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 5)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 6)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 0)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 1)  [ OK ]
      TEST: arp_validate (interface eth1 mii_status DOWN)                 [FAIL]
      TEST: arp_validate (interface eth2 mii_status DOWN)                 [FAIL]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 2)  [FAIL]
      TEST: arp_validate (interface eth1 mii_status DOWN)                 [FAIL]
      TEST: arp_validate (interface eth2 mii_status DOWN)                 [FAIL]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 3)  [FAIL]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 4)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 5)  [ OK ]
      TEST: arp_validate (interface eth1 mii_status DOWN)                 [FAIL]
      TEST: arp_validate (interface eth2 mii_status DOWN)                 [FAIL]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 6)  [FAIL]
      
      Here is the test result after the kernel fix:
      TEST: arp_validate (active-backup arp_ip_target arp_validate 0)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 1)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 2)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 3)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 4)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 5)  [ OK ]
      TEST: arp_validate (active-backup arp_ip_target arp_validate 6)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 0)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 1)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 2)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 3)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 4)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 5)  [ OK ]
      TEST: arp_validate (active-backup ns_ip6_target arp_validate 6)  [ OK ]
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b9881d9a
    • Hangbin Liu's avatar
      selftests: bonding: add arp validate test · 2e825f8a
      Hangbin Liu authored
      This patch add bonding arp validate tests with mode active backup,
      monitor arp_ip_target and ns_ip6_target. It also checks mii_status
      to make sure all slaves are UP.
      Acked-by: default avatarJonathan Toppins <jtoppins@redhat.com>
      Acked-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2e825f8a
    • Hangbin Liu's avatar
      selftests: bonding: re-format bond option tests · 481b56e0
      Hangbin Liu authored
      To improve the testing process for bond options, A new bond topology lib
      is added to our testing setup. The current option_prio.sh file will be
      renamed to bond_options.sh so that all bonding options can be tested here.
      Specifically, for priority testing, we will run all tests using modes
      1, 5, and 6. These changes will help us streamline the testing process
      and ensure that our bond options are rigorously evaluated.
      Acked-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Acked-by: default avatarJonathan Toppins <jtoppins@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      481b56e0
    • Hangbin Liu's avatar
      bonding: fix ns validation on backup slaves · 4598380f
      Hangbin Liu authored
      When arp_validate is set to 2, 3, or 6, validation is performed for
      backup slaves as well. As stated in the bond documentation, validation
      involves checking the broadcast ARP request sent out via the active
      slave. This helps determine which slaves are more likely to function in
      the event of an active slave failure.
      
      However, when the target is an IPv6 address, the NS message sent from
      the active interface is not checked on backup slaves. Additionally,
      based on the bond_arp_rcv() rule b, we must reverse the saddr and daddr
      when checking the NS message.
      
      Note that when checking the NS message, the destination address is a
      multicast address. Therefore, we must convert the target address to
      solicited multicast in the bond_get_targets_ip6() function.
      
      Prior to the fix, the backup slaves had a mii status of "down", but
      after the fix, all of the slaves' mii status was updated to "UP".
      
      Fixes: 4e24be01 ("bonding: add new parameter ns_targets")
      Reviewed-by: default avatarJonathan Toppins <jtoppins@redhat.com>
      Acked-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4598380f