1. 09 Feb, 2023 20 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 8697a258
      Jakub Kicinski authored
      net/devlink/leftover.c / net/core/devlink.c:
        565b4824 ("devlink: change port event netdev notifier from per-net to global")
        f05bd8eb ("devlink: move code to a dedicated directory")
        687125b5 ("devlink: split out core code")
      https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8697a258
    • net: enable usercopy for skb_small_head_cache · 0b34d680
      Eric Dumazet authored
      syzbot and other bots reported that we have to enable
      user copy to/from skb->head. [1]
      
      We can prevent access to skb_shared_info, which is a nice
      improvement over standard kmem_cache.
      
      Layout of these kmem_cache objects is:
      
      < SKB_SMALL_HEAD_HEADROOM >< struct skb_shared_info >
      
      usercopy: Kernel memory overwrite attempt detected to SLUB object 'skbuff_small_head' (offset 32, size 20)!
      ------------[ cut here ]------------
      kernel BUG at mm/usercopy.c:102!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6-syzkaller-01425-gcb6b2e11 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
      RIP: 0010:usercopy_abort+0xbd/0xbf mm/usercopy.c:102
      Code: e8 ee ad ba f7 49 89 d9 4d 89 e8 4c 89 e1 41 56 48 89 ee 48 c7 c7 20 2b 5b 8a ff 74 24 08 41 57 48 8b 54 24 20 e8 7a 17 fe ff <0f> 0b e8 c2 ad ba f7 e8 7d fb 08 f8 48 8b 0c 24 49 89 d8 44 89 ea
      RSP: 0000:ffffc90000067a48 EFLAGS: 00010286
      RAX: 000000000000006b RBX: ffffffff8b5b6ea0 RCX: 0000000000000000
      RDX: ffff8881401c0000 RSI: ffffffff8166195c RDI: fffff5200000cf3b
      RBP: ffffffff8a5b2a60 R08: 000000000000006b R09: 0000000000000000
      R10: 0000000080000000 R11: 0000000000000000 R12: ffffffff8bf2a925
      R13: ffffffff8a5b29a0 R14: 0000000000000014 R15: ffffffff8a5b2960
      FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000000c48e000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      __check_heap_object+0xdd/0x110 mm/slub.c:4761
      check_heap_object mm/usercopy.c:196 [inline]
      __check_object_size mm/usercopy.c:251 [inline]
      __check_object_size+0x1da/0x5a0 mm/usercopy.c:213
      check_object_size include/linux/thread_info.h:199 [inline]
      check_copy_size include/linux/thread_info.h:235 [inline]
      copy_from_iter include/linux/uio.h:186 [inline]
      copy_from_iter_full include/linux/uio.h:194 [inline]
      memcpy_from_msg include/linux/skbuff.h:3977 [inline]
      qrtr_sendmsg+0x65f/0x970 net/qrtr/af_qrtr.c:965
      sock_sendmsg_nosec net/socket.c:722 [inline]
      sock_sendmsg+0xde/0x190 net/socket.c:745
      say_hello+0xf6/0x170 net/qrtr/ns.c:325
      qrtr_ns_init+0x220/0x2b0 net/qrtr/ns.c:804
      qrtr_proto_init+0x59/0x95 net/qrtr/af_qrtr.c:1296
      do_one_initcall+0x141/0x790 init/main.c:1306
      do_initcall_level init/main.c:1379 [inline]
      do_initcalls init/main.c:1395 [inline]
      do_basic_setup init/main.c:1414 [inline]
      kernel_init_freeable+0x6f9/0x782 init/main.c:1634
      kernel_init+0x1e/0x1d0 init/main.c:1522
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      </TASK>
      
      Fixes: bf9f1baa ("net: add dedicated kmem_cache for typical/small skb->head")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Link: https://lore.kernel.org/linux-next/CA+G9fYs-i-c2KTSA7Ai4ES_ZESY1ZnM=Zuo8P1jN00oed6KHMA@mail.gmail.com
      Link: https://lore.kernel.org/r/20230208142508.3278406-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0b34d680
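The commit above describes a slab object laid out as headroom followed by struct skb_shared_info, with user copies permitted only into the headroom. A minimal sketch of that kind of window check follows, assuming a cache that records an allowed offset and size. The struct and function names are hypothetical, and the 704-byte headroom used in the usage note is an illustrative value, not the kernel's actual SKB_SMALL_HEAD_HEADROOM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two values that describe which part of a
 * slab object user space may copy to or from. */
struct usercopy_window {
    size_t useroffset; /* first byte that may be copied */
    size_t usersize;   /* length of the permitted region */
};

/* True when [offset, offset + n) lies entirely inside the window. */
static bool usercopy_allowed(const struct usercopy_window *w,
                             size_t offset, size_t n)
{
    assert(w != NULL);
    return offset >= w->useroffset &&
           offset - w->useroffset <= w->usersize &&
           n <= w->usersize - (offset - w->useroffset);
}
```

With an empty window (0, 0), the copy from the report (offset 32, size 20) is rejected. With a window covering an assumed 704-byte headroom, the same copy is accepted, while a copy that would run past the headroom into the shared info is still refused.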
    • Merge tag 'net-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 35674e78
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from can and ipsec subtrees.
      
        Current release - regressions:
      
         - sched: fix off by one in htb_activate_prios()
      
         - eth: mana: fix accessing freed irq affinity_hint
      
         - eth: ice: fix out-of-bounds KASAN warning in virtchnl
      
        Current release - new code bugs:
      
         - eth: mtk_eth_soc: enable special tag when any MAC uses DSA
      
        Previous releases - always broken:
      
         - core: fix sk->sk_txrehash default
      
         - neigh: make sure used and confirmed times are valid
      
         - mptcp: be careful on subflow status propagation on errors
      
         - xfrm: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
      
         - phylink: move phy_device_free() to correctly release phy device
      
         - eth: mlx5:
            - fix crash unsetting rx-vlan-filter in switchdev mode
            - fix hang on firmware reset
            - serialize module cleanup with reload and remove"
      
      * tag 'net-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
        selftests: forwarding: lib: quote the sysctl values
        net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used
        rds: rds_rm_zerocopy_callback() use list_first_entry()
        net: txgbe: Update support email address
        selftests: Fix failing VXLAN VNI filtering test
        selftests: mptcp: stop tests earlier
        selftests: mptcp: allow more slack for slow test-case
        mptcp: be careful on subflow status propagation on errors
        mptcp: fix locking for in-kernel listener creation
        mptcp: fix locking for setsockopt corner-case
        mptcp: do not wait for bare sockets' timeout
        net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0
        nfp: ethtool: fix the bug of setting unsupported port speed
        txhash: fix sk->sk_txrehash default
        net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg()
        net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSA
        net: sched: sch: Fix off by one in htb_activate_prios()
        igc: Add ndo_tx_timeout support
        net: mana: Fix accessing freed irq affinity_hint
        hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMIC
        ...
      35674e78
    • Merge tag 'for-linus-2023020901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 0b028189
      Linus Torvalds authored
      Pull HID fixes from Benjamin Tissoires:
      
       - fix potential infinite loop with a badly crafted HID device (Xin
         Zhao)
      
       - fix regression from 6.1 in USB Logitech devices that could leave
         their mouse wheel not working (Bastien Nocera)
      
       - clean up in AMD sensors, which fixes a long time resume bug (Mario
         Limonciello)
      
       - few device small fixes and quirks
      
      * tag 'for-linus-2023020901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: Ignore battery for ELAN touchscreen 29DF on HP
        HID: amd_sfh: if no sensors are enabled, clean up
        HID: logitech: Disable hi-res scrolling on USB
        HID: core: Fix deadloop in hid_apply_multiplier.
        HID: Ignore battery for Elan touchscreen on Asus TP420IA
        HID: elecom: add support for TrackBall 056E:011C
      0b028189
    • Merge tag '6.2-rc8-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6 · 94a1f56d
      Linus Torvalds authored
      Pull cifs fix from Steve French:
       "Small fix for use after free"
      
      * tag '6.2-rc8-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix use-after-free in rdata->read_into_pages()
      94a1f56d
    • selftests: forwarding: lib: quote the sysctl values · 3a082086
      Hangbin Liu authored
      When setting or restoring a sysctl value, we should quote the value, as
      some keys may have multiple values, e.g. net.ipv4.ping_group_range
      
      Fixes: f5ae5778 ("selftests: forwarding: lib: Add sysctl_set(), sysctl_restore()")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://lore.kernel.org/r/20230208032110.879205-1-liuhangbin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      3a082086
    • net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used · 2fcde9fe
      Vladimir Oltean authored
      While running this selftest which usually passes:
      
      ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
      TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
      TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
      TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]
      
      if I start PTP timestamping then run it again (debug prints added by me),
      the unknown IPv6 MC traffic is seen by the CPU port even when it should
      have been dropped:
      
      ~/selftests/drivers/net/dsa# ptp4l -i swp0 -2 -P -m
      ptp4l[225.410]: selected /dev/ptp1 as PTP clock
      [  225.445746] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_add: port 0 adding L2 PTP trap
      [  225.453815] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP event trap
      [  225.462703] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP general trap
      [  225.471768] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP event trap
      [  225.480651] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP general trap
      ptp4l[225.488]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
      ptp4l[225.488]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
      ^C
      ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
      TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
      TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
      TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group                         [FAIL]
              reception succeeded, but should have failed
      TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]
      
      The PGID_MCIPV6 is configured correctly to not flood to the CPU,
      I checked that.
      
      Furthermore, when I disable PTP RX timestamping again (ptp4l doesn't do
      that when it exits), packets are RX filtered again as they should be:
      
      ~/selftests/drivers/net/dsa# hwstamp_ctl -i swp0 -r 0
      [  218.202854] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_del: port 0 removing L2 PTP trap
      [  218.212656] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP event trap
      [  218.222975] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP general trap
      [  218.233133] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP event trap
      [  218.242251] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP general trap
      current settings:
      tx_type 1
      rx_filter 12
      new settings:
      tx_type 1
      rx_filter 0
      ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0
      TEST: swp0: Unicast IPv4 to primary MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to macvlan MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address                     [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, promisc            [ OK ]
      TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti           [ OK ]
      TEST: swp0: Multicast IPv4 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv4 to unknown group, allmulti               [ OK ]
      TEST: swp0: Multicast IPv6 to joined group                          [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group                         [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, promisc                [ OK ]
      TEST: swp0: Multicast IPv6 to unknown group, allmulti               [ OK ]
      
      So it's clear that something in the PTP RX trapping logic went wrong.
      
      Looking a bit at the code, I can see that there are 4 typos, which
      populate "ipv4" VCAP IS2 key filter fields for IPv6 keys.
      
      VCAP IS2 keys of type OCELOT_VCAP_KEY_IPV4 and OCELOT_VCAP_KEY_IPV6 are
      handled by is2_entry_set(). OCELOT_VCAP_KEY_IPV4 looks at
      &filter->key.ipv4, and OCELOT_VCAP_KEY_IPV6 at &filter->key.ipv6.
      Simply put, when we populate the wrong key field, &filter->key.ipv6
      fields "proto.mask" and "proto.value" remain all zeroes (or "don't care").
      So is2_entry_set() will enter the "else" of this "if" condition:
      
      	if (msk == 0xff && (val == IPPROTO_TCP || val == IPPROTO_UDP))
      
      and proceed to ignore the "proto" field. The resulting rule will match
      on all IPv6 traffic, trapping it to the CPU.
      
      This is the reason why the local_termination.sh selftest sees it,
      because control traps are stronger than the PGID_MCIPV6 used for
      flooding (from the forwarding data path).
      
      But the problem is in fact much deeper. We trap all IPv6 traffic to the
      CPU, but if we're bridged, we set skb->offload_fwd_mark = 1, so software
      forwarding will not take place and IPv6 traffic will never reach its
      destination.
      
      The fix is simple - correct the typos.
      
      I was intentionally inaccurate in the commit message about the breakage
      occurring when any PTP timestamping is enabled. In fact it only happens
      when L4 timestamping is requested (HWTSTAMP_FILTER_PTP_V2_EVENT or
      HWTSTAMP_FILTER_PTP_V2_L4_EVENT). But ptp4l requests a larger RX
      timestamping filter than it needs for "-2": HWTSTAMP_FILTER_PTP_V2_EVENT.
      I wanted people skimming through git logs to not think that the bug
      doesn't affect them because they only use ptp4l in L2 mode.
      
      Fixes: 96ca08c0 ("net: mscc: ocelot: set up traps for PTP packets")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230207183117.1745754-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      2fcde9fe
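The failure mode described in the ocelot commit above (filling the "ipv4" member of a key for a rule whose type is IPv6) can be reproduced in miniature. The following is a sketch under stated assumptions: the struct layouts, the names and the helper functions are hypothetical simplifications, not the driver's definitions. Only the final condition mirrors the "if" quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PROTO_TCP 6  /* same values as IPPROTO_TCP / IPPROTO_UDP */
#define PROTO_UDP 17

struct u8_match { uint8_t value; uint8_t mask; };

/* Hypothetical, simplified key layouts: "proto" sits at different offsets. */
struct ipv4_key { uint8_t ttl; uint8_t fragment; struct u8_match proto; };
struct ipv6_key { struct u8_match proto; };

enum key_type { KEY_IPV4, KEY_IPV6 };

struct filter {
    enum key_type key_type;
    union { struct ipv4_key ipv4; struct ipv6_key ipv6; } key;
};

/* Build an IPv6 UDP trap; with_typo reproduces the wrong-member bug. */
static struct filter make_ipv6_udp_trap(bool with_typo)
{
    struct filter f;

    memset(&f, 0, sizeof(f));
    f.key_type = KEY_IPV6;
    if (with_typo) {
        f.key.ipv4.proto.value = PROTO_UDP; /* wrong member for an IPv6 key */
        f.key.ipv4.proto.mask = 0xff;
    } else {
        f.key.ipv6.proto.value = PROTO_UDP;
        f.key.ipv6.proto.mask = 0xff;
    }
    return f;
}

/* Mirrors the quoted condition: false means "proto" is ignored by the rule. */
static bool ipv6_rule_matches_on_proto(const struct filter *f)
{
    uint8_t val = f->key.ipv6.proto.value;
    uint8_t msk = f->key.ipv6.proto.mask;

    assert(f->key_type == KEY_IPV6);
    return msk == 0xff && (val == PROTO_TCP || val == PROTO_UDP);
}
```

In this sketch the rule built with the typo leaves the IPv6 "proto" value and mask at zero, so the check returns false and the protocol would be ignored, which corresponds to a rule that matches all IPv6 traffic.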
    • rds: rds_rm_zerocopy_callback() use list_first_entry() · f753a689
      Pietro Borrello authored
      rds_rm_zerocopy_callback() uses list_entry() on the head of a list
      causing a type confusion.
      Use list_first_entry() to actually access the first element of the
      rs_zcookie_queue list.
      
      Fixes: 9426bbc6 ("rds: use list structure to track information for zerocopy completion notification")
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
      Link: https://lore.kernel.org/r/20230202-rds-zerocopy-v3-1-83b0df974f9a@diag.uniroma1.it
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f753a689
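The difference between the two macros named in the rds commit above can be sketched with a small user-space re-creation of the kernel list idiom. The macros below follow the usual container_of pattern; the cookie_node and cookie_queue types and the first_cookie() helper are hypothetical and are not the RDS structures.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
/* Converts a pointer to an embedded list_head into its containing element. */
#define list_entry(ptr, type, member) container_of(ptr, type, member)
/* Converts the list HEAD into the first element, via head->next. */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

struct cookie_node { int cookie; struct list_head link; };  /* hypothetical */
struct cookie_queue { struct list_head head; };              /* hypothetical */

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Returns the first queued cookie, or -1 when the queue is empty.
 * Calling list_entry() on &q->head itself would be the type confusion:
 * the head is embedded in the queue, not in a cookie_node. */
static int first_cookie(struct cookie_queue *q)
{
    assert(q != NULL);
    if (q->head.next == &q->head)
        return -1;
    return list_first_entry(&q->head, struct cookie_node, link)->cookie;
}
```

The empty-list check matters in this sketch because head->next points back at the head when nothing is queued, in which case even list_first_entry() would not yield a real element.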
    • Merge tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 646be03e
      Jakub Kicinski authored
      Steffen Klassert says:
      
      ====================
      ipsec 2023-02-08
      
      1) Fix policy checks for nested IPsec tunnels when using
         xfrm interfaces. From Benedict Wong.
      
      2) Fix netlink message expression on 32=>64-bit
         messages translators. From Anastasia Belova.
      
      3) Prevent potential spectre v1 gadget in xfrm_xlate32_attr.
         From Eric Dumazet.
      
      4) Always consistently use time64_t in xfrm_timer_handler.
         From Eric Dumazet.
      
      5) Fix KCSAN reported bug: Multiple cpus can update use_time
         at the same time. From Eric Dumazet.
      
      6) Fix DSCP copy from IPv4 to IPv6 on interfamily tunnel.
         From Christian Hopps.
      
      * tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
        xfrm: fix bug with DSCP copy to v6 from v4 tunnel
        xfrm: annotate data-race around use_time
        xfrm: consistently use time64_t in xfrm_timer_handler()
        xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr()
        xfrm: compat: change expression for switch in xfrm_xlate64
        Fix XFRM-I support for nested ESP tunnels
      ====================
      
      Link: https://lore.kernel.org/r/20230208114322.266510-1-steffen.klassert@secunet.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      646be03e
    • Merge tag 'linux-can-next-for-6.3-20230208' of... · 5131a053
      Jakub Kicinski authored
      Merge tag 'linux-can-next-for-6.3-20230208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      can-next 2023-02-08
      
      The 1st patch is by Oliver Hartkopp and cleans up the CAN_RAW's
      raw_setsockopt() for CAN_RAW_FD_FRAMES.
      
      The 2nd patch is by me and fixes the compilation if
      CONFIG_CAN_CALC_BITTIMING is disabled. (Problem introduced in last
      pull request to net-next.)
      
      * tag 'linux-can-next-for-6.3-20230208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
        can: bittiming: can_calc_bittiming(): add missing parameter to no-op function
        can: raw: use temp variable instead of rolling back config
      ====================
      
      Link: https://lore.kernel.org/r/20230208210014.3169347-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5131a053
    • Merge tag 'mlx5-next-netdev-deadlock' of... · 9245b518
      Jakub Kicinski authored
      Merge tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
      
      Saeed Mahameed says:
      
      ====================
      mlx5-next-netdev-deadlock
      
      This series from Jiri solves a deadlock when removing a network namespace
      with mlx5 devlink instance being in it.
      The deadlock is between:
      1) mlx5_ib->unregister_netdevice_notifier()
      AND
      2) mlx5_core->devlink_reload->cleanup_net()
      
      To solve this, mlx5 netdev added/removed events are introduced to track the
      uplink netdev to be used for register_netdevice_notifier_dev_net() purposes.
      
      * tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
        RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister
        net/mlx5e: Propagate an internal event in case uplink netdev changes
        net/mlx5e: Fix trap event handling
        net/mlx5: Introduce CQE error syndrome
      ====================
      
      Link: https://lore.kernel.org/r/20230208005626.72930-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9245b518
    • net: libwx: Remove unneeded semicolon · 3ca11619
      Yang Li authored
      ./drivers/net/ethernet/wangxun/libwx/wx_lib.c:683:2-3: Unneeded semicolon
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3976
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230208004959.47553-1-yang.lee@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3ca11619
    • net: libwx: clean up one inconsistent indenting · f978fa41
      Yang Li authored
      drivers/net/ethernet/wangxun/libwx/wx_lib.c:1835 wx_setup_all_rx_resources() warn: inconsistent indenting
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3981
      Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230208013227.111605-1-yang.lee@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f978fa41
    • net: txgbe: Update support email address · 363d7c22
      Jiawen Wu authored
      Update new email address for Wangxun 10Gb NIC support team.
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Link: https://lore.kernel.org/r/20230208023035.3371250-1-jiawenwu@trustnetic.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      363d7c22
    • RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister · dca55da0
      Jiri Pirko authored
      When removing a network namespace with an mlx5 devlink instance in
      it, the following call chain is performed:

      cleanup_net (takes down_read(&pernet_ops_rwsem))
      devlink_pernet_pre_exit()
      devlink_reload()
      mlx5_devlink_reload_down()
      mlx5_unload_one_devl_locked()
      mlx5_detach_device()
      del_adev()
      mlx5r_remove()
      __mlx5_ib_remove()
      mlx5_ib_roce_cleanup()
      mlx5_remove_netdev_notifier()
      unregister_netdevice_notifier (takes down_write(&pernet_ops_rwsem))
      
      This deadlocks.
      
      Resolve this by converting to register_netdevice_notifier_dev_net()
      which does not take pernet_ops_rwsem and moves the notifier block around
      according to netdev it takes as arg.
      
      Use previously introduced netdev added/removed events to track uplink
      netdev to be used for register_netdevice_notifier_dev_net() purposes.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      dca55da0
    • net/mlx5e: Propagate an internal event in case uplink netdev changes · c7d4e6ab
      Jiri Pirko authored
      Whenever uplink netdev is set/cleared, propagate newly introduced event
      to inform notifier blocks netdev was added/removed.
      
      Move the set() helper to core.c from header, introduce clear() and
      netdev_added_event_replay() helpers. The last one is going to be called
      from rdma driver, so export it.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      c7d4e6ab
    • net/mlx5e: Fix trap event handling · 3f26a315
      Jiri Pirko authored
      The current code does not return the correct value from the event handler.
      Fix it by returning NOTIFY_* and propagating err through the newly
      introduced ctx structure.
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      3f26a315
    • Merge tag 'mlx5-fixes-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ff8ced4e
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2023-02-07
      
      This series provides bug fixes to mlx5 driver.
      
      * tag 'mlx5-fixes-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5: Serialize module cleanup with reload and remove
        net/mlx5: fw_tracer, Zero consumer index when reloading the tracer
        net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers
        net/mlx5: Expose SF firmware pages counter
        net/mlx5: Store page counters in a single array
        net/mlx5e: IPoIB, Show unknown speed instead of error
        net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode
        net/mlx5: Bridge, fix ageing of peer FDB entries
        net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic
        net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change
      ====================
      
      Link: https://lore.kernel.org/r/20230208030302.95378-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ff8ced4e
    • Merge tag 'mlx5-updates-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 7eadc0a0
      Jakub Kicinski authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2023-02-07
      
      1) Minor and trivial code Cleanups
      
      2) Minor fixes for net-next
      
      3) From Shay: dynamic FW trace strings update.
      
      * tag 'mlx5-updates-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
        net/mlx5: fw_tracer, Add support for unrecognized string
        net/mlx5: fw_tracer, Add support for strings DB update event
        net/mlx5: fw_tracer, allow 0 size string DBs
        net/mlx5: fw_tracer: Fix debug print
        net/mlx5: fs, Remove redundant assignment of size
        net/mlx5: fs_core, Remove redundant variable err
        net/mlx5: Fix memory leak in error flow of port set buffer
        net/mlx5e: Remove incorrect debugfs_create_dir NULL check in TLS
        net/mlx5e: Remove incorrect debugfs_create_dir NULL check in hairpin
        net/mlx5: fs, Remove redundant vport_number assignment
        net/mlx5e: Remove redundant code for handling vlan actions
        net/mlx5e: Don't listen to remove flows event
        net/mlx5: fw reset: Skip device ID check if PCI link up failed
        net/mlx5: Remove redundant health work lock
        mlx5: reduce stack usage in mlx5_setup_tc
      ====================
      
      Link: https://lore.kernel.org/r/20230208003712.68386-1-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7eadc0a0
    • selftests: Fix failing VXLAN VNI filtering test · b963d9d5
      Ido Schimmel authored
      iproute2 does not recognize the "group6" and "remote6" keywords. Fix by
      using "group" and "remote" instead.
      
      Before:
      
       # ./test_vxlan_vnifiltering.sh
       [...]
       Tests passed:  25
       Tests failed:   2
      
      After:
      
       # ./test_vxlan_vnifiltering.sh
       [...]
       Tests passed:  27
       Tests failed:   0
      
      Fixes: 3edf5f66 ("selftests: add new tests for vxlan vnifiltering")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Link: https://lore.kernel.org/r/20230207141819.256689-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b963d9d5
  2. 08 Feb, 2023 20 commits
    • can: bittiming: can_calc_bittiming(): add missing parameter to no-op function · 65db3d8b
      Marc Kleine-Budde authored
      In commit 286c0e09 ("can: bittiming: can_changelink() pass extack
      down callstack") a new parameter was added to can_calc_bittiming(),
      however the static inline no-op (which is used if
      CONFIG_CAN_CALC_BITTIMING is disabled) wasn't converted.
      
      Add the new parameter to the static inline no-op of
      can_calc_bittiming().
      
      Fixes: 286c0e09 ("can: bittiming: can_changelink() pass extack down callstack")
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/20230207201734.2905618-1-mkl@pengutronix.de
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      65db3d8b
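The pattern behind the can_calc_bittiming() commit above, a real function and a static inline no-op selected by a config option, only compiles in both configurations when the two signatures stay in sync. The following is a reduced sketch of that pattern. The names HAVE_BITTIMING_CALC, struct ext_ack, calc_bittiming() and change_link() are hypothetical stand-ins and do not reproduce the CAN code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an extended-ack object carrying an error text. */
struct ext_ack { const char *msg; };

#ifdef HAVE_BITTIMING_CALC
/* Real implementation, built only when the feature is enabled. */
int calc_bittiming(unsigned int bitrate, struct ext_ack *extack);
#else
/* No-op stub: it must take the same parameters as the real function,
 * including the later-added extack, or callers stop compiling when the
 * feature is disabled. */
static inline int calc_bittiming(unsigned int bitrate, struct ext_ack *extack)
{
    (void)bitrate;
    if (extack)
        extack->msg = "bit-timing calculation not available";
    return -EINVAL;
}
#endif

/* The caller is written once and passes extack down in both configurations. */
static int change_link(unsigned int bitrate, struct ext_ack *extack)
{
    assert(bitrate > 0);
    return calc_bittiming(bitrate, extack);
}
```

Building such code with the option both enabled and disabled is the usual way to catch a stub that has fallen behind, which is presumably how the build robot named in the Reported-by tag found the mismatch.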
    • can: raw: use temp variable instead of rolling back config · f2f527d5
      Oliver Hartkopp authored
      Introduce a temporary variable to check for an invalid configuration
      attempt from user space. Before this patch the value was copied to
      the real config variable and rolled back in the case of an error.
      Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230203090807.97100-1-socketcan@hartkopp.net
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      f2f527d5
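The approach in the can: raw commit above, validating in a temporary and committing only on success, can be sketched as follows. The option struct, the function name and the validation rule (an "xl" mode that requires "fd" to stay enabled) are assumptions made for illustration. They are not taken from the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-socket options. */
struct raw_opts {
    int fd_frames;
    int xl_frames;
};

/* Validate the requested value in a temporary and commit it only when it
 * is acceptable, so there is never an invalid state to roll back. */
static int set_fd_frames(struct raw_opts *ro, int requested)
{
    int fd_frames = requested; /* temporary copy of the user's value */

    assert(ro != NULL);
    /* Assumed rule for this sketch: "xl" mode requires "fd" to stay on. */
    if (ro->xl_frames && !fd_frames)
        return -EINVAL;

    ro->fd_frames = fd_frames; /* commit point */
    return 0;
}
```

Compared with writing to the real field first and restoring it on error, the temporary keeps the stored configuration valid at every point, including while the check is running.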
    • Merge branch 'taprio-auto-qmaxsdu-new-tx' · e6ebe6c1
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      taprio automatic queueMaxSDU and new TXQ selection procedure
      
      This patch set addresses 2 design limitations in the taprio software scheduler:
      
      1. Software scheduling fundamentally prioritizes traffic incorrectly,
         in a way which was inspired by Intel igb/igc drivers and does not
         follow the inputs user space gives (traffic classes and TC to TXQ
         mapping). Patch 05/15 handles this, 01/15 - 04/15 are preparations
         for this work.
      
      2. Software scheduling assumes that the gate for a traffic class closes
         as soon as the next interval begins. But this isn't true.
         If consecutive schedule entries have that traffic class gate open,
         there is no "gate close" event and taprio should keep dequeuing from
         that TC without interruptions. Patches 06/15 - 15/15 handle this.
         Patch 10/15 is a generic Qdisc change required for this to work.
      
      Future development directions which depend on this patch set are:
      
      - Propagating the automatic queueMaxSDU calculation down to offloading
        device drivers, instead of letting them calculate this, as
        vsc9959_tas_guard_bands_update() does today.
      
      - A software data path for tc-taprio with preemptible traffic and
        Hold/Release events.
      
      v1 at:
      https://patchwork.kernel.org/project/netdevbpf/cover/20230128010719.2182346-1-vladimir.oltean@nxp.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6ebe6c1
    • Vladimir Oltean's avatar
      net/sched: taprio: don't segment unnecessarily · 39b02d6d
      Vladimir Oltean authored
      Improve commit 497cc002 ("taprio: Handle short intervals and large
      packets") to only perform segmentation when skb->len exceeds what
      taprio_dequeue() expects.
      
      In practice, this will make the biggest difference when a traffic class
      gate is always open in the schedule. This is because the max_frm_len
      will be U32_MAX, and such large skb->len values as Kurt reported will be
      sent just fine unsegmented.
      
      What I don't seem to know how to handle is how to make sure that the
      segmented skbs themselves are smaller than the maximum frame size given
      by the current queueMaxSDU[tc]. Nonetheless, we still need to drop
      those, otherwise the Qdisc will hang.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39b02d6d
    • Vladimir Oltean's avatar
      net/sched: taprio: split segmentation logic from qdisc_enqueue() · 2d5e8071
      Vladimir Oltean authored
      The majority of the taprio_enqueue() function is spent doing TCP
      segmentation, which doesn't look right to me. Compilers shouldn't have a
      problem in inlining code no matter how we write it, so move the
      segmentation logic to a separate function.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d5e8071
    • Vladimir Oltean's avatar
      net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations · fed87cc6
      Vladimir Oltean authored
      taprio today has a huge problem with small TC gate durations, because it
      might accept packets in taprio_enqueue() which will never be sent by
      taprio_dequeue().
      
      Since not much infrastructure was available, a kludge was added in
      commit 497cc002 ("taprio: Handle short intervals and large
      packets"), which segmented large TCP segments, but the fact of the
      matter is that the issue isn't specific to large TCP segments (and even
      worse, the performance penalty in segmenting those is absolutely huge).
      
      In commit a54fc09e ("net/sched: taprio: allow user input of per-tc
      max SDU"), taprio gained support for queueMaxSDU, which is precisely the
      mechanism through which packets should be dropped at qdisc_enqueue() if
      they cannot be sent.
      
      After that patch, it was necessary for the user to manually limit the
      maximum MTU per TC. This change adds the necessary logic for taprio to
      further limit the values specified (or not specified) by the user to
      some minimum values which never allow oversized packets to be sent.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fed87cc6
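The arithmetic behind an automatically derived per-TC limit can be sketched as below. This is an illustration under stated assumptions, not taprio's actual code: the helper names are hypothetical, link speed is taken in Mbps, gate durations in nanoseconds, a user value of 0 is taken to mean "no user limit", and the distinction between SDU and full frame length (L2 header accounting) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes that fit on the wire in duration_ns
 * at speed_mbps. 1 Mbps is 1 bit per microsecond, so
 * bytes = ns * Mbps / (8 * 1000).
 */
static uint64_t demo_duration_to_bytes(uint64_t duration_ns,
				       uint32_t speed_mbps)
{
	assert(speed_mbps > 0);
	return duration_ns * speed_mbps / 8000;
}

/* Hypothetical helper: effective size limit for one traffic class, i.e. the
 * smaller of the user-requested limit and what the longest open gate can
 * carry after subtracting the L1 overhead.
 */
static uint64_t demo_effective_limit(uint64_t max_open_gate_ns,
				     uint32_t speed_mbps,
				     uint32_t l1_overhead,
				     uint64_t user_limit)
{
	uint64_t wire = demo_duration_to_bytes(max_open_gate_ns, speed_mbps);
	uint64_t auto_limit = wire > l1_overhead ? wire - l1_overhead : 0;

	if (user_limit == 0 || auto_limit < user_limit)
		return auto_limit;
	return user_limit;
}
```

For example, a 1000 ns gate at 1000 Mbps carries 125 bytes on the wire; with 24 bytes of L1 overhead that leaves 101 bytes, so a user limit of 200 would be tightened to 101 in this sketch.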
    • Vladimir Oltean's avatar
      net/sched: keep the max_frm_len information inside struct sched_gate_list · a878fd46
      Vladimir Oltean authored
      I have one practical reason for doing this and one concerning correctness.
      
      The practical reason has to do with a follow-up patch, which aims to mix
      2 sources of max_sdu (one coming from the user and the other automatically
      calculated based on TC gate durations at the current link speed). Among those
      2 sources of input, we must always select the smaller max_sdu value, but
      this can change at various link speeds. So the max_sdu coming from the
      user must be kept separated from the value that is operationally used
      (the minimum of the 2), because otherwise we overwrite it and forget
      what the user asked us to do.
      
      To solve that, this patch proposes that struct sched_gate_list contains
      the operationally active max_frm_len, and q->max_sdu contains just what
      was requested by the user.
      
      The reason having to do with correctness is based on the following
      observation: the admin sched_gate_list becomes operational at a given
      base_time in the future. Until then, it is inactive and applies no
      shaping, all gates are open, etc. So the queueMaxSDU dropping shouldn't
      apply either (this is a mechanism to ensure that packets smaller than
      the largest gate duration for that TC don't hang the port; clearly it
      makes little sense if the gates are always open).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a878fd46
    • Vladimir Oltean's avatar
      net/sched: taprio: warn about missing size table · a3d91b2c
      Vladimir Oltean authored
      Vinicius intended taprio to take the L1 overhead into account when
      estimating packet transmission time through user input, specifically
      through the qdisc size table (man tc-stab).
      
      Something like this:
      
      tc qdisc replace dev $eth root stab overhead 24 taprio \
      	num_tc 8 \
      	map 0 1 2 3 4 5 6 7 \
      	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
      	base-time 0 \
      	sched-entry S 0x7e 9000000 \
      	sched-entry S 0x82 1000000 \
      	max-sdu 0 0 0 0 0 0 0 200 \
      	flags 0x0 clockid CLOCK_TAI
      
      Without the overhead being specified, transmission times will be
      underestimated and will cause late transmissions. For an offloading
      driver, it might even cause TX hangs if there is no open gate large
      enough to send the maximum sized packets for that TC (including L1
      overhead). Properly knowing the L1 overhead will ensure that we are able
      to auto-calculate the queueMaxSDU per traffic class just right, and
      avoid these hangs due to head-of-line blocking.
      
      We can't make the stab mandatory due to existing setups, but we can warn
      the user that it's important with a warning netlink extack.
      
      Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220505160357.298794-1-vladimir.oltean@nxp.com/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3d91b2c
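To see why a missing overhead leads to underestimated transmission times, consider this small sketch of the estimate. The function name is hypothetical and the formula is a simplification (link speed in Mbps, result in nanoseconds); the overhead value 24 is simply the one used in the tc command above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time on the wire for a packet of len bytes plus
 * l1_overhead bytes at speed_mbps, in nanoseconds.
 * ns = bytes * 8 bits * 1000 / Mbps.
 */
static uint64_t demo_tx_time_ns(uint32_t len, uint32_t l1_overhead,
				uint32_t speed_mbps)
{
	assert(speed_mbps > 0);
	return ((uint64_t)len + l1_overhead) * 8000 / speed_mbps;
}
```

At 1000 Mbps, a 1500 byte packet is estimated at 12000 ns without overhead and 12192 ns with an overhead of 24, so every packet sent near the end of a gate window would be mis-timed by 192 ns if the overhead were omitted.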
    • Vladimir Oltean's avatar
      net/sched: make stab available before ops->init() call · 1f62879e
      Vladimir Oltean authored
      Some qdiscs like taprio turn out to be actually pretty reliant on a well
      configured stab, to not underestimate the skb transmission time (by
      properly accounting for L1 overhead).
      
      In a future change, taprio will need the stab, if configured by the
      user, to be available at ops->init() time. It will become even more
      important in upcoming work, when the overhead will be used for the
      queueMaxSDU calculation that is passed to an offloading driver.
      
      However, rcu_assign_pointer(sch->stab, stab) is called right after
      ops->init(), making it unavailable, and I don't really see a good reason
      for that.
      
      Move it earlier, which nicely seems to simplify the error handling path
      as well.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f62879e
    • Vladimir Oltean's avatar
      net/sched: taprio: calculate guard band against actual TC gate close time · a1e6ad30
      Vladimir Oltean authored
      taprio_dequeue_from_txq() looks at the entry->end_time to determine
      whether the skb will overrun its traffic class gate, as if at the end of
      the schedule entry there surely is a "gate close" event for it. Hint:
      maybe there isn't.
      
      For each schedule entry, introduce an array of kernel times which
      tracks when in the future there will be an *actual* gate close event
      for that traffic class, and use that in the guard band overrun
      calculation.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1e6ad30
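The difference between checking against the entry's end time and checking against a per-TC gate close time can be sketched as follows. The structure and function are hypothetical simplifications, times are plain nanosecond counters rather than ktime_t, and representing "the gate never closes" as INT64_MAX is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_TC 8

/* Hypothetical, simplified view of one schedule entry's timing. */
struct demo_entry_times {
	int64_t end_time_ns;			/* when the entry itself ends */
	int64_t gate_close_time_ns[DEMO_MAX_TC];	/* per-TC actual close */
};

/* Returns true if a packet needing tx_time_ns, started at now_ns, finishes
 * before the gate of traffic class tc actually closes.
 */
static bool demo_fits_before_gate_close(const struct demo_entry_times *e,
					int tc, int64_t now_ns,
					int64_t tx_time_ns)
{
	assert(tc >= 0 && tc < DEMO_MAX_TC);
	assert(now_ns >= 0 && tx_time_ns >= 0);

	if (e->gate_close_time_ns[tc] < now_ns)
		return false;

	/* Subtract instead of adding, so INT64_MAX cannot overflow. */
	return e->gate_close_time_ns[tc] - now_ns >= tx_time_ns;
}
```

With an entry ending at 1000 ns, a packet needing 200 ns at time 900 overruns a gate that closes with the entry, but is fine for a traffic class whose gate stays open into the following entries.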
    • Vladimir Oltean's avatar
      net/sched: taprio: calculate budgets per traffic class · d2ad689d
      Vladimir Oltean authored
      Currently taprio assumes that the budget for a traffic class expires at
      the end of the current interval as if the next interval contains a "gate
      close" event for this traffic class.
      
      This is, however, an unfounded assumption. Allow schedule entry
      intervals to be fused together for a particular traffic class by
      calculating the budget until the gate *actually* closes.
      
      This means we need to keep budgets per traffic class, and we also need
      to update the budget consumption procedure.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2ad689d
    • Vladimir Oltean's avatar
      net/sched: taprio: rename close_time to end_time · e5517551
      Vladimir Oltean authored
      There is a confusion of terms in taprio, whereby what is called
      "close_time" is actually used for 2 things:
      
      1. determining when an entry "closes" such that transmitted skbs are
         never allowed to overrun that time (?!)
      2. an aid for determining when to advance and/or restart the schedule
         using the hrtimer
      
      It makes more sense to call this so-called "close_time" "end_time",
      because it's not clear at all to me what "closes". Future patches will
      hopefully make better use of the term "to close".
      
      This is an absolutely mechanical change.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e5517551
    • Vladimir Oltean's avatar
      net/sched: taprio: calculate tc gate durations · a306a90c
      Vladimir Oltean authored
      Current taprio code operates on a very simplistic (and incorrect)
      assumption: that egress scheduling for a traffic class can only take
      place for the duration of the current interval, or i.o.w., it assumes
      that at the end of each schedule entry, there is a "gate close" event
      for all traffic classes.
      
      As an example, traffic sent with the schedule below will be jumpy, even
      though all 8 TC gates are open, so there is absolutely no "gate close"
      event (effectively a transition from BIT(tc)==1 to BIT(tc)==0 in
      consecutive schedule entries):
      
      tc qdisc replace dev veth0 parent root taprio \
      	num_tc 2 \
      	map 0 1 \
      	queues 1@0 1@1 \
      	base-time 0 \
      	sched-entry S 0xff 4000000000 \
      	clockid CLOCK_TAI \
      	flags 0x0
      
      This qdisc simply does not have what it takes in terms of logic to
      *actually* compute the durations of traffic classes. Also, it does not
      recognize the need to use this information on a per-traffic-class basis:
      it always looks at entry->interval and entry->close_time.
      
      This change proposes that each schedule entry has an array called
      tc_gate_duration[tc]. This holds the information: "for how long will
      this traffic class gate remain open, starting from *this* schedule
      entry". If the traffic class gate is always open, that value is equal to
      the cycle time of the schedule.
      
      For the purpose of the queueMaxSDU[tc] calculation, we'll also need to
      keep track of the maximum time duration for which a traffic class has
      an open gate. This directly gives us the maximum sized packet that this
      traffic class will have to accept. Anything larger has to be dropped
      with qdisc_drop() in qdisc_enqueue().
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a306a90c
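The per-entry, per-TC duration described above can be computed with a naive walk over the cyclic schedule. The following is only a sketch of the idea, not the kernel implementation: `struct demo_entry` and `demo_calc_gate_durations()` are hypothetical, and the quadratic loop is chosen for readability rather than efficiency.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_TC 8

/* Hypothetical, simplified schedule entry; not taprio's struct sched_entry. */
struct demo_entry {
	uint32_t gate_mask;	/* bit tc set => gate for tc is open */
	uint64_t interval_ns;
	uint64_t tc_gate_duration[DEMO_MAX_TC];
};

/* For every entry and traffic class, compute how long the gate stays open
 * starting from that entry, and record the longest open window per TC.
 */
static void demo_calc_gate_durations(struct demo_entry *e, int n, int num_tc,
				     uint64_t max_open[DEMO_MAX_TC])
{
	int i, j, tc;

	assert(n > 0 && num_tc > 0 && num_tc <= DEMO_MAX_TC);

	for (tc = 0; tc < num_tc; tc++) {
		max_open[tc] = 0;

		for (i = 0; i < n; i++) {
			uint64_t d = 0;

			/* Walk forward, wrapping around the cycle, while the
			 * gate stays open. After n steps we have covered one
			 * full cycle, so an always-open gate yields the cycle
			 * time.
			 */
			for (j = 0; j < n; j++) {
				const struct demo_entry *next = &e[(i + j) % n];

				if (!(next->gate_mask & (1u << tc)))
					break;
				d += next->interval_ns;
			}

			e[i].tc_gate_duration[tc] = d;
			if (d > max_open[tc])
				max_open[tc] = d;
		}
	}
}
```

For a two-entry schedule with gate masks 0x7e (9 ms) and 0x82 (1 ms), TC 1 is open in both entries and gets the full 10 ms cycle time from either entry, TC 2 gets 9 ms from the first entry and 0 from the second, and TC 7 gets 1 ms from the second entry only.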
    • Vladimir Oltean's avatar
      net/sched: taprio: give higher priority to higher TCs in software dequeue mode · 2f530df7
      Vladimir Oltean authored
      Current taprio software implementation is haunted by the shadow of the
      igb/igc hardware model. It iterates over child qdiscs in increasing
      order of TXQ index, therefore giving higher xmit priority to TXQ 0 and
      lower to TXQ N. According to discussions with Vinicius, that is the
      default (perhaps even unchangeable) prioritization scheme used for the
      NICs that taprio was first written for (igb, igc), and we have a case of
      two bugs canceling out, resulting in a functional setup on igb/igc, but
      a less sane one on other NICs.
      
      To the best of my understanding, taprio should prioritize based on the
      traffic class, so it should really dequeue starting with the highest
      traffic class and going down from there. We get to the TXQ using the
      tc_to_txq[] netdev property.
      
      TXQs within the same TC have the same (strict) priority, so we should
      pick from them as fairly as we can. We can achieve that by implementing
      something very similar to q->curband from multiq_dequeue().
      
      Since igb/igc really do have TXQ 0 of higher hardware priority than
      TXQ 1 etc, we need to preserve the behavior for them as well. We really
      have no choice, because in txtime-assist mode, taprio is essentially a
      software scheduler towards offloaded child tc-etf qdiscs, so the TXQ
      selection really does matter (not all igb TXQs support ETF/SO_TXTIME,
      says Kurt Kanzenbach).
      
      To preserve the behavior, we need a capability bit so that taprio can
      determine if it's running on igb/igc, or on something else. Because igb
      doesn't offload taprio at all, we can't piggyback on the
      qdisc_offload_query_caps() call from taprio_enable_offload(), but
      instead we need a separate call which is also made for software
      scheduling.
      
      Introduce two static keys to minimize the performance penalty on systems
      which only have igb/igc NICs, and on systems which only have other NICs.
      For mixed systems, taprio will have to dynamically check whether to
      dequeue using one prioritization algorithm or using the other.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f530df7
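The selection order described in this commit (highest traffic class first, round robin among the TXQs of one class) can be sketched roughly as below. All names are hypothetical, the per-TC cursor is stored as an index relative to the class's first TXQ, and gate state, budgets and the igb/igc compatibility mode are deliberately left out.

```c
#include <assert.h>

#define DEMO_MAX_TC 8

/* Hypothetical mirror of a TC to TXQ mapping: a contiguous TXQ range. */
struct demo_tc_to_txq {
	int offset;
	int count;
};

struct demo_sched {
	int num_tc;
	struct demo_tc_to_txq tc_to_txq[DEMO_MAX_TC];
	int cur_txq[DEMO_MAX_TC];	/* next TXQ to try, relative to offset */
};

/* Returns the TXQ index to dequeue from, or -1 if every TXQ is empty.
 * backlog[txq] is the number of packets queued on that TXQ.
 */
static int demo_pick_txq(struct demo_sched *s, const int *backlog)
{
	int tc, i;

	assert(s->num_tc > 0 && s->num_tc <= DEMO_MAX_TC);

	/* Strict priority: start with the highest traffic class. */
	for (tc = s->num_tc - 1; tc >= 0; tc--) {
		int first = s->tc_to_txq[tc].offset;
		int count = s->tc_to_txq[tc].count;

		for (i = 0; i < count; i++) {
			int rel = (s->cur_txq[tc] + i) % count;

			if (backlog[first + rel] > 0) {
				/* Resume after this TXQ next time, so TXQs
				 * of the same class are served fairly.
				 */
				s->cur_txq[tc] = (rel + 1) % count;
				return first + rel;
			}
		}
	}

	return -1;
}
```

With TC 0 mapped to TXQ 0 and TC 1 mapped to TXQs 1 and 2, and all three backlogged, successive calls in this sketch return 1, 2, 1, and so on; TXQ 0 is only picked once both TC 1 queues are empty.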
    • Vladimir Oltean's avatar
      net/sched: taprio: avoid calling child->ops->dequeue(child) twice · 4c229427
      Vladimir Oltean authored
      Simplify taprio_dequeue_from_txq() by noticing that we can goto one call
      earlier than the previous skb_found label. This is possible because
      we've unified the treatment of the child->ops->dequeue(child) return
      value: we always try other TXQs now, instead of abandoning the root
      dequeue completely if we failed in the peek() case.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c229427
    • Vladimir Oltean's avatar
      net/sched: taprio: refactor one skb dequeue from TXQ to separate function · 92f96667
      Vladimir Oltean authored
      Future changes will refactor the TXQ selection procedure, and a lot of
      stuff will become messy, the indentation of the bulk of the dequeue
      procedure would increase, etc.
      
      Break out the bulk of the function into a new one, which knows the TXQ
      (child qdisc) we should perform a dequeue from.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92f96667
    • Vladimir Oltean's avatar
      net/sched: taprio: continue with other TXQs if one dequeue() failed · 1638bbbe
      Vladimir Oltean authored
      This changes the handling of an unlikely condition to not stop dequeuing
      if taprio failed to dequeue the peeked skb in taprio_dequeue().
      
      I've no idea when this can happen, but the only side effect seems to be
      that the atomic_sub_return() call right above will have consumed some
      budget. This isn't a big deal, since either that made us remain without
      any budget (and therefore, we'd exit on the next peeked skb anyway), or
      we could send some packets from other TXQs.
      
      I'm making this change because in a future patch I'll be refactoring the
      dequeue procedure to simplify it, and this corner case will have to go
      away.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1638bbbe
    • Vladimir Oltean's avatar
      net/sched: taprio: delete peek() implementation · ecc0cc98
      Vladimir Oltean authored
      There isn't any code in the network stack which calls taprio_peek().
      We only see qdisc->ops->peek() being called on child qdiscs of other
      classful qdiscs, never from the generic qdisc code. Whereas taprio is
      never a child qdisc, it is always root.
      
      This snippet of a comment from qdisc_peek_dequeued() seems to confirm:
      
      	/* we can reuse ->gso_skb because peek isn't called for root qdiscs */
      
      Since I've been known to be wrong many times though, I'm not completely
      removing it, but leaving a stub function in place which emits a warning.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ecc0cc98
    • David S. Miller's avatar
      Merge branch 'mptcp-fixes' · 965bffd2
      David S. Miller authored
      Matthieu Baerts says:
      
      ====================
      mptcp: fixes for v6.2
      
      Patch 1 clears resources earlier if there are no more reasons to keep
      MPTCP sockets alive.
      
      Patches 2 and 3 fix some locking issues visible in some rare corner
      cases: the linked issues should be quite hard to reproduce.
      
      Patch 4 makes sure subflows are correctly cleaned after the end of a
      connection.
      
      Patches 5 and 6 improve the selftests stability when running in a slow
      environment by transferring data for a longer period on one hand and by
      stopping the tests when all expected events have been observed on the
      other hand.
      
      All these patches fix issues introduced before v6.2.
      ====================
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      965bffd2
    • Matthieu Baerts's avatar
      selftests: mptcp: stop tests earlier · 070d6daf
      Matthieu Baerts authored
      These 'endpoint' tests from 'mptcp_join.sh' selftest start a transfer in
      the background and check the status during this transfer.
      
      Once the expected events have been recorded, there is no reason to wait
      for the data transfer to finish. It can be stopped earlier to reduce the
      execution time by more than half.
      
      For these tests, the exchanged data were not verified. Errors, if any,
      were ignored but that's fine, plenty of other tests are looking at that.
      It is then OK to mute stderr now that we are sure errors will be printed
      (and still ignored) because the transfer is stopped before the end.
      
      Fixes: e274f715 ("selftests: mptcp: add subflow limits test-cases")
      Cc: stable@vger.kernel.org
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      070d6daf