1. 15 Apr, 2024 14 commits
    • udp: Avoid call to compute_score on multiple sites · 50aee97d
      Gabriel Krisman Bertazi authored
      We've observed a 7-12% performance regression in iperf3 UDP ipv4 and
      ipv6 tests with multiple sockets on Zen3 cpus, which we traced back to
      commit f0ea27e7 ("udp: re-score reuseport groups when connected
      sockets are present").  The failing tests were those that would spawn
      UDP sockets per-cpu on systems that have a high number of cpus.
      
      Unsurprisingly, it is not caused by the extra re-scoring of the reused
      socket, but by the compiler no longer inlining compute_score once it
      has the extra call site in udp4_lib_lookup2.  This is amplified by
      the "Safe RET" mitigation for SRSO, needed on our Zen3 cpus.
      
      We could just explicitly inline it, but compute_score() is quite a large
      function, around 300 bytes.  Inlining it at two sites would almost double
      udp4_lib_lookup2, which is a silly thing to do just to work around a
      mitigation.  Instead, this patch shuffles the code a bit to avoid the
      multiple calls to compute_score.  Since it is a static function used in
      one spot, the compiler can safely fold it in, as it did before, without
      increasing the text size.
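
The reshuffling described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct cand`, `score_one()` and `lookup_best()` are hypothetical names invented for illustration and are not the kernel's code. It only shows the general shape, where the re-scoring path jumps back to the one existing call instead of adding a second call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a socket candidate; not a kernel structure. */
struct cand {
	int prio;      /* non-negative match quality */
	int redirect;  /* index the group would select instead, or -1 */
};

/* Stand-in for a large scoring function that should have one caller. */
static int score_one(const struct cand *c)
{
	return c->prio;
}

/* Returns the index of the chosen candidate, or -1 if the table is empty. */
int lookup_best(const struct cand *tab, size_t n)
{
	int best = -1, badness = -1, score;

	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		size_t cur = i;
		bool rescoring = false;
rescore:
		score = score_one(&tab[cur]);	/* the only call site */
		if (score <= badness)
			continue;
		badness = score;
		best = (int)cur;
		if (rescoring || tab[cur].redirect < 0)
			continue;
		/* The group picked another member: score that one instead. */
		cur = (size_t)tab[cur].redirect;
		badness = -1;
		rescoring = true;
		goto rescore;
	}
	return best;
}
```

Because `score_one()` is static and called from a single place, a compiler is free to fold it into `lookup_best()` without duplicating its body.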
      
      With this patch applied I ran my original iperf3 testcases.  The failing
      cases all looked like this (ipv4):
      	iperf3 -c 127.0.0.1 --udp -4 -f K -b $R -l 8920 -t 30 -i 5 -P 64 -O 2
      
      where $R is one of 1G, 10G or 0 (max, unlimited).  I ran each case 3
      times.  Baseline is v6.9-rc3.  harmean == harmonic mean; CV ==
      coefficient of variation.
      
      ipv4:
                       1G                   10G                  MAX
                  HARMEAN   (CV)       HARMEAN   (CV)       HARMEAN   (CV)
      baseline 1743852.66 (0.0208)  1725933.02 (0.0167)  1705203.78 (0.0386)
      patched  1968727.61 (0.0035)  1962283.22 (0.0195)  1923853.50 (0.0256)
      
      ipv6:
                       1G                   10G                  MAX
                  HARMEAN   (CV)       HARMEAN   (CV)       HARMEAN   (CV)
      baseline 1729020.03 (0.0028)  1691704.49 (0.0243)  1692251.34 (0.0083)
      patched  1900422.19 (0.0067)  1900968.01 (0.0067)  1568532.72 (0.1519)
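
For readers who want to reproduce the summary statistic, here is a minimal sketch of the harmonic mean and of a relative-gain calculation. The function names `harmonic_mean()` and `percent_gain()` are made up for this example, and the original message does not say which tool computed the numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Harmonic mean of n positive samples: n / sum(1 / x_i). */
double harmonic_mean(const double *x, size_t n)
{
	double inv_sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		assert(x[i] > 0.0);
		inv_sum += 1.0 / x[i];
	}
	return (double)n / inv_sum;
}

/* Relative change of patched over baseline, in percent. */
double percent_gain(double baseline, double patched)
{
	assert(baseline > 0.0);
	return (patched - baseline) / baseline * 100.0;
}
```

As a usage example, `percent_gain(1743852.66, 1968727.61)` evaluates the ipv4 1G column of the tables and comes out at roughly 12.9%.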
      
      This restores the performance we had before the change above with this
      benchmark.  We obviously don't expect any real impact when mitigations
      are disabled, but just to be sure it also doesn't regress:
      
      mitigations=off ipv4:
                       1G                   10G                  MAX
                  HARMEAN   (CV)       HARMEAN   (CV)       HARMEAN   (CV)
      baseline 3230279.97 (0.0066)  3229320.91 (0.0060)  2605693.19 (0.0697)
      patched  3242802.36 (0.0073)  3239310.71 (0.0035)  2502427.19 (0.0882)
      
      Cc: Lorenz Bauer <lmb@isovalent.com>
      Fixes: f0ea27e7 ("udp: re-score reuseport groups when connected sockets are present")
      Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: Remove generic .ndo_get_stats64 · 05d604a5
      Breno Leitao authored
      Commit 3e2f544d ("net: get stats64 if device if driver is
      configured") moved the callback to dev_get_tstats64() to net core, so,
      unless the driver is doing some custom stats collection, it does not
      need to set .ndo_get_stats64.
      
      Since this driver now relies on NETDEV_PCPU_STAT_TSTATS, it doesn't
      need to set the generic dev_get_tstats64() .ndo_get_stats64 function
      pointer.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6_gre: Do not use custom stat allocator · 8622f90a
      Breno Leitao authored
      With commit 34d21de9 ("net: Move {l,t,d}stats allocation to core and
      convert veth & vrf"), stats allocation can be done in the net core
      instead of in this driver.
      
      With this new approach, the driver doesn't have to bother with error
      handling (allocation failure checking, making sure free happens in the
      right spot, etc). This is now the core's responsibility.
      
      Remove the allocation in ip6_gre and leverage the network core
      allocation instead.
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: convert dsa_user_phylink_fixed_state() to use dsa_phylink_to_port() · a788faff
      Russell King (Oracle) authored
      Convert dsa_user_phylink_fixed_state() to use the newly introduced
      dsa_phylink_to_port() helper.
      Suggested-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: constify net_class · 9382b4f3
      Heiner Kallweit authored
      AFAICS all users of net_class take a const struct class * argument.
      Therefore fully constify net_class.
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gve: Correctly report software timestamping capabilities · 4ca78e61
      John Fraker authored
      gve has supported software timestamp generation since its inception,
      but has not advertised that support via ethtool. This patch correctly
      advertises that support.
      Signed-off-by: John Fraker <jfraker@google.com>
      Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: save some cycles when doing skb_attempt_defer_free() · 4d0470b9
      Jason Xing authored
      Normally, we don't hit these two exceptions very often, whereas we
      do have some chance of meeting the condition where the current cpu id
      is the same as skb->alloc_cpu.
      
      One simple test shows how often the statement
      'cpu == raw_smp_processor_id()' is true:
      1. run iperf -s and iperf -c [ip] -P [MAX CPU]
      2. use BPF to capture skb_attempt_defer_free()
      
      I see roughly a 4% chance of the statement being satisfied.
      So moving this statement to the beginning can save some cycles in
      most cases.
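
The effect of testing the cheap condition first can be illustrated with short-circuit evaluation in a userspace model. This is a sketch only: `free_locally()`, `model_cpu_online()` and the `online_checks` counter are hypothetical names for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the modelled online check actually runs. */
unsigned int online_checks;

/* Hypothetical stand-in for an "is this CPU online" query. */
static bool model_cpu_online(int cpu, int nr_cpus)
{
	online_checks++;
	return cpu >= 0 && cpu < nr_cpus;
}

/*
 * Returns true when the buffer should be freed on the current CPU.
 * The cheap equality test comes first, so the later checks are skipped
 * whenever it is true.
 */
bool free_locally(int alloc_cpu, int this_cpu, int nr_cpus)
{
	assert(nr_cpus > 0);
	return alloc_cpu == this_cpu ||
	       alloc_cpu >= nr_cpus ||
	       !model_cpu_online(alloc_cpu, nr_cpus);
}
```

With `||`, the operands to the right are not evaluated once an earlier one is true, so the ordering decides which checks are paid for on the common path.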
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'flower-control-flags' · 71329c49
      David S. Miller authored
      Asbjørn Sloth Tønnesen says:
      
      ====================
      flower: validate control flags
      
      I have reviewed the flower control flags code.
      In all but one driver (sfc), the flags field wasn't
      checked properly for unsupported flags.
      
      In this series I have only included a single example
      user for each helper function. Once the helpers are in,
      I will submit patches for all other drivers implementing
      flower.
      
      After which there will be:
      - 6 drivers using flow_rule_is_supp_control_flags()
      - 8 drivers using flow_rule_has_control_flags()
      - 11 drivers using flow_rule_match_has_control_flags()
      
      ---
      Changelog:
      
      v3:
      - Added Reviewed-by from Louis Peens (first two patches)
      - Properly fixed kernel-doc format
      
      v2: https://lore.kernel.org/netdev/20240410093235.5334-1-ast@fiberby.net/
      - Squashed the 3 helper functions to one commit (requested by Baowen Zheng)
      - Renamed helper functions to avoid double negatives (suggested by Louis Peens)
      - Reverse booleans in some functions and callsites to align with new names
      - Fix autodoc format
      
      v1: https://lore.kernel.org/netdev/20240408130927.78594-1-ast@fiberby.net/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: ksz9477: flower: validate control flags · d9a1249e
      Asbjørn Sloth Tønnesen authored
      Add check for unsupported control flags.
      
      Only compile-tested, no access to HW.
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: prestera: flower: validate control flags · f8a5ea8c
      Asbjørn Sloth Tønnesen authored
      Add check for unsupported control flags.
      
      Only compile-tested, no access to HW.
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower: fix check for unsupported control flags · e36245da
      Asbjørn Sloth Tønnesen authored
      Use flow_rule_is_supp_control_flags()
      
      Check the mask, not the key, for unsupported control flags.
      
      Only compile-tested, no access to HW.
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_offload: add control flag checking helpers · d11e6311
      Asbjørn Sloth Tønnesen authored
      These helpers aim to help drivers check
      for the presence of unsupported control flags.
      
      For drivers supporting at least one control flag:
        flow_rule_is_supp_control_flags()
      
      For drivers using flow_rule_match_control(), but not using flags:
        flow_rule_has_control_flags()
      
      For drivers not using flow_rule_match_control():
        flow_rule_match_has_control_flags()
      
      While primarily aimed at FLOW_DISSECTOR_KEY_CONTROL
      and flow_rule_match_control(), the first two
      can also be used with FLOW_DISSECTOR_KEY_ENC_CONTROL
      and flow_rule_match_enc_control().
      
      These helpers mirror the existing check done in sfc:
        drivers/net/ethernet/sfc/tc.c +276
      
      Only compile-tested.
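
The kind of check these helpers perform can be sketched in userspace as a plain bitmask test. This is an illustrative model, not the kernel implementation: the `model_*` functions and `MODEL_FLAG_*` bits are hypothetical, and error reporting is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define MODEL_FLAG_A (1u << 0)
#define MODEL_FLAG_B (1u << 1)
#define MODEL_FLAG_C (1u << 2)

/* True when every bit set in ctrl_flags is also set in supp_flags. */
bool model_is_supp_control_flags(uint32_t supp_flags, uint32_t ctrl_flags)
{
	return (ctrl_flags & ~supp_flags) == 0;
}

/* True when any control flag is requested at all. */
bool model_has_control_flags(uint32_t ctrl_flags)
{
	return ctrl_flags != 0;
}

/* Example driver-side use: this model driver supports only A and B. */
int model_driver_parse(uint32_t mask_flags)
{
	if (!model_is_supp_control_flags(MODEL_FLAG_A | MODEL_FLAG_B, mask_flags))
		return -1;	/* a real driver would return a proper errno */
	return 0;
}
```

In this model the value being tested is the match mask rather than the key, so a rule that asks to match on an unsupported bit is rejected whether it wants that bit set or cleared.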
      Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev_addr_lists: move locking out of init/exit in kunit · 3db3b629
      Jakub Kicinski authored
      We lock and unlock rtnl in init/exit for convenience,
      but it started causing problems if the exit is handled
      by a different thread. To avoid having to futz with
      disabling locking assertions, move the locking into
      the test cases. We don't use ASSERTs so it should
      be safe.
      
         ============= dev-addr-list-test (6 subtests) ==============
         [PASSED] dev_addr_test_basic
         [PASSED] dev_addr_test_sync_one
         [PASSED] dev_addr_test_add_del
         [PASSED] dev_addr_test_del_main
         [PASSED] dev_addr_test_add_set
         [PASSED] dev_addr_test_add_excl
         =============== [PASSED] dev-addr-list-test ================
      
      Link: https://lore.kernel.org/all/20240403131936.787234-7-linux@roeck-us.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drop_monitor: replace spin_lock by raw_spin_lock · f1e197a6
      Wander Lairson Costa authored
      trace_drop_common() is called with preemption disabled, and it acquires
      a spin_lock. This is problematic for RT kernels because spin_locks are
      sleeping locks in this configuration, which causes the following splat:
      
      BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
      in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47
      preempt_count: 1, expected: 0
      RCU nest depth: 2, expected: 2
      5 locks held by rcuc/47/449:
       #0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210
       #1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130
       #2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210
       #3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70
       #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290
      irq event stamp: 139909
      hardirqs last  enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80
      hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290
      softirqs last  enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170
      softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0
      Preemption disabled at:
      [<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0
      CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7
      Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022
      Call Trace:
       <TASK>
       dump_stack_lvl+0x8c/0xd0
       dump_stack+0x14/0x20
       __might_resched+0x21e/0x2f0
       rt_spin_lock+0x5e/0x130
       ? trace_drop_common.constprop.0+0xb5/0x290
       ? skb_queue_purge_reason.part.0+0x1bf/0x230
       trace_drop_common.constprop.0+0xb5/0x290
       ? preempt_count_sub+0x1c/0xd0
       ? _raw_spin_unlock_irqrestore+0x4a/0x80
       ? __pfx_trace_drop_common.constprop.0+0x10/0x10
       ? rt_mutex_slowunlock+0x26a/0x2e0
       ? skb_queue_purge_reason.part.0+0x1bf/0x230
       ? __pfx_rt_mutex_slowunlock+0x10/0x10
       ? skb_queue_purge_reason.part.0+0x1bf/0x230
       trace_kfree_skb_hit+0x15/0x20
       trace_kfree_skb+0xe9/0x150
       kfree_skb_reason+0x7b/0x110
       skb_queue_purge_reason.part.0+0x1bf/0x230
       ? __pfx_skb_queue_purge_reason.part.0+0x10/0x10
       ? mark_lock.part.0+0x8a/0x520
      ...
      
      trace_drop_common() also disables interrupts, but this is a minor issue
      because we could easily replace it with a local_lock.
      
      Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic
      context.
      Signed-off-by: Wander Lairson Costa <wander@redhat.com>
      Reported-by: Hu Chunyu <chuhu@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 13 Apr, 2024 26 commits