1. 08 Feb, 2023 17 commits
    • mlxsw: spectrum_acl_tcam: Move devlink param to TCAM code · 74cbc3c0
      Ido Schimmel authored
      Cited commit added 'DEVLINK_CMD_PARAM_DEL' notifications whenever the
      network namespace of the devlink instance is changed. Specifically, the
      notifications are generated after calling reload_down(), but before
      calling reload_up(). At this stage, the data structures accessed while
      reading the value of the "acl_region_rehash_interval" devlink parameter
      are uninitialized, resulting in a use-after-free [1].
      
      Fix by moving the registration and unregistration of the devlink
      parameter to the TCAM code where it is actually used. This means that
      the parameter is unregistered during reload_down() and then
      re-registered during reload_up(), avoiding the use-after-free between
      these two operations.
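
The ordering problem and the fix can be illustrated with a toy model. This is plain Python, not kernel code: the `Tcam` and `Driver` classes, the `reload()` helper and the interval value are all hypothetical, and the `alive` flag merely stands in for memory that has been freed.

```python
PARAM = "acl_region_rehash_interval"


class Tcam:
    """Stand-in for the driver's TCAM state (hypothetical)."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self.alive = True  # False models memory that has been freed


class Driver:
    def __init__(self, register_in_tcam_code):
        self.register_in_tcam_code = register_in_tcam_code
        self.params = {}  # name -> getter; models the registered params
        self.tcam = None
        if not register_in_tcam_code:
            # Old layout: registered once at driver level, so the
            # parameter stays registered across reload_down().
            self.params[PARAM] = self._get_interval
        self.reload_up()

    def _get_interval(self):
        if not self.tcam.alive:
            raise RuntimeError("read of TCAM state after teardown")
        return self.tcam.interval_ms

    def reload_up(self):
        self.tcam = Tcam(interval_ms=5000)  # arbitrary illustrative value
        if self.register_in_tcam_code:
            # New layout: the TCAM init path registers the parameter.
            self.params[PARAM] = self._get_interval

    def reload_down(self):
        if self.register_in_tcam_code:
            # New layout: the TCAM teardown path unregisters it first.
            del self.params[PARAM]
        self.tcam.alive = False  # pointer left dangling, as after a free


def reload(driver):
    driver.reload_down()
    # Notifications are generated here, between down and up:
    # every parameter that is still registered gets read.
    notified = {name: get() for name, get in driver.params.items()}
    driver.reload_up()
    return notified
```

With `Driver(register_in_tcam_code=False)` the read between the two steps hits torn-down state and raises; with `register_in_tcam_code=True` no parameter is registered at that point, so nothing is read.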
      
      Reproducer:
      
       # ip netns add test123
       # devlink dev reload pci/0000:06:00.0 netns test123
      
      [1]
      BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0
      Read of size 4 at addr ffff888162fd37d8 by task devlink/1323
      [...]
      Call Trace:
       <TASK>
       dump_stack_lvl+0x95/0xbd
       print_report+0x181/0x4a1
       kasan_report+0xdb/0x200
       mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xb2/0xd0
       mlxsw_sp_params_acl_region_rehash_intrvl_get+0x32/0x80
       devlink_nl_param_fill.constprop.0+0x29a/0x11e0
       devlink_param_notify.constprop.0+0xb9/0x250
       devlink_notify_unregister+0xbc/0x470
       devlink_reload+0x1aa/0x440
       devlink_nl_cmd_reload+0x559/0x11b0
       genl_family_rcv_msg_doit.isra.0+0x1f8/0x2e0
       genl_rcv_msg+0x558/0x7f0
       netlink_rcv_skb+0x170/0x440
       genl_rcv+0x2d/0x40
       netlink_unicast+0x53f/0x810
       netlink_sendmsg+0x961/0xe80
       __sys_sendto+0x2a4/0x420
       __x64_sys_sendto+0xe5/0x1c0
       do_syscall_64+0x38/0x80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 7d7e9169 ("devlink: move devlink reload notifications back in between _down() and _up() calls")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: spectrum_acl_tcam: Reorder functions to avoid forward declarations · 194ab947
      Ido Schimmel authored
      Move the initialization and de-initialization code further below in
      order to avoid forward declarations in the next patch. No functional
      changes.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: spectrum_acl_tcam: Make fini symmetric to init · 61fe3b91
      Ido Schimmel authored
      Move mutex_destroy() to the end to make the function symmetric with
      mlxsw_sp_acl_tcam_init(). No functional changes.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: spectrum_acl_tcam: Add missing mutex_destroy() · 65823e07
      Ido Schimmel authored
      Pair mutex_init() with a mutex_destroy() in the error path. Found during
      code review. No functional changes.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: spectrum: Remove pointless call to devlink_param_driverinit_value_set() · 8b50ac29
      Danielle Ratson authored
      The "acl_region_rehash_interval" devlink parameter is a "runtime"
      parameter, making the call to devl_param_driverinit_value_set()
      pointless. Before cited commit the function simply returned an error
      (that was not checked), but now it emits a WARNING [1].
      
      Fix by removing the function call.
      
      [1]
      WARNING: CPU: 0 PID: 7 at net/devlink/leftover.c:10974
      devl_param_driverinit_value_set+0x8c/0x90
      [...]
      Call Trace:
       <TASK>
       mlxsw_sp2_params_register+0x83/0xb0 [mlxsw_spectrum]
       __mlxsw_core_bus_device_register+0x5e5/0x990 [mlxsw_core]
       mlxsw_core_bus_device_register+0x42/0x60 [mlxsw_core]
       mlxsw_pci_probe+0x1f0/0x230 [mlxsw_pci]
       local_pci_probe+0x1a/0x40
       work_for_cpu_fn+0xf/0x20
       process_one_work+0x1db/0x390
       worker_thread+0x1d5/0x3b0
       kthread+0xe5/0x110
       ret_from_fork+0x1f/0x30
       </TASK>
      
      Fixes: 85fe0b32 ("devlink: make devlink_param_driverinit_value_set() return void")
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: add support for MAC Merge statistics counters · cf52bd23
      Vladimir Oltean authored
      Add PF driver support for the following:
      
      - Viewing the standardized MAC Merge layer counters.
      
      - Viewing the standardized Ethernet MAC and RMON counters associated
        with the pMAC.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20230206094531.444988-2-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: enetc: add support for MAC Merge layer · c7b9e808
      Vladimir Oltean authored
      Add PF driver support for viewing and changing the MAC Merge sublayer
      parameters, and seeing the verification state machine's current state.
      The verification handshake with the link partner is driven by hardware.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20230206094531.444988-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'sched-cpumask-improve-on-cpumask_local_spread-locality' · cc74ca30
      Jakub Kicinski authored
      Yury Norov says:
      
      ====================
      sched: cpumask: improve on cpumask_local_spread() locality
      
      cpumask_local_spread() currently checks the local node for the i'th
      CPU and, if it finds nothing, falls back to a flat search among all
      non-local CPUs. We can do better by checking CPUs hop by hop, in order
      of increasing NUMA distance.
      
      This has significant performance implications on NUMA machines, for example
      when using NUMA-aware allocated memory together with NUMA-aware IRQ
      affinity hints.
      
      Performance tests from patch 8 of this series, for the Mellanox network
      driver, show:
      
        TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
        Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
      
        +-------------------------+-----------+------------------+------------------+
        |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
        +-------------------------+-----------+------------------+------------------+
        | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
        +-------------------------+-----------+------------------+------------------+
        | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
        +-------------------------+-----------+------------------+------------------+
        | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
        +-------------------------+-----------+------------------+------------------+
        | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
        +-------------------------+-----------+------------------+------------------+
      
        The RX-side bottleneck is removed and line rate is reached (~1.8x speedup).
        ~30% less CPU utilization on TX.
      ====================
      
      Link: https://lore.kernel.org/r/20230121042436.2661843-1-yury.norov@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • lib/cpumask: update comment for cpumask_local_spread() · 2ac4980c
      Yury Norov authored
      Now that we have an iterator-based alternative for the very common case
      of calling cpumask_local_spread() for all CPUs in a row, it is worth
      mentioning that in the comment for cpumask_local_spread().
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Valentin Schneider <vschneid@redhat.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints · 2acda577
      Tariq Toukan authored
      In the IRQ affinity hints, replace the binary NUMA preference (local /
      remote) with the improved for_each_numa_hop_cpu() API, which takes the
      actual distances into account, so that remote NUMA nodes at a shorter
      distance are preferred over farther ones.
      
      This has significant performance implications when using NUMA-aware
      allocated memory (follow [1] and derivatives for example).
      
      [1]
      drivers/net/ethernet/mellanox/mlx5/core/en_main.c :: mlx5e_open_channel()
         int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
      
      Performance tests:
      
      TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
      Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
      
      +-------------------------+-----------+------------------+------------------+
      |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
      +-------------------------+-----------+------------------+------------------+
      | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
      +-------------------------+-----------+------------------+------------------+
      | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
      +-------------------------+-----------+------------------+------------------+
      | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
      +-------------------------+-----------+------------------+------------------+
      | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
      +-------------------------+-----------+------------------+------------------+
      
      The RX-side bottleneck is removed and line rate is reached (~1.8x speedup).
      ~30% less CPU utilization on TX.
      
      * CPU util on active cores only.
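
The two summary figures can be recomputed from the table. The message does not say which rows the "~30%" compares; the sketch below assumes it is "RX side only" versus "both sides", which is one reading that matches the quoted numbers. The helper names are illustrative only.

```python
# (bandwidth in Gbps, TX CPU util %, RX CPU util %), copied from the table
results = {
    "baseline": (52.3, 6.4, 17.9),
    "tx_only": (52.6, 5.2, 18.5),
    "rx_only": (94.9, 11.9, 27.2),
    "both": (95.1, 8.4, 27.3),
}


def speedup(after, before):
    """Bandwidth ratio between two table rows."""
    return results[after][0] / results[before][0]


def tx_util_drop(after, before):
    """Relative reduction in TX-side CPU utilization between two rows."""
    return 1 - results[after][1] / results[before][1]


print(f"{speedup('both', 'baseline'):.2f}x")     # 1.82x
print(f"{tx_util_drop('both', 'rx_only'):.0%}")  # 29%
```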
      
      Setup details (similar for both sides):
      
      NIC: ConnectX6-DX dual port, 100 Gbps each.
      Single port used in the tests.
      
      $ lscpu
      Architecture:        x86_64
      CPU op-mode(s):      32-bit, 64-bit
      Byte Order:          Little Endian
      CPU(s):              256
      On-line CPU(s) list: 0-255
      Thread(s) per core:  2
      Core(s) per socket:  64
      Socket(s):           2
      NUMA node(s):        16
      Vendor ID:           AuthenticAMD
      CPU family:          25
      Model:               1
      Model name:          AMD EPYC 7763 64-Core Processor
      Stepping:            1
      CPU MHz:             2594.804
      BogoMIPS:            4890.73
      Virtualization:      AMD-V
      L1d cache:           32K
      L1i cache:           32K
      L2 cache:            512K
      L3 cache:            32768K
      NUMA node0 CPU(s):   0-7,128-135
      NUMA node1 CPU(s):   8-15,136-143
      NUMA node2 CPU(s):   16-23,144-151
      NUMA node3 CPU(s):   24-31,152-159
      NUMA node4 CPU(s):   32-39,160-167
      NUMA node5 CPU(s):   40-47,168-175
      NUMA node6 CPU(s):   48-55,176-183
      NUMA node7 CPU(s):   56-63,184-191
      NUMA node8 CPU(s):   64-71,192-199
      NUMA node9 CPU(s):   72-79,200-207
      NUMA node10 CPU(s):  80-87,208-215
      NUMA node11 CPU(s):  88-95,216-223
      NUMA node12 CPU(s):  96-103,224-231
      NUMA node13 CPU(s):  104-111,232-239
      NUMA node14 CPU(s):  112-119,240-247
      NUMA node15 CPU(s):  120-127,248-255
      ..
      
      $ numactl -H
      ..
      node distances:
      node   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
        0:  10  11  11  11  12  12  12  12  32  32  32  32  32  32  32  32
        1:  11  10  11  11  12  12  12  12  32  32  32  32  32  32  32  32
        2:  11  11  10  11  12  12  12  12  32  32  32  32  32  32  32  32
        3:  11  11  11  10  12  12  12  12  32  32  32  32  32  32  32  32
        4:  12  12  12  12  10  11  11  11  32  32  32  32  32  32  32  32
        5:  12  12  12  12  11  10  11  11  32  32  32  32  32  32  32  32
        6:  12  12  12  12  11  11  10  11  32  32  32  32  32  32  32  32
        7:  12  12  12  12  11  11  11  10  32  32  32  32  32  32  32  32
        8:  32  32  32  32  32  32  32  32  10  11  11  11  12  12  12  12
        9:  32  32  32  32  32  32  32  32  11  10  11  11  12  12  12  12
       10:  32  32  32  32  32  32  32  32  11  11  10  11  12  12  12  12
       11:  32  32  32  32  32  32  32  32  11  11  11  10  12  12  12  12
       12:  32  32  32  32  32  32  32  32  12  12  12  12  10  11  11  11
       13:  32  32  32  32  32  32  32  32  12  12  12  12  11  10  11  11
       14:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  10  11
       15:  32  32  32  32  32  32  32  32  12  12  12  12  11  11  11  10
      
      $ cat /sys/class/net/ens5f0/device/numa_node
      14
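
To make the hop-based preference concrete, here is a small sketch that orders the CPUs of this machine by NUMA distance from the NIC's node (14), using the lscpu and numactl data above. It only mimics the resulting order; it is not the kernel's implementation, and `node_cpus()` and `cpus_by_hop()` are hypothetical helpers. Sorting CPUs by ascending id within a hop is an assumption that appears consistent with the affinity hints listed further down.

```python
def node_cpus(node):
    """CPUs of one NUMA node on the test machine: 8 cores plus 8 SMT siblings."""
    first = 8 * node
    return list(range(first, first + 8)) + list(range(128 + first, 128 + first + 8))


# Row "14:" of the node distance matrix above.
DIST_FROM_14 = [32] * 8 + [12] * 4 + [11, 11, 10, 11]


def cpus_by_hop(distances):
    """List all CPUs by increasing node distance, ascending CPU id within a hop."""
    order = []
    for dist in sorted(set(distances)):
        hop = [cpu
               for node, d in enumerate(distances) if d == dist
               for cpu in node_cpus(node)]
        order.extend(sorted(hop))
    return order


order = cpus_by_hop(DIST_FROM_14)
print(order[:16])    # local node 14: CPUs 112-119 and 240-247
print(order[16:24])  # first CPUs of the distance-11 hop: 96-103
```

Under the previous binary local/remote preference, the CPU chosen right after the 16 local ones would come from a flat search over all remote CPUs, regardless of distance.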
      
      Affinity hints (127 IRQs):
      Before:
      331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
      332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
      333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
      334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
      335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
      336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
      337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
      338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
      339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      347: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
      348: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
      349: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000004
      350: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000008
      351: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000010
      352: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000020
      353: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000040
      354: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
      355: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
      356: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000200
      357: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000400
      358: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000800
      359: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00001000
      360: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000
      361: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00004000
      362: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00008000
      363: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000
      364: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00020000
      365: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00040000
      366: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000
      367: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00100000
      368: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00200000
      369: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00400000
      370: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00800000
      371: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,01000000
      372: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,02000000
      373: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,04000000
      374: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,08000000
      375: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,10000000
      376: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,20000000
      377: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,40000000
      378: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,80000000
      379: 00000000,00000000,00000000,00000000,00000000,00000000,00000001,00000000
      380: 00000000,00000000,00000000,00000000,00000000,00000000,00000002,00000000
      381: 00000000,00000000,00000000,00000000,00000000,00000000,00000004,00000000
      382: 00000000,00000000,00000000,00000000,00000000,00000000,00000008,00000000
      383: 00000000,00000000,00000000,00000000,00000000,00000000,00000010,00000000
      384: 00000000,00000000,00000000,00000000,00000000,00000000,00000020,00000000
      385: 00000000,00000000,00000000,00000000,00000000,00000000,00000040,00000000
      386: 00000000,00000000,00000000,00000000,00000000,00000000,00000080,00000000
      387: 00000000,00000000,00000000,00000000,00000000,00000000,00000100,00000000
      388: 00000000,00000000,00000000,00000000,00000000,00000000,00000200,00000000
      389: 00000000,00000000,00000000,00000000,00000000,00000000,00000400,00000000
      390: 00000000,00000000,00000000,00000000,00000000,00000000,00000800,00000000
      391: 00000000,00000000,00000000,00000000,00000000,00000000,00001000,00000000
      392: 00000000,00000000,00000000,00000000,00000000,00000000,00002000,00000000
      393: 00000000,00000000,00000000,00000000,00000000,00000000,00004000,00000000
      394: 00000000,00000000,00000000,00000000,00000000,00000000,00008000,00000000
      395: 00000000,00000000,00000000,00000000,00000000,00000000,00010000,00000000
      396: 00000000,00000000,00000000,00000000,00000000,00000000,00020000,00000000
      397: 00000000,00000000,00000000,00000000,00000000,00000000,00040000,00000000
      398: 00000000,00000000,00000000,00000000,00000000,00000000,00080000,00000000
      399: 00000000,00000000,00000000,00000000,00000000,00000000,00100000,00000000
      400: 00000000,00000000,00000000,00000000,00000000,00000000,00200000,00000000
      401: 00000000,00000000,00000000,00000000,00000000,00000000,00400000,00000000
      402: 00000000,00000000,00000000,00000000,00000000,00000000,00800000,00000000
      403: 00000000,00000000,00000000,00000000,00000000,00000000,01000000,00000000
      404: 00000000,00000000,00000000,00000000,00000000,00000000,02000000,00000000
      405: 00000000,00000000,00000000,00000000,00000000,00000000,04000000,00000000
      406: 00000000,00000000,00000000,00000000,00000000,00000000,08000000,00000000
      407: 00000000,00000000,00000000,00000000,00000000,00000000,10000000,00000000
      408: 00000000,00000000,00000000,00000000,00000000,00000000,20000000,00000000
      409: 00000000,00000000,00000000,00000000,00000000,00000000,40000000,00000000
      410: 00000000,00000000,00000000,00000000,00000000,00000000,80000000,00000000
      411: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
      412: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
      413: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
      414: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
      415: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
      416: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
      417: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
      418: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
      419: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
      420: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
      421: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
      422: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
      423: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
      424: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
      425: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
      426: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
      427: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
      428: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
      429: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
      430: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
      431: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
      432: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
      433: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
      434: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
      435: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
      436: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
      437: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
      438: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
      439: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
      440: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
      441: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
      442: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
      443: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
      444: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
      445: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
      446: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
      447: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
      448: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
      449: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
      450: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
      451: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
      452: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
      453: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
      454: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
      455: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
      456: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
      457: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000
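
Each hint is a CPU bitmask printed as comma-separated 32-bit hexadecimal words, most significant word first. A minimal decoding sketch (`hint_to_cpus()` is a hypothetical helper, not part of any tool):

```python
def hint_to_cpus(mask):
    """Decode a cpumask string such as '00000000,...,00010000' into CPU ids."""
    # Dropping the commas yields one big hexadecimal number whose
    # bit N corresponds to CPU N.
    value = int(mask.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# IRQ 331 in the list above
print(hint_to_cpus(
    "00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000"
))  # [112]
# IRQ 347 in the "Before" list
print(hint_to_cpus(
    "00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001"
))  # [0]
```

Decoded this way, IRQs 331-346 map to CPUs 112-119 and 240-247, which belong to the NIC's node 14, while IRQ 347 in the "Before" list maps to CPU 0 on node 0, at distance 32 from node 14.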
      
      After:
      331: 00000000,00000000,00000000,00000000,00010000,00000000,00000000,00000000
      332: 00000000,00000000,00000000,00000000,00020000,00000000,00000000,00000000
      333: 00000000,00000000,00000000,00000000,00040000,00000000,00000000,00000000
      334: 00000000,00000000,00000000,00000000,00080000,00000000,00000000,00000000
      335: 00000000,00000000,00000000,00000000,00100000,00000000,00000000,00000000
      336: 00000000,00000000,00000000,00000000,00200000,00000000,00000000,00000000
      337: 00000000,00000000,00000000,00000000,00400000,00000000,00000000,00000000
      338: 00000000,00000000,00000000,00000000,00800000,00000000,00000000,00000000
      339: 00010000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      340: 00020000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      341: 00040000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      342: 00080000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      343: 00100000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      344: 00200000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      345: 00400000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      346: 00800000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      347: 00000000,00000000,00000000,00000000,00000001,00000000,00000000,00000000
      348: 00000000,00000000,00000000,00000000,00000002,00000000,00000000,00000000
      349: 00000000,00000000,00000000,00000000,00000004,00000000,00000000,00000000
      350: 00000000,00000000,00000000,00000000,00000008,00000000,00000000,00000000
      351: 00000000,00000000,00000000,00000000,00000010,00000000,00000000,00000000
      352: 00000000,00000000,00000000,00000000,00000020,00000000,00000000,00000000
      353: 00000000,00000000,00000000,00000000,00000040,00000000,00000000,00000000
      354: 00000000,00000000,00000000,00000000,00000080,00000000,00000000,00000000
      355: 00000000,00000000,00000000,00000000,00000100,00000000,00000000,00000000
      356: 00000000,00000000,00000000,00000000,00000200,00000000,00000000,00000000
      357: 00000000,00000000,00000000,00000000,00000400,00000000,00000000,00000000
      358: 00000000,00000000,00000000,00000000,00000800,00000000,00000000,00000000
      359: 00000000,00000000,00000000,00000000,00001000,00000000,00000000,00000000
      360: 00000000,00000000,00000000,00000000,00002000,00000000,00000000,00000000
      361: 00000000,00000000,00000000,00000000,00004000,00000000,00000000,00000000
      362: 00000000,00000000,00000000,00000000,00008000,00000000,00000000,00000000
      363: 00000000,00000000,00000000,00000000,01000000,00000000,00000000,00000000
      364: 00000000,00000000,00000000,00000000,02000000,00000000,00000000,00000000
      365: 00000000,00000000,00000000,00000000,04000000,00000000,00000000,00000000
      366: 00000000,00000000,00000000,00000000,08000000,00000000,00000000,00000000
      367: 00000000,00000000,00000000,00000000,10000000,00000000,00000000,00000000
      368: 00000000,00000000,00000000,00000000,20000000,00000000,00000000,00000000
      369: 00000000,00000000,00000000,00000000,40000000,00000000,00000000,00000000
      370: 00000000,00000000,00000000,00000000,80000000,00000000,00000000,00000000
      371: 00000001,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      372: 00000002,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      373: 00000004,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      374: 00000008,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      375: 00000010,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      376: 00000020,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      377: 00000040,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      378: 00000080,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      379: 00000100,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      380: 00000200,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      381: 00000400,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      382: 00000800,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      383: 00001000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      384: 00002000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      385: 00004000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      386: 00008000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      387: 01000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      388: 02000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      389: 04000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      390: 08000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      391: 10000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      392: 20000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      393: 40000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      394: 80000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
      395: 00000000,00000000,00000000,00000000,00000000,00000001,00000000,00000000
      396: 00000000,00000000,00000000,00000000,00000000,00000002,00000000,00000000
      397: 00000000,00000000,00000000,00000000,00000000,00000004,00000000,00000000
      398: 00000000,00000000,00000000,00000000,00000000,00000008,00000000,00000000
      399: 00000000,00000000,00000000,00000000,00000000,00000010,00000000,00000000
      400: 00000000,00000000,00000000,00000000,00000000,00000020,00000000,00000000
      401: 00000000,00000000,00000000,00000000,00000000,00000040,00000000,00000000
      402: 00000000,00000000,00000000,00000000,00000000,00000080,00000000,00000000
      403: 00000000,00000000,00000000,00000000,00000000,00000100,00000000,00000000
      404: 00000000,00000000,00000000,00000000,00000000,00000200,00000000,00000000
      405: 00000000,00000000,00000000,00000000,00000000,00000400,00000000,00000000
      406: 00000000,00000000,00000000,00000000,00000000,00000800,00000000,00000000
      407: 00000000,00000000,00000000,00000000,00000000,00001000,00000000,00000000
      408: 00000000,00000000,00000000,00000000,00000000,00002000,00000000,00000000
      409: 00000000,00000000,00000000,00000000,00000000,00004000,00000000,00000000
      410: 00000000,00000000,00000000,00000000,00000000,00008000,00000000,00000000
      411: 00000000,00000000,00000000,00000000,00000000,00010000,00000000,00000000
      412: 00000000,00000000,00000000,00000000,00000000,00020000,00000000,00000000
      413: 00000000,00000000,00000000,00000000,00000000,00040000,00000000,00000000
      414: 00000000,00000000,00000000,00000000,00000000,00080000,00000000,00000000
      415: 00000000,00000000,00000000,00000000,00000000,00100000,00000000,00000000
      416: 00000000,00000000,00000000,00000000,00000000,00200000,00000000,00000000
      417: 00000000,00000000,00000000,00000000,00000000,00400000,00000000,00000000
      418: 00000000,00000000,00000000,00000000,00000000,00800000,00000000,00000000
      419: 00000000,00000000,00000000,00000000,00000000,01000000,00000000,00000000
      420: 00000000,00000000,00000000,00000000,00000000,02000000,00000000,00000000
      421: 00000000,00000000,00000000,00000000,00000000,04000000,00000000,00000000
      422: 00000000,00000000,00000000,00000000,00000000,08000000,00000000,00000000
      423: 00000000,00000000,00000000,00000000,00000000,10000000,00000000,00000000
      424: 00000000,00000000,00000000,00000000,00000000,20000000,00000000,00000000
      425: 00000000,00000000,00000000,00000000,00000000,40000000,00000000,00000000
      426: 00000000,00000000,00000000,00000000,00000000,80000000,00000000,00000000
      427: 00000000,00000001,00000000,00000000,00000000,00000000,00000000,00000000
      428: 00000000,00000002,00000000,00000000,00000000,00000000,00000000,00000000
      429: 00000000,00000004,00000000,00000000,00000000,00000000,00000000,00000000
      430: 00000000,00000008,00000000,00000000,00000000,00000000,00000000,00000000
      431: 00000000,00000010,00000000,00000000,00000000,00000000,00000000,00000000
      432: 00000000,00000020,00000000,00000000,00000000,00000000,00000000,00000000
      433: 00000000,00000040,00000000,00000000,00000000,00000000,00000000,00000000
      434: 00000000,00000080,00000000,00000000,00000000,00000000,00000000,00000000
      435: 00000000,00000100,00000000,00000000,00000000,00000000,00000000,00000000
      436: 00000000,00000200,00000000,00000000,00000000,00000000,00000000,00000000
      437: 00000000,00000400,00000000,00000000,00000000,00000000,00000000,00000000
      438: 00000000,00000800,00000000,00000000,00000000,00000000,00000000,00000000
      439: 00000000,00001000,00000000,00000000,00000000,00000000,00000000,00000000
      440: 00000000,00002000,00000000,00000000,00000000,00000000,00000000,00000000
      441: 00000000,00004000,00000000,00000000,00000000,00000000,00000000,00000000
      442: 00000000,00008000,00000000,00000000,00000000,00000000,00000000,00000000
      443: 00000000,00010000,00000000,00000000,00000000,00000000,00000000,00000000
      444: 00000000,00020000,00000000,00000000,00000000,00000000,00000000,00000000
      445: 00000000,00040000,00000000,00000000,00000000,00000000,00000000,00000000
      446: 00000000,00080000,00000000,00000000,00000000,00000000,00000000,00000000
      447: 00000000,00100000,00000000,00000000,00000000,00000000,00000000,00000000
      448: 00000000,00200000,00000000,00000000,00000000,00000000,00000000,00000000
      449: 00000000,00400000,00000000,00000000,00000000,00000000,00000000,00000000
      450: 00000000,00800000,00000000,00000000,00000000,00000000,00000000,00000000
      451: 00000000,01000000,00000000,00000000,00000000,00000000,00000000,00000000
      452: 00000000,02000000,00000000,00000000,00000000,00000000,00000000,00000000
      453: 00000000,04000000,00000000,00000000,00000000,00000000,00000000,00000000
      454: 00000000,08000000,00000000,00000000,00000000,00000000,00000000,00000000
      455: 00000000,10000000,00000000,00000000,00000000,00000000,00000000,00000000
      456: 00000000,20000000,00000000,00000000,00000000,00000000,00000000,00000000
      457: 00000000,40000000,00000000,00000000,00000000,00000000,00000000,00000000
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      [Tweaked API use]
      Suggested-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2acda577
    • sched/topology: Introduce for_each_numa_hop_mask() · 06ac0172
      Valentin Schneider authored
      The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
      reachable within a given distance budget. Wrap the logic for iterating over
      all (distance, mask) values inside an iterator macro.
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      06ac0172
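A rough userspace model of the iteration pattern this macro enables, written in Python rather than kernel C. The topology is the four-node layout quoted in the cpumask_local_spread() commit in this series. The names `numa_hop_masks()` and `pick_cpus()` are hypothetical and exist only for this sketch. The real macro walks kernel cpumasks, not Python sets.

```python
# Userspace model only: Python sets stand in for kernel cpumasks.
NODE_CPUS = {0: [0, 1, 2, 3], 1: [4, 5], 2: [6, 7], 3: list(range(8, 16))}
DISTANCES = [
    [10, 50, 30, 70],
    [50, 10, 70, 30],
    [30, 70, 10, 50],
    [70, 30, 50, 10],
]


def numa_hop_masks(node):
    """Yield one CPU set per distance level, nearest first.

    Each yielded set contains every CPU whose node lies within the
    current distance budget, so later sets are supersets of earlier ones.
    """
    levels = sorted({d for row in DISTANCES for d in row})
    for budget in levels:
        yield {
            cpu
            for n, cpus in NODE_CPUS.items()
            if DISTANCES[node][n] <= budget
            for cpu in cpus
        }


def pick_cpus(node, count):
    """Pick `count` CPUs for `node`, exhausting closer hops first."""
    picked, prev = [], set()
    for mask in numa_hop_masks(node):
        # Only visit CPUs that are new at this hop (mask and-not prev).
        for cpu in sorted(mask - prev):
            picked.append(cpu)
            if len(picked) == count:
                return picked
        prev = mask
    return picked
```

A caller that needs five CPUs near node 1 would use `pick_cpus(1, 5)`, which takes both CPUs of node 1 and then the first three CPUs of node 3, the nearest remote node.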
    • sched/topology: Introduce sched_numa_hop_mask() · 9feae658
      Valentin Schneider authored
      Tariq has pointed out that drivers allocating IRQ vectors would benefit
      from having smarter NUMA-awareness - cpumask_local_spread() only knows
      about the local node and everything outside is in the same bucket.
      
      sched_domains_numa_masks is pretty much what we want to hand out (a cpumask
      of CPUs reachable within a given distance budget), so introduce
      sched_numa_hop_mask() to export those cpumasks.
      
      Link: http://lore.kernel.org/r/20220728191203.4055-1-tariqt@nvidia.com
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9feae658
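To make the "distance budget" idea concrete, here is a minimal Python model, not kernel code. It assumes the four-node topology quoted in the cpumask_local_spread() commit in this series. `hop_mask()` is a hypothetical stand-in for the lookup, and raising `ValueError` for an out-of-range hop is this sketch's own choice rather than a statement about the kernel's error handling.

```python
# Userspace model only: Python sets stand in for kernel cpumasks.
NODE_CPUS = {0: [0, 1, 2, 3], 1: [4, 5], 2: [6, 7], 3: list(range(8, 16))}
DISTANCES = [
    [10, 50, 30, 70],
    [50, 10, 70, 30],
    [30, 70, 10, 50],
    [70, 30, 50, 10],
]

# Distinct distances in the system, nearest first; one level per hop.
LEVELS = sorted({d for row in DISTANCES for d in row})


def hop_mask(node, hops):
    """Return the set of CPUs within `hops` distance levels of `node`."""
    if not 0 <= hops < len(LEVELS):
        raise ValueError("no such hop level")
    budget = LEVELS[hops]
    return {
        cpu
        for n, cpus in NODE_CPUS.items()
        if DISTANCES[node][n] <= budget
        for cpu in cpus
    }
```

Hop 0 is the local node only, which is all that the older two-bucket view distinguishes. Each further hop widens the set by the next-nearest node or nodes.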
    • lib/cpumask: reorganize cpumask_local_spread() logic · b1beed72
      Yury Norov authored
      Now that all NUMA logic has moved into sched_numa_find_nth_cpu(), the
      else-branch of cpumask_local_spread() is just a function call, and
      we can simplify the logic by using a ternary operator.
      
      While here, replace BUG() with WARN_ON().
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b1beed72
    • cpumask: improve on cpumask_local_spread() locality · 406d394a
      Yury Norov authored
      Switch cpumask_local_spread() to use newly added sched_numa_find_nth_cpu(),
      which takes into account distances to each node in the system.
      
      For the following NUMA configuration:
      
      root@debian:~# numactl -H
      available: 4 nodes (0-3)
      node 0 cpus: 0 1 2 3
      node 0 size: 3869 MB
      node 0 free: 3740 MB
      node 1 cpus: 4 5
      node 1 size: 1969 MB
      node 1 free: 1937 MB
      node 2 cpus: 6 7
      node 2 size: 1967 MB
      node 2 free: 1873 MB
      node 3 cpus: 8 9 10 11 12 13 14 15
      node 3 size: 7842 MB
      node 3 free: 7723 MB
      node distances:
      node   0   1   2   3
        0:  10  50  30  70
        1:  50  10  70  30
        2:  30  70  10  50
        3:  70  30  50  10
      
      The new cpumask_local_spread() traverses cpus for each node like this:
      
      node 0:   0   1   2   3   6   7   4   5   8   9  10  11  12  13  14  15
      node 1:   4   5   8   9  10  11  12  13  14  15   0   1   2   3   6   7
      node 2:   6   7   0   1   2   3   8   9  10  11  12  13  14  15   4   5
      node 3:   8   9  10  11  12  13  14  15   4   5   6   7   0   1   2   3
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      406d394a
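The traversal table in this commit message can be reproduced from the numactl output with a short userspace model. This is a Python sketch, not the kernel implementation. `spread_order()` and `local_spread()` are hypothetical names, and the wrap-around on the index is an assumption of the model.

```python
# Topology and distance matrix copied from the numactl -H output above.
NODE_CPUS = {0: [0, 1, 2, 3], 1: [4, 5], 2: [6, 7], 3: list(range(8, 16))}
DISTANCES = [
    [10, 50, 30, 70],
    [50, 10, 70, 30],
    [30, 70, 10, 50],
    [70, 30, 50, 10],
]


def spread_order(node):
    """Return all CPUs ordered by increasing NUMA distance from `node`.

    CPUs at the same distance are listed in ascending CPU number.
    """
    order = []
    for dist in sorted(set(DISTANCES[node])):
        at_this_distance = [
            cpu
            for n, cpus in NODE_CPUS.items()
            if DISTANCES[node][n] == dist
            for cpu in cpus
        ]
        order.extend(sorted(at_this_distance))
    return order


def local_spread(i, node):
    """Model of picking the i-th CPU for `node` (index wraps around)."""
    order = spread_order(node)
    return order[i % len(order)]


for node in NODE_CPUS:
    print(f"node {node}:", *spread_order(node))
```

For node 0 the distances are 10, 50, 30, 70, so the nodes are visited in the order 0, 2, 1, 3, which gives the first row of the table.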
    • sched: add sched_numa_find_nth_cpu() · cd7f5535
      Yury Norov authored
      The function finds the Nth set CPU in a given cpumask, starting from a
      given node.
      
      Leveraging the fact that each hop in sched_domains_numa_masks includes the
      same or greater number of CPUs than the previous one, we can use binary
      search on hops instead of a linear walk, which makes the overall complexity
      O(log n) in terms of the number of cpumask_weight() calls.
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cd7f5535
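The binary search over hops described above can be modelled in userspace as follows. This is a Python sketch under simplifying assumptions. Hop masks are plain sets passed in by the caller, `find_nth_cpu()` is a hypothetical name, and returning `None` when there is no such CPU is the sketch's own convention. Here `n` counts from 0.

```python
def find_nth_cpu(cpus, n, hop_masks):
    """Return the n-th CPU of `cpus` in nearest-hop-first order, or None.

    `hop_masks` is a list of sets in which each set is a superset of the
    previous one, so len(cpus & hop_masks[h]) never decreases with h.
    """
    # Binary search for the first hop holding more than n candidate CPUs.
    lo, hi = 0, len(hop_masks)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(cpus & hop_masks[mid]) > n:
            hi = mid
        else:
            lo = mid + 1
    if lo == len(hop_masks):
        return None  # fewer than n + 1 CPUs are reachable at any hop

    # CPUs already covered by closer hops are skipped, not revisited.
    prev = hop_masks[lo - 1] if lo else set()
    skipped = len(cpus & prev)
    new_at_this_hop = sorted((cpus & hop_masks[lo]) - prev)
    return new_at_this_hop[n - skipped]


# Hop masks for node 0 of a four-node example: local node, then +6-7,
# then +4-5, then everything.
NODE0_HOPS = [
    {0, 1, 2, 3},
    {0, 1, 2, 3, 6, 7},
    set(range(8)),
    set(range(16)),
]
```

With all 16 CPUs allowed, `find_nth_cpu(set(range(16)), 6, NODE0_HOPS)` lands on the third hop, skips the six CPUs of the closer hops, and returns CPU 4. The number of weight computations grows with the logarithm of the number of hops.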
    • cpumask: introduce cpumask_nth_and_andnot · 62f4386e
      Yury Norov authored
      Introduce cpumask_nth_and_andnot() based on find_nth_and_andnot_bit().
      It's used in the following patch to traverse cpumasks without storing
      an intermediate result in a temporary cpumask.
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      62f4386e
    • lib/find: introduce find_nth_and_andnot_bit · 43245117
      Yury Norov authored
      In the following patches the function is used to implement in-place bitmap
      traversal without storing intermediate results in temporary bitmaps.
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      43245117
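A word-by-word Python model of what "in-place traversal" means here: the n-th set bit of `addr1 & addr2 & ~addr3` is located without ever building the combined bitmap. This is an illustrative sketch, not the kernel routine. Bitmaps are assumed to be lists of 64-bit words, `n` counts from 0, and `size` is returned when no such bit exists.

```python
BITS_PER_WORD = 64
WORD_MASK = (1 << BITS_PER_WORD) - 1


def find_nth_and_andnot_bit(addr1, addr2, addr3, size, n):
    """Return the index of the n-th set bit in addr1 & addr2 & ~addr3.

    Only one combined word is held at a time; no temporary bitmap of
    `size` bits is allocated. Returns `size` if there is no such bit.
    """
    for idx, (w1, w2, w3) in enumerate(zip(addr1, addr2, addr3)):
        word = w1 & w2 & ~w3 & WORD_MASK
        weight = bin(word).count("1")
        if n < weight:
            # The target bit is inside this word; walk its set bits.
            for bit in range(BITS_PER_WORD):
                if (word >> bit) & 1:
                    if n == 0:
                        pos = idx * BITS_PER_WORD + bit
                        return pos if pos < size else size
                    n -= 1
        # Not in this word: discount its set bits and move on.
        n -= weight
    return size
```

For example, with `addr1 = [0b11110110]`, `addr2 = [0b10111110]` and `addr3 = [0b00000100]`, the combined bits are 1, 4, 5 and 7, so `n = 2` yields bit 5.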
  2. 07 Feb, 2023 11 commits
  3. 06 Feb, 2023 12 commits