Commits · 248376b1b13f7300e94a9f8d97062d43dfa4a847 · Kirill Smelkov / linux

30 Sep, 2022 9 commits

net: dsa: hellcreek: refactor hellcreek_port_setup_tc() to use switch/case · 248376b1

Vladimir Oltean authored Sep 28, 2022

The following patch will need to make this function also respond to
TC_QUERY_BASE, so make the processing more structured around the
tc_setup_type.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

248376b1

net: dsa: felix: offload per-tc max SDU from tc-taprio · 1712be05

Vladimir Oltean authored Sep 28, 2022

Our current vsc9959_tas_guard_bands_update() algorithm has a limitation
imposed by the hardware design. To avoid packet overruns between one
gate interval and the next (which would add jitter for scheduled traffic
in the next gate), we configure the switch to use guard bands. These are
as large as the largest packet which is possible to be transmitted.

The problem is that at tc-taprio intervals of sizes comparable to a
guard band, there isn't an obvious place in which to split the interval
between the useful portion (for scheduling) and the guard band portion
(where scheduling is blocked).

For example, a 10 us interval at 1Gbps allows 1225 octets to be
transmitted. We currently split the interval between the bare minimum of
33 ns useful time (required to schedule a single packet) and the rest as
guard band.

But 33 ns of useful scheduling time will only allow a single packet to
be sent, be that packet 1200 octets in size, or 60 octets in size. It is
impossible to send 2 60 octets frames in the 10 us window. Except that
if we reduced the guard band (and therefore the maximum allowable SDU
size) to 5 us, the useful time for scheduling is now also 5 us, so more
packets could be scheduled.

The hardware inflexibility of not scheduling according to individual
packet lengths must unfortunately propagate to the user, who needs to
tune the queueMaxSDU values if he wants to fit more small packets into a
10 us interval, rather than one large packet.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1712be05

net/sched: taprio: allow user input of per-tc max SDU · a54fc09e

Vladimir Oltean authored Sep 28, 2022

IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data
types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the
existence of a per traffic class limitation of maximum frame sizes, with
a fallback on the port-based MTU.

As far as I am able to understand, the 802.1Q Service Data Unit (SDU)
represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding
any number of prepended VLAN headers which may be otherwise present in
the MSDU. Therefore, the queueMaxSDU is directly comparable to the
device MTU (1500 means L2 payload sizes are accepted, or frame sizes of
1518 octets, or 1522 plus one VLAN header). Drivers which offload this
are directly responsible of translating into other units of measurement.

To keep the fast path checks optimized, we keep 2 arrays in the qdisc,
one for max_sdu translated into frame length (so that it's comparable to
skb->len), and another for offloading and for dumping back to the user.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a54fc09e

net/sched: query offload capabilities through ndo_setup_tc() · aac4daa8

Vladimir Oltean authored Sep 28, 2022

When adding optional new features to Qdisc offloads, existing drivers
must reject the new configuration until they are coded up to act on it.

Since modifying all drivers in lockstep with the changes in the Qdisc
can create problems of its own, it would be nice if there existed an
automatic opt-in mechanism for offloading optional features.

Jakub proposes that we multiplex one more kind of call through
ndo_setup_tc(): one where the driver populates a Qdisc-specific
capability structure.

First user will be taprio in further changes. Here we are introducing
the definitions for the base functionality.

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220923163310.3192733-3-vladimir.oltean@nxp.com/Suggested-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

aac4daa8

net/tipc: Remove unused struct distr_queue_item · 8af535b6

Yuan Can authored Sep 28, 2022

After commit 09b5678c("tipc: remove dead code in tipc_net and relatives"),
struct distr_queue_item is not used any more and can be removed as well.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Link: https://lore.kernel.org/r/20220928085636.71749-1-yuancan@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8af535b6

net: skb: introduce and use a single page frag cache · dbae2b06

Paolo Abeni authored Sep 28, 2022

After commit 3226b158 ("net: avoid 32 x truesize under-estimation
for tiny skbs") we are observing 10-20% regressions in performance
tests with small packets. The perf trace points to high pressure on
the slab allocator.

This change tries to improve the allocation schema for small packets
using an idea originally suggested by Eric: a new per CPU page frag is
introduced and used in __napi_alloc_skb to cope with small allocation
requests.

To ensure that the above does not lead to excessive truesize
underestimation, the frag size for small allocation is inflated to 1K
and all the above is restricted to build with 4K page size.

Note that we need to update accordingly the run-time check introduced
with commit fd9ea57f ("net: add napi_get_frags_check() helper").

Alex suggested a smart page refcount schema to reduce the number
of atomic operations and deal properly with pfmemalloc pages.

Under small packet UDP flood, I measure a 15% peak tput increases.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Alexander H Duyck <alexanderduyck@fb.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dbae2b06

net: sched: cls_u32: Avoid memcpy() false-positive warning · 7cba1833

Kees Cook authored Sep 27, 2022

To work around a misbehavior of the compiler's ability to see into
composite flexible array structs (as detailed in the coming memcpy()
hardening series[1]), use unsafe_memcpy(), as the sizing,
bounds-checking, and allocation are all very tightly coupled here.
This silences the false-positive reported by syzbot:

  memcpy: detected field-spanning write (size 80) of single field "&n->sel" at net/sched/cls_u32.c:1043 (size 16)

[1] https://lore.kernel.org/linux-hardening/20220901065914.1417829-2-keescook@chromium.org

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Reported-by: syzbot+a2c4601efc75848ba321@syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/000000000000a96c0b05e97f0444@google.com/Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220927153700.3071688-1-keescook@chromium.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7cba1833

dt-bindings: net: snps,dwmac: Document stmmac-axi-config subnode · 5361660a

Marek Vasut authored Sep 27, 2022

The stmmac-axi-config subnode is present in multiple dwmac instance DTs,
document its content per snps,axi-config property description which is
a phandle to this subnode.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220927012449.698915-1-marex@denx.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5361660a

docs: netlink: clarify the historical baggage of Netlink flags · 5493a2ad

Jakub Kicinski authored Sep 27, 2022

nlmsg_flags are full of historical baggage, inconsistencies and
strangeness. Try to document it more thoroughly. Explain the meaning
of the ECHO flag (and while at it clarify the comment in the uAPI).
Handwave a little about the NEW request flags and how they make
sense on the surface but cater to really old paradigm before commands
were a thing.

I will add more notes on how to make use of ECHO and discouragement
for reuse of flags to the kernel-side documentation.

Link: https://lore.kernel.org/r/20220927212306.823862-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5493a2ad

29 Sep, 2022 31 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · accc3b4a
Jakub Kicinski authored Sep 29, 2022
```
No conflicts.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
```
accc3b4a

Merge tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 511cce16

Linus Torvalds authored Sep 29, 2022

Pull networking fixes from Paolo Abeni:
 "Including fixes from wifi and can.

  Current release - regressions:

   - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume()

   - wifi: fix locking in mac80211 mlme

   - eth:
      - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"
      - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe

  Previous releases - regressions:

   - wifi: fix regression with non-QoS drivers

  Previous releases - always broken:

   - mptcp: fix unreleased socket in accept queue

   - wifi:
      - don't start TX with fq->lock to fix deadlock
      - fix memory corruption in minstrel_ht_update_rates()

   - eth:
      - macb: fix ZynqMP SGMII non-wakeup source resume failure
      - mt7531: only do PLL once after the reset
      - usbnet: fix memory leak in usbnet_disconnect()

  Misc:

   - usb: qmi_wwan: add new usb-id for Dell branded EM7455"

* tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
  mptcp: fix unreleased socket in accept queue
  mptcp: factor out __mptcp_close() without socket lock
  net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2}
  net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge
  can: c_can: don't cache TX messages for C_CAN cores
  ice: xsk: drop power of 2 ring size restriction for AF_XDP
  ice: xsk: change batched Tx descriptor cleaning
  net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
  selftests: Fix the if conditions of in test_extra_filter()
  net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()
  net: stmmac: power up/down serdes in stmmac_open/release
  wifi: mac80211: mlme: Fix double unlock on assoc success handling
  wifi: mac80211: mlme: Fix missing unlock on beacon RX
  wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
  wifi: mac80211: fix regression with non-QoS drivers
  wifi: mac80211: ensure vif queues are operational after start
  wifi: mac80211: don't start TX with fq->lock to fix deadlock
  wifi: cfg80211: fix MCS divisor value
  net: hippi: Add missing pci_disable_device() in rr_init_one()
  net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
  ...

511cce16

Merge tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · da9eede6

Linus Torvalds authored Sep 29, 2022

Pull input fixes from Dmitry Torokhov:

 - small fixes for iqs62x-keys and melfas_mip4 drivers

 - corrected register address in snvs_pwrkey driver

 - Synaptic driver will stop trying to use intertouch (native) mode on
   some Lenovo AMD devices

* tag 'input-for-v6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
  Input: synaptics - disable Intertouch for Lenovo T14 and P14s AMD G1
  Input: iqs62x-keys - drop unused device node references
  Input: melfas_mip4 - fix return value check in mip4_probe()

da9eede6

Merge tag 'ata-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 71f18757

Linus Torvalds authored Sep 29, 2022

Pull ATA fixes from Damien Le Moal:
 "Three late patches to fix problems discovered recently:

   - Add a horkage to disable link power management by default for the
     Pioneer BDR-207M and BDR-205 DVD drives (from Niklas)

   - Two patches to fix setting the maximum queue depth of libsas owned
     ATA devices (from me)"

* tag 'ata-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-sata: Fix device queue depth control
  ata: libata-scsi: Fix initialization of device queue depth
  libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205

71f18757

Merge tag 'loongarch-fixes-6.0-3' of... · 81bcd4b5

Linus Torvalds authored Sep 29, 2022

Merge tag 'loongarch-fixes-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
 "Some trivial fixes and cleanup"

* tag 'loongarch-fixes-6.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Clean up loongson3_smp_ops declaration
  LoongArch: Fix and cleanup csr_era handling in do_ri()
  LoongArch: Align the address of kernel_entry to 4KB

81bcd4b5

net: cpmac: Add __init/__exit annotations to module init/exit funcs · 510bbf82

ruanjinjie authored Sep 28, 2022

Add __init/__exit annotations to module init/exit funcs
Signed-off-by: ruanjinjie <ruanjinjie@huawei.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220928031708.89120-1-ruanjinjie@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

510bbf82

ethernet: 8390: remove unnecessary check of mem · 0e9804cf

Yang Yingliang authored Sep 27, 2022

The 'mem' returned by platform_get_resource() has been checked in probe
function, so it is no need do this check in remove function.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220927151406.797800-1-yangyingliang@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

0e9804cf

nfp: Use skb_put_data() instead of skb_put/memcpy pair · d49e265b

Shang XiaoJing authored Sep 27, 2022

Use skb_put_data() instead of skb_put() and memcpy(), which is clear.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Link: https://lore.kernel.org/r/20220927141835.19221-1-shangxiaojing@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

d49e265b

net: liquidio: Remove unused struct lio_trusted_vf_ctx · 01c617d7

Yuan Can authored Sep 27, 2022

After commit 6870957e("liquidio: make soft command calls synchronous"), no
one use struct lio_trusted_vf_ctx, so remove it.
Signed-off-by: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20220927133940.104181-1-yuancan@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

01c617d7

net: ethernet: mtk_eth_soc: use DEFINE_SHOW_ATTRIBUTE to simplify code · 1a0c667e

Liu Shixin authored Sep 27, 2022

Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code.
No functional change.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Link: https://lore.kernel.org/r/20220927111925.2424100-1-liushixin2@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

1a0c667e

wwan_hwsim: Use skb_put_data() instead of skb_put/memcpy pair · 6db239f0

Shang XiaoJing authored Sep 27, 2022

Use skb_put_data() instead of skb_put() and memcpy(), which is clear.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/r/20220927024511.14665-1-shangxiaojing@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

6db239f0

net: ax88796c: Use skb_put_data() instead of skb_put/memcpy pair · 85e69a7d

Shang XiaoJing authored Sep 27, 2022

Use skb_put_data() instead of skb_put() and memcpy(), which is clear.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/r/20220927023043.17769-1-shangxiaojing@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

85e69a7d

ethernet: s2io: Use skb_put_data() instead of skb_put/memcpy pair · 1469327b

Shang XiaoJing authored Sep 27, 2022

Use skb_put_data() instead of skb_put() and memcpy(), which is shorter
and clear. Drop the tmp variable that is not needed any more.
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Link: https://lore.kernel.org/r/20220927022802.16050-1-shangxiaojing@huawei.comSigned-off-by: Paolo Abeni <pabeni@redhat.com>

1469327b

Merge tag 'mlx5-updates-2022-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ceed40d7

Jakub Kicinski authored Sep 28, 2022

Saeed Mahameed says:

====================
mlx5-updates-2022-09-27

This is Part #1 of 4 parts series to align mlx5's implementation of
XSK (AF_XDP) RX-Qs indexing and management with other vendors:

Maxim Says:
===========

xsk: Bug fixes for frame mapping on striding RQ

Striding RQ relies on the driver mapping RX buffers into the NIC's
virtual memory space. Currently, regadless of the XSK frame size, mlx5e
maps them using MTT, and each mapping's length is PAGE_SIZE. As the
result, the stride size used by striding RQ is also equal to PAGE_SIZE.

This decision has the following issues:

1. In the XSK aligned mode with frame size smaller than PAGE_SIZE, it's
suboptimal. Using 2K strides and 2K pages allows to post twice as fewer
WQEs.

2. MTT is not suitable for unaligned frames, as it requires natural
alignment theoretically, in practice at least 8-byte alignment.

3. Using mapping and stride bigger than the frame has risk of writing
over the bounds of the XSK frame upon receiving packets bigger than MTU,
which is possible in some specific configurations.

This series addresses issues 1 and 2 and alleviates issue 3. Where
possible, page and stride size will match the XSK frame size (firmware
upgrade may be needed to have effect for 2K frames). Unaligned mode will
use KSM instead of MTT, which allows to drop the partial workaround [1].

[1]: https://lore.kernel.org/netdev/YufYFQ6JN91lQbso@boxer/T/
====================

Link: https://lore.kernel.org/r/20220927203611.244301-1-saeed@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ceed40d7

net/mlx5e: Use runtime values of striding RQ parameters in datapath · 997ce6af

Maxim Mikityanskiy authored Sep 27, 2022

Some of the parameters of striding RQ are compile-time constants, but
they are going to become dynamically calculated at runtime in a
following commit. This commit prepares the datapath to take cached
runtime parameters, prefilled at queue creation.

New fields added to struct mlx5e_rq fit into an existing 7-byte hole.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

997ce6af

net/mlx5e: Make dma_info array dynamic in struct mlx5e_mpw_info · 258e655c

Maxim Mikityanskiy authored Sep 27, 2022

This commit moves the dma_info array to the end of struct mlx5e_mpw_info
to make it a flexible array. It also removes the intermediate struct
mlx5e_umr_dma_info, which used to contain only this array. The
flexibility of dma_info will allow to choose its size dynamically in a
following commit.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

258e655c

net/mlx5e: Improve the MTU change shortcut · 3904d2af

Maxim Mikityanskiy authored Sep 27, 2022

Normally, the MTU change requires reopening the channels, but it can be
skipped if the new MTU doesn't change any of the queue parameters and if
MTU is not used in the data path.

The shortcut is applicable to the non-linear mode of striding RQ,
because the only thing affected by MTU is the queue length. As ethtool
sets the queue length in packets, but striding RQ length is defined in
strides or bytes, we estimate the RQ length to be at least as big as the
requested number of MTU-sized packets, that's why it depends on MTU.

Improve the shortcut by actually checking whether the RQ length stayed
the same, instead of an intermediate step in the calculation.

As MTU also affects the SHAMPO parameters, skip the shortcut if SHAMPO
is in use.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3904d2af

net/mlx5e: xsk: Fix SKB headroom calculation in validation · 411295fb

Maxim Mikityanskiy authored Sep 27, 2022

In a typical scenario, if an XSK socket is opened first, then an XDP
program is attached, mlx5e_validate_xsk_param will be called twice:
first on XSK bind, second on channel restart caused by enabling XDP. The
validation includes a call to mlx5e_rx_is_linear_skb, which checks the
presence of the XDP program.

The above means that mlx5e_rx_is_linear_skb might return true the first
time, but false the second time, as mlx5e_rx_get_linear_sz_skb's return
value will increase, because of a different headroom used with XDP.

As XSK RQs never exist without XDP, it would make sense to trick
mlx5e_rx_get_linear_sz_skb into thinking XDP is enabled at the first
check as well. This way, if MTU is too big, it would be detected on XSK
bind, without giving false hope to the userspace application.

However, it turns out that this check is too restrictive in the first
place. SKBs created on XDP_PASS on XSK RQs don't have any headroom. That
means that big MTUs filtered out on the first and the second checks
might actually work.

So, address this issue in the proper way, but taking into account the
absence of the SKB headroom on XSK RQs, when calculating the buffer
size.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

411295fb

net/mlx5e: xsk: Remove dead code in validation · 8c654a1b

Maxim Mikityanskiy authored Sep 27, 2022

One of the checks in mlx5e_rx_is_linear_skb verifies that the RX buffer
fits into the XSK frame size. Remove the duplicating check from
mlx5e_validate_xsk_param. It allows to make mlx5e_rx_get_min_frag_sz
static.

Remove mlx5e_rx_is_xdp altogether, as its only usage is located in a
branch where xsk == NULL.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8c654a1b

net/mlx5e: Simplify stride size calculation for linear RQ · ddbef365

Maxim Mikityanskiy authored Sep 27, 2022

Linear RX buffers must be big enough to fit the MTU-sized packet along
with the headroom. On the other hand, they must be small enough to fit
into a page (or into an XSK frame). A straightforward way to check
whether the linear mode is possible would be comparing the required
buffer size to PAGE_SIZE or XSK frame size.

Stride size in the linear mode is defined by the following constraints:

1. A stride is at least as big as the buffer size, and it's a power of
two.

2. If non-XSK XDP is enabled, the stride size is PAGE_SIZE, because
mlx5e requires each packet to be in its own page when XDP is in use. The
previous constraint is automatically fulfilled, because buffer size
can't be bigger than PAGE_SIZE.

3. XSK uses stride size equal to PAGE_SIZE, but the following commits
will allow it to use roundup_pow_of_two(XSK frame size), by allowing the
NIC's MMU to use page sizes not equal to the CPU page size.

This commit puts the above requirements and constraints straight to the
code in an attempt to simplify it and to prepare it for changes made in
the next patches.

For the reference, the old code uses an equivalent, but trickier
calculation (high-level simplified pseudocode):

    if XDP or XSK:
        mlx5e_rx_get_linear_frag_sz := max(buffer size, PAGE_SIZE)
    else:
        mlx5e_rx_get_linear_frag_sz := buffer size
    mlx5e_rx_is_linear_skb := mlx5e_rx_get_linear_frag_sz <= PAGE_SIZE
    stride size := roundup_pow_of_two(mlx5e_rx_get_linear_frag_sz)

The new code effectively removes mlx5e_rx_get_linear_frag_sz that used
to return either buffer size or stride size, depending on the situation,
making it hard to work with and to make changes:

    if XDP or XSK:
        mlx5e_rx_get_linear_stride_sz := PAGE_SIZE
    else
        mlx5e_rx_get_linear_stride_sz := roundup_pow_of_two(buffer size)
    mlx5e_rx_is_linear_skb := buffer size <= (PAGE_SIZE or XSK frame sz)
    stride size := mlx5e_rx_get_linear_stride_sz
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ddbef365

net/mlx5e: kTLS, Check ICOSQ WQE size in advance · 4c78782e

Maxim Mikityanskiy authored Sep 27, 2022

Instead of WARNing in runtime when TLS offload WQEs posted to ICOSQ are
over the hardware limit, check their size before enabling TLS RX
offload, and block the offload if the condition fails. It also allows to
drop a u16 field from struct mlx5e_icosq.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

4c78782e

net/mlx5e: Use the aligned max TX MPWQE size · 21a0502d

Maxim Mikityanskiy authored Sep 27, 2022

TX MPWQE size is limited to the cacheline-aligned maximum. Use the same
value for the stop room and the capability check.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

21a0502d

net/mlx5e: Fix a typo in mlx5e_xdp_mpwqe_is_full · e3c4c496

Maxim Mikityanskiy authored Sep 27, 2022

Fix a typo in the function name: mpqwe -> mpwqe (stands for multi-packet
work queue element).
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e3c4c496

net/mlx5e: Use mlx5e_stop_room_for_max_wqe where appropriate · 527918e9

Maxim Mikityanskiy authored Sep 27, 2022

mlx5e_alloc_xdpsq calculates sq->stop_room internally, but there is
already a function for that: mlx5e_stop_room_for_max_wqe. This commit
makes use of this function.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

527918e9

net/mlx5e: Let mlx5e_get_sw_max_sq_mpw_wqebbs accept mdev · ed5c92ff

Maxim Mikityanskiy authored Sep 27, 2022

To shorten and simplify code, let mlx5e_get_sw_max_sq_mpw_wqebbs accept
mdev and derive max SQ WQEBBs from it. Also rename the function to a
more generic name mlx5e_get_max_sq_aligned_wqebbs, because the following
patches will use it in non-MPWQE contexts.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ed5c92ff

net/mlx5e: Validate striding RQ before enabling XDP · 44f4fd03

Maxim Mikityanskiy authored Sep 27, 2022

Currently, the driver can silently fall back to legacy RQ after enabling
XDP, even if striding RQ was active before. It happens when PAGE_SIZE is
bigger than the maximum supported stride size. This commit changes this
behavior to more straightforward: if an operation (enabling XDP) doesn't
support the current parameters (striding RQ mode), it fails.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

44f4fd03

net/mlx5e: Make mlx5e_verify_rx_mpwqe_strides static · 7e49abb1

Maxim Mikityanskiy authored Sep 27, 2022

mlx5e_verify_rx_mpwqe_strides is only used in en/params.c, so it can be
made static and removed from en/params.h.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7e49abb1

net/mlx5e: Remove unused fields from datapath structs · 665f29de

Maxim Mikityanskiy authored Sep 27, 2022

No need to keep max_sq_wqebbs in mlx5e_txqsq and mlx5e_xdpsq, as it's
only used when allocating the queues. Removing an extra field reduces
the struct size.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

665f29de

net/mlx5e: Convert mlx5e_get_max_sq_wqebbs to u8 · f060ccc2

Maxim Mikityanskiy authored Sep 27, 2022

The return value of mlx5e_get_max_sq_wqebbs is clamped down to
MLX5_SEND_WQE_MAX_WQEBBS = 16, which fits into u8. This commit changes
the return type of this function to u8 for stricter type safety.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f060ccc2

net/mlx5: Add the log_min_mkey_entity_size capability · 40b72108

Maxim Mikityanskiy authored Sep 27, 2022

Add the capability that will allow the driver to determine the minimal
MTT page size to be able to map the smallest possible pages in XSK. The
older firmwares that don't have this capability default to 12 (i.e.
4096-byte pages).
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

40b72108

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux · 0d5bfebf

Jakub Kicinski authored Sep 28, 2022

Saeed Mahameed says:

====================
updates from mlx5-next 2022-09-24

Updates form mlx5-next including[1]:

1) HW definitions and support for NPPS clock settings.

2) various cleanups

3) Enable hash mode by default for all NICs

4) page tracker and advanced virtualization HW definitions for vfio

[1] https://lore.kernel.org/netdev/20220907233636.388475-1-saeed@kernel.org/

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Remove from FPGA IFC file not-needed definitions
  net/mlx5: Remove unused structs
  net/mlx5: Remove unused functions
  net/mlx5: detect and enable bypass port select flow table
  net/mlx5: Lag, enable hash mode by default for all NICs
  net/mlx5: Lag, set active ports if support bypass port select flow table
  RDMA/mlx5: Don't set tx affinity when lag is in hash mode
  net/mlx5: add IFC bits for bypassing port select flow table
  net/mlx5: Add support for NPPS with real time mode
  net/mlx5: Expose NPPS related registers
  net/mlx5: Query ADV_VIRTUALIZATION capabilities
  net/mlx5: Introduce ifc bits for page tracker
  RDMA/mlx5: Move function mlx5_core_query_ib_ppcnt() to mlx5_ib
====================

Link: https://lore.kernel.org/all/20220927201906.234015-1-saeed@kernel.org/Signed-off-by: Jakub Kicinski <kuba@kernel.org>

0d5bfebf