Commits · 99c861b44eb1fb9dfe8776854116a6a9064c19bb · Kirill Smelkov / linux

15 Jul, 2024 16 commits

virtio_net: xsk: rx: support recv merge mode · 99c861b4

Xuan Zhuo authored Jul 08, 2024

Support AF_XDP for merge mode.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-11-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: xsk: rx: support recv small mode · a4e7ba70

Xuan Zhuo authored Jul 08, 2024

In the process:
1. We may need to copy data to create skb for XDP_PASS.
2. We may need to call xsk_buff_free() to release the buffer.
3. The handling of xdp_buff is different from that of the buffer.

If we pushed this logic into the existing receive handlers (merge and small),
we would have to maintain code scattered across merge and small (and big).
So I think it is a good choice for us to put the xsk code into an
independent function.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-10-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: xsk: rx: support fill with xsk buffer · e9f39624

Xuan Zhuo authored Jul 08, 2024

Implement the logic of filling rq with XSK buffers.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-9-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: xsk: support wakeup · 19a5a771

Xuan Zhuo authored Jul 08, 2024

xsk wakeup is used by the xsk framework or the user to trigger the xsk
xmit logic.

Virtio-net cannot actively generate an interrupt, so it tries to trigger
tx NAPI on the local cpu.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-8-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: xsk: bind/unbind xsk for rx · 09d2b318

Xuan Zhuo authored Jul 08, 2024

This patch implements the logic of binding/unbinding an xsk pool to an rq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-7-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: separate receive_mergeable · 5db48105

Xuan Zhuo authored Jul 08, 2024

This commit splits the function receive_mergeable(), moving the logic
that appends frags to the skb into an independent function.
A subsequent commit will reuse it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-6-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: separate receive_buf · c86c120f

Xuan Zhuo authored Jul 08, 2024

This commit splits the function receive_buf(), wrapping the logic
of handling the skb in an independent function, virtnet_receive_done().
A subsequent commit will reuse it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-5-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: separate virtnet_tx_resize() · 391aa2aa

Xuan Zhuo authored Jul 08, 2024

This patch separates two sub-functions from virtnet_tx_resize():

* virtnet_tx_pause
* virtnet_tx_resume

Then the subsequent virtnet_tx_reset() can share these two functions.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-4-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: separate virtnet_rx_resize() · 47879b73

Xuan Zhuo authored Jul 08, 2024

This patch separates two sub-functions from virtnet_rx_resize():

* virtnet_rx_pause
* virtnet_rx_resume

Then the subsequent reset rx for xsk can share these two functions.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-3-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


virtio_net: replace VIRTIO_XDP_HEADROOM by XDP_PACKET_HEADROOM · 41d4a174

Xuan Zhuo authored Jul 08, 2024

virtio net has VIRTIO_XDP_HEADROOM that is equal to
XDP_PACKET_HEADROOM to calculate the headroom for xdp.

But here we should use the macro XDP_PACKET_HEADROOM from bpf.h to
calculate the headroom for xdp. So here we remove the
VIRTIO_XDP_HEADROOM, and use the XDP_PACKET_HEADROOM to replace it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20240708112537.96291-2-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Merge branch 'eliminate-config_nr_cpus-dependency-in-dpaa-eth-and-enable-compile_test-in-fsl_qbman' · e6c29506

Jakub Kicinski authored Jul 14, 2024

Vladimir Oltean says:

====================
Eliminate CONFIG_NR_CPUS dependency in dpaa-eth and enable COMPILE_TEST in fsl_qbman

Breno's previous attempt at enabling COMPILE_TEST for the fsl_qbman
driver (now included here as patch 5/5) triggered compilation warnings
for large CONFIG_NR_CPUS values:
https://lore.kernel.org/all/202406261920.l5pzM1rj-lkp@intel.com/

Patch 1/5 switches two NR_CPUS arrays in the dpaa-eth driver to dynamic
allocation to avoid that warning. There is more NR_CPUS usage in the
fsl-qbman driver, but that looks relatively harmless and I couldn't find
a good reason to change it.

I noticed, while testing, that the driver doesn't actually work properly
with high CONFIG_NR_CPUS values, and patch 2/5 addresses that.

During code analysis, I have identified two places which treat
conditions that can never happen. Patches 3/5 and 4/5 simplify the
probing code - dpaa_fq_setup() - just a little bit.

Finally we have at 5/5 the patch that triggered all of this. There is
an okay from Herbert to take it via netdev, despite it being on soc/qbman:
https://lore.kernel.org/all/Zns%2FeVVBc7pdv0yM@gondor.apana.org.au/

Link to v1:
https://lore.kernel.org/netdev/20240710230025.46487-1-vladimir.oltean@nxp.com/
====================

Link: https://patch.msgid.link/20240713225336.1746343-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


soc: fsl: qbman: FSL_DPAA depends on COMPILE_TEST · 782fe08e

Breno Leitao authored Jul 14, 2024

Like most of the drivers that depend on ARCH_LAYERSCAPE, make FSL_DPAA
depend on COMPILE_TEST for compilation and testing.

	# grep -r depends.\*ARCH_LAYERSCAPE.\*COMPILE_TEST | wc -l
	29
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20240713225336.1746343-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: dpaa: no need to make sure all CPUs receive a corresponding Tx queue · 6d233820

Vladimir Oltean authored Jul 14, 2024

dpaa_fq_setup() iterates through the &priv->dpaa_fq_list elements
allocated by dpaa_alloc_all_fqs(). This includes a call to:

	if (!dpaa_fq_alloc(dev, 0, dpaa_max_num_txqs(), list, FQ_TYPE_TX))
		goto fq_alloc_failed;

which gives us dpaa_max_num_txqs() elements of FQ_TYPE_TX type.

The code block which we are deleting runs after an earlier iteration
through &priv->dpaa_fq_list. So at the end of this iteration (for which
there is no early break), egress_cnt will be unconditionally equal to
dpaa_max_num_txqs().

In other words, dpaa_alloc_all_fqs() has already allocated TX queues for
all possible CPUs and the maximal number of traffic classes, and we've
already iterated once through them all.

The while() condition is dead code, remove it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20240713225336.1746343-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
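
The dead-code argument in the commit message above can be illustrated with a
small user-space sketch. It is not the driver code: the enum, the array-based
"list" and both helper names are hypothetical stand-ins. The only point it
makes is that once a full pass over the list has counted every TX entry, a
follow-up loop guarded by "egress_cnt < max" never executes.

```c
#include <stddef.h>

/* Hypothetical stand-in for the frame queue types kept in the list. */
enum sketch_fq_type { SKETCH_FQ_RX, SKETCH_FQ_TX };

/* Count TX entries in one full pass over the list (no early break). */
static unsigned int sketch_count_tx(const enum sketch_fq_type *list, size_t len)
{
	unsigned int egress_cnt = 0;
	size_t i;

	for (i = 0; i < len; i++)
		if (list[i] == SKETCH_FQ_TX)
			egress_cnt++;

	return egress_cnt;
}

/*
 * Model of a trailing "fill up the remaining slots" loop: return how
 * many times its body would run for a given count and maximum.
 */
static unsigned int sketch_padding_iterations(unsigned int egress_cnt,
					      unsigned int max_txqs)
{
	unsigned int iterations = 0;

	while (egress_cnt < max_txqs) {
		egress_cnt++;
		iterations++;
	}

	return iterations;
}
```

If the list was populated with exactly max_txqs TX entries, the count equals
the maximum and the padding loop runs zero times, which is why such a loop
can be removed.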


net: dpaa: stop ignoring TX queues past the number of CPUs · e3672a6d

Vladimir Oltean authored Jul 14, 2024

dpaa_fq_setup() iterates through the queues allocated by dpaa_alloc_all_fqs()
and saved in &priv->dpaa_fq_list.

The allocation for FQ_TYPE_TX looks as follows:

	if (!dpaa_fq_alloc(dev, 0, dpaa_max_num_txqs(), list, FQ_TYPE_TX))
		goto fq_alloc_failed;

Thus, iterating again through FQ_TYPE_TX queues in dpaa_fq_setup() and
counting them will never yield an egress_cnt larger than the allocated
size, dpaa_max_num_txqs().

The comparison serves no purpose since it is always true; remove it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20240713225336.1746343-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: dpaa: eliminate NR_CPUS dependency in egress_fqs[] and conf_fqs[] · e7072750

Vladimir Oltean authored Jul 14, 2024

The driver uses the DPAA_TC_TXQ_NUM and DPAA_ETH_TXQ_NUM macros for TX
queue handling, and they depend on CONFIG_NR_CPUS.

In generic .config files, these can go to very large (8096 CPUs) values
for the systems that DPAA1 is integrated in (1-24 CPUs). We allocate a
lot of resources that will never be used. Those are:
- system memory
- QMan FQIDs as managed by qman_alloc_fqid_range(). This is especially
  painful since currently, when booting with CONFIG_NR_CPUS=8096, a
  LS1046A-RDB system will only manage to probe 3 of its 6 interfaces.
  The rest will run out of FQD ("/reserved-memory/qman-fqd" in the
  device tree) and fail at the qman_create_fq() stage of the probing
  process.
- netdev queues as alloc_etherdev_mq() argument. The high queue indices
  are simply hidden from the network stack after the call to
  netif_set_real_num_tx_queues().

With just a tiny bit more effort, we can replace the NR_CPUS
compile-time constant with the num_possible_cpus() run-time constant,
and dynamically allocate the egress_fqs[] and conf_fqs[] arrays.
Even on a system with a high CONFIG_NR_CPUS, num_possible_cpus() will
remain equal to the number of available cores on the SoC.

The replacement is as follows:
- DPAA_TC_TXQ_NUM -> dpaa_num_txqs_per_tc()
- DPAA_ETH_TXQ_NUM -> dpaa_max_num_txqs()
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20240713225336.1746343-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
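
The macro-to-function replacement described above can be sketched in user
space as follows. This is an illustration under assumptions, not the driver
code: struct sketch_priv, the sketch_* helpers and the value of SKETCH_TC_NUM
are hypothetical, and the CPU count is passed in explicitly where the kernel
would call num_possible_cpus().

```c
#include <stdlib.h>

#define SKETCH_TC_NUM 4u	/* assumed number of traffic classes */

struct sketch_priv {
	unsigned int num_cpus;	/* stand-in for num_possible_cpus() */
	void **egress_fqs;	/* one slot per TX queue, sized at run time */
	void **conf_fqs;
};

/* Run-time replacement for a per-traffic-class queue count macro. */
static unsigned int sketch_num_txqs_per_tc(const struct sketch_priv *p)
{
	return p->num_cpus;
}

/* Run-time replacement for a total TX queue count macro. */
static unsigned int sketch_max_num_txqs(const struct sketch_priv *p)
{
	return sketch_num_txqs_per_tc(p) * SKETCH_TC_NUM;
}

/* Allocate both arrays dynamically; returns 0 on success, -1 on failure. */
static int sketch_alloc_fq_arrays(struct sketch_priv *p)
{
	unsigned int n = sketch_max_num_txqs(p);

	p->egress_fqs = calloc(n, sizeof(*p->egress_fqs));
	p->conf_fqs = calloc(n, sizeof(*p->conf_fqs));
	if (!p->egress_fqs || !p->conf_fqs) {
		free(p->egress_fqs);
		free(p->conf_fqs);
		p->egress_fqs = NULL;
		p->conf_fqs = NULL;
		return -1;
	}

	return 0;
}
```

With 4 CPUs this sizes each array for 16 queues, whereas a compile-time bound
of several thousand CPUs would reserve space for tens of thousands of queues
that a 1-24 CPU SoC can never use.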


net: dpaa: avoid on-stack arrays of NR_CPUS elements · 555a05d8

Vladimir Oltean authored Jul 14, 2024

The dpaa-eth driver is written for PowerPC and Arm SoCs which have 1-24
CPUs. It depends on CONFIG_NR_CPUS having a reasonably small value in
Kconfig. Otherwise, there are 2 functions which allocate on-stack arrays
of NR_CPUS elements, and these can quickly explode in size, leading to
warnings such as:

  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:3280:12: warning:
  stack frame size (16664) exceeds limit (2048) in 'dpaa_eth_probe' [-Wframe-larger-than]

The problem is twofold:
- Reducing the array size to the boot-time num_possible_cpus() (rather
  than the compile-time NR_CPUS) creates a variable-length array,
  which should be avoided in the Linux kernel.
- Using NR_CPUS as an array size makes the driver blow up in stack
  consumption with generic, as opposed to hand-crafted, .config files.

A simple solution is to use dynamic allocation for num_possible_cpus()
elements (aka a small number determined at runtime).

Link: https://lore.kernel.org/all/202406261920.l5pzM1rj-lkp@intel.com/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Link: https://patch.msgid.link/20240713225336.1746343-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
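
A minimal user-space sketch of the approach this commit describes, i.e.
replacing an on-stack array of NR_CPUS elements with a heap allocation sized
at run time. The function names are hypothetical, and sysconf() is used here
only as a user-space stand-in for the kernel's num_possible_cpus(); kernel
code would use the kernel allocators instead of calloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* User-space stand-in for num_possible_cpus(); always returns >= 1. */
static long sketch_possible_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return n > 0 ? n : 1;
}

/*
 * Allocate one zero-initialised slot per possible CPU on the heap,
 * instead of declaring "uint16_t channels[NR_CPUS]" on the stack.
 * Stores the element count in *count. Returns NULL on allocation
 * failure; the caller owns the array and must free() it.
 */
static uint16_t *sketch_alloc_per_cpu_channels(long *count)
{
	long n = sketch_possible_cpus();
	uint16_t *channels = calloc((size_t)n, sizeof(*channels));

	if (!channels)
		return NULL;

	*count = n;
	return channels;
}
```

The stack frame of the caller then holds only a pointer and a count, no
matter how large CONFIG_NR_CPUS is, and no variable-length array is needed.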


14 Jul, 2024 5 commits

Merge tag 'ipsec-next-2024-07-13' of... · 62fdd170

Jakub Kicinski authored Jul 14, 2024

Merge tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2024-07-13

1) Support sending NAT keepalives in ESP in UDP states.
   Userspace IKE daemon had to do this before, but the
   kernel can better keep track of it.
   From Eyal Birger.

2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
   ESP data paths. Currently, IPsec crypto offload is enabled for GRO
   code path only. This patchset supports UDP encapsulation for the non
   GRO path. From Mike Yu.

* tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet
  xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet
  xfrm: Allow UDP encapsulation in crypto offload control path
  xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path
  xfrm: support sending NAT keepalives in ESP in UDP states
====================

Link: https://patch.msgid.link/20240713102416.3272997-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Merge branch 'introduce-en7581-ethernet-support' · ecb1e1dc

Jakub Kicinski authored Jul 14, 2024

Lorenzo Bianconi says:

====================
Introduce EN7581 ethernet support

Add airoha_eth driver in order to introduce ethernet support for
Airoha EN7581 SoC available on EN7581 development board.
EN7581 mac controller is mainly composed of Frame Engine (FE) and
QoS-DMA (QDMA) modules. FE is used for traffic offloading (just basic
functionalities are supported now) while QDMA is used for DMA operation
and QoS functionalities between mac layer and the dsa switch (hw QoS is
not available yet and it will be added in the future).
Currently only hw lan features are available, hw wan will be added with
subsequent patches.
====================

Link: https://patch.msgid.link/cover.1720818878.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: airoha: Introduce ethernet support for EN7581 SoC · 23020f04

Lorenzo Bianconi authored Jul 12, 2024

Add airoha_eth driver in order to introduce ethernet support for
Airoha EN7581 SoC available on EN7581 development board (en7581-evb).
EN7581 mac controller is mainly composed of the Frame Engine (PSE+PPE)
and QoS-DMA (QDMA) modules. FE is used for traffic offloading (just
basic functionalities are currently supported) while QDMA is used for
DMA operations and QoS functionalities between the mac layer and the
external modules connected to the FE GDM ports (e.g. MT7530 DSA switch
or external phys).
A general overview of airoha_eth architecture is reported below:

               ┌───────┐                                     ┌───────┐
               │ QDMA2 │                                     │ QDMA1 │
               └───┬───┘                                     └───┬───┘
                   │                                             │
           ┌───────▼─────────────────────────────────────────────▼────────┐
           │                                                              │
           │       P5                                            P0       │
           │                                                              │
           │                                                              │
           │                                                              │    ┌──────┐
           │                                                           P3 ├────► GDM3 │
           │                                                              │    └──────┘
           │                                                              │
           │                                                              │
┌─────┐    │                                                              │
│ PPE ◄────┤ P4                          PSE                              │
└─────┘    │                                                              │
           │                                                              │
           │                                                              │
           │                                                              │    ┌──────┐
           │                                                           P9 ├────► GDM4 │
           │                                                              │    └──────┘
           │                                                              │
           │                                                              │
           │                                                              │
           │        P2                                           P1       │
           └─────────┬───────────────────────────────────────────┬────────┘
                     │                                           │
                 ┌───▼──┐                                     ┌──▼───┐
                 │ GDM2 │                                     │ GDM1 │
                 └──────┘                                     └──┬───┘
                                                                 │
                                                            ┌────▼─────┐
                                                            │  MT7530  │
                                                            └──────────┘

Currently only hw LAN features (QDMA1+GDM1) are available while hw WAN
(QDMA2+GDM{2,3,4}) ones will be added with subsequent patches introducing
traffic offloading support.
Tested-by: Benjamin Larsson <benjamin.larsson@genexis.eu>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/274945d2391c195098ab180a46d0617b18b9e42c.1720818878.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


dt-bindings: net: airoha: Add EN7581 ethernet controller · 6bc8719c

Lorenzo Bianconi authored Jul 12, 2024

Introduce device-tree binding documentation for Airoha EN7581 ethernet
mac controller.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/7dfecf8aa4e6519562a94455b95c49e1b3c858a0.1720818878.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 861f34e6

Jakub Kicinski authored Jul 14, 2024

Tony Nguyen says:

====================
ice: Switch API optimizations

Marcin Szycik says:

Optimize the process of creating a recipe in the switch block by removing
duplicate switch ID words and changing how result indexes are fitted into
recipes. In many cases this can decrease the number of recipes required to
add a certain set of rules, potentially allowing a more varied set of rules
to be created. Total rule count will also increase, since fewer words will
be left unused/wasted. There are only 64 rules available in total, so every
one counts.

After this modification, many fields and some structs became unused or were
simplified, resulting in overall simpler implementation.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Add tracepoint for adding and removing switch rules
  ice: Remove unused members from switch API
  ice: Optimize switch recipe creation
  ice: remove unused recipe bookkeeping data
  ice: Simplify bitmap setting in adding recipe
  ice: Remove reading all recipes before adding a new one
  ice: Remove unused struct ice_prot_lkup_ext members
====================

Link: https://patch.msgid.link/20240711181312.2019606-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


13 Jul, 2024 19 commits

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 852e42cc

Jakub Kicinski authored Jul 13, 2024

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-07-11 (net/intel)

This series contains updates to most Intel network drivers.

Tony removes MODULE_AUTHOR from drivers containing the entry.

Simon Horman corrects a kdoc entry for i40e.

Pawel adds implementation for devlink param "local_forwarding" on ice.

Michal removes unneeded call, and code, for eswitch rebuild for ice.

Sasha removed a no longer used field from igc.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Remove the internal 'eee_advert' field
  ice: remove eswitch rebuild
  ice: Add support for devlink local_forwarding param
  i40e: correct i40e_addr_to_hkey() name in kdoc
  net: intel: Remove MODULE_AUTHORs
====================

Link: https://patch.msgid.link/20240711201932.2019925-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


sfc: falcon: Make I2C terminology more inclusive · ba88b478

Easwar Hariharan authored Jul 11, 2024

I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave"
with more appropriate terms. Inspired by Wolfram's series to fix drivers/i2c/,
fix the terminology for users of I2C_ALGOBIT bitbanging interface, now that
the approved verbiage exists in the specification.
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://patch.msgid.link/20240711052734.1273652-5-eahariha@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: phy: dp83td510: add cable testing support · 0cda1acf

Oleksij Rempel authored Jul 12, 2024

This patch implements the TDR test procedure as described in
"Application Note DP83TD510E Cable Diagnostics Toolkit revC", section 3.2.

The procedure was tested with "draka 08 signalkabel 2x0.8mm". The reported
cable length was 5 meters more for each 20 meters of actual cable length.
For instance, a 20-meter cable showed as 25 meters, and a 40-meter cable
showed as 50 meters. Since other parts of the diagnostics provided by this
PHY (e.g., Active Link Cable Diagnostics) require accurate cable
characterization to provide proper results, this tuning can be implemented
in a separate patch/interface.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
changes v2:
- add comments
- change post silence time to 1000ms
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240712152848.2479912-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
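
The two measurements quoted in this commit message (20 m reported as 25 m,
40 m reported as 50 m) are both consistent with reported = actual * 5 / 4.
The sketch below only works through that arithmetic. It is a hypothetical
helper, not part of the patch, which explicitly leaves any such tuning to a
separate patch/interface; the 4/5 factor is an assumption that holds only for
the one cable type mentioned.

```c
/*
 * Hypothetical correction for the cable tested in the commit message:
 * reported = actual * 5 / 4, hence actual = reported * 4 / 5.
 * Lengths are in centimetres to stay in integer arithmetic; inputs are
 * assumed small enough that reported_cm * 4 does not overflow.
 */
static unsigned int sketch_corrected_length_cm(unsigned int reported_cm)
{
	return reported_cm * 4u / 5u;
}
```

For example, a reported 2500 cm maps back to 2000 cm and a reported 5000 cm
maps back to 4000 cm, matching the two data points in the message.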


net: dpaa: Fix compilation Warning · e7cdef62

Breno Leitao authored Jul 12, 2024

Remove variables that are defined and incremented but never read.
This issue appeared in network tests[1] as:

	drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c:38:6: warning: variable 'i' set but not used [-Wunused-but-set-variable]
	38 |         int i = 0;
	   |             ^

Link: https://netdev.bots.linux.dev/static/nipa/870263/13729811/build_clang/stderr [1]
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20240712134817.913756-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


eth: mlx5: expose NETIF_F_NTUPLE when ARFS is compiled out · 3771266b

Jakub Kicinski authored Jul 11, 2024

ARFS depends on NTUPLE filters, but the inverse is not true.
Drivers which don't support ARFS commonly still support NTUPLE
filtering. mlx5 has a Kconfig option to disable ARFS (MLX5_EN_ARFS)
and does not advertise NTUPLE filters as a feature at all when ARFS
is compiled out. That's not correct: ntuple filters still work
just fine (as long as MLX5_EN_RXNFC is enabled).

This is needed to make the RSS test not skip all RSS context
related testing.
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240711223722.297676-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


selftests: mptcp: lib: fix shellcheck errors · 464b99e7

Matthieu Baerts (NGI0) authored Jul 12, 2024

It looks like we missed these two errors recently:

  - SC2068: Double quote array expansions to avoid re-splitting elements.
  - SC2145: Argument mixes string and array. Use * or separate argument.

These are two simple fixes that are not supposed to change the behaviour,
as the variable names should not contain any spaces. Still, it is better
to fix them so that new issues are easy to spot.

Fixes: f265d311 ("selftests: mptcp: lib: use setup/cleanup_ns helpers")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240712-upstream-net-next-20240712-selftests-mptcp-fix-shellcheck-v1-1-1cb7180db40a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Merge branch 'mlx5-misc-2023-07-08-sf-max-eq' · 22767eec

Jakub Kicinski authored Jul 13, 2024

Saeed Mahameed says:

====================
mlx5 misc 2023-07-08 (sf max eq)

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20240708080025.1593555-2-tariqt@nvidia.com/
====================

Link: https://patch.msgid.link/20240712003310.355106-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net/mlx5: Use set number of max EQs · 4b66be76

Daniel Jurgens authored Jul 11, 2024

If a maximum number of EQs has been set for an SF, use that amount.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240712003310.355106-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net/mlx5: Set default max eqs for SFs · 20d80b95

Daniel Jurgens authored Jul 11, 2024

If the user hasn't configured max_io_eqs, set a low default. The SF
driver shouldn't try to create more than this, but FW will enforce this
limit.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240712003310.355106-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net/mlx5: Set sf_eq_usage for SF max EQs · 2ece6c72

Daniel Jurgens authored Jul 11, 2024

When setting max_io_eqs for an SF function, also set the sf_eq_usage_cap.
This indicates to the SF driver from the PF that the user has set
the max io eqs via devlink, so that the SF driver can later query the
proper max eq value from the new cap.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240712003310.355106-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net/mlx5: IFC updates for SF max IO EQs · 63c6e08e

Daniel Jurgens authored Jul 11, 2024

Expose a new cap sf_eq_usage. The vhca_resource_manager can write this
cap, indicating the SF driver should use max_num_eqs_24b to determine
how many EQs to use.

This will be used in the next patch to indicate to the SF driver from the
PF that the user has set the max io eqs via devlink, so that the SF driver
can later query the proper max eq value from the new cap.

devlink port function set pci/0000:08:00.0/32768 max_io_eqs 32
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20240712003310.355106-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: mvpp2: Improve data types and use min() · f7023b3d

Thorsten Blum authored Jul 11, 2024

Change the data type of the variable freq in mvpp2_rx_time_coal_set()
and mvpp2_tx_time_coal_set() to u32 because port->priv->tclk also has
the data type u32.

Change the data type of the function parameter clk_hz in
mvpp2_usec_to_cycles() and mvpp2_cycles_to_usec() to u32 accordingly
and remove the following Coccinelle/coccicheck warning reported by
do_div.cocci:

  WARNING: do_div() does a 64-by-32 division, please consider using div64_ul instead

Use min() to simplify the code and improve its readability.

Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240711154741.174745-1-thorsten.blum@toblux.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
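
A user-space sketch of the kind of conversion helpers this commit touches:
the clock rate is taken as a 32-bit value, the multiplication is done in 64
bits, and the result is clamped with a min() helper. The function names,
SKETCH_USEC_PER_SEC and sketch_min_u64() are hypothetical stand-ins (the
kernel has its own USEC_PER_SEC, min() and division helpers), so this is an
illustration of the technique rather than the driver's code.

```c
#include <stdint.h>

#define SKETCH_USEC_PER_SEC 1000000ull

/* Stand-in for the kernel's min() on two 64-bit values. */
static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Convert microseconds to clock cycles, saturating at UINT32_MAX. */
static uint32_t sketch_usec_to_cycles(uint32_t usec, uint32_t clk_hz)
{
	/* The product of two 32-bit values always fits in 64 bits. */
	uint64_t tmp = (uint64_t)clk_hz * usec / SKETCH_USEC_PER_SEC;

	return (uint32_t)sketch_min_u64(tmp, UINT32_MAX);
}

/* Convert clock cycles to microseconds; clk_hz must be non-zero. */
static uint32_t sketch_cycles_to_usec(uint32_t cycles, uint32_t clk_hz)
{
	uint64_t tmp = (uint64_t)cycles * SKETCH_USEC_PER_SEC / clk_hz;

	return (uint32_t)sketch_min_u64(tmp, UINT32_MAX);
}
```

With a 250 MHz clock, 100 microseconds correspond to 25000 cycles, and very
large inputs saturate at UINT32_MAX instead of wrapping around.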


net: ethtool: Monotonically increase the message sequence number · 275a63c9

Danielle Ratson authored Jul 11, 2024

Currently, during the module firmware flashing process, unicast
notifications are sent from the kernel using the same sequence number,
making it impossible for user space to track missed notifications.

Monotonically increase the message sequence number, so the order of
notifications could be tracked effectively.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240711080934.2071869-1-danieller@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
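
Why a monotonically increasing sequence number lets user space notice missed
notifications can be shown with a small sketch. Both structs and all function
names are hypothetical and unrelated to the actual netlink code; the sketch
also assumes notifications arrive in order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sender side: hands out a new sequence number for every notification. */
struct sketch_notifier {
	uint32_t seq;
};

static uint32_t sketch_next_seq(struct sketch_notifier *n)
{
	return ++n->seq;
}

/* Receiver side: remembers the last number seen and counts gaps. */
struct sketch_listener {
	uint32_t last_seq;
	uint32_t missed;
	bool started;
};

static void sketch_on_notification(struct sketch_listener *l, uint32_t seq)
{
	/* Unsigned arithmetic keeps the gap correct across wrap-around. */
	if (l->started)
		l->missed += seq - l->last_seq - 1;

	l->last_seq = seq;
	l->started = true;
}
```

If every notification carried the same number, the gap computed by the
listener would carry no information; with distinct increasing numbers, a
jump from 2 to 4 shows that exactly one notification was lost.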


Merge branch 'tcp-make-simultaneous-connect-rfc-compliant' · 646d2ac7

Jakub Kicinski authored Jul 13, 2024

Kuniyuki Iwashima says:

====================
tcp: Make simultaneous connect() RFC-compliant.

Patch 1 fixes an issue where the BPF TCP option parser is triggered for ACK
instead of SYN+ACK in the case of simultaneous connect().

Patch 2 removes a wrong assumption in the tcp_ao/self-connect tests.

v2: https://lore.kernel.org/netdev/20240708180852.92919-1-kuniyu@amazon.com/
v1: https://lore.kernel.org/netdev/20240704035703.95065-1-kuniyu@amazon.com/
====================

Link: https://patch.msgid.link/20240710171246.87533-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests. · b3bb4d23

Kuniyuki Iwashima authored Jul 10, 2024

tcp_ao/self-connect.c checked the following SNMP stats before/after
connect() to confirm that the test exercises the simultaneous connect()
path.

  * TCPChallengeACK
  * TCPSYNChallenge

But the stats should not be counted for self-connect in the first place,
and the assumption is no longer true.

Let's remove the check.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20240710171246.87533-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


tcp: Don't drop SYN+ACK for simultaneous connect(). · 23e89e8e

Kuniyuki Iwashima authored Jul 10, 2024

RFC 9293 states that in the case of simultaneous connect(), the connection
gets established when SYN+ACK is received. [0]

      TCP Peer A                                       TCP Peer B

  1.  CLOSED                                           CLOSED
  2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
  3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
  4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
  5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
  6.  ESTABLISHED  <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
  7.               ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED
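
The exchange above can be sketched as a toy state machine. This is only an
illustration of the RFC 9293 transitions, not kernel code: the names State,
on_segment() and simultaneous_open() are hypothetical, and sequence/ACK number
validation is deliberately left out.

```python
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    SYN_SENT = "SYN-SENT"
    SYN_RECEIVED = "SYN-RECEIVED"
    ESTABLISHED = "ESTABLISHED"


def on_segment(state, syn, ack):
    """Return the next state after receiving a segment with the given flags.

    Simplified model: sequence and ACK numbers are assumed to be acceptable.
    """
    if state is State.SYN_SENT and syn and not ack:
        # Crossing SYN: the peer called connect() at the same time
        # (steps 3 and 4 in the diagram).
        return State.SYN_RECEIVED
    if state is State.SYN_SENT and syn and ack:
        # Ordinary three-way handshake seen from the active side.
        return State.ESTABLISHED
    if state is State.SYN_RECEIVED and ack:
        # A plain ACK or the peer's SYN+ACK completes the handshake
        # (steps 6 and 7 in the diagram).
        return State.ESTABLISHED
    return state


def simultaneous_open():
    """Replay the diagram: both peers start in SYN-SENT after connect()."""
    a = b = State.SYN_SENT
    a = on_segment(a, syn=True, ack=False)  # A receives B's SYN
    b = on_segment(b, syn=True, ack=False)  # B receives A's SYN
    a = on_segment(a, syn=True, ack=True)   # A receives B's SYN+ACK
    b = on_segment(b, syn=True, ack=True)   # B receives A's SYN+ACK
    return a, b
```

In this model both peers reach ESTABLISHED on the SYN+ACK alone, which is the
behaviour the RFC describes.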

However, since commit 0c24604b ("tcp: implement RFC 5961 4.2"), such a
SYN+ACK is dropped in tcp_validate_incoming() and answered with a Challenge
ACK.

For example, the write() syscall in the following packetdrill script fails
with -EAGAIN, and wrong SNMP stats get incremented.

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
  +0 < S  0:0(0) win 1000 <mss 1000>
  +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
  +0 < S. 0:0(0) ack 1 win 1000

  +0 write(3, ..., 100) = 100
  +0 > P. 1:101(100) ack 1

  --

  # packetdrill cross-synack.pkt
  cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
  # nstat
  ...
  TcpExtTCPChallengeACK           1                  0.0
  TcpExtTCPSYNChallenge           1                  0.0
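
As a rough aid for checking these two counters, the sketch below parses
nstat-style output (name, value, rate columns, as shown above) and reports how
much each counter moved between two snapshots. The helper names parse_nstat()
and challenge_counters_moved() are hypothetical and not part of any existing
tool.

```python
def parse_nstat(text):
    """Parse 'name value rate' lines into a {name: value} dict.

    Lines that do not have exactly three fields with an integer value
    (headers, '...', blank lines) are ignored.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def challenge_counters_moved(before, after):
    """Return the increase of the two Challenge ACK counters."""
    names = ("TcpExtTCPChallengeACK", "TcpExtTCPSYNChallenge")
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}
```

Taking one snapshot before connect() and one after the handshake, a non-zero
delta for either counter would indicate that the SYN+ACK was still answered
with a Challenge ACK.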

The problem is that bpf_skops_established() is triggered by the Challenge
ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
check if the peer supports a TCP option that is expected to be exchanged
in SYN and SYN+ACK.

Let's accept a bare SYN+ACK for active-open TCP_SYN_RECV sockets to avoid
such a situation.
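
The intended change in behaviour can be modelled roughly as follows. This is a
simplified sketch, not the kernel implementation: the Sock and Segment types,
their fields, and handle_syn_in_window() are hypothetical, and the sequence
and ACK number checks a real stack needs are omitted.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool
    ack: bool
    payload_len: int = 0  # a "bare" SYN+ACK carries no data


@dataclass
class Sock:
    state: str         # e.g. "SYN_RECV" or "ESTABLISHED"
    active_open: bool  # True if the socket was created by connect()


def handle_syn_in_window(sk, seg, accept_cross_synack):
    """Decide what to do with an in-window segment that carries SYN.

    accept_cross_synack=False models the behaviour described above
    (every such segment gets a Challenge ACK); True models the proposed
    exception for simultaneous connect().
    """
    assert seg.syn
    bare_synack = seg.ack and seg.payload_len == 0
    if (accept_cross_synack and sk.state == "SYN_RECV"
            and sk.active_open and bare_synack):
        return "accept"
    return "challenge_ack"
```

Passive-open sockets and segments carrying data keep taking the Challenge ACK
path in this model; only the bare SYN+ACK on an active-open socket in
SYN_RECV is let through.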

Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped to avoid
sending an unnecessary ACK. This could be a bit risky for net.git, so this
patch targets net-next.

Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240710171246.87533-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

23e89e8e

test/vsock: add install target · 42ffe242

Peng Fan authored Jul 10, 2024

Add an install target for vsock to make it easy for Yocto to install the images.
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20240710122728.45044-1-peng.fan@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

42ffe242

MAINTAINERS: add 5 missing tcp-related files · 8e5f53a6

Eric Dumazet authored Jul 12, 2024

The following files are part of the TCP stack:

- net/ipv4/inet_connection_sock.c
- net/ipv4/inet_hashtables.c
- net/ipv4/inet_timewait_sock.c
- net/ipv6/inet6_connection_sock.c
- net/ipv6/inet6_hashtables.c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240712234213.3178593-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8e5f53a6

Merge branch 'Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths' · d5b60c65

Steffen Klassert authored Jul 13, 2024

Mike Yu says:

====================
Currently, IPsec crypto offload is enabled for the GRO code path. However,
there are other code paths where the XFRM stack is involved; for example, IPv6
ESP packets handled by xfrm6_esp_rcv() in the ESP layer, and IPv4
UDP-encapsulated ESP packets handled by udp_rcv() in the UDP layer.

This patchset extends the crypto offload support to cover these two cases.
This is useful for devices with traffic accounting (e.g., Android), where GRO
can lead to inaccurate accounting on the underlying network. For example, VPN
traffic might not be counted on the wifi network interface wlan0 if the packets
are handled in the GRO code path before entering the network stack for
accounting.

Below is the RX data path scenario the crypto offload can be applied to.

  +-----------+   +-------+
  | HW Driver |-->| wlan0 |--------+
  +-----------+   +-------+        |
                                   v
                             +---------------+   +------+
                     +------>| Network Stack |-->| Apps |
                     |       +---------------+   +------+
                     |             |
                     |             v
                 +--------+   +------------+
                 | ipsec1 |<--| XFRM Stack |
                 +--------+   +------------+
====================
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

d5b60c65