Commits · 5903123f662ed18483f05cac3f9e800a074c29ff · Kirill Smelkov / linux

02 Feb, 2022 21 commits

tcp: Use BPF timeout setting for SYN ACK RTO · 5903123f

Akhmat Karakotov authored Jan 28, 2022

When setting RTO through BPF program, some SYN ACK packets were unaffected
and continued to use TCP_TIMEOUT_INIT constant. This patch adds timeout
option to struct request_sock. Option is initialized with TCP_TIMEOUT_INIT
and is reassigned through BPF using tcp_timeout_init call. SYN ACK
retransmits now use newly added timeout option.
Signed-off-by: Akhmat Karakotov <hmukos@yandex-team.ru>
Acked-by: Martin KaFai Lau <kafai@fb.com>

v2:
	- Add timeout option to struct request_sock. Do not call
	  tcp_timeout_init on every syn ack retransmit.

v3:
	- Use unsigned long for min. Bound tcp_timeout_init to TCP_RTO_MAX.

v4:
	- Refactor duplicate code by adding reqsk_timeout function.
Signed-off-by: David S. Miller <davem@davemloft.net>

5903123f

Merge branch 'qca8k-mdio' · 0b6b0d31

David S. Miller authored Feb 02, 2022

Ansuel Smith says:

====================
Add support for qca8k mdio rw in Ethernet packet

The main reason for this is that we notice some routing problem in the
switch and it seems assisted learning is needed. Considering mdio is
quite slow due to the indirect write using this Ethernet alternative way
seems to be quicker.

The qca8k switch supports a special way to pass mdio read/write request
using specially crafted Ethernet packet.
This works by putting some defined data in the Ethernet header where the
mac source and dst should be placed. The Ethernet type header is set to qca
header and is set to a mdio read/write type.
This is used to communicate to the switch that this is a special packet
and should be parsed differently.

Currently we use Ethernet packet for
- MIB counter
- mdio read/write configuration
- phy read/write for each port

Current implementation of this use completion API to wait for the packet
to be processed by the tagger and has a timeout that fallback to the
legacy mdio way and mutex to enforce one transaction at time.

We now have connect()/disconnect() ops for the tagger. They are used to
allocate priv data in the dsa priv. The header still has to be put in
global include to make it usable by a dsa driver.
They are called when the tag is connect to the dst and the data is freed
using discconect on tagger change.

(if someone wonder why the bind function is put at in the general setup
function it's because tag is set in the cpu port where the notifier is
still not available and we require the notifier to sen the
tag_proto_connect() event.

We now have a tag_proto_connect() for the dsa driver used to put
additional data in the tagger priv (that is actually the dsa priv).
This is called using a switch event DSA_NOTIFIER_TAG_PROTO_CONNECT.
Current use for this is adding handler for the Ethernet packet to keep
the tagger code as dumb as possible.

The tagger priv implement only the handler for the special packet. All the
other stuff is placed in the qca8k_priv and the tagger has to access
it under lock.

We use the new API from Vladimir to track if the master port is
operational or not. We had to track many thing to reach a usable state.
Checking if the port is UP is not enough and tracking a NETDEV_CHANGE is
also not enough since it use also for other task. The correct way was
both track for interface UP and if a qdisc was assigned to the
interface. That tells us the port (and the tagger indirectly) is ready
to accept and process packet.

I tested this with multicpu port and with port6 set as the unique port and
it's sad.
It seems they implemented this feature in a bad way and this is only
supported with cpu port0. When cpu port6 is the unique port, the switch
doesn't send ack packet. With multicpu port, packet ack are not duplicated
and only cpu port0 sends them. This is the same for the MIB counter.
For this reason this feature is enabled only when cpu port0 is enabled and
operational.

v8:
- Reworked to rolling counter for the seq_num
- Reworked the hi/lo cache patch
- Fix multiple missing skb free and mutex lock errors
- Fix some spelling mistake
- Add macro build check for mgmt packet size
- Change some struct naming to make them more descriptive
v7:
- Rebase on net-next changes
- Add bulk patches to speedup this even more
v6:
- Fix some error in ethtool handler caused by rebase/cleanup
v5:
- Adapt to new API fixes
- Fix a wrong logic for noop
- Add additional lock for master_state change
- Limit mdio Ethernet to cpu port0 (switch limitation)
- Add priority to these special packet
- Move mdio cache to qca8k_priv
v4:
- Remove duplicate patch sent by mistake.
v3:
- Include MIB with Ethernet packet.
- Include phy read/write with Ethernet packet.
- Reorganize code with new API.
- Introuce master tracking by Vladimir
v2:
- Address all suggestion from Vladimir.
  Try to generilize this with connect/disconnect function from the
  tagger and tag_proto_connect for the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0b6b0d31

net: dsa: qca8k: introduce qca8k_bulk_read/write function · 4f3701fc

Ansuel Smith authored Feb 02, 2022

Introduce qca8k_bulk_read/write() function to use mgmt Ethernet way to
read/write packet in bulk. Make use of this new function in the fdb
function and while at it reduce the reg for fdb_read from 4 to 3 as the
max bit for the ARL(fdb) table is 83 bits.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f3701fc

net: dsa: qca8k: add support for larger read/write size with mgmt Ethernet · 90386223

Ansuel Smith authored Feb 02, 2022

mgmt Ethernet packet can read/write up to 16byte at times. The len reg
is limited to 15 (0xf). The switch actually sends and accepts data in 4
different steps of len values.
Len steps:
- 0: nothing
- 1-4: first 4 byte
- 5-6: first 12 byte
- 7-15: all 16 byte

In the alloc skb function we check if the len is 16 and we fix it to a
len of 15. It the read/write function interest to extract the real asked
data. The tagger handler will always copy the fully 16byte with a READ
command. This is useful for some big regs like the fdb reg that are
more than 4byte of data. This permits to introduce a bulk function that
will send and request the entire entry in one go.
Write function is changed and it does now require to pass the pointer to
val to also handle array val.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90386223

net: dsa: qca8k: cache lo and hi for mdio write · 2481d206

Ansuel Smith authored Feb 02, 2022

From Documentation, we can cache lo and hi the same way we do with the
page. This massively reduce the mdio write as 3/4 of the time as we only
require to write the lo or hi part for a mdio write.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2481d206

net: dsa: qca8k: move page cache to driver priv · 4264350a

Ansuel Smith authored Feb 02, 2022

There can be multiple qca8k switch on the same system. Move the static
qca8k_current_page to qca8k_priv and make it specific for each switch.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4264350a

net: dsa: qca8k: add support for phy read/write with mgmt Ethernet · 2cd54856

Ansuel Smith authored Feb 02, 2022

Use mgmt Ethernet also for phy read/write if availabale. Use a different
seq number to make sure we receive the correct packet.
On any error, we fallback to the legacy mdio read/write.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2cd54856

net: dsa: qca8k: add support for mib autocast in Ethernet packet · 5c957c7c

Ansuel Smith authored Feb 02, 2022

The switch can autocast MIB counter using Ethernet packet.
Add support for this and provide a handler for the tagger.
The switch will send packet with MIB counter for each port, the switch
will use completion API to wait for the correct packet to be received
and will complete the task only when each packet is received.
Although the handler will drop all the other packet, we still have to
consume each MIB packet to complete the request. This is done to prevent
mixed data with concurrent ethtool request.

connect_tag_protocol() is used to add the handler to the tag_qca tagger,
master_state_change() use the MIB lock to make sure no MIB Ethernet is
in progress.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c957c7c

net: dsa: qca8k: add support for mgmt read/write in Ethernet packet · 5950c7c0

Ansuel Smith authored Feb 02, 2022

Add qca8k side support for mgmt read/write in Ethernet packet.
qca8k supports some specially crafted Ethernet packet that can be used
for mgmt read/write instead of the legacy method uart/internal mdio.
This add support for the qca8k side to craft the packet and enqueue it.
Each port and the qca8k_priv have a special struct to put data in it.
The completion API is used to wait for the packet to be received back
with the requested data.

The various steps are:
1. Craft the special packet with the qca hdr set to mgmt read/write
   mode.
2. Set the lock in the dedicated mgmt struct.
3. Increment the seq number and set it in the mgmt pkt
4. Reinit the completion.
5. Enqueue the packet.
6. Wait the packet to be received.
7. Use the data set by the tagger to complete the mdio operation.

If the completion timeouts or the ack value is not true, the legacy
mdio way is used.

It has to be considered that in the initial setup mdio is still used and
mdio is still used until DSA is ready to accept and tag packet.

tag_proto_connect() is used to fill the required handler for the tagger
to correctly parse and elaborate the special Ethernet mdio packet.

Locking is added to qca8k_master_change() to make sure no mgmt Ethernet
are in progress.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5950c7c0

net: dsa: qca8k: add tracking state of master port · cddbec19

Ansuel Smith authored Feb 02, 2022

MDIO/MIB Ethernet require the master port and the tagger availabale to
correctly work. Use the new api master_state_change to track when master
is operational or not and set a bool in qca8k_priv.
We cache the first cached master available and we check if other cpu
port are operational when the cached one goes down.
This cached master will later be used by mdio read/write and mib request to
correctly use the working function.

qca8k implementation for MDIO/MIB Ethernet is bad. CPU port0 is the only
one that answers with the ack packet or sends MIB Ethernet packets. For
this reason the master_state_change ignore CPU port6 and only checks
CPU port0 if it's operational and enables this mode.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cddbec19

net: dsa: tag_qca: add support for handling mgmt and MIB Ethernet packet · 31eb6b43

Ansuel Smith authored Feb 02, 2022

Add connect/disconnect helper to assign private struct to the DSA switch.
Add support for Ethernet mgmt and MIB if the DSA driver provide an handler
to correctly parse and elaborate the data.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31eb6b43

net: dsa: tag_qca: add define for handling MIB packet · 18be654a

Ansuel Smith authored Feb 02, 2022

Add struct to correctly parse a mib Ethernet packet.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18be654a

net: dsa: tag_qca: add define for handling mgmt Ethernet packet · c2ee8181

Ansuel Smith authored Feb 02, 2022

Add all the required define to prepare support for mgmt read/write in
Ethernet packet. Any packet of this type has to be dropped as the only
use of these special packet is receive ack for an mgmt write request or
receive data for an mgmt read request.
A struct is used that emulates the Ethernet header but is used for a
different purpose.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2ee8181

net: dsa: tag_qca: enable promisc_on_master flag · 101c04c3

Ansuel Smith authored Feb 02, 2022

Ethernet MDIO packets are non-standard and DSA master expects the first
6 octets to be the MAC DA. To address these kind of packet, enable
promisc_on_master flag for the tagger.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

101c04c3

net: dsa: tag_qca: move define to include linux/dsa · 3ec762fb

Ansuel Smith authored Feb 02, 2022

Move tag_qca define to include dir linux/dsa as the qca8k require access
to the tagger define to support in-band mdio read/write using ethernet
packet.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ec762fb

net: dsa: tag_qca: convert to FIELD macro · 6b045829

Ansuel Smith authored Feb 02, 2022

Convert driver to FIELD macro to drop redundant define.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b045829

net: dsa: replay master state events in dsa_tree_{setup,teardown}_master · e83d5653

Vladimir Oltean authored Feb 02, 2022

In order for switch driver to be able to make simple and reliable use of
the master tracking operations, they must also be notified of the
initial state of the DSA master, not just of the changes. This is
because they might enable certain features only during the time when
they know that the DSA master is up and running.

Therefore, this change explicitly checks the state of the DSA master
under the same rtnl_mutex as we were holding during the
dsa_master_setup() and dsa_master_teardown() call. The idea being that
if the DSA master became operational in between the moment in which it
became a DSA master (dsa_master_setup set dev->dsa_ptr) and the moment
when we checked for the master being up, there is a chance that we
would emit a ->master_state_change() call with no actual state change.
We need to avoid that by serializing the concurrent netdevice event with
us. If the netdevice event started before, we force it to finish before
we begin, because we take rtnl_lock before making netdev_uses_dsa()
return true. So we also handle that early event and do nothing on it.
Similarly, if the dev_open() attempt is concurrent with us, it will
attempt to take the rtnl_mutex, but we're holding it. We'll see that
the master flag IFF_UP isn't set, then when we release the rtnl_mutex
we'll process the NETDEV_UP notifier.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e83d5653

net: dsa: provide switch operations for tracking the master state · 295ab96f

Vladimir Oltean authored Feb 02, 2022

Certain drivers may need to send management traffic to the switch for
things like register access, FDB dump, etc, to accelerate what their
slow bus (SPI, I2C, MDIO) can already do.

Ethernet is faster (especially in bulk transactions) but is also more
unreliable, since the user may decide to bring the DSA master down (or
not bring it up), therefore severing the link between the host and the
attached switch.

Drivers needing Ethernet-based register access already should have
fallback logic to the slow bus if the Ethernet method fails, but that
fallback may be based on a timeout, and the I/O to the switch may slow
down to a halt if the master is down, because every Ethernet packet will
have to time out. The driver also doesn't have the option to turn off
Ethernet-based I/O momentarily, because it wouldn't know when to turn it
back on.

Which is where this change comes in. By tracking NETDEV_CHANGE,
NETDEV_UP and NETDEV_GOING_DOWN events on the DSA master, we should know
the exact interval of time during which this interface is reliably
available for traffic. Provide this information to switches so they can
use it as they wish.

An helper is added dsa_port_master_is_operational() to check if a master
port is operational.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

295ab96f

sfc: The size of the RX recycle ring should be more flexible · 000fe940

Martin Habets authored Jan 31, 2022

Ideally the size would depend on the link speed, but the recycle
ring is created when the interface is brought up before the driver
knows the link speed. So size it for the maximum speed of a given NIC.
PowerPC is only supported on SFN7xxx and SFN8xxx NICs.

With this patch on a 40G NIC the number of calls to alloc_pages and
friends went down from about 18% to under 2%.
On a 10G NIC the number of calls to alloc_pages and friends went down
from about 15% to 0 (perf did not capture any calls during the 60
second test).
On a 100G NIC the number of calls to alloc_pages and friends went down
from about 23% to 4%.
Reported-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220131111054.cp4f6foyinaarwbn@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

000fe940

r8169: support L1.2 control on RTL8168h · 68650b4e

Heiner Kallweit authored Jan 31, 2022

According to Realtek RTL8168h supports the same L1.2 control as RTL8125.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/4784d5ce-38ac-046a-cbfa-5fdd9773f820@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

68650b4e

net: allow SO_MARK with CAP_NET_RAW via cmsg · 91f0d8a4

Jakub Kicinski authored Jan 31, 2022

There's not reason SO_MARK would be allowed via setsockopt()
and not via cmsg, let's keep the two consistent. See
commit 079925cc ("net: allow SO_MARK with CAP_NET_RAW")
for justification why NET_RAW -> SO_MARK is safe.
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220131233357.52964-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

91f0d8a4

01 Feb, 2022 16 commits

Merge branch 'lan966x-ptp' · e4d2763f

David S. Miller authored Feb 01, 2022

Horatiu Vultur says:

====================
net: lan966x: Add PTP Hardward Clock support

Add support for PTP Hardware Clock (PHC) for lan966x. The switch supports
both PTP 1-step and 2-step modes.

v1->v2:
- fix commit messages
- reduce the scope of the lock ptp_lock inside the function
  lan966x_ptp_hwtstamp_set
- the rx timestamping is always enabled for all packages
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e4d2763f

net: lan966x: Implement get_ts_info · 966f2e1a

Horatiu Vultur authored Jan 31, 2022

Implement the function get_ts_info in ethtool_ops which is needed to get
the HW capabilities for timestamping.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

966f2e1a

net: lan966x: Add support for ptp interrupts · e85a96e4

Horatiu Vultur authored Jan 31, 2022

When doing 2-step timestamping the HW will generate an interrupt when it
managed to timestamp a frame. It is the SW responsibility to read it
from the FIFO.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e85a96e4

net: lan966x: Update extraction/injection for timestamping · 77eecf25

Horatiu Vultur authored Jan 31, 2022

Update both the extraction and injection to do timestamping of the
frames. The extraction is always doing the timestamping while for
injection is doing the timestamping only if it is configured.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77eecf25

net: lan966x: Implement SIOCSHWTSTAMP and SIOCGHWTSTAMP · 735fec99

Horatiu Vultur authored Jan 31, 2022

Implement the ioctl callbacks SIOCSHWTSTAMP and SIOCGHWTSTAMP to allow
to configure the ports to enable/disable timestamping for TX. The RX
timestamping is always enabled. The HW is capable to run both 1-step
timestamping and 2-step timestamping.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

735fec99

net: lan966x: Add support for ptp clocks · d0964594

Horatiu Vultur authored Jan 31, 2022

The lan966x has 3 PHC. Enable each of them, for now all the
timestamping is happening on the first PHC.
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0964594

net: lan966x: Add registers that are use for ptp functionality · d700dff4

Horatiu Vultur authored Jan 31, 2022

Add the registers that will be used to configure the PHC in the HW.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d700dff4

dt-bindings: net: lan966x: Extend with the ptp interrupt · 2f92512e

Horatiu Vultur authored Jan 31, 2022

Extend dt-bindings for lan966x with ptp interrupt. This is generated
when doing 2-step timestamping and the timestamp can be read from the
FIFO.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f92512e

selftests: fib rule: Don't echo modified sysctls · 9f397dd5

Guillaume Nault authored Jan 31, 2022

Run sysctl in quiet mode. Echoing the modified sysctl doesn't bring any
useful information.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f397dd5

selftests: fib rule: Log test description · 21f25cd4

Guillaume Nault authored Jan 31, 2022

All callers of fib_rule6_test_match_n_redirect() and
fib_rule4_test_match_n_redirect() pass a third argument containing a
description of the test being run. Instead of ignoring this argument,
let's use it for logging instead of printing a truncated version of the
command.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21f25cd4

selftests: fib rule: Drop erroneous TABLE variable · 2e252113

Guillaume Nault authored Jan 31, 2022

The fib_rule6_del_by_pref() and fib_rule4_del_by_pref() functions use
an uninitialised $TABLE variable. They should use $RTABLE instead.
This doesn't alter the result of the test, as it just makes the grep
command less specific (but since the script always uses the same table
number, that doesn't really matter).

Let's fix it anyway and, while there, specify the filtering parameters
directly in 'ip -X rule show' to avoid the extra grep command entirely.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e252113

selftests: fib rule: Make 'getmatch' and 'match' local variables · 8af2ba9a

Guillaume Nault authored Jan 31, 2022

Let's restrict the scope of these variables to avoid possible
interferences.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8af2ba9a

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next · 1d02c039

David S. Miller authored Feb 01, 2022

-queue

Tony Nguyen says:

====================
10GbE Intel Wired LAN Driver Updates 2022-01-31

Alexander Lobakin says:

This is an interpolation of [0] to other Intel Ethernet drivers
(and is (re)based on its code).
The main aim is to keep XDP metadata not only in case with
build_skb(), but also when we do napi_alloc_skb() + memcpy().

All Intel drivers suffers from the same here:
 - metadata gets lost on XDP_PASS in legacy-rx;
 - excessive headroom allocation on XSK Rx to skbs;
 - metadata gets lost on XSK Rx to skbs.

Those get especially actual in XDP Hints upcoming.
I couldn't have addressed the first one for all Intel drivers due to
that they don't reserve any headroom for now in legacy-rx mode even
with XDP enabled. This is hugely wrong, but requires quite a bunch
of work and a separate series. Luckily, ice doesn't suffer from
that.
igc has 1 and 3 already fixed in [0].

[0] https://lore.kernel.org/netdev/163700856423.565980.10162564921347693758.stgit@firesoul
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1d02c039

sh_eth: kill useless initializers in sh_eth_{suspend|resume}() · 9a90986e

Sergey Shtylyov authored Jan 29, 2022

sh_eth_{suspend|resume}() initialize their local variable 'ret' to 0 but
this value is never really used, thus we can kill those intializers...
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/f09d7c64-4a2b-6973-09a4-10d759ed0df4@omp.ruSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9a90986e

net: ena: Do not waste napi skb cache · 7354a426

Hyeonggon Yoo authored Jan 29, 2022

By profiling, discovered that ena device driver allocates skb by
build_skb() and frees by napi_skb_cache_put(). Because the driver
does not use napi skb cache in allocation path, napi skb cache is
periodically filled and flushed. This is waste of napi skb cache.

As ena_alloc_skb() is called only in napi, Use napi_build_skb()
and napi_alloc_skb() when allocating skb.

This patch was tested on aws a1.metal instance.

[ jwiedmann.dev@gmail.com: Use napi_alloc_skb() instead of
  netdev_alloc_skb_ip_align() to keep things consistent. ]
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/r/YfUAkA9BhyOJRT4B@ip-172-31-19-208.ap-northeast-1.compute.internalSigned-off-by: Jakub Kicinski <kuba@kernel.org>

7354a426

qed: use msleep() in qed_mcp_cmd() and add qed_mcp_cmd_nosleep() for udelay. · ef10bd49

Venkata Sudheer Kumar Bhavaraju authored Jan 30, 2022

Change qed_mcp_cmd() to use msleep() (by setting QED_MB_FLAG_CAN_SLEEP
flag) and add new nosleep() version of the api. These api are used to
issue cmds to management fw and the change affects how driver
behaves while waiting for a response/resource.

All sleepable callers of the existing api now use msleep() version. For
non-sleepable callers, the new nosleep() version is explicitly used.
Signed-off-by: Venkata Sudheer Kumar Bhavaraju <vbhavaraju@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220131005235.1647881-1-vbhavaraju@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ef10bd49

31 Jan, 2022 3 commits

ixgbe: respect metadata on XSK Rx to skb · f322a620

Alexander Lobakin authored Dec 08, 2021

For now, if the XDP prog returns XDP_PASS on XSK, the metadata
will be lost as it doesn't get copied to the skb.

Copy it along with the frame headers. Account its size on skb
allocation, and when copying just treat it as a part of the frame
and do a pull after to "move" it to the "reserved" zone.

net_prefetch() xdp->data_meta and align the copy size to speed-up
memcpy() a little and better match ixgbe_construct_skb().

Fixes: d0bcacd0 ("ixgbe: add AF_XDP zero-copy Rx support")
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

f322a620

ixgbe: don't reserve excessive XDP_PACKET_HEADROOM on XSK Rx to skb · 8f405221

Alexander Lobakin authored Dec 08, 2021

{__,}napi_alloc_skb() allocates and reserves additional NET_SKB_PAD
+ NET_IP_ALIGN for any skb.
OTOH, ixgbe_construct_skb_zc() currently allocates and reserves
additional `xdp->data - xdp->data_hard_start`, which is
XDP_PACKET_HEADROOM for XSK frames.
There's no need for that at all as the frame is post-XDP and will
go only to the networking stack core.
Pass the size of the actual data only to __napi_alloc_skb() and
don't reserve anything. This will give enough headroom for stack
processing.

Fixes: d0bcacd0 ("ixgbe: add AF_XDP zero-copy Rx support")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

8f405221

ixgbe: pass bi->xdp to ixgbe_construct_skb_zc() directly · 1fbdaa13

Alexander Lobakin authored Dec 08, 2021

To not dereference bi->xdp each time in ixgbe_construct_skb_zc(),
pass bi->xdp as an argument instead of bi. We can also call
xsk_buff_free() outside of the function as well as assign bi->xdp
to NULL, which seems to make it closer to its name.
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

1fbdaa13