Commit a8db76d4 authored by Sven Van Asbroeck, committed by David S. Miller

lan743x: boost performance on cpu archs w/o dma cache snooping

The buffers in the lan743x driver's receive ring are always 9K,
even when the largest packet that can be received (the mtu) is
much smaller. This performs particularly badly on cpu archs
without dma cache snooping (such as ARM): each received packet
results in a 9K dma_{map|unmap} operation, which is very expensive
because cpu caches need to be invalidated.

Careful measurement of the driver rx path on armv7 reveals that
the cpu spends the majority of its time waiting for cache
invalidation.

Optimize by keeping the rx ring buffer size as close as possible
to the mtu. This limits the amount of cache that requires
invalidation.
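As a rough illustration of the sizing idea above, the following sketch compares an mtu-sized buffer with a fixed 9K one. The helper names are hypothetical and not taken from the driver. It assumes "9K" means 9 * 1024 bytes and counts only the standard Ethernet header and FCS. The real driver may add padding or alignment on top.

```c
#include <assert.h>
#include <stddef.h>

/* Standard Ethernet framing overheads, in bytes. */
#define SKETCH_ETH_HEADER_LEN 14u /* dst MAC + src MAC + ethertype */
#define SKETCH_ETH_FCS_LEN     4u /* trailing frame check sequence */

/* Assumption: "9K" in the commit message means 9 * 1024 bytes. */
#define SKETCH_FIXED_BUF_LEN (9u * 1024u)

/*
 * Hypothetical helper: smallest rx buffer that holds one full frame at the
 * given mtu. Padding and alignment are deliberately left out.
 */
size_t rx_buffer_len_for_mtu(size_t mtu)
{
    assert(mtu > 0);
    return mtu + SKETCH_ETH_HEADER_LEN + SKETCH_ETH_FCS_LEN;
}

/*
 * Hypothetical helper: bytes per buffer that no longer need to be mapped
 * (and therefore invalidated) compared with a fixed 9K buffer.
 */
size_t cache_bytes_saved(size_t mtu)
{
    size_t len = rx_buffer_len_for_mtu(mtu);

    return SKETCH_FIXED_BUF_LEN > len ? SKETCH_FIXED_BUF_LEN - len : 0;
}
```

Under these assumptions, an mtu of 1500 gives a 1518-byte buffer, so each dma_{map|unmap} covers roughly one sixth of the memory that a 9K buffer would.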

This optimization would normally force us to re-allocate all
ring buffers when the mtu is changed - a disruptive event,
because it can only happen when the network interface is down.

Remove the need to re-allocate all ring buffers by adding support
for multi-buffer frames. Now any combination of mtu and ring
buffer size will work. When the mtu changes from mtu1 to mtu2,
consumed buffers of size mtu1 are lazily replaced by newly
allocated buffers of size mtu2.
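The lazy replacement scheme can be sketched in plain userspace C. This is a simplified model, not the driver code: every type and function name below is hypothetical, the "device" is simulated by filling buffers by hand, and frames are assembled by copying into a flat array, whereas a driver would more likely chain socket buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 4
#define MAX_FRAME 9216

struct rx_buf {
    unsigned char *data;
    size_t cap;  /* capacity, fixed when the buffer was allocated */
    size_t used; /* bytes written by the (simulated) device */
    int first;   /* buffer starts a frame */
    int last;    /* buffer ends a frame */
};

struct rx_ring {
    struct rx_buf buf[RING_SIZE];
    size_t next;                    /* next slot to consume */
    size_t buf_len;                 /* size used for new allocations */
    unsigned char frame[MAX_FRAME]; /* frame under assembly */
    size_t frame_len;
};

/* Allocate a buffer of the ring's current size into the given slot. */
int ring_refill(struct rx_ring *r, size_t slot)
{
    unsigned char *p = malloc(r->buf_len);

    if (!p)
        return -1;
    r->buf[slot] = (struct rx_buf){ .data = p, .cap = r->buf_len };
    return 0;
}

/* Fill the whole ring once; cleanup on partial failure is omitted here. */
int ring_init(struct rx_ring *r, size_t buf_len)
{
    memset(r, 0, sizeof(*r));
    r->buf_len = buf_len;
    for (size_t i = 0; i < RING_SIZE; i++)
        if (ring_refill(r, i))
            return -1;
    return 0;
}

/* An mtu change only updates the size used for future allocations. */
void ring_set_buf_len(struct rx_ring *r, size_t buf_len)
{
    r->buf_len = buf_len;
}

/*
 * Consume one buffer. Returns the frame length once the last buffer of a
 * frame has been appended, 0 while the frame is incomplete, -1 on error.
 * The consumed buffer is replaced by one of the current size.
 */
long ring_consume(struct rx_ring *r)
{
    size_t slot = r->next;
    struct rx_buf *b = &r->buf[slot];
    int last = b->last;

    if (b->first)
        r->frame_len = 0;
    if (r->frame_len + b->used > MAX_FRAME)
        return -1;
    memcpy(r->frame + r->frame_len, b->data, b->used);
    r->frame_len += b->used;

    free(b->data);
    b->data = NULL;
    if (ring_refill(r, slot))
        return -1;
    r->next = (slot + 1) % RING_SIZE;
    return last ? (long)r->frame_len : 0;
}
```

In this model a frame larger than one buffer simply spans several slots, and after `ring_set_buf_len()` the ring holds a mix of old and new buffer sizes until every old buffer has been consumed once.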

These optimizations double the rx performance on armv7.
Third parties report 3x rx speedup on armv8.

Tested with iperf3 on a freescale imx6qp + lan7430, both sides
set to an mtu of 1500 bytes, measuring rx performance:

Before:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec   550 MBytes   231 Mbits/sec    0
After:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec  1.33 GBytes   570 Mbits/sec    0
Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 80fea53d
@@ -699,6 +699,8 @@ struct lan743x_rx {
 	struct napi_struct napi;
 	u32 frame_count;
+	struct sk_buff *skb_head, *skb_tail;
 };
 struct lan743x_adapter {
@@ -831,8 +833,7 @@ struct lan743x_rx_buffer_info {
 #define LAN743X_RX_RING_SIZE (65)
 #define RX_PROCESS_RESULT_NOTHING_TO_DO (0)
-#define RX_PROCESS_RESULT_PACKET_RECEIVED (1)
-#define RX_PROCESS_RESULT_PACKET_DROPPED (2)
+#define RX_PROCESS_RESULT_BUFFER_RECEIVED (1)
 u32 lan743x_csr_read(struct lan743x_adapter *adapter, int offset);
 void lan743x_csr_write(struct lan743x_adapter *adapter, int offset, u32 data);