-
VSR Burru authored
Improve UDP TX performance by: * reducing the ring size from 2K to 512 * replacing the numerous streaming DMA allocations for info buffers and gather lists with one large consistent DMA allocation per ring BQL is not effective here. We reduced the ring size because there is heavy overhead with dma_map_single every so often. With iommu=on, dma_map_single in PF Tx data path was taking longer time (~700usec) for every ~250 packets. Debugged intel_iommu code, and found that PF driver is utilizing too many static IO virtual address mapping entries (for gather list entries and info buffers): about 100K entries for two PF's each using 8 rings. Also, finding an empty entry (in rbtree of device domain's iova mapping in kernel) during Tx path becomes a bottleneck every so often; the loop to find the empty entry goes through over 40K iterations; this is too costly and was the major overhead. Overhead is low when this loop quits quickly. Netperf benchmark numbers before and after patch: PF UDP TX +--------+--------+------------+------------+---------+ | | | Before | After | | | Number | | Patch | Patch | | | of | Packet | Throughput | Throughput | Percent | | Flows | Size | (Gbps) | (Gbps) | Change | +--------+--------+------------+------------+---------+ | | 360 | 0.52 | 0.93 | +78.9 | | 1 | 1024 | 1.62 | 2.84 | +75.3 | | | 1518 | 2.44 | 4.21 | +72.5 | +--------+--------+------------+------------+---------+ | | 360 | 0.45 | 1.59 | +253.3 | | 4 | 1024 | 1.34 | 5.48 | +308.9 | | | 1518 | 2.27 | 8.31 | +266.1 | +--------+--------+------------+------------+---------+ | | 360 | 0.40 | 1.61 | +302.5 | | 8 | 1024 | 1.64 | 4.24 | +158.5 | | | 1518 | 2.87 | 6.52 | +127.2 | +--------+--------+------------+------------+---------+ VF UDP TX +--------+--------+------------+------------+---------+ | | | Before | After | | | Number | | Patch | Patch | | | of | Packet | Throughput | Throughput | Percent | | Flows | Size | (Gbps) | (Gbps) | Change | +--------+--------+------------+------------+---------+ | | 360 | 1.28 | 1.49 | +16.4 | | 1 | 1024 | 4.44 | 4.39 | -1.1 | | | 1518 | 6.08 | 6.51 | +7.1 | +--------+--------+------------+------------+---------+ | | 360 | 2.35 | 2.35 | 0.0 | | 4 | 1024 | 6.41 | 8.07 | +25.9 | | | 1518 | 9.56 | 9.54 | -0.2 | +--------+--------+------------+------------+---------+ | | 360 | 3.41 | 3.65 | +7.0 | | 8 | 1024 | 9.35 | 9.34 | -0.1 | | | 1518 | 9.56 | 9.57 | +0.1 | +--------+--------+------------+------------+---------+ Signed-off-by: VSR Burru <veerasenareddy.burru@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
67e303e0