• VSR Burru's avatar
    liquidio: improve UDP TX performance · 67e303e0
    VSR Burru authored
    Improve UDP TX performance by:
    * reducing the ring size from 2K to 512
    * replacing the numerous streaming DMA allocations for info buffers and
      gather lists with one large consistent DMA allocation per ring
    
    BQL is not effective here.  We reduced the ring size because there is heavy
    overhead with dma_map_single every so often.  With iommu=on, dma_map_single
    in PF Tx data path was taking longer time (~700usec) for every ~250
    packets.  Debugged intel_iommu code, and found that PF driver is utilizing
    too many static IO virtual address mapping entries (for gather list entries
    and info buffers): about 100K entries for two PF's each using 8 rings.
    Also, finding an empty entry (in rbtree of device domain's iova mapping in
    kernel) during Tx path becomes a bottleneck every so often; the loop to
    find the empty entry goes through over 40K iterations; this is too costly
    and was the major overhead.  Overhead is low when this loop quits quickly.
    
    Netperf benchmark numbers before and after patch:
    
    PF UDP TX
    +--------+--------+------------+------------+---------+
    |        |        |  Before    |  After     |         |
    | Number |        |  Patch     |  Patch     |         |
    |  of    | Packet | Throughput | Throughput | Percent |
    | Flows  |  Size  |  (Gbps)    |  (Gbps)    | Change  |
    +--------+--------+------------+------------+---------+
    |        |   360  |   0.52     |   0.93     |  +78.9  |
    |   1    |  1024  |   1.62     |   2.84     |  +75.3  |
    |        |  1518  |   2.44     |   4.21     |  +72.5  |
    +--------+--------+------------+------------+---------+
    |        |   360  |   0.45     |   1.59     | +253.3  |
    |   4    |  1024  |   1.34     |   5.48     | +308.9  |
    |        |  1518  |   2.27     |   8.31     | +266.1  |
    +--------+--------+------------+------------+---------+
    |        |   360  |   0.40     |   1.61     | +302.5  |
    |   8    |  1024  |   1.64     |   4.24     | +158.5  |
    |        |  1518  |   2.87     |   6.52     | +127.2  |
    +--------+--------+------------+------------+---------+
    
    VF UDP TX
    +--------+--------+------------+------------+---------+
    |        |        |  Before    |  After     |         |
    | Number |        |  Patch     |  Patch     |         |
    |  of    | Packet | Throughput | Throughput | Percent |
    | Flows  |  Size  |  (Gbps)    |  (Gbps)    | Change  |
    +--------+--------+------------+------------+---------+
    |        |   360  |   1.28     |   1.49     |  +16.4  |
    |   1    |  1024  |   4.44     |   4.39     |   -1.1  |
    |        |  1518  |   6.08     |   6.51     |   +7.1  |
    +--------+--------+------------+------------+---------+
    |        |   360  |   2.35     |   2.35     |    0.0  |
    |   4    |  1024  |   6.41     |   8.07     |  +25.9  |
    |        |  1518  |   9.56     |   9.54     |   -0.2  |
    +--------+--------+------------+------------+---------+
    |        |   360  |   3.41     |   3.65     |   +7.0  |
    |   8    |  1024  |   9.35     |   9.34     |   -0.1  |
    |        |  1518  |   9.56     |   9.57     |   +0.1  |
    +--------+--------+------------+------------+---------+
    Signed-off-by: default avatarVSR Burru <veerasenareddy.burru@cavium.com>
    Signed-off-by: default avatarFelix Manlunas <felix.manlunas@cavium.com>
    Signed-off-by: default avatarDerek Chickles <derek.chickles@cavium.com>
    Signed-off-by: default avatarRaghu Vatsavayi <raghu.vatsavayi@cavium.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    67e303e0
octeon_droq.c 25.5 KB