    tcp: adjust TSO packet sizes based on min_rtt · 65466904
    Eric Dumazet authored
    Back when tcp_tso_autosize() and TCP pacing were introduced,
    our focus was really to reduce burst sizes for long distance
    flows.
    
    The simple heuristic of using sk_pacing_rate/1024 has worked
    well, but can produce packets that are too small for hosts in the
    same rack/cluster, when thousands of flows compete for the bottleneck.
    
    Neal Cardwell had the idea of making the TSO burst size
    a function of both sk_pacing_rate and tcp_min_rtt().
    
    Indeed, for local flows, sending bigger bursts is better
    to reduce cpu costs, as occasional losses can be repaired
    quite fast.
    
    This patch is based on Neal Cardwell's implementation,
    done more than two years ago.
    bbr adjusts max_pacing_rate based on measured bandwidth,
    while cubic would overestimate max_pacing_rate.
    
    /proc/sys/net/ipv4/tcp_tso_rtt_log can be used to tune or disable
    this new feature, in logarithmic steps.
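
The sizing rule described above can be modeled in userspace as follows. This is a minimal sketch, not the kernel code: the function name `tso_autosize_sketch`, its parameters, the fixed shift of 10 (standing in for the sk_pacing_rate/1024 heuristic), and the exact form of the RTT term (adding `gso_max_size >> (min_rtt_us >> rtt_log)`) are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model of RTT-aware TSO sizing.
 * All names here are illustrative and are not kernel symbols.
 *
 * pacing_rate  : bytes per second
 * min_rtt_us   : minimum observed RTT, in microseconds
 * rtt_log      : log2 scaling step (models tcp_tso_rtt_log)
 * gso_max_size : upper bound on one TSO packet, in bytes
 * mss          : segment size, in bytes (must be non-zero)
 * min_segs     : lower bound on the returned segment count
 */
uint32_t tso_autosize_sketch(uint64_t pacing_rate, uint32_t min_rtt_us,
                             unsigned int rtt_log, uint32_t gso_max_size,
                             uint32_t mss, uint32_t min_segs)
{
    /* Base budget: pacing_rate / 1024, the original heuristic. */
    uint64_t bytes = pacing_rate >> 10;

    /* "Distance" in units of 2^rtt_log usec; guard against an
     * undefined shift by 32 or more bits. */
    uint32_t distance = (rtt_log < 32) ? (min_rtt_us >> rtt_log) : 0;

    /* Nearby flows (small distance) get a larger burst allowance;
     * the bonus halves with every additional distance step. */
    if (distance < 32)
        bytes += gso_max_size >> distance;

    /* Never exceed the maximum size of one TSO packet. */
    if (bytes > gso_max_size)
        bytes = gso_max_size;

    uint32_t segs = (uint32_t)(bytes / mss);
    return segs > min_segs ? segs : min_segs;
}
```

Under these assumptions, a flow paced at 20000000 bytes per second with a 4096-byte MSS and a 65536-byte gso_max_size gets 4 segments when min_rtt is large (for example 100 ms), and 16 segments when min_rtt is 50 usec and rtt_log is 9.
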
    
    Tested:
    
    100Gbit NIC, two hosts in the same rack, 4K MTU.
    600 flows rate-limited to 20000000 bytes per second.
    
    Before patch: (TSO sizes would be limited to 20000000/1024/4096 -> 4 segments per TSO)
    
    ~# echo 0 >/proc/sys/net/ipv4/tcp_tso_rtt_log
    ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
      96005
    
     Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
    
             65,945.29 msec task-clock                #    2.845 CPUs utilized
             1,314,632      context-switches          # 19935.279 M/sec
                 5,292      cpu-migrations            #   80.249 M/sec
               940,641      page-faults               # 14264.023 M/sec
       201,117,030,926      cycles                    # 3049769.216 GHz                   (83.45%)
        17,699,435,405      stalled-cycles-frontend   #    8.80% frontend cycles idle     (83.48%)
       136,584,015,071      stalled-cycles-backend    #   67.91% backend cycles idle      (83.44%)
        53,809,530,436      instructions              #    0.27  insn per cycle
                                                      #    2.54  stalled cycles per insn  (83.36%)
         9,062,315,523      branches                  # 137422329.563 M/sec               (83.22%)
           153,008,621      branch-misses             #    1.69% of all branches          (83.32%)
    
          23.182970846 seconds time elapsed
    
    TcpInSegs                       15648792           0.0
    TcpOutSegs                      58659110           0.0  # Average of 3.7 4K segments per TSO packet
    TcpExtTCPDelivered              58654791           0.0
    TcpExtTCPDeliveredCE            19                 0.0
    
    After patch:
    
    ~# echo 9 >/proc/sys/net/ipv4/tcp_tso_rtt_log
    ~# nstat -n;perf stat ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000;nstat|egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
      96046
    
     Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':
    
             48,982.58 msec task-clock                #    2.104 CPUs utilized
               186,014      context-switches          # 3797.599 M/sec
                 3,109      cpu-migrations            #   63.472 M/sec
               941,180      page-faults               # 19214.814 M/sec
       153,459,763,868      cycles                    # 3132982.807 GHz                   (83.56%)
        12,069,861,356      stalled-cycles-frontend   #    7.87% frontend cycles idle     (83.32%)
       120,485,917,953      stalled-cycles-backend    #   78.51% backend cycles idle      (83.24%)
        36,803,672,106      instructions              #    0.24  insn per cycle
                                                      #    3.27  stalled cycles per insn  (83.18%)
         5,947,266,275      branches                  # 121417383.427 M/sec               (83.64%)
            87,984,616      branch-misses             #    1.48% of all branches          (83.43%)
    
          23.281200256 seconds time elapsed
    
    TcpInSegs                       1434706            0.0
    TcpOutSegs                      58883378           0.0  # Average of 41 4K segments per TSO packet
    TcpExtTCPDelivered              58878971           0.0
    TcpExtTCPDeliveredCE            9664               0.0
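
The "segments per TSO packet" annotations on the TcpOutSegs lines are consistent with dividing TcpOutSegs by TcpInSegs, on the assumption that the segments received by a bulk sender are mostly ACKs and that roughly one ACK comes back per TSO packet. The sketch below only reproduces that arithmetic; the helper name `segs_per_received` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: average number of segments sent per segment
 * received, used here as a rough proxy for segments per TSO packet.
 * Returns 0.0 when no segments were received.
 */
double segs_per_received(uint64_t out_segs, uint64_t in_segs)
{
    if (in_segs == 0)
        return 0.0;
    return (double)out_segs / (double)in_segs;
}
```

With the counters above, the ratio is about 3.7 before the patch (58659110 / 15648792) and about 41 after it (58883378 / 1434706).
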
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Neal Cardwell <ncardwell@google.com>
    Link: https://lore.kernel.org/r/20220309015757.2532973-1-eric.dumazet@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>