    tcp: track data delivery rate for a TCP connection · b9f64820
    Yuchung Cheng authored
    This patch generates data delivery rate (throughput) samples on a
    per-ACK basis. These rate samples can be used by congestion control
    modules, and specifically will be used by TCP BBR in later patches in
    this series.
    
    Key state:
    
    tp->delivered: tracks the total number of data packets (original or
                   retransmitted) delivered so far. This is an
                   already-existing field.
    
    tp->delivered_mstamp: the last time tp->delivered was updated.
    
    Algorithm:
    
    A rate sample is calculated as (d1 - d0)/(t1 - t0) on a per-ACK basis:
    
      d1: the current tp->delivered after processing the ACK
      t1: the current time after processing the ACK
    
      d0: the prior tp->delivered when the acked skb was transmitted
      t0: the prior tp->delivered_mstamp when the acked skb was transmitted
    
    When an skb is transmitted, we snapshot d0 and t0 in its control
    block in tcp_rate_skb_sent().
    
    When an ACK arrives, it may SACK and ACK some skbs. For each SACKed
    or ACKed skb, tcp_rate_skb_delivered() updates the rate_sample struct
    to reflect the latest (d0, t0).
    
    Finally, tcp_rate_gen() generates a rate sample by storing
    (d1 - d0) in rs->delivered and (t1 - t0) in rs->interval_us.
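
    The three steps above can be sketched as a small userspace model.
    This is an illustration under assumptions, not the kernel code: the
    struct and function names below (model_conn, model_snapshot,
    model_rate_sample, model_skb_sent, model_skb_delivered,
    model_rate_gen) are hypothetical, time is a plain microsecond
    counter, and "the latest (d0, t0)" is approximated by keeping the
    snapshot with the largest d0.

```c
#include <stdint.h>

/* Hypothetical connection state: mirrors tp->delivered and
 * tp->delivered_mstamp from the text, nothing more. */
struct model_conn {
    uint32_t delivered;        /* total packets delivered so far */
    uint64_t delivered_mstamp; /* time (us) delivered was last updated */
};

/* Per-packet snapshot (d0, t0), taken at transmit time. */
struct model_snapshot {
    uint32_t d0;
    uint64_t t0;
};

/* Per-ACK rate sample. has_prior is an assumption of this sketch: it
 * marks whether any acked packet contributed a snapshot. */
struct model_rate_sample {
    int      has_prior;
    uint32_t prior_delivered; /* d0 of the latest acked packet */
    uint64_t prior_mstamp;    /* t0 of the latest acked packet */
    int32_t  delivered;       /* d1 - d0, or -1 if no sample */
    int64_t  interval_us;     /* t1 - t0, or -1 if no sample */
};

/* On transmit: snapshot d0 and t0 into the packet. */
static void model_skb_sent(const struct model_conn *c,
                           struct model_snapshot *p)
{
    p->d0 = c->delivered;
    p->t0 = c->delivered_mstamp;
}

/* For each SACKed or ACKed packet: keep the most recent snapshot. */
static void model_skb_delivered(const struct model_snapshot *p,
                                struct model_rate_sample *rs)
{
    if (!rs->has_prior || p->d0 > rs->prior_delivered) {
        rs->has_prior = 1;
        rs->prior_delivered = p->d0;
        rs->prior_mstamp = p->t0;
    }
}

/* After the ACK is processed: advance (d1, t1) and emit the sample. */
static void model_rate_gen(struct model_conn *c, uint32_t newly_delivered,
                           uint64_t now_us, struct model_rate_sample *rs)
{
    c->delivered += newly_delivered;
    c->delivered_mstamp = now_us;

    if (!rs->has_prior) {
        rs->delivered = -1;
        rs->interval_us = -1;
        return;
    }
    rs->delivered = (int32_t)(c->delivered - rs->prior_delivered);
    rs->interval_us = (int64_t)(now_us - rs->prior_mstamp);
}
```

    In this model, a packet sent when delivered was 10 at t = 1000 us
    and ACKed together with one other packet at t = 3000 us yields a
    sample of 2 packets over 2000 us.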
    
    One caveat: if an skb was sent with no packets in flight, then
    tp->delivered_mstamp may be either invalid (if the connection is
    starting) or outdated (if the connection was idle). In that case,
    we'll re-stamp tp->delivered_mstamp.
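
    A minimal sketch of this caveat follows. It assumes the re-stamp
    uses the transmit time of the skb being sent, which the text above
    does not state explicitly; idle_model and idle_model_send() are
    hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical state: packets currently in flight plus the
 * delivered_mstamp field described in the text. */
struct idle_model {
    uint32_t packets_out;
    uint64_t delivered_mstamp;
};

/* Record a transmission and return the t0 to snapshot for this packet.
 * With nothing in flight, delivered_mstamp is invalid or stale, so it
 * is re-stamped (assumption: with the send time) before being read. */
static uint64_t idle_model_send(struct idle_model *c, uint64_t send_time_us)
{
    if (c->packets_out == 0)
        c->delivered_mstamp = send_time_us;
    c->packets_out++;
    return c->delivered_mstamp;
}
```

    The first packet after an idle period therefore starts its sampling
    interval at its own send time, while later packets sent with data
    still in flight keep the existing stamp.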
    
    At first glance it seems t0 should always be the time when an skb was
    transmitted, but actually this could over-estimate the rate due to
    phase mismatch between transmit and ACK events. To track the delivery
    rate, we ensure that if packets are in flight then t0 and t1 are
    times at which packets were marked delivered.
    
    If the initial and final RTTs are different then one may be corrupted
    by some sort of noise. The noise we see most often is sending gaps
    caused by delayed, compressed, or stretched acks. This either affects
    both RTTs equally or artificially reduces the final RTT. We approach
    this by recording the info we need to compute the initial RTT
    (duration of the "send phase" of the window) when we recorded the
    associated inflight. Then, for a filter to avoid bandwidth
    overestimates, we generalize the per-sample bandwidth computation
    from:
    
        bw = delivered / ack_phase_rtt
    
    to the following:
    
        bw = delivered / max(send_phase_rtt, ack_phase_rtt)
    
    In large-scale experiments, this filtering approach incorporating
    send_phase_rtt is effective at avoiding bandwidth overestimates due to
    ACK compression or stretched ACKs.
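
    The generalized computation can be illustrated with a short sketch.
    The helper names (bw_interval_us, bw_packets_per_sec), the
    microsecond units, and the packets-per-second result are
    assumptions made for this example and are not taken from the patch.

```c
#include <stdint.h>

/* Use the longer of the two phases as the sampling interval, so a
 * compressed ACK phase cannot shrink the denominator. */
static uint64_t bw_interval_us(uint64_t send_phase_us, uint64_t ack_phase_us)
{
    return send_phase_us > ack_phase_us ? send_phase_us : ack_phase_us;
}

/* bw = delivered / max(send_phase_rtt, ack_phase_rtt), in packets per
 * second. Returns 0 when the interval is zero, since no rate can be
 * derived from it. */
static uint64_t bw_packets_per_sec(uint32_t delivered,
                                   uint64_t send_phase_us,
                                   uint64_t ack_phase_us)
{
    uint64_t interval_us = bw_interval_us(send_phase_us, ack_phase_us);

    if (interval_us == 0)
        return 0;
    return (uint64_t)delivered * 1000000ULL / interval_us;
}
```

    For example, 10 packets sent over 10000 us but ACKed within 2000 us
    because of ACK compression give 1000 packets/s here, where dividing
    by the ACK phase alone would report 5000 packets/s.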
    
    Signed-off-by: Van Jacobson <vanj@google.com>
    Signed-off-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: Nandita Dukkipati <nanditad@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>