    tcp: retry more conservatively on local congestion · 590d2026
    Yuchung Cheng authored
    Previously, when the sender failed to retransmit a data packet on
    timeout due to congestion in the local host (e.g. throttling in
    qdisc), it retried after one RTO, capped at 500ms.
    
    In low-RTT networks such as data centers, the RTO is often far
    below the default minimum of 200ms (and the 500ms cap). Local
    host congestion could then trigger a retry storm, pouring gas on
    the fire. Worse yet, the retry counter (icsk_retransmits) is not
    properly updated, so the aggressive retries may exceed the system
    limit (15 rounds) until the packet finally slips through.
    
    In such rare events, it's wise to retry more conservatively (after
    500ms), update the stats properly to reflect these incidents, and
    follow the system limit. Note that this is consistent with the
    behavior when a keep-alive probe is dropped due to local congestion.
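
The change described above can be modeled in userspace as follows. This is a minimal sketch, not the kernel code: struct model_sock, the helper names, and the constants are hypothetical stand-ins, and the assumption that every failed attempt counts toward the 15-round limit is taken from the description rather than from the diff.

```c
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel API. */
#define RESOURCE_PROBE_INTERVAL_MS 500 /* the 500ms interval from the text */
#define MAX_RETRIES 15                 /* the system limit from the text */

struct model_sock {
    unsigned int rto_ms;      /* current retransmission timeout */
    unsigned int retransmits; /* stands in for icsk_retransmits */
};

/* Old behavior: retry after one RTO, capped at 500ms. */
static unsigned int old_retry_delay_ms(const struct model_sock *sk)
{
    return sk->rto_ms < RESOURCE_PROBE_INTERVAL_MS
               ? sk->rto_ms
               : RESOURCE_PROBE_INTERVAL_MS;
}

/* New behavior: always wait the full 500ms, whatever the RTO is. */
static unsigned int new_retry_delay_ms(const struct model_sock *sk)
{
    (void)sk; /* the RTO no longer influences the delay */
    return RESOURCE_PROBE_INTERVAL_MS;
}

/*
 * Models a retransmission that failed due to local congestion.
 * Assumption: each failed attempt is counted against the limit.
 * Returns the delay in ms before the next retry, or -1 to give up.
 */
static int on_local_congestion(struct model_sock *sk)
{
    sk->retransmits++;
    if (sk->retransmits > MAX_RETRIES)
        return -1;
    return (int)new_retry_delay_ms(sk);
}
```

With a 5ms RTO, the old rule allows a retry every 5ms (up to 100 attempts in the time the new rule allows one), and in this model the sixteenth consecutive failure gives up instead of retrying indefinitely.
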
    Signed-off-by: Yuchung Cheng <ycheng@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Neal Cardwell <ncardwell@google.com>
    Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_timer.c 21.7 KB