Commit 726e9e8b authored by Eric Dumazet, committed by Jakub Kicinski

tcp: refine skb->ooo_okay setting

Enabling BIG TCP on a low-end platform apparently increased
the chances of flows getting locked on one busy TX queue.

A similar problem was handled in commit 9b462d02
("tcp: TCP Small Queues and strange attractors"),
but the strategy only worked for bulk flows
or 'large enough' RPCs. BIG TCP changed how large
an RPC needs to be to enable the workaround:
if an RPC fits in a single skb, TSQ never triggers.
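To see why a single-skb RPC escapes TSQ, here is a deliberately simplified C model, not kernel code. The struct, the function names, the limit and the skb sizes are all hypothetical, and real TSQ (tcp_small_queue_check()) derives its limit from the pacing rate, the skb size and a sysctl. What the model keeps is the shape of the check: it only looks at bytes already queued below TCP, so the first skb of a burst always passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a TSQ-style check. */
struct tsq_model {
	size_t queued_bytes; /* bytes still charged below TCP (qdisc/NIC) */
	size_t limit;        /* hypothetical per-socket limit */
	int throttles;       /* times the check deferred a send */
};

/* Returns true if the skb may go down now; false if a TSQ-style check
 * would defer it until TX completions free some queued bytes. */
static bool tsq_may_send(struct tsq_model *m, size_t skb_bytes)
{
	if (m->queued_bytes > m->limit) {
		m->throttles++;
		return false;
	}
	m->queued_bytes += skb_bytes;
	return true;
}

/* Sends an RPC of rpc_bytes as skbs of at most max_skb bytes while no
 * TX completion runs (busy queue). Returns how many times TSQ fired. */
static int rpc_tsq_throttles(size_t rpc_bytes, size_t max_skb, size_t limit)
{
	struct tsq_model m = { .queued_bytes = 0, .limit = limit, .throttles = 0 };

	assert(max_skb > 0); /* avoid an endless loop on bad input */
	while (rpc_bytes > 0) {
		size_t len = rpc_bytes < max_skb ? rpc_bytes : max_skb;

		if (!tsq_may_send(&m, len))
			break; /* real TCP would resume later, from the TSQ handler */
		rpc_bytes -= len;
	}
	return m.throttles;
}
```

Under these assumptions, rpc_tsq_throttles(256 * 1024, 64 * 1024, 128 * 1024) returns 1: the fourth 64 KB skb is deferred. The same RPC sent as one 256 KB skb, as BIG TCP allows, returns 0, so the TSQ path that used to give the flow a chance to change queues is never taken.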

The root cause of the problem is a busy TX queue
with delayed TX completions.

This patch changes how we set skb->ooo_okay to detect
the case where TX completion has not happened yet, but the
incoming ACK was already processed and emptied the rtx queue.
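The difference between the old and new rule can be sketched with a toy model of one flow. This is an illustration under stated assumptions, not the kernel implementation: struct flow_model, the helper names and MODEL_THRESH (a stand-in for SKB_TRUESIZE(1)) are hypothetical. The model tracks only two things: the truesize still charged to the socket until TX completion frees the skb, and the number of data skbs that are sent but not yet ACKed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold standing in for SKB_TRUESIZE(1): smaller than
 * the truesize of any skb carrying payload. */
#define MODEL_THRESH 512u

struct flow_model {
	size_t wmem_alloc;    /* truesize charged until TX completion frees the skb */
	size_t rtx_queue_len; /* data skbs sent but not yet ACKed */
};

/* A data skb goes to the qdisc/NIC and is kept for retransmission. */
static void model_xmit(struct flow_model *f, size_t truesize)
{
	f->wmem_alloc += truesize;
	f->rtx_queue_len++;
}

/* An incoming ACK covers everything sent so far. */
static void model_ack_all(struct flow_model *f)
{
	f->rtx_queue_len = 0;
}

/* The driver finally completes the transmit and frees the skb. */
static void model_tx_completion(struct flow_model *f, size_t truesize)
{
	assert(f->wmem_alloc >= truesize);
	f->wmem_alloc -= truesize;
}

/* Rule before the patch: nothing with payload still charged below TCP. */
static bool ooo_okay_old(const struct flow_model *f)
{
	return f->wmem_alloc < MODEL_THRESH;
}

/* Rule after the patch: ...or every prior packet has already been ACKed. */
static bool ooo_okay_new(const struct flow_model *f)
{
	return f->wmem_alloc < MODEL_THRESH || f->rtx_queue_len == 0;
}
```

In the scenario from the commit message (data sent, ACK already processed, TX completion still pending), ooo_okay_old() returns false and keeps the flow on its current queue, while ooo_okay_new() returns true and lets it move. Moving is safe at that point because no earlier packet of the flow is still unacknowledged.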

Update the comment to explain the tricky details.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230817182353.2523746-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
parent fc720399
@@ -1301,14 +1301,21 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 	}
 	tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
 
-	/* if no packet is in qdisc/device queue, then allow XPS to select
-	 * another queue. We can be called from tcp_tsq_handler()
-	 * which holds one reference to sk.
-	 *
-	 * TODO: Ideally, in-flight pure ACK packets should not matter here.
-	 * One way to get this would be to set skb->truesize = 2 on them.
-	 */
-	skb->ooo_okay = sk_wmem_alloc_get(sk) < SKB_TRUESIZE(1);
+	/* We set skb->ooo_okay to one if this packet can select
+	 * a different TX queue than prior packets of this flow,
+	 * to avoid self inflicted reorders.
+	 * The 'other' queue decision is based on current cpu number
+	 * if XPS is enabled, or sk->sk_txhash otherwise.
+	 * We can switch to another (and better) queue if:
+	 * 1) No packet with payload is in qdisc/device queues.
+	 *    Delays in TX completion can defeat the test
+	 *    even if packets were already sent.
+	 * 2) Or rtx queue is empty.
+	 *    This mitigates above case if ACK packets for
+	 *    all prior packets were already processed.
+	 */
+	skb->ooo_okay = sk_wmem_alloc_get(sk) < SKB_TRUESIZE(1) ||
+			tcp_rtx_queue_empty(sk);
 
 	/* If we had to use memory reserve to allocate this skb,
 	 * this might cause drops if packet is looped back :
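The new comment says the alternative queue comes from the current CPU when XPS is enabled, or from sk->sk_txhash otherwise. The flag is consumed by the core TX queue selection. The sketch below is a loose userspace model of that logic, patterned on netdev_pick_tx() in net/core/dev.c but simplified: the XPS map is reduced to a per-CPU array, the hash fallback is a plain modulo, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

struct dev_model {
	unsigned int real_num_tx_queues;
	int xps_cpu_map[MODEL_NR_CPUS]; /* -1: no XPS queue for that CPU */
};

struct sock_model {
	int tx_queue;    /* queue cached on the socket, -1 if none yet */
	uint32_t txhash; /* stand-in for sk->sk_txhash */
};

/* Keeps the cached queue unless there is none, it is out of range,
 * or this packet is marked ooo_okay. */
static int model_pick_tx(const struct dev_model *dev, struct sock_model *sk,
			 bool ooo_okay, unsigned int cpu)
{
	int queue = sk->tx_queue;

	assert(dev->real_num_tx_queues > 0);
	if (queue < 0 || ooo_okay ||
	    (unsigned int)queue >= dev->real_num_tx_queues) {
		int new_queue = -1;

		if (cpu < MODEL_NR_CPUS)
			new_queue = dev->xps_cpu_map[cpu]; /* XPS: per-CPU choice */
		if (new_queue < 0) /* no XPS: fall back to the flow hash */
			new_queue = (int)(sk->txhash % dev->real_num_tx_queues);
		sk->tx_queue = new_queue; /* reused by later packets of the flow */
		queue = new_queue;
	}
	return queue;
}
```

In this model a socket stays on its cached queue until a packet is marked ooo_okay, which is how a flow whose TX completions lag can remain stuck on a busy queue even after it has migrated to another CPU.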