Commit 216808c6 authored by Eric Dumazet's avatar Eric Dumazet Committed by Jakub Kicinski

tcp: refine rule to allow EPOLLOUT generation under mem pressure

At the time commit ce5ec440 ("tcp: ensure epoll edge trigger
wakeup when write queue is empty") was added to the kernel,
we still had a single write queue, combining rtx and write queues.

Once we moved the rtx queue into a separate rb-tree, testing
if sk_write_queue is empty has been suboptimal.

Indeed, if we have packets in the rtx queue, we probably want
to delay the EPOLLOUT generation at the time incoming packets
will free them, making room, but more importantly avoiding
flooding application with EPOLLOUT events.

Solution is to use tcp_rtx_and_write_queues_empty() helper.

Fixes: 75c119af ("tcp: implement rb-tree based retransmit queue")
Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
parent ee2aabd3
...@@ -1087,8 +1087,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, ...@@ -1087,8 +1087,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
goto out; goto out;
out_err: out_err:
/* make sure we wake any epoll edge trigger waiter */ /* make sure we wake any epoll edge trigger waiter */
if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
err == -EAGAIN)) {
sk->sk_write_space(sk); sk->sk_write_space(sk);
tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
} }
...@@ -1419,8 +1418,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) ...@@ -1419,8 +1418,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
sock_zerocopy_put_abort(uarg, true); sock_zerocopy_put_abort(uarg, true);
err = sk_stream_error(sk, flags, err); err = sk_stream_error(sk, flags, err);
/* make sure we wake any epoll edge trigger waiter */ /* make sure we wake any epoll edge trigger waiter */
if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
err == -EAGAIN)) {
sk->sk_write_space(sk); sk->sk_write_space(sk);
tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment