Commit b248230c authored by Yuchung Cheng, committed by David S. Miller

tcp: abort orphan sockets stalling on zero window probes

Currently we have two different policies for orphan sockets
that repeatedly stall on zero window ACKs. If a socket gets
a zero window ACK when it is transmitting data, the RTO is
used to probe the window. The socket is aborted after roughly
tcp_orphan_retries() retries (as in tcp_write_timeout()).

But if the socket was idle when it received the zero window ACK,
and later wants to send more data, we use the probe timer to
probe the window. If the receiver always returns zero window ACKs,
icsk_probes_out keeps getting reset in tcp_ack() and the orphan socket
can stall forever until the system reaches the orphan limit (as
commented in tcp_probe_timer()). This opens up a simple attack
that creates lots of hanging orphan sockets to burn memory
and CPU, as demonstrated in the recent netdev post "TCP
connection will hang in FIN_WAIT1 after closing if zero window is
advertised": http://www.spinics.net/lists/netdev/msg296539.html

This patch follows the design of the RTO-based probe: we abort an orphan
socket stalling on zero window when the probe timer reaches both
the maximum backoff and the maximum RTO. For example, a 100ms RTT
connection will time out after roughly 153 seconds (0.3 + 0.6 +
... + 76.8) if the receiver keeps the window shut. If the orphan
socket passes this check, but the system already has too many orphans
(as in tcp_out_of_resources()), we still abort it, but we also
send an RST packet as the connection may still be active.
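
The 153-second figure above can be reproduced with a small userspace
sketch. It only mirrors the arithmetic in this message (a 0.3 s first
probe interval that doubles until it would reach the 120 s maximum RTO);
the function name probe_lifetime_ms() is hypothetical and is not part of
the kernel.

```c
#include <stdint.h>

/* Sum the probe intervals base, 2*base, 4*base, ... for as long as the
 * backed-off interval stays below the maximum RTO. All values are in ms.
 * This is an illustration of the commit message's arithmetic, not
 * kernel code.
 */
static uint64_t probe_lifetime_ms(uint64_t base_rto_ms, uint64_t rto_max_ms)
{
	uint64_t total = 0;
	uint64_t interval = base_rto_ms;

	if (base_rto_ms == 0)
		return 0;	/* avoid looping forever on a zero interval */

	while (interval < rto_max_ms) {
		total += interval;
		interval *= 2;	/* exponential backoff */
	}
	return total;
}
```

With a 300 ms first interval and a 120000 ms cap, the loop adds nine
intervals (300 ms up to 76800 ms) before the next doubling would exceed
the cap.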

In addition, we change TCP_USER_TIMEOUT to cover (live or dead)
sockets stalled on zero-window probes. This changes the semantics
of TCP_USER_TIMEOUT slightly because it previously applied only
when the socket had pending transmission.
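
For reference, an application opts in to this behaviour through the
existing TCP_USER_TIMEOUT socket option. The sketch below assumes a
Linux system whose <netinet/tcp.h> defines TCP_USER_TIMEOUT; the helper
names set_user_timeout() and get_user_timeout() are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the kernel to give up (ETIMEDOUT) after timeout_ms milliseconds of
 * retransmitting or zero-window probing without progress.
 * Returns 0 on success, -1 on error (errno is set by setsockopt()).
 */
static int set_user_timeout(int fd, unsigned int timeout_ms)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/* Read the current setting back in milliseconds, or -1 on error. */
static int get_user_timeout(int fd)
{
	unsigned int timeout_ms = 0;
	socklen_t len = sizeof(timeout_ms);

	if (getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
		       &timeout_ms, &len) < 0)
		return -1;
	return (int)timeout_ms;
}
```

A value of 0 leaves the option unset, in which case the system defaults
described above apply.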

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent cb57659a
@@ -2693,7 +2693,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
 		break;
 #endif
 	case TCP_USER_TIMEOUT:
-		/* Cap the max timeout in ms TCP will retry/retrans
+		/* Cap the max time in ms TCP will retry or probe the window
 		 * before giving up and aborting (ETIMEDOUT) a connection.
 		 */
 		if (val < 0)
@@ -52,7 +52,7 @@ static void tcp_write_err(struct sock *sk)
  * limit.
  * 2. If we have strong memory pressure.
  */
-static int tcp_out_of_resources(struct sock *sk, int do_reset)
+static int tcp_out_of_resources(struct sock *sk, bool do_reset)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int shift = 0;
@@ -72,7 +72,7 @@ static int tcp_out_of_resources(struct sock *sk, int do_reset)
 		if ((s32)(tcp_time_stamp - tp->lsndtime) <= TCP_TIMEWAIT_LEN ||
 		    /* 2. Window is closed. */
 		    (!tp->snd_wnd && !tp->packets_out))
-			do_reset = 1;
+			do_reset = true;
 		if (do_reset)
 			tcp_send_active_reset(sk, GFP_ATOMIC);
 		tcp_done(sk);
@@ -270,40 +270,41 @@ static void tcp_probe_timer(struct sock *sk)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	int max_probes;
+	u32 start_ts;

 	if (tp->packets_out || !tcp_send_head(sk)) {
 		icsk->icsk_probes_out = 0;
 		return;
 	}

-	/* *WARNING* RFC 1122 forbids this
-	 *
-	 * It doesn't AFAIK, because we kill the retransmit timer -AK
-	 *
-	 * FIXME: We ought not to do it, Solaris 2.5 actually has fixing
-	 * this behaviour in Solaris down as a bug fix. [AC]
-	 *
-	 * Let me to explain. icsk_probes_out is zeroed by incoming ACKs
-	 * even if they advertise zero window. Hence, connection is killed only
-	 * if we received no ACKs for normal connection timeout. It is not killed
-	 * only because window stays zero for some time, window may be zero
-	 * until armageddon and even later. We are in full accordance
-	 * with RFCs, only probe timer combines both retransmission timeout
-	 * and probe timeout in one bottle. --ANK
+	/* RFC 1122 4.2.2.17 requires the sender to stay open indefinitely as
+	 * long as the receiver continues to respond probes. We support this by
+	 * default and reset icsk_probes_out with incoming ACKs. But if the
+	 * socket is orphaned or the user specifies TCP_USER_TIMEOUT, we
+	 * kill the socket when the retry count and the time exceeds the
+	 * corresponding system limit. We also implement similar policy when
+	 * we use RTO to probe window in tcp_retransmit_timer().
 	 */
-	max_probes = sysctl_tcp_retries2;
+	start_ts = tcp_skb_timestamp(tcp_send_head(sk));
+	if (!start_ts)
+		skb_mstamp_get(&tcp_send_head(sk)->skb_mstamp);
+	else if (icsk->icsk_user_timeout &&
+		 (s32)(tcp_time_stamp - start_ts) > icsk->icsk_user_timeout)
+		goto abort;

+	max_probes = sysctl_tcp_retries2;
 	if (sock_flag(sk, SOCK_DEAD)) {
 		const int alive = inet_csk_rto_backoff(icsk, TCP_RTO_MAX) < TCP_RTO_MAX;

 		max_probes = tcp_orphan_retries(sk, alive);
-
-		if (tcp_out_of_resources(sk, alive || icsk->icsk_probes_out <= max_probes))
+		if (!alive && icsk->icsk_backoff >= max_probes)
+			goto abort;
+		if (tcp_out_of_resources(sk, true))
 			return;
 	}

 	if (icsk->icsk_probes_out > max_probes) {
-		tcp_write_err(sk);
+abort:		tcp_write_err(sk);
 	} else {
 		/* Only send another probe if we didn't close things up. */
 		tcp_send_probe0(sk);
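
The new control flow in tcp_probe_timer() can be read as a small decision
function. The following is a userspace sketch of that logic only, under
stated assumptions: struct probe_state and all function names are
hypothetical stand-ins for the kernel fields named in the comments, times
are in milliseconds instead of jiffies, the sysctls are fixed at their
usual defaults, and both the sk_err_soft case of tcp_orphan_retries() and
the tcp_out_of_resources() check are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTO_MAX_MS     120000u	/* stands in for TCP_RTO_MAX (120 s) */
#define RETRIES2       15	/* assumed default of sysctl_tcp_retries2 */
#define ORPHAN_RETRIES 0	/* assumed default of sysctl_tcp_orphan_retries */

enum probe_action { PROBE_SEND, PROBE_ABORT };

/* Hypothetical, simplified view of the state the timer consults. */
struct probe_state {
	bool     orphaned;		/* sock_flag(sk, SOCK_DEAD) */
	uint32_t rto_ms;		/* icsk_rto */
	unsigned backoff;		/* icsk_backoff */
	unsigned probes_out;		/* icsk_probes_out */
	uint32_t user_timeout_ms;	/* icsk_user_timeout, 0 if unset */
	uint32_t stalled_ms;		/* time since the head skb was stamped */
};

/* Backed-off RTO clamped to RTO_MAX_MS, in the spirit of
 * inet_csk_rto_backoff(icsk, TCP_RTO_MAX).
 */
static uint64_t rto_backoff_ms(const struct probe_state *s)
{
	uint64_t when = s->backoff < 32 ? (uint64_t)s->rto_ms << s->backoff
					: RTO_MAX_MS;

	return when < RTO_MAX_MS ? when : RTO_MAX_MS;
}

/* Simplified model of tcp_orphan_retries(): a zero sysctl means
 * "8 retries while the backed-off RTO is still below the maximum".
 */
static unsigned orphan_retries(bool alive)
{
	unsigned retries = ORPHAN_RETRIES;

	if (retries == 0 && alive)
		retries = 8;
	return retries;
}

static enum probe_action probe_timer_decision(const struct probe_state *s)
{
	unsigned max_probes = RETRIES2;

	/* TCP_USER_TIMEOUT now also bounds zero-window probing. */
	if (s->user_timeout_ms && s->stalled_ms > s->user_timeout_ms)
		return PROBE_ABORT;

	if (s->orphaned) {
		bool alive = rto_backoff_ms(s) < RTO_MAX_MS;

		max_probes = orphan_retries(alive);
		/* Orphan reached both maximum backoff and maximum RTO. */
		if (!alive && s->backoff >= max_probes)
			return PROBE_ABORT;
	}

	return s->probes_out > max_probes ? PROBE_ABORT : PROBE_SEND;
}
```

In this model an orphan with a 300 ms RTO keeps probing through backoff 8
and is aborted at backoff 9, while a socket that is still attached to a
process keeps probing unless TCP_USER_TIMEOUT is set and exceeded.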