1. 28 Jul, 2014 2 commits
    • Yuchung Cheng's avatar
      tcp: fix false undo corner cases · f37e4913
      Yuchung Cheng authored
      [ Upstream commit 6e08d5e3
      
       ]
      
      The undo code assumes that, upon entering loss recovery, TCP
      1) always retransmit something
      2) the retransmission never fails locally (e.g., qdisc drop)
      
      so undo_marker is set in tcp_enter_recovery() and undo_retrans is
      incremented only when tcp_retransmit_skb() is successful.
      
      When the assumption is broken because TCP's cwnd is too small to
      retransmit or the retransmit fails locally. The next (DUP)ACK
      would incorrectly revert the cwnd and the congestion state in
      tcp_try_undo_dsack() or tcp_may_undo(). Subsequent (DUP)ACKs
      may enter the recovery state. The sender repeatedly enter and
      (incorrectly) exit recovery states if the retransmits continue to
      fail locally while receiving (DUP)ACKs.
      
      The fix is to initialize undo_retrans to -1 and start counting on
      the first retransmission. Always increment undo_retrans even if the
      retransmissions fail locally because they couldn't cause DSACKs to
      undo the cwnd reduction.
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f37e4913
    • Neal Cardwell's avatar
      tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb · 06c84e98
      Neal Cardwell authored
      [ Upstream commit 2cd0d743 ]
      
      If there is an MSS change (or misbehaving receiver) that causes a SACK
      to arrive that covers the end of an skb but is less than one MSS, then
      tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
      the skb ("Round if necessary..."), then chopping all bytes off the skb
      and creating a zero-byte skb in the write queue.
      
      This was visible now because the recently simplified TLP logic in
      bef1909e ("tcp: fixing TLP's FIN recovery") could find that 0-byte
      skb at the end of the write queue, and now that we do not check that
      skb's length we could send it as a TLP probe.
      
      Consider the following example scenario:
      
       mss: 1000
       skb: seq: 0 end_seq: 4000  len: 4000
       SACK: start_seq: 3999 end_seq: 4000
      
      The tcp_match_skb_to_sack() code will compute:
      
       in_sack = false
       pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
       new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
       new_len += mss = 4000
      
      Previously we would find the new_len > skb->len check failing, so we
      would fall through and set pkt_len = new_len = 4000 and chop off
      pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
      afterward in the write queue.
      
      With this new commit, we notice that the new new_len >= skb->len check
      succeeds, so that we return without trying to fragment.
      
      Fixes: adb92db8
      
       ("tcp: Make SACK code to split only at mss boundaries")
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      06c84e98
  2. 04 Nov, 2013 1 commit
    • Eric Dumazet's avatar
      tcp: do not forget FIN in tcp_shifted_skb() · d1e668e7
      Eric Dumazet authored
      [ Upstream commit 5e8a402f ]
      
      Yuchung found following problem :
      
       There are bugs in the SACK processing code, merging part in
       tcp_shift_skb_data(), that incorrectly resets or ignores the sacked
       skbs FIN flag. When a receiver first SACK the FIN sequence, and later
       throw away ofo queue (e.g., sack-reneging), the sender will stop
       retransmitting the FIN flag, and hangs forever.
      
      Following packetdrill test can be used to reproduce the bug.
      
      $ cat sack-merge-bug.pkt
      `sysctl -q net.ipv4.tcp_fack=0`
      
      // Establish a connection and send 10 MSS.
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +.000 bind(3, ..., ...) = 0
      +.000 listen(3, 1) = 0
      
      +.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      +.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
      +.001 < . 1:1(0) ack 1 win 1024
      +.000 accept(3, ..., ...) = 4
      
      +.100 write(4, ..., 12000) = 12000
      +.000 shutdown(4, SHUT_WR) = 0
      +.000 > . 1:10001(10000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257
      +.000 > FP. 10001:12001(2000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop>
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop>
      // SACK reneg
      +.050 < . 1:1(0) ack 12001 win 257
      +0 %{ print "unacked: ",tcpi_unacked }%
      +5 %{ print "" }%
      
      First, a typo inverted left/right of one OR operation, then
      code forgot to advance end_seq if the merged skb carried FIN.
      
      Bug was added in 2.6.29 by commit 832d11c5
      
      
      ("tcp: Try to restore large SKBs while SACK processing")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1e668e7
  3. 27 Jun, 2013 1 commit
    • Nandita Dukkipati's avatar
      tcp: bug fix in proportional rate reduction. · 2d73733f
      Nandita Dukkipati authored
      [ Upstream commit 35f079eb
      
       ]
      
      This patch is a fix for a bug triggering newly_acked_sacked < 0
      in tcp_ack(.).
      
      The bug is triggered by sacked_out decreasing relative to prior_sacked,
      but packets_out remaining the same as pior_packets. This is because the
      snapshot of prior_packets is taken after tcp_sacktag_write_queue() while
      prior_sacked is captured before tcp_sacktag_write_queue(). The problem
      is: tcp_sacktag_write_queue (tcp_match_skb_to_sack() -> tcp_fragment)
      adjusts the pcount for packets_out and sacked_out (MSS change or other
      reason). As a result, this delta in pcount is reflected in
      (prior_sacked - sacked_out) but not in (prior_packets - packets_out).
      
      This patch does the following:
      1) initializes prior_packets at the start of tcp_ack() so as to
      capture the delta in packets_out created by tcp_fragment.
      2) introduces a new "previous_packets_out" variable that snapshots
      packets_out right before tcp_clean_rtx_queue, so pkts_acked can be
      correctly computed as before.
      3) Computes pkts_acked using previous_packets_out, and computes
      newly_acked_sacked using prior_packets.
      Signed-off-by: default avatarNandita Dukkipati <nanditad@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d73733f
  4. 01 May, 2013 1 commit
  5. 05 Apr, 2013 1 commit
  6. 20 Mar, 2013 1 commit
  7. 14 Feb, 2013 2 commits
  8. 11 Jan, 2013 5 commits
  9. 02 Oct, 2012 1 commit
  10. 09 Aug, 2012 1 commit
    • Jiri Kosina's avatar
      tcp: perform DMA to userspace only if there is a task waiting for it · 2ce42ec4
      Jiri Kosina authored
      [ Upstream commit 59ea33a6 ]
      
      Back in 2006, commit 1a2449a8
      
       ("[I/OAT]: TCP recv offload to I/OAT")
      added support for receive offloading to IOAT dma engine if available.
      
      The code in tcp_rcv_established() tries to perform early DMA copy if
      applicable. It however does so without checking whether the userspace
      task is actually expecting the data in the buffer.
      
      This is not a problem under normal circumstances, but there is a corner
      case where this doesn't work -- and that's when MSG_TRUNC flag to
      recvmsg() is used.
      
      If the IOAT dma engine is not used, the code properly checks whether
      there is a valid ucopy.task and the socket is owned by userspace, but
      misses the check in the dmaengine case.
      
      This problem can be observed in real trivially -- for example 'tbench' is a
      good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
      IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
      have been already early-copied in tcp_rcv_established() using dma engine.
      
      This patch introduces the same check we are performing in the simple
      iovec copy case to the IOAT case as well. It fixes the indefinite
      recvmsg(MSG_TRUNC) hangs.
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ce42ec4
  11. 03 May, 2012 1 commit
    • Eric Dumazet's avatar
      tcp: change tcp_adv_win_scale and tcp_rmem[2] · b49960a0
      Eric Dumazet authored
      
      tcp_adv_win_scale default value is 2, meaning we expect a good citizen
      skb to have skb->len / skb->truesize ratio of 75% (3/4)
      
      In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
      1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
      So these skbs were considered as not bloated.
      
      With recent truesize fixes, a typical MSS=1460 frame truesize is now the
      more precise :
      2048 + 256 = 2304. But 2304 * 3/4 = 1728.
      So these skb are not good citizen anymore, because 1460 < 1728
      
      (GRO can escape this problem because it build skbs with a too low
      truesize.)
      
      This also means tcp advertises a too optimistic window for a given
      allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
      sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
      especially when application is slow to drain its receive queue or in
      case of losses (netperf is fast, scp is slow). This is a major latency
      source.
      
      We should adjust the len/truesize ratio to 50% instead of 75%
      
      This patch :
      
      1) changes tcp_adv_win_scale default to 1 instead of 2
      
      2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
      better truesize tracking and to allow autotuning tcp receive window to
      reach same value than before. Note that same amount of kernel memory is
      consumed compared to 2.6 kernels.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b49960a0
  12. 30 Apr, 2012 1 commit
  13. 27 Apr, 2012 1 commit
  14. 18 Apr, 2012 1 commit
  15. 10 Apr, 2012 1 commit
  16. 05 Apr, 2012 1 commit
  17. 19 Mar, 2012 2 commits
    • Eric Dumazet's avatar
      tcp: reduce out_of_order memory use · c8628155
      Eric Dumazet authored
      
      With increasing receive window sizes, but speed of light not improved
      that much, out of order queue can contain a huge number of skbs, waiting
      to be moved to receive_queue when missing packets can fill the holes.
      
      Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
      sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
      probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
      many cases.
      
      When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
      latency killer and cpu cache blower.
      
      Doing the coalescing attempt each time we add a frame in ofo queue
      permits to keep memory use tight and in many cases avoid the
      tcp_collapse() thing later.
      
      Tested on various wireless setups (b43, ath9k, ...) known to use big skb
      truesize, this patch removed the "packets collapsed in receive queue due
      to low socket buffer" I had before.
      
      This also reduced average memory used by tcp sockets.
      
      With help from Neal Cardwell.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8628155
    • Eric Dumazet's avatar
      tcp: introduce tcp_data_queue_ofo · e86b2919
      Eric Dumazet authored
      
      Split tcp_data_queue() in two parts for better readability.
      
      tcp_data_queue_ofo() is responsible for queueing incoming skb into out
      of order queue.
      
      Change code layout so that the skb_set_owner_r() is performed only if
      skb is not dropped.
      
      This is a preliminary patch before "reduce out_of_order memory use"
      following patch.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e86b2919
  18. 13 Mar, 2012 1 commit
  19. 12 Mar, 2012 1 commit
    • Joe Perches's avatar
      net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches authored
      
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt.
      to standardize the output messages.
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      058bd4d2
  20. 06 Mar, 2012 1 commit
    • Neal Cardwell's avatar
      tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una · 4648dc97
      Neal Cardwell authored
      This commit fixes tcp_shift_skb_data() so that it does not shift
      SACKed data below snd_una.
      
      This fixes an issue whose symptoms exactly match reports showing
      tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
      net/ipv4/tcp_input.c:3418" thread on netdev).
      
      Since 2008 (832d11c5)
      tcp_shift_skb_data() had been shifting SACKed ranges that were below
      snd_una. It checked that the *end* of the skb it was about to shift
      from was above snd_una, but did not check that the end of the actual
      shifted range was above snd_una; this commit adds that check.
      
      Shifting SACKed ranges below snd_una is problematic because for such
      ranges tcp_sacktag_one() short-circuits: it does not declare anything
      as SACKed and does not increase sacked_out.
      
      Before the fixes in commits cc9a672e
      and daef52ba
      
      , shifting SACKed ranges
      below snd_una happened to work because tcp_shifted_skb() was always
      (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
      tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
      tcp_sacktag_one() never short-circuited and always increased
      tp->sacked_out in this case.
      
      After those two fixes, my testing has verified that shifting SACKed
      ranges below snd_una could cause tp->sacked_out to go negative with
      the following sequence of events:
      
      (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
          then shifts a prefix of that skb that is below snd_una
      
      (2) tcp_shifted_skb() increments the packet count of the
          already-SACKed prev sk_buff
      
      (3) tcp_sacktag_one() sees the end of the new SACKed range is below
          snd_una, so it short-circuits and doesn't increase tp->sacked_out
      
      (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
          decrements tp->sacked_out by this "inflated" pcount that was
          missing a matching increase in tp->sacked_out, and hence
          tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
          to s32 is negative.
      
      (6) this leads to the warnings seen in the recent "WARNING: at
          net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
          tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);
      
      More generally, I think this bug can be tickled in some cases where
      two or more ACKs from the receiver are lost and then a DSACK arrives
      that is immediately above an existing SACKed skb in the write queue.
      
      This fix changes tcp_shift_skb_data() to abort this sequence at step
      (1) in the scenario above by noticing that the bytes are below snd_una
      and not shifting them.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4648dc97
  21. 03 Mar, 2012 1 commit
    • Neal Cardwell's avatar
      tcp: don't fragment SACKed skbs in tcp_mark_head_lost() · c0638c24
      Neal Cardwell authored
      
      In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
      to mark the first portion as lost. This is for two primary reasons:
      
      (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
      doing this, it preserves the sum of their packet counts in order to
      reflect the real-world dynamics on the wire. But given that skbs can
      have remainders that do not align to MSS boundaries, this packet count
      preservation means that for SACKed skbs there is not necessarily a
      direct linear relationship between tcp_skb_pcount(skb) and
      skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
      off and mark as lost a prefix of length (packets - oldcnt)*mss from
      SACKed skbs were leading to occasional failures of the WARN_ON(len >
      skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
      recent "crash in tcp_fragment" thread on netdev).
      
      (2) there is no real point in fragmenting off part of a SACKed skb and
      calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
      for SACKed skbs.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNandita Dukkipati <nanditad@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c0638c24
  22. 28 Feb, 2012 1 commit
    • Neal Cardwell's avatar
      tcp: fix false reordering signal in tcp_shifted_skb · 4c90d3b3
      Neal Cardwell authored
      
      When tcp_shifted_skb() shifts bytes from the skb that is currently
      pointed to by 'highest_sack' then the increment of
      TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
      implicit advancement, combined with the recent fix to pass the correct
      SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
      that the newly SACKed range was before the tcp_highest_sack_seq(),
      leading to a call to tcp_update_reordering() with a degree of
      reordering matching the size of the newly SACKed range (typically just
      1 packet, which is a NOP, but potentially larger).
      
      This commit fixes this by simply calling tcp_sacktag_one() before the
      TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
      highest SACKed sequence.
      
      Correspondingly, we can simplify the code a little now that
      tcp_shifted_skb() should update the lost_cnt_hint in all cases where
      skb == tp->lost_skb_hint.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4c90d3b3
  23. 14 Feb, 2012 1 commit
  24. 13 Feb, 2012 2 commits
  25. 22 Jan, 2012 1 commit
    • Yuchung Cheng's avatar
      tcp: detect loss above high_seq in recovery · 974c1236
      Yuchung Cheng authored
      
      Correctly implement a loss detection heuristic: New sequences (above
      high_seq) sent during the fast recovery are deemed lost when higher
      sequences are SACKed.
      
      Current code does not catch these losses, because tcp_mark_head_lost()
      does not check packets beyond high_seq. The fix is straight-forward by
      checking packets until the highest sacked packet. In addition, all the
      FLAG_DATA_LOST logic are in-effective and redundant and can be removed.
      
      Update the loss heuristic comments. The algorithm above is documented
      as heuristic B, but it is redundant too because heuristic A already
      covers B.
      
      Note that this change only marks some forward-retransmitted packets LOST.
      It does NOT forbid TCP performing further CWR on new losses. A potential
      follow-up patch under preparation is to perform another CWR on "new"
      losses such as
      1) sequence above high_seq is lost (by resetting high_seq to snd_nxt)
      2) retransmission is lost.
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      974c1236
  26. 21 Dec, 2011 1 commit
  27. 13 Dec, 2011 1 commit
  28. 11 Dec, 2011 1 commit
  29. 04 Dec, 2011 1 commit
  30. 27 Nov, 2011 3 commits
    • Neal Cardwell's avatar
      tcp: skip cwnd moderation in TCP_CA_Open in tcp_try_to_open · 8cd6d616
      Neal Cardwell authored
      
      The problem: Senders were overriding cwnd values picked during an undo
      by calling tcp_moderate_cwnd() in tcp_try_to_open().
      
      The fix: Don't moderate cwnd in tcp_try_to_open() if we're in
      TCP_CA_Open, since doing so is generally unnecessary and specifically
      would override a DSACK-based undo of a cwnd reduction made in fast
      recovery.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8cd6d616
    • Neal Cardwell's avatar
      tcp: allow undo from reordered DSACKs · f698204b
      Neal Cardwell authored
      
      Previously, SACK-enabled connections hung around in TCP_CA_Disorder
      state while snd_una==high_seq, just waiting to accumulate DSACKs and
      hopefully undo a cwnd reduction. This could and did lead to the
      following unfortunate scenario: if some incoming ACKs advance snd_una
      beyond high_seq then we were setting undo_marker to 0 and moving to
      TCP_CA_Open, so if (due to reordering in the ACK return path) we
      shortly thereafter received a DSACK then we were no longer able to
      undo the cwnd reduction.
      
      The change: Simplify the congestion avoidance state machine by
      removing the behavior where SACK-enabled connections hung around in
      the TCP_CA_Disorder state just waiting for DSACKs. Instead, when
      snd_una advances to high_seq or beyond we typically move to
      TCP_CA_Open immediately and allow an undo in either TCP_CA_Open or
      TCP_CA_Disorder if we later receive enough DSACKs.
      
      Other patches in this series will provide other changes that are
      necessary to fully fix this problem.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f698204b
    • Neal Cardwell's avatar
      tcp: use SACKs and DSACKs that arrive on ACKs below snd_una · e95ae2f2
      Neal Cardwell authored
      
      The bug: When the ACK field is below snd_una (which can happen when
      ACKs are reordered), senders ignored DSACKs (preventing undo) and did
      not call tcp_fastretrans_alert, so they did not increment
      prr_delivered to reflect newly-SACKed sequence ranges, and did not
      call tcp_xmit_retransmit_queue, thus passing up chances to send out
      more retransmitted and new packets based on any newly-SACKed packets.
      
      The change: When the ACK field is below snd_una (the "old_ack" goto
      label), call tcp_fastretrans_alert to allow undo based on any
      newly-arrived DSACKs and try to send out more packets based on
      newly-SACKed packets.
      
      Other patches in this series will provide other changes that are
      necessary to fully fix this problem.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e95ae2f2