• Neal Cardwell's avatar
    tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una · 4648dc97
    Neal Cardwell authored
    This commit fixes tcp_shift_skb_data() so that it does not shift
    SACKed data below snd_una.
    
    This fixes an issue whose symptoms exactly match reports showing
    tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
    net/ipv4/tcp_input.c:3418" thread on netdev).
    
    Since 2008 (832d11c5)
    tcp_shift_skb_data() had been shifting SACKed ranges that were below
    snd_una. It checked that the *end* of the skb it was about to shift
    from was above snd_una, but did not check that the end of the actual
    shifted range was above snd_una; this commit adds that check.
    
    Shifting SACKed ranges below snd_una is problematic because for such
    ranges tcp_sacktag_one() short-circuits: it does not declare anything
    as SACKed and does not increase sacked_out.
    
    Before the fixes in commits cc9a672e
    and daef52ba, shifting SACKed ranges
    below snd_una happened to work because tcp_shifted_skb() was always
    (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
    tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
    tcp_sacktag_one() never short-circuited and always increased
    tp->sacked_out in this case.
    
    After those two fixes, my testing has verified that shifting SACKed
    ranges below snd_una could cause tp->sacked_out to go negative with
    the following sequence of events:
    
    (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
        then shifts a prefix of that skb that is below snd_una
    
    (2) tcp_shifted_skb() increments the packet count of the
        already-SACKed prev sk_buff
    
    (3) tcp_sacktag_one() sees the end of the new SACKed range is below
        snd_una, so it short-circuits and doesn't increase tp->sacked_out
    
    (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
        decrements tp->sacked_out by this "inflated" pcount that was
        missing a matching increase in tp->sacked_out, and hence
        tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
        to s32 is negative.
    
    (6) this leads to the warnings seen in the recent "WARNING: at
        net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
        tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);
    
    More generally, I think this bug can be tickled in some cases where
    two or more ACKs from the receiver are lost and then a DSACK arrives
    that is immediately above an existing SACKed skb in the write queue.
    
    This fix changes tcp_shift_skb_data() to abort this sequence at step
    (1) in the scenario above by noticing that the bytes are below snd_una
    and not shifting them.
    Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    4648dc97
tcp_input.c 171 KB