1. 13 Aug, 2017 10 commits
    • udp: consistently apply ufo or fragmentation · 938990d2
      Willem de Bruijn authored
      
      [ Upstream commit 85f1bd9a ]
      
      When iteratively building a UDP datagram with MSG_MORE and that
      datagram exceeds MTU, consistently choose UFO or fragmentation.
      
      Once skb_is_gso, always apply ufo. Conversely, once a datagram is
      split across multiple skbs, do not consider ufo.
      
      Sendpage already maintains the first invariant, so only the second
      is added there. IPv6 does not have a sendpage implementation to modify.
      
      A gso skb must have a partial checksum, so do not follow
      sk_no_check_tx in udp_send_skb.
      
      Found by syzkaller.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
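The two invariants in the message above can be sketched as a small decision function. This is a toy Python model, not the kernel code: the function name `use_ufo` and its parameters are hypothetical and only mirror the wording of the commit message.

```python
def use_ufo(exceeds_mtu, skb_is_gso, queued_skbs, dev_has_ufo=True):
    """Toy model of the UFO-vs-fragmentation choice (names are hypothetical)."""
    # Invariant 1: once the pending skb is GSO, keep applying UFO.
    if skb_is_gso:
        return True
    # Invariant 2: once the datagram is split across several skbs,
    # do not consider UFO any more.
    if queued_skbs > 1:
        return False
    # Otherwise UFO is only a candidate when the datagram exceeds the MTU
    # and the device offers the offload.
    return exceeds_mtu and dev_has_ufo
```

For example, `use_ufo(True, False, 2)` returns `False`: a datagram that has already been fragmented stays on the fragmentation path even though it exceeds the MTU.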
    • revert "ipv4: Should use consistent conditional judgement for ip fragment in... · 98c1ad1e
      Greg Kroah-Hartman authored
      revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output"
      
      This reverts commit f102bb71, which is commit 0a28cfd5 upstream,
      as there is another patch that needs to be applied instead of
      this one.
      
      Cc: Zheng Li <james.z.li@ericsson.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • revert "net: account for current skb length when deciding about UFO" · 54fc0c32
      Greg Kroah-Hartman authored
      This reverts commit ef09c9ff, which is commit a5cb659b upstream,
      as it causes merge issues with later patches that are much more
      important.
      
      Cc: Michal Kubecek <mkubecek@suse.cz>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: fix tp_reserve race in packet_set_ring · 63364a50
      Willem de Bruijn authored
      
      [ Upstream commit c27927e3 ]
      
      Updates to tp_reserve can race with reads of the field in
      packet_set_ring. Avoid this by holding the socket lock during
      updates in setsockopt PACKET_RESERVE.
      
      This bug was discovered by syzkaller.
      
      Fixes: 8913336a ("packet: add PACKET_RESERVE sockopt")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
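The locking idea in the message above can be illustrated with a minimal Python sketch. It is a model under stated assumptions, not kernel code: the class `PacketSock`, its methods, and the frame-size check are hypothetical stand-ins, and a `threading.Lock` plays the role of the socket lock.

```python
import threading


class PacketSock:
    """Hypothetical stand-in for a packet socket with a reserve field."""

    def __init__(self):
        self.lock = threading.Lock()  # plays the role of the socket lock
        self.tp_reserve = 0

    def set_reserve(self, val):
        # The update takes the same lock as the reader, so it cannot
        # land in the middle of set_ring().
        with self.lock:
            self.tp_reserve = val

    def set_ring(self, frame_size, hdrlen):
        with self.lock:
            first = self.tp_reserve       # value used for validation
            if frame_size < hdrlen + first:
                raise ValueError("frame too small for header plus reserve")
            second = self.tp_reserve      # value used later for layout
            return first, second          # equal while the lock is held
```

Because both paths take the same lock, the two reads in `set_ring` always observe the same value, which is the property the fix restores.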
    • net: avoid skb_warn_bad_offload false positives on UFO · 37d5c6e8
      Willem de Bruijn authored
      
      [ Upstream commit 8d63bee6 ]
      
      skb_warn_bad_offload triggers a warning when an skb enters the GSO
      stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
      checksum offload set.
      
      Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      observed that SKB_GSO_DODGY producers can trigger the check and
      that passing those packets through the GSO handlers will fix it
      up. But, the software UFO handler will set ip_summed to
      CHECKSUM_NONE.
      
      When __skb_gso_segment is called from the receive path, this
      triggers the warning again.
      
      Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
      Tx these two are equivalent. On Rx, this better matches the
      skb state (checksum computed), as CHECKSUM_NONE here means no
      checksum computed.
      
      See also this thread for context:
      http://patchwork.ozlabs.org/patch/799015/
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: fastopen: tcp_connect() must refresh the route · 8607d550
      Eric Dumazet authored
      
      [ Upstream commit 8ba60924 ]
      
      With the new TCP_FASTOPEN_CONNECT socket option, tcp_connect() can
      be called while the socket's sk_dst_cache is either NULL or
      invalid.
      
       +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
       +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
       +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
       +0 connect(4, ..., ...) = 0
      
      << sk->sk_dst_cache becomes obsolete, or even set to NULL >>
      
       +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
      
      We need to refresh the route otherwise bad things can happen,
      especially when syzkaller is running on the host :/
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target · 40fc2b44
      Xin Long authored
      
      [ Upstream commit 96d97030 ]
      
      Commit 55917a21 ("netfilter: x_tables: add context to know if
      extension runs from nft_compat") introduced a member nft_compat to
      xt_tgchk_param structure.
      
      But it didn't set its value for ipt_init_target. With an unexpected
      value in par.nft_compat, some targets' checkentry may return an
      unexpected result.
      
      This patch sets all its fields to 0 and only initializes the
      non-zero fields in ipt_init_target.
      
      v1->v2:
        Following Wang Cong's suggestion, fix it by setting all its
        fields to 0 and only initializing the non-zero fields.
      
      Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
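The "zero everything, then set only the non-zero fields" pattern from the message above can be demonstrated with `ctypes`. This is a sketch under assumptions: `TgchkParam` is a hypothetical, cut-down stand-in for xt_tgchk_param, the field values are illustrative, and a byte buffer filled with 0xAA simulates stale stack memory.

```python
import ctypes


class TgchkParam(ctypes.Structure):
    # Hypothetical, cut-down stand-in for xt_tgchk_param.
    _fields_ = [
        ("hook_mask", ctypes.c_uint),
        ("family", ctypes.c_ubyte),
        ("nft_compat", ctypes.c_ubyte),
    ]


def stale_memory():
    # Simulates uninitialized stack contents.
    return bytearray(b"\xaa" * ctypes.sizeof(TgchkParam))


def init_param(buf, zero_first):
    par = TgchkParam.from_buffer(buf)
    if zero_first:
        # The pattern from the commit: clear every field up front.
        ctypes.memset(ctypes.addressof(par), 0, ctypes.sizeof(par))
    # Then set only the fields that need a non-zero value (illustrative).
    par.hook_mask = 1 << 2
    par.family = 2
    return par
```

Without the clearing step, `nft_compat` keeps whatever the buffer held (0xAA here); with it, any field added to the structure later starts at 0 without further changes to the initializer.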
    • bpf, s390: fix jit branch offset related to ldimm64 · d0da2877
      Daniel Borkmann authored
      
      [ Upstream commit b0a0c256 ]
      
      While testing some other work that required JIT modifications, I
      ran into test_bpf causing a hang when JIT is enabled on s390. The
      problematic test case was the one from ddc665a4 ("bpf, arm64:
      fix jit branch offset related to ldimm64"), and it turns out that
      we have a similar issue on s390 as well. In bpf_jit_prog() we
      update the next instruction address after returning from
      bpf_jit_insn() with an insn_count. bpf_jit_insn() returns either
      -1 in case of error (e.g. unsupported insn), 1 or 2. The latter is
      only the case for ldimm64, since it spans 2 insns. However, the
      next address is only set to i + 1, not taking the actual
      insn_count into account, so the fix is to use insn_count instead
      of 1. bpf_jit_enable in mode 2 also provides disasm on s390:
      
      Before fix:
      
        000003ff800349b6: a7f40003           brc     15,3ff800349bc         ; target
        000003ff800349ba: 0000               unknown
        000003ff800349bc: e3b0f0700024       stg     %r11,112(%r15)
        000003ff800349c2: e3e0f0880024       stg     %r14,136(%r15)
        000003ff800349c8: 0db0               basr    %r11,%r0
        000003ff800349ca: c0ef00000000       llilf   %r14,0
        000003ff800349d0: e320b0360004       lg      %r2,54(%r11)
        000003ff800349d6: e330b03e0004       lg      %r3,62(%r11)
        000003ff800349dc: ec23ffeda065       clgrj   %r2,%r3,10,3ff800349b6 ; jmp
        000003ff800349e2: e3e0b0460004       lg      %r14,70(%r11)
        000003ff800349e8: e3e0b04e0004       lg      %r14,78(%r11)
        000003ff800349ee: b904002e           lgr     %r2,%r14
        000003ff800349f2: e3b0f0700004       lg      %r11,112(%r15)
        000003ff800349f8: e3e0f0880004       lg      %r14,136(%r15)
        000003ff800349fe: 07fe               bcr     15,%r14
      
      After fix:
      
        000003ff80ef3db4: a7f40003           brc     15,3ff80ef3dba
        000003ff80ef3db8: 0000               unknown
        000003ff80ef3dba: e3b0f0700024       stg     %r11,112(%r15)
        000003ff80ef3dc0: e3e0f0880024       stg     %r14,136(%r15)
        000003ff80ef3dc6: 0db0               basr    %r11,%r0
        000003ff80ef3dc8: c0ef00000000       llilf   %r14,0
        000003ff80ef3dce: e320b0360004       lg      %r2,54(%r11)
        000003ff80ef3dd4: e330b03e0004       lg      %r3,62(%r11)
        000003ff80ef3dda: ec230006a065       clgrj   %r2,%r3,10,3ff80ef3de6 ; jmp
        000003ff80ef3de0: e3e0b0460004       lg      %r14,70(%r11)
        000003ff80ef3de6: e3e0b04e0004       lg      %r14,78(%r11)          ; target
        000003ff80ef3dec: b904002e           lgr     %r2,%r14
        000003ff80ef3df0: e3b0f0700004       lg      %r11,112(%r15)
        000003ff80ef3df6: e3e0f0880004       lg      %r14,136(%r15)
        000003ff80ef3dfc: 07fe               bcr     15,%r14
      
      test_bpf.ko suite runs fine after the fix.
      
      Fixes: 05462310 ("s390/bpf: Add s390x eBPF JIT compiler backend")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
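The address bookkeeping described in the message above can be modeled in a few lines of Python. This is a toy sketch, not the s390 JIT: `build_addrs`, the instruction names, and the byte sizes in `PROGRAM` are invented for illustration; only the `i + 1` versus `i + insn_count` difference comes from the commit message.

```python
# Each entry is one BPF instruction slot: (name, emitted bytes).
# ldimm64 occupies two slots; the second slot emits nothing on its own.
PROGRAM = [("jmp", 6), ("ldimm64", 6), ("ldimm64_hi", 0), ("mov", 4), ("exit", 2)]


def build_addrs(program, fixed):
    """Record, per slot, the offset at which its native code starts."""
    addrs = [0] * (len(program) + 1)
    prg = 0  # running offset into the emitted native code
    i = 0
    while i < len(program):
        name, size = program[i]
        insn_count = 2 if name == "ldimm64" else 1
        prg += size
        # Buggy variant always records slot i + 1; the fix records the
        # slot that actually follows the (possibly two-slot) instruction.
        nxt = i + insn_count if fixed else i + 1
        addrs[nxt] = prg
        i += insn_count
    return addrs
```

With `fixed=False` the slot after the ldimm64 pair (index 3) is never written and stays 0, so a branch targeting it would resolve to the start of the program. That loosely mirrors the backward jump visible in the "Before fix" listing.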
    • net: fix keepalive code vs TCP_FASTOPEN_CONNECT · 4e0675f4
      Eric Dumazet authored
      
      [ Upstream commit 2dda6400 ]
      
      syzkaller was able to trigger a divide by 0 in the TCP stack [1].
      
      The issue is that the keepalive timer must not attempt to send a
      probe if the connection setup was deferred using the
      TCP_FASTOPEN_CONNECT socket option added in linux-4.11.
      
      [1]
       divide error: 0000 [#1] SMP
       CPU: 18 PID: 0 Comm: swapper/18 Not tainted
       task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000
       RIP: 0010:[<ffffffff8409cc0d>]  [<ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160
       Call Trace:
        <IRQ>
        [<ffffffff8409d951>] tcp_transmit_skb+0x11/0x20
        [<ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0
        [<ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160
        [<ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230
        [<ffffffff83b3f799>] call_timer_fn+0x39/0xf0
        [<ffffffff83b40797>] run_timer_softirq+0x1d7/0x280
        [<ffffffff83a04ddb>] __do_softirq+0xcb/0x257
        [<ffffffff83ae03ac>] irq_exit+0x9c/0xb0
        [<ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80
        [<ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90
        <EOI>
        [<ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0
        [<ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0
      
      Tested:
      
      The following packetdrill script no longer crashes the kernel:
      
      `echo 0 >/proc/sys/net/ipv4/tcp_timestamps`
      
      // Cache warmup: send a Fast Open cookie request
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
         +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
         +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
         +0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
       +.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop>
         +0 > . 1:1(0) ack 1
         +0 close(3) = 0
         +0 > F. 1:1(0) ack 1
         +0 < F. 1:1(0) ack 2 win 92
         +0 > .  2:2(0) ack 2
      
         +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
         +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
         +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
         +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
       +.01 connect(4, ..., ...) = 0
         +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
        +10 close(4) = 0
      
      `echo 1 >/proc/sys/net/ipv4/tcp_timestamps`
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
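The guard described in the message above can be sketched as follows. This Python model makes two assumptions that are not stated in the commit message: the zero divisor is represented as an MSS that is still 0 because the handshake never ran, and the deferred state is reduced to a boolean. The function names are hypothetical.

```python
def select_window(rcv_mss, free_space):
    # Rounds the window down to a multiple of the MSS.
    # Raises ZeroDivisionError when the MSS was never set up.
    return (free_space // rcv_mss) * rcv_mss


def keepalive_timer(connect_deferred, rcv_mss, free_space, guarded=True):
    """Return the window a probe would advertise, or None if no probe is sent."""
    # With the guard, a socket whose connection setup was deferred
    # never reaches the probe path.
    if guarded and connect_deferred:
        return None
    return select_window(rcv_mss, free_space)
```

Calling `keepalive_timer(True, 0, 65535, guarded=False)` raises `ZeroDivisionError`, the analogue of the divide error in the trace, while the guarded call simply returns `None`.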
    • tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states · 025bb7f7
      Yuchung Cheng authored
      
      [ Upstream commit ed254971 ]
      
      If the sender switches the congestion control during the
      ECN-triggered cwnd-reduction state (CA_CWR), upon exiting recovery
      cwnd is set to the ssthresh value calculated by the previous
      congestion control. If the previous congestion control is BBR,
      which always keeps ssthresh at TCP_INFINITE_SSTHRESH, cwnd ends up
      being infinite. The safe step is to avoid assigning an invalid
      ssthresh value when recovery ends.
      
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
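The safe step described above amounts to a bounds check before cwnd is overwritten. The sketch below is a simplified Python model: the function name and signature are hypothetical, and the sentinel value 0x7fffffff for the infinite ssthresh is an assumption about the kernel constant.

```python
TCP_INFINITE_SSTHRESH = 0x7FFFFFFF  # assumed value of the "infinite" sentinel


def end_cwnd_reduction(snd_cwnd, snd_ssthresh):
    """Return the cwnd to use when the cwnd-reduction state ends."""
    # Only adopt ssthresh when it is a real value; the sentinel left
    # behind by a previous congestion control is ignored.
    if snd_ssthresh < TCP_INFINITE_SSTHRESH:
        return snd_ssthresh
    return snd_cwnd
```

With a valid ssthresh of 20, `end_cwnd_reduction(40, 20)` returns 20. With the sentinel, `end_cwnd_reduction(10, TCP_INFINITE_SSTHRESH)` keeps the current cwnd of 10.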
  2. 11 Aug, 2017 30 commits