1. 21 Nov, 2016 9 commits
    • dccp: do not release listeners too soon · 72b03e54
      Eric Dumazet authored
      [ Upstream commit c3f24cfb ]
      
      Andrey Konovalov reported the following error while fuzzing with syzkaller:
      
      IPv4: Attempt to release alive inet socket ffff880068e98940
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88006b9e0000 task.stack: ffff880068770000
      RIP: 0010:[<ffffffff819ead5f>]  [<ffffffff819ead5f>]
      selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
      RSP: 0018:ffff8800687771c8  EFLAGS: 00010202
      RAX: ffff88006b9e0000 RBX: 1ffff1000d0eee3f RCX: 1ffff1000d1d312a
      RDX: 1ffff1000d1d31a6 RSI: dffffc0000000000 RDI: 0000000000000010
      RBP: ffff880068777360 R08: 0000000000000000 R09: 0000000000000002
      R10: dffffc0000000000 R11: 0000000000000006 R12: ffff880068e98940
      R13: 0000000000000002 R14: ffff880068777338 R15: 0000000000000000
      FS:  00007f00ff760700(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020008000 CR3: 000000006a308000 CR4: 00000000000006e0
      Stack:
       ffff8800687771e0 ffffffff812508a5 ffff8800686f3168 0000000000000007
       ffff88006ac8cdfc ffff8800665ea500 0000000041b58ab3 ffffffff847b5480
       ffffffff819eac60 ffff88006b9e0860 ffff88006b9e0868 ffff88006b9e07f0
      Call Trace:
       [<ffffffff819c8dd5>] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
       [<ffffffff82c2a9e7>] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
       [<ffffffff82b81e60>] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
       [<ffffffff838bbf12>] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
       [<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
      net/ipv4/ip_input.c:216
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306abd2>] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
       [<     inline     >] dst_input ./include/net/dst.h:507
       [<ffffffff83068500>] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306b82f>] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
       [<ffffffff82bd9fb7>] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
       [<ffffffff82bdb19a>] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
       [<ffffffff82bdb493>] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
       [<ffffffff82bdb6b8>] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
       [<ffffffff8241fc75>] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
       [<ffffffff82421b5a>] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
       [<     inline     >] new_sync_write fs/read_write.c:499
       [<ffffffff8151bd44>] __vfs_write+0x334/0x570 fs/read_write.c:512
       [<ffffffff8151f85b>] vfs_write+0x17b/0x500 fs/read_write.c:560
       [<     inline     >] SYSC_write fs/read_write.c:607
       [<ffffffff81523184>] SyS_write+0xd4/0x1a0 fs/read_write.c:599
       [<ffffffff83fc02c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It turns out DCCP calls __sk_receive_skb(), and this broke when
      lookups no longer took a reference on listeners.
      
      Fix this issue by adding a @refcounted parameter to __sk_receive_skb(),
      so that sock_put() is used only when needed.
      
      Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
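The idea behind the @refcounted parameter in the dccp commit can be pictured with a small user-space model. This is only a sketch, not the kernel code: `toy_sock`, `toy_sock_put()` and `toy_receive()` are hypothetical stand-ins, and the only thing taken from the commit message is that the final put is skipped when the lookup did not take a reference.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sock: just a reference counter. */
struct toy_sock {
    int refcnt;
    bool freed;
};

/* Drop one reference; the object counts as freed when it reaches zero. */
void toy_sock_put(struct toy_sock *sk)
{
    if (--sk->refcnt == 0)
        sk->freed = true;
}

/*
 * Model of a receive path that may or may not own a reference.
 * Listener lookups do not take a reference, so the caller passes
 * refcounted = false for them and no put is performed.
 */
int toy_receive(struct toy_sock *sk, bool refcounted)
{
    /* ... packet processing would happen here ... */
    if (refcounted)
        toy_sock_put(sk);
    return 0;
}
```

With an unconditional put, a listener holding a single reference would be released while still in use, which matches the "Attempt to release alive inet socket" message in the report.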
    • tcp: fix return value for partial writes · b3523a07
      Eric Dumazet authored
      [ Upstream commit 79d8665b ]
      
      After my commit, tcp_sendmsg() might restart its loop after
      processing socket backlog.
      
      If sk_err is set, we blindly return an error, even though we
      already copied data from user space.
      
      We should instead return the number of bytes that were copied,
      otherwise user space might resend data and corrupt the stream.
      
      This might happen if another thread is using recvmsg(MSG_ERRQUEUE)
      to process timestamps.
      
      The issue was diagnosed by Soheil and Willem, big kudos to them!
      
      Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
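The return-value rule described in the tcp partial-write commit can be sketched as a tiny helper. This is an illustration only: `toy_send_result()` is a hypothetical function, not part of tcp_sendmsg(), and it assumes the pending error is a positive errno value.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: decide what a send call reports when a socket
 * error is noticed after `copied` bytes were already queued.
 */
ssize_t toy_send_result(size_t copied, int sk_err)
{
    if (copied > 0)
        return (ssize_t)copied;   /* report the partial write first */
    if (sk_err)
        return -sk_err;           /* nothing queued: report the error */
    return 0;
}
```

Reporting the byte count first lets the caller resume after the data that was already queued instead of sending it twice.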
    • ipv4: allow local fragmentation in ip_finish_output_gso() · 1f49cc6f
      Lance Richardson authored
      [ Upstream commit 9ee6c5dc ]
      
      Some configurations (e.g. geneve interface with default
      MTU of 1500 over an ethernet interface with 1500 MTU) result
      in the transmission of packets that exceed the configured MTU.
      While this should be considered to be a "bad" configuration,
      it is still allowed and should not result in the sending
      of packets that exceed the configured MTU.
      
      Fix by dropping the assumption in ip_finish_output_gso() that
      locally originated gso packets will never need fragmentation.
      Basic testing using iperf (observing CPU usage and bandwidth)
      has shown no measurable performance impact for traffic not
      requiring fragmentation.
      
      Fixes: c7ba65d7 ("net: ip: push gso skb forwarding handling down the stack")
      Reported-by: Jan Tluka <jtluka@redhat.com>
      Signed-off-by: Lance Richardson <lrichard@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: fix potential memory corruption · 842a858f
      Eric Dumazet authored
      [ Upstream commit ac9e70b1 ]
      
      Imagine the initial value of max_skb_frags is 17, and the last
      skb in the write queue has 15 frags.
      
      Then max_skb_frags is lowered to 14 or a smaller value.
      
      tcp_sendmsg() will then be allowed to add additional page frags
      and eventually go past MAX_SKB_FRAGS, overflowing struct
      skb_shared_info.
      
      Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
      Cc: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
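The max_skb_frags scenario can be illustrated with two toy predicates. The commit message does not show the changed line, so this sketch assumes the problem is a limit check that only triggers on an exact match; both function names are hypothetical.

```c
#include <stdbool.h>

/* Fragile form: only an exact match with the limit blocks a new frag. */
bool toy_can_add_frag_eq(int nr_frags, int limit)
{
    return nr_frags != limit;
}

/* Safer form: any count at or above the current limit blocks it. */
bool toy_can_add_frag_ge(int nr_frags, int limit)
{
    return nr_frags < limit;
}
```

With 15 frags already attached and the limit lowered to 14, the equality form never matches again and keeps allowing additions, while the comparison form refuses them.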
    • ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() · fc3b825f
      Eli Cooper authored
      [ Upstream commit 23f4ffed ]
      
      skb->cb may contain data from previous layers. In the observed scenario,
      the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
      that small packets sent through the tunnel are mistakenly fragmented.
      
      This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
      which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
      these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eli Cooper <elicooper@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
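The control-buffer problem from the ip6_tunnel commit can be modelled in user space. Everything below is a hypothetical miniature: `toy_skb`, `toy_ip6cb` and `toy_tunnel_xmit_prepare()` are not kernel types or functions, and the layout is simplified. It only shows why a shared scratch area has to be cleared before the next layer interprets it.

```c
#include <string.h>

/* Hypothetical per-layer view of the control buffer. */
struct toy_ip6cb {
    unsigned short flags;
    unsigned short frag_max_size;
};

/* The same bytes are reused by every layer, so model them as a union. */
union toy_cb {
    unsigned char raw[48];
    struct toy_ip6cb ip6;
};

/* Hypothetical miniature of an skb carrying that scratch area. */
struct toy_skb {
    union toy_cb cb;
};

/* Zero the IPv6 view so leftovers are not read as frag_max_size. */
void toy_tunnel_xmit_prepare(struct toy_skb *skb)
{
    memset(&skb->cb.ip6, 0, sizeof(skb->cb.ip6));
}
```

If a previous layer left arbitrary bytes in `raw`, reading `ip6.frag_max_size` without the clear yields a non-zero value, which is the kind of garbage that caused small packets to be fragmented.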
    • bgmac: stop clearing DMA receive control register right after it is set · f5f4b71d
      Andy Gospodarek authored
      [ Upstream commit fcdefcca ]
      
      Current bgmac code initializes some DMA settings in the receive control
      register for some hardware and then immediately clears those settings.
      Not clearing those settings results in a ~420Mbps *improvement* in
      throughput; this system can now receive frames at line-rate on Broadcom
      5871x hardware compared to ~520Mbps today.  I also tested a few other
      values but found there to be no discernible difference in CPU
      utilization even if burst size and prefetching values are different.
      
      On the hardware tested there was no need to keep the code that cleared
      all but bits 16-17, but since there is a wide variety of hardware that
      uses this driver (I did not look at all hardware docs for hardware using
      this IP block), I find it wise to move this call up and clear bits just
      after reading the default value from the hardware rather than completely
      removing it.
      
      This is a good candidate for -stable >=3.14 since that is when the code
      that was supposed to improve performance (but did not) was introduced.
      Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
      Fixes: 56ceecde ("bgmac: initialize the DMA controller of core...")
      Cc: Hauke Mehrtens <hauke@hauke-m.de>
      Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
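The ordering issue in the bgmac commit comes down to whether the mask is applied before or after the tuning bits are set. The sketch below is not the driver code: the function names and `TOY_BURST_BITS` are hypothetical, and only the "keep bits 16-17" mask is taken from the commit message.

```c
#include <stdint.h>

#define TOY_KEEP_MASK   0x00030000u  /* bits 16-17, per the commit message */
#define TOY_BURST_BITS  0x001c0000u  /* hypothetical tuning field */

/* Order before the fix: tuning bits are set, then wiped by the mask. */
uint32_t toy_rx_ctl_old(uint32_t hw_default)
{
    uint32_t ctl = hw_default;

    ctl |= TOY_BURST_BITS;
    ctl &= TOY_KEEP_MASK;
    return ctl;
}

/* Order after the fix: mask the default first, then set tuning bits. */
uint32_t toy_rx_ctl_new(uint32_t hw_default)
{
    uint32_t ctl = hw_default;

    ctl &= TOY_KEEP_MASK;
    ctl |= TOY_BURST_BITS;
    return ctl;
}
```

Moving the mask up keeps the clearing step for hardware that may rely on it while letting the configured settings survive.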
    • net: mangle zero checksum in skb_checksum_help() · 0c7f764d
      Eric Dumazet authored
      [ Upstream commit 4f2e4ad5 ]
      
      Sending a zero checksum is OK for TCP, but not for UDP.
      
      A UDPv6 receiver should by default drop a frame with a 0 checksum,
      and UDPv4 would not verify the checksum and might accept a corrupted
      packet.
      
      Simply replace such a checksum with 0xffff, regardless of transport.
      
      This error was caught on SIT tunnels, but seems generic.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
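The zero-checksum replacement can be shown with a plain one's-complement checksum. This is a user-space sketch under stated assumptions: `toy_inet_csum()` and `toy_csum_mangle_zero()` are hypothetical helpers, not the routines used by skb_checksum_help(), and the checksum is computed over a bare byte buffer with no pseudo-header.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum over a byte buffer. */
uint16_t toy_inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add the buffer as big-endian 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    /* Pad a trailing odd byte with zero. */
    if (i < len)
        sum += (uint32_t)data[i] << 8;
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Replace a computed 0 with 0xffff; both encode zero in one's complement. */
uint16_t toy_csum_mangle_zero(uint16_t csum)
{
    return csum ? csum : 0xffff;
}
```

Because 0x0000 and 0xffff are the two representations of zero in one's-complement arithmetic, the substitution still verifies at the receiver, while avoiding the value that UDP reserves for "no checksum".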
    • net: clear sk_err_soft in sk_clone_lock() · ac22a3ba
      Eric Dumazet authored
      [ Upstream commit e551c32d ]
      
      At accept() time, it is possible the parent has a non-zero
      sk_err_soft, left over from a prior error.
      
      Make sure we do not leave this value in the child, as it
      makes future getsockopt(SO_ERROR) calls quite unreliable.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dctcp: avoid bogus doubling of cwnd after loss · 5b078dc6
      Florian Westphal authored
      [ Upstream commit ce6dd233 ]
      
      If a congestion control module doesn't provide an .undo_cwnd function,
      tcp_undo_cwnd_reduction() will set cwnd to
      
         tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1);
      
      ... which makes sense for reno (it sets ssthresh to half the current cwnd),
      but it makes no sense for dctcp, which sets ssthresh based on the current
      congestion estimate.
      
      This can cause severe growth of cwnd (eventually overflowing u32).
      
      Fix this by saving the last cwnd on loss and restoring cwnd based on
      that, similar to cubic and other algorithms.
      
      Fixes: e3118e83 ("net: tcp: add DCTCP congestion control algorithm")
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
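The difference between the default undo and an undo based on a saved cwnd, as described in the dctcp commit, can be compared in a toy model. The struct and function names below are hypothetical, and the loss handling is deliberately simplified (cwnd is set straight to the new ssthresh); only the `max(cwnd, ssthresh << 1)` expression is taken from the commit message.

```c
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative. */
struct toy_cc {
    uint32_t snd_cwnd;
    uint32_t snd_ssthresh;
    uint32_t loss_cwnd;   /* cwnd remembered at the time of loss */
};

static uint32_t toy_max(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* Default undo: only sensible when ssthresh is half the old cwnd. */
uint32_t toy_undo_default(const struct toy_cc *cc)
{
    return toy_max(cc->snd_cwnd, cc->snd_ssthresh << 1);
}

/* On loss: remember cwnd before reducing it (simplified reduction). */
void toy_on_loss(struct toy_cc *cc, uint32_t new_ssthresh)
{
    cc->loss_cwnd = cc->snd_cwnd;
    cc->snd_ssthresh = new_ssthresh;
    cc->snd_cwnd = new_ssthresh;
}

/* Undo based on the saved value: go back to the pre-loss cwnd. */
uint32_t toy_undo_saved(const struct toy_cc *cc)
{
    return toy_max(cc->snd_cwnd, cc->loss_cwnd);
}
```

With a cwnd of 100 and a mild reduction to an ssthresh of 90, the default undo yields 180, nearly double the original window, while the saved-value undo returns 100.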
  2. 18 Nov, 2016 31 commits