1. 21 Nov, 2016 17 commits
    • Alexander Duyck's avatar
      fib_trie: Correct /proc/net/route off by one error · b78ba0a0
      Alexander Duyck authored
      [ Upstream commit fd0285a3 ]
      
      The display of /proc/net/route has had a couple issues due to the fact that
      when I originally rewrote most of fib_trie I made it so that the iterator
      was tracking the next value to use instead of the current.
      
      In addition it had an off by 1 error where I was tracking the first piece
      of data as position 0, even though in reality that belonged to the
      SEQ_START_TOKEN.
      
      This patch updates the code so the iterator tracks the last reported
      position and key instead of the next expected position and key.  In
      addition it shifts things so that all of the leaves start at 1 instead of
      trying to report leaves starting with offset 0 as being valid.  With these
      two issues addressed this should resolve any off by one errors that were
      present in the display of /proc/net/route.
      
      Fixes: 25b97c01 ("ipv4: off-by-one in continuation handling in /proc/net/route")
      Cc: Andy Whitcroft <apw@canonical.com>
      Reported-by: default avatarJason Baron <jbaron@akamai.com>
      Tested-by: default avatarJason Baron <jbaron@akamai.com>
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b78ba0a0
    • David Ahern's avatar
      net: icmp6_send should use dst dev to determine L3 domain · 92fd1c1f
      David Ahern authored
      [ Upstream commit 5d41ce29 ]
      
      icmp6_send is called in response to some event. The skb may not have
      the device set (skb->dev is NULL), but it is expected to have a dst set.
      Update icmp6_send to use the dst on the skb to determine L3 domain.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      92fd1c1f
    • Daniel Borkmann's avatar
      bpf: fix htab map destruction when extra reserve is in use · 09ee0949
      Daniel Borkmann authored
      [ Upstream commit 483bed2b ]
      
      Commit a6ed3ea6 ("bpf: restore behavior of bpf_map_update_elem")
      added an extra per-cpu reserve to the hash table map to restore old
      behaviour from pre prealloc times. When non-prealloc is in use for a
      map, then problem is that once a hash table extra element has been
      linked into the hash-table, and the hash table is destroyed due to
      refcount dropping to zero, then htab_map_free() -> delete_all_elements()
      will walk the whole hash table and drop all elements via htab_elem_free().
      The problem is that the element from the extra reserve is first fed
      to the wrong backend allocator and eventually freed twice.
      
      Fixes: a6ed3ea6 ("bpf: restore behavior of bpf_map_update_elem")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09ee0949
    • Marcelo Ricardo Leitner's avatar
      sctp: assign assoc_id earlier in __sctp_connect · de289ad2
      Marcelo Ricardo Leitner authored
      [ Upstream commit 7233bc84 ]
      
      sctp_wait_for_connect() currently already holds the asoc to keep it
      alive during the sleep, in case another thread release it. But Andrey
      Konovalov and Dmitry Vyukov reported an use-after-free in such
      situation.
      
      Problem is that __sctp_connect() doesn't get a ref on the asoc and will
      do a read on the asoc after calling sctp_wait_for_connect(), but by then
      another thread may have closed it and the _put on sctp_wait_for_connect
      will actually release it, causing the use-after-free.
      
      Fix is, instead of doing the read after waiting for the connect, do it
      before so, and avoid this issue as the socket is still locked by then.
      There should be no issue on returning the asoc id in case of failure as
      the application shouldn't trust on that number in such situations
      anyway.
      
      This issue doesn't exist in sctp_sendmsg() path.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de289ad2
    • Eric Dumazet's avatar
      ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped · 76b5fee5
      Eric Dumazet authored
      [ Upstream commit 990ff4d8 ]
      
      While fuzzing kernel with syzkaller, Andrey reported a nasty crash
      in inet6_bind() caused by DCCP lacking a required method.
      
      Fixes: ab1e0a13 ("[SOCK] proto: Add hashinfo member to struct proto")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76b5fee5
    • Eric Dumazet's avatar
      ipv6: dccp: fix out of bound access in dccp_v6_err() · 84d9c612
      Eric Dumazet authored
      [ Upstream commit 1aa9d1a0 ]
      
      dccp_v6_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmpv6_notify() are more than enough.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84d9c612
    • Eric Dumazet's avatar
      dccp: fix out of bound access in dccp_v4_err() · ba93cf7d
      Eric Dumazet authored
      [ Upstream commit 6706a97f ]
      
      dccp_v4_err() does not use pskb_may_pull() and might access garbage.
      
      We only need 4 bytes at the beginning of the DCCP header, like TCP,
      so the 8 bytes pulled in icmp_socket_deliver() are more than enough.
      
      This patch might allow to process more ICMP messages, as some routers
      are still limiting the size of reflected bytes to 28 (RFC 792), instead
      of extended lengths (RFC 1812 4.3.2.3)
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba93cf7d
    • Eric Dumazet's avatar
      dccp: do not send reset to already closed sockets · 378a6110
      Eric Dumazet authored
      [ Upstream commit 346da62c ]
      
      Andrey reported following warning while fuzzing with syzkaller
      
      WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
       ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
       0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff81b474f4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
       [<ffffffff8140c06a>] panic+0x1bc/0x39d kernel/panic.c:179
       [<ffffffff8111125c>] __warn+0x1cc/0x1f0 kernel/panic.c:542
       [<ffffffff8111144c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
       [<ffffffff8389e5d9>] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
       [<ffffffff838a0aa2>] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
       [<ffffffff8316bf1f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
       [<ffffffff82b6e89e>] sock_release+0x8e/0x1d0 net/socket.c:570
       [<ffffffff82b6e9f6>] sock_close+0x16/0x20 net/socket.c:1017
       [<ffffffff815256ad>] __fput+0x29d/0x720 fs/file_table.c:208
       [<ffffffff81525bb5>] ____fput+0x15/0x20 fs/file_table.c:244
       [<ffffffff811727d8>] task_work_run+0xf8/0x170 kernel/task_work.c:116
       [<     inline     >] exit_task_work include/linux/task_work.h:21
       [<ffffffff8111bc53>] do_exit+0x883/0x2ac0 kernel/exit.c:828
       [<ffffffff811221fe>] do_group_exit+0x10e/0x340 kernel/exit.c:931
       [<ffffffff81143c94>] get_signal+0x634/0x15a0 kernel/signal.c:2307
       [<ffffffff81054aad>] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
       [<ffffffff81003a05>] exit_to_usermode_loop+0xe5/0x130
      arch/x86/entry/common.c:156
       [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
       [<ffffffff81006298>] syscall_return_slowpath+0x1a8/0x1e0
      arch/x86/entry/common.c:259
       [<ffffffff83fc1a62>] entry_SYSCALL_64_fastpath+0xc0/0xc2
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Kernel Offset: disabled
      
      Fix this the same way we did for TCP in commit 565b7b2d
      ("tcp: do not send reset to already closed sockets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      378a6110
    • Eric Dumazet's avatar
      dccp: do not release listeners too soon · 72b03e54
      Eric Dumazet authored
      [ Upstream commit c3f24cfb ]
      
      Andrey Konovalov reported following error while fuzzing with syzkaller :
      
      IPv4: Attempt to release alive inet socket ffff880068e98940
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88006b9e0000 task.stack: ffff880068770000
      RIP: 0010:[<ffffffff819ead5f>]  [<ffffffff819ead5f>]
      selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
      RSP: 0018:ffff8800687771c8  EFLAGS: 00010202
      RAX: ffff88006b9e0000 RBX: 1ffff1000d0eee3f RCX: 1ffff1000d1d312a
      RDX: 1ffff1000d1d31a6 RSI: dffffc0000000000 RDI: 0000000000000010
      RBP: ffff880068777360 R08: 0000000000000000 R09: 0000000000000002
      R10: dffffc0000000000 R11: 0000000000000006 R12: ffff880068e98940
      R13: 0000000000000002 R14: ffff880068777338 R15: 0000000000000000
      FS:  00007f00ff760700(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020008000 CR3: 000000006a308000 CR4: 00000000000006e0
      Stack:
       ffff8800687771e0 ffffffff812508a5 ffff8800686f3168 0000000000000007
       ffff88006ac8cdfc ffff8800665ea500 0000000041b58ab3 ffffffff847b5480
       ffffffff819eac60 ffff88006b9e0860 ffff88006b9e0868 ffff88006b9e07f0
      Call Trace:
       [<ffffffff819c8dd5>] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
       [<ffffffff82c2a9e7>] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
       [<ffffffff82b81e60>] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
       [<ffffffff838bbf12>] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
       [<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
      net/ipv4/ip_input.c:216
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306abd2>] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
       [<     inline     >] dst_input ./include/net/dst.h:507
       [<ffffffff83068500>] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306b82f>] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
       [<ffffffff82bd9fb7>] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
       [<ffffffff82bdb19a>] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
       [<ffffffff82bdb493>] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
       [<ffffffff82bdb6b8>] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
       [<ffffffff8241fc75>] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
       [<ffffffff82421b5a>] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
       [<     inline     >] new_sync_write fs/read_write.c:499
       [<ffffffff8151bd44>] __vfs_write+0x334/0x570 fs/read_write.c:512
       [<ffffffff8151f85b>] vfs_write+0x17b/0x500 fs/read_write.c:560
       [<     inline     >] SYSC_write fs/read_write.c:607
       [<ffffffff81523184>] SyS_write+0xd4/0x1a0 fs/read_write.c:599
       [<ffffffff83fc02c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It turns out DCCP calls __sk_receive_skb(), and this broke when
      lookups no longer took a reference on listeners.
      
      Fix this issue by adding a @refcounted parameter to __sk_receive_skb(),
      so that sock_put() is used only when needed.
      
      Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72b03e54
    • Eric Dumazet's avatar
      tcp: fix return value for partial writes · b3523a07
      Eric Dumazet authored
      [ Upstream commit 79d8665b ]
      
      After my commit, tcp_sendmsg() might restart its loop after
      processing socket backlog.
      
      If sk_err is set, we blindly return an error, even though we
      copied data to user space before.
      
      We should instead return number of bytes that could be copied,
      otherwise user space might resend data and corrupt the stream.
      
      This might happen if another thread is using recvmsg(MSG_ERRQUEUE)
      to process timestamps.
      
      Issue was diagnosed by Soheil and Willem, big kudos to them !
      
      Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Tested-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3523a07
    • Lance Richardson's avatar
      ipv4: allow local fragmentation in ip_finish_output_gso() · 1f49cc6f
      Lance Richardson authored
      [ Upstream commit 9ee6c5dc ]
      
      Some configurations (e.g. geneve interface with default
      MTU of 1500 over an ethernet interface with 1500 MTU) result
      in the transmission of packets that exceed the configured MTU.
      While this should be considered to be a "bad" configuration,
      it is still allowed and should not result in the sending
      of packets that exceed the configured MTU.
      
      Fix by dropping the assumption in ip_finish_output_gso() that
      locally originated gso packets will never need fragmentation.
      Basic testing using iperf (observing CPU usage and bandwidth)
      have shown no measurable performance impact for traffic not
      requiring fragmentation.
      
      Fixes: c7ba65d7 ("net: ip: push gso skb forwarding handling down the stack")
      Reported-by: default avatarJan Tluka <jtluka@redhat.com>
      Signed-off-by: default avatarLance Richardson <lrichard@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f49cc6f
    • Eric Dumazet's avatar
      tcp: fix potential memory corruption · 842a858f
      Eric Dumazet authored
      [ Upstream commit ac9e70b1 ]
      
      Imagine initial value of max_skb_frags is 17, and last
      skb in write queue has 15 frags.
      
      Then max_skb_frags is lowered to 14 or smaller value.
      
      tcp_sendmsg() will then be allowed to add additional page frags
      and eventually go past MAX_SKB_FRAGS, overflowing struct
      skb_shared_info.
      
      Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
      Cc: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      842a858f
    • Eli Cooper's avatar
      ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() · fc3b825f
      Eli Cooper authored
      [ Upstream commit 23f4ffed ]
      
      skb->cb may contain data from previous layers. In the observed scenario,
      the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
      that small packets sent through the tunnel are mistakenly fragmented.
      
      This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
      which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
      these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarEli Cooper <elicooper@gmx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc3b825f
    • Andy Gospodarek's avatar
      bgmac: stop clearing DMA receive control register right after it is set · f5f4b71d
      Andy Gospodarek authored
      [ Upstream commit fcdefcca ]
      
      Current bgmac code initializes some DMA settings in the receive control
      register for some hardware and then immediately clears those settings.
      Not clearing those settings results in ~420Mbps *improvement* in
      throughput; this system can now receive frames at line-rate on Broadcom
      5871x hardware compared to ~520Mbps today.  I also tested a few other
      values but found there to be no discernible difference in CPU
      utilization even if burst size and prefetching values are different.
      
      On the hardware tested there was no need to keep the code that cleared
      all but bits 16-17, but since there is a wide variety of hardware that
      used this driver (I did not look at all hardware docs for hardware using
      this IP block), I find it wise to move this call up and clear bits just
      after reading the default value from the hardware rather than completely
      removing it.
      
      This is a good candidate for -stable >=3.14 since that is when the code
      that was supposed to improve performance (but did not) was introduced.
      Signed-off-by: default avatarAndy Gospodarek <gospo@broadcom.com>
      Fixes: 56ceecde ("bgmac: initialize the DMA controller of core...")
      Cc: Hauke Mehrtens <hauke@hauke-m.de>
      Acked-by: default avatarHauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5f4b71d
    • Eric Dumazet's avatar
      net: mangle zero checksum in skb_checksum_help() · 0c7f764d
      Eric Dumazet authored
      [ Upstream commit 4f2e4ad5 ]
      
      Sending zero checksum is ok for TCP, but not for UDP.
      
      UDPv6 receiver should by default drop a frame with a 0 checksum,
      and UDPv4 would not verify the checksum and might accept a corrupted
      packet.
      
      Simply replace such checksum by 0xffff, regardless of transport.
      
      This error was caught on SIT tunnels, but seems generic.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: default avatarMaciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c7f764d
    • Eric Dumazet's avatar
      net: clear sk_err_soft in sk_clone_lock() · ac22a3ba
      Eric Dumazet authored
      [ Upstream commit e551c32d ]
      
      At accept() time, it is possible the parent has a non zero
      sk_err_soft, leftover from a prior error.
      
      Make sure we do not leave this value in the child, as it
      makes future getsockopt(SO_ERROR) calls quite unreliable.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac22a3ba
    • Florian Westphal's avatar
      dctcp: avoid bogus doubling of cwnd after loss · 5b078dc6
      Florian Westphal authored
      [ Upstream commit ce6dd233 ]
      
      If a congestion control module doesn't provide .undo_cwnd function,
      tcp_undo_cwnd_reduction() will set cwnd to
      
         tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1);
      
      ... which makes sense for reno (it sets ssthresh to half the current cwnd),
      but it makes no sense for dctcp, which sets ssthresh based on the current
      congestion estimate.
      
      This can cause severe growth of cwnd (eventually overflowing u32).
      
      Fix this by saving last cwnd on loss and restore cwnd based on that,
      similar to cubic and other algorithms.
      
      Fixes: e3118e83 ("net: tcp: add DCTCP congestion control algorithm")
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5b078dc6
  2. 18 Nov, 2016 23 commits