1. 25 Apr, 2016 9 commits
  2. 24 Apr, 2016 16 commits
    • tcp-tso: do not split TSO packets at retransmit time · 10d3be56
      Eric Dumazet authored
      
      The Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
      
      This was fine back in the days when TSO/GSO were emerging, with their
      bugs, but we believe the dark age is over.
      
      Keeping big packets in write queues, and also during stack traversal,
      has a lot of benefits:
       - Less memory overhead, because write queues hold fewer skbs.
       - Less CPU overhead at ACK processing.
       - Better SACK processing, as a lot of studies mentioned how
         awful Linux was at this ;)
       - Less CPU overhead to send the rtx packets
         (IP stack traversal, netfilter traversal, drivers...).
       - Better latencies in the presence of losses.
       - Smaller spikes in fq-like packet schedulers, as retransmits
         are not constrained by TCP Small Queues.
      
      1 % packet losses are common today, and at 100Gbit speeds, this
      translates to ~80,000 losses per second.
      Losses are often correlated, and we see many retransmit events
      leading to trains of 1-MSS packets, at a time when hosts are already
      under stress.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
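The "~80,000 losses per second" figure in the commit message above can be checked with simple arithmetic. The sketch below assumes full-size 1500-byte packets and ignores framing overhead; the packet size is an assumption, since the commit message does not state one.

```python
# Back-of-the-envelope check of the loss rate quoted in the commit message.
# Assumption (not stated in the commit): 1500-byte packets, no framing overhead.

LINK_BITS_PER_SEC = 100 * 10**9   # 100 Gbit/s
PACKET_BYTES = 1500               # assumed packet size
LOSS_RATE = 0.01                  # 1 % packet loss


def losses_per_second(link_bps, packet_bytes, loss_rate):
    """Return the expected number of lost packets per second."""
    packets_per_sec = link_bps / (packet_bytes * 8)
    return packets_per_sec * loss_rate


if __name__ == "__main__":
    print(round(losses_per_second(LINK_BITS_PER_SEC, PACKET_BYTES, LOSS_RATE)))
```

Under these assumptions the result is a little above 80,000, which is consistent with the rounded figure in the message.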
    • tipc: fix stale links after re-enabling bearer · 8cee83dd
      Parthasarathy Bhuvaragan authored
      Commit 42b18f60 ("tipc: refactor function tipc_link_timeout()")
      introduced a bug that prevents probe messages from being sent during
      the link synchronization phase. This leads to hanging links if the
      bearer is disabled and re-enabled after links are up.
      
      In this commit, we send the probe messages correctly.
      
      Fixes: 42b18f60 ("tipc: refactor function tipc_link_timeout()")
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-tcstamp_ack-frag-coalesce' · 6a74c196
      David S. Miller authored
      
      Martin KaFai Lau says:
      
      ====================
      tcp: Handle txstamp_ack when fragmenting/coalescing skbs
      
      This patchset is to handle the txstamp-ack bit when
      fragmenting/coalescing skbs.
      
      The second patch depends on the recently posted series
      for the net branch:
      "tcp: Merge timestamp info when coalescing skbs"
      
      A BPF program is attached as a kprobe to sock_queue_err_skb()
      and prints the value of serr->ee.ee_data. The BPF
      program (runnable from bcc) is attached here:
      
      BPF prog used for testing:
      ~~~~~
      
      from __future__ import print_function
      from bcc import BPF
      
      bpf_text = """
      
      int trace_err_skb(struct pt_regs *ctx)
      {
      	struct sk_buff *skb = (struct sk_buff *)ctx->si;
      	struct sock *sk = (struct sock *)ctx->di;
      	struct sock_exterr_skb *serr;
      	u32 ee_data = 0;
      
      	if (!sk || !skb)
      		return 0;
      
      	serr = SKB_EXT_ERR(skb);
      	bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data);
      	bpf_trace_printk("ee_data:%u\\n", ee_data);
      
      	return 0;
      };
      """
      
      b = BPF(text=bpf_text)
      b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
      print("Attached to kprobe")
      b.trace_print()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
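The packetdrill scripts in the two commits of this series pass raw numbers to setsockopt(): option 37 with the values 2688 and 2176. As a reading aid, the sketch below decodes those values, assuming that 37 is SO_TIMESTAMPING (its value on x86 Linux) and that the bit positions match include/uapi/linux/net_tstamp.h. Only the three flags that occur in these scripts are listed.

```python
# Decode the raw SO_TIMESTAMPING flag words used in the packetdrill scripts.
# Bit positions follow include/uapi/linux/net_tstamp.h; only the three flags
# that occur in these scripts are listed.

SO_TIMESTAMPING = 37  # numeric value of the option on x86 Linux

SOF_TIMESTAMPING_FLAGS = {
    1 << 7: "SOF_TIMESTAMPING_OPT_ID",
    1 << 9: "SOF_TIMESTAMPING_TX_ACK",
    1 << 11: "SOF_TIMESTAMPING_OPT_TSONLY",
}


def decode_flags(value):
    """Return (names of known flags set in value, leftover unknown bits)."""
    names = []
    for bit, name in sorted(SOF_TIMESTAMPING_FLAGS.items()):
        if value & bit:
            names.append(name)
            value &= ~bit
    return names, value


if __name__ == "__main__":
    for raw in (2688, 2176):
        names, unknown = decode_flags(raw)
        print(raw, "=", " | ".join(names), "(unknown bits: %#x)" % unknown)
```

Read this way, 2688 requests an ACK timestamp and 2176 drops that request while keeping the other two options, so only the write issued between the two setsockopt() calls should produce an SCM_TSTAMP_ACK event.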
    • tcp: Merge txstamp_ack in tcp_skb_collapse_tstamp · 2de8023e
      Martin KaFai Lau authored
      
      When collapsing skbs, txstamp_ack also needs to be merged.
      
      Retrans Collapse Test:
      ~~~~~~
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 730) = 730
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      0.200 write(4, ..., 11680) = 11680
      
      0.200 > P. 1:731(730) ack 1
      0.200 > P. 731:1461(730) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:13141(4380) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop>
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 13141 win 257
      
      BPF Output Before:
      ~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~
      <...>-2027  [007] d.s.    79.765921: : ee_data:1459
      
      Sacks Collapse Test:
      ~~~~~
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      0.200 write(4, ..., 1460) = 1460
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 13140) = 13140
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      
      0.200 > P. 1:1461(1460) ack 1
      0.200 > . 1461:8761(7300) ack 1
      0.200 > P. 8761:14601(5840) ack 1
      
      0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop>
      0.300 > P. 1:1461(1460) ack 1
      0.400 < . 1:1(0) ack 14601 win 257
      
      BPF Output Before:
      ~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~
      <...>-2049  [007] d.s.    89.185538: : ee_data:14599
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
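The idea behind the commit above can be illustrated with a toy model. This is a simplified sketch, not the kernel code: the `Skb` class and its field names are hypothetical stand-ins for the skb state involved (`tx_tstamp` for a pending timestamp request, `tskey` for the timestamp key, `txstamp_ack` for the bit this commit merges), and sequence numbers are plain integers without wraparound handling.

```python
from dataclasses import dataclass


@dataclass
class Skb:
    """Hypothetical, minimal stand-in for a TCP skb in the write queue."""
    seq: int                   # first sequence number covered
    end_seq: int               # one past the last sequence number covered
    tx_tstamp: bool = False    # a tx timestamp was requested for this skb
    tskey: int = 0             # key identifying the timestamped byte
    txstamp_ack: bool = False  # an ACK timestamp was requested


def collapse(skb, next_skb):
    """Merge next_skb into skb, keeping any timestamp request it carried."""
    skb.end_seq = next_skb.end_seq
    if next_skb.tx_tstamp:
        skb.tx_tstamp = True
        skb.tskey = next_skb.tskey
        # The step this commit is about: without it, the merged skb
        # would lose the ACK timestamp request.
        skb.txstamp_ack = skb.txstamp_ack or next_skb.txstamp_ack
    return skb


if __name__ == "__main__":
    # Shapes taken from the "Retrans Collapse Test": two 730-byte writes,
    # only the second one with an ACK timestamp requested.
    first = Skb(seq=1, end_seq=731)
    second = Skb(seq=731, end_seq=1461, tx_tstamp=True, tskey=1460,
                 txstamp_ack=True)
    print(collapse(first, second))
```

The `tskey` value used in the example is illustrative; the model only shows that the merged skb covering 1:1461 still carries the ACK timestamp request.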
    • tcp: Carry txstamp_ack in tcp_fragment_tstamp · b51e13fa
      Martin KaFai Lau authored
      
      When a TCP skb is sliced into two smaller skbs (e.g. in
      tcp_fragment() and tso_fragment()), the txstamp_ack bit is not
      carried over to the newly created skb when it is needed there.
      The end result is that a timestamping event (SCM_TSTAMP_ACK) will
      be missing from the sk->sk_error_queue.

      This patch carries this bit to the new skb2
      in tcp_fragment_tstamp().
      
      BPF Output Before:
      ~~~~~~
      <No output due to missing SCM_TSTAMP_ACK timestamp>
      
      BPF Output After:
      ~~~~~~
      <...>-2050  [000] d.s.   100.928763: : ee_data:14599
      
      Packetdrill Script:
      ~~~~~~
      +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
      +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
      +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0 bind(3, ..., ...) = 0
      +0 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
      
      +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
      0.200 write(4, ..., 14600) = 14600
      +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
      
      0.200 > . 1:7301(7300) ack 1
      0.200 > P. 7301:14601(7300) ack 1
      
      0.300 < . 1:1(0) ack 14601 win 257
      
      0.300 close(4) = 0
      0.300 > F. 14601:14601(0) ack 1
      0.400 < F. 1:1(0) ack 16062 win 257
      0.400 > . 14602:14602(0) ack 2
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Tested-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
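The fragmenting case described in the commit above can be modelled in the same spirit. Again this is a simplified sketch under stated assumptions, not the kernel implementation: `Skb` and `fragment()` are hypothetical names, sequence numbers are plain integers, and the rule shown is that the timestamp request moves to the tail skb when the timestamped byte ends up there.

```python
from dataclasses import dataclass


@dataclass
class Skb:
    """Hypothetical, minimal stand-in for a TCP skb in the write queue."""
    seq: int                   # first sequence number covered
    end_seq: int               # one past the last sequence number covered
    tx_tstamp: bool = False    # a tx timestamp was requested for this skb
    tskey: int = 0             # sequence number of the timestamped byte
    txstamp_ack: bool = False  # an ACK timestamp was requested


def fragment(skb, split_len):
    """Split skb after split_len bytes and return the new tail skb."""
    skb2 = Skb(seq=skb.seq + split_len, end_seq=skb.end_seq)
    skb.end_seq = skb2.seq
    # If the timestamped byte now lives in the tail, the request must
    # follow it, including the txstamp_ack bit.
    if skb.tx_tstamp and skb.tskey >= skb2.seq:
        skb2.tx_tstamp, skb.tx_tstamp = True, False
        skb2.tskey, skb.tskey = skb.tskey, skb2.tskey
        skb2.txstamp_ack, skb.txstamp_ack = skb.txstamp_ack, False
    return skb2


if __name__ == "__main__":
    # Shapes taken from the packetdrill script: one 14600-byte write that
    # goes out as 1:7301 and 7301:14601.
    head = Skb(seq=1, end_seq=14601, tx_tstamp=True, tskey=14600,
               txstamp_ack=True)
    tail = fragment(head, 7300)
    print(head)
    print(tail)
```

In this model the tail skb (7301:14601) ends up holding the request, which matches the intent of the patch: the ACK that covers the last byte is the one that should trigger SCM_TSTAMP_ACK.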
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 11afbff8
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree, mostly from Florian Westphal to sort out the lack of sufficient
      validation in x_tables and connlabel preparation patches to add
      nf_tables support. They are:
      
      1) Ensure we don't go over the ruleset blob boundaries in
         mark_source_chains().
      
      2) Validate that target jumps land on an existing xt_entry. This extra
         sanitization comes with a performance penalty when loading the ruleset.
      
      3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables.
      
      4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables.
      
      5) Enforce the minimal possible target size in x_tables.
      
      6) Similar to #3, add xt_compat_check_entry_offsets() for compat code.
      
      7) Check that standard target size is valid.
      
      8) More sanitization to ensure that the target_offset field is correct.
      
      9) Add xt_ch...
    • Merge branch 'nla_align-more' · 8d9ea160
      David S. Miller authored
      
      Nicolas Dichtel says:
      
      ====================
      netlink: align attributes when needed (patchset #1)
      
      This is the continuation of the work done to align netlink attributes
      when these attributes contain some 64-bit fields.
      
      David, if the third patch is too big (or maybe the series), I can split it.
      Just tell me what you prefer.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • taskstats: use the libnl API to align nlattr on 64-bit · 80df5542
      Nicolas Dichtel authored
      
      The goal of this patch is to use the new libnl API to align netlink
      attributes when needed.
      The layout of the netlink message will be a bit different after the patch,
      because the padattr (TASKSTATS_TYPE_STATS) will be inside the nested
      attribute instead of before it.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Nicolas Dichtel's avatar
    • libnl: add nla_put_u64_64bit() helper · 73520786
      Nicolas Dichtel authored
      
      With this function, nla_data() is aligned on a 64-bit area.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
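To see why a padding attribute can make nla_data() land on a 64-bit boundary, the offsets can be worked through in a small model. This sketch assumes a 4-byte attribute header, a message buffer that starts on an 8-byte boundary, and a write offset that is already a multiple of 4; `put_u64_64bit()` here is a hypothetical helper for illustration, not the kernel function.

```python
# Offset model for 64-bit alignment of a netlink attribute payload.
# Assumptions: the buffer starts 8-byte aligned, the current write offset is a
# multiple of 4, and an attribute header (struct nlattr) is 4 bytes long.

NLA_HDRLEN = 4  # u16 length + u16 type


def put_u64_64bit(offset):
    """Return (pad_bytes, payload_offset) for a u64 attribute added at offset.

    If the header would start on an 8-byte boundary, the payload would start
    at offset + 4, which is misaligned. An empty padding attribute (header
    only, 4 bytes) placed first shifts the payload onto an 8-byte boundary.
    """
    pad_bytes = NLA_HDRLEN if offset % 8 == 0 else 0
    payload_offset = offset + pad_bytes + NLA_HDRLEN
    return pad_bytes, payload_offset


if __name__ == "__main__":
    for offset in (0, 4, 16, 20):
        pad, payload = put_u64_64bit(offset)
        print("offset", offset, "pad", pad, "payload at", payload)
```

The padding attribute is only needed in half of the cases; when the header starts at an offset of 4 modulo 8, the payload is already aligned and nothing is inserted.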
    • libnl: nla_put_msecs(): align on a 64-bit area · 2175d87c
      Nicolas Dichtel authored
      
      nla_data() is now aligned on a 64-bit area.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libnl: nla_put_s64(): align on a 64-bit area · 756a2f59
      Nicolas Dichtel authored
      
      nla_data() is now aligned on a 64-bit area.
      In fact, there is no user of this function.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libnl: nla_put_net64(): align on a 64-bit area · e9bbe898
      Nicolas Dichtel authored
      
      nla_data() is now aligned on a 64-bit area.
      
      The temporary function nla_put_be64_32bit() is removed in this patch.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libnl: nla_put_be64(): align on a 64-bit area · b46f6ded
      Nicolas Dichtel authored
      
      nla_data() is now aligned on a 64-bit area.
      
      A temporary version (nla_put_be64_32bit()) is added for nla_put_net64().
      This function is removed in the next patch.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libnl: nla_put_le64(): align on a 64-bit area · e7479122
      Nicolas Dichtel authored
      
      nla_data() is now aligned on a 64-bit area.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libnl: fix help of _64bit functions · 11a99573
      Nicolas Dichtel authored
      Fix typo and describe 'padattr'.
      
      Fixes: 089bf1a6 ("libnl: add more helpers to align attributes on 64-bit")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 23 Apr, 2016 1 commit
  4. 21 Apr, 2016 14 commits