1. 03 May, 2016 27 commits
  2. 02 May, 2016 13 commits
    • Merge branch 'ipv6-tunnel-cleanups' · d1ac3b16
      David S. Miller authored
      Tom Herbert says:
      
      ====================
      net: Cleanup IPv6 ip tunnels
      
      The IPv6 tunnel code is very different from the IPv4 code, yet it
      duplicates much of the IPv4 code, particularly in the GRE tunneling.
      
      This patch set cleans up the tunnel code to make the IPv6 code look
      more like the IPv4 code and use common functions between the two
      stacks where possible.
      
      This work should make it easier to maintain and extend the IPv6 ip
      tunnels.
      
      Items in this patch set:
        - Clean up IPv6 tunnel receive path (ip6_tnl_rcv). Includes using
          gro_cells and exporting ip6_tnl_rcv so that ip6_gre can call it
        - Move GRE functions to a common header file (tx functions) or
          gre_demux.c (rx functions like gre_parse_header)
        - Call common GRE functions from IPv6 GRE
        - Create ip6_tnl_xmit (to be like ip_tunnel_xmit)
      
      Tested:
        Ran super_netperf tests for TCP_RR and TCP_STREAM for:
          - IPv4 over gre, gretap, gre6, gre6tap
          - IPv6 over gre, gretap, gre6, gre6tap
          - ipip
          - ip6ip6
          - ipip/gue
          - IPv6 over gre/gue
          - IPv4 over gre/gue
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre6: Cleanup GREv6 transmit path, call common GRE functions · b05229f4
      Tom Herbert authored
      Changes in GREv6 transmit path:
        - Call gre_checksum, remove gre6_checksum
        - Rename ip6gre_xmit2 to __gre6_xmit
        - Call gre_build_header utility function
        - Call ip6_tnl_xmit common function
        - Call ip6_tnl_change_mtu, eliminate ip6gre_tunnel_change_mtu
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Generic tunnel cleanup · 79ecb90e
      Tom Herbert authored
      A few generic changes to generalize tunnels in IPv6:
        - Export ip6_tnl_change_mtu so that it can be called by ip6_gre
        - Add tun_hlen to ip6_tnl structure.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: Create common functions for transmit · 182a352d
      Tom Herbert authored
      Create common functions for both IPv4 and IPv6 GRE in transmit. These
      are put into gre.h.
      
      Common functions are for:
        - GRE checksum calculation. Move gre_checksum to gre.h.
        - Building a GRE header. Move GRE build_header to gre.h and rename
          it gre_build_header.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
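The two shared transmit helpers named above (checksum calculation and header building) can be illustrated with a small userspace sketch. This is not the kernel code from gre.h: every `toy_` name, the byte-buffer interface, and the flag constants are assumptions made for illustration. The field layout follows the GRE specifications (RFC 2784 and RFC 2890): a 4-byte base header, then optional checksum, key, and sequence words in that order, with the checksum covering the GRE header and payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GRE flag bits in host byte order (RFC 2784 / RFC 2890). Names are hypothetical. */
#define TOY_GRE_CSUM 0x8000u
#define TOY_GRE_KEY  0x2000u
#define TOY_GRE_SEQ  0x1000u

/* Store values in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xffff));
}

/* Internet checksum: complement of the folded ones' complement sum of
 * 16-bit big-endian words. Suitable for packet-sized buffers only. */
static uint16_t toy_inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Header length implied by the flags: 4-byte base plus 4 per option. */
static size_t toy_gre_hlen(uint16_t flags)
{
    size_t hlen = 4;

    if (flags & TOY_GRE_CSUM) hlen += 4;
    if (flags & TOY_GRE_KEY)  hlen += 4;
    if (flags & TOY_GRE_SEQ)  hlen += 4;
    return hlen;
}

/* Write a GRE header at pkt[0..hlen). The payload is assumed to already
 * sit at pkt + hlen. Returns the header length. */
static size_t toy_gre_build_header(uint8_t *pkt, size_t payload_len,
                                   uint16_t flags, uint16_t proto,
                                   uint32_t key, uint32_t seq)
{
    size_t hlen = toy_gre_hlen(flags);
    uint8_t *opt = pkt + 4;

    put_be16(pkt, flags);
    put_be16(pkt + 2, proto);
    if (flags & TOY_GRE_CSUM) { put_be32(opt, 0); opt += 4; } /* zero while summing */
    if (flags & TOY_GRE_KEY)  { put_be32(opt, key); opt += 4; }
    if (flags & TOY_GRE_SEQ)  { put_be32(opt, seq); opt += 4; }
    assert(opt == pkt + hlen);

    if (flags & TOY_GRE_CSUM)
        put_be16(pkt + 4, toy_inet_csum(pkt, hlen + payload_len));
    return hlen;
}
```

Because nothing in this sketch depends on the outer IP version, the same routine could serve both an IPv4 and an IPv6 encapsulation path, which is the property that makes such helpers candidates for a shared header.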
    • ipv6: Create ip6_tnl_xmit · 8eb30be0
      Tom Herbert authored
      This patch renames ip6_tnl_xmit2 to ip6_tnl_xmit and exports it. Other
      users like GRE will be able to call this. The original ip6_tnl_xmit
      function is renamed to ip6_tnl_start_xmit (this is an ndo_start_xmit
      function).
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre6: Cleanup GREv6 receive path, call common GRE functions · 308edfdf
      Tom Herbert authored
        - Create gre_rcv function. This calls gre_parse_header and ip6gre_rcv.
        - Call ip6_tnl_rcv. Doing this and using gre_parse_header eliminates
          most of the code in ip6gre_rcv.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: Move utility functions to common headers · 95f5c64c
      Tom Herbert authored
      Several of the GRE functions defined in net/ipv4/ip_gre.c are usable
      for the IPv6 GRE implementation (that is, they are protocol agnostic).
      
      These include:
        - GRE flag handling functions are moved to gre.h
        - GRE build_header is moved to gre.h and renamed gre_build_header
        - parse_gre_header is moved to gre_demux.c and renamed gre_parse_header
        - iptunnel_pull_header is taken out of gre_parse_header. This is now
          done by the caller. The header length is returned from
          gre_parse_header in an int* argument.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
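The split described above, where the parser only reports the header length through an int* argument and the caller strips the header, can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the kernel's gre_parse_header: the `toy_` names, the `struct toy_gre_info` type, and the raw byte-buffer interface are hypothetical, and checksum verification is left out. The flag bits and field order follow RFC 2784 and RFC 2890.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GRE flag bits in host byte order. Names are hypothetical. */
#define TOY_GRE_CSUM    0x8000u
#define TOY_GRE_ROUTING 0x4000u
#define TOY_GRE_KEY     0x2000u
#define TOY_GRE_SEQ     0x1000u
#define TOY_GRE_VERSION 0x0007u

/* Parsed header fields, in host byte order. */
struct toy_gre_info {
    uint16_t flags;
    uint16_t proto;
    uint32_t key;
    uint32_t seq;
};

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)(p[0] << 8 | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return (uint32_t)get_be16(p) << 16 | get_be16(p + 2);
}

/* Parse a GRE header without consuming it. On success returns 0 and stores
 * the number of header bytes in *hdr_len so that the caller can strip them.
 * Returns -1 for truncated packets, non-zero versions, or the routing flag
 * (rejecting those is an assumption of this sketch). */
static int toy_gre_parse_header(const uint8_t *pkt, size_t len,
                                struct toy_gre_info *info, int *hdr_len)
{
    size_t hlen = 4;
    const uint8_t *opt;

    assert(pkt && info && hdr_len);
    if (len < 4)
        return -1;

    info->flags = get_be16(pkt);
    info->proto = get_be16(pkt + 2);
    info->key = 0;
    info->seq = 0;
    if (info->flags & (TOY_GRE_VERSION | TOY_GRE_ROUTING))
        return -1;

    if (info->flags & TOY_GRE_CSUM) hlen += 4;
    if (info->flags & TOY_GRE_KEY)  hlen += 4;
    if (info->flags & TOY_GRE_SEQ)  hlen += 4;
    if (len < hlen)
        return -1;

    opt = pkt + 4;
    if (info->flags & TOY_GRE_CSUM)
        opt += 4; /* checksum + reserved word, not verified here */
    if (info->flags & TOY_GRE_KEY) { info->key = get_be32(opt); opt += 4; }
    if (info->flags & TOY_GRE_SEQ) { info->seq = get_be32(opt); opt += 4; }

    *hdr_len = (int)hlen;
    return 0;
}
```

A caller would then advance its own buffer pointer by the reported length. Keeping that step outside the parser lets each receive path choose how and when to pull the header.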
    • ipv6: Cleanup IPv6 tunnel receive path · 0d3c703a
      Tom Herbert authored
      Some basic changes to make IPv6 tunnel receive path look more like
      IPv4 path:
        - Make ip6_tnl_rcv non-static and export it so that GREv6 and
          others can call it
        - Make ip6_tnl_rcv look like ip_tunnel_rcv
        - Switch to gro_cells_receive
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-preempt' · 570d6320
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: make TCP preemptible
      
      Most of TCP stack assumed it was running from BH handler.
      
      This is great for most things, as TCP behavior is very sensitive
      to scheduling artifacts.
      
      However, the prequeue and backlog processing are problematic,
      as they need to be flushed with BH being blocked.
      
      To cope with modern needs, TCP sockets have big sk_rcvbuf values,
      in the order of 16 MB, and soon 32 MB.
      This means that backlog can hold thousands of packets, and things
      like TCP coalescing or collapsing on this amount of packets can
      lead to insane latency spikes, since BH are blocked for too long.
      
      It is time to make UDP/TCP stacks preemptible.
      
      Note that fast path still runs from BH handler.
      
      v2: Added "tcp: make tcp_sendmsg() aware of socket backlog"
          to reduce latency problems of large sends.
      
      v3: Fixed a typo in tcp_cdg.c
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: make tcp_sendmsg() aware of socket backlog · d41a69f1
      Eric Dumazet authored
      Large sendmsg()/write() hold socket lock for the duration of the call,
      unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
      are parked into socket backlog for a long time.
      Critical decisions like fast retransmit might be delayed.
      Receivers have to maintain a big out of order queue with additional CPU
      overhead, and also possible stalls in TX once windows are full.

      Bidirectional flows are particularly hurt since the backlog can become
      quite big if the copy from user space triggers IO (page faults).
      
      Some applications learnt to use sendmsg() (or sendmmsg()) with small
      chunks to avoid this issue.
      
      Kernel should know better, right?
      
      Add a generic sk_flush_backlog() helper and use it right
      before a new skb is allocated. Typically we put 64KB of payload
      per skb (unless MSG_EOR is requested) and checking socket backlog
      every 64KB gives good results.
      
      As a matter of fact, tests with TSO/GSO disabled give very nice
      results, as we manage to keep a small write queue and smaller
      perceived rtt.
      
      Note that sk_flush_backlog() maintains socket ownership,
      so is not equivalent to a {release_sock(sk); lock_sock(sk);},
      to ensure implicit atomicity rules that sendmsg() was
      giving to (possibly buggy) applications.
      
      In this simple implementation, I chose to not call tcp_release_cb(),
      but we might consider this later.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
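The effect of checking the backlog before each 64KB buffer can be illustrated with a toy userspace model. This is not kernel code: `struct toy_sock`, `toy_flush_backlog`, `toy_sendmsg`, and the fixed per-chunk arrival rate are all hypothetical stand-ins chosen for illustration. The model only counts packets, and it keeps the socket "owned" for the whole call, mirroring the note above that the flush does not give up socket ownership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CHUNK (64 * 1024) /* payload per simulated buffer, per the text */

/* Hypothetical stand-in for a socket: counters only, no real packets. */
struct toy_sock {
    size_t backlog_pkts;   /* packets parked while the sender owns the socket */
    size_t processed_pkts; /* packets handed to protocol processing */
    size_t flushes;        /* number of non-empty backlog flushes */
    bool owned;            /* true while the simulated sendmsg runs */
};

/* Process any parked packets without releasing ownership.
 * Returns true if something was flushed. */
static bool toy_flush_backlog(struct toy_sock *sk)
{
    assert(sk->owned);
    if (sk->backlog_pkts == 0)
        return false;
    sk->processed_pkts += sk->backlog_pkts;
    sk->backlog_pkts = 0;
    sk->flushes++;
    return true;
}

/* Simulate a large send. arrivals_per_chunk models packets that arrive
 * while one chunk is copied from user space. Returns the peak backlog. */
static size_t toy_sendmsg(struct toy_sock *sk, size_t len,
                          size_t arrivals_per_chunk)
{
    size_t sent = 0, peak = 0;

    sk->owned = true;
    while (sent < len) {
        size_t left = len - sent;
        size_t chunk = left < TOY_CHUNK ? left : TOY_CHUNK;

        toy_flush_backlog(sk);                  /* check before each new buffer */
        sent += chunk;                          /* stands in for the user copy */
        sk->backlog_pkts += arrivals_per_chunk; /* packets parked meanwhile */
        if (sk->backlog_pkts > peak)
            peak = sk->backlog_pkts;
    }
    toy_flush_backlog(sk); /* stands in for the drain when the lock is released */
    sk->owned = false;
    return peak;
}
```

In this model a 1 MB send with 10 arrivals per chunk never parks more than 10 packets at a time. Without the per-chunk check, all 160 would wait until the end of the call.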
    • net: do not block BH while processing socket backlog · 5413d1ba
      Eric Dumazet authored
      Socket backlog processing is a major latency source.
      
      With current TCP socket sk_rcvbuf limits, I have sampled __release_sock()
      holding the CPU for more than 5 ms, and packets being dropped by the NIC
      once the ring buffer is filled.

      All users are now ready to be called from process context, so
      we can unblock BH and let interrupts be serviced faster.

      cond_resched_softirq() could be removed, as it has no remaining users.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: prepare for socket backlog behavior change · 860fbbc3
      Eric Dumazet authored
      sctp_inq_push() will soon be called without BH being blocked
      when generic socket code flushes the socket backlog.
      
      It is very possible SCTP can be converted to not rely on BH,
      but this needs to be done by SCTP experts.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: prepare for non BH masking at backlog processing · e61da9e2
      Eric Dumazet authored
      UDP uses the generic socket backlog code, and this will soon
      be changed to not disable BH when protocol is called back.
      
      We need to use appropriate SNMP accessors.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>