1. 26 Apr, 2018 15 commits
    • Merge branch 'udp-gso' · cb586c63
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      udp gso
      
      Segmentation offload reduces cycles/byte for large packets by
      amortizing the cost of protocol stack traversal.
      
      This patchset implements GSO for UDP. A process can concatenate and
      submit multiple datagrams to the same destination in one send call
      by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
      or passing an analogous cmsg at send time.
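
      As an illustration only (not part of the patchset), the two knobs
      described above can be sketched in Python. The SOL_UDP and
      UDP_SEGMENT constant values are assumptions for Linux, since older
      Python versions do not expose UDP_SEGMENT:

      ```python
      import socket
      import struct

      # Assumed Linux values; socket.UDP_SEGMENT only exists in newer Pythons.
      SOL_UDP = socket.IPPROTO_UDP                      # 17
      UDP_SEGMENT = getattr(socket, "UDP_SEGMENT", 103)  # assumption: 103 on Linux

      def segment_cmsg(gso_size):
          """Ancillary-data triple for sendmsg(): the per-call alternative
          to setting the UDP_SEGMENT socket option. The kernel expects a
          16-bit segment size."""
          return (SOL_UDP, UDP_SEGMENT, struct.pack("@H", gso_size))

      def send_gso(sock, payload, dst, gso_size):
          """Submit one large buffer; the kernel splits it into datagrams
          of gso_size bytes at the GSO layer."""
          return sock.sendmsg([payload], [segment_cmsg(gso_size)], 0, dst)
      ```

      Setting the option once with setsockopt(SOL_UDP, UDP_SEGMENT, size)
      is equivalent for every subsequent send on that socket.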
      
      The stack will send the entire large (up to network layer max size)
      datagram through the protocol layer. At the GSO layer, it is broken
      up into individual segments. All receive the same network layer
      header and UDP src and dst port. All but the last segment have the
      same UDP header; the last may differ in length and checksum.
      
      Initial results show a significant reduction in UDP cycles/byte.
      See the main patch for more details and benchmark results.
      
              udp
                876 MB/s 14873 msg/s 624666 calls/s
                  11,205,777,429      cycles
      
              udp gso
               2139 MB/s 36282 msg/s 36282 calls/s
                  11,204,374,561      cycles
      
      The patch set is broken down as follows:
      - patch 1 is a prerequisite: code rearrangement, noop otherwise
      - patch 2 implements the gso logic
      - patch 3 adds protocol stack support for UDP_SEGMENT
      - patch 4,5,7 are refinements
      - patch 6 adds the cmsg interface
      - patch 8..11 are tests
      
      This idea was presented previously at netconf 2017-2
      http://vger.kernel.org/netconf2017_files/rx_hardening_and_udp_gso.pdf
      
      Changes v1 -> v2
        - Convert __udp_gso_segment to modify headers after skb_segment
        - Split main patch into two, one for gso logic, one for UDP_SEGMENT
      
      Changes RFC -> v1
        - MSG_MORE:
            fixed, by allowing checksum offload with corking if gso
        - SKB_GSO_UDP_L4:
            made independent from SKB_GSO_UDP
            and removed skb_is_ufo() wrapper
        - NETIF_F_GSO_UDP_L4:
            add to netdev_features_string
            and to netdev-features.txt
            add BUILD_BUG_ON to match SKB_GSO_UDP_L4 value
        - UDP_MAX_SEGMENTS:
            introduce limit on number of segments per gso skb
            to avoid extreme cases like IP_MAX_MTU/IPV4_MIN_MTU
        - CHECKSUM_PARTIAL:
            test against missing feature after ndo_features_check
            if not supported return error, analogous to udp_send_check
        - MSG_ZEROCOPY: removed, deferred for now
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: udp gso benchmark · 3a687bef
      Willem de Bruijn authored
      Send udp data between a source and sink, optionally with udp gso.
      The two processes are expected to be run on separate hosts.
      
      A script is included that runs them together over loopback in a
      single namespace for functionality testing.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: udp gso with corking · 3f12817f
      Willem de Bruijn authored
      Corked sockets take a different path to construct a udp datagram than
      the lockless fast path. Test this alternate path.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: udp gso with connected sockets · e5b2d91c
      Willem de Bruijn authored
      Connected sockets use path mtu instead of device mtu.
      
      Test this path by inserting a route mtu that is lower than the device
      mtu. Verify that the path mtu for the connection matches this lower
      number, then run the same test as in the connectionless case.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: udp gso · a1607257
      Willem de Bruijn authored
      Validate udp gso, including edge cases (such as min/max gso sizes).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: add gso support to virtual devices · 83aa025f
      Willem de Bruijn authored
      Virtual devices such as tunnels and bonding can handle large packets.
      Only segment packets when reaching a physical or loopback device.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: add gso segment cmsg · 2e8de857
      Willem de Bruijn authored
      Allow specifying segment size in the send call.
      
      The new control message performs the same function as socket option
      UDP_SEGMENT while avoiding the extra system call.
      
      [ Export udp_cmsg_send for ipv6. -DaveM ]
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: paged allocation with gso · 15e36f5b
      Willem de Bruijn authored
      When sending large datagrams that are later segmented, store data in
      page frags to avoid copying from linear in skb_segment.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: better wmem accounting on gso · ad405857
      Willem de Bruijn authored
      skb_segment by default transfers allocated wmem from the gso skb
      to the tail of the segment list. This underreports real truesize
      of the list, especially if the tail might be dropped.
      
      Similar to tcp_gso_segment, update wmem_alloc with the aggregate
      list truesize and make each segment responsible for its own
      share by setting skb->destructor.
      
      Clear gso_skb->destructor prior to calling skb_segment to skip
      the default assignment to tail.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: generate gso with UDP_SEGMENT · bec1f6f6
      Willem de Bruijn authored
      Support generic segmentation offload for udp datagrams. Callers can
      concatenate and send at once the payload of multiple datagrams with
      the same destination.
      
      To set segment size, the caller sets socket option UDP_SEGMENT to the
      length of each discrete payload. This value must be smaller than or
      equal to the relevant MTU.
      
      A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
      per send call basis.
      
      Total byte length may then exceed MTU. If not an exact multiple of
      segment size, the last segment will be shorter.
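
      For illustration only (not code from the patch), the per-datagram
      payload lengths follow directly from the total length and the
      configured segment size:

      ```python
      def segment_lengths(total_len, gso_size):
          """Payload length of each datagram that a large send of
          total_len bytes is split into; the last may be shorter."""
          full, rem = divmod(total_len, gso_size)
          return [gso_size] * full + ([rem] if rem else [])
      ```

      For example, a 3000 byte send with UDP_SEGMENT set to 1400 yields
      segments of 1400, 1400 and 200 bytes.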
      
      The implementation adds a gso_size field to the udp socket, ip(v6)
      cmsg cookie and inet_cork structure to be able to set the value at
      setsockopt or cmsg time and to work with both lockless and corked
      paths.
      
      Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
      
          tcp tso
           3197 MB/s 54232 msg/s 54232 calls/s
               6,457,754,262      cycles
      
          tcp gso
           1765 MB/s 29939 msg/s 29939 calls/s
              11,203,021,806      cycles
      
          tcp without tso/gso *
            739 MB/s 12548 msg/s 12548 calls/s
              11,205,483,630      cycles
      
          udp
            876 MB/s 14873 msg/s 624666 calls/s
              11,205,777,429      cycles
      
          udp gso
           2139 MB/s 36282 msg/s 36282 calls/s
              11,204,374,561      cycles
      
         [*] after reverting commit 0a6b2a1d
             ("tcp: switch to GSO being always on")
      
      Measured total system cycles ('-a') for one core while pinning both
      the network receive path and benchmark process to that core:
      
        perf stat -a -C 12 -e cycles \
          ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
      
      Note the reduction in calls/s with GSO. Bytes per syscall
      increases from 1470 to 61818.
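
      The bytes-per-call figures can be rechecked from the benchmark
      lines above (a sketch, not part of the patch; MB is taken as MiB):

      ```python
      def bytes_per_call(mb_per_sec, calls_per_sec):
          """Average payload bytes handed to the kernel per send call,
          derived from throughput and syscall rate."""
          return mb_per_sec * 1024 * 1024 // calls_per_sec
      ```

      bytes_per_call(876, 624666) gives 1470 and bytes_per_call(2139,
      36282) gives 61818, matching the numbers quoted above.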
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: add udp gso · ee80d1eb
      Willem de Bruijn authored
      Implement generic segmentation offload support for udp datagrams. A
      follow-up patch adds support to the protocol stack to generate such
      packets.
      
      UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits
      a large payload into a number of discrete UDP datagrams.
      
      The implementation adds a GSO type SKB_GSO_UDP_L4 to differentiate
      it from UFO (SKB_GSO_UDP).
      
      IPPROTO_UDPLITE is excluded, as that protocol has no gso handler
      registered.
      
      [ Export __udp_gso_segment for ipv6. -DaveM ]
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: expose inet cork to udp · 1cd7884d
      Willem de Bruijn authored
      UDP segmentation offload needs access to inet_cork in the udp layer.
      Pass the struct to ip(6)_make_skb instead of allocating it on the
      stack in that function itself.
      
      This patch is a noop otherwise.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a9537c93
      David S. Miller authored
      Merging net into net-next to help the bpf folks avoid
      some really ugly merge conflicts.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · e9350d44
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2018-04-25
      
      This series enables some ethtool and tc-flower filters to be offloaded
      to igb-based network controllers. This is useful when the system
      configuration wants to steer certain kinds of traffic to a specific
      hardware queue, for i210 devices only.
      
      The first two patches in the series are bug fixes.
      
      The basis of this series is to export the internal API used to
      configure address filters, so they can be used by ethtool, and to
      extend the functionality so that a source address can be handled.
      
      Then, we enable the tc-flower offloading implementation to re-use the
      same infrastructure as ethtool, storing the filters in the per-adapter
      "nfc" (Network Filter Config?) list. For consistency, destructive
      access is kept separate, i.e. a filter added by tc-flower can only be
      removed by tc-flower, but ethtool can read them all.
      
      Only support for VLAN Prio, Source and Destination MAC Address, and
      Ethertype is enabled for now.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 25eb0ea7
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-04-25
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix to clear the percpu metadata_dst that could otherwise carry
         stale ip_tunnel_info, from William.
      
      2) Fix that reduces the number of passes in x64 JIT with regards to
         dead code sanitation to avoid risk of prog rejection, from Gianluca.
      
       3) Several fixes of sockmap programs, besides others, fixing a double
          put_page() in an error path, a missing refcount hold for pinned
          sockmap, adding required -target bpf for clang in sample Makefile,
          from John.
      
      4) Fix to disable preemption in __BPF_PROG_RUN_ARRAY() paths, from Roman.
      
      5) Fix tools/bpf/ Makefile with regards to a lex/yacc build error
         seen on older gcc-5, from John.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 25 Apr, 2018 25 commits