1. 11 Feb, 2016 11 commits
    • soreuseport: BPF selection functional test for TCP · 4b2a6aed
      Craig Gallek authored
      Unfortunately the existing test relied on packet payload in order to
      map incoming packets to sockets.  In order to get this to work with TCP,
      TCP_FASTOPEN needed to be used.
      
      Since the fast open path is slightly different from the standard TCP path,
      I created a second test which sends to reuseport group members based
      on receiving cpu core id.  This will probably serve as a better
      real-world example use as well.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: fast reuseport TCP socket selection · c125e80b
      Craig Gallek authored
      This change extends the fast SO_REUSEPORT socket lookup implemented
      for UDP to TCP.  Listener sockets with SO_REUSEPORT and the same
      receive address are additionally added to an array for faster
      random access.  This means that only a single socket from the group
      must be found in the listener list before any socket in the group can
      be used to receive a packet.  Previously, every socket in the group
      needed to be considered before handing off the incoming packet.
      
      This feature also exposes the ability to use a BPF program when
      selecting a socket from a reuseport group.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
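The array-based selection described above can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel code: the struct and function names are hypothetical, sockets are represented by plain integers, and the fallback from an out-of-range BPF index to the hash is an assumption of the model.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_SOCKS 8

/* Hypothetical model of a reuseport group: members stored in an array. */
struct reuseport_group_model {
    size_t num_socks;
    int socks[MODEL_MAX_SOCKS];   /* stand-ins for socket pointers */
};

/* Map a 32-bit hash onto [0, n) with a multiply and shift, no division. */
uint32_t scale_hash(uint32_t hash, uint32_t n)
{
    return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/*
 * Pick one member in constant time. A non-negative, in-range bpf_index
 * (as a BPF program might return) wins; otherwise fall back to the hash.
 * Returns -1 if the group is empty.
 */
int select_from_group(const struct reuseport_group_model *g,
                      uint32_t hash, int64_t bpf_index)
{
    if (g->num_socks == 0)
        return -1;
    if (bpf_index >= 0 && (uint64_t)bpf_index < g->num_socks)
        return g->socks[bpf_index];
    return g->socks[scale_hash(hash, (uint32_t)g->num_socks)];
}
```

The point of the array is that once any one member of the group has been found, the final choice is a single indexed load rather than a walk over every listener.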
    • soreuseport: Prep for fast reuseport TCP socket selection · fa463497
      Craig Gallek authored
      Both of the lines in this patch probably should have been included
      in the initial implementation of this code for generic socket
      support, but weren't technically necessary since only UDP sockets
      were supported.
      
      First, the sk_reuseport_cb points to a structure which assumes
      each socket in the group has this pointer assigned at the same
      time it's added to the array in the structure.  The sk_clone_lock
      function breaks this assumption.  Since a child socket shouldn't
      implicitly be in a reuseport group, the simple fix is to clear
      the field in the clone.
      
      Second, the SO_ATTACH_REUSEPORT_xBPF socket options require that
      SO_REUSEPORT also be set first.  For UDP sockets, this is easily
      enforced at bind-time since that process both puts the socket in
      the appropriate receive hlist and updates the reuseport structures.
      Since these operations can happen at two different times for TCP
      sockets (bind and listen) it must be explicitly checked to enforce
      the use of SO_REUSEPORT with SO_ATTACH_REUSEPORT_xBPF in the
      setsockopt call.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
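The two fixes in this patch can be pictured with a small user-space model. Everything here is hypothetical (struct, field, and function names), and the specific error code is an assumption of the sketch rather than something stated in the commit message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant socket state. */
struct listener_model {
    bool  reuseport;       /* SO_REUSEPORT has been set */
    void *reuseport_cb;    /* stand-in for the reuseport group pointer */
    bool  prog_attached;   /* a selection program is attached */
};

/* Fix 1: a cloned child must not inherit the parent's group pointer. */
void model_clone(const struct listener_model *parent,
                 struct listener_model *child)
{
    *child = *parent;
    child->reuseport_cb = NULL;
}

/* Fix 2: refuse to attach a program unless SO_REUSEPORT is already set. */
int model_attach_prog(struct listener_model *sk)
{
    if (!sk->reuseport)
        return -EINVAL;   /* error code is an assumption of this sketch */
    sk->prog_attached = true;
    return 0;
}
```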
    • inet: refactor inet[6]_lookup functions to take skb · a583636a
      Craig Gallek authored
      This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
      groups.  Doing so with a BPF filter will require access to the
      skb in question.  This change plumbs the skb (and offset to payload
      data) through the call stack to the listening socket lookup
      implementations where it will be used in a following patch.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: __tcp_hdrlen() helper · d9b3fca2
      Craig Gallek authored
      tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
      This splits the size calculation into a helper function that can be
      used if a struct tcphdr is already available.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
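For readers unfamiliar with the calculation, a user-space analogue of such a helper might look like the sketch below. It works on raw header bytes rather than the kernel's struct tcphdr, and the function name is hypothetical; the only fact relied on is that the TCP data offset occupies the high four bits of byte 12 and counts 32-bit words.

```c
#include <stdint.h>

/*
 * Hypothetical analogue of a "header length from header pointer" helper.
 * th points at the first byte of a TCP header (at least 13 bytes readable).
 */
unsigned int tcp_hdrlen_from_header(const uint8_t *th)
{
    /* Data offset: high 4 bits of byte 12, in units of 4 bytes. */
    return (unsigned int)(th[12] >> 4) * 4;
}
```

A caller that has already located the TCP header can use this directly instead of recomputing the header position from the packet.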
    • inet: create IPv6-equivalent inet_hash function · 496611d7
      Craig Gallek authored
      In order to support fast lookups for TCP sockets with SO_REUSEPORT,
      the function that adds sockets to the listening hash set needs
      to be able to check receive address equality.  Since this equality
      check is different for IPv4 and IPv6, we will need two different
      socket hashing functions.
      
      This patch adds inet6_hash identical to the existing inet_hash function
      and updates the appropriate references.  A following patch will
      differentiate the two by passing different comparison functions to
      __inet_hash.
      
      Additionally, in order to use the IPv6 address equality function from
      inet6_hashtables (which is compiled as a built-in object when IPv6 is
      enabled), the function itself needs to be in a built-in object file.  This
      patch moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock: struct proto hash function may error · 086c653f
      Craig Gallek authored
      In order to support fast reuseport lookups in TCP, the hash function
      defined in struct proto must be capable of returning an error code.
      This patch changes the function signature of all related hash functions
      to return an integer and handles or propagates this return value at
      all call sites.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
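The shape of this change can be sketched with a toy model: a hash callback that returns an int, and a caller that propagates the error instead of assuming success. All names below are hypothetical, and -ENOMEM is only an example failure chosen for the sketch.

```c
#include <errno.h>

/* Hypothetical stand-in for a socket. */
struct sock_model {
    int hashed;      /* set once the socket is in the hash table */
    int fail_hash;   /* test knob: force the hash step to fail */
};

/* Hypothetical stand-in for struct proto: hash now returns an error code. */
struct proto_model {
    int (*hash)(struct sock_model *sk);
};

int model_hash(struct sock_model *sk)
{
    if (sk->fail_hash)
        return -ENOMEM;   /* example failure, an assumption of this sketch */
    sk->hashed = 1;
    return 0;
}

/* Call site: handle or propagate the return value of hash(). */
int model_listen_start(const struct proto_model *prot, struct sock_model *sk)
{
    int err = prot->hash(sk);

    if (err)
        return err;
    return 0;
}
```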
    • Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · 30c1de08
      David S. Miller authored
      Antonio Quartulli says:
      
      ====================
      Here you have a batch of patches by Sven Eckelmann that
      drops our private reference counting implementation and
      substitutes it with the kref objects/functions.
      
      Then you have a patch, by Simon Wunderlich, that
      makes the broadcast protection window code more generic so
      that it can be re-used in the future by other components
      with different requirements.
      
      Lastly, Sven is also introducing two lockdep asserts in
      functions operating on our TVLV container list, to make
      sure that the proper lock is always acquired by the users.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'be2net-next' · dba6cf55
      David S. Miller authored
      Ajit Khaparde says:
      
      ====================
      be2net Patch series
      
      Please consider applying these two patches to net-next
      
        Patch-1: Request RSS capability of Rx interface depending on number of
          Rx rings
        Patch-2: Interpret and log new data that's added to the port
          misconfigure async event
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Interpret and log new data that's added to the port misconfigure async event · 51d1f98a
      Ajit Khaparde authored
      From FW version 11.0 onwards, the PORT_MISCONFIG event generated by the FW
      will carry more information about the event in the "data_word1"
      and "data_word2" fields. This patch adds support in the driver to parse the
      new information and log it accordingly. This patch also changes some of the
      messages that are being logged currently.
      Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
      Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
      Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Request RSS capability of Rx interface depending on number of Rx rings · 62219066
      Ajit Khaparde authored
      Currently we request RSS capability even if only a single Rx ring is created.
      As a result, in a few cases we unnecessarily consume an RSS-capable interface,
      which is a limited resource in the chip.
      This patch enables RSS on an interface only if more than one Rx ring
      is created.
      Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Feb, 2016 25 commits
  3. 09 Feb, 2016 4 commits
    • Merge branch 'tpacket-gso-csum-offload' · ef5c0e25
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      packet: tpacket gso and csum offload
      
      Extend PACKET_VNET_HDR socket option support to packet sockets with
      memory mapped rings.
      
      Patches 2 and 4 add support to tpacket_rcv and tpacket_snd.
      
      Patch 1 prepares for this by moving the relevant virtio_net_hdr
      logic out of packet_snd and packet_rcv into helper functions.
      
      GSO transmission requires all headers in the skb linear section.
      Patch 3 moves parsing of tx_ring slot headers before skb allocation
      to enable allocation with sufficient linear size.
      
      Changes
        v1->v2:
          - fix bounds checks:
            - subtract sizeof(vnet_hdr) before comparing tp_len to size_max
            - compare tp_len to size_max also with GSO, just do not truncate to MTU
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
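The first v2 bounds-check fix listed above (subtract the vnet header size before comparing tp_len to size_max) can be sketched as below. This is a simplified model, not the kernel code: the function name, the -1 error convention, and the constant are choices made for the sketch, with VNET_HDR_LEN set to 10 bytes on the assumption that it matches the size of the basic struct virtio_net_hdr.

```c
#include <stddef.h>

/* Assumed size in bytes of the basic struct virtio_net_hdr. */
#define VNET_HDR_LEN 10

/*
 * Hypothetical check for one tx ring slot. tp_len is the length recorded
 * in the slot, which includes the vnet header when has_vnet_hdr is set.
 * On success stores the payload length and returns 0; returns -1 if the
 * slot is too short for the header or the payload exceeds size_max.
 */
int slot_payload_len(size_t tp_len, size_t size_max, int has_vnet_hdr,
                     size_t *payload_len)
{
    if (has_vnet_hdr) {
        if (tp_len < VNET_HDR_LEN)
            return -1;
        tp_len -= VNET_HDR_LEN;   /* compare only the packet bytes */
    }
    if (tp_len > size_max)
        return -1;
    *payload_len = tp_len;
    return 0;
}
```

Without the subtraction, a packet that fits exactly in size_max would be rejected whenever a vnet header precedes it.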
    • packet: tpacket_snd gso and checksum offload · 1d036d25
      Willem de Bruijn authored
      Support socket option PACKET_VNET_HDR together with PACKET_TX_RING.
      
      When enabled, a struct virtio_net_hdr is expected to precede the data
      in the ring. The vnet option must be set before the ring is created.
      
      The implementation reuses the existing skb_copy_bits code that is used
      when dev->hard_header_len is non-zero. Move this ll_header check to
      before the skb alloc and combine it with a test for vnet_hdr->hdr_len.
      Allocate and copy the max of the two.
      
      Verified with test program at
      github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
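The "allocate and copy the max of the two" step amounts to choosing the larger of the link-layer header length and the header length advertised in the vnet header. A minimal sketch, with a hypothetical function name and plain integers standing in for dev->hard_header_len and vnet_hdr->hdr_len:

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes to place in the skb linear area.
 * hard_header_len models dev->hard_header_len; vnet_hdr_len models the
 * hdr_len field of the virtio_net_hdr (0 when no vnet header is in use).
 */
size_t linear_copy_len(size_t hard_header_len, size_t vnet_hdr_len)
{
    return hard_header_len > vnet_hdr_len ? hard_header_len : vnet_hdr_len;
}
```

With a 14-byte link-layer header and a GSO packet advertising 66 bytes of headers, the sketch yields 66, so all headers land in the linear section.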
    • packet: parse tpacket header before skb alloc · 8d39b4a6
      Willem de Bruijn authored
      GSO packet headers must be stored in the linear skb segment.
      Move tpacket header parsing before sock_alloc_send_skb. The GSO
      follow-on patch will later increase the skb linear argument to
      sock_alloc_send_skb if needed for large packets.
      
      The header parsing code does not require an allocated skb, so is
      safe to move. Later pass to tpacket_fill_skb the computed data
      start and length.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: vnet_hdr support for tpacket_rcv · 58d19b19
      Willem de Bruijn authored
      Support socket option PACKET_VNET_HDR together with PACKET_RX_RING.
      When enabled, a struct virtio_net_hdr will precede the data in the
      packet ring slots.
      
      Verified with test program at
      github.com/wdebruij/kerneltools/blob/master/tests/psock_rxring_vnet.c
      
        pkt: 1454269209.798420 len=5066
        vnet: gso_type=tcpv4 gso_size=1448 hlen=66 ecn=off
        csum: start=34 off=16
        eth: proto=0x800
        ip: src=<masked> dst=<masked> proto=6 len=5052
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
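The numbers in the sample output above are internally consistent, which a short calculation can confirm. Assuming a 14-byte Ethernet header and a 20-byte IPv4 header, csum start=34 is the start of the TCP header, and the IP length 5052 equals the packet length 5066 minus the Ethernet header. The sketch below, with a hypothetical function name, estimates how many segments the payload would be split into for the reported gso_size.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of segments needed to carry the payload of
 * a GSO packet. pkt_len is the full packet length, hdr_len the combined
 * header length (hlen in the sample), gso_size the payload per segment.
 * Returns 0 when there is no payload or gso_size is 0.
 */
size_t gso_segment_count(size_t pkt_len, size_t hdr_len, size_t gso_size)
{
    size_t payload;

    if (gso_size == 0 || pkt_len <= hdr_len)
        return 0;
    payload = pkt_len - hdr_len;
    return (payload + gso_size - 1) / gso_size;   /* round up */
}
```

For the sample packet, `gso_segment_count(5066, 66, 1448)` gives 4: the payload is 5000 bytes, which needs three full 1448-byte segments plus one partial segment.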