Commits · c125e80b88687b25b321795457309eaaee4bf270 · Kirill Smelkov / linux

11 Feb, 2016 10 commits

soreuseport: fast reuseport TCP socket selection · c125e80b

Craig Gallek authored Feb 10, 2016

This change extends the fast SO_REUSEPORT socket lookup implemented
for UDP to TCP.  Listener sockets with SO_REUSEPORT and the same
receive address are additionally added to an array for faster
random access.  This means that only a single socket from the group
must be found in the listener list before any socket in the group can
be used to receive a packet.  Previously, every socket in the group
needed to be considered before handing off the incoming packet.

This feature also exposes the ability to use a BPF program when
selecting a socket from a reuseport group.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
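A minimal user-space sketch of the idea, assuming a hypothetical `reuseport_group` that holds an array of listeners. Once any member of the group is found, the receiving socket is picked by scaling a flow hash onto the array length. This uses the same multiply-and-shift computation as the kernel's `reciprocal_scale()`. The struct names, the fixed array size and `select_listener()` are illustrative and are not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a listener socket and its reuseport group. */
struct listener {
    int id;
};

struct reuseport_group {
    uint16_t num_socks;          /* number of valid entries in socks[] */
    struct listener *socks[16];  /* members bound to the same receive address */
};

/* Map a 32-bit hash onto [0, n) without a division: (hash * n) >> 32,
 * the same computation as the kernel's reciprocal_scale(). */
static uint32_t scale_hash(uint32_t hash, uint32_t n)
{
    return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Constant-time pick from the group: no walk over every member is needed. */
static struct listener *select_listener(const struct reuseport_group *g,
                                        uint32_t flow_hash)
{
    if (g == NULL || g->num_socks == 0)
        return NULL;
    return g->socks[scale_hash(flow_hash, g->num_socks)];
}
```

When a BPF program is attached to the group, its return value selects the array index instead. That path is not modeled here.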

soreuseport: Prep for fast reuseport TCP socket selection · fa463497

Craig Gallek authored Feb 10, 2016

Both of the lines in this patch probably should have been included
in the initial implementation of this code for generic socket
support, but weren't technically necessary since only UDP sockets
were supported.

First, the sk_reuseport_cb points to a structure which assumes
each socket in the group has this pointer assigned at the same
time it's added to the array in the structure.  The sk_clone_lock
function breaks this assumption.  Since a child socket shouldn't
implicitly be in a reuseport group, the simple fix is to clear
the field in the clone.

Second, the SO_ATTACH_REUSEPORT_xBPF socket options require that
SO_REUSEPORT also be set first.  For UDP sockets, this is easily
enforced at bind-time since that process both puts the socket in
the appropriate receive hlist and updates the reuseport structures.
Since these operations can happen at two different times for TCP
sockets (bind and listen), the setsockopt call must explicitly check
that SO_REUSEPORT is set before allowing SO_ATTACH_REUSEPORT_xBPF.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
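Both fixes can be pictured with a toy model. The sketch below uses hypothetical `mini_sock` and `reuseport_group` types that carry only the fields discussed above. `clone_sock()` stands in for sk_clone_lock(), and `attach_reuseport_prog()` stands in for the setsockopt path. The -EINVAL return value is an assumed error code and is not taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified types: only the fields this patch is about. */
struct reuseport_group {
    int num_socks;
};

struct mini_sock {
    bool reuseport;                        /* SO_REUSEPORT was set */
    struct reuseport_group *reuseport_cb;  /* non-NULL once added to a group */
    const void *reuseport_prog;            /* stand-in for an attached program */
};

/* Fix 1: a clone copies the parent wholesale, but the child was never added
 * to the group's socket array, so it must not inherit the group pointer. */
static void clone_sock(struct mini_sock *child, const struct mini_sock *parent)
{
    memcpy(child, parent, sizeof(*child));
    child->reuseport_cb = NULL;
}

/* Fix 2: reject SO_ATTACH_REUSEPORT_xBPF unless SO_REUSEPORT came first,
 * because for TCP the group may not exist yet at setsockopt time.
 * The error code is an assumption of this sketch. */
static int attach_reuseport_prog(struct mini_sock *sk, const void *prog)
{
    if (!sk->reuseport)
        return -EINVAL;
    sk->reuseport_prog = prog;
    return 0;
}
```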

inet: refactor inet[6]_lookup functions to take skb · a583636a

Craig Gallek authored Feb 10, 2016

This is a preliminary step to allow fast socket lookup of SO_REUSEPORT
groups.  Doing so with a BPF filter will require access to the
skb in question.  This change plumbs the skb (and offset to payload
data) through the call stack to the listening socket lookup
implementations where it will be used in a following patch.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: __tcp_hdrlen() helper · d9b3fca2

Craig Gallek authored Feb 10, 2016

tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
This splits the size calculation into a helper function that can be
used if a struct tcphdr is already available.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
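In the kernel, the split means `__tcp_hdrlen(th)` returns `th->doff * 4`, and `tcp_hdrlen(skb)` calls it on `tcp_hdr(skb)`. The user-space sketch below does the same arithmetic on raw header bytes. `tcp_hdrlen_from_hdr()` is a hypothetical name, and the sketch assumes the buffer holds at least the 20-byte fixed TCP header.

```c
#include <stdint.h>

/* Hypothetical helper. The TCP data offset sits in the upper 4 bits of byte 12
 * and counts 32-bit words, so the header length in bytes is doff * 4. */
static unsigned int tcp_hdrlen_from_hdr(const uint8_t *th)
{
    return (unsigned int)(th[12] >> 4) * 4;
}
```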

inet: create IPv6-equivalent inet_hash function · 496611d7

Craig Gallek authored Feb 10, 2016

In order to support fast lookups for TCP sockets with SO_REUSEPORT,
the function that adds sockets to the listening hash set needs
to be able to check receive address equality.  Since this equality
check is different for IPv4 and IPv6, we will need two different
socket hashing functions.

This patch adds an inet6_hash function, identical to the existing inet_hash,
and updates the appropriate references.  A following patch will
differentiate the two by passing different comparison functions to
__inet_hash.

Additionally, in order to use the IPv6 address equality function from
inet6_hashtables (which is compiled as a built-in object when IPv6 is
enabled), that function must also live in a built-in object file.  This
moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
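This simplified sketch shows why one shared hashing routine needs a family-specific comparison. The equality check is passed in as a function pointer. All names are hypothetical, and the comparators test exact address equality only. The kernel's comparators also handle wildcard and IPv4-mapped addresses, which this sketch ignores.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical listener record: address family plus bound receive address. */
struct listen_addr {
    sa_family_t family;
    union {
        struct in_addr v4;
        struct in6_addr v6;
    } rcv_saddr;
};

typedef bool (*saddr_equal_fn)(const struct listen_addr *,
                               const struct listen_addr *);

/* Exact-match comparators only (no wildcard or v4-mapped handling). */
static bool v4_saddr_equal(const struct listen_addr *a, const struct listen_addr *b)
{
    return a->rcv_saddr.v4.s_addr == b->rcv_saddr.v4.s_addr;
}

static bool v6_saddr_equal(const struct listen_addr *a, const struct listen_addr *b)
{
    return memcmp(&a->rcv_saddr.v6, &b->rcv_saddr.v6, sizeof(struct in6_addr)) == 0;
}

/* Shared insertion path: return the index of an existing listener with the
 * same receive address (a candidate reuseport group), or -1 if none. */
static int find_same_saddr(const struct listen_addr *list, int n,
                           const struct listen_addr *new_sk, saddr_equal_fn eq)
{
    for (int i = 0; i < n; i++)
        if (list[i].family == new_sk->family && eq(&list[i], new_sk))
            return i;
    return -1;
}
```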

sock: struct proto hash function may error · 086c653f

Craig Gallek authored Feb 10, 2016

In order to support fast reuseport lookups in TCP, the hash function
defined in struct proto must be capable of returning an error code.
This patch changes the function signature of all related hash functions
to return an integer and handles or propagates this return value at
all call sites.
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
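A sketch of the signature change, using hypothetical `mini_proto` and `mini_sock` types. The hash hook returns an int, and a caller such as the listen path forwards a failure instead of carrying on. The allocation in `alloc_hash()` is only a placeholder for whatever work in the hash step can fail.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified socket and protocol table. */
struct mini_sock {
    void *group;  /* placeholder for state the hash step may need to allocate */
};

struct mini_proto {
    int (*hash)(struct mini_sock *sk);  /* previously returned void */
};

/* A hash implementation that can fail because it allocates. */
static int alloc_hash(struct mini_sock *sk)
{
    sk->group = malloc(64);
    return sk->group ? 0 : -ENOMEM;
}

/* Simulates the allocation failing, to exercise the error path. */
static int failing_hash(struct mini_sock *sk)
{
    (void)sk;
    return -ENOMEM;
}

/* A call site such as listen(): propagate the error to the caller. */
static int mini_listen(struct mini_sock *sk, const struct mini_proto *prot)
{
    int err = prot->hash(sk);

    if (err)
        return err;
    /* ... continue only once the socket is actually hashed ... */
    return 0;
}
```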

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · 30c1de08

David S. Miller authored Feb 11, 2016

Antonio Quartulli says:

====================
Here you have a batch of patches by Sven Eckelmann that
drops our private reference counting implementation and
replaces it with the kref objects/functions.

Then you have a patch, by Simon Wunderlich, that
makes the broadcast protection window code more generic so
that it can be re-used in the future by other components
with different requirements.

Lastly, Sven is also introducing two lockdep asserts in
functions operating on our TVLV container list, to make
sure that the proper lock is always acquired by the users.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'be2net-next' · dba6cf55

David S. Miller authored Feb 11, 2016

Ajit Khaparde says:

====================
be2net Patch series

Please consider applying these two patches to net-next

  Patch-1: Request RSS capability of Rx interface depending on number of
    Rx rings
  Patch-2: Interpret and log new data that's added to the port
    misconfigure async event
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Interpret and log new data that's added to the port misconfigure async event · 51d1f98a

Ajit Khaparde authored Feb 10, 2016

From FW version 11.0 onwards, the PORT_MISCONFIG event generated by the FW
will carry more information about the event in the "data_word1"
and "data_word2" fields. This patch adds support in the driver to parse the
new information and log it accordingly. This patch also changes some of the
messages that are being logged currently.
Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Request RSS capability of Rx interface depending on number of Rx rings · 62219066

Ajit Khaparde authored Feb 10, 2016

Currently we request RSS capability even if only a single Rx ring is created.
As a result, in a few cases we unnecessarily consume an RSS-capable interface,
which is a limited resource in the chip.
This patch enables RSS on an interface only if more than one Rx ring
is created.
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
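The rule comes down to a single condition. This minimal sketch uses made-up capability bits (`RX_IF_CAP_*`) instead of the driver's real flag names and values:

```c
#include <stdint.h>

/* Hypothetical capability bits for an Rx interface request; be2net uses its
 * own flag names and values. */
#define RX_IF_CAP_UNTAGGED (1u << 0)
#define RX_IF_CAP_RSS      (1u << 1)

/* RSS spreads flows across Rx rings. With a single ring there is nothing to
 * spread, so do not claim one of the chip's limited RSS-capable interfaces. */
static uint32_t rx_if_cap_flags(unsigned int num_rx_rings)
{
    uint32_t flags = RX_IF_CAP_UNTAGGED;

    if (num_rx_rings > 1)
        flags |= RX_IF_CAP_RSS;
    return flags;
}
```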

10 Feb, 2016 25 commits

batman-adv: Convert batadv_tt_common_entry to kref · 92dcdf09

Sven Eckelmann authored Jan 16, 2016

batman-adv uses a self-written reference counting implementation which is
just based on atomic_t. This is less obvious when reading the code than kref
and therefore increases the chance that the reference counting will be missed.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
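For readers unfamiliar with kref, the pattern looks roughly like this. The sketch is a user-space imitation built on C11 atomics, not the kernel implementation. `tt_entry` is a hypothetical object, and setting `freed` stands in for the kfree() a real release callback would do.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space imitation of the kernel's struct kref (not the real one). */
struct kref {
    atomic_int refcount;
};

static void kref_init(struct kref *k) { atomic_init(&k->refcount, 1); }
static void kref_get(struct kref *k)  { atomic_fetch_add(&k->refcount, 1); }

/* Drop a reference and call release() when the last one goes away.
 * Returns 1 if the object was released, 0 otherwise. */
static int kref_put(struct kref *k, void (*release)(struct kref *k))
{
    if (atomic_fetch_sub(&k->refcount, 1) == 1) {
        release(k);
        return 1;
    }
    return 0;
}

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical refcounted object in the style of a batman-adv entry. */
struct tt_entry {
    int id;
    struct kref refcount;
    bool freed;  /* stands in for kfree() so the effect is observable */
};

static void tt_entry_release(struct kref *ref)
{
    struct tt_entry *e = container_of(ref, struct tt_entry, refcount);

    e->freed = true;
}
```

Unlike a bare atomic_t, every put is paired with a named release callback, so the point where an object dies is explicit in the code.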

batman-adv: Convert batadv_orig_node to kref · 7c124391

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_orig_node_vlan to kref · 161a3be9

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_hard_iface to kref · 7a659d56

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_neigh_node to kref · 77ae32e8

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_orig_ifinfo to kref · a6ba0d34

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_neigh_ifinfo to kref · 962c6832

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_tt_orig_list_entry to kref · 6e8ef69d

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_tvlv_handler to kref · 32836f56

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_tvlv_container to kref · f7157dd1

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_dat_entry to kref · 68a6722c

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_nc_path to kref · 727e0cd5

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_nc_node to kref · daf99b48

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_bla_claim to kref · 71b7e3d3

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_bla_backbone_gw to kref · 06e56ded

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_softif_vlan to kref · 6be4d30c

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_gw_node to kref · e7aed321

Sven Eckelmann authored Jan 16, 2016

batman-adv: Convert batadv_hardif_neigh_node to kref · 90f564df

Sven Eckelmann authored Jan 16, 2016

batman-adv: Add lockdep assert for container_list_lock · dded0692

Sven Eckelmann authored Dec 20, 2015

The batadv_tvlv_container* functions state in their kernel-doc that they
require tvlv.container_list_lock. Add an assert to automatically detect
when this might have been ignored by the caller.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
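A user-space analogue of the pattern, assuming a hypothetical `tracked_lock` that only records whether it is held. The real code calls lockdep_assert_held() on a spinlock. lockdep also knows which context holds the lock, and this sketch does not model that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock that remembers whether it is held.
 * Unlike lockdep, it does not check which context holds the lock. */
struct tracked_lock {
    bool held;
};

static void tl_lock(struct tracked_lock *l)   { assert(!l->held); l->held = true; }
static void tl_unlock(struct tracked_lock *l) { assert(l->held);  l->held = false; }

/* Rough analogue of lockdep_assert_held(): complain if the caller forgot the lock. */
#define assert_lock_held(l) assert((l)->held)

struct tvlv_state {
    struct tracked_lock container_list_lock;
    int num_containers;
};

/* Documented as "caller must hold container_list_lock"; now also checked. */
static void container_register(struct tvlv_state *tvlv)
{
    assert_lock_held(&tvlv->container_list_lock);
    tvlv->num_containers++;
}
```

lockdep is compiled out without CONFIG_LOCKDEP. In the same way, these checks disappear when NDEBUG is defined.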

batman-adv: add seqno maximum age and protection start flag parameters · 81f02683

Simon Wunderlich authored Nov 23, 2015

To allow future use of the window protected function with different
maximum sequence numbers, add a parameter to set this value which
was previously hardcoded. Another parameter added for future use is a
flag to return whether the protection window has started.

While at it, also fix the kerneldoc.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
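A user-space sketch of a protection window with the two new parameters: `seq_old_max_diff` and the optional `protection_started` output. It follows the general shape of the batman-adv check as best understood. The names, the millisecond clock that replaces jiffies, and the two constants are placeholders and are not taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder constants for this sketch; the real values live in batman-adv. */
#define EXPECTED_SEQNO_RANGE 65536
#define RESET_PROTECTION_MS  30000

/* Returns true if the packet should be dropped. A large jump in sequence
 * numbers is accepted only if the last reset is older than the protection
 * time. In that case a new window starts and *protection_started is set
 * (if the caller asked for it). */
static bool window_protected(int32_t seq_num_diff, int32_t seq_old_max_diff,
                             uint64_t now_ms, uint64_t *last_reset_ms,
                             bool *protection_started)
{
    if (seq_num_diff <= -seq_old_max_diff ||
        seq_num_diff >= EXPECTED_SEQNO_RANGE) {
        if (now_ms - *last_reset_ms < RESET_PROTECTION_MS)
            return true;  /* still inside the protection window: drop */

        *last_reset_ms = now_ms;  /* start a new protection window */
        if (protection_started)
            *protection_started = true;
    }
    return false;
}
```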

batman-adv: Drop reference to netdevice on last reference · 140ed8e8

Sven Eckelmann authored Jan 05, 2016

The references to the network device should be dropped inside the release
function for batadv_hard_iface, similar to what is done with the batman-adv
internal data structures.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
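A simplified model of the change. The hard interface holds a reference on its network device, and that reference is dropped in the same callback that releases the interface itself. The following are hypothetical stand-ins: `fake_netdev` for struct net_device, `fake_dev_hold()`/`fake_dev_put()` for dev_hold()/dev_put(), and a non-atomic counter for the kref-based release.

```c
/* Hypothetical user-space stand-ins: fake_netdev for struct net_device,
 * fake_dev_hold()/fake_dev_put() for dev_hold()/dev_put(), and a plain int
 * instead of a kref. */
struct fake_netdev {
    int refcnt;
};

static void fake_dev_hold(struct fake_netdev *d) { d->refcnt++; }
static void fake_dev_put(struct fake_netdev *d)  { d->refcnt--; }

struct hard_iface {
    int refcount;                 /* references to the interface object */
    struct fake_netdev *net_dev;  /* held for the lifetime of the interface */
    int released;                 /* stands in for freeing the object */
};

/* Release callback: runs only when the last interface reference is gone,
 * and drops the netdev reference together with the object. */
static void hard_iface_release(struct hard_iface *h)
{
    fake_dev_put(h->net_dev);
    h->released = 1;
}

static void hard_iface_put(struct hard_iface *h)
{
    if (--h->refcount == 0)
        hard_iface_release(h);
}
```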

sxgbe: remove unused code · aaa56720

Jean Sacren authored Feb 09, 2016

Remove the unused code of sxgbe_xpcs.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Byungho An <bh74.an@samsung.com>
Cc: Girish K S <ks.giri@samsung.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1601191918470.2531@hadrien
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'renesas-bit-twiddling' · 12f08412

David S. Miller authored Feb 10, 2016

Sergei Shtylyov says:

====================
Factor out register bit twiddling in the Renesas Ethernet drivers

   Here's a set of 2 patches against DaveM's 'net-next.git' repo. We factor out
the often repeated pattern of reading a register, AND'ing and/or OR'ing some
bits, and then writing the value back.

[1/2] ravb: factor out register bit twiddling code
[2/2] sh_eth: factor out register bit twiddling code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: factor out register bit twiddling code · b2b14d2f

Sergei Shtylyov authored Feb 10, 2016

The driver has an often-repeated pattern of reading a register, AND'ing and/or
OR'ing some bits, and writing the value back. Factor the pattern out into
sh_eth_modify() -- this saves 84 bytes of code with ARM gcc 4.7.3.

While at it, update Cogent Embedded's copyright.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
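The helper reduces to one read-modify-write. This sketch uses a plain array in place of the MMIO registers. `reg_read()`, `reg_write()` and `reg_modify()` are hypothetical names, with `reg_modify()` playing the role of sh_eth_modify()/ravb_modify():

```c
#include <stdint.h>

/* Hypothetical register file standing in for MMIO; the drivers use their own
 * register accessors. */
static uint32_t regs[8];

static uint32_t reg_read(int reg)                { return regs[reg]; }
static void     reg_write(uint32_t val, int reg) { regs[reg] = val; }

/* Read-modify-write helper: clear the bits in `clear`, then set those in `set`. */
static void reg_modify(int reg, uint32_t clear, uint32_t set)
{
    reg_write((reg_read(reg) & ~clear) | set, reg);
}
```

Each former read/mask/write sequence at a call site becomes a single `reg_modify(reg, clear, set)` call.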

ravb: factor out register bit twiddling code · 568b3ce7

Sergei Shtylyov authored Feb 10, 2016

The driver has an often-repeated pattern of reading a register, AND'ing and/or
OR'ing some bits, and writing the value back. Factor the pattern out into
ravb_modify() -- this saves 260 bytes of code with ARM gcc 4.7.3.

While at it, update Cogent Embedded's copyrights.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Feb, 2016 5 commits

Merge branch 'tpacket-gso-csum-offload' · ef5c0e25

David S. Miller authored Feb 09, 2016

Willem de Bruijn says:

====================
packet: tpacket gso and csum offload

Extend PACKET_VNET_HDR socket option support to packet sockets with
memory mapped rings.

Patches 2 and 4 add support to tpacket_rcv and tpacket_snd.

Patch 1 prepares for this by moving the relevant virtio_net_hdr
logic out of packet_snd and packet_rcv into helper functions.

GSO transmission requires all headers in the skb linear section.
Patch 3 moves parsing of tx_ring slot headers before skb allocation
to enable allocation with sufficient linear size.

Changes
  v1->v2:
    - fix bounds checks:
      - subtract sizeof(vnet_hdr) before comparing tp_len to size_max
      - compare tp_len to size_max also with GSO, just do not truncate to MTU
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: tpacket_snd gso and checksum offload · 1d036d25

Willem de Bruijn authored Feb 03, 2016

Support socket option PACKET_VNET_HDR together with PACKET_TX_RING.

When enabled, a struct virtio_net_hdr is expected to precede the data
in the ring. The vnet option must be set before the ring is created.

The implementation reuses the existing skb_copy_bits code that is used
when dev->hard_header_len is non-zero. Move this ll_header check to
before the skb alloc and combine it with a test for vnet_hdr->hdr_len.
Allocate and copy the max of the two.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
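A sketch of the sizing rule. The copy always covers the link-layer header. When a vnet header is present, it also covers at least `hdr_len` bytes, so GSO sees all headers in the linear area. The `vnet_hdr` struct mirrors the field layout of `struct virtio_net_hdr`, with host byte order assumed. `linear_copy_len()` is hypothetical, and its rejection of an oversized `hdr_len` is a simplification of the kernel's validation.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the struct virtio_net_hdr field layout (host byte order assumed). */
struct vnet_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;     /* bytes of headers GSO needs in the linear area */
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Hypothetical helper: bytes to copy into the linear area up front, i.e. the
 * max of the link-layer header length and vnet hdr_len. Returns -1 if the
 * vnet header claims more header bytes than the frame contains. */
static long linear_copy_len(size_t hard_header_len, const struct vnet_hdr *vh,
                            size_t data_len)
{
    size_t len = hard_header_len;

    if (vh) {
        if (vh->hdr_len > data_len)
            return -1;
        if (vh->hdr_len > len)
            len = vh->hdr_len;
    }
    return (long)len;
}
```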

packet: parse tpacket header before skb alloc · 8d39b4a6

Willem de Bruijn authored Feb 03, 2016

GSO packet headers must be stored in the linear skb segment.
Move tpacket header parsing before sock_alloc_send_skb. The GSO
follow-on patch will later increase the skb linear argument to
sock_alloc_send_skb if needed for large packets.

The header parsing code does not require an allocated skb, so it is
safe to move. Later, pass the computed data start and length to
tpacket_fill_skb.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: vnet_hdr support for tpacket_rcv · 58d19b19

Willem de Bruijn authored Feb 03, 2016

Support socket option PACKET_VNET_HDR together with PACKET_RX_RING.
When enabled, a struct virtio_net_hdr will precede the data in the
packet ring slots.

Verified with test program at
github.com/wdebruij/kerneltools/blob/master/tests/psock_rxring_vnet.c

  pkt: 1454269209.798420 len=5066
  vnet: gso_type=tcpv4 gso_size=1448 hlen=66 ecn=off
  csum: start=34 off=16
  eth: proto=0x800
  ip: src=<masked> dst=<masked> proto=6 len=5052
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
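Reading the header back from a ring slot can be sketched as follows. The sketch assumes that the vnet header sits immediately before the frame data, in host byte order. The local struct mirrors `struct virtio_net_hdr`, and the `VNET_GSO_*` values match the `VIRTIO_NET_HDR_GSO_*` constants. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct virtio_net_hdr layout (10 bytes, no padding);
 * host byte order is assumed. */
struct vnet_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Same values as the VIRTIO_NET_HDR_GSO_* constants. */
#define VNET_GSO_NONE  0
#define VNET_GSO_TCPV4 1
#define VNET_GSO_UDP   3
#define VNET_GSO_TCPV6 4
#define VNET_GSO_ECN   0x80

/* Hypothetical helper: `frame` points at the packet data in a ring slot, and
 * the vnet header is assumed to sit right before it. memcpy avoids
 * unaligned access. */
static struct vnet_hdr read_vnet_hdr(const uint8_t *frame)
{
    struct vnet_hdr h;

    memcpy(&h, frame - sizeof(h), sizeof(h));
    return h;
}

/* Hypothetical helper: name the GSO type, ignoring the ECN bit. */
static const char *gso_name(uint8_t gso_type)
{
    switch (gso_type & ~VNET_GSO_ECN) {
    case VNET_GSO_NONE:  return "none";
    case VNET_GSO_TCPV4: return "tcpv4";
    case VNET_GSO_UDP:   return "udp";
    case VNET_GSO_TCPV6: return "tcpv6";
    default:             return "unknown";
    }
}
```

Given a header carrying the values from the sample output above (gso_size=1448, hlen=66, csum start=34 off=16), `read_vnet_hdr()` returns the same fields and `gso_name()` reports "tcpv4".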

packet: move vnet_hdr code to helper functions · 16cc1400

Willem de Bruijn authored Feb 03, 2016

packet_snd and packet_rcv support virtio net headers for GSO.
Move this logic into helper functions to be able to reuse it in
tpacket_snd and tpacket_rcv.

This is a straightforward code move with one exception. Instead of
creating and passing a separate gso_type variable, reuse
vnet_hdr.gso_type after conversion from virtio to kernel gso type.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
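The in-place conversion can be sketched like this. The `VNET_GSO_*` values match the virtio constants. The `K_GSO_*` flags are made up for illustration and do not equal the kernel's `SKB_GSO_*` values. The only part taken from the commit is the idea of overwriting `gso_type` instead of keeping a separate variable.

```c
#include <stdint.h>

/* Local mirror of the struct virtio_net_hdr field layout. */
struct vnet_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* Same values as the VIRTIO_NET_HDR_GSO_* constants. */
#define VNET_GSO_NONE  0
#define VNET_GSO_TCPV4 1
#define VNET_GSO_UDP   3
#define VNET_GSO_TCPV6 4
#define VNET_GSO_ECN   0x80

/* Hypothetical kernel-side GSO flags for this sketch only. */
#define K_GSO_TCPV4 (1u << 0)
#define K_GSO_UDP   (1u << 1)
#define K_GSO_TCPV6 (1u << 2)
#define K_GSO_ECN   (1u << 3)

/* Overwrite vh->gso_type with the kernel-style flags, so callers need no
 * separate gso_type variable. Returns 0, or -1 for an unknown type. */
static int vnet_gso_to_kernel(struct vnet_hdr *vh)
{
    uint8_t t;

    switch (vh->gso_type & ~VNET_GSO_ECN) {
    case VNET_GSO_NONE:  t = 0;           break;
    case VNET_GSO_TCPV4: t = K_GSO_TCPV4; break;
    case VNET_GSO_UDP:   t = K_GSO_UDP;   break;
    case VNET_GSO_TCPV6: t = K_GSO_TCPV6; break;
    default:             return -1;
    }
    if (vh->gso_type & VNET_GSO_ECN)
        t |= K_GSO_ECN;
    vh->gso_type = t;
    return 0;
}
```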
