Commits · 61ed713bbb043f333ca9576c79a3d33d2ad17438 · Kirill Smelkov / linux

18 Aug, 2015 29 commits

Merge branch 'drivers_iff_no_queue' · 61ed713b

David S. Miller authored Aug 18, 2015

Phil Sutter says:

====================
net: Convert drivers to IFF_NO_QUEUE and cleanup afterwards

This series converts in-tree users away from the old and deprecated
'tx_queue_len = 0' idiom, adds a warning to notify out-of-tree driver
maintainers that there is need for action on their behalf and finally drops any
workarounds in scheduling algorithm implementations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

61ed713b

net: sched: drop all special handling of tx_queue_len == 0 · 348e3435

Phil Sutter authored Aug 18, 2015

Those were all workarounds for the formerly double meaning of
tx_queue_len, which broke scheduling algorithms if untreated.

Now that all in-tree drivers have been converted away from setting
tx_queue_len = 0, it should be safe to drop these workarounds for
categorically broken setups.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

348e3435

net: warn if drivers set tx_queue_len = 0 · 906470c1

Phil Sutter authored Aug 18, 2015

Due to the introduction of IFF_NO_QUEUE, there is a better way for
drivers to indicate that no qdisc should be attached by default. Though,
the old convention can't be dropped since ignoring that setting would
break drivers still using it. Instead, add a warning so out-of-tree
driver maintainers get a chance to adjust their code before we finally
get rid of any special handling of tx_queue_len == 0.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

906470c1

staging: wilc1000: convert to using IFF_NO_QUEUE · 7d9e437d

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Johnny Kim <johnny.kim@atmel.com>
Cc: Rachel Kim <rachel.kim@atmel.com>
Cc: Dean Lee <dean.lee@atmel.com>
Cc: Chris Park <chris.park@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d9e437d

net: caif: convert to using IFF_NO_QUEUE · 4676a152

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

4676a152

net: hsr: convert to using IFF_NO_QUEUE · 9ad09c5c

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ad09c5c

net: batman-adv: convert to using IFF_NO_QUEUE · cdf73703

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdf73703

net: mac80211_hwsim: convert to using IFF_NO_QUEUE · 3db6da1f

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

3db6da1f

net: hostap: convert to using IFF_NO_QUEUE · 3a9c0a1b

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jouni Malinen <j@w1.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a9c0a1b

net: dsa: convert to using IFF_NO_QUEUE · 0a5f107b

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a5f107b

net: ipvlan: convert to using IFF_NO_QUEUE · bf485bcf

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf485bcf

net: bonding: convert to using IFF_NO_QUEUE · 1e6f20ca

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e6f20ca

net: 6lowpan: convert to using IFF_NO_QUEUE · 4afbc0db

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4afbc0db

net: bridge: convert to using IFF_NO_QUEUE · ccecb2a4

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ccecb2a4

net: 8021q: convert to using IFF_NO_QUEUE · 2e659c05

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e659c05

net: vxlan: convert to using IFF_NO_QUEUE · 22dba393

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

22dba393

net: team: convert to using IFF_NO_QUEUE · 22e380a6

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

22e380a6

net: nlmon: convert to using IFF_NO_QUEUE · 85773a61

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

85773a61

net: loopback: convert to using IFF_NO_QUEUE · e65db2b7

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

e65db2b7

net: geneve: convert to using IFF_NO_QUEUE · ed961ac2

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed961ac2

net: dummy: convert to using IFF_NO_QUEUE · ff42c02c

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff42c02c

net: veth: enable noqueue operation by default · 02f01ec1

Phil Sutter authored Aug 18, 2015

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

02f01ec1

Merge branch 'Identifier-Locator-Addressing' · 0b233dc7

David S. Miller authored Aug 17, 2015

Tom Herbert says:

====================
net: Identifier Locator Addressing - Part I

This patch set provides rudimentary support for Identifier Locator
Addressing or ILA. The basic concept of ILA is that we split an IPv6
address into a 64 bit locator and 64 bit identifier. The identifier is
the identity of an entity in communication ("who"), and the locator
expresses the location of the entity ("where"). Applications
use externally visible address that contains the identifier.
When a packet is actually sent, a translation is done that
overwrites the first 64 bits of the address with a locator.
The packet can then be forwarded over the network to the host where
the addressed entity is located. At the receiver, the reverse
translation is done so the that the application sees the original,
untranslated address. Presumably an external control plane will
provide identifier->locator mappings.

v2:
  - Fix compilation erros when LWT not configured
  - Consolidate ILA into a single ila.c

v3:
  - Change pseudohdr argument od inet_proto_csum_replace functions to
    be a bool

v4:
  - In ila_build_state check locator being in netlink params before
    allocating tunnel state

The data path for ILA is a simple NAT translation that only operates
on the upper 64 bits of a destination address in IPv6 packets. The
basic process is:

   1) Lookup 64 bit identifier (lower 64 bits of destination)
   2) If a match is found
      a) Overwrite locator (upper 64 bits of destination) with
         the new locator
      b) Adjust any checksum that has destination address included in
         pseudo header
   3) Send or receive packet

ILA is a means to implement tunnels or network virtualization without
encapsulation. Since there is no encapsulation involved, we assume that
stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
just works. Also, since we're minimally changing the packet many of
the worries about encapsulation (MTU, checksum, fragmentation) are
not relevant. The downside is that, ILA is not extensible like other
encapsulations (GUE for instance) so it might not be appropriate for
all use cases. Also, this only makes sense to do in IPv6!

A key aspect of ILA is performance. The intent is that ILA would be
used in data centers in virtualizing tasks or jobs. In the fullest
incarnation all intra data center communications might be targeted to
virtual ILA addresses. This is basically adding a new virtualization
capability to the existing services in a datacenter, so there is a
strong expectation is that this does not degrade performance for
existing applications.

Performance seems to be dependent on how ILA is hooked into kernel.
ILA can be implemented under some different models:

  - Mechanically it is a form a stateless DNAT
  - It can be thought of as a type of (source) routing
  - As a functional replacement of encapsulation

In this patch set we hook into the data path using Light Weight
Tunnels (LWT) infrastructure. As part of that, we add support in LWT
to redirect dst input. iproute will be modified to take a new ila encap
type. ILA can be configured like:

ip route add 3333:0:0:1:5555:0:2:0/128 \
   encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0

ip -6 addr add 3333:0:0:1:5555:0:1:0/128 dev eth0

ip route add table local local 2001:0:0:1:5555:0:1:0/128
   encap ila 3333:0:0:1 dev lo

So sending to destination 3333:0:0:1:5555:0:2:0 will have destination
of 2001:0:0:2:5555:0:2:0 on the wire.

Performance results are below. With ILA we see about a 10% drop in
pps compared to non-ILA. Much of this drop can be attributed to the
loss of early demux on input (translation occurs after it is attempted).
We will address this in the next patch set. Also, IPvlan input path
does not work with ILA since the routing is bypassed-- this will
be addressed in a future patch.

Performance testing:

Performing netperf TCP_RR with 200 clients:

Non-ILA baseline
  84.92% CPU utilization
  1861922.9 tps
  93/163/330 50/90/99% latencies

ILA single destination
  83.16% CPU utilization
  1679683.4 tps
  105/180/332 50/90/99% latencies

References:

Slides from netconf:
http://vger.kernel.org/netconf2015Herbert-ILA.pdf

Slides from presentation at IETF:
https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf

I-D:
https://tools.ietf.org/html/draft-herbert-nvo3-ila-00
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0b233dc7

net: Identifier Locator Addressing module · 65d7ab8d

Tom Herbert authored Aug 17, 2015

Adding new module name ila. This implements ILA translation. Light
weight tunnel redirection is used to perform the translation in
the data path. This is configured by the "ip -6 route" command
using the "encap ila <locator>" option, where <locator> is the
value to set in destination locator of the packet. e.g.

ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
      encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0

Sets a route where 3333:0:0:1 will be overwritten by
2001:0:0:1 on output.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65d7ab8d

net: Add inet_proto_csum_replace_by_diff utility function · abc5d1ff

Tom Herbert authored Aug 17, 2015

This function updates a checksum field value and skb->csum based on
a value which is the difference between the old and new checksum.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abc5d1ff

net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool · 4b048d6d

Tom Herbert authored Aug 17, 2015

inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
the checksum field carries a pseudo header. This argument should be a
boolean instead of an int.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b048d6d

lwt: Add support to redirect dst.input · 25368623

Tom Herbert authored Aug 17, 2015

This patch adds the capability to redirect dst input in the same way
that dst output is redirected by LWT.

Also, save the original dst.input and and dst.out when setting up
lwtunnel redirection. These can be called by the client as a pass-
through.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25368623

enic: Fix sparse warning in vnic_devcmd_init(). · f376d4ad

David S. Miller authored Aug 17, 2015

>> drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: sparse: incorrect type in assignment (different address spaces)
   drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    expected void *res
   drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    got void [noderef] <asn:2>*
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f376d4ad

mlx5e: Fix sparse warnings in mlx5e_handle_csum(). · ecf842f6

David S. Miller authored Aug 17, 2015

>> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44: sparse: incorrect type in argument 1 (different base types)
   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    expected restricted __sum16 [usertype] n
   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    got restricted __be16 [usertype] check_sum
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecf842f6

17 Aug, 2015 11 commits

inet: Move VRF table lookup to inlined function · dc028da5

David Ahern authored Aug 16, 2015

Table lookup compiles out when VRF is not enabled.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc028da5

net: Fix docbook warning for IFF_VRF_MASTER enum · 808d28c4

David Ahern authored Aug 16, 2015

kbuild test robot reported:
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   d52736e2
commit: 4e3c8992 [751/762] net: Introduce VRF related flags and helpers
reproduce: make htmldocs

>> Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

808d28c4

net: Updates to netif_index_is_vrf · 2f52bdcf

David Ahern authored Aug 16, 2015

As Eric noted netif_index_is_vrf is not called with rcu_read_lock held,
so wrap the dev_get_by_index_rcu in rcu_read_lock and unlock.

If VRF is not enabled or oif is 0 skip the device lookup. In both cases
index cannot be the VRF master.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f52bdcf

Merge branch 'mlx5e-next' · 9cd3778c

David S. Miller authored Aug 17, 2015

Achiad Shochat says:

====================
Driver updates 16-Aug-2015

This patchset contains bug fixes, new RSS and pause parameters ethtool
options, and support for RX CHECKSUM_COMPLETE.

Patchset was applied and tested over commit adc6310 ("Merge branch
'mv88e6xxx-switchdev-fdb'").
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9cd3778c

net/mlx5e: Support RX CHECKSUM_COMPLETE · bbceefce

Achiad Shochat authored Aug 16, 2015

Only for packets with first ethertype set to IPv4/6 for now.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbceefce

net/mlx5e: Support ethtool get/set_pauseparam · 3c2d18ef

Achiad Shochat authored Aug 16, 2015

Only rx/tx pause settings.
Autoneg setting is currently not supported.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c2d18ef

net/mlx5e: Ethtool link speed setting fixes · 6fa1bcab

Achiad Shochat authored Aug 16, 2015

- Port speed settings are applied by the device only upon
  port admin status transition from DOWN to UP.
  So we enforce this transition regardless of the port's
  current operation state (which may be occasionally DOWN if
  for example the network cable is disconnected).
- Fix the PORT_UP/DOWN device interface enum
- Set the local_port bit in the device PAOS register
- EXPORT the PAOS (Port Administrative and Operational Status)
  register set/query access functions.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6fa1bcab

net/mlx5e: HW LRO changes/fixes · d9a40271

Achiad Shochat authored Aug 16, 2015

- Change the maximum LRO session size from 16KB to 64KB
- Reduce the LRO session timeout from 512us to 32us in
  order to reduce the TCP latency of non-LRO'ed flows.
- Fix skb_shinfo(skb)->gso_size and set skb_shinfo(skb)->gso_type.
- Fix a bug accessing un-initialized mdev pointer.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9a40271

net/mlx5e: Support smaller RX/TX ring sizes · e842b100

Achiad Shochat authored Aug 16, 2015

We un-intentionally limited the minimum rings size too much.

TX minimum ring size reduced from 128 to 64.
RX minimum ring size reduced from 128 to 2.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e842b100

net/mlx5e: Add ethtool RSS configuration options · 2d75b2bc

Achiad Shochat authored Aug 16, 2015

- get_rxfh_key_size
- get_rxfh_indir_size
- get/set_rxfh indirection table and RSS Toeplitz hash key
- get_rxnfc
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d75b2bc

net/mlx5e: Make RSS indirection table size a constant · 936896e9

Achiad Shochat authored Aug 16, 2015

The indirection table size was defined by a variable that
was actually assigned a constant value.
Since we do not have any forseen intension to make it configurable
we simply made it a constant.

We also limit the number of channels such that the RSS indirection
table could always populate all RX rings.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

936896e9