Commits · 7dcc18adad31d15d528414bdff12bf98d33d9a20 · Kirill Smelkov / linux

18 Jul, 2017 40 commits

mlxsw: spectrum_router: Update prefix count for IPv6 · 7dcc18ad

Ido Schimmel authored Jul 18, 2017

The number of possible prefix lengths for IPv6 is 129 and not 128.

Fixes following warning from UBSAN when /128 routes are offloaded:

 UBSAN: Undefined behaviour in
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:2510:27 index 128 is out
of range for type 'long unsigned int [128]'

Fixes: 5e9c16cc ("mlxsw: spectrum_router: Implement private fib")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
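
The off-by-one can be illustrated with a minimal sketch. This is not the driver's code: the names `PREFIX_COUNT_IPV6`, `prefix_usage` and `prefix_usage_set` are hypothetical. IPv6 prefix lengths run from /0 to /128 inclusive, so a table indexed by prefix length needs 129 slots.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
#define IPV6_ADDR_BITS    128
/* Prefix lengths 0..128 inclusive, hence one more than the bit count. */
#define PREFIX_COUNT_IPV6 (IPV6_ADDR_BITS + 1)

struct prefix_usage {
    /* One flag per possible prefix length, indexed by the length itself. */
    bool used[PREFIX_COUNT_IPV6];
};

/* Mark a prefix length as in use; returns false if the length is invalid. */
static bool prefix_usage_set(struct prefix_usage *pu, unsigned int prefix_len)
{
    if (prefix_len >= PREFIX_COUNT_IPV6)
        return false;
    pu->used[prefix_len] = true;
    return true;
}
```

With a table of only 128 entries, marking a /128 host route would write one element past the end, which is the kind of access UBSAN reports.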

7dcc18ad

mlxsw: spectrum_router: Rename functions to add / delete a FIB entry · 80c238f9

Ido Schimmel authored Jul 18, 2017

These functions aren't specific to IPv4 and can be re-used for IPv6.

Drop the '4' designation from their name.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80c238f9

mlxsw: spectrum_router: Drop unnecessary parameter · 9efbee6f

Ido Schimmel authored Jul 18, 2017

Functions that take as argument a FIB entry don't need to take FIB node
as well, as it can be extracted from the entry.

Remove unnecessary FIB node parameter.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9efbee6f

mlxsw: spectrum_router: Mark IPv4 specific function accordingly · 0e6ea2a4

Ido Schimmel authored Jul 18, 2017

The functions to create and destroy a nexthop group are IPv4 specific
and should be renamed accordingly, so that they won't be confused with
the IPv6 specific functions in follow-up patches.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e6ea2a4

mlxsw: spectrum_router: Create IPv4 specific entry struct · 4f1c7f1f

Ido Schimmel authored Jul 18, 2017

Some of the parameters stored in the FIB entry structure are specific to
IPv4 and therefore better placed in an IPv4 specific structure.

Create an IPv4 specific structure that encapsulates the common FIB entry
structure and contains IPv4 specific parameters.

In a follow-up patchset an IPv6 specific structure will be introduced.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f1c7f1f

mlxsw: spectrum_router: Set abort trap for IPv6 · bc65a8a4

Ido Schimmel authored Jul 18, 2017

When we fail to insert a route we invoke the abort mechanism which
flushes all the tables and inserts a default route in each, so that all
packets incoming to the router will be trapped to the CPU.

Upon abort, add an IPv6 default route to the IPv6 tables.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc65a8a4

mlxsw: spectrum_router: Allow IPv6 routes to be programmed · 9dbf4d76

Ido Schimmel authored Jul 18, 2017

Take advantage of previous patch and allow the RALUE register to be
called with IPv6 routes.

In order to re-use as much code as possible between IPv4 and IPv6, only
the lowest-level function that actually does the register packing is
demuxed based on the passed protocol.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
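
A rough sketch of the "demux only at the lowest level" idea, under stated assumptions: the types `route_key` and `reg_payload` and the function `reg_pack` are hypothetical and do not reflect the real RALUE layout. Callers stay protocol agnostic; only the packing step branches on the protocol.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical types, for illustration only. */
enum l3proto { L3_PROTO_IPV4, L3_PROTO_IPV6 };

struct route_key {
    enum l3proto proto;
    uint8_t addr[16];      /* IPv4 uses only the first 4 bytes */
    uint8_t prefix_len;
};

struct reg_payload {
    uint8_t dip[16];       /* destination prefix as packed for the device */
    uint8_t dip_len;       /* number of valid bytes in dip */
    uint8_t prefix_len;
};

/* The only place that looks at the protocol; returns 0 or -1. */
static int reg_pack(struct reg_payload *pl, const struct route_key *key)
{
    memset(pl, 0, sizeof(*pl));
    pl->prefix_len = key->prefix_len;

    switch (key->proto) {
    case L3_PROTO_IPV4:
        pl->dip_len = 4;
        break;
    case L3_PROTO_IPV6:
        pl->dip_len = 16;
        break;
    default:
        return -1;
    }
    memcpy(pl->dip, key->addr, pl->dip_len);
    return 0;
}
```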

9dbf4d76

mlxsw: reg: Update RALUE register with IPv6 support · 62547f40

Ido Schimmel authored Jul 18, 2017

Update the register so that IPv6 LPM entries could be programmed to the
device's table.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62547f40

mlxsw: spectrum_router: Extend virtual routers with IPv6 support · a3d9bc50

Ido Schimmel authored Jul 18, 2017

A Virtual Router (VR) is an entity which corresponds to a VRF and
performs FIB lookup in an LPM tree according to the {VR, IP Proto} ->
Tree binding.

Extend the virtual router data structure towards IPv6 FIB offload.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3d9bc50

mlxsw: spectrum_router: Make FIB node retrieval family agnostic · 731ea1ca

Ido Schimmel authored Jul 18, 2017

A FIB node is an entity which stores routes sharing the same prefix and
length. The data structure itself is already family agnostic, but we
make some of its operations agnostic as well and thus re-use them for
IPv6 offload.

Instead of passing an IPv4-specific structure to fib4_node_get(), pass
general routing parameters and rename the function accordingly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

731ea1ca

mlxsw: spectrum_router: Don't create FIB node during lookup · 160e22aa

Ido Schimmel authored Jul 18, 2017

When looking up a FIB entry we shouldn't create the FIB node where it's
supposed to be linked in case the node doesn't already exist.

Instead, look up the node and fail if it doesn't exist.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
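
The difference between a pure lookup and a "get" that creates on a miss can be sketched as follows. This is an illustration under assumptions, not the driver's data structure: the fixed-size table and the names `fib_key`, `fib_node_lookup` and `fib_node_get` are hypothetical. The key is family agnostic (address bytes plus prefix length).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NODES 8        /* hypothetical table size */

struct fib_key {
    uint8_t addr[16];      /* IPv4 or IPv6 address bytes */
    uint8_t prefix_len;
};

struct fib_node {
    struct fib_key key;
    int in_use;
};

struct fib {
    struct fib_node nodes[MAX_NODES];
};

static int fib_key_equal(const struct fib_key *a, const struct fib_key *b)
{
    return a->prefix_len == b->prefix_len &&
           memcmp(a->addr, b->addr, sizeof(a->addr)) == 0;
}

/* Pure lookup: never creates a node, returns NULL on a miss. */
static struct fib_node *fib_node_lookup(struct fib *fib,
                                        const struct fib_key *key)
{
    for (size_t i = 0; i < MAX_NODES; i++)
        if (fib->nodes[i].in_use &&
            fib_key_equal(&fib->nodes[i].key, key))
            return &fib->nodes[i];
    return NULL;
}

/* Get: look up first, create only if missing; NULL if the table is full. */
static struct fib_node *fib_node_get(struct fib *fib,
                                     const struct fib_key *key)
{
    struct fib_node *node = fib_node_lookup(fib, key);

    if (node)
        return node;
    for (size_t i = 0; i < MAX_NODES; i++) {
        if (!fib->nodes[i].in_use) {
            fib->nodes[i].key = *key;
            fib->nodes[i].in_use = 1;
            return &fib->nodes[i];
        }
    }
    return NULL;
}
```

An entry lookup path would call only `fib_node_lookup()`, so a query for a route that was never added leaves the table unchanged.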

160e22aa

mlxsw: spectrum_router: Don't assume neighbour type · 58adf2c4

Ido Schimmel authored Jul 18, 2017

Thankfully, the neighbour subsystem is agnostic to the upper protocol
and used by both IPv4 and IPv6. By removing assumptions regarding the
neighbour type we can thus re-use much of the neighbour-related code for
both IPv4 and IPv6.

For each nexthop, store its gateway IP and for nexthop group store the
neighbour table used by its nexthops.

Use this information throughout the code and remove assumption about the
neighbour type.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

58adf2c4

mlxsw: spectrum_router: Set activity interval according to both neighbour tables · a6c9b5d1

Arkadi Sharshevsky authored Jul 18, 2017

The neighbours' activity is currently dumped according to the ARP
table's DELAY_PROBE time, but with the introduction of IPv6 offload we
should set the interval according to the minimum between the ARP and
ndisc tables.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
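
As a small sketch of the rule described above (the helper name and the millisecond unit are assumptions, not taken from the driver), the dump interval is simply the smaller of the two tables' DELAY_PROBE times:

```c
/* Hypothetical helper: choose the activity dump interval from both tables. */
static unsigned long neigh_activity_interval(unsigned long arp_delay_probe_ms,
                                             unsigned long ndisc_delay_probe_ms)
{
    return arp_delay_probe_ms < ndisc_delay_probe_ms ?
           arp_delay_probe_ms : ndisc_delay_probe_ms;
}
```

Taking the minimum means neither table's neighbours are refreshed less often than that table's own setting asks for.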

a6c9b5d1

mlxsw: spectrum_router: Periodically dump active IPv6 neighbours · 60f040ca

Arkadi Sharshevsky authored Jul 18, 2017

In addition to IPv4, periodically dump IPv6 neighbours and update the
kernel about them.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60f040ca

mlxsw: reg: Update RAUHTD register with IPv6 support · 72e8ebe1

Arkadi Sharshevsky authored Jul 18, 2017

Update the register so that the active IPv6 neighbours could be dumped
from the device's neighbour table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72e8ebe1

mlxsw: spectrum_router: Reflect IPv6 neighbours to the device · d5eb89cf

Arkadi Sharshevsky authored Jul 18, 2017

As with IPv4, listen to NEIGH_UPDATE events from the ndisc table and
program relevant neighbours to the device's neighbour table.

Note that neighbours with a link-local IP address aren't programmed, as
packets with a link-local destination IP are trapped after LPM lookup
and never reach the neighbour table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
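
The link-local filter can be sketched as below. This is an assumption-laden illustration: the function names are hypothetical, and the check tests the fe80::/10 range directly on the address bytes rather than using any kernel helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the address falls in fe80::/10 (IPv6 link-local unicast). */
static bool ipv6_is_link_local(const uint8_t addr[16])
{
    return addr[0] == 0xfe && (addr[1] & 0xc0) == 0x80;
}

/* Hypothetical policy: skip neighbours the device would never look up. */
static bool neigh_should_program(const uint8_t addr[16])
{
    return !ipv6_is_link_local(addr);
}
```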

d5eb89cf

mlxsw: reg: Update RAUHT register with IPv6 support · 6929e507

Arkadi Sharshevsky authored Jul 18, 2017

Update the register so that IPv6 neighbours can be programmed to the
device's neighbour table.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6929e507

mlxsw: spectrum_router: Configure RIFs based on IPv6 addresses · 5ea1237f

Arkadi Sharshevsky authored Jul 18, 2017

When a netdev is configured with an IP address a router interface (RIF)
should be configured for it in the device. Allow configuration of RIFs
based on IPv6 address notifications as well as IPv4.

Note that the RIF exists as long as an IP address is configured on the
netdev, regardless of the address family.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
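
The lifetime rule in the note above can be sketched as a predicate over per-family address counts. The structure and function names here are hypothetical and only illustrate the rule.

```c
#include <stdbool.h>

/* Hypothetical per-netdev address bookkeeping. */
struct netdev_addr_count {
    unsigned int ipv4;
    unsigned int ipv6;
};

/* The RIF stays as long as any address of either family is configured. */
static bool rif_should_exist(const struct netdev_addr_count *c)
{
    return c->ipv4 > 0 || c->ipv6 > 0;
}
```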

5ea1237f

mlxsw: spectrum_router: Flood unregistered multicast packets to router · 0d284818

Ido Schimmel authored Jul 18, 2017

Up until now we only flooded broadcast packets to the router when an L3
interface was configured on top of a bridge. However, IPv6 Neighbour
Discovery packets are trapped to the CPU inside the router and these can
be sent with a multicast address.

Flood unregistered multicast packets to the router port, so that
relevant packets could be trapped there.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d284818

mlxsw: spectrum: Add support for IPv6 traps · 8d54814e

Arkadi Sharshevsky authored Jul 18, 2017

Before we can start using IPv6, we need to trap certain control packets
to the CPU. Among others, these include Neighbour Discovery, DHCP and
neighbour misses.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d54814e

mlxsw: reg: Enable IPv6 on router interfaces · e717e011

Arkadi Sharshevsky authored Jul 18, 2017

Enable IPv6 and IPv6 forwarding on router interfaces (RIFs), so that
they will be able to receive and forward IPv6 traffic.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e717e011

mlxsw: spectrum_router: Enable IPv6 router · e29237e7

Arkadi Sharshevsky authored Jul 18, 2017

Before we add IPv6 constructs like traps and router interfaces, we first
need to enable IPv6 routing in the device.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e29237e7

Merge branch 'xfrm-remove-flow-cache' · 7f81ff04

David S. Miller authored Jul 18, 2017

Florian Westphal says:

====================
xfrm: remove flow cache

After RCU-ification of ipsec packet path there are no major scalability
issues anymore without flow cache.

We still incur a performance hit, which comes mostly from the extra xfrm
dst allocation/freeing.
The last patch in the series adds a simple percpu cache to avoid the
extra allocation if a packet matched the same policies as last one.

The main concern with this is that we will see performance drops,
especially with large numbers of policies/SAs.

However, during hallway discussions at nfws 2017 it seemed the issues
with flow caching outweigh the removal downsides, and that it
might be best to just 'remove it' and see where the practical issues
(if any) will appear.

It should now be possible to also remove the genid member in the policies
as we don't hold bundles for prolonged time anymore, but I think
this change is controversial (and intrusive) enough as-is, so defer
that to a later point in time.

Changes since last rfc:

- fix build failures due to implicit interrupt.h includes
- rework last patch (pcpu cache):
 * avoid xchg()
 * check policies for walk.dead = 1 instead of more costly bundle_ok().
 * flush pcpu bundles when sa/policies get removed, to allow module
   references to go away (suggested by Ilan Tayari)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7f81ff04

xfrm: add xdst pcpu cache · ec30d78c

Florian Westphal authored Jul 17, 2017

retain last used xfrm_dst in a pcpu cache.
On next request, reuse this dst if the policies are the same.

The cache will not help with strict RR workloads as there is no hit.

The packet-path part of the cache is reasonably small. The notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference. There are also calls to the flush
operation when userspace deletes SAs, so that modules can be removed.

We need to run the dst_release on the correct cpu to avoid races with
packet path.  This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().

Test results using 4 network namespaces and null encryption:

ns1           ns2          -> ns3           -> ns4
netperf -> xfrm/null enc   -> xfrm/null dec -> netserver

what                    TCP_STREAM      UDP_STREAM      UDP_RR
Flow cache:             14644.61        294.35          327231.64
No flow cache:          14349.81        242.64          202301.72
Pcpu cache:             14629.70        292.21          205595.22

UDP tests used 64byte packets, tests ran for one minute each,
value is average over ten iterations.

'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
series but without this patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
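
A minimal sketch of a "last used" single-slot cache of the kind described above. It is not the xfrm implementation: all names (`last_used_cache`, `cache_lookup`, `cache_store`, `MAX_POLICIES`) are hypothetical, and reference counting, flushing and the per-CPU placement are left out. One such slot per CPU is assumed.

```c
#include <stddef.h>
#include <string.h>

#define MAX_POLICIES 2     /* hypothetical upper bound on policies per key */

struct policy { int id; };
struct bundle { int id; };

/* One slot: the policies of the last request and the bundle built for them. */
struct last_used_cache {
    const struct policy *pols[MAX_POLICIES];
    size_t num_pols;
    struct bundle *bundle;
};

/* Hit only if the same policies, in the same order, were used last time. */
static struct bundle *cache_lookup(const struct last_used_cache *c,
                                   const struct policy *const *pols,
                                   size_t num_pols)
{
    if (!c->bundle || c->num_pols != num_pols)
        return NULL;
    for (size_t i = 0; i < num_pols; i++)
        if (c->pols[i] != pols[i])
            return NULL;
    return c->bundle;
}

/* Remember the bundle for the next request, replacing the previous one. */
static void cache_store(struct last_used_cache *c,
                        const struct policy *const *pols,
                        size_t num_pols, struct bundle *b)
{
    if (num_pols > MAX_POLICIES)
        return;
    memcpy(c->pols, pols, num_pols * sizeof(*pols));
    c->num_pols = num_pols;
    c->bundle = b;
}
```

Because there is only one slot, two flows that strictly alternate and match different policies evict each other on every packet, which is consistent with the remark that strict RR workloads see no hit.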

ec30d78c

xfrm: remove flow cache · 09c75704

Florian Westphal authored Jul 17, 2017

After rcu conversions performance degradation in forward tests isn't that
noticeable anymore.

See next patch for some numbers.

A followup patch could then also remove genid from the policies
as we do not cache bundles anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

09c75704

xfrm_policy: make xfrm_bundle_lookup return xfrm dst object · bd45c539

Florian Westphal authored Jul 17, 2017

This allows removing the flow cache object embedded in struct xfrm_dst.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd45c539

xfrm_policy: remove xfrm_policy_lookup · 86dc8ee0

Florian Westphal authored Jul 17, 2017

This removes the wrapper and renames the __xfrm_policy_lookup variant
to get rid of another place that used flow cache objects.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

86dc8ee0

xfrm_policy: kill flow to policy dir conversion · aff669bc

Florian Westphal authored Jul 17, 2017

XFRM_POLICY_IN/OUT/FWD are identical to FLOW_DIR_*, so gcc already
removed this function as it just returns the argument.  Again, no
code change.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
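
Why a compiler can drop such a conversion is shown in the sketch below. The enumerator values (0, 1, 2 for both sets) are an assumption made for this example, and the function is one way such a conversion could be written, not a copy of the removed code.

```c
/* Assumed values for illustration: both sets number IN, OUT, FWD as 0, 1, 2. */
enum { FLOW_DIR_IN, FLOW_DIR_OUT, FLOW_DIR_FWD };
enum { XFRM_POLICY_IN, XFRM_POLICY_OUT, XFRM_POLICY_FWD };

static int flow_to_policy_dir(int dir)
{
    /* All operands are compile-time constants, so this test folds to true
     * and the function reduces to "return dir". */
    if ((int)XFRM_POLICY_IN == (int)FLOW_DIR_IN &&
        (int)XFRM_POLICY_OUT == (int)FLOW_DIR_OUT &&
        (int)XFRM_POLICY_FWD == (int)FLOW_DIR_FWD)
        return dir;

    /* Only reached if the two sets of constants ever diverge. */
    switch (dir) {
    default:
    case FLOW_DIR_IN:
        return XFRM_POLICY_IN;
    case FLOW_DIR_OUT:
        return XFRM_POLICY_OUT;
    case FLOW_DIR_FWD:
        return XFRM_POLICY_FWD;
    }
}
```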

aff669bc

xfrm_policy: remove always true/false branches · 855dad99

Florian Westphal authored Jul 17, 2017

After the previous change, oldflo and xdst are always NULL.
These branches were already removed by gcc, this doesn't change code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

855dad99

xfrm_policy: bypass flow_cache_lookup · 3ca28286

Florian Westphal authored Jul 17, 2017

Instead of consulting flow cache, call the xfrm bundle/policy lookup
functions directly.  This pretends the flow cache had no entry.

This helps to gradually remove flow cache integration,
followup commit will remove the dead code that this change adds.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ca28286

net: xfrm: revert to lower xfrm dst gc limit · 3c2a89dd

Florian Westphal authored Jul 17, 2017

revert c386578f ("xfrm: Let the flowcache handle its size by default.").

Once we remove flow cache, we don't have a flow cache limit anymore.
We must not allow (virtually) unlimited allocations of xfrm dst entries.
Revert back to the old xfrm dst gc limits.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c2a89dd

vti: revert flush x-netns xfrm cache when vti interface is removed · 6b1c42e9

Florian Westphal authored Jul 17, 2017

flow cache is removed in next commit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b1c42e9

drivers: net: add missing interrupt.h include · 0ab10314

Florian Westphal authored Jul 17, 2017

these drivers use tasklets or irq apis, but don't include interrupt.h.
Once flow cache is removed the implicit interrupt.h inclusion goes away
which will break the build.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ab10314

Merge branch 'dsa-mv88e6xxx-cleanup-capabilities' · 6ddb4fdf

David S. Miller authored Jul 18, 2017

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: cleanup capabilities

This patch series removes the remaining capabilities as well as the
flags bitmap in the info structures. Most of them are turned into ops,
or new info members.

There is no mv88e6xxx_cap enum or bitmap flags anymore, only
mv88e6xxx_info and mv88e6xxx_ops structures.

While reviewing and documenting the related G2 registers, fix a few
inconsistencies: 88E6185 has no interrupt in G2 and 88E6390 has a POT.

Except for these two adjustments, there are no functional changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6ddb4fdf

net: dsa: mv88e6xxx: add a multi_chip info flag · b3e05aa1

Vivien Didelot authored Jul 17, 2017

Instead of relying on a bitmap flag, add a new multi_chip info flag to
describe the presence of the indirect SMI access through the two device
registers 0x0 and 0x1.

All remaining capabilities and flags are now unused. Remove the
mv88e6xxx_cap enum and the info flags bitmaps.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3e05aa1

net: dsa: mv88e6xxx: add Energy Detect ops · 68b8f60c

Vivien Didelot authored Jul 17, 2017

The 88E6352 family supports Energy Detect and has one bit for Sense and
one bit for periodically transmitting NLP (Energy Detect+TM). The 88E6390
family adds another bit to distinguish Auto or SW wake-up. Chips
supporting EEE all have an EEE Enabled bit in the Port Status Register.

This patch adds new ops for the PHY Energy Detect accesses.

This also allows us to get rid of the MV88E6XXX_FLAG_EEE flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

68b8f60c

net: dsa: mv88e6xxx: add a global2_addr info flag · 9069c13a

Vivien Didelot authored Jul 17, 2017

Similarly to global1_addr, add a global2_addr member in the info
structure to describe the presence of the Global 2 Registers.

This allows us to get rid of the MV88E6XXX_FLAG_GLOBAL2 flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9069c13a

net: dsa: mv88e6xxx: add POT operation · 9e907d73

Vivien Didelot authored Jul 17, 2017

Add a pot_clear operation to clear the Priority Override Table and wrap
its call into a mv88e6xxx_pot_setup helper.

This allows us to get rid of the MV88E6XXX_FLAG_G2_POT flag.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
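
The "optional op wrapped in a setup helper" pattern can be sketched as follows. The types and the behaviour on a missing op (silently returning 0) are assumptions for illustration; `chip`, `chip_ops`, `pot_setup` and `example_pot_clear` are hypothetical names, not the driver's.

```c
#include <stddef.h>

struct chip;

/* Per-chip-family operations; a NULL pointer means "not supported". */
struct chip_ops {
    int (*pot_clear)(struct chip *chip);
};

struct chip {
    const struct chip_ops *ops;
    int pot_cleared;       /* stands in for real hardware state */
};

/* Example op for a family that has a Priority Override Table. */
static int example_pot_clear(struct chip *chip)
{
    chip->pot_cleared = 1;
    return 0;
}

static const struct chip_ops ops_with_pot = { .pot_clear = example_pot_clear };
static const struct chip_ops ops_without_pot = { .pot_clear = NULL };

/* Setup helper: call the op if the family provides it, else do nothing. */
static int pot_setup(struct chip *chip)
{
    if (chip->ops->pot_clear)
        return chip->ops->pot_clear(chip);
    return 0;
}
```

Callers then invoke `pot_setup()` unconditionally and no longer test a capability flag themselves.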

9e907d73

net: dsa: mv88e6xxx: add POT flag to 88E6390 · a2a05db8

Vivien Didelot authored Jul 17, 2017

The 88E6390 family clears the Priority Override Table the same way as
88E6352, thus add MV88E6XXX_FLAG_G2_POT to MV88E6XXX_FLAGS_FAMILY_6390.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

a2a05db8

net: dsa: mv88e6xxx: distinguish Global 2 Rsvd2CPU · 51c901a7

Vivien Didelot authored Jul 17, 2017

The 88E6185 family only has one 16-bit register to mark the 16 802.1D
reserved multicast addresses in the range of 01:80:C2:00:00:0x as MGMT.

The 88E6352 family also has one 16-bit register to mark the 16 GARP
reserved multicast addresses in the range of 01:80:C2:00:00:2x as MGMT.

Split the existing mv88e6095 prefixed mgmt_rsvd2cpu operation into two
distinct mv88e6185 and mv88e6352 prefixed operations, and wrap its call
into a mv88e6xxx_rsvd2cpu_setup helper.

This allows us to also get rid of the MV88E6XXX_CAP_G2_MGMT_EN_* flags.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
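
How a single 16-bit register can cover one block of 16 reserved addresses is sketched below. The mapping of bit x to address 01:80:C2:00:00:0x (or :2x) is an assumption made for this example, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return the index 0..15 of mac within 01:80:C2:00:00:<base + x>, or -1.
 * base is 0x00 for the 802.1D block and 0x20 for the GARP block. */
static int rsvd_index(const uint8_t mac[6], uint8_t base)
{
    static const uint8_t prefix[5] = {0x01, 0x80, 0xc2, 0x00, 0x00};

    if (memcmp(mac, prefix, sizeof(prefix)) != 0)
        return -1;
    if ((mac[5] & 0xf0) != base)
        return -1;
    return mac[5] & 0x0f;
}

/* Assumed mapping: bit x of the register marks address <base + x> as MGMT. */
static bool rsvd2cpu_is_mgmt(uint16_t reg, const uint8_t mac[6], uint8_t base)
{
    int idx = rsvd_index(mac, base);

    return idx >= 0 && (reg & (1u << idx)) != 0;
}
```

Under this model, one register with base 0x00 covers the 0x block and a second register with base 0x20 covers the 2x block, which is why the two families need distinct operations.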

51c901a7