Commits · 7785bba299a8dc8fe8390a0183dad3cafb3f1d80 · Kirill Smelkov / linux

15 Feb, 2017 6 commits

esp: Add a software GRO codepath · 7785bba2

Steffen Klassert authored Feb 15, 2017

This patch adds GRO ifrastructure and callbacks for ESP on
ipv4 and ipv6.

In case the GRO layer detects an ESP packet, the
esp{4,6}_gro_receive() function does a xfrm state lookup
and calls the xfrm input layer if it finds a matching state.
The packet will be decapsulated and reinjected it into layer 2.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

7785bba2

xfrm: Extend the sec_path for IPsec offloading · 54ef207a

Steffen Klassert authored Feb 15, 2017

We need to keep per packet offloading informations across
the layers. So we extend the sec_path to carry these for
the input and output offload codepath.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

54ef207a

xfrm: Export xfrm_parse_spi. · 1e295370

Steffen Klassert authored Feb 15, 2017

We need it in the ESP offload handlers, so export it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

1e295370

net: Prepare gro for packet consuming gro callbacks · 25393d3f

Steffen Klassert authored Feb 15, 2017

The upcomming IPsec ESP gro callbacks will consume the skb,
so prepare for that.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

25393d3f

net: Add a skb_gro_flush_final helper. · 5f114163

Steffen Klassert authored Feb 15, 2017

Add a skb_gro_flush_final helper to prepare for  consuming
skbs in call_gro_receive. We will extend this helper to not
touch the skb if the skb is consumed by a gro callback with
a followup patch. We need this to handle the upcomming IPsec
ESP callbacks as they reinject the skb to the napi_gro_receive
asynchronous. The handler is used in all gro_receive functions
that can call the ESP gro handlers.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

5f114163

xfrm: Add a secpath_set helper. · b0fcee82

Steffen Klassert authored Feb 15, 2017

Add a new helper to set the secpath to the skb.
This avoids code duplication, as this is used
in multiple places.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

b0fcee82

09 Feb, 2017 7 commits

xfrm: policy: make policy backend const · 37b10383

Florian Westphal authored Feb 07, 2017

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

37b10383

xfrm: policy: remove xfrm_policy_put_afinfo · bdba9fe0

Florian Westphal authored Feb 07, 2017

Alternative is to keep it an make the (unused) afinfo arg const to avoid
the compiler warnings once the afinfo structs get constified.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

bdba9fe0

xfrm: policy: remove family field · a2817d8b

Florian Westphal authored Feb 07, 2017

Only needed it to register the policy backend at init time.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

a2817d8b

xfrm: policy: remove garbage_collect callback · 3d7d25a6

Florian Westphal authored Feb 07, 2017

Just call xfrm_garbage_collect_deferred() directly.
This gets rid of a write to afinfo in register/unregister and allows to
constify afinfo later on.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

3d7d25a6

xfrm: policy: xfrm_policy_unregister_afinfo can return void · 2b61997a

Florian Westphal authored Feb 07, 2017

Nothing checks the return value. Also, the errors returned on unregister
are impossible (we only support INET and INET6, so no way
xfrm_policy_afinfo[afinfo->family] can be anything other than 'afinfo'
itself).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

2b61997a

xfrm: policy: xfrm_get_tos cannot fail · f5e2bb4f

Florian Westphal authored Feb 07, 2017

The comment makes it look like get_tos() is used to validate something,
but it turns out the comment was about xfrm_find_bundle() which got removed
years ago.

xfrm_get_tos will return either the tos (ipv4) or 0 (ipv6).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

f5e2bb4f

xfrm: input: constify xfrm_input_afinfo · 960fdfde

Florian Westphal authored Feb 07, 2017

Nothing writes to these structures (the module owner was not used).

While at it, size xfrm_input_afinfo[] by the highest existing xfrm family
(INET6), not AF_MAX.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

960fdfde

07 Feb, 2017 7 commits

Merge branch 'bridge-improve-cache-utilization' · 152bff37

David S. Miller authored Feb 06, 2017

Nikolay Aleksandrov says:

====================
bridge: improve cache utilization

This is the first set which begins to deal with the bad bridge cache
access patterns. The first patch rearranges the bridge and port structs
a little so the frequently (and closely) accessed members are in the same
cache line. The second patch then moves the garbage collection to a
workqueue trying to improve system responsiveness under load (many fdbs)
and more importantly removes the need to check if the matched entry is
expired in __br_fdb_get which was a major source of false-sharing.
The third patch is a preparation for the final one which
If properly configured, i.e. ports bound to CPUs (thus updating "updated"
locally) then the bridge's HitM goes from 100% to 0%, but even without
binding we get a win because previously every lookup that iterated over
the hash chain caused false-sharing due to the first cache line being
used for both mac/vid and used/updated fields.

Some results from tests I've run:
(note that these were run in good conditions for the baseline, everything
 ran on a single NUMA node and there were only 3 fdbs)

1. baseline
100% Load HitM on the fdbs (between everyone who has done lookups and hit
                            one of the 3 hash chains of the communicating
                            src/dst fdbs)
Overall 5.06% Load HitM for the bridge, first place in the list

2. patched & ports bound to CPUs
0% Local load HitM, bridge is not even in the c2c report list
Also there's 3% consistent improvement in netperf tests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

152bff37

bridge: fdb: write to used and updated at most once per jiffy · 83a718d6

Nikolay Aleksandrov authored Feb 04, 2017

Writing once per jiffy is enough to limit the bridge's false sharing.
After this change the bridge doesn't show up in the local load HitM stats.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83a718d6

bridge: move write-heavy fdb members in their own cache line · 1214628c

Nikolay Aleksandrov authored Feb 04, 2017

Fdb's used and updated fields are written to on every packet forward and
packet receive respectively. Thus if we are receiving packets from a
particular fdb, they'll cause false-sharing with everyone who has looked
it up (even if it didn't match, since mac/vid share cache line!). The
"used" field is even worse since it is updated on every packet forward
to that fdb, thus the standard config where X ports use a single gateway
results in 100% fdb false-sharing. Note that this patch does not prevent
the last scenario, but it makes it better for other bridge participants
which are not using that fdb (and are only doing lookups over it).
The point is with this move we make sure that only communicating parties
get the false-sharing, in a later patch we'll show how to avoid that too.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1214628c

bridge: move to workqueue gc · f7cdee8a

Nikolay Aleksandrov authored Feb 04, 2017

Move the fdb garbage collector to a workqueue which fires at least 10
milliseconds apart and cleans chain by chain allowing for other tasks
to run in the meantime. When having thousands of fdbs the system is much
more responsive. Most importantly remove the need to check if the
matched entry has expired in __br_fdb_get that causes false-sharing and
is completely unnecessary if we cleanup entries, at worst we'll get 10ms
of traffic for that entry before it gets deleted.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7cdee8a

bridge: modify bridge and port to have often accessed fields in one cache line · 1f90c7f3

Nikolay Aleksandrov authored Feb 04, 2017

Move around net_bridge so the vlan fields are in the beginning since
they're checked on every packet even if vlan filtering is disabled.
For the port move flags & vlan group to the beginning, so they're in the
same cache line with the port's state (both flags and state are checked
on each packet).
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f90c7f3

bpf: enable verifier to add 0 to packet ptr · 63dfef75

William Tu authored Feb 04, 2017

The patch fixes the case when adding a zero value to the packet
pointer.  The zero value could come from src_reg equals type
BPF_K or CONST_IMM.  The patch fixes both, otherwise the verifer
reports the following error:
  [...]
    R0=imm0,min_value=0,max_value=0
    R1=pkt(id=0,off=0,r=4)
    R2=pkt_end R3=fp-12
    R4=imm4,min_value=4,max_value=4
    R5=pkt(id=0,off=4,r=4)
  269: (bf) r2 = r0     // r2 becomes imm0
  270: (77) r2 >>= 3
  271: (bf) r4 = r1     // r4 becomes pkt ptr
  272: (0f) r4 += r2    // r4 += 0
  addition of negative constant to packet pointer is not allowed
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@vmware.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

63dfef75

bpf: test for AND edge cases · 29200c19

Josef Bacik authored Feb 03, 2017

These two tests are based on the work done for f23cc643. The first test is
just a basic one to make sure we don't allow AND'ing negative values, even if it
would result in a valid index for the array. The second is a cleaned up version
of the original testcase provided by Jann Horn that resulted in the commit.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29200c19

06 Feb, 2017 20 commits

Merge branch 'dsa-add-fabric-notifier' · 9172d2a0

David S. Miller authored Feb 06, 2017

Vivien Didelot says:

====================
net: dsa: add fabric notifier

When a switch fabric is composed of multiple switch chips, these chips
must be programmed accordingly when an event occurred on one of them.

Examples of such event include hardware bridging: when a Linux bridge
spans interconnected chips, they must be programmed to allow external
ports to ingress frames on their internal ports.

Another example is cross-chip hardware VLANs. Switch chips in-between
interconnected bridge ports must also configure a given VLAN to allow
packets to pass through them.

In order to support that, this patchset introduces a non-intrusive
notifier mechanism. It adds a notifier head in every DSA switch tree
(the said fabric), and a notifier block in every DSA switch chip.

When an even occurs, it is chained to all notifiers of the fabric.
Switch chips can react accordingly if they are cross-chip capable.

On a dynamic debug enabled system, bridging a port in a multi-chip
fabric will print something like this (ZII Rev B board):

    # brctl addif br0 lan3
    mv88e6085 0.1:00: crosschip DSA port 1.0 bridged to br0
    mv88e6085 0.4:00: crosschip DSA port 1.0 bridged to br0
    # brctl delif br0 lan3
    mv88e6085 0.1:00: crosschip DSA port 1.0 unbridged from br0
    mv88e6085 0.4:00: crosschip DSA port 1.0 unbridged from br0

Currently only bridging events are added. A patchset introducing support
for cross-chip hardware bridging configuration in mv88e6xxx will follow
right after. Then events for switchdev operations are next on the line.

We should note that non-switchdev events do not support rolling-back
switch-wide operations. We'll have to work on closer integration with
switchdev for that, like introducing new attributes or objects, to
benefit from the prepare and commit phases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9172d2a0

net: dsa: introduce bridge notifier · 04d3a4c6

Vivien Didelot authored Feb 03, 2017

A slave device will now notify the switch fabric once its port is
bridged or unbridged, instead of calling directly its switch operations.

This code allows propagating cross-chip bridging events in the fabric.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04d3a4c6

net: dsa: add switch notifier · f515f192

Vivien Didelot authored Feb 03, 2017

Add a notifier block per DSA switch, registered against a notifier head
in the switch fabric they belong to.

This infrastructure will allow to propagate fabric-wide events such as
port bridging, VLAN configuration, etc. If a DSA switch driver cares
about cross-chip configuration, such events can be caught.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f515f192

net: dsa: change state setter scope · c5d35cb3

Vivien Didelot authored Feb 03, 2017

The scope of the functions inside net/dsa/slave.c must be the slave
net_device pointer. Change to state setter helper accordingly to
simplify callers.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5d35cb3

net: dsa: rollback bridging on error · 9c265426

Vivien Didelot authored Feb 03, 2017

When an error is returned during the bridging of a port in a
NETDEV_CHANGEUPPER event, net/core/dev.c rolls back the operation.

Be consistent and unassign dp->bridge_dev when this happens.

In the meantime, add comments to document this behavior.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c265426

net: dsa: simplify netdevice events handling · 8e92ab3a

Vivien Didelot authored Feb 03, 2017

Simplify the code handling the slave netdevice notifier call by
providing a dsa_slave_changeupper helper for NETDEV_CHANGEUPPER, and so
on (only this event is supported at the moment.)

Return NOTIFY_DONE when we did not care about an event, and NOTIFY_OK
when we were concerned but no error occurred, as the API suggests.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e92ab3a

net: dsa: move netdevice notifier registration · 88e4f0ca

Vivien Didelot authored Feb 03, 2017

Move the netdevice notifier block register code in slave.c and provide
helpers for dsa.c to register and unregister it.

At the same time, check for errors since (un)register_netdevice_notifier
may fail.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

88e4f0ca

net/mlx5e: fix another maybe-uninitialized false-positive · 321fa4ff

Arnd Bergmann authored Feb 03, 2017

In commit abeffce9 ("net/mlx5e: Fix a -Wmaybe-uninitialized warning"), I fixed a
gcc warning for the ipv4 offload handling. Now we get the same warning for the
added ipv6 support:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:815:40: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]

We can apply the same workaround here as well.

Fixes: ce99f6b9 ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

321fa4ff

net-next: treewide use is_vlan_dev() helper function. · d0d7b10b

Parav Pandit authored Feb 04, 2017

This patch makes use of is_vlan_dev() function instead of flag
comparison which is exactly done by is_vlan_dev() helper function.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0d7b10b

net/mlx4_en: fix a condition · 73cfb2a2

Dan Carpenter authored Feb 03, 2017

There is a "||" vs "|" typo here so we test 0x1 instead of 0x6.

Fixes: 1f8176f7 ("net/mlx4_en: Check the enabling pptx/pprx flags in SET_PORT wrapper flow")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

73cfb2a2

sfc: don't rearm interrupts if busy polling · f820c0ac

Bert Kenward authored Feb 06, 2017

Since commit 364b6055 ("net: busy-poll: return busypolling status
to drivers"), napi_complete_done() returns a boolean that can be used
by drivers to conditionally rearm interrupts.

Testing with a 7142 shows a small latency improvement of ~100 ns.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f820c0ac

sctp: process fwd tsn chunk only when prsctp is enabled · d15c9ede

Xin Long authored Feb 03, 2017

This patch is to check if asoc->peer.prsctp_capable is set before
processing fwd tsn chunk, if not, it will return an ERROR to the
peer, just as rfc3758 section 3.3.1 demands.
Reported-by: Julian Cordes <julian.cordes@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d15c9ede

Merge branch 'mlxsw-cleanup-neigh-handling' · 3bc32d03

David S. Miller authored Feb 06, 2017

Jiri Pirko says:

====================
mlxsw: cleanup neigh handling

Ido says:

This series addresses long standing issues in the mlxsw driver
concerning neighbour reflection. It also prepares the code for follow-up
changes dealing with proper resource cleanup and nexthop reflection.

The first two patches convert the neighbour reflection code to use an
ordered workqueue, to prevent re-ordering of NEIGH_UPDATE events that
may happen following subsequent patches.

The third to fifth patches remove the ndo_neigh_{construct,destroy}
entry points from the driver, thereby relying only on NEIGH_UPDATE
events for neighbour reflection. This simplifies the code considerably.

Last patches are fallout and adjust nits in the code I noticed while
going over it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3bc32d03

mlxsw: spectrum_router: Fix typo in comment · fd76d910

Ido Schimmel authored Feb 06, 2017

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd76d910

mlxsw: spectrum_router: Don't read 'nud_state' without lock · 01b1aa35

Ido Schimmel authored Feb 06, 2017

We periodically ask the neighbouring system to try and resolve
neighbours that are used for nexthops, but aren't currently resolved.

However, 'nud_state' is protected by the neighbour lock, so we shouldn't
access it without taking it. Instead, we can simply check the
'connected' field of the neighbour entry, which we update upon
NEIGH_UPDATE events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01b1aa35

mlxsw: spectrum_router: Remove redundant check · 8a0b7275

Ido Schimmel authored Feb 06, 2017

We only add neighbour entries that are also used for nexthops to
'nexthop_neighs_list', so when iterating over this list there's no need
to check that the entry is indeed used for nexthops.

Remove the redundant check.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a0b7275

net: remove ndo_neigh_{construct, destroy} from stacked devices · a8eca326

Ido Schimmel authored Feb 06, 2017

In commit 18bfb924 ("net: introduce default neigh_construct/destroy
ndo calls for L2 upper devices") we added these ndos to stacked devices
such as team and bond, so that calls will be propagated to mlxsw.

However, previous commit removed the reliance on these ndos and no new
users of these ndos have appeared since above mentioned commit. We can
therefore safely remove this dead code.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8eca326

mlxsw: spectrum_router: Simplify neighbour reflection · 5c8802f1

Ido Schimmel authored Feb 06, 2017

Up until now we had two interfaces for neighbour related configuration:
ndo_neigh_{construct,destroy} and NEIGH_UPDATE netevents. The ndos were
used to add and remove neighbours from the driver's cache, whereas the
netevent was used to reflect the neighbours into the device's tables.

However, if the NUD state of a neighbour isn't NUD_VALID or if the
neighbour is dead, then there's really no reason for us to keep it
inside our cache. The only exception to this rule are neighbours that
are also used for nexthops, which we periodically refresh to get them
resolved.

We can therefore eliminate the ndo entry point into the driver and
simplify the code, making it similar to the FIB reflection, which is
based solely on events. This also helps us avoid a locking issue, in
which the RIF cache was traversed without proper locking during
insertion into the neigh entry cache.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c8802f1

mlxsw: spectrum_router: Remove unused variable · de04b6a3

Ido Schimmel authored Feb 06, 2017

Since commit 33b1341c ("mlxsw: spectrum_router: Fix handling of
neighbour structure") we no longer use destination IP for neighbour
lookup, so remove it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de04b6a3

mlxsw: spectrum_router: Use ordered workqueue for neigh updates · e60234dd

Ido Schimmel authored Feb 06, 2017

We currently associate each neighbour entry with a work item, so it's
not possible to have multiple events queued for the same neighbour
entry. However, this is about to be changed so that the neighbour entry
is only resolved when the work item is scheduled.

The above can result in a mismatch between the kernel's and the device's
neighbour table, unless the associated work items are processed in the
order in which they were submitted.

Do that by migrating the NEIGH_UPDATE work items to be processed in the
ordered workqueue which was recently introduced in mlxsw in commit
a3832b31 ("mlxsw: core: Create an ordered workqueue for FIB
offload").
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e60234dd