Commits · 9cf545ebd591da673bb6b6c88150212ad83567a9 · Kirill Smelkov / linux

09 Nov, 2018 40 commits

xfrm: policy: store inexact policies in a tree ordered by destination address · 9cf545eb

Florian Westphal authored Nov 07, 2018

This adds inexact lists per destination network, stored in a search tree.

Inexact lookups now return two 'candidate lists', the 'any' policies
('any' destionations), and a list of policies that share same
daddr/prefix.

Next patch will add a second search tree for 'saddr:any' policies
so we can avoid placing those on the 'any:any' list too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

9cf545eb

xfrm: policy: add inexact policy search tree infrastructure · 6be3b0db

Florian Westphal authored Nov 07, 2018

At this time inexact policies are all searched in-order until the first
match is found.  After removal of the flow cache, this resolution has
to be performed for every packetm resulting in major slowdown when
number of inexact policies is high.

This adds infrastructure to later sort inexact policies into a tree.
This only introduces a single class: any:any.

Next patch will add a search tree to pre-sort policies that
have a fixed daddr/prefixlen, so in this patch the any:any class
will still be used for all policies.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

6be3b0db

xfrm: policy: consider if_id when hashing inexact policy · b5fe22e2

Florian Westphal authored Nov 07, 2018

This avoids searches of polices that cannot match in the first
place due to different interface id by placing them in different bins.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

b5fe22e2

xfrm: policy: store inexact policies in an rhashtable · 24969fac

Florian Westphal authored Nov 07, 2018

Switch packet-path lookups for inexact policies to rhashtable.

In this initial version, we now no longer need to search policies with
non-matching address family and type.

Next patch will add the if_id as well so lookups from the xfrm interface
driver only need to search inexact policies for that device.

Future patches will augment the hlist in each rhash bucket with a tree
and pre-sort policies according to daddr/prefix.

A single rhashtable is used. In order to avoid a full rhashtable walk on
netns exit, the bins get placed on a pernet list, i.e. we add almost no
cost for network namespaces that had no xfrm policies.

The inexact lists are kept in place, and policies are added to both the
per-rhash-inexact list and a pernet one.

The latter is needed for the control plane to handle migrate -- these
requests do not consider the if_id, so if we'd remove the inexact_list
now we would have to search all hash buckets and then figure
out which matching policy candidate is the most recent one -- this appears
a bit harder than just keeping the 'old' inexact list for this purpose.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

24969fac

xfrm: policy: return NULL when inexact search needed · cc1bb845

Florian Westphal authored Nov 07, 2018

currently policy_hash_bysel() returns the hash bucket list
(for exact policies), or the inexact list (when policy uses a prefix).

Searching this inexact list is slow, so it might be better to pre-sort
inexact lists into a tree or another data structure for faster
searching.

However, due to 'any' policies, that need to be searched in any case,
doing so will require that 'inexact' policies need to be handled
specially to decide the best search strategy.  So change hash_bysel()
and return NULL if the policy can't be handled via the policy hash
table.

Right now, we simply use the inexact list when this happens, but
future patch can then implement a different strategy.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

cc1bb845

xfrm: policy: split list insertion into a helper · a927d6af

Florian Westphal authored Nov 07, 2018

... so we can reuse this later without code duplication when we add
policy to a second inexact list.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

a927d6af

xfrm: security: iterate all, not inexact lists · ceb159e3

Florian Westphal authored Nov 07, 2018

currently all non-socket policies are either hashed in the dst table,
or placed on the 'inexact list'.  When flushing, we first walk the
table, then the (per-direction) inexact lists.

When we try and get rid of the inexact lists to having "n" inexact
lists (e.g. per-af inexact lists, or sorted into a tree), this walk
would become more complicated.

Simplify this: walk the 'all' list and skip socket policies during
traversal so we don't need to handle exact and inexact policies
separately anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

ceb159e3

selftests: add xfrm policy test script · b69d540d

Florian Westphal authored Nov 07, 2018

add a script that adds a ipsec tunnel between two network
namespaces plus following policies:

.0/24 -> ipsec tunnel
.240/28 -> bypass
.253/32 -> ipsec tunnel

Then check that .254 bypasses tunnel (match /28 exception),
and .2 (match /24) and .253 (match direct policy) pass through the
tunnel.

Abuses iptables to check if ping did resolve an ipsec policy or not.

Also adds a bunch of 'block' rules that are not supposed to match.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

b69d540d

sfc: use the new __netdev_tx_sent_queue BQL optimisation · 29e12207

Edward Cree authored Nov 08, 2018

As added in 3e59020a ("net: bql: add __netdev_tx_sent_queue()"), which
 see for performance rationale.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29e12207

Merge branch 'net-Remove-VLAN_TAG_PRESENT-from-drivers' · eb4149c9

David S. Miller authored Nov 08, 2018

Michał Mirosław says:

====================
net: Remove VLAN_TAG_PRESENT from drivers

This series removes VLAN_TAG_PRESENT use from network drivers in
preparation to removing its special meaning.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

eb4149c9

gianfar: remove use of VLAN_TAG_PRESENT · f4f9a5e6

Michał Mirosław authored Nov 08, 2018

Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4f9a5e6

OVS: remove use of VLAN_TAG_PRESENT · 9df46aef

Michał Mirosław authored Nov 08, 2018

This is a minimal change to allow removing of VLAN_TAG_PRESENT.
It leaves OVS unable to use CFI bit, as fixing this would need
a deeper surgery involving userspace interface.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

9df46aef

cnic: remove use of VLAN_TAG_PRESENT · f723a1a2

Michał Mirosław authored Nov 08, 2018

This just removes VLAN_TAG_PRESENT use.  VLAN TCI=0 special meaning is
deeply embedded in the driver code and so is left as is.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

f723a1a2

i40iw: remove use of VLAN_TAG_PRESENT · 1ef212af

Michał Mirosław authored Nov 08, 2018

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ef212af

net: socionext: refactor netsec_alloc_dring() · 0d404a61

Ilias Apalodimas authored Nov 08, 2018

return -ENOMEM directly instead of assigning it in a variable
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d404a61

net: socionext: different approach on DMA · 4acb20b4

Ilias Apalodimas authored Nov 08, 2018

Current driver dynamically allocates an skb and maps it as DMA Rx
buffer. In order to prepare for upcoming XDP changes, let's introduce a
different allocation scheme.
Buffers are allocated dynamically and mapped into hardware.
During the Rx operation the driver uses build_skb() to produce the
necessary buffers for the network stack.
This change increases performance ~15% on 64b packets with smmu disabled
and ~5% with smmu enabled
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4acb20b4

net: qca_spi: Add available buffer space verification · 026b907d

Stefan Wahren authored Nov 08, 2018

Interferences on the SPI line could distort the response of
available buffer space. So at least we should check that the
response doesn't exceed the maximum available buffer space.
In error case increase a new error counter and retry it later.
This behavior avoids buffer errors in the QCA7000, which
results in an unnecessary chip reset including packet loss.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

026b907d

sock: Reset dst when changing sk_mark via setsockopt · 50254256

David Barmann authored Nov 08, 2018

When setting the SO_MARK socket option, if the mark changes, the dst
needs to be reset so that a new route lookup is performed.

This fixes the case where an application wants to change routing by
setting a new sk_mark.  If this is done after some packets have already
been sent, the dst is cached and has no effect.
Signed-off-by: David Barmann <david.barmann@stackpath.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

50254256

Merge branch 's390-qeth-next' · 52358cb5

David S. Miller authored Nov 08, 2018

Julian Wiedmann says:

====================
s390/qeth: updates 2018-11-08

please apply the following qeth patches to net-next.

The first patch allows one more device type to query the FW for a MAC address,
the others are all basically just removal of duplicated or unused code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

52358cb5

s390/qeth: don't process hsuid in qeth_l3_setup_netdev() · ded9da1f

Julian Wiedmann authored Nov 08, 2018

qeth_l3_setup_netdev() checks if the hsuid attribute is set on the qeth
device, and propagates it to the net_device. In the past this was needed
to pick up any hsuid that was set before allocation of the net_device.

With commit d3d1b205 ("s390/qeth: allocate netdevice early") this
is no longer necessary, qeth_l3_dev_hsuid_store() always stores the
hsuid straight into dev->perm_addr.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ded9da1f

s390/qeth: remove unused fallback in Layer3's MAC code · 9168f5ae

Julian Wiedmann authored Nov 08, 2018

If the CREATE ADDR sent by qeth_l3_iqd_read_initial_mac() fails, its
callback sets a random MAC address on the net_device. The error then
propagates back, and qeth_l3_setup_netdev() bails out without
registering the net_device.

Any subsequent call to qeth_l3_setup_netdev() will then attempt a fresh
CREATE ADDR which either 1) also fails, or 2) sets a proper MAC address
on the net_device. Consequently, the net_device will never be registered
with a random MAC and we can drop the fallback code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9168f5ae

s390/qeth: remove two IPA command helpers · 4fa55fa9

Julian Wiedmann authored Nov 08, 2018

qeth_l3_send_ipa_arp_cmd() is merely a wrapper around
qeth_send_control_data() now. So push the length adjustment into
QETH_SETASS_BASE_LEN, and remove the wrapper. While at it, also remove
some redundant 0-initializations.

qeth_send_setassparms() requires that callers prepare their command
parameters, so that they can be copied into the parameter area in one
go. Skip the indirection, and just let callers set up the command
themselves.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4fa55fa9

s390/qeth: replace open-coded cmd setup · 605c9d5f

Julian Wiedmann authored Nov 08, 2018

Call qeth_prepare_ipa_cmd() during setup of a new IPA cmd buffer, so
that it is used for all commands. Thus ARP and SNMP requests don't have
to do their own initialization.

This will now also set the proper MPC protocol version for SNMP requests
on L2 devices.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

605c9d5f

s390/qeth: remove card list · d7d18da1

Julian Wiedmann authored Nov 08, 2018

Re-implement the card-by-RDEV lookup by using device model concepts, and
remove the now redundant list of all qeth card instances in the system.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7d18da1

s390/qeth: unify transmit code · 81ec5439

Julian Wiedmann authored Nov 08, 2018

Since commit 82bf5c08 ("s390/qeth: add support for IPv6 TSO"),
qeth_xmit() also knows how to build TSO packets and is practically
identical to qeth_l3_xmit().
Convert qeth_l3_xmit() into a thin wrapper that merely strips the
L2 header off a packet, and calls qeth_xmit() for the actual
TX processing.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

81ec5439

s390/qeth: handle af_iucv skbs in qeth_l3_fill_header() · 5a541f6d

Julian Wiedmann authored Nov 08, 2018

Filling the HW header from one single function will make it easier to
rip out all the duplicated transmit code in qeth_l3_xmit(). On top, this
saves one conditional branch in the TSO path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a541f6d

s390/qeth: utilize virtual MAC for Layer2 OSD devices · b144b99f

Julian Wiedmann authored Nov 08, 2018

By default, READ MAC on a Layer2 OSD device returns the adapter's
burnt-in MAC address. Given the default scenario of many virtual devices
on the same adapter, qeth can't make any use of this address and
therefore skips the READ MAC call for this device type.

But in some configurations, the READ MAC command for a Layer2 OSD device
actually returns a pre-provisioned, virtual MAC address. So enable the
READ MAC code to detect this situation, and let the L2 subdriver
call READ MAC for OSD devices.

This also removes the QETH_LAYER2_MAC_READ flag, which protects L2
devices against calling READ MAC multiple times. Instead protect the
whole call to qeth_l2_request_initial_mac().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b144b99f

openvswitch: remove BUG_ON from get_dpdev · 04087d9a

Li RongQing authored Nov 08, 2018

if local is NULL pointer, and the following access of local's
dev will trigger panic, which is same as BUG_ON
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04087d9a

Merge branch 'ICMP-error-handling-for-UDP-tunnels' · 20da4ef9

David S. Miller authored Nov 08, 2018

Stefano Brivio says:

====================
ICMP error handling for UDP tunnels

This series introduces ICMP error handling for UDP tunnels and
encapsulations and related selftests. We need to handle ICMP errors to
support PMTU discovery and route redirection -- this support is entirely
missing right now:

- patch 1/11 adds a socket lookup for UDP tunnels that use, by design,
  the same destination port on both endpoints -- i.e. VXLAN and GENEVE
- patches 2/11 to 7/11 are specific to VxLAN and GENEVE
- patches 8/11 and 9/11 add infrastructure for lookup of encapsulations
  where sent packets cannot be matched via receiving socket lookup, i.e.
  FoU and GUE
- patches 10/11 and 11/11 are specific to FoU and GUE

v2: changes are listed in the single patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

20da4ef9

selftests: pmtu: Introduce FoU and GUE PMTU exceptions tests · 56fd865f

Stefano Brivio authored Nov 08, 2018

Introduce eight tests, for FoU and GUE, with IPv4 and IPv6 payload,
on IPv4 and IPv6 transport, that check that PMTU exceptions are created
with the right value when exceeding the MTU on a link of the path.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

56fd865f

fou, fou6: ICMP error handlers for FoU and GUE · b8a51b38

Stefano Brivio authored Nov 08, 2018

As the destination port in FoU and GUE receiving sockets doesn't
necessarily match the remote destination port, we can't associate errors
to the encapsulating tunnels with a socket lookup -- we need to blindly
try them instead. This means we don't even know if we are handling errors
for FoU or GUE without digging into the packets.

Hence, implement a single handler for both, one for IPv4 and one for IPv6,
that will check whether the packet that generated the ICMP error used a
direct IP encapsulation or if it had a GUE header, and send the error to
the matching protocol handler, if any.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8a51b38

udp: Support for error handlers of tunnels with arbitrary destination port · e7cc0824

Stefano Brivio authored Nov 08, 2018

ICMP error handling is currently not possible for UDP tunnels not
employing a receiving socket with local destination port matching the
remote one, because we have no way to look them up.

Add an err_handler tunnel encapsulation operation that can be exported by
tunnels in order to pass the error to the protocol implementing the
encapsulation. We can't easily use a lookup function as we did for VXLAN
and GENEVE, as protocol error handlers, which would be in turn called by
implementations of this new operation, handle the errors themselves,
together with the tunnel lookup.

Without a socket, we can't be sure which encapsulation error handler is
the appropriate one: encapsulation handlers (the ones for FoU and GUE
introduced in the next patch, e.g.) will need to check the new error codes
returned by protocol handlers to figure out if errors match the given
encapsulation, and, in turn, report this error back, so that we can try
all of them in __udp{4,6}_lib_err_encap_no_sk() until we have a match.

v2:
- Name all arguments in err_handler prototypes (David Miller)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7cc0824

net: Convert protocol error handlers from void to int · 32bbd879

Stefano Brivio authored Nov 08, 2018

We'll need this to handle ICMP errors for tunnels without a sending socket
(i.e. FoU and GUE). There, we might have to look up different types of IP
tunnels, registered as network protocols, before we get a match, so we
want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
inet_protos and inet6_protos. These error codes will be used in the next
patch.

For consistency, return sensible error codes in protocol error handlers
whenever handlers can't handle errors because, even if valid, they don't
match a protocol or any of its states.

This has no effect on existing error handling paths.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

32bbd879

selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv4/IPv6 · ce733661

Stefano Brivio authored Nov 08, 2018

Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce733661

geneve: Allow configuration of DF behaviour · a025fb5f

Stefano Brivio authored Nov 08, 2018

draft-ietf-nvo3-geneve-08 says:

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC1981]) be used by setting the DF bit in the IP header when Geneve
   packets are transmitted over IPv4 (this is the default with IPv6).

Now that ICMP error handling is working for GENEVE, we can comply with
this recommendation.

Make this configurable, though, to avoid breaking existing setups. By
default, DF won't be set. It can be set or inherited from inner IPv4
packets. If it's configured to be inherited and we are encapsulating IPv6,
it will be set.

This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.

v2:
- DF behaviour configuration only applies for non-lwt tunnels, apply DF
  setting only if (!geneve->collect_md) in geneve_xmit_skb()
  (Stephen Hemminger)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a025fb5f

geneve: ICMP error lookup handler · a0796644

Stefano Brivio authored Nov 08, 2018

Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0796644

selftests: pmtu: Introduce tests for IPv4/IPv6 over VXLAN over IPv4/IPv6 · 58288879

Stefano Brivio authored Nov 08, 2018

Use a router between endpoints, implemented via namespaces, set a low MTU
between router and destination endpoint, exceed it and check PMTU value in
route exceptions.

v2:
- Change all occurrences of VxLAN to VXLAN (Jiri Benc)
- Introduce IPv4 tests right away, if iproute2 doesn't support the 'df'
  link option they will be skipped (David Ahern)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

58288879

vxlan: Allow configuration of DF behaviour · b4d30697

Stefano Brivio authored Nov 08, 2018

Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its
value from the IPv4 inner header. If the encapsulated protocol is IPv6 and
DF is configured to be inherited, always set it.

For IPv4, inheriting DF from the inner header was probably intended from
the very beginning judging by the comment to vxlan_xmit(), but it wasn't
actually implemented -- also because it would have done more harm than
good, without handling for ICMP Fragmentation Needed messages.

According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC
draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe
PMTUD implementation, says that "is a MUST that Vxlan gateways [...]
SHOULD set the DF-bit [...]", whatever that means.

Given this background, the only sane option is probably to let the user
decide, and keep the current behaviour as default.

This only applies to non-lwt tunnels: if an external control plane is
used, tunnel key will still control the DF flag.

v2:
- DF behaviour configuration only applies for non-lwt tunnels, move DF
  setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4d30697

vxlan: ICMP error lookup handler · c3a43b9f

Stefano Brivio authored Nov 08, 2018

Export an encap_err_lookup() operation to match an ICMP error against a
valid VNI.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3a43b9f

udp: Handle ICMP errors for tunnels with same destination port on both endpoints · a36e185e

Stefano Brivio authored Nov 08, 2018

For both IPv4 and IPv6, if we can't match errors to a socket, try
tunnels before ignoring them. Look up a socket with the original source
and destination ports as found in the UDP packet inside the ICMP payload,
this will work for tunnels that force the same destination port for both
endpoints, i.e. VXLAN and GENEVE.

Actually, lwtunnels could break this assumption if they are configured by
an external control plane to have different destination ports on the
endpoints: in this case, we won't be able to trace ICMP messages back to
them.

For IPv6 redirect messages, call ip6_redirect() directly with the output
interface argument set to the interface we received the packet from (as
it's the very interface we should build the exception on), otherwise the
new nexthop will be rejected. There's no such need for IPv4.

Tunnels can now export an encap_err_lookup() operation that indicates a
match. Pass the packet to the lookup function, and if the tunnel driver
reports a matching association, continue with regular ICMP error handling.

v2:
- Added newline between network and transport header sets in
  __udp{4,6}_lib_err_encap() (David Miller)
- Removed redundant skb_reset_network_header(skb); in
  __udp4_lib_err_encap()
- Removed redundant reassignment of iph in __udp4_lib_err_encap()
  (Sabrina Dubroca)
- Edited comment to __udp{4,6}_lib_err_encap() to reflect the fact this
  won't work with lwtunnels configured to use asymmetric ports. By the way,
  it's VXLAN, not VxLAN (Jiri Benc)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a36e185e