Commits · 12479f627f7c2017e6fcd50b56c2537592674c50 · nexedi / linux

09 Jul, 2019 40 commits

bnxt_en: Add page_pool_destroy() during RX ring cleanup. · 12479f62

Michael Chan authored Jul 09, 2019

Add page_pool_destroy() in bnxt_free_rx_rings() during normal RX ring
cleanup, as Ilias has informed us that the following commit has been
merged:

1da4bbef ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")

The special error handling code to call page_pool_free() can now be
removed.  bnxt_free_rx_rings() will always be called during normal
shutdown or any error paths.

Fixes: 322b87ca ("bnxt_en: add page_pool support")
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12479f62

Merge branch 'net-sched-Introduce-tc-connection-tracking' · 216dcb05

David S. Miller authored Jul 09, 2019

Paul Blakey says:

====================
net/sched: Introduce tc connection tracking

This patch series add connection tracking capabilities in tc sw datapath.
It does so via a new tc action, called act_ct, and new tc flower classifier matching
on conntrack state, mark and label.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0

The pattern used in the design here closely resembles OvS, as the plan is to also offload
OvS conntrack rules to tc. OvS datapath rules uses it's recirculation mechanism to send
specific packets to conntrack, and return with the new conntrack state (ct_state) on some other recirc_id
to be matched again (we use goto chain for this).

This results in the following OvS datapath rules:

recirc_id(0),in_port(ens1f0_0),ct_state(-trk),... actions:ct(zone=2),recirc(2)
recirc_id(2),in_port(ens1f0_0),ct_state(+new+trk),ct_mark(0xbb),... actions:ct(commit,zone=2,nat(src=5.5.5.7),mark=0xbb),ens1f0_1
recirc_id(2),in_port(ens1f0_0),ct_state(+est+trk),ct_mark(0xbb),... actions:ct(zone=2,nat),ens1f0_1

recirc_id(1),in_port(ens1f0_1),ct_state(-trk),... actions:ct(zone=2),recirc(1)
recirc_id(1),in_port(ens1f0_1),ct_state(+est+trk),... actions:ct(zone=2,nat),ens1f0_0

Changelog:
	See individual patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

216dcb05

tc-tests: Add tc action ct tests · 6e52fca3

Paul Blakey authored Jul 09, 2019

Add 13 tests ensuring the command line is doing what is supposed to do.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e52fca3

net/sched: cls_flower: Add matching on conntrack info · e0ace68a

Paul Blakey authored Jul 09, 2019

New matches for conntrack mark, label, zone, and state.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0ace68a

net/flow_dissector: add connection tracking dissection · 75a56758

Paul Blakey authored Jul 09, 2019

Retreives connection tracking zone, mark, label, and state from
a SKB.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

75a56758

net/sched: Introduce action ct · b57dc7c1

Paul Blakey authored Jul 09, 2019

Allow sending a packet to conntrack module for connection tracking.

The packet will be marked with conntrack connection's state, and
any metadata such as conntrack mark and label. This state metadata
can later be matched against with tc classifers, for example with the
flower classifier as below.

In addition to committing new connections the user can optionally
specific a zone to track within, set a mark/label and configure nat
with an address range and port range.

Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress

$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_state +trk+new \
  action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
  action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
  prio 1 chain 2 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 0 proto ip \
  flower ip_proto tcp ct_state -trk \
  action ct zone 2 pipe \
  action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
  prio 1 chain 1 proto ip \
  flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
  action ct nat pipe \
  action mirred egress redirect dev ens1f0_0
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>

Changelog:
V5->V6:
	Added CONFIG_NF_DEFRAG_IPV6 in handle fragments ipv6 case
V4->V5:
	Reordered nf_conntrack_put() in tcf_ct_skb_nfct_cached()
V3->V4:
	Added strict_start_type for act_ct policy
V2->V3:
	Fixed david's comments: Removed extra newline after rcu in tcf_ct_params , and indent of break in act_ct.c
V1->V2:
	Fixed parsing of ranges TCA_CT_NAT_IPV6_MAX as 'else' case overwritten ipv4 max
	Refactored NAT_PORT_MIN_MAX range handling as well
	Added ipv4/ipv6 defragmentation
	Removed extra skb pull push of nw offset in exectute nat
	Refactored tcf_ct_skb_network_trim after pull
	Removed TCA_ACT_CT define
Signed-off-by: David S. Miller <davem@davemloft.net>

b57dc7c1

Merge branch 'devlink-Introduce-PCI-PF-VF-ports-and-attributes' · f108c887

David S. Miller authored Jul 09, 2019

Parav Pandit says:

====================
devlink: Introduce PCI PF, VF ports and attributes

This patchset carry forwards the work initiated in [1] and discussion
futher concluded at [2].

To improve visibility of representor netdevice, its association with
PF or VF, physical port, two new devlink port flavours are added as
PCI PF and PCI VF ports.

A sample eswitch view can be seen below, which will be futher extended to
mdev subdevices of a PCI function in future.

Patch-1 moves physical port's attribute to new structure
Patch-2 enhances netlink response to consider port flavour
Patch-3,4 extends devlink port attributes and port flavour
Patch-5 extends mlx5 driver to register devlink ports for PF, VF and
physical link.

                                +---+      +---+
                              vf|   |      |   | pf
                                +-+-+      +-+-+
physical link <---------+         |          |
                        |         |          |
                        |         |          |
                      +-+-+     +-+-+      +-+-+
                      | 1 |     | 2 |      | 3 |
                   +--+---+-----+---+------+---+--+
                   |  physical   vf         pf    |
                   |  port       port       port  |
                   |                              |
                   |             eswitch          |
                   |                              |
                   +------------------------------+

[1] https://www.spinics.net/lists/netdev/msg555797.html
[2] https://marc.info/?l=linux-netdev&m=155354609408485&w=2

Changelog:
v5->v6:
 - Fixed port flavour check order for PCI PF vs other flavours in
   netlink response.
 - Changed 'physical' to 'phys'.
v4->v5:
 - Split first patch to two patches to handle netlink response in
   separate patch.
 - Corrected typo 'otwerwise' to 'otherwise' in patches 3 and 4.
v3->v4:
 - Addressed comments from Jiri.
 - Split first patch to two patches.
 - Renamed phys_port to physical to be consistent with pci_pf.
 - Removed port_number from __devlink_port_attrs_set and moved
   assignment to caller function.
 - Used capital letter while moving old comment to new structure.
 - Removed helper function is_devlink_phy_port_num_supported().
v2->v3:
 - Made port_number and split_port_number applicable only to
   physical port flavours.
v1->v2:
 - Updated new APIs and mlx5 driver to drop port_number for PF, VF
   attributes
 - Updated port_number comment for its usage
 - Limited putting port_number to physical ports
====================
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f108c887

net/mlx5e: Register devlink ports for physical link, PCI PF, VFs · f60f315d

Parav Pandit authored Jul 08, 2019

Register devlink port of physical port, PCI PF and PCI VF flavour
for each PF, VF when a given devlink instance is in switchdev mode.

Implement ndo_get_devlink_port callback API to make use of registered
devlink ports.
This eliminates ndo_get_phys_port_name() and ndo_get_port_parent_id()
callbacks. Hence, remove them.

An example output with 2 VFs, without a PF and single uplink port is
below.

$devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0 flavour physical
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f60f315d

devlink: Introduce PCI VF port flavour and port attribute · e41b6bf3

Parav Pandit authored Jul 08, 2019

In an eswitch, PCI VF may have port which is normally represented using
a representor netdevice.
To have better visibility of eswitch port, its association with VF,
and its representor netdevice, introduce a PCI VF port flavour.

When devlink port flavour is PCI VF, fill up PCI VF attributes of
the port.

Extend port name creation using PCI PF and VF number scheme on best
effort basis, so that vendor drivers can skip defining their own scheme.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e41b6bf3

devlink: Introduce PCI PF port flavour and port attribute · 98fd2d65

Parav Pandit authored Jul 08, 2019

In an eswitch, PCI PF may have port which is normally represented
using a representor netdevice.
To have better visibility of eswitch port, its association with
PF and a representor netdevice, introduce a PCI PF port
flavour and port attriute.

When devlink port flavour is PCI PF, fill up PCI PF attributes of the
port.

Extend port name creation using PCI PF number on best effort basis.
So that vendor drivers can skip defining their own scheme.

$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

98fd2d65

devlink: Return physical port fields only for applicable port flavours · a2c6b87d

Parav Pandit authored Jul 08, 2019

Physical port number and split group fields are applicable only to
physical port flavours such as PHYSICAL, CPU and DSA.
Hence limit returning those values in netlink response to such port
flavours.
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a2c6b87d

devlink: Refactor physical port attributes · 378ef01b

Parav Pandit authored Jul 08, 2019

To support additional devlink port flavours and to support few common
and few different port attributes, move physical port attributes to a
different structure.
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

378ef01b

Merge branch 'nfp-tls-fixes-for-initial-TLS-support' · b14a260e

David S. Miller authored Jul 08, 2019

Jakub Kicinski says:

====================
nfp: tls: fixes for initial TLS support

This series brings various fixes to nfp tls offload recently added
to net-next.

First 4 patches revolve around device mailbox communication, trying
to make it more reliable. Next patch fixes statistical counter.
Patch 6 improves the TX resync if device communication failed.
Patch 7 makes sure we remove keys from memory after talking to FW.
Patch 8 adds missing tls context initialization, we fill in the
context information from various places based on the configuration
and looks like we missed the init in the case of where TX is
offloaded, but RX wasn't initialized yet. Patches 9 and 10 make
the nfp driver undo TLS state changes if we need to drop the
frame (e.g. due to DMA mapping error).

Last but not least TLS fallback should not adjust socket memory
after skb_orphan_partial(). This code will go away once we forbid
orphaning of skbs in need of crypto, but that's "real" -next
material, so lets do a quick fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b14a260e

net/tls: fix socket wmem accounting on fallback with netem · 5c4b4608

Jakub Kicinski authored Jul 08, 2019

netem runs skb_orphan_partial() which "disconnects" the skb
from normal TCP write memory accounting.  We should not adjust
sk->sk_wmem_alloc on the fallback path for such skbs.

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c4b4608

nfp: tls: undo TLS sequence tracking when dropping the frame · 5a4cea28

Jakub Kicinski authored Jul 08, 2019

If driver has to drop the TLS frame it needs to undo the TCP
sequence tracking changes, otherwise device will receive
segments out of order and drop them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a4cea28

nfp: tls: avoid one of the ifdefs for TLS · c8d3928e

Jakub Kicinski authored Jul 08, 2019

Move the #ifdef CONFIG_TLS_DEVICE a little so we can eliminate
the other one.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8d3928e

net/tls: add missing prot info init · ab232e61

Jakub Kicinski authored Jul 08, 2019

Turns out TLS_TX in HW offload mode does not initialize tls_prot_info.
Since commit 9cd81988 ("net/tls: use version from prot") we actually
use this field on the datapath.  Luckily we always compare it to TLS 1.3,
and assume 1.2 otherwise. So since zero is not equal to 1.3, everything
worked fine.

Fixes: 9cd81988 ("net/tls: use version from prot")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab232e61

nfp: tls: don't leave key material in freed FW cmsg skbs · c3b64911

Jakub Kicinski authored Jul 08, 2019

Make sure the contents of the skb which carried key material
to the FW is cleared.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3b64911

net/tls: don't clear TX resync flag on error · b5d9a834

Dirk van der Merwe authored Jul 08, 2019

Introduce a return code for the tls_dev_resync callback.

When the driver TX resync fails, kernel can retry the resync again
until it succeeds.  This prevents drivers from attempting to offload
TLS packets if the connection is known to be out of sync.

We don't worry about the RX resync since they will be retried naturally
as more encrypted records get received.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5d9a834

nfp: tls: count TSO segments separately for the TLS offload · 427545b3

Jakub Kicinski authored Jul 08, 2019

Count the number of successfully submitted TLS segments,
not skbs. This will make it easier to compare the TLS
encryption count against other counters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

427545b3

nfp: ccm: increase message limits · f6dfa315

Dirk van der Merwe authored Jul 08, 2019

Increase the batch limit to consume small message bursts more
effectively. Practically, the effect on the 'add' messages is not
significant since the mailbox is sized such that the 'add' messages are
still limited to the same order of magnitude that it was originally set
for.

Furthermore, increase the queue size limit to 1024 entries. This further
improves the handling of bursts of small control messages.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6dfa315

nfp: tls: use unique connection ids instead of 4-tuple for TX · 53601c68

Jakub Kicinski authored Jul 08, 2019

Connection 4 tuple reuse is slightly problematic - TLS socket
and context do not get destroyed until all the associated skbs
left the system and all references are released. This leads
to stale connection entry in the device preventing addition
of new one if the 4 tuple is reused quickly enough.

Instead of using read 4 tuple as the key use a unique ID.
Set the protocol to TCP and port to 0 to ensure no collisions
with real connections.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

53601c68

nfp: tls: move setting ipver_vlan to a helper · ff8869d5

Jakub Kicinski authored Jul 08, 2019

Long lines are ugly.  No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff8869d5

nfp: tls: ignore queue limits for delete commands · 0f93242d

Jakub Kicinski authored Jul 08, 2019

We need to do our best not to drop delete commands, otherwise
we will have stale entries in the connection table.  Ignore
the control message queue limits for delete commands.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f93242d

sctp: remove rcu_read_lock from sctp_bind_addr_state · 3cab2afb

Xin Long authored Jul 09, 2019

sctp_bind_addr_state() is called either in packet rcv path or
by sctp_copy_local_addr_list(), which are under rcu_read_lock.
So there's no need to call it again in sctp_bind_addr_state().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3cab2afb

Merge branch 'sctp-tidyup' · 6c6fbad6

David S. Miller authored Jul 08, 2019

Xin Long says:

====================
sctp: tidy up some ep and asoc feature flags

This patchset is to remove some unnecessary feature flags from
sctp_assocation and move some others to the right places.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6c6fbad6

sctp: rename sp strm_interleave to ep intl_enable · e55f4b8b

Xin Long authored Jul 09, 2019

Like other endpoint features, strm_interleave should be moved to
sctp_endpoint and renamed to intl_enable.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e55f4b8b

sctp: rename asoc intl_enable to asoc peer.intl_capable · da1f6d4d

Xin Long authored Jul 09, 2019

To keep consistent with other asoc features, we move intl_enable
to peer.intl_capable in asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da1f6d4d

sctp: remove prsctp_enable from asoc · 1c134753

Xin Long authored Jul 09, 2019

Like reconf_enable, prsctp_enable should also be removed from asoc,
as asoc->peer.prsctp_capable has taken its job.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c134753

sctp: remove reconf_enable from asoc · a96701fb

Xin Long authored Jul 09, 2019

asoc's reconf support is actually decided by the 4-shakehand negotiation,
not something that users can set by sockopt. asoc->peer.reconf_capable is
working for this. So remove it from asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a96701fb

net: phy: Make use of linkmode_mod_bit helper · ccf355e5

Fuqian Huang authored Jul 08, 2019

linkmode_mod_bit is introduced as a helper function to set/clear
bits in a linkmode.
Replace the if else code structure with a call to the helper
linkmode_mod_bit.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ccf355e5

Merge branch 'Add-MPLS-actions-to-TC' · 88e2f284

David S. Miller authored Jul 08, 2019

John Hurley says:

====================
Add MPLS actions to TC

This patchset introduces a new TC action module that allows the
manipulation of the MPLS headers of packets. The code impliments
functionality including push, pop, and modify.

Also included are tests for the new funtionality. Note that these will
require iproute2 changes to be submitted soon.

NOTE: these patches are applied to net-next along with the patch:
[PATCH net 1/1] net: openvswitch: fix csum updates for MPLS actions
This patch has been accepted into net but, at time of posting, is not yet
in net-next.

v6-v7:
- add extra tests for setting max/min and exceeding range of fields -
  patch 5 (Roman Mashak)
v5-v6:
- add CONFIG_NET_ACT_MPLS to tc-testing config file - patch 5
  (Davide Caratti)
v4-v5:
- move mpls_hdr() call to after skb_ensure_writable - patch 3
  (Willem de Bruijn)
- move mpls_dec_ttl to helper - patch 4 (Willem de Bruijn)
- add iproute2 usage example to commit msg - patch 4 (David Ahern)
- align label validation with mpls core code - patch 4 (David Ahern)
- improve extack message for no proto in mpls pop - patch 4 (David Ahern)
v3-v4:
- refactor and reuse OvS code (Cong Wang)
- use csum API rather than skb_post*rscum to update skb->csum (Cong Wang)
- remove unnecessary warning (Cong Wang)
- add comments to uapi attributes (David Ahern)
- set strict type policy check for TCA_MPLS_UNSPEC (David Ahern)
- expand/improve extack messages (David Ahern)
- add option to manually set BOS
v2-v3:
- remove a few unnecessary line breaks (Jiri Pirko)
- retract hw offload patch from set (resubmit with driver changes) (Jiri)
v1->v2:
- ensure TCA_ID_MPLS does not conflict with TCA_ID_CTINFO (Davide Caratti)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

88e2f284

tc-tests: actions: add MPLS tests · 6fb8dbca

John Hurley authored Jul 07, 2019

Add a new series of selftests to verify the functionality of act_mpls in
TC.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6fb8dbca

net: sched: add mpls manipulation actions to TC · 2a2ea508

John Hurley authored Jul 07, 2019

Currently, TC offers the ability to match on the MPLS fields of a packet
through the use of the flow_dissector_key_mpls struct. However, as yet, TC
actions do not allow the modification or manipulation of such fields.

Add a new module that registers TC action ops to allow manipulation of
MPLS. This includes the ability to push and pop headers as well as modify
the contents of new or existing headers. A further action to decrement the
TTL field of an MPLS header is also provided with a new helper added to
support this.

Examples of the usage of the new action with flower rules to push and pop
MPLS labels are:

tc filter add dev eth0 protocol ip parent ffff: flower \
    action mpls push protocol mpls_uc label 123  \
    action mirred egress redirect dev eth1

tc filter add dev eth0 protocol mpls_uc parent ffff: flower \
    action mpls pop protocol ipv4  \
    action mirred egress redirect dev eth1
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a2ea508

net: core: add MPLS update core helper and use in OvS · d27cf5c5

John Hurley authored Jul 07, 2019

Open vSwitch allows the updating of an existing MPLS header on a packet.
In preparation for supporting similar functionality in TC, move this to a
common skb helper function.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d27cf5c5

net: core: move pop MPLS functionality from OvS to core helper · ed246cee

John Hurley authored Jul 07, 2019

Open vSwitch provides code to pop an MPLS header to a packet. In
preparation for supporting this in TC, move the pop code to an skb helper
that can be reused.

Remove the, now unused, update_ethertype static function from OvS.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed246cee

net: core: move push MPLS functionality from OvS to core helper · 8822e270

John Hurley authored Jul 07, 2019

Open vSwitch provides code to push an MPLS header to a packet. In
preparation for supporting this in TC, move the push code to an skb helper
that can be reused.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8822e270

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · af144a98
David S. Miller authored Jul 08, 2019
```
Two cases of overlapping changes, nothing fancy.
Signed-off-by: David S. Miller <davem@davemloft.net>
```
af144a98

skbuff: increase verbosity when dumping skb data · 6413139d

Willem de Bruijn authored Jul 07, 2019

skb_warn_bad_offload and netdev_rx_csum_fault trigger on hard to debug
issues. Dump more state and the header.

Optionally dump the entire packet and linear segment. This is required
to debug checksum bugs that may include bytes past skb_tail_pointer().

Both call sites call this function inside a net_ratelimit() block.
Limit full packet log further to a hard limit of can_dump_full (5).

Based on an earlier patch by Cong Wang, see link below.

Changes v1 -> v2
  - dump frag_list only on full_pkt

Link: https://patchwork.ozlabs.org/patch/1000841/Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6413139d

ipv6: elide flowlabel check if no exclusive leases exist · 59c820b2

Willem de Bruijn authored Jul 07, 2019

Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
If not set, by default an autogenerated flowlabel is selected.

Explicit flowlabels require a control operation per label plus a
datapath check on every connection (every datagram if unconnected).
This is particularly expensive on unconnected sockets multiplexing
many flows, such as QUIC.

In the common case, where no lease is exclusive, the check can be
safely elided, as both lease request and check trivially succeed.
Indeed, autoflowlabel does the same even with exclusive leases.

Elide the check if no process has requested an exclusive lease.

fl6_sock_lookup previously returns either a reference to a lease or
NULL to denote failure. Modify to return a real error and update
all callers. On return NULL, they can use the label and will elide
the atomic_dec in fl6_sock_release.

This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.

Changes RFC->v1:
  - use static_key_false_deferred to rate limit jump label operations
    - call static_key_deferred_flush to stop timers on exit
  - move decrement out of RCU context
  - defer optimization also if opt data is associated with a lease
  - updated all fp6_sock_lookup callers, not just udp
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59c820b2