Commits · 272c2330adc9c68284cb0066719160c24bfe605f · Kirill Smelkov / linux

22 Jun, 2020 11 commits

xfrm: bail early on slave pass over skb · 272c2330

Jarod Wilson authored Jun 19, 2020

This is prep work for initial support of bonding hardware encryption
pass-through support. The bonding driver will fill in the slave_dev
pointer, and we use that to know not to skb_push() again on a given
skb that was already processed on the bond device.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

272c2330

Merge branch 'devlink-Support-get-set-mac-address-of-a-port-function' · 389cc2f3

David S. Miller authored Jun 22, 2020

Parav Pandit says:

====================
devlink: Support get,set mac address of a port function

Currently, ip link set dev <pfndev> vf <vf_num> <param> <value> has
below few limitations.

1. Command is limited to set VF parameters only.
It cannot set the default MAC address for the PCI PF.

2. It can be set only on system where PCI SR-IOV capability exists.
In smartnic based system, eswitch of a NIC resides on a different
embedded cpu which has the VF and PF representors for the SR-IOV
functions of a host system in which this smartnic is plugged-in.

3. It cannot setup the function attributes of sub-function described
in detail in comprehensive RFC [1] and [2].

This series covers the first small part to let user query and set MAC
address (hardware address) of a PCI PF/VF which is represented by
devlink port pcipf, pcivf port flavours respectively.

Whenever a devlink port manages a function connected to a devlink port,
it allows to query and set its hardware address.

Driver implements necessary get/set callback functions if it supports
port function for a given port type.

Patch summary:
Patch-1 Prepares devlink port fill routines for extack
Patch-2 and 3 extended devlink interface to get/set port function
attributes, mainly hardware address to start with.

Patch-2 Extended port dump command to query port function hardware
address
Patch-3 Introduces a command to set the hardware address of a port
function

Patch-4 to 9 refactors and implement devlink callbacks in mlx5_core
driver.
Patch-4 Constify the mac address pointer in set routines
Patch-5 Introduces eswich check helper to use in devlink facing
callbacks
Patch-6 Moves port index, port number conversion routine to eswitch
header file
Patch-7 Implements port function query devlink callback
Patch-8 Refactors mac address setting routine to uniformly use
state_lock
Patch-9 Implements port function set devlink callback

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/
[2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

389cc2f3

net/mlx5: E-switch, Supporting setting devlink port function mac address · 330077d1

Parav Pandit authored Jun 19, 2020

Enable user to set mac address of the PCI PF and VF port function.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

330077d1

net/mlx5: Split mac address setting function for using state_lock · 1094795c

Parav Pandit authored Jun 19, 2020

Refactor mac address setting function to let caller hold the necessary
state_lock mutex, so that subsequent patch and use this helper routine.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1094795c

net/mlx5: E-switch, Support querying port function mac address · f099fde1

Parav Pandit authored Jun 19, 2020

Support querying mac address of the eswitch devlink port function.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f099fde1

net/mlx5: Move helper to eswitch layer · 443bf36e

Parav Pandit authored Jun 19, 2020

To use port number to port index conversion at eswitch level, move it to
eswitch header.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

443bf36e

net/mlx5: E-switch, Introduce and use eswitch support check helper · bd939753

Parav Pandit authored Jun 19, 2020

Introduce an helper routine to get esw from a devlink device and use it
at eswitch callbacks and in subsequent patch.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd939753

net/mlx5: Constify mac address pointer · fa997825

Parav Pandit authored Jun 19, 2020

Since none of the functions need to modify the input mac address,
constify them.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa997825

net/devlink: Support setting hardware address of port function · a1e8ae90

Parav Pandit authored Jun 19, 2020

PCI PF and VF devlink port can manage the function represented by a
devlink port.

Allow users to set port function's hardware address.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00

$ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:55
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a1e8ae90

net/devlink: Support querying hardware address of port function · 2a916ecc

Parav Pandit authored Jun 19, 2020

PCI PF and VF devlink port can manage the function represented by
a devlink port.

Enable users to query port function's hardware address.

Example of a PCI VF port which supports a port function:
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:11:22:33:44:66

$ devlink port show pci/0000:06:00.0/2 -jp
{
    "port": {
        "pci/0000:06:00.0/2": {
            "type": "eth",
            "netdev": "enp6s0pf0vf1",
            "flavour": "pcivf",
            "pfnum": 0,
            "vfnum": 1,
            "function": {
                "hw_addr": "00:11:22:33:44:66"
            }
        }
    }
}
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2a916ecc

net/devlink: Prepare devlink port functions to fill extack · a829eb0d

Parav Pandit authored Jun 19, 2020

Prepare devlink port related functions to optionally fill up
the extack information which will be used in subsequent patch by port
function attribute(s).
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a829eb0d

21 Jun, 2020 29 commits

Merge branch 'Marvell-mvpp2-improvements' · 29a720c1

David S. Miller authored Jun 20, 2020

Russell King says:

====================
Marvell mvpp2 improvements

This series primarily cleans up mvpp2, but also fixes a left-over from
91a208f2 ("net: phylink: propagate resolved link config via
mac_link_up()").

Patch 1 introduces some port helpers:
  mvpp2_port_supports_xlg() - does the port support the XLG MAC
  mvpp2_port_supports_rgmii() - does the port support RGMII modes

Patch 2 introduces mvpp2_phylink_to_port(), rather than having repeated
  open coding of container_of().

Patch 3 introduces mvpp2_modify(), which reads-modifies-writes a
  register - I've converted the phylink specific code to use this
  helper.

Patch 4 moves the hardware control of the pause modes from
  mvpp2_xlg_config() (which is called via the phylink_config method)
  to mvpp2_mac_link_up() - a change that was missed in the above
  referenced commit.

v2: remove "inline" in patch 2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

29a720c1

net: mvpp2: set xlg flow control in mvpp2_mac_link_up() · 63d78cc9

Russell King authored Jun 20, 2020

Set the flow control settings in mvpp2_mac_link_up() for 10G links
just as we do for 1G and slower links. This is now the preferred
location.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

63d78cc9

net: mvpp2: add register modification helper · bd45f644

Russell King authored Jun 20, 2020

Add a helper to read-modify-write a register, and use it in the phylink
helpers.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd45f644

net: mvpp2: add mvpp2_phylink_to_port() helper · 6c2b49eb

Russell King authored Jun 20, 2020

Add a helper to convert the struct phylink_config pointer passed in
from phylink to the drivers internal struct mvpp2_port.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c2b49eb

net: mvpp2: add port support helpers · a9a33202

Russell King authored Jun 20, 2020

The mvpp2 code has tests scattered amongst the code to determine
whether the port supports the XLG, and whether the port supports
RGMII mode.

Rather than having these tests scattered, provide a couple of helper
functions, so that future additions can ensure that they get these
tests correct.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9a33202

Remove redundant skb null check · 8bf15395

Gaurav Singh authored Jun 19, 2020

Remove the redundant null check for skb.
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8bf15395

Merge branch 'tcp-remove-two-indirect-calls-from-xmit-path' · c8f8a9f8

David S. Miller authored Jun 20, 2020

Eric Dumazet says:

====================
tcp: remove two indirect calls from xmit path

__tcp_transmit_skb() does two indirect calls per packet, lets get rid
of them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c8f8a9f8

tcp: remove indirect calls for icsk->icsk_af_ops->send_check · dd2e0b86

Eric Dumazet authored Jun 19, 2020

Mitigate RETPOLINE costs in __tcp_transmit_skb()
by using INDIRECT_CALL_INET() wrapper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd2e0b86

tcp: remove indirect calls for icsk->icsk_af_ops->queue_xmit · 05e22e83

Eric Dumazet authored Jun 19, 2020

Mitigate RETPOLINE costs in __tcp_transmit_skb()
by using INDIRECT_CALL_INET() wrapper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05e22e83

of: mdio: preserve phy dev_flags in of_phy_connect() · 902053f1

Tao Ren authored Jun 18, 2020

Replace assignment "=" with OR "|=" for "phy->dev_flags" so "dev_flags"
configured in phy probe() function can be preserved.

The idea is similar to commit e7312efb ("net: phy: modify assignment
to OR for dev_flags in phy_attach_direct").
Signed-off-by: Tao Ren <rentao.bupt@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

902053f1

net: Avoid overwriting valid skb->napi_id · 78e57f15

Amritha Nambiar authored Jun 18, 2020

This will be useful to allow busy poll for tunneled traffic. In case of
busy poll for sessions over tunnels, the underlying physical device's
queues need to be polled.

Tunnels schedule NAPI either via netif_rx() for backlog queue or
schedule the gro_cell_poll(). netif_rx() propagates the valid skb->napi_id
to the socket. OTOH, gro_cell_poll() stamps the skb->napi_id again by
calling skb_mark_napi_id() with the tunnel NAPI which is not a busy poll
candidate. This was preventing tunneled traffic to use busy poll. A valid
NAPI ID in the skb indicates it was already marked for busy poll by a
NAPI driver and hence needs to be copied into the socket.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

78e57f15

Remove redundant condition in qdisc_graft · 8eaf8d99

Gaurav Singh authored Jun 18, 2020

parent cannot be NULL here since its in the else part
of the if (parent == NULL) condition. Remove the extra
check on parent pointer.
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8eaf8d99

Merge branch 'Ocelot-Felix-driver-cleanup' · cd399838

David S. Miller authored Jun 20, 2020

Vladimir Oltean says:

====================
Ocelot/Felix driver cleanup

Some of the code in the mscc felix and ocelot drivers was added while in
a bit of a hurry. Let's take a moment and put things in relative order.

First 3 patches are sparse warning fixes.

Patches 4-9 perform some further splitting between mscc_felix,
mscc_ocelot, and the common hardware library they share. Meaning that
some code is being moved from the library into just the mscc_ocelot
module.

Patches 10-12 refactor the naming conventions in the existing VCAP code
(for tc-flower offload), since we're going to be adding some more code
for VCAP IS1 (previous tentatives already submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/20200602051828.5734-1-xiaoliang.yang_1@nxp.com/),
and that code would be confusing to read and maintain using current
naming conventions.

No functional modification is intended. I checked that the VCAP IS2 code
still works by applying a tc ingress filter with an EtherType key and
'drop' action.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cd399838

net: mscc: ocelot: unexpose ocelot_vcap_policer_{add,del} · c73b0ad3

Vladimir Oltean authored Jun 20, 2020

Remove the function prototypes from ocelot_police.h and make these
functions static. We need to move them above their callers. Note that
moving the implementations to ocelot_police.c is not trivially possible
due to dependency on is2_entry_set() which is static to ocelot_vcap.c.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c73b0ad3

net: mscc: ocelot: generalize the "ACE/ACL" names · aae4e500

Vladimir Oltean authored Jun 20, 2020

Access Control Lists (and their respective Access Control Entries) are
specifically entries in the VCAP IS2, the security enforcement block,
according to the documentation.
Let's rename the structures and functions to something more generic, so
that VCAP IS1 structures (which would otherwise have to be called
Ingress Classification Entries) can reuse the same code without
confusion.

Some renaming that was done:

struct ocelot_ace_rule -> struct ocelot_vcap_filter
struct ocelot_acl_block -> struct ocelot_vcap_block
enum ocelot_ace_type -> enum ocelot_vcap_key_type
struct ocelot_ace_vlan -> struct ocelot_vcap_key_vlan
enum ocelot_ace_action -> enum ocelot_vcap_action
struct ocelot_ace_stats -> struct ocelot_vcap_stats
enum ocelot_ace_type -> enum ocelot_vcap_key_type
struct ocelot_ace_frame_* -> struct ocelot_vcap_key_*

No functional change is intended.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aae4e500

net: mscc: ocelot: rename ocelot_ace.{c, h} to ocelot_vcap.{c,h} · 3c83654f

Vladimir Oltean authored Jun 20, 2020

Access Control Lists (and their respective Access Control Entries) are
specifically entries in the VCAP IS2, the security enforcement block,
according to the documentation.

Let's rename the files that deal with generic operations on the VCAP
TCAM, so that VCAP IS1 and ES0 can reuse the same code without
confusion.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c83654f

net: mscc: ocelot: move net_device related functions to ocelot_net.c · 9c90eea3

Vladimir Oltean authored Jun 20, 2020

The ocelot hardware library shouldn't contain too much net_device
specific code, since it is shared with DSA which abstracts that
structure away. So much as much of this code as possible into the
mscc_ocelot driver and outside of the common library.

We're making an exception for MDB and LAG code. That is not yet exported
to DSA, but when it will, most of the code that's already in ocelot.c
will remain there. So, there's no point in moving code to ocelot_net.c
just to move it back later.

We could have moved all net_device code to ocelot_vsc7514.c directly,
but let's operate under the assumption that if a new switchdev ocelot
driver gets added, it'll define its SoC-specific stuff in a new
ocelot_vsc*.c file and it'll reuse the rest of the code.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c90eea3

net: mscc: ocelot: move ocelot_regs.c into ocelot_vsc7514.c · d9feb904

Vladimir Oltean authored Jun 20, 2020

ocelot_regs.c actually shouldn't be part of the common library. It
describes the register map of the VSC7514 switch. The way ocelot
switches work, they'll have highly optimized register maps, so another
SoC will likely have the same registers but laid out completely
different in memory (so there's little room for reusing this structure).
So move it to ocelot_vsc7514.c instead.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9feb904

net: mscc: ocelot: rename MSCC_OCELOT_SWITCH_OCELOT to MSCC_OCELOT_SWITCH · 14addfb6

Vladimir Oltean authored Jun 20, 2020

Putting 'ocelot' in the config's name twice just to say that 'it's the
ocelot driver running on the ocelot SoC' is a bit confusing. Instead,
it's just the ocelot driver. Now that we've renamed the previous symbol
that was holding the MSCC_OCELOT_SWITCH_OCELOT into *_LIB (because
that's what it is), we're free to use this name for the driver.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14addfb6

net: mscc: ocelot: convert MSCC_OCELOT_SWITCH into a library · f4d0323b

Vladimir Oltean authored Jun 20, 2020

Hide the CONFIG_MSCC_OCELOT_SWITCH option from users. It is meant to be
only a hardware library which is selected by the drivers that use it
(ocelot, felix).

Since it is "selected" from Kconfig, all its dependencies are manually
transferred to the driver that selects it. This is because "select" in
Kconfig language is a bit of a mess, and doesn't handle dependencies of
selected options quite right.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f4d0323b

net: mscc: ocelot: rename module to mscc_ocelot · 56583862

Vladimir Oltean authored Jun 20, 2020

mscc_ocelot is a slightly better name for a module than ocelot_board or
ocelot_vsc7514 is, so let's use that.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56583862

net: mscc: ocelot: rename ocelot_board.c to ocelot_vsc7514.c · 589aa6e7

Vladimir Oltean authored Jun 20, 2020

To follow the model of felix and seville where we have one
platform-specific file, rename this file to the actual SoC it serves.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

589aa6e7

net: mscc: ocelot: access EtherType using __be16 · ff4b0bc6

Vladimir Oltean authored Jun 20, 2020

Get rid of sparse "cast to restricted __be16" warnings.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff4b0bc6

net: mscc: ocelot: use plain int when interacting with TCAM tables · 7eb5c96a

Vladimir Oltean authored Jun 20, 2020

sparse is rightfully complaining about the fact that:

warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
   26 |   __builtin_constant_p((l) > (h)), (l) > (h), 0)))
      |                            ^
note: in expansion of macro ‘GENMASK_INPUT_CHECK’
   39 |  (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
      |   ^~~~~~~~~~~~~~~~~~~
note: in expansion of macro ‘GENMASK’
  127 |   mask = GENMASK(width, 0);
      |          ^~~~~~~

So replace the variables that go into GENMASK with plain, signed integer
types.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7eb5c96a

net: dsa: felix: make vcap is2 keys and actions static · 3ab4ceb6

Vladimir Oltean authored Jun 20, 2020

Get rid of some sparse warnings.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ab4ceb6

Merge branch 'Strict-mode-for-VRF' · 60cb8d3d

David S. Miller authored Jun 20, 2020

Andrea Mayer says:

====================
Strict mode for VRF

This patch set adds the new "strict mode" functionality to the Virtual
Routing and Forwarding infrastructure (VRF). Hereafter we discuss the
requirements and the main features of the "strict mode" for VRF.

On VRF creation, it is necessary to specify the associated routing table used
during the lookup operations. Currently, there is no mechanism that avoids
creating multiple VRFs sharing the same routing table. In other words, it is not
possible to force a one-to-one relationship between a specific VRF and the table
associated with it.

The "strict mode" imposes that each VRF can be associated to a routing table
only if such routing table is not already in use by any other VRF.
In particular, the strict mode ensures that:

 1) given a specific routing table, the VRF (if exists) is uniquely identified;
 2) given a specific VRF, the related table is not shared with any other VRF.

Constraints (1) and (2) force a one-to-one relationship between each VRF and the
corresponding routing table.

The strict mode feature is designed to be network-namespace aware and it can be
directly enabled/disabled acting on the "strict_mode" parameter.
Read and write operations are carried out through the classic sysctl command on
net.vrf.strict_mode path, i.e: sysctl -w net.vrf.strict_mode=1.

Only two distinct values {0,1} are accepted by the strict_mode parameter:

 - with strict_mode=0, multiple VRFs can be associated with the same table.
   This is the (legacy) default kernel behavior, the same that we experience
   when the strict mode patch set is not applied;

 - with strict_mode=1, the one-to-one relationship between the VRFs and the
   associated tables is guaranteed. In this configuration, the creation of a VRF
   which refers to a routing table already associated with another VRF fails and
   the error is returned to the user.

The kernel keeps track of the associations between a VRF and the routing table
during the VRF setup, in the "management" plane. Therefore, the strict mode does
not impact the performance or the intrinsic functionality of the data plane in
any way.

When the strict mode is active it is always possible to disable the strict mode,
while the reverse operation is not always allowed.
Setting the strict_mode parameter to 0 is equivalent to removing the one-to-one
constraint between any single VRF and its associated routing table.

Conversely, if the strict mode is disabled and there are multiple VRFs that
refer to the same routing table, then it is prohibited to set the strict_mode
parameter to 1. In this configuration, any attempt to perform the operation will
lead to an error and it will be reported to the user.
To enable strict mode once again (by setting the strict_mode parameter to 1),
you must first remove all the VRFs that share common tables.

There are several use cases which can take advantage from the introduction of
the strict mode feature. In particular, the strict mode allows us to:

  i) guarantee the proper functioning of some applications which deal with
     routing protocols;

 ii) perform some tunneling decap operations which require to use specific
     routing tables for segregating and forwarding the traffic.

Considering (i), the creation of different VRFs that point to the same table
leads to the situation where two different routing entities believe they have
exclusive access to the same table. This leads to the situation where different
routing daemons can conflict for gaining routes control due to overlapping
tables. By enabling strict mode it is possible to prevent this situation which
often occurs due to incorrect configurations done by the users.
The ability to enable/disable the strict mode functionality does not depend on
the tool used for configuring the networking. In essence, the strict mode patch
solves, at the kernel level, what some other patches [1] had tried to solve at
the userspace level (using only iproute2) with all the related problems.

Considering (ii), the introduction of the strict mode functionality allows us
implementing the SRv6 End.DT4 behavior. Such behavior terminates a SR tunnel and
it forwards the IPv4 traffic according to the routes present in the routing
table supplied during the configuration. The SRv6 End.DT4 can be realized
exploiting the routing capabilities made available by the VRF infrastructure.
This behavior could leverage a specific VRF for forcing the traffic to be
forwarded in accordance with the routes available in the VRF table.
Anyway, in order to make the End.DT4 properly work, it must be guaranteed that
the table used for the route lookup operations is bound to one and only one VRF.
In this way, it is possible to use the table for uniquely retrieving the
associated VRF and for routing packets.

I would like to thank David Ahern for his constant and valuable support during
the design and development phases of this patch set.

Comments, suggestions and improvements are very welcome!
====================
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60cb8d3d

selftests: add selftest for the VRF strict mode · 8735e6ea

Andrea Mayer authored Jun 20, 2020

The new strict mode functionality is tested in different configurations and
on different network namespaces.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

8735e6ea

vrf: add l3mdev registration for table to VRF device lookup · a59a8ffd

Andrea Mayer authored Jun 20, 2020

During the initialization phase of the VRF module, the callback for table
to VRF device lookup is registered in l3mdev.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

a59a8ffd

vrf: add sysctl parameter for strict mode · 33306f1a

Andrea Mayer authored Jun 20, 2020

Add net.vrf.strict_mode sysctl parameter.

When net.vrf.strict_mode=0 (default) it is possible to associate multiple
VRF devices to the same table. Conversely, when net.vrf.strict_mode=1 a
table can be associated to a single VRF device.

When switching from net.vrf.strict_mode=0 to net.vrf.strict_mode=1, a check
is performed to verify that all tables have at most one VRF associated,
otherwise the switch is not allowed.

The net.vrf.strict_mode parameter is per network namespace.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

33306f1a