Merge branch 'Strict-mode-for-VRF'

Andrea Mayer says:

====================
Strict mode for VRF

This patch set adds the new "strict mode" functionality to the Virtual Routing and Forwarding infrastructure (VRF). Hereafter we discuss the requirements and the main features of the "strict mode" for VRF.

On VRF creation, it is necessary to specify the associated routing table used during the lookup operations. Currently, there is no mechanism that prevents the creation of multiple VRFs sharing the same routing table. In other words, it is not possible to force a one-to-one relationship between a specific VRF and the table associated with it.

The "strict mode" imposes that each VRF can be associated with a routing table only if that routing table is not already in use by any other VRF. In particular, the strict mode ensures that:

 1) given a specific routing table, the VRF (if it exists) is uniquely identified;
 2) given a specific VRF, the related table is not shared with any other VRF.

Constraints (1) and (2) force a one-to-one relationship between each VRF and the corresponding routing table.

The strict mode feature is designed to be network-namespace aware and it can be directly enabled/disabled by acting on the "strict_mode" parameter. Read and write operations are carried out through the classic sysctl command on the net.vrf.strict_mode path, e.g.: sysctl -w net.vrf.strict_mode=1.

Only two distinct values {0,1} are accepted by the strict_mode parameter:

 - with strict_mode=0, multiple VRFs can be associated with the same table. This is the (legacy) default kernel behavior, the same that we experience when the strict mode patch set is not applied;
 - with strict_mode=1, the one-to-one relationship between the VRFs and the associated tables is guaranteed. In this configuration, the creation of a VRF which refers to a routing table already associated with another VRF fails and the error is returned to the user.
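As a rough illustration of the interface described above, the sketch below wraps the sysctl file in two small helpers. The helper names and the STRICT_MODE_PATH override are assumptions made for this example and are not part of the patch set; the default path is the procfs view of net.vrf.strict_mode that the selftest also uses. Writing the real file needs root and a kernel carrying this series.

```shell
#!/bin/bash
# Hypothetical helpers around net.vrf.strict_mode (names are illustrative).
# STRICT_MODE_PATH can be overridden, e.g. to point at a scratch file.
STRICT_MODE_PATH="${STRICT_MODE_PATH:-/proc/sys/net/vrf/strict_mode}"

# Print the current value (0 or 1); fail if the knob is missing or malformed.
vrf_strict_mode_get()
{
	local val

	val="$(cat "${STRICT_MODE_PATH}" 2>/dev/null)" || return 1
	case "${val}" in
	0|1) echo "${val}" ;;
	*) return 1 ;;
	esac
}

# Write 0 or 1; reject anything else before touching the file.
# The write itself fails if the kernel refuses the transition.
vrf_strict_mode_set()
{
	local val=$1

	case "${val}" in
	0|1) ;;
	*) echo "strict_mode accepts only 0 or 1" >&2; return 1 ;;
	esac

	echo "${val}" > "${STRICT_MODE_PATH}"
}
```

To act on another network namespace, the same write can be run under "ip netns exec <name>", as the selftest in this series does.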
The kernel keeps track of the associations between a VRF and the routing table during the VRF setup, in the "management" plane. Therefore, the strict mode does not impact the performance or the intrinsic functionality of the data plane in any way.

When the strict mode is active it is always possible to disable it, while the reverse operation is not always allowed. Setting the strict_mode parameter to 0 is equivalent to removing the one-to-one constraint between any single VRF and its associated routing table. Conversely, if the strict mode is disabled and there are multiple VRFs that refer to the same routing table, then it is prohibited to set the strict_mode parameter to 1. In this configuration, any attempt to perform the operation fails and the error is reported to the user. To enable strict mode once again (by setting the strict_mode parameter to 1), you must first remove VRFs until no routing table is shared by more than one VRF.

There are several use cases which can take advantage of the introduction of the strict mode feature. In particular, the strict mode allows us to:

 i) guarantee the proper functioning of some applications which deal with routing protocols;
 ii) perform some tunneling decap operations which require specific routing tables for segregating and forwarding the traffic.

Considering (i), the creation of different VRFs that point to the same table leads to the situation where two different routing entities believe they have exclusive access to the same table. As a result, different routing daemons can compete for control of the routes because their tables overlap. By enabling strict mode it is possible to prevent this situation, which often occurs due to incorrect configurations done by the users. The ability to enable/disable the strict mode functionality does not depend on the tool used for configuring the networking. In essence, the strict mode patch solves, at the kernel level, what some other patches [1] had tried to solve at the userspace level (using only iproute2) with all the related problems.

Considering (ii), the introduction of the strict mode functionality allows us to implement the SRv6 End.DT4 behavior. Such behavior terminates an SR tunnel and forwards the IPv4 traffic according to the routes present in the routing table supplied during the configuration. The SRv6 End.DT4 can be realized by exploiting the routing capabilities made available by the VRF infrastructure. This behavior could leverage a specific VRF for forcing the traffic to be forwarded in accordance with the routes available in the VRF table. However, for End.DT4 to work properly, it must be guaranteed that the table used for the route lookup operations is bound to one and only one VRF. In this way, it is possible to use the table for uniquely retrieving the associated VRF and for routing packets.

I would like to thank David Ahern for his constant and valuable support during the design and development phases of this patch set.

Comments, suggestions and improvements are very welcome!
====================

Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
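The cover letter above describes two table-centric operations: finding the tables that block a switch to strict_mode=1, and resolving a table to its VRF (the property End.DT4 relies on). The sketch below is a userspace analogue only, not the kernel lookup path. Both function names are hypothetical, and the parsing assumes that "ip -d -o link show type vrf" prints one device per line with a "table <id>" token, which is the same token the selftest in this series greps for.

```shell
#!/bin/bash
# Both helpers read `ip -d -o link show type vrf` output on stdin.
# Function names are illustrative; they are not part of the patch set.

# Print every table ID that is referenced by more than one VRF.
# While this prints anything, enabling strict mode is expected to fail.
vrf_shared_tables()
{
	grep -oE 'table [0-9]+' | awk '{print $2}' | sort -n | uniq -d
}

# Print the name of each VRF bound to the given table ID (exact match).
# With strict mode enabled, at most one name should be printed.
vrf_names_by_table()
{
	local tableid=$1

	awk -v t="${tableid}" '
	{
		# field 2 is the device name followed by a colon
		name = $2
		sub(/:$/, "", name)
		for (i = 1; i < NF; i++)
			if ($i == "table" && $(i + 1) == t)
				print name
	}'
}
```

A possible invocation is "ip -d -o link show type vrf | vrf_shared_tables"; any table it reports has to be reduced to a single VRF before strict_mode can be set back to 1.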

Commit 60cb8d3d authored Jun 20, 2020 by David S. Miller
4 changed files
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -10,6 +10,16 @@
 #include <net/dst.h>
 #include <net/fib_rules.h>

+enum l3mdev_type {
+	L3MDEV_TYPE_UNSPEC,
+	L3MDEV_TYPE_VRF,
+	__L3MDEV_TYPE_MAX
+};
+
+#define L3MDEV_TYPE_MAX (__L3MDEV_TYPE_MAX - 1)
+
+typedef int (*lookup_by_table_id_t)(struct net *net, u32 table_id);
+
 /**
 * struct l3mdev_ops - l3mdev operations
 *
@@ -37,6 +47,15 @@ struct l3mdev_ops {

 #ifdef CONFIG_NET_L3_MASTER_DEV

+int l3mdev_table_lookup_register(enum l3mdev_type l3type,
+				 lookup_by_table_id_t fn);
+
+void l3mdev_table_lookup_unregister(enum l3mdev_type l3type,
+				    lookup_by_table_id_t fn);
+
+int l3mdev_ifindex_lookup_by_table_id(enum l3mdev_type l3type, struct net *net,
+				      u32 table_id);
+
 int l3mdev_fib_rule_match(struct net *net, struct flowi *fl,
 			  struct fib_lookup_arg *arg);

@@ -280,6 +299,26 @@ struct sk_buff *l3mdev_ip6_out(struct sock *sk, struct sk_buff *skb)
 	return skb;
 }

+static inline
+int l3mdev_table_lookup_register(enum l3mdev_type l3type,
+				 lookup_by_table_id_t fn)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline
+void l3mdev_table_lookup_unregister(enum l3mdev_type l3type,
+				    lookup_by_table_id_t fn)
+{
+}
+
+static inline
+int l3mdev_ifindex_lookup_by_table_id(enum l3mdev_type l3type, struct net *net,
+				      u32 table_id)
+{
+	return -ENODEV;
+}
+
 static inline
 int l3mdev_fib_rule_match(struct net *net, struct flowi *fl,
 			  struct fib_lookup_arg *arg)

--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -9,6 +9,99 @@
 #include <net/fib_rules.h>
 #include <net/l3mdev.h>

+static DEFINE_SPINLOCK(l3mdev_lock);
+
+struct l3mdev_handler {
+	lookup_by_table_id_t dev_lookup;
+};
+
+static struct l3mdev_handler l3mdev_handlers[L3MDEV_TYPE_MAX + 1];
+
+static int l3mdev_check_type(enum l3mdev_type l3type)
+{
+	if (l3type <= L3MDEV_TYPE_UNSPEC || l3type > L3MDEV_TYPE_MAX)
+		return -EINVAL;
+
+	return 0;
+}
+
+int l3mdev_table_lookup_register(enum l3mdev_type l3type,
+				 lookup_by_table_id_t fn)
+{
+	struct l3mdev_handler *hdlr;
+	int res;
+
+	res = l3mdev_check_type(l3type);
+	if (res)
+		return res;
+
+	hdlr = &l3mdev_handlers[l3type];
+
+	spin_lock(&l3mdev_lock);
+
+	if (hdlr->dev_lookup) {
+		res = -EBUSY;
+		goto unlock;
+	}
+
+	hdlr->dev_lookup = fn;
+	res = 0;
+
+unlock:
+	spin_unlock(&l3mdev_lock);
+
+	return res;
+}
+EXPORT_SYMBOL_GPL(l3mdev_table_lookup_register);
+
+void l3mdev_table_lookup_unregister(enum l3mdev_type l3type,
+				    lookup_by_table_id_t fn)
+{
+	struct l3mdev_handler *hdlr;
+
+	if (l3mdev_check_type(l3type))
+		return;
+
+	hdlr = &l3mdev_handlers[l3type];
+
+	spin_lock(&l3mdev_lock);
+
+	if (hdlr->dev_lookup == fn)
+		hdlr->dev_lookup = NULL;
+
+	spin_unlock(&l3mdev_lock);
+}
+EXPORT_SYMBOL_GPL(l3mdev_table_lookup_unregister);
+
+int l3mdev_ifindex_lookup_by_table_id(enum l3mdev_type l3type,
+				      struct net *net, u32 table_id)
+{
+	lookup_by_table_id_t lookup;
+	struct l3mdev_handler *hdlr;
+	int ifindex = -EINVAL;
+	int res;
+
+	res = l3mdev_check_type(l3type);
+	if (res)
+		return res;
+
+	hdlr = &l3mdev_handlers[l3type];
+
+	spin_lock(&l3mdev_lock);
+
+	lookup = hdlr->dev_lookup;
+	if (!lookup)
+		goto unlock;
+
+	ifindex = lookup(net, table_id);
+
+unlock:
+	spin_unlock(&l3mdev_lock);
+
+	return ifindex;
+}
+EXPORT_SYMBOL_GPL(l3mdev_ifindex_lookup_by_table_id);
+
 /**
 *	l3mdev_master_ifindex - get index of L3 master device
 *	@dev: targeted interface

--- a/tools/testing/selftests/net/vrf_strict_mode_test.sh
+++ b/tools/testing/selftests/net/vrf_strict_mode_test.sh
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test is designed for testing the new VRF strict_mode functionality.
+
+ret=0
+
+# identifies the "init" network namespace which is often called root network
+# namespace.
+INIT_NETNS_NAME="init"
+
+PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
+
+log_test()
+{
+	local rc=$1
+	local expected=$2
+	local msg="$3"
+
+	if [ ${rc} -eq ${expected} ]; then
+		nsuccess=$((nsuccess+1))
+		printf "\n    TEST: %-60s  [ OK ]\n" "${msg}"
+	else
+		ret=1
+		nfail=$((nfail+1))
+		printf "\n    TEST: %-60s  [FAIL]\n" "${msg}"
+		if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+			echo
+			echo "hit enter to continue, 'q' to quit"
+			read a
+			[ "$a" = "q" ] && exit 1
+		fi
+	fi
+}
+
+print_log_test_results()
+{
+	if [ "$TESTS" != "none" ]; then
+		printf "\nTests passed: %3d\n" ${nsuccess}
+		printf "Tests failed: %3d\n"   ${nfail}
+	fi
+}
+
+log_section()
+{
+	echo
+	echo "################################################################################"
+	echo "TEST SECTION: $*"
+	echo "################################################################################"
+}
+
+ip_expand_args()
+{
+	local nsname=$1
+	local nsarg=""
+
+	if [ "${nsname}" != "${INIT_NETNS_NAME}" ]; then
+		nsarg="-netns ${nsname}"
+	fi
+
+	echo "${nsarg}"
+}
+
+vrf_count()
+{
+	local nsname=$1
+	local nsarg="$(ip_expand_args ${nsname})"
+
+	ip ${nsarg} -o link show type vrf | wc -l
+}
+
+count_vrf_by_table_id()
+{
+	local nsname=$1
+	local tableid=$2
+	local nsarg="$(ip_expand_args ${nsname})"
+
+	ip ${nsarg} -d -o link show type vrf | grep "table ${tableid}" | wc -l
+}
+
+add_vrf()
+{
+	local nsname=$1
+	local vrfname=$2
+	local vrftable=$3
+	local nsarg="$(ip_expand_args ${nsname})"
+
+	ip ${nsarg} link add ${vrfname} type vrf table ${vrftable} &>/dev/null
+}
+
+add_vrf_and_check()
+{
+	local nsname=$1
+	local vrfname=$2
+	local vrftable=$3
+	local cnt
+	local rc
+
+	add_vrf ${nsname} ${vrfname} ${vrftable}; rc=$?
+
+	cnt=$(count_vrf_by_table_id ${nsname} ${vrftable})
+
+	log_test ${rc} 0 "${nsname}: add vrf ${vrfname}, ${cnt} vrfs for table ${vrftable}"
+}
+
+add_vrf_and_check_fail()
+{
+	local nsname=$1
+	local vrfname=$2
+	local vrftable=$3
+	local cnt
+	local rc
+
+	add_vrf ${nsname} ${vrfname} ${vrftable}; rc=$?
+
+	cnt=$(count_vrf_by_table_id ${nsname} ${vrftable})
+
+	log_test ${rc} 2 "${nsname}: CANNOT add vrf ${vrfname}, ${cnt} vrfs for table ${vrftable}"
+}
+
+del_vrf_and_check()
+{
+	local nsname=$1
+	local vrfname=$2
+	local nsarg="$(ip_expand_args ${nsname})"
+
+	ip ${nsarg} link del ${vrfname}
+	log_test $? 0 "${nsname}: remove vrf ${vrfname}"
+}
+
+config_vrf_and_check()
+{
+	local nsname=$1
+	local addr=$2
+	local vrfname=$3
+	local nsarg="$(ip_expand_args ${nsname})"
+
+	ip ${nsarg} link set dev ${vrfname} up && \
+		ip ${nsarg} addr add ${addr} dev ${vrfname}
+	log_test $? 0 "${nsname}: vrf ${vrfname} up, addr ${addr}"
+}
+
+read_strict_mode()
+{
+	local nsname=$1
+	local rval
+	local rc=0
+	local nsexec=""
+
+	if [ "${nsname}" != "${INIT_NETNS_NAME}" ]; then
+		# a custom network namespace is provided
+		nsexec="ip netns exec ${nsname}"
+	fi
+
+	rval="$(${nsexec} bash -c "cat /proc/sys/net/vrf/strict_mode" | \
+		grep -E "^[0-1]$")" &> /dev/null
+	if [ $? -ne 0 ]; then
+		# set errors
+		rval=255
+		rc=1
+	fi
+
+	# on success, rval can be only 0 or 1; on error, rval is equal to 255
+	echo ${rval}
+	return ${rc}
+}
+
+read_strict_mode_compare_and_check()
+{
+	local nsname=$1
+	local expected=$2
+	local res
+
+	res="$(read_strict_mode ${nsname})"
+	log_test ${res} ${expected} "${nsname}: check strict_mode=${res}"
+}
+
+set_strict_mode()
+{
+	local nsname=$1
+	local val=$2
+	local nsexec=""
+
+	if [ "${nsname}" != "${INIT_NETNS_NAME}" ]; then
+		# a custom network namespace is provided
+		nsexec="ip netns exec ${nsname}"
+	fi
+
+	${nsexec} bash -c "echo ${val} >/proc/sys/net/vrf/strict_mode" &>/dev/null
+}
+
+enable_strict_mode()
+{
+	local nsname=$1
+
+	set_strict_mode ${nsname} 1
+}
+
+disable_strict_mode()
+{
+	local nsname=$1
+
+	set_strict_mode ${nsname} 0
+}
+
+disable_strict_mode_and_check()
+{
+	local nsname=$1
+
+	disable_strict_mode ${nsname}
+	log_test $? 0 "${nsname}: disable strict_mode (=0)"
+}
+
+enable_strict_mode_and_check()
+{
+	local nsname=$1
+
+	enable_strict_mode ${nsname}
+	log_test $? 0 "${nsname}: enable strict_mode (=1)"
+}
+
+enable_strict_mode_and_check_fail()
+{
+	local nsname=$1
+
+	enable_strict_mode ${nsname}
+	log_test $? 1 "${nsname}: CANNOT enable strict_mode"
+}
+
+strict_mode_check_default()
+{
+	local nsname=$1
+	local strictmode
+	local vrfcnt
+
+	vrfcnt=$(vrf_count ${nsname})
+	strictmode=$(read_strict_mode ${nsname})
+	log_test ${strictmode} 0 "${nsname}: strict_mode=0 by default, ${vrfcnt} vrfs"
+}
+
+setup()
+{
+	modprobe vrf
+
+	ip netns add testns
+	ip netns exec testns ip link set lo up
+}
+
+cleanup()
+{
+	ip netns del testns 2>/dev/null
+
+	ip link del vrf100 2>/dev/null
+	ip link del vrf101 2>/dev/null
+	ip link del vrf102 2>/dev/null
+
+	echo 0 >/proc/sys/net/vrf/strict_mode 2>/dev/null
+}
+
+vrf_strict_mode_tests_init()
+{
+	vrf_strict_mode_check_support init
+
+	strict_mode_check_default init
+
+	add_vrf_and_check init vrf100 100
+	config_vrf_and_check init 172.16.100.1/24 vrf100
+
+	enable_strict_mode_and_check init
+
+	add_vrf_and_check_fail init vrf101 100
+
+	disable_strict_mode_and_check init
+
+	add_vrf_and_check init vrf101 100
+	config_vrf_and_check init 172.16.101.1/24 vrf101
+
+	enable_strict_mode_and_check_fail init
+
+	del_vrf_and_check init vrf101
+
+	enable_strict_mode_and_check init
+
+	add_vrf_and_check init vrf102 102
+	config_vrf_and_check init 172.16.102.1/24 vrf102
+
+	# the strict_mode is enabled in the init
+}
+
+vrf_strict_mode_tests_testns()
+{
+	vrf_strict_mode_check_support testns
+
+	strict_mode_check_default testns
+
+	enable_strict_mode_and_check testns
+
+	add_vrf_and_check testns vrf100 100
+	config_vrf_and_check testns 10.0.100.1/24 vrf100
+
+	add_vrf_and_check_fail testns vrf101 100
+
+	add_vrf_and_check_fail testns vrf102 100
+
+	add_vrf_and_check testns vrf200 200
+
+	disable_strict_mode_and_check testns
+
+	add_vrf_and_check testns vrf101 100
+
+	add_vrf_and_check testns vrf102 100
+
+	# the strict_mode is disabled in the testns
+}
+
+vrf_strict_mode_tests_mix()
+{
+	read_strict_mode_compare_and_check init 1
+
+	read_strict_mode_compare_and_check testns 0
+
+	del_vrf_and_check testns vrf101
+
+	del_vrf_and_check testns vrf102
+
+	disable_strict_mode_and_check init
+
+	enable_strict_mode_and_check testns
+
+	enable_strict_mode_and_check init
+	enable_strict_mode_and_check init
+
+	disable_strict_mode_and_check testns
+	disable_strict_mode_and_check testns
+
+	read_strict_mode_compare_and_check init 1
+
+	read_strict_mode_compare_and_check testns 0
+}
+
+vrf_strict_mode_tests()
+{
+	log_section "VRF strict_mode test on init network namespace"
+	vrf_strict_mode_tests_init
+
+	log_section "VRF strict_mode test on testns network namespace"
+	vrf_strict_mode_tests_testns
+
+	log_section "VRF strict_mode test mixing init and testns network namespaces"
+	vrf_strict_mode_tests_mix
+}
+
+vrf_strict_mode_check_support()
+{
+	local nsname=$1
+	local output
+	local rc
+
+	output="$(lsmod | grep '^vrf' | awk '{print $1}')"
+	if [ -z "${output}" ]; then
+		modinfo vrf || return $?
+	fi
+
+	# we do not care about the value of the strict_mode; we only check if
+	# the strict_mode parameter is available or not.
+	read_strict_mode ${nsname} &>/dev/null; rc=$?
+	log_test ${rc} 0 "${nsname}: net.vrf.strict_mode is available"
+
+	return ${rc}
+}
+
+if [ "$(id -u)" -ne 0 ];then
+	echo "SKIP: Need root privileges"
+	exit 0
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+	echo "SKIP: Could not run test without ip tool"
+	exit 0
+fi
+
+cleanup &> /dev/null
+
+setup
+vrf_strict_mode_tests
+cleanup
+
+print_log_test_results
+
+exit $ret