Commit 1274e1cc authored by Roopa Prabhu's avatar Roopa Prabhu Committed by David S. Miller

vxlan: ecmp support for mac fdb entries

Todays vxlan mac fdb entries can point to multiple remote
ips (rdsts) with the sole purpose of replicating
broadcast-multicast and unknown unicast packets to those remote ips.

E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be
load balanced to remote switches (vteps) belonging to the
same multi-homed ethernet segment (E-VPN multihoming is analogous
to multi-homed LAG implementations, but with the inter-switch
peerlink replaced with a vxlan tunnel). In other words it needs
support for mac ecmp. Furthermore, for faster convergence, E-VPN
multihoming needs the ability to update fdb ecmp nexthops independent
of the fdb entries.

New route nexthop API is perfect for this usecase.
This patch extends the vxlan fdb code to take a nexthop id
pointing to an ecmp nexthop group.

Changes include:
- New NDA_NH_ID attribute for fdbs
- Use the newly added fdb nexthop groups
- makes vxlan rdsts and nexthop handling code mutually
  exclusive
- since this is a new use-case and the requirement is for ecmp
nexthop groups, the fdb add and update path checks that the
nexthop is really an ecmp nexthop group. This check can be relaxed
in the future, if we want to introduce replication fdb nexthop groups
and allow its use in lieu of current rdst lists.
- fdb update requests with nexthop id's only allowed for existing
fdb's that have nexthop id's
- learning will not override an existing fdb entry with nexthop
group
- I have wrapped the switchdev offload code around the presence of
rdst

[1] E-VPN RFC https://tools.ietf.org/html/rfc7432
[2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365
[3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf

Includes a null check fix in vxlan_xmit from Nikolay

v2 - Fixed build issue:
Reported-by: default avatarkbuild test robot <lkp@intel.com>
Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parent 38428d68
This diff is collapsed.
......@@ -7,6 +7,7 @@
#include <net/dst_metadata.h>
#include <net/rtnetlink.h>
#include <net/switchdev.h>
#include <net/nexthop.h>
#define IANA_VXLAN_UDP_PORT 4789
......@@ -487,4 +488,28 @@ static inline void vxlan_flag_attr_error(int attrtype,
#undef VXLAN_FLAG
}
static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh,
int hash,
struct vxlan_rdst *rdst)
{
struct fib_nh_common *nhc;
nhc = nexthop_path_fdb_result(nh, hash);
if (unlikely(!nhc))
return false;
switch (nhc->nhc_gw_family) {
case AF_INET:
rdst->remote_ip.sin.sin_addr.s_addr = nhc->nhc_gw.ipv4;
rdst->remote_ip.sa.sa_family = AF_INET;
break;
case AF_INET6:
rdst->remote_ip.sin6.sin6_addr = nhc->nhc_gw.ipv6;
rdst->remote_ip.sa.sa_family = AF_INET6;
break;
}
return true;
}
#endif
......@@ -29,6 +29,7 @@ enum {
NDA_LINK_NETNSID,
NDA_SRC_VNI,
NDA_PROTOCOL, /* Originator of entry */
NDA_NH_ID,
__NDA_MAX
};
......
......@@ -1771,6 +1771,7 @@ static struct neigh_table *neigh_find_table(int family)
}
const struct nla_policy nda_policy[NDA_MAX+1] = {
[NDA_UNSPEC] = { .strict_start_type = NDA_NH_ID },
[NDA_DST] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
[NDA_LLADDR] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
[NDA_CACHEINFO] = { .len = sizeof(struct nda_cacheinfo) },
......@@ -1781,6 +1782,7 @@ const struct nla_policy nda_policy[NDA_MAX+1] = {
[NDA_IFINDEX] = { .type = NLA_U32 },
[NDA_MASTER] = { .type = NLA_U32 },
[NDA_PROTOCOL] = { .type = NLA_U8 },
[NDA_NH_ID] = { .type = NLA_U32 },
};
static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment