Commit 9c5982fe authored by David S. Miller's avatar David S. Miller

Merge branch 'fib-offload-notifications'

Jiri Pirko says:

====================
fib offload: switch to notifier

The goal of this patchset is to allow driver to propagate all prefixes
configured in kernel down HW. This is necessary for routing to work
as expected. If we don't do that HW might forward prefixes known to kernel
incorrectly. Take an example when default route is set in switch HW and there
is an IP address set on a management (non-switch) port.

Currently, only FIB entries related to the switch port netdev are
offloaded using switchdev ops. This model is not extendable so the
first patch introduces a replacement: notifier to propagate FIB entry
additions and removals to whoever is interested.

The second patch introduces couple of helpers to deal with RTNH_F_OFFLOAD
flags. Currently it is set in switchdev core. There the assumption is
that only one offload device exists. But for FIB notifier, we assume
multiple offload devices. So the patch introduces a per FIB entry
reference counter and helpers use it in order to achieve this:
   0 means RTNH_F_OFFLOAD is not set, no device offloads this entry
   n means RTNH_F_OFFLOAD is set and the entry is offloaded by n devices

Patches 3 and 4 convert mlxsw and rocker to adopt this new way, registering
one notifier block for each asic instance. Both of these patches also
implement internal "abort" mechanism.

Using switchdev ops, "abort" is called by switchdev core whenever there is
an error during FIB entry add offload. This leads to removal of all
offloaded entries on system by fib_trie code.

Now the new notifier assumes the driver takes care of the abort action.
Here's why:
1) The fact that one HW cannot offload an entry does not mean that the
   others can't do it. So let only one entity to abort and leave the rest
   to work happily.
2) The driver knows what to in order to properly abort. For example,
   currently abort is broken for mlxsw, as for Spectrum there is a need
   to set 0.0.0.0/0 trap in RALUE register.

The fifth patch removes the old, no longer used FIB offload infrastructure.

The last patch reflects the changes into switchdev documentation file.

---
v2->v3:
 -patch 3/6
   -fixed offload inc/dec to be done in fib4_entry_init/fini and only
    in case !trap as suggested by Ido
v1->v2:
 -patch 3/6:
   -fixed lpm tree setup and binding for abort and pointed out by Ido
   -do nexthop checks as suggested by Ido
   -fix use after free during abort
 -patch 6/6:
   -fixed texts as suggested by Ido
====================
Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parents eb523f42 fd41b0ea
......@@ -314,30 +314,29 @@ the kernel, with the device doing the FIB lookup and forwarding. The device
does a longest prefix match (LPM) on FIB entries matching route prefix and
forwards the packet to the matching FIB entry's nexthop(s) egress ports.
To program the device, the driver implements support for
SWITCHDEV_OBJ_IPV[4|6]_FIB object using switchdev_port_obj_xxx ops.
switchdev_port_obj_add is used for both adding a new FIB entry to the device,
or modifying an existing entry on the device.
To program the device, the driver has to register a FIB notifier handler
using register_fib_notifier. The following events are available:
FIB_EVENT_ENTRY_ADD: used for both adding a new FIB entry to the device,
or modifying an existing entry on the device.
FIB_EVENT_ENTRY_DEL: used for removing a FIB entry
FIB_EVENT_RULE_ADD, FIB_EVENT_RULE_DEL: used to propagate FIB rule changes
XXX: Currently, only SWITCHDEV_OBJ_ID_IPV4_FIB objects are supported.
FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass:
SWITCHDEV_OBJ_ID_IPV4_FIB object passes:
struct switchdev_obj_ipv4_fib { /* IPV4_FIB */
struct fib_entry_notifier_info {
struct fib_notifier_info info; /* must be first */
u32 dst;
int dst_len;
struct fib_info *fi;
u8 tos;
u8 type;
u32 nlflags;
u32 tb_id;
} ipv4_fib;
u32 nlflags;
};
to add/modify/delete IPv4 dst/dest_len prefix on table tb_id. The *fi
structure holds details on the route and route's nexthops. *dev is one of the
port netdevs mentioned in the routes next hop list. If the output port netdevs
referenced in the route's nexthop list don't all have the same switch ID, the
driver is not called to add/modify/delete the FIB entry.
port netdevs mentioned in the route's next hop list.
Routes offloaded to the device are labeled with "offload" in the ip route
listing:
......@@ -355,6 +354,8 @@ listing:
12.0.0.4 via 11.0.0.9 dev sw1p2 proto zebra metric 20 offload
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15
The "offload" flag is set in case at least one device offloads the FIB entry.
XXX: add/mod/del IPv6 FIB API
Nexthop Resolution
......
......@@ -45,7 +45,7 @@
#include <linux/list.h>
#include <linux/dcbnl.h>
#include <linux/in6.h>
#include <net/switchdev.h>
#include <linux/notifier.h>
#include "port.h"
#include "core.h"
......@@ -257,6 +257,7 @@ struct mlxsw_sp_router {
#define MLXSW_SP_UNRESOLVED_NH_PROBE_INTERVAL 5000 /* ms */
struct list_head nexthop_group_list;
struct list_head nexthop_neighs_list;
bool aborted;
};
struct mlxsw_sp {
......@@ -296,6 +297,7 @@ struct mlxsw_sp {
struct mlxsw_sp_span_entry *entries;
int entries_count;
} span;
struct notifier_block fib_nb;
};
static inline struct mlxsw_sp_upper *
......@@ -584,11 +586,6 @@ static inline void mlxsw_sp_port_dcb_fini(struct mlxsw_sp_port *mlxsw_sp_port)
int mlxsw_sp_router_init(struct mlxsw_sp *mlxsw_sp);
void mlxsw_sp_router_fini(struct mlxsw_sp *mlxsw_sp);
int mlxsw_sp_router_fib4_add(struct mlxsw_sp_port *mlxsw_sp_port,
const struct switchdev_obj_ipv4_fib *fib4,
struct switchdev_trans *trans);
int mlxsw_sp_router_fib4_del(struct mlxsw_sp_port *mlxsw_sp_port,
const struct switchdev_obj_ipv4_fib *fib4);
int mlxsw_sp_router_neigh_construct(struct net_device *dev,
struct neighbour *n);
void mlxsw_sp_router_neigh_destroy(struct net_device *dev,
......
......@@ -1044,11 +1044,6 @@ static int mlxsw_sp_port_obj_add(struct net_device *dev,
SWITCHDEV_OBJ_PORT_VLAN(obj),
trans);
break;
case SWITCHDEV_OBJ_ID_IPV4_FIB:
err = mlxsw_sp_router_fib4_add(mlxsw_sp_port,
SWITCHDEV_OBJ_IPV4_FIB(obj),
trans);
break;
case SWITCHDEV_OBJ_ID_PORT_FDB:
err = mlxsw_sp_port_fdb_static_add(mlxsw_sp_port,
SWITCHDEV_OBJ_PORT_FDB(obj),
......@@ -1181,10 +1176,6 @@ static int mlxsw_sp_port_obj_del(struct net_device *dev,
err = mlxsw_sp_port_vlans_del(mlxsw_sp_port,
SWITCHDEV_OBJ_PORT_VLAN(obj));
break;
case SWITCHDEV_OBJ_ID_IPV4_FIB:
err = mlxsw_sp_router_fib4_del(mlxsw_sp_port,
SWITCHDEV_OBJ_IPV4_FIB(obj));
break;
case SWITCHDEV_OBJ_ID_PORT_FDB:
err = mlxsw_sp_port_fdb_static_del(mlxsw_sp_port,
SWITCHDEV_OBJ_PORT_FDB(obj));
......
......@@ -15,6 +15,7 @@
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/netdevice.h>
#include <linux/notifier.h>
#include <net/neighbour.h>
#include <net/switchdev.h>
......@@ -52,6 +53,9 @@ struct rocker_port {
struct rocker_dma_ring_info rx_ring;
};
struct rocker_port *rocker_port_dev_lower_find(struct net_device *dev,
struct rocker *rocker);
struct rocker_world_ops;
struct rocker {
......@@ -66,6 +70,7 @@ struct rocker {
spinlock_t cmd_ring_lock; /* for cmd ring accesses */
struct rocker_dma_ring_info cmd_ring;
struct rocker_dma_ring_info event_ring;
struct notifier_block fib_nb;
struct rocker_world_ops *wops;
void *wpriv;
};
......@@ -117,11 +122,6 @@ struct rocker_world_ops {
int (*port_obj_vlan_dump)(const struct rocker_port *rocker_port,
struct switchdev_obj_port_vlan *vlan,
switchdev_obj_dump_cb_t *cb);
int (*port_obj_fib4_add)(struct rocker_port *rocker_port,
const struct switchdev_obj_ipv4_fib *fib4,
struct switchdev_trans *trans);
int (*port_obj_fib4_del)(struct rocker_port *rocker_port,
const struct switchdev_obj_ipv4_fib *fib4);
int (*port_obj_fdb_add)(struct rocker_port *rocker_port,
const struct switchdev_obj_port_fdb *fdb,
struct switchdev_trans *trans);
......@@ -141,6 +141,11 @@ struct rocker_world_ops {
int (*port_ev_mac_vlan_seen)(struct rocker_port *rocker_port,
const unsigned char *addr,
__be16 vlan_id);
int (*fib4_add)(struct rocker *rocker,
const struct fib_entry_notifier_info *fen_info);
int (*fib4_del)(struct rocker *rocker,
const struct fib_entry_notifier_info *fen_info);
void (*fib4_abort)(struct rocker *rocker);
};
extern struct rocker_world_ops rocker_ofdpa_ops;
......
......@@ -1624,29 +1624,6 @@ rocker_world_port_obj_vlan_dump(const struct rocker_port *rocker_port,
return wops->port_obj_vlan_dump(rocker_port, vlan, cb);
}
static int
rocker_world_port_obj_fib4_add(struct rocker_port *rocker_port,
const struct switchdev_obj_ipv4_fib *fib4,
struct switchdev_trans *trans)
{
struct rocker_world_ops *wops = rocker_port->rocker->wops;
if (!wops->port_obj_fib4_add)
return -EOPNOTSUPP;
return wops->port_obj_fib4_add(rocker_port, fib4, trans);
}
static int
rocker_world_port_obj_fib4_del(struct rocker_port *rocker_port,
const struct switchdev_obj_ipv4_fib *fib4)
{
struct rocker_world_ops *wops = rocker_port->rocker->wops;
if (!wops->port_obj_fib4_del)
return -EOPNOTSUPP;
return wops->port_obj_fib4_del(rocker_port, fib4);
}
static int
rocker_world_port_obj_fdb_add(struct rocker_port *rocker_port,
const struct switchdev_obj_port_fdb *fdb,
......@@ -1733,6 +1710,34 @@ static int rocker_world_port_ev_mac_vlan_seen(struct rocker_port *rocker_port,
return wops->port_ev_mac_vlan_seen(rocker_port, addr, vlan_id);
}
static int rocker_world_fib4_add(struct rocker *rocker,
const struct fib_entry_notifier_info *fen_info)
{
struct rocker_world_ops *wops = rocker->wops;
if (!wops->fib4_add)
return 0;
return wops->fib4_add(rocker, fen_info);
}
static int rocker_world_fib4_del(struct rocker *rocker,
const struct fib_entry_notifier_info *fen_info)
{
struct rocker_world_ops *wops = rocker->wops;
if (!wops->fib4_del)
return 0;
return wops->fib4_del(rocker, fen_info);
}
static void rocker_world_fib4_abort(struct rocker *rocker)
{
struct rocker_world_ops *wops = rocker->wops;
if (wops->fib4_abort)
wops->fib4_abort(rocker);
}
/*****************
* Net device ops
*****************/
......@@ -2096,11 +2101,6 @@ static int rocker_port_obj_add(struct net_device *dev,
SWITCHDEV_OBJ_PORT_VLAN(obj),
trans);
break;
case SWITCHDEV_OBJ_ID_IPV4_FIB:
err = rocker_world_port_obj_fib4_add(rocker_port,
SWITCHDEV_OBJ_IPV4_FIB(obj),
trans);
break;
case SWITCHDEV_OBJ_ID_PORT_FDB:
err = rocker_world_port_obj_fdb_add(rocker_port,
SWITCHDEV_OBJ_PORT_FDB(obj),
......@@ -2125,10 +2125,6 @@ static int rocker_port_obj_del(struct net_device *dev,
err = rocker_world_port_obj_vlan_del(rocker_port,
SWITCHDEV_OBJ_PORT_VLAN(obj));
break;
case SWITCHDEV_OBJ_ID_IPV4_FIB:
err = rocker_world_port_obj_fib4_del(rocker_port,
SWITCHDEV_OBJ_IPV4_FIB(obj));
break;
case SWITCHDEV_OBJ_ID_PORT_FDB:
err = rocker_world_port_obj_fdb_del(rocker_port,
SWITCHDEV_OBJ_PORT_FDB(obj));
......@@ -2175,6 +2171,31 @@ static const struct switchdev_ops rocker_port_switchdev_ops = {
.switchdev_port_obj_dump = rocker_port_obj_dump,
};
static int rocker_router_fib_event(struct notifier_block *nb,
unsigned long event, void *ptr)
{
struct rocker *rocker = container_of(nb, struct rocker, fib_nb);
struct fib_entry_notifier_info *fen_info = ptr;
int err;
switch (event) {
case FIB_EVENT_ENTRY_ADD:
err = rocker_world_fib4_add(rocker, fen_info);
if (err)
rocker_world_fib4_abort(rocker);
else
break;
case FIB_EVENT_ENTRY_DEL:
rocker_world_fib4_del(rocker, fen_info);
break;
case FIB_EVENT_RULE_ADD: /* fall through */
case FIB_EVENT_RULE_DEL:
rocker_world_fib4_abort(rocker);
break;
}
return NOTIFY_DONE;
}
/********************
* ethtool interface
********************/
......@@ -2740,6 +2761,9 @@ static int rocker_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto err_probe_ports;
}
rocker->fib_nb.notifier_call = rocker_router_fib_event;
register_fib_notifier(&rocker->fib_nb);
dev_info(&pdev->dev, "Rocker switch with id %*phN\n",
(int)sizeof(rocker->hw.id), &rocker->hw.id);
......@@ -2771,6 +2795,7 @@ static void rocker_remove(struct pci_dev *pdev)
{
struct rocker *rocker = pci_get_drvdata(pdev);
unregister_fib_notifier(&rocker->fib_nb);
rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
rocker_remove_ports(rocker);
free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
......@@ -2799,6 +2824,37 @@ static bool rocker_port_dev_check(const struct net_device *dev)
return dev->netdev_ops == &rocker_port_netdev_ops;
}
static bool rocker_port_dev_check_under(const struct net_device *dev,
struct rocker *rocker)
{
struct rocker_port *rocker_port;
if (!rocker_port_dev_check(dev))
return false;
rocker_port = netdev_priv(dev);
if (rocker_port->rocker != rocker)
return false;
return true;
}
struct rocker_port *rocker_port_dev_lower_find(struct net_device *dev,
struct rocker *rocker)
{
struct net_device *lower_dev;
struct list_head *iter;
if (rocker_port_dev_check_under(dev, rocker))
return netdev_priv(dev);
netdev_for_each_all_lower_dev(dev, lower_dev, iter) {
if (rocker_port_dev_check_under(lower_dev, rocker))
return netdev_priv(lower_dev);
}
return NULL;
}
static int rocker_netdevice_event(struct notifier_block *unused,
unsigned long event, void *ptr)
{
......
......@@ -99,6 +99,7 @@ struct ofdpa_flow_tbl_entry {
struct ofdpa_flow_tbl_key key;
size_t key_len;
u32 key_crc32; /* key */
struct fib_info *fi;
};
struct ofdpa_group_tbl_entry {
......@@ -189,6 +190,7 @@ struct ofdpa {
spinlock_t neigh_tbl_lock; /* for neigh tbl accesses */
u32 neigh_tbl_next_index;
unsigned long ageing_time;
bool fib_aborted;
};
struct ofdpa_port {
......@@ -1043,7 +1045,8 @@ static int ofdpa_flow_tbl_ucast4_routing(struct ofdpa_port *ofdpa_port,
__be16 eth_type, __be32 dst,
__be32 dst_mask, u32 priority,
enum rocker_of_dpa_table_id goto_tbl,
u32 group_id, int flags)
u32 group_id, struct fib_info *fi,
int flags)
{
struct ofdpa_flow_tbl_entry *entry;
......@@ -1060,6 +1063,7 @@ static int ofdpa_flow_tbl_ucast4_routing(struct ofdpa_port *ofdpa_port,
entry->key.ucast_routing.group_id = group_id;
entry->key_len = offsetof(struct ofdpa_flow_tbl_key,
ucast_routing.group_id);
entry->fi = fi;
return ofdpa_flow_tbl_do(ofdpa_port, trans, flags, entry);
}
......@@ -1425,7 +1429,7 @@ static int ofdpa_port_ipv4_neigh(struct ofdpa_port *ofdpa_port,
eth_type, ip_addr,
inet_make_mask(32),
priority, goto_tbl,
group_id, flags);
group_id, NULL, flags);
if (err)
netdev_err(ofdpa_port->dev, "Error (%d) /32 unicast route %pI4 group 0x%08x\n",
......@@ -2390,7 +2394,7 @@ static __be16 ofdpa_port_internal_vlan_id_get(struct ofdpa_port *ofdpa_port,
static int ofdpa_port_fib_ipv4(struct ofdpa_port *ofdpa_port,
struct switchdev_trans *trans, __be32 dst,
int dst_len, const struct fib_info *fi,
int dst_len, struct fib_info *fi,
u32 tb_id, int flags)
{
const struct fib_nh *nh;
......@@ -2426,7 +2430,7 @@ static int ofdpa_port_fib_ipv4(struct ofdpa_port *ofdpa_port,
err = ofdpa_flow_tbl_ucast4_routing(ofdpa_port, trans, eth_type, dst,
dst_mask, priority, goto_tbl,
group_id, flags);
group_id, fi, flags);
if (err)
netdev_err(ofdpa_port->dev, "Error (%d) IPv4 route %pI4\n",
err, &dst);
......@@ -2718,28 +2722,6 @@ static int ofdpa_port_obj_vlan_dump(const struct rocker_port *rocker_port,
return err;
}
static int ofdpa_port_obj_fib4_add(struct rocker_port *rocker_port,
const struct switchdev_obj_ipv4_fib *fib4,
struct switchdev_trans *trans)
{
struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
return ofdpa_port_fib_ipv4(ofdpa_port, trans,
htonl(fib4->dst), fib4->dst_len,
fib4->fi, fib4->tb_id, 0);
}
static int ofdpa_port_obj_fib4_del(struct rocker_port *rocker_port,
const struct switchdev_obj_ipv4_fib *fib4)
{
struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
return ofdpa_port_fib_ipv4(ofdpa_port, NULL,
htonl(fib4->dst), fib4->dst_len,
fib4->fi, fib4->tb_id,
OFDPA_OP_FLAG_REMOVE);
}
static int ofdpa_port_obj_fdb_add(struct rocker_port *rocker_port,
const struct switchdev_obj_port_fdb *fdb,
struct switchdev_trans *trans)
......@@ -2922,6 +2904,82 @@ static int ofdpa_port_ev_mac_vlan_seen(struct rocker_port *rocker_port,
return ofdpa_port_fdb(ofdpa_port, NULL, addr, vlan_id, flags);
}
static struct ofdpa_port *ofdpa_port_dev_lower_find(struct net_device *dev,
struct rocker *rocker)
{
struct rocker_port *rocker_port;
rocker_port = rocker_port_dev_lower_find(dev, rocker);
return rocker_port ? rocker_port->wpriv : NULL;
}
static int ofdpa_fib4_add(struct rocker *rocker,
const struct fib_entry_notifier_info *fen_info)
{
struct ofdpa *ofdpa = rocker->wpriv;
struct ofdpa_port *ofdpa_port;
int err;
if (ofdpa->fib_aborted)
return 0;
ofdpa_port = ofdpa_port_dev_lower_find(fen_info->fi->fib_dev, rocker);
if (!ofdpa_port)
return 0;
err = ofdpa_port_fib_ipv4(ofdpa_port, NULL, htonl(fen_info->dst),
fen_info->dst_len, fen_info->fi,
fen_info->tb_id, 0);
if (err)
return err;
fib_info_offload_inc(fen_info->fi);
return 0;
}
static int ofdpa_fib4_del(struct rocker *rocker,
const struct fib_entry_notifier_info *fen_info)
{
struct ofdpa *ofdpa = rocker->wpriv;
struct ofdpa_port *ofdpa_port;
if (ofdpa->fib_aborted)
return 0;
ofdpa_port = ofdpa_port_dev_lower_find(fen_info->fi->fib_dev, rocker);
if (!ofdpa_port)
return 0;
fib_info_offload_dec(fen_info->fi);
return ofdpa_port_fib_ipv4(ofdpa_port, NULL, htonl(fen_info->dst),
fen_info->dst_len, fen_info->fi,
fen_info->tb_id, OFDPA_OP_FLAG_REMOVE);
}
static void ofdpa_fib4_abort(struct rocker *rocker)
{
struct ofdpa *ofdpa = rocker->wpriv;
struct ofdpa_port *ofdpa_port;
struct ofdpa_flow_tbl_entry *flow_entry;
struct hlist_node *tmp;
unsigned long flags;
int bkt;
if (ofdpa->fib_aborted)
return;
spin_lock_irqsave(&ofdpa->flow_tbl_lock, flags);
hash_for_each_safe(ofdpa->flow_tbl, bkt, tmp, flow_entry, entry) {
if (flow_entry->key.tbl_id !=
ROCKER_OF_DPA_TABLE_ID_UNICAST_ROUTING)
continue;
ofdpa_port = ofdpa_port_dev_lower_find(flow_entry->fi->fib_dev,
rocker);
if (!ofdpa_port)
continue;
fib_info_offload_dec(flow_entry->fi);
ofdpa_flow_tbl_del(ofdpa_port, NULL, OFDPA_OP_FLAG_REMOVE,
flow_entry);
}
spin_unlock_irqrestore(&ofdpa->flow_tbl_lock, flags);
ofdpa->fib_aborted = true;
}
struct rocker_world_ops rocker_ofdpa_ops = {
.kind = "ofdpa",
.priv_size = sizeof(struct ofdpa),
......@@ -2941,8 +2999,6 @@ struct rocker_world_ops rocker_ofdpa_ops = {
.port_obj_vlan_add = ofdpa_port_obj_vlan_add,
.port_obj_vlan_del = ofdpa_port_obj_vlan_del,
.port_obj_vlan_dump = ofdpa_port_obj_vlan_dump,
.port_obj_fib4_add = ofdpa_port_obj_fib4_add,
.port_obj_fib4_del = ofdpa_port_obj_fib4_del,
.port_obj_fdb_add = ofdpa_port_obj_fdb_add,
.port_obj_fdb_del = ofdpa_port_obj_fdb_del,
.port_obj_fdb_dump = ofdpa_port_obj_fdb_dump,
......@@ -2951,4 +3007,7 @@ struct rocker_world_ops rocker_ofdpa_ops = {
.port_neigh_update = ofdpa_port_neigh_update,
.port_neigh_destroy = ofdpa_port_neigh_destroy,
.port_ev_mac_vlan_seen = ofdpa_port_ev_mac_vlan_seen,
.fib4_add = ofdpa_fib4_add,
.fib4_del = ofdpa_fib4_del,
.fib4_abort = ofdpa_fib4_abort,
};
......@@ -22,6 +22,7 @@
#include <net/fib_rules.h>
#include <net/inetpeer.h>
#include <linux/percpu.h>
#include <linux/notifier.h>
struct fib_config {
u8 fc_dst_len;
......@@ -122,6 +123,7 @@ struct fib_info {
#ifdef CONFIG_IP_ROUTE_MULTIPATH
int fib_weight;
#endif
unsigned int fib_offload_cnt;
struct rcu_head rcu;
struct fib_nh fib_nh[0];
#define fib_dev fib_nh[0].nh_dev
......@@ -173,6 +175,18 @@ struct fib_result_nl {
__be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
static inline void fib_info_offload_inc(struct fib_info *fi)
{
fi->fib_offload_cnt++;
fi->fib_flags |= RTNH_F_OFFLOAD;
}
static inline void fib_info_offload_dec(struct fib_info *fi)
{
if (--fi->fib_offload_cnt == 0)
fi->fib_flags &= ~RTNH_F_OFFLOAD;
}
#define FIB_RES_SADDR(net, res) \
((FIB_RES_NH(res).nh_saddr_genid == \
atomic_read(&(net)->ipv4.dev_addr_genid)) ? \
......@@ -185,6 +199,33 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct fib_nh *nh);
#define FIB_RES_PREFSRC(net, res) ((res).fi->fib_prefsrc ? : \
FIB_RES_SADDR(net, res))
struct fib_notifier_info {
struct net *net;
};
struct fib_entry_notifier_info {
struct fib_notifier_info info; /* must be first */
u32 dst;
int dst_len;
struct fib_info *fi;
u8 tos;
u8 type;
u32 tb_id;
u32 nlflags;
};
enum fib_event_type {
FIB_EVENT_ENTRY_ADD,
FIB_EVENT_ENTRY_DEL,
FIB_EVENT_RULE_ADD,
FIB_EVENT_RULE_DEL,
};
int register_fib_notifier(struct notifier_block *nb);
int unregister_fib_notifier(struct notifier_block *nb);
int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
struct fib_notifier_info *info);
struct fib_table {
struct hlist_node tb_hlist;
u32 tb_id;
......@@ -196,13 +237,12 @@ struct fib_table {
int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
struct fib_result *res, int fib_flags);
int fib_table_insert(struct fib_table *, struct fib_config *);
int fib_table_delete(struct fib_table *, struct fib_config *);
int fib_table_insert(struct net *, struct fib_table *, struct fib_config *);
int fib_table_delete(struct net *, struct fib_table *, struct fib_config *);
int fib_table_dump(struct fib_table *table, struct sk_buff *skb,
struct netlink_callback *cb);
int fib_table_flush(struct fib_table *table);
int fib_table_flush(struct net *net, struct fib_table *table);
struct fib_table *fib_trie_unmerge(struct fib_table *main_tb);
void fib_table_flush_external(struct fib_table *table);
void fib_free_table(struct fib_table *tb);
#ifndef CONFIG_IP_MULTIPLE_TABLES
......@@ -315,7 +355,6 @@ static inline int fib_num_tclassid_users(struct net *net)
}
#endif
int fib_unmerge(struct net *net);
void fib_flush_external(struct net *net);
/* Exported by fib_semantics.c */
int ip_fib_check_default(__be32 gw, struct net_device *dev);
......
......@@ -68,7 +68,6 @@ struct switchdev_attr {
enum switchdev_obj_id {
SWITCHDEV_OBJ_ID_UNDEFINED,
SWITCHDEV_OBJ_ID_PORT_VLAN,
SWITCHDEV_OBJ_ID_IPV4_FIB,
SWITCHDEV_OBJ_ID_PORT_FDB,
SWITCHDEV_OBJ_ID_PORT_MDB,
};
......@@ -92,21 +91,6 @@ struct switchdev_obj_port_vlan {
#define SWITCHDEV_OBJ_PORT_VLAN(obj) \
container_of(obj, struct switchdev_obj_port_vlan, obj)
/* SWITCHDEV_OBJ_ID_IPV4_FIB */
struct switchdev_obj_ipv4_fib {
struct switchdev_obj obj;
u32 dst;
int dst_len;
struct fib_info *fi;
u8 tos;
u8 type;
u32 nlflags;
u32 tb_id;
};
#define SWITCHDEV_OBJ_IPV4_FIB(obj) \
container_of(obj, struct switchdev_obj_ipv4_fib, obj)
/* SWITCHDEV_OBJ_ID_PORT_FDB */
struct switchdev_obj_port_fdb {
struct switchdev_obj obj;
......@@ -209,11 +193,6 @@ int switchdev_port_bridge_setlink(struct net_device *dev,
struct nlmsghdr *nlh, u16 flags);
int switchdev_port_bridge_dellink(struct net_device *dev,
struct nlmsghdr *nlh, u16 flags);
int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
u8 tos, u8 type, u32 nlflags, u32 tb_id);
int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
u8 tos, u8 type, u32 tb_id);
void switchdev_fib_ipv4_abort(struct fib_info *fi);
int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
struct net_device *dev, const unsigned char *addr,
u16 vid, u16 nlm_flags);
......@@ -304,25 +283,6 @@ static inline int switchdev_port_bridge_dellink(struct net_device *dev,
return -EOPNOTSUPP;
}
static inline int switchdev_fib_ipv4_add(u32 dst, int dst_len,
struct fib_info *fi,
u8 tos, u8 type,
u32 nlflags, u32 tb_id)
{
return 0;
}
static inline int switchdev_fib_ipv4_del(u32 dst, int dst_len,
struct fib_info *fi,
u8 tos, u8 type, u32 tb_id)
{
return 0;
}
static inline void switchdev_fib_ipv4_abort(struct fib_info *fi)
{
}
static inline int switchdev_port_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
struct net_device *dev,
const unsigned char *addr,
......
......@@ -182,26 +182,13 @@ static void fib_flush(struct net *net)
struct fib_table *tb;
hlist_for_each_entry_safe(tb, tmp, head, tb_hlist)
flushed += fib_table_flush(tb);
flushed += fib_table_flush(net, tb);
}
if (flushed)
rt_cache_flush(net);
}
void fib_flush_external(struct net *net)
{
struct fib_table *tb;
struct hlist_head *head;
unsigned int h;
for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
head = &net->ipv4.fib_table_hash[h];
hlist_for_each_entry(tb, head, tb_hlist)
fib_table_flush_external(tb);
}
}
/*
* Find address type as if only "dev" was present in the system. If
* on_dev is NULL then all interfaces are taken into consideration.
......@@ -590,13 +577,13 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
if (cmd == SIOCDELRT) {
tb = fib_get_table(net, cfg.fc_table);
if (tb)
err = fib_table_delete(tb, &cfg);
err = fib_table_delete(net, tb, &cfg);
else
err = -ESRCH;
} else {
tb = fib_new_table(net, cfg.fc_table);
if (tb)
err = fib_table_insert(tb, &cfg);
err = fib_table_insert(net, tb, &cfg);
else
err = -ENOBUFS;
}
......@@ -719,7 +706,7 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
goto errout;
}
err = fib_table_delete(tb, &cfg);
err = fib_table_delete(net, tb, &cfg);
errout:
return err;
}
......@@ -741,7 +728,7 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
goto errout;
}
err = fib_table_insert(tb, &cfg);
err = fib_table_insert(net, tb, &cfg);
errout:
return err;
}
......@@ -828,9 +815,9 @@ static void fib_magic(int cmd, int type, __be32 dst, int dst_len, struct in_ifad
cfg.fc_scope = RT_SCOPE_HOST;
if (cmd == RTM_NEWROUTE)
fib_table_insert(tb, &cfg);
fib_table_insert(net, tb, &cfg);
else
fib_table_delete(tb, &cfg);
fib_table_delete(net, tb, &cfg);
}
void fib_add_ifaddr(struct in_ifaddr *ifa)
......@@ -1254,7 +1241,7 @@ static void ip_fib_net_exit(struct net *net)
hlist_for_each_entry_safe(tb, tmp, head, tb_hlist) {
hlist_del(&tb->tb_hlist);
fib_table_flush(tb);
fib_table_flush(net, tb);
fib_free_table(tb);
}
}
......
......@@ -164,6 +164,14 @@ static struct fib_table *fib_empty_table(struct net *net)
return NULL;
}
static int call_fib_rule_notifiers(struct net *net,
enum fib_event_type event_type)
{
struct fib_notifier_info info;
return call_fib_notifiers(net, event_type, &info);
}
static const struct nla_policy fib4_rule_policy[FRA_MAX+1] = {
FRA_GENERIC_POLICY,
[FRA_FLOW] = { .type = NLA_U32 },
......@@ -220,7 +228,7 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
rule4->tos = frh->tos;
net->ipv4.fib_has_custom_rules = true;
fib_flush_external(rule->fr_net);
call_fib_rule_notifiers(net, FIB_EVENT_RULE_ADD);
err = 0;
errout:
......@@ -242,7 +250,7 @@ static int fib4_rule_delete(struct fib_rule *rule)
net->ipv4.fib_num_tclassid_users--;
#endif
net->ipv4.fib_has_custom_rules = true;
fib_flush_external(rule->fr_net);
call_fib_rule_notifiers(net, FIB_EVENT_RULE_DEL);
errout:
return err;
}
......
......@@ -73,6 +73,7 @@
#include <linux/slab.h>
#include <linux/export.h>
#include <linux/vmalloc.h>
#include <linux/notifier.h>
#include <net/net_namespace.h>
#include <net/ip.h>
#include <net/protocol.h>
......@@ -80,10 +81,47 @@
#include <net/tcp.h>
#include <net/sock.h>
#include <net/ip_fib.h>
#include <net/switchdev.h>
#include <trace/events/fib.h>
#include "fib_lookup.h"
static BLOCKING_NOTIFIER_HEAD(fib_chain);
int register_fib_notifier(struct notifier_block *nb)
{
return blocking_notifier_chain_register(&fib_chain, nb);
}
EXPORT_SYMBOL(register_fib_notifier);
int unregister_fib_notifier(struct notifier_block *nb)
{
return blocking_notifier_chain_unregister(&fib_chain, nb);
}
EXPORT_SYMBOL(unregister_fib_notifier);
int call_fib_notifiers(struct net *net, enum fib_event_type event_type,
struct fib_notifier_info *info)
{
info->net = net;
return blocking_notifier_call_chain(&fib_chain, event_type, info);
}
static int call_fib_entry_notifiers(struct net *net,
enum fib_event_type event_type, u32 dst,
int dst_len, struct fib_info *fi,
u8 tos, u8 type, u32 tb_id, u32 nlflags)
{
struct fib_entry_notifier_info info = {
.dst = dst,
.dst_len = dst_len,
.fi = fi,
.tos = tos,
.type = type,
.tb_id = tb_id,
.nlflags = nlflags,
};
return call_fib_notifiers(net, event_type, &info.info);
}
#define MAX_STAT_DEPTH 32
#define KEYLENGTH (8*sizeof(t_key))
......@@ -1076,7 +1114,8 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
}
/* Caller must hold RTNL. */
int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
int fib_table_insert(struct net *net, struct fib_table *tb,
struct fib_config *cfg)
{
struct trie *t = (struct trie *)tb->tb_data;
struct fib_alias *fa, *new_fa;
......@@ -1175,17 +1214,6 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
new_fa->tb_id = tb->tb_id;
new_fa->fa_default = -1;
err = switchdev_fib_ipv4_add(key, plen, fi,
new_fa->fa_tos,
cfg->fc_type,
cfg->fc_nlflags,
tb->tb_id);
if (err) {
switchdev_fib_ipv4_abort(fi);
kmem_cache_free(fn_alias_kmem, new_fa);
goto out;
}
hlist_replace_rcu(&fa->fa_list, &new_fa->fa_list);
alias_free_mem_rcu(fa);
......@@ -1193,6 +1221,11 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
fib_release_info(fi_drop);
if (state & FA_S_ACCESSED)
rt_cache_flush(cfg->fc_nlinfo.nl_net);
call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD,
key, plen, fi,
new_fa->fa_tos, cfg->fc_type,
tb->tb_id, cfg->fc_nlflags);
rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
tb->tb_id, &cfg->fc_nlinfo, nlflags);
......@@ -1228,30 +1261,22 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
new_fa->tb_id = tb->tb_id;
new_fa->fa_default = -1;
/* (Optionally) offload fib entry to switch hardware. */
err = switchdev_fib_ipv4_add(key, plen, fi, tos, cfg->fc_type,
cfg->fc_nlflags, tb->tb_id);
if (err) {
switchdev_fib_ipv4_abort(fi);
goto out_free_new_fa;
}
/* Insert new entry to the list. */
err = fib_insert_alias(t, tp, l, new_fa, fa, key);
if (err)
goto out_sw_fib_del;
goto out_free_new_fa;
if (!plen)
tb->tb_num_default++;
rt_cache_flush(cfg->fc_nlinfo.nl_net);
call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_ADD, key, plen, fi, tos,
cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen, new_fa->tb_id,
&cfg->fc_nlinfo, nlflags);
succeeded:
return 0;
out_sw_fib_del:
switchdev_fib_ipv4_del(key, plen, fi, tos, cfg->fc_type, tb->tb_id);
out_free_new_fa:
kmem_cache_free(fn_alias_kmem, new_fa);
out:
......@@ -1490,7 +1515,8 @@ static void fib_remove_alias(struct trie *t, struct key_vector *tp,
}
/* Caller must hold RTNL. */
int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
int fib_table_delete(struct net *net, struct fib_table *tb,
struct fib_config *cfg)
{
struct trie *t = (struct trie *) tb->tb_data;
struct fib_alias *fa, *fa_to_delete;
......@@ -1543,9 +1569,9 @@ int fib_table_delete(struct fib_table *tb, struct fib_config *cfg)
if (!fa_to_delete)
return -ESRCH;
switchdev_fib_ipv4_del(key, plen, fa_to_delete->fa_info, tos,
cfg->fc_type, tb->tb_id);
call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, key, plen,
fa_to_delete->fa_info, tos, cfg->fc_type,
tb->tb_id, 0);
rtmsg_fib(RTM_DELROUTE, htonl(key), fa_to_delete, plen, tb->tb_id,
&cfg->fc_nlinfo, 0);
......@@ -1734,82 +1760,8 @@ struct fib_table *fib_trie_unmerge(struct fib_table *oldtb)
return NULL;
}
/* Caller must hold RTNL */
void fib_table_flush_external(struct fib_table *tb)
{
struct trie *t = (struct trie *)tb->tb_data;
struct key_vector *pn = t->kv;
unsigned long cindex = 1;
struct hlist_node *tmp;
struct fib_alias *fa;
/* walk trie in reverse order */
for (;;) {
unsigned char slen = 0;
struct key_vector *n;
if (!(cindex--)) {
t_key pkey = pn->key;
/* cannot resize the trie vector */
if (IS_TRIE(pn))
break;
/* resize completed node */
pn = resize(t, pn);
cindex = get_index(pkey, pn);
continue;
}
/* grab the next available node */
n = get_child(pn, cindex);
if (!n)
continue;
if (IS_TNODE(n)) {
/* record pn and cindex for leaf walking */
pn = n;
cindex = 1ul << n->bits;
continue;
}
hlist_for_each_entry_safe(fa, tmp, &n->leaf, fa_list) {
struct fib_info *fi = fa->fa_info;
/* if alias was cloned to local then we just
* need to remove the local copy from main
*/
if (tb->tb_id != fa->tb_id) {
hlist_del_rcu(&fa->fa_list);
alias_free_mem_rcu(fa);
continue;
}
/* record local slen */
slen = fa->fa_slen;
if (!fi || !(fi->fib_flags & RTNH_F_OFFLOAD))
continue;
switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
fi, fa->fa_tos, fa->fa_type,
tb->tb_id);
}
/* update leaf slen */
n->slen = slen;
if (hlist_empty(&n->leaf)) {
put_child_root(pn, n->key, NULL);
node_free(n);
}
}
}
/* Caller must hold RTNL. */
int fib_table_flush(struct fib_table *tb)
int fib_table_flush(struct net *net, struct fib_table *tb)
{
struct trie *t = (struct trie *)tb->tb_data;
struct key_vector *pn = t->kv;
......@@ -1858,9 +1810,11 @@ int fib_table_flush(struct fib_table *tb)
continue;
}
switchdev_fib_ipv4_del(n->key, KEYLENGTH - fa->fa_slen,
fi, fa->fa_tos, fa->fa_type,
tb->tb_id);
call_fib_entry_notifiers(net, FIB_EVENT_ENTRY_DEL,
n->key,
KEYLENGTH - fa->fa_slen,
fi, fa->fa_tos, fa->fa_type,
tb->tb_id, 0);
hlist_del_rcu(&fa->fa_list);
fib_release_info(fa->fa_info);
alias_free_mem_rcu(fa);
......
......@@ -21,7 +21,6 @@
#include <linux/workqueue.h>
#include <linux/if_vlan.h>
#include <linux/rtnetlink.h>
#include <net/ip_fib.h>
#include <net/switchdev.h>
/**
......@@ -344,8 +343,6 @@ static size_t switchdev_obj_size(const struct switchdev_obj *obj)
switch (obj->id) {
case SWITCHDEV_OBJ_ID_PORT_VLAN:
return sizeof(struct switchdev_obj_port_vlan);
case SWITCHDEV_OBJ_ID_IPV4_FIB:
return sizeof(struct switchdev_obj_ipv4_fib);
case SWITCHDEV_OBJ_ID_PORT_FDB:
return sizeof(struct switchdev_obj_port_fdb);
case SWITCHDEV_OBJ_ID_PORT_MDB:
......@@ -1108,184 +1105,6 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
}
EXPORT_SYMBOL_GPL(switchdev_port_fdb_dump);
static struct net_device *switchdev_get_lowest_dev(struct net_device *dev)
{
const struct switchdev_ops *ops = dev->switchdev_ops;
struct net_device *lower_dev;
struct net_device *port_dev;
struct list_head *iter;
/* Recusively search down until we find a sw port dev.
* (A sw port dev supports switchdev_port_attr_get).
*/
if (ops && ops->switchdev_port_attr_get)
return dev;
netdev_for_each_lower_dev(dev, lower_dev, iter) {
port_dev = switchdev_get_lowest_dev(lower_dev);
if (port_dev)
return port_dev;
}
return NULL;
}
static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
{
struct switchdev_attr attr = {
.id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
};
struct switchdev_attr prev_attr;
struct net_device *dev = NULL;
int nhsel;
ASSERT_RTNL();
/* For this route, all nexthop devs must be on the same switch. */
for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
const struct fib_nh *nh = &fi->fib_nh[nhsel];
if (!nh->nh_dev)
return NULL;
dev = switchdev_get_lowest_dev(nh->nh_dev);
if (!dev)
return NULL;
attr.orig_dev = dev;
if (switchdev_port_attr_get(dev, &attr))
return NULL;
if (nhsel > 0 &&
!netdev_phys_item_id_same(&prev_attr.u.ppid, &attr.u.ppid))
return NULL;
prev_attr = attr;
}
return dev;
}
/**
* switchdev_fib_ipv4_add - Add/modify switch IPv4 route entry
*
* @dst: route's IPv4 destination address
* @dst_len: destination address length (prefix length)
* @fi: route FIB info structure
* @tos: route TOS
* @type: route type
* @nlflags: netlink flags passed in (NLM_F_*)
* @tb_id: route table ID
*
* Add/modify switch IPv4 route entry.
*/
int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
u8 tos, u8 type, u32 nlflags, u32 tb_id)
{
struct switchdev_obj_ipv4_fib ipv4_fib = {
.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
.dst = dst,
.dst_len = dst_len,
.fi = fi,
.tos = tos,
.type = type,
.nlflags = nlflags,
.tb_id = tb_id,
};
struct net_device *dev;
int err = 0;
/* Don't offload route if using custom ip rules or if
* IPv4 FIB offloading has been disabled completely.
*/
#ifdef CONFIG_IP_MULTIPLE_TABLES
if (fi->fib_net->ipv4.fib_has_custom_rules)
return 0;
#endif
if (fi->fib_net->ipv4.fib_offload_disabled)
return 0;
dev = switchdev_get_dev_by_nhs(fi);
if (!dev)
return 0;
ipv4_fib.obj.orig_dev = dev;
err = switchdev_port_obj_add(dev, &ipv4_fib.obj);
if (!err)
fi->fib_flags |= RTNH_F_OFFLOAD;
return err == -EOPNOTSUPP ? 0 : err;
}
EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_add);
/**
* switchdev_fib_ipv4_del - Delete IPv4 route entry from switch
*
* @dst: route's IPv4 destination address
* @dst_len: destination address length (prefix length)
* @fi: route FIB info structure
* @tos: route TOS
* @type: route type
* @tb_id: route table ID
*
* Delete IPv4 route entry from switch device.
*/
int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
u8 tos, u8 type, u32 tb_id)
{
struct switchdev_obj_ipv4_fib ipv4_fib = {
.obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
.dst = dst,
.dst_len = dst_len,
.fi = fi,
.tos = tos,
.type = type,
.nlflags = 0,
.tb_id = tb_id,
};
struct net_device *dev;
int err = 0;
if (!(fi->fib_flags & RTNH_F_OFFLOAD))
return 0;
dev = switchdev_get_dev_by_nhs(fi);
if (!dev)
return 0;
ipv4_fib.obj.orig_dev = dev;
err = switchdev_port_obj_del(dev, &ipv4_fib.obj);
if (!err)
fi->fib_flags &= ~RTNH_F_OFFLOAD;
return err == -EOPNOTSUPP ? 0 : err;
}
EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_del);
/**
* switchdev_fib_ipv4_abort - Abort an IPv4 FIB operation
*
* @fi: route FIB info structure
*/
void switchdev_fib_ipv4_abort(struct fib_info *fi)
{
/* There was a problem installing this route to the offload
* device. For now, until we come up with more refined
* policy handling, abruptly end IPv4 fib offloading for
* for entire net by flushing offload device(s) of all
* IPv4 routes, and mark IPv4 fib offloading broken from
* this point forward.
*/
fib_flush_external(fi->fib_net);
fi->fib_net->ipv4.fib_offload_disabled = true;
}
EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_abort);
bool switchdev_port_same_parent_id(struct net_device *a,
struct net_device *b)
{
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment