Commit f389a40e authored by David S. Miller

Merge branch 'ipv4-nexthop-link-status'

Andy Gospodarek says:

====================
changes to make ipv4 routing table aware of next-hop link status

This series adds the ability to have the Linux kernel track whether or
not a particular route should be used based on the link-status of the
interface associated with the next-hop.

Before this patch any link-failure on an interface that was serving as a
gateway for some systems could result in those systems being isolated
from the rest of the network as the stack would continue to attempt to
send frames out of an interface that is actually linked-down.  When the
kernel is responsible for all forwarding, it should also be responsible
for taking action when the traffic can no longer be forwarded -- there
is no real need to outsource link-monitoring to userspace anymore.

This feature is only enabled when the new per-interface or IPv4 global
sysctl 'ignore_routes_with_linkdown' is set.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...
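
As a concrete example (illustrative; p8p1 is the interface used in the
output below), the feature could be enabled globally and on one
interface with:

net.ipv4.conf.all.ignore_routes_with_linkdown = 1
net.ipv4.conf.p8p1.ignore_routes_with_linkdown = 1

These lines could be applied at runtime with 'sysctl -w' or placed in a
sysctl.conf-style fragment to persist across reboots.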

When the above sysctls are set, the kernel will not only report to
userspace that the link is down, but it will also report to userspace
that a route is dead.  This will signal to userspace that the route will
not be selected.

With the new sysctls set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
    cache

While the route does remain in the table (so it can be modified if
needed, rather than being wiped away as it would be if IFF_UP were
cleared), the proper next-hop is chosen automatically when the link is
down.  Now interface p8p1 is linked-up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 dev p8p1  src 80.0.0.1
    cache

and the output changes to what one would expect.

If the global or interface sysctl is not set, the following output would
be expected when p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

If the dead flag does not appear, there should be no expectation that
the kernel will skip using this route due to the link being down.

v2: Split kernel changes into 2 patches: first to add linkdown flag and
second to add new sysctl settings.  Also took suggestion from Alex to
simplify code by only checking sysctl during fib lookup and suggestion
from Scott to add a per-interface sysctl.  Added iproute2 patch to
recognize and print linkdown flag.

v3: Code cleanups along with reverse-path checks suggested by Alex and
small fixes related to problems found when multipath was disabled.

v4: Drop binary sysctls

v5: Whitespace and variable declaration fixups suggested by Dave

v6: Style changes noticed by Dave and checkpatch suggestions.

v7: Last checkpatch fixup.

When this was discussed in Ottawa earlier this year, some preferred not
to have a configuration option at all and to make this behavior the
default, since "it was time to do this."  I still wanted to propose the
config option to preserve the current behavior for those that desire
it.  I'll happily remove it if Dave and Linus approve.

An IPv6 implementation is also needed (DECnet too!), but I wanted to
start with the IPv4 implementation to get people comfortable with the
idea before moving forward.  If this is accepted the IPv6 implementation
can be posted shortly.

There was also a request for switchdev support for this, but that will
be posted as a followup as switchdev does not currently handle dead
next-hops in a multi-path case and I felt that infra needed to be added
first.

FWIW, we have been running the original version of this series with a
global sysctl and our customers have been happily using a backported
version for IPv4 and IPv6 for >6 months.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
parents 5c8079d0 0eeb075f
...@@ -120,6 +120,9 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
       || (!IN_DEV_FORWARD(in_dev) && \
           IN_DEV_ORCONF((in_dev), ACCEPT_REDIRECTS)))

+#define IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) \
+    IN_DEV_CONF_GET((in_dev), IGNORE_ROUTES_WITH_LINKDOWN)
+
 #define IN_DEV_ARPFILTER(in_dev)    IN_DEV_ORCONF((in_dev), ARPFILTER)
 #define IN_DEV_ARP_ACCEPT(in_dev)   IN_DEV_ORCONF((in_dev), ARP_ACCEPT)
 #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
...
...@@ -36,7 +36,8 @@ struct fib_lookup_arg {
     void            *result;
     struct fib_rule *rule;
     int             flags;
 #define FIB_LOOKUP_NOREF            1
+#define FIB_LOOKUP_IGNORE_LINKSTATE 2
 };

 struct fib_rules_ops {
...
...@@ -226,7 +226,7 @@ static inline struct fib_table *fib_new_table(struct net *net, u32 id)
 }

 static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
-                             struct fib_result *res)
+                             struct fib_result *res, unsigned int flags)
 {
     struct fib_table *tb;
     int err = -ENETUNREACH;
...@@ -234,7 +234,7 @@ static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
     rcu_read_lock();

     tb = fib_get_table(net, RT_TABLE_MAIN);
-    if (tb && !fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF))
+    if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
         err = 0;

     rcu_read_unlock();
...@@ -249,16 +249,18 @@ void __net_exit fib4_rules_exit(struct net *net);

 struct fib_table *fib_new_table(struct net *net, u32 id);
 struct fib_table *fib_get_table(struct net *net, u32 id);
-int __fib_lookup(struct net *net, struct flowi4 *flp, struct fib_result *res);
+int __fib_lookup(struct net *net, struct flowi4 *flp,
+                 struct fib_result *res, unsigned int flags);

 static inline int fib_lookup(struct net *net, struct flowi4 *flp,
-                             struct fib_result *res)
+                             struct fib_result *res, unsigned int flags)
 {
     struct fib_table *tb;
     int err;

+    flags |= FIB_LOOKUP_NOREF;
     if (net->ipv4.fib_has_custom_rules)
-        return __fib_lookup(net, flp, res);
+        return __fib_lookup(net, flp, res, flags);

     rcu_read_lock();
...@@ -266,11 +268,11 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,

     for (err = 0; !err; err = -ENETUNREACH) {
         tb = rcu_dereference_rtnl(net->ipv4.fib_main);
-        if (tb && !fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF))
+        if (tb && !fib_table_lookup(tb, flp, res, flags))
             break;
         tb = rcu_dereference_rtnl(net->ipv4.fib_default);
-        if (tb && !fib_table_lookup(tb, flp, res, FIB_LOOKUP_NOREF))
+        if (tb && !fib_table_lookup(tb, flp, res, flags))
             break;
     }
...@@ -305,9 +307,9 @@ void fib_flush_external(struct net *net);

 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
-int fib_sync_down_dev(struct net_device *dev, int force);
+int fib_sync_down_dev(struct net_device *dev, unsigned long event);
 int fib_sync_down_addr(struct net *net, __be32 local);
-int fib_sync_up(struct net_device *dev);
+int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 void fib_select_multipath(struct fib_result *res);

 /* Exported by fib_trie.c */
...
...@@ -164,6 +164,7 @@ enum
     IPV4_DEVCONF_ROUTE_LOCALNET,
     IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL,
     IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL,
+    IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
     __IPV4_DEVCONF_MAX
 };
...
...@@ -338,6 +338,9 @@ struct rtnexthop {
 #define RTNH_F_PERVASIVE  2  /* Do recursive gateway lookup */
 #define RTNH_F_ONLINK     4  /* Gateway is forced on link   */
 #define RTNH_F_OFFLOAD    8  /* offloaded route */
+#define RTNH_F_LINKDOWN   16 /* carrier-down on nexthop */
+
+#define RTNH_COMPARE_MASK (RTNH_F_DEAD | RTNH_F_LINKDOWN)

 /* Macros to handle hexthops */
...
...@@ -2169,6 +2169,8 @@ static struct devinet_sysctl_table {
                     "igmpv2_unsolicited_report_interval"),
         DEVINET_SYSCTL_RW_ENTRY(IGMPV3_UNSOLICITED_REPORT_INTERVAL,
                     "igmpv3_unsolicited_report_interval"),
+        DEVINET_SYSCTL_RW_ENTRY(IGNORE_ROUTES_WITH_LINKDOWN,
+                    "ignore_routes_with_linkdown"),

         DEVINET_SYSCTL_FLUSHING_ENTRY(NOXFRM, "disable_xfrm"),
         DEVINET_SYSCTL_FLUSHING_ENTRY(NOPOLICY, "disable_policy"),
...
...@@ -280,7 +280,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb)
         fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
         fl4.flowi4_scope = scope;
         fl4.flowi4_mark = IN_DEV_SRC_VMARK(in_dev) ? skb->mark : 0;
-        if (!fib_lookup(net, &fl4, &res))
+        if (!fib_lookup(net, &fl4, &res, 0))
             return FIB_RES_PREFSRC(net, res);
     } else {
         scope = RT_SCOPE_LINK;
...@@ -319,7 +319,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
     fl4.flowi4_mark = IN_DEV_SRC_VMARK(idev) ? skb->mark : 0;

     net = dev_net(dev);
-    if (fib_lookup(net, &fl4, &res))
+    if (fib_lookup(net, &fl4, &res, 0))
         goto last_resort;
     if (res.type != RTN_UNICAST &&
         (res.type != RTN_LOCAL || !IN_DEV_ACCEPT_LOCAL(idev)))
...@@ -354,7 +354,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
         fl4.flowi4_oif = dev->ifindex;

         ret = 0;
-        if (fib_lookup(net, &fl4, &res) == 0) {
+        if (fib_lookup(net, &fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE) == 0) {
             if (res.type == RTN_UNICAST)
                 ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
         }
...@@ -1063,9 +1063,9 @@ static void nl_fib_lookup_exit(struct net *net)
     net->ipv4.fibnl = NULL;
 }

-static void fib_disable_ip(struct net_device *dev, int force)
+static void fib_disable_ip(struct net_device *dev, unsigned long event)
 {
-    if (fib_sync_down_dev(dev, force))
+    if (fib_sync_down_dev(dev, event))
         fib_flush(dev_net(dev));
     rt_cache_flush(dev_net(dev));
     arp_ifdown(dev);
...@@ -1081,7 +1081,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
     case NETDEV_UP:
         fib_add_ifaddr(ifa);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-        fib_sync_up(dev);
+        fib_sync_up(dev, RTNH_F_DEAD);
 #endif
         atomic_inc(&net->ipv4.dev_addr_genid);
         rt_cache_flush(dev_net(dev));
...@@ -1093,7 +1093,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
             /* Last address was deleted from this interface.
              * Disable IP.
              */
-            fib_disable_ip(dev, 1);
+            fib_disable_ip(dev, event);
         } else {
             rt_cache_flush(dev_net(dev));
         }
...@@ -1107,9 +1107,10 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
     struct net_device *dev = netdev_notifier_info_to_dev(ptr);
     struct in_device *in_dev;
     struct net *net = dev_net(dev);
+    unsigned int flags;

     if (event == NETDEV_UNREGISTER) {
-        fib_disable_ip(dev, 2);
+        fib_disable_ip(dev, event);
         rt_flush_dev(dev);
         return NOTIFY_DONE;
     }
...@@ -1124,16 +1125,22 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
             fib_add_ifaddr(ifa);
         } endfor_ifa(in_dev);
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-        fib_sync_up(dev);
+        fib_sync_up(dev, RTNH_F_DEAD);
 #endif
         atomic_inc(&net->ipv4.dev_addr_genid);
         rt_cache_flush(net);
         break;
     case NETDEV_DOWN:
-        fib_disable_ip(dev, 0);
+        fib_disable_ip(dev, event);
         break;
-    case NETDEV_CHANGEMTU:
     case NETDEV_CHANGE:
+        flags = dev_get_flags(dev);
+        if (flags & (IFF_RUNNING | IFF_LOWER_UP))
+            fib_sync_up(dev, RTNH_F_LINKDOWN);
+        else
+            fib_sync_down_dev(dev, event);
+        /* fall through */
+    case NETDEV_CHANGEMTU:
         rt_cache_flush(net);
         break;
     }
...
...@@ -47,11 +47,12 @@ struct fib4_rule {
 #endif
 };

-int __fib_lookup(struct net *net, struct flowi4 *flp, struct fib_result *res)
+int __fib_lookup(struct net *net, struct flowi4 *flp,
+                 struct fib_result *res, unsigned int flags)
 {
     struct fib_lookup_arg arg = {
         .result = res,
-        .flags = FIB_LOOKUP_NOREF,
+        .flags = flags,
     };
     int err;
...
...@@ -266,7 +266,7 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
 #ifdef CONFIG_IP_ROUTE_CLASSID
             nh->nh_tclassid != onh->nh_tclassid ||
 #endif
-            ((nh->nh_flags ^ onh->nh_flags) & ~RTNH_F_DEAD))
+            ((nh->nh_flags ^ onh->nh_flags) & ~RTNH_COMPARE_MASK))
             return -1;
         onh++;
     } endfor_nexthops(fi);
...@@ -318,7 +318,7 @@ static struct fib_info *fib_find_info(const struct fib_info *nfi)
             nfi->fib_type == fi->fib_type &&
             memcmp(nfi->fib_metrics, fi->fib_metrics,
                    sizeof(u32) * RTAX_MAX) == 0 &&
-            ((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_F_DEAD) == 0 &&
+            !((nfi->fib_flags ^ fi->fib_flags) & ~RTNH_COMPARE_MASK) &&
             (nfi->fib_nhs == 0 || nh_comp(fi, nfi) == 0))
             return fi;
     }
...@@ -604,6 +604,8 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
                 return -ENODEV;
             if (!(dev->flags & IFF_UP))
                 return -ENETDOWN;
+            if (!netif_carrier_ok(dev))
+                nh->nh_flags |= RTNH_F_LINKDOWN;
             nh->nh_dev = dev;
             dev_hold(dev);
             nh->nh_scope = RT_SCOPE_LINK;
...@@ -621,7 +623,8 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
             /* It is not necessary, but requires a bit of thinking */
             if (fl4.flowi4_scope < RT_SCOPE_LINK)
                 fl4.flowi4_scope = RT_SCOPE_LINK;
-            err = fib_lookup(net, &fl4, &res);
+            err = fib_lookup(net, &fl4, &res,
+                             FIB_LOOKUP_IGNORE_LINKSTATE);
             if (err) {
                 rcu_read_unlock();
                 return err;
...@@ -636,6 +639,8 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
         if (!dev)
             goto out;
         dev_hold(dev);
+        if (!netif_carrier_ok(dev))
+            nh->nh_flags |= RTNH_F_LINKDOWN;
         err = (dev->flags & IFF_UP) ? 0 : -ENETDOWN;
     } else {
         struct in_device *in_dev;
...@@ -654,6 +659,8 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
         nh->nh_dev = in_dev->dev;
         dev_hold(nh->nh_dev);
         nh->nh_scope = RT_SCOPE_HOST;
+        if (!netif_carrier_ok(nh->nh_dev))
+            nh->nh_flags |= RTNH_F_LINKDOWN;
         err = 0;
     }
 out:
...@@ -920,11 +927,17 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
         if (!nh->nh_dev)
             goto failure;
     } else {
+        int linkdown = 0;
+
         change_nexthops(fi) {
             err = fib_check_nh(cfg, fi, nexthop_nh);
             if (err != 0)
                 goto failure;
+            if (nexthop_nh->nh_flags & RTNH_F_LINKDOWN)
+                linkdown++;
         } endfor_nexthops(fi)
+        if (linkdown == fi->fib_nhs)
+            fi->fib_flags |= RTNH_F_LINKDOWN;
     }

     if (fi->fib_prefsrc) {
...@@ -1023,12 +1036,20 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
         nla_put_in_addr(skb, RTA_PREFSRC, fi->fib_prefsrc))
         goto nla_put_failure;
     if (fi->fib_nhs == 1) {
+        struct in_device *in_dev;
+
         if (fi->fib_nh->nh_gw &&
             nla_put_in_addr(skb, RTA_GATEWAY, fi->fib_nh->nh_gw))
             goto nla_put_failure;
         if (fi->fib_nh->nh_oif &&
             nla_put_u32(skb, RTA_OIF, fi->fib_nh->nh_oif))
             goto nla_put_failure;
+        if (fi->fib_nh->nh_flags & RTNH_F_LINKDOWN) {
+            in_dev = __in_dev_get_rcu(fi->fib_nh->nh_dev);
+            if (in_dev &&
+                IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev))
+                rtm->rtm_flags |= RTNH_F_DEAD;
+        }
 #ifdef CONFIG_IP_ROUTE_CLASSID
         if (fi->fib_nh[0].nh_tclassid &&
             nla_put_u32(skb, RTA_FLOW, fi->fib_nh[0].nh_tclassid))
...@@ -1045,11 +1066,19 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
             goto nla_put_failure;

         for_nexthops(fi) {
+            struct in_device *in_dev;
+
             rtnh = nla_reserve_nohdr(skb, sizeof(*rtnh));
             if (!rtnh)
                 goto nla_put_failure;

             rtnh->rtnh_flags = nh->nh_flags & 0xFF;
+            if (nh->nh_flags & RTNH_F_LINKDOWN) {
+                in_dev = __in_dev_get_rcu(nh->nh_dev);
+                if (in_dev &&
+                    IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev))
+                    rtnh->rtnh_flags |= RTNH_F_DEAD;
+            }
             rtnh->rtnh_hops = nh->nh_weight - 1;
             rtnh->rtnh_ifindex = nh->nh_oif;
...@@ -1103,7 +1132,7 @@ int fib_sync_down_addr(struct net *net, __be32 local)
     return ret;
 }

-int fib_sync_down_dev(struct net_device *dev, int force)
+int fib_sync_down_dev(struct net_device *dev, unsigned long event)
 {
     int ret = 0;
     int scope = RT_SCOPE_NOWHERE;
...@@ -1112,7 +1141,8 @@ int fib_sync_down_dev(struct net_device *dev, int force)
     struct hlist_head *head = &fib_info_devhash[hash];
     struct fib_nh *nh;

-    if (force)
+    if (event == NETDEV_UNREGISTER ||
+        event == NETDEV_DOWN)
         scope = -1;

     hlist_for_each_entry(nh, head, nh_hash) {
...@@ -1129,7 +1159,15 @@ int fib_sync_down_dev(struct net_device *dev, int force)
                 dead++;
             else if (nexthop_nh->nh_dev == dev &&
                      nexthop_nh->nh_scope != scope) {
-                nexthop_nh->nh_flags |= RTNH_F_DEAD;
+                switch (event) {
+                case NETDEV_DOWN:
+                case NETDEV_UNREGISTER:
+                    nexthop_nh->nh_flags |= RTNH_F_DEAD;
+                    /* fall through */
+                case NETDEV_CHANGE:
+                    nexthop_nh->nh_flags |= RTNH_F_LINKDOWN;
+                    break;
+                }
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
                 spin_lock_bh(&fib_multipath_lock);
                 fi->fib_power -= nexthop_nh->nh_power;
...@@ -1139,14 +1177,23 @@ int fib_sync_down_dev(struct net_device *dev, int force)
                 dead++;
             }
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-            if (force > 1 && nexthop_nh->nh_dev == dev) {
+            if (event == NETDEV_UNREGISTER &&
+                nexthop_nh->nh_dev == dev) {
                 dead = fi->fib_nhs;
                 break;
             }
 #endif
         } endfor_nexthops(fi)
         if (dead == fi->fib_nhs) {
-            fi->fib_flags |= RTNH_F_DEAD;
+            switch (event) {
+            case NETDEV_DOWN:
+            case NETDEV_UNREGISTER:
+                fi->fib_flags |= RTNH_F_DEAD;
+                /* fall through */
+            case NETDEV_CHANGE:
+                fi->fib_flags |= RTNH_F_LINKDOWN;
+                break;
+            }
             ret++;
         }
     }
...@@ -1210,13 +1257,11 @@ void fib_select_default(struct fib_result *res)
         return;
     }

-#ifdef CONFIG_IP_ROUTE_MULTIPATH
-
 /*
  * Dead device goes up. We wake up dead nexthops.
  * It takes sense only on multipath routes.
  */
-int fib_sync_up(struct net_device *dev)
+int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
 {
     struct fib_info *prev_fi;
     unsigned int hash;
...@@ -1243,7 +1288,7 @@ int fib_sync_up(struct net_device *dev)
         prev_fi = fi;
         alive = 0;
         change_nexthops(fi) {
-            if (!(nexthop_nh->nh_flags & RTNH_F_DEAD)) {
+            if (!(nexthop_nh->nh_flags & nh_flags)) {
                 alive++;
                 continue;
             }
...@@ -1254,14 +1299,18 @@ int fib_sync_up(struct net_device *dev)
                 !__in_dev_get_rtnl(dev))
                 continue;
             alive++;
+#ifdef CONFIG_IP_ROUTE_MULTIPATH
             spin_lock_bh(&fib_multipath_lock);
             nexthop_nh->nh_power = 0;
-            nexthop_nh->nh_flags &= ~RTNH_F_DEAD;
+            nexthop_nh->nh_flags &= ~nh_flags;
             spin_unlock_bh(&fib_multipath_lock);
+#else
+            nexthop_nh->nh_flags &= ~nh_flags;
+#endif
         } endfor_nexthops(fi)

         if (alive > 0) {
-            fi->fib_flags &= ~RTNH_F_DEAD;
+            fi->fib_flags &= ~nh_flags;
             ret++;
         }
     }
...@@ -1269,6 +1318,8 @@ int fib_sync_up(struct net_device *dev)
     return ret;
 }

+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+
 /*
  * The algorithm is suboptimal, but it provides really
  * fair weighted route distribution.
...@@ -1276,16 +1327,22 @@ int fib_sync_up(struct net_device *dev)
 void fib_select_multipath(struct fib_result *res)
 {
     struct fib_info *fi = res->fi;
+    struct in_device *in_dev;
     int w;

     spin_lock_bh(&fib_multipath_lock);
     if (fi->fib_power <= 0) {
         int power = 0;
         change_nexthops(fi) {
-            if (!(nexthop_nh->nh_flags & RTNH_F_DEAD)) {
-                power += nexthop_nh->nh_weight;
-                nexthop_nh->nh_power = nexthop_nh->nh_weight;
-            }
+            in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
+            if (nexthop_nh->nh_flags & RTNH_F_DEAD)
+                continue;
+            if (in_dev &&
+                IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
+                nexthop_nh->nh_flags & RTNH_F_LINKDOWN)
+                continue;
+            power += nexthop_nh->nh_weight;
+            nexthop_nh->nh_power = nexthop_nh->nh_weight;
         } endfor_nexthops(fi);
         fi->fib_power = power;
         if (power <= 0) {
...
...@@ -1412,9 +1412,15 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
             continue;
         for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
             const struct fib_nh *nh = &fi->fib_nh[nhsel];
+            struct in_device *in_dev = __in_dev_get_rcu(nh->nh_dev);

             if (nh->nh_flags & RTNH_F_DEAD)
                 continue;
+            if (in_dev &&
+                IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
+                nh->nh_flags & RTNH_F_LINKDOWN &&
+                !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
+                continue;
             if (flp->flowi4_oif && flp->flowi4_oif != nh->nh_oif)
                 continue;
...
...@@ -40,7 +40,7 @@ static bool rpfilter_lookup_reverse(struct flowi4 *fl4,
     struct net *net = dev_net(dev);
     int ret __maybe_unused;

-    if (fib_lookup(net, fl4, &res))
+    if (fib_lookup(net, fl4, &res, FIB_LOOKUP_IGNORE_LINKSTATE))
         return false;

     if (res.type != RTN_UNICAST) {
...
...@@ -747,7 +747,7 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
         if (!(n->nud_state & NUD_VALID)) {
             neigh_event_send(n, NULL);
         } else {
-            if (fib_lookup(net, fl4, &res) == 0) {
+            if (fib_lookup(net, fl4, &res, 0) == 0) {
                 struct fib_nh *nh = &FIB_RES_NH(res);

                 update_or_create_fnhe(nh, fl4->daddr, new_gw,
...@@ -975,7 +975,7 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
         return;

     rcu_read_lock();
-    if (fib_lookup(dev_net(dst->dev), fl4, &res) == 0) {
+    if (fib_lookup(dev_net(dst->dev), fl4, &res, 0) == 0) {
         struct fib_nh *nh = &FIB_RES_NH(res);

         update_or_create_fnhe(nh, fl4->daddr, 0, mtu,
...@@ -1186,7 +1186,7 @@ void ip_rt_get_source(u8 *addr, struct sk_buff *skb, struct rtable *rt)
         fl4.flowi4_mark = skb->mark;

         rcu_read_lock();
-        if (fib_lookup(dev_net(rt->dst.dev), &fl4, &res) == 0)
+        if (fib_lookup(dev_net(rt->dst.dev), &fl4, &res, 0) == 0)
             src = FIB_RES_PREFSRC(dev_net(rt->dst.dev), res);
         else
             src = inet_select_addr(rt->dst.dev,
...@@ -1716,7 +1716,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
     fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
     fl4.daddr = daddr;
     fl4.saddr = saddr;
-    err = fib_lookup(net, &fl4, &res);
+    err = fib_lookup(net, &fl4, &res, 0);
     if (err != 0) {
         if (!IN_DEV_FORWARD(in_dev))
             err = -EHOSTUNREACH;
...@@ -2123,7 +2123,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
         goto make_route;
     }

-    if (fib_lookup(net, fl4, &res)) {
+    if (fib_lookup(net, fl4, &res, 0)) {
         res.fi = NULL;
         res.table = NULL;
         if (fl4->flowi4_oif) {
...