1. 12 Jun, 2019 14 commits
  2. 11 Jun, 2019 17 commits
  3. 10 Jun, 2019 9 commits
    • Merge branch 'net-Enable-nexthop-objects-with-IPv4-and-IPv6-routes' · 48debfd7
      David S. Miller authored
      David Ahern says:
      
      ====================
      net: Enable nexthop objects with IPv4 and IPv6 routes
      
      This is the final set of the initial nexthop object work. When I
      started this idea almost 2 years ago, it took 18 seconds to inject
      700k+ IPv4 routes with 1 hop and about 28 seconds for 4-paths. Some
      of that time was due to inefficiencies in 'ip', but most of it was
      kernel side with excessive synchronize_rcu calls in ipv4, and redundant
      processing validating a nexthop spec (device, gateway, encap). Worse,
      the time increased dramatically as the number of legs in the routes
      increased; for example, taking over 72 seconds for 16-path routes.
      
      After this set, with increased dirty memory limits (fib_sync_mem sysctl),
      an improved ip, and nexthop objects, a full internet fib (743,799 routes
      based on a pull in January 2019) can be pushed to the kernel in 4.3
      seconds. Even better, the time to insert is "almost" constant with
      increasing number of paths. The 'almost constant' time is due to
      expanding the nexthop definitions when generating notifications. A
      follow on patch will be sent adding a sysctl that allows an admin to
      avoid the nexthop expansion and truly get constant route insert time
      regardless of the number of paths in a route! (Useful once all programs
      used for a deployment that care about routes understand nexthop objects).
      
      To be clear, 'ip' is used for benchmarking for no other reason than
      'ip -batch' is trivial to use for the tests. FRR, for example, better
      manages nexthops and route changes and the way those are pushed to the
      kernel and thus will have less userspace processing time than 'ip -batch'.
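
A minimal sketch of how input for such an 'ip -batch' benchmark could be generated. The function name, the 10.0.0.0/8 test prefixes, the gateway addresses, the device name and the group id are all hypothetical and are not taken from the patch set. Each emitted line is an `ip` command without the leading `ip`, as a batch file expects.

```python
import ipaddress
from itertools import islice


def nexthop_batch_lines(num_routes, gateways, dev="eth0", group_id=100):
    """Return lines for an 'ip -batch' file: nexthop objects first, then routes.

    All names and addresses are illustrative assumptions, not the author's
    benchmark setup.
    """
    if len(gateways) >= group_id:
        raise ValueError("group_id would collide with a nexthop id")

    lines = []
    # One nexthop object per gateway, with ids starting at 1.
    for nh_id, gw in enumerate(gateways, start=1):
        lines.append(f"nexthop add id {nh_id} via {gw} dev {dev}")

    if len(gateways) > 1:
        # A group object lets every multipath route reference a single id.
        members = "/".join(str(i) for i in range(1, len(gateways) + 1))
        lines.append(f"nexthop add id {group_id} group {members}")
        route_nhid = group_id
    else:
        route_nhid = 1

    # Consecutive /24 prefixes carved out of 10.0.0.0/8 (at most 65536 of them).
    prefixes = ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=24)
    for prefix in islice(prefixes, num_routes):
        lines.append(f"route add {prefix} nhid {route_nhid}")
    return lines


if __name__ == "__main__":
    print("\n".join(nexthop_batch_lines(3, ["192.0.2.1", "192.0.2.2"])))
```

Note that each route line has the same length regardless of how many paths the group contains, which mirrors the point that per-route work no longer grows with the number of legs.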
      
      Patches 1-10 iterate over each fib6_nh of a nexthop and invoke a processing
      function per fib6_nh. Prior to nexthop objects, a fib6_info referenced
      a single fib6_nh. Multipath routes were added as separate fib6_info for
      each leg of the route and linked as siblings:
      
          f6i -> sibling -> sibling ... -> sibling
           |                                   |
           +--------- multipath route ---------+
      
      With nexthop objects a single fib6_info references an external
      nexthop which may have a series of fib6_nh:
      
           f6i ---> nexthop ---> fib6_nh
                                 ...
                                 fib6_nh
      
      making IPv6 routes similar to IPv4. The side effect is that a single
      fib6_info now indirectly references a series of fib6_nh so the code
      needs to walk each entry and call the local, per-fib6_nh processing
      function.
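
The walk described above can be sketched with a small Python model. This is not kernel code: the class names only echo the kernel structures, the field names are hypothetical, and the early exit on a non-zero callback return is an assumption of this sketch (suggested by the v3 note about returning 1 when a match is found).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Fib6Nh:
    gateway: str
    dev: str


@dataclass
class Nexthop:
    nh_id: int
    fib6_nhs: List[Fib6Nh] = field(default_factory=list)


@dataclass
class Fib6Info:
    prefix: str
    fib6_nh: Optional[Fib6Nh] = None  # classic layout: one fib6_nh per fib6_info
    nh: Optional[Nexthop] = None      # nexthop-object layout: external nexthop


def for_each_fib6_nh(f6i: Fib6Info,
                     cb: Callable[[Fib6Nh, Any], int],
                     arg: Any) -> int:
    """Call cb(fib6_nh, arg) for every leg; stop at the first non-zero return."""
    legs = f6i.nh.fib6_nhs if f6i.nh is not None else [f6i.fib6_nh]
    for leg in legs:
        rc = cb(leg, arg)
        if rc:
            return rc
    return 0


if __name__ == "__main__":
    nh = Nexthop(1, [Fib6Nh("fe80::1", "eth0"), Fib6Nh("fe80::2", "eth1")])
    route = Fib6Info("2001:db8::/64", nh=nh)
    devices: List[str] = []
    # The callback records each leg's device and returns 0 to keep walking.
    for_each_fib6_nh(route, lambda leg, out: out.append(leg.dev) or 0, devices)
    print(devices)
```

The same helper handles both layouts, so per-fib6_nh processing functions do not need to know whether the route uses a nexthop object.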
      
      Patches 11 and 13 wire up use of nexthops with fib entries for IPv4
      and IPv6. With these commits you can actually use nexthops with routes.
      
      Patch 12 is an optimization for IPv4 when using nexthops in the most
      predominant use case (no metrics).
      
      Patch 14 handles replace of a nexthop config.
      
      Patches 15-18 update the pmtu and redirect tests to use both old and
      new routing.
      
      Patches 19 and 20 add new tests for the nexthop infrastructure. The first
      covers a single nexthop used by multiple prefixes to communicate with
      remote hosts. This is on top of the functional tests already committed.
      The second verifies multipath selection.
      
      v4
      - changed return to 'goto out' in patch 9 since the rcu_read_lock is
        held (noticed by Wei)
      
      v3
      - removed found arg in patch 7 and changed rt6_nh_remove_exception_rt
        to return 1 when a match is found for an exception
      
      v2
      - changed ++i to i++ in patches 1 and 14 as noticed by DaveM
      - improved commit message for patch 14 (nexthop replace)
      - removed the skip_fib argument to remove_nexthop; vestige of an
        older design
      ====================
      Reviewed-By: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: Add version of router_multipath.sh using nexthop objects · cab14d10
      David Ahern authored
      Add a version of router_multipath.sh that uses nexthop objects for
      routes.
      
      Ido requested a version that does not cause regressions with mlxsw
      testing since it does not support nexthop objects yet.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: Add test with multiple prefixes using single nexthop · 735ab2f6
      David Ahern authored
      Add tests where multiple FIB entries use the same nexthop object. Generate
      per-cpu cached routes for each by running ping on each cpu, and then
      generate exceptions unique to each prefix (remote host) with different
      mtus.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: icmp_redirect: Add support for routing via nexthop objects · 622946d9
      David Ahern authored
      Add a second pass to icmp_redirect.sh to use nexthop objects for
      routes.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: pmtu: Add support for routing via nexthop objects · 438a9a85
      David Ahern authored
      Add routing setup using nexthop objects and repeat tests with
      old and new routing.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: pmtu: Move route installs to a new function · f4ca0c34
      David Ahern authored
      Move the route add commands to a new function called setup_routing_old.
      The '_old' refers to the classic way of installing routes.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: pmtu: Move running of test into a new function · 243781db
      David Ahern authored
      Move the block of code that runs a test and prints the verdict to a
      new function, run_test.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nexthops: add support for replace · 7bf4796d
      David Ahern authored
      Add support for atomically updating a nexthop config.
      
      When updating a nexthop, walk the lists of associated fib entries and
      verify the new config is valid. Replace is done by swapping nh_info
      for single nexthops - new config is applied to old nexthop struct, and
      old config is moved to new nexthop struct. For nexthop groups the same
      applies but for nh_group. In addition for groups the nh_parent reference
      needs to be updated. The old config is released by calling __remove_nexthop
      on the 'new' nexthop which now has the old config. This is done to avoid
      messing around with the list_heads that track which fib entries are
      using the nexthop.
      
      After the swap of config data, bump the sequence counters for FIB entries
      to invalidate any dst entries and send notifications to userspace. The
      notifications include the new nexthop spec as well as any fib entries
      using the updated nexthop struct.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
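
The swap-based replace described in this commit message can be illustrated with a simplified Python model. It is a sketch under stated assumptions, not the kernel implementation: the `info` dict stands in for nh_info, `seq` stands in for the FIB sequence counters, and `is_valid` is a hypothetical callback representing the per-entry validation. Nexthop groups, nh_parent updates and notifications are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FibEntry:
    prefix: str
    seq: int = 0  # stand-in for the counter that invalidates cached dst entries


@dataclass
class Nexthop:
    nh_id: int
    info: dict  # stand-in for nh_info
    fib_entries: List[FibEntry] = field(default_factory=list)


def replace_nexthop(old: Nexthop, new: Nexthop,
                    is_valid: Callable[[FibEntry, dict], bool]) -> Nexthop:
    """Apply new's config to old by swapping; return the struct to release."""
    # Validate the new config against every fib entry before changing anything.
    for entry in old.fib_entries:
        if not is_valid(entry, new.info):
            raise ValueError(f"new config is not valid for {entry.prefix}")

    # Swap configs: 'old' keeps its list of fib entries and gains the new
    # config, while 'new' now carries the old config.
    old.info, new.info = new.info, old.info

    # Bump the counters so that cached state derived from the old config
    # is treated as stale.
    for entry in old.fib_entries:
        entry.seq += 1

    # The caller releases 'new', which holds the old config.
    return new
```

Because the list of fib entries never moves between structs, the routes keep pointing at the same nexthop object throughout the update.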
    • ipv6: Allow routes to use nexthop objects · 5b98324e
      David Ahern authored
      Add support for RTA_NH_ID attribute to allow a user to specify a
      nexthop id to use with a route. fc_nh_id is added to fib6_config to
      hold the value passed in the RTA_NH_ID attribute. If a nexthop id
      is given, the gateway, device, encap and multipath attributes cannot
      be set.
      
      Update ip6_route_del to check metric and protocol before nexthop
      specs. If fc_nh_id is set, then it must match the id in the route
      entry. Since IPv6 allows delete of a cached entry (an exception),
      add ip6_del_cached_rt_nh to cycle through all of the fib6_nh in
      a fib entry if it is using a nexthop.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
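
The two rules in this commit message (a nexthop id excludes the other nexthop attributes, and delete checks metric and protocol before the nexthop id) can be sketched as follows. The dict keys are hypothetical stand-ins for the route request fields, `None` is assumed to mean "not given", and the function names are invented for illustration; the kernel's actual checks live in C and differ in detail.

```python
# Attributes that cannot be combined with a nexthop id, per the commit message.
CONFLICTING_ATTRS = ("gateway", "device", "encap", "multipath")


def validate_route_request(req: dict) -> None:
    """Reject a request that sets a nexthop id together with nexthop attributes."""
    if req.get("nh_id") is None:
        return
    conflicts = [name for name in CONFLICTING_ATTRS if req.get(name) is not None]
    if conflicts:
        raise ValueError("nexthop id cannot be combined with: " + ", ".join(conflicts))


def route_matches_delete(route: dict, req: dict) -> bool:
    """Decide whether an installed route matches a delete request."""
    # Metric and protocol are compared first.
    for key in ("metric", "protocol"):
        if req.get(key) is not None and route.get(key) != req[key]:
            return False
    # If the request names a nexthop id, it must match the route's id.
    if req.get("nh_id") is not None:
        return route.get("nh_id") == req["nh_id"]
    # Comparison of classic nexthop specs (device, gateway) is omitted here.
    return True
```

Checking the cheap scalar fields first means a mismatching delete request is rejected before any per-nexthop comparison is needed.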