1. 19 Apr, 2018 39 commits
  2. 18 Apr, 2018 1 commit
    • Merge branch 'ipv6-Separate-data-structures-for-FIB-and-data-path' · 0565de29
      David S. Miller authored
      David Ahern says:
      
      ====================
      net/ipv6: Separate data structures for FIB and data path
      
      IPv6 uses the same data struct for both control plane (FIB entries) and
      data path (dst entries). This struct has elements needed for both paths,
      adding memory overhead and complexity (taking a dst hold in most places
      but an additional reference on rt6i_ref in a few). Furthermore, because
      of the dst_alloc tie, all FIB entries are allocated with GFP_ATOMIC.
      
      This patch set separates FIB entries from dst entries, better aligning
      IPv6 code with IPv4, simplifying the reference counting and allowing
      FIB entries added by userspace (not autoconf) to use GFP_KERNEL. It is
      the first step toward a number of performance and scalability changes.
      
      The end result of this patch set:
        - FIB entries (fib6_info):
              /* size: 208, cachelines: 4, members: 25 */
              /* sum members: 207, holes: 1, sum holes: 1 */
      
        - dst entries (rt6_info):
              /* size: 240, cachelines: 4, members: 11 */
      
      Versus the single rt6_info struct today for both paths:
              /* size: 320, cachelines: 5, members: 28 */
      
      This amounts to a 35% reduction in memory use for FIB entries and a
      25% reduction for dst entries.
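
The percentages follow directly from the pahole sizes quoted above (320 bytes before, 208 and 240 bytes after). A minimal sketch of that arithmetic; the helper name `pct_reduction` is hypothetical and only illustrates the calculation:

```c
#include <assert.h>

/*
 * Percentage saved when a struct shrinks from old_size to new_size bytes,
 * rounded to the nearest whole percent. Illustrative helper only; it is
 * not part of the patch set.
 */
unsigned int pct_reduction(unsigned int old_size, unsigned int new_size)
{
    assert(old_size > 0 && new_size <= old_size);
    return ((old_size - new_size) * 100 + old_size / 2) / old_size;
}
```

With the sizes from the message, `pct_reduction(320, 208)` gives 35 for fib6_info and `pct_reduction(320, 240)` gives 25 for rt6_info.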
      
      With respect to locking, FIB entries use RCU and a single atomic
      counter with fib6_info_hold and fib6_info_release helpers to manage
      the reference counting. dst entries use only the traditional dst
      refcounts with dst_hold and dst_release.
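
The hold/release pattern described above can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel code: `struct fib_entry` and the `fib_entry_*` helpers are hypothetical stand-ins for fib6_info and its helpers, and the entry is freed immediately rather than after an RCU grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a FIB entry; not the kernel's struct fib6_info. */
struct fib_entry {
    atomic_int ref;     /* the single atomic reference counter */
};

/* Allocate an entry that starts with one reference owned by the caller. */
struct fib_entry *fib_entry_alloc(void)
{
    struct fib_entry *f = calloc(1, sizeof(*f));

    if (f)
        atomic_init(&f->ref, 1);
    return f;
}

/* Take an additional reference. */
void fib_entry_hold(struct fib_entry *f)
{
    atomic_fetch_add(&f->ref, 1);
}

/*
 * Drop one reference. Returns 1 if this call dropped the last reference
 * and freed the entry, 0 otherwise. RCU-deferred freeing is omitted here.
 */
int fib_entry_release(struct fib_entry *f)
{
    int old = atomic_fetch_sub(&f->ref, 1);

    assert(old > 0);    /* catch refcount underflow */
    if (old == 1) {
        free(f);
        return 1;
    }
    return 0;
}
```

Each holder (for example an address structure pointing at a host route) would pair one hold with one release; the entry goes away only when the last holder releases it.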
      
      FIB entries for host routes are referenced by inet6_ifaddr and
      ifacaddr6. In both cases, additional holds are taken -- similar to
      what is done for devices.
      
      This set is the first of many changes to improve the scalability of the
      IPv6 code. Follow on changes include:
      - consolidating duplicate fib6_info references like IPv4 does with
        duplicate fib_info
      
      - moving fib6_info into a slab cache to avoid allocation roundups to
        power of 2 (the 208 size becomes a 256 actual allocation)
      
      - allowing FIB lookups without generating a dst (e.g., most rt6_lookup
        users just want to verify the egress device). This means moving dst
        allocation to the other side of fib6_rule_lookup, which again aligns
        with IPv4 behavior
      
      - using separate standalone nexthop objects which have performance
        benefits beyond fib_info consolidation
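
The slab cache item in the list above rests on the observation that a 208-byte request becomes a 256-byte allocation. A minimal sketch of that rounding, assuming an allocator that rounds each request up to the next power of two as the message describes; `roundup_pow2` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Smallest power of two that is >= n. Illustrative only: it models the
 * "roundup to power of 2" behavior mentioned in the message.
 */
size_t roundup_pow2(size_t n)
{
    size_t p = 1;

    /* n must be nonzero and small enough that the result fits in size_t */
    assert(n > 0 && n <= (SIZE_MAX >> 1) + 1);
    while (p < n)
        p <<= 1;
    return p;
}
```

For the 208-byte fib6_info this yields 256, so 48 bytes per entry are padding that a dedicated cache sized for the struct would avoid.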
      
      At this point I am not seeing any refcount leaks or underflows, no
      oops or bug_ons, or warnings from kasan, so I think it is ready for
      others to beat up on it finding errors in code paths I have missed.
      
      v2 changes
      - rebased to top of tree
      - improved commit message on patch 7
      
      v1 changes
      - rebased to top of tree
      - fix memory leak of metrics as noted by Ido
      - MTU fixes based on pmtu tests (thanks Stefano Brivio for writing)
      
      RFC v2 changes
      - improved commit messages
      - move common metrics code from dst.c to net/ipv4/metrics.c (comment
        from DaveM)
      - address comments from Wei Wang and Martin KaFai Lau (let me know if
        I missed something)
      - fixes detected by kernel test robots
        + added fib6_metric_set to change metric on a FIB entry which could
          be pointing to read-only dst_default_metrics
        + 0day testing found a problem with an intermediate patch; added
          dst_hold_safe on rt->from. Code is removed 3 patches later
      - allow cacheinfo to handle NULL dst; means only expires is pushed to
        userspace
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>