• Daniel Borkmann's avatar
    net: tcp: add key management to congestion control · c5c6a8ab
    Daniel Borkmann authored
    This patch adds necessary infrastructure to the congestion control
    framework for later per route congestion control support.
    
    For a per route congestion control possibility, our aim is to store
    a unique u32 key identifier into dst metrics, which can then be
    mapped into a tcp_congestion_ops struct. We argue that having a
    RTAX key entry is the most simple, generic and easy way to manage,
    and also keeps the memory footprint of dst entries lower on 64 bit
    than with storing a pointer directly, for example. Having a unique
    key id also allows for decoupling actual TCP congestion control
    module management from the FIB layer, i.e. we don't have to care
    about expensive module refcounting inside the FIB at this point.
    
    We first thought of using an IDR store for the realization, which
    takes over dynamic assignment of unused key space and also performs
    the key to pointer mapping in RCU. While doing so, we stumbled upon
    the issue that due to the nature of dynamic key distribution, it
    just so happens, arguably in very rare occasions, that excessive
    module loads and unloads can lead to a possible reuse of previously
    used key space. Thus, previously stale keys in the dst metric are
    now being reassigned to a different congestion control algorithm,
    which might lead to unexpected behaviour. One way to resolve this
    would have been to walk FIBs on the actually rare occasion of a
    module unload and reset the metric keys for each FIB in each netns,
    but that's just very costly.
    
    Therefore, we argue a better solution is to reuse the unique
    congestion control algorithm name member and map that into u32 key
    space through jhash. For that, we split the flags attribute (as it
    currently uses 2 bits only anyway) into two u32 attributes, flags
    and key, so that we can keep the cacheline boundary of 2 cachelines
    on x86_64 and cache the precalculated key at registration time for
    the fast path. On average we might expect 2 - 4 modules being loaded
    worst case perhaps 15, so a key collision possibility is extremely
    low, and guaranteed collision-free on LE/BE for all in-tree modules.
    Overall this results in much simpler code, and all without the
    overhead of an IDR. Due to the deterministic nature, modules can
    now be unloaded, the congestion control algorithm for a specific
    but unloaded key will fall back to the default one, and on module
    reload time it will switch back to the expected algorithm
    transparently.
    
    Joint work with Florian Westphal.
    Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
    Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    c5c6a8ab
inet_connection_sock.h 11.1 KB