    ipv4: percpu nh_rth_output cache · d26b3a7c
    Eric Dumazet authored
    The input path mostly runs under RCU and doesn't touch the dst refcnt.
    
    But the output path on forwarding or UDP workloads hits the dst
    refcount hard, and we get a lot of false sharing, for example
    in ipv4_mtu() when reading rt->rt_pmtu.
    
    Using a percpu cache for nh_rth_output gives a nice performance
    increase at a small cost.
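
To illustrate the idea behind a per-CPU cache, here is a minimal userspace sketch, not the kernel patch itself. All names (`route_sketch`, `nh_sketch`, `cache_lookup`, `cache_store`, `NR_SLOTS`, `CACHELINE`) are hypothetical, the 64-byte cache line is an assumption, and the plain array stands in for the kernel's real per-CPU allocation. Each CPU gets its own slot on its own cache line, so a store by one CPU does not invalidate the line another CPU is reading.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define NR_SLOTS  4   /* hypothetical stand-in for the number of CPUs */
#define CACHELINE 64  /* assumed cache line size in bytes */

/* Hypothetical stand-in for a cached output route. */
struct route_sketch {
    int id;
};

/* One slot per CPU. The alignment pads each slot to a full cache
 * line, so slots used by different CPUs never share a line. */
struct slot {
    alignas(CACHELINE) struct route_sketch *cached;
};

static_assert(sizeof(struct slot) == CACHELINE,
              "each slot must occupy exactly one cache line");

/* Hypothetical stand-in for a next hop that owns the per-CPU cache. */
struct nh_sketch {
    struct slot output_cache[NR_SLOTS];
};

/* Return the route cached for this CPU, or NULL if none is cached. */
static struct route_sketch *cache_lookup(struct nh_sketch *nh, unsigned int cpu)
{
    return nh->output_cache[cpu % NR_SLOTS].cached;
}

/* Install rt in this CPU's slot and return the previous entry so the
 * caller can release it. Only the owning CPU touches its slot here. */
static struct route_sketch *cache_store(struct nh_sketch *nh, unsigned int cpu,
                                        struct route_sketch *rt)
{
    struct slot *s = &nh->output_cache[cpu % NR_SLOTS];
    struct route_sketch *old = s->cached;

    s->cached = rt;
    return old;
}
```

The trade-off in this sketch is memory: every next hop carries one padded slot per CPU instead of a single shared pointer, which is one plausible reading of the "small cost" mentioned above.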
    
    udpflood test on my 24-CPU machine (dummy0 output device):
    24 processes are started, and each sends 1,000,000 UDP frames.
    
    before : 5.24 s
    after : 2.06 s
    For reference, time on linux-3.5 : 6.60 s
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Tested-by: Alexander Duyck <alexander.h.duyck@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
route.c 61.3 KB