    net: percpu net_device refcount · 29b4433d
    Eric Dumazet authored
    We tried very hard to remove all possible dev_hold()/dev_put() pairs in
    the network stack, using RCU conversions.
    
    There is still an unavoidable device refcount change for every dst we
    create/destroy, and this can slow down some workloads (routers, some
    app servers, mmap af_packet).
    
    We can switch to a percpu refcount implementation, now that the dynamic
    per_cpu infrastructure is mature. On a 64-CPU machine, this consumes
    256 bytes per device (one 4-byte counter per CPU).
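
    As a rough illustration of the idea, here is a minimal single-threaded
    userspace sketch, not the kernel code. All names (dev_sketch,
    dev_sketch_hold(), NR_CPUS_SKETCH, ...) are hypothetical, and the CPU
    index is passed explicitly, whereas the kernel picks the current CPU's
    slot itself. Each CPU only touches its own counter, so no locked
    instruction is needed, and the true refcount is the sum of all slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 64	/* hypothetical CPU count (the 64-CPU example) */

/* Hypothetical stand-in for a device carrying a per-CPU refcount. */
struct dev_sketch {
	int *pcpu_refcnt;	/* one int slot per CPU */
};

/* Allocate zeroed per-CPU slots; returns 0 on success, -1 on failure. */
static int dev_sketch_init(struct dev_sketch *dev)
{
	dev->pcpu_refcnt = calloc(NR_CPUS_SKETCH, sizeof(int));
	return dev->pcpu_refcnt ? 0 : -1;
}

static void dev_sketch_free(struct dev_sketch *dev)
{
	free(dev->pcpu_refcnt);
	dev->pcpu_refcnt = NULL;
}

/* Take a reference: bump only the slot of the CPU we are running on. */
static void dev_sketch_hold(struct dev_sketch *dev, unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	dev->pcpu_refcnt[cpu]++;
}

/* Drop a reference: may happen on a different CPU than the hold. */
static void dev_sketch_put(struct dev_sketch *dev, unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	dev->pcpu_refcnt[cpu]--;
}

/* Slow path: the real refcount is the sum over every CPU's slot. */
static int dev_sketch_refcnt_read(const struct dev_sketch *dev)
{
	int sum = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += dev->pcpu_refcnt[cpu];
	return sum;
}

/* Memory cost per device: 64 * sizeof(int), i.e. 256 bytes with 4-byte int. */
static size_t dev_sketch_footprint(void)
{
	return NR_CPUS_SKETCH * sizeof(int);
}
```

    Note that an individual slot can go negative when a reference taken on
    one CPU is released on another; only the sum is meaningful. The
    trade-off is a cheap hold/put against a more expensive read that has
    to walk every slot.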
    
    On x86, the dev_hold(dev) code is:
    
    before:
            lock    incl 0x280(%ebx)
    after:
            movl    0x260(%ebx),%eax
            incl    fs:(%eax)
    
    Stress bench:
    
    (Sending 160.000.000 UDP frames,
    IP route cache disabled, dual E5540 @2.53GHz,
    32bit kernel, FIB_TRIE)
    
    Before:
    
    real    1m1.662s
    user    0m14.373s
    sys     12m55.960s
    
    After:
    
    real    0m51.179s
    user    0m15.329s
    sys     10m15.942s
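
    To put the timings above in relative terms, the small helper below
    (hypothetical names, not part of the patch) converts the time(1)
    figures to seconds and computes the percentage reduction. With the
    numbers quoted here, that works out to roughly 17% less wall-clock
    time and about 20.6% less system time.

```c
/* Convert a time(1) style "XmY.ZZZs" reading to seconds. */
static double bench_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Percentage by which 'after' is smaller than 'before'. */
static double bench_percent_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Wall-clock time: 1m1.662s before, 0m51.179s after. */
static double bench_real_reduction(void)
{
	return bench_percent_reduction(bench_seconds(1, 1.662),
				       bench_seconds(0, 51.179));
}

/* System time: 12m55.960s before, 10m15.942s after. */
static double bench_sys_reduction(void)
{
	return bench_percent_reduction(bench_seconds(12, 55.960),
				       bench_seconds(10, 15.942));
}
```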
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>