    net: __alloc_skb() speedup · ec7d2f2c
    Eric Dumazet authored
    With the following patch I can reach the maximum rate of my
    pktgen+udpsink simulator:
    - 'old' machine : dual quad core E5450  @3.00GHz
    - 64 UDP rx flows (only differ by destination port)
    - RPS enabled, NIC interrupts serviced on cpu0
    - RPS dispatched to the 7 other cores (~130.000 IPIs per second)
    - SLAB allocator (faster than SLUB in this workload)
    - tg3 NIC
    - 1.080.000 pps without a single drop at NIC level.
    
    The idea is to add two prefetchw() calls in __alloc_skb(): one to
    prefetch the first sk_buff cache line, the second to prefetch the
    shinfo part.
    
    Also use a single memset() to initialize all skb_shared_info fields
    instead of setting them one by one. This reduces the number of
    instructions, since the clearing is done with long word moves.
    
    All skb_shared_info fields before 'dataref' are cleared in 
    __alloc_skb().
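
The pattern described above can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions, not the kernel code: `struct demo_buf`, `struct demo_shinfo`, `demo_alloc()` and `demo_free()` are hypothetical stand-ins for `struct sk_buff`, `struct skb_shared_info` and the allocator, and `__builtin_prefetch(p, 1)` (a GCC/Clang builtin) is assumed as an approximation of the kernel's `prefetchw()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct skb_shared_info: every field that
 * must start out as zero is placed before 'dataref'. */
struct demo_shinfo {
	unsigned short nr_frags;
	unsigned short gso_size;
	unsigned short gso_segs;
	unsigned short gso_type;
	void *frag_list;
	int dataref;		/* first field NOT covered by the memset() */
	void *frags[4];		/* not cleared: unused while nr_frags == 0 */
};

/* Hypothetical stand-in for struct sk_buff. */
struct demo_buf {
	unsigned char *data;
	size_t size;
	struct demo_shinfo *shinfo;
};

/* Assumed userspace approximation of prefetchw(): the second argument
 * (1) asks for a prefetch in anticipation of a write. */
#define demo_prefetchw(p) __builtin_prefetch((p), 1)

struct demo_buf *demo_alloc(size_t size)
{
	const size_t align = _Alignof(max_align_t);
	struct demo_buf *b;
	unsigned char *data;

	assert(size > 0);

	b = malloc(sizeof(*b));
	if (!b)
		return NULL;
	demo_prefetchw(b);	/* first prefetch: head of the buffer header */

	/* Round up so the shared info placed after the data is aligned. */
	size = (size + align - 1) & ~(align - 1);
	data = malloc(size + sizeof(struct demo_shinfo));
	if (!data) {
		free(b);
		return NULL;
	}
	demo_prefetchw(data + size);	/* second prefetch: the shinfo part */

	memset(b, 0, sizeof(*b));
	b->data = data;
	b->size = size;
	b->shinfo = (struct demo_shinfo *)(data + size);

	/* One memset() clears every field located before 'dataref'. */
	memset(b->shinfo, 0, offsetof(struct demo_shinfo, dataref));
	b->shinfo->dataref = 1;
	return b;
}

void demo_free(struct demo_buf *b)
{
	if (b) {
		free(b->data);
		free(b);
	}
}
```

Both prefetches are issued before the stores that need the cache lines, so the memory fetch can overlap with the remaining allocation work. Fields that must be zeroed have to sit before `dataref` in the structure for the single `memset()` to cover them.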

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
skbuff.c 74.1 KB