    tcp: defer skb freeing after socket lock is released · f35f8219
    Eric Dumazet authored
    tcp recvmsg() (or rx zerocopy) spends a fair amount of time
    freeing skbs after their payload has been consumed.
    
    A typical ~64KB GRO packet has to release ~45 page
    references, eventually going to the page allocator
    for each of them.
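    As a rough check on the ~45 figure, assume (our assumption,
    not stated in the commit) an MTU 1500 flow with TCP timestamps,
    so each segment carries 1448 payload bytes, and a driver that
    puts each segment's payload in its own page fragment. A 64KB
    GRO packet then holds about 65536 / 1448 fragments, each
    pinning one page reference. The helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: payload bytes per segment for a given MTU,
 * i.e. MTU minus IPv4 (20), TCP (20) and TCP timestamp option (12). */
static unsigned int mss_for_mtu(unsigned int mtu)
{
    assert(mtu > 52);
    return mtu - 20 - 20 - 12;
}

/* Hypothetical helper: page references held by an aggregated GRO skb,
 * assuming one page fragment per full MSS-sized segment. */
static unsigned int gro_page_refs(unsigned int gro_bytes, unsigned int mss)
{
    assert(mss > 0);
    return gro_bytes / mss;   /* count whole segments only */
}
```

    With these assumptions, gro_page_refs(65536, mss_for_mtu(1500))
    gives 45, while 4096-byte payloads (as in the second test
    configuration below) would give 16 references per GRO packet.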
    
    Currently, this freeing is performed while the socket lock
    is held, meaning that there is a high chance that the
    BH handler has to queue incoming packets to the TCP socket
    backlog.
    
    This can cause additional latency, because the user
    thread has to process the backlog at release_sock() time,
    and while it does so, the BH handler can add more frames
    to it.
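    A toy, single-threaded model of this effect, with entirely
    hypothetical names and deliberately ignoring frames that arrive
    while the backlog is being drained: one packet arrives per tick
    and is queued to the backlog whenever the user thread owns the
    socket. Doing the free work while holding the lock pushes more
    packets through the backlog than doing it after release.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy socket, not kernel code. */
struct toy_sock {
    bool owned_by_user;     /* stands in for sock_owned_by_user() */
    unsigned int backlog;   /* packets queued while the lock was held */
    unsigned int processed; /* packets handed to TCP receive processing */
};

/* One BH "tick": a single incoming packet. */
static void toy_bh_rx(struct toy_sock *sk)
{
    if (sk->owned_by_user)
        sk->backlog++;      /* must wait for toy_release_sock() */
    else
        sk->processed++;    /* processed immediately in BH context */
}

/* The user thread drains the backlog before dropping ownership. */
static void toy_release_sock(struct toy_sock *sk)
{
    sk->processed += sk->backlog;
    sk->backlog = 0;
    sk->owned_by_user = false;
}

/* One recvmsg(): copy_ticks of copying plus free_ticks of skb freeing,
 * either under the lock or after it. Returns how many packets had to
 * go through the backlog. */
static unsigned int toy_recvmsg(struct toy_sock *sk, unsigned int copy_ticks,
                                unsigned int free_ticks, bool free_under_lock)
{
    unsigned int queued;
    unsigned int locked = copy_ticks + (free_under_lock ? free_ticks : 0);

    sk->owned_by_user = true;                 /* toy lock_sock() */
    for (unsigned int t = 0; t < locked; t++)
        toy_bh_rx(sk);
    queued = sk->backlog;
    toy_release_sock(sk);
    if (!free_under_lock)
        for (unsigned int t = 0; t < free_ticks; t++)
            toy_bh_rx(sk);                    /* socket not owned: direct path */
    return queued;
}
```

    With 4 ticks of copying and 3 of freeing, toy_recvmsg() queues
    7 packets to the backlog when freeing under the lock and 4 when
    freeing after release; in both cases all 7 are processed.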
    
    This patch adds logic to defer these frees until after the
    socket lock is released, or to perform them directly from
    the BH handler if possible.
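    One way to sketch such deferral in user-space C11 is a
    per-socket lock-free list: while the lock is held, consumed
    skbs are pushed onto it instead of being freed; afterwards
    (or from the BH handler) the whole list is detached with one
    atomic exchange and freed. The types and function names are
    hypothetical; the commit message does not say which
    primitives the patch itself uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb whose payload has been consumed. */
struct toy_skb {
    struct toy_skb *next;
    /* payload and page references would live here */
};

/* Hypothetical per-socket list of skbs waiting to be freed. */
struct toy_defer_list {
    _Atomic(struct toy_skb *) head;
};

/* Called with the toy socket lock held: push instead of freeing. */
static void toy_defer_free(struct toy_defer_list *l, struct toy_skb *skb)
{
    struct toy_skb *old = atomic_load_explicit(&l->head, memory_order_relaxed);

    do {
        skb->next = old;   /* on CAS failure, old is reloaded */
    } while (!atomic_compare_exchange_weak_explicit(&l->head, &old, skb,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Called after the lock is released, or from BH context: detach the
 * whole list in one exchange and free it. Returns the number freed. */
static unsigned int toy_defer_flush(struct toy_defer_list *l)
{
    struct toy_skb *skb = atomic_exchange_explicit(&l->head, NULL,
                                                   memory_order_acquire);
    unsigned int n = 0;

    while (skb != NULL) {
        struct toy_skb *next = skb->next;

        free(skb);
        skb = next;
        n++;
    }
    return n;
}
```

    Because a flush takes the entire list in a single exchange,
    pushers and flushers never need an extra lock between them,
    so whichever context runs next can do the freeing.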
    
    Being able to free these skbs from the BH handler helps a lot,
    because it avoids the usual alloc/free asymmetry when the
    BH handler and the user thread do not run on the same CPU or
    NUMA node.
    
    One CPU can now be fully utilized for the kernel->user copy,
    while another CPU handles BH processing and skb/page
    allocs/frees (assuming RFS is not forcing the use of a single
    CPU).
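    To make this split concrete, here is a user-space sketch with
    two POSIX threads standing in for the BH handler and the user
    thread. All names are hypothetical, nothing pins the threads
    to particular CPUs, and this is not how the kernel runs softirqs.
    Every buffer is both allocated and freed on the "BH" thread;
    the "user" thread only consumes buffers and hands them back,
    a user-space analogue of keeping allocs and frees on one side.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical buffer standing in for an skb and its pages. */
struct buf { struct buf *next; };

/* Lock-free LIFO: pushes one node at a time, takes the whole list. */
struct lf_list { _Atomic(struct buf *) head; };

static void lf_push(struct lf_list *l, struct buf *b)
{
    struct buf *old = atomic_load_explicit(&l->head, memory_order_relaxed);

    do {
        b->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&l->head, &old, b,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

static struct buf *lf_take_all(struct lf_list *l)
{
    return atomic_exchange_explicit(&l->head, NULL, memory_order_acquire);
}

struct split_ctx {
    struct lf_list rx;        /* "BH" -> "user": received buffers */
    struct lf_list defer;     /* "user" -> "BH": consumed, to be freed */
    unsigned int total;       /* buffers to pass through */
    unsigned int freed_by_bh; /* written only by the BH thread */
    unsigned int consumed;    /* written only by the user thread */
};

/* "BH" thread: allocates every buffer and frees every deferred one. */
static void *bh_thread(void *arg)
{
    struct split_ctx *c = arg;
    unsigned int sent = 0;

    while (c->freed_by_bh < c->total) {
        if (sent < c->total) {
            struct buf *b = malloc(sizeof *b);
            if (b == NULL)
                abort();
            lf_push(&c->rx, b);
            sent++;
        } else {
            sched_yield();
        }
        for (struct buf *b = lf_take_all(&c->defer); b != NULL;) {
            struct buf *next = b->next;
            free(b);
            c->freed_by_bh++;
            b = next;
        }
    }
    return NULL;
}

/* "User" thread: consumes buffers (the copy to user space would go
 * here) and defers their release instead of calling free() itself. */
static void *user_thread(void *arg)
{
    struct split_ctx *c = arg;

    while (c->consumed < c->total) {
        struct buf *b = lf_take_all(&c->rx);

        if (b == NULL)
            sched_yield();
        while (b != NULL) {
            struct buf *next = b->next;   /* read before handing back */
            lf_push(&c->defer, b);
            c->consumed++;
            b = next;
        }
    }
    return NULL;
}

/* Runs both threads; returns how many buffers the BH thread freed. */
static unsigned int run_split(unsigned int total)
{
    struct split_ctx c;
    pthread_t bh, user;

    atomic_init(&c.rx.head, NULL);
    atomic_init(&c.defer.head, NULL);
    c.total = total;
    c.freed_by_bh = 0;
    c.consumed = 0;
    if (pthread_create(&bh, NULL, bh_thread, &c) != 0 ||
        pthread_create(&user, NULL, user_thread, &c) != 0)
        abort();
    pthread_join(user, NULL);
    pthread_join(bh, NULL);
    return c.freed_by_bh;
}
```

    run_split(1000) returns 1000: every buffer was freed by the
    thread that allocated it. Build with -pthread.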
    
    Tested:
     100Gbit NIC
     Max throughput for one TCP_STREAM flow, over 10 runs
    
    MTU: 1500
    Before: 55 Gbit
    After:  66 Gbit
    
    MTU: 4096 + headers
    Before: 82 Gbit
    After:  95 Gbit
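    For reference, the relative gains implied by these figures
    (plain arithmetic on the reported numbers, nothing more):

```latex
\[
\frac{66 - 55}{55} = 0.20 \;(+20\%), \qquad
\frac{95 - 82}{82} \approx 0.159 \;(+15.9\%)
\]
```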
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>