    bpf: cpumap use ptr_ring_consume_batched · 77361825
    Jesper Dangaard Brouer authored
    Move the ptr_ring dequeue outside the loop that allocates SKBs and calls
    the network stack, as these operations can take some time. The ptr_ring is
    a communication channel between CPUs, where we want to reduce/limit any
    cacheline bouncing.
    
    Do a concentrated bulk dequeue via ptr_ring_consume_batched, to shorten the
    period during which, and the number of times, the remote cacheline in
    ptr_ring is read.
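
The two-phase pattern described above can be sketched in user space as follows. This is a simplified single-threaded model, not the kernel's ptr_ring: `toy_ring`, `toy_produce`, `toy_consume_batched`, `drain_once`, and `RING_SIZE` are hypothetical names for illustration, and summing integers stands in for the SKB allocation and network stack call.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 16   /* hypothetical capacity for this demo */
#define BATCH     8    /* batch size named in the commit message */

/* Toy stand-in for a pointer ring; a NULL slot means "empty". */
struct toy_ring {
    void  *slot[RING_SIZE];
    size_t head;   /* next slot to produce into */
    size_t tail;   /* next slot to consume from */
};

/* Returns 0 on success, -1 if the ring is full. */
static int toy_produce(struct toy_ring *r, void *p)
{
    if (r->slot[r->head])
        return -1;
    r->slot[r->head] = p;
    r->head = (r->head + 1) % RING_SIZE;
    return 0;
}

/* Dequeue up to n entries in one pass; returns how many were taken. */
static size_t toy_consume_batched(struct toy_ring *r, void **out, size_t n)
{
    size_t i;

    for (i = 0; i < n && r->slot[r->tail]; i++) {
        out[i] = r->slot[r->tail];
        r->slot[r->tail] = NULL;
        r->tail = (r->tail + 1) % RING_SIZE;
    }
    return i;
}

/*
 * Phase 1 touches the shared ring once, in a short burst.
 * Phase 2 does the slow per-entry work on the local copies only.
 */
static size_t drain_once(struct toy_ring *r, long *sum)
{
    void *batch[BATCH];
    size_t n = toy_consume_batched(r, batch, BATCH);

    assert(n <= BATCH);
    for (size_t i = 0; i < n; i++)
        *sum += *(long *)batch[i];   /* placeholder for the slow work */
    return n;
}
```

In this model the ring is only read inside `toy_consume_batched`, so the slow loop in `drain_once` never revisits the shared structure.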
    
    Batch size 8 is chosen both to (1) limit the BH-disable period, and (2)
    consume one cacheline on 64-bit archs. After reducing the BH-disable
    section further, we can consider changing this, while still keeping the
    L1 cacheline size in mind.
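
The cacheline arithmetic behind point (2) can be checked with a short sketch. It assumes a 64-byte cacheline, which is common on 64-bit hardware but is not stated in the commit message; `BATCH_SIZE`, `ASSUMED_CACHELINE_BYTES`, and both helper functions are hypothetical names.

```c
#include <stddef.h>

#define BATCH_SIZE              8   /* batch size from the commit message */
#define ASSUMED_CACHELINE_BYTES 64  /* assumption: typical 64-bit cacheline */

/* Eight 8-byte pointers fill exactly one assumed 64-byte cacheline. */
_Static_assert(BATCH_SIZE * 8 == ASSUMED_CACHELINE_BYTES,
               "8 pointers of 8 bytes should equal one assumed cacheline");

/* Bytes of ring slots read by one batch, for a given pointer size. */
static size_t batch_bytes(size_t ptr_size)
{
    return BATCH_SIZE * ptr_size;
}

/* Nonzero if one batch of pointers fits within the assumed cacheline. */
static int batch_fits_cacheline(size_t ptr_size)
{
    return batch_bytes(ptr_size) <= ASSUMED_CACHELINE_BYTES;
}
```

With 4-byte pointers the same batch covers only half of the assumed line, which is why the commit message qualifies the claim with "on 64-bit archs".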

    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
cpumap.c 19.1 KB