    selftests/bpf: Add benchmark for bpf memory allocator · fd283ab1
    Hou Tao authored
    The benchmark can be used to compare the performance of hash map
    operations and the memory usage between different flavors of bpf memory
    allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). It can
    also be used to check the performance improvement or the memory saving
    provided by an optimization.
    
    The benchmark creates a non-preallocated hash map which uses the bpf
    memory allocator and shows the operation performance and the memory
    usage of the hash map under different use cases:
    (1) overwrite
    Each CPU overwrites a non-overlapping part of the hash map. When a CPU
    completes overwriting 64 elements in the hash map, it increments
    op_count.
    (2) batch_add_batch_del
    Each CPU adds then deletes a non-overlapping part of the hash map in
    batches. When a CPU has added and deleted 64 elements in the hash map,
    it increments op_count twice.
    (3) add_del_on_diff_cpu
    Each pair of CPUs cooperatively adds and deletes a non-overlapping part
    of the map. When a CPU adds or deletes 64 elements in the hash map, it
    increments op_count.
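
The op_count accounting described above can be modelled in user space. The following is a minimal sketch, not the BPF program from this commit: the names `BATCH`, `OPS_PER_ROUND`, `op_count` and `map_ops` are hypothetical, and the assumption is that one "round" means one full pass over a CPU's 64-element part of the map.

```python
# User-space model of the op_count rules stated in the use-case list.
# All names here are illustrative; they do not come from the benchmark source.

BATCH = 64  # elements handled per round, as stated in the description

# op_count increments per round, per CPU, for each use case
OPS_PER_ROUND = {
    "overwrite": 1,            # 64 overwrites           -> op_count += 1
    "batch_add_batch_del": 2,  # 64 adds then 64 deletes -> op_count += 2
    "add_del_on_diff_cpu": 1,  # 64 adds OR 64 deletes   -> op_count += 1
}

# map operations issued per round, per CPU, for each use case
MAP_OPS_PER_ROUND = {
    "overwrite": BATCH,
    "batch_add_batch_del": 2 * BATCH,
    "add_del_on_diff_cpu": BATCH,
}


def op_count(use_case: str, rounds: int) -> int:
    """op_count contributed by one CPU after `rounds` complete rounds."""
    return OPS_PER_ROUND[use_case] * rounds


def map_ops(use_case: str, rounds: int) -> int:
    """Number of map operations one CPU issued over `rounds` rounds."""
    return MAP_OPS_PER_ROUND[use_case] * rounds


for name in OPS_PER_ROUND:
    rounds = 10
    print(name, op_count(name, rounds), map_ops(name, rounds))
```

Under this reading, every op_count increment corresponds to 64 map operations in all three use cases, which is what makes the per-producer rates comparable across them.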
    
    The following are the benchmark results comparing different flavors of
    bpf memory allocator. The tests were conducted on a KVM guest with
    8 CPUs and 16 GB memory. The command line below was used for all of
    the following benchmarks:
    
      ./bench htab-mem --use-case $name ${OPTS} -w3 -d10 -a -p8
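
As a sketch of how the command template expands, the snippet below builds the argument list for each use case, with `opts` standing in for `${OPTS}`. It only constructs and prints the commands and does not run them; the helper name `bench_argv` is hypothetical, and the only option value shown for `${OPTS}` is the `--preallocated` one quoted in this message.

```python
import shlex

# Use-case names as listed in the commit message.
USE_CASES = ["overwrite", "batch_add_batch_del", "add_del_on_diff_cpu"]


def bench_argv(name: str, opts: str = "") -> list[str]:
    """Expand the command template for one use case into an argv list."""
    template = f"./bench htab-mem --use-case {name} {opts} -w3 -d10 -a -p8"
    # shlex.split drops the extra whitespace left by an empty opts value
    return shlex.split(template)


for opts in ("", "--preallocated"):
    for name in USE_CASES:
        print(shlex.join(bench_argv(name, opts)))
```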
    
    These results show that a preallocated hash map has both better
    performance and a smaller memory footprint.
    
    (1) non-preallocated + no bpf memory allocator (v6.0.19)
    use kmalloc() + call_rcu
    
    overwrite            per-prod-op: 11.24 ± 0.07k/s, avg mem: 82.64 ± 26.32MiB, peak mem: 119.18MiB
    batch_add_batch_del  per-prod-op: 18.45 ± 0.10k/s, avg mem: 50.47 ± 14.51MiB, peak mem: 94.96MiB
    add_del_on_diff_cpu  per-prod-op: 14.50 ± 0.03k/s, avg mem: 4.64 ± 0.73MiB, peak mem: 7.20MiB
    
    (2) preallocated
    OPTS=--preallocated
    
    overwrite            per-prod-op: 191.42 ± 0.09k/s, avg mem: 1.24 ± 0.00MiB, peak mem: 1.49MiB
    batch_add_batch_del  per-prod-op: 221.83 ± 0.17k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
    add_del_on_diff_cpu  per-prod-op: 39.66 ± 0.31k/s, avg mem: 1.47 ± 0.13MiB, peak mem: 1.75MiB
    
    (3) normal bpf memory allocator
    
    overwrite            per-prod-op: 126.59 ± 0.02k/s, avg mem: 2.26 ± 0.00MiB, peak mem: 2.74MiB
    batch_add_batch_del  per-prod-op: 83.37 ± 0.20k/s, avg mem: 2.14 ± 0.17MiB, peak mem: 2.74MiB
    add_del_on_diff_cpu  per-prod-op: 21.25 ± 0.24k/s, avg mem: 17.50 ± 3.32MiB, peak mem: 28.87MiB
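
To make the comparison concrete, the result lines above can be parsed and turned into ratios. This is a minimal sketch: the result lines are copied verbatim from this message, while the regular expression and the names `RESULTS`, `parse`, `TABLE` and `ratio` are illustrative and not part of the benchmark.

```python
import re

# Result lines copied verbatim from the three configurations above.
RESULTS = {
    "no bpf ma": """
overwrite            per-prod-op: 11.24 ± 0.07k/s, avg mem: 82.64 ± 26.32MiB, peak mem: 119.18MiB
batch_add_batch_del  per-prod-op: 18.45 ± 0.10k/s, avg mem: 50.47 ± 14.51MiB, peak mem: 94.96MiB
add_del_on_diff_cpu  per-prod-op: 14.50 ± 0.03k/s, avg mem: 4.64 ± 0.73MiB, peak mem: 7.20MiB
""",
    "preallocated": """
overwrite            per-prod-op: 191.42 ± 0.09k/s, avg mem: 1.24 ± 0.00MiB, peak mem: 1.49MiB
batch_add_batch_del  per-prod-op: 221.83 ± 0.17k/s, avg mem: 1.23 ± 0.00MiB, peak mem: 1.49MiB
add_del_on_diff_cpu  per-prod-op: 39.66 ± 0.31k/s, avg mem: 1.47 ± 0.13MiB, peak mem: 1.75MiB
""",
    "normal bpf ma": """
overwrite            per-prod-op: 126.59 ± 0.02k/s, avg mem: 2.26 ± 0.00MiB, peak mem: 2.74MiB
batch_add_batch_del  per-prod-op: 83.37 ± 0.20k/s, avg mem: 2.14 ± 0.17MiB, peak mem: 2.74MiB
add_del_on_diff_cpu  per-prod-op: 21.25 ± 0.24k/s, avg mem: 17.50 ± 3.32MiB, peak mem: 28.87MiB
""",
}

LINE_RE = re.compile(
    r"(\w+)\s+per-prod-op:\s*([\d.]+) ± [\d.]+k/s, "
    r"avg mem:\s*([\d.]+) ± [\d.]+MiB, peak mem:\s*([\d.]+)MiB"
)


def parse(block: str) -> dict:
    """Map use-case name -> {'ops': k/s, 'avg': MiB, 'peak': MiB}."""
    return {
        m.group(1): {
            "ops": float(m.group(2)),
            "avg": float(m.group(3)),
            "peak": float(m.group(4)),
        }
        for m in LINE_RE.finditer(block)
    }


TABLE = {config: parse(block) for config, block in RESULTS.items()}


def ratio(metric: str, use_case: str, a: str, b: str) -> float:
    """Value of `metric` in configuration a divided by its value in b."""
    return TABLE[a][use_case][metric] / TABLE[b][use_case][metric]


for use_case in TABLE["preallocated"]:
    speedup = ratio("ops", use_case, "preallocated", "normal bpf ma")
    peak = ratio("peak", use_case, "normal bpf ma", "preallocated")
    print(f"{use_case}: {speedup:.2f}x ops, {peak:.2f}x lower peak mem")
```

A ratio above 1 in both columns means the preallocated map is faster and uses less peak memory than the normal bpf memory allocator for that use case.
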
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20230704025039.938914-1-houtao@huaweicloud.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
run_bench_htab_mem.sh 857 Bytes