    selftests/bpf: Add uprobe triggering overhead benchmarks · d41bc48b
    Andrii Nakryiko authored
    Add benchmarks to measure the overhead of uprobes and uretprobes. Also add
    a baseline (no uprobe attached) benchmark.
    
    On my dev machine, the baseline benchmark triggers about 130M user_target()
    invocations per second. With a uprobe attached, this falls to roughly
    730K/s. With a uretprobe, it drops to roughly 510K/s:
    
      $ sudo ./bench trig-uprobe-base -a
      Summary: hits  131.289 ± 2.872M/s
    
      # UPROBE
      $ sudo ./bench -a trig-uprobe-without-nop
      Summary: hits    0.729 ± 0.007M/s
    
      $ sudo ./bench -a trig-uprobe-with-nop
      Summary: hits    1.798 ± 0.017M/s
    
      # URETPROBE
      $ sudo ./bench -a trig-uretprobe-without-nop
      Summary: hits    0.508 ± 0.012M/s
    
      $ sudo ./bench -a trig-uretprobe-with-nop
      Summary: hits    0.883 ± 0.008M/s
    
    So there is almost a 2.5x performance difference between probing a nop and
    a non-nop instruction for an entry uprobe, and a 1.7x difference for a
    uretprobe.
    
    This means that when probing a non-nop instruction, the per-invocation
    overhead is around 1.4 microseconds for a uprobe and 2 microseconds for a
    uretprobe.
    
    For nop variants, uprobe and uretprobe overhead is down to 0.556 and
    1.13 microseconds, respectively.
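
    The per-invocation figures above follow directly from the measured
    throughput: a rate given in millions of hits per second inverts to
    microseconds per hit. A minimal sketch of that arithmetic is shown below.
    The dictionary keys and the helper name per_hit_usec are illustrative
    only and are not part of the bench tool. The baseline loop cost (under
    0.01 microseconds per call at 131M/s) is small enough to ignore here.

```python
# Throughput figures (million hits per second) copied from the runs above.
hits_mps = {
    "uprobe-without-nop": 0.729,
    "uprobe-with-nop": 1.798,
    "uretprobe-without-nop": 0.508,
    "uretprobe-with-nop": 0.883,
}


def per_hit_usec(mhits_per_sec):
    """Return microseconds per hit; 1 / (M hits/s) is already in microseconds."""
    return 1.0 / mhits_per_sec


# Hypothetical helper output: per-invocation cost for each benchmark variant.
latency_usec = {name: per_hit_usec(rate) for name, rate in hits_mps.items()}

for name, usec in latency_usec.items():
    print(f"{name:24s} {usec:.3f} us/hit")
```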
    
    For comparison, just doing a very low-overhead syscall (with no BPF
    programs attached anywhere) gives:
    
      $ sudo ./bench trig-base -a
      Summary: hits    4.830 ± 0.036M/s
    
    So even a uprobe on a nop instruction is about 2.7x slower than a bare
    syscall, i.e. a pure user/kernel mode switch (4.830 vs 1.798M/s).
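
    As a cross-check, the ratios quoted in this message can be recomputed
    from the summary lines. This is only a sketch of the arithmetic; the
    variable names and the slowdown helper are made up for illustration.

```python
# Throughput in million hits per second, taken from the summaries above.
syscall_base = 4.830
uprobe_nop, uprobe_no_nop = 1.798, 0.729
uretprobe_nop, uretprobe_no_nop = 0.883, 0.508


def slowdown(faster, slower):
    """Return how many times lower the slower rate is than the faster one."""
    return faster / slower


ratios = {
    "uprobe: nop vs non-nop": slowdown(uprobe_nop, uprobe_no_nop),
    "uretprobe: nop vs non-nop": slowdown(uretprobe_nop, uretprobe_no_nop),
    "syscall vs nop uprobe": slowdown(syscall_base, uprobe_nop),
    "syscall vs non-nop uprobe": slowdown(syscall_base, uprobe_no_nop),
}

for label, value in ratios.items():
    print(f"{label:28s} {value:.2f}x")
```

    Under these numbers a uprobe on a non-nop instruction comes out several
    times slower than the bare syscall, not just the roughly 2.7x seen for
    the nop case.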
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20211116013041.4072571-1-andrii@kernel.org