    perf_counter tools: Reduce perf stat measurement overhead/skew · 051ae7f7
    Paul Mackerras authored
    Vince Weaver reported that 'perf stat' adds measurement overhead to
    the count of retired instructions, inflating the reported count by
    as much as 6000 instructions.
    
    At present, perf stat creates its counters on the perf process.  Thus
    the counters count the fork and various other activity in both the
    parent and child, such as the resolver overhead for resolving PLT
    entries for any libc functions that haven't been called before, such
    as execvp.
    
    This reduces the overhead by creating the counters on the child process
    after the fork, using a couple of pipes to synchronize so that the
    child process waits until the parent has created the counters before
    doing the exec.  To eliminate the PLT resolution overhead on calling
    execvp, this does a dummy execvp first which will always fail.
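    A minimal sketch of the fork/pipe/exec handshake described above,
    assuming POSIX pipe(), fork(), execvp() and waitpid(). The names
    run_measured(), create_counters(), child_ready and go are
    hypothetical, and create_counters() is only a placeholder: it does
    not open real counters, and this is not the actual builtin-stat.c
    code. The child signals readiness by closing its end of one pipe,
    and the parent releases it by closing its end of the other, so both
    sides see EOF instead of exchanging data. The dummy execvp("") only
    helps when execvp is bound lazily; with eager binding (for example
    LD_BIND_NOW or -z now) the PLT entry is already resolved at load time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical placeholder: real code would open counters on `pid` here.
 * This sketch only reports which process would be measured. */
static int create_counters(pid_t pid)
{
	fprintf(stderr, "would create counters on pid %ld\n", (long)pid);
	return 0;
}

/* Fork, let the parent set up counters on the child, then exec argv.
 * Returns the child's exit status, or -1 on error. */
static int run_measured(char *const argv[])
{
	int child_ready[2], go[2], status;
	char buf;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	if (pipe(child_ready) < 0)
		return -1;
	if (pipe(go) < 0) {
		close(child_ready[0]);
		close(child_ready[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(child_ready[0]); close(child_ready[1]);
		close(go[0]); close(go[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: keep only the pipe ends it uses. */
		close(child_ready[0]);
		close(go[1]);

		/* Dummy exec of an empty path always fails (ENOENT), but the call
		 * still goes through execvp's PLT entry, so a lazily bound symbol
		 * is resolved now, before any counter exists. */
		execvp("", argv);

		/* Signal "ready" by closing our write end (the parent sees EOF),
		 * then block until the parent closes its end of the go pipe. */
		close(child_ready[1]);
		if (read(go[0], &buf, 1) < 0)
			_exit(127);
		close(go[0]);

		execvp(argv[0], argv);
		perror(argv[0]);
		_exit(127);
	}

	/* Parent: close the ends only the child uses. */
	close(child_ready[1]);
	close(go[0]);

	/* read() returns 0 (EOF) once the child has closed its write end. */
	if (read(child_ready[0], &buf, 1) < 0)
		perror("read");
	close(child_ready[0]);

	create_counters(pid);

	/* Closing the write end makes the child's blocked read() return EOF. */
	close(go[1]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    For example, with char *cmd[] = { "sh", "-c", "exit 3", NULL },
    run_measured(cmd) returns 3, and a command that cannot be executed
    makes the child exit with 127.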
    
    With this, the overhead of executing a program goes down from over
    4800 instructions to about 90 instructions on powerpc (32-bit).
    This was measured with a statically-linked program written in
    assembler which only does the 3 instructions needed to call _exit(0).
    
    Before:
    
    $ perf stat -e 0:1:u ./three
    
     Performance counter stats for './three':
    
               4858  instructions
    
        0.001274523  seconds time elapsed
    
    After:
    
    $ perf stat -e 0:1:u ./three
    
     Performance counter stats for './three':
    
                 92  instructions
    
        0.000468153  seconds time elapsed
    Reported-by: Vince Weaver <vince@deater.net>
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>