• Yan, Zheng's avatar
    perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold) · 3569c0d7
    Yan, Zheng authored
    PEBS always had the capability to log samples to its buffers without
    an interrupt. Traditionally perf has not used this but always set the
    PEBS threshold to one.
    
    For frequently occurring events (like cycles or branches or load/store)
    this in term requires using a relatively high sampling period to avoid
    overloading the system, by only processing PMIs. This in term increases
    sampling error.
    
    For the common cases we still need to use the PMI because the PEBS
    hardware has various limitations. The biggest one is that it can not
    supply a callgraph. It also requires setting a fixed period, as the
    hardware does not support adaptive period. Another issue is that it
    cannot supply a time stamp and some other options. To supply a TID it
    requires flushing on context switch. It can however supply the IP, the
    load/store address, TSX information, registers, and some other things.
    
    So we can make PEBS work for some specific cases, basically as long as
    you can do without a callgraph and can set the period you can use this
    new PEBS mode.
    
    The main benefit is the ability to support much lower sampling period
    (down to -c 1000) without extensive overhead.
    
    One use cases is for example to increase the resolution of the c2c tool.
    Another is double checking when you suspect the standard sampling has
    too much sampling error.
    
    Some numbers on the overhead, using cycle soak, comparing the elapsed
    time from "kernbench -M -H" between plain (threshold set to one) and
    multi (large threshold).
    
    The test command for plain:
      "perf record --time -e cycles:p -c $period -- kernbench -M -H"
    
    The test command for multi:
      "perf record --no-time -e cycles:p -c $period -- kernbench -M -H"
    
    ( The only difference of test command between multi and plain is time
      stamp options. Since time stamp is not supported by large PEBS
      threshold, it can be used as a flag to indicate if large threshold is
      enabled during the test. )
    
    	period    plain(Sec)  multi(Sec)  Delta
    	10003     32.7        16.5        16.2
    	20003     30.2        16.2        14.0
    	40003     18.6        14.1        4.5
    	80003     16.8        14.6        2.2
    	100003    16.9        14.1        2.8
    	800003    15.4        15.7        -0.3
    	1000003   15.3        15.2        0.2
    	2000003   15.3        15.1        0.1
    
    With periods below 100003, plain (threshold one) cause much more
    overhead. With 10003 sampling period, the Elapsed Time for multi is
    even 2X faster than plain.
    Signed-off-by: default avatarYan, Zheng <zheng.z.yan@intel.com>
    Signed-off-by: default avatarKan Liang <kan.liang@intel.com>
    Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: acme@infradead.org
    Cc: eranian@google.com
    Link: http://lkml.kernel.org/r/1430940834-8964-5-git-send-email-kan.liang@intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    3569c0d7
perf_event_intel.c 92.8 KB