1. 09 Aug, 2009 12 commits
    • Arnaldo Carvalho de Melo's avatar
      perf report: Add debug help for the finding of symbol bugs - show the symtab... · 94cb9e38
      Arnaldo Carvalho de Melo authored
      perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
      
      Used with perf report --verbose:
      
      [acme@doppio linux-2.6-tip]$ perf report -v | head -16
           5.17%  firefox  /usr/lib64/xulrunner-1.9.1/libxul.so   0x00000000005d8eee f [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
           2.56%  firefox  /lib64/libpthread-2.10.1.so            0x0000000000008e02 d [.] __pthread_mutex_lock_internal
           1.94%  firefox  /usr/lib64/xulrunner-1.9.1/libxul.so   0x0000000000d0af8f f [.] SearchTable
           1.75%  firefox  [kernel]                               0xffffffffff60013b k [.] vread_hpet
           1.63%  firefox  /lib64/libpthread-2.10.1.so            0x000000000000a404 d [.] __pthread_mutex_unlock
           1.47%  firefox  /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000482ea f [.] js_Interpret
           1.42%  firefox  /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x000000000003eda3 f [.] JS_CallTracer
           1.24%  firefox  [kernel]                               0xffffffff8102ca4a k [k] read_hpet
           1.16%  firefox  [kernel]                               0xffffffff810f3dd4 k [k] fget_light
           1.11%  firefox  /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000567ff f [.] js_TraceObject
           0.98%  firefox  /usr/lib64/firefox-3.5.2/firefox       0x000000000000dd23 b [.] arena_ralloc
      [acme@doppio linux-2.6-tip]$
      
      The new field is just after the symbol address. To help in
      figuring out symbol resolution bugs.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      94cb9e38
    • Peter Zijlstra's avatar
      perf report: Fix per task mult-counter stat reporting · 8f18aec5
      Peter Zijlstra authored
      Brice Goglin reported:
      
      > I can easily sort them by thread id, but I don't know how to match
      > my 4 events with each group of 4 lines.
      
      Also report the counter id and the time running/enabled
      stats (in case the counter got time-shared).
      Reported-by: default avatarBrice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: default avatarBrice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8f18aec5
    • Peter Zijlstra's avatar
      perf tools: Fix multi-counter stat bug caused by incorrect reading of perf.data file header · 1c222bce
      Peter Zijlstra authored
      Brice Goglin reported that only the first result from a
      multi-counter perf record --stat run is accurate, the
      rest looks bogus.
      
      A silly mistake made us re-read the first attribute for
      every recorded attribute.
      Reported-by: default avatarBrice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: default avatarBrice Goglin <Brice.Goglin@inria.fr>
      Cc: paulus@samba.org
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      1c222bce
    • Frederic Weisbecker's avatar
      perf tools: Fix call-chain cumul hit based sub-total (fractal mode) · 1953287b
      Frederic Weisbecker authored
      The callchain fractal mode builds each new total hits in a new
      branch of profiling by using the parent's hits of the current
      branch plus the hits of the children.
      
      This is wrong, the total hits of a branch should be made of the
      sum of every children hits, we must ignore the parent hits in
      this scope.
      
      This patch also fixes another mistake with the hit counting.
      
      Now the rates are correct.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      1953287b
    • Mike Galbraith's avatar
      perf top: Update man page · 83617983
      Mike Galbraith authored
      perf_counter tools: update perf top manual page to reflect
      current implementation.
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      83617983
    • Mike Galbraith's avatar
      perf top: Improve interactive key handling · 091bd2e9
      Mike Galbraith authored
      Pressing any key which is not currently mapped to
      functionality, based on startup command line options, displays
      currently mapped keys, and prompts for input.
      
      Pressing any unmapped key at the prompt returns the user to
      display mode with variables unchanged.  eg, pressing ? <SPACE>
      <ESC> etc displays currently available keys, the value of the
      variable associated with that key, and prompts.
      
      Pressing same again aborts input.
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      091bd2e9
    • Peter Zijlstra's avatar
      perf_counter: Fix software counters for fast moving event sources · 7b4b6658
      Peter Zijlstra authored
      Reimplement the software counters to deal with fast moving
      event sources (such as tracepoints). This means being able
      to generate multiple overflows from a single 'event' as well
      as support throttling.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      7b4b6658
    • Mike Galbraith's avatar
      perf_counter tools: Allow perf top top users to switch between weighted and... · 46ab9764
      Mike Galbraith authored
      perf_counter tools: Allow perf top top users to switch between weighted and individual counter display
      
      Add [w]eighted hotkey.  Pressing [w] toggles between displaying
      weighted total of all counters, and the counter selected via
      [E]vent select key.
      
      ------------------------------------------------------------------------------
         PerfTop:   90395 irqs/sec  kernel:16.1% [cache-misses/cache-references/instructions],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
        weight     samples    pcnt         RIP          kernel function
        ______     _______   _____   ________________   _______________
      
      1275408.6      10881 -  5.3% - ffffffff81146f70 : copy_page_c
       553683.4      43569 - 21.3% - ffffffff81146f20 : clear_page_c
        74075.0       6768 -  3.3% - ffffffff81147190 : copy_user_generic_string
        40602.9       7538 -  3.7% - ffffffff81284ba2 : _spin_lock
        26882.1        965 -  0.5% - ffffffff8109d280 : file_ra_state_init
      
      [w]
      
      ------------------------------------------------------------------------------
         PerfTop:   91221 irqs/sec  kernel:14.5% [10000Hz cache-misses],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
        weight     samples    pcnt         RIP          kernel function
        ______     _______   _____   ________________   _______________
      
                  47320.00 - 22.3% - ffffffff81146f20 : clear_page_c
                  14261.00 -  6.7% - ffffffff810992f5 : __rmqueue
                  11046.00 -  5.2% - ffffffff81146f70 : copy_page_c
                   7842.00 -  3.7% - ffffffff81284ba2 : _spin_lock
                   7234.00 -  3.4% - ffffffff810aa1d6 : unmap_vmas
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      46ab9764
    • Mike Galbraith's avatar
      perf_counter tools: Fix/resurrect perf top annotation in a simple interactive form · 923c42c1
      Mike Galbraith authored
      perf top used to have annotation support, but it has bitrotted and
      removed.
      
      This patch restores that: it allows the user to select any symbol
      in kernel space for source level annotation on the fly, switch
      between event counters and alter display variables. When symbol
      details are being displayed, stopping annotation reverts to normal.
      
      known keys:
              [d]     select display delay.
              [e]     select display entries (lines).
              [E]     select annotation event counter.
              [f]     select normal display count filter.
              [F]     select annotation display count filter (percentage).
              [qQ]    quit.
              [s]     select annotation symbol and start annotation.
              [S]     stop annotation, revert to normal display.
              [z]     toggle event count zeroing.
      
      Sample:
      ------------------------------------------------------------------------------
         PerfTop:   16719 irqs/sec  kernel:78.7% [cache-misses/cache-references/instructions/cycles],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
      Showing cache-misses for e1000_clean_rx_irq
        Events  Pcnt (>=3%)
             0  0.0%                  /* adjust length to remove Ethernet CRC */
             0  0.0%                  if (!(adapter->flags2 & FLAG2_CRC_STRIPPING))
             0  0.0%                          length -= 4;
           436  5.0%      f039:       41 f6 84 24 5c 29 00    testb  $0x1,0x295c(%r12)
             0  0.0%      f089:       8b 4d 84                mov    -0x7c(%rbp),%ecx
             0  0.0%      f08c:       48 83 ef 02             sub    $0x2,%rdi
             0  0.0%      f090:       48 83 ee 02             sub    $0x2,%rsi
           811  9.3%      f094:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
             0  0.0%
             0  0.0%          while (rx_desc->status & E1000_RXD_STAT_DD) {
             0  0.0%      f114:       41 f6 47 0c 01          testb  $0x1,0xc(%r15)
          7226 82.6%      f119:       0f 85 24 fe ff ff       jne    ef43 <e1000_clean_rx_irq+0x84>
      
      Available events:
              0 cache-misses
              1 cache-references
              2 instructions
              3 cycles
      Enter details event counter: 2
      ------------------------------------------------------------------------------
         PerfTop:   15035 irqs/sec  kernel:79.0% [cache-misses/cache-references/instructions/cycles],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
      Showing instructions for e1000_clean_rx_irq
        Events  Pcnt (>=3%)
             0  0.0%                                 int *work_done, int work_to_do)
             0  0.0%  {
           175  0.9%      eebf:       55                      push   %rbp
          1898  9.8%      eec0:       48 89 e5                mov    %rsp,%rbp
             0  0.0%
             0  0.0%          i = rx_ring->next_to_clean;
           140  0.7%      ef0a:       0f b7 41 1a             movzwl 0x1a(%rcx),%eax
           670  3.4%      ef0e:       89 45 ac                mov    %eax,-0x54(%rbp)
             0  0.0%  {
             0  0.0%          memcpy(skb->data + offset, from, len);
            91  0.5%      f07b:       49 8b b6 e8 00 00 00    mov    0xe8(%r14),%rsi
          1153  5.9%      f082:       48 8b b8 e8 00 00 00    mov    0xe8(%rax),%rdi
            42  0.2%      f089:       8b 4d 84                mov    -0x7c(%rbp),%ecx
            14  0.1%      f08c:       48 83 ef 02             sub    $0x2,%rdi
             0  0.0%      f090:       48 83 ee 02             sub    $0x2,%rsi
          1618  8.3%      f094:       f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
             0  0.0%
             0  0.0%                  /* return some buffers to hardware, one at a time is too slow */
             0  0.0%                  if (cleaned_count >= E1000_RX_BUFFER_WRITE) {
           867  4.5%      f0e7:       83 7d b0 0f             cmpl   $0xf,-0x50(%rbp)
             0  0.0%
             0  0.0%          while (rx_desc->status & E1000_RXD_STAT_DD) {
            37  0.2%      f114:       41 f6 47 0c 01          testb  $0x1,0xc(%r15)
          4047 20.8%      f119:       0f 85 24 fe ff ff       jne    ef43 <e1000_clean_rx_irq+0x84>
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      923c42c1
    • Frederic Weisbecker's avatar
      perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker authored
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback brings
         some costs however if someone wants no such sampling to
         occur, and needs to be fixed in the future. For that we need
         to have an instant access to the perf counter attribute.
         This is a matter of a flag to add in the struct ftrace_event.
      
       - Take care of the events recursivity! Don't ever try to record
         a lock event for example, it seems some locking is used in
         the profiling fast path and lead to a tracing recursivity.
         That will be fixed using raw spinlock or recursivity
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      f413cdb8
    • Peter Zijlstra's avatar
      perf_counter, ftrace: Fix perf_counter integration · 3a659305
      Peter Zijlstra authored
      Adds possible second part to the assign argument of TP_EVENT().
      
        TP_perf_assign(
      	__perf_count(foo);
      	__perf_addr(bar);
        )
      
      Which, when specified make the swcounter increment with @foo instead
      of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
      associated with the event) when this triggers a counter overflow.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3a659305
    • Ingo Molnar's avatar
      Merge branch 'linus' into tracing/urgent · e3560336
      Ingo Molnar authored
      Merge reason: Merge up to almost-rc6 to pick up latest perfcounters
                    (on which we'll queue up a dependent fix)
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e3560336
  2. 08 Aug, 2009 7 commits
  3. 07 Aug, 2009 21 commits