1. 29 Jun, 2009 3 commits
    • perf_counter tools: Reduce perf stat measurement overhead/skew · 051ae7f7
      Paul Mackerras authored
      Vince Weaver reported a 'perf stat' measurement overhead in the
      count of retired instructions, which can inflate the reported
      count by about 6000 instructions.
      
      At present, perf stat creates its counters on the perf process.  Thus
      the counters count the fork and various other activity in both the
      parent and child, such as the resolver overhead for resolving PLT
      entries for any libc functions that haven't been called before, such
      as execvp.
      
      This reduces the overhead by creating the counters on the child process
      after the fork, using a couple of pipes to synchronize so that the
      child process waits until the parent has created the counters before
      doing the exec.  To eliminate the PLT resolution overhead on calling
      execvp, this does a dummy execvp first which will always fail.
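
The synchronization described above can be sketched roughly as follows. This is a minimal illustration, not the actual perf stat code: `run_synced()` is a hypothetical helper, the pipe names are invented for the example, and the place where a real tool would create its counters is only marked with a comment.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that waits for the parent's go-ahead
 * before exec'ing argv. Returns the child's exit status, or -1 on error. */
int run_synced(char *const argv[])
{
    int child_ready[2], go[2];
    char buf;
    int status;
    pid_t pid;

    if (pipe(child_ready) < 0 || pipe(go) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        close(child_ready[0]);
        close(go[1]);
        /* Do not leak the go pipe into the exec'ed program. */
        fcntl(go[0], F_SETFD, FD_CLOEXEC);

        /* Dummy exec: always fails, but resolves execvp's PLT entry now. */
        execvp("", argv);

        /* Tell the parent we are ready: it sees EOF on its read end. */
        close(child_ready[1]);

        /* Block until the parent closes its write end of the go pipe. */
        if (read(go[0], &buf, 1) == -1)
            _exit(127);

        execvp(argv[0], argv);
        _exit(127);             /* only reached if the real exec failed */
    }

    close(child_ready[1]);
    close(go[0]);

    /* Wait until the child has finished its setup. */
    if (read(child_ready[0], &buf, 1) == -1)
        return -1;

    /* A real tool would create its counters on `pid` at this point. */

    close(go[1]);               /* release the child */
    close(child_ready[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Closing a pipe end is used as the signal in both directions, so neither side has to write any data; whether that matches every detail of the patch is an assumption of this sketch.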
      
      With this, the overhead of executing a program goes down from over
      4800 instructions to about 90 instructions on powerpc (32-bit).
      This was measured with a statically-linked program written in
      assembler which only does the 3 instructions needed to call _exit(0).
      
      Before:
      
      $ perf stat -e 0:1:u ./three
      
       Performance counter stats for './three':
      
                 4858  instructions
      
          0.001274523  seconds time elapsed
      
      After:
      
      $ perf stat -e 0:1:u ./three
      
       Performance counter stats for './three':
      
                   92  instructions
      
          0.000468153  seconds time elapsed
      Reported-by: Vince Weaver <vince@deater.net>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf stat: Use percentages for scaling output · 210ad39f
      Ingo Molnar authored
      Peter expressed a strong preference for percentage based
      display of scaled values - so revert to that from the
      recently introduced multiplication-factor unit.
      Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter, x86: Update x86_pmu after WARN() · 4078c444
      Yinghai Lu authored
      The printout should read the value before it is changed.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4A487017.4090007@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 28 Jun, 2009 1 commit
  3. 27 Jun, 2009 4 commits
    • perf stat: Improve output · 6e750a8f
      Jaswinder Singh Rajput authored
      Increase size for event name to handle bigger names like
      'L1-d$-prefetch-misses'
      
      Changed scaled counters from percentage to a multiplicative
      factor because the latter is more expressive.
      
      Also aligned the scaling factor, otherwise sometimes it looks
      like:
      
                  384  iTLB-load-misses           (4.74x scaled)
               452029  branch-loads               (8.00x scaled)
                 5892  branch-load-misses         (20.39x scaled)
               972315  iTLB-loads                 (3.24x scaled)
      
      Before:
               150708  L1-d$-stores          (scaled from 23.57%)
               428804  L1-d$-prefetches      (scaled from 23.47%)
               314446  L1-d$-prefetch-misses  (scaled from 23.42%)
            252626137  L1-i$-loads           (scaled from 23.24%)
              5297550  dTLB-load-misses      (scaled from 23.96%)
            106992392  branch-loads          (scaled from 23.67%)
              5239561  branch-load-misses    (scaled from 23.43%)
      
      After:
              1731713  L1-d$-loads               (  14.25x scaled)
                44241  L1-d$-prefetches          (   3.88x scaled)
                21076  L1-d$-prefetch-misses     (   3.40x scaled)
              5789421  L1-i$-loads               (   3.78x scaled)
                29645  dTLB-load-misses          (   2.95x scaled)
               461474  branch-loads              (   6.52x scaled)
                 7493  branch-load-misses        (  26.57x scaled)
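
The two display styles carry the same information. A minimal sketch of the relationship, assuming the usual convention that a multiplexed counter reports how long it was enabled and how long it was actually running (the function names are hypothetical, not taken from the perf source):

```c
#include <assert.h>
#include <stdint.h>

/* Share of the enabled time during which the counter was really counting,
 * i.e. the number shown as "scaled from NN.NN%". */
double scaled_from_percent(uint64_t enabled, uint64_t running)
{
    return enabled ? 100.0 * (double)running / (double)enabled : 0.0;
}

/* The same information as a multiplication factor ("N.NNx scaled"). */
double scale_factor(uint64_t enabled, uint64_t running)
{
    return running ? (double)enabled / (double)running : 0.0;
}

/* Extrapolate the raw count to the full enabled time. */
uint64_t scaled_count(uint64_t raw, uint64_t enabled, uint64_t running)
{
    if (running == 0)
        return 0;   /* the counter never ran, nothing to extrapolate */
    return (uint64_t)((double)raw * (double)enabled / (double)running);
}
```

Under this assumption a counter that ran for a quarter of its enabled time is shown either as "scaled from 25.00%" or as "4.00x scaled".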
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1246051927.2988.10.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf stat: Fix multi-run stats · 566747e6
      Ingo Molnar authored
      In multi-run (-r/--repeat) printouts, print out the noise of
      the wall-clock average as well.
      
      Also, fix a bug in printing out scaled counters: if it was not
      scaled then we should not update the average with -1.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf stat: Add -n/--null option to run without counters · 0cfb7a13
      Ingo Molnar authored
      Allow a no-counters run. This can be useful to measure just
      elapsed wall-clock time - or to assess the raw overhead of perf
      stat itself, without running any counters.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Remove dead code · fde953c1
      Ingo Molnar authored
      Vince Weaver reported that there's a handful of #ifdef __MINGW32__
      sections in the code.
      
      Remove them as they are in essence dead code - as unlike upstream
      Git, the perf tool is unlikely to be ported to Windows.
      Reported-by: Vince Weaver <vince@deater.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 26 Jun, 2009 3 commits
    • perf_counter: Complete counter swap · 19d2e755
      Peter Zijlstra authored
      Complete the counter swap by switching the times as well and
      updating the userpage after modifying the counter values.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246014623.31755.195.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Print sorted callchains per histogram entries · f55c5552
      Frederic Weisbecker authored
      Use the newly created callchains radix tree to gather the chains stats
      from the recorded events and then print the callchains for all of them,
      sorted by hits, using the "-c" parameter with perf report.
      
      Example:
      
       66.15%  [k] atm_clip_exit
                  63.08%
                      0xffffffffffffff80
                      0xffffffff810196a8
                      0xffffffff810c14c8
                      0xffffffff8101a79c
                      0xffffffff810194f3
                      0xffffffff8106ab7f
                      0xffffffff8106abe5
                      0xffffffff8106acde
                      0xffffffff8100d94b
                      0xffffffff8153e7ea
                      [...]
      
                   1.54%
                      0xffffffffffffff80
                      0xffffffff810196a8
                      0xffffffff810c14c8
                      0xffffffff8101a79c
                      [...]
      
      Symbols are not yet resolved.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246026481-8314-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Prepare a small callchain framework · 8cb76d99
      Frederic Weisbecker authored
      We plan to display the callchains depending on some user-configurable
      parameters.
      
      To gather the callchain stats from the recorded stream quickly,
      this patch introduces an ad hoc radix tree adapted for callchains, and
      also an rbtree to sort these callchains once every event has been
      gathered from the stream.
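
To illustrate why a prefix-sharing tree makes gathering cheap, here is a minimal sketch. It is deliberately simplified: it is a plain prefix tree with one address per node, not the compressed radix tree the patch describes, it leaves out the rbtree sorting step, and all names (`chain_node`, `chain_insert`, `chain_hits`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One node per address; chains sharing a prefix share the nodes for it. */
struct chain_node {
    uint64_t ip;                    /* address of this call site */
    uint64_t hits;                  /* recorded chains that end exactly here */
    struct chain_node *children;    /* first child */
    struct chain_node *sibling;     /* next child of the same parent */
};

/* Find the child of `parent` for `ip`, creating it if necessary. */
static struct chain_node *get_child(struct chain_node *parent, uint64_t ip)
{
    struct chain_node *n;

    for (n = parent->children; n; n = n->sibling)
        if (n->ip == ip)
            return n;

    n = calloc(1, sizeof(*n));
    if (!n)
        return NULL;
    n->ip = ip;
    n->sibling = parent->children;
    parent->children = n;
    return n;
}

/* Record one callchain of `nr` addresses. Returns 0, or -1 if out of memory. */
int chain_insert(struct chain_node *root, const uint64_t *ips, size_t nr)
{
    struct chain_node *n = root;

    for (size_t i = 0; i < nr; i++) {
        n = get_child(n, ips[i]);
        if (!n)
            return -1;
    }
    n->hits++;
    return 0;
}

/* Number of times exactly this chain was recorded (0 if never). */
uint64_t chain_hits(const struct chain_node *root, const uint64_t *ips, size_t nr)
{
    const struct chain_node *n = root;

    for (size_t i = 0; i < nr; i++) {
        const struct chain_node *c;

        for (c = n->children; c; c = c->sibling)
            if (c->ip == ips[i])
                break;
        if (!c)
            return 0;
        n = c;
    }
    return n->hits;
}
```

A zero-initialized `struct chain_node` serves as the root. The sketch never frees its nodes, which is acceptable only for a short-lived illustration.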
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1246026481-8314-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 25 Jun, 2009 14 commits
    • perf record: Fix unhandled io return value · 3928ddbe
      Frederic Weisbecker authored
      Building the latest perf_counter tools fails with the following error:
      
       builtin-record.c: In function ‘create_counter’:
       builtin-record.c:451: error: ignoring return value of ‘read’, declared with attribute warn_unused_result
       make: *** [builtin-record.o] Error 1
      
      Just check if we successfully read the perf file descriptor.
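
A check of this kind might look like the sketch below. `read_exact()` is a hypothetical helper written for illustration; it is not the code from builtin-record.c, and it treats a short read as a failure, which is an assumption about what the caller wants.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read exactly `len` bytes from `fd` into `buf` with a single read() call.
 * Returns 0 on success, -1 on error, EOF or a short read. */
int read_exact(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n < 0) {
        perror("read");
        return -1;
    }
    if ((size_t)n != len) {
        fprintf(stderr, "short read: wanted %zu bytes, got %zd\n", len, n);
        return -1;
    }
    return 0;
}
```

Using the return value this way also satisfies the `warn_unused_result` attribute, so the build no longer fails on that warning.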
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1245961287-5327-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Add alias for 'l1d' and 'l1i' · 4418351f
      Jaswinder Singh Rajput authored
      Add 'l1d' and 'l1i' aliases again as shortcuts - just don't make them
      the primary display alias.
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1245945462.9157.11.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf-report: Add bare minimum PERF_EVENT_READ parsing · e9ea2fde
      Peter Zijlstra authored
      Provide the basic infrastructure to provide per task stats.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf-report: Add modes for inherited stats and no-samples · 649c48a9
      Peter Zijlstra authored
      Now that we can collect per task statistics, add modes that
      make use of that facility.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Rework the sample ABI · e6e18ec7
      Peter Zijlstra authored
      The PERF_EVENT_READ implementation made me realize we don't
      actually need the sample_type in the output sample, since
      we already have that in the perf_counter_attr information.
      
      Therefore, remove the PERF_EVENT_MISC_OVERFLOW bit and the
      event->type overloading, and simply put counter overflow
      samples in a PERF_EVENT_SAMPLE type.
      
      This also fixes the issue that event->type was only 32-bit
      and sample_type had 64 usable bits.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Implement more accurate per task statistics · bfbd3381
      Peter Zijlstra authored
      With the introduction of PERF_EVENT_READ we have the
      possibility to provide accurate counter values for
      individual tasks in a task hierarchy.
      
      However, due to the lazy context switching used for similar
      counter contexts our current per task counts are way off.
      
      In order to maintain some of the lazy switch benefits we
      don't disable it outright, but simply iterate the active
      counters and flip the values between the contexts.
      
      This only reads the counters but does not need to reprogram
      the full PMU.
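
As a rough user-space model of the idea, flipping values between two contexts amounts to a pairwise swap over equally long counter lists. The structure and function names below are hypothetical and much simpler than the kernel's data structures; including the time fields in the swap is part of the model, not a statement about this particular patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the bookkeeping kept for one counter. */
struct counter_state {
    uint64_t count;
    uint64_t time_enabled;
    uint64_t time_running;
};

static void swap_u64(uint64_t *a, uint64_t *b)
{
    uint64_t tmp = *a;

    *a = *b;
    *b = tmp;
}

/* Flip the value and times of two counters paired across contexts. */
void swap_counter_state(struct counter_state *a, struct counter_state *b)
{
    swap_u64(&a->count, &b->count);
    swap_u64(&a->time_enabled, &b->time_enabled);
    swap_u64(&a->time_running, &b->time_running);
}

/* Walk two equally long counter lists and flip each pair. Only the
 * bookkeeping changes hands; no hardware would be reprogrammed. */
void swap_context_counters(struct counter_state *prev,
                           struct counter_state *next, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        swap_counter_state(&prev[i], &next[i]);
}
```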
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Add PERF_EVENT_READ · 38b200d6
      Peter Zijlstra authored
      Provide a read() like event which can be used to log the
      counter value at specific sites such as child->parent
      folding on exit.
      
      In order to be useful, we log the counter parent ID, not the
      actual counter ID, since userspace can only relate parent
      IDs to perf_counter_attr constructs.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter, x86: Add mmap counter read support · 194002b2
      Peter Zijlstra authored
      Update the mmap control page with the needed information to
      use the userspace RDPMC instruction for self monitoring.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Add scale information to the mmap control page · 7f8b4e4e
      Peter Zijlstra authored
      Add the needed time scale to the self-profile mmap information.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Split the mmap control page in two parts · 41f95331
      Peter Zijlstra authored
      Since there are two distinct sections to the control page,
      move them apart so that possible extensions don't overlap.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Rework the file format · 7c6a1c65
      Peter Zijlstra authored
      Create a structured file format that includes the full
      perf_counter_attr and all its relevant counter IDs so that
      the reporting program has full information.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Shorten names for events · e5c59547
      Jaswinder Singh Rajput authored
      Added new aliases for events.
      
      On AMD box:
      
       $ ./perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses -- ls -lR /usr/include/ > /dev/null
      
      Before :
      
       Performance counter stats for 'ls -lR /usr/include/':
      
            248064467  L1-data-Cache-Load-Referencees  (scaled from 23.27%)
              1001433  L1-data-Cache-Load-Misses  (scaled from 23.34%)
               153691  L1-data-Cache-Store-Referencees  (scaled from 23.34%)
               423248  L1-data-Cache-Prefetch-Referencees  (scaled from 23.33%)
               302138  L1-data-Cache-Prefetch-Misses  (scaled from 23.25%)
            251217546  L1-instruction-Cache-Load-Referencees  (scaled from 23.25%)
              5757005  L1-instruction-Cache-Load-Misses  (scaled from 23.23%)
                93435  L1-instruction-Cache-Prefetch-Referencees  (scaled from 23.24%)
              6496073  L2-Cache-Load-Referencees  (scaled from 23.32%)
               609485  L2-Cache-Load-Misses  (scaled from 23.45%)
              6876991  L2-Cache-Store-Referencees  (scaled from 23.71%)
            248922840  Data-TLB-Cache-Load-Referencees  (scaled from 23.94%)
              5828386  Data-TLB-Cache-Load-Misses  (scaled from 24.17%)
            257613506  Instruction-TLB-Cache-Load-Referencees  (scaled from 24.20%)
                 6833  Instruction-TLB-Cache-Load-Misses  (scaled from 23.88%)
            109043606  Branch-Cache-Load-Referencees  (scaled from 23.64%)
              5552296  Branch-Cache-Load-Misses  (scaled from 23.42%)
      
          0.413702461  seconds time elapsed.
      
      After :
      
       Performance counter stats for 'ls -lR /usr/include/':
      
            266590464  L1-d$-loads           (scaled from 23.03%)
              1222273  L1-d$-load-misses     (scaled from 23.58%)
               146204  L1-d$-stores          (scaled from 23.83%)
               406344  L1-d$-prefetches      (scaled from 24.09%)
               283748  L1-d$-prefetch-misses (scaled from 24.10%)
            249650965  L1-i$-loads           (scaled from 23.80%)
              3353961  L1-i$-load-misses     (scaled from 23.82%)
               104599  L1-i$-prefetches      (scaled from 23.68%)
              4836405  LLC-loads             (scaled from 23.67%)
               498214  LLC-load-misses       (scaled from 23.66%)
              4953994  LLC-stores            (scaled from 23.64%)
            243354097  dTLB-loads            (scaled from 23.77%)
              6468584  dTLB-load-misses      (scaled from 23.74%)
            249719549  iTLB-loads            (scaled from 23.25%)
                 5060  iTLB-load-misses      (scaled from 23.00%)
            112343016  branch-loads          (scaled from 22.76%)
              5528876  branch-load-misses    (scaled from 22.54%)
      
          0.427154051  seconds time elapsed.
      
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245934522.5308.39.camel@hpdv5.satnam>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter tools: Check for valid cache operations · 06813f6c
      Jaswinder Singh Rajput authored
      Add a new table of valid cache operations, 'hw_cache_stat':
      
       L1I : Read and prefetch only
       ITLB and BPU : Read-only
      
      Introduce is_cache_op_valid() to test whether a cache operation is
      valid, and use it to reject invalid cache operations.
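
A table-driven check along these lines could be sketched as follows. The enum names, the bitmask encoding and the function signature are assumptions made for this example; only the L1I, ITLB and BPU restrictions come from the description above, and allowing every operation on the remaining caches is likewise an assumption.

```c
#include <assert.h>

/* Hypothetical identifiers; the real perf headers use their own names. */
enum cache_type { CACHE_L1D, CACHE_L1I, CACHE_LL, CACHE_DTLB,
                  CACHE_ITLB, CACHE_BPU, CACHE_MAX };
enum cache_op { OP_READ, OP_WRITE, OP_PREFETCH, OP_MAX };

#define OP_BIT(op) (1u << (op))

/* One bitmask of permitted operations per cache type. */
static const unsigned int hw_cache_stat[CACHE_MAX] = {
    /* Assumed: these caches accept every operation. */
    [CACHE_L1D]  = OP_BIT(OP_READ) | OP_BIT(OP_WRITE) | OP_BIT(OP_PREFETCH),
    [CACHE_LL]   = OP_BIT(OP_READ) | OP_BIT(OP_WRITE) | OP_BIT(OP_PREFETCH),
    [CACHE_DTLB] = OP_BIT(OP_READ) | OP_BIT(OP_WRITE) | OP_BIT(OP_PREFETCH),
    /* From the commit message: read and prefetch only. */
    [CACHE_L1I]  = OP_BIT(OP_READ) | OP_BIT(OP_PREFETCH),
    /* From the commit message: read-only. */
    [CACHE_ITLB] = OP_BIT(OP_READ),
    [CACHE_BPU]  = OP_BIT(OP_READ),
};

/* Returns 1 if `op` makes sense for `type`, 0 otherwise. */
int is_cache_op_valid(enum cache_type type, enum cache_op op)
{
    if ((unsigned int)type >= CACHE_MAX || (unsigned int)op >= OP_MAX)
        return 0;
    return (hw_cache_stat[type] & OP_BIT(op)) != 0;
}
```

Keeping the rules in a table means a new cache type only needs one more row rather than another branch in the parser.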
      
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1245930367.5308.33.camel@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf record: Fix filemap pathname parsing in /proc/pid/maps · 76c64c5e
      Johannes Weiner authored
      Looking backward for the first space from the end of a line in
      /proc/pid/maps does not find the start of the pathname of the mapped
      file if it contains a space.
      
      Since the only slashes we have in this file occur in the (absolute!)
      pathname column of file mappings, looking for the first slash in a
      line is a safe method to find the name.
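
The difference between the two approaches is easy to see in a small sketch. Both helper names are hypothetical, and the sample lines in the usage note are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Old approach: take whatever follows the last space on the line.
 * This breaks as soon as the pathname itself contains a space. */
const char *pathname_after_last_space(const char *line)
{
    const char *sp = strrchr(line, ' ');

    return sp ? sp + 1 : NULL;
}

/* New approach: the first slash on the line starts the (absolute)
 * pathname. Returns NULL for lines without a slash, such as
 * anonymous mappings. */
const char *pathname_from_first_slash(const char *line)
{
    return strchr(line, '/');
}
```

For a line such as `08048000-08056000 r-xp 00000000 03:0c 64593   /opt/my app/bin/tool`, the first helper yields `app/bin/tool` while the second yields the full `/opt/my app/bin/tool`. A line read with fgets() would still carry its trailing newline, which the caller has to strip.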
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090624190835.GA25548@cmpxchg.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 24 Jun, 2009 4 commits
  7. 23 Jun, 2009 7 commits
  8. 22 Jun, 2009 4 commits