1. 19 Sep, 2012 1 commit
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo' of... · bea8f354
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from  Arnaldo Carvalho de Melo:
      
       * Fix handling of unresolved samples when --symbols is used in 'report',
         from Feng Tang.
      
       * Add --symbols to 'script', similar to the one in 'report', from Feng Tang.
      
       * Add union member access support to 'probe', from Hyeoncheol Lee.
      
       * Make 'archive' work on Android, tweaking some of the utility parameters
         used (tar, rm), from Irina Tirdea.
      
       * Fixups to die() removal, from Namhyung Kim.
      
       * Render fixes for the TUI, from Namhyung Kim.
      
       * Don't enable annotation in non symbolic view, from Namhyung Kim.
      
       * Fix pipe mode in 'report', from Namhyung Kim.
      
       * Move related stats code from stat to util/, will be used by the 'stat'
         kvm tool, from Xiao Guangrong.
      
       * Add cpumask for uncore pmu, use it in 'stat', from Yan, Zheng.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      bea8f354
  2. 17 Sep, 2012 12 commits
  3. 14 Sep, 2012 13 commits
  4. 13 Sep, 2012 2 commits
    • Ingo Molnar's avatar
      Merge branch 'core/rcu' into perf/core · 4553f0b9
      Ingo Molnar authored
      Steve Rostedt asked for the merge of a single commit, into both
      the RCU and the perf/tracing tree:
      
       | Josh made a change to the tracing code that affects both the
       | work Paul McKenney and I are currently doing. At the last
       | Kernel Summit back in August, Linus said when such a case
       | exists, it is best to make a separate branch based off of his
       | tree and place the change there. This way, the repositories
       | that need to share the change can both pull them in and the
       | SHA1 will match for both. Whichever branch is pulled in first
       | by Linus will also pull in the necessary change for the other
       | branch as well.
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4553f0b9
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo' of... · be267be8
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
       * Remove die()/exit() calls from several tools.
      
       * Add missing perf_regs.h file to MANIFEST
      
       * Clean up and improve 'perf sched' performance by elliminating lots of
         needless calls to libtraceevent.
      
       * More patches to make perf build on Android, from Irina Tirdea
      
       * Resolve vdso callchains, from Jiri Olsa
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      be267be8
  5. 12 Sep, 2012 1 commit
  6. 11 Sep, 2012 11 commits
    • Arnaldo Carvalho de Melo's avatar
      perf sched: Don't read all tracepoint variables in advance · 9ec3f4e4
      Arnaldo Carvalho de Melo authored
      Do it just at the actual consumer of these fields, that way we avoid
      needless lookups:
      
        [root@sandy ~]# perf sched record sleep 30s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 8.585 MB perf.data (~375063 samples) ]
      
      Before:
      
        [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null
      
         Performance counter stats for 'perf sched lat' (10 runs):
      
                103.592215 task-clock                #    0.993 CPUs utilized            ( +-  0.33% )
                        12 context-switches          #    0.114 K/sec                    ( +-  3.29% )
                         0 cpu-migrations            #    0.000 K/sec
                     7,605 page-faults               #    0.073 M/sec                    ( +-  0.00% )
               345,796,112 cycles                    #    3.338 GHz                      ( +-  0.07% ) [82.90%]
               106,876,796 stalled-cycles-frontend   #   30.91% frontend cycles idle     ( +-  0.38% ) [83.23%]
                62,060,877 stalled-cycles-backend    #   17.95% backend  cycles idle     ( +-  0.80% ) [67.14%]
               628,246,586 instructions              #    1.82  insns per cycle
                                                     #    0.17  stalled cycles per insn  ( +-  0.04% ) [83.64%]
               134,962,057 branches                  # 1302.820 M/sec                    ( +-  0.10% ) [83.64%]
                 1,233,037 branch-misses             #    0.91% of all branches          ( +-  0.29% ) [83.41%]
      
               0.104333272 seconds time elapsed                                          ( +-  0.33% )
      
        [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null
      
         Performance counter stats for 'perf sched lat' (10 runs):
      
               98.848272 task-clock                #    0.993 CPUs utilized            ( +-  0.48% )
                      11 context-switches          #    0.112 K/sec                    ( +-  2.83% )
                       0 cpu-migrations            #    0.003 K/sec                    ( +- 50.92% )
                   7,604 page-faults               #    0.077 M/sec                    ( +-  0.00% )
             332,216,085 cycles                    #    3.361 GHz                      ( +-  0.14% ) [82.87%]
             100,623,710 stalled-cycles-frontend   #   30.29% frontend cycles idle     ( +-  0.53% ) [82.95%]
              58,788,692 stalled-cycles-backend    #   17.70% backend  cycles idle     ( +-  0.59% ) [67.15%]
             609,402,433 instructions              #    1.83  insns per cycle
                                                   #    0.17  stalled cycles per insn  ( +-  0.04% ) [83.76%]
             131,277,138 branches                  # 1328.067 M/sec                    ( +-  0.06% ) [83.77%]
               1,117,871 branch-misses             #    0.85% of all branches          ( +-  0.32% ) [83.51%]
      
             0.099580430 seconds time elapsed                                          ( +-  0.48% )
      
        [root@sandy ~]#
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-kracdpw8wqlr0xjh75uk8g11@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9ec3f4e4
    • Arnaldo Carvalho de Melo's avatar
      perf sched: Use perf_evsel__{int,str}val · 2b7fcbc5
      Arnaldo Carvalho de Melo authored
      This patch also stops reading the common fields, as they were not being used except
      for one ->common_pid case that was replaced by sample->tid, i.e. the info is already
      in the perf_sample struct.
      
      Also it only fills the _event structures when there is a handler.
      
        [root@sandy ~]# perf sched record sleep 30s
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 8.585 MB perf.data (~375063 samples) ]
      
      Before:
      
        [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null
      
         Performance counter stats for 'perf sched lat' (10 runs):
      
                129.117838 task-clock                #    0.994 CPUs utilized            ( +-  0.28% )
                        14 context-switches          #    0.111 K/sec                    ( +-  2.10% )
                         0 cpu-migrations            #    0.002 K/sec                    ( +- 66.67% )
                     7,654 page-faults               #    0.059 M/sec                    ( +-  0.67% )
               438,121,661 cycles                    #    3.393 GHz                      ( +-  0.06% ) [83.06%]
               150,808,605 stalled-cycles-frontend   #   34.42% frontend cycles idle     ( +-  0.14% ) [83.10%]
                80,748,941 stalled-cycles-backend    #   18.43% backend  cycles idle     ( +-  0.64% ) [66.73%]
               758,605,879 instructions              #    1.73  insns per cycle
                                                     #    0.20  stalled cycles per insn  ( +-  0.08% ) [83.54%]
               162,164,321 branches                  # 1255.940 M/sec                    ( +-  0.10% ) [83.70%]
                 1,609,903 branch-misses             #    0.99% of all branches          ( +-  0.08% ) [83.62%]
      
               0.129949153 seconds time elapsed                                          ( +-  0.28% )
      
      After:
      
        [root@sandy ~]# perf stat -r 10 perf sched lat > /dev/null
      
         Performance counter stats for 'perf sched lat' (10 runs):
      
                103.592215 task-clock                #    0.993 CPUs utilized            ( +-  0.33% )
                        12 context-switches          #    0.114 K/sec                    ( +-  3.29% )
                         0 cpu-migrations            #    0.000 K/sec
                     7,605 page-faults               #    0.073 M/sec                    ( +-  0.00% )
               345,796,112 cycles                    #    3.338 GHz                      ( +-  0.07% ) [82.90%]
               106,876,796 stalled-cycles-frontend   #   30.91% frontend cycles idle     ( +-  0.38% ) [83.23%]
                62,060,877 stalled-cycles-backend    #   17.95% backend  cycles idle     ( +-  0.80% ) [67.14%]
               628,246,586 instructions              #    1.82  insns per cycle
                                                     #    0.17  stalled cycles per insn  ( +-  0.04% ) [83.64%]
               134,962,057 branches                  # 1302.820 M/sec                    ( +-  0.10% ) [83.64%]
                 1,233,037 branch-misses             #    0.91% of all branches          ( +-  0.29% ) [83.41%]
      
               0.104333272 seconds time elapsed                                          ( +-  0.33% )
      
        [root@sandy ~]#
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-weu9t63zkrfrazkn0gxj48xy@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2b7fcbc5
    • Arnaldo Carvalho de Melo's avatar
      perf evsel: Introduce perf_evsel__{str,int}val methods · 5555ded4
      Arnaldo Carvalho de Melo authored
      Wrappers to the libtraceevent routines, so that we can further reduce
      the surface contact perf builtins have with it.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-rtmgzptvrifzjxqwb9vs6g1b@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5555ded4
    • Arnaldo Carvalho de Melo's avatar
      perf sched: Use perf_tool as ancestor · 0e9b07e5
      Arnaldo Carvalho de Melo authored
      So that we can remove all the globals.
      
      Before:
      
         text	   data	    bss	    dec	    hex	filename
      1586833	 110368	1438600	3135801	 2fd939	/tmp/oldperf
      
      After:
      
         text	   data	    bss	    dec	    hex	filename
      1629329	  93568	 848328	2571225	 273bd9	/root/bin/perf
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-oph40vikij0crjz4eyapneov@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0e9b07e5
    • Arnaldo Carvalho de Melo's avatar
      perf sched: Remove unused thread parameter · 4218e673
      Arnaldo Carvalho de Melo authored
      From the tracepoint handling routines.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-mcqd9mv34z6he0wqiz4a3mh9@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4218e673
    • Irina Tirdea's avatar
      perf tools: Use __maybe_used for unused variables · 1d037ca1
      Irina Tirdea authored
      perf defines both __used and __unused variables to use for marking
      unused variables. The variable __used is defined to
      __attribute__((__unused__)), which contradicts the kernel definition to
      __attribute__((__used__)) for new gcc versions. On Android, __used is
      also defined in system headers and this leads to warnings like: warning:
      '__used__' attribute ignored
      
      __unused is not defined in the kernel and is not a standard definition.
      If __unused is included everywhere instead of __used, this leads to
      conflicts with glibc headers, since glibc has a variables with this name
      in its headers.
      
      The best approach is to use __maybe_unused, the definition used in the
      kernel for __attribute__((unused)). In this way there is only one
      definition in perf sources (instead of 2 definitions that point to the
      same thing: __used and __unused) and it works on both Linux and Android.
      This patch simply replaces all instances of __used and __unused with
      __maybe_unused.
      Signed-off-by: default avatarIrina Tirdea <irina.tirdea@intel.com>
      Acked-by: default avatarPekka Enberg <penberg@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com
      [ committer note: fixed up conflict with a116e05d in builtin-sched.c ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1d037ca1
    • Jiri Olsa's avatar
      perf tools: Back [vdso] DSO with real data · 7dbf4dcf
      Jiri Olsa authored
      Storing data for VDSO shared object, because we need it for the post
      unwind processing.
      
      The VDSO shared object is same for all process on a running system, so
      it makes no difference when we store it inside the tracer - perf.
      
      When [vdso] map memory is hit, we retrieve [vdso] DSO image and store it
      into temporary file.
      
      During the build-id processing phase, the [vdso] DSO image is stored in
      build-id db, and build-id reference is made inside perf.data. The
      build-id vdso file object is called '[vdso]'. We don't use temporary
      file name which gets removed when record is finished.
      
      During report phase the vdso build-id object is treated as any other
      build-id DSO object.
      
      Adding following API for vdso object:
      
        bool is_vdso_map(const char *filename)
          - returns true if the filename matches vdso map name
      
        struct dso *vdso__dso_findnew(struct list_head *head)
          - find/create proper vdso DSO object
      
        vdso__exit(void)
          - removes temporary VDSO image if there's any
      
      This change makes backtrace dwarf post unwind possible from [vdso] maps.
      
      Following output is current report of [vdso] sample dwarf backtrace:
      
        # Overhead  Command      Shared Object                         Symbol
        # ........  .......  .................  .............................
        #
            99.52%       ex  [vdso]             [.] 0x00007fff3ace89af
                         |
                         --- 0x7fff3ace89af
      
      Following output is new report of [vdso] sample dwarf backtrace:
      
        # Overhead  Command      Shared Object                         Symbol
        # ........  .......  .................  .............................
        #
            99.52%       ex  [vdso]             [.] 0x00000000000009af
                         |
                         --- 0x7fff3ace89af
                             main
                             __libc_start_main
                             _start
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1347295819-23177-5-git-send-email-jolsa@redhat.com
      [ committer note: s/ALIGN/PERF_ALIGN/g to cope with the android build changes ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7dbf4dcf
    • Jiri Olsa's avatar
      perf symbols: Make dsos__find function globally available · 1c4be9ff
      Jiri Olsa authored
      Changing dsos__find function from static to be globally available.
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1347295819-23177-4-git-send-email-jolsa@redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1c4be9ff
    • Jiri Olsa's avatar
      perf tools: Add memdup function · b232e073
      Jiri Olsa authored
      Adding memdup function to duplicate region of memory.
      
        void *memdup(const void *src, size_t len)
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1347295819-23177-3-git-send-email-jolsa@redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b232e073
    • Jiri Olsa's avatar
      perf tools: Do backtrace post unwind only if we regs and stack were captured · bdde3716
      Jiri Olsa authored
      Bail out without error if we want to do backtrace post unwind, but were
      not able to capture user registers or user stack during the record
      phase, which is possible and valid case.
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1347295819-23177-2-git-send-email-jolsa@redhat.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bdde3716
    • Irina Tirdea's avatar
      perf tools: fix ALIGN redefinition in system headers · 9ac3e487
      Irina Tirdea authored
      On some systems (e.g. Android), ALIGN is defined in system headers as
      ALIGN(p).  The definition of ALIGN used in perf takes 2 parameters:
      ALIGN(x,a).  This leads to redefinition conflicts.
      
      Redefinition error on Android:
      In file included from util/include/linux/list.h:1:0,
      from util/callchain.h:5,
      from util/hist.h:6,
      from util/session.h:4,
      from util/build-id.h:4,
      from util/annotate.c:11:
      util/include/linux/kernel.h:11:0: error: "ALIGN" redefined [-Werror]
      bionic/libc/include/sys/param.h:38:0: note: this is the location of
      the previous definition
      
      Conflics with system defined ALIGN in Android:
      util/event.c: In function 'perf_event__synthesize_comm':
      util/event.c:115:32: error: macro "ALIGN" passed 2 arguments, but takes just 1
      util/event.c:115:9: error: 'ALIGN' undeclared (first use in this function)
      util/event.c:115:9: note: each undeclared identifier is reported only once for
      each function it appears in
      
      In order to avoid this redefinition, ALIGN is renamed to PERF_ALIGN.
      Signed-off-by: default avatarIrina Tirdea <irina.tirdea@intel.com>
      Acked-by: default avatarPekka Enberg <penberg@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Irina Tirdea <irina.tirdea@intel.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1347315303-29906-5-git-send-email-irina.tirdea@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9ac3e487