  1. 15 Mar, 2010 3 commits
    • perf top: Properly notify the user that vmlinux is missing · b0a9ab62
      Arnaldo Carvalho de Melo authored
      Before this patch this message would very briefly appear on the
      screen and then the screen would get updates only on the top,
      for number of interrupts received, etc, but no annotation would
      be performed:
      
       [root@doppio linux-2.6-tip]# perf top -s n_tty_write > /tmp/bla
       objdump: '[kernel.kallsyms]': No such file
      
      Now this is what the user gets:
      
       [root@doppio linux-2.6-tip]# perf top -s n_tty_write
       Can't annotate n_tty_write: No vmlinux file was found in the path:
       [0] vmlinux
       [1] /boot/vmlinux
       [2] /boot/vmlinux-2.6.33-rc5
       [3] /lib/modules/2.6.33-rc5/build/vmlinux
       [4] /usr/lib/debug/lib/modules/2.6.33-rc5/vmlinux
       [root@doppio linux-2.6-tip]#
      
      This bug was introduced when we added automatic search for
      vmlinux; before that time the user had to specify a vmlinux
      file.
      Reported-by: David S. Miller <davem@davemloft.net>
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <1268664418-28328-2-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
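The vmlinux lookup described in the commit above can be sketched roughly as follows. This is only an illustration in Python, not the perf implementation: the function names and the `exists` parameter are hypothetical, and the candidate paths are simply the five shown in the error output, parameterised on the kernel release.

```python
import os


def vmlinux_candidates(release):
    """Candidate vmlinux locations, in the order printed in the message above."""
    return [
        "vmlinux",
        "/boot/vmlinux",
        f"/boot/vmlinux-{release}",
        f"/lib/modules/{release}/build/vmlinux",
        f"/usr/lib/debug/lib/modules/{release}/vmlinux",
    ]


def find_vmlinux(release, exists=os.path.exists):
    """Return the first candidate that exists, or None if none does."""
    for path in vmlinux_candidates(release):
        if exists(path):
            return path
    return None


def missing_vmlinux_message(symbol, release):
    """Build a user-facing message listing every path that was tried."""
    lines = [f"Can't annotate {symbol}: No vmlinux file was found in the path:"]
    lines += [f"[{i}] {p}" for i, p in enumerate(vmlinux_candidates(release))]
    return "\n".join(lines)
```

The point of the fix is that the "nothing found" case is reported once, clearly, instead of being overwritten by the next screen refresh.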
    • perf record: Enable the enable_on_exec flag if record forks the target · bedbfdea
      Eric B Munson authored
      When forking its target, perf record can capture data from
      before the target application is started.  Perf stat uses the
      enable_on_exec flag in the event attributes to keep from
      displaying events from before the target program starts; this
      patch adds the same functionality to perf record when it
      forks the target process.
      Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1268664418-28328-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Enable not tagged retired instruction counting on P4s · e4495262
      Cyrill Gorcunov authored
      This should turn on instruction counting on P4s, which was missing in
      the first version of the new PMU driver.
      
      It's inaccurate for now; we still need a dependent event to tag mops
      before we can count them precisely. The result is that the number of
      instructions may be inflated.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1268629102.3355.11.camel@minggr.sh.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 13 Mar, 2010 2 commits
  3. 12 Mar, 2010 11 commits
    • Merge branch 'perf/x86' into perf/core · 03086359
      Ingo Molnar authored
      Merge reason: The new P4 driver is stable and ready now for more
                    testing.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf hist: Don't fprintf the callgraph unconditionally · 3997d377
      Arnaldo Carvalho de Melo authored
      [root@doppio ~]# perf report -i newt.data | head -10
        # Samples: 11999679868
        #
        # Overhead  Command                  Shared Object  Symbol
        # ........  .......  .............................  ......
        #
            63.61%     perf  libslang.so.2.1.4              [.] SLsmg_write_chars
             6.30%     perf  perf                           [.] symbols__find
             2.19%     perf  libnewt.so.0.52.10             [.] newtListboxAppendEntry
             2.08%     perf  libslang.so.2.1.4              [.] SLsmg_write_chars@plt
             1.99%     perf  libc-2.10.2.so                 [.] _IO_vfprintf_internal
        [root@doppio ~]#
      
      Not good, the newt form for report works, but slang has to eat
      the cost of the additional callgraph lines every time it prints a
      line, and the callgraph doesn't appear on the screen, so move
      the callgraph printing to a separate function and don't use it
      in newt.c.
      
      Newt tree widgets are being investigated to properly support
      callgraphs, but until that gets merged, let's remove this huge
      overhead and show at least the symbol overheads for a
      callgraph-rich perf.data with good performance.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268408808-13595-2-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf newt: Use newtGetScreenSize · cb7afb70
      Arnaldo Carvalho de Melo authored
      For consistency, use the newt API more fully.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268408808-13595-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf newt: Add 'Q', 'q' and Ctrl+C as ways to exit from forms · 7081e087
      Arnaldo Carvalho de Melo authored
      These are keys people expect to exit the current widget when
      pressed, so associate all of them with this semantic.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268401692-9361-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf report: Implement initial UI using newt · f9224c5c
      Arnaldo Carvalho de Melo authored
      Newt has widespread availability and provides a rather simple
      API as can be seen by the size of this patch.
      
      The work needed to support it will benefit other frontends too.
      
      In this initial patch it just checks if the output is a tty; if
      not, it falls back to the previous behaviour. Also, if
      newt-devel/libnewt-dev is not installed, the previous behaviour
      is maintained.
      
      Pressing enter on a symbol will annotate it, ESC in the
      annotation window will return to the report symbol list.
      
      More work will be done to remove the special casing in
      color_fprintf, stop using fmemopen/FILE in the printing of
      hist_entries, etc.
      
      Also the annotation doesn't need to be done via spawning "perf
      annotate" and then browsing its output; we can do better by
      directly calling the builtin-annotate.c functions, which would
      then be moved to tools/perf/util/annotate.c and shared with perf
      top, etc.
      
      But let's go by baby steps: this patch already improves perf
      usability by allowing quick annotation of symbols from
      the report screen and provides a first experiment with
      libnewt/TUI integration of tools.
      
      Tested on RHEL5 and Fedora12 X86_64 and on Debian PARISC64 to
      browse a perf.data file collected on a Fedora12 x86_64 box.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268349164-5822-5-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf tools: Add missing bytes printed in hist_entry__fprintf · dd2ee78d
      Arnaldo Carvalho de Melo authored
      We need those to properly size the browser width in the newt
      TUI.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268349164-5822-4-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf tools: Use eprintf for pr_{err,warning,info} too · b4f5296f
      Arnaldo Carvalho de Melo authored
      Just like we do for pr_debug, so that we can have a single point
      where to redirect to the currently used output system, be it
      stdio or newt.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268349164-5822-3-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf top: Export get_window_dimensions · 895f0edc
      Arnaldo Carvalho de Melo authored
      Will be used by the newt code too.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268349164-5822-2-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf symbols: Bump plt synthesizing warning debug level · fe2197b8
      Arnaldo Carvalho de Melo authored
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268349164-5822-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Merge branch 'perf/urgent' into perf/core · 937779db
      Ingo Molnar authored
      Merge reason: We want to queue up a dependent patch.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, perf: Fix NULL deref on not assigned x86_pmu · 0b861225
      Cyrill Gorcunov authored
      If x86_pmu is not assigned and software events are in use, a NULL
      dereference may be hit via the x86_pmu::schedule_events method.
      
      Fix it by checking if x86_pmu is initialized at all.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20100311215016.GG25162@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 11 Mar, 2010 11 commits
    • perf record: Mention paranoid sysctl when failing to create counter · 6230f2c7
      Arnaldo Carvalho de Melo authored
      [acme@mica linux-2.6-tip]$ perf record -a -f
         Fatal: Permission error - are you root?
       	 Consider tweaking /proc/sys/kernel/perf_event_paranoid.
      
       [acme@mica linux-2.6-tip]$
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268333592-30872-2-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf record: Don't try to find buildids in a zero sized file · 9f591fd7
      Arnaldo Carvalho de Melo authored
      Fixing this symptom:
      
       [acme@mica linux-2.6-tip]$ perf record -a -f
         Fatal: Permission error - are you root?
      
       Bus error
       [acme@mica linux-2.6-tip]$
      
      I.e. if for some reason no data is collected (in this case, a
      non-root user trying to do system-wide profiling), we end up
      trying to mmap a zero-sized file and access the file header:
      b00m.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <1268333592-30872-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
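The guard described in the commit above amounts to checking the file size before mapping it. A minimal sketch in Python, assuming a hypothetical helper name `map_perf_data`; the actual fix lives in the C code of perf record, and the header layout is deliberately not modelled here.

```python
import mmap
import os


def map_perf_data(path):
    """Return a read-only mapping of the file, or None if it holds no data.

    Checking the size first avoids mapping (and then dereferencing) an
    empty file, which is the failure the commit message describes.
    """
    if os.path.getsize(path) == 0:
        return None
    with open(path, "rb") as f:
        # Length 0 means "map the whole file"; the mapping stays valid
        # after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

A caller would skip the build-id scan entirely when `None` comes back.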
    • perf, x86: Implement initial P4 PMU driver · a072738e
      Cyrill Gorcunov authored
      The netburst PMU is way different from the "architectural
      performance monitoring" specification that current CPUs use.
      P4 uses a tuple of ESCR+CCCR+COUNTER MSR registers to handle
      performance monitoring events.
      
      A few implementation details:
      
      1) We need a separate x86_pmu::hw_config helper in struct
         x86_pmu since register bit-fields are quite different from P6,
         Core and later cpu series.
      
      2) For the same reason, an x86_pmu::schedule_events helper is
         introduced.
      
      3) hw_perf_event::config consists of packed ESCR+CCCR values.
         It's allowed since in reality both registers only use a half
         of their size. Of course before making a real write into a
         particular MSR we need to unpack the value and extend it to
         a proper size.
      
      4) The tuple of packed ESCR+CCCR in hw_perf_event::config
         doesn't describe the memory address of ESCR MSR register
         so that we need to keep a mapping between these tuples
         used and available ESCR (various P4 events may use same
         ESCRs but not simultaneously), for this sake every active
         event has a per-cpu map of hw_perf_event::idx <--> ESCR
         addresses.
      
      5) Since hw_perf_event::idx is an offset to counter/control register
         we need to lift X86_PMC_MAX_GENERIC up, otherwise the kernel
         strips it down to 8 registers and an armed event may never be turned
         off (i.e. the bit in active_mask is set but the loop never reaches
         this index to check). Thanks to Peter Zijlstra.
      
      Restrictions:
      
       - No cascaded counters support (do we ever need them?)
       - No dependent events support (so PERF_COUNT_HW_INSTRUCTIONS
         doesn't work for now)
       - There are events with same counters which can't work simultaneously
         (need to use intersected ones due to broken counter 1)
       - No PERF_COUNT_HW_CACHE_ events yet
      
      Todo:
      
       - Implement dependent events
       - Need proper hashing for event opcodes (no linear search, good for
         debugging stage but not in real loads)
       - Some events counted during a clock cycle -- need to set threshold
         for them and count every clock cycle just to get summary statistics
         (ie to behave the same way as other PMUs do)
       - Need to switch to use event_constraints
       - To support RAW events we need to encode a global list of P4 events
         into p4_templates
       - Cache events need to be added
      
      Event support status matrix:
      
       Event              status
       -----------------------------
       cycles             works
       cache-references   works
       cache-misses       works
       branch-misses      works
       bus-cycles         partially (does not work on 64bit cpu with HT enabled)
       instruction        doesn't work (needs dependent event [mop tagging])
       branches           doesn't work
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100311165439.GB5129@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
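Point 3 of the P4 driver commit above (packing ESCR and CCCR values into one hw_perf_event::config word) can be illustrated with plain integer arithmetic. This is a sketch under an assumption the message does not state: that the ESCR value sits in the upper 32 bits and the CCCR value in the lower 32 bits. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack_config(escr, cccr):
    """Pack two 32-bit register values into one 64-bit config word.

    Assumption for illustration: ESCR in bits 63..32, CCCR in bits 31..0.
    """
    if not (0 <= escr <= MASK32 and 0 <= cccr <= MASK32):
        raise ValueError("ESCR and CCCR values must each fit in 32 bits")
    return (escr << 32) | cccr


def unpack_config(config):
    """Split a packed config word back into (escr, cccr)."""
    return (config >> 32) & MASK32, config & MASK32
```

Before a real MSR write, each half would be unpacked and extended to the register's full width, as the commit message notes.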
    • perf_events: Improve task_sched_in() · 9b33fa6b
      eranian@google.com authored
      This patch is an optimization in perf_event_task_sched_in() to avoid
      scheduling the events twice in a row.
      
      Without it, the perf_disable()/perf_enable() pair is invoked twice,
      so pinned events count while flexible events are being scheduled, and
      we go through hw_perf_enable() twice.
      
      By encapsulating the whole sequence in perf_disable()/perf_enable(), we
      ensure hw_perf_enable() is invoked only once, because of the
      refcount protection.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268288765-5326-1-git-send-email-eranian@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
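The refcount protection mentioned in the commit above can be shown with a toy model. The class and the two scheduling functions below are simplified stand-ins written for illustration, not kernel code: `perf_disable()` and `perf_enable()` nest through a counter, and the expensive hardware enable runs only when the outermost pair is closed.

```python
class Pmu:
    """Toy model of nested perf_disable()/perf_enable() calls."""

    def __init__(self):
        self.disable_count = 0
        self.hw_enable_calls = 0  # how often the costly hardware path ran

    def perf_disable(self):
        self.disable_count += 1

    def perf_enable(self):
        self.disable_count -= 1
        if self.disable_count == 0:
            # Only the outermost enable reaches the hardware.
            self.hw_enable_calls += 1


def sched_in_separately(pmu):
    """Schedule pinned and flexible events, each in its own pair."""
    for _group in ("pinned", "flexible"):
        pmu.perf_disable()
        # ... schedule the events of this group ...
        pmu.perf_enable()


def sched_in_wrapped(pmu):
    """Same work, wrapped in one outer disable/enable pair."""
    pmu.perf_disable()
    sched_in_separately(pmu)
    pmu.perf_enable()
```

With the outer pair in place the inner enables only decrement the counter, so the hardware path runs once instead of twice.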
    • perf: export perf_trace_regs and perf_arch_fetch_caller_regs · 639fe4b1
      Xiao Guangrong authored
      Export perf_trace_regs and perf_arch_fetch_caller_regs since modules will
      use these.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      [ use EXPORT_PER_CPU_SYMBOL_GPL() ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4B989C1B.2090407@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Fix hw_perf_enable() event assignment · 45e16a68
      Peter Zijlstra authored
      What happens is that we schedule badly like:
      
      <...>-1987  [019]   280.252808: x86_pmu_start: event-46/1300c0: idx: 0
      <...>-1987  [019]   280.252811: x86_pmu_start: event-47/1300c0: idx: 1
      <...>-1987  [019]   280.252812: x86_pmu_start: event-48/1300c0: idx: 2
      <...>-1987  [019]   280.252813: x86_pmu_start: event-49/1300c0: idx: 3
      <...>-1987  [019]   280.252814: x86_pmu_start: event-50/1300c0: idx: 32
      <...>-1987  [019]   280.252825: x86_pmu_stop: event-46/1300c0: idx: 0
      <...>-1987  [019]   280.252826: x86_pmu_stop: event-47/1300c0: idx: 1
      <...>-1987  [019]   280.252827: x86_pmu_stop: event-48/1300c0: idx: 2
      <...>-1987  [019]   280.252828: x86_pmu_stop: event-49/1300c0: idx: 3
      <...>-1987  [019]   280.252829: x86_pmu_stop: event-50/1300c0: idx: 32
      <...>-1987  [019]   280.252834: x86_pmu_start: event-47/1300c0: idx: 1
      <...>-1987  [019]   280.252834: x86_pmu_start: event-48/1300c0: idx: 2
      <...>-1987  [019]   280.252835: x86_pmu_start: event-49/1300c0: idx: 3
      <...>-1987  [019]   280.252836: x86_pmu_start: event-50/1300c0: idx: 32
      <...>-1987  [019]   280.252837: x86_pmu_start: event-51/1300c0: idx: 32 *FAIL*
      
      This happens because we only iterate the n_running events in the first
      pass, and reset their index to -1 if they don't match to force a
      re-assignment.
      
      Now, in our RR example, n_running == 0 because we fully unscheduled, so
      event-50 will retain its idx==32, even though in scheduling it will have
      gotten idx=0, and we don't trigger the re-assign path.
      
      The easiest way to fix this is the below patch, which simply validates
      the full assignment in the second pass.
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268311069.5037.31.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, ppc: Fix compile error due to new cpu notifiers · 85cfabbc
      Peter Zijlstra authored
      Fix:
      
        arch/powerpc/kernel/perf_event.c:1334: error: 'power_pmu_notifier' undeclared (first use in this function)
        arch/powerpc/kernel/perf_event.c:1334: error: (Each undeclared identifier is reported only once
        arch/powerpc/kernel/perf_event.c:1334: error: for each function it appears in.)
        arch/powerpc/kernel/perf_event.c:1334: error: implicit declaration of function 'power_pmu_notifier'
        arch/powerpc/kernel/perf_event.c:1334: error: implicit declaration of function 'register_cpu_notifier'
      
      Due to commit 3f6da390 (perf: Rework and fix the arch CPU-hotplug hooks).
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Make the install relative to DESTDIR if specified · 7ae5f213
      John Kacur authored
      Without this change, the install path is relative to
      prefix/DESTDIR where prefix is automatically set to $HOME.
      
      This can produce unexpected results. For example:
      
        make -C tools/perf DESTDIR=/home/jkacur/tmp install-man
      
      creates the directory:		/home/jkacur/home/jkacur/tmp/share/...
      instead of the expected:	/home/jkacur/tmp/share/...
      Signed-off-by: John Kacur <jkacur@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Kyle McMartin <kyle@redhat.com>
      Cc: <stable@kernel.org>
      LKML-Reference: <1268312220-12880-1-git-send-email-jkacur@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kprobes: Calculate the index correctly when freeing the out-of-line execution slot · 83ff56f4
      Masami Hiramatsu authored
      From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      
      When freeing the instruction slot, the arithmetic to calculate
      the index of the slot in the page needs to account for the total
      size of the instruction on the various architectures.
      
      Calculate the index correctly when freeing the out-of-line
      execution slot.
      Reported-by: Sachin Sant <sachinp@in.ibm.com>
      Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      LKML-Reference: <4B9667AB.9050507@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
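The index arithmetic the kprobes commit above refers to can be sketched with byte addresses. Everything here is an illustrative assumption rather than the kernel's code: `insn_size` is taken to be the slot length counted in opcode units, `opcode_size` the size of one opcode unit in bytes, and the function name is hypothetical.

```python
def slot_index(slot_addr, page_base, insn_size, opcode_size):
    """Index of an instruction slot within its page.

    A slot occupies insn_size * opcode_size bytes, so the byte offset
    from the page start has to be divided by that product; dividing by
    insn_size alone is only right when one opcode unit is one byte.
    """
    slot_bytes = insn_size * opcode_size
    offset = slot_addr - page_base
    if offset < 0 or offset % slot_bytes:
        raise ValueError("address is not on a slot boundary")
    return offset // slot_bytes
```

For example, with 2 opcode units of 4 bytes each per slot, the slot at byte offset 16 is index 2, whereas dividing by `insn_size` alone would yield 8.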
    • perf tools: Fix sparse CPU numbering related bugs · a12b51c4
      Paul Mackerras authored
      At present, the perf subcommands that do system-wide monitoring
      (perf stat, perf record and perf top) don't work properly unless
      the online cpus are numbered 0, 1, ..., N-1.  These tools ask
      for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN)
      and then try to create events for cpus 0, 1, ..., N-1.
      
      This creates problems for systems where the online cpus are
      numbered sparsely.  For example, a POWER6 system in
      single-threaded mode (i.e. only running 1 hardware thread per
      core) will have only even-numbered cpus online.
      
      This fixes the problem by reading the /sys/devices/system/cpu/online
      file to find out which cpus are online.  The code that does that is in
      tools/perf/util/cpumap.[ch], and consists of a read_cpu_map()
      function that sets up a cpumap[] array and returns the number of
      online cpus.  If /sys/devices/system/cpu/online can't be read or
      can't be parsed successfully, it falls back to using sysconf to
      ask how many cpus are online and sets up an identity map in cpumap[].
      
      The perf record, perf stat and perf top code then calls
      read_cpu_map() in the system-wide monitoring case (instead of
      sysconf) and uses cpumap[] to get the cpu numbers to pass to
      perf_event_open.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100310093609.GA3959@brick.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
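The approach in the sparse CPU numbering commit above can be sketched as follows. The real code is C in tools/perf/util/cpumap.[ch]; this Python version only mirrors the idea. The `parse_cpu_list` helper and the use of `os.cpu_count()` as a stand-in for `sysconf(_SC_NPROCESSORS_ONLN)` are assumptions made for the example, and the file is assumed to hold a comma-separated list of CPU numbers and ranges such as `0,2,4-6`.

```python
import os

ONLINE_PATH = "/sys/devices/system/cpu/online"


def parse_cpu_list(text):
    """Expand a list such as "0,2,4-6" into [0, 2, 4, 5, 6]."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if hi < lo:
                raise ValueError(f"bad range: {part!r}")
            cpus.extend(range(lo, hi + 1))
        else:
            cpus.append(int(part))  # raises ValueError on empty or garbage
    return cpus


def read_cpu_map(path=ONLINE_PATH):
    """Return the list of online CPU numbers.

    Falls back to an identity map 0..N-1 when the sysfs file cannot be
    read or parsed, as the commit message describes.
    """
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except (OSError, ValueError):
        n = os.cpu_count() or 1  # stand-in for sysconf(_SC_NPROCESSORS_ONLN)
        return list(range(n))
```

Each entry of the returned list is what would be passed as the cpu argument when opening an event, so a system with only even-numbered CPUs online gets events on 0, 2, 4, ... rather than on 0, 1, 2, ...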
    • perf_event: Fix oops triggered by cpu offline/online · 220b140b
      Paul Mackerras authored
      Anton Blanchard found that he could reliably make the kernel hit a
      BUG_ON in the slab allocator by taking a cpu offline and then online
      while a system-wide perf record session was running.
      
      The reason is that when the cpu comes up, we completely reinitialize
      the ctx field of the struct perf_cpu_context for the cpu.  If there is
      a system-wide perf record session running, then there will be a struct
      perf_event that has a reference to the context, so its refcount will
      be 2.  (The perf_event has been removed from the context's group_entry
      and event_entry lists by perf_event_exit_cpu(), but that doesn't
      remove the perf_event's reference to the context and doesn't decrement
      the context's refcount.)
      
      When the cpu comes up, perf_event_init_cpu() gets called, and it calls
      __perf_event_init_context() on the cpu's context.  That resets the
      refcount to 1.  Then when the perf record session finishes and the
      perf_event is closed, the refcount gets decremented to 0 and the
      context gets kfreed after an RCU grace period.  Since the context
      wasn't kmalloced -- it's part of a per-cpu variable -- bad things
      happen.
      
      In fact we don't need to completely reinitialize the context when the
      cpu comes up.  It's sufficient to initialize the context once at boot,
      but we need to do it for all possible cpus.
      
      This moves the context initialization to happen at boot time.  With
      this, we don't trash the refcount and the context never gets kfreed,
      and we don't hit the BUG_ON.
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Tested-by: Anton Blanchard <anton@samba.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
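The refcount sequence in the cpu offline/online commit above is easy to lose track of, so here is a toy model of it. The `Context` class and `hotplug_cycle` function are hypothetical simplifications written for this example; the `freed` flag stands in for the kfree that must never happen to a per-cpu context.

```python
class Context:
    """Toy per-cpu context with a reference count."""

    def __init__(self):
        self.freed = False
        self.init()

    def init(self):
        # Models the effect of __perf_event_init_context(): refcount = 1.
        self.refcount = 1

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # stands in for kfree() after a grace period


def hotplug_cycle(reinit_on_cpu_up):
    ctx = Context()   # boot-time initialisation: refcount 1
    ctx.get()         # system-wide event takes a reference: refcount 2
    # cpu goes offline, then comes back online
    if reinit_on_cpu_up:
        ctx.init()    # old behaviour: refcount reset to 1
    ctx.put()         # perf record ends and the event is closed
    return ctx
```

With the reinitialisation the final `put()` drops the count to 0 and the context is "freed"; without it the count returns to 1 and the context survives, which is the behaviour the fix restores.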
  5. 10 Mar, 2010 13 commits
    • perf: Drop the obsolete profile naming for trace events · 97d5a220
      Frederic Weisbecker authored
      Drop the obsolete "profile" naming used by perf for trace events.
      Perf can now do more than simple event counting, so generalize
      the API naming.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
    • perf: Take a hot regs snapshot for trace events · c530665c
      Frederic Weisbecker authored
      We are taking a wrong regs snapshot when a trace event triggers.
      Either we use get_irq_regs(), which gives us the interrupted
      registers if we are in an interrupt, or we use task_pt_regs()
      which gives us the state before we entered the kernel, assuming
      we are lucky enough not to be a kernel thread, in which case
      task_pt_regs() returns the initial set of regs when the kernel
      thread was started.
      
      What we want is different. We need a hot snapshot of the regs,
      so that we can get the instruction pointer to record in the
      sample, the frame pointer for the callchain, and some other
      things.
      
      Let's use the new perf_fetch_caller_regs() for that.
      
      Comparison with perf record -e lock: -R -a -f -g
      Before:
      
              perf  [kernel]                   [k] __do_softirq
                     |
                     --- __do_softirq
                        |
                        |--55.16%-- __open
                        |
                         --44.84%-- __write_nocancel
      
      After:
      
                  perf  [kernel]           [k] perf_tp_event
                     |
                     --- perf_tp_event
                        |
                        |--41.07%-- lock_acquire
                        |          |
                        |          |--39.36%-- _raw_spin_lock
                        |          |          |
                        |          |          |--7.81%-- hrtimer_interrupt
                        |          |          |          smp_apic_timer_interrupt
                        |          |          |          apic_timer_interrupt
      
      The old case was producing unreliable callchains. Now, having
      the right frame and instruction pointers, we have the trace we
      want.
      
      Also syscalls and kprobe events already have the right regs,
      let's use them instead of wasting a retrieval.
      
      v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Archs <linux-arch@vger.kernel.org>
    • Frederic Weisbecker's avatar
      perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot · 5331d7b8
      Frederic Weisbecker authored
      Events that trigger overflows by interrupting a context can
      use get_irq_regs() or task_pt_regs() to retrieve the state
      when the event triggered. But this is not the case for some
      other classes of events, such as trace events, because
      tracepoints are executed in the same context as the code
      that triggered the event.
      
      It means we need a different API to capture the regs there.
      Namely, we need a hot snapshot to get the most important
      information for perf: the instruction pointer to get the
      event origin, the frame pointer for the callchain, the code
      segment for user_mode() tests (we always use __KERNEL_CS as
      trace events always occur from the kernel) and the eflags
      for further purposes.
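
      A minimal user-space sketch of the idea, assuming GCC or Clang
      builtins. The names hot_regs, fetch_caller_regs_sketch() and
      trace_event_sketch() are hypothetical and are not the kernel
      API; the code segment and eflags are left out because they have
      no portable user-space equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down register set: only what a sample needs. */
struct hot_regs {
    uintptr_t ip;   /* where the event was raised from */
    uintptr_t fp;   /* frame pointer, a starting point for a callchain */
};

/*
 * Must not be inlined: __builtin_return_address(0) then points into
 * the caller, right after the call, which is the event origin.
 * __builtin_frame_address(0) is the frame of this helper itself, so
 * a real walker would have to skip it.
 */
__attribute__((noinline))
void fetch_caller_regs_sketch(struct hot_regs *regs)
{
    assert(regs != NULL);
    regs->ip = (uintptr_t)__builtin_return_address(0);
    regs->fp = (uintptr_t)__builtin_frame_address(0);
}

/* Stand-in for a tracepoint handler running in the triggering context. */
__attribute__((noinline))
int trace_event_sketch(struct hot_regs *out)
{
    fetch_caller_regs_sketch(out);
    return out->ip != 0;
}
```

      No interrupt is involved here: the snapshot is taken from the
      running code itself, which is the difference with
      get_irq_regs() and task_pt_regs().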
      
      v2: rename perf_save_regs to perf_fetch_caller_regs as per
      Masami's suggestion.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Archs <linux-arch@vger.kernel.org>
      5331d7b8
    • Frederic Weisbecker's avatar
      perf/x86-64: Use frame pointer to walk on irq and process stacks · 61e67fb9
      Frederic Weisbecker authored
      We were using the frame pointer based stack walker in every
      context on x86-32, but not on x86-64, where we only use the
      seven-league boots on the exception stacks.
      
      Use it on irq and process stacks too. This greatly accelerates
      the captures.
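
      A frame pointer based walk can be sketched as below. This is an
      illustration on a synthetic chain of frames, not the kernel's
      walker; struct frame and walk_frames() are hypothetical names.
      Each step is a single pointer dereference, which is presumably
      where the speedup over inspecting every stack word comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Layout a frame pointer based walker expects at each frame. */
struct frame {
    const struct frame *next;  /* saved frame pointer of the caller */
    unsigned long ret_addr;    /* return address into the caller */
};

/*
 * Follow the saved frame pointers and collect return addresses.
 * Stops at a NULL link or once max entries have been stored.
 */
size_t walk_frames(const struct frame *fp, unsigned long *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (fp != NULL && n < max) {
        out[n++] = fp->ret_addr;
        fp = fp->next;
    }
    return n;
}
```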
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      61e67fb9
    • Frederic Weisbecker's avatar
      lockdep: Move lock events under lockdep recursion protection · db2c4c77
      Frederic Weisbecker authored
      There are RCU read-side critical sections in the path where we
      submit a trace event, and these rcu_read_(un)lock() calls trigger
      lock events, which create recursive events.
      
      One pair in do_perf_sw_event:
      
      __lock_acquire
            |
            |--96.11%-- lock_acquire
            |          |
            |          |--27.21%-- do_perf_sw_event
            |          |          perf_tp_event
            |          |          |
            |          |          |--49.62%-- ftrace_profile_lock_release
            |          |          |          lock_release
            |          |          |          |
            |          |          |          |--33.85%-- _raw_spin_unlock
      
      Another pair in perf_output_begin/end:
      
      __lock_acquire
            |--23.40%-- perf_output_begin
            |          |          __perf_event_overflow
            |          |          perf_swevent_overflow
            |          |          perf_swevent_add
            |          |          perf_swevent_ctx_event
            |          |          do_perf_sw_event
            |          |          perf_tp_event
            |          |          |
            |          |          |--55.37%-- ftrace_profile_lock_acquire
            |          |          |          lock_acquire
            |          |          |          |
            |          |          |          |--37.31%-- _raw_spin_lock
      
      The problem is not so much the trace recursion itself, as we
      already have a recursion protection (though it's always wasteful
      to recurse). But the trace events are outside the lockdep
      recursion protection, so each lockdep event triggers a lock trace,
      which triggers two other lockdep events. The recursive lock trace
      event won't be taken because of the trace recursion protection, so
      the recursion stops there, but lockdep will still analyse these
      new events.
      
      To sum up, for each lockdep event we have:
      
              lock_*()
                   |
                   trace lock_acquire
                        |
                        ----- rcu_read_lock()
                        |          |
                        |          lock_acquire()
                        |          |
                        |          trace_lock_acquire() (stopped)
                        |          |
                        |          lockdep analyze
                        |
                        ----- rcu_read_unlock()
                                   |
                                   lock_release
                                   |
                                   trace_lock_release() (stopped)
                                   |
                                   lockdep analyze
      
      And the above happens twice, as we have two RCU read-side
      sections when we submit an event.
      
      This is fixed in this patch by moving the lock trace event under
      the lockdep recursion protection.
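
      The effect can be modelled with two flags, as in the sketch
      below. This is a simplified illustration, not the lockdep code:
      all names are hypothetical, the per-task state is a plain global,
      and only one RCU read-side section is modelled per trace event.

```c
#include <assert.h>

static int lockdep_recursion;  /* per-task in the kernel; a global here */
static int trace_recursion;    /* the already existing trace protection */
static int analysed;           /* lock events that lockdep analysed */

static void lock_event(int trace_inside_guard);

/* Submitting a trace event enters and leaves one RCU read-side section. */
static void trace_lock_event(int trace_inside_guard)
{
    if (trace_recursion)
        return;                        /* recursive trace is dropped */
    trace_recursion = 1;
    lock_event(trace_inside_guard);    /* rcu_read_lock() */
    lock_event(trace_inside_guard);    /* rcu_read_unlock() */
    trace_recursion = 0;
}

static void lock_event(int trace_inside_guard)
{
    if (lockdep_recursion)
        return;                        /* nested event: nothing to do */
    if (!trace_inside_guard)
        trace_lock_event(trace_inside_guard);   /* old placement */
    lockdep_recursion = 1;
    if (trace_inside_guard)
        trace_lock_event(trace_inside_guard);   /* new placement */
    analysed++;
    lockdep_recursion = 0;
}

/* Returns how many analyses a single lock event costs in this model. */
int analyses_per_event(int trace_inside_guard)
{
    analysed = 0;
    lock_event(trace_inside_guard);
    assert(lockdep_recursion == 0 && trace_recursion == 0);
    return analysed;
}
```

      In this model, one lock event costs three analyses with the
      trace outside the guard and a single one with the trace inside
      it.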
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      db2c4c77
    • Arnaldo Carvalho de Melo's avatar
      perf report: Print the map table just after samples for which no map was found · 65f2ed2b
      Arnaldo Carvalho de Melo authored
      If -vv is used, just the map table will be printed; -vvv will
      print the symbol table too. With it we can see that we have a
      bug where some samples are not being resolved to a map when we
      get them in the perf.data stream, but after we have it all
      processed we can find the right map, so some reordering is
      probably happening.
      
      Upcoming patches will provide ways to ask for most PERF_SAMPLE_
      conditional samples to be taken for !PERF_RECORD_SAMPLE events
      too, then we'll be able to ask for PERF_SAMPLE_TIME and
      PERF_SAMPLE_CPU to help diagnose this.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268161097-17761-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      65f2ed2b
    • Eric B Munson's avatar
      perf report: Add multiple event support · cbbc79a5
      Eric B Munson authored
      Perf report does not handle multiple events being reported, even
      though perf record stores them properly on disk.  This patch
      addresses that issue by adding the logic to perf report to use
      the event stream id that is saved by record and the new data
      structures to separate the event streams and report them
      individually.
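
      Separating samples by the id saved at record time can be
      sketched as follows. This is not the perf report code: the
      structures and function names are hypothetical, and a small flat
      array stands in for whatever lookup structure the tool uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 8

/* One recorded sample, tagged with the id of the event it belongs to. */
struct sample {
    uint64_t id;
    uint64_t period;
};

/* Per-event accumulation, standing in for a per-event histogram. */
struct event_hist {
    uint64_t id;
    uint64_t nr_samples;
    uint64_t total_period;
};

struct report {
    struct event_hist ev[MAX_EVENTS];
    size_t nr_events;
};

/* Find the bucket for this id, creating it on first use. */
static struct event_hist *report_find_or_add(struct report *r, uint64_t id)
{
    for (size_t i = 0; i < r->nr_events; i++)
        if (r->ev[i].id == id)
            return &r->ev[i];
    if (r->nr_events == MAX_EVENTS)
        return NULL;                   /* no room for another event */
    struct event_hist *h = &r->ev[r->nr_events++];
    h->id = id;
    h->nr_samples = 0;
    h->total_period = 0;
    return h;
}

/* Route one sample to its own event's bucket; -1 if the table is full. */
int report_add_sample(struct report *r, const struct sample *s)
{
    assert(r != NULL && s != NULL);
    struct event_hist *h = report_find_or_add(r, s->id);
    if (h == NULL)
        return -1;
    h->nr_samples++;
    h->total_period += s->period;
    return 0;
}
```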
      Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267804269-22660-6-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cbbc79a5
    • Eric B Munson's avatar
      perf session: Change perf_session post processing functions to take histogram tree · eefc465c
      Eric B Munson authored
      Now that report can store histograms for multiple events we
      need to be able to do the post processing work for each
      histogram. This patch changes the post processing functions so
      that they can be called individually for each event's histogram.
      Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
      [ Guarantee bisectability by fixing up builtin-report.c ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267804269-22660-5-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      eefc465c
    • Eric B Munson's avatar
      perf session: Add storage for separating event types in report · cb8f0939
      Eric B Munson authored
      This patch adds the structures necessary to count each event
      type independently in perf report.
      Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267804269-22660-4-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cb8f0939
    • Eric B Munson's avatar
      perf session: Change add_hist_entry to take the tree root instead of session · d403d0ac
      Eric B Munson authored
      In order to minimize the impact of storing multiple events in a
      report this function will now take the root of the histogram
      tree so that the logic for selecting the proper tree can be
      inserted before the call.
      Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267804269-22660-3-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d403d0ac
    • Eric B Munson's avatar
      perf record: Add ID and to recorded event data when recording multiple events · 8907fd60
      Eric B Munson authored
      Currently perf record does not write the ID or the to disk for
      events. This doesn't allow report to tell if an event stream
      contains one or more types of events.  This patch adds this
      entry to the list of data that record will write to disk if more
      than one event was requested.
      Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267804269-22660-2-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8907fd60
    • Arnaldo Carvalho de Melo's avatar
      perf probe: Add missing variable initialization · accd3cc4
      Arnaldo Carvalho de Melo authored
      cc1: warnings being treated as errors
       util/probe-finder.c: In function 'find_line_range':
       util/probe-finder.c:172: warning: 'src' may be used uninitialized in this function
       make: *** [util/probe-finder.o] Error 1
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267804269-22660-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      accd3cc4
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Don't throw away old map slices not overlapped by new maps · 12245509
      Arnaldo Carvalho de Melo authored
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267800842-22324-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      12245509