  1. 08 Sep, 2010 3 commits
    • perf probe: Fix return probe support · 04ddd04b
      Masami Hiramatsu authored
      Fix a bug so that the %return probe syntax works again. The previous
      commit 4235b045 introduced a bug that disabled the %return syntax in
      perf probe.
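      For reference, the %return syntax that this fix restores places a probe
      at a function's return; a usage sketch (the probed function and the
      $retval fetch argument are illustrative, and argument support depends on
      the kernel and perf version):

       # perf probe --add 'vfs_read%return $retval'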
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100827113852.22882.87447.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tracing/kprobe: Fix a memory leak in error case · 61a52736
      Masami Hiramatsu authored
      Fix a memory leak that happens when a field name conflicts with another.
      In the error case, free_trace_probe() frees all arguments up to nr_args,
      so this patch increments nr_args at the beginning of the loop instead of
      at the end.
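      A minimal user-space sketch of the counting pattern (structure and
      helper names are illustrative, not the kernel's): count the slot before
      parsing it, so a cleanup that walks args[0..nr_args-1] also releases the
      entry whose field name turned out to conflict.

       #include <stdlib.h>
       #include <string.h>

       struct arg { char *name; };
       struct probe { unsigned int nr_args; struct arg args[8]; };

       /* Illustrative parser: duplicates the name, fails on a conflict. */
       static int parse_one(struct probe *p, struct arg *a, const char *s)
       {
               a->name = strdup(s);
               if (!a->name)
                       return -1;
               for (unsigned int i = 0; i + 1 < p->nr_args; i++)
                       if (!strcmp(p->args[i].name, s))
                               return -1;      /* field name conflict */
               return 0;
       }

       /* Frees everything up to nr_args, including a half-parsed entry. */
       static void free_probe(struct probe *p)
       {
               for (unsigned int i = 0; i < p->nr_args; i++)
                       free(p->args[i].name);
       }

       static int parse_args(struct probe *p, int argc, const char **argv)
       {
               for (int i = 0; i < argc && i < 8; i++) {
                       p->nr_args++;   /* count first ... */
                       if (parse_one(p, &p->args[i], argv[i]) < 0) {
                               free_probe(p);  /* ... so the error path frees
                                                * the conflicting entry too */
                               return -1;
                       }
               }
               return 0;
       }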
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      LKML-Reference: <20100827113846.22882.12670.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tracing: Do not allow llseek to set_ftrace_filter · 9c55cb12
      Steven Rostedt authored
      Reading the file set_ftrace_filter does three things.
      
      1) shows whether or not filters are set for the function tracer
      2) shows what functions are set for the function tracer
      3) shows what triggers are set on any functions
      
      Item 3 is independent of items 1 and 2.
      
      The way this file currently works is that it is a state machine,
      and as you read it, it may change state. But this assumption breaks
      when you use lseek() on the file: the state machine gets out of sync,
      and t_show() may use the wrong pointer and cause a kernel oops.
      
      Luckily, this will only kill the app that does the lseek, but the app
      dies while holding a mutex. This prevents anyone else from using the
      set_ftrace_filter file (or any other function tracing file for that matter).
      
      A real fix for this is to rewrite the code, but that is too much for
      a -rc release or stable. This patch simply disables llseek on the
      set_ftrace_filter file for now, and we can do the proper fix in the
      next major release.
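      The usual pattern for refusing seeks on such a file is to point the
      .llseek operation at no_llseek; a sketch of what the file_operations
      would look like (the open/write/release handler names here are
      illustrative, not necessarily the exact ones in ftrace.c):

       static const struct file_operations ftrace_filter_fops = {
               .open    = ftrace_filter_open,
               .read    = seq_read,
               .write   = ftrace_filter_write,
               .llseek  = no_llseek,   /* lseek() now fails with -ESPIPE instead
                                        * of desynchronizing the seq state */
               .release = ftrace_regex_release,
       };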
      Reported-by: Robert Swiecki <swiecki@google.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Eugene Teo <eugene@redhat.com>
      Cc: vendor-sec@lst.de
      Cc: <stable@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 03 Sep, 2010 3 commits
    • perf, x86: Try to handle unknown nmis with an enabled PMU · 4177c42a
      Robert Richter authored
      When the PMU is enabled it is valid to have unhandled NMIs: two
      events could trigger 'simultaneously', raising two back-to-back
      NMIs. If the first NMI handles both, the latter will be empty
      and daze the CPU.
      
      The solution used to avoid an 'unknown nmi' message in this case was
      simply to stop the NMI handler chain whenever the PMU is enabled, by
      stating that the NMI was handled. This has the drawbacks that a) we
      can no longer detect unknown NMIs, and b) subsequent NMI handlers
      are not called.
      
      This patch addresses that. Now, when an unknown NMI arrives, we check
      whether it could be a PMU back-to-back NMI. Otherwise we pass it on
      and let the kernel handle the unknown NMI.
      
      This is a debug log:
      
       cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
       cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
       cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
       cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
       cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
       cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
       cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
       cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
       cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
       cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
       cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
       cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
       cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
       cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
       cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
       cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
       Uhhuh. NMI received for unknown reason 00 on CPU 6.
       Do you have a strange power saving mode enabled?
       Dazed and confused, but trying to continue
      
      Deltas:
      
       nmi #32334 340186
       nmi #32335 1327704
       nmi #32336 1819      <<<< back-to-back nmi [1]
       nmi #32337 85961
       nmi #32338 284507
       nmi #32339 1578809
       nmi #32340 217616
       nmi #32341 1798      <<<< back-to-back nmi [2]
       nmi #32342 240913
       nmi #32343 1512809
       nmi #32344 116672
       nmi #32345 412453
       nmi #32346 1462095   <<<< 1st nmi (standard) handling 2 counters
       nmi #32347 2046      <<<< 2nd nmi (back-to-back) handling one counter
       nmi #32348 1773      <<<< 3rd nmi (back-to-back) handling no counter! [3]
      
      For back-to-back NMI detection the following rules apply:
      
      A back-to-back NMI is assumed when the PMU NMI handler handled more
      than one counter and no counter was handled in the subsequent NMI
      (see [1] and [2] above).
      
      There is another case with two subsequent back-to-back NMIs [3]. The
      2nd is detected as back-to-back because the first handled more than
      one counter. If the second handles one counter and the 3rd handles
      nothing, we also drop the 3rd NMI because it could be a back-to-back
      NMI.
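      A simplified sketch of that bookkeeping as per-CPU state (field and
      function names are made up for illustration; the real patch keeps a
      similar marked/handled pair in the x86 NMI notifier):

       struct pmu_nmi_state {
               unsigned int marked;    /* NMI count that may arrive "empty"   */
               int          handled;   /* counters handled when it was marked */
       };

       /* After the PMU handler claimed this_nmi: if it handled more than one
        * counter (or is itself the tail of a back-to-back pair), the next NMI
        * may have nothing left to do, so remember that it can be swallowed. */
       static void pmu_note_handled(struct pmu_nmi_state *st,
                                    unsigned int this_nmi, int handled)
       {
               if (handled > 1 ||
                   (st->marked == this_nmi && st->handled > 1)) {
                       st->marked  = this_nmi + 1;
                       st->handled = handled;
               }
       }

       /* For an NMI nobody claimed: swallow it only if it is exactly the one
        * we marked; otherwise let the kernel report the unknown NMI. */
       static int pmu_swallow_unknown_nmi(struct pmu_nmi_state *st,
                                          unsigned int this_nmi)
       {
               return this_nmi == st->marked;
       }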
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      [ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ]
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: peterz@infradead.org
      Cc: gorcunov@gmail.com
      Cc: fweisbec@gmail.com
      Cc: ying.huang@intel.com
      Cc: ming.m.lin@intel.com
      Cc: eranian@google.com
      LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Fix handle_irq return values · de725dec
      Peter Zijlstra authored
      Now that we rely on the number of handled overflows, ensure all
      handle_irq implementations actually return the right number.
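      The convention this enforces: handle_irq counts each serviced overflow
      and returns that count, rather than a bare handled/not-handled flag. A
      self-contained sketch (the per-counter helpers are stubs standing in for
      the real machinery):

       #define NUM_COUNTERS 4

       /* Illustrative stubs for the per-counter overflow machinery. */
       static int counter_overflowed(int idx) { return idx == 0; }
       static void service_overflow(int idx) { (void)idx; }

       static int example_pmu_handle_irq(void)
       {
               int handled = 0;

               for (int idx = 0; idx < NUM_COUNTERS; idx++) {
                       if (!counter_overflowed(idx))
                               continue;
                       service_overflow(idx);
                       handled++;      /* one per serviced overflow */
               }
               return handled;         /* callers rely on this count, not on
                                        * a bare "1 = something was handled" */
       }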
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: peterz@infradead.org
      Cc: robert.richter@amd.com
      Cc: gorcunov@gmail.com
      Cc: fweisbec@gmail.com
      Cc: ying.huang@intel.com
      Cc: ming.m.lin@intel.com
      Cc: eranian@google.com
      LKML-Reference: <1283454469-1909-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Fix accidentally ack'ing a second event on intel perf counter · 2e556b5b
      Don Zickus authored
      During testing of a patch to stop the perf subsystem from swallowing
      NMIs, it was uncovered that Nehalem boxes were randomly getting
      unknown NMIs when using the perf tool.
      
      Moving the ack'ing of the PMI closer to where we read the status
      allows the hardware to properly re-set the PMU bit, signaling that
      another PMI was triggered during the processing of the first PMI.
      This allows the new logic for dealing with the shortcomings of
      multiple PMIs to handle the extra NMI by 'eating' it later.
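      In terms of the Intel PMU MSR sequence, the idea is to acknowledge the
      overflow bits immediately after reading them, before doing any
      processing, so a PMI raised in the meantime latches fresh status bits
      instead of being lost; a rough fragment (not the exact handler code):

       u64 status;

       rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);   /* read overflow bits  */
       wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, status); /* ack them right away */

       /* ... now process the counters recorded in 'status'; a second PMI that
        * fires during this processing sets the status bits again and is seen
        * on the next pass of the handler loop. */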
      
      Now, one can wonder why we are getting a second PMI at all when we
      disable all the PMUs at the beginning of the NMI handler precisely to
      prevent such a case; that I do not know. But I do know that the fix
      below helps deal with this quirk.
      
      Tested on multiple Nehalems where the problem was occurring.
      With the patch, the code now loops a second time to handle the
      second PMI (whereas before it did not).
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: peterz@infradead.org
      Cc: robert.richter@amd.com
      Cc: gorcunov@gmail.com
      Cc: fweisbec@gmail.com
      Cc: ying.huang@intel.com
      Cc: ming.m.lin@intel.com
      Cc: eranian@google.com
      LKML-Reference: <1283454469-1909-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 01 Sep, 2010 3 commits
  4. 31 Aug, 2010 2 commits
  5. 30 Aug, 2010 1 commit
    • perf_events: Fix time tracking for events with pid != -1 and cpu != -1 · fa66f07a
      Stephane Eranian authored
      Per-thread events with a cpu filter, i.e., cpu != -1, were not
      reporting correct timings when the thread never ran on the
      monitored cpu. The time enabled was reported as a negative
      value.
      
      This patch fixes the problem by updating tstamp_stopped and
      tstamp_running in event_sched_out() for events which have filters
      and are marked INACTIVE.
      
      The function group_sched_out() is modified to systematically
      call into event_sched_out() to avoid duplicating the timing
      adjustment code.
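      The accounting involved, in simplified form: the enabled time of an
      event is the distance between its enable timestamp and the end of its
      measurement window, so the window end must keep advancing even for an
      event that never becomes ACTIVE on the monitored cpu. A hedged sketch
      (field names are illustrative, not perf's exact ones):

       struct times {
               unsigned long long tstamp_enabled; /* when the event was enabled    */
               unsigned long long tstamp_running; /* base for accumulated run time */
               unsigned long long tstamp_stopped; /* end of the measurement window */
       };

       /* If tstamp_stopped is never advanced for an event that stayed
        * INACTIVE, this difference underflows and shows up as a huge or
        * "negative" enabled time. */
       static unsigned long long time_enabled(const struct times *t)
       {
               return t->tstamp_stopped - t->tstamp_enabled;
       }

       /* The fix, in spirit: on sched-out, also move the window forward for
        * INACTIVE, cpu-filtered events, without adding any run time. */
       static void sched_out_inactive(struct times *t, unsigned long long now)
       {
               t->tstamp_running += now - t->tstamp_stopped;
               t->tstamp_stopped  = now;
       }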
      
      With the patch, I now get:
      
      $ task_cpu -i -e unhalted_core_cycles,unhalted_core_cycles noploop 2
      noploop for 2 seconds
      CPU0 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      CPU0 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      
      CPU1 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      CPU1 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      
      CPU2 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      CPU2 0		   unhalted_core_cycles (ena=1,991,136,594, run=0)
      
      CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594)
      CPU3 4,747,990,931 unhalted_core_cycles (ena=1,991,136,594, run=1,991,136,594)
      Signed-off-by: Stephane Eranian <eranian@gmail.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: davem@davemloft.net
      Cc: fweisbec@gmail.com
      Cc: perfmon2-devel@lists.sf.net
      Cc: eranian@google.com
      LKML-Reference: <4c76802d.aae9d80a.115d.70fe@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 26 Aug, 2010 1 commit
    • perf: Initialize callchains roots's childen hits · 5225c458
      Frederic Weisbecker authored
      Each histogram entry has a callchain root that stores the
      callchain samples. However, we forgot to initialize the
      tracking of children hits in these roots, so they got
      random values on creation.
      
      The root's children hits value is multiplied by the minimum percentage
      of hits provided by the user, and the result becomes the minimum
      number of hits expected from child branches. If the garbage value
      left by the missing initialization is big enough, this minimum can
      be huge and end up filtering out every child branch.
      
      The end result was invisible callchains. All we need to
      fix this is to initialize the children hits of the root.
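      The filtering step this interacts with looks roughly as follows; with
      children_hit left uninitialized, the computed minimum can be arbitrarily
      large (names are illustrative, not perf's exact ones):

       struct callchain_root {
               unsigned long long hit;          /* hits on this node itself     */
               unsigned long long children_hit; /* hits accumulated in children */
       };

       /* Branches whose hits fall below this threshold are not displayed. */
       static unsigned long long min_child_hits(const struct callchain_root *root,
                                                double min_percent)
       {
               return (unsigned long long)(root->children_hit * min_percent / 100.0);
       }

       /* The fix amounts to starting from a known value: */
       static void init_root(struct callchain_root *root)
       {
               root->hit = 0;
               root->children_hit = 0; /* previously left as uninitialized garbage */
       }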
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: 2.6.32.x-2.6.35.y <stable@kernel.org>
  7. 25 Aug, 2010 3 commits
    • perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag · 8d330919
      Lin Ming authored
      If on Pentium4 CPUs the FORCE_OVF flag is set then an NMI happens
      on every event, which can generate a flood of NMIs. Clear it.
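      In code terms the change boils down to making sure this bit is masked
      out of the CCCR value that gets programmed for the event; a hedged
      fragment (how and where the value is composed is omitted):

       /* do not force an overflow (and thus an NMI) on every increment */
       cccr &= ~P4_CCCR_FORCE_OVF;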
      Reported-by: Vince Weaver <vweaver1@eecs.utk.edu>
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/trace_stack: Fix stack trace on ppc64 · 151772db
      Anton Blanchard authored
      save_stack_trace() stores the instruction pointer, not the
      function descriptor. On ppc64 the trace stack code currently
      dereferences the instruction pointer and shows 8 bytes of
      instructions in our backtraces:
      
       # cat /sys/kernel/debug/tracing/stack_trace
              Depth    Size   Location    (26 entries)
              -----    ----   --------
        0)     5424     112   0x6000000048000004
        1)     5312     160   0x60000000ebad01b0
        2)     5152     160   0x2c23000041c20030
        3)     4992     240   0x600000007c781b79
        4)     4752     160   0xe84100284800000c
        5)     4592     192   0x600000002fa30000
        6)     4400     256   0x7f1800347b7407e0
        7)     4144     208   0xe89f0108f87f0070
        8)     3936     272   0xe84100282fa30000
      
      Since we aren't dealing with function descriptors, use %pS
      instead of %pF to fix it:
      
       # cat /sys/kernel/debug/tracing/stack_trace
              Depth    Size   Location    (26 entries)
              -----    ----   --------
        0)     5424     112   ftrace_call+0x4/0x8
        1)     5312     160   .current_io_context+0x28/0x74
        2)     5152     160   .get_io_context+0x48/0xa0
        3)     4992     240   .cfq_set_request+0x94/0x4c4
        4)     4752     160   .elv_set_request+0x60/0x84
        5)     4592     192   .get_request+0x2d4/0x468
        6)     4400     256   .get_request_wait+0x7c/0x258
        7)     4144     208   .__make_request+0x49c/0x610
        8)     3936     272   .generic_make_request+0x390/0x434
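      To make the distinction concrete: %pF expects a function descriptor
      (which ppc64 and ia64 dereference to find the code address), while %pS
      expects a plain text address like the ones save_stack_trace() records.
      A hedged fragment ('addr' stands for one recorded stack entry):

       printk("%pS\n", (void *)addr);  /* correct: resolves the text address */
       printk("%pF\n", (void *)addr);  /* wrong here: treats the address as a
                                        * descriptor and dereferences it */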
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: rostedt@goodmis.org
      Cc: fweisbec@gmail.com
      LKML-Reference: <20100825013238.GE28360@kryten>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • oprofile: fix crash when accessing freed task structs · 750d857c
      Robert Richter authored
      This patch fixes a crash during shutdown reported below. The crash is
      caused by accessing already freed task structs. The fix changes the
      order for registering and unregistering notifier callbacks.
      
      All notifiers must be initialized before the buffers start working. To
      stop buffer synchronization we cancel all workqueues, unregister the
      notifier callback and then flush all buffers. After all of this we
      can finally free all listed tasks.
      
      This should avoid accessing freed tasks.
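      The ordering described above, sketched as a shutdown sequence (the
      function names are placeholders for oprofile's actual helpers):

       static void buffer_sync_stop(void)
       {
               cancel_all_sync_workqueues();     /* 1. no new sync work        */
               unregister_task_exit_notifier();  /* 2. no new task references  */
               flush_all_cpu_buffers();          /* 3. drain what is queued    */
               free_dead_task_list();            /* 4. only now free the tasks */
       }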
      
      On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
      
      > So the initial observation is a spinlock bad magic followed by a crash
      > in the spinlock debug code:
      >
      > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
      > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
      >
      > Backtrace looks like:
      >
      >       spin_bug+0x74/0xd4
      >       ._raw_spin_lock+0x48/0x184
      >       ._spin_lock+0x10/0x24
      >       .get_task_mm+0x28/0x8c
      >       .sync_buffer+0x1b4/0x598
      >       .wq_sync_buffer+0xa0/0xdc
      >       .worker_thread+0x1d8/0x2a8
      >       .kthread+0xa8/0xb4
      >       .kernel_thread+0x54/0x70
      >
      > So we are accessing a freed task struct in the work queue when
      > processing the samples.
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: stable@kernel.org
      Signed-off-by: Robert Richter <robert.richter@amd.com>
  8. 24 Aug, 2010 24 commits