1. 14 Jul, 2011 6 commits
    • Cyrill Gorcunov's avatar
      perf, x86: P4 PMU - Introduce event alias feature · f9129870
      Cyrill Gorcunov authored
      Instead of hw_nmi_watchdog_set_attr() weak function
      and appropriate x86_pmu::hw_watchdog_set_attr() call
      we introduce even alias mechanism which allow us
      to drop this routines completely and isolate quirks
      of Netburst architecture inside P4 PMU code only.
      
      The main idea remains the same though -- to allow
      nmi-watchdog and perf top run simultaneously.
      
      Note the aliasing mechanism applies to generic
      PERF_COUNT_HW_CPU_CYCLES event only because arbitrary
      event (say passed as RAW initially) might have some
      additional bits set inside ESCR register changing
      the behaviour of event and we can't guarantee anymore
      that alias event will give the same result.
      
      P.S. Thanks a huge to Don and Steven for for testing
           and early review.
      Acked-by: default avatarDon Zickus <dzickus@redhat.com>
      Tested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Stephane Eranian <eranian@google.com>
      CC: Lin Ming <ming.m.lin@intel.com>
      CC: Arnaldo Carvalho de Melo <acme@redhat.com>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20110708201712.GS23657@sunSigned-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      f9129870
    • Steven Rostedt's avatar
      tracing: Have dynamic size event stack traces · 4a9bd3f1
      Steven Rostedt authored
      Currently the stack trace per event in ftace is only 8 frames.
      This can be quite limiting and sometimes useless. Especially when
      the "ignore frames" is wrong and we also use up stack frames for
      the event processing itself.
      
      Change this to be dynamic by adding a percpu buffer that we can
      write a large stack frame into and then copy into the ring buffer.
      
      For interrupts and NMIs that come in while another event is being
      process, will only get to use the 8 frame stack. That should be enough
      as the task that it interrupted will have the full stack frame anyway.
      Requested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      4a9bd3f1
    • Sonny Rao's avatar
      perf: Robustify proc and debugfs file recording · 259032bf
      Sonny Rao authored
      While attempting to create a timechart of boot up I found perf didn't
      tolerate modules being loaded/unloaded.  This patch fixes this by
      reading the file once and then writing the size read at the correct
      point in the file.  It also simplifies the code somewhat.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: default avatarSonny Rao <sonnyrao@chromium.org>
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Link: http://lkml.kernel.org/r/10011.1310614483@neuling.orgSigned-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      259032bf
    • Steven Rostedt's avatar
      ftrace: Fix dynamic selftest failure on some archs · 6331c28c
      Steven Rostedt authored
      Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will
      fail the dynamic ftrace selftest.
      
      The function tracer has a quick 'off' variable that will prevent
      the call back functions from being called. This variable is called
      function_trace_stop. In x86, this is implemented directly in the mcount
      assembly, but for other archs, an intermediate function is used called
      ftrace_test_stop_func().
      
      In dynamic ftrace, the function pointer variable ftrace_trace_function is
      used to update the caller code in the mcount caller. But for archs that
      do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls
      ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function.
      
      When more than one ftrace_ops is registered, the function it calls is
      ftrace_ops_list_func(), which will iterate over all registered ftrace_ops
      and call the callbacks that have their hash matching.
      
      The issue happens when two ftrace_ops are registered for different functions
      and one is then unregistered. The __ftrace_trace_function is then pointed
      to the remaining ftrace_ops callback function directly. This mean it will
      be called for all functions that were registered to trace by both ftrace_ops
      that were registered.
      
      This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST,
      because the update of ftrace_trace_function doesn't happen until after all
      functions have been updated, and then the mcount caller is updated. But
      for those archs that do use the ftrace_test_stop_func(), the update is
      immediate.
      
      The dynamic selftest fails because it hits this situation, and the
      ftrace_ops that it registers fails to only trace what it was suppose to
      and instead traces all other functions.
      
      The solution is to delay the setting of __ftrace_trace_function until
      after all the functions have been updated according to the registered
      ftrace_ops. Also, function_trace_stop is set during the update to prevent
      function tracing from calling code that is caused by the function tracer
      itself.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      6331c28c
    • Steven Rostedt's avatar
      ftrace: Update filter when tracing enabled in set_ftrace_filter() · 072126f4
      Steven Rostedt authored
      Currently, if set_ftrace_filter() is called when the ftrace_ops is
      active, the function filters will not be updated. They will only be updated
      when tracing is disabled and re-enabled.
      
      Update the functions immediately during set_ftrace_filter().
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      072126f4
    • Steven Rostedt's avatar
      ftrace: Balance records when updating the hash · 41fb61c2
      Steven Rostedt authored
      Whenever the hash of the ftrace_ops is updated, the record counts
      must be balance. This requires disabling the records that are set
      in the original hash, and then enabling the records that are set
      in the updated hash.
      
      Moving the update into ftrace_hash_move() removes the bug where the
      hash was updated but the records were not, which results in ftrace
      triggering a warning and disabling itself because the ftrace_ops filter
      is updated while the ftrace_ops was registered, and then the failure
      happens when the ftrace_ops is unregistered.
      
      The current code will not trigger this bug, but new code will.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      41fb61c2
  2. 08 Jul, 2011 2 commits
    • Steven Rostedt's avatar
      ftrace: Do not disable interrupts for modules in mcount update · 4376cac6
      Steven Rostedt authored
      When I mounted an NFS directory, it caused several modules to be loaded. At the
      time I was running the preemptirqsoff tracer, and it showed the following
      output:
      
      # tracer: preemptirqsoff
      #
      # preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test
      # --------------------------------------------------------------------
      # latency: 1177 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
      #    -----------------
      #    | task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0)
      #    -----------------
      #  => started at: ftrace_module_notify
      #  => ended at:   ftrace_module_notify
      #
      #
      #                  _------=> CPU#
      #                 / _-----=> irqs-off
      #                | / _----=> need-resched
      #                || / _---=> hardirq/softirq
      #                ||| / _--=> preempt-depth
      #                |||| /_--=> lock-depth
      #                |||||/     delay
      #  cmd     pid   |||||| time  |   caller
      #     \   /      ||||||   \   |   /
      modprobe-19370   3d....    0us!: ftrace_process_locs <-ftrace_module_notify
      modprobe-19370   3d.... 1176us : ftrace_process_locs <-ftrace_module_notify
      modprobe-19370   3d.... 1178us : trace_hardirqs_on <-ftrace_module_notify
      modprobe-19370   3d.... 1178us : <stack trace>
       => ftrace_process_locs
       => ftrace_module_notify
       => notifier_call_chain
       => __blocking_notifier_call_chain
       => blocking_notifier_call_chain
       => sys_init_module
       => system_call_fastpath
      
      That's over 1ms that interrupts are disabled on a Real-Time kernel!
      
      Looking at the cause (being the ftrace author helped), I found that the
      interrupts are disabled before the code modification of mcounts into nops. The
      interrupts only need to be disabled on start up around this code, not when
      modules are being loaded.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      4376cac6
    • Steven Rostedt's avatar
      tracing: Still trace filtered irq functions when irq trace is disabled · e4a3f541
      Steven Rostedt authored
      If a function is set to be traced by the set_graph_function, but the
      option funcgraph-irqs is zero, and the traced function happens to be
      called from a interrupt, it will not be traced.
      
      The point of funcgraph-irqs is to not trace interrupts when we are
      preempted by an irq, not to not trace functions we want to trace that
      happen to be *in* a irq.
      
      Luckily the current->trace_recursion element is perfect to add a flag
      to help us be able to trace functions within an interrupt even when
      we are not tracing interrupts that preempt the trace.
      Reported-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e4a3f541
  3. 07 Jul, 2011 1 commit
    • Steven Rostedt's avatar
      tracing, x86/irq: Do not trace arch_local_{*,irq_*}() functions · e08fbb78
      Steven Rostedt authored
      I triggered a triple fault with gcc 4.5.1 because it did not
      honor the inline annotation to arch_local_save_flags() function
      and that function was added to the pool of functions traced by
      the function tracer.
      
      When preempt_schedule() called arch_local_save_flags() (called
      by irqs_disabled()), it was traced, but the first thing the
      function tracer does is disable preemption. When it enables
      preemption, the NEED_RESCHED flag will not have been cleared and
      the preemption check will trigger the call to preempt_schedule()
      again.
      
      Although the dynamic function tracer crashed immediately, the
      static version of the function tracer (CONFIG_DYNAMIC_FTRACE is
      not set) actually was able to show where the problem was.
      
       swapper-1       3.N.. 103885us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103886us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103886us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103887us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103887us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103888us : arch_local_save_flags <-preempt_schedule
       swapper-1       3.N.. 103888us : arch_local_save_flags <-preempt_schedule
      
      It went on for a while before it triple faulted with a corrupted
      stack.
      
      The arch_local_save_flags and arch_local_irq_* functions should
      not be traced. Even though they are marked as inline, gcc may
      still make them a function and enable tracing of them.
      
      The simple solution is to just mark them as notrace. I had to
      add the <linux/types.h> for this file to include the notrace
      tag.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.orgSigned-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e08fbb78
  4. 05 Jul, 2011 2 commits
  5. 04 Jul, 2011 1 commit
  6. 03 Jul, 2011 1 commit
  7. 02 Jul, 2011 6 commits
    • Frederic Weisbecker's avatar
      x86: Don't use frame pointer to save old stack on irq entry · a2bbe750
      Frederic Weisbecker authored
      rbp is used in SAVE_ARGS_IRQ to save the old stack pointer
      in order to restore it later in ret_from_intr.
      
      It is convenient because we save its value in the irq regs
      and it's easily restored using the leave instruction.
      
      However this is a kind of abuse of the frame pointer which
      role is to help unwinding the kernel by chaining frames
      together, each node following the return address to the
      previous frame.
      
      But although we are breaking the frame by changing the stack
      pointer, there is no preceding return address before the new
      frame. Hence using the frame pointer to link the two stacks
      breaks the stack unwinders that find a random value instead of
      a return address here.
      
      There is no workaround that can work in every case. We are using
      the fixup_bp_irq_link() function to dereference that abused frame
      pointer in the case of non nesting interrupt (which means stack
      changed).
      But that doesn't fix the case of interrupts that don't change the
      stack (but we still have the unconditional frame link), which is
      the case of hardirq interrupting softirq. We have no way to detect
      this transition so the frame irq link is considered as a real frame
      pointer and the return address is dereferenced but it is still a
      spurious one.
      
      There are two possible results of this: either the spurious return
      address, a random stack value, luckily belongs to the kernel text
      and then the unwinding can continue and we just have a weird entry
      in the stack trace. Or it doesn't belong to the kernel text and
      unwinding stops there.
      
      This is the reason why stacktraces (including perf callchains) on
      irqs that interrupted softirqs don't work very well.
      
      To solve this, we don't save the old stack pointer on rbp anymore
      but we save it to a scratch register that we push on the new
      stack and that we pop back later on irq return.
      
      This preserves the whole frame chain without spurious return addresses
      in the middle and drops the need for the horrid fixup_bp_irq_link()
      workaround.
      
      And finally irqs that interrupt softirq are sanely unwinded.
      
      Before:
      
          99.81%         perf  [kernel.kallsyms]  [k] perf_pending_event
                         |
                         --- perf_pending_event
                             irq_work_run
                             smp_irq_work_interrupt
                             irq_work_interrupt
                            |
                            |--41.60%-- __read
                            |          |
                            |          |--99.90%-- create_worker
                            |          |          bench_sched_messaging
                            |          |          cmd_bench
                            |          |          run_builtin
                            |          |          main
                            |          |          __libc_start_main
                            |           --0.10%-- [...]
      
      After:
      
           1.64%  swapper  [kernel.kallsyms]  [k] perf_pending_event
                  |
                  --- perf_pending_event
                      irq_work_run
                      smp_irq_work_interrupt
                      irq_work_interrupt
                     |
                     |--95.00%-- arch_irq_work_raise
                     |          irq_work_queue
                     |          __perf_event_overflow
                     |          perf_swevent_overflow
                     |          perf_swevent_event
                     |          perf_tp_event
                     |          perf_trace_softirq
                     |          __do_softirq
                     |          call_softirq
                     |          do_softirq
                     |          irq_exit
                     |          |
                     |          |--73.68%-- smp_apic_timer_interrupt
                     |          |          apic_timer_interrupt
                     |          |          |
                     |          |          |--96.43%-- amd_e400_idle
                     |          |          |          cpu_idle
                     |          |          |          start_secondary
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      a2bbe750
    • Frederic Weisbecker's avatar
      x86: Remove useless unwinder backlink from irq regs saving · 48ffee7d
      Frederic Weisbecker authored
      The unwinder backlink in interrupt entry is very useless.
      It's actually not part of the stack frame chain and thus is
      never used.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      48ffee7d
    • Frederic Weisbecker's avatar
      x86,64: Separate arg1 from rbp handling in SAVE_REGS_IRQ · 3b99a3ef
      Frederic Weisbecker authored
      Just for clarity in the code. Have a first block that handles
      the frame pointer and a separate one that handles pt_regs
      pointer and its use.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      3b99a3ef
    • Frederic Weisbecker's avatar
      x86,64: Simplify save_regs() · 1871853f
      Frederic Weisbecker authored
      The save_regs function that saves the regs on low level
      irq entry is complicated because of the fact it changes
      its stack in the middle and also because it manipulates
      data allocated in the caller frame and accesses there
      are directly calculated from callee rsp value with the
      return address in the middle of the way.
      
      This complicates the static stack offsets calculation and
      require more dynamic ones. It also needs a save/restore
      of the function's return address.
      
      To simplify and optimize this, turn save_regs() into a
      macro.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      1871853f
    • Frederic Weisbecker's avatar
      x86: Fetch stack from regs when possible in dump_trace() · 47ce11a2
      Frederic Weisbecker authored
      When regs are passed to dump_stack(), we fetch the frame
      pointer from the regs but the stack pointer is taken from
      the current frame.
      
      Thus the frame and stack pointers may not come from the same
      context. For example this can result in the unwinder to
      think the context is in irq, due to the current value of
      the stack, but the frame pointer coming from the regs points
      to a frame from another place. It then tries to fix up
      the irq link but ends up dereferencing a random frame
      pointer that doesn't belong to the irq stack:
      
      [ 9131.706906] ------------[ cut here ]------------
      [ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330()
      [ 9131.707003] Hardware name: AMD690VM-FMH
      [ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain
      [ 9131.707003] Modules linked in:
      [ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181
      [ 9131.707003] Call Trace:
      [ 9131.707003]  <IRQ>  [<ffffffff8104bd4a>] warn_slowpath_common+0x7a/0xb0
      [ 9131.707003]  [<ffffffff8104be21>] warn_slowpath_fmt+0x41/0x50
      [ 9131.707003]  [<ffffffff8178b873>] ? bad_to_user+0x6d/0x10be
      [ 9131.707003]  [<ffffffff8100c2da>] dump_trace+0x2aa/0x330
      [ 9131.707003]  [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
      [ 9131.707003]  [<ffffffff8101b164>] perf_callchain_kernel+0x54/0x70
      [ 9131.707003]  [<ffffffff810d391f>] perf_prepare_sample+0x19f/0x2a0
      [ 9131.707003]  [<ffffffff810d546c>] __perf_event_overflow+0x16c/0x290
      [ 9131.707003]  [<ffffffff810d5430>] ? __perf_event_overflow+0x130/0x290
      [ 9131.707003]  [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50
      [ 9131.707003]  [<ffffffff8100fbb9>] ? sched_clock+0x9/0x10
      [ 9131.707003]  [<ffffffff810752e5>] ? T.375+0x15/0x90
      [ 9131.707003]  [<ffffffff81084da4>] ? trace_hardirqs_on_caller+0x64/0x180
      [ 9131.707003]  [<ffffffff810817bd>] ? trace_hardirqs_off+0xd/0x10
      [ 9131.707003]  [<ffffffff810d5764>] perf_event_overflow+0x14/0x20
      [ 9131.707003]  [<ffffffff810d588c>] perf_swevent_hrtimer+0x11c/0x130
      [ 9131.707003]  [<ffffffff817821a1>] ? error_exit+0x51/0xb0
      [ 9131.707003]  [<ffffffff81072e93>] __run_hrtimer+0x83/0x1e0
      [ 9131.707003]  [<ffffffff810d5770>] ? perf_event_overflow+0x20/0x20
      [ 9131.707003]  [<ffffffff81073256>] hrtimer_interrupt+0x106/0x250
      [ 9131.707003]  [<ffffffff812a3bfd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
      [ 9131.707003]  [<ffffffff81024833>] smp_apic_timer_interrupt+0x53/0x90
      [ 9131.707003]  [<ffffffff81789053>] apic_timer_interrupt+0x13/0x20
      [ 9131.707003]  <EOI>  [<ffffffff817821a1>] ? error_exit+0x51/0xb0
      [ 9131.707003]  [<ffffffff8178219c>] ? error_exit+0x4c/0xb0
      [ 9131.707003] ---[ end trace b2560d4876709347 ]---
      
      Fix this by simply taking the stack pointer from regs->sp
      when regs are provided.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      47ce11a2
    • Frederic Weisbecker's avatar
      x86: Save stack pointer in perf live regs savings · 9e46294d
      Frederic Weisbecker authored
      In order to prepare for fetching the stack pointer from the
      regs when possible in dump_trace() instead of taking the
      local one, save the current stack pointer in perf live regs saving.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      9e46294d
  8. 01 Jul, 2011 21 commits