1. 13 Mar, 2010 5 commits
    • Steven Rostedt's avatar
      tracing: Do not record user stack trace from NMI context · b6345879
      Steven Rostedt authored
      A bug was found with Li Zefan's ftrace_stress_test that caused applications
      to segfault during the test.
      
      Placing a tracing_off() in the segfault code, and examining several
      traces, I found that the following was always the case. The lock tracer
      was enabled (lockdep being required) and userstack was enabled. Testing
      this out, I just enabled the two, but that was not good enough. I needed
      to run something else that could trigger it. Running a load like hackbench
      did not work, but executing a new program would. The following would
      trigger the segfault within seconds:
      
        # echo 1 > /debug/tracing/options/userstacktrace
        # echo 1 > /debug/tracing/events/lock/enable
        # while :; do ls > /dev/null ; done
      
      Enabling the function graph tracer and looking at what was happening
      I finally noticed that all cashes happened just after an NMI.
      
       1)               |    copy_user_handle_tail() {
       1)               |      bad_area_nosemaphore() {
       1)               |        __bad_area_nosemaphore() {
       1)               |          no_context() {
       1)               |            fixup_exception() {
       1)   0.319 us    |              search_exception_tables();
       1)   0.873 us    |            }
      [...]
       1)   0.314 us    |  __rcu_read_unlock();
       1)   0.325 us    |    native_apic_mem_write();
       1)   0.943 us    |  }
       1)   0.304 us    |  rcu_nmi_exit();
      [...]
       1)   0.479 us    |  find_vma();
       1)               |  bad_area() {
       1)               |    __bad_area() {
      
      After capturing several traces of failures, all of them happened
      after an NMI. Curious about this, I added a trace_printk() to the NMI
      handler to read the regs->ip to see where the NMI happened. In which I
      found out it was here:
      
      ffffffff8135b660 <page_fault>:
      ffffffff8135b660:       48 83 ec 78             sub    $0x78,%rsp
      ffffffff8135b664:       e8 97 01 00 00          callq  ffffffff8135b800 <error_entry>
      
      What was happening is that the NMI would happen at the place that a page
      fault occurred. It would call rcu_read_lock() which was traced by
      the lock events, and the user_stack_trace would run. This would trigger
      a page fault inside the NMI. I do not see where the CR2 register is
      saved or restored in NMI handling. This means that it would corrupt
      the page fault handling that the NMI interrupted.
      
      The reason the while loop of ls helped trigger the bug, was that
      each execution of ls would cause lots of pages to be faulted in, and
      increase the chances of the race happening.
      
      The simple solution is to not allow user stack traces in NMI context.
      After this patch, I ran the above "ls" test for a couple of hours
      without any issues. Without this patch, the bug would trigger in less
      than a minute.
      
      Cc: stable@kernel.org
      Reported-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      b6345879
    • Steven Rostedt's avatar
      tracing: Disable buffer switching when starting or stopping trace · a2f80714
      Steven Rostedt authored
      When the trace iterator is read, tracing_start() and tracing_stop()
      is called to stop tracing while the iterator is processing the trace
      output.
      
      These functions disable both the standard buffer and the max latency
      buffer. But if the wakeup tracer is running, it can switch these
      buffers between the two disables:
      
        buffer = global_trace.buffer;
        if (buffer)
            ring_buffer_record_disable(buffer);
      
            <<<--------- swap happens here
      
        buffer = max_tr.buffer;
        if (buffer)
            ring_buffer_record_disable(buffer);
      
      What happens is that we disabled the same buffer twice. On tracing_start()
      we can enable the same buffer twice. All ring_buffer_record_disable()
      must be matched with a ring_buffer_record_enable() or the buffer
      can be disable permanently, or enable prematurely, and cause a bug
      where a reset happens while a trace is commiting.
      
      This patch protects these two by taking the ftrace_max_lock to prevent
      a switch from occurring.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Reported-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      a2f80714
    • Steven Rostedt's avatar
      tracing: Use same local variable when resetting the ring buffer · 283740c6
      Steven Rostedt authored
      In the ftrace code that resets the ring buffer it references the
      buffer with a local variable, but then uses the tr->buffer as the
      parameter to reset. If the wakeup tracer is running, which can
      switch the tr->buffer with the max saved buffer, this can break
      the requirement of disabling the buffer before the reset.
      
         buffer = tr->buffer;
         ring_buffer_record_disable(buffer);
         synchronize_sched();
         __tracing_reset(tr->buffer, cpu);
      
      If the tr->buffer is swapped, then the reset is not happening to the
      buffer that was disabled. This will cause the ring buffer to fail.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Reported-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      283740c6
    • Steven Rostedt's avatar
      function-graph: Init curr_ret_stack with ret_stack · ea14eb71
      Steven Rostedt authored
      If the graph tracer is active, and a task is forked but the allocating of
      the processes graph stack fails, it can cause crash later on.
      
      This is due to the temporary stack being NULL, but the curr_ret_stack
      variable is copied from the parent. If it is not -1, then in
      ftrace_graph_probe_sched_switch() the following:
      
      	for (index = next->curr_ret_stack; index >= 0; index--)
      		next->ret_stack[index].calltime += timestamp;
      
      Will cause a kernel OOPS.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      ea14eb71
    • Lai Jiangshan's avatar
      ring-buffer: Move disabled check into preempt disable section · 52fbe9cd
      Lai Jiangshan authored
      The ring buffer resizing and resetting relies on a schedule RCU
      action. The buffers are disabled, a synchronize_sched() is called
      and then the resize or reset takes place.
      
      But this only works if the disabling of the buffers are within the
      preempt disabled section, otherwise a window exists that the buffers
      can be written to while a reset or resize takes place.
      
      Cc: stable@kernel.org
      Reported-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B949E43.2010906@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      52fbe9cd
  2. 11 Mar, 2010 1 commit
  3. 06 Mar, 2010 4 commits
    • Tim Bird's avatar
      function-graph: Add tracing_thresh support to function_graph tracer · 0e950173
      Tim Bird authored
      Add support for tracing_thresh to the function_graph tracer.  This
      version of this feature isolates the checks into new entry and
      return functions, to avoid adding more conditional code into the
      main function_graph paths.
      
      When the tracing_thresh is set and the function graph tracer is
      enabled, only the functions that took longer than the time in
      microseconds that was set in tracing_thresh are recorded. To do this
      efficiently, only the function exits are recorded:
      
       [tracing]# echo 100 > tracing_thresh
       [tracing]# echo function_graph > current_tracer
       [tracing]# cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        1) ! 119.214 us  |  } /* smp_apic_timer_interrupt */
        1)   <========== |
        0) ! 101.527 us  |              } /* __rcu_process_callbacks */
        0) ! 126.461 us  |            } /* rcu_process_callbacks */
        0) ! 145.111 us  |          } /* __do_softirq */
        0) ! 149.667 us  |        } /* do_softirq */
        0) ! 168.817 us  |      } /* irq_exit */
        0) ! 248.254 us  |    } /* smp_apic_timer_interrupt */
      
      Also, add support for specifying tracing_thresh on the kernel
      command line.  When used like so: "tracing_thresh=200 ftrace=function_graph"
      this can be used to analyse system startup.  It is important to disable
      tracing soon after boot, in order to avoid losing the trace data.
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarTim Bird <tim.bird@am.sony.com>
      LKML-Reference: <4B87098B.4040308@am.sony.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      0e950173
    • Arnaldo Carvalho de Melo's avatar
      tracing: Update the comm field in the right variable in update_max_tr · 1acaa1b2
      Arnaldo Carvalho de Melo authored
      The latency output showed:
      
       #    | task: -3 (uid:0 nice:0 policy:1 rt_prio:99)
      
      The comm is missing in the "task:" and it looks like a minus 3 is
      the output. The correct display should be:
      
       #    | task: migration/0-3 (uid:0 nice:0 policy:1 rt_prio:99)
      
      The problem is that the comm is being stored in the wrong data
      structure. The max_tr.data[cpu] is what stores the comm, not the
      tr->data[cpu].
      
      Before this patch the max_tr.data[cpu]->comm was zeroed and the /debug/trace
      ended up showing just the '-' sign followed by the pid.
      
      Also remove a needless initialization of max_data.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1267824230-23861-1-git-send-email-acme@infradead.org>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      1acaa1b2
    • Steven Rostedt's avatar
      function-graph: Use comment notation for func names of dangling '}' · a094fe04
      Steven Rostedt authored
      When a '}' does not have a matching function start, the name is printed
      within parenthesis. But this makes it confusing between ending '}'
      and function starts. This patch makes the function name appear in C comment
      notation.
      
      Old view:
       3)   1.281 us    |            } (might_fault)
       3)   3.620 us    |          } (filldir)
       3)   5.251 us    |        } (call_filldir)
       3)               |        call_filldir() {
       3)               |          filldir() {
      
      New view:
       3)   1.281 us    |            } /* might_fault */
       3)   3.620 us    |          } /* filldir */
       3)   5.251 us    |        } /* call_filldir */
       3)               |        call_filldir() {
       3)               |          filldir() {
      Requested-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      a094fe04
    • Steven Rostedt's avatar
      function-graph: Fix unused reference to ftrace_set_func() · 801c29fd
      Steven Rostedt authored
      The declaration of ftrace_set_func() is at the start of the ftrace.c file
      and wrapped with a #ifdef CONFIG_FUNCTION_GRAPH condition. If function
      graph tracing is enabled but CONFIG_DYNAMIC_FTRACE is not, a warning
      about that function being declared static and unused is given.
      
      This really should have been placed within the CONFIG_FUNCTION_GRAPH
      condition that uses ftrace_set_func().
      
      Moving the declaration down fixes the warning and makes the code cleaner.
      Reported-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      801c29fd
  4. 04 Mar, 2010 1 commit
  5. 03 Mar, 2010 1 commit
    • Lai Jiangshan's avatar
      tracing: Fix warning in s_next of trace file ops · ac91d854
      Lai Jiangshan authored
      This warning in s_next() can be triggered by lseek():
       [<c018b3f7>] ? s_next+0x77/0x80
       [<c013e3c1>] warn_slowpath_common+0x81/0xa0
       [<c018b3f7>] ? s_next+0x77/0x80
       [<c013e3fa>] warn_slowpath_null+0x1a/0x20
       [<c018b3f7>] s_next+0x77/0x80
       [<c01efa77>] traverse+0x117/0x200
       [<c01eff13>] seq_lseek+0xa3/0x120
       [<c01efe70>] ? seq_lseek+0x0/0x120
       [<c01d7081>] vfs_llseek+0x41/0x50
       [<c01d8116>] sys_llseek+0x66/0xa0
       [<c0102bd0>] sysenter_do_call+0x12/0x26
      
      The iterator "leftover" variable is zeroed in the opening of the trace
      file. But lseek can call s_start() which will call s_next() without
      reseting the "leftover" variable back to zero, which might trigger
      the WARN_ON_ONCE(iter->leftover) that is in s_next().
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B8CE06A.9090207@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      ac91d854
  6. 02 Mar, 2010 12 commits
  7. 01 Mar, 2010 16 commits