1. 20 May, 2016 2 commits
    • ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it · 8329e818
      Steven Rostedt authored
      Matt Fleming reported seeing crashes when enabling and disabling
      function profiling, which uses the function graph tracer. Later,
      Namhyung Kim hit a similar issue and found that the jmp to ftrace_stub
      in ftrace_graph_call was only two bytes long, so when it was changed
      to jump to the tracing code, the new jump overwrote the ftrace_stub
      that came right after it.
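
      The overwrite described above can be modelled on a plain byte array.
      This is only a toy sketch, not kernel text: it assumes the standard
      x86 encodings (EB rel8 for a two-byte jmp, E9 rel32 for a five-byte
      jmp, C3 for ret) and assumes the patching code always writes a
      five-byte jmp at the call site. The helper names and the 0x1000
      displacement are hypothetical.

```python
import struct

SHORT_JMP = 0xEB  # jmp rel8, two bytes in total
NEAR_JMP = 0xE9   # jmp rel32, five bytes in total
RET = 0xC3        # stands in for the body of ftrace_stub
NOP = 0x90        # padding after the stub


def build_text(short_jump: bool) -> bytearray:
    """Lay out a jmp to the stub, immediately followed by the stub itself."""
    if short_jump:
        jmp = bytes([SHORT_JMP, 0x00])                  # rel8 = 0: next byte
    else:
        jmp = bytes([NEAR_JMP]) + struct.pack("<i", 0)  # rel32 = 0: next byte
    return bytearray(jmp + bytes([RET]) + bytes([NOP] * 4))


def patch_call_site(text: bytearray, rel32: int) -> None:
    """Write a five-byte near jmp at offset 0, whatever was there before."""
    text[0:5] = bytes([NEAR_JMP]) + struct.pack("<i", rel32)


def stub_survives(short_jump: bool) -> bool:
    """Patch the call site and report whether the ret byte is still in place."""
    text = build_text(short_jump)
    stub_at = 2 if short_jump else 5   # offset of the ret before patching
    patch_call_site(text, 0x1000)      # hypothetical displacement
    return text[stub_at] == RET


print("short jmp, stub intact:", stub_survives(True))   # False
print("near jmp, stub intact:", stub_survives(False))   # True
```

      With the two-byte form, the five-byte write spills three bytes past
      the original instruction and lands on the ret. With the five-byte
      form, the write fits the slot exactly.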
      
      Masami Hiramatsu bisected this down to a binutils change:
      
      8dcea93252a9ea7dff57e85220a719e2a5e8ab41 is the first bad commit
      commit 8dcea93252a9ea7dff57e85220a719e2a5e8ab41
      Author: H.J. Lu <hjl.tools@gmail.com>
      Date:   Fri May 15 03:17:31 2015 -0700
      
          Add -mshared option to x86 ELF assembler
      
          This patch adds -mshared option to x86 ELF assembler.  By default,
          assembler will optimize out non-PLT relocations against defined non-weak
          global branch targets with default visibility.  The -mshared option tells
          the assembler to generate code which may go into a shared library
          where all non-weak global branch targets with default visibility can
          be preempted.  The resulting code is slightly bigger.  This option
          only affects the handling of branch instructions.
      
      Declaring ftrace_stub as weak prevents gas from emitting two-byte
      jumps to it, jumps that would later be converted to a jump to the
      function graph code.
      
      Link: http://lkml.kernel.org/r/20160516230035.1dbae571@gandalf.local.home
      Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
      Reported-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Don't disable irqs when taking the tasklist_lock read_lock · 6112a300
      Soumya PN authored
      In ftrace.c, the function alloc_retstack_tasklist() (which is invoked
      when function_graph tracing is on) holds the tasklist_lock as a reader
      while iterating through a list of threads, and it does so with irqs
      disabled. The tasklist_lock is never write_locked in interrupt context,
      so it is safe to not disable interrupts for the duration of the
      read_lock in this block, which can be significant given that the block
      iterates through all threads. Hence change the code to call
      read_lock() and read_unlock() instead of read_lock_irqsave() and
      read_unlock_irqrestore().
      
      A similar change was made in commits 8063e41d ("tracing: Change
      syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()")
      and 3472eaa1 ("sched: normalize_rt_tasks(): Don't use _irqsave for
      tasklist_lock, use task_rq_lock()").
      
      Link: http://lkml.kernel.org/r/1463500874-77480-1-git-send-email-soumya.p.n@hpe.com
      Signed-off-by: Soumya PN <soumya.p.n@hpe.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 09 May, 2016 1 commit
  3. 03 May, 2016 2 commits
    • tracing: Use temp buffer when filtering events · 0fc1b09f
      Steven Rostedt (Red Hat) authored
      Filtering of events requires the data to be written to the ring buffer
      before it can be decided to filter or not. This is because the parameters of
      the filter are based on the result that is written to the ring buffer and
      not on the parameters that are passed into the trace functions.
      
      The ftrace ring buffer is optimized for writing into the ring buffer and
      committing. The discard procedure used when filtering decides the event
      should be discarded is much more heavyweight. Thus, using a temporary
      buffer when filtering events can speed things up drastically.
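
      The two strategies can be contrasted with a small model. This is a
      sketch under stated assumptions, not the kernel implementation: the
      RingBuffer class, trace_direct() and trace_via_temp() are
      hypothetical names, and a dict stands in for an event record. It
      only counts how often the ring buffer is touched on each path.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RingBuffer:
    """Toy ring buffer that records how it is used."""
    committed: list = field(default_factory=list)
    reserves: int = 0
    discards: int = 0

    def reserve(self) -> dict:
        self.reserves += 1
        return {}

    def commit(self, event: dict) -> None:
        self.committed.append(event)

    def discard(self, event: dict) -> None:
        self.discards += 1   # the heavyweight path in the real buffer


def trace_direct(rb: RingBuffer, fields: dict, keep: Callable[[dict], bool]) -> None:
    """Old scheme: write into the ring buffer first, then filter."""
    event = rb.reserve()
    event.update(fields)
    if keep(event):
        rb.commit(event)
    else:
        rb.discard(event)


def trace_via_temp(rb: RingBuffer, fields: dict, keep: Callable[[dict], bool]) -> None:
    """New scheme: fill a scratch copy, filter it, copy only the survivors."""
    temp = dict(fields)      # scratch buffer; the ring buffer is untouched
    if not keep(temp):
        return               # dropping a filtered event costs nothing more
    event = rb.reserve()
    event.update(temp)       # the extra copy paid by events that pass
    rb.commit(event)


samples = [{"common_preempt_count": n} for n in range(3)]
never_matches = lambda e: e["common_preempt_count"] == 20

old, new = RingBuffer(), RingBuffer()
for s in samples:
    trace_direct(old, s, never_matches)
    trace_via_temp(new, s, never_matches)

print("direct:", old.reserves, "reserves,", old.discards, "discards")
print("temp:  ", new.reserves, "reserves,", new.discards, "discards")
```

      When every event is filtered out, the temp path never reserves or
      discards anything in the ring buffer. When every event passes, it
      does the same ring buffer work as before plus one copy.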
      
      Without a temp buffer we have:
      
       # trace-cmd start -p nop
       # perf stat -r 10 hackbench 50
             0.790706626 seconds time elapsed ( +-  0.71% )
      
       # trace-cmd start -e all
       # perf stat -r 10 hackbench 50
             1.566904059 seconds time elapsed ( +-  0.27% )
      
       # trace-cmd start -e all -f 'common_preempt_count==20'
       # perf stat -r 10 hackbench 50
             1.690598511 seconds time elapsed ( +-  0.19% )
      
       # trace-cmd start -e all -f 'common_preempt_count!=20'
       # perf stat -r 10 hackbench 50
             1.707486364 seconds time elapsed ( +-  0.30% )
      
      The first run above is without any tracing, just to get a base figure.
      hackbench takes ~0.79 seconds to run on the system.
      
      The second run enables tracing all events where nothing is filtered. This
      increases the time by 100% and hackbench takes 1.57 seconds to run.
      
      The third run filters all events where the preempt count will equal "20"
      (this should never happen) thus all events are discarded. This takes 1.69
      seconds to run. This is 10% slower than just committing the events!
      
      The last run enables all events with a filter that commits every
      event, and this takes 1.71 seconds to run. The filtering overhead is
      approximately 10%. Thus, discarding an event from the ring buffer and
      committing it may take about the same time.
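
      For reference, the percentages in the prose above are rounded. A
      short calculation from the perf stat means listed earlier gives the
      unrounded figures; the constant and function names are only
      illustrative.

```python
# Mean elapsed times in seconds, copied from the perf stat output above.
BASE = 0.790706626            # -p nop, no events enabled
ALL_EVENTS = 1.566904059      # -e all, no filter
FILTER_DISCARD = 1.690598511  # filter never matches, every event discarded
FILTER_COMMIT = 1.707486364   # filter always matches, every event committed


def pct_increase(old: float, new: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


tracing_overhead = pct_increase(BASE, ALL_EVENTS)
discard_overhead = pct_increase(ALL_EVENTS, FILTER_DISCARD)
commit_overhead = pct_increase(ALL_EVENTS, FILTER_COMMIT)

print(f"all events vs. no tracing:    +{tracing_overhead:.1f}%")
print(f"filter + discard vs. no filter: +{discard_overhead:.1f}%")
print(f"filter + commit vs. no filter:  +{commit_overhead:.1f}%")
```

      The results are roughly 98%, 8% and 9%, so the discard path and the
      commit path cost nearly the same without the temp buffer.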
      
      With this patch, the numbers change:
      
       # trace-cmd start -p nop
       # perf stat -r 10 hackbench 50
             0.778233033 seconds time elapsed ( +-  0.38% )
      
       # trace-cmd start -e all
       # perf stat -r 10 hackbench 50
             1.582102692 seconds time elapsed ( +-  0.28% )
      
       # trace-cmd start -e all -f 'common_preempt_count==20'
       # perf stat -r 10 hackbench 50
             1.309230710 seconds time elapsed ( +-  0.22% )
      
       # trace-cmd start -e all -f 'common_preempt_count!=20'
       # perf stat -r 10 hackbench 50
             1.786001924 seconds time elapsed ( +-  0.20% )
      
      The first run is again the base with no tracing.
      
      The second run is all tracing with no filtering. It is a little slower, but
      that may be well within the noise.
      
      The third run shows that discarding all events only took 1.3 seconds. This
      is a speed up of 23%! The discard is much faster than even the commit.
      
      The one downside is shown in the last run. Events that are not discarded by
      the filter take longer to add because of the extra copy of the event.
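
      The before and after figures can be compared the same way. The
      sketch below only redoes the arithmetic on the perf stat means quoted
      in this message; the names are illustrative.

```python
# Mean elapsed times in seconds from the two sets of runs above.
BEFORE = {"all": 1.566904059, "discard": 1.690598511, "commit": 1.707486364}
AFTER = {"all": 1.582102692, "discard": 1.309230710, "commit": 1.786001924}


def pct_change(old: float, new: float) -> float:
    """Signed relative change from old to new, in percent."""
    return (new - old) / old * 100.0


discard_change = pct_change(BEFORE["discard"], AFTER["discard"])
commit_change = pct_change(BEFORE["commit"], AFTER["commit"])
discard_vs_commit = pct_change(AFTER["all"], AFTER["discard"])

print(f"discard-all run:  {discard_change:+.1f}%")
print(f"commit-all run:   {commit_change:+.1f}%")
print(f"after the patch, discard vs. plain commit: {discard_vs_commit:+.1f}%")
```

      The discard-all run drops by about 23%, the commit-all run grows by
      about 5%, and after the patch discarding every event is about 17%
      faster than tracing with no filter at all.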
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic · dcb0b557
      Steven Rostedt (Red Hat) authored
      Nothing sets TRACE_EVENT_FL_USE_CALL_FILTER anymore. Remove it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. 29 Apr, 2016 6 commits
  5. 27 Apr, 2016 2 commits
  6. 26 Apr, 2016 5 commits
  7. 19 Apr, 2016 22 commits