1. 21 Jul, 2010 5 commits
    • tracing/documentation: Document dynamic ftracer internals · 9849ed4d
      Mike Frysinger authored
      Add more details to the dynamic function tracing design implementation.
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      LKML-Reference: <1279610015-10250-1-git-send-email-vapier@gentoo.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Shrink max latency ringbuffer if unnecessary · ef710e10
      KOSAKI Motohiro authored
      Documentation/trace/ftrace.txt says
      
        buffer_size_kb:
      
              This sets or displays the number of kilobytes each CPU
              buffer can hold. The tracer buffers are the same size
              for each CPU. The displayed number is the size of the
              CPU buffer and not total size of all buffers. The
              trace buffers are allocated in pages (blocks of memory
              that the kernel uses for allocation, usually 4 KB in size).
              If the last page allocated has room for more bytes
              than requested, the rest of the page will be used,
              making the actual allocation bigger than requested.
              ( Note, the size may not be a multiple of the page size
                due to buffer management overhead. )
      
              This can only be updated when the current_tracer
              is set to "nop".
      
      But this is not accurate: the total memory consumption is currently
      'buffer_size_kb x CPUs x 2'.

      Why the factor of two? Because ftrace implicitly allocates a second,
      same-sized buffer for the max latency trace.

      That is an unpleasant surprise when an admin wants a large buffer,
      e.g. for full logging and detailed analysis. For example, on a
      24-CPU machine, writing 200MB to buffer_size_kb makes the system
      consume ~10GB of memory (200MB x 24 x 2). Wasting ~5GB of memory
      that way is usually unacceptable.

      Fortunately, almost all users never use the max latency feature,
      so its buffer can easily be disabled.

      This patch shrinks the max latency buffer when it is not needed
      (a worked example of the memory math follows this entry).
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      LKML-Reference: <20100701104554.DA2D.A69D9226@jp.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
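      As an editorial aside, here is a small standalone C program (not kernel
      code; the figures are the ones from the example above) that works
      through the memory math: the doubled consumption before this patch and
      the approximate footprint once the max latency buffer is kept at
      roughly one page per CPU until a latency tracer actually needs it.

      /* Worked example of the memory math described above (not kernel code). */
      #include <stdio.h>

      int main(void)
      {
          unsigned long buffer_size_kb = 200 * 1024;  /* 200 MB per CPU, as above */
          unsigned long nr_cpus = 24;
          unsigned long page_kb = 4;                  /* typical 4 KB page */

          unsigned long main_kb = buffer_size_kb * nr_cpus;
          /* Before the patch: a full-size max latency buffer is always allocated. */
          unsigned long old_total_kb = main_kb * 2;
          /* After the patch: the max latency buffer stays at ~1 page per CPU. */
          unsigned long new_total_kb = main_kb + page_kb * nr_cpus;

          printf("main buffers      : %lu MB\n", main_kb / 1024);
          printf("before patch total: %lu MB\n", old_total_kb / 1024);
          printf("after patch total : %lu MB (max latency buffer shrunk)\n",
                 new_total_kb / 1024);
          return 0;
      }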
    • tracing: Reduce latency and remove percpu trace_seq · bc289ae9
      Lai Jiangshan authored
      __print_flags() and __print_symbolic() use a percpu trace_seq:

      1) Its memory is allocated at compile time, so it wastes memory even when tracing is not used.
      2) It is percpu data, so it wastes even more memory on multi-CPU systems.
      3) It disables preemption while executing its core routine
         "trace_seq_printf(s, "%s: ", #call);", which introduces latency.

      So we move this trace_seq into struct trace_iterator (a simplified
      sketch follows this entry).
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4C078350.7090106@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
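      The following is a simplified, userspace-style sketch of the design
      change, added editorially; the type and field names (trace_seq_like,
      trace_iterator_like, tmp_seq, print_flags_like) are illustrative
      stand-ins, not the kernel's actual definitions. The point it shows:
      with the scratch buffer embedded in the iterator that is already passed
      to the output routine, no static per-CPU storage is needed and nothing
      forces preemption to be disabled around the formatting.

      #include <stdio.h>
      #include <string.h>

      #define SEQ_BUF_SIZE 4096

      struct trace_seq_like {
          char buffer[SEQ_BUF_SIZE];
          unsigned int len;
      };

      struct trace_iterator_like {
          /* ... other iterator state ... */
          struct trace_seq_like tmp_seq;   /* scratch buffer lives in the iterator */
      };

      static void print_flags_like(struct trace_iterator_like *iter, const char *name)
      {
          struct trace_seq_like *s = &iter->tmp_seq;   /* no per-CPU lookup, no preempt games */
          s->len = snprintf(s->buffer, SEQ_BUF_SIZE, "%s: ", name);
          printf("%.*s\n", (int)s->len, s->buffer);
      }

      int main(void)
      {
          struct trace_iterator_like iter;
          memset(&iter, 0, sizeof(iter));
          print_flags_like(&iter, "sched_switch");
          return 0;
      }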
    • trace: Reorder struct ring_buffer_per_cpu to remove padding on 64bit · 985023de
      Richard Kennedy authored
      Reorder the structure to remove 8 bytes of padding on 64-bit builds.
      This shrinks the size to 128 bytes, allowing allocation from a smaller
      slab and needing one fewer cache line (the general effect is
      illustrated by the sketch after this entry).
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      LKML-Reference: <1269516456.2054.8.camel@localhost>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
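      The padding effect is generic, so here is a short standalone C
      illustration (editorial, not the actual ring_buffer_per_cpu layout):
      interleaving 4-byte and 8-byte members on a 64-bit ABI inserts padding
      that disappears when members of the same size are grouped together.

      #include <stdio.h>

      struct padded {
          void *a;            /* 8 bytes */
          int   flag;         /* 4 bytes + 4 bytes padding to align the next pointer */
          void *b;            /* 8 bytes */
          int   count;        /* 4 bytes + 4 bytes tail padding */
      };

      struct reordered {
          void *a;            /* 8 bytes */
          void *b;            /* 8 bytes */
          int   flag;         /* 4 bytes */
          int   count;        /* 4 bytes, no padding needed */
      };

      int main(void)
      {
          printf("padded   : %zu bytes\n", sizeof(struct padded));    /* typically 32 */
          printf("reordered: %zu bytes\n", sizeof(struct reordered)); /* typically 24 */
          return 0;
      }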
    • tracing: Allow to disable cmdline recording · e870e9a1
      Li Zefan authored
      We found that enabling even a single trace event that will rarely be
      triggered can add significant overhead to context switches.
      
      (lmbench context switch test)
       -------------------------------------------------
       2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
       ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
      ------ ------ ------ ------ ------ ------- -------
        2.19   2.3   2.21   2.56   2.13     2.54    2.07
        2.39   2.51  2.35   2.75   2.27     2.81    2.24
      
      The overhead is 6% ~ 11%.
      
      This is because when a trace event is enabled, 3 tracepoints (sched_switch,
      sched_wakeup, sched_wakeup_new) are activated to map pids to cmdnames.

      We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
      that allows cmdline recording to be disabled (a minimal sketch of the
      effect follows this entry).
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
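      Below is a minimal, editorial sketch of what the new option does; the
      flag bit and helper name are illustrative stand-ins for the kernel's
      trace_flags bit and tracing_record_cmdline(), and the real patch goes
      further by not hooking the three sched tracepoints at all when
      recording is disabled. The sketch only shows the gating idea: with
      record-cmd cleared, the pid-to-comm recording path becomes a no-op.

      #include <stdio.h>

      #define TRACE_ITER_RECORD_CMD  (1 << 0)   /* illustrative flag bit */

      static unsigned long trace_flags = TRACE_ITER_RECORD_CMD;

      static void record_cmdline(int pid, const char *comm)
      {
          if (!(trace_flags & TRACE_ITER_RECORD_CMD))
              return;                            /* recording disabled: skip the work */
          printf("recorded pid %d -> %s\n", pid, comm);
      }

      int main(void)
      {
          record_cmdline(1234, "bash");          /* recorded */
          trace_flags &= ~TRACE_ITER_RECORD_CMD; /* like: echo 0 > options/record-cmd */
          record_cmdline(1234, "bash");          /* skipped */
          return 0;
      }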
  2. 17 Jul, 2010 4 commits
  3. 16 Jul, 2010 3 commits
  4. 15 Jul, 2010 3 commits
  5. 12 Jul, 2010 1 commit
  6. 06 Jul, 2010 1 commit
  7. 05 Jul, 2010 14 commits
  8. 04 Jul, 2010 1 commit
    • ARM: 6205/1: perf: ensure counter delta is treated as unsigned · 446a5a8b
      Will Deacon authored
      Hardware performance counters on ARM are 32-bits wide but atomic64_t
      variables are used to represent counter data in the hw_perf_event structure.
      
      The armpmu_event_update function right-shifts a signed 64-bit delta variable
      and adds the result to the event count. If the MSB of the 32-bit counter
      value is set, the arithmetic shift drags sign bits into the upper half,
      resulting in perf output such as:
      
       Performance counter stats for 'sleep 20':
      
       18446744073460670464  cycles             <-- 0xFFFFFFFFF12A6000
              7783773  instructions             #      0.000 IPC
                  465  context-switches
                  161  page-faults
              1172393  branches
      
         20.154242147  seconds time elapsed
      
      This patch ensures that the delta value is treated as unsigned so that the
      right shift sets the upper bits to zero (demonstrated by the standalone
      example after this entry).
      
      Cc: <stable@kernel.org>
      Acked-by: Jamie Iles <jamie.iles@picochip.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
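      To make the failure mode concrete, here is a standalone C demonstration
      (an editorial addition, not the armpmu_event_update() code, which
      reaches the same state through a shift pair on a 64-bit delta):
      widening the 32-bit counter value through a signed type sign-extends
      it, producing exactly the bogus cycle count quoted above, while an
      unsigned widening keeps the upper bits zero.

      #include <stdio.h>
      #include <stdint.h>
      #include <inttypes.h>

      int main(void)
      {
          uint32_t counter = 0xF12A6000u;           /* raw 32-bit counter, MSB set */

          /* Buggy path: sign extension fills the upper 32 bits with ones. */
          int64_t signed_delta = (int32_t)counter;

          /* Fixed path: unsigned widening fills the upper 32 bits with zeros. */
          uint64_t unsigned_delta = counter;

          printf("signed   : %" PRIu64 "\n", (uint64_t)signed_delta);  /* 18446744073460670464 */
          printf("unsigned : %" PRIu64 "\n", unsigned_delta);          /* 4046086144 */
          return 0;
      }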
  9. 03 Jul, 2010 1 commit
  10. 02 Jul, 2010 7 commits