1. 23 Mar, 2018 2 commits
  2. 21 Mar, 2018 10 commits
    • perf annotate: Mark jumps to other functions with the call arrow · 751b1783
      Arnaldo Carvalho de Melo authored
      Things like this in _cpp_lex_token (gcc's cc1 program):
      
           cpp_named_operator2name@@Base+0xa72
      
      point to a place beyond the cpp_named_operator2name boundaries: in the
      ELF symbol table for cc1, cpp_named_operator2name is marked as being
      32 bytes long, but it is in fact much larger than that. So we seem to
      need a symbols__find() routine that looks for an address
      >= current->start and < next_symbol->start, possibly just for C++
      objects?
      
      For now, let's just make some progress by marking jumps to outside the
      current function as call-like.
      
      Actual navigation will come next, with further understanding of how the
      symbol searching and disassembly should be done.
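
The lookup described above (an address belongs to a symbol if it is >= that symbol's start and < the next symbol's start) can be sketched as follows. This is only an illustration in Python, not perf's symbols__find(); the function name `find_symbol`, the `(start, name)` tuple layout and the addresses are assumptions made up for the example.

```python
from bisect import bisect_right


def find_symbol(symbols, addr):
    """Return the name of the symbol covering addr, or None.

    symbols is a list of (start, name) tuples sorted by start address.
    Each symbol is treated as covering [start, next_start), ignoring the
    size recorded in the ELF symbol table. The last symbol has no upper
    bound in this sketch.
    """
    starts = [start for start, _ in symbols]
    idx = bisect_right(starts, addr) - 1
    if idx < 0:
        return None  # addr is below the first symbol
    return symbols[idx][1]


# Hypothetical start addresses, chosen only for illustration.
symbols = [(0x1000, "cpp_named_operator2name"), (0x2000, "next_function")]

# An offset of 0xa72 is far past a 32-byte size, but still before the
# start of the next symbol, so the lookup resolves to the same function.
print(find_symbol(symbols, 0x1000 + 0xA72))
```

With a size-based lookup the same address would fall in a gap between symbols, which is the situation the commit message describes.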
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-aiys0a0bsgm3e00hbi6fg7yy@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Pass function descriptor to its instruction parsing routines · 85a84e4f
      Arnaldo Carvalho de Melo authored
      We need that to figure out if jumps have targets in a different
      function.
      
      E.g. _cpp_lex_token(), in /usr/libexec/gcc/x86_64-redhat-linux/5.3.1/cc1
      has a line like this:
      
        jne    c469be <cpp_named_operator2name@@Base+0xa72>
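
To illustrate why the parsing routine needs the function's bounds, here is a minimal sketch that parses a jump line in the objdump format shown above and checks its target against a function's address range. It is not perf's parser; `parse_jump`, `jumps_outside` and the bounds used for _cpp_lex_token are hypothetical, since the commit message does not give the real addresses.

```python
import re

# Matches lines such as: "jne    c469be <symbol+0xa72>"
JUMP_RE = re.compile(
    r"^\s*(?P<insn>j\w+)\s+(?P<target>[0-9a-f]+)\s+"
    r"<(?P<sym>[^>+]+)(?:\+0x(?P<off>[0-9a-f]+))?>"
)


def parse_jump(line):
    """Split a jump line into mnemonic, target address, symbol and offset."""
    m = JUMP_RE.match(line)
    if m is None:
        return None
    return {
        "insn": m.group("insn"),
        "target": int(m.group("target"), 16),
        "symbol": m.group("sym"),
        "offset": int(m.group("off") or "0", 16),
    }


def jumps_outside(jump, func_start, func_end):
    """True if the jump target is not within [func_start, func_end)."""
    return not (func_start <= jump["target"] < func_end)


line = "  jne    c469be <cpp_named_operator2name@@Base+0xa72>"
jump = parse_jump(line)

# Hypothetical bounds for _cpp_lex_token, for illustration only.
FUNC_START, FUNC_END = 0xC47000, 0xC47400
print(jump["symbol"], hex(jump["target"]), jumps_outside(jump, FUNC_START, FUNC_END))
```

Without the function descriptor there is nothing to compare the target against, which is the gap this commit closes.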
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ris0ioziyp469pofpzix2atb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: No need to calculate notes->start twice · 425859ff
      Arnaldo Carvalho de Melo authored
      Since we already set notes->start to map__rip_2objdump(map, sym->start)
      in symbol__annotate2(), no need to calculate that address again in
      symbol__calc_lines(), just use notes->start.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ycxlg8mm5ueuj21w6gi62l7g@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate browser: Add 'P' hotkey to dump annotation to file · d9bd7665
      Arnaldo Carvalho de Melo authored
      Just like in the histograms browser used as the main screen for
      'perf top --tui' and 'perf report --tui', 'P' prints the current
      annotation to a file whose name is composed of the symbol name and the
      ".annotation" suffix.

      Here is an example: pressing 'A' in 'perf top' to live annotate a
      kernel function and then pressing 'P' to dump that annotation produces
      this file:
      
        # cat _raw_spin_lock_irqsave.annotation
        _raw_spin_lock_irqsave() /proc/kcore
        Event: cycles:ppp
      
          7.14        nop
         21.43        push   %rbx
          7.14        pushfq
                      pop    %rax
                      nop
                      mov    %rax,%rbx
                      cli
                      nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         64.29        lock   cmpxchg %edx,(%rdi)
                      test   %eax,%eax
                    ↓ jne    2b
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                    → callq  queued_spin_lock_slowpath
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
        #
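
The naming rule described above (symbol name plus the ".annotation" suffix) can be sketched like this. It is an illustration only, not the code behind the 'P' hotkey; the helper name `dump_annotation` and its `directory` parameter are assumptions, and whether perf sanitizes the symbol name in any way is not stated in the commit message.

```python
from pathlib import Path


def dump_annotation(symbol_name, lines, directory="."):
    """Write annotation lines to '<symbol_name>.annotation'.

    Returns the path of the file that was written. The symbol name is
    used verbatim, which is an assumption of this sketch.
    """
    path = Path(directory) / f"{symbol_name}.annotation"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Calling `dump_annotation("_raw_spin_lock_irqsave", lines)` would create `_raw_spin_lock_irqsave.annotation` in the current directory, matching the file name shown in the transcript.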
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-zzmnrwugb5vtk7bvg0rbx150@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Introduce --ignore-vmlinux command line option · 91340c51
      Arnaldo Carvalho de Melo authored
      We've had this in 'perf top' for quite a while. It is useful when one
      wishes to force the use of /proc/kcore, so that annotation shows the
      patched, running kernel instead of the ELF image it started from, aka
      vmlinux.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ircpvox4wzsv7gasrpb28fw9@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Introduce --ignore-vmlinux command line option · be316409
      Arnaldo Carvalho de Melo authored
      This is already present in 'perf top', albeit undocumented (will fix).
      It is useful for using /proc/kcore instead of vmlinux, thus showing
      what is really in place rather than what the kernel starts with, before
      alternatives, ftrace .text patching, etc. See the differences:
      
        # perf annotate --stdio2 _raw_spin_lock_irqsave
        _raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux
        Event: anon group { cycles, instructions }
      
          0.00   3.17      → callq  __fentry__
          0.00   7.94        push   %rbx
          7.69  36.51      → callq  __page_file_index
                             mov    %rax,%rbx
          7.69   3.17      → callq  *ffffffff82225cd0
                             xor    %eax,%eax
                             mov    $0x1,%edx
         80.77  49.21        lock   cmpxchg %edx,(%rdi)
                             test   %eax,%eax
                           ↓ jne    2b
          3.85   0.00        mov    %rbx,%rax
                             pop    %rbx
                           ← retq
                       2b:   mov    %eax,%esi
                           → callq  queued_spin_lock_slowpath
                             mov    %rbx,%rax
                             pop    %rbx
                           ← retq
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        _raw_spin_lock_irqsave() /proc/kcore
        Event: anon group { cycles, instructions }
      
          0.00   3.17        nop
          0.00   7.94        push   %rbx
          0.00  23.81        pushfq
          7.69  12.70        pop    %rax
                             nop
                             mov    %rax,%rbx
          7.69   3.17        cli
                             nop
                             xor    %eax,%eax
                             mov    $0x1,%edx
         80.77  49.21        lock   cmpxchg %edx,(%rdi)
                             test   %eax,%eax
                           ↓ jne    2b
          3.85   0.00        mov    %rbx,%rax
                             pop    %rbx
                           ← retq
                       2b:   mov    %eax,%esi
                           → callq  *ffffffff820e96b0
                             mov    %rbx,%rax
                             pop    %rbx
                           ← retq
        #
      
      Diff of the output of those commands:
      
        # perf annotate --stdio2 _raw_spin_lock_irqsave > /tmp/vmlinux
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave > /tmp/kcore
        # diff -y /tmp/vmlinux /tmp/kcore
        _raw_spin_lock_irqsave() vmlinux             | _raw_spin_lock_irqsave() /proc/kcore
        Event: anon group { cycles, instructions }     Event: anon group { cycles, instructions }
      
         0.00  3.17  → callq __fentry__              |  0.00  3.17     nop
         0.00  7.94    push  %rbx                       0.00  7.94     push  %rbx
         7.69 36.51  → callq __page_file_index       |  0.00 23.81     pushfq
                                                     >  7.69 12.70     pop   %rax
                                                     >                 nop
                       mov   %rax,%rbx                                 mov   %rax,%rbx
         7.69  3.17  → callq *ffffffff82225cd0       |  7.69  3.17     cli
                                                     >                 nop
                       xor   %eax,%eax                                 xor   %eax,%eax
                       mov   $0x1,%edx                                 mov   $0x1,%edx
        80.77 49.21    lock  cmpxchg %edx,(%rdi)       80.77 49.21     lock  cmpxchg %edx,(%rdi)
                       test  %eax,%eax                                 test  %eax,%eax
                     ↓ jne   2b                                      ↓ jne   2b
         3.85  0.00    mov   %rbx,%rax                  3.85  0.00     mov   %rbx,%rax
                       pop   %rbx                                      pop   %rbx
                     ← retq                                          ← retq
                  2b:  mov   %eax,%esi                            2b:  mov   %eax,%esi
                     → callq queued_spin_lock_slowpath|              → callq *ffffffff820e96b0
                       mov   %rbx,%rax                                 mov   %rbx,%rax
                       pop   %rbx                                      pop   %rbx
                     ← retq                                          ← retq
        #
      
      This should be further streamlined by doing both annotations and
      allowing the TUI to toggle initial/current, and show the patched
      instructions in a slightly different color.
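
As a rough sketch of how the patched instructions could be identified for such highlighting, the two instruction sequences from the output above can be compared with Python's difflib. This is not how perf does (or will do) it; the helper `patched_ranges` is hypothetical, and the instruction strings are the ones from the transcript with percentages, arrows, labels and column padding stripped by hand.

```python
from difflib import SequenceMatcher

TAIL = ["xor %eax,%eax", "mov $0x1,%edx", "lock cmpxchg %edx,(%rdi)",
        "test %eax,%eax", "jne 2b", "mov %rbx,%rax", "pop %rbx", "retq",
        "mov %eax,%esi"]
END = ["mov %rbx,%rax", "pop %rbx", "retq"]

# Instructions as found in vmlinux (what the kernel starts with).
vmlinux = (["callq __fentry__", "push %rbx", "callq __page_file_index",
            "mov %rax,%rbx", "callq *ffffffff82225cd0"]
           + TAIL + ["callq queued_spin_lock_slowpath"] + END)

# Instructions as found in /proc/kcore (what is really in place).
kcore = (["nop", "push %rbx", "pushfq", "pop %rax", "nop",
          "mov %rax,%rbx", "cli", "nop"]
         + TAIL + ["callq *ffffffff820e96b0"] + END)


def patched_ranges(before, after):
    """Return (tag, before_lines, after_lines) for every differing range."""
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    return [(tag, before[i1:i2], after[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


for tag, old, new in patched_ranges(vmlinux, kcore):
    print(tag, old, "->", new)
```

Each reported range corresponds to one of the `|` and `>` markers in the `diff -y` output, so a front end could color exactly those lines.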
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-wz8d269hxkcwaczr0r4rhyjg@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Add function header to --stdio2 · 864298f2
      Arnaldo Carvalho de Melo authored
        # perf annotate --stdio2 _raw_spin_lock_irqsave
        _raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux
        Event: anon group { cycles, instructions }
      
          0.00   3.17      → callq  __fentry__
          0.00   7.94        push   %rbx
          7.69  36.51      → callq  __page_file_index
                             mov    %rax,%rbx
          7.69   3.17      → callq  *ffffffff82225cd0
                             xor    %eax,%eax
                             mov    $0x1,%edx
         80.77  49.21        lock   cmpxchg %edx,(%rdi)
                             test   %eax,%eax
                           ↓ jne    2b
          3.85   0.00        mov    %rbx,%rax
                             pop    %rbx
                           ← retq
                       2b:   mov    %eax,%esi
                           → callq  queued_spin_lock_slowpath
                             mov    %rbx,%rax
                             pop    %rbx
                           ← retq
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-i86yfyzl8m194ioxgj1jo32f@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Use the default annotation options for --stdio2 · 35632892
      Arnaldo Carvalho de Melo authored
      With an empty '[annotate]' section in ~/.perfconfig:
      
        # perf record -a --all-kernel -e '{cycles,instructions}:P' sleep 5
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.243 MB perf.data (5513 samples) ]
        # perf annotate --stdio2 _raw_spin_lock | head -20
      
                           Disassembly of section .text:
      
                           ffffffff81868790 <_raw_spin_lock>:
                           _raw_spin_lock():
                           EXPORT_SYMBOL(_raw_spin_trylock_bh);
                           #endif
      
                           #ifndef CONFIG_INLINE_SPIN_LOCK
                           void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
                           {
                           → callq  __fentry__
                           atomic_cmpxchg():
                                   return xadd(&v->counter, -i);
                           }
      
                           static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
                           {
        # perf annotate --stdio2 _raw_spin_lock | head -20
                           → callq  __fentry__
                             xor    %eax,%eax
                             mov    $0x1,%edx
         87.50 100.00        lock   cmpxchg %edx,(%rdi)
          6.25   0.00        test   %eax,%eax
                           ↓ jne    16
          6.25   0.00        repz   retq
                       16:   mov    %eax,%esi
                           ↑ jmpq   ffffffff810e96b0 <queued_spin_lock_slowpath>
        #
        # cat ~/.perfconfig
        [annotate]
      
          hide_src_code = false
          show_linenr = true
        # perf annotate --stdio2 _raw_spin_lock | head -20
      
                       3   Disassembly of section .text:
      
                       5   ffffffff81868790 <_raw_spin_lock>:
                       6   _raw_spin_lock():
                       143 EXPORT_SYMBOL(_raw_spin_trylock_bh);
                       144 #endif
      
                       146 #ifndef CONFIG_INLINE_SPIN_LOCK
                       147 void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
                       148 {
                           → callq  __fentry__
                       150 atomic_cmpxchg():
                       187         return xadd(&v->counter, -i);
                       188 }
      
                       190 static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
                       191 {
        #
        # cat ~/.perfconfig
        [annotate]
      
          hide_src_code = true
          show_total_period = true
        # perf annotate --stdio2 _raw_spin_lock | head -20
                                     → callq  __fentry__
                                       xor    %eax,%eax
                                       mov    $0x1,%edx
            1411316      152339        lock   cmpxchg %edx,(%rdi)
             344694           0        test   %eax,%eax
                                     ↓ jne    16
              80806           0        repz   retq
                                 16:   mov    %eax,%esi
                                     ↑ jmpq   ffffffff810e96b0 <queued_spin_lock_slowpath>
        #
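
The way keys in the '[annotate]' section override defaults can be sketched as below. This is a hand-rolled illustration, not perf's config parser; `parse_annotate_section` is a hypothetical helper, only boolean values are handled, and the `DEFAULTS` values are placeholders rather than perf's actual defaults.

```python
def parse_annotate_section(text, defaults):
    """Overlay boolean keys from an [annotate] section onto defaults."""
    options = dict(defaults)
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            in_section = line[1:-1].strip() == "annotate"
            continue
        if in_section and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            options[key] = value.lower() == "true"
    return options


# Placeholder defaults, assumed for this example only.
DEFAULTS = {"hide_src_code": False, "show_linenr": False,
            "show_total_period": False}

PERFCONFIG = """\
[annotate]

  hide_src_code = true
  show_total_period = true
"""

print(parse_annotate_section(PERFCONFIG, DEFAULTS))
```

Keys that the section does not mention keep their default value, which mirrors the behavior shown in the transcripts: an empty section leaves the output governed entirely by the defaults.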
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-nu4rxg5zkdtgs1b2gc40p7v7@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Move the default annotate options to the library · 7f0b6fde
      Arnaldo Carvalho de Melo authored
      One more thing that goes from the TUI code to be used more widely,
      for instance it'll affect the default options used by:
      
        perf annotate --stdio2
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-0nsz0dm0akdbo30vgja2a10e@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf annotate: Introduce the --stdio2 output mode · befd2a38
      Arnaldo Carvalho de Melo authored
      This uses the TUI augmented formatting routines, modulo interactivity.
      
        # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
        _raw_spin_lock_irqsave() /proc/kcore
        Event: cycles:ppp
      
        Percent
      
                    Disassembly of section load0:
      
                    ffffffff9a8734b0 <load0>:
                      nop
                      push   %rbx
         50.00        pushfq
                      pop    %rax
                      nop
                      mov    %rax,%rbx
                      cli
                      nop
                      xor    %eax,%eax
                      mov    $0x1,%edx
         50.00        lock   cmpxchg %edx,(%rdi)
                      test   %eax,%eax
                    ↓ jne    2b
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
                2b:   mov    %eax,%esi
                    → callq  queued_spin_lock_slowpath
                      mov    %rbx,%rax
                      pop    %rbx
                    ← retq
      Tested-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-6cte5o8z84mbivbvqlg14uh1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 20 Mar, 2018 28 commits