1. 06 Mar, 2019 4 commits
    • Andi Kleen's avatar
      perf thread: Generalize function to copy from thread addr space from intel-bts code · 15325938
      Andi Kleen authored
      Add a utility function to fetch executable code. Convert one
      user over to it. There are more places doing that, but they
      do significantly different actions, so they are not
      easy to fit into a single library function.
      
      Committer changes:
      
      . No need to cast around, make 'buf' be a void pointer.
      
      . Rename it to thread__memcpy() to reflect the fact it is about copying
        a chunk of memory from a thread, i.e. from its address space.
      
      . No need to have it in a separate object file, move it to thread.[ch]
      
      . Check the return of map__load(), the original code didn't do it, but
        since we're moving this around, check that as well.
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/r/20190305144758.12397-2-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      15325938
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Calculate the max instruction name, align column to that · bc3bb795
      Arnaldo Carvalho de Melo authored
      We were hardcoding '6' as the max instruction name, and we have lots
      that are longer than that, see the diff from two 'P' printed TUI
      annotations for a libc function that uses instructions with long names,
      such as 'vpmovmskb' with its 9 chars:
      
        --- __strcmp_avx2.annotation.before	2019-03-06 16:31:39.368020425 -0300
        +++ __strcmp_avx2.annotation	2019-03-06 16:32:12.079450508 -0300
        @@ -2,284 +2,284 @@
         Event: cycles:ppp
      
         Percent        endbr64
        -  0.10         mov    %edi,%eax
        +  0.10         mov        %edi,%eax
        -               xor    %edx,%edx
        +               xor        %edx,%edx
        -  3.54         vpxor  %ymm7,%ymm7,%ymm7
        +  3.54         vpxor      %ymm7,%ymm7,%ymm7
        -               or     %esi,%eax
        +               or         %esi,%eax
        -               and    $0xfff,%eax
        +               and        $0xfff,%eax
        -               cmp    $0xf80,%eax
        +               cmp        $0xf80,%eax
        -             ↓ jg     370
        +             ↓ jg         370
        - 27.07         vmovdqu (%rdi),%ymm1
        + 27.07         vmovdqu    (%rdi),%ymm1
        -  7.97         vpcmpeqb (%rsi),%ymm1,%ymm0
        +  7.97         vpcmpeqb   (%rsi),%ymm1,%ymm0
        -  2.15         vpminub %ymm1,%ymm0,%ymm0
        +  2.15         vpminub    %ymm1,%ymm0,%ymm0
        -  4.09         vpcmpeqb %ymm7,%ymm0,%ymm0
        +  4.09         vpcmpeqb   %ymm7,%ymm0,%ymm0
        -  0.43         vpmovmskb %ymm0,%ecx
        +  0.43         vpmovmskb  %ymm0,%ecx
        -  1.53         test   %ecx,%ecx
        +  1.53         test       %ecx,%ecx
        -             ↓ je     b0
        +             ↓ je         b0
        -  5.26         tzcnt  %ecx,%edx
        +  5.26         tzcnt      %ecx,%edx
        - 18.40         movzbl (%rdi,%rdx,1),%eax
        + 18.40         movzbl     (%rdi,%rdx,1),%eax
        -  7.09         movzbl (%rsi,%rdx,1),%edx
        +  7.09         movzbl     (%rsi,%rdx,1),%edx
        -  3.34         sub    %edx,%eax
        +  3.34         sub        %edx,%eax
           2.37         vzeroupper
                      ← retq
                        nop
        -         50:   tzcnt  %ecx,%edx
        +         50:   tzcnt      %ecx,%edx
        -               movzbl 0x20(%rdi,%rdx,1),%eax
        +               movzbl     0x20(%rdi,%rdx,1),%eax
        -               movzbl 0x20(%rsi,%rdx,1),%edx
        +               movzbl     0x20(%rsi,%rdx,1),%edx
        -               sub    %edx,%eax
        +               sub        %edx,%eax
                        vzeroupper
                      ← retq
        -               data16 nopw %cs:0x0(%rax,%rax,1)
        +               data16     nopw %cs:0x0(%rax,%rax,1)
      Reported-by: default avatarTravis Downs <travis.downs@gmail.com>
      LPU-Reference: CAOBGo4z1KfmWeOm6Et0cnX5Z6DWsG2PQbAvRn1MhVPJmXHrc5g@mail.gmail.com
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-89wsdd9h9g6bvq52sgp6d0u4@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bc3bb795
    • Yang Wei's avatar
      perf clang: Remove needless extra semicolon · a53837a5
      Yang Wei authored
      Delete a superfluous semicolon in getBPFObjectFromModule().
      Signed-off-by: default avatarYang Wei <yang.wei9@zte.com.cn>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yang Wei <albin_yang@163.com>
      Link: http://lkml.kernel.org/r/1551710174-3349-1-git-send-email-albin_yang@163.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a53837a5
    • Arnaldo Carvalho de Melo's avatar
      perf bpf: Automatically add BTF ELF markers · 3163613c
      Arnaldo Carvalho de Melo authored
      The libbpf loader expects that some __btf_map_<MAP_NAME> structs be in
      place with the keys and values types of maps so that one can store the
      struct definitions and have them sent to the kernel via sys_bpf(fd, cmd
      = BTF_LOAD) and then later be retrievable via sys_bpf(fd, cmd =
      BPF_OBJ_GET_INFO_BY_FD) for use by tools such as 'bpftool map dump id
      MAP_ID'.
      
      Since we already have this for defining maps in 'perf trace' BPF events:
      
         bpf_map(name, _type, type_key, type_val, _max_entries)
      
      As used in the tools/perf/examples/bpf/augmented_raw_syscalls.c:
      
       --- 8< ---
      
      struct syscall {
              bool    enabled;
      };
      
      bpf_map(syscalls, ARRAY, int, struct syscall, 512);
      
       --- 8< ---
      
      All we need is to get all that already available info, piggyback on the
      'bpf_map' define in tools/perf/include/bpf/bpf.h, that is included by
      'perf trace' BPF programs and do that without requiring changes to the
      BPF programs already defining maps using 'bpf_map()'.
      
      So this is what we have before this patch:
      
      1) With this in ~/.perfconfig to dump .c events as .o, aka save a copy
         so that we can use the .o later as a pre-compiled BPF bytecode:
      
        # grep '\[llvm\]' -A2 ~/.perfconfig
        [llvm]
      	dump-obj = true
      	clang-opt = -g
      
        #
        # clang --version
        clang version 9.0.0 (https://git.llvm.org/git/clang.git/ 7906282d3afec5dfdc2b27943fd6c0309086c507) (https://git.llvm.org/git/llvm.git/ a1b5de1ff8ae8bc79dc8e86e1f82565229bd0500)
        Target: x86_64-unknown-linux-gnu
        Thread model: posix
        InstalledDir: /opt/llvm/bin
      
      2) Note the -g there so that we get clang to generate debuginfo, and
         since the target is 'bpf' it will generate the BTF info in this
         clang version (9.0).
      
      3) Run a simple 'perf record' specifiying as an event the augmented_raw_syscalls.c
         source code:
      
        # perf record -e /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
        LLVM: dumping /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.025 MB perf.data ]
      
        # file /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o: ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), with debug_info, not stripped
      
      4) Look at the BTF structs encoded in it:
      
        # pahole -F btf --sizes /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        syscall_enter_args	64	0
        augmented_filename	264	0
        syscall	1	0
        syscall_exit_args	24	0
        bpf_map	28	0
        #
        # pahole -F btf -C syscalls /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        # pahole -F btf -C syscall /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        struct syscall {
      	  bool                       enabled;              /*     0     1 */
      
      	  /* size: 1, cachelines: 1, members: 1 */
      	  /* last cacheline: 1 bytes */
        };
        #
      
      5) Ok, with just this we don't have the markers expected by the libbpf
         loader and when we run with this BPF bytecode, because we have:
      
        # grep '\[trace\]' -A1 ~/.perfconfig
        [trace]
      	add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        #
      
      6) Lets do a 'perf trace' system wide session using this BPF program:
      
         # perf trace -e *mmsg,open*
        Cache2 I/O/6885 openat(AT_FDCWD, "/home/acme/.cache/mozilla/firefox/ina67tev.default/cache2/entries/BA220AB2914006A7AE96D27BE6EA13DD77519FCA", O_RDWR|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR) = 106
        Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
        Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
        Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
        Cache2 I/O/6885 openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY) = 121
        DNS Res~ver #3/23340 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 106
        DNS Res~ver #3/23340 sendmmsg(106<socket:[3482690]>, 0x7f252f1fcaf0, 2, MSG_NOSIGNAL) = 2
        Cache2 I/O/6885 openat(AT_FDCWD, "/home/acme/.cache/mozilla/firefox/ina67tev.default/cache2/entries/BA220AB2914006A7AE96D27BE6EA13DD77519FCA", O_RDWR) = 106
        lighttpd/18915 openat(AT_FDCWD, "/proc/loadavg", O_RDONLY) = 12
      
      7) While it runs lets see the maps that 'perf trace' + libbpf's BPF
        loader loaded into the kernel via sys_bpf(fd, BPF_BTF_LOAD, ...):
      
        # bpftool map list | tail -6
        149: perf_event_array  name __augmented_sys  flags 0x0
      	  key 4B  value 4B  max_entries 8  memlock 4096B
        150: array  name syscalls  flags 0x0
      	  key 4B  value 1B  max_entries 512  memlock 8192B
        151: hash  name pids_filtered  flags 0x0
      	  key 4B  value 1B  max_entries 64  memlock 8192B
        #
      
      8) Dump the "pids_filtered", map, that will have one entry per PID that
         'perf trace' wants filtered, which includes its own, to avoid a
         tracing feedback loop (perf trace shows the syscalls it does which
         generates more syscalls that it has to show that...), it also
         auto-filters the 'gnome-terminal' and 'sshd' parent PIDs, for the
         same reason:
      
        # bpftool map dump id 151
        key: a5 0c 00 00  value: 01
        key: 14 63 00 00  value: 01
        Found 2 elements
        #
      
      9) Since there is no BTF info available, it does a generic hex dump :-\
      
      10) Now, with this patch applied, we'll do steps 3 to 6 again and look
          with pahole if there are extra structs encoded in BTF:
      
        # pahole -F btf --sizes /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        syscall_enter_args	64	0
        augmented_filename	264	0
        syscall	1	0
        syscall_exit_args	24	0
        bpf_map	28	0
        ____btf_map___augmented_syscalls__	8	0
        ____btf_map_syscalls	8	0
        ____btf_map_pids_filtered	8	0
        #
      
      11) Yes, those __btf_map_ + the map names, lets see how they look like:
      
        # pahole -F btf -C ____btf_map_syscalls /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
        struct ____btf_map_syscalls {
      	  int                        key;                  /*     0     4 */
      	  struct syscall             value;                /*     4     1 */
      
      	  /* size: 8, cachelines: 1, members: 2 */
      	  /* padding: 3 */
      	  /* last cacheline: 8 bytes */
        };
        #
      
      12) Lets repeat step 7 to get the new map ids:
      
        # bpftool map list | tail -6
        155: perf_event_array  name __augmented_sys  flags 0x0
      	  key 4B  value 4B  max_entries 8  memlock 4096B
        156: array  name syscalls  flags 0x0
      	  key 4B  value 1B  max_entries 512  memlock 8192B
        157: hash  name pids_filtered  flags 0x0
      	  key 4B  value 1B  max_entries 64  memlock 8192B
        #
      
      13) And finally lets dump the 'pids_filtered':
      
        # bpftool map dump id 157
        [{
              "key": 3237,
              "value": true
          },{
              "key": 26435,
              "value": true
          }
        ]
        #
      
      Looks much better! BTF info was used to interpret the key as an integer
      and the value as a struct with just one boolean member, so to make it
      more compact, show just the 'true' value where we saw '01'.
      
      Now to make 'perf trace --dump-map' to use BTF!
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lkml.kernel.org/n/tip-ybuf9wpkm30xk28iq7jbwb40@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3163613c
  2. 01 Mar, 2019 13 commits
    • Arnaldo Carvalho de Melo's avatar
      perf beauty msg_flags: Add missing %s lost when adding prefix suppression logic · c3b81a50
      Arnaldo Carvalho de Melo authored
      When the prefix suppresion/enabling logic was added, I forgot to add an
      extra %, which ended up chopping off the strings:
      
      Before:
      
        # perf trace -e *mmsg --map-dump syscalls
        [299] = 1,
        [307] = 1,
        DNS Res~ver #3/14587 sendmmsg(106<socket:[3462393]>, 0x7f252b0fcaf0, 2, MSG_) = 2
        chronyd/1053 recvmmsg(4, 0x558542ca5740, 4, MSG_, NULL) = 1
        DNS Res~ver #2/14445 sendmmsg(106<socket:[3461475]>, 0x7f252ab09af0, 2, MSG_) = 2
        DNS Res~ver #2/14444 sendmmsg(146<socket:[3457863]>, 0x7f2521a7aaf0, 2, MSG_) = 2
        DNS Res~ver #2/14445 sendmmsg(106<socket:[3461475]>, 0x7f252ab09af0, 2, MSG_) = 2
        DNS Res~ver #3/14587 sendmmsg(148<socket:[3460636]>, 0x7f252b0fcaf0, 2, MSG_) = 2
        DNS Res~ver #2/14444 sendmmsg(146<socket:[3457863]>, 0x7f2521a7aaf0, 2, MSG_) = 2
        ^C#
      
      After:
      
        # perf trace -e *mmsg --map-dump syscalls
        [299] = 1,
        [307] = 1,
        NetworkManager/17467 sendmmsg(22<socket:[3466493]>, 0x7f28927f9bb0, 2, MSG_NOSIGNAL) = 2
        pool/17478 sendmmsg(10<socket:[3466523]>, 0x7f2769f95e90, 2, MSG_NOSIGNAL) = 2
        DNS Res~ver #3/14587 sendmmsg(121<socket:[3466132]>, 0x7f252b0fcaf0, 2, MSG_NOSIGNAL) = 2
        chronyd/1053 recvmmsg(4, 0x558542ca5740, 4, MSG_DONTWAIT, NULL) = 1
        Socket Thread/17433 sendmmsg(121<socket:[3460903]>, 0x7f252668baf0, 2, MSG_NOSIGNAL) = 2
        ^C#
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: c65c83ff ("perf trace: Allow asking for not suppressing common string prefixes")
      Link: https://lkml.kernel.org/n/tip-t2eu1rqx710k6jr4814mlzg7@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c3b81a50
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Add call tree · ae8b887c
      Adrian Hunter authored
      Add a new report to display a call tree. The Call Tree report is very
      similar to the Context-Sensitive Call Graph, but the data is not
      aggregated. Also the 'Count' column, which would be always 1, is replaced
      by the 'Call Time'.
      
      Committer testing:
      
        $ cat simple-retpoline.c
        /*
      
          https://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
      
        $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
        $ objdump -d simple-retpoline
        */
      
        __attribute__((noinline)) int bar(void)
        {
                return -1;
        }
      
        int foo(void)
        {
                return bar() + 1;
        }
      
        __attribute__((indirect_branch("thunk"))) int main()
        {
                int (*volatile fn)(void) = foo;
      
                fn();
                return fn();
        }
        $
        $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
        $ perf script -i simple-retpoline.perf.data --itrace=be -s ~acme/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
        $ python ~acme/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db
      
      And in the GUI select:
      
          "Reports"
            "Call Tree"
      
          Call Path                 | Object          | Call Time (ns) | Time (ns) | Time (%) | Branch Count | Brach Count (%) |
          > simple-retpolin
            > PID:TID
              > _start                ld-2.28.so       2193855505777      156267      100.0       10602          100.0
                  unknown             unknown          2193855506010        2276        1.5           1            0.0
                > _dl_start           ld-2.28.so       2193855508286      137047       87.7       10088           95.2
                > _dl_init            ld-2.28.so       2193855645444        9142        5.9         326            3.1
                > _start              simple-retpoline 2193855654587        7457        4.8         182            1.7
                  > __libc_start_main <SNIP>
                    <SNIP>
                    > main            simple-retpoline 2193855657493          32        0.5          12            6.7
                      > foo           simple-retpoline 2193855657493          14       43.8           5           41.7
                    <SNIP>
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lkml.kernel.org/n/tip-enf0w96gqzfpv4fi16pw9ovc@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ae8b887c
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Factor out CallGraphModelBase · 254c0d82
      Adrian Hunter authored
      Factor out a base class CallGraphModelBase from CallGraphModel, so that
      CallGraphModelBase can be reused.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lkml.kernel.org/n/tip-76eybebzjwvgnadkm2oufrqi@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      254c0d82
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Improve TreeModel abstraction · a448ba23
      Adrian Hunter authored
      Instead of passing the tree root, get it from a method that can be
      implemented in any derived class.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lkml.kernel.org/n/tip-ovcv28bg4mt9swk36ypdyz14@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a448ba23
    • Adrian Hunter's avatar
      perf scripts python: exported-sql-viewer.py: Factor out TreeWindowBase · a731cc4c
      Adrian Hunter authored
      Factor out a base class TreeWindowBase from CallGraphWindow, so that
      TreeWindowBase can be reused.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lkml.kernel.org/n/tip-ifirw0c0mhkwxg6l12lk6k4p@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a731cc4c
    • Adrian Hunter's avatar
      perf scripts python: export-to-postgresql.py: Export calls parent_id · febce6dc
      Adrian Hunter authored
      Export to the 'calls' table the newly created 'parent_id' and create an
      index for it.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lkml.kernel.org/n/tip-eybd6fnk6j9r7g643lsideoo@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      febce6dc
    • Adrian Hunter's avatar
      perf scripts python: export-to-postgresql.py: Fix invalid input syntax for integer error · 07c5ebea
      Adrian Hunter authored
      Fix SQL query error "invalid input syntax for integer":
      
        Traceback (most recent call last):
          File "tools/perf/scripts/python/export-to-postgresql.py", line 465, in <module>
            do_query(query, 'CREATE VIEW calls_view AS '
          File "tools/perf/scripts/python/export-to-postgresql.py", line 274, in do_query
            raise Exception("Query failed: " + q.lastError().text())
        Exception: Query failed: ERROR:  invalid input syntax for integer: ""
        LINE 1: ...ch_count,call_id,return_id,CASE WHEN flags=0 THEN '' WHEN fl...
                                                                     ^
        (22P02) QPSQL: Unable to create query
        Error running python script tools/perf/scripts/python/export-to-postgresql.py
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Fixes: f08046cb ("perf thread-stack: Represent jmps to the start of a different symbol")
      Link: https://lkml.kernel.org/n/tip-strfpdozrvg7bi1xzrivxzqt@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      07c5ebea
    • Adrian Hunter's avatar
      perf scripts python: export-to-sqlite.py: Export calls parent_id · 8ce9a725
      Adrian Hunter authored
      Export to the 'calls' table the newly created 'parent_id'.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lkml.kernel.org/n/tip-b09oukl48rsl9azkp2wmh0bl@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      8ce9a725
    • Adrian Hunter's avatar
      perf db-export: Add calls parent_id to enable creation of call trees · f435887e
      Adrian Hunter authored
      The call_path can be used to find the parent symbol for a call but not
      the exact parent call. To do that add parent_id to the call_return
      export. This enables the creation of a call tree from the exported data.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lkml.kernel.org/n/tip-6j7tzdxo67cox6kan7k22oo6@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f435887e
    • Adrian Hunter's avatar
      perf intel-pt: Fix divide by zero when TSC is not available · 07633387
      Adrian Hunter authored
      When TSC is not available, "timeless" decoding is used but a divide by
      zero occurs if perf_time_to_tsc() is called.
      
      Ensure the divisor is not zero.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org # v4.9+
      Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      07633387
    • Adrian Hunter's avatar
      perf auxtrace: Improve address filter error message when there is no DSO · c1c49204
      Adrian Hunter authored
      The message does not indicate the possibility that the symbol is not
      found because the file does not exist.
      
      Before:
      
        $ perf record -e intel_pt//u --filter 'filter strcmp / strcpy @ foo ' ls
        Symbol 'strcmp' not found.
        Note that symbols must be functions.
        Failed to parse address filter: 'filter strcmp / strcpy @ foo '
        Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
        Where multiple filters are separated by space or comma.
      
      After:
      
        $ perf record -e intel_pt//u --filter 'filter strcmp / strcpy @ foo ' ls
        File 'foo' not found or has no symbols.
        Symbol 'strcmp' not found.
        Note that symbols must be functions.
        Failed to parse address filter: 'filter strcmp / strcpy @ foo '
        Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
        Where multiple filters are separated by space or comma.
      Reported-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: https://lkml.kernel.org/n/tip-dvngzxd0jkplzw1ary69dilb@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c1c49204
    • Jin Yao's avatar
      perf time-utils: Refactor time range parsing code · 284c4e18
      Jin Yao authored
      Jiri points out that we don't need any time checking and time string
      parsing if the --time option is not set. That makes sense.
      
      This patch refactors the time range parsing code, move the duplicated
      code from perf report and perf script to time_utils and check if --time
      option is set before parsing the time string. This patch is no logic
      change expected. So the usage of --time is same as before.
      
      For example:
      
      Select the first and second 10% time slices:
        perf report --time 10%/1,10%/2
        perf script --time 10%/1,10%/2
      
      Select the slices from 0% to 10% and from 30% to 40%:
        perf report --time 0%-10%,30%-40%
        perf script --time 0%-10%,30%-40%
      
      Select the time slices from timestamp 3971 to 3973
        perf report --time 3971,3973
        perf script --time 3971,3973
      
      Committer testing:
      
      Using the above examples, check before and after to see if it remains
      the same:
      
        $ perf record -F 10000 -- find . -name "*.[ch]" -exec cat {} + > /dev/null
        [ perf record: Woken up 3 times to write data ]
        [ perf record: Captured and wrote 1.626 MB perf.data (42392 samples) ]
        $
        $ perf report --time 10%/1,10%/2 > /tmp/report.before.1
        $ perf script --time 10%/1,10%/2 > /tmp/script.before.1
        $ perf report --time 0%-10%,30%-40% > /tmp/report.before.2
        $ perf script --time 0%-10%,30%-40% > /tmp/script.before.2
        $ perf report --time 180457.375844,180457.377717 > /tmp/report.before.3
        $ perf script --time 180457.375844,180457.377717 > /tmp/script.before.3
      
      For example, the 3rd test produces this slice:
      
        $ cat /tmp/script.before.3
              cat  3147 180457.375844:   2143 cycles:uppp:      7f79362590d9 cfree@GLIBC_2.2.5+0x9 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.375986:   2245 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
              cat  3147 180457.376012:   2164 cycles:uppp:      7f7936257430 _int_malloc+0x8c0 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376140:   2921 cycles:uppp:      558b70f3a554 [unknown] (/usr/bin/cat)
              cat  3147 180457.376296:   2844 cycles:uppp:      7f7936258abe malloc+0x4e (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376431:   2717 cycles:uppp:      558b70f3b0ca [unknown] (/usr/bin/cat)
              cat  3147 180457.376667:   2630 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
              cat  3147 180457.376795:   2442 cycles:uppp:      7f79362bff55 read+0x15 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376927:   2376 cycles:uppp:  ffffffff9aa00163 [unknown] ([unknown])
              cat  3147 180457.376954:   2307 cycles:uppp:      7f7936257438 _int_malloc+0x8c8 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.377116:   3091 cycles:uppp:      7f7936258a70 malloc+0x0 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.377362:   2945 cycles:uppp:      558b70f3a3b0 [unknown] (/usr/bin/cat)
              cat  3147 180457.377517:   2727 cycles:uppp:      558b70f3a9aa [unknown] (/usr/bin/cat)
        $
      
      Install 'coreutils-debuginfo' to see cat's guts (symbols), but then, the
      above chunk translates into this 'perf report' output:
      
        $ cat /tmp/report.before.3
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles:uppp' (time slices: 180457.375844,180457.377717)
        # Event count (approx.): 33552
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ......................
        #
            17.69%  cat      libc-2.28.so      [.] malloc
            14.53%  cat      cat               [.] 0x000000000000586e
            13.33%  cat      libc-2.28.so      [.] _int_malloc
             8.78%  cat      cat               [.] 0x00000000000023b0
             8.71%  cat      cat               [.] 0x0000000000002554
             8.13%  cat      cat               [.] 0x00000000000029aa
             8.10%  cat      cat               [.] 0x00000000000030ca
             7.28%  cat      libc-2.28.so      [.] read
             7.08%  cat      [unknown]         [k] 0xffffffff9aa00163
             6.39%  cat      libc-2.28.so      [.] cfree@GLIBC_2.2.5
      
        #
        # (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
        #
        $
      
      Now lets see after applying this patch, nothing should change:
      
        $ perf report --time 10%/1,10%/2 > /tmp/report.after.1
        $ perf script --time 10%/1,10%/2 > /tmp/script.after.1
        $ perf report --time 0%-10%,30%-40% > /tmp/report.after.2
        $ perf script --time 0%-10%,30%-40% > /tmp/script.after.2
        $ perf report --time 180457.375844,180457.377717 > /tmp/report.after.3
        $ perf script --time 180457.375844,180457.377717 > /tmp/script.after.3
        $ diff -u /tmp/report.before.1 /tmp/report.after.1
        $ diff -u /tmp/script.before.1 /tmp/script.after.1
        $ diff -u /tmp/report.before.2 /tmp/report.after.2
        --- /tmp/report.before.2	2019-03-01 11:01:53.526094883 -0300
        +++ /tmp/report.after.2	2019-03-01 11:09:18.231770467 -0300
        @@ -352,5 +352,5 @@
      
         #
        -# (Tip: Generate a script for your data: perf script -g <lang>)
        +# (Tip: Treat branches as callchains: perf report --branch-history)
         #
        $ diff -u /tmp/script.before.2 /tmp/script.after.2
        $ diff -u /tmp/report.before.3 /tmp/report.after.3
        --- /tmp/report.before.3	2019-03-01 11:03:08.890045588 -0300
        +++ /tmp/report.after.3	2019-03-01 11:09:40.660224002 -0300
        @@ -22,5 +22,5 @@
      
         #
        -# (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
        +# (Tip: List events using substring match: perf list <keyword>)
         #
        $ diff -u /tmp/script.before.3 /tmp/script.after.3
        $
      
      Cool, just the 'perf report' tips changed, QED.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1551435186-6008-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      284c4e18
    • Gustavo A. R. Silva's avatar
      perf: Mark expected switch fall-through · 10c3405f
      Gustavo A. R. Silva authored
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warning:
      
        kernel/events/core.c: In function ‘perf_event_parse_addr_filter’:
        kernel/events/core.c:9154:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
            kernel = 1;
            ~~~~~~~^~~
        kernel/events/core.c:9156:3: note: here
           case IF_SRC_FILEADDR:
           ^~~~
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Kook <keescook@chromium.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190212205430.GA8446@embeddedorSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      10c3405f
  3. 28 Feb, 2019 6 commits
    • Tony Jones's avatar
      tools lib traceevent: Fix buffer overflow in arg_eval · 7c5b019e
      Tony Jones authored
      Fix buffer overflow observed when running perf test.
      
      The overflow is when trying to evaluate "1ULL << (64 - 1)" which is
      resulting in -9223372036854775808 which overflows the 20 character
      buffer.
      
      If is possible this bug has been reported before but I still don't see
      any fix checked in:
      
      See: https://www.spinics.net/lists/linux-perf-users/msg07714.htmlReported-by: default avatarMichael Sartain <mikesart@fastmail.com>
      Reported-by: default avatarMathias Krause <minipli@googlemail.com>
      Signed-off-by: default avatarTony Jones <tonyj@suse.de>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Fixes: f7d82350 ("tools/events: Add files to create libtraceevent.a")
      Link: http://lkml.kernel.org/r/20190228015532.8941-1-tonyj@suse.deSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7c5b019e
    • Arnaldo Carvalho de Melo's avatar
      perf probe: Clarify error message about not finding kernel modules debuginfo · 4d6101f5
      Arnaldo Carvalho de Melo authored
      'perf probe' supports using just the kernel module name, but that will
      work only when the module is loaded, or using the full pathname to the
      file with the DWARF debug info, but the warning was cryptic:
      
      Before:
      
        # perf probe -m cls_flower -L fl_change
        Failed to find the path for cls_flower: No such file or directory
          Error: Failed to show lines.
        #
      
      After:
      
        # perf probe -m cls_flower -L fl_change
        Module cls_flower is not loaded, please specify its full path name.
          Error: Failed to show lines.
        # perf probe -m /lib/modules/5.0.0-rc7+/kernel/net/sched/cls_flower.ko -L fl_change | head -7
        <fl_change@/home/acme/git/linux/net/sched/cls_flower.c:0>
              0  static int fl_change(struct net *net, struct sk_buff *in_skb,
               		       struct tcf_proto *tp, unsigned long base,
               		       u32 handle, struct nlattr **tca,
               		       void **arg, bool ovr, struct netlink_ext_ack *extack)
              4  {
              5  	struct cls_fl_head *head = rtnl_dereference(tp->root);
        #
      
      The behaviour doesn't change when the module is loaded:
      
        # modprobe cls_flower
        # perf probe -m cls_flower -L fl_change | head -7
        <fl_change@/home/acme/git/linux/net/sched/cls_flower.c:0>
              0  static int fl_change(struct net *net, struct sk_buff *in_skb,
                                     struct tcf_proto *tp, unsigned long base,
                                     u32 handle, struct nlattr **tca,
                                     void **arg, bool ovr, struct netlink_ext_ack *extack)
              4  {
              5         struct cls_fl_head *head = rtnl_dereference(tp->root);
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-q4njvk9mshra00jacqjbzfn5@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4d6101f5
    • Song Liu's avatar
      perf, bpf: Consider events with attr.bpf_event as side-band events · 21038f2b
      Song Liu authored
      Events with attr.bpf_event set should be considered as side-band events,
      as they carry information about BPF programs.
      Signed-off-by: default avatarSong Liu <songliubraving@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kernel-team@fb.com
      Cc: netdev@vger.kernel.org
      Fixes: 6ee52e2a ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
      Link: http://lkml.kernel.org/r/20190226002019.3748539-2-songliubraving@fb.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      21038f2b
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-5.1-20190225' of... · c978b946
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.1-20190225' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf annotate:
      
        Wei Li:
      
        - Fix getting source line failure
      
      perf script:
      
        Andi Kleen:
      
        - Handle missing fields with -F +...
      
      perf data:
      
        Jiri Olsa:
      
        - Prep work to support per-cpu files in a directory.
      
      Intel PT:
      
        Adrian Hunter:
      
        - Improve thread_stack__no_call_return()
      
        - Hide x86 retpolines in thread stacks.
      
        - exported SQL viewer refactorings, new 'top calls' report..
      
        Alexander Shishkin:
      
        - Copy parent's address filter offsets on clone
      
        - Fix address filters for vmas with non-zero offset. Applies to
          ARM's CoreSight as well.
      
      python scripts:
      
        Tony Jones:
      
        - Python3 support for several 'perf script' python scripts.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c978b946
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-5.1-20190220' of... · 0a157124
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.1-20190220' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf report:
      
        He Kuang:
      
        - Don't shadow inlined symbol with different addr range.
      
      perf script:
      
        Jiri Olsa:
      
        - Allow +- operator to ask for -F to add/remove fields to
          the default set, for instance to ask for the removal of the
          'cpu' field in tracepoint events, adding 'period' to that
          kind of events, etc.
      
      perf test:
      
        Thomas Richter:
      
        - Fix scheduler tracepoint signedness of COMM fields failure of
          'evsel-tp-sched' test on s390 and other arches.
      
        Tommi Rantala:
      
        - Skip trace+probe_vfs_getname.sh when 'perf trace' is not built.
      
      perf trace:
      
        Arnaldo Carvalho de Melo:
      
        - Add initial BPF map dumper, initially just for the current, minimal
          needs of the augmented_raw_syscalls BPF example used to collect
          pointer args payloads that uses BPF maps for pid and syscall filtering,
          but will in time have features similar to 'perf stat' --interval-print,
          --interval-clear, ways to signal from a BPF event that a specific
          map (or range of that map) should be printed, optionally as a
          histogram, etc.
      
      General:
      
        Jiri Olsa:
      
        - Add CPU and NUMA topologies classes for further reuse, fixing some
          issues in the process.
      
        - Fixup some warnings and debug levels.
      
        - Make rm_rf() remove single file, not just directories.
      
      Documentation:
      
        Jonas Rabenstein:
      
        - Fix HEADER_CMDLINE description in perf.data documentation.
      
        - Fix documentation of the Flags section in perf.data.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0a157124
    • Ingo Molnar's avatar
      9ed8f1a6
  4. 25 Feb, 2019 17 commits