1. 19 Mar, 2019 13 commits
    • perf config: Fix an error in the config template documentation · 9b40dff7
      Changbin Du authored
      The option 'sort-order' should be 'sort_order'.
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Fixes: 893c5c79 ("perf config: Show default report configuration in example and docs")
      Link: http://lkml.kernel.org/r/20190316080556.3075-5-changbin.du@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix errors under optimization level '-Og' · 11c1ea6f
      Changbin Du authored
      Optimization level '-Og' offers a reasonable level of optimization while
      maintaining fast compilation and a good debugging experience. This patch
      tries to make it work.
      
        $ make DEBUG=1 EXTRA_CFLAGS='-Og'
        bench/epoll-ctl.c: In function ‘do_threads’:
        bench/epoll-ctl.c:274:9: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
          return ret;
                 ^~~
        ...
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20190316080556.3075-4-changbin.du@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf list: Don't forget to drop the reference to the allocated thread_map · 39df730b
      Changbin Du authored
      Detected via gcc's ASan:
      
        Direct leak of 2048 byte(s) in 64 object(s) allocated from:
            #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370)
            #1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43
            #2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85
            #3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250
            #4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382
            #5 0x556b0f0e3231 in print_events util/parse-events.c:2514
            #6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
            #7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
            #8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
            #9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
            #10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520
            #11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Fixes: 89896051 ("perf tools: Do not put a variable sized type not at the end of a struct")
      Link: http://lkml.kernel.org/r/20190316080556.3075-3-changbin.du@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add doc about how to build perf with Asan and UBSan · af7a14a7
      Changbin Du authored
      AddressSanitizer (or ASan) and UndefinedBehaviorSanitizer (or UBSan) are
      very useful tools to detect program bugs:
      
       - AddressSanitizer (or ASan) is a GCC feature that detects memory
         corruption bugs such as buffer overflows and memory leaks.
      
       - UndefinedBehaviorSanitizer (or UBSan) is a fast undefined behavior
         detector supported by GCC. UBSan detects undefined behaviors of programs
         at runtime.
      
      This patch adds a document describing how to use them with perf. Later
      patches will fix some of the issues they uncovered.
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20190316080556.3075-2-changbin.du@gmail.com
      [ Make some changes based on comments made by Jiri Olsa ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events: Remove P8 HW events which are not supported · c3b4d5c4
      Mamatha Inamdar authored
      Remove the following hardware events, which are not supported on
      POWER8, from the JSON file:
      
        pm_co_disp_fail
        pm_co_tm_sc_footprint
        pm_iside_disp
        pm_iside_disp_fail
        pm_iside_disp_fail_other
        pm_iside_mru_touch
        pm_l2_castout_mod
        pm_l2_castout_shr
        pm_l2_dc_inv
        pm_l2_disp_all_l2miss
        pm_l2_grp_guess_correct
        pm_l2_grp_guess_wrong
        pm_l2_ic_inv
        pm_l2_inst
        pm_l2_inst_miss
        pm_l2_ld
        pm_l2_ld_disp
        pm_l2_ld_hit
        pm_l2_ld_miss
        pm_l2_loc_guess_correct
        pm_l2_loc_guess_wrong
        pm_l2_rcld_disp
        pm_l2_rcld_disp_fail_addr
        pm_l2_rcld_disp_fail_other
        pm_l2_rcst_disp
        pm_l2_rcst_disp_fail_addr
        pm_l2_rcst_disp_fail_other
        pm_l2_rc_st_done
        pm_l2_rty_ld
        pm_l2_sn_m_rd_done
        pm_l2_sn_m_wr_done
        pm_l2_sn_sx_i_done
        pm_l2_st_disp
        pm_l2_st_hit
        pm_l2_sys_guess_correct
        pm_l2_sys_guess_wrong
        pm_l2_sys_pump
        pm_l3_ci_hit
        pm_l3_ci_miss
        pm_l3_cinj
        pm_l3_co
        pm_l3_co_lco
        pm_l3_grp_guess_correct
        pm_l3_grp_guess_wrong_high
        pm_l3_grp_guess_wrong_low
        pm_l3_hit
        pm_l3_l2_co_hit
        pm_l3_l2_co_miss
        pm_l3_lat_ci_hit
        pm_l3_lat_ci_miss
        pm_l3_ld_hit
        pm_l3_ld_miss
        pm_l3_loc_guess_correct
        pm_l3_loc_guess_wrong
        pm_l3_miss
        pm_l3_p0_co_l31
        pm_l3_p0_co_mem
        pm_l3_p0_co_rty
        pm_l3_p0_grp_pump
        pm_l3_p0_lco_data
        pm_l3_p0_lco_no_data
        pm_l3_p0_lco_rty
        pm_l3_p0_node_pump
        pm_l3_p0_pf_rty
        pm_l3_p0_sn_hit
        pm_l3_p0_sn_inv
        pm_l3_p0_sn_miss
        pm_l3_p0_sys_pump
        pm_l3_p1_co_l31
        pm_l3_p1_co_mem
        pm_l3_p1_co_rty
        pm_l3_p1_grp_pump
        pm_l3_p1_lco_data
        pm_l3_p1_lco_no_data
        pm_l3_p1_lco_rty
        pm_l3_p1_node_pump
        pm_l3_p1_pf_rty
        pm_l3_p1_sn_hit
        pm_l3_p1_sn_inv
        pm_l3_p1_sn_miss
        pm_l3_p1_sys_pump
        pm_l3_pf_hit_l3
        pm_l3_sys_guess_correct
        pm_l3_sys_guess_wrong
        pm_l3_trans_pf
        pm_l3_wi0_busy
        pm_l3_wi_usage
        pm_non_tm_rst_sc
        pm_rd_clearing_sc
        pm_rd_forming_sc
        pm_rd_hit_pf
        pm_snp_tm_hit_m
        pm_snp_tm_hit_t
        pm_st_caused_fail
        pm_tm_cam_overflow
        pm_tm_cap_overflow
        pm_tm_fav_caused_fail
        pm_tm_ld_caused_fail
        pm_tm_ld_conf
        pm_tm_rst_sc
        pm_tm_sc_co
        pm_tm_st_caused_fail
        pm_tm_st_conf
      Signed-off-by: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
      Acked-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Fixes: 2a81fa3b ("perf vendor events: Add power8 PMU events")
      Link: http://lkml.kernel.org/r/154953186583.11022.14819560028300370163.stgit@localhost.localdomain
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Improve scaling · 42a5864c
      Andi Kleen authored
      The multiplexing scaling in perf stat mysteriously adds 0.5 to the
      value. This dates back to the original perf tool. Other scaling code
      doesn't use that strange convention. Remove the extra 0.5.
      
      Before:
      
      $ perf stat -e 'cycles,cycles,cycles,cycles,cycles,cycles' grep -rq foo
      
       Performance counter stats for 'grep -rq foo':
      
               6,403,580      cycles                                                        (81.62%)
               6,404,341      cycles                                                        (81.64%)
               6,402,983      cycles                                                        (81.62%)
               6,399,941      cycles                                                        (81.63%)
               6,399,451      cycles                                                        (81.62%)
               6,436,105      cycles                                                        (91.87%)
      
             0.005843799 seconds time elapsed
      
             0.002905000 seconds user
             0.002902000 seconds sys
      
      After:
      
      $ perf stat -e 'cycles,cycles,cycles,cycles,cycles,cycles' grep -rq foo
      
       Performance counter stats for 'grep -rq foo':
      
               6,422,704      cycles                                                        (81.68%)
               6,401,842      cycles                                                        (81.68%)
               6,398,432      cycles                                                        (81.68%)
               6,397,098      cycles                                                        (81.68%)
               6,396,074      cycles                                                        (81.67%)
               6,434,980      cycles                                                        (91.62%)
      
             0.005884437 seconds time elapsed
      
             0.003580000 seconds user
             0.002356000 seconds sys
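
The multiplex scaling discussed in this commit can be sketched as follows. This is a minimal illustration, not perf's actual code: it assumes the usual extrapolation `raw * enabled / running`, and the function name, parameters and sample numbers are all hypothetical.

```python
def scale_count(raw, enabled, running, legacy_round=False):
    """Extrapolate a multiplexed counter to the full enabled time.

    raw      -- events counted while the event was actually on the PMU
    enabled  -- time the event was enabled
    running  -- time the event was actually scheduled
    """
    if running == 0:
        return None  # never scheduled: nothing to extrapolate from
    scaled = raw * enabled / running
    if legacy_round:
        scaled += 0.5  # the extra 0.5 that this commit removes
    return int(scaled)  # truncate toward zero, like a cast to an integer


# Hypothetical numbers, not taken from the output above.
print(scale_count(5, 3, 2))                     # 7.5 truncates to 7
print(scale_count(5, 3, 2, legacy_round=True))  # 8.0 truncates to 8
```

With the extra 0.5 the truncation effectively becomes round-to-nearest, so the two variants differ by at most one count.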
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-10-andi@firstfloor.org
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Fix --no-scale · 75998bb2
      Andi Kleen authored
      The -c option to enable multiplex scaling has been useless for quite
      some time because scaling is default.
      
      It's only useful as --no-scale to disable scaling. But the non-scaling
      code path has bitrotted and doesn't print anything, because the perf output
      code relies on the run/ena (running/enabled time) values.
      
      Also even when we don't want to scale a value it's still useful to show
      its multiplex percentage.
      
      This patch:
        - Fixes help and documentation to show --no-scale instead of -c
        - Removes -c, only keeps the long option because -c doesn't support negatives.
        - Enables running/enabled even with --no-scale
        - And fixes some other problems in the no-scale output.
      
      Before:
      
        $ perf stat --no-scale -e cycles true
      
         Performance counter stats for 'true':
      
             <not counted>      cycles
      
               0.000984154 seconds time elapsed
      
      After:
      
        $ ./perf stat --no-scale -e cycles true
      
         Performance counter stats for 'true':
      
                   706,070      cycles
      
               0.001219821 seconds time elapsed
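
As a rough sketch of why the running/enabled times stay useful even without scaling: the raw count is printed as-is, and the multiplex percentage is derived from the two times. The helper names and the exact column layout below are assumptions for illustration, not perf's implementation.

```python
def running_percent(enabled, running):
    """Share of the enabled time during which the event was on the PMU."""
    if enabled == 0:
        return 0.0
    return 100.0 * running / enabled


def format_count(raw, enabled, running, name):
    """Format one unscaled counter line; add the percentage only if multiplexed."""
    line = f"{raw:>18,}      {name}"
    if running < enabled:
        line += f"  ({running_percent(enabled, running):.2f}%)"
    return line


# Hypothetical enabled/running times; only the count comes from the output above.
print(format_count(706070, 1000, 1000, "cycles"))
print(format_count(706070, 10000, 8162, "cycles"))
```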
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      LPU-Reference: 20190314225002.30108-9-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-xggjvwcdaj2aqy8ib3i4b1g6@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Support relative time · 90b10f47
      Andi Kleen authored
      When comparing time stamps in 'perf script' traces it can be annoying to
      work with the full perf time stamps.
      
      Add a --reltime option that displays time stamps relative to the trace
      start to make it easier to read the traces.
      
      Note: --reltime is not currently supported together with --time; an error
      is reported in that case.
      
      Before:
      
        % perf script
            swapper 0 [000] 245402.891216:    1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000] 245402.891223:    1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000] 245402.891227:    5 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000] 245402.891231:   41 cycles:ppp: ffffffffa0068816 native_write_msr+0x6 ([kernel.kallsyms])
            swapper 0 [000] 245402.891235:  355 cycles:ppp: ffffffffa000dd51 intel_bts_enable_local+0x21 ([kernel.kallsyms])
            swapper 0 [000] 245402.891239: 3084 cycles:ppp: ffffffffa0a0150a end_repeat_nmi+0x48 ([kernel.kallsyms])
      
      After:
      
        % perf script --reltime
      
            swapper 0 [000]     0.000000:    1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000]     0.000006:    1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000]     0.000010:    5 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000]     0.000014:   41 cycles:ppp: ffffffffa0068816 native_write_msr+0x6 ([kernel.kallsyms])
            swapper 0 [000]     0.000018:  355 cycles:ppp: ffffffffa000dd51 intel_bts_enable_local+0x21 ([kernel.kallsyms])
            swapper 0 [000]     0.000022: 3084 cycles:ppp: ffffffffa0a0150a end_repeat_nmi+0x48 ([kernel.kallsyms])
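
The idea behind relative time stamps can be sketched like this: subtract the first sample's time from every sample before formatting. The sketch assumes nanosecond integer timestamps sorted in ascending order; the function name and the sample values are made up for illustration.

```python
NSEC_PER_SEC = 1_000_000_000
NSEC_PER_USEC = 1_000


def format_reltime(timestamps_ns):
    """Return 'sec.usec' strings relative to the first timestamp in the list."""
    if not timestamps_ns:
        return []
    start = timestamps_ns[0]
    out = []
    for t in timestamps_ns:
        delta = t - start  # subtract first, truncate to microseconds afterwards
        secs, rem = divmod(delta, NSEC_PER_SEC)
        out.append(f"{secs}.{rem // NSEC_PER_USEC:06d}")
    return out


# Hypothetical nanosecond timestamps chosen for illustration.
print(format_reltime([245402891216400, 245402891223100, 245402891227000]))
# ['0.000000', '0.000006', '0.000010']
```

Because this sketch subtracts before truncating to microseconds, a relative stamp can differ by one microsecond from the difference of two printed absolute stamps. That would be consistent with the 0.000006 shown above, though whether perf computes it the same way is an assumption.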
      
      Committer notes:
      
      Do not use 'time' as the name of a variable, as this breaks the build on
      older glibcs:
      
        cc1: warnings being treated as errors
        builtin-script.c: In function 'perf_sample__fprintf_start':
        builtin-script.c:691: warning: declaration of 'time' shadows a global declaration
        /usr/include/time.h:187: warning: shadowed declaration is here
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-8-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-bpahyi6pr9r399mvihu65fvc@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Indicate JITed code better in report · a4e7e6ef
      Andi Kleen authored
      Print [JIT] tid %d instead of the cryptic /tmp/perf-%d.map default.
      
        % cat >loop.java
        public class loop {
                public static void main(String[] args)
                {
                        for (;;);
                }
        }
        ^D
        % javac loop.java
        % perf record java loop
        ^C
      
      Before:
      
        % perf report --stdio
        ...
            56.09%  java     perf-34724.map      [.] 0x00007fd5bd021896
            19.12%  java     perf-34724.map      [.] 0x00007fd5bd021887
             9.79%  java     perf-34724.map      [.] 0x00007fd5bd021783
             8.97%  java     perf-34724.map      [.] 0x00007fd5bd02175b
      
      After:
      
        % perf report --stdio
        ...
            56.09%  java     [JIT] tid 34724     [.] 0x00007fd5bd021896
            19.12%  java     [JIT] tid 34724     [.] 0x00007fd5bd021887
             9.79%  java     [JIT] tid 34724     [.] 0x00007fd5bd021783
             8.97%  java     [JIT] tid 34724     [.] 0x00007fd5bd02175b
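
The renaming shown above amounts to recognising the `/tmp/perf-<pid>.map` pattern and substituting a friendlier label. A minimal sketch follows, with a hypothetical helper name and a fallback to the file's base name assumed for all other paths; it is not how perf itself implements the check.

```python
import os
import re

# Map files for JITed code follow the /tmp/perf-<pid>.map naming shown above.
JIT_MAP_RE = re.compile(r"^/tmp/perf-(\d+)\.map$")


def dso_label(path):
    """Return '[JIT] tid <n>' for a JIT map file, else the file's base name."""
    m = JIT_MAP_RE.match(path)
    if m:
        return f"[JIT] tid {m.group(1)}"
    return os.path.basename(path)


print(dso_label("/tmp/perf-34724.map"))  # [JIT] tid 34724
print(dso_label("/usr/lib/libc.so.6"))   # libc.so.6
```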
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      LPU-Reference: 20190314225002.30108-7-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-r17l6py9g0sezb7mi1f286gt@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Show all sort keys in help output · 702fb9b4
      Andi Kleen authored
      Show all the supported sort keys in the command line help output, so
      that there is no need to refer to the man page.
      
      Before:
      
        % perf report -h
        ...
             -s, --sort <key[,key2...]>
                                  sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer the man page for the complete list.
      
      After:
      
        % perf report -h
        ...
            -s, --sort <key[,key2...]>
                                  sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period pid comm dso symbol parent cpu ...
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-5-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-9r3uz2ch4izoi1uln3f889co@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Clarify help for --switch-output · c38dab7d
      Andi Kleen authored
      The help description for --switch-output reads as if it takes multiple
      comma-separated fields, but it is actually a choice between alternatives.
      Make that clear and less confusing.
      
      Before:
      
        % perf record -h
        ...
                --switch-output[=<signal,size,time>]
                                  Switch output when receive SIGUSR2 or cross size,time threshold
      
      After:
      
        % perf record -h
        ...
      
                --switch-output[=<signal or size[BKMG] or time[smhd]>]
                                  Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold
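
To make the "one of three alternatives" reading concrete, here is a small sketch of a parser for such a value. It is an illustration only: the function name is hypothetical, and the unit multipliers (powers of two for B/K/M/G; seconds, minutes, hours, days for s/m/h/d) are assumptions rather than something stated in the help text.

```python
SIZE_UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
TIME_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_switch_output(arg):
    """Classify one --switch-output value as signal, size (bytes) or time (seconds)."""
    if arg == "signal":
        return ("signal", None)
    number, unit = arg[:-1], arg[-1:]
    if not number.isdigit():
        raise ValueError(f"invalid value: {arg!r}")
    if unit in SIZE_UNITS:
        return ("size", int(number) * SIZE_UNITS[unit])
    if unit in TIME_UNITS:
        return ("time", int(number) * TIME_UNITS[unit])
    raise ValueError(f"unknown unit in {arg!r}")


print(parse_switch_output("signal"))  # ('signal', None)
print(parse_switch_output("10M"))     # ('size', 10485760)
print(parse_switch_output("10m"))     # ('time', 600)
```

Note that the unit letter is case-sensitive in this sketch, so `10M` is a size and `10m` is a time. A comma-separated value such as `size,time` is rejected, which matches the point of the clarified help text.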
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-4-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-9yecyuha04nyg8toyd1b2pgi@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Allow to limit number of reported perf.data files · 03724b2e
      Andi Kleen authored
      When doing long-term recording while waiting for some event to snapshot
      on, we often only care about the last minute or so.
      
      The --switch-output command line option supports rotating the perf.data
      file when the size exceeds a threshold. But the disk would still be
      filled with unnecessary old files.
      
      Add a new option to only keep a number of rotated files, so that the
      disk space usage can be limited.
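
Keeping only the newest N rotated files can be sketched as a fixed-size window over the file names, deleting whatever falls out of it. The class, its parameters and the file names below are hypothetical; the deletion callback is injected so the example runs without touching the disk.

```python
from collections import deque


class RotatedFiles:
    """Keep at most max_files rotated outputs, discarding the oldest first."""

    def __init__(self, max_files, remove):
        if max_files < 1:
            raise ValueError("max_files must be at least 1")
        self.max_files = max_files
        self.remove = remove  # callback that deletes a file, e.g. os.remove
        self.kept = deque()

    def add(self, filename):
        """Record a newly rotated file and delete whatever falls out of the window."""
        self.kept.append(filename)
        while len(self.kept) > self.max_files:
            self.remove(self.kept.popleft())


# Hypothetical file names; deletions are only collected in a list here.
deleted = []
files = RotatedFiles(max_files=2, remove=deleted.append)
for name in ["perf.data.1", "perf.data.2", "perf.data.3", "perf.data.4"]:
    files.add(name)
print(list(files.kept), deleted)
# ['perf.data.3', 'perf.data.4'] ['perf.data.1', 'perf.data.2']
```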
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-3-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-y5u2lik0ragt4vlktz6qc9ks@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf list: Filter metrics too · 6f40b2a5
      Andi Kleen authored
      When a filter is specified on the command line, filter the metrics too.
      
      Before:
      
        % perf list foo
        List of pre-defined events (to be used in -e):
      
        Metric Groups:
      
        DSB:
          DSB_Coverage
               [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
        ... more metrics ...
      
      After:
      
        % perf list foo
      
        List of pre-defined events (to be used in -e):
      
        Metric Groups:
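
The filtering behaviour shown above can be sketched as a glob match over metric names, dropping groups that end up empty. This is an assumption-laden illustration: the function name and sample data are hypothetical, and Python's `fnmatch` stands in for whatever matcher perf uses.

```python
from fnmatch import fnmatchcase


def filter_metric_groups(groups, pattern=None):
    """Return only the metrics matching pattern; no pattern keeps everything."""
    result = {}
    for group, metrics in groups.items():
        matched = [m for m in metrics
                   if pattern is None or fnmatchcase(m, pattern)]
        if matched:  # groups left without any metric are dropped
            result[group] = matched
    return result


# Sample data for illustration only.
groups = {"DSB": ["DSB_Coverage"], "Pipeline": ["IPC", "UPI"]}
print(filter_metric_groups(groups, "foo"))   # {}
print(filter_metric_groups(groups, "DSB*"))  # {'DSB': ['DSB_Coverage']}
```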
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-1-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-1y8oi2s8c4jhjtykgs5zvda1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 11 Mar, 2019 27 commits