1. 19 Mar, 2019 8 commits
    • Andi Kleen's avatar
      perf stat: Improve scaling · 42a5864c
      Andi Kleen authored
      The multiplexing scaling in perf stat mysteriously adds 0.5 to the
      value. This dates back to the original perf tool. Other scaling code
      doesn't use that strange convention. Remove the extra 0.5.
      
      Before:
      
      $ perf stat -e 'cycles,cycles,cycles,cycles,cycles,cycles' grep -rq foo
      
       Performance counter stats for 'grep -rq foo':
      
               6,403,580      cycles                                                        (81.62%)
               6,404,341      cycles                                                        (81.64%)
               6,402,983      cycles                                                        (81.62%)
               6,399,941      cycles                                                        (81.63%)
               6,399,451      cycles                                                        (81.62%)
               6,436,105      cycles                                                        (91.87%)
      
             0.005843799 seconds time elapsed
      
             0.002905000 seconds user
             0.002902000 seconds sys
      
      After:
      
      $ perf stat -e 'cycles,cycles,cycles,cycles,cycles,cycles' grep -rq foo
      
       Performance counter stats for 'grep -rq foo':
      
               6,422,704      cycles                                                        (81.68%)
               6,401,842      cycles                                                        (81.68%)
               6,398,432      cycles                                                        (81.68%)
               6,397,098      cycles                                                        (81.68%)
               6,396,074      cycles                                                        (81.67%)
               6,434,980      cycles                                                        (91.62%)
      
             0.005884437 seconds time elapsed
      
             0.003580000 seconds user
             0.002356000 seconds sys
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-10-andi@firstfloor.org
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      42a5864c
    • Andi Kleen's avatar
      perf stat: Fix --no-scale · 75998bb2
      Andi Kleen authored
      The -c option to enable multiplex scaling has been useless for quite
      some time because scaling is default.
      
      It's only useful as --no-scale to disable scaling. But the non scaling
      code path has bitrotted and doesn't print anything because perf output
      code relies on value run/ena information.
      
      Also even when we don't want to scale a value it's still useful to show
      its multiplex percentage.
      
      This patch:
        - Fixes help and documentation to show --no-scale instead of -c
        - Removes -c, only keeps the long option because -c doesn't support negatives.
        - Enables running/enabled even with --no-scale
        - And fixes some other problems in the no-scale output.
      
      Before:
      
        $ perf stat --no-scale -e cycles true
      
         Performance counter stats for 'true':
      
             <not counted>      cycles
      
               0.000984154 seconds time elapsed
      
      After:
      
        $ ./perf stat --no-scale -e cycles true
      
         Performance counter stats for 'true':
      
                   706,070      cycles
      
               0.001219821 seconds time elapsed
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      LPU-Reference: 20190314225002.30108-9-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-xggjvwcdaj2aqy8ib3i4b1g6@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      75998bb2
    • Andi Kleen's avatar
      perf script: Support relative time · 90b10f47
      Andi Kleen authored
      When comparing time stamps in 'perf script' traces it can be annoying to
      work with the full perf time stamps.
      
      Add a --reltime option that displays time stamps relative to the trace
      start to make it easier to read the traces.
      
      Note: not currently supported for --time. Report an error in this
      case.
      
      Before:
      
        % perf script
            swapper 0 [000] 245402.891216:    1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000] 245402.891223:    1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000] 245402.891227:    5 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000] 245402.891231:   41 cycles:ppp: ffffffffa0068816 native_write_msr+0x6 ([kernel.kallsyms])
            swapper 0 [000] 245402.891235:  355 cycles:ppp: ffffffffa000dd51 intel_bts_enable_local+0x21 ([kernel.kallsyms])
            swapper 0 [000] 245402.891239: 3084 cycles:ppp: ffffffffa0a0150a end_repeat_nmi+0x48 ([kernel.kallsyms])
      
      After:
      
        % perf script --reltime
      
            swapper 0 [000]     0.000000:    1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000]     0.000006:    1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000]     0.000010:    5 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
            swapper 0 [000]     0.000014:   41 cycles:ppp: ffffffffa0068816 native_write_msr+0x6 ([kernel.kallsyms])
            swapper 0 [000]     0.000018:  355 cycles:ppp: ffffffffa000dd51 intel_bts_enable_local+0x21 ([kernel.kallsyms])
            swapper 0 [000]     0.000022: 3084 cycles:ppp: ffffffffa0a0150a end_repeat_nmi+0x48 ([kernel.kallsyms])
      
      Committer notes:
      
      Do not use 'time' as the name of a variable, as this breaks the build on
      older glibcs:
      
        cc1: warnings being treated as errors
        builtin-script.c: In function 'perf_sample__fprintf_start':
        builtin-script.c:691: warning: declaration of 'time' shadows a global declaration
        /usr/include/time.h:187: warning: shadowed declaration is here
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-8-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-bpahyi6pr9r399mvihu65fvc@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      90b10f47
    • Andi Kleen's avatar
      perf report: Indicate JITed code better in report · a4e7e6ef
      Andi Kleen authored
      Print [TID] tid %d instead of the crypted /tmp/perf-%d.map default.
      
      % cat >loop.java
        public class loop {
                public static void main(String[] args)
                {
                        for (;;);
                }
        }
        ^D
        % javac loop.java
        % perf record java loop
        ^C
      
      Before:
      
        % perf report --stdio
        ...
            56.09%  java     perf-34724.map      [.] 0x00007fd5bd021896
            19.12%  java     perf-34724.map      [.] 0x00007fd5bd021887
             9.79%  java     perf-34724.map      [.] 0x00007fd5bd021783
             8.97%  java     perf-34724.map      [.] 0x00007fd5bd02175b
      
      After:
      
        % perf report --stdio
        ...
            56.09%  java     [JIT] tid 34724     [.] 0x00007fd5bd021896
            19.12%  java     [JIT] tid 34724     [.] 0x00007fd5bd021887
             9.79%  java     [JIT] tid 34724     [.] 0x00007fd5bd021783
             8.97%  java     [JIT] tid 34724     [.] 0x00007fd5bd02175b
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      LPU-Reference: 20190314225002.30108-7-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-r17l6py9g0sezb7mi1f286gt@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a4e7e6ef
    • Andi Kleen's avatar
      perf report: Show all sort keys in help output · 702fb9b4
      Andi Kleen authored
      Show all the supported sort keys in the command line help output, so
      that it's not needed to refer to the manpage.
      
      Before:
      
        % perf report -h
        ...
             -s, --sort <key[,key2...]>
                                  sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer the man page for the complete list.
      
      After:
      
        % perf report -h
        ...
            -s, --sort <key[,key2...]>
                                  sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period pid comm dso symbol parent cpu ...
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-5-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-9r3uz2ch4izoi1uln3f889co@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      702fb9b4
    • Andi Kleen's avatar
      perf record: Clarify help for --switch-output · c38dab7d
      Andi Kleen authored
      The help description for --switch-output looks like there are multiple
      comma separated fields. But it's actually a choice of different options.
      Make it clear and less confusing.
      
      Before:
      
        % perf record -h
        ...
                --switch-output[=<signal,size,time>]
                                  Switch output when receive SIGUSR2 or cross size,time threshold
      
      After:
      
        % perf record -h
        ...
      
                --switch-output[=<signal or size[BKMG] or time[smhd]>]
                                  Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-4-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-9yecyuha04nyg8toyd1b2pgi@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c38dab7d
    • Andi Kleen's avatar
      perf record: Allow to limit number of reported perf.data files · 03724b2e
      Andi Kleen authored
      When doing long term recording and waiting for some event to snapshot
      on, we often only care about the last minute or so.
      
      The --switch-output command line option supports rotating the perf.data
      file when the size exceeds a threshold. But the disk would still be
      filled with unnecessary old files.
      
      Add a new option to only keep a number of rotated files, so that the
      disk space usage can be limited.
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-3-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-y5u2lik0ragt4vlktz6qc9ks@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      03724b2e
    • Andi Kleen's avatar
      perf list: Filter metrics too · 6f40b2a5
      Andi Kleen authored
      When a filter is specified on the command line, filter the metrics too.
      
      Before:
      
        % perf list foo
        List of pre-defined events (to be used in -e):
      
        Metric Groups:
      
        DSB:
          DSB_Coverage
               [Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
        ... more metrics ...
      
      After:
      
      % perf list foo
      
        List of pre-defined events (to be used in -e):
      
        Metric Groups:
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      LPU-Reference: 20190314225002.30108-1-andi@firstfloor.org
      Link: https://lkml.kernel.org/n/tip-1y8oi2s8c4jhjtykgs5zvda1@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6f40b2a5
  2. 11 Mar, 2019 31 commits
  3. 09 Mar, 2019 1 commit
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-5.1-20190307' of... · b339da48
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.1-20190307' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core changes from Arnaldo Carvalho de Melo:
      
      perf bpf:
      
        Arnaldo Carvalho de Melo:
      
        - Automatically add BTF ELF markers to 'perf trace' BPF programs, so that
          tools such as 'bpftool map dump' can pretty print map keys and values.
      
      perf c2c:
      
        Jiri Olsa:
      
        - Fix report for empty NUMA node.
      
      perf diff:
      
        Jin Yao:
      
        - Support --time, --cpu, --pid and --tid filter options.
      
      perf probe:
      
        Arnaldo Carvalho de Melo:
      
        - Clarify error message about not finding kernel modules debuginfo.
      
      perf record:
      
        Jiri Olsa:
      
        - Fixup probing for max attr.precise_ip.
      
      perf trace:
      
        Arnaldo Carvalho de Melo:
      
        - Add missing %s lost in the 'msg_flags' recvmmsg arg when adding prefix suppression logic.
      
      perf annotate:
      
        Arnaldo Carvalho de Melo:
      
        - Calculate the max instruction name, align column to that, removing the
          hardcoded max 6 chars and cope with instructions with names longer than that,
          such as vpmovmskb, vpcmpeqb, etc.
      
      kernel:
      
        Song Liu:
      
        - Consider events with attr.bpf_event set as side-band.
      
        Gustavo A. R. Silva:
      
        - Mark expected switch fall-through in perf_event_parse_addr_filter().
      
      Libraries:
      
        Jiri Olsa:
      
        - Fix leaks and double frees on error paths.
      
      libtraceevent:
      
        Tony Jones:
      
        - Fix buffer overflow in arg_eval().
      
      python scripting:
      
        Tony Jones:
      
        - More python3 fixes.
      
      Trivial:
      
        Yang Wei:
      
        - Remove needless extra semicolon in clang C++ glue code.
      
      Intel PT/BTS:
      
        Adrian Hunter:
      
        - Improve auxtrace address filter error message when there is no DSO.
      
        - Fix divide by zero when TSC is not available.
      
        - Further improvements to the export to sqlite/posgresql python scripts
          and to the GUI sqlviewer, exporting 'parent_id' so that we have enable
          the creation of call trees.
      
        Andi Kleen:
      
        - Generalize function to copy from thread addr space from intel-bts code.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b339da48