  1. 16 Aug, 2019 1 commit
      perf report: Add --switch-on/--switch-off events · ef4b1a53
      Arnaldo Carvalho de Melo authored
      Since 'perf top' shares the histogram browser with 'perf report', then
      the same explanation in the previous cset applies.
      
      An additional example uses a pair of SDT events available for systemtap:
      
        # perf probe --exec=/usr/bin/stap '%*:*'
        Added new events:
          sdt_stap:benchmark__thread__start (on %* in /usr/bin/stap)
          sdt_stap:benchmark   (on %* in /usr/bin/stap)
          sdt_stap:benchmark__thread__end (on %* in /usr/bin/stap)
          sdt_stap:pass6__start (on %* in /usr/bin/stap)
          sdt_stap:pass6__end  (on %* in /usr/bin/stap)
          sdt_stap:pass5__start (on %* in /usr/bin/stap)
          sdt_stap:pass5__end  (on %* in /usr/bin/stap)
          sdt_stap:pass0__start (on %* in /usr/bin/stap)
          sdt_stap:pass0__end  (on %* in /usr/bin/stap)
          sdt_stap:pass1a__start (on %* in /usr/bin/stap)
          sdt_stap:pass1b__start (on %* in /usr/bin/stap)
          sdt_stap:pass1__end  (on %* in /usr/bin/stap)
          sdt_stap:pass2__start (on %* in /usr/bin/stap)
          sdt_stap:pass2__end  (on %* in /usr/bin/stap)
          sdt_stap:pass3__start (on %* in /usr/bin/stap)
          sdt_stap:pass3__end  (on %* in /usr/bin/stap)
          sdt_stap:pass4__start (on %* in /usr/bin/stap)
          sdt_stap:pass4__end  (on %* in /usr/bin/stap)
          sdt_stap:benchmark__start (on %* in /usr/bin/stap)
          sdt_stap:benchmark__end (on %* in /usr/bin/stap)
          sdt_stap:cache__get  (on %* in /usr/bin/stap)
          sdt_stap:cache__clean (on %* in /usr/bin/stap)
          sdt_stap:cache__add__module (on %* in /usr/bin/stap)
          sdt_stap:cache__add__source (on %* in /usr/bin/stap)
          sdt_stap:stap_system__complete (on %* in /usr/bin/stap)
          sdt_stap:stap_system__start (on %* in /usr/bin/stap)
          sdt_stap:stap_system__spawn (on %* in /usr/bin/stap)
          sdt_stap:stap_system__fork (on %* in /usr/bin/stap)
          sdt_stap:intern_string (on %* in /usr/bin/stap)
          sdt_stap:client__start (on %* in /usr/bin/stap)
          sdt_stap:client__end (on %* in /usr/bin/stap)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e sdt_stap:client__end -aR sleep 1
      
        #
      
      From these we'll use the two below while running systemtap's test suite:
      
        # perf record -e sdt_stap:pass2__*,cycles:P make installcheck > /dev/null
        ^C[ perf record: Woken up 8 times to write data ]
        [ perf record: Captured and wrote 2.691 MB perf.data (39638 samples) ]
        Terminated
        # perf script | grep sdt_stap
                    stap 28979 [000] 19424.302660: sdt_stap:pass2__start: (561b9a537de3) arg1=140730364262544
                    stap 28979 [000] 19424.333083:   sdt_stap:pass2__end: (561b9a53a9e1) arg1=140730364262544
                    stap 29045 [006] 19424.933460: sdt_stap:pass2__start: (563edddcede3) arg1=140722674883152
                    stap 29045 [006] 19424.963794:   sdt_stap:pass2__end: (563edddd19e1) arg1=140722674883152
        # perf script | grep cycles |  wc -l
        39634
        #
      
      Looking at the whole perf.data file:
      
        [root@quaco testsuite]# perf report | grep cycles:P -A25
        # Samples: 39K of event 'cycles:P'
        # Event count (approx.): 34044267368
        #
        # Overhead  Command  Shared Object         Symbol
        # ........  .......  ....................  ................................
        #
             3.50%  cc1      cc1                   [.] ht_lookup_with_hash
             3.04%  cc1      cc1                   [.] _cpp_lex_token
             2.11%  cc1      cc1                   [.] ggc_internal_alloc
             1.83%  cc1      cc1                   [.] cpp_get_token_with_location
             1.68%  cc1      libc-2.29.so          [.] _int_malloc
             1.41%  cc1      cc1                   [.] linemap_position_for_column
             1.25%  cc1      cc1                   [.] ggc_internal_cleared_alloc
             1.20%  cc1      cc1                   [.] c_lex_with_flags
             1.18%  cc1      cc1                   [.] get_combined_adhoc_loc
             1.05%  cc1      libc-2.29.so          [.] malloc
             1.01%  cc1      libc-2.29.so          [.] _int_free
             0.96%  stap     stap                  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, stringtable_hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::_M_insert<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__detail::_AllocNode<std::allocator<std::__detail::_Hash_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, true> > > >
             0.78%  stap     stap                  [.] lexer::scan
             0.74%  cc1      cc1                   [.] _cpp_lex_direct
             0.70%  cc1      cc1                   [.] pop_scope
             0.70%  cc1      cc1                   [.] c_parser_declspecs
             0.69%  stap     libc-2.29.so          [.] _int_malloc
             0.68%  cc1      cc1                   [.] htab_find_slot
             0.68%  cc1      [kernel.vmlinux]      [k] prepare_exit_to_usermode
             0.64%  cc1      [kernel.vmlinux]      [k] clear_page_erms
        [root@quaco testsuite]#
      
      And now only what happens in slices demarcated by those start/end SDT
      events:
      
        [root@quaco testsuite]# perf report --switch-on=sdt_stap:pass2__start --switch-off=sdt_stap:pass2__end | grep cycles:P -A100
        # Samples: 240  of event 'cycles:P'
        # Event count (approx.): 206491934
        #
        # Overhead  Command  Shared Object        Symbol
        # ........  .......  ...................  ................................................
        #
            38.99%  stap     stap                 [.] systemtap_session::register_library_aliases
            19.47%  stap     stap                 [.] match_key::operator<
            15.01%  stap     libc-2.29.so         [.] __memcmp_avx2_movbe
             5.19%  stap     libc-2.29.so         [.] _int_malloc
             2.50%  stap     libstdc++.so.6.0.26  [.] std::_Rb_tree_insert_and_rebalance
             2.30%  stap     stap                 [.] match_node::build_no_more
             2.07%  stap     libc-2.29.so         [.] malloc
             1.66%  stap     stap                 [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::find
             1.66%  stap     stap                 [.] match_node::bind
             1.58%  stap     [kernel.vmlinux]     [k] prepare_exit_to_usermode
             1.17%  stap     [kernel.vmlinux]     [k] native_irq_return_iret
             0.87%  stap     stap                 [.] 0x0000000000032ec4
             0.77%  stap     libstdc++.so.6.0.26  [.] std::_Rb_tree_increment
             0.47%  stap     stap                 [.] std::vector<derived_probe_builder*, std::allocator<derived_probe_builder*> >::_M_realloc_insert<derived_probe_builder* const&>
             0.47%  stap     [kernel.vmlinux]     [k] get_page_from_freelist
             0.47%  stap     [kernel.vmlinux]     [k] swapgs_restore_regs_and_return_to_usermode
             0.47%  stap     [kernel.vmlinux]     [k] do_user_addr_fault
             0.46%  stap     [kernel.vmlinux]     [k] __pagevec_lru_add_fn
             0.46%  stap     stap                 [.] std::_Rb_tree<match_key, std::pair<match_key const, match_node*>, std::_Select1st<std::pair<match_key const, match_node*> >, std::less<match_key>, std::allocator<std::pair<match_key const, match_node*> > >::_M_emplace_unique<std::pair<match_key, match_node*> >
             0.42%  stap     libstdc++.so.6.0.26  [.] 0x00000000000c18fa
             0.40%  stap     [kernel.vmlinux]     [k] interrupt_entry
             0.40%  stap     [kernel.vmlinux]     [k] update_load_avg
             0.40%  stap     [kernel.vmlinux]     [k] __intel_pmu_disable_all
             0.40%  stap     [kernel.vmlinux]     [k] clear_page_erms
             0.39%  stap     [kernel.vmlinux]     [k] __mod_node_page_state
             0.39%  stap     [kernel.vmlinux]     [k] error_entry
             0.39%  stap     [kernel.vmlinux]     [k] sync_regs
             0.38%  stap     [kernel.vmlinux]     [k] __handle_mm_fault
             0.38%  stap     stap                 [.] derive_probes
      
        #
        # (Tip: System-wide collection from all CPUs: perf record -a)
        #
        [root@quaco testsuite]#
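
      The slicing shown above can be pictured as a small state machine:
      samples are dropped until the switch-on event is seen, and dropped again
      once the switch-off event arrives. The sketch below is only an
      illustration of that idea, not perf's code. 'struct sample',
      'struct switch_filter' and count_in_slices() are hypothetical names, and
      dropping the marker events themselves is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified sample: only the event name matters here. */
struct sample {
	const char *event;
};

/* Filter state: start out discarding until the "on" event shows up. */
struct switch_filter {
	const char *on_event;
	const char *off_event;
	bool discarding;
};

static void switch_filter_init(struct switch_filter *f, const char *on, const char *off)
{
	f->on_event = on;
	f->off_event = off;
	f->discarding = true;
}

/* Return true when the sample should be dropped.
 * Assumption of this sketch: the marker events are always dropped. */
static bool switch_filter_discard(struct switch_filter *f, const struct sample *s)
{
	if (strcmp(s->event, f->on_event) == 0) {
		f->discarding = false;
		return true;
	}
	if (strcmp(s->event, f->off_event) == 0) {
		f->discarding = true;
		return true;
	}
	return f->discarding;
}

/* Count the samples that fall inside on/off slices. */
static size_t count_in_slices(const struct sample *samples, size_t n,
			      const char *on, const char *off)
{
	struct switch_filter f;
	size_t kept = 0;

	switch_filter_init(&f, on, off);
	for (size_t i = 0; i < n; i++)
		if (!switch_filter_discard(&f, &samples[i]))
			kept++;
	return kept;
}
```

      Feeding it a stream of 'cycles' samples interleaved with the two
      pass2 markers keeps only the 'cycles' samples between each start/end
      pair.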
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: William Cohen <wcohen@redhat.com>
      Link: https://lkml.kernel.org/n/tip-408hvumcnyn93a0auihnawew@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 29 Jul, 2019 3 commits
  3. 09 Jul, 2019 2 commits
  4. 26 Jun, 2019 2 commits
  5. 10 Jun, 2019 1 commit
  6. 15 May, 2019 2 commits
      perf report: Implement perf.data record decompression · cb62c6f1
      Alexey Budankov authored
      zstd_init(, comp_level = 0) initializes only the decompression part of the
      API, which now consists of the zstd_decompress_stream() function.
      
      The perf.data PERF_RECORD_COMPRESSED records are decompressed using
      zstd_decompress_stream() function into a linked list of mmaped memory
      regions of mmap_comp_len size (struct decomp).
      
      After decompression of one COMPRESSED record its content is iterated and
      fetched for usual processing. The mmaped memory regions with
      decompressed events are kept in the linked list till the tool process
      termination.
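
      The reason for keeping the regions on a list is that pointers to
      already decompressed events stay valid while later records are
      decompressed. A minimal sketch of that bookkeeping follows. It is not
      the perf implementation: 'struct decomp_region' and the decomp_list_*()
      helpers are hypothetical, malloc() stands in for mmap(), and the input
      is assumed to be already decompressed bytes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one decompressed region.
 * perf mmaps these; malloc() is used here to keep the sketch short. */
struct decomp_region {
	struct decomp_region *next;
	size_t size;
	unsigned char data[];
};

struct decomp_list {
	struct decomp_region *first;
	struct decomp_region *last;
};

/* Copy 'size' decompressed bytes into a new region and append it.
 * Returns a pointer to the stored bytes, or NULL on allocation failure. */
static const unsigned char *decomp_list_append(struct decomp_list *list,
					       const void *buf, size_t size)
{
	struct decomp_region *r = malloc(sizeof(*r) + size);

	if (r == NULL)
		return NULL;
	r->next = NULL;
	r->size = size;
	memcpy(r->data, buf, size);
	if (list->last)
		list->last->next = r;
	else
		list->first = r;
	list->last = r;
	return r->data;
}

/* Total number of decompressed bytes currently kept alive. */
static size_t decomp_list_total(const struct decomp_list *list)
{
	size_t total = 0;

	for (const struct decomp_region *r = list->first; r; r = r->next)
		total += r->size;
	return total;
}

/* Release every region, as would happen at tool exit. */
static void decomp_list_free(struct decomp_list *list)
{
	struct decomp_region *r = list->first;

	while (r) {
		struct decomp_region *next = r->next;

		free(r);
		r = next;
	}
	list->first = list->last = NULL;
}
```

      Appending never moves earlier regions, so a pointer returned by an
      earlier decomp_list_append() call remains usable until
      decomp_list_free().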
      
      When dumping raw records (e.g., perf report -D --header) file offsets of
      events from compressed records are printed as zero.
      
      Committer notes:
      
      Since we now have support for processing PERF_RECORD_COMPRESSED, we no
      longer see them in raw form, as we did in the previous patch's committer
      notes: they are decompressed into the usual
      PERF_RECORD_{FORK,MMAP,COMM,etc} records. We only see the stats for those
      PERF_RECORD_COMPRESSED events, and since I used the file generated in the
      committer notes for the previous patch, there they are, 2 compressed
      records:
      
        $ perf report --header-only | grep cmdline
        # cmdline : /home/acme/bin/perf record -z2 sleep 1
        $ perf report -D | grep COMPRESS
              COMPRESSED events:          2
              COMPRESSED events:          0
        $ perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 15  of event 'cycles:u'
        # Event count (approx.): 962227
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ...........................
        #
            46.99%  sleep    libc-2.28.so      [.] _dl_addr
            29.24%  sleep    [unknown]         [k] 0xffffffffaea00a67
            16.45%  sleep    libc-2.28.so      [.] __GI__IO_un_link.part.1
             5.92%  sleep    ld-2.28.so        [.] _dl_setup_hash
             1.40%  sleep    libc-2.28.so      [.] __nanosleep
             0.00%  sleep    [unknown]         [k] 0xffffffffaea00163
      
        #
        # (Tip: To see callchains in a more compact form: perf report -g folded)
        #
        $
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/304b0a59-942c-3fe1-da02-aa749f87108b@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      perf annotate: Remove hist__account_cycles() from callback · bdd1666b
      Jin Yao authored
      The hist__account_cycles() function is executed when the
      hist_iter__branch_callback() is called.
      
      But it looks like it's not necessary: hist__account_cycles() already
      walks all branch entries.

      This patch moves hist__account_cycles() out of the callback, so data
      processing is now much faster than before.
      
      The previous code had an issue: ch[offset].num++ (in
      __symbol__account_cycles) was executed repeatedly, since
      hist__account_cycles was called in each hist_iter__branch_callback, so
      the count in ch[offset].num was not correct (too big).

      With this patch the issue is fixed, and we no longer need the
      "ch->reset >= ch->num / 2" check for too many overlaps (in
      annotation__count_and_fill), which would otherwise hide some data.
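
      The over-counting can be reproduced in isolation. In the sketch below,
      which is an illustration and not the perf code, account_cycles() walks
      all branch entries once; calling it from a callback that itself fires
      once per branch entry (an assumption of this sketch) multiplies every
      counter by the number of entries. 'struct cycles_hist' and the three
      function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-offset hit counter, standing in for ch[offset].num. */
struct cycles_hist {
	unsigned int num;
};

/* Walk all branch entries once and bump the counter of each offset. */
static void account_cycles(struct cycles_hist *ch, const size_t *offsets, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		ch[offsets[i]].num++;
}

/* Old shape: the full walk runs from a callback invoked once per entry,
 * so every counter ends up 'nr' times too big. */
static void account_from_callback(struct cycles_hist *ch, const size_t *offsets, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		account_cycles(ch, offsets, nr);
}

/* New shape: the walk runs exactly once per sample. */
static void account_once(struct cycles_hist *ch, const size_t *offsets, size_t nr)
{
	account_cycles(ch, offsets, nr);
}
```

      With four branch entries, the callback variant counts each offset four
      times as often as the single walk, and does four times the work.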
      
      Now, we can try, for example:
      
        perf record -b ...
        perf annotate or perf report -s symbol
      
      The before/after output should show no change.
      
       v3:
       ---
       Fix the crash in stdio mode.
       Like previous code, it needs the checking of ui__has_annotation()
       before hist__account_cycles()
      
       v2:
       ---
       1. Cover the similar perf report
       2. Remove the checking code "ch->reset >= ch->num / 2"
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1552684577-29041-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 19 Mar, 2019 1 commit
  8. 11 Mar, 2019 3 commits
  9. 01 Mar, 2019 1 commit
      perf time-utils: Refactor time range parsing code · 284c4e18
      Jin Yao authored
      Jiri points out that we don't need any time checking and time string
      parsing if the --time option is not set. That makes sense.
      
      This patch refactors the time range parsing code: it moves the duplicated
      code from perf report and perf script to time_utils and checks whether the
      --time option is set before parsing the time string. No logic change is
      expected, so the usage of --time is the same as before.
      
      For example:
      
      Select the first and second 10% time slices:
        perf report --time 10%/1,10%/2
        perf script --time 10%/1,10%/2
      
      Select the slices from 0% to 10% and from 30% to 40%:
        perf report --time 0%-10%,30%-40%
        perf script --time 0%-10%,30%-40%
      
      Select the time slices from timestamp 3971 to 3973:
        perf report --time 3971,3973
        perf script --time 3971,3973
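
      To make the percent forms concrete, here is a rough sketch of how one
      such expression could be resolved against the first and last timestamps
      of a session. It is not the time_utils code: 'struct time_range' and
      percent_to_range() are hypothetical, timestamps are plain integers, and
      splitting a comma separated list into single expressions is left to the
      caller.

```c
#include <stdint.h>
#include <stdio.h>

struct time_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Resolve one percent expression against [first, last].
 * Accepts "P%/N" (the Nth slice of width P%) and "A%-B%".
 * Returns 0 on success, -1 if the string matches neither form
 * or the values fall outside 0..100%.
 */
static int percent_to_range(const char *str, uint64_t first, uint64_t last,
			    struct time_range *out)
{
	double span = (double)(last - first);
	double a, b;
	int idx;

	if (sscanf(str, "%lf%%/%d", &a, &idx) == 2) {
		if (idx < 1 || a <= 0.0 || a * idx > 100.0)
			return -1;
		out->start = first + (uint64_t)(span * a * (idx - 1) / 100.0);
		out->end = first + (uint64_t)(span * a * idx / 100.0);
		return 0;
	}
	if (sscanf(str, "%lf%%-%lf%%", &a, &b) == 2) {
		if (a < 0.0 || b > 100.0 || a >= b)
			return -1;
		out->start = first + (uint64_t)(span * a / 100.0);
		out->end = first + (uint64_t)(span * b / 100.0);
		return 0;
	}
	return -1;
}
```

      For a session spanning 0 to 1000, "10%/2" resolves to the second 10%
      slice and "30%-40%" to the range between those two percentages.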
      
      Committer testing:
      
      Using the above examples, check before and after to see if it remains
      the same:
      
        $ perf record -F 10000 -- find . -name "*.[ch]" -exec cat {} + > /dev/null
        [ perf record: Woken up 3 times to write data ]
        [ perf record: Captured and wrote 1.626 MB perf.data (42392 samples) ]
        $
        $ perf report --time 10%/1,10%/2 > /tmp/report.before.1
        $ perf script --time 10%/1,10%/2 > /tmp/script.before.1
        $ perf report --time 0%-10%,30%-40% > /tmp/report.before.2
        $ perf script --time 0%-10%,30%-40% > /tmp/script.before.2
        $ perf report --time 180457.375844,180457.377717 > /tmp/report.before.3
        $ perf script --time 180457.375844,180457.377717 > /tmp/script.before.3
      
      For example, the 3rd test produces this slice:
      
        $ cat /tmp/script.before.3
              cat  3147 180457.375844:   2143 cycles:uppp:      7f79362590d9 cfree@GLIBC_2.2.5+0x9 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.375986:   2245 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
              cat  3147 180457.376012:   2164 cycles:uppp:      7f7936257430 _int_malloc+0x8c0 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376140:   2921 cycles:uppp:      558b70f3a554 [unknown] (/usr/bin/cat)
              cat  3147 180457.376296:   2844 cycles:uppp:      7f7936258abe malloc+0x4e (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376431:   2717 cycles:uppp:      558b70f3b0ca [unknown] (/usr/bin/cat)
              cat  3147 180457.376667:   2630 cycles:uppp:      558b70f3d86e [unknown] (/usr/bin/cat)
              cat  3147 180457.376795:   2442 cycles:uppp:      7f79362bff55 read+0x15 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.376927:   2376 cycles:uppp:  ffffffff9aa00163 [unknown] ([unknown])
              cat  3147 180457.376954:   2307 cycles:uppp:      7f7936257438 _int_malloc+0x8c8 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.377116:   3091 cycles:uppp:      7f7936258a70 malloc+0x0 (/usr/lib64/libc-2.28.so)
              cat  3147 180457.377362:   2945 cycles:uppp:      558b70f3a3b0 [unknown] (/usr/bin/cat)
              cat  3147 180457.377517:   2727 cycles:uppp:      558b70f3a9aa [unknown] (/usr/bin/cat)
        $
      
      Install 'coreutils-debuginfo' to see cat's guts (symbols), but then, the
      above chunk translates into this 'perf report' output:
      
        $ cat /tmp/report.before.3
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 13  of event 'cycles:uppp' (time slices: 180457.375844,180457.377717)
        # Event count (approx.): 33552
        #
        # Overhead  Command  Shared Object     Symbol
        # ........  .......  ................  ......................
        #
            17.69%  cat      libc-2.28.so      [.] malloc
            14.53%  cat      cat               [.] 0x000000000000586e
            13.33%  cat      libc-2.28.so      [.] _int_malloc
             8.78%  cat      cat               [.] 0x00000000000023b0
             8.71%  cat      cat               [.] 0x0000000000002554
             8.13%  cat      cat               [.] 0x00000000000029aa
             8.10%  cat      cat               [.] 0x00000000000030ca
             7.28%  cat      libc-2.28.so      [.] read
             7.08%  cat      [unknown]         [k] 0xffffffff9aa00163
             6.39%  cat      libc-2.28.so      [.] cfree@GLIBC_2.2.5
      
        #
        # (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
        #
        $
      
      Now let's see what happens after applying this patch; nothing should change:
      
        $ perf report --time 10%/1,10%/2 > /tmp/report.after.1
        $ perf script --time 10%/1,10%/2 > /tmp/script.after.1
        $ perf report --time 0%-10%,30%-40% > /tmp/report.after.2
        $ perf script --time 0%-10%,30%-40% > /tmp/script.after.2
        $ perf report --time 180457.375844,180457.377717 > /tmp/report.after.3
        $ perf script --time 180457.375844,180457.377717 > /tmp/script.after.3
        $ diff -u /tmp/report.before.1 /tmp/report.after.1
        $ diff -u /tmp/script.before.1 /tmp/script.after.1
        $ diff -u /tmp/report.before.2 /tmp/report.after.2
        --- /tmp/report.before.2	2019-03-01 11:01:53.526094883 -0300
        +++ /tmp/report.after.2	2019-03-01 11:09:18.231770467 -0300
        @@ -352,5 +352,5 @@
      
         #
        -# (Tip: Generate a script for your data: perf script -g <lang>)
        +# (Tip: Treat branches as callchains: perf report --branch-history)
         #
        $ diff -u /tmp/script.before.2 /tmp/script.after.2
        $ diff -u /tmp/report.before.3 /tmp/report.after.3
        --- /tmp/report.before.3	2019-03-01 11:03:08.890045588 -0300
        +++ /tmp/report.after.3	2019-03-01 11:09:40.660224002 -0300
        @@ -22,5 +22,5 @@
      
         #
        -# (Tip: Order by the overhead of source file name and line number: perf report -s srcline)
        +# (Tip: List events using substring match: perf list <keyword>)
         #
        $ diff -u /tmp/script.before.3 /tmp/script.after.3
        $
      
      Cool, just the 'perf report' tips changed, QED.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1551435186-6008-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 22 Feb, 2019 1 commit
      perf data: Add global path holder · 2d4f2799
      Jiri Olsa authored
      Add a 'path' member to 'struct perf_data'. It will keep the configured
      path for the data (const char *). The path in struct perf_data_file is
      now dynamically allocated (duped) from it.
      
      This scheme is used in the following patches, where struct
      perf_data::path holds the 'configured' directory path and struct
      perf_data_file::path holds the allocated path for specific files.

      It also makes the code a little simpler.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20190221094145.9151-3-jolsa@kernel.org
      [ Fixup data-convert-bt.c missing conversion ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 06 Feb, 2019 2 commits
  12. 25 Jan, 2019 1 commit
  13. 21 Jan, 2019 1 commit
      perf tools: Replace automatic const char[] variables by statics · 49b8e2be
      Rasmus Villemoes authored
      An automatic const char[] variable gets initialized at runtime, just
      like any other automatic variable. For long strings, that uses a lot of
      stack and wastes time building the string; e.g. for the "No %s
      allocation events..." case one has:
      
        444516:       48 b8 4e 6f 20 25 73 20 61 6c   movabs $0x6c61207325206f4e,%rax # "No %s al"
        ...
        444674:       48 89 45 80                     mov    %rax,-0x80(%rbp)
        444678:       48 b8 6c 6f 63 61 74 69 6f 6e   movabs $0x6e6f697461636f6c,%rax # "location"
        444682:       48 89 45 88                     mov    %rax,-0x78(%rbp)
        444686:       48 b8 20 65 76 65 6e 74 73 20   movabs $0x2073746e65766520,%rax # " events "
        444690:       66 44 89 55 c4                  mov    %r10w,-0x3c(%rbp)
        444695:       48 89 45 90                     mov    %rax,-0x70(%rbp)
        444699:       48 b8 66 6f 75 6e 64 2e 20 20   movabs $0x20202e646e756f66,%rax
      
      Make them all static so that the compiler just references objects in .rodata.
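
      A small sketch of the two forms, for illustration only. The message
      text and the function names are made up for this example and are not
      taken from the perf sources. How much stack and code the automatic
      form costs depends on the compiler and optimization level.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical message used only for this illustration. */
#define EXAMPLE_MSG "example: a long diagnostic message that a report tool might print\n"

/* Automatic array: the bytes are copied into this function's stack
 * frame each time the function runs. */
static size_t report_len_automatic(void)
{
	const char msg[] = EXAMPLE_MSG;

	return strlen(msg);
}

/* Static array: the bytes live in read-only data and are only referenced. */
static size_t report_len_static(void)
{
	static const char msg[] = EXAMPLE_MSG;

	return strlen(msg);
}

/* A static array outlives the call, so returning its address is valid. */
static const char *report_msg(void)
{
	static const char msg[] = EXAMPLE_MSG;

	return msg;
}
```

      Both length helpers return the same value; only where the bytes are
      stored differs. Comparing the generated code, for instance with
      'objdump -d', is one way to see the difference.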
      
      Committer testing:
      
      Ok, using dwarves's codiff tool:
      
          $ codiff --functions /tmp/perf.before ~/bin/perf
        builtin-sched.c:
          cmd_sched                 |  -48
         1 function changed, 48 bytes removed, diff: -48
      
        builtin-report.c:
          cmd_report                |  -32
         1 function changed, 32 bytes removed, diff: -32
      
        builtin-kmem.c:
          cmd_kmem                  |  -64
          build_alloc_func_list     |  -50
         2 functions changed, 114 bytes removed, diff: -114
      
        builtin-c2c.c:
          perf_c2c__report          | -390
         1 function changed, 390 bytes removed, diff: -390
      
        ui/browsers/header.c:
          tui__header_window        | -104
         1 function changed, 104 bytes removed, diff: -104
      
        /home/acme/bin/perf:
         9 functions changed, 688 bytes removed, diff: -688
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20181102230624.20064-1-linux@rasmusvillemoes.dk
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 17 Dec, 2018 1 commit
      perf report: Display average IPC and IPC coverage per symbol · ec6ae74f
      Jin Yao authored
      Support displaying the average IPC and IPC coverage per symbol in 'perf
      report' --tui and --stdio modes.
      
      For example,
      
       $ perf record -b ...
       $ perf report -s symbol
      
       Overhead  Symbol                           IPC   [IPC Coverage]
         39.60%  [.] __random                     2.30  [ 54.8%]
         18.02%  [.] main                         0.43  [ 54.3%]
         14.21%  [.] compute_flag                 2.29  [100.0%]
         14.16%  [.] rand                         0.36  [100.0%]
          7.06%  [.] __random_r                   2.57  [ 70.5%]
          6.85%  [.] rand@plt                     0.00  [  0.0%]
      
      Jiri Olsa <jolsa@redhat.com> provided the patch to support the --stdio
      mode. I merged Jiri's code in this patch.
      
        $ perf report -s symbol --stdio
      
          # Overhead  Symbol                       IPC   [IPC Coverage]
          # ........  ...........................  ....................
          #
            39.60%  [.] __random                   2.30  [ 54.8%]
            18.02%  [.] main                       0.43  [ 54.3%]
            14.21%  [.] compute_flag               2.29  [100.0%]
            14.16%  [.] rand                       0.36  [100.0%]
             7.06%  [.] __random_r                 2.57  [ 70.5%]
             6.85%  [.] rand@plt                   0.00  [  0.0%]
             0.02%  [k] run_timer_softirq          1.60  [ 57.2%]
      
      The columns "IPC" and "[IPC Coverage]" are automatically enabled when
      the sort-key "symbol" is specified. If the perf.data file doesn't
      contain timed LBR information, columns are filled with "-".
      
      For example,
      
        # Overhead  Symbol                       IPC   [IPC Coverage]
        # ........  ...........................  ....................
        #
            46.57%  [.] main                     -      -
            17.60%  [.] rand                     -      -
            15.84%  [.] __random_r               -      -
            11.90%  [.] __random                 -      -
             6.50%  [.] compute_flag             -      -
             1.59%  [.] rand@plt                 -      -
             0.00%  [.] _dl_relocate_object      -      -
             0.00%  [k] tlb_flush_mmu            -      -
             0.00%  [k] perf_event_mmap          -      -
             0.00%  [k] native_sched_clock       -      -
             0.00%  [k] intel_pmu_handle_irq_v4  -      -
             0.00%  [k] native_write_msr         -      -
      
       v3:
       ---
       Removed the sortkey 'ipc' from command-line. The columns "IPC"
       and "[IPC Coverage]" are automatically enabled when "symbol"
       is specified.
      
       v2:
       ---
       Merge in Jiri's patch to support stdio mode
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1543586097-27632-4-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  15. 16 Oct, 2018 1 commit
      perf evsel: Store ids for events with their own cpus perf_event__synthesize_event_update_cpus · 4ab8455f
      Jiri Olsa authored
      John reported a crash when recording an event under a PMU with a cpumask defined:
      
        root@localhost:~# ./perf_debug_ record -e armv8_pmuv3_0/br_mis_pred/ sleep 1
        perf: Segmentation fault
        Obtained 9 stack frames.
        ./perf_debug_() [0x4c5ef8]
        [0xffff82ba267c]
        ./perf_debug_() [0x4bc5a8]
        ./perf_debug_() [0x419550]
        ./perf_debug_() [0x41a928]
        ./perf_debug_() [0x472f58]
        ./perf_debug_() [0x473210]
        ./perf_debug_() [0x4070f4]
        /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8294c8a0]
        Segmentation fault (core dumped)
      
      We synthesize an update event that needs to touch the evsel id array, which is
      not defined at that time. Fix this by forcing the id allocation for events
      with their own cpus.
      Reported-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: John Garry <john.garry@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxarm@huawei.com
      Fixes: bfd8f72c ("perf record: Synthesize unit/scale/... in event update")
      Link: http://lkml.kernel.org/r/20181003212052.GA32371@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  16. 19 Sep, 2018 2 commits
  17. 13 Aug, 2018 1 commit
  18. 08 Aug, 2018 1 commit
  19. 24 Jul, 2018 1 commit
  20. 25 Jun, 2018 1 commit
      perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE] · 92ead7ee
      Ravi Bangoria authored
      perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE],
      which is not defined, so perf crashes. HEADER_LAST_FEATURE is used as
      an end marker by perf report but is unused by perf script/annotate.
      Ignore HEADER_LAST_FEATURE in perf script/annotate, just like it is
      done in 'perf report'.
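
      The underlying pattern is a handler table whose size equals the end
      marker, so indexing the table with the marker itself reads past its
      end. The sketch below illustrates the guard with made-up names
      ('enum feature_id', feature_ops[], process_feature()); it is not the
      perf header code.

```c
#include <stddef.h>

/* Hypothetical feature ids; the last enumerator is an end marker,
 * not a real feature. */
enum feature_id {
	FEAT_FIRST,
	FEAT_SECOND,
	FEAT_LAST_MARKER,
};

typedef int (*feature_handler)(void);

static int handle_first(void) { return 1; }
static int handle_second(void) { return 2; }

/* The table has exactly FEAT_LAST_MARKER entries, so
 * feature_ops[FEAT_LAST_MARKER] would be out of bounds. */
static const feature_handler feature_ops[FEAT_LAST_MARKER] = {
	[FEAT_FIRST] = handle_first,
	[FEAT_SECOND] = handle_second,
};

/* Returns the handler's result, 0 when the end marker is seen (ignored),
 * and -1 for ids that have no handler. */
static int process_feature(int id)
{
	if (id == FEAT_LAST_MARKER)
		return 0;
	if (id < 0 || id > FEAT_LAST_MARKER || feature_ops[id] == NULL)
		return -1;
	return feature_ops[id]();
}
```

      The marker is checked before the table is touched, which is the part
      that was missing for the script/annotate path.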
      
      Before:
        # perf record -o - ls | perf script
        <SNIP 'ls' output>
        Segmentation fault (core dumped)
        #
      
      After:
        # perf record -o - ls | perf script
        <SNIP 'ls' output>
        ls 7031 4392.099856:  250000 cpu-clock:uhH:  7f5e0ce7cd60
        ls 7031 4392.100355:  250000 cpu-clock:uhH:  7f5e0c706ef7
        #
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 57b5de46 ("perf report: Support forced leader feature in pipe mode")
      Link: http://lkml.kernel.org/r/20180625124220.6434-4-ravi.bangoria@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  21. 04 Jun, 2018 6 commits
  22. 21 May, 2018 1 commit
  23. 27 Apr, 2018 1 commit
      perf symbols: Unify symbol maps · 3183f8ca
      Arnaldo Carvalho de Melo authored
      Remove the split of symbol tables for data (MAP__VARIABLE) and for
      functions (MAP__FUNCTION); it's unneeded, and there were various places
      doing two lookups to find a symbol, so simplify this.
      
      We still will consider only the symbols that matched the filters in
      place, i.e. see the (elf_(sec,sym)|symbol_type)__filter() routines in
      the patch, just so that we consider only the same symbols as before,
      to reduce the possibility of regressions.
      
      All the tests on 50-something build environments, in various versions
      of lots of distros and cross build environments, were performed without
      build regressions. As usual with all pull requests, the other tests were
      also performed: 'perf test' and 'make -C tools/perf build-test'.
      
      Also this was done at a great granularity so that regressions can be
      bisected more easily.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-hiq0fy2rsleupnqqwuojo1ne@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  24. 26 Apr, 2018 2 commits
  25. 21 Mar, 2018 1 commit