1. 27 Oct, 2017 10 commits
  2. 25 Oct, 2017 6 commits
      Merge tag 'perf-core-for-mingo-4.15-20171025' of... · 57646b6f
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-4.15-20171025' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core inline improvements from Arnaldo Carvalho de Melo:
      
      From Milian's cover letter: (Milian Wolff)
      
      "This series of patches completely reworks the way inline frames are
       handled.  Instead of querying for the inline nodes on-demand in the
       individual tools, we now create proper callchain nodes for inlined
       frames. The advantages this approach brings are numerous:
      
       - Less duplicated code in the individual browsers
      
       - Aggregated cost for inlined frames for the --children top-down list
      
       - Various bug fixes that arose from querying for a srcline/symbol based on
         the IP of a sample, which will always point to the last inlined frame
         instead of the corresponding non-inlined frame
      
       - Overall much better support for visualizing cost for heavily-inlined C++
         code, which was simply confusing and unreliable before
      
       - srcline honors the global setting as to whether full paths or basenames
         should be shown
      
       - Caches for inlined frames and srcline information, which allow us to
         enable inline frame handling by default"
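To make the idea concrete, here is a minimal C sketch of expanding one sampled address into one callchain entry per inlined frame, followed by the real function the code was inlined into. The types and `expand_inlined()` are hypothetical illustrations, not perf's actual `inline_node` or callchain cursor code, and the list of inlined frames is assumed to come from an earlier debug-info lookup.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: one inlined frame found in the debug info for an address. */
struct inline_frame {
	const char *funcname;
};

/* Hypothetical flattened callchain entry. */
struct chain_entry {
	const char *funcname;
	int inlined; /* 1 for a synthesized inlined frame, 0 for a real one */
};

/*
 * Expand one sampled address into callchain entries: each inlined frame
 * (innermost first) becomes its own entry, followed by the real function
 * the code was inlined into. Returns the number of entries written, or -1
 * if `out` cannot hold them all.
 */
static int expand_inlined(const char *real_func,
			  const struct inline_frame *frames, int nframes,
			  struct chain_entry *out, int max_out)
{
	int n = 0;

	if (nframes + 1 > max_out)
		return -1;

	for (int i = 0; i < nframes; i++) {
		out[n].funcname = frames[i].funcname;
		out[n].inlined = 1;
		n++;
	}

	out[n].funcname = real_func;
	out[n].inlined = 0;
	return n + 1;
}
```

Once every inlined frame is an ordinary entry, the usual callchain aggregation (e.g. for `--children`) applies to it without tool-specific special cases.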
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      perf util: Enable handling of inlined frames by default · d8a88dd2
      Milian Wolff authored
      Now that we have caches in place to speed up the process of finding
      inlined frames and srcline information repeatedly, we can enable this
      useful option by default.
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-6-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      perf report: Use srcline from callchain for hist entries · 1fb7d06a
      Milian Wolff authored
      This also removes the symbol name from the srcline column, more on this
      below.
      
      This ensures we use the correct srcline, which could originate from a
      potentially inlined function. The hist entries used to query for the
      srcline based purely on the IP, which leads to wrong results for inlined
      entries.
      
      Before:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ..................................................................................................................................
        #
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            44.58%     0.00%  main+100
            44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  std::__complex_abs+100
            44.58%     0.00%  std::abs<double>+100
            44.58%     0.00%  std::norm<double>+100
            36.01%     0.00%  hypot+18446603487892193300
            25.81%     0.00%  main+41
            25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.75%    25.75%  random.h:143
            18.39%     0.00%  main+57
            18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  random.tcc:3330
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      After:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ...........................................
        #
            94.30%     1.19%  main.cpp:39
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            48.44%     1.70%  random.h:1823
            48.44%     0.00%  random.h:1814
            46.74%     2.53%  random.h:185
            44.68%     0.10%  complex:589
            44.68%     0.00%  complex:597
            44.68%     0.00%  complex:654
            44.68%     0.00%  complex:664
            40.61%    13.80%  random.tcc:3330
            36.01%     0.00%  hypot+18446603487892193300
            26.81%     0.00%  random.h:151
            26.81%     0.00%  random.h:332
            25.75%    25.75%  random.h:143
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Note that this change removes the symbol from the source:line hist
      column. Users who want this information should request it explicitly,
      i.e. run this command instead:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ...........................................
        #
            94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
            46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
            44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
            44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
            44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
            44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
            39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
            26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
            25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
            25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Compared to the old behavior, this reduces duplication in the output.
      Previously, we printed the symbol name in the srcline column even
      when the sym column was explicitly requested, i.e. the output was:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ..................................................................................................................................
        #
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            44.58%     0.00%  [.] main                                                                                                                             main+100
            44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
            44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
            44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            25.81%     0.00%  [.] main                                                                                                                             main+41
            25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
            18.39%     0.00%  [.] main                                                                                                                             main+57
            18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      perf report: Cache srclines for callchain nodes · 21ac9d54
      Milian Wolff authored
      On one hand this ensures that the memory is properly freed when the DSO
      gets freed. On the other hand this significantly speeds up the
      processing of the callchain nodes when lots of srclines are requested.
      For example, for one of my data files:
      
      Before:
      
       Performance counter stats for 'perf report -s srcline -g srcline --stdio':
      
            52496.495043      task-clock (msec)         #    0.999 CPUs utilized
                     634      context-switches          #    0.012 K/sec
                       2      cpu-migrations            #    0.000 K/sec
                 191,561      page-faults               #    0.004 M/sec
         165,074,498,235      cycles                    #    3.144 GHz
         334,170,832,408      instructions              #    2.02  insn per cycle
          90,220,029,745      branches                  # 1718.591 M/sec
             654,525,177      branch-misses             #    0.73% of all branches
      
            52.533273822 seconds time elapsed

      Processed 236605 events and lost 40 chunks!
      
      After:
      
       Performance counter stats for 'perf report -s srcline -g srcline --stdio':
      
            22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                      31      context-switches          #    0.001 K/sec
                       0      cpu-migrations            #    0.000 K/sec
                 185,471      page-faults               #    0.008 M/sec
          71,188,113,681      cycles                    #    3.149 GHz
         133,204,943,083      instructions              #    1.87  insn per cycle
          34,886,384,979      branches                  # 1543.214 M/sec
             278,214,495      branch-misses             #    0.80% of all branches
      
            22.609857253 seconds time elapsed
      
      Note that the difference is only this large when `--inline` is not
      passed. When it is passed, we use the inliner cache instead and thus
      do not run this code path as often.
      
      I think that this cache should actually be used in other places, too.
      When looking at the valgrind leak report for perf report, we see tons of
      srclines being leaked, most notably from calls to
      hist_entry__get_srcline. The problem is that get_srcline has many
      different formatting options (show_sym, show_addr, potentially even
      unwind_inlines when calling __get_srcline directly). As such, the
      srcline cannot easily be cached for all calls, or we'd have to add
      caches for all formatting combinations (6 so far). An alternative would
      be to remove the formatting options and handle them at a different level,
      i.e. print the sym/addr on demand wherever we actually output
      something. And the unwind_inlines could be moved into a separate
      function that does not return the srcline.
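The per-DSO srcline cache described above can be sketched roughly as follows. This is a minimal illustration, not perf's code: `struct dso_cache`, `cached_srcline()` and `resolve_srcline_slow()` are hypothetical, the slow resolver merely formats a fake "file.c:N" string in place of a real addr2line/DWARF lookup, and a plain binary search tree is used for brevity. The key points are that each address is resolved only once, and that all cached strings are owned by, and freed together with, the DSO.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache node: the resolved srcline for one address. */
struct srcline_node {
	uint64_t addr;
	char *srcline;
	struct srcline_node *left, *right;
};

/* Hypothetical stand-in for the DSO that owns the cache. */
struct dso_cache {
	struct srcline_node *srclines;
	int slow_lookups; /* how often the expensive resolver ran */
};

/* Placeholder for the costly addr2line/DWARF lookup (assumption):
 * it just formats a fake "file.c:N" string. */
static char *resolve_srcline_slow(struct dso_cache *dso, uint64_t addr)
{
	char buf[32];
	int len;
	char *s;

	dso->slow_lookups++;
	len = snprintf(buf, sizeof(buf), "file.c:%llu", (unsigned long long)addr);
	s = malloc((size_t)len + 1);
	if (s)
		memcpy(s, buf, (size_t)len + 1);
	return s;
}

/* Return the cached srcline for addr, resolving it on first use only. */
static const char *cached_srcline(struct dso_cache *dso, uint64_t addr)
{
	struct srcline_node **p = &dso->srclines;
	struct srcline_node *node;

	while (*p) {
		if (addr == (*p)->addr)
			return (*p)->srcline; /* cache hit */
		p = addr < (*p)->addr ? &(*p)->left : &(*p)->right;
	}

	node = calloc(1, sizeof(*node)); /* cache miss: resolve and insert */
	if (!node)
		return NULL;
	node->addr = addr;
	node->srcline = resolve_srcline_slow(dso, addr);
	*p = node;
	return node->srcline;
}

static void free_srcline_tree(struct srcline_node *node)
{
	if (!node)
		return;
	free_srcline_tree(node->left);
	free_srcline_tree(node->right);
	free(node->srcline);
	free(node);
}

/* Freeing the DSO frees every cached srcline along with it. */
static void dso_cache_destroy(struct dso_cache *dso)
{
	free_srcline_tree(dso->srclines);
	dso->srclines = NULL;
}
```

Note that such a cache only works because the callchain nodes always ask for one fixed srcline format; as discussed above, the formatting options of get_srcline would otherwise require one cache per combination.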
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      perf report: Cache failed lookups of inlined frames · b38775cf
      Milian Wolff authored
      When no inlined frames could be found for a given address, we did not
      store this information anywhere. That means we potentially do the costly
      inliner lookup repeatedly for cases where we know it can never succeed.
      
      This patch makes dso__parse_addr_inlines always return a valid
      inline_node. It will be empty when no inliners are found. This enables
      us to cache the empty list in the DSO, thereby improving the performance
      when many addresses fail to find the inliners.
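A minimal sketch of this negative caching, assuming a simple per-DSO linked list as the cache (all names are hypothetical, and `count_inliners_slow()` is a placeholder for the real debug-info lookup): the lookup always returns a valid result, possibly with zero entries, so a failed lookup is stored and never repeated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cached inline lookup; nr_entries == 0 means "no inliners". */
struct inline_result {
	uint64_t addr;
	int nr_entries;
	struct inline_result *next;
};

/* Hypothetical per-DSO cache of lookup results, including empty ones. */
struct inline_cache {
	struct inline_result *head;
	int slow_lookups;
};

/* Placeholder for the costly debug-info walk (assumption): pretend that
 * only even addresses have two inlined frames. */
static int count_inliners_slow(struct inline_cache *c, uint64_t addr)
{
	c->slow_lookups++;
	return addr % 2 == 0 ? 2 : 0;
}

/*
 * Always hand back a valid result, even when no inliners were found, so the
 * empty result is cached as well and the slow path runs once per address.
 */
static struct inline_result *lookup_inlines(struct inline_cache *c, uint64_t addr)
{
	struct inline_result *r;

	for (r = c->head; r; r = r->next)
		if (r->addr == addr)
			return r; /* hit, possibly a cached "nothing here" */

	r = malloc(sizeof(*r));
	if (!r)
		return NULL;
	r->addr = addr;
	r->nr_entries = count_inliners_slow(c, addr);
	r->next = c->head;
	c->head = r;
	return r;
}

static void inline_cache_destroy(struct inline_cache *c)
{
	while (c->head) {
		struct inline_result *next = c->head->next;

		free(c->head);
		c->head = next;
	}
}
```

Without the cached empty result, every repeated sample at an address without inliners would pay for the slow lookup again.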
      
      For my trivial example, the performance impact is already quite
      significant:
      
      Before:
      
      ~~~~~
       Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
      
              594.804032      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
                      53      context-switches          #    0.089 K/sec                    ( +-  4.09% )
                       0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
                   5,687      page-faults               #    0.010 M/sec                    ( +-  0.02% )
           2,300,918,213      cycles                    #    3.868 GHz                      ( +-  0.09% )
           4,395,839,080      instructions              #    1.91  insn per cycle           ( +-  0.00% )
             939,177,205      branches                  # 1578.969 M/sec                    ( +-  0.00% )
              11,824,633      branch-misses             #    1.26% of all branches          ( +-  0.10% )
      
             0.596246531 seconds time elapsed                                          ( +-  0.07% )
      ~~~~~
      
      After:
      
      ~~~~~
       Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
      
              113.111405      task-clock (msec)         #    0.990 CPUs utilized            ( +-  0.89% )
                      29      context-switches          #    0.255 K/sec                    ( +- 54.25% )
                       0      cpu-migrations            #    0.000 K/sec
                   5,380      page-faults               #    0.048 M/sec                    ( +-  0.01% )
             432,378,779      cycles                    #    3.823 GHz                      ( +-  0.75% )
             670,057,633      instructions              #    1.55  insn per cycle           ( +-  0.01% )
             141,001,247      branches                  # 1246.570 M/sec                    ( +-  0.01% )
               2,346,845      branch-misses             #    1.66% of all branches          ( +-  0.19% )
      
             0.114222393 seconds time elapsed                                          ( +-  1.19% )
      ~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      perf report: Properly handle branch count in match_chain() · bf36eb5c
      Milian Wolff authored
      Some of the code paths I introduced before returned too early without
      running the code to handle a node's branch count.  By refactoring
      match_chain to only have one exit point, this can be remedied.
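To illustrate the single-exit-point pattern, here is a hypothetical, simplified match function (not perf's `match_chain()`; the struct and field names are assumptions): each comparison branch only records its result, and the branch-count update sits right before the one `return`, so no path can skip it.

```c
#include <assert.h>
#include <string.h>

enum match_result { MATCH_EQ, MATCH_LT, MATCH_GT };

/* Hypothetical callchain node carrying a branch counter. */
struct chain_node {
	const char *sym;
	unsigned long addr;
	unsigned long branch_count;
};

/*
 * Compare `node` against an incoming entry. Each branch only records the
 * result in `match`; the branch-count bookkeeping runs right before the
 * single return, so no comparison path can skip it.
 */
static enum match_result match_node(struct chain_node *node, const char *sym,
				    unsigned long addr, int has_branch_info)
{
	enum match_result match;

	if (node->sym && sym) {
		int cmp = strcmp(node->sym, sym);

		match = cmp == 0 ? MATCH_EQ : (cmp < 0 ? MATCH_LT : MATCH_GT);
	} else if (node->addr == addr) {
		match = MATCH_EQ;
	} else {
		match = node->addr < addr ? MATCH_LT : MATCH_GT;
	}

	if (match == MATCH_EQ && has_branch_info)
		node->branch_count++;

	return match;
}
```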
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1707691.qaJ269GSZW@agathebauer
      Link: http://lkml.kernel.org/r/20171018185350.14893-2-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 24 Oct, 2017 12 commits
      perf report: Compare symbol name for inlined frames when sorting · aa441895
      Milian Wolff authored
      Similar to the callstack frame matching, we also have to compare the
      symbol name when sorting hist entries. The reason is twofold: On one
      hand, multiple inlined functions will use the same symbol start/end
      values of the parent, non-inlined symbol.
      
      As such, all of these symbols often end up missing from the top-level
      report, as they get merged with the non-inlined frame. On the other
      hand, multiple different functions may end up inlining the same
      function, and we need to aggregate these values properly.
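One way to express this rule is a hypothetical `qsort()` comparator (not perf's actual sort code; `struct entry_sym` is an assumption): real symbols are ordered by start address, while inlined symbols are ordered by name. Inlined frames are therefore never merged into their parent even though they share its address range, and the same inlined function is aggregated across different parents. Placing all real symbols before all inlined ones keeps the ordering total, which `qsort()` requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hist-entry symbol; inlined ones reuse the parent's range. */
struct entry_sym {
	unsigned long start;
	unsigned long end;
	const char *name;
	int inlined;
};

/*
 * qsort() comparator: real symbols sort before inlined ones and are ordered
 * by start address; inlined symbols are ordered by name, so they neither
 * merge into their parent nor split up per parent.
 */
static int cmp_entry_sym(const void *pa, const void *pb)
{
	const struct entry_sym *a = pa, *b = pb;

	if (a->inlined != b->inlined)
		return a->inlined ? 1 : -1;
	if (a->inlined)
		return strcmp(a->name, b->name);
	if (a->start != b->start)
		return a->start < b->start ? -1 : 1;
	return 0;
}
```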
      
      Before:
      
      ~~~~~
        perf report --stdio --inline -g none
        # Children     Self  Command       Shared Object Symbol
        # ........ ........  ............  ............. ...................................
        #
           100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
           100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
           100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
            97.03%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
            59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
            55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
             0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
      ~~~~~
      
      After:
      
      ~~~~~
        perf report --stdio --inline -g none
        # Children     Self  Command       Shared Object Symbol
        # ........ ........  ............  ............. ...................................................................................................................................
        #
           100.00%   39.69%  cpp-inlining  cpp-inlining  [.] main
           100.00%    0.00%  cpp-inlining  cpp-inlining  [.] _start
           100.00%    0.00%  cpp-inlining  libc-2.25.so  [.] __libc_start_main
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::__complex_abs (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::abs<double> (inlined)
            62.57%    0.00%  cpp-inlining  cpp-inlining  [.] std::norm<double> (inlined)
            59.53%    4.26%  cpp-inlining  libm-2.25.so  [.] hypot
            55.21%   55.08%  cpp-inlining  libm-2.25.so  [.] __hypot_finite
            34.46%    0.00%  cpp-inlining  cpp-inlining  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
            32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
            32.39%    0.00%  cpp-inlining  cpp-inlining  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
            12.29%    0.00%  cpp-inlining  cpp-inlining  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
             0.52%    0.52%  cpp-inlining  libm-2.25.so  [.] cabs
      ~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-11-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      perf callchain: Compare symbol name for inlined frames when matching · 9856240a
      Milian Wolff authored
      The fake symbols we create for inlined frames represent different
      functions but can share the same symbol start address. This leads to
      issues when different inlined call chains all originate in the same
      function.
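A rough sketch of the matching rule when looking up a child node in the callchain tree, under the assumption of a simple fixed-size tree with a static node pool (`struct cnode`, `same_frame()` and `get_child()` are hypothetical, not perf's callchain code): because all inlined frames of one function share its start address, the name must also be compared, otherwise a second, different inlined chain would be appended to the first one.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHILDREN 8
#define POOL_SIZE 32

/* Hypothetical callchain tree node. */
struct cnode {
	unsigned long sym_start; /* inlined frames reuse the parent's start */
	const char *name;
	int inlined;
	struct cnode *children[MAX_CHILDREN];
	int nr_children;
};

static struct cnode pool[POOL_SIZE]; /* simple static allocator */
static int pool_used;

/* A child matches on the start address and, for inlined frames, also on
 * the function name, since all of them share the same start address. */
static int same_frame(const struct cnode *n, unsigned long start,
		      const char *name, int inlined)
{
	if (n->sym_start != start || n->inlined != inlined)
		return 0;
	return !inlined || strcmp(n->name, name) == 0;
}

/* Find the matching child of `parent`, or append a new one. */
static struct cnode *get_child(struct cnode *parent, unsigned long start,
			       const char *name, int inlined)
{
	struct cnode *c;

	for (int i = 0; i < parent->nr_children; i++)
		if (same_frame(parent->children[i], start, name, inlined))
			return parent->children[i];

	if (parent->nr_children == MAX_CHILDREN || pool_used == POOL_SIZE)
		return NULL;

	c = &pool[pool_used++];
	memset(c, 0, sizeof(*c));
	c->sym_start = start;
	c->name = name;
	c->inlined = inlined;
	parent->children[parent->nr_children++] = c;
	return c;
}
```

With address-only matching, the second lookup for a different inlined function would return the first node, which produces exactly the kind of bogus nesting seen in the "Before" output.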
      
      Before:
      ~~~~~
      $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
      ...
                   --38.86%--_start
                             __libc_start_main
                             main
                             |
                              --37.57%--std::norm<double> (inlined)
                                        std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                        |
                                         --36.36%--std::abs<double> (inlined)
                                                   std::__complex_abs (inlined)
                                                   |
                                                    --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                                                              std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                                                              std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
      ~~~~~
      
      Note that this backtrace representation is completely bogus.
      Complex abs does not call the linear congruential engine! It
      is just a side-effect of a longer inlined stack being appended
      to a shorter, different inlined stack, both of which originate
      in the same function (main).
      
      This patch fixes the issue:
      
      ~~~~~
      $ perf report -s sym -i perf.inlining.data --inline --stdio -g function
      ...
                   --38.86%--_start
                             __libc_start_main
                             main
                             |
                             |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |          std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |          |
                             |           --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
                             |                     std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
                             |                     |
                             |                      --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
                             |                                std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
                             |                                std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
                             |
                              --1.99%--std::norm<double> (inlined)
                                        std::_Norm_helper<true>::_S_do_it<double> (inlined)
                                        std::abs<double> (inlined)
                                        std::__complex_abs (inlined)
      ~~~~~
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-10-milian.wolff@kdab.com
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      [ Fix up conflict with c1fbc0cf ("perf callchain: Compare dsos (as well) for CCKEY_FUNCTION"), remove unneeded hunk ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9856240a
    • perf script: Mark inlined frames and do not print DSO for them · 9628b56d
      Milian Wolff authored
      Instead of showing the (repeated) DSO name of the non-inlined frame, we
      now show the "(inlined)" suffix.
      
      Before:
                         214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                          ace3 hypot (/usr/lib/libm-2.25.so)
                           a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining)
                           a4a main (/home/milian/projects/src/perf-tests/inlining)
                         20510 __libc_start_main (/usr/lib/libc-2.25.so)
                           bd9 _start (/home/milian/projects/src/perf-tests/inlining)
      
      After:
                         214f7 __hypot_finite (/usr/lib/libm-2.25.so)
                          ace3 hypot (/usr/lib/libm-2.25.so)
                           a4a std::__complex_abs (inlined)
                           a4a std::abs<double> (inlined)
                           a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
                           a4a std::norm<double> (inlined)
                           a4a main (/home/milian/projects/src/perf-tests/inlining)
                         20510 __libc_start_main (/usr/lib/libc-2.25.so)
                           bd9 _start (/home/milian/projects/src/perf-tests/inlining)
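      
      A minimal C sketch of the output rule above, assuming a simplified, hypothetical `struct frame` (perf's real callchain and symbol structures differ): the parenthesized annotation is the DSO for a real frame and the "inlined" marker for a frame synthesized from inline information. The names `struct frame`, `frame_annotation` and `print_frame` are illustrative only.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified frame record; perf's real structures differ. */
struct frame {
    unsigned long ip;   /* IP of the non-inlined frame (shared by its inlines) */
    const char *sym;    /* symbol name, possibly that of an inlined function */
    const char *dso;    /* DSO containing ip */
    bool inlined;       /* true for frames synthesized from inline info */
};

/* What goes inside the parentheses: the DSO for real frames,
 * the "inlined" marker for synthesized ones. */
const char *frame_annotation(const struct frame *f)
{
    return f->inlined ? "inlined" : f->dso;
}

/* Print one frame, roughly in the "After" layout shown above. */
void print_frame(FILE *fp, const struct frame *f)
{
    fprintf(fp, "%16lx %s (%s)\n", f->ip, f->sym, frame_annotation(f));
}
```

      In the listing above, the four synthesized frames share IP a4a with `main`, and only `main` keeps its DSO path.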
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-9-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf callchain: Mark inlined frames in output by " (inlined)" suffix · 8932f807
      Milian Wolff authored
      The original patch that introduced inline frame output in the various
      browsers already used this suffix. The new centralized approach, which
      uses fake symbols for inlined frames, has lacked it so far.
      
      Instead of changing the symbol name itself, we only print the suffix
      where needed. This allows us to efficiently look up the symbol for a
      given name without first having to append the suffix before the lookup.
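      
      A sketch of why keeping the stored name unsuffixed helps, using hypothetical `struct sym`, `find_sym` and `sym_snprintf` (not perf's API): a lookup compares the plain name directly, and " (inlined)" is appended only when formatting output. The linear scan is only a stand-in for whatever lookup structure perf actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified symbol; the name is stored without any suffix. */
struct sym {
    const char *name;
    bool inlined;
};

/* Exact-name lookup works on the stored name as-is; the key never
 * needs a suffix appended first. */
const struct sym *find_sym(const struct sym *syms, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    return NULL;
}

/* The suffix is added only at output time. */
int sym_snprintf(char *buf, size_t len, const struct sym *s)
{
    return snprintf(buf, len, "%s%s", s->name, s->inlined ? " (inlined)" : "");
}
```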
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-8-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Fall-back to function name comparison for -g srcline · cbe50f61
      Milian Wolff authored
      When a callchain entry has no srcline available, we ended up comparing
      the instruction pointer. I don't consider this very useful. Rather, I
      think we should group the entries by function name, which this patch
      adds. For people who want to split the data on the IP boundary, using
      `-g address` is the correct choice.
      
      Before:
      
      ~~~~~
         100.00%    38.86%  [.] main
                  |
                  |--61.14%--main inlining.cpp:14
                  |          std::norm<double> complex:664
                  |          std::_Norm_helper<true>::_S_do_it<double> complex:654
                  |          std::abs<double> complex:597
                  |          std::__complex_abs complex:589
                  |          |
                  |          |--56.03%--hypot
                  |          |          |
                  |          |          |--8.45%--__hypot_finite
                  |          |          |
                  |          |          |--7.62%--__hypot_finite
                  |          |          |
                  |          |          |--2.29%--__hypot_finite
                  |          |          |
                  |          |          |--2.24%--__hypot_finite
                  |          |          |
                  |          |          |--2.06%--__hypot_finite
                  |          |          |
                  |          |          |--1.81%--__hypot_finite
      ...
      ~~~~~
      
      After:
      
      ~~~~~
         100.00%    38.86%  [.] main
                  |
                  |--61.14%--main inlining.cpp:14
                  |          std::norm<double> complex:664
                  |          std::_Norm_helper<true>::_S_do_it<double> complex:654
                  |          std::abs<double> complex:597
                  |          std::__complex_abs complex:589
                  |          |
                  |          |--60.29%--hypot
                  |          |          |
                  |          |           --56.03%--__hypot_finite
                  |          |
                  |           --0.85%--cabs
      ~~~~~
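      
      A possible shape for the comparison this commit describes, written as a hypothetical comparator over a simplified `struct entry` (not perf's callchain code): srclines are compared when both entries have one, otherwise the function names decide, so entries like the repeated `__hypot_finite` ones above collapse into one. Comparing names when only one side has a srcline, and the final IP fallback for unresolved names, are assumptions of this sketch.

```c
#include <string.h>

/* Hypothetical, simplified callchain entry. */
struct entry {
    unsigned long ip;
    const char *func;     /* function name, or NULL if unresolved */
    const char *srcline;  /* "file:line", or NULL when unavailable */
};

/* Group by srcline when both sides have one; otherwise fall back to the
 * function name, and to the IP only when names are missing too. */
int entry_cmp_srcline(const struct entry *a, const struct entry *b)
{
    if (a->srcline && b->srcline)
        return strcmp(a->srcline, b->srcline);
    if (a->func && b->func)
        return strcmp(a->func, b->func);
    if (a->ip != b->ip)
        return a->ip < b->ip ? -1 : 1;
    return 0;
}
```

      Users who want the per-IP split instead can still pass `-g address`, as noted above.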
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-7-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf callchain: Create real callchain entries for inlined frames · 11ea2515
      Milian Wolff authored
      The inline_node structs are maintained by the new dso->inlines tree.
      This in turn keeps ownership of the fake symbols and srcline string
      representing an inline frame.
      
      This tree is sorted by address to allow quick lookups. All other fields
      of the symbol besides the function name are unused for inline frames. The
      advantage of this approach is that all existing users of the callchain
      API can now transparently display inlined frames without having to patch
      their code.
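      
      The commit keeps inline nodes in an address-sorted tree. As a simpler stand-in with the same O(log n) lookup, the sketch below uses a sorted array searched with the standard `bsearch()`; `struct inline_entry`, `inlines_sort` and `inlines_find` are hypothetical names, not perf's.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical inline entry keyed by address. */
struct inline_entry {
    uint64_t addr;
    const char *func;     /* name of the (fake) inlined symbol */
    const char *srcline;
};

/* bsearch() comparator: key is a pointer to the address being looked up. */
static int cmp_addr_key(const void *key, const void *elem)
{
    uint64_t k = *(const uint64_t *)key;
    uint64_t a = ((const struct inline_entry *)elem)->addr;
    return (k > a) - (k < a);
}

/* qsort() comparator: order entries by address. */
static int cmp_entries(const void *x, const void *y)
{
    uint64_t a = ((const struct inline_entry *)x)->addr;
    uint64_t b = ((const struct inline_entry *)y)->addr;
    return (a > b) - (a < b);
}

/* Sort once after the table is built. */
void inlines_sort(struct inline_entry *tab, size_t n)
{
    qsort(tab, n, sizeof(*tab), cmp_entries);
}

/* Exact-address lookup; NULL if the address has not been resolved yet. */
const struct inline_entry *inlines_find(const struct inline_entry *tab,
                                        size_t n, uint64_t addr)
{
    return bsearch(&addr, tab, n, sizeof(*tab), cmp_addr_key);
}
```

      A balanced tree, as used in the commit, additionally allows cheap insertion as new addresses are resolved, which a sorted array does not.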
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-6-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf callchain: Refactor inline_list to store srcline string directly · 2be8832f
      Milian Wolff authored
      This is a preparation for the creation of real callchain entries for
      inlined frames. The rest of the perf code uses the srcline string. As
      such, using it for the srcline API as well allows us to simplify some of
      the upcoming code. Most notably, it will allow us to cache the srcline
      for a given inline node and reuse it for different callchain entries.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-5-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf callchain: Refactor inline_list to operate on symbols · fea0cf84
      Milian Wolff authored
      This is a requirement to create real callchain entries for inlined
      frames.
      
      Since the list of inlines usually contains the target symbol too, i.e.
      the location where the frames get inlined to, we alias that symbol and
      reuse it as-is. This ensures that other dependent functionality keeps
      working, most notably annotation of the target frames.
      
      For all other entries in the inline_list, a fake symbol is created.
      These are marked by the new 'inlined' member, which is set to true. Only
      those symbols are managed by the inline_list and get freed when the
      inline_list is deleted from within inline_node__delete.
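      
      A minimal sketch of the ownership rule, assuming hypothetical, simplified `struct symbol` and `struct inline_list` types (perf's real definitions carry many more fields) and hypothetical helpers `fake_symbol_new` and `inline_list_delete`: fake symbols carry `inlined = true` and are freed with the list, while the aliased target symbol, which belongs to the DSO, is skipped.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified symbol with the new 'inlined' marker. */
struct symbol {
    char *name;
    bool inlined;   /* true only for fake symbols owned by an inline list */
};

/* Symbols for one chain of inlined frames, plus the aliased target. */
struct inline_list {
    struct symbol **syms;
    size_t nr;
};

/* Create a fake symbol for an inlined function; NULL on allocation failure. */
struct symbol *fake_symbol_new(const char *name)
{
    size_t len = strlen(name) + 1;
    struct symbol *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->name = malloc(len);
    if (!s->name) {
        free(s);
        return NULL;
    }
    memcpy(s->name, name, len);
    s->inlined = true;
    return s;
}

/* Free only what the list owns: the fake (inlined) symbols. The aliased
 * target symbol is left untouched. */
void inline_list_delete(struct inline_list *list)
{
    for (size_t i = 0; i < list->nr; i++) {
        struct symbol *s = list->syms[i];

        if (s && s->inlined) {
            free(s->name);
            free(s);
        }
    }
    free(list->syms);
    list->syms = NULL;
    list->nr = 0;
}
```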
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-4-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf callchain: Store srcline in callchain_cursor_node · 40a342cd
      Milian Wolff authored
      This is mostly a preparation to enable the creation of full callchain
      nodes for inline frames. Such frames will reference the IP of the
      non-inlined frame, but hold the symbol and srcline for an inlined
      location. As such, we won't be able to query the srcline on-demand based
      on the IP alone. Instead, we will leverage the functionality provided by
      this patch here, and store the srcline for the inlined nodes in the new
      srcline member of callchain_cursor_node.
      
      Note that this patch on its own leaks the srcline, as there is no
      free_callchain_cursor_node or similar. A future patch will add caching
      of the srcline and handle deletion properly.
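      
      A sketch of the idea with a hypothetical `struct cursor_node` and resolver callback (not the real `callchain_cursor_node` layout): all frames of one inline chain share the same IP, so a lookup by IP alone cannot tell them apart, and the stored srcline takes precedence. Ownership of the string is deliberately left out, mirroring the note above that deletion is handled by a later patch.

```c
#include <stddef.h>
#include <stdint.h>

struct symbol;   /* opaque here; only a pointer is stored */

/* Hypothetical, simplified cursor node. */
struct cursor_node {
    uint64_t ip;               /* IP of the non-inlined frame */
    const struct symbol *sym;  /* real or fake (inlined) symbol */
    const char *srcline;       /* "file:line", or NULL if not recorded */
};

/* Hypothetical on-demand resolver that only sees the IP. */
typedef const char *(*ip_to_srcline_fn)(uint64_t ip);

/* Prefer the srcline stored in the node; resolve from the IP only when
 * nothing was recorded (e.g. for ordinary, non-inlined frames). */
const char *node_srcline(const struct cursor_node *node, ip_to_srcline_fn resolve)
{
    if (node->srcline)
        return node->srcline;
    return resolve ? resolve(node->ip) : NULL;
}
```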
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-3-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf report: Remove code to handle inline frames from browsers · 2a704fc8
      Milian Wolff authored
      The follow-up commits will make inline frames first-class citizens in
      the callchain, thereby obsoleting all of this special code.
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Reviewed-by: Jiri Olsa <jolsa@redhat.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf/x86/intel/bts: Fix exclusive event reference leak · 2eece390
      Alexander Shishkin authored
      Commit:
      
        d2878d64 ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")
      
      ... adds a privilege check in exactly the wrong place in the event init path:
      after the 'LBR exclusive' reference has been taken, and doesn't release it
      in the case of insufficient privileges. After this, nobody in the system
      gets to use PT or LBR.
      
      This patch moves the privilege check to where it should have been in the
      first place.
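      
      A user-space model of the ordering bug, with a hypothetical reference counter standing in for the 'LBR exclusive' reference (the kernel uses its own helpers, not these): in the buggy order the -EACCES path returns while still holding the reference, while in the fixed order the privilege check runs before anything is taken.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the exclusive reference; not the kernel API. */
static int exclusive_refs;

static void exclusive_get(void) { exclusive_refs++; }
static void exclusive_put(void) { exclusive_refs--; }

/* Buggy order: the reference is taken before the privilege check and is
 * never dropped on the -EACCES path, so it leaks. */
int event_init_buggy(bool privileged)
{
    exclusive_get();
    if (!privileged)
        return -EACCES;   /* leak: no exclusive_put() */
    return 0;
}

/* Fixed order: reject unprivileged callers before taking the reference. */
int event_init_fixed(bool privileged)
{
    if (!privileged)
        return -EACCES;
    exclusive_get();
    return 0;
}

/* Matching teardown for a successfully initialized event. */
void event_destroy(void)
{
    exclusive_put();
}
```

      With the buggy order the counter never returns to zero after a rejected caller, which matches the symptom described above: the exclusive reference stays taken for good.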
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: d2878d64 ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")
      Link: http://lkml.kernel.org/r/20171023123533.16973-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'perf-core-for-mingo-4.15-20171023' of... · 9b7c8547
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-4.15-20171023' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
       - Update vendor events JSON metrics for Intel's Broadwell, Broadwell
         Server, Haswell, Haswell Server, IvyBridge, IvyTown, JakeTown, Sandy
         Bridge, Skylake and SkyLake Server (Andi Kleen)
      
       - Add vendor event file for Intel's Goldmont Plus V1 (Kan Liang)
      
       - Move perf_mmap methods from 'perf record' and evlist.c to a separate
         mmap.[ch] pair, to better separate things and pave the way for further
         work on multithreading tools (Arnaldo Carvalho de Melo)
      
       - Do not check ABI headers in a detached tarball build, as the kernel
         headers from which we copied tools/include/ are by definition not
         available (Arnaldo Carvalho de Melo)
      
       - Make 'perf script' use fprintf()-like printing, i.e. receiving a FILE
         pointer so that it gets consistent with other tools/ code and allows
         for printing to per-event files (Arnaldo Carvalho de Melo)
      
       - Error handling fixes (resource release on exit) for 'perf script'
         and 'perf kmem' (Christophe JAILLET)
      
       - Make some 'perf event attr' tests optional on virtual machines, where
         tested counters are not available (Jiri Olsa)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 23 Oct, 2017 12 commits