    perf symbols: Fix vdso list searching · f9ceffb6
    Waiman Long authored
    When "perf record" was used on a large machine with many CPUs, the
    perf post-processing time (the time after the workload was done until
    the perf command itself exited) could take many minutes or even
    hours, depending on how large the resulting perf.data file was.
    
    While running the AIM7 1500-user high_systime workload on an 80-core
    x86-64 system with a 3.9 kernel (with only the -s -a options used),
    the workload itself took about 2 minutes to run and the perf.data file
    had a size of 1108.746 MB. However, the post-processing step took more
    than 10 minutes.
    
    With a gprof-profiled perf binary, the time spent by perf was as
    follows:
    
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     96.90    822.10   822.10   192156     0.00     0.00  dsos__find
      0.81    828.96     6.86 172089958     0.00     0.00  rb_next
      0.41    832.44     3.48 48539289     0.00     0.00  rb_erase
    
    So 97% (822 seconds) of the time was spent in a single function,
    dsos__find(). After analyzing the call-graph data below:
    
     -----------------------------------------------
                     0.00  822.12  192156/192156      map__new [6]
     [7]     96.9    0.00  822.12  192156         vdso__dso_findnew [7]
                   822.10    0.00  192156/192156      dsos__find [8]
                     0.01    0.00  192156/192156      dsos__add [62]
                     0.01    0.00  192156/192366      dso__new [61]
                     0.00    0.00       1/45282525     memdup [31]
                     0.00    0.00  192156/192230      dso__set_long_name [91]
     -----------------------------------------------
                   822.10    0.00  192156/192156      vdso__dso_findnew [7]
     [8]     96.9  822.10    0.00  192156         dsos__find [8]
     -----------------------------------------------
    
    It was found that the vdso__dso_findnew() function failed to locate
    VDSO__MAP_NAME ("[vdso]") in the dso list and had to insert a new
    entry at the end 192156 times. This problem is due to the fact that
    there are two types of name in a dso entry: a short name and a long
    name. The initial dso__new() sets both the short and long names to
    "[vdso]". After that, vdso__dso_findnew() changes the long name to
    something like /tmp/perf-vdso.so-NoXkDj. The dsos__find() function
    only compares the long name. As a result, the same vdso entry is
    duplicated many times in the dso list. This bug increases memory
    consumption and slows symbol processing to a crawl.
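    
    The failure mode can be reproduced with a minimal sketch. This is not
    the perf code: struct dso_sketch, find_by_long_name(),
    vdso_findnew_buggy() and entries_after_lookups() are hypothetical
    stand-ins, the temp path is illustrative, and new entries are prepended
    rather than appended to keep the example short.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a dso entry: two names and a link. */
struct dso_sketch {
    struct dso_sketch *next;
    const char *short_name;
    const char *long_name;
};

/* Pre-fix lookup behavior as described above: only the long name is compared. */
static struct dso_sketch *find_by_long_name(struct dso_sketch *head,
                                            const char *name)
{
    for (struct dso_sketch *pos = head; pos; pos = pos->next)
        if (strcmp(pos->long_name, name) == 0)
            return pos;
    return NULL;
}

/*
 * Find-or-create for "[vdso]". On a miss, a new entry is created and its
 * long name is replaced by a temp-file path, so the next lookup for
 * "[vdso]" misses again. Returns the (possibly new) list head.
 */
static struct dso_sketch *vdso_findnew_buggy(struct dso_sketch *head)
{
    struct dso_sketch *d;

    if (find_by_long_name(head, "[vdso]"))
        return head;

    d = malloc(sizeof(*d));
    if (!d)
        return head;
    d->short_name = "[vdso]";
    d->long_name = "/tmp/perf-vdso.so-example"; /* illustrative path */
    d->next = head; /* prepended here; the text describes appending */
    return d;
}

/* Run n lookups and report how many entries the list ends up with. */
size_t entries_after_lookups(size_t n)
{
    struct dso_sketch *head = NULL;
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        head = vdso_findnew_buggy(head);

    while (head) {
        struct dso_sketch *next = head->next;
        free(head);
        head = next;
        count++;
    }
    return count;
}
```

    Every lookup scans the whole list, misses, and adds one more entry, so
    n lookups leave n entries and cost on the order of n^2 string
    comparisons in total.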
    
    To resolve this problem, the dsos__find() function interface was
    modified to enable searching by either the long name or the short
    name. vdso__dso_findnew() now searches only by the short name,
    while the other call sites search by the long name as before.
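    
    The shape of the interface change can be sketched as follows. This is
    an illustration under assumptions, not the patch itself: struct
    dso_entry, find_dso() and the cmp_short parameter name are hypothetical,
    and a plain singly linked list replaces perf's list type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a dso entry. */
struct dso_entry {
    struct dso_entry *next;
    const char *short_name;
    const char *long_name;
};

/*
 * Look up an entry by name. The caller chooses which name to compare:
 * cmp_short == true compares the short name, false compares the long name.
 */
struct dso_entry *find_dso(struct dso_entry *head, const char *name,
                           bool cmp_short)
{
    for (struct dso_entry *pos = head; pos; pos = pos->next) {
        const char *key = cmp_short ? pos->short_name : pos->long_name;

        if (strcmp(key, name) == 0)
            return pos;
    }
    return NULL;
}
```

    With an entry whose short name is "[vdso]" and whose long name is a
    temp-file path, find_dso(head, "[vdso]", true) finds it, while
    find_dso(head, "[vdso]", false) still returns NULL, which is the
    behavior the other call sites keep.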
    
    With this change, the CPU time of perf was reduced from 848.38s to
    15.77s, and dsos__find() accounted for only 0.06% of the total time.
    
      0.06     15.73     0.01   192151     0.00     0.00  dsos__find
    
    Signed-off-by: Waiman Long <Waiman.Long@hp.com>
    Acked-by: Ingo Molnar <mingo@kernel.org>
    Cc: "Chandramouleeswaran, Aswin" <aswin@hp.com>
    Cc: "Norton, Scott J" <scott.norton@hp.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Stephane Eranian <eranian@google.com>
    Link: http://lkml.kernel.org/r/1368110568-64714-1-git-send-email-Waiman.Long@hp.com
    [ replaced TRUE/FALSE with stdbool.h equivalents, fixing builds where
      those macros are not present (NO_LIBPYTHON=1 NO_LIBPERL=1), fix from Jiri Olsa ]
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
dso.c 12.9 KB