• Waiman Long's avatar
    perf symbols: Improve DSO long names lookup speed with rbtree · 4598a0a6
    Waiman Long authored
    With workload that spawns and destroys many threads and processes, it
    was found that perf-mem could took a long time to post-process the perf
    data after the target workload had completed its operation.
    
    The performance bottleneck was found to be the lookup and insertion of
    the new DSO structures (thousands of them in this case).
    
    In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread), the
    perf profile below shows what perf was doing after the profiled AIM7
    shared workload completed:
    
    -     83.94%  perf  libc-2.11.3.so     [.] __strcmp_sse42
       - __strcmp_sse42
          - 99.82% map__new
               machine__process_mmap_event
               perf_session_deliver_event
               perf_session__process_event
               __perf_session__process_events
               cmd_record
               cmd_mem
               run_builtin
               main
               __libc_start_main
    -     13.17%  perf  perf               [.] __dsos__findnew
         __dsos__findnew
         map__new
         machine__process_mmap_event
         perf_session_deliver_event
         perf_session__process_event
         __perf_session__process_events
         cmd_record
         cmd_mem
         run_builtin
         main
         __libc_start_main
    
    So about 97% of CPU times were spent in the map__new() function trying
    to insert new DSO entry into the DSO linked list. The whole
    post-processing step took about 9 minutes.
    
    The DSO structures are currently searched linearly. So the total
    processing time will be proportional to n^2.
    
    To overcome this performance problem, the DSO code is modified to also
    put the DSO structures in a RB tree sorted by its long name in
    additional to being in a simple linked list. With this change, the
    processing time will become proportional to n*log(n) which will be much
    quicker for large n. However, the short name will still be searched
    using the old linear searching method.  With that patch in place, the
    same perf-mem post-processing step took less than 30 seconds to
    complete.
    Signed-off-by: default avatarWaiman Long <Waiman.Long@hp.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Don Zickus <dzickus@redhat.com>
    Cc: Douglas Hatch <doug.hatch@hp.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Scott J Norton <scott.norton@hp.com>
    Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
    4598a0a6
machine.c 37.9 KB