    perf synthetic events: Remove use of sscanf from /proc reading · 2069425e
    Ian Rogers authored
    The synthesize benchmark, run on a single process and thread, shows
    perf_event__synthesize_mmap_events as the hottest function with fgets
    and sscanf taking the majority of execution time.
    
    Using fscanf instead performs about the same. Replace the scanf call with
    manual reading of each field of the /proc/pid/maps line, and remove some
    unnecessary buffering.
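    
    As an illustration only, manual field reading of a /proc/pid/maps line
    could look roughly like the sketch below. It is not the code from this
    patch: read_hex(), expect_char(), parse_maps_prefix() and struct
    maps_range are hypothetical names, and only the address range,
    permissions and page offset are parsed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse a run of hex digits at *p and advance past it. */
static bool read_hex(const char **p, uint64_t *out)
{
	const char *s = *p;
	uint64_t v = 0;

	for (;; s++) {
		unsigned int d;

		if (*s >= '0' && *s <= '9')
			d = *s - '0';
		else if (*s >= 'a' && *s <= 'f')
			d = *s - 'a' + 10;
		else if (*s >= 'A' && *s <= 'F')
			d = *s - 'A' + 10;
		else
			break;
		v = (v << 4) | d;
	}
	if (s == *p)
		return false;	/* no hex digit was consumed */
	*p = s;
	*out = v;
	return true;
}

/* Hypothetical helper: consume exactly one expected character. */
static bool expect_char(const char **p, char c)
{
	if (**p != c)
		return false;
	(*p)++;
	return true;
}

/* Illustrative subset of the fields found on a maps line. */
struct maps_range {
	uint64_t start, end, pgoff;
	char prot[5];
};

/* Parse "start-end perms offset" without any scanf-family call. */
static bool parse_maps_prefix(const char *line, struct maps_range *r)
{
	const char *p = line;
	int i;

	if (!read_hex(&p, &r->start) || !expect_char(&p, '-') ||
	    !read_hex(&p, &r->end) || !expect_char(&p, ' '))
		return false;

	/* The permissions field is four characters, e.g. "r-xp". */
	for (i = 0; i < 4; i++) {
		if (*p == '\0' || *p == ' ')
			return false;
		r->prot[i] = *p++;
	}
	r->prot[4] = '\0';

	return expect_char(&p, ' ') && read_hex(&p, &r->pgoff);
}
```

    Each field is consumed in a single left-to-right pass, so no format
    string has to be interpreted for every line.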
    
    This change also addresses potential, but unlikely, buffer overruns for
    the string values read by scanf.
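    
    To illustrate the overrun concern: an unbounded "%s" conversion writes
    as many bytes as the input supplies, while a hand-written copy can stop
    at the destination size. A minimal sketch, assuming a hypothetical
    helper copy_field_bounded() that is not part of this patch:

```c
#include <stddef.h>

/*
 * Hypothetical helper: skip leading blanks in src, then copy characters up
 * to a newline or NUL into dst. Never writes more than dst_size bytes,
 * terminator included; longer input is silently truncated.
 * Returns the number of characters stored (excluding the terminator).
 */
static size_t copy_field_bounded(char *dst, size_t dst_size, const char *src)
{
	size_t n = 0;

	if (dst_size == 0)
		return 0;

	while (*src == ' ' || *src == '\t')
		src++;

	/* Leave room for the terminating NUL. */
	while (*src != '\0' && *src != '\n' && n + 1 < dst_size)
		dst[n++] = *src++;

	dst[n] = '\0';
	return n;
}
```

    With an 8-byte destination, the input "/bin/cat" is truncated to
    "/bin/ca" rather than overflowing the buffer.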
    
    Performance before is:
    
      $ sudo perf bench internals synthesize -m 16 -M 16 -s -t
      # Running 'internals/synthesize' benchmark:
      Computing performance of single threaded perf event synthesis by
      synthesizing events on the perf process itself:
        Average synthesis took: 102.810 usec (+- 0.027 usec)
        Average num. events: 17.000 (+- 0.000)
        Average time per event 6.048 usec
        Average data synthesis took: 106.325 usec (+- 0.018 usec)
        Average num. events: 89.000 (+- 0.000)
        Average time per event 1.195 usec
      Computing performance of multi threaded perf event synthesis by
      synthesizing events on CPU 0:
        Number of synthesis threads: 16
          Average synthesis took: 68103.100 usec (+- 441.234 usec)
          Average num. events: 30703.000 (+- 0.730)
          Average time per event 2.218 usec
    
    And after is:
    
      $ sudo perf bench internals synthesize -m 16 -M 16 -s -t
      # Running 'internals/synthesize' benchmark:
      Computing performance of single threaded perf event synthesis by
      synthesizing events on the perf process itself:
        Average synthesis took: 50.388 usec (+- 0.031 usec)
        Average num. events: 17.000 (+- 0.000)
        Average time per event 2.964 usec
        Average data synthesis took: 52.693 usec (+- 0.020 usec)
        Average num. events: 89.000 (+- 0.000)
        Average time per event 0.592 usec
      Computing performance of multi threaded perf event synthesis by
      synthesizing events on CPU 0:
        Number of synthesis threads: 16
          Average synthesis took: 45022.400 usec (+- 552.740 usec)
          Average num. events: 30624.200 (+- 10.037)
          Average time per event 1.470 usec
    
    On an Intel Xeon 6154, compiling with Debian gcc 9.2.1.
    
    Committer testing:
    
    On an AMD Ryzen 5 3600X 6-Core Processor:
    
    Before:
    
      # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
      # Running 'internals/synthesize' benchmark:
      Computing performance of single threaded perf event synthesis by
      synthesizing events on the perf process itself:
        Average synthesis took: 267.491 usec (+- 0.176 usec)
        Average num. events: 56.000 (+- 0.000)
        Average time per event 4.777 usec
        Average data synthesis took: 277.257 usec (+- 0.169 usec)
        Average num. events: 287.000 (+- 0.000)
        Average time per event 0.966 usec
      Computing performance of multi threaded perf event synthesis by
      synthesizing events on CPU 0:
        Number of synthesis threads: 12
          Average synthesis took: 81599.500 usec (+- 346.315 usec)
          Average num. events: 36096.100 (+- 2.523)
          Average time per event 2.261 usec
      #
    
    After:
    
      # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
      # Running 'internals/synthesize' benchmark:
      Computing performance of single threaded perf event synthesis by
      synthesizing events on the perf process itself:
        Average synthesis took: 110.125 usec (+- 0.080 usec)
        Average num. events: 56.000 (+- 0.000)
        Average time per event 1.967 usec
        Average data synthesis took: 118.518 usec (+- 0.057 usec)
        Average num. events: 287.000 (+- 0.000)
        Average time per event 0.413 usec
      Computing performance of multi threaded perf event synthesis by
      synthesizing events on CPU 0:
        Number of synthesis threads: 12
          Average synthesis took: 43490.700 usec (+- 284.527 usec)
          Average num. events: 37028.500 (+- 0.563)
          Average time per event 1.175 usec
      #
    Signed-off-by: Ian Rogers <irogers@google.com>
    Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Acked-by: Jiri Olsa <jolsa@redhat.com>
    Acked-by: Namhyung Kim <namhyung@kernel.org>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Andrey Zhizhikin <andrey.z@gmail.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
synthetic-events.c 48.9 KB