1. 26 Mar, 2021 4 commits
  2. 25 Mar, 2021 2 commits
  3. 24 Mar, 2021 3 commits
    • MAINTAINERS: Add Mailing list and Web-page for PERFORMANCE EVENTS SUBSYSTEM · e0542cac
      Tiezhu Yang authored
      Add entry "L: linux-perf-users@vger.kernel.org" to archive the
      related mail on https://lore.kernel.org/linux-perf-users/, add
      entry "W: https://perf.wiki.kernel.org/" so that newbies could
      get some useful materials.
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/1615780592-21838-1-git-send-email-yangtiezhu@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Add CSV summary test · 0f7ff383
      Jin Yao authored
      The patch "perf stat: Align CSV output for summary mode" aligned CSV
      output and added "summary" to the first column of summary lines.
      
      Now we check that the "summary" string is added to the CSV output.
      
      If the '--no-csv-summary' option is set, the "summary" string is not
      added; check this case as well.
      
      Committer testing:
      
        $ perf test csv
        84: perf stat csv summary test     : Ok
        $
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210319070156.20394-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Align CSV output for summary mode · 0bdad978
      Jin Yao authored
      The 'perf stat' subcommand supports the request for a summary of the
      interval counter readings.  But the summary lines break the CSV output
      so it's hard for scripts to parse the result.
      
      Before:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
             1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
             1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
             1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
             1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
             1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
             1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
             1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
        8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
        270,,context-switches,8013513297,100.00,0.034,K/sec
        13,,cpu-migrations,8013530032,100.00,0.002,K/sec
        184,,page-faults,8013546992,100.00,0.023,K/sec
        20574191,,cycles,8013551506,100.00,0.003,GHz
        10562267,,instructions,8013564958,100.00,0.51,insn per cycle
        2019244,,branches,8013575673,100.00,0.252,M/sec
        106152,,branch-misses,8013585776,100.00,5.26,of all branches
      
      The summary line loses the timestamp column, which breaks the CSV
      output.
      
      We add a column at the original 'timestamp' position and it just says
      'summary' for the summary line.
      
      After:
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
             1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
             1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
             1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
             1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
             1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
             1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
             1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
                 summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
                 summary,218,,context-switches,8012753271,100.00,0.027,K/sec
                 summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
                 summary,0,,page-faults,8012786257,100.00,0.000,K/sec
                 summary,15004518,,cycles,8012790637,100.00,0.002,GHz
                 summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
                 summary,1590259,,branches,8012814766,100.00,0.198,M/sec
                 summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches
      
      Now it is easy for scripts to parse the summary lines.
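A script that consumes this output could key on the first column to tell interval rows from summary rows. The following is only an illustrative sketch, not part of the patch: the function name `split_interval_and_summary` is hypothetical, and the sample rows are copied from the "After" output above.

```python
import csv
import io


def split_interval_and_summary(text):
    """Split 'perf stat -x, --summary' CSV text into interval and summary rows.

    Interval rows start with a timestamp; summary rows start with the
    literal string 'summary' in the same column.
    """
    intervals, summary = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        first = row[0].strip()  # the first column is right-aligned with spaces
        if first == "summary":
            summary.append(row[1:])
        else:
            intervals.append((float(first), row[1:]))
    return intervals, summary


SAMPLE = """\
     1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
     1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
         summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
         summary,218,,context-switches,8012753271,100.00,0.027,K/sec
"""

intervals, summary = split_interval_and_summary(SAMPLE)
for fields in summary:
    # After dropping the first column: value, unit, event name, ...
    print(fields[2], fields[0])
```

Because every row now has the same number of columns, the same field indices work for both interval and summary rows.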
      
      Of course, existing scripts that depend on the old CSV format should
      not break; they can continue to use it by passing the new
      '--no-csv-summary' option.
      
        # perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary
             1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized
             1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec
             1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec
             1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec
             1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz
             1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle
             1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec
             1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches
        8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized
        197,,context-switches,8012703742,100.00,24.586,/sec
        9,,cpu-migrations,8012720902,100.00,1.123,/sec
        644,,page-faults,8012738266,100.00,80.373,/sec
        18350698,,cycles,8012744109,100.00,0.002,GHz
        12745021,,instructions,8012759001,100.00,0.69,insn per cycle
        2458033,,branches,8012770864,100.00,306.768,K/sec
        102107,,branch-misses,8012781751,100.00,4.15,of all branches
      
      This option can be enabled in perf config by setting the variable
      'stat.no-csv-summary'.
      
        # perf config stat.no-csv-summary=true
      
        # perf config -l
        stat.no-csv-summary=true
      
        # perf stat -x, -I1000 --interval-count 1 --summary
             1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
             1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
             1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
             1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec
             1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz
             1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle
             1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec
             1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches
        8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized
        205,,context-switches,8013308394,100.00,25.583,/sec
        10,,cpu-migrations,8013324681,100.00,1.248,/sec
        0,,page-faults,8013340926,100.00,0.000,/sec
        8027742,,cycles,8013344503,100.00,0.001,GHz
        2871717,,instructions,8013356501,100.00,0.36,insn per cycle
        553564,,branches,8013366204,100.00,69.081,K/sec
        54021,,branch-misses,8013375952,100.00,9.76,of all branches
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 23 Mar, 2021 4 commits
    • perf test: Add a shell test for 'perf stat --bpf-counters' new option · 2c0cb9f5
      Song Liu authored
      Add a test to compare the output of perf-stat with and without the
      --bpf-counters option. If the difference is more than 10%, the test is
      considered to have failed.
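The 10% criterion can be sketched as follows. This is an assumption-laden illustration in Python, not the shell test itself: the function name `within_tolerance` is hypothetical, measuring the difference relative to the run without --bpf-counters is an assumption, and the numbers in the usage lines are arbitrary.

```python
def within_tolerance(base, other, max_diff_pct=10.0):
    """Return True if 'other' differs from 'base' by at most max_diff_pct percent.

    'base' stands for a count taken without --bpf-counters and 'other'
    for the same count taken with it (assumed roles).
    """
    if base == 0:
        return other == 0
    diff_pct = abs(other - base) * 100.0 / base
    return diff_pct <= max_diff_pct


# Arbitrary illustrative values, not measurements.
print(within_tolerance(1000, 1050))
print(within_tolerance(1000, 1200))
```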
      
      Committer testing:
      
        # perf test bpf-counters
        86: perf stat --bpf-counters test                                   : Ok
        # perf test -v bpf-counters
        86: perf stat --bpf-counters test                                   :
        --- start ---
        test child forked, pid 2433339
        test child finished with 0
        ---- end ----
        perf stat --bpf-counters test: Ok
        #
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Requested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lore.kernel.org/lkml/EC00E37D-8587-4662-8E30-7AD5F874FA84@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Measure 't0' and 'ref_time' after enable_counters() · 435b46ef
      Song Liu authored
      Take measurements of 't0' and 'ref_time' after enable_counters(), so
      that they only measure the time consumed when the counters are enabled.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Acked-by: Andi Kleen <andi@firstfloor.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20210316211837.910506-3-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Introduce 'bperf' to share hardware PMCs with BPF · 7fac83aa
      Song Liu authored
      The perf tool uses performance monitoring counters (PMCs) to monitor
      system performance. The PMCs are limited hardware resources. For
      example, Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
      
      Modern data center systems use these PMCs in many different ways: system
      level monitoring, (maybe nested) container level monitoring, per process
      monitoring, profiling (in sample mode), etc. In some cases, there are
      more active perf_events than available hardware PMCs. To allow all
      perf_events to have a chance to run, it is necessary to do expensive
      time multiplexing of events.
      
      On the other hand, many monitoring tools count the common metrics
      (cycles, instructions). It is a waste to have multiple tools create
      multiple perf_events of "cycles" and occupy multiple PMCs.
      
      bperf tries to reduce such waste by allowing multiple perf_events of
      "cycles" or "instructions" (at different scopes) to share PMUs. Instead
      of having each perf-stat session read its own perf_events, bperf uses
      BPF programs to read the perf_events and aggregate the readings into
      BPF maps. The perf-stat session(s) then read the values from these BPF
      maps.
      
      Please refer to the comment before the definition of bperf_ops for the
      description of bperf architecture.
      
      bperf is off by default. To enable it, pass --bpf-counters option to
      perf-stat. bperf uses a BPF hashmap to share information about BPF
      programs and maps used by bperf. This map is pinned to bpffs. The
      default path is /sys/fs/bpf/perf_attr_map. The user could change the
      path with option --bpf-attr-map.
      
      Committer testing:
      
        # dmesg|grep "Performance Events" -A5
        [    0.225277] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
        [    0.225280] ... version:                0
        [    0.225280] ... bit width:              48
        [    0.225281] ... generic registers:      6
        [    0.225281] ... value mask:             0000ffffffffffff
        [    0.225281] ... max period:             00007fffffffffff
        #
        #  for a in $(seq 6) ; do perf stat -a -e cycles,instructions sleep 100000 & done
        [1] 2436231
        [2] 2436232
        [3] 2436233
        [4] 2436234
        [5] 2436235
        [6] 2436236
        # perf stat -a -e cycles,instructions sleep 0.1
      
         Performance counter stats for 'system wide':
      
               310,326,987      cycles                                                        (41.87%)
               236,143,290      instructions              #    0.76  insn per cycle           (41.87%)
      
               0.100800885 seconds time elapsed
      
        #
      
      We can see that the counters were enabled for this workload 41.87% of
      the time.
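The percentage in parentheses comes from multiplexing: when an event cannot stay on a PMC the whole time, perf tracks how long the event was enabled and how long it was actually running, and scales the raw count by the ratio. A minimal sketch of that arithmetic follows; the function name `scale_count` is hypothetical and the numbers in the usage line are illustrative, not taken from the run above.

```python
def scale_count(raw, time_enabled, time_running):
    """Estimate a full-period count from a multiplexed event.

    Returns (scaled_count, running_pct), where running_pct is the share of
    the enabled time during which the event actually occupied a counter.
    """
    if time_running == 0:
        # The event never got onto a counter; nothing can be estimated.
        return 0, 0.0
    scaled = raw * time_enabled / time_running
    running_pct = 100.0 * time_running / time_enabled
    return round(scaled), running_pct


# Illustrative values: the event ran for half of the time it was enabled.
print(scale_count(1000, 200, 100))
```

The scaled value is an estimate, which is why sharing counters instead of multiplexing them gives more trustworthy numbers.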
      
      Now with --bpf-counters:
      
        #  for a in $(seq 32) ; do perf stat --bpf-counters -a -e cycles,instructions sleep 100000 & done
        [1] 2436514
        [2] 2436515
        [3] 2436516
        [4] 2436517
        [5] 2436518
        [6] 2436519
        [7] 2436520
        [8] 2436521
        [9] 2436522
        [10] 2436523
        [11] 2436524
        [12] 2436525
        [13] 2436526
        [14] 2436527
        [15] 2436528
        [16] 2436529
        [17] 2436530
        [18] 2436531
        [19] 2436532
        [20] 2436533
        [21] 2436534
        [22] 2436535
        [23] 2436536
        [24] 2436537
        [25] 2436538
        [26] 2436539
        [27] 2436540
        [28] 2436541
        [29] 2436542
        [30] 2436543
        [31] 2436544
        [32] 2436545
        #
        # ls -la /sys/fs/bpf/perf_attr_map
        -rw-------. 1 root root 0 Mar 23 14:53 /sys/fs/bpf/perf_attr_map
        # bpftool map | grep bperf | wc -l
        64
        #
      
        # bpftool map | tail
        1265: percpu_array  name accum_readings  flags 0x0
        	key 4B  value 24B  max_entries 1  memlock 4096B
        1266: hash  name filter  flags 0x0
        	key 4B  value 4B  max_entries 1  memlock 4096B
        1267: array  name bperf_fo.bss  flags 0x400
        	key 4B  value 8B  max_entries 1  memlock 4096B
        	btf_id 996
        	pids perf(2436545)
        1268: percpu_array  name accum_readings  flags 0x0
        	key 4B  value 24B  max_entries 1  memlock 4096B
        1269: hash  name filter  flags 0x0
        	key 4B  value 4B  max_entries 1  memlock 4096B
        1270: array  name bperf_fo.bss  flags 0x400
        	key 4B  value 8B  max_entries 1  memlock 4096B
        	btf_id 997
        	pids perf(2436541)
        1285: array  name pid_iter.rodata  flags 0x480
        	key 4B  value 4B  max_entries 1  memlock 4096B
        	btf_id 1017  frozen
        	pids bpftool(2437504)
        1286: array  flags 0x0
        	key 4B  value 32B  max_entries 1  memlock 4096B
        #
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        8f f3 bc ca 00 00 00 00  80 fd 2a d1 4d 00 00 00
        80 fd 2a d1 4d 00 00 00
        value (CPU 22):
        7e d5 64 4d 00 00 00 00  a4 8a 2e ee 4d 00 00 00
        a4 8a 2e ee 4d 00 00 00
        value (CPU 23):
        a7 78 3e 06 01 00 00 00  b2 34 94 f6 4d 00 00 00
        b2 34 94 f6 4d 00 00 00
        Found 1 element
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        c6 8b d9 ca 00 00 00 00  20 c6 fc 83 4e 00 00 00
        20 c6 fc 83 4e 00 00 00
        value (CPU 22):
        9c b4 d2 4d 00 00 00 00  3e 0c df 89 4e 00 00 00
        3e 0c df 89 4e 00 00 00
        value (CPU 23):
        18 43 66 06 01 00 00 00  5b 69 ed 83 4e 00 00 00
        5b 69 ed 83 4e 00 00 00
        Found 1 element
        # bpftool map dump id 1268 | tail
        value (CPU 21):
        f2 6e db ca 00 00 00 00  92 67 4c ba 4e 00 00 00
        92 67 4c ba 4e 00 00 00
        value (CPU 22):
        dc 8e e1 4d 00 00 00 00  d9 32 7a c5 4e 00 00 00
        d9 32 7a c5 4e 00 00 00
        value (CPU 23):
        bd 2b 73 06 01 00 00 00  7c 73 87 bf 4e 00 00 00
        7c 73 87 bf 4e 00 00 00
        Found 1 element
        #
      
        # perf stat --bpf-counters -a -e cycles,instructions sleep 0.1
      
         Performance counter stats for 'system wide':
      
             119,410,122      cycles
             152,105,479      instructions              #    1.27  insn per cycle
      
             0.101395093 seconds time elapsed
      
        #
      
      See? We had the counters enabled all the time.
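The 24-byte per-CPU values dumped from the accum_readings map above can be decoded as three 64-bit integers. The sketch below assumes the value has the layout of struct bpf_perf_event_value (counter, enabled, running) and that the host is little-endian; both are assumptions, and the function name `decode_reading` is hypothetical. The hex strings are the CPU 21 values from the first two dumps.

```python
import struct


def decode_reading(hex_bytes):
    """Decode one 24-byte map value into (counter, enabled, running).

    Assumes three little-endian u64 fields, as in struct
    bpf_perf_event_value.
    """
    raw = bytes.fromhex("".join(hex_bytes.split()))
    if len(raw) != 24:
        raise ValueError("expected 24 bytes, got %d" % len(raw))
    return struct.unpack("<QQQ", raw)


CPU21_FIRST = ("8f f3 bc ca 00 00 00 00  80 fd 2a d1 4d 00 00 00 "
               "80 fd 2a d1 4d 00 00 00")
CPU21_SECOND = ("c6 8b d9 ca 00 00 00 00  20 c6 fc 83 4e 00 00 00 "
                "20 c6 fc 83 4e 00 00 00")

first = decode_reading(CPU21_FIRST)
second = decode_reading(CPU21_SECOND)
print("counter delta:", second[0] - first[0])
print("enabled == running:", first[1] == first[2] and second[1] == second[2])
```

Under these assumptions the second and third fields are equal in each dump, which is consistent with the counters being enabled all the time.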
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: kernel-team@fb.com
      Link: http://lore.kernel.org/lkml/20210316211837.910506-2-songliubraving@fb.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix various typos in comments · 4d39c89f
      Ingo Molnar authored
      Fix ~124 single-word typos and a few spelling errors in the perf tooling code,
      accumulated over the years.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com
      Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 17 Mar, 2021 3 commits
  6. 15 Mar, 2021 5 commits
    • perf stat: Improve readability of shadow stats · 6859bc0e
      Changbin Du authored
      Add the function convert_unit_double() and use it to select the
      appropriate unit prefix (K, M or G) for shadow stats.
      
        $ sudo perf stat -a -- sleep 1
      
      Before: Unit 'M' is selected even when the number is very small.
      
       Performance counter stats for 'system wide':
      
                4,003.06 msec cpu-clock                 #    3.998 CPUs utilized
                  16,179      context-switches          #    0.004 M/sec
                     161      cpu-migrations            #    0.040 K/sec
                   4,699      page-faults               #    0.001 M/sec
           6,135,801,925      cycles                    #    1.533 GHz                      (83.21%)
           5,783,308,491      stalled-cycles-frontend   #   94.26% frontend cycles idle     (83.21%)
           4,543,694,050      stalled-cycles-backend    #   74.05% backend cycles idle      (66.49%)
           4,720,130,587      instructions              #    0.77  insn per cycle
                                                        #    1.23  stalled cycles per insn  (83.28%)
             753,848,078      branches                  #  188.318 M/sec                    (83.61%)
              37,457,747      branch-misses             #    4.97% of all branches          (83.48%)
      
             1.001283725 seconds time elapsed
      
      After:
      
        $ sudo perf stat -a -- sleep 2
      
       Performance counter stats for 'system wide':
      
                8,005.52 msec cpu-clock                 #    3.999 CPUs utilized
                  10,715      context-switches          #    1.338 K/sec
                     785      cpu-migrations            #   98.057 /sec
                     102      page-faults               #   12.741 /sec
           1,948,202,279      cycles                    #    0.243 GHz
           2,816,470,932      stalled-cycles-frontend   #  144.57% frontend cycles idle
           2,661,172,207      stalled-cycles-backend    #  136.60% backend cycles idle
             464,172,105      instructions              #    0.24  insn per cycle
                                                        #    6.07  stalled cycles per insn
              91,567,662      branches                  #   11.438 M/sec
               7,756,054      branch-misses             #    8.47% of all branches
      
             2.002040043 seconds time elapsed
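The unit selection can be pictured as repeatedly dividing by 1000 until the value is small enough to read. The sketch below is an illustration in Python, not the C code added by the patch: the names `pick_unit` and `format_rate` are hypothetical, the exact threshold used by perf is an assumption, and the input rates are chosen to reproduce three of the "After" lines above.

```python
def pick_unit(rate):
    """Scale a per-second rate down by 1000 until it is below 1000.

    Returns (scaled_value, prefix) where prefix is '', 'K', 'M' or 'G'.
    """
    prefix = ""
    for candidate in ("K", "M", "G"):
        if rate < 1000.0:
            break
        rate /= 1000.0
        prefix = candidate
    return rate, prefix


def format_rate(rate):
    """Format a rate the way the shadow-stat column shows it."""
    value, prefix = pick_unit(rate)
    return f"{value:.3f} {prefix}/sec"


print(format_rate(1338.0))        # context-switches
print(format_rate(98.057))        # cpu-migrations
print(format_rate(11_438_000.0))  # branches
```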
      
      v2:
        o do not change 'sec' to 'cpu-sec'.
        o use convert_unit_double to implement convert_unit.
      Signed-off-by: Changbin Du <changbin.du@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20210315143047.3867-1-changbin.du@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Elaborate use cases for the -n/--null command line option · 4a03af3e
      Arnaldo Carvalho de Melo authored
      The existing text was far too terse; pick up the intended usage from
      the changeset that introduced this option.
      
      Twitter: https://twitter.com/_monoid/status/1371461130175004672?s=20
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Add Fujitsu A64FX pmu event · 5497b23e
      Shunsuke Nakamura authored
      Add pmu events for A64FX.
      
      Documentation source:
      
        https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_PMU_Events_v1.2.pdf
      Signed-off-by: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
      Reviewed-by: John Garry <john.garry@huawei.com>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210308105342.746940-3-nakamura.shun@fujitsu.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events arm64: Add more common and uarch events · 8efd1634
      Shunsuke Nakamura authored
      Add the following events.[1]
      
      Common architectural events:
        - L2I_TLB_REFILL
        - L2I_TLB
        - SIMD_INST_RETIRED
        - SVE_INST_RETIRED
      
      Common microarchitectural events:
        - UOP_SPEC
        - SVE_MATH_SPEC
        - FP_SPEC
        - FP_FMA_SPEC
        - FP_RECPE_SPEC
        - FP_CVT_SPEC
        - ASE_SVE_INT_SPEC
        - SVE_PRED_SPEC
        - SVE_MOVPRFX_SPEC
        - SVE_MOVPRFX_U_SPEC
        - ASE_SVE_LD_SPEC
        - ASE_SVE_ST_SPEC
        - PRF_SPEC
        - BASE_LD_REG_SPEC
        - BASE_ST_REG_SPEC
        - SVE_LDR_REG_SPEC
        - SVE_STR_REG_SPEC
        - SVE_LDR_PREG_SPEC
        - SVE_STR_PREG_SPEC
        - SVE_PRF_CONTIG_SPEC
        - ASE_SVE_LD_MULTI_SPEC
        - ASE_SVE_ST_MULTI_SPEC
        - SVE_LD_GATHER_SPEC
        - SVE_ST_SCATTER_SPEC
        - SVE_PRF_GATHER_SPEC
        - SVE_LDFF_SPEC
        - FP_SCALE_OPS_SPEC
        - FP_FIXED_OPS_SPEC
        - FP_HP_SCALE_OPS_SPEC
        - FP_HP_FIXED_OPS_SPEC
        - FP_SP_SCALE_OPS_SPEC
        - FP_SP_FIXED_OPS_SPEC
        - FP_DP_SCALE_OPS_SPEC
        - FP_DP_FIXED_OPS_SPEC
      
      Reference document is at the following:
      
        [1] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_PMU_Events_v1.2.pdf
      Signed-off-by: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com>
      Reviewed-by: John Garry <john.garry@huawei.com>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lore.kernel.org/lkml/20210308105342.746940-2-nakamura.shun@fujitsu.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Change the COMM when preparing the workload · a7672d1d
      Arnaldo Carvalho de Melo authored
      It was reported that --exclude-perf wasn't working, as tracepoints were
      appearing in 'perf script' output with the 'perf' COMM. Those events
      come from the window in evlist__prepare_workload() after the fork() and
      before the execvp() call for workloads specified on the command line.
      
      Example:
      
        # perf record -e kmem:kmalloc --filter 'bytes_alloc<650 && bytes_alloc>620' --exclude-perf -e kmem:kfree --exclude-perf -aR sleep 30
      
      Then:
      
        # perf script
                perf 15905 [009] 1498.356094: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
                perf 15905 [009] 1498.356116: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
                perf 15905 [009] 1498.356116: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00
                perf 15905 [009] 1498.356138: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
                perf 15905 [009] 1498.356148: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
                perf 15905 [009] 1498.356148: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00
                perf 15905 [009] 1498.356168: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
                perf 15905 [009] 1498.356176: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
        <SNIP>
                perf 15905 [009] 1498.356348: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
                perf 15905 [014] 1498.356386: kmem:kfree: call_site=security_compute_sid.part.0+0x3b2 ptr=(nil)
                perf 15905 [014] 1498.356423: kmem:kfree: call_site=load_elf_binary+0x207 ptr=0xffff9cf5b2a34220
                perf 15905 [014] 1498.356694: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf6d0b3b000
               sleep 15905 [014] 1498.356739: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
      
      Use prctl() to change the COMM so that it is clear those events are
      just the preparation of the workload:
      
        # perf script
           perf-exec 19036 [009] 2199.357582: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
           perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
           perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf786459800
           perf-exec 19036 [009] 2199.357630: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
        <SNIP>
           perf-exec 19036 [000] 2199.358277: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786fb9c00
           perf-exec 19036 [000] 2199.358278: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458200
           perf-exec 19036 [000] 2199.358279: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458600
               sleep 19036 [000] 2199.358316: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
               sleep 19036 [000] 2199.358323: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
               sleep 19036 [000] 2199.358330: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
               sleep 19036 [000] 2199.358337: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
               sleep 19036 [000] 2199.358339: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
               sleep 19036 [000] 2199.358341: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000
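The idea of renaming the child between fork() and exec can be sketched outside perf as well. The following Python illustration is not the perf code, which is C; it assumes a Linux system with /proc mounted, calls the real prctl(2) with PR_SET_NAME through ctypes, and the helper names `set_comm`, `get_comm` and `run_workload` are hypothetical.

```python
import ctypes
import os

PR_SET_NAME = 15  # value from <linux/prctl.h>


def set_comm(name):
    """Set the calling thread's COMM via prctl(PR_SET_NAME)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_NAME, ctypes.c_char_p(name.encode()), 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_NAME) failed")


def get_comm():
    """Read back the COMM of the current process from /proc."""
    with open("/proc/self/comm") as f:
        return f.read().strip()


def run_workload(argv):
    """Fork, rename the child to 'perf-exec', then exec the workload."""
    pid = os.fork()
    if pid == 0:
        try:
            # Anything traced in this window is attributed to 'perf-exec'.
            set_comm("perf-exec")
            os.execvp(argv[0], argv)  # on success the COMM becomes argv[0]
        finally:
            os._exit(127)  # only reached if set_comm() or execvp() failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

After a successful exec the kernel replaces the COMM with the name of the new program, which matches the switch from 'perf-exec' to 'sleep' in the output above.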
      
      Reporter: zhanweiw <wingfancy@hotmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212213
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 09 Mar, 2021 4 commits
  8. 08 Mar, 2021 3 commits
    • perf symbols: Fix dso__fprintf_symbols_by_name() to return the number of printed chars · 210e4c89
      Arnaldo Carvalho de Melo authored
      The 'ret' variable was initialized to zero but was never updated with
      the fprintf() return value. Fix it.
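The pattern being fixed is an accumulator that must be updated on every write. A small Python analogue follows, for illustration only: the function name `fprintf_symbols_by_name` and the sample symbol names are hypothetical, and the real function is C code operating on a dso.

```python
import io


def fprintf_symbols_by_name(names, fp):
    """Write the names in ascending order, one per line.

    Returns the total number of characters written, mirroring a C
    function that sums the return values of fprintf().
    """
    ret = 0
    for name in sorted(names):
        # Forgetting the '+=' here would leave ret at zero, which is the
        # kind of bug described above.
        ret += fp.write(f"{name}\n")
    return ret


buf = io.StringIO()
printed = fprintf_symbols_by_name(["main", "exit"], buf)
print(printed, repr(buf.getvalue()))
```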
      Reported-by: Yang Li <yang.lee@linux.alibaba.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Fixes: 90f18e63 ("perf symbols: List symbols in a dso in ascending name order")
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools include: Add __sum16 and __wsum definitions. · 2942a671
      Ian Rogers authored
      This adds definitions available in the uapi version.
      
      Explanation:
      
      In the kernel include of types.h the uapi version is included.
      In tools the uapi/linux/types.h and linux/types.h are distinct.
      For BPF programs a definition of __wsum is needed by the generated
      bpf_helpers.h. The definition comes either from a generated vmlinux.h or
      from <linux/types.h> that may be transitively included from bpf.h. The
      perf build prefers linux/types.h over uapi/linux/types.h for
      <linux/types.h>*. These definitions are necessary to allow
      tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c to compile with the
      same include path used for perf.
      
      There is likely a wider conversation about exactly how types.h should be
      specified and the include order used by the perf build - it is somewhat
      confusing that tools/include/uapi/linux/bpf.h is using the non-uapi
      types.h.
      
      *see tools/perf/Makefile.config:
      ...
      INC_FLAGS += -I$(srctree)/tools/include/
      INC_FLAGS += -I$(srctree)/tools/arch/$(SRCARCH)/include/uapi
      INC_FLAGS += -I$(srctree)/tools/include/uapi
      ...
      The include directories are scanned from left-to-right:
      https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html
      As tools/include/linux/types.h appears before
      tools/include/uapi/linux/types.h, the former is preferred.
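The first-match behaviour of the include search can be modelled in a few lines. This Python sketch is only a model of the lookup rule, not of the compiler: the helper names `resolve_include` and `demo` are hypothetical, and the demo creates empty placeholder headers in a temporary directory, leaving out the arch-specific directory for brevity.

```python
import os
import tempfile


def resolve_include(header, include_dirs):
    """Return the first path under include_dirs that contains header.

    Directories are tried in the order given, as with a sequence of -I
    options; None is returned if no directory has the header.
    """
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Build two include directories and resolve <linux/types.h>."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "tools/include")
        second = os.path.join(root, "tools/include/uapi")
        for directory in (first, second):
            os.makedirs(os.path.join(directory, "linux"))
            open(os.path.join(directory, "linux", "types.h"), "w").close()
        found = resolve_include("linux/types.h", [first, second])
        return os.path.relpath(found, root)


print(demo())
```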
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: bpf@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210307223024.4081067-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Arnaldo Carvalho de Melo's avatar
      Merge remote-tracking branch 'torvalds/master' into perf/core · 009ef05f
      Arnaldo Carvalho de Melo authored
      To pick up the fixes sent for v5.12 and continue development based on
      v5.12-rc2, i.e. without the swap on file bug.
      
      This also gets a slightly newer and better version of the
      tools/perf/arch/arm/util/cs-etm.c patch, one that uses the BIT() macro.
      It had been slated for v5.13, but an older version ended up going into
      v5.12-rc1.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 07 Mar, 2021 4 commits
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07' of... · 144c79ef
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
       "Perf tool fixes:
      
         - Fix wrong skipping for per-die aggregation in 'perf stat'.
      
         - Fix race in signal handling on large core count machines, setting
           up signal handlers earlier.
      
         - Fix -F for branch & mem modes in 'perf report'.
      
         - Fix the condition checks for max number of NUMA nodes in 'perf
           bench numa'.
      
         - Fix crash in 'perf diff' error path.
      
         - Fix filtering of empty build-ids in 'perf archive'.
      
         - Ensure cmdlines read from libtraceevent are null terminated.
      
        Recent regressions:
      
         - Fix control fifo permissions in 'perf daemon'.
      
         - Fix 'perf daemon' compile error with ASAN.
      
         - Fix running 'perf daemon' test for non root user.
      
         - Fix PERF_SAMPLE_WEIGHT_STRUCT 'perf test' failure on non-x86
           arches.
      
         - Fix event's PMU name parsing related to new drm/i915/gt
           software-gt-awake-time event.
      
        Fixes from compiler instrumentation:
      
         - Fix leaks in 'perf test' entries, found using ASAN.
      
         - Fix use-after-free when 'perf stat -r' option is used.
      
        Arch specific:
      
         - Fix bitmap for option on ARM's CS-ETM.
      
        Documentation:
      
         - Fix documentation of verbose options.
      
        Build:
      
         - Clean 'generated' directory used for creating the syscall table on
           x86.
      
         - Fix ccache usage in $(CC) when generating arch errno table.
      
         - Cast (struct timeval).tv_sec when printing, fixing the build with
           MUSL libc.
      
         - Tighten snprintf() string precision to pass gcc check on some
           32-bit arches.
      
         - Update UAPI copies from the kernel sources.
      
         - Fix regression on feature detection 'make clean' target"
      
      * tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (44 commits)
        perf cs-etm: Fix bitmap for option
        perf trace: Fix race in signal handling
        perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
        perf report: Fix -F for branch & mem modes
        perf tests x86: Move insn.h include to make sure it finds stddef.h
        perf test: Support the ins_lat check in the X86 specific test
        perf test: Fix sample-parsing failure on non-x86 platforms
        perf archive: Fix filtering of empty build-ids
        perf daemon: Fix compile error with Asan
        perf stat: Fix use-after-free when -r option is used
        libperf: Add perf_evlist__reset_id_hash()
        perf stat: Fix wrong skipping for per-die aggregation
        tools headers UAPI: Sync KVM's kvm.h and vmx.h headers with the kernel sources
        tools headers cpufeatures: Sync with the kernel sources
        tools headers UAPI: Update tools' copy of linux/coresight-pmu.h
        tools headers: Update syscall.tbl files to support mount_setattr
        perf test: Fix cpu and thread map leaks in perf_time_to_tsc test
        perf test: Fix cpu map leaks in cpu_map_print test
        perf test: Fix a memory leak in thread_map_remove test
        perf test: Fix a thread map leak in thread_map_synthesize test
        ...
    • Linus Torvalds's avatar
      Merge branch 'parisc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 3bb48a85
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "Two small parisc architecture fixes: fix a linking failure reported by
        the kernel test robot and remove a duplicate include"
      
      * 'parisc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        arch/parisc/kernel: remove duplicate include in ptrace
        parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · fbda7904
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "One non-fix, the conversion of vio_driver->remove() to return void,
        which touches various powerpc specific drivers.
      
        Fix the privilege checks we do in our perf handling, which could cause
        soft/hard lockups in some configurations.
      
        Fix a bug with IRQ affinity seen on kdump kernels when CPU 0 is
        offline in the second kernel.
      
        Fix missed page faults after mprotect(..., PROT_NONE) on 603 (32-bit).
      
        Fix a bug in our VSX (vector) instruction emulation, which should only
        be seen when doing VSX ops to cache inhibited mappings.
      
        Three commits fixing various build issues with obscure configurations.
      
        Thanks to Athira Rajeev, Cédric Le Goater, Christophe Leroy, Christoph
        Plattner, Greg Kurz, Jordan Niethe, Laurent Vivier, Ravi Bangoria,
        Tyrel Datwyler, and Uwe Kleine-König"
      
      * tag 'powerpc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/sstep: Fix VSX instruction emulation
        powerpc/perf: Fix handling of privilege level checks in perf interrupt context
        powerpc: Force inlining of mmu_has_feature to fix build failure
        vio: make remove callback return void
        powerpc/syscall: Force inlining of __prep_irq_for_enabled_exit()
        powerpc/603: Fix protection of user pages mapped with PROT_NONE
        powerpc/pseries: Don't enforce MSI affinity with kdump
        powerpc/4xx: Fix build errors from mfdcr()
    • Linus Torvalds's avatar
      Merge tag 'm68k-for-v5.12-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · dac51870
      Linus Torvalds authored
      Pull m68k fix from Geert Uytterhoeven:
       "Fix virt_addr_valid() W=1 compiler warnings.
      
        This is a single non-critical fix. As the build bots are now testing
        all new code with W=1, these warnings are popping up everywhere,
        confusing people. Hence I think it makes sense to silence it as soon
        as possible"
      
      * tag 'm68k-for-v5.12-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k: Fix virt_addr_valid() W=1 compiler warnings
  10. 06 Mar, 2021 8 commits