- 26 Mar, 2021 4 commits
-
-
Athira Rajeev authored
The sort dimension "p_stage_cyc" is used to represent pipeline stage cycle information. Presently, this is used only on powerpc. For unsupported platforms, we don't want to display it in the perf report output columns. Hence add a check in sort_dimension__add() and skip the sort key in case it is not applicable for the particular arch. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-6-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Athira Rajeev authored
The pipeline stage cycle details can be recorded on powerpc from the contents of Performance Monitor Unit (PMU) registers. On ISA v3.1 platforms, the sampling registers expose the cycles spent in different pipeline stages. This patch adds perf tools support to present two of these cycle counters along with memory latency (weight). Re-use the field 'ins_lat' for storing the first pipeline stage cycle. This is stored in the 'var2_w' field of 'perf_sample_weight'. Add a new field 'p_stage_cyc' to store the second pipeline stage cycle, which is stored in the 'var3_w' field of perf_sample_weight. Add a new sort function 'Pipeline Stage Cycle' and include this in default_mem_sort_order[]. This new sort function may be used to denote some other pipeline stage in another architecture, so add it to the list of sort entries that can have a dynamic header string. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Athira Rajeev authored
Add arch specific arch_evsel__set_sample_weight() to set the new sample type for powerpc. Add arch specific arch_perf_parse_sample_weight() to store the sample->weight values depending on the sample type applied. If the new sample type (PERF_SAMPLE_WEIGHT_STRUCT) is applied, store only the lower 32 bits to sample->weight. If the sample type is 'PERF_SAMPLE_WEIGHT', store the full 64 bits to sample->weight. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-4-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
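The weight handling described in this commit can be sketched roughly as follows. This is an illustration in Python, not the perf C code: the function name parse_sample_weight is hypothetical, the two sample-type bit values are quoted from memory of the perf_event UAPI header, and the bit positions of var2_w and var3_w (fields named in the neighbouring pipeline stage commit) are an assumption about the 64-bit layout.

```python
# Sample-type bits as recalled from the perf_event UAPI header (assumption).
PERF_SAMPLE_WEIGHT = 1 << 14
PERF_SAMPLE_WEIGHT_STRUCT = 1 << 24


def parse_sample_weight(raw, sample_type):
    """Return (weight, ins_lat, p_stage_cyc) for one raw 64-bit weight value.

    Hypothetical helper: with PERF_SAMPLE_WEIGHT_STRUCT only the lower
    32 bits are the weight; with PERF_SAMPLE_WEIGHT all 64 bits are.
    """
    raw &= 0xFFFFFFFFFFFFFFFF
    if sample_type & PERF_SAMPLE_WEIGHT_STRUCT:
        weight = raw & 0xFFFFFFFF            # lower 32 bits
        ins_lat = (raw >> 32) & 0xFFFF       # assumed position of var2_w
        p_stage_cyc = (raw >> 48) & 0xFFFF   # assumed position of var3_w
        return weight, ins_lat, p_stage_cyc
    # Plain PERF_SAMPLE_WEIGHT: the full 64-bit value is the weight.
    return raw, 0, 0


print(parse_sample_weight(0x0003000200000001, PERF_SAMPLE_WEIGHT_STRUCT))
```

Keeping both branches in one helper mirrors the idea that the same raw value is interpreted differently depending on which sample type was requested.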
-
Athira Rajeev authored
Currently the header string for different columns in perf report is fixed. Some fields of a perf sample could have a different meaning on different architectures than the meaning conveyed by the header string. An example is the new field 'var2_w' of the perf_sample_weight structure. This is presently captured as 'Local INSTR Latency' in perf mem report. But this could be used to denote a different latency cycle in another architecture. Introduce a weak function arch_perf_header_entry() to set the arch specific header string for the fields that can have a dynamic header. If the architecture does not have this function, fall back to the default header string value. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-3-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
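The fallback behaviour described above (use an arch specific header if one exists, otherwise the default) can be pictured with a small lookup. This is only a Python analogy of the weak-function idea, not the perf implementation: header_entry, the key names, the arch name "examplearch" and its override string are all hypothetical.

```python
# Default header strings; the two strings themselves appear in the commit
# messages, the dictionary keys are illustrative names.
DEFAULT_HEADERS = {
    "local_ins_lat": "Local INSTR Latency",
    "p_stage_cyc": "Pipeline Stage Cycle",
}

# Hypothetical per-arch overrides, standing in for an arch that provides
# its own arch_perf_header_entry().
ARCH_HEADERS = {
    "examplearch": {"local_ins_lat": "Example Stage Cycle"},
}


def header_entry(arch, sort_key):
    """Return the arch specific header if present, else the default one."""
    return ARCH_HEADERS.get(arch, {}).get(sort_key, DEFAULT_HEADERS[sort_key])


print(header_entry("examplearch", "local_ins_lat"))
print(header_entry("otherarch", "local_ins_lat"))
```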
-
- 25 Mar, 2021 2 commits
-
-
Wan Jiabing authored
sys/stat.h has been included at line 23, so remove the duplicate one at line 27. linux/string.h has been included at line 7, so remove the duplicate one at line 9. time.h has been included at line 14, so remove the duplicate one at line 28. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kael_w@yeah.net Link: http://lore.kernel.org/lkml/20210323050139.287461-1-wanjiabing@vivo.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Wan Jiabing authored
'struct evlist' has been declared at the 10th line. 'struct comm' has been declared at the 15th line. Remove the duplicates. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kael_w@yeah.net Link: http://lore.kernel.org/lkml/20210325043947.846093-1-wanjiabing@vivo.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 24 Mar, 2021 3 commits
-
-
Tiezhu Yang authored
Add the entry "L: linux-perf-users@vger.kernel.org" to archive the related mail on https://lore.kernel.org/linux-perf-users/, and add the entry "W: https://perf.wiki.kernel.org/" so that newcomers can find some useful materials. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1615780592-21838-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
The patch "perf stat: Align CSV output for summary mode" aligned CSV output and added "summary" to the first column of summary lines. This test checks that the "summary" string is added to the CSV output. If the '--no-csv-summary' option is set, the "summary" string is not added, so check this case as well. Committer testing: $ perf test csv 84: perf stat csv summary test : Ok $ Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210319070156.20394-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
The 'perf stat' subcommand supports the request for a summary of the interval counter readings. But the summary lines break the CSV output so it's hard for scripts to parse the result. Before: # perf stat -x, -I1000 --interval-count 1 --summary 1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized 1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec 1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec 1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec 1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz 1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle 1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec 1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches 8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized 270,,context-switches,8013513297,100.00,0.034,K/sec 13,,cpu-migrations,8013530032,100.00,0.002,K/sec 184,,page-faults,8013546992,100.00,0.023,K/sec 20574191,,cycles,8013551506,100.00,0.003,GHz 10562267,,instructions,8013564958,100.00,0.51,insn per cycle 2019244,,branches,8013575673,100.00,0.252,M/sec 106152,,branch-misses,8013585776,100.00,5.26,of all branches The summary line loses the timestamp column, which breaks the CSV output. We add a column at the original 'timestamp' position and it just says 'summary' for the summary line. 
After: # perf stat -x, -I1000 --interval-count 1 --summary 1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized 1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec 1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec 1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec 1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz 1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle 1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec 1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized summary,218,,context-switches,8012753271,100.00,0.027,K/sec summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec summary,0,,page-faults,8012786257,100.00,0.000,K/sec summary,15004518,,cycles,8012790637,100.00,0.002,GHz summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle summary,1590259,,branches,8012814766,100.00,0.198,M/sec summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches Now it's easy for scripts to analyse the summary lines. To avoid breaking existing scripts, they can continue to use the old, broken CSV format by passing the new '--no-csv-summary' option.
# perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary 1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized 1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec 1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec 1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec 1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz 1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle 1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec 1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches 8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized 197,,context-switches,8012703742,100.00,24.586,/sec 9,,cpu-migrations,8012720902,100.00,1.123,/sec 644,,page-faults,8012738266,100.00,80.373,/sec 18350698,,cycles,8012744109,100.00,0.002,GHz 12745021,,instructions,8012759001,100.00,0.69,insn per cycle 2458033,,branches,8012770864,100.00,306.768,K/sec 102107,,branch-misses,8012781751,100.00,4.15,of all branches This option can be enabled in perf config by setting the variable 'stat.no-csv-summary'. 
# perf config stat.no-csv-summary=true # perf config -l stat.no-csv-summary=true # perf stat -x, -I1000 --interval-count 1 --summary 1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized 1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec 1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec 1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec 1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz 1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle 1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec 1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches 8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized 205,,context-switches,8013308394,100.00,25.583,/sec 10,,cpu-migrations,8013324681,100.00,1.248,/sec 0,,page-faults,8013340926,100.00,0.000,/sec 8027742,,cycles,8013344503,100.00,0.001,GHz 2871717,,instructions,8013356501,100.00,0.36,insn per cycle 553564,,branches,8013366204,100.00,69.081,K/sec 54021,,branch-misses,8013375952,100.00,9.76,of all branches Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
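With the "summary" marker in the first column, a script can separate interval rows from summary rows without guessing from the column count. The following is a minimal Python sketch of such a consumer; the function name split_perf_stat_csv is hypothetical and the sample rows are copied from the "After" output above.

```python
import csv
import io


def split_perf_stat_csv(text):
    """Split 'perf stat -x,' output into (interval_rows, summary_rows).

    Summary rows are recognised by the literal 'summary' in the first
    column; that marker column is dropped from the returned summary rows.
    """
    intervals, summary = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0] == "summary":
            summary.append(row[1:])
        else:
            intervals.append(row)
    return intervals, summary


SAMPLE = """\
1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
summary,218,,context-switches,8012753271,100.00,0.027,K/sec
"""

intervals, summary = split_perf_stat_csv(SAMPLE)
for value, unit, event, *_ in summary:
    print(event, value, unit)
```

A script written against the old format, where summary rows simply have one column fewer, would keep working only with '--no-csv-summary' or the 'stat.no-csv-summary' config variable.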
-
- 23 Mar, 2021 4 commits
-
-
Song Liu authored
Add a test to compare the output of perf-stat with and without the option --bpf-counters. If the difference is more than 10%, the test is considered failed. Committer testing: # perf test bpf-counters 86: perf stat --bpf-counters test : Ok # perf test -v bpf-counters 86: perf stat --bpf-counters test : --- start --- test child forked, pid 2433339 test child finished with 0 ---- end ---- perf stat --bpf-counters test: Ok # Signed-off-by: Song Liu <songliubraving@fb.com> Requested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/EC00E37D-8587-4662-8E30-7AD5F874FA84@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
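The pass/fail rule stated above (a difference of more than 10% fails) can be written as a small relative-difference check. This is a sketch of the idea only; how the actual test script computes the difference is not shown in the commit message, and within_tolerance is a hypothetical name.

```python
def within_tolerance(base, other, tolerance=0.10):
    """Return True if 'other' differs from 'base' by at most 'tolerance'.

    The difference is taken relative to 'base' (an assumption; the real
    test may normalise differently).
    """
    if base == 0:
        return other == 0
    return abs(other - base) / abs(base) <= tolerance


# Example: counts from a run without and with --bpf-counters (made-up numbers).
print(within_tolerance(1_000_000, 1_050_000))
print(within_tolerance(1_000_000, 1_200_000))
```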
-
Song Liu authored
Take measurements of 't0' and 'ref_time' after enable_counters(), so that they only measure the time consumed when the counters are enabled. Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Andi Kleen <andi@firstfloor.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: kernel-team@fb.com Link: http://lore.kernel.org/lkml/20210316211837.910506-3-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Song Liu authored
The perf tool uses performance monitoring counters (PMCs) to monitor system performance. The PMCs are limited hardware resources. For example, Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu. Modern data center systems use these PMCs in many different ways: system level monitoring, (maybe nested) container level monitoring, per process monitoring, profiling (in sample mode), etc. In some cases, there are more active perf_events than available hardware PMCs. To allow all perf_events to have a chance to run, it is necessary to do expensive time multiplexing of events. On the other hand, many monitoring tools count the common metrics (cycles, instructions). It is a waste to have multiple tools create multiple perf_events of "cycles" and occupy multiple PMCs. bperf tries to reduce such wastes by allowing multiple perf_events of "cycles" or "instructions" (at different scopes) to share PMUs. Instead of having each perf-stat session to read its own perf_events, bperf uses BPF programs to read the perf_events and aggregate readings to BPF maps. Then, the perf-stat session(s) reads the values from these BPF maps. Please refer to the comment before the definition of bperf_ops for the description of bperf architecture. bperf is off by default. To enable it, pass --bpf-counters option to perf-stat. bperf uses a BPF hashmap to share information about BPF programs and maps used by bperf. This map is pinned to bpffs. The default path is /sys/fs/bpf/perf_attr_map. The user could change the path with option --bpf-attr-map. Committer testing: # dmesg|grep "Performance Events" -A5 [ 0.225277] Performance Events: Fam17h+ core perfctr, AMD PMU driver. [ 0.225280] ... version: 0 [ 0.225280] ... bit width: 48 [ 0.225281] ... generic registers: 6 [ 0.225281] ... value mask: 0000ffffffffffff [ 0.225281] ... 
max period: 00007fffffffffff # # for a in $(seq 6) ; do perf stat -a -e cycles,instructions sleep 100000 & done [1] 2436231 [2] 2436232 [3] 2436233 [4] 2436234 [5] 2436235 [6] 2436236e # perf stat -a -e cycles,instructions sleep 0.1 Performance counter stats for 'system wide': 310,326,987 cycles (41.87%) 236,143,290 instructions # 0.76 insn per cycle (41.87%) 0.100800885 seconds time elapsed # We can see that the counters were enabled for this workload 41.87% of the time. Now with --bpf-counters: # for a in $(seq 32) ; do perf stat --bpf-counters -a -e cycles,instructions sleep 100000 & done [1] 2436514 [2] 2436515 [3] 2436516 [4] 2436517 [5] 2436518 [6] 2436519 [7] 2436520 [8] 2436521 [9] 2436522 [10] 2436523 [11] 2436524 [12] 2436525 [13] 2436526 [14] 2436527 [15] 2436528 [16] 2436529 [17] 2436530 [18] 2436531 [19] 2436532 [20] 2436533 [21] 2436534 [22] 2436535 [23] 2436536 [24] 2436537 [25] 2436538 [26] 2436539 [27] 2436540 [28] 2436541 [29] 2436542 [30] 2436543 [31] 2436544 [32] 2436545 # # ls -la /sys/fs/bpf/perf_attr_map -rw-------. 
1 root root 0 Mar 23 14:53 /sys/fs/bpf/perf_attr_map # bpftool map | grep bperf | wc -l 64 # # bpftool map | tail 1265: percpu_array name accum_readings flags 0x0 key 4B value 24B max_entries 1 memlock 4096B 1266: hash name filter flags 0x0 key 4B value 4B max_entries 1 memlock 4096B 1267: array name bperf_fo.bss flags 0x400 key 4B value 8B max_entries 1 memlock 4096B btf_id 996 pids perf(2436545) 1268: percpu_array name accum_readings flags 0x0 key 4B value 24B max_entries 1 memlock 4096B 1269: hash name filter flags 0x0 key 4B value 4B max_entries 1 memlock 4096B 1270: array name bperf_fo.bss flags 0x400 key 4B value 8B max_entries 1 memlock 4096B btf_id 997 pids perf(2436541) 1285: array name pid_iter.rodata flags 0x480 key 4B value 4B max_entries 1 memlock 4096B btf_id 1017 frozen pids bpftool(2437504) 1286: array flags 0x0 key 4B value 32B max_entries 1 memlock 4096B # # bpftool map dump id 1268 | tail value (CPU 21): 8f f3 bc ca 00 00 00 00 80 fd 2a d1 4d 00 00 00 80 fd 2a d1 4d 00 00 00 value (CPU 22): 7e d5 64 4d 00 00 00 00 a4 8a 2e ee 4d 00 00 00 a4 8a 2e ee 4d 00 00 00 value (CPU 23): a7 78 3e 06 01 00 00 00 b2 34 94 f6 4d 00 00 00 b2 34 94 f6 4d 00 00 00 Found 1 element # bpftool map dump id 1268 | tail value (CPU 21): c6 8b d9 ca 00 00 00 00 20 c6 fc 83 4e 00 00 00 20 c6 fc 83 4e 00 00 00 value (CPU 22): 9c b4 d2 4d 00 00 00 00 3e 0c df 89 4e 00 00 00 3e 0c df 89 4e 00 00 00 value (CPU 23): 18 43 66 06 01 00 00 00 5b 69 ed 83 4e 00 00 00 5b 69 ed 83 4e 00 00 00 Found 1 element # bpftool map dump id 1268 | tail value (CPU 21): f2 6e db ca 00 00 00 00 92 67 4c ba 4e 00 00 00 92 67 4c ba 4e 00 00 00 value (CPU 22): dc 8e e1 4d 00 00 00 00 d9 32 7a c5 4e 00 00 00 d9 32 7a c5 4e 00 00 00 value (CPU 23): bd 2b 73 06 01 00 00 00 7c 73 87 bf 4e 00 00 00 7c 73 87 bf 4e 00 00 00 Found 1 element # # perf stat --bpf-counters -a -e cycles,instructions sleep 0.1 Performance counter stats for 'system wide': 119,410,122 cycles 152,105,479 instructions # 1.27 insn per 
cycle 0.101395093 seconds time elapsed # See? We had the counters enabled all the time. Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kernel-team@fb.com Link: http://lore.kernel.org/lkml/20210316211837.910506-2-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
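The 24-byte per-CPU values printed by 'bpftool map dump' above can be decoded by hand. The sketch below assumes each value is three little-endian 64-bit integers in the order counter, enabled time, running time (the layout of a BPF perf event value as I understand it); the commit message itself only shows "value 24B", so treat the field names as an assumption. decode_reading is a hypothetical helper.

```python
import struct


def decode_reading(hex_bytes):
    """Decode one 24-byte map value into (counter, enabled, running).

    Assumes three little-endian unsigned 64-bit fields.
    """
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    counter, enabled, running = struct.unpack("<QQQ", raw)
    return counter, enabled, running


# First 'value (CPU 21)' line from the dump of map id 1268 above.
line = "8f f3 bc ca 00 00 00 00 80 fd 2a d1 4d 00 00 00 80 fd 2a d1 4d 00 00 00"
counter, enabled, running = decode_reading(line)
print(hex(counter), hex(enabled), hex(running))
```

Under that assumption, the second and third fields being equal in every dumped value is consistent with the committer's observation that the counters stayed enabled the whole time.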
-
Ingo Molnar authored
Fix ~124 single-word typos and a few spelling errors in the perf tooling code, accumulated over the years. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 17 Mar, 2021 3 commits
-
-
Ian Rogers authored
Retry the ping loop up to 600 times, or approximately 30 seconds, to make sure the test does not hang at start up. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210317005505.2794804-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
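A bounded retry loop of this kind can be sketched as below. The real test is a shell script; this Python version only illustrates the pattern. The 0.05 second delay is inferred from 600 attempts taking about 30 seconds and is an assumption, as are the names wait_until and ping.

```python
import time


def wait_until(ping, attempts=600, delay=0.05, sleep=time.sleep):
    """Call ping() until it succeeds or the attempts run out.

    Returns True on success, False on timeout. 600 attempts with a
    0.05 s delay gives roughly the 30 seconds mentioned above.
    """
    for _ in range(attempts):
        if ping():
            return True
        sleep(delay)
    return False


# Example with a fake ping that succeeds on the third call; the injected
# sleep records the delays instead of actually waiting.
replies = iter([False, False, True])
delays = []
ok = wait_until(lambda: next(replies), sleep=delays.append)
print(ok, delays)
```

Passing the sleep function as a parameter keeps the loop testable without waiting in real time.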
-
Ian Rogers authored
Reorder daemon_start and daemon_exit as the trap handler is added in daemon_start referencing daemon_exit. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210317005505.2794804-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
Remove unused argument from daemon_exit. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210317005505.2794804-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 15 Mar, 2021 5 commits
-
-
Changbin Du authored
This adds the function convert_unit_double() and selects the appropriate unit for shadow stats between K/M/G. $ sudo perf stat -a -- sleep 1 Before: Unit 'M' is selected even when the number is very small. Performance counter stats for 'system wide': 4,003.06 msec cpu-clock # 3.998 CPUs utilized 16,179 context-switches # 0.004 M/sec 161 cpu-migrations # 0.040 K/sec 4,699 page-faults # 0.001 M/sec 6,135,801,925 cycles # 1.533 GHz (83.21%) 5,783,308,491 stalled-cycles-frontend # 94.26% frontend cycles idle (83.21%) 4,543,694,050 stalled-cycles-backend # 74.05% backend cycles idle (66.49%) 4,720,130,587 instructions # 0.77 insn per cycle # 1.23 stalled cycles per insn (83.28%) 753,848,078 branches # 188.318 M/sec (83.61%) 37,457,747 branch-misses # 4.97% of all branches (83.48%) 1.001283725 seconds time elapsed After: $ sudo perf stat -a -- sleep 2 Performance counter stats for 'system wide': 8,005.52 msec cpu-clock # 3.999 CPUs utilized 10,715 context-switches # 1.338 K/sec 785 cpu-migrations # 98.057 /sec 102 page-faults # 12.741 /sec 1,948,202,279 cycles # 0.243 GHz 2,816,470,932 stalled-cycles-frontend # 144.57% frontend cycles idle 2,661,172,207 stalled-cycles-backend # 136.60% backend cycles idle 464,172,105 instructions # 0.24 insn per cycle # 6.07 stalled cycles per insn 91,567,662 branches # 11.438 M/sec 7,756,054 branch-misses # 8.47% of all branches 2.002040043 seconds time elapsed v2: o do not change 'sec' to 'cpu-sec'. o use convert_unit_double to implement convert_unit. Signed-off-by: Changbin Du <changbin.du@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210315143047.3867-1-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
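The unit selection shown in the "After" output (rates printed as /sec, K/sec or M/sec depending on magnitude) can be sketched as follows. This is not the perf convert_unit_double() code; pick_unit is a hypothetical Python stand-in, and the decimal thresholds of 1e3, 1e6 and 1e9 are an assumption.

```python
def pick_unit(value):
    """Scale a rate and return (scaled_value, unit_prefix).

    Chooses the largest prefix among G, M, K whose threshold the value
    reaches; small values are returned unscaled with an empty prefix.
    """
    for threshold, unit in ((1e9, "G"), (1e6, "M"), (1e3, "K")):
        if value >= threshold:
            return value / threshold, unit
    return value, ""


for rate in (98.057, 1338.0, 11_438_000.0):
    scaled, unit = pick_unit(rate)
    print(f"{scaled:.3f} {unit}/sec")
```

Checking the largest threshold first means a small rate such as 98.057 per second is never shown as a fraction of M/sec, which is the problem the "Before" output illustrates.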
-
Arnaldo Carvalho de Melo authored
The existing text was way too terse, pick the intended usage from the cset that introduced this option. Twitter: https://twitter.com/_monoid/status/1371461130175004672?s=20 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Shunsuke Nakamura authored
Add pmu events for A64FX. Documentation source: https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_PMU_Events_v1.2.pdf Signed-off-by: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com> Reviewed-by: John Garry <john.garry@huawei.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20210308105342.746940-3-nakamura.shun@fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Shunsuke Nakamura authored
Add the following events [1]. Common architectural events: - L2I_TLB_REFILL - L2I_TLB - SIMD_INST_RETIRED - SVE_INST_RETIRED Common microarchitectural events: - UOP_SPEC - SVE_MATH_SPEC - FP_SPEC - FP_FMA_SPEC - FP_RECPE_SPEC - FP_CVT_SPEC - ASE_SVE_INT_SPEC - SVE_PRED_SPEC - SVE_MOVPRFX_SPEC - SVE_MOVPRFX_U_SPEC - ASE_SVE_LD_SPEC - ASE_SVE_ST_SPEC - PRF_SPEC - BASE_LD_REG_SPEC - BASE_ST_REG_SPEC - SVE_LDR_REG_SPEC - SVE_STR_REG_SPEC - SVE_LDR_PREG_SPEC - SVE_STR_PREG_SPEC - SVE_PRF_CONTIG_SPEC - ASE_SVE_LD_MULTI_SPEC - ASE_SVE_ST_MULTI_SPEC - SVE_LD_GATHER_SPEC - SVE_ST_SCATTER_SPEC - SVE_PRF_GATHER_SPEC - SVE_LDFF_SPEC - FP_SCALE_OPS_SPEC - FP_FIXED_OPS_SPEC - FP_HP_SCALE_OPS_SPEC - FP_HP_FIXED_OPS_SPEC - FP_SP_SCALE_OPS_SPEC - FP_SP_FIXED_OPS_SPEC - FP_DP_SCALE_OPS_SPEC - FP_DP_FIXED_OPS_SPEC The reference document is at the following location: [1] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_PMU_Events_v1.2.pdf Signed-off-by: Nakamura, Shunsuke/中村 俊介 <nakamura.shun@fujitsu.com> Reviewed-by: John Garry <john.garry@huawei.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20210308105342.746940-2-nakamura.shun@fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
It was reported that --exclude-perf wasn't working, as tracepoints were appearing in 'perf script' output as having the 'perf' COMM, that is just the window in evlist__prepare_workload() after the fork() and before the execvp() call for workloads specified in the command line. Example: # perf record -e kmem:kmalloc --filter 'bytes_alloc<650 && bytes_alloc>620' --exclude-perf -e kmem:kfree --exclude-perf -aR sleep 30 Then: # perf script perf 15905 [009] 1498.356094: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf 15905 [009] 1498.356116: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil) perf 15905 [009] 1498.356116: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00 perf 15905 [009] 1498.356138: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf 15905 [009] 1498.356148: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil) perf 15905 [009] 1498.356148: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf750421c00 perf 15905 [009] 1498.356168: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf 15905 [009] 1498.356176: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil) <SNIP> perf 15905 [009] 1498.356348: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf 15905 [014] 1498.356386: kmem:kfree: call_site=security_compute_sid.part.0+0x3b2 ptr=(nil) perf 15905 [014] 1498.356423: kmem:kfree: call_site=load_elf_binary+0x207 ptr=0xffff9cf5b2a34220 perf 15905 [014] 1498.356694: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf6d0b3b000 sleep 15905 [014] 1498.356739: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) Use prctl() to show that that is just the preparation of the workload: # perf script perf-exec 19036 [009] 2199.357582: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil) perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=do_execveat_common+0x19d ptr=0xffff9cf786459800 perf-exec 19036 [009] 2199.357630: kmem:kfree: 
call_site=perf_event_mmap+0x279 ptr=(nil) <SNIP> perf-exec 19036 [000] 2199.358277: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786fb9c00 perf-exec 19036 [000] 2199.358278: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458200 perf-exec 19036 [000] 2199.358279: kmem:kfree: call_site=__free_slab+0xb5 ptr=0xffff9cf786458600 sleep 19036 [000] 2199.358316: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) sleep 19036 [000] 2199.358323: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil) sleep 19036 [000] 2199.358330: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000 sleep 19036 [000] 2199.358337: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000 sleep 19036 [000] 2199.358339: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000 sleep 19036 [000] 2199.358341: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=0xffff9cf58be2d000 Reporter: zhanweiw <wingfancy@hotmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212213Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
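Once the pre-exec window is labelled 'perf-exec', a script can tell those events apart from the workload's own events by looking at the COMM column. A minimal Python sketch, assuming the COMM is the first whitespace-separated field of each 'perf script' line and contains no spaces; count_by_comm is a hypothetical helper and the sample lines are taken from the output above.

```python
from collections import Counter


def count_by_comm(perf_script_output):
    """Count 'perf script' lines per COMM (first field of each line)."""
    counts = Counter()
    for line in perf_script_output.splitlines():
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


SAMPLE = """\
perf-exec 19036 [009] 2199.357582: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
perf-exec 19036 [009] 2199.357604: kmem:kfree: call_site=free_bprm+0x8f ptr=(nil)
sleep 19036 [000] 2199.358316: kmem:kfree: call_site=perf_event_mmap+0x279 ptr=(nil)
"""

counts = count_by_comm(SAMPLE)
print(dict(counts))
```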
-
- 09 Mar, 2021 4 commits
-
-
Jiapeng Chong authored
Fix the following coccicheck warnings: ./tools/perf/util/machine.c:2041:9-10: WARNING: return of 0/1 in function 'symbol__match_regex' with return type bool. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/1615284669-82139-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jiapeng Chong authored
Fix the following coccicheck warnings: ./tools/perf/tests/demangle-ocaml-test.c:29:34-35: WARNING: Use ARRAY_SIZE. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1615281145-2122-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
This is a perf_stat_evsel method, so should have that as its prefix, previously it was swapped as __perf_evsel_stat__is(). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
They all operate on 'struct evsel_script' instances, so should be prefixed with evsel_script__, not with perf_evsel_script__. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 08 Mar, 2021 3 commits
-
-
Arnaldo Carvalho de Melo authored
The 'ret' variable was initialized to zero but then it was not updated from the fprintf() return, fix it. Reported-by: Yang Li <yang.lee@linux.alibaba.com> cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> cc: Ingo Molnar <mingo@redhat.com> cc: Jiri Olsa <jolsa@redhat.com> cc: Mark Rutland <mark.rutland@arm.com> cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Fixes: 90f18e63 ("perf symbols: List symbols in a dso in ascending name order") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ian Rogers authored
This adds definitions available in the uapi version. Explanation: In the kernel include of types.h the uapi version is included. In tools the uapi/linux/types.h and linux/types.h are distinct. For BPF programs a definition of __wsum is needed by the generated bpf_helpers.h. The definition comes either from a generated vmlinux.h or from <linux/types.h> that may be transitively included from bpf.h. The perf build prefers linux/types.h over uapi/linux/types.h for <linux/types.h>*. To allow tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c to compile with the same include path used for perf then these definitions are necessary. There is likely a wider conversation about exactly how types.h should be specified and the include order used by the perf build - it is somewhat confusing that tools/include/uapi/linux/bpf.h is using the non-uapi types.h. *see tools/perf/Makefile.config: ... INC_FLAGS += -I$(srctree)/tools/include/ INC_FLAGS += -I$(srctree)/tools/arch/$(SRCARCH)/include/uapi INC_FLAGS += -I$(srctree)/tools/include/uapi ... The include directories are scanned from left-to-right: https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html As tools/include/linux/types.h appears before tools/include/uapi/linux/types.h then I say it is preferred. 
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20210307223024.4081067-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
To pick up the fixes sent for v5.12 and continue development based on v5.12-rc2, i.e. without the swap on file bug. This also gets a slightly newer and better tools/perf/arch/arm/util/cs-etm.c patch version, using the BIT() macro, that had already been slated to v5.13 but ended up going to v5.12-rc1 on an older version. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 07 Mar, 2021 4 commits
-
-
Linus Torvalds authored
Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: "Perf tool fixes: - Fix wrong skipping for per-die aggregation in 'perf stat'. - Fix race in signal handling on large core count machines, setting up signal handlers earlier. - Fix -F for branch & mem modes in 'perf report'. - Fix the condition checks for max number of NUMA nodes in 'perf bench numa'. - Fix crash in 'perf diff' error path. - Fix filtering of empty build-ids in 'perf archive'. - Ensure read cmdlines from libtraceevent are null terminated. Recent regressions: - Fix control fifo permissions in 'perf daemon'. - Fix 'perf daemon' compile error with ASAN. - Fix running 'perf daemon' test for non root user. - Fix PERF_SAMPLE_WEIGHT_STRUCT 'perf test' failure on non-x86 arches. - Fix event's PMU name parsing related to new drm/i915/gt software-gt-awake-time event. Fixes from compiler instrumentation: - Fix leaks in 'perf test' entries, found using ASAN. - Fix use-after-free when 'perf stat -r' option is used. Arch specific: - Fix bitmap for option on ARM's CS-ETM. Documentation: - Fix documentation of verbose options. Build: - Clean 'generated' directory used for creating the syscall table on x86. - Fix ccache usage in $(CC) when generating arch errno table. - Cast (struct timeval).tv_sec when printing, fixing the build with MUSL libc. - Tighten snprintf() string precision to pass gcc check on some 32-bit arches. - Update UAPI copies from the kernel sources. 
- Fix regression on feature detection 'make clean' target" * tag 'perf-tools-fixes-for-v5.12-2020-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (44 commits) perf cs-etm: Fix bitmap for option perf trace: Fix race in signal handling perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches perf report: Fix -F for branch & mem modes perf tests x86: Move insn.h include to make sure it finds stddef.h perf test: Support the ins_lat check in the X86 specific test perf test: Fix sample-parsing failure on non-x86 platforms perf archive: Fix filtering of empty build-ids perf daemon: Fix compile error with Asan perf stat: Fix use-after-free when -r option is used libperf: Add perf_evlist__reset_id_hash() perf stat: Fix wrong skipping for per-die aggregation tools headers UAPI: Sync KVM's kvm.h and vmx.h headers with the kernel sources tools headers cpufeatures: Sync with the kernel sources tools headers UAPI: Update tools' copy of linux/coresight-pmu.h tools headers: Update syscall.tbl files to support mount_setattr perf test: Fix cpu and thread map leaks in perf_time_to_tsc test perf test: Fix cpu map leaks in cpu_map_print test perf test: Fix a memory leak in thread_map_remove test perf test: Fix a thread map leak in thread_map_synthesize test ...
-
Linus Torvalds authored
Pull parisc fixes from Helge Deller: "Two small parisc architecture fixes: fix a linking failure reported by the kernel test robot and remove a duplicate include" * 'parisc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: arch/parisc/kernel: remove duplicate include in ptrace parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST
-
Linus Torvalds authored
Pull powerpc fixes from Michael Ellerman: "One non-fix, the conversion of vio_driver->remove() to return void, which touches various powerpc specific drivers. Fix the privilege checks we do in our perf handling, which could cause soft/hard lockups in some configurations. Fix a bug with IRQ affinity seen on kdump kernels when CPU 0 is offline in the second kernel. Fix missed page faults after mprotect(..., PROT_NONE) on 603 (32-bit). Fix a bug in our VSX (vector) instruction emulation, which should only be seen when doing VSX ops to cache inhibited mappings. Three commits fixing various build issues with obscure configurations. Thanks to Athira Rajeev, Cédric Le Goater, Christophe Leroy, Christoph Plattner, Greg Kurz, Jordan Niethe, Laurent Vivier, Ravi Bangoria, Tyrel Datwyler, and Uwe Kleine-König" * tag 'powerpc-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/sstep: Fix VSX instruction emulation powerpc/perf: Fix handling of privilege level checks in perf interrupt context powerpc: Force inlining of mmu_has_feature to fix build failure vio: make remove callback return void powerpc/syscall: Force inlining of __prep_irq_for_enabled_exit() powerpc/603: Fix protection of user pages mapped with PROT_NONE powerpc/pseries: Don't enforce MSI affinity with kdump powerpc/4xx: Fix build errors from mfdcr()
-
Linus Torvalds authored
Pull m68k fix from Geert Uytterhoeven: "Fix virt_addr_valid() W=1 compiler warnings. This is a single non-critical fix. As the build bots are now testing all new code with W=1, these warnings are popping up everywhere, confusing people. Hence I think it makes sense to silence it as soon as possible" * tag 'm68k-for-v5.12-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: Fix virt_addr_valid() W=1 compiler warnings
-
- 06 Mar, 2021 8 commits
-
-
Suzuki K Poulose authored
When setting an option with the macros ETM_OPT_CTXTID and ETM_OPT_TS, the code wrongly treats these two values (14 and 28 respectively) as bit masks, when both are actually bit offsets. This doesn't lead to further failure, because the AND operation is always true for ETM_OPT_CTXTID / ETM_OPT_TS. This patch defines new independent macros (rather than using the "config" bits) for requesting the "contextid" and "timestamp" for cs_etm_set_option(). Signed-off-by: Suzuki Poulouse <suzuki.poulose@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Link: http://lore.kernel.org/lkml/20210206150833.42120-5-leo.yan@linaro.org [ Extract the change as a separate patch for easier review ] Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
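The difference between a bit offset and a bit mask can be sketched as below. The values 14 and 28 come from the commit message; the SET_OPT_* names and the helper functions are hypothetical and only illustrate the idea of independent request flags, not the macros the patch actually adds.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Bit positions inside the "config" word (values from the commit message). */
#define ETM_OPT_CTXTID 14
#define ETM_OPT_TS     28

/* Hypothetical independent request flags, one bit each. */
#define SET_OPT_CTXTID BIT(0)
#define SET_OPT_TS     BIT(1)

/* Buggy pattern: an offset is used as if it were a mask. */
static bool wants_ctxtid_buggy(uint32_t option)
{
	return (option & ETM_OPT_CTXTID) != 0;
}

/* Fixed pattern: the request is tested against a real single-bit mask. */
static bool wants_ctxtid(uint32_t option)
{
	return (option & SET_OPT_CTXTID) != 0;
}
```

Because 14 (0b01110) and 28 (0b11100) share bits, the buggy test also reports a context ID request when only the timestamp offset is passed in, which is why the offsets cannot serve as masks.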
-
Michael Petlan authored
Since a lot of stuff happens before the SIGINT signal handler is registered (scanning /proc/*, etc.), on bigger systems, such as Cavium Sabre CN99xx, it may happen that the first interrupt signal is lost and perf isn't correctly terminated. The reproduction code might look like the following: perf trace -a & PERF_PID=$! sleep 4 kill -INT $PERF_PID The issue has been found on a CN99xx machine with RHEL-8 and the patch fixes it by registering the signal handlers earlier in the init stage. Suggested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Michael Petlan <mpetlan@redhat.com> Tested-by: Michael Petlan <mpetlan@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/lkml/YEJnaMzH2ctp3PPx@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
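The ordering that the fix relies on can be sketched in plain C. This is an illustration under stated assumptions, not perf's code: slow_init() is a hypothetical stand-in for the slow startup work, and raise() simulates the user's interrupt arriving during it.

```c
#include <signal.h>
#include <stdbool.h>

/* Set from the handler; sig_atomic_t is safe to write in signal context. */
static volatile sig_atomic_t done;

static void sig_handler(int sig)
{
	(void)sig;
	done = 1;
}

/* Hypothetical stand-in for slow startup work such as scanning /proc. */
static void slow_init(void)
{
	raise(SIGINT);	/* simulate an interrupt arriving mid-initialization */
}

/* Returns true if the interrupt sent during initialization was recorded. */
static bool start_tool(void)
{
	signal(SIGINT, sig_handler);	/* register the handler first */
	slow_init();			/* only then do the slow work */

	return done != 0;
}
```

With the handler installed before slow_init(), the tool's main loop can see the flag as soon as it starts and shut down cleanly.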
-
Arnaldo Carvalho de Melo authored
Noticed on a debian:experimental mips and mipsel cross build environment: perfbuilder@ec265a086e9b:~$ mips-linux-gnu-gcc --version | head -1 mips-linux-gnu-gcc (Debian 10.2.1-3) 10.2.1 20201224 perfbuilder@ec265a086e9b:~$ CC /tmp/build/perf/util/map.o util/map.c: In function 'map__new': util/map.c:109:5: error: '%s' directive output may be truncated writing between 1 and 2147483645 bytes into a region of size 4096 [-Werror=format-truncation=] 109 | "%s/platforms/%s/arch-%s/usr/lib/%s", | ^~ In file included from /usr/mips-linux-gnu/include/stdio.h:867, from util/symbol.h:11, from util/map.c:2: /usr/mips-linux-gnu/include/bits/stdio2.h:67:10: note: '__builtin___snprintf_chk' output 32 or more bytes (assuming 4294967321) into a destination of size 4096 67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 68 | __bos (__s), __fmt, __va_arg_pack ()); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Since we have the lengths for what lands in that place, use them to give the compiler more info and make it happy. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
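Passing known lengths to snprint() as string precisions can be sketched with the "%.*s" conversion. The helper below is hypothetical and uses a shortened format string; it only illustrates the technique the message describes and is not the code in util/map.c.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<root>/arch-<arch>/usr/lib" into buf.
 * Returns the snprintf() result, or -1 if the pieces cannot fit.
 */
static int build_lib_path(char *buf, size_t size,
			  const char *root, const char *arch)
{
	size_t root_len = strlen(root);
	size_t arch_len = strlen(arch);
	/* Fixed text in the format: "/arch-" (6) + "/usr/lib" (8) + NUL. */
	size_t fixed = 6 + 8 + 1;

	if (root_len + arch_len + fixed > size)
		return -1;

	/* "%.*s" takes an int precision that caps how many bytes are copied. */
	return snprintf(buf, size, "%.*s/arch-%.*s/usr/lib",
			(int)root_len, root, (int)arch_len, arch);
}
```

The explicit precision gives the compiler an upper bound for each string argument, which is the kind of information the -Wformat-truncation check looks for.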
-
Ravi Bangoria authored
perf report fails to add valid additional fields with -F when used with branch or mem modes. Fix it. Before patch: $ perf record -b $ perf report -b -F +srcline_from --stdio Error: Invalid --fields key: `srcline_from' After patch: $ perf report -b -F +srcline_from --stdio # Samples: 8K of event 'cycles' # Event count (approx.): 8784 ... Committer notes: There was an inversion: when looking at branch stack dimensions (keys) it was checking if the sort mode was 'mem', not 'branch'. Fixes: aa6b3c99 ("perf report: Make -F more strict like -s") Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20210304062958.85465-1-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
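The inversion mentioned in the committer notes can be sketched as a pair of predicates. All names here are hypothetical and chosen for illustration; they are not the identifiers used in perf's sort code.

```c
#include <stdbool.h>

/* Hypothetical report modes. */
enum report_mode { MODE_NORMAL, MODE_BRANCH, MODE_MEM };

/* Buggy pattern: branch-stack keys were gated on the memory mode. */
static bool branch_key_allowed_buggy(enum report_mode mode)
{
	return mode == MODE_MEM;
}

/* Fixed pattern: branch-stack keys are accepted in branch mode. */
static bool branch_key_allowed(enum report_mode mode)
{
	return mode == MODE_BRANCH;
}
```

With the buggy predicate, a branch key such as srcline_from is rejected in branch mode, which matches the "Invalid --fields key" error shown in the message.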
-
Arnaldo Carvalho de Melo authored
In some versions of Alpine Linux the perf build is broken since commit 1d509f2a ("x86/insn: Support big endian cross-compiles"): In file included from /usr/include/linux/byteorder/little_endian.h:13, from /usr/include/asm/byteorder.h:5, from arch/x86/util/../../../../arch/x86/include/asm/insn.h:10, from arch/x86/util/archinsn.c:2: /usr/include/linux/swab.h:161:8: error: unknown type name '__always_inline' static __always_inline __u16 __swab16p(const __u16 *p) So move the inclusion of arch/x86/include/asm/insn.h to after the point where linux/stddef.h (which conditionally defines __always_inline) has been included, to work around this problem on Alpine Linux 3.9 to 3.11; 3.12 onwards works. Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
The ins_lat of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction latency, which is only available for X86. Add an X86 specific test for the ins_lat and PERF_SAMPLE_WEIGHT_STRUCT type. The test__x86_sample_parsing() verifies a sample type the same way test__sample_parsing() does. Since the ins_lat and PERF_SAMPLE_WEIGHT_STRUCT are the only X86 specific sample type for now, the test__x86_sample_parsing() only verifies the PERF_SAMPLE_WEIGHT_STRUCT type. Other sample types are still verified in the generic test. $ perf test 77 -v 77: x86 Sample parsing : --- start --- test child forked, pid 102370 test child finished with 0 ---- end ---- x86 Sample parsing: Ok Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/1614787285-104151-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
Executing 'perf test 27' fails on s390: [root@t35lp46 perf]# ./perf test -Fv 27 27: Sample parsing --- start --- ---- end ---- Sample parsing: FAILED! [root@t35lp46 perf]# The commit fbefe9c2 ("perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing") changes the ins_lat to a model-specific variable only for X86, but perf test still verifies the variable in the generic test. Remove the ins_lat check in the generic test. The following patch will add it in the X86 specific test. Fixes: fbefe9c2 ("perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing") Reported-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/1614787285-104151-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Nicholas Fraser authored
A non-existent build-id used to be treated as all-zero SHA-1 hash. Build-ids are now variable width. A non-existent build-id is an empty string and "perf buildid-list" pads this with spaces. This is true even when using old perf.data files recorded from older versions of perf; "perf buildid-list" never reports an all-zero hash anymore. This fixes "perf-archive" to skip missing build-ids by skipping lines that start with a padding space rather than with zeroes. Signed-off-by: Nicholas Fraser <nfraser@codeweavers.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Huw Davies <huw@codeweavers.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ulrich Czekalla <uczekalla@codeweavers.com> Link: https://lore.kernel.org/r/442bffc7-ac5c-0975-b876-a549efce2413@codeweavers.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
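The filtering rule described above can be sketched as a predicate over one line of "perf buildid-list" output. perf-archive itself is a shell script, so this C function is only an illustration; the function name is hypothetical, and treating an empty line as missing is an added assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: 'line' is one line of "perf buildid-list"
 * output. A missing build-id is printed as padding, so the line starts
 * with a space. Empty lines are also skipped (an assumption made here).
 */
static bool buildid_line_is_missing(const char *line)
{
	return line[0] == '\0' || line[0] == ' ';
}
```

A caller would drop every line for which the predicate is true and archive the files named on the remaining lines.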
-