Commits · 146edff3d7ed135269dae8abef3219083c45b21e · Kirill Smelkov / linux

31 Oct, 2022 11 commits

perf test: Parse events workaround for dash/minus · 146edff3

Ian Rogers authored Oct 12, 2022

Skip an event configuration for event names with a dash/minus in them.
Events with a dash/minus in their name cause parsing issues as legacy
encoding of events would use a dash/minus as a separator.

The parser separates events with dashes into prefixes and suffixes and
then recombines them. Unfortunately if an event has part of its name
that matches a legacy token then the recombining fails.

This is seen for branch-brs where branch is a legacy token. branch-brs
was introduced to sysfs in:

  https://lore.kernel.org/all/20220322221517.2510440-5-eranian@google.com/

The failure is shown below as well as the workaround to use a config
where the dash/minus isn't treated specially:

```
  $ perf stat -e branch-brs true
  event syntax error: 'branch-brs'
                             \___ parser error

  $ perf stat -e cpu/branch-brs/ true

   Performance counter stats for 'true':

              46,179      cpu/branch-brs/
```
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20221013011205.3151391-1-irogers@google.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

146edff3

perf evlist: Add missing util/event.h header · 2e5a738a

Arnaldo Carvalho de Melo authored Oct 27, 2022

Needed to get the event_attr_init() and perf_event_paranoid() prototypes
that were being obtained indirectly, by sheer luck.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

2e5a738a

perf mmap: Remove several unneeded includes from util/mmap.h · 606f70ab
Arnaldo Carvalho de Melo authored Oct 27, 2022
```
Those headers are not needed in util/mmap.h, remove them.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
```
606f70ab

perf tests: Add missing event.h include · fd8d5a3b

Arnaldo Carvalho de Melo authored Oct 27, 2022

It uses things like perf_event__name() but were not including event.h,
where its prototype lives, fix it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

fd8d5a3b

perf thread: Move thread__resolve() from event.h · cde56712

Arnaldo Carvalho de Melo authored Oct 27, 2022

Its a thread method, so move it to thread.h, this way some places that
were using event.h just to get this prototype may stop doing so and
speed up building and disentanble the header dependency graph.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

cde56712

perf symbol: Move addr_location__put() from event.h · d1e633e4

Arnaldo Carvalho de Melo authored Oct 27, 2022

Its a addr_location method, so move it to symbol.h, where 'struct
addr_location' is, this way some places that were using event.h just to
get this prototype may stop doing so and speed up building and
disentanble the header dependency graph.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

d1e633e4

perf machine: Move machine__resolve() from event.h · 7e5c6f2c

Arnaldo Carvalho de Melo authored Oct 27, 2022

Its a machine method, so move it to machine.h, this way some places that
were using event.h just to get this prototype may stop doing so and
speed up building and disentanble the header dependency graph.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

7e5c6f2c

perf kwork: Remove includes not needed in kwork.h · 628d6999

Arnaldo Carvalho de Melo authored Oct 27, 2022

Leave just some forward declarations for pointers, move the includes to
where they are really needed.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

628d6999

perf tools: Move 'struct perf_sample' to a separate header file to disentangle headers · 9823147d

Arnaldo Carvalho de Melo authored Oct 26, 2022

Some places were including event.h just to get 'struct perf_sample',
move it to a separate place so that we speed up a bit the build.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

9823147d

perf branch: Remove some needless headers, add a needed one · 08043330

Arnaldo Carvalho de Melo authored Oct 26, 2022

map_symbol.h is needed because we have structs that contains 'struct
addr_map_symbol', so add it, remove the others.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

08043330

perf bpf: No need to include headers just use forward declarations · 8d0d129e

Arnaldo Carvalho de Melo authored Oct 26, 2022

In the bpf-prologue.h header we are just using pointers, so no need to
include headers for that, just provide forward declarations for those
types.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8d0d129e

27 Oct, 2022 29 commits

perf bpf: No need to include compiler.h when HAVE_LIBBPF_SUPPORT is true · cff62414
Arnaldo Carvalho de Melo authored Oct 26, 2022
```
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
```
cff62414

perf tools: Make quiet mode consistent between tools · a527c2c1

James Clark authored Oct 18, 2022

Use the global quiet variable everywhere so that all tools hide warnings
in quiet mode and update the documentation to reflect this.

'perf probe' claimed that errors are not printed in quiet mode but I
don't see this so remove it from the docs.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221018094137.783081-3-james.clark@arm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

a527c2c1

perf tools: Fix "kernel lock contention analysis" test by not printing warnings in quiet mode · 65319890

James Clark authored Oct 18, 2022

Especially when CONFIG_LOCKDEP and other debug configs are enabled,
Perf can print the following warning when running the "kernel lock
contention analysis" test:

  Warning:
  Processed 1378918 events and lost 4 chunks!

  Check IO/CPU overload!

  Warning:
  Processed 4593325 samples and lost 70.00%!

The test already supplies -q to run in quiet mode, so extend quiet mode
to perf_stdio__warning() and also ui__warning() for consistency.

This fixes the following failure due to the extra lines counted:

  perf test "lock cont" -vvv

  82: kernel lock contention analysis test                            :
  --- start ---
  test child forked, pid 3125
  Testing perf lock record and perf lock contention
  [Fail] Recorded result count is not 1: 9
  test child finished with -1
  ---- end ----
  kernel lock contention analysis test: FAILED!

Fixes: ec685de2 ("perf test: Add kernel lock contention test")
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221018094137.783081-2-james.clark@arm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

65319890

perf test: Do not set TEST_SKIP for record subtests · 8b380e6a

Namhyung Kim authored Oct 20, 2022

It now has 4 sub tests and at least one of them should run.

But once the TEST_SKIP (= 2) return value is set, it won't be
overwritten unless there's a failure.  I think we should return success
when one or more tests are skipped but the remaining subtests are
passed.

So update the test code not to set the err variable when it skips
the test.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221020172643.3458767-9-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8b380e6a

perf test: Test record with --threads option · 7f4ed3f0

Namhyung Kim authored Oct 20, 2022

The --threads option changed the 'perf record' behavior significantly,
so it'd be nice if we test it separately.  Add --threads options with
different argument in each test supported and check the result.

Also update the cleanup routine because threads recording produces data
in a directory.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221020172643.3458767-8-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

7f4ed3f0

perf test: Add target workload test in 'perf record' tests · c8c93567

Namhyung Kim authored Oct 20, 2022

Add a subtest which profiles the given workload on the command line.

As it's a minimal requirement, the test should run ok so it doesn't skip
the test even if it failed to run the 'perf record' command.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221020172643.3458767-7-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

c8c93567

perf test: Add system-wide mode in 'perf record' tests · 2cadf2c7

Namhyung Kim authored Oct 20, 2022

Add system wide recording test with the same pattern.  It'd skip the
test when it fails to run 'perf record'.

For system-wide mode, it needs to avoid build-id collection and
synthesis because the test only cares about the test program and kernel
would generate the necessary events as the process starts.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221020172643.3458767-6-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

2cadf2c7

perf test: Wait for a new thread when testing --per-thread record · 6b7e02ab

Namhyung Kim authored Oct 20, 2022

Just running the target program is not enough to test multi-thread
target because it'd be racy perf vs target startup.  I used the
initial delay but it cannot guarantee for perf to see the thread.

Instead, use wait_for_threads helper from shell/lib/waiting.sh to make
sure it starts the sibling thread first.  Then perf record can use -p
option to profile the target process.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221020172643.3458767-5-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

6b7e02ab

perf test: Use a test program in 'perf record' tests · 4321ad4e

Namhyung Kim authored Oct 20, 2022

If the system has cc it could build a test program with two threads
and then use it for more detailed testing.  Also it accepts an option
to run a thread forever to ensure multi-thread runs.

If cc is not found, it falls back to use the default value 'true'.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221020172643.3458767-4-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

4321ad4e

perf test: Fix shellcheck issues in the record test · 9e455f4f

Namhyung Kim authored Oct 20, 2022

Basically there are 3 issues:

 1. quote shell expansion
 2. do not use egrep
 3. use upper case letters for signal names
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221020172643.3458767-3-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

9e455f4f

perf test: Do not use instructions:u explicitly · 439dbef2

Namhyung Kim authored Oct 20, 2022

I think it's to support non-root user tests.  But perf record can handle
the case and fall back to a software event (cpu-clock).  Practically this
would affect when it's run on a VM, but it seems no reason to prevent running
the test in the guest.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221020172643.3458767-2-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

439dbef2

perf scripts python: intel-pt-events.py: Add ability interleave output · ad7ad6b5

Adrian Hunter authored Oct 20, 2022

Intel PT timestamps are not provided for every branch, let alone every
instruction, so there can be many samples with the same timestamp. With
per-cpu contexts, decoding is done for each CPU in turn, which can make it
difficult to see what is happening on different CPUs at the same time.
Currently the interleaving from perf script --itrace=i0ns is quite coarse
grained. There are often long stretches executing on one CPU and nothing on
another.

Some people are interested in seeing what happened on multiple CPUs before
a crash to debug races etc.

To improve perf script interleaving for parallel execution, the
intel-pt-events.py script has been enhanced to enable interleaving the
output with the same timestamp from different CPUs. It is understood that
interleaving is not perfect or causal.

Add parameter --interleave [<n>] to interleave sample output for the same
timestamp so that no more than n samples for a CPU are displayed in a row.
'n' defaults to 4. Note this only affects the order of output, and only
when the timestamp is the same.

Example:

  $ perf script intel-pt-events.py --insn-trace --interleave 3
  ...
  bash  2267/2267  [004]  9323.692625625  563caa3c86f0  jz 0x563caa3c89c7        run_pending_traps+0x30 (/usr/bin/bash)   IPC: 1.52 (38/25)
  bash  2267/2267  [004]  9323.692625625  563caa3c89c7  movq  0x118(%rsp), %rax  run_pending_traps+0x307 (/usr/bin/bash)
  bash  2267/2267  [004]  9323.692625625  563caa3c89cf  subq  %fs:0x28, %rax     run_pending_traps+0x30f (/usr/bin/bash)
  bash  2270/2270  [007]  9323.692625625  55dc58cabf02  jz 0x55dc58cabf48        unquoted_glob_pattern_p+0x102 (/usr/bin/bash)   IPC: 1.56 (25/16)
  bash  2270/2270  [007]  9323.692625625  55dc58cabf04  cmp $0x5d, %al           unquoted_glob_pattern_p+0x104 (/usr/bin/bash)
  bash  2270/2270  [007]  9323.692625625  55dc58cabf06  jnz 0x55dc58cabf10       unquoted_glob_pattern_p+0x106 (/usr/bin/bash)
  bash  2264/2264  [001]  9323.692625625  7fd556a4376c  jbe 0x7fd556a43ac8       round_and_return+0x3fc (/usr/lib/x86_64-linux-gnu/libc.so.6)   IPC: 4.30 (43/10)
  bash  2264/2264  [001]  9323.692625625  7fd556a43772  and $0x8, %edx           round_and_return+0x402 (/usr/lib/x86_64-linux-gnu/libc.so.6)
  bash  2264/2264  [001]  9323.692625625  7fd556a43775  jnz 0x7fd556a43ac8       round_and_return+0x405 (/usr/lib/x86_64-linux-gnu/libc.so.6)
  bash  2267/2267  [004]  9323.692625625  563caa3c89d8  jnz 0x563caa3c8b11       run_pending_traps+0x318 (/usr/bin/bash)
  bash  2267/2267  [004]  9323.692625625  563caa3c89de  add $0x128, %rsp         run_pending_traps+0x31e (/usr/bin/bash)
  bash  2267/2267  [004]  9323.692625625  563caa3c89e5  popq  %rbx               run_pending_traps+0x325 (/usr/bin/bash)
  ...
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221020152509.5298-1-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

ad7ad6b5

perf event: Drop perf_regs.h include, not needed anymore · b15cf900

Arnaldo Carvalho de Melo authored Oct 25, 2022

Since commit c8978997 ("perf tools: Prevent out-of-bounds access
to registers") the util/event.h header doesn't use anything from
util/perf_regs.h, so drop it to untangle the header dependency tree a
bit, speeding up compilation.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

b15cf900

perf scripting python: Add missing util/perf_regs.h include to get perf_reg_name() prototype · 06bf28cb
Arnaldo Carvalho de Melo authored Oct 25, 2022
```
It was getting it via event.h, that doesn't need that include anymore
and will drop it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
```
06bf28cb
perf arch x86: Add missing stdlib.h to get free() prototype · 6bc13cab
Arnaldo Carvalho de Melo authored Oct 25, 2022
```
It was getting indirectly, out of luck, add it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
```
6bc13cab

perf unwind arm64: Remove needless event.h & thread.h includes · 743ef218

Arnaldo Carvalho de Melo authored Oct 25, 2022

To reduce compile time and header dependency chains just add forward
declarations for pointer types and include linux/types.h for u64.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

743ef218

perf config: Add missing newline on pr_warning() call in home_perfconfig() · 0cef66a9

Yang Jihong authored Oct 22, 2022

Add missing newline on pr_warning() call in home_perfconfig().

Before:
  # perf record
  File /home/yangjihong/.perfconfig not owned by current user or root, ignoring it.Couldn't synthesize bpf events.

After:
  # perf record
  File /home/yangjihong/.perfconfig not owned by current user or root, ignoring it.
  Couldn't synthesize bpf events.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221022092735.114967-4-yangjihong1@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

0cef66a9

perf daemon: Complete list of supported subcommand in help message · a87edbec

Yang Jihong authored Oct 22, 2022

perf daemon supports start, signal, stop and ping subcommands, complete it

Before:

  # perf daemon -h
   Usage: perf daemon start [<options>]
      or: perf daemon [<options>]

      -v, --verbose         be more verbose
      -x, --field-separator[=<field separator>]
                            print counts with custom separator
          --base <directory>
                            base directory
          --config <config file>
                            config file path

After:

  # perf daemon -h

   Usage: perf daemon {start|signal|stop|ping} [<options>]
      or: perf daemon [<options>]

      -v, --verbose         be more verbose
      -x, --field-separator[=<field separator>]
                            print counts with custom separator
          --base <directory>
                            base directory
          --config <config file>
                            config file path
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221022092735.114967-3-yangjihong1@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

a87edbec

perf stat: Remove unused perf_counts.aggr field · 8b76a318

Namhyung Kim authored Oct 17, 2022

The aggr field in the struct perf_counts is to keep the aggregated value
in the AGGR_GLOBAL for the old code.  But it's not used anymore.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-21-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8b76a318

perf stat: Display percore events properly · cec94d69

Namhyung Kim authored Oct 17, 2022

The recent change in the perf stat broke the percore event display.
Note that the aggr counts are already processed so that the every
sibling thread in the same core will get the per-core counter values.

Check percore evsels and skip the sibling threads in the display.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-20-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

cec94d69

perf stat: Display event stats using aggr counts · 91f85f98

Namhyung Kim authored Oct 17, 2022

Now aggr counts are ready for use.  Convert the display routines to use
the aggr counts and update the shadow stat with them.  It doesn't need
to aggregate counts or collect aliases anymore during the display.  Get
rid of now unused struct perf_aggr_thread_value.

Note that there's a difference in the display order among the aggr mode.
For per-core/die/socket/node aggregation, it shows relevant events in
the same unit together, whereas global/thread/no aggregation it shows
the same events for different units together.  So it still uses separate
codes to display them due to the ordering.

One more thing to note is that it breaks per-core event display for now.
The next patch will fix it to have identical output as of now.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-19-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

91f85f98

perf stat: Add perf_stat_process_shadow_stats() · 88f1d351

Namhyung Kim authored Oct 17, 2022

This function updates the shadow stats using the aggregated counts
uniformly since it uses the aggr_counts for the every aggr mode.

It'd have duplicate shadow stats for each items for now since the
display routines will update them once again.  But that'd be fine
as it shows the average values and it'd be gone eventually.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-18-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

88f1d351

perf stat: Add perf_stat_process_percore() · 1d6d2bea

Namhyung Kim authored Oct 17, 2022

The perf_stat_process_percore() is to aggregate counts for an event per-core
even if the aggr_mode is AGGR_NONE.  This is enabled when user requested it
on the command line.

To handle that, it keeps the per-cpu counts at first.  And then it aggregates
the counts that have the same core id in the aggr->counts and updates the
values for each cpu back.

Later, per-core events will skip one of the CPUs unless percore-show-thread
option is given.  In that case, it can simply print all cpu stats with the
updated (per-core) values.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-17-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

1d6d2bea

perf stat: Add perf_stat_merge_counters() · 942c5593

Namhyung Kim authored Oct 17, 2022

The perf_stat_merge_counters() is to aggregate the same events in different
PMUs like in case of uncore or hybrid.  The same logic is in the stat-display
routines but I think it should be handled when it processes the event counters.

As it works on the aggr_counters, it doesn't change the output yet.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-16-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

942c5593

perf stat: Split process_counters() to share it with process_stat_round_event() · 8962cbec

Namhyung Kim authored Oct 17, 2022

It'd do more processing with aggregation.  Let's split the function so that it
can be shared with by process_stat_round_event() too.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-15-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8962cbec

perf stat: Reset aggr counts for each interval · 8f97963e

Namhyung Kim authored Oct 17, 2022

The evsel->stats->aggr->count should be reset for interval processing
since we want to use the values directly for display.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-14-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8f97963e

perf stat: Allocate aggr counts for recorded data · ae7e6492

Namhyung Kim authored Oct 17, 2022

In the process_stat_config_event() it sets the aggr_mode that means the
earlier evlist__alloc_stats() cannot allocate the aggr counts due to the
missing aggr_mode.

Do it after setting the aggr_map using evlist__alloc_aggr_stats().
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-13-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

ae7e6492

perf stat: Aggregate per-thread stats using evsel->stats->aggr · 050059e1

Namhyung Kim authored Oct 17, 2022

Per-thread aggregation doesn't use the CPU numbers but the logic should
be the same.  Initialize cpu_aggr_map separately for AGGR_THREAD and use
thread map idx to aggregate counter values.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-12-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

050059e1

perf stat: Factor out evsel__count_has_error() · 049aba09

Namhyung Kim authored Oct 17, 2022

It's possible to have 0 enabled/running time for some per-task or per-cgroup
events since it's not scheduled on any CPU.  Treating the whole event as
failed would not work in this case.  Thinking again, the code only existed
when any CPU-level aggregation is enabled (like per-socket, per-core, ...).

To make it clearer, factor out the condition check into the new
evsel__count_has_error() function and add some comments.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221018020227.85905-11-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

049aba09