1. 22 Feb, 2024 2 commits
    • perf metrics: Compute unmerged uncore metrics individually · a59fb796
      Ian Rogers authored
      When merging counts from multiple uncore PMUs the metric is only
      computed for the metric leader. When merging/aggregation is disabled,
      prior to this patch just the leader's metric would be computed. Fix
      this by computing the metric for each PMU.
      
      On a SkylakeX:
      Before:
      ```
      $ perf stat -A -M memory_bandwidth_total -a sleep 1
      
       Performance counter stats for 'system wide':
      
      CPU0               82,217      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      9.2 MB/s  memory_bandwidth_total
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      0.0 MB/s  memory_bandwidth_total
      CPU0               61,395      UNC_M_CAS_COUNT.WR [uncore_imc_0]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_0]
      CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_1]
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_1]
      CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
      CPU0               81,570      UNC_M_CAS_COUNT.RD [uncore_imc_2]
      CPU18             113,886      UNC_M_CAS_COUNT.RD [uncore_imc_2]
      CPU0               62,330      UNC_M_CAS_COUNT.WR [uncore_imc_2]
      CPU18              66,942      UNC_M_CAS_COUNT.WR [uncore_imc_2]
      CPU0               75,489      UNC_M_CAS_COUNT.RD [uncore_imc_3]
      CPU18              27,958      UNC_M_CAS_COUNT.RD [uncore_imc_3]
      CPU0               55,864      UNC_M_CAS_COUNT.WR [uncore_imc_3]
      CPU18              38,727      UNC_M_CAS_COUNT.WR [uncore_imc_3]
      CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_4]
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_4]
      CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
      CPU0               75,423      UNC_M_CAS_COUNT.RD [uncore_imc_5]
      CPU18             104,527      UNC_M_CAS_COUNT.RD [uncore_imc_5]
      CPU0               57,596      UNC_M_CAS_COUNT.WR [uncore_imc_5]
      CPU18              56,777      UNC_M_CAS_COUNT.WR [uncore_imc_5]
      CPU0        1,003,440,851 ns   duration_time
      
             1.003440851 seconds time elapsed
      ```
      
      After:
      ```
      $ perf stat -A -M memory_bandwidth_total -a sleep 1
      
       Performance counter stats for 'system wide':
      
      CPU0               88,968      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      9.5 MB/s  memory_bandwidth_total
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      0.0 MB/s  memory_bandwidth_total
      CPU0               59,498      UNC_M_CAS_COUNT.WR [uncore_imc_0]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_0]
      CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_1] #      0.0 MB/s  memory_bandwidth_total
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_1] #      0.0 MB/s  memory_bandwidth_total
      CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
      CPU0               88,635      UNC_M_CAS_COUNT.RD [uncore_imc_2] #      9.5 MB/s  memory_bandwidth_total
      CPU18             117,975      UNC_M_CAS_COUNT.RD [uncore_imc_2] #     11.5 MB/s  memory_bandwidth_total
      CPU0               60,829      UNC_M_CAS_COUNT.WR [uncore_imc_2]
      CPU18              62,105      UNC_M_CAS_COUNT.WR [uncore_imc_2]
      CPU0               82,238      UNC_M_CAS_COUNT.RD [uncore_imc_3] #      8.7 MB/s  memory_bandwidth_total
      CPU18              22,906      UNC_M_CAS_COUNT.RD [uncore_imc_3] #      3.6 MB/s  memory_bandwidth_total
      CPU0               53,959      UNC_M_CAS_COUNT.WR [uncore_imc_3]
      CPU18              32,990      UNC_M_CAS_COUNT.WR [uncore_imc_3]
      CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_4] #      0.0 MB/s  memory_bandwidth_total
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_4] #      0.0 MB/s  memory_bandwidth_total
      CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
      CPU0               83,595      UNC_M_CAS_COUNT.RD [uncore_imc_5] #      8.9 MB/s  memory_bandwidth_total
      CPU18             110,151      UNC_M_CAS_COUNT.RD [uncore_imc_5] #     10.5 MB/s  memory_bandwidth_total
      CPU0               56,540      UNC_M_CAS_COUNT.WR [uncore_imc_5]
      CPU18              53,816      UNC_M_CAS_COUNT.WR [uncore_imc_5]
      CPU0        1,003,353,416 ns   duration_time
      ```
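
      The MB/s values printed above can be reproduced by hand. The sketch
      below assumes the metric is (reads + writes) * 64 bytes per CAS,
      divided by the elapsed time and scaled to 10^6 bytes. That formula is
      inferred from the numbers in the output, not copied from the metric
      definition, and imc_bandwidth_mbps() is a hypothetical helper rather
      than anything in perf.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Assumption: each CAS command moves one 64-byte cache line. */
      #define CAS_BYTES 64.0

      /*
       * Bandwidth in MB/s (1 MB = 1e6 bytes) for one uncore PMU instance,
       * computed from that instance's own read and write CAS counts.
       * Hypothetical helper for illustration only.
       */
      double imc_bandwidth_mbps(uint64_t cas_rd, uint64_t cas_wr,
                                uint64_t duration_ns)
      {
              assert(duration_ns > 0);

              double bytes = (double)(cas_rd + cas_wr) * CAS_BYTES;
              double seconds = (double)duration_ns / 1e9;

              return bytes / 1e6 / seconds;
      }
      ```

      With the "After" counts for CPU0 on uncore_imc_0 (88,968 reads, 59,498
      writes, 1,003,353,416 ns) this gives roughly 9.5 MB/s, in line with
      the value printed on that row.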
      
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Kaige Ye <ye@kaige.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221070754.4163916-2-irogers@google.com
    • perf stat: Pass fewer metric arguments · eee41e6b
      Ian Rogers authored
      Pass metric_expr and evsel rather than specific variables from the
      struct, thereby reducing the number of arguments. This will enable
      later fixes.
      
      To reduce the size of the diff, local variables are added to match the
      previous parameter names. This isn't done in the case of "name" as
      evsel->name is more intention revealing. A whitespace issue is also
      addressed.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Kaige Ye <ye@kaige.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221070754.4163916-1-irogers@google.com
  2. 21 Feb, 2024 5 commits
  3. 17 Feb, 2024 2 commits
    • perf list: For metricgroup only list include description · 81377de0
      Ian Rogers authored
      If perf list is invoked with 'metricgroups', include the description
      unless it is invoked with flags to exclude it. Make printing the
      metricgroup description depend on the desc flag in print_state, as
      is done for metrics.
      
      Before:
      ```
      $ perf list metricgroups
      List of pre-defined events (to be used in -e or -M):
      
      Metric Groups:
      
      Backend
      Bad
      BadSpec
      ...
      ```
      
      After:
      ```
      $ perf list metricgroups
      List of pre-defined events (to be used in -e or -M):
      
      Metric Groups:
      
      Backend [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
      Bad [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
      BadSpec
      ...
      ```
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240216192044.119897-1-irogers@google.com
    • perf tools: Fixup module symbol end address properly · bacefe0c
      Namhyung Kim authored
      I got a strange error on ARM: processing a FINISHED_ROUND record
      failed.  It turned out that it was failing in symbol__alloc_hist()
      because the symbol size was too big.
      
      It failed when a sample was captured in a specific BPF program.  I
      added debug code and found that the end address of the symbol came
      from the next module, which is placed far away.
      
        ffff800008795778-ffff80000879d6d8: bpf_prog_1bac53b8aac4bc58_netcg_sock    [bpf]
        ffff80000879d6d8-ffff80000ad656b4: bpf_prog_76867454b5944e15_netcg_getsockopt      [bpf]
        ffff80000ad656b4-ffffd69b7af74048: bpf_prog_1d50286d2eb1be85_hn_egress     [bpf]   <---------- here
        ffffd69b7af74048-ffffd69b7af74048: $x.5    [sha3_generic]
        ffffd69b7af74048-ffffd69b7af740b8: crypto_sha3_init        [sha3_generic]
        ffffd69b7af740b8-ffffd69b7af741e0: crypto_sha3_update      [sha3_generic]
      
      The logic in symbols__fixup_end() just uses curr->start to update
      prev->end.  That doesn't work here because the next symbol belongs to
      a different module and its start address is far away.
      
      I think ARM has a different kernel memory layout for modules and BPF
      than x86.  There is already logic to handle the kernel/module
      boundary.  Let's do the same for symbols in different modules.
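
      The idea can be sketched in isolation as follows. This is a minimal
      illustration, not the perf implementation: struct sketch_sym, the
      function names, the 4096-byte page size, and the choice to round up
      to the next page at a module boundary are all assumptions made for
      the example.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      /* Assumed page size used for the fallback rounding. */
      #define SKETCH_PAGE_SIZE 4096ULL

      /* Hypothetical, simplified symbol record; not perf's struct symbol. */
      struct sketch_sym {
              uint64_t start;
              uint64_t end;        /* 0 means the end is not known yet */
              const char *module;  /* e.g. "bpf" or "sha3_generic" */
      };

      /* Return the first page boundary strictly above addr. */
      uint64_t sketch_next_page(uint64_t addr)
      {
              return (addr / SKETCH_PAGE_SIZE + 1) * SKETCH_PAGE_SIZE;
      }

      /* Fill in missing end addresses; syms must be sorted by start. */
      void sketch_fixup_end(struct sketch_sym *syms, size_t n)
      {
              for (size_t i = 0; i < n; i++) {
                      struct sketch_sym *prev = &syms[i];

                      if (prev->end != 0)
                              continue;

                      if (i + 1 < n) {
                              struct sketch_sym *curr = &syms[i + 1];

                              assert(prev->start <= curr->start);
                              if (strcmp(prev->module, curr->module) == 0) {
                                      /* Same module: the next symbol bounds this one. */
                                      prev->end = curr->start;
                                      continue;
                              }
                      }

                      /* Module boundary or last symbol: don't trust the next start. */
                      prev->end = sketch_next_page(prev->start);

                      /* Never extend past the following symbol. */
                      if (i + 1 < n && prev->end > syms[i + 1].start)
                              prev->end = syms[i + 1].start;
              }
      }
      ```

      Applied to the hn_egress symbol from the listing above (start
      ffff80000ad656b4, module bpf, followed by a sha3_generic symbol), the
      sketch ends the symbol at ffff80000ad66000 instead of stretching it
      to ffffd69b7af74048.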
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: Leo Yan <leo.yan@linux.dev>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Link: https://lore.kernel.org/r/20240212233322.1855161-1-namhyung@kernel.org
  4. 16 Feb, 2024 31 commits