1. 23 Feb, 2024 2 commits
    • treewide: remove meaningless assignments in Makefiles · c2bd08ba
      Masahiro Yamada authored
      In Makefiles, $(error ), $(warning ), and $(info ) expand to the empty
      string, as explained in the GNU Make manual [1]:
       "The result of the expansion of this function is the empty string."
      
      Therefore, they are no-op except for logging purposes.
      
      $(shell ...) expands to the output of the command. It expands to the
      empty string when the command does not print anything to stdout.
      Hence, $(shell mkdir ...) is no-op except for creating the directory.
      
      Remove meaningless assignments.
      
      [1]: https://www.gnu.org/software/make/manual/make.html#Make-Control-Functions

      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Link: https://lore.kernel.org/r/20240221134201.2656908-1-masahiroy@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-kbuild@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-perf-users@vger.kernel.org
    • perf print-events: make is_event_supported() more robust · 25412c03
      Mark Rutland authored
      Currently the perf tool doesn't detect support for extended event types
      on Apple M1/M2 systems, and will not auto-expand plain PERF_TYPE_HARDWARE
      events into per-PMU events. This is due to the detection of
      extended event types not handling mandatory filters required by the
      M1/M2 PMU driver.
      
      PMU drivers and the core perf_events code can require that
      perf_event_attr::exclude_* filters are configured in a specific way and
      may reject certain configurations of filters, for example:
      
      (a) Many PMUs lack support for any event filtering, and require all
          perf_event_attr::exclude_* bits to be clear. This includes Alpha's
          CPU PMU, and ARM CPU PMUs prior to the introduction of PMUv2 in
          ARMv7.
      
      (b) When /proc/sys/kernel/perf_event_paranoid >= 2, the perf core
          requires that perf_event_attr::exclude_kernel is set.
      
      (c) The Apple M1/M2 PMU requires that perf_event_attr::exclude_guest is
          set as the hardware PMU does not count while a guest is running (but
          might be extended in future to do so).
      
      In is_event_supported(), we try to account for cases (a) and (b), first
      attempting to open an event without any filters, and if this fails,
      retrying with perf_event_attr::exclude_kernel set. We do not account for
      case (c), or any other filters that drivers could theoretically require
      to be set.
      
      Thus is_event_supported() will fail to detect support for any events
      targeting an Apple M1/M2 PMU, even where events would be supported with
      perf_event_attr::exclude_guest set.
      
      Since commit:
      
        82fe2e45 ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type")
      
      ... we use is_event_supported() to detect support for extended types,
      with the PMU ID encoded into the perf_event_attr::type. As above, on an
      Apple M1/M2 system this will always fail to detect that the event is
      supported, and consequently we fail to detect support for extended types
      even when these are supported, as they have been since commit:
      
        5c816728 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability")
      
      Due to this, the perf tool will not automatically expand plain
      PERF_TYPE_HARDWARE events into per-PMU events, even when all the
      necessary kernel support is present.
      
      This patch updates is_event_supported() to additionally try opening
      events with perf_event_attr::exclude_guest set, allowing support for
      events to be detected on Apple M1/M2 systems. I believe that this is
      sufficient for all contemporary CPU PMU drivers, though in future it may
      be necessary to check for other combinations of filter bits.
      
      I've deliberately changed the check to not expect a specific error code
      for missing filters, as today the kernel may return a number of
      different error codes for missing filters (e.g. -EACCES, -EINVAL, or
      -EOPNOTSUPP) depending on why and where the filter configuration is
      rejected, and retrying for any error is more robust.
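
The retry strategy described above can be sketched in C. This is a minimal illustration under stated assumptions, not the code from the patch: `struct attr_filters`, `open_fn`, `event_is_supported()` and the two fake openers are hypothetical names, and the order of attempts is assumed. A real implementation would set the corresponding bits in `struct perf_event_attr` and call `perf_event_open()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the perf_event_attr::exclude_* bits used here. */
struct attr_filters {
	bool exclude_kernel;
	bool exclude_guest;
};

/* Hypothetical opener: returns an fd >= 0 on success, a negative errno on failure. */
typedef int (*open_fn)(const struct attr_filters *filters);

/*
 * Try several filter configurations and report support if any of them
 * opens. The error code of a failed attempt is deliberately ignored:
 * any failure simply triggers the next attempt.
 */
static bool event_is_supported(open_fn try_open)
{
	/* Assumed order: no filters first, then increasingly restrictive. */
	static const struct attr_filters attempts[] = {
		{ .exclude_kernel = false, .exclude_guest = false },
		{ .exclude_kernel = true,  .exclude_guest = false },
		{ .exclude_kernel = false, .exclude_guest = true  },
		{ .exclude_kernel = true,  .exclude_guest = true  },
	};

	assert(try_open != NULL);
	for (size_t i = 0; i < sizeof(attempts) / sizeof(attempts[0]); i++) {
		if (try_open(&attempts[i]) >= 0)
			return true;
	}
	return false;
}

/* Fake opener modelling a PMU that rejects events unless exclude_guest is set. */
static int fake_open_needs_exclude_guest(const struct attr_filters *f)
{
	return f->exclude_guest ? 3 : -EINVAL;
}

/* Fake opener modelling an event that no filter configuration can open. */
static int fake_open_unsupported(const struct attr_filters *f)
{
	(void)f;
	return -EOPNOTSUPP;
}
```

With these fakes, `event_is_supported(fake_open_needs_exclude_guest)` reports support even though the first two attempts fail, while `event_is_supported(fake_open_unsupported)` does not.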
      
      Note that this does not remove the need for commit:
      
        a24d9d9d ("perf parse-events: Make legacy events lower priority than sysfs/JSON")
      
      ... which is still necessary so that named-pmu/event/ events work on
      kernels without extended type support, even if the event name happens to
      be the same as a PERF_TYPE_HARDWARE event (e.g. as is the case for
      the M1/M2 PMU's 'cycles' and 'instructions' events).
      
      Fixes: 82fe2e45 ("perf pmus: Check if we can encode the PMU number in perf_event_attr.type")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: Ian Rogers <irogers@google.com>
      Tested-by: James Clark <james.clark@arm.com>
      Tested-by: Marc Zyngier <maz@kernel.org>
      Cc: Hector Martin <marcan@marcan.st>
      Cc: James Clark <james.clark@arm.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240126145605.1005472-1-mark.rutland@arm.com
  2. 22 Feb, 2024 12 commits
    • perf tests: Add option to run tests in parallel · b482f5f8
      Ian Rogers authored
      By default tests are forked. Add an option (-p or --parallel) so that
      the forked tests are all started in parallel and their output is then
      gathered serially. This is opt-in, as running in parallel can cause
      test flakes.
      
      Rather than fork within the code, the start_command/finish_command
      from libsubcmd are used. This changes how stderr and stdout are
      handled. The child's stderr and stdout are always read to avoid the
      child blocking. If verbose is 1 (-v), the child's stdout and stderr
      are displayed only if the test fails. If verbose is >1 (e.g. -vv),
      the child's stdout and stderr are displayed immediately.
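
The "start everything, then gather serially" pattern can be sketched as follows. This is an illustration only: it uses plain `fork()`, `pipe()` and `waitpid()` rather than libsubcmd's start_command/finish_command, and `struct child`, `start_child()`, `finish_child()` and the example tests are hypothetical names. The verbose handling is simplified to a single `echo` flag.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct child {
	pid_t pid;
	int out_fd;	/* read end of the pipe carrying the child's stdout and stderr */
};

typedef int (*test_fn)(void);

/* Fork a child that runs fn() with stdout and stderr redirected into a pipe. */
static int start_child(struct child *c, test_fn fn)
{
	int fds[2];

	assert(c != NULL && fn != NULL);
	if (pipe(fds) < 0)
		return -1;
	c->pid = fork();
	if (c->pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (c->pid == 0) {
		close(fds[0]);
		dup2(fds[1], STDOUT_FILENO);
		dup2(fds[1], STDERR_FILENO);
		close(fds[1]);
		_exit(fn());	/* the test result becomes the exit status */
	}
	close(fds[1]);
	c->out_fd = fds[0];
	return 0;
}

/* Drain the child's output until EOF so it cannot block, then reap it. */
static int finish_child(struct child *c, int echo)
{
	char buf[4096];
	ssize_t n;
	int status;

	while ((n = read(c->out_fd, buf, sizeof(buf))) > 0) {
		if (echo && write(STDOUT_FILENO, buf, (size_t)n) < 0)
			echo = 0;	/* keep draining even if echoing fails */
	}
	close(c->out_fd);
	if (waitpid(c->pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Example tests: one passes silently, one prints a message and fails. */
static int example_pass(void)
{
	return 0;
}

static int example_fail(void)
{
	static const char msg[] = "example failure\n";

	if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0)
		return 2;
	return 1;
}
```

A caller would invoke `start_child()` for every test first and only then loop over `finish_child()`, so the tests run concurrently while their results are reported in order.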
      
      An unscientific test on my laptop shows the wall clock time for perf
      test without parallel being 5 minutes 21 seconds and with parallel
      (-p) being 1 minute 50 seconds.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: llvm@lists.linux.dev
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221034155.1500118-9-irogers@google.com
    • perf tests: Run time generate shell test suites · 964461ee
      Ian Rogers authored
      Rather than special shell test logic, do a single pass to create an
      array of test suites. Hold the shell test file name in the test suite
      priv field. This makes the special shell test logic in builtin-test.c
      redundant so remove it.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: llvm@lists.linux.dev
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221034155.1500118-8-irogers@google.com
    • perf tests: Use scandirat for shell script finding · f3295f5b
      Ian Rogers authored
      Avoid filename appending buffers by using openat, faccessat and
      scandirat more widely. Turn the script's path back to a file name
      using readlink from /proc/<pid>/fd/<fd>.
      
      Read the script's description using api/io.h to avoid fdopen
      conversions. Whilst reading perform additional sanity checks on the
      script's contents.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: llvm@lists.linux.dev
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221034155.1500118-7-irogers@google.com
    • perf test: Rename builtin-test-list and add missed header guard · d5bcade9
      Ian Rogers authored
      builtin-test-list is primarily concerned with shell script
      tests. Rename the file to better reflect this and add a missed header
      guard.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: llvm@lists.linux.dev
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221034155.1500118-6-irogers@google.com
    • tools subcmd: Add a no exec function call option · 1a562c0d
      Ian Rogers authored
      Tools like perf fork tests in case they crash, but they don't want to
      exec a full binary. Add an option to call a function rather than do an
      exec. The child process exits with the result of the function call and
      is passed the struct of the run_command; things like container_of can
      then allow the child process function to determine additional
      arguments.
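
How container_of lets the callback reach extra arguments can be sketched as below. This is an illustration under assumptions, not libsubcmd's API: `struct cmd`, its `no_exec_fn` field, `struct test_cmd`, `run_test()` and `run_forked()` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Recover a pointer to the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical command struct: only the "call this instead of exec" hook matters here. */
struct cmd {
	int (*no_exec_fn)(struct cmd *self);
};

/* Caller-defined wrapper that carries the additional arguments. */
struct test_cmd {
	int test_index;
	struct cmd base;
};

/* Runs in the child: finds its arguments via the embedded struct cmd. */
static int run_test(struct cmd *self)
{
	struct test_cmd *t = container_of(self, struct test_cmd, base);

	return t->test_index * 2;
}

/* Fork, call the function instead of exec, and return its result as the exit status. */
static int run_forked(struct cmd *c)
{
	int status;
	pid_t pid;

	assert(c != NULL && c->no_exec_fn != NULL);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(c->no_exec_fn(c));
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Because the generic code only ever sees `struct cmd *`, the wrapper can grow new fields without changing the callback signature.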
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: llvm@lists.linux.dev
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221034155.1500118-5-irogers@google.com
    • perf tests: Avoid fork in perf_has_symbol test · 526f2ac9
      Ian Rogers authored
      perf test -vv Symbols is used to identify symbols within the perf
      binary. Add the -F flag so that the test command doesn't fork the test
      before running. This removes a little overhead.
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: llvm@lists.linux.dev
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221034155.1500118-4-irogers@google.com
    • perf list: Add scandirat compatibility function · 8ece26ad
      Ian Rogers authored
      scandirat is used during the printing of tracepoint events but may be
      missing from certain libcs. Add a compatibility implementation that
      uses the symlink of an fd in /proc as a path for the reliably present
      scandir.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: llvm@lists.linux.dev
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221034155.1500118-3-irogers@google.com
    • perf thread_map: Skip exited threads when scanning /proc · 510e5287
      Ian Rogers authored
      Scanning /proc is inherently racy. Scanning /proc/pid/task within that
      is also racy as the pid can terminate. Rather than failing in
      __thread_map__new_all_cpus, skip pids for such failures.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: llvm@lists.linux.dev
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221034155.1500118-2-irogers@google.com
    • perf list: fix short description for some cache events · b6968f9b
      Thomas Richter authored
      Correct the short description of the following events:
      DCW_REQ, DCW_REQ_CHIP_HIT, DCW_REQ_DRAWER_HIT, DCW_REQ_IV,
      DCW_ON_CHIP, DCW_ON_CHIP_IV, DCW_ON_CHIP_CHIP_HIT,
      DCW_ON_CHIP_DRAWER_HIT, CW_ON_MODULE, DCW_ON_DRAWER,
      DCW_OFF_DRAWER, IDCW_ON_MODULE_IV, IDCW_ON_MODULE_CHIP_HIT,
      IDCW_ON_MODULE_DRAWER_HIT, IDCW_ON_DRAWER_IV, IDCW_ON_DRAWER_CHIP_HIT,
      IDCW_ON_DRAWER_DRAWER_HIT, IDCW_OFF_DRAWER_IV, IDCW_OFF_DRAWER_CHIP_HIT,
      IDCW_OFF_DRAWER_DRAWER_HIT, ICW_REQ, ICW_REQ_IV, CW_REQ_CHIP_HIT,
      ICW_REQ_DRAWER_HIT, ICW_ON_CHIP, ICW_ON_CHIP_IV, ICW_ON_CHIP_CHIP_HIT,
      ICW_ON_CHIP_DRAWER_HIT, ICW_ON_MODULE and ICW_OFF_DRAWER.
      
      The second Cache should be L2-Cache.
      
      Output before (display diff of the first four events)
        # perf list -d
        DCW_REQ
             [Directory Write Level 1 Data Cache from Cache. Unit: cpum_cf]
        DCW_REQ_CHIP_HIT
             [Directory Write Level 1 Data Cache from Cache with Chip HP \
      	       Hit. Unit: cpum_cf]
        DCW_REQ_DRAWER_HIT
             [Directory Write Level 1 Data Cache from Cache with Drawer \
      	       HP Hit. Unit: cpum_cf]
        DCW_REQ_IV
             [Directory Write Level 1 Data Cache from Cache with Intervention. \
      	       Unit: cpum_cf]
      
      Output after:
        # perf list -d
        DCW_REQ
             [Directory Write Level 1 Data Cache from L2-Cache. Unit: cpum_cf]
        DCW_REQ_CHIP_HIT
             [Directory Write Level 1 Data Cache from L2-Cache with Chip HP \
      	       Hit. Unit: cpum_cf]
        DCW_REQ_DRAWER_HIT
             [Directory Write Level 1 Data Cache from L2-Cache with Drawer \
      	       HP Hit. Unit: cpum_cf]
        DCW_REQ_IV
             [Directory Write Level 1 Data Cache from L2-Cache with \
      	       Intervention. Unit: cpum_cf]
      
      Fixes: 7f76b311 ("perf list: Add IBM z16 event description for s390")
      Reported-by: Andreas Krebbel <krebbel@linux.ibm.com>
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Andreas Krebbel <krebbel@linux.ibm.com>
      Reviewed-by: Ian Rogers <irogers@google.com>
      Cc: gor@linux.ibm.com
      Cc: hca@linux.ibm.com
      Cc: sumanthk@linux.ibm.com
      Cc: svens@linux.ibm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221091908.1759083-1-tmricht@linux.ibm.com
    • perf stat: Fix metric-only aggregation index · bafd4e75
      Ian Rogers authored
      The aggregation index was being computed using the evsel's cpumap, which
      may have a different number of entries (typically the same or fewer).
      
      Before:
      ```
      $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1
      
       Performance counter stats for 'system wide':
      
             MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total
      CPU0                            12.8                           0.0                          12.9                          12.7                           0.0                          12.6
      CPU1
      
             1.007806367 seconds time elapsed
      ```
      
      After:
      ```
      $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1
      
       Performance counter stats for 'system wide':
      
             MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total MB/s  memory_bandwidth_total
      CPU0                            15.4                           0.0                          15.3                          15.0                           0.0                          14.9
      CPU18                            0.0                           0.0                          13.5                           5.2                           0.0                          11.9
      
             1.007858736 seconds time elapsed
      ```
      
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Kaige Ye <ye@kaige.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221070754.4163916-3-irogers@google.com
    • perf metrics: Compute unmerged uncore metrics individually · a59fb796
      Ian Rogers authored
      When merging counts from multiple uncore PMUs the metric is only
      computed for the metric leader. When merging/aggregation is disabled,
      prior to this patch just the leader's metric would be computed. Fix
      this by computing the metric for each PMU.
      
      On a SkylakeX:
      Before:
      ```
      $ perf stat -A -M memory_bandwidth_total -a sleep 1
      
       Performance counter stats for 'system wide':
      
      CPU0               82,217      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      9.2 MB/s  memory_bandwidth_total
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      0.0 MB/s  memory_bandwidth_total
      CPU0               61,395      UNC_M_CAS_COUNT.WR [uncore_imc_0]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_0]
      CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_1]
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_1]
      CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
      CPU0               81,570      UNC_M_CAS_COUNT.RD [uncore_imc_2]
      CPU18             113,886      UNC_M_CAS_COUNT.RD [uncore_imc_2]
      CPU0               62,330      UNC_M_CAS_COUNT.WR [uncore_imc_2]
      CPU18              66,942      UNC_M_CAS_COUNT.WR [uncore_imc_2]
      CPU0               75,489      UNC_M_CAS_COUNT.RD [uncore_imc_3]
      CPU18              27,958      UNC_M_CAS_COUNT.RD [uncore_imc_3]
      CPU0               55,864      UNC_M_CAS_COUNT.WR [uncore_imc_3]
      CPU18              38,727      UNC_M_CAS_COUNT.WR [uncore_imc_3]
      CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_4]
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_4]
      CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
      CPU0               75,423      UNC_M_CAS_COUNT.RD [uncore_imc_5]
      CPU18             104,527      UNC_M_CAS_COUNT.RD [uncore_imc_5]
      CPU0               57,596      UNC_M_CAS_COUNT.WR [uncore_imc_5]
      CPU18              56,777      UNC_M_CAS_COUNT.WR [uncore_imc_5]
      CPU0        1,003,440,851 ns   duration_time
      
             1.003440851 seconds time elapsed
      ```
      
      After:
      ```
      $ perf stat -A -M memory_bandwidth_total -a sleep 1
      
       Performance counter stats for 'system wide':
      
      CPU0               88,968      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      9.5 MB/s  memory_bandwidth_total
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_0] #      0.0 MB/s  memory_bandwidth_total
      CPU0               59,498      UNC_M_CAS_COUNT.WR [uncore_imc_0]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_0]
      CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_1] #      0.0 MB/s  memory_bandwidth_total
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_1] #      0.0 MB/s  memory_bandwidth_total
      CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_1]
      CPU0               88,635      UNC_M_CAS_COUNT.RD [uncore_imc_2] #      9.5 MB/s  memory_bandwidth_total
      CPU18             117,975      UNC_M_CAS_COUNT.RD [uncore_imc_2] #     11.5 MB/s  memory_bandwidth_total
      CPU0               60,829      UNC_M_CAS_COUNT.WR [uncore_imc_2]
      CPU18              62,105      UNC_M_CAS_COUNT.WR [uncore_imc_2]
      CPU0               82,238      UNC_M_CAS_COUNT.RD [uncore_imc_3] #      8.7 MB/s  memory_bandwidth_total
      CPU18              22,906      UNC_M_CAS_COUNT.RD [uncore_imc_3] #      3.6 MB/s  memory_bandwidth_total
      CPU0               53,959      UNC_M_CAS_COUNT.WR [uncore_imc_3]
      CPU18              32,990      UNC_M_CAS_COUNT.WR [uncore_imc_3]
      CPU0                    0      UNC_M_CAS_COUNT.RD [uncore_imc_4] #      0.0 MB/s  memory_bandwidth_total
      CPU18                   0      UNC_M_CAS_COUNT.RD [uncore_imc_4] #      0.0 MB/s  memory_bandwidth_total
      CPU0                    0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
      CPU18                   0      UNC_M_CAS_COUNT.WR [uncore_imc_4]
      CPU0               83,595      UNC_M_CAS_COUNT.RD [uncore_imc_5] #      8.9 MB/s  memory_bandwidth_total
      CPU18             110,151      UNC_M_CAS_COUNT.RD [uncore_imc_5] #     10.5 MB/s  memory_bandwidth_total
      CPU0               56,540      UNC_M_CAS_COUNT.WR [uncore_imc_5]
      CPU18              53,816      UNC_M_CAS_COUNT.WR [uncore_imc_5]
      CPU0        1,003,353,416 ns   duration_time
      ```
      
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Kaige Ye <ye@kaige.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221070754.4163916-2-irogers@google.com
    • perf stat: Pass fewer metric arguments · eee41e6b
      Ian Rogers authored
      Pass metric_expr and evsel rather than specific variables from the
      struct, thereby reducing the number of arguments. This will enable
      later fixes.
      
      To reduce the size of the diff, local variables are added to match the
      previous parameter names. This isn't done in the case of "name" as
      evsel->name is more intention revealing. A whitespace issue is also
      addressed.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Kaige Ye <ye@kaige.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: John Garry <john.g.garry@oracle.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240221070754.4163916-1-irogers@google.com
  3. 21 Feb, 2024 5 commits
  4. 17 Feb, 2024 2 commits
    • perf list: For metricgroup only list include description · 81377de0
      Ian Rogers authored
      If perf list is invoked with 'metricgroups' include the description
      unless it is invoked with flags to exclude it. Make the description of
      metricgroup dumping dependent on the desc flag in print_state as with
      metrics.
      
      Before:
      ```
      $ perf list metricgroups
      List of pre-defined events (to be used in -e or -M):
      
      Metric Groups:
      
      Backend
      Bad
      BadSpec
      ...
      ```
      
      After:
      ```
      $ perf list metricgroups
      List of pre-defined events (to be used in -e or -M):
      
      Metric Groups:
      
      Backend [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
      Bad [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
      BadSpec
      ...
      ```
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240216192044.119897-1-irogers@google.com
    • perf tools: Fixup module symbol end address properly · bacefe0c
      Namhyung Kim authored
      I got a strange error on ARM: processing a FINISHED_ROUND record
      failed.  It turned out that it was failing in symbol__alloc_hist()
      because the symbol size was too big.

      It failed when a sample was captured in a specific BPF program.  I
      added some debug code and found that the end address of the symbol came
      from the next module, which is placed far away.
      
        ffff800008795778-ffff80000879d6d8: bpf_prog_1bac53b8aac4bc58_netcg_sock    [bpf]
        ffff80000879d6d8-ffff80000ad656b4: bpf_prog_76867454b5944e15_netcg_getsockopt      [bpf]
        ffff80000ad656b4-ffffd69b7af74048: bpf_prog_1d50286d2eb1be85_hn_egress     [bpf]   <---------- here
        ffffd69b7af74048-ffffd69b7af74048: $x.5    [sha3_generic]
        ffffd69b7af74048-ffffd69b7af740b8: crypto_sha3_init        [sha3_generic]
        ffffd69b7af740b8-ffffd69b7af741e0: crypto_sha3_update      [sha3_generic]
      
      The logic in symbols__fixup_end() just uses curr->start to update
      prev->end.  But in this case, that won't work as the two addresses are
      too far apart.

      I think ARM has a different kernel memory layout for modules and BPF
      than x86.  There is already logic to handle the kernel and module
      boundary.  Let's do the same for symbols between different modules.
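
The boundary handling can be sketched as below. This is a simplified illustration, not the actual symbols__fixup_end() code: `struct sym`, `module_of()`, `same_module()`, `fixup_ends()`, the 4096-byte page size and the exact rounding used for the fallback are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ASSUMED_PAGE 4096ULL	/* assumed page size for the fallback end address */

struct sym {
	uint64_t start, end;
	const char *name;	/* kallsyms-style: "func [module]" or plain "func" */
};

/* Return the "[module]" suffix, or NULL for a core kernel symbol. */
static const char *module_of(const struct sym *s)
{
	return strchr(s->name, '[');
}

static int same_module(const struct sym *a, const struct sym *b)
{
	const char *ma = module_of(a), *mb = module_of(b);

	if (!ma || !mb)
		return ma == mb;	/* equal only if both are core kernel symbols */
	return strcmp(ma, mb) == 0;
}

/* Fallback: end the symbol at the next page boundary after start + one page. */
static uint64_t page_fallback_end(uint64_t start)
{
	return (start + ASSUMED_PAGE + ASSUMED_PAGE - 1) / ASSUMED_PAGE * ASSUMED_PAGE;
}

/* syms must be sorted by start address; zero-sized symbols get an end address. */
static void fixup_ends(struct sym *syms, size_t n)
{
	assert(syms != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct sym *s = &syms[i];

		if (s->end != s->start)
			continue;	/* size already known */
		if (i + 1 < n && same_module(s, &syms[i + 1]))
			s->end = syms[i + 1].start;
		else
			s->end = page_fallback_end(s->start);
	}
}
```

With this rule, a BPF symbol followed by a symbol from a distant module gets a page-sized end address instead of inheriting the far-away start of its neighbour.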
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: Leo Yan <leo.yan@linux.dev>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Link: https://lore.kernel.org/r/20240212233322.1855161-1-namhyung@kernel.org
  5. 16 Feb, 2024 19 commits