1. 13 Aug, 2022 14 commits
    • Ian Rogers's avatar
      perf jevents: Fold strings optimization · d0313e62
      Ian Rogers authored
      If a shorter string ends a longer string then the shorter string may
      reuse the longer string at an offset. For example, on x86 the event
      arith.cycles_div_busy and cycles_div_busy can be folded, even though
      they have difference names the strings are identical after 6
      characters. cycles_div_busy can reuse the arith.cycles_div_busy string
      at an offset of 6.
      
      In pmu-events.c this looks like the following where the 'also:' lists
      folded strings:
      
      /* offset=177541 */ "arith.cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000" /* also: cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000 */
      
      As jevents.py combines multiple strings for an event into a larger
      string, the amount of folding is minimal as all parts of the event must
      align. Other organizations can benefit more from folding, but lose space
      by say recording more offsets.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-15-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d0313e62
    • Ian Rogers's avatar
      perf jevents: Compress the pmu_events_table · 9118259c
      Ian Rogers authored
      The pmu_events array requires 15 pointers per entry which in position
      independent code need relocating. Change the array to be an array of
      offsets within a big C string. Only the offset of the first variable is
      required, subsequent variables are stored in order after the \0
      terminator (requiring a byte per variable rather than 4 bytes per
      offset).
      
      The file size savings are:
      
      no jevents - the same 19,788,464bytes
      x86 jevents - ~16.7% file size saving 23,744,288bytes vs 28,502,632bytes
      all jevents - ~19.5% file size saving 24,469,056bytes vs 30,379,920bytes
      default build options plus NO_LIBBFD=1.
      
      For example, the x86 build savings come from .rela.dyn and
      .data.rel.ro becoming smaller by 3,157,032bytes and 3,042,016bytes
      respectively. .rodata increases by 1,432,448bytes, giving an overall
      4,766,600bytes saving.
      
      To make metric strings more shareable, the topic is changed from say
      'skx metrics' to just 'metrics'.
      
      To try to help with the memory layout the pmu_events are ordered as used
      by perf qsort comparator functions.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-14-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9118259c
    • Ian Rogers's avatar
      perf metrics: Copy entire pmu_event in find metric · d3abd7b8
      Ian Rogers authored
      The pmu_event passed to the pmu_events_table_for_each_event is invalid
      after the loop. Copy the entire struct in metricgroup__find_metric.
      Reduce the scope of this function to static.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-13-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d3abd7b8
    • Ian Rogers's avatar
      perf pmu-events: Hide the pmu_events · 1ba3752a
      Ian Rogers authored
      Hide that the pmu_event structs are an array with a new wrapper struct.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-12-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      1ba3752a
    • Ian Rogers's avatar
      perf pmu-events: Don't assume pmu_event is an array · 660842e4
      Ian Rogers authored
      The current code assumes that a struct pmu_event can be iterated over
      forward until a NULL pmu_event is encountered.
      
      This makes it difficult to refactor pmu_event.
      
      Add a loop function taking a callback function that's passed the struct
      pmu_event.
      
      This way the pmu_event is only needed for one element and not an entire
      array.
      
      Switch existing code iterating over the pmu_event arrays to use the new
      loop function pmu_events_table_for_each_event.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-11-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      660842e4
    • Ian Rogers's avatar
      perf pmu-events: Move test events/metrics to JSON · 7ae5c03a
      Ian Rogers authored
      Move arrays of pmu_events into the JSON code so that it may be
      regenerated and modified by the jevents.py script.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-10-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7ae5c03a
    • Ian Rogers's avatar
      perf test: Use full metric resolution · 64234c14
      Ian Rogers authored
      The simple metric resolution doesn't handle recursion properly, switch
      to use the full resolution as with the parse-metric tests which also
      increases coverage. Don't set the values for the metric backward as
      failures to generate a result are ignored.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-9-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      64234c14
    • Ian Rogers's avatar
      perf pmu-events: Hide pmu_events_map · 29be2fe0
      Ian Rogers authored
      Move usage of the table to pmu-events.c so it may be hidden. By
      abstracting the table the implementation can later be changed.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-8-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      29be2fe0
    • Ian Rogers's avatar
      perf pmu-events: Avoid passing pmu_events_map · eeac7730
      Ian Rogers authored
      Preparation for hiding pmu_events_map as an implementation detail. While
      the map is passed, the table of events is all that is normally wanted.
      
      While modifying the function's types, rename pmu_events_map__find to
      pmu_events_table__find to match later encapsulation. Similarly rename
      pmu_add_cpu_aliases_map to pmu_add_cpu_aliases_table.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-7-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      eeac7730
    • Ian Rogers's avatar
      perf pmu-events: Hide pmu_sys_event_tables · 2519db2a
      Ian Rogers authored
      Move usage of the table to pmu-events.c so it may be hidden. By
      abstracting the table the implementation can later be changed.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-6-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2519db2a
    • Ian Rogers's avatar
      perf jevents: Sort JSON files entries · 7b2f844c
      Ian Rogers authored
      Sort the JSON files entries on conversion to C. The sort order tries to
      replicated cmp_sevent from pmu.c so that the input there is already
      sorted except for sysfs events. Specifically, the sort order is given by
      the tuple:
      
      (not j.desc is None, fix_none(j.topic), fix_none(j.name), fix_none(j.pmu), fix_none(j.metric_name))
      
      which is putting events with descriptions and topics before those
      without, then sorting by name, then pmu and finally metric_name
      
      Add the topic to JsonEvent on reading to simplify. Remove an unnecessary
      lambda in the JSON reading.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-5-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7b2f844c
    • Ian Rogers's avatar
      perf jevents: Provide path to JSON file on error · ee2ce6fd
      Ian Rogers authored
      If a JSONDecoderError or similar is raised then it is useful to know the
      path. Print this and then raise the exception agan.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-4-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ee2ce6fd
    • Ian Rogers's avatar
      perf jevents: Remove the type/version variables · f793ae18
      Ian Rogers authored
      pmu_events_map has a type variable that is always initialized to "core"
      and a version variable that is never read. Remove these from the API as
      it is straightforward to add them back when necessary.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-3-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f793ae18
    • Ian Rogers's avatar
      perf jevent: Add an 'all' architecture argument · 099b157c
      Ian Rogers authored
      When 'all' is passed as the architecture generate a mapping table for
      all architectures. This simplifies testing. To identify the table for an
      architecture add an arch variable to the pmu_events_map.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220812230949.683239-2-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      099b157c
  2. 12 Aug, 2022 9 commits
  3. 11 Aug, 2022 17 commits
    • Leo Yan's avatar
      perf c2c: Update documentation for new display option 'peer' · e754dd7e
      Leo Yan authored
      Since the new display option 'peer' is introduced, this patch is to
      update the documentation to reflect it.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-16-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e754dd7e
    • Leo Yan's avatar
      perf c2c: Use 'peer' as default display for Arm64 · ead42a0f
      Leo Yan authored
      Since Arm64 arch doesn't support HITMs flags, this patch changes to use
      'peer' as default display if user doesn't specify any type; for other
      arches, it still uses 'tot' as default display type if user doesn't
      specify it.
      
      This patch changes to call perf_session__new() in an earlier place, so
      session environment can be initialized ahead and arch info can be used
      for setting display type.
      Suggested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-15-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ead42a0f
    • Leo Yan's avatar
      perf c2c: Sort on peer snooping for load operations · f37c5d91
      Leo Yan authored
      This patch adds a new option 'peer' so can sort on the cache hit for
      peer snooping.
      
      For displaying with option 'peer', the "Shared Data Cache Line Table"
      and "Shared Cache Line Distribution Pareto" both sort with the metrics
      "tot_peer".
      
      As result, we can get the 'peer' display:
      
        # perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio
      
        =================================================
                   Shared Data Cache Line Table
        =================================================
        #
        #        ----------- Cacheline ----------     Peer  ------- Load Peer -------    Total    Total    Total  --------- Stores --------  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
        # Index             Address  Node  PA cnt    Snoop    Total    Local   Remote  records    Loads   Stores    L1Hit   L1Miss      N/A       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
        # .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
        #
              0      0xaaaac17d6000   N/A       0  100.00%       99       99        0    18851    18851        0        0        0        0        0    18752        0        99        0         0        0         0         0
      
        =================================================
              Shared Cache Line Distribution Pareto
        =================================================
        #
        #        -- Peer Snoop --  ------- Store Refs ------  --------- Data address ---------                                                  ---------- cycles ----------    Total       cpu                                    Shared
        #   Num      Rmt      Lcl   L1 Hit  L1 Miss      N/A              Offset  Node  PA cnt      Pid                Tid        Code address  rmt peer  lcl peer      load  records       cnt                  Symbol            Object      Source:Line  Node{cpus %peers %stores}
        # .....  .......  .......  .......  .......  .......  ..................  ....  ......  .......  .................  ..................  ........  ........  ........  .......  ........  ......................  ................  ...............  ....
        #
          ----------------------------------------------------------------------
              0        0       99        0        0        0      0xaaaac17d6000
          ----------------------------------------------------------------------
                   0.00%    3.03%    0.00%    0.00%    0.00%                0x20   N/A       0     3603     3603:memstress      0xaaaac17c25ac         0       376        41     9314         2  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 2 100.0%    n/a}
                   0.00%    3.03%    0.00%    0.00%    0.00%                0x20   N/A       0     3603     3606:memstress      0xaaaac17c25ac         0       375        44     9155         1  [.] 0x00000000000025ac  memstress         memstress[25ac]   0{ 1 100.0%    n/a}
                   0.00%   48.48%    0.00%    0.00%    0.00%                0x29   N/A       0     3603     3606:memstress      0xaaaac17c3e88         0       180       170       65         1  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 1 100.0%    n/a}
                   0.00%   45.45%    0.00%    0.00%    0.00%                0x29   N/A       0     3603     3603:memstress      0xaaaac17c3e88         0       180       175       70         2  [.] 0x0000000000003e88  memstress         memstress[3e88]   0{ 2 100.0%    n/a}
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-14-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f37c5d91
    • Leo Yan's avatar
      perf c2c: Refactor display string · faa30dfa
      Leo Yan authored
      The display type is shown by combination the display string array and a
      suffix string "HITMs", which is not friendly to extend display for other
      sorting type (e.g. extension for peer operations).
      
      This patch moves the suffix string "HITMs" into display string array for
      HITM types, so it can allow us to not necessarily to output string
      "HITMs" for new incoming display type.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-13-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      faa30dfa
    • Leo Yan's avatar
      perf c2c: Refactor node header · 7c10b65a
      Leo Yan authored
      The node header array contains 3 items, each item is used for one of
      the 3 flavors for node accessing info.  To extend sorting on other
      snooping type and not always stick to HITMs, the second header string
      "Node{cpus %hitms %stores}" should be adjusted (e.g. it's changed as
      "Node{cpus %peer %stores}").
      
      For this reason, this patch changes the node header array to three
      flat variables and uses switch-case in function setup_nodes_header(),
      thus it is easier for altering the header string.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-12-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7c10b65a
    • Leo Yan's avatar
      perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop' · 2be0bc75
      Leo Yan authored
      Use more general naming for the main sort dimension, this can allow us
      not to sort only on HITM snoop type, so it can be extended to support
      other costly snooping operations.  So rename the dimension to the prefix
      'percent_costly_".
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-11-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2be0bc75
    • Leo Yan's avatar
      perf c2c: Use explicit names for display macros · c82ccc3a
      Leo Yan authored
      Perf c2c tool has an assumption that it heavily depends on HITM snoop
      type to detect cache false sharing, unfortunately, HITM is not supported
      on some architectures.
      
      Essentially, perf c2c tool wants to find some very costly snooping
      operations for false cache sharing, this means it's not necessarily
      to stick using HITM tags and we can explore other snooping types
      (e.g. SNOOPX_PEER).
      
      For this reason, this patch renames HITM related display macros with
      suffix '_HITM', so it can be distinct if later add more display types
      for on other snooping type.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-10-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c82ccc3a
    • Leo Yan's avatar
      perf c2c: Add mean dimensions for peer operations · 682352e5
      Leo Yan authored
      This patch adds two dimensions for the mean value of peer operations.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-9-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      682352e5
    • Leo Yan's avatar
      perf c2c: Add dimensions of peer metrics for cache line view · 9082282f
      Leo Yan authored
      This patch adds dimensions of peer ops, which will be used for Shared
      cache line distribution pareto.
      
      It adds the percentage dimensions for local and remote peer operations,
      and the dimensions for accounting operation numbers which is used for
      stdio mode.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-8-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      9082282f
    • Leo Yan's avatar
      perf c2c: Add dimensions for peer load operations · 63e74ab5
      Leo Yan authored
      This patch adds three dimensions for peer load operations of 'lcl_peer',
      'rmt_peer' and 'tot_peer'.  These three dimensions will be used in the
      shared data cache line table.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-7-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      63e74ab5
    • Leo Yan's avatar
      perf c2c: Output statistics for peer snooping · 3ef1fc17
      Leo Yan authored
      This patch outputs statistics for peer snooping for whole trace events
      and global shared cache line.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-6-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3ef1fc17
    • Leo Yan's avatar
      perf mem: Add statistics for peer snooping · e843dec5
      Leo Yan authored
      Since the flag PERF_MEM_SNOOPX_PEER is added to support cache snooping
      from peer cache line, it can come from a peer core, a peer cluster, or
      a remote NUMA node.
      
      This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER.  Note, we
      take PERF_MEM_SNOOPX_PEER as an affiliated info, it needs to cooperate
      with cache level statistics.  Therefore, we account the load operations
      for both the cache level's metrics (e.g. ld_l2hit, ld_llchit, etc.) and
      peer related metrics when flag PERF_MEM_SNOOPX_PEER is set.
      
      So three new metrics are introduced: 'lcl_peer' is for local cache
      access, the metric 'rmt_peer' is for remote access (includes remote DRAM
      and any caches in remote node), and the metric 'tot_peer' is accounting
      the sum value of 'lcl_peer' and 'rmt_peer'.
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Acked-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-5-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e843dec5
    • Ali Saidi's avatar
      perf arm-spe: Use SPE data source for neoverse cores · 4e6430cb
      Ali Saidi authored
      When synthesizing data from SPE, augment the type with source information
      for Arm Neoverse cores. The field is IMPLDEF but the Neoverse cores all use
      the same encoding. I can't find encoding information for any other SPE
      implementations to unify their choices with Arm's thus that is left for
      future work.
      
      This change populates the mem_lvl_num for Neoverse cores as well as the
      deprecated mem_lvl namespace.
      Reviewed-by: default avatarGerman Gomez <german.gomez@arm.com>
      Reviewed-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarAli Saidi <alisaidi@amazon.com>
      Tested-by: default avatarLeo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-4-leo.yan@linaro.orgSigned-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4e6430cb
    • Leo Yan's avatar
      perf mem: Print snoop peer flag · f78d6250
      Leo Yan authored
      Since PERF_MEM_SNOOPX_PEER flag is a new snoop type, print this flag if
      it is set.
      
      Before:
             memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK  N/A               aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
      
      After:
      
             memstress  3603 [020]   122.463754:          1            l1d-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          l1d-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1            llc-miss:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          llc-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1          tlb-access:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
             memstress  3603 [020]   122.463754:          1              memory:       8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK  N/A              aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
      Reviewed-by: default avatarAli Saidi <alisaidi@amazon.com>
      Reviewed-by: default avatarKajol Jain <kjain@linux.ibm.com>
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Tested-by: default avatarAli Saidi <alisaidi@amazon.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-3-leo.yan@linaro.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      f78d6250
    • Ali Saidi's avatar
      perf tools: Sync addition of PERF_MEM_SNOOPX_PEER · 2e21bcf0
      Ali Saidi authored
      Add a flag to the 'perf mem' data struct to signal that a request caused
      a cache-to-cache transfer of a line from a peer of the requestor and
      wasn't sourced from a lower cache level.
      
      The line being moved from one peer cache to another has latency and
      performance implications.
      
      On Arm64 Neoverse systems the data source can indicate a cache-to-cache
      transfer but not if the line is dirty or clean, so instead of
      overloading HITM define a new flag that indicates this type of transfer.
      
      Committer notes:
      
      This really is not syncing with the kernel since the patch to the kernel
      wasn't merged.
      
      But we're going ahead of this as it seems trivial and is just a matter
      of the perf kernel maintainers to give their ack or for us to find
      another way of expressing this in the perf records synthesized in
      userspace from the ARM64 hardware traces.
      Reviewed-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarAli Saidi <alisaidi@amazon.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Like Xu <likexu@tencent.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Timothy Hayes <timothy.hayes@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20220811062451.435810-2-leo.yan@linaro.orgSigned-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2e21bcf0
    • Leo Yan's avatar
      perf arm64: Add missing -I for tools/arch/arm64/include/ to find asm/sysreg.h... · 4a88c4ec
      Leo Yan authored
      perf arm64: Add missing -I for tools/arch/arm64/include/ to find asm/sysreg.h when building arm_spe.h
      
      This cures a current problem where tools/perf/util/arm-spe.c isn't
      finding a ARM64 specific asm header, so lets add it for now to make
      progress.
      
      Adding a .o specific rule seems clunky, lets try and find if this is
      really the right solution.
      Signed-off-by: default avatarLeo Yan <leo.yan@linaro.org>
      Reported-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Link: https://lore.kernel.org/lkml/20220811124825.GA868014@leoy-huanghe.lanSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4a88c4ec
    • Adrian Hunter's avatar
      perf tools: Tidy guest option documentation · 53e76d35
      Adrian Hunter authored
      Move common guest options into include files. Use attribute substitution to
      customize an example, using "[verse]" to define the block instead of a
      "literal" block which does not permit substitution.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20220811170411.84154-4-adrian.hunter@intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      53e76d35