1. 07 May, 2024 11 commits
    • Ian Rogers's avatar
      perf block-info: Remove unused refcount · 557b32c3
      Ian Rogers authored
      block_info__get() has no callers so the refcount is only ever one. As
      such remove the reference counting logic and turn puts to deletes.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Link: https://lore.kernel.org/r/20240507183545.1236093-4-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      557b32c3
    • Ian Rogers's avatar
      perf annotate: Fix memory leak in annotated_source · a3f7768b
      Ian Rogers authored
      Freeing hash map doesn't free the entries added to the hashmap, add
      the missing free().
      
      Fixes: d3e7cad6 ("perf annotate: Add a hashmap for symbol histogram")
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Link: https://lore.kernel.org/r/20240507183545.1236093-3-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a3f7768b
    • Ian Rogers's avatar
      perf ui browser: Don't save pointer to stack memory · 769e6a1e
      Ian Rogers authored
      ui_browser__show() is capturing the input title that is stack allocated
      memory in hist_browser__run().
      
      Avoid a use after return by strdup-ing the string.
      
      Committer notes:
      
      Further explanation from Ian Rogers:
      
      My command line using tui is:
      $ sudo bash -c 'rm /tmp/asan.log*; export
      ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a
      sleep 1; /tmp/perf/perf mem report'
      I then go to the perf annotate view and quit. This triggers the asan
      error (from the log file):
      ```
      ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address
      0x7f2813331920 at pc 0x7f28180
      65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10
      READ of size 80 at 0x7f2813331920 thread T0
          #0 0x7f2818065990 in __interceptor_strlen
      ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461
          #1 0x7f2817698251 in SLsmg_write_wrapped_string
      (/lib/x86_64-linux-gnu/libslang.so.2+0x98251)
          #2 0x7f28176984b9 in SLsmg_write_nstring
      (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9)
          #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60
          #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266
          #5 0x55c94045c776 in ui_browser__show ui/browser.c:288
          #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206
          #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458
          #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412
          #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527
          #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613
          #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661
          #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671
          #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141
          #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805
          #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374
          #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516
          #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350
          #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403
          #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447
          #20 0x55c9400e53ad in main tools/perf/perf.c:561
          #21 0x7f28170456c9 in __libc_start_call_main
      ../sysdeps/nptl/libc_start_call_main.h:58
          #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360
          #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId:
      84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93)
      
      Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame
          #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746
      
        This frame has 1 object(s):
          [32, 192) 'title' (line 747) <== Memory access at offset 32 is
      inside this variable
      HINT: this may be a false positive if your program uses some custom
      stack unwind mechanism, swapcontext or vfork
      ```
      hist_browser__run isn't on the stack so the asan error looks legit.
      There's no clean init/exit on struct ui_browser so I may be trading a
      use-after-return for a memory leak, but that seems look a good trade
      anyway.
      
      Fixes: 05e8b080 ("perf ui browser: Stop using 'self'")
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      769e6a1e
    • He Zhe's avatar
      perf bench internals inject-build-id: Fix trap divide when collecting just one DSO · d9180e23
      He Zhe authored
      'perf bench internals inject-build-id' suffers from the following error when
      only one DSO is collected.
      
        # perf bench internals inject-build-id -v
          Collected 1 DSOs
        traps: internals-injec[2305] trap divide error
        ip:557566ba6394 sp:7ffd4de97fe0 error:0 in perf[557566b2a000+23d000]
          Build-id injection benchmark
          Iteration #1
        Floating point exception
      
      This patch removes the unnecessary minus one from the divisor which also
      corrects the randomization range.
      Signed-off-by: default avatarHe Zhe <zhe.he@windriver.com>
      Fixes: 0bf02a0d ("perf bench: Add build-id injection benchmark")
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20240507065026.2652929-1-zhe.he@windriver.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d9180e23
    • Arnaldo Carvalho de Melo's avatar
      perf probe: Use zfree() to avoid possibly accessing dangling pointers · b78854e5
      Arnaldo Carvalho de Melo authored
      When freeing a->b it is good practice to set a->b to NULL using
      zfree(&a->b) so that when we have a bug where a reference to a freed 'a'
      pointer is kept somewhere, we can more quickly cause a segfault if some
      code tries to use a->b.
      
      Convert one such case in the 'perf probe' codebase.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZjpBnkL2wO3QJa5W@x1Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b78854e5
    • James Clark's avatar
      perf auxtrace: Allow number of queues to be specified · ee73fe99
      James Clark authored
      Currently it's only possible to initialize with the default number of
      queues and then use auxtrace_queues__add_event() to grow the array.
      
      But that's problematic if you don't have a real event to pass into that
      function yet.
      
      The queues hold a void *priv member to store custom state, and for
      Coresight we want to create decoders upfront before receiving data, so
      add a new function that allows pre-allocating queues.
      
      One reason to do this is because we might need to store metadata (HW_ID
      events) that effects other queues, but never actually receive auxtrace
      data on that queue.
      Reviewed-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Tested-by: default avatarGanapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steve Clevenger <scclevenger@os.amperecomputing.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240429152207.479221-3-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ee73fe99
    • James Clark's avatar
      perf cs-etm: Print error for new PERF_RECORD_AUX_OUTPUT_HW_ID versions · 0d2e3f25
      James Clark authored
      The likely fix for this is to update perf so print a helpful message.
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Tested-by: default avatarGanapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
      Acked-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steve Clevenger <scclevenger@os.amperecomputing.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240429152207.479221-2-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0d2e3f25
    • Athira Rajeev's avatar
      perf annotate: Fix a comment about multi_regs in extract_reg_offset function · 36e8aa90
      Athira Rajeev authored
      Fix a comment in function which explains how multi_regs field gets set
      for an instruction. In the example, "mov  %rsi, 8(%rbx,%rcx,4)", the
      comment mistakenly referred to "dst_multi_regs = 0". Correct it to use
      "src_multi_regs = 0"
      Signed-off-by: default avatarAthira Rajeev <atrajeev@linux.vnet.ibm.com>
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Akanksha J N <akanksha@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Link: https://lore.kernel.org/r/20240506121906.76639-4-atrajeev@linux.vnet.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      36e8aa90
    • Arnaldo Carvalho de Melo's avatar
      perf kwork: Use zfree() to avoid possibly accessing dangling pointers · 07fde753
      Arnaldo Carvalho de Melo authored
      When freeing a->b it is good practice to set a->b to NULL using
      zfree(&a->b) so that when we have a bug where a reference to a freed 'a'
      pointer is kept somewhere, we can more quickly cause a segfault if some
      code tries to use a->b.
      
      Convert one such case in the 'perf kwork' codebase.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/lkml/Zjmc5EiN6zmWZj4r@x1Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      07fde753
    • Arnaldo Carvalho de Melo's avatar
      perf callchain: Use zfree() to avoid possibly accessing dangling pointers · 54ef362e
      Arnaldo Carvalho de Melo authored
      When freeing a->b it is good practice to set a->b to NULL using
      zfree(&a->b) so that when we have a bug where a reference to a freed 'a'
      pointer is kept somewhere, we can more quickly cause a segfault if some
      code tries to use a->b.
      
      Convert one such case in the callchain code.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZjmcGobQ8E52EyjJ@x1Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      54ef362e
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Use zfree() to avoid possibly accessing dangling pointers · 69fb6eab
      Arnaldo Carvalho de Melo authored
      When freeing a->b it is good practice to set a->b to NULL using
      zfree(&a->b) so that when we have a bug where a reference to a freed 'a'
      pointer is kept somewhere, we can more quickly cause a segfault if some
      code tries to use a->b.
      
      This is mostly done but some new cases were introduced recently, convert
      them to zfree().
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/ZjmbHHrjIm5YRIBv@x1Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      69fb6eab
  2. 06 May, 2024 9 commits
    • Ian Rogers's avatar
      perf dso: Use container_of() to avoid a pointer in 'struct dso_data' · 37862d6f
      Ian Rogers authored
      The dso pointer in 'struct dso_data' is necessary for reference count
      checking to account for the dso_data forming a global list of open dso's
      with references to the dso.
      
      The dso pointer also allows for the indirection that reference count
      checking needs. Outside of reference count checking the indirection
      isn't needed and container_of() is more efficient and saves space.
      
      The reference count won't be increased by placing items onto the global
      list, matching how things were before the reference count checking
      change, but we assert the dso is in dsos holding it live (and that the
      set of open dsos is a subset of all dsos for the machine).
      
      Update the DSO data tests so that they use a dsos struct to make the
      invariant true.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Link: https://lore.kernel.org/r/20240506180104.485674-5-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      37862d6f
    • Ian Rogers's avatar
      perf symbol-elf: dso__load_sym_internal() reference count fixes · 23106e31
      Ian Rogers authored
      dso__load_sym_internal() passed curr_mapp as an out argument to
      dso__process_kernel_symbol(). The out argument was never used so remove
      it to simplify the reference counting logic.
      
      Simplify reference counting issues with curr_dso by ensuring the value
      it points to has a +1 reference count, and then putting as
      necessary.
      
      This avoids some reference counting games when the dso is created making
      the code more obviously correct with some possible introduced overhead
      due to the reference counting get/puts.
      
      This, however, silences reference count checking and we can always
      optimize from a seemingly correct point.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Link: https://lore.kernel.org/r/20240506180104.485674-4-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      23106e31
    • Ian Rogers's avatar
      perf symbol-elf: Ensure dso__put() in machine__process_ksymbol_register() · ee5061f8
      Ian Rogers authored
      The dso__put() after the map creation causes a use after put in
      dso__set_loaded().
      
      To ensure there is a +1 reference count on both sides of the if-else, do
      a dso__get() on the found map's dso.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Link: https://lore.kernel.org/r/20240506180104.485674-3-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ee5061f8
    • Ian Rogers's avatar
      perf map: Add missing dso__put() in map__new() · 7fdc33f8
      Ian Rogers authored
      A dso__put() is needed for the dsos__find() when the map is created and
      a buildid is sought.
      
      Fixes: f649ed80 ("perf dsos: Tidy reference counting and locking")
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Link: https://lore.kernel.org/r/20240506180104.485674-2-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7fdc33f8
    • Ian Rogers's avatar
      perf dso: Add reference count checking and accessor functions · ee756ef7
      Ian Rogers authored
      Add reference count checking to struct dso, this can help with
      implementing correct reference counting discipline. To avoid
      RC_CHK_ACCESS everywhere, add accessor functions for the variables in
      struct dso.
      
      The majority of the change is mechanical in nature and not easy to
      split up.
      
      Committer testing:
      
      'perf test' up to this patch shows no regressions.
      
      But:
      
        util/symbol.c: In function ‘dso__load_bfd_symbols’:
        util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’
         1683 |         dso__set_adjust_symbols(dso);
              |         ^~~~~~~~~~~~~~~~~~~~~~~
        In file included from util/symbol.c:21:
        util/dso.h:268:20: note: declared here
          268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val)
              |                    ^~~~~~~~~~~~~~~~~~~~~~~
        make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1
          MKDIR   /tmp/tmp.ZWHbQftdN6/tests/workloads/
        make[6]: *** Waiting for unfinished jobs....
      
      This was updated:
      
        -       symbols__fixup_end(&dso->symbols, false);
        -       symbols__fixup_duplicate(&dso->symbols);
        -       dso->adjust_symbols = 1;
        +       symbols__fixup_end(dso__symbols(dso), false);
        +       symbols__fixup_duplicate(dso__symbols(dso));
        +       dso__set_adjust_symbols(dso);
      
      But not build tested with BUILD_NONDISTRO and libbfd devel files installed
      (binutils-devel on fedora).
      
      Add the missing argument:
      
         	symbols__fixup_end(dso__symbols(dso), false);
         	symbols__fixup_duplicate(dso__symbols(dso));
        -	dso__set_adjust_symbols(dso);
        +	dso__set_adjust_symbols(dso, true);
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Chengen Du <chengen.du@canonical.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
      Link: https://lore.kernel.org/r/20240504213803.218974-6-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ee756ef7
    • Ian Rogers's avatar
      perf dsos: Switch hand crafted code to bsearch() · 7a9418cf
      Ian Rogers authored
      Switch to using the bsearch library function rather than having a hand
      written binary search. Const-ify some static functions to avoid compiler
      warnings.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Chengen Du <chengen.du@canonical.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
      Link: https://lore.kernel.org/r/20240504213803.218974-5-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7a9418cf
    • Ian Rogers's avatar
      perf dsos: Remove __dsos__findnew_link_by_longname_id() · 7410d600
      Ian Rogers authored
      Function was only called in dsos.c with the dso parameter as
      NULL. Remove the function and specialize for the dso being NULL case
      removing other unused functions along the way.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Chengen Du <chengen.du@canonical.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
      Link: https://lore.kernel.org/r/20240504213803.218974-4-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7410d600
    • Ian Rogers's avatar
      perf dsos: Remove __dsos__addnew() · dfd48165
      Ian Rogers authored
      Function no longer used so remove.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Chengen Du <chengen.du@canonical.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
      Link: https://lore.kernel.org/r/20240504213803.218974-3-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      dfd48165
    • Ian Rogers's avatar
      perf dsos: Switch backing storage to array from rbtree/list · 3f4ac23a
      Ian Rogers authored
      DSOs were held on a list for fast iteration and in an rbtree for fast
      finds.
      
      Switch to using a lazily sorted array where iteration is just iterating
      through the array and binary searches are the same complexity as
      searching the rbtree.
      
      The find may need to sort the array first which does increase the
      complexity, but add operations have lower complexity and overall the
      complexity should remain about the same.
      
      The set name operations on the dso just records that the array is no
      longer sorted, avoiding complexity in rebalancing the rbtree.
      
      Tighter locking discipline is enforced to avoid the array being resorted
      while long and short names or ids are changed.
      
      The array is smaller in size, replacing 6 pointers with 2, and so even
      with extra allocated space in the array, the array may be 50%
      unoccupied, the memory saving should be at least 2x.
      
      Committer testing:
      
      On a previous version of this patchset we were getting a lot of warnings
      about deleting a DSO still on a list, now it is ok:
      
        root@x1:~# perf probe -l
        root@x1:~# perf probe finish_task_switch
        Added new event:
          probe:finish_task_switch (on finish_task_switch)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:finish_task_switch -aR sleep 1
      
        root@x1:~# perf probe -l
          probe:finish_task_switch (on finish_task_switch@kernel/sched/core.c)
        root@x1:~# perf trace -e probe:finish_task_switch/max-stack=8/ --max-events=1
             0.000 migration/0/19 probe:finish_task_switch(__probe_ip: -1894408688)
                                               finish_task_switch.isra.0 ([kernel.kallsyms])
                                               __schedule ([kernel.kallsyms])
                                               schedule ([kernel.kallsyms])
                                               smpboot_thread_fn ([kernel.kallsyms])
                                               kthread ([kernel.kallsyms])
                                               ret_from_fork ([kernel.kallsyms])
                                               ret_from_fork_asm ([kernel.kallsyms])
        root@x1:~#
        root@x1:~# perf probe -d probe:*
        Removed event: probe:finish_task_switch
        root@x1:~# perf probe -l
        root@x1:~#
      
      I also ran the full 'perf test' suite after applying this one, no
      regressions.
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Ben Gainey <ben.gainey@arm.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Chengen Du <chengen.du@canonical.com>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Li Dong <lidong@vivo.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paran Lee <p4ranlee@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Sun Haiyong <sunhaiyong@loongson.cn>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
      Link: https://lore.kernel.org/r/20240504213803.218974-2-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3f4ac23a
  3. 04 May, 2024 5 commits
  4. 03 May, 2024 6 commits
    • Ian Rogers's avatar
      perf pmu: Assume sysfs events are always the same case · 7b6dd7a9
      Ian Rogers authored
      Perf event names aren't case sensitive. For sysfs events the entire
      directory of events is read then iterated comparing names in a case
      insensitive way, most often to see if an event is present.
      
      Consider:
      
        $ perf stat -e inst_retired.any true
      
      The event inst_retired.any may be present in any PMU, so every PMU's
      sysfs events are loaded and then searched with strcasecmp to see if
      any match. This event is only present on the cpu PMU as a JSON event
      so a lot of events were loaded from sysfs unnecessarily just to prove
      an event didn't exist there.
      
      This change avoids loading all the events by assuming sysfs event
      names are always either lower or uppercase. It uses file exists and
      only loads the events when the desired event is present.
      
      For the example above, the number of openat calls measured by 'perf
      trace' on a tigerlake laptop goes from 325 down to 255. The reduction
      will be larger for machines with many PMUs, particularly replicated
      uncore PMUs.
      
      Ensure pmu_aliases_parse() is called before all uses of the aliases
      list, but remove some "pmu->sysfs_aliases_loaded" tests as they are now
      part of the function.
      Reviewed-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240502213507.2339733-7-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7b6dd7a9
    • Ian Rogers's avatar
      perf test pmu: Test all sysfs PMU event names are the same case · 6debc5aa
      Ian Rogers authored
      Being either lower or upper case means event name probes can avoid
      scanning the directory doing case insensitive comparisons, just the
      lower or upper case version of the name can be checked for
      existence.
      
      For the majority of PMUs event names are all lower case, upper case
      names are present on S390.
      Reviewed-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240502213507.2339733-6-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      6debc5aa
    • Ian Rogers's avatar
      perf test pmu: Add an eagerly loaded event test · 18eb2ca8
      Ian Rogers authored
      Allow events/aliases to be eagerly loaded for a PMU. Factor out the
      pmu_aliases_parse to allow this.
      
      Parse a test event and check it configures the attribute as expected.
      
      There is overlap with the parse-events tests, but this test is done with
      a PMU created in a temp directory and doesn't rely on PMUs in sysfs.
      Reviewed-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240502213507.2339733-5-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      18eb2ca8
    • Ian Rogers's avatar
      perf test pmu: Refactor format test and exposed test APIs · aa1551f2
      Ian Rogers authored
      In tests/pmu.c, make a common utility that creates a PMU in a mkdtemp
      directory and uses regular PMU parsing logic to load that PMU. Formats
      must still be eagerly loaded as by default the PMU code assumes devices
      are going to be in sysfs.
      
      In util/pmu.[ch], hide perf_pmu__format_parse but add the eager argument
      to perf_pmu__lookup called by perf_pmus__add_test_pmu. Later patches
      will eagerly load other non-sysfs files when eager loading is enabled.
      
      In tests/pmu.c, rather than manually constructing a list of term
      arguments, just use the term parsing code from a string.
      
      Add more comments and debug logging.
      Reviewed-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240502213507.2339733-4-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      aa1551f2
    • Ian Rogers's avatar
      perf Document: Sysfs event names must be lower or upper case · 785623ee
      Ian Rogers authored
      To avoid directory scans in perf it is going to be assumed that sysfs
      event names are either lower or upper case.
      Reviewed-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240502213507.2339733-3-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      785623ee
    • Ian Rogers's avatar
      perf test pmu-events: Make it clearer that pmu-events tests JSON events · 97c48ea8
      Ian Rogers authored
      Add JSON to the test name.
      Reviewed-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240502213507.2339733-2-irogers@google.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      97c48ea8
  5. 02 May, 2024 9 commits
    • Namhyung Kim's avatar
      perf maps: Remove check_invariants() from maps__lock() · 3cdd98b4
      Namhyung Kim authored
      I found that the debug build was a slowed down a lot by the maps lock
      code since it checks the invariants whenever it gets the pointer to the
      lock.  This means it checks twice the invariants before and after the
      access.
      
      Instead, let's move the checking code within the lock area but after any
      modification and remove it from the read paths.  This would remove (more
      than) half of the maps lock overhead.
      
      The time for perf report with a huge data file (200k+ of MMAP2 events).
      
        Non-debug     Before      After
        ---------   --------   --------
           2m 43s     6m 45s     4m 21s
      Reviewed-by: default avatarIan Rogers <irogers@google.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240429225738.1491791-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      3cdd98b4
    • James Clark's avatar
      perf cs-etm: Improve version detection and error reporting · e3123079
      James Clark authored
      When the config validation functions are warning about ETMv3, they do it
      based on "not ETMv4". If the drivers aren't all loaded or the hardware
      doesn't support Coresight it will appear as "not ETMv4" and then Perf
      will print the error message "... not supported in ETMv3 ..." which is
      wrong and confusing.
      
      cs_etm_is_etmv4() is also misnamed because it also returns true for
      ETE because ETE has a superset of the ETMv4 metadata files. Although
      this was always done in the correct order so it wasn't a bug.
      
      Improve all this by making a single get version function which also
      handles not present as a separate case. Change the ETMv3 error message
      to only print when ETMv3 is detected, and add a new error message for
      the not present case.
      Reviewed-by: default avatarIan Rogers <irogers@google.com>
      Reviewed-by: default avatarLeo Yan <leo.yan@linux.dev>
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-4-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e3123079
    • James Clark's avatar
      perf cs-etm: Remove repeated fetches of the ETM PMU · bc5e0e1b
      James Clark authored
      Most functions already have cs_etm_pmu, so it's a bit neater to pass
      it through rather than itr only to convert it again.
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-3-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bc5e0e1b
    • James Clark's avatar
      perf cs-etm: Use struct perf_cpu as much as possible · cbaf2c4f
      James Clark authored
      The perf_cpu struct makes some iterators simpler and avoids some
      mistakes with interchanging CPU IDs with indexes etc. At the moment in
      this file the conversion to an integer is done somewhere in the middle
      of the call tree. Change it to delay the conversion to an int until the
      leaf functions.
      
      Some of the usage patterns are duplicated, so instead of changing them
      all, make cs_etm_get_ro() more reusable and use that everywhere.
      cs_etm_get_ro() didn't return an error before, but return one now so
      that it can also be used where an error is needed. Continue to ignore
      the error where it was already ignored.
      
      Use cs_etm_pmu_path_exists() instead of cs_etm_get_ro() in
      cs_etm_is_etmv4() because cs_etm_get_ro() prints a warning, but path
      exists is sufficient for this use case.
      Signed-off-by: default avatarJames Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Leo Yan <leo.yan@linux.dev>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240501135753.508022-2-james.clark@arm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cbaf2c4f
    • Namhyung Kim's avatar
      perf annotate-data: Check kind of stack variables · b7d4aacf
      Namhyung Kim authored
      I sometimes see ("unknown type") in the result and it was because it
      didn't check the type of stack variables properly during the instruction
      tracking.  The stack can carry constant values (without type info) and
      if the target instruction is accessing the stack location, it resulted
      in the "unknown type".
      
      Maybe we could pick one of integer types for the constant, but it
      doesn't really mean anything useful.  Let's just drop the stack slot if
      it doesn't have a valid type info.
      
      Here's an example how it got the unknown type.
      Note that 0xffffff48 = -0xb8.
        -----------------------------------------------------------
        find data type for 0xffffff48(reg6) at ...
        CU for ...
        frame base: cfa=0 fbreg=6
        scope: [2/2] (die:11cb97f)
        bb: [37 - 3a]
        var [37] reg15 type='int' size=0x4 (die:0x1180633)
        bb: [40 - 4b]
        mov [40] imm=0x1 -> reg13
        var [45] reg8 type='sigset_t*' size=0x8 (die:0x11a39ee)
        mov [45] imm=0x1 -> reg2                     <---  here reg2 has a constant
        bb: [215 - 237]
        mov [218] reg2 -> -0xb8(stack) constant      <---  and save it to the stack
        mov [225] reg13 -> -0xc4(stack) constant
        call [22f] find_task_by_vgpid
        call [22f] return -> reg0 type='struct task_struct*' size=0x8 (die:0x11881e8)
        bb: [5c8 - 5cf]
        bb: [2fb - 302]
        mov [2fb] -0xc4(stack) -> reg13 constant
        bb: [13b - 14d]
        mov [143] 0xd50(reg3) -> reg5 type='struct task_struct*' size=0x8 (die:0xa31f3c)
        bb: [153 - 153]
        chk [153] reg6 offset=0xffffff48 ok=0 kind=0 fbreg    <--- access here
        found by insn track: 0xffffff48(reg6) type-offset=0
         type='G<EF>^K<F6><AF>U' size=0 (die:0xffffffffffffffff)
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-7-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b7d4aacf
    • Namhyung Kim's avatar
      perf annotate-data: Handle multi regs in find_data_type_block() · af89e8f2
      Namhyung Kim authored
      The instruction tracking should be the same for the both registers.
      
      Just do it once and compare the result with multi regs as with the
      previous patches.
      
      Then we don't need to call find_data_type_block() separately for each
      reg.
      
      Let's remove the 'reg' argument from the relevant functions.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-6-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      af89e8f2
    • Namhyung Kim's avatar
      perf annotate-data: Check memory access with two registers · eba1f853
      Namhyung Kim authored
      The following instruction pattern is used to access a global variable.
      
        mov     $0x231c0, %rax
        movsql  %edi, %rcx
        mov     -0x7dc94ae0(,%rcx,8), %rcx
        cmpl    $0x0, 0xa60(%rcx,%rax,1)     <<<--- here
      
      The first instruction set the address of the per-cpu variable (here, it
      is 'runqueues' of type 'struct rq').  The second instruction seems like
      a cpu number of the per-cpu base.  The third instruction get the base
      offset of per-cpu area for that cpu.  The last instruction compares the
      value of the per-cpu variable at the offset of 0xa60.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-5-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      eba1f853
    • Namhyung Kim's avatar
      perf annotate-data: Handle direct global variable access · 4449c904
      Namhyung Kim authored
      Like per-cpu base offset array, sometimes it accesses the global
      variable directly using the offset.  Allow this type of instructions as
      long as it finds a global variable for the address.
      
        movslq  %edi, %rcx
        mov     -0x7dc94ae0(,%rcx,8), %rcx   <<<--- here
      
      As %rcx has a valid type (i.e. array index) from the first instruction,
      it will be checked by the first case in check_matching_type().  But as
      it's not a pointer type, the match will fail.  But in this case, it
      should check if it accesses the kernel global array variable.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-4-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4449c904
    • Namhyung Kim's avatar
      perf annotate-data: Collect global variables in advance · c1da8411
      Namhyung Kim authored
      Currently it looks up global variables from the current CU using address
      and name.  But it sometimes fails to find a variable as the variable can
      come from a different CU - but it's still strange it failed to find a
      declaration for some reason.
      
      Anyway, it can collect all global variables from all CU once and then
      lookup them later on.  This slightly improves the success rate of my
      test data set.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20240502060011.1838090-3-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c1da8411