1. 25 Oct, 2023 16 commits
    • perf callchain: Make display use of branch_type_stat const · d47d876d
      Ian Rogers authored
      Display code doesn't modify the branch_type_stat so switch uses to
      const. This is done to aid refactoring struct callchain_list where
      currently the branch_type_stat is embedded even if not used.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-9-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf offcpu: Add missed btf_free · 67a3ebf1
      Ian Rogers authored
      Caught by address/leak sanitizer.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-8-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf threads: Remove unused dead thread list · 7b2e444b
      Ian Rogers authored
      Commit 40826c45 ("perf thread: Remove notion of dead threads")
      removed dead threads but the list head wasn't removed. Remove it here.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-7-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf hist: Add missing puts to hist__account_cycles · c1149037
      Ian Rogers authored
      Caught using reference count checking on perf top with
      "--call-graph=lbr". After this no memory leaks were detected.
      
      Fixes: 57849998 ("perf report: Add processing for cycle histograms")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-6-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • libperf rc_check: Add RC_CHK_EQUAL · 78c32f4c
      Ian Rogers authored
      Comparing pointers under reference count checking is tricky to do
      without causing a SEGV. Add a convenience macro to simplify this and
      use it.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-5-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
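      A minimal sketch of the comparison problem described in the
      RC_CHK_EQUAL commit, assuming a checking scheme in which every "get"
      hands out its own wrapper that points at the shared object. The names
      struct thing, struct thing_handle, thing_get(), thing_put() and
      HANDLE_EQUAL are hypothetical and are not the libperf definitions.

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdlib.h>

      /* Hypothetical object and per-reference wrapper, for illustration only. */
      struct thing {
      	int refcnt;
      	int value;
      };

      struct thing_handle {
      	struct thing *orig;
      };

      /* Each "get" allocates a fresh wrapper around the same underlying object. */
      static struct thing_handle *thing_get(struct thing *t)
      {
      	struct thing_handle *h = malloc(sizeof(*h));

      	if (h) {
      		h->orig = t;
      		t->refcnt++;
      	}
      	return h;
      }

      static void thing_put(struct thing_handle *h)
      {
      	if (h) {
      		h->orig->refcnt--;
      		free(h);
      	}
      }

      /*
       * Equal if the wrappers are the same pointer (this covers NULL == NULL),
       * or if both are non-NULL and wrap the same object. The NULL checks run
       * before any dereference, which is what avoids the SEGV.
       * Note that the arguments are evaluated more than once.
       */
      #define HANDLE_EQUAL(a, b) \
      	((a) == (b) || ((a) && (b) && (a)->orig == (b)->orig))
      ```

      Two handles to the same object differ as raw pointers, so a plain
      `==` would report them as different; the macro compares what they wrap.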
    • libperf rc_check: Make implicit enabling work for GCC · 75265320
      Ian Rogers authored
      Make the implicit enabling of REFCOUNT_CHECKING robust when building
      with GCC.
      
      Fixes: 9be6ab18 ("libperf rc_check: Enable implicitly with sanitizers")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-4-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf machine: Avoid out of bounds LBR memory read · ab8ce150
      Ian Rogers authored
      Running perf top with address sanitizer and "--call-graph=lbr" fails
      due to reading sample 0 when no samples exist. Add a guard to prevent
      this.
      
      Fixes: e2b23483 ("perf machine: Factor out lbr_callchain_add_lbr_ip()")
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-3-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf rwsem: Add debug mode that uses a mutex · 7a8f349e
      Ian Rogers authored
      Mutex error check will capture trying to take the lock recursively and
      other problems that rwlock won't. At the expense of concurrency, add a
      debug mode that uses a mutex in place of a rwsem.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: K Prateek Nayak <kprateek.nayak@amd.com>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: German Gomez <german.gomez@arm.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: liuwenyu <liuwenyu7@huawei.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Yanteng Si <siyanteng@loongson.cn>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20231024222353.3024098-2-irogers@google.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
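      A sketch of the idea behind the rwsem debug mode, assuming POSIX
      threads: an error-checking mutex stands in for the read/write lock, so
      a recursive acquisition is reported as an error instead of going
      unnoticed. DEMO_RWSEM_DEBUG, struct demo_rwsem and the demo_*
      functions are hypothetical names, not the ones used in perf.

      ```c
      #include <errno.h>
      #include <pthread.h>

      /* Hypothetical switch: set to 0 for the normal rwlock-backed mode. */
      #define DEMO_RWSEM_DEBUG 1

      struct demo_rwsem {
      #if DEMO_RWSEM_DEBUG
      	pthread_mutex_t mtx;
      #else
      	pthread_rwlock_t lock;
      #endif
      };

      static int demo_rwsem_init(struct demo_rwsem *sem)
      {
      #if DEMO_RWSEM_DEBUG
      	pthread_mutexattr_t attr;
      	int err = pthread_mutexattr_init(&attr);

      	if (err)
      		return err;
      	/* An error-checking mutex reports a relock by its owner as EDEADLK. */
      	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
      	if (!err)
      		err = pthread_mutex_init(&sem->mtx, &attr);
      	pthread_mutexattr_destroy(&attr);
      	return err;
      #else
      	return pthread_rwlock_init(&sem->lock, NULL);
      #endif
      }

      /* In debug mode readers are serialized too: that is the concurrency cost. */
      static int demo_down_read(struct demo_rwsem *sem)
      {
      #if DEMO_RWSEM_DEBUG
      	return pthread_mutex_lock(&sem->mtx);
      #else
      	return pthread_rwlock_rdlock(&sem->lock);
      #endif
      }

      static int demo_up_read(struct demo_rwsem *sem)
      {
      #if DEMO_RWSEM_DEBUG
      	return pthread_mutex_unlock(&sem->mtx);
      #else
      	return pthread_rwlock_unlock(&sem->lock);
      #endif
      }
      ```

      With the debug switch on, taking the read side twice from the same
      thread returns EDEADLK on the second call rather than succeeding.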
    • perf build: Address stray '\' before # that is warned about since grep 3.8 · b27778ed
      Arnaldo Carvalho de Melo authored
      To address this grep 3.8 warning:
      
        grep: warning: stray \ before #
      
      We needed to remove the '' around the grep expression and keep the \
      before # so that it is escaped by the $(shell grep ...) and thus doesn't
      get to grep.
      
      We need that \ before the #, otherwise we get this:
      
        Makefile.perf:364: *** unterminated call to function 'shell': missing ')'.  Stop.
      
      As everything after the # will be considered a comment.
      
      Removing the single quotes needs some more escaping so that _some_ of
      the escaped chars get to grep, like the '\|' that becomes '\\\|'.
      
      Running on debian:10, where there is no libtraceevent-devel available,
      we get:
      
        Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
        make[1]: *** [Makefile.perf:242: sub-make] Error 2
      
      I.e. both the comments and the util/trace-event.c were removed.
      
      When using:
      
        msg := $(error PYTHON_EXT_SRCS=$(PYTHON_EXT_SRCS))
      
      While on the more recent fedora:38, with the new grep and make packages
      and libtraceevent-devel installed:
      
        Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c util/trace-event.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
        make[1]: *** [Makefile.perf:242: sub-make] Error 2
        make: *** [Makefile:113: install-bin] Error 2
        make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
        $
      
      I.e. only the comments were removed.
      
      If we build it on the same fedora:38 system, but using NO_LIBTRACEEVENT=1
      
        $ make NO_LIBTRACEEVENT=1 CORESIGHT=1 O=/tmp/build/$(basename $PWD) -C tools/perf install-bin
        Makefile.perf:367: *** PYTHON_EXT_SRCS= util/python.c ../lib/ctype.c util/cap.c util/evlist.c util/evsel.c util/evsel_fprintf.c util/perf_event_attr_fprintf.c util/cpumap.c util/memswap.c util/mmap.c util/namespaces.c ../lib/bitmap.c ../lib/find_bit.c ../lib/list_sort.c ../lib/hweight.c ../lib/string.c ../lib/vsprintf.c util/thread_map.c util/util.c util/cgroup.c util/parse-branch-options.c util/rblist.c util/counts.c util/print_binary.c util/strlist.c ../lib/rbtree.c util/string.c util/symbol_fprintf.c util/units.c util/affinity.c util/rwsem.c util/hashmap.c util/perf_regs.c util/fncache.c util/perf-regs-arch/perf_regs_aarch64.c util/perf-regs-arch/perf_regs_arm.c util/perf-regs-arch/perf_regs_csky.c util/perf-regs-arch/perf_regs_loongarch.c util/perf-regs-arch/perf_regs_mips.c util/perf-regs-arch/perf_regs_powerpc.c util/perf-regs-arch/perf_regs_riscv.c util/perf-regs-arch/perf_regs_s390.c util/perf-regs-arch/perf_regs_x86.c.  Stop.
        make[1]: *** [Makefile.perf:242: sub-make] Error 2
        make: *** [Makefile:113: install-bin] Error 2
        make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
        $
      
      Both comments and the util/trace-event.c file removed.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Link: https://lore.kernel.org/r/ZTj6mfM9UqY2DggC@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf report: Fix hierarchy mode on pipe input · a6e4a4a1
      Namhyung Kim authored
      The hierarchy mode needs to set up output formats for each evsel.
      Normally setup_sorting() handles this at the beginning, but it cannot
      do that if data comes from a pipe since there's no evsel info before
      reading the data.  Then perf report cannot process the samples in
      hierarchy mode and behaves as if there were no samples.
      
      Let's check the condition and set up the output formats after reading
      data so that it can find evsels.
      
      Before:
      
        $ ./perf record -o- true | ./perf report -i- --hierarchy -q
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
        Error:
        The - data has no samples!
      
      After:
      
        $ ./perf record -o- true | ./perf report -i- --hierarchy -q
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
            94.76%        true
               94.76%        [kernel.kallsyms]
                  94.76%        [k] filemap_fault
             5.24%        perf-ex
                5.24%        [kernel.kallsyms]
                   5.06%        [k] __memset
                   0.18%        [k] native_write_msr
      Acked-by: Ian Rogers <irogers@google.com>
      Link: https://lore.kernel.org/r/20231025003121.2811738-1-namhyung@kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf lock contention: Use per-cpu array map for spinlocks · b5711042
      Namhyung Kim authored
      Currently the lock contention timestamp is maintained in a hash map keyed
      by pid.  That means it needs to get and release a map element (which is
      protected by a spinlock!) on each contention begin and end pair.  This
      can hurt performance if there is a lot of contention (usually from
      spinlocks).
      
      It used to go with task local storage but it had an issue on memory
      allocation in some critical paths.  Although it's addressed in recent
      kernels IIUC, the tool should support old kernels too.  So it cannot
      simply switch to the task local storage at least for now.
      
      As spinlocks create lots of contention and they disable preemption
      during the spinning, it can use a per-cpu array to keep the timestamp
      and avoid the overhead of hashmap update and delete.
      
      In contention_begin, it's easy to check the lock type since it can see
      the flags.  But contention_end cannot see them.  So let's try the
      per-cpu array first (unconditionally) and see if it has an active
      element (lock != 0).  If so, it should be used, and the per-task tstamp
      map should not be used until the per-cpu array element is cleared, which
      means the nested spinlock contention (if any) has finished and it now
      sees the (outer) lock.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20231020204741.1869520-3-namhyung@kernel.org
    • perf lock contention: Check race in tstamp elem creation · 6a070573
      Namhyung Kim authored
      When pelem is NULL, it'd create a new entry with zero data.  But it
      might be preempted by IRQ/NMI just before calling bpf_map_update_elem(),
      so there's a chance to call it twice for the same pid.  So it'd be
      better to use the BPF_NOEXIST flag and check the return value to prevent
      the race.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20231020204741.1869520-2-namhyung@kernel.org
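      A userspace toy model of the create-only update described in the
      tstamp elem creation commit, assuming a small fixed-size table in place
      of a BPF hash map. DEMO_NOEXIST mimics the create-only semantics of
      BPF_NOEXIST; demo_lookup(), demo_update() and demo_get_or_create() are
      hypothetical helpers, not BPF APIs.

      ```c
      #include <errno.h>
      #include <stddef.h>

      #define DEMO_ANY     0 /* create or overwrite */
      #define DEMO_NOEXIST 1 /* create only, fail if the key already exists */
      #define DEMO_SLOTS   8

      struct demo_tstamp {
      	int used;
      	int pid;
      	unsigned long long timestamp;
      	unsigned long long lock;
      };

      static struct demo_tstamp demo_map[DEMO_SLOTS];

      static struct demo_tstamp *demo_lookup(int pid)
      {
      	for (size_t i = 0; i < DEMO_SLOTS; i++)
      		if (demo_map[i].used && demo_map[i].pid == pid)
      			return &demo_map[i];
      	return NULL;
      }

      /* Returns 0 on success, -EEXIST for a create-only update of an existing key. */
      static int demo_update(int pid, const struct demo_tstamp *val, int flags)
      {
      	struct demo_tstamp *e = demo_lookup(pid);

      	if (e && flags == DEMO_NOEXIST)
      		return -EEXIST;
      	for (size_t i = 0; i < DEMO_SLOTS && !e; i++)
      		if (!demo_map[i].used)
      			e = &demo_map[i];
      	if (!e)
      		return -ENOSPC; /* toy table is full */
      	*e = *val;
      	e->used = 1;
      	e->pid = pid;
      	return 0;
      }

      /*
       * Create the zeroed element only if no other context created it in the
       * meantime; on failure the caller bails out and leaves the existing
       * element untouched.
       */
      static struct demo_tstamp *demo_get_or_create(int pid)
      {
      	struct demo_tstamp zero = { 0 };
      	struct demo_tstamp *e = demo_lookup(pid);

      	if (e)
      		return e;
      	if (demo_update(pid, &zero, DEMO_NOEXIST) < 0)
      		return NULL;
      	return demo_lookup(pid);
      }
      ```

      The point of checking the return value is that a second creation
      attempt for the same pid fails instead of overwriting the data written
      by the first one.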
    • perf lock contention: Clear lock addr after use · d99317f2
      Namhyung Kim authored
      It checks the current lock to calculate the delta of contention time.
      The address is saved in the tstamp map which is allocated at the
      beginning of contention and released at the end of contention.
      
      But it's possible for bpf_map_delete_elem() to fail.  In that case, the
      element in the tstamp map is kept for the current lock and it makes the
      next contention for the same lock tracked incorrectly.  Specifically,
      the next contention begin will see the existing element for the task and
      it'd just return.  Then the next contention end will see the element and
      calculate the time using the timestamp for the previous begin.
      
      This can result in a large value for two small contentions that happen
      from time to time.  Let's clear the lock address so that it can be
      updated next time even if the bpf_map_delete_elem() failed.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Song Liu <song@kernel.org>
      Cc: bpf@vger.kernel.org
      Link: https://lore.kernel.org/r/20231020204741.1869520-1-namhyung@kernel.org
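      A toy model of the failure mode described in the "Clear lock addr
      after use" commit, assuming the map delete always fails so the element
      survives between contentions. struct demo_elem and the
      demo_contention_* functions are hypothetical; the clear_lock parameter
      exists only to compare the behavior with and without the fix.

      ```c
      struct demo_elem {
      	unsigned long long timestamp;
      	unsigned long long lock;
      };

      /* Begin: just return if this task is already tracking a lock. */
      static void demo_contention_begin(struct demo_elem *e,
      				  unsigned long long lock,
      				  unsigned long long now)
      {
      	if (e->lock)
      		return;
      	e->timestamp = now;
      	e->lock = lock;
      }

      /*
       * End: returns the measured contention time, or 0 if the lock does not
       * match. The element is deliberately left in place afterwards, which
       * models a failed delete.
       */
      static unsigned long long demo_contention_end(struct demo_elem *e,
      					      unsigned long long lock,
      					      unsigned long long now,
      					      int clear_lock)
      {
      	unsigned long long duration;

      	if (e->lock != lock)
      		return 0;
      	duration = now - e->timestamp;
      	if (clear_lock)
      		e->lock = 0; /* the fix: let the next begin update the element */
      	return duration;
      }
      ```

      Without clearing, a second short contention on the same lock is
      measured from the first begin timestamp; with clearing, each
      contention is measured on its own.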
    • perf evsel: Rename evsel__increase_rlimit to rlimit__increase_nofile · e093a222
      Yang Jihong authored
      The evsel__increase_rlimit() helper does nothing with evsel, and the
      description of its functionality is inaccurate, so rename it and move
      it to util/rlimit.c.
      
      By the way, fix a checkpatch warning about a misplaced license tag:
      
        WARNING: Misplaced SPDX-License-Identifier tag - use line 1 instead
        #160: FILE: tools/perf/util/rlimit.h:3:
        /* SPDX-License-Identifier: LGPL-2.1 */
      
      No functional change.
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20231023033144.1011896-1-yangjihong1@huawei.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
    • perf bench sched pipe: Add -G/--cgroups option · 79a3371b
      Namhyung Kim authored
      The -G/--cgroups option is to put sender and receiver in different
      cgroups in order to measure cgroup context switch overheads.
      
      Users need to make sure the cgroups exist and are accessible.  The
      following example shows the effect of this change.  Please don't forget
      taskset before the perf bench to measure cgroup switches properly.
      Otherwise each task would run on a different CPU and generate cgroup
      switches regardless of this change.
      
        # perf stat -e context-switches,cgroup-switches \
        > taskset -c 0 perf bench sched pipe -l 10000 > /dev/null
      
         Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000':
      
                    20,001      context-switches
                         2      cgroup-switches
      
               0.053449651 seconds time elapsed
      
               0.011286000 seconds user
               0.041869000 seconds sys
      
        # perf stat -e context-switches,cgroup-switches \
        > taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB > /dev/null
      
         Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 10000 -G AAA,BBB':
      
                    20,001      context-switches
                    20,001      cgroup-switches
      
               0.052768627 seconds time elapsed
      
               0.006284000 seconds user
               0.046266000 seconds sys
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lore.kernel.org/r/20231017202342.1353124-1-namhyung@kernel.org
    • perf test: Skip CoreSight tests if cs_etm// event is not available · cbf5f584
      Michael Petlan authored
      CoreSight might not be available; in that case, skip the tests.
      Signed-off-by: Michael Petlan <mpetlan@redhat.com>
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Carsten Haitzler <carsten.haitzler@arm.com>
      Cc: vmolnaro@redhat.com
      Link: https://lore.kernel.org/r/20231019091137.22525-1-mpetlan@redhat.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
  2. 20 Oct, 2023 3 commits
    • perf data: Increase RLIMIT_NOFILE limit when open too many files in perf_data__create_dir() · c4a85263
      Yang Jihong authored
      If using parallel threads to collect data, perf record needs at least 6 fds
      per CPU (one for sys_perf_event_open, four for pipe msg and ack of the
      pipe, see record__thread_data_open_pipes(), and one for open perf.data.XXX).
      For an environment with more than 100 cores, if perf record uses both
      `-a` and `--threads` options, it is easy to exceed the upper limit of the
      file descriptor number. When we run out of them, try to increase the limits.
      
      Before:
        $ ulimit -n
        1024
        $ lscpu | grep 'On-line CPU(s)'
        On-line CPU(s) list:                0-159
        $ perf record --threads -a sleep 1
        Failed to create data directory: Too many open files
      
      After:
        $ ulimit -n
        1024
        $ lscpu | grep 'On-line CPU(s)'
        On-line CPU(s) list:                0-159
        $ perf record --threads -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.394 MB perf.data (1576 samples) ]
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20231013075945.698874-1-yangjihong1@huawei.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
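      A minimal sketch of raising the file descriptor limit when it runs
      out, assuming the simplest policy of lifting the soft RLIMIT_NOFILE to
      the current hard limit. demo_raise_nofile_to_max() is a hypothetical
      helper and is not the perf rlimit__increase_nofile() implementation.

      ```c
      #include <errno.h>
      #include <stdbool.h>
      #include <sys/resource.h>

      /*
       * Raise the soft RLIMIT_NOFILE to the hard limit. Returns true only if
       * the limit was actually changed. errno is preserved so that a caller
       * retrying a failed open() still sees the original error on failure.
       */
      static bool demo_raise_nofile_to_max(void)
      {
      	struct rlimit l;
      	int saved_errno = errno;
      	bool raised = false;

      	if (getrlimit(RLIMIT_NOFILE, &l) == 0 && l.rlim_cur < l.rlim_max) {
      		l.rlim_cur = l.rlim_max;
      		raised = setrlimit(RLIMIT_NOFILE, &l) == 0;
      	}
      	errno = saved_errno;
      	return raised;
      }
      ```

      A caller would typically invoke such a helper after an open() fails
      with EMFILE and then retry the open once.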
    • perf vendor events: Update PMC used in PM_RUN_INST_CMPL event for power10 platform · 3f8b6e5b
      Kajol Jain authored
      The CPI_STALL_RATIO metric group can be used to present the high
      level CPI stall breakdown metrics in powerpc, which will show:
      
      - DISPATCH_STALL_CPI ( Dispatch stall cycles per insn )
      - ISSUE_STALL_CPI ( Issue stall cycles per insn )
      - EXECUTION_STALL_CPI ( Execution stall cycles per insn )
      - COMPLETION_STALL_CPI ( Completion stall cycles per insn )
      
      Commit cf26e043 ("perf vendor events power10: Add JSON
      metric events to present CPI stall cycles in powerpc") which added
      the CPI_STALL_RATIO metric group, also modified
      the PMC value used in PM_RUN_INST_CMPL event from PMC4 to PMC5,
      to avoid multiplexing of events.
      But that got reverted in recent changes. Fix this issue by changing
      back the PMC value used in PM_RUN_INST_CMPL to PMC5.
      
      Result with the fix:
      
       ./perf stat --metric-no-group -M CPI_STALL_RATIO <workload>
      
       Performance counter stats for 'workload':
      
              68,745,426      PM_CMPL_STALL                    #     0.21 COMPLETION_STALL_CPI
               7,692,827      PM_ISSUE_STALL                   #     0.02 ISSUE_STALL_CPI
             322,638,223      PM_RUN_INST_CMPL                 #     0.05 DISPATCH_STALL_CPI
                                                        #     0.48 EXECUTION_STALL_CPI
              16,858,553      PM_DISP_STALL_CYC
             153,880,133      PM_EXEC_STALL
      
             0.089774592 seconds time elapsed
      
      "--metric-no-group" is used for forcing PM_RUN_INST_CMPL to be scheduled
      in all groups for more accuracy.
      
      Fixes: 7d473f47 ("perf vendor events: Move JSON/events to appropriate files for power10 platform")
      Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Cc: maddy@linux.ibm.com
      Link: https://lore.kernel.org/r/20231016143110.244255-1-kjain@linux.ibm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
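      A small worked example of the CPI stall arithmetic, assuming each
      metric is the corresponding stall event count divided by
      PM_RUN_INST_CMPL (this matches the "stall cycles per insn" descriptions
      and the values printed in the commit above, but the formula itself is
      an assumption here). demo_stall_cpi() is a hypothetical helper.

      ```c
      /* Stall cycles per completed instruction; 0.0 if no instructions ran. */
      static double demo_stall_cpi(unsigned long long stall_cycles,
      			     unsigned long long run_inst_cmpl)
      {
      	if (run_inst_cmpl == 0)
      		return 0.0;
      	return (double)stall_cycles / (double)run_inst_cmpl;
      }
      ```

      With the counts from the sample output, 68,745,426 / 322,638,223 is
      about 0.21 (COMPLETION_STALL_CPI) and 153,880,133 / 322,638,223 is
      about 0.48 (EXECUTION_STALL_CPI).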
    • perf trace: Use the right bpf_probe_read(_str) variant for reading user data · 5069211e
      Thomas Richter authored
      Perf test case 111 Check open filename arg using perf trace + vfs_getname
      fails on s390. This is caused by a failing function
      bpf_probe_read() in file util/bpf_skel/augmented_raw_syscalls.bpf.c.
      
      The root cause is the lookup by address. Function bpf_probe_read()
      is used. This function works only for architectures
      with ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
      
      On s390 it is not possible to determine from the address to which
      address space the address belongs (user or kernel space).
      
      Replace bpf_probe_read() by bpf_probe_read_kernel()
      and bpf_probe_read_str() by bpf_probe_read_user_str() to
      explicitly specify the address space the address refers to.
      
      Output before:
       # ./perf trace -eopen,openat -- touch /tmp/111
       libbpf: prog 'sys_enter': BPF program load failed: Invalid argument
       libbpf: prog 'sys_enter': -- BEGIN PROG LOAD LOG --
       reg type unsupported for arg#0 function sys_enter#75
       0: R1=ctx(off=0,imm=0) R10=fp0
       ; int sys_enter(struct syscall_enter_args *args)
       0: (bf) r6 = r1           ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
       ; return bpf_get_current_pid_tgid();
       1: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
       2: (63) *(u32 *)(r10 -8) = r0 ; R0_w=scalar() R10=fp0 fp-8=????mmmm
       3: (bf) r2 = r10              ; R2_w=fp0 R10=fp0
       ;
       .....
       lines deleted here
       .....
       23: (bf) r3 = r6              ; R3_w=ctx(off=0,imm=0) R6=ctx(off=0,imm=0)
       24: (85) call bpf_probe_read#4
       unknown func bpf_probe_read#4
       processed 23 insns (limit 1000000) max_states_per_insn 0 \
      	 total_states 2 peak_states 2 mark_read 2
       -- END PROG LOAD LOG --
       libbpf: prog 'sys_enter': failed to load: -22
       libbpf: failed to load object 'augmented_raw_syscalls_bpf'
       libbpf: failed to load BPF skeleton 'augmented_raw_syscalls_bpf': -22
       ....
      
      Output after:
       # ./perf test -Fv 111
       111: Check open filename arg using perf trace + vfs_getname          :
       --- start ---
           1.085 ( 0.011 ms): touch/320753 openat(dfd: CWD, filename: \
      	"/tmp/temporary_file.SWH85", \
      	flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: IRUGO|IWUGO) = 3
       ---- end ----
       Check open filename arg using perf trace + vfs_getname: Ok
       #
      
      Test with the sleep command shows:
      Output before:
       # ./perf trace -e *sleep sleep 1.234567890
           0.000 (1234.681 ms): sleep/63114 clock_nanosleep(rqtp: \
               { .tv_sec: 0, .tv_nsec: 0 }, rmtp: 0x3ffe0979720) = 0
       #
      
      Output after:
       # ./perf trace -e *sleep sleep 1.234567890
           0.000 (1234.686 ms): sleep/64277 clock_nanosleep(rqtp: \
               { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x3fff3df9ea0) = 0
       #
      
      Fixes: 14e4b9f4 ("perf trace: Raw augmented syscalls fix libbpf 1.0+ compatibility")
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Co-developed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: gor@linux.ibm.com
      Cc: hca@linux.ibm.com
      Cc: sumanthk@linux.ibm.com
      Cc: svens@linux.ibm.com
      Link: https://lore.kernel.org/r/20231019082642.3286650-1-tmricht@linux.ibm.com
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
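      A userspace toy model of why the address alone is not enough when
      user and kernel address spaces overlap, as described for s390 in the
      commit above. It is not BPF code: enum demo_space, DEMO_BASE and
      demo_read_str() are hypothetical and only illustrate that the caller
      has to name the address space explicitly.

      ```c
      #include <stddef.h>
      #include <string.h>

      enum demo_space { DEMO_KERNEL, DEMO_USER };

      /* Two toy address spaces that both map the same address, DEMO_BASE. */
      #define DEMO_BASE 0x1000UL
      static const char demo_kernel_mem[] = "kernel";
      static const char demo_user_mem[] = "/tmp/111";

      /*
       * Copy a NUL-terminated string from the named space. Returns the number
       * of bytes written including the NUL, or -1 if size is 0 or the address
       * is not mapped in that space.
       */
      static long demo_read_str(enum demo_space space, unsigned long addr,
      			  char *dst, size_t size)
      {
      	const char *mem = space == DEMO_USER ? demo_user_mem : demo_kernel_mem;
      	size_t len = space == DEMO_USER ? sizeof(demo_user_mem)
      					: sizeof(demo_kernel_mem);
      	size_t off, n;

      	if (size == 0 || addr < DEMO_BASE || addr - DEMO_BASE >= len)
      		return -1;
      	off = addr - DEMO_BASE;
      	n = len - off; /* bytes up to and including the NUL */
      	if (n > size)
      		n = size;
      	memcpy(dst, mem + off, n);
      	dst[n - 1] = '\0'; /* always terminate, even when truncated */
      	return (long)n;
      }
      ```

      The same numeric address yields different data depending on the
      space argument, which is the ambiguity the explicit helper variants
      remove.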
  3. 18 Oct, 2023 3 commits
  4. 17 Oct, 2023 18 commits