1. 16 May, 2023 2 commits
  2. 10 May, 2023 24 commits
    • perf lock contention: Add empty 'struct rq' to satisfy libbpf 'runqueue' type verification · 760ebc45
      Jiri Olsa authored
      If 'struct rq' isn't defined in lock_contention.bpf.c then the type for
      the 'runqueues' variable ends up being a forward declaration
      (BTF_KIND_FWD) while the kernel has it defined (BTF_KIND_STRUCT).
      
      This makes libbpf decide it has incompatible types and then fails to
      load the BPF skeleton:
      
        # perf lock con -ab sleep 1
        libbpf: extern (var ksym) 'runqueues': incompatible types, expected [95] fwd rq, but kernel has [55509] struct rq
        libbpf: failed to load object 'lock_contention_bpf'
        libbpf: failed to load BPF skeleton 'lock_contention_bpf': -22
        Failed to load lock-contention BPF skeleton
        lock contention BPF setup failed
        #
      
      Add it as an empty struct to satisfy that type verification:
      
        # perf lock con -ab sleep 1
         contended   total wait     max wait     avg wait         type   caller
      
                 2     50.64 us     25.38 us     25.32 us     spinlock   tick_do_update_jiffies64+0x25
                 1     26.18 us     26.18 us     26.18 us     spinlock   tick_do_update_jiffies64+0x25
        #
      
      Committer notes:
      
      Extracted from a larger patch as Namhyung had already fixed the other
      issues in e53de7b6 ("perf lock contention: Fix struct rq lock
      access").
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Changbin Du <changbin.du@huawei.com>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/lkml/ZFVqeKLssg7uzxzI@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf cs-etm: Fix contextid validation · bfd431cb
      James Clark authored
      Pre 5.11 kernels don't support 'contextid1' and 'contextid2' so
      validation would be skipped. By adding an additional check for
      'contextid', old kernels will still have validation done even though
      contextid would either be contextid1 or contextid2.
      
      Additionally now that it's possible to override options, an existing bug
      in the validation is revealed. 'val' is overwritten by the contextid1
      validation, and re-used for contextid2 validation causing it to always
      fail. '!val || val != 0x4' is the same as 'val != 0x4' because 0 is also
      != 4, so that expression can be simplified and the temp variable not
      overwritten.
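
The equivalence claimed above can be checked with a small standalone C sketch. It only models the two boolean expressions; the function names `old_check_fails`, `new_check_fails` and `checks_agree_below` are hypothetical and do not come from the perf sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the original test: fails when val is 0 or not 0x4. */
bool old_check_fails(uint64_t val)
{
	return !val || val != 0x4;
}

/* Simplified test: 0 is already rejected because 0 != 0x4. */
bool new_check_fails(uint64_t val)
{
	return val != 0x4;
}

/* Returns true when both tests agree for every val in [0, limit). */
bool checks_agree_below(uint64_t limit)
{
	for (uint64_t val = 0; val < limit; val++) {
		if (old_check_fails(val) != new_check_fails(val))
			return false;
	}
	return true;
}
```

Both functions reject 0 and accept only 0x4, so dropping the '!val' term does not change which values pass.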
      
      Fixes: 35c51f83 ("perf cs-etm: Validate options after applying them")
      Reviewed-by: Leo Yan <leo.yan@linaro.org>
      Signed-off-by: James Clark <james.clark@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/all/20230501073452.GA4660@leoy-yangtze.lan
      Link: https://lore.kernel.org/r/20230504144822.1938717-1-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf arm64: Fix build with refcount checking · a3cee974
      James Clark authored
      With EXTRA_CFLAGS=-DREFCNT_CHECKING=1 and build-test, some unwrapped
      map accesses appear. Wrap them in the new accessor to fix the error:
      
        error: 'struct perf_cpu_map' has no member named 'map'
      Signed-off-by: James Clark <james.clark@arm.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Leach <mike.leach@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: https://lore.kernel.org/r/20230504160845.2065510-1-james.clark@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Add stat test for record and script · 86698623
      Sandipan Das authored
      When using the global aggregation mode, running perf script after perf
      stat record can result in a segmentation fault as seen with commit
      8b76a318 ("perf stat: Remove unused perf_counts.aggr field").
      
      Add a basic test to the existing suite of stat-related tests for
      checking if that workflow runs without erroring out.
      Signed-off-by: Sandipan Das <sandipan.das@amd.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ananth Narayan <ananth.narayan@amd.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Link: https://lore.kernel.org/r/6a5429879764e3dac984cbb11ee2d95cc1604161.1683280603.git.sandipan.das@amd.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Skip aggregation for stat events · 2fe65759
      Sandipan Das authored
      The script command does not support aggregation modes by itself although
      that can be achieved using post-processing scripts. Because of this, it
      does not allocate memory for aggregated event values.
      
      Upon running perf stat record, the aggregation mode is set in the perf
      data file. If the mode is AGGR_GLOBAL, the aggregated event values are
      accessed and this leads to a segmentation fault since these were never
      allocated to begin with. Set the mode to AGGR_NONE explicitly to avoid
      this.
      
      E.g.
      
        $ perf stat record -e cycles true
        $ perf script
      
      Before:
        Segmentation fault (core dumped)
      
      After:
        CPU   THREAD             VAL             ENA             RUN            TIME EVENT
         -1   231919          162831          362069          362069          935289 cycles:u
      
      Fixes: 8b76a318 ("perf stat: Remove unused perf_counts.aggr field")
      Signed-off-by: Sandipan Das <sandipan.das@amd.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ananth Narayan <ananth.narayan@amd.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: stable@vger.kernel.org # v6.2+
      Link: https://lore.kernel.org/r/83d6c6c05c54bf00c5a9df32ac160718efca0c7a.1683280603.git.sandipan.das@amd.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf build: Add system include paths to BPF builds · a2af0f6b
      Ian Rogers authored
      There are insufficient headers in tools/include to satisfy building BPF
      programs and their header dependencies. Add the system include paths
      from the non-BPF clang compile so that these headers can be found.
      
      This code was taken from:
      
        tools/testing/selftests/bpf/Makefile
      
      Committer notes:
      
      Had to adjust the '#ifndef NO_BPF_SKEL' to '#ifdef BUILD_BPF_SKEL' as
      the change that builds BPF skels by default was reverted.
      
      Also cope with the addition of -I$(srctree)/tools/include/uapi done by
      Yang Jihong so that we prefer using the kernel sources headers instead
      of older ones in the system.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@meta.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/lkml/20230506021450.3499232-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bpf skels: Make vmlinux.h use bpf.h and perf_event.h in source directory · 5be6cecd
      Yang Jihong authored
      Currently, vmlinux.h uses the bpf.h and perf_event.h header files in the
      system path. If the header files in the compilation environment are old,
      compilation may fail. For example:
      
        /home/yangjihong/linux/tools/perf/util/bpf_skel/.tmp/../vmlinux.h:151:27: error: field has incomplete type 'union perf_sample_weight'
                union perf_sample_weight weight;
      
      Use the bpf.h and perf_event.h files in the source code directory to
      avoid compilation compatibility problems.
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tom Rix <trix@redhat.com>
      Link: https://lore.kernel.org/r/20230510064401.225051-1-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf parse-events: Do not break up AUX event group · 7d161165
      Adrian Hunter authored
      Do not assume which events may have a PMU name, allowing the logic to
      keep an AUX event group together.
      
      Example:
      
       Before:
      
          $ perf record --no-bpf-event -c 10 -e '{intel_pt//,tlb_flush.stlb_any/aux-sample-size=8192/pp}:u' -- sleep 0.1
          WARNING: events were regrouped to match PMUs
          Cannot add AUX area sampling to a group leader
          $
      
       After:
      
          $ perf record --no-bpf-event -c 10 -e '{intel_pt//,tlb_flush.stlb_any/aux-sample-size=8192/pp}:u' -- sleep 0.1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.078 MB perf.data ]
          $ perf script -F-dso,+addr | grep -C5 tlb_flush.stlb_any | head -11
          sleep 20444 [003]  7939.510243:  1  branches:uH:  7f5350cc82a2 dl_main+0x9a2 => 7f5350cb38f0 _dl_add_to_namespace_list+0x0
          sleep 20444 [003]  7939.510243:  1  branches:uH:  7f5350cb3908 _dl_add_to_namespace_list+0x18 => 7f5350cbb080 rtld_mutex_dummy+0x0
          sleep 20444 [003]  7939.510243:  1  branches:uH:  7f5350cc8350 dl_main+0xa50 => 0 [unknown]
          sleep 20444 [003]  7939.510244:  1  branches:uH:  7f5350cc83ca dl_main+0xaca => 7f5350caeb60 _dl_process_pt_gnu_property+0x0
          sleep 20444 [003]  7939.510245:  1  branches:uH:  7f5350caeb60 _dl_process_pt_gnu_property+0x0 => 0 [unknown]
          sleep 20444  7939.510245:       10 tlb_flush.stlb_any/aux-sample-size=8192/pp: 0 7f5350caeb60 _dl_process_pt_gnu_property+0x0
          sleep 20444 [003]  7939.510254:  1  branches:uH:  7f5350cc87fe dl_main+0xefe => 7f5350ccd240 strcmp+0x0
          sleep 20444 [003]  7939.510254:  1  branches:uH:  7f5350cc8862 dl_main+0xf62 => 0 [unknown]
          sleep 20444 [003]  7939.510255:  1  branches:uH:  7f5350cc9cdc dl_main+0x23dc => 0 [unknown]
          sleep 20444 [003]  7939.510257:  1  branches:uH:  7f5350cc89f6 dl_main+0x10f6 => 7f5350cb9530 _dl_setup_hash+0x0
          sleep 20444 [003]  7939.510257:  1  branches:uH:  7f5350cc8a2d dl_main+0x112d => 7f5350cb3990 _dl_new_object+0x0
          $
      
      Fixes: 347c2f0a ("perf parse-events: Sort and group parsed events")
      Suggested-by: Ian Rogers <irogers@google.com>
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20230508093952.27482-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test test_intel_pt.sh: Test sample mode with event with PMU name · a4680850
      Adrian Hunter authored
      br_misp_retired.all_branches is supported on processors that support
      Intel PT, so use it to test sample mode with an event that has been
      given a PMU name.
      
      Please note, the test fails prior to the fix "perf parse-events: Do not
      break up AUX event group".
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/r/20230508093952.27482-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Modify group pmu name for software events · 12336165
      Ian Rogers authored
      If we have a group of {cycles,faults} then we need the faults software
      event to appear to be on the same PMU as cycles so that we don't split
      the group in parse_events__sort_events_and_fix_groups.
      
      This case is relatively easy as cycles is the leader and will have a PMU
      name. In the reverse case, {faults,cycles} we still need faults to
      appear to have the PMU name of cycles but the old behavior is just to
      return "cpu".
      
      For hybrid this fails as cycles will be on "cpu_core" or "cpu_atom",
      causing faults to be split into a different group.
      
      Change the behavior for software events so that the whole group is
      searched for the named PMU.
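
The group-wide lookup described above can be sketched in standalone C. This is a simplified model under stated assumptions, not the perf implementation: `struct sketch_event`, its fields and `sketch_group_pmu_name()` are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one event in a group. */
struct sketch_event {
	const char *name;
	const char *pmu_name;	/* NULL when the event carries no PMU name */
	bool is_software;
};

/*
 * Return the PMU name used for grouping group[idx].
 * Hardware events use their own PMU name, falling back to "cpu".
 * Software events borrow the name of the first non-software member of the
 * group that has a PMU name, regardless of where it sits in the group.
 */
const char *sketch_group_pmu_name(const struct sketch_event *group,
				  size_t nr, size_t idx)
{
	const struct sketch_event *ev = &group[idx];

	if (!ev->is_software)
		return ev->pmu_name ? ev->pmu_name : "cpu";

	/* Search the whole group, not only the leader. */
	for (size_t i = 0; i < nr; i++) {
		if (!group[i].is_software && group[i].pmu_name)
			return group[i].pmu_name;
	}

	/* No named hardware PMU in the group: keep the old default. */
	return "cpu";
}
```

Under this model, for a {faults,cycles} group on a hybrid system where cycles is on "cpu_core", the lookup for faults also yields "cpu_core", so the two events are not split into different groups.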
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kang Minchul <tegongkang@gmail.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20230502223851.2234828-20-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools arch x86: Sync the msr-index.h copy with the kernel sources · 34e82891
      Yanteng Si authored
      Picking the changes from:
      
        c68e3d47 ("x86/include/asm/msr-index.h: Add IFS Array test bits")
      
      Silencing these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
        diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/05778ab3c168c8030f6b20e60375dc803f0cd300.1683712945.git.siyanteng@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers kvm: Sync uapi/{asm/linux} kvm.h headers with the kernel sources · 705049ca
      Yanteng Si authored
      Picking the changes from:
      
        e65733b5 ("KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL")
        30ec7997 ("KVM: arm64: timers: Allow userspace to set the global counter offset")
        821d935c ("KVM: arm64: Introduce support for userspace SMCCC filtering")
        81dc9504 ("KVM: arm64: nv: timers: Support hyp timer emulation")
        a8308b3f ("KVM: arm64: Refactor hvc filtering to support different actions")
        0e5c9a9d ("KVM: arm64: Expose SMC/HVC width to userspace")
      
      Silencing these perf build warnings:
      
       Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
       diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
       Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
       diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
       Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
       diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/ac5adb58411d23b3360d436a65038fefe91c32a8.1683712945.git.siyanteng@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools include UAPI: Sync the sound/asound.h copy with the kernel sources · 8d6a41c8
      Yanteng Si authored
      Picking the changes from:
      
        102882b5 ("ALSA: document that struct __snd_pcm_mmap_control64 is messed up")
        9f656705 ("ALSA: pcm: rewrite snd_pcm_playback_silence()")
      
      Silencing these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
        diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/5606e7989bbb029c400117f2e455ab995208266f.1683712945.git.siyanteng@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers UAPI: Sync the linux/const.h with the kernel headers · 92b8e61e
      Yanteng Si authored
      Picking the changes from:
      
        31088f6f ("uapi/linux/const.h: prefer ISO-friendly __typeof__")
      
      Silencing these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/const.h' differs from latest version at 'include/uapi/linux/const.h'
        diff -u tools/include/uapi/linux/const.h include/uapi/linux/const.h
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/33e963df304394f932d9108a1b0bb327f23a4eca.1683712945.git.siyanteng@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers UAPI: Sync the i915_drm.h with the kernel sources · e7ec3a24
      Yanteng Si authored
      Picking the changes from:
      
        1cc064dc ("drm/i915/perf: Add support for OA media units")
        c61d04c9 ("drm/i915/perf: Add engine class instance parameters to perf")
        02abecde ("drm/i915/uapi: Replace fake flex-array with flexible-array member")
      
      Silencing these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
        diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/4c0c150997ae1455f49094222daa121385643ae0.1683712945.git.siyanteng@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers UAPI: Sync the drm/drm.h with the kernel sources · e6232180
      Yanteng Si authored
      Picking the changes from:
      
        60687716 ("drm: document DRM_IOCTL_PRIME_HANDLE_TO_FD and PRIME_FD_TO_HANDLE")
        61a55f8b ("drm: document expectations for GETFB2 handles")
        158350aa ("drm: document DRM_IOCTL_GEM_CLOSE")
      
      Silencing these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h'
        diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h
      
      No changes in tooling as these are just C comment documentation changes.
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/7552c61660bf079f2979fdcbcef8e921255f877a.1683712945.git.siyanteng@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers UAPI: Sync the linux/in.h with the kernel sources · 5d1ac59f
      Yanteng Si authored
      Picking the changes from:
      
        91d0b78c ("inet: Add IP_LOCAL_PORT_RANGE socket option")
      
      Silencing these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
        diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
      Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: loongson-kernel@lists.loongnix.cn
      Link: https://lore.kernel.org/r/23aabc69956ac94fbf388b05c8be08a64e8c7ccc.1683712945.git.siyanteng@loongson.cn
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf build: Gracefully fail the build if BUILD_BPF_SKEL=1 is specified and clang isn't available · b0618f38
      Arnaldo Carvalho de Melo authored
      Building BPF skels requires a compiler able to generate BPF bytecode,
      and so far this is only possible with clang, so check for its
      availability and fail the build when the user explicitly asks for BPF
      skels to be built.
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Song Liu <songliubraving@meta.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test java symbol: Remove needless debuginfod queries · 5f0b89e6
      Thomas Richter authored
      Test case 'Test java symbol' might run for a long time. On Fedora 38 the
      run time is very, very long:
      
        Output before:
        # time ./perf test 108
        108: Test java symbol                  : Ok
        real   22m15.775s
        user   3m42.584s
        sys    4m30.685s
        #
      
      The reason is a lookup for the server for debug symbols as shown in:
      
        # cat /etc/debuginfod/elfutils.urls
        https://debuginfod.fedoraproject.org/
        #
      
      This lookup is done for every symbol/sample, so about 3500 lookups
      will take place.
      
      To omit this lookup, which is not needed, set the environment variable
      DEBUGINFOD_URLS to an empty string (DEBUGINFOD_URLS='').
      
        Output after:
        # time ./perf test 108
        108: Test java symbol                  : Ok
      
        real	0m6.242s
        user	0m4.982s
        sys	0m3.243s
        #
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Link: https://lore.kernel.org/r/20230509131847.835974-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf parse-events: Don't reorder ungrouped events by PMU · 327daf34
      Ian Rogers authored
      The pmu_group_name by default returns "cpu" which on non-hybrid/ARM
      means that ungrouped software, and hardware events are all going to
      sort by the original insertion index.
      
      However, on hybrid and ARM wildcard expansion may mean the PMU name is
      set and events will be unnecessarily reordered - triggering the
      reordering warning.
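
One way to express the intended ordering is a comparator that only looks at the PMU name inside an explicit group. The following standalone C sketch assumes a simplified event model; `struct sketch_evsel`, its fields and the `sketch_*` functions are hypothetical and are not the perf data structures.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified event record. */
struct sketch_evsel {
	int idx;		/* original insertion index, unique per event */
	int leader_idx;		/* idx of the group leader, -1 when ungrouped */
	const char *pmu_name;	/* must be non-NULL for grouped events */
};

/* Grouped events sort at their leader's position, ungrouped at their own. */
static int sketch_sort_idx(const struct sketch_evsel *e)
{
	return e->leader_idx >= 0 ? e->leader_idx : e->idx;
}

static int sketch_evsel_cmp(const void *l, const void *r)
{
	const struct sketch_evsel *lhs = l, *rhs = r;
	int lhs_key = sketch_sort_idx(lhs);
	int rhs_key = sketch_sort_idx(rhs);

	if (lhs_key != rhs_key)
		return lhs_key < rhs_key ? -1 : 1;

	/* Compare PMU names only when both events are in a group. */
	if (lhs->leader_idx >= 0 && rhs->leader_idx >= 0) {
		int ret = strcmp(lhs->pmu_name, rhs->pmu_name);

		if (ret)
			return ret;
	}

	/* Otherwise fall back to the original insertion order. */
	return (lhs->idx > rhs->idx) - (lhs->idx < rhs->idx);
}

/* Sort events in place using the comparator above. */
void sketch_sort_events(struct sketch_evsel *evs, size_t nr)
{
	qsort(evs, nr, sizeof(*evs), sketch_evsel_cmp);
}
```

In this model, ungrouped events with different PMU names (for example after wildcard expansion on a hybrid system) keep their insertion order, so no reordering would be reported for them.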
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kang Minchul <tegongkang@gmail.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20230502223851.2234828-5-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
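The effect described in this commit can be illustrated with a sort key. This is a minimal Python sketch, not the perf C code: the `Event` class and both key functions are hypothetical, and it models only the PMU-name part of the comparison for ungrouped events (the real comparison also handles group leaders).

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    idx: int        # original insertion index on the command line
    pmu: str        # PMU name, possibly set by wildcard expansion
    grouped: bool   # True only if the event belongs to an explicit group


def old_key(ev: Event):
    # Before the change (as assumed here): the PMU name always takes part.
    return (ev.pmu, ev.idx)


def new_key(ev: Event):
    # After the change (as assumed here): the PMU name is ignored for
    # ungrouped events, so only the insertion index orders them.
    return (ev.pmu if ev.grouped else "", ev.idx)


events = [
    Event("cycles", 0, "cpu_core", False),
    Event("instructions", 1, "cpu_atom", False),
    Event("cycles", 2, "cpu_atom", False),
]
old_order = [ev.idx for ev in sorted(events, key=old_key)]
new_order = [ev.idx for ev in sorted(events, key=new_key)]
```

With the old key the ungrouped events are reordered by PMU name; with the new key they keep their insertion order, so no reordering warning would be needed.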
    • Ian Rogers's avatar
      perf metric: JSON flag to not group events if gathering a metric group · ccc66c60
      Ian Rogers authored
      Some metric groups have metrics that don't have fully overlapping
      events, meaning that the group's events become unique event groups that
      may need to multiplex with each other. This can be particularly
      unfortunate when the groups wouldn't need to multiplex because there are
      sufficient hardware counters.
      
      Add a flag so that if recording a metric group then the metrics within
      the group needn't use groups for their events. The flag is added to
      Intel TopdownL1 and TopdownL2 metrics.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kang Minchul <tegongkang@gmail.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20230502223851.2234828-4-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
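A rough way to see why grouping per metric can cost counters: when metrics share only some events, one group per metric opens the shared events more than once. The following Python sketch is a simplification under that assumption; `plan_events` and the metric and event names are hypothetical and do not reflect perf's actual event-sharing logic.

```python
def plan_events(metrics, no_group):
    """Return the event groups to open, each as a list of event names.

    metrics maps a metric name to the set of events it needs.
    """
    if not no_group:
        # One group per metric: partially overlapping events are
        # duplicated across groups, which may force multiplexing.
        return [sorted(evs) for evs in metrics.values()]
    # Flag set: pool the events, deduplicate, and open each on its own.
    pooled = []
    for evs in metrics.values():
        for ev in sorted(evs):
            if ev not in pooled:
                pooled.append(ev)
    return [[ev] for ev in pooled]


metrics = {"metric_a": {"ev1", "ev2"}, "metric_b": {"ev2", "ev3"}}
grouped = plan_events(metrics, no_group=False)
ungrouped = plan_events(metrics, no_group=True)
```

In this toy case the grouped plan needs four event slots in two groups, while the ungrouped plan needs three independent events.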
    • Ian Rogers's avatar
      perf stat: Introduce skippable evsels · 1b114824
      Ian Rogers authored
      'perf stat' with no arguments will use default events and metrics. These
      events may fail to open even with kernel and hypervisor disabled. When
      these fail then the permissions error appears even though they were
      implicitly selected. This is particularly a problem with the automatic
      selection of the TopdownL1 metric group on certain architectures like
      Skylake:
      
        $ perf stat true
        Error:
        Access to performance monitoring and observability operations is limited.
        Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
        access to performance monitoring and observability operations for processes
        without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
        More information can be found at 'Perf events and tool security' document:
        https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
        perf_event_paranoid setting is 2:
          -1: Allow use of (almost) all events by all users
              Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
        >= 0: Disallow raw and ftrace function tracepoint access
        >= 1: Disallow CPU event access
        >= 2: Disallow kernel profiling
        To make the adjusted perf_event_paranoid setting permanent preserve it
        in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
        $
      
      This patch adds skippable evsels that when they fail to open won't cause
      termination and will appear as "<not supported>" in output. The
      TopdownL1 events, from the metric group, are marked as skippable. This
      turns the failure above to:
      
        $ perf stat perf bench internals synthesize
        Computing performance of single threaded perf event synthesis by
        synthesizing events on the perf process itself:
          Average synthesis took: 49.287 usec (+- 0.083 usec)
          Average num. events: 3.000 (+- 0.000)
          Average time per event 16.429 usec
          Average data synthesis took: 49.641 usec (+- 0.085 usec)
          Average num. events: 11.000 (+- 0.000)
          Average time per event 4.513 usec
      
         Performance counter stats for 'perf bench internals synthesize':
      
                  1,222.38 msec task-clock:u                     #    0.993 CPUs utilized
                         0      context-switches:u               #    0.000 /sec
                         0      cpu-migrations:u                 #    0.000 /sec
                       162      page-faults:u                    #  132.529 /sec
               774,445,184      cycles:u                         #    0.634 GHz                         (49.61%)
             1,640,969,811      instructions:u                   #    2.12  insn per cycle              (59.67%)
               302,052,148      branches:u                       #  247.102 M/sec                       (59.69%)
                 1,807,718      branch-misses:u                  #    0.60% of all branches             (59.68%)
                 5,218,927      CPU_CLK_UNHALTED.REF_XCLK:u      #    4.269 M/sec
                                                          #     17.3 %  tma_frontend_bound
                                                          #     56.4 %  tma_retiring
                                                          #      nan %  tma_backend_bound
                                                          #      nan %  tma_bad_speculation      (60.01%)
               536,580,469      IDQ_UOPS_NOT_DELIVERED.CORE:u    #  438.965 M/sec                       (60.33%)
           <not supported>      INT_MISC.RECOVERY_CYCLES_ANY:u
                 5,223,936      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u #    4.274 M/sec                       (40.31%)
               774,127,250      CPU_CLK_UNHALTED.THREAD:u        #  633.297 M/sec                       (50.34%)
             1,746,579,518      UOPS_RETIRED.RETIRE_SLOTS:u      #    1.429 G/sec                       (50.12%)
             1,940,625,702      UOPS_ISSUED.ANY:u                #    1.588 G/sec                       (49.70%)
      
               1.231055525 seconds time elapsed
      
               0.258327000 seconds user
               0.965749000 seconds sys
        $
      
      The event INT_MISC.RECOVERY_CYCLES_ANY:u is skipped as it can't be
      opened with paranoia 2 on Skylake. With a lower paranoia, or as root,
      all events/metrics are computed.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kang Minchul <tegongkang@gmail.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20230502223851.2234828-3-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
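The skippable behavior can be sketched as follows. This is an illustrative Python model, not perf's implementation: `open_events`, the `(name, skippable)` tuples and the `fake_open` callback are hypothetical stand-ins for opening an event with the kernel.

```python
def open_events(events, try_open):
    """Open each event; a failing skippable event is reported, not fatal.

    events is a list of (name, skippable) pairs and try_open(name) is
    assumed to raise PermissionError when the event cannot be opened.
    """
    results = {}
    for name, skippable in events:
        try:
            results[name] = try_open(name)
        except PermissionError:
            if not skippable:
                raise  # explicitly requested events still abort the run
            results[name] = "<not supported>"
    return results


def fake_open(name):
    # Stand-in for the real open: one event is refused, as in the
    # Skylake example above.
    if name == "INT_MISC.RECOVERY_CYCLES_ANY:u":
        raise PermissionError(name)
    return 0


counts = open_events(
    [("cycles:u", False), ("INT_MISC.RECOVERY_CYCLES_ANY:u", True)],
    fake_open,
)
```

Marking only the implicitly selected TopdownL1 events as skippable keeps the error for events the user asked for explicitly.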
    • Ian Rogers's avatar
      perf metric: Change divide by zero and !support events behavior · 2a939c86
      Ian Rogers authored
      Division by zero causes expression parsing to fail and no metric to be
      generated. This can mean for short running benchmarks metrics are not
      shown. Change the behavior to make the value nan, which gets shown like:
      
      '''
      $ perf stat -M TopdownL2 true
      
       Performance counter stats for 'true':
      
               1,031,492      INST_RETIRED.ANY                 #      nan %  tma_fetch_bandwidth
                                                        #      nan %  tma_heavy_operations
                                                        #      nan %  tma_light_operations
                  29,304      CPU_CLK_UNHALTED.REF_XCLK        #      nan %  tma_fetch_latency
                                                        #      nan %  tma_branch_mispredicts
                                                        #      nan %  tma_machine_clears
                                                        #      nan %  tma_core_bound
                                                        #      nan %  tma_memory_bound
               2,658,319      IDQ_UOPS_NOT_DELIVERED.CORE
                  11,167      EXE_ACTIVITY.BOUND_ON_STORES
                 262,058      EXE_ACTIVITY.1_PORTS_UTIL
           <not counted>      BR_MISP_RETIRED.ALL_BRANCHES                                            (0.00%)
           <not counted>      INT_MISC.RECOVERY_CYCLES_ANY                                            (0.00%)
           <not counted>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE                                        (0.00%)
           <not counted>      CPU_CLK_UNHALTED.THREAD                                                 (0.00%)
           <not counted>      UOPS_RETIRED.RETIRE_SLOTS                                               (0.00%)
           <not counted>      CYCLE_ACTIVITY.STALLS_MEM_ANY                                           (0.00%)
           <not counted>      UOPS_RETIRED.MACRO_FUSED                                                (0.00%)
           <not counted>      IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE                                        (0.00%)
           <not counted>      EXE_ACTIVITY.2_PORTS_UTIL                                               (0.00%)
           <not counted>      CYCLE_ACTIVITY.STALLS_TOTAL                                             (0.00%)
           <not counted>      MACHINE_CLEARS.COUNT                                                    (0.00%)
           <not counted>      UOPS_ISSUED.ANY                                                         (0.00%)
      
             0.002864879 seconds time elapsed
      
             0.003012000 seconds user
             0.000000000 seconds sys
      '''
      
      When events aren't supported a count of 0 can be confusing and make
      metrics look meaningful. Change these to be nan also which, with the
      next change, gets shown like:
      
      '''
      $ perf stat true
       Performance counter stats for 'true':
      
                    1.25 msec task-clock:u                     #    0.387 CPUs utilized
                       0      context-switches:u               #    0.000 /sec
                       0      cpu-migrations:u                 #    0.000 /sec
                      46      page-faults:u                    #   36.702 K/sec
                 255,942      cycles:u                         #    0.204 GHz                         (88.66%)
                 123,046      instructions:u                   #    0.48  insn per cycle
                  28,301      branches:u                       #   22.580 M/sec
                   2,489      branch-misses:u                  #    8.79% of all branches
                   4,719      CPU_CLK_UNHALTED.REF_XCLK:u      #    3.765 M/sec
                                                        #      nan %  tma_frontend_bound
                                                        #      nan %  tma_retiring
                                                        #      nan %  tma_backend_bound
                                                        #      nan %  tma_bad_speculation
                 344,855      IDQ_UOPS_NOT_DELIVERED.CORE:u    #  275.147 M/sec
         <not supported>      INT_MISC.RECOVERY_CYCLES_ANY:u
           <not counted>      CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u                                        (0.00%)
           <not counted>      CPU_CLK_UNHALTED.THREAD:u                                               (0.00%)
           <not counted>      UOPS_RETIRED.RETIRE_SLOTS:u                                             (0.00%)
           <not counted>      UOPS_ISSUED.ANY:u                                                       (0.00%)
      
             0.003238142 seconds time elapsed
      
             0.000000000 seconds user
             0.003434000 seconds sys
      '''
      
      Ensure that nan metric values are quoted as nan isn't a valid number
      in JSON.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ahmad Yasin <ahmad.yasin@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Caleb Biggers <caleb.biggers@intel.com>
      Cc: Edward Baker <edward.baker@intel.com>
      Cc: Florian Fischer <florian.fischer@muhq.space>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.g.garry@oracle.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Kang Minchul <tegongkang@gmail.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Perry Taylor <perry.taylor@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@amd.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Samantha Alt <samantha.alt@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
      Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
      Cc: Thomas Richter <tmricht@linux.ibm.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Weilin Wang <weilin.wang@intel.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Yang Jihong <yangjihong1@huawei.com>
      Link: https://lore.kernel.org/r/20230502223851.2234828-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
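The two behaviors in this commit, nan on division by zero and quoting nan in JSON output, can be sketched in Python. The function names and the JSON keys `name` and `value` are hypothetical and are not perf's output format.

```python
import json
import math


def safe_div(num, den):
    # Division by zero yields nan instead of failing the whole expression.
    return num / den if den != 0 else math.nan


def event_value(count, supported):
    # An unsupported event contributes nan rather than a misleading 0.
    return float(count) if supported else math.nan


def metric_to_json(name, value):
    # A bare NaN token is not valid JSON, so emit it as the string "nan".
    out = "nan" if math.isnan(value) else value
    return json.dumps({"name": name, "value": out})


line = metric_to_json("tma_retiring", safe_div(1.0, 0.0))
```

Because nan propagates through arithmetic, any metric that depends on an unsupported event also evaluates to nan.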
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v6.4-2' of... · ad2fd53a
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Hans de Goede:
       "Nothing special to report just various small fixes:
      
         - thinkpad_acpi: Fix profile (performance/bal/low-power) regression
           on T490
      
         - misc other small fixes / hw-id additions"
      
      * tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/mellanox: fix potential race in mlxbf-tmfifo driver
        platform/x86: touchscreen_dmi: Add info for the Dexp Ursus KX210i
        platform/x86: touchscreen_dmi: Add upside-down quirk for GDIX1002 ts on the Juno Tablet
        platform/x86: thinkpad_acpi: Add profile force ability
        platform/x86: thinkpad_acpi: Fix platform profiles on T490
        platform/x86: hp-wmi: add micmute to hp_wmi_keymap struct
        platform/x86/intel-uncore-freq: Return error on write frequency
        platform/x86: intel_scu_pcidrv: Add back PCI ID for Medfield
  3. 09 May, 2023 9 commits
  4. 08 May, 2023 2 commits
  5. 07 May, 2023 3 commits
    • Linus Torvalds's avatar
      Linux 6.4-rc1 · ac9a7868
      Linus Torvalds authored
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of... · f085df1b
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tool updates from Arnaldo Carvalho de Melo:
       "Third version of perf tool updates, with the build problems with
        using a 'vmlinux.h' generated from the main build fixed, and the bpf
        skeleton build disabled by default.
      
        Build:
      
         - Require libtraceevent to build, one can disable it using
           NO_LIBTRACEEVENT=1.
      
           It is required for tools like 'perf sched', 'perf kvm', 'perf
           trace', etc.
      
           libtraceevent is available in most distros so installing
           'libtraceevent-devel' should be a one-time event to continue
           building perf as usual.
      
           Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
           sufficient for lots of users not interested in those libtraceevent
           dependent features.
      
         - Allow Python support in 'perf script' when libtraceevent isn't
           linked, as not all features require it, for instance Intel PT does
           not use tracepoints.
      
         - Error if the python interpreter needed for jevents to work isn't
           available and NO_JEVENTS=1 isn't set, preventing a build without
           support for JSON vendor events, which is a rare but possible
           condition. The two check error messages:
      
              $(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
              $(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
      
         - Make libbpf 1.0 the minimum required when building with out of
           tree, distro provided libbpf.
      
         - Use libstdc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
           demangler, add 'perf test' entry for it.
      
         - Make binutils libraries opt in, as distros disable building with it
           due to licensing, they were used for C++ demangling, for instance.
      
         - Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
           equivalent) isn't installed, we'll just have a build warning:
      
             Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
      
         - Add a feature test for scandirat(), that is not implemented so far
           in musl and uclibc, disabling features that need it, such as
           scanning for tracepoints in /sys/kernel/tracing/events.
      
        perf BPF filters:
      
         - New feature where BPF can be used to filter samples, for instance:
      
            $ sudo ./perf record -e cycles --filter 'period > 1000' true
            $ sudo ./perf script
                 perf-exec 2273949 546850.708501:       5029 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
                 perf-exec 2273949 546850.708508:      32409 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
                 perf-exec 2273949 546850.708526:     143369 cycles:  ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
                 perf-exec 2273949 546850.708600:     372650 cycles:  ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
                 perf-exec 2273949 546850.708791:     482953 cycles:  ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
                      true 2273949 546850.709036:     501985 cycles:  ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
                      true 2273949 546850.709292:     503065 cycles:      7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
      
         - In addition to 'period' (PERF_SAMPLE_PERIOD), the other
           PERF_SAMPLE_ can be used for filtering, and also some other sample
           accessible values, from tools/perf/Documentation/perf-record.txt:
      
              Essentially the BPF filter expression is:
      
              <term> <operator> <value> (("," | "||") <term> <operator> <value>)*
      
           The <term> can be one of:
              ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
              code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
              p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
              mem_dtlb, mem_blk, mem_hops
      
           The <operator> can be one of:
              ==, !=, >, >=, <, <=, &
      
           The <value> can be one of:
              <number> (for any term)
              na, load, store, pfetch, exec (for mem_op)
              l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
              na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
              remote (for mem_remote)
              na, locked (for mem_lock)
              na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
              na, by_data, by_addr (for mem_blk)
              hops0, hops1, hops2, hops3 (for mem_hops)
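The filter grammar quoted above can be modeled with a small parser and evaluator. This Python sketch is for illustration only and does not run in BPF. It assumes that "," combines expressions as AND, that "||" combines them as OR and binds tighter than ",", and that "&" tests for a non-zero bitwise AND. It handles only numeric values, not symbolic ones such as `load`. All function names are hypothetical.

```python
import operator
import re

OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "&": lambda a, b: (a & b) != 0,
}
# Two-character operators are listed first so ">=" is not read as ">".
EXPR = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<|&)\s*(\w+)\s*$")


def parse(filter_str):
    """Return a list of OR-groups; each group is a list of (term, op, value)."""
    groups = []
    for part in filter_str.split(","):
        group = []
        for expr in part.split("||"):
            m = EXPR.match(expr)
            if m is None:
                raise ValueError(f"bad expression: {expr!r}")
            term, op, value = m.groups()
            group.append((term, op, int(value, 0)))  # decimal or 0x-prefixed
        groups.append(group)
    return groups


def matches(groups, sample):
    """A sample passes if every group has at least one true expression."""
    return all(
        any(OPS[op](sample[term], value) for term, op, value in group)
        for group in groups
    )


flt = parse("period > 1000, cpu == 0 || cpu == 1")
keep = matches(flt, {"period": 5029, "cpu": 1})
drop = matches(flt, {"period": 5029, "cpu": 2})
```

Under these assumptions the first sample is kept and the second is dropped, because its `cpu` value satisfies neither side of the `||`.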
      
        perf lock contention:
      
         - Show lock type with address.
      
         - Track and show mmap_lock, siglock and per-cpu rq_lock with address.
           This is done for mmap_lock by following the current->mm pointer:
      
            $ sudo ./perf lock con -abl -- sleep 10
             contended   total wait     max wait     avg wait            address   symbol
             ...
                 16344    312.30 ms      2.22 ms     19.11 us   ffff8cc702595640
                 17686    310.08 ms      1.49 ms     17.53 us   ffff8cc7025952c0
                     3     84.14 ms     45.79 ms     28.05 ms   ffff8cc78114c478   mmap_lock
                  3557     76.80 ms     68.75 us     21.59 us   ffff8cc77ca3af58
                     1     68.27 ms     68.27 ms     68.27 ms   ffff8cda745dfd70
                     9     54.53 ms      7.96 ms      6.06 ms   ffff8cc7642a48b8   mmap_lock
                 14629     44.01 ms     60.00 us      3.01 us   ffff8cc7625f9ca0
                  3481     42.63 ms    140.71 us     12.24 us   ffffffff937906ac   vmap_area_lock
                 16194     38.73 ms     42.15 us      2.39 us   ffff8cd397cbc560
                    11     38.44 ms     10.39 ms      3.49 ms   ffff8ccd6d12fbb8   mmap_lock
                     1      5.43 ms      5.43 ms      5.43 ms   ffff8cd70018f0d8
                  1674      5.38 ms    422.93 us      3.21 us   ffffffff92e06080   tasklist_lock
                   581      4.51 ms    130.68 us      7.75 us   ffff8cc9b1259058
                     5      3.52 ms      1.27 ms    703.23 us   ffff8cc754510070
                   112      3.47 ms     56.47 us     31.02 us   ffff8ccee38b3120
                   381      3.31 ms     73.44 us      8.69 us   ffffffff93790690   purge_vmap_area_lock
                   255      3.19 ms     36.35 us     12.49 us   ffff8d053ce30c80
      
         - Update default map size to 16384.
      
         - Allocate single letter option -M for --map-nr-entries, as it is
           proving to be frequently used.
      
         - Fix struct rq lock access for older kernels with BPF's CO-RE
           (Compile once, run everywhere).
      
         - Fix problems found with MSan.
      
        perf report/top:
      
         - Add inline information when using --call-graph=fp or lbr, as was
           already done to the --call-graph=dwarf callchain mode.
      
         - Improve the 'srcfile' sort key performance by really using an
           optimization introduced in 6.2 for the 'srcline' sort key that
           avoids calling addr2line for comparison with each sample.
      
        perf sched:
      
         - Make 'perf sched latency/map/replay' use "sched:sched_waking"
           instead of "sched:sched_wakeup", consistent with 'perf record'
           since d566a9c2 ("perf sched: Prefer sched_waking event when it
           exists").
      
        perf ftrace:
      
         - Make system wide the default target for the latency subcommand.
           Run the following command, then generate some network traffic and
           press control+C:
      
             # perf ftrace latency -T __kfree_skb
           ^C
               DURATION     |      COUNT | GRAPH                                          |
                0 - 1    us |         27 | #############                                  |
                1 - 2    us |         22 | ###########                                    |
                2 - 4    us |          8 | ####                                           |
                4 - 8    us |          5 | ##                                             |
                8 - 16   us |         24 | ############                                   |
               16 - 32   us |          2 | #                                              |
               32 - 64   us |          1 |                                                |
               64 - 128  us |          0 |                                                |
              128 - 256  us |          0 |                                                |
              256 - 512  us |          0 |                                                |
              512 - 1024 us |          0 |                                                |
                1 - 2    ms |          0 |                                                |
                2 - 4    ms |          0 |                                                |
                4 - 8    ms |          0 |                                                |
                8 - 16   ms |          0 |                                                |
               16 - 32   ms |          0 |                                                |
               32 - 64   ms |          0 |                                                |
               64 - 128  ms |          0 |                                                |
              128 - 256  ms |          0 |                                                |
              256 - 512  ms |          0 |                                                |
              512 - 1024 ms |          0 |                                                |
                1 - ...   s |          0 |                                                |
             #
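The bucket boundaries in the histogram above double at each row, so the row for a given latency can be derived from the bit length of the microsecond value. This Python sketch is inferred from the printed table only, not taken from the perf source. The function names are hypothetical, and like the table it treats 1024 us as the start of the "1 - 2 ms" row.

```python
NUM_BUCKETS = 22  # rows in the table above


def bucket_index(delta_us: int) -> int:
    # 0 us -> row 0, [1, 2) us -> row 1, [2, 4) us -> row 2, and so on;
    # everything past the last boundary lands in the final row.
    return min(delta_us.bit_length(), NUM_BUCKETS - 1)


def bucket_label(i: int) -> str:
    if i == 0:
        return "0 - 1 us"
    if i == NUM_BUCKETS - 1:
        return "1 - ... s"
    unit = "us" if i <= 10 else "ms"
    low = 1 << ((i - 1) % 10)
    return f"{low} - {2 * low} {unit}"


labels = [bucket_label(i) for i in range(NUM_BUCKETS)]
```

For example, a 3 us latency falls in the "2 - 4 us" row and a 1500 us latency in the "1 - 2 ms" row.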
      
        perf top:
      
         - Add --branch-history (LBR: Last Branch Record) option, just like
           already available for 'perf record'.
      
         - Fix segfault in thread__comm_len() where thread->comm was being
           used outside thread->comm_lock.
      
        perf annotate:
      
         - Allow configuring objdump and addr2line in ~/.perfconfig, so that
           you can use alternative binaries, such as llvm's.
      
        perf kvm:
      
         - Add TUI mode for 'perf kvm stat report'.
      
        Reference counting:
      
         - Add reference count checking infrastructure to check for use after
           free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
           more to come.
      
           To build with it use -DREFCNT_CHECKING=1 in the make command line
           to build tools/perf. Documented at:
      
             https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
      
         - The above caught, for instance, this problem, fixed in this series:
      
              - Fix maps use after put in 'perf test "Share thread maps"':
      
                'maps' is copied from leader, but the leader is put on line 79
                and then 'maps' is used to read the reference count below - so
                a use after put, with the put of maps happening within
                thread__put.
      
           Fixed by reversing the order of puts so that the leader is put
           last.
      
         - Also several fixes were made to places where reference counts were
           not being held.
      
         - Make this one of the tests in 'make -C tools/perf build-test' to
           regularly build test it and to make sure no direct access to the
           reference counted structs are made, doing that via accessors to
           check the validity of the struct pointer.
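One way to picture accessor-based reference count checking is to give every "get" its own handle that becomes invalid on "put", so that a later access through it fails loudly. The Python sketch below is a conceptual model under that assumption, not perf's C implementation; `RefCounted` and `Handle` are hypothetical names.

```python
class RefCounted:
    """The shared object, for example a 'maps' struct."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0


class Handle:
    """One handle per get; invalidated by put."""

    def __init__(self, obj):
        self._obj = obj
        obj.refcnt += 1

    def deref(self):
        # Accessor: every access goes through this validity check.
        if self._obj is None:
            raise RuntimeError("use after put")
        return self._obj

    def put(self):
        obj = self.deref()
        obj.refcnt -= 1
        self._obj = None  # any later deref() through this handle fails


maps = RefCounted("maps")
leader_ref = Handle(maps)
copy_ref = Handle(maps)
leader_ref.put()
```

After `leader_ref.put()`, reading through `leader_ref` raises, while `copy_ref` stays valid. This mirrors the "use after put" that the test fix above addresses by putting the leader last.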
      
        ARM64:
      
         - Fix 'perf report' segfault when filtering coresight traces by
           sparse lists of CPUs.
      
         - Add support for 'simd' as a sort field for 'perf report', to show
           ARM's NEON SIMD's predicate flags: "partial" and "empty".
      
        arm64 vendor events:
      
         - Add N1 metrics.
      
        Intel vendor events:
      
         - Add graniterapids, grandridge and sierraforest events.
      
         - Refresh events for: alderlake, alderlaken, broadwell, broadwellde,
           broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex,
           jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
           silvermont, skylake, tigerlake and westmereep-dp.
      
         - Refresh metrics for alderlake-n, broadwell, broadwellde,
           broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
           skylakex.
      
        perf stat:
      
         - Implement --topdown using JSON metrics.
      
         - Add TopdownL1 JSON metric as a default if present, but disable it
           for now for some Intel hybrid architectures, a series of patches
           addressing this is being reviewed and will be submitted for v6.5.
      
         - Use metrics for --smi-cost.
      
         - Update topdown documentation.
      
        Vendor events (JSON) infrastructure:
      
         - Add support for computing and printing metric threshold values. For
           instance, here is one found in the sapphirerapids json file:
      
             {
                 "BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
                 "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
                 "MetricGroup": "smi",
                 "MetricName": "smi_cycles",
                 "MetricThreshold": "smi_cycles > 0.1",
                 "ScaleUnit": "100%"
             },
      
         - Test parsing metric thresholds with the fake PMU in 'perf test
           pmu-events'.
      
         - Support for printing metric thresholds in 'perf list'.
      
         - Add --metric-no-threshold option to 'perf stat'.
      
         - Add rand (reverse and) and has_pmem (optane memory) support to
           metrics.
      
         - Sort the list of input files to avoid depending on the order
           returned by readdir(), which helps in obtaining reproducible builds.
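
The smi_cycles entry above pairs a "MetricExpr" with a "MetricThreshold".
As a rough illustration of how the two relate, the sketch below evaluates
the same expression and comparison in plain Python. It is not perf's
expression parser: the function names and the counter readings are made
up for the example, and only the formula and the 0.1 limit come from the
JSON entry.

```python
def smi_cycles(aperf, cycles, smi):
    """Mirror the MetricExpr: (aperf - cycles) / aperf if smi > 0 else 0.

    aperf, cycles and smi stand in for the msr@aperf@, cycles and
    msr@smi@ counter values (hypothetical inputs for this sketch).
    """
    return (aperf - cycles) / aperf if smi > 0 else 0.0


def over_threshold(value, limit=0.1):
    """Mirror the MetricThreshold: smi_cycles > 0.1."""
    return value > limit


# Made-up counter readings, for illustration only.
value = smi_cycles(aperf=1_000_000, cycles=850_000, smi=3)

# ScaleUnit is "100%", so the ratio is shown as a percentage.
print(f"smi_cycles = {value * 100:.1f}%  over threshold: {over_threshold(value)}")
```

With these readings the ratio is 0.15, which is above the 0.1 limit, so
the threshold condition holds. With smi set to 0 the expression yields 0
and the threshold is not exceeded.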
      
        S/390:
      
         - Add common metrics: CPI (cycles per instruction), prbstate (ratio
           of instructions executed in problem state compared to total number
           of instructions), l1mp (Level one instruction and data cache misses
           per 100 instructions).
      
         - Add cache metrics for z13, z14, z15 and z16.
      
         - Add metric for TLB and cache.
      
        ARM:
      
         - Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
           (Memory Tagging Extension) and MOPS (Memory Operations) load/store.
      
        Intel PT hardware tracing:
      
         - Add event type names UINTR (User interrupt delivered) and UIRET
           (Exiting from user interrupt routine), documented in table 32-50
           "CFE Packet Type and Vector Fields Details" in the Intel Processor
           Trace chapter of The Intel SDM Volume 3 version 078.
      
         - Add support for new branch instructions ERETS and ERETU.
      
         - Fix CYC timestamps after standalone CBR.
      
        ARM CoreSight hardware tracing:
      
         - Allow user to override timestamp and contextid settings.
      
         - Fix segfault in dso lookup.
      
         - Fix timeless decode mode detection.
      
         - Add separate decode paths for timeless and per-thread modes.
      
        auxtrace:
      
         - Fix address filter entire kernel size.
      
        Miscellaneous:
      
         - Fix use-after-free and unaligned bugs in the PLT handling routines.
      
         - Use zfree() to reduce chances of use after free.
      
         - Add missing 0x prefix for addresses printed in hexadecimal in 'perf
           probe'.
      
         - Suppress massive unsupported target platform errors in the unwind
           code.
      
         - Fix incorrect build_id size returned by elf_read_build_id().
      
         - Fix 'perf scripts intel-pt-events.py' IPC output for Python 2.
      
         - Add missing new parameter in kfree_skb tracepoint to the python
           scripts using it.
      
         - Add 'perf bench syscall fork' benchmark.
      
         - Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
           'perf mem'.
      
         - Fix wrong size expectation for perf test 'Setup struct
           perf_event_attr' caused by the patch adding
           perf_event_attr::config3.
      
         - Fix some spelling mistakes"
      
      * tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
        Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
        Revert "perf build: Warn for BPF skeletons if endian mismatches"
        perf metrics: Fix SEGV with --for-each-cgroup
        perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
        perf stat: Separate bperf from bpf_profiler
        perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
        perf test record+probe_libc_inet_pton: Fix call chain match on s390
        perf tracepoint: Fix memory leak in is_valid_tracepoint()
        perf cs-etm: Add fix for coresight trace for any range of CPUs
        perf build: Fix unescaped # in perf build-test
        perf unwind: Suppress massive unsupported target platform errors
        perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
        perf script: Print raw ip instead of binary offset for callchain
        perf symbols: Fix return incorrect build_id size in elf_read_build_id()
        perf list: Modify the warning message about scandirat(3)
        perf list: Fix memory leaks in print_tracepoint_events()
        perf lock contention: Rework offset calculation with BPF CO-RE
        perf lock contention: Fix struct rq lock access
        perf stat: Disable TopdownL1 on hybrid
        perf stat: Avoid SEGV on counter->name
        ...
      f085df1b
    • Linus Torvalds's avatar
      Merge tag 'core-debugobjects-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 17784de6
      Linus Torvalds authored
      Pull debugobjects fix from Thomas Gleixner:
       "A single fix for debugobjects:
      
        The recent fix to ensure atomicity of lookup and allocation
        inadvertently broke the pool refill mechanism, so that debugobjects
        now OOMs in certain situations. The reason is that the functions which
        got updated no longer invoke debug_objecs_init(), which is now the
        only place to care about refilling the tracking object pool.
      
        Restore the original behaviour by adding explicit refill opportunities
        to those places"
      
      * tag 'core-debugobjects-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        debugobject: Ensure pool refill (again)
      17784de6