1. 24 Mar, 2020 17 commits
    • perf pmu: Add is_pmu_core() · d504fae9
      John Garry authored
      Add a function to decide whether a PMU is a core PMU.

      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-6-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf test: Add pmu-events test · a6c925fd
      John Garry authored
      The initial test will verify that the test tables in generated pmu-events.c
      match against known, expected values.
      
      For known events added in pmu-events/arch/test, we need to add an entry
      in test_cpu_aliases_events[] or test_uncore_events[].
      
      A sample run is as follows for x86:
      
        john@linux-3c19:~/linux> tools/perf/perf test -vv 10
        10: PMU event aliases                                     :
        --- start ---
        test child forked, pid 5316
        testing event table bp_l1_btb_correct: pass
        testing event table bp_l2_btb_correct: pass
        testing event table segment_reg_loads.any: pass
        testing event table dispatch_blocked.any: pass
        testing event table eist_trans: pass
        testing event table uncore_hisi_ddrc.flux_wcmd: pass
        testing event table unc_cbo_xsnp_response.miss_eviction: pass
        test child finished with 0
        ---- end ----
        PMU event aliases: Ok

      Signed-off-by: John Garry <john.garry@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      [ Fixup test_cpu_events[] and test_uncore_events[] sentinels to initialize one of their members to NULL, fixing the build with older compilers ]
      Link: http://lore.kernel.org/lkml/1584442939-8911-5-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
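
      The check described in this commit (generated tables compared against
      known, expected values, with sentinel-terminated arrays) can be sketched
      roughly as follows. This is a minimal illustration, not the code in the
      perf test: the reduced struct pmu_event and the helpers str_eq(),
      find_event() and tables_match() are assumptions made for the example, and
      the sample entries are copied from the pme_test_cpu[] listing quoted in
      the "perf jevents: Support test events folder" commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for perf's struct pmu_event (assumption: only 3 fields). */
struct pmu_event {
	const char *name;
	const char *event;
	const char *topic;
};

/* NULL-safe string equality: two NULL pointers compare equal. */
static bool str_eq(const char *a, const char *b)
{
	if (!a || !b)
		return a == b;
	return strcmp(a, b) == 0;
}

/* Look up an entry by name in a table terminated by a .name == NULL sentinel. */
static const struct pmu_event *find_event(const struct pmu_event *table,
					  const char *name)
{
	assert(table && name);
	for (; table->name; table++)
		if (strcmp(table->name, name) == 0)
			return table;
	return NULL;
}

/* True if every expected entry exists in 'generated' with identical fields. */
static bool tables_match(const struct pmu_event *generated,
			 const struct pmu_event *expected)
{
	for (; expected->name; expected++) {
		const struct pmu_event *got = find_event(generated, expected->name);

		if (!got || !str_eq(got->event, expected->event) ||
		    !str_eq(got->topic, expected->topic))
			return false;
	}
	return true;
}

/* Sample "generated" table; the last entry is the sentinel. */
static const struct pmu_event generated_table[] = {
	{ .name = "uncore_hisi_ddrc.flux_wcmd", .event = "event=0x2", .topic = "uncore" },
	{ .name = "eist_trans", .event = "umask=0x0,period=200000,event=0x3a", .topic = "other" },
	{ .name = NULL }, /* one member explicitly set to NULL */
};

/* Sample "expected" table, also sentinel-terminated. */
static const struct pmu_event expected_events[] = {
	{ .name = "eist_trans", .event = "umask=0x0,period=200000,event=0x3a", .topic = "other" },
	{ .name = NULL },
};
```

      Calling tables_match(generated_table, expected_events) reports whether
      each expected event is present with the same encoding and topic. Giving
      the sentinel an explicit member initializer avoids relying on empty
      braces, which is presumably what tripped the older compilers mentioned in
      the fixup note.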
    • perf pmu: Refactor pmu_add_cpu_aliases() · e45ad701
      John Garry authored
      Create pmu_add_cpu_aliases_map() from pmu_add_cpu_aliases(), so the caller
      can pass the map; the pmu-events test would use this since there would
      be no CPUID matching to a mapfile there.

      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-4-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf jevents: Support test events folder · d8447808
      John Garry authored
      With the goal of supporting pmu-events test case, introduce support for
      a test events folder.
      
      These test events can be used for testing generation of pmu-event tables
      and alias creation for any arch.
      
      When running the pmu-events test case, these test events will be used as
      the platform-agnostic events, so aliases can be created per-PMU and
      validated against known expected values.
      
      To support the test events, add a "testcpu" entry in pmu_events_map[].
      The pmu-events test will be able to lookup the events map for "testcpu",
      to verify the generated tables against expected values.
      
      The resultant generated pmu-events.c will now look like the following:
      
        struct pmu_event pme_ampere_emag[] = {
        {
        	.name = "ldrex_spec",
        	.event = "event=0x6c",
        	.desc = "Exclusive operation spe...",
        	.topic = "intrinsic",
        	.long_desc = "Exclusive operation ...",
        },
        ...
        };
      
        struct pmu_event pme_test_cpu[] = {
        {
        	.name = "uncore_hisi_ddrc.flux_wcmd",
        	.event = "event=0x2",
        	.desc = "DDRC write commands. Unit: hisi_sccl,ddrc ",
        	.topic = "uncore",
        	.long_desc = "DDRC write commands",
        	.pmu = "hisi_sccl,ddrc",
        },
        {
        	.name = "unc_cbo_xsnp_response.miss_eviction",
        	.event = "umask=0x81,event=0x22",
        	.desc = "Unit: uncore_cbox A cross-core snoop resulted ...",
        	.topic = "uncore",
        	.long_desc = "A cross-core snoop resulted from L3 ...",
        	.pmu = "uncore_cbox",
        },
        {
        	.name = "eist_trans",
        	.event = "umask=0x0,period=200000,event=0x3a",
        	.desc = "Number of Enhanced Intel SpeedStep(R) ...",
        	.topic = "other",
        },
        {
        	.name = 0,
        },
        };
      
        struct pmu_events_map pmu_events_map[] = {
        ...
        {
        	.cpuid = "0x00000000500f0000",
        	.version = "v1",
        	.type = "core",
        	.table = pme_ampere_emag
        },
        ...
        {
        	.cpuid = "testcpu",
        	.version = "v1",
        	.type = "core",
        	.table = pme_test_cpu,
        },
        {
        	.cpuid = 0,
        	.version = 0,
        	.type = 0,
        	.table = 0,
        },
        };

      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-3-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
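
      Looking up the "testcpu" entry in pmu_events_map[], as this commit
      describes, could look something like the sketch below. The struct layouts
      mirror the generated output quoted above, trimmed to the fields used
      here; find_map() and count_events() are hypothetical helper names for
      illustration and are not taken from the perf sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed versions of the generated structures (assumption: fewer fields). */
struct pmu_event {
	const char *name;
	const char *event;
};

struct pmu_events_map {
	const char *cpuid;
	const char *version;
	const char *type;
	const struct pmu_event *table;
};

/* Test events as listed in the generated pme_test_cpu[] above. */
static const struct pmu_event pme_test_cpu[] = {
	{ .name = "uncore_hisi_ddrc.flux_wcmd", .event = "event=0x2" },
	{ .name = "unc_cbo_xsnp_response.miss_eviction", .event = "umask=0x81,event=0x22" },
	{ .name = "eist_trans", .event = "umask=0x0,period=200000,event=0x3a" },
	{ .name = 0 }, /* sentinel */
};

static const struct pmu_events_map pmu_events_map[] = {
	{ .cpuid = "testcpu", .version = "v1", .type = "core", .table = pme_test_cpu },
	{ .cpuid = 0, .version = 0, .type = 0, .table = 0 }, /* sentinel */
};

/* Hypothetical helper: find a map by exact cpuid string, or NULL if absent. */
static const struct pmu_events_map *find_map(const struct pmu_events_map *maps,
					     const char *cpuid)
{
	assert(maps && cpuid);
	for (; maps->table; maps++)
		if (strcmp(maps->cpuid, cpuid) == 0)
			return maps;
	return NULL;
}

/* Hypothetical helper: count events up to the .name == 0 sentinel. */
static size_t count_events(const struct pmu_event *table)
{
	size_t n = 0;

	while (table[n].name)
		n++;
	return n;
}
```

      A test can then call find_map(pmu_events_map, "testcpu") and walk the
      returned table directly, without any CPUID matching against a mapfile.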
    • perf jevents: Add some test events · c52db67a
      John Garry authored
      Add some test PMU events. The events are randomly chosen from x86 and
      arm64 JSONs. The events include CPU and uncore events.

      Signed-off-by: John Garry <john.garry@huawei.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: James Clark <james.clark@arm.com>
      Cc: Joakim Zhang <qiangqing.zhang@nxp.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: linuxarm@huawei.com
      Link: http://lore.kernel.org/lkml/1584442939-8911-2-git-send-email-john.garry@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Unify a bit the build directory output · 7cd053d4
      Jiri Olsa authored
      Removing the extra 'SUBDIR' line from clean and doc build output.
      Because it's annoying.. ;-)
      
      Before:
      
        $ make clean
        ...
        SUBDIR   Documentation
        CLEAN    Documentation
      
      After:
      
        $ make clean
        ...
        CLEAN    Documentation
      
      Before:
      
        $ make doc
        BUILD:   Doing 'make -j8' parallel build
        SUBDIR   Documentation
        ASCIIDOC perf-stat.html
        ...
      
      After:
      
        $ make doc
        BUILD:   Doing 'make -j8' parallel build
        ASCIIDOC perf-stat.html
        ...

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200318204522.1200981-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers uapi: Update linux/in.h copy · 29f36c16
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        26776253 ("seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number")
      
      That ends up automatically adding the new IPPROTO_ETHERNET to the socket
      args beautifiers:
      
        $ tools/perf/trace/beauty/socket_ipproto.sh > before
      
      Apply this patch:
      
        $ tools/perf/trace/beauty/socket_ipproto.sh > after
        $ diff -u before after
        --- before	2020-03-19 11:48:36.876673819 -0300
        +++ after	2020-03-19 11:49:00.148541377 -0300
        @@ -6,6 +6,7 @@
         	[132] = "SCTP",
         	[136] = "UDPLITE",
         	[137] = "MPLS",
        +	[143] = "ETHERNET",
         	[17] = "UDP",
         	[1] = "ICMP",
         	[22] = "IDP",
        $
      
      Addresses this tools/perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
        diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Paolo Lungaroni <paolo.lungaroni@cnit.it>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
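
      The generated beautifier output in the diff above is a sparse string
      table indexed by protocol number. A minimal sketch of how such a table
      can be consulted is shown below. The table contents are taken from the
      diff; the array name socket_ipproto, the ARRAY_SIZE macro as defined
      here, and the ipproto_name() helper with its numeric fallback are
      assumptions for illustration, not the actual perf trace code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sparse table using designated initializers, as in the generated output. */
static const char *socket_ipproto[] = {
	[1]   = "ICMP",
	[17]  = "UDP",
	[22]  = "IDP",
	[132] = "SCTP",
	[136] = "UDPLITE",
	[137] = "MPLS",
	[143] = "ETHERNET", /* entry added by the header update */
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical helper: return the protocol name if the table has one,
 * otherwise format the number into the caller's buffer and return that.
 */
static const char *ipproto_name(int proto, char *buf, size_t len)
{
	assert(buf && len > 0);
	if (proto >= 0 && (size_t)proto < ARRAY_SIZE(socket_ipproto) &&
	    socket_ipproto[proto])
		return socket_ipproto[proto];
	snprintf(buf, len, "%d", proto);
	return buf;
}
```

      Indexes that are not listed in the initializer are NULL, so the helper
      has to check both the array bound and the slot before using an entry.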
    • perf vendor events amd: Update Zen1 events to V2 · b5b8a7cf
      Vijay Thakkar authored
      This patch updates the PMCs for AMD Zen1 core based processors (Family
      17h; Models 0 through 2F) to be in accordance with PMCs as
      documented in the latest versions of the AMD Processor Programming
      Reference [1], [2] and [3]. Note that some events, such as FPU pipe
      assignment are missing in [1], and therefore [3] is included for full
      coverage of events.
      
      PMCs added:
      
        fpu_pipe_assignment.dual{0|1|2|3}
        fpu_pipe_assignment.total{0|1|2|3}
        ls_mab_alloc.dc_prefetcher
        ls_mab_alloc.stores
        ls_mab_alloc.loads
        bp_dyn_ind_pred
        bp_de_redirect
      
      PMC removed:
      
        ex_ret_cond_misp
      
      Cumulative counts, fpu_pipe_assignment.total and
      fpu_pipe_assignment.dual, existed in v1, but did not expose port-level
      counters.

      ex_ret_cond_misp has been removed because it was dropped from the latest
      versions of the PPR and always appears to sample zero when tested on a
      Ryzen 3400G system.
      
      [1]: Processor Programming Reference (PPR) for AMD Family 17h Models
      01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019.
      
      [2]: Processor Programming Reference (PPR) for AMD Family 17h Model 18h,
      Revision B1 Processors, 55570-B1 Rev 3.14 - Sep 26, 2019.
      
      [3]: OSRR for AMD Family 17h processors, Models 00h-2Fh, 56255 Rev 3.03 - July, 2018
      
      All of the PPRs can be found at:
      https://bugzilla.kernel.org/show_bug.cgi?id=206537

      Signed-off-by: Vijay Thakkar <vijaythakkar@me.com>
      Acked-by: Kim Phillips <kim.phillips@amd.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jon Grimm <jon.grimm@amd.com>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: vijay thakkar <vijaythakkar@me.com>
      Link: http://lore.kernel.org/lkml/20200318190002.307290-4-vijaythakkar@me.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events amd: Add Zen2 events · 2079f7aa
      Vijay Thakkar authored
      This patch adds PMU events for AMD Zen2 core based processors, namely,
      Matisse (model 71h), Castle Peak (model 31h) and Rome (model 2xh), as
      documented in the AMD Processor Programming Reference for Matisse [1].
      The model number regex has been set to detect all the models under
      family 17 that do not match those of Zen1, as the range is larger for
      zen2.
      
      Zen2 adds some additional counters that are not present in Zen1 and
      events for them have been added in this patch. Some counters have also
      been removed for Zen2 that were previously present in Zen1 and have been
      confirmed to always sample zero on zen2. These added/removed counters
      have been omitted for brevity but can be found here:
      https://gist.github.com/thakkarV/5b12ca5fd7488eb2c42e451e40bdd5f3
      
      Note that PPR for Zen2 [1] does not include some counters that were
      documented in the PPR for Zen1 based processors [2]. After having tested
      these counters, some of them that still work for zen2 systems have been
      preserved in the events for zen2. The counters that are omitted in [1]
      but are still measurable and non-zero on zen2 (tested on a Ryzen 3900X
      system) are the following:
      
        PMC 0x000 fpu_pipe_assignment.{total|total0|total1|total2|total3}
        PMC 0x004 fp_num_mov_elim_scal_op.*
        PMC 0x046 ls_tablewalker.*
        PMC 0x062 l2_latency.l2_cycles_waiting_on_fills
        PMC 0x063 l2_wcb_req.*
        PMC 0x06D l2_fill_pending.l2_fill_busy
        PMC 0x080 ic_fw32
        PMC 0x081 ic_fw32_miss
        PMC 0x086 bp_snp_re_sync
        PMC 0x087 ic_fetch_stall.*
        PMC 0x08C ic_cache_inval.*
        PMC 0x099 bp_tlb_rel
        PMC 0x0C7 ex_ret_brn_resync
        PMC 0x28A ic_oc_mode_switch.*
        L3PMC 0x001 l3_request_g1.*
        L3PMC 0x006 l3_comb_clstr_state.*
      
      [1]: Processor Programming Reference (PPR) for AMD Family 17h Model 71h,
      Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019
      
      [2]: Processor Programming Reference (PPR) for AMD Family 17h Models
      01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019
      
      All of the PPRs can be found at:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=206537
      
      Here are the results of running "fpu_pipe_assignment.total" events on my
      Ryzen 3900X family 17h model 71h system:
      
      Before this patch:
      
        $> perf list *fpu_pipe_assignment*
      
      List of pre-defined events (to be used in -e):
      
      After:
      
        $> perf list *fpu_pipe_assignment*
      
        floating point:
        fpu_pipe_assignment.total
            [Total number of fp uOps]
        fpu_pipe_assignment.total0
            [Total number uOps assigned to pipe 0]
        fpu_pipe_assignment.total1
            [Total number uOps assigned to pipe 1]
        fpu_pipe_assignment.total2
            [Total number uOps assigned to pipe 2]
        fpu_pipe_assignment.total3
            [Total number uOps assigned to pipe 3]
      
        Metric Groups:
      
        $> perf stat -e fpu_pipe_assignment.total sleep 1
      
        Performance counter stats for 'sleep 1':
      
                    25,883      fpu_pipe_assignment.total
      
               1.004145868 seconds time elapsed
      
               0.001805000 seconds user
               0.000000000 seconds sys
      
      Usage tests while running Linpack in the background:
      
        $> perf stat -I1000 -e fpu_pipe_assignment.total
             1.000266796     79,313,191,516      fpu_pipe_assignment.total
             2.000809630     68,091,474,430      fpu_pipe_assignment.total
             3.001028115     52,925,023,174      fpu_pipe_assignment.total
      
        $> perf record -e fpu_pipe_assignment.total,fpu_pipe_assignment.total0 -a sleep 1
        [ perf record: Woken up 9 times to write data ]
        [ perf record: Captured and wrote 4.031 MB perf.data (64764 samples) ]
      
        $> perf report --stdio --no-header | head -30
            98.33%  xhpl             xhpl                          [.] dgemm_kernel
             0.28%  xhpl             xhpl                          [.] dtrsm_kernel_LT
             0.10%  xhpl             [kernel.kallsyms]             [k] entry_SYSCALL_64
             0.08%  xhpl             xhpl                          [.] idamax_k
             0.07%  baloo_file_extr  liblmdb.so                    [.] mdb_mid2l_insert
             0.06%  xhpl             xhpl                          [.] dgemm_itcopy
             0.06%  xhpl             xhpl                          [.] dgemm_oncopy
             0.06%  xhpl             [kernel.kallsyms]             [k] __schedule
             0.06%  xhpl             [kernel.kallsyms]             [k] syscall_trace_enter
             0.06%  xhpl             [kernel.kallsyms]             [k] native_sched_clock
             0.06%  xhpl             [kernel.kallsyms]             [k] pick_next_task_fair
             0.05%  xhpl             xhpl                          [.] blas_thread_server.llvm.15009391670273914865
             0.04%  xhpl             [kernel.kallsyms]             [k] do_syscall_64
             0.04%  xhpl             [kernel.kallsyms]             [k] yield_task_fair
             0.04%  xhpl             libpthread-2.31.so            [.] __pthread_mutex_unlock_usercnt
             0.03%  xhpl             [kernel.kallsyms]             [k] cpuacct_charge
             0.03%  xhpl             [kernel.kallsyms]             [k] syscall_return_via_sysret
             0.03%  xhpl             libc-2.31.so                  [.] __sched_yield
             0.03%  xhpl             [kernel.kallsyms]             [k] __calc_delta
      
        $> perf annotate --stdio2 dgemm_kernel | egrep '^ {0,2}[0-9]+' -B2 -A2
                        sub          $0x60,%rsp
                        mov          %rbx,(%rsp)
          0.00          mov          %rbp,0x8(%rsp)
                        mov          %r12,0x10(%rsp)
          0.00          mov          %r13,0x18(%rsp)
                        mov          %r14,0x20(%rsp)
                        mov          %r15,0x28(%rsp)
        --
                        mov          %rdi,%r13
                        mov          %rsi,0x28(%rsp)
          0.00          mov          %rdx,%r12
                        vmovsd       %xmm0,0x30(%rsp)
                        shl          $0x3,%r10
                        mov          0x28(%rsp),%rax
          0.00          xor          %rdx,%rdx
                        mov          $0x18,%rdi
                        div          %rdi
        --
                        nop
                  a0:   mov          %r12,%rax
          0.00          shl          $0x3,%rax
                        mov          %r8,%rdi
                        lea          (%r8,%rax,8),%r15
        --
                        mov          %r12,%rax
                        nop
          0.00    c0:   vmovups      (%rdi),%ymm1
          0.09          vmovups      0x20(%rdi),%ymm2
          0.02          vmovups      (%r15),%ymm3
          0.10          vmovups      %ymm1,(%rsi)
          0.07          vmovups      %ymm2,0x20(%rsi)
          0.07          vmovups      %ymm3,0x40(%rsi)
          0.06          add          $0x40,%rdi
                        add          $0x40,%r15
                        add          $0x60,%rsi
          0.00          dec          %rax
                      ↑ jne          c0
                        mov          %r9,%r15
        --
                        nop
                 110:   lea          0x80(%rsp),%rsi
          0.01          add          $0x60,%rsi
          0.03          mov          %r12,%rax
          0.00          sar          $0x3,%rax
                        cmp          $0x2,%rax
                      ↓ jl           d26
                        prefetcht0   0x200(%rdi)
          0.01          vmovups      -0x60(%rsi),%ymm1
          0.02          prefetcht0   0xa0(%rsi)
          0.00          vbroadcastsd -0x80(%rdi),%ymm0
          0.00          prefetcht0   0xe0(%rsi)
          0.03          vmovups      -0x40(%rsi),%ymm2
          0.00          prefetcht0   0x120(%rsi)
                        vmovups      -0x20(%rsi),%ymm3
                        vmulpd       %ymm0,%ymm1,%ymm4
          0.01          prefetcht0   0x160(%rsi)
                        vmulpd       %ymm0,%ymm2,%ymm8
          0.01          vmulpd       %ymm0,%ymm3,%ymm12
          0.02          prefetcht0   0x1a0(%rsi)
          0.01          vbroadcastsd -0x78(%rdi),%ymm0
                        vmulpd       %ymm0,%ymm1,%ymm5
          0.01          vmulpd       %ymm0,%ymm2,%ymm9
                        vmulpd       %ymm0,%ymm3,%ymm13
          0.01          vbroadcastsd -0x70(%rdi),%ymm0
                        vmulpd       %ymm0,%ymm1,%ymm6
          0.00          vmulpd       %ymm0,%ymm2,%ymm10
          0.00          add          $0x60,%rsi
      
        ... snip ...
      
                        nop
                65e0:   vmovddup     -0x60(%rsi),%xmm2
          0.00          vmovups      -0x80(%rdi),%xmm0
                        vmovups      -0x70(%rdi),%xmm1
          0.00          vmovddup     -0x58(%rsi),%xmm3
                        vfmadd231pd  %xmm0,%xmm2,%xmm4
          0.00          vfmadd231pd  %xmm1,%xmm2,%xmm5
          0.00          vfmadd231pd  %xmm0,%xmm3,%xmm6
          0.00          vfmadd231pd  %xmm1,%xmm3,%xmm7
          0.00          add          $0x10,%rsi
                        add          $0x20,%rdi
          0.00          dec          %rax
                      ↑ jne          65e0
                        nop
                        nop
                6620:   vmovddup     0x30(%rsp),%xmm0
          0.00          vmulpd       %xmm0,%xmm4,%xmm4
          0.00          vmulpd       %xmm0,%xmm5,%xmm5
                        vmulpd       %xmm0,%xmm6,%xmm6
                        vmulpd       %xmm0,%xmm7,%xmm7
                        vaddpd       (%r15),%xmm4,%xmm4
                        vaddpd       0x10(%r15),%xmm5,%xmm5
          0.00          vaddpd       (%r15,%r10,1),%xmm6,%xmm6
          0.00          vaddpd       0x10(%r15,%r10,1),%xmm7,%xmm7
          0.00          vmovups      %xmm4,(%r15)
                        vmovups      %xmm5,0x10(%r15)
          0.00          vmovups      %xmm6,(%r15,%r10,1)
                        vmovups      %xmm7,0x10(%r15,%r10,1)
                        add          $0x20,%r15
        --
                        lea          (%r8,%rax,8),%r8
                69d8:   mov          0x20(%rsp),%r14
          0.00          test         $0x1,%r14
                      ↓ je           6d84
                        mov          %r9,%r15
        --
                        vbroadcastsd -0x28(%rsi),%ymm3
                        vfmadd231pd  (%rdi),%ymm0,%ymm4
          0.00          vfmadd231pd  0x20(%rdi),%ymm1,%ymm5
                        vfmadd231pd  0x40(%rdi),%ymm2,%ymm6
                        vfmadd231pd  0x60(%rdi),%ymm3,%ymm7
        --
                        vmulpd       %ymm0,%ymm4,%ymm4
                        vaddpd       (%r15),%ymm4,%ymm4
          0.00          vmovups      %ymm4,(%r15)
                        add          $0x20,%r15
                        dec          %r11
        --
                        mov          %rbx,%rsp
                        mov          (%rsp),%rbx
          0.01          mov          0x8(%rsp),%rbp
                        mov          0x10(%rsp),%r12
                        mov          0x18(%rsp),%r13

      Signed-off-by: Vijay Thakkar <vijaythakkar@me.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Kim Phillips <kim.phillips@amd.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jon Grimm <jon.grimm@amd.com>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200318190002.307290-3-vijaythakkar@me.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf vendor events amd: Restrict model detection for zen1 based processors · c5f18e9e
      Vijay Thakkar authored
      This patch changes the previous blanket detection of AMD Family 17h
      processors to be more specific to Zen1 core based products only by
      replacing model detection regex pattern [[:xdigit:]]+ with
      ([12][0-9A-F]|[0-9A-F]), restricting to models 0 through 2f only.
      
      This change is required to allow for the addition of separate PMU events
      for Zen2 core based models in the following patches as those belong to
      family 17h but have different PMCs. Current PMU events directory has
      also been renamed to "amdzen1" from "amdfam17h" to reflect this
      specificity.
      
      Note that although this change does not break PMU counters for existing
      zen1 based systems, it does disable the current set of counters for zen2
      based systems. Counters for zen2 have been added in the following
      patches in this patchset.

      Signed-off-by: Vijay Thakkar <vijaythakkar@me.com>
      Acked-by: Kim Phillips <kim.phillips@amd.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jon Grimm <jon.grimm@amd.com>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200318190002.307290-2-vijaythakkar@me.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
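
      To see what the narrowed model pattern accepts, it can be exercised with
      the POSIX regex API. The sketch below is illustrative only: the pattern
      is the one quoted in this commit message, while the explicit ^ and $
      anchors, the macro name ZEN1_MODEL_RE and the helper is_zen1_model() are
      assumptions added here. It also assumes the model is given as uppercase
      hex without a leading zero.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Pattern from the commit message, anchored to force a whole-string match. */
#define ZEN1_MODEL_RE "^([12][0-9A-F]|[0-9A-F])$"

/* Hypothetical helper: true if the hex model string falls in 0 through 2F. */
static bool is_zen1_model(const char *model_hex)
{
	regex_t re;
	bool match;

	assert(model_hex);
	/* REG_EXTENDED enables alternation; REG_NOSUB since only yes/no matters. */
	if (regcomp(&re, ZEN1_MODEL_RE, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	match = regexec(&re, model_hex, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

      With this pattern, models such as 1, 18 and 2F match, while the Zen2
      models mentioned in this series (31 and 71) do not, which leaves them
      free to be claimed by a separate Zen2 entry.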
    • perf metricgroup: Fix printing event names of metric group with multiple... · 58fc90fd
      Kajol Jain authored
      perf metricgroup: Fix printing event names of metric group with multiple events in case of overlapping events
      
      Commit f01642e4 ("perf metricgroup: Support multiple events for
      metricgroup") introduced support for multiple events in a metric group.
      But with the current upstream, metric event names are not printed
      properly when we run multiple metric groups with overlapping
      events.

      With the current upstream version, the issue with overlapping metric
      events is that we always start our comparison logic from the start of
      the event list. So the events that already matched some metric group
      also take part in the comparison logic. Because of that, when we have
      overlapping events, we end up matching the current metric group's events
      with already matched ones.

      For example, on a Skylake machine we have the metric events CoreIPC and
      Instructions. Both of them need the 'inst_retired.any' event value. As
      the events in Instructions are a subset of the events in CoreIPC, they
      end up pointing to the same 'inst_retired.any' value.
      
      In skylake platform:
      
      command:# ./perf stat -M CoreIPC,Instructions  -C 0 sleep 1
      
       Performance counter stats for 'CPU(s) 0':
      
           1,254,992,790      inst_retired.any          # 1254992790.0
                                                          Instructions
                                                        #      1.3 CoreIPC
             977,172,805      cycles
           1,254,992,756      inst_retired.any
      
             1.000802596 seconds time elapsed
      
      command:# sudo ./perf stat -M UPI,IPC sleep 1
      
         Performance counter stats for 'sleep 1':
                 948,650      uops_retired.retire_slots
                 866,182      inst_retired.any          #      0.7 IPC
                 866,182      inst_retired.any
               1,175,671      cpu_clk_unhalted.thread
      
      The patch fixes the issue by adding a new bool pointer 'evlist_used' to
      keep track of events that have already matched some group, by setting
      the corresponding entry to true. So we skip all used events in the list
      when we start the comparison logic. The patch also changes the
      comparison logic: in case of a match miss, we discard the whole match
      and start again with the first event id in the metric event.
      
      With this patch:
      
      In skylake platform:
      
      command:# ./perf stat -M CoreIPC,Instructions  -C 0 sleep 1
      
       Performance counter stats for 'CPU(s) 0':
      
               3,348,415      inst_retired.any          #      0.3 CoreIPC
              11,779,026      cycles
               3,348,381      inst_retired.any          # 3348381.0
                                                          Instructions
      
             1.001649056 seconds time elapsed
      
      command:# ./perf stat -M UPI,IPC sleep 1
      
       Performance counter stats for 'sleep 1':
      
               1,023,148      uops_retired.retire_slots #      1.1 UPI
                 924,976      inst_retired.any
                 924,976      inst_retired.any          #      0.6 IPC
               1,489,414      cpu_clk_unhalted.thread
      
             1.003064672 seconds time elapsed

      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lore.kernel.org/lkml/20200221101121.28920-1-kjain@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
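
      The matching scheme this commit describes (skip events already claimed
      by an earlier group, and on a miss discard the partial match and restart
      from the first id) can be sketched over plain string arrays as below.
      This is a simplified illustration under assumptions, not the perf
      implementation: the function name find_group(), its signature, and the
      use of index arrays instead of evsel pointers are invented for the
      example. Only the 'evlist_used' name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Try to match the ids of one metric group against the event list.
 * Events whose evlist_used[] flag is set are skipped. On success the
 * matched indexes are stored in matched[] and marked as used.
 */
static bool find_group(const char *const *evlist, bool *evlist_used,
		       int nr_events, const char *const *ids, int idnum,
		       int *matched)
{
	int i = 0;

	assert(idnum > 0);
	for (int j = 0; j < nr_events; j++) {
		if (evlist_used[j])
			continue; /* already belongs to another group */
		if (strcmp(evlist[j], ids[i]) != 0) {
			/* Miss: discard the partial match and start over. */
			i = 0;
			if (strcmp(evlist[j], ids[0]) != 0)
				continue;
		}
		matched[i++] = j;
		if (i == idnum)
			break;
	}
	if (i != idnum)
		return false;

	/* Claim the matched events so later groups cannot reuse them. */
	for (i = 0; i < idnum; i++)
		evlist_used[matched[i]] = true;
	return true;
}
```

      For an event list of inst_retired.any, cycles, inst_retired.any (the
      order shown in the CoreIPC,Instructions output above), matching CoreIPC
      first claims indexes 0 and 1, so the later Instructions lookup lands on
      index 2 instead of sharing index 0.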
    • perf stat: Align the output for interval aggregation mode · d13e9e41
      Jin Yao authored
      There is a slight misalignment in -A -I output.
      
      For example:
      
       # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
      
       #           time CPU                    counts unit events
            1.000440863 CPU0               1,068,388      cpu/event=cpu-cycles/
            1.000440863 CPU1                 875,954      cpu/event=cpu-cycles/
            1.000440863 CPU2               3,072,538      cpu/event=cpu-cycles/
            1.000440863 CPU3               4,026,870      cpu/event=cpu-cycles/
            1.000440863 CPU4               5,919,630      cpu/event=cpu-cycles/
            1.000440863 CPU5               2,714,260      cpu/event=cpu-cycles/
            1.000440863 CPU6               2,219,240      cpu/event=cpu-cycles/
            1.000440863 CPU7               1,299,232      cpu/event=cpu-cycles/
      
      The value of counts is not aligned with the column "counts" and
      the event name is not aligned with the column "events".
      
      With this patch, the output is,
      
       # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
      
       #           time CPU                    counts unit events
            1.000423009 CPU0                  997,421      cpu/event=cpu-cycles/
            1.000423009 CPU1                1,422,042      cpu/event=cpu-cycles/
            1.000423009 CPU2                  484,651      cpu/event=cpu-cycles/
            1.000423009 CPU3                  525,791      cpu/event=cpu-cycles/
            1.000423009 CPU4                1,370,100      cpu/event=cpu-cycles/
            1.000423009 CPU5                  442,072      cpu/event=cpu-cycles/
            1.000423009 CPU6                  205,643      cpu/event=cpu-cycles/
            1.000423009 CPU7                1,302,250      cpu/event=cpu-cycles/
      
      Now output is aligned.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200218071614.25736-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d13e9e41
    • perf report/top TUI: Support hotkeys to let user select any event for sorting · dbddf174
      Jin Yao authored
      "perf report --group" shows the event group information together.
      A previous patch added the new option "--group-sort-idx" to sort the
      output by the event at index n in the event group.
      
      It would be nice if we could also use a hotkey in the browser to select
      an event to sort by.
      
      For example,
      
        # perf report --group
      
       Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, ...
                              Overhead  Command    Shared Object            Symbol
        92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
         3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
         1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
         1.56%   0.01%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494ce
         1.56%   0.00%   0.00%   0.00%  mgen       [kernel.kallsyms]        [k] task_tick_fair
         0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
         0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
         0.00%   0.03%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] g_main_context_check
         0.00%   0.03%   0.00%   0.00%  swapper    [kernel.kallsyms]        [k] apic_timer_interrupt
         0.00%   0.03%   0.00%   0.00%  swapper    [kernel.kallsyms]        [k] check_preempt_curr
      
      When the user presses hotkey '3' (the event index, starting from 0), the
      output is sorted by the fourth event in the group.
      
        Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, ...
                              Overhead  Command    Shared Object            Symbol
        92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
         0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
         3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
         0.00%   0.00%   0.00%   0.06%  swapper    [kernel.kallsyms]        [k] hrtimer_start_range_ns
         1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
         0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
         0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] update_curr
         0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] apic_timer_interrupt
         0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] native_apic_msr_eoi_write
         0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] __update_load_avg_se
      
       v6:
       ---
       Jiri provided a good improvement to eliminate unneeded refresh.
       This improvement is added to v6.
      
       v2:
       ---
       1. Report warning at helpline when index is invalid.
       2. Report warning at helpline when it's not group event.
       3. Use "case '0' ... '9'" to refine the code
       4. Split K_RELOAD implementation to another patch.
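The hotkey handling described in the v2 notes (a "case '0' ... '9'" range plus a warning for an invalid index) can be sketched as follows. This is only an illustration under stated assumptions: hotkey_to_group_idx() and its -1 return convention are hypothetical and are not the code in the TUI browser.

```c
/*
 * Map a pressed key to an event index inside a group.
 * Returns the index, or -1 when the key is not a digit or the
 * index is out of range for a group with nr_members events.
 * Note: "case '0' ... '9'" is a GNU C case-range extension
 * (accepted by gcc and clang), not ISO C.
 */
int hotkey_to_group_idx(int key, int nr_members)
{
	switch (key) {
	case '0' ... '9': {
		int idx = key - '0';

		return idx < nr_members ? idx : -1;
	}
	default:
		return -1;
	}
}
```

A caller would treat -1 as the "invalid index" case and report a warning at the helpline instead of re-sorting.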
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      dbddf174
    • perf report: Support a new key to reload the browser · 5e3b810a
      Jin Yao authored
      Sometimes we need to reload the browser to update the output after
      some options have changed.
      
      This patch creates a new key, K_RELOAD. Once __cmd_report() returns
      K_RELOAD, the whole process is repeated: read the samples from the
      data file, sort the data and display it in the browser.
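The repeat-on-reload control flow can be sketched as a simple loop. Everything named here is hypothetical: SKETCH_K_RELOAD is a placeholder value (the real K_RELOAD is defined in util/hist.h and may differ), and the callback stands in for one full read/sort/display pass.

```c
/* Hypothetical key code; the real K_RELOAD value may differ. */
#define SKETCH_K_RELOAD (-7)

/* One full pass: read samples, sort, show the browser; returns the last key. */
typedef int (*report_pass_fn)(void *ctx);

/*
 * Repeat the whole report pass for as long as it asks for a reload.
 * Returns the first result that is not a reload request.
 */
int run_report_with_reload(report_pass_fn pass, void *ctx)
{
	int ret;

	do {
		ret = pass(ctx);
	} while (ret == SKETCH_K_RELOAD);

	return ret;
}

/* Demo pass: requests a reload until the counter in ctx reaches zero. */
int demo_pass(void *ctx)
{
	int *reloads_left = ctx;

	if (*reloads_left > 0) {
		(*reloads_left)--;
		return SKETCH_K_RELOAD;
	}
	return 0;
}
```

Calling run_report_with_reload(demo_pass, &n) with n set to 2 runs three passes in total and leaves n at 0.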
      
       v5:
       ---
       1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h.
       2. Skip setup_sorting() in repeat path if last key is K_RELOAD.
      
       v4:
       ---
       Need to quit in perf_evsel_menu__run if key is K_RELOAD.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5e3b810a
    • perf report: Allow specifying event to be used as sort key in --group output · 429a5f9d
      Jin Yao authored
      When performing "perf report --group", it shows the event group
      information together. By default, the output is sorted by the first
      event in group.
      
      It would be nice to let the user select any event for sorting. This
      patch introduces a new option "--group-sort-idx" to sort the output by
      the event at index n in the event group.
      
      For example,
      
      Before:
      
        # perf report --group --stdio
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
        # Event count (approx.): 6451235635
        #
        #                         Overhead  Command    Shared Object            Symbol
        # ................................  .........  .......................  ...................................
        #
            92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
             3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
             1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
             1.56%   0.01%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494ce
             1.56%   0.00%   0.00%   0.00%  mgen       [kernel.kallsyms]        [k] task_tick_fair
             0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
             0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
             0.00%   0.03%   0.00%   0.00%  gsd-color  libglib-2.0.so.0.5600.4  [.] g_main_context_check
             0.00%   0.03%   0.00%   0.00%  swapper    [kernel.kallsyms]        [k] apic_timer_interrupt
             ...
      
      After:
      
        # perf report --group --stdio --group-sort-idx 3
      
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1,
        # Event count (approx.): 6451235635
        #
        #                         Overhead  Command    Shared Object            Symbol
        # ................................  .........  .......................  ...................................
        #
            92.19%  98.68%   0.00%  93.30%  mgen       mgen                     [.] LOOP1
             0.00%   0.13%   0.00%   6.08%  swapper    [kernel.kallsyms]        [k] intel_idle
             3.12%   0.29%   0.00%   0.16%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x0000000000049515
             0.00%   0.00%   0.00%   0.06%  swapper    [kernel.kallsyms]        [k] hrtimer_start_range_ns
             1.56%   0.03%   0.00%   0.04%  gsd-color  libglib-2.0.so.0.5600.4  [.] 0x00000000000494b7
             0.00%   0.15%   0.00%   0.04%  perf       [kernel.kallsyms]        [k] smp_call_function_single
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] update_curr
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] apic_timer_interrupt
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] native_apic_msr_eoi_write
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] __update_load_avg_se
             0.00%   0.00%   0.00%   0.02%  mgen       [kernel.kallsyms]        [k] scheduler_tick
      
      Now the output is sorted by the fourth event in group.
      
       v7:
       ---
       Rebase to latest perf/core, no other change.
      
       v4:
       ---
       1. Update Documentation/perf-report.txt to mention that
          '--group-sort-idx' supports multiple groups with different
          numbers of events and that it should be used on grouped events.
      
       2. Update __hpp__group_sort_idx(), just return when the
          idx is out of limit.
      
       3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups.
          So now we don't need to use together with --group.
      
       v3:
       ---
       Refine the code in __hpp__group_sort_idx().
      
       Before:
         for (i = 1; i < nr_members; i++) {
              if (i == idx) {
                      ret = field_cmp(fields_a[i], fields_b[i]);
                      if (ret)
                              goto out;
              }
         }
      
       After:
         if (idx >= 1 && idx < nr_members) {
              ret = field_cmp(fields_a[idx], fields_b[idx]);
              if (ret)
                      goto out;
         }
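The v3 refinement replaces a scan over all members with one bounds check. The sketch below checks that the two shapes agree; sketch_field_cmp() is a hypothetical stand-in for field_cmp(), and the surrounding "goto out" flow of __hpp__group_sort_idx() is simplified to a plain return.

```c
#include <stdint.h>

/* Hypothetical stand-in for field_cmp(): three-way compare of two counts. */
static int sketch_field_cmp(uint64_t a, uint64_t b)
{
	if (a > b)
		return 1;
	if (a < b)
		return -1;
	return 0;
}

/* "Before" shape: scan every member and act only when i == idx. */
int cmp_by_idx_loop(const uint64_t *a, const uint64_t *b, int nr_members, int idx)
{
	for (int i = 1; i < nr_members; i++) {
		if (i == idx)
			return sketch_field_cmp(a[i], b[i]);
	}
	return 0;
}

/* "After" shape: a bounds check followed by one direct comparison. */
int cmp_by_idx_direct(const uint64_t *a, const uint64_t *b, int nr_members, int idx)
{
	if (idx >= 1 && idx < nr_members)
		return sketch_field_cmp(a[idx], b[idx]);
	return 0;
}
```

Both functions return 0 for any idx outside 1..nr_members-1, matching the v4 note that the function just returns when the index is out of limit.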
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com
      [ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      429a5f9d
    • perf report/top TUI: Support hotkey 'a' for annotation of unresolved addresses · ec0479a6
      Jin Yao authored
      A previous patch added support for annotation even without symbols.
      
      This patch adds support for the hotkey 'a' on an address in the report
      view. Note that in branch mode only the "branch to" address can be
      annotated.
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200227043939.4403-4-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ec0479a6
    • perf report: Support interactive annotation of code without symbols · 7b0a0dcb
      Jin Yao authored
      For perf report on stripped binaries it is currently impossible to do
      annotation. The annotation state is all tied to symbols, but there are
      either no symbols, or symbols are not covering all the code.
      
      We should support the annotation functionality even without symbols.
      
      This patch fakes a symbol whose name is the address formatted as a
      string. After that, we just follow the current annotation workflow.
      
      For example,
      
      1. perf report
      
        Overhead  Command  Shared Object     Symbol
          20.67%  div      libc-2.27.so      [.] __random_r
          17.29%  div      libc-2.27.so      [.] __random
          10.59%  div      div               [.] 0x0000000000000628
           9.25%  div      div               [.] 0x0000000000000612
           6.11%  div      div               [.] 0x0000000000000645
      
      2. Select the line of "10.59%  div      div               [.] 0x0000000000000628" and ENTER.
      
        Annotate 0x0000000000000628
        Zoom into div thread
        Zoom into div DSO (use the 'k' hotkey to zoom directly into the kernel)
        Browse map details
        Run scripts for samples of symbol [0x0000000000000628]
        Run scripts for all samples
        Switch to another data file in PWD
        Exit
      
      3. Select the "Annotate 0x0000000000000628" and ENTER.
      
      Percent│
             │
             │
             │     Disassembly of section .text:
             │
             │     0000000000000628 <.text+0x68>:
             │       divsd %xmm4,%xmm0
             │       divsd %xmm3,%xmm1
             │       movsd (%rsp),%xmm2
             │       addsd %xmm1,%xmm0
             │       addsd %xmm2,%xmm0
             │       movsd %xmm0,(%rsp)
      
      Now we can see the dump of object starting from 0x628.
      
       v5:
       ---
       Remove the hotkey 'a' implementation from this patch. It
       will be moved to a separate patch.
      
       v4:
       ---
       1. Support the hotkey 'a'. When we press 'a' on address,
          now it supports the annotation.
      
       2. Change the patch title from
          "Support interactive annotation of code without symbols" to
          "perf report: Support interactive annotation of code without symbols"
      
       v3:
       ---
       Keep just the ANNOTATION_DUMMY_LEN, and remove the
       opts->annotate_dummy_len since it's the "maybe in future
       we will provide" feature.
      
       v2:
       ---
       Fix a crash issue when annotating an address in "unknown" object.
      
       The steps to reproduce this issue:
      
       perf record -e cycles:u ls
       perf report
      
          75.29%  ls       ld-2.27.so        [.] do_lookup_x
          23.64%  ls       ld-2.27.so        [.] __GI___tunables_init
           1.04%  ls       [unknown]         [k] 0xffffffff85c01210
           0.03%  ls       ld-2.27.so        [.] _start
      
       When annotating 0xffffffff85c01210, the crash happens.
      
       v2 adds checking for ms->map in add_annotate_opt(). If the object is
       "unknown", ms->map is NULL.
      
      Committer notes:
      
      Renamed new_annotate_sym() to symbol__new_unresolved().
      
      Use PRIx64 to fix this issue in some 32-bit arches:
      
        ui/browsers/hists.c: In function 'symbol__new_unresolved':
        ui/browsers/hists.c:2474:38: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=]
          snprintf(name, sizeof(name), "%-#.*lx", BITS_PER_LONG / 4, addr);
                                        ~~~~~~^                      ~~~~
                                        %-#.*llx
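The committer's fix can be illustrated in isolation. The sketch below formats a 64-bit address with PRIx64 so that the same format string is correct on 32-bit and 64-bit builds. format_unresolved_addr() is a hypothetical helper, and the fixed width of 16 hex digits is an assumption standing in for BITS_PER_LONG / 4 on a 64-bit build.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed digit count: 16 hex digits, i.e. a 64-bit address. */
#define SKETCH_ADDR_HEX_DIGITS 16

/*
 * Format addr as a zero-padded hex string such as 0x0000000000000628.
 * PRIx64 expands to the correct length modifier for uint64_t on the
 * target, which avoids the '%lx' vs 'u64' mismatch shown above.
 */
int format_unresolved_addr(char *buf, size_t size, uint64_t addr)
{
	return snprintf(buf, size, "%-#.*" PRIx64, SKETCH_ADDR_HEX_DIGITS, addr);
}
```

For the address 0x628 from the example above, this writes "0x0000000000000628" into the buffer.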
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200227043939.4403-3-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7b0a0dcb
  2. 23 Mar, 2020 3 commits
    • perf report: Print al_addr when symbol is not found · 443bc639
      Jin Yao authored
      For branch mode, if the symbol is not found, it prints
      the address.
      
      For example, 0x0000555eee0365a0 in below output.
      
        Overhead  Command  Source Shared Object  Source Symbol                            Target Symbol
          17.55%  div      libc-2.27.so          [.] __random                             [.] __random
           6.11%  div      div                   [.] 0x0000555eee0365a0                   [.] rand
           6.10%  div      libc-2.27.so          [.] rand                                 [.] 0x0000555eee036769
           5.80%  div      libc-2.27.so          [.] __random_r                           [.] __random
           5.72%  div      libc-2.27.so          [.] __random                             [.] __random_r
           5.62%  div      libc-2.27.so          [.] __random_r                           [.] __random_r
           5.38%  div      libc-2.27.so          [.] __random                             [.] rand
           4.56%  div      libc-2.27.so          [.] __random                             [.] __random
           4.49%  div      div                   [.] 0x0000555eee036779                   [.] 0x0000555eee0365ff
           4.25%  div      div                   [.] 0x0000555eee0365fa                   [.] 0x0000555eee036760
      
      But with such runtime addresses it is not easy to find the
      corresponding instructions in the binary, so this patch prints
      al_addr, the address relative to the object, instead.
      
      With this patch, the output is
      
        Overhead  Command  Source Shared Object  Source Symbol                            Target Symbol
          17.55%  div      libc-2.27.so          [.] __random                             [.] __random
           6.11%  div      div                   [.] 0x00000000000005a0                   [.] rand
           6.10%  div      libc-2.27.so          [.] rand                                 [.] 0x0000000000000769
           5.80%  div      libc-2.27.so          [.] __random_r                           [.] __random
           5.72%  div      libc-2.27.so          [.] __random                             [.] __random_r
           5.62%  div      libc-2.27.so          [.] __random_r                           [.] __random_r
           5.38%  div      libc-2.27.so          [.] __random                             [.] rand
           4.56%  div      libc-2.27.so          [.] __random                             [.] __random
           4.49%  div      div                   [.] 0x0000000000000779                   [.] 0x00000000000005ff
           4.25%  div      div                   [.] 0x00000000000005fa                   [.] 0x0000000000000760
      
      Now we can use objdump to dump the object starting from 0x5a0.
      
      For example,
      objdump -d --start-address 0x5a0 div
      
      00000000000005a0 <rand@plt>:
       5a0:   ff 25 2a 0a 20 00       jmpq   *0x200a2a(%rip)        # 200fd0 <__cxa_finalize@plt+0x200a20>
       5a6:   68 02 00 00 00          pushq  $0x2
       5ab:   e9 c0 ff ff ff          jmpq   570 <srand@plt-0x10>
       ...
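The relationship between the two printed addresses can be sketched as a simple translation. The mapping start 0x555eee036000 and the file offset 0 used in the usage note are assumptions chosen to match the example output, not values taken from the commit; runtime_to_object_addr() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Translate a runtime virtual address into an object-relative address:
 * subtract where the mapping starts in memory, then add the file offset
 * that the mapping begins at.
 */
uint64_t runtime_to_object_addr(uint64_t ip, uint64_t map_start, uint64_t map_pgoff)
{
	return ip - map_start + map_pgoff;
}
```

Under those assumptions, runtime_to_object_addr(0x0000555eee0365a0, 0x0000555eee036000, 0) yields 0x5a0, the address that objdump accepts above.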
      
      Committer testing:
      
        [root@seventh ~]# perf record -a -b sleep 1
        [root@seventh ~]# perf report --header-only | grep cpudesc
        # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
        [root@seventh ~]# perf evlist -v
        cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
        [root@seventh ~]#
      
      Before:
      
        [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 2K of event 'cycles'
        # Event count (approx.): 2240
        #
        # Overhead  Command          Source Shared Object      Source Symbol           Target Symbol           Basic Block Cycles
        # ........  ...............  ........................  ......................  ......................  ..................
        #
             0.13%  systemd-journal  libc-2.29.so              [.] cfree@GLIBC_2.2.5   [.] _int_free           1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465c82  [.] 0x00007fe406465d80  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465ded  [.] 0x00007fe406465c30  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00007fe406465e4e  [.] 0x00007fe406465de0  1
             0.09%  systemd-journal  systemd-journald          [.] free@plt            [.] cfree@GLIBC_2.2.5   1
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           18
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           2
             0.04%  systemd          libsystemd-shared-241.so  [.] bus_resolve@plt     [.] bus_resolve         204
             0.04%  systemd          libsystemd-shared-241.so  [.] getpid_cached@plt   [.] getpid_cached       7
        [root@seventh ~]#
      
      After:
      
        [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 2K of event 'cycles'
        # Event count (approx.): 2240
        #
        # Overhead  Command          Source Shared Object      Source Symbol           Target Symbol           Basic Block Cycles
        # ........  ...............  ........................  ......................  ......................  ..................
        #
             0.13%  systemd-journal  libc-2.29.so              [.] cfree@GLIBC_2.2.5   [.] _int_free           1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7c82  [.] 0x00000000000f7d80  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7ded  [.] 0x00000000000f7c30  1
             0.09%  systemd          libsystemd-shared-241.so  [.] 0x00000000000f7e4e  [.] 0x00000000000f7de0  1
             0.09%  systemd-journal  systemd-journald          [.] free@plt            [.] cfree@GLIBC_2.2.5   1
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           18
             0.09%  systemd-journal  libc-2.29.so              [.] _int_free           [.] _int_free           2
             0.04%  systemd          libsystemd-shared-241.so  [.] bus_resolve@plt     [.] bus_resolve         204
             0.04%  systemd          libsystemd-shared-241.so  [.] getpid_cached@plt   [.] getpid_cached       7
        [root@seventh ~]#
      
      Let's use -v to get full paths and then try objdump on the unresolved address:
      
        [root@seventh ~]# perf report -v --stdio --dso libsystemd-shared-241.so |& grep libsystemd-shared-241.so | tail -1
           0.04% systemd-journal /usr/lib/systemd/libsystemd-shared-241.so 0x80c1a B [.] 0x0000000000080c1a 0x80a95 B [.] 0x0000000000080a95 61
        [root@seventh ~]#
      
        [root@seventh ~]# objdump -d --start-address 0x00000000000f7d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20
      
        /usr/lib/systemd/libsystemd-shared-241.so:     file format elf64-x86-64
      
        Disassembly of section .text:
      
        00000000000f7d80 <proc_cmdline_parse_given@@SD_SHARED+0x330>:
           f7d80:	41 39 11             	cmp    %edx,(%r9)
           f7d83:	0f 84 ff fe ff ff    	je     f7c88 <proc_cmdline_parse_given@@SD_SHARED+0x238>
           f7d89:	4c 8d 05 97 09 0c 00 	lea    0xc0997(%rip),%r8        # 1b8727 <utf8_skip_data@@SD_SHARED+0x3147>
           f7d90:	b9 49 00 00 00       	mov    $0x49,%ecx
           f7d95:	48 8d 15 c9 f5 0b 00 	lea    0xbf5c9(%rip),%rdx        # 1b7365 <utf8_skip_data@@SD_SHARED+0x1d85>
           f7d9c:	31 ff                	xor    %edi,%edi
           f7d9e:	48 8d 35 9b ff 0b 00 	lea    0xbff9b(%rip),%rsi        # 1b7d40 <utf8_skip_data@@SD_SHARED+0x2760>
           f7da5:	e8 a6 d6 f4 ff       	callq  45450 <log_assert_failed_realm@plt>
           f7daa:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
           f7db0:	41 56                	push   %r14
           f7db2:	41 55                	push   %r13
           f7db4:	41 54                	push   %r12
           f7db6:	55                   	push   %rbp
        [root@seventh ~]#
      
      If we try the address reported before this patch:
      
        [root@seventh ~]# objdump -d --start-address 0x00007fe406465d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20
      
        /usr/lib/systemd/libsystemd-shared-241.so:     file format elf64-x86-64
      
        [root@seventh ~]#
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200227043939.4403-2-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      443bc639
    • perf symbols: Consolidate symbol fixup issue · 7eec00a7
      Leo Yan authored
      After copying an Arm64 perf archive (object files plus the perf.data
      file) to an x86 laptop, kernel symbol resolution fails in the x86 perf
      tool: every symbol is reported as 'unknown'.
      
      The root cause is the function elf__needs_adjust_symbols(): the x86
      perf tool uses the weak default version, whereas Arm64 and powerpc
      override it with their own versions.  elf__needs_adjust_symbols()
      decides whether symbols need to be parsed with relative offset
      addresses, but the weak function used by x86 builds does not check for
      the ELF type 'ET_DYN', so its result is wrong for Arm DSOs and their
      symbols cannot be parsed.
      
      DSO parsing should not depend on the architecture perf was built for;
      e.g. an x86 perf tool should be able to parse Arm and Arm64 DSOs, and
      vice versa.  Naveen N. Rao also confirmed that powerpc64 kernels are no
      longer built as ET_DYN but as ET_EXEC.
      
      This patch removes the arch-specific functions for Arm64 and powerpc
      and turns elf__needs_adjust_symbols() into a common function.
      
      The common elf__needs_adjust_symbols() checks the extra condition
      'ET_DYN' on the ELF header type.  With this fix, Arm64 DSOs are parsed
      properly by the x86 perf tool.
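A minimal sketch of such an architecture-independent check is shown below. It assumes the pre-existing weak version already accepted ET_EXEC and ET_REL, so that ET_DYN is the only new case; the names are hypothetical, and the constants are spelled out locally with the e_type values from the ELF specification rather than taken from <elf.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* e_type values from the ELF specification (same values as in <elf.h>). */
enum sketch_elf_type {
	SKETCH_ET_REL  = 1,	/* relocatable object */
	SKETCH_ET_EXEC = 2,	/* executable */
	SKETCH_ET_DYN  = 3,	/* shared object or PIE */
};

/*
 * Architecture-independent check: symbols need adjusting for
 * relocatable, executable and (the newly added case) ET_DYN files.
 */
bool sketch_needs_adjust_symbols(uint16_t e_type)
{
	return e_type == SKETCH_ET_EXEC ||
	       e_type == SKETCH_ET_REL ||
	       e_type == SKETCH_ET_DYN;
}
```

Because the decision depends only on the header of the file being parsed, the result is the same whichever architecture perf itself was built for.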
      
      Before:
      
        # perf script
        main 3258 1 branches:                0 [unknown] ([unknown]) => ffff800010c4665c [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c46670 [unknown] ([kernel.kallsyms]) => ffff800010c4eaec [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eaec [unknown] ([kernel.kallsyms]) => ffff800010c4eb00 [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eb08 [unknown] ([kernel.kallsyms]) => ffff800010c4e780 [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4e7a0 [unknown] ([kernel.kallsyms]) => ffff800010c4eeac [unknown] ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eebc [unknown] ([kernel.kallsyms]) => ffff800010c4ed80 [unknown] ([kernel.kallsyms])
      
      After:
      
        # perf script
        main 3258 1 branches:                0 [unknown] ([unknown]) => ffff800010c4665c coresight_timeout+0x54 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c46670 coresight_timeout+0x68 ([kernel.kallsyms]) => ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms]) => ffff800010c4eb00 etm4_enable_hw+0x3e0 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eb08 etm4_enable_hw+0x3e8 ([kernel.kallsyms]) => ffff800010c4e780 etm4_enable_hw+0x60 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4e7a0 etm4_enable_hw+0x80 ([kernel.kallsyms]) => ffff800010c4eeac etm4_enable+0x2d4 ([kernel.kallsyms])
        main 3258 1 branches: ffff800010c4eebc etm4_enable+0x2e4 ([kernel.kallsyms]) => ffff800010c4ed80 etm4_enable+0x1a8 ([kernel.kallsyms])
      
      v3: Changed to check for ET_DYN across all architectures.
      
      v2: Fixed Arm64 and powerpc native building.
      Reported-by: Mike Leach <mike.leach@linaro.org>
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Allison Randal <allison@lohutok.net>
      Cc: Enrico Weigelt <info@metux.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20200306015759.10084-1-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      7eec00a7
    • Ian Rogers's avatar
      perf parse-events: Fix 3 use after frees found with clang ASAN · d4953f7e
      Ian Rogers authored
      Reproducible with a clang ASAN build by running perf test, in
      particular 'Parse event definition strings'.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20200314170356.62914-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d4953f7e
  3. 20 Mar, 2020 5 commits
  4. 19 Mar, 2020 4 commits
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-5.7-20200317' of... · d1c9f7d1
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.7-20200317' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf record:
      
        Alexey Budankov:
      
        - Fix binding of AIO user space buffers to nodes
      
      maps:
      
        Dominik b. Czarnota:
      
        - Fix off by one in strncpy() size argument.
      
        Arnaldo Carvalho de Melo:
      
        - Use strstarts() to look for Android libraries.
      
        Ian Rogers:
      
        - Give synthetic mmap events an inode generation.
      
      man pages:
      
        Ian Rogers:
      
        - Set man page date to last git commit.
      
      perf test:
      
        Ian Rogers:
      
        - Print if shell directory isn't present.
      
      perf report:
      
        Jin Yao:
      
        - Fix no branch type statistics report issue.
      
      perf expr:
      
        Jiri Olsa:
      
        - Fix copy/paste mistake
      
      vendor events:
      
        Kan Liang:
      
        - Support metric constraints.
      
      vendor events intel:
      
        Kan Liang:
      
        - Add NO_NMI_WATCHDOG metric constraint.
      
      vendor events s390:
      
        Thomas Richter:
      
        - Add new deflate counters for IBM z15.
      
      ARM cs-etm:
      
        Leo Yan:
      
        - Last branch improvements.
      
      intel-pt:
      
        Adrian Hunter:
      
        - Update intel-pt.txt file with new location of the documentation.
      
        - Add Intel PT man page references.
      
        - Rename intel-pt.txt and put it in man page format.
      
      perl scripting:
      
        Michael Petlan:
      
        - Add common_callchain to fix argument order.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      
      Conflicts:
      	tools/perf/util/map.c
      d1c9f7d1
    • Ingo Molnar's avatar
      409e1a31
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-5.7-20200310' of... · fdca7c14
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.7-20200310' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf stat:
      
        Jin Yao:
      
        - Show percore counts in per CPU output.
      
      perf report:
      
        Jin Yao:
      
        - Allow selecting which block info columns to report and its order.
      
        - Support color ops to print block percents in color.
      
        - Fix wrong block address comparison in block_info__cmp().
      
      perf annotate:
      
        Ravi Bangoria:
      
        - Get rid of annotation->nr_jumps, unused.
      
      expr:
      
        Jiri Olsa:
      
        - Move expr lexer to flex.
      
      llvm:
      
        Arnaldo Carvalho de Melo:
      
        - Add debug hint message about missing kernel-devel package.
      
      core:
      
        Kan Liang:
      
        - Initial patches to support the recently added PERF_SAMPLE_BRANCH_HW_INDEX
          kernel feature.
      
        - Add check for unexpected use of reserved members in event attr, so that in
          the future older perf tools will complain instead of silently trying to
          process unknown features.
      
      libapi:
      
        Namhyung Kim:
      
        - Adopt cgroupsfs_find_mountpoint() from tools/perf/util/.
      
      libperf:
      
        Michael Petlan:
      
        - Add counting example.
      
      libtraceevent:
      
        Steven Rostedt (VMware):
      
        - Remove extra '\n' in print_event_time().
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fdca7c14
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo-5.6-20200309' of... · db5d85ce
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-5.6-20200309' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
      
      perf probe:
      
        Masami Hiramatsu:
      
        - Fix deletion of multiple probe events.
      
        - Fix userspace libraries handling by not depending on dwfl_module_addrsym().
      
      Event parsing:
      
        Ian Rogers:
      
        - Fix reading of invalid memory in event parsing.
      
      python binding:
      
        Ilie Halip:
      
        - Fix clang detection when using CC=clang-version.
      
      build:
      
        Masami Hiramatsu:
      
        - Fix O= use with relative paths.
      
      Android:
      
        Dominik b. Czarnota:
      
        - Fix off by one in strncpy() size argument when handling Android
          libraries.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      db5d85ce
  5. 17 Mar, 2020 7 commits
    • Jiri Olsa's avatar
      perf expr: Fix copy/paste mistake · 59a08b4b
      Jiri Olsa authored
      Copy/paste leftover from recent refactor.
      
      Fixes: 26226a97 ("perf expr: Move expr lexer to flex")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200315155609.603948-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      59a08b4b
    • Jin Yao's avatar
      perf report: Fix no branch type statistics report issue · c3b10649
      Jin Yao authored
      Previously we could get the report of branch type statistics.
      
      For example:
      
        # perf record -j any,save_type ...
        # perf report --stdio
      
        #
        # Branch Statistics:
        #
        COND_FWD:  40.6%
        COND_BWD:   4.1%
        CROSS_4K:  24.7%
        CROSS_2M:  12.3%
            COND:  44.7%
          UNCOND:   0.0%
             IND:   6.1%
            CALL:  24.5%
             RET:  24.7%
      
      Recent perf, however, no longer reports the branch type statistics.
      
      This is a regression caused by commit 40c39e30 ("perf report: Fix
      a no annotate browser displayed issue"), which only counts the branch
      type statistics in browser mode.
      
      This patch moves the branch_type_count() call outside of the
      ui__has_annotation() check, so that branch type statistics work in
      stdio mode again.
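
The shape of the regression and of the fix can be sketched as follows. This is an illustration only: `struct branch_stats_sketch` and both functions are hypothetical stand-ins, and the `has_annotation` flag plays the role of the ui__has_annotation() result, which the description above implies is false in stdio mode.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-report branch statistics. */
struct branch_stats_sketch {
	unsigned long nr_branches;
};

/* Regressed shape: the counter is only touched on the annotation path. */
static void add_branch_regressed(struct branch_stats_sketch *st,
				 bool has_annotation)
{
	if (has_annotation) {
		/* annotation-only bookkeeping would go here */
		st->nr_branches++;
	}
}

/* Fixed shape: counting is unconditional, annotation work stays guarded. */
static void add_branch_fixed(struct branch_stats_sketch *st,
			     bool has_annotation)
{
	st->nr_branches++;

	if (has_annotation) {
		/* annotation-only bookkeeping would go here */
	}
}
```

With `has_annotation` false, the regressed variant leaves the counter at zero, which matches the symptom of an empty statistics section, while the fixed variant counts every branch.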
      
      Fixes: 40c39e30 ("perf report: Fix a no annotate browser displayed issue")
      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200313134607.12873-1-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c3b10649
    • Ian Rogers's avatar
      perf tools: Give synthetic mmap events an inode generation · 3b7a15b0
      Ian Rogers authored
      When mmap2 events are synthesized, the ino_generation field isn't
      set, leading to uninitialized memory being compared.
      
      Caught with clang's -fsanitize=memory:
      
      ==124733==WARNING: MemorySanitizer: use-of-uninitialized-value
          #0 0x55a96a6a65cc in __dso_id__cmp tools/perf/util/dsos.c:23:6
          #1 0x55a96a6a81d5 in dso_id__cmp tools/perf/util/dsos.c:38:9
          #2 0x55a96a6a717f in __dso__cmp_long_name tools/perf/util/dsos.c:74:15
          #3 0x55a96a6a6c4c in __dsos__findnew_link_by_longname_id tools/perf/util/dsos.c:106:12
          #4 0x55a96a6a851e in __dsos__findnew_by_longname_id tools/perf/util/dsos.c:178:9
          #5 0x55a96a6a7798 in __dsos__find_id tools/perf/util/dsos.c:191:9
          #6 0x55a96a6a7b57 in __dsos__findnew_id tools/perf/util/dsos.c:251:20
          #7 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
          #8 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
          #9 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
          #10 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
          #11 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #12 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #13 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #14 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #15 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #16 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #17 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #18 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #19 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #20 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #21 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #22 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #23 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #24 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #25 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #26 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #27 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #28 0x55a96a282223 in main tools/perf/perf.c:538:3
      
        Uninitialized value was stored to memory at
          #1 0x55a96a6a18f7 in dso__new_id tools/perf/util/dso.c:1230:14
          #2 0x55a96a6a78ee in __dsos__addnew_id tools/perf/util/dsos.c:233:20
          #3 0x55a96a6a7bcc in __dsos__findnew_id tools/perf/util/dsos.c:252:21
          #4 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
          #5 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
          #6 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
          #7 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
          #8 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #9 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #10 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #11 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #12 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #13 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #14 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #15 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #16 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #17 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #18 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #19 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
      
        Uninitialized value was stored to memory at
          #0 0x55a96a7725af in machine__process_mmap2_event tools/perf/util/machine.c:1646:25
          #1 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #2 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #3 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #4 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #5 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #6 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #7 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #8 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #9 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #10 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #11 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #12 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #13 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #14 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #15 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #16 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #17 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #18 0x55a96a282223 in main tools/perf/perf.c:538:3
      
        Uninitialized value was created by a heap allocation
          #0 0x55a96a22f60d in malloc llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:925:3
          #1 0x55a96a882948 in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:655:15
          #2 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #3 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #4 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #5 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #6 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #7 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #8 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #9 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #10 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #11 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #12 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #13 0x55a96a282223 in main tools/perf/perf.c:538:3
      
      SUMMARY: MemorySanitizer: use-of-uninitialized-value tools/perf/util/dsos.c:23:6 in __dso_id__cmp
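
The report above comes down to a heap-allocated record whose identity fields are compared before all of them were written. A minimal sketch of the corrected pattern follows. The struct layouts and function names are hypothetical simplifications, not perf's real definitions, and using zero as the generation value is an assumption made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for a DSO identity. */
struct dso_id_sketch {
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
};

/* Hypothetical synthesized mmap2 record, allocated with malloc(). */
struct mmap2_sketch {
	struct dso_id_sketch id;
};

/* Allocate a record and write every identity field before it is used. */
static struct mmap2_sketch *synthesize_mmap2(uint32_t maj, uint32_t min,
					     uint64_t ino)
{
	struct mmap2_sketch *ev = malloc(sizeof(*ev));

	if (!ev)
		return NULL;

	ev->id.maj = maj;
	ev->id.min = min;
	ev->id.ino = ino;
	/* malloc() does not zero memory, so this must be set explicitly. */
	ev->id.ino_generation = 0;
	return ev;
}

/* Field-by-field comparison; returns 0 when the two ids are identical. */
static int dso_id_sketch_cmp(const struct dso_id_sketch *a,
			     const struct dso_id_sketch *b)
{
	if (a->maj != b->maj)
		return a->maj < b->maj ? -1 : 1;
	if (a->min != b->min)
		return a->min < b->min ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	if (a->ino_generation != b->ino_generation)
		return a->ino_generation < b->ino_generation ? -1 : 1;
	return 0;
}
```

Without the explicit assignment, the last comparison would read whatever bytes malloc() happened to return, which is the class of read that MemorySanitizer flags.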
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20200313053129.131264-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3b7a15b0
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · ac309e77
      Linus Torvalds authored
      Pull HID fixes from Jiri Kosina:
      
       - string buffer formatting fixes in picolcd and sensor drivers, from
         Takashi Iwai
      
       - two new device IDs from Chen-Tsung Hsieh and Tony Fischetti
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: add ALWAYS_POLL quirk to lenovo pixart mouse
        HID: google: add moonball USB id
        HID: hid-sensor-custom: Use scnprintf() for avoiding potential buffer overflow
        HID: hid-picolcd_fb: Use scnprintf() for avoiding potential buffer overflow
      ac309e77
    • Kim Phillips's avatar
      perf/amd/uncore: Add support for Family 19h L3 PMU · e48667b8
      Kim Phillips authored
      Family 19h introduces a change in the slice, core and thread
      specification in its L3 Performance Event Select (ChL3PmcCfg) h/w
      register. The change is incompatible with Family 17h's version of the
      register.
      
      Introduce a new path in l3_thread_slice_mask() to do things differently
      for Family 19h vs. Family 17h, otherwise the new hardware doesn't get
      programmed correctly.
      
      Instead of a linear core--thread bitmask, Family 19h takes an encoded
      core number, and a separate thread mask. There are new bits that are set
      for all cores and all slices, of which only the latter is used, since
      the driver counts events for all slices on behalf of the specified CPU.
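
The difference between the two encodings described above can be sketched as below. Every shift and mask value here is a hypothetical placeholder chosen for illustration, not taken from AMD documentation or the kernel headers. The linear variant also assumes four cores of two threads each, and `thread` is assumed to be 0 or 1.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_LINEAR_SHIFT	56
#define SKETCH_COREID_SHIFT	42
#define SKETCH_COREID_MASK	(UINT64_C(0x7) << SKETCH_COREID_SHIFT)
#define SKETCH_EN_ALL_SLICES	(UINT64_C(1) << 46)
#define SKETCH_THREAD_SHIFT	56

/* Linear core-thread bitmask: one bit per (core, thread) pair. */
static uint64_t linear_style_mask(unsigned int core, unsigned int thread)
{
	return UINT64_C(1) << (SKETCH_LINEAR_SHIFT + 2 * (core % 4) + thread);
}

/*
 * Encoded style: the core number is stored as a value in its own field,
 * the thread is selected by a separate per-thread bit, and an
 * "all slices" bit is set unconditionally.
 */
static uint64_t encoded_style_mask(unsigned int core, unsigned int thread)
{
	uint64_t core_field = ((uint64_t)core << SKETCH_COREID_SHIFT) &
			      SKETCH_COREID_MASK;
	uint64_t thread_bit = UINT64_C(1) << (SKETCH_THREAD_SHIFT + thread);

	return SKETCH_EN_ALL_SLICES | core_field | thread_bit;
}
```

In the linear form the core number changes which single bit is set. In the encoded form the core number is written as a value and only the thread selection remains a bitmask.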
      
      Also update amd_uncore_init() to base its L2/NB vs. L3/Data Fabric mode
      decision on Family 17h or above, not just 17h and 18h: the Family
      19h Data Fabric PMC is compatible with the Family 17h DF PMC.
      
       [ bp: Touchups. ]
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200313231024.17601-3-kim.phillips@amd.com
      e48667b8
    • Kim Phillips's avatar
      perf/amd/uncore: Make L3 thread mask code more readable · 9689dbbe
      Kim Phillips authored
      Convert the l3_thread_slice_mask() function to use the more readable
      topology_* helper functions, more intuitive variable names like shift
      and thread_mask, and BIT_ULL().
      
      No functional changes.
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200313231024.17601-2-kim.phillips@amd.com
      9689dbbe
    • Kim Phillips's avatar
      perf/amd/uncore: Prepare L3 thread mask code for Family 19h · 4dcc3df8
      Kim Phillips authored
      In order to better accommodate the upcoming Family 19h, given
      the 80-char line limit, move the existing code into a new
      l3_thread_slice_mask() function.
      
      No functional changes.
      
       [ bp: Touchups. ]
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200313231024.17601-1-kim.phillips@amd.com
      4dcc3df8
  6. 16 Mar, 2020 3 commits
  7. 15 Mar, 2020 1 commit