Commit 2079f7aa authored by Vijay Thakkar's avatar Vijay Thakkar Committed by Arnaldo Carvalho de Melo

perf vendor events amd: Add Zen2 events

This patch adds PMU events for AMD Zen2 core based processors, namely,
Matisse (model 71h), Castle Peak (model 31h) and Rome (model 2xh), as
documented in the AMD Processor Programming Reference for Matisse [1].
The model number regex has been set to detect all the models under
family 17 that do not match those of Zen1, as the range is larger for
zen2.

Zen2 adds some additional counters that are not present in Zen1 and
events for them have been added in this patch. Some counters have also
been removed for Zen2 thatwere previously present in Zen1 and have been
confirmed to always sample zero on zen2. These added/removed counters
have been omitted for brevity but can be found here:
https://gist.github.com/thakkarV/5b12ca5fd7488eb2c42e451e40bdd5f3

Note that PPR for Zen2 [1] does not include some counters that were
documented in the PPR for Zen1 based processors [2]. After having tested
these counters, some of them that still work for zen2 systems have been
preserved in the events for zen2. The counters that are omitted in [1]
but are still measurable and non-zero on zen2 (tested on a Ryzen 3900X
system) are the following:

  PMC 0x000 fpu_pipe_assignment.{total|total0|total1|total2|total3}
  PMC 0x004 fp_num_mov_elim_scal_op.*
  PMC 0x046 ls_tablewalker.*
  PMC 0x062 l2_latency.l2_cycles_waiting_on_fills
  PMC 0x063 l2_wcb_req.*
  PMC 0x06D l2_fill_pending.l2_fill_busy
  PMC 0x080 ic_fw32
  PMC 0x081 ic_fw32_miss
  PMC 0x086 bp_snp_re_sync
  PMC 0x087 ic_fetch_stall.*
  PMC 0x08C ic_cache_inval.*
  PMC 0x099 bp_tlb_rel
  PMC 0x0C7 ex_ret_brn_resync
  PMC 0x28A ic_oc_mode_switch.*
  L3PMC 0x001 l3_request_g1.*
  L3PMC 0x006 l3_comb_clstr_state.*

[1]: Processor Programming Reference (PPR) for AMD Family 17h Model 71h,
Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019

[2]: Processor Programming Reference (PPR) for AMD Family 17h Models
01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019

All of the PPRs can be found at:

https://bugzilla.kernel.org/show_bug.cgi?id=206537

Here are the results of running "fpu_pipe_assignment.total" events on my
Ryzen 3900X family 17h model 71h system:

Before this patch:

  $> perf list *fpu_pipe_assignment*

List of pre-defined events (to be used in -e):

After:

  $> perf list *fpu_pipe_assignment*

  floating point:
  fpu_pipe_assignment.total
      [Total number of fp uOps]
  fpu_pipe_assignment.total0
      [Total number uOps assigned to pipe 0]
  fpu_pipe_assignment.total1
      [Total number uOps assigned to pipe 1]
  fpu_pipe_assignment.total2
      [Total number uOps assigned to pipe 2]
  fpu_pipe_assignment.total3
      [Total number uOps assigned to pipe 3]

  Metric Groups:

  $> perf stat -e fpu_pipe_assignment.total sleep 1

  Performance counter stats for 'sleep 1':

              25,883      fpu_pipe_assignment.total

         1.004145868 seconds time elapsed

         0.001805000 seconds user
         0.000000000 seconds sys

Usage tests while running Linpackin the background:

  $> perf stat -I1000 -e fpu_pipe_assignment.total
       1.000266796     79,313,191,516      fpu_pipe_assignment.total
       2.000809630     68,091,474,430      fpu_pipe_assignment.total
       3.001028115     52,925,023,174      fpu_pipe_assignment.total

  $> perf record -e fpu_pipe_assignment.total,fpu_pipe_assignment.total0 -a sleep 1
  [ perf record: Woken up 9 times to write data ]
  [ perf record: Captured and wrote 4.031 MB perf.data (64764 samples) ]

  $> perf report --stdio --no-header | head -30
      98.33%  xhpl             xhpl                          [.] dgemm_kernel
       0.28%  xhpl             xhpl                          [.] dtrsm_kernel_LT
       0.10%  xhpl             [kernel.kallsyms]             [k] entry_SYSCALL_64
       0.08%  xhpl             xhpl                          [.] idamax_k
       0.07%  baloo_file_extr  liblmdb.so                    [.] mdb_mid2l_insert
       0.06%  xhpl             xhpl                          [.] dgemm_itcopy
       0.06%  xhpl             xhpl                          [.] dgemm_oncopy
       0.06%  xhpl             [kernel.kallsyms]             [k] __schedule
       0.06%  xhpl             [kernel.kallsyms]             [k] syscall_trace_enter
       0.06%  xhpl             [kernel.kallsyms]             [k] native_sched_clock
       0.06%  xhpl             [kernel.kallsyms]             [k] pick_next_task_fair
       0.05%  xhpl             xhpl                          [.] blas_thread_server.llvm.15009391670273914865
       0.04%  xhpl             [kernel.kallsyms]             [k] do_syscall_64
       0.04%  xhpl             [kernel.kallsyms]             [k] yield_task_fair
       0.04%  xhpl             libpthread-2.31.so            [.] __pthread_mutex_unlock_usercnt
       0.03%  xhpl             [kernel.kallsyms]             [k] cpuacct_charge
       0.03%  xhpl             [kernel.kallsyms]             [k] syscall_return_via_sysret
       0.03%  xhpl             libc-2.31.so                  [.] __sched_yield
       0.03%  xhpl             [kernel.kallsyms]             [k] __calc_delta

  $> perf annotate --stdio2 dgemm_kernel | egrep '^ {0,2}[0-9]+' -B2 -A2
                  sub          $0x60,%rsp
                  mov          %rbx,(%rsp)
    0.00          mov          %rbp,0x8(%rsp)
                  mov          %r12,0x10(%rsp)
    0.00          mov          %r13,0x18(%rsp)
                  mov          %r14,0x20(%rsp)
                  mov          %r15,0x28(%rsp)
  --
                  mov          %rdi,%r13
                  mov          %rsi,0x28(%rsp)
    0.00          mov          %rdx,%r12
                  vmovsd       %xmm0,0x30(%rsp)
                  shl          $0x3,%r10
                  mov          0x28(%rsp),%rax
    0.00          xor          %rdx,%rdx
                  mov          $0x18,%rdi
                  div          %rdi
  --
                  nop
            a0:   mov          %r12,%rax
    0.00          shl          $0x3,%rax
                  mov          %r8,%rdi
                  lea          (%r8,%rax,8),%r15
  --
                  mov          %r12,%rax
                  nop
    0.00    c0:   vmovups      (%rdi),%ymm1
    0.09          vmovups      0x20(%rdi),%ymm2
    0.02          vmovups      (%r15),%ymm3
    0.10          vmovups      %ymm1,(%rsi)
    0.07          vmovups      %ymm2,0x20(%rsi)
    0.07          vmovups      %ymm3,0x40(%rsi)
    0.06          add          $0x40,%rdi
                  add          $0x40,%r15
                  add          $0x60,%rsi
    0.00          dec          %rax
                ↑ jne          c0
                  mov          %r9,%r15
  --
                  nop
           110:   lea          0x80(%rsp),%rsi
    0.01          add          $0x60,%rsi
    0.03          mov          %r12,%rax
    0.00          sar          $0x3,%rax
                  cmp          $0x2,%rax
                ↓ jl           d26
                  prefetcht0   0x200(%rdi)
    0.01          vmovups      -0x60(%rsi),%ymm1
    0.02          prefetcht0   0xa0(%rsi)
    0.00          vbroadcastsd -0x80(%rdi),%ymm0
    0.00          prefetcht0   0xe0(%rsi)
    0.03          vmovups      -0x40(%rsi),%ymm2
    0.00          prefetcht0   0x120(%rsi)
                  vmovups      -0x20(%rsi),%ymm3
                  vmulpd       %ymm0,%ymm1,%ymm4
    0.01          prefetcht0   0x160(%rsi)
                  vmulpd       %ymm0,%ymm2,%ymm8
    0.01          vmulpd       %ymm0,%ymm3,%ymm12
    0.02          prefetcht0   0x1a0(%rsi)
    0.01          vbroadcastsd -0x78(%rdi),%ymm0
                  vmulpd       %ymm0,%ymm1,%ymm5
    0.01          vmulpd       %ymm0,%ymm2,%ymm9
                  vmulpd       %ymm0,%ymm3,%ymm13
    0.01          vbroadcastsd -0x70(%rdi),%ymm0
                  vmulpd       %ymm0,%ymm1,%ymm6
    0.00          vmulpd       %ymm0,%ymm2,%ymm10
    0.00          add          $0x60,%rsi

  ... snip ...

                  nop
          65e0:   vmovddup     -0x60(%rsi),%xmm2
    0.00          vmovups      -0x80(%rdi),%xmm0
                  vmovups      -0x70(%rdi),%xmm1
    0.00          vmovddup     -0x58(%rsi),%xmm3
                  vfmadd231pd  %xmm0,%xmm2,%xmm4
    0.00          vfmadd231pd  %xmm1,%xmm2,%xmm5
    0.00          vfmadd231pd  %xmm0,%xmm3,%xmm6
    0.00          vfmadd231pd  %xmm1,%xmm3,%xmm7
    0.00          add          $0x10,%rsi
                  add          $0x20,%rdi
    0.00          dec          %rax
                ↑ jne          65e0
                  nop
                  nop
          6620:   vmovddup     0x30(%rsp),%xmm0
    0.00          vmulpd       %xmm0,%xmm4,%xmm4
    0.00          vmulpd       %xmm0,%xmm5,%xmm5
                  vmulpd       %xmm0,%xmm6,%xmm6
                  vmulpd       %xmm0,%xmm7,%xmm7
                  vaddpd       (%r15),%xmm4,%xmm4
                  vaddpd       0x10(%r15),%xmm5,%xmm5
    0.00          vaddpd       (%r15,%r10,1),%xmm6,%xmm6
    0.00          vaddpd       0x10(%r15,%r10,1),%xmm7,%xmm7
    0.00          vmovups      %xmm4,(%r15)
                  vmovups      %xmm5,0x10(%r15)
    0.00          vmovups      %xmm6,(%r15,%r10,1)
                  vmovups      %xmm7,0x10(%r15,%r10,1)
                  add          $0x20,%r15
  --
                  lea          (%r8,%rax,8),%r8
          69d8:   mov          0x20(%rsp),%r14
    0.00          test         $0x1,%r14
                ↓ je           6d84
                  mov          %r9,%r15
  --
                  vbroadcastsd -0x28(%rsi),%ymm3
                  vfmadd231pd  (%rdi),%ymm0,%ymm4
    0.00          vfmadd231pd  0x20(%rdi),%ymm1,%ymm5
                  vfmadd231pd  0x40(%rdi),%ymm2,%ymm6
                  vfmadd231pd  0x60(%rdi),%ymm3,%ymm7
  --
                  vmulpd       %ymm0,%ymm4,%ymm4
                  vaddpd       (%r15),%ymm4,%ymm4
    0.00          vmovups      %ymm4,(%r15)
                  add          $0x20,%r15
                  dec          %r11
  --
                  mov          %rbx,%rsp
                  mov          (%rsp),%rbx
    0.01          mov          0x8(%rsp),%rbp
                  mov          0x10(%rsp),%r12
                  mov          0x18(%rsp),%r13
Signed-off-by: default avatarVijay Thakkar <vijaythakkar@me.com>
Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: default avatarKim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jon Grimm <jon.grimm@amd.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200318190002.307290-3-vijaythakkar@me.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
parent c5f18e9e
[
{
"EventName": "bp_l1_btb_correct",
"EventCode": "0x8a",
"BriefDescription": "L1 Branch Prediction Overrides Existing Prediction (speculative)."
},
{
"EventName": "bp_l2_btb_correct",
"EventCode": "0x8b",
"BriefDescription": "L2 Branch Prediction Overrides Existing Prediction (speculative)."
},
{
"EventName": "bp_dyn_ind_pred",
"EventCode": "0x8e",
"BriefDescription": "Dynamic Indirect Predictions.",
"PublicDescription": "Indirect Branch Prediction for potential multi-target branch (speculative)."
},
{
"EventName": "bp_de_redirect",
"EventCode": "0x91",
"BriefDescription": "Decoder Overrides Existing Branch Prediction (speculative)."
},
{
"EventName": "bp_l1_tlb_fetch_hit",
"EventCode": "0x94",
"BriefDescription": "The number of instruction fetches that hit in the L1 ITLB.",
"UMask": "0xFF"
},
{
"EventName": "bp_l1_tlb_fetch_hit.if1g",
"EventCode": "0x94",
"BriefDescription": "The number of instruction fetches that hit in the L1 ITLB. Instruction fetches to a 1GB page.",
"UMask": "0x4"
},
{
"EventName": "bp_l1_tlb_fetch_hit.if2m",
"EventCode": "0x94",
"BriefDescription": "The number of instruction fetches that hit in the L1 ITLB. Instruction fetches to a 2MB page.",
"UMask": "0x2"
},
{
"EventName": "bp_l1_tlb_fetch_hit.if4k",
"EventCode": "0x94",
"BriefDescription": "The number of instruction fetches that hit in the L1 ITLB. Instruction fetches to a 4KB page.",
"UMask": "0x1"
},
{
"EventName": "bp_tlb_rel",
"EventCode": "0x99",
"BriefDescription": "The number of ITLB reload requests."
}
]
This diff is collapsed.
[
{
"EventName": "ex_ret_instr",
"EventCode": "0xc0",
"BriefDescription": "Retired Instructions."
},
{
"EventName": "ex_ret_cops",
"EventCode": "0xc1",
"BriefDescription": "Retired Uops.",
"PublicDescription": "The number of micro-ops retired. This count includes all processor activity (instructions, exceptions, interrupts, microcode assists, etc.). The number of events logged per cycle can vary from 0 to 8."
},
{
"EventName": "ex_ret_brn",
"EventCode": "0xc2",
"BriefDescription": "Retired Branch Instructions.",
"PublicDescription": "The number of branch instructions retired. This includes all types of architectural control flow changes, including exceptions and interrupts."
},
{
"EventName": "ex_ret_brn_misp",
"EventCode": "0xc3",
"BriefDescription": "Retired Branch Instructions Mispredicted.",
"PublicDescription": "The number of branch instructions retired, of any type, that were not correctly predicted. This includes those for which prediction is not attempted (far control transfers, exceptions and interrupts)."
},
{
"EventName": "ex_ret_brn_tkn",
"EventCode": "0xc4",
"BriefDescription": "Retired Taken Branch Instructions.",
"PublicDescription": "The number of taken branches that were retired. This includes all types of architectural control flow changes, including exceptions and interrupts."
},
{
"EventName": "ex_ret_brn_tkn_misp",
"EventCode": "0xc5",
"BriefDescription": "Retired Taken Branch Instructions Mispredicted.",
"PublicDescription": "The number of retired taken branch instructions that were mispredicted."
},
{
"EventName": "ex_ret_brn_far",
"EventCode": "0xc6",
"BriefDescription": "Retired Far Control Transfers.",
"PublicDescription": "The number of far control transfers retired including far call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and interrupts. Far control transfers are not subject to branch prediction."
},
{
"EventName": "ex_ret_brn_resync",
"EventCode": "0xc7",
"BriefDescription": "Retired Branch Resyncs.",
"PublicDescription": "The number of resync branches. These reflect pipeline restarts due to certain microcode assists and events such as writes to the active instruction stream, among other things. Each occurrence reflects a restart penalty similar to a branch mispredict. This is relatively rare."
},
{
"EventName": "ex_ret_near_ret",
"EventCode": "0xc8",
"BriefDescription": "Retired Near Returns.",
"PublicDescription": "The number of near return instructions (RET or RET Iw) retired."
},
{
"EventName": "ex_ret_near_ret_mispred",
"EventCode": "0xc9",
"BriefDescription": "Retired Near Returns Mispredicted.",
"PublicDescription": "The number of near returns retired that were not correctly predicted by the return address predictor. Each such mispredict incurs the same penalty as a mispredicted conditional branch instruction."
},
{
"EventName": "ex_ret_brn_ind_misp",
"EventCode": "0xca",
"BriefDescription": "Retired Indirect Branch Instructions Mispredicted."
},
{
"EventName": "ex_ret_mmx_fp_instr.sse_instr",
"EventCode": "0xcb",
"BriefDescription": "SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41, SSE42, AVX).",
"PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS. SSE instructions (SSE, SSE2, SSE3, SSSE3, SSE4A, SSE41, SSE42, AVX).",
"UMask": "0x4"
},
{
"EventName": "ex_ret_mmx_fp_instr.mmx_instr",
"EventCode": "0xcb",
"BriefDescription": "MMX instructions.",
"PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS. MMX instructions.",
"UMask": "0x2"
},
{
"EventName": "ex_ret_mmx_fp_instr.x87_instr",
"EventCode": "0xcb",
"BriefDescription": "x87 instructions.",
"PublicDescription": "The number of MMX, SSE or x87 instructions retired. The UnitMask allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction. Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS. x87 instructions.",
"UMask": "0x1"
},
{
"EventName": "ex_ret_cond",
"EventCode": "0xd1",
"BriefDescription": "Retired Conditional Branch Instructions."
},
{
"EventName": "ex_ret_cond_misp",
"EventCode": "0xd2",
"BriefDescription": "Retired Conditional Branch Instructions Mispredicted."
},
{
"EventName": "ex_div_busy",
"EventCode": "0xd3",
"BriefDescription": "Div Cycles Busy count."
},
{
"EventName": "ex_div_count",
"EventCode": "0xd4",
"BriefDescription": "Div Op Count."
},
{
"EventName": "ex_tagged_ibs_ops.ibs_count_rollover",
"EventCode": "0x1cf",
"BriefDescription": "Tagged IBS Ops. Number of times an op could not be tagged by IBS because of a previous tagged op that has not retired.",
"UMask": "0x4"
},
{
"EventName": "ex_tagged_ibs_ops.ibs_tagged_ops_ret",
"EventCode": "0x1cf",
"BriefDescription": "Tagged IBS Ops. Number of Ops tagged by IBS that retired.",
"UMask": "0x2"
},
{
"EventName": "ex_tagged_ibs_ops.ibs_tagged_ops",
"EventCode": "0x1cf",
"BriefDescription": "Tagged IBS Ops. Number of Ops tagged by IBS.",
"UMask": "0x1"
},
{
"EventName": "ex_ret_fus_brnch_inst",
"EventCode": "0x1d0",
"BriefDescription": "Retired Fused Instructions. The number of fuse-branch instructions retired per cycle. The number of events logged per cycle can vary from 0-8.",
}
]
[
{
"EventName": "fpu_pipe_assignment.total",
"EventCode": "0x00",
"BriefDescription": "Total number of fp uOps.",
"PublicDescription": "Total number of fp uOps. The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS.",
"UMask": "0xf"
},
{
"EventName": "fpu_pipe_assignment.total3",
"EventCode": "0x00",
"BriefDescription": "Total number uOps assigned to pipe 3.",
"PublicDescription": "The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one-cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number uOps assigned to pipe 3.",
"UMask": "0x8"
},
{
"EventName": "fpu_pipe_assignment.total2",
"EventCode": "0x00",
"BriefDescription": "Total number uOps assigned to pipe 2.",
"PublicDescription": "The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number uOps assigned to pipe 2.",
"UMask": "0x4"
},
{
"EventName": "fpu_pipe_assignment.total1",
"EventCode": "0x00",
"BriefDescription": "Total number uOps assigned to pipe 1.",
"PublicDescription": "The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number uOps assigned to pipe 1.",
"UMask": "0x2"
},
{
"EventName": "fpu_pipe_assignment.total0",
"EventCode": "0x00",
"BriefDescription": "Total number of fp uOps on pipe 0.",
"PublicDescription": "The number of operations (uOps) dispatched to each of the 4 FPU execution pipelines. This event reflects how busy the FPU pipelines are and may be used for workload characterization. This includes all operations performed by x87, MMX, and SSE instructions, including moves. Each increment represents a one- cycle dispatch event. This event is a speculative event. Since this event includes non-numeric operations it is not suitable for measuring MFLOPS. Total number uOps assigned to pipe 0.",
"UMask": "0x1"
},
{
"EventName": "fp_ret_sse_avx_ops.all",
"EventCode": "0x03",
"BriefDescription": "All FLOPS. This is a retire-based event. The number of retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 to 64. This event can count above 15.",
"UMask": "0xff"
},
{
"EventName": "fp_ret_sse_avx_ops.mac_flops",
"EventCode": "0x03",
"BriefDescription": "Multiply-add FLOPS. Multiply-add counts as 2 FLOPS. This is a retire-based event. The number of retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 to 64. This event can count above 15.",
"PublicDescription": "",
"UMask": "0x8"
},
{
"EventName": "fp_ret_sse_avx_ops.div_flops",
"EventCode": "0x03",
"BriefDescription": "Divide/square root FLOPS. This is a retire-based event. The number of retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 to 64. This event can count above 15.",
"UMask": "0x4"
},
{
"EventName": "fp_ret_sse_avx_ops.mult_flops",
"EventCode": "0x03",
"BriefDescription": "Multiply FLOPS. This is a retire-based event. The number of retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 to 64. This event can count above 15.",
"UMask": "0x2"
},
{
"EventName": "fp_ret_sse_avx_ops.add_sub_flops",
"EventCode": "0x03",
"BriefDescription": "Add/subtract FLOPS. This is a retire-based event. The number of retired SSE/AVX FLOPS. The number of events logged per cycle can vary from 0 to 64. This event can count above 15.",
"UMask": "0x1"
},
{
"EventName": "fp_num_mov_elim_scal_op.optimized",
"EventCode": "0x04",
"BriefDescription": "Number of Scalar Ops optimized. This is a dispatch based speculative event, and is useful for measuring the effectiveness of the Move elimination and Scalar code optimization schemes.",
"UMask": "0x8"
},
{
"EventName": "fp_num_mov_elim_scal_op.opt_potential",
"EventCode": "0x04",
"BriefDescription": "Number of Ops that are candidates for optimization (have Z-bit either set or pass). This is a dispatch based speculative event, and is useful for measuring the effectiveness of the Move elimination and Scalar code optimization schemes.",
"UMask": "0x4"
},
{
"EventName": "fp_num_mov_elim_scal_op.sse_mov_ops_elim",
"EventCode": "0x04",
"BriefDescription": "Number of SSE Move Ops eliminated. This is a dispatch based speculative event, and is useful for measuring the effectiveness of the Move elimination and Scalar code optimization schemes.",
"UMask": "0x2"
},
{
"EventName": "fp_num_mov_elim_scal_op.sse_mov_ops",
"EventCode": "0x04",
"BriefDescription": "Number of SSE Move Ops. This is a dispatch based speculative event, and is useful for measuring the effectiveness of the Move elimination and Scalar code optimization schemes.",
"UMask": "0x1"
},
{
"EventName": "fp_retired_ser_ops.sse_bot_ret",
"EventCode": "0x05",
"BriefDescription": "SSE bottom-executing uOps retired. The number of serializing Ops retired.",
"UMask": "0x8"
},
{
"EventName": "fp_retired_ser_ops.sse_ctrl_ret",
"EventCode": "0x05",
"BriefDescription": "The number of serializing Ops retired. SSE control word mispredict traps due to mispredictions in RC, FTZ or DAZ, or changes in mask bits.",
"UMask": "0x4"
},
{
"EventName": "fp_retired_ser_ops.x87_bot_ret",
"EventCode": "0x05",
"BriefDescription": "x87 bottom-executing uOps retired. The number of serializing Ops retired.",
"UMask": "0x2"
},
{
"EventName": "fp_retired_ser_ops.x87_ctrl_ret",
"EventCode": "0x05",
"BriefDescription": "x87 control word mispredict traps due to mispredictions in RC or PC, or changes in mask bits. The number of serializing Ops retired.",
"UMask": "0x1"
},
{
"EventName": "fp_disp_faults.ymm_spill_fault",
"EventCode": "0x0e",
"BriefDescription": "Floating Point Dispatch Faults. YMM spill fault.",
"UMask": "0x8"
},
{
"EventName": "fp_disp_faults.ymm_fill_fault",
"EventCode": "0x0e",
"BriefDescription": "Floating Point Dispatch Faults. YMM fill fault.",
"UMask": "0x4"
},
{
"EventName": "fp_disp_faults.xmm_fill_fault",
"EventCode": "0x0e",
"BriefDescription": "Floating Point Dispatch Faults. XMM fill fault.",
"UMask": "0x2"
},
{
"EventName": "fp_disp_faults.x87_fill_fault",
"EventCode": "0x0e",
"BriefDescription": "Floating Point Dispatch Faults. x87 fill fault.",
"UMask": "0x1"
}
]
This diff is collapsed.
[
{
"EventName": "de_dis_uop_queue_empty_di0",
"EventCode": "0xa9",
"BriefDescription": "Cycles where the Micro-Op Queue is empty."
},
{
"EventName": "de_dis_uops_from_decoder",
"EventCode": "0xaa",
"BriefDescription": "Ops dispatched from either the decoders, OpCache or both.",
"UMask": "0xff"
},
{
"EventName": "de_dis_uops_from_decoder.opcache_dispatched",
"EventCode": "0xaa",
"BriefDescription": "Count of dispatched Ops from OpCache.",
"UMask": "0x2"
},
{
"EventName": "de_dis_uops_from_decoder.decoder_dispatched",
"EventCode": "0xaa",
"BriefDescription": "Count of dispatched Ops from Decoder.",
"UMask": "0x1"
},
{
"EventName": "de_dis_dispatch_token_stalls1.fp_misc_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. FP Miscellaneous resource unavailable. Applies to the recovery of mispredicts with FP ops.",
"UMask": "0x80"
},
{
"EventName": "de_dis_dispatch_token_stalls1.fp_sch_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. FP scheduler resource stall. Applies to ops that use the FP scheduler.",
"UMask": "0x40"
},
{
"EventName": "de_dis_dispatch_token_stalls1.fp_reg_file_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. Floating point register file resource stall. Applies to all FP ops that have a destination register.",
"UMask": "0x20"
},
{
"EventName": "de_dis_dispatch_token_stalls1.taken_branch_buffer_rsrc_stall",
"EventCode": "0xae",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. Taken branch buffer resource stall.",
"UMask": "0x10"
},
{
"EventName": "de_dis_dispatch_token_stalls1.int_sched_misc_token_stall",
"EventCode": "0xae",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. Integer Scheduler miscellaneous resource stall.",
"UMask": "0x8"
},
{
"EventName": "de_dis_dispatch_token_stalls1.store_queue_token_stall",
"EventCode": "0xae",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. Store queue resource stall. Applies to all ops with store semantics.",
"UMask": "0x4"
},
{
"EventName": "de_dis_dispatch_token_stalls1.load_queue_token_stall",
"EventCode": "0xae",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. Load queue resource stall. Applies to all ops with load semantics.",
"UMask": "0x2"
},
{
"EventName": "de_dis_dispatch_token_stalls1.int_phy_reg_file_token_stall",
"EventCode": "0xae",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. Integer Physical Register File resource stall. Applies to all ops that have an integer destination register.",
"UMask": "0x1"
},
{
"EventName": "de_dis_dispatch_token_stalls0.sc_agu_dispatch_stall",
"EventCode": "0xaf",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. SC AGU dispatch stall.",
"UMask": "0x40"
},
{
"EventName": "de_dis_dispatch_token_stalls0.retire_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. RETIRE Tokens unavailable.",
"UMask": "0x20"
},
{
"EventName": "de_dis_dispatch_token_stalls0.agsq_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. AGSQ Tokens unavailable.",
"UMask": "0x10"
},
{
"EventName": "de_dis_dispatch_token_stalls0.alu_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. ALU tokens total unavailable.",
"UMask": "0x8"
},
{
"EventName": "de_dis_dispatch_token_stalls0.alsq3_0_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. ALSQ3_0_TokenStall.",
"UMask": "0x4"
},
{
"EventName": "de_dis_dispatch_token_stalls0.alsq2_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. ALSQ 2 Tokens unavailable.",
"UMask": "0x2"
},
{
"EventName": "de_dis_dispatch_token_stalls0.alsq1_token_stall",
"EventCode": "0xaf",
"BriefDescription": "Cycles where a dispatch group is valid but does not get dispatched due to a token stall. ALSQ 1 Tokens unavailable.",
"UMask": "0x1"
}
]
...@@ -37,3 +37,4 @@ GenuineIntel-6-7D,v1,icelake,core ...@@ -37,3 +37,4 @@ GenuineIntel-6-7D,v1,icelake,core
GenuineIntel-6-7E,v1,icelake,core GenuineIntel-6-7E,v1,icelake,core
GenuineIntel-6-86,v1,tremontx,core GenuineIntel-6-86,v1,tremontx,core
AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v1,amdzen1,core AuthenticAMD-23-([12][0-9A-F]|[0-9A-F]),v1,amdzen1,core
AuthenticAMD-23-[[:xdigit:]]+,v1,amdzen2,core
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment