Commit 48d02a1d authored by Andi Kleen, committed by Arnaldo Carvalho de Melo

perf script: Add 'brstackinsn' for branch stacks

Implement printing instruction sequences as hex dump for branch stacks.

This relies on the x86 instruction decoder used by the PT decoder to
find the lengths of instructions to dump them individually.

This is good enough for pattern matching.

This allows studying hot paths for individual samples, together with
branch misprediction and cycle count / IPC information if available (on
Skylake systems).

  % perf record -b ...
  % perf script -F brstackinsn
  ...
    read_hpet+67:
          ffffffff9905b843        insn: 74 ea                     # PRED
          ffffffff9905b82f        insn: 85 c9
          ffffffff9905b831        insn: 74 12
          ffffffff9905b833        insn: f3 90
          ffffffff9905b835        insn: 48 8b 0f
          ffffffff9905b838        insn: 48 89 ca
          ffffffff9905b83b        insn: 48 c1 ea 20
          ffffffff9905b83f        insn: 39 f2
          ffffffff9905b841        insn: 89 d0
          ffffffff9905b843        insn: 74 ea                     # PRED

Only works when no special branch filters are specified.

Occasionally the path does not reach up to the sample IP, as the LBRs
may be frozen before executing a final jump. In this case we print a
special message.
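
To illustrate how branch records turn into the instruction blocks shown
above: each record holds a branch source (FROM) and target (TO), and the
code between one branch's target and the next branch's source ran
linearly. The following is only a minimal sketch of that idea, not the
perf implementation. The struct and function names are hypothetical, and
it assumes the records are ordered newest first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch record: source and target address. */
struct br_entry {
	uint64_t from;
	uint64_t to;
};

/* One linear run of code: executed from start up to the branch at end. */
struct br_block {
	uint64_t start;
	uint64_t end;
};

/*
 * Assumes entries[0] is the most recent branch. Walking from oldest to
 * newest, the block between two branches starts at the older branch's
 * target and ends at the newer branch's source. Ranges that run
 * backwards or span more than max_len bytes are skipped as implausible.
 * Returns the number of blocks written to out (at most nr - 1).
 */
size_t br_blocks(const struct br_entry *entries, size_t nr,
		 uint64_t max_len, struct br_block *out)
{
	size_t n = 0;

	assert(entries && out);
	for (size_t i = nr; i-- > 1; ) {
		uint64_t start = entries[i].to;
		uint64_t end = entries[i - 1].from;

		if (end < start || end - start > max_len)
			continue;
		out[n].start = start;
		out[n].end = end;
		n++;
	}
	return n;
}
```

Fed two records that both branch from ffffffff9905b843 to
ffffffff9905b82f, as in the read_hpet sample above, this yields a single
block from ffffffff9905b82f to ffffffff9905b843, which is the loop body
that was dumped.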

The instruction dumper piggybacks on the existing infrastructure from
the Intel PT decoder.

An earlier iteration of this patch relied on a disassembler, but this
version only uses the existing instruction decoder.

Committer note:

Added hint about how to get suitable perf.data files for use with
'-F brstackinsn':

  $ perf record usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.018 MB perf.data (8 samples) ]
  $
  $ perf script -F brstackinsn
  Display of branch stack assembler requested, but non all-branch filter set
  Hint: run 'perf record -b ...'
  $

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/20170223234634.583-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
parent 74beb09a
@@ -116,7 +116,7 @@ OPTIONS
 --fields::
 Comma separated list of fields to print. Options are:
 comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
-srcline, period, iregs, brstack, brstacksym, flags, bpf-output,
+srcline, period, iregs, brstack, brstacksym, flags, bpf-output, brstackinsn,
 callindent, insn, insnlen. Field list can be prepended with the type, trace, sw or hw,
 to indicate to which event type the field list applies.
 e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace
@@ -189,15 +189,20 @@ OPTIONS
 i.e., -F "" is not allowed.
 The brstack output includes branch related information with raw addresses using the
-/v/v/v/v/ syntax in the following order:
+/v/v/v/v/cycles syntax in the following order:
 FROM: branch source instruction
 TO : branch target instruction
 M/P/-: M=branch target mispredicted or branch direction was mispredicted, P=target predicted or direction predicted, -=not supported
 X/- : X=branch inside a transactional region, -=not in transaction region or not supported
 A/- : A=TSX abort entry, -=not aborted region or not supported
+cycles
 The brstacksym is identical to brstack, except that the FROM and TO addresses are printed in a symbolic form if possible.
+When brstackinsn is specified the full assembler sequences of branch sequences for each sample
+is printed. This is the full execution path leading to the sample. This is only supported when the
+sample was recorded with perf record -b or -j any.
 -k::
 --vmlinux=<file>::
 vmlinux pathname
@@ -302,6 +307,10 @@ include::itrace.txt[]
 stop time is not given (i.e, time string is 'x.y,') then analysis goes
 to end of file.
+--max-blocks::
+Set the maximum number of program blocks to print with brstackasm for
+each sample.
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
......
This diff is collapsed.
@@ -82,6 +82,7 @@ libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
 libperf-$(CONFIG_AUXTRACE) += intel-bts.o
 libperf-y += parse-branch-options.o
+libperf-y += dump-insn.o
 libperf-y += parse-regs-options.o
 libperf-y += term.o
 libperf-y += help-unknown-cmd.o
......
#include <linux/compiler.h>
#include "dump-insn.h"

/* Fallback code */

__weak
const char *dump_insn(struct perf_insn *x __maybe_unused,
		      u64 ip __maybe_unused, u8 *inbuf __maybe_unused,
		      int inlen __maybe_unused, int *lenp)
{
	if (lenp)
		*lenp = 0;
	return "?";
}
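
The fallback above depends on weak linkage: a strong definition of the
same symbol in another object file replaces it at link time. As a rough
standalone illustration, here is a sketch using the GCC/Clang attribute
directly. The function name and the simplified signature are hypothetical
and are not part of perf.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a weak fallback. If another object file
 * provides a non-weak demo_dump_insn(), the linker picks that one and
 * this definition is dropped.
 */
__attribute__((weak))
const char *demo_dump_insn(const unsigned char *inbuf, int inlen, int *lenp)
{
	(void)inbuf;
	assert(inlen >= 0);
	if (lenp)
		*lenp = 0;	/* length 0: nothing was decoded */
	return "?";
}
```

A caller can treat a reported length of 0 as "no decoder available" and
stop walking the buffer instead of looping forever.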
#ifndef __PERF_DUMP_INSN_H
#define __PERF_DUMP_INSN_H 1

#define MAXINSN 15

#include <linux/types.h>

struct thread;

struct perf_insn {
	/* Initialized by callers: */
	struct thread *thread;
	u8 cpumode;
	bool is64bit;
	int cpu;
	/* Temporary */
	char out[256];
};

const char *dump_insn(struct perf_insn *x, u64 ip,
		      u8 *inbuf, int inlen, int *lenp);
#endif
@@ -26,6 +26,7 @@
 #include "insn.c"
 #include "intel-pt-insn-decoder.h"
+#include "dump-insn.h"
 #if INTEL_PT_INSN_BUF_SZ < MAX_INSN_SIZE || INTEL_PT_INSN_BUF_SZ > MAX_INSN
 #error Instruction buffer size too small
@@ -179,6 +180,29 @@ int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
 	return 0;
 }
+const char *dump_insn(struct perf_insn *x, uint64_t ip __maybe_unused,
+		      u8 *inbuf, int inlen, int *lenp)
+{
+	struct insn insn;
+	int n, i;
+	int left;
+	insn_init(&insn, inbuf, inlen, x->is64bit);
+	insn_get_length(&insn);
+	if (!insn_complete(&insn) || insn.length > inlen)
+		return "<bad>";
+	if (lenp)
+		*lenp = insn.length;
+	left = sizeof(x->out);
+	n = snprintf(x->out, left, "insn: ");
+	left -= n;
+	for (i = 0; i < insn.length; i++) {
+		n += snprintf(x->out + n, left, "%02x ", inbuf[i]);
+		left -= n;
+	}
+	return x->out;
+}
 const char *branch_name[] = {
 	[INTEL_PT_OP_OTHER] = "Other",
 	[INTEL_PT_OP_CALL] = "Call",
......
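
For readers who want to try the hex formatting outside the perf tree,
here is a standalone sketch of the same "insn: xx xx ..." output. It is
not the kernel code: the function name and signature are hypothetical,
and unlike the loop in the hunk above it derives the remaining space from
a single write offset and reports truncation to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format len bytes as "insn: xx xx ..." into out (capacity cap).
 * The remaining space is always cap - off, so no snprintf() call is
 * handed a size larger than what is actually left in the buffer.
 * Returns out, or NULL if the text would not fit.
 */
const char *hex_insn(char *out, size_t cap,
		     const unsigned char *buf, size_t len)
{
	size_t off;
	int n;

	assert(out && cap > 0);
	n = snprintf(out, cap, "insn: ");
	if (n < 0 || (size_t)n >= cap)
		return NULL;
	off = (size_t)n;
	for (size_t i = 0; i < len; i++) {
		n = snprintf(out + off, cap - off, "%02x ", buf[i]);
		if (n < 0 || (size_t)n >= cap - off)
			return NULL;
		off += (size_t)n;
	}
	return out;
}
```

Each byte takes three characters, so a 15-byte instruction needs
6 + 45 + 1 = 52 bytes including the terminator, well within a 256-byte
buffer such as the one in struct perf_insn.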