- 21 Jun, 2017 23 commits
-
-
Ingo Molnar authored
Merge tag 'perf-core-for-mingo-4.13-20170621' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: New features: - Add support to measure SMI cost in 'perf stat' (Kan Liang) - Add support for unwinding callchains in powerpc with libdw (Paolo Bonzini) Fixes: - Fix message: cpu list option is -C not -c (Adrian Hunter) - Fix 'perf script' message: field list option is -F not -f (Adrian Hunter) - Intel PT fixes: (Adrian Hunter) o Fix missing stack clear o Ensure IP is zero when state is INTEL_PT_STATE_NO_IP o Fix last_ip usage o Ensure never to set 'last_ip' when packet 'count' is zero o Clear FUP flag on error o Fix transactions_sample_type Infrastructure changes: - Intel PT cleanups/refactorings (Adrian Hunter) o Use FUP always when scanning for an IP o Add missing __fallthrough o Remove redundant initial_skip checks o Allow decoding with branch tracing disabled o Add default config for pass-through branch enable o Add documentation for new config terms o Add decoder support for ptwrite and power event packets o Add reserved byte to CBR packet payload o Add decoder support for CBR events - Move find_process() to the only place that uses it, skimming some more fat from util.[ch] (Arnaldo Carvalho de Melo) - Do parameter validation earlier on fetch_kernel_version() (Arnaldo Carvalho de Melo) - Remove unused _ALL_SOURCE define (Arnaldo Carvalho de Melo) - Add sysfs__write_int function (Kan Liang) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Adrian Hunter authored
Fix message because field list option is -F not -f. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-20-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Fix message because cpu list option is -C not -c Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-19-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
'transactions_sample_type' is needed to correctly inject transactions samples but it was not being set. Set it from the event sample type. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-18-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
'initial_skip' is checked inside the sample synthesis functions which means it is actually being done twice for 'instructions' and 'transactions' samples. Remove the redundant checks. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-17-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add decoder support for informing the tools of changes to the core-to-bus ratio (CBR). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-16-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Future proof CBR packet decoding by passing through also the undefined 'reserved' byte in the packet payload. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-15-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add decoder support for PTWRITE, MWAIT, PWRE, PWRX and EXSTOP packets. This patch only affects the decoder, so the tools still do not select or consume the new information. That is added in subsequent patches. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-14-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Add documentation for new config terms. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-13-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Branch tracing is enabled by default, so a fake config bit called 'pt' (pass-through) was added to allow the 'branch enable' bit to have affect. Add default config 'pt,branch' which will allow users to disable branch tracing using 'branch=0' instead of having to specify 'pt,branch=0'. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-12-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
The kernel now supports the disabling of branch tracing, however the decoder assumes branch tracing is always enabled. Pass through a parameter to indicate whether branch tracing is enabled and use it to avoid cases when the decoder is expecting branch packets. There are 2 such cases. First, FUP packets which can bind to an IP even when there is no branch tracing. Secondly, the decoder will try to use branch packets to find an IP to start decoding or to recover from errors. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-11-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
perf tools uses __fallthrough. Add missing __fallthrough to a switch statement. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1495786658-18063-10-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Sometimes a FUP packet is associated with a TSX transaction and a flag is set to indicate that. Ensure that flag is cleared on any error condition because at that point the decoder can no longer assume it is correct. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1495786658-18063-9-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
The decoder will try to use branch packets to find an IP to start decoding or to recover from errors. Currently the FUP packet is used only in the case of an overflow, however there is no reason for that to be a special case. So just use FUP always when scanning for an IP. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1495786658-18063-8-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Intel PT uses IP compression based on the last IP. For decoding purposes, 'last IP' is not updated when a branch target has been suppressed, which is indicated by IPBytes == 0. IPBytes is stored in the packet 'count', so ensure never to set 'last_ip' when packet 'count' is zero. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1495786658-18063-7-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Intel PT uses IP compression based on the last IP. For decoding purposes, 'last IP' is considered to be reset to zero whenever there is a synchronization packet (PSB). The decoder wasn't doing that, and was treating the zero value to mean that there was no last IP, whereas compression can be done against the zero value. Fix by setting last_ip to zero when a PSB is received and keep track of have_last_ip. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1495786658-18063-6-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
A value of zero is used to indicate that there is no IP. Ensure the value is zero when the state is INTEL_PT_STATE_NO_IP. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1495786658-18063-5-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
The return compression stack must be cleared whenever there is a PSB. Fix one case where that was not happening. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1495786658-18063-4-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
The decoder uses its current timestamp in samples. Usually that is a timestamp that has already passed, but in some cases it is a timestamp for a branch that the decoder is walking towards, and consequently hasn't reached. Improve that situation by using the pkt_state to determine when to use the current or previous timestamp. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1495786658-18063-3-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Adrian Hunter authored
Move decoder error setting into one condition. Cc'ed to stable because later fixes depend on it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1495786658-18063-2-git-send-email-adrian.hunter@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Paolo Bonzini authored
Porting PPC to libdw only needs an architecture-specific hook to move the register state from perf to libdw. The ARM and x86 architectures already use libdw, and it is useful to have as much common code for the unwinder as possible. Mark Wielaard has contributed a frame-based unwinder to libdw, so that unwinding works even for binaries that do not have CFI information. In addition, libunwind is always preferred to libdw by the build machinery so this cannot introduce regressions on machines that have both libunwind and libdw installed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Milian Wolff <milian.wolff@kdab.com> Acked-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1496312681-20133-1-git-send-email-pbonzini@redhat.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
Implementing a new --smi-cost mode in perf stat to measure SMI cost. During the measurement, the /sys/device/cpu/freeze_on_smi will be set. The measurement can be done with one counter (unhalted core cycles), and two free running MSR counters (IA32_APERF and SMI_COUNT). In practice, the percentages of SMI core cycles should be more useful than absolute value. So the output will be the percentage of SMI core cycles and SMI#. metric_only will be set by default. SMI cycles% = (aperf - unhalted core cycles) / aperf Here is an example output. Performance counter stats for 'sudo echo ': SMI cycles% SMI# 0.1% 1 0.010858678 seconds time elapsed Users who wants to get the actual value can apply additional --no-metric-only. Signed-off-by: Kan Liang <Kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1495825538-5230-3-git-send-email-kan.liang@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kan Liang authored
Add sysfs__write_int() to ease up writing int to sysfs. New interface is: int sysfs__write_int(const char *entry, int value); Also, introducing filename__write_int() which is useful for new helpers to write sysctl values. Signed-off-by: Kan Liang <Kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Elliott <elliott@hpe.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1495825538-5230-2-git-send-email-kan.liang@intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 20 Jun, 2017 13 commits
-
-
Arnaldo Carvalho de Melo authored
Curious as to what this was for I looked at /usr/include/ and only some python headers define this, and it ends up being to enable "extensions" on some old OSes: /* Enable extensions on AIX 3, Interix */ I guess we can remove this one safely. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-omnundlxo2brs552bdl6m0j1@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
While trying to reduce util.[ch] I noticed that fetch_kernel_version() and fetch_ubuntu_kernel_version() do lots of operations only to check if they are needed, i.e. it checks if the pointer where to return the kernel version is NULL only after obtaining the kernel version from /proc/version_signature or by parsing the results from uname(). Do it earlier not to confuse people reading this code in the future :-) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-i94qwyekk4tzbu0b9ce1r1mz@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Arnaldo Carvalho de Melo authored
And make it static, nobody else uses it, if we ever need it in more places we can carve a new source file for process related methods, for now lets reduce util.{c,h} a tad more. Link: http://lkml.kernel.org/n/tip-zgb28rllvypjibw52aaz9p15@git.kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Ingo Molnar authored
Merge tag 'perf-core-for-mingo-4.13-20170719' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Allow adding and removing fields to the default 'perf script' columns, using + or - as field prefixes to do so (Andi Kleen) - Display titles in left frame in the annotate browser (Jin Yao) - Allow resolving the DSO name with 'perf script -F brstack{sym,off},dso' (Mark Santaniello) - Support function filtering in 'perf ftrace' (Namhyung Kim) - Allow specifying function call depth in 'perf ftrace' (Namhyumg Kim) Infrastructure changes: - Adopt __noreturn, __printf, __scanf, noinline, __packed and __aligned __alignment__(()) markers, to make the tools/ source code base to be more compact and look more like kernel code (Arnaldo Carvalho de Melo) - Remove unnecessary check in annotate_browser_write() (Jin Yao) - Return arch from symbol__disassemble() so that callers, such as the annotate TUI browser to use arch specific formattings, such as the upcoming instruction micro-op fusion on Intel Core (Jin Yao) - Remove superfluous check before use in the coresight code base (Kim Phillips) - Remove unused SAMPLE_SIZE defines and BTS priv array (Kim Phillips) - Error handling fix/tidy ups in 'perf config' (Taeung Song) - Avoid error in the BPF proggie built with clang in 'perf test llvm' when PROFILE_ALL_BRANCHES is set (Wang Nan) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Taeung Song authored
To simplify the code related to 'ret' variable in cmd_config(), initialize 'ret' with -1 instead of 0 and use goto to perform resource release at the end of the function, setting ret to zero just before the out_err label, as usual in the kernel sources. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1497671202-20495-1-git-send-email-treeze.taeung@gmail.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Taeung Song authored
show_spec_config() and set_config() can be called multiple times in the loop in cmd_config(). However, The error cases of them wasn't checked, so fix it. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1497671197-20450-1-git-send-email-treeze.taeung@gmail.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
The -D/--graph-depth option is to set max graph depth. The following example traces max 2-depth of page fault handler. $ sudo perf ftrace -G __do_page_fault -D 2 -- hello ... 0) | __do_page_fault() { 0) 0.063 us | down_read_trylock(); 0) 0.251 us | find_vma(); 0) 5.374 us | handle_mm_fault(); 0) 0.054 us | up_read(); 0) 7.463 us | } ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170618142302.25390-4-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
The -T/--trace-funcs and -N/--notrace-funcs options are to specify functions to enable/disable tracing dynamically. The -G/--graph-funcs and -g/--nograph-funcs options are to set filters for function graph tracer. For example, to trace fault handling functions only: $ sudo perf ftrace -T *fault hello 0) | __do_page_fault() { 0) | handle_mm_fault() { 0) 2.117 us | __handle_mm_fault(); 0) 3.627 us | } 0) 7.811 us | } 0) | __do_page_fault() { 0) | handle_mm_fault() { 0) 2.014 us | __handle_mm_fault(); 0) 2.424 us | } 0) 2.951 us | } ... To trace all functions executed in __do_page_fault: $ sudo perf ftrace -G __do_page_fault hello 2) | __do_page_fault() { 3) 0.060 us | down_read_trylock(); 3) | find_vma() { 3) 0.075 us | vmacache_find(); 3) 0.053 us | vmacache_update(); 3) 1.246 us | } 3) | handle_mm_fault() { 3) 0.063 us | __rcu_read_lock(); 3) 0.056 us | mem_cgroup_from_task(); 3) 0.057 us | __rcu_read_unlock(); 3) | __handle_mm_fault() { 3) | filemap_map_pages() { 3) 0.058 us | __rcu_read_lock(); 3) | alloc_set_pte() { ... But don't want to show details in handle_mm_fault: $ sudo perf ftrace -G __do_page_fault -g handle_mm_fault hello 3) | __do_page_fault() { 3) 0.049 us | down_read_trylock(); 3) | find_vma() { 3) 0.048 us | vmacache_find(); 3) 0.041 us | vmacache_update(); 3) 0.680 us | } 3) 0.036 us | up_read(); 3) 4.547 us | } /* __do_page_fault */ ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170618142302.25390-3-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
The 'perf ftrace' command fails to reset tracer after finishing recording like below: $ sudo perf ftrace -v hello write 'nop' to tracing/current_tracer failed: Device or resource busy ... This is because the trace_pipe file is open in pager process. Move the pager setup to before opening the file. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: kernel-team@lge.com Fixes: 58335964 ("perf ftrace: Use pager for displaying result") Link: http://lkml.kernel.org/r/20170618142302.25390-2-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Namhyung Kim authored
It'd be better for debugging to show an error message when it fails to setup ftrace for some reason. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170618142302.25390-1-namhyung@kernel.orgSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mark Santaniello authored
The idea here is to make AutoFDO easier in cloud environment with ASLR. It's easiest to show how this is useful by example. I built a small test akin to "while(1) { do_nothing(); }" where the do_nothing function is loaded from a dso: $ cat burncpu.cpp #include <dlfcn.h> int main() { void* handle = dlopen("./dso.so", RTLD_LAZY); if (!handle) return -1; typedef void (*fp)(); fp do_nothing = (fp) dlsym(handle, "do_nothing"); while(1) { do_nothing(); } } $ cat dso.cpp extern "C" void do_nothing() {} $ cat build.sh #!/bin/bash g++ -shared dso.cpp -o dso.so g++ burncpu.cpp -o burncpu -ldl I sampled the execution of this program with perf record -b. Using the existing "brstack,dso", we get absolute addresses that are affected by ASLR, and could be different on different hosts. The address does not uniquely identify a branch/target in the binary: $ perf script -F brstack,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1 0x7f967139b6aa(/tmp/burncpu/dso.so)/0x4006b1(/tmp/burncpu/exe)/P/-/-/0 Using the existing "brstacksym,dso" is a little better, because the symbol plus offset and dso name *does* uniquely identify a branch/target in the binary. Ultimately, however, AutoFDO wants a simple offset into the binary, so we'd have to undo all the work perf did to symbolize in the first place: $ perf script -F brstacksym,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1 do_nothing+0x5(/tmp/burncpu/dso.so)/main+0x44(/tmp/burncpu/exe)/P/-/-/0 With the new "brstackoff,dso" we get what we need: a simple offset into a specific dso/binary that uniquely identifies a branch/target: $ perf script -F brstackoff,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1 0x6aa(/tmp/burncpu/dso.so)/0x4006b1(/tmp/burncpu/exe)/P/-/-/0 Signed-off-by: Mark Santaniello <marksan@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170619163825.2012979-2-marksan@fb.com [ Updated documentation about 'brstackoff' using text from above ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Mark Santaniello authored
Perf script can report the dso for "addr" and "ip" fields. This adds the same support for the "brstack" and "brstacksym" fields. This can be helpful for AutoFDO: we can ignore LBR entries unless the source and target address are both in the target module we are about to build. I built a small test akin to "while(1) { do_nothing(); }" where the do_nothing function is loaded from a dso: $ cat burncpu.cpp #include <dlfcn.h> int main() { void* handle = dlopen("./dso.so", RTLD_LAZY); if (!handle) return -1; typedef void (*fp)(); fp do_nothing = (fp) dlsym(handle, "do_nothing"); while(1) { do_nothing(); } } $ cat dso.cpp extern "C" void do_nothing() {} $ cat build.sh #!/bin/bash g++ -shared dso.cpp -o dso.so g++ burncpu.cpp -o burncpu -ldl I sampled the execution with perf record -b. Using the new perf script functionality I can easily find cases where there was a transition from one dso to another: $ perf record -a -b -- sleep 5 [ perf record: Woken up 55 times to write data ] [ perf record: Captured and wrote 18.815 MB perf.data (43593 samples) ] $ perf script -F brstack,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1 0x7f967139b6aa(/tmp/burncpu/dso.so)/0x4006b1(/tmp/burncpu/exe)/P/-/-/0 $ perf script -F brstacksym,dso | sed 's/\/0 /\/0\n/g' | grep burncpu | grep dso.so | head -n 1 do_nothing+0x5(/tmp/burncpu/dso.so)/main+0x44(/tmp/burncpu/exe)/P/-/-/0 Signed-off-by: Mark Santaniello <marksan@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170619163825.2012979-1-marksan@fb.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
- 19 Jun, 2017 4 commits
-
-
Wang Nan authored
The 'if' keyword is a define that expands to complex code when CONFIG_PROFILE_ALL_BRANCHES is selected, which causes a 'perf test LLVM' failure like: $ ./perf test LLVM 35: LLVM search and compile : 35.1: Basic BPF llvm compile : Ok 35.2: kbuild searching : Ok 35.3: Compile source for BPF prologue generation: FAILED! 35.4: Compile source for BPF relocation : Skip The only affected test case is bpf-script-test-prologue.c because it uses kernel headers and has 'if' inside. This patch undefines 'if' to make it passes perf test. More detailed analysis from a message in this thread, also by Wang: The problem is caused by following relocation information: $ readelf -a ./llvmsubtest3 ... [ 5] _ftrace_branch PROGBITS 0000000000000000 00000260 00000000000000a0 0000000000000000 WA 0 0 4 ... Relocation section '.relfunc=null_lseek file->f_mode offset orig' at offset 0x490 contains 4 entries: Offset Info Type Sym. Value Sym. Name 000000000038 000b00000001 unrecognized: 1 0000000000000000 _ftrace_branch 0000000000b0 000b00000001 unrecognized: 1 0000000000000000 _ftrace_branch 000000000128 000b00000001 unrecognized: 1 0000000000000000 _ftrace_branch 0000000001c0 000b00000001 unrecognized: 1 0000000000000000 _ftrace_branch Relocation section '.rel_ftrace_branch' at offset 0x4d0 contains 8 entries: Offset Info Type Sym. Value Sym. Name 000000000000 000200000001 unrecognized: 1 0000000000000000 .L__func__.bpf_func__n 000000000008 000100000001 unrecognized: 1 0000000000000015 .L.str 000000000028 000200000001 unrecognized: 1 0000000000000000 .L__func__.bpf_func__n 000000000030 000100000001 unrecognized: 1 0000000000000015 .L.str 000000000050 000200000001 unrecognized: 1 0000000000000000 .L__func__.bpf_func__n 000000000058 000100000001 unrecognized: 1 0000000000000015 .L.str 000000000078 000200000001 unrecognized: 1 0000000000000000 .L__func__.bpf_func__n 000000000080 000100000001 unrecognized: 1 0000000000000015 .L.str ... So I think the failure is because you enabled CONFIG_PROFILE_ALL_BRANCHES. I can reproduce your buggy result by selecting CONFIG_PROFILE_ALL_BRANCHES in my kbuild: $ ./perf test LLVM 35: LLVM search and compile : 35.1: Basic BPF llvm compile : Ok 35.2: kbuild searching : Ok 35.3: Compile source for BPF prologue generation: FAILED! 35.4: Compile source for BPF relocation : Skip Simply undef CONFIG_PROFILE_ALL_BRANCHES in clang opts not working because it is introduced by "#include <uapi/linux/fs.h>", which override cmdline options. So I think the best way is to undefine 'if' inside BPF script. Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/20170620183203.2517-1-wangnan0@huawei.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Jin Yao authored
In annotate browser, we will add support to check fused instructions. While this is x86-specific feature so we need the annotate browser to know what the arch it runs on. symbol__disassemble() has figured out the arch. This patch just lets the arch return from symbol__disassemble and save the arch in annotate browser. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1497840958-4759-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kim Phillips authored
These defines were probably dragged in from sampling support in earlier patches. They can be put back when needed. Signed-off-by: Kim Phillips <kim.phillips@arm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170616112339.3fb6986e4ff33e353008244b@arm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-
Kim Phillips authored
The cs_etm_evsel variable is guaranteed to be set at this point in cs_etm_recording_options(). Signed-off-by: Kim Phillips <kim.phillips@arm.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20170615125521.80cc128dc856bc1f2e61b730@arm.comSigned-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-