1. 05 Nov, 2018 13 commits
    • perf tools: Fix undefined symbol scnprintf in libperf-jvmti.so · 6ac22262
      Gustavo Romero authored
      Currently the jvmti agent cannot be used because the function scnprintf
      is not present in the agent libperf-jvmti.so. As a result, a JVM using
      the agent to record JITed code profiling information fails when
      looking up scnprintf:
      
        java: symbol lookup error: lib/libperf-jvmti.so: undefined symbol: scnprintf
      
      This commit fixes that by reverting to snprintf, which can be looked
      up, instead of scnprintf, and by adding a proper check of the returned
      value in order to print a better error message when the jitdump file
      pathname is too long. Checking the returned value also helps to comply
      with some recent gcc versions, like gcc8, which fail the build on
      possibly truncated writes because of the -Werror=format-truncation= flag.
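
      A minimal sketch of the pattern described above, not the actual perf
      source: the helper name build_jitdump_path() and the file name format
      are assumptions made for illustration. It shows how the value returned
      by snprintf can be used to detect a pathname that did not fit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from perf): build "<dir>/jit-<pid>.dump"
 * into buf. Returns 0 on success, -1 on an output error or if the
 * pathname did not fit into the buffer.
 */
static int build_jitdump_path(char *buf, size_t size, const char *dir, int pid)
{
	int ret;

	assert(buf != NULL && dir != NULL && size > 0);

	/* snprintf returns the length it wanted to write, excluding the NUL. */
	ret = snprintf(buf, size, "%s/jit-%d.dump", dir, pid);
	if (ret < 0)
		return -1;		/* output error */
	if ((size_t)ret >= size) {
		fprintf(stderr, "jitdump pathname too long (%d bytes needed, %zu available)\n",
			ret + 1, size);
		return -1;		/* truncated: buf must not be used */
	}
	return 0;
}
```

      Because the return value is consumed, the caller can report a clear
      error instead of silently opening a truncated path.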

      Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      LPU-Reference: 1541117601-18937-2-git-send-email-gromero@linux.vnet.ibm.com
      Link: https://lkml.kernel.org/n/tip-mvpxxxy7wnzaj74cq75muw3f@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf beauty: Use SRCARCH, ARCH=x86_64 must map to "x86" to find the headers · e2c39f36
      Arnaldo Carvalho de Melo authored
      Guenter reported that using ARCH=x86_64 to build perf has regressed:
      
        $ make -C tools/perf O=/tmp/build/perf ARCH=x86_64
        make: Entering directory '/home/acme/git/perf/tools/perf'
          BUILD:   Doing 'make -j4' parallel build
          HOSTCC   /tmp/build/perf/fixdep.o
          HOSTLD   /tmp/build/perf/fixdep-in.o
          LINK     /tmp/build/perf/fixdep
      
        Auto-detecting system features:
        ...                         dwarf: [ on  ]
        <SNIP>
        ...                           bpf: [ on  ]
      
          GEN      /tmp/build/perf/common-cmds.h
        make[2]: *** No rule to make target '/home/acme/git/perf/tools/arch/x86_64/include/uapi/asm//mman.h', needed by '/tmp/build/perf/trace/beauty/generated/mmap_flags_array.c'.  Stop.
        make[2]: *** Waiting for unfinished jobs....
          PERF_VERSION = 4.19.gf6c23e3b
        make[1]: *** [Makefile.perf:207: sub-make] Error 2
        make: *** [Makefile:70: all] Error 2
        make: Leaving directory '/home/acme/git/perf/tools/perf'
        $
      
      This is because we must use $(SRCARCH) where we were using $(ARCH), so
      that, just like the top level Makefile, we get this done:
      
        # Additional ARCH settings for x86
        ifeq ($(ARCH),i386)
                SRCARCH := x86
        endif
        ifeq ($(ARCH),x86_64)
                SRCARCH := x86
        endif
      
      Which is done in tools/scripts/Makefile.arch, so switch to use
      $(SRCARCH).

      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: fbd7458d ("perf beauty: Wire up the mmap flags table generator to the Makefile")
      Link: https://lkml.kernel.org/r/20181105184612.GD7077@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Add MTC and CYC timestamps to debug log · f6c23e3b
      Adrian Hunter authored
      One cause of decoding errors is un-synchronized side-band data.
      Timestamps are needed to debug such cases. TSC packet timestamps are
      already logged; log MTC and CYC timestamps as well.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/r/20181105073505.8129-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Add more event information to debug log · 93f8be27
      Adrian Hunter authored
      More event information is useful for debugging, especially MMAP events.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/r/20181105073505.8129-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf scripts python: exported-sql-viewer.py: Fix table find when table re-ordered · 35fa1cee
      Adrian Hunter authored
      Table rows can be re-ordered by selecting a column to sort by. After
      re-ordering, the "find" operation highlighted the wrong row. Fix
      it.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20181104151238.15947-5-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf scripts python: exported-sql-viewer.py: Add help window · 65b24292
      Adrian Hunter authored
      Add a window to display help. It is also possible to display the help
      only, by using the option "--help-only" instead of a database name.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20181104151238.15947-4-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf scripts python: exported-sql-viewer.py: Add Selected branches report · 210cf1f9
      Adrian Hunter authored
      Fetching data from the database can be slow. Add a report that provides
      the ability to select a subset of branches.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20181104151238.15947-3-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf scripts python: exported-sql-viewer.py: Fall back to /usr/local/lib/libxed.so · 5ed4419d
      Adrian Hunter authored
      Fall back to /usr/local/lib/libxed.so to cater for distributions that do
      not have /usr/local/lib in the library path by default.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20181104151238.15947-2-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf top: Display the LBR stats in callchain entry · 590ac60d
      Jin Yao authored
      'perf report' already supports displaying LBR stats (such as cycles,
      predicted%) in callchain entries.
      
      For example:
      
        $ perf report --branch-history --stdio
      
        --1.01%--intel_idle mwait.h:29
                  intel_idle cpufeature.h:164 (cycles:5)
                  intel_idle cpufeature.h:164 (predicted:76.4%)
                  intel_idle mwait.h:102 (cycles:41)
                  intel_idle current.h:15
      
      'perf top', however, does not support that.
      
      For example:
      
        $ perf top -a -b --call-graph branch
      
        -   13.86%     0.23%  [kernel]		[k] __x86_indirect_thunk_rax
           - 13.65% __x86_indirect_thunk_rax
              + 1.69% do_syscall_64
              + 1.68% do_select
              + 1.41% ktime_get
              + 0.70% __schedule
              + 0.62% do_sys_poll
                0.58% __x86_indirect_thunk_rax
      
      Actually it's very easy to enable this feature in 'perf top'.
      
      With this patch, the result is:
      
        $ perf top -a -b --call-graph branch
      
        -   13.58%     0.00%  [kernel]		[k] __x86_indirect_thunk_rax
           - 13.57% __x86_indirect_thunk_rax (predicted:93.9%)
              + 1.78% do_select (cycles:2)
              + 1.68% perf_pmu_disable.part.99 (cycles:1)
              + 1.45% ___sys_recvmsg (cycles:25)
              + 0.81% unix_stream_sendmsg (cycles:18)
              + 0.80% ktime_get (cycles:400)
                0.58% pick_next_task_fair (cycles:47)
              + 0.56% i915_request_retire (cycles:2)
              + 0.52% do_sys_poll (cycles:4)

      Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1540983995-20462-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf stat: Handle different PMU names with common prefix · ea1fa48c
      Thomas Richter authored
      On s390 the CPU Measurement Facility for counters now supports
      2 PMUs named cpum_cf (CPU Measurement Facility for counters) and
      cpum_cf_diag (CPU Measurement Facility for diagnostic counters)
      for one and the same CPU.
      
      Running command
      
       [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
      	 -- ~/mytests/cf-tx-events 1
      
       Measuring transactions
       TX_C_TABORT_NO_SPECIAL: 0 expected:0
       TX_C_TABORT_SPECIAL: 0 expected:0
       TX_C_TEND: 1 expected:1
       TX_NC_TABORT: 11 expected:11
       TX_NC_TEND: 1 expected:1
      
       Performance counter stats for '/root/mytests/cf-tx-events 1':
      
        2      tx_c_tend
      
            0.002120091 seconds time elapsed
      
            0.000121000 seconds user
            0.002127000 seconds sys
      
       [root@s35lp76 perf]#
      
      displays output which is unexpected (and wrong):
      
        2      tx_c_tend
      
      The test program definitely triggers only one transaction, as shown
      in line 'TX_C_TEND: 1 expected:1'.
      
      This is caused by the following call sequence:
      
      pmu_lookup() scans and installs a PMU.
      +--> pmu_aliases() parses all aliases in directory
      		.../<pmu-name>/events/* which are file names.
           +--> pmu_aliases_parse() reads each file in the directory and
                            creates a new alias entry. This is done with
                +--> perf_pmu__new_alias() and
                     +--> __perf_pmu__new_alias() which also checks for
                                 identical alias names.
      
      After pmu_aliases() returns, a complete list of event names
      for this pmu has been created. Now function
      
      pmu_add_cpu_aliases()   is called to add the events listed in the json
      |                       files to the alias list of the cpu.
      +--> perf_pmu__find_map()  Returns a pointer to the json events.
      
      Now function pmu_add_cpu_aliases() scans through all events listed
      in the JSON files for this CPU.
      Each json event pmu name is compared with the current PMU being
      built up and, if they match, the json event is added to the
      current PMU's alias list.
      To avoid duplicate entries the following comparison is done:

              if (!is_arm_pmu_core(name)) {
                      pname = pe->pmu ? pe->pmu : "cpu";
                      if (strncmp(pname, name, strlen(pname)))
                              continue;
              }
      
      The culprit is the strncmp() function.
      
      Using current s390 PMU naming, the first PMU is 'cpum_cf'
      and a long list of events is added, among them 'tx_c_tend'
      
      When the second PMU named 'cpum_cf_diag' is added, only one event
      named 'CF_DIAG' is added by the pmu_aliases()  function.
      
      Now function pmu_add_cpu_aliases() is invoked for PMU 'cpum_cf_diag'.
      Since the CPUID string is the same for both PMUs, json file events
      for PMU named 'cpum_cf' are added to the PMU 'cpum_cf_diag'.
      
      This happens because the strncmp() actually compares:

           strncmp("cpum_cf", "cpum_cf_diag", 7);

      The first parameter is the pmu name taken from the event in
      the json file. The second parameter is the pmu name of the PMU
      currently being built.
      They are different, but the compare length covers only the
      common prefix, so strncmp() returns 0 (a match) when the names
      should be treated as different.
      
      Now all events for PMU cpum_cf are added to the alias list for pmu
      cpum_cf_diag.
      
      Later on in function parse_events_add_pmu() the event 'tx_c_tend' is
      searched for in all available PMUs and found twice, adding it two
      times to the evsel_list global variable, which is the root
      of all events. This results in a counter value of 2 instead
      of 1.
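
      The prefix problem can be reproduced in isolation. The sketch below is
      illustrative only: the helper names are hypothetical, and using a
      full-string strcmp() is shown as one way to avoid the false match,
      without claiming it is exactly what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Prefix-only comparison, as in the snippet quoted in the commit message:
 * reports a match whenever pmu_name merely starts with event_pmu.
 */
static bool pmu_matches_prefix(const char *event_pmu, const char *pmu_name)
{
	assert(event_pmu != NULL && pmu_name != NULL);
	return strncmp(event_pmu, pmu_name, strlen(event_pmu)) == 0;
}

/* Full-string comparison: reports a match only for identical names. */
static bool pmu_matches_exact(const char *event_pmu, const char *pmu_name)
{
	assert(event_pmu != NULL && pmu_name != NULL);
	return strcmp(event_pmu, pmu_name) == 0;
}
```

      With these helpers, pmu_matches_prefix("cpum_cf", "cpum_cf_diag") is
      true while pmu_matches_exact("cpum_cf", "cpum_cf_diag") is false, which
      is the difference between the event being aliased twice and once.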
      
      Output with this patch:
      
       [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
      			-- ~/mytests/cf-tx-events 1
       Measuring transactions
       TX_C_TABORT_NO_SPECIAL: 0 expected:0
       TX_C_TABORT_SPECIAL: 0 expected:0
       TX_C_TEND: 1 expected:1
       TX_NC_TABORT: 11 expected:11
       TX_NC_TEND: 1 expected:1
      
       Performance counter stats for '/root/mytests/cf-tx-events 1':
      
                        1      tx_c_tend
      
            0.001815365 seconds time elapsed
      
            0.000123000 seconds user
            0.001756000 seconds sys
      
       [root@s35lp76 perf]#

      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Reviewed-by: Sebastien Boisvert <sboisvert@gydle.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: stable@vger.kernel.org
      Fixes: 292c34c1 ("perf pmu: Fix core PMU alias list for X86 platform")
      Link: http://lkml.kernel.org/r/20181023151616.78193-1-tmricht@linux.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Support weak groups · cf99ad14
      Andi Kleen authored
      Implement a weak group fallback for 'perf record', similar to the
      existing 'perf stat' support. This allows using groups that contain
      more events than there are available counters without failing.
      
      Before:
      
        $ perf record  -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}' -a sleep 1
        Error:
        The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
        /bin/dmesg | grep -i perf may provide additional information.
      
      After:
      
        $ ./perf record  -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}:W' -a sleep 1
        WARNING: No sample_id_all support, falling back to unordered processing
        [ perf record: Woken up 3 times to write data ]
        [ perf record: Captured and wrote 8.136 MB perf.data (134069 samples) ]
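
      The fallback idea can be modelled without any kernel interaction. This
      is a toy sketch under stated assumptions: try_open_group() is a
      hypothetical stand-in for the real event-opening path, and the counter
      count is just a parameter, not something queried from hardware.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for opening a group: succeeds only if every event
 * in the group can get a counter at the same time.
 */
static bool try_open_group(int nr_events, int nr_counters)
{
	return nr_events <= nr_counters;
}

/*
 * Returns how many independently scheduled units were opened: 1 if the
 * whole group was opened together, nr_events if a weak group was broken
 * up into separate events, -1 if a strict group did not fit.
 */
static int open_event_group(int nr_events, int nr_counters, bool weak)
{
	assert(nr_events > 0 && nr_counters > 0);

	if (try_open_group(nr_events, nr_counters))
		return 1;
	if (!weak)
		return -1;	/* strict group: report the failure */
	/* Weak group (":W"): drop the grouping, open each event on its own. */
	return nr_events;
}
```

      In this model a seven-event group on a machine with four counters
      fails when strict and falls back to seven ungrouped events when weak.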

      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181001195927.14211-2-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evlist: Move perf_evsel__reset_weak_group into evlist · c3537fc2
      Andi Kleen authored
      - Move the function from builtin-stat to evlist for reuse
      - Rename to evlist to match purpose better
      - Pass the evlist as first argument.
      - No functional changes

      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181001195927.14211-1-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf augmented_syscalls: Start collecting pathnames in the BPF program · 79ef68c7
      Arnaldo Carvalho de Melo authored
      This is the start of having the raw_syscalls:sys_enter BPF handler
      collect pointer arguments, namely pathnames, starting with two syscalls
      that have that pointer in different arguments: "open" has it as its
      first argument, "openat" as its second.

      With this in place the existing beautifiers in 'perf trace' work: those
      args are shown instead of just the pointer that comes with the syscalls
      tracepoints.
      
      This also serves to show and document pitfalls in the process of using
      just that place in the kernel (raw_syscalls:sys_enter) plus tables
      provided by userspace to collect syscall pointer arguments.
      
      One is the need to use a barrier, as suggested by Edward, to avoid clang
      optimizations that make the kernel BPF verifier refuse to load our
      pointer contents collector.
      
      The end result should be a generic eBPF program that works in all
      architectures, with the differences amongst archs resolved by the
      userspace component, 'perf trace', that should get all its tables
      created automatically from the kernel components where they are defined,
      via string table constructors for things not expressed in BTF/DWARF
      (enums, structs, etc), and otherwise using those observability files
      (BTF).
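
      As an illustration of the kind of userspace-provided table mentioned
      above, here is a plain C sketch (not BPF code and not the perf source)
      that records which argument carries the pathname for the two syscalls
      named in this commit. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: which argument (0-based) holds the pathname. */
struct path_arg {
	const char *syscall;
	int	    arg_idx;
};

static const struct path_arg path_args[] = {
	{ "open",   0 },	/* open(pathname, flags, mode) */
	{ "openat", 1 },	/* openat(dfd, pathname, flags, mode) */
};

/* Returns the pathname argument index, or -1 if the syscall is not listed. */
static int pathname_arg_idx(const char *syscall)
{
	size_t i;

	assert(syscall != NULL);
	for (i = 0; i < sizeof(path_args) / sizeof(path_args[0]); i++)
		if (strcmp(path_args[i].syscall, syscall) == 0)
			return path_args[i].arg_idx;
	return -1;
}
```

      A collector driven by such a table only needs the index to know which
      of the six raw arguments to dereference for a given syscall.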
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Edward Cree <ecree@solarflare.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lkml.kernel.org/n/tip-37dz54pmotgpnwg9tb6zuk9j@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 03 Nov, 2018 1 commit
    • perf trace: Fix setting of augmented payload when using eBPF + raw_syscalls · cd26ea6d
      Arnaldo Carvalho de Melo authored
      For now with BPF raw_augmented we hook into raw_syscalls:sys_enter and
      there we get all 6 syscall args plus the tracepoint common fields
      (sizeof(long)) and the syscall_nr (another long). So we check if that is
      the case and, if so, look for the augmented payload not after
      sc->args_size but always after the full raw_syscalls:sys_enter payload,
      whose size is fixed.
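
      The fixed payload size follows directly from the fields listed above.
      The layout below is only a sketch: the struct, its field names, and the
      augmented_payload() helper are assumptions for illustration, not the
      definitions used by 'perf trace'.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SYSCALL_ARGS 6

/* Illustrative layout only; field names are assumptions. */
struct raw_sys_enter_payload {
	unsigned long common;			/* tracepoint common fields */
	long	      syscall_nr;
	unsigned long args[NR_SYSCALL_ARGS];
};

static size_t raw_sys_enter_payload_size(void)
{
	return sizeof(struct raw_sys_enter_payload);
}

/*
 * Hypothetical helper: returns where the augmented data would start inside
 * a raw sample, or NULL if nothing was appended after the fixed payload.
 */
static const void *augmented_payload(const void *raw_data, size_t raw_size)
{
	assert(raw_data != NULL);
	if (raw_size <= raw_sys_enter_payload_size())
		return NULL;
	return (const char *)raw_data + raw_sys_enter_payload_size();
}
```

      Since every member is long-sized, the fixed part is eight longs, and
      anything beyond that offset is treated as augmented data.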
      
      We'll revisit this later to pass sc->args_size to the BPF augmenter (now
      tools/perf/examples/bpf/augmented_raw_syscalls.c), so that it copies only
      what we need for each syscall, like what happens when we use
      syscalls:sys_enter_NAME, reducing the kernel/userspace traffic
      to just what is needed for each syscall.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-nlslrg8apxdsobt4pwl3n7ur@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 01 Nov, 2018 3 commits
    • perf trace: When augmenting raw_syscalls plug raw_syscalls:sys_exit too · 3c5e3dab
      Arnaldo Carvalho de Melo authored
      With just this commit we get to support all syscalls via hooking
      raw_syscalls:sys_{enter,exit} to the trace__sys_{enter,exit} routines
      to combine, strace-like, those tracepoints.
      
        # trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
               ? (         ): sleep/31680  ... [continued]: execve()) = 0
           0.043 ( 0.004 ms): sleep/31680 brk() = 0x55652a851000
           0.070 ( 0.009 ms): sleep/31680 access(filename:, mode: R) = -1 ENOENT No such file or directory
           0.087 ( 0.006 ms): sleep/31680 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
           0.096 ( 0.003 ms): sleep/31680 fstat(fd: 3, statbuf: 0x7ffc5269e190) = 0
           0.101 ( 0.005 ms): sleep/31680 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) = 0x7f709c239000
           0.109 ( 0.002 ms): sleep/31680 close(fd: 3) = 0
           0.126 ( 0.006 ms): sleep/31680 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
           0.135 ( 0.003 ms): sleep/31680 read(fd: 3, buf: 0x7ffc5269e358, count: 832) = 832
           0.141 ( 0.002 ms): sleep/31680 fstat(fd: 3, statbuf: 0x7ffc5269e1f0) = 0
           0.146 ( 0.005 ms): sleep/31680 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7f709c237000
           0.159 ( 0.007 ms): sleep/31680 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7f709bc79000
           0.168 ( 0.009 ms): sleep/31680 mprotect(start: 0x7f709be26000, len: 2093056) = 0
           0.179 ( 0.010 ms): sleep/31680 mmap(addr: 0x7f709c025000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) = 0x7f709c025000
           0.196 ( 0.005 ms): sleep/31680 mmap(addr: 0x7f709c02b000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7f709c02b000
           0.210 ( 0.002 ms): sleep/31680 close(fd: 3) = 0
           0.230 ( 0.002 ms): sleep/31680 arch_prctl(option: 4098, arg2: 140121632638208) = 0
           0.306 ( 0.009 ms): sleep/31680 mprotect(start: 0x7f709c025000, len: 16384, prot: READ) = 0
           0.338 ( 0.005 ms): sleep/31680 mprotect(start: 0x556529607000, len: 4096, prot: READ) = 0
           0.348 ( 0.005 ms): sleep/31680 mprotect(start: 0x7f709c253000, len: 4096, prot: READ) = 0
           0.356 ( 0.019 ms): sleep/31680 munmap(addr: 0x7f709c239000, len: 103334) = 0
           0.463 ( 0.002 ms): sleep/31680 brk() = 0x55652a851000
           0.468 ( 0.004 ms): sleep/31680 brk(brk: 0x55652a872000) = 0x55652a872000
           0.474 ( 0.002 ms): sleep/31680 brk() = 0x55652a872000
           0.484 ( 0.008 ms): sleep/31680 open(filename: , flags: CLOEXEC) = 3
           0.497 ( 0.002 ms): sleep/31680 fstat(fd: 3, statbuf: 0x7f709c02aaa0) = 0
           0.501 ( 0.006 ms): sleep/31680 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f70950aa000
           0.514 ( 0.002 ms): sleep/31680 close(fd: 3) = 0
           0.554 (1000.140 ms): sleep/31680 nanosleep(rqtp: 0x7ffc5269eed0) = 0
        1000.734 ( 0.007 ms): sleep/31680 close(fd: 1) = 0
        1000.748 ( 0.004 ms): sleep/31680 close(fd: 2) = 0
        1000.769 (         ): sleep/31680 exit_group()
        #
      
      The next step is to allow selecting which syscalls should be traced, using a map.
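
      The strace-like combination of enter and exit events can be sketched as
      a small per-thread state machine. Everything here is hypothetical
      (struct, function names, output format); 'perf trace' keeps far richer
      state, so treat this only as an outline of the pairing idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-thread state for one in-flight syscall. */
struct pending_syscall {
	bool   entered;
	long   nr;
	double enter_ms;
};

/* Remember the syscall number and timestamp seen at sys_enter. */
static void on_sys_enter(struct pending_syscall *p, long nr, double now_ms)
{
	assert(p != NULL);
	p->entered  = true;
	p->nr	    = nr;
	p->enter_ms = now_ms;
}

/*
 * Formats one combined line into buf. Returns false if no matching enter
 * was seen (for example when tracing started mid-syscall) or if the line
 * did not fit into buf.
 */
static bool on_sys_exit(struct pending_syscall *p, long nr, long ret,
			double now_ms, char *buf, size_t size)
{
	int n;

	assert(p != NULL && buf != NULL && size > 0);
	if (!p->entered || p->nr != nr)
		return false;
	n = snprintf(buf, size, "%.3f (%.3f ms): NR %ld = %ld",
		     p->enter_ms, now_ms - p->enter_ms, nr, ret);
	p->entered = false;
	return n >= 0 && (size_t)n < size;
}
```

      An exit without a recorded enter corresponds to the "[continued]" first
      line in the output above, where only the return value is known.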
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-votqqmqhag8e1i9mgyzfez3o@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf examples bpf: Start augmenting raw_syscalls:sys_{start,exit} · febf8a37
      Arnaldo Carvalho de Melo authored
      The previous approach of attaching to each syscall showed how it is
      possible to augment tracepoints and use that augmentation, pointer
      payloads, in the existing beautifiers in 'perf trace'. For a more
      general solution we will now try to augment the main
      raw_syscalls:sys_{enter,exit} tracepoints, and then pass instructions in
      maps so that the BPF program knows which syscalls, which pointer
      contents, and how many bytes for each of the arguments should be copied.
      
      Start with just the bare minimum to collect what is provided by those
      two tracepoints via the __augmented_syscalls__ map + bpf-output perf
      event, which results in perf trace showing them without connecting
      enter+exit:
      
        # perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
           0.000 sleep/11563 raw_syscalls:sys_exit:NR 59 = 0
           0.019 (         ): sleep/11563 brk() ...
           0.021 sleep/11563 raw_syscalls:sys_exit:NR 12 = 94682642325504
           0.033 (         ): sleep/11563 access(filename:, mode: R) ...
           0.037 sleep/11563 raw_syscalls:sys_exit:NR 21 = -2
           0.041 (         ): sleep/11563 openat(dfd: CWD, filename: , flags: CLOEXEC) ...
           0.044 sleep/11563 raw_syscalls:sys_exit:NR 257 = 3
           0.045 (         ): sleep/11563 fstat(fd: 3, statbuf: 0x7ffdbf7119b0) ...
           0.046 sleep/11563 raw_syscalls:sys_exit:NR 5 = 0
           0.047 (         ): sleep/11563 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) ...
           0.049 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196285493248
           0.050 (         ): sleep/11563 close(fd: 3) ...
           0.051 sleep/11563 raw_syscalls:sys_exit:NR 3 = 0
           0.059 (         ): sleep/11563 openat(dfd: CWD, filename: , flags: CLOEXEC) ...
           0.062 sleep/11563 raw_syscalls:sys_exit:NR 257 = 3
           0.063 (         ): sleep/11563 read(fd: 3, buf: 0x7ffdbf711b78, count: 832) ...
           0.065 sleep/11563 raw_syscalls:sys_exit:NR 0 = 832
           0.066 (         ): sleep/11563 fstat(fd: 3, statbuf: 0x7ffdbf711a10) ...
           0.067 sleep/11563 raw_syscalls:sys_exit:NR 5 = 0
           0.068 (         ): sleep/11563 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) ...
           0.070 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196285485056
           0.073 (         ): sleep/11563 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) ...
           0.076 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196279463936
           0.077 (         ): sleep/11563 mprotect(start: 0x7f81fd8a8000, len: 2093056) ...
           0.083 sleep/11563 raw_syscalls:sys_exit:NR 10 = 0
           0.084 (         ): sleep/11563 mmap(addr: 0x7f81fdaa7000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) ...
           0.088 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196283314176
           0.091 (         ): sleep/11563 mmap(addr: 0x7f81fdaad000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) ...
           0.093 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196283338752
           0.097 (         ): sleep/11563 close(fd: 3) ...
           0.098 sleep/11563 raw_syscalls:sys_exit:NR 3 = 0
           0.107 (         ): sleep/11563 arch_prctl(option: 4098, arg2: 140196285490432) ...
           0.108 sleep/11563 raw_syscalls:sys_exit:NR 158 = 0
           0.143 (         ): sleep/11563 mprotect(start: 0x7f81fdaa7000, len: 16384, prot: READ) ...
           0.146 sleep/11563 raw_syscalls:sys_exit:NR 10 = 0
           0.157 (         ): sleep/11563 mprotect(start: 0x561d037e7000, len: 4096, prot: READ) ...
           0.160 sleep/11563 raw_syscalls:sys_exit:NR 10 = 0
           0.163 (         ): sleep/11563 mprotect(start: 0x7f81fdcd5000, len: 4096, prot: READ) ...
           0.165 sleep/11563 raw_syscalls:sys_exit:NR 10 = 0
           0.166 (         ): sleep/11563 munmap(addr: 0x7f81fdcbb000, len: 103334) ...
           0.174 sleep/11563 raw_syscalls:sys_exit:NR 11 = 0
           0.216 (         ): sleep/11563 brk() ...
           0.217 sleep/11563 raw_syscalls:sys_exit:NR 12 = 94682642325504
           0.217 (         ): sleep/11563 brk(brk: 0x561d05453000) ...
           0.219 sleep/11563 raw_syscalls:sys_exit:NR 12 = 94682642460672
           0.220 (         ): sleep/11563 brk() ...
           0.221 sleep/11563 raw_syscalls:sys_exit:NR 12 = 94682642460672
           0.224 (         ): sleep/11563 open(filename: , flags: CLOEXEC) ...
           0.228 sleep/11563 raw_syscalls:sys_exit:NR 2 = 3
           0.229 (         ): sleep/11563 fstat(fd: 3, statbuf: 0x7f81fdaacaa0) ...
           0.230 sleep/11563 raw_syscalls:sys_exit:NR 5 = 0
           0.231 (         ): sleep/11563 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) ...
           0.234 sleep/11563 raw_syscalls:sys_exit:NR 9 = 140196166418432
           0.237 (         ): sleep/11563 close(fd: 3) ...
           0.238 sleep/11563 raw_syscalls:sys_exit:NR 3 = 0
           0.262 (         ): sleep/11563 nanosleep(rqtp: 0x7ffdbf7126f0) ...
        1000.399 sleep/11563 raw_syscalls:sys_exit:NR 35 = 0
        1000.440 (         ): sleep/11563 close(fd: 1) ...
        1000.447 sleep/11563 raw_syscalls:sys_exit:NR 3 = 0
        1000.454 (         ): sleep/11563 close(fd: 2) ...
        1000.468 (         ): sleep/11563 exit_group(                                                           )
        #
      
      In the next csets we'll connect those events to the existing enter/exit
      raw_syscalls handlers in 'perf trace', just like we did with the
      syscalls:sys_{enter,exit}_* tracepoints.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-5nl8l4hx1tl9pqdx65nkp6pw@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools headers barrier: Fix arm64 tools build failure wrt smp_load_{acquire,release} · 51f5fd2e
      Will Deacon authored
      Cheers for reporting this. I managed to reproduce the build failure with
      gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1).
      
      The code in question is the arm64 versions of smp_load_acquire() and
      smp_store_release(). Unlike other architectures, these are not built
      around READ_ONCE() and WRITE_ONCE() since we have instructions we can
      use instead of fences. Bringing our macros up-to-date with those (i.e.
      tweaking the union initialisation and using the special "uXX_alias_t"
      types) appears to fix the issue for me.
      
      Committer notes:
      
      Testing it in the systems previously failing:
      
        # time dm android-ndk:r12b-arm \
               android-ndk:r15c-arm \
               debian:experimental-x-arm64 \
               ubuntu:14.04.4-x-linaro-arm64 \
               ubuntu:16.04-x-arm \
               ubuntu:16.04-x-arm64 \
               ubuntu:18.04-x-arm \
               ubuntu:18.04-x-arm64
          1 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
          2 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
          3 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 8.2.0-7) 8.2.0
          4 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
          5 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
          6 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
          7 ubuntu:18.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0
          8 ubuntu:18.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.3.0-27ubuntu1~18.04) 7.3.0
      Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20181031174408.GA27871@arm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      51f5fd2e
  4. 31 Oct, 2018 19 commits
    • Ingo Molnar's avatar
      Merge tag 'perf-urgent-for-mingo-4.20-20181031' of... · 29995d29
      Ingo Molnar authored
      Merge tag 'perf-urgent-for-mingo-4.20-20181031' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/urgent improvements and fixes from Arnaldo Carvalho de Melo:
      
      - Fixes dealing with the removal of the fallback to looking up samples
        marked as userspace in the kernel maps, done recently:
      
        - For intel-pt, that was setting the synthesized header misc field
          as PERF_RECORD_MISC_USER, depending thus on the fallback to take
          place, now it sets as USER or KERNEL according to x86 specific
          knowledge. Also now it inserts the PERF_CONTEXT_{USER,KERNEL} into
          the PERF_SAMPLE_CALLCHAINs it synthesizes from hw traces (Adrian Hunter)
      
        - Similar fixes for the cs-etm ARM HW trace code, that used the Intel PT
          model as a starting point (Leo Yan)
      
        - For the "caller" callchain order, where the callchain returned by the
          kernel was simply reversed without taking into account the
          PERF_CONTEXT_{USER,KERNEL,etc} markers from where to define if an entry
          was for kernel or userspace, working just because the map lookup fallback
          was in place (David S. Miller)
      
      - Allow for selecting if 'overwrite' mode should be used in 'perf top' and
        make the default for it not to be used. This is due to problems with the
        current implementation where the pausing used ends up making 'perf top'
        miss PERF_RECORD_{MMAP,FORK,EXEC,etc} events, which with short lifetime
        threads workloads leads quickly to many "unknown" maps (and thus symbols)
        to appear in the UI. Workloads with long thread lifetimes and with few
        metadata events can still use --overwrite to take advantage of the
        overwrite mode (Arnaldo Carvalho de Melo)
      
      - Start 'perf top''s display thread earlier, so that the screen doesn't
        remain blank for too long at tool start (David S. Miller)
      
      - Don't clone maps from parent when synthesizing forks, to avoid the inevitable
        flurry of overlapping maps as we process the synthesized MMAP2 events that get
        delivered shortly thereafter. (David S. Miller)
      
      - Take pgoff into account when reporting elf to libdwfl, now the unwinding
        results are the same with elfutils's libdwfl and libunwind (Milian Wolff)
      
      - Update lotsa kernel ABI headers (Arnaldo Carvalho de Melo)
      
      - 'perf trace' syscall arg beautification improvements to allow for
        handling args such as mount's 'flags', where masks have to be ignored
        before considering what is left, that, if only zeroes, is suppressed
        like other args without such masks (Arnaldo Carvalho de Melo)
      
      - Beautify mount's 'source' and 'flags' args (Arnaldo Carvalho de Melo)
      
      - Generate mmap's flags bit constants from linux/mman.h and all the
        arch specific mman.h files, so that no changes in the main 'perf trace'
        source files are required when new flags get added (Arnaldo Carvalho de Melo)
      
      - Consider syscall aliases, so that 'perf trace -e umount' works and we don't
        have to use 'umount2' (that works as well, just not required) (Arnaldo Carvalho de Melo)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      29995d29
    • Adrian Hunter's avatar
      perf intel-pt/bts: Calculate cpumode for synthesized samples · 5d4f0eda
      Adrian Hunter authored
      In the absence of a fallback, samples must provide a correct cpumode for
      the 'ip'. Do that now there is no fallback.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/20181031091043.23465-6-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5d4f0eda
    • Adrian Hunter's avatar
      perf intel-pt: Insert callchain context into synthesized callchains · 24248306
      Adrian Hunter authored
      In the absence of a fallback, callchains must also encode the callchain
      context. Do that now there is no fallback.
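
      As a rough illustration of what encoding the context means, the sketch
      below emits a marker in front of each run of kernel or user addresses.
      It is not the intel-pt code: sketch_add_context() and the kernel_start
      parameter are hypothetical, and only the two marker values mirror the
      PERF_CONTEXT_KERNEL and PERF_CONTEXT_USER constants of the perf_event
      uapi header.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Same values as PERF_CONTEXT_KERNEL / PERF_CONTEXT_USER. */
      #define SKETCH_CONTEXT_KERNEL ((uint64_t)-128)
      #define SKETCH_CONTEXT_USER   ((uint64_t)-512)

      /*
       * Copy 'ips' into 'out', emitting a context marker before the first
       * entry and whenever the kernel/user side changes. 'out' needs room
       * for up to 2 * nr entries. Returns the number of entries written.
       */
      size_t sketch_add_context(const uint64_t *ips, size_t nr,
      			  uint64_t kernel_start, uint64_t *out)
      {
      	uint64_t last = 0;
      	size_t n = 0;

      	for (size_t i = 0; i < nr; i++) {
      		uint64_t context = ips[i] >= kernel_start ?
      				   SKETCH_CONTEXT_KERNEL : SKETCH_CONTEXT_USER;

      		if (context != last) {
      			out[n++] = context;
      			last = context;
      		}
      		out[n++] = ips[i];
      	}
      	return n;
      }
      ```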
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/100ea2ec-ed14-b56d-d810-e0a6d2f4b069@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      24248306
    • David Miller's avatar
      perf tools: Don't clone maps from parent when synthesizing forks · 4f8f382e
      David Miller authored
      When synthesizing FORK events, we are trying to create thread objects
      for the already running tasks on the machine.
      
      Normally, for a kernel FORK event, we want to clone the parent's maps
      because that is what the kernel just did.
      
      But when synthesizing, this should not be done.  If we do, we end up
      with overlapping maps as we process the synthesized MMAP2 events that
      get delivered shortly thereafter.
      
      Use the FORK event misc flags in an internal way to signal this
      situation, so we can elide the map clone when appropriate.
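
      A minimal sketch of the idea, assuming a tool-private flag bit: the
      structures, the SKETCH_MISC_FORK_SYNTHESIZED value and
      sketch_process_fork() are hypothetical stand-ins, not the names or the
      bit that perf actually uses.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical flag bit, set by the tool only on synthesized FORKs. */
      #define SKETCH_MISC_FORK_SYNTHESIZED (1u << 13)

      struct sketch_fork_event {
      	uint16_t misc; /* header misc flags */
      	int pid, ppid;
      };

      struct sketch_thread {
      	int pid;
      	int nr_maps;   /* stand-in for a real collection of maps */
      };

      /* Clone the parent's maps only when the kernel really forked the task. */
      void sketch_process_fork(const struct sketch_fork_event *ev,
      			 const struct sketch_thread *parent,
      			 struct sketch_thread *child)
      {
      	bool do_maps_clone = !(ev->misc & SKETCH_MISC_FORK_SYNTHESIZED);

      	child->pid = ev->pid;
      	child->nr_maps = do_maps_clone ? parent->nr_maps : 0;
      }
      ```

      A synthesized child then starts with no maps and gets them from the
      MMAP2 events that follow.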
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joe Mario <jmario@redhat.com>
      Link: http://lkml.kernel.org/r/20181030.222404.2085088822877051075.davem@davemloft.net
      [ Added comment about flag use in machine__process_fork_event(),
        use ternary op in thread__clone_map_groups() as suggested by Jiri ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4f8f382e
    • David Miller's avatar
      perf top: Start display thread earlier · ff27a06a
      David Miller authored
      If events are coming in at a rate such that the event processing thread
      can barely keep up, our initial run of the event ring will almost never
      terminate and this delays the starting of the display thread.
      
      The screen basically stays black until the event thread can get out of
      its endless loop.
      
      Therefore, start the display thread before we start processing the ring
      buffer.
      
      This also makes sure that we always have the user requested real time
      setting engaged when processing the ring.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20181030.223003.2242527041807905962.davem@davemloft.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ff27a06a
    • Arnaldo Carvalho de Melo's avatar
      tools headers uapi: Update linux/if_link.h header copy · 76b0b801
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        9163a0fc ("net: bridge: add support for per-port vlan stats")
      
      And silence this build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h'
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Leblond <eric@regit.org>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Link: https://lkml.kernel.org/n/tip-7p53ghippywz7fqkwo3nkzet@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      76b0b801
    • Arnaldo Carvalho de Melo's avatar
      tools headers uapi: Update linux/netlink.h header copy · d45a57ff
      Arnaldo Carvalho de Melo authored
      Picking the changes from:
      
        89d35528 ("netlink: Add new socket option to enable strict checking on dumps")
      
      To silence this build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/netlink.h' differs from latest version at 'include/uapi/linux/netlink.h'
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Leblond <eric@regit.org>
      Link: https://lkml.kernel.org/n/tip-1xymkfjpmhxfzrs46t8z8mjw@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d45a57ff
    • Arnaldo Carvalho de Melo's avatar
      tools headers: Sync the various kvm.h header copies · 82775812
      Arnaldo Carvalho de Melo authored
      For powerpc, s390, x86 and the main uapi linux/kvm.h header, none of
      them entail changes in tooling.
      
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-avn7iy8f4tcm2y40sbsdk31m@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      82775812
    • Arnaldo Carvalho de Melo's avatar
      tools include uapi: Update linux/mmap.h copy · 685626dc
      Arnaldo Carvalho de Melo authored
      To pick up the changes from:
      
        20916d46 ("mm/hugetlb: add mmap() encodings for 32MB and 512MB page sizes")
      
      They do not entail changes in tools; this just shows that we have to
      consider bits [26:31] of flags to beautify them in tools like 'perf
      trace'.
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
        diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
      
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-3rvc39lon93kgt5pl31d8g4x@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      685626dc
    • Arnaldo Carvalho de Melo's avatar
      perf trace beauty: Use the mmap flags table generated from headers · 2f967f1d
      Arnaldo Carvalho de Melo authored
      Instead of requiring us to edit the sources whenever a new flag is added.
      
        # perf trace -e *mmap sleep 0.1
           0.025 ( 0.005 ms): sleep/29876 mmap(len: 163746, prot: READ, flags: PRIVATE, fd: 3) = 0x7faa68ad1000
           0.059 ( 0.004 ms): sleep/29876 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7faa68acf000
           0.069 ( 0.006 ms): sleep/29876 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7faa6851f000
           0.086 ( 0.009 ms): sleep/29876 mmap(addr: 0x7faa688cb000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) = 0x7faa688cb000
           0.101 ( 0.005 ms): sleep/29876 mmap(addr: 0x7faa688d1000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7faa688d1000
           0.348 ( 0.005 ms): sleep/29876 mmap(len: 111950656, prot: READ, flags: PRIVATE, fd: 3) = 0x7faa61a5b000
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ggmoy6vxoygh5yim890ht0kf@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      2f967f1d
    • Arnaldo Carvalho de Melo's avatar
      perf beauty: Wire up the mmap flags table generator to the Makefile · fbd7458d
      Arnaldo Carvalho de Melo authored
      Now when we run 'make -C tools/perf O=/tmp/build/perf' we end up with:
      
        $ cat /tmp/build/perf/trace/beauty/generated/mmap_flags_array.c
        static const char *mmap_flags[] = {
      	[ilog2(0x40) + 1] = "32BIT",
      	[ilog2(0x01) + 1] = "SHARED",
      	[ilog2(0x02) + 1] = "PRIVATE",
      	[ilog2(0x10) + 1] = "FIXED",
      	[ilog2(0x20) + 1] = "ANONYMOUS",
      	[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
      	[ilog2(0x0100) + 1] = "GROWSDOWN",
      	[ilog2(0x0800) + 1] = "DENYWRITE",
      	[ilog2(0x1000) + 1] = "EXECUTABLE",
      	[ilog2(0x2000) + 1] = "LOCKED",
      	[ilog2(0x4000) + 1] = "NORESERVE",
      	[ilog2(0x8000) + 1] = "POPULATE",
      	[ilog2(0x10000) + 1] = "NONBLOCK",
      	[ilog2(0x20000) + 1] = "STACK",
      	[ilog2(0x40000) + 1] = "HUGETLB",
      	[ilog2(0x80000) + 1] = "SYNC",
        };
        $
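
      To show how a table of this shape can be consumed, here is a hedged
      sketch of a decoder. It copies a few entries from the listing above
      with the ilog2() values written out by hand; sketch_mmap_flags and
      sketch_format_flags() are illustrative names, not the beautifier that
      'perf trace' uses.

      ```c
      #include <stddef.h>
      #include <stdio.h>
      #include <string.h>

      /* A few entries from the table above, indexed by ilog2(value) + 1. */
      static const char *sketch_mmap_flags[] = {
      	[0 + 1] = "SHARED",     /* 0x01 */
      	[1 + 1] = "PRIVATE",    /* 0x02 */
      	[4 + 1] = "FIXED",      /* 0x10 */
      	[5 + 1] = "ANONYMOUS",  /* 0x20 */
      	[11 + 1] = "DENYWRITE", /* 0x0800 */
      };

      #define SKETCH_NR_FLAGS \
      	(sizeof(sketch_mmap_flags) / sizeof(sketch_mmap_flags[0]))

      /* Format 'flags' as NAME|NAME|..., printing unknown bits in hex. */
      size_t sketch_format_flags(unsigned long flags, char *buf, size_t size)
      {
      	size_t len = 0;

      	if (size == 0)
      		return 0;
      	buf[0] = '\0';

      	for (unsigned int bit = 0; bit < 8 * sizeof(flags); bit++) {
      		unsigned long mask = 1UL << bit;
      		const char *name = NULL;
      		int n;

      		if (!(flags & mask))
      			continue;
      		if (bit + 1 < SKETCH_NR_FLAGS)
      			name = sketch_mmap_flags[bit + 1];

      		if (name)
      			n = snprintf(buf + len, size - len, "%s%s",
      				     len ? "|" : "", name);
      		else
      			n = snprintf(buf + len, size - len, "%s%#lx",
      				     len ? "|" : "", mask);

      		if (n < 0 || (size_t)n >= size - len)
      			break; /* truncated: stop, buffer stays NUL-terminated */
      		len += (size_t)n;
      	}
      	return strlen(buf);
      }
      ```

      The "+ 1" in the index leaves slot 0 free, so a lookup never confuses
      bit 0 with an empty table entry.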
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-t3fn7u3tjsupio6e6vkufx9m@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      fbd7458d
    • Arnaldo Carvalho de Melo's avatar
      perf beauty: Add a generator for MAP_ mmap's flag constants · 80ee5668
      Arnaldo Carvalho de Melo authored
      It'll use tools/{arch}/*,include copies of mman.h to generate a table to
      be used by tools, initially by the 'mmap' beautifiers in 'perf trace',
      but that could also be used to translate from a string constant to the
      integer value to be used in an eBPF or tracefs tracepoint filter.
      
      Tested for all archs using:
      
      $ for arch in `ls tools/arch/` ; \
      	do echo $arch ; tools/perf/trace/beauty/mmap_flags.sh $arch ; \
         done | less
      
      Example for alpha, an oddball, doesn't include any header, defines all
      its stuff:
      
        $ tools/perf/trace/beauty/mmap_flags.sh alpha
        static const char *mmap_flags[] = {
      	[ilog2(0x10) + 1] = "ANONYMOUS",
      	[ilog2(0x02000) + 1] = "DENYWRITE",
      	[ilog2(0x04000) + 1] = "EXECUTABLE",
      	[ilog2(0x100) + 1] = "FIXED",
      	[ilog2(0x01000) + 1] = "GROWSDOWN",
      	[ilog2(0x100000) + 1] = "HUGETLB",
      	[ilog2(0x08000) + 1] = "LOCKED",
      	[ilog2(0x40000) + 1] = "NONBLOCK",
      	[ilog2(0x10000) + 1] = "NORESERVE",
      	[ilog2(0x20000) + 1] = "POPULATE",
      	[ilog2(0x02) + 1] = "PRIVATE",
      	[ilog2(0x01) + 1] = "SHARED",
      	[ilog2(0x80000) + 1] = "STACK",
        };
        $
      
      Common case, my workstation, defines one entry (MAP_32BIT), then
      includes mman.h, which gets it to include mman-common.h too:
      
        $ tools/perf/trace/beauty/mmap_flags.sh
        static const char *mmap_flags[] = {
      	[ilog2(0x40) + 1] = "32BIT",
      	[ilog2(0x01) + 1] = "SHARED",
      	[ilog2(0x02) + 1] = "PRIVATE",
      	[ilog2(0x10) + 1] = "FIXED",
      	[ilog2(0x20) + 1] = "ANONYMOUS",
      	[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
      	[ilog2(0x0100) + 1] = "GROWSDOWN",
      	[ilog2(0x0800) + 1] = "DENYWRITE",
      	[ilog2(0x1000) + 1] = "EXECUTABLE",
      	[ilog2(0x2000) + 1] = "LOCKED",
      	[ilog2(0x4000) + 1] = "NORESERVE",
      	[ilog2(0x8000) + 1] = "POPULATE",
      	[ilog2(0x10000) + 1] = "NONBLOCK",
      	[ilog2(0x20000) + 1] = "STACK",
      	[ilog2(0x40000) + 1] = "HUGETLB",
      	[ilog2(0x80000) + 1] = "SYNC",
        };
        $ uname -m
        x86_64
        $
      
      Sparc, that defines a bunch then includes just mman-common.h:
      
        $ tools/perf/trace/beauty/mmap_flags.sh sparc
        static const char *mmap_flags[] = {
      	[ilog2(0x0800) + 1] = "DENYWRITE",
      	[ilog2(0x1000) + 1] = "EXECUTABLE",
      	[ilog2(0x0200) + 1] = "GROWSDOWN",
      	[ilog2(0x40000) + 1] = "HUGETLB",
      	[ilog2(0x100) + 1] = "LOCKED",
      	[ilog2(0x10000) + 1] = "NONBLOCK",
      	[ilog2(0x40) + 1] = "NORESERVE",
      	[ilog2(0x8000) + 1] = "POPULATE",
      	[ilog2(0x20000) + 1] = "STACK",
      	[ilog2(0x01) + 1] = "SHARED",
      	[ilog2(0x02) + 1] = "PRIVATE",
      	[ilog2(0x10) + 1] = "FIXED",
      	[ilog2(0x20) + 1] = "ANONYMOUS",
      	[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
        };
        [acme@jouet perf]$
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-xydeh491z8fkgglcmqnl5thj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      80ee5668
    • Arnaldo Carvalho de Melo's avatar
      tools include uapi: Update asound.h copy · 89eb1f3b
      Arnaldo Carvalho de Melo authored
      To silence this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
        diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h
      
      Due to this cset:
      
        a9840151 ("ALSA: timer: fix wrong comment to refer to 'SNDRV_TIMER_PSFLG_*'")
      
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Cc: Takashi Iwai <tiwai@suse.de>
      Link: https://lkml.kernel.org/n/tip-76gsvs0w2g0x723ivqa2xua3@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      89eb1f3b
    • Arnaldo Carvalho de Melo's avatar
      tools arch uapi: Update asm-generic/unistd.h and arm64 unistd.h copies · 8dd4c0f6
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        82b355d1 ("y2038: Remove newstat family from default syscall set")
      
      Which will make the syscall table used by 'perf trace' for arm64 to be
      updated from the changes in that patch.
      
      This silences these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/unistd.h' differs from latest version at 'arch/arm64/include/uapi/asm/unistd.h'
        diff -u tools/arch/arm64/include/uapi/asm/unistd.h arch/arm64/include/uapi/asm/unistd.h
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
        diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
      
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-3euy7c4yy5mvnp5bm16t9vqg@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      8dd4c0f6
    • Arnaldo Carvalho de Melo's avatar
      tools include uapi: Update linux/fs.h copy · 733ac4f9
      Arnaldo Carvalho de Melo authored
      To silence this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
        diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h
      
      Due to just two comments added by:
      
        Fixes: 578bdaab ("crypto: speck - remove Speck")
      
      So nothing that entails changes in tools/, that so far uses fs.h to
      generate the mount and umount syscalls 'flags' argument integer->string
      tables with:
      
        $ tools/perf/trace/beauty/mount_flags.sh
        static const char *mount_flags[] = {
      	[4096 ? (ilog2(4096) + 1) : 0] = "BIND",
      <SNIP>
      	[30 + 1] = "ACTIVE",
      	[31 + 1] = "NOUSER",
        };
        $
        # trace -e mount,umount mount --bind /proc /mnt
           1.228 ( 2.581 ms): mount/1068 mount(dev_name: /mnt, dir_name: 0x55f011c354a0, type: 0x55f011c38170, flags: BIND) = 0
        # trace -e mount,umount umount /proc /mnt
        umount: /proc: target is busy.
           1.587 ( 0.010 ms): umount/1070 umount2(name: /proc) = -1 EBUSY Device or resource busy
           1.799 (12.660 ms): umount/1070 umount2(name: /mnt) = 0
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Link: https://lkml.kernel.org/n/tip-c00bqzclscgah26z2g5zxm73@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      733ac4f9
    • David S. Miller's avatar
      perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc} · e9024d51
      David S. Miller authored
      When processing using 'perf report -g caller', which is the default, we
      ended up reversing the callchain entries received from the kernel, but
      simply reversing throws away the information that tells that from a
      point onwards the addresses are for userspace, kernel, guest kernel,
      guest user, hypervisor.
      
      The idea is that if we are walking backwards, for each cluster of
      non-cpumode entries we have to first scan backwards for the next one and
      use that for the cluster.
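
      The backwards walk can be sketched as follows. This is a simplified
      illustration, not the perf implementation: it ignores the lbr and
      branch handling, it looks up the marker for every entry rather than
      once per cluster, and all sketch_* names are hypothetical. Only the
      marker values mirror the PERF_CONTEXT_* constants of the perf_event
      uapi header.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Same values as PERF_CONTEXT_KERNEL, PERF_CONTEXT_USER, PERF_CONTEXT_MAX. */
      #define SKETCH_CONTEXT_KERNEL ((uint64_t)-128)
      #define SKETCH_CONTEXT_USER   ((uint64_t)-512)
      #define SKETCH_CONTEXT_MAX    ((uint64_t)-4095)

      struct sketch_entry {
      	uint64_t ip;
      	uint64_t context; /* marker that applies to 'ip', 0 if none was seen */
      };

      static int sketch_is_marker(uint64_t ip)
      {
      	return ip >= SKETCH_CONTEXT_MAX;
      }

      /*
       * Emit the addresses of 'chain' (callee first, as delivered) in caller
       * first order, each tagged with its context. 'out' needs room for 'nr'
       * entries. Returns the number of entries written.
       */
      size_t sketch_resolve_caller_order(const uint64_t *chain, size_t nr,
      				   struct sketch_entry *out)
      {
      	size_t n = 0;

      	for (size_t i = nr; i-- > 0; ) {
      		uint64_t context = 0;

      		if (sketch_is_marker(chain[i]))
      			continue;

      		/* Scan backwards for the marker that opens this cluster. */
      		for (size_t j = i; j-- > 0; ) {
      			if (sketch_is_marker(chain[j])) {
      				context = chain[j];
      				break;
      			}
      		}
      		out[n].ip = chain[i];
      		out[n].context = context;
      		n++;
      	}
      	return n;
      }
      ```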
      
      This seems silly and more expensive than it needs to be but it is enough
      for an initial fix.
      
      The code here is really complicated because it is intimately intertwined
      with the lbr and branch handling, as well as this callchain order,
      further fixes will be needed to properly take into account the cpumode
      in those cases.
      
      Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
      end of most callchains shows up at the top of the histogram because
      every callchain contains it and with ORDER_CALLER it is the first entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Souvik Banerjee <souvik1997@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: stable@vger.kernel.org # 4.19
      Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e9024d51
    • Leo Yan's avatar
      perf cs-etm: Correct CPU mode for samples · d6c9c05f
      Leo Yan authored
      Since commit edeb0c90 ("perf tools: Stop fallbacking to kallsyms for
      vdso symbols lookup"), kernel addresses can no longer be resolved to
      kernel symbols with the command 'perf script -k vmlinux'.  The reason is
      that CoreSight samples always set the CPU mode to PERF_RECORD_MISC_USER,
      so the lookup fails to find the corresponding map/dso in the flow below:
      
        process_sample_event()
          `-> machine__resolve()
      	  `-> thread__find_map(thread, sample->cpumode, sample->ip, al);
      
      In this flow the argument 'sample->cpumode' tells the CPU mode.  It was
      always passed as PERF_RECORD_MISC_USER, which caused no failure until
      commit edeb0c90 ("perf tools: Stop fallbacking to kallsyms for vdso
      symbols lookup") was merged.  Even with the wrong CPU mode,
      thread__find_map() would first fail to find the map and then fall back
      to the kernel map, a fallback meant for vdso symbol lookup.  The latest
      code has removed that fallback, so with the CPU mode set to
      PERF_RECORD_MISC_USER a kernel address can no longer be mapped.
      
      This patch corrects the CPU mode of the samples.  It adds a new helper
      function, cs_etm__cpu_mode(), which derives the CPU mode from the
      address using the info in the machine structure.  The helper goes a bit
      further and checks not only kernel and user mode but also host/guest
      and hypervisor mode.  Finally, this patch uses the function for
      instruction and branch samples and also applies it in
      cs_etm__mem_access() as a minor polish.
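
      A minimal sketch of such an address-based classification, assuming a
      trimmed-down machine description: struct sketch_machine and
      sketch_cpu_mode() are hypothetical, the hypervisor case is left out,
      and the enum values only mirror the PERF_RECORD_MISC_* cpumode
      constants.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Same values as the PERF_RECORD_MISC_* cpumode constants. */
      enum sketch_cpumode {
      	SKETCH_MODE_KERNEL       = 1,
      	SKETCH_MODE_USER         = 2,
      	SKETCH_MODE_GUEST_KERNEL = 4,
      	SKETCH_MODE_GUEST_USER   = 5,
      };

      /* Hypothetical stand-in for the info kept per machine. */
      struct sketch_machine {
      	uint64_t kernel_start; /* lowest kernel address on this machine */
      	bool is_host;          /* false for a guest */
      };

      /* Pick the CPU mode from the address instead of assuming user mode. */
      enum sketch_cpumode sketch_cpu_mode(const struct sketch_machine *m,
      				    uint64_t address)
      {
      	bool kernel = address >= m->kernel_start;

      	if (m->is_host)
      		return kernel ? SKETCH_MODE_KERNEL : SKETCH_MODE_USER;
      	return kernel ? SKETCH_MODE_GUEST_KERNEL : SKETCH_MODE_GUEST_USER;
      }
      ```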
      Signed-off-by: Leo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: stable@kernel.org # v4.19
      Link: http://lkml.kernel.org/r/1540883908-17018-1-git-send-email-leo.yan@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d6c9c05f
    • Milian Wolff's avatar
      perf unwind: Take pgoff into account when reporting elf to libdwfl · 1fe627da
      Milian Wolff authored
      libdwfl parses an ELF file itself and creates mappings for the
      individual sections. perf on the other hand sees raw mmap events which
      represent individual sections. When we encounter an address pointing
      into a mapping with pgoff != 0, we must take that into account and
      report the file at the non-offset base address.
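
      The arithmetic behind "non-offset base address" can be sketched like
      this, assuming pgoff is a byte offset into the file; struct sketch_map
      and sketch_report_base() are illustrative names, not perf or libdwfl
      API.

      ```c
      #include <stdint.h>

      /* Hypothetical, trimmed-down view of one mmap event. */
      struct sketch_map {
      	uint64_t start; /* address where this piece of the file is mapped */
      	uint64_t pgoff; /* offset into the file, assumed to be in bytes */
      };

      /* Address the file would start at if it were mapped from offset 0. */
      uint64_t sketch_report_base(const struct sketch_map *map)
      {
      	return map->start - map->pgoff;
      }
      ```

      For a mapping with pgoff == 0 the result is simply the start address,
      so mappings that were already handled correctly are unaffected.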
      
      This fixes unwinding with libdwfl in some cases. E.g. for a file like:
      
      ```
      
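      #include <complex>
      #include <future>
      #include <iostream>
      #include <mutex>
      #include <random>
      #include <vector>

      using namespace std;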
      
      mutex g_mutex;
      
      double worker()
      {
          lock_guard<mutex> guard(g_mutex);
          uniform_real_distribution<double> uniform(-1E5, 1E5);
          default_random_engine engine;
          double s = 0;
          for (int i = 0; i < 1000; ++i) {
              s += norm(complex<double>(uniform(engine), uniform(engine)));
          }
          cout << s << endl;
          return s;
      }
      
      int main()
      {
          vector<std::future<double>> results;
          for (int i = 0; i < 10000; ++i) {
              results.push_back(async(launch::async, worker));
          }
          return 0;
      }
      ```
      
      Compile it with `g++ -g -O2 -lpthread cpp-locking.cpp  -o cpp-locking`,
      then record it with `perf record --call-graph dwarf -e
      sched:sched_switch`.
      
      When you analyze it with `perf script` and libunwind, you should see:
      
      ```
      cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
                  7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
                  7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
                  7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
                  7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
                  7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
                  7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
                  7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
                  7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
                  7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
                  7f38e463fa6d std::basic_streambuf<char, std::char_traits<char> >::sputn(char const*, long)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> >::_M_put(char const*, long)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::__write<char>(std::ostreambuf_iterator<char, std::char_traits<char> >, char const*, int)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<c>
                  7f38e464bd70 std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const+0x90 (inl>
                  7f38e464bd70 std::ostream& std::ostream::_M_insert<double>(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
                  563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
                  563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
                  563b9cb506fb double std::__invoke_impl<double, double (*)()>(std::__invoke_other, double (*&&)())+0x2b (inlined)
                  563b9cb506fb std::__invoke_result<double (*)()>::type std::__invoke<double (*)()>(double (*&&)())+0x2b (inlined)
                  563b9cb506fb decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<double (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x2b (inlined)
                  563b9cb506fb std::thread::_Invoker<std::tuple<double (*)()> >::operator()()+0x2b (inlined)
                  563b9cb506fb std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<double>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<double (*)()> >, dou>
                  563b9cb506fb std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_>
                  563b9cb507e8 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const+0x28 (inlined)
                  563b9cb507e8 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)+0x28 (/ssd/milian/>
                  7f38e46d24fe __pthread_once_slow+0xbe (/usr/lib/libpthread-2.28.so)
                  563b9cb51149 __gthread_once+0xe9 (inlined)
                  563b9cb51149 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>
                  563b9cb51149 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool)+0xe9 (inlined)
                  563b9cb51149 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double (*)()> >&&)::{lambda()#1}::op>
                  563b9cb51149 void std::__invoke_impl<void, std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double>
                  563b9cb51149 std::__invoke_result<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double (*)()> >>
                  563b9cb51149 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_>
                  563b9cb51149 std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<dou>
                  563b9cb51149 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread>
                  7f38e45f0062 execute_native_thread_routine+0x12 (/usr/lib/libstdc++.so.6.0.25)
                  7f38e46caa9c start_thread+0xfc (/usr/lib/libpthread-2.28.so)
                  7f38e42ccb22 __GI___clone+0x42 (inlined)
      ```
      
      Before this patch, using libdwfl, you would see:
      
      ```
      cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
                  7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
              a041161e77950c5c [unknown] ([unknown])
      ```
      
      With this patch applied, we get a bit further in unwinding:
      
      ```
      cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
                  7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
                  7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
                  7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
                  7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
                  7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
                  7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
                  7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
                  7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
                  7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
                  7f38e463fa6d std::basic_streambuf<char, std::char_traits<char> >::sputn(char const*, long)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> >::_M_put(char const*, long)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::__write<char>(std::ostreambuf_iterator<char, std::char_traits<char> >, char const*, int)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<c>
                  7f38e464bd70 std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const+0x90 (inl>
                  7f38e464bd70 std::ostream& std::ostream::_M_insert<double>(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
                  563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
                  563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
              6eab825c1ee3e4ff [unknown] ([unknown])
      ```
      
      Note that the backtrace is still stopping too early, when compared to
      the nice results obtained via libunwind. It's unclear so far what the
      reason for that is.
      
      Committer note:
      
      Further comment by Milian on the thread started on the Link: tag below:
      
       ---
      The remaining issue is due to a bug in elfutils:
      
      https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html
      
      With both patches applied, libunwind and elfutils produce the same output for
      the above scenario.
       ---
      Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181029141644.3907-1-milian.wolff@kdab.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1fe627da
    • perf top: Do not use overwrite mode by default · 218d6111
      Arnaldo Carvalho de Melo authored
      Enabling --overwrite mode allows us to use just the most recent
      records, which helps on high core count machines such as Knights
      Landing/Mill. For now it is disabled by default, because the pausing
      used in this technique leads to the loss of metadata events such as
      PERF_RECORD_MMAP. That makes 'perf top' unable to resolve samples, so
      lots of unknown samples appear in the UI.
      
      Enabling it may still be useful if you are on such a machine and
      profiling a workload that doesn't create short-lived threads and/or
      doesn't use many executable mmap operations.
      
      Work is being planned to solve this situation; until then, this will
      remain disabled by default.
      Reported-by: David Miller <davem@davemloft.net>
      Acked-by: Kan Liang <kan.liang@intel.com>
      Link: https://lkml.kernel.org/r/4f84468f-37d9-cf1b-12c1-514ef74b6a48@linux.intel.com
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: ebebbf08 ("perf top: Switch default mode to overwrite mode")
      Link: https://lkml.kernel.org/n/tip-ehvf77vi1si9409r7p4wx788@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      218d6111
  5. 30 Oct, 2018 4 commits
    • perf top: Allow disabling the overwrite mode · 4e303fbe
      Arnaldo Carvalho de Melo authored
      In ebebbf08 ("perf top: Switch default mode to overwrite mode") we
      forgot to leave a way to disable that new default. Add a --overwrite
      option that can be disabled using --no-overwrite, since the code is
      already structured in such a way that we can readily disable this mode.
      
      This is useful when investigating bugs with this mode, like the recent
      report from David Miller where lots of unknown symbols appear. The
      cause is that events are disabled while they are being processed, which
      disables all record types, not just PERF_RECORD_SAMPLE, and makes it
      impossible to resolve maps when we lose PERF_RECORD_MMAP records.
      
      This can be easily seen while building a kernel, when there are lots of
      short lived processes.
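
      The failure mode described above can be illustrated with a toy model.
      This is only a sketch, not perf's implementation: ToyRingBuffer, the
      record tuples and resolve() are hypothetical names, and the "pause" is
      reduced to a flag that drops every record written while it is set.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRingBuffer:
    """Toy stand-in for a ring buffer that can only be paused as a whole."""
    paused: bool = False
    records: list = field(default_factory=list)
    lost: list = field(default_factory=list)

    def write(self, record):
        # Pausing affects every record type, not only samples.
        if self.paused:
            self.lost.append(record)
        else:
            self.records.append(record)


def resolve(records):
    """Resolve each sample using the maps announced by earlier MMAP records."""
    maps = {}       # pid -> binary name, learned from MMAP records
    resolved = []
    for kind, pid, payload in records:
        if kind == "MMAP":
            maps[pid] = payload
        elif kind == "SAMPLE":
            resolved.append(maps.get(pid, "[unknown]"))
    return resolved


rb = ToyRingBuffer()
rb.write(("MMAP", 100, "make"))
rb.write(("SAMPLE", 100, 0x1000))

rb.paused = True                    # the reader is processing the buffer
rb.write(("MMAP", 101, "cc1"))      # a short-lived process starts: record lost
rb.paused = False

rb.write(("SAMPLE", 101, 0x2000))   # sample from a process never seen mapped
symbols = resolve(rb.records)
print(symbols)  # ['make', '[unknown]']
```

      In this model the more processes start during a pause, the more
      samples end up unresolved, which matches the kernel build case.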
      Reported-by: David Miller <davem@davemloft.net>
      Acked-by: Kan Liang <kan.liang@intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: ebebbf08 ("perf top: Switch default mode to overwrite mode")
      Link: https://lkml.kernel.org/n/tip-oqgsz2bq4kgrnnajrafcdhie@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4e303fbe
    • perf trace: Beautify mount's first pathname arg · 23c07a23
      Arnaldo Carvalho de Melo authored
      The pathname beautifiers so far support just one augmented pathname per
      syscall, so do it just for mount's first arg; this will get fixed later.
      
      With:
      
        # perf probe -l
        probe:vfs_getname    (on getname_flags:73@acme/git/linux/fs/namei.c with pathname)
        #
      
      Later this will get added to augmented_syscalls.c (eBPF).
      
      In one xterm:
      
        # perf trace -e mount,umount
        2687.331 ( 3.544 ms): mount/8892 mount(dev_name: /mnt, dir_name: 0x561f9ac184a0, type: 0x561f9ac1b170, flags: BIND) = 0
        3912.126 ( 8.807 ms): umount/8895 umount2(name: /mnt) = 0
        ^C#
      
      In the other:
      
        $ sudo mount --bind /proc /mnt
        $ sudo umount /mnt
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Benjamin Peterson <benjamin@python.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-qsvhrm2es635cl4zicqjeth2@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      23c07a23
    • perf trace: Beautify the umount's 'name' argument · 476c92ca
      Arnaldo Carvalho de Melo authored
      This uses the SCA_FILENAME beautifier, which works either when the
      probe:vfs_getname probe is in place or with the eBPF program
      tools/perf/examples/bpf/augmented_syscalls.c:
      
        # perf probe -l
        probe:vfs_getname (on getname_flags:73@acme/git/linux/fs/namei.c with pathname)
        # perf trace -e umount
        9630.332 ( 9.521 ms): umount/8082 umount2(name: /mnt) = 0
        #
      
      The augmented syscalls one will be done in the next patch.
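
      As a rough illustration of what a pathname beautifier does to the
      output, here is a minimal sketch. It is not perf's C code:
      format_syscall() and its parameters are hypothetical, and it assumes
      that at most one captured pathname is available per syscall and is
      attached to a single designated argument, while other pointers stay
      in hex.

```python
def format_syscall(name, args, ret, filename_arg=None, captured_path=None):
    """Format one syscall line, augmenting at most one pathname argument.

    args is a list of (arg_name, value) pairs; integer values stand for
    raw user-space pointers and are printed in hex.
    """
    parts = []
    for idx, (arg_name, value) in enumerate(args):
        if idx == filename_arg and captured_path is not None:
            # A pathname was captured for this argument: show the string.
            parts.append(f"{arg_name}: {captured_path}")
        elif isinstance(value, int):
            parts.append(f"{arg_name}: {value:#x}")
        else:
            parts.append(f"{arg_name}: {value}")
    return f"{name}({', '.join(parts)}) = {ret}"


without_path = format_syscall("umount2", [("name", 0x55FBFCBC4480)], 0)
with_path = format_syscall("umount2", [("name", 0x55FBFCBC4480)], 0,
                           filename_arg=0, captured_path="/mnt")
print(without_path)  # umount2(name: 0x55fbfcbc4480) = 0
print(with_path)     # umount2(name: /mnt) = 0
```

      Without a captured pathname the argument falls back to the raw
      pointer value, which is what the unbeautified output looks like.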
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Benjamin Peterson <benjamin@python.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-hegbzlpd2nrn584l5jxn7sy2@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      476c92ca
    • perf trace: Consider syscall aliases too · f932184e
      Arnaldo Carvalho de Melo authored
      When trying to trace the 'umount' syscall on x86_64 I noticed that it
      was failing:
      
        # trace -e umount umount /mnt
        event syntax error: 'umount'
                             \___ parser error
        Run 'perf list' for a list of valid events
      
         Usage: perf trace [<options>] [<command>]
            or: perf trace [<options>] -- <command> [<options>]
            or: perf trace record [<options>] [<command>]
            or: perf trace record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event/syscall selector. use 'perf list' to list available events
        #
      
      This is because on x86-64 we have it just as 'umount2':
      
        $ grep umount arch/x86/entry/syscalls/syscall_64.tbl
        166	common	umount2			__x64_sys_umount
        $
      
      So if the lookup by syscall name fails, fall back to the aliases we
      have in the syscall_fmts table and look the name up again. Now:
      
        # trace -e umount umount -f /mnt
        umount: /mnt: not mounted.
           1.759 ( 0.004 ms): umount/18365 umount2(name: 0x55fbfcbc4480, flags: 1) = -1 EINVAL Invalid argument
        #
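
      The fallback described above can be sketched as follows. This is a
      minimal model rather than the code in perf: the two tables are
      reduced to Python containers, the helper names are hypothetical, and
      the shape of the syscall_fmts entry (a name plus an alias) is an
      assumption. Only the 166/umount2 pairing comes from the table shown
      earlier.

```python
# Hypothetical miniature of the two tables involved.
SYSCALL_TABLE = {"umount2": 166}                          # name -> id
SYSCALL_FMTS = [{"name": "umount2", "alias": "umount"}]   # assumed shape


def find_fmt_by_alias(alias):
    """Return the format entry whose alias matches, or None."""
    for fmt in SYSCALL_FMTS:
        if fmt.get("alias") == alias:
            return fmt
    return None


def syscall_id(name):
    """Look a syscall up by name, retrying through its alias entry."""
    sc_id = SYSCALL_TABLE.get(name)
    if sc_id is not None:
        return sc_id
    fmt = find_fmt_by_alias(name)
    if fmt is None:
        return None                  # neither a name nor a known alias
    return SYSCALL_TABLE.get(fmt["name"])


print(syscall_id("umount2"))   # 166
print(syscall_id("umount"))    # 166
print(syscall_id("bogus"))     # None
```

      The direct lookup is tried first, so names that are already in the
      syscall table never pay for the alias scan.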
      
      Time to beautify the flags arg :-)
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Benjamin Peterson <benjamin@python.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-ukweodgzbmjd25lfkgryeft1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f932184e