1. 10 Jul, 2020 4 commits
    • perf script: Add option --show-text-poke-events · 92ecf3a6
      Adrian Hunter authored
      Consistent with other new events, add an option to perf script to
      display text poke events and ksymbol events. Both text poke events and
      ksymbol events are displayed because some text pokes (e.g. ftrace
      trampolines) have corresponding ksymbol events.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-15-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt: Add support for text poke events · b22f90aa
      Adrian Hunter authored
      Select text poke events when available and the kernel is being traced.
      Process text poke events to invalidate entries in Intel PT's instruction
      cache.
      
      Example:
      
        The example requires kernel config:
          CONFIG_PROC_SYSCTL=y
          CONFIG_SCHED_DEBUG=y
          CONFIG_SCHEDSTATS=y
      
        Before:
      
          # perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M &
          # cat /proc/sys/kernel/sched_schedstats
          0
          # echo 1 > /proc/sys/kernel/sched_schedstats
          # cat /proc/sys/kernel/sched_schedstats
          1
          # echo 0 > /proc/sys/kernel/sched_schedstats
          # cat /proc/sys/kernel/sched_schedstats
          0
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 3.341 MB perf.data.before ]
          [1]+  Terminated                 perf record -o perf.data.before --kcore -a -e intel_pt//k -m,64M
          # perf script -i perf.data.before --itrace=e >/dev/null
          Warning:
          474 instruction trace errors
      
        After:
      
          # perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M &
          # cat /proc/sys/kernel/sched_schedstats
          0
          # echo 1 > /proc/sys/kernel/sched_schedstats
          # cat /proc/sys/kernel/sched_schedstats
          1
          # echo 0 > /proc/sys/kernel/sched_schedstats
          # cat /proc/sys/kernel/sched_schedstats
          0
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 2.646 MB perf.data.after ]
          [1]+  Terminated                 perf record -o perf.data.after --kcore -a -e intel_pt//k -m,64M
          # perf script -i perf.data.after --itrace=e >/dev/null
      
      Example:
      
        The example requires kernel config:
          # CONFIG_FUNCTION_TRACER is not set
      
        Before:
          # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
          # perf probe __schedule
          Added new event:
            probe:__schedule     (on __schedule)
      
          You can now use it in all perf tools, such as:
      
                  perf record -e probe:__schedule -aR sleep 1
      
          # perf record -e probe:__schedule -aR sleep 1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.026 MB perf.data (68 samples) ]
          # perf probe -d probe:__schedule
          Removed event: probe:__schedule
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 41.268 MB t1 ]
          [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
          # perf script -i t1 --itrace=e >/dev/null
          Warning:
          207 instruction trace errors
      
        After:
          # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
          # perf probe __schedule
          Added new event:
            probe:__schedule     (on __schedule)
      
          You can now use it in all perf tools, such as:
      
              perf record -e probe:__schedule -aR sleep 1
      
          # perf record -e probe:__schedule -aR sleep 1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.028 MB perf.data (107 samples) ]
          # perf probe -d probe:__schedule
          Removed event: probe:__schedule
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 39.978 MB t1 ]
          [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
          # perf script -i t1 --itrace=e >/dev/null
          # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
          6 565303693547 0x291f18 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027a000 len 4096 type 2 flags 0x0 name kprobe_insn_page
          6 565303697010 0x291f68 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 0 new len 6
          6 565303838278 0x291fa8 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc027c000 len 4096 type 2 flags 0x0 name kprobe_optinsn_page
          6 565303848286 0x291ff8 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 0 new len 106
          6 565369336743 0x292af8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
          7 566434327704 0x217c208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffff88ab8890 old len 5 new len 5
          6 566456313475 0x293198 [0xa0]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027c000 old len 106 new len 0
          6 566456314935 0x293238 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffc027a000 old len 6 new len 0
      
      Example:
      
        The example requires kernel config:
          CONFIG_FUNCTION_TRACER=y
      
        Before:
          # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
          # perf probe __kmalloc
          Added new event:
            probe:__kmalloc      (on __kmalloc)
      
          You can now use it in all perf tools, such as:
      
              perf record -e probe:__kmalloc -aR sleep 1
      
          # perf record -e probe:__kmalloc -aR sleep 1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.022 MB perf.data (6 samples) ]
          # perf probe -d probe:__kmalloc
          Removed event: probe:__kmalloc
          # kill %1
          [ perf record: Woken up 2 times to write data ]
          [ perf record: Captured and wrote 43.850 MB t1 ]
          [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
          # perf script -i t1 --itrace=e >/dev/null
          Warning:
          8 instruction trace errors
      
        After:
          # perf record --kcore -m,64M -o t1 -a -e intel_pt//k &
          # perf probe __kmalloc
          Added new event:
            probe:__kmalloc      (on __kmalloc)
      
          You can now use it in all perf tools, such as:
      
                  perf record -e probe:__kmalloc -aR sleep 1
      
          # perf record -e probe:__kmalloc -aR sleep 1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0.037 MB perf.data (206 samples) ]
          # perf probe -d probe:__kmalloc
          Removed event: probe:__kmalloc
          # kill %1
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 41.442 MB t1 ]
          [1]+  Terminated                 perf record --kcore -m,64M -o t1 -a -e intel_pt//k
          # perf script -i t1 --itrace=e >/dev/null
          # perf script -i t1 --no-itrace -D | grep 'POKE\|KSYMBOL'
          5 312216133258 0x8bafe0 [0x50]: PERF_RECORD_KSYMBOL addr ffffffffc0360000 len 415 type 2 flags 0x0 name ftrace_trampoline
          5 312216133494 0x8bb030 [0x1d8]: PERF_RECORD_TEXT_POKE addr 0xffffffffc0360000 old len 0 new len 415
          5 312216229563 0x8bb208 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
          5 312216239063 0x8bb248 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
          5 312216727230 0x8bb288 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
          5 312216739322 0x8bb2c8 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
          5 312216748321 0x8bb308 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
          7 313287163462 0x2817430 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
          7 313287174890 0x2817470 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
          7 313287818979 0x28174b0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffabbea190 old len 5 new len 5
          7 313287829357 0x28174f0 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac6016f5 old len 5 new len 5
          7 313287841246 0x2817530 [0x40]: PERF_RECORD_TEXT_POKE addr 0xffffffffac601803 old len 5 new len 5
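The `perf script -D` lines above follow a regular shape, so they can be post-processed mechanically. The following is a minimal sketch, not part of perf: `parse_poke_line()` and `poke_kind()` are hypothetical helpers, and reading "old len 0" as newly added text and "new len 0" as removed text is an interpretation of the listing above rather than a documented contract.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields taken from one PERF_RECORD_TEXT_POKE dump line (hypothetical type). */
struct poke_line {
	uint64_t addr;
	unsigned int old_len;
	unsigned int new_len;
};

/* Parse one dump line; returns 0 on success, -1 if it is not a text poke. */
static int parse_poke_line(const char *line, struct poke_line *out)
{
	static const char tag[] = "PERF_RECORD_TEXT_POKE addr ";
	const char *p;

	assert(line != NULL && out != NULL);
	p = strstr(line, tag);
	if (!p)
		return -1;
	p += sizeof(tag) - 1;

	/* %x conversions accept the optional 0x prefix used in the dump. */
	if (sscanf(p, "%" SCNx64 " old len %u new len %u",
		   &out->addr, &out->old_len, &out->new_len) != 3)
		return -1;
	return 0;
}

/* Assumed reading of the lengths: 0 -> N adds text, N -> 0 removes it. */
static const char *poke_kind(const struct poke_line *p)
{
	if (p->old_len == 0)
		return "added";
	if (p->new_len == 0)
		return "removed";
	return "modified";
}
```

Feeding each line of the grep output through `parse_poke_line()` and printing `poke_kind()` gives a quick summary of which addresses were created, patched in place, and torn down during the trace.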
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-14-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add support for PERF_RECORD_KSYMBOL_TYPE_OOL · 789e2419
      Adrian Hunter authored
      PERF_RECORD_KSYMBOL_TYPE_OOL marks an executable page. Create a map
      backed only by memory, which will be populated as necessary by text poke
      events.
      
      Committer notes:
      
      From the patch:
      
      OOL stands for "Out of line" code such as kprobe-replaced instructions
      or optimized kprobes or ftrace trampolines.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-13-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Add support for PERF_RECORD_TEXT_POKE · 246eba8e
      Adrian Hunter authored
      Add processing for PERF_RECORD_TEXT_POKE events. When a text poke event
      is processed, the kernel dso data cache is updated with the poked
      bytes.
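The update described above amounts to overwriting a range of cached kernel text with the new bytes carried by the event. A minimal sketch of that idea follows; `struct text_cache` and `text_cache_poke()` are hypothetical stand-ins for illustration, not perf's dso data cache API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a cached copy of kernel text. */
struct text_cache {
	uint64_t start;	/* address of data[0] */
	size_t len;	/* number of cached bytes */
	uint8_t *data;
};

/*
 * Overwrite the cached bytes at addr with the new bytes from a text poke.
 * Returns 0 on success, -1 if the range is not fully inside the cache.
 */
static int text_cache_poke(struct text_cache *c, uint64_t addr,
			   const uint8_t *new_bytes, size_t new_len)
{
	uint64_t off;

	assert(c != NULL && c->data != NULL);
	if (addr < c->start)
		return -1;
	off = addr - c->start;
	if (off > c->len || new_len > c->len - off)
		return -1;
	if (new_len == 0)
		return 0;	/* nothing to copy, e.g. text being removed */

	memcpy(c->data + off, new_bytes, new_len);
	return 0;
}
```

Any decoder that reads instructions through such a cache after the poke sees the patched bytes, which is the property the Intel PT decoder depends on.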
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Link: http://lore.kernel.org/lkml/20200512121922.8997-12-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 09 Jul, 2020 1 commit
    • perf annotate: Fix non-null terminated buffer returned by readlink() · b39730a6
      Numfor Mbiziwo-Tiapo authored
      Our local MSAN (Memory Sanitizer) build of perf throws a warning that
      comes from the "dso__disassemble_filename" function in
      "tools/perf/util/annotate.c" when running perf record.
      
      The warning stems from the call to readlink, in which "build_id_path"
      was being read into "linkname". Since readlink does not null terminate,
      an uninitialized memory access would later occur when "linkname" is
      passed into the strstr function. This is simply fixed by
      null-terminating "linkname" after the call to readlink.
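The pattern behind the fix can be sketched as follows. This is an illustration under assumptions, not the code from annotate.c: `readlink_terminated()` is a hypothetical helper name, and it silently truncates targets that do not fit in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a symlink target into buf and always
 * NUL-terminate it. Returns the number of bytes stored, or -1 on error.
 */
static ssize_t readlink_terminated(const char *path, char *buf, size_t size)
{
	ssize_t len;

	assert(path != NULL && buf != NULL);
	if (size == 0)
		return -1;

	/* Reserve one byte so the terminator always fits. */
	len = readlink(path, buf, size - 1);
	if (len < 0)
		return -1;

	/* readlink() does not append '\0'; do it explicitly. */
	buf[len] = '\0';
	return len;
}
```

With the terminator in place, passing the buffer to string functions such as strstr() no longer reads past the bytes that readlink() wrote.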
      
      To reproduce this warning, build perf by running:
      
        $ make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory -fsanitize-memory-track-origins"
      
      (Additionally, llvm might have to be installed and clang might have to
      be specified as the compiler - export CC=/usr/bin/clang)
      
      Then run:
      
        $ tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate -i - --stdio
      
      Please see the cover letter for why false positive warnings may be
      generated.
      Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Drayton <mbd@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20190729205750.193289-1-nums@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 08 Jul, 2020 2 commits
    • perf inject jit: Remove //anon mmap events · c8f6ae1f
      Steve MacLean authored
      ** perf-<pid>.map and jit-<pid>.dump designs:
      
      When a JIT generates code to be executed, it must allocate memory and
      mark it executable using an mmap call.
      
      *** perf-<pid>.map design
      
      The perf-<pid>.map assumes that any sample recorded in an anonymous
      memory page is JIT code. It then tries to resolve the symbol name by
      looking at the process' perf-<pid>.map.
      
      *** jit-<pid>.dump design
      
      The jit-<pid>.dump mechanism takes a different approach. It requires a
      JIT to write a `<path>/jit-<pid>.dump` file. This file must also be
      mmapped so that perf inject --jit can find the file. The JIT must also
      add JIT_CODE_LOAD records for any functions it generates. The records
      are timestamped using a clock which can be correlated to the perf record
      clock.
      
      After perf record, the `perf inject --jit` pass parses the recording
      looking for a `<path>/jit-<pid>.dump` file. When it finds the file, it
      parses it and for each JIT_CODE_LOAD record:
      * creates an elf file `<path>/jitted-<pid>-<code_index>.so`
      * injects a new mmap record mapping the new elf file into the process.
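The per-record file name in the first bullet can be built with a bounded format call. The sketch below only illustrates the `<path>/jitted-<pid>-<code_index>.so` naming pattern; `jitted_so_path()` is a hypothetical helper and the argument types are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<dir>/jitted-<pid>-<code_index>.so" into buf.
 * Returns 0 on success, -1 if the result does not fit.
 */
static int jitted_so_path(char *buf, size_t size, const char *dir,
			  int pid, unsigned long long code_index)
{
	int n;

	assert(buf != NULL && dir != NULL && strlen(dir) > 0);
	n = snprintf(buf, size, "%s/jitted-%d-%llu.so", dir, pid, code_index);
	if (n < 0 || (size_t)n >= size)
		return -1;	/* output error or truncated path */
	return 0;
}
```

Checking the snprintf() return value matters here because a truncated path would make the injected mmap record point at a file that does not exist.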
      
      *** Coexistence design
      
      The kernel and perf support both of these mechanisms. We need to make
      sure perf works on an app supporting either or both of these mechanisms.
      Both designs rely on mmap records to determine how to resolve an ip
      address.
      
      The mmap records of both techniques by definition overlap. When the JIT
      compiles a method, it must:
      
      * allocate memory (mmap)
      * add execution privilege (mprotect or mmap; either will
      generate an mmap event from the kernel to perf)
      * compile code into memory
      * add a function record to perf-<pid>.map and/or jit-<pid>.dump
      
      Because the jit-<pid>.dump mechanism supports greater capabilities, perf
      prefers the symbols from jit-<pid>.dump. It implements this based on
      timestamp ordering of events. There is an implicit ASSUMPTION that the
      JIT_CODE_LOAD record timestamp will be after the // anon mmap event that
      was generated during memory allocation or adding the execution privilege setting.
      
      *** Problems with the ASSUMPTION
      
      The ASSUMPTION made in the Coexistence design section above is violated
      in the following scenario.
      
      *** Scenario
      
      While a JIT is jitting code it will eventually need to commit more
      pages and change these pages to executable permissions. Typically the
      JIT will want these collocated to minimize branch displacements.
      
      The kernel will coalesce these anonymous mappings with identical
      permissions before sending an MMAP event for the new pages. The address
      range of the new mmap will not be just the most recently mmapped pages.
      It will include the entire coalesced mmap region.
      
      See mm/mmap.c
      
      unsigned long mmap_region(struct file *file, unsigned long addr,
                      unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
                      struct list_head *uf)
      {
      ...
              /*
               * Can we just expand an old mapping?
               */
      ...
              perf_event_mmap(vma);
      ...
      }
      
      *** Symptoms
      
      The coalesced // anon mmap event will be timestamped after the
      JIT_CODE_LOAD records. This means it will be used as the most recent
      mapping for that entire address range. For remaining events it will look
      at the inferior perf-<pid>.map for symbols.
      
      If both mechanisms are supported, the symbol will appear twice with
      different module names. This causes weird behavior in reporting.
      
      If only jit-<pid>.dump is supported, the symbol will no longer be resolved.
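The symptom follows from resolving an address against the most recent mapping event that covers it. The sketch below is a simplified model under that assumption; `struct mmap_event` and `resolve_map()` are hypothetical and do not mirror perf's map handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mapping event: a named address range with a timestamp. */
struct mmap_event {
	uint64_t time;
	uint64_t start;
	uint64_t len;
	const char *name;
};

/*
 * Return the name of the latest event at or before sample_time whose
 * range covers ip, or NULL if no event covers it.
 */
static const char *resolve_map(const struct mmap_event *ev, size_t n,
			       uint64_t ip, uint64_t sample_time)
{
	const struct mmap_event *best = NULL;
	size_t i;

	assert(ev != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ev[i].time > sample_time)
			continue;
		if (ip < ev[i].start || ip - ev[i].start >= ev[i].len)
			continue;
		if (!best || ev[i].time >= best->time)
			best = &ev[i];
	}
	return best ? best->name : NULL;
}
```

With an injected jitted-<pid>-<code_index>.so event at time 20 and a coalesced //anon event at time 30 covering the same address, a sample at time 25 resolves to the jitted file while a sample at time 35 falls back to //anon.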
      
      ** Implemented solution
      
      This patch solves the issue by removing // anon mmap events for any
      process which has a valid jit-<pid>.dump file.
      
      It tracks this on a per-process basis to handle the case where some running
      apps support jit-<pid>.dump, but some only support perf-<pid>.map.
      
      It adds new assumptions:
      * // anon mmap events are only required for perf-<pid>.map support.
      * An app that uses jit-<pid>.dump no longer needs
      perf-<pid>.map support. It assumes that any perf-<pid>.map info is
      inferior.
      
      *** Details
      
      Use thread->priv to store whether a jitdump file has been processed.
      
      During "perf inject --jit", discard "//anon*" mmap events for any pid which
      has successfully processed a jitdump file.
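The filtering rule can be sketched as below. This is a minimal illustration, not the patch itself: the patch stores the flag in thread->priv, whereas this sketch assumes a small fixed-size pid table, and `struct jit_pids`, `jit_pids_mark()`, `jit_pids_has()` and `drop_mmap_event()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PIDS 64	/* hypothetical limit, for the sketch only */

/* Pids for which a jitdump file was successfully processed. */
struct jit_pids {
	int pid[MAX_PIDS];
	size_t n;
};

static bool jit_pids_has(const struct jit_pids *s, int pid)
{
	size_t i;

	for (i = 0; i < s->n; i++)
		if (s->pid[i] == pid)
			return true;
	return false;
}

static void jit_pids_mark(struct jit_pids *s, int pid)
{
	assert(s->n < MAX_PIDS);
	if (!jit_pids_has(s, pid))
		s->pid[s->n++] = pid;
}

/* Drop "//anon*" mmap events only for pids that have a jitdump file. */
static bool drop_mmap_event(const struct jit_pids *s, int pid,
			    const char *filename)
{
	return jit_pids_has(s, pid) && strncmp(filename, "//anon", 6) == 0;
}
```

Keying the decision on the pid keeps //anon events for processes that only provide perf-<pid>.map, which matches the per-process tracking described above.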
      
      ** Testing:
      
      // jitdump case
      
        perf record <app with jitdump>
        perf inject --jit --input perf.data --output perfjit.data
      
      // verify mmap "//anon" events present initially
      
        perf script --input perf.data --show-mmap-events | grep '//anon'
      
      // verify mmap "//anon" events removed
      
        perf script --input perfjit.data --show-mmap-events | grep '//anon'
      
      // no jitdump case
      
        perf record <app without jitdump>
        perf inject --jit --input perf.data --output perfjit.data
      
      // verify mmap "//anon" events present initially
      
        perf script --input perf.data --show-mmap-events | grep '//anon'
      
      // verify mmap "//anon" events not removed
      
        perf script --input perfjit.data --show-mmap-events | grep '//anon'
      
      ** Repro:
      
      This issue was discovered while testing the initial CoreCLR jitdump
      implementation. https://github.com/dotnet/coreclr/pull/26897.
      
      ** Alternate solutions considered
      
      These were also briefly considered:
      
      * Change kernel to not coalesce mmap regions.
      
      * Change kernel reporting of coalesced mmap regions to perf. Only
      include newly mapped memory.
      
      * Only strip parts of // anon mmap events overlapping existing
      jitted-<pid>-<code_index>.so mmap events.
      Signed-off-by: Steve MacLean <Steve.MacLean@Microsoft.com>
      Acked-by: Ian Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1590544271-125795-1-git-send-email-steve.maclean@linux.microsoft.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Merge remote-tracking branch 'torvalds/master' into perf/core · facbf0b9
      Arnaldo Carvalho de Melo authored
      To pick up fixes and move perf/core forward. There was a minor conflict:
      perf_evlist__add_dummy() lost its 'perf_' prefix as it operates on a
      'struct evlist', not on a 'struct perf_evlist', i.e. it's tools/perf/
      specific, it is not in libperf.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 07 Jul, 2020 8 commits
  5. 06 Jul, 2020 24 commits
  6. 05 Jul, 2020 1 commit