1. 11 Mar, 2019 15 commits
  2. 09 Mar, 2019 4 commits
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-5.1-20190307' of... · b339da48
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-5.1-20190307' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core changes from Arnaldo Carvalho de Melo:
      
      perf bpf:
      
        Arnaldo Carvalho de Melo:
      
        - Automatically add BTF ELF markers to 'perf trace' BPF programs, so that
          tools such as 'bpftool map dump' can pretty print map keys and values.
      
      perf c2c:
      
        Jiri Olsa:
      
        - Fix report for empty NUMA node.
      
      perf diff:
      
        Jin Yao:
      
        - Support --time, --cpu, --pid and --tid filter options.
      
      perf probe:
      
        Arnaldo Carvalho de Melo:
      
        - Clarify error message about not finding kernel modules debuginfo.
      
      perf record:
      
        Jiri Olsa:
      
        - Fixup probing for max attr.precise_ip.
      
      perf trace:
      
        Arnaldo Carvalho de Melo:
      
        - Add missing %s lost in the 'msg_flags' recvmmsg arg when adding prefix suppression logic.
      
      perf annotate:
      
        Arnaldo Carvalho de Melo:
      
        - Calculate the max instruction name, align column to that, removing the
          hardcoded max 6 chars and cope with instructions with names longer than that,
          such as vpmovmskb, vpcmpeqb, etc.
      
      kernel:
      
        Song Liu:
      
        - Consider events with attr.bpf_event set as side-band.
      
        Gustavo A. R. Silva:
      
        - Mark expected switch fall-through in perf_event_parse_addr_filter().
      
      Libraries:
      
        Jiri Olsa:
      
        - Fix leaks and double frees on error paths.
      
      libtraceevent:
      
        Tony Jones:
      
        - Fix buffer overflow in arg_eval().
      
      python scripting:
      
        Tony Jones:
      
        - More python3 fixes.
      
      Trivial:
      
        Yang Wei:
      
        - Remove needless extra semicolon in clang C++ glue code.
      
      Intel PT/BTS:
      
        Adrian Hunter:
      
        - Improve auxtrace address filter error message when there is no DSO.
      
        - Fix divide by zero when TSC is not available.
      
        - Further improvements to the export to sqlite/posgresql python scripts
          and to the GUI sqlviewer, exporting 'parent_id' so that we have enable
          the creation of call trees.
      
        Andi Kleen:
      
        - Generalize function to copy from thread addr space from intel-bts code.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b339da48
    • Gustavo A. R. Silva's avatar
      perf/core: Mark expected switch fall-through · 43aa378b
      Gustavo A. R. Silva authored
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warning:
      
        kernel/events/core.c: In function ‘perf_event_parse_addr_filter’:
        kernel/events/core.c:9154:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
            kernel = 1;
            ~~~~~~~^~~
        kernel/events/core.c:9156:3: note: here
           case IF_SRC_FILEADDR:
           ^~~~
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190212205430.GA8446@embeddedorSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      43aa378b
    • Kan Liang's avatar
      perf/x86/intel/uncore: Fix client IMC events return huge result · 8041ffd3
      Kan Liang authored
      The client IMC bandwidth events currently return very large values:
      
        $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a
      
        10.000117222 34,788.76 MiB uncore_imc/data_reads/
        10.000117222 8.26 MiB uncore_imc/data_writes/
        20.000374584 34,842.89 MiB uncore_imc/data_reads/
        20.000374584 10.45 MiB uncore_imc/data_writes/
        30.000633299 37,965.29 MiB uncore_imc/data_reads/
        30.000633299 323.62 MiB uncore_imc/data_writes/
        40.000891548 41,012.88 MiB uncore_imc/data_reads/
        40.000891548 6.98 MiB uncore_imc/data_writes/
        50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
        50.001142480 6.97 MiB uncore_imc/data_writes/
      
      The client IMC events are freerunning counters. They still use the
      old event encoding format (0x1 for data_read and 0x2 for data write).
      The counter bit width is calculated by common code, which assume that
      the standard encoding format is used for the freerunning counters.
      Error bit width information is calculated.
      
      The patch intends to convert the old client IMC event encoding to the
      standard encoding format.
      
      Current common code uses event->attr.config which directly copy from
      user space. We should not implicitly modify it for a converted event.
      The event->hw.config is used to replace the event->attr.config in
      common code.
      
      For client IMC events, the event->attr.config is used to calculate a
      converted event with standard encoding format in the custom
      event_init(). The converted event is stored in event->hw.config.
      For other events of freerunning counters, they already use the standard
      encoding format. The same value as event->attr.config is assigned to
      event->hw.config in common event_init().
      Reported-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Tested-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: default avatarKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@kernel.org # v4.18+
      Fixes: 9aae1780 ("perf/x86/intel/uncore: Clean up client IMC uncore")
      Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8041ffd3
    • Alexander Shishkin's avatar
      perf/ring_buffer: Use high order allocations for AUX buffers optimistically · 5768402f
      Alexander Shishkin authored
      Currently, the AUX buffer allocator will use high-order allocations
      for PMUs that don't support hardware scatter-gather chaining to ensure
      large contiguous blocks of pages, and always use an array of single
      pages otherwise.
      
      There is, however, a tangible performance benefit in using larger chunks
      of contiguous memory even in the latter case, that comes from not having
      to fetch the next page's address at every page boundary. In particular,
      a task running under Intel PT on an Atom CPU shows 1.5%-2% less runtime
      penalty with a single multi-page output region in snapshot mode (no PMI)
      than with multiple single-page output regions, from ~6% down to ~4%. For
      the snapshot mode it does make a difference as it is intended to run over
      long periods of time.
      
      For this reason, change the allocation policy to always optimistically
      start with the highest possible order when allocating pages for the AUX
      buffer, desceding until the allocation succeeds or order zero allocation
      fails.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190215114727.62648-2-alexander.shishkin@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5768402f
  3. 06 Mar, 2019 21 commits
    • Jiri Olsa's avatar
      perf data: Force perf_data__open|close zero data->file.path · b8f7d86b
      Jiri Olsa authored
      Making sure the data->file.path is zeroed on perf_data__open error path
      and in perf_data__close, so we don't double free it in case someone call
      it twice.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Cc: Nageswara R Sastry <nasastry@in.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20190305152536.21035-9-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b8f7d86b
    • Jiri Olsa's avatar
      perf session: Fix double free in perf_data__close · befa09b6
      Jiri Olsa authored
      We can't call perf_data__close and subsequently perf_session__delete,
      because it will call perf_data__close again and cause double free for
      data->file.path.
      
        $ perf report -i .
        incompatible file format (rerun with -v to learn more)
        free(): double free detected in tcache 2
        Aborted (core dumped)
      
      In fact we don't need to call perf_data__close at all, because at the
      time the got out_close is reached, session->data is already initialized,
      so the perf_data__close call will be triggered from
      perf_session__delete.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Cc: Nageswara R Sastry <nasastry@in.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Fixes: 2d4f2799 ("perf data: Add global path holder")
      Link: http://lkml.kernel.org/r/20190305152536.21035-8-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      befa09b6
    • Jiri Olsa's avatar
      perf evsel: Probe for precise_ip with simple attr · 5b61adb1
      Jiri Olsa authored
      Currently we probe for precise_ip with user specified perf_event_attr,
      which might fail because of unsupported kernel features, which would get
      disabled during the open time anyway.
      
      Switching the probe to take place on simple hw cycles, so the following
      record sets proper precise_ip:
      
        # perf record -e cycles:P ls
        # perf evlist -v
        cycles:P: size: 112, ... precise_ip: 3, ...
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Cc: Nageswara R Sastry <nasastry@in.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20190305152536.21035-7-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      5b61adb1
    • Jiri Olsa's avatar
      perf tools: Read and store caps/max_precise in perf_pmu · 90a86bde
      Jiri Olsa authored
      Read the caps/max_precise value and store it in struct perf_pmu to be
      used when setting the maximum precise_ip field in following patch.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Cc: Nageswara R Sastry <nasastry@in.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20190305152536.21035-5-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      90a86bde
    • Jiri Olsa's avatar
      perf hist: Fix memory leak of srcline · 26349585
      Jiri Olsa authored
      We can't allocate he->srcline unconditionaly, only when new hist_entry
      is created. Moving he->srcline allocation into hist_entry__init
      function.
      Original-patch-by: default avatarJonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Suggested-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Nageswara R Sastry <nasastry@in.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20190305152536.21035-4-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      26349585
    • Jiri Olsa's avatar
      perf hist: Add error path into hist_entry__init · c5758910
      Jiri Olsa authored
      Adding error path into hist_entry__init to unify error handling, so
      every new member does not need to free everything else.
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: nageswara r sastry <nasastry@in.ibm.com>
      Link: http://lkml.kernel.org/r/20190305152536.21035-3-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c5758910
    • Jiri Olsa's avatar
      perf c2c: Fix c2c report for empty numa node · e34c9402
      Jiri Olsa authored
      Ravi Bangoria reported that we fail with an empty NUMA node with the
      following message:
      
        $ lscpu
        NUMA node0 CPU(s):
        NUMA node1 CPU(s):   0-4
      
        $ sudo ./perf c2c report
        node/cpu topology bugFailed setup nodes
      
      Fix this by detecting the empty node and keeping its CPU set empty.
      Reported-by: default avatarNageswara R Sastry <nasastry@in.ibm.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20190305152536.21035-2-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e34c9402
    • Tony Jones's avatar
      perf script python: Add Python3 support to intel-pt-events.py · fdf2460c
      Tony Jones authored
      Support both Python2 and Python3 in the intel-pt-events.py script
      
      There may be differences in the ordering of output lines due to
      differences in dictionary ordering etc.  However the format within lines
      should be unchanged.
      
      The use of 'from __future__' implies the minimum supported Python2 version
      is now v2.6
      Signed-off-by: default avatarTony Jones <tonyj@suse.de>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/r/fd26acf9-0c0f-717f-9664-a3c33043ce19@suse.deSigned-off-by: default avatarSeeteena Thoufeek <s1seetee@linux.vnet.ibm.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      fdf2460c
    • Tony Jones's avatar
      perf script python: Add Python3 support to event_analyzing_sample.py · c253c72e
      Tony Jones authored
      Support both Python2 and Python3 in the event_analyzing_sample.py script
      
      There may be differences in the ordering of output lines due to
      differences in dictionary ordering etc.  However the format within lines
      should be unchanged.
      
      The use of 'from __future__' implies the minimum supported Python2 version
      is now v2.6
      Signed-off-by: default avatarTony Jones <tonyj@suse.de>
      Cc: Feng Tang <feng.tang@intel.com>
      Link: http://lkml.kernel.org/r/20190302011903.2416-5-tonyj@suse.deSigned-off-by: default avatarSeeteena Thoufeek <s1seetee@linux.vnet.ibm.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c253c72e
    • Tony Jones's avatar
      perf script python: add Python3 support to check-perf-trace.py · 57e604b1
      Tony Jones authored
      Support both Python 2 and Python 3 in the check-perf-trace.py script.
      
      There may be differences in the ordering of output lines due to
      differences in dictionary ordering etc.  However the format within lines
      should be unchanged.
      
      The use of from __future__ implies the minimum supported version of
      Python2 is now v2.6
      Signed-off-by: default avatarTony Jones <tonyj@suse.de>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Link: http://lkml.kernel.org/r/20190302011903.2416-4-tonyj@suse.deSigned-off-by: default avatarSeeteena Thoufeek <s1seetee@linux.vnet.ibm.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      57e604b1
    • Tony Jones's avatar
      perf script python: Add Python3 support to futex-contention.py · de2ec16b
      Tony Jones authored
      Support both Python2 and Python3 in the futex-contention.py script
      
      There may be differences in the ordering of output lines due to
      differences in dictionary ordering etc.  However the format within lines
      should be unchanged.
      
      The use of 'from __future__' implies the minimum supported Python2 version
      is now v2.6
      Signed-off-by: default avatarTony Jones <tonyj@suse.de>
      Link: http://lkml.kernel.org/r/20190302011903.2416-3-tonyj@suse.deSigned-off-by: default avatarSeeteena Thoufeek <s1seetee@linux.vnet.ibm.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      de2ec16b
    • Tony Jones's avatar
      perf script python: Remove mixed indentation · b504d7f6
      Tony Jones authored
      Remove mixed indentation in Python scripts.  Revert to either all tabs
      (most common form) or all spaces (4 or 8) depending on what was the
      intent of the original commit.  This is necessary to complete Python3
      support as it will flag an error if it encounters mixed indentation.
      Signed-off-by: default avatarTony Jones <tonyj@suse.de>
      Link: http://lkml.kernel.org/r/20190302011903.2416-2-tonyj@suse.deSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b504d7f6
    • Jin Yao's avatar
      perf diff: Support --pid/--tid filter options · c1d3e633
      Jin Yao authored
      Using the existing symbol_conf.pid_list_str and symbol_conf.tid_list_str
      logic.
      
      For example:
      
        perf diff --tid 13965
      
      It'll only diff the samples for thread 13965.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1551791143-10334-4-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      c1d3e633
    • Jin Yao's avatar
      perf diff: Support --cpu filter option · daca23b2
      Jin Yao authored
      To improve 'perf diff', implement a --cpu filter option.
      
      Multiple CPUs can be provided as a comma-separated list with no space:
      0,1.  Ranges of CPUs are specified with -: 0-2. Default is to report
      samples on all CPUs.
      
      For example,
      
        perf diff --cpu 0,1
      
      It only diff the samples for CPU0 and CPU1.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1551791143-10334-3-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      daca23b2
    • Jin Yao's avatar
      perf diff: Support --time filter option · 4802138d
      Jin Yao authored
      To improve 'perf diff', implement a --time filter option to diff the
      samples within given time window.
      
      It supports time percent with multiple time ranges. The time string
      format is 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'.
      
      For example:
      
      Select the second 10% time slice to diff:
      
        perf diff --time 10%/2
      
      Select from 0% to 10% time slice to diff:
      
        perf diff --time 0%-10%
      
      Select the first and the second 10% time slices to diff:
      
        perf diff --time 10%/1,10%/2
      
      Select from 0% to 10% and 30% to 40% slices to diff:
      
        perf diff --time 0%-10%,30%-40%
      
      It also supports analysing samples within a given time window
      <start>,<stop>.
      
      Times have the format seconds.microseconds.
      
      If 'start' is not given (i.e., time string is ',x.y') then analysis starts at
      the beginning of the file.
      
      If the stop time is not given (i.e, time string is 'x.y,') then analysis
      goes to end of file.
      
      Time string is 'a1.b1,c1.d1:a2.b2,c2.d2'. Use ':' to separate timestamps for
      different perf.data files.
      
      For example, we get the timestamp information from perf script.
      
        perf script -i perf.data.old
      
          mgen 13940 [000]  3946.361400: ...
      
        perf script -i perf.data
      
          mgen 13940 [000]  3971.150589 ...
      
        perf diff --time 3946.361400,:3971.150589,
      
      It analyzes the perf.data.old from the timestamp 3946.361400 to the end of
      perf.data.old and analyzes the perf.data from the timestamp 3971.150589 to the
      end of perf.data.
      
       v4:
       ---
       Update abstime_str_dup(), let it return error if strdup
       is failed, and update __cmd_diff() accordingly.
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1551791143-10334-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4802138d
    • Andi Kleen's avatar
      perf thread: Generalize function to copy from thread addr space from intel-bts code · 15325938
      Andi Kleen authored
      Add a utility function to fetch executable code. Convert one
      user over to it. There are more places doing that, but they
      do significantly different actions, so they are not
      easy to fit into a single library function.
      
      Committer changes:
      
      . No need to cast around, make 'buf' be a void pointer.
      
      . Rename it to thread__memcpy() to reflect the fact it is about copying
        a chunk of memory from a thread, i.e. from its address space.
      
      . No need to have it in a separate object file, move it to thread.[ch]
      
      . Check the return of map__load(), the original code didn't do it, but
        since we're moving this around, check that as well.
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/r/20190305144758.12397-2-andi@firstfloor.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      15325938
    • Arnaldo Carvalho de Melo's avatar
      perf annotate: Calculate the max instruction name, align column to that · bc3bb795
      Arnaldo Carvalho de Melo authored
      We were hardcoding '6' as the max instruction name, and we have lots
      that are longer than that, see the diff from two 'P' printed TUI
      annotations for a libc function that uses instructions with long names,
      such as 'vpmovmskb' with its 9 chars:
      
        --- __strcmp_avx2.annotation.before	2019-03-06 16:31:39.368020425 -0300
        +++ __strcmp_avx2.annotation	2019-03-06 16:32:12.079450508 -0300
        @@ -2,284 +2,284 @@
         Event: cycles:ppp
      
         Percent        endbr64
        -  0.10         mov    %edi,%eax
        +  0.10         mov        %edi,%eax
        -               xor    %edx,%edx
        +               xor        %edx,%edx
        -  3.54         vpxor  %ymm7,%ymm7,%ymm7
        +  3.54         vpxor      %ymm7,%ymm7,%ymm7
        -               or     %esi,%eax
        +               or         %esi,%eax
        -               and    $0xfff,%eax
        +               and        $0xfff,%eax
        -               cmp    $0xf80,%eax
        +               cmp        $0xf80,%eax
        -             ↓ jg     370
        +             ↓ jg         370
        - 27.07         vmovdqu (%rdi),%ymm1
        + 27.07         vmovdqu    (%rdi),%ymm1
        -  7.97         vpcmpeqb (%rsi),%ymm1,%ymm0
        +  7.97         vpcmpeqb   (%rsi),%ymm1,%ymm0
        -  2.15         vpminub %ymm1,%ymm0,%ymm0
        +  2.15         vpminub    %ymm1,%ymm0,%ymm0
        -  4.09         vpcmpeqb %ymm7,%ymm0,%ymm0
        +  4.09         vpcmpeqb   %ymm7,%ymm0,%ymm0
        -  0.43         vpmovmskb %ymm0,%ecx
        +  0.43         vpmovmskb  %ymm0,%ecx
        -  1.53         test   %ecx,%ecx
        +  1.53         test       %ecx,%ecx
        -             ↓ je     b0
        +             ↓ je         b0
        -  5.26         tzcnt  %ecx,%edx
        +  5.26         tzcnt      %ecx,%edx
        - 18.40         movzbl (%rdi,%rdx,1),%eax
        + 18.40         movzbl     (%rdi,%rdx,1),%eax
        -  7.09         movzbl (%rsi,%rdx,1),%edx
        +  7.09         movzbl     (%rsi,%rdx,1),%edx
        -  3.34         sub    %edx,%eax
        +  3.34         sub        %edx,%eax
           2.37         vzeroupper
                      ← retq
                        nop
        -         50:   tzcnt  %ecx,%edx
        +         50:   tzcnt      %ecx,%edx
        -               movzbl 0x20(%rdi,%rdx,1),%eax
        +               movzbl     0x20(%rdi,%rdx,1),%eax
        -               movzbl 0x20(%rsi,%rdx,1),%edx
        +               movzbl     0x20(%rsi,%rdx,1),%edx
        -               sub    %edx,%eax
        +               sub        %edx,%eax
                        vzeroupper
                      ← retq
        -               data16 nopw %cs:0x0(%rax,%rax,1)
        +               data16     nopw %cs:0x0(%rax,%rax,1)
      Reported-by: default avatarTravis Downs <travis.downs@gmail.com>
      LPU-Reference: CAOBGo4z1KfmWeOm6Et0cnX5Z6DWsG2PQbAvRn1MhVPJmXHrc5g@mail.gmail.com
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-89wsdd9h9g6bvq52sgp6d0u4@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      bc3bb795
    • Linus Torvalds's avatar
      Merge branch 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6ea98b4b
      Linus Torvalds authored
      Pull x86 alternative instruction updates from Ingo Molnar:
       "Small RDTSCP opimization, enabled by the newly added ALTERNATIVE_3(),
        and other small improvements"
      
      * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/TSC: Use RDTSCP
        x86/alternatives: Add an ALTERNATIVE_3() macro
        x86/alternatives: Print containing function
        x86/alternatives: Add macro comments
      6ea98b4b
    • Linus Torvalds's avatar
      Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 45802da0
      Linus Torvalds authored
      Pull scheduler updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - refcount conversions
      
         - Solve the rq->leaf_cfs_rq_list can of worms for real.
      
         - improve power-aware scheduling
      
         - add sysctl knob for Energy Aware Scheduling
      
         - documentation updates
      
         - misc other changes"
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
        kthread: Do not use TIMER_IRQSAFE
        kthread: Convert worker lock to raw spinlock
        sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
        sched/fair: Remove unused 'sd' parameter from select_idle_smt()
        sched/wait: Use freezable_schedule() when possible
        sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block
        sched/fair: Explain LLC nohz kick condition
        sched/fair: Simplify nohz_balancer_kick()
        sched/topology: Fix percpu data types in struct sd_data & struct s_data
        sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument
        sched/fair: Fix O(nr_cgroups) in the load balancing path
        sched/fair: Optimize update_blocked_averages()
        sched/fair: Fix insertion in rq->leaf_cfs_rq_list
        sched/fair: Add tmp_alone_branch assertion
        sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
        sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
        sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
        sched/fair: Update scale invariance of PELT
        sched/fair: Move the rq_of() helper function
        sched/core: Convert task_struct.stack_refcount to refcount_t
        ...
      45802da0
    • Linus Torvalds's avatar
      Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 203b6609
      Linus Torvalds authored
      Pull perf updates from Ingo Molnar:
       "Lots of tooling updates - too many to list, here's a few highlights:
      
         - Various subcommand updates to 'perf trace', 'perf report', 'perf
           record', 'perf annotate', 'perf script', 'perf test', etc.
      
         - CPU and NUMA topology and affinity handling improvements,
      
         - HW tracing and HW support updates:
            - Intel PT updates
            - ARM CoreSight updates
            - vendor HW event updates
      
         - BPF updates
      
         - Tons of infrastructure updates, both on the build system and the
           library support side
      
         - Documentation updates.
      
         - ... and lots of other changes, see the changelog for details.
      
        Kernel side updates:
      
         - Tighten up kprobes blacklist handling, reduce the number of places
           where developers can install a kprobe and hang/crash the system.
      
         - Fix/enhance vma address filter handling.
      
         - Various PMU driver updates, small fixes and additions.
      
         - refcount_t conversions
      
         - BPF updates
      
         - error code propagation enhancements
      
         - misc other changes"
      
      * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
        perf script python: Add Python3 support to syscall-counts-by-pid.py
        perf script python: Add Python3 support to syscall-counts.py
        perf script python: Add Python3 support to stat-cpi.py
        perf script python: Add Python3 support to stackcollapse.py
        perf script python: Add Python3 support to sctop.py
        perf script python: Add Python3 support to powerpc-hcalls.py
        perf script python: Add Python3 support to net_dropmonitor.py
        perf script python: Add Python3 support to mem-phys-addr.py
        perf script python: Add Python3 support to failed-syscalls-by-pid.py
        perf script python: Add Python3 support to netdev-times.py
        perf tools: Add perf_exe() helper to find perf binary
        perf script: Handle missing fields with -F +..
        perf data: Add perf_data__open_dir_data function
        perf data: Add perf_data__(create_dir|close_dir) functions
        perf data: Fail check_backup in case of error
        perf data: Make check_backup work over directories
        perf tools: Add rm_rf_perf_data function
        perf tools: Add pattern name checking to rm_rf
        perf tools: Add depth checking to rm_rf
        perf data: Add global path holder
        ...
      203b6609
    • Linus Torvalds's avatar
      Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3478588b
      Linus Torvalds authored
      Pull locking updates from Ingo Molnar:
       "The biggest part of this tree is the new auto-generated atomics API
        wrappers by Mark Rutland.
      
        The primary motivation was to allow instrumentation without uglifying
        the primary source code.
      
        The linecount increase comes from adding the auto-generated files to
        the Git space as well:
      
          include/asm-generic/atomic-instrumented.h     | 1689 ++++++++++++++++--
          include/asm-generic/atomic-long.h             | 1174 ++++++++++---
          include/linux/atomic-fallback.h               | 2295 +++++++++++++++++++++++++
          include/linux/atomic.h                        | 1241 +------------
      
        I preferred this approach, so that the full call stack of the (already
        complex) locking APIs is still fully visible in 'git grep'.
      
        But if this is excessive we could certainly hide them.
      
        There's a separate build-time mechanism to determine whether the
        headers are out of date (they should never be stale if we do our job
        right).
      
        Anyway, nothing from this should be visible to regular kernel
        developers.
      
        Other changes:
      
         - Add support for dynamic keys, which removes a source of false
           positives in the workqueue code, among other things (Bart Van
           Assche)
      
         - Updates to tools/memory-model (Andrea Parri, Paul E. McKenney)
      
         - qspinlock, wake_q and lockdep micro-optimizations (Waiman Long)
      
         - misc other updates and enhancements"
      
      * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
        locking/lockdep: Shrink struct lock_class_key
        locking/lockdep: Add module_param to enable consistency checks
        lockdep/lib/tests: Test dynamic key registration
        lockdep/lib/tests: Fix run_tests.sh
        kernel/workqueue: Use dynamic lockdep keys for workqueues
        locking/lockdep: Add support for dynamic keys
        locking/lockdep: Verify whether lock objects are small enough to be used as class keys
        locking/lockdep: Check data structure consistency
        locking/lockdep: Reuse lock chains that have been freed
        locking/lockdep: Fix a comment in add_chain_cache()
        locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count()
        locking/lockdep: Reuse list entries that are no longer in use
        locking/lockdep: Free lock classes that are no longer in use
        locking/lockdep: Update two outdated comments
        locking/lockdep: Make it easy to detect whether or not inside a selftest
        locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock()
        locking/lockdep: Initialize the locks_before and locks_after lists earlier
        locking/lockdep: Make zap_class() remove all matching lock order entries
        locking/lockdep: Reorder struct lock_class members
        locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache
        ...
      3478588b