1. 06 Apr, 2016 7 commits
  2. 01 Apr, 2016 6 commits
    • perf bpf: Add sample types for 'bpf-output' event · d37ba880
      Wang Nan authored
      Before this patch we can see very large timestamps in the events before
      the 'bpf-output' event. For example:
      
        # perf trace -vv -T --ev sched:sched_switch \
                            --ev bpf-output/no-inherit,name=evt/ \
                            --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                            usleep 10
        ...
        18446744073709.551 (18446564645918.480 ms): usleep/4157 nanosleep(rqtp: 0x7ffd3f0dc4e0) ...
        18446744073709.551 (         ): evt:Raise a BPF event!..)
        179427791.076 (         ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
        179427791.081 (         ): sched:sched_switch:usleep:4157 [120] S ==> swapper/2:0 [120])
        ...
      
      We can also see the differences between bpf-output events and
      tracepoint events:
      
      For bpf output event:
         sample_type                    IP|TID|RAW|IDENTIFIER
      
      For tracepoint events:
         sample_type                    IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
      
      This patch fixes these differences by adding more sample types for
      bpf-output events.
      
      After this patch:
      
        # perf trace -vv -T --ev sched:sched_switch \
                            --ev bpf-output/no-inherit,name=evt/ \
                            --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                            usleep 10
        ...
        179877370.878 ( 0.003 ms): usleep/5336 nanosleep(rqtp: 0x7ffff866c450) ...
        179877370.878 (         ): evt:Raise a BPF event!..)
        179877370.878 (         ): perf_bpf_probe:func_begin:(ffffffff810eb9a0))
        179877370.882 (         ): sched:sched_switch:usleep:5336 [120] S ==> swapper/4:0 [120])
        179877370.945 (         ): evt:Raise a BPF event!..)
        ...
      
        # ./perf trace -vv -T --ev sched:sched_switch \
                              --ev bpf-output/no-inherit,name=evt/ \
                              --ev ./test_bpf_trace.c/map:channel.event=evt/ \
                              usleep 10 2>&1 | grep sample_type
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
        sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
      
      The 'IDENTIFIER' info is not required because all events have the same
      sample_type.
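The sample_type strings above are bitmasks built from the PERF_SAMPLE_* constants in <linux/perf_event.h>. As a minimal sketch, the masks before and after the patch can be compared as follows. The names BPF_OUTPUT_BEFORE, BPF_OUTPUT_AFTER and sample_has_time() are made up for illustration and are not part of perf.

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>

/* sample_type shown for bpf-output before the patch: IP|TID|RAW|IDENTIFIER */
#define BPF_OUTPUT_BEFORE \
	(PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_RAW | \
	 PERF_SAMPLE_IDENTIFIER)

/* sample_type shown after the patch: IP|TID|TIME|ID|CPU|PERIOD|RAW */
#define BPF_OUTPUT_AFTER \
	(PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \
	 PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD | \
	 PERF_SAMPLE_RAW)

/* Hypothetical helper: true when samples of this type carry a timestamp. */
static bool sample_has_time(uint64_t sample_type)
{
	return (sample_type & PERF_SAMPLE_TIME) != 0;
}
```

With these definitions, sample_has_time(BPF_OUTPUT_BEFORE) is false and sample_has_time(BPF_OUTPUT_AFTER) is true. The 18446744073709.551 ms shown in the first trace is consistent with an all-ones 64-bit nanosecond value (2^64 - 1 ns) being printed in milliseconds.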
      
      Committer notes:
      
      Further testing, on top of the changes making 'perf trace' avoid samples
      from events without PERF_SAMPLE_TIME:
      
      Before:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        <SNIP>
          0.560 ( 0.001 ms): brk(                                                   ) = 0x55e5a1df8000
          18446640227439.430 (18446640227438.859 ms): nanosleep(rqtp: 0x7ffc96643370) ...
          18446640227439.430 (         ): evt:Raise a BPF event!..)
          0.576 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
          18446640227439.430 (         ): evt:Raise a BPF event!..)
          0.645 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
          0.646 ( 0.076 ms):  ... [continued]: nanosleep()) = 0
        #
      
      After:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        <SNIP>
           0.292 ( 0.001 ms): brk(                          ) = 0x55c7cd6e1000
           0.302 ( 0.004 ms): nanosleep(rqtp: 0x7ffedd8bc0f0) ...
           0.302 (         ): evt:Raise a BPF event!..)
           0.303 (         ): perf_bpf_probe:func_begin:(ffffffff81112460))
           0.397 (         ): evt:Raise a BPF event!..)
           0.397 (         ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))
           0.398 ( 0.100 ms):  ... [continued]: nanosleep()) = 0
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1459517202-42320-1-git-send-email-wangnan0@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Don't set the base timestamp using events without PERF_SAMPLE_TIME · 8a07a809
      Arnaldo Carvalho de Melo authored
      This was causing bogus values to be shown at the timestamp column:
      
      Before:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        94631143.385 ( 0.001 ms): brk(                                     ) = 0x555555757000
        94631143.398 ( 0.003 ms): mmap(len: 4096, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1) = 0x7ffff7ff6000
        94631143.406 ( 0.004 ms): access(filename: 0xf7df9e10, mode: R     ) = -1 ENOENT No such file or directory
        94631143.412 ( 0.004 ms): open(filename: 0xf7df8761, flags: CLOEXEC) = 3
        94631143.415 ( 0.002 ms): fstat(fd: 3, statbuf: 0x7fffffffd6b0     ) = 0
        94631143.419 ( 0.003 ms): mmap(len: 106798, prot: READ, flags: PRIVATE, fd: 3) = 0x7ffff7fdb000
        94631143.420 ( 0.001 ms): close(fd: 3                              ) = 0
        94631143.432 ( 0.004 ms): open(filename: 0xf7ff6640, flags: CLOEXEC) = 3
        <SNIP>
      
      After:
      
        # trace --ev bpf-output/no-inherit,name=evt/ --ev /home/acme/bpf/test_bpf_trace.c/map:channel.event=evt/ usleep 10
        0.022 ( 0.001 ms): brk(                                     ) = 0x55d7668a6000
        0.037 ( 0.003 ms): mmap(len: 4096, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS, fd: -1) = 0x7f8fbeb97000
        0.123 ( 0.083 ms): access(filename: 0xbe995e10, mode: R     ) = -1 ENOENT No such file or directory
        0.130 ( 0.004 ms): open(filename: 0xbe994761, flags: CLOEXEC) = 3
        0.133 ( 0.002 ms): fstat(fd: 3, statbuf: 0x7fff6487a890     ) = 0
        0.138 ( 0.003 ms): mmap(len: 106798, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8fbeb7c000
        0.140 ( 0.001 ms): close(fd: 3                              ) = 0
        0.151 ( 0.004 ms): open(filename: 0xbeb97640, flags: CLOEXEC) = 3
        <SNIP>
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-p7m8llv81iv55ekxexdp5n57@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Introduce function to set the base timestamp · e6001980
      Arnaldo Carvalho de Melo authored
      It is used both in live runs, i.e.:
      
        # trace ls
      
      and when processing events recorded in a perf.data file:
      
        # trace -i perf.data
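A minimal sketch of such a helper is shown below. It assumes the base timestamp is latched from the first sample that carries a timestamp, and it includes the PERF_SAMPLE_TIME guard described in the "Don't set the base timestamp using events without PERF_SAMPLE_TIME" commit. The struct, field and function names are hypothetical simplifications, not the actual 'perf trace' code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SAMPLE_TIME (1U << 2) /* same bit as PERF_SAMPLE_TIME */

/* Hypothetical, trimmed-down stand-in for the tool's state. */
struct sketch_trace {
	uint64_t base_time; /* 0 means "not set yet" */
	bool full_time;     /* assumption: absolute timestamps requested */
};

/* Latch the base timestamp once, and only from samples that have a time. */
static void sketch_set_base_time(struct sketch_trace *trace,
				 uint64_t sample_type, uint64_t sample_time)
{
	if (trace->base_time == 0 && !trace->full_time &&
	    (sample_type & SKETCH_SAMPLE_TIME))
		trace->base_time = sample_time;
}

/* Timestamp relative to the base, in nanoseconds. */
static uint64_t sketch_relative_ns(const struct sketch_trace *trace,
				   uint64_t sample_time)
{
	return sample_time - trace->base_time;
}
```

Calling the same helper from both the live path and the perf.data path keeps the timestamp column consistent between the two modes.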
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-901l6yebnzeqg7z8mbaf49xb@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Fix PMU term format max value calculation · ac0e2cd5
      Kan Liang authored
      Currently the max value of a format is calculated from its number of
      bits, which relies on the format bits being contiguous.
      
      However, uncore event formats are not always contiguous. E.g. the uncore
      qpi event format can be 0-7,21.
      
      If bit 21 is set, parsing fails as shown below.
      
        $ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
        event syntax error: '..pi_0/event=0x200002,umask=0x8/'
                                          \___ value too big for format, maximum is 511
      
      This patch returns the real max value by setting all possible bits to 1.
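The two calculations can be contrasted with a small sketch. It assumes the format is held as a 64-bit mask of allowed bits. The function names are hypothetical, and __builtin_popcountll() is a GCC/Clang builtin standing in for a bit-count helper.

```c
#include <stdint.h>

/* Old approach: derive the maximum from the number of set bits alone,
 * which is only right when the format bits are contiguous from bit 0. */
static uint64_t max_value_by_bit_count(uint64_t format_mask)
{
	int w = __builtin_popcountll(format_mask);

	if (w == 0)
		return 0;
	if (w < 64)
		return (1ULL << w) - 1;
	return UINT64_MAX;
}

/* Approach described in the commit: set every bit the format allows. */
static uint64_t max_value_by_mask(uint64_t format_mask)
{
	uint64_t max = 0;
	int bit;

	for (bit = 0; bit < 64; bit++)
		if (format_mask & (1ULL << bit))
			max |= 1ULL << bit;
	return max;
}
```

For the 0-7,21 format the mask is 0x2000ff. Counting bits gives 9, hence the reported maximum of 511, while setting all allowed bits gives 0x2000ff, which admits event=0x200002.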
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1459365375-14285-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf intel-pt/bts: Define JITDUMP_USE_ARCH_TIMESTAMP · bd0c7a54
      Adrian Hunter authored
      For Intel PT / BTS, define the environment variable that selects TSC
      timestamps in the jitdump file.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1457426333-30260-1-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf jit: Add support for using TSC as a timestamp · 2a28e230
      Adrian Hunter authored
      Intel PT uses TSC as a timestamp, so add support for using TSC instead
      of the monotonic clock.  Use of TSC is selected by an environment
      variable "JITDUMP_USE_ARCH_TIMESTAMP" and flagged in the jitdump file
      with flag JITDUMP_FLAGS_ARCH_TIMESTAMP.
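A rough sketch of how a jitdump writer might act on the environment variable follows. Only the variable name and the flag name come from the commit message. The flag value, the treatment of "0" and the empty string, and all sketch_* names are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumed value for illustration; the real one lives in the jitdump header. */
#define SKETCH_JITDUMP_FLAGS_ARCH_TIMESTAMP (1ULL << 0)

/* Assumption: unset, empty, or "0" keeps the monotonic clock. */
static int sketch_use_arch_timestamp(void)
{
	const char *str = getenv("JITDUMP_USE_ARCH_TIMESTAMP");

	return str && *str && strcmp(str, "0") != 0;
}

/* Flags to record in the file header so readers know the clock source. */
static uint64_t sketch_header_flags(void)
{
	return sketch_use_arch_timestamp() ?
	       SKETCH_JITDUMP_FLAGS_ARCH_TIMESTAMP : 0;
}

/* TSC on x86 when requested, otherwise CLOCK_MONOTONIC in nanoseconds. */
static uint64_t sketch_timestamp(void)
{
	struct timespec ts;

#if defined(__x86_64__) || defined(__i386__)
	if (sketch_use_arch_timestamp())
		return __builtin_ia32_rdtsc();
#endif
	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

Recording the choice as a header flag lets a consumer of the jitdump file tell which clock the timestamps use without inspecting the producer's environment.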
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1457426330-30226-1-git-send-email-adrian.hunter@intel.com
      [ Added the fixup from He Kuang to make it build on other arches, ]
      [ such as aarch64, to avoid inserting this bisection breakage upstream ]
      Link: http://lkml.kernel.org/r/1459482572-129494-1-git-send-email-hekuang@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 31 Mar, 2016 27 commits
    • perf tools: Add time conversion event · 46bc29b9
      Adrian Hunter authored
      Intel PT uses the time members from the perf_event_mmap_page to convert
      between TSC and perf time.
      
      Due to a lack of foresight when Intel PT was implemented, those time
      members were recorded in the (implementation dependent) AUXTRACE_INFO
      event, the structure of which is generally inaccessible outside of the
      Intel PT decoder.  However now the conversion between TSC and perf time
      is needed when processing a jitdump file when Intel PT has been used for
      tracing.
      
      So add a user event to record the time members.  'perf record' will
      synthesize the event if the information is available.  And session
      processing will put a copy of the event on the session so that tools
      like 'perf inject' can easily access it.
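For reference, the conversion that these time members enable can be sketched as below. It follows the formula documented for the time_zero, time_mult and time_shift fields of perf_event_mmap_page. The struct and function names here are hypothetical.

```c
#include <stdint.h>

/* Mirrors the time_shift, time_mult and time_zero members of
 * struct perf_event_mmap_page; the struct itself is hypothetical. */
struct sketch_tsc_conversion {
	uint16_t time_shift;
	uint32_t time_mult;
	uint64_t time_zero;
};

/* Convert a TSC cycle count to perf time in nanoseconds. Splitting the
 * cycle count into quotient and remainder avoids overflowing 64 bits in
 * the multiplication. */
static uint64_t sketch_tsc_to_perf_time(uint64_t cyc,
					const struct sketch_tsc_conversion *tc)
{
	uint64_t quot = cyc >> tc->time_shift;
	uint64_t rem = cyc & (((uint64_t)1 << tc->time_shift) - 1);

	return tc->time_zero + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}
```

With time_shift = 10, time_mult = 1024 and time_zero = 100, one cycle maps to one nanosecond, so 5000 cycles convert to a perf time of 5100.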
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1457426324-30158-1-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Pretty print getrandom() args · 39878d49
      Arnaldo Carvalho de Melo authored
        # trace -e getrandom
        35622.560 ( 0.023 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35622.585 ( 0.006 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35622.594 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35627.395 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        35630.940 ( 0.013 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        35718.613 ( 0.015 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35718.629 ( 0.005 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35718.637 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        35719.355 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        35721.042 ( 0.030 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41090.830 ( 0.012 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41090.845 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41090.851 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41091.750 ( 0.010 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        41091.823 ( 0.006 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41122.078 ( 0.053 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41122.129 ( 0.009 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41122.139 ( 0.004 ms): systemd-udevd/631 getrandom(buf: 0x55621e3c18f0, count: 16, flags: NONBLOCK) = 16
        41124.492 ( 0.007 ms): libvirtd/1353 getrandom(buf: 0x7f7a1bfa35c0, count: 16, flags: NONBLOCK    ) = 16
        41124.470 ( 0.013 ms): fwupd/16120 getrandom(buf: 0x7f63243aa5c0, count: 16, flags: NONBLOCK      ) = 16
        41590.832 ( 0.014 ms): chrome/5957 getrandom(buf: 0x7fabac7b15b0, count: 16, flags: NONBLOCK      ) = 16
        41590.884 ( 0.004 ms): chrome/5957 getrandom(buf: 0x7fabac7b15c0, count: 16, flags: NONBLOCK      ) = 16
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-gca0n1p3aca3depey703ph2q@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Pretty print seccomp() args · 997bba8c
      Arnaldo Carvalho de Melo authored
      E.g:
      
        # trace -e seccomp
         200.061 (0.009 ms): :2441/2441 seccomp(op: FILTER, flags: TSYNC                       ) = -1 EFAULT Bad address
         200.910 (0.121 ms): :2441/2441 seccomp(op: FILTER, flags: TSYNC, uargs: 0x7fff57479fe0) = 0
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-t369uckshlwp4evkks4bcoo7@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf trace: Do not process PERF_RECORD_LOST twice · 3ed5ca2e
      Arnaldo Carvalho de Melo authored
      We catch this record to provide a visual indication that events are
      getting lost, then call the default method to allow extra logging shared
      with the other tools to take place.
      
      This extra logging was done twice because we were continuing to the
      "default" clause where machine__process_event() will end up calling
      machine__process_lost_event() again, fix it.
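The fall-through described above can be illustrated with a toy dispatcher. All sketch_* names are hypothetical stand-ins, and the counter only exists to make the double processing observable.

```c
#include <stdio.h>

enum sketch_record_type { SKETCH_RECORD_SAMPLE, SKETCH_RECORD_LOST };

static int sketch_lost_logged; /* how many times the shared logging ran */

/* Stand-in for the shared "lost event" logging. */
static int sketch_process_lost_event(unsigned long lost)
{
	(void)lost;
	sketch_lost_logged++;
	return 0;
}

/* Stand-in for the generic handler, which also handles lost events. */
static int sketch_machine_process_event(enum sketch_record_type type,
					unsigned long lost)
{
	if (type == SKETCH_RECORD_LOST)
		return sketch_process_lost_event(lost);
	return 0;
}

static int sketch_trace_process_event(enum sketch_record_type type,
				      unsigned long lost)
{
	int ret;

	switch (type) {
	case SKETCH_RECORD_LOST:
		fprintf(stderr, "LOST %lu events!\n", lost);
		ret = sketch_process_lost_event(lost);
		break; /* without this, control falls through to "default" */
	default:
		ret = sketch_machine_process_event(type, lost);
		break;
	}
	return ret;
}
```

With the break in place, one lost record increments sketch_lost_logged exactly once. Removing it would run the shared logging a second time through the default clause.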
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-wus2zlhw3qo24ye84ewu4aqw@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf/ring_buffer: Prepare writing into the ring-buffer from the end · d1b26c70
      Wang Nan authored
      Convert perf_output_begin() to __perf_output_begin() and make the latter
      function able to write records from the end of the ring-buffer.
      
      Following commits will utilize the 'backward' flag.
      
      This is the core patch to support writing to the ring-buffer backwards,
      which will be introduced by upcoming patches to support reading from
      overwritable ring-buffers.
      
      In theory, this patch should not introduce any extra performance
      overhead since we use always_inline, but it does not hurt to double
      check that assumption:
      
      When CONFIG_OPTIMIZE_INLINING is disabled, the output object is nearly
      identical to original one. See:
      
         http://lkml.kernel.org/g/56F52E83.70409@huawei.com
      
      When CONFIG_OPTIMIZE_INLINING is enabled, the resulting object file becomes
      smaller:
      
       $ size kernel/events/ring_buffer.o*
         text       data        bss        dec        hex    filename
         4641          4          8       4653       122d kernel/events/ring_buffer.o.old
         4545          4          8       4557       11cd kernel/events/ring_buffer.o.new
      
      Performance testing results:
      
      Call 'close(-1)' 3000000 times and use gettimeofday() to measure the
      duration.  Use 'perf record -o /dev/null -e raw_syscalls:*' to capture
      system calls. Results are in ns.
      
      Testing environment:
      
       CPU    : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
       Kernel : v4.5.0
      
                           MEAN         STDVAR
        BASE            800214.950    2853.083
        PRE            2253846.700    9997.014
        POST           2257495.540    8516.293
      
      Where 'BASE' is the baseline performance without capturing, 'PRE' is the
      result on an unmodified 'v4.5.0' kernel, and 'POST' is the result after
      this patch.
      
      The POST and PRE means differ by about 3649, which is less than either
      STDVAR value, so this patch doesn't hurt performance beyond the noise
      margin.
      
      For testing details, see:
      
        http://lkml.kernel.org/g/56F89DCD.1040202@huawei.com
      
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-4-git-send-email-wangnan0@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Set event's default ::overflow_handler() · 1879445d
      Wang Nan authored
      Set a default event->overflow_handler in perf_event_alloc() so that we
      don't need to check event->overflow_handler in __perf_event_overflow().
      Following commits can give a different default overflow_handler.
      
      Initial idea comes from Peter:
      
        http://lkml.kernel.org/r/20130708121557.GA17211@twins.programming.kicks-ass.net
      
      Since the default value of event->overflow_handler is not NULL, existing
      'if (!overflow_handler)' checks need to be changed.
      
      is_default_overflow_handler() is introduced for this.
      
      No extra performance overhead is introduced into the hot path because in the
      original code we still need to read this handler from memory. A conditional
      branch is avoided so actually we remove some instructions.
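The pattern can be sketched outside the kernel as follows. The types and sketch_* names are hypothetical. The point is that the allocation path always installs a handler, so the overflow path calls it unconditionally, and "is this the default?" becomes a pointer comparison instead of a NULL check.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_event;
typedef void (*sketch_overflow_handler_t)(struct sketch_event *event);

struct sketch_event {
	sketch_overflow_handler_t overflow_handler;
	int default_hits; /* calls that reached the default handler */
	int custom_hits;  /* calls that reached a user-supplied handler */
};

static void sketch_default_output(struct sketch_event *event)
{
	event->default_hits++;
}

/* Example of a user-supplied handler. */
static void sketch_custom_output(struct sketch_event *event)
{
	event->custom_hits++;
}

/* Allocation-time setup: never leave the handler NULL. */
static void sketch_event_init(struct sketch_event *event,
			      sketch_overflow_handler_t handler)
{
	event->default_hits = 0;
	event->custom_hits = 0;
	event->overflow_handler = handler ? handler : sketch_default_output;
}

/* Replaces the old "if (!overflow_handler)" style of check. */
static bool sketch_is_default_overflow_handler(const struct sketch_event *event)
{
	return event->overflow_handler == sketch_default_output;
}

/* Hot path: no conditional branch, just an indirect call. */
static void sketch_event_overflow(struct sketch_event *event)
{
	event->overflow_handler(event);
}
```

An event initialised with a NULL handler reports sketch_is_default_overflow_handler() as true, while one initialised with sketch_custom_output does not.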
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-3-git-send-email-wangnan0@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/ring_buffer: Introduce new ioctl options to pause and resume the ring-buffer · 86e7972f
      Wang Nan authored
      Add new ioctl() to pause/resume ring-buffer output.
      
      In some situations we want to read from the ring-buffer only when we
      ensure nothing can write to the ring-buffer during reading. Without
      this patch we have to turn off all events attached to this ring-buffer
      to achieve this.
      
      This patch is a prerequisite for enabling overwrite support in the
      perf ring-buffer. Following commits will introduce new methods to
      support reading from an overwrite ring buffer. Before reading, the
      caller must ensure the ring buffer is frozen, or the reading is unreliable.
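From user space, the pause/resume request could be wrapped roughly as below. The commit message does not name the request. This sketch assumes the mainline name PERF_EVENT_IOC_PAUSE_OUTPUT from <linux/perf_event.h>, with a non-zero argument meaning pause and zero meaning resume. The wrapper name is hypothetical.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Pause (paused != 0) or resume (paused == 0) output into the ring
 * buffer behind perf_fd. Returns 0 on success, -1 with errno set on
 * failure, as ioctl() does. */
static int sketch_ring_buffer_set_paused(int perf_fd, uint32_t paused)
{
	return ioctl(perf_fd, PERF_EVENT_IOC_PAUSE_OUTPUT, paused);
}
```

A reader of an overwrite ring buffer would call this with 1, read the buffer while it is frozen, and then call it with 0 to let events flow again.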
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <pi3orama@163.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1459147292-239310-2-git-send-email-wangnan0@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • ftrace/perf: Check sample types only for sampling events · 0a74c5b3
      Jiri Olsa authored
      Currently we check sample type for ftrace:function events
      even if it's not created as a sampling event. That prevents
      creating ftrace_function event in counting mode.
      
      Make sure we check sample types only for sampling events.
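The shape of the check can be sketched as follows. It assumes that an event counts as a sampling event when its sample_period is non-zero. The set of unsupported sample types is passed in as a parameter because the commit message does not list them, and all sketch_* names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the event attributes relevant to the check. */
struct sketch_attr {
	uint64_t sample_period; /* 0 for counting mode */
	uint64_t sample_type;   /* PERF_SAMPLE_* style bitmask */
};

static bool sketch_is_sampling_event(const struct sketch_attr *attr)
{
	return attr->sample_period != 0;
}

/* Returns 0 if the event may be created, -EINVAL otherwise. */
static int sketch_ftrace_function_check(const struct sketch_attr *attr,
					uint64_t unsupported_sample_types)
{
	/* Counting events produce no samples, so sample_type is irrelevant. */
	if (!sketch_is_sampling_event(attr))
		return 0;

	if (attr->sample_type & unsupported_sample_types)
		return -EINVAL;

	return 0;
}
```

With the early return in place, a counting event such as the one created by 'perf stat' passes the check regardless of its sample_type bits.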
      
      Before:
        $ sudo perf stat -e ftrace:function ls
        ...
      
         Performance counter stats for 'ls':
      
           <not supported>      ftrace:function
      
               0.001983662 seconds time elapsed
      
      After:
        $ sudo perf stat -e ftrace:function ls
        ...
      
         Performance counter stats for 'ls':
      
                    44,498      ftrace:function
      
               0.037534722 seconds time elapsed
      Suggested-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1458138873-1553-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/bts: Move transaction start/stop to start/stop callbacks · 981a4cb3
      Alexander Shishkin authored
      As per AUX buffer management requirement, AUX output has to happen between
      pmu::start and pmu::stop calls so that perf_event_stop() actually stops it
      and therefore perf can free the AUX data after it has called pmu::stop.
      
      This patch moves perf_aux_output_{begin,end} from bts_event_{add,del} to
      bts_event_{start,stop}. As a bonus, we get rid of bts_buffer_is_full(),
      which is already taken care of by perf_aux_output_begin() anyway.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-6-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/pt: Move transaction start/stop to PMU start/stop callbacks · 66d21901
      Alexander Shishkin authored
      As per AUX buffer management requirement, AUX output has to happen between
      pmu::start and pmu::stop calls so that perf_event_stop() actually stops it
      and therefore perf can free the AUX data after it has called pmu::stop.
      
      This patch moves perf_aux_output_{begin,end} from pt_event_{add,del} to
      pt_event_{start,stop}. As a bonus, we get rid of pt_buffer_is_full(),
      which is already taken care of by perf_aux_output_begin() anyway.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-5-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/ring_buffer: Document AUX API usage · af5bb4ed
      Alexander Shishkin authored
      In order to ensure safe AUX buffer management, we rely on the assumption
      that pmu::stop() stops its ongoing AUX transaction and not just the hw.
      
      This patch documents this requirement for the perf_aux_output_{begin,end}()
      APIs.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-4-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/core: Free AUX pages in unmap path · 95ff4ca2
      Alexander Shishkin authored
      Now that we can ensure that when ring buffer's AUX area is on the way
      to getting unmapped new transactions won't start, we only need to stop
      all events that can potentially be writing aux data to our ring buffer.
      
      Having done that, we can safely free the AUX pages and corresponding
      PMU data, as this time it is guaranteed to be the last aux reference
      holder.
      
      This partially reverts:
      
        57ffc5ca ("perf: Fix AUX buffer refcounting")
      
      ... which was made to defer deallocation that was otherwise possible
      from an NMI context. Now it is no longer the case; the last call to
      rb_free_aux() that drops the last AUX reference has to happen in
      perf_mmap_close() on that AUX area.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/87d1qtz23d.fsf@ashishki-desk.ger.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops · dcb10a96
      Alexander Shishkin authored
      When the ring buffer's AUX area is unmapped and rb->aux_mmap_count drops to
      zero, new AUX transactions into this buffer can still be started,
      even though the buffer is en route to deallocation.
      
      This patch adds a check to perf_aux_output_begin() for rb->aux_mmap_count
      being zero, in which case there is no point in starting new transactions.
      In other words, the ring buffers that pass a certain point in
      perf_mmap_close() will not have their events sending new data, which
      clears the path for freeing those buffers' pages right there and then,
      provided that no active transactions are holding the AUX reference.
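
      The check described above can be sketched in user-space C. This is a
      minimal illustration, not the kernel code: struct fake_rb, aux_begin(),
      aux_end() and the inc-not-zero reference step are assumptions made for
      the example; only the test for aux_mmap_count being zero comes from the
      message.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ring buffer; only the two counters matter. */
struct fake_rb {
    atomic_int aux_mmap_count;  /* live mappings of the AUX area */
    atomic_int aux_refcount;    /* holders of the AUX pages */
};

/* Returns true if a new AUX transaction may start, taking a reference. */
bool aux_begin(struct fake_rb *rb)
{
    int old;

    /* The AUX area is being unmapped: refuse to start a new transaction. */
    if (atomic_load(&rb->aux_mmap_count) == 0)
        return false;

    /* Assumed for this sketch: take a reference only if one is still held. */
    old = atomic_load(&rb->aux_refcount);
    do {
        if (old == 0)
            return false;
    } while (!atomic_compare_exchange_weak(&rb->aux_refcount, &old, old + 1));

    return true;
}

/* Ends a transaction; returns true if this dropped the last reference. */
bool aux_end(struct fake_rb *rb)
{
    return atomic_fetch_sub(&rb->aux_refcount, 1) == 1;
}
```

      Once aux_mmap_count reaches zero in this model, aux_begin() always
      fails, so the reference count can only go down and whoever drops the
      last reference can free the pages.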
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1457098969-21595-2-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dcb10a96
    • Peter Zijlstra's avatar
      perf/core: Verify we have a single perf_hw_context PMU · 26657848
      Peter Zijlstra authored
      There should (and can) only be a single PMU for perf_hw_context
      events.
      
      This is because of how we schedule events: once a hardware event fails to
      schedule (the PMU is 'full') we stop trying to add more. The trivial
      'fix' would break the Round-Robin scheduling we do.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      26657848
    • Peter Zijlstra's avatar
      perf/x86: Move Kconfig.perf and other perf configuration bits to events/Kconfig · 07dc900e
      Peter Zijlstra authored
      Ingo says:
      
       "If we do a separate file we should have it in arch/x86/events/Kconfig
        (not in arch/x86/Kconfig.perf), and also move some of the other bits,
        such as PERF_EVENTS_AMD_POWER?"
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      07dc900e
    • Huang Rui's avatar
      perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter · aaf24884
      Huang Rui authored
      AMD Zeppelin (Family 17h, Model 00h) introduces an instructions
      retired performance counter, which is indicated by
      CPUID.8000_0008H:EBX[1]. A dedicated Instructions Retired MSR
      (MSR 0xC000_00E9) increments once for every instruction retired.
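
      As a small illustration of the feature test named above, the sketch
      below checks bit 1 of an EBX value that the caller has already obtained
      from CPUID leaf 0x80000008. The function and macro names are
      hypothetical; only the leaf and the bit position come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position taken from the message: CPUID.8000_0008H:EBX[1]. */
#define IRPERF_EBX_BIT 1u

/* ebx is the EBX value returned by CPUID leaf 0x80000008. */
bool cpu_has_irperf(uint32_t ebx)
{
    return ((ebx >> IRPERF_EBX_BIT) & 1u) != 0;
}
```

      Keeping the test a pure function of the register value makes it easy
      to check without executing CPUID on the build machine.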
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1454056197-5893-3-git-send-email-ray.huang@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aaf24884
    • Huang Rui's avatar
      perf/x86/msr: Add AMD PTSC (Performance Time-Stamp Counter) support · 8a224261
      Huang Rui authored
      AMD Carrizo (Family 15h, Model 60h) introduces a time-stamp counter
      which is indicated by CPUID.8000_0001H:ECX[27]. It increments at a
      rate of about 100 MHz in all P-states and C-states, in S0 or S1.
      This counter will be used to calculate processor power and other
      quantities, so add an interface into the MSR PMU to get the PTSC
      counter value.
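
      To make the arithmetic concrete: at a nominal 100 MHz, one PTSC tick
      corresponds to 10 ns. The sketch below tests the CPUID bit named above
      and converts a tick delta to nanoseconds. The names are hypothetical,
      and the exact 100 MHz rate is an assumption (the message says "about").

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position taken from the message: CPUID.8000_0001H:ECX[27]. */
#define PTSC_ECX_BIT  27u
/* Assumed nominal rate; the message only says "about 100 MHz". */
#define PTSC_HZ       100000000ull
#define NSEC_PER_SEC  1000000000ull

/* ecx is the ECX value returned by CPUID leaf 0x80000001. */
bool cpu_has_ptsc(uint32_t ecx)
{
    return ((ecx >> PTSC_ECX_BIT) & 1u) != 0;
}

/* Elapsed nanoseconds between two counter reads (10 ns per tick). */
uint64_t ptsc_delta_ns(uint64_t prev, uint64_t now)
{
    return (now - prev) * (NSEC_PER_SEC / PTSC_HZ);
}
```

      For example, a delta of 100 ticks corresponds to 1000 ns under this
      assumption.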
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1454056197-5893-2-git-send-email-ray.huang@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8a224261
    • Thomas Gleixner's avatar
      x86/perf/intel/cstate: Modularize driver · c7afba32
      Thomas Gleixner authored
      Add the exit function and allow the driver to be built as a module.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20160320185623.658869675@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c7afba32
    • Thomas Gleixner's avatar
      x86/perf/intel/cstate: Sanitize error handling · d29859e7
      Thomas Gleixner authored
      There is no point in WARN_ON() inside of a well known init function. We
      already know the call stack and it's really not of critical importance whether
      the registration of a PMU fails.
      
      Aside from that, for consistency reasons it's just pointless to try to register
      another PMU if the first register attempt failed. There is also no value in
      keeping one PMU if the second one cannot be registered.
      
      Make it consistent so we can finally modularize the driver.
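
      The all-or-nothing policy argued for above can be sketched as follows.
      This is an illustrative model only: struct fake_pmu and the fake_*()
      helpers are hypothetical stand-ins, not the kernel's PMU registration
      API.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PMU; not the kernel's struct pmu. */
struct fake_pmu {
    bool present;     /* does the hardware provide this PMU? */
    bool fail;        /* force registration to fail, for illustration */
    bool registered;
};

static int fake_register(struct fake_pmu *p)
{
    if (p->fail)
        return -1;
    p->registered = true;
    return 0;
}

static void fake_unregister(struct fake_pmu *p)
{
    p->registered = false;
}

/* Returns 0 if every present PMU was registered, negative otherwise. */
int cstate_like_init(struct fake_pmu *core, struct fake_pmu *pkg)
{
    int err;

    if (core->present) {
        err = fake_register(core);
        if (err)
            return err;                 /* do not try the second PMU */
    }

    if (pkg->present) {
        err = fake_register(pkg);
        if (err) {
            if (core->present)
                fake_unregister(core);  /* do not keep a lone PMU */
            return err;
        }
    }

    return 0;
}
```

      Because a failed init leaves nothing registered, a matching exit
      function only has to undo the fully successful case, which is what
      makes building the driver as a module straightforward.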
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20160320185623.579794064@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d29859e7
    • Thomas Gleixner's avatar
      x86/perf/intel/cstate: Sanitize probing · 424646ee
      Thomas Gleixner authored
      The whole probing functionality can simply be expressed with model matching
      and a bunch of structures describing the variants. This is a first step to
      make that driver modular.
      
      While at it, get rid of completely pointless comments and name the enums so
      they are self-explanatory.
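
      A table-driven probe of the kind described above might look like the
      sketch below. Everything in it is hypothetical (the enum names, the
      variant structures and the model numbers are invented for
      illustration); it only shows the shape of "model matching plus
      structures describing the variants".

```c
#include <stddef.h>

/* Named bits keep the variant tables readable. All names are invented. */
enum core_event {
    CORE_C3_RES = 1 << 0,
    CORE_C6_RES = 1 << 1,
    CORE_C7_RES = 1 << 2,
};

/* One structure per hardware variant, listing what it supports. */
struct cstate_variant {
    const char *name;
    unsigned int core_events;
};

static const struct cstate_variant variant_a = {
    "variant-a", CORE_C3_RES | CORE_C6_RES,
};

static const struct cstate_variant variant_b = {
    "variant-b", CORE_C3_RES | CORE_C6_RES | CORE_C7_RES,
};

/* Model numbers below are placeholders, not real CPU models. */
struct model_match {
    unsigned int model;
    const struct cstate_variant *variant;
};

static const struct model_match model_table[] = {
    { 0x10, &variant_a },
    { 0x11, &variant_a },
    { 0x20, &variant_b },
};

/* Returns the variant for a model, or NULL if the model is unsupported. */
const struct cstate_variant *probe_variant(unsigned int model)
{
    size_t i;

    for (i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++)
        if (model_table[i].model == model)
            return model_table[i].variant;

    return NULL;
}
```

      Supporting a new model then means adding one table row rather than
      another branch in the probing code.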
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [ Reworked probing to clear msr[].attr for all !present msrs. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20160320185623.500381872@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      424646ee
    • Thomas Gleixner's avatar
      x86/perf/intel/cstate: Make cstate hotplug handling actually work · 49de0493
      Thomas Gleixner authored
      The current implementation, aside from being an incomprehensible mess, is broken.
      
        # cat /sys/bus/event_source/devices/cstate_core/cpumask
        0-17
      
      That's on a quad socket machine with 72 physical cores! Qualitee stuff.
      
      So it's not a surprise that event migration in case of CPU hotplug does not
      work either.
      
        # perf stat -e cstate_core/c6-residency/ -C 1 sleep 60 &
        # echo 0 >/sys/devices/system/cpu/cpu1/online
      
      Tracing cstate_pmu_event_update gives me:
      
       [001] cstate_pmu_event_update <-event_sched_out
      
      After the fix it properly moves the event:
      
       [001] cstate_pmu_event_update <-event_sched_out
       [073] cstate_pmu_event_update <-__perf_event_read
       [073] cstate_pmu_event_update <-event_sched_out
      
      The migration of pkg events does not work either. Not that I'm surprised.
      
      I really could not be bothered to decode that loop mess and simply replaced it
      by querying the proper cpumasks which give us the answer in a comprehensible
      way.
      
      This also requires directing the event to the currently active reader CPU in
      cstate_pmu_event_init(); otherwise the hotplug logic can't work.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [ Added event->cpu < 0 test to not explode ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20160320185623.422519970@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      49de0493
    • Kan Liang's avatar
      x86/perf/intel/rapl: Make the Intel RAPL PMU driver modular · 4b6e2571
      Kan Liang authored
      By default, the RAPL driver will be built into the kernel. If it is
      configured as a module, it can be auto-loaded on supported CPU models.
      
      Also clean up the code of rapl_pmu_init().
      Based-on-a-patch-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1458372050-2420-2-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4b6e2571
    • Kan Liang's avatar
      x86/perf/intel/uncore: Make the Intel uncore PMU driver modular · e633c65a
      Kan Liang authored
      By default, the uncore driver will be built into the kernel. If it is
      configured as a module, it can be auto-loaded on supported CPU models.
      
      This patch also cleans up the code of uncore_cpu_init() and
      uncore_pci_init().
      Based-on-a-patch-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1458462817-2475-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e633c65a
    • Ingo Molnar's avatar
    • Peter Zijlstra's avatar
      perf/x86/amd/ibs: Fix pmu::stop() nesting · 85dc6002
      Peter Zijlstra authored
      Patch 5a50f529 ("perf/x86/ibs: Fix race with IBS_STARTING state")
      closed a big hole while opening another, smaller hole.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 5a50f529 ("perf/x86/ibs: Fix race with IBS_STARTING state")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      85dc6002
    • Alexander Shishkin's avatar
      perf/core: Don't leak event in the syscall error path · 201c2f85
      Alexander Shishkin authored
      In the error path, event_file not being NULL is used to determine
      whether the event itself still needs to be freed, so fix it up to
      avoid leaking.
      Reported-by: Leon Yu <chianglungyu@gmail.com>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 13005627 ("perf: Do not double free")
      Link: http://lkml.kernel.org/r/87twk06yxp.fsf@ashishki-desk.ger.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      201c2f85
    • Peter Zijlstra's avatar
      perf/core: Fix time tracking bug with multiplexing · 8fdc6539
      Peter Zijlstra authored
      Stephane reported that commit:
      
        3cbaa590 ("perf: Fix ctx time tracking by introducing EVENT_TIME")
      
      introduced a regression wrt. time tracking, as easily observed by:
      
      > This patch introduces a bug in the time tracking of events when
      > multiplexing is used.
      >
      > The issue is easily reproducible with the following perf run:
      >
      >  $ perf stat -a -C 0 -e branches,branches,branches,branches,branches,branches -I 1000
      >      1.000730239            652,394      branches   (66.41%)
      >      1.000730239            597,809      branches   (66.41%)
      >      1.000730239            593,870      branches   (66.63%)
      >      1.000730239            651,440      branches   (67.03%)
      >      1.000730239            656,725      branches   (66.96%)
      >      1.000730239      <not counted>      branches
      >
      > One branches event is shown as not having run. Yet, with
      > multiplexing, all events should run, especially with a 1s (-I 1000)
      > interval. The delta for time_running comes out to 0. Yet, the event
      > has run because the kernel is actually multiplexing the events. The
      > problem is that the time tracking in the kernel, and especially in
      > ctx_sched_out(), is wrong now.
      >
      > The problem arises when the kernel enters ctx_sched_out() with the
      > following state:
      >    ctx->is_active=0x7 event_type=0x1
      >    Call Trace:
      >     [<ffffffff813ddd41>] dump_stack+0x63/0x82
      >     [<ffffffff81182bdc>] ctx_sched_out+0x2bc/0x2d0
      >     [<ffffffff81183896>] perf_mux_hrtimer_handler+0xf6/0x2c0
      >     [<ffffffff811837a0>] ? __perf_install_in_context+0x130/0x130
      >     [<ffffffff810f5818>] __hrtimer_run_queues+0xf8/0x2f0
      >     [<ffffffff810f6097>] hrtimer_interrupt+0xb7/0x1d0
      >     [<ffffffff810509a8>] local_apic_timer_interrupt+0x38/0x60
      >     [<ffffffff8175ca9d>] smp_apic_timer_interrupt+0x3d/0x50
      >     [<ffffffff8175ac7c>] apic_timer_interrupt+0x8c/0xa0
      >
      > In that case, the test:
      >       if (is_active & EVENT_TIME)
      >
      > will be false and the time will not be updated. Time must always be updated on
      > sched out.
      
      Fix this by always updating time if EVENT_TIME was set, as opposed to
      only updating time when EVENT_TIME changed.
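
      The difference between "EVENT_TIME was set" and "EVENT_TIME changed"
      can be shown with the reported state (is_active=0x7, event_type=0x1).
      The sketch below is a simplified model, not the kernel code: the flag
      values are assumptions chosen to be consistent with that state, and
      the three functions are hypothetical.

```c
#include <stdbool.h>

/* Assumed flag values, consistent with is_active=0x7 in the report. */
enum {
    EVENT_FLEXIBLE = 0x1,
    EVENT_PINNED   = 0x2,
    EVENT_TIME     = 0x4,
    EVENT_ALL      = EVENT_FLEXIBLE | EVENT_PINNED,
};

/* State left active after scheduling out the groups in event_type. */
unsigned int next_active(unsigned int is_active, unsigned int event_type)
{
    unsigned int next = is_active & ~event_type;

    /* Time keeps running as long as any group remains active. */
    if (!(next & EVENT_ALL))
        next &= ~EVENT_TIME;

    return next;
}

/* Old behaviour: update time only if the EVENT_TIME bit changed. */
bool updates_time_buggy(unsigned int is_active, unsigned int event_type)
{
    unsigned int changed = is_active ^ next_active(is_active, event_type);

    return (changed & EVENT_TIME) != 0;
}

/* Fixed behaviour: update time whenever EVENT_TIME was set on entry. */
bool updates_time_fixed(unsigned int is_active, unsigned int event_type)
{
    (void)event_type;   /* the decision depends only on the entry state */

    return (is_active & EVENT_TIME) != 0;
}
```

      With is_active=0x7 and event_type=0x1, the pinned group stays active,
      so the EVENT_TIME bit does not change and the old test skips the time
      update, while the fixed test still performs it.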
      Reported-by: Stephane Eranian <eranian@google.com>
      Tested-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kan.liang@intel.com
      Cc: namhyung@kernel.org
      Fixes: 3cbaa590 ("perf: Fix ctx time tracking by introducing EVENT_TIME")
      Link: http://lkml.kernel.org/r/20160329072644.GB3408@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8fdc6539