1. 28 Dec, 2016 2 commits
    • samples/bpf sock_example: Avoid getting ethhdr from two includes · ee12996c
      Arnaldo Carvalho de Melo authored
      To avoid the following build failure on Alpine Linux 3.4, which has
      clang-3.8 with the bpf target:
      
          HOSTCC  samples/bpf/sock_example.o
        In file included from /usr/include/net/ethernet.h:10:0,
                         from /git/linux/samples/bpf/sock_example.h:7,
                         from /git/linux/samples/bpf/sock_example.c:30:
        /usr/include/netinet/if_ether.h:96:8: error: redefinition of 'struct
        ethhdr'
         struct ethhdr {
                ^
        In file included from /git/linux/samples/bpf/sock_example.c:26:0:
        ./usr/include/linux/if_ether.h:144:8: note: originally defined here
         struct ethhdr {
                ^
        scripts/Makefile.host:124: recipe for target
        'samples/bpf/sock_example.o' failed
        make[2]: *** [samples/bpf/sock_example.o] Error 1
        /git/linux/Makefile:1658: recipe for target 'samples/bpf/' failed
      
      So include linux/if_ether.h for the needs of sock_example.h, the same
      include that sock_example.c uses.
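      A minimal sketch of the rule this fix follows, assuming a Linux host with
      the kernel UAPI headers installed: take struct ethhdr from exactly one
      header, <linux/if_ether.h>, and do not also include <net/ethernet.h>,
      which on Alpine's libc pulls in a second definition via
      <netinet/if_ether.h>, as the error above shows. frame_ethertype() is a
      hypothetical helper for illustration, not part of samples/bpf.

```c
#include <arpa/inet.h>      /* ntohs() */
#include <string.h>         /* memcpy(), size_t */
#include <linux/if_ether.h> /* struct ethhdr, ETH_HLEN, ETH_P_IP: sole ethhdr source */

/* Hypothetical helper: return the EtherType of a raw frame in host byte
 * order, or -1 if the buffer is too short for an Ethernet header. */
static int frame_ethertype(const unsigned char *frame, size_t len)
{
	struct ethhdr eth;

	if (len < sizeof(eth))
		return -1;
	memcpy(&eth, frame, sizeof(eth)); /* copy out to avoid unaligned access */
	return ntohs(eth.h_proto);
}
```

      For an IPv4 frame (bytes 12-13 equal to 0x08 0x00) this returns
      ETH_P_IP (0x0800).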
      
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Joe Stringer <joe@ovn.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-m9avekl1b651qe1r1zd5tzz9@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched timehist: Show total scheduling time · 9396c9cb
      Namhyung Kim authored
      Show the length of the analyzed sample time and the share of it spent
      running the idle task. This also honours the time range given by the
      --time option.
      
        $ perf sched timehist -sI | tail
        Samples do not have callchains.
        Idle stats:
            CPU  0 idle for    930.316  msec  ( 92.93%)
            CPU  1 idle for    963.614  msec  ( 96.25%)
            CPU  2 idle for    885.482  msec  ( 88.45%)
            CPU  3 idle for    938.635  msec  ( 93.76%)
      
            Total number of unique tasks: 118
        Total number of context switches: 2337
                   Total run time (msec): 3718.048
            Total scheduling time (msec): 1001.131  (x 4)
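      In the output above, each idle percentage matches that CPU's idle time
      divided by the total scheduling time (930.316 / 1001.131 is about
      92.93%), and "(x 4)" presumably reflects the 4 CPUs. A minimal sketch of
      that arithmetic; idle_percent() and format_idle_line() are hypothetical
      helpers, not perf's code, and the format string only mimics the layout.

```c
#include <stdio.h> /* snprintf(), size_t */

/* Hypothetical helper: percentage of the analyzed window a CPU spent idle. */
static double idle_percent(double idle_msec, double total_sched_msec)
{
	if (total_sched_msec <= 0.0)
		return 0.0; /* empty window: avoid dividing by zero */
	return idle_msec * 100.0 / total_sched_msec;
}

/* Format one "Idle stats" line in a layout similar to the one above. */
static int format_idle_line(char *buf, size_t len, int cpu,
			    double idle_msec, double total_sched_msec)
{
	return snprintf(buf, len, "CPU %2d idle for %10.3f  msec  (%6.2f%%)",
			cpu, idle_msec,
			idle_percent(idle_msec, total_sched_msec));
}
```

      With cpu 0, 930.316 and 1001.131 this produces
      "CPU  0 idle for    930.316  msec  ( 92.93%)".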
      Suggested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161222060350.17655-3-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 23 Dec, 2016 1 commit
  3. 22 Dec, 2016 6 commits
    • perf sched timehist: Fix invalid period calculation · bdd75729
      Namhyung Kim authored
      When the --time option is given with a value outside the recorded time,
      the last sample time (tprev) was set to that value and the run time
      calculation could be incorrect. This affects the first sample on each
      CPU: the runtime update is normally skipped while tprev is 0, but with
      --time tprev had a non-zero (and invalid) value, so the calculation was
      also incorrect.
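      The accounting rule can be sketched as a guard on the per-CPU previous
      timestamp. This is a minimal model, not the code in builtin-sched.c; it
      assumes nanosecond timestamps, tprev == 0 meaning "no earlier sample on
      this CPU", and a window start of 0 meaning "no --time given".
      runtime_delta() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical sketch: run time to charge for a sample at time t, given the
 * previous sample time tprev on the same CPU (0 = none yet) and the start of
 * the --time window (0 = no window). All values are in nanoseconds. */
static uint64_t runtime_delta(uint64_t t, uint64_t tprev, uint64_t win_start)
{
	/* First sample on this CPU: there is no period to measure yet. The bug
	 * was that the window start ended up in tprev here, so the whole gap
	 * since the window start was charged as run or idle time. */
	if (tprev == 0)
		return 0;

	/* Only clamp a real previous timestamp to the window start. */
	if (win_start && tprev < win_start)
		tprev = win_start;

	return t > tprev ? t - tprev : 0;
}
```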
      
      For example, consider the following:
      
        $ perf sched timehist
                   time    cpu  task name                       wait time  sch delay   run time
                                [tid/pid]                          (msec)     (msec)     (msec)
        --------------- ------  ------------------------------  ---------  ---------  ---------
            3195.968367 [0003]  <idle>                              0.000      0.000      0.000
            3195.968386 [0002]  Timer[4306/4277]                    0.000      0.000      0.018
            3195.968397 [0002]  Web Content[4277]                   0.000      0.000      0.000
            3195.968595 [0001]  JS Helper[4302/4277]                0.000      0.000      0.000
            3195.969217 [0000]  <idle>                              0.000      0.000      0.621
            3195.969251 [0001]  kworker/1:1H[291]                   0.000      0.000      0.033
      
      The samples start at 3195.968367, but when given a time interval from
      3194 to 3196 (in seconds) it counts the whole 2 seconds as runtime.
      Below, 2 CPUs accounted it as runtime and the other 2 CPUs accounted
      it as idle time.
      
      Before:
      
        $ perf sched timehist --time 3194,3196 -s | tail
        Idle stats:
            CPU  0 idle for   1995.991  msec
            CPU  1 idle for     20.793  msec
            CPU  2 idle for     30.191  msec
            CPU  3 idle for   1999.852  msec
      
            Total number of unique tasks: 23
        Total number of context switches: 128
                   Total run time (msec): 3724.940
      
      After:
      
        $ perf sched timehist --time 3194,3196 -s | tail
        Idle stats:
            CPU  0 idle for     10.811  msec
            CPU  1 idle for     20.793  msec
            CPU  2 idle for     30.191  msec
            CPU  3 idle for     18.337  msec
      
            Total number of unique tasks: 23
        Total number of context switches: 128
                   Total run time (msec): 18.139
      
      Committer notes:
      
      Further testing:
      
      Before:
      
        Idle stats:
            CPU  0 idle for    229.785  msec
            CPU  1 idle for    937.944  msec
            CPU  2 idle for    188.931  msec
            CPU  3 idle for    986.185  msec
      
      After:
      
        # perf sched timehist --time 40602,40603 -s | tail
      
        Idle stats:
            CPU  0 idle for    229.785  msec
            CPU  1 idle for    175.407  msec
            CPU  2 idle for    188.931  msec
            CPU  3 idle for    223.657  msec
      
            Total number of unique tasks: 68
        Total number of context switches: 814
                   Total run time (msec): 97.688
      
        # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done
        CPU 0 idle for 229.721 msec (entries: 123)
        CPU 1 idle for 175.381 msec (entries: 65)
        CPU 2 idle for 188.903 msec (entries: 56)
        CPU 3 idle for 223.61 msec (entries: 102)
      
      The difference is due to the idle stats being accounted at nanosecond
      precision, while the <idle> entries in 'perf sched timehist' are
      truncated at msec.usec.
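      To see why the per-entry sums come out slightly lower: truncating each
      entry to microseconds drops up to 999 ns per entry, while the idle stats
      keep nanoseconds and convert once. That fits the numbers above (for
      CPU 0, 229.785 - 229.721 = 0.064 msec over 123 entries). A minimal
      sketch of the two ways of summing; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum at full nanosecond precision, converting to microseconds once. */
static uint64_t sum_exact_usec(const uint64_t *ns, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += ns[i];
	return total / 1000;
}

/* Sum entries that were each truncated to microseconds first, as when
 * adding up printed msec.usec values. */
static uint64_t sum_truncated_usec(const uint64_t *ns, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += ns[i] / 1000;
	return total;
}
```

      Three entries of 1999 ns sum to 5 usec exactly but to only 3 usec when
      truncated per entry.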
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Fixes: 853b7407 ("perf sched timehist: Add option to specify time window of interest")
      Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched timehist: Remove hardcoded 'comm_width' check at print_summary · 4fa0d1aa
      Namhyung Kim authored
      Now that the default 'comm_width' value is 30, there is no need to check
      it in print_summary().
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
      [ Split from a larger patch ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched timehist: Enlarge default 'comm_width' · 9b8087d7
      Namhyung Kim authored
      The current default value is 20, but it is easily exceeded when a task
      has a long name and a tid that differs from its pid, which leaves the
      output misaligned. So change it to the larger value the summary uses.
      
      Committer notes:
      
      Before:
      
        # perf sched record
        ^C
        # perf sched timehist
        <SNIP>
          40602.770537 [0001]  rcuos/2[29]               7.970      0.002      0.020
          40602.771512 [0003]  <idle>                    0.003      0.000      0.986
          40602.771586 [0001]  <idle>                    0.020      0.000      1.049
          40602.771606 [0001]  qemu-system-x86[3593/3510]      0.000      0.002      0.020
          40602.771629 [0003]  qemu-system-x86[3510]           0.000      0.003      0.116
          40602.771776 [0000]  <idle>                          0.001      0.000      1.892
        <SNIP>
      
      After:
      
        # perf sched timehist
        <SNIP>
         40602.770537 [0001]  rcuos/2[29]                         7.970      0.002      0.020
         40602.771512 [0003]  <idle>                              0.003      0.000      0.986
         40602.771586 [0001]  <idle>                              0.020      0.000      1.049
         40602.771606 [0001]  qemu-system-x86[3593/3510]          0.000      0.002      0.020
         40602.771629 [0003]  qemu-system-x86[3510]               0.000      0.003      0.116
        <SNIP>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
      [ Split from a larger patch ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched timehist: Honour 'comm_width' when aligning the headers · 0e6758e8
      Namhyung Kim authored
      The current default value is 20, but that may change in the future, so
      make the places where 20 is hardcoded use 'comm_width' instead.
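      The idea behind these three comm_width patches can be sketched with
      printf-style field widths: pass the width as a '*' argument so the
      header and every row share one variable instead of a hardcoded 20. This
      is a minimal sketch; the format strings and function names are
      illustrative, not copied from builtin-sched.c.

```c
#include <stdio.h> /* snprintf(), size_t */

/* Width of the task name column; hypothetical stand-in for comm_width. */
static int comm_width = 30;

/* Header and rows use the same width, so the columns line up. */
static int format_header(char *buf, size_t len)
{
	return snprintf(buf, len, "%15s %6s  %-*s  %9s  %9s  %9s",
			"time", "cpu", comm_width, "task name",
			"wait time", "sch delay", "run time");
}

static int format_row(char *buf, size_t len, double ts, int cpu,
		      const char *comm, double wait, double delay, double run)
{
	return snprintf(buf, len, "%15.6f [%04d]  %-*s  %9.3f  %9.3f  %9.3f",
			ts, cpu, comm_width, comm, wait, delay, run);
}
```

      With a width of 30, "rcuos/2[29]" and "qemu-system-x86[3593/3510]" rows
      have the same length as the header; with 20, the longer name overflows
      its column, as in the "Before" output of 9b8087d7.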
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
      [ Split from a larger patch ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf/x86: Fix overlap counter scheduling bug · 1134c2b5
      Peter Zijlstra authored
      Jiri reported the overlap scheduling exceeding its max stack.
      
      Looking at the constraint that triggered this, it turns out the
      overlap marker isn't needed.
      
      The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if
      the counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight".
      
      Especially that latter part is of interest here, I think: our
      overlapping mask is 0x0e, which has 3 bits set and is the highest weight
      mask on the PMU, therefore it will be placed last. Can we still create a
      scenario where we would need to rewind that?
      
      The scenario for AMD Fam15h is that we have masks like:
      
      	0x3F -- 111111
      	0x38 -- 111000
      	0x07 -- 000111
      
      	0x09 -- 001001
      
      And we mark 0x09 as overlapping, because it is not a direct subset of
      0x38 or 0x07 and has less weight than either of those. This means we'll
      first try and place the 0x09 event, then try and place 0x38/0x07 events.
      Now imagine we have:
      
      	3 * 0x07 + 0x09
      
      and the initial pick for the 0x09 event is counter 0, then we'll fail to
      place all 0x07 events. So we'll pop back, try counter 3 for the 0x09
      event, and then re-try all 0x07 events, which will now work.
      
      The masks on the PMU in question are:
      
        0x01 - 0001
        0x03 - 0011
        0x0e - 1110
        0x0c - 1100
      
      But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 ->
      0x1) are of heavier weight, it should all work out.
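      To make the rewind scenario concrete, here is a toy model, not the
      kernel's perf_assign_events(): events are placed in ascending weight
      order, each on the lowest free counter its mask allows. The greedy
      variant never revisits a choice; the backtracking variant pops back and
      retries, which is what the overlap marker enables. Counters are numbered
      from 0 (bit 0 = counter 0); all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Constraint weight: number of counters the event may use. */
static int weight(unsigned int mask)
{
	int w = 0;

	for (; mask; mask &= mask - 1) /* clear the lowest set bit */
		w++;
	return w;
}

/* Greedy placement: masks[] is assumed sorted by ascending weight; each
 * event takes the lowest free counter its mask allows, never revisited. */
static bool schedule_greedy(const unsigned int *masks, size_t n)
{
	unsigned int used = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int avail = masks[i] & ~used;

		if (!avail)
			return false;
		used |= avail & -avail; /* lowest free allowed counter */
	}
	return true;
}

/* Backtracking placement: when a later event cannot be placed, pop back
 * and move an earlier event to its next allowed counter. */
static bool schedule_backtrack(const unsigned int *masks, size_t n,
			       unsigned int used)
{
	if (n == 0)
		return true;
	for (unsigned int avail = masks[0] & ~used; avail; avail &= avail - 1) {
		unsigned int bit = avail & -avail;

		if (schedule_backtrack(masks + 1, n - 1, used | bit))
			return true;
	}
	return false;
}
```

      For 0x09 + 3 * 0x07, greedy puts 0x09 on counter 0 and fails, while
      backtracking moves it to counter 3 and succeeds. For one event each of
      0x01, 0x03, 0x0c and 0x0e, greedy already succeeds. This checks a single
      event mix; it illustrates the argument rather than proving it.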
      Reported-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Liang Kan <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/pebs: Fix handling of PEBS buffer overflows · daa864b8
      Stephane Eranian authored
      This patch solves a race condition between PEBS and the PMU handler.
      
      In case multiple PEBS events are sampled at the same time,
      it is possible to have GLOBAL_STATUS bit 62 set, indicating a
      PEBS buffer overflow, and also to see up to 3 PEBS counters
      with their bits set in the status register. This is a sign
      that there was at least one PEBS record pending at the time
      of the PMU interrupt. PEBS counters must only be processed
      via the drain_pebs() calls, and not via the regular sample
      processing loop that follows that function; otherwise
      phony regular samples, not marked with the EXACT tag, may be
      generated in the sampling buffer.
      
      Another possibility is to have one PEBS event and at least
      one non-PEBS event which overflows while PEBS is armed. In this
      case, bit 62 of GLOBAL_STATUS will not be set, yet on Skylake the
      overflow status bit for the PEBS counter will be set.
      
      To avoid this problem, we systematically ignore the PEBS-enabled
      counters from the GLOBAL_STATUS mask and we always process PEBS
      events via drain_pebs().
      
      The problem manifested itself by having non-exact samples when
      sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
      not have the EXACT flag set.
      
      Note that this problem is only present on Skylake processors.
      This fix is harmless on older processors.
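      A user-space model of the status filtering described above, assuming a
      64-bit GLOBAL_STATUS value and a pebs_enabled bitmask of PEBS counters.
      The struct and function names are hypothetical; the real interrupt
      handler in the kernel does more than this.

```c
#include <stdbool.h>
#include <stdint.h>

#define PEBS_BUFFER_OVF_BIT 62 /* GLOBAL_STATUS bit 62, as described above */

struct ovf_work {
	bool drain_pebs;  /* take the drain_pebs() path */
	uint64_t regular; /* counters left for the regular sample loop */
};

static struct ovf_work classify_status(uint64_t status, uint64_t pebs_enabled)
{
	struct ovf_work w;

	w.drain_pebs = (status >> PEBS_BUFFER_OVF_BIT) & 1;
	status &= ~(1ULL << PEBS_BUFFER_OVF_BIT);

	/* The fix: never let a PEBS-enabled counter reach the regular loop,
	 * even if its overflow bit is set (as seen on Skylake); its records are
	 * only emitted, with the EXACT flag, via drain_pebs(). */
	w.regular = status & ~pebs_enabled;
	return w;
}
```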
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 20 Dec, 2016 11 commits
    • Merge tag 'perf-core-for-mingo-20161220' of... · 03756917
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-20161220' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core improvements and fixes:
      
      New features:
      
       - Introduce 'perf sched timehist --idle', to analyse processes
         going to/from idle state (Namhyung Kim)
      
      Fixes:
      
       - Allow 'perf record -u user' to continue when facing races with threads
         going away after having scanned them via /proc (Jiri Olsa)
      
       - Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)
      
       - Support jumps with multiple arguments (Ravi Bangoria)
      
       - Fix jumps to before the function where they are located (Ravi Bangoria)
      
       - Fix lock-pi help string (Davidlohr Bueso)
      
       - Fix build of 'perf trace' on odd systems such as a RHEL PPC one (Jiri Olsa)
      
       - Do not overwrite valid build id in 'perf diff' (Kan Liang)
      
       - Don't throw error for zero length symbols, allowing the use of the TUI
         in PowerPC, where such symbols became more common recently (Ravi Bangoria)
      
      Infrastructure changes:
      
       - Switch samples/bpf/ to use tools/lib/bpf, removing libbpf
         duplication (Joe Stringer)
      
       - Move headers check into bash script (Jiri Olsa)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Move open_raw_sock to separate header · 9899694a
      Joe Stringer authored
      This function was declared in libbpf.c and was the only remaining
      function in this library, but has nothing to do with BPF. Shift it out
      into a new header, sock_example.h, and include it from the relevant
      samples.
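      For reference, a minimal sketch of what such a helper typically looks
      like, assuming Linux AF_PACKET sockets. The body is illustrative rather
      than a copy of sock_example.h, and the early if_nametoindex() check is an
      addition made here so that an unknown interface name fails fast.
      Creating the socket itself needs CAP_NET_RAW.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>       /* htons() */
#include <net/if.h>          /* if_nametoindex() */
#include <sys/socket.h>
#include <linux/if_ether.h>  /* ETH_P_ALL */
#include <linux/if_packet.h> /* struct sockaddr_ll */

/* Open a raw packet socket bound to interface 'name'; -1 on failure. */
static inline int open_raw_sock(const char *name)
{
	struct sockaddr_ll sll;
	unsigned int ifindex = if_nametoindex(name);
	int sock;

	if (ifindex == 0) /* unknown interface (check added in this sketch) */
		return -1;

	sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (sock < 0) {
		fprintf(stderr, "cannot create raw socket: %s\n", strerror(errno));
		return -1;
	}

	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_ifindex = ifindex;
	sll.sll_protocol = htons(ETH_P_ALL);
	if (bind(sock, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		fprintf(stderr, "bind to %s: %s\n", name, strerror(errno));
		close(sock);
		return -1;
	}
	return sock;
}
```

      A sample would call something like open_raw_sock("lo") and attach its
      BPF program to the returned descriptor.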
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Remove perf_event_open() declaration · 205c8ada
      Joe Stringer authored
      This declaration was made in samples/bpf/libbpf.c for convenience, but
      there's already one in tools/perf/perf-sys.h. Reuse that one.
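      Such a declaration is needed at all because glibc provides no
      perf_event_open() wrapper, so tools call it through syscall(2). A
      minimal sketch of a wrapper along the lines of the sys_perf_event_open()
      in tools/perf/perf-sys.h; the version there may differ in detail.

```c
#define _GNU_SOURCE           /* syscall() prototype in <unistd.h> */
#include <sys/syscall.h>      /* __NR_perf_event_open */
#include <sys/types.h>        /* pid_t */
#include <unistd.h>           /* syscall() */
#include <linux/perf_event.h> /* struct perf_event_attr */

/* glibc has no perf_event_open() wrapper: invoke the system call directly.
 * Returns a new fd, or -1 with errno set. */
static inline int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
				      int cpu, int group_fd, unsigned long flags)
{
	return (int)syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```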
      
      Committer notes:
      
      Testing it:
      
        $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
        make[1]: Entering directory '/home/build/v4.9.0-rc8+'
          CHK     include/config/kernel.release
          GEN     ./Makefile
          CHK     include/generated/uapi/linux/version.h
          Using /home/acme/git/linux as source for kernel
          CHK     include/generated/utsrelease.h
          CHK     include/generated/timeconst.h
          CHK     include/generated/bounds.h
          CHK     include/generated/asm-offsets.h
          CALL    /home/acme/git/linux/scripts/checksyscalls.sh
          HOSTCC  samples/bpf/test_verifier.o
          HOSTCC  samples/bpf/libbpf.o
          HOSTCC  samples/bpf/../../tools/lib/bpf/bpf.o
          HOSTCC  samples/bpf/test_maps.o
          HOSTCC  samples/bpf/sock_example.o
          HOSTCC  samples/bpf/bpf_load.o
      <SNIP>
          HOSTLD  samples/bpf/trace_event
          HOSTLD  samples/bpf/sampleip
          HOSTLD  samples/bpf/tc_l2_redirect
        make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
        $
      
      Also tested the offwaketime resulting from the rebuild, seems to work as
      before.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org
      [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Be consistent with bpf_load_program bpf_insn parameter · 811b4f0d
      Arnaldo Carvalho de Melo authored
      Only one of the examples declares the bpf_insn bpf proggie as const:
      
        $ grep 'struct bpf_insn [a-z]' samples/bpf/*.c
        samples/bpf/fds_example.c:	static const struct bpf_insn insns[] = {
        samples/bpf/sock_example.c:	struct bpf_insn prog[] = {
        samples/bpf/test_cgrp2_attach2.c:	struct bpf_insn prog[] = {
        samples/bpf/test_cgrp2_attach.c:	struct bpf_insn prog[] = {
        samples/bpf/test_cgrp2_sock.c:	struct bpf_insn prog[] = {
        $
      
      Which causes this warning:
      
        [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
        <SNIP>
           HOSTCC  samples/bpf/fds_example.o
        /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
        /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
              insns, insns_cnt, "GPL", 0,
              ^~~~~
        In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                         from /git/linux/samples/bpf/bpf_load.h:4,
                         from /git/linux/samples/bpf/fds_example.c:15:
        /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
         int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
             ^~~~~~~~~~~~~~~~
          HOSTCC  samples/bpf/sockex1_user.o
      
      So just ditch that 'const' to reduce build noise, leaving changing the
      bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
      adequate.
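      The alternative left for a later patch, a const-qualified parameter, can
      be sketched with a hypothetical helper that only reads the program: it
      accepts both 'struct bpf_insn prog[]' and 'static const struct bpf_insn
      insns[]' without -Wdiscarded-qualifiers. count_exits() is illustrative
      and not part of libbpf.

```c
#include <stddef.h>
#include <linux/bpf.h> /* struct bpf_insn, BPF_JMP, BPF_EXIT */

/* Hypothetical helper with a const-correct parameter. A non-const
 * 'struct bpf_insn *' parameter, like bpf_load_program()'s at the time,
 * would warn when handed a const array. */
static size_t count_exits(const struct bpf_insn *insns, size_t cnt)
{
	size_t n = 0;

	for (size_t i = 0; i < cnt; i++)
		if (insns[i].code == (BPF_JMP | BPF_EXIT))
			n++;
	return n;
}
```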
      
      Cc: Joe Stringer <joe@ovn.org>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-1z5xee8n3oa66jf62bpv16ed@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • tools lib bpf: Add bpf_prog_{attach,detach} · 5dc880de
      Joe Stringer authored
      Commit d8c5b17f ("samples: bpf: add userspace example for attaching
      eBPF programs to cgroups") added these functions to samples/libbpf, but
      during this merge all of the samples libbpf functionality is shifting to
      tools/lib/bpf. Shift these functions there.
      
      Committer notes:
      
      Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value }',
      just like the other wrapper calls to sys_bpf with bpf_attr, to make this
      build in older toolchains, such as the ones in CentOS 5 and 6.
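      A minimal sketch of the pattern the committer notes describe, assuming
      kernel UAPI headers new enough to define BPF_PROG_ATTACH. The likely
      reason for the pattern is that some older GCC releases reject designated
      initializers naming members of the anonymous structs inside
      union bpf_attr. The code in tools/lib/bpf/bpf.c may differ in detail.

```c
#define _GNU_SOURCE      /* syscall() prototype */
#include <strings.h>     /* bzero() */
#include <sys/syscall.h> /* __NR_bpf */
#include <unistd.h>      /* syscall() */
#include <linux/bpf.h>   /* union bpf_attr, BPF_PROG_ATTACH/DETACH */

static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
{
	return (int)syscall(__NR_bpf, cmd, attr, size);
}

static int bpf_prog_attach(int prog_fd, int target_fd,
			   enum bpf_attach_type type)
{
	union bpf_attr attr;

	/* Zero the whole union, then assign fields one by one instead of
	 * using a designated initializer. */
	bzero(&attr, sizeof(attr));
	attr.target_fd     = target_fd;
	attr.attach_bpf_fd = prog_fd;
	attr.attach_type   = type;

	return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
}

static int bpf_prog_detach(int target_fd, enum bpf_attach_type type)
{
	union bpf_attr attr;

	bzero(&attr, sizeof(attr));
	attr.target_fd   = target_fd;
	attr.attach_type = type;

	return sys_bpf(BPF_PROG_DETACH, &attr, sizeof(attr));
}
```

      Both return -1 with errno set on failure, e.g. for an invalid cgroup fd
      or without CAP_NET_ADMIN.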
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
      Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • samples/bpf: Switch over to libbpf · 43371c83
      Joe Stringer authored
      Now that libbpf under tools/lib/bpf/* is synced with the version from
      samples/bpf, we can get rid of most of the libbpf library here.
      
      Committer notes:
      
      Built it in a docker fedora rawhide container and ran it in the f25 host, seems
      to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
      doesn't seem to have introduced problems and Joe said he tested it with
      all the entries in samples/bpf/ and other code he found:
      
        [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
        <SNIP>
        [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
        [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
        make[1]: Entering directory '/tmp/build/linux'
          CHK     include/config/kernel.release
          HOSTCC  scripts/basic/fixdep
          GEN     ./Makefile
          CHK     include/generated/uapi/linux/version.h
          Using /git/linux as source for kernel
          CHK     include/generated/utsrelease.h
          HOSTCC  scripts/basic/bin2c
          HOSTCC  arch/x86/tools/relocs_32.o
          HOSTCC  arch/x86/tools/relocs_64.o
          LD      samples/bpf/built-in.o
        <SNIP>
          HOSTCC  samples/bpf/fds_example.o
          HOSTCC  samples/bpf/sockex1_user.o
        /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
        /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
              insns, insns_cnt, "GPL", 0,
              ^~~~~
        In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                         from /git/linux/samples/bpf/bpf_load.h:4,
                         from /git/linux/samples/bpf/fds_example.c:15:
        /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
         int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
             ^~~~~~~~~~~~~~~~
          HOSTCC  samples/bpf/sockex2_user.o
        <SNIP>
          HOSTCC  samples/bpf/xdp_tx_iptunnel_user.o
        clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h  \
      	  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
      	  -Wno-compare-distinct-pointer-types \
      	  -Wno-gnu-variable-sized-type-not-at-end \
      	  -Wno-address-of-packed-member -Wno-tautological-compare \
      	  -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
          HOSTLD  samples/bpf/tc_l2_redirect
        <SNIP>
          HOSTLD  samples/bpf/lwt_len_hist
          HOSTLD  samples/bpf/xdp_tx_iptunnel
        make[1]: Leaving directory '/tmp/build/linux'
        [root@f5065a7d6272 linux]#
      
      And then, in the host:
      
        [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
        /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
        [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
        [root@jouet bpf]# file offwaketime
        offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
        [root@jouet bpf]# readelf -SW offwaketime
        offwaketime         offwaketime_kern.o  offwaketime_user.o
        [root@jouet bpf]# readelf -SW offwaketime_kern.o
        There are 11 section headers, starting at offset 0x700:
      
        Section Headers:
          [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
          [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
          [ 1] .strtab           STRTAB          0000000000000000 000658 0000a8 00      0   0  1
          [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
          [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
          [ 4] .relkprobe/try_to_wake_up REL             0000000000000000 0005a8 000020 10     10   3  8
          [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
          [ 6] .reltracepoint/sched/sched_switch REL             0000000000000000 0005c8 000090 10     10   5  8
          [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
          [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
          [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
          [10] .symtab           SYMTAB          0000000000000000 000488 000120 18      1   4  8
        Key to Flags:
          W (write), A (alloc), X (execute), M (merge), S (strings)
          I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
          O (extra OS processing required) o (OS specific), p (processor specific)
        [root@jouet bpf]# ./offwaketime | head -3
        qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
        firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
        swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
        [root@jouet bpf]#
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: netdev@vger.kernel.org
      Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch
      Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
      [ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf diff: Do not overwrite valid build id · ed6c166c
      Kan Liang authored
      Fix a perf diff regression introduced by commit 5baecbcd ("perf
      symbols: we can now read separate debug-info files based on a build
      ID").
      
      The binary name can be the same when perf diff compares different
      binaries, so the build id is used to distinguish between them.
      However, the previous patch assumes that the same binary name has the
      same build id, so it overwrites the build id according to the binary
      name, regardless of whether the build id is already set.
      
      Check the has_build_id in dso__load. If the build id is already set, use
      it.
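      The fix boils down to "only fall back to the name-based lookup when no
      build id is set yet". A minimal sketch with a hypothetical, trimmed-down
      stand-in for perf's struct dso; it is not the code in dso__load().

```c
#include <stdbool.h>
#include <string.h> /* memcpy() */

#define BUILD_ID_LEN 20 /* SHA-1 sized build id */

/* Hypothetical, trimmed-down stand-in for perf's struct dso. */
struct dso_sketch {
	const char *name;
	bool has_build_id;
	unsigned char build_id[BUILD_ID_LEN];
};

/* Use the build id found by file name only if the dso does not already
 * carry one (for example from the recorded data). */
static void dso_set_build_id_from_name(struct dso_sketch *dso,
				       const unsigned char *by_name)
{
	if (dso->has_build_id)
		return; /* same name may be a different binary: keep valid id */
	memcpy(dso->build_id, by_name, BUILD_ID_LEN);
	dso->has_build_id = true;
}
```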
      
      Before the fix:
      
        $ perf diff 1.perf.data 2.perf.data
        # Event 'cycles'
        #
        # Baseline    Delta  Shared Object     Symbol
        # ........  .......  ................  .............................
        #
          99.83%  -99.80%  tchain_edit       [.] f2
           0.12%  +99.81%  tchain_edit       [.] f3
           0.02%   -0.01%  [ixgbe]           [k] ixgbe_read_reg
      
      After the fix:
      
        $ perf diff 1.perf.data 2.perf.data
        # Event 'cycles'
        #
        # Baseline    Delta  Shared Object     Symbol
        # ........  .......  ................  .............................
        #
          99.83%   +0.10%  tchain_edit       [.] f3
           0.12%   -0.08%  tchain_edit       [.] f2
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dima Kogan <dima@secretsauce.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 5baecbcd ("perf symbols: we can now read separate debug-info files based on a build ID")
      Link: http://lkml.kernel.org/r/1481642984-13593-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ravi Bangoria
      perf annotate: Don't throw error for zero length symbols · edee44be
      Ravi Bangoria authored
      'perf report --tui' exits with an error when it finds a sample in a
      zero-length symbol (i.e. addr == sym->start == sym->end). These are
      actually valid samples, so don't exit the TUI; show the report with
      such symbols.
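      The check can be sketched as follows, assuming symbols cover the
      half-open range [start, end); sym_sample_offset() is a hypothetical
      helper, not perf's annotate code. A zero-length symbol is accepted when
      the sample address equals its start, instead of being reported as an
      error.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not perf's code: compute a sample's offset into
 * its symbol, assuming symbols cover the half-open range [start, end).
 * A zero-length symbol (start == end) still owns a sample taken exactly
 * at its start address, so that case is accepted instead of failing.
 * Returns 0 on success, -1 if addr lies outside the symbol.
 */
static int sym_sample_offset(uint64_t addr, uint64_t start, uint64_t end,
			     uint64_t *offset)
{
	if (start == end) {
		if (addr != start)
			return -1;
	} else if (addr < start || addr >= end) {
		return -1;
	}
	*offset = addr - start;
	return 0;
}
```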
      Reported-and-Tested-by: Anton Blanchard <anton@samba.org>
      Link: https://lkml.org/lkml/2016/10/8/189
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@kernel.org # v4.9+
      Link: http://lkml.kernel.org/r/1479804050-5028-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Davidlohr Bueso
      perf bench futex: Fix lock-pi help string · 9de3ffa1
      Davidlohr Bueso authored
      Obvious copy/paste typo from the requeue program.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Link: http://lkml.kernel.org/r/1481830584-30909-1-git-send-email-dave@stgolabs.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa
      perf trace: Check if MAP_32BIT is defined (again) · 2bd42f3a
      Jiri Olsa authored
      There might be systems where MAP_32BIT is not defined, like some
      RHEL7 powerpc versions.
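      A minimal sketch of the usual guard for this situation, not perf's
      actual mmap flag beautifier: mmap_flags_str() is a hypothetical helper,
      and the MAP_32BIT branch is compiled only when the system headers
      define that flag, so the same file builds on systems where it is
      missing.

```c
#define _DEFAULT_SOURCE		/* expose MAP_ANONYMOUS/MAP_32BIT in glibc */
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: write the names of the mmap flags set in 'flags'
 * into 'buf', separated by '|'. 'buf' must hold at least 64 bytes.
 * MAP_32BIT is only tested when the headers define it.
 */
static void mmap_flags_str(int flags, char *buf)
{
	buf[0] = '\0';
#define APPEND_FLAG(name)				\
	do {						\
		if (flags & MAP_##name) {		\
			if (buf[0] != '\0')		\
				strcat(buf, "|");	\
			strcat(buf, #name);		\
		}					\
	} while (0)

	APPEND_FLAG(SHARED);
	APPEND_FLAG(PRIVATE);
	APPEND_FLAG(FIXED);
	APPEND_FLAG(ANONYMOUS);
#ifdef MAP_32BIT
	APPEND_FLAG(32BIT);
#endif
#undef APPEND_FLAG
}
```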
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Kyle McMartin <kyle@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Fixes: 256763b0 ("perf trace beauty mmap: Add more conditional defines")
      Link: http://lkml.kernel.org/r/1481831814-23683-1-git-send-email-jolsa@kernel.org
      [ Changed the Fixes: cset to the one removing the conditional switch case for MAP_32BIT ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Arnaldo Carvalho de Melo
      samples/bpf: Make perf_event_read() static · 96c2fb69
      Arnaldo Carvalho de Melo authored
      While testing Joe's conversion of samples/bpf/ to use tools/lib/bpf/, I
      noticed some warnings when building samples/bpf/ in a Fedora Rawhide
      container with clang/llvm 3.9, such as:
      
        [root@1e797fdfbf4f linux]# make -j4 O=/tmp/build/linux/ samples/bpf/
        make[1]: Entering directory '/tmp/build/linux'
          CHK     include/config/kernel.release
          GEN     ./Makefile
          CHK     include/generated/uapi/linux/version.h
          Using /git/linux as source for kernel
        <SNIP>
          HOSTCC  samples/bpf/trace_output_user.o
        /git/linux/samples/bpf/trace_output_user.c:64:6: warning: no previous
        prototype for 'perf_event_read' [-Wmissing-prototypes]
         void perf_event_read(print_fn fn)
              ^~~~~~~~~~~~~~~
          HOSTLD  samples/bpf/trace_output
        make[1]: Leaving directory '/tmp/build/linux'
      
      Shut up the compiler by making that function static.
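      A minimal sketch of the change, assuming a simplified print_fn callback
      type (the real typedef in samples/bpf is not shown here and may
      differ). The -Wmissing-prototypes warning only applies to global
      functions defined without a previous prototype, so giving the function
      internal linkage silences it without adding a declaration.

```c
/* Hypothetical callback type; the real print_fn in samples/bpf may differ. */
typedef void (*print_fn)(void *data, int size);

/*
 * Declared 'static': the function now has internal linkage, so
 * -Wmissing-prototypes no longer expects a previous prototype.
 */
static void perf_event_read(print_fn fn)
{
	static char record[] = "sample";

	/* A real reader would drain perf ring buffers; this sketch just
	 * hands one fixed record to the callback. */
	fn(record, (int)sizeof(record));
}
```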
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Joe Stringer <joe@ovn.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20161215152927.GC6866@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 18 Dec, 2016 1 commit
    • Marcin Nowakowski
      uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation · 297e765e
      Marcin Nowakowski authored
      Commit:
      
        72e6ae28 ('ARM: 8043/1: uprobes need icache flush after xol write')
      
      ... has introduced an arch-specific method to ensure all caches are
      flushed appropriately after an instruction is written to an XOL page.
      
      However, when the XOL area is created and the out-of-line breakpoint
      instruction is copied, caches are not flushed at all and stale data may
      be found in icache.
      
      Replace a simple copy_to_page() with arch_uprobe_copy_ixol() to allow
      the arch to ensure all caches are updated accordingly.
      
      This change fixes uprobes on MIPS InterAptiv (tested on Creator Ci40).
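      The ordering the fix relies on can be illustrated with a user-space
      analogy, which is not the kernel's arch_uprobe_copy_ixol(): write the
      instruction bytes first, then synchronise the instruction cache for
      exactly the written range. copy_insn_to_slot() is a hypothetical
      helper; __builtin___clear_cache() is the GCC/Clang builtin for that
      synchronisation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space analogy: copy an instruction into a slot that
 * will later be executed. On CPUs whose instruction cache is not kept
 * coherent with data stores, the copy must be followed by a cache
 * synchronisation of the written range; __builtin___clear_cache() does
 * that for [begin, end) and typically compiles to nothing on x86.
 */
static void copy_insn_to_slot(void *slot, const void *insn, size_t len)
{
	memcpy(slot, insn, len);
	__builtin___clear_cache((char *)slot, (char *)slot + len);
}
```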
      Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Victor Kamensky <victor.kamensky@linaro.org>
      Cc: linux-mips@linux-mips.org
      Link: http://lkml.kernel.org/r/1481625657-22850-1-git-send-email-marcin.nowakowski@imgtec.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 15 Dec, 2016 19 commits
    • Joe Stringer
      samples/bpf: Make samples more libbpf-centric · d40fc181
      Joe Stringer authored
      Switch all of the sample code to use the function names from
      tools/lib/bpf, so that they're consistent with that library, and to
      declare their own log buffers. This allows the next commit to be purely
      devoted to getting rid of the duplicate library in samples/bpf.
      
      Committer notes:
      
      Testing it:
      
      In a Fedora Rawhide container, with clang/llvm 3.9, sharing the host
      Linux kernel git tree:
      
        # make O=/tmp/build/linux/ headers_install
        # make O=/tmp/build/linux -C samples/bpf/
      
      Since I forgot to make the container privileged, I tested it outside
      the container, using the binaries it generated:
      
        # uname -a
        Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
        # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
        # ls -la offwaketime
        -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
        # file offwaketime
        offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
        # readelf -SW offwaketime_kern.o  | grep PROGBITS
        [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
        [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
        [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
        [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
        [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
        [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
        # ./offwaketime | head -5
        swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
        CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
        Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
        firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
        JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
        #
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Joe Stringer
      tools lib bpf: Add flags to bpf_create_map() · a5580c7f
      Joe Stringer authored
      Commit 6c905981 ("bpf: pre-allocate hash map elements") introduced
      map_flags in bpf_attr for the BPF_MAP_CREATE command. Expose this new
      parameter in libbpf.
      
      By exposing it, users can set flags such as whether or not to
      preallocate the map.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Acked-by: Wang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: http://lkml.kernel.org/r/20161209024620.31660-4-joe@ovn.org
      [ Added clarifying comment made by Wang Nan ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Joe Stringer
      tools lib bpf: use __u32 from linux/types.h · 83d994d0
      Joe Stringer authored
      Fix the following error when building without access to the 'u32' type:
      
        ./tools/lib/bpf/bpf.h:27:23: error: unknown type name ‘u32’
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Acked-by: Wang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: http://lkml.kernel.org/r/20161209024620.31660-3-joe@ovn.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Joe Stringer
      tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h · 0cb34dc2
      Joe Stringer authored
      The tools version of this header is out of date; update it to the latest
      version from the kernel headers.
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Acked-by: Wang Nan <wangnan0@huawei.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Link: http://lkml.kernel.org/r/20161209024620.31660-2-joe@ovn.org
      [ Sync it harder, after merging with what was in net-next via perf/urgent via torvalds/master to get BPF_PROG_(AT|DE)TACH, etc ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ravi Bangoria
      perf annotate: Fix jump target outside of function address range · e216874c
      Ravi Bangoria authored
      If a jump target is outside of the function's address range, perf does
      not handle it correctly. In particular, when the target address is
      lower than the function start address, the target offset is negative,
      but because the target is stored as an unsigned value, the negative
      number wraps around to its 2's complement. See the example below: the
      target of the 'jmpq' instruction at 34cf8 is 34ac0, which is lower than
      the function start address (34cf0).
      
              34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0
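      The wrap-around and its cure can be sketched in a few lines of C,
      assuming the function covers the half-open range [start, end);
      jump_target_offset() is a hypothetical helper, not perf's annotate
      code. Range-checking the target before subtracting means an
      out-of-range target is reported as such instead of producing an offset
      like 0xfffffffffffffdd0.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide how a jump target is shown. A target
 * inside [start, end) gets an offset from the function start; anything
 * else should be printed as an absolute address. The range check comes
 * before the subtraction, so a target below 'start' never turns into a
 * wrapped-around unsigned offset.
 */
static bool jump_target_offset(uint64_t target, uint64_t start, uint64_t end,
			       uint64_t *offset)
{
	if (target < start || target >= end)
		return false;
	*offset = target - start;
	return true;
}
```

      For the listing below, the jmpq target 34ac0 falls outside __sigaction
      (34cf0 to 34d13), so it should be printed as an absolute address, while
      the jbe target 34d00 becomes offset 0x10.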
      
      Objdump output:
      
        0000000000034cf0 <__sigaction>:
        __GI___sigaction():
          34cf0: lea    -0x20(%rdi),%eax
          34cf3: cmp    $0x1,%eax
          34cf6: jbe    34d00 <__sigaction+0x10>
          34cf8: jmpq   34ac0 <__GI___libc_sigaction>
          34cfd: nopl   (%rax)
          34d00: mov    0x386161(%rip),%rax        # 3bae68 <_DYNAMIC+0x2e8>
          34d07: movl   $0x16,%fs:(%rax)
          34d0e: mov    $0xffffffff,%eax
          34d13: retq
      
      perf annotate before applying patch:
      
        __GI___sigaction  /usr/lib64/libc-2.22.so
                 lea    -0x20(%rdi),%eax
                 cmp    $0x1,%eax
              v  jbe    10
              v  jmpq   fffffffffffffdd0
                 nop
          10:    mov    _DYNAMIC+0x2e8,%rax
                 movl   $0x16,%fs:(%rax)
                 mov    $0xffffffff,%eax
                 retq
      
      perf annotate after applying patch:
      
        __GI___sigaction  /usr/lib64/libc-2.22.so
                 lea    -0x20(%rdi),%eax
                 cmp    $0x1,%eax
              v  jbe    10
              ^  jmpq   34ac0 <__GI___libc_sigaction>
                 nop
          10:    mov    _DYNAMIC+0x2e8,%rax
                 movl   $0x16,%fs:(%rax)
                 mov    $0xffffffff,%eax
                 retq
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1480953407-7605-3-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Ravi Bangoria
      perf annotate: Support jump instruction with target as second operand · 3ee2eb6d
      Ravi Bangoria authored
      Architectures like PowerPC have jump instructions that include a target
      address as the second operand, for example 'bne cr7,0xc0000000000f6154'.
      Add support for such instructions in perf annotate.
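      One way to handle both operand layouts, sketched under the assumption
      that the target is always the last comma-separated operand;
      parse_branch_target() is a hypothetical helper, not perf's operand
      parser. Subtracting the function start from the parsed address then
      yields a small offset such as the 74 shown in the 'After patch' output.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: take the branch target from the last
 * comma-separated operand, so both "0x1000" and PowerPC's
 * "cr7,0xc0000000000f6154" yield the target address.
 * Returns 0 on success, -1 if no hexadecimal number is found.
 */
static int parse_branch_target(const char *ops, uint64_t *target)
{
	const char *comma = strrchr(ops, ',');
	const char *s = comma ? comma + 1 : ops;
	char *endp;
	unsigned long long v = strtoull(s, &endp, 16);

	if (endp == s)
		return -1;
	*target = (uint64_t)v;
	return 0;
}
```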
      
      objdump output:
        c0000000000f6140:   ld     r9,1032(r31)
        c0000000000f6144:   cmpdi  cr7,r9,0
        c0000000000f6148:   bne    cr7,0xc0000000000f6154
        c0000000000f614c:   ld     r9,2312(r30)
        c0000000000f6150:   std    r9,1032(r31)
        c0000000000f6154:   ld     r9,88(r31)
      
      Corresponding perf annotate output:
      
      Before patch:
               ld     r9,1032(r31)
               cmpdi  cr7,r9,0
            v  bne    3ffffffffff09f2c
               ld     r9,2312(r30)
               std    r9,1032(r31)
        74:    ld     r9,88(r31)
      
      After patch:
               ld     r9,1032(r31)
               cmpdi  cr7,r9,0
            v  bne    74
               ld     r9,2312(r30)
               std    r9,1032(r31)
        74:    ld     r9,88(r31)
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1480953407-7605-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa
      perf record: Force ignore_missing_thread for uid option · 23dc4f15
      Jiri Olsa authored
      Enable perf_evsel::ignore_missing_thread for the -u option, so that
      perf record does not fail completely if any of the user's processes
      dies between its enumeration and the time we open the event.
      
      Committer notes:
      
      While doing a 'make -j4 allmodconfig' we sometimes get into the race:
      
      Before:
      
        # perf record -u acme
        Error:
        The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:ppp).
        /bin/dmesg may provide additional information.
        No CONFIG_PERF_EVENTS=y kernel support configured?
        #
      
      After:
      
        [root@jouet ~]# perf record -u acme
        WARNING: Ignored open failure for pid 9888
        WARNING: Ignored open failure for pid 18059
        [root@jouet ~]#
      
      This is an improvement: the races no longer prevent the remaining
      threads of the specified user from being monitored, though the message
      probably needs further clarification.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1481538943-21874-6-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa
      perf evsel: Allow to ignore missing pid · a359c17a
      Jiri Olsa authored
      Add a perf_evsel::ignore_missing_thread bool.
      
      When set to true, it allows perf to ignore a missing-pid error (ESRCH)
      returned by the perf_event_open syscall.
      
      The missing thread id is removed from the thread_map, so the rest of
      the processing, like ioctl and mmap, won't get disturbed by a -1 fd.
      
      The reason for supporting this is to ease monitoring a group of pids
      that 'disappear' before perf opens their events. This currently makes
      perf report an error and exit, which makes perf record's -u option
      unusable under certain setups.
      
      With this change we allow this race and ignore such failures with the
      following warning:
      
        WARNING: Ignored open failure for pid 8605
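      The behaviour described above can be sketched as follows;
      open_for_pids() and the open_fn callback are hypothetical stand-ins for
      the evsel open path, not perf's code. With ignore_missing set, an ESRCH
      failure prints the warning and the pid is dropped, so the kept pids and
      fds stay packed with no -1 entries; any other error still aborts.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical open callback: returns an fd >= 0, or -errno on failure. */
typedef int (*open_fn)(int pid);

/*
 * Hypothetical sketch, not perf's evsel code: open one event per pid.
 * When 'ignore_missing' is set, a pid that vanished before we reached
 * it (-ESRCH) is reported and dropped, so pids[] and fds[] stay packed.
 * Any other error aborts. Returns the number of pids kept, or -errno.
 */
static int open_for_pids(int *pids, int *fds, int nr, int ignore_missing,
			 open_fn open_one)
{
	int kept = 0;

	for (int i = 0; i < nr; i++) {
		int fd = open_one(pids[i]);

		if (fd == -ESRCH && ignore_missing) {
			fprintf(stderr,
				"WARNING: Ignored open failure for pid %d\n",
				pids[i]);
			continue;
		}
		if (fd < 0)
			return fd;
		pids[kept] = pids[i];
		fds[kept] = fd;
		kept++;
	}
	return kept;
}
```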
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20161213074622.GA3084@krava
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa
      perf thread_map: Add thread_map__remove function · 38af91f0
      Jiri Olsa authored
      Add a thread_map__remove() function to remove a thread from a thread
      map.
      
      Also add an automated test.
      
      Committer notes:
      
      Testing it:
      
        # perf test "Remove thread map"
        39: Remove thread map                          : Ok
        # perf test -v "Remove thread map"
        39: Remove thread map                          :
        --- start ---
        test child forked, pid 4483
        2 threads: 4482, 4483
        1 thread: 4483
        0 thread:
        test child finished with 0
        ---- end ----
        Remove thread map: Ok
        #
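      A minimal sketch of the removal, assuming a fixed-size array in place
      of perf's struct thread_map; tmap_sketch__remove() is hypothetical and
      only mirrors what the test exercises: removing an entry shifts the rest
      down and shrinks the count, so two threads become one and then zero.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical, fixed-size stand-in for perf's struct thread_map. */
struct tmap_sketch {
	int nr;
	int pids[16];
};

/* Remove the entry at 'idx' by shifting the following entries down. */
static int tmap_sketch__remove(struct tmap_sketch *map, int idx)
{
	if (idx < 0 || idx >= map->nr)
		return -EINVAL;
	memmove(&map->pids[idx], &map->pids[idx + 1],
		(size_t)(map->nr - idx - 1) * sizeof(map->pids[0]));
	map->nr--;
	return 0;
}
```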
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1481538943-21874-4-git-send-email-jolsa@kernel.org
      [ Added stdlib.h, to get the free() declaration ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa
      perf evsel: Use variable instead of repeating lengthy FD macro · 83c2e4f3
      Jiri Olsa authored
      This is more readable and will ease the following patches.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1481538943-21874-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Jiri Olsa
      perf mem: Fix --all-user/--all-kernel options · 631ac41b
      Jiri Olsa authored
      Remove the extra '--' prefix from the long option names.
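      The same rule holds for getopt_long(), used here only as an analogy
      since perf has its own parse-options code: long option names are
      declared without the leading '--', which the parser supplies when
      matching. parse_all_user() is a hypothetical helper.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Analogy using getopt_long(), not perf's parse-options code: the option
 * name is "all-user", not "--all-user". Declared with the prefix, the
 * usual "--all-user" argument would not match it.
 */
static bool parse_all_user(int argc, char **argv)
{
	int all_user = 0;
	const struct option opts[] = {
		{ "all-user", no_argument, &all_user, 1 },
		{ NULL, 0, NULL, 0 },
	};

	optind = 1;	/* start scanning at argv[1] */
	while (getopt_long(argc, argv, "", opts, NULL) != -1)
		;
	return all_user != 0;
}
```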
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Fixes: ad16511b ("perf mem: Add -U/-K (--all-user/--all-kernel) options")
      Link: http://lkml.kernel.org/r/1481538943-21874-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Arnaldo Carvalho de Melo
      perf tools: Remove some needless __maybe_unused · 7e6a7998
      Arnaldo Carvalho de Melo authored
      I.e. those parameters/functions _are_ used, so ditch that misleading attribute.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-13cqtjh0yojg5gzvpq1zzpl0@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Namhyung Kim
      perf sched timehist: Show callchains for idle stat · ba957ebb
      Namhyung Kim authored
      When the --idle-hist option is used with --summary, it now shows idle
      stats with callchains, like below:
      
        Idle stats by callchain:
        CPU  0:   902.195 msec
        Idle time (msec)    Count Callchains
        ----------------  ------- --------------------------------------------------
                 370.589       69 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath
                 178.799       17 worker_thread <- kthread <- ret_from_fork
                 128.352       17 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
                 125.111       19 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select
                  71.599       50 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll
                  23.146        1 rcu_gp_kthread <- kthread <- ret_from_fork
                   4.510        1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64
                   0.085        1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- do_restart_poll
        ...
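      The aggregation behind such a table can be sketched as summing idle
      time and counts per callchain, then sorting by total time. This is a
      simplification under stated assumptions: callchains are reduced to
      strings and kept in a flat array, whereas perf uses its own callchain
      structures; struct idle_bucket and its helpers are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-callchain bucket. */
struct idle_bucket {
	const char *callchain;	/* e.g. "worker_thread <- kthread <- ret_from_fork" */
	double msec;		/* total idle time attributed to this callchain */
	int count;		/* number of idle periods with this callchain */
};

/*
 * Add one idle period to the bucket for 'chain', creating the bucket if
 * needed. 'b' must have room for one more entry. Returns the new number
 * of buckets. A linear search keeps the sketch short.
 */
static int idle_account(struct idle_bucket *b, int nr, const char *chain,
			double msec)
{
	for (int i = 0; i < nr; i++) {
		if (strcmp(b[i].callchain, chain) == 0) {
			b[i].msec += msec;
			b[i].count++;
			return nr;
		}
	}
	b[nr].callchain = chain;
	b[nr].msec = msec;
	b[nr].count = 1;
	return nr + 1;
}

/* qsort() comparator: largest total idle time first, as in the report. */
static int idle_cmp_desc(const void *pa, const void *pb)
{
	const struct idle_bucket *a = pa, *b = pb;

	return (a->msec < b->msec) - (a->msec > b->msec);
}

static void idle_sort(struct idle_bucket *b, int nr)
{
	qsort(b, (size_t)nr, sizeof(*b), idle_cmp_desc);
}
```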
      
      Committer notes:
      
      Extra testing:
      
        # uname -a
        Linux jouet 4.8.8-300.fc25.x86_64 #1 SMP Tue Nov 15 18:10:06 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
      
      1) Run 'perf sched record -g'
      
      2) Run 'perf sched timehist --idle --summary'
      
      <SNIP>
        Idle stats by callchain:
        CPU  0: 13456.840 msec
        Idle time (msec) Count Callchains
        ---------------- ----- --------------------------------------------------
                5386.637  3283 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll
                2750.238  2299 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- do_syscall_64
                1275.672  1287 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- entry_SYSCALL_64_fastpath
                 936.322   452 worker_thread <- kthread <- ret_from_fork
                 741.311   385 rcu_nocb_kthread <- kthread <- ret_from_fork
                 729.385   248 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_ppoll
                 365.386   229 irq_thread <- kthread <- ret_from_fork
                 338.934   265 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath
                 219.488   201 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
                 186.839   410 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64
                 142.541    59 kvm_vcpu_block <- kvm_arch_vcpu_ioctl_run <- kvm_vcpu_ioctl <- do_vfs_ioctl <- sys_ioctl
                  83.887    92 smpboot_thread_fn <- kthread <- ret_from_fork
                  62.722    96 do_exit <- do_group_exit <- 0x2a5594 <- entry_SYSCALL_64_fastpath
                  47.894    83 pipe_wait <- pipe_read <- __vfs_read <- vfs_read <- sys_read
                  46.554    61 rcu_gp_kthread <- kthread <- ret_from_fork
                  34.337    21 schedule_timeout <- intel_fbc_work_fn <- process_one_work <- worker_thread <- kthread
                  29.521    14 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select
                  20.274    10 schedule_timeout <- io_schedule_timeout <- bit_wait_io <- __wait_on_bit <- out_of_line_wait_on_bit
                  15.085    55 schedule_timeout <- unix_stream_read_generic <- unix_stream_recvmsg <- sock_recvmsg <- SYSC_recvfrom
      <SNIP>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-7-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Namhyung Kim
      perf sched timehist: Add -I/--idle-hist option · 07235f84
      Namhyung Kim authored
      The --idle-hist option analyzes the system idle state, i.e. which
      processes make the CPU go idle. If this option is specified, non-idle
      events are skipped and processes switching to/from idle are shown.
      
      This option is mostly useful when used with the --summary(-only)
      option. In the idle-time summary view, idle time is accounted to the
      thread that ran just before the idle task.
      
      Example output:
      
        Idle-time summary
                        comm parent sched-out idle-time min-idle avg-idle max-idle stddev migrations
                                      (count)    (msec)   (msec)   (msec)   (msec)      %
        --------------------------------------------------------------------------------------------
              rcu_preempt[7]      2        95   550.872    0.011    5.798   23.146   7.63      0
             migration/1[16]      2         1    15.558   15.558   15.558   15.558   0.00      0
              khugepaged[39]      2         1     3.062    3.062    3.062    3.062   0.00      0
           kworker/0:1H[124]      2         2     4.728    0.611    2.364    4.116  74.12      0
        systemd-journal[167]      1         1     4.510    4.510    4.510    4.510   0.00      0
          kworker/u16:3[558]      2        13    74.737    0.080    5.749   12.960  21.96      0
         irq/34-iwlwifi[628]      2        21   118.403    0.032    5.638   23.990  24.00      0
          kworker/u17:0[673]      2         1     3.523    3.523    3.523    3.523   0.00      0
            dbus-daemon[722]      1         1     6.743    6.743    6.743    6.743   0.00      0
                ifplugd[741]      1         1    58.826   58.826   58.826   58.826   0.00      0
        wpa_supplicant[1490]      1         1    13.302   13.302   13.302   13.302   0.00      0
           wpa_actiond[1492]      1         2     4.064    0.168    2.032    3.896  91.72      0
               dockerd[1500]      1         1     0.055    0.055    0.055    0.055   0.00      0
        ...
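      The accounting rule (idle time goes to the thread that ran before the
      idle task) can be sketched as a small state machine over sched_switch
      events, using next_pid == 0 to detect entering idle. struct
      cpu_idle_state, on_sched_switch() and the fixed MAX_PID_SKETCH table
      are hypothetical simplifications, not perf's data structures.

```c
#include <stdint.h>

#define MAX_PID_SKETCH 64	/* hypothetical bound for this sketch */

/* Hypothetical per-CPU state. Initialise last_pid to -1 (not idle). */
struct cpu_idle_state {
	int last_pid;		/* thread that ran right before idle, or -1 */
	uint64_t idle_start_ns;	/* timestamp of the switch into idle */
};

static uint64_t idle_ns_by_pid[MAX_PID_SKETCH];

/*
 * Feed one sched_switch event for a CPU. Switching to pid 0 (the idle
 * task) remembers the outgoing thread; switching away from idle charges
 * the whole idle period to that remembered thread. Switches that do not
 * involve idle are ignored.
 */
static void on_sched_switch(struct cpu_idle_state *cpu, int prev_pid,
			    int next_pid, uint64_t ts_ns)
{
	if (next_pid == 0) {
		cpu->last_pid = prev_pid;
		cpu->idle_start_ns = ts_ns;
	} else if (prev_pid == 0 && cpu->last_pid >= 0) {
		if (cpu->last_pid < MAX_PID_SKETCH)
			idle_ns_by_pid[cpu->last_pid] += ts_ns - cpu->idle_start_ns;
		cpu->last_pid = -1;
	}
}
```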
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-6-namhyung@kernel.org
      Link: http://lkml.kernel.org/r/20161213080632.19099-2-namhyung@kernel.org
      [ Merged fix sent by Namhyung, as posted in the second Link: tag ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Namhyung Kim
      perf sched timehist: Skip non-idle events when necessary · a4b2b6f5
      Namhyung Kim authored
      Sometimes we only care about idle-related events, as with the upcoming
      idle-hist feature. In that case, skip the other events to reduce noise.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-5-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Namhyung Kim
      perf sched timehist: Save callchain when entering idle · 699b5b92
      Namhyung Kim authored
      To investigate why a CPU went idle, the callchain must be kept when
      entering idle. Entering idle can be identified by a sched:sched_switch
      event whose next_pid field is 0.
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-4-namhyung@kernel.org
      Link: http://lkml.kernel.org/r/20161213080632.19099-1-namhyung@kernel.org
      [ Merged fix from Namhyung, see second Link: tag ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched timehist: Introduce struct idle_time_data · 3bc2fa9c
      Namhyung Kim authored
      struct idle_time_data keeps idle statistics together with the
      callchains that lead into the idle task.  Because it extends struct
      thread_runtime, the normal thread_runtime calculation keeps working
      transparently.
      
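      One way such an extension can work in C, sketched under the assumption
      that struct thread_runtime is embedded as the first member of the new
      struct.  The fields shown are simplified stand-ins, not perf's real
      definitions, and update_runtime() and to_idle_data() are hypothetical
      helpers.  Generic code operates on the embedded base struct, and the
      idle-specific data is recovered with an offsetof() calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for perf's struct thread_runtime; only a
 * couple of illustrative fields are modelled here. */
struct thread_runtime {
	unsigned long long total_run_time;
	unsigned long long last_time;
};

/* Hypothetical layout: the base struct comes first, so a pointer to
 * idle_time_data is also a valid pointer to its thread_runtime. */
struct idle_time_data {
	struct thread_runtime	tr;		/* must stay the first member */
	unsigned long long	idle_ips[8];	/* callchain entering idle */
	size_t			nr_ips;
};

/* Generic runtime accounting that knows nothing about idle data. */
static void update_runtime(struct thread_runtime *r,
			   unsigned long long now)
{
	assert(now >= r->last_time);	/* time must not run backwards */
	r->total_run_time += now - r->last_time;
	r->last_time = now;
}

/* Recover the extended struct from a base pointer, container_of-style. */
static struct idle_time_data *to_idle_data(struct thread_runtime *r)
{
	return (struct idle_time_data *)((char *)r -
		offsetof(struct idle_time_data, tr));
}
```

      Keeping the base struct first also lets a plain cast convert between
      the two pointer types, since C guarantees that a pointer to a struct,
      suitably converted, points to its first member.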
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-3-namhyung@kernel.org
      [ Align struct field names ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sched timehist: Split is_idle_sample() · 96039c7c
      Namhyung Kim authored
      The is_idle_sample() function actually does more than determine
      whether a sample comes from the idle task.  Split the callchain part
      out into save_task_callchain() to make this clearer.
      
      Also, checking prev_pid from the trace data is preferable to just
      checking sample->pid, since it is possible, although rare, to get an
      invalid 0 pid/tid when scheduling an exiting task.
      
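      A simplified C sketch of the narrowed is_idle_sample(), using a
      hypothetical struct switch_sample in place of perf's sample and evsel
      types.  For sched:sched_switch it trusts prev_pid from the trace data
      rather than the sample pid; callchain saving would live separately in
      save_task_callchain(), which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified sample: the pid recorded for the
 * sample and the prev_pid field of the sched_switch payload. */
struct switch_sample {
	int sample_pid;		/* like sample->pid; can be a bogus 0 */
	int prev_pid;		/* from the tracepoint data */
	const char *event;	/* tracepoint name */
};

/* Only answers "did this sample come from the idle task (pid 0)?".
 * For a sched_switch, trust prev_pid from the trace data rather than
 * sample_pid, which may rarely be 0 for an exiting task. */
static bool is_idle_sample(const struct switch_sample *s)
{
	assert(s->event != NULL);
	if (strcmp(s->event, "sched:sched_switch") == 0)
		return s->prev_pid == 0;
	return s->sample_pid == 0;
}
```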
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20161208144755.16673-2-namhyung@kernel.org
      [ Remove some needless () in some return statements ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Move headers check into bash script · aeafd623
      Jiri Olsa authored
      To make it nicer and easier to maintain.
      
      Also move the check into the fixdep sub-make, so that its output is
      not scattered around the build output.
      
      Remove the extra $$ from the mman*.h checks.
      
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1481030331-31944-5-git-send-email-jolsa@kernel.org
      [ Use /bin/sh, and 'function check() {' -> 'check () {' to make it work with busybox, in Alpine Linux, for instance ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>