1. 24 Feb, 2016 1 commit
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-2' of... · c2b8d8c5
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      New features:
      
        - Add API to set values of map entries in a BPF object, be it
          individual map slots or ranges (Wang Nan)
      
        - Introduce support for the 'bpf-output' event (Wang Nan)
      
        - Add glue to read perf events in a BPF program (Wang Nan)
      
      User visible changes:
      
        - Don't stop PMU parsing on alias parse error, allowing the
          addition of new sysfs PMU files without breaking old tools (Andi Kleen)
      
        - Implement '%' operation in libtraceevent (Daniel Bristot de Oliveira)
      
        - Allow specifying events via -e in 'perf mem record', also listing what events
          can be specified via 'perf mem record -e list' (Jiri Olsa)
      
        - Improve support to 'data_src', 'weight' and 'addr' fields in
          'perf script' (Jiri Olsa)
      
      Infrastructure changes:
      
        - Export cacheline routines (Jiri Olsa)
      
        - Remove strbuf_{remove,splice}(), dead code (Arnaldo Carvalho de Melo)
      
      Fixes:
      
        - Sort key fixes: Alignment for srcline, file, trace; fix
          segfault for dynamic, trace events related sort keys (Namyung Kim)
      
      Build fixes:
      
        - Remove duplicate typedef config_term_func_t definition,
          fixing the build on older systems (Arnaldo Carvalho de Melo)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c2b8d8c5
  2. 23 Feb, 2016 11 commits
  3. 22 Feb, 2016 15 commits
    • Wang Nan's avatar
      perf tools: Introduce bpf-output event · 03e0a7df
      Wang Nan authored
      Commit a43eec30 ("bpf: introduce bpf_perf_event_output() helper")
      adds a helper to enable a BPF program to output data to a perf ring
      buffer through a new type of perf event, PERF_COUNT_SW_BPF_OUTPUT. This
      patch enables perf to create events of that type. Now a perf user can
      use the following cmdline to receive output data from BPF programs:
      
        # perf record -a -e bpf-output/no-inherit,name=evt/ \
                          -e ./test_bpf_output.c/map:channel.event=evt/ ls /
        # perf script
           perf 1560 [004] 347747.086295:  evt: ffffffff811fd201 sys_write ...
           perf 1560 [004] 347747.086300:  evt: ffffffff811fd201 sys_write ...
           perf 1560 [004] 347747.086315:  evt: ffffffff811fd201 sys_write ...
                  ...
      
      Test result:
      
        # cat test_bpf_output.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        struct bpf_map_def {
       	unsigned int type;
       	unsigned int key_size;
       	unsigned int value_size;
       	unsigned int max_entries;
        };
      
        #define SEC(NAME) __attribute__((section(NAME), used))
        static u64 (*ktime_get_ns)(void) =
       	(void *)BPF_FUNC_ktime_get_ns;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
       	(void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
       	(void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_output)(void *, struct bpf_map_def *, int, void *, unsigned long) =
       	(void *)BPF_FUNC_perf_event_output;
      
        struct bpf_map_def SEC("maps") channel = {
       	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
       	.key_size = sizeof(int),
       	.value_size = sizeof(u32),
       	.max_entries = __NR_CPUS__,
        };
      
        SEC("func_write=sys_write")
        int func_write(void *ctx)
        {
       	struct {
       		u64 ktime;
       		int cpuid;
       	} __attribute__((packed)) output_data;
       	char error_data[] = "Error: failed to output: %d\n";
      
       	output_data.cpuid = get_smp_processor_id();
       	output_data.ktime = ktime_get_ns();
       	int err = perf_event_output(ctx, &channel, get_smp_processor_id(),
       				    &output_data, sizeof(output_data));
       	if (err)
       		trace_printk(error_data, sizeof(error_data), err);
       	return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************ END ***************************/
      
        # perf record -a -e bpf-output/no-inherit,name=evt/ \
                          -e ./test_bpf_output.c/map:channel.event=evt/ ls /
        # perf script | grep ls
           ls  2242 [003] 347851.557563:   evt: ffffffff811fd201 sys_write ...
           ls  2242 [003] 347851.557571:   evt: ffffffff811fd201 sys_write ...
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-11-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      03e0a7df
    • Wang Nan's avatar
      perf tools: Apply tracepoint event definition options to BPF script · 95088a59
      Wang Nan authored
      Users can pass options to tracepoints defined in the BPF script.  For
      example:
      
        # perf record -e ./test.c/no-inherit/ bash
        # dd if=/dev/zero of=/dev/null count=10000
        # exit
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.022 MB perf.data (139 samples) ]
      
        (no-inherit works, only the sys_read issued by bash are captured, at
         least 10000 sys_read issued by dd are skipped.)
      
      test.c:
      
        #define SEC(NAME) __attribute__((section(NAME), used))
        SEC("func=sys_read")
        int bpf_func__sys_read(void *ctx)
        {
            return 1;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
      
      no-inherit is applied to the kprobe event defined in test.c.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-10-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      95088a59
    • Wang Nan's avatar
      perf tools: Enable indices setting syntax for BPF map · e571e029
      Wang Nan authored
      This patch introduces a new syntax to perf event parser:
      
       # perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
      
      By utilizing the basic facilities in bpf-loader.c which allow setting
      different slots in a BPF map separately, the newly introduced syntax
      allows perf to control specific elements in a BPF map.
      
      Test result:
      
        # cat ./test_bpf_map_3.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def {
      	unsigned int type;
      	unsigned int key_size;
      	unsigned int value_size;
      	unsigned int max_entries;
        };
        static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
       	(void *)BPF_FUNC_map_lookup_elem;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
       	(void *)BPF_FUNC_trace_printk;
        struct bpf_map_def SEC("maps") channel = {
       	.type = BPF_MAP_TYPE_ARRAY,
       	.key_size = sizeof(int),
       	.value_size = sizeof(unsigned char),
       	.max_entries = 100,
        };
        SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
        int func(void *ctx, int err, long nsec)
        {
       	char fmt[] = "%ld\n";
       	long usec = nsec * 0x10624dd3 >> 38; // nsec / 1000
       	int key = (int)usec;
       	unsigned char *pval = map_lookup_elem(&channel, &key);
      
       	if (!pval)
       		return 0;
       	trace_printk(fmt, sizeof(fmt), (unsigned char)*pval);
       	return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
      Normal case:
      
        # echo "" > /sys/kernel/debug/tracing/trace
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[0,1,2,3...5]=101/' usleep 2
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace | grep usleep
                  usleep-405   [004] d... 2745423.547822: : 101
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 3
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[0...9,20...29]=102,map:channel.value[10...19]=103/' usleep 15
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace | grep usleep
                  usleep-405   [004] d... 2745423.547822: : 101
                  usleep-655   [006] d... 2745434.122814: : 102
                  usleep-904   [006] d... 2745439.916264: : 103
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[all]=104/' usleep 99
        # cat /sys/kernel/debug/tracing/trace | grep usleep
                  usleep-405   [004] d... 2745423.547822: : 101
                  usleep-655   [006] d... 2745434.122814: : 102
                  usleep-904   [006] d... 2745439.916264: : 103
                  usleep-1537  [003] d... 2745538.053737: : 104
      
      Error case:
      
        # ./perf record -e './test_bpf_map_3.c/map:channel.value[10...1000]=104/' usleep 99
        event syntax error: '..annel.value[10...1000]=104/'
                                         \___ Index too large
        Hint:	Valid config terms:
            	map:[<arraymap>].value<indices>=[value]
            	map:[<eventmap>].event<indices>=[event]
      
            	where <indices> is something like [0,3...5] or [all]
            	(add -v to see detail)
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-9-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e571e029
    • Wang Nan's avatar
      perf tools: Support setting different slots in a BPF map separately · 2d055bf2
      Wang Nan authored
      This patch introduces basic facilities to support config different slots
      in a BPF map one by one.
      
      array.nr_ranges and array.ranges are introduced into 'struct
      parse_events_term', where ranges is an array of indices range (start,
      length) which will be configured by this config term. nr_ranges is the
      size of the array. The array is passed to 'struct bpf_map_priv'.  To
      indicate the new type of configuration, BPF_MAP_KEY_RANGES is added as a
      new key type. bpf_map_config_foreach_key() is extended to iterate over
      those indices instead of all possible keys.
      
      Code in this commit will be enabled by following commit which enables
      the indices syntax for array configuration.
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-8-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2d055bf2
    • Wang Nan's avatar
      perf tools: Enable passing event to BPF object · 7630b3e2
      Wang Nan authored
      A new syntax is added to the parser so that the user can access
      predefined perf events in BPF objects.
      
      After this patch, BPF programs for perf are finally able to utilize
      bpf_perf_event_read() introduced in commit 35578d79 ("bpf: Implement
      function bpf_perf_event_read() that get the selected hardware PMU
      counter").
      
      Test result:
      
        # cat test_bpf_map_2.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def {
            unsigned int type;
            unsigned int key_size;
            unsigned int value_size;
            unsigned int max_entries;
        };
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
            (void *)BPF_FUNC_trace_printk;
        static int (*get_smp_processor_id)(void) =
            (void *)BPF_FUNC_get_smp_processor_id;
        static int (*perf_event_read)(struct bpf_map_def *, int) =
            (void *)BPF_FUNC_perf_event_read;
      
        struct bpf_map_def SEC("maps") pmu_map = {
            .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
            .key_size = sizeof(int),
            .value_size = sizeof(int),
            .max_entries = __NR_CPUS__,
        };
        SEC("func_write=sys_write")
        int func_write(void *ctx)
        {
            unsigned long long val;
            char fmt[] = "sys_write:        pmu=%llu\n";
            val = perf_event_read(&pmu_map, get_smp_processor_id());
            trace_printk(fmt, sizeof(fmt), val);
            return 0;
        }
      
        SEC("func_write_return=sys_write%return")
        int func_write_return(void *ctx)
        {
            unsigned long long val = 0;
            char fmt[] = "sys_write_return: pmu=%llu\n";
            val = perf_event_read(&pmu_map, get_smp_processor_id());
            trace_printk(fmt, sizeof(fmt), val);
            return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
      Normal case:
      
        # echo "" > /sys/kernel/debug/tracing/trace
        # perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
        [SNIP]
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.013 MB perf.data (7 samples) ]
        # cat /sys/kernel/debug/tracing/trace | grep ls
                      ls-17066 [000] d... 938449.863301: : sys_write:        pmu=1157327
                      ls-17066 [000] dN.. 938449.863342: : sys_write_return: pmu=1225218
                      ls-17066 [000] d... 938449.863349: : sys_write:        pmu=1241922
                      ls-17066 [000] dN.. 938449.863369: : sys_write_return: pmu=1267445
      
      Normal case (system wide):
      
        # echo "" > /sys/kernel/debug/tracing/trace
        # perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.811 MB perf.data (120 samples) ]
      
        # cat /sys/kernel/debug/tracing/trace | grep -v '18446744073709551594' | grep -v perf | head -n 20
        [SNIP]
        #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
        #              | |       |   ||||       |         |
                   gmain-30828 [002] d... 2740551.068992: : sys_write:        pmu=84373
                   gmain-30828 [002] d... 2740551.068992: : sys_write_return: pmu=87696
                   gmain-30828 [002] d... 2740551.068996: : sys_write:        pmu=100658
                   gmain-30828 [002] d... 2740551.068997: : sys_write_return: pmu=102572
      
      Error case 1:
      
        # perf record -e './test_bpf_map_2.c' ls /
        [SNIP]
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.014 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace | grep ls
                      ls-17115 [007] d... 2724279.665625: : sys_write:        pmu=18446744073709551614
                      ls-17115 [007] dN.. 2724279.665651: : sys_write_return: pmu=18446744073709551614
                      ls-17115 [007] d... 2724279.665658: : sys_write:        pmu=18446744073709551614
                      ls-17115 [007] dN.. 2724279.665677: : sys_write_return: pmu=18446744073709551614
      
        (18446744073709551614 is 0xfffffffffffffffe (-2))
      
      Error case 2:
      
        # perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=evt/' -a
        event syntax error: '..ps:pmu_map.event=evt/'
                                          \___ Event not found for map setting
      
        Hint:	Valid config terms:
             	map:[<arraymap>].value=[value]
             	map:[<eventmap>].event=[event]
        [SNIP]
      
      Error case 3:
        # ls /proc/2348/task/
        2348  2505  2506  2507  2508
        # perf record -i -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' -p 2348
        ERROR: Apply config to BPF failed: Cannot set event to BPF map in multi-thread tracing
      
      Error case 4:
        # perf record -e cycles -e './test_bpf_map_2.c/map:pmu_map.event=cycles/' ls /
        ERROR: Apply config to BPF failed: Doesn't support inherit event (Hint: use -i to turn off inherit)
      
      Error case 5:
        # perf record -i -e raw_syscalls:sys_enter -e './test_bpf_map_2.c/map:pmu_map.event=raw_syscalls:sys_enter/' ls
        ERROR: Apply config to BPF failed: Can only put raw, hardware and BPF output event into a BPF map
      
      Error case 6:
        # perf record -i -e './test_bpf_map_2.c/map:pmu_map.event=123/' ls /
        event syntax error: '.._map.event=123/'
                                          \___ Incorrect value type for map
        [SNIP]
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-7-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarHe Kuang <hekuang@huawei.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      7630b3e2
    • Wang Nan's avatar
      perf record: Apply config to BPF objects before recording · 8690a2a7
      Wang Nan authored
      bpf__apply_obj_config() is introduced as the core API to apply object
      config options to all BPF objects. This patch also does the real work
      for setting values for BPF_MAP_TYPE_PERF_ARRAY maps by inserting value
      stored in map's private field into the BPF map.
      
      This patch is required because we are not always able to set all BPF
      config during parsing. Further patch will set events created by perf to
      BPF_MAP_TYPE_PERF_EVENT_ARRAY maps, which is not exist until
      perf_evsel__open().
      
      bpf_map_foreach_key() is introduced to iterate over each key needs to be
      configured. This function would be extended to support more map types
      and different key settings.
      
      In perf record, before start recording, call bpf__apply_config() to turn
      on all BPF config options.
      
      Test result:
      
        # cat ./test_bpf_map_1.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def {
            unsigned int type;
            unsigned int key_size;
            unsigned int value_size;
            unsigned int max_entries;
        };
        static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
            (void *)BPF_FUNC_map_lookup_elem;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
            (void *)BPF_FUNC_trace_printk;
        struct bpf_map_def SEC("maps") channel = {
            .type = BPF_MAP_TYPE_ARRAY,
            .key_size = sizeof(int),
            .value_size = sizeof(int),
            .max_entries = 1,
        };
        SEC("func=sys_nanosleep")
        int func(void *ctx)
        {
            int key = 0;
            char fmt[] = "%d\n";
            int *pval = map_lookup_elem(&channel, &key);
            if (!pval)
                return 0;
            trace_printk(fmt, sizeof(fmt), *pval);
            return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
        # echo "" > /sys/kernel/debug/tracing/trace
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=11/' usleep 10
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace
        # tracer: nop
        #
        # entries-in-buffer/entries-written: 1/1   #P:8
        [SNIP]
        #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
        #              | |       |   ||||       |         |
                   usleep-18593 [007] d... 2394714.395539: : 11
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=101/' usleep 10
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
        # cat /sys/kernel/debug/tracing/trace
        # tracer: nop
        #
        # entries-in-buffer/entries-written: 1/1   #P:8
        [SNIP]
        #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
        #              | |       |   ||||       |         |
                   usleep-18593 [007] d... 2394714.395539: : 11
                   usleep-19000 [006] d... 2394831.057840: : 101
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-6-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarHe Kuang <hekuang@huawei.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      8690a2a7
    • Wang Nan's avatar
      perf tools: Enable BPF object configure syntax · a34f3be7
      Wang Nan authored
      This patch adds the final step for BPF map configuration. A new syntax
      is appended into parser so user can config BPF objects through '/' '/'
      enclosed config terms.
      
      After this patch, following syntax is available:
      
        # perf record -e ./test_bpf_map_1.c/map:channel.value=10/ ...
      
      It would takes effect after appling following commits.
      
      Test result:
      
        # cat ./test_bpf_map_1.c
        /************************ BEGIN **************************/
        #include <uapi/linux/bpf.h>
        #define SEC(NAME) __attribute__((section(NAME), used))
        struct bpf_map_def {
            unsigned int type;
            unsigned int key_size;
            unsigned int value_size;
            unsigned int max_entries;
        };
        static void *(*map_lookup_elem)(struct bpf_map_def *, void *) =
            (void *)BPF_FUNC_map_lookup_elem;
        static int (*trace_printk)(const char *fmt, int fmt_size, ...) =
            (void *)BPF_FUNC_trace_printk;
        struct bpf_map_def SEC("maps") channel = {
            .type = BPF_MAP_TYPE_ARRAY,
            .key_size = sizeof(int),
            .value_size = sizeof(int),
            .max_entries = 1,
        };
        SEC("func=sys_nanosleep")
        int func(void *ctx)
        {
            int key = 0;
            char fmt[] = "%d\n";
            int *pval = map_lookup_elem(&channel, &key);
            if (!pval)
                return 0;
            trace_printk(fmt, sizeof(fmt), *pval);
            return 0;
        }
        char _license[] SEC("license") = "GPL";
        int _version SEC("version") = LINUX_VERSION_CODE;
        /************************* END ***************************/
      
       - Normal case:
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.012 MB perf.data ]
      
       - Error case:
      
        # ./perf record -e './test_bpf_map_1.c/map:channel.value/' usleep 10
        event syntax error: '..ps:channel:value/'
                                         \___ Config value not set (missing '=')
        Hint:	Valid config term:
               map:[<arraymap>]:value=[value]
               (add -v to see detail)
        Run 'perf list' for a list of valid events
      
        Usage: perf record [<options>] [<command>]
           or: perf record [<options>] -- <command> [<options>]
      
           -e, --event <event>   event selector. use 'perf list' to list available events
      
        # ./perf record -e './test_bpf_map_1.c/xmap:channel.value=10/' usleep 10
        event syntax error: '..pf_map_1.c/xmap:channel.value=10/'
                                          \___ Invalid object config option
        [SNIP]
      
        # ./perf record -e './test_bpf_map_1.c/map:xchannel.value=10/' usleep 10
        event syntax error: '..p_1.c/map:xchannel.value=10/'
                                          \___ Target map not exist
        [SNIP]
      
        # ./perf record -e './test_bpf_map_1.c/map:channel.xvalue=10/' usleep 10
        event syntax error: '..ps:channel.xvalue=10/'
                                          \___ Invalid object map config option
        [SNIP]
      
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=x10/' usleep 10
        event syntax error: '..nnel.value=x10/'
                                          \___ Incorrect value type for map
        [SNIP]
      
        Change BPF_MAP_TYPE_ARRAY to '1' in test_bpf_map_1.c:
      
        # ./perf record -e './test_bpf_map_1.c/map:channel.value=10/' usleep 10
        event syntax error: '..ps:channel.value=10/'
                                          \___ Can't use this config term to this type of map
      
        Hint:	Valid config term:
            	map:[<arraymap>].value=[value]
            	(add -v to see detail)
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      [for parser part]
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-5-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarHe Kuang <hekuang@huawei.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a34f3be7
    • Wang Nan's avatar
      perf bpf: Add API to set values to map entries in a bpf object · 066dacbf
      Wang Nan authored
      bpf__config_obj() is introduced as a core API to config BPF object after
      loading. One configuration option of maps is introduced. After this
      patch BPF object can accept assignments like:
      
        map:my_map.value=1234
      
      (map.my_map.value looks pretty. However, there's a small but hard to fix
      problem related to flex's greedy matching. Please see [1].  Choose ':'
      to avoid it in a simpler way.)
      
      This patch is more complex than the work it does because the
      consideration of extension. In designing BPF map configuration, the
      following things should be considered:
      
       1. Array indices selection: perf should allow user setting different
          value for different slots in an array, with syntax like:
          map:my_map.value[0,3...6]=1234;
      
       2. A map should be set by different config terms, each for a part
          of it. For example, set each slot to the pid of a thread;
      
       3. Type of value: integer is not the only valid value type. A perf
          counter can also be put into a map after commit 35578d79
          ("bpf: Implement function bpf_perf_event_read() that get the
            selected hardware PMU counter")
      
       4. For a hash table, it should be possible to use a string or other
          value as a key;
      
       5. It is possible that map configuration is unable to be setup
          during parsing. A perf counter is an example.
      
      Therefore, this patch does the following:
      
       1. Instead of updating map element during parsing, this patch stores
          map config options in 'struct bpf_map_priv'. Following patches
          will apply those configs at an appropriate time;
      
       2. Link map operations in a list so a map can have multiple config
          terms attached, so different parts can be configured separately;
      
       3. Make 'struct bpf_map_priv' extensible so that the following patches
          can add new types of keys and operations;
      
       4. Use bpf_obj_config__map_funcs array to support more map config options.
      
      Since the patch changing the event parser to parse BPF object config is
      relative large, I've put it in another commit. Code in this patch can be
      tested after applying the next patch.
      
      [1] http://lkml.kernel.org/g/564ED621.4050500@huawei.comSigned-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jeremie Galarneau <jeremie.galarneau@efficios.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: pi3orama@163.com
      Link: http://lkml.kernel.org/r/1456132275-98875-4-git-send-email-wangnan0@huawei.comSigned-off-by: default avatarHe Kuang <hekuang@huawei.com>
      [ Changes "maps:my_map.value" to "map:my_map.value", improved error messages ]
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      066dacbf
    • Namhyung Kim's avatar
      perf tools: Fix assertion failure on dynamic entry · dd42baf1
      Namhyung Kim authored
      The dynamic entry is created for each field in a tracepoint event.
      Since they have no fixed hpp format index, it should skip when
      perf_hpp__reset_width() is called.
      
      This caused following assertion failure..
      
        $ perf record -e sched:sched_switch -a sleep 1
      
        $ perf report -s comm,next_pid --stdio
        perf: ui/hist.c:651: perf_hpp__reset_width:
          Assertion `!(fmt->idx >= PERF_HPP__MAX_INDEX)' failed.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1456064558-13086-1-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      dd42baf1
    • Namhyung Kim's avatar
      perf tools: Fix column width setting on 'trace' sort key · 0c0af78d
      Namhyung Kim authored
      It missed to update column length of the 'trace' sort key in the
      hists__calc_col_len() so it might truncate the output.  It calculated
      the column length in the ->cmp() callback originally but it doesn't
      guarantee it's called always.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1456064558-13086-5-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0c0af78d
    • Namhyung Kim's avatar
      perf tools: Fix alignment on some sort keys · 2960ed6f
      Namhyung Kim authored
      The srcline, srcfile and trace sort keys can have long entries.  With
      commit 89fee709 ("perf hists: Do column alignment on the format
      iterator"), it now aligns output with hist_entry__snprintf_alignment().
      So each (possibly long) sort entries don't need to do it themselves.
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1456101153-14519-1-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2960ed6f
    • Namhyung Kim's avatar
      perf tools: Update srcline/file if needed · cecaec63
      Namhyung Kim authored
      Normally the hist entry's srcline and/or srcfile is set during sorting.
      However sometime it's possible to a hist entry's srcline is not set yet
      after the sorting.  This is because the entry is so unique and other
      sort keys already make it distinct.  Then the srcline/file sort didn't
      have a chance to be called during the sorting.  In that case it has NULL
      srcline/srcfile field and shows nothing.
      
      Before:
      
        $ perf report -s comm,sym,srcline
        ...
        Overhead  Command       Symbol
        -----------------------------------------------------------------
          34.42%  swapper       [k] intel_idle          intel_idle.c:0
           2.44%  perf          [.] __poll_nocancel     (null)
           1.70%  gnome-shell   [k] fw_domains_get      (null)
           1.04%  Xorg          [k] sock_poll           (null)
      
      After:
      
          34.42%  swapper       [k] intel_idle          intel_idle.c:0
           2.44%  perf          [.] __poll_nocancel     .:0
           1.70%  gnome-shell   [k] fw_domains_get      fw_domains_get+42
           1.04%  Xorg          [k] sock_poll           socket.c:0
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1456101111-14400-1-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      cecaec63
    • Namhyung Kim's avatar
      perf tools: Fix segfault on dynamic entries · 665aa757
      Namhyung Kim authored
      A dynamic entry is created for each tracepoint event.  When it sets up
      the sort key, it checks with existing keys using ->equal() callback.
      But it missed to set the ->equal for dynamic entries.  The following
      segfault was due to the missing ->equal() callback.
      
        (gdb) bt
        #0  0x0000000000140003 in ?? ()
        #1  0x0000000000537769 in fmt_equal (b=0x2106980, a=0x21067a0) at ui/hist.c:548
        #2  perf_hpp__setup_output_field (list=0x8c6d80 <perf_hpp_list>) at ui/hist.c:560
        #3  0x00000000004e927e in setup_sorting (evlist=<optimized out>) at util/sort.c:2642
        #4  0x000000000043cf50 in cmd_report (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>)
            at builtin-report.c:932
        #5  0x00000000004865a1 in run_builtin (p=p@entry=0x8bbce0 <commands+192>, argc=argc@entry=7,
            argv=argv@entry=0x7ffd24d56ce0) at perf.c:390
        #6  0x000000000042dc1f in handle_internal_command (argv=0x7ffd24d56ce0, argc=7) at perf.c:451
        #7  run_argv (argv=0x7ffd24d56a70, argcp=0x7ffd24d56a7c) at perf.c:495
        #8  main (argc=7, argv=0x7ffd24d56ce0) at perf.c:620
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Acked-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1456064558-13086-2-git-send-email-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      665aa757
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Remove duplicate typedef config_term_func_t definition · 58de6ed0
      Arnaldo Carvalho de Melo authored
      Older compilers don't like this, for instance, on RHEL6.7:
      
          CC       /tmp/build/perf/util/parse-events.o
        util/parse-events.c:844: error: redefinition of typedef ‘config_term_func_t’
        util/parse-events.c:353: note: previous declaration of ‘config_term_func_t’ was here
      
      So remove the second definition, that should've been just moved in 43d0b978
      ("perf tools: Enable config and setting names for legacy cache events"), not copied.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 43d0b978 ("perf tools: Enable config and setting names for legacy cache events")
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      58de6ed0
    • Arnaldo Carvalho de Melo's avatar
      perf tools: Fix build on older systems · 2c97b0d4
      Arnaldo Carvalho de Melo authored
      In RHEL 6.7:
      
        CC       /tmp/build/perf/util/parse-events.o
        cc1: warnings being treated as errors
        util/parse-events.c: In function ‘parse_events_add_cache’:
        util/parse-events.c:366: error: declaration of ‘error’ shadows a global declaration
        util/util.h:136: error: shadowed declaration is here
      
      Rename it to 'err'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 43d0b978 ("perf tools: Enable config and setting names for legacy cache events")
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      2c97b0d4
  4. 20 Feb, 2016 1 commit
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo' of... · 91e48b7d
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      User visible changes:
      
       - Add 'perf record' --all-user/--all-kernel options, so that one can tell
         that all the events in the command line should be restricted to the user
         or kernel levels (Jiri Olsa), i.e.:
      
      	perf record -e cycles:u,instructions:u
      
         is equivalent to:
      
              perf record --all-user -e cycles,instructions
      
       - Fix percentage update on key press, due to the buffering code
         (that creates hist_entries that will later be consumed) touching
         per hists state that is used by the display thread (Namhyung Kim)
      
       - Bail out when event modifiers not supported by 'perf stat' are
         specified, i.e.: (Wang Nan)
      
         # perf stat -e cycles/no-inherit/ usleep 1
         event syntax error: 'cycles/no-inherit/'
                              \___ 'no-inherit' is not usable in 'perf stat'
         # perf stat -e cycles/foo/ usleep 1
         event syntax error: 'cycles/foo/'
                                     \___ unknown term
      
         valid terms: config,config1,config2,name
         #
      
       - Enable setting names for legacy cache, raw and numeric events, e.g: (Wang Nan)
      
         # perf record -e cycles -e 4:0x6530160/name=evtx,call-graph=fp/ -a sleep 1
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 1.659 MB perf.data (844 samples) ]
         # perf evlist
         cycles
         evtx
         #
      
      Miscelaneous/Infrastructure changes:
      
       - Handled scaled == -1 case for counters in 'perf stat', fixing
         recent, only in perf/core, regression (Andi Kleen)
      
       - Reference count the cpu and thread maps at set_maps(), fixing the
         'object code reading' 'perf test' entry when it was requesting a
         perf_event_attr.sample_freq > /proc/sys/kernel/perf_event_max_sample_rate
         (Arnaldo Carvalho de Melo)
      
       - Improve perf_evlist__strerror_open() to provide hints for -EINVAL due
         to perf_event_attr.sample_freq > /proc/sys/kernel/perf_event_max_sample_rate
         (Arnaldo Carvalho de Melo)
      
       - Add checks to various callchain and histogram routines (Namhyung Kim)
      
       - Fix checking asprintf return value when parsing additional event config terms (Wang Nan)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      91e48b7d
  5. 19 Feb, 2016 12 commits