Commit bd4c3a34 authored by Linus Torvalds

Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kernel/profile.c: Switch /proc/irq/prof_cpu_mask to seq_file
  tracing: Export trace_profile_buf symbols
  tracing/events: use list_for_entry_continue
  tracing: remove max_tracer_type_len
  function-graph: use ftrace_graph_funcs directly
  tracing: Remove markers
  tracing: Allocate the ftrace event profile buffer dynamically
  tracing: Factorize the events profile accounting
parents b3727c24 583a22e7
Using the Linux Kernel Markers
Mathieu Desnoyers
This document introduces Linux Kernel Markers and their use. It provides
examples of how to insert markers in the kernel and connect probe functions to
them and provides some examples of probe functions.
* Purpose of markers
A marker placed in code provides a hook to call a function (probe) that you can
provide at runtime. A marker can be "on" (a probe is connected to it) or "off"
(no probe is attached). When a marker is "off" it has no effect, except for
adding a tiny time penalty (checking a condition for a branch) and a space
penalty (a few bytes for the function call at the end of the instrumented
function, plus a data structure in a separate section). When a
marker is "on", the function you provide is called each time the marker is
executed, in the execution context of the caller. When the function provided
ends its execution, it returns to the caller (continuing from the marker site).
You can put markers at important locations in the code. Markers are
lightweight hooks that can pass an arbitrary number of parameters,
described in a printk-like format string, to the attached probe function.
They can be used for tracing and performance accounting.
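As a rough userspace illustration of the on/off behaviour described above, a
marker can be modelled as a state flag guarding an indirect call. This is only
a sketch and not the kernel implementation: demo_marker, demo_probe_fn,
demo_marker_call, DEMO_TRACE_MARK, demo_stats and demo_count_probe are
hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical userspace model of a marker; not the kernel's struct marker. */
typedef void demo_probe_fn(void *probe_private, const char *fmt, va_list *args);

struct demo_marker {
	const char *name;	/* marker name, e.g. "subsystem_event" */
	const char *format;	/* printk-like description of the arguments */
	int state;		/* 0: "off", non-zero: "on" */
	demo_probe_fn *probe;	/* function called when the marker is "on" */
	void *probe_private;	/* opaque data handed back to the probe */
};

/* Collect the variable arguments and hand them to the connected probe. */
static void demo_marker_call(struct demo_marker *m, ...)
{
	va_list args;

	assert(m != NULL && m->probe != NULL);
	va_start(args, m);
	m->probe(m->probe_private, m->format, &args);
	va_end(args);
}

/* When the marker is "off", the only cost is the test of (m)->state. */
#define DEMO_TRACE_MARK(m, ...)					\
	do {							\
		if ((m)->state)					\
			demo_marker_call((m), __VA_ARGS__);	\
	} while (0)

/* Example probe: counts hits and remembers the first (int) argument. */
struct demo_stats {
	int hits;
	int last_int;
};

static void demo_count_probe(void *probe_private, const char *fmt,
			     va_list *args)
{
	struct demo_stats *stats = probe_private;

	(void)fmt;	/* a real probe would parse the format string */
	stats->hits++;
	stats->last_int = va_arg(*args, int);
}
```

With state left at 0, DEMO_TRACE_MARK(&m, 5) does nothing; once state is set
to 1, the same call runs demo_count_probe in the caller's context and then
returns to the marker site.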
* Usage
In order to use the macro trace_mark, you should include linux/marker.h.
#include <linux/marker.h>
And,
trace_mark(subsystem_event, "myint %d mystring %s", someint, somestring);
Where :
- subsystem_event is an identifier unique to your event
- subsystem is the name of your subsystem.
- event is the name of the event to mark.
- "myint %d mystring %s" is the formatted string for the serializer. "myint" and
"mystring" are respectively the field names associated with the first and
second parameters.
- someint is an integer.
- somestring is a char pointer.
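To illustrate how a probe might consume the format string and the arguments
listed above, here is a minimal userspace sketch. The probe signature mirrors
marker_probe_func from linux/marker.h, but sketch_probe_func, sketch_record,
sketch_text_probe and sketch_fire are hypothetical names, and rendering the
record with vsnprintf() is an assumption made for the example rather than what
any in-kernel probe does.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Same argument layout as marker_probe_func; the name is hypothetical. */
typedef void sketch_probe_func(void *probe_private, void *call_private,
			       const char *fmt, va_list *args);

/* Destination for the rendered event (hypothetical). */
struct sketch_record {
	char text[128];
	int len;	/* length vsnprintf() reports for the rendered text */
};

/* Probe: use the marker's format string to serialize its arguments. */
static void sketch_text_probe(void *probe_private, void *call_private,
			      const char *fmt, va_list *args)
{
	struct sketch_record *rec = probe_private;

	(void)call_private;	/* unused in this sketch */
	rec->len = vsnprintf(rec->text, sizeof(rec->text), fmt, *args);
}

/* Stand-in for the marker site: packs the arguments and calls the probe. */
static void sketch_fire(sketch_probe_func *probe, void *probe_private,
			const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	probe(probe_private, NULL, fmt, &args);
	va_end(args);
}
```

Calling sketch_fire(sketch_text_probe, &rec, "myint %d mystring %s", someint,
somestring) leaves the field names and values in rec.text, which is why the
format string doubles as a description of the event layout.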
Connecting a function (probe) to a marker is done by providing a probe (function
to call) for the specific marker through marker_probe_register() and can be
activated by calling marker_arm(). Marker deactivation can be done by calling
marker_disarm() as many times as marker_arm() has been called. Removing a probe
is done through marker_probe_unregister(); it will disarm the probe.
marker_synchronize_unregister() must be called between probe unregistration and
the first occurrence of
- the end of module exit function,
to make sure there is no caller left using the probe;
- the free of any resource used by the probes,
to make sure the probes won't be accessing invalid data.
This, and the fact that preemption is disabled around the probe call, make sure
that probe removal and module unload are safe. See the "Probe example" section
below for a sample probe module.
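The balanced arm/disarm rule described above can be pictured as a simple
reference count. The following userspace sketch only models that counting;
armdemo_marker, armdemo_arm and armdemo_disarm are hypothetical names, and the
error return for an unbalanced disarm is an assumption, not documented kernel
behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the marker stays "on" while arm calls are outstanding. */
struct armdemo_marker {
	int arm_count;	/* number of arm calls not yet balanced by a disarm */
	int state;	/* 1 while arm_count > 0, otherwise 0 */
};

/* The first arm call switches the marker on; later calls only count. */
static void armdemo_arm(struct armdemo_marker *m)
{
	assert(m != NULL);
	if (m->arm_count++ == 0)
		m->state = 1;
}

/*
 * The marker is switched off only when the last outstanding arm is
 * balanced. Returns 0 on success, -1 for an unbalanced disarm (assumed).
 */
static int armdemo_disarm(struct armdemo_marker *m)
{
	assert(m != NULL);
	if (m->arm_count == 0)
		return -1;
	if (--m->arm_count == 0)
		m->state = 0;
	return 0;
}
```

After two armdemo_arm() calls, a single armdemo_disarm() leaves the marker on;
only the second one turns it off.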
The marker mechanism supports inserting multiple instances of the same marker.
Markers can be put in inline functions, inlined static functions, and
unrolled loops as well as regular functions.
The naming scheme "subsystem_event" is suggested here as a convention intended
to limit collisions. Marker names are global to the kernel: they are considered
as being the same whether they are in the core kernel image or in modules.
Markers with the same name but conflicting format strings are detected as
having different format strings; they are not armed, and a printk warning is
output which identifies the inconsistency:
"Format mismatch for probe probe_name (format), marker (format)"
Another way to use markers is to simply define the marker without generating any
function call to actually call into the marker. This is useful in combination
with tracepoint probes in a scheme like this :
void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk);
DEFINE_MARKER_TP(marker_eventname, tracepoint_name, probe_tracepoint_name,
"arg1 %u pid %d");
notrace void probe_tracepoint_name(unsigned int arg1, struct task_struct *tsk)
{
struct marker *marker = &GET_MARKER(kernel_irq_entry);
/* write data to trace buffers ... */
}
* Probe / marker example
See the example provided in samples/markers/src
Compile them with your kernel.
Run, as root :
modprobe marker-example (insmod order is not important)
modprobe probe-example
cat /proc/marker-example (returns an expected error)
rmmod marker-example probe-example
dmesg
......@@ -29,7 +29,6 @@
#include <linux/poll.h>
#include <linux/ptrace.h>
#include <linux/seq_file.h>
#include <linux/marker.h>
#include <asm/io.h>
#include <asm/time.h>
......
......@@ -39,7 +39,6 @@
#include <linux/pid_namespace.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/marker.h>
#include <asm/io.h>
#include <asm/mmu_context.h>
......
......@@ -4,6 +4,7 @@
#include <linux/ring_buffer.h>
#include <linux/trace_seq.h>
#include <linux/percpu.h>
#include <linux/hardirq.h>
struct trace_array;
struct tracer;
......@@ -130,10 +131,15 @@ struct ftrace_event_call {
void *data;
atomic_t profile_count;
int (*profile_enable)(struct ftrace_event_call *);
void (*profile_disable)(struct ftrace_event_call *);
int (*profile_enable)(void);
void (*profile_disable)(void);
};
#define FTRACE_MAX_PROFILE_SIZE 2048
extern char *trace_profile_buf;
extern char *trace_profile_buf_nmi;
#define MAX_FILTER_PRED 32
#define MAX_FILTER_STR_VAL 256 /* Should handle KSYM_SYMBOL_LEN */
......
......@@ -15,7 +15,6 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/preempt.h>
#include <linux/marker.h>
#include <linux/msi.h>
#include <asm/signal.h>
......
#ifndef _LINUX_MARKER_H
#define _LINUX_MARKER_H
/*
* Code markup for dynamic and static tracing.
*
* See Documentation/marker.txt.
*
* (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
*
* This file is released under the GPLv2.
* See the file COPYING for more details.
*/
#include <stdarg.h>
#include <linux/types.h>
struct module;
struct marker;
/**
* marker_probe_func - Type of a marker probe function
* @probe_private: probe private data
* @call_private: call site private data
* @fmt: format string
* @args: variable argument list pointer. Use a pointer to overcome C's
* inability to pass this around as a pointer in a portable manner in
* the callee otherwise.
*
* Type of marker probe functions. They receive the mdata and need to parse the
* format string to recover the variable argument list.
*/
typedef void marker_probe_func(void *probe_private, void *call_private,
const char *fmt, va_list *args);
struct marker_probe_closure {
marker_probe_func *func; /* Callback */
void *probe_private; /* Private probe data */
};
struct marker {
const char *name; /* Marker name */
const char *format; /* Marker format string, describing the
* variable argument list.
*/
char state; /* Marker state. */
char ptype; /* probe type : 0 : single, 1 : multi */
/* Probe wrapper */
void (*call)(const struct marker *mdata, void *call_private, ...);
struct marker_probe_closure single;
struct marker_probe_closure *multi;
const char *tp_name; /* Optional tracepoint name */
void *tp_cb; /* Optional tracepoint callback */
} __attribute__((aligned(8)));
#ifdef CONFIG_MARKERS
#define _DEFINE_MARKER(name, tp_name_str, tp_cb, format) \
static const char __mstrtab_##name[] \
__attribute__((section("__markers_strings"))) \
= #name "\0" format; \
static struct marker __mark_##name \
__attribute__((section("__markers"), aligned(8))) = \
{ __mstrtab_##name, &__mstrtab_##name[sizeof(#name)], \
0, 0, marker_probe_cb, { __mark_empty_function, NULL},\
NULL, tp_name_str, tp_cb }
#define DEFINE_MARKER(name, format) \
_DEFINE_MARKER(name, NULL, NULL, format)
#define DEFINE_MARKER_TP(name, tp_name, tp_cb, format) \
_DEFINE_MARKER(name, #tp_name, tp_cb, format)
/*
* Note : the empty asm volatile with read constraint is used here instead of a
* "used" attribute to fix a gcc 4.1.x bug.
* Make sure the alignment of the structure in the __markers section will
* not add unwanted padding between the beginning of the section and the
* structure. Force alignment to the same alignment as the section start.
*
* The "generic" argument controls which marker enabling mechanism must be used.
* If generic is true, a variable read is used.
* If generic is false, immediate values are used.
*/
#define __trace_mark(generic, name, call_private, format, args...) \
do { \
DEFINE_MARKER(name, format); \
__mark_check_format(format, ## args); \
if (unlikely(__mark_##name.state)) { \
(*__mark_##name.call) \
(&__mark_##name, call_private, ## args);\
} \
} while (0)
#define __trace_mark_tp(name, call_private, tp_name, tp_cb, format, args...) \
do { \
void __check_tp_type(void) \
{ \
register_trace_##tp_name(tp_cb); \
} \
DEFINE_MARKER_TP(name, tp_name, tp_cb, format); \
__mark_check_format(format, ## args); \
(*__mark_##name.call)(&__mark_##name, call_private, \
## args); \
} while (0)
extern void marker_update_probe_range(struct marker *begin,
struct marker *end);
#define GET_MARKER(name) (__mark_##name)
#else /* !CONFIG_MARKERS */
#define DEFINE_MARKER(name, tp_name, tp_cb, format)
#define __trace_mark(generic, name, call_private, format, args...) \
__mark_check_format(format, ## args)
#define __trace_mark_tp(name, call_private, tp_name, tp_cb, format, args...) \
do { \
void __check_tp_type(void) \
{ \
register_trace_##tp_name(tp_cb); \
} \
__mark_check_format(format, ## args); \
} while (0)
static inline void marker_update_probe_range(struct marker *begin,
struct marker *end)
{ }
#define GET_MARKER(name)
#endif /* CONFIG_MARKERS */
/**
* trace_mark - Marker using code patching
* @name: marker name, not quoted.
* @format: format string
* @args...: variable argument list
*
* Places a marker using optimized code patching technique (imv_read())
* to be enabled when immediate values are present.
*/
#define trace_mark(name, format, args...) \
__trace_mark(0, name, NULL, format, ## args)
/**
* _trace_mark - Marker using variable read
* @name: marker name, not quoted.
* @format: format string
* @args...: variable argument list
*
* Places a marker using a standard memory read (_imv_read()) to be
* enabled. Should be used for markers in code paths where instruction
* modification based enabling is not welcome. (__init and __exit functions,
* lockdep, some traps, printk).
*/
#define _trace_mark(name, format, args...) \
__trace_mark(1, name, NULL, format, ## args)
/**
* trace_mark_tp - Marker in a tracepoint callback
* @name: marker name, not quoted.
* @tp_name: tracepoint name, not quoted.
* @tp_cb: tracepoint callback. Should have an associated global symbol so it
* is not optimized away by the compiler (should not be static).
* @format: format string
* @args...: variable argument list
*
* Places a marker in a tracepoint callback.
*/
#define trace_mark_tp(name, tp_name, tp_cb, format, args...) \
__trace_mark_tp(name, NULL, tp_name, tp_cb, format, ## args)
/**
* MARK_NOARGS - Format string for a marker with no argument.
*/
#define MARK_NOARGS " "
/* To be used for string format validity checking with gcc */
static inline void __printf(1, 2) ___mark_check_format(const char *fmt, ...)
{
}
#define __mark_check_format(format, args...) \
do { \
if (0) \
___mark_check_format(format, ## args); \
} while (0)
extern marker_probe_func __mark_empty_function;
extern void marker_probe_cb(const struct marker *mdata,
void *call_private, ...);
/*
* Connect a probe to a marker.
* private data pointer must be a valid allocated memory address, or NULL.
*/
extern int marker_probe_register(const char *name, const char *format,
marker_probe_func *probe, void *probe_private);
/*
* Returns the private data given to marker_probe_register.
*/
extern int marker_probe_unregister(const char *name,
marker_probe_func *probe, void *probe_private);
/*
* Unregister a marker by providing the registered private data.
*/
extern int marker_probe_unregister_private_data(marker_probe_func *probe,
void *probe_private);
extern void *marker_get_private_data(const char *name, marker_probe_func *probe,
int num);
/*
* marker_synchronize_unregister must be called between the last marker probe
* unregistration and the first one of
* - the end of module exit function
* - the free of any resource used by the probes
* to ensure the code and data are valid for any possibly running probes.
*/
#define marker_synchronize_unregister() synchronize_sched()
#endif
......@@ -15,7 +15,6 @@
#include <linux/stringify.h>
#include <linux/kobject.h>
#include <linux/moduleparam.h>
#include <linux/marker.h>
#include <linux/tracepoint.h>
#include <asm/local.h>
......@@ -327,10 +326,6 @@ struct module
/* The command line arguments (may be mangled). People like
keeping pointers to this stuff */
char *args;
#ifdef CONFIG_MARKERS
struct marker *markers;
unsigned int num_markers;
#endif
#ifdef CONFIG_TRACEPOINTS
struct tracepoint *tracepoints;
unsigned int num_tracepoints;
......@@ -535,8 +530,6 @@ int unregister_module_notifier(struct notifier_block * nb);
extern void print_modules(void);
extern void module_update_markers(void);
extern void module_update_tracepoints(void);
extern int module_get_iter_tracepoints(struct tracepoint_iter *iter);
......@@ -651,10 +644,6 @@ static inline void print_modules(void)
{
}
static inline void module_update_markers(void)
{
}
static inline void module_update_tracepoints(void)
{
}
......
......@@ -100,33 +100,25 @@ struct perf_counter_attr;
#ifdef CONFIG_EVENT_PROFILE
#define TRACE_SYS_ENTER_PROFILE(sname) \
static int prof_sysenter_enable_##sname(struct ftrace_event_call *event_call) \
static int prof_sysenter_enable_##sname(void) \
{ \
int ret = 0; \
if (!atomic_inc_return(&event_enter_##sname.profile_count)) \
ret = reg_prof_syscall_enter("sys"#sname); \
return ret; \
return reg_prof_syscall_enter("sys"#sname); \
} \
\
static void prof_sysenter_disable_##sname(struct ftrace_event_call *event_call)\
static void prof_sysenter_disable_##sname(void) \
{ \
if (atomic_add_negative(-1, &event_enter_##sname.profile_count)) \
unreg_prof_syscall_enter("sys"#sname); \
unreg_prof_syscall_enter("sys"#sname); \
}
#define TRACE_SYS_EXIT_PROFILE(sname) \
static int prof_sysexit_enable_##sname(struct ftrace_event_call *event_call) \
static int prof_sysexit_enable_##sname(void) \
{ \
int ret = 0; \
if (!atomic_inc_return(&event_exit_##sname.profile_count)) \
ret = reg_prof_syscall_exit("sys"#sname); \
return ret; \
return reg_prof_syscall_exit("sys"#sname); \
} \
\
static void prof_sysexit_disable_##sname(struct ftrace_event_call *event_call) \
static void prof_sysexit_disable_##sname(void) \
{ \
if (atomic_add_negative(-1, &event_exit_##sname.profile_count)) \
unreg_prof_syscall_exit("sys"#sname); \
unreg_prof_syscall_exit("sys"#sname); \
}
#define TRACE_SYS_ENTER_PROFILE_INIT(sname) \
......
......@@ -382,20 +382,14 @@ static inline int ftrace_get_offsets_##call( \
*
* NOTE: The insertion profile callback (ftrace_profile_<call>) is defined later
*
* static int ftrace_profile_enable_<call>(struct ftrace_event_call *event_call)
* static int ftrace_profile_enable_<call>(void)
* {
* int ret = 0;
*
* if (!atomic_inc_return(&event_call->profile_count))
* ret = register_trace_<call>(ftrace_profile_<call>);
*
* return ret;
* return register_trace_<call>(ftrace_profile_<call>);
* }
*
* static void ftrace_profile_disable_<call>(struct ftrace_event_call *event_call)
* static void ftrace_profile_disable_<call>(void)
* {
* if (atomic_add_negative(-1, &event->call->profile_count))
* unregister_trace_<call>(ftrace_profile_<call>);
* unregister_trace_<call>(ftrace_profile_<call>);
* }
*
*/
......@@ -405,20 +399,14 @@ static inline int ftrace_get_offsets_##call( \
\
static void ftrace_profile_##call(proto); \
\
static int ftrace_profile_enable_##call(struct ftrace_event_call *event_call) \
static int ftrace_profile_enable_##call(void) \
{ \
int ret = 0; \
\
if (!atomic_inc_return(&event_call->profile_count)) \
ret = register_trace_##call(ftrace_profile_##call); \
\
return ret; \
return register_trace_##call(ftrace_profile_##call); \
} \
\
static void ftrace_profile_disable_##call(struct ftrace_event_call *event_call)\
static void ftrace_profile_disable_##call(void) \
{ \
if (atomic_add_negative(-1, &event_call->profile_count)) \
unregister_trace_##call(ftrace_profile_##call); \
unregister_trace_##call(ftrace_profile_##call); \
}
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
......@@ -660,11 +648,12 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
* struct ftrace_raw_##call *entry;
* u64 __addr = 0, __count = 1;
* unsigned long irq_flags;
* struct trace_entry *ent;
* int __entry_size;
* int __data_size;
* int __cpu
* int pc;
*
* local_save_flags(irq_flags);
* pc = preempt_count();
*
* __data_size = ftrace_get_offsets_<call>(&__data_offsets, args);
......@@ -675,25 +664,34 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
* sizeof(u64));
* __entry_size -= sizeof(u32);
*
* do {
* char raw_data[__entry_size]; <- allocate our sample in the stack
* struct trace_entry *ent;
* // Protect the non nmi buffer
* // This also protects the rcu read side
* local_irq_save(irq_flags);
* __cpu = smp_processor_id();
*
* if (in_nmi())
* raw_data = rcu_dereference(trace_profile_buf_nmi);
* else
* raw_data = rcu_dereference(trace_profile_buf);
*
* if (!raw_data)
* goto end;
*
* zero dead bytes from alignment to avoid stack leak to userspace:
* raw_data = per_cpu_ptr(raw_data, __cpu);
*
* *(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
* entry = (struct ftrace_raw_<call> *)raw_data;
* ent = &entry->ent;
* tracing_generic_entry_update(ent, irq_flags, pc);
* ent->type = event_call->id;
* //zero dead bytes from alignment to avoid stack leak to userspace:
* *(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL;
* entry = (struct ftrace_raw_<call> *)raw_data;
* ent = &entry->ent;
* tracing_generic_entry_update(ent, irq_flags, pc);
* ent->type = event_call->id;
*
* <tstruct> <- do some jobs with dynamic arrays
* <tstruct> <- do some jobs with dynamic arrays
*
* <assign> <- affect our values
* <assign> <- affect our values
*
* perf_tpcounter_event(event_call->id, __addr, __count, entry,
* __entry_size); <- submit them to perf counter
* } while (0);
* perf_tpcounter_event(event_call->id, __addr, __count, entry,
* __entry_size); <- submit them to perf counter
*
* }
*/
......@@ -716,11 +714,13 @@ static void ftrace_profile_##call(proto) \
struct ftrace_raw_##call *entry; \
u64 __addr = 0, __count = 1; \
unsigned long irq_flags; \
struct trace_entry *ent; \
int __entry_size; \
int __data_size; \
char *raw_data; \
int __cpu; \
int pc; \
\
local_save_flags(irq_flags); \
pc = preempt_count(); \
\
__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
......@@ -728,23 +728,38 @@ static void ftrace_profile_##call(proto) \
sizeof(u64)); \
__entry_size -= sizeof(u32); \
\
do { \
char raw_data[__entry_size]; \
struct trace_entry *ent; \
if (WARN_ONCE(__entry_size > FTRACE_MAX_PROFILE_SIZE, \
"profile buffer not large enough")) \
return; \
\
local_irq_save(irq_flags); \
__cpu = smp_processor_id(); \
\
*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL; \
entry = (struct ftrace_raw_##call *)raw_data; \
ent = &entry->ent; \
tracing_generic_entry_update(ent, irq_flags, pc); \
ent->type = event_call->id; \
if (in_nmi()) \
raw_data = rcu_dereference(trace_profile_buf_nmi); \
else \
raw_data = rcu_dereference(trace_profile_buf); \
\
tstruct \
if (!raw_data) \
goto end; \
\
{ assign; } \
raw_data = per_cpu_ptr(raw_data, __cpu); \
\
perf_tpcounter_event(event_call->id, __addr, __count, entry,\
*(u64 *)(&raw_data[__entry_size - sizeof(u64)]) = 0ULL; \
entry = (struct ftrace_raw_##call *)raw_data; \
ent = &entry->ent; \
tracing_generic_entry_update(ent, irq_flags, pc); \
ent->type = event_call->id; \
\
tstruct \
\
{ assign; } \
\
perf_tpcounter_event(event_call->id, __addr, __count, entry, \
__entry_size); \
} while (0); \
\
end: \
local_irq_restore(irq_flags); \
\
}
......
......@@ -1054,13 +1054,6 @@ config PROFILING
config TRACEPOINTS
bool
config MARKERS
bool "Activate markers"
select TRACEPOINTS
help
Place an empty function call at each marker site. Can be
dynamically changed for a probe function.
source "arch/Kconfig"
config SLOW_WORK
......
......@@ -87,7 +87,6 @@ obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
obj-$(CONFIG_MARKERS) += marker.o
obj-$(CONFIG_TRACEPOINTS) += tracepoint.o
obj-$(CONFIG_LATENCYTOP) += latencytop.o
obj-$(CONFIG_FUNCTION_TRACER) += trace/
......
......@@ -2237,10 +2237,6 @@ static noinline struct module *load_module(void __user *umod,
sizeof(*mod->ctors), &mod->num_ctors);
#endif
#ifdef CONFIG_MARKERS
mod->markers = section_objs(hdr, sechdrs, secstrings, "__markers",
sizeof(*mod->markers), &mod->num_markers);
#endif
#ifdef CONFIG_TRACEPOINTS
mod->tracepoints = section_objs(hdr, sechdrs, secstrings,
"__tracepoints",
......@@ -2958,20 +2954,6 @@ void module_layout(struct module *mod,
EXPORT_SYMBOL(module_layout);
#endif
#ifdef CONFIG_MARKERS
void module_update_markers(void)
{
struct module *mod;
mutex_lock(&module_mutex);
list_for_each_entry(mod, &modules, list)
if (!mod->taints)
marker_update_probe_range(mod->markers,
mod->markers + mod->num_markers);
mutex_unlock(&module_mutex);
}
#endif
#ifdef CONFIG_TRACEPOINTS
void module_update_tracepoints(void)
{
......
......@@ -442,48 +442,51 @@ void profile_tick(int type)
#ifdef CONFIG_PROC_FS
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <asm/uaccess.h>
static int prof_cpu_mask_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
static int prof_cpu_mask_proc_show(struct seq_file *m, void *v)
{
int len = cpumask_scnprintf(page, count, data);
if (count - len < 2)
return -EINVAL;
len += sprintf(page + len, "\n");
return len;
seq_cpumask(m, prof_cpu_mask);
seq_putc(m, '\n');
return 0;
}
static int prof_cpu_mask_write_proc(struct file *file,
const char __user *buffer, unsigned long count, void *data)
static int prof_cpu_mask_proc_open(struct inode *inode, struct file *file)
{
return single_open(file, prof_cpu_mask_proc_show, NULL);
}
static ssize_t prof_cpu_mask_proc_write(struct file *file,
const char __user *buffer, size_t count, loff_t *pos)
{
struct cpumask *mask = data;
unsigned long full_count = count, err;
cpumask_var_t new_value;
int err;
if (!alloc_cpumask_var(&new_value, GFP_KERNEL))
return -ENOMEM;
err = cpumask_parse_user(buffer, count, new_value);
if (!err) {
cpumask_copy(mask, new_value);
err = full_count;
cpumask_copy(prof_cpu_mask, new_value);
err = count;
}
free_cpumask_var(new_value);
return err;
}
static const struct file_operations prof_cpu_mask_proc_fops = {
.open = prof_cpu_mask_proc_open,
.read = seq_read,
.llseek = seq_lseek,
.release = single_release,
.write = prof_cpu_mask_proc_write,
};
void create_prof_cpu_mask(struct proc_dir_entry *root_irq_dir)
{
struct proc_dir_entry *entry;
/* create /proc/irq/prof_cpu_mask */
entry = create_proc_entry("prof_cpu_mask", 0600, root_irq_dir);
if (!entry)
return;
entry->data = prof_cpu_mask;
entry->read_proc = prof_cpu_mask_read_proc;
entry->write_proc = prof_cpu_mask_write_proc;
proc_create("prof_cpu_mask", 0600, root_irq_dir, &prof_cpu_mask_proc_fops);
}
/*
......
......@@ -2414,11 +2414,9 @@ unsigned long ftrace_graph_funcs[FTRACE_GRAPH_MAX_FUNCS] __read_mostly;
static void *
__g_next(struct seq_file *m, loff_t *pos)
{
unsigned long *array = m->private;
if (*pos >= ftrace_graph_count)
return NULL;
return &array[*pos];
return &ftrace_graph_funcs[*pos];
}
static void *
......@@ -2482,16 +2480,10 @@ ftrace_graph_open(struct inode *inode, struct file *file)
ftrace_graph_count = 0;
memset(ftrace_graph_funcs, 0, sizeof(ftrace_graph_funcs));
}
mutex_unlock(&graph_lock);
if (file->f_mode & FMODE_READ) {
if (file->f_mode & FMODE_READ)
ret = seq_open(file, &ftrace_graph_seq_ops);
if (!ret) {
struct seq_file *m = file->private_data;
m->private = ftrace_graph_funcs;
}
} else
file->private_data = ftrace_graph_funcs;
mutex_unlock(&graph_lock);
return ret;
}
......@@ -2560,7 +2552,6 @@ ftrace_graph_write(struct file *file, const char __user *ubuf,
size_t cnt, loff_t *ppos)
{
struct trace_parser parser;
unsigned long *array;
size_t read = 0;
ssize_t ret;
......@@ -2574,12 +2565,6 @@ ftrace_graph_write(struct file *file, const char __user *ubuf,
goto out;
}
if (file->f_mode & FMODE_READ) {
struct seq_file *m = file->private_data;
array = m->private;
} else
array = file->private_data;
if (trace_parser_get_init(&parser, FTRACE_BUFF_MAX)) {
ret = -ENOMEM;
goto out;
......@@ -2591,7 +2576,7 @@ ftrace_graph_write(struct file *file, const char __user *ubuf,
parser.buffer[parser.idx] = 0;
/* we allow only one expression at a time */
ret = ftrace_set_func(array, &ftrace_graph_count,
ret = ftrace_set_func(ftrace_graph_funcs, &ftrace_graph_count,
parser.buffer);
if (ret)
goto out;
......
......@@ -125,13 +125,13 @@ int ftrace_dump_on_oops;
static int tracing_set_tracer(const char *buf);
#define BOOTUP_TRACER_SIZE 100
static char bootup_tracer_buf[BOOTUP_TRACER_SIZE] __initdata;
#define MAX_TRACER_SIZE 100
static char bootup_tracer_buf[MAX_TRACER_SIZE] __initdata;
static char *default_bootup_tracer;
static int __init set_ftrace(char *str)
{
strncpy(bootup_tracer_buf, str, BOOTUP_TRACER_SIZE);
strncpy(bootup_tracer_buf, str, MAX_TRACER_SIZE);
default_bootup_tracer = bootup_tracer_buf;
/* We are using ftrace early, expand it */
ring_buffer_expanded = 1;
......@@ -241,13 +241,6 @@ static struct tracer *trace_types __read_mostly;
/* current_trace points to the tracer that is currently active */
static struct tracer *current_trace __read_mostly;
/*
* max_tracer_type_len is used to simplify the allocating of
* buffers to read userspace tracer names. We keep track of
* the longest tracer name registered.
*/
static int max_tracer_type_len;
/*
* trace_types_lock is used to protect the trace_types list.
* This lock is also used to keep user access serialized.
......@@ -619,7 +612,6 @@ __releases(kernel_lock)
__acquires(kernel_lock)
{
struct tracer *t;
int len;
int ret = 0;
if (!type->name) {
......@@ -627,6 +619,11 @@ __acquires(kernel_lock)
return -1;
}
if (strlen(type->name) > MAX_TRACER_SIZE) {
pr_info("Tracer has a name longer than %d\n", MAX_TRACER_SIZE);
return -1;
}
/*
* When this gets called we hold the BKL which means that
* preemption is disabled. Various trace selftests however
......@@ -641,7 +638,7 @@ __acquires(kernel_lock)
for (t = trace_types; t; t = t->next) {
if (strcmp(type->name, t->name) == 0) {
/* already found */
pr_info("Trace %s already registered\n",
pr_info("Tracer %s already registered\n",
type->name);
ret = -1;
goto out;
......@@ -692,9 +689,6 @@ __acquires(kernel_lock)
type->next = trace_types;
trace_types = type;
len = strlen(type->name);
if (len > max_tracer_type_len)
max_tracer_type_len = len;
out:
tracing_selftest_running = false;
......@@ -703,7 +697,7 @@ __acquires(kernel_lock)
if (ret || !default_bootup_tracer)
goto out_unlock;
if (strncmp(default_bootup_tracer, type->name, BOOTUP_TRACER_SIZE))
if (strncmp(default_bootup_tracer, type->name, MAX_TRACER_SIZE))
goto out_unlock;
printk(KERN_INFO "Starting tracer '%s'\n", type->name);
......@@ -725,14 +719,13 @@ __acquires(kernel_lock)
void unregister_tracer(struct tracer *type)
{
struct tracer **t;
int len;
mutex_lock(&trace_types_lock);
for (t = &trace_types; *t; t = &(*t)->next) {
if (*t == type)
goto found;
}
pr_info("Trace %s not registered\n", type->name);
pr_info("Tracer %s not registered\n", type->name);
goto out;
found:
......@@ -745,17 +738,7 @@ void unregister_tracer(struct tracer *type)
current_trace->stop(&global_trace);
current_trace = &nop_trace;
}
if (strlen(type->name) != max_tracer_type_len)
goto out;
max_tracer_type_len = 0;
for (t = &trace_types; *t; t = &(*t)->next) {
len = strlen((*t)->name);
if (len > max_tracer_type_len)
max_tracer_type_len = len;
}
out:
out:
mutex_unlock(&trace_types_lock);
}
......@@ -2604,7 +2587,7 @@ static ssize_t
tracing_set_trace_read(struct file *filp, char __user *ubuf,
size_t cnt, loff_t *ppos)
{
char buf[max_tracer_type_len+2];
char buf[MAX_TRACER_SIZE+2];
int r;
mutex_lock(&trace_types_lock);
......@@ -2754,15 +2737,15 @@ static ssize_t
tracing_set_trace_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos)
{
char buf[max_tracer_type_len+1];
char buf[MAX_TRACER_SIZE+1];
int i;
size_t ret;
int err;
ret = cnt;
if (cnt > max_tracer_type_len)
cnt = max_tracer_type_len;
if (cnt > MAX_TRACER_SIZE)
cnt = MAX_TRACER_SIZE;
if (copy_from_user(&buf, ubuf, cnt))
return -EFAULT;
......
......@@ -8,6 +8,57 @@
#include <linux/module.h>
#include "trace.h"
/*
* We can't use a size but a type in alloc_percpu()
* So let's create a dummy type that matches the desired size
*/
typedef struct {char buf[FTRACE_MAX_PROFILE_SIZE];} profile_buf_t;
char *trace_profile_buf;
EXPORT_SYMBOL_GPL(trace_profile_buf);
char *trace_profile_buf_nmi;
EXPORT_SYMBOL_GPL(trace_profile_buf_nmi);
/* Count the events in use (per event id, not per instance) */
static int total_profile_count;
static int ftrace_profile_enable_event(struct ftrace_event_call *event)
{
char *buf;
int ret = -ENOMEM;
if (atomic_inc_return(&event->profile_count))
return 0;
if (!total_profile_count++) {
buf = (char *)alloc_percpu(profile_buf_t);
if (!buf)
goto fail_buf;
rcu_assign_pointer(trace_profile_buf, buf);
buf = (char *)alloc_percpu(profile_buf_t);
if (!buf)
goto fail_buf_nmi;
rcu_assign_pointer(trace_profile_buf_nmi, buf);
}
ret = event->profile_enable();
if (!ret)
return 0;
kfree(trace_profile_buf_nmi);
fail_buf_nmi:
kfree(trace_profile_buf);
fail_buf:
total_profile_count--;
atomic_dec(&event->profile_count);
return ret;
}
int ftrace_profile_enable(int event_id)
{
struct ftrace_event_call *event;
......@@ -17,7 +68,7 @@ int ftrace_profile_enable(int event_id)
list_for_each_entry(event, &ftrace_events, list) {
if (event->id == event_id && event->profile_enable &&
try_module_get(event->mod)) {
ret = event->profile_enable(event);
ret = ftrace_profile_enable_event(event);
break;
}
}
......@@ -26,6 +77,33 @@ int ftrace_profile_enable(int event_id)
return ret;
}
static void ftrace_profile_disable_event(struct ftrace_event_call *event)
{
char *buf, *nmi_buf;
if (!atomic_add_negative(-1, &event->profile_count))
return;
event->profile_disable();
if (!--total_profile_count) {
buf = trace_profile_buf;
rcu_assign_pointer(trace_profile_buf, NULL);
nmi_buf = trace_profile_buf_nmi;
rcu_assign_pointer(trace_profile_buf_nmi, NULL);
/*
* Ensure every events in profiling have finished before
* releasing the buffers
*/
synchronize_sched();
free_percpu(buf);
free_percpu(nmi_buf);
}
}
void ftrace_profile_disable(int event_id)
{
struct ftrace_event_call *event;
......@@ -33,7 +111,7 @@ void ftrace_profile_disable(int event_id)
mutex_lock(&event_mutex);
list_for_each_entry(event, &ftrace_events, list) {
if (event->id == event_id) {
event->profile_disable(event);
ftrace_profile_disable_event(event);
module_put(event->mod);
break;
}
......
......@@ -271,42 +271,32 @@ ftrace_event_write(struct file *file, const char __user *ubuf,
static void *
t_next(struct seq_file *m, void *v, loff_t *pos)
{
struct list_head *list = m->private;
struct ftrace_event_call *call;
struct ftrace_event_call *call = v;
(*pos)++;
for (;;) {
if (list == &ftrace_events)
return NULL;
call = list_entry(list, struct ftrace_event_call, list);
list_for_each_entry_continue(call, &ftrace_events, list) {
/*
* The ftrace subsystem is for showing formats only.
* They can not be enabled or disabled via the event files.
*/
if (call->regfunc)
break;
list = list->next;
return call;
}
m->private = list->next;
return call;
return NULL;
}
static void *t_start(struct seq_file *m, loff_t *pos)
{
struct ftrace_event_call *call = NULL;
struct ftrace_event_call *call;
loff_t l;
mutex_lock(&event_mutex);
m->private = ftrace_events.next;
call = list_entry(&ftrace_events, struct ftrace_event_call, list);
for (l = 0; l <= *pos; ) {
call = t_next(m, NULL, &l);
call = t_next(m, call, &l);
if (!call)
break;
}
......@@ -316,37 +306,28 @@ static void *t_start(struct seq_file *m, loff_t *pos)
static void *
s_next(struct seq_file *m, void *v, loff_t *pos)
{
struct list_head *list = m->private;
struct ftrace_event_call *call;
struct ftrace_event_call *call = v;
(*pos)++;
retry:
if (list == &ftrace_events)
return NULL;
call = list_entry(list, struct ftrace_event_call, list);
if (!call->enabled) {
list = list->next;
goto retry;
list_for_each_entry_continue(call, &ftrace_events, list) {
if (call->enabled)
return call;
}
m->private = list->next;
return call;
return NULL;
}
static void *s_start(struct seq_file *m, loff_t *pos)
{
struct ftrace_event_call *call = NULL;
struct ftrace_event_call *call;
loff_t l;
mutex_lock(&event_mutex);
m->private = ftrace_events.next;
call = list_entry(&ftrace_events, struct ftrace_event_call, list);
for (l = 0; l <= *pos; ) {
call = s_next(m, NULL, &l);
call = s_next(m, call, &l);
if (!call)
break;
}
......
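Note on the `t_next()`/`s_next()` hunks above: the iterators now take the previous entry as their cursor and resume with `list_for_each_entry_continue()`, while `t_start()`/`s_start()` seed the cursor with the list head itself viewed as an entry. The following is a minimal userspace sketch of that pattern, not the kernel code. `struct event`, `next_enabled()` and `start_at()` are hypothetical names, and the list helpers are simplified stand-ins for `<linux/list.h>`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's list primitives (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical entry type playing the role of struct ftrace_event_call. */
struct event {
	int id;
	int enabled;
	struct list_head list;
};

static struct list_head events = { &events, &events };

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/*
 * Return the first enabled event strictly after pos, or NULL at the end.
 * pos may be the head sentinel viewed as an entry; only pos->list is
 * ever touched in that case, which is the same trick the kernel relies on.
 */
static struct event *next_enabled(struct event *pos)
{
	assert(pos != NULL);
	for (pos = container_of(pos->list.next, struct event, list);
	     &pos->list != &events;
	     pos = container_of(pos->list.next, struct event, list)) {
		if (pos->enabled)
			return pos;
	}
	return NULL;
}

/* Walk from the head to the target-th enabled event (0-based). */
static struct event *start_at(long target)
{
	struct event *ev = container_of(&events, struct event, list);
	long l;

	for (l = 0; l <= target; l++) {
		ev = next_enabled(ev);
		if (!ev)
			break;
	}
	return ev;
}
```

Because the cursor is the entry itself, no position needs to be stashed in `m->private`, which is why those assignments disappear in the hunks above.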
......@@ -11,7 +11,6 @@
#include <linux/ftrace.h>
#include <linux/string.h>
#include <linux/module.h>
#include <linux/marker.h>
#include <linux/mutex.h>
#include <linux/ctype.h>
#include <linux/list.h>
......
......@@ -384,10 +384,13 @@ static int sys_prof_refcount_exit;
static void prof_syscall_enter(struct pt_regs *regs, long id)
{
struct syscall_trace_enter *rec;
struct syscall_metadata *sys_data;
struct syscall_trace_enter *rec;
unsigned long flags;
char *raw_data;
int syscall_nr;
int size;
int cpu;
syscall_nr = syscall_get_nr(current, regs);
if (!test_bit(syscall_nr, enabled_prof_enter_syscalls))
......@@ -402,20 +405,38 @@ static void prof_syscall_enter(struct pt_regs *regs, long id)
size = ALIGN(size + sizeof(u32), sizeof(u64));
size -= sizeof(u32);
do {
char raw_data[size];
if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
"profile buffer not large enough"))
return;
/* Protect the per cpu buffer, begin the rcu read side */
local_irq_save(flags);
/* zero the dead bytes from align to not leak stack to user */
*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
cpu = smp_processor_id();
if (in_nmi())
raw_data = rcu_dereference(trace_profile_buf_nmi);
else
raw_data = rcu_dereference(trace_profile_buf);
if (!raw_data)
goto end;
rec = (struct syscall_trace_enter *) raw_data;
tracing_generic_entry_update(&rec->ent, 0, 0);
rec->ent.type = sys_data->enter_id;
rec->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args,
(unsigned long *)&rec->args);
perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
} while(0);
raw_data = per_cpu_ptr(raw_data, cpu);
/* zero the dead bytes from align to not leak stack to user */
*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
rec = (struct syscall_trace_enter *) raw_data;
tracing_generic_entry_update(&rec->ent, 0, 0);
rec->ent.type = sys_data->enter_id;
rec->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args,
(unsigned long *)&rec->args);
perf_tpcounter_event(sys_data->enter_id, 0, 1, rec, size);
end:
local_irq_restore(flags);
}
int reg_prof_syscall_enter(char *name)
......@@ -460,8 +481,12 @@ void unreg_prof_syscall_enter(char *name)
static void prof_syscall_exit(struct pt_regs *regs, long ret)
{
struct syscall_metadata *sys_data;
struct syscall_trace_exit rec;
struct syscall_trace_exit *rec;
unsigned long flags;
int syscall_nr;
char *raw_data;
int size;
int cpu;
syscall_nr = syscall_get_nr(current, regs);
if (!test_bit(syscall_nr, enabled_prof_exit_syscalls))
......@@ -471,12 +496,46 @@ static void prof_syscall_exit(struct pt_regs *regs, long ret)
if (!sys_data)
return;
tracing_generic_entry_update(&rec.ent, 0, 0);
rec.ent.type = sys_data->exit_id;
rec.nr = syscall_nr;
rec.ret = syscall_get_return_value(current, regs);
/* We can probably do that at build time */
size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
size -= sizeof(u32);
perf_tpcounter_event(sys_data->exit_id, 0, 1, &rec, sizeof(rec));
/*
* Impossible, but be paranoid with the future
* How to put this check outside runtime?
*/
if (WARN_ONCE(size > FTRACE_MAX_PROFILE_SIZE,
"exit event has grown above profile buffer size"))
return;
/* Protect the per cpu buffer, begin the rcu read side */
local_irq_save(flags);
cpu = smp_processor_id();
if (in_nmi())
raw_data = rcu_dereference(trace_profile_buf_nmi);
else
raw_data = rcu_dereference(trace_profile_buf);
if (!raw_data)
goto end;
raw_data = per_cpu_ptr(raw_data, cpu);
/* zero the dead bytes from align to not leak stack to user */
*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
rec = (struct syscall_trace_exit *)raw_data;
tracing_generic_entry_update(&rec->ent, 0, 0);
rec->ent.type = sys_data->exit_id;
rec->nr = syscall_nr;
rec->ret = syscall_get_return_value(current, regs);
perf_tpcounter_event(sys_data->exit_id, 0, 1, rec, size);
end:
local_irq_restore(flags);
}
int reg_prof_syscall_exit(char *name)
......
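Note on the size arithmetic in `prof_syscall_enter()`/`prof_syscall_exit()` above: the record size is rounded up so that the record plus a `u32` is a multiple of `sizeof(u64)`, presumably because the consumer stores a `u32` size field in front of the raw data, and the last eight bytes of the buffer are zeroed before the record is filled in so that padding bytes are not leaked. A minimal userspace sketch of that arithmetic follows. `ALIGN_UP`, `aligned_record_size()` and `zero_tail()` are hypothetical names, and `memset()` stands in for the kernel's direct `u64` store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/*
 * Size to reserve for a record of raw_size bytes so that
 * (result + sizeof(uint32_t)) is a multiple of sizeof(uint64_t).
 */
static size_t aligned_record_size(size_t raw_size)
{
	size_t size = ALIGN_UP(raw_size + sizeof(uint32_t), sizeof(uint64_t));

	return size - sizeof(uint32_t);
}

/*
 * Clear the last eight bytes of the buffer. This covers the alignment
 * padding; any record bytes inside that window are overwritten when the
 * record is filled in afterwards.
 */
static void zero_tail(char *buf, size_t size)
{
	assert(size >= sizeof(uint64_t));
	memset(buf + size - sizeof(uint64_t), 0, sizeof(uint64_t));
}
```

For example, a 21-byte record reserves 28 bytes: 21 + 4 rounds up to 32, and subtracting the 4-byte header leaves 28, of which the final 7 bytes are padding.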
......@@ -7,12 +7,6 @@ menuconfig SAMPLES
if SAMPLES
config SAMPLE_MARKERS
tristate "Build markers examples -- loadable modules only"
depends on MARKERS && m
help
This build markers example modules.
config SAMPLE_TRACEPOINTS
tristate "Build tracepoints examples -- loadable modules only"
depends on TRACEPOINTS && m
......
# Makefile for Linux samples code
obj-$(CONFIG_SAMPLES) += markers/ kobject/ kprobes/ tracepoints/ trace_events/
obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ tracepoints/ trace_events/
# builds the kprobes example kernel modules;
# then to use one (as root): insmod <module_name.ko>
obj-$(CONFIG_SAMPLE_MARKERS) += probe-example.o marker-example.o
/* marker-example.c
*
* Executes a marker when /proc/marker-example is opened.
*
* (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
*
* This file is released under the GPLv2.
* See the file COPYING for more details.
*/
#include <linux/module.h>
#include <linux/marker.h>
#include <linux/sched.h>
#include <linux/proc_fs.h>
struct proc_dir_entry *pentry_example;
static int my_open(struct inode *inode, struct file *file)
{
int i;
trace_mark(subsystem_event, "integer %d string %s", 123,
"example string");
for (i = 0; i < 10; i++)
trace_mark(subsystem_eventb, MARK_NOARGS);
return -EPERM;
}
static struct file_operations mark_ops = {
.open = my_open,
};
static int __init example_init(void)
{
printk(KERN_ALERT "example init\n");
pentry_example = proc_create("marker-example", 0444, NULL, &mark_ops);
if (!pentry_example)
return -EPERM;
return 0;
}
static void __exit example_exit(void)
{
printk(KERN_ALERT "example exit\n");
remove_proc_entry("marker-example", NULL);
}
module_init(example_init)
module_exit(example_exit)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Mathieu Desnoyers");
MODULE_DESCRIPTION("Marker example");
/* probe-example.c
*
* Connects two functions to marker call sites.
*
* (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
*
* This file is released under the GPLv2.
* See the file COPYING for more details.
*/
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/marker.h>
#include <asm/atomic.h>
struct probe_data {
const char *name;
const char *format;
marker_probe_func *probe_func;
};
void probe_subsystem_event(void *probe_data, void *call_data,
const char *format, va_list *args)
{
/* Declare args */
unsigned int value;
const char *mystr;
/* Assign args */
value = va_arg(*args, typeof(value));
mystr = va_arg(*args, typeof(mystr));
/* Call printk */
printk(KERN_INFO "Value %u, string %s\n", value, mystr);
/* or count, check rights, serialize data in a buffer */
}
atomic_t eventb_count = ATOMIC_INIT(0);
void probe_subsystem_eventb(void *probe_data, void *call_data,
const char *format, va_list *args)
{
/* Increment counter */
atomic_inc(&eventb_count);
}
static struct probe_data probe_array[] =
{
{ .name = "subsystem_event",
.format = "integer %d string %s",
.probe_func = probe_subsystem_event },
{ .name = "subsystem_eventb",
.format = MARK_NOARGS,
.probe_func = probe_subsystem_eventb },
};
static int __init probe_init(void)
{
int result;
int i;
for (i = 0; i < ARRAY_SIZE(probe_array); i++) {
result = marker_probe_register(probe_array[i].name,
probe_array[i].format,
probe_array[i].probe_func, &probe_array[i]);
if (result)
printk(KERN_INFO "Unable to register probe %s\n",
probe_array[i].name);
}
return 0;
}
static void __exit probe_fini(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(probe_array); i++)
marker_probe_unregister(probe_array[i].name,
probe_array[i].probe_func, &probe_array[i]);
printk(KERN_INFO "Number of event b : %u\n",
atomic_read(&eventb_count));
marker_synchronize_unregister();
}
module_init(probe_init);
module_exit(probe_fini);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Mathieu Desnoyers");
MODULE_DESCRIPTION("SUBSYSTEM Probe");
......@@ -13,7 +13,6 @@
# 2) modpost is then used to
# 3) create one <module>.mod.c file pr. module
# 4) create one Module.symvers file with CRC for all exported symbols
# 4a) [CONFIG_MARKERS] create one Module.markers file listing defined markers
# 5) compile all <module>.mod.c files
# 6) final link of the module to a <module.ko> file
......@@ -59,10 +58,6 @@ include scripts/Makefile.lib
kernelsymfile := $(objtree)/Module.symvers
modulesymfile := $(firstword $(KBUILD_EXTMOD))/Module.symvers
kernelmarkersfile := $(objtree)/Module.markers
modulemarkersfile := $(firstword $(KBUILD_EXTMOD))/Module.markers
markersfile = $(if $(KBUILD_EXTMOD),$(modulemarkersfile),$(kernelmarkersfile))
# Step 1), find all modules listed in $(MODVERDIR)/
__modules := $(sort $(shell grep -h '\.ko' /dev/null $(wildcard $(MODVERDIR)/*.mod)))
......@@ -85,8 +80,6 @@ modpost = scripts/mod/modpost \
$(if $(KBUILD_EXTRA_SYMBOLS), $(patsubst %, -e %,$(KBUILD_EXTRA_SYMBOLS))) \
$(if $(KBUILD_EXTMOD),-o $(modulesymfile)) \
$(if $(CONFIG_DEBUG_SECTION_MISMATCH),,-S) \
$(if $(CONFIG_MARKERS),-K $(kernelmarkersfile)) \
$(if $(CONFIG_MARKERS),-M $(markersfile)) \
$(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w) \
$(if $(cross_build),-c)
......@@ -101,17 +94,12 @@ quiet_cmd_kernel-mod = MODPOST $@
cmd_kernel-mod = $(modpost) $@
vmlinux.o: FORCE
@rm -fr $(kernelmarkersfile)
$(call cmd,kernel-mod)
# Declare generated files as targets for modpost
$(symverfile): __modpost ;
$(modules:.ko=.mod.c): __modpost ;
ifdef CONFIG_MARKERS
$(markersfile): __modpost ;
endif
# Step 5), compile all *.mod.c files
......