Commit a2ded784 authored by Linus Torvalds

Merge tag 'trace-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Allow kernel trace instance creation to specify what events are
   created

   Inside the kernel, a subsystem may create a tracing instance that it
   can use to send events to user space. This sub-system may not care
   about the thousands of events that exist in eventfs. Allow the
   sub-system to specify what sub-systems of events it cares about, and
   only those events are exposed to this instance.

 - Allow the ring buffer to be broken up into bigger sub-buffers than
   just the architecture page size.

   A new tracefs file called "buffer_subbuf_size_kb" is created. The
   user can now specify a minimum size the sub-buffer may be in
   kilobytes. Note that the implementation currently makes the
   sub-buffer size a power of 2 pages (1, 2, 4, 8, 16, ...) but the user
   only writes in kilobyte size, and the sub-buffer will be updated to
   the next size that can accommodate it. If the user writes in
   10, it will change the size to be 4 pages on x86 (16K), as that is
   the next available size that can hold 10K.

 - Update the debug output when a corrupt time is detected in the ring
   buffer. If the ring buffer detects inconsistent timestamps, there's a
   debug config option that will dump the contents of the meta data of
   the sub-buffer that is used for debugging. Add some more information
   to this dump that helps with debugging.

 - Add more timestamp debugging checks (only triggers when the config is
   enabled)

 - Increase the trace_seq iterator to 2 page sizes.

 - Allow strings written into trace_marker to be larger. Up to just
   under 2 page sizes (based on what trace_seq can hold).

 - Increase the trace_marker_raw write to be as big as a sub-buffer can
   hold.

 - Remove 32 bit time stamp logic, now that the rb_time_cmpxchg() has
   been removed.

 - More selftests were added.

 - Some code clean ups as well.
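
The rounding rule described for buffer_subbuf_size_kb in the list above (kilobytes in, a power-of-2 number of pages out) can be modeled in user space. This is a minimal sketch and not the kernel's implementation: roundup_pow2() and subbuf_pages_for_kb() are hypothetical helper names, and the page size is passed in as a parameter (4K is assumed for x86).

```c
#include <stddef.h>

/* Hypothetical helper: round n (n >= 1) up to the next power of two. */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model of the rule in the commit message: return the
 * smallest power-of-two number of pages that can hold at least
 * 'kb' kilobytes, given a page size of 'page_kb' kilobytes.
 */
unsigned long subbuf_pages_for_kb(unsigned long kb, unsigned long page_kb)
{
	unsigned long pages = (kb + page_kb - 1) / page_kb; /* round up */

	if (pages == 0)
		pages = 1;
	return roundup_pow2(pages);
}
```

With 4K pages, a request of 10 needs three pages, which rounds up to 4 pages (16K), matching the example in the commit message.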

* tag 'trace-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
  ring-buffer: Remove stale comment from ring_buffer_size()
  tracing histograms: Simplify parse_actions() function
  tracing/selftests: Remove exec permissions from trace_marker.tc test
  ring-buffer: Use subbuf_order for buffer page masking
  tracing: Update subbuffer with kilobytes not page order
  ringbuffer/selftest: Add basic selftest to test changing subbuf order
  ring-buffer: Add documentation on the buffer_subbuf_order file
  ring-buffer: Just update the subbuffers when changing their allocation order
  ring-buffer: Keep the same size when updating the order
  tracing: Stop the tracing while changing the ring buffer subbuf size
  tracing: Update snapshot order along with main buffer order
  ring-buffer: Make sure the spare sub buffer used for reads has same size
  ring-buffer: Do no swap cpu buffers if order is different
  ring-buffer: Clear pages on error in ring_buffer_subbuf_order_set() failure
  ring-buffer: Read and write to ring buffers with custom sub buffer size
  ring-buffer: Set new size of the ring buffer sub page
  ring-buffer: Add interface for configuring trace sub buffer size
  ring-buffer: Page size per ring buffer
  ring-buffer: Have ring_buffer_print_page_header() be able to access ring_buffer_iter
  ring-buffer: Check if absolute timestamp goes backwards
  ...
parents 5b890ad4 25742aeb
......@@ -218,6 +218,27 @@ of ftrace. Here is a list of some of the key files:
This displays the total combined size of all the trace buffers.
buffer_subbuf_size_kb:
This sets or displays the sub buffer size. The ring buffer is broken up
into several same size "sub buffers". An event can not be bigger than
the size of the sub buffer. Normally, the sub buffer is the size of the
architecture's page (4K on x86). The sub buffer also contains meta data
at the start which also limits the size of an event. That means when
the sub buffer is a page size, no event can be larger than the page
size minus the sub buffer meta data.
Note, the buffer_subbuf_size_kb is a way for the user to specify the
minimum size of the subbuffer. The kernel may make it bigger due to the
implementation details, or simply fail the operation if the kernel can
not handle the request.
Changing the sub buffer size allows for events to be larger than the
page size.
Note: When changing the sub-buffer size, tracing is stopped and any
data in the ring buffer and the snapshot buffer will be discarded.
free_buffer:
If a process is performing tracing, and the ring buffer should be
......
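
As the documentation hunk above says, buffer_subbuf_size_kb both sets and displays the sub buffer size. The following user-space sketch writes a value and reads it back. The helper names are hypothetical, the path is supplied by the caller (tracefs is commonly mounted at /sys/kernel/tracing, which is an assumption about the system), and the value read back may be larger than the one written because the kernel may round the size up.

```c
#include <stdio.h>

/* Hypothetical helper: write a size in kilobytes to the given file. */
int write_subbuf_size_kb(const char *path, int kb)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", kb) < 0) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the write, so a rejected value shows up at fclose() */
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the current size in kilobytes, or -1 on error. */
int read_subbuf_size_kb(const char *path)
{
	FILE *f = fopen(path, "r");
	int kb;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &kb) != 1)
		kb = -1;
	fclose(f);
	return kb;
}
```

Keep in mind the note in the documentation: changing the size stops tracing and discards the data in the ring buffer and the snapshot buffer.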
......@@ -2889,7 +2889,7 @@ static void qla2x00_iocb_work_fn(struct work_struct *work)
static void
qla_trace_init(void)
{
-qla_trc_array = trace_array_get_by_name("qla2xxx");
+qla_trc_array = trace_array_get_by_name("qla2xxx", NULL);
if (!qla_trc_array) {
ql_log(ql_log_fatal, NULL, 0x0001,
"Unable to create qla2xxx trace instance, instance logging will be disabled.\n");
......
......@@ -141,6 +141,7 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter);
bool ring_buffer_iter_dropped(struct ring_buffer_iter *iter);
unsigned long ring_buffer_size(struct trace_buffer *buffer, int cpu);
+unsigned long ring_buffer_max_event_size(struct trace_buffer *buffer);
void ring_buffer_reset_cpu(struct trace_buffer *buffer, int cpu);
void ring_buffer_reset_online_cpus(struct trace_buffer *buffer);
......@@ -191,15 +192,24 @@ bool ring_buffer_time_stamp_abs(struct trace_buffer *buffer);
size_t ring_buffer_nr_pages(struct trace_buffer *buffer, int cpu);
size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu);
-void *ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu);
-void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu, void *data);
-int ring_buffer_read_page(struct trace_buffer *buffer, void **data_page,
+struct buffer_data_read_page;
+struct buffer_data_read_page *
+ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu);
+void ring_buffer_free_read_page(struct trace_buffer *buffer, int cpu,
+struct buffer_data_read_page *page);
+int ring_buffer_read_page(struct trace_buffer *buffer,
+struct buffer_data_read_page *data_page,
size_t len, int cpu, int full);
+void *ring_buffer_read_page_data(struct buffer_data_read_page *page);
struct trace_seq;
int ring_buffer_print_entry_header(struct trace_seq *s);
-int ring_buffer_print_page_header(struct trace_seq *s);
+int ring_buffer_print_page_header(struct trace_buffer *buffer, struct trace_seq *s);
+int ring_buffer_subbuf_order_get(struct trace_buffer *buffer);
+int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order);
+int ring_buffer_subbuf_size_get(struct trace_buffer *buffer);
enum ring_buffer_flags {
RB_FL_OVERWRITE = 1 << 0,
......
......@@ -51,7 +51,7 @@ int trace_array_printk(struct trace_array *tr, unsigned long ip,
const char *fmt, ...);
int trace_array_init_printk(struct trace_array *tr);
void trace_array_put(struct trace_array *tr);
-struct trace_array *trace_array_get_by_name(const char *name);
+struct trace_array *trace_array_get_by_name(const char *name, const char *systems);
int trace_array_destroy(struct trace_array *tr);
/* For osnoise tracer */
......@@ -84,7 +84,7 @@ static inline int trace_array_init_printk(struct trace_array *tr)
static inline void trace_array_put(struct trace_array *tr)
{
}
-static inline struct trace_array *trace_array_get_by_name(const char *name)
+static inline struct trace_array *trace_array_get_by_name(const char *name, const char *systems)
{
return NULL;
}
......
......@@ -8,11 +8,14 @@
/*
* Trace sequences are used to allow a function to call several other functions
-* to create a string of data to use (up to a max of PAGE_SIZE).
+* to create a string of data to use.
*/
+#define TRACE_SEQ_BUFFER_SIZE (PAGE_SIZE * 2 - \
+(sizeof(struct seq_buf) + sizeof(size_t) + sizeof(int)))
struct trace_seq {
-char buffer[PAGE_SIZE];
+char buffer[TRACE_SEQ_BUFFER_SIZE];
struct seq_buf seq;
size_t readpos;
int full;
......@@ -21,7 +24,7 @@ struct trace_seq {
static inline void
trace_seq_init(struct trace_seq *s)
{
-seq_buf_init(&s->seq, s->buffer, PAGE_SIZE);
+seq_buf_init(&s->seq, s->buffer, TRACE_SEQ_BUFFER_SIZE);
s->full = 0;
s->readpos = 0;
}
......
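
The TRACE_SEQ_BUFFER_SIZE definition above takes two pages and subtracts the sizes of the other members of struct trace_seq, which is why the commit message describes the limit as "just under 2 page sizes". The following user-space sketch reproduces that arithmetic. Everything prefixed with model_ or MODEL_ is a stand-in: the layout of struct model_seq_buf is an assumption and not the kernel's definition, and the 4096-byte page size is likewise assumed.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size (4K on x86) */

/* Stand-in for the kernel's struct seq_buf; the layout is an assumption. */
struct model_seq_buf {
	char *buffer;
	size_t size;
	size_t len;
};

/* Same shape as the TRACE_SEQ_BUFFER_SIZE macro in the hunk above. */
#define MODEL_TRACE_SEQ_BUFFER_SIZE (MODEL_PAGE_SIZE * 2 - \
	(sizeof(struct model_seq_buf) + sizeof(size_t) + sizeof(int)))

struct model_trace_seq {
	char buffer[MODEL_TRACE_SEQ_BUFFER_SIZE];
	struct model_seq_buf seq;
	size_t readpos;
	int full;
};

/* Number of bytes of string data the modeled trace_seq can hold. */
size_t model_trace_seq_capacity(void)
{
	return sizeof(((struct model_trace_seq *)0)->buffer);
}
```

Under these assumptions the capacity lands between one and two pages, a few dozen bytes short of two full pages.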
......@@ -104,10 +104,11 @@ static enum event_status read_event(int cpu)
static enum event_status read_page(int cpu)
{
+struct buffer_data_read_page *bpage;
struct ring_buffer_event *event;
struct rb_page *rpage;
unsigned long commit;
-void *bpage;
+int page_size;
int *entry;
int ret;
int inc;
......@@ -117,14 +118,15 @@ static enum event_status read_page(int cpu)
if (IS_ERR(bpage))
return EVENT_DROPPED;
-ret = ring_buffer_read_page(buffer, &bpage, PAGE_SIZE, cpu, 1);
+page_size = ring_buffer_subbuf_size_get(buffer);
+ret = ring_buffer_read_page(buffer, bpage, page_size, cpu, 1);
if (ret >= 0) {
-rpage = bpage;
+rpage = ring_buffer_read_page_data(bpage);
/* The commit may have missed event flags set, clear them */
commit = local_read(&rpage->commit) & 0xfffff;
for (i = 0; i < commit && !test_error ; i += inc) {
-if (i >= (PAGE_SIZE - offsetof(struct rb_page, data))) {
+if (i >= (page_size - offsetof(struct rb_page, data))) {
TEST_ERROR();
break;
}
......
......@@ -377,6 +377,7 @@ struct trace_array {
unsigned char trace_flags_index[TRACE_FLAGS_MAX_SIZE];
unsigned int flags;
raw_spinlock_t start_lock;
const char *system_names;
struct list_head err_log;
struct dentry *dir;
struct dentry *options;
......@@ -615,6 +616,7 @@ void tracing_reset_all_online_cpus(void);
void tracing_reset_all_online_cpus_unlocked(void);
int tracing_open_generic(struct inode *inode, struct file *filp);
int tracing_open_generic_tr(struct inode *inode, struct file *filp);
int tracing_release_generic_tr(struct inode *inode, struct file *file);
int tracing_open_file_tr(struct inode *inode, struct file *filp);
int tracing_release_file_tr(struct inode *inode, struct file *filp);
int tracing_single_release_file_tr(struct inode *inode, struct file *filp);
......
......@@ -633,7 +633,7 @@ trace_boot_init_instances(struct xbc_node *node)
if (!p || *p == '\0')
continue;
-tr = trace_array_get_by_name(p);
+tr = trace_array_get_by_name(p, NULL);
if (!tr) {
pr_err("Failed to get trace instance %s\n", p);
continue;
......
......@@ -1893,9 +1893,9 @@ subsystem_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
}
static ssize_t
-show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+show_header_page_file(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
{
-int (*func)(struct trace_seq *s) = filp->private_data;
+struct trace_array *tr = filp->private_data;
struct trace_seq *s;
int r;
......@@ -1908,7 +1908,31 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
trace_seq_init(s);
-func(s);
+ring_buffer_print_page_header(tr->array_buffer.buffer, s);
+r = simple_read_from_buffer(ubuf, cnt, ppos,
+s->buffer, trace_seq_used(s));
+kfree(s);
+return r;
+}
+static ssize_t
+show_header_event_file(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+struct trace_seq *s;
+int r;
+if (*ppos)
+return 0;
+s = kmalloc(sizeof(*s), GFP_KERNEL);
+if (!s)
+return -ENOMEM;
+trace_seq_init(s);
+ring_buffer_print_entry_header(s);
r = simple_read_from_buffer(ubuf, cnt, ppos,
s->buffer, trace_seq_used(s));
......@@ -2165,10 +2189,18 @@ static const struct file_operations ftrace_tr_enable_fops = {
.release = subsystem_release,
};
-static const struct file_operations ftrace_show_header_fops = {
-.open = tracing_open_generic,
-.read = show_header,
+static const struct file_operations ftrace_show_header_page_fops = {
+.open = tracing_open_generic_tr,
+.read = show_header_page_file,
.llseek = default_llseek,
+.release = tracing_release_generic_tr,
};
+static const struct file_operations ftrace_show_header_event_fops = {
+.open = tracing_open_generic_tr,
+.read = show_header_event_file,
+.llseek = default_llseek,
+.release = tracing_release_generic_tr,
+};
static int
......@@ -2896,6 +2928,27 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
up_write(&trace_event_sem);
}
static bool event_in_systems(struct trace_event_call *call,
const char *systems)
{
const char *system;
const char *p;
if (!systems)
return true;
system = call->class->system;
p = strstr(systems, system);
if (!p)
return false;
if (p != systems && !isspace(*(p - 1)) && *(p - 1) != ',')
return false;
p += strlen(system);
return !*p || isspace(*p) || *p == ',';
}
static struct trace_event_file *
trace_create_new_event(struct trace_event_call *call,
struct trace_array *tr)
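
The matching done by event_in_systems() in the hunk above can be tried outside the kernel. The sketch below is a user-space port of the same logic. name_in_systems() is a hypothetical name, and it takes the system name as a plain string instead of reading it from call->class->system.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * User-space port of the check above: 'systems' is a list of system
 * names separated by commas or whitespace, and NULL means "all systems".
 */
bool name_in_systems(const char *system, const char *systems)
{
	const char *p;

	if (!systems)
		return true;

	p = strstr(systems, system);
	if (!p)
		return false;

	/* The match must start the list or follow a separator */
	if (p != systems && !isspace((unsigned char)p[-1]) && p[-1] != ',')
		return false;

	/* The match must end the list or be followed by a separator */
	p += strlen(system);
	return !*p || isspace((unsigned char)*p) || *p == ',';
}
```

With the list "sched,timer,kprobes" used by the sample module in this commit, "sched" and "timer" match, while "irq" and the partial name "sch" do not.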
......@@ -2905,9 +2958,12 @@ trace_create_new_event(struct trace_event_call *call,
struct trace_event_file *file;
unsigned int first;
+if (!event_in_systems(call, tr->system_names))
+return NULL;
file = kmem_cache_alloc(file_cachep, GFP_TRACE);
if (!file)
-return NULL;
+return ERR_PTR(-ENOMEM);
pid_list = rcu_dereference_protected(tr->filtered_pids,
lockdep_is_held(&event_mutex));
......@@ -2972,8 +3028,17 @@ __trace_add_new_event(struct trace_event_call *call, struct trace_array *tr)
struct trace_event_file *file;
file = trace_create_new_event(call, tr);
+/*
+* trace_create_new_event() returns ERR_PTR(-ENOMEM) if failed
+* allocation, or NULL if the event is not part of the tr->system_names.
+* When the event is not part of the tr->system_names, return zero, not
+* an error.
+*/
if (!file)
-return -ENOMEM;
+return 0;
+if (IS_ERR(file))
+return PTR_ERR(file);
if (eventdir_initialized)
return event_create_dir(tr->event_dir, file);
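
The comment in the hunk above describes a three-way return convention: a valid pointer on success, NULL when the event is filtered out by tr->system_names, and an ERR_PTR() value when allocation fails. The following user-space sketch models that convention. The model_* helpers are hypothetical stand-ins written for this example and are not the kernel's ERR_PTR(), PTR_ERR() and IS_ERR().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(); not the kernel macros. */
static void *model_err_ptr(intptr_t err) { return (void *)err; }
static intptr_t model_ptr_err(const void *p) { return (intptr_t)p; }
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static int model_file;	/* dummy object standing in for an event file */

/* Models trace_create_new_event(): NULL = filtered, ERR_PTR = failure. */
void *model_create_event(int in_systems, int alloc_ok)
{
	if (!in_systems)
		return NULL;
	if (!alloc_ok)
		return model_err_ptr(-ENOMEM);
	return &model_file;
}

/* Models the caller: a filtered event is skipped, not treated as an error. */
int model_add_event(int in_systems, int alloc_ok)
{
	void *file = model_create_event(in_systems, alloc_ok);

	if (!file)
		return 0;
	if (model_is_err(file))
		return (int)model_ptr_err(file);
	return 0;	/* success; the kernel continues with directory setup */
}
```

The point of the split is that the caller can tell "nothing to do" apart from "out of memory", which a bare NULL return could not express.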
......@@ -3012,8 +3077,17 @@ __trace_early_add_new_event(struct trace_event_call *call,
int ret;
file = trace_create_new_event(call, tr);
+/*
+* trace_create_new_event() returns ERR_PTR(-ENOMEM) if failed
+* allocation, or NULL if the event is not part of the tr->system_names.
+* When the event is not part of the tr->system_names, return zero, not
+* an error.
+*/
if (!file)
-return -ENOMEM;
+return 0;
+if (IS_ERR(file))
+return PTR_ERR(file);
ret = event_define_fields(call);
if (ret)
......@@ -3752,17 +3826,16 @@ static int events_callback(const char *name, umode_t *mode, void **data,
return 1;
}
-if (strcmp(name, "header_page") == 0)
-*data = ring_buffer_print_page_header;
-else if (strcmp(name, "header_event") == 0)
-*data = ring_buffer_print_entry_header;
+if (strcmp(name, "header_page") == 0) {
+*mode = TRACE_MODE_READ;
+*fops = &ftrace_show_header_page_fops;
-else
+} else if (strcmp(name, "header_event") == 0) {
+*mode = TRACE_MODE_READ;
+*fops = &ftrace_show_header_event_fops;
+} else
return 0;
-*mode = TRACE_MODE_READ;
-*fops = &ftrace_show_header_fops;
return 1;
}
......
......@@ -4805,36 +4805,35 @@ static int parse_actions(struct hist_trigger_data *hist_data)
int len;
for (i = 0; i < hist_data->attrs->n_actions; i++) {
+enum handler_id hid = 0;
+char *action_str;
str = hist_data->attrs->action_str[i];
-if ((len = str_has_prefix(str, "onmatch("))) {
-char *action_str = str + len;
+if ((len = str_has_prefix(str, "onmatch(")))
+hid = HANDLER_ONMATCH;
+else if ((len = str_has_prefix(str, "onmax(")))
+hid = HANDLER_ONMAX;
+else if ((len = str_has_prefix(str, "onchange(")))
+hid = HANDLER_ONCHANGE;
-data = onmatch_parse(tr, action_str);
-if (IS_ERR(data)) {
-ret = PTR_ERR(data);
-break;
-}
-} else if ((len = str_has_prefix(str, "onmax("))) {
-char *action_str = str + len;
+action_str = str + len;
-data = track_data_parse(hist_data, action_str,
-HANDLER_ONMAX);
-if (IS_ERR(data)) {
-ret = PTR_ERR(data);
-break;
-}
-} else if ((len = str_has_prefix(str, "onchange("))) {
-char *action_str = str + len;
+switch (hid) {
+case HANDLER_ONMATCH:
+data = onmatch_parse(tr, action_str);
+break;
+case HANDLER_ONMAX:
+case HANDLER_ONCHANGE:
+data = track_data_parse(hist_data, action_str, hid);
+break;
+default:
+data = ERR_PTR(-EINVAL);
+break;
+}
-data = track_data_parse(hist_data, action_str,
-HANDLER_ONCHANGE);
-if (IS_ERR(data)) {
-ret = PTR_ERR(data);
-break;
-}
-} else {
-ret = -EINVAL;
+if (IS_ERR(data)) {
+ret = PTR_ERR(data);
break;
}
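
The simplified parse_actions() above first maps the action prefix to a handler id and then dispatches on that id once. A user-space sketch of the first half is shown below. The model_* names are hypothetical, and model_str_has_prefix() is written here to behave the way the hunk relies on: it returns the prefix length on a match and 0 otherwise.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ids for this sketch; 0 means "no handler matched". */
enum model_handler_id {
	MODEL_HANDLER_NONE = 0,
	MODEL_HANDLER_ONMATCH,
	MODEL_HANDLER_ONMAX,
	MODEL_HANDLER_ONCHANGE,
};

/* Return strlen(prefix) if str starts with prefix, otherwise 0. */
static size_t model_str_has_prefix(const char *str, const char *prefix)
{
	size_t len = strlen(prefix);

	return strncmp(str, prefix, len) == 0 ? len : 0;
}

/*
 * Classify an action string and point *action_str just past the
 * matched prefix (or at the start of str when nothing matched).
 */
enum model_handler_id model_classify_action(const char *str,
					     const char **action_str)
{
	enum model_handler_id hid = MODEL_HANDLER_NONE;
	size_t len;

	if ((len = model_str_has_prefix(str, "onmatch(")))
		hid = MODEL_HANDLER_ONMATCH;
	else if ((len = model_str_has_prefix(str, "onmax(")))
		hid = MODEL_HANDLER_ONMAX;
	else if ((len = model_str_has_prefix(str, "onchange(")))
		hid = MODEL_HANDLER_ONCHANGE;

	*action_str = str + len;
	return hid;
}
```

Because an unmatched string leaves the id at 0, a single default case in the later switch can report the error, which is what lets the patch drop the repeated IS_ERR() checks.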
......
......@@ -13,9 +13,6 @@
* trace_seq_init() more than once to reset the trace_seq to start
* from scratch.
*
-* The buffer size is currently PAGE_SIZE, although it may become dynamic
-* in the future.
-*
* A write to the buffer will either succeed or fail. That is, unlike
* sprintf() there will not be a partial write (well it may write into
* the buffer but it wont update the pointers). This allows users to
......
......@@ -105,7 +105,7 @@ static int __init sample_trace_array_init(void)
* NOTE: This function increments the reference counter
* associated with the trace array - "tr".
*/
-tr = trace_array_get_by_name("sample-instance");
+tr = trace_array_get_by_name("sample-instance", "sched,timer,kprobes");
if (!tr)
return -1;
......
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: Change the ringbuffer sub-buffer size
# requires: buffer_subbuf_size_kb
# flags: instance
get_buffer_data_size() {
sed -ne 's/^.*data.*size:\([0-9][0-9]*\).*/\1/p' events/header_page
}
get_buffer_data_offset() {
sed -ne 's/^.*data.*offset:\([0-9][0-9]*\).*/\1/p' events/header_page
}
get_event_header_size() {
type_len=`sed -ne 's/^.*type_len.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
time_len=`sed -ne 's/^.*time_delta.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
array_len=`sed -ne 's/^.*array.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
total_bits=$((type_len+time_len+array_len))
total_bits=$((total_bits+7))
echo $((total_bits/8))
}
get_print_event_buf_offset() {
sed -ne 's/^.*buf.*offset:\([0-9][0-9]*\).*/\1/p' events/ftrace/print/format
}
event_header_size=`get_event_header_size`
print_header_size=`get_print_event_buf_offset`
data_offset=`get_buffer_data_offset`
marker_meta=$((event_header_size+print_header_size))
make_str() {
cnt=$1
printf -- 'X%.0s' $(seq $cnt)
}
write_buffer() {
size=$1
str=`make_str $size`
# clear the buffer
echo > trace
# write the string into the marker
echo $str > trace_marker
echo $str
}
test_buffer() {
size_kb=$1
page_size=$((size_kb*1024))
size=`get_buffer_data_size`
# the size must be greater than or equal to page_size - data_offset
page_size=$((page_size-data_offset))
if [ $size -lt $page_size ]; then
exit fail
fi
# Write a string as large as the data area; the meta data overhead makes it overflow
str=`write_buffer $size`
# Make sure the line was broken
new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: /,"");printf "%s", $0; exit}' trace`
if [ "$new_str" = "$str" ]; then
exit fail;
fi
# Make sure the entire line can be found
new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: /,"");printf "%s", $0; }' trace`
if [ "$new_str" != "$str" ]; then
exit fail;
fi
}
ORIG=`cat buffer_subbuf_size_kb`
# Could test bigger sizes than 32K, but then creating the string
# to write into the ring buffer takes too long
for a in 4 8 16 32 ; do
echo $a > buffer_subbuf_size_kb
test_buffer $a
done
echo $ORIG > buffer_subbuf_size_kb
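
The get_event_header_size() helper in the script above adds up the bit widths reported in events/header_event and rounds the total up to whole bytes. The same arithmetic as a small C sketch follows. bits_to_bytes() and event_header_bytes() are hypothetical names, and the field widths are supplied by the caller, since the script reads them from the file at run time.

```c
/* Round a bit count up to whole bytes, as the script does with (bits+7)/8. */
unsigned int bits_to_bytes(unsigned int bits)
{
	return (bits + 7) / 8;
}

/* Sum the three header field widths (in bits) and convert to bytes. */
unsigned int event_header_bytes(unsigned int type_len_bits,
				unsigned int time_delta_bits,
				unsigned int array_bits)
{
	return bits_to_bytes(type_len_bits + time_delta_bits + array_bits);
}
```

For example, widths of 5, 27 and 32 bits (assumed values for illustration) add up to 64 bits, which is an 8-byte header.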
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: Basic tests on writing to trace_marker
# requires: trace_marker
# flags: instance
get_buffer_data_size() {
sed -ne 's/^.*data.*size:\([0-9][0-9]*\).*/\1/p' events/header_page
}
get_buffer_data_offset() {
sed -ne 's/^.*data.*offset:\([0-9][0-9]*\).*/\1/p' events/header_page
}
get_event_header_size() {
type_len=`sed -ne 's/^.*type_len.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
time_len=`sed -ne 's/^.*time_delta.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
array_len=`sed -ne 's/^.*array.*:[^0-9]*\([0-9][0-9]*\).*/\1/p' events/header_event`
total_bits=$((type_len+time_len+array_len))
total_bits=$((total_bits+7))
echo $((total_bits/8))
}
get_print_event_buf_offset() {
sed -ne 's/^.*buf.*offset:\([0-9][0-9]*\).*/\1/p' events/ftrace/print/format
}
event_header_size=`get_event_header_size`
print_header_size=`get_print_event_buf_offset`
data_offset=`get_buffer_data_offset`
marker_meta=$((event_header_size+print_header_size))
make_str() {
cnt=$1
# subtract two for \n\0 as marker adds these
cnt=$((cnt-2))
printf -- 'X%.0s' $(seq $cnt)
}
write_buffer() {
size=$1
str=`make_str $size`
# clear the buffer
echo > trace
# write the string into the marker
echo -n $str > trace_marker
echo $str
}
test_buffer() {
size=`get_buffer_data_size`
oneline_size=$((size-marker_meta))
echo size = $size
echo meta size = $marker_meta
# Write a string as large as the data area; the meta data overhead makes it overflow
str=`write_buffer $size`
# Make sure the line was broken
new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: /,"");printf "%s", $0; exit}' trace`
if [ "$new_str" = "$str" ]; then
exit fail;
fi
# Make sure the entire line can be found
new_str=`awk ' /tracing_mark_write:/ { sub(/^.*tracing_mark_write: /,"");printf "%s", $0; }' trace`
if [ "$new_str" != "$str" ]; then
exit fail;
fi
}
test_buffer