Commit e88ed227 authored by Daniel Bristot de Oliveira's avatar Daniel Bristot de Oliveira Committed by Steven Rostedt (Google)

tracing/timerlat: Add user-space interface

Going a step further, we propose a way to use any user-space
workload as the task waiting for the timerlat timer. This is done
via a per-CPU file named osnoise/cpu$id/timerlat_fd file.

The tracef_fd allows a task to open at a time. When a task reads
the file, the timerlat timer is armed for future osnoise/timerlat_period_us
time. When the timer fires, it prints the IRQ latency and
wakes up the user-space thread waiting in the timerlat_fd.

The thread then starts to run, executes the timerlat measurement, prints
the thread scheduling latency and returns to user-space.

When the thread rereads the timerlat_fd, the tracer will print the
user-ret(urn) latency, which is an additional metric.

This additional metric is also traced by the tracer and can be used, for
example of measuring the context switch overhead from kernel-to-user and
user-to-kernel, or the response time for an arbitrary execution in
user-space.

The tracer supports one thread per CPU, the thread must be pinned to
the CPU, and it cannot migrate while holding the timerlat_fd. The reason
is that the tracer is per CPU (nothing prohibits the tracer from
allowing migrations in the future). The tracer monitors the migration
of the thread and disables the tracer if detected.

The timerlat_fd is only available for opening/reading when timerlat
tracer is enabled, and NO_OSNOISE_WORKLOAD is set.

The simplest way to activate this feature from user-space is:

 -------------------------------- %< -----------------------------------
 int main(void)
 {
	char buffer[1024];
	int timerlat_fd;
	int retval;
	long cpu = 0;	/* place in CPU 0 */
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
		return 1;

	snprintf(buffer, sizeof(buffer),
		"/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
		cpu);

	timerlat_fd = open(buffer, O_RDONLY);
	if (timerlat_fd < 0) {
		printf("error opening %s: %s\n", buffer, strerror(errno));
		exit(1);
	}

	for (;;) {
		retval = read(timerlat_fd, buffer, 1024);
		if (retval < 0)
			break;
	}

	close(timerlat_fd);
	exit(0);
}
 -------------------------------- >% -----------------------------------

When disabling timerlat, if there is a workload holding the timerlat_fd,
the SIGKILL will be sent to the thread.

Link: https://lkml.kernel.org/r/69fe66a863d2792ff4c3a149bf9e32e26468bb3a.1686063934.git.bristot@kernel.org

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: William White <chwhite@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: default avatarDaniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
parent cb7ca871
......@@ -180,3 +180,81 @@ dummy_load_1ms_pd_init, which had the following code (on purpose)::
return 0;
}
User-space interface
---------------------------
Timerlat allows user-space threads to use timerlat infra-structure to
measure scheduling latency. This interface is accessible via a per-CPU
file descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd.
This interface is accessible under the following conditions:
- timerlat tracer is enable
- osnoise workload option is set to NO_OSNOISE_WORKLOAD
- The user-space thread is affined to a single processor
- The thread opens the file associated with its single processor
- Only one thread can access the file at a time
The open() syscall will fail if any of these conditions are not met.
After opening the file descriptor, the user space can read from it.
The read() system call will run a timerlat code that will arm the
timer in the future and wait for it as the regular kernel thread does.
When the timer IRQ fires, the timerlat IRQ will execute, report the
IRQ latency and wake up the thread waiting in the read. The thread will be
scheduled and report the thread latency via tracer - as for the kernel
thread.
The difference from the in-kernel timerlat is that, instead of re-arming
the timer, timerlat will return to the read() system call. At this point,
the user can run any code.
If the application rereads the file timerlat file descriptor, the tracer
will report the return from user-space latency, which is the total
latency. If this is the end of the work, it can be interpreted as the
response time for the request.
After reporting the total latency, timerlat will restart the cycle, arm
a timer, and go to sleep for the following activation.
If at any time one of the conditions is broken, e.g., the thread migrates
while in user space, or the timerlat tracer is disabled, the SIG_KILL
signal will be sent to the user-space thread.
Here is an basic example of user-space code for timerlat::
int main(void)
{
char buffer[1024];
int timerlat_fd;
int retval;
long cpu = 0; /* place in CPU 0 */
cpu_set_t set;
CPU_ZERO(&set);
CPU_SET(cpu, &set);
if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
return 1;
snprintf(buffer, sizeof(buffer),
"/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
cpu);
timerlat_fd = open(buffer, O_RDONLY);
if (timerlat_fd < 0) {
printf("error opening %s: %s\n", buffer, strerror(errno));
exit(1);
}
for (;;) {
retval = read(timerlat_fd, buffer, 1024);
if (retval < 0)
break;
}
close(timerlat_fd);
exit(0);
}
This diff is collapsed.
......@@ -1446,6 +1446,8 @@ static struct trace_event trace_osnoise_event = {
};
/* TRACE_TIMERLAT */
static char *timerlat_lat_context[] = {"irq", "thread", "user-ret"};
static enum print_line_t
trace_timerlat_print(struct trace_iterator *iter, int flags,
struct trace_event *event)
......@@ -1458,7 +1460,7 @@ trace_timerlat_print(struct trace_iterator *iter, int flags,
trace_seq_printf(s, "#%-5u context %6s timer_latency %9llu ns\n",
field->seqnum,
field->context ? "thread" : "irq",
timerlat_lat_context[field->context],
field->timer_latency);
return trace_handle_return(s);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment