- 03 May, 2023 1 commit
-
-
Ondrej Mosnacek authored
This file defines both read and write operations, yet it is being created as read-only. This means that it can't be written to without the CAP_DAC_OVERRIDE capability. Fix the permissions to allow root to write to it without the need to override DAC perms. Link: https://lore.kernel.org/linux-trace-kernel/20230503140114.3280002-1-omosnace@redhat.com Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 03329f99 ("tracing: Add tracefs file buffer_percentage") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
- 27 Apr, 2023 1 commit
-
-
Johannes Berg authored
If something was written to the buffer just before destruction, it may be possible (maybe not in a real system, but it did happen in ARCH=um with time-travel) to destroy the ringbuffer before the IRQ work ran, leading this KASAN report (or a crash without KASAN): BUG: KASAN: slab-use-after-free in irq_work_run_list+0x11a/0x13a Read of size 8 at addr 000000006d640a48 by task swapper/0 CPU: 0 PID: 0 Comm: swapper Tainted: G W O 6.3.0-rc1 #7 Stack: 60c4f20f 0c203d48 41b58ab3 60f224fc 600477fa 60f35687 60c4f20f 601273dd 00000008 6101eb00 6101eab0 615be548 Call Trace: [<60047a58>] show_stack+0x25e/0x282 [<60c609e0>] dump_stack_lvl+0x96/0xfd [<60c50d4c>] print_report+0x1a7/0x5a8 [<603078d3>] kasan_report+0xc1/0xe9 [<60308950>] __asan_report_load8_noabort+0x1b/0x1d [<60232844>] irq_work_run_list+0x11a/0x13a [<602328b4>] irq_work_tick+0x24/0x34 [<6017f9dc>] update_process_times+0x162/0x196 [<6019f335>] tick_sched_handle+0x1a4/0x1c3 [<6019fd9e>] tick_sched_timer+0x79/0x10c [<601812b9>] __hrtimer_run_queues.constprop.0+0x425/0x695 [<60182913>] hrtimer_interrupt+0x16c/0x2c4 [<600486a3>] um_timer+0x164/0x183 [...] Allocated by task 411: save_stack_trace+0x99/0xb5 stack_trace_save+0x81/0x9b kasan_save_stack+0x2d/0x54 kasan_set_track+0x34/0x3e kasan_save_alloc_info+0x25/0x28 ____kasan_kmalloc+0x8b/0x97 __kasan_kmalloc+0x10/0x12 __kmalloc+0xb2/0xe8 load_elf_phdrs+0xee/0x182 [...] The buggy address belongs to the object at 000000006d640800 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 584 bytes inside of freed 1024-byte region [000000006d640800, 000000006d640c00) Add the appropriate irq_work_sync() so the work finishes before the buffers are destroyed. Prior to the commit in the Fixes tag below, there was only a single global IRQ work, so this issue didn't exist. Link: https://lore.kernel.org/linux-trace-kernel/20230427175920.a76159263122.I8295e405c44362a86c995e9c2c37e3e03810aa56@changeid Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 15693458 ("tracing/ring-buffer: Move poll wake ups into ring buffer code") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
- 26 Apr, 2023 11 commits
-
-
Ken Lin authored
If the buffer length is larger than 16 and concatenate is set to false, there would be missing spaces every 16 bytes. Example: Before: c5 11 10 50 05 4d 31 40 00 40 00 40 00 4d 31 4000 40 00 After: c5 11 10 50 05 4d 31 40 00 40 00 40 00 4d 31 40 00 40 00 Link: https://lore.kernel.org/linux-trace-kernel/20230426032257.3157247-1-lyenting@google.comSigned-off-by: Ken Lin <lyenting@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Tze-nan Wu authored
In ring_buffer_reset_online_cpus, the buffer_size_kb write operation may permanently fail if the cpu_online_mask changes between two for_each_online_buffer_cpu loops. The number of increases and decreases on both cpu_buffer->resize_disabled and cpu_buffer->record_disabled may be inconsistent, causing some CPUs to have non-zero values for these atomic variables after the function returns. This issue can be reproduced by "echo 0 > trace" while hotplugging cpu. After reproducing success, we can find out buffer_size_kb will not be functional anymore. To prevent leaving 'resize_disabled' and 'record_disabled' non-zero after ring_buffer_reset_online_cpus returns, we ensure that each atomic variable has been set up before atomic_sub() to it. Link: https://lore.kernel.org/linux-trace-kernel/20230426062027.17451-1-Tze-nan.Wu@mediatek.com Cc: stable@vger.kernel.org Cc: <mhiramat@kernel.org> Cc: npiggin@gmail.com Fixes: b23d7a5f ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU") Reviewed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com> Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Hao Zeng authored
Common realloc mistake: 'file_append' nulled but not freed upon failure Link: https://lkml.kernel.org/r/20230426010527.703093-1-zenghao@kylinos.cnSigned-off-by: Hao Zeng <zenghao@kylinos.cn> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
When event enablement changes, user_events attempts to update a bit in the user process. If a fault is hit, an attempt to fault-in the page and the write is retried if the page made it in. While this normally requires a couple attempts, it is possible a bad user process could attempt to cause infinite loops. Ensure fault-in attempts either sync or async are limited to a max of 10 attempts for each update. When the max is hit, return -EFAULT so another attempt is not made in all cases. Link: https://lkml.kernel.org/r/20230425225107.8525-5-beaub@linux.microsoft.comSuggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
User processes register an address and bit pair for events. If the same address and bit pair are registered multiple times in the same process, it can cause undefined behavior when events are enabled/disabled. When more than one are used, the bit could be turned off by another event being disabled, while the original event is still enabled. Prevent undefined behavior by checking the current mm to see if any event has already been registered for the address and bit pair. Return EADDRINUSE back to the user process if it's already being used. Update ftrace self-test to ensure this occurs properly. Link: https://lkml.kernel.org/r/20230425225107.8525-4-beaub@linux.microsoft.comSuggested-by: Doug Cook <dcook@linux.microsoft.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
If an event is enabled and a user process unregisters user_events, the bit is left set. Fix this by always clearing the bit in the user process if unregister is successful. Update abi self-test to ensure this occurs properly. Link: https://lkml.kernel.org/r/20230425225107.8525-3-beaub@linux.microsoft.comSuggested-by: Doug Cook <dcook@linux.microsoft.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
The write index indicates which event the data is for and accesses a per-file array. The index is passed by user processes during write() calls as the first 4 bytes. Ensure that it cannot be negative by returning -EINVAL to prevent out of bounds accesses. Update ftrace self-test to ensure this occurs properly. Link: https://lkml.kernel.org/r/20230425225107.8525-2-beaub@linux.microsoft.com Fixes: 7f5a08c7 ("user_events: Add minimal support for trace_event into ftrace") Reported-by: Doug Cook <dcook@linux.microsoft.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Sergey Senozhatsky authored
Sometimes we use seq_buf to format a string buffer, which we then pass to printk(). However, in certain situations the seq_buf string buffer can get too big, exceeding the PRINTKRB_RECORD_MAX bytes limit, and causing printk() to truncate the string. Add a new seq_buf helper. This helper prints the seq_buf string buffer line by line, using \n as a delimiter, rather than passing the whole string buffer to printk() at once. Link: https://lkml.kernel.org/r/20230415100110.1419872-1-senozhatsky@chromium.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
Both print_fields() and print_array() do not handle if dynamic data ends at the last byte of the payload for both __dyn_loc and __rel_loc field types. For __rel_loc, the offset was off by 4 bytes, leading to incorrect strings and data being printed out. In print_array() the buffer pos was missed from being advanced, which results in the first payload byte being used as the offset base instead of the field offset. Advance __rel_loc offset by 4 to ensure correct offset and advance pos to the field offset to ensure correct data is displayed when printing arrays. Change >= to > when checking if data is in-bounds, since it's valid for dynamic data to include the last byte of the payload. Example outputs for event format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:__rel_loc char text[]; offset:8; size:4; signed:1; Output before: tp_rel_loc: text=<OVERFLOW> Output after: tp_rel_loc: text=Test Link: https://lkml.kernel.org/r/20230419214140.4158-3-beaub@linux.microsoft.com Fixes: 80a76994 ("tracing: Add "fields" option to show raw trace event fields") Reported-by: Doug Cook <dcook@linux.microsoft.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
Users expect that events can be filtered by the kernel. User events currently sets all event fields as FILTER_OTHER which limits to binary filters only. When strings are being used, functionality is reduced. Use filter_assign_type() to find the most appropriate filter type for each field in user events to ensure full kernel capabilities. Link: https://lkml.kernel.org/r/20230419214140.4158-2-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Zheng Yejian authored
In error case, 'buffer_page' returned by rb_set_head_page() is NULL, currently check '&buffer_page->list' is equivalent to check 'buffer_page' due to 'list' is the first member of 'buffer_page', but suppose it is not some time, 'head_page' would be wild memory while check would be bypassed. Link: https://lore.kernel.org/linux-trace-kernel/20230414071729.57312-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
- 29 Mar, 2023 18 commits
-
-
Steven Rostedt (Google) authored
The user events was added a bit prematurely, and there were a few kernel developers that had issues with it. The API also needed a bit of work to make sure it would be stable. It was decided to make user events "broken" until this was settled. Now it has a new API that appears to be as stable as it will be without the use of a crystal ball. It's being used within Microsoft as is, which means the API has had some testing in real world use cases. It went through many discussions in the bi-weekly tracing meetings, and there's been no more comments about updates. I feel this is good to go. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Steven Rostedt (Google) authored
Currently, user events are shown using the "hex" output for "safety" reasons as one cannot trust user events behaving nicely. But the hex output is not the only utility for safe outputting of trace events. The print_event_fields() is just as safe and gives user readable output. Before: example-839 [001] ..... 43.222244: 00000000: b1 06 00 00 47 03 00 00 00 00 00 00 ....G....... example-839 [001] ..... 43.564433: 00000000: b1 06 00 00 47 03 00 00 01 00 00 00 ....G....... example-839 [001] ..... 43.763917: 00000000: b1 06 00 00 47 03 00 00 02 00 00 00 ....G....... example-839 [001] ..... 43.967929: 00000000: b1 06 00 00 47 03 00 00 03 00 00 00 ....G....... After: example-837 [006] ..... 55.739249: test: count=0x0 (0) example-837 [006] ..... 111.104784: test: count=0x1 (1) example-837 [006] ..... 111.268444: test: count=0x2 (2) example-837 [006] ..... 111.416533: test: count=0x3 (3) example-837 [006] ..... 111.542859: test: count=0x4 (4) Link: https://lore.kernel.org/linux-trace-kernel/20230328151413.4770b8d7@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
Add tabs to make struct members easier to read and unify the style of the code. Link: https://lkml.kernel.org/r/20230328235219.203-13-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
Operators want to be able to ensure enough tracepoints exist on the system for kernel components as well as for user components. Since there are only up to 64K events, by default allow up to half to be used by user events. Add a kernel sysctl parameter (kernel.user_events_max) to set a global limit that is honored among all groups on the system. This ensures hard limits can be setup to prevent user processes from consuming all event IDs on the system. Link: https://lkml.kernel.org/r/20230328235219.203-12-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
Operators need a way to limit how much memory cgroups use. User events need to be included into that accounting. Fix this by using GFP_KERNEL_ACCOUNT for allocations generated by user programs for user_event tracing. Link: https://lkml.kernel.org/r/20230328235219.203-11-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
The ABI for user_events has changed from mmap() based to remote writes. Update the documentation to reflect these changes, add new section for unregistering events since lifetime is now tied to tasks instead of files. Link: https://lkml.kernel.org/r/20230328235219.203-10-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
The ABI has changed to use a remote write approach. Update the example to show the expected use of this new ABI. Also remove debugfs path and use tracefs to ensure example works in more environments. Link: https://lkml.kernel.org/r/20230328235219.203-9-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
Add ABI specific self-test to ensure enablements work in various scenarios such as fork, VM_CLONE, and basic event enable/disable. Ensure ABI contracts/limits are also being upheld, such as bit limits and data size limits. Link: https://lkml.kernel.org/r/20230328235219.203-8-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
ABI has been changed to remote writes, update existing test cases to use this new ABI to ensure existing functionality continues to work. Link: https://lkml.kernel.org/r/20230328235219.203-7-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
Enablements are now tracked by the lifetime of the task/mm. User processes need to be able to disable their addresses if tracing is requested to be turned off. Before unmapping the page would suffice. However, we now need a stronger contract. Add an ioctl to enable this. A new flag bit is added, freeing, to user_event_enabler to ensure that if the event is attempted to be removed while a fault is being handled that the remove is delayed until after the fault is reattempted. Link: https://lkml.kernel.org/r/20230328235219.203-6-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
When events are enabled within the various tracing facilities, such as ftrace/perf, the event_mutex is held. As events are enabled pages are accessed. We do not want page faults to occur under this lock. Instead queue the fault to a workqueue to be handled in a process context safe way without the lock. The enable address is marked faulting while the async fault-in occurs. This ensures that we don't attempt to fault-in more than is necessary. Once the page has been faulted in, an address write is re-attempted. If the page couldn't fault-in, then we wait until the next time the event is enabled to prevent any potential infinite loops. Link: https://lkml.kernel.org/r/20230328235219.203-5-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
As part of the discussions for user_events aligned with user space tracers, it was determined that user programs should register a aligned value to set or clear a bit when an event becomes enabled. Currently a shared page is being used that requires mmap(). Remove the shared page implementation and move to a user registered address implementation. In this new model during the event registration from user programs 3 new values are specified. The first is the address to update when the event is either enabled or disabled. The second is the bit to set/clear to reflect the event being enabled. The third is the size of the value at the specified address. This allows for a local 32/64-bit value in user programs to support both kernel and user tracers. As an example, setting bit 31 for kernel tracers when the event becomes enabled allows for user tracers to use the other bits for ref counts or other flags. The kernel side updates the bit atomically, user programs need to also update these values atomically. User provided addresses must be aligned on a natural boundary, this allows for single page checking and prevents odd behaviors such as a enable value straddling 2 pages instead of a single page. Currently page faults are only logged, future patches will handle these. Link: https://lkml.kernel.org/r/20230328235219.203-4-beaub@linux.microsoft.comSuggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
During tracefs discussions it was decided instead of requiring a mapping within a user-process to track the lifetime of memory descriptors we should hook the appropriate calls. Do this by adding the minimal stubs required for task fork, exec, and exit. Currently this is just a NOP. Future patches will implement these calls fully. Link: https://lkml.kernel.org/r/20230328235219.203-3-beaub@linux.microsoft.comSuggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Beau Belgrave authored
The UAPI parts need to be split out from the kernel parts of user_events now that other parts of the kernel will reference it. Do so by moving the existing include/linux/user_events.h into include/uapi/linux/user_events.h. Link: https://lkml.kernel.org/r/20230328235219.203-2-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Steven Rostedt (Google) authored
The hex, raw and bin formats come from the old PREEMPT_RT patch set latency tracer. That actually gave real alternatives to reading the ascii buffer. But they have started to bit rot and they do not give a good representation of the tracing data. Add "fields" option that will read the trace event fields and parse the data from how the fields are defined: With "fields" = 0 (default) echo 1 > events/sched/sched_switch/enable cat trace <idle>-0 [003] d..2. 540.078653: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/3:1 next_pid=83 next_prio=120 kworker/3:1-83 [003] d..2. 540.078860: sched_switch: prev_comm=kworker/3:1 prev_pid=83 prev_prio=120 prev_state=I ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 [003] d..2. 540.206423: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=807 next_prio=120 sshd-807 [003] d..2. 540.206531: sched_switch: prev_comm=sshd prev_pid=807 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120 <idle>-0 [001] d..2. 540.206597: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 kworker/u16:4-58 [001] d..2. 540.206617: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120 bash-830 [001] d..2. 540.206678: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 kworker/u16:4-58 [001] d..2. 540.206696: sched_switch: prev_comm=kworker/u16:4 prev_pid=58 prev_prio=120 prev_state=I ==> next_comm=bash next_pid=830 next_prio=120 bash-830 [001] d..2. 540.206713: sched_switch: prev_comm=bash prev_pid=830 prev_prio=120 prev_state=R ==> next_comm=kworker/u16:4 next_pid=58 next_prio=120 echo 1 > options/fields <...>-998 [002] d..2. 538.643732: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/2 prev_state=0x20 (32) prev_prio=0x78 (120) prev_pid=0x3e6 (998) prev_comm=trace-cmd <idle>-0 [001] d..2. 538.643806: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/1 bash-830 [001] d..2. 538.644106: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash kworker/u16:4-58 [001] d..2. 538.644130: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4 bash-830 [001] d..2. 538.644180: sched_switch: next_prio=0x78 (120) next_pid=0x3a (58) next_comm=kworker/u16:4 prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash kworker/u16:4-58 [001] d..2. 538.644185: sched_switch: next_prio=0x78 (120) next_pid=0x33e (830) next_comm=bash prev_state=0x80 (128) prev_prio=0x78 (120) prev_pid=0x3a (58) prev_comm=kworker/u16:4 bash-830 [001] d..2. 538.644204: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/1 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x33e (830) prev_comm=bash <idle>-0 [003] d..2. 538.644211: sched_switch: next_prio=0x78 (120) next_pid=0x327 (807) next_comm=sshd prev_state=0x0 (0) prev_prio=0x78 (120) prev_pid=0x0 (0) prev_comm=swapper/3 sshd-807 [003] d..2. 538.644340: sched_switch: next_prio=0x78 (120) next_pid=0x0 (0) next_comm=swapper/3 prev_state=0x1 (1) prev_prio=0x78 (120) prev_pid=0x327 (807) prev_comm=sshd It traces the data safely without using the trace print formatting. Link: https://lore.kernel.org/linux-trace-kernel/20230328145156.497651be@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ross Zwisler authored
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing A comment in kvm_stat still refers to this older debugfs path, so let's update it to avoid confusion. Link: https://lkml.kernel.org/r/20230313211746.1541525-3-zwisler@kernel.org Cc: "Tobin C. Harding" <me@tobin.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Tycho Andersen <tycho@tycho.pizza> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ross Zwisler authored
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing scripts/leaking_addresses.pl only skipped this older debugfs path, so let's add the canonical path as well. Link: https://lkml.kernel.org/r/20230313211746.1541525-2-zwisler@kernel.org Cc: "Tobin C. Harding" <me@tobin.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Acked-by: Tycho Andersen <tycho@tycho.pizza> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Ross Zwisler authored
The canonical location for the tracefs filesystem is at /sys/kernel/tracing. But, from Documentation/trace/ftrace.rst: Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing. For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at: /sys/kernel/debug/tracing A few spots in tools/testing/selftests still refer to this older debugfs path, so let's update them to avoid confusion. Link: https://lkml.kernel.org/r/20230313211746.1541525-1-zwisler@kernel.org Cc: "Tobin C. Harding" <me@tobin.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tycho Andersen <tycho@tycho.pizza> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
- 28 Mar, 2023 7 commits
-
-
Masami Hiramatsu (Google) authored
Update fprobe.rst for - the private entry_data argument - the return value of the entry handler - the nr_rethook_node field. Link: https://lkml.kernel.org/r/167526701579.433354.3057889264263546659.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Masami Hiramatsu (Google) authored
Add a testcase for skipping exit_handler if entry_handler returns !0. Link: https://lkml.kernel.org/r/167526700658.433354.12922388040490848613.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Masami Hiramatsu (Google) authored
Skip hooking function return and calling exit_handler if the entry_handler() returns !0. Link: https://lkml.kernel.org/r/167526699798.433354.10998365726830117303.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Masami Hiramatsu (Google) authored
Add a test case for nr_maxactive. If the number of active functions is more than nr_maxactive, it must be skipped. Link: https://lkml.kernel.org/r/167526698856.433354.4430007340787176666.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Masami Hiramatsu (Google) authored
Add nr_maxactive to specify rethook_node pool size. This means the maximum number of actively running target functions concurrently for probing by exit_handler. Note that if the running function is preempted or sleep, it is still counted as 'active'. Link: https://lkml.kernel.org/r/167526697917.433354.17779774988245113106.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Masami Hiramatsu (Google) authored
Add test cases for checking whether private entry_data is correctly passed or not. Link: https://lkml.kernel.org/r/167526697074.433354.17790288501657876219.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Masami Hiramatsu (Google) authored
Pass the private entry_data to the entry and exit handlers so that they can share the context data, something like saved function arguments etc. User must specify the private entry_data size by @entry_data_size field before registering the fprobe. Link: https://lkml.kernel.org/r/167526696173.433354.17408372048319432574.stgit@mhiramat.roam.corp.google.com Cc: Florent Revest <revest@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
- 21 Mar, 2023 2 commits
-
-
Steven Rostedt (Google) authored
When debugging a crash that appears to be related to ftrace, but not for sure, it is useful to know if a function was ever enabled by ftrace or not. It could be that a BPF program was attached to it, or possibly a live patch. We are having crashes in the field where this information is not always known. But having ftrace set a flag if a function has ever been attached since boot up helps tremendously in trying to know if a crash had to do with something using ftrace. For analyzing crashes, the use of a kdump image can have access to the flags. When looking at issues where the kernel did not panic, the touched_functions file can simply be used. Link: https://lore.kernel.org/linux-trace-kernel/20230124095653.6fd1640e@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Chris Li <chriscli@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-
Uros Bizjak authored
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails. There is no need to re-read the value in the loop. No functional change intended. Link: https://lkml.kernel.org/r/20230305155532.5549-4-ubizjak@gmail.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Acked-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
-