Commits · 0113d4615dbf053ae9a7a1e0acbc6652713af01f · Kirill Smelkov / linux

14 Jun, 2023 14 commits

tracing/user_events: Document auto-cleanup and remove dyn_event refs · 0113d461

Beau Belgrave authored Jun 14, 2023

Now user_events auto-cleanup upon the last reference by default. This
makes it not possible to use the dynamics event file via tracefs.

Document that auto-cleanup is enabled by default and remove the refernce
to /sys/kernel/tracing/dynamic_events file to make this clear.

Link: https://lkml.kernel.org/r/20230614163336.5797-7-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

0113d461

selftests/user_events: Adapt dyn_test to non-persist events · 61701242

Beau Belgrave authored Jun 14, 2023

Now that user_events does not honor persist events the dynamic_events
file cannot be easily used to test parsing and matching cases.

Update dyn_test to use the direct ABI file instead of dynamic_events so
that we still have testing coverage until persist events and
dynamic_events file integration has been decided.

Link: https://lkml.kernel.org/r/20230614163336.5797-6-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

61701242

selftests/user_events: Ensure auto cleanup works as expected · 216a137e

Beau Belgrave authored Jun 14, 2023

User events now auto cleanup upon the last reference put. Update
ftrace_test to ensure this works as expected. Ensure EBUSY delays
while event is being deleted do not cause transient failures by
waiting and re-attempting.

Link: https://lkml.kernel.org/r/20230614163336.5797-5-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

216a137e

tracing/user_events: Add auto cleanup and future persist flag · a65442ed

Beau Belgrave authored Jun 14, 2023

Currently user events need to be manually deleted via the delete IOCTL
call or via the dynamic_events file. Most operators and processes wish
to have these events auto cleanup when they are no longer used by
anything to prevent them piling without manual maintenance. However,
some operators may not want this, such as pre-registering events via the
dynamic_events tracefs file.

Update user_event_put() to attempt an auto delete of the event if it's
the last reference. The auto delete must run in a work queue to ensure
proper behavior of class->reg() invocations that don't expect the call
to go away from underneath them during the unregister. Add work_struct
to user_event struct to ensure we can do this reliably.

Add a persist flag, that is not yet exposed, to ensure we can toggle
between auto-cleanup and leaving the events existing in the future. When
a non-zero flag is seen during register, return -EINVAL to ensure ABI
is clear for the user processes while we work out the best approach for
persistent events.

Link: https://lkml.kernel.org/r/20230614163336.5797-4-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/20230518093600.3f119d68@rorschach.local.home/Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

a65442ed

tracing/user_events: Track refcount consistently via put/get · f0dbf6fd

Beau Belgrave authored Jun 14, 2023

Various parts of the code today track user_event's refcnt field directly
via a refcount_add/dec. This makes it hard to modify the behavior of the
last reference decrement in all code paths consistently. For example, in
the future we will auto-delete events upon the last reference going
away. This last reference could happen in many places, but we want it to
be consistently handled.

Add user_event_get() and user_event_put() for the add/dec. Update all
places where direct refcounts are being used to utilize these new
functions. In each location pass if event_mutex is locked or not. This
allows us to drop events automatically in future patches clearly. Ensure
when caller states the lock is held, it really is (or is not) held.

Link: https://lkml.kernel.org/r/20230614163336.5797-3-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

f0dbf6fd

tracing/user_events: Store register flags on events · b08d7258

Beau Belgrave authored Jun 14, 2023

Currently we don't have any available flags for user processes to use to
indicate options for user_events. We will soon have a flag to indicate
the event should or should not auto-delete once it's not being used by
anyone.

Add a reg_flags field to user_events and parameters to existing
functions to allow for this in future patches.

Link: https://lkml.kernel.org/r/20230614163336.5797-2-beaub@linux.microsoft.comSigned-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

b08d7258

tracing/user_events: Remove user_ns walk for groups · ed0e0ae0

Beau Belgrave authored Jun 01, 2023

During discussions it was suggested that user_ns is not a good place to
try to attach a tracing namespace. The current code has stubs to enable
that work that are very likely to change and incur a performance cost.

Remove the user_ns walk when creating a group and determining the system
name to use, since it's unlikely user_ns will be used in the future.

Link: https://lore.kernel.org/all/20230601-urenkel-holzofen-cd9403b9cadd@brauner/
Link: https://lore.kernel.org/linux-trace-kernel/20230601224928.301-1-beaub@linux.microsoft.comSuggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ed0e0ae0

selftests/user_events: Add perf self-test for empty arguments events · 42187bdc

sunliming authored Jun 06, 2023

Tests to ensure events that has empty arguments can input trace record
correctly when using perf.

Link: https://lkml.kernel.org/r/20230606062027.1008398-5-sunliming@kylinos.cnAcked-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

42187bdc

selftests/user_events: Clear the events after perf self-test · 4b56c21b

sunliming authored Jun 06, 2023

When the self test is completed, perf self-test left the user events not to
be cleared. Clear the events by unregister and delete the event.

Link: https://lkml.kernel.org/r/20230606062027.1008398-4-sunliming@kylinos.cnAcked-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

4b56c21b

selftests/user_events: Add ftrace self-test for empty arguments events · 3e7269dd

sunliming authored Jun 06, 2023

Tests to ensure events that has empty arguments can input trace record
correctly when using ftrace.

Link: https://lkml.kernel.org/r/20230606062027.1008398-3-sunliming@kylinos.cnAcked-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

3e7269dd

tracing/user_events: Fix the incorrect trace record for empty arguments events · 6f05dcab

sunliming authored Jun 06, 2023

The user_events support events that has empty arguments. But the trace event
is discarded and not really committed when the arguments is empty. Fix this
by not attempting to copy in zero-length data.

Link: https://lkml.kernel.org/r/20230606062027.1008398-2-sunliming@kylinos.cnAcked-by: Beau Belgrave <beaub@linux.microsoft.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

6f05dcab

tracing: Modify print_fields() for fields output order · e70bb54d

sunliming authored May 25, 2023

Now the print_fields() print trace event fields in reverse order. Modify
it to the positive sequence.

Example outputs for a user event:
	test0 u32 count1; u32 count2

Output before:
	example-2547    [000] .....   325.666387: test0: count2=0x2 (2) count1=0x1 (1)

Output after:
	example-2742    [002] .....   429.769370: test0: count1=0x1 (1) count2=0x2 (2)

Link: https://lore.kernel.org/linux-trace-kernel/20230525085232.5096-1-sunliming@kylinos.cn

Fixes: 80a76994 ("tracing: Add "fields" option to show raw trace event fields")
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

e70bb54d

tracing/user_events: Handle matching arguments that is null from dyn_events · cfac4ed7

sunliming authored May 29, 2023

When A registering user event from dyn_events has no argments, it will pass the
matching check, regardless of whether there is a user event with the same name
and arguments. Add the matching check when the arguments of registering user
event is null.

Link: https://lore.kernel.org/linux-trace-kernel/20230529065110.303440-1-sunliming@kylinos.cnSigned-off-by: sunliming <sunliming@kylinos.cn>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

cfac4ed7

tracing/user_events: Prevent same name but different args event · ba470eeb

sunliming authored May 29, 2023

User processes register name_args for events. If the same name but different
args event are registered. The trace outputs of second event are printed
as the first event. This is incorrect.

Return EADDRINUSE back to the user process if the same name but different args
event has being registered.

Link: https://lore.kernel.org/linux-trace-kernel/20230529032100.286534-1-sunliming@kylinos.cnSigned-off-by: sunliming <sunliming@kylinos.cn>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ba470eeb

30 May, 2023 1 commit

tracing/rv/rtla: Update MAINTAINERS file to point to proper mailing list · aafbb1ee

Steven Rostedt (Google) authored May 29, 2023

The mailing list that goes to linux-trace-devel is for the tracing
libraries, and the patchwork associated to the tracing libraries keys
off of that mailing list.

For anything that lives in the Linux kernel proper (including the tools
directory) must go through linux-trace-kernel, as the patchwork to that
list keys off of the Linux kernel proper.

Update the MAINTAINERS file to reflect the proper mailing lists.

Link: https://lore.kernel.org/linux-trace-kernel/20230529044002.0481452b@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>

aafbb1ee

29 May, 2023 5 commits

tracing: Have function_graph selftest call cond_resched() · a2d910f0

Steven Rostedt (Google) authored May 28, 2023

When all kernel debugging is enabled (lockdep, KSAN, etc), the function
graph enabling and disabling can take several seconds to complete. The
function_graph selftest enables and disables function graph tracing
several times. With full debugging enabled, the soft lockup watchdog was
triggering because the selftest was running without ever scheduling.

Add cond_resched() throughout the test to make sure it does not trigger
the soft lockup detector.

Link: https://lkml.kernel.org/r/20230528051742.1325503-6-rostedt@goodmis.orgSigned-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

a2d910f0

tracing: Only make selftest conditionals affect the global_trace · ac9d2cb1

Steven Rostedt (Google) authored May 28, 2023

The tracing_selftest_running and tracing_selftest_disabled variables were
to keep trace_printk() and other writes from affecting the tracing
selftests, as the tracing selftests would examine the ring buffer to see
if it contained what it expected or not. trace_printk() and friends could
add to the ring buffer and cause the selftests to fail (and then disable
the tracer that was being tested). To keep that from happening, these
variables were added and would keep trace_printk() and friends from
writing to the ring buffer while the tests were going on.

But this was only the top level ring buffer (owned by the global_trace
instance). There is no reason to prevent writing into ring buffers of
other instances via the trace_array_printk() and friends. For the
functions that could be used by other instances, check if the global_trace
is the tracer instance that is being written to before deciding to not
allow the write.

Link: https://lkml.kernel.org/r/20230528051742.1325503-5-rostedt@goodmis.orgSigned-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ac9d2cb1

tracing: Make tracing_selftest_running/delete nops when not used · a3ae76d7

Steven Rostedt (Google) authored May 28, 2023

There's no reason to test the condition variables tracing_selftest_running
or tracing_selftest_delete when tracing selftests are not enabled. Make
them define 0s when not the selftests are not configured in.

Link: https://lkml.kernel.org/r/20230528051742.1325503-4-rostedt@goodmis.orgSigned-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

a3ae76d7

tracing: Have tracer selftests call cond_resched() before running · 9da705d4

Steven Rostedt (Google) authored May 28, 2023

As there are more and more internal selftests being added to the Linux
kernel (KSAN, lockdep, etc) the selftests are taking longer to run when
these are enabled. Add a cond_resched() to the calling of
do_run_tracer_selftest() to force a schedule if NEED_RESCHED is set,
otherwise the soft lockup watchdog may trigger on boot up.

Link: https://lkml.kernel.org/r/20230528051742.1325503-3-rostedt@goodmis.orgSigned-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

9da705d4

tracing: Move setting of tracing_selftest_running out of register_tracer() · e8352cf5

Steven Rostedt (Google) authored May 28, 2023

The variables tracing_selftest_running and tracing_selftest_disabled are
only used for when CONFIG_FTRACE_STARTUP_TEST is enabled. Make them only
visible within the selftest code. The setting of those variables are in
the register_tracer() call, and set in a location where they do not need
to be. Create a wrapper around run_tracer_selftest() called
do_run_tracer_selftest() which sets those variables, and have
register_tracer() call that instead.

Having those variables only set within the CONFIG_FTRACE_STARTUP_TEST
scope gets rid of them (and also the ability to remove testing against
them) when the startup tests are not enabled (most cases).

Link: https://lkml.kernel.org/r/20230528051742.1325503-2-rostedt@goodmis.orgSigned-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

e8352cf5

24 May, 2023 7 commits

tracing/selftests: Update synthetic event selftest to use common_stacktrace · f1aab363

Steven Rostedt (Google) authored May 23, 2023

With the rename of the stacktrace field to common_stacktrace, update the
selftests to reflect this change. Copy the current selftest to test the
backward compatibility "stacktrace" keyword. Also the "requires" of that
test was incorrect, so it would never actually ran before. That is fixed
now.

Link: https://lore.kernel.org/linux-trace-kernel/20230523225402.55951f2f@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

f1aab363

tracing: Rename stacktrace field to common_stacktrace · 4b512860

Steven Rostedt (Google) authored May 23, 2023

The histogram and synthetic events can use a pseudo event called
"stacktrace" that will create a stacktrace at the time of the event and
use it just like it was a normal field. We have other pseudo events such
as "common_cpu" and "common_timestamp". To stay consistent with that,
convert "stacktrace" to "common_stacktrace". As this was used in older
kernels, to keep backward compatibility, this will act just like
"common_cpu" did with "cpu". That is, "cpu" will be the same as
"common_cpu" unless the event has a "cpu" field. In which case, the
event's field is used. The same is true with "stacktrace".

Also update the documentation to reflect this change.

Link: https://lore.kernel.org/linux-trace-kernel/20230523230913.6860e28d@rorschach.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

4b512860

tracing/histograms: Allow variables to have some modifiers · e30fbc61

Steven Rostedt (Google) authored May 23, 2023

Modifiers are used to change the behavior of keys. For instance, they
can grouped into buckets, converted to syscall names (from the syscall
identifier), show task->comm of the current pid, be an array of longs
that represent a stacktrace, and more.

It was found that nothing stopped a value from taking a modifier. As
values are simple counters. If this happened, it would call code that
was not expecting a modifier and crash the kernel. This was fixed by
having the ___create_val_field() function test if a modifier was present
and fail if one was. This fixed the crash.

Now there's a problem with variables. Variables are used to pass fields
from one event to another. Variables are allowed to have some modifiers,
as the processing may need to happen at the time of the event (like
stacktraces and comm names of the current pid). The issue is that it too
uses __create_val_field(). Now that fails on modifiers, variables can no
longer use them (this is a regression).

As not all modifiers are for variables, have them use a separate check.

Link: https://lore.kernel.org/linux-trace-kernel/20230523221108.064a5d82@rorschach.local.home

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: e0213434 ("tracing: Do not let histogram values have some modifiers")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

e30fbc61

tracing/user_events: Document user_event_mm one-shot list usage · ff9e1632

Beau Belgrave authored May 19, 2023

During 6.4 development it became clear that the one-shot list used by
the user_event_mm's next field was confusing to others. It is not clear
how this list is protected or what the next field usage is for unless
you are familiar with the code.

Add comments into the user_event_mm struct indicating lock requirement
and usage. Also document how and why this approach was used via comments
in both user_event_enabler_update() and user_event_mm_get_all() and the
rules to properly use it.

Link: https://lkml.kernel.org/r/20230519230741.669-5-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ff9e1632

tracing/user_events: Rename link fields for clarity · dcbd1ac2

Beau Belgrave authored May 19, 2023

Currently most list_head fields of various structs within user_events
are simply named link. This causes folks to keep additional context in
their head when working with the code, which can be confusing.

Instead of using link, describe what the actual link is, for example:
list_del_rcu(&mm->link);

Changes into:
list_del_rcu(&mm->mms_link);

The reader now is given a hint the link is to the mms global list
instead of having to remember or spot check within the code.

Link: https://lkml.kernel.org/r/20230519230741.669-4-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

dcbd1ac2

tracing/user_events: Remove RCU lock while pinning pages · aaecdaf9

Linus Torvalds authored May 19, 2023

pin_user_pages_remote() can reschedule which means we cannot hold any
RCU lock while using it. Now that enablers are not exposed out to the
tracing register callbacks during fork(), there is clearly no need to
require the RCU lock as event_mutex is enough to protect changes.

Remove unneeded RCU usages when pinning pages and walking enablers with
event_mutex held. Cleanup a misleading "safe" list walk that is not
needed. During fork() duplication, remove unneeded RCU list add, since
the list is not exposed yet.

Link: https://lkml.kernel.org/r/20230519230741.669-3-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wiiBfT4zNS29jA0XEsy8EmbqTH1hAPdRJCDAJMD8Gxt5A@mail.gmail.com/

Fixes: 72357590 ("tracing/user_events: Use remote writes for event enablement")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ change log written by Beau Belgrave ]
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

aaecdaf9

tracing/user_events: Split up mm alloc and attach · 3e0fea09

Linus Torvalds authored May 19, 2023

When a new mm is being created in a fork() path it currently is
allocated and then attached in one go. This leaves the mm exposed out to
the tracing register callbacks while any parent enabler locations are
copied in. This should not happen.

Split up mm alloc and attach as unique operations. When duplicating
enablers, first alloc, then duplicate, and only upon success, attach.
This prevents any timing window outside of the event_reg mutex for
enablement walking. This allows for dropping RCU requirement for
enablement walking in later patches.

Link: https://lkml.kernel.org/r/20230519230741.669-2-beaub@linux.microsoft.com
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=whTBvXJuoi_kACo3qi5WZUmRrhyA-_=rRFsycTytmB6qw@mail.gmail.com/Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ change log written by Beau Belgrave ]
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

3e0fea09

23 May, 2023 2 commits

tracing/timerlat: Always wakeup the timerlat thread · 632478a0

Daniel Bristot de Oliveira authored May 11, 2023

While testing rtla timerlat auto analysis, I reach a condition where
the interface was not receiving tracing data. I was able to manually
reproduce the problem with these steps:

  # echo 0 > tracing_on                 # disable trace
  # echo 1 > osnoise/stop_tracing_us    # stop trace if timerlat irq > 1 us
  # echo timerlat > current_tracer      # enable timerlat tracer
  # sleep 1                             # wait... that is the time when rtla
                                        # apply configs like prio or cgroup
  # echo 1 > tracing_on                 # start tracing
  # cat trace
  # tracer: timerlat
  #
  #                                _-----=> irqs-off
  #                               / _----=> need-resched
  #                              | / _---=> hardirq/softirq
  #                              || / _--=> preempt-depth
  #                              ||| / _-=> migrate-disable
  #                              |||| /     delay
  #                              |||||            ACTIVATION
  #           TASK-PID      CPU# |||||   TIMESTAMP   ID            CONTEXT                 LATENCY
  #              | |         |   |||||      |         |                  |                       |
        NOTHING!

Then, trying to enable tracing again with echo 1 > tracing_on resulted
in no change: the trace was still not tracing.

This problem happens because the timerlat IRQ hits the stop tracing
condition while tracing is off, and do not wake up the timerlat thread,
so the timerlat threads are kept sleeping forever, resulting in no
trace, even after re-enabling the tracer.

Avoid this condition by always waking up the threads, even after stopping
tracing, allowing the tracer to return to its normal operating after
a new tracing on.

Link: https://lore.kernel.org/linux-trace-kernel/1ed8f830638b20a39d535d27d908e319a9a3c4e2.1683822622.git.bristot@kernel.org

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: stable@vger.kernel.org
Fixes: a955d7ea ("trace: Add timerlat tracer")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

632478a0

tracing/user_events: Use long vs int for atomic bit ops · ee7751b5

Beau Belgrave authored May 05, 2023

Each event stores a int to track which bit to set/clear when enablement
changes. On big endian 64-bit configurations, it's possible this could
cause memory corruption when it's used for atomic bit operations.

Use unsigned long for enablement values to ensure any possible
corruption cannot occur. Downcast to int after mask for the bit target.

Link: https://lore.kernel.org/all/6f758683-4e5e-41c3-9b05-9efc703e827c@kili.mountain/
Link: https://lore.kernel.org/linux-trace-kernel/20230505205855.6407-1-beaub@linux.microsoft.com

Fixes: dcb8177c ("tracing/user_events: Add ioctl for disabling addresses")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ee7751b5

21 May, 2023 11 commits

Linux 6.4-rc3 · 44c026a7
Linus Torvalds authored May 21, 2023

44c026a7

Merge tag 'uml-for-linus-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · fa4fe8ce

Linus Torvalds authored May 21, 2023

Pull UML fix from Richard Weinberger:

 - Fix modular build for UML watchdog

* tag 'uml-for-linus-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: harddog: fix modular build

fa4fe8ce

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · a35747c3

Linus Torvalds authored May 21, 2023

Pull kvm fixes from Paolo Bonzini:
 "ARM:

   - Plug a race in the stage-2 mapping code where the IPA and the PA
     would end up being out of sync

   - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...)

   - FP/SVE/SME documentation update, in the hope that this field
     becomes clearer...

   - Add workaround for Apple SEIS brokenness to a new SoC

   - Random comment fixes

  x86:

   - add MSR_IA32_TSX_CTRL into msrs_to_save

   - fixes for XCR0 handling in SGX enclaves

  Generic:

   - Fix vcpu_array[0] races

   - Fix race between starting a VM and 'reboot -f'"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save
  KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)
  KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE
  KVM: Fix vcpu_array[0] races
  KVM: VMX: Fix header file dependency of asm/vmx.h
  KVM: Don't enable hardware after a restart/shutdown is initiated
  KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown
  KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations
  KVM: arm64: Clarify host SME state management
  KVM: arm64: Restructure check for SVE support in FP trap handler
  KVM: arm64: Document check for TIF_FOREIGN_FPSTATE
  KVM: arm64: Fix repeated words in comments
  KVM: arm64: Constify start/end/phys fields of the pgtable walker data
  KVM: arm64: Infer PA offset from VA in hyp map walker
  KVM: arm64: Infer the PA offset from IPA in stage-2 map walker
  KVM: arm64: Use the bitmap API to allocate bitmaps
  KVM: arm64: Slightly optimize flush_context()

a35747c3

Merge tag 'perf-tools-fixes-for-v6.4-1-2023-05-20' of... · c47d122c

Linus Torvalds authored May 21, 2023

Merge tag 'perf-tools-fixes-for-v6.4-1-2023-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fail graciously if BUILD_BPF_SKEL=1 is specified and clang isn't
   available

 - Add empty 'struct rq' to 'perf lock contention' to satisfy libbpf
   'runqueue' type verification. This feature is built only with
   BUILD_BPF_SKEL=1

 - Make vmlinux.h use bpf.h and perf_event.h in source directory, not
   system ones that may be old and not have things like 'union
   perf_sample_weight'

 - Add system include paths to BPF builds to pick things missing in the
   headers included by clang -target bpf

 - Update various header copies with the kernel sources

 - Change divide by zero and not supported events behavior to show
   'nan'/'not counted' in 'perf stat' output.

   This happens when using things like 'perf stat -M TopdownL2 true',
   involving JSON metrics

 - Update no event/metric expectations affected by using JSON metrics in
   'perf stat -ddd' perf test

 - Avoid segv with 'perf stat --topdown' for metrics without a group

 - Do not assume which events may have a PMU name, allowing the logic to
   keep an AUX event group together. Makes this usecase work again:

     $ perf record --no-bpf-event -c 10 -e '{intel_pt//,tlb_flush.stlb_any/aux-sample-size=8192/pp}:u' -- sleep 0.1
     [ perf record: Woken up 1 times to write data ]
     [ perf record: Captured and wrote 0.078 MB perf.data ]
     $ perf script -F-dso,+addr | grep -C5 tlb_flush.stlb_any | head -11
     sleep 20444 [003]  7939.510243:  1  branches:uH:  7f5350cc82a2 dl_main+0x9a2 => 7f5350cb38f0 _dl_add_to_namespace_list+0x0
     sleep 20444 [003]  7939.510243:  1  branches:uH:  7f5350cb3908 _dl_add_to_namespace_list+0x18 => 7f5350cbb080 rtld_mutex_dummy+0x0
     sleep 20444 [003]  7939.510243:  1  branches:uH:  7f5350cc8350 dl_main+0xa50 => 0 [unknown]
     sleep 20444 [003]  7939.510244:  1  branches:uH:  7f5350cc83ca dl_main+0xaca => 7f5350caeb60 _dl_process_pt_gnu_property+0x0
     sleep 20444 [003]  7939.510245:  1  branches:uH:  7f5350caeb60 _dl_process_pt_gnu_property+0x0 => 0 [unknown]
     sleep 20444  7939.510245:       10 tlb_flush.stlb_any/aux-sample-size=8192/pp: 0 7f5350caeb60 _dl_process_pt_gnu_property+0x0
     sleep 20444 [003]  7939.510254:  1  branches:uH:  7f5350cc87fe dl_main+0xefe => 7f5350ccd240 strcmp+0x0
     sleep 20444 [003]  7939.510254:  1  branches:uH:  7f5350cc8862 dl_main+0xf62 => 0 [unknown]

 - Add a check for the above use case in 'perf test test_intel_pt'

 - Fix build with refcount checking on arm64, it was still accessing
   fields that need to be wrapped so that the refcounted struct gets
   checked

 - Fix contextid validation in ARM's CS-ETM, so that older kernels
   without that field can still be supported

 - Skip unsupported aggregation for stat events found in perf.data files
   in 'perf script'

 - Add stat test for record and script to check the previous problem

 - Remove needless debuginfod queries from 'perf test java symbol', this
   was just making the test take a long time to complete

 - Address python SafeConfigParser() deprecation warning in 'perf test
   attr'

 - Fix __NR_execve undeclared on i386 'perf bench syscall' build error

* tag 'perf-tools-fixes-for-v6.4-1-2023-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (33 commits)
  perf bench syscall: Fix __NR_execve undeclared build error
  perf test attr: Fix python SafeConfigParser() deprecation warning
  perf test attr: Update no event/metric expectations
  tools headers disabled-features: Sync with the kernel sources
  tools headers UAPI: Sync arch prctl headers with the kernel sources
  tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
  tools headers x86 cpufeatures: Sync with the kernel sources
  tools headers UAPI: Sync s390 syscall table file that wires up the memfd_secret syscall
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  perf metrics: Avoid segv with --topdown for metrics without a group
  perf lock contention: Add empty 'struct rq' to satisfy libbpf 'runqueue' type verification
  perf cs-etm: Fix contextid validation
  perf arm64: Fix build with refcount checking
  perf test: Add stat test for record and script
  perf script: Skip aggregation for stat events
  perf build: Add system include paths to BPF builds
  perf bpf skels: Make vmlinux.h use bpf.h and perf_event.h in source directory
  perf parse-events: Do not break up AUX event group
  perf test test_intel_pt.sh: Test sample mode with event with PMU name
  perf evsel: Modify group pmu name for software events
  ...

c47d122c

Merge tag 'powerpc-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 4927cb98

Linus Torvalds authored May 21, 2023

Pull powerpc fixes from Michael Ellerman:

 - Fix broken soft dirty tracking when using the Radix MMU (>= P9)

 - Fix ISA mapping when "ranges" property is not present, for PASemi
   Nemo boards

 - Fix a possible WARN_ON_ONCE hitting in BPF extable handling

 - Fix incorrect DMA address handling when using 2MB TCEs

 - Fix a bug in IOMMU table handling for SR-IOV devices

 - Fix the recent rework of IOMMU handling which left arch code calling
   clean up routines that are handled by the IOMMU core

 - A few assorted build fixes

Thanks to Christian Zigotzky, Dan Horák, Gaurav Batra, Hari Bathini,
Jason Gunthorpe, Nathan Chancellor, Naveen N. Rao, Nicholas Piggin, Pali
Rohár, Randy Dunlap, and Rob Herring.

* tag 'powerpc-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/iommu: Incorrect DDW Table is referenced for SR-IOV device
  powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
  powerpc/iommu: Remove iommu_del_device()
  powerpc/crypto: Fix aes-gcm-p10 build when VSX=n
  powerpc/bpf: populate extable entries only during the last pass
  powerpc/boot: Disable power10 features after BOOTAFLAGS assignment
  powerpc/64s/radix: Fix soft dirty tracking
  powerpc/fsl_uli1575: fix kconfig warnings and build errors
  powerpc/isa-bridge: Fix ISA mapping when "ranges" is not present

4927cb98

Merge tag 'ata-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata · 90af47ed

Linus Torvalds authored May 21, 2023

Pull ata fix from Damien Le Moal:

 - Fix DT binding for the ahci-ceva driver to fully describe all iommus,
   from Michal

* tag 'ata-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  dt-bindings: ata: ahci-ceva: Cover all 4 iommus entries

90af47ed

Merge tag 'fbdev-for-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · 70e137e3

Linus Torvalds authored May 21, 2023

Pull fbdev fixes from Helge Deller:
 "A few small unspectacular fbdev fixes:

   - Fix for USB endpoint check in udlfb (found by syzbot fuzzer)

   - Small fix in error code path in omapfb

   - compiler warning fixes in fbmem & i810

   - code removal and whitespace cleanups in stifb and atyfb"

* tag 'fbdev-for-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: stifb: Whitespace cleanups
  fbdev: udlfb: Use usb_control_msg_send()
  fbdev: udlfb: Fix endpoint check
  fbdev: atyfb: Remove unused clock determination
  fbdev: i810: include i810_main.h in i810_dvt.c
  fbdev: fbmem: mark get_fb_unmapped_area() static
  fbdev: omapfb: panel-tpo-td043mtea1: fix error code in probe()

70e137e3

Merge tag '6.4-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd · e2065b8c

Linus Torvalds authored May 21, 2023

Pull ksmbd server fixes from Steve French:

 - two fixes for incorrect SMB3 message validation (one for client which
   uses 8 byte padding, and one for empty bcc)

 - two fixes for out of bounds bugs: one for username offset checks (in
   session setup) and the other for create context name length checks in
   open requests

* tag '6.4-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: smb2: Allow messages padded to 8byte boundary
  ksmbd: allocate one more byte for implied bcc[0]
  ksmbd: fix wrong UserName check in session_user
  ksmbd: fix global-out-of-bounds in smb2_find_context_vals

e2065b8c

Merge tag '6.4-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 0c9dcf12

Linus Torvalds authored May 21, 2023

Pull cifs client fixes from Steve French:
 "Two smb3 client fixes, both related to deferred close, and also for
  stable:

   - send close for deferred handles before not after lease break
     response to avoid possible sharing violations

   - check all opens on an inode (looking for deferred handles) when
     lease break is returned not just the handle the lease break came in
     on"

* tag '6.4-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  SMB3: drop reference to cfile before sending oplock break
  SMB3: Close all deferred handles of inode in case of handle lease break

0c9dcf12

KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save · b9846a69

Mingwei Zhang authored May 09, 2023

Add MSR_IA32_TSX_CTRL into msrs_to_save[] to explicitly tell userspace to
save/restore the register value during migration. Missing this may cause
userspace that relies on KVM ioctl(KVM_GET_MSR_INDEX_LIST) fail to port the
value to the target VM.

In addition, there is no need to add MSR_IA32_TSX_CTRL when
ARCH_CAP_TSX_CTRL_MSR is not supported in kvm_get_arch_capabilities(). So
add the checking in kvm_probe_msr_to_save().

Fixes: c11f83e0 ("KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20230509032348.1153070-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b9846a69

KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM) · 275a8724

Sean Christopherson authored May 03, 2023

Drop KVM's manipulation of guest's CPUID.0x12.1 ECX and EDX, i.e. the
allowed XFRM of SGX enclaves, now that KVM explicitly checks the guest's
allowed XCR0 when emulating ECREATE.

Note, this could theoretically break a setup where userspace advertises
a "bad" XFRM and relies on KVM to provide a sane CPUID model, but QEMU
is the only known user of KVM SGX, and QEMU explicitly sets the SGX CPUID
XFRM subleaf based on the guest's XCR0.
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230503160838.3412617-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

275a8724