• Petr Pavlu's avatar
    tracing: Fix incomplete locking when disabling buffered events · 7fed14f7
    Petr Pavlu authored
    The following warning appears when using buffered events:
    
    [  203.556451] WARNING: CPU: 53 PID: 10220 at kernel/trace/ring_buffer.c:3912 ring_buffer_discard_commit+0x2eb/0x420
    [...]
    [  203.670690] CPU: 53 PID: 10220 Comm: stress-ng-sysin Tainted: G            E      6.7.0-rc2-default #4 56e6d0fcf5581e6e51eaaecbdaec2a2338c80f3a
    [  203.670704] Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
    [  203.670709] RIP: 0010:ring_buffer_discard_commit+0x2eb/0x420
    [  203.735721] Code: 4c 8b 4a 50 48 8b 42 48 49 39 c1 0f 84 b3 00 00 00 49 83 e8 01 75 b1 48 8b 42 10 f0 ff 40 08 0f 0b e9 fc fe ff ff f0 ff 47 08 <0f> 0b e9 77 fd ff ff 48 8b 42 10 f0 ff 40 08 0f 0b e9 f5 fe ff ff
    [  203.735734] RSP: 0018:ffffb4ae4f7b7d80 EFLAGS: 00010202
    [  203.735745] RAX: 0000000000000000 RBX: ffffb4ae4f7b7de0 RCX: ffff8ac10662c000
    [  203.735754] RDX: ffff8ac0c750be00 RSI: ffff8ac10662c000 RDI: ffff8ac0c004d400
    [  203.781832] RBP: ffff8ac0c039cea0 R08: 0000000000000000 R09: 0000000000000000
    [  203.781839] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [  203.781842] R13: ffff8ac10662c000 R14: ffff8ac0c004d400 R15: ffff8ac10662c008
    [  203.781846] FS:  00007f4cd8a67740(0000) GS:ffff8ad798880000(0000) knlGS:0000000000000000
    [  203.781851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  203.781855] CR2: 0000559766a74028 CR3: 00000001804c4000 CR4: 00000000001506f0
    [  203.781862] Call Trace:
    [  203.781870]  <TASK>
    [  203.851949]  trace_event_buffer_commit+0x1ea/0x250
    [  203.851967]  trace_event_raw_event_sys_enter+0x83/0xe0
    [  203.851983]  syscall_trace_enter.isra.0+0x182/0x1a0
    [  203.851990]  do_syscall_64+0x3a/0xe0
    [  203.852075]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
    [  203.852090] RIP: 0033:0x7f4cd870fa77
    [  203.982920] Code: 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 43 0e 00 f7 d8 64 89 01 48
    [  203.982932] RSP: 002b:00007fff99717dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089
    [  203.982942] RAX: ffffffffffffffda RBX: 0000558ea1d7b6f0 RCX: 00007f4cd870fa77
    [  203.982948] RDX: 0000000000000000 RSI: 00007fff99717de0 RDI: 0000558ea1d7b6f0
    [  203.982957] RBP: 00007fff99717de0 R08: 00007fff997180e0 R09: 00007fff997180e0
    [  203.982962] R10: 00007fff997180e0 R11: 0000000000000246 R12: 00007fff99717f40
    [  204.049239] R13: 00007fff99718590 R14: 0000558e9f2127a8 R15: 00007fff997180b0
    [  204.049256]  </TASK>
    
    For instance, it can be triggered by running these two commands in
    parallel:
    
     $ while true; do
        echo hist:key=id.syscall:val=hitcount > \
          /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger;
      done
     $ stress-ng --sysinfo $(nproc)
    
    The warning indicates that the current ring_buffer_per_cpu is not in the
    committing state. It happens because the active ring_buffer_event
    doesn't actually come from the ring_buffer_per_cpu but is allocated from
    trace_buffered_event.
    
    The bug is in function trace_buffered_event_disable() where the
    following normally happens:
    
    * The code invokes disable_trace_buffered_event() via
      smp_call_function_many() and follows it by synchronize_rcu(). This
      increments the per-CPU variable trace_buffered_event_cnt on each
      target CPU and grants trace_buffered_event_disable() the exclusive
      access to the per-CPU variable trace_buffered_event.
    
    * Maintenance is performed on trace_buffered_event, all per-CPU event
      buffers get freed.
    
    * The code invokes enable_trace_buffered_event() via
      smp_call_function_many(). This decrements trace_buffered_event_cnt and
      releases the access to trace_buffered_event.
    
    A problem is that smp_call_function_many() runs a given function on all
    target CPUs except on the current one. The following can then occur:
    
    * Task X executing trace_buffered_event_disable() runs on CPU 0.
    
    * The control reaches synchronize_rcu() and the task gets rescheduled on
      another CPU 1.
    
    * The RCU synchronization finishes. At this point,
      trace_buffered_event_disable() has the exclusive access to all
      trace_buffered_event variables except trace_buffered_event[CPU0]
      because trace_buffered_event_cnt[CPU0] is never incremented and if the
      buffer is currently unused, remains set to 0.
    
    * A different task Y is scheduled on CPU 0 and hits a trace event. The
      code in trace_event_buffer_lock_reserve() sees that
      trace_buffered_event_cnt[CPU0] is set to 0 and decides the use the
      buffer provided by trace_buffered_event[CPU0].
    
    * Task X continues its execution in trace_buffered_event_disable(). The
      code incorrectly frees the event buffer pointed by
      trace_buffered_event[CPU0] and resets the variable to NULL.
    
    * Task Y writes event data to the now freed buffer and later detects the
      created inconsistency.
    
    The issue is observable since commit dea49978 ("tracing: Fix warning
    in trace_buffered_event_disable()") which moved the call of
    trace_buffered_event_disable() in __ftrace_event_enable_disable()
    earlier, prior to invoking call->class->reg(.. TRACE_REG_UNREGISTER ..).
    The underlying problem in trace_buffered_event_disable() is however
    present since the original implementation in commit 0fc1b09f
    ("tracing: Use temp buffer when filtering events").
    
    Fix the problem by replacing the two smp_call_function_many() calls with
    on_each_cpu_mask() which invokes a given callback on all CPUs.
    
    Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
    Link: https://lkml.kernel.org/r/20231205161736.19663-2-petr.pavlu@suse.com
    
    Cc: stable@vger.kernel.org
    Fixes: 0fc1b09f ("tracing: Use temp buffer when filtering events")
    Fixes: dea49978 ("tracing: Fix warning in trace_buffered_event_disable()")
    Signed-off-by: default avatarPetr Pavlu <petr.pavlu@suse.com>
    Signed-off-by: default avatarSteven Rostedt (Google) <rostedt@goodmis.org>
    7fed14f7
trace.c 258 KB