    perf/core: Fix cgroup events tracking · f841b682
    Chengming Zhou authored
    We encounter perf warnings when using cgroup events like:
    
      cd /sys/fs/cgroup
      mkdir test
      perf stat -e cycles -a -G test
    
    Which then triggers:
    
      WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0
      Call Trace:
       <TASK>
       __schedule+0x4ae/0x9f0
       ? _raw_spin_unlock_irqrestore+0x23/0x40
       ? __cond_resched+0x18/0x20
       preempt_schedule_common+0x2d/0x70
       __cond_resched+0x18/0x20
       wait_for_completion+0x2f/0x160
       ? cpu_stop_queue_work+0x9e/0x130
       affine_move_task+0x18a/0x4f0
    
      WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0
      Call Trace:
       <TASK>
       ? ctx_sched_out+0xb7/0x1b0
       perf_cgroup_switch+0x88/0xc0
       __schedule+0x4ae/0x9f0
       ? _raw_spin_unlock_irqrestore+0x23/0x40
       ? __cond_resched+0x18/0x20
       preempt_schedule_common+0x2d/0x70
       __cond_resched+0x18/0x20
       wait_for_completion+0x2f/0x160
       ? cpu_stop_queue_work+0x9e/0x130
       affine_move_task+0x18a/0x4f0
    
    The two warnings above are abridged; unrelated output has been
    removed. The problem is caused by how perf tracks cgroup events:
    
      CPU0					CPU1
      perf_event_open()
        perf_event_alloc()
          account_event()
    	account_event_cpu()
    	  atomic_inc(perf_cgroup_events)
    					  __perf_event_task_sched_out()
    					    if (atomic_read(perf_cgroup_events))
    					      perf_cgroup_switch()
    						// kernel/events/core.c:849
    						WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0)
    						if (READ_ONCE(cpuctx->cgrp) == cgrp) // false
    						  return
    						perf_ctx_lock()
    						ctx_sched_out()
    						cpuctx->cgrp = cgrp
    						ctx_sched_in()
    						  perf_cgroup_set_timestamp()
    						    // kernel/events/core.c:829
    						    WARN_ON_ONCE(!ctx->nr_cgroups)
    						perf_ctx_unlock()
        perf_install_in_context()
          cpu_function_call()
    					  __perf_install_in_context()
    					    add_event_to_ctx()
    					      list_add_event()
    						perf_cgroup_event_enable()
    						  ctx->nr_cgroups++
    						  cpuctx->cgrp = X
    
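    As shown above, we wrongly use the percpu atomic perf_cgroup_events
    to decide whether to call perf_cgroup_switch(). That function should
    only be called when we know this CPU has cgroup events enabled, but
    the counter is incremented at accounting time, before the event has
    been installed in the CPU's context.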
    
    Commit bd275681 ("perf: Rewrite core context handling") changed perf
    to have only one context per CPU, so we can just use cpuctx->cgrp to
    check whether this CPU has cgroup events enabled.

    So the percpu atomic perf_cgroup_events is no longer needed.
    
    Fixes: bd275681 ("perf: Rewrite core context handling")
    Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
    Link: https://lkml.kernel.org/r/20221207124023.66252-1-zhouchengming@bytedance.com