1. 05 Apr, 2022 2 commits
      sched: Teach the forced-newidle balancer about CPU affinity limitation. · 386ef214
      Sebastian Andrzej Siewior authored
      try_steal_cookie() looks at task_struct::cpus_mask to decide if the
      task could be moved to `this' CPU. It ignores that the task might be in
      a migration disabled section while not on the CPU. In this case the task
      must not be moved, otherwise per-CPU assumptions are broken.
      
      Use is_cpu_allowed(), as suggested by Peter Zijlstra, to decide if a
      task can be moved.
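
The difference between the two checks can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct toy_task`, `toy_mask_allows()` and `toy_cpu_allowed()` are hypothetical names invented here, the 64-bit mask stands in for a cpumask, and the migration-disabled rule is simplified to "only the CPU the task last ran on is acceptable". It is not the kernel's is_cpu_allowed().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields the commit talks about. */
struct toy_task {
	uint64_t cpus_mask;          /* requested affinity, one bit per CPU */
	int cpu;                     /* CPU the task last ran on */
	unsigned migration_disabled; /* non-zero inside a migration-disabled section */
};

/* Mask-only check: looks at the affinity mask and nothing else. */
static bool toy_mask_allows(const struct toy_task *p, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (p->cpus_mask >> cpu) & 1u;
}

/*
 * Stricter check in the spirit of the fix: a task that has migration
 * disabled may only be placed on the CPU it is already tied to, even
 * when its affinity mask is wider.
 */
static bool toy_cpu_allowed(const struct toy_task *p, int cpu)
{
	if (!toy_mask_allows(p, cpu))
		return false;
	if (p->migration_disabled)
		return cpu == p->cpu;
	return true;
}
```

In this model a task with affinity for CPUs 0-3 that is migration disabled on CPU 2 passes the mask-only check for CPU 0 but fails the stricter one, which is the case the commit describes.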
      
      Fixes: d2dfa17b ("sched: Trivial forced-newidle balancer")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/YjNK9El+3fzGmswf@linutronix.de
      sched/core: Fix forceidle balancing · 5b6547ed
      Peter Zijlstra authored
      Steve reported that ChromeOS encounters the forceidle balancer being
      run from rt_mutex_setprio()'s balance_callback() invocation and
      explodes.
      
      Now, the forceidle balancer gets queued every time the idle task gets
      selected, set_next_task(), which is strictly too often.
      rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:
      
      	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
      	running = task_current(rq, p); /* rq->curr == p */
      
      	if (queued)
      		dequeue_task(...);
      	if (running)
      		put_prev_task(...);
      
      	/* change task properties */
      
      	if (queued)
      		enqueue_task(...);
      	if (running)
      		set_next_task(...);
      
      However, rt_mutex_setprio() will explicitly not run this pattern on
      the idle task (since priority boosting the idle task is quite insane).
      Most other 'change' pattern users are pidhash based and would also not
      apply to idle.
      
      Also, the change pattern doesn't contain a __balance_callback()
      invocation and hence we could have an out-of-band balance-callback,
      which *should* trigger the WARN in rq_pin_lock() (which guards against
      this exact anti-pattern).
      
      So while none of that explains how this happens, it does indicate that
      having it in set_next_task() might not be the most robust option.
      
      Instead, explicitly queue the forceidle balancer from pick_next_task()
      when it does indeed result in forceidle selection. Having it here,
      ensures it can only be triggered under the __schedule() rq->lock
      instance, and hence must be run from that context.
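
The effect of moving the queueing site can be sketched with a toy model. Everything here is an assumption made for illustration: `struct toy_rq`, the `toy_*` functions and the two-valued policy enum are hypothetical and do not mirror the kernel's data structures. The caller outside the schedule path is also hypothetical, since the commit message does not establish how the out-of-band invocation arises; it only shows what each placement would permit.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_policy {
	TOY_QUEUE_IN_SET_NEXT_TASK, /* old placement: queue whenever idle is set as next */
	TOY_QUEUE_IN_PICK           /* new placement: queue only on a forceidle pick */
};

struct toy_rq {
	bool in_schedule;       /* true while the toy schedule path holds the rq lock */
	int queued_in_band;     /* balancer queued from inside the schedule path */
	int queued_out_of_band; /* balancer queued from anywhere else */
};

static void toy_queue_balancer(struct toy_rq *rq)
{
	if (rq->in_schedule)
		rq->queued_in_band++;
	else
		rq->queued_out_of_band++;
}

/* Selecting the idle task; only the old placement queues from here. */
static void toy_set_next_task_idle(struct toy_rq *rq, enum toy_policy policy)
{
	if (policy == TOY_QUEUE_IN_SET_NEXT_TASK)
		toy_queue_balancer(rq);
}

/* Toy schedule path in which the pick ends up selecting the idle task. */
static void toy_schedule_picks_idle(struct toy_rq *rq, enum toy_policy policy,
				    bool forceidle)
{
	assert(!rq->in_schedule);
	rq->in_schedule = true;
	if (policy == TOY_QUEUE_IN_PICK && forceidle)
		toy_queue_balancer(rq);
	toy_set_next_task_idle(rq, policy);
	rq->in_schedule = false;
}

/* Hypothetical caller that reaches set_next_task() outside the schedule path. */
static void toy_change_pattern_on_idle(struct toy_rq *rq, enum toy_policy policy)
{
	toy_set_next_task_idle(rq, policy);
}
```

With the old placement the model queues the balancer on every idle selection, including from the hypothetical out-of-band caller. With the new placement it queues only for a forceidle pick, and only inside the schedule path.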
      
      This also happens to clean up the code a little, so win-win.
      
      Fixes: d2dfa17b ("sched: Trivial forced-newidle balancer")
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: T.J. Alumbaugh <talumbau@chromium.org>
      Link: https://lkml.kernel.org/r/20220330160535.GN8939@worktop.programming.kicks-ass.net
  2. 03 Apr, 2022 8 commits
  3. 02 Apr, 2022 30 commits