    sched_ext: Don't call put_prev_task_scx() before picking the next task · 7c65ae81
    Tejun Heo authored
    fd03c5b8 ("sched: Rework pick_next_task()") changed the definition of
    pick_next_task() from:
    
      pick_next_task() := pick_task() + set_next_task(.first = true)
    
    to:
    
      pick_next_task(prev) := pick_task() + put_prev_task() + set_next_task(.first = true)
    
    making pick_next_task() responsible for invoking put_prev_task(). This
    reordering allows pick_task() to be shared between the regular and
    core-sched paths, and allows put_prev_task() to know the next task.
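    A toy userspace model of the two orderings may help. It is a sketch, not
    kernel code. struct task, record() and the trace strings are hypothetical.
    Only the call order is modeled, plus the fact that put_prev_task() can now
    see @next.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct task { const char *name; };   /* hypothetical stand-in */

static char trace[256];              /* comma-separated log of hook calls */

static void record(const char *step)
{
	size_t len = strlen(trace);
	int n = snprintf(trace + len, sizeof(trace) - len, "%s%s",
			 len ? "," : "", step);
	assert(n > 0 && (size_t)n < sizeof(trace) - len);
}

static struct task *pick_task(struct task *candidate)
{
	record("pick_task");
	return candidate;
}

/* @next is NULL when the caller has not picked yet. */
static void put_prev_task(struct task *prev, struct task *next)
{
	(void)prev;
	record(next ? "put_prev(next known)" : "put_prev(next unknown)");
}

static void set_next_task(struct task *next, int first)
{
	(void)next;
	record(first ? "set_next(first)" : "set_next");
}

/* Old: put_prev_task() ran before pick_next_task(), without knowing @next. */
static struct task *schedule_old(struct task *prev, struct task *cand)
{
	trace[0] = '\0';
	put_prev_task(prev, NULL);
	struct task *next = pick_task(cand);
	set_next_task(next, 1);
	return next;
}

/* New: pick_next_task(prev) := pick_task() + put_prev_task() +
 * set_next_task(.first = true). */
static struct task *pick_next_task_new(struct task *prev, struct task *cand)
{
	trace[0] = '\0';
	struct task *next = pick_task(cand);
	put_prev_task(prev, next);
	set_next_task(next, 1);
	return next;
}
```

    The core-sched path can call pick_task() on its own because put and set
    are no longer fused into the pick.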
    
    sched_ext depended on put_prev_task_scx() enqueueing the current task before
    pick_next_task_scx() is called. While pulling sched/core changes,
    70cc76aa0d80 ("Merge branch 'tip/sched/core' into for-6.12") added an
    explicit put_prev_task_scx() call for SCX tasks in pick_next_task_scx()
    before picking the first task as a workaround.
    
    Clean it up and adopt the conventions that other sched classes are
    following.
    
    The logic for keeping the current task running was spread across two
    places, and it required the task to be put on the local DSQ before picking:
    
      - balance_one() used SCX_TASK_BAL_KEEP to indicate that the task is still
        runnable, hasn't exhausted its slice, and thus should keep running.
    
      - put_prev_task_scx() enqueued the task onto the local DSQ if
        SCX_TASK_BAL_KEEP was set. It also called do_enqueue_task() with
        SCX_ENQ_LAST if it was the only runnable task. do_enqueue_task() in
        turn decided whether to use the local DSQ depending on SCX_OPS_ENQ_LAST.
    
    Consolidate the logic in balance_one() as it always knows whether it is
    going to keep the current task. balance_one() now considers all conditions
    where the current task should be kept and uses SCX_TASK_BAL_KEEP to tell
    pick_next_task_scx() to keep the current task instead of picking one from
    the local DSQ. Accordingly, SCX_ENQ_LAST handling is removed from
    put_prev_task_scx() and do_enqueue_task(), and pick_next_task_scx() is
    updated to pick the current task if SCX_TASK_BAL_KEEP is set.
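    The consolidated decision can be sketched as a small userspace model.
    Everything here is hypothetical and simplified. TASK_BAL_KEEP stands in for
    SCX_TASK_BAL_KEEP, a fixed-size array stands in for the local DSQ, and
    ops_enq_last stands in for SCX_OPS_ENQ_LAST. ops.dispatch() is not
    modeled, so an empty local DSQ means nothing else was found to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASK_BAL_KEEP (1u << 0)   /* hypothetical: models SCX_TASK_BAL_KEEP */
#define LOCAL_DSQ_MAX 8

struct task {
	bool runnable;            /* still wants the CPU */
	unsigned long slice;      /* remaining slice */
	unsigned int flags;
};

struct rq {
	struct task *local[LOCAL_DSQ_MAX];  /* toy local DSQ, FIFO order */
	size_t nr_local;
	bool ops_enq_last;                  /* models SCX_OPS_ENQ_LAST */
};

static void local_dsq_push(struct rq *rq, struct task *p)
{
	assert(rq->nr_local < LOCAL_DSQ_MAX);
	rq->local[rq->nr_local++] = p;
}

/* Decide up front, in one place, whether @prev keeps running. */
static void balance_one(struct rq *rq, struct task *prev)
{
	prev->flags &= ~TASK_BAL_KEEP;
	if (!prev->runnable)
		return;
	/* Slice left, or nothing else to run and ENQ_LAST not requested. */
	if (prev->slice > 0 ||
	    (rq->nr_local == 0 && !rq->ops_enq_last))
		prev->flags |= TASK_BAL_KEEP;
	/* Otherwise @prev would be enqueued again (not modeled here). */
}

/* Honor the keep decision instead of requeueing @prev on the local DSQ. */
static struct task *pick_next_task_scx(struct rq *rq, struct task *prev)
{
	if (prev->flags & TASK_BAL_KEEP) {
		prev->flags &= ~TASK_BAL_KEEP;
		return prev;          /* no enqueue/dequeue round trip */
	}
	if (rq->nr_local == 0)
		return NULL;          /* nothing queued; the CPU would go idle */
	struct task *next = rq->local[0];
	for (size_t i = 1; i < rq->nr_local; i++)
		rq->local[i - 1] = rq->local[i];
	rq->nr_local--;
	return next;
}
```

    In this model, @prev is returned while it still has slice left, even if
    another task is queued. Once its slice runs out, the queued task wins. With
    an empty local DSQ, @prev is kept unless ops_enq_last is set.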
    
    The workaround put_prev_task[_scx]() calls are replaced with
    put_prev_set_next_task().
    
    This causes two behavior changes observable from the BPF scheduler:
    
    - When a task keeps running, it no longer goes through the enqueue/dequeue
      cycle and thus no longer triggers ops.stopping/running() transitions. The
      new behavior is better, and all existing schedulers should be able to
      handle it.
    
    - The BPF scheduler can no longer keep executing the current task by
      enqueueing the SCX_ENQ_LAST task to the local DSQ. If SCX_OPS_ENQ_LAST is
      specified, the BPF scheduler is responsible for resuming execution after
      each SCX_ENQ_LAST. SCX_OPS_ENQ_LAST is mostly useful where scheduling
      decisions are not made on the local CPU (e.g. central or userspace-driven
      scheduling), so the new behavior is more logical and shouldn't pose any
      problems. The SCX_OPS_ENQ_LAST demonstration in scx_qmap is dropped
      because it no longer fits well, and the last task handling is moved to
      the end of qmap_dispatch().
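    The first behavior change follows from put_prev_set_next_task() skipping
    the put/set pair when the picked task is the one already running. The model
    below is loose. It assumes that put_prev_task_scx() is what triggers
    ops.stopping() and set_next_task_scx() is what triggers ops.running(). The
    counters and the put_prev_set_next() helper are hypothetical stand-ins, not
    the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for the BPF scheduler's callbacks. */
struct cb_counts {
	unsigned int stopping;   /* models ops.stopping() */
	unsigned int running;    /* models ops.running() */
};

struct task { int id; };

static void put_prev_task_scx(struct cb_counts *c, struct task *prev)
{
	(void)prev;
	c->stopping++;           /* assumed: put side triggers ops.stopping() */
}

static void set_next_task_scx(struct cb_counts *c, struct task *next)
{
	(void)next;
	c->running++;            /* assumed: set side triggers ops.running() */
}

/* Simplified model of put_prev_set_next_task(): no transition at all when
 * the picked task is the one already running. */
static void put_prev_set_next(struct cb_counts *c, struct task *prev,
			      struct task *next)
{
	assert(prev != NULL && next != NULL);
	if (next == prev)
		return;
	put_prev_task_scx(c, prev);
	set_next_task_scx(c, next);
}
```

    Keeping @prev therefore produces no ops.stopping()/ops.running() pair in
    this model, while switching to another task produces exactly one of each.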

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: David Vernet <void@manifault.com>
    Cc: Andrea Righi <righi.andrea@gmail.com>
    Cc: Changwoo Min <multics69@gmail.com>
    Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
    Cc: Dan Schatzberg <schatzberg.dan@gmail.com>