1. 13 Aug, 2024 2 commits
    • sched_ext: Don't use double locking to migrate tasks across CPUs · 89909296
      Tejun Heo authored
      consume_remote_task() and dispatch_to_local_dsq() use
      move_task_to_local_dsq() to migrate the task to the target CPU. Currently,
      move_task_to_local_dsq() expects the caller to lock both the source and
      destination rq's. While this may save a few lock operations while the rq's
      are not contended, under contention, the double locking can exacerbate the
      situation significantly (refer to the linked message below).
      
      Update the migration path so that double locking is not used.
      move_task_to_local_dsq() now expects the caller to be locking the source rq,
      drops it and then acquires the destination rq lock. Code is simpler this way
      and, on a 2-way NUMA machine w/ Xeon Gold 6138, `hackbench 100 thread 5000`
      shows ~3% improvement with scx_simple.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20240806082716.GP37996@noisy.programming.kicks-ass.net
      Acked-by: David Vernet <void@manifault.com>
      89909296
    • sched_ext: define missing cfi stubs for sched_ext · 33d031ec
      Manu Bretelle authored
      `__bpf_ops_sched_ext_ops` was missing the initialization of some struct
      attributes. With
      
        https://lore.kernel.org/all/20240722183049.2254692-4-martin.lau@linux.dev/
      
      every single attribute needs to be initialized, otherwise programs (like
      scx_layered) will fail to load.
      
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_init not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_exit not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_prep_move not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_move not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_cancel_move not found in kernel, skipping it as it's set to zero
        05:26:48 [INFO] libbpf: struct_ops layered: member cgroup_set_weight not found in kernel, skipping it as it's set to zero
        05:26:48 [WARN] libbpf: prog 'layered_dump': BPF program load failed: unknown error (-524)
        05:26:48 [WARN] libbpf: prog 'layered_dump': -- BEGIN PROG LOAD LOG --
        attach to unsupported member dump of struct sched_ext_ops
        processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
        -- END PROG LOAD LOG --
        05:26:48 [WARN] libbpf: prog 'layered_dump': failed to load: -524
        05:26:48 [WARN] libbpf: failed to load object 'bpf_bpf'
        05:26:48 [WARN] libbpf: failed to load BPF skeleton 'bpf_bpf': -524
        Error: Failed to load BPF program
      Signed-off-by: Manu Bretelle <chantr4@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      33d031ec
  2. 08 Aug, 2024 3 commits
    • sched_ext: Improve logging around enable/disable · 344576fa
      Tejun Heo authored
      sched_ext currently doesn't generate messages when the BPF scheduler is
      enabled and disabled unless there are errors. It is useful to have a paper
      trail. Improve logging around enable/disable:
      
      - Generate info messages on enable and non-error disable.
      
      - Update error exit message formatting so that it's consistent with
        non-error message. Also, prefix ei->msg with the BPF scheduler's name to
        make it clear where the message is coming from.
      
      - Shorten scx_exit_reason() strings for SCX_EXIT_UNREG* for brevity and
        consistency.
      
      v2: Use pr_*() instead of KERN_* consistently. (David)
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Phil Auld <pauld@redhat.com>
      Reviewed-by: Phil Auld <pauld@redhat.com>
      Acked-by: David Vernet <void@manifault.com>
      344576fa
    • sched_ext: Make scx_rq_online() also test cpu_active() in addition to SCX_RQ_ONLINE · 991ef53a
      Tejun Heo authored
      scx_rq_online() currently only tests SCX_RQ_ONLINE. This isn't fully correct
      - e.g. consume_dispatch_q() uses task_run_on_remote_rq() which tests
      scx_rq_online() to see whether the current rq can run the task, and, if so,
      calls consume_remote_task() to migrate the task to @rq. While the test
      itself was done while locking @rq, @rq can be temporarily unlocked by
      consume_remote_task() and nothing prevents SCX_RQ_ONLINE from going offline
      before the migration takes place.
      
      To address the issue, add cpu_active() test to scx_rq_online(). There is a
      synchronize_rcu() between cpu_active() being cleared and the rq going
      offline, so if an on-going scheduling operation sees cpu_active(), the
      associated rq is guaranteed to not go offline until the scheduling operation
      is complete.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 60c27fb5 ("sched_ext: Implement sched_ext_ops.cpu_online/offline()")
      Acked-by: David Vernet <void@manifault.com>
      991ef53a
    • sched_ext: Fix unsafe list iteration in process_ddsp_deferred_locals() · 72763ea3
      Tejun Heo authored
      process_ddsp_deferred_locals() executes deferred direct dispatches to the
      local DSQs of remote CPUs. It iterates the tasks on
      rq->scx.ddsp_deferred_locals list, removing and calling
      dispatch_to_local_dsq() on each. However, the list is protected by the rq
      lock that can be dropped by dispatch_to_local_dsq() temporarily, so the list
      can be modified during the iteration, which can lead to oopses and other
      failures.
      
      Fix it by popping from the head of the list instead of iterating the list.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 5b26f7b9 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches")
      Acked-by: David Vernet <void@manifault.com>
      72763ea3
  3. 06 Aug, 2024 6 commits
  4. 04 Aug, 2024 1 commit
    • Merge branch 'sched/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-6.12 · 0df340ce
      Tejun Heo authored
      Pull tip/sched/core to resolve the following four conflicts. While 2-4 are
      simple context conflicts, 1 is a bit subtle and easy to resolve incorrectly.
      
      1. 2c8d046d ("sched: Add normal_policy()")
         vs.
         faa42d29 ("sched/fair: Make SCHED_IDLE entity be preempted in strict hierarchy")
      
      The former converts the direct test on p->policy to use the helper
      normal_policy(). The latter moves the p->policy test to a different
      location. Resolve by converting the test on p->policy in the new location to
      use normal_policy().
      
      2. a7a9fc54 ("sched_ext: Add boilerplate for extensible scheduler class")
         vs.
         a110a81c ("sched/deadline: Deferrable dl server")
      
      Both add calls to put_prev_task_idle() and set_next_task_idle(). Simple
      context conflict. Resolve by taking changes from both.
      
      3. a7a9fc54 ("sched_ext: Add boilerplate for extensible scheduler class")
         vs.
         c2459100 ("sched/core: Add clearing of ->dl_server in put_prev_task_balance()")
      
      The former changes the for_each_class() iteration to use for_each_active_class().
      The latter moves away the adjacent dl_server handling code. Simple context
      conflict. Resolve by taking changes from both.
      
      4. 60c27fb5 ("sched_ext: Implement sched_ext_ops.cpu_online/offline()")
         vs.
         31b164e2 ("sched/smt: Introduce sched_smt_present_inc/dec() helper")
         2f027354 ("sched/core: Introduce sched_set_rq_on/offline() helper")
      
      The former adds scx_rq_deactivate() call. The latter two change code around
      it. Simple context conflict. Resolve by taking changes from both.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      0df340ce
  5. 02 Aug, 2024 1 commit
  6. 01 Aug, 2024 1 commit
  7. 31 Jul, 2024 2 commits
    • scx/selftests: Verify we can call create_dsq from prog_run · 958b1891
      David Vernet authored
      We already have some testcases verifying that we can call
      BPF_PROG_TYPE_SYSCALL progs and invoke scx_bpf_exit(). Let's extend that to
      also call scx_bpf_create_dsq() so we get coverage for that as well.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      958b1891
    • scx: Allow calling sleepable kfuncs from BPF_PROG_TYPE_SYSCALL · 298dec19
      David Vernet authored
      We currently only allow calling sleepable scx kfuncs (i.e.
      scx_bpf_create_dsq()) from BPF_PROG_TYPE_STRUCT_OPS progs. The idea here
      was that we'd never have to call scx_bpf_create_dsq() outside of a
      sched_ext struct_ops callback, but that might not actually be true. For
      example, a scheduler could do something like the following:
      
      1. Open and load (not yet attach) a scheduler skel
      
      2. Synchronously call into a BPF_PROG_TYPE_SYSCALL prog from user space.
         For example, to initialize an LLC domain, or some other global,
         read-only state.
      
      3. Attach the skel, which actually enables the scheduler
      
      The advantage of doing this is that it precludes ugly boilerplate such as
      initializing a read-only, statically sized array of u64s which the kernel
      consumes exactly once at init time just to create the struct bpf_cpumask
      objects that are actually queried at runtime.
      
      Doing the above is already possible given that we can invoke core BPF
      kfuncs, such as bpf_cpumask_create(), from BPF_PROG_TYPE_SYSCALL progs. We
      already allow many scx kfuncs to be called from BPF_PROG_TYPE_SYSCALL progs
      (e.g. scx_bpf_kick_cpu()). Let's allow the sleepable kfuncs as well.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      298dec19
  8. 30 Jul, 2024 1 commit
  9. 29 Jul, 2024 20 commits
  10. 28 Jul, 2024 3 commits