    sched_ext: Add cgroup support · 81951366
    Tejun Heo authored
    Add sched_ext_ops operations to init/exit cgroups, and track task migrations
    and config changes. A BPF scheduler may implement none or only a subset of
    cgroup features. The implemented features are indicated using
    %SCX_OPS_HAS_CGROUP_* flags. If cgroup configuration makes use of features
    that are not implemented, a warning is triggered.
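
The flag check described above can be pictured with a small userspace sketch. It is not the kernel implementation: `DEMO_OPS_HAS_CGROUP_WEIGHT` and `demo_warn_unsupported_weight()` are hypothetical names standing in for one of the `SCX_OPS_HAS_CGROUP_*` flags and for whatever check the kernel performs. The sketch assumes that a non-default `cpu.weight` (the cgroup v2 default is 100) counts as "making use of" the weight feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one SCX_OPS_HAS_CGROUP_* feature flag. */
#define DEMO_OPS_HAS_CGROUP_WEIGHT (1ULL << 0)

/* Default cgroup v2 cpu.weight value. */
#define DEMO_DEFAULT_WEIGHT 100

/*
 * Return true, and print a warning, when a cgroup is configured with a
 * non-default weight but the scheduler did not declare weight support.
 */
static bool demo_warn_unsupported_weight(uint64_t ops_flags, unsigned int weight)
{
	/* cpu.weight accepts values in the range [1, 10000]. */
	assert(weight >= 1 && weight <= 10000);

	if (ops_flags & DEMO_OPS_HAS_CGROUP_WEIGHT)
		return false;	/* feature implemented, nothing to warn about */
	if (weight == DEMO_DEFAULT_WEIGHT)
		return false;	/* feature not in use */

	fprintf(stderr, "warning: scheduler does not implement cgroup weight\n");
	return true;
}
```

With these assumptions, a scheduler that sets no flags warns only when a cgroup's weight differs from the default, and a scheduler that sets the flag never warns.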
    
    While a BPF scheduler is being enabled and disabled, relevant cgroup
    operations are locked out using scx_cgroup_rwsem. This avoids situations
    like task prep taking place while the task is being moved across cgroups,
    making things easier for BPF schedulers.
    
    v7: - cgroup interface file visibility toggling is dropped in favor of just
          warning messages. Dynamically changing interface visibility caused
          more confusion than it helped.
    
    v6: - Updated to reflect the removal of SCX_KF_SLEEPABLE.
    
        - Updated to use CONFIG_GROUP_SCHED_WEIGHT and fixes for
          !CONFIG_FAIR_GROUP_SCHED && CONFIG_EXT_GROUP_SCHED.
    
    v5: - Flipped the locking order between scx_cgroup_rwsem and
          cpus_read_lock() to avoid locking order conflict w/ cpuset. Better
          documentation around locking.
    
        - sched_move_task() takes an early exit if the source and destination
          are identical. This triggered the warning in scx_cgroup_can_attach()
          as it left p->scx.cgrp_moving_from uncleared. Updated the cgroup
          migration path so that ops.cgroup_prep_move() is skipped for identity
          migrations so that its invocations always match ops.cgroup_move()
          one-to-one.
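
The one-to-one pairing described in the v5 note can be modeled in a few lines of userspace C. This is a simplified sketch, not the kernel code path: `struct demo_task`, `demo_can_attach()` and `demo_move()` are hypothetical, with `moving_from` playing the role of `p->scx.cgrp_moving_from` and the counters standing in for invocations of `ops.cgroup_prep_move()` and `ops.cgroup_move()`.

```c
#include <assert.h>

/* Hypothetical, simplified task state for illustration only. */
struct demo_task {
	int cgrp;		/* id of the current cgroup */
	int moving_from;	/* source recorded by prep, -1 when unset */
	int nr_prep;		/* models ops.cgroup_prep_move() calls */
	int nr_move;		/* models ops.cgroup_move() calls */
};

/* Attach check: record the source and run prep unless src == dst. */
static void demo_can_attach(struct demo_task *p, int dst)
{
	/* A leftover source from an earlier attempt would be a bug. */
	assert(p->moving_from == -1);

	if (p->cgrp == dst)
		return;		/* identity migration: skip prep */

	p->moving_from = p->cgrp;
	p->nr_prep++;
}

/* Move step: only runs the move callback if prep ran, then clears state. */
static void demo_move(struct demo_task *p, int dst)
{
	if (p->moving_from == -1)
		return;		/* no prep happened, so no move callback */

	p->nr_move++;
	p->cgrp = dst;
	p->moving_from = -1;
}
```

Because prep is skipped when source and destination match, `moving_from` is never left set by a migration that exits early, and the two counters stay equal.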
    
    v4: - Example schedulers moved into their own patches.
    
        - Fix build failure when !CONFIG_CGROUP_SCHED, reported by Andrea Righi.
    
    v3: - Make scx_example_pair switch all tasks by default.
    
        - Convert to BPF inline iterators.
    
        - scx_bpf_task_cgroup() is added to determine the current cgroup from
          CPU controller's POV. This allows BPF schedulers to accurately track
          CPU cgroup membership.
    
        - scx_example_flatcg added. This demonstrates flattened hierarchy
          implementation of CPU cgroup control and shows significant performance
          improvement when cgroups which are nested multiple levels are under
          competition.
    
    v2: - Build fixes for different CONFIG combinations.
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reviewed-by: David Vernet <dvernet@meta.com>
    Acked-by: Josh Don <joshdon@google.com>
    Acked-by: Hao Luo <haoluo@google.com>
    Acked-by: Barret Rhoden <brho@google.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Cc: Andrea Righi <andrea.righi@canonical.com>