• Christian Brauner's avatar
    clone3: allow spawning processes into cgroups · ef2c41cf
    Christian Brauner authored
    This adds support for creating a process in a different cgroup than its
    parent. Callers can limit and account processes and threads right from
    the moment they are spawned:
    - A service manager can directly spawn new services into dedicated
      cgroups.
    - A process can be directly created in a frozen cgroup and will be
      frozen as well.
    - The initial accounting jitter experienced by process supervisors and
      daemons is eliminated with this.
    - Threaded applications or even thread implementations can choose to
      create a specific cgroup layout where each thread is spawned
      directly into a dedicated cgroup.
    
    This feature is limited to the unified hierarchy. Callers need to pass
    a directory file descriptor for the target cgroup. The caller can
    choose to pass an O_PATH file descriptor. All usual migration
    restrictions apply, i.e. there can be no processes in inner nodes. In
    general, creating a process directly in a target cgroup adheres to all
    migration restrictions.
    
    One of the biggest advantages of this feature is that CLONE_INTO_GROUP does
    not need to grab the write side of the cgroup cgroup_threadgroup_rwsem.
    This global lock makes moving tasks/threads around super expensive. With
    clone3() this lock is avoided.
    
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Li Zefan <lizefan@huawei.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: cgroups@vger.kernel.org
    Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: default avatarTejun Heo <tj@kernel.org>
    ef2c41cf
cgroup.c 171 KB