    cgroup: Show # of subsystem CSSes in cgroup.stat · ab031252
    Waiman Long authored
    Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to
    help manage different structures in various cgroup subsystems by being
    an embedded element inside a larger structure like cpuset or mem_cgroup.
    
    The /proc/cgroups file shows the number of cgroups for each of the
    subsystems.  With cgroup v1, the number of CSSes is the same as the
    number of cgroups.  That is not the case anymore with cgroup v2. The
    /proc/cgroups file cannot show the actual number of CSSes for the
    subsystems that are bound to cgroup v2.
    
    So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
    we can't tell by looking at /proc/cgroups which cgroup subsystems may
    be responsible.
    
    As cgroup v2 has deprecated the use of /proc/cgroups, the hierarchical
    cgroup.stat file is now extended to show the number of live and
    dying CSSes associated with all the non-inhibited cgroup subsystems that
    have been bound to cgroup v2. The number includes CSSes in the current
    cgroup as well as in all the descendants underneath it.  This will help
    us pinpoint which subsystems are responsible for the increasing number
    of dying (nr_dying_descendants) cgroups.
    
    The dying CSS counts are stored in the cgroup structure itself
    instead of inside the CSS as suggested by Johannes. This will allow
    us to accurately track dying counts of cgroup subsystems that have
    recently been disabled in a cgroup. It is now possible that a zero
    subsystem number is coupled with a non-zero dying subsystem number.
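
    The case described above (no live CSSes, but dying ones remaining) can
    be detected from the file contents alone. The following is a minimal
    sketch, not part of the patch: the helper names are hypothetical, and
    the sample values are made up for illustration rather than taken from
    a real system.

```python
def parse_cgroup_stat(text):
    """Parse 'key value' lines from cgroup.stat text into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def lingering_subsystems(stats):
    """Return subsystems with zero live CSSes but a non-zero dying count."""
    prefix = "nr_dying_subsys_"
    result = []
    for key, dying in stats.items():
        if key.startswith(prefix):
            name = key[len(prefix):]
            if dying > 0 and stats.get("nr_subsys_" + name, 0) == 0:
                result.append(name)
    return sorted(result)


# Hypothetical values: memory was disabled, but two dying CSSes remain.
SAMPLE = """\
nr_descendants 4
nr_subsys_memory 0
nr_subsys_pids 4
nr_dying_descendants 2
nr_dying_subsys_memory 2
nr_dying_subsys_pids 0
"""

print(lingering_subsystems(parse_cgroup_stat(SAMPLE)))
```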
    
    The cgroup-v2.rst file is updated to discuss this new behavior.
    
    With this patch applied, sample output from the root cgroup.stat file
    is shown below.
    
    	nr_descendants 56
    	nr_subsys_cpuset 1
    	nr_subsys_cpu 43
    	nr_subsys_io 43
    	nr_subsys_memory 56
    	nr_subsys_perf_event 57
    	nr_subsys_hugetlb 1
    	nr_subsys_pids 56
    	nr_subsys_rdma 1
    	nr_subsys_misc 1
    	nr_dying_descendants 30
    	nr_dying_subsys_cpuset 0
    	nr_dying_subsys_cpu 0
    	nr_dying_subsys_io 0
    	nr_dying_subsys_memory 30
    	nr_dying_subsys_perf_event 0
    	nr_dying_subsys_hugetlb 0
    	nr_dying_subsys_pids 0
    	nr_dying_subsys_rdma 0
    	nr_dying_subsys_misc 0
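
    To illustrate how this output might be consumed, here is a minimal
    sketch that picks out the subsystems holding dying CSSes. The helper
    names are hypothetical; the text is the root sample above. On a live
    system the same text would typically be read from the cgroup.stat file
    under the cgroup v2 mount point (commonly /sys/fs/cgroup, which is an
    assumption about the local setup).

```python
ROOT_STAT = """\
nr_descendants 56
nr_subsys_cpuset 1
nr_subsys_cpu 43
nr_subsys_io 43
nr_subsys_memory 56
nr_subsys_perf_event 57
nr_subsys_hugetlb 1
nr_subsys_pids 56
nr_subsys_rdma 1
nr_subsys_misc 1
nr_dying_descendants 30
nr_dying_subsys_cpuset 0
nr_dying_subsys_cpu 0
nr_dying_subsys_io 0
nr_dying_subsys_memory 30
nr_dying_subsys_perf_event 0
nr_dying_subsys_hugetlb 0
nr_dying_subsys_pids 0
nr_dying_subsys_rdma 0
nr_dying_subsys_misc 0
"""


def parse_cgroup_stat(text):
    """Parse 'key value' lines from cgroup.stat text into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def dying_by_subsystem(stats):
    """Map subsystem name to its dying CSS count, skipping zero counts."""
    prefix = "nr_dying_subsys_"
    return {
        key[len(prefix):]: value
        for key, value in stats.items()
        if key.startswith(prefix) and value > 0
    }


print(dying_by_subsystem(parse_cgroup_stat(ROOT_STAT)))
```

    For the sample above, only the memory subsystem is reported, matching
    the nr_dying_descendants count of 30.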
    
    Another sample output from system.slice/cgroup.stat was:
    
    	nr_descendants 34
    	nr_subsys_cpuset 0
    	nr_subsys_cpu 32
    	nr_subsys_io 32
    	nr_subsys_memory 34
    	nr_subsys_perf_event 35
    	nr_subsys_hugetlb 0
    	nr_subsys_pids 34
    	nr_subsys_rdma 0
    	nr_subsys_misc 0
    	nr_dying_descendants 30
    	nr_dying_subsys_cpuset 0
    	nr_dying_subsys_cpu 0
    	nr_dying_subsys_io 0
    	nr_dying_subsys_memory 30
    	nr_dying_subsys_perf_event 0
    	nr_dying_subsys_hugetlb 0
    	nr_dying_subsys_pids 0
    	nr_dying_subsys_rdma 0
    	nr_dying_subsys_misc 0
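
    Because the counts are hierarchical, subtracting a child's dying counts
    from its parent's shows how many dying CSSes lie outside that child's
    subtree. The sketch below applies this to a subset of the two samples
    above; the function name is hypothetical, and it assumes both files
    were read at about the same time.

```python
# Dying counts copied from the root and system.slice samples (subset).
ROOT_DYING = {
    "nr_dying_descendants": 30,
    "nr_dying_subsys_cpu": 0,
    "nr_dying_subsys_memory": 30,
}
SLICE_DYING = {
    "nr_dying_descendants": 30,
    "nr_dying_subsys_cpu": 0,
    "nr_dying_subsys_memory": 30,
}


def dying_outside(parent, child):
    """Per-key dying counts in the parent's subtree but not the child's."""
    return {
        key: parent[key] - child.get(key, 0)
        for key in parent
        if key.startswith("nr_dying_")
    }


print(dying_outside(ROOT_DYING, SLICE_DYING))
```

    Every difference is zero here, which suggests that all 30 dying memory
    CSSes seen at the root sit under system.slice.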
    
    Note that the 'debug' controller wasn't used to provide this information
    because the controller is not recommended for production kernels, and many
    of them don't enable CONFIG_CGROUP_DEBUG by default.
    
    Similar information could be retrieved with debuggers like drgn, but those
    are also not always available (e.g. under lockdown), and the additional
    cost of runtime tracking here is deemed marginal.
    
    tj: Added Michal's paragraphs on why this is not added to the debug
        controller to the commit message.
    Signed-off-by: Waiman Long <longman@redhat.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
    Cc: Michal Koutný <mkoutny@suse.com>
    Link: http://lkml.kernel.org/r/20240715150034.2583772-1-longman@redhat.com
    Signed-off-by: Tejun Heo <tj@kernel.org>