Commits · 873fe09ea5df6ccf6bb34811d8c9992aacb67598 · Kirill Smelkov / linux

15 Apr, 2013 3 commits

cgroup: introduce sane_behavior mount option · 873fe09e

Tejun Heo authored Apr 14, 2013

It's a sad fact that at this point various cgroup controllers are
carrying so many idiosyncrasies and pure insanities that it simply
isn't possible to reach any sort of sane consistent behavior while
maintaining staying fully compatible with what already has been
exposed to userland.

As we can't break exposed userland interface, transitioning to sane
behaviors can only be done in steps while maintaining backwards
compatibility.  This patch introduces a new mount option -
__DEVEL__sane_behavior - which disables crazy features and enforces
consistent behaviors in cgroup core proper and various controllers.
As exactly which behaviors it changes are still being determined, the
mount option, at this point, is useful only for development of the new
behaviors.  As such, the mount option is prefixed with __DEVEL__ and
generates a warning message when used.

Eventually, once we get to the point where all controller's behaviors
are consistent enough to implement unified hierarchy, the __DEVEL__
prefix will be dropped, and more importantly, unified-hierarchy will
enforce sane_behavior by default.  Maybe we'll able to completely drop
the crazy stuff after a while, maybe not, but we at least have a
strategy to move on to saner behaviors.

This patch introduces the mount option and changes the following
behaviors in cgroup core.

* Mount options "noprefix" and "clone_children" are disallowed.  Also,
  cgroupfs file cgroup.clone_children is not created.

* When mounting an existing superblock, mount options should match.
  This is currently pretty crazy.  If one mounts a cgroup, creates a
  subdirectory, unmounts it and then mount it again with different
  option, it looks like the new options are applied but they aren't.

* Remount is disallowed.

The behaviors changes are documented in the comment above
CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
controllers are converted and planned improvements progress.

v2: Dropped unnecessary explicit file permission setting sane_behavior
    cftype entry as suggested by Li Zefan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vivek Goyal <vgoyal@redhat.com>

873fe09e

move cgroupfs_root to include/linux/cgroup.h · 25a7e684

Tejun Heo authored Apr 14, 2013

While controllers shouldn't be accessing cgroupfs_root directly, it
being hidden inside kern/cgroup.c makes somethings pretty silly.  This
makes routing hierarchy-wide settings which need to be visible to
controllers cumbersome.

We're gonna add another hierarchy-wide setting which needs to be
accessed from controllers.  Move cgroupfs_root and its flags to the
header file so that we can access root settings with inline helpers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>

25a7e684

cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix · 93438629

Tejun Heo authored Apr 14, 2013

There's no reason to be using bitops, which tends to be more
cumbersome, to handle root flags.  Convert them to masks.  Also, as
they'll be moved to include/linux/cgroup.h and it's generally a good
idea, add CGRP_ prefix.

Note that flags are assigned from (1 << 1).  The first bit will be
used by a flag which will be added soon.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Li Zefan <lizefan@huawei.com>

93438629

14 Apr, 2013 1 commit

cgroup: make cgroup_path() not print double slashes · da1f296f

Tejun Heo authored Apr 14, 2013

While reimplementing cgroup_path(), 65dff759 ("cgroup: fix
cgroup_path() vs rename() race") introduced a bug where the path of a
non-root cgroup would have two slahses at the beginning, which is
caused by treating the root cgroup which has the name '/' like
non-root cgroups.

 $ grep systemd /proc/self/cgroup
 1:name=systemd://user/root/1

Fix it by special casing root cgroup case and not looping over it in
the normal path.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>

da1f296f

12 Apr, 2013 1 commit

Revert "cgroup: remove bind() method from cgroup_subsys." · 26d5bbe5

Tejun Heo authored Apr 12, 2013

This reverts commit 84cfb6ab.  There
are scheduled changes which make use of the removed callback.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rami Rosen <ramirose@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>

26d5bbe5

10 Apr, 2013 4 commits

perf: make perf_event cgroup hierarchical · ef824fa1

Tejun Heo authored Apr 08, 2013

perf_event is one of a couple remaining cgroup controllers with broken
hierarchy support.  Converting it to support hierarchy is almost
trivial.  The only thing necessary is to consider a task belonging to
a descendant cgroup as a match.  IOW, if the cgroup of the currently
executing task (@cpuctx->cgrp) equals or is a descendant of the
event's cgroup (@event->cgrp), then the event should be enabled.

Implement hierarchy support and remove .broken_hierarchy tag along
with the incorrect comment on what needs to be done for hierarchy
support.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>

ef824fa1

cgroup: implement cgroup_is_descendant() · 78574cf9

Li Zefan authored Apr 08, 2013

A couple controllers want to determine whether two cgroups are in
ancestor/descendant relationship.  As it's more likely that the
descendant is the primary subject of interest and there are other
operations focusing on the descendants, let's ask is_descendent rather
than is_ancestor.

Implementation is trivial as the previous patch guarantees that all
ancestors of a cgroup stay accessible as long as the cgroup is
accessible.

tj: Removed depth optimization, renamed from cgroup_is_ancestor(),
    rewrote descriptions.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

78574cf9

cgroup: make sure parent won't be destroyed before its children · 415cf07a

Li Zefan authored Apr 08, 2013

Suppose we rmdir a cgroup and there're still css refs, this cgroup won't
be freed. Then we rmdir the parent cgroup, and the parent is freed
immediately due to css ref draining to 0. Now it would be a disaster if
the still-alive child cgroup tries to access its parent.

Make sure this won't happen.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

415cf07a

cgroup: remove bind() method from cgroup_subsys. · 84cfb6ab

Rami Rosen authored Apr 10, 2013

The bind() method of cgroup_subsys is not used in any of the
controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio,
devices, perf, hugetlb, cpu and cpuacct)

tj: Removed the entry on ->bind() from
    Documentation/cgroups/cgroups.txt.  Also updated a couple
    paragraphs which were suggesting that dynamic re-binding may be
    implemented.  It's not gonna.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

84cfb6ab

08 Apr, 2013 1 commit

devcg: remove broken_hierarchy tag · 8adf12b0

Tejun Heo authored Apr 07, 2013

bd2953eb ("devcg: propagate local changes down the hierarchy")
implemented proper hierarchy support.  Remove the broken tag.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Aristeu Rozanski <aris@redhat.com>

8adf12b0

07 Apr, 2013 6 commits

cgroup: remove cgroup_lock_is_held() · 2219449a

Tejun Heo authored Apr 07, 2013

We don't want controllers to assume that the information is officially
available and do funky things with it.

The only user is task_subsys_state_check() which uses it to verify RCU
access context.  We can move cgroup_lock_is_held() inside
CONFIG_PROVE_RCU but that doesn't add meaningful protection compared
to conditionally exposing cgroup_mutex.

Remove cgroup_lock_is_held(), export cgroup_mutex iff CONFIG_PROVE_RCU
and use lockdep_is_held() directly on the mutex in
task_subsys_state_check().

While at it, add parentheses around macro arguments in
task_subsys_state_check().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

2219449a

cgroup: kill cgroup_[un]lock() · 47cfcd09

Tejun Heo authored Apr 07, 2013

Now that locking interface is unexported, there's no reason to keep
around these thin wrappers.  Kill them and use mutex operations
directly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

47cfcd09

cgroup: unexport locking interface and cgroup_attach_task() · b9777cf8

Tejun Heo authored Apr 07, 2013

Now that all external cgroup_lock() users are gone, we can finally
unexport the locking interface and prevent future abuse of
cgroup_mutex.

Make cgroup_[un]lock() and cgroup_lock_live_group() static.  Also,
cgroup_attach_task() doesn't have any user left and can't be used
without locking interface anyway.  Make it static too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

b9777cf8

cgroup: relocate cgroup_lock_live_group() and cgroup_attach_task_all() · 7ae1bad9

Tejun Heo authored Apr 07, 2013

cgroup_lock_live_group() and cgroup_attach_task() are scheduled to be
made static.  Relocate the former and cgroup_attach_task_all() so that
we don't need forward declarations.

This patch is pure relocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

7ae1bad9

cgroup, cpuset: replace move_member_tasks_to_cpuset() with cgroup_transfer_tasks() · 8cc99345

Tejun Heo authored Apr 07, 2013

When a cpuset becomes empty (no CPU or memory), its tasks are
transferred with the nearest ancestor with execution resources.  This
is implemented using cgroup_scan_tasks() with a callback which grabs
cgroup_mutex and invokes cgroup_attach_task() on each task.

Both cgroup_mutex and cgroup_attach_task() are scheduled to be
unexported.  Implement cgroup_transfer_tasks() in cgroup proper which
is essentially the same as move_member_tasks_to_cpuset() except that
it takes cgroups instead of cpusets and @to comes before @from like
normal functions with those arguments, and replace
move_member_tasks_to_cpuset() with it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>

8cc99345

memcg: fix memcg_cache_name() to use cgroup_name() · d9c10ddd

Michal Hocko authored Mar 28, 2013

As cgroup supports rename, it's unsafe to dereference dentry->d_name
without proper vfs locks. Fix this by using cgroup_name() rather than
dentry directly.

Also open code memcg_cache_name because it is called only from
kmem_cache_dup which frees the returned name right after
kmem_cache_create_memcg makes a copy of it. Such a short-lived
allocation doesn't make too much sense. So replace it by a static
buffer as kmem_cache_dup is called with memcg_cache_mutex.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

d9c10ddd

03 Apr, 2013 2 commits

cgroup: remove unused parameter in cgroup_task_migrate(). · 1e2ccd1c

Kevin Wilson authored Apr 01, 2013

This patch removes unused parameter from cgroup_task_migrate().
Signed-off-by: Kevin Wilson <wkevils@gmail.com>
Acked-by: Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

1e2ccd1c

cgroups: Documentation/cgroup/cgroup.txt - a trivial fix. · 1ae65ae9

Rami Rosen authored Apr 03, 2013

This trivial patch removes a word which appears twice in
Documentation/cgroup/cgroup.txt.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

1ae65ae9

20 Mar, 2013 6 commits

cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc() · 081aa458

Li Zefan authored Mar 13, 2013

These two functions share most of the code.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

081aa458

devcg: propagate local changes down the hierarchy · bd2953eb

Aristeu Rozanski authored Feb 15, 2013

This patch makes exception changes to propagate down in hierarchy respecting
when possible local exceptions.

New exceptions allowing additional access to devices won't be propagated, but
it'll be possible to add an exception to access all of part of the newly
allowed device(s).

New exceptions disallowing access to devices will be propagated down and the
local group's exceptions will be revalidated for the new situation.
Example:
      A
     / \
        B

    group        behavior          exceptions
    A            allow             "b 8:* rwm", "c 116:1 rw"
    B            deny              "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

If a new exception is added to group A:
	# echo "c 116:* r" > A/devices.deny
it'll propagate down and after revalidating B's local exceptions, the exception
"c 116:2 rwm" will be removed.

In case parent's exceptions change and local exceptions are not allowed anymore,
they'll be deleted.

v7:
- do not allow behavior change when the cgroup has children
- update documentation

v6: fixed issues pointed by Serge Hallyn
- only copy parent's exceptions while propagating behavior if the local
  behavior is different
- while propagating exceptions, do not clear and copy parent's: it'd be against
  the premise we don't propagate access to more devices

v5: fixed issues pointed by Serge Hallyn
- updated documentation
- not propagating when an exception is written to devices.allow
- when propagating a new behavior, clean the local exceptions list if they're
  for a different behavior

v4: fixed issues pointed by Tejun Heo
- separated function to walk the tree and collect valid propagation targets

v3: fixed issues pointed by Tejun Heo
- update documentation
- move css_online/css_offline changes to a new patch
- use cgroup_for_each_descendant_pre() instead of own descendant walk
- move exception_copy rework to a separared patch
- move exception_clean rework to a separated patch

v2: fixed issues pointed by Tejun Heo
- instead of keeping the local settings that won't apply anymore, remove them

Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

bd2953eb

devcg: use css_online and css_offline · 1909554c

Aristeu Rozanski authored Feb 15, 2013

Allocate resources and change behavior only when online. This is needed in
order to determine if a node is suitable for hierarchy propagation or if it's
being removed.

Locking:
Both functions take devcgroup_mutex to make changes to device_cgroup structure.
Hierarchy propagation will also take devcgroup_mutex before walking the
tree while walking the tree itself is protected by rcu lock.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

1909554c

devcg: prepare may_access() for hierarchy support · c39a2a30

Aristeu Rozanski authored Feb 15, 2013

Currently may_access() is only able to verify if an exception is valid for the
current cgroup, which has the same behavior. With hierarchy, it'll be also used
to verify if a cgroup local exception is valid towards its cgroup parent, which
might have different behavior.

v2:
- updated patch description
- rebased on top of a new patch to expand the may_access() logic to make it
  more clear
- fixed argument description order in may_access()
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

c39a2a30

devcg: expand may_access() logic · 26898fdf

Aristeu Rozanski authored Feb 15, 2013

In order to make the next patch more clear, expand may_access() logic.

v2: may_access() returns bool now
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

26898fdf

cgroup: fix an off-by-one bug which may trigger BUG_ON() · 3ac1707a