1. 15 Apr, 2013 5 commits
    • memcg: force use_hierarchy if sane_behavior · f00baae7
      Tejun Heo authored
      Turn on use_hierarchy by default if sane_behavior is specified and
      don't create .use_hierarchy file.
      
      It is debatable whether to remove the .use_hierarchy file or make it
      read-only, as the latter could make the transition easier in certain
      cases; however, the behavior changes which will be gated by
      sane_behavior are extensive, including changing the basic meaning of
      certain control knobs in a few controllers, and I don't really think
      keeping this piece would make things easier in any noticeable way, so
      let's remove it.
      
      v2: Explain that mem_cgroup_bind() doesn't have to worry about
          children as suggested by Michal Hocko.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    • cgroup: remove cgrp->top_cgroup · 05fb22ec
      Li Zefan authored
      It's not used, and it can be retrieved via cgrp->root->top_cgroup.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: introduce sane_behavior mount option · 873fe09e
      Tejun Heo authored
      It's a sad fact that at this point various cgroup controllers are
      carrying so many idiosyncrasies and pure insanities that it simply
      isn't possible to reach any sort of sane, consistent behavior while
      staying fully compatible with what has already been exposed to
      userland.
      
      As we can't break the exposed userland interface, transitioning to sane
      behaviors can only be done in steps while maintaining backwards
      compatibility.  This patch introduces a new mount option -
      __DEVEL__sane_behavior - which disables crazy features and enforces
      consistent behaviors in cgroup core proper and various controllers.
      As exactly which behaviors it changes is still being determined, the
      mount option, at this point, is useful only for development of the new
      behaviors.  As such, the mount option is prefixed with __DEVEL__ and
      generates a warning message when used.
      
      Eventually, once we get to the point where all controllers' behaviors
      are consistent enough to implement a unified hierarchy, the __DEVEL__
      prefix will be dropped, and more importantly, the unified hierarchy will
      enforce sane_behavior by default.  Maybe we'll be able to completely drop
      the crazy stuff after a while, maybe not, but we at least have a
      strategy to move on to saner behaviors.
      
      This patch introduces the mount option and changes the following
      behaviors in cgroup core.
      
      * Mount options "noprefix" and "clone_children" are disallowed.  Also,
        cgroupfs file cgroup.clone_children is not created.
      
      * When mounting an existing superblock, mount options should match.
        This is currently pretty crazy.  If one mounts a cgroup, creates a
        subdirectory, unmounts it and then mounts it again with different
        options, it looks like the new options are applied but they aren't.
      
      * Remount is disallowed.
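      A minimal userspace sketch of the three checks listed above. The
      struct example_opts type, its fields, and the chosen error codes
      (-EINVAL, -EBUSY) are assumptions for illustration, not the kernel's
      actual option parser or its return values.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical parsed mount options; not the kernel's structure. */
struct example_opts {
	bool sane_behavior;
	bool noprefix;
	bool clone_children;
	unsigned long subsys_mask;	/* which controllers to bind */
};

/* "noprefix" and "clone_children" are rejected under sane_behavior. */
static int example_check_opts(const struct example_opts *o)
{
	if (o->sane_behavior && (o->noprefix || o->clone_children))
		return -EINVAL;
	return 0;
}

/* Reusing an existing superblock: if either side asks for sane_behavior,
 * every option must match instead of being silently ignored. */
static int example_check_reuse(const struct example_opts *existing,
			       const struct example_opts *requested)
{
	if (!existing->sane_behavior && !requested->sane_behavior)
		return 0;	/* legacy: mismatches are silently ignored */
	if (existing->sane_behavior != requested->sane_behavior ||
	    existing->noprefix != requested->noprefix ||
	    existing->clone_children != requested->clone_children ||
	    existing->subsys_mask != requested->subsys_mask)
		return -EBUSY;
	return 0;
}

/* Remount is refused outright under sane_behavior. */
static int example_check_remount(const struct example_opts *existing)
{
	return existing->sane_behavior ? -EINVAL : 0;
}
```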
      
      The behavior changes are documented in the comment above the
      CGRP_ROOT_SANE_BEHAVIOR enum and will be expanded as different
      controllers are converted and planned improvements progress.
      
      v2: Dropped unnecessary explicit file permission setting from the
          sane_behavior cftype entry as suggested by Li Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vivek Goyal <vgoyal@redhat.com>
    • move cgroupfs_root to include/linux/cgroup.h · 25a7e684
      Tejun Heo authored
      While controllers shouldn't be accessing cgroupfs_root directly, it
      being hidden inside kernel/cgroup.c makes some things pretty silly.  This
      makes routing hierarchy-wide settings which need to be visible to
      controllers cumbersome.
      
      We're gonna add another hierarchy-wide setting which needs to be
      accessed from controllers.  Move cgroupfs_root and its flags to the
      header file so that we can access root settings with inline helpers.
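      A rough sketch of what moving the root structure into a header buys:
      a controller can test a hierarchy-wide flag with a static inline helper
      reached through cgrp->root. Every name below (example_root,
      example_cgroup, EXAMPLE_ROOT_*, example_sane_behavior()) is
      hypothetical and only mirrors the idea described in the commit.

```c
#include <stdbool.h>

/* Hypothetical header contents: root flags as masks plus the root struct. */
enum {
	EXAMPLE_ROOT_SANE_BEHAVIOR = (1 << 0),
	EXAMPLE_ROOT_NOPREFIX      = (1 << 1),
};

struct example_root {
	unsigned long flags;		/* hierarchy-wide settings */
};

struct example_cgroup {
	struct example_root *root;	/* hierarchy this cgroup belongs to */
	struct example_cgroup *parent;
};

/* Inline helper a controller could call without touching core internals. */
static inline bool example_sane_behavior(const struct example_cgroup *cgrp)
{
	return cgrp->root->flags & EXAMPLE_ROOT_SANE_BEHAVIOR;
}
```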
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix · 93438629
      Tejun Heo authored
      There's no reason to be using bitops, which tend to be more
      cumbersome, to handle root flags.  Convert them to masks.  Also, as
      they'll be moved to include/linux/cgroup.h and it's generally a good
      idea, add a CGRP_ prefix.
      
      Note that flags are assigned from (1 << 1).  The first bit will be
      used by a flag which will be added soon.
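      A small sketch of the mask style, using hypothetical EX_ROOT_* flag
      names rather than the kernel's. Values start at (1 << 1) as described
      above, leaving bit 0 free, and plain bitwise operators take the place
      of set_bit()/test_bit(), which also lets several flags be tested or
      changed in one expression.

```c
#include <stdbool.h>

/* Hypothetical mask-style flags; bit 0 is left unused on purpose. */
enum {
	EX_ROOT_NOPREFIX = (1 << 1),
	EX_ROOT_XATTR    = (1 << 2),
};

static unsigned long ex_set(unsigned long flags, unsigned long mask)
{
	return flags | mask;
}

static unsigned long ex_clear(unsigned long flags, unsigned long mask)
{
	return flags & ~mask;
}

/* True only if every bit in mask is set in flags. */
static bool ex_has_all(unsigned long flags, unsigned long mask)
{
	return (flags & mask) == mask;
}
```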
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
  2. 14 Apr, 2013 1 commit
    • cgroup: make cgroup_path() not print double slashes · da1f296f
      Tejun Heo authored
      While reimplementing cgroup_path(), 65dff759 ("cgroup: fix
      cgroup_path() vs rename() race") introduced a bug where the path of a
      non-root cgroup would have two slashes at the beginning, which is
      caused by treating the root cgroup, which has the name '/', like
      non-root cgroups.
      
       $ grep systemd /proc/self/cgroup
       1:name=systemd://user/root/1
      
      Fix it by special-casing the root cgroup and not looping over it in
      the normal path.
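      A simplified sketch of the fixed path construction, assuming a
      hypothetical struct ex_cgroup with only a name and a parent pointer.
      The path is assembled from the leaf upward at the end of the buffer;
      the root is handled separately and never visited by the loop, so its
      '/' name is never joined with another separator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified cgroup node; the root's name is "/". */
struct ex_cgroup {
	const char *name;
	struct ex_cgroup *parent;	/* NULL for the root */
};

/* Returns 0 on success, -1 if buf is too small. */
static int ex_cgroup_path(const struct ex_cgroup *cgrp, char *buf, size_t buflen)
{
	char *start;

	if (buflen < 2)
		return -1;
	if (!cgrp->parent) {		/* root cgroup: the path is just "/" */
		strcpy(buf, "/");
		return 0;
	}

	start = buf + buflen - 1;
	*start = '\0';
	/* Walk up to, but not including, the root. */
	for (; cgrp->parent; cgrp = cgrp->parent) {
		size_t len = strlen(cgrp->name);

		if ((size_t)(start - buf) < len + 1)
			return -1;
		start -= len;
		memcpy(start, cgrp->name, len);
		*--start = '/';
	}
	/* Move the finished string (including '\0') to the front of buf. */
	memmove(buf, start, buflen - (size_t)(start - buf));
	return 0;
}
```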
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
  3. 12 Apr, 2013 1 commit
  4. 10 Apr, 2013 4 commits
    • perf: make perf_event cgroup hierarchical · ef824fa1
      Tejun Heo authored
      perf_event is one of a couple remaining cgroup controllers with broken
      hierarchy support.  Converting it to support hierarchy is almost
      trivial.  The only thing necessary is to consider a task belonging to
      a descendant cgroup as a match.  IOW, if the cgroup of the currently
      executing task (@cpuctx->cgrp) equals or is a descendant of the
      event's cgroup (@event->cgrp), then the event should be enabled.
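      The matching rule above, sketched with a hypothetical struct ex_cgroup
      that has only a parent pointer. cur_cgrp stands in for @cpuctx->cgrp
      and event_cgrp for @event->cgrp; the real perf code is not reproduced
      here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node. */
struct ex_cgroup {
	struct ex_cgroup *parent;	/* NULL for the root */
};

/* True if cgrp is ancestor itself or lies anywhere below it. */
static bool ex_is_descendant(const struct ex_cgroup *cgrp,
			     const struct ex_cgroup *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return true;
	return false;
}

/* Hierarchical match; the old flat rule was cur_cgrp == event_cgrp. */
static bool ex_event_matches(const struct ex_cgroup *cur_cgrp,
			     const struct ex_cgroup *event_cgrp)
{
	return ex_is_descendant(cur_cgrp, event_cgrp);
}
```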
      
      Implement hierarchy support and remove .broken_hierarchy tag along
      with the incorrect comment on what needs to be done for hierarchy
      support.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
    • cgroup: implement cgroup_is_descendant() · 78574cf9
      Li Zefan authored
      A couple of controllers want to determine whether two cgroups are in an
      ancestor/descendant relationship.  As it's more likely that the
      descendant is the primary subject of interest and there are other
      operations focusing on the descendants, let's ask is_descendant rather
      than is_ancestor.
      
      Implementation is trivial as the previous patch guarantees that all
      ancestors of a cgroup stay accessible as long as the cgroup is
      accessible.
      
      tj: Removed depth optimization, renamed from cgroup_is_ancestor(),
          rewrote descriptions.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: make sure parent won't be destroyed before its children · 415cf07a
      Li Zefan authored
      Suppose we rmdir a cgroup while there are still css refs; this cgroup
      won't be freed. Then we rmdir the parent cgroup, and the parent is freed
      immediately because its css refs drain to 0. Now it would be a disaster
      if the still-alive child cgroup tries to access its parent.
      
      Make sure this won't happen.
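      One way to provide this guarantee, sketched with a hypothetical
      refcounted struct ex_cgroup: each child holds a reference on its parent
      and drops it only after the child itself has been freed, so rmdir() on
      the parent cannot free it while a child is still alive. refcnt stands
      in for the css refs, and the ex_nr_freed counter exists only to make
      the behavior observable.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified refcounted cgroup. */
struct ex_cgroup {
	int refcnt;
	bool removed;			/* rmdir() has been called */
	struct ex_cgroup *parent;
};

static int ex_nr_freed;			/* instrumentation for the example only */

static struct ex_cgroup *ex_create(struct ex_cgroup *parent)
{
	struct ex_cgroup *cgrp = calloc(1, sizeof(*cgrp));

	if (!cgrp)
		return NULL;
	cgrp->refcnt = 1;		/* base reference, dropped by rmdir */
	cgrp->parent = parent;
	if (parent)
		parent->refcnt++;	/* the child pins its parent */
	return cgrp;
}

static void ex_put(struct ex_cgroup *cgrp)
{
	struct ex_cgroup *parent;

	if (--cgrp->refcnt > 0)
		return;
	parent = cgrp->parent;
	free(cgrp);
	ex_nr_freed++;
	if (parent)
		ex_put(parent);		/* release the pin only after freeing */
}

static void ex_rmdir(struct ex_cgroup *cgrp)
{
	cgrp->removed = true;
	ex_put(cgrp);
}
```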
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: remove bind() method from cgroup_subsys. · 84cfb6ab
      Rami Rosen authored
      The bind() method of cgroup_subsys is not used in any of the
      controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio,
      devices, perf, hugetlb, cpu and cpuacct).
      
      tj: Removed the entry on ->bind() from
          Documentation/cgroups/cgroups.txt.  Also updated a couple
          paragraphs which were suggesting that dynamic re-binding may be
          implemented.  It's not gonna.
      Signed-off-by: Rami Rosen <ramirose@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  5. 08 Apr, 2013 1 commit
  6. 07 Apr, 2013 6 commits
  7. 03 Apr, 2013 2 commits
  8. 20 Mar, 2013 6 commits
    • cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc() · 081aa458
      Li Zefan authored
      These two functions share most of the code.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • devcg: propagate local changes down the hierarchy · bd2953eb
      Aristeu Rozanski authored
      This patch makes exception changes propagate down the hierarchy,
      respecting local exceptions when possible.
      
      New exceptions allowing additional access to devices won't be propagated, but
      it'll be possible to add an exception to access all or part of the newly
      allowed device(s).
      
      New exceptions disallowing access to devices will be propagated down and the
      local group's exceptions will be revalidated for the new situation.
      Example:
            A
           / \
              B
      
          group        behavior          exceptions
          A            allow             "b 8:* rwm", "c 116:1 rw"
          B            deny              "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"
      
      If a new exception is added to group A:
      	# echo "c 116:* r" > A/devices.deny
      it'll propagate down and after revalidating B's local exceptions, the exception
      "c 116:2 rwm" will be removed.
      
      In case parent's exceptions change and local exceptions are not allowed anymore,
      they'll be deleted.
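      A simplified sketch of the revalidation step for the example above,
      assuming a parent whose default behavior is "allow" (so its exceptions
      are denials) and a child whose exceptions are allowances. The
      ex_exception layout, the EX_ACC_* bits and the overlap rule are
      assumptions; the real device cgroup code also handles type 'a' and
      other behavior combinations.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_ACC_R 1u
#define EX_ACC_W 2u
#define EX_ACC_M 4u
#define EX_ACC_RWM (EX_ACC_R | EX_ACC_W | EX_ACC_M)
#define EX_ANY (-1)			/* wildcard major or minor ("*") */

struct ex_exception {
	char type;			/* 'b' (block) or 'c' (char) */
	int major;
	int minor;
	unsigned access;		/* EX_ACC_* bits */
};

/* Do the two exceptions name at least one common device? */
static bool ex_overlaps(const struct ex_exception *x, const struct ex_exception *y)
{
	return x->type == y->type &&
	       (x->major == EX_ANY || y->major == EX_ANY || x->major == y->major) &&
	       (x->minor == EX_ANY || y->minor == EX_ANY || x->minor == y->minor);
}

/* A child allowance stays valid only if no parent denial overlaps it in
 * both device range and access bits. */
static bool ex_parent_allows(const struct ex_exception *parent_deny, size_t np,
			     const struct ex_exception *ex)
{
	for (size_t i = 0; i < np; i++)
		if (ex_overlaps(&parent_deny[i], ex) &&
		    (parent_deny[i].access & ex->access))
			return false;
	return true;
}

/* Drop child exceptions the parent no longer permits; compacts the array
 * in place and returns the new count. */
static size_t ex_revalidate(const struct ex_exception *parent_deny, size_t np,
			    struct ex_exception *child, size_t nc)
{
	size_t kept = 0;

	for (size_t i = 0; i < nc; i++)
		if (ex_parent_allows(parent_deny, np, &child[i]))
			child[kept++] = child[i];
	return kept;
}
```

      With A's three denials (including the new "c 116:* r"), ex_revalidate()
      keeps "c 1:3 rwm" and "b 3:* rwm" in B and drops "c 116:2 rwm".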
      
      v7:
      - do not allow behavior change when the cgroup has children
      - update documentation
      
      v6: fixed issues pointed out by Serge Hallyn
      - only copy parent's exceptions while propagating behavior if the local
        behavior is different
      - while propagating exceptions, do not clear and copy parent's: it'd be against
        the premise we don't propagate access to more devices
      
      v5: fixed issues pointed out by Serge Hallyn
      - updated documentation
      - not propagating when an exception is written to devices.allow
      - when propagating a new behavior, clean the local exceptions list if they're
        for a different behavior
      
      v4: fixed issues pointed out by Tejun Heo
      - separated function to walk the tree and collect valid propagation targets
      
      v3: fixed issues pointed out by Tejun Heo
      - update documentation
      - move css_online/css_offline changes to a new patch
      - use cgroup_for_each_descendant_pre() instead of own descendant walk
      - move exception_copy rework to a separated patch
      - move exception_clean rework to a separated patch
      
      v2: fixed issues pointed out by Tejun Heo
      - instead of keeping the local settings that won't apply anymore, remove them
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • devcg: use css_online and css_offline · 1909554c
      Aristeu Rozanski authored
      Allocate resources and change behavior only when online. This is needed in
      order to determine if a node is suitable for hierarchy propagation or if it's
      being removed.
      
      Locking:
      Both functions take devcgroup_mutex to make changes to the device_cgroup
      structure. Hierarchy propagation will also take devcgroup_mutex before
      walking the tree, while the walk itself is protected by the RCU read lock.
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • devcg: prepare may_access() for hierarchy support · c39a2a30
      Aristeu Rozanski authored
      Currently may_access() is only able to verify if an exception is valid for the
      current cgroup, which has the same behavior. With hierarchy, it'll also be used
      to verify if a cgroup local exception is valid towards its cgroup parent, which
      might have different behavior.
      
      v2:
      - updated patch description
      - rebased on top of a new patch to expand the may_access() logic to make it
        more clear
      - fixed argument description order in may_access()
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • devcg: expand may_access() logic · 26898fdf
      Aristeu Rozanski authored
      In order to make the next patch more clear, expand may_access() logic.
      
      v2: may_access() returns bool now
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Aristeu Rozanski <aris@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix an off-by-one bug which may trigger BUG_ON() · 3ac1707a
      Li Zefan authored
      The 3rd parameter of flex_array_prealloc() is the number of elements,
      not the index of the last element.
      
      The effect of the bug is that, when opening cgroup.procs, a flex array
      will be allocated and all elements of the array are allocated with the
      GFP_KERNEL flag except the last one, which is allocated with GFP_ATOMIC;
      if we fail to allocate memory for it, it'll trigger a BUG_ON().
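      A toy illustration of the off-by-one, not the flex_array API itself:
      the hypothetical ex_prealloc() takes a start index and a count, which
      is how the commit describes flex_array_prealloc()'s third parameter.
      Passing the last index (length - 1) instead of the count leaves the
      final slot without preallocated memory.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_NELEM 8

/* Toy array: "allocated" records which slots have backing memory. */
struct ex_array {
	bool allocated[EX_NELEM];
};

/* The third argument is a COUNT of elements starting at 'start',
 * not the index of the last element. */
static int ex_prealloc(struct ex_array *a, size_t start, size_t nr_elements)
{
	if (start + nr_elements > EX_NELEM)
		return -1;
	for (size_t i = start; i < start + nr_elements; i++)
		a->allocated[i] = true;
	return 0;
}
```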
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
  9. 12 Mar, 2013 6 commits
  10. 05 Mar, 2013 3 commits
  11. 04 Mar, 2013 3 commits
    • cgroup: no need to check css refs for release notification · f50daa70
      Li Zefan authored
      We no longer fail rmdir() when there are still css refs, so we don't
      need to check css refs in check_for_release().
      
      This also avoids a bug. cgroup_has_css_refs() accesses subsys[i]
      without cgroup_mutex, so it can race with cgroup_unload_subsys().
      
      cgroup_has_css_refs()
      ...
        if (ss == NULL || ss->root != cgrp->root)
      
      If ss points to net_cls_subsys and the cls_cgroup module is unloaded
      right after the former check but before the latter, the memory where
      net_cls_subsys resides has become invalid.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: use cgroup_name() in cpuset_print_task_mems_allowed() · f440d98f
      Li Zefan authored
      Use cgroup_name() instead of cgrp->dentry->d_name. This makes the code
      a bit simpler.
      
      While at it, remove cpuset_name and make cpuset_nodelist a local variable
      to cpuset_print_task_mems_allowed().
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: fix cgroup_path() vs rename() race · 65dff759
      Li Zefan authored
      rename() will change dentry->d_name. The result of this race can
      be worse than seeing a partially rewritten name: we might access
      a stale pointer, because rename() will re-allocate memory to hold
      a longer name.
      
      As accessing dentry->d_name must be protected by dentry->d_lock or
      the parent inode's i_mutex, while on the other hand cgroup_path() can
      be called with some irq-safe spinlocks held, we can't generate the
      cgroup path using dentry->d_name.
      
      Alternatively we make a copy of dentry->d_name and save it in
      cgrp->name when a cgroup is created, and update cgrp->name at
      rename().
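      A userspace sketch of keeping a private copy of the name, assuming
      hypothetical ex_cgroup_name/ex_cgroup types. The name lives in a
      flexible array member (the v5 change), and rename() publishes a fresh
      copy before releasing the old one. A kernel version would use
      rcu_assign_pointer() and kfree_rcu() so readers never see freed memory;
      this sketch just calls free() and is only safe without concurrent
      readers.

```c
#include <stdlib.h>
#include <string.h>

/* Name stored inline after the header via a flexible array member. */
struct ex_cgroup_name {
	size_t len;
	char name[];
};

struct ex_cgroup {
	struct ex_cgroup_name *name;
};

static struct ex_cgroup_name *ex_alloc_name(const char *s)
{
	size_t len = strlen(s);
	struct ex_cgroup_name *n = malloc(sizeof(*n) + len + 1);

	if (!n)
		return NULL;
	n->len = len;
	memcpy(n->name, s, len + 1);	/* copy including the '\0' */
	return n;
}

/* rename(): publish the new copy, then release the old one. */
static int ex_rename(struct ex_cgroup *cgrp, const char *new_name)
{
	struct ex_cgroup_name *n = ex_alloc_name(new_name), *old;

	if (!n)
		return -1;
	old = cgrp->name;
	cgrp->name = n;		/* kernel analog: rcu_assign_pointer() */
	free(old);		/* kernel analog: kfree_rcu() */
	return 0;
}
```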
      
      v5: use flexible array instead of zero-size array.
      v4: - allocate root_cgroup_name and make every root_cgroup->name point to it.
          - add cgroup_name() wrapper.
      v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
      v2: make cgrp->name RCU safe.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  12. 03 Mar, 2013 2 commits