1. 24 Mar, 2014 1 commit
  2. 20 Mar, 2014 1 commit
    • cgroup: break kernfs active_ref protection in cgroup directory operations · e1b2dc17
      Tejun Heo authored
      cgroup_tree_mutex should nest above the kernfs active_ref protection;
      however, cgroup_create() and cgroup_rename() were grabbing
      cgroup_tree_mutex while under kernfs active_ref protection.  This can
      actually lead to deadlocks in case these operations race
      against cgroup_rmdir() which invokes kernfs_remove() on directory
      kernfs_node while holding cgroup_tree_mutex.
      
      Neither cgroup_create() nor cgroup_rename() requires active_ref
      protection.  The former already has enough synchronization through
      cgroup_lock_live_group() and the latter doesn't care, so this can be
      fixed by updating both functions to break all active_ref protections
      before grabbing cgroup_tree_mutex.
      
      While this patch fixes the immediate issue, it probably needs further
      work in the long term - kernfs directories should enable lockdep
      annotations and maybe the better way to handle this is marking
      directory nodes as not needing active_ref protection rather than
      breaking it in each operation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 19 Mar, 2014 11 commits
    • cgroup: fix cgroup_taskset walking order · 1b9aba49
      Tejun Heo authored
      cgroup_taskset is used to track and iterate target tasks while
      migrating a task or process and should guarantee that the first task
      iterated is the task group leader if a process is being migrated.
      
      b3dc094e ("cgroup: use css_set->mg_tasks to track target tasks
      during migration") replaced flex array cgroup_taskset->tc_array with
      css_set->mg_tasks list to remove process size limit and dynamic
      allocation during migration; unfortunately, it incorrectly used list
      operations which don't preserve order, breaking the guarantee that
      cgroup_taskset_first() returns the leader for a process target.
      
      Fix it by using order preserving list operations.  Note that as
      multiple src_csets may map to a single dst_cset, the iteration order
      may change across cgroup_task_migrate(); however, the leader is still
      guaranteed to be the first entry.
      
      The switch to list_splice_tail_init() at the end of cgroup_migrate()
      isn't strictly necessary.  Let's still do it for consistency.
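
      As a rough illustration of why the choice of list operation matters,
      the userspace sketch below contrasts prepending entries to a
      destination list with appending them.  The list helpers are
      simplified stand-ins modeled on the kernel's list API; struct task,
      move_all() and first_tid_after_move() are hypothetical names used
      only for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins modeled on the kernel's list helpers. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_insert(struct list_head *n, struct list_head *prev,
			struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

static void list_unlink(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical task with a thread id; the leader is queued first. */
struct task { int tid; struct list_head node; };

/* Drain @src into @dst, either appending (order kept) or prepending. */
static void move_all(struct list_head *src, struct list_head *dst, int keep_order)
{
	while (src->next != src) {
		struct list_head *n = src->next;

		list_unlink(n);
		if (keep_order)
			list_insert(n, dst->prev, dst);	/* like list_move_tail() */
		else
			list_insert(n, dst, dst->next);	/* like list_move() */
	}
}

/* Queue tids 100 (leader), 101, 102, migrate them, report the first tid. */
static int first_tid_after_move(int keep_order)
{
	struct task tasks[3] = { { .tid = 100 }, { .tid = 101 }, { .tid = 102 } };
	struct list_head src, dst;

	list_init(&src);
	list_init(&dst);
	for (int i = 0; i < 3; i++)
		list_insert(&tasks[i].node, src.prev, &src);
	move_all(&src, &dst, keep_order);
	assert(dst.next != &dst);
	return ((struct task *)((char *)dst.next - offsetof(struct task, node)))->tid;
}
```

      With appending, the leader stays at the head of the destination
      list; with prepending, the last task queued ends up first.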
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: implement CFTYPE_ONLY_ON_DFL · 8cbbf2c9
      Tejun Heo authored
      This cftype flag makes the file only appear on the default hierarchy.
      This will later be used for the cgroup.controllers file.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: make cgrp_dfl_root mountable · a2dd4247
      Tejun Heo authored
      cgrp_dfl_root will be used as the default unified hierarchy.  This
      patch makes cgrp_dfl_root mountable by making the following changes.
      
      * cgroup_init_early() now initializes cgrp_dfl_root w/
        CGRP_ROOT_SANE_BEHAVIOR.  The default hierarchy is always sane.
      
      * parse_cgroupfs_options() and cgroup_mount() are updated such that
        cgrp_dfl_root is mounted if sane_behavior is specified w/o any
        subsystems.
      
      * rebind_subsystems() now populates the root directory of
        cgrp_dfl_root.  Note that the function still guarantees success of
        rebinding subsystems to cgrp_dfl_root.  If populating fails while
        rebinding to cgrp_dfl_root, it whines but ignores the error.
      
      * For backward compatibility, the default hierarchy shows up in
        /proc/$PID/cgroup only after it's explicitly mounted so that
        userland which doesn't make use of it doesn't see any change.
      
      * "current_css_set_cg_links" file of debug cgroup now treats the
        default hierarchy the same as other hierarchies.  This is visible to
        userland.  Given that it's for debug controller, this should be
        fine.
      
      * While at it, implement cgroup_on_dfl() which tests whether a given
        cgroup is on the default hierarchy or not.
      
      The above changes make cgrp_dfl_root mostly equivalent to other
      controllers but the actual unified hierarchy behaviors are not
      implemented yet.  Let's plug child cgroup creation in cgrp_dfl_root
      from create_cgroup() for now.
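
      The cgroup_on_dfl() test mentioned above can be pictured as a plain
      root-pointer comparison.  This is only a sketch under the assumption
      that each cgroup records the root it belongs to; the struct layouts
      are reduced to the fields needed here and are not the kernel's
      definitions.

```c
#include <assert.h>
#include <stdbool.h>

struct cgroup_root;

/* Reduced stand-ins: only the fields this sketch needs. */
struct cgroup { struct cgroup_root *root; };
struct cgroup_root { struct cgroup cgrp; };

/* The default hierarchy's root, named as in the commit message. */
static struct cgroup_root cgrp_dfl_root;

/* Assumed shape: a cgroup is on the default hierarchy iff its root is it. */
static bool cgroup_on_dfl(const struct cgroup *cgrp)
{
	return cgrp->root == &cgrp_dfl_root;
}
```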
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: drop const from @buffer of cftype->write_string() · 4d3bb511
      Tejun Heo authored
      cftype->write_string() just passes on the writeable buffer from kernfs
      and there's no reason to add const restriction on the buffer.  The
      only thing const achieves is unnecessarily complicating parsing of the
      buffer.  Drop const from @buffer.
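
      To illustrate how a writable buffer simplifies parsing, the sketch
      below splits a "key value" string in place by overwriting the
      separator.  split_key_value() is a hypothetical helper written for
      this example, not a cgroup function; with a const buffer the same
      job would need a copy.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "key value" in place: the separator is overwritten with NUL so
 * that @buffer becomes the key, and a pointer to the value is returned.
 * Returns NULL if there is no separator.
 */
static char *split_key_value(char *buffer)
{
	char *sep = strchr(buffer, ' ');

	if (!sep)
		return NULL;
	*sep = '\0';	/* only possible because @buffer is not const */
	return sep + 1;
}
```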
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    • cgroup: rename cgroup_dummy_root and related names · 3dd06ffa
      Tejun Heo authored
      The dummy root will be repurposed to serve as the default unified
      hierarchy.  Let's rename things in preparation.
      
      * s/cgroup_dummy_root/cgrp_dfl_root/
      * s/cgroupfs_root/cgroup_root/ as we don't do fs part directly anymore
      * s/cgroup_root->top_cgroup/cgroup_root->cgrp/ for brevity
      
      This is a pure rename.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: move ->subsys_mask from cgroupfs_root to cgroup · 94419627
      Tejun Heo authored
      cgroupfs_root->subsys_mask represents the controllers attached to the
      hierarchy.  This patch moves the field to cgroup.  Subsystem
      initialization and rebinding updates the top cgroup's subsys_mask.
      For !root cgroups, the subsys_mask bits are set from create_css() and
      cleared from kill_css(), which effectively means that all cgroups will
      have the same subsys_mask as the top cgroup.
      
      While this doesn't make any difference now, this will help
      implementation of the default unified hierarchy where !root cgroups
      may have subsets of the top_cgroup's subsys_mask.
      
      While at it, __kill_css() is split out of kill_css().  The former
      doesn't care about the subsys_mask while the latter becomes noop if
      the controller is already killed and clears the matching bit if not
      before proceeding to killing the css.  This will be used later by the
      default unified hierarchy implementation.
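
      The split described above can be sketched as follows.  The struct
      and both function names are stand-ins (model_kill_css_raw() plays
      the role of __kill_css(), model_kill_css() that of kill_css()), and
      the nr_killed counter exists only so the example can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for a cgroup: one bit per enabled controller. */
struct cgroup {
	unsigned long subsys_mask;
	int nr_killed;		/* example-only bookkeeping */
};

/* Plays the role of __kill_css(): kills without consulting the mask. */
static void model_kill_css_raw(struct cgroup *cgrp, int ssid)
{
	(void)ssid;
	cgrp->nr_killed++;
}

/*
 * Plays the role of kill_css(): a no-op if the controller's bit is
 * already clear; otherwise clears the bit and then kills the css.
 * Returns true if a kill was actually performed.
 */
static bool model_kill_css(struct cgroup *cgrp, int ssid)
{
	unsigned long bit = 1UL << ssid;

	if (!(cgrp->subsys_mask & bit))
		return false;
	cgrp->subsys_mask &= ~bit;
	model_kill_css_raw(cgrp, ssid);
	return true;
}
```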
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding · 5df36032
      Tejun Heo authored
      Currently, while rebinding, cgroup_dummy_root serves as the anchor
      point.  In addition to the target root, rebind_subsystems() takes
      @added_mask and @removed_mask.  The subsystems specified in the former
      are expected to be on the dummy root and then moved to the target
      root.  The ones in the latter are moved from non-dummy root to dummy.
      Now that the dummy root is a fully functional one and we're planning
      to use it for the default unified hierarchy, this level of distinction
      between dummy and non-dummy roots is quite awkward.
      
      This patch updates rebind_subsystems() to take the target root and one
      subsystem mask and move the specified subsystems to the target root
      which may or may not be the dummy root.  IOW, unbinding now becomes
      moving the subsystems to the dummy root and binding to non-dummy root.
      This makes the dummy root mostly equivalent to other hierarchies in
      terms of the mechanism of moving subsystems around; however, we still
      retain all the semantical restrictions so that this patch doesn't
      introduce any visible behavior differences.  Another noteworthy detail
      is that rebind_subsystems() guarantees that moving a subsystem to the
      dummy root never fails so that valid unmounting attempts always
      succeed.
      
      This unifies binding and unbinding of subsystems.  The invocation
      points of ->bind() were inconsistent between the two and are now moved
      after the whole rebinding is complete.  This doesn't break the current
      users and generally makes more sense.
      
      All rebind_subsystems() users are converted accordingly.  Note that
      cgroup_remount() now makes two calls to rebind_subsystems() to bind
      and then unbind the requested subsystems.
      
      This will allow repurposing of the dummy hierarchy as the default
      unified hierarchy and shouldn't make any userland visible behavior
      difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}() · fdce6bf8
      Tejun Heo authored
      The dummy hierarchy is now a fully functional one and dummy_top has a
      kernfs_node associated with it.  Drop the NULL checks in
      [pr_cont_]cgroup_{name|path}() which are no longer necessary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root · 985ed670
      Tejun Heo authored
      cgroup_dummy_root is used to host controllers which aren't attached to
      any other hierarchy.  The root is minimally set up during kernfs
      bootstrap and didn't go through full hierarchy initialization.  We're
      planning to use cgroup_dummy_root for the default unified hierarchy
      and thus want it to be fully functional.
      
      Replace the special initialization, which was collected into
      cgroup_init() by the previous patch, with an invocation of
      cgroup_setup_root().  This simplifies the init path and makes
      cgroup_dummy_root a full hierarchy with its own kernfs_root and all.
      
      As this puts the dummy hierarchy on the cgroup_roots list, rename
      for_each_active_root() to for_each_root() and update its users to skip
      the dummy root for now.
      
      This patch doesn't cause any userland visible behavior changes at this
      point.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: reorganize cgroup bootstrapping · 172a2c06
      Tejun Heo authored
      * Fields of init_css_set and css_set_count are now set using
        initializers instead of programmatically from cgroup_init_early().
      
      * init_cgroup_root() now also takes @opts and performs the optional
        part of initialization too.  The leftover part of
        cgroup_root_from_opts() is collapsed into its only caller -
        cgroup_mount().
      
      * Initialization of cgroup_root_count and linking of init_css_set are
        moved from cgroup_init_early() to cgroup_init().  None of the
        early_init users depends on init_css_set being linked.
      
      * Subsystem initializations are moved after dummy hierarchy init and
        init_css_set linking.
      
      These changes reorganize the bootstrap logic so that the dummy
      hierarchy can share the usual hierarchy init path and be made more
      normal.  These changes don't make noticeable behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: relocate setting of CGRP_DEAD · 5d77381f
      Tejun Heo authored
      In cgroup_destroy_locked(), move setting of CGRP_DEAD above
      invocations of kill_css().  This doesn't make any visible behavior
      difference now but will be used to inhibit manipulating controller
      enable states of a dying cgroup on the unified hierarchy.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
  4. 03 Mar, 2014 1 commit
  5. 25 Feb, 2014 9 commits
    • cgroup_freezer: document freezer_fork() subtleties · a60bed29
      Tejun Heo authored
      cgroup_subsys->fork() callback is special in that it's called outside
      the usual cgroup locking and may race with on-going migration.
      freezer_fork() currently doesn't consider such race condition;
      however, it is still correct thanks to the fact that freeze_task() may
      be called spuriously.
      
      This is quite subtle.  Let's explain what's going on and add a test to
      detect racing and losing to task migration and skip freeze_task() in
      such cases for documentation.
      
      This doesn't make any behavior difference meaningful to userland.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
    • cgroup: update cgroup_transfer_tasks() to either succeed or fail · 952aaa12
      Tejun Heo authored
      cgroup_transfer_tasks() can currently fail in the middle due to memory
      allocation failure.  When that happens, the function just aborts and
      returns an error code, and there's no way to tell how many tasks
      actually got migrated at the point of failure or to revert the partial
      migration.
      
      Update it to use cgroup_migrate{_add_src|prepare_dst|migrate|finish}()
      so that the function either succeeds or fails as a whole as long as
      ->can_attach() doesn't fail.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: drop task_lock() protection around task->cgroups · 0e1d768f
      Tejun Heo authored
      For optimization, task_lock() is additionally used to protect
      task->cgroups.  The optimization is pretty dubious as either
      css_set_rwsem is grabbed anyway or PF_EXITING already protects
      task->cgroups.  It adds only overhead and confusion at this point.
      Let's drop task_[un]lock() and update comments accordingly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: update how a newly forked task gets associated with css_set · eaf797ab
      Tejun Heo authored
      When a new process is forked, cgroup_fork() associates it with the
      css_set of its parent but doesn't link it into it.  After the new
      process is linked to tasklist, cgroup_post_fork() does the linking.
      
      This is problematic for cgroup_transfer_tasks() as there's no way to
      tell whether there are tasks which are pointing to a css_set but not
      linked yet.  It is impossible to implement an operation which transfers
      all tasks of a cgroup to another and the current
      cgroup_transfer_tasks() can easily be tricked into leaving a newly
      forked process behind if it gets called between cgroup_fork() and
      cgroup_post_fork().
      
      Let's make association with a css_set and linking atomic by moving it
      to cgroup_post_fork().  cgroup_fork() sets child->cgroups to
      init_css_set as a placeholder and cgroup_post_fork() is updated to
      perform both the association with the parent's cgroup and linking
      there.  This means that a newly created task will point to
      init_css_set without holding a ref to it much like what it does on the
      exit path.  Empty cg_list is used to indicate that the task isn't
      holding a ref to the associated css_set.
      
      This fixes an actual bug with cgroup_transfer_tasks(); however, I'm
      not marking it for -stable.  The whole thing is broken in multiple
      other ways which require invasive updates to fix and I don't think
      it's worthwhile to bother with backporting this particular one.
      Fortunately, the only user is cpuset and these bugs don't crash the
      machine.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: split process / task migration into four steps · 1958d2d5
      Tejun Heo authored
      Currently, process / task migration is a single operation which may
      fail depending on memory pressure or the involved controllers'
      ->can_attach() callbacks.  One problem with this approach is migration
      of multiple targets.  It's impossible to tell whether a given target
      will be successfully migrated beforehand and cgroup core can't keep
      track of enough states to roll back after intermediate failure.
      
      This is already an issue with cgroup_transfer_tasks().  Also, we're
      gonna need multiple target migration for unified hierarchy.
      
      This patch splits migration into four stages -
      cgroup_migrate_add_src(), cgroup_migrate_prepare_dst(),
      cgroup_migrate() and cgroup_migrate_finish(), where
      cgroup_migrate_prepare_dst() performs all the operations which may
      fail due to allocation failure without actually migrating the target.
      
      The four separate stages mean that, disregarding ->can_attach()
      failures, the success or failure of multi target migration can be
      determined before performing any actual migration.  If preparations of
      all targets succeed, the whole thing will succeed.  If not, the whole
      operation can fail without any side-effect.
      
      Since the previous patch to use css_set->mg_tasks to keep track of
      migration targets, the only thing which may need memory allocation
      during migration is the target css_sets.  cgroup_migrate_prepare()
      pins all source and target css_sets and links them up.  Note that this
      can be performed without holding threadgroup_lock even if the target
      is a process.  As long as cgroup_mutex is held, no new css_set can be
      put into play.
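
      The staged approach can be sketched in userspace as a
      prepare-then-commit pattern: every step that can fail is done up
      front, so the commit step has nothing left that can fail.  All
      names below are hypothetical models of the four stages, the
      alloc_budget parameter is an example-only way to inject allocation
      failure, and ->can_attach() failures are ignored.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SRC 8

/* Example-only model of one multi-target migration. */
struct migration {
	int nr_src;		/* sources registered so far */
	void *dst[MAX_SRC];	/* stand-ins for preallocated destinations */
	int nr_migrated;	/* targets actually moved */
};

/* Stage 1: register a source; cannot fail. */
static void migrate_add_src(struct migration *mg)
{
	assert(mg->nr_src < MAX_SRC);
	mg->nr_src++;
}

/* Stage 2: perform every allocation; may fail, but moves nothing. */
static int migrate_prepare_dst(struct migration *mg, int *alloc_budget)
{
	for (int i = 0; i < mg->nr_src; i++) {
		if (*alloc_budget == 0)
			return -1;	/* simulated allocation failure */
		(*alloc_budget)--;
		mg->dst[i] = malloc(1);
		if (!mg->dst[i])
			return -1;
	}
	return 0;
}

/* Stage 3: commit; everything needed is already allocated. */
static void migrate_commit(struct migration *mg)
{
	mg->nr_migrated = mg->nr_src;
}

/* Stage 4: release whatever was prepared, on success or failure. */
static void migrate_finish(struct migration *mg)
{
	for (int i = 0; i < MAX_SRC; i++) {
		free(mg->dst[i]);
		mg->dst[i] = NULL;
	}
	mg->nr_src = 0;
}

/* Returns the number of targets migrated, or -1 with nothing migrated. */
static int migrate_targets(int nr_targets, int alloc_budget)
{
	struct migration mg = { 0 };
	int ret, moved;

	for (int i = 0; i < nr_targets; i++)
		migrate_add_src(&mg);
	ret = migrate_prepare_dst(&mg, &alloc_budget);
	if (!ret)
		migrate_commit(&mg);
	moved = mg.nr_migrated;
	migrate_finish(&mg);
	return ret ? ret : moved;
}
```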
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: separate out cset_group_from_root() from task_cgroup_from_root() · ceb6a081
      Tejun Heo authored
      This will be used by the planned migration path update.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: use css_set->mg_tasks to track target tasks during migration · b3dc094e
      Tejun Heo authored
      Currently, while migrating tasks from one cgroup to another,
      cgroup_attach_task() builds a flex array of all target tasks;
      unfortunately, this has a couple of issues.
      
      * Flex array has size limit.  On 64bit, struct task_and_cgroup is
        24 bytes, making the flex element limit around 87k.  It is a high
        number but not impossible to hit.  This means that the current
        cgroup implementation can't migrate a process with more than 87k
        threads.
      
      * Process migration involves memory allocation whose size is dependent
        on the number of threads the process has.  This means that cgroup
        core can't guarantee success or failure of multi-process migrations
        as memory allocation failure can happen in the middle.  This is in
        part because cgroup can't grab threadgroup locks of multiple
        processes at the same time, so when there are multiple processes to
        migrate, it is impossible to tell how many tasks are to be migrated
        beforehand.
      
        Note that this already affects cgroup_transfer_tasks().  cgroup
        currently cannot guarantee atomic success or failure of the
        operation.  It may fail in the middle and after such failure cgroup
        doesn't have enough information to roll back properly.  It just
        aborts with some tasks migrated and others not.
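
      The 87k figure can be roughly reproduced under assumptions that are
      not stated above: 4096-byte pages, 8-byte pointers, a two-level
      layout in which one base page holds pointers to part pages, and
      about 16 bytes of header in the base page.
      flex_array_max_elems() is a hypothetical helper for this
      arithmetic, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper bound on elements for a two-level array: one base page of
 * pointers (minus a header), each pointing to one page of elements.
 * All four parameters are assumptions supplied by the caller.
 */
static size_t flex_array_max_elems(size_t page, size_t ptr,
				   size_t header, size_t elem)
{
	size_t nr_parts = (page - header) / ptr;	/* pointers in base page */
	size_t per_part = page / elem;			/* elements per part page */

	return nr_parts * per_part;
}
```

      With the assumed values, flex_array_max_elems(4096, 8, 16, 24)
      gives 510 * 170 = 86700, which is in line with "around 87k".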
      
      To resolve the situation, this patch updates the migration path to use
      task->cg_list to track target tasks.  The previous patch already added
      css_set->mg_tasks and updated iterations in non-migration paths to
      include them during task migration.  This patch updates migration path
      to actually make use of it.
      
      Instead of putting onto a flex_array, each target task is moved from
      its css_set->tasks list to css_set->mg_tasks and the migration path
      keeps track of all the source css_sets and the associated cgroups.
      Once all source css_sets are determined, the destination css_set for
      each is determined, linked to the matching source css_set and put on a
      separate list.
      
      To iterate the target tasks, migration path just needs to iterate
      through either the source or target css_sets, depending on whether
      migration has been committed or not, and the tasks on their ->mg_tasks
      lists.  cgroup_taskset is updated to contain the list_heads for source
      and target css_sets and the iteration cursor.  cgroup_taskset_*() are
      accordingly updated to walk through css_sets and their ->mg_tasks.
      
      This resolves the above listed issues with moderate additional
      complexity.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: add css_set->mg_tasks · c7561128
      Tejun Heo authored
      Currently, while migrating tasks from one cgroup to another,
      cgroup_attach_task() builds a flex array of all target tasks;
      unfortunately, this has a couple of issues.
      
      * Flex array has size limit.  On 64bit, struct task_and_cgroup is
        24 bytes, making the flex element limit around 87k.  It is a high
        number but not impossible to hit.  This means that the current
        cgroup implementation can't migrate a process with more than 87k
        threads.
      
      * Process migration involves memory allocation whose size is dependent
        on the number of threads the process has.  This means that cgroup
        core can't guarantee success or failure of multi-process migrations
        as memory allocation failure can happen in the middle.  This is in
        part because cgroup can't grab threadgroup locks of multiple
        processes at the same time, so when there are multiple processes to
        migrate, it is impossible to tell how many tasks are to be migrated
        beforehand.
      
        Note that this already affects cgroup_transfer_tasks().  cgroup
        currently cannot guarantee atomic success or failure of the
        operation.  It may fail in the middle and after such failure cgroup
        doesn't have enough information to roll back properly.  It just
        aborts with some tasks migrated and others not.
      
      To resolve the situation, we're going to use task->cg_list during
      migration too.  Instead of building a separate array, target tasks
      will be linked into a dedicated migration list_head on the owning
      css_set.  Tasks on the migration list are treated the same as tasks on
      the usual tasks list; however, being on a separate list allows cgroup
      migration code path to keep track of the target tasks by simply
      keeping the list of css_sets with tasks being migrated, making
      unpredictable dynamic allocation unnecessary.
      
      In preparation for such migration path update, this patch introduces
      css_set->mg_tasks list and updates css_set task iterations so that
      they walk both css_set->tasks and ->mg_tasks.  Note that ->mg_tasks
      isn't used yet.
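
      A task iteration that covers both lists might look roughly like the
      sketch below.  For brevity it uses singly linked lists rather than
      the kernel's list_head, and css_set_count_tasks() is a hypothetical
      helper, so this only shows the shape of the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins using singly linked lists for brevity. */
struct task { int tid; struct task *next; };

struct css_set {
	struct task *tasks;	/* tasks not being migrated */
	struct task *mg_tasks;	/* tasks currently being migrated */
};

/* Count every task of @cset by walking ->tasks and then ->mg_tasks. */
static int css_set_count_tasks(const struct css_set *cset)
{
	const struct task *lists[2] = { cset->tasks, cset->mg_tasks };
	int n = 0;

	for (int i = 0; i < 2; i++)
		for (const struct task *t = lists[i]; t; t = t->next)
			n++;
	return n;
}
```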
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • Merge branch 'cgroup/for-3.14-fixes' into cgroup/for-3.15 · f153ad11
      Tejun Heo authored
      Pull in for-3.14-fixes to receive 532de3fc ("cgroup: update
      cgroup_enable_task_cg_lists() to grab siglock") which conflicts with
      afeb0f9f ("cgroup: relocate cgroup_enable_task_cg_lists()") and
      the following cg_lists updates.  This is likely to cause further
      conflicts down the line too, so let's merge it early.
      
      As cgroup_enable_task_cg_lists() is relocated in for-3.15, this merge
      causes conflict in the original position.  It's resolved by applying
      siglock changes to the updated version in the new location.
      
      Conflicts:
      	kernel/cgroup.c
      Signed-off-by: Tejun Heo <tj@kernel.org>
  6. 18 Feb, 2014 2 commits
    • cgroup: update cgroup_enable_task_cg_lists() to grab siglock · 532de3fc
      Tejun Heo authored
      Currently, there's nothing preventing cgroup_enable_task_cg_lists()
      from missing a set PF_EXITING and racing against cgroup_exit().  Depending
      on the timing, cgroup_exit() may finish with the task still linked on
      css_set leading to list corruption.  Fix it by grabbing siglock in
      cgroup_enable_task_cg_lists() so that PF_EXITING is guaranteed to be
      visible.
      
      This whole on-demand cg_list optimization is extremely fragile and has
      ample possibility to lead to bugs which can cause things like
      once-a-year oops during boot.  I'm wondering whether the better
      approach would be just adding "cgroup_disable=all" handling which
      disables the whole cgroup rather than tempting fate with this
      on-demand craziness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
    • cgroup: add a validation check to cgroup_add_cftyps() · dc5736ed
      Li Zefan authored
      Fengguang reported this bug:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000003c
      IP: [<cc90b4ad>] cgroup_cfts_commit+0x27/0x1c1
      ...
      Call Trace:
        [<cc9d1129>] ? kmem_cache_alloc_trace+0x33f/0x3b7
        [<cc90c6fc>] cgroup_add_cftypes+0x8f/0xca
        [<cd78b646>] cgroup_init+0x6a/0x26a
        [<cd764d7d>] start_kernel+0x4d7/0x57a
        [<cd7642ef>] i386_start_kernel+0x92/0x96
      
      This happens in a corner case. If CGROUP_SCHED=y but CFS_BANDWIDTH=n &&
      FAIR_GROUP_SCHED=n && RT_GROUP_SCHED=n, we have:
      
      cpu_files[] = {
      	{ }	/* terminate */
      }
      
      When we pass cpu_files to cgroup_apply_cftypes(), as cpu_files[0].ss
      is NULL, we'll access NULL pointer.
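
      One plausible shape for such a validation check is sketched below.
      The structs are reduced stand-ins, and model_add_cftypes() and
      model_cfts_commit() are hypothetical models of the functions in the
      call trace, not the kernel code; the point is only that an array
      holding nothing but the terminator is rejected before the first
      entry's ->ss is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures. */
struct cgroup_subsys { int id; };
struct cftype { char name[64]; struct cgroup_subsys *ss; };

/* Models the crash site: dereferences the first entry's ->ss. */
static int model_cfts_commit(const struct cftype *cfts)
{
	return cfts[0].ss->id;
}

/* Returns the number of files registered for @ss. */
static int model_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
{
	int n = 0;

	/* Validation: a NULL or immediately terminated array is a no-op. */
	if (!cfts || cfts[0].name[0] == '\0')
		return 0;

	for (struct cftype *cft = cfts; cft->name[0] != '\0'; cft++) {
		cft->ss = ss;
		n++;
	}
	(void)model_cfts_commit(cfts);	/* safe: cfts[0].ss was set above */
	return n;
}
```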
      
      The bug was introduced by commit de00ffa5
      ("cgroup: make cgroup_subsys->base_cftypes use cgroup_add_cftypes()").
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  7. 14 Feb, 2014 3 commits
  8. 13 Feb, 2014 12 commits
    • cgroup: fix coccinelle warnings · 430af8ad
      Fengguang Wu authored
      kernel/cgroup.c:2256:1-3: WARNING: PTR_RET can be used
      
       Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
      
      Generated by: coccinelle/api/ptr_ret.cocci
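
      For readers unfamiliar with the idiom, the userspace sketch below
      mimics the kernel's error-pointer helpers with lowercase names
      (err_ptr(), is_err(), ptr_err(), ptr_err_or_zero()).  These are
      stand-ins written for this example, with MAX_ERRNO assumed to be
      4095; they are not the kernel macros themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* assumed size of the reserved error range */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode a pointer back into the value it encodes. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if @ptr falls in the top MAX_ERRNO addresses. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Replaces the pattern: if (is_err(p)) return ptr_err(p); return 0; */
static inline int ptr_err_or_zero(const void *ptr)
{
	return is_err(ptr) ? (int)ptr_err(ptr) : 0;
}
```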
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • sparc: fix implicit include of slab.h in leon_pci_grpci2.c · d6250ee2
      Paul Gortmaker authored
      To fix:
      
      arch/sparc/kernel/leon_pci_grpci2.c: In function 'grpci2_of_probe':
      arch/sparc/kernel/leon_pci_grpci2.c:720:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
      arch/sparc/kernel/leon_pci_grpci2.c:720:20: error: assignment makes pointer from integer without a cast [-Werror]
      arch/sparc/kernel/leon_pci_grpci2.c:882:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
      cc1: all warnings being treated as errors
      make[2]: *** [arch/sparc/kernel/leon_pci_grpci2.o] Error 1
      
      According to Stephen, these types of failures are caused by commit
      2bd59d48 ("cgroup: convert to kernfs"); slab.h was being included
      implicitly via cgroup.h's inclusion of xattr.h (which has now been
      removed).
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    • cgroup: unexport functions · 8541fecc
      Tejun Heo authored
      With module support gone, a lot of functions no longer need to be
      exported.  Unexport them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cgroup: cosmetic updates to cgroup_attach_task() · 9db8de37
      Tejun Heo authored
      cgroup_attach_task() is planned to go through restructuring.  Let's
      tidy it up a bit in preparation.
      
      * Update cgroup_attach_task() to receive the target task argument in
        @leader instead of @tsk.
      
      * Rename @tsk to @task.
      
      * Rename @retval to @ret.
      
      This is purely cosmetic.
      
      v2: get_nr_threads() was using uninitialized @task instead of @leader.
          Fixed.  Reported by Dan Carpenter.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
    • cgroup: remove cgroup_taskset_cur_css() and cgroup_taskset_size() · bc668c75
      Tejun Heo authored
      The two functions don't have any users left.  Remove them along with
      cgroup_taskset->cur_cgrp.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
    • cpuset: don't use cgroup_taskset_cur_css() · 57fce0a6
      Tejun Heo authored
      cgroup_taskset_cur_css() will be removed during the planned
      restructuring of migration path.  The only use of
      cgroup_taskset_cur_css() is finding out the old cgroup_subsys_state of
      the leader in cpuset_attach().  This usage can easily be removed by
      remembering the old value from cpuset_can_attach().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      57fce0a6
    • Tejun Heo's avatar
      cgroup: drop @skip_css from cgroup_taskset_for_each() · 924f0d9a
      Tejun Heo authored
      If !NULL, @skip_css makes cgroup_taskset_for_each() skip the matching
      css.  The intention of the interface is to make it easy to skip css's
      (cgroup_subsys_states) which already match the migration target;
      however, this is entirely unnecessary as the migration taskset doesn't
      include tasks which are already in the target cgroup.  Drop @skip_css
      from cgroup_taskset_for_each().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      924f0d9a
    • Tejun Heo's avatar
      cgroup: move css_set_rwsem locking outside of cgroup_task_migrate() · cb0f1fe9
      Tejun Heo authored
      Instead of repeatedly locking and unlocking css_set_rwsem inside
      cgroup_task_migrate(), update cgroup_attach_task() to grab it outside
      of the loop and update cgroup_task_migrate() to use
      put_css_set_locked().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      cb0f1fe9
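The lock hoisting described in cb0f1fe9 above can be illustrated with a small userspace sketch. This is not the kernel code: `counting_lock`, `migrate_one_locked()`, and `migrate_all()` are hypothetical stand-ins for css_set_rwsem, cgroup_task_migrate(), and cgroup_attach_task(), and the "lock" only records how often it is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock that only tracks state and counts acquisitions. */
struct counting_lock {
    int held;
    int acquisitions;
};

static struct counting_lock set_lock;

static void counting_lock_acquire(struct counting_lock *l)
{
    assert(!l->held);
    l->held = 1;
    l->acquisitions++;
}

static void counting_lock_release(struct counting_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Per-item step; the caller must already hold set_lock. */
static void migrate_one_locked(int *item)
{
    assert(set_lock.held);
    *item += 1;
}

/* The lock is taken once around the whole loop, not once per item. */
static void migrate_all(int *items, size_t n)
{
    counting_lock_acquire(&set_lock);
    for (size_t i = 0; i < n; i++)
        migrate_one_locked(&items[i]);
    counting_lock_release(&set_lock);
}
```

With the lock taken by the caller, the per-item helper can rely on a "_locked" contract instead of locking and unlocking on every iteration.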
    • Tejun Heo's avatar
      cgroup: separate out put_css_set_locked() and remove put_css_set_taskexit() · 89c5509b
      Tejun Heo authored
      put_css_set() is performed in two steps - it first tries to put
      without grabbing css_set_rwsem if such put wouldn't make the count
      zero.  If that fails, it puts after write-locking css_set_rwsem.  This
      patch separates out the second phase into put_css_set_locked() which
      should be called with css_set_rwsem locked.
      
      Also, put_css_set_taskexit() is dropped and put_css_set() is made to
      take @taskexit.  There are only a handful of users of these functions.
      No point in providing different variants.
      
      put_css_set_locked() will be used by later changes.  This patch doesn't
      introduce any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      89c5509b
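The two-step put described in 89c5509b above can be sketched in userspace C11. This is a rough illustration under stated assumptions, not the kernel implementation: `demo_set`, `demo_put()`, and `demo_put_locked()` are hypothetical names, and a trivial spinlock stands in for css_set_rwsem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for css_set; only the reference count matters. */
struct demo_set {
    atomic_int refcount;
    bool released;
};

/* Stand-in for css_set_rwsem: a trivial spinlock, NOT a real rwsem. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static int demo_slow_puts; /* how often the locked path ran */

static void demo_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&demo_lock))
        ; /* spin until the flag is clear */
}

static void demo_lock_release(void)
{
    atomic_flag_clear(&demo_lock);
}

/* Phase one: drop a reference unless that would make the count zero. */
static bool demo_dec_unless_one(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 1) {
        /* On failure, old is refreshed with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old - 1))
            return true;
    }
    return false;
}

/* Phase two: the caller must hold demo_lock. */
static void demo_put_locked(struct demo_set *s)
{
    int old = atomic_fetch_sub(&s->refcount, 1);

    assert(old > 0);
    if (old == 1)
        s->released = true; /* unlinking and freeing would go here */
}

static void demo_put(struct demo_set *s)
{
    if (demo_dec_unless_one(&s->refcount))
        return; /* fast path: no lock taken */

    demo_lock_acquire();
    demo_slow_puts++;
    demo_put_locked(s);
    demo_lock_release();
}
```

Splitting out the locked phase lets a caller that already holds the lock drop references directly, which is what the later migration changes rely on.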
    • Tejun Heo's avatar
      cgroup: remove css_scan_tasks() · 889ed9ce
      Tejun Heo authored
      css_scan_tasks() doesn't have any user left.  Remove it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      889ed9ce
    • Tejun Heo's avatar
      cpuset: use css_task_iter_start/next/end() instead of css_scan_tasks() · d66393e5
      Tejun Heo authored
      Now that css_task_iter_start/next/end() supports blocking while
      iterating, there's no reason to use css_scan_tasks() which is more
      cumbersome to use and scheduled to be removed.
      
      Convert all css_scan_tasks() usages in cpuset to
      css_task_iter_start/next/end().  This simplifies the code by removing
      heap allocation and callbacks.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      d66393e5
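The start/next/end iteration style that d66393e5 above converts cpuset to can be sketched with a plain linked list. All names here (`demo_task`, `demo_group`, `demo_task_iter_*()`) are hypothetical userspace stand-ins; the real css_task_iter functions also handle locking, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a task and the group that owns it. */
struct demo_task {
    int pid;
    struct demo_task *next;
};

struct demo_group {
    struct demo_task *tasks;
};

/* Iterator state lives on the caller's stack; no heap, no callback. */
struct demo_task_iter {
    struct demo_task *pos;
};

static void demo_task_iter_start(struct demo_group *g,
                                 struct demo_task_iter *it)
{
    assert(g && it);
    it->pos = g->tasks;
}

/* Returns the next task, or NULL once the list is exhausted. */
static struct demo_task *demo_task_iter_next(struct demo_task_iter *it)
{
    struct demo_task *t = it->pos;

    if (t)
        it->pos = t->next;
    return t;
}

static void demo_task_iter_end(struct demo_task_iter *it)
{
    it->pos = NULL;
}

/* Example user: the loop body runs inline instead of in a callback. */
static int demo_sum_pids(struct demo_group *g)
{
    struct demo_task_iter it;
    struct demo_task *t;
    int sum = 0;

    demo_task_iter_start(g, &it);
    while ((t = demo_task_iter_next(&it)))
        sum += t->pid;
    demo_task_iter_end(&it);
    return sum;
}
```

Because the loop body is ordinary inline code, local state such as `sum` needs no context structure to be passed through a callback.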
    • Tejun Heo's avatar
      cgroup: make css_set_lock a rwsem and rename it to css_set_rwsem · 96d365e0
      Tejun Heo authored
      Currently there are two ways to walk tasks of a cgroup -
      css_task_iter_start/next/end() and css_scan_tasks().  The latter
      builds on the former but allows blocking while iterating.
      Unfortunately, the way css_scan_tasks() is implemented is rather
      nasty.  It uses a priority heap of pointers to extract some number of
      tasks in task creation order, loops over them invoking the callback,
      and repeats that until it reaches the end.  It either requires a
      preallocated heap or may fail under memory pressure.  While unlikely
      to be problematic, the complexity is O(N^2), and the whole thing is
      in general just nasty.
      
      We're gonna convert all css_scan_tasks() users to
      css_task_iter_start/next/end() and remove css_scan_tasks().  As
      css_scan_tasks() users may block, let's convert css_set_lock to a
      rwsem so that tasks can block while css_task_iter_*() is in progress.
      
      While this does increase the chance of possible deadlock scenarios,
      given the current usage, the probability is relatively low, and even
      if that happens, the right thing to do is updating the iteration in
      a way similar to css iterators so that it can handle blocking.
      
      Most conversions are trivial; however, task_cgroup_path() now expects
      to be called with css_set_rwsem locked instead of locking itself.
      This is because the function is called with RCU read lock held and
      rwsem locking should nest outside RCU read lock.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      96d365e0