  1. 27 Feb, 2014 2 commits
    • cpuset: fix a race condition in __cpuset_node_allowed_softwall() · 99afb0fd
      Li Zefan authored
      It's not safe to access task's cpuset after releasing task_lock().
      Holding callback_mutex won't help.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cpuset: fix a locking issue in cpuset_migrate_mm() · 47295830
      Li Zefan authored
      I can trigger a lockdep warning:
      
        # mount -t cgroup -o cpuset xxx /cgroup
        # mkdir /cgroup/cpuset
        # mkdir /cgroup/tmp
        # echo 0 > /cgroup/tmp/cpuset.cpus
        # echo 0 > /cgroup/tmp/cpuset.mems
        # echo 1 > /cgroup/tmp/cpuset.memory_migrate
        # echo $$ > /cgroup/tmp/tasks
        # echo 1 > /cgroup/tmp/cpuset.mems
      
        ===============================
        [ INFO: suspicious RCU usage. ]
        3.14.0-rc1-0.1-default+ #32 Not tainted
        -------------------------------
        include/linux/cgroup.h:682 suspicious rcu_dereference_check() usage!
        ...
          [<ffffffff81582174>] dump_stack+0x72/0x86
          [<ffffffff810b8f01>] lockdep_rcu_suspicious+0x101/0x140
          [<ffffffff81105ba1>] cpuset_migrate_mm+0xb1/0xe0
        ...
      
      We used to hold cgroup_mutex when calling cpuset_migrate_mm(), but now
      we hold cpuset_mutex, which causes task_css() to complain.
      
      This is not a false-positive but a real issue.
      
      Holding cpuset_mutex won't prevent a task from migrating to another
      cpuset, and it won't prevent the original task->cgroup from being
      destroyed during this change.
      
      Fixes: 5d21cc2d (cpuset: replace cgroup_mutex locking with cpuset internal locking)
      Cc: <stable@vger.kernel.org> # 3.9+
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 18 Feb, 2014 1 commit
    • cgroup: update cgroup_enable_task_cg_lists() to grab siglock · 532de3fc
      Tejun Heo authored
      Currently, there's nothing preventing cgroup_enable_task_cg_lists()
      from missing a set PF_EXITING flag and racing against cgroup_exit().  Depending
      on the timing, cgroup_exit() may finish with the task still linked on
      css_set leading to list corruption.  Fix it by grabbing siglock in
      cgroup_enable_task_cg_lists() so that PF_EXITING is guaranteed to be
      visible.
      
      This whole on-demand cg_list optimization is extremely fragile and has
      ample possibility to lead to bugs which can cause things like
      once-a-year oops during boot.  I'm wondering whether the better
      approach would be just adding "cgroup_disable=all" handling which
      disables the whole cgroup rather than tempting fate with this
      on-demand craziness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
  3. 13 Feb, 2014 1 commit
    • Revert "cgroup: use an ordered workqueue for cgroup destruction" · 1a11533f
      Tejun Heo authored
      This reverts commit ab3f5faa.
      Explanation from Hugh:
      
        It's because more thorough testing, by others here, found that it
        wasn't always solving the problem: so I asked Tejun privately to
        hold off from sending it in, until we'd worked out why not.
      
        Most of our testing being on a v3.11-based kernel, it was perfectly
        possible that the problem was merely our own e.g. missing Tejun's
        8a2b7538 ("workqueue: fix ordered workqueues in NUMA setups").
      
        But that turned out not to be enough to fix it either. Then Filipe
        pointed out how percpu_ref_kill_and_confirm() uses call_rcu_sched()
        before we ever get to put the offline on to the workqueue: by the
        time we get to the workqueue, the ordering has already been lost.
      
        So, thanks for the Acks, but I'm afraid that this ordered workqueue
        solution is just not good enough: we should simply forget that patch
        and provide a different answer.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
  4. 11 Feb, 2014 1 commit
    • cgroup: protect modifications to cgroup_idr with cgroup_mutex · 0ab02ca8
      Li Zefan authored
      Setup cgroupfs like this:
        # mount -t cgroup -o cpuacct xxx /cgroup
        # mkdir /cgroup/sub1
        # mkdir /cgroup/sub2
      
      Then run these two commands:
        # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /cgroup/sub1/tmp; } &
        # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /cgroup/sub2/tmp; } &
      
      After a few seconds you may see this warning:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
      idr_remove called for id=6 which is not allocated.
      ...
      Call Trace:
       [<ffffffff8156063c>] dump_stack+0x7a/0x96
       [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
       [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
       [<ffffffff81300bf5>] idr_remove+0x25/0xd0
       [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
       [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
       [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
       [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
       [<ffffffff81085f96>] kthread+0xe6/0xf0
       [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
      ---[ end trace 2d1577ec10cf80d0 ]---
      
      It's because allocating/removing cgroup ID is not properly synchronized.
      
      The bug was introduced when we converted cgroup_ida to cgroup_idr.
      While synchronization is already done inside ida_simple_{get,remove}(),
      users are responsible for concurrent calls to idr_{alloc,remove}().
      
      tj: Refreshed on top of b58c8998 ("cgroup: fix error return from
      cgroup_create()").
      
      Fixes: 4e96ee8e ("cgroup: convert cgroup_ida to cgroup_idr")
      Cc: <stable@vger.kernel.org> #3.12+
      Reported-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
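The locking rule described in this commit (the allocator does no locking of its own, so callers must serialize allocation and removal) can be illustrated with a small userspace sketch. This is not the kernel idr API: `id_alloc()`, `id_remove()`, `id_lock`, and the fixed-size table are hypothetical names chosen for illustration, and a C11 `atomic_flag` spinlock stands in for cgroup_mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_IDS 8

/* Hypothetical ID table: the table itself does no locking, so every
 * caller must hold id_lock around allocation and removal. */
static bool id_used[MAX_IDS];
static atomic_flag id_lock = ATOMIC_FLAG_INIT;

static void lock_ids(void)
{
    while (atomic_flag_test_and_set(&id_lock))
        ; /* spin until the flag is released */
}

static void unlock_ids(void)
{
    atomic_flag_clear(&id_lock);
}

/* Returns the lowest free ID, or -1 if the table is full. */
int id_alloc(void)
{
    int id = -1;

    lock_ids();
    for (int i = 0; i < MAX_IDS; i++) {
        if (!id_used[i]) {
            id_used[i] = true;
            id = i;
            break;
        }
    }
    unlock_ids();
    return id;
}

/* Returns 0 on success, -1 if the ID was not allocated. */
int id_remove(int id)
{
    int ret = -1;

    lock_ids();
    if (id >= 0 && id < MAX_IDS && id_used[id]) {
        id_used[id] = false;
        ret = 0;
    }
    unlock_ids();
    return ret;
}
```

Removing an ID that is not allocated is reported as an error here, which loosely mirrors the "idr_remove called for id=6 which is not allocated" warning in the trace above.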
  5. 08 Feb, 2014 3 commits
    • cgroup: fix locking in cgroup_cfts_commit() · 48573a89
      Tejun Heo authored
      cgroup_cfts_commit() walks the cgroup hierarchy that the target
      subsystem is attached to and tries to apply the file changes.  Due to
      the convolution with inode locking, it can't keep cgroup_mutex locked
      while iterating.  It currently holds only RCU read lock around the
      actual iteration and then pins the found cgroup using dget().
      
      Unfortunately, this is incorrect.  Although the iteration does check
      cgroup_is_dead() before invoking dget(), there's nothing which
      prevents the dentry from going away in between.  Note that this is
      different from the usual css iterations where css_tryget() is used to
      pin the css - css_tryget() tests whether the css can be pinned and
      fails if not.
      
      The problem can be solved by simply holding cgroup_mutex instead of
      RCU read lock around the iteration, which actually reduces LOC.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
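The contrast drawn in this commit between "check, then pin" and a tryget-style operation can be sketched in userspace C. This is only an analogy under stated assumptions: `struct obj`, `obj_tryget()`, and `obj_put()` are hypothetical names, the counter is a plain C11 atomic, and nothing here reflects how css_tryget() is actually implemented in the kernel. The commit itself fixes the problem differently, by holding cgroup_mutex around the iteration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; a count of 0 means it is being torn down. */
struct obj {
    atomic_int refcnt;
};

/* Take a reference only if the object is still live (count > 0).
 * The liveness check and the increment happen in one atomic step,
 * so the object cannot die between them. */
bool obj_tryget(struct obj *o)
{
    int old = atomic_load(&o->refcnt);

    while (old > 0) {
        /* On failure, `old` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop one reference. */
void obj_put(struct obj *o)
{
    atomic_fetch_sub(&o->refcnt, 1);
}
```

A separate "is it dead?" test followed by an unconditional increment would leave a window between the two steps, which is the kind of gap the commit message describes for cgroup_is_dead() followed by dget().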
    • cgroup: fix error return from cgroup_create() · b58c8998
      Tejun Heo authored
      cgroup_create() was returning 0 after allocation failures.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
    • cgroup: fix error return value in cgroup_mount() · eb46bf89
      Tejun Heo authored
      When cgroup_mount() fails to allocate an id for the root, it doesn't
      set ret before jumping to unlock_drop and ends up returning 0 after a
      failure.  Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: stable@vger.kernel.org
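The bug class fixed in this commit and in b58c8998 (a failure path that jumps to a cleanup label without setting the return value) is easy to reproduce in miniature. The sketch below is a userspace illustration, not the cgroup code: `setup_root()`, `alloc_id()`, and the `out_free` label are hypothetical names, and the choice of -ENOMEM is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ID allocator; fails when `fail` is nonzero. */
int alloc_id(int fail)
{
    return fail ? -1 : 1;
}

/* Returns 0 on success or a negative errno value on failure. */
int setup_root(int fail_id_alloc)
{
    int ret = 0;
    char *root = malloc(64);

    if (!root)
        return -ENOMEM;

    if (alloc_id(fail_id_alloc) < 0) {
        /* Without this assignment the function would reach the
         * cleanup label with ret == 0 and report success. */
        ret = -ENOMEM;
        goto out_free;
    }

    /* ... further setup would go here ... */

out_free:
    free(root);
    return ret;
}
```

Initializing `ret` to 0 is what makes the omission silent; every failure branch has to overwrite it before the jump.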
  6. 07 Feb, 2014 1 commit
    • cgroup: use an ordered workqueue for cgroup destruction · ab3f5faa
      Hugh Dickins authored
      Sometimes the cleanup after memcg hierarchy testing gets stuck in
      mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
      
      There may turn out to be several causes, but a major cause is this: the
      workitem to offline parent can get run before workitem to offline child;
      parent's mem_cgroup_reparent_charges() circles around waiting for the
      child's pages to be reparented to its lrus, but it's holding cgroup_mutex
      which prevents the child from reaching its mem_cgroup_reparent_charges().
      
      Just use an ordered workqueue for cgroup_destroy_wq.
      
      tj: Committing as the temporary fix until the reverse dependency can
          be removed from memcg.  Comment updated accordingly.
      
      Fixes: e5fca243 ("cgroup: use a dedicated workqueue for cgroup destruction")
      Suggested-by: Filipe Brandenburger <filbranden@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Tejun Heo <tj@kernel.org>
  7. 03 Feb, 2014 7 commits
    • nfs: include xattr.h from fs/nfs/nfs3proc.c · 0a6be655
      Tejun Heo authored
      fs/nfs/nfs3proc.c is making use of xattr but was getting linux/xattr.h
      indirectly through linux/cgroup.h, which will soon drop the inclusion
      of xattr.h.  Explicitly include linux/xattr.h from nfs3proc.c so that
      compilation doesn't fail when linux/cgroup.h drops linux/xattr.h.
      
      As the following cgroup changes will depend on these changes, it
      probably would be easier to route this through cgroup branch.  Would
      that be okay?
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: linux-nfs@vger.kernel.org
    • cpuset: update MAINTAINERS entry · 230579d7
      Li Zefan authored
      Add mailing list and tree tag to the entry.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • arm, pm, vmpressure: add missing slab.h includes · 1ff6bbfd
      Tejun Heo authored
      arch/arm/mach-tegra/pm.c, kernel/power/console.c and mm/vmpressure.c
      were somehow getting slab.h indirectly through cgroup.h which in turn
      was getting it indirectly through xattr.h.  A scheduled cgroup change
      drops xattr.h inclusion from cgroup.h and breaks compilation of these
      three files.  Add explicit slab.h includes to the three files.
      
      A pending cgroup patch depends on this change and it'd be great if
      this can be routed through cgroup/for-3.14-fixes branch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Cc: linux-tegra@vger.kernel.org
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: linux-pm@vger.kernel.org
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: cgroups@vger.kernel.org
    • Linus 3.14-rc1 · 38dbfb59
      Linus Torvalds authored
    • Merge branch 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 69048e01
      Linus Torvalds authored
      Pull parisc updates from Helge Deller:
       "The three major changes in this patchset are an implementation for
        flexible userspace memory maps, cache-flushing fixes (again), and a
        long-discussed ABI change to make EWOULDBLOCK the same value as
        EAGAIN.
      
        parisc has been the only platform where we had EWOULDBLOCK != EAGAIN
        to keep HP-UX compatibility.  Since we will probably never implement
        full HP-UX support, we prefer to drop this compatibility to make it
        easier for us with Linux userspace programs which mostly never checked
        for both values.  We don't expect major fall-outs because of this
        change, and if we face some, we will simply rebuild the necessary
        applications in the debian archives"
      
      * 'parisc-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: add flexible mmap memory layout support
        parisc: Make EWOULDBLOCK be equal to EAGAIN on parisc
        parisc: convert uapi/asm/stat.h to use native types only
        parisc: wire up sched_setattr and sched_getattr
        parisc: fix cache-flushing
        parisc/sti_console: prefer Linux fonts over built-in ROM fonts
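The pull message notes that userspace programs mostly never checked for both EWOULDBLOCK and EAGAIN. A minimal sketch of a check that works whether or not the two values are equal might look like the following; `would_block()` is a hypothetical helper name, not something taken from the parisc patches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Returns true if `err` means "no data yet, try again later".
 * Where the two constants differ, both must be tested; where they
 * are equal, a single comparison is enough. */
bool would_block(int err)
{
#if EAGAIN == EWOULDBLOCK
    return err == EAGAIN;
#else
    return err == EAGAIN || err == EWOULDBLOCK;
#endif
}
```

The preprocessor branch only avoids a redundant comparison on platforms where the values match; the behavior is the same either way.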
    • hpfs: optimize quad buffer loading · 1c0b8a7a
      Mikulas Patocka authored
      HPFS needs to load 4 consecutive 512-byte sectors when accessing the
      directory nodes or bitmaps.  We can't switch to 2048-byte block size
      because files are allocated in the units of 512-byte sectors.
      
      Previously, the driver would allocate a 2048-byte area using kmalloc,
      copy the data from four buffers to this area and eventually copy them
      back if they were modified.
      
      In the current implementation of the buffer cache, buffers are allocated
      in the pagecache.  That means that 4 consecutive 512-byte buffers are
      stored in consecutive areas in the kernel address space.  So, we don't
      need to allocate extra memory and copy the content of the buffers there.
      
      This patch optimizes the code to avoid copying the buffers.  It checks
      if the four buffers are stored in contiguous memory - if they are not,
      it falls back to allocating a 2048-byte area and copying data there.
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
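The approach described in this commit (use the four 512-byte buffers in place when they are contiguous, otherwise fall back to a 2048-byte copy) can be sketched in userspace C. This is an illustration under assumptions, not the hpfs driver code: `quad_is_contiguous()`, `quad_map()`, and the `copied` flag are hypothetical names, and `malloc()` stands in for kmalloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512
#define QUAD 4

/* True if the four sector buffers sit back to back in memory. */
bool quad_is_contiguous(char *const buf[QUAD])
{
    for (int i = 1; i < QUAD; i++) {
        /* Compare as integers to avoid out-of-bounds pointer arithmetic. */
        if ((uintptr_t)buf[i] != (uintptr_t)buf[0] + (uintptr_t)i * SECTOR_SIZE)
            return false;
    }
    return true;
}

/* Returns a pointer to 2048 contiguous bytes holding the four sectors.
 * *copied is set to true when a bounce buffer was allocated, in which
 * case the caller must free() the result. Returns NULL if the
 * allocation fails. */
char *quad_map(char *const buf[QUAD], bool *copied)
{
    *copied = false;
    if (quad_is_contiguous(buf))
        return buf[0];

    char *area = malloc(QUAD * SECTOR_SIZE);
    if (!area)
        return NULL;
    for (int i = 0; i < QUAD; i++)
        memcpy(area + (size_t)i * SECTOR_SIZE, buf[i], SECTOR_SIZE);
    *copied = true;
    return area;
}
```

In the fallback case a real driver would also need to copy modified data back to the original buffers, as the commit message describes for the previous implementation; that step is omitted here.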
    • hpfs: remember free space · 2cbe5c76
      Mikulas Patocka authored
      Previously, hpfs scanned all bitmaps each time the user asked for free
      space using statfs.  This patch changes it so that hpfs scans the
      bitmaps only once, remembers the free space and on next invocation of
      statfs it returns the value instantly.
      
      New versions of wine are hammering on the statfs syscall very heavily,
      making some games unplayable when they're stored on hpfs, with load
      times in minutes.
      
      This should be backported to the stable kernels because it fixes a
      user-visible problem (excessive level load times in wine).
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
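The idea of scanning the bitmaps once and answering later statfs calls from a remembered value can be sketched as a small cache. This is a userspace illustration, not the hpfs code: `struct volume`, `volume_free()`, and `volume_alloc()` are hypothetical names, the -1 "not yet computed" sentinel is an assumption, and so is the choice to adjust the cached count on allocation so that it stays in step with the bitmap.

```c
#include <assert.h>

#define NBITS 64

/* Hypothetical volume state: one byte per sector, nonzero means free,
 * plus a cached count; -1 marks the cache as not yet computed. */
struct volume {
    unsigned char bitmap[NBITS];
    long cached_free;
    int scans; /* counts full scans, for illustration only */
};

/* Full scan: the expensive path that should run only once. */
static long scan_bitmap(struct volume *v)
{
    long n = 0;

    v->scans++;
    for (int i = 0; i < NBITS; i++)
        n += v->bitmap[i] ? 1 : 0;
    return n;
}

/* statfs-style query: scan on first use, then answer from the cache. */
long volume_free(struct volume *v)
{
    if (v->cached_free < 0)
        v->cached_free = scan_bitmap(v);
    return v->cached_free;
}

/* Allocate sector i; returns 0 on success, -1 if it is not free.
 * The cached count is updated so later queries stay accurate. */
int volume_alloc(struct volume *v, int i)
{
    if (i < 0 || i >= NBITS || !v->bitmap[i])
        return -1;
    v->bitmap[i] = 0;
    if (v->cached_free >= 0)
        v->cached_free--;
    return 0;
}
```

Repeated calls to `volume_free()` after the first cost a single comparison, which is the property that matters when a caller polls statfs heavily.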
  8. 02 Feb, 2014 12 commits
  9. 01 Feb, 2014 12 commits