1. 07 Mar, 2012 5 commits
    • x86: fix typo in recent find_vma_prev purge · 55062d06
      Linus Torvalds authored
      It turns out that test-compiling this file on x86-64 doesn't really
      help, because much of it is x86-32-specific.  And so I hadn't noticed
      the slightly over-eager removal of the 'r' from 'addr' variable despite
      thinking I had tested it.
      Signed-off-by: Linus "oopsie" Torvalds <torvalds@linux-foundation.org>
    • vm: avoid using find_vma_prev() unnecessarily · 097d5910
      Linus Torvalds authored
      Several users of "find_vma_prev()" were not in fact interested in the
      previous vma if there was no primary vma to be found either.  And in
      those cases, we're much better off just using the regular "find_vma()",
      and then "prev" can be looked up by just checking vma->vm_prev.
      
      The find_vma_prev() semantics are fairly subtle (see Mikulas' recent
      commit 83cd904d: "mm: fix find_vma_prev"), and the whole "return
      prev by reference" means that it generates worse code too.
      
      Thus this "let's avoid using this inconvenient and clearly too subtle
      interface when we don't really have to" patch.
      
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.samba.org/sfrench/cifs-2.6 · 71fece95
      Linus Torvalds authored
      Pull CIFS fixes from Steve French
      
      * git://git.samba.org/sfrench/cifs-2.6:
        cifs: fix dentry refcount leak when opening a FIFO on lookup
        CIFS: Fix mkdir/rmdir bug for the non-POSIX case
    • mm: fix find_vma_prev · 83cd904d
      Mikulas Patocka authored
      Commit 6bd4837d ("mm: simplify find_vma_prev()") broke memory
      management on PA-RISC.
      
      After application of the patch, programs that allocate big arrays on the
      stack crash with a segfault.  For example, this will crash if compiled
      without optimization:
      
        int main()
        {
      	char array[200000];
      	array[199999] = 0;
      	return 0;
        }
      
      The reason is that PA-RISC has an upward-growing stack and the stack is
      usually the last memory area.  In the above example, a page fault happens
      above the stack.
      
      Previously, if we passed too high an address to find_vma_prev, it returned
      NULL and stored the last VMA in *pprev.  After the "simplify find_vma_prev"
      change, it stores NULL in *pprev.  Consequently, the stack area is not
      found and it is not expanded, as it used to be before the change.
      
      This patch restores the old behavior and makes it return the last VMA in
      *pprev if the requested address is higher than the address of any other VMA.
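The restored contract can be illustrated with a small userspace sketch. This is not the kernel implementation: `struct vma` below is a hypothetical, simplified stand-in for the kernel's VMA structure, and the lookup walks a plain sorted linked list. It only shows the rule described above: when the address lies above every area, the function returns NULL but still reports the last area through `*pprev`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's VMA structure. */
struct vma {
    unsigned long vm_start;   /* first address inside the area */
    unsigned long vm_end;     /* first address past the area */
    struct vma *vm_prev;
    struct vma *vm_next;
};

/* Return the first area with addr < vm_end, or NULL if there is none. */
static struct vma *find_vma(struct vma *head, unsigned long addr)
{
    for (struct vma *v = head; v != NULL; v = v->vm_next)
        if (addr < v->vm_end)
            return v;
    return NULL;
}

/*
 * Sketch of the restored behaviour: if addr is above every area, return
 * NULL but store the last area in *pprev, so that a caller handling an
 * upward-growing stack can still find the area to expand.
 */
static struct vma *find_vma_prev(struct vma *head, unsigned long addr,
                                 struct vma **pprev)
{
    struct vma *vma = find_vma(head, addr);

    assert(pprev != NULL);
    if (vma != NULL) {
        *pprev = vma->vm_prev;
    } else {
        struct vma *last = head;

        while (last != NULL && last->vm_next != NULL)
            last = last->vm_next;
        *pprev = last;
    }
    return vma;
}
```

With two areas in the list, a lookup above both returns NULL and leaves `*pprev` pointing at the second area, which is the case the PA-RISC stack expansion depends on.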
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • genirq: Clear action->thread_mask if IRQ_ONESHOT is not set · 52abb700
      Thomas Gleixner authored
      Commit ac563761 ("genirq: Unmask oneshot irqs when thread was not woken")
      fails to unmask when a !IRQ_ONESHOT threaded handler is handled by
      handle_level_irq.
      
      This happens because thread_mask is or'ed unconditionally in
      irq_wake_thread(), but for !IRQ_ONESHOT interrupts never cleared.  So
      the check for !desc->thread_active fails and keeps the interrupt
      disabled.
      
      Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
      
      Document the thread_mask magic while at it.
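The failure mode can be modelled in a few lines of userspace C. Everything here is a hypothetical simplification (the struct layouts, the `threads_active` field and all function names are assumptions for illustration, not the genirq code): a bit is or'ed into a per-descriptor mask on every wakeup, it is only cleared for oneshot actions, and the line is unmasked only when the mask is empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified models of an irq action and descriptor. */
struct irq_action {
    unsigned long thread_mask;   /* bit owned by this threaded handler */
    bool oneshot;                /* models IRQ_ONESHOT being set */
};

struct irq_desc {
    unsigned long threads_active; /* bits of threads still running */
    bool masked;
};

/* With fixed == true, non-oneshot actions keep a zero thread_mask. */
static void setup_action(struct irq_action *a, bool oneshot,
                         unsigned int bit, bool fixed)
{
    assert(bit < 8 * sizeof(unsigned long));
    a->oneshot = oneshot;
    a->thread_mask = (oneshot || !fixed) ? 1UL << bit : 0UL;
}

/* The mask is or'ed in unconditionally when the thread is woken. */
static void wake_thread(struct irq_desc *d, const struct irq_action *a)
{
    d->threads_active |= a->thread_mask;
}

/* Only oneshot handlers clear their bit when the thread finishes. */
static void thread_done(struct irq_desc *d, const struct irq_action *a)
{
    if (a->oneshot)
        d->threads_active &= ~a->thread_mask;
}

/* End of the level flow handler: unmask only if no bit is pending. */
static void cond_unmask(struct irq_desc *d)
{
    if (d->threads_active == 0)
        d->masked = false;
}

/* Run one level interrupt for a non-oneshot threaded handler. */
static bool line_stays_masked(bool fixed)
{
    struct irq_desc d = { 0UL, false };
    struct irq_action a;

    setup_action(&a, false, 0, fixed);
    d.masked = true;          /* the level handler masks the line first */
    wake_thread(&d, &a);
    cond_unmask(&d);
    thread_done(&d, &a);
    return d.masked;
}
```

In this model `line_stays_masked(false)` reports a line that is never unmasked, while `line_stays_masked(true)` does not, because a zero mask makes the unconditional or harmless.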
      Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
      Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 06 Mar, 2012 17 commits
  3. 05 Mar, 2012 18 commits
    • Merge branch 'akpm' (Andrew's patch bomb) · 3e85fb9c
      Linus Torvalds authored
      Merge the emailed series of 19 patches from Andrew Morton
      
      * akpm:
        rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler
        memcg: fix mapcount check in move charge code for anonymous page
        mm: thp: fix BUG on mm->nr_ptes
        alpha: fix 32/64-bit bug in futex support
        memcg: fix GPF when cgroup removal races with last exit
        debugobjects: Fix selftest for static warnings
        floppy/scsi: fix setting of BIO flags
        memcg: fix deadlock by inverting lrucare nesting
        drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()
        c2port: class_create() returns an ERR_PTR
        pps: class_create() returns an ERR_PTR, not NULL
        hung_task: fix the broken rcu_lock_break() logic
        vfork: kill PF_STARTING
        coredump_wait: don't call complete_vfork_done()
        vfork: make it killable
        vfork: introduce complete_vfork_done()
        aio: wake up waiters when freeing unused kiocbs
        kprobes: return proper error code from register_kprobe()
        kmsg_dump: don't run on non-error paths by default
    • rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler · b24823e6
      Alexandre Bounine authored
      Fix a bug that causes a kernel panic when the number of received doorbells
      is larger than the number of entries in the inbound doorbell queue (current
      default value = 512).

      Another possible indication of this bug is a large number of spurious
      doorbells reported by the tsi721 driver after reaching the queue size maximum.
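The commit message does not show the faulty code, so the following is only a generic sketch of how a fixed-size doorbell ring is usually kept in bounds, not the tsi721 driver's code. The queue layout, `idb_push()` and `idb_drain()` are hypothetical names; the one detail taken from the text is the default size of 512 entries. The point is that both indices wrap modulo the queue size, so receiving more doorbells than there are entries never indexes past the array.

```c
#include <assert.h>
#include <stdint.h>

#define IDB_ENTRIES 512u   /* default queue size quoted in the commit */

/* Hypothetical inbound doorbell ring with read and write indices. */
struct idb_queue {
    uint32_t entries[IDB_ENTRIES];
    uint32_t rd;   /* next entry to consume */
    uint32_t wr;   /* next entry to fill */
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int idb_push(struct idb_queue *q, uint32_t doorbell)
{
    uint32_t next = (q->wr + 1u) % IDB_ENTRIES;

    if (next == q->rd)
        return -1;
    q->entries[q->wr] = doorbell;
    q->wr = next;
    return 0;
}

/* Consumer side: copies up to max doorbells, wrapping the read index. */
static unsigned int idb_drain(struct idb_queue *q, uint32_t *out,
                              unsigned int max)
{
    unsigned int n = 0;

    assert(q->rd < IDB_ENTRIES && q->wr < IDB_ENTRIES);
    while (q->rd != q->wr && n < max) {
        out[n++] = q->entries[q->rd];
        q->rd = (q->rd + 1u) % IDB_ENTRIES;   /* wrap instead of overrunning */
    }
    return n;
}
```

Pushing 500 doorbells, draining them, and then pushing 100 more exercises the wrap: the write index passes entry 511 and continues from entry 0, and the drained values still come out in order.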
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Chul Kim <chul.kim@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: <stable@vger.kernel.org>		[3.2.x+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix mapcount check in move charge code for anonymous page · e6ca7b89
      Naoya Horiguchi authored
      Currently the charge on shared anonymous pages is not supposed to be moved
      in task migration.  To implement this, we need to check that mapcount > 1,
      instead of > 2.  So this patch fixes it.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: fix BUG on mm->nr_ptes · 1c641e84
      Andrea Arcangeli authored
      Dave Jones reports a few Fedora users hitting the BUG_ON(mm->nr_ptes...)
      in exit_mmap() recently.
      
      Quoting Hugh's discovery and explanation of the SMP race condition:
      
        "mm->nr_ptes had unusual locking: down_read mmap_sem plus
         page_table_lock when incrementing, down_write mmap_sem (or mm_users
         0) when decrementing; whereas THP is careful to increment and
         decrement it under page_table_lock.
      
         Now most of those paths in THP also hold mmap_sem for read or write
         (with appropriate checks on mm_users), but two do not: when
         split_huge_page() is called by hwpoison_user_mappings(), and when
         called by add_to_swap().
      
         It's conceivable that the latter case is responsible for the
         exit_mmap() BUG_ON mm->nr_ptes that has been reported on Fedora."
      
      The simplest way to fix it without having to alter the locking is to make
      split_huge_page() a noop in nr_ptes terms, by counting the preallocated
      pagetables that exist for every mapped hugepage.  It was an arbitrary
      choice not to count them, and neither way is wrong or right, because
      they are not used but they're still allocated.
      Reported-by: Dave Jones <davej@redhat.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.0.x, 3.1.x, 3.2.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: fix 32/64-bit bug in futex support · 62aca403
      Andrew Morton authored
      Michael Cree said:
      
      : : I have noticed some user space problems (pulseaudio crashes in pthread
      : : code, glibc/nptl test suite failures, java compiler freezes on SMP alpha
      : : systems) that arise when using a 2.6.39 or later kernel on Alpha.
      : : Bisecting between 2.6.38 and 2.6.39 (using glibc/nptl test suite as
      : : criterion for good/bad kernel) eventually leads to:
      : :
      : : 8d7718aa is the first bad commit
      : : commit 8d7718aa
      : : Author: Michel Lespinasse <walken@google.com>
      : : Date:   Thu Mar 10 18:50:58 2011 -0800
      : :
      : :     futex: Sanitize futex ops argument types
      : :
      : :     Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic
      : :     prototypes to use u32 types for the futex as this is the data type the
      : :     futex core code uses all over the place.
      : :
      : : Looking at the commit I see there is a change of the uaddr argument in
      : : the Alpha architecture specific code for futexes from int to u32, but I
      : : don't see why this should cause a problem.
      
      Richard Henderson said:
      
      : futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      :                               u32 oldval, u32 newval)
      : ...
      :         :       "r"(uaddr), "r"((long)oldval), "r"(newval)
      :
      :
      : There is no 32-bit compare instruction.  These are implemented by
      : consistently extending the values to a 64-bit type.  Since the
      : load instruction sign-extends, we want to sign-extend the other
      : quantity as well (despite the fact it's logically unsigned).
      :
      : So:
      :
      : -        :       "r"(uaddr), "r"((long)oldval), "r"(newval)
      : +        :       "r"(uaddr), "r"((long)(int)oldval), "r"(newval)
      :
      : should do the trick.
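The effect of the two casts can be checked in portable C. This sketch is not the Alpha assembly: it models the sign-extending 32-bit load with `int64_t` (standing in for Alpha's 64-bit `long`), and the helper names are hypothetical. It also assumes the usual two's-complement wrap when converting a `uint32_t` above `INT32_MAX` to `int32_t`, which mainstream compilers provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models a 32-bit load that sign-extends into a 64-bit register. */
static int64_t load_sign_extended(const uint32_t *uaddr)
{
    assert(uaddr != NULL);
    return (int64_t)(int32_t)*uaddr;
}

/* Old cast, like (long)oldval: zero-extends the unsigned value. */
static int64_t extend_old(uint32_t oldval)
{
    return (int64_t)oldval;
}

/* Fixed cast, like (long)(int)oldval: sign-extends, matching the load. */
static int64_t extend_fixed(uint32_t oldval)
{
    return (int64_t)(int32_t)oldval;
}

/* 64-bit compare of the loaded word against the expected value. */
static int matches_old(const uint32_t *uaddr, uint32_t oldval)
{
    return load_sign_extended(uaddr) == extend_old(oldval);
}

static int matches_fixed(const uint32_t *uaddr, uint32_t oldval)
{
    return load_sign_extended(uaddr) == extend_fixed(oldval);
}
```

For a futex word with the top bit set, such as 0x80000000, the old cast compares a negative 64-bit value against a positive one and never matches, even though the 32-bit contents are identical. Values below 0x80000000 behave the same under both casts, which may be why the problem only showed up in some workloads.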
      
      Michael said:
      
      : This fixes the glibc test suite failures and the pulseaudio related
      : crashes, but it does not fix the java compiler lockups that I was (and
      : am still) observing.  That is some other problem.
      Reported-by: Michael Cree <mcree@orcon.net.nz>
      Tested-by: Michael Cree <mcree@orcon.net.nz>
      Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix GPF when cgroup removal races with last exit · 7512102c
      Hugh Dickins authored
      When moving tasks from old memcg (with move_charge_at_immigrate on new
      memcg), followed by removal of old memcg, hit General Protection Fault in
      mem_cgroup_lru_del_list() (called from release_pages called from
      free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
      exit_mmap from mmput from exit_mm from do_exit).
      
      Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
      been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
      still trying to update its stats, and take page off lru before freeing.
      
      A task, or a charge, or a page on lru: each secures a memcg against
      removal.  In this case, the last task has been moved out of the old memcg,
      and it is exiting: anonymous pages are uncharged one by one from the
      memcg, as they are zapped from its pagetables, so the charge gets down to
      0; but the pages themselves are queued in an mmu_gather for freeing.
      
      Most of those pages will be on lru (and force_empty is careful to
      lru_add_drain_all, to add pages from pagevec to lru first), but not
      necessarily all: perhaps some have been isolated for page reclaim, perhaps
      some isolated for other reasons.  So, force_empty may find no task, no
      charge and no page on lru, and let the removal proceed.
      
      There would still be no problem if these pages were immediately freed; but
      typically (and the put_page_testzero protocol demands it) they have to be
      added back to lru before they are found freeable, then removed from lru
      and freed.  We don't see the issue when adding, because the
      mem_cgroup_iter() loops keep their own reference to the memcg being
      scanned; but we do when it comes to mem_cgroup_lru_del_list().
      
      I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
      PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
      of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
      38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
      removed those convolutions, but left this General Protection Fault.
      
      But it's surprisingly easy to restore the old behaviour: just check
      PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
      to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
      change?  just going back to how it worked before; testing, and an audit of
      uses of pc->mem_cgroup, show no problem.
      
      And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
      sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
      longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
      clear pc->mem_cgroup if necessary").
      
      Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
      is not strictly necessary: the lru_lock there, with RCU before memcg
      structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
      without that; but it seems cleaner to rely on one dependency less.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • debugobjects: Fix selftest for static warnings · 9f78ff00
      Stephen Boyd authored
      debugobjects is now printing a warning when a fixup for a NOTAVAILABLE
      object is run.  This causes the selftest to fail like:
      
      	ODEBUG: selftest warnings failed 4 != 5
      
      We could just increase the number of warnings that the selftest is
      expecting to see because that is actually what has changed.  But, it turns
      out that fixup_activate() was written with inverted logic and thus a fixup
      for a static object returned 1 indicating the object had been fixed, and 0
      otherwise.  Fix the logic to be correct and update the counts to reflect
      that nothing needed fixing for a static object.
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • floppy/scsi: fix setting of BIO flags · 9354f1b8
      Muthu Kumar authored
      Fix setting bio flags in drivers (sd_dif/floppy).
      Signed-off-by: Muthukumar R <muthur@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix deadlock by inverting lrucare nesting · 9ce70c02
      Hugh Dickins authored
      We have forgotten the rules of lock nesting: the irq-safe ones must be
      taken inside the non-irq-safe ones, otherwise we are open to deadlock:
      
      CPU0                          CPU1
      ----                          ----
      lock(&(&pc->lock)->rlock);
                                    local_irq_disable();
                                    lock(&(&zone->lru_lock)->rlock);
                                    lock(&(&pc->lock)->rlock);
      <Interrupt>
      lock(&(&zone->lru_lock)->rlock);
      
      To check a different locking issue, I happened to add a spin_lock to
      memcg's bit_spin_lock in lock_page_cgroup(), and lockdep very quickly
      complained about __mem_cgroup_commit_charge_lrucare() (on CPU1 above).
      
      So delete __mem_cgroup_commit_charge_lrucare(), passing a bool lrucare to
      __mem_cgroup_commit_charge() instead, taking zone->lru_lock under
      lock_page_cgroup() in the lrucare case.
      
      The original was using spin_lock_irqsave, but we'd be in more trouble if
      it were ever called at interrupt time: unconditional _irq is enough.  And
      ClearPageLRU before del from lru, SetPageLRU before add to lru: no strong
      reason, but that is the ordering used consistently elsewhere.
      
      Fixes 36b62ad5 ("memcg: simplify corner case handling
      of LRU").
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/rtc/rtc-r9701.c: fix crash in r9701_remove() · 73737b87
      Anatolij Gustschin authored
      If probing the RTC didn't succeed due to failed RTC register access, the
      RTC device will be unregistered.  Then, when removing the module,
      r9701_remove() causes a kernel crash while trying to unregister an RTC
      device that is not registered.  Fix this by doing the RTC register access
      test before RTC device registration.
      Signed-off-by: Anatolij Gustschin <agust@denx.de>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • c2port: class_create() returns an ERR_PTR · 22ea71d7
      Dan Carpenter authored
      class_create() doesn't return a NULL, it only returns ERR_PTRs.
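Why a NULL check is not enough can be shown with a userspace imitation of the kernel's error-pointer helpers. `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` below follow the well-known definitions from the kernel's err.h, while `fake_class_create()` and the two init functions are hypothetical stand-ins written only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Userspace copies of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct fake_class {
    const char *name;
};

/* Hypothetical creator: like class_create(), it never returns NULL. */
static struct fake_class *fake_class_create(const char *name, int fail)
{
    static struct fake_class cls;

    assert(name != NULL);
    if (fail)
        return ERR_PTR(-ENOMEM);
    cls.name = name;
    return &cls;
}

/* Buggy pattern: the NULL test can never catch the failure. */
static long init_with_null_check(int fail)
{
    struct fake_class *c = fake_class_create("demo", fail);

    if (!c)
        return -ENOMEM;
    return 0;   /* reports success even when c is an error pointer */
}

/* Corrected pattern: test with IS_ERR() and propagate PTR_ERR(). */
static long init_with_is_err(int fail)
{
    struct fake_class *c = fake_class_create("demo", fail);

    if (IS_ERR(c))
        return PTR_ERR(c);
    return 0;
}
```

On a failing create, `init_with_null_check()` returns 0 and would go on to use an invalid pointer, while `init_with_is_err()` returns `-ENOMEM`.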
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pps: class_create() returns an ERR_PTR, not NULL · 7ad12566
      Dan Carpenter authored
      class_create() never returns NULL, only ERR_PTRs.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Rodolfo Giometti <giometti@enneenne.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hung_task: fix the broken rcu_lock_break() logic · 6027ce49
      Oleg Nesterov authored
      check_hung_uninterruptible_tasks()->rcu_lock_break() introduced by
      "softlockup: check all tasks in hung_task" commit ce9dbe24 looks
      absolutely wrong.
      
      	- rcu_lock_break() does put_task_struct(). If the task has exited
      	  it is not safe to even read its ->state, nothing protects this
      	  task_struct.
      
      	- The TASK_DEAD checks are wrong too. Contrary to the comment, we
      	  can't use it to check if the task was unhashed. It can be unhashed
      	  without TASK_DEAD, or it can be valid with TASK_DEAD.
      
      	  For example, an autoreaping task can do release_task(current)
      	  long before it sets TASK_DEAD in do_exit().
      
      	  Or, a zombie task can have ->state == TASK_DEAD but release_task()
      	  was not called, and in this case we must not break the loop.
      
      Change this code to check pid_alive() instead, and do this before we drop
      the reference to the task_struct.
      
      Note: while_each_thread() under rcu_read_lock() is not really safe, it can
      livelock.  This will be fixed later, but fortunately in this case the
      "max_count" logic saves us anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Mandeep Singh Baines <msb@google.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: kill PF_STARTING · 6e27f63e
      Oleg Nesterov authored
      Previously it was (ab)used by utrace.  Then it was wrongly used by the
      scheduler code.
      
      Currently it is not used; kill it before it finds a new erroneous user.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump_wait: don't call complete_vfork_done() · 57b59c4a
      Oleg Nesterov authored
      Now that CLONE_VFORK is killable, coredump_wait() no longer needs
      complete_vfork_done().  zap_threads() should find and kill all tasks with
      the same ->mm, this includes our parent if ->vfork_done is set.
      
      mm_release() becomes the only caller, unexport complete_vfork_done().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: make it killable · d68b46fe
      Oleg Nesterov authored
      Make vfork() killable.
      
      Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
      fails we do not return to the user-mode and never touch the memory shared
      with our child.
      
      However, in this case we should clear child->vfork_done before return, we
      use task_lock() in do_fork()->wait_for_vfork_done() and
      complete_vfork_done() to serialize with each other.
      
      Note: now that we use task_lock() we don't really need completion, we
      could turn task->vfork_done into "task_struct *wake_up_me" but this needs
      some complications.
      
      NOTE: this and the next patches do not affect in-kernel users of
      CLONE_VFORK, kernel threads run with all signals ignored including
      SIGKILL/SIGSTOP.
      
      However, this is obviously a user-visible change.  Not only can a fatal
      signal kill the vforking parent, a sub-thread can also do execve or
      exit_group() and kill the thread sleeping in vfork().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: introduce complete_vfork_done() · c415c3b4
      Oleg Nesterov authored
      No functional changes.
      
      Move the clear-and-complete-vfork_done code into the new trivial helper,
      complete_vfork_done().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: wake up waiters when freeing unused kiocbs · 880641bb
      Jeff Moyer authored
      Bart Van Assche reported a hung fio process when either hot-removing
      storage or when interrupting the fio process itself.  The (pruned) call
      trace for the latter looks like so:
      
        fio             D 0000000000000001     0  6849   6848 0x00000004
         ffff880092541b88 0000000000000046 ffff880000000000 ffff88012fa11dc0
         ffff88012404be70 ffff880092541fd8 ffff880092541fd8 ffff880092541fd8
         ffff880128b894d0 ffff88012404be70 ffff880092541b88 000000018106f24d
        Call Trace:
          schedule+0x3f/0x60
          io_schedule+0x8f/0xd0
          wait_for_all_aios+0xc0/0x100
          exit_aio+0x55/0xc0
          mmput+0x2d/0x110
          exit_mm+0x10d/0x130
          do_exit+0x671/0x860
          do_group_exit+0x44/0xb0
          get_signal_to_deliver+0x218/0x5a0
          do_signal+0x65/0x700
          do_notify_resume+0x65/0x80
          int_signal+0x12/0x17
      
      The problem lies with the allocation batching code.  It will
      opportunistically allocate kiocbs, and then trim back the list of iocbs
      when there is not enough room in the completion ring to hold all of the
      events.
      
      In the case above, what happens is that the pruning back of events ends
      up freeing up the last active request and the context is marked as dead,
      so it is thus responsible for waking up waiters.  Unfortunately, the
      code does not check for this condition, so we end up with a hung task.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Cc: <stable@kernel.org>		[3.2.x only]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>