1. 07 Oct, 2016 4 commits
    • zhong jiang's avatar
      mm,ksm: fix endless looping in allocating memory when ksm enable · 3d0183af
      zhong jiang authored
      commit 5b398e41 upstream.
      
      I hit the following hung task when runing a OOM LTP test case with 4.1
      kernel.
      
      Call trace:
      [<ffffffc000086a88>] __switch_to+0x74/0x8c
      [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
      [<ffffffc000a1c09c>] schedule+0x3c/0x94
      [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
      [<ffffffc000a1e32c>] down_write+0x64/0x80
      [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
      [<ffffffc0000be650>] mmput+0x118/0x11c
      [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
      [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
      [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
      [<ffffffc000089fcc>] do_signal+0x1d8/0x450
      [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
      
      The oom victim cannot terminate because it needs to take mmap_sem for
      write while the lock is held by ksmd for read which loops in the page
      allocator
      
      ksm_do_scan
      	scan_get_next_rmap_item
      		down_read
      		get_next_rmap_item
      			alloc_rmap_item   #ksmd will loop permanently.
      
      There is no way forward because the oom victim cannot release any memory
      in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
      this problem because it would release the memory asynchronously.
      Nevertheless we can relax alloc_rmap_item requirements and use
      __GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
      would just retry later after the lock got dropped.
      
      Such a patch would be also easy to backport to older stable kernels which
      do not have oom_reaper.
      
      While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
      by the allocation failure.
      
      Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.comSigned-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Suggested-by: default avatarHugh Dickins <hughd@google.com>
      Suggested-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3d0183af
    • Karl Beldan's avatar
      mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl · ebc9785b
      Karl Beldan authored
      commit f6d7c1b5 upstream.
      
      This fixes subpage writes when using 4-bit HW ECC.
      
      There has been numerous reports about ECC errors with devices using this
      driver for a while.  Also the 4-bit ECC has been reported as broken with
      subpages in [1] and with 16 bits NANDs in the driver and in mach* board
      files both in mainline and in the vendor BSPs.
      
      What I saw with 4-bit ECC on a 16bits NAND (on an LCDK) which got me to
      try reinitializing the ECC engine:
      - R/W on whole pages properly generates/checks RS code
      - try writing the 1st subpage only of a blank page, the subpage is well
        written and the RS code properly generated, re-reading the same page
        the HW detects some ECC error, reading the same page again no ECC
        error is detected
      
      Note that the ECC engine is already reinitialized in the 1-bit case.
      
      Tested on my LCDK with UBI+UBIFS using subpages.
      This could potentially get rid of the issue workarounded in [1].
      
      [1] 28c015a9 ("mtd: davinci-nand: disable subpage write for keystone-nand")
      
      Fixes: 6a4123e5 ("mtd: nand: davinci_nand, 4-bit ECC for smallpage")
      Signed-off-by: default avatarKarl Beldan <kbeldan@baylibre.com>
      Acked-by: default avatarBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ebc9785b
    • Michael Ellerman's avatar
      powerpc: Add macros for the ibm_architecture_vec[] lengths · 9dcc5e15
      Michael Ellerman authored
      commit e8a4fd0a upstream.
      
      The encoding of the lengths in the ibm_architecture_vec array is
      "interesting" to say the least. It's non-obvious how the number of bytes
      we provide relates to the length value.
      
      In fact we already got it wrong once, see 11e9ed43 "Fix up
      ibm_architecture_vec definition".
      
      So add some macros to make it (hopefully) clearer. These at least have
      the property that the integer present in the code is equal to the number
      of bytes that follows it.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: default avatarStewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9dcc5e15
    • Peter Zijlstra's avatar
      sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule() · ac080ae1
      Peter Zijlstra authored
      commit ecf7d01c upstream.
      
      Oleg noticed that its possible to falsely observe p->on_cpu == 0 such
      that we'll prematurely continue with the wakeup and effectively run p on
      two CPUs at the same time.
      
      Even though the overlap is very limited; the task is in the middle of
      being scheduled out; it could still result in corruption of the
      scheduler data structures.
      
              CPU0                            CPU1
      
              set_current_state(...)
      
              <preempt_schedule>
                context_switch(X, Y)
                  prepare_lock_switch(Y)
                    Y->on_cpu = 1;
                  finish_lock_switch(X)
                    store_release(X->on_cpu, 0);
      
                                              try_to_wake_up(X)
                                                LOCK(p->pi_lock);
      
                                                t = X->on_cpu; // 0
      
                context_switch(Y, X)
                  prepare_lock_switch(X)
                    X->on_cpu = 1;
                  finish_lock_switch(Y)
                    store_release(Y->on_cpu, 0);
              </preempt_schedule>
      
              schedule();
                deactivate_task(X);
                X->on_rq = 0;
      
                                                if (X->on_rq) // false
      
                                                if (t) while (X->on_cpu)
                                                  cpu_relax();
      
                context_switch(X, ..)
                  finish_lock_switch(X)
                    store_release(X->on_cpu, 0);
      
      Avoid the load of X->on_cpu being hoisted over the X->on_rq load.
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ac080ae1
  2. 06 Oct, 2016 34 commits
  3. 29 Sep, 2016 2 commits