1. 29 Oct, 2014 18 commits
    • Mikulas Patocka's avatar
      mm/slab_common: don't check for duplicate cache names · 8aba7e0a
      Mikulas Patocka authored
      The SLUB cache merges caches with the same size and alignment and there
      was long standing bug with this behavior:
      
       - create the cache named "foo"
       - create the cache named "bar" (which is merged with "foo")
       - delete the cache named "foo" (but it stays allocated because "bar"
         uses it)
       - create the cache named "foo" again - it fails because the name "foo"
         is already used
      
      That bug was fixed in commit 69461747 ("slab_common: fix the check
      for duplicate slab names") by not warning on duplicate cache names when
      the SLUB subsystem is used.
      
      Recently, cache merging was implemented the with SLAB subsystem too, in
      12220dea ("mm/slab: support slab merge")).  Therefore we need stop
      checking for duplicate names even for the SLAB subsystem.
      
      This patch fixes the bug by removing the check.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8aba7e0a
    • Richard Weinberger's avatar
      ocfs2: fix d_splice_alias() return code checking · d3556bab
      Richard Weinberger authored
      d_splice_alias() can return a valid dentry, NULL or an ERR_PTR.
      Currently the code checks not for ERR_PTR and will cuase an oops in
      ocfs2_dentry_attach_lock().  Fix this by using IS_ERR_OR_NULL().
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d3556bab
    • Johannes Weiner's avatar
      mm: rmap: split out page_remove_file_rmap() · 8186eb6a
      Johannes Weiner authored
      page_remove_rmap() has too many branches on PageAnon() and is hard to
      follow.  Move the file part into a separate function.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8186eb6a
    • Johannes Weiner's avatar
      mm: memcontrol: fix missed end-writeback page accounting · d7365e78
      Johannes Weiner authored
      Commit 0a31bc97 ("mm: memcontrol: rewrite uncharge API") changed
      page migration to uncharge the old page right away.  The page is locked,
      unmapped, truncated, and off the LRU, but it could race with writeback
      ending, which then doesn't unaccount the page properly:
      
      test_clear_page_writeback()              migration
                                                 wait_on_page_writeback()
        TestClearPageWriteback()
                                                 mem_cgroup_migrate()
                                                   clear PCG_USED
        mem_cgroup_update_page_stat()
          if (PageCgroupUsed(pc))
            decrease memcg pages under writeback
      
        release pc->mem_cgroup->move_lock
      
      The per-page statistics interface is heavily optimized to avoid a
      function call and a lookup_page_cgroup() in the file unmap fast path,
      which means it doesn't verify whether a page is still charged before
      clearing PageWriteback() and it has to do it in the stat update later.
      
      Rework it so that it looks up the page's memcg once at the beginning of
      the transaction and then uses it throughout.  The charge will be
      verified before clearing PageWriteback() and migration can't uncharge
      the page as long as that is still set.  The RCU lock will protect the
      memcg past uncharge.
      
      As far as losing the optimization goes, the following test results are
      from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
      three times in a nested fashion, so that there are two negative passes
      that don't account but still go through the new transaction overhead.
      There is no actual difference:
      
       old:     33.195102545 seconds time elapsed       ( +-  0.01% )
       new:     33.199231369 seconds time elapsed       ( +-  0.03% )
      
      The time spent in page_remove_rmap()'s callees still adds up to the
      same, but the time spent in the function itself seems reduced:
      
           # Children      Self  Command        Shared Object       Symbol
       old:     0.12%     0.11%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap
       new:     0.12%     0.08%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: <stable@vger.kernel.org>	[3.17.x]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d7365e78
    • Johannes Weiner's avatar
      mm: page-writeback: inline account_page_dirtied() into single caller · 3a3c02ec
      Johannes Weiner authored
      A follow-up patch would have changed the call signature.  To save the
      trouble, just fold it instead.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: <stable@vger.kernel.org>	[3.17.x]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3a3c02ec
    • Jan Kara's avatar
      lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}() · ea5d05b3
      Jan Kara authored
      If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by
      a multiple of BITS_PER_LONG, they will try to shift a long value by
      BITS_PER_LONG bits which is undefined.  Change the functions to avoid
      the undefined shift.
      
      Coverity id: 1192175
      Coverity id: 1192174
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ea5d05b3
    • Pavel Machek's avatar
      drivers/rtc/rtc-bq32k.c: fix register value · 5a6e7599
      Pavel Machek authored
      Fix register value in bq32000 trickle charging.
      
      Mike reported that I'm using wrong value in one trickle-charging case,
      and after checking docs, I must admit he's right.
      Signed-off-by: default avatarPavel Machek <pavel@denx.de>
      Reported-by: default avatarMike Bremford <mike@bfo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5a6e7599
    • Yasuaki Ishimatsu's avatar
      memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() · 35dca71c
      Yasuaki Ishimatsu authored
      When hot adding the same memory after hot removal, the following
      messages are shown:
      
        WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426()
        ...
        Call Trace:
          dump_stack+0x46/0x58
          warn_slowpath_common+0x81/0xa0
          warn_slowpath_null+0x1a/0x20
          free_area_init_node+0x3fe/0x426
          hotadd_new_pgdat+0x90/0x110
          add_memory+0xd4/0x200
          acpi_memory_device_add+0x1aa/0x289
          acpi_bus_attach+0xfd/0x204
          acpi_bus_attach+0x178/0x204
          acpi_bus_scan+0x6a/0x90
          acpi_device_hotplug+0xe8/0x418
          acpi_hotplug_work_fn+0x1f/0x2b
          process_one_work+0x14e/0x3f0
          worker_thread+0x11b/0x510
          kthread+0xe1/0x100
          ret_from_fork+0x7c/0xb0
      
      The detaled explanation is as follows:
      
      When hot removing memory, pgdat is set to 0 in try_offline_node().  But
      if the pgdat is allocated by bootmem allocator, the clearing step is
      skipped.
      
      And when hot adding the same memory, the uninitialized pgdat is reused.
      But free_area_init_node() checks wether pgdat is set to zero.  As a
      result, free_area_init_node() hits WARN_ON().
      
      This patch clears pgdat which is allocated by bootmem allocator in
      try_offline_node().
      Signed-off-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      35dca71c
    • Marek Szyprowski's avatar
      drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock · eaf3a659
      Marek Szyprowski authored
      Fix unconditional initialization failure on non-exynos3250 SoCs.
      
      Commit df9e26d0 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
      introduced rtc source clock support, but also added initialization
      failure on SoCs, which doesn't need such clock.
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: default avatarChanwoo Choi <cw00.choi@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eaf3a659
    • Martin Schwidefsky's avatar
      kernel/kmod: fix use-after-free of the sub_info structure · 0baf2a4d
      Martin Schwidefsky authored
      Found this in the message log on a s390 system:
      
          BUG kmalloc-192 (Not tainted): Poison overwritten
          Disabling lock debugging due to kernel taint
          INFO: 0x00000000684761f4-0x00000000684761f7. First byte 0xff instead of 0x6b
          INFO: Allocated in call_usermodehelper_setup+0x70/0x128 age=71 cpu=2 pid=648
           __slab_alloc.isra.47.constprop.56+0x5f6/0x658
           kmem_cache_alloc_trace+0x106/0x408
           call_usermodehelper_setup+0x70/0x128
           call_usermodehelper+0x62/0x90
           cgroup_release_agent+0x178/0x1c0
           process_one_work+0x36e/0x680
           worker_thread+0x2f0/0x4f8
           kthread+0x10a/0x120
           kernel_thread_starter+0x6/0xc
           kernel_thread_starter+0x0/0xc
          INFO: Freed in call_usermodehelper_exec+0x110/0x1b8 age=71 cpu=2 pid=648
           __slab_free+0x94/0x560
           kfree+0x364/0x3e0
           call_usermodehelper_exec+0x110/0x1b8
           cgroup_release_agent+0x178/0x1c0
           process_one_work+0x36e/0x680
           worker_thread+0x2f0/0x4f8
           kthread+0x10a/0x120
           kernel_thread_starter+0x6/0xc
           kernel_thread_starter+0x0/0xc
      
      There is a use-after-free bug on the subprocess_info structure allocated
      by the user mode helper.  In case do_execve() returns with an error
      ____call_usermodehelper() stores the error code to sub_info->retval, but
      sub_info can already have been freed.
      
      Regarding UMH_NO_WAIT, the sub_info structure can be freed by
      __call_usermodehelper() before the worker thread returns from
      do_execve(), allowing memory corruption when do_execve() failed after
      exec_mmap() is called.
      
      Regarding UMH_WAIT_EXEC, the call to umh_complete() allows
      call_usermodehelper_exec() to continue which then frees sub_info.
      
      To fix this race the code needs to make sure that the call to
      call_usermodehelper_freeinfo() is always done after the last store to
      sub_info->retval.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0baf2a4d
    • Stanimir Varbanov's avatar
      drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc · c8d523a4
      Stanimir Varbanov authored
      Adds support for RTC device inside PM8941 PMIC.  The RTC in this PMIC
      have two register spaces.  Thus the rtc-pm8xxx is slightly reworked to
      reflect these differences.
      
      The register set for different PMIC chips are selected on DT compatible
      string base.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: simplify and fix locking in pm8xxx_rtc_set_time()]
      Signed-off-by: default avatarStanimir Varbanov <svarbanov@mm-sol.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Josh Cartwright <joshc@codeaurora.org>
      Cc: Stanimir Varbanov <svarbanov@mm-sol.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c8d523a4
    • David Rientjes's avatar
      mm, thp: fix collapsing of hugepages on madvise · 6d50e60c
      David Rientjes authored
      If an anonymous mapping is not allowed to fault thp memory and then
      madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
      collapse this memory into thp memory.
      
      This occurs because the madvise(2) handler for thp, hugepage_madvise(),
      clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
      until the final action of madvise_behavior().  This causes the
      khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
      the vma had previously had VM_NOHUGEPAGE set.
      
      Fix this by passing the correct vma flags to the khugepaged mm slot
      handler.  There's no chance khugepaged can run on this vma until after
      madvise_behavior() returns since we hold mm->mmap_sem.
      
      It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
      in hugepage_advise(), but I didn't want to introduce special case
      behavior into madvise_behavior().  I think it's best to just let it
      always set vma->vm_flags itself.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Reported-by: default avatarSuleiman Souhlal <suleiman@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6d50e60c
    • Marek Szyprowski's avatar
      drivers: of: add return value to of_reserved_mem_device_init() · 47f29df7
      Marek Szyprowski authored
      Driver calling of_reserved_mem_device_init() might be interested if the
      initialization has been successful or not, so add support for returning
      error code.
      
      This fixes a build warining caused by commit 7bfa5ab6 ("drivers:
      dma-coherent: add initialization from device tree"), which has been
      merged without this change and without fixing function return value.
      
      Fixes: 7bfa5ab6 ("drivers: dma-coherent: add initialization from device tree")
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Josh Cartwright <joshc@codeaurora.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      47f29df7
    • Yu Zhao's avatar
      mm: free compound page with correct order · 5ddacbe9
      Yu Zhao authored
      Compound page should be freed by put_page() or free_pages() with correct
      order.  Not doing so will cause tail pages leaked.
      
      The compound order can be obtained by compound_order() or use
      HPAGE_PMD_ORDER in our case.  Some people would argue the latter is
      faster but I prefer the former which is more general.
      
      This bug was observed not just on our servers (the worst case we saw is
      11G leaked on a 48G machine) but also on our workstations running Ubuntu
      based distro.
      
        $ cat /proc/vmstat  | grep thp_zero_page_alloc
        thp_zero_page_alloc 55
        thp_zero_page_alloc_failed 0
      
      This means there is (thp_zero_page_alloc - 1) * (2M - 4K) memory leaked.
      
      Fixes: 97ae1749 ("thp: implement refcounting for huge zero page")
      Signed-off-by: default avatarYu Zhao <yuzhao@google.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Cc: <stable@vger.kernel.org>	[3.8+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5ddacbe9
    • Riku Voipio's avatar
      gcov: add ARM64 to GCOV_PROFILE_ALL · f601de20
      Riku Voipio authored
      Following up the arm testing of gcov, turns out gcov on ARM64 works fine
      as well.  Only change needed is adding ARM64 to Kconfig depends.
      
      Tested with qemu and mach-virt
      Signed-off-by: default avatarRiku Voipio <riku.voipio@linaro.org>
      Acked-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f601de20
    • Jerry Hoemann's avatar
      fsnotify: next_i is freed during fsnotify_unmount_inodes. · 6424babf
      Jerry Hoemann authored
      During file system stress testing on 3.10 and 3.12 based kernels, the
      umount command occasionally hung in fsnotify_unmount_inodes in the
      section of code:
      
                      spin_lock(&inode->i_lock);
                      if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
                              spin_unlock(&inode->i_lock);
                              continue;
                      }
      
      As this section of code holds the global inode_sb_list_lock, eventually
      the system hangs trying to acquire the lock.
      
      Multiple crash dumps showed:
      
      The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
      back at itself.  As this is not the value of list upon entry to the
      function, the kernel never exits the loop.
      
      To help narrow down problem, the call to list_del_init in
      inode_sb_list_del was changed to list_del.  This poisons the pointers in
      the i_sb_list and causes a kernel to panic if it transverse a freed
      inode.
      
      Subsequent stress testing paniced in fsnotify_unmount_inodes at the
      bottom of the list_for_each_entry_safe loop showing next_i had become
      free.
      
      We believe the root cause of the problem is that next_i is being freed
      during the window of time that the list_for_each_entry_safe loop
      temporarily releases inode_sb_list_lock to call fsnotify and
      fsnotify_inode_delete.
      
      The code in fsnotify_unmount_inodes attempts to prevent the freeing of
      inode and next_i by calling __iget.  However, the code doesn't do the
      __iget call on next_i
      
      	if i_count == 0 or
      	if i_state & (I_FREEING | I_WILL_FREE)
      
      The patch addresses this issue by advancing next_i in the above two cases
      until we either find a next_i which we can __iget or we reach the end of
      the list.  This makes the handling of next_i more closely match the
      handling of the variable "inode."
      
      The time to reproduce the hang is highly variable (from hours to days.) We
      ran the stress test on a 3.10 kernel with the proposed patch for a week
      without failure.
      
      During list_for_each_entry_safe, next_i is becoming free causing
      the loop to never terminate.  Advance next_i in those cases where
      __iget is not done.
      Signed-off-by: default avatarJerry Hoemann <jerry.hoemann@hp.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ken Helias <kenhelias@firemail.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6424babf
    • Joonsoo Kim's avatar
      mm/compaction.c: avoid premature range skip in isolate_migratepages_range · 6ea41c0c
      Joonsoo Kim authored
      Commit edc2ca61 ("mm, compaction: move pageblock checks up from
      isolate_migratepages_range()") commonizes isolate_migratepages variants
      and make them use isolate_migratepages_block().
      
      isolate_migratepages_block() could stop the execution when enough pages
      are isolated, but, there is no code in isolate_migratepages_range() to
      handle this case.  In the result, even if isolate_migratepages_block()
      returns prematurely without checking all pages in the range,
      
      isolate_migratepages_block() is called repeately on the following
      pageblock and some pages in the previous range are skipped to check.
      Then, CMA is failed frequently due to this fact.
      
      To fix this problem, this patch let isolate_migratepages_range() know
      the situation that enough pages are isolated and stop the isolation in
      that case.
      
      Note that isolate_migratepages() has no such problem, because, it always
      stops the isolation after just one call of isolate_migratepages_block().
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6ea41c0c
    • Wang Nan's avatar
      cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. · 401507d6
      Wang Nan authored
      Commit ff7ee93f ("cgroup/kmemleak: Annotate alloc_page() for cgroup
      allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but
      corresponding kmemleak_free() is missing, which makes kmemleak be
      wrongly disabled after memory offlining.  Log is pasted at the end of
      this commit message.
      
      This patch add kmemleak_free() into free_page_cgroup().  During page
      offlining, this patch removes corresponding entries in kmemleak rbtree.
      After that, the freed memory can be allocated again by other subsystems
      without killing kmemleak.
      
        bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak
      
        Offlined Pages 32768
        kmemleak: Cannot insert 0xffff880016969000 into the object search tree (overlaps existing)
        CPU: 0 PID: 412 Comm: sleep Not tainted 3.17.0-rc5+ #86
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        Call Trace:
          dump_stack+0x46/0x58
          create_object+0x266/0x2c0
          kmemleak_alloc+0x26/0x50
          kmem_cache_alloc+0xd3/0x160
          __sigqueue_alloc+0x49/0xd0
          __send_signal+0xcb/0x410
          send_signal+0x45/0x90
          __group_send_sig_info+0x13/0x20
          do_notify_parent+0x1bb/0x260
          do_exit+0x767/0xa40
          do_group_exit+0x44/0xa0
          SyS_exit_group+0x17/0x20
          system_call_fastpath+0x16/0x1b
      
        kmemleak: Kernel memory leak detector disabled
        kmemleak: Object 0xffff880016900000 (size 524288):
        kmemleak:   comm "swapper/0", pid 0, jiffies 4294667296
        kmemleak:   min_count = 0
        kmemleak:   count = 0
        kmemleak:   flags = 0x1
        kmemleak:   checksum = 0
        kmemleak:   backtrace:
              log_early+0x63/0x77
              kmemleak_alloc+0x4b/0x50
              init_section_page_cgroup+0x7f/0xf5
              page_cgroup_init+0xc5/0xd0
              start_kernel+0x333/0x408
              x86_64_start_reservations+0x2a/0x2c
              x86_64_start_kernel+0xf5/0xfc
      
      Fixes: ff7ee93f (cgroup/kmemleak: Annotate alloc_page() for cgroup allocations)
      Signed-off-by: default avatarWang Nan <wangnan0@huawei.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>	[3.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      401507d6
  2. 28 Oct, 2014 4 commits
    • Linus Torvalds's avatar
      Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux · 9f76628d
      Linus Torvalds authored
      Pull two nfsd fixes from Bruce Fields:
       "One regression from the 3.16 xdr rewrite, one an older bug exposed by
        a separate bug in the client's new SEEK code"
      
      * 'for-3.18' of git://linux-nfs.org/~bfields/linux:
        nfsd4: fix crash on unknown operation number
        nfsd4: fix response size estimation for OP_SEQUENCE
      9f76628d
    • Linus Torvalds's avatar
      Merge tag 'trace-fixes-v3.18-rc1' of... · 6234056e
      Linus Torvalds authored
      Merge tag 'trace-fixes-v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull ftrace trampoline accounting fixes from Steven Rostedt:
       "Adding the new code for 3.19, I discovered a couple of minor bugs with
        the accounting of the ftrace_ops trampoline logic.
      
        One was that the old hash was not updated before calling the modify
        code for an ftrace_ops.  The second bug was what let the first bug go
        unnoticed, as the update would check the current hash for all
        ftrace_ops (where it should only check the old hash for modified
        ones).  This let things work when only one ftrace_ops was registered
        to a function, but could break if more than one was registered
        depending on the order of the look ups.
      
        The worse thing that can happen if this bug triggers is that the
        ftrace self checks would find an anomaly and shut itself down"
      
      * tag 'trace-fixes-v3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Fix checking of trampoline ftrace_ops in finding trampoline
        ftrace: Set ops->old_hash on modifying what an ops hooks to
      6234056e
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · 6e2028aa
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
       "A couple of ARM fixes.
      
        We fix some printk formats for ptrdiff_t quantities which cause GCC
        4.9 to complain, and we also blacklist known buggy GCC 4.8.x compilers
        as their miscompilation is serious enough to cause filesystem
        corruption, even through many distros have fixed their versions"
      
      * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
        ARM: fix some printk formats
        ARM: Blacklist GCC 4.8.0 to GCC 4.8.2 - PR58854
      6e2028aa
    • Will Deacon's avatar
      zap_pte_range: update addr when forcing flush after TLB batching faiure · ce9ec37b
      Will Deacon authored
      When unmapping a range of pages in zap_pte_range, the page being
      unmapped is added to an mmu_gather_batch structure for asynchronous
      freeing. If we run out of space in the batch structure before the range
      has been completely unmapped, then we break out of the loop, force a
      TLB flush and free the pages that we have batched so far. If there are
      further pages to unmap, then we resume the loop where we left off.
      
      Unfortunately, we forget to update addr when we break out of the loop,
      which causes us to truncate the range being invalidated as the end
      address is exclusive. When we re-enter the loop at the same address, the
      page has already been freed and the pte_present test will fail, meaning
      that we do not reconsider the address for invalidation.
      
      This patch fixes the problem by incrementing addr by the PAGE_SIZE
      before breaking out of the loop on batch failure.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ce9ec37b
  3. 27 Oct, 2014 7 commits
  4. 26 Oct, 2014 4 commits
    • Linus Torvalds's avatar
      Linux 3.18-rc2 · cac7f242
      Linus Torvalds authored
      cac7f242
    • Linus Torvalds's avatar
      Merge tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 88e23761
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "Another week, another small batch of fixes.
      
        Most of these make zynq, socfpga and sunxi platforms work a bit
        better:
      
         - due to new requirements for regulators, DWMMC on socfpga broke past
           v3.17
         - SMP spinup fix for socfpga
         - a few DT fixes for zynq
         - another option (FIXED_REGULATOR) for sunxi is needed that used to
           be selected by other options but no longer is.
         - a couple of small DT fixes for at91
         - ...and a couple for i.MX"
      
      * tag 'armsoc-for-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: dts: imx28-evk: Let i2c0 run at 100kHz
        ARM: i.MX6: Fix "emi" clock name typo
        ARM: multi_v7_defconfig: enable CONFIG_MMC_DW_ROCKCHIP
        ARM: sunxi_defconfig: enable CONFIG_REGULATOR_FIXED_VOLTAGE
        ARM: dts: socfpga: Add a 3.3V fixed regulator node
        ARM: dts: socfpga: Fix SD card detect
        ARM: dts: socfpga: rename gpio nodes
        ARM: at91/dt: sam9263: fix PLLB frequencies
        power: reset: at91-reset: fix power down register
        MAINTAINERS: add atmel ssc driver maintainer entry
        arm: socfpga: fix fetching cpu1start_addr for SMP
        ARM: zynq: DT: trivial: Fix mc node
        ARM: zynq: DT: Add cadence watchdog node
        ARM: zynq: DT: Add missing reference for memory-controller
        ARM: zynq: DT: Add missing reference for ADC
        ARM: zynq: DT: Add missing address for L2 pl310
        ARM: zynq: DT: Remove 222 MHz OPP
        ARM: zynq: DT: Fix GEM register area size
      88e23761
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d1e14f1d
      Linus Torvalds authored
      Pull vfs updates from Al Viro:
       "overlayfs merge + leak fix for d_splice_alias() failure exits"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        overlayfs: embed middle into overlay_readdir_data
        overlayfs: embed root into overlay_readdir_data
        overlayfs: make ovl_cache_entry->name an array instead of pointer
        overlayfs: don't hold ->i_mutex over opening the real directory
        fix inode leaks on d_splice_alias() failure exits
        fs: limit filesystem stacking depth
        overlay: overlay filesystem documentation
        overlayfs: implement show_options
        overlayfs: add statfs support
        overlay filesystem
        shmem: support RENAME_WHITEOUT
        ext4: support RENAME_WHITEOUT
        vfs: add RENAME_WHITEOUT
        vfs: add whiteout support
        vfs: export check_sticky()
        vfs: introduce clone_private_mount()
        vfs: export __inode_permission() to modules
        vfs: export do_splice_direct() to modules
        vfs: add i_op->dentry_open()
      d1e14f1d
    • Olof Johansson's avatar
      Merge tag 'imx-fixes-3.18' of... · efc176a8
      Olof Johansson authored
      Merge tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
      
      Merge "ARM: imx: fixes for 3.18" from Shawn Guo:
      
      The i.MX fixes for 3.18:
       - Revert one patch which increases I2C bus frequency on imx28-evk
       - Fix a typo on imx6q EIM clock name
      
      * tag 'imx-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
        ARM: dts: imx28-evk: Let i2c0 run at 100kHz
        ARM: i.MX6: Fix "emi" clock name typo
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      efc176a8
  5. 25 Oct, 2014 6 commits
  6. 24 Oct, 2014 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix checking of trampoline ftrace_ops in finding trampoline · 4fc40904
      Steven Rostedt (Red Hat) authored
      When modifying code, ftrace has several checks to make sure things
      are being done correctly. One of them is to make sure any code it
      modifies is exactly what it expects it to be before it modifies it.
      In order to do so with the new trampoline logic, it must be able
      to find out what trampoline a function is hooked to in order to
      see if the code that hooks to it is what's expected.
      
      The logic to find the trampoline from a record (accounting descriptor
      for a function that is hooked) needs to only look at the "old_hash"
      of an ops that is being modified. The old_hash is the list of function
      an ops is hooked to before its update. Since a record would only be
      pointing to an ops that is being modified if it was already hooked
      before.
      
      Currently, it can pick a modified ops based on its new functions it
      will be hooked to, and this picks the wrong trampoline and causes
      the check to fail, disabling ftrace.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      
      ftrace: squash into ordering of ops for modification
      4fc40904