- 24 Jan, 2014 40 commits
-
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Richard Weinberger <richard@nod.at> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Tested-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Signed-off-by: Mark Salter <msalter@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mark Salter authored
Many architectures provide an asm/fixmap.h which defines support for compile-time 'special' virtual mappings which need to be made before paging_init() has run. This support is also used for early ioremap on x86. Much of this support is identical across the architectures. This patch consolidates all of the common bits into asm-generic/fixmap.h which is intended to be included from arch/*/include/asm/fixmap.h. Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: James Hogan <james.hogan@imgtec.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jonas Bonn <jonas.bonn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Younger Liu authored
In get_mapping_page(), after calling find_or_create_page(), the return value should be checked. This patch has been provided: http://www.spinics.net/lists/linux-fsdevel/msg66948.html but not been applied now. Signed-off-by: Younger Liu <liuyiyang@hisense.com> Cc: Younger Liu <younger.liucn@gmail.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Reviewed-by: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Jörn Engel <joern@logfs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Fabian Frederick authored
RAM block device support module name changed to brd.ko some years ago with an "rd" alias to match previous module implementation. This patch updates its Kconfig definition. Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dan Carpenter authored
There is a bug in omap2_mbox_probe() where we try do: mbox->irq = platform_get_irq(pdev, info->irq_id); if (mbox->irq < 0) { The problem is that mbox->irq is unsigned so the error handling doesn't work. I've changed it to a signed integer. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Suman Anna <s-anna@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Omar Ramirez Luna <omar.ramirez@copitl.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Geert Uytterhoeven authored
Now all 64-bit architectures have been converted to int-ll64.h, we can remove int-l64.h in kernelspace. For backwards compatibility, alpha, ia64, mips64, and powerpc64 still use int-l64.h in userspace. This is the (reworked for UAPI) non-documentation part of more than two year old "asm/types.h: All architectures use int-ll64.h in kernelspace" (https://lkml.org/lkml/2011/8/13/104) Since <asm/types.h> (from include/uapi/asm-generic/types.h) is used for both kernel and user space, include/asm-generic/int-ll64.h cannot just become include/asm-generic/types.h, as Arnd suggested. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Cyrill Gorcunov authored
The VM_SOFTDIRTY bit affects vma merge routine: if two VMAs has all bits in vm_flags matched except dirty bit the kernel can't longer merge them and this forces the kernel to generate new VMAs instead. It finally may lead to the situation when userspace application reaches vm.max_map_count limit and get crashed in worse case | (gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes | | (file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load: gimp_wire_read(): error | xinit: connection to X server lost | | waiting for X server to shut down | /usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup | /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup | /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup https://bugzilla.kernel.org/show_bug.cgi?id=67651 https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0 Initial problem came from missed VM_SOFTDIRTY in do_brk() routine but even if we would set up VM_SOFTDIRTY here, there is still a way to prevent VMAs from merging: one can call | echo 4 > /proc/$PID/clear_refs and clear all VM_SOFTDIRTY over all VMAs presented in memory map, then new do_brk() will try to extend old VMA and finds that dirty bit doesn't match thus new VMA will be generated. As discussed with Pavel, the right approach should be to ignore VM_SOFTDIRTY bit when we're trying to merge VMAs and if merge successed we mark extended VMA with dirty bit where needed. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reported-by: Bastian Hougaard <gnome@rvzt.net> Reported-by: Mel Gorman <mgorman@suse.de> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Fengguang Wu authored
mm/rmap.c:851:9-10: WARNING: return of 0/1 in function 'invalid_mkclean_vma' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: coccinelle/misc/boolreturn.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jamie Liu authored
In the second half of scan_swap_map()'s scan loop, offset is set to si->lowest_bit and then incremented before entering the loop for the first time, causing si->swap_map[si->lowest_bit] to be skipped. Signed-off-by: Jamie Liu <jamieliu@google.com> Cc: Shaohua Li <shli@fusionio.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vladimir Davydov authored
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Mel Gorman authored
Developers occasionally try and optimise PFN scanners by using page_order but miss that in general it requires zone->lock. This has happened twice for compaction.c and rejected both times. This patch clarifies the documentation of page_order and adds a note to compaction.c why page_order is not used. [akpm@linux-foundation.org: tweaks] [lauraa@codeaurora.org: Corrected a page_zone(page)->lock reference] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Commit 19f39402 ("memcg: simplify mem_cgroup_iter") has reorganized mem_cgroup_iter code in order to simplify it. A part of that change was dropping an optimization which didn't call css_tryget on the root of the walked tree. The patch however didn't change the css_put part in mem_cgroup_iter which excludes root. This wasn't an issue at the time because __mem_cgroup_iter_next bailed out for root early without taking a reference as cgroup iterators (css_next_descendant_pre) didn't visit root themselves. Nevertheless cgroup iterators have been reworked to visit root by commit bd8815a6 ("cgroup: make css_for_each_descendant() and friends include the origin css in the iteration") when the root bypass have been dropped in __mem_cgroup_iter_next. This means that css_put is not called for root and so css along with mem_cgroup and other cgroup internal object tied by css lifetime are never freed. Fix the issue by reintroducing root check in __mem_cgroup_iter_next and do not take css reference for it. This reference counting magic protects us also from another issue, an endless loop reported by Hugh Dickins when reclaim races with root removal and css_tryget called by iterator internally would fail. There would be no other nodes to visit so __mem_cgroup_iter_next would return NULL and mem_cgroup_iter would interpret it as "start looping from root again" and so mem_cgroup_iter would loop forever internally. Signed-off-by: Michal Hocko <mhocko@suse.cz> Reported-by: Hugh Dickins <hughd@google.com> Tested-by: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Greg Thelen <gthelen@google.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Hugh has reported an endless loop when the hardlimit reclaim sees the same group all the time. This might happen when the reclaim races with the memcg removal. shrink_zone [rmdir root] mem_cgroup_iter(root, NULL, reclaim) // prev = NULL rcu_read_lock() mem_cgroup_iter_load last_visited = iter->last_visited // gets root || NULL css_tryget(last_visited) // failed last_visited = NULL [1] memcg = root = __mem_cgroup_iter_next(root, NULL) mem_cgroup_iter_update iter->last_visited = root; reclaim->generation = iter->generation mem_cgroup_iter(root, root, reclaim) // prev = root rcu_read_lock mem_cgroup_iter_load last_visited = iter->last_visited // gets root css_tryget(last_visited) // failed [1] The issue seemed to be introduced by commit 5f578161 ("memcg: relax memcg iter caching") which has replaced unconditional css_get/css_put by css_tryget/css_put for the cached iterator. This patch fixes the issue by skipping css_tryget on the root of the tree walk in mem_cgroup_iter_load and symmetrically doesn't release it in mem_cgroup_iter_update. Signed-off-by: Michal Hocko <mhocko@suse.cz> Reported-by: Hugh Dickins <hughd@google.com> Tested-by: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Greg Thelen <gthelen@google.com> Cc: <stable@vger.kernel.org> [3.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
David Rientjes authored
When two threads have the same badness score, it's preferable to kill the thread group leader so that the actual process name is printed to the kernel log rather than the thread group name which may be shared amongst several processes. This was the behavior when select_bad_process() used to do for_each_process(), but it now iterates threads instead and leads to ambiguity. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Xishi Qiu authored
Add "kmemcheck=xx" to Documentation/kernel-parameters.txt. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Pekka Enberg <penberg@kernel.org> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
It is surprising that the mem_cgroup iterator can return memcgs which have not yet been fully initialized. By accident (or trial and error?) this appears not to present an actual problem; but it may be better to prevent such surprises, by skipping memcgs not yet online. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Hugh Dickins authored
Shorten mem_cgroup_reclaim_iter.last_dead_count from unsigned long to int: it's assigned from an int and compared with an int, and adjacent to an unsigned int: so there's no point to it being unsigned long, which wasted 104 bytes in every mem_cgroup_per_zone. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Paul Gortmaker authored
Code that is obj-y (always built-in) or dependent on a bool Kconfig (built-in or absent) can never be modular. So using module_init as an alias for __initcall can be somewhat misleading. Fix these up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. The audit targets the following module_init users for change: mm/ksm.c bool KSM mm/mmap.c bool MMU mm/huge_memory.c bool TRANSPARENT_HUGEPAGE mm/mmu_notifier.c bool MMU_NOTIFIER Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of subsys_initcall (which makes sense for these files) will thus change this registration from level 6-device to level 4-subsys (i.e. slightly earlier). However no observable impact of that difference has been observed during testing. One might think that core_initcall (l2) or postcore_initcall (l3) would be more appropriate for anything in mm/ but if we look at some actual init functions themselves, we see things like: mm/huge_memory.c --> hugepage_init --> hugepage_init_sysfs mm/mmap.c --> init_user_reserve --> sysctl_user_reserve_kbytes mm/ksm.c --> ksm_init --> sysfs_create_group and hence the choice of subsys_initcall (l4) seems reasonable, and at the same time minimizes the risk of changing the priority too drastically all at once. We can adjust further in the future. Also, several instances of missing ";" at EOL are fixed. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Paul Gortmaker authored
The use of __initcall is to be eventually replaced by choosing one from the prioritized groupings laid out in init.h header: pure_initcall 0 core_initcall 1 postcore_initcall 2 arch_initcall 3 subsys_initcall 4 fs_initcall 5 device_initcall 6 late_initcall 7 In the interim, all __initcall are mapped onto device_initcall, which as can be seen above, comes quite late in the ordering. Currently the mm_kobj is created with __initcall in mm_sysfs_init(). This means that any other initcalls that want to reference the mm_kobj have to be device_initcall (or later), otherwise we will for example, trip the BUG_ON(!kobj) in sysfs's internal_create_group(). This unfairly restricts those users; for example something that clearly makes sense to be an arch_initcall will not be able to choose that. However, upon examination, it is only this way for historical reasons (i.e. simply not reprioritized yet). We see that sysfs is ready quite earlier in init/main.c via: vfs_caches_init |_ mnt_init |_ sysfs_init well ahead of the processing of the prioritized calls listed above. So we can recategorize mm_sysfs_init to be a pure_initcall, which in turn allows any mm_kobj initcall users a wider range (1 --> 7) of initcall priorities to choose from. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Han Pingtian authored
min_free_kbytes may be raised during THP's initialization. Sometimes, this will change the value which was set by the user. Showing this message will clarify this confusion. Only show this message when changing a value which was set by the user according to Michal Hocko's suggestion. Show the old value of min_free_kbytes according to Dave Hansen's suggestion. This will give user the chance to restore old value of min_free_kbytes. Signed-off-by: Han Pingtian <hanpt@linux.vnet.ibm.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Nathan Zimmer authored
We don't need to do register_memory_resource() under lock_memory_hotplug() since it has its own lock and doesn't make any callbacks. Also register_memory_resource return NULL on failure so we don't have anything to cleanup at this point. The reason for this rfc is I was doing some experiments with hotplugging of memory on some of our larger systems. While it seems to work, it can be quite slow. With some preliminary digging I found that lock_memory_hotplug is clearly ripe for breakup. It could be broken up per nid or something but it also covers the online_page_callback. The online_page_callback shouldn't be very hard to break out. Also there is the issue of various structures(wmarks come to mind) that are only updated under the lock_memory_hotplug that would need to be dealt with. Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Hedi <hedi@sgi.com> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Philipp Hachtmann authored
get_allocated_memblock_reserved_regions_info() should work if it is compiled in. Extended the ifdef around get_allocated_memblock_memory_regions_info() to include get_allocated_memblock_reserved_regions_info() as well. Similar changes in nobootmem.c/free_low_memory_core_early() where the two functions are called. [akpm@linux-foundation.org: cleanup] Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Cc: qiuxishi <qiuxishi@huawei.com> Cc: David Howells <dhowells@redhat.com> Cc: Daeseok Youn <daeseok.youn@gmail.com> Cc: Jiang Liu <liuj97@gmail.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vladimir Davydov authored
If a shrinker is not NUMA-aware, shrink_slab() should call it exactly once with nid=0, but currently it is not true: if node 0 is not set in the nodemask or if it is not online, we will not call such shrinkers at all. As a result some slabs will be left untouched under some circumstances. Let us fix it. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Reported-by: Dave Chinner <dchinner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vladimir Davydov authored
When reclaiming kmem, we currently don't scan slabs that have less than batch_size objects (see shrink_slab_node()): while (total_scan >= batch_size) { shrinkctl->nr_to_scan = batch_size; shrinker->scan_objects(shrinker, shrinkctl); total_scan -= batch_size; } If there are only a few shrinkers available, such a behavior won't cause any problems, because the batch_size is usually small, but if we have a lot of slab shrinkers, which is perfectly possible since FS shrinkers are now per-superblock, we can end up with hundreds of megabytes of practically unreclaimable kmem objects. For instance, mounting a thousand of ext2 FS images with a hundred of files in each and iterating over all the files using du(1) will result in about 200 Mb of FS caches that cannot be dropped even with the aid of the vm.drop_caches sysctl! This problem was initially pointed out by Glauber Costa [*]. Glauber proposed to fix it by making the shrink_slab() always take at least one pass, to put it simply, turning the scan loop above to a do{}while() loop. However, this proposal was rejected, because it could result in more aggressive and frequent slab shrinking even under low memory pressure when total_scan is naturally very small. This patch is a slightly modified version of Glauber's approach. Similarly to Glauber's patch, it makes shrink_slab() scan less than batch_size objects, but only if the total number of objects we want to scan (total_scan) is greater than the total number of objects available (max_pass). Since total_scan is biased as half max_pass if the current delta change is small: if (delta < max_pass / 4) total_scan = min(total_scan, max_pass / 2); this is only possible if we are scanning at high prio. That said, this patch shouldn't change the vmscan behaviour if the memory pressure is low, but if we are tight on memory, we will do our best by trying to reclaim all available objects, which sounds reasonable. [*] http://www.spinics.net/lists/cgroups/msg06913.htmlSigned-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Wanpeng Li authored
Commit 7851a45c ("mm: numa: Copy cpupid on page migration") copiess over the cpupid at page migration time. It is unnecessary to set it again in migrate_misplaced_transhuge_page(). Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jianguo Wu authored
Two cleanups: 1. remove redundant codes for hugetlb pages. 2. end = pmd_addr_end(addr, end) restricts [addr, end) within PMD_SIZE, this may increase do_mincore() calls, remove it. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: qiuxishi <qiuxishi@huawei.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Shawn Guo authored
Compiling a C file which includes genalloc.h but without spinlock_types.h being included before, we will see the compile error below. include/linux/genalloc.h:54:2: error: unknown type name `spinlock_t' Include spinlock_types.h from genalloc.h to fix the problem. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Vinayak Menon authored
When irq, preempt and lockdep fields are printed (field 3 in the example below) in the trace output, the script fails. An example entry: kswapd0-610 [000] ...1 158.112152: mm_vmscan_kswapd_wake: nid=0 order=0 Signed-off-by: Vinayak Menon <vinayakm.list@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Han Pingtian authored
If echo -1 > /proc/vm/sys/min_free_kbytes, the system will hang. Changing proc_dointvec() to proc_dointvec_minmax() in the min_free_kbytes_sysctl_handler() can prevent this to happen. mhocko said: : You can still do echo $BIG_VALUE > /proc/vm/sys/min_free_kbytes and make : your machine unusable but I agree that proc_dointvec_minmax is more : suitable here as we already have: : : .proc_handler = min_free_kbytes_sysctl_handler, : .extra1 = &zero, : : It used to work properly but then 6fce56ec ("sysctl: Remove references : to ctl_name and strategy from the generic sysctl table") has removed : sysctl_intvec strategy and so extra1 is ignored. Signed-off-by: Han Pingtian <hanpt@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Michal Hocko authored
Commit 11c731e8 ("mm/mempolicy: fix !vma in new_vma_page()") has removed BUG_ON(!vma) from new_vma_page which is partially correct because page_address_in_vma will return EFAULT for non-linear mappings and at least shared shmem might be mapped this way. The patch also tried to prevent NULL ptr for hugetlb pages which is not correct AFAICS because hugetlb pages cannot be mapped as VM_NONLINEAR and other conditions in page_address_in_vma seem to be legit and catch real bugs. This patch restores BUG_ON for PageHuge to catch potential issues when the to-be-migrated page is not setup properly. Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Naoya Horiguchi authored
After thp split in hwpoison_user_mappings(), we hold page lock on the raw error page only between try_to_unmap, hence we are in danger of race condition. I found in the RHEL7 MCE-relay testing that we have "bad page" error when a memory error happens on a thp tail page used by qemu-kvm: Triggering MCE exception on CPU 10 mce: [Hardware Error]: Machine check events logged MCE exception done on CPU 10 MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption MCE 0x38c535: dirty LRU page recovery: Recovered qemu-kvm[8418]: segfault at 20 ip 00007ffb0f0f229a sp 00007fffd6bc5240 error 4 in qemu-kvm[7ffb0ef14000+420000] BUG: Bad page state in process qemu-kvm pfn:38c400 page:ffffea000e310000 count:0 mapcount:0 mapping: (null) index:0x7ffae3c00 page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked) Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ... CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G M -------------- 3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1 Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011 Call Trace: dump_stack+0x19/0x1b bad_page.part.59+0xcf/0xe8 free_pages_prepare+0x148/0x160 free_hot_cold_page+0x31/0x140 free_hot_cold_page_list+0x46/0xa0 release_pages+0x1c1/0x200 free_pages_and_swap_cache+0xad/0xd0 tlb_flush_mmu.part.46+0x4c/0x90 tlb_finish_mmu+0x55/0x60 exit_mmap+0xcb/0x170 mmput+0x67/0xf0 vhost_dev_cleanup+0x231/0x260 [vhost_net] vhost_net_release+0x3f/0x90 [vhost_net] __fput+0xe9/0x270 ____fput+0xe/0x10 task_work_run+0xc4/0xe0 do_exit+0x2bb/0xa40 do_group_exit+0x3f/0xa0 get_signal_to_deliver+0x1d0/0x6e0 do_signal+0x48/0x5e0 do_notify_resume+0x71/0xc0 retint_signal+0x48/0x8c The reason of this bug is that a page fault happens before unlocking the head page at the end of memory_failure(). This strange page fault is trying to access to address 0x20 and I'm not sure why qemu-kvm does this, but anyway as a result the SIGSEGV makes qemu-kvm exit and on the way we catch the bad page bug/warning because we try to free a locked page (which was the former head page.) To fix this, this patch suggests to shift page lock from head page to tail page just after thp split. SIGSEGV still happens, but it affects only error affected VMs, not a whole system. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.9+] # a3e0f9e4 "mm/memory-failure.c: transfer page count from head page to tail page after split thp" Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andi Kleen authored
Add a working sysctl to enable/disable automatic numa memory balancing at runtime. This allows us to track down performance problems with this feature and is generally a good idea. This was possible earlier through debugfs, but only with special debugging options set. Also fix the boot message. [akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/] Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-