1. 06 Nov, 2015 40 commits
    • Dave Hansen's avatar
      mm, hugetlbfs: optimize when NUMA=n · e0ec90ee
      Dave Hansen authored
      My recent patch "mm, hugetlb: use memory policy when available" added some
      bloat to hugetlb.o.  This patch aims to get some of the bloat back,
      especially when NUMA is not in play.
      
      It does this with an implicit #ifdef and marking some things static that
      should have been static in my first patch.  It also makes the warnings
      only VM_WARN_ON()s.  They were responsible for a pretty big chunk of the
      bloat.
      
      Doing this gets our NUMA=n text size back to a wee bit _below_ where we
      started before the original patch.
      
      It also shaves a bit of space off the NUMA=y case, but not much.
      Enforcing the mempolicy definitely takes some text and it's hard to avoid.
      
      size(1) output:
      
         text	   data	    bss	    dec	    hex	filename
        30745	   3433	   2492	  36670	   8f3e	hugetlb.o.nonuma.baseline
        31305	   3755	   2492	  37552	   92b0	hugetlb.o.nonuma.patch1
        30713	   3433	   2492	  36638	   8f1e	hugetlb.o.nonuma.patch2 (this patch)
        25235	    473	  41276	  66984	  105a8	hugetlb.o.numa.baseline
        25715	    475	  41276	  67466	  1078a	hugetlb.o.numa.patch1
        25491	    473	  41276	  67240	  106a8	hugetlb.o.numa.patch2 (this patch)
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e0ec90ee
    • Dave Hansen's avatar
      mm, hugetlb: use memory policy when available · 099730d6
      Dave Hansen authored
      I have a hugetlbfs user which is never explicitly allocating huge pages
      with 'nr_hugepages'.  They only set 'nr_overcommit_hugepages' and then let
      the pages be allocated from the buddy allocator at fault time.
      
      This works, but they noticed that mbind() was not doing them any good and
      the pages were being allocated without respect for the policy they
      specified.
      
      The code in question is this:
      
      > struct page *alloc_huge_page(struct vm_area_struct *vma,
      ...
      >         page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg);
      >         if (!page) {
      >                 page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
      
      dequeue_huge_page_vma() is smart and will respect the VMA's memory policy.
       But, it only grabs _existing_ huge pages from the huge page pool.  If the
      pool is empty, we fall back to alloc_buddy_huge_page() which obviously
      can't do anything with the VMA's policy because it isn't even passed the
      VMA.
      
      Almost everybody preallocates huge pages.  That's probably why nobody has
      ever noticed this.  Looking back at the git history, I don't think this
      _ever_ worked from when alloc_buddy_huge_page() was introduced in
      7893d1d5, 8 years ago.
      
      The fix is to pass vma/addr down in to the places where we actually call
      in to the buddy allocator.  It's fairly straightforward plumbing.  This
      has been lightly tested.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      099730d6
    • Alexander Kuleshov's avatar
      mm/hugetlb: make node_hstates array static · b4e289a6
      Alexander Kuleshov authored
      There are no users of the node_hstates array outside of the
      mm/hugetlb.c. So let's make it static.
      Signed-off-by: default avatarAlexander Kuleshov <kuleshovmail@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b4e289a6
    • Rasmus Villemoes's avatar
      mm/maccess.c: actually return -EFAULT from strncpy_from_unsafe · 9dd861d5
      Rasmus Villemoes authored
      As far as I can tell, strncpy_from_unsafe never returns -EFAULT.  ret is
      the result of a __copy_from_user_inatomic(), which is 0 for success and
      positive (in this case necessarily 1) for access error - it is never
      negative.  So we were always returning the length of the, possibly
      truncated, destination string.
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9dd861d5
    • Andrew Morton's avatar
      mm/cma.c: suppress warning · 3acaea68
      Andrew Morton authored
      mm/cma.c: In function 'cma_alloc':
      mm/cma.c:366: warning: 'pfn' may be used uninitialized in this function
      
      The patch actually improves the tracing a bit: if alloc_contig_range()
      fails, tracing will display the offending pfn rather than -1.
      
      Cc: Stefan Strogin <stefan.strogin@gmail.com>
      Cc: Michal Nazarewicz <mpn@google.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3acaea68
    • Hugh Dickins's avatar
      mm: migrate dirty page without clear_page_dirty_for_io etc · 42cb14b1
      Hugh Dickins authored
      clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
      since v2.6.16 first introduced page migration; and the set_page_dirty()
      which completed its migration of PageDirty, later had to be moderated to
      __set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.
      
      No actual problems seen with this procedure recently, but if you look into
      what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually
      achieving, it turns out to be nothing more than moving the PageDirty flag,
      and its NR_FILE_DIRTY stat from one zone to another.
      
      It would be good to avoid a pile of irrelevant decrementations and
      incrementations, and improper event counting, and unnecessary descent of
      the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
      radix_tree_replace_slot() left in place anyway).
      
      Do the NR_FILE_DIRTY movement, like the other stats movements, while
      interrupts still disabled in migrate_page_move_mapping(); and don't even
      bother if the zone is the same.  Do the PageDirty movement there under
      tree_lock too, where old page is frozen and newpage not yet visible:
      bearing in mind that as soon as newpage becomes visible in radix_tree, an
      un-page-locked set_page_dirty() might interfere (or perhaps that's just
      not possible: anything doing so should already hold an additional
      reference to the old page, preventing its migration; but play safe).
      
      But we do still need to transfer PageDirty in migrate_page_copy(), for
      those who don't go the mapping route through migrate_page_move_mapping().
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      42cb14b1
    • Hugh Dickins's avatar
      mm: page migration avoid touching newpage until no going back · cf4b769a
      Hugh Dickins authored
      We have had trouble in the past from the way in which page migration's
      newpage is initialized in dribs and drabs - see commit 8bdd6380 ("mm:
      fix direct reclaim writeback regression") which proposed a cleanup.
      
      We have no actual problem now, but I think the procedure would be clearer
      (and alternative get_new_page pools safer to implement) if we assert that
      newpage is not touched until we are sure that it's going to be used -
      except for taking the trylock on it in __unmap_and_move().
      
      So shift the early initializations from move_to_new_page() into
      migrate_page_move_mapping(), mapping and NULL-mapping paths.  Similarly
      migrate_huge_page_move_mapping(), but its NULL-mapping path can just be
      deleted: you cannot reach hugetlbfs_migrate_page() with a NULL mapping.
      
      Adjust stages 3 to 8 in the Documentation file accordingly.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cf4b769a
    • Hugh Dickins's avatar
      mm: page migration use migration entry for swapcache too · 470f119f
      Hugh Dickins authored
      Hitherto page migration has avoided using a migration entry for a
      swapcache page mapped into userspace, apparently for historical reasons.
      So any page blessed with swapcache would entail a minor fault when it's
      next touched, which page migration otherwise tries to avoid.  Swapcache in
      an mlocked area is rare, so won't often matter, but still better fixed.
      
      Just rearrange the block in try_to_unmap_one(), to handle TTU_MIGRATION
      before checking PageAnon, that's all (apart from some reindenting).
      
      Well, no, that's not quite all: doesn't this by the way fix a soft_dirty
      bug, that page migration of a file page was forgetting to transfer the
      soft_dirty bit?  Probably not a serious bug: if I understand correctly,
      soft_dirty afficionados usually have to handle file pages separately
      anyway; but we publish the bit in /proc/<pid>/pagemap on file mappings as
      well as anonymous, so page migration ought not to perturb it.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      470f119f
    • Hugh Dickins's avatar
      mm: simplify page migration's anon_vma comment and flow · 03f15c86
      Hugh Dickins authored
      __unmap_and_move() contains a long stale comment on page_get_anon_vma()
      and PageSwapCache(), with an odd control flow that's hard to follow.
      Mostly this reflects our confusion about the lifetime of an anon_vma, in
      the early days of page migration, before we could take a reference to one.
       Nowadays this seems quite straightforward: cut it all down to essentials.
      
      I cannot see the relevance of swapcache here at all, so don't treat it any
      differently: I believe the old comment reflects in part our anon_vma
      confusions, and in part the original v2.6.16 page migration technique,
      which used actual swap to migrate anon instead of swap-like migration
      entries.  Why should a swapcache page not be migrated with the aid of
      migration entry ptes like everything else?  So lose that comment now, and
      enable migration entries for swapcache in the next patch.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      03f15c86
    • Hugh Dickins's avatar
      mm: page migration remove_migration_ptes at lock+unlock level · 5c3f9a67
      Hugh Dickins authored
      Clean up page migration a little more by calling remove_migration_ptes()
      from the same level, on success or on failure, from __unmap_and_move() or
      from unmap_and_move_huge_page().
      
      Don't reset page->mapping of a PageAnon old page in move_to_new_page(),
      leave that to when the page is freed.  Except for here in page migration,
      it has been an invariant that a PageAnon (bit set in page->mapping) page
      stays PageAnon until it is freed, and I think we're safer to keep to that.
      
      And with the above rearrangement, it's necessary because zap_pte_range()
      wants to identify whether a migration entry represents a file or an anon
      page, to update the appropriate rss stats without waiting on it.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5c3f9a67
    • Hugh Dickins's avatar
      mm: page migration trylock newpage at same level as oldpage · 7db7671f
      Hugh Dickins authored
      Clean up page migration a little by moving the trylock of newpage from
      move_to_new_page() into __unmap_and_move(), where the old page has been
      locked.  Adjust unmap_and_move_huge_page() and balloon_page_migrate()
      accordingly.
      
      But make one kind-of-functional change on the way: whereas trylock of
      newpage used to BUG() if it failed, now simply return -EAGAIN if so.
      Cutting out BUG()s is good, right?  But, to be honest, this is really to
      extend the usefulness of the custom put_new_page feature, allowing a pool
      of new pages to be shared perhaps with racing uses.
      
      Use an "else" instead of that "skip_unmap" label.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarRafael Aquini <aquini@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7db7671f
    • Hugh Dickins's avatar
      mm: page migration use the put_new_page whenever necessary · 2def7424
      Hugh Dickins authored
      I don't know of any problem from the way it's used in our current tree,
      but there is one defect in page migration's custom put_new_page feature.
      
      An unused newpage is expected to be released with the put_new_page(), but
      there was one MIGRATEPAGE_SUCCESS (0) path which released it with
      putback_lru_page(): which can be very wrong for a custom pool.
      
      Fixed more easily by resetting put_new_page once it won't be needed, than
      by adding a further flag to modify the rc test.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2def7424
    • Hugh Dickins's avatar
      mm: correct a couple of page migration comments · 14e0f9bc
      Hugh Dickins authored
      It's migrate.c not migration,c, and nowadays putback_movable_pages() not
      putback_lru_pages().
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarRafael Aquini <aquini@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      14e0f9bc
    • Hugh Dickins's avatar
      mm: rename mem_cgroup_migrate to mem_cgroup_replace_page · 45637bab
      Hugh Dickins authored
      After v4.3's commit 0610c25d ("memcg: fix dirty page migration")
      mem_cgroup_migrate() doesn't have much to offer in page migration: convert
      migrate_misplaced_transhuge_page() to set_page_memcg() instead.
      
      Then rename mem_cgroup_migrate() to mem_cgroup_replace_page(), since its
      remaining callers are replace_page_cache_page() and shmem_replace_page():
      both of whom passed lrucare true, so just eliminate that argument.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      45637bab
    • Hugh Dickins's avatar
      mm: page migration fix PageMlocked on migrated pages · 51afb12b
      Hugh Dickins authored
      Commit e6c509f8 ("mm: use clear_page_mlock() in page_remove_rmap()")
      in v3.7 inadvertently made mlock_migrate_page() impotent: page migration
      unmaps the page from userspace before migrating, and that commit clears
      PageMlocked on the final unmap, leaving mlock_migrate_page() with
      nothing to do.  Not a serious bug, the next attempt at reclaiming the
      page would fix it up; but a betrayal of page migration's intent - the
      new page ought to emerge as PageMlocked.
      
      I don't see how to fix it for mlock_migrate_page() itself; but easily
      fixed in remove_migration_pte(), by calling mlock_vma_page() when the vma
      is VM_LOCKED - under pte lock as in try_to_unmap_one().
      
      Delete mlock_migrate_page()?  Not quite, it does still serve a purpose for
      migrate_misplaced_transhuge_page(): where we could replace it by a test,
      clear_page_mlock(), mlock_vma_page() sequence; but would that be an
      improvement?  mlock_migrate_page() is fairly lean, and let's make it
      leaner by skipping the irq save/restore now clearly not needed.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      51afb12b
    • Hugh Dickins's avatar
      mm: rmap use pte lock not mmap_sem to set PageMlocked · b87537d9
      Hugh Dickins authored
      KernelThreadSanitizer (ktsan) has shown that the down_read_trylock() of
      mmap_sem in try_to_unmap_one() (when going to set PageMlocked on a page
      found mapped in a VM_LOCKED vma) is ineffective against races with
      exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not held when
      tearing down an mm.
      
      But that's okay, those races are benign; and although we've believed for
      years in that ugly down_read_trylock(), it's unsuitable for the job, and
      frustrates the good intention of setting PageMlocked when it fails.
      
      It just doesn't matter if here we read vm_flags an instant before or after
      a racing mlock() or munlock() or exit_mmap() sets or clears VM_LOCKED: the
      syscalls (or exit) work their way up the address space (taking pt locks
      after updating vm_flags) to establish the final state.
      
      We do still need to be careful never to mark a page Mlocked (hence
      unevictable) by any race that will not be corrected shortly after.  The
      page lock protects from many of the races, but not all (a page is not
      necessarily locked when it's unmapped).  But the pte lock we just dropped
      is good to cover the rest (and serializes even with
      munlock_vma_pages_all(), so no special barriers required): now hold on to
      the pte lock while calling mlock_vma_page().  Is that lock ordering safe?
      Yes, that's how follow_page_pte() calls it, and how page_remove_rmap()
      calls the complementary clear_page_mlock().
      
      This fixes the following case (though not a case which anyone has
      complained of), which mmap_sem did not: truncation's preliminary
      unmap_mapping_range() is supposed to remove even the anonymous COWs of
      filecache pages, and that might race with try_to_unmap_one() on a
      VM_LOCKED vma, so that mlock_vma_page() sets PageMlocked just after
      zap_pte_range() unmaps the page, causing "Bad page state (mlocked)" when
      freed.  The pte lock protects against this.
      
      You could say that it also protects against the more ordinary case, racing
      with the preliminary unmapping of a filecache page itself: but in our
      current tree, that's independently protected by i_mmap_rwsem; and that
      race would be why "Bad page state (mlocked)" was seen before commit
      48ec833b ("Revert mm/memory.c: share the i_mmap_rwsem").
      
      Vlastimil Babka points out another race which this patch protects against.
       try_to_unmap_one() might reach its mlock_vma_page() TestSetPageMlocked a
      moment after munlock_vma_pages_all() did its Phase 1 TestClearPageMlocked:
      leaving PageMlocked and unevictable when it should be evictable.  mmap_sem
      is ineffective because exit_mmap() does not hold it; page lock ineffective
      because __munlock_pagevec() only takes it afterwards, in Phase 2; pte lock
      is effective because __munlock_pagevec_fill() takes it to get the page,
      after VM_LOCKED was cleared from vm_flags, so visible to try_to_unmap_one.
      
      Kirill Shutemov points out that if the compiler chooses to implement a
      "vma->vm_flags &= VM_WHATEVER" or "vma->vm_flags |= VM_WHATEVER" operation
      with an intermediate store of unrelated bits set, since I'm here foregoing
      its usual protection by mmap_sem, try_to_unmap_one() might catch sight of
      a spurious VM_LOCKED in vm_flags, and make the wrong decision.  This does
      not appear to be an immediate problem, but we may want to define vm_flags
      accessors in future, to guard against such a possibility.
      
      While we're here, make a related optimization in try_to_munmap_one(): if
      it's doing TTU_MUNLOCK, then there's no point at all in descending the
      page tables and getting the pt lock, unless the vma is VM_LOCKED.  Yes,
      that can change racily, but it can change racily even without the
      optimization: it's not critical.  Far better not to waste time here.
      
      Stopped short of separating try_to_munlock_one() from try_to_munmap_one()
      on this occasion, but that's probably the sensible next step - with a
      rename, given that try_to_munlock()'s business is to try to set Mlocked.
      
      Updated the unevictable-lru Documentation, to remove its reference to mmap
      semaphore, but found a few more updates needed in just that area.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b87537d9
    • Hugh Dickins's avatar
      mm Documentation: undoc non-linear vmas · 7a14239a
      Hugh Dickins authored
      While updating some mm Documentation, I came across a few straggling
      references to the non-linear vmas which were happily removed in v4.0.
      Delete them.
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7a14239a
    • Vladimir Davydov's avatar
      mm: do not inc NR_PAGETABLE if ptlock_init failed · 706874e9
      Vladimir Davydov authored
      If ALLOC_SPLIT_PTLOCKS is defined, ptlock_init may fail, in which case we
      shouldn't increment NR_PAGETABLE.
      
      Since small allocations, such as ptlock, normally do not fail (currently
      they can fail if kmemcg is used though), this patch does not really fix
      anything and should be considered as a code cleanup.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      706874e9
    • Laurent Dufour's avatar
      mm: clear_soft_dirty_pmd() requires THP · 5d3875a0
      Laurent Dufour authored
      Don't build clear_soft_dirty_pmd() if transparent huge pages are not
      enabled.
      Signed-off-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5d3875a0
    • Laurent Dufour's avatar
      mm: clear pte in clear_soft_dirty() · 326c2597
      Laurent Dufour authored
      As mentioned in the commit 56eecdb9 ("mm: Use ptep/pmdp_set_numa()
      for updating _PAGE_NUMA bit"), architectures like ppc64 don't do tlb
      flush in set_pte/pmd functions.
      
      So when dealing with existing pte in clear_soft_dirty, the pte must be
      cleared before being modified.
      Signed-off-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      326c2597
    • Andrea Arcangeli's avatar
      ksm: unstable_tree_search_insert error checking cleanup · c8f95ed1
      Andrea Arcangeli authored
      get_mergeable_page() can only return NULL (also in case of errors) or the
      pinned mergeable page.  It can't return an error different than NULL.
      This optimizes away the unnecessary error check.
      
      Add a return after the "out:" label in the callee to make it more
      readable.
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c8f95ed1
    • Andrea Arcangeli's avatar
      ksm: use find_mergeable_vma in try_to_merge_with_ksm_page · 85c6e8dd
      Andrea Arcangeli authored
      Doing the VM_MERGEABLE check after the page == kpage check won't provide
      any meaningful benefit.  The !vma->anon_vma check of find_mergeable_vma is
      the only superfluous bit in using find_mergeable_vma because the !PageAnon
      check of try_to_merge_one_page() implicitly checks for that, but it still
      looks cleaner to share the same find_mergeable_vma().
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      85c6e8dd
    • Andrea Arcangeli's avatar
      ksm: use the helper method to do the hlist_empty check · 98666f8a
      Andrea Arcangeli authored
      This just uses the helper function to cleanup the assumption on the
      hlist_node internals.
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      98666f8a
    • Andrea Arcangeli's avatar
      ksm: don't fail stable tree lookups if walking over stale stable_nodes · f2e5ff85
      Andrea Arcangeli authored
      The stable_nodes can become stale at any time if the underlying pages gets
      freed.  The stable_node gets collected and removed from the stable rbtree
      if that is detected during the rbtree lookups.
      
      Don't fail the lookup if running into stale stable_nodes, just restart the
      lookup after collecting the stale stable_nodes.  Otherwise the CPU spent
      in the preparation stage is wasted and the lookup must be repeated at the
      next loop potentially failing a second time in a second stale stable_node.
      
      If we don't prune aggressively we delay the merging of the unstable node
      candidates and at the same time we delay the freeing of the stale
      stable_nodes.  Keeping stale stable_nodes around wastes memory and it
      can't provide any benefit.
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f2e5ff85
    • Andrea Arcangeli's avatar
      ksm: add cond_resched() to the rmap_walks · ad12695f
      Andrea Arcangeli authored
      While at it add it to the file and anon walks too.
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Petr Holasek <pholasek@redhat.com>
      Acked-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ad12695f
    • Vladimir Davydov's avatar
      memcg: simplify and inline __mem_cgroup_from_kmem · df406551
      Vladimir Davydov authored
      Before the previous patch ("memcg: unify slab and other kmem pages
      charging"), __mem_cgroup_from_kmem had to handle two types of kmem - slab
      pages and pages allocated with alloc_kmem_pages - memcg in the page
      struct.  Now we can unify it.  Since after it, this function becomes tiny
      we can fold it into mem_cgroup_from_kmem.
      
      [hughd@google.com: move mem_cgroup_from_kmem into list_lru.c]
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      df406551
    • Vladimir Davydov's avatar
      memcg: unify slab and other kmem pages charging · f3ccb2c4
      Vladimir Davydov authored
      We have memcg_kmem_charge and memcg_kmem_uncharge methods for charging and
      uncharging kmem pages to memcg, but currently they are not used for
      charging slab pages (i.e.  they are only used for charging pages allocated
      with alloc_kmem_pages).  The only reason why the slab subsystem uses
      special helpers, memcg_charge_slab and memcg_uncharge_slab, is that it
      needs to charge to the memcg of kmem cache while memcg_charge_kmem charges
      to the memcg that the current task belongs to.
      
      To remove this diversity, this patch adds an extra argument to
      __memcg_kmem_charge that can be a pointer to a memcg or NULL.  If it is
      not NULL, the function tries to charge to the memcg it points to,
      otherwise it charge to the current context.  Next, it makes the slab
      subsystem use this function to charge slab pages.
      
      Since memcg_charge_kmem and memcg_uncharge_kmem helpers are now used only
      in __memcg_kmem_charge and __memcg_kmem_uncharge, they are inlined.  Since
      __memcg_kmem_charge stores a pointer to the memcg in the page struct, we
      don't need memcg_uncharge_slab anymore and can use free_kmem_pages.
      Besides, one can now detect which memcg a slab page belongs to by reading
      /proc/kpagecgroup.
      
      Note, this patch switches slab to charge-after-alloc design.  Since this
      design is already used for all other memcg charges, it should not make any
      difference.
      
      [hannes@cmpxchg.org: better to have an outer function than a magic parameter for the memcg lookup]
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f3ccb2c4
    • Vladimir Davydov's avatar
      memcg: simplify charging kmem pages · d05e83a6
      Vladimir Davydov authored
      Charging kmem pages proceeds in two steps.  First, we try to charge the
      allocation size to the memcg the current task belongs to, then we allocate
      a page and "commit" the charge storing the pointer to the memcg in the
      page struct.
      
      Such a design looks overcomplicated, because there is not much sense in
      trying charging the allocation before actually allocating a page: we won't
      be able to consume much memory over the limit even if we charge after
      doing the actual allocation, besides we already charge user pages post
      factum, so being pedantic with kmem pages just looks pointless.
      
      So this patch simplifies the design by merging the "charge" and the
      "commit" steps into the same function, which takes the allocated page.
      
      Also, rename the charge and uncharge methods to memcg_kmem_charge and
      memcg_kmem_uncharge and make the charge method return error code instead
      of bool to conform to mem_cgroup_try_charge.
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d05e83a6
    • Xishi Qiu's avatar
      mm/page_alloc.c: skip ZONE_MOVABLE if required_kernelcore is larger than totalpages · bde304bd
      Xishi Qiu authored
      If kernelcore was not specified, or the kernelcore size is zero
      (required_movablecore >= totalpages), or the kernelcore size is larger
      than totalpages, there is no ZONE_MOVABLE.  We should fill the zone with
      both kernel memory and movable memory.
      Signed-off-by: default avatarXishi Qiu <qiuxishi@huawei.com>
      Reviewed-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: <zhongjiang@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bde304bd
    • Davidlohr Bueso's avatar
      mm/vmacache: inline vmacache_valid_mm() · a2c1aad3
      Davidlohr Bueso authored
      This function incurs in very hot paths and merely does a few loads for
      validity check.  Lets inline it, such that we can save the function call
      overhead.
      
      (akpm: this is cosmetic - the compiler already inlines vmacache_valid_mm())
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a2c1aad3
    • yalin wang's avatar
      include/linux/vm_event_item.h: change HIGHMEM_ZONE macro definition · f7ae3a95
      yalin wang authored
      Change HIGHMEM_ZONE to be the same as the DMA_ZONE macro.
      Signed-off-by: default avataryalin wang <yalin.wang2010@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f7ae3a95
    • Laura Abbott's avatar
      mm: Don't offset memmap for flatmem · a1c34a3b
      Laura Abbott authored
      Srinivas Kandagatla reported bad page messages when trying to remove the
      bottom 2MB on an ARM based IFC6410 board
      
        BUG: Bad page state in process swapper  pfn:fffa8
        page:ef7fb500 count:0 mapcount:0 mapping:  (null) index:0x0
        flags: 0x96640253(locked|error|dirty|active|arch_1|reclaim|mlocked)
        page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
        bad because of flags:
        flags: 0x200041(locked|active|mlocked)
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00007-g412f9ba-dirty #816
        Hardware name: Qualcomm (Flattened Device Tree)
          unwind_backtrace
          show_stack
          dump_stack
          bad_page
          free_pages_prepare
          free_hot_cold_page
          __free_pages
          free_highmem_page
          mem_init
          start_kernel
        Disabling lock debugging due to kernel taint
      
      Removing the lower 2MB made the start of the lowmem zone to no longer be
      page block aligned.  IFC6410 uses CONFIG_FLATMEM where alloc_node_mem_map
      allocates memory for the mem_map.  alloc_node_mem_map will offset for
      unaligned nodes with the assumption the pfn/page translation functions
      will account for the offset.  The functions for CONFIG_FLATMEM do not
      offset however, resulting in overrunning the memmap array.  Just use the
      allocated memmap without any offset when running with CONFIG_FLATMEM to
      avoid the overrun.
      Signed-off-by: default avatarLaura Abbott <laura@labbott.name>
      Signed-off-by: default avatarLaura Abbott <lauraa@codeaurora.org>
      Reported-by: default avatarSrinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Tested-by: default avatarSrinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Tested-by: default avatarBjorn Andersson <bjorn.andersson@sonymobile.com>
      Cc: Santosh Shilimkar <ssantosh@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Arnd Bergman <arnd@arndb.de>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Andy Gross <agross@codeaurora.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a1c34a3b
    • Andrew Morton's avatar
      mm/vmstat.c: uninline node_page_state() · c2d42c16
      Andrew Morton authored
      With x86_64 (config http://ozlabs.org/~akpm/config-akpm2.txt) and old gcc
      (4.4.4), drivers/base/node.c:node_read_meminfo() is using 2344 bytes of
      stack.  Uninlining node_page_state() reduces this to 440 bytes.
      
      The stack consumption issue is fixed by newer gcc (4.8.4) however with
      that compiler this patch reduces the node.o text size from 7314 bytes to
      4578.
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c2d42c16
    • Chen Gang's avatar
      mm/mmap.c: change __install_special_mapping() args order · 27f28b97
      Chen Gang authored
      Make __install_special_mapping() args order match the caller, so the
      caller can pass their register args directly to callee with no touch.
      
      For most of architectures, args (at least the first 5th args) are in
      registers, so this change will have effect on most of architectures.
      
      For -O2, __install_special_mapping() may be inlined under most of
      architectures, but for -Os, it should not. So this change can get a
      little better performance for -Os, at least.
      Signed-off-by: default avatarChen Gang <gang.chen.5i5j@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      27f28b97
    • Geliang Tang's avatar
      mm/nommu.c: drop unlikely inside BUG_ON() · c9427bc0
      Geliang Tang authored
      (1) For !CONFIG_BUG cases, the bug call is a no-op, so we couldn't
          care less and the change is ok.
      
      (2) ppc and mips, which HAVE_ARCH_BUG_ON, do not rely on branch
          predictions as it seems to be pointless[1] and thus callers should not
          be trying to push an optimization in the first place.
      
      (3) For CONFIG_BUG and !HAVE_ARCH_BUG_ON cases, BUG_ON() contains an
          unlikely compiler flag already.
      
      Hence, we can drop unlikely behind BUG_ON().
      
      [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.3/02289.htmlSigned-off-by: default avatarGeliang Tang <geliangtang@163.com>
      Acked-by: default avatarDavidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c9427bc0
    • Chen Gang's avatar
      mm/mmap.c: do not initialize retval in mmap_pgoff() · 1e3ee14b
      Chen Gang authored
      When fget() fails we can return -EBADF directly.
      Signed-off-by: default avatarChen Gang <gang.chen.5i5j@gmail.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1e3ee14b
    • Chen Gang's avatar
      mm/mmap.c: remove redundant statement "error = -ENOMEM" · e6ee219f
      Chen Gang authored
      It is still a little better to remove it, although it should be skipped
      by "-O2".
      
      Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>=0A=
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e6ee219f
    • Vineet Gupta's avatar
      mm: optimize PageHighMem() check · 3ca65c19
      Vineet Gupta authored
      This came up when implementing HIHGMEM/PAE40 for ARC.  The kmap() /
      kmap_atomic() generated code seemed needlessly bloated due to the way
      PageHighMem() macro is implemented.  It derives the exact zone for page
      and then does pointer subtraction with first zone to infer the zone_type.
      The pointer arithmatic in turn generates the code bloat.
      
      PageHighMem(page)
        is_highmem(page_zone(page))
           zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones
      
      Instead use is_highmem_idx() to work on zone_type available in page flags
      
         ----- Before -----
      80756348:	mov_s      r13,r0
      8075634a:	ld_s       r2,[r13,0]
      8075634c:	lsr_s      r2,r2,30
      8075634e:	mpy        r2,r2,0x2a4
      80756352:	add_s      r2,r2,0x80aef880
      80756358:	ld_s       r3,[r2,28]
      8075635a:	sub_s      r2,r2,r3
      8075635c:	breq       r2,0x2a4,80756378 <kmap+0x48>
      80756364:	breq       r2,0x548,80756378 <kmap+0x48>
      
         ----- After  -----
      80756330:	mov_s      r13,r0
      80756332:	ld_s       r2,[r13,0]
      80756334:	lsr_s      r2,r2,30
      80756336:	sub_s      r2,r2,1
      80756338:	brlo       r2,2,80756348 <kmap+0x30>
      
      For x86 defconfig build (32 bit only) it saves around 900 bytes.
      For ARC defconfig with HIGHMEM, it saved around 2K bytes.
      
         ---->8-------
      ./scripts/bloat-o-meter x86/vmlinux-defconfig-pre x86/vmlinux-defconfig-post
      add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-934 (-934)
      function                                     old     new   delta
      saveable_page                                162     154      -8
      saveable_highmem_page                        154     146      -8
      skb_gro_reset_offset                         147     131     -16
      ...
      ...
      __change_page_attr_set_clr                  1715    1678     -37
      setup_data_read                              434     394     -40
      mon_bin_event                               1967    1927     -40
      swsusp_save                                 1148    1105     -43
      _set_pages_array                             549     493     -56
         ---->8-------
      
      e.g. For ARC kmap()
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jennifer Herbert <jennifer.herbert@citrix.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3ca65c19
    • Oleg Nesterov's avatar
      mm/oom_kill: fix the wrong task->mm == mm checks in oom_kill_process() · 4d7b3394
      Oleg Nesterov authored
      Both "child->mm == mm" and "p->mm != mm" checks in oom_kill_process() are
      wrong.  task->mm can be NULL if the task is the exited group leader.  This
      means in particular that "kill sharing same memory" loop can miss a
      process with a zombie leader which uses the same ->mm.
      
      Note: the process_has_mm(child, p->mm) check is still not 100% correct,
      p->mm can be NULL too.  This is minor, but probably deserves a fix or a
      comment anyway.
      
      [akpm@linux-foundation.org: document process_shares_mm() a bit]
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Kyle Walker <kwalker@redhat.com>
      Cc: Stanislav Kozina <skozina@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4d7b3394
    • Oleg Nesterov's avatar
      mm/oom_kill: cleanup the "kill sharing same memory" loop · c319025a
      Oleg Nesterov authored
      Purely cosmetic, but the complex "if" condition looks annoying to me.
      Especially because it is not consistent with OOM_SCORE_ADJ_MIN check
      which adds another if/continue.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Kyle Walker <kwalker@redhat.com>
      Cc: Stanislav Kozina <skozina@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c319025a