1. 20 May, 2016 40 commits
    • cpuset: use static key better and convert to new API · 002f2906
      Vlastimil Babka authored
      An important function for cpusets is cpuset_node_allowed(), which
      optimizes for the case where there is only a single root cpuset, in
      which any node is trivially allowed.  But the check "nr_cpusets() <= 1"
      doesn't use the cpusets_enabled_key static key the right way, where
      static keys eliminate branching overhead with jump labels.
      
      This patch converts it so that static key is used properly.  It's also
      switched to the new static key API and the checking functions are
      converted to return bool instead of int.  We also provide a new variant
      __cpuset_zone_allowed() which expects that the static key check was
      already done and the key was enabled.  This is needed for
      get_page_from_freelist() where we want to also avoid the relatively
      slower check when ALLOC_CPUSET is not set in alloc_flags.
      
      The impact on the page allocator microbenchmark is less than expected
      but the cleanup in itself is worthwhile.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                             multcheck-v1r20               cpuset-v1r20
        Min      alloc-odr0-1               348.00 (  0.00%)           348.00 (  0.00%)
        Min      alloc-odr0-2               254.00 (  0.00%)           254.00 (  0.00%)
        Min      alloc-odr0-4               213.00 (  0.00%)           213.00 (  0.00%)
        Min      alloc-odr0-8               186.00 (  0.00%)           183.00 (  1.61%)
        Min      alloc-odr0-16              173.00 (  0.00%)           171.00 (  1.16%)
        Min      alloc-odr0-32              166.00 (  0.00%)           163.00 (  1.81%)
        Min      alloc-odr0-64              162.00 (  0.00%)           159.00 (  1.85%)
        Min      alloc-odr0-128             160.00 (  0.00%)           157.00 (  1.88%)
        Min      alloc-odr0-256             169.00 (  0.00%)           166.00 (  1.78%)
        Min      alloc-odr0-512             180.00 (  0.00%)           180.00 (  0.00%)
        Min      alloc-odr0-1024            188.00 (  0.00%)           187.00 (  0.53%)
        Min      alloc-odr0-2048            194.00 (  0.00%)           193.00 (  0.52%)
        Min      alloc-odr0-4096            199.00 (  0.00%)           198.00 (  0.50%)
        Min      alloc-odr0-8192            202.00 (  0.00%)           201.00 (  0.50%)
        Min      alloc-odr0-16384           203.00 (  0.00%)           202.00 (  0.49%)
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Zefan Li <lizefan@huawei.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      002f2906
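The split described in the commit above, a cheap "are cpusets enabled at all" test in front of a variant that assumes the test already passed, can be sketched in userspace. This is only an illustration: a plain bool stands in for the cpusets_enabled_key static key (real jump labels patch the instruction stream and cannot be reproduced here), and the names cpusets_enabled_flag, slow_node_allowed(), slow_calls, zone_allowed_key_checked() and zone_allowed() are hypothetical, not kernel APIs.

```c
#include <stdbool.h>

/* Stand-in for the cpusets_enabled_key static key. In the kernel the test
 * compiles to a patched jump label; here it is an ordinary variable. */
static bool cpusets_enabled_flag = false;

/* Counts how often the slow path runs, for illustration only. */
static int slow_calls = 0;

/* Hypothetical slow check: pretend only even node ids are allowed. */
static bool slow_node_allowed(int node)
{
    slow_calls++;
    return (node % 2) == 0;
}

static inline bool cpusets_enabled(void)
{
    return cpusets_enabled_flag;
}

/* Variant for callers that have already tested cpusets_enabled(). */
static inline bool zone_allowed_key_checked(int node)
{
    return slow_node_allowed(node);
}

/* Full variant: with cpusets disabled every node is trivially allowed. */
static inline bool zone_allowed(int node)
{
    if (cpusets_enabled())
        return zone_allowed_key_checked(node);
    return true;
}
```

With the flag clear, zone_allowed() never reaches the slow check; a caller that has already tested the flag (and any other cheap precondition) can call the key-checked variant directly.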
    • mm, page_alloc: inline pageblock lookup in page free fast paths · 0b423ca2
      Mel Gorman authored
      The function call overhead of get_pfnblock_flags_mask() is measurable in
      the page free paths.  This patch uses an inlined version that is faster.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0b423ca2
    • mm, page_alloc: remove unnecessary variable from free_pcppages_bulk · e5b31ac2
      Mel Gorman authored
      The original count is never reused so it can be removed.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e5b31ac2
    • mm, page_alloc: pull out side effects from free_pages_check · da838d4f
      Mel Gorman authored
      A check without side effects should be easier to maintain.  It also
      removes the duplicated cpupid and flags reset done in !DEBUG_VM variant
      of both free_pcp_prepare() and then bulkfree_pcp_prepare().  Finally, it
      enables the next patch.
      
      It shouldn't result in new branches, thanks to inlining of the check.
      
      !DEBUG_VM bloat-o-meter:
      
        add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27)
        function                                     old     new   delta
        __free_pages_ok                              748     739      -9
        free_pcppages_bulk                          1403    1385     -18
      
      DEBUG_VM:
      
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
        function                                     old     new   delta
        free_pages_prepare                           806     778     -28
      
      This is also slightly faster because cpupid information is not set on
      tail pages so we can avoid resets there.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da838d4f
    • mm, page_alloc: un-inline the bad part of free_pages_check · bb552ac6
      Mel Gorman authored
      From: Vlastimil Babka <vbabka@suse.cz>
      
      !DEBUG_VM size and bloat-o-meter:
      
        add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-370 (-246)
        function                                     old     new   delta
        free_pages_check_bad                           -     124    +124
        free_pcppages_bulk                          1288    1171    -117
        __free_pages_ok                              948     695    -253
      
      DEBUG_VM:
      
        add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-214 (-90)
        function                                     old     new   delta
        free_pages_check_bad                           -     124    +124
        free_pages_prepare                          1112     898    -214
      
      [akpm@linux-foundation.org: fix whitespace]
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bb552ac6
    • mm, page_alloc: check multiple page fields with a single branch · 7bfec6f4
      Mel Gorman authored
      Every page allocated or freed is checked for sanity to avoid corruptions
      that are difficult to detect later.  A bad page could be due to a number
      of fields.  Instead of using multiple branches, this patch combines
      multiple fields into a single branch.  A detailed check is only
      necessary if that check fails.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                              initonce-v1r20            multcheck-v1r20
        Min      alloc-odr0-1               359.00 (  0.00%)           348.00 (  3.06%)
        Min      alloc-odr0-2               260.00 (  0.00%)           254.00 (  2.31%)
        Min      alloc-odr0-4               214.00 (  0.00%)           213.00 (  0.47%)
        Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
        Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
        Min      alloc-odr0-32              165.00 (  0.00%)           166.00 ( -0.61%)
        Min      alloc-odr0-64              162.00 (  0.00%)           162.00 (  0.00%)
        Min      alloc-odr0-128             161.00 (  0.00%)           160.00 (  0.62%)
        Min      alloc-odr0-256             170.00 (  0.00%)           169.00 (  0.59%)
        Min      alloc-odr0-512             181.00 (  0.00%)           180.00 (  0.55%)
        Min      alloc-odr0-1024            190.00 (  0.00%)           188.00 (  1.05%)
        Min      alloc-odr0-2048            196.00 (  0.00%)           194.00 (  1.02%)
        Min      alloc-odr0-4096            202.00 (  0.00%)           199.00 (  1.49%)
        Min      alloc-odr0-8192            205.00 (  0.00%)           202.00 (  1.46%)
        Min      alloc-odr0-16384           205.00 (  0.00%)           203.00 (  0.98%)
      
      Again, the benefit is marginal but avoiding excessive branches is
      important.  Ideally the paths would not have to check these conditions
      at all but regrettably abandoning the tests would make use-after-free
      bugs much harder to detect.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7bfec6f4
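The idea in the commit above, folding several "must be zero" fields into one test and running the detailed diagnosis only on failure, might look roughly like the following. This is a sketch under assumptions, not the kernel code: struct fake_page, FAKE_FLAGS_CHECK_MASK, page_state_ok() and page_bad_reason() are hypothetical names, and the field set is chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set of flag bits that must be clear on a sane page. */
#define FAKE_FLAGS_CHECK_MASK 0x0fUL

/* Hypothetical, simplified page descriptor. */
struct fake_page {
    unsigned long flags;
    void *mapping;
    int refcount;
    int mapcount;   /* -1 means "not mapped" in this sketch */
};

/* Fast check: OR every value that must be zero and branch once.
 * mapcount is biased by one (in unsigned arithmetic) so that the
 * expected value -1 contributes zero. */
static inline bool page_state_ok(const struct fake_page *p)
{
    uintptr_t bad = (uintptr_t)p->mapping
                  | (uintptr_t)(unsigned int)p->refcount
                  | (uintptr_t)((unsigned int)p->mapcount + 1u)
                  | (uintptr_t)(p->flags & FAKE_FLAGS_CHECK_MASK);
    return bad == 0;
}

/* Detailed check: only worth calling after the fast check has failed.
 * Returns a description of the first problem found, or NULL if none. */
static const char *page_bad_reason(const struct fake_page *p)
{
    if (p->mapcount != -1)
        return "nonzero mapcount";
    if (p->mapping != NULL)
        return "non-NULL mapping";
    if (p->refcount != 0)
        return "nonzero refcount";
    if (p->flags & FAKE_FLAGS_CHECK_MASK)
        return "check flag(s) set";
    return NULL;
}
```

A caller would test page_state_ok() on the hot path and fall back to page_bad_reason() only when it returns false, which keeps the common case to a single branch.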
    • mm, page_alloc: remove field from alloc_context · 93ea9964
      Mel Gorman authored
      The classzone_idx can be inferred from preferred_zoneref so remove the
      unnecessary field and save stack space.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93ea9964
    • mm, page_alloc: avoid looking up the first zone in a zonelist twice · c33d6c06
      Mel Gorman authored
      The allocator fast path looks up the first usable zone in a zonelist and
      then get_page_from_freelist does the same job in the zonelist iterator.
      This patch preserves the necessary information.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                              fastmark-v1r20             initonce-v1r20
        Min      alloc-odr0-1               364.00 (  0.00%)           359.00 (  1.37%)
        Min      alloc-odr0-2               262.00 (  0.00%)           260.00 (  0.76%)
        Min      alloc-odr0-4               214.00 (  0.00%)           214.00 (  0.00%)
        Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
        Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
        Min      alloc-odr0-32              165.00 (  0.00%)           165.00 (  0.00%)
        Min      alloc-odr0-64              161.00 (  0.00%)           162.00 ( -0.62%)
        Min      alloc-odr0-128             159.00 (  0.00%)           161.00 ( -1.26%)
        Min      alloc-odr0-256             168.00 (  0.00%)           170.00 ( -1.19%)
        Min      alloc-odr0-512             180.00 (  0.00%)           181.00 ( -0.56%)
        Min      alloc-odr0-1024            190.00 (  0.00%)           190.00 (  0.00%)
        Min      alloc-odr0-2048            196.00 (  0.00%)           196.00 (  0.00%)
        Min      alloc-odr0-4096            202.00 (  0.00%)           202.00 (  0.00%)
        Min      alloc-odr0-8192            206.00 (  0.00%)           205.00 (  0.49%)
        Min      alloc-odr0-16384           206.00 (  0.00%)           205.00 (  0.49%)
      
      The benefit is negligible and the results are within the noise but each
      cycle counts.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c33d6c06
    • mm, page_alloc: shortcut watermark checks for order-0 pages · 48ee5f36
      Mel Gorman authored
      Watermarks have to be checked on every allocation including the number
      of pages being allocated and whether reserves can be accessed.  The
      reserves only matter if memory is limited and the free_pages adjustment
      only applies to high-order pages.  This patch adds a shortcut for
      order-0 pages that avoids numerous calculations if there is plenty of
      free memory yielding the following performance difference in a page
      allocator microbenchmark:
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               optfair-v1r20             fastmark-v1r20
        Min      alloc-odr0-1               380.00 (  0.00%)           364.00 (  4.21%)
        Min      alloc-odr0-2               273.00 (  0.00%)           262.00 (  4.03%)
        Min      alloc-odr0-4               227.00 (  0.00%)           214.00 (  5.73%)
        Min      alloc-odr0-8               196.00 (  0.00%)           186.00 (  5.10%)
        Min      alloc-odr0-16              183.00 (  0.00%)           173.00 (  5.46%)
        Min      alloc-odr0-32              173.00 (  0.00%)           165.00 (  4.62%)
        Min      alloc-odr0-64              169.00 (  0.00%)           161.00 (  4.73%)
        Min      alloc-odr0-128             169.00 (  0.00%)           159.00 (  5.92%)
        Min      alloc-odr0-256             180.00 (  0.00%)           168.00 (  6.67%)
        Min      alloc-odr0-512             190.00 (  0.00%)           180.00 (  5.26%)
        Min      alloc-odr0-1024            198.00 (  0.00%)           190.00 (  4.04%)
        Min      alloc-odr0-2048            204.00 (  0.00%)           196.00 (  3.92%)
        Min      alloc-odr0-4096            209.00 (  0.00%)           202.00 (  3.35%)
        Min      alloc-odr0-8192            213.00 (  0.00%)           206.00 (  3.29%)
        Min      alloc-odr0-16384           214.00 (  0.00%)           206.00 (  3.74%)
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      48ee5f36
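The order-0 shortcut described in the commit above can be sketched as a cheap comparison placed in front of the full watermark calculation. The code below is a simplified illustration, not the kernel implementation: struct fake_zone, watermark_ok_slow(), watermark_ok_fast(), slow_checks and the way reserves halve the minimum are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical, simplified zone: only the fields this sketch needs. */
struct fake_zone {
    long free_pages;
    long lowmem_reserve;
};

/* Counts how often the full check runs, for illustration only. */
static int slow_checks;

/* Full check: adjusts free pages for the request size and lowers the
 * minimum when the caller may dip into reserves (assumed policy). */
static bool watermark_ok_slow(const struct fake_zone *z, unsigned int order,
                              long mark, bool can_use_reserves)
{
    long min = mark;
    long free = z->free_pages - ((1L << order) - 1);

    slow_checks++;
    if (can_use_reserves)
        min -= min / 2;
    return free > min + z->lowmem_reserve;
}

/* Fast check: for order-0 with plenty of free memory, neither the
 * reserve adjustment nor the high-order adjustment can change the
 * answer, so skip them. */
static inline bool watermark_ok_fast(const struct fake_zone *z,
                                     unsigned int order, long mark,
                                     bool can_use_reserves)
{
    if (order == 0 && z->free_pages > mark + z->lowmem_reserve)
        return true;
    return watermark_ok_slow(z, order, mark, can_use_reserves);
}
```

The shortcut is safe in this sketch because the slow path can only lower the threshold for order-0 requests, so a request that passes the fast comparison would also pass the full check.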
    • mm, page_alloc: reduce cost of fair zone allocation policy retry · 30534755
      Mel Gorman authored
      The fair zone allocation policy is not without cost but it can be
      reduced slightly.  This patch removes an unnecessary local variable,
      checks the likely conditions of the fair zone policy first, uses a bool
      instead of a flags check and falls through when a remote node is
      encountered instead of doing a full restart.  The benefit is marginal
      but it's there.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               decstat-v1r20              optfair-v1r20
        Min      alloc-odr0-1               377.00 (  0.00%)           380.00 ( -0.80%)
        Min      alloc-odr0-2               273.00 (  0.00%)           273.00 (  0.00%)
        Min      alloc-odr0-4               226.00 (  0.00%)           227.00 ( -0.44%)
        Min      alloc-odr0-8               196.00 (  0.00%)           196.00 (  0.00%)
        Min      alloc-odr0-16              183.00 (  0.00%)           183.00 (  0.00%)
        Min      alloc-odr0-32              175.00 (  0.00%)           173.00 (  1.14%)
        Min      alloc-odr0-64              172.00 (  0.00%)           169.00 (  1.74%)
        Min      alloc-odr0-128             170.00 (  0.00%)           169.00 (  0.59%)
        Min      alloc-odr0-256             183.00 (  0.00%)           180.00 (  1.64%)
        Min      alloc-odr0-512             191.00 (  0.00%)           190.00 (  0.52%)
        Min      alloc-odr0-1024            199.00 (  0.00%)           198.00 (  0.50%)
        Min      alloc-odr0-2048            204.00 (  0.00%)           204.00 (  0.00%)
        Min      alloc-odr0-4096            210.00 (  0.00%)           209.00 (  0.48%)
        Min      alloc-odr0-8192            213.00 (  0.00%)           213.00 (  0.00%)
        Min      alloc-odr0-16384           214.00 (  0.00%)           214.00 (  0.00%)
      
      The benefit is marginal at best but one of the most important benefits,
      avoiding a second search when falling back to another node, is not
      triggered by this particular test so the benefit for some corner cases
      is understated.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30534755
    • mm, page_alloc: shorten the page allocator fast path · 4fcb0971
      Mel Gorman authored
      The page allocator fast path checks the page multiple times unnecessarily.
      This patch avoids all the slowpath checks if the first allocation
      attempt succeeds.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4fcb0971
    • mm, page_alloc: check once if a zone has isolated pageblocks · 3777999d
      Mel Gorman authored
      When bulk freeing pages from the per-cpu lists the zone is checked for
      isolated pageblocks on every release.  This patch checks it once per
      drain.
      
      [mgorman@techsingularity.net: fix locking race, per Vlastimil]
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3777999d
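The change described in the commit above amounts to hoisting a loop-invariant lookup out of the per-page loop. A minimal sketch follows, assuming hypothetical names (struct fake_zone, zone_has_isolated(), drain_bulk()) and ignoring locking entirely; the fix note in the commit suggests that where the check sits relative to the lock matters in the real code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone with a lookup counter for illustration. */
struct fake_zone {
    bool has_isolated_pageblocks;
    int isolate_lookups;
};

/* Stand-in for the per-zone query; counts how often it is asked. */
static bool zone_has_isolated(struct fake_zone *z)
{
    z->isolate_lookups++;
    return z->has_isolated_pageblocks;
}

/* Drains `count` pages. Returns how many pages took the more careful
 * path that is needed when isolated pageblocks exist. */
static size_t drain_bulk(struct fake_zone *z, size_t count)
{
    /* Checked once per drain rather than once per released page. */
    bool isolated = zone_has_isolated(z);
    size_t careful = 0;

    for (size_t i = 0; i < count; i++) {
        if (isolated)
            careful++;  /* real code would re-read the migratetype here */
    }
    return careful;
}
```

However many pages are drained, the zone is queried exactly once in this sketch.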
    • mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath · 83d4ca81
      Mel Gorman authored
      __GFP_HARDWALL only has meaning in the context of cpusets but the fast
      path always applies the flag on the first attempt.  Move the
      manipulations into the cpuset paths where they will be masked by a
      static branch in the common case.
      
      With the other micro-optimisations in this series combined, the impact
      on a page allocator microbenchmark is
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               decstat-v1r20                micro-v1r20
        Min      alloc-odr0-1               381.00 (  0.00%)           377.00 (  1.05%)
        Min      alloc-odr0-2               275.00 (  0.00%)           273.00 (  0.73%)
        Min      alloc-odr0-4               229.00 (  0.00%)           226.00 (  1.31%)
        Min      alloc-odr0-8               199.00 (  0.00%)           196.00 (  1.51%)
        Min      alloc-odr0-16              186.00 (  0.00%)           183.00 (  1.61%)
        Min      alloc-odr0-32              179.00 (  0.00%)           175.00 (  2.23%)
        Min      alloc-odr0-64              174.00 (  0.00%)           172.00 (  1.15%)
        Min      alloc-odr0-128             172.00 (  0.00%)           170.00 (  1.16%)
        Min      alloc-odr0-256             181.00 (  0.00%)           183.00 ( -1.10%)
        Min      alloc-odr0-512             193.00 (  0.00%)           191.00 (  1.04%)
        Min      alloc-odr0-1024            201.00 (  0.00%)           199.00 (  1.00%)
        Min      alloc-odr0-2048            206.00 (  0.00%)           204.00 (  0.97%)
        Min      alloc-odr0-4096            212.00 (  0.00%)           210.00 (  0.94%)
        Min      alloc-odr0-8192            215.00 (  0.00%)           213.00 (  0.93%)
        Min      alloc-odr0-16384           216.00 (  0.00%)           214.00 (  0.93%)
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83d4ca81
    • mm, page_alloc: simplify last cpupid reset · 09940a4f
      Mel Gorman authored
      The current reset unnecessarily clears flags and makes pointless
      calculations.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09940a4f
    • mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask() · 5bb1b169
      Mel Gorman authored
      page is guaranteed to be set before it is read with or without the
      initialisation.
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5bb1b169
    • mm, page_alloc: remove unnecessary initialisation in get_page_from_freelist · be06af00
      Mel Gorman authored
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be06af00
    • mm, page_alloc: remove unnecessary local variable in get_page_from_freelist · 4dfa6cd8
      Mel Gorman authored
      zonelist here is a copy of a struct field that is used once.  Ditch it.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4dfa6cd8
    • mm, page_alloc: convert nr_fair_skipped to bool · fa379b95
      Mel Gorman authored
      The number of zones skipped due to a zone expiring its fair zone allocation
      quota is irrelevant.  Convert to bool.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa379b95
    • mm, page_alloc: convert alloc_flags to unsigned · c603844b
      Mel Gorman authored
      alloc_flags is a bitmask of flags but it is signed which does not
      necessarily generate the best code depending on the compiler.  Even
      without an impact, it makes more sense that this be unsigned.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c603844b
    • mm, page_alloc: avoid unnecessary zone lookups during pageblock operations · f75fb889
      Mel Gorman authored
      Pageblocks have an associated bitmap to store migrate types and whether
      the pageblock should be skipped during compaction.  The bitmap may be
      associated with a memory section or a zone but the zone is looked up
      unconditionally.  The compiler should optimise this away automatically
      so this is a cosmetic patch only in many cases.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f75fb889
    • mm, page_alloc: use __dec_zone_state for order-0 page allocation · 754078eb
      Mel Gorman authored
      __dec_zone_state is cheaper to use for removing an order-0 page as it
      has fewer conditions to check.
      
      The performance difference on a page allocator microbenchmark is:
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               optiter-v1r20              decstat-v1r20
        Min      alloc-odr0-1               382.00 (  0.00%)           381.00 (  0.26%)
        Min      alloc-odr0-2               282.00 (  0.00%)           275.00 (  2.48%)
        Min      alloc-odr0-4               233.00 (  0.00%)           229.00 (  1.72%)
        Min      alloc-odr0-8               203.00 (  0.00%)           199.00 (  1.97%)
        Min      alloc-odr0-16              188.00 (  0.00%)           186.00 (  1.06%)
        Min      alloc-odr0-32              182.00 (  0.00%)           179.00 (  1.65%)
        Min      alloc-odr0-64              177.00 (  0.00%)           174.00 (  1.69%)
        Min      alloc-odr0-128             175.00 (  0.00%)           172.00 (  1.71%)
        Min      alloc-odr0-256             184.00 (  0.00%)           181.00 (  1.63%)
        Min      alloc-odr0-512             197.00 (  0.00%)           193.00 (  2.03%)
        Min      alloc-odr0-1024            203.00 (  0.00%)           201.00 (  0.99%)
        Min      alloc-odr0-2048            209.00 (  0.00%)           206.00 (  1.44%)
        Min      alloc-odr0-4096            214.00 (  0.00%)           212.00 (  0.93%)
        Min      alloc-odr0-8192            218.00 (  0.00%)           215.00 (  1.38%)
        Min      alloc-odr0-16384           219.00 (  0.00%)           216.00 (  1.37%)
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      754078eb
    • mm, page_alloc: inline the fast path of the zonelist iterator · 682a3385
      Mel Gorman authored
      The page allocator iterates through a zonelist for zones that match the
      addressing limitations and nodemask of the caller but many allocations
      will not be restricted.  Despite this, there is always function call
      overhead which builds up.
      
      This patch inlines the optimistic basic case and only calls the iterator
      function for the complex case.  A hindrance was the fact that
      cpuset_current_mems_allowed is used in the fastpath as the allowed
      nodemask even though all nodes are allowed on most systems.  The patch
      handles this by only considering cpuset_current_mems_allowed if a cpuset
      exists.  As well as being faster in the fast-path, this removes some
      junk in the slowpath.
      
      The performance difference on a page allocator microbenchmark is:
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                            statinline-v1r20              optiter-v1r20
        Min      alloc-odr0-1               412.00 (  0.00%)           382.00 (  7.28%)
        Min      alloc-odr0-2               301.00 (  0.00%)           282.00 (  6.31%)
        Min      alloc-odr0-4               247.00 (  0.00%)           233.00 (  5.67%)
        Min      alloc-odr0-8               215.00 (  0.00%)           203.00 (  5.58%)
        Min      alloc-odr0-16              199.00 (  0.00%)           188.00 (  5.53%)
        Min      alloc-odr0-32              191.00 (  0.00%)           182.00 (  4.71%)
        Min      alloc-odr0-64              187.00 (  0.00%)           177.00 (  5.35%)
        Min      alloc-odr0-128             185.00 (  0.00%)           175.00 (  5.41%)
        Min      alloc-odr0-256             193.00 (  0.00%)           184.00 (  4.66%)
        Min      alloc-odr0-512             207.00 (  0.00%)           197.00 (  4.83%)
        Min      alloc-odr0-1024            213.00 (  0.00%)           203.00 (  4.69%)
        Min      alloc-odr0-2048            220.00 (  0.00%)           209.00 (  5.00%)
        Min      alloc-odr0-4096            226.00 (  0.00%)           214.00 (  5.31%)
        Min      alloc-odr0-8192            229.00 (  0.00%)           218.00 (  4.80%)
        Min      alloc-odr0-16384           229.00 (  0.00%)           219.00 (  4.37%)
      
      perf indicated that next_zones_zonelist disappeared in the profile and
      __next_zones_zonelist did not appear.  This is expected as the
      micro-benchmark would hit the inlined fast-path every time.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      682a3385
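The pattern in the commit above, an inlined test for the optimistic case with an out-of-line function for everything else, can be sketched as follows. The types and names (struct fake_zoneref, next_zoneref(), next_zoneref_slow(), slow_iter_calls) are hypothetical simplifications, and a single unsigned long stands in for a nodemask.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zonelist entry; valid == false terminates. */
struct fake_zoneref {
    int zone_idx;
    int node;
    bool valid;
};

/* Counts calls into the out-of-line iterator, for illustration only. */
static int slow_iter_calls;

/* Out-of-line iterator: skips entries above the zone index limit or
 * outside the nodemask (a NULL mask means every node is allowed).
 * Returns the first match, or the terminator if nothing matches. */
static const struct fake_zoneref *
next_zoneref_slow(const struct fake_zoneref *z, int highest_idx,
                  const unsigned long *nodemask)
{
    slow_iter_calls++;
    while (z->valid) {
        bool idx_ok = z->zone_idx <= highest_idx;
        bool node_ok = !nodemask || ((*nodemask >> z->node) & 1UL);

        if (idx_ok && node_ok)
            break;
        z++;
    }
    return z;
}

/* Inlined fast path: with no nodemask and a usable current entry there
 * is nothing to search, so no function call is made. */
static inline const struct fake_zoneref *
next_zoneref(const struct fake_zoneref *z, int highest_idx,
             const unsigned long *nodemask)
{
    if (!nodemask && z->valid && z->zone_idx <= highest_idx)
        return z;
    return next_zoneref_slow(z, highest_idx, nodemask);
}
```

An unrestricted allocation whose first entry is usable never leaves the inlined function, which is consistent with the commit's observation that the out-of-line iterator did not show up in the microbenchmark profile.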
    • mm, page_alloc: inline zone_statistics · 060e7417
      Mel Gorman authored
      zone_statistics has one call-site but it's a public function.  Make it
      static and inline.
      
      The performance difference on a page allocator microbenchmark is:
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                            statbranch-v1r20           statinline-v1r20
        Min      alloc-odr0-1               419.00 (  0.00%)           412.00 (  1.67%)
        Min      alloc-odr0-2               305.00 (  0.00%)           301.00 (  1.31%)
        Min      alloc-odr0-4               250.00 (  0.00%)           247.00 (  1.20%)
        Min      alloc-odr0-8               219.00 (  0.00%)           215.00 (  1.83%)
        Min      alloc-odr0-16              203.00 (  0.00%)           199.00 (  1.97%)
        Min      alloc-odr0-32              195.00 (  0.00%)           191.00 (  2.05%)
        Min      alloc-odr0-64              191.00 (  0.00%)           187.00 (  2.09%)
        Min      alloc-odr0-128             189.00 (  0.00%)           185.00 (  2.12%)
        Min      alloc-odr0-256             198.00 (  0.00%)           193.00 (  2.53%)
        Min      alloc-odr0-512             210.00 (  0.00%)           207.00 (  1.43%)
        Min      alloc-odr0-1024            216.00 (  0.00%)           213.00 (  1.39%)
        Min      alloc-odr0-2048            221.00 (  0.00%)           220.00 (  0.45%)
        Min      alloc-odr0-4096            227.00 (  0.00%)           226.00 (  0.44%)
        Min      alloc-odr0-8192            232.00 (  0.00%)           229.00 (  1.29%)
        Min      alloc-odr0-16384           232.00 (  0.00%)           229.00 (  1.29%)
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      060e7417
    • mm, page_alloc: reduce branches in zone_statistics · b9f00e14
      Mel Gorman authored
      zone_statistics has more branches than it really needs to take an
      unlikely GFP flag into account.  Reduce the number and annotate the
      unlikely flag.
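
      The general technique can be sketched in user space as follows.
      This is not the patch itself: the counter names, the flag value and
      the function signature are hypothetical, and unlikely_sketch wraps
      the GCC/Clang builtin __builtin_expect.

```c
#include <assert.h>
#include <stddef.h>

#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag and counters, for illustration only. */
#define FLAG_OTHER_NODE_SKETCH 0x1u

enum stat_sketch { STAT_HIT, STAT_MISS, STAT_LOCAL, STAT_OTHER, STAT_COUNT };

void count_alloc_sketch(unsigned long counters[STAT_COUNT], unsigned int flags,
			int zone_node, int preferred_node, int current_node)
{
	/* Defaults cover the common case, so the rare flag costs one
	 * annotated branch instead of a check in every arm below. */
	int local_node = current_node;
	enum stat_sketch local_stat = STAT_LOCAL;

	assert(counters != NULL);

	if (unlikely_sketch(flags & FLAG_OTHER_NODE_SKETCH)) {
		local_stat = STAT_OTHER;
		local_node = preferred_node;
	}

	if (zone_node == local_node) {
		counters[STAT_HIT]++;
		counters[local_stat]++;
	} else {
		counters[STAT_MISS]++;
	}
}
```

      Resolving the rare flag once, up front, leaves a single comparison
      on the common path.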
      
      The performance difference on a page allocator microbenchmark is:
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                            nocompound-v1r10           statbranch-v1r10
        Min      alloc-odr0-1               417.00 (  0.00%)           419.00 ( -0.48%)
        Min      alloc-odr0-2               308.00 (  0.00%)           305.00 (  0.97%)
        Min      alloc-odr0-4               253.00 (  0.00%)           250.00 (  1.19%)
        Min      alloc-odr0-8               221.00 (  0.00%)           219.00 (  0.90%)
        Min      alloc-odr0-16              205.00 (  0.00%)           203.00 (  0.98%)
        Min      alloc-odr0-32              199.00 (  0.00%)           195.00 (  2.01%)
        Min      alloc-odr0-64              193.00 (  0.00%)           191.00 (  1.04%)
        Min      alloc-odr0-128             191.00 (  0.00%)           189.00 (  1.05%)
        Min      alloc-odr0-256             200.00 (  0.00%)           198.00 (  1.00%)
        Min      alloc-odr0-512             212.00 (  0.00%)           210.00 (  0.94%)
        Min      alloc-odr0-1024            219.00 (  0.00%)           216.00 (  1.37%)
        Min      alloc-odr0-2048            225.00 (  0.00%)           221.00 (  1.78%)
        Min      alloc-odr0-4096            231.00 (  0.00%)           227.00 (  1.73%)
        Min      alloc-odr0-8192            234.00 (  0.00%)           232.00 (  0.85%)
        Min      alloc-odr0-16384           234.00 (  0.00%)           232.00 (  0.85%)
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9f00e14
    • mm, page_alloc: use new PageAnonHead helper in the free page fast path · 17514574
      Mel Gorman authored
      The PageAnon check always checks for compound_head but this is a
      relatively expensive check if the caller already knows the page is a
      head page.  This patch creates a helper and uses it in the page free
      path which only operates on head pages.
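
      A toy model of the idea, assuming a simplified page descriptor in
      which tail pages point at their head and the low bit of the mapping
      word tags anonymous memory.  The struct and all _sketch names are
      hypothetical; only the split into a general check and a head-only
      helper mirrors the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAPPING_ANON_SKETCH ((uintptr_t)0x1)

/* Hypothetical page descriptor, for illustration only. */
struct page_sketch {
	struct page_sketch *head;	/* NULL for head pages, else the head */
	uintptr_t mapping;		/* low bit set for anonymous mappings */
};

/* The lookup that the general check always has to perform. */
static inline struct page_sketch *compound_head_sketch(struct page_sketch *page)
{
	return page->head ? page->head : page;
}

/* Head-only helper: valid only when the caller knows it holds a head page. */
static inline bool page_anon_head_sketch(const struct page_sketch *head)
{
	assert(head != NULL);
	return (head->mapping & MAPPING_ANON_SKETCH) != 0;
}

/* General check: resolve the head first, then test it. */
static inline bool page_anon_sketch(struct page_sketch *page)
{
	return page_anon_head_sketch(compound_head_sketch(page));
}
```

      Calling the head-only helper on a tail page would test the wrong
      descriptor, which is why it suits only callers such as a free path
      that operates on head pages.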
      
      With this patch and "Only check PageCompound for high-order pages", the
      performance difference on a page allocator microbenchmark is:
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                                     vanilla           nocompound-v1r20
        Min      alloc-odr0-1               425.00 (  0.00%)           417.00 (  1.88%)
        Min      alloc-odr0-2               313.00 (  0.00%)           308.00 (  1.60%)
        Min      alloc-odr0-4               257.00 (  0.00%)           253.00 (  1.56%)
        Min      alloc-odr0-8               224.00 (  0.00%)           221.00 (  1.34%)
        Min      alloc-odr0-16              208.00 (  0.00%)           205.00 (  1.44%)
        Min      alloc-odr0-32              199.00 (  0.00%)           199.00 (  0.00%)
        Min      alloc-odr0-64              195.00 (  0.00%)           193.00 (  1.03%)
        Min      alloc-odr0-128             192.00 (  0.00%)           191.00 (  0.52%)
        Min      alloc-odr0-256             204.00 (  0.00%)           200.00 (  1.96%)
        Min      alloc-odr0-512             213.00 (  0.00%)           212.00 (  0.47%)
        Min      alloc-odr0-1024            219.00 (  0.00%)           219.00 (  0.00%)
        Min      alloc-odr0-2048            225.00 (  0.00%)           225.00 (  0.00%)
        Min      alloc-odr0-4096            230.00 (  0.00%)           231.00 ( -0.43%)
        Min      alloc-odr0-8192            235.00 (  0.00%)           234.00 (  0.43%)
        Min      alloc-odr0-16384           235.00 (  0.00%)           234.00 (  0.43%)
        Min      free-odr0-1                215.00 (  0.00%)           191.00 ( 11.16%)
        Min      free-odr0-2                152.00 (  0.00%)           136.00 ( 10.53%)
        Min      free-odr0-4                119.00 (  0.00%)           107.00 ( 10.08%)
        Min      free-odr0-8                106.00 (  0.00%)            96.00 (  9.43%)
        Min      free-odr0-16                97.00 (  0.00%)            87.00 ( 10.31%)
        Min      free-odr0-32                91.00 (  0.00%)            83.00 (  8.79%)
        Min      free-odr0-64                89.00 (  0.00%)            81.00 (  8.99%)
        Min      free-odr0-128               88.00 (  0.00%)            80.00 (  9.09%)
        Min      free-odr0-256              106.00 (  0.00%)            95.00 ( 10.38%)
        Min      free-odr0-512              116.00 (  0.00%)           111.00 (  4.31%)
        Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
        Min      free-odr0-2048             133.00 (  0.00%)           126.00 (  5.26%)
        Min      free-odr0-4096             136.00 (  0.00%)           130.00 (  4.41%)
        Min      free-odr0-8192             138.00 (  0.00%)           130.00 (  5.80%)
        Min      free-odr0-16384            137.00 (  0.00%)           130.00 (  5.11%)
      
      There is a sizable boost to free-path performance.  While there is
      an apparent boost on the allocation side, it is likely a coincidence
      or due to the patches slightly reducing cache footprint.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      17514574
    • mm, page_alloc: only check PageCompound for high-order pages · d61f8590
      Mel Gorman authored
      Another year, another round of page allocator optimisations focusing
      this time on the alloc and free fast paths.  This should be of help to
      workloads that are allocator-intensive from kernel space where the cost
      of zeroing is not necessarily incurred.
      
      The series is motivated by two things.  First, page alloc
      microbenchmarks on multiple machines regressed between 3.12.44 and 4.4.
      Second, there were discussions before LSF/MM about adding another page
      allocator, which is potentially hazardous; a patch series improving
      performance is better than whining.
      
      After the series is applied, there are still hazards.  In the free
      paths, the debugging checking and page zone/pageblock lookups dominate
      but there was not an obvious solution to that.  In the alloc path, the
      major contributors are dealing with zonelists, new page preparation, the
      fair zone allocation and numerous statistics updates.  The fair zone
      allocator is removed by the per-node LRU series if that gets merged so
      it's not a major concern at the moment.
      
      On normal userspace benchmarks, there is little impact as the zeroing
      cost is significant, but it is visible:
      
        aim9
                                       4.6.0-rc3             4.6.0-rc3
                                         vanilla         deferalloc-v3
        Min      page_test   828693.33 (  0.00%)   887060.00 (  7.04%)
        Min      brk_test   4847266.67 (  0.00%)  4966266.67 (  2.45%)
        Min      exec_test     1271.00 (  0.00%)     1275.67 (  0.37%)
        Min      fork_test    12371.75 (  0.00%)    12380.00 (  0.07%)
      
      The overall impact on a page allocator microbenchmark for a range of orders
      and number of pages allocated in a batch is:
      
                                                  4.6.0-rc3                  4.6.0-rc3
                                                     vanilla            deferalloc-v3r7
        Min      alloc-odr0-1               428.00 (  0.00%)           316.00 ( 26.17%)
        Min      alloc-odr0-2               314.00 (  0.00%)           231.00 ( 26.43%)
        Min      alloc-odr0-4               256.00 (  0.00%)           192.00 ( 25.00%)
        Min      alloc-odr0-8               222.00 (  0.00%)           166.00 ( 25.23%)
        Min      alloc-odr0-16              207.00 (  0.00%)           154.00 ( 25.60%)
        Min      alloc-odr0-32              197.00 (  0.00%)           148.00 ( 24.87%)
        Min      alloc-odr0-64              193.00 (  0.00%)           144.00 ( 25.39%)
        Min      alloc-odr0-128             191.00 (  0.00%)           143.00 ( 25.13%)
        Min      alloc-odr0-256             203.00 (  0.00%)           153.00 ( 24.63%)
        Min      alloc-odr0-512             212.00 (  0.00%)           165.00 ( 22.17%)
        Min      alloc-odr0-1024            221.00 (  0.00%)           172.00 ( 22.17%)
        Min      alloc-odr0-2048            225.00 (  0.00%)           179.00 ( 20.44%)
        Min      alloc-odr0-4096            232.00 (  0.00%)           185.00 ( 20.26%)
        Min      alloc-odr0-8192            235.00 (  0.00%)           187.00 ( 20.43%)
        Min      alloc-odr0-16384           236.00 (  0.00%)           188.00 ( 20.34%)
        Min      alloc-odr1-1               519.00 (  0.00%)           450.00 ( 13.29%)
        Min      alloc-odr1-2               391.00 (  0.00%)           336.00 ( 14.07%)
        Min      alloc-odr1-4               313.00 (  0.00%)           268.00 ( 14.38%)
        Min      alloc-odr1-8               277.00 (  0.00%)           235.00 ( 15.16%)
        Min      alloc-odr1-16              256.00 (  0.00%)           218.00 ( 14.84%)
        Min      alloc-odr1-32              252.00 (  0.00%)           212.00 ( 15.87%)
        Min      alloc-odr1-64              244.00 (  0.00%)           206.00 ( 15.57%)
        Min      alloc-odr1-128             244.00 (  0.00%)           207.00 ( 15.16%)
        Min      alloc-odr1-256             243.00 (  0.00%)           207.00 ( 14.81%)
        Min      alloc-odr1-512             245.00 (  0.00%)           209.00 ( 14.69%)
        Min      alloc-odr1-1024            248.00 (  0.00%)           214.00 ( 13.71%)
        Min      alloc-odr1-2048            253.00 (  0.00%)           220.00 ( 13.04%)
        Min      alloc-odr1-4096            258.00 (  0.00%)           224.00 ( 13.18%)
        Min      alloc-odr1-8192            261.00 (  0.00%)           229.00 ( 12.26%)
        Min      alloc-odr2-1               560.00 (  0.00%)           753.00 (-34.46%)
        Min      alloc-odr2-2               424.00 (  0.00%)           351.00 ( 17.22%)
        Min      alloc-odr2-4               339.00 (  0.00%)           393.00 (-15.93%)
        Min      alloc-odr2-8               298.00 (  0.00%)           246.00 ( 17.45%)
        Min      alloc-odr2-16              276.00 (  0.00%)           227.00 ( 17.75%)
        Min      alloc-odr2-32              271.00 (  0.00%)           221.00 ( 18.45%)
        Min      alloc-odr2-64              264.00 (  0.00%)           217.00 ( 17.80%)
        Min      alloc-odr2-128             264.00 (  0.00%)           217.00 ( 17.80%)
        Min      alloc-odr2-256             264.00 (  0.00%)           218.00 ( 17.42%)
        Min      alloc-odr2-512             269.00 (  0.00%)           223.00 ( 17.10%)
        Min      alloc-odr2-1024            279.00 (  0.00%)           230.00 ( 17.56%)
        Min      alloc-odr2-2048            283.00 (  0.00%)           235.00 ( 16.96%)
        Min      alloc-odr2-4096            285.00 (  0.00%)           239.00 ( 16.14%)
        Min      alloc-odr3-1               629.00 (  0.00%)           505.00 ( 19.71%)
        Min      alloc-odr3-2               472.00 (  0.00%)           374.00 ( 20.76%)
        Min      alloc-odr3-4               383.00 (  0.00%)           301.00 ( 21.41%)
        Min      alloc-odr3-8               341.00 (  0.00%)           266.00 ( 21.99%)
        Min      alloc-odr3-16              316.00 (  0.00%)           248.00 ( 21.52%)
        Min      alloc-odr3-32              308.00 (  0.00%)           241.00 ( 21.75%)
        Min      alloc-odr3-64              305.00 (  0.00%)           241.00 ( 20.98%)
        Min      alloc-odr3-128             308.00 (  0.00%)           244.00 ( 20.78%)
        Min      alloc-odr3-256             317.00 (  0.00%)           249.00 ( 21.45%)
        Min      alloc-odr3-512             327.00 (  0.00%)           256.00 ( 21.71%)
        Min      alloc-odr3-1024            331.00 (  0.00%)           261.00 ( 21.15%)
        Min      alloc-odr3-2048            333.00 (  0.00%)           266.00 ( 20.12%)
        Min      alloc-odr4-1               767.00 (  0.00%)           572.00 ( 25.42%)
        Min      alloc-odr4-2               578.00 (  0.00%)           429.00 ( 25.78%)
        Min      alloc-odr4-4               474.00 (  0.00%)           346.00 ( 27.00%)
        Min      alloc-odr4-8               422.00 (  0.00%)           310.00 ( 26.54%)
        Min      alloc-odr4-16              399.00 (  0.00%)           295.00 ( 26.07%)
        Min      alloc-odr4-32              392.00 (  0.00%)           293.00 ( 25.26%)
        Min      alloc-odr4-64              394.00 (  0.00%)           293.00 ( 25.63%)
        Min      alloc-odr4-128             405.00 (  0.00%)           305.00 ( 24.69%)
        Min      alloc-odr4-256             417.00 (  0.00%)           319.00 ( 23.50%)
        Min      alloc-odr4-512             425.00 (  0.00%)           326.00 ( 23.29%)
        Min      alloc-odr4-1024            426.00 (  0.00%)           329.00 ( 22.77%)
        Min      free-odr0-1                216.00 (  0.00%)           178.00 ( 17.59%)
        Min      free-odr0-2                152.00 (  0.00%)           125.00 ( 17.76%)
        Min      free-odr0-4                120.00 (  0.00%)            99.00 ( 17.50%)
        Min      free-odr0-8                106.00 (  0.00%)            85.00 ( 19.81%)
        Min      free-odr0-16                97.00 (  0.00%)            80.00 ( 17.53%)
        Min      free-odr0-32                92.00 (  0.00%)            76.00 ( 17.39%)
        Min      free-odr0-64                89.00 (  0.00%)            74.00 ( 16.85%)
        Min      free-odr0-128               89.00 (  0.00%)            73.00 ( 17.98%)
        Min      free-odr0-256              107.00 (  0.00%)            90.00 ( 15.89%)
        Min      free-odr0-512              117.00 (  0.00%)           108.00 (  7.69%)
        Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
        Min      free-odr0-2048             132.00 (  0.00%)           125.00 (  5.30%)
        Min      free-odr0-4096             135.00 (  0.00%)           130.00 (  3.70%)
        Min      free-odr0-8192             137.00 (  0.00%)           130.00 (  5.11%)
        Min      free-odr0-16384            137.00 (  0.00%)           131.00 (  4.38%)
        Min      free-odr1-1                318.00 (  0.00%)           289.00 (  9.12%)
        Min      free-odr1-2                228.00 (  0.00%)           207.00 (  9.21%)
        Min      free-odr1-4                182.00 (  0.00%)           165.00 (  9.34%)
        Min      free-odr1-8                163.00 (  0.00%)           146.00 ( 10.43%)
        Min      free-odr1-16               151.00 (  0.00%)           135.00 ( 10.60%)
        Min      free-odr1-32               146.00 (  0.00%)           129.00 ( 11.64%)
        Min      free-odr1-64               145.00 (  0.00%)           130.00 ( 10.34%)
        Min      free-odr1-128              148.00 (  0.00%)           134.00 (  9.46%)
        Min      free-odr1-256              148.00 (  0.00%)           137.00 (  7.43%)
        Min      free-odr1-512              151.00 (  0.00%)           140.00 (  7.28%)
        Min      free-odr1-1024             154.00 (  0.00%)           143.00 (  7.14%)
        Min      free-odr1-2048             156.00 (  0.00%)           144.00 (  7.69%)
        Min      free-odr1-4096             156.00 (  0.00%)           142.00 (  8.97%)
        Min      free-odr1-8192             156.00 (  0.00%)           140.00 ( 10.26%)
        Min      free-odr2-1                361.00 (  0.00%)           457.00 (-26.59%)
        Min      free-odr2-2                258.00 (  0.00%)           224.00 ( 13.18%)
        Min      free-odr2-4                208.00 (  0.00%)           223.00 ( -7.21%)
        Min      free-odr2-8                185.00 (  0.00%)           160.00 ( 13.51%)
        Min      free-odr2-16               173.00 (  0.00%)           149.00 ( 13.87%)
        Min      free-odr2-32               166.00 (  0.00%)           145.00 ( 12.65%)
        Min      free-odr2-64               166.00 (  0.00%)           146.00 ( 12.05%)
        Min      free-odr2-128              169.00 (  0.00%)           148.00 ( 12.43%)
        Min      free-odr2-256              170.00 (  0.00%)           152.00 ( 10.59%)
        Min      free-odr2-512              177.00 (  0.00%)           156.00 ( 11.86%)
        Min      free-odr2-1024             182.00 (  0.00%)           162.00 ( 10.99%)
        Min      free-odr2-2048             181.00 (  0.00%)           160.00 ( 11.60%)
        Min      free-odr2-4096             180.00 (  0.00%)           159.00 ( 11.67%)
        Min      free-odr3-1                431.00 (  0.00%)           367.00 ( 14.85%)
        Min      free-odr3-2                306.00 (  0.00%)           259.00 ( 15.36%)
        Min      free-odr3-4                249.00 (  0.00%)           208.00 ( 16.47%)
        Min      free-odr3-8                224.00 (  0.00%)           186.00 ( 16.96%)
        Min      free-odr3-16               208.00 (  0.00%)           176.00 ( 15.38%)
        Min      free-odr3-32               206.00 (  0.00%)           174.00 ( 15.53%)
        Min      free-odr3-64               210.00 (  0.00%)           178.00 ( 15.24%)
        Min      free-odr3-128              215.00 (  0.00%)           182.00 ( 15.35%)
        Min      free-odr3-256              224.00 (  0.00%)           189.00 ( 15.62%)
        Min      free-odr3-512              232.00 (  0.00%)           195.00 ( 15.95%)
        Min      free-odr3-1024             230.00 (  0.00%)           195.00 ( 15.22%)
        Min      free-odr3-2048             229.00 (  0.00%)           193.00 ( 15.72%)
        Min      free-odr4-1                561.00 (  0.00%)           439.00 ( 21.75%)
        Min      free-odr4-2                418.00 (  0.00%)           318.00 ( 23.92%)
        Min      free-odr4-4                339.00 (  0.00%)           269.00 ( 20.65%)
        Min      free-odr4-8                299.00 (  0.00%)           239.00 ( 20.07%)
        Min      free-odr4-16               289.00 (  0.00%)           234.00 ( 19.03%)
        Min      free-odr4-32               291.00 (  0.00%)           235.00 ( 19.24%)
        Min      free-odr4-64               298.00 (  0.00%)           238.00 ( 20.13%)
        Min      free-odr4-128              308.00 (  0.00%)           251.00 ( 18.51%)
        Min      free-odr4-256              321.00 (  0.00%)           267.00 ( 16.82%)
        Min      free-odr4-512              327.00 (  0.00%)           269.00 ( 17.74%)
        Min      free-odr4-1024             326.00 (  0.00%)           271.00 ( 16.87%)
        Min      total-odr0-1               644.00 (  0.00%)           494.00 ( 23.29%)
        Min      total-odr0-2               466.00 (  0.00%)           356.00 ( 23.61%)
        Min      total-odr0-4               376.00 (  0.00%)           291.00 ( 22.61%)
        Min      total-odr0-8               328.00 (  0.00%)           251.00 ( 23.48%)
        Min      total-odr0-16              304.00 (  0.00%)           234.00 ( 23.03%)
        Min      total-odr0-32              289.00 (  0.00%)           224.00 ( 22.49%)
        Min      total-odr0-64              282.00 (  0.00%)           218.00 ( 22.70%)
        Min      total-odr0-128             280.00 (  0.00%)           216.00 ( 22.86%)
        Min      total-odr0-256             310.00 (  0.00%)           243.00 ( 21.61%)
        Min      total-odr0-512             329.00 (  0.00%)           273.00 ( 17.02%)
        Min      total-odr0-1024            346.00 (  0.00%)           290.00 ( 16.18%)
        Min      total-odr0-2048            357.00 (  0.00%)           304.00 ( 14.85%)
        Min      total-odr0-4096            367.00 (  0.00%)           315.00 ( 14.17%)
        Min      total-odr0-8192            372.00 (  0.00%)           317.00 ( 14.78%)
        Min      total-odr0-16384           373.00 (  0.00%)           319.00 ( 14.48%)
        Min      total-odr1-1               838.00 (  0.00%)           739.00 ( 11.81%)
        Min      total-odr1-2               619.00 (  0.00%)           543.00 ( 12.28%)
        Min      total-odr1-4               495.00 (  0.00%)           433.00 ( 12.53%)
        Min      total-odr1-8               440.00 (  0.00%)           382.00 ( 13.18%)
        Min      total-odr1-16              407.00 (  0.00%)           353.00 ( 13.27%)
        Min      total-odr1-32              398.00 (  0.00%)           341.00 ( 14.32%)
        Min      total-odr1-64              389.00 (  0.00%)           336.00 ( 13.62%)
        Min      total-odr1-128             392.00 (  0.00%)           341.00 ( 13.01%)
        Min      total-odr1-256             391.00 (  0.00%)           344.00 ( 12.02%)
        Min      total-odr1-512             396.00 (  0.00%)           349.00 ( 11.87%)
        Min      total-odr1-1024            402.00 (  0.00%)           357.00 ( 11.19%)
        Min      total-odr1-2048            409.00 (  0.00%)           364.00 ( 11.00%)
        Min      total-odr1-4096            414.00 (  0.00%)           366.00 ( 11.59%)
        Min      total-odr1-8192            417.00 (  0.00%)           369.00 ( 11.51%)
        Min      total-odr2-1               921.00 (  0.00%)          1210.00 (-31.38%)
        Min      total-odr2-2               682.00 (  0.00%)           576.00 ( 15.54%)
        Min      total-odr2-4               547.00 (  0.00%)           616.00 (-12.61%)
        Min      total-odr2-8               483.00 (  0.00%)           406.00 ( 15.94%)
        Min      total-odr2-16              449.00 (  0.00%)           376.00 ( 16.26%)
        Min      total-odr2-32              437.00 (  0.00%)           366.00 ( 16.25%)
        Min      total-odr2-64              431.00 (  0.00%)           363.00 ( 15.78%)
        Min      total-odr2-128             433.00 (  0.00%)           365.00 ( 15.70%)
        Min      total-odr2-256             434.00 (  0.00%)           371.00 ( 14.52%)
        Min      total-odr2-512             446.00 (  0.00%)           379.00 ( 15.02%)
        Min      total-odr2-1024            461.00 (  0.00%)           392.00 ( 14.97%)
        Min      total-odr2-2048            464.00 (  0.00%)           395.00 ( 14.87%)
        Min      total-odr2-4096            465.00 (  0.00%)           398.00 ( 14.41%)
        Min      total-odr3-1              1060.00 (  0.00%)           872.00 ( 17.74%)
        Min      total-odr3-2               778.00 (  0.00%)           633.00 ( 18.64%)
        Min      total-odr3-4               632.00 (  0.00%)           510.00 ( 19.30%)
        Min      total-odr3-8               565.00 (  0.00%)           452.00 ( 20.00%)
        Min      total-odr3-16              524.00 (  0.00%)           424.00 ( 19.08%)
        Min      total-odr3-32              514.00 (  0.00%)           415.00 ( 19.26%)
        Min      total-odr3-64              515.00 (  0.00%)           419.00 ( 18.64%)
        Min      total-odr3-128             523.00 (  0.00%)           426.00 ( 18.55%)
        Min      total-odr3-256             541.00 (  0.00%)           438.00 ( 19.04%)
        Min      total-odr3-512             559.00 (  0.00%)           451.00 ( 19.32%)
        Min      total-odr3-1024            561.00 (  0.00%)           456.00 ( 18.72%)
        Min      total-odr3-2048            562.00 (  0.00%)           459.00 ( 18.33%)
        Min      total-odr4-1              1328.00 (  0.00%)          1011.00 ( 23.87%)
        Min      total-odr4-2               997.00 (  0.00%)           747.00 ( 25.08%)
        Min      total-odr4-4               813.00 (  0.00%)           615.00 ( 24.35%)
        Min      total-odr4-8               721.00 (  0.00%)           550.00 ( 23.72%)
        Min      total-odr4-16              689.00 (  0.00%)           529.00 ( 23.22%)
        Min      total-odr4-32              683.00 (  0.00%)           528.00 ( 22.69%)
        Min      total-odr4-64              692.00 (  0.00%)           531.00 ( 23.27%)
        Min      total-odr4-128             713.00 (  0.00%)           556.00 ( 22.02%)
        Min      total-odr4-256             738.00 (  0.00%)           586.00 ( 20.60%)
        Min      total-odr4-512             753.00 (  0.00%)           595.00 ( 20.98%)
        Min      total-odr4-1024            752.00 (  0.00%)           600.00 ( 20.21%)
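
      The percentage columns in the tables above appear consistent with a
      simple relative difference against the baseline column.  The sketch
      below assumes that formula; the function names are hypothetical and
      the benchmark tooling's actual implementation is not shown in this
      message.  For example, alloc-odr0-1 (428.00 to 316.00, lower is
      better) works out to about 26.17%, and aim9 page_test (828693.33 to
      887060.00, higher is better) to about 7.04%.

```c
#include <assert.h>

/* Gain when a smaller value is better, e.g. time per operation. */
double gain_lower_is_better(double baseline, double patched)
{
	assert(baseline != 0.0);
	return (baseline - patched) / baseline * 100.0;
}

/* Gain when a larger value is better, e.g. operations per second. */
double gain_higher_is_better(double baseline, double patched)
{
	assert(baseline != 0.0);
	return (patched - baseline) / baseline * 100.0;
}
```

      A negative result marks a regression, as in the alloc-odr2-1 row
      (560.00 to 753.00).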
      
      This patch (of 27):
      
      order-0 pages by definition cannot be compound so avoid the check in the
      fast path for those pages.
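
      A minimal user-space sketch of the idea, assuming a toy page
      descriptor and a counter that records how often the compound check
      runs.  All _sketch names are hypothetical, and unlikely_sketch wraps
      the GCC/Clang builtin __builtin_expect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

/* Hypothetical page descriptor and statistics, for illustration only. */
struct page_sketch {
	bool compound;
	unsigned int compound_order;
};

struct free_stats_sketch {
	unsigned long compound_checks;
};

static bool is_compound_sketch(const struct page_sketch *page,
			       struct free_stats_sketch *st)
{
	st->compound_checks++;
	return page->compound;
}

/* Returns false if a compound page is freed with a mismatched order. */
bool free_prepare_sketch(const struct page_sketch *page, unsigned int order,
			 struct free_stats_sketch *st)
{
	assert(page != NULL && st != NULL);

	/* Order-0 pages cannot be compound, so only high-order frees
	 * pay for the compound check. */
	if (unlikely_sketch(order)) {
		if (is_compound_sketch(page, st) &&
		    page->compound_order != order)
			return false;
	}
	return true;
}
```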
      
      [akpm@linux-foundation.org: use unlikely(order) in free_pages_prepare(), per Vlastimil]
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d61f8590
    • mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper · 449d777d
      Michal Hocko authored
      Right now the oom reaper will clear TIF_MEMDIE only for tasks which were
      successfully reaped.  This is the safest option because we know that
      such an oom victim would only block forward progress of the oom killer
      without a good reason: it is highly unlikely that it would release
      much more memory.  Basically most of its memory has already been torn
      down.
      
      We can relax this assumption to catch more corner cases though.
      
      The first obvious one is when the oom victim clears its mm and gets
      stuck later on.  oom_reaper would back off on find_lock_task_mm returning
      NULL.  We can safely try to clear TIF_MEMDIE in this case because such a
      task would be ignored by the oom killer anyway.  In most cases the flag
      would already have been cleared by that time anyway.
      
      The less obvious one is when the oom reaper fails due to mmap_sem
      contention.  Even if we clear TIF_MEMDIE for this task then it is not
      very likely that we would select another task too easily because we
      haven't reaped the last victim and so it would be still the #1
      candidate.  There is a rare race condition possible when the current
      victim terminates before the next select_bad_process but considering
      that oom_reap_task had retried several times before giving up then this
      sounds like a borderline thing.
      
      After this patch we should have a guarantee that the OOM killer will not
      be blocked for an unbounded amount of time in most cases.
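
      The behavior described above can be modeled with a small user-space
      sketch.  It assumes a toy victim structure in which a boolean stands
      in for TIF_MEMDIE and a counter stands in for mmap_sem contention;
      the retry limit and all _sketch names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REAP_RETRIES_SKETCH 10

/* Hypothetical victim state, for illustration only. */
struct victim_sketch {
	bool has_mm;		/* false once the victim has dropped its mm */
	bool memdie;		/* models TIF_MEMDIE */
	bool reaped;		/* set when the address space was torn down */
	int lock_busy_for;	/* attempts during which the lock is contended */
};

/* One reap attempt; returns false only when a retry could still help. */
static bool reap_once_sketch(struct victim_sketch *v)
{
	if (!v->has_mm)
		return true;	/* nothing left to reap, back off */
	if (v->lock_busy_for > 0) {
		v->lock_busy_for--;
		return false;	/* lock contended, caller may retry */
	}
	v->reaped = true;
	return true;
}

void oom_reap_sketch(struct victim_sketch *v)
{
	int attempts = 0;

	assert(v != NULL);
	while (attempts++ < MAX_REAP_RETRIES_SKETCH && !reap_once_sketch(v))
		;	/* a real implementation would sleep between retries */

	/* Clear the flag whether or not reaping succeeded, so the victim
	 * no longer blocks further OOM handling. */
	v->memdie = false;
}
```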
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Raushaniya Maksudova <rmaksudova@parallels.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      449d777d
    • oom, oom_reaper: try to reap tasks which skip regular OOM killer path · 3ef22dff
      Michal Hocko authored
      If either the current task is already killed or PF_EXITING, or a selected
      task is PF_EXITING, then the oom killer is suppressed and so is the oom
      reaper.  This patch adds try_oom_reaper, which checks the given task and
      queues it for the oom reaper if that is safe, meaning that the task
      doesn't share the mm with a live process.
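
      The safety condition can be sketched as a scan over processes.  This
      is a toy model, assuming a flat process array and a boolean "dying"
      state; the structs and the function name are hypothetical and only
      the rule itself (no live process may share the mm) comes from the
      description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address space and process records, for illustration. */
struct mm_sketch {
	int id;
};

struct proc_sketch {
	struct mm_sketch *mm;	/* NULL once the process dropped its mm */
	bool dying;		/* killed or already exiting */
};

/* True when the victim's mm may be handed to the reaper. */
bool can_queue_for_reaper_sketch(const struct proc_sketch *victim,
				 const struct proc_sketch *procs, size_t n)
{
	size_t i;

	assert(victim != NULL && procs != NULL);

	if (!victim->mm)
		return false;	/* nothing to reap */

	for (i = 0; i < n; i++) {
		const struct proc_sketch *p = &procs[i];

		if (p == victim || p->mm != victim->mm)
			continue;
		if (!p->dying)
			return false;	/* a live process still uses this mm */
	}
	return true;
}
```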
      
      This might help to relieve memory pressure while the task tries to
      exit.
      
      [akpm@linux-foundation.org: fix nommu build]
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Raushaniya Maksudova <rmaksudova@parallels.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ef22dff
    • mm, oom: move GFP_NOFS check to out_of_memory · 3da88fb3
      Michal Hocko authored
      __alloc_pages_may_oom is the central place to decide when the
      out_of_memory should be invoked.  This is a good approach for most
      checks there because they are page allocator specific and the allocation
      fails right after for all of them.
      
      The notable exception is the GFP_NOFS context, which fakes
      did_some_progress and keeps the page allocator looping even though there
      couldn't have been any progress from the OOM killer.  This patch doesn't
      change this behavior because we are not ready to allow those allocation
      requests to fail yet (and maybe we will face the reality that we will
      never manage to safely fail these requests).  Instead, the __GFP_FS check
      is moved down to out_of_memory and prevents OOM victim selection there.
      There are two reasons for that:
      
      	- OOM notifiers might release some memory even from this context
      	  as none of the registered notifiers seems to be FS related
      	- this might help a dying thread to get access to memory
      	  reserves and move on, which will make the behavior more
      	  consistent with the case when the task gets killed from a
      	  different context.
      
      Keep a comment in __alloc_pages_may_oom to make sure we do not forget
      how GFP_NOFS is special and that we really want to do something about
      it.
      
      Note to the current oom_notifier users:
      
      The observable difference for you is that oom notifiers cannot depend on
      any fs locks because we could deadlock.  Not that this would be allowed
      today, because that would just lock up the machine in most cases and
      rule out the OOM killer along the way.  Another difference is that
      callbacks might be invoked sooner now because GFP_NOFS is a weaker
      reclaim context and so there could be reclaimable memory which is just
      not reachable now.  That would require GFP_NOFS-only loads, which are
      really rare, and more importantly the observable result would be the
      dropping of reconstructible objects and a potential performance drop,
      which is not such a big deal when we are struggling to fulfill other
      important allocation requests.
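      
      The ordering described above can be sketched as a small userspace model.
      This is only an illustration under stated assumptions: struct oom_model,
      MODEL_GFP_FS and model_out_of_memory() are hypothetical names, not kernel
      APIs, and the real out_of_memory() does considerably more.

```c
#include <stdbool.h>

/* Hypothetical stand-in for __GFP_FS; the value is illustrative only. */
#define MODEL_GFP_FS 0x1u

enum oom_outcome {
	OOM_NOTIFIER_FREED,      /* an OOM notifier released some memory */
	OOM_DYING_GOT_RESERVES,  /* the dying task gets access to reserves */
	OOM_SKIPPED_NOFS,        /* no victim selected for a !__GFP_FS request */
	OOM_VICTIM_SELECTED,     /* normal victim selection */
};

/* Hypothetical snapshot of the state the decision depends on. */
struct oom_model {
	unsigned int gfp_mask;        /* allocation context flags */
	unsigned long notifier_freed; /* pages freed by OOM notifiers */
	bool current_dying;           /* current task is killed or exiting */
};

/*
 * Model of the decision order the commit message describes: notifiers and
 * the dying-task shortcut run even for GFP_NOFS, and only then does the
 * missing FS flag suppress victim selection.
 */
enum oom_outcome model_out_of_memory(const struct oom_model *oc)
{
	if (oc->notifier_freed > 0)
		return OOM_NOTIFIER_FREED;
	if (oc->current_dying)
		return OOM_DYING_GOT_RESERVES;
	if (!(oc->gfp_mask & MODEL_GFP_FS))
		return OOM_SKIPPED_NOFS;
	return OOM_VICTIM_SELECTED;
}
```

      In this model a GFP_NOFS request still benefits from notifiers and from
      the dying-task path, which are the two reasons listed above for moving
      the check.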
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Raushaniya Maksudova <rmaksudova@parallels.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3da88fb3
    • Vitaly Kuznetsov's avatar
      memory_hotplug: introduce memhp_default_state= command line parameter · 86dd995d
      Vitaly Kuznetsov authored
      CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE specifies the default value for the
      memory hotplug onlining policy.  Add a command line parameter to make it
      possible to override the default.  It may come in handy for debugging
      and testing purposes.
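      
      A minimal userspace sketch of how such an override could be resolved
      against the config default.  The accepted values "online" and "offline"
      are an assumption here (the message above does not list them), and
      model_memhp_default_state() is a hypothetical name, not the kernel's
      parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns the effective auto-onlining policy.
 * config_default models CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE;
 * arg models the value given to memhp_default_state= (NULL if absent).
 * The strings "online" and "offline" are assumed values.
 */
bool model_memhp_default_state(bool config_default, const char *arg)
{
	if (arg == NULL)
		return config_default;   /* parameter not given */
	if (strcmp(arg, "online") == 0)
		return true;
	if (strcmp(arg, "offline") == 0)
		return false;
	return config_default;           /* unrecognized value: keep default */
}
```

      With this shape, a kernel built with the option enabled could still be
      booted with the policy off for a test run, and vice versa.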
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Lennart Poettering <lennart@poettering.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      86dd995d
    • Vitaly Kuznetsov's avatar
      memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE · 8604d9e5
      Vitaly Kuznetsov authored
      This patchset continues the work I started with commit 31bc3858
      ("memory-hotplug: add automatic onlining policy for the newly added
      memory").
      
      Initially I was going to stop there and bring the policy setting logic
      to userspace.  I met two issues on this way:
      
       1) It is possible to have memory hotplugged at boot (e.g.  with QEMU).
          These blocks stay offlined if we turn the onlining policy on by
          userspace.
      
       2) My attempt to bring this policy setting to systemd failed; systemd
          maintainers suggest changing the default in the kernel or ...  using
          tmpfiles.d to alter the policy (which looks like a hack to me):
              https://github.com/systemd/systemd/pull/2938
      
      Here I suggest adding a config option to set the default value for the
      policy and a kernel command line parameter to override it.
      
      This patch (of 2):
      
      Introduce a config option to set the default value for the memory hotplug
      onlining policy (/sys/devices/system/memory/auto_online_blocks).  The
      reasons one would want to turn this option on are to have early onlining
      for hotpluggable memory available at boot and to not require any
      userspace actions to make memory hotplug work.
      
      [akpm@linux-foundation.org: tweak Kconfig text]
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Lennart Poettering <lennart@poettering.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8604d9e5
    • Hugh Dickins's avatar
      arch: fix has_transparent_hugepage() · fd8cfd30
      Hugh Dickins authored
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h, removing its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default,
      adding #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime, and removing the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      called).
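      
      The override idiom described above can be sketched as follows.  This is
      a standalone illustration, not the actual header contents:
      MODEL_ARCH_OVERRIDES, model_cpu_has_huge_pages and model_thp_supported()
      are hypothetical names, and the generic fallback is an assumed shape of
      the "sensible default".

```c
#include <stdbool.h>

/* What an overriding arch header might contain (hypothetical). */
#ifdef MODEL_ARCH_OVERRIDES
static bool model_cpu_has_huge_pages = true; /* stand-in for a runtime probe */

static inline int has_transparent_hugepage(void)
{
	return model_cpu_has_huge_pages;
}
/* Self-referential define: tells the generic header a definition exists. */
#define has_transparent_hugepage has_transparent_hugepage
#endif

/* Generic fallback, used only when no arch supplied its own version. */
#ifndef has_transparent_hugepage
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define has_transparent_hugepage() 1
#else
#define has_transparent_hugepage() 0
#endif
#endif

/* Callers can now use the name unconditionally, in every configuration. */
int model_thp_supported(void)
{
	return has_transparent_hugepage();
}
```

      Built with neither macro defined, the fallback answers 0; defining
      MODEL_ARCH_OVERRIDES routes the same call to the runtime check instead.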
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd8cfd30
    • Hugh Dickins's avatar
      huge pagecache: extend mremap pmd rmap lockout to files · 1d069b7d
      Hugh Dickins authored
      Whatever huge pagecache implementation we go with, file rmap locking
      must be added to anon rmap locking, when mremap's move_page_tables()
      finds a pmd_trans_huge pmd entry: a simple change, let's do it now.
      
      Factor out take_rmap_locks() and drop_rmap_locks() to handle the locking
      for move_ptes() and move_page_tables(), and delete the
      VM_BUG_ON_VMA which rejected vm_file and required anon_vma.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d069b7d
    • Hugh Dickins's avatar
      huge mm: move_huge_pmd does not need new_vma · bf8616d5
      Hugh Dickins authored
      Remove move_huge_pmd()'s redundant new_vma arg: all it was used for was
      a VM_NOHUGEPAGE check on new_vma flags, but the new_vma is cloned from
      the old vma, so a trans_huge_pmd in the new_vma will be as acceptable as
      it was in the old vma, alignment and size permitting.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bf8616d5
    • Hugh Dickins's avatar
      mm: /proc/sys/vm/stat_refresh to force vmstat update · 52b6f46b
      Hugh Dickins authored
      Provide /proc/sys/vm/stat_refresh to force an immediate update of
      per-cpu into global vmstats: useful to avoid a sleep(2) or whatever
      before checking counts when testing.  Originally added to work around a
      bug which left counts stranded indefinitely on a cpu going idle (an
      inaccuracy magnified when small below-batch numbers represent "huge"
      amounts of memory), but I believe that bug is now fixed: nonetheless,
      this is still a useful knob.
      
      Its schedule_on_each_cpu() is probably too expensive just to fold into
      reading /proc/meminfo itself: give this mode 0600 to prevent abuse.
      Allow a write or a read to do the same: nothing to read, but "grep -h
      Shmem /proc/sys/vm/stat_refresh /proc/meminfo" is convenient.  Oh, and
      since global_page_state() itself is careful to disguise any underflow as
      0, hack in an "Invalid argument" and pr_warn() if a counter is negative
      after the refresh - this helped to fix a misaccounting of
      NR_ISOLATED_FILE in my migration code.
      
      But on recent kernels, I find that NR_ALLOC_BATCH and NR_PAGES_SCANNED
      go negative some of the time.  I have not yet worked out why, but
      have no evidence that it's actually harmful.  Punt for the moment by
      just ignoring the anomaly on those.
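      
      The refresh-then-check behaviour can be modelled in userspace roughly as
      below.  This is a sketch under assumptions: struct model_counter,
      MODEL_NR_CPUS and model_stat_refresh() are hypothetical, and the real
      code runs the fold on each cpu and reports "Invalid argument" rather
      than returning -1.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4 /* arbitrary size for the model */

/* Hypothetical counter: a global value plus unfolded per-cpu deltas. */
struct model_counter {
	long global;
	long percpu_delta[MODEL_NR_CPUS];
	int ignore_negative; /* set for counters whose anomaly is ignored */
};

/*
 * Fold every per-cpu delta into the global value, then check for
 * negative totals.  Returns 0 on success, -1 if any counter that is
 * not marked ignore_negative ended up below zero.
 */
int model_stat_refresh(struct model_counter *c, size_t n)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
			c[i].global += c[i].percpu_delta[cpu];
			c[i].percpu_delta[cpu] = 0;
		}
		if (c[i].global < 0 && !c[i].ignore_negative)
			err = -1;
	}
	return err;
}
```

      The check has to come after the fold: a counter can look negative only
      because a matching positive delta is still stranded on another cpu.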
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      52b6f46b
    • Andres Lagar-Cavilla's avatar
      tmpfs: mem_cgroup charge fault to vm_mm not current mm · 9e18eb29
      Andres Lagar-Cavilla authored
      Although shmem_fault() has been careful to count a major fault to vm_mm,
      shmem_getpage_gfp() has been careless in charging a remote access fault
      to current->mm owner's memcg instead of to vma->vm_mm owner's memcg:
      that is inconsistent with all the mem_cgroup charging on remote access
      faults in mm/memory.c.
      
      Fix it by passing fault_mm along with fault_type to
      shmem_getpage_gfp(); but in that case, now knowing the right mm, it's
      better for it to handle the PGMAJFAULT updates itself.
      
      And let's keep this clutter out of most callers' way: change the common
      shmem_getpage() wrapper to hide fault_mm and fault_type as well as gfp.
      Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9e18eb29
    • Hugh Dickins's avatar
      tmpfs: preliminary minor tidyups · 75edd345
      Hugh Dickins authored
      Make a few cleanups in mm/shmem.c, before going on to complicate it.
      
      shmem_alloc_page() will become more complicated: we can't afford to
      have that complication duplicated between a CONFIG_NUMA version and a
      !CONFIG_NUMA version, so rearrange the #ifdef'ery there to yield a
      single shmem_swapin() and a single shmem_alloc_page().
      
      Yes, it's a shame to inflict the horrid pseudo-vma on non-NUMA
      configurations, but eliminating it is a larger cleanup: I have an
      alloc_pages_mpol() patchset not yet ready - mpol handling is subtle and
      bug-prone, and changed yet again since my last version.
      
      Move __SetPageLocked, __SetPageSwapBacked from shmem_getpage_gfp() to
      shmem_alloc_page(): that SwapBacked flag will be useful in future, to
      help to distinguish different cases appropriately.
      
      And the SGP_DIRTY variant of SGP_CACHE is hard to understand and of
      little use (IIRC it dates back to when shmem_getpage() returned the page
      unlocked): kill it and do the necessary in shmem_file_read_iter().
      
      But an arm64 build then complained that info may be uninitialized (where
      shmem_getpage_gfp() deletes a freshly alloced page beyond eof), and
      advancing to an "sgp <= SGP_CACHE" test jogged it back to reality.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      75edd345
    • Hugh Dickins's avatar
      mm: use __SetPageSwapBacked and dont ClearPageSwapBacked · fa9949da
      Hugh Dickins authored
      v3.16 commit 07a42788 ("mm: shmem: avoid atomic operation during
      shmem_getpage_gfp") rightly replaced one instance of SetPageSwapBacked
      by __SetPageSwapBacked, pointing out that the newly allocated page is
      not yet visible to other users (except speculative get_page_unless_zero-
      ers, who may not update page flags before their further checks).
      
      That was part of a series in which Mel was focused on tmpfs profiles:
      but almost all SetPageSwapBacked uses can be so optimized, with the same
      justification.
      
      Remove ClearPageSwapBacked from __read_swap_cache_async() error path:
      it's not an error to free a page with PG_swapbacked set.
      
      Follow a convention of __SetPageLocked, __SetPageSwapBacked instead of
      doing it differently in different places; but that's for tidiness - if
      the ordering actually mattered, we should not be using the __variants.
      
      There's probably scope for further __SetPageFlags in other places, but
      SwapBacked is the one I'm interested in at the moment.
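      
      The distinction between the atomic and the double-underscore flag
      setters can be illustrated with C11 atomics.  This is a userspace
      analogy only: struct model_page, the MODEL_PG_* bits and the helper
      names are hypothetical and do not reflect the kernel's page flag
      implementation.

```c
#include <stdatomic.h>

/* Hypothetical flag bits for the model. */
#define MODEL_PG_SWAPBACKED (1UL << 0)
#define MODEL_PG_LOCKED     (1UL << 1)

struct model_page {
	atomic_ulong flags;
};

/* Atomic read-modify-write: safe while others may change flags too. */
void model_set_flag(struct model_page *p, unsigned long bit)
{
	atomic_fetch_or_explicit(&p->flags, bit, memory_order_relaxed);
}

/*
 * Plain load followed by a plain store: cheaper, but a concurrent
 * update between the two steps would be lost.  Only acceptable while
 * the page is not yet visible to other users.
 */
void model_set_flag_nonatomic(struct model_page *p, unsigned long bit)
{
	unsigned long v = atomic_load_explicit(&p->flags, memory_order_relaxed);

	atomic_store_explicit(&p->flags, v | bit, memory_order_relaxed);
}

/* A freshly allocated page is private, so the cheap setters suffice. */
void model_init_new_page(struct model_page *p)
{
	atomic_init(&p->flags, 0);
	model_set_flag_nonatomic(p, MODEL_PG_LOCKED);
	model_set_flag_nonatomic(p, MODEL_PG_SWAPBACKED);
}

unsigned long model_page_flags(struct model_page *p)
{
	return atomic_load_explicit(&p->flags, memory_order_relaxed);
}
```

      As the message notes, the order of the two non-atomic sets is a matter
      of tidiness here; if ordering mattered, the non-atomic variants would be
      the wrong tool.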
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa9949da
    • Hugh Dickins's avatar
      mm: update_lru_size do the __mod_zone_page_state · 9d5e6a9f
      Hugh Dickins authored
      Konstantin Khlebnikov pointed out (nearly four years ago, when lumpy
      reclaim was removed) that lru_size can be updated by -nr_taken once per
      call to isolate_lru_pages(), instead of page by page.
      
      Update it inside isolate_lru_pages(), or at its two callsites? I chose
      to update it at the callsites, rearranging and grouping the updates by
      nr_taken and nr_scanned together in both.
      
      With one exception, mem_cgroup_update_lru_size(,lru,) is then used where
      __mod_zone_page_state(,NR_LRU_BASE+lru,) is used; and we shall be adding
      some more calls in a future commit.  Make the code a little smaller and
      simpler by incorporating stat update in lru_size update.
      
      The exception was move_active_pages_to_lru(), which aggregated the
      pgmoved stat update separately from the individual lru_size updates; but
      I still think this is a simplification worth making.
      
      However, the __mod_zone_page_state is not peculiar to mem_cgroups: so
      better use the name update_lru_size, which calls
      mem_cgroup_update_lru_size when CONFIG_MEMCG is enabled.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9d5e6a9f
    • Hugh Dickins's avatar
      mm: update_lru_size warn and reset bad lru_size · ca707239
      Hugh Dickins authored
      Though debug kernels have a VM_BUG_ON to help protect from misaccounting
      lru_size, non-debug kernels are liable to wrap it around: and then the
      vast unsigned long size draws page reclaim into a loop of repeatedly
      doing nothing on an empty list, without even a cond_resched().
      
      That soft lockup looks confusingly like an over-busy reclaim scenario,
      with lots of contention on the lru_lock in shrink_inactive_list(): yet
      has a totally different origin.
      
      Help differentiate with a custom warning in
      mem_cgroup_update_lru_size(), even in non-debug kernels; and reset the
      size to avoid the lockup.  But the particular bug which suggested this
      change was mine alone, and since fixed.
      
      Make it a WARN_ONCE: the first occurrence is the most informative, a
      flurry may follow, yet even when rate-limited little more is learnt.
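      
      A userspace sketch of the warn-once-and-reset idea.  The names
      model_update_lru_size() and model_warned are hypothetical, and printing
      to stderr stands in for WARN_ONCE; the kernel function's exact checks
      are not reproduced here.

```c
#include <stdbool.h>
#include <stdio.h>

static bool model_warned; /* models the "once" in WARN_ONCE */

/*
 * Apply delta to *lru_size.  If a negative delta would wrap the
 * unsigned size around, warn on the first occurrence only and reset
 * the size to 0.  Returns true when a reset happened.
 */
bool model_update_lru_size(unsigned long *lru_size, long delta)
{
	/* Magnitude of a negative delta, computed without signed overflow. */
	unsigned long magnitude = 0UL - (unsigned long)delta;

	if (delta < 0 && magnitude > *lru_size) {
		if (!model_warned) {
			fprintf(stderr,
				"model: lru_size %lu cannot absorb delta %ld\n",
				*lru_size, delta);
			model_warned = true;
		}
		*lru_size = 0;
		return true;
	}
	*lru_size += (unsigned long)delta; /* modular add handles delta < 0 */
	return false;
}
```

      Resetting to 0 rather than letting the value wrap keeps a reclaim loop
      from repeatedly scanning what it believes is an enormous list.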
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca707239