Commit a368ab67 authored by Mel Gorman's avatar Mel Gorman Committed by Linus Torvalds

mm: move zone lock to a different cache line than order-0 free page lists

Huang Ying reported the following problem due to commit 3484b2de ("mm:
rearrange zone fields into read-only, page alloc, statistics and page
reclaim lines") from the Intel performance tests

    24b7e581  3484b2de
    ----------------  --------------------------
             %stddev     %change         %stddev
                 \          |                \
        152288 \261  0%     -46.2%      81911 \261  0%  aim7.jobs-per-min
           237 \261  0%     +85.6%        440 \261  0%  aim7.time.elapsed_time
           237 \261  0%     +85.6%        440 \261  0%  aim7.time.elapsed_time.max
         25026 \261  0%     +70.7%      42712 \261  0%  aim7.time.system_time
       2186645 \261  5%     +32.0%    2885949 \261  4%  aim7.time.voluntary_context_switches
       4576561 \261  1%     +24.9%    5715773 \261  0%  aim7.time.involuntary_context_switches

The problem is specific to very large machines under stress.  It was not
reproducible with the machines I had used to justify the original patch
because large numbers of CPUs are required.  When pressure is high enough,
the cache line is bouncing between CPUs trying to acquire the lock and the
holder of the lock adjusting free lists.  The intention was that the
acquirer of the lock would automatically have the cache line holding the
free lists but according to Huang, this is not a universal win.

One possibility is to move the zone lock to its own cache line but it
increases the size of the zone.  This patch moves the lock to the other
end of the free lists where they do not contend under high pressure.  It
does mean the page allocator paths now require more cache lines but Huang
reports that it restores performance to previous levels on large machines

             %stddev     %change         %stddev
                 \          |                \
         84568 \261  1%     +94.3%     164280 \261  1%  aim7.jobs-per-min
       2881944 \261  2%     -35.1%    1870386 \261  8%  aim7.time.voluntary_context_switches
           681 \261  1%      -3.4%        658 \261  0%  aim7.time.user_time
       5538139 \261  0%     -12.1%    4867884 \261  0%  aim7.time.involuntary_context_switches
         44174 \261  1%     -46.0%      23848 \261  1%  aim7.time.system_time
           426 \261  1%     -48.4%        219 \261  1%  aim7.time.elapsed_time
           426 \261  1%     -48.4%        219 \261  1%  aim7.time.elapsed_time.max
           468 \261  1%     -43.1%        266 \261  2%  uptime.boot
Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
Reported-by: default avatarHuang Ying <ying.huang@intel.com>
Tested-by: default avatarHuang Ying <ying.huang@intel.com>
Acked-by: default avatarDavid Rientjes <rientjes@google.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent f22e6e84
...@@ -474,16 +474,15 @@ struct zone { ...@@ -474,16 +474,15 @@ struct zone {
unsigned long wait_table_bits; unsigned long wait_table_bits;
ZONE_PADDING(_pad1_) ZONE_PADDING(_pad1_)
/* Write-intensive fields used from the page allocator */
spinlock_t lock;
/* free areas of different sizes */ /* free areas of different sizes */
struct free_area free_area[MAX_ORDER]; struct free_area free_area[MAX_ORDER];
/* zone flags, see below */ /* zone flags, see below */
unsigned long flags; unsigned long flags;
/* Write-intensive fields used from the page allocator */
spinlock_t lock;
ZONE_PADDING(_pad2_) ZONE_PADDING(_pad2_)
/* Write-intensive fields used by page reclaim */ /* Write-intensive fields used by page reclaim */
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment