    mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
    Author: Mel Gorman (commit 44042b44)
    The per-cpu page allocator (PCP) only stores order-0 pages.  This means
    that all THP and "cheap" high-order allocations, including those made by
    SLUB, contend on the zone->lock.  This patch extends the PCP allocator to
    store THP and "cheap" high-order pages.  Note that struct per_cpu_pages
    grows to 256 bytes (4 cache lines) on x86-64.
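
    To keep struct per_cpu_pages compact, the patch stores one list per
    (migratetype, order) pair instead of a separate array per order.  A
    minimal sketch of that indexing scheme, with the macros simplified
    relative to mm/page_alloc.c (the real code guards the THP slot with
    CONFIG_TRANSPARENT_HUGEPAGE):

            /* Orders up to PAGE_ALLOC_COSTLY_ORDER, plus one slot for THP */
            #define NR_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1 + 1))

            static inline unsigned int order_to_pindex(int migratetype, int order)
            {
                    int base = order;

                    /* THP order exceeds the costly order; map it to the last slot */
                    if (order > PAGE_ALLOC_COSTLY_ORDER)
                            base = PAGE_ALLOC_COSTLY_ORDER + 1;

                    return (MIGRATE_PCPTYPES * base) + migratetype;
            }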
    
    Note that this is not necessarily a universal performance win because of
    how it is implemented.  High-order pages can cause pcp->high to be
    exceeded prematurely for lower orders, so, for example, freeing a large
    number of THP pages could evict order-0 pages from the PCP lists.
    Hence, whether caching helps or hurts a particular workload depends
    largely on the allocation/free pattern as observed by a single CPU.
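
    The shared watermark is the reason: a freed page is counted against
    pcp->count in base pages, so a single THP consumes many "slots" and can
    trigger a drain that also releases order-0 pages.  A hedged sketch of
    that accounting (free_pcppages_bulk() stands in here for the real drain
    logic, whose signature differs):

            /* Illustrative only: free one page of 'order' to the PCP lists */
            static void pcp_free_page(struct per_cpu_pages *pcp, struct page *page,
                                      int migratetype, unsigned int order)
            {
                    list_add(&page->lru, &pcp->lists[order_to_pindex(migratetype, order)]);
                    pcp->count += 1 << order;       /* one THP adds 512 on x86-64 */

                    /* Crossing pcp->high drains pages regardless of their order */
                    if (pcp->count >= pcp->high)
                            free_pcppages_bulk(pcp);
            }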
    
    That said, basic performance testing passed.  The following is a netperf
    UDP_STREAM test which exercises the paths changed by this patch, as some
    of the network allocations are high-order.
    
    netperf-udp
                                     5.13.0-rc2             5.13.0-rc2
                               mm-pcpburst-v3r4   mm-pcphighorder-v1r7
    Hmean     send-64         261.46 (   0.00%)      266.30 *   1.85%*
    Hmean     send-128        516.35 (   0.00%)      536.78 *   3.96%*
    Hmean     send-256       1014.13 (   0.00%)     1034.63 *   2.02%*
    Hmean     send-1024      3907.65 (   0.00%)     4046.11 *   3.54%*
    Hmean     send-2048      7492.93 (   0.00%)     7754.85 *   3.50%*
    Hmean     send-3312     11410.04 (   0.00%)    11772.32 *   3.18%*
    Hmean     send-4096     13521.95 (   0.00%)    13912.34 *   2.89%*
    Hmean     send-8192     21660.50 (   0.00%)    22730.72 *   4.94%*
    Hmean     send-16384    31902.32 (   0.00%)    32637.50 *   2.30%*
    
    Functionally, a patch like this is necessary to make bulk allocation of
    high-order pages perform similarly to order-0 bulk allocations.  The bulk
    allocator is not updated in this series because its users would first
    have to decide how they want to track the order of the pages it returns.
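
    Purely as an illustration of that open question (this series adds no
    such API and the name below is hypothetical), a high-order bulk
    interface might look like:

            /* Hypothetical: allocate nr_pages pages of 'order' into page_array */
            unsigned long alloc_pages_bulk_array_order(gfp_t gfp, unsigned int order,
                                                       unsigned long nr_pages,
                                                       struct page **page_array);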
    
    Link: https://lkml.kernel.org/r/20210611135753.GC30378@techsingularity.net
    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>