    mm: tune PCP high automatically · 51a755c5
    Huang Ying authored
    The goals of tuning PCP high automatically are as follows:
    
    - Minimize allocation/freeing from/to shared zone
    
    - Minimize idle pages in PCP
    
    - Minimize pages in PCP if the system has too few free pages
    
    To reach these goals, the following tuning algorithm is designed:
    
    - When we refill PCP by allocating from the zone, increase PCP high,
      because with a larger PCP we could have avoided allocating from
      the zone.
    
    - In the periodic vmstat updating kworker (via refresh_cpu_vm_stats()),
      decrease PCP high to try to free possibly idle PCP pages.
    
    - When page reclaiming is active for the zone, stop increasing PCP
      high in the allocating path, and decrease PCP high and free some
      pages in the freeing path.
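
    The three rules above can be sketched roughly as follows.  This is a
    simplified illustration, not the code in mm/page_alloc.c: the struct
    and function names are hypothetical, stepping PCP high by one batch
    in the allocating and freeing paths is an assumption, and the actual
    freeing of pages is left out.  Only the 1/8 periodic decrease is
    taken from this description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-CPU page cache state (not the kernel struct). */
struct pcp_sketch {
	int high;      /* current auto-tuned limit on cached pages */
	int high_min;  /* lower bound for high */
	int high_max;  /* upper bound for high */
	int batch;     /* assumed step size for one adjustment */
};

static int clamp_int(int v, int lo, int hi)
{
	assert(lo <= hi);
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Allocating path: PCP was empty and had to be refilled from the zone. */
static void pcp_on_refill(struct pcp_sketch *pcp, bool reclaim_active)
{
	if (reclaim_active)
		return; /* stop increasing while the zone is being reclaimed */
	pcp->high = clamp_int(pcp->high + pcp->batch,
			      pcp->high_min, pcp->high_max);
}

/* Freeing path: shrink only while page reclaiming is active for the zone. */
static void pcp_on_free(struct pcp_sketch *pcp, bool reclaim_active)
{
	if (!reclaim_active)
		return;
	pcp->high = clamp_int(pcp->high - pcp->batch,
			      pcp->high_min, pcp->high_max);
}

/* Periodic path: decay high by 1/8 so that idle pages can be freed. */
static void pcp_on_periodic(struct pcp_sketch *pcp)
{
	pcp->high = clamp_int(pcp->high - (pcp->high >> 3),
			      pcp->high_min, pcp->high_max);
}
```

    In this sketch, repeated refills without reclaim push high up to
    high_max, while the periodic decay and the reclaim-time decrease
    pull it back towards high_min.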
    
    So, the PCP high can be tuned to the page allocating/freeing depth of
    workloads eventually.
    
    One issue with the algorithm is that if the number of pages allocated
    is much larger than the number of pages freed on a CPU, the PCP high
    may reach its maximum value even if the allocating/freeing depth is
    small.  But this isn't a severe issue, because there are no idle pages
    in this case.
    
    One alternative is to increase PCP high when we drain PCP by freeing
    pages to the zone, but not during PCP refilling.  This avoids the issue
    above.  But if the number of pages allocated is much smaller than the
    number of pages freed on a CPU, there will be many idle pages in PCP
    and it is hard to free these idle pages.
    
    PCP high is decreased by 1/8 (>> 3) periodically.  The value 1/8 is
    somewhat arbitrary; it only needs to ensure that idle PCP pages are
    freed eventually.
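
    To get a feeling for how fast a 1/8 decay converges, the following
    sketch counts how many periodic updates it takes for a value to fall
    to a given floor.  The function name and the floor parameter are
    hypothetical, and forcing a minimum step of one page for values below
    8 is an assumption made so that the loop always terminates.

```c
#include <assert.h>

/*
 * Count the periodic updates needed for "high" to decay to "floor"
 * when high >> 3 is subtracted on every update.
 */
static int decay_ticks_to_floor(int high, int floor)
{
	int ticks = 0;

	assert(floor >= 0);
	while (high > floor) {
		int step = high >> 3;

		if (step == 0)
			step = 1; /* assumption: small values still shrink */
		high -= step;
		if (high < floor)
			high = floor;
		ticks++;
	}
	return ticks;
}
```

    Under these assumptions, decay_ticks_to_floor(1024, 512) returns 6,
    that is, the value roughly halves every five to six updates.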
    
    On a 2-socket Intel server with 224 logical CPUs, we run 8 kbuild
    instances in parallel (each with `make -j 28`) in 8 cgroups.  This
    simulates the kbuild server that is used by the 0-Day kbuild service.
    With the patch, the build time decreases by 3.5%.  The cycles% of the
    spinlock contention (mostly for the zone lock) decreases from 11.0% to
    0.5%.  The number of PCP drainings for high-order page freeing
    (free_high) decreases by 65.6%.  The number of pages allocated from the
    zone (instead of from PCP) decreases by 83.9%.
    
    Link: https://lkml.kernel.org/r/20231016053002.756205-8-ying.huang@intel.com
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Suggested-by: Mel Gorman <mgorman@techsingularity.net>
    Suggested-by: Michal Hocko <mhocko@suse.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Johannes Weiner <jweiner@redhat.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    Cc: Sudeep Holla <sudeep.holla@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>