    mm/page_alloc: delete vm.percpu_pagelist_fraction · bbbecb35
    Mel Gorman authored
    Patch series "Calculate pcp->high based on zone sizes and active CPUs", v2.
    
    The per-cpu page allocator (PCP) is meant to reduce contention on the zone
    lock but the sizing of batch and high is archaic and takes neither the
    zone size nor the number of CPUs local to a zone into account.  With
    larger zones and more CPUs per node, the contention is getting worse.
    Furthermore, the fact that vm.percpu_pagelist_fraction adjusts both batch
    and high values means that the sysctl can reduce zone lock contention but
    also increase allocation latencies.
    
    This series disassociates pcp->high from pcp->batch and then scales
    pcp->high based on the size of the local zone with limited impact to
    reclaim and accounting for active CPUs but leaves pcp->batch static.  It
    also adapts the number of pages that can be on the pcp list based on
    recent freeing patterns.
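
    As a rough userspace illustration of the scaling described above, the
    sketch below derives a per-CPU high limit from a zone-sized page budget
    and the number of CPUs local to the zone while batch stays fixed.  The
    names zone_sketch, reclaim_budget_pages, PCP_BATCH_SKETCH and
    pcp_high_sketch(), the batch value of 63 and the floor of four batches
    are all assumptions made for illustration; they are not the code from
    this series.

```c
/* Hypothetical, simplified stand-in for a zone; not the kernel's struct zone. */
struct zone_sketch {
    /* Assumed: pages that may sit on PCP lists with limited impact to reclaim. */
    unsigned long reclaim_budget_pages;
    /* Assumed: number of active CPUs local to the zone. */
    unsigned int nr_local_cpus;
};

/* Assumed static batch size; it does not change with the zone size. */
#define PCP_BATCH_SKETCH 63UL

/*
 * Split the zone's page budget across its local CPUs. The result is kept
 * at or above four batches so that a small zone still has a usable list.
 */
static unsigned long pcp_high_sketch(const struct zone_sketch *z)
{
    unsigned int cpus = z->nr_local_cpus ? z->nr_local_cpus : 1;
    unsigned long high = z->reclaim_budget_pages / cpus;
    unsigned long floor = PCP_BATCH_SKETCH * 4;

    return high > floor ? high : floor;
}
```

    Under these assumptions a larger zone raises high, more local CPUs
    lower it, and batch is unaffected either way.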
    
    The motivation is partially to adjust to larger memory sizes but is also
    driven by the fact that large batches of page freeing via release_pages()
    often show zone contention as a major part of the problem.  Another is a
    bug report based on an older kernel where a multi-terabyte process can
    take several minutes to exit.  A workaround was to use
    vm.percpu_pagelist_fraction to increase the pcp->high value but testing
    indicated that a production workload could not use the same values because
    of an increase in allocation latencies.  Unfortunately, I cannot reproduce
    this test case myself as the multi-terabyte machines are in active use but
    this series should alleviate the problem.
    
    The series aims to address both and partially acts as a prerequisite for
    storing high-order pages on the PCP lists.  PCP only works with order-0
    pages, which is useless for SLUB (when using high orders) and THP
    (unconditionally).  To store high-order pages on PCP, the pcp->high
    values need to be increased first.
    
    This patch (of 6):
    
    The vm.percpu_pagelist_fraction sysctl is used to increase the batch and high
    limits for the per-cpu page allocator (PCP).  The intent behind the sysctl
    is to reduce zone lock acquisition when allocating/freeing pages but it
    has a problem.  While it can decrease contention, it can also increase
    latency on the allocation side due to unreasonably large batch sizes.
    This leads to games where an administrator adjusts
    percpu_pagelist_fraction on the fly to work around contention and
    allocation latency problems.
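
    The coupling can be illustrated with a small userspace sketch.  It
    assumes, purely for illustration, that high is the zone's page count
    divided by the fraction and that batch is a fixed quarter of high.  The
    kernel's actual arithmetic may differ (for instance by capping batch),
    and pcp_limits_sketch and limits_from_fraction() are hypothetical names.

```c
/* Hypothetical pair of PCP limits; not a kernel structure. */
struct pcp_limits_sketch {
    unsigned long high;   /* pages allowed on the per-cpu list */
    unsigned long batch;  /* pages moved per zone lock acquisition */
};

/*
 * Assumed coupling: both limits come from the same fraction, so high
 * cannot be raised without batch growing along with it.
 */
static struct pcp_limits_sketch limits_from_fraction(unsigned long zone_pages,
                                                     unsigned long fraction)
{
    struct pcp_limits_sketch l;

    if (fraction == 0)
        fraction = 1;  /* avoid dividing by zero in this sketch */

    l.high = zone_pages / fraction;
    l.batch = l.high / 4;
    if (l.batch < 1)
        l.batch = 1;

    return l;
}
```

    With these assumptions, lowering the fraction from 64 to 8 on a zone of
    1048576 pages raises high from 16384 to 131072, which helps contention,
    but also raises batch from 4096 to 32768, which is where the
    allocation-side latency comes from.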
    
    This series aims to alleviate the problems with zone lock contention while
    avoiding the allocation-side latency problems.  For the purposes of
    review, it's easier to remove this sysctl now and reintroduce a similar
    sysctl later in the series that deals only with pcp->high.
    
    Link: https://lkml.kernel.org/r/20210525080119.5455-1-mgorman@techsingularity.net
    Link: https://lkml.kernel.org/r/20210525080119.5455-2-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>