    mm, slub: better heuristic for number of cpus when calculating slab order
    When creating a new kmem cache, SLUB determines how large the slab pages
    will be based on a number of inputs, including the number of CPUs in the
    system.  Larger slab pages mean that more objects can be allocated/freed
    from per-cpu slabs before accessing shared structures, but also
    potentially more memory can be wasted due to low slab usage and
    fragmentation.  The rough idea of using the number of CPUs is that larger
    systems will be more likely to benefit from reduced contention, and also
    should have enough memory to spare.
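
    For context, a rough standalone sketch of how the CPU count feeds into
    this calculation (modelled on calculate_order() in mm/slub.c, with the
    kernel's fls() replaced by a userspace stand-in; the exact constants can
    vary by kernel version):

      #include <stdio.h>

      /* Userspace stand-in for the kernel's fls(): index of the highest set bit. */
      static unsigned int fls_u(unsigned int x)
      {
              return x ? 32 - __builtin_clz(x) : 0;
      }

      /*
       * Roughly how SLUB derives the target objects-per-slab from the CPU
       * count: more CPUs -> larger min_objects -> larger slab page order.
       */
      static unsigned int min_objects_for(unsigned int nr_cpus)
      {
              return 4 * (fls_u(nr_cpus) + 1);
      }

      int main(void)
      {
              printf("  1 CPU  -> min_objects %u\n", min_objects_for(1));   /*  8 */
              printf("224 CPUs -> min_objects %u\n", min_objects_for(224)); /* 36 */
              return 0;
      }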
    
    The number of CPUs used to be determined as nr_cpu_ids, which is the
    number of possible cpus, but on some systems many will never be onlined,
    thus commit 045ab8c9 ("mm/slub: let number of online CPUs determine the
    slub page order") changed it to num_online_cpus().  However, for kmem
    caches created early before CPUs are onlined, this may lead to
    permanently low slab page sizes.
    
    Vincent reports a regression [1] of hackbench on arm64 systems:
    
      "I'm facing significant performances regression on a large arm64
       server system (224 CPUs). Regressions is also present on small arm64
       system (8 CPUs) but in a far smaller order of magnitude
    
       On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
       v5.11-rc4 : 9.135sec (+/- 0.45%)
       v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
       v5.10: 3.136sec (+/- 0.40%)"
    
    Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
    page allocator contention:
    
      "i.e. the patch incurs a 7% to 32% performance penalty. This bisected
       cleanly yesterday when I was looking for the regression and then
       found the thread.
    
       Numerous caches change size. For example, kmalloc-512 goes from
       order-0 (vanilla) to order-2 with the revert.
    
       So mostly this is down to the number of times SLUB calls into the
       page allocator which only caches order-0 pages on a per-cpu basis"
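
    To put numbers on the quoted kmalloc-512 example (assuming 4K base pages
    and ignoring per-slab overhead; the figures below are illustrative
    arithmetic, not measurements):

      #include <stdio.h>

      int main(void)
      {
              /* kmalloc-512 objects per slab, assuming 4K base pages */
              unsigned int page_size = 4096, obj_size = 512;

              unsigned int per_slab_order0 = (page_size << 0) / obj_size; /*  8 */
              unsigned int per_slab_order2 = (page_size << 2) / obj_size; /* 32 */

              /*
               * Each slab refill is a call into the page allocator, so the
               * order-0 configuration refills roughly 4x as often for the
               * same allocation rate.
               */
              printf("order-0: %u objects/slab, order-2: %u objects/slab\n",
                     per_slab_order0, per_slab_order2);
              return 0;
      }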
    
    Clearly num_online_cpus() doesn't work too early in bootup.  We could
    change the order dynamically in a memory hotplug callback, but runtime
    order changing for existing kmem caches has already been shown to be
    dangerous, and was removed in 32a6f409 ("mm, slub: remove runtime
    allocation order changes").
    
    It could be resurrected in a safe manner with some effort, but to fix
    the regression we need something simpler.
    
    We could use num_present_cpus(), which should be the number of physically
    present CPUs even before they are onlined.  That would work for PowerPC
    [3], which triggered the original commit, but it still doesn't work on
    arm64 [4], as explained in [5].
    
    So this patch tries to determine the best available value without
    specific arch knowledge (a sketch of the resulting check follows the
    list below):
    
     - num_present_cpus() if the number is larger than 1, as that means the
       arch is likely setting it properly
    
     - nr_cpu_ids otherwise
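
    A minimal sketch of that heuristic as standalone C (the kernel patch
    applies it inside calculate_order() in mm/slub.c; the function name and
    the example CPU counts below are illustrative):

      #include <stdio.h>

      /*
       * Trust the present-CPU count only when it is larger than 1, since
       * some architectures leave it at 1 this early in boot; otherwise
       * fall back to the number of possible CPUs.
       */
      static unsigned int nr_cpus_for_order(unsigned int num_present_cpus,
                                            unsigned int nr_cpu_ids)
      {
              return num_present_cpus > 1 ? num_present_cpus : nr_cpu_ids;
      }

      int main(void)
      {
              /* present mask not yet populated, 224 possible CPUs -> use 224 */
              printf("%u\n", nr_cpus_for_order(1, 224));
              /* arch that populates the present mask early -> use it as-is */
              printf("%u\n", nr_cpus_for_order(16, 2048));
              return 0;
      }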
    
    This should fix the reported regressions while also keeping the effect
    of 045ab8c9 for PowerPC systems.  It's possible there are
    configurations where num_present_cpus() is 1 during boot while
    nr_cpu_ids is bloated at the same time, so these (if they exist) would
    keep the large orders based on nr_cpu_ids, as was the case before
    045ab8c9.
    
    [1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/
    [2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
    [3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
    [4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/
    [5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/
    
    Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz
    Fixes: 045ab8c9 ("mm/slub: let number of online CPUs determine the slub page order")
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
    Reported-by: Mel Gorman <mgorman@techsingularity.net>
    Tested-by: Mel Gorman <mgorman@techsingularity.net>
    Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Cc: Bharata B Rao <bharata@linux.ibm.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>