    slub: per cpu cache for partial pages · 49e22585
    Christoph Lameter authored
    Allow filling out the rest of the kmem_cache_cpu cacheline with pointers to
    partial pages. The partial page list is used in slab_free() to avoid
    taking the per node lock.
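
    As a rough user-space illustration of the free-side idea (not the kernel
    code), the sketch below parks freed partial pages on a private per cpu
    list and only touches the node list, under one simulated lock hold, when
    that list is full. All names (fpage, fnode, fcpu, f_drain,
    f_free_to_partial) and the page-count limit are assumptions made for this
    example; the counter lock_taken stands in for acquiring list_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab page; not the kernel's struct page. */
struct fpage {
	struct fpage *next;
};

/* Hypothetical per node state: partial list plus a lock-acquisition counter. */
struct fnode {
	struct fpage *partial;
	unsigned long lock_taken;	/* simulated list_lock acquisitions */
};

/* Hypothetical per cpu state: private partial list with a size limit. */
struct fcpu {
	struct fpage *partial;
	size_t nr;			/* pages currently on the per cpu list */
	size_t limit;			/* assumed threshold before draining */
};

/* Move every per cpu partial page to the node list under one lock hold. */
static void f_drain(struct fnode *n, struct fcpu *c)
{
	n->lock_taken++;		/* simulated spin_lock(&n->list_lock) */
	while (c->partial) {
		struct fpage *p = c->partial;

		c->partial = p->next;
		p->next = n->partial;
		n->partial = p;
	}
	c->nr = 0;			/* simulated spin_unlock() follows */
}

/* Free path: queue the page locally; drain first if the list is full. */
static void f_free_to_partial(struct fnode *n, struct fcpu *c, struct fpage *p)
{
	assert(c->limit > 0);
	if (c->nr >= c->limit)
		f_drain(n, c);
	p->next = c->partial;
	c->partial = p;
	c->nr++;
}
```

    With a limit of 4, queueing 9 pages in this model takes the simulated
    node lock twice, rather than once per page.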
    
    In __slab_alloc() we can then take multiple partial pages off the per
    node partial list in one go, reducing node lock pressure.
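
    The allocation-side batching can be sketched in user space as follows.
    This is a simplified model under stated assumptions, not the __slab_alloc()
    implementation: toy_page, toy_node, toy_cpu, toy_refill, toy_get_partial
    and toy_locks_needed are hypothetical names, and lock_taken merely counts
    how often the node lock would have been acquired.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab page; not the kernel's struct page. */
struct toy_page {
	struct toy_page *next;
};

/* Hypothetical per node state: partial list plus a lock-acquisition counter. */
struct toy_node {
	struct toy_page *partial;
	unsigned long lock_taken;	/* simulated list_lock acquisitions */
};

/* Hypothetical per cpu state: a private partial list needing no node lock. */
struct toy_cpu {
	struct toy_page *partial;
};

/* Move up to 'batch' pages from the node list to the cpu list in one go. */
static size_t toy_refill(struct toy_node *n, struct toy_cpu *c, size_t batch)
{
	size_t moved = 0;

	assert(batch > 0);
	n->lock_taken++;		/* simulated spin_lock(&n->list_lock) */
	while (moved < batch && n->partial) {
		struct toy_page *p = n->partial;

		n->partial = p->next;
		p->next = c->partial;
		c->partial = p;
		moved++;
	}
	return moved;			/* simulated spin_unlock() precedes this */
}

/* Take one partial page; go to the node only when the cpu list is empty. */
static struct toy_page *toy_get_partial(struct toy_node *n, struct toy_cpu *c,
					size_t batch)
{
	struct toy_page *p;

	if (!c->partial && toy_refill(n, c, batch) == 0)
		return NULL;
	p = c->partial;
	c->partial = p->next;
	return p;
}

/* Put 'count' pages on the node list, take them all, return lock count. */
static unsigned long toy_locks_needed(size_t count, size_t batch)
{
	static struct toy_page pages[64];
	struct toy_node n = { NULL, 0 };
	struct toy_cpu c = { NULL };
	size_t i;

	assert(count <= 64);
	for (i = 0; i < count; i++) {
		pages[i].next = n.partial;
		n.partial = &pages[i];
	}
	while (toy_get_partial(&n, &c, batch))
		;
	return n.lock_taken;
}
```

    In this model, taking 8 pages one at a time (batch of 1) costs 9 simulated
    lock acquisitions, counting the final probe that finds the node list
    empty, while a batch of 4 costs 3.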
    
    We can also use the per cpu partial list in slab_alloc() to avoid scanning
    partial lists for pages with free objects.
    
    The main effect of a per cpu partial list is that the per node list_lock
    is taken for batches of partial pages instead of individual ones.
    
    Potential future enhancements:
    
    1. The pickup from the partial list could perhaps be done without disabling
       interrupts with some work. The free path already puts the page into the
       per cpu partial list without disabling interrupts.
    
    2. __slab_free() may have some code paths that could use optimization.
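
    For item 1, one way a page can be pushed onto a list head without
    disabling interrupts is a compare-and-exchange retry loop. The sketch
    below uses C11 atomics in user space as an analogy only; it is an
    assumption about the general technique, not the code in slub.c, and the
    names demo_page, demo_cpu_partial, demo_put_cpu_partial and demo_count
    are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab page; not the kernel's struct page. */
struct demo_page {
	struct demo_page *next;
};

/* Hypothetical list head standing in for one cpu's partial pointer. */
static _Atomic(struct demo_page *) demo_cpu_partial;

/* Push a page onto the list head, retrying if the head changed under us. */
static void demo_put_cpu_partial(struct demo_page *page)
{
	struct demo_page *old = atomic_load(&demo_cpu_partial);

	assert(page != NULL);
	do {
		/* On failure 'old' is refreshed, so relink and try again. */
		page->next = old;
	} while (!atomic_compare_exchange_weak(&demo_cpu_partial, &old, page));
}

/* Count the pages currently linked on the list (single-threaded helper). */
static size_t demo_count(void)
{
	size_t n = 0;
	struct demo_page *p;

	for (p = atomic_load(&demo_cpu_partial); p; p = p->next)
		n++;
	return n;
}
```

    The pickup side is harder to do this way because removing the head needs
    to read head->next safely, which is why the commit lists it as future
    work rather than something already done.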
    
    Performance:
    
    				Before		After
    ./hackbench 100 process 200000
    				Time: 1953.047	1564.614
    ./hackbench 100 process 20000
    				Time: 207.176   156.940
    ./hackbench 100 process 20000
    				Time: 204.468	156.940
    ./hackbench 100 process 20000
    				Time: 204.879	158.772
    ./hackbench 10 process 20000
    				Time: 20.153	15.853
    ./hackbench 10 process 20000
    				Time: 20.153	15.986
    ./hackbench 10 process 20000
    				Time: 19.363	16.111
    ./hackbench 1 process 20000
    				Time: 2.518	2.307
    ./hackbench 1 process 20000
    				Time: 2.258	2.339
    ./hackbench 1 process 20000
    				Time: 2.864	2.163
    Signed-off-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Pekka Enberg <penberg@kernel.org>