    [PATCH] slab: cleanups and speedups · cad9cd51
    Andrew Morton authored
    - enable the cpu array for all caches
    
    - remove the optimized implementations for quick list access - with
      cpu arrays in all caches, the list access is now rare.
    
    - make the cpu arrays mandatory; this removes 50% of the conditional
      branches from the hot path of kmem_cache_alloc [1]
    
    - poisoning for objects with constructors
    
    Patch got a bit longer...
    
    I forgot to mention this: head arrays mean that some pages can be
    blocked due to objects in the head arrays, and not returned to
    page_alloc.c.  The current kernel never flushes the head arrays, this
    might worsen the behaviour of low memory systems.  The hunk that
    flushes the arrays regularly comes next.
    
    Detailed changelog: [to be read side by side with the patch]
    
    * documentation update
    
    * "growing" is not really needed: races between grow and shrink are
      handled by retrying.  [additionally, the current kernel never
      shrinks]
    
    * move the batchcount into the cpu array:
    	the old code contained a race during cpu cache tuning:
    		update batchcount [in cachep] before or after the IPI?
    	And NUMA will need it anyway.
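
    With the batchcount inside the cpu array, retuning swaps the whole
    per-cpu structure from the IPI handler, so there is no window where
    a stale cachep-wide batchcount can be observed.  Roughly (a sketch;
    the exact field layout in slab.c may differ in details):

    	struct array_cache {
    		unsigned int avail;		/* objects currently cached */
    		unsigned int limit;		/* flush threshold */
    		unsigned int batchcount;	/* per-cpu transfer size */
    		unsigned int touched;		/* hint for the reaper */
    		/* the cached object pointers follow the struct in memory */
    	};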
    
    * bootstrap support: the cpu arrays are really mandatory, nothing
      works without them.  Thus a statically allocated cpu array is needed
      for starting the allocators.
    
    * move the full, partial & free lists into a separate structure, as a
      preparation for NUMA
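
    A sketch of the split-out structure (names illustrative; NUMA can
    later instantiate one of these per node):

    	struct kmem_list3 {
    		struct list_head	slabs_partial;	/* partially used slabs */
    		struct list_head	slabs_full;	/* fully used slabs */
    		struct list_head	slabs_free;	/* completely unused slabs */
    		unsigned long		free_objects;
    	};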
    
    * structure reorganization: now the cpu arrays are the most important
      part, not the lists.
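
    I.e. the cache descriptor now leads with the per-cpu pointers, the
    only fields touched in the common case (a simplified sketch; the
    cachep-level batchcount survives as the default for new cpus):

    	struct kmem_cache_s {
    		struct array_cache	*array[NR_CPUS];	/* hot */
    		unsigned int		batchcount;		/* tuning default */
    		struct kmem_list3	lists;			/* cold: slab lists */
    		/* ... objsize, flags, ctor/dtor, etc ... */
    	};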
    
    * dead code elimination: remove "failures", nowhere read.
    
    * dead code elimination: remove "OPTIMIZE": not implemented.  The
      idea is to skip the virt_to_page lookup for caches with on-slab slab
      structures, and use (ptr&PAGE_MASK) instead.  The details are in
      Bonwicks paper.  Not fully implemented.
    
    * remove GROWN: the kernel never shrinks a cache, thus GROWN is
      meaningless.
    
    * bootstrap: starting the slab allocator is now a 3 stage process:
    	- nothing works, use the statically allocated cpu arrays.
    	- the smallest kmalloc allocator works, use it to allocate
    		cpu arrays.
    	- all kmalloc allocators work, use the default cpu array size
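
    The stage is tracked in a small state variable, something like:

    	static enum {
    		NONE,		/* use the statically allocated cpu arrays */
    		PARTIAL,	/* the smallest kmalloc cache is usable */
    		FULL		/* all kmalloc caches are usable */
    	} g_cpucache_up;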
    
    * register a cpu notifier callback, and allocate the needed head
      arrays if a new cpu arrives
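
    The callback walks the cache chain and gives the incoming cpu an
    array in every cache, roughly like this (error unwinding elided;
    alloc_arraycache stands for the helper that sizes and allocates one
    array):

    	static int cpuup_callback(struct notifier_block *nfb,
    				unsigned long action, void *hcpu)
    	{
    		long cpu = (long)hcpu;
    		kmem_cache_t *cachep;

    		if (action != CPU_UP_PREPARE)
    			return NOTIFY_OK;
    		down(&cache_chain_sem);
    		list_for_each_entry(cachep, &cache_chain, next)
    			cachep->array[cpu] = alloc_arraycache(cpu,
    					cachep->limit, cachep->batchcount);
    		up(&cache_chain_sem);
    		return NOTIFY_OK;
    	}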
    
    * always enable head arrays, even for DEBUG builds.  Poisoning and
      red-zoning now happen before an object is added to the arrays.
      Insert enable_all_cpucaches into cpucache_init, there is no need for
      a separate function.
    
    * modifications to the debug checks due to the earlier calls of the
      dtor for caches with poisoning enabled
    
    * poison+ctor is now supported
    
    * squeezing 3 objects into a cacheline is hopeless, the FIXME is not
      solvable and can be removed.
    
    * add additional debug tests: check_irq_off(), check_irq_on(),
      check_spinlock_acquired().
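
    These are trivial wrappers, cheap enough to sprinkle around the slow
    path (sketch):

    	static void check_irq_off(void)
    	{
    		BUG_ON(!irqs_disabled());
    	}

    	static void check_irq_on(void)
    	{
    		BUG_ON(irqs_disabled());
    	}

    	static void check_spinlock_acquired(kmem_cache_t *cachep)
    	{
    	#ifdef CONFIG_SMP
    		check_irq_off();
    		/* trylock must fail if the lock is really held */
    		BUG_ON(spin_trylock(&cachep->spinlock));
    	#endif
    	}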
    
    * move do_ccupdate_local nearer to do_tune_cpucache.  Should have
      been part of -04-drain.
    
    * additional object checks.  red-zoning is tricky: it's implemented
      by increasing the object size by 2*BYTES_PER_WORD.  Thus
      BYTES_PER_WORD must be added to objp before calling the destructor
      or constructor, and before returning the object from alloc.  The
      poison functions add BYTES_PER_WORD internally.
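
    In other words, every boundary crossing must shift the pointer
    (sketch; ctor_flags stands in for whatever constructor flags are in
    scope at the call site):

    	/* layout with SLAB_RED_ZONE:
    	 *	objp[0]				first guard word
    	 *	objp + BYTES_PER_WORD		the object the caller sees
    	 *	objp + objsize - BYTES_PER_WORD	second guard word
    	 */
    	if (cachep->flags & SLAB_RED_ZONE)
    		objp += BYTES_PER_WORD;
    	if (cachep->ctor)
    		cachep->ctor(objp, cachep, ctor_flags);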
    
    * create a flagcheck function; right now the tests are duplicated in
      cache_grow [always] and alloc_debugcheck_before [DEBUG only]
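
    What the combined check looks like, approximately (the DMA
    consistency test shown here is illustrative):

    	static void kmem_flagcheck(kmem_cache_t *cachep, int flags)
    	{
    		if (flags & __GFP_WAIT)
    			might_sleep();
    		/* a DMA cache must only see DMA requests, and vice versa */
    		if (flags & SLAB_DMA)
    			BUG_ON(!(cachep->gfpflags & GFP_DMA));
    		else
    			BUG_ON(cachep->gfpflags & GFP_DMA);
    	}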
    
    * modify slab list updates: all allocs are now bulk allocs that try
      to get multiple objects at once, update the list pointers only at the
      end of a bulk alloc, not once per alloc.
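
    The inner loop of cache_alloc_refill thus becomes (sketch, debug
    checks omitted; slab_bufctl stands for the slab's free-list index
    array):

    	while (slabp->inuse < cachep->num && batchcount--) {
    		/* pull the next free object straight into the cpu array */
    		ac_entry(ac)[ac->avail++] = slabp->s_mem +
    					slabp->free * cachep->objsize;
    		slabp->inuse++;
    		slabp->free = slab_bufctl(slabp)[slabp->free];
    	}
    	/* fix up the list linkage once, for the whole batch */
    	list_del(&slabp->list);
    	if (slabp->free == BUFCTL_END)
    		list_add(&slabp->list, &cachep->lists.slabs_full);
    	else
    		list_add(&slabp->list, &cachep->lists.slabs_partial);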
    
    * might_sleep was moved into kmem_flagcheck.
    
    * major hotpath change:
    	- cc always exists, no fallback
    	- cache_alloc_refill is called with disabled interrupts,
    	  and does everything to recover from an empty cpu array.
    	  Far shorter & simpler __cache_alloc [inlined in both
    	  kmalloc and kmem_cache_alloc]
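
    The resulting hot path, approximately (debug hooks omitted;
    ac_data(cachep) stands for the current cpu's array, ac_entry(ac) for
    the object pointers stored behind it):

    	static inline void *__cache_alloc(kmem_cache_t *cachep, int flags)
    	{
    		unsigned long save_flags;
    		struct array_cache *ac;
    		void *objp;

    		local_irq_save(save_flags);
    		ac = ac_data(cachep);
    		if (likely(ac->avail)) {
    			ac->touched = 1;
    			objp = ac_entry(ac)[--ac->avail];
    		} else {
    			/* slow path: refill from the lists, or grow */
    			objp = cache_alloc_refill(cachep, flags);
    		}
    		local_irq_restore(save_flags);
    		return objp;
    	}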
    
    * __free_block, free_block, cache_flusharray: main implementation of
      returning objects to the lists.  no big changes, diff lost track.
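
    For symmetry, the free side in a nutshell (sketch):

    	static inline void __cache_free(kmem_cache_t *cachep, void *objp)
    	{
    		struct array_cache *ac = ac_data(cachep);

    		check_irq_off();
    		if (ac->avail < ac->limit) {
    			/* common case: just park the object in the array */
    			ac_entry(ac)[ac->avail++] = objp;
    			return;
    		}
    		/* array full: push batchcount objects back to the lists */
    		cache_flusharray(cachep, ac);
    		ac_entry(ac)[ac->avail++] = objp;
    	}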
    
    * new debug check: detect too early calls to kmalloc or
      kmem_cache_alloc
    
    * slightly reduce the sizes of the cpu arrays: keep the size of the
      whole allocation, including the batchcount, avail and now limit
      fields, within a power of 2, for optimal kmalloc memory efficiency.
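
    kmalloc rounds up to the next power of 2, so the header plus the
    object pointers should just fill a size class.  An illustration,
    assuming 32-bit pointers and the 4-word array_cache header sketched
    above (the actual limits chosen by the patch may differ):

    	sizeof(struct array_cache) + limit * sizeof(void *)
    	limit = 252:	16 + 252*4 = 1024	/* fills kmalloc-1024 exactly */
    	limit = 256:	16 + 256*4 = 1040	/* would spill into kmalloc-2048 */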
    
    That's it.  I even found 2 bugs while reading: the dtors and ctors
    used for verification were called with wrong parameters when RED_ZONE
    was enabled, and some checks still assumed that POISON and ctors are
    incompatible.