Commit 1b2569fb authored by Andrew Morton, committed by Linus Torvalds

[PATCH] slab: use order 0 for vfs caches

We have interesting deadlocks when slab decides to use order-1 allocations for
ext3_inode_cache.  This is because ext3_alloc_inode() then needs to perform an
order-1 GFP_NOFS allocation.

Sometimes the order-1 allocation needs to free a huge number of pages (tens of
megabytes) before a contiguous order-1 grouping becomes available.  But the
GFP_NOFS allocator cannot free dcache (and hence icache) due to the deadlock
problems identified in shrink_dcache_memory().

So change slab so that it forces order-0 allocations for shrinkable VFS
objects.  We can handle those OK.
parent b3f25c2b
@@ -1220,10 +1220,22 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 	size = ALIGN(size, align);
 
-	/* Cal size (in pages) of slabs, and the num of objs per slab.
-	 * This could be made much more intelligent.  For now, try to avoid
-	 * using high page-orders for slabs.  When the gfp() funcs are more
-	 * friendly towards high-order requests, this should be changed.
-	 */
-	do {
-		unsigned int break_flag = 0;
+	if ((flags & SLAB_RECLAIM_ACCOUNT) && size <= PAGE_SIZE) {
+		/*
+		 * A VFS-reclaimable slab tends to have most allocations
+		 * as GFP_NOFS and we really don't want to have to be allocating
+		 * higher-order pages when we are unable to shrink dcache.
+		 */
+		cachep->gfporder = 0;
+		cache_estimate(cachep->gfporder, size, align, flags,
+				&left_over, &cachep->num);
+	} else {
+		/*
+		 * Calculate size (in pages) of slabs, and the num of objs per
+		 * slab.  This could be made much more intelligent.  For now,
+		 * try to avoid using high page-orders for slabs.  When the
+		 * gfp() funcs are more friendly towards high-order requests,
+		 * this should be changed.
+		 */
+		do {
+			unsigned int break_flag = 0;
@@ -1236,16 +1248,17 @@ kmem_cache_create (const char *name, size_t size, size_t align,
-			break;
-		if (!cachep->num)
-			goto next;
-		if (flags & CFLGS_OFF_SLAB && cachep->num > offslab_limit) {
-			/* Oops, this num of objs will cause problems. */
-			cachep->gfporder--;
-			break_flag++;
-			goto cal_wastage;
-		}
-		/*
-		 * Large num of objs is good, but v. large slabs are currently
-		 * bad for the gfp()s.
-		 */
-		if (cachep->gfporder >= slab_break_gfp_order)
-			break;
+				break;
+			if (!cachep->num)
+				goto next;
+			if (flags & CFLGS_OFF_SLAB &&
+					cachep->num > offslab_limit) {
+				/* This num of objs will cause problems. */
+				cachep->gfporder--;
+				break_flag++;
+				goto cal_wastage;
+			}
+			/*
+			 * Large num of objs is good, but v. large slabs are
+			 * currently bad for the gfp()s.
+			 */
+			if (cachep->gfporder >= slab_break_gfp_order)
+				break;
@@ -1255,6 +1268,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 next:
-		cachep->gfporder++;
-	} while (1);
+			cachep->gfporder++;
+		} while (1);
+	}
 
 	if (!cachep->num) {
 		printk("kmem_cache_create: couldn't create cache %s.\n", name);
...