[PATCH] make mapping->tree_lock an rwlock
Convert mapping->tree_lock to an rwlock.

with:

  dd if=/dev/zero of=foo bs=1 count=2M   0.80s user 4.15s system  99% cpu 4.961 total
  dd if=/dev/zero of=foo bs=1 count=2M   0.73s user 4.26s system 100% cpu 4.987 total
  dd if=/dev/zero of=foo bs=1 count=2M   0.79s user 4.25s system 100% cpu 5.034 total

  dd if=foo of=/dev/null bs=1            0.80s user 3.12s system  99% cpu 3.928 total
  dd if=foo of=/dev/null bs=1            0.77s user 3.15s system 100% cpu 3.914 total
  dd if=foo of=/dev/null bs=1            0.92s user 3.02s system 100% cpu 3.935 total

  (3.926: 1.87 usecs per byte)

without:

  dd if=/dev/zero of=foo bs=1 count=2M   0.85s user 3.92s system  99% cpu 4.780 total
  dd if=/dev/zero of=foo bs=1 count=2M   0.78s user 4.02s system 100% cpu 4.789 total
  dd if=/dev/zero of=foo bs=1 count=2M   0.82s user 3.94s system  99% cpu 4.763 total
  dd if=/dev/zero of=foo bs=1 count=2M   0.71s user 4.10s system  99% cpu 4.810 total

  dd if=foo of=/dev/null bs=1            0.76s user 2.68s system 100% cpu 3.438 total
  dd if=foo of=/dev/null bs=1            0.74s user 2.72s system  99% cpu 3.465 total
  dd if=foo of=/dev/null bs=1            0.67s user 2.82s system 100% cpu 3.489 total
  dd if=foo of=/dev/null bs=1            0.70s user 2.62s system  99% cpu 3.326 total

  (3.430: 1.635 usecs per byte)

So on a P4, the additional cost of the rwlock is ~240 nsecs for a one-byte
write().  On the other hand:

From: Peter Chubb <peterc@gelato.unsw.edu.au>

As part of the Gelato scalability focus group, we've been running OSDL's
Re-AIM7 benchmark with an I/O-intensive load with varying numbers of
processors.  The current kernel shows severe contention on the tree_lock in
the address_space structure when running on tmpfs or ext2 on a RAM disk.

Lockstat output for a 12-way:

  SPINLOCKS            HOLD               WAIT
   UTIL   CON    MEAN(   MAX )     MEAN(   MAX )(% CPU)      TOTAL NOWAIT  SPIN RJECT  NAME
          5.5%   0.4us(3177us)     28us(  20ms)(44.2%)   131821954  94.5%  5.5% 0.00%  *TOTAL*
  72.3%  13.1%   0.5us( 9.5us)     29us(  20ms)(42.5%)    50542055  86.9% 13.1%    0%  find_lock_page+0x30
  23.8%     0%   385us(3177us)      0us                      23235   100%    0%    0%  exit_mmap+0x50
  11.5%  0.82%   0.1us( 101us)     17us(5670us)( 1.6%)    50665658  99.2% 0.82%    0%  dnotify_parent+0x70

Replacing the spinlock with a multi-reader lock fixes this problem, without
unduly affecting anything else.

Here are the benchmark results (jobs per minute at a 50-client level, average
of 5 runs, standard deviation in parens) on an HP Olympia with 3 cells, 12
processors, and dnotify turned off (after this spinlock, the spinlock in
dnotify_parent is the most contended lock for this workload).

              tmpfs......................     ext2.......................
  #CPUs   spinlock     rwlock                 spinlock    rwlock
    1     7556(15)     7588(17)   +0.42%      3744(20)    3791(16)   +1.25%
    2    13743(31)    13791(33)   +0.35%      6405(30)    6413(24)   +0.12%
    4    23334(111)   22881(154)  -2%         9648(51)    9595(50)   -0.55%
    8    33580(240)   36163(190)  +7.7%      13183(63)   13070(68)   -0.85%
   12    28748(170)   44064(238)  +53%       12681(49)   14504(105)  +14%

And on a single-processor Pentium 3:

    1     4177(4)      4169(2)    -0.2%       3811(4)     3820(3)    +0.23%

I'm not sure what's happening in the 4-processor case.  The important thing
to note is that with a spinlock, the benchmark shows worse performance on a
12-way than on an 8-way box; with the patch, the 12-way performs better, as
expected.  We've done some runs with 16-way as well; without the patch below,
the 16-way performs worse than the 12-way.

It's a tricky tradeoff, but large SMP is hurt a lot more by the spinlocks
than small SMP is by the rwlocks.  And I don't think we really want to
implement compile-time either-or locks.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
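For context, the core of the change is the lock type plus the split between
read-side (lookup) and write-side (insert/remove) critical sections.  The
snippet below is an illustrative sketch only, not the actual patch hunks: the
address_space stand-in and the lookup/insert helpers are simplified
placeholders for the real page-cache paths such as find_lock_page() and
add_to_page_cache().

  /*
   * Illustrative sketch: mapping->tree_lock becomes an rwlock_t, so
   * concurrent lookups can proceed in parallel, while inserts and
   * removals still serialize against each other and against readers.
   */
  #include <linux/spinlock.h>
  #include <linux/radix-tree.h>

  struct address_space_sketch {           /* simplified stand-in */
          struct radix_tree_root  page_tree;
          rwlock_t                tree_lock;      /* was: spinlock_t */
  };

  static void *lookup_page(struct address_space_sketch *mapping,
                           unsigned long index)
  {
          void *page;

          read_lock_irq(&mapping->tree_lock);     /* was: spin_lock_irq() */
          page = radix_tree_lookup(&mapping->page_tree, index);
          read_unlock_irq(&mapping->tree_lock);
          return page;
  }

  static int insert_page(struct address_space_sketch *mapping,
                         unsigned long index, void *page)
  {
          int error;

          write_lock_irq(&mapping->tree_lock);    /* was: spin_lock_irq() */
          error = radix_tree_insert(&mapping->page_tree, index, page);
          write_unlock_irq(&mapping->tree_lock);
          return error;
  }

The write side costs roughly the same as before; the win is that lookups no
longer exclude one another, which is exactly where the lockstat output above
shows the contention (find_lock_page).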