[PATCH] mpage_writepages() i_size reading fix

I believe reading i_size from memory multiple times can cause filesystem corruption: the "offset" and the "end_index" were not coherent. This is writepages, and it runs without i_sem, so i_size can change under us at any time. If a parallel write happens while writepages runs, i_size could advance from 4095 to 4100. With the current 2.6 code that could translate into end_index = 0 and offset = 4. That's broken because end_index and offset are no longer coherent: they must be either end_index = 1 and offset = 4, or end_index = 0 and offset = 4095. When they lose coherency, the memset can zero out actual data.

The patch below fixes that (it's at least a theoretical bug). I don't really expect fixing this tiny race to fix the bug in practice, since the more serious bugs we covered yesterday didn't fix it (more likely the compiler will get involved in the equation soon ;). This is also an optimization for 32-bit archs, which need special locking to read the 64-bit i_size coherently.

This patch also arranges for mpage_writepages() to always zero out the file's final page between i_size and the end of the file's final block. This is a best-effort correctness measure to deal with errant applications that write into the mmapped page beyond the underlying file's EOF.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
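
The arithmetic of the race described above can be checked in userspace. This is a hypothetical sketch, not kernel code: it assumes 4 KiB pages (a PAGE_CACHE_SHIFT of 12), and struct eof_pos, eof_pos_two_reads(), eof_pos_snapshot() and bytes_zeroed() are illustrative names. It models only how end_index and offset are derived from i_size and how many bytes the tail memset would clear, not locking or the page cache.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. PAGE_CACHE_SHIFT == 12. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1L << SKETCH_PAGE_SHIFT)

/* The two values mpage_writepage() derives from i_size. */
struct eof_pos {
	long end_index;     /* index of the page containing EOF */
	unsigned offset;    /* byte offset of EOF within that page */
};

/*
 * Pre-patch pattern (hypothetical helper): each value comes from its own
 * i_size read, so a concurrent write can change i_size in between.
 */
static struct eof_pos eof_pos_two_reads(int64_t first_read, int64_t second_read)
{
	struct eof_pos p;

	p.end_index = (long)(first_read >> SKETCH_PAGE_SHIFT);
	p.offset = (unsigned)(second_read & (SKETCH_PAGE_SIZE - 1));
	return p;
}

/* Post-patch pattern (hypothetical helper): one snapshot feeds both values. */
static struct eof_pos eof_pos_snapshot(int64_t i_size)
{
	return eof_pos_two_reads(i_size, i_size);
}

/*
 * Hypothetical helper: how many bytes the tail memset would clear on
 * page_index, i.e. from offset to the end of the page.
 */
static long bytes_zeroed(long page_index, struct eof_pos p)
{
	if (page_index < p.end_index)
		return 0;   /* page lies wholly inside the file */
	if (page_index > p.end_index || p.offset == 0)
		return 0;   /* the patched code does not memset in this case */
	return SKETCH_PAGE_SIZE - p.offset;
}
```

Feeding the two reads 4095 and 4100 yields end_index = 0 and offset = 4, so page 0 is treated as the final page and 4092 bytes from offset 4 are cleared, although bytes 4 to 4094 hold file data. A single snapshot of either size clears only bytes past EOF: 1 byte on page 0 for i_size = 4095, or nothing on page 0 and 4092 bytes on page 1 for i_size = 4100.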

Commit d9ca90fb, authored by Andrea Arcangeli on Jul 10, 2004; committed by Linus Torvalds on Jul 10, 2004.

1 file changed (fs/mpage.c), with 13 additions and 5 deletions.

--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -404,6 +404,7 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
 	struct block_device *boundary_bdev = NULL;
 	int length;
 	struct buffer_head map_bh;
+	loff_t i_size = i_size_read(inode);

 	if (page_has_buffers(page)) {
 		struct buffer_head *head = page_buffers(page);
@@ -460,7 +461,7 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
 	 */
 	BUG_ON(!PageUptodate(page));
 	block_in_file = page->index << (PAGE_CACHE_SHIFT - blkbits);
-	last_block = (i_size_read(inode) - 1) >> blkbits;
+	last_block = (i_size - 1) >> blkbits;
 	map_bh.b_page = page;
 	for (page_block = 0; page_block < blocks_per_page; ) {

@@ -489,9 +490,18 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,

 	first_unmapped = page_block;

-	end_index = i_size_read(inode) >> PAGE_CACHE_SHIFT;
+page_is_mapped:
+	end_index = i_size >> PAGE_CACHE_SHIFT;
 	if (page->index >= end_index) {
-		unsigned offset = i_size_read(inode) & (PAGE_CACHE_SIZE - 1);
+		/*
+		 * The page straddles i_size.  It must be zeroed out on each
+		 * and every writepage invokation because it may be mmapped.
+		 * "A file is mapped in multiples of the page size.  For a file
+		 * that is not a multiple of the page size, the remaining memory
+		 * is zeroed when mapped, and writes to that region are not
+		 * written out to the file."
+		 */
+		unsigned offset = i_size & (PAGE_CACHE_SIZE - 1);
 		char *kaddr;

 		if (page->index > end_index || !offset)
@@ -502,8 +512,6 @@ mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
 		kunmap_atomic(kaddr, KM_USER0);
 	}

-page_is_mapped:
-
 	/*
 	 * This page will go to BIO.  Do we need to send this BIO off first?
 	 */
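
The new comment in the third hunk explains why the straddling page is zeroed on every writepage call: an application can keep writing past EOF through a shared mapping between writebacks. A minimal userspace sketch of that behaviour follows. It assumes 4 KiB pages, and it assumes that the lines elided between the diff hunks clear the page from offset to its end with memset(), as the commit message's reference to "the memset" suggests. zero_page_tail() is a hypothetical name; i_size is passed in as one snapshot, as in the patched code, and the kernel branch taken when page->index > end_index or offset is 0 is not modelled beyond skipping the memset.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages, i.e. PAGE_CACHE_SHIFT == 12. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1L << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical model of the straddle handling: i_size is read once by
 * the caller and passed in, so end_index and offset always agree.
 * Returns the number of bytes cleared in page.
 */
static long zero_page_tail(unsigned char *page, long page_index, int64_t i_size)
{
	long end_index = (long)(i_size >> SKETCH_PAGE_SHIFT);
	unsigned offset = (unsigned)(i_size & (SKETCH_PAGE_SIZE - 1));

	if (page_index < end_index)
		return 0;   /* page is entirely below EOF */
	if (page_index > end_index || !offset)
		return 0;   /* elided kernel branch not modelled; no memset */

	/* Clear everything from EOF to the end of the page. */
	memset(page + offset, 0, SKETCH_PAGE_SIZE - offset);
	return SKETCH_PAGE_SIZE - offset;
}
```

Calling zero_page_tail() again after a simulated mmap write past EOF clears that byte once more, which is the reason the comment gives for zeroing on each and every invocation rather than once.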