Commit 21adf7ac authored by Miquel van Smoorenburg, committed by Linus Torvalds

[PATCH] mark_page_accessed() for read()s on non-page boundaries

When reading a (partial) page from disk using read(), the kernel only marks
the page as "accessed" if the read started at a page boundary.  This means
that files that are accessed randomly at non-page boundaries (usually
database style files) will not be cached properly.

The patch below uses the readahead state instead.  A page that is read() is
now marked as "accessed" whenever the previous read() was for a different
page, whatever the offset in the page.
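
To make the difference between the two rules concrete, here is a minimal user-space model, not kernel code. The names `ra_model`, `old_rule_marks`, `new_rule_marks` and the 4096-byte page size are assumptions made for illustration; only `prev_page`, the `-1` initial value and the two conditions come from the patch. Each read is assumed to fall within a single page, and the model updates `prev_page` itself, whereas the patch does so in the readahead code.

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for this model */

/* Hypothetical stand-in for the readahead state; only prev_page is modelled. */
struct ra_model {
	unsigned long prev_page;	/* page index touched by the previous read */
};

/* Mirrors the patched file_ra_state_init(): no page has been read yet. */
void ra_model_init(struct ra_model *ra)
{
	ra->prev_page = (unsigned long)-1;
}

/* Old rule: mark only when the read starts exactly on a page boundary. */
bool old_rule_marks(unsigned long pos)
{
	unsigned long offset = pos % MODEL_PAGE_SIZE;

	return offset == 0;
}

/* New rule: mark when this read lands on a different page than the last one. */
bool new_rule_marks(struct ra_model *ra, unsigned long pos)
{
	unsigned long index = pos / MODEL_PAGE_SIZE;
	bool mark = (ra->prev_page != index);

	/* Simplification: remember the page here rather than in readahead code. */
	ra->prev_page = index;
	return mark;
}
```

With this model, a read at byte 4196 is never marked by the old rule, while the new rule marks it unless the immediately preceding read hit the same page.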

Testing results:


- Boot kernel with mem=128M

- create a testfile of size 8 MB on a partition. Unmount/mount.

- then generate about 10 MB/sec streaming writes

	for i in `seq 1 1000`
	do
		dd if=/dev/zero of=junkfile.$i bs=1M count=10
		sync
		cat junkfile.$i > /dev/null
		sleep 1
	done

- use an application that reads 128 bytes 64000 times from a
  random offset in the 8 MB testfile.
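
The source of the `rr` test program is not part of this commit. The following is a hypothetical sketch of what such a tool might do, assuming it uses plain lseek() and read() at arbitrary byte offsets; the function name `random_reads` and its parameters are invented for illustration.

```c
#define _XOPEN_SOURCE 700	/* for random() */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical reconstruction of a tool like "rr": read `chunk` bytes
 * `count` times from random byte offsets in `path`.
 * Returns the number of complete reads, or -1 on error.
 */
long random_reads(const char *path, size_t chunk, long count)
{
	char buf[4096];
	struct stat st;
	long done = 0;
	off_t span;
	int fd;

	if (chunk == 0 || chunk > sizeof(buf))
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size < (off_t)chunk) {
		close(fd);
		return -1;
	}
	/* Number of start offsets that leave room for a full chunk. */
	span = st.st_size - (off_t)chunk + 1;
	for (long i = 0; i < count; i++) {
		/* random() is left unseeded, so the offset sequence repeats per run. */
		off_t pos = (off_t)random() % span;

		if (lseek(fd, pos, SEEK_SET) == (off_t)-1)
			break;
		if (read(fd, buf, chunk) == (ssize_t)chunk)
			done++;
	}
	close(fd);
	return done;
}
```

Calling `random_reads("testfile", 128, 64000)` would correspond to the workload described above. Because the offsets are arbitrary byte positions, almost none of these reads start on a page boundary, which is the case the old check missed.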

1. Linux 2.6.10-rc3 vanilla, no streaming writes:

# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.22s system 5% cpu 4.456 total

2. Linux 2.6.10-rc3 vanilla, streaming writes:

# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.16s system 2% cpu 7.667 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.37s system 1% cpu 23.294 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.02s user 0.99s system 1% cpu 1:11.52 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.21s system 2% cpu 10.273 total

3. Linux 2.6.10-rc3 with read-page-access.patch , streaming writes:

# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.02s user 0.21s system 3% cpu 7.634 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.04s user 0.22s system 2% cpu 9.588 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.02s user 0.12s system 24% cpu 0.563 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.13s system 98% cpu 0.163 total

As expected, with the read-page-access.patch the kernel keeps the 8 MB
testfile cached, while without it, it doesn't.

So this is useful for workloads where one smallish (wrt RAM) file is read
randomly over and over again (like heavily used database indexes), while
other I/O is going on.  Plain 2.6 caches those files poorly, if the app
uses plain read().

Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent bbd4c45d
@@ -751,9 +751,10 @@ void do_generic_mapping_read(struct address_space *mapping,
 			flush_dcache_page(page);
 		/*
-		 * Mark the page accessed if we read the beginning.
+		 * When (part of) the same page is read multiple times
+		 * in succession, only mark it as accessed the first time.
 		 */
-		if (!offset)
+		if (ra.prev_page != index)
 			mark_page_accessed(page);
 		/*
...
@@ -35,6 +35,7 @@ void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
 	ra->ra_pages = mapping->backing_dev_info->ra_pages;
+	ra->prev_page = -1;
 }
 /*
@@ -431,6 +432,7 @@ page_cache_readahead(struct address_space *mapping, struct file_ra_state *ra,
 	if (newsize == 0 || (ra->flags & RA_FLAG_INCACHE)) {
 		newsize = 1;
+		ra->prev_page = offset;
 		goto out;	/* No readahead or file already in cache */
 	}
 	/*
...