Commit 31a9b2ac authored by Nick Piggin, committed by Linus Torvalds

[PATCH] vm: prevent kswapd pageout priority windup

Now that we are correctly kicking off kswapd early (before the synch
reclaim watermark), it is really doing asynchronous pageout.  This has
exposed a latent problem where allocators running at the same time will
make kswapd think it is getting into trouble, and cause too much swapping
and suboptimal behaviour.

This patch changes the kswapd scanning algorithm to use the same metrics
for measuring pageout success as the synchronous reclaim path - namely, how
much work is required to free SWAP_CLUSTER_MAX pages.

This should make things less fragile all round, and has the added benefit
that kswapd will continue running so long as memory is low and it is
managing to free pages, rather than going through the full priority loop,
then giving up.  Should result in much better behaviour all round,
especially when there are concurrent allocators.

akpm: the patch was confirmed to fix the excessive swapout which Ray Bryant
<raybry@sgi.com> has been reporting.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent aeb1ae30
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -968,12 +968,16 @@ int try_to_free_pages(struct zone **zones,
 static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
 {
 	int to_free = nr_pages;
+	int all_zones_ok;
 	int priority;
 	int i;
-	int total_scanned = 0, total_reclaimed = 0;
+	int total_scanned, total_reclaimed;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
 
+loop_again:
+	total_scanned = 0;
+	total_reclaimed = 0;
 	sc.gfp_mask = GFP_KERNEL;
 	sc.may_writepage = 0;
 	sc.nr_mapped = read_page_state(nr_mapped);
@@ -987,10 +991,11 @@ static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
 	}
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
-		int all_zones_ok = 1;
 		int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 		unsigned long lru_pages = 0;
 
+		all_zones_ok = 1;
+
 		if (nr_pages == 0) {
 			/*
 			 * Scan in the highmem->dma direction for the highest
@@ -1072,6 +1077,15 @@ static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
 		 */
 		if (total_scanned && priority < DEF_PRIORITY - 2)
 			blk_congestion_wait(WRITE, HZ/10);
+
+		/*
+		 * We do this so kswapd doesn't build up large priorities for
+		 * example when it is freeing in parallel with allocators. It
+		 * matches the direct reclaim path behaviour in terms of impact
+		 * on zone->*_priority.
+		 */
+		if (total_reclaimed >= SWAP_CLUSTER_MAX)
+			break;
 	}
 out:
 	for (i = 0; i < pgdat->nr_zones; i++) {
@@ -1079,6 +1093,9 @@ static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
 		zone->prev_priority = zone->temp_priority;
 	}
+	if (!all_zones_ok)
+		goto loop_again;
+
 	return total_reclaimed;
 }