Commit 399ba0b9 authored by NeilBrown's avatar NeilBrown Committed by Linus Torvalds

mm/vmscan.c: avoid throttling reclaim for loop-back nfsd threads

When a loopback NFS mount is active and the backing device for the NFS
mount becomes congested, that can impose throttling delays on the nfsd
threads.

These delays significantly reduce throughput and so the NFS mount remains
congested.

This results in a livelock and the reduced throughput persists.

This livelock has been found in testing with the 'wait_iff_congested'
call, and could possibly be caused by the 'congestion_wait' call.

This livelock is similar to the deadlock which justified the introduction
of PF_LESS_THROTTLE, and the same flag can be used to remove this
livelock.

To minimise the impact of the change, we still throttle nfsd when the
filesystem it is writing to is congested, but not when some separate
filesystem (e.g.  the NFS filesystem) is congested.
Signed-off-by: default avatarNeilBrown <neilb@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent 11de9927
...@@ -1438,6 +1438,19 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list) ...@@ -1438,6 +1438,19 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list)
list_splice(&pages_to_free, page_list); list_splice(&pages_to_free, page_list);
} }
/*
* If a kernel thread (such as nfsd for loop-back mounts) services
* a backing device by writing to the page cache it sets PF_LESS_THROTTLE.
* In that case we should only throttle if the backing device it is
* writing to is congested. In other cases it is safe to throttle.
*/
static int current_may_throttle(void)
{
return !(current->flags & PF_LESS_THROTTLE) ||
current->backing_dev_info == NULL ||
bdi_write_congested(current->backing_dev_info);
}
/* /*
* shrink_inactive_list() is a helper for shrink_zone(). It returns the number * shrink_inactive_list() is a helper for shrink_zone(). It returns the number
* of reclaimed pages * of reclaimed pages
...@@ -1566,7 +1579,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, ...@@ -1566,7 +1579,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
* implies that pages are cycling through the LRU faster than * implies that pages are cycling through the LRU faster than
* they are written so also forcibly stall. * they are written so also forcibly stall.
*/ */
if (nr_unqueued_dirty == nr_taken || nr_immediate) if ((nr_unqueued_dirty == nr_taken || nr_immediate) &&
current_may_throttle())
congestion_wait(BLK_RW_ASYNC, HZ/10); congestion_wait(BLK_RW_ASYNC, HZ/10);
} }
...@@ -1575,7 +1589,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, ...@@ -1575,7 +1589,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
* is congested. Allow kswapd to continue until it starts encountering * is congested. Allow kswapd to continue until it starts encountering
* unqueued dirty pages or cycling through the LRU too quickly. * unqueued dirty pages or cycling through the LRU too quickly.
*/ */
if (!sc->hibernation_mode && !current_is_kswapd()) if (!sc->hibernation_mode && !current_is_kswapd() &&
current_may_throttle())
wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10); wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id, trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment