[PATCH] writeback scalability improvements
The kernel has a number of problems wrt heavy write traffic to multiple spindles. What keeps on happening is that all the processes which are responsible for writeback get blocked on one of the queues and all the other queues fall idle.

This happens in the balance_dirty_pages() path (balance_dirty() in 2.4) and in the page reclaim code, when a dirty page is found on the LRU. The latter is particularly bad because it causes "innocent" processes to be suspended for long periods due to the activity of heavy writers.

The general idea is: the primary resource for writeback should be the process which is dirtying memory. The secondary resource is the pdflush pool (although this is mainly for providing async writeback in the presence of light-moderate loads). And the final oh-gee-we-screwed-up resource for writeback is a caller to shrink_cache().

This patch addresses the balance_dirty_pages() path. This code was initially modelled on the 2.4 writeback scheme: throttled processes write back all data regardless of its queue. The patch changes this so that the balance_dirty_pages() caller only writes back pages which are dirty against the queue which that caller just dirtied. The effect is a better allocation of writeback resources across the queues, and increased parallelism.

The per-queue writeback is implemented by using mapping->backing_dev_info as a search key during the walk across the superblocks and inodes.

The patch also fixes an initialisation problem in block_dev.c:do_open(): it was setting up the blockdev's mapping->backing_dev_info too early, before the queue had been identified.

On its own, this patch doesn't help much, because of the stalls in the page allocator. I have a patch which mostly fixes that up, and taken together the kernel is achieving almost platter speed against six spindles, but only when the system has a small amount of memory. More work is needed there.
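For illustration only, the per-queue search can be sketched in userspace C roughly as follows. This is not the code in the patch: the structures are simplified, hypothetical stand-ins for the kernel's, and writeback_for_queue() is an invented name. It only shows the idea of skipping every mapping whose backing_dev_info is not the queue the caller just dirtied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real definitions. */
struct backing_dev_info {
    const char *name;                 /* one per request queue */
};

struct address_space {
    struct backing_dev_info *backing_dev_info;
    long nr_dirty;                    /* dirty pages held by this mapping */
};

struct inode {
    struct address_space mapping;
};

/*
 * Walk the inodes and "write back" at most nr_to_write pages, touching
 * only mappings which are dirty against the target queue.
 * Returns the number of pages written.
 */
static long writeback_for_queue(struct inode *inodes, size_t n,
                                const struct backing_dev_info *target,
                                long nr_to_write)
{
    long written = 0;

    assert(target != NULL);

    for (size_t i = 0; i < n && written < nr_to_write; i++) {
        struct address_space *m = &inodes[i].mapping;
        long chunk;

        /* The search key: skip data which is dirty against other queues. */
        if (m->backing_dev_info != target)
            continue;

        chunk = m->nr_dirty;
        if (chunk > nr_to_write - written)
            chunk = nr_to_write - written;

        m->nr_dirty -= chunk;         /* pretend the I/O was submitted */
        written += chunk;
    }
    return written;
}
```

With two queues in this sketch, a caller throttled against the first queue leaves the second queue's dirty pages untouched, so a writer to the second queue is the one which keeps that spindle busy.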