[PATCH] Fix busy-wait with writeback to large queues
blk_congestion_wait() is a utility function which various callers use to throttle themselves to the rate at which the IO system can retire writes. The current implementation refuses to wait if no queues are "congested" (>75% of requests are in flight). That doesn't work if the queue is so huge that it can hold more than 40% (dirty_ratio) of memory. The queue simply cannot enter congestion because the VM refuses to allow more than 40% of memory to be dirtied. (This spin could happen with a lot of normal-sized queues too) So this patch simply changes blk_congestion_wait() to throttle even if there are no congested queues. It will cause the caller to sleep until someone puts back a write request against any queue. (Nobody uses blk_congestion_wait for read congestion). The patch adds new state to backing_dev_info->state: a couple of flags which indicate whether there are _any_ reads or writes in flight against that queue. This was added to prevent blk_congestion_wait() from taking a nap when there are no writes at all in flight. But the "are there any reads" info could be used to defer background writeout from pdflush, to reduce read-vs-write competition. We'll see. Because the large request queues have made a fundamental change: blocking in get_request_wait() has been the main form of VM throttling for years. But with large queues it doesn't work any more - all throttling happens in blk_congestion_wait(). Also, change io_schedule_timeout() to propagate the schedule_timeout() return value. I was using that in some debug code, but it should have been like that from day one.
Showing
Please register or sign in to comment