[PATCH] fix ext3 memory leak
This is the leak which Con found. Long story... - If a dirty page is fed into ext3_writepage() during truncate, block_write_full_page() will reutrn -EIO (it's outside i_size) and will leave the buffers dirty. In the expectation that discard_buffer() will clean them. - ext3_writepage() then adds the still-dirty buffers to the journal's "async data list". These are buffers which are known to have had IO started. All we need to do is to wait on them in commit. - meanwhile, truncate will chop the pages off the address_space. But truncate cannot invalidate the buffers (in journal_unmap_buffer()) because the buffers are attached to the committing transaction. (hm. This behaviour in journal_unmap_buffer() is bogus. We just never need to write these buffers.) - ext3 commit will "wait on writeout" of these writepage buffers (even though it was never started) and will then release them from the journalling system. So we end up with pages which are attached to no mapping, which are clean and which have dirty buffers. These are unreclaimable. Aside: ext3-ordered has two buffer lists: the "sync data list" and the "async data list". The sync list consists of probably-dirty buffers which were dirtied in commit_write(). Transaction commit must write all thee out and wait on them. The async list supposedly consists of clean buffers which were attached to the journal in ->writepage. These have had IO started (by writepage) so commit merely needs to wait on them. This is all designed for the 2.4 VM really. In 2.5, tons of writeback goes via writepage (instead of the buffer lru) and these buffers end up madly hpooing between the async and sync lists. Plus it's arguably incorrect to just wait on the writes in commit - if the buffers were set dirty again (say, by zap_pte_range()) then perhaps we should write them again before committing. So what the patch does is to remove the async list. All ordered-data buffers are now attached to the single "sync data list". So when we come to commit, those buffers which are dirty will have IO started and all buffers are waited upon. This means that the dirty buffers against a clean page which came about from block_write_full_page()'s -EIO will be written to disk in commit - this cleans them, and the page is now reclaimable. No leak. It seems bogus to write these buffers in commit, and indeed it is. But ext3 will not allow those blocks to be reused until the commit has ended so there is no corruption risk. And the amount of data involved is low - it only comes about as a race between truncate and writepage().
Showing
Please register or sign in to comment