[PATCH] Fix page double-freeing race
This has been there for nearly two years. See bugzilla #1403 vmscan.c does, in two places: spin_lock(zone->lru_lock) page = lru_to_page(&zone->inactive_list); if (page_count(page) == 0) { /* erk, it's being freed by __page_cache_release() or * release_pages() */ put_it_back_on_the_lru(); } else { --> window 1 <-- page_cache_get(page); put_in_on_private_list(); } spin_unlock(zone->lru_lock) use_the_private_list(); page_cache_release(page); whereas __page_cache_release() and release_pages() do: if (put_page_testzero(page)) { --> window 2 <-- spin_lock(lru->lock); if (page_count(page) == 0) { remove_it_from_the_lru(); really_free_the_page() } spin_unlock(zone->lru_lock) } The race occurs if the vmscan.c path sees page_count()==1 and then the page_cache_release() path happens in that few-instruction "window 1" before vmscan's page_cache_get(). The page_cache_release() path does put_page_testzero(), which returns true. Then this CPU takes an interrupt... The vmscan.c path then does page_cache_get(), taking the refcount to one. Then it uses the page and does page_cache_release(), taking the refcount to zero and the page is really freed. Now, the CPU running page_cache_release() returns from the interrupt, takes the LRU lock, sees the page still has a refcount of zero and frees it again. Boom. The patch fixes this by closing "window 1". We provide a "get_page_testone()" which grabs a ref on the page and returns true if the refcount was previously zero. If that happens the vmscan.c code simply drops the page's refcount again and leaves the page on the LRU. All this happens under the zone->lru_lock, which is also taken by __page_cache_release() and release_pages(), so the vmscan code knows that the page has not been returned to the page allocator yet. In terms of implementation, the page counts are now offset by one: a free page has page->_count of -1. This is so that we can use atomic_add_negative() and atomic_inc_and_test() to provide put_page_testzero() and get_page_testone(). The macros hide all of this so the public interpretation of page_count() and set_page_count() remains unaltered. The compiler can usually constant-fold the offsetting of page->count. This patch increases an x86 SMP kernel's text by 32 bytes. The patch renames page->count to page->_count to break callers who aren't using the macros. This patch requires that the architecture implement atomic_add_negative(). It is currently present on arm arm26 i386 ia64 mips ppc s390 v850 x86_64 ppc implements this as #define atomic_add_negative(a, v) (atomic_add_return((a), (v)) < 0) and atomic_add_return() is implemented on alpha cris h8300 ia64 m68knommu mips parisc ppc ppc ppc64 s390 sh sparc v850 so we're looking pretty good.
Showing
Please register or sign in to comment