Commit 84c55f8d authored by Hugh Dickins, committed by Greg Kroah-Hartman

mm/khugepaged: collapse_shmem() without freezing new_page

commit 87c460a0 upstream.

khugepaged's collapse_shmem() does almost all of its work, to assemble
the huge new_page from 512 scattered old pages, with the new_page's
refcount frozen to 0 (and refcounts of all old pages so far also frozen
to 0).  Including shmem_getpage() to read in any which were out on swap,
memory reclaim if necessary to allocate their intermediate pages, and
copying over all the data from old to new.
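
As background for readers who have not met refcount freezing: the helpers behave roughly as sketched below (a simplified rendering of the include/linux/page_ref.h interface, omitting memory-ordering and tracepoint details; the _sketch names are illustrative, the real helpers are page_ref_freeze() and page_ref_unfreeze()). While a page is frozen, every get_page_unless_zero()-style grab fails, so nobody else can take a reference until the owner unfreezes it.

#include <linux/mm.h>

/* Sketch: succeed only if the refcount is exactly @count, atomically
 * replacing it with 0 so that all further speculative grabs fail. */
static inline int page_ref_freeze_sketch(struct page *page, int count)
{
	return atomic_cmpxchg(&page->_refcount, count, 0) == count;
}

/* Sketch: restore a frozen page's refcount once it may be used again. */
static inline void page_ref_unfreeze_sketch(struct page *page, int count)
{
	VM_BUG_ON_PAGE(page_count(page) != 0, page);
	atomic_set(&page->_refcount, count);
}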

Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting
to spin while khugepaged is rescheduled.
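
The "spinning" is the retry loop that speculative page cache lookups use. Schematically, with a much simplified sketch along the lines of find_get_entry() (lookup_sketch() is an illustrative name, not a real kernel function):

#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/radix-tree.h>

/* Sketch: look a page up without mapping->tree_lock.  If its refcount is
 * frozen to 0, page_cache_get_speculative() fails and the lookup retries,
 * so the caller effectively spins for as long as the page stays frozen. */
static struct page *lookup_sketch(struct address_space *mapping, pgoff_t offset)
{
	struct page *page;
repeat:
	rcu_read_lock();
	page = radix_tree_lookup(&mapping->page_tree, offset);
	if (page && !page_cache_get_speculative(page)) {
		rcu_read_unlock();
		goto repeat;	/* loops while khugepaged keeps the page frozen */
	}
	rcu_read_unlock();
	return page;
}

With new_page never frozen, a lookup that races with khugepaged simply takes and drops an ordinary reference instead of looping.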

One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.

The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong.  So simply
eliminate the freezing of the new_page.
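
Concretely, the success path changes from "thaw to the final count" to a plain reference top-up; roughly, mirroring the hunk at old line 1521 in the diff below:

	SetPageUptodate(new_page);
	/* new_page kept the single reference it got at allocation (it was
	 * never frozen), so add one reference per remaining subpage now
	 * sitting in the page cache: 1 + (HPAGE_PMD_NR - 1) == HPAGE_PMD_NR,
	 * the same final count page_ref_unfreeze(new_page, HPAGE_PMD_NR)
	 * used to establish. */
	page_ref_add(new_page, HPAGE_PMD_NR - 1);
	set_page_dirty(new_page);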

Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are
all unfrozen on success or failure: just as before.  They could be
unfrozen sooner, but cause no problem once no longer visible to
find_get_entry(), filemap_map_pages() and other speculative lookups.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
parent b447a6ad
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1288,7 +1288,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * collapse_shmem - collapse small tmpfs/shmem pages into huge one.
  *
  * Basic scheme is simple, details are more complex:
- *  - allocate and freeze a new huge page;
+ *  - allocate and lock a new huge page;
  *  - scan over radix tree replacing old pages the new one
  *    + swap in pages if necessary;
  *    + fill in gaps;
@@ -1296,11 +1296,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  *  - if replacing succeed:
  *    + copy data over;
  *    + free old pages;
- *    + unfreeze huge page;
+ *    + unlock huge page;
  *  - if replacing failed;
  *    + put all pages back and unfreeze them;
  *    + restore gaps in the radix-tree;
- *    + free huge page;
+ *    + unlock and free huge page;
  */
 static void collapse_shmem(struct mm_struct *mm,
 		struct address_space *mapping, pgoff_t start,
@@ -1337,13 +1337,11 @@ static void collapse_shmem(struct mm_struct *mm,
 	__SetPageSwapBacked(new_page);
 	new_page->index = start;
 	new_page->mapping = mapping;
-	BUG_ON(!page_ref_freeze(new_page, 1));
 
 	/*
-	 * At this point the new_page is 'frozen' (page_count() is zero), locked
-	 * and not up-to-date. It's safe to insert it into radix tree, because
-	 * nobody would be able to map it or use it in other way until we
-	 * unfreeze it.
+	 * At this point the new_page is locked and not up-to-date.
+	 * It's safe to insert it into the page cache, because nobody would
+	 * be able to map it or use it in another way until we unlock it.
 	 */
 
 	index = start;
@@ -1521,9 +1519,8 @@ static void collapse_shmem(struct mm_struct *mm,
 			index++;
 		}
 
-		/* Everything is ready, let's unfreeze the new_page */
 		SetPageUptodate(new_page);
-		page_ref_unfreeze(new_page, HPAGE_PMD_NR);
+		page_ref_add(new_page, HPAGE_PMD_NR - 1);
 		set_page_dirty(new_page);
 		mem_cgroup_commit_charge(new_page, memcg, false, true);
 		lru_cache_add_anon(new_page);
@@ -1571,8 +1568,6 @@ static void collapse_shmem(struct mm_struct *mm,
 		VM_BUG_ON(nr_none);
 		spin_unlock_irq(&mapping->tree_lock);
 
-		/* Unfreeze new_page, caller would take care about freeing it */
-		page_ref_unfreeze(new_page, 1);
 		mem_cgroup_cancel_charge(new_page, memcg, true);
 		new_page->mapping = NULL;
 	}