Commit 023f47a8, authored by Jann Horn, committed by Andrew Morton

mm/khugepaged: fix ->anon_vma race

If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.

Page table traversal is allowed under any one of the mmap lock, the
anon_vma lock (if the VMA is associated with an anon_vma), and the
mapping lock (if the VMA is associated with a mapping); and so to be
able to remove page tables, we must hold all three of them. 
retract_page_tables() bails out if an ->anon_vma is attached, but does
this check before holding the mmap lock (as the comment above the check
explains).
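
As a rough illustration of the locking rule above, the following userspace sketch models the three locks as bits in a mask. All names here (toy_lock, toy_applicable_locks(), toy_may_traverse(), toy_may_remove()) are hypothetical and not kernel APIs, and the sketch deliberately ignores read versus write lock modes.

```c
#include <stdbool.h>

/* Hypothetical model: one bit per lock that can protect a page table walk. */
enum toy_lock {
	TOY_MMAP_LOCK     = 1u << 0,
	TOY_ANON_VMA_LOCK = 1u << 1,
	TOY_MAPPING_LOCK  = 1u << 2,
};

/* Which locks a walker could legitimately rely on for this VMA. */
static unsigned toy_applicable_locks(bool has_anon_vma, bool has_mapping)
{
	unsigned locks = TOY_MMAP_LOCK;	/* the mmap lock always applies */

	if (has_anon_vma)
		locks |= TOY_ANON_VMA_LOCK;
	if (has_mapping)
		locks |= TOY_MAPPING_LOCK;
	return locks;
}

/* Traversal is allowed when at least one applicable lock is held. */
static bool toy_may_traverse(unsigned held, unsigned applicable)
{
	return (held & applicable) != 0;
}

/* Removal must exclude every possible walker, so all applicable locks are needed. */
static bool toy_may_remove(unsigned held, unsigned applicable)
{
	return (held & applicable) == applicable;
}
```

In this model, holding only the mmap and mapping locks is enough for removal on a VMA without an ->anon_vma, but stops being enough the moment an ->anon_vma is attached, which is the situation the race produces.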

If we racily merged an existing ->anon_vma (shared with a child
process) from a neighboring VMA, subsequent rmap traversals on pages
belonging to the child will be able to see the page tables that we are
concurrently removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that
there really is no concurrent page table access.
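
The check, lock, re-check pattern described above can be sketched in userspace C as follows. This is a minimal model under stated assumptions, not the kernel code: struct toy_vma, toy_retract(), the race_window hook, and the other toy_* names are hypothetical, and a C11 atomic_bool stands in for the write side of the mmap lock without modelling readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_vma {
	atomic_bool mmap_write_locked;		/* true while "write-locked" */
	_Atomic(void *) anon_vma;		/* non-NULL once attached */
	bool page_table_present;
	void (*race_window)(struct toy_vma *);	/* hook run between the two checks */
};

enum toy_result { TOY_SUCCEED, TOY_PAGE_ANON, TOY_LOCK_BUSY };

static int toy_shared_anon_vma;	/* dummy object for ->anon_vma to point at */

static void toy_vma_init(struct toy_vma *vma)
{
	atomic_init(&vma->mmap_write_locked, false);
	atomic_init(&vma->anon_vma, NULL);
	vma->page_table_present = true;
	vma->race_window = NULL;
}

/* Simulates a racing merge that attaches an anon_vma. */
static void toy_attach_anon_vma(struct toy_vma *vma)
{
	atomic_store(&vma->anon_vma, (void *)&toy_shared_anon_vma);
}

/* Loosely mirrors the lockdep assertion: caller must have excluded anon_vma users. */
static void toy_free_page_table(struct toy_vma *vma)
{
	assert(atomic_load(&vma->mmap_write_locked));
	assert(atomic_load(&vma->anon_vma) == NULL);
	vma->page_table_present = false;
}

static enum toy_result toy_retract(struct toy_vma *vma)
{
	/* Lockless early check: a cheap filter, but racy. */
	if (atomic_load(&vma->anon_vma))
		return TOY_PAGE_ANON;

	if (vma->race_window)
		vma->race_window(vma);	/* an anon_vma may appear here */

	/* Trylock: atomic_exchange returns the previous value. */
	if (atomic_exchange(&vma->mmap_write_locked, true))
		return TOY_LOCK_BUSY;

	/* Re-check now that no new anon_vma can be attached. */
	if (atomic_load(&vma->anon_vma)) {
		atomic_store(&vma->mmap_write_locked, false);
		return TOY_PAGE_ANON;
	}

	toy_free_page_table(vma);
	atomic_store(&vma->mmap_write_locked, false);
	return TOY_SUCCEED;
}
```

Setting race_window to toy_attach_anon_vma() makes the first check pass and the second one fail, so toy_retract() returns TOY_PAGE_ANON and leaves the page table in place. Without the second check, the same sequence would trip the assertion in toy_free_page_table().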

Hitting this bug causes a lockdep warning in collapse_and_free_pmd(),
in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)". 
It can also lead to use-after-free access.

Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/
Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com
Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh@google.com>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
parent 7327e811
@@ -1642,7 +1642,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		 * has higher cost too. It would also probably require locking
 		 * the anon_vma.
 		 */
-		if (vma->anon_vma) {
+		if (READ_ONCE(vma->anon_vma)) {
 			result = SCAN_PAGE_ANON;
 			goto next;
 		}
@@ -1670,6 +1670,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		result = SCAN_PTE_MAPPED_HUGEPAGE;
 		if ((cc->is_khugepaged || is_target) &&
 		    mmap_write_trylock(mm)) {
+			/*
+			 * Re-check whether we have an ->anon_vma, because
+			 * collapse_and_free_pmd() requires that either no
+			 * ->anon_vma exists or the anon_vma is locked.
+			 * We already checked ->anon_vma above, but that check
+			 * is racy because ->anon_vma can be populated under the
+			 * mmap lock in read mode.
+			 */
+			if (vma->anon_vma) {
+				result = SCAN_PAGE_ANON;
+				goto unlock_next;
+			}
 			/*
 			 * When a vma is registered with uffd-wp, we can't
 			 * recycle the pmd pgtable because there can be pte