    mm/mremap: fix move_normal_pmd/retract_page_tables race · 6fa1066f
    Jann Horn authored
    In mremap(), move_page_tables() looks at the type of the PMD entry and the
    specified address range to figure out by which method the next chunk of
    page table entries should be moved.
    
    At that point, the mmap_lock is held in write mode, but no rmap locks are
    held yet.  For PMD entries that point to page tables and are fully covered
    by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
    which first takes rmap locks, then does move_normal_pmd(). 
    move_normal_pmd() takes the necessary page table locks at source and
    destination, then moves an entire page table from the source to the
    destination.
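
    As a rough illustration of the dispatch described above, here is a
    userspace toy model, not the kernel code.  Every toy_* name and the
    2 MiB TOY_PMD_SIZE are assumptions made for this sketch.  It models only
    the source-range coverage condition mentioned above; the real
    move_page_tables() has more conditions, and huge PMD entries are left
    out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for the sketch: one PMD entry spans 2 MiB. */
#define TOY_PMD_SIZE (1UL << 21)

/* What the (toy) PMD entry holds: nothing, or a pointer to a page table. */
enum toy_entry { TOY_ENTRY_NONE, TOY_ENTRY_TABLE };

/* How the next chunk would be moved in this simplified model. */
enum toy_method { TOY_SKIP, TOY_MOVE_NORMAL_PMD, TOY_MOVE_PTES };

/* True if [addr, end) starts on a PMD boundary and spans a whole PMD. */
static bool toy_covers_whole_pmd(unsigned long addr, unsigned long end)
{
	return (addr % TOY_PMD_SIZE) == 0 && end - addr >= TOY_PMD_SIZE;
}

/*
 * Pick a method from the entry type and the source range.  Note that the
 * entry type is inspected here, before any lock is modeled; that ordering
 * is the one the commit message describes.
 */
static enum toy_method toy_pick_method(enum toy_entry entry,
				       unsigned long addr, unsigned long end)
{
	if (entry == TOY_ENTRY_NONE)
		return TOY_SKIP;
	if (toy_covers_whole_pmd(addr, end))
		return TOY_MOVE_NORMAL_PMD;
	return TOY_MOVE_PTES;
}
```

    In this model, only an aligned range that spans a whole PMD takes the
    whole-page-table path; everything else falls back to moving individual
    PTEs.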
    
    The problem is: The rmap locks, which protect against concurrent page
    table removal by retract_page_tables() in the THP code, are only taken
    after the PMD entry has been read and it has been decided how to move it. 
    So we can race as follows (with two processes that have mappings of the
    same tmpfs file that is stored on a tmpfs mount with huge=advise); note
    that process A accesses page tables through the MM while process B does it
    through the file rmap:
    
    process A                      process B
    =========                      =========
    mremap
      mremap_to
        move_vma
          move_page_tables
            get_old_pmd
            alloc_new_pmd
                          *** PREEMPT ***
                                   madvise(MADV_COLLAPSE)
                                     do_madvise
                                       madvise_walk_vmas
                                         madvise_vma_behavior
                                           madvise_collapse
                                             hpage_collapse_scan_file
                                               collapse_file
                                                 retract_page_tables
                                                   i_mmap_lock_read(mapping)
                                                   pmdp_collapse_flush
                                                   i_mmap_unlock_read(mapping)
            move_pgt_entry(NORMAL_PMD, ...)
              take_rmap_locks
              move_normal_pmd
              drop_rmap_locks
    
    When this happens, move_normal_pmd() can end up creating bogus PMD entries
    in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`.  The effect
    depends on arch-specific and machine-specific details; on x86, you can end
    up with physical page 0 mapped as a page table, which is likely
    exploitable for user->kernel privilege escalation.
    
    Fix the race by letting process A recheck that the PMD still points to a
    page table after the rmap locks have been taken.  Otherwise, we bail and
    let the caller fall back to the PTE-level copying path, which will then
    bail immediately at the pmd_none() check.
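
    The check-then-lock window and the recheck can be sketched with a
    single-threaded userspace toy model.  This is not the actual patch: the
    toy_* names, the entry encoding (PFN shifted by 12 plus a present bit)
    and the `preempt` callback that stands in for the other process being
    scheduled are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are kernel types. */
typedef unsigned long toy_pmd_t;
#define TOY_PRESENT   0x1UL
#define TOY_PFN_SHIFT 12

struct toy_mm {
	toy_pmd_t old_pmd;  /* source slot */
	toy_pmd_t new_pmd;  /* destination slot */
	bool rmap_locked;   /* models take_rmap_locks()/drop_rmap_locks() */
};

/* Build a present entry that points at page frame `pfn`. */
static toy_pmd_t toy_mk_pmd(unsigned long pfn)
{
	return (pfn << TOY_PFN_SHIFT) | TOY_PRESENT;
}

/* Models the retract side: clears the source entry while the mover does
 * not hold the rmap lock. */
static void toy_retract(struct toy_mm *mm)
{
	assert(!mm->rmap_locked);
	mm->old_pmd = 0;
}

/*
 * Models the mover.  The entry is inspected before the lock is taken;
 * `preempt`, if non-NULL, runs inside that window.  With `recheck` set,
 * the entry is validated again once the lock is held.
 */
static bool toy_move(struct toy_mm *mm, void (*preempt)(struct toy_mm *),
		     bool recheck)
{
	toy_pmd_t pmd;
	bool moved = false;

	if (!(mm->old_pmd & TOY_PRESENT))  /* decision made without the lock */
		return false;
	if (preempt)
		preempt(mm);               /* the race window */

	mm->rmap_locked = true;
	pmd = mm->old_pmd;
	if (recheck && !(pmd & TOY_PRESENT))
		goto out_unlock;           /* entry vanished: bail out */
	mm->old_pmd = 0;
	/* Without the recheck, a cleared entry becomes "present, PFN 0". */
	mm->new_pmd = toy_mk_pmd(pmd >> TOY_PFN_SHIFT);
	moved = true;
out_unlock:
	mm->rmap_locked = false;
	return moved;
}
```

    Calling toy_move(&mm, toy_retract, false) on a populated entry leaves
    new_pmd as a present entry for page frame 0, which mirrors the bogus
    entry described above.  With recheck set to true, the same interleaving
    returns false and leaves new_pmd untouched, so the caller can fall back
    to the slower path.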
    
    Bug reachability: Reaching this bug requires that you can create
    shmem/file THP mappings - anonymous THP uses different code that doesn't
    zap stuff under rmap locks.  File THP is gated on an experimental config
    flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need
    shmem THP to hit this bug.  As far as I know, getting shmem THP normally
    requires that you can mount your own tmpfs with the right mount flags,
    which would require creating your own user+mount namespace; though I don't
    know if some distros maybe enable shmem THP by default or something like
    that.
    
    Bug impact: This issue can likely be used for user->kernel privilege
    escalation when it is reachable.
    
    Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5ead9631f2ea@google.com
    Fixes: 1d65b771 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
    Signed-off-by: Jann Horn <jannh@google.com>
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Co-developed-by: David Hildenbrand <david@redhat.com>
    Closes: https://project-zero.issues.chromium.org/371047675
    Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
    Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>