    mm/vma: correctly position vma_iterator in __split_vma() · b7012d51
    Liam R. Howlett authored
    Patch series "Avoid MAP_FIXED gap exposure", v8.
    
    It is now possible to walk the vma tree using the rcu read lock, and it is
    beneficial to do so to reduce lock contention.  Doing so while a MAP_FIXED
    mapping is executing means that a reader may see a gap in the vma tree
    that should never logically exist - and does not when using the mmap lock
    in read mode.  The temporal gap exists because mmap_region() calls
    munmap() prior to installing the new mapping.
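
The ordering problem described above can be illustrated with a small userspace toy. This is only a sketch under stated assumptions, not kernel code: `toy_mm`, `toy_store()`, `toy_map_fixed_unmap_first()` and the observer hook are hypothetical names, the "tree" is a plain array with one slot per page, and the rcu reader is simulated by a callback that runs between the two writer steps.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8

/* Toy stand-in for the vma tree: one slot per page, NULL means unmapped. */
struct toy_mm {
    const char *slot[NPAGES];
};

/* Hypothetical hook standing in for a lockless reader between writer steps. */
typedef void (*observer_fn)(const struct toy_mm *mm, void *arg);

/* Overwrite the pages in [start, end) with val. */
static void toy_store(struct toy_mm *mm, size_t start, size_t end,
                      const char *val)
{
    assert(start <= end && end <= NPAGES);
    for (size_t i = start; i < end; i++)
        mm->slot[i] = val;
}

/* Old ordering: clear the range first, then install the new mapping. */
static void toy_map_fixed_unmap_first(struct toy_mm *mm, size_t start,
                                      size_t end, const char *newv,
                                      observer_fn obs, void *arg)
{
    toy_store(mm, start, end, NULL);   /* the munmap() step */
    if (obs)
        obs(mm, arg);                  /* a reader runs here */
    toy_store(mm, start, end, newv);   /* the new mapping */
}

/* Records whether the probed page was ever observed as unmapped. */
struct probe {
    size_t page;
    int saw_gap;
};

static void probe_page(const struct toy_mm *mm, void *arg)
{
    struct probe *p = arg;

    if (mm->slot[p->page] == NULL)
        p->saw_gap = 1;
}

/* Returns 1 if the simulated reader saw a gap inside the replaced range. */
static int toy_gap_visible(void)
{
    struct toy_mm mm = {0};
    struct probe p = { .page = 3, .saw_gap = 0 };

    toy_store(&mm, 0, NPAGES, "old");
    toy_map_fixed_unmap_first(&mm, 2, 5, "new", probe_page, &p);
    return p.saw_gap;
}
```

Page 3 is mapped before and after the call, yet the simulated reader sees it as unmapped in between; that is the gap a reader holding the mmap lock in read mode could never observe.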
    
    This patch set stops rcu readers from seeing the temporal gap by splitting
    up the munmap() function into two parts.  The first part prepares the vma
    tree for modifications by doing the necessary splits and tracks the vmas
    marked for removal in a side tree.  The second part completes the
    munmapping of the vmas after the vma tree has been overwritten (either by
    a MAP_FIXED replacement vma or by a NULL in the munmap() case).
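
The two-part flow can be sketched in the same toy style. Everything here is an assumption made for illustration: `sk_gather()`, `sk_store()` and `sk_complete()` are hypothetical names rather than the functions in the series, the side tree is a small array, the splits are taken as already done (the range covers exactly one vma), and the per-page loop in `sk_store()` stands in for what would need to be a single tree write from a reader's point of view.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NPAGES 8

struct sk_vma {
    const char *name;
    int removed;                       /* set once part two has run */
};

/* Toy vma tree: one slot per page, NULL means unmapped. */
struct sk_mm {
    struct sk_vma *slot[SK_NPAGES];
};

/* Toy side tree: the vmas marked for removal. */
struct sk_side {
    struct sk_vma *vma[SK_NPAGES];
    size_t count;
};

/* Part one: record the vmas in [start, end); the main tree is not touched. */
static void sk_gather(const struct sk_mm *mm, size_t start, size_t end,
                      struct sk_side *side)
{
    assert(start <= end && end <= SK_NPAGES);
    side->count = 0;
    for (size_t i = start; i < end; i++) {
        struct sk_vma *v = mm->slot[i];

        if (!v)
            continue;
        if (side->count && side->vma[side->count - 1] == v)
            continue;                  /* same vma as the previous page */
        side->vma[side->count++] = v;
    }
}

/* Overwrite [start, end) with val (a replacement vma, or NULL for munmap). */
static void sk_store(struct sk_mm *mm, size_t start, size_t end,
                     struct sk_vma *val)
{
    assert(start <= end && end <= SK_NPAGES);
    for (size_t i = start; i < end; i++)
        mm->slot[i] = val;
}

/* Part two: finish removing the gathered vmas after the overwrite. */
static void sk_complete(struct sk_side *side)
{
    for (size_t i = 0; i < side->count; i++)
        side->vma[i]->removed = 1;
}

/* Number of unmapped pages a reader would find in [start, end). */
static int sk_count_gaps(const struct sk_mm *mm, size_t start, size_t end)
{
    int gaps = 0;

    for (size_t i = start; i < end; i++)
        if (!mm->slot[i])
            gaps++;
    return gaps;
}

/* Returns 1 if no gap was seen and only the replaced vma was removed. */
static int sk_demo_ok(void)
{
    struct sk_vma a = { "a", 0 }, b = { "b", 0 }, c = { "c", 0 };
    struct sk_vma n = { "new", 0 };
    struct sk_mm mm;
    struct sk_side side;
    int gaps = 0;

    for (size_t i = 0; i < SK_NPAGES; i++)
        mm.slot[i] = i < 2 ? &a : (i < 5 ? &b : &c);

    sk_gather(&mm, 2, 5, &side);
    gaps += sk_count_gaps(&mm, 2, 5);  /* reader after part one */
    sk_store(&mm, 2, 5, &n);
    gaps += sk_count_gaps(&mm, 2, 5);  /* reader after the overwrite */
    sk_complete(&side);

    return gaps == 0 && side.count == 1 && b.removed == 1 &&
           a.removed == 0 && c.removed == 0 && mm.slot[3] == &n;
}
```

The point of the ordering is that the old vma is only torn down after the main tree already holds its replacement, so a lookup at either checkpoint finds a vma rather than a hole.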
    
    Please note that rcu walkers will still be able to see a temporary state
    of split vmas that may be in the process of being removed, but the
    temporal gap will not be exposed.  vma_start_write() is called on both
    parts of the split vma, so this state is detectable.
    
    If existing vmas have a vm_ops->close(), then it will be called prior to
    mapping the new vmas (and the ptes are cleared out).  Without calling
    ->close(), hugetlbfs tests fail (hugemmap06 specifically) due to resources
    still being marked as 'busy'.  Unfortunately, calling the corresponding
    ->open() may not restore the state of the vmas, so it is safer to keep the
    existing failure scenario where a gap is inserted and never replaced.  The
    failure scenario is in its own patch (0015) for traceability.
    
    
    This patch (of 21):
    
    The vma iterator may be left pointing to the newly created vma.  This
    happens when inserting the new vma at the end of the old vma (!new_below).
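
A toy model of the iterator position may make this easier to picture. It is a sketch under assumptions, not the kernel change itself: the tree is a sorted array, the iterator is an array index, `it_split()` and its `reposition` flag are hypothetical, and the sketch assumes the intended position after a split is the original vma in both the new_below and !new_below cases.

```c
#include <assert.h>
#include <stddef.h>

#define IT_MAX 8

struct it_range {
    unsigned long start, end;
};

/* Toy vma tree: ranges kept sorted by address. */
struct it_tree {
    struct it_range r[IT_MAX];
    size_t n;
};

/* Toy vma iterator: the index of the current range. */
struct it_iter {
    size_t index;
};

/*
 * Split the range under the iterator at addr.  With new_below the new
 * range is the lower half, otherwise the upper half.  Returns 0 on
 * success and -1 if the split is not possible.
 */
static int it_split(struct it_tree *t, struct it_iter *it,
                    unsigned long addr, int new_below, int reposition)
{
    size_t i = it->index;

    if (i >= t->n || t->n == IT_MAX)
        return -1;

    struct it_range old = t->r[i];

    if (addr <= old.start || addr >= old.end)
        return -1;

    /* Make room for one more entry right after i. */
    for (size_t j = t->n; j > i + 1; j--)
        t->r[j] = t->r[j - 1];
    t->n++;

    t->r[i].start = old.start;
    t->r[i].end = addr;
    t->r[i + 1].start = addr;
    t->r[i + 1].end = old.end;

    /* Writing the new range leaves the iterator pointing at it. */
    it->index = new_below ? i : i + 1;

    /* Assumed fix-up: move back onto the original (shrunk) range. */
    if (new_below)
        it->index++;
    else if (reposition)
        it->index--;

    return 0;
}

/* Split one range with !new_below and report where the iterator ends up. */
static size_t it_demo_index(int reposition)
{
    struct it_tree t = { .r = { { 0x1000, 0x5000 } }, .n = 1 };
    struct it_iter it = { .index = 0 };
    int ret = it_split(&t, &it, 0x3000, 0, reposition);

    assert(ret == 0);
    return it.index;
}
```

Without the fix-up, `it_demo_index(0)` leaves the iterator on the new upper range at index 1; with it, `it_demo_index(1)` returns to index 0, the original range that now ends at the split address.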
    
    The incorrect position in the vma iterator is not exposed currently since
    the vma iterator is repositioned in the munmap path and is not reused in
    any of the other paths.
    
    This has limited impact in the current code, but is required for future
    changes.
    
    Link: https://lkml.kernel.org/r/20240830040101.822209-2-Liam.Howlett@oracle.com
    Fixes: b2b3b886 ("mm: don't use __vma_adjust() in __split_vma()")
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Reviewed-by: Suren Baghdasaryan <surenb@google.com>
    Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
    Cc: Bert Karwatzki <spasswolf@web.de>
    Cc: Jeff Xu <jeffxu@chromium.org>
    Cc: Jiri Olsa <olsajiri@gmail.com>
    Cc: Kees Cook <kees@kernel.org>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: Paul Moore <paul@paul-moore.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>