    mm/madvise: don't perform madvise VMA walk for MADV_POPULATE_(READ|WRITE) · fa9fcd8b
    David Hildenbrand authored
    We changed faultin_page_range() to no longer consume a VMA, because
    faultin_page_range() might internally release the mm lock to look up
    the VMA again -- required to cleanly handle VM_FAULT_RETRY. But
    independent of that, __get_user_pages() will always look up the VMA
    itself.
    
    Now that we let __get_user_pages() just handle VMA checks in a way that
    is suitable for MADV_POPULATE_(READ|WRITE), the VMA walk in madvise()
    is just overhead. So let's just call madvise_populate()
    on the full range instead.
    
    There is one change in behavior: madvise_walk_vmas() would skip any VMA
    holes, and if everything succeeded, it would return -ENOMEM after
    processing all VMAs.
    
    However, for MADV_POPULATE_(READ|WRITE) the caller is unlikely to notice
    any difference: -ENOMEM might indicate either that there were VMA holes
    or that populating page tables failed because there was not enough
    memory. Skipping holes and reporting -ENOMEM only at the end likely only
    makes sense for some other madvise() actions.
    
    Further, we'd already fail with -ENOMEM early in the past if looking up the
    VMA after dropping the MM lock failed because of concurrent VMA
    modifications. So let's just keep it simple and avoid the madvise VMA
    walk, and consistently fail early if we find a VMA hole.
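
The behavior change can be illustrated with a small user-space toy model. This is only a sketch, not kernel code: `struct toy_vma`, `toy_find_vma()`, `toy_populate_walk()` and `toy_populate_direct()` are hypothetical names invented for the illustration, addresses are plain page indices, and the VMA array is assumed to be sorted and non-overlapping. The first function mimics the hole-skipping walk described above, the second mimics failing at the first hole.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMA: a half-open range of page indices [start, end). */
struct toy_vma {
	size_t start;
	size_t end;
};

/* Return the first VMA that ends above addr, or NULL if there is none. */
const struct toy_vma *toy_find_vma(const struct toy_vma *vmas, size_t n,
				   size_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (vmas[i].end > addr)
			return &vmas[i];
	return NULL;
}

/* Old-style model: walk VMA by VMA, skip holes, report -ENOMEM at the end. */
int toy_populate_walk(const struct toy_vma *vmas, size_t n,
		      size_t start, size_t end, bool *populated)
{
	int unmapped_error = 0;

	assert(start <= end);
	while (start < end) {
		const struct toy_vma *vma = toy_find_vma(vmas, n, start);
		size_t stop;

		/* Nothing mapped in the rest of the range: trailing hole. */
		if (!vma || vma->start >= end)
			return -ENOMEM;
		if (start < vma->start) {
			/* Remember the hole, but keep going. */
			unmapped_error = -ENOMEM;
			start = vma->start;
		}
		stop = vma->end < end ? vma->end : end;
		for (; start < stop; start++)
			populated[start] = true;
	}
	return unmapped_error;
}

/* New-style model: handle the full range, fail at the first hole. */
int toy_populate_direct(const struct toy_vma *vmas, size_t n,
			size_t start, size_t end, bool *populated)
{
	assert(start <= end);
	for (size_t addr = start; addr < end; addr++) {
		const struct toy_vma *vma = toy_find_vma(vmas, n, addr);

		if (!vma || vma->start > addr)
			return -ENOMEM;
		populated[addr] = true;
	}
	return 0;
}
```

In this model, with VMAs covering pages [0, 2) and [3, 5) and a request for [0, 5), both variants return -ENOMEM. The walk variant still populates pages 3 and 4 behind the hole, while the direct variant stops after pages 0 and 1.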
    
    Link: https://lkml.kernel.org/r/20240314161300.382526-3-david@redhat.com
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
madvise.c 38.4 KB