    mm/rmap: support move to different root anon_vma in folio_move_anon_rmap() · 880a99b6
    Andrea Arcangeli authored
    Patch series "userfaultfd move option", v6.
    
    This patch series introduces UFFDIO_MOVE feature to userfaultfd, which has
    long been implemented and maintained by Andrea in his local tree [1], but
    was not upstreamed due to lack of use cases where this approach would be
    better than allocating a new page and copying the contents.  Previous
    upstreaming attempts can be found at [6] and [7].
    
    UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application
    needs pages to be allocated [2].  However, with UFFDIO_MOVE, if pages are
    available (in userspace) for recycling, as is usually the case in heap
    compaction algorithms, then we can avoid the page allocation and memcpy
    (done by UFFDIO_COPY).  Also, since the pages are recycled in the
    userspace, we avoid the need to release (via madvise) the pages back to
    the kernel [3].  We see over 40% reduction (on a Google pixel 6 device) in
    the compacting thread's completion time by using UFFDIO_MOVE vs. 
    UFFDIO_COPY.  This was measured using a benchmark that emulates a heap
    compaction implementation using userfaultfd (to allow concurrent accesses
    by application threads).  More details of the usecase are explained in
    [3].
    
    Furthermore, UFFDIO_MOVE enables moving swapped-out pages without
    touching them within the same vma.  Today, this can only be done with
    mremap, which forces splitting the vma.
    
    TODOs for follow-up improvements:
    - cross-mm support. Known differences from single-mm and missing pieces:
    	- memcg recharging (might need to isolate pages in the process)
    	- mm counters
    	- cross-mm deposit table moves
    	- cross-mm test
    	- document the address space where src and dest reside in struct
    	  uffdio_move
    
    - TLB flush batching.  Will require extensive changes to PTL locking in
      move_pages_pte().  OTOH that might let us reuse parts of mremap code.
    
    
    This patch (of 5):
    
    So far, folio_move_anon_rmap() has only been used to move a folio to a
    different anon_vma after fork(), whereby the root anon_vma stayed
    unchanged.  For that, it was sufficient to hold the folio lock when
    calling folio_move_anon_rmap().
    
    However, we want to make use of folio_move_anon_rmap() to move folios
    between VMAs that have a different root anon_vma.  As folio_referenced()
    performs an RMAP walk without holding the folio lock but only holding the
    anon_vma in read mode, holding the folio lock is insufficient.
    
    When moving to an anon_vma with a different root anon_vma, we'll have to
    hold both the folio lock and the anon_vma lock in write mode.
    Consequently, whenever folio_lock_anon_vma_read() succeeds in
    read-locking the anon_vma, we have to re-check whether the mapping was
    changed in the meantime.  If so, we have to retry.
    
    Note that folio_move_anon_rmap() must only be called if the anon page is
    exclusive to a process, and must not be called on KSM folios.
    
    This is a preparation for UFFDIO_MOVE, which will hold the folio lock, the
    anon_vma lock in write mode, and the mmap_lock in read mode.
    
    Link: https://lkml.kernel.org/r/20231206103702.3873743-1-surenb@google.com
    Link: https://lkml.kernel.org/r/20231206103702.3873743-2-surenb@google.com
    Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Brian Geffon <bgeffon@google.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Kalesh Singh <kaleshsingh@google.com>
    Cc: kernel-team@android.com
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Lokesh Gidra <lokeshgidra@google.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mike Rapoport (IBM) <rppt@kernel.org>
    Cc: Nicolas Geoffray <ngeoffray@google.com>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: ZhangPeng <zhangpeng362@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>