    mm: add functions folio_in_range() and folio_within_vma() · 28e56657
    Yin Fengwei authored
    Patch series "support large folio for mlock", v3.
    
    Yu mentioned at [1] that mlock() can't be applied to large folios.
    
    I studied the related code and here is my understanding:
    
    - For RLIMIT_MEMLOCK accounting, there is no problem, because the
      RLIMIT_MEMLOCK statistics are not tied to the underlying pages.  That
      means mlocking or munlocking the underlying pages doesn't affect the
      RLIMIT_MEMLOCK accounting, which is always correct.
    
    - For keeping the page in RAM, there is no problem either.  At least
      during try_to_unmap_one(), once the VMA is found to have the VM_LOCKED
      bit set in vm_flags, the folio is kept whether or not the folio itself
      is mlocked.
    
    So mlock works functionally for large folios.  But it's not optimized,
    because page reclaim needs to scan these large folios and may split them.
    
    This series classifies large folios for mlock into four types:
      - The large folio is in the VM_LOCKED range and fully mapped to the
        range

      - The large folio is in the VM_LOCKED range but not fully mapped to
        the range

      - The large folio crosses a VM_LOCKED VMA boundary

      - The large folio crosses a last-level page table boundary
    
    For the first type, we mlock the large folio so page reclaim will skip it.
    
    For the second and third types, we don't mlock the large folio.  The
    pages not mapped to the VM_LOCKED range are mapped to a non-VM_LOCKED
    range, so if the system is under memory pressure, the large folio can be
    picked by page reclaim and split.  Then the pages not mapped to the
    VM_LOCKED range can be reclaimed.
    
    For the fourth type, we don't mlock the large folio because holding one
    page table lock can't prevent the part mapped by another last-level page
    table from being unmapped.  Thanks to Ryan for pointing this out.
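
    The four types and the resulting mlock decision can be sketched as a
    small userspace model.  This is only an illustration under stated
    assumptions, not the code in this series: the enum, struct model_range,
    classify_folio() and should_mlock() are hypothetical names, and the model
    assumes 4 KiB pages with one last-level page table covering 2 MiB.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */
#define MODEL_PMD_SHIFT  21UL	/* assumption: one PTE table maps 2 MiB */

enum folio_mlock_type {
	FOLIO_FULLY_MAPPED,	/* type 1: in range and fully mapped */
	FOLIO_PARTIALLY_MAPPED,	/* type 2: in range, not fully mapped */
	FOLIO_CROSSES_VMA,	/* type 3: crosses the VM_LOCKED VMA boundary */
	FOLIO_CROSSES_PGTABLE,	/* type 4: crosses a last-level page table */
};

/* Address range [start, end) of a VM_LOCKED VMA. */
struct model_range {
	unsigned long start;
	unsigned long end;
};

/*
 * addr:      virtual address where the first page of the folio would map
 * nr_pages:  number of pages in the folio
 * nr_mapped: number of those pages actually mapped by PTEs in the range
 */
static enum folio_mlock_type
classify_folio(struct model_range vma, unsigned long addr,
	       unsigned long nr_pages, unsigned long nr_mapped)
{
	unsigned long last = addr + (nr_pages << MODEL_PAGE_SHIFT) - 1;

	assert(nr_pages > 0 && nr_mapped <= nr_pages);

	if (addr < vma.start || last >= vma.end)
		return FOLIO_CROSSES_VMA;
	/* First and last byte must be covered by the same PTE table. */
	if ((addr >> MODEL_PMD_SHIFT) != (last >> MODEL_PMD_SHIFT))
		return FOLIO_CROSSES_PGTABLE;
	if (nr_mapped < nr_pages)
		return FOLIO_PARTIALLY_MAPPED;
	return FOLIO_FULLY_MAPPED;
}

/* Only the first type is mlocked; the others stay visible to reclaim. */
static bool should_mlock(enum folio_mlock_type type)
{
	return type == FOLIO_FULLY_MAPPED;
}
```

    In this model, a 16-page folio mapped at 0x3f8000 ends at 0x407fff, so it
    spans two 2 MiB page tables and is not mlocked even when every page is
    mapped.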
    
    
    To check whether the folio is fully mapped to the range, the PTEs need to
    be checked to see whether they map pages of the folio.  That requires
    taking the page table lock and is a heavy operation.  So far, the only
    places that need this check are madvise and page reclaim.  These
    functions already have their own PTE iterators.
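
    The "fully mapped" check can be pictured as counting how many PTEs in the
    range point at pages of the folio.  The sketch below is a simplified
    userspace model, not kernel code: PTEs are reduced to an array of page
    frame numbers, MODEL_PFN_NONE, count_folio_ptes() and
    folio_fully_mapped() are hypothetical names, and it assumes each page of
    the folio appears at most once in the scanned window and that the page
    table lock is already held by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for an empty PTE slot in this model. */
#define MODEL_PFN_NONE 0UL

/*
 * Count the entries in ptes[] that map a page of the folio, where the
 * folio covers page frame numbers [folio_pfn, folio_pfn + nr_pages).
 */
static size_t count_folio_ptes(const unsigned long *ptes, size_t nr_ptes,
			       unsigned long folio_pfn,
			       unsigned long nr_pages)
{
	size_t i, hits = 0;

	for (i = 0; i < nr_ptes; i++) {
		unsigned long pfn = ptes[i];

		if (pfn == MODEL_PFN_NONE)
			continue;
		if (pfn >= folio_pfn && pfn - folio_pfn < nr_pages)
			hits++;
	}
	return hits;
}

/* The folio is fully mapped when every one of its pages was found. */
static bool folio_fully_mapped(const unsigned long *ptes, size_t nr_ptes,
			       unsigned long folio_pfn,
			       unsigned long nr_pages)
{
	assert(nr_pages > 0);
	return count_folio_ptes(ptes, nr_ptes, folio_pfn, nr_pages) == nr_pages;
}
```

    The walk visits every PTE in the range, which is why the series leaves
    this check to callers that already iterate over the PTEs.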
    
    patch1 introduces an API to check whether a large folio is in a VMA range.
    patch2 makes page reclaim/mlock_vma_folio/munlock_vma_folio support
           large folio mlock/munlock.
    patch3 makes the mlock/munlock syscalls support large folios.
    
    Yu also mentioned a race which can make a folio unevictable after munlock
    during the RFC v2 discussion [3].  We decided that the race didn't block
    this series because:
      - The race was not introduced by this series

      - We had a fix for the race that looks OK, but it needs to wait
        for the mlock_count fixing patch, as Yosry Ahmed suggested [4]
    
    [1] https://lore.kernel.org/linux-mm/CAOUHufbtNPkdktjt_5qM45GegVO-rCFOMkSh0HQminQ12zsV8Q@mail.gmail.com/
    [2] https://lore.kernel.org/linux-mm/20230809061105.3369958-1-fengwei.yin@intel.com/
    [3] https://lore.kernel.org/linux-mm/CAOUHufZ6=9P_=CAOQyw0xw-3q707q-1FVV09dBNDC-hpcpj2Pg@mail.gmail.com/
    
    
    This patch (of 3):
    
    folio_in_range() will be used to check whether the folio is mapped to a
    specific VMA and whether the mapping address of the folio is in the range.

    Also add a helper function folio_within_vma(), based on folio_in_range(),
    to check whether the folio is in the range of the VMA.
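
    The address arithmetic behind such a check can be sketched in userspace
    as follows.  This is a minimal model under stated assumptions, not the
    code added to internal.h: struct model_vma, struct model_folio,
    model_folio_in_range() and model_folio_within_vma() are hypothetical
    stand-ins, pages are assumed to be 4 KiB, and the folio is assumed to
    belong to the same file or anon object that the VMA maps.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the VMA fields the check needs. */
struct model_vma {
	unsigned long vm_start;	/* first mapped address */
	unsigned long vm_end;	/* one past the last mapped address */
	unsigned long vm_pgoff;	/* offset of vm_start in the object, in pages */
};

/* Hypothetical stand-in for a folio. */
struct model_folio {
	unsigned long pgoff;	/* offset of the first page, in pages */
	unsigned long nr_pages;	/* number of pages in the folio */
};

/* Is the whole folio mapped inside [start, end), clamped to the VMA? */
static bool model_folio_in_range(const struct model_folio *folio,
				 const struct model_vma *vma,
				 unsigned long start, unsigned long end)
{
	unsigned long vma_pglen = (vma->vm_end - vma->vm_start) >> MODEL_PAGE_SHIFT;
	unsigned long size = folio->nr_pages << MODEL_PAGE_SHIFT;
	unsigned long addr;

	assert(folio->nr_pages > 0);
	if (start > end)
		return false;
	if (start < vma->vm_start)
		start = vma->vm_start;
	if (end > vma->vm_end)
		end = vma->vm_end;

	/* The first page of the folio must fall inside the VMA's window. */
	if (folio->pgoff < vma->vm_pgoff ||
	    folio->pgoff - vma->vm_pgoff >= vma_pglen)
		return false;

	/* Address at which the first page of the folio is mapped. */
	addr = vma->vm_start +
	       ((folio->pgoff - vma->vm_pgoff) << MODEL_PAGE_SHIFT);
	if (addr < start || addr > end)
		return false;

	/* The rest of the folio must also fit before the end of the range. */
	return end - addr >= size;
}

/* Helper: the range is simply the whole VMA. */
static bool model_folio_within_vma(const struct model_folio *folio,
				   const struct model_vma *vma)
{
	return model_folio_in_range(folio, vma, vma->vm_start, vma->vm_end);
}
```

    Note that this only compares offsets and addresses; it does not look at
    the PTEs, so it cannot tell whether the folio is actually fully mapped.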
    
    Link: https://lkml.kernel.org/r/20230918073318.1181104-1-fengwei.yin@intel.com
    Link: https://lkml.kernel.org/r/20230918073318.1181104-2-fengwei.yin@intel.com
    Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Yosry Ahmed <yosryahmed@google.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>