    mm: huge tmpfs: try to split_huge_page() when punching hole · 71725ed1
    Hugh Dickins authored
    Yang Shi writes:
    
    Currently, when truncating a shmem file, if the range is partly in a THP
    (start or end falls in the middle of a THP), the pages will just get
    cleared rather than freed, unless the range covers the whole THP.  Even
    if all the subpages are truncated (randomly or sequentially), the THP
    may still be kept in the page cache.
    
    This might be fine for some use cases which prefer preserving THP, but
    balloon inflation is handled at base page size.  So when using shmem THP
    as the memory backend, QEMU inflation doesn't actually work as expected,
    since it doesn't free memory.  But the inflation use case really needs
    the memory freed.  (Anonymous THP will also not get freed right away,
    but will be freed eventually when all subpages are unmapped: whereas
    shmem THP still stays in the page cache.)
    
    Split the THP right away when doing a partial hole punch, and if the
    split fails just clear the page, so that a read of the punched area will
    return zeroes.
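    A purely illustrative sketch of that fallback (the helper name below is
    hypothetical; in the patch the clearing happens inline in
    shmem_undo_range()):

        #include <linux/mm.h>
        #include <linux/highmem.h>

        /*
         * Illustrative only: wipe one subpage of a huge page that could
         * not be split, so a later read of the punched range returns
         * zeroes even though the THP itself stays in the page cache.
         */
        static void shmem_wipe_subpage(struct page *page)
        {
                clear_highpage(page);           /* zero the page contents */
                flush_dcache_page(page);        /* keep the data cache coherent */
                set_page_dirty(page);           /* make sure the zeroes persist */
        }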
    
    Hugh Dickins adds:
    
    Our earlier "team of pages" huge tmpfs implementation worked in the way
    that Yang Shi proposes; and we have been using this patch to continue to
    split the huge page when hole-punched or truncated, since converting over
    to the compound page implementation.  Although huge tmpfs gives out huge
    pages when available, if the user specifically asks to truncate or punch a
    hole (perhaps to free memory, perhaps to reduce the memcg charge), then
    the filesystem should do so as best it can, splitting the huge page.
    
    That is not always possible: any additional reference to the huge page
    prevents split_huge_page() from succeeding, so the result can be flaky.
    But in practice it works successfully enough that we've not seen any
    problem from that.
    
    Add shmem_punch_compound() to encapsulate the decision of when a split is
    needed, and doing the split if so.  Using this simplifies the flow in
    shmem_undo_range(); and the first (trylock) pass does not need to do any
    page clearing on failure, because the second pass will either succeed or
    do that clearing.  Following the example of zero_user_segment() when
    clearing a partial page, add flush_dcache_page() and set_page_dirty() when
    clearing a hole - though I'm not certain that either is needed.
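    A simplified sketch of how such a helper can make that decision (the
    kernel helpers named are real, but the body is a sketch rather than a
    copy of the mm/shmem.c code):

        /*
         * Return true if no split is needed: either the page is not
         * compound, or the whole huge page lies inside [start, end) and
         * can be truncated as one unit.  Otherwise try to split it, and
         * report whether the split succeeded.
         */
        static bool shmem_punch_compound(struct page *page, pgoff_t start,
                                         pgoff_t end)
        {
                if (!PageTransCompound(page))
                        return true;

                if (PageHead(page) &&
                    page->index >= start && page->index + HPAGE_PMD_NR <= end)
                        return true;

                /* Any extra reference to the huge page makes this fail */
                return split_huge_page(page) >= 0;
        }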
    
    But: split_huge_page() would be sure to fail if shmem_undo_range()'s
    pagevec holds further references to the huge page.  The easiest way to fix
    that is for find_get_entries() to return early, as soon as it has put one
    compound head or tail into the pagevec.  At first this felt like a hack;
    but on examination, this convention better suits all its callers - or will
    do, if the slight one-page-per-pagevec slowdown in shmem_unlock_mapping()
    and shmem_seek_hole_data() is transformed into a 512-page-per-pagevec
    speedup by checking for compound pages there.
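    As an illustration of the caller side of that convention (this helper is
    hypothetical, not part of the patch), skipping a whole compound page once
    any of its subpages has been seen looks roughly like:

        /*
         * Hypothetical helper: given the single THP head or tail that
         * find_get_entries() now stops at, return the next index to scan,
         * stepping over all 512 subpages of a PMD-sized THP in one go.
         */
        static pgoff_t shmem_next_index(struct page *page, pgoff_t index)
        {
                struct page *head = compound_head(page);

                if (PageTransHuge(head))
                        return head->index + compound_nr(head);
                return index + 1;
        }
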
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Alexander Duyck <alexander.duyck@gmail.com>
    Cc: "Michael S. Tsirkin" <mst@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2002261959020.10801@eggly.anvils
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>