    mm/page_alloc: Separate THP PCP into movable and non-movable categories · bf14ed81
    yangge authored
    Since commit 5d0a661d ("mm/page_alloc: use only one PCP list for
    THP-sized allocations"), the THP-sized PCP list no longer differentiates
    the migration type of the pages it holds, so a non-movable allocation
    request may get a CMA page from that list.  In some cases this is not
    acceptable.
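    To make the failure mode concrete, here is a minimal user-space sketch of
    a single THP-sized list shared by all migration types.  It is a
    simplified model, not kernel code: struct model_page, struct
    thp_pcp_list, shared_list_free() and shared_list_alloc() are
    hypothetical names, and the list is reduced to a fixed-size LIFO stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-CPU migration types (not kernel API). */
enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

struct model_page {
	int id;
	bool in_cma;            /* page lies inside a CMA area */
};

#define LIST_CAP 16

/* One list shared by every THP-sized allocation, as before the fix. */
struct thp_pcp_list {
	struct model_page pages[LIST_CAP];
	size_t count;
};

/* Freeing a THP-sized page puts it on the single shared list,
 * regardless of whether it came from a CMA area. */
static bool shared_list_free(struct thp_pcp_list *l, struct model_page p)
{
	if (l->count == LIST_CAP)
		return false;
	l->pages[l->count++] = p;
	return true;
}

/* Allocation pops the most recently freed page.  The requested
 * migratetype is ignored, because the single list does not track it. */
static bool shared_list_alloc(struct thp_pcp_list *l, enum migratetype mt,
			      struct model_page *out)
{
	(void)mt;
	if (l->count == 0)
		return false;
	*out = l->pages[--l->count];
	return true;
}
```

    In this model, freeing a CMA-backed page and then requesting an
    MT_UNMOVABLE page returns that CMA page, since the requested type is
    never consulted.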
    
    If a large amount of CMA memory is configured in the system (for
    example, CMA memory accounts for 50% of system memory), starting a
    virtual machine with device passthrough will get stuck.  While the
    virtual machine starts, it calls
    pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory.  Normally,
    if a page is present and in a CMA area, pin_user_pages_remote() migrates
    the page from the CMA area to a non-CMA area because of the
    FOLL_LONGTERM flag.  But if non-movable allocation requests return CMA
    memory, migrate_longterm_unpinnable_pages() will migrate a CMA page to
    another CMA page, which fails the check in
    check_and_migrate_movable_pages() and causes the migration to repeat
    endlessly.
    
    Call trace:
    pin_user_pages_remote
    --__gup_longterm_locked // endless loops in this function
    ----_get_user_pages_locked
    ----check_and_migrate_movable_pages
    ------migrate_longterm_unpinnable_pages
    --------alloc_migration_target
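    The retry behaviour can be modelled in a few lines.  This is a
    hypothetical sketch, not the kernel's __gup_longterm_locked():
    longterm_pin_passes() and the two target allocators are invented for
    illustration, and the max_passes cap exists only so the model
    terminates; the commit describes the real loop as endless.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a page is either inside a CMA area or not. */
struct model_page {
	bool in_cma;
};

/* Hypothetical allocator for migration targets. */
typedef struct model_page (*alloc_target_fn)(void);

/* Buggy case: the shared THP list hands a CMA page to a
 * non-movable request. */
static struct model_page alloc_from_mixed_list(void)
{
	return (struct model_page){ .in_cma = true };
}

/* Fixed case: non-movable requests only get non-CMA pages. */
static struct model_page alloc_from_non_cma(void)
{
	return (struct model_page){ .in_cma = false };
}

/* A long-term pin must not stay in CMA, so such a page is migrated to a
 * new target and the check is repeated.  Returns the number of passes
 * needed, or -1 if max_passes was reached without success. */
static int longterm_pin_passes(struct model_page page, alloc_target_fn alloc,
			       int max_passes)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		if (!page.in_cma)
			return pass;    /* check passes; page can be pinned */
		page = alloc();         /* migrate to a freshly allocated target */
	}
	return -1;
}
```

    With a CMA starting page, the non-CMA allocator succeeds on the second
    pass, while the mixed-list allocator never lets the check pass.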
    
    This problem also hurts CMA itself.  For example, when CMA memory is
    borrowed by THP and we need to reclaim it through cma_alloc() or
    dma_alloc_coherent(), we must move those pages out so that CMA's users
    can retrieve the contiguous memory.  Currently, CMA memory may be
    occupied by non-movable pages, which cannot be relocated.  As a result,
    cma_alloc() is more likely to fail.
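    A simplified model of why borrowed non-movable pages hurt cma_alloc():
    assume a contiguous range can be handed back to a CMA user only if every
    page in it is either free or movable.  enum cma_page_state,
    cma_range_reclaimable() and find_reclaimable_window() are hypothetical;
    the real allocation path isolates and migrates pages and is considerably
    more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state inside a CMA area. */
enum cma_page_state {
	CMA_PAGE_FREE,       /* can be handed out directly */
	CMA_PAGE_MOVABLE,    /* borrowed by a movable allocation; can migrate */
	CMA_PAGE_UNMOVABLE,  /* borrowed by a non-movable allocation; stuck */
};

/* A range [start, start + n) can be returned to the CMA user only if
 * every page in it is free or can be migrated out. */
static bool cma_range_reclaimable(const enum cma_page_state *area,
				  size_t start, size_t n)
{
	for (size_t i = start; i < start + n; i++) {
		if (area[i] == CMA_PAGE_UNMOVABLE)
			return false;
	}
	return true;
}

/* Returns the first start index of an n-page reclaimable window in an
 * area of len pages, or -1 if none exists (the model's "cma_alloc()
 * fails"). */
static long find_reclaimable_window(const enum cma_page_state *area,
				    size_t len, size_t n)
{
	for (size_t s = 0; s + n <= len; s++) {
		if (cma_range_reclaimable(area, s, n))
			return (long)s;
	}
	return -1;
}
```

    A single non-movable page in the middle of the area is enough to rule
    out every window that covers it.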
    
    To fix the problem above, add one more PCP list for THP; this does not
    introduce a new cacheline for struct per_cpu_pages.  THP will have two
    PCP lists: one used by MOVABLE allocations and the other used by
    UNMOVABLE allocations.  MOVABLE allocations are those with GFP_MOVABLE,
    and UNMOVABLE allocations are those with GFP_UNMOVABLE or
    GFP_RECLAIMABLE.
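    The list selection after the fix can be sketched as an index function.
    It is loosely modelled on the kernel's order_to_pindex() helper, but
    model_order_to_pindex(), the MT_* names and the constants (three PCP
    migration types, a costly order of 3, THP order 9) are assumptions for
    illustration.  It also assumes, as the commit implies, that pages from
    CMA pageblocks are cached on the movable list, so a non-movable THP
    request never draws from it.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values mirroring common kernel configurations. */
enum { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE, MT_PCPTYPES };
#define COSTLY_ORDER      3   /* like PAGE_ALLOC_COSTLY_ORDER */
#define THP_ORDER         9   /* PMD-sized THP with 4 KiB pages */
#define NR_LOWORDER_LISTS (MT_PCPTYPES * (COSTLY_ORDER + 1))
#define NR_THP_LISTS      2   /* one non-movable, one movable */
#define NR_LISTS          (NR_LOWORDER_LISTS + NR_THP_LISTS)

/* Map (migratetype, order) to a per-CPU list index.  Low orders keep one
 * list per migratetype; THP now gets two lists, split on movability, so
 * UNMOVABLE and RECLAIMABLE share the non-movable THP list.  Returns -1
 * for orders this model does not cache. */
static int model_order_to_pindex(int migratetype, int order)
{
	if (order == THP_ORDER) {
		bool movable = (migratetype == MT_MOVABLE);
		return NR_LOWORDER_LISTS + (movable ? 1 : 0);
	}
	if (order < 0 || order > COSTLY_ORDER)
		return -1;
	return MT_PCPTYPES * order + migratetype;
}
```

    Because the THP slot grows from one list to two while the low-order
    lists are unchanged, the total list count rises by exactly one, which
    matches the commit's note that the extra list fits without a new
    cacheline.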
    
    Link: https://lkml.kernel.org/r/1718845190-4456-1-git-send-email-yangge1116@126.com
    Fixes: 5d0a661d ("mm/page_alloc: use only one PCP list for THP-sized allocations")
    Signed-off-by: yangge <yangge1116@126.com>
    Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
    Cc: Barry Song <21cnbao@gmail.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>