    hugetlb: do not clear hugetlb dtor until allocating vmemmap · 32c87719
    Mike Kravetz authored
    Patch series "Fix hugetlb free path race with memory errors".
    
    In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on
    HWPOISON hugepages", a race window was discovered.
    https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/
    
    Freeing a hugetlb page back to low level memory allocators is performed
    in two steps.
    1) Under hugetlb lock, remove page from hugetlb lists and clear destructor
    2) Outside lock, allocate vmemmap if necessary and call low level free
    Between these two steps, the hugetlb page will appear as a normal
    compound page.  However, vmemmap for tail pages could be missing.
    If a memory error occurs at this time, we could try to update page
    flags of non-existent page structs.
    
    A much more detailed description is in the first patch.
    
    The first patch addresses the race window.  However, it adds a
    hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page
    free operation.  This could lead to slowdowns if one is freeing a large
    number of hugetlb pages.
    
    The second patch optimizes the update_and_free_pages_bulk routine to only
    take the lock once in bulk operations.
    
    The second patch is technically not a bug fix, but includes a Fixes tag
    and Cc stable to avoid a performance regression.  It can be combined with
    the first, but was done separately to make reviewing easier.
    
    
    This patch (of 2):
    
    Freeing a hugetlb page and releasing base pages back to the underlying
    allocator such as buddy or cma is performed in two steps:
    - remove_hugetlb_folio() is called to remove the folio from hugetlb
      lists, get a ref on the page and remove the hugetlb destructor.  This
      all must be done under the hugetlb lock.  After this call, the page
      can be treated as a normal compound page or a collection of base
      size pages.
    - update_and_free_hugetlb_folio() is called to allocate vmemmap if
      needed and the free routine of the underlying allocator is called
      on the resulting page.  We cannot hold the hugetlb lock here.
    
    One issue with this scheme is that a memory error could occur between
    these two steps.  In this case, the memory error handling code treats
    the old hugetlb page as a normal compound page or collection of base
    pages.  It will then try to SetPageHWPoison(page) on the page with an
    error.  If the page with error is a tail page without vmemmap, a write
    error will occur when trying to set the flag.
    
    Address this issue by modifying remove_hugetlb_folio() and
    update_and_free_hugetlb_folio() such that the hugetlb destructor is not
    cleared until after allocating vmemmap.  Since clearing the destructor
    requires holding the hugetlb lock, the clearing is done in
    remove_hugetlb_folio() if the vmemmap is present.  This saves a
    lock/unlock cycle.  Otherwise, the destructor is cleared in
    update_and_free_hugetlb_folio() after allocating vmemmap.
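
The ordering described above can be sketched as a small userspace model. This is only an illustration, not the kernel code: the `toy_*` names, the struct fields and the lock counter are all hypothetical stand-ins. The point it tries to show is the invariant the fix maintains: a folio whose destructor has been cleared always has vmemmap for its tail pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_folio {
    bool on_hugetlb_list;  /* still tracked on hugetlb lists */
    bool has_dtor;         /* hugetlb destructor still set */
    bool vmemmap_present;  /* tail page structs are backed by memory */
};

static int toy_lock_cycles;   /* lock/unlock pairs taken so far */
static bool toy_lock_held;

static void toy_lock(void)
{
    assert(!toy_lock_held);
    toy_lock_held = true;
    toy_lock_cycles++;
}

static void toy_unlock(void)
{
    assert(toy_lock_held);
    toy_lock_held = false;
}

/*
 * Invariant: a folio that no longer looks like hugetlb (destructor
 * cleared) must have vmemmap for all of its tail pages.
 */
static bool toy_safe_for_memory_error(const struct toy_folio *f)
{
    return f->has_dtor || f->vmemmap_present;
}

/* Step 1: must be called with the toy lock held. */
static void toy_remove_folio(struct toy_folio *f)
{
    assert(toy_lock_held);
    f->on_hugetlb_list = false;
    if (f->vmemmap_present)
        f->has_dtor = false;   /* safe now; saves a lock cycle later */
}

/* Step 2: must be called without the toy lock held. */
static void toy_update_and_free(struct toy_folio *f)
{
    assert(!toy_lock_held);
    if (!f->vmemmap_present)
        f->vmemmap_present = true;   /* stands in for vmemmap allocation */
    assert(toy_safe_for_memory_error(f));
    if (f->has_dtor) {
        toy_lock();
        f->has_dtor = false;         /* cleared only after vmemmap exists */
        toy_unlock();
    }
    /* the low level free (buddy, cma) would happen here */
}

/* Frees one folio and returns how many lock cycles it cost. */
static int toy_free_one(struct toy_folio *f)
{
    int before = toy_lock_cycles;

    toy_lock();
    toy_remove_folio(f);
    toy_unlock();
    assert(toy_safe_for_memory_error(f));   /* the former race window */
    toy_update_and_free(f);
    return toy_lock_cycles - before;
}
```

In this model, freeing a folio whose vmemmap is already present costs one lock cycle, while a vmemmap optimized folio costs two. The second cycle is the extra cost the cover letter mentions.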
    
    Note that this will leave hugetlb pages in a state where they are marked
    free (by hugetlb specific page flag) and have a ref count.  This is not
    a normal state.  The only code that would notice is the memory error
    code, and it is set up to retry in such a case.
    
    A subsequent patch will create a routine to do bulk processing of
    vmemmap allocation.  This will eliminate a lock/unlock cycle for each
    hugetlb page in the case where we are freeing a large number of pages.
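
One way such batching could work is sketched below as a userspace model. This is an assumption about the general shape, not the subsequent patch itself: the `bulk_*` names and the three-pass structure are hypothetical. It compares the number of lock cycles spent clearing destructors one folio at a time against doing all of the clearing under a single lock cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; these names are illustrative and not kernel APIs. */
struct bulk_folio {
    bool has_dtor;         /* hugetlb destructor still set */
    bool vmemmap_present;  /* tail page structs are backed by memory */
};

static int bulk_lock_cycles;   /* lock/unlock pairs taken so far */

/* Per-folio variant: one lock cycle for each folio with a destructor. */
static int bulk_free_each(struct bulk_folio *f, size_t n)
{
    int before = bulk_lock_cycles;

    for (size_t i = 0; i < n; i++) {
        f[i].vmemmap_present = true;   /* allocate vmemmap, no lock */
        if (f[i].has_dtor) {
            bulk_lock_cycles++;        /* lock, clear, unlock */
            f[i].has_dtor = false;
        }
        /* low level free of f[i] would happen here */
    }
    return bulk_lock_cycles - before;
}

/* Batched variant: at most one lock cycle for the whole list. */
static int bulk_free_batched(struct bulk_folio *f, size_t n)
{
    int before = bulk_lock_cycles;
    bool need_lock = false;

    /* Pass 1: allocate vmemmap for every folio, without the lock. */
    for (size_t i = 0; i < n; i++) {
        f[i].vmemmap_present = true;
        if (f[i].has_dtor)
            need_lock = true;
    }

    /* Pass 2: clear all remaining destructors under one lock cycle. */
    if (need_lock) {
        bulk_lock_cycles++;
        for (size_t i = 0; i < n; i++)
            f[i].has_dtor = false;
    }

    /* Pass 3: low level free of every folio would happen here. */
    return bulk_lock_cycles - before;
}
```

For a list of n vmemmap optimized folios, the per-folio variant in this model takes n lock cycles to clear destructors and the batched variant takes one.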
    
    Link: https://lkml.kernel.org/r/20230711220942.43706-1-mike.kravetz@oracle.com
    Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com
    Fixes: ad2fa371 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Jiaqi Yan <jiaqiyan@google.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>