• Andrea Arcangeli's avatar
    mm: thp: calculate the mapcount correctly for THP pages during WP faults · f1b9cba0
    Andrea Arcangeli authored
    commit 6d0a07ed upstream.
    
    This will provide fully accuracy to the mapcount calculation in the
    write protect faults, so page pinning will not get broken by false
    positive copy-on-writes.
    
    total_mapcount() isn't the right calculation needed in
    reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
    that is effectively the full accurate return value for page_mapcount()
    if dealing with Transparent Hugepages, however we only use the
    page_trans_huge_mapcount() during COW faults where it strictly needed,
    due to its higher runtime cost.
    
    This also provide at practical zero cost the total_mapcount
    information which is needed to know if we can still relocate the page
    anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
    can reuse the page no matter if it's a pte or a pmd_trans_huge
    triggering the fault, but we can only relocate the page anon_vma to
    the local vma->anon_vma if we're sure it's only this "vma" mapping the
    whole THP physical range.
    
    Kirill A. Shutemov discovered the problem with moving the page
    anon_vma to the local vma->anon_vma in a previous version of this
    patch and another problem in the way page_move_anon_rmap() was called.
    
    Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
    previous version, because reuse_swap_page must be a macro to call
    page_trans_huge_mapcount from swap.h, so this uses a macro again
    instead of an inline function. With this change at least it's a less
    dangerous usage than it was before, because "page" is used only once
    now, while with the previous code reuse_swap_page(page++) would have
    called page_mapcount on page+1 and it would have increased page twice
    instead of just once.
    
    Dean Luick noticed an uninitialized variable that could result in a
    rmap inefficiency for the non-THP case in a previous version.
    
    Mike Marciniszyn said:
    
    : Our RDMA tests are seeing an issue with memory locking that bisects to
    : commit 61f5d698 ("mm: re-enable THP")
    :
    : The test program registers two rather large MRs (512M) and RDMA
    : writes data to a passive peer using the first and RDMA reads it back
    : into the second MR and compares that data.  The sizes are chosen randomly
    : between 0 and 1024 bytes.
    :
    : The test will get through a few (<= 4 iterations) and then gets a
    : compare error.
    :
    : Tracing indicates the kernel logical addresses associated with the individual
    : pages at registration ARE correct , the data in the "RDMA read response only"
    : packets ARE correct.
    :
    : The "corruption" occurs when the packet crosse two pages that are not physically
    : contiguous.   The second page reads back as zero in the program.
    :
    : It looks like the user VA at the point of the compare error no longer points to
    : the same physical address as was registered.
    :
    : This patch totally resolves the issue!
    
    Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.comSigned-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
    Reviewed-by: default avatar"Kirill A. Shutemov" <kirill@shutemov.name>
    Reviewed-by: default avatarDean Luick <dean.luick@intel.com>
    Tested-by: default avatarAlex Williamson <alex.williamson@redhat.com>
    Tested-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
    Tested-by: default avatarJosh Collier <josh.d.collier@intel.com>
    Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    f1b9cba0
swapfile.c 77.4 KB