    mm, clear_huge_page: move order algorithm into a separate function · c6ddfb6c
    Huang Ying authored
    Patch series "mm, huge page: Copy target sub-page last when copy huge
    page", v2.
    
    Huge pages help to reduce the TLB miss rate, but they have a larger
    cache footprint, which can sometimes cause problems.  For example,
    when copying a huge page on the x86_64 platform, the cache footprint
    is 4M.  But a Xeon E5 v3 2699 CPU has 18 cores, 36 threads, and only
    45M of LLC (last level cache).  That is, on average, there is 2.5M of
    LLC for each core and 1.25M of LLC for each thread.
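
    The arithmetic above can be checked with a minimal sketch.  The
    helper names are hypothetical, and the 2M huge page size (so that
    source plus destination gives the 4M footprint) is an assumption
    about the x86_64 configuration being described:

```c
/* Per-unit share of the LLC in KiB, given the total LLC size in MiB.
 * "units" is the number of cores or hardware threads sharing the LLC. */
static unsigned long llc_share_kib(unsigned long llc_mib, unsigned long units)
{
    return llc_mib * 1024UL / units;
}

/* Cache footprint of copying one huge page, in KiB: the copy touches
 * both the source and the destination page. */
static unsigned long copy_footprint_kib(unsigned long huge_page_kib)
{
    return 2UL * huge_page_kib;
}
```

    With a 45M LLC, llc_share_kib(45, 18) is 2560 KiB (2.5M) per core
    and llc_share_kib(45, 36) is 1280 KiB (1.25M) per thread, both
    smaller than the 4096 KiB returned by copy_footprint_kib(2048).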
    
    If cache contention is heavy while copying the huge page, and we copy
    the huge page from beginning to end, the beginning of the huge page
    may be evicted from the cache by the time we finish copying the end
    of the huge page.  And the application may access the beginning of
    the huge page after the copy.
    
    In c79b57e4 ("mm: hugetlb: clear target sub-page last when clearing
    huge page"), to keep the cache lines of the target subpage hot, the
    order in which clear_huge_page() clears the subpages of a huge page
    was changed so that the subpage furthest from the target subpage is
    cleared first and the target subpage last.  A similar change of
    order helps huge page copying too.  That is implemented in this
    patchset.
    
    The patchset is a generic optimization which should benefit a fair
    number of workloads, rather than one specific use case.  To
    demonstrate the performance benefit of the patchset, we have tested
    it with vm-scalability run on transparent huge pages.
    
    With this patchset, the throughput increases ~16.6% in the
    vm-scalability anon-cow-seq test case with 36 processes on a 2 socket
    Xeon E5 v3 2699 system (36 cores, 72 threads).  The test case sets
    /sys/kernel/mm/transparent_hugepage/enabled to always, mmap()s a big
    anonymous memory area and populates it, then forks 36 child
    processes, each of which writes to the anonymous memory area from
    beginning to end, causing copy on write.  For each child process,
    the other child processes can be seen as other workloads which
    generate heavy cache pressure.  At the same time, the IPC
    (instructions per cycle) increased from 0.63 to 0.78, and the time
    spent in user space is reduced by ~7.2%.
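
    The shape of that workload can be sketched in user space as follows.
    This is not the vm-scalability source; run_cow_seq() and its
    parameters are hypothetical, and the sketch does not touch the
    transparent_hugepage sysfs setting, which has to be configured
    separately:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map and populate an anonymous private area, then fork nchildren
 * processes that each write the whole area from beginning to end,
 * forcing copy on write.  Returns the number of children that exited
 * successfully, or -1 if the mapping could not be created. */
static int run_cow_seq(size_t bytes, int nchildren)
{
    unsigned char *area = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int ok = 0;
    int status;

    if (area == MAP_FAILED)
        return -1;

    /* Populate in the parent so that every child write is a COW fault. */
    memset(area, 1, bytes);

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;
        if (pid == 0) {
            /* Child: sequential write over the whole area. */
            memset(area, i + 2, bytes);
            _exit(0);
        }
    }

    /* Reap every child that was started. */
    while (wait(&status) > 0) {
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }

    munmap(area, bytes);
    return ok;
}
```

    Since the mapping is private, the parent's copy of the area is left
    unchanged by the children's writes.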
    
    This patch (of 4):
    
    In c79b57e4 ("mm: hugetlb: clear target sub-page last when clearing
    huge page"), to keep the cache lines of the target subpage hot, the
    order in which clear_huge_page() clears the subpages of a huge page
    was changed so that the subpage furthest from the target subpage is
    cleared first and the target subpage last.  This optimization can
    be applied to huge page copying too, with the same order algorithm.
    To avoid code duplication and reduce maintenance overhead, this
    patch moves the order algorithm out of clear_huge_page() into a
    separate function, process_huge_page(), so that we can use it for
    copying huge pages too.
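
    As a rough user space illustration of the idea (not the kernel
    code), the ordering can be separated from the per-subpage operation
    by passing the operation in as a callback.  All names below
    (process_subpages(), subpage_fn, SUBPAGE_SIZE, the argument structs)
    are hypothetical, and the 4096-byte subpage size is an assumption:

```c
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE 4096UL     /* assumed base page size */

typedef void (*subpage_fn)(size_t idx, void *arg);

/* Call fn() once for each of the nr subpages, starting with the ones
 * furthest from "target" and finishing with "target" itself. */
static void process_subpages(size_t nr, size_t target,
                             subpage_fn fn, void *arg)
{
    size_t base, l, i;

    if (2 * target <= nr) {
        /* Target in the first half: handle the far tail first. */
        base = 0;
        l = target;
        for (i = nr; i-- > 2 * target; )
            fn(i, arg);
    } else {
        /* Target in the second half: handle the far head first. */
        base = nr - 2 * (nr - target);
        l = nr - target;
        for (i = 0; i < base; i++)
            fn(i, arg);
    }
    /* Close in on the target from both sides, alternating left/right. */
    for (i = 0; i < l; i++) {
        fn(base + i, arg);
        fn(base + 2 * l - 1 - i, arg);
    }
}

struct clear_arg { unsigned char *dst; };
struct copy_arg  { unsigned char *dst; const unsigned char *src; };
struct order_log { size_t idx[64]; size_t n; };

static void clear_subpage(size_t idx, void *arg)
{
    struct clear_arg *a = arg;

    memset(a->dst + idx * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
}

static void copy_subpage(size_t idx, void *arg)
{
    struct copy_arg *a = arg;

    memcpy(a->dst + idx * SUBPAGE_SIZE, a->src + idx * SUBPAGE_SIZE,
           SUBPAGE_SIZE);
}

/* Records the visiting order; only meant for nr <= 64. */
static void record_subpage(size_t idx, void *arg)
{
    struct order_log *log = arg;

    log->idx[log->n++] = idx;
}
```

    With 8 subpages and target 3, record_subpage() sees the order
    7, 6, 0, 5, 1, 4, 2, 3: the target comes last, and the same
    process_subpages() call drives both clearing and copying.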
    
    This changes the direct calls to clear_user_highpage() into indirect
    calls.  But with proper inlining support in the compiler, the
    indirect calls will be optimized back into direct calls.  Our tests
    show no performance change with the patch.
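
    The inlining argument can be illustrated with a small sketch.  The
    names are hypothetical, and whether the call through the pointer is
    actually turned into a direct (or fully inlined) call depends on the
    compiler and optimization level; always_inline is a GCC/Clang
    attribute:

```c
#include <stddef.h>

typedef void (*elem_fn)(int *elem);

/* Generic loop taking the per-element operation as a function pointer.
 * Once this is inlined into a caller that passes a known function, the
 * compiler can see the call target at compile time. */
static inline __attribute__((always_inline))
void for_each_elem(int *v, size_t n, elem_fn fn)
{
    for (size_t i = 0; i < n; i++)
        fn(&v[i]);              /* indirect call as written */
}

static inline void zero_elem(int *elem)
{
    *elem = 0;
}

/* The callback is a compile-time constant at this call site. */
static void zero_all(int *v, size_t n)
{
    for_each_elem(v, n, zero_elem);
}
```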
    
    This patch is a code cleanup with no functional change.
    
    Link: http://lkml.kernel.org/r/20180524005851.4079-2-ying.huang@intel.com
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Andi Kleen <andi.kleen@intel.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Shaohua Li <shli@fb.com>
    Cc: Christopher Lameter <cl@linux.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memory.c 128 KB