    powerpc: New copy_4K_page() · 57dda6ef
    Mark Nelson authored
    This new copy_4K_page() function was originally tuned for the best
    performance on the Cell processor. Testing on more 64-bit powerpc
    chips showed that, with a small modification, it either matches the
    performance of the current mainline version or betters it by a
    small amount.
    
    On a Cell-based QS22 blade, the system time measured while
    compiling a 2.6.26 kernel with pseries_defconfig decreased by 4%.
    In the same test, a 4-way 970MP machine saw a 2% decrease in
    system time. No noticeable change was seen on Power4, Power5 or
    Power6.
    
    The 4096-byte page is copied in thirty-two 128-byte strides. An
    initial setup loop executes dcbt instructions for the whole source
    page and dcbz instructions for the whole destination page. To do
    this, the cache line size is retrieved from ppc64_caches.
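
    As a rough illustration only, the structure described above can be
    modelled in portable C. This is a sketch, not the assembly in
    copypage_64.S: the names copy_4k_page_model() and lines_per_page()
    are hypothetical, cache_line_size stands in for the value read from
    ppc64_caches, and the GCC/Clang __builtin_prefetch() hint stands in
    for dcbt. dcbz has no portable equivalent, so it appears only as a
    comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES   4096u  /* 4K page, as in the description above */
#define STRIDE_BYTES 128u   /* size of one copy stride */

/* Number of cache lines the setup loop has to touch in one page. */
static size_t lines_per_page(size_t cache_line_size)
{
    assert(cache_line_size != 0 && PAGE_BYTES % cache_line_size == 0);
    return PAGE_BYTES / cache_line_size;
}

/*
 * Hypothetical model of the copy. use_setup_loop stands in for the CPU
 * feature bit. Returns the number of strides copied.
 */
static unsigned copy_4k_page_model(void *dst, const void *src,
                                   size_t cache_line_size, int use_setup_loop)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    unsigned strides = 0;

    if (use_setup_loop) {
        size_t n = lines_per_page(cache_line_size);
        for (size_t i = 0; i < n; i++) {
            /* dcbt analogue: hint that this source line will be read. */
            __builtin_prefetch(s + i * cache_line_size, 0);
            /* The real routine would also issue dcbz on the matching
             * destination line; there is no portable C equivalent. */
        }
    }

    /* Main loop: 4096 / 128 = 32 strides. */
    for (size_t off = 0; off < PAGE_BYTES; off += STRIDE_BYTES) {
        memcpy(d + off, s + off, STRIDE_BYTES);
        strides++;
    }
    return strides;
}
```

    With a 128-byte cache line passed in, the setup loop in this model
    runs once per stride; a smaller line size simply means more setup
    iterations over the same page.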
    
    A new CPU feature bit, CPU_FTR_CP_USE_DCBTZ (introduced in the
    previous patch), is used to make the modification to this new copy
    routine: on Power4, 970 and Cell the feature bit is set, so the
    setup loop is executed, but on all other 64-bit chips the setup
    loop is nop'ed out.
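
    The selection can be pictured with a small C model. This is only a
    sketch under stated assumptions: the bit value given to
    CPU_FTR_CP_USE_DCBTZ here is made up for illustration, and the
    enum and helper names are hypothetical. It also uses a runtime
    test, whereas the commit describes the setup loop being nop'ed out
    of the code on chips without the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value, chosen only for this sketch. */
#define CPU_FTR_CP_USE_DCBTZ (UINT64_C(1) << 0)

/* Hypothetical list of the chips named in the commit message. */
enum cpu_model { CPU_POWER4, CPU_970, CPU_CELL, CPU_POWER5, CPU_POWER6 };

/* Feature mask per chip: the bit is set on Power4, 970 and Cell only. */
static uint64_t cpu_features_for(enum cpu_model m)
{
    switch (m) {
    case CPU_POWER4:
    case CPU_970:
    case CPU_CELL:
        return CPU_FTR_CP_USE_DCBTZ;
    default:
        return 0;
    }
}

/* True when the dcbt/dcbz setup loop should run for this feature mask. */
static bool runs_setup_loop(uint64_t features)
{
    return (features & CPU_FTR_CP_USE_DCBTZ) != 0;
}
```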
    
    Signed-off-by: Mark Nelson <markn@au1.ibm.com>
    Signed-off-by: Paul Mackerras <paulus@samba.org>
copypage_64.S 1.96 KB