    arm64: lib: improve copy_page to deal with 128 bytes at a time · 223e23e8
    Will Deacon authored
    We want to avoid lots of different copy_page implementations, settling
    for something that is "good enough" everywhere and hopefully easy to
    understand and maintain whilst we're at it.
    
    This patch reworks our copy_page implementation based on discussions
    with Cavium on the list and benchmarking on Cortex-A processors so that:
    
      - The loop is unrolled to copy 128 bytes per iteration
    
      - The reads are offset so that we read from the next 128-byte block
        in the same iteration that we store the previous block
    
      - Explicit prefetch instructions are removed for now, since they hurt
        performance on CPUs with hardware prefetching
    
      - The loop exit condition is calculated at the start of the loop
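
    As a rough illustration only, the loop structure described above can be
    sketched in C. This is not the assembly in copy_page.S: the names
    copy_page_sketch, copy_page_sketch_selfcheck, SKETCH_PAGE_SIZE and
    BLOCK_WORDS are hypothetical, a 4 KiB page is assumed, and an array of
    sixteen 64-bit temporaries stands in for the registers.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096L  /* assumption: 4 KiB pages */
#define BLOCK_WORDS      16     /* 16 x 8 bytes = 128 bytes per iteration */
#define BLOCK_BYTES      (BLOCK_WORDS * (long)sizeof(uint64_t))

/* Hypothetical C model of a software-pipelined, 128-byte-unrolled page copy. */
void copy_page_sketch(uint64_t *dest, const uint64_t *src)
{
    uint64_t r[BLOCK_WORDS];                       /* stands in for registers */
    long remaining = SKETCH_PAGE_SIZE - BLOCK_BYTES;
    int more, i;

    /* Prime the pipeline: load the first 128-byte block. */
    for (i = 0; i < BLOCK_WORDS; i++)
        r[i] = src[i];
    src += BLOCK_WORDS;

    do {
        /* Exit condition is worked out at the top of the iteration. */
        remaining -= BLOCK_BYTES;
        more = remaining > 0;

        /* Store the previous block while loading the next one. */
        for (i = 0; i < BLOCK_WORDS; i++) {
            dest[i] = r[i];
            r[i] = src[i];
        }
        dest += BLOCK_WORDS;
        src += BLOCK_WORDS;
    } while (more);

    /* Drain the pipeline: store the final block loaded by the loop. */
    for (i = 0; i < BLOCK_WORDS; i++)
        dest[i] = r[i];
}

/* Hypothetical self-check: returns 0 when the copy matches the source. */
int copy_page_sketch_selfcheck(void)
{
    static uint64_t src[SKETCH_PAGE_SIZE / sizeof(uint64_t)];
    static uint64_t dst[SKETCH_PAGE_SIZE / sizeof(uint64_t)];
    size_t i;

    for (i = 0; i < sizeof(src) / sizeof(src[0]); i++)
        src[i] = (uint64_t)i * 0x9E3779B97F4A7C15ull + 1;

    copy_page_sketch(dst, src);
    return memcmp(dst, src, sizeof(src));
}
```

    The sketch contains no prefetch hints, matching the third point above.
    It assumes the page holds at least two 128-byte blocks and that the
    source and destination do not overlap.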

    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Tested-by: Andrew Pinski <apinski@cavium.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
copy_page.S 1.72 KB