    powerpc: 64bit optimised __clear_user · 17968fbb
    Anton Blanchard authored
    I noticed __clear_user high up in a profile of one of my RAID stress
    tests. The testcase was doing a dd from /dev/zero which ends up
    calling __clear_user.
    
    __clear_user is basically a loop with a single 4 byte store, which
    is horribly slow. We can do much better by aligning the destination
    and doing 32 bytes of 8 byte stores in a loop.
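    
    As a rough illustration of the approach described above, here is a
    userspace C sketch. It is not the PowerPC assembly from this patch:
    the names clear_bytes, store8 and clear_is_exact are hypothetical,
    and the fault handling that a real __clear_user needs for user
    addresses is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write 8 zero bytes at p. A fixed-size memcpy is
 * used to stay within C aliasing rules; compilers typically lower it to
 * a single 8-byte store. */
static inline void store8(unsigned char *p)
{
    const uint64_t zero = 0;
    memcpy(p, &zero, sizeof zero);
}

/* Hypothetical userspace analogue of the optimised loop (assumption:
 * no fault handling, plain memory only). */
static void clear_bytes(void *dst, size_t n)
{
    unsigned char *p = dst;

    /* Head: byte stores until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        *p++ = 0;
        n--;
    }

    /* Body: 32 bytes per iteration, as four 8-byte stores. */
    while (n >= 32) {
        store8(p);
        store8(p + 8);
        store8(p + 16);
        store8(p + 24);
        p += 32;
        n -= 32;
    }

    /* Tail: remaining 8-byte stores, then single bytes. */
    while (n >= 8) {
        store8(p);
        p += 8;
        n -= 8;
    }
    while (n > 0) {
        *p++ = 0;
        n--;
    }
}

/* Returns 1 if clearing n bytes at offset off touched exactly that
 * range of a 128-byte buffer and nothing else (requires off + n <= 128).
 * Intended for use with assert(). */
static int clear_is_exact(size_t off, size_t n)
{
    unsigned char buf[128];

    memset(buf, 0xAA, sizeof buf);
    clear_bytes(buf + off, n);
    for (size_t i = 0; i < sizeof buf; i++) {
        unsigned char want = (i >= off && i < off + n) ? 0 : 0xAA;
        if (buf[i] != want)
            return 0;
    }
    return 1;
}
```

    Calling clear_is_exact with misaligned offsets and odd lengths, for
    example clear_is_exact(3, 100), exercises the head, body and tail
    paths in one go.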
    
    The following testcase was used to verify the patch:
    
    http://ozlabs.org/~anton/junkcode/stress_clear_user.c
    
    To show the improvement in performance I ran a dd from /dev/zero
    to /dev/null on a POWER7 box:
    
    Before:
    
    # dd if=/dev/zero of=/dev/null bs=1M count=10000
    10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s
    
    After:
    
    # time dd if=/dev/zero of=/dev/null bs=1M count=10000
    10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s
    
    Over 5x faster.
    Signed-off-by: Anton Blanchard <anton@samba.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
string.S 2.49 KB