    crypto: camellia-aesni-avx2 - tune assembly code for more performance · acfffdb8
    Jussi Kivilinna authored
    Add an implementation tuned for better performance on real hardware. The
    changes are mostly in the code that mixes 128-bit extract and insert
    instructions with AES-NI instructions. Also, 'vpbroadcastb' instructions
    have been changed to 'vpshufb' with a zero mask.
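
The 'vpbroadcastb' replacement can be illustrated with a small scalar model. This is only a sketch of the instruction semantics, not the assembly from this patch: the Python function names mirror the mnemonics but are hypothetical helpers, and the sample key bytes are made up. It assumes the source register already holds the wanted byte at offset 0 of both 128-bit lanes (for example after a dword broadcast), because 'vpshufb' shuffles each lane independently.

```python
LANE = 16  # vpshufb works on each 128-bit (16-byte) lane separately


def vpshufb(src: bytes, mask: bytes) -> bytes:
    """Scalar model of a 256-bit per-lane byte shuffle."""
    assert len(src) == len(mask) == 32
    out = bytearray(32)
    for i in range(32):
        base = (i // LANE) * LANE  # start of the lane this byte lives in
        m = mask[i]
        # Bit 7 set zeroes the byte; otherwise the low 4 bits index into the lane.
        out[i] = 0 if m & 0x80 else src[base + (m & 0x0F)]
    return bytes(out)


def vpbroadcastb(src: bytes) -> bytes:
    """Scalar model: copy the lowest source byte to all 32 bytes."""
    return bytes([src[0]]) * 32


def vpbroadcastd(src: bytes) -> bytes:
    """Scalar model: copy the lowest 4 source bytes to all 8 dwords."""
    return bytes(src[:4]) * 8


key = bytes([0xA5, 0x3C, 0x00, 0xFF])  # made-up sample data
zero_mask = bytes(32)                   # all-zero shuffle mask

wide = vpbroadcastd(key)                # byte 0 now present in both lanes
via_shuffle = vpshufb(wide, zero_mask)  # every byte selects lane byte 0
via_broadcast = vpbroadcastb(key)

# Counter-example: byte 0 only in the low lane, upper lane holds other data.
low_only = bytes([0xA5]) + bytes(15) + bytes([0x11]) + bytes(15)
lane_local = vpshufb(low_only, zero_mask)

print(via_shuffle == via_broadcast)
```

In this model the two results match whenever both lanes carry the byte at offset 0. The counter-example shows the upper lane picking up its own byte 0 instead, which is why the per-lane behaviour matters when making this substitution.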
    
    Tests on Intel Core i5-4570:
    
    tcrypt ECB results, old-AVX2 vs new-AVX2:
    
    size    128bit key      256bit key
            enc     dec     enc     dec
    256     1.00x   1.00x   1.00x   1.00x
    1k      1.08x   1.09x   1.05x   1.06x
    8k      1.06x   1.06x   1.06x   1.06x
    
    tcrypt ECB results, AVX vs new-AVX2:
    
    size    128bit key      256bit key
            enc     dec     enc     dec
    256     1.00x   1.00x   1.00x   1.00x
    1k      1.51x   1.50x   1.52x   1.50x
    8k      1.47x   1.48x   1.48x   1.48x

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
camellia-aesni-avx2-asm_64.S 37.4 KB