
Commit a1b22a5f authored Sep 01, 2018 by Eric Biggers, committed by Herbert Xu Sep 04, 2018

crypto: arm/chacha20 - faster 8-bit rotations and other optimizations

Optimize ChaCha20 NEON performance by:

- Implementing the 8-bit rotations using the 'vtbl.8' instruction.
- Streamlining the part that adds the original state and XORs the data.
- Making some other small tweaks.
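
As an illustration of the first item: rotating a 32-bit word left by 8 bits
only moves whole bytes, so it can be done with a byte-table lookup instead of
two shifts and an OR. The sketch below emulates that idea in portable C. It is
not the NEON code from this patch; the names ROL8_TABLE, rotl32_shift,
rotl8_table and rotl8_methods_agree are hypothetical, and the index table
assumes little-endian byte order within each word.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical index table for a left-rotate by 8 of one 32-bit word whose
 * bytes are numbered little-endian (byte 0 = least significant):
 * output byte i is copied from input byte ROL8_TABLE[i].
 */
static const uint8_t ROL8_TABLE[4] = { 3, 0, 1, 2 };

/* Conventional rotation: two shifts and an OR. n must be in 1..31. */
static uint32_t rotl32_shift(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Rotation by 8 done as a byte permutation, the way a table lookup would. */
static uint32_t rotl8_table(uint32_t x)
{
	uint8_t in[4], out[4];
	int i;

	/* Split the word into bytes, least significant first. */
	for (i = 0; i < 4; i++)
		in[i] = (uint8_t)(x >> (8 * i));

	/* The "table lookup": each output byte selects one input byte. */
	for (i = 0; i < 4; i++)
		out[i] = in[ROL8_TABLE[i]];

	/* Reassemble the permuted bytes into a word. */
	return (uint32_t)out[0] |
	       ((uint32_t)out[1] << 8) |
	       ((uint32_t)out[2] << 16) |
	       ((uint32_t)out[3] << 24);
}

/* Returns 1 if both methods agree on a few sample inputs, 0 otherwise. */
static int rotl8_methods_agree(void)
{
	static const uint32_t samples[] = {
		0x00000000u, 0xffffffffu, 0x11223344u, 0x80000001u, 0xdeadbeefu
	};
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		if (rotl8_table(samples[i]) != rotl32_shift(samples[i], 8))
			return 0;
	return 1;
}
```

On NEON, a single 'vtbl.8' applies such a permutation to a whole 64-bit
register at once, which is presumably why it can beat the shift-based
sequence on some cores and not on others.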

On ARM Cortex-A7, these optimizations improve ChaCha20 performance from
about 12.08 cycles per byte to about 11.37 -- a 5.9% improvement.

There is a tradeoff involved with the 'vtbl.8' rotation method since
there is at least one CPU (Cortex-A53) where it's not fastest.  But it
seems to be a better default; see the added comment.  Overall, this
patch reduces Cortex-A53 performance by less than 0.5%.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

parent 11dcb103
