• Eric Biggers's avatar
    crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS · ede96221
    Eric Biggers authored
    Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
    128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
    Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
    encrypted/decrypted (doing one cipher round for all the blocks, then the
    next round, etc.), then goes through XTS postprocessing.
    
    The performance depends on the processor but can be about 3 times faster
    than the generic code.  For example, on an ARMv7 processor we observe
    the following performance with Speck128/256-XTS:
    
        xts-speck128-neon:     Encryption 107.9 MB/s, Decryption 108.1 MB/s
        xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
    
    In comparison to AES-256-XTS without the Cryptography Extensions:
    
        xts-aes-neonbs:        Encryption  41.2 MB/s, Decryption  36.7 MB/s
        xts(aes-asm):          Encryption  31.7 MB/s, Decryption  30.8 MB/s
        xts(aes-generic):      Encryption  21.2 MB/s, Decryption  20.9 MB/s
    
    Speck64/128-XTS is even faster:
    
        xts-speck64-neon:      Encryption 138.6 MB/s, Decryption 139.1 MB/s
    
    Note that as with the generic code, only the Speck128 and Speck64
    variants are supported.  Also, for now only the XTS mode of operation is
    supported, to target the disk and file encryption use cases.  The NEON
    code also only handles the portion of the data that is evenly divisible
    into 128-byte chunks, with any remainder handled by a C fallback.  Of
    course, other modes of operation could be added later if needed, and/or
    the NEON code could be updated to handle other buffer sizes.
    
    The XTS specification is only defined for AES which has a 128-bit block
    size, so for the GF(2^64) math needed for Speck64-XTS we use the
    reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
    paper.  Of course, when possible users should use Speck128-XTS, but even
    that may be too slow on some processors; Speck64-XTS can be faster.
    Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
    Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
    ede96221
speck-neon-core.S 9.99 KB