- 22 Feb, 2018 36 commits
-
-
Eric Biggers authored
Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for Speck64. Each 128-byte chunk goes through XTS preprocessing, then is encrypted/decrypted (doing one cipher round for all the blocks, then the next round, etc.), then goes through XTS postprocessing.

The performance depends on the processor but can be about 3 times faster than the generic code. For example, on an ARMv7 processor we observe the following performance with Speck128/256-XTS:

    xts-speck128-neon:        Encryption 107.9 MB/s, Decryption 108.1 MB/s
    xts(speck128-generic):    Encryption 32.1 MB/s, Decryption 36.6 MB/s

In comparison to AES-256-XTS without the Cryptography Extensions:

    xts-aes-neonbs:           Encryption 41.2 MB/s, Decryption 36.7 MB/s
    xts(aes-asm):             Encryption 31.7 MB/s, Decryption 30.8 MB/s
    xts(aes-generic):         Encryption 21.2 MB/s, Decryption 20.9 MB/s

Speck64/128-XTS is even faster:

    xts-speck64-neon:         Encryption 138.6 MB/s, Decryption 139.1 MB/s

Note that as with the generic code, only the Speck128 and Speck64 variants are supported. Also, for now only the XTS mode of operation is supported, to target the disk and file encryption use cases. The NEON code also only handles the portion of the data that is evenly divisible into 128-byte chunks, with any remainder handled by a C fallback. Of course, other modes of operation could be added later if needed, and/or the NEON code could be updated to handle other buffer sizes.

The XTS specification is only defined for AES, which has a 128-bit block size, so for the GF(2^64) math needed for Speck64-XTS we use the reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX paper. Of course, when possible users should use Speck128-XTS, but even that may be too slow on some processors; Speck64-XTS can be faster.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
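For reference, the GF(2^64) tweak update with this reducing polynomial can be written as follows in scalar C; this is an illustrative sketch (the function name is made up), not the NEON code from the patch:

    #include <stdint.h>

    /*
     * Multiply the 64-bit XTS tweak by x in GF(2^64), reducing by
     * x^64 + x^4 + x^3 + x + 1 (low bits 0x1B), as used for Speck64-XTS.
     */
    static uint64_t speck64_xts_mulx(uint64_t tweak)
    {
            uint64_t carry = tweak >> 63;   /* coefficient of x^63 that overflows */

            return (tweak << 1) ^ (carry ? 0x1B : 0);
    }

Speck128-XTS, by contrast, can use the standard GF(2^128) tweak update already defined by the XTS specification.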
-
Eric Biggers authored
Export the Speck constants and transform context and the ->setkey(), ->encrypt(), and ->decrypt() functions so that they can be reused by the ARM NEON implementation of Speck-XTS. The generic key expansion code will be reused because it is not performance-critical and is not vectorizable, while the generic encryption and decryption functions are needed as fallbacks and for the XTS tweak encryption. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
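As a rough sketch, the shared interface ends up looking something like the following (the exact symbol names, constant value and struct layout below are illustrative, not copied from the patch):

    /* include/crypto/speck.h -- sketch, names illustrative */
    #include <linux/types.h>

    #define SPECK128_MAX_NROUNDS    34      /* assumed value, for illustration */

    struct speck128_tfm_ctx {
            u64 round_keys[SPECK128_MAX_NROUNDS];
            int nrounds;
    };

    int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx,
                               const u8 *key, unsigned int keylen);
    void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
                                 u8 *out, const u8 *in);
    void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
                                 u8 *out, const u8 *in);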
-
Eric Biggers authored
Add a generic implementation of Speck, including the Speck128 and Speck64 variants. Speck is a lightweight block cipher that can be much faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an option for dm-crypt and fscrypt on Android, for low-end mobile devices with older CPUs such as ARMv7 which don't have the Cryptography Extensions. Currently, such devices are unencrypted because AES is not fast enough, even when the NEON bit-sliced implementation of AES is used. Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't fast enough either; it seems that only a modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal (https://patchwork.kernel.org/patch/10101451/) which was to offer ChaCha20 for these devices. However, the use of a stream cipher for disk/file encryption with no space to store nonces would have been much more insecure than we thought initially, given that it would be used on top of flash storage as well as potentially on top of F2FS, neither of which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin. Nevertheless, it has a straightforward design (it's an ARX cipher), and it appears to be the leading software-optimized lightweight block cipher currently, with the most cryptanalysis. It's also easy to implement without side channels, unlike AES. Moreover, we only intend Speck to be used when the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on ChaCha20 and Poly1305. While theoretically attractive, such a mode would be a brand new crypto construction and would be more complicated and difficult to implement efficiently in comparison to Speck-XTS.

There is confusion about the byte and word orders of Speck, since the original paper doesn't specify them. But we have implemented it using the orders the authors recommended in a correspondence with them. The test vectors are taken from the original paper but were mapped to byte arrays using the recommended byte and word orders.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
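For context, a single Speck128 round is just an add-rotate-xor sequence over the two 64-bit words of the block; a minimal C sketch (the rotate helpers and function name are ours, not the kernel's):

    #include <stdint.h>

    static uint64_t ror64(uint64_t x, unsigned int n)
    {
            return (x >> n) | (x << (64 - n));
    }

    static uint64_t rol64(uint64_t x, unsigned int n)
    {
            return (x << n) | (x >> (64 - n));
    }

    /* One Speck128 encryption round with 64-bit round key k. */
    static void speck128_round(uint64_t *x, uint64_t *y, uint64_t k)
    {
            *x = ror64(*x, 8);
            *x += *y;
            *x ^= k;
            *y = rol64(*y, 3);
            *y ^= *x;
    }

Decryption applies the inverse operations in reverse order; Speck64 has the same structure on 32-bit words with the same rotation amounts.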
-
Dave Watson authored
Add a gcmaes_crypt_by_sg routine that performs the encryption/decryption by scatter/gather. Either src or dst may contain multiple buffers, so iterate over both at the same time if they are different. If the input is the same as the output, iterate only over one. Currently both the AAD and TAG must be linear, so copy them out with scatterwalk_map_and_copy. If the first buffer contains the entire AAD, we can optimize and skip the copy. Since the AAD can be any size, if copied it must be on the heap; the TAG can be on the stack since it is at most 16 bytes. Only the SSE routines are updated so far, so keep the previous gcmaes_en/decrypt routines, and branch to the scatter/gather ones if the key size is inappropriate for AVX, or we are SSE only. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
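A heavily simplified sketch of the lockstep walk over the source and destination scatterlists (crypt_chunk() is a hypothetical stand-in for the AES-GCM update; highmem mapping and the AAD/tag handling described above are omitted):

    #include <linux/kernel.h>
    #include <linux/scatterlist.h>

    /* Hypothetical stand-in for the actual AES-GCM update on one chunk. */
    static void crypt_chunk(u8 *dst, const u8 *src, unsigned int len);

    static void crypt_by_sg_sketch(struct scatterlist *src,
                                   struct scatterlist *dst,
                                   unsigned int left)
    {
            unsigned int srcoff = 0, dstoff = 0;

            while (left) {
                    unsigned int n = min3(src->length - srcoff,
                                          dst->length - dstoff, left);

                    crypt_chunk(sg_virt(dst) + dstoff, sg_virt(src) + srcoff, n);

                    srcoff += n;
                    dstoff += n;
                    left -= n;
                    if (srcoff == src->length) {
                            src = sg_next(src);
                            srcoff = 0;
                    }
                    if (dstoff == dst->length) {
                            dst = sg_next(dst);
                            dstoff = 0;
                    }
            }
    }

When src and dst refer to the same buffers, a single walk suffices, which is the in-place case mentioned above.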
-
Dave Watson authored
The asm macros are all set up now; introduce the entry points. GCM_INIT and GCM_COMPLETE take arguments, so that the new scatter/gather entry points don't have to accept every argument, only the ones they need. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
We can fast-path any < 16 byte read if the full message is > 16 bytes, and shift over by the appropriate amount. Usually we are reading > 16 bytes, so this should be faster than the READ_PARTIAL macro introduced in b20209c9 for the average case. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
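Expressed in C, the trick is: when fewer than 16 bytes remain but the whole message is longer than 16 bytes, it is safe to load the 16 bytes that end at the message's last byte and then discard the unwanted leading bytes (an illustrative sketch, not the actual asm):

    #include <stdint.h>
    #include <string.h>

    /*
     * Read the final 'rem' bytes (rem < 16) of a message known to be longer
     * than 16 bytes: load the 16-byte window ending at the last byte, which
     * is always in bounds, then drop the leading 16 - rem bytes.
     */
    static void read_trailing_partial(uint8_t out[16], const uint8_t *msg,
                                      size_t msglen, size_t rem)
    {
            uint8_t window[16];

            memcpy(window, msg + msglen - 16, 16);
            memset(out, 0, 16);
            memcpy(out, window + (16 - rem), rem);
    }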
-
Dave Watson authored
Before this change, multiple calls to GCM_ENC_DEC would succeed, but only if every call processed a multiple of 16 bytes. Handle partial blocks at the start of GCM_ENC_DEC, and update the aadhash as appropriate. The data offset %r11 is also updated after the partial block. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
HashKey computation only needs to happen once per scatter/gather operation, so save it between calls in the gcm_context struct instead of on the stack. Since the asm no longer stores anything on the stack, we can use %rsp directly, and clean up the frame save/restore macros a bit. Hashkeys actually only need to be calculated once per key and could be moved to when set_key is called; however, the current glue code falls back to the generic aes code if the fpu is disabled. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Prepare to handle partial blocks between scatter/gather calls. For the last partial block, we only want to calculate the aadhash in GCM_COMPLETE, and a new partial block macro will handle both aadhash update and encrypting partial blocks between calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values. pblocklen, aadhash, and pblockenckey are also updated at the end of each scatter/gather operation, to be carried over to the next operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Introduce a gcm_context_data struct that will be used to pass context data between scatter/gather update calls. It is passed as the second argument (after the crypto keys); the other arguments are renumbered. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
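A sketch of the kind of per-operation state such a context structure carries (the field names below are illustrative, not copied from the patch):

    #include <linux/types.h>

    /* Illustrative layout only -- not the struct from the patch. */
    struct gcm_context_data_sketch {
            u8  aad_hash[16];               /* running GHASH value */
            u64 aad_length;                 /* total AAD length in bytes */
            u64 in_length;                  /* total plaintext/ciphertext length */
            u8  partial_block_enc_key[16];  /* keystream left from a partial block */
            u64 partial_block_len;          /* bytes already used of that block */
            u8  current_counter[16];        /* current CTR counter block */
    };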
-
Dave Watson authored
Make a macro for the main encode/decode routine. Only a small handful of lines differ for enc and dec. This will also become the main scatter/gather update routine. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Merge encode and decode tag calculations in GCM_COMPLETE macro. Scatter/gather routines will call this once at the end of encryption or decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Reduce code duplication by introducing a GCM_INIT macro. This macro will also be exposed as a function for implementing scatter/gather support, since INIT only needs to be called once for the full operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Macro-ify function save and restore. These will be used in new functions added for scatter/gather update operations. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Use macro operations to merge the implementations of INITIAL_BLOCKS, since they differ by only a small handful of lines. Use the macro counter \@ to simplify the implementation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Pass a pointer dereference instead of spelling out the structure type as the parameter of the "sizeof" operator, making the size determination a bit safer, in line with the Linux coding style. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
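The pattern in question, in miniature (the structure and function names here are hypothetical):

    #include <linux/slab.h>

    struct foo { int a; };                  /* hypothetical */

    static struct foo *alloc_foo(gfp_t gfp)
    {
            struct foo *p;

            /* Before: p = kmalloc(sizeof(struct foo), gfp); */
            p = kmalloc(sizeof(*p), gfp);   /* stays correct if p's type changes */
            return p;
    }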
-
Markus Elfring authored
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Two local variables are assigned appropriate pointers a bit later in the function, so omit their explicit initialisation at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Fix the function name in this error message so that it matches the function that was actually called. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
The local variable "cryp_error" was used only for two condition checks. * Check the return values from these function calls directly instead. * Delete this variable which became unnecessary with this refactoring. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Conor McLoughlin authored
An RSA private key in the first form should have its version, prime1, prime2, exponent1, exponent2 and coefficient values set to 0. With non-zero values for prime1/prime2, exponent1/exponent2 and coefficient, the Intel QAT driver will assume that the key is provided in the second form. This results in signature verification failures for kernel modules signed with rsa,sha256 when a QAT device is present. Cc: <stable@vger.kernel.org> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Conor McLoughlin <conor.mcloughlin@intel.com> Reviewed-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
This patch adds a label to unmap the result buffer in the hash send function error path. Fixes: 1b44c5a6 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver") Suggested-by: Ofer Heifetz <oferh@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
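The general shape of such a fix, with hypothetical names standing in for the driver's actual buffers and helpers:

    #include <linux/dma-mapping.h>
    #include <linux/errno.h>

    /* Hypothetical stand-in for queueing the descriptors to the engine. */
    static int queue_descriptors(void) { return 0; }

    static int send_sketch(struct device *dev, void *result, size_t len)
    {
            dma_addr_t result_dma;
            int ret;

            result_dma = dma_map_single(dev, result, len, DMA_FROM_DEVICE);
            if (dma_mapping_error(dev, result_dma))
                    return -EINVAL;

            ret = queue_descriptors();
            if (ret)
                    goto unmap_result;      /* the newly added error path */

            return 0;

    unmap_result:
            dma_unmap_single(dev, result_dma, len, DMA_FROM_DEVICE);
            return ret;
    }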
-
Antoine Tenart authored
This patch updates the Inside Secure SafeXcel driver to avoid getting out of sync between the number of requests sent and the number completed. The number of requests acknowledged by the driver can differ from the configured threshold if new requests were pushed to the hardware in the meantime. The driver wasn't taking those into account, so the count of remaining requests to handle (used to reconfigure the interrupt threshold) could get out of sync. Fix this by tracking the total number of requests sent to the hardware rather than the number of requests left, so that newly pushed requests are accounted for. Fixes: dc7e28a3 ("crypto: inside-secure - dequeue all requests at once") Suggested-by: Ofer Heifetz <oferh@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
When exiting a transformation, the cra_exit() helper is called in each driver providing one. The Inside Secure SafeXcel driver has one, which is responsible for freeing some areas and for sending an invalidation request to the crypto engine, to invalidate the context that was used during the transformation. In some setups (lots of short-lived transformations, and hence lots of cra_exit() calls) we could see NULL pointer dereferences and other weird issues, all of them coming from accessing the tfm context. The issue is that the invalidation request completion is checked using a wait_for_completion_interruptible() call in both the cipher and hash cra_exit() helpers. In some cases this was interrupted while the invalidation request hadn't been processed yet. cra_exit() then returned and its caller freed the tfm instance; only afterwards was the request handled by the SafeXcel driver, which led to the issues described. This patch fixes this by using wait_for_completion() calls in these specific cases. Fixes: 1b44c5a6 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
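In miniature, the change amounts to the following (a sketch with made-up context and field names, not the driver's actual code):

    #include <linux/completion.h>

    struct my_ctx {                         /* hypothetical tfm context */
            struct completion inv_done;     /* completed from the result path */
    };

    static void exit_sketch(struct my_ctx *ctx)
    {
            init_completion(&ctx->inv_done);
            /* ... queue the invalidation request to the engine here ... */

            /*
             * Before: wait_for_completion_interruptible(&ctx->inv_done) could
             * return early on a signal while the request was still pending,
             * letting the tfm be freed under the driver's feet.
             */
            wait_for_completion(&ctx->inv_done);
    }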
-
Antoine Tenart authored
This patch adds a check in the SafeXcel dequeue function to avoid processing a request further if no hardware command was issued. This can happen in certain cases where the ->send() function caches all the data that would have been sent. Fixes: 809778e0 ("crypto: inside-secure - fix hash when length is a multiple of a block") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
This patch fixes the cache length computation, as cache_len could end up being a negative value. The check between the queued size and the block size is updated to reflect the caching mechanism, which can cache up to a full block size (inclusive). Fixes: 809778e0 ("crypto: inside-secure - fix hash when length is a multiple of a block") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
This patch fixes the extra cache computation when the queued data is a multiple of a block size. This fixes the hash support in some cases. Fixes: 809778e0 ("crypto: inside-secure - fix hash when length is a multiple of a block") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
This patch fixes the Inside Secure SafeXcel driver not to overwrite the interrupt threshold value. In certain cases the value of this register, which controls when to fire an interrupt, was overwritten. This led to packets not being processed or acked, as the driver was never made aware of their completion. This patch fixes this behaviour by not setting the threshold while requests are being processed by the engine. Fixes: dc7e28a3 ("crypto: inside-secure - dequeue all requests at once") Suggested-by: Ofer Heifetz <oferh@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
Free Electrons became Bootlin. Update my email accordingly. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Stefan Wahren authored
If the clock probe is deferred, we would assume the clock is optional. This is wrong, so defer the probe of this driver until the clock is available. Fixes: 791af4f4 ("hwrng: bcm2835 - Manage an optional clock") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
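The fix boils down to checking for -EPROBE_DEFER explicitly before treating a missing clock as optional; a minimal sketch of that idiom (the helper name is ours, not the exact patch):

    #include <linux/clk.h>
    #include <linux/err.h>
    #include <linux/errno.h>

    /* The clock stays optional, but deferral of the clock provider now
     * propagates out of probe instead of being swallowed. */
    static int get_optional_rng_clk(struct device *dev, struct clk **clk)
    {
            *clk = devm_clk_get(dev, NULL);

            if (PTR_ERR(*clk) == -EPROBE_DEFER)
                    return -EPROBE_DEFER;   /* provider not ready yet, retry probe */
            if (IS_ERR(*clk))
                    *clk = NULL;            /* genuinely absent: treat as optional */
            return 0;
    }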
-
Jinbum Park authored
Move the AES inverse S-box to the .rodata section where it is safe from abuse by speculation. Signed-off-by: Jinbum Park <jinb.park7@gmail.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Peter Robinson authored
The MODULE_ALIAS is required to enable the sun4i-ss driver to load automatically when built as a module. Tested on a Cubietruck. Fixes: 6298e948 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator") Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
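For illustration, module autoloading for a platform driver relies on an alias of this general shape (the exact alias string added by the patch is not reproduced here; the one below is an assumption for the example):

    #include <linux/module.h>

    /*
     * Hypothetical example: a platform driver typically declares an alias of
     * the form "platform:<drv-name>" so udev/modprobe can load the module
     * automatically when the matching device is registered.
     */
    MODULE_ALIAS("platform:sun4i-ss");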
-
- 15 Feb, 2018 4 commits
-
-
Fabien DESSENNE authored
stm32mp1 differs from stm32f7 in the way it handles byte ordering and padding for the AES GCM and CCM algorithms. Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Fabien DESSENNE authored
Add AEAD cipher algorithms for aes gcm and ccm. Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Colin Ian King authored
Functions qat_rsa_set_n, qat_rsa_set_e and qat_rsa_set_d are local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: drivers/crypto/qat/qat_common/qat_asym_algs.c:972:5: warning: symbol 'qat_rsa_set_n' was not declared. Should it be static? drivers/crypto/qat/qat_common/qat_asym_algs.c:1003:5: warning: symbol 'qat_rsa_set_e' was not declared. Should it be static? drivers/crypto/qat/qat_common/qat_asym_algs.c:1027:5: warning: symbol 'qat_rsa_set_d' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Colin Ian King authored
Function ccp_get_dma_chan_attr is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: drivers/crypto/ccp/ccp-dmaengine.c:41:14: warning: symbol 'ccp_get_dma_chan_attr' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-