- 12 Jan, 2018 17 commits
-
-
Eric Biggers authored
Convert salsa20-generic from the deprecated "blkcipher" API to the "skcipher" API, in the process fixing it up to be thread-safe (as the crypto API expects) by maintaining each request's state separately from the transform context. Also remove the unnecessary cra_alignmask and tighten validation of the key size by accepting only 16 or 32 bytes, not anything in between. These changes bring the code close to the way chacha20-generic does things, so hopefully it will be easier to maintain in the future. However, the way Salsa20 interprets the IV is still slightly different; that was not changed. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
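To illustrate the per-request pattern, here is a minimal sketch of what a thread-safe skcipher ->encrypt() handler looks like, with the cipher state kept on the stack instead of in the transform context. The helpers salsa20_init_state() and salsa20_docrypt() and the salsa20_ctx layout are placeholders, not the kernel's exact code:

    #include <crypto/internal/skcipher.h>

    #define SALSA20_BLOCK_SIZE 64

    struct salsa20_ctx {
        u32 key[8];                     /* expanded by ->setkey() */
    };

    /* Hypothetical helpers standing in for the real Salsa20 core. */
    static void salsa20_init_state(u32 state[16], const u32 key[8],
                                   const u8 iv[8]);
    static void salsa20_docrypt(u32 state[16], u8 *dst, const u8 *src,
                                unsigned int bytes);

    static int salsa20_crypt(struct skcipher_request *req)
    {
        struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
        const struct salsa20_ctx *ctx = crypto_skcipher_ctx(tfm);
        struct skcipher_walk walk;
        u32 state[16];                  /* per-request: safe under concurrency */
        int err;

        err = skcipher_walk_virt(&walk, req, false);

        salsa20_init_state(state, ctx->key, walk.iv);

        while (walk.nbytes > 0) {
            unsigned int nbytes = walk.nbytes;

            if (nbytes < walk.total)
                nbytes = round_down(nbytes, SALSA20_BLOCK_SIZE);

            salsa20_docrypt(state, walk.dst.virt.addr,
                            walk.src.virt.addr, nbytes);
            err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
        }
        return err;
    }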
-
Arnd Bergmann authored
While testing other changes, I discovered that gcc-7.2.1 produces badly optimized code for aes_encrypt/aes_decrypt. This is especially true when CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely large stack usage that in turn might cause kernel stack overflows:

    crypto/aes_generic.c: In function 'aes_encrypt':
    crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=]
    crypto/aes_generic.c: In function 'aes_decrypt':
    crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=]

I verified that this problem exists on all architectures that are supported by gcc-7.2, though arm64 in particular is less affected than the others. I also found that gcc-7.1 and gcc-8 do not show the extreme stack usage but still produce worse code than earlier versions for this file, apparently because of optimization passes that generally provide a substantial improvement in object code quality but understandably fail to find any shortcuts in the AES algorithm. Possible workarounds include:

a) Disabling the -ftree-pre and -ftree-sra optimizations. This was an earlier patch I tried; it reliably fixed the stack usage but, as later testing found, caused a serious performance regression in some versions.

b) Disabling UBSAN on this file or on all ciphers, as suggested by Ard Biesheuvel. This would give massively better crypto performance in UBSAN-enabled kernels and avoid the stack usage, but there is a concern over whether we should exclude arbitrary files from UBSAN at all.

c) Forcing the optimization level in a different way. Similar to a), but rather than deselecting specific optimization stages, this uses "gcc -Os" for this file, regardless of the CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option. This is a reliable workaround for the stack consumption on all architectures, and I've retested the performance results now on x86, in cycles/byte (lower is better) for cbc(aes-generic) with 256-bit keys:

                -O2     -Os
    gcc-6.3.1   14.9    15.1
    gcc-7.0.1   14.7    15.3
    gcc-7.1.1   15.3    14.7
    gcc-7.2.1   16.8    15.9
    gcc-8.0.0   15.5    15.6

This patch implements option c) by forcing -Os for all compiler versions starting with gcc-7.1. As a workaround for PR83356 it would only be needed for gcc-7.2+ with UBSAN enabled, but since -Os also gives better performance on gcc-7.1 without UBSAN, it seems appropriate to use the faster version here as well.

Side note: during testing, I also played with the AES code in libressl, which had a similar performance regression from gcc-6 to gcc-7.2, but was three times slower overall. It might be interesting to investigate that further and possibly port the Linux implementation into it.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Cc: Richard Biener <rguenther@suse.de>
Cc: Jakub Jelinek <jakub@gcc.gnu.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Similar to what was done for the hash API, update the AEAD API to track whether each transform has been keyed, and reject encryption/decryption if a key is needed but one hasn't been set. This isn't quite as important as the equivalent fix for the hash API because AEADs always require a key, so are unlikely to be used without one. Still, tracking the key will prevent accidental unkeyed use. algif_aead also had to track the key anyway, so the new flag replaces that and slightly simplifies the algif_aead implementation. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
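The use-time check boils down to something like this simplified sketch (not the exact patch): operations are refused while CRYPTO_TFM_NEED_KEY is set.

    #include <crypto/aead.h>

    static int crypto_aead_encrypt_checked(struct aead_request *req)
    {
        struct crypto_aead *tfm = crypto_aead_reqtfm(req);

        if (crypto_aead_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
            return -ENOKEY;     /* keyed transform used without a key */

        return crypto_aead_alg(tfm)->encrypt(req);
    }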
-
Eric Biggers authored
Similar to what was done for the hash API, update the skcipher API to track whether each transform has been keyed, and reject encryption/decryption if a key is needed but one hasn't been set. This isn't as important as the equivalent fix for the hash API because symmetric ciphers almost always require a key (the "null cipher" is the only exception), so are unlikely to be used without one. Still, tracking the key will prevent accidental unkeyed use. algif_skcipher also had to track the key anyway, so the new flag replaces that and simplifies the algif_skcipher implementation. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Now that the crypto API prevents a keyed hash from being used without setting the key, there's no need for GHASH to do this check itself. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Currently, almost none of the keyed hash algorithms check whether a key has been set before proceeding. Some algorithms are okay with this and will effectively just use a key of all 0's or some other bogus default. However, others will severely break, as demonstrated using "hmac(sha3-512-generic)", the unkeyed use of which causes a kernel crash via a (potentially exploitable) stack buffer overflow.

A while ago, this problem was solved for AF_ALG by pairing each hash transform with a 'has_key' bool. However, there are still other places in the kernel where userspace can specify an arbitrary hash algorithm by name, and the kernel uses it as an unkeyed hash without checking whether it is really unkeyed. Examples of this include:

- KEYCTL_DH_COMPUTE, via the KDF extension
- dm-verity
- dm-crypt, via the ESSIV support
- dm-integrity, via the "internal hash" mode with no key given
- drbd (Distributed Replicated Block Device)

This bug is especially bad for KEYCTL_DH_COMPUTE, as that requires no privileges to call.

Fix the bug for all users by adding a flag CRYPTO_TFM_NEED_KEY to the ->crt_flags of each hash transform that indicates whether the transform still needs to be keyed or not. Then, make the hash init, import, and digest functions return -ENOKEY if the key is still needed. The new flag also replaces the 'has_key' bool which algif_hash was previously using, thereby simplifying the algif_hash implementation.

Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
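On the setkey side, the flag handling looks roughly like this sketch (simplified from the patch): a successful ->setkey() clears CRYPTO_TFM_NEED_KEY and a failed one re-arms it, so init/import/digest only have to test the flag and return -ENOKEY.

    #include <crypto/hash.h>

    static int shash_setkey_sketch(struct crypto_shash *tfm, const u8 *key,
                                   unsigned int keylen)
    {
        int err = crypto_shash_alg(tfm)->setkey(tfm, key, keylen);

        if (unlikely(err))
            crypto_shash_set_flags(tfm, CRYPTO_TFM_NEED_KEY);
        else
            crypto_shash_clear_flags(tfm, CRYPTO_TFM_NEED_KEY);
        return err;
    }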
-
Eric Biggers authored
We need to consistently enforce that keyed hashes cannot be used without setting the key. To do this we need a reliable way to determine whether a given hash algorithm is keyed or not. AF_ALG currently does this by checking for the presence of a ->setkey() method. However, this is actually slightly broken because the CRC-32 algorithms implement ->setkey() but can also be used without a key. (The CRC-32 "key" is not actually a cryptographic key but rather represents the initial state. If not overridden, then a default initial state is used.) Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which indicates that the algorithm has a ->setkey() method, but it is not required to be called. Then set it on all the CRC-32 algorithms. The same also applies to the Adler-32 implementation in Lustre. Also, the cryptd and mcryptd templates have to pass through the flag from their underlying algorithm. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
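With the new flag, the "does this transform still need a key?" test becomes a two-part check, roughly (a sketch mirroring the description above):

    #include <crypto/internal/hash.h>

    /* Needs a key iff it has ->setkey() and the key is not optional. */
    static bool shash_needs_key(struct shash_alg *alg)
    {
        return crypto_shash_alg_has_setkey(alg) &&
               !(alg->base.cra_flags & CRYPTO_ALG_OPTIONAL_KEY);
    }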
-
Eric Biggers authored
Since Poly1305 requires a nonce per invocation, the Linux kernel implementations of Poly1305 don't use the crypto API's keying mechanism and instead expect the key and nonce as the first 32 bytes of the data. But ->setkey() is still defined as a stub returning an error code. This prevents Poly1305 from being used through AF_ALG and will also break it completely once we start enforcing that all crypto API users (not just AF_ALG) call ->setkey() if present. Fix it by removing crypto_poly1305_setkey(), leaving ->setkey as NULL. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
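For reference, a caller therefore feeds the 32-byte one-time key (r || s) as the first bytes of the message stream. A minimal sketch using the shash API (descriptor setup per the 2018-era API):

    #include <crypto/hash.h>

    static int poly1305_mac(const u8 key[32], const u8 *msg,
                            unsigned int len, u8 digest[16])
    {
        struct crypto_shash *tfm = crypto_alloc_shash("poly1305", 0, 0);
        int err;

        if (IS_ERR(tfm))
            return PTR_ERR(tfm);

        {
            SHASH_DESC_ON_STACK(desc, tfm);

            desc->tfm = tfm;
            desc->flags = 0;
            err = crypto_shash_init(desc) ?:
                  crypto_shash_update(desc, key, 32) ?:  /* key + nonce first */
                  crypto_shash_update(desc, msg, len) ?:
                  crypto_shash_final(desc, digest);
        }
        crypto_free_shash(tfm);
        return err;
    }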
-
Eric Biggers authored
When the mcryptd template is used to wrap an unkeyed hash algorithm, don't install a ->setkey() method to the mcryptd instance. This change is necessary for mcryptd to keep working with unkeyed hash algorithms once we start enforcing that ->setkey() is called when present. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
When the cryptd template is used to wrap an unkeyed hash algorithm, don't install a ->setkey() method to the cryptd instance. This change is necessary for cryptd to keep working with unkeyed hash algorithms once we start enforcing that ->setkey() is called when present. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Templates that use an shash spawn can use crypto_shash_alg_has_setkey() to determine whether the underlying algorithm requires a key or not. But there was no corresponding function for ahash spawns. Add it. Note that the new function actually has to support both shash and ahash algorithms, since the ahash API can be used with either. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
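A sketch of the new helper's shape, handling both backing types as noted above (simplified):

    #include <crypto/internal/hash.h>

    bool hash_alg_has_setkey_sketch(struct hash_alg_common *halg)
    {
        struct crypto_alg *alg = &halg->base;

        /* An ahash transform may be backed by an shash algorithm. */
        if (alg->cra_type != &crypto_ahash_type)
            return crypto_shash_alg_has_setkey(__crypto_shash_alg(alg));

        return __crypto_ahash_alg(alg)->setkey != NULL;
    }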
-
Colin Ian King authored
There seems to be a cut-n-paste bug with the name of the buffer being free'd, xoutbuf should be used instead of axbuf. Detected by CoverityScan, CID#1463420 ("Copy-paste error") Fixes: 427988d9 ("crypto: tcrypt - add multibuf aead speed test") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Colin Ian King authored
Trivial fix to spelling mistakes in pr_err error message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Stephan Mueller authored
The user space interface allows specifying the type and mask field used to allocate the cipher. Only a subset of the possible flags are intended for user space, so white-list the allowed flags. If the user space caller passes any non-allowed flag, EINVAL is returned. Reported-by: syzbot <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
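The shape of such a white-list check is sketched below; the set of allowed flags shown here is illustrative, not the patch's authoritative list:

    #include <linux/crypto.h>

    /* Example white-list; the real patch defines the authoritative set. */
    #define AF_ALG_ALLOWED_FLAGS \
            (CRYPTO_ALG_KERN_DRIVER_ONLY | CRYPTO_ALG_ASYNC)

    static int af_alg_check_flags(u32 type, u32 mask)
    {
        if ((type | mask) & ~AF_ALG_ALLOWED_FLAGS)
            return -EINVAL;
        return 0;
    }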
-
Joey Pabalinas authored
When char is signed, storing the values 0xba (186) and 0xad (173) in the `guard` array produces signed overflow. Change the type of `guard` to static unsigned char to correct undefined behavior and reduce function stack usage. Signed-off-by: Joey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
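A standalone illustration of the problem (ordinary user-space C, not the testmgr code itself):

    #include <stdio.h>

    int main(void)
    {
        char s = 0xba;          /* where char is signed, this becomes -70 */
        unsigned char u = 0xba; /* always 186, as intended */

        printf("%d %u\n", s, u);
        return 0;
    }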
-
Eric Biggers authored
For chacha20_block(), use the existing 32-bit left-rotate function instead of defining one ourselves. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
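The quarter-round then reads as below, using the kernel's rol32() from <linux/bitops.h> (a sketch of the operation; the in-tree function works on the full 16-word state):

    #include <linux/bitops.h>

    static void chacha20_quarterround(u32 *a, u32 *b, u32 *c, u32 *d)
    {
        *a += *b; *d = rol32(*d ^ *a, 16);
        *c += *d; *b = rol32(*b ^ *c, 12);
        *a += *b; *d = rol32(*d ^ *a, 8);
        *c += *d; *b = rol32(*b ^ *c, 7);
    }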
-
Himanshu Jha authored
Use dma_zalloc_coherent for allocating zeroed memory and remove the now-unnecessary memset call. Done using Coccinelle. Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci 0-day tested with no failures. Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
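The conversion has this before/after shape (device and buffer are illustrative):

    #include <linux/dma-mapping.h>

    static void *alloc_ring(struct device *dev, size_t size, dma_addr_t *dma)
    {
        /* Before: ptr = dma_alloc_coherent(...); if (ptr) memset(ptr, 0, size); */
        /* After: a single call that returns already-zeroed memory. */
        return dma_zalloc_coherent(dev, size, dma, GFP_KERNEL);
    }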
-
- 05 Jan, 2018 14 commits
-
-
Eric Biggers authored
crypto_poly1305_final() no longer requires a cra_alignmask, and nothing else in the x86 poly1305-simd implementation does either. So remove the cra_alignmask so that the crypto API does not have to unnecessarily align the buffers. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Now that nothing in poly1305-generic assumes any special alignment, remove the cra_alignmask so that the crypto API does not have to unnecessarily align the buffers. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Currently the only part of poly1305-generic which is assuming special alignment is the part where the final digest is written. Switch this over to the unaligned access macros so that we'll be able to remove the cra_alignmask. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
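Reduced to its essentials, the digest store becomes the following (a sketch: the real code first folds Poly1305's 26-bit limbs into four 32-bit words):

    #include <asm/unaligned.h>

    /* Write the 16-byte tag without assuming any alignment of dst. */
    static void poly1305_write_tag(u8 *dst, const u32 h[4])
    {
        int i;

        for (i = 0; i < 4; i++)
            put_unaligned_le32(h[i], dst + 4 * i);
    }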
-
Eric Biggers authored
There is a message posted to the crypto notifier chain when an algorithm is unregistered, and when a template is registered or unregistered. But nothing is listening for those messages; currently there are only listeners for the algorithm request and registration messages. Get rid of these unused notifications for now. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Reference counters should use refcount_t rather than atomic_t, since the refcount_t implementation can prevent overflows, reducing the exploitability of reference leak bugs. crypto_alg.cra_refcnt is a reference counter with the usual semantics, so switch it over to refcount_t. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
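The conversion is mechanical; its shape, on an illustrative struct rather than the real crypto_alg:

    #include <linux/refcount.h>

    struct alg_example {
        refcount_t refcnt;              /* was: atomic_t */
    };

    static void alg_get(struct alg_example *alg)
    {
        /* Saturates instead of wrapping on overflow, unlike atomic_inc(). */
        refcount_inc(&alg->refcnt);
    }

    static bool alg_put(struct alg_example *alg)
    {
        /* True when the last reference is dropped. */
        return refcount_dec_and_test(&alg->refcnt);
    }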
-
Antoine Ténart authored
This patch fixes the hash support in the SafeXcel driver when the update size is a multiple of a block size, and when a final call is made just after with a size of 0. In such cases the driver should cache the last block from the update to avoid handling 0 length data on the final call (that's a hardware limitation). Cc: stable@vger.kernel.org Fixes: 1b44c5a6 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
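The caching rule can be summarized by a hypothetical helper: never hand the engine everything, so that ->final() always has between one byte and one full block left to process.

    /* How many of (cached + len) bytes may be queued to the engine now.
     * When the total is block-aligned, hold a whole block back for the
     * final() call. (Illustrative helper, not the driver's code.) */
    static unsigned int queued_len(unsigned int cached, unsigned int len,
                                   unsigned int blocksize)
    {
        unsigned int total = cached + len;
        unsigned int keep = total % blocksize;

        if (keep == 0)
            keep = blocksize;

        return total - keep;
    }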
-
Antoine Ténart authored
This patch adds a field to the SafeXcel ahash request structure to keep track of the number of SG entries mapped. This makes it possible to skip dma_unmap_sg() when dma_map_sg() was never called in the first place. It also removes a warning seen when DMA-API debugging is enabled in the kernel configuration: "DMA-API: device driver tries to free DMA memory it has not allocated". Cc: stable@vger.kernel.org Fixes: 1b44c5a6 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
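Sketched with illustrative names, the pattern records what was mapped and makes the unmap path a no-op otherwise:

    #include <linux/dma-mapping.h>
    #include <linux/scatterlist.h>

    struct req_sketch {
        int nents;      /* entries passed to dma_map_sg(); 0 = not mapped */
    };

    static int map_src(struct device *dev, struct scatterlist *sg,
                       int nents, struct req_sketch *req)
    {
        if (dma_map_sg(dev, sg, nents, DMA_TO_DEVICE) <= 0)
            return -EINVAL;
        req->nents = nents;
        return 0;
    }

    static void unmap_src(struct device *dev, struct scatterlist *sg,
                          struct req_sketch *req)
    {
        if (!req->nents)
            return;         /* never mapped: avoid the DMA-API warning */
        dma_unmap_sg(dev, sg, req->nents, DMA_TO_DEVICE);
        req->nents = 0;
    }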
-
Christian Lamparter authored
The ccm-aes-ppc4xx now fails one of testmgr's expected-failure test cases as such:

    |decryption failed on test 10 for ccm-aes-ppc4xx:
    |ret was 0, expected -EBADMSG

It doesn't look like the hardware sets the authentication failure flag. The original vendor source from which this was ported does not have any special code or notes about why this would happen or whether there are any workarounds. Hence, this patch converts the aead_done callback handler to perform the ICV check in the driver. This fixes the false negative, and ccm-aes-ppc4xx passes the selftests once again:

    |name         : ccm(aes)
    |driver       : ccm-aes-ppc4xx
    |module       : crypto4xx
    |priority     : 300
    |refcnt       : 1
    |selftest     : passed
    |internal     : no
    |type         : aead
    |async        : yes
    |blocksize    : 1
    |ivsize       : 16
    |maxauthsize  : 16
    |geniv        : <none>

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
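A sketch of such a software ICV check in the completion callback (names illustrative): copy the tag that trails the ciphertext and compare it in constant time.

    #include <crypto/aead.h>
    #include <crypto/algapi.h>
    #include <crypto/scatterwalk.h>

    static int check_icv(struct aead_request *req, const u8 *computed_icv)
    {
        struct crypto_aead *tfm = crypto_aead_reqtfm(req);
        unsigned int authsize = crypto_aead_authsize(tfm);
        u8 icv[16];

        /* On decryption the tag trails the ciphertext in req->src. */
        scatterwalk_map_and_copy(icv, req->src,
                                 req->assoclen + req->cryptlen - authsize,
                                 authsize, 0);

        return crypto_memneq(icv, computed_icv, authsize) ? -EBADMSG : 0;
    }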
-
Christian Lamparter authored
KBUILD_MODNAME provides the same value. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Christian Lamparter authored
crypto4xx_device's name variable is not set to anything. The common devname for request_irq seems to be the module name. This will fix the seemingly anonymous interrupt entry in /proc/interrupts for crypto4xx. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Christian Lamparter authored
This patch adds support for the crypto4xx RevB cores found in the 460EX, 460SX and later chips (like the APM821xx). Without this patch, the crypto4xx driver will not be able to process any offloaded requests on these cores and will simply hang indefinitely. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Christian Lamparter authored
It is possible to avoid the ce_base null pointer check in the driver's interrupt handler routine crypto4xx_ce_interrupt_handler() by simply doing the iomap before the IRQ registration. This way, ce_base will always be valid in the handler, and a branch in a critical path can be avoided. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
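The resulting probe ordering, sketched (error handling simplified; note KBUILD_MODNAME as the devname, per the earlier patch):

    #include <linux/interrupt.h>
    #include <linux/of_address.h>
    #include <linux/platform_device.h>

    irqreturn_t crypto4xx_ce_interrupt_handler(int irq, void *data);

    static int setup_irq_sketch(struct platform_device *ofdev, int irq,
                                void __iomem **ce_base, void *dev_id)
    {
        /* Map first: the handler may then use *ce_base unconditionally. */
        *ce_base = of_iomap(ofdev->dev.of_node, 0);
        if (!*ce_base)
            return -ENOMEM;

        return request_irq(irq, crypto4xx_ce_interrupt_handler, 0,
                           KBUILD_MODNAME, dev_id);
    }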
-
Łukasz Stelmach authored
Add support for the True Random Number Generator found in Samsung Exynos 5250+ SoCs. Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Philippe Ombredanne <pombredanne@nexb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Cheah Kok Cheong authored
Add SPDX license identifier according to the type of license text found in the file. Cc: Philippe Ombredanne <pombredanne@nexb.com> Signed-off-by: Cheah Kok Cheong <thrust73@gmail.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
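Such an identifier is a single comment at the top of each file; for a GPL-2.0 file (the actual identifier depends on the license text found) it looks like:

    // SPDX-License-Identifier: GPL-2.0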
-
- 28 Dec, 2017 9 commits
-
-
Junaid Shahid authored
The aesni_gcm_enc/dec functions can access memory after the end of the AAD buffer if the AAD length is not a multiple of 4 bytes. It didn't matter with rfc4106-gcm-aesni as in that case the AAD was always followed by the 8 byte IV, but that is no longer the case with generic-gcm-aesni. This can potentially result in accessing a page that is not mapped and thus causing the machine to crash. This patch fixes that by reading the last <16 byte block of the AAD byte-by-byte and optionally via an 8-byte load if the block was at least 8 bytes. Fixes: 0487ccac ("crypto: aesni - make non-AVX AES-GCM work with any aadlen") Cc: <stable@vger.kernel.org> Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
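In C terms, the access pattern described here looks like the following (the fix itself is in the x86 assembly; this only illustrates the strategy, which the entry below applies to the data buffer as well):

    #include <linux/string.h>

    /* Load a partial (< 16 byte) block without touching bytes outside
     * [src, src + len): one 8-byte copy when possible, then byte loads. */
    static void read_partial_block(const u8 *src, unsigned int len,
                                   u8 block[16])
    {
        unsigned int i = 0;

        memset(block, 0, 16);

        if (len >= 8) {
            memcpy(block, src, 8);
            i = 8;
        }
        for (; i < len; i++)
            block[i] = src[i];
    }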
-
Junaid Shahid authored
The aesni_gcm_enc/dec functions can access memory before the start of the data buffer if the length of the data buffer is less than 16 bytes. This is because they perform the read via a single 16-byte load. This can potentially result in accessing a page that is not mapped and thus causing the machine to crash. This patch fixes that by reading the partial block byte-by-byte and optionally via an 8-byte load if the block was at least 8 bytes. Fixes: 0487ccac ("crypto: aesni - make non-AVX AES-GCM work with any aadlen") Cc: <stable@vger.kernel.org> Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Corentin Labbe authored
All hardware crypto devices have their CONFIG names using the following convention: CRYPTO_DEV_name_algo. This patch applies this convention to the STM32 CONFIG names. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Reviewed-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Zhou Wang authored
There are no init and exit callbacks, so delete the comments referring to them. Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Horia Geantă authored
Offload split key generation in the CAAM engine, using DKP. DKP is supported starting with Era 6.

Note that the way assoclen is transmitted from the job descriptor to the shared descriptor changes: the DPOVRD register is used instead of MATH3 (where available), since the DKP protocol thrashes the MATH registers.

The replacement of MDHA split key generation with DKP has the side effect of the crypto engine writing the authentication key, and thus the DMA mapping direction for the buffer holding the key has to change from DMA_TO_DEVICE to DMA_BIDIRECTIONAL. There are two cases:

- key is inlined in descriptor: the descriptor buffer mapping changes
- key is referenced: the key buffer mapping changes

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
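The mapping-direction change has this shape (illustrative helper, not the driver's code):

    #include <linux/dma-mapping.h>

    static dma_addr_t map_split_key(struct device *jrdev, void *key,
                                    size_t keylen)
    {
        /* DKP writes the generated split key back into this buffer, so
         * DMA_TO_DEVICE is no longer sufficient. */
        return dma_map_single(jrdev, key, keylen, DMA_BIDIRECTIONAL);
    }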
-
Horia Geantă authored
Save Era in driver's private data for further usage, like deciding whether an erratum applies or a feature is available based on its value. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Horia Geantă authored
ablkcipher shared descriptors are relatively small, so there is enough space for the key to be inlined. Accordingly, there is no need to copy the key into ctx->key. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Horia Geantă authored
Key data is not modified; it is only copied into the shared descriptor. Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Eric Biggers authored
Using %rbp as a temporary register breaks frame pointer convention and breaks stack traces when unwinding from an interrupt in the crypto code. In twofish-3way, we can't simply replace %rbp with another register because there are none available. Instead, we use the stack to hold the values that %rbp, %r11, and %r12 were holding previously. Each of these values represents the half of the output from the previous Feistel round that is being passed on unchanged to the following round. They are only used once per round, when they are exchanged with %rax, %rbx, and %rcx. As a result, we free up 3 registers (one per block) and can reassign them so that %rbp is not used, and additionally %r14 and %r15 are not used so they do not need to be saved/restored. There may be a small overhead caused by replacing 'xchg REG, REG' with the needed sequence 'mov MEM, REG; mov REG, MEM; mov REG, REG' once per round. But, counterintuitively, when I tested "ctr-twofish-3way" on a Haswell processor, the new version was actually about 2% faster. (Perhaps 'xchg' is not as well optimized as plain moves.) Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-