Commits · 4860620da7e5752d916737472c40be573aec1869 · Kirill Smelkov / linux

11 Feb, 2017 7 commits

crypto: arm64/aes - add NEON/Crypto Extensions CBCMAC/CMAC/XCBC driver · 4860620d

Ard Biesheuvel authored Feb 03, 2017

On ARMv8 implementations that do not support the Crypto Extensions,
such as the Raspberry Pi 3, the CCM driver falls back to the generic
table based AES implementation to perform the MAC part of the
algorithm, which is slow and not time invariant. So add a CBCMAC
implementation to the shared glue code between NEON AES and Crypto
Extensions AES, so that it can be used instead now that the CCM
driver has been updated to look for CBCMAC implementations other
than the one it supplies itself.

Also, given how these algorithms mostly only differ in the way the key
handling and the final encryption are implemented, expose CMAC and XCBC
algorithms as well based on the same core update code.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4860620d

crypto: ccm - switch to separate cbcmac driver · f15f05b0

Ard Biesheuvel authored Feb 03, 2017

Update the generic CCM driver to defer CBC-MAC processing to a
dedicated CBC-MAC ahash transform rather than open coding this
transform (and much of the associated scatterwalk plumbing) in
the CCM driver itself.

This cleans up the code considerably, but more importantly, it allows
the use of alternative CBC-MAC implementations that don't suffer from
performance degradation due to significant setup time (e.g., the NEON
based AES code needs to enable/disable the NEON, and load the S-box
into 16 SIMD registers, which cannot be amortized over the entire input
when using the cipher interface)
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f15f05b0

crypto: testmgr - add test cases for cbcmac(aes) · 092acf06

Ard Biesheuvel authored Feb 03, 2017

In preparation of splitting off the CBC-MAC transform in the CCM
driver into a separate algorithm, define some test cases for the
AES incarnation of cbcmac.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

092acf06

crypto: aes - add generic time invariant AES cipher · b5e0b032

Ard Biesheuvel authored Feb 02, 2017

Lookup table based AES is sensitive to timing attacks, which is due to
the fact that such table lookups are data dependent, and the fact that
8 KB worth of tables covers a significant number of cachelines on any
architecture, resulting in an exploitable correlation between the key
and the processing time for known plaintexts.

For network facing algorithms such as CTR, CCM or GCM, this presents a
security risk, which is why arch specific AES ports are typically time
invariant, either through the use of special instructions, or by using
SIMD algorithms that don't rely on table lookups.

For generic code, this is difficult to achieve without losing too much
performance, but we can improve the situation significantly by switching
to an implementation that only needs 256 bytes of table data (the actual
S-box itself), which can be prefetched at the start of each block to
eliminate data dependent latencies.

This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the
ordinary generic AES driver manages 18 cycles per byte on this
hardware). Decryption is substantially slower.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b5e0b032

crypto: aes-generic - drop alignment requirement · ec38a937

Ard Biesheuvel authored Feb 02, 2017

The generic AES code exposes a 32-bit align mask, which forces all
users of the code to use temporary buffers or take other measures to
ensure the alignment requirement is adhered to, even on architectures
that don't care about alignment for software algorithms such as this
one.

So drop the align mask, and fix the code to use get_unaligned_le32()
where appropriate, which will resolve to whatever is optimal for the
architecture.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ec38a937

crypto: sha512-mb - Protect sha512 mb ctx mgr access · c459bd7b

Tim Chen authored Feb 01, 2017

The flusher and regular multi-buffer computation via mcryptd may race with another.
Add here a lock and turn off interrupt to to access multi-buffer
computation state cstate->mgr before a round of computation. This should
prevent the flusher code jumping in.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c459bd7b

crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers · 5d3d9c8b

Ard Biesheuvel authored Feb 01, 2017

The PMULL based CRC32 implementation already contains code based on the
separate, optional CRC32 instructions to fallback to when operating on
small quantities of data. We can expose these routines directly on systems
that lack the 64x64 PMULL instructions but do implement the CRC32 ones,
which makes the driver that is based solely on those CRC32 instructions
redundant. So remove it.

Note that this aligns arm64 with ARM, whose accelerated CRC32 driver
also combines the CRC32 extension based and the PMULL based versions.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5d3d9c8b

03 Feb, 2017 33 commits

crypto: arm/aes - don't use IV buffer to return final keystream block · 1a20b966

Ard Biesheuvel authored Feb 02, 2017

The ARM bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.

However, as it turns out, the CCM driver does expect the IV to retain
a value that is equal to the original IV except for the counter value,
and even interprets byte zero as a length indicator, which may result
in memory corruption if the IV is overwritten with something else.

So use a separate buffer to return the final keystream block.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1a20b966

crypto: arm64/aes - don't use IV buffer to return final keystream block · 88a3f582

Ard Biesheuvel authored Feb 02, 2017

The arm64 bit sliced AES core code uses the IV buffer to pass the final
keystream block back to the glue code if the input is not a multiple of
the block size, so that the asm code does not have to deal with anything
except 16 byte blocks. This is done under the assumption that the outgoing
IV is meaningless anyway in this case, given that chaining is no longer
possible under these circumstances.

So use a separate buffer to return the final keystream block.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

88a3f582

crypto: arm64/aes - replace scalar fallback with plain NEON fallback · 12fcd923

Ard Biesheuvel authored Jan 28, 2017

The new bitsliced NEON implementation of AES uses a fallback in two
places: CBC encryption (which is strictly sequential, whereas this
driver can only operate efficiently on 8 blocks at a time), and the
XTS tweak generation, which involves encrypting a single AES block
with a different key schedule.

The plain (i.e., non-bitsliced) NEON code is more suitable as a fallback,
given that it is faster than scalar on low end cores (which is what
the NEON implementations target, since high end cores have dedicated
instructions for AES), and shows similar behavior in terms of D-cache
footprint and sensitivity to cache timing attacks. So switch the fallback
handling to the plain NEON driver.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

12fcd923

crypto: arm64/aes-neon-blk - tweak performance for low end cores · 4edd7d01

Ard Biesheuvel authored Jan 28, 2017

The non-bitsliced AES implementation using the NEON is highly sensitive
to micro-architectural details, and, as it turns out, the Cortex-A53 on
the Raspberry Pi 3 is a core that can benefit from this code, given that
its scalar AES performance is abysmal (32.9 cycles per byte).

The new bitsliced AES code manages 19.8 cycles per byte on this core,
but can only operate on 8 blocks at a time, which is not supported by
all chaining modes. With a bit of tweaking, we can get the plain NEON
code to run at 22.0 cycles per byte, making it useful for sequential
modes like CBC encryption. (Like bitsliced NEON, the plain NEON
implementation does not use any lookup tables, which makes it easy on
the D-cache, and invulnerable to cache timing attacks)

So tweak the plain NEON AES code to use tbl instructions rather than
shl/sri pairs, and to avoid the need to reload permutation vectors or
other constants from memory in every round. Also, improve the decryption
performance by switching to 16x8 pmul instructions for the performing
the multiplications in GF(2^8).

To allow the ECB and CBC encrypt routines to be reused by the bitsliced
NEON code in a subsequent patch, export them from the module.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4edd7d01

crypto: arm64/aes - performance tweak · c458c4ad

Ard Biesheuvel authored Jan 28, 2017

Shuffle some instructions around in the __hround macro to shave off
0.1 cycles per byte on Cortex-A57.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c458c4ad

crypto: arm64/aes - avoid literals for cross-module symbol references · 262ea4f6

Ard Biesheuvel authored Jan 28, 2017

Using simple adrp/add pairs to refer to the AES lookup tables exposed by
the generic AES driver (which could be loaded far away from this driver
when KASLR is in effect) was unreliable at module load time before commit
41c066f2 ("arm64: assembler: make adr_l work in modules under KASLR"),
which is why the AES code used literals instead.

So now we can get rid of the literals, and switch to the adr_l macro.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

262ea4f6

crypto: arm64/chacha20 - remove cra_alignmask · 4d1108fd

Ard Biesheuvel authored Jan 28, 2017

Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4d1108fd

crypto: arm64/aes-blk - remove cra_alignmask · ccc5d51e

Ard Biesheuvel authored Jan 28, 2017

Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ccc5d51e

crypto: arm64/aes-ce-ccm - remove cra_alignmask · 8f4102db

Ard Biesheuvel authored Jan 28, 2017

Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8f4102db

crypto: arm/chacha20 - remove cra_alignmask · 4a70b526

Ard Biesheuvel authored Jan 28, 2017

Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4a70b526

crypto: arm/aes-ce - remove cra_alignmask · 1465fb13

Ard Biesheuvel authored Jan 28, 2017

Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1465fb13

crypto: chcr - Fix Smatch Complaint · 5ba042c0

Harsh Jain authored Jan 27, 2017

Initialise variable after null check.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

5ba042c0

crypto: chcr - Fix wrong typecasting · d2826056

Harsh Jain authored Jan 27, 2017

Typecast the pointer with correct structure.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

d2826056

crypto: chcr - Change algo priority · 8f066015

Harsh Jain authored Jan 27, 2017

Update priorities to 3000
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8f066015

crypto: chcr - Change cra_flags for cipher algos · 44e9f799

Harsh Jain authored Jan 27, 2017

Change cipher algos flags to CRYPTO_ALG_TYPE_ABLKCIPHER.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

44e9f799

crypto: chcr - Use cipher instead of Block Cipher in gcm setkey · 8356ea51

Harsh Jain authored Jan 27, 2017

1 Block of encrption can be done with aes-generic. no need of
cbc(aes). This patch replaces cbc(aes-generic) with aes-generic.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8356ea51

crypto: chcr - fix itnull.cocci warnings · ee3bd84f

Harsh Jain authored Jan 27, 2017

The first argument to list_for_each_entry cannot be NULL.

Generated by: scripts/coccinelle/iterators/itnull.cocci
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ee3bd84f

crypto: chcr - Change flow IDs · 8a13449f

Harsh Jain authored Jan 27, 2017

Change assign flowc id to each outgoing request.Firmware use flowc id
to schedule each request onto HW. FW reply may miss without this change.
Reviewed-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8a13449f

crypto: atmel-sha - add verbose debug facilities to print hw register names · 0569fc46

Cyrille Pitchen authored Jan 26, 2017

When VERBOSE_DEBUG is defined and SHA_FLAGS_DUMP_REG flag is set in
dd->flags, this patch prints the register names and values when performing
IO accesses.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0569fc46

crypto: atmel-authenc - add support to authenc(hmac(shaX), Y(aes)) modes · 89a82ef8

Cyrille Pitchen authored Jan 26, 2017

This patchs allows to combine the AES and SHA hardware accelerators on
some Atmel SoCs. Doing so, AES blocks are only written to/read from the
AES hardware. Those blocks are also transferred from the AES to the SHA
accelerator internally, without additionnal accesses to the system busses.

Hence, the AES and SHA accelerators work in parallel to process all the
data blocks, instead of serializing the process by (de)crypting those
blocks first then authenticating them after like the generic
crypto/authenc.c driver does.

Of course, both the AES and SHA hardware accelerators need to be available
before we can start to process the data blocks. Hence we use their crypto
request queue to synchronize both drivers.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

89a82ef8

crypto: atmel-aes - fix atmel_aes_handle_queue() · a1f613f1

Cyrille Pitchen authored Jan 26, 2017

This patch fixes the value returned by atmel_aes_handle_queue(), which
could have been wrong previously when the crypto request was started
synchronously but became asynchronous during the ctx->start() call.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a1f613f1

crypto: atmel-sha - add support to hmac(shaX) · 81d8750b

Cyrille Pitchen authored Jan 26, 2017

This patch adds support to the hmac(shaX) algorithms.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

81d8750b

crypto: atmel-sha - add simple DMA transfers · 69303cf0

Cyrille Pitchen authored Jan 26, 2017

This patch adds a simple function to perform data transfer with the DMA
controller.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

69303cf0

crypto: atmel-sha - add atmel_sha_cpu_start() · eec12f66

Cyrille Pitchen authored Jan 26, 2017

This patch adds a simple function to perform data transfer with PIO, hence
handled by the CPU.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

eec12f66

crypto: atmel-sha - add SHA_MR_MODE_IDATAR0 · 563c47df

Cyrille Pitchen authored Jan 26, 2017

This patch defines an alias macro to SHA_MR_MODE_PDC, which is not suited
for DMA usage.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

563c47df

crypto: atmel-sha - add atmel_sha_wait_for_data_ready() · 9064ed92

Cyrille Pitchen authored Jan 26, 2017

This patch simply defines a helper function to test the 'Data Ready' flag
of the Status Register. It also gives a chance for the crypto request to
be processed synchronously if this 'Data Ready' flag is already set when
polling the Status Register. Indeed, running synchronously avoid the
latency of the 'Data Ready' interrupt.

When the 'Data Ready' flag has not been set yet, we enable the associated
interrupt and resume processing the crypto request asynchronously from the
'done' task just as before.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

9064ed92

crypto: atmel-sha - redefine SHA_FLAGS_SHA* flags to match SHA_MR_ALGO_SHA* · f07cebad

Cyrille Pitchen authored Jan 26, 2017

This patch modifies the SHA_FLAGS_SHA* flags: those algo flags are now
organized as values of a single bitfield instead of individual bits.
This allows to reduce the number of bits needed to encode all possible
values. Also the new values match the SHA_MR_ALGO_SHA* values hence
the algorithm bitfield of the SHA_MR register could simply be set with:

mr = (mr & ~SHA_FLAGS_ALGO_MASK) | (ctx->flags & SHA_FLAGS_ALGO_MASK)
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f07cebad

crypto: atmel-sha - make atmel_sha_done_task more generic · b5ce82a7

Cyrille Pitchen authored Jan 26, 2017

This patch is a transitional patch. It updates atmel_sha_done_task() to
make it more generic. Indeed, it adds a new .resume() member in the
atmel_sha_dev structure. This hook is called from atmel_sha_done_task()
to resume processing an asynchronous request.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b5ce82a7

crypto: atmel-sha - update request queue management to make it more generic · a29af939

Cyrille Pitchen authored Jan 26, 2017

This patch is a transitional patch. It splits the atmel_sha_handle_queue()
function. Now atmel_sha_handle_queue() only manages the request queue and
calls a new .start() hook from the atmel_sha_ctx structure.
This hook allows to implement different kind of requests still handled by
a single queue.

Also when the req parameter of atmel_sha_handle_queue() refers to the very
same request as the one returned by crypto_dequeue_request(), the queue
management now gives a chance to this crypto request to be handled
synchronously, hence reducing latencies. The .start() hook returns 0 if
the crypto request was handled synchronously and -EINPROGRESS if the
crypto request still need to be handled asynchronously.

Besides, the new .is_async member of the atmel_sha_dev structure helps
tagging this asynchronous state. Indeed, the req->base.complete() callback
should not be called if the crypto request is handled synchronously.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a29af939

crypto: atmel-sha - create function to get an Atmel SHA device · 8340c7fd

Cyrille Pitchen authored Jan 26, 2017

This is a transitional patch: it creates the atmel_sha_find_dev() function,
which will be used in further patches to share the source code responsible
for finding a Atmel SHA device.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8340c7fd

crypto: doc - Fix hash export state information · 379d972b

Rabin Vincent authored Jan 26, 2017

The documentation states that crypto_ahash_reqsize() provides the size
of the state structure used by crypto_ahash_export().  But it's actually
crypto_ahash_statesize() which provides this size.
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

379d972b

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 34cb5821
Herbert Xu authored Feb 03, 2017
```
Merge the crypto tree to pick up arm64 output IV patch.
```
34cb5821

crypto: chcr - Fix key length for RFC4106 · 7c2cf1c4

Harsh Jain authored Jan 27, 2017

Check keylen before copying salt to avoid wrap around of Integer.
Signed-off-by: Harsh Jain <harsh@chelsio.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7c2cf1c4