Commits · ac3c8f36c31d2f0add4eeaa42c7d0540d7c60583 · nexedi / linux

21 Sep, 2018 26 commits

crypto: lrw - Do not use auxiliary buffer · ac3c8f36

Ondrej Mosnacek authored Sep 13, 2018

This patch simplifies the LRW template to recompute the LRW tweaks from
scratch in the second pass and thus also removes the need to allocate a
dynamic buffer using kmalloc().

As discussed at [1], the use of kmalloc causes deadlocks with dm-crypt.

PERFORMANCE MEASUREMENTS (x86_64)
Performed using: https://gitlab.com/omos/linux-crypto-bench
Crypto driver used: lrw(ecb-aes-aesni)

The results show that the new code has about the same performance as the
old code. For 512-byte message it seems to be even slightly faster, but
that might be just noise.

Before:
ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns)
lrw(aes) 256 64 200 203
lrw(aes) 320 64 202 204
lrw(aes) 384 64 204 205
lrw(aes) 256 512 415 415
lrw(aes) 320 512 432 440
lrw(aes) 384 512 449 451
lrw(aes) 256 4096 1838 1995
lrw(aes) 320 4096 2123 1980
lrw(aes) 384 4096 2100 2119
lrw(aes) 256 16384 7183 6954
lrw(aes) 320 16384 7844 7631
lrw(aes) 384 16384 8256 8126
lrw(aes) 256 32768 14772 14484
lrw(aes) 320 32768 15281 15431
lrw(aes) 384 32768 16469 16293

After:
ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns)
lrw(aes) 256 64 197 196
lrw(aes) 320 64 200 197
lrw(aes) 384 64 203 199
lrw(aes) 256 512 385 380
lrw(aes) 320 512 401 395
lrw(aes) 384 512 415 415
lrw(aes) 256 4096 1869 1846
lrw(aes) 320 4096 2080 1981
lrw(aes) 384 4096 2160 2109
lrw(aes) 256 16384 7077 7127
lrw(aes) 320 16384 7807 7766
lrw(aes) 384 16384 8108 8357
lrw(aes) 256 32768 14111 14454
lrw(aes) 320 32768 15268 15082
lrw(aes) 384 32768 16581 16250

[1] https://lkml.org/lkml/2018/8/23/1315Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ac3c8f36

crypto: lrw - Optimize tweak computation · c778f96b

Ondrej Mosnacek authored Sep 13, 2018

This patch rewrites the tweak computation to a slightly simpler method
that performs less bswaps. Based on performance measurements the new
code seems to provide slightly better performance than the old one.

PERFORMANCE MEASUREMENTS (x86_64)
Performed using: https://gitlab.com/omos/linux-crypto-bench
Crypto driver used: lrw(ecb-aes-aesni)

Before:
ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns)
lrw(aes) 256 64 204 286
lrw(aes) 320 64 227 203
lrw(aes) 384 64 208 204
lrw(aes) 256 512 441 439
lrw(aes) 320 512 456 455
lrw(aes) 384 512 469 483
lrw(aes) 256 4096 2136 2190
lrw(aes) 320 4096 2161 2213
lrw(aes) 384 4096 2295 2369
lrw(aes) 256 16384 7692 7868
lrw(aes) 320 16384 8230 8691
lrw(aes) 384 16384 8971 8813
lrw(aes) 256 32768 15336 15560
lrw(aes) 320 32768 16410 16346
lrw(aes) 384 32768 18023 17465

After:
ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns)
lrw(aes) 256 64 200 203
lrw(aes) 320 64 202 204
lrw(aes) 384 64 204 205
lrw(aes) 256 512 415 415
lrw(aes) 320 512 432 440
lrw(aes) 384 512 449 451
lrw(aes) 256 4096 1838 1995
lrw(aes) 320 4096 2123 1980
lrw(aes) 384 4096 2100 2119
lrw(aes) 256 16384 7183 6954
lrw(aes) 320 16384 7844 7631
lrw(aes) 384 16384 8256 8126
lrw(aes) 256 32768 14772 14484
lrw(aes) 320 32768 15281 15431
lrw(aes) 384 32768 16469 16293
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c778f96b

crypto: testmgr - Add test for LRW counter wrap-around · dc6d6d5a

Ondrej Mosnacek authored Sep 13, 2018

This patch adds a test vector for lrw(aes) that triggers wrap-around of
the counter, which is a tricky corner case.
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dc6d6d5a

crypto: lrw - Fix out-of bounds access on counter overflow · fbe1a850

Ondrej Mosnacek authored Sep 13, 2018

When the LRW block counter overflows, the current implementation returns
128 as the index to the precomputed multiplication table, which has 128
entries. This patch fixes it to return the correct value (127).

Fixes: 64470f1b ("[CRYPTO] lrw: Liskov Rivest Wagner, a tweakable narrow block cipher mode")
Cc: <stable@vger.kernel.org> # 2.6.20+
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

fbe1a850

crypto: tcrypt - fix ghash-generic speed test · 331351f8

Horia Geantă authored Sep 12, 2018

ghash is a keyed hash algorithm, thus setkey needs to be called.
Otherwise the following error occurs:
$ modprobe tcrypt mode=318 sec=1
testing speed of async ghash-generic (ghash-generic)
tcrypt: test  0 (   16 byte blocks,   16 bytes per update,   1 updates):
tcrypt: hashing failed ret=-126

Cc: <stable@vger.kernel.org> # 4.6+
Fixes: 0660511c ("crypto: tcrypt - Use ahash")
Tested-by: Franck Lenormand <franck.lenormand@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

331351f8

arm64: defconfig: enable CAAM crypto engine on QorIQ DPAA2 SoCs · e8342cc7

Horia Geantă authored Sep 12, 2018

Enable CAAM (Cryptographic Accelerator and Assurance Module) driver
for QorIQ Data Path Acceleration Architecture (DPAA) v2.
It handles DPSECI (Data Path SEC Interface) DPAA2 objects that sit
on the Management Complex (MC) fsl-mc bus.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e8342cc7

crypto: caam/qi2 - add support for ahash algorithms · 3f16f6c9

Horia Geantă authored Sep 12, 2018

Add support for unkeyed and keyed (hmac) md5, sha algorithms.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3f16f6c9

crypto: caam - export ahash shared descriptor generation · 0efa7579

Horia Geantă authored Sep 12, 2018

caam/qi2 driver will support ahash algorithms,
thus move ahash descriptors generation in a shared location.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0efa7579

crypto: caam/qi2 - add skcipher algorithms · 226853ac

Horia Geantă authored Sep 12, 2018

Add support to submit the following skcipher algorithms
via the DPSECI backend:
cbc({aes,des,des3_ede})
ctr(aes), rfc3686(ctr(aes))
xts(aes)
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

226853ac

crypto: caam/qi2 - add DPAA2-CAAM driver · 8d818c10

Horia Geantă authored Sep 12, 2018

Add CAAM driver that works using the DPSECI backend, i.e. manages
DPSECI DPAA2 objects sitting on the Management Complex (MC) fsl-mc bus.

Data transfers (crypto requests) are sent/received to/from CAAM crypto
engine via Queue Interface (v2), this being similar to existing caam/qi.
OTOH, configuration/setup (obtaining virtual queue IDs, authorization
etc.) is done by sending commands to the MC f/w.

Note that the CAAM accelerator included in DPAA2 platforms still has
Job Rings. However, the driver being added does not handle access
via this backend. Kconfig & Makefile are updated such that DPAA2-CAAM
(a.k.a. "caam/qi2") driver does not depend on caam/jr or caam/qi
backends - which rely on platform bus support (ctrl.c).

Support for the following aead and authenc algorithms is also added
in this patch:
-aead:
gcm(aes)
rfc4106(gcm(aes))
rfc4543(gcm(aes))
-authenc:
authenc(hmac({md5,sha*}),cbc({aes,des,des3_ede}))
echainiv(authenc(hmac({md5,sha*}),cbc({aes,des,des3_ede})))
authenc(hmac({md5,sha*}),rfc3686(ctr(aes))
seqiv(authenc(hmac({md5,sha*}),rfc3686(ctr(aes)))
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8d818c10

crypto: caam - add Queue Interface v2 error codes · 94cebd9d

Horia Geantă authored Sep 12, 2018

Add support to translate error codes returned by QI v2, i.e.
Queue Interface present on DataPath Acceleration Architecture
v2 (DPAA2).
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

94cebd9d

crypto: caam - add DPAA2-CAAM (DPSECI) backend API · f9cb74fd

Horia Geantă authored Sep 12, 2018

Add the low-level API that allows to manage DPSECI DPAA2 objects
that sit on the Management Complex (MC) fsl-mc bus.

The API is compatible with MC firmware 10.2.0+.
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f9cb74fd

crypto: caam - fix implicit casts in endianness helpers · aae733a3

Horia Geantă authored Sep 12, 2018

Fix the following sparse endianness warnings:

drivers/crypto/caam/regs.h:95:1: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __le32unsigned int @@
drivers/crypto/caam/regs.h:95:1: expected unsigned int
drivers/crypto/caam/regs.h:95:1: got restricted __le32 [usertype] <noident>
drivers/crypto/caam/regs.h:95:1: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __be32unsigned int @@
drivers/crypto/caam/regs.h:95:1: expected unsigned int
drivers/crypto/caam/regs.h:95:1: got restricted __be32 [usertype] <noident>

drivers/crypto/caam/regs.h:92:1: sparse: cast to restricted __le32
drivers/crypto/caam/regs.h:92:1: sparse: cast to restricted __be32

Fixes: 261ea058 ("crypto: caam - handle core endianness != caam endianness")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

aae733a3

soc: fsl: dpio: add congestion notification support · 55d01102

Horia Geantă authored Sep 12, 2018

Add support for Congestion State Change Notifications (CSCN), which
allow DPIO users to be notified when a congestion group changes its
state (due to hitting the entrance / exit threshold).
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

55d01102

soc: fsl: dpio: add frame list format support · 009447a0

Horia Geantă authored Sep 12, 2018

Add support for dpaa2_fd_list format, i.e. dpaa2_fl_entry structure
and accessors.

Frame list entries (FLEs) are similar, but not identical to FDs:
+ "F" (final) bit
- FMT[b'01] is reserved
- DD, SC, DROPP bits (covered by "FD compatibility" field in FLE case)
- FLC[5:0] not used for stashing
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

009447a0

soc: fsl: dpio: add back some frame queue functions · 48c43de0

Horia Geantă authored Sep 12, 2018

This commit adds back functions removed in
commit a211c817 ("staging: fsl-mc/dpio: remove couple of unused functions")
since dpseci object will make use of them.
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

48c43de0

bus: fsl-mc: add support for dpseci device type · e9158b35

Horia Geantă authored Sep 12, 2018

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Acked-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e9158b35

crypto: chacha20 - Fix chacha20_block() keystream alignment (again) · a5e9f557

Eric Biggers authored Sep 11, 2018

In commit 9f480fae ("crypto: chacha20 - Fix keystream alignment for
chacha20_block()"), I had missed that chacha20_block() can be called
directly on the buffer passed to get_random_bytes(), which can have any
alignment.  So, while my commit didn't break anything, it didn't fully
solve the alignment problems.

Revert my solution and just update chacha20_block() to use
put_unaligned_le32(), so the output buffer need not be aligned.
This is simpler, and on many CPUs it's the same speed.

But, I kept the 'tmp' buffers in extract_crng_user() and
_get_random_bytes() 4-byte aligned, since that alignment is actually
needed for _crng_backtrack_protect() too.
Reported-by: Stephan Müller <smueller@chronox.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a5e9f557

crypto: xts - Drop use of auxiliary buffer · 78105c7e

Ondrej Mosnacek authored Sep 11, 2018

Since commit acb9b159 ("crypto: gf128mul - define gf128mul_x_* in
gf128mul.h"), the gf128mul_x_*() functions are very fast and therefore
caching the computed XTS tweaks has only negligible advantage over
computing them twice.

In fact, since the current caching implementation limits the size of
the calls to the child ecb(...) algorithm to PAGE_SIZE (usually 4096 B),
it is often actually slower than the simple recomputing implementation.

This patch simplifies the XTS template to recompute the XTS tweaks from
scratch in the second pass and thus also removes the need to allocate a
dynamic buffer using kmalloc().

As discussed at [1], the use of kmalloc causes deadlocks with dm-crypt.

PERFORMANCE RESULTS
I measured time to encrypt/decrypt a memory buffer of varying sizes with
xts(ecb-aes-aesni) using a tool I wrote ([2]) and the results suggest
that after this patch the performance is either better or comparable for
both small and large buffers. Note that there is a lot of noise in the
measurements, but the overall difference is easy to see.

Old code:
ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns)
xts(aes) 256 64 331 328
xts(aes) 384 64 332 333
xts(aes) 512 64 338 348
xts(aes) 256 512 889 920
xts(aes) 384 512 1019 993
xts(aes) 512 512 1032 990
xts(aes) 256 4096 2152 2292
xts(aes) 384 4096 2453 2597
xts(aes) 512 4096 3041 2641
xts(aes) 256 16384 9443 8027
xts(aes) 384 16384 8536 8925
xts(aes) 512 16384 9232 9417
xts(aes) 256 32768 16383 14897
xts(aes) 384 32768 17527 16102
xts(aes) 512 32768 18483 17322

New code:
ALGORITHM KEY (b) DATA (B) TIME ENC (ns) TIME DEC (ns)
xts(aes) 256 64 328 324
xts(aes) 384 64 324 319
xts(aes) 512 64 320 322
xts(aes) 256 512 476 473
xts(aes) 384 512 509 492
xts(aes) 512 512 531 514
xts(aes) 256 4096 2132 1829
xts(aes) 384 4096 2357 2055
xts(aes) 512 4096 2178 2027
xts(aes) 256 16384 6920 6983
xts(aes) 384 16384 8597 7505
xts(aes) 512 16384 7841 8164
xts(aes) 256 32768 13468 12307
xts(aes) 384 32768 14808 13402
xts(aes) 512 32768 15753 14636

[1] https://lkml.org/lkml/2018/8/23/1315
[2] https://gitlab.com/omos/linux-crypto-benchSigned-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

78105c7e

crypto: arm64/aes-blk - improve XTS mask handling · 2e5d2f33

Ard Biesheuvel authored Sep 10, 2018

The Crypto Extension instantiation of the aes-modes.S collection of
skciphers uses only 15 NEON registers for the round key array, whereas
the pure NEON flavor uses 16 NEON registers for the AES S-box.

This means we have a spare register available that we can use to hold
the XTS mask vector, removing the need to reload it at every iteration
of the inner loop.

Since the pure NEON version does not permit this optimization, tweak
the macros so we can factor out this functionality. Also, replace the
literal load with a short sequence to compose the mask vector.

On Cortex-A53, this results in a ~4% speedup.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2e5d2f33

crypto: arm64/aes-blk - add support for CTS-CBC mode · dd597fb3

Ard Biesheuvel authored Sep 10, 2018

Currently, we rely on the generic CTS chaining mode wrapper to
instantiate the cts(cbc(aes)) skcipher. Due to the high performance
of the ARMv8 Crypto Extensions AES instructions (~1 cycles per byte),
any overhead in the chaining mode layers is amplified, and so it pays
off considerably to fold the CTS handling into the SIMD routines.

On Cortex-A53, this results in a ~50% speedup for smaller input sizes.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dd597fb3

crypto: arm64/aes-blk - revert NEON yield for skciphers · 6e7de6af

Ard Biesheuvel authored Sep 10, 2018

The reasoning of commit f10dc56c ("crypto: arm64 - revert NEON yield
for fast AEAD implementations") applies equally to skciphers: the walk
API already guarantees that the input size of each call into the NEON
code is bounded to the size of a page, and so there is no need for an
additional TIF_NEED_RESCHED flag check inside the inner loop. So revert
the skcipher changes to aes-modes.S (but retain the mac ones)

This partially reverts commit 0c8f838a.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6e7de6af

crypto: arm64/aes-blk - remove pointless (u8 *) casts · 557ecb45

Ard Biesheuvel authored Sep 10, 2018

For some reason, the asmlinkage prototypes of the NEON routines take
u8[] arguments for the round key arrays, while the actual round keys
are arrays of u32, and so passing them into those routines requires
u8* casts at each occurrence. Fix that.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

557ecb45

crypto: cavium/nitrox - use dma_pool_zalloc() · 718f608c

Srikanth Jampala authored Sep 10, 2018

use dma_pool_zalloc() instead of dma_pool_alloc with __GFP_ZERO flag.
crypto dma pool renamed to "nitrox-context".
Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

718f608c

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 910e3ca1
Herbert Xu authored Sep 21, 2018
```
Merge crypto-2.6 to resolve caam conflict with skcipher conversion.
```
910e3ca1

crypto: caam/jr - fix ablkcipher_edesc pointer arithmetic · 13cc6f48

Horia Geantă authored Sep 14, 2018

In some cases the zero-length hw_desc array at the end of
ablkcipher_edesc struct requires for 4B of tail padding.

Due to tail padding and the way pointers to S/G table and IV
are computed:
	edesc->sec4_sg = (void *)edesc + sizeof(struct ablkcipher_edesc) +
			 desc_bytes;
	iv = (u8 *)edesc->hw_desc + desc_bytes + sec4_sg_bytes;
first 4 bytes of IV are overwritten by S/G table.

Update computation of pointer to S/G table to rely on offset of hw_desc
member and not on sizeof() operator.

Cc: <stable@vger.kernel.org> # 4.13+
Fixes: 115957bb ("crypto: caam - fix IV DMA mapping and updating")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

13cc6f48

14 Sep, 2018 5 commits

crypto: cavium/nitrox - Added support for SR-IOV configuration. · 41a9aca6

Srikanth Jampala authored Sep 07, 2018

Added support to configure SR-IOV using sysfs interface.
Supported VF modes are 16, 32, 64 and 128. Grouped the
hardware configuration functions to "nitrox_hal.h" file.
Changed driver version to "1.1".
Signed-off-by: Srikanth Jampala <Jampala.Srikanth@cavium.com>
Reviewed-by: Gadam Sreerama <sgadam@cavium.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

41a9aca6

crypto: aesni - don't use GFP_ATOMIC allocation if the request doesn't cross a page in gcm · a7888481

Mikulas Patocka authored Sep 05, 2018

This patch fixes gcmaes_crypt_by_sg so that it won't use memory
allocation if the data doesn't cross a page boundary.

Authenticated encryption may be used by dm-crypt. If the encryption or
decryption fails, it would result in I/O error and filesystem corruption.
The function gcmaes_crypt_by_sg is using GFP_ATOMIC allocation that can
fail anytime. This patch fixes the logic so that it won't attempt the
failing allocation if the data doesn't cross a page boundary.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a7888481

crc-t10dif: crc_t10dif_mutex can be static · a7e7edfe

kbuild test robot authored Sep 05, 2018

Fixes: b7637754 ("crc-t10dif: Pick better transform if one becomes available")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a7e7edfe

dm: Remove VLA usage from hashes · 6d39a124

Kees Cook authored Aug 07, 2018

In the quest to remove all stack VLA usage from the kernel[1], this uses
the new HASH_MAX_DIGESTSIZE from the crypto layer to allocate the upper
bounds on stack usage.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.comSigned-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6d39a124

crypto: x86/aegis,morus - Do not require OSXSAVE for SSE2 · 24568b47

Ondrej Mosnacek authored Sep 05, 2018

It turns out OSXSAVE needs to be checked only for AVX, not for SSE.
Without this patch the affected modules refuse to load on CPUs with SSE2
but without AVX support.

Fixes: 877ccce7 ("crypto: x86/aegis,morus - Fix and simplify CPUID checks")
Cc: <stable@vger.kernel.org> # 4.18
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

24568b47

13 Sep, 2018 1 commit

crypto: ccp - add timeout support in the SEV command · 3702a058

Brijesh Singh authored Aug 15, 2018

Currently, the CCP driver assumes that the SEV command issued to the PSP
will always return (i.e. it will never hang).  But recently, firmware bugs
have shown that a command can hang.  Since of the SEV commands are used
in probe routines, this can cause boot hangs and/or loss of virtualization
capabilities.

To protect against firmware bugs, add a timeout in the SEV command
execution flow.  If a command does not complete within the specified
timeout then return -ETIMEOUT and stop the driver from executing any
further commands since the state of the SEV firmware is unknown.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Gary Hook <Gary.Hook@amd.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3702a058

04 Sep, 2018 8 commits

crypto: arm/chacha20 - faster 8-bit rotations and other optimizations · a1b22a5f

Eric Biggers authored Sep 01, 2018

Optimize ChaCha20 NEON performance by:

- Implementing the 8-bit rotations using the 'vtbl.8' instruction.
- Streamlining the part that adds the original state and XORs the data.
- Making some other small tweaks.

On ARM Cortex-A7, these optimizations improve ChaCha20 performance from
about 12.08 cycles per byte to about 11.37 -- a 5.9% improvement.

There is a tradeoff involved with the 'vtbl.8' rotation method since
there is at least one CPU (Cortex-A53) where it's not fastest.  But it
seems to be a better default; see the added comment.  Overall, this
patch reduces Cortex-A53 performance by less than 0.5%.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a1b22a5f

crc-t10dif: Allow current transform to be inspected in sysfs · 11dcb103

Martin K. Petersen authored Aug 30, 2018

Add a way to print the currently active CRC algorithm in:

	/sys/module/crc_t10dif/parameters/transform
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

11dcb103

crc-t10dif: Pick better transform if one becomes available · b7637754

Martin K. Petersen authored Aug 30, 2018

T10 CRC library is linked into the kernel thanks to block and SCSI. The
crypto accelerators are typically loaded later as modules and are
therefore not available when the T10 CRC library is initialized.

Use the crypto notifier facility to trigger a switch to a better algorithm
if one becomes available after the initial hash has been registered. Use
RCU to protect the original transform while the new one is being set up.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b7637754

crypto: api - Introduce notifier for new crypto algorithms · dd8b083f

Martin K. Petersen authored Aug 30, 2018

Introduce a facility that can be used to receive a notification
callback when a new algorithm becomes available. This can be used by
existing crypto registrations to trigger a switch from a software-only
algorithm to a hardware-accelerated version.

A new CRYPTO_MSG_ALG_LOADED state is introduced to the existing crypto
notification chain, and the register/unregister functions are exported
so they can be called by subsystems outside of crypto.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

dd8b083f

crypto: arm64/crct10dif - implement non-Crypto Extensions alternative · 2fffee53

Ard Biesheuvel authored Aug 27, 2018

The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit
polynomial multiplication instructions, which are optional in the
architecture, and if these instructions are not available, we fall back
to the C routine which is slow and inefficient.

So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that
uses a sequence of ~40 instructions involving 8x8 bit PMULL and some
shifting and masking. This is a lot slower than the original, but it is
still twice as fast as the current [unoptimized] C code on Cortex-A53,
and it is time invariant and much easier on the D-cache.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2fffee53

crypto: arm64/crct10dif - preparatory refactor for 8x8 PMULL version · 6c1b0da1

Ard Biesheuvel authored Aug 27, 2018

Reorganize the CRC-T10DIF asm routine so we can easily instantiate an
alternative version based on 8x8 polynomial multiplication in a
subsequent patch.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6c1b0da1

crypto: arm64/crc32 - remove PMULL based CRC32 driver · 598b7d41

Ard Biesheuvel authored Aug 27, 2018

Now that the scalar fallbacks have been moved out of this driver into
the core crc32()/crc32c() routines, we are left with a CRC32 crypto API
driver for arm64 that is based only on 64x64 polynomial multiplication,
which is an optional instruction in the ARMv8 architecture, and is less
and less likely to be available on cores that do not also implement the
CRC32 instructions, given that those are mandatory in the architecture
as of ARMv8.1.

Since the scalar instructions do not require the special handling that
SIMD instructions do, and since they turn out to be considerably faster
on some cores (Cortex-A53) as well, there is really no point in keeping
this code around so let's just remove it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

598b7d41

crypto: arm64/aes-modes - get rid of literal load of addend vector · ed6ed118

Ard Biesheuvel authored Aug 23, 2018

Replace the literal load of the addend vector with a sequence that
performs each add individually. This sequence is only 2 instructions
longer than the original, and 2% faster on Cortex-A53.

This is an improvement by itself, but also works around a Clang issue,
whose integrated assembler does not implement the GNU ARM asm syntax
completely, and does not support the =literal notation for FP registers
(more info at https://bugs.llvm.org/show_bug.cgi?id=38642)

Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ed6ed118