- 22 Feb, 2018 36 commits
-
-
Eric Biggers authored
Add an ARM NEON-accelerated implementation of Speck-XTS. It operates on 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for Speck64. Each 128-byte chunk goes through XTS preprocessing, then is encrypted/decrypted (doing one cipher round for all the blocks, then the next round, etc.), then goes through XTS postprocessing.

The performance depends on the processor but can be about 3 times faster than the generic code. For example, on an ARMv7 processor we observe the following performance with Speck128/256-XTS:

    xts-speck128-neon:        Encryption 107.9 MB/s, Decryption 108.1 MB/s
    xts(speck128-generic):    Encryption 32.1 MB/s, Decryption 36.6 MB/s

In comparison to AES-256-XTS without the Cryptography Extensions:

    xts-aes-neonbs:           Encryption 41.2 MB/s, Decryption 36.7 MB/s
    xts(aes-asm):             Encryption 31.7 MB/s, Decryption 30.8 MB/s
    xts(aes-generic):         Encryption 21.2 MB/s, Decryption 20.9 MB/s

Speck64/128-XTS is even faster:

    xts-speck64-neon:         Encryption 138.6 MB/s, Decryption 139.1 MB/s

Note that as with the generic code, only the Speck128 and Speck64 variants are supported. Also, for now only the XTS mode of operation is supported, to target the disk and file encryption use cases. The NEON code also only handles the portion of the data that is evenly divisible into 128-byte chunks, with any remainder handled by a C fallback. Of course, other modes of operation could be added later if needed, and/or the NEON code could be updated to handle other buffer sizes.

The XTS specification is only defined for AES, which has a 128-bit block size, so for the GF(2^64) math needed for Speck64-XTS we use the reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX paper. Of course, when possible users should use Speck128-XTS, but even that may be too slow on some processors; Speck64-XTS can be faster.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
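For reference, the GF(2^64) tweak update with this reducing polynomial can be written as follows in scalar C; this is an illustrative sketch (the function name is made up), not the NEON code from the patch:

    #include <stdint.h>

    /*
     * Multiply the 64-bit XTS tweak by x in GF(2^64), reducing by
     * x^64 + x^4 + x^3 + x + 1 (low bits 0x1B), as used for Speck64-XTS.
     */
    static uint64_t speck64_xts_mulx(uint64_t tweak)
    {
            uint64_t carry = tweak >> 63;   /* coefficient of x^63 that overflows */

            return (tweak << 1) ^ (carry ? 0x1B : 0);
    }

Speck128-XTS, by contrast, can use the standard GF(2^128) tweak update already defined by the XTS specification.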
-
Eric Biggers authored
Export the Speck constants and transform context and the ->setkey(), ->encrypt(), and ->decrypt() functions so that they can be reused by the ARM NEON implementation of Speck-XTS. The generic key expansion code will be reused because it is not performance-critical and is not vectorizable, while the generic encryption and decryption functions are needed as fallbacks and for the XTS tweak encryption. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
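As a rough sketch, the shared interface ends up looking something like the following (the exact symbol names, constant value and struct layout below are illustrative, not copied from the patch):

    /* include/crypto/speck.h -- sketch, names illustrative */
    #include <linux/types.h>

    #define SPECK128_MAX_NROUNDS    34      /* assumed value, for illustration */

    struct speck128_tfm_ctx {
            u64 round_keys[SPECK128_MAX_NROUNDS];
            int nrounds;
    };

    int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx,
                               const u8 *key, unsigned int keylen);
    void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
                                 u8 *out, const u8 *in);
    void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
                                 u8 *out, const u8 *in);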
-
Eric Biggers authored
Add a generic implementation of Speck, including the Speck128 and Speck64 variants. Speck is a lightweight block cipher that can be much faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an option for dm-crypt and fscrypt on Android, for low-end mobile devices with older CPUs such as ARMv7 which don't have the Cryptography Extensions. Currently, such devices are unencrypted because AES is not fast enough, even when the NEON bit-sliced implementation of AES is used. Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't fast enough either; it seems that only a modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal (https://patchwork.kernel.org/patch/10101451/) which was to offer ChaCha20 for these devices. However, the use of a stream cipher for disk/file encryption with no space to store nonces would have been much more insecure than we thought initially, given that it would be used on top of flash storage as well as potentially on top of F2FS, neither of which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin. Nevertheless, it has a straightforward design (it's an ARX cipher), and it appears to be the leading software-optimized lightweight block cipher currently, with the most cryptanalysis. It's also easy to implement without side channels, unlike AES. Moreover, we only intend Speck to be used when the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on ChaCha20 and Poly1305. While theoretically attractive, such a mode would be a brand new crypto construction and would be more complicated and difficult to implement efficiently in comparison to Speck-XTS.

There is confusion about the byte and word orders of Speck, since the original paper doesn't specify them. But we have implemented it using the orders the authors recommended in a correspondence with them. The test vectors are taken from the original paper but were mapped to byte arrays using the recommended byte and word orders.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
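For context, a single Speck128 round is just an add-rotate-xor sequence over the two 64-bit words of the block; a minimal C sketch (the rotate helpers and function name are ours, not the kernel's):

    #include <stdint.h>

    static uint64_t ror64(uint64_t x, unsigned int n)
    {
            return (x >> n) | (x << (64 - n));
    }

    static uint64_t rol64(uint64_t x, unsigned int n)
    {
            return (x << n) | (x >> (64 - n));
    }

    /* One Speck128 encryption round with 64-bit round key k. */
    static void speck128_round(uint64_t *x, uint64_t *y, uint64_t k)
    {
            *x = ror64(*x, 8);
            *x += *y;
            *x ^= k;
            *y = rol64(*y, 3);
            *y ^= *x;
    }

Decryption applies the inverse operations in reverse order; Speck64 has the same structure on 32-bit words with the same rotation amounts.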
-
Dave Watson authored
Add a gcmaes_crypt_by_sg routine that performs the encryption/decryption by scatter/gather. Either src or dst may contain multiple buffers, so iterate over both at the same time if they are different. If the input is the same as the output, iterate only over one. Currently both the AAD and TAG must be linear, so copy them out with scatterwalk_map_and_copy. If the first buffer contains the entire AAD, we can optimize and skip the copy. Since the AAD can be any size, if copied it must be on the heap; the TAG can be on the stack since it is at most 16 bytes. Only the SSE routines are updated so far, so keep the previous gcmaes_en/decrypt routines, and branch to the scatter/gather ones if the key size is inappropriate for AVX, or we are SSE only. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
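A heavily simplified sketch of the lockstep walk over the source and destination scatterlists (crypt_chunk() is a hypothetical stand-in for the AES-GCM update; highmem mapping and the AAD/tag handling described above are omitted):

    #include <linux/kernel.h>
    #include <linux/scatterlist.h>

    /* Hypothetical stand-in for the actual AES-GCM update on one chunk. */
    static void crypt_chunk(u8 *dst, const u8 *src, unsigned int len);

    static void crypt_by_sg_sketch(struct scatterlist *src,
                                   struct scatterlist *dst,
                                   unsigned int left)
    {
            unsigned int srcoff = 0, dstoff = 0;

            while (left) {
                    unsigned int n = min3(src->length - srcoff,
                                          dst->length - dstoff, left);

                    crypt_chunk(sg_virt(dst) + dstoff, sg_virt(src) + srcoff, n);

                    srcoff += n;
                    dstoff += n;
                    left -= n;
                    if (srcoff == src->length) {
                            src = sg_next(src);
                            srcoff = 0;
                    }
                    if (dstoff == dst->length) {
                            dst = sg_next(dst);
                            dstoff = 0;
                    }
            }
    }

When src and dst refer to the same buffers, a single walk suffices, which is the in-place case mentioned above.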
-
Dave Watson authored
The asm macros are all set up now; introduce the entry points. GCM_INIT and GCM_COMPLETE take arguments, so that the new scatter/gather entry points don't have to accept every argument, only the ones they need. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
We can fast-path any < 16 byte read if the full message is > 16 bytes, and shift over by the appropriate amount. Usually we are reading > 16 bytes, so this should be faster than the READ_PARTIAL macro introduced in b20209c9 for the average case. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
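Expressed in C, the trick is: when fewer than 16 bytes remain but the whole message is longer than 16 bytes, it is safe to load the 16 bytes that end at the message's last byte and then discard the unwanted leading bytes (an illustrative sketch, not the actual asm):

    #include <stdint.h>
    #include <string.h>

    /*
     * Read the final 'rem' bytes (rem < 16) of a message known to be longer
     * than 16 bytes: load the 16-byte window ending at the last byte, which
     * is always in bounds, then drop the leading 16 - rem bytes.
     */
    static void read_trailing_partial(uint8_t out[16], const uint8_t *msg,
                                      size_t msglen, size_t rem)
    {
            uint8_t window[16];

            memcpy(window, msg + msglen - 16, 16);
            memset(out, 0, 16);
            memcpy(out, window + (16 - rem), rem);
    }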
-
Dave Watson authored
Before this change, multiple calls to GCM_ENC_DEC would succeed, but only if every call processed a multiple of 16 bytes. Handle partial blocks at the start of GCM_ENC_DEC, and update the aadhash as appropriate. The data offset %r11 is also updated after the partial block. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
HashKey computation only needs to happen once per scatter/gather operation, so save it between calls in the gcm_context struct instead of on the stack. Since the asm no longer stores anything on the stack, we can use %rsp directly, and clean up the frame save/restore macros a bit. Hashkeys actually only need to be calculated once per key and could be moved to when set_key is called; however, the current glue code falls back to the generic aes code if the fpu is disabled. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Prepare to handle partial blocks between scatter/gather calls. For the last partial block, we only want to calculate the aadhash in GCM_COMPLETE, and a new partial block macro will handle both aadhash update and encrypting partial blocks between calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values. pblocklen, aadhash, and pblockenckey are also updated at the end of each scatter/gather operation, to be carried over to the next operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Introduce a gcm_context_data struct that will be used to pass context data between scatter/gather update calls. It is passed as the second argument (after the crypto keys); the other arguments are renumbered. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
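A sketch of the kind of per-operation state such a context structure carries (the field names below are illustrative, not copied from the patch):

    #include <linux/types.h>

    /* Illustrative layout only -- not the struct from the patch. */
    struct gcm_context_data_sketch {
            u8  aad_hash[16];               /* running GHASH value */
            u64 aad_length;                 /* total AAD length in bytes */
            u64 in_length;                  /* total plaintext/ciphertext length */
            u8  partial_block_enc_key[16];  /* keystream left from a partial block */
            u64 partial_block_len;          /* bytes already used of that block */
            u8  current_counter[16];        /* current CTR counter block */
    };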
-
Dave Watson authored
Make a macro for the main encode/decode routine. Only a small handful of lines differ for enc and dec. This will also become the main scatter/gather update routine. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Merge encode and decode tag calculations in GCM_COMPLETE macro. Scatter/gather routines will call this once at the end of encryption or decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Reduce code duplication by introducing a GCM_INIT macro. This macro will also be exposed as a function for implementing scatter/gather support, since INIT only needs to be called once for the full operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Macro-ify function save and restore. These will be used in new functions added for scatter/gather update operations. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Dave Watson authored
Use macro operations to merge the implementations of INITIAL_BLOCKS, since they differ by only a small handful of lines. Use the macro counter \@ to simplify the implementation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Pass a pointer dereference instead of spelling out the structure type as the parameter of the "sizeof" operator, making the size determination a bit safer, in line with the Linux coding style. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
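The pattern in question, in miniature (the structure and function names here are hypothetical):

    #include <linux/slab.h>

    struct foo { int a; };                  /* hypothetical */

    static struct foo *alloc_foo(gfp_t gfp)
    {
            struct foo *p;

            /* Before: p = kmalloc(sizeof(struct foo), gfp); */
            p = kmalloc(sizeof(*p), gfp);   /* stays correct if p's type changes */
            return p;
    }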
-
Markus Elfring authored
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Two local variables are assigned appropriate pointers a bit later in the function, so omit their explicit initialisation at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Fix the function name in this error message so that it matches the function that was actually called. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
The local variable "cryp_error" was used only for two condition checks. * Check the return values from these function calls directly instead. * Delete this variable which became unnecessary with this refactoring. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Markus Elfring authored
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Conor McLoughlin authored
An RSA private key in the first form should have its version, prime1, prime2, exponent1, exponent2 and coefficient values set to 0. With non-zero values for prime1/prime2, exponent1/exponent2 and coefficient, the Intel QAT driver will assume that the key is provided in the second form. This results in signature verification failures for kernel modules signed with rsa,sha256 when a QAT device is present. Cc: <stable@vger.kernel.org> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Conor McLoughlin <conor.mcloughlin@intel.com> Reviewed-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
This patch adds a label to unmap the result buffer in the hash send function error path. Fixes: 1b44c5a6 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver") Suggested-by: Ofer Heifetz <oferh@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
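The general shape of such a fix, with hypothetical names standing in for the driver's actual buffers and helpers:

    #include <linux/dma-mapping.h>
    #include <linux/errno.h>

    /* Hypothetical stand-in for queueing the descriptors to the engine. */
    static int queue_descriptors(void) { return 0; }

    static int send_sketch(struct device *dev, void *result, size_t len)
    {
            dma_addr_t result_dma;
            int ret;

            result_dma = dma_map_single(dev, result, len, DMA_FROM_DEVICE);
            if (dma_mapping_error(dev, result_dma))
                    return -EINVAL;

            ret = queue_descriptors();
            if (ret)
                    goto unmap_result;      /* the newly added error path */

            return 0;

    unmap_result:
            dma_unmap_single(dev, result_dma, len, DMA_FROM_DEVICE);
            return ret;
    }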
-
Antoine Tenart authored
This patch updates the Inside Secure SafeXcel driver to avoid getting out of sync between the number of requests sent and the number completed. The number of requests acknowledged by the driver can differ from the configured threshold if new requests were pushed to the hardware in the meantime. The driver wasn't taking those into account, so the count of remaining requests to handle (used to reconfigure the interrupt threshold) could get out of sync. Fix this by tracking the total number of requests sent to the hardware rather than the number of requests left, so that newly pushed requests are accounted for. Fixes: dc7e28a3 ("crypto: inside-secure - dequeue all requests at once") Suggested-by: Ofer Heifetz <oferh@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
When exiting a transformation, the cra_exit() helper is called in each driver providing one. The Inside Secure SafeXcel driver has one, which is responsible for freeing some areas and for sending an invalidation request to the crypto engine, to invalidate the context that was used during the transformation. In some setups (lots of short-lived transformations, and hence lots of cra_exit() calls) we could see NULL pointer dereferences and other weird issues, all of them coming from accessing the tfm context. The issue is that the invalidation request completion is checked using a wait_for_completion_interruptible() call in both the cipher and hash cra_exit() helpers. In some cases this was interrupted while the invalidation request hadn't been processed yet. cra_exit() then returned and its caller freed the tfm instance; only afterwards was the request handled by the SafeXcel driver, which led to the issues described. This patch fixes this by using wait_for_completion() calls in these specific cases. Fixes: 1b44c5a6 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine driver") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
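In miniature, the change amounts to the following (a sketch with made-up context and field names, not the driver's actual code):

    #include <linux/completion.h>

    struct my_ctx {                         /* hypothetical tfm context */
            struct completion inv_done;     /* completed from the result path */
    };

    static void exit_sketch(struct my_ctx *ctx)
    {
            init_completion(&ctx->inv_done);
            /* ... queue the invalidation request to the engine here ... */

            /*
             * Before: wait_for_completion_interruptible(&ctx->inv_done) could
             * return early on a signal while the request was still pending,
             * letting the tfm be freed under the driver's feet.
             */
            wait_for_completion(&ctx->inv_done);
    }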
-
Antoine Tenart authored
This patch adds a check in the SafeXcel dequeue function to avoid processing a request further if no hardware command was issued. This can happen in certain cases where the ->send() function caches all the data that would have been sent. Fixes: 809778e0 ("crypto: inside-secure - fix hash when length is a multiple of a block") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
This patch fixes the cache length computation, as cache_len could end up being a negative value. The check between the queued size and the block size is updated to reflect the caching mechanism, which can cache up to a full block size (inclusive). Fixes: 809778e0 ("crypto: inside-secure - fix hash when length is a multiple of a block") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
This patch fixes the extra cache computation when the queued data is a multiple of a block size. This fixes the hash support in some cases. Fixes: 809778e0 ("crypto: inside-secure - fix hash when length is a multiple of a block") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
This patch fixes the Inside Secure SafeXcel driver not to overwrite the interrupt threshold value. In certain cases the value of this register, which controls when to fire an interrupt, was overwritten. This led to packets not being processed or acked, as the driver was never made aware of their completion. This patch fixes this behaviour by not setting the threshold while requests are being processed by the engine. Fixes: dc7e28a3 ("crypto: inside-secure - dequeue all requests at once") Suggested-by: Ofer Heifetz <oferh@marvell.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Antoine Tenart authored
Free Electrons became Bootlin. Update my email accordingly. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Stefan Wahren authored
If the clock probe is deferred, we would assume the clock is optional. This is wrong, so defer the probe of this driver until the clock is available. Fixes: 791af4f4 ("hwrng: bcm2835 - Manage an optional clock") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
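The fix boils down to checking for -EPROBE_DEFER explicitly before treating a missing clock as optional; a minimal sketch of that idiom (the helper name is ours, not the exact patch):

    #include <linux/clk.h>
    #include <linux/err.h>
    #include <linux/errno.h>

    /* The clock stays optional, but deferral of the clock provider now
     * propagates out of probe instead of being swallowed. */
    static int get_optional_rng_clk(struct device *dev, struct clk **clk)
    {
            *clk = devm_clk_get(dev, NULL);

            if (PTR_ERR(*clk) == -EPROBE_DEFER)
                    return -EPROBE_DEFER;   /* provider not ready yet, retry probe */
            if (IS_ERR(*clk))
                    *clk = NULL;            /* genuinely absent: treat as optional */
            return 0;
    }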
-
Jinbum Park authored
Move the AES inverse S-box to the .rodata section where it is safe from abuse by speculation. Signed-off-by: Jinbum Park <jinb.park7@gmail.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Peter Robinson authored
The MODULE_ALIAS is required to enable the sun4i-ss driver to load automatically when built as a module. Tested on a Cubietruck. Fixes: 6298e948 ("crypto: sunxi-ss - Add Allwinner Security System crypto accelerator") Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
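For illustration, module autoloading for a platform driver relies on an alias of this general shape (the exact alias string added by the patch is not reproduced here; the one below is an assumption for the example):

    #include <linux/module.h>

    /*
     * Hypothetical example: a platform driver typically declares an alias of
     * the form "platform:<drv-name>" so udev/modprobe can load the module
     * automatically when the matching device is registered.
     */
    MODULE_ALIAS("platform:sun4i-ss");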
-
- 15 Feb, 2018 4 commits
-
-
Fabien DESSENNE authored
stm32mp1 differs from stm32f7 in the way it handles byte ordering and padding for the AES GCM and CCM algorithms. Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Fabien DESSENNE authored
Add AEAD cipher algorithms for aes gcm and ccm. Signed-off-by: Fabien Dessenne <fabien.dessenne@st.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Colin Ian King authored
Functions qat_rsa_set_n, qat_rsa_set_e and qat_rsa_set_d are local to the source and do not need to be in global scope, so make them static. Cleans up sparse warnings: drivers/crypto/qat/qat_common/qat_asym_algs.c:972:5: warning: symbol 'qat_rsa_set_n' was not declared. Should it be static? drivers/crypto/qat/qat_common/qat_asym_algs.c:1003:5: warning: symbol 'qat_rsa_set_e' was not declared. Should it be static? drivers/crypto/qat/qat_common/qat_asym_algs.c:1027:5: warning: symbol 'qat_rsa_set_d' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Colin Ian King authored
Function ccp_get_dma_chan_attr is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: drivers/crypto/ccp/ccp-dmaengine.c:41:14: warning: symbol 'ccp_get_dma_chan_attr' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-