Commit 80cee03b authored by Linus Torvalds's avatar Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.14:

  API:
   - Defer scompress scratch buffer allocation to first use.
   - Add __crypto_xor that takes separte src and dst operands.
   - Add ahash multiple registration interface.
   - Revamped aead/skcipher algif code to fix async IO properly.

  Drivers:
   - Add non-SIMD fallback code path on ARM for SVE.
   - Add AMD Security Processor framework for ccp.
   - Add support for RSA in ccp.
   - Add XTS-AES-256 support for CCP version 5.
   - Add support for PRNG in sun4i-ss.
   - Add support for DPAA2 in caam.
   - Add ARTPEC crypto support.
   - Add Freescale RNGC hwrng support.
   - Add Microchip / Atmel ECC driver.
   - Add support for STM32 HASH module"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
  crypto: af_alg - get_page upon reassignment to TX SGL
  crypto: cavium/nitrox - Fix an error handling path in 'nitrox_probe()'
  crypto: inside-secure - fix an error handling path in safexcel_probe()
  crypto: rockchip - Don't dequeue the request when device is busy
  crypto: cavium - add release_firmware to all return case
  crypto: sahara - constify platform_device_id
  MAINTAINERS: Add ARTPEC crypto maintainer
  crypto: axis - add ARTPEC-6/7 crypto accelerator driver
  crypto: hash - add crypto_(un)register_ahashes()
  dt-bindings: crypto: add ARTPEC crypto
  crypto: algif_aead - fix comment regarding memory layout
  crypto: ccp - use dma_mapping_error to check map error
  lib/mpi: fix build with clang
  crypto: sahara - Remove leftover from previous used spinlock
  crypto: sahara - Fix dma unmap direction
  crypto: af_alg - consolidation of duplicate code
  crypto: caam - Remove unused dentry members
  crypto: ccp - select CONFIG_CRYPTO_RSA
  crypto: ccp - avoid uninitialized variable warning
  crypto: serpent - improve __serpent_setkey with UBSAN
  ...
parents aae3dbb4 2d45a7e8
Axis crypto engine with PDMA interface.
Required properties:
- compatible : Should be one of the following strings:
"axis,artpec6-crypto" for the version in the Axis ARTPEC-6 SoC
"axis,artpec7-crypto" for the version in the Axis ARTPEC-7 SoC.
- reg: Base address and size for the PDMA register area.
- interrupts: Interrupt handle for the PDMA interrupt line.
Example:
crypto@f4264000 {
compatible = "axis,artpec6-crypto";
reg = <0xf4264000 0x1000>;
interrupts = <GIC_SPI 19 IRQ_TYPE_LEVEL_HIGH>;
};
......@@ -66,3 +66,16 @@ sha@f8034000 {
dmas = <&dma1 2 17>;
dma-names = "tx";
};
* Eliptic Curve Cryptography (I2C)
Required properties:
- compatible : must be "atmel,atecc508a".
- reg: I2C bus address of the device.
- clock-frequency: must be present in the i2c controller node.
Example:
atecc508a@C0 {
compatible = "atmel,atecc508a";
reg = <0xC0>;
};
* STMicroelectronics STM32 HASH
Required properties:
- compatible: Should contain entries for this and backward compatible
HASH versions:
- "st,stm32f456-hash" for stm32 F456.
- "st,stm32f756-hash" for stm32 F756.
- reg: The address and length of the peripheral registers space
- interrupts: the interrupt specifier for the HASH
- clocks: The input clock of the HASH instance
Optional properties:
- resets: The input reset of the HASH instance
- dmas: DMA specifiers for the HASH. See the DMA client binding,
Documentation/devicetree/bindings/dma/dma.txt
- dma-names: DMA request name. Should be "in" if a dma is present.
- dma-maxburst: Set number of maximum dma burst supported
Example:
hash1: hash@50060400 {
compatible = "st,stm32f756-hash";
reg = <0x50060400 0x400>;
interrupts = <80>;
clocks = <&rcc 0 STM32F7_AHB2_CLOCK(HASH)>;
resets = <&rcc STM32F7_AHB2_RESET(HASH)>;
dmas = <&dma2 7 2 0x400 0x0>;
dma-names = "in";
dma-maxburst = <0>;
};
Freescale RNGC (Random Number Generator Version C)
The driver also supports version B, which is mostly compatible
to version C.
Required properties:
- compatible : should be one of
"fsl,imx25-rngb"
"fsl,imx35-rngc"
- reg : offset and length of the register set of this block
- interrupts : the interrupt number for the RNGC block
- clocks : the RNGC clk source
Example:
rng@53fb0000 {
compatible = "fsl,imx25-rngb";
reg = <0x53fb0000 0x4000>;
interrupts = <22>;
clocks = <&trng_clk>;
};
......@@ -1162,6 +1162,7 @@ L: linux-arm-kernel@axis.com
F: arch/arm/mach-artpec
F: arch/arm/boot/dts/artpec6*
F: drivers/clk/axis
F: drivers/crypto/axis
F: drivers/pinctrl/pinctrl-artpec*
F: Documentation/devicetree/bindings/pinctrl/axis,artpec6-pinctrl.txt
......@@ -8770,6 +8771,12 @@ F: drivers/dma/at_hdmac.c
F: drivers/dma/at_hdmac_regs.h
F: include/linux/platform_data/dma-atmel.h
MICROCHIP / ATMEL ECC DRIVER
M: Tudor Ambarus <tudor.ambarus@microchip.com>
L: linux-crypto@vger.kernel.org
S: Maintained
F: drivers/crypto/atmel-ecc.*
MICROCHIP / ATMEL ISC DRIVER
M: Songjun Wu <songjun.wu@microchip.com>
L: linux-media@vger.kernel.org
......
......@@ -94,14 +94,15 @@ config CRYPTO_AES_ARM_CE
ARMv8 Crypto Extensions
config CRYPTO_GHASH_ARM_CE
tristate "PMULL-accelerated GHASH using ARMv8 Crypto Extensions"
tristate "PMULL-accelerated GHASH using NEON/ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON
select CRYPTO_HASH
select CRYPTO_CRYPTD
help
Use an implementation of GHASH (used by the GCM AEAD chaining mode)
that uses the 64x64 to 128 bit polynomial multiplication (vmull.p64)
that is part of the ARMv8 Crypto Extensions
that is part of the ARMv8 Crypto Extensions, or a slower variant that
uses the vmull.p8 instruction that is part of the basic NEON ISA.
config CRYPTO_CRCT10DIF_ARM_CE
tristate "CRCT10DIF digest algorithm using PMULL instructions"
......
......@@ -285,9 +285,7 @@ static int ctr_encrypt(struct skcipher_request *req)
ce_aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc,
num_rounds(ctx), blocks, walk.iv);
if (tdst != tsrc)
memcpy(tdst, tsrc, nbytes);
crypto_xor(tdst, tail, nbytes);
crypto_xor_cpy(tdst, tsrc, tail, nbytes);
err = skcipher_walk_done(&walk, 0);
}
kernel_neon_end();
......
......@@ -10,6 +10,7 @@
*/
#include <linux/linkage.h>
#include <asm/cache.h>
.text
.align 5
......@@ -32,19 +33,19 @@
.endif
.endm
.macro __load, out, in, idx
.macro __load, out, in, idx, sz, op
.if __LINUX_ARM_ARCH__ < 7 && \idx > 0
ldr \out, [ttab, \in, lsr #(8 * \idx) - 2]
ldr\op \out, [ttab, \in, lsr #(8 * \idx) - \sz]
.else
ldr \out, [ttab, \in, lsl #2]
ldr\op \out, [ttab, \in, lsl #\sz]
.endif
.endm
.macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc
.macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc, sz, op
__select \out0, \in0, 0
__select t0, \in1, 1
__load \out0, \out0, 0
__load t0, t0, 1
__load \out0, \out0, 0, \sz, \op
__load t0, t0, 1, \sz, \op
.if \enc
__select \out1, \in1, 0
......@@ -53,10 +54,10 @@
__select \out1, \in3, 0
__select t1, \in0, 1
.endif
__load \out1, \out1, 0
__load \out1, \out1, 0, \sz, \op
__select t2, \in2, 2
__load t1, t1, 1
__load t2, t2, 2
__load t1, t1, 1, \sz, \op
__load t2, t2, 2, \sz, \op
eor \out0, \out0, t0, ror #24
......@@ -68,9 +69,9 @@
__select \t3, \in1, 2
__select \t4, \in2, 3
.endif
__load \t3, \t3, 2
__load t0, t0, 3
__load \t4, \t4, 3
__load \t3, \t3, 2, \sz, \op
__load t0, t0, 3, \sz, \op
__load \t4, \t4, 3, \sz, \op
eor \out1, \out1, t1, ror #24
eor \out0, \out0, t2, ror #16
......@@ -82,14 +83,14 @@
eor \out1, \out1, t2
.endm
.macro fround, out0, out1, out2, out3, in0, in1, in2, in3
__hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1
__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1
.macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
__hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1, \sz, \op
__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op
.endm
.macro iround, out0, out1, out2, out3, in0, in1, in2, in3
__hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0
__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0
.macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
__hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0, \sz, \op
__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op
.endm
.macro __rev, out, in
......@@ -114,7 +115,7 @@
.endif
.endm
.macro do_crypt, round, ttab, ltab
.macro do_crypt, round, ttab, ltab, bsz
push {r3-r11, lr}
ldr r4, [in]
......@@ -146,9 +147,12 @@
1: subs rounds, rounds, #4
\round r8, r9, r10, r11, r4, r5, r6, r7
__adrl ttab, \ltab, ls
bls 2f
\round r4, r5, r6, r7, r8, r9, r10, r11
bhi 0b
b 0b
2: __adrl ttab, \ltab
\round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b
#ifdef CONFIG_CPU_BIG_ENDIAN
__rev r4, r4
......@@ -170,10 +174,48 @@
.ltorg
.endm
.align L1_CACHE_SHIFT
.type __aes_arm_inverse_sbox, %object
__aes_arm_inverse_sbox:
.byte 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38
.byte 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb
.byte 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87
.byte 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb
.byte 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d
.byte 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e
.byte 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2
.byte 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25
.byte 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16
.byte 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92
.byte 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda
.byte 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84
.byte 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a
.byte 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06
.byte 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02
.byte 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b
.byte 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea
.byte 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73
.byte 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85
.byte 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e
.byte 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89
.byte 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b
.byte 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20
.byte 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4
.byte 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31
.byte 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f
.byte 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d
.byte 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef
.byte 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0
.byte 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61
.byte 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26
.byte 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
.size __aes_arm_inverse_sbox, . - __aes_arm_inverse_sbox
ENTRY(__aes_arm_encrypt)
do_crypt fround, crypto_ft_tab, crypto_fl_tab
do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
ENDPROC(__aes_arm_encrypt)
.align 5
ENTRY(__aes_arm_decrypt)
do_crypt iround, crypto_it_tab, crypto_il_tab
do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0
ENDPROC(__aes_arm_decrypt)
......@@ -221,9 +221,8 @@ static int ctr_encrypt(struct skcipher_request *req)
u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE;
u8 *src = walk.src.virt.addr + blocks * AES_BLOCK_SIZE;
if (dst != src)
memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);
crypto_xor_cpy(dst, src, final,
walk.total % AES_BLOCK_SIZE);
err = skcipher_walk_done(&walk, 0);
break;
......
/*
* Accelerated GHASH implementation with ARMv8 vmull.p64 instructions.
* Accelerated GHASH implementation with NEON/ARMv8 vmull.p8/64 instructions.
*
* Copyright (C) 2015 Linaro Ltd. <ard.biesheuvel@linaro.org>
* Copyright (C) 2015 - 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published
......@@ -12,40 +12,162 @@
#include <asm/assembler.h>
SHASH .req q0
SHASH2 .req q1
T1 .req q2
T2 .req q3
MASK .req q4
XL .req q5
XM .req q6
XH .req q7
IN1 .req q7
T1 .req q1
XL .req q2
XM .req q3
XH .req q4
IN1 .req q4
SHASH_L .req d0
SHASH_H .req d1
SHASH2_L .req d2
T1_L .req d4
MASK_L .req d8
XL_L .req d10
XL_H .req d11
XM_L .req d12
XM_H .req d13
XH_L .req d14
T1_L .req d2
T1_H .req d3
XL_L .req d4
XL_H .req d5
XM_L .req d6
XM_H .req d7
XH_L .req d8
t0l .req d10
t0h .req d11
t1l .req d12
t1h .req d13
t2l .req d14
t2h .req d15
t3l .req d16
t3h .req d17
t4l .req d18
t4h .req d19
t0q .req q5
t1q .req q6
t2q .req q7
t3q .req q8
t4q .req q9
T2 .req q9
s1l .req d20
s1h .req d21
s2l .req d22
s2h .req d23
s3l .req d24
s3h .req d25
s4l .req d26
s4h .req d27
MASK .req d28
SHASH2_p8 .req d28
k16 .req d29
k32 .req d30
k48 .req d31
SHASH2_p64 .req d31
.text
.fpu crypto-neon-fp-armv8
.macro __pmull_p64, rd, rn, rm, b1, b2, b3, b4
vmull.p64 \rd, \rn, \rm
.endm
/*
* void pmull_ghash_update(int blocks, u64 dg[], const char *src,
* struct ghash_key const *k, const char *head)
* This implementation of 64x64 -> 128 bit polynomial multiplication
* using vmull.p8 instructions (8x8 -> 16) is taken from the paper
* "Fast Software Polynomial Multiplication on ARM Processors Using
* the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and
* Ricardo Dahab (https://hal.inria.fr/hal-01506572)
*
* It has been slightly tweaked for in-order performance, and to allow
* 'rq' to overlap with 'ad' or 'bd'.
*/
ENTRY(pmull_ghash_update)
vld1.64 {SHASH}, [r3]
.macro __pmull_p8, rq, ad, bd, b1=t4l, b2=t3l, b3=t4l, b4=t3l
vext.8 t0l, \ad, \ad, #1 @ A1
.ifc \b1, t4l
vext.8 t4l, \bd, \bd, #1 @ B1
.endif
vmull.p8 t0q, t0l, \bd @ F = A1*B
vext.8 t1l, \ad, \ad, #2 @ A2
vmull.p8 t4q, \ad, \b1 @ E = A*B1
.ifc \b2, t3l
vext.8 t3l, \bd, \bd, #2 @ B2
.endif
vmull.p8 t1q, t1l, \bd @ H = A2*B
vext.8 t2l, \ad, \ad, #3 @ A3
vmull.p8 t3q, \ad, \b2 @ G = A*B2
veor t0q, t0q, t4q @ L = E + F
.ifc \b3, t4l
vext.8 t4l, \bd, \bd, #3 @ B3
.endif
vmull.p8 t2q, t2l, \bd @ J = A3*B
veor t0l, t0l, t0h @ t0 = (L) (P0 + P1) << 8
veor t1q, t1q, t3q @ M = G + H
.ifc \b4, t3l
vext.8 t3l, \bd, \bd, #4 @ B4
.endif
vmull.p8 t4q, \ad, \b3 @ I = A*B3
veor t1l, t1l, t1h @ t1 = (M) (P2 + P3) << 16
vmull.p8 t3q, \ad, \b4 @ K = A*B4
vand t0h, t0h, k48
vand t1h, t1h, k32
veor t2q, t2q, t4q @ N = I + J
veor t0l, t0l, t0h
veor t1l, t1l, t1h
veor t2l, t2l, t2h @ t2 = (N) (P4 + P5) << 24
vand t2h, t2h, k16
veor t3l, t3l, t3h @ t3 = (K) (P6 + P7) << 32
vmov.i64 t3h, #0
vext.8 t0q, t0q, t0q, #15
veor t2l, t2l, t2h
vext.8 t1q, t1q, t1q, #14
vmull.p8 \rq, \ad, \bd @ D = A*B
vext.8 t2q, t2q, t2q, #13
vext.8 t3q, t3q, t3q, #12
veor t0q, t0q, t1q
veor t2q, t2q, t3q
veor \rq, \rq, t0q
veor \rq, \rq, t2q
.endm
//
// PMULL (64x64->128) based reduction for CPUs that can do
// it in a single instruction.
//
.macro __pmull_reduce_p64
vmull.p64 T1, XL_L, MASK
veor XH_L, XH_L, XM_H
vext.8 T1, T1, T1, #8
veor XL_H, XL_H, XM_L
veor T1, T1, XL
vmull.p64 XL, T1_H, MASK
.endm
//
// Alternative reduction for CPUs that lack support for the
// 64x64->128 PMULL instruction
//
.macro __pmull_reduce_p8
veor XL_H, XL_H, XM_L
veor XH_L, XH_L, XM_H
vshl.i64 T1, XL, #57
vshl.i64 T2, XL, #62
veor T1, T1, T2
vshl.i64 T2, XL, #63
veor T1, T1, T2
veor XL_H, XL_H, T1_L
veor XH_L, XH_L, T1_H
vshr.u64 T1, XL, #1
veor XH, XH, XL
veor XL, XL, T1
vshr.u64 T1, T1, #6
vshr.u64 XL, XL, #1
.endm
.macro ghash_update, pn
vld1.64 {XL}, [r1]
vmov.i8 MASK, #0xe1
vext.8 SHASH2, SHASH, SHASH, #8
vshl.u64 MASK, MASK, #57
veor SHASH2, SHASH2, SHASH
/* do the head block first, if supplied */
ldr ip, [sp]
......@@ -62,33 +184,59 @@ ENTRY(pmull_ghash_update)
#ifndef CONFIG_CPU_BIG_ENDIAN
vrev64.8 T1, T1
#endif
vext.8 T2, XL, XL, #8
vext.8 IN1, T1, T1, #8
veor T1, T1, T2
veor T1_L, T1_L, XL_H
veor XL, XL, IN1
vmull.p64 XH, SHASH_H, XL_H @ a1 * b1
__pmull_\pn XH, XL_H, SHASH_H, s1h, s2h, s3h, s4h @ a1 * b1
veor T1, T1, XL
vmull.p64 XL, SHASH_L, XL_L @ a0 * b0
vmull.p64 XM, SHASH2_L, T1_L @ (a1 + a0)(b1 + b0)
__pmull_\pn XL, XL_L, SHASH_L, s1l, s2l, s3l, s4l @ a0 * b0
__pmull_\pn XM, T1_L, SHASH2_\pn @ (a1+a0)(b1+b0)
vext.8 T1, XL, XH, #8
veor T2, XL, XH
veor T1, XL, XH
veor XM, XM, T1
veor XM, XM, T2
vmull.p64 T2, XL_L, MASK_L
vmov XH_L, XM_H
vmov XM_H, XL_L
__pmull_reduce_\pn
veor XL, XM, T2
vext.8 T2, XL, XL, #8
vmull.p64 XL, XL_L, MASK_L
veor T2, T2, XH
veor XL, XL, T2
veor T1, T1, XH
veor XL, XL, T1
bne 0b
vst1.64 {XL}, [r1]
bx lr
ENDPROC(pmull_ghash_update)
.endm
/*
* void pmull_ghash_update(int blocks, u64 dg[], const char *src,
* struct ghash_key const *k, const char *head)
*/
ENTRY(pmull_ghash_update_p64)
vld1.64 {SHASH}, [r3]
veor SHASH2_p64, SHASH_L, SHASH_H
vmov.i8 MASK, #0xe1
vshl.u64 MASK, MASK, #57
ghash_update p64
ENDPROC(pmull_ghash_update_p64)
ENTRY(pmull_ghash_update_p8)
vld1.64 {SHASH}, [r3]
veor SHASH2_p8, SHASH_L, SHASH_H
vext.8 s1l, SHASH_L, SHASH_L, #1
vext.8 s2l, SHASH_L, SHASH_L, #2
vext.8 s3l, SHASH_L, SHASH_L, #3
vext.8 s4l, SHASH_L, SHASH_L, #4
vext.8 s1h, SHASH_H, SHASH_H, #1
vext.8 s2h, SHASH_H, SHASH_H, #2
vext.8 s3h, SHASH_H, SHASH_H, #3
vext.8 s4h, SHASH_H, SHASH_H, #4
vmov.i64 k16, #0xffff
vmov.i64 k32, #0xffffffff
vmov.i64 k48, #0xffffffffffff
ghash_update p8
ENDPROC(pmull_ghash_update_p8)
......@@ -22,6 +22,7 @@
MODULE_DESCRIPTION("GHASH secure hash using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("ghash");
#define GHASH_BLOCK_SIZE 16
#define GHASH_DIGEST_SIZE 16
......@@ -41,8 +42,17 @@ struct ghash_async_ctx {
struct cryptd_ahash *cryptd_tfm;
};
asmlinkage void pmull_ghash_update(int blocks, u64 dg[], const char *src,
struct ghash_key const *k, const char *head);
asmlinkage void pmull_ghash_update_p64(int blocks, u64 dg[], const char *src,
struct ghash_key const *k,
const char *head);
asmlinkage void pmull_ghash_update_p8(int blocks, u64 dg[], const char *src,
struct ghash_key const *k,
const char *head);
static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src,
struct ghash_key const *k,
const char *head);
static int ghash_init(struct shash_desc *desc)
{
......@@ -312,6 +322,14 @@ static int __init ghash_ce_mod_init(void)
{
int err;
if (!(elf_hwcap & HWCAP_NEON))
return -ENODEV;
if (elf_hwcap2 & HWCAP2_PMULL)
pmull_ghash_update = pmull_ghash_update_p64;
else
pmull_ghash_update = pmull_ghash_update_p8;
err = crypto_register_shash(&ghash_alg);
if (err)
return err;
......@@ -332,5 +350,5 @@ static void __exit ghash_ce_mod_exit(void)
crypto_unregister_shash(&ghash_alg);
}
module_cpu_feature_match(PMULL, ghash_ce_mod_init);
module_init(ghash_ce_mod_init);
module_exit(ghash_ce_mod_exit);
......@@ -18,18 +18,23 @@ config CRYPTO_SHA512_ARM64
config CRYPTO_SHA1_ARM64_CE
tristate "SHA-1 digest algorithm (ARMv8 Crypto Extensions)"
depends on ARM64 && KERNEL_MODE_NEON
depends on KERNEL_MODE_NEON
select CRYPTO_HASH
select CRYPTO_SHA1
config CRYPTO_SHA2_ARM64_CE
tristate "SHA-224/SHA-256 digest algorithm (ARMv8 Crypto Extensions)"
depends on ARM64 && KERNEL_MODE_NEON
depends on KERNEL_MODE_NEON
select CRYPTO_HASH
select CRYPTO_SHA256_ARM64
config CRYPTO_GHASH_ARM64_CE
tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
depends on ARM64 && KERNEL_MODE_NEON
tristate "GHASH/AES-GCM using ARMv8 Crypto Extensions"
depends on KERNEL_MODE_NEON
select CRYPTO_HASH
select CRYPTO_GF128MUL
select CRYPTO_AES
select CRYPTO_AES_ARM64
config CRYPTO_CRCT10DIF_ARM64_CE
tristate "CRCT10DIF digest algorithm using PMULL instructions"
......@@ -49,25 +54,29 @@ config CRYPTO_AES_ARM64_CE
tristate "AES core cipher using ARMv8 Crypto Extensions"
depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_ALGAPI
select CRYPTO_AES_ARM64
config CRYPTO_AES_ARM64_CE_CCM
tristate "AES in CCM mode using ARMv8 Crypto Extensions"
depends on ARM64 && KERNEL_MODE_NEON
select CRYPTO_ALGAPI
select CRYPTO_AES_ARM64_CE
select CRYPTO_AES_ARM64
select CRYPTO_AEAD
config CRYPTO_AES_ARM64_CE_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using ARMv8 Crypto Extensions"
depends on ARM64 && KERNEL_MODE_NEON
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_AES_ARM64_CE
select CRYPTO_AES_ARM64
select CRYPTO_SIMD
config CRYPTO_AES_ARM64_NEON_BLK
tristate "AES in ECB/CBC/CTR/XTS modes using NEON instructions"
depends on ARM64 && KERNEL_MODE_NEON
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_AES_ARM64
select CRYPTO_AES
select CRYPTO_SIMD
......@@ -82,6 +91,7 @@ config CRYPTO_AES_ARM64_BS
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_AES_ARM64_NEON_BLK
select CRYPTO_AES_ARM64
select CRYPTO_SIMD
endif
/*
* aesce-ccm-core.S - AES-CCM transform for ARMv8 with Crypto Extensions
*
* Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -32,7 +32,7 @@ ENTRY(ce_aes_ccm_auth_data)
beq 8f /* out of input? */
cbnz w8, 0b
eor v0.16b, v0.16b, v1.16b
1: ld1 {v3.16b}, [x4] /* load first round key */
1: ld1 {v3.4s}, [x4] /* load first round key */
prfm pldl1strm, [x1]
cmp w5, #12 /* which key size? */
add x6, x4, #16
......@@ -42,17 +42,17 @@ ENTRY(ce_aes_ccm_auth_data)
mov v5.16b, v3.16b
b 4f
2: mov v4.16b, v3.16b
ld1 {v5.16b}, [x6], #16 /* load 2nd round key */
ld1 {v5.4s}, [x6], #16 /* load 2nd round key */
3: aese v0.16b, v4.16b
aesmc v0.16b, v0.16b
4: ld1 {v3.16b}, [x6], #16 /* load next round key */
4: ld1 {v3.4s}, [x6], #16 /* load next round key */
aese v0.16b, v5.16b
aesmc v0.16b, v0.16b
5: ld1 {v4.16b}, [x6], #16 /* load next round key */
5: ld1 {v4.4s}, [x6], #16 /* load next round key */
subs w7, w7, #3
aese v0.16b, v3.16b
aesmc v0.16b, v0.16b
ld1 {v5.16b}, [x6], #16 /* load next round key */
ld1 {v5.4s}, [x6], #16 /* load next round key */
bpl 3b
aese v0.16b, v4.16b
subs w2, w2, #16 /* last data? */
......@@ -90,7 +90,7 @@ ENDPROC(ce_aes_ccm_auth_data)
* u32 rounds);
*/
ENTRY(ce_aes_ccm_final)
ld1 {v3.16b}, [x2], #16 /* load first round key */
ld1 {v3.4s}, [x2], #16 /* load first round key */
ld1 {v0.16b}, [x0] /* load mac */
cmp w3, #12 /* which key size? */
sub w3, w3, #2 /* modified # of rounds */
......@@ -100,17 +100,17 @@ ENTRY(ce_aes_ccm_final)
mov v5.16b, v3.16b
b 2f
0: mov v4.16b, v3.16b
1: ld1 {v5.16b}, [x2], #16 /* load next round key */
1: ld1 {v5.4s}, [x2], #16 /* load next round key */
aese v0.16b, v4.16b
aesmc v0.16b, v0.16b
aese v1.16b, v4.16b
aesmc v1.16b, v1.16b
2: ld1 {v3.16b}, [x2], #16 /* load next round key */
2: ld1 {v3.4s}, [x2], #16 /* load next round key */
aese v0.16b, v5.16b
aesmc v0.16b, v0.16b
aese v1.16b, v5.16b
aesmc v1.16b, v1.16b
3: ld1 {v4.16b}, [x2], #16 /* load next round key */
3: ld1 {v4.4s}, [x2], #16 /* load next round key */
subs w3, w3, #3
aese v0.16b, v3.16b
aesmc v0.16b, v0.16b
......@@ -137,31 +137,31 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */
cmp w4, #12 /* which key size? */
sub w7, w4, #2 /* get modified # of rounds */
ins v1.d[1], x9 /* no carry in lower ctr */
ld1 {v3.16b}, [x3] /* load first round key */
ld1 {v3.4s}, [x3] /* load first round key */
add x10, x3, #16
bmi 1f
bne 4f
mov v5.16b, v3.16b
b 3f
1: mov v4.16b, v3.16b
ld1 {v5.16b}, [x10], #16 /* load 2nd round key */
ld1 {v5.4s}, [x10], #16 /* load 2nd round key */
2: /* inner loop: 3 rounds, 2x interleaved */
aese v0.16b, v4.16b
aesmc v0.16b, v0.16b
aese v1.16b, v4.16b
aesmc v1.16b, v1.16b
3: ld1 {v3.16b}, [x10], #16 /* load next round key */
3: ld1 {v3.4s}, [x10], #16 /* load next round key */
aese v0.16b, v5.16b
aesmc v0.16b, v0.16b
aese v1.16b, v5.16b
aesmc v1.16b, v1.16b
4: ld1 {v4.16b}, [x10], #16 /* load next round key */
4: ld1 {v4.4s}, [x10], #16 /* load next round key */
subs w7, w7, #3
aese v0.16b, v3.16b
aesmc v0.16b, v0.16b
aese v1.16b, v3.16b
aesmc v1.16b, v1.16b
ld1 {v5.16b}, [x10], #16 /* load next round key */
ld1 {v5.4s}, [x10], #16 /* load next round key */
bpl 2b
aese v0.16b, v4.16b
aese v1.16b, v4.16b
......
/*
* aes-ccm-glue.c - AES-CCM transform for ARMv8 with Crypto Extensions
*
* Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -9,6 +9,7 @@
*/
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/unaligned.h>
#include <crypto/aes.h>
#include <crypto/scatterwalk.h>
......@@ -44,6 +45,8 @@ asmlinkage void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes,
asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[],
u32 rounds);
asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key,
unsigned int key_len)
{
......@@ -103,7 +106,45 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen)
return 0;
}
static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[],
u32 abytes, u32 *macp, bool use_neon)
{
if (likely(use_neon)) {
ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc,
num_rounds(key));
} else {
if (*macp > 0 && *macp < AES_BLOCK_SIZE) {
int added = min(abytes, AES_BLOCK_SIZE - *macp);
crypto_xor(&mac[*macp], in, added);
*macp += added;
in += added;
abytes -= added;
}
while (abytes > AES_BLOCK_SIZE) {
__aes_arm64_encrypt(key->key_enc, mac, mac,
num_rounds(key));
crypto_xor(mac, in, AES_BLOCK_SIZE);
in += AES_BLOCK_SIZE;
abytes -= AES_BLOCK_SIZE;
}
if (abytes > 0) {
__aes_arm64_encrypt(key->key_enc, mac, mac,
num_rounds(key));
crypto_xor(mac, in, abytes);
*macp = abytes;
} else {
*macp = 0;
}
}
}
static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[],
bool use_neon)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
......@@ -122,8 +163,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
ltag.len = 6;
}
ce_aes_ccm_auth_data(mac, (u8 *)&ltag, ltag.len, &macp, ctx->key_enc,
num_rounds(ctx));
ccm_update_mac(ctx, mac, (u8 *)&ltag, ltag.len, &macp, use_neon);
scatterwalk_start(&walk, req->src);
do {
......@@ -135,8 +175,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
n = scatterwalk_clamp(&walk, len);
}
p = scatterwalk_map(&walk);
ce_aes_ccm_auth_data(mac, p, n, &macp, ctx->key_enc,
num_rounds(ctx));
ccm_update_mac(ctx, mac, p, n, &macp, use_neon);
len -= n;
scatterwalk_unmap(p);
......@@ -145,6 +184,56 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[])
} while (len);
}
static int ccm_crypt_fallback(struct skcipher_walk *walk, u8 mac[], u8 iv0[],
struct crypto_aes_ctx *ctx, bool enc)
{
u8 buf[AES_BLOCK_SIZE];
int err = 0;
while (walk->nbytes) {
int blocks = walk->nbytes / AES_BLOCK_SIZE;
u32 tail = walk->nbytes % AES_BLOCK_SIZE;
u8 *dst = walk->dst.virt.addr;
u8 *src = walk->src.virt.addr;
u32 nbytes = walk->nbytes;
if (nbytes == walk->total && tail > 0) {
blocks++;
tail = 0;
}
do {
u32 bsize = AES_BLOCK_SIZE;
if (nbytes < AES_BLOCK_SIZE)
bsize = nbytes;
crypto_inc(walk->iv, AES_BLOCK_SIZE);
__aes_arm64_encrypt(ctx->key_enc, buf, walk->iv,
num_rounds(ctx));
__aes_arm64_encrypt(ctx->key_enc, mac, mac,
num_rounds(ctx));
if (enc)
crypto_xor(mac, src, bsize);
crypto_xor_cpy(dst, src, buf, bsize);
if (!enc)
crypto_xor(mac, dst, bsize);
dst += bsize;
src += bsize;
nbytes -= bsize;
} while (--blocks);
err = skcipher_walk_done(walk, tail);
}
if (!err) {
__aes_arm64_encrypt(ctx->key_enc, buf, iv0, num_rounds(ctx));
__aes_arm64_encrypt(ctx->key_enc, mac, mac, num_rounds(ctx));
crypto_xor(mac, buf, AES_BLOCK_SIZE);
}
return err;
}
static int ccm_encrypt(struct aead_request *req)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
......@@ -153,39 +242,46 @@ static int ccm_encrypt(struct aead_request *req)
u8 __aligned(8) mac[AES_BLOCK_SIZE];
u8 buf[AES_BLOCK_SIZE];
u32 len = req->cryptlen;
bool use_neon = may_use_simd();
int err;
err = ccm_init_mac(req, mac, len);
if (err)
return err;
kernel_neon_begin_partial(6);
if (likely(use_neon))
kernel_neon_begin();
if (req->assoclen)
ccm_calculate_auth_mac(req, mac);
ccm_calculate_auth_mac(req, mac, use_neon);
/* preserve the original iv for the final round */
memcpy(buf, req->iv, AES_BLOCK_SIZE);
err = skcipher_walk_aead_encrypt(&walk, req, true);
while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE;
if (walk.nbytes == walk.total)
tail = 0;
if (likely(use_neon)) {
while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE;
ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv);
if (walk.nbytes == walk.total)
tail = 0;
err = skcipher_walk_done(&walk, tail);
}
if (!err)
ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
ce_aes_ccm_encrypt(walk.dst.virt.addr,
walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv);
kernel_neon_end();
err = skcipher_walk_done(&walk, tail);
}
if (!err)
ce_aes_ccm_final(mac, buf, ctx->key_enc,
num_rounds(ctx));
kernel_neon_end();
} else {
err = ccm_crypt_fallback(&walk, mac, buf, ctx, true);
}
if (err)
return err;
......@@ -205,38 +301,46 @@ static int ccm_decrypt(struct aead_request *req)
u8 __aligned(8) mac[AES_BLOCK_SIZE];
u8 buf[AES_BLOCK_SIZE];
u32 len = req->cryptlen - authsize;
bool use_neon = may_use_simd();
int err;
err = ccm_init_mac(req, mac, len);
if (err)
return err;
kernel_neon_begin_partial(6);
if (likely(use_neon))
kernel_neon_begin();
if (req->assoclen)
ccm_calculate_auth_mac(req, mac);
ccm_calculate_auth_mac(req, mac, use_neon);
/* preserve the original iv for the final round */
memcpy(buf, req->iv, AES_BLOCK_SIZE);
err = skcipher_walk_aead_decrypt(&walk, req, true);
while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE;
if (likely(use_neon)) {
while (walk.nbytes) {
u32 tail = walk.nbytes % AES_BLOCK_SIZE;
if (walk.nbytes == walk.total)
tail = 0;
if (walk.nbytes == walk.total)
tail = 0;
ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv);
ce_aes_ccm_decrypt(walk.dst.virt.addr,
walk.src.virt.addr,
walk.nbytes - tail, ctx->key_enc,
num_rounds(ctx), mac, walk.iv);
err = skcipher_walk_done(&walk, tail);
}
if (!err)
ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
err = skcipher_walk_done(&walk, tail);
}
if (!err)
ce_aes_ccm_final(mac, buf, ctx->key_enc,
num_rounds(ctx));
kernel_neon_end();
kernel_neon_end();
} else {
err = ccm_crypt_fallback(&walk, mac, buf, ctx, false);
}
if (err)
return err;
......
/*
* aes-ce-cipher.c - core AES cipher using ARMv8 Crypto Extensions
*
* Copyright (C) 2013 - 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -9,6 +9,8 @@
*/
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/unaligned.h>
#include <crypto/aes.h>
#include <linux/cpufeature.h>
#include <linux/crypto.h>
......@@ -20,6 +22,9 @@ MODULE_DESCRIPTION("Synchronous AES cipher using ARMv8 Crypto Extensions");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
asmlinkage void __aes_arm64_decrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
struct aes_block {
u8 b[AES_BLOCK_SIZE];
};
......@@ -44,27 +49,32 @@ static void aes_cipher_encrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
void *dummy0;
int dummy1;
kernel_neon_begin_partial(4);
if (!may_use_simd()) {
__aes_arm64_encrypt(ctx->key_enc, dst, src, num_rounds(ctx));
return;
}
kernel_neon_begin();
__asm__(" ld1 {v0.16b}, %[in] ;"
" ld1 {v1.16b}, [%[key]], #16 ;"
" ld1 {v1.4s}, [%[key]], #16 ;"
" cmp %w[rounds], #10 ;"
" bmi 0f ;"
" bne 3f ;"
" mov v3.16b, v1.16b ;"
" b 2f ;"
"0: mov v2.16b, v1.16b ;"
" ld1 {v3.16b}, [%[key]], #16 ;"
" ld1 {v3.4s}, [%[key]], #16 ;"
"1: aese v0.16b, v2.16b ;"
" aesmc v0.16b, v0.16b ;"
"2: ld1 {v1.16b}, [%[key]], #16 ;"
"2: ld1 {v1.4s}, [%[key]], #16 ;"
" aese v0.16b, v3.16b ;"
" aesmc v0.16b, v0.16b ;"
"3: ld1 {v2.16b}, [%[key]], #16 ;"
"3: ld1 {v2.4s}, [%[key]], #16 ;"
" subs %w[rounds], %w[rounds], #3 ;"
" aese v0.16b, v1.16b ;"
" aesmc v0.16b, v0.16b ;"
" ld1 {v3.16b}, [%[key]], #16 ;"
" ld1 {v3.4s}, [%[key]], #16 ;"
" bpl 1b ;"
" aese v0.16b, v2.16b ;"
" eor v0.16b, v0.16b, v3.16b ;"
......@@ -89,27 +99,32 @@ static void aes_cipher_decrypt(struct crypto_tfm *tfm, u8 dst[], u8 const src[])
void *dummy0;
int dummy1;
kernel_neon_begin_partial(4);
if (!may_use_simd()) {
__aes_arm64_decrypt(ctx->key_dec, dst, src, num_rounds(ctx));
return;
}
kernel_neon_begin();
__asm__(" ld1 {v0.16b}, %[in] ;"
" ld1 {v1.16b}, [%[key]], #16 ;"
" ld1 {v1.4s}, [%[key]], #16 ;"
" cmp %w[rounds], #10 ;"
" bmi 0f ;"
" bne 3f ;"
" mov v3.16b, v1.16b ;"
" b 2f ;"
"0: mov v2.16b, v1.16b ;"
" ld1 {v3.16b}, [%[key]], #16 ;"
" ld1 {v3.4s}, [%[key]], #16 ;"
"1: aesd v0.16b, v2.16b ;"
" aesimc v0.16b, v0.16b ;"
"2: ld1 {v1.16b}, [%[key]], #16 ;"
"2: ld1 {v1.4s}, [%[key]], #16 ;"
" aesd v0.16b, v3.16b ;"
" aesimc v0.16b, v0.16b ;"
"3: ld1 {v2.16b}, [%[key]], #16 ;"
"3: ld1 {v2.4s}, [%[key]], #16 ;"
" subs %w[rounds], %w[rounds], #3 ;"
" aesd v0.16b, v1.16b ;"
" aesimc v0.16b, v0.16b ;"
" ld1 {v3.16b}, [%[key]], #16 ;"
" ld1 {v3.4s}, [%[key]], #16 ;"
" bpl 1b ;"
" aesd v0.16b, v2.16b ;"
" eor v0.16b, v0.16b, v3.16b ;"
......@@ -165,20 +180,16 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
key_len != AES_KEYSIZE_256)
return -EINVAL;
memcpy(ctx->key_enc, in_key, key_len);
ctx->key_length = key_len;
for (i = 0; i < kwords; i++)
ctx->key_enc[i] = get_unaligned_le32(in_key + i * sizeof(u32));
kernel_neon_begin_partial(2);
kernel_neon_begin();
for (i = 0; i < sizeof(rcon); i++) {
u32 *rki = ctx->key_enc + (i * kwords);
u32 *rko = rki + kwords;
#ifndef CONFIG_CPU_BIG_ENDIAN
rko[0] = ror32(aes_sub(rki[kwords - 1]), 8) ^ rcon[i] ^ rki[0];
#else
rko[0] = rol32(aes_sub(rki[kwords - 1]), 8) ^ (rcon[i] << 24) ^
rki[0];
#endif
rko[1] = rko[0] ^ rki[1];
rko[2] = rko[1] ^ rki[2];
rko[3] = rko[2] ^ rki[3];
......@@ -210,9 +221,9 @@ int ce_aes_expandkey(struct crypto_aes_ctx *ctx, const u8 *in_key,
key_dec[0] = key_enc[j];
for (i = 1, j--; j > 0; i++, j--)
__asm__("ld1 {v0.16b}, %[in] ;"
__asm__("ld1 {v0.4s}, %[in] ;"
"aesimc v1.16b, v0.16b ;"
"st1 {v1.16b}, %[out] ;"
"st1 {v1.4s}, %[out] ;"
: [out] "=Q"(key_dec[i])
: [in] "Q"(key_enc[j])
......
......@@ -2,7 +2,7 @@
* linux/arch/arm64/crypto/aes-ce.S - AES cipher for ARMv8 with
* Crypto Extensions
*
* Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2013 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -22,11 +22,11 @@
cmp \rounds, #12
blo 2222f /* 128 bits */
beq 1111f /* 192 bits */
ld1 {v17.16b-v18.16b}, [\rk], #32
1111: ld1 {v19.16b-v20.16b}, [\rk], #32
2222: ld1 {v21.16b-v24.16b}, [\rk], #64
ld1 {v25.16b-v28.16b}, [\rk], #64
ld1 {v29.16b-v31.16b}, [\rk]
ld1 {v17.4s-v18.4s}, [\rk], #32
1111: ld1 {v19.4s-v20.4s}, [\rk], #32
2222: ld1 {v21.4s-v24.4s}, [\rk], #64
ld1 {v25.4s-v28.4s}, [\rk], #64
ld1 {v29.4s-v31.4s}, [\rk]
.endm
/* prepare for encryption with key in rk[] */
......
......@@ -10,6 +10,7 @@
#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/cache.h>
.text
......@@ -17,94 +18,155 @@
out .req x1
in .req x2
rounds .req x3
tt .req x4
lt .req x2
tt .req x2
.macro __pair, enc, reg0, reg1, in0, in1e, in1d, shift
.macro __pair1, sz, op, reg0, reg1, in0, in1e, in1d, shift
.ifc \op\shift, b0
ubfiz \reg0, \in0, #2, #8
ubfiz \reg1, \in1e, #2, #8
.else
ubfx \reg0, \in0, #\shift, #8
.if \enc
ubfx \reg1, \in1e, #\shift, #8
.else
ubfx \reg1, \in1d, #\shift, #8
.endif
/*
* AArch64 cannot do byte size indexed loads from a table containing
* 32-bit quantities, i.e., 'ldrb w12, [tt, w12, uxtw #2]' is not a
* valid instruction. So perform the shift explicitly first for the
* high bytes (the low byte is shifted implicitly by using ubfiz rather
* than ubfx above)
*/
.ifnc \op, b
ldr \reg0, [tt, \reg0, uxtw #2]
ldr \reg1, [tt, \reg1, uxtw #2]
.else
.if \shift > 0
lsl \reg0, \reg0, #2
lsl \reg1, \reg1, #2
.endif
ldrb \reg0, [tt, \reg0, uxtw]
ldrb \reg1, [tt, \reg1, uxtw]
.endif
.endm
.macro __hround, out0, out1, in0, in1, in2, in3, t0, t1, enc
.macro __pair0, sz, op, reg0, reg1, in0, in1e, in1d, shift
ubfx \reg0, \in0, #\shift, #8
ubfx \reg1, \in1d, #\shift, #8
ldr\op \reg0, [tt, \reg0, uxtw #\sz]
ldr\op \reg1, [tt, \reg1, uxtw #\sz]
.endm
.macro __hround, out0, out1, in0, in1, in2, in3, t0, t1, enc, sz, op
ldp \out0, \out1, [rk], #8
__pair \enc, w13, w14, \in0, \in1, \in3, 0
__pair \enc, w15, w16, \in1, \in2, \in0, 8
__pair \enc, w17, w18, \in2, \in3, \in1, 16
__pair \enc, \t0, \t1, \in3, \in0, \in2, 24
eor \out0, \out0, w13
eor \out1, \out1, w14
eor \out0, \out0, w15, ror #24
eor \out1, \out1, w16, ror #24
eor \out0, \out0, w17, ror #16
eor \out1, \out1, w18, ror #16
__pair\enc \sz, \op, w12, w13, \in0, \in1, \in3, 0
__pair\enc \sz, \op, w14, w15, \in1, \in2, \in0, 8
__pair\enc \sz, \op, w16, w17, \in2, \in3, \in1, 16
__pair\enc \sz, \op, \t0, \t1, \in3, \in0, \in2, 24
eor \out0, \out0, w12
eor \out1, \out1, w13
eor \out0, \out0, w14, ror #24
eor \out1, \out1, w15, ror #24
eor \out0, \out0, w16, ror #16
eor \out1, \out1, w17, ror #16
eor \out0, \out0, \t0, ror #8
eor \out1, \out1, \t1, ror #8
.endm
.macro fround, out0, out1, out2, out3, in0, in1, in2, in3
__hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1
__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1
.macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
__hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1, \sz, \op
__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op
.endm
.macro iround, out0, out1, out2, out3, in0, in1, in2, in3
__hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0
__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0
.macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
__hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0, \sz, \op
__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op
.endm
.macro do_crypt, round, ttab, ltab
ldp w5, w6, [in]
ldp w7, w8, [in, #8]
ldp w9, w10, [rk], #16
ldp w11, w12, [rk, #-8]
.macro do_crypt, round, ttab, ltab, bsz
ldp w4, w5, [in]
ldp w6, w7, [in, #8]
ldp w8, w9, [rk], #16
ldp w10, w11, [rk, #-8]
CPU_BE( rev w4, w4 )
CPU_BE( rev w5, w5 )
CPU_BE( rev w6, w6 )
CPU_BE( rev w7, w7 )
CPU_BE( rev w8, w8 )
eor w4, w4, w8
eor w5, w5, w9
eor w6, w6, w10
eor w7, w7, w11
eor w8, w8, w12
adr_l tt, \ttab
adr_l lt, \ltab
tbnz rounds, #1, 1f
0: \round w9, w10, w11, w12, w5, w6, w7, w8
\round w5, w6, w7, w8, w9, w10, w11, w12
0: \round w8, w9, w10, w11, w4, w5, w6, w7
\round w4, w5, w6, w7, w8, w9, w10, w11
1: subs rounds, rounds, #4
\round w9, w10, w11, w12, w5, w6, w7, w8
csel tt, tt, lt, hi
\round w5, w6, w7, w8, w9, w10, w11, w12
b.hi 0b
\round w8, w9, w10, w11, w4, w5, w6, w7
b.ls 3f
2: \round w4, w5, w6, w7, w8, w9, w10, w11
b 0b
3: adr_l tt, \ltab
\round w4, w5, w6, w7, w8, w9, w10, w11, \bsz, b
CPU_BE( rev w4, w4 )
CPU_BE( rev w5, w5 )
CPU_BE( rev w6, w6 )
CPU_BE( rev w7, w7 )
CPU_BE( rev w8, w8 )
stp w5, w6, [out]
stp w7, w8, [out, #8]
stp w4, w5, [out]
stp w6, w7, [out, #8]
ret
.endm
.align 5
.align L1_CACHE_SHIFT
.type __aes_arm64_inverse_sbox, %object
__aes_arm64_inverse_sbox:
.byte 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38
.byte 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb
.byte 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87
.byte 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb
.byte 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d
.byte 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e
.byte 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2
.byte 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25
.byte 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16
.byte 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92
.byte 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda
.byte 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84
.byte 0x90, 0xd8, 0xab, 0x00, 0x8c, 0xbc, 0xd3, 0x0a
.byte 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06
.byte 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02
.byte 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b
.byte 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea
.byte 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73
.byte 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85
.byte 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e
.byte 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89
.byte 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b
.byte 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20
.byte 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4
.byte 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31
.byte 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f
.byte 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d
.byte 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef
.byte 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0
.byte 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61
.byte 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26
.byte 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
.size __aes_arm64_inverse_sbox, . - __aes_arm64_inverse_sbox
ENTRY(__aes_arm64_encrypt)
do_crypt fround, crypto_ft_tab, crypto_fl_tab
do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
ENDPROC(__aes_arm64_encrypt)
.align 5
ENTRY(__aes_arm64_decrypt)
do_crypt iround, crypto_it_tab, crypto_il_tab
do_crypt iround, crypto_it_tab, __aes_arm64_inverse_sbox, 0
ENDPROC(__aes_arm64_decrypt)
/*
* Fallback for sync aes(ctr) in contexts where kernel mode NEON
* is not allowed
*
* Copyright (C) 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <crypto/aes.h>
#include <crypto/internal/skcipher.h>
asmlinkage void __aes_arm64_encrypt(u32 *rk, u8 *out, const u8 *in, int rounds);
static inline int aes_ctr_encrypt_fallback(struct crypto_aes_ctx *ctx,
struct skcipher_request *req)
{
struct skcipher_walk walk;
u8 buf[AES_BLOCK_SIZE];
int err;
err = skcipher_walk_virt(&walk, req, true);
while (walk.nbytes > 0) {
u8 *dst = walk.dst.virt.addr;
u8 *src = walk.src.virt.addr;
int nbytes = walk.nbytes;
int tail = 0;
if (nbytes < walk.total) {
nbytes = round_down(nbytes, AES_BLOCK_SIZE);
tail = walk.nbytes % AES_BLOCK_SIZE;
}
do {
int bsize = min(nbytes, AES_BLOCK_SIZE);
__aes_arm64_encrypt(ctx->key_enc, buf, walk.iv,
6 + ctx->key_length / 4);
crypto_xor_cpy(dst, src, buf, bsize);
crypto_inc(walk.iv, AES_BLOCK_SIZE);
dst += AES_BLOCK_SIZE;
src += AES_BLOCK_SIZE;
nbytes -= AES_BLOCK_SIZE;
} while (nbytes > 0);
err = skcipher_walk_done(&walk, tail);
}
return err;
}
......@@ -10,6 +10,7 @@
#include <asm/neon.h>
#include <asm/hwcap.h>
#include <asm/simd.h>
#include <crypto/aes.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
......@@ -19,6 +20,7 @@
#include <crypto/xts.h>
#include "aes-ce-setkey.h"
#include "aes-ctr-fallback.h"
#ifdef USE_V8_CRYPTO_EXTENSIONS
#define MODE "ce"
......@@ -241,9 +243,7 @@ static int ctr_encrypt(struct skcipher_request *req)
aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds,
blocks, walk.iv, first);
if (tdst != tsrc)
memcpy(tdst, tsrc, nbytes);
crypto_xor(tdst, tail, nbytes);
crypto_xor_cpy(tdst, tsrc, tail, nbytes);
err = skcipher_walk_done(&walk, 0);
}
kernel_neon_end();
......@@ -251,6 +251,17 @@ static int ctr_encrypt(struct skcipher_request *req)
return err;
}
static int ctr_encrypt_sync(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
if (!may_use_simd())
return aes_ctr_encrypt_fallback(ctx, req);
return ctr_encrypt(req);
}
static int xts_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
......@@ -357,8 +368,8 @@ static struct skcipher_alg aes_algs[] = { {
.ivsize = AES_BLOCK_SIZE,
.chunksize = AES_BLOCK_SIZE,
.setkey = skcipher_aes_setkey,
.encrypt = ctr_encrypt,
.decrypt = ctr_encrypt,
.encrypt = ctr_encrypt_sync,
.decrypt = ctr_encrypt_sync,
}, {
.base = {
.cra_name = "__xts(aes)",
......@@ -460,11 +471,35 @@ static int mac_init(struct shash_desc *desc)
return 0;
}
static void mac_do_update(struct crypto_aes_ctx *ctx, u8 const in[], int blocks,
u8 dg[], int enc_before, int enc_after)
{
int rounds = 6 + ctx->key_length / 4;
if (may_use_simd()) {
kernel_neon_begin();
aes_mac_update(in, ctx->key_enc, rounds, blocks, dg, enc_before,
enc_after);
kernel_neon_end();
} else {
if (enc_before)
__aes_arm64_encrypt(ctx->key_enc, dg, dg, rounds);
while (blocks--) {
crypto_xor(dg, in, AES_BLOCK_SIZE);
in += AES_BLOCK_SIZE;
if (blocks || enc_after)
__aes_arm64_encrypt(ctx->key_enc, dg, dg,
rounds);
}
}
}
static int mac_update(struct shash_desc *desc, const u8 *p, unsigned int len)
{
struct mac_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
struct mac_desc_ctx *ctx = shash_desc_ctx(desc);
int rounds = 6 + tctx->key.key_length / 4;
while (len > 0) {
unsigned int l;
......@@ -476,10 +511,8 @@ static int mac_update(struct shash_desc *desc, const u8 *p, unsigned int len)
len %= AES_BLOCK_SIZE;
kernel_neon_begin();
aes_mac_update(p, tctx->key.key_enc, rounds, blocks,
ctx->dg, (ctx->len != 0), (len != 0));
kernel_neon_end();
mac_do_update(&tctx->key, p, blocks, ctx->dg,
(ctx->len != 0), (len != 0));
p += blocks * AES_BLOCK_SIZE;
......@@ -507,11 +540,8 @@ static int cbcmac_final(struct shash_desc *desc, u8 *out)
{
struct mac_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
struct mac_desc_ctx *ctx = shash_desc_ctx(desc);
int rounds = 6 + tctx->key.key_length / 4;
kernel_neon_begin();
aes_mac_update(NULL, tctx->key.key_enc, rounds, 0, ctx->dg, 1, 0);
kernel_neon_end();
mac_do_update(&tctx->key, NULL, 0, ctx->dg, 1, 0);
memcpy(out, ctx->dg, AES_BLOCK_SIZE);
......@@ -522,7 +552,6 @@ static int cmac_final(struct shash_desc *desc, u8 *out)
{
struct mac_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
struct mac_desc_ctx *ctx = shash_desc_ctx(desc);
int rounds = 6 + tctx->key.key_length / 4;
u8 *consts = tctx->consts;
if (ctx->len != AES_BLOCK_SIZE) {
......@@ -530,9 +559,7 @@ static int cmac_final(struct shash_desc *desc, u8 *out)
consts += AES_BLOCK_SIZE;
}
kernel_neon_begin();
aes_mac_update(consts, tctx->key.key_enc, rounds, 1, ctx->dg, 0, 1);
kernel_neon_end();
mac_do_update(&tctx->key, consts, 1, ctx->dg, 0, 1);
memcpy(out, ctx->dg, AES_BLOCK_SIZE);
......
/*
* Bit sliced AES using NEON instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2016 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -9,12 +9,15 @@
*/
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/aes.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>
#include <crypto/xts.h>
#include <linux/module.h>
#include "aes-ctr-fallback.h"
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
......@@ -58,6 +61,11 @@ struct aesbs_cbc_ctx {
u32 enc[AES_MAX_KEYLENGTH_U32];
};
struct aesbs_ctr_ctx {
struct aesbs_ctx key; /* must be first member */
struct crypto_aes_ctx fallback;
};
struct aesbs_xts_ctx {
struct aesbs_ctx key;
u32 twkey[AES_MAX_KEYLENGTH_U32];
......@@ -196,6 +204,25 @@ static int cbc_decrypt(struct skcipher_request *req)
return err;
}
static int aesbs_ctr_setkey_sync(struct crypto_skcipher *tfm, const u8 *in_key,
unsigned int key_len)
{
struct aesbs_ctr_ctx *ctx = crypto_skcipher_ctx(tfm);
int err;
err = crypto_aes_expand_key(&ctx->fallback, in_key, key_len);
if (err)
return err;
ctx->key.rounds = 6 + key_len / 4;
kernel_neon_begin();
aesbs_convert_key(ctx->key.rk, ctx->fallback.key_enc, ctx->key.rounds);
kernel_neon_end();
return 0;
}
static int ctr_encrypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
......@@ -224,9 +251,8 @@ static int ctr_encrypt(struct skcipher_request *req)
u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE;
u8 *src = walk.src.virt.addr + blocks * AES_BLOCK_SIZE;
if (dst != src)
memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);
crypto_xor_cpy(dst, src, final,
walk.total % AES_BLOCK_SIZE);
err = skcipher_walk_done(&walk, 0);
break;
......@@ -260,6 +286,17 @@ static int aesbs_xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
return aesbs_setkey(tfm, in_key, key_len);
}
static int ctr_encrypt_sync(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct aesbs_ctr_ctx *ctx = crypto_skcipher_ctx(tfm);
if (!may_use_simd())
return aes_ctr_encrypt_fallback(&ctx->fallback, req);
return ctr_encrypt(req);
}
static int __xts_crypt(struct skcipher_request *req,
void (*fn)(u8 out[], u8 const in[], u8 const rk[],
int rounds, int blocks, u8 iv[]))
......@@ -356,7 +393,7 @@ static struct skcipher_alg aes_algs[] = { {
.base.cra_driver_name = "ctr-aes-neonbs",
.base.cra_priority = 250 - 1,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct aesbs_ctx),
.base.cra_ctxsize = sizeof(struct aesbs_ctr_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = AES_MIN_KEY_SIZE,
......@@ -364,9 +401,9 @@ static struct skcipher_alg aes_algs[] = { {
.chunksize = AES_BLOCK_SIZE,
.walksize = 8 * AES_BLOCK_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = aesbs_setkey,
.encrypt = ctr_encrypt,
.decrypt = ctr_encrypt,
.setkey = aesbs_ctr_setkey_sync,
.encrypt = ctr_encrypt_sync,
.decrypt = ctr_encrypt_sync,
}, {
.base.cra_name = "__xts(aes)",
.base.cra_driver_name = "__xts-aes-neonbs",
......
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
*
* Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
* Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -26,6 +26,7 @@
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
......@@ -64,7 +65,7 @@ static int chacha20_neon(struct skcipher_request *req)
u32 state[16];
int err;
if (req->cryptlen <= CHACHA20_BLOCK_SIZE)
if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
return crypto_chacha20_crypt(req);
err = skcipher_walk_virt(&walk, req, true);
......
/*
* Accelerated CRC32(C) using arm64 NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2016 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -19,6 +19,7 @@
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/unaligned.h>
#define PMULL_MIN_LEN 64L /* minimum size of buffer
......@@ -105,10 +106,10 @@ static int crc32_pmull_update(struct shash_desc *desc, const u8 *data,
length -= l;
}
if (length >= PMULL_MIN_LEN) {
if (length >= PMULL_MIN_LEN && may_use_simd()) {
l = round_down(length, SCALE_F);
kernel_neon_begin_partial(10);
kernel_neon_begin();
*crc = crc32_pmull_le(data, l, *crc);
kernel_neon_end();
......@@ -137,10 +138,10 @@ static int crc32c_pmull_update(struct shash_desc *desc, const u8 *data,
length -= l;
}
if (length >= PMULL_MIN_LEN) {
if (length >= PMULL_MIN_LEN && may_use_simd()) {
l = round_down(length, SCALE_F);
kernel_neon_begin_partial(10);
kernel_neon_begin();
*crc = crc32c_pmull_le(data, l, *crc);
kernel_neon_end();
......
/*
* Accelerated CRC-T10DIF using arm64 NEON and Crypto Extensions instructions
*
* Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2016 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -18,6 +18,7 @@
#include <crypto/internal/hash.h>
#include <asm/neon.h>
#include <asm/simd.h>
#define CRC_T10DIF_PMULL_CHUNK_SIZE 16U
......@@ -48,9 +49,13 @@ static int crct10dif_update(struct shash_desc *desc, const u8 *data,
}
if (length > 0) {
kernel_neon_begin_partial(14);
*crc = crc_t10dif_pmull(*crc, data, length);
kernel_neon_end();
if (may_use_simd()) {
kernel_neon_begin();
*crc = crc_t10dif_pmull(*crc, data, length);
kernel_neon_end();
} else {
*crc = crc_t10dif_generic(*crc, data, length);
}
}
return 0;
......
This diff is collapsed.
This diff is collapsed.
/*
* sha1-ce-glue.c - SHA-1 secure hash using ARMv8 Crypto Extensions
*
* Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2014 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -9,6 +9,7 @@
*/
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/unaligned.h>
#include <crypto/internal/hash.h>
#include <crypto/sha.h>
......@@ -37,8 +38,11 @@ static int sha1_ce_update(struct shash_desc *desc, const u8 *data,
{
struct sha1_ce_state *sctx = shash_desc_ctx(desc);
if (!may_use_simd())
return crypto_sha1_update(desc, data, len);
sctx->finalize = 0;
kernel_neon_begin_partial(16);
kernel_neon_begin();
sha1_base_do_update(desc, data, len,
(sha1_block_fn *)sha1_ce_transform);
kernel_neon_end();
......@@ -52,13 +56,16 @@ static int sha1_ce_finup(struct shash_desc *desc, const u8 *data,
struct sha1_ce_state *sctx = shash_desc_ctx(desc);
bool finalize = !sctx->sst.count && !(len % SHA1_BLOCK_SIZE);
if (!may_use_simd())
return crypto_sha1_finup(desc, data, len, out);
/*
* Allow the asm code to perform the finalization if there is no
* partial data and the input is a round multiple of the block size.
*/
sctx->finalize = finalize;
kernel_neon_begin_partial(16);
kernel_neon_begin();
sha1_base_do_update(desc, data, len,
(sha1_block_fn *)sha1_ce_transform);
if (!finalize)
......@@ -71,8 +78,11 @@ static int sha1_ce_final(struct shash_desc *desc, u8 *out)
{
struct sha1_ce_state *sctx = shash_desc_ctx(desc);
if (!may_use_simd())
return crypto_sha1_finup(desc, NULL, 0, out);
sctx->finalize = 0;
kernel_neon_begin_partial(16);
kernel_neon_begin();
sha1_base_do_finalize(desc, (sha1_block_fn *)sha1_ce_transform);
kernel_neon_end();
return sha1_base_finish(desc, out);
......
/*
* sha2-ce-glue.c - SHA-224/SHA-256 using ARMv8 Crypto Extensions
*
* Copyright (C) 2014 Linaro Ltd <ard.biesheuvel@linaro.org>
* Copyright (C) 2014 - 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
......@@ -9,6 +9,7 @@
*/
#include <asm/neon.h>
#include <asm/simd.h>
#include <asm/unaligned.h>
#include <crypto/internal/hash.h>
#include <crypto/sha.h>
......@@ -34,13 +35,19 @@ const u32 sha256_ce_offsetof_count = offsetof(struct sha256_ce_state,
const u32 sha256_ce_offsetof_finalize = offsetof(struct sha256_ce_state,
finalize);
asmlinkage void sha256_block_data_order(u32 *digest, u8 const *src, int blocks);
static int sha256_ce_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
struct sha256_ce_state *sctx = shash_desc_ctx(desc);
if (!may_use_simd())
return sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
sctx->finalize = 0;
kernel_neon_begin_partial(28);
kernel_neon_begin();
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha2_ce_transform);
kernel_neon_end();
......@@ -54,13 +61,22 @@ static int sha256_ce_finup(struct shash_desc *desc, const u8 *data,
struct sha256_ce_state *sctx = shash_desc_ctx(desc);
bool finalize = !sctx->sst.count && !(len % SHA256_BLOCK_SIZE);
if (!may_use_simd()) {
if (len)
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha256_block_data_order);
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_data_order);
return sha256_base_finish(desc, out);
}
/*
* Allow the asm code to perform the finalization if there is no
* partial data and the input is a round multiple of the block size.
*/
sctx->finalize = finalize;
kernel_neon_begin_partial(28);
kernel_neon_begin();
sha256_base_do_update(desc, data, len,
(sha256_block_fn *)sha2_ce_transform);
if (!finalize)
......@@ -74,8 +90,14 @@ static int sha256_ce_final(struct shash_desc *desc, u8 *out)
{
struct sha256_ce_state *sctx = shash_desc_ctx(desc);
if (!may_use_simd()) {
sha256_base_do_finalize(desc,
(sha256_block_fn *)sha256_block_data_order);
return sha256_base_finish(desc, out);
}
sctx->finalize = 0;
kernel_neon_begin_partial(28);
kernel_neon_begin();
sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
kernel_neon_end();
return sha256_base_finish(desc, out);
......
......@@ -29,6 +29,7 @@ MODULE_ALIAS_CRYPTO("sha256");
asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
unsigned int num_blks);
EXPORT_SYMBOL(sha256_block_data_order);
asmlinkage void sha256_block_neon(u32 *digest, const void *data,
unsigned int num_blks);
......
......@@ -344,8 +344,7 @@ static void ctr_crypt_final(struct crypto_sparc64_aes_ctx *ctx,
ctx->ops->ecb_encrypt(&ctx->key[0], (const u64 *)ctrblk,
keystream, AES_BLOCK_SIZE);
crypto_xor((u8 *) keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
crypto_xor_cpy(dst, (u8 *) keystream, src, nbytes);
crypto_inc(ctrblk, AES_BLOCK_SIZE);
}
......
......@@ -475,8 +475,8 @@ static void ctr_crypt_final(struct crypto_aes_ctx *ctx,
unsigned int nbytes = walk->nbytes;
aesni_enc(ctx, keystream, ctrblk);
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
crypto_xor_cpy(dst, keystream, src, nbytes);
crypto_inc(ctrblk, AES_BLOCK_SIZE);
}
......
......@@ -271,8 +271,7 @@ static void ctr_crypt_final(struct bf_ctx *ctx, struct blkcipher_walk *walk)
unsigned int nbytes = walk->nbytes;
blowfish_enc_blk(ctx, keystream, ctrblk);
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
crypto_xor_cpy(dst, keystream, src, nbytes);
crypto_inc(ctrblk, BF_BLOCK_SIZE);
}
......
......@@ -256,8 +256,7 @@ static void ctr_crypt_final(struct blkcipher_desc *desc,
unsigned int nbytes = walk->nbytes;
__cast5_encrypt(ctx, keystream, ctrblk);
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
crypto_xor_cpy(dst, keystream, src, nbytes);
crypto_inc(ctrblk, CAST5_BLOCK_SIZE);
}
......
......@@ -277,8 +277,7 @@ static void ctr_crypt_final(struct des3_ede_x86_ctx *ctx,
unsigned int nbytes = walk->nbytes;
des3_ede_enc_blk(ctx, keystream, ctrblk);
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
crypto_xor_cpy(dst, keystream, src, nbytes);
crypto_inc(ctrblk, DES3_EDE_BLOCK_SIZE);
}
......
......@@ -1753,6 +1753,8 @@ config CRYPTO_USER_API_AEAD
tristate "User-space interface for AEAD cipher algorithms"
depends on NET
select CRYPTO_AEAD
select CRYPTO_BLKCIPHER
select CRYPTO_NULL
select CRYPTO_USER_API
help
This option enables the user-spaces interface for AEAD
......
This diff is collapsed.
......@@ -588,6 +588,35 @@ int crypto_unregister_ahash(struct ahash_alg *alg)
}
EXPORT_SYMBOL_GPL(crypto_unregister_ahash);
int crypto_register_ahashes(struct ahash_alg *algs, int count)
{
int i, ret;
for (i = 0; i < count; i++) {
ret = crypto_register_ahash(&algs[i]);
if (ret)
goto err;
}
return 0;
err:
for (--i; i >= 0; --i)
crypto_unregister_ahash(&algs[i]);
return ret;
}
EXPORT_SYMBOL_GPL(crypto_register_ahashes);
void crypto_unregister_ahashes(struct ahash_alg *algs, int count)
{
int i;
for (i = count - 1; i >= 0; --i)
crypto_unregister_ahash(&algs[i]);
}
EXPORT_SYMBOL_GPL(crypto_unregister_ahashes);
int ahash_register_instance(struct crypto_template *tmpl,
struct ahash_instance *inst)
{
......
......@@ -975,13 +975,15 @@ void crypto_inc(u8 *a, unsigned int size)
}
EXPORT_SYMBOL_GPL(crypto_inc);
void __crypto_xor(u8 *dst, const u8 *src, unsigned int len)
void __crypto_xor(u8 *dst, const u8 *src1, const u8 *src2, unsigned int len)
{
int relalign = 0;
if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
int size = sizeof(unsigned long);
int d = ((unsigned long)dst ^ (unsigned long)src) & (size - 1);
int d = (((unsigned long)dst ^ (unsigned long)src1) |
((unsigned long)dst ^ (unsigned long)src2)) &
(size - 1);
relalign = d ? 1 << __ffs(d) : size;
......@@ -992,34 +994,37 @@ void __crypto_xor(u8 *dst, const u8 *src, unsigned int len)
* process the remainder of the input using optimal strides.
*/
while (((unsigned long)dst & (relalign - 1)) && len > 0) {
*dst++ ^= *src++;
*dst++ = *src1++ ^ *src2++;
len--;
}
}
while (IS_ENABLED(CONFIG_64BIT) && len >= 8 && !(relalign & 7)) {
*(u64 *)dst ^= *(u64 *)src;
*(u64 *)dst = *(u64 *)src1 ^ *(u64 *)src2;
dst += 8;
src += 8;
src1 += 8;
src2 += 8;
len -= 8;
}
while (len >= 4 && !(relalign & 3)) {
*(u32 *)dst ^= *(u32 *)src;
*(u32 *)dst = *(u32 *)src1 ^ *(u32 *)src2;
dst += 4;
src += 4;
src1 += 4;
src2 += 4;
len -= 4;
}
while (len >= 2 && !(relalign & 1)) {
*(u16 *)dst ^= *(u16 *)src;
*(u16 *)dst = *(u16 *)src1 ^ *(u16 *)src2;
dst += 2;
src += 2;
src1 += 2;
src2 += 2;
len -= 2;
}
while (len--)
*dst++ ^= *src++;
*dst++ = *src1++ ^ *src2++;
}
EXPORT_SYMBOL_GPL(__crypto_xor);
......
This diff is collapsed.
This diff is collapsed.
......@@ -65,8 +65,7 @@ static void crypto_ctr_crypt_final(struct blkcipher_walk *walk,
unsigned int nbytes = walk->nbytes;
crypto_cipher_encrypt_one(tfm, keystream, ctrblk);
crypto_xor(keystream, src, nbytes);
memcpy(dst, keystream, nbytes);
crypto_xor_cpy(dst, keystream, src, nbytes);
crypto_inc(ctrblk, bsize);
}
......
......@@ -20,8 +20,6 @@ struct ecdh_ctx {
unsigned int curve_id;
unsigned int ndigits;
u64 private_key[ECC_MAX_DIGITS];
u64 public_key[2 * ECC_MAX_DIGITS];
u64 shared_secret[ECC_MAX_DIGITS];
};
static inline struct ecdh_ctx *ecdh_get_ctx(struct crypto_kpp *tfm)
......@@ -70,41 +68,58 @@ static int ecdh_set_secret(struct crypto_kpp *tfm, const void *buf,
static int ecdh_compute_value(struct kpp_request *req)
{
int ret = 0;
struct crypto_kpp *tfm = crypto_kpp_reqtfm(req);
struct ecdh_ctx *ctx = ecdh_get_ctx(tfm);
size_t copied, nbytes;
u64 *public_key;
u64 *shared_secret = NULL;
void *buf;
size_t copied, nbytes, public_key_sz;
int ret = -ENOMEM;
nbytes = ctx->ndigits << ECC_DIGITS_TO_BYTES_SHIFT;
/* Public part is a point thus it has both coordinates */
public_key_sz = 2 * nbytes;
public_key = kmalloc(public_key_sz, GFP_KERNEL);
if (!public_key)
return -ENOMEM;
if (req->src) {
copied = sg_copy_to_buffer(req->src, 1, ctx->public_key,
2 * nbytes);
if (copied != 2 * nbytes)
return -EINVAL;
shared_secret = kmalloc(nbytes, GFP_KERNEL);
if (!shared_secret)
goto free_pubkey;
copied = sg_copy_to_buffer(req->src, 1, public_key,
public_key_sz);
if (copied != public_key_sz) {
ret = -EINVAL;
goto free_all;
}
ret = crypto_ecdh_shared_secret(ctx->curve_id, ctx->ndigits,
ctx->private_key,
ctx->public_key,
ctx->shared_secret);
ctx->private_key, public_key,
shared_secret);
buf = ctx->shared_secret;
buf = shared_secret;
} else {
ret = ecc_make_pub_key(ctx->curve_id, ctx->ndigits,
ctx->private_key, ctx->public_key);
buf = ctx->public_key;
/* Public part is a point thus it has both coordinates */
nbytes *= 2;
ctx->private_key, public_key);
buf = public_key;
nbytes = public_key_sz;
}
if (ret < 0)
return ret;
goto free_all;
copied = sg_copy_from_buffer(req->dst, 1, buf, nbytes);
if (copied != nbytes)
return -EINVAL;
ret = -EINVAL;
/* fall through */
free_all:
kzfree(shared_secret);
free_pubkey:
kfree(public_key);
return ret;
}
......
......@@ -55,8 +55,7 @@ static int crypto_pcbc_encrypt_segment(struct skcipher_request *req,
do {
crypto_xor(iv, src, bsize);
crypto_cipher_encrypt_one(tfm, dst, iv);
memcpy(iv, dst, bsize);
crypto_xor(iv, src, bsize);
crypto_xor_cpy(iv, dst, src, bsize);
src += bsize;
dst += bsize;
......@@ -79,8 +78,7 @@ static int crypto_pcbc_encrypt_inplace(struct skcipher_request *req,
memcpy(tmpbuf, src, bsize);
crypto_xor(iv, src, bsize);
crypto_cipher_encrypt_one(tfm, src, iv);
memcpy(iv, tmpbuf, bsize);
crypto_xor(iv, src, bsize);
crypto_xor_cpy(iv, tmpbuf, src, bsize);
src += bsize;
} while ((nbytes -= bsize) >= bsize);
......@@ -127,8 +125,7 @@ static int crypto_pcbc_decrypt_segment(struct skcipher_request *req,
do {
crypto_cipher_decrypt_one(tfm, dst, src);
crypto_xor(dst, iv, bsize);
memcpy(iv, src, bsize);
crypto_xor(iv, dst, bsize);
crypto_xor_cpy(iv, dst, src, bsize);
src += bsize;
dst += bsize;
......@@ -153,8 +150,7 @@ static int crypto_pcbc_decrypt_inplace(struct skcipher_request *req,
memcpy(tmpbuf, src, bsize);
crypto_cipher_decrypt_one(tfm, src, src);
crypto_xor(src, iv, bsize);
memcpy(iv, tmpbuf, bsize);
crypto_xor(iv, src, bsize);
crypto_xor_cpy(iv, src, tmpbuf, bsize);
src += bsize;
} while ((nbytes -= bsize) >= bsize);
......
......@@ -43,12 +43,14 @@ int crypto_rng_reset(struct crypto_rng *tfm, const u8 *seed, unsigned int slen)
if (!buf)
return -ENOMEM;
get_random_bytes(buf, slen);
err = get_random_bytes_wait(buf, slen);
if (err)
goto out;
seed = buf;
}
err = crypto_rng_alg(tfm)->seed(tfm, seed, slen);
out:
kzfree(buf);
return err;
}
......
This diff is collapsed.
This diff is collapsed.
......@@ -1404,9 +1404,9 @@ static int do_test(const char *alg, u32 type, u32 mask, int m)
test_cipher_speed("lrw(aes)", DECRYPT, sec, NULL, 0,
speed_template_32_40_48);
test_cipher_speed("xts(aes)", ENCRYPT, sec, NULL, 0,
speed_template_32_48_64);
speed_template_32_64);
test_cipher_speed("xts(aes)", DECRYPT, sec, NULL, 0,
speed_template_32_48_64);
speed_template_32_64);
test_cipher_speed("cts(cbc(aes))", ENCRYPT, sec, NULL, 0,
speed_template_16_24_32);
test_cipher_speed("cts(cbc(aes))", DECRYPT, sec, NULL, 0,
......@@ -1837,9 +1837,9 @@ static int do_test(const char *alg, u32 type, u32 mask, int m)
test_acipher_speed("lrw(aes)", DECRYPT, sec, NULL, 0,
speed_template_32_40_48);
test_acipher_speed("xts(aes)", ENCRYPT, sec, NULL, 0,
speed_template_32_48_64);
speed_template_32_64);
test_acipher_speed("xts(aes)", DECRYPT, sec, NULL, 0,
speed_template_32_48_64);
speed_template_32_64);
test_acipher_speed("cts(cbc(aes))", ENCRYPT, sec, NULL, 0,
speed_template_16_24_32);
test_acipher_speed("cts(cbc(aes))", DECRYPT, sec, NULL, 0,
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
obj-$(CONFIG_CRYPTO_DEV_ARTPEC6) := artpec6_crypto.o
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment