Commit 733f7e9c authored by Linus Torvalds

Merge tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Total usage stats now include all requests that returned errors
     (instead of just some)
   - Remove maximum hash statesize limit
   - Add cloning support for hmac and unkeyed hashes (see the usage sketch
     below the commit log)
   - Demote BUG_ON in crypto_unregister_alg to a WARN_ON

  Algorithms:
   - Use RIP-relative addressing on x86 to prepare for PIE build
   - Add accelerated AES/GCM stitched implementation on powerpc P10
   - Add some test vectors for cmac(camellia)
   - Remove failure case where jent is unavailable outside of FIPS mode
     in drbg
   - Add permanent and intermittent health error checks in jitter RNG

  Drivers:
   - Add support for 402xx devices in qat
   - Add support for HiSTB TRNG
   - Fix hash concurrency issues in stm32
   - Add OP-TEE firmware support in caam"

* tag 'v6.4-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (139 commits)
  i2c: designware: Add doorbell support for Mendocino
  i2c: designware: Use PCI PSP driver for communication
  powerpc: Move Power10 feature PPC_MODULE_FEATURE_P10
  crypto: p10-aes-gcm - Remove POWER10_CPU dependency
  crypto: testmgr - Add some test vectors for cmac(camellia)
  crypto: cryptd - Add support for cloning hashes
  crypto: cryptd - Convert hash to use modern init_tfm/exit_tfm
  crypto: hmac - Add support for cloning
  crypto: hash - Add crypto_clone_ahash/shash
  crypto: api - Add crypto_clone_tfm
  crypto: api - Add crypto_tfm_get
  crypto: x86/sha - Use local .L symbols for code
  crypto: x86/crc32 - Use local .L symbols for code
  crypto: x86/aesni - Use local .L symbols for code
  crypto: x86/sha256 - Use RIP-relative addressing
  crypto: x86/ghash - Use RIP-relative addressing
  crypto: x86/des3 - Use RIP-relative addressing
  crypto: x86/crc32c - Use RIP-relative addressing
  crypto: x86/cast6 - Use RIP-relative addressing
  crypto: x86/cast5 - Use RIP-relative addressing
  ...
parents 98f99e67 482c84e9
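
As context for the hashing bullets above: the hmac/unkeyed-hash cloning work ("crypto: hash - Add crypto_clone_ahash/shash") lets a live, already-keyed transform be duplicated instead of allocating a fresh one and repeating setkey. The fragment below is only a usage sketch; the wrapper name and error handling are our own illustration, and the only interface assumed from this series is crypto_clone_ahash() returning either a new tfm or an ERR_PTR.

#include <crypto/hash.h>
#include <linux/err.h>

/*
 * Sketch: duplicate an already-keyed hmac ahash transform.  The clone is
 * an independent tfm that inherits the parent's key, so the caller does
 * not need to keep the key material around just to call
 * crypto_ahash_setkey() a second time.
 */
static int clone_keyed_hmac(struct crypto_ahash *parent,
			    struct crypto_ahash **clone_out)
{
	struct crypto_ahash *clone;

	clone = crypto_clone_ahash(parent);	/* API added by this pull */
	if (IS_ERR(clone))
		return PTR_ERR(clone);

	*clone_out = clone;	/* release later with crypto_free_ahash() */
	return 0;
}

The clone is intended to be usable from another context (e.g. cryptd or per-request state) without a second setkey round-trip, which is the point of the hmac and unkeyed-hash cloning commits listed above.
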
Qualcomm crypto engine driver

Required properties:

- compatible  : should be "qcom,crypto-v5.1"
- reg         : specifies base physical address and size of the registers map
- clocks      : phandle to clock-controller plus clock-specifier pair
- clock-names : "iface" clocks register interface
                "bus" clocks data transfer interface
                "core" clocks rest of the crypto block
- dmas        : DMA specifiers for tx and rx dma channels. For more see
                Documentation/devicetree/bindings/dma/dma.txt
- dma-names   : DMA request names should be "rx" and "tx"

Example:
	crypto@fd45a000 {
		compatible = "qcom,crypto-v5.1";
		reg = <0xfd45a000 0x6000>;
		clocks = <&gcc GCC_CE2_AHB_CLK>,
			 <&gcc GCC_CE2_AXI_CLK>,
			 <&gcc GCC_CE2_CLK>;
		clock-names = "iface", "bus", "core";
		dmas = <&cryptobam 2>, <&cryptobam 3>;
		dma-names = "rx", "tx";
	};
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/crypto/qcom-qce.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Qualcomm crypto engine driver

maintainers:
  - Bhupesh Sharma <bhupesh.sharma@linaro.org>

description:
  This document defines the binding for the QCE crypto
  controller found on Qualcomm parts.

properties:
  compatible:
    oneOf:
      - const: qcom,crypto-v5.1
        deprecated: true
        description: Kept only for ABI backward compatibility

      - const: qcom,crypto-v5.4
        deprecated: true
        description: Kept only for ABI backward compatibility

      - items:
          - enum:
              - qcom,ipq6018-qce
              - qcom,ipq8074-qce
              - qcom,msm8996-qce
              - qcom,sdm845-qce
          - const: qcom,ipq4019-qce
          - const: qcom,qce

      - items:
          - enum:
              - qcom,sm8250-qce
              - qcom,sm8350-qce
              - qcom,sm8450-qce
              - qcom,sm8550-qce
          - const: qcom,sm8150-qce
          - const: qcom,qce

  reg:
    maxItems: 1

  clocks:
    items:
      - description: iface clocks register interface.
      - description: bus clocks data transfer interface.
      - description: core clocks rest of the crypto block.

  clock-names:
    items:
      - const: iface
      - const: bus
      - const: core

  iommus:
    minItems: 1
    maxItems: 8
    description:
      phandle to apps_smmu node with sid mask.

  interconnects:
    maxItems: 1
    description:
      Interconnect path between qce crypto and main memory.

  interconnect-names:
    const: memory

  dmas:
    items:
      - description: DMA specifiers for rx dma channel.
      - description: DMA specifiers for tx dma channel.

  dma-names:
    items:
      - const: rx
      - const: tx

allOf:
  - if:
      properties:
        compatible:
          contains:
            enum:
              - qcom,crypto-v5.1
              - qcom,crypto-v5.4
              - qcom,ipq4019-qce
    then:
      required:
        - clocks
        - clock-names

required:
  - compatible
  - reg
  - dmas
  - dma-names

additionalProperties: false

examples:
  - |
    #include <dt-bindings/clock/qcom,gcc-apq8084.h>
    crypto-engine@fd45a000 {
        compatible = "qcom,ipq6018-qce", "qcom,ipq4019-qce", "qcom,qce";
        reg = <0xfd45a000 0x6000>;
        clocks = <&gcc GCC_CE2_AHB_CLK>,
                 <&gcc GCC_CE2_AXI_CLK>,
                 <&gcc GCC_CE2_CLK>;
        clock-names = "iface", "bus", "core";
        dmas = <&cryptobam 2>, <&cryptobam 3>;
        dma-names = "rx", "tx";
        iommus = <&apps_smmu 0x584 0x0011>,
                 <&apps_smmu 0x586 0x0011>,
                 <&apps_smmu 0x594 0x0011>,
                 <&apps_smmu 0x596 0x0011>;
    };
......@@ -2269,7 +2269,7 @@ F: arch/arm/boot/dts/intel-ixp*
F: arch/arm/mach-ixp4xx/
F: drivers/bus/intel-ixp4xx-eb.c
F: drivers/clocksource/timer-ixp4xx.c
F: drivers/crypto/ixp4xx_crypto.c
F: drivers/crypto/intel/ixp4xx/ixp4xx_crypto.c
F: drivers/gpio/gpio-ixp4xx.c
F: drivers/irqchip/irq-ixp4xx.c
......@@ -10391,7 +10391,7 @@ INTEL IXP4XX CRYPTO SUPPORT
M: Corentin Labbe <clabbe@baylibre.com>
L: linux-crypto@vger.kernel.org
S: Maintained
F: drivers/crypto/ixp4xx_crypto.c
F: drivers/crypto/intel/ixp4xx/ixp4xx_crypto.c
INTEL ISHTP ECLITE DRIVER
M: Sumesh K Naduvalath <sumesh.k.naduvalath@intel.com>
......@@ -10426,11 +10426,11 @@ INTEL KEEM BAY OCS AES/SM4 CRYPTO DRIVER
M: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
S: Maintained
F: Documentation/devicetree/bindings/crypto/intel,keembay-ocs-aes.yaml
F: drivers/crypto/keembay/Kconfig
F: drivers/crypto/keembay/Makefile
F: drivers/crypto/keembay/keembay-ocs-aes-core.c
F: drivers/crypto/keembay/ocs-aes.c
F: drivers/crypto/keembay/ocs-aes.h
F: drivers/crypto/intel/keembay/Kconfig
F: drivers/crypto/intel/keembay/Makefile
F: drivers/crypto/intel/keembay/keembay-ocs-aes-core.c
F: drivers/crypto/intel/keembay/ocs-aes.c
F: drivers/crypto/intel/keembay/ocs-aes.h
INTEL KEEM BAY OCS ECC CRYPTO DRIVER
M: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
......@@ -10438,20 +10438,20 @@ M: Prabhjot Khurana <prabhjot.khurana@intel.com>
M: Mark Gross <mgross@linux.intel.com>
S: Maintained
F: Documentation/devicetree/bindings/crypto/intel,keembay-ocs-ecc.yaml
F: drivers/crypto/keembay/Kconfig
F: drivers/crypto/keembay/Makefile
F: drivers/crypto/keembay/keembay-ocs-ecc.c
F: drivers/crypto/intel/keembay/Kconfig
F: drivers/crypto/intel/keembay/Makefile
F: drivers/crypto/intel/keembay/keembay-ocs-ecc.c
INTEL KEEM BAY OCS HCU CRYPTO DRIVER
M: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
M: Declan Murphy <declan.murphy@intel.com>
S: Maintained
F: Documentation/devicetree/bindings/crypto/intel,keembay-ocs-hcu.yaml
F: drivers/crypto/keembay/Kconfig
F: drivers/crypto/keembay/Makefile
F: drivers/crypto/keembay/keembay-ocs-hcu-core.c
F: drivers/crypto/keembay/ocs-hcu.c
F: drivers/crypto/keembay/ocs-hcu.h
F: drivers/crypto/intel/keembay/Kconfig
F: drivers/crypto/intel/keembay/Makefile
F: drivers/crypto/intel/keembay/keembay-ocs-hcu-core.c
F: drivers/crypto/intel/keembay/ocs-hcu.c
F: drivers/crypto/intel/keembay/ocs-hcu.h
INTEL THUNDER BAY EMMC PHY DRIVER
M: Nandhini Srikandan <nandhini.srikandan@intel.com>
......@@ -17027,7 +17027,7 @@ QAT DRIVER
M: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
L: qat-linux@intel.com
S: Supported
F: drivers/crypto/qat/
F: drivers/crypto/intel/qat/
QCOM AUDIO (ASoC) DRIVERS
M: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
......@@ -17295,6 +17295,7 @@ M: Thara Gopinath <thara.gopinath@gmail.com>
L: linux-crypto@vger.kernel.org
L: linux-arm-msm@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/crypto/qcom-qce.yaml
F: drivers/crypto/qce/
QUALCOMM EMAC GIGABIT ETHERNET DRIVER
......
......@@ -1850,7 +1850,7 @@ cryptobam: dma-controller@1dc4000 {
};
crypto: crypto@1de0000 {
compatible = "qcom,sm8550-qce";
compatible = "qcom,sm8550-qce", "qcom,sm8150-qce", "qcom,qce";
reg = <0x0 0x01dfa000 0x0 0x6000>;
dmas = <&cryptobam 4>, <&cryptobam 5>;
dma-names = "rx", "tx";
......
......@@ -15,6 +15,7 @@
*/
#include <linux/linkage.h>
#include <linux/cfi_types.h>
#include <asm/assembler.h>
.text
......@@ -620,12 +621,12 @@ SYM_FUNC_END(aesbs_decrypt8)
.endm
.align 4
SYM_FUNC_START(aesbs_ecb_encrypt)
SYM_TYPED_FUNC_START(aesbs_ecb_encrypt)
__ecb_crypt aesbs_encrypt8, v0, v1, v4, v6, v3, v7, v2, v5
SYM_FUNC_END(aesbs_ecb_encrypt)
.align 4
SYM_FUNC_START(aesbs_ecb_decrypt)
SYM_TYPED_FUNC_START(aesbs_ecb_decrypt)
__ecb_crypt aesbs_decrypt8, v0, v1, v6, v4, v2, v7, v3, v5
SYM_FUNC_END(aesbs_ecb_decrypt)
......@@ -799,11 +800,11 @@ SYM_FUNC_END(__xts_crypt8)
ret
.endm
SYM_FUNC_START(aesbs_xts_encrypt)
SYM_TYPED_FUNC_START(aesbs_xts_encrypt)
__xts_crypt aesbs_encrypt8, v0, v1, v4, v6, v3, v7, v2, v5
SYM_FUNC_END(aesbs_xts_encrypt)
SYM_FUNC_START(aesbs_xts_decrypt)
SYM_TYPED_FUNC_START(aesbs_xts_decrypt)
__xts_crypt aesbs_decrypt8, v0, v1, v6, v4, v2, v7, v3, v5
SYM_FUNC_END(aesbs_xts_decrypt)
......
......@@ -94,4 +94,21 @@ config CRYPTO_AES_PPC_SPE
architecture specific assembler implementations that work on 1KB
tables or 256 bytes S-boxes.
config CRYPTO_AES_GCM_P10
	tristate "Stitched AES/GCM acceleration support on P10 or later CPU (PPC)"
	depends on PPC64 && CPU_LITTLE_ENDIAN
	select CRYPTO_LIB_AES
	select CRYPTO_ALGAPI
	select CRYPTO_AEAD
	default m
	help
	  AEAD cipher: AES cipher algorithms (FIPS-197)
	  GCM (Galois/Counter Mode) authenticated encryption mode (NIST SP800-38D)
	  Architecture: powerpc64 using:
	    - little-endian
	    - Power10 or later features

	  Support for cryptographic acceleration instructions on Power10 or
	  later CPU. This module supports stitched acceleration for AES/GCM.

endmenu
......@@ -13,6 +13,7 @@ obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
obj-$(CONFIG_CRYPTO_CRC32C_VPMSUM) += crc32c-vpmsum.o
obj-$(CONFIG_CRYPTO_CRCT10DIF_VPMSUM) += crct10dif-vpmsum.o
obj-$(CONFIG_CRYPTO_VPMSUM_TESTER) += crc-vpmsum_test.o
obj-$(CONFIG_CRYPTO_AES_GCM_P10) += aes-gcm-p10-crypto.o
aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
md5-ppc-y := md5-asm.o md5-glue.o
......@@ -21,3 +22,15 @@ sha1-ppc-spe-y := sha1-spe-asm.o sha1-spe-glue.o
sha256-ppc-spe-y := sha256-spe-asm.o sha256-spe-glue.o
crc32c-vpmsum-y := crc32c-vpmsum_asm.o crc32c-vpmsum_glue.o
crct10dif-vpmsum-y := crct10dif-vpmsum_asm.o crct10dif-vpmsum_glue.o
aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp8-ppc.o aesp8-ppc.o
quiet_cmd_perl = PERL    $@
      cmd_perl = $(PERL) $< $(if $(CONFIG_CPU_LITTLE_ENDIAN), linux-ppc64le, linux-ppc64) > $@

targets += aesp8-ppc.S ghashp8-ppc.S

$(obj)/aesp8-ppc.S $(obj)/ghashp8-ppc.S: $(obj)/%.S: $(src)/%.pl FORCE
	$(call if_changed,perl)
OBJECT_FILES_NON_STANDARD_aesp8-ppc.o := y
OBJECT_FILES_NON_STANDARD_ghashp8-ppc.o := y
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Glue code for accelerated AES-GCM stitched implementation for ppc64le.
 *
 * Copyright 2022- IBM Inc. All rights reserved
 */

#include <asm/unaligned.h>
#include <asm/simd.h>
#include <asm/switch_to.h>
#include <crypto/aes.h>
#include <crypto/algapi.h>
#include <crypto/b128ops.h>
#include <crypto/gf128mul.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/aead.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <linux/cpufeature.h>
#include <linux/crypto.h>
#include <linux/module.h>
#include <linux/types.h>

#define PPC_ALIGN		16
#define GCM_IV_SIZE		12

MODULE_DESCRIPTION("PPC64le AES-GCM with Stitched implementation");
MODULE_AUTHOR("Danny Tsen <dtsen@linux.ibm.com>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("aes");
asmlinkage int aes_p8_set_encrypt_key(const u8 *userKey, const int bits,
				      void *key);
asmlinkage void aes_p8_encrypt(const u8 *in, u8 *out, const void *key);
asmlinkage void aes_p10_gcm_encrypt(u8 *in, u8 *out, size_t len,
				    void *rkey, u8 *iv, void *Xi);
asmlinkage void aes_p10_gcm_decrypt(u8 *in, u8 *out, size_t len,
				    void *rkey, u8 *iv, void *Xi);
asmlinkage void gcm_init_htable(unsigned char htable[256], unsigned char Xi[16]);
asmlinkage void gcm_ghash_p8(unsigned char *Xi, unsigned char *Htable,
			     unsigned char *aad, unsigned int alen);

struct aes_key {
	u8 key[AES_MAX_KEYLENGTH];
	u64 rounds;
};

struct gcm_ctx {
	u8 iv[16];
	u8 ivtag[16];
	u8 aad_hash[16];
	u64 aadLen;
	u64 Plen;	/* offset 56 - used in aes_p10_gcm_{en/de}crypt */
};

struct Hash_ctx {
	u8 H[16];	/* subkey */
	u8 Htable[256];	/* Xi, Hash table(offset 32) */
};

struct p10_aes_gcm_ctx {
	struct aes_key enc_key;
};

static void vsx_begin(void)
{
	preempt_disable();
	enable_kernel_vsx();
}

static void vsx_end(void)
{
	disable_kernel_vsx();
	preempt_enable();
}

static void set_subkey(unsigned char *hash)
{
	*(u64 *)&hash[0] = be64_to_cpup((__be64 *)&hash[0]);
	*(u64 *)&hash[8] = be64_to_cpup((__be64 *)&hash[8]);
}
/*
 * Compute aad if any.
 *   - Hash aad and copy to Xi.
 */
static void set_aad(struct gcm_ctx *gctx, struct Hash_ctx *hash,
		    unsigned char *aad, int alen)
{
	int i;
	u8 nXi[16] = {0, };

	gctx->aadLen = alen;
	i = alen & ~0xf;
	if (i) {
		gcm_ghash_p8(nXi, hash->Htable+32, aad, i);
		aad += i;
		alen -= i;
	}
	if (alen) {
		for (i = 0; i < alen; i++)
			nXi[i] ^= aad[i];

		memset(gctx->aad_hash, 0, 16);
		gcm_ghash_p8(gctx->aad_hash, hash->Htable+32, nXi, 16);
	} else {
		memcpy(gctx->aad_hash, nXi, 16);
	}

	memcpy(hash->Htable, gctx->aad_hash, 16);
}

static void gcmp10_init(struct gcm_ctx *gctx, u8 *iv, unsigned char *rdkey,
			struct Hash_ctx *hash, u8 *assoc, unsigned int assoclen)
{
	__be32 counter = cpu_to_be32(1);

	aes_p8_encrypt(hash->H, hash->H, rdkey);
	set_subkey(hash->H);
	gcm_init_htable(hash->Htable+32, hash->H);

	*((__be32 *)(iv+12)) = counter;

	gctx->Plen = 0;

	/*
	 * Encrypt counter vector as iv tag and increment counter.
	 */
	aes_p8_encrypt(iv, gctx->ivtag, rdkey);

	counter = cpu_to_be32(2);
	*((__be32 *)(iv+12)) = counter;
	memcpy(gctx->iv, iv, 16);

	gctx->aadLen = assoclen;
	memset(gctx->aad_hash, 0, 16);
	if (assoclen)
		set_aad(gctx, hash, assoc, assoclen);
}

static void finish_tag(struct gcm_ctx *gctx, struct Hash_ctx *hash, int len)
{
	int i;
	unsigned char len_ac[16 + PPC_ALIGN];
	unsigned char *aclen = PTR_ALIGN((void *)len_ac, PPC_ALIGN);
	__be64 clen = cpu_to_be64(len << 3);
	__be64 alen = cpu_to_be64(gctx->aadLen << 3);

	if (len == 0 && gctx->aadLen == 0) {
		memcpy(hash->Htable, gctx->ivtag, 16);
		return;
	}

	/*
	 * Len is in bits.
	 */
	*((__be64 *)(aclen)) = alen;
	*((__be64 *)(aclen+8)) = clen;

	/*
	 * hash (AAD len and len)
	 */
	gcm_ghash_p8(hash->Htable, hash->Htable+32, aclen, 16);

	for (i = 0; i < 16; i++)
		hash->Htable[i] ^= gctx->ivtag[i];
}
static int set_authsize(struct crypto_aead *tfm, unsigned int authsize)
{
	switch (authsize) {
	case 4:
	case 8:
	case 12:
	case 13:
	case 14:
	case 15:
	case 16:
		break;
	default:
		return -EINVAL;
	}

	return 0;
}

static int p10_aes_gcm_setkey(struct crypto_aead *aead, const u8 *key,
			      unsigned int keylen)
{
	struct crypto_tfm *tfm = crypto_aead_tfm(aead);
	struct p10_aes_gcm_ctx *ctx = crypto_tfm_ctx(tfm);
	int ret;

	vsx_begin();
	ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
	vsx_end();

	return ret ? -EINVAL : 0;
}
static int p10_aes_gcm_crypt(struct aead_request *req, int enc)
{
	struct crypto_tfm *tfm = req->base.tfm;
	struct p10_aes_gcm_ctx *ctx = crypto_tfm_ctx(tfm);
	u8 databuf[sizeof(struct gcm_ctx) + PPC_ALIGN];
	struct gcm_ctx *gctx = PTR_ALIGN((void *)databuf, PPC_ALIGN);
	u8 hashbuf[sizeof(struct Hash_ctx) + PPC_ALIGN];
	struct Hash_ctx *hash = PTR_ALIGN((void *)hashbuf, PPC_ALIGN);
	struct scatter_walk assoc_sg_walk;
	struct skcipher_walk walk;
	u8 *assocmem = NULL;
	u8 *assoc;
	unsigned int assoclen = req->assoclen;
	unsigned int cryptlen = req->cryptlen;
	unsigned char ivbuf[AES_BLOCK_SIZE+PPC_ALIGN];
	unsigned char *iv = PTR_ALIGN((void *)ivbuf, PPC_ALIGN);
	int ret;
	unsigned long auth_tag_len = crypto_aead_authsize(__crypto_aead_cast(tfm));
	u8 otag[16];
	int total_processed = 0;

	memset(databuf, 0, sizeof(databuf));
	memset(hashbuf, 0, sizeof(hashbuf));
	memset(ivbuf, 0, sizeof(ivbuf));
	memcpy(iv, req->iv, GCM_IV_SIZE);

	/* Linearize assoc, if not already linear */
	if (req->src->length >= assoclen && req->src->length) {
		scatterwalk_start(&assoc_sg_walk, req->src);
		assoc = scatterwalk_map(&assoc_sg_walk);
	} else {
		gfp_t flags = (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
			      GFP_KERNEL : GFP_ATOMIC;

		/* assoc can be any length, so must be on heap */
		assocmem = kmalloc(assoclen, flags);
		if (unlikely(!assocmem))
			return -ENOMEM;
		assoc = assocmem;

		scatterwalk_map_and_copy(assoc, req->src, 0, assoclen, 0);
	}

	vsx_begin();
	gcmp10_init(gctx, iv, (unsigned char *) &ctx->enc_key, hash, assoc, assoclen);
	vsx_end();

	if (!assocmem)
		scatterwalk_unmap(assoc);
	else
		kfree(assocmem);

	if (enc)
		ret = skcipher_walk_aead_encrypt(&walk, req, false);
	else
		ret = skcipher_walk_aead_decrypt(&walk, req, false);
	if (ret)
		return ret;

	while (walk.nbytes > 0 && ret == 0) {

		vsx_begin();
		if (enc)
			aes_p10_gcm_encrypt(walk.src.virt.addr,
					    walk.dst.virt.addr,
					    walk.nbytes,
					    &ctx->enc_key, gctx->iv, hash->Htable);
		else
			aes_p10_gcm_decrypt(walk.src.virt.addr,
					    walk.dst.virt.addr,
					    walk.nbytes,
					    &ctx->enc_key, gctx->iv, hash->Htable);
		vsx_end();

		total_processed += walk.nbytes;
		ret = skcipher_walk_done(&walk, 0);
	}

	if (ret)
		return ret;

	/* Finalize hash */
	vsx_begin();
	finish_tag(gctx, hash, total_processed);
	vsx_end();

	/* copy Xi to end of dst */
	if (enc)
		scatterwalk_map_and_copy(hash->Htable, req->dst, req->assoclen + cryptlen,
					 auth_tag_len, 1);
	else {
		scatterwalk_map_and_copy(otag, req->src,
					 req->assoclen + cryptlen - auth_tag_len,
					 auth_tag_len, 0);

		if (crypto_memneq(otag, hash->Htable, auth_tag_len)) {
			memzero_explicit(hash->Htable, 16);
			return -EBADMSG;
		}
	}

	return 0;
}
static int p10_aes_gcm_encrypt(struct aead_request *req)
{
	return p10_aes_gcm_crypt(req, 1);
}

static int p10_aes_gcm_decrypt(struct aead_request *req)
{
	return p10_aes_gcm_crypt(req, 0);
}

static struct aead_alg gcm_aes_alg = {
	.ivsize			= GCM_IV_SIZE,
	.maxauthsize		= 16,

	.setauthsize		= set_authsize,
	.setkey			= p10_aes_gcm_setkey,
	.encrypt		= p10_aes_gcm_encrypt,
	.decrypt		= p10_aes_gcm_decrypt,

	.base.cra_name		= "gcm(aes)",
	.base.cra_driver_name	= "aes_gcm_p10",
	.base.cra_priority	= 2100,
	.base.cra_blocksize	= 1,
	.base.cra_ctxsize	= sizeof(struct p10_aes_gcm_ctx),
	.base.cra_module	= THIS_MODULE,
};

static int __init p10_init(void)
{
	return crypto_register_aead(&gcm_aes_alg);
}

static void __exit p10_exit(void)
{
	crypto_unregister_aead(&gcm_aes_alg);
}

module_cpu_feature_match(PPC_MODULE_FEATURE_P10, p10_init);
module_exit(p10_exit);
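
For orientation only, below is a rough sketch of how a kernel caller could exercise the "gcm(aes)" AEAD registered by the glue code above through the generic crypto API. The function name, all-zero key, 64-byte payload and synchronous wait are illustrative assumptions, not part of this patch.

#include <crypto/aead.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/*
 * Sketch: encrypt 64 bytes of zeroes with "gcm(aes)" and append the
 * 16-byte tag in place.  Key, IV and lengths are made-up example values;
 * a real caller supplies its own data and error handling.
 */
static int gcm_aes_oneshot(void)
{
	static const u8 key[32];		/* all-zero AES-256 key (example) */
	u8 iv[12] = { 0 };			/* 96-bit GCM IV (example) */
	struct crypto_aead *tfm;
	struct aead_request *req = NULL;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	u8 *buf = NULL;
	int err;

	tfm = crypto_alloc_aead("gcm(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_aead_setkey(tfm, key, sizeof(key));
	if (err)
		goto out;
	err = crypto_aead_setauthsize(tfm, 16);
	if (err)
		goto out;

	buf = kzalloc(64 + 16, GFP_KERNEL);	/* plaintext + room for the tag */
	req = aead_request_alloc(tfm, GFP_KERNEL);
	if (!buf || !req) {
		err = -ENOMEM;
		goto out;
	}

	sg_init_one(&sg, buf, 64 + 16);
	aead_request_set_callback(req, 0, crypto_req_done, &wait);
	aead_request_set_crypt(req, &sg, &sg, 64, iv);	/* 64 bytes of plaintext */
	aead_request_set_ad(req, 0);			/* no associated data */

	err = crypto_wait_req(crypto_aead_encrypt(req), &wait);
out:
	aead_request_free(req);
	kfree(buf);
	crypto_free_aead(tfm);
	return err;
}

On a Power10 machine with CONFIG_CRYPTO_AES_GCM_P10 loaded, the allocation can resolve to the aes_gcm_p10 driver registered above because of its cra_priority of 2100; otherwise another gcm(aes) provider is used.
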
#!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# This code is taken from the OpenSSL project but the author (Andy Polyakov)
# has relicensed it under the GPLv2. Therefore this program is free software;
# you can redistribute it and/or modify it under the terms of the GNU General
# Public License version 2 as published by the Free Software Foundation.
#
# The original headers, including the original license headers, are
# included below for completeness.
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see https://www.openssl.org/~appro/cryptogams/.
# ====================================================================
#
# GHASH for PowerISA v2.07.
#
# July 2014
#
# Accurate performance measurements are problematic, because it's
# always virtualized setup with possibly throttled processor.
# Relative comparison is therefore more informative. This initial
# version is ~2.1x slower than hardware-assisted AES-128-CTR, ~12x
# faster than "4-bit" integer-only compiler-generated 64-bit code.
# "Initial version" means that there is room for futher improvement.
$flavour=shift;
$output =shift;
if ($flavour =~ /64/) {
$SIZE_T=8;
$LRSAVE=2*$SIZE_T;
$STU="stdu";
$POP="ld";
$PUSH="std";
} elsif ($flavour =~ /32/) {
$SIZE_T=4;
$LRSAVE=$SIZE_T;
$STU="stwu";
$POP="lwz";
$PUSH="stw";
} else { die "nonsense $flavour"; }
$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
( $xlate="${dir}ppc-xlate.pl" and -f $xlate ) or
( $xlate="${dir}../../perlasm/ppc-xlate.pl" and -f $xlate) or
die "can't locate ppc-xlate.pl";
open STDOUT,"| $^X $xlate $flavour $output" || die "can't call $xlate: $!";
my ($Xip,$Htbl,$inp,$len)=map("r$_",(3..6)); # argument block
my ($Xl,$Xm,$Xh,$IN)=map("v$_",(0..3));
my ($zero,$t0,$t1,$t2,$xC2,$H,$Hh,$Hl,$lemask)=map("v$_",(4..12));
my ($Xl1,$Xm1,$Xh1,$IN1,$H2,$H2h,$H2l)=map("v$_",(13..19));
my $vrsave="r12";
my ($t4,$t5,$t6) = ($Hl,$H,$Hh);
$code=<<___;
.machine "any"
.text
.globl .gcm_init_p8
lis r0,0xfff0
li r8,0x10
mfspr $vrsave,256
li r9,0x20
mtspr 256,r0
li r10,0x30
lvx_u $H,0,r4 # load H
le?xor r7,r7,r7
le?addi r7,r7,0x8 # need a vperm start with 08
le?lvsr 5,0,r7
le?vspltisb 6,0x0f
le?vxor 5,5,6 # set a b-endian mask
le?vperm $H,$H,$H,5
vspltisb $xC2,-16 # 0xf0
vspltisb $t0,1 # one
vaddubm $xC2,$xC2,$xC2 # 0xe0
vxor $zero,$zero,$zero
vor $xC2,$xC2,$t0 # 0xe1
vsldoi $xC2,$xC2,$zero,15 # 0xe1...
vsldoi $t1,$zero,$t0,1 # ...1
vaddubm $xC2,$xC2,$xC2 # 0xc2...
vspltisb $t2,7
vor $xC2,$xC2,$t1 # 0xc2....01
vspltb $t1,$H,0 # most significant byte
vsl $H,$H,$t0 # H<<=1
vsrab $t1,$t1,$t2 # broadcast carry bit
vand $t1,$t1,$xC2
vxor $H,$H,$t1 # twisted H
vsldoi $H,$H,$H,8 # twist even more ...
vsldoi $xC2,$zero,$xC2,8 # 0xc2.0
vsldoi $Hl,$zero,$H,8 # ... and split
vsldoi $Hh,$H,$zero,8
stvx_u $xC2,0,r3 # save pre-computed table
stvx_u $Hl,r8,r3
stvx_u $H, r9,r3
stvx_u $Hh,r10,r3
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,2,0
.long 0
.size .gcm_init_p8,.-.gcm_init_p8
.globl .gcm_init_htable
lis r0,0xfff0
li r8,0x10
mfspr $vrsave,256
li r9,0x20
mtspr 256,r0
li r10,0x30
lvx_u $H,0,r4 # load H
vspltisb $xC2,-16 # 0xf0
vspltisb $t0,1 # one
vaddubm $xC2,$xC2,$xC2 # 0xe0
vxor $zero,$zero,$zero
vor $xC2,$xC2,$t0 # 0xe1
vsldoi $xC2,$xC2,$zero,15 # 0xe1...
vsldoi $t1,$zero,$t0,1 # ...1
vaddubm $xC2,$xC2,$xC2 # 0xc2...
vspltisb $t2,7
vor $xC2,$xC2,$t1 # 0xc2....01
vspltb $t1,$H,0 # most significant byte
vsl $H,$H,$t0 # H<<=1
vsrab $t1,$t1,$t2 # broadcast carry bit
vand $t1,$t1,$xC2
vxor $IN,$H,$t1 # twisted H
vsldoi $H,$IN,$IN,8 # twist even more ...
vsldoi $xC2,$zero,$xC2,8 # 0xc2.0
vsldoi $Hl,$zero,$H,8 # ... and split
vsldoi $Hh,$H,$zero,8
stvx_u $xC2,0,r3 # save pre-computed table
stvx_u $Hl,r8,r3
li r8,0x40
stvx_u $H, r9,r3
li r9,0x50
stvx_u $Hh,r10,r3
li r10,0x60
vpmsumd $Xl,$IN,$Hl # H.lo·H.lo
vpmsumd $Xm,$IN,$H # H.hi·H.lo+H.lo·H.hi
vpmsumd $Xh,$IN,$Hh # H.hi·H.hi
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vpmsumd $Xl,$Xl,$xC2
vxor $t1,$t1,$Xh
vxor $IN1,$Xl,$t1
vsldoi $H2,$IN1,$IN1,8
vsldoi $H2l,$zero,$H2,8
vsldoi $H2h,$H2,$zero,8
stvx_u $H2l,r8,r3 # save H^2
li r8,0x70
stvx_u $H2,r9,r3
li r9,0x80
stvx_u $H2h,r10,r3
li r10,0x90
vpmsumd $Xl,$IN,$H2l # H.lo·H^2.lo
vpmsumd $Xl1,$IN1,$H2l # H^2.lo·H^2.lo
vpmsumd $Xm,$IN,$H2 # H.hi·H^2.lo+H.lo·H^2.hi
vpmsumd $Xm1,$IN1,$H2 # H^2.hi·H^2.lo+H^2.lo·H^2.hi
vpmsumd $Xh,$IN,$H2h # H.hi·H^2.hi
vpmsumd $Xh1,$IN1,$H2h # H^2.hi·H^2.hi
vpmsumd $t2,$Xl,$xC2 # 1st reduction phase
vpmsumd $t6,$Xl1,$xC2 # 1st reduction phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vsldoi $t4,$Xm1,$zero,8
vsldoi $t5,$zero,$Xm1,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vxor $Xl1,$Xl1,$t4
vxor $Xh1,$Xh1,$t5
vsldoi $Xl,$Xl,$Xl,8
vsldoi $Xl1,$Xl1,$Xl1,8
vxor $Xl,$Xl,$t2
vxor $Xl1,$Xl1,$t6
vsldoi $t1,$Xl,$Xl,8 # 2nd reduction phase
vsldoi $t5,$Xl1,$Xl1,8 # 2nd reduction phase
vpmsumd $Xl,$Xl,$xC2
vpmsumd $Xl1,$Xl1,$xC2
vxor $t1,$t1,$Xh
vxor $t5,$t5,$Xh1
vxor $Xl,$Xl,$t1
vxor $Xl1,$Xl1,$t5
vsldoi $H,$Xl,$Xl,8
vsldoi $H2,$Xl1,$Xl1,8
vsldoi $Hl,$zero,$H,8
vsldoi $Hh,$H,$zero,8
vsldoi $H2l,$zero,$H2,8
vsldoi $H2h,$H2,$zero,8
stvx_u $Hl,r8,r3 # save H^3
li r8,0xa0
stvx_u $H,r9,r3
li r9,0xb0
stvx_u $Hh,r10,r3
li r10,0xc0
stvx_u $H2l,r8,r3 # save H^4
stvx_u $H2,r9,r3
stvx_u $H2h,r10,r3
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,2,0
.long 0
.size .gcm_init_htable,.-.gcm_init_htable
.globl .gcm_gmult_p8
lis r0,0xfff8
li r8,0x10
mfspr $vrsave,256
li r9,0x20
mtspr 256,r0
li r10,0x30
lvx_u $IN,0,$Xip # load Xi
lvx_u $Hl,r8,$Htbl # load pre-computed table
le?lvsl $lemask,r0,r0
lvx_u $H, r9,$Htbl
le?vspltisb $t0,0x07
lvx_u $Hh,r10,$Htbl
le?vxor $lemask,$lemask,$t0
lvx_u $xC2,0,$Htbl
le?vperm $IN,$IN,$IN,$lemask
vxor $zero,$zero,$zero
vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
vpmsumd $t2,$Xl,$xC2 # 1st phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
vsldoi $t1,$Xl,$Xl,8 # 2nd phase
vpmsumd $Xl,$Xl,$xC2
vxor $t1,$t1,$Xh
vxor $Xl,$Xl,$t1
le?vperm $Xl,$Xl,$Xl,$lemask
stvx_u $Xl,0,$Xip # write out Xi
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,2,0
.long 0
.size .gcm_gmult_p8,.-.gcm_gmult_p8
.globl .gcm_ghash_p8
lis r0,0xfff8
li r8,0x10
mfspr $vrsave,256
li r9,0x20
mtspr 256,r0
li r10,0x30
lvx_u $Xl,0,$Xip # load Xi
lvx_u $Hl,r8,$Htbl # load pre-computed table
le?lvsl $lemask,r0,r0
lvx_u $H, r9,$Htbl
le?vspltisb $t0,0x07
lvx_u $Hh,r10,$Htbl
le?vxor $lemask,$lemask,$t0
lvx_u $xC2,0,$Htbl
le?vperm $Xl,$Xl,$Xl,$lemask
vxor $zero,$zero,$zero
lvx_u $IN,0,$inp
addi $inp,$inp,16
subi $len,$len,16
le?vperm $IN,$IN,$IN,$lemask
vxor $IN,$IN,$Xl
b Loop
.align 5
Loop:
subic $len,$len,16
vpmsumd $Xl,$IN,$Hl # H.lo·Xi.lo
subfe. r0,r0,r0 # borrow?-1:0
vpmsumd $Xm,$IN,$H # H.hi·Xi.lo+H.lo·Xi.hi
and r0,r0,$len
vpmsumd $Xh,$IN,$Hh # H.hi·Xi.hi
add $inp,$inp,r0
vpmsumd $t2,$Xl,$xC2 # 1st phase
vsldoi $t0,$Xm,$zero,8
vsldoi $t1,$zero,$Xm,8
vxor $Xl,$Xl,$t0
vxor $Xh,$Xh,$t1
vsldoi $Xl,$Xl,$Xl,8
vxor $Xl,$Xl,$t2
lvx_u $IN,0,$inp
addi $inp,$inp,16
vsldoi $t1,$Xl,$Xl,8 # 2nd phase
vpmsumd $Xl,$Xl,$xC2
le?vperm $IN,$IN,$IN,$lemask
vxor $t1,$t1,$Xh
vxor $IN,$IN,$t1
vxor $IN,$IN,$Xl
beq Loop # did $len-=16 borrow?
vxor $Xl,$Xl,$t1
le?vperm $Xl,$Xl,$Xl,$lemask
stvx_u $Xl,0,$Xip # write out Xi
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,4,0
.long 0
.size .gcm_ghash_p8,.-.gcm_ghash_p8
.asciz "GHASH for PowerISA 2.07, CRYPTOGAMS by <appro\@openssl.org>"
.align 2
___
foreach (split("\n",$code)) {
if ($flavour =~ /le$/o) { # little-endian
s/le\?//o or
s/be\?/#be#/o;
} else {
s/le\?/#le#/o or
s/be\?//o;
}
print $_,"\n";
}
close STDOUT; # enforce flush
#!/usr/bin/env perl
# SPDX-License-Identifier: GPL-2.0
# PowerPC assembler distiller by <appro>.
my $flavour = shift;
my $output = shift;
open STDOUT,">$output" || die "can't open $output: $!";
my %GLOBALS;
my $dotinlocallabels=($flavour=~/linux/)?1:0;
################################################################
# directives which need special treatment on different platforms
################################################################
my $globl = sub {
my $junk = shift;
my $name = shift;
my $global = \$GLOBALS{$name};
my $ret;
$name =~ s|^[\.\_]||;
SWITCH: for ($flavour) {
/aix/ && do { $name = ".$name";
last;
};
/osx/ && do { $name = "_$name";
last;
};
/linux/
&& do { $ret = "_GLOBAL($name)";
last;
};
}
$ret = ".globl $name\nalign 5\n$name:" if (!$ret);
$$global = $name;
$ret;
};
my $text = sub {
my $ret = ($flavour =~ /aix/) ? ".csect\t.text[PR],7" : ".text";
$ret = ".abiversion 2\n".$ret if ($flavour =~ /linux.*64le/);
$ret;
};
my $machine = sub {
my $junk = shift;
my $arch = shift;
if ($flavour =~ /osx/)
{ $arch =~ s/\"//g;
$arch = ($flavour=~/64/) ? "ppc970-64" : "ppc970" if ($arch eq "any");
}
".machine $arch";
};
my $size = sub {
if ($flavour =~ /linux/)
{ shift;
my $name = shift; $name =~ s|^[\.\_]||;
my $ret = ".size $name,.-".($flavour=~/64$/?".":"").$name;
$ret .= "\n.size .$name,.-.$name" if ($flavour=~/64$/);
$ret;
}
else
{ ""; }
};
my $asciz = sub {
shift;
my $line = join(",",@_);
if ($line =~ /^"(.*)"$/)
{ ".byte " . join(",",unpack("C*",$1),0) . "\n.align 2"; }
else
{ ""; }
};
my $quad = sub {
shift;
my @ret;
my ($hi,$lo);
for (@_) {
if (/^0x([0-9a-f]*?)([0-9a-f]{1,8})$/io)
{ $hi=$1?"0x$1":"0"; $lo="0x$2"; }
elsif (/^([0-9]+)$/o)
{ $hi=$1>>32; $lo=$1&0xffffffff; } # error-prone with 32-bit perl
else
{ $hi=undef; $lo=$_; }
if (defined($hi))
{ push(@ret,$flavour=~/le$/o?".long\t$lo,$hi":".long\t$hi,$lo"); }
else
{ push(@ret,".quad $lo"); }
}
join("\n",@ret);
};
################################################################
# simplified mnemonics not handled by at least one assembler
################################################################
my $cmplw = sub {
my $f = shift;
my $cr = 0; $cr = shift if ($#_>1);
# Some out-of-date 32-bit GNU assembler just can't handle cmplw...
($flavour =~ /linux.*32/) ?
" .long ".sprintf "0x%x",31<<26|$cr<<23|$_[0]<<16|$_[1]<<11|64 :
" cmplw ".join(',',$cr,@_);
};
my $bdnz = sub {
my $f = shift;
my $bo = $f=~/[\+\-]/ ? 16+9 : 16; # optional "to be taken" hint
" bc $bo,0,".shift;
} if ($flavour!~/linux/);
my $bltlr = sub {
my $f = shift;
my $bo = $f=~/\-/ ? 12+2 : 12; # optional "not to be taken" hint
($flavour =~ /linux/) ? # GNU as doesn't allow most recent hints
" .long ".sprintf "0x%x",19<<26|$bo<<21|16<<1 :
" bclr $bo,0";
};
my $bnelr = sub {
my $f = shift;
my $bo = $f=~/\-/ ? 4+2 : 4; # optional "not to be taken" hint
($flavour =~ /linux/) ? # GNU as doesn't allow most recent hints
" .long ".sprintf "0x%x",19<<26|$bo<<21|2<<16|16<<1 :
" bclr $bo,2";
};
my $beqlr = sub {
my $f = shift;
my $bo = $f=~/-/ ? 12+2 : 12; # optional "not to be taken" hint
($flavour =~ /linux/) ? # GNU as doesn't allow most recent hints
" .long ".sprintf "0x%X",19<<26|$bo<<21|2<<16|16<<1 :
" bclr $bo,2";
};
# GNU assembler can't handle extrdi rA,rS,16,48, or when sum of last two
# arguments is 64, with "operand out of range" error.
my $extrdi = sub {
my ($f,$ra,$rs,$n,$b) = @_;
$b = ($b+$n)&63; $n = 64-$n;
" rldicl $ra,$rs,$b,$n";
};
my $vmr = sub {
my ($f,$vx,$vy) = @_;
" vor $vx,$vy,$vy";
};
# Some ABIs specify vrsave, special-purpose register #256, as reserved
# for system use.
my $no_vrsave = ($flavour =~ /linux-ppc64le/);
my $mtspr = sub {
my ($f,$idx,$ra) = @_;
if ($idx == 256 && $no_vrsave) {
" or $ra,$ra,$ra";
} else {
" mtspr $idx,$ra";
}
};
my $mfspr = sub {
my ($f,$rd,$idx) = @_;
if ($idx == 256 && $no_vrsave) {
" li $rd,-1";
} else {
" mfspr $rd,$idx";
}
};
# PowerISA 2.06 stuff
sub vsxmem_op {
my ($f, $vrt, $ra, $rb, $op) = @_;
" .long ".sprintf "0x%X",(31<<26)|($vrt<<21)|($ra<<16)|($rb<<11)|($op*2+1);
}
# made-up unaligned memory reference AltiVec/VMX instructions
my $lvx_u = sub { vsxmem_op(@_, 844); }; # lxvd2x
my $stvx_u = sub { vsxmem_op(@_, 972); }; # stxvd2x
my $lvdx_u = sub { vsxmem_op(@_, 588); }; # lxsdx
my $stvdx_u = sub { vsxmem_op(@_, 716); }; # stxsdx
my $lvx_4w = sub { vsxmem_op(@_, 780); }; # lxvw4x
my $stvx_4w = sub { vsxmem_op(@_, 908); }; # stxvw4x
# PowerISA 2.07 stuff
sub vcrypto_op {
my ($f, $vrt, $vra, $vrb, $op) = @_;
" .long ".sprintf "0x%X",(4<<26)|($vrt<<21)|($vra<<16)|($vrb<<11)|$op;
}
my $vcipher = sub { vcrypto_op(@_, 1288); };
my $vcipherlast = sub { vcrypto_op(@_, 1289); };
my $vncipher = sub { vcrypto_op(@_, 1352); };
my $vncipherlast= sub { vcrypto_op(@_, 1353); };
my $vsbox = sub { vcrypto_op(@_, 0, 1480); };
my $vshasigmad = sub { my ($st,$six)=splice(@_,-2); vcrypto_op(@_, $st<<4|$six, 1730); };
my $vshasigmaw = sub { my ($st,$six)=splice(@_,-2); vcrypto_op(@_, $st<<4|$six, 1666); };
my $vpmsumb = sub { vcrypto_op(@_, 1032); };
my $vpmsumd = sub { vcrypto_op(@_, 1224); };
my $vpmsubh = sub { vcrypto_op(@_, 1096); };
my $vpmsumw = sub { vcrypto_op(@_, 1160); };
my $vaddudm = sub { vcrypto_op(@_, 192); };
my $vadduqm = sub { vcrypto_op(@_, 256); };
my $mtsle = sub {
my ($f, $arg) = @_;
" .long ".sprintf "0x%X",(31<<26)|($arg<<21)|(147*2);
};
print "#include <asm/ppc_asm.h>\n" if $flavour =~ /linux/;
while($line=<>) {
$line =~ s|[#!;].*$||; # get rid of asm-style comments...
$line =~ s|/\*.*\*/||; # ... and C-style comments...
$line =~ s|^\s+||; # ... and skip white spaces in beginning...
$line =~ s|\s+$||; # ... and at the end
{
$line =~ s|\b\.L(\w+)|L$1|g; # common denominator for Locallabel
$line =~ s|\bL(\w+)|\.L$1|g if ($dotinlocallabels);
}
{
$line =~ s|^\s*(\.?)(\w+)([\.\+\-]?)\s*||;
my $c = $1; $c = "\t" if ($c eq "");
my $mnemonic = $2;
my $f = $3;
my $opcode = eval("\$$mnemonic");
$line =~ s/\b(c?[rf]|v|vs)([0-9]+)\b/$2/g if ($c ne "." and $flavour !~ /osx/);
if (ref($opcode) eq 'CODE') { $line = &$opcode($f,split(',',$line)); }
elsif ($mnemonic) { $line = $c.$mnemonic.$f."\t".$line; }
}
print $line if ($line);
print "\n";
}
close STDOUT;
......@@ -22,6 +22,7 @@
*/
#define PPC_MODULE_FEATURE_VEC_CRYPTO (32 + ilog2(PPC_FEATURE2_VEC_CRYPTO))
#define PPC_MODULE_FEATURE_P10 (32 + ilog2(PPC_FEATURE2_ARCH_3_1))
#define cpu_feature(x) (x)
......
......@@ -201,8 +201,8 @@ SYM_FUNC_START(crypto_aegis128_aesni_init)
movdqa KEY, STATE4
/* load the constants: */
movdqa .Laegis128_const_0, STATE2
movdqa .Laegis128_const_1, STATE1
movdqa .Laegis128_const_0(%rip), STATE2
movdqa .Laegis128_const_1(%rip), STATE1
pxor STATE2, STATE3
pxor STATE1, STATE4
......@@ -682,7 +682,7 @@ SYM_TYPED_FUNC_START(crypto_aegis128_aesni_dec_tail)
punpcklbw T0, T0
punpcklbw T0, T0
punpcklbw T0, T0
movdqa .Laegis128_counter, T1
movdqa .Laegis128_counter(%rip), T1
pcmpgtb T1, T0
pand T0, MSG
......
......@@ -80,7 +80,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
vmovdqu .Lshufb_16x16b, a0; \
vmovdqu .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
......@@ -132,7 +132,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
vmovdqu .Lshufb_16x16b, a0; \
vmovdqu .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
......@@ -300,11 +300,11 @@
x4, x5, x6, x7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
vmovdqa .Ltf_s2_bitmatrix, t0; \
vmovdqa .Ltf_inv_bitmatrix, t1; \
vmovdqa .Ltf_id_bitmatrix, t2; \
vmovdqa .Ltf_aff_bitmatrix, t3; \
vmovdqa .Ltf_x2_bitmatrix, t4; \
vmovdqa .Ltf_s2_bitmatrix(%rip), t0; \
vmovdqa .Ltf_inv_bitmatrix(%rip), t1; \
vmovdqa .Ltf_id_bitmatrix(%rip), t2; \
vmovdqa .Ltf_aff_bitmatrix(%rip), t3; \
vmovdqa .Ltf_x2_bitmatrix(%rip), t4; \
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \
vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \
vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \
......@@ -324,13 +324,13 @@
x4, x5, x6, x7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
vmovdqa .Linv_shift_row, t0; \
vmovdqa .Lshift_row, t1; \
vbroadcastss .L0f0f0f0f, t6; \
vmovdqa .Ltf_lo__inv_aff__and__s2, t2; \
vmovdqa .Ltf_hi__inv_aff__and__s2, t3; \
vmovdqa .Ltf_lo__x2__and__fwd_aff, t4; \
vmovdqa .Ltf_hi__x2__and__fwd_aff, t5; \
vmovdqa .Linv_shift_row(%rip), t0; \
vmovdqa .Lshift_row(%rip), t1; \
vbroadcastss .L0f0f0f0f(%rip), t6; \
vmovdqa .Ltf_lo__inv_aff__and__s2(%rip), t2; \
vmovdqa .Ltf_hi__inv_aff__and__s2(%rip), t3; \
vmovdqa .Ltf_lo__x2__and__fwd_aff(%rip), t4; \
vmovdqa .Ltf_hi__x2__and__fwd_aff(%rip), t5; \
\
vaesenclast t7, x0, x0; \
vaesenclast t7, x4, x4; \
......
......@@ -96,7 +96,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
vbroadcasti128 .Lshufb_16x16b, a0; \
vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
......@@ -148,7 +148,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
vbroadcasti128 .Lshufb_16x16b, a0; \
vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
......@@ -307,11 +307,11 @@
x4, x5, x6, x7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
vpbroadcastq .Ltf_s2_bitmatrix, t0; \
vpbroadcastq .Ltf_inv_bitmatrix, t1; \
vpbroadcastq .Ltf_id_bitmatrix, t2; \
vpbroadcastq .Ltf_aff_bitmatrix, t3; \
vpbroadcastq .Ltf_x2_bitmatrix, t4; \
vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \
vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \
vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \
vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \
vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \
vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \
vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \
......@@ -332,12 +332,12 @@
t4, t5, t6, t7) \
vpxor t7, t7, t7; \
vpxor t6, t6, t6; \
vbroadcasti128 .Linv_shift_row, t0; \
vbroadcasti128 .Lshift_row, t1; \
vbroadcasti128 .Ltf_lo__inv_aff__and__s2, t2; \
vbroadcasti128 .Ltf_hi__inv_aff__and__s2, t3; \
vbroadcasti128 .Ltf_lo__x2__and__fwd_aff, t4; \
vbroadcasti128 .Ltf_hi__x2__and__fwd_aff, t5; \
vbroadcasti128 .Linv_shift_row(%rip), t0; \
vbroadcasti128 .Lshift_row(%rip), t1; \
vbroadcasti128 .Ltf_lo__inv_aff__and__s2(%rip), t2; \
vbroadcasti128 .Ltf_hi__inv_aff__and__s2(%rip), t3; \
vbroadcasti128 .Ltf_lo__x2__and__fwd_aff(%rip), t4; \
vbroadcasti128 .Ltf_hi__x2__and__fwd_aff(%rip), t5; \
\
vextracti128 $1, x0, t6##_x; \
vaesenclast t7##_x, x0##_x, x0##_x; \
......@@ -369,7 +369,7 @@
vaesdeclast t7##_x, t6##_x, t6##_x; \
vinserti128 $1, t6##_x, x6, x6; \
\
vpbroadcastd .L0f0f0f0f, t6; \
vpbroadcastd .L0f0f0f0f(%rip), t6; \
\
/* AES inverse shift rows */ \
vpshufb t0, x0, x0; \
......
......@@ -80,7 +80,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
vbroadcasti64x2 .Lshufb_16x16b, a0; \
vbroadcasti64x2 .Lshufb_16x16b(%rip), a0; \
vmovdqu64 st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
......@@ -132,7 +132,7 @@
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
vbroadcasti64x2 .Lshufb_16x16b, a0; \
vbroadcasti64x2 .Lshufb_16x16b(%rip), a0; \
vmovdqu64 st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
......@@ -308,11 +308,11 @@
x4, x5, x6, x7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
vpbroadcastq .Ltf_s2_bitmatrix, t0; \
vpbroadcastq .Ltf_inv_bitmatrix, t1; \
vpbroadcastq .Ltf_id_bitmatrix, t2; \
vpbroadcastq .Ltf_aff_bitmatrix, t3; \
vpbroadcastq .Ltf_x2_bitmatrix, t4; \
vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \
vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \
vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \
vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \
vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \
vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \
vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \
......@@ -332,11 +332,11 @@
y4, y5, y6, y7, \
t0, t1, t2, t3, \
t4, t5, t6, t7) \
vpbroadcastq .Ltf_s2_bitmatrix, t0; \
vpbroadcastq .Ltf_inv_bitmatrix, t1; \
vpbroadcastq .Ltf_id_bitmatrix, t2; \
vpbroadcastq .Ltf_aff_bitmatrix, t3; \
vpbroadcastq .Ltf_x2_bitmatrix, t4; \
vpbroadcastq .Ltf_s2_bitmatrix(%rip), t0; \
vpbroadcastq .Ltf_inv_bitmatrix(%rip), t1; \
vpbroadcastq .Ltf_id_bitmatrix(%rip), t2; \
vpbroadcastq .Ltf_aff_bitmatrix(%rip), t3; \
vpbroadcastq .Ltf_x2_bitmatrix(%rip), t4; \
vgf2p8affineinvqb $(tf_s2_const), t0, x1, x1; \
vgf2p8affineinvqb $(tf_s2_const), t0, x5, x5; \
vgf2p8affineqb $(tf_inv_const), t1, x2, x2; \
......
......@@ -52,10 +52,10 @@
/* \
* S-function with AES subbytes \
*/ \
vmovdqa .Linv_shift_row, t4; \
vbroadcastss .L0f0f0f0f, t7; \
vmovdqa .Lpre_tf_lo_s1, t0; \
vmovdqa .Lpre_tf_hi_s1, t1; \
vmovdqa .Linv_shift_row(%rip), t4; \
vbroadcastss .L0f0f0f0f(%rip), t7; \
vmovdqa .Lpre_tf_lo_s1(%rip), t0; \
vmovdqa .Lpre_tf_hi_s1(%rip), t1; \
\
/* AES inverse shift rows */ \
vpshufb t4, x0, x0; \
......@@ -68,8 +68,8 @@
vpshufb t4, x6, x6; \
\
/* prefilter sboxes 1, 2 and 3 */ \
vmovdqa .Lpre_tf_lo_s4, t2; \
vmovdqa .Lpre_tf_hi_s4, t3; \
vmovdqa .Lpre_tf_lo_s4(%rip), t2; \
vmovdqa .Lpre_tf_hi_s4(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x1, t0, t1, t7, t6); \
......@@ -83,8 +83,8 @@
filter_8bit(x6, t2, t3, t7, t6); \
\
/* AES subbytes + AES shift rows */ \
vmovdqa .Lpost_tf_lo_s1, t0; \
vmovdqa .Lpost_tf_hi_s1, t1; \
vmovdqa .Lpost_tf_lo_s1(%rip), t0; \
vmovdqa .Lpost_tf_hi_s1(%rip), t1; \
vaesenclast t4, x0, x0; \
vaesenclast t4, x7, x7; \
vaesenclast t4, x1, x1; \
......@@ -95,16 +95,16 @@
vaesenclast t4, x6, x6; \
\
/* postfilter sboxes 1 and 4 */ \
vmovdqa .Lpost_tf_lo_s3, t2; \
vmovdqa .Lpost_tf_hi_s3, t3; \
vmovdqa .Lpost_tf_lo_s3(%rip), t2; \
vmovdqa .Lpost_tf_hi_s3(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x3, t0, t1, t7, t6); \
filter_8bit(x6, t0, t1, t7, t6); \
\
/* postfilter sbox 3 */ \
vmovdqa .Lpost_tf_lo_s2, t4; \
vmovdqa .Lpost_tf_hi_s2, t5; \
vmovdqa .Lpost_tf_lo_s2(%rip), t4; \
vmovdqa .Lpost_tf_hi_s2(%rip), t5; \
filter_8bit(x2, t2, t3, t7, t6); \
filter_8bit(x5, t2, t3, t7, t6); \
\
......@@ -443,7 +443,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
vmovdqu .Lshufb_16x16b, a0; \
vmovdqu .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
......@@ -482,7 +482,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
#define inpack16_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
y6, y7, rio, key) \
vmovq key, x0; \
vpshufb .Lpack_bswap, x0, x0; \
vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor 0 * 16(rio), x0, y7; \
vpxor 1 * 16(rio), x0, y6; \
......@@ -533,7 +533,7 @@ SYM_FUNC_END(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
vmovdqu x0, stack_tmp0; \
\
vmovq key, x0; \
vpshufb .Lpack_bswap, x0, x0; \
vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor x0, y7, y7; \
vpxor x0, y6, y6; \
......
......@@ -64,12 +64,12 @@
/* \
* S-function with AES subbytes \
*/ \
vbroadcasti128 .Linv_shift_row, t4; \
vpbroadcastd .L0f0f0f0f, t7; \
vbroadcasti128 .Lpre_tf_lo_s1, t5; \
vbroadcasti128 .Lpre_tf_hi_s1, t6; \
vbroadcasti128 .Lpre_tf_lo_s4, t2; \
vbroadcasti128 .Lpre_tf_hi_s4, t3; \
vbroadcasti128 .Linv_shift_row(%rip), t4; \
vpbroadcastd .L0f0f0f0f(%rip), t7; \
vbroadcasti128 .Lpre_tf_lo_s1(%rip), t5; \
vbroadcasti128 .Lpre_tf_hi_s1(%rip), t6; \
vbroadcasti128 .Lpre_tf_lo_s4(%rip), t2; \
vbroadcasti128 .Lpre_tf_hi_s4(%rip), t3; \
\
/* AES inverse shift rows */ \
vpshufb t4, x0, x0; \
......@@ -115,8 +115,8 @@
vinserti128 $1, t2##_x, x6, x6; \
vextracti128 $1, x1, t3##_x; \
vextracti128 $1, x4, t2##_x; \
vbroadcasti128 .Lpost_tf_lo_s1, t0; \
vbroadcasti128 .Lpost_tf_hi_s1, t1; \
vbroadcasti128 .Lpost_tf_lo_s1(%rip), t0; \
vbroadcasti128 .Lpost_tf_hi_s1(%rip), t1; \
vaesenclast t4##_x, x2##_x, x2##_x; \
vaesenclast t4##_x, t6##_x, t6##_x; \
vinserti128 $1, t6##_x, x2, x2; \
......@@ -131,16 +131,16 @@
vinserti128 $1, t2##_x, x4, x4; \
\
/* postfilter sboxes 1 and 4 */ \
vbroadcasti128 .Lpost_tf_lo_s3, t2; \
vbroadcasti128 .Lpost_tf_hi_s3, t3; \
vbroadcasti128 .Lpost_tf_lo_s3(%rip), t2; \
vbroadcasti128 .Lpost_tf_hi_s3(%rip), t3; \
filter_8bit(x0, t0, t1, t7, t6); \
filter_8bit(x7, t0, t1, t7, t6); \
filter_8bit(x3, t0, t1, t7, t6); \
filter_8bit(x6, t0, t1, t7, t6); \
\
/* postfilter sbox 3 */ \
vbroadcasti128 .Lpost_tf_lo_s2, t4; \
vbroadcasti128 .Lpost_tf_hi_s2, t5; \
vbroadcasti128 .Lpost_tf_lo_s2(%rip), t4; \
vbroadcasti128 .Lpost_tf_hi_s2(%rip), t5; \
filter_8bit(x2, t2, t3, t7, t6); \
filter_8bit(x5, t2, t3, t7, t6); \
\
......@@ -475,7 +475,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
transpose_4x4(c0, c1, c2, c3, a0, a1); \
transpose_4x4(d0, d1, d2, d3, a0, a1); \
\
vbroadcasti128 .Lshufb_16x16b, a0; \
vbroadcasti128 .Lshufb_16x16b(%rip), a0; \
vmovdqu st1, a1; \
vpshufb a0, a2, a2; \
vpshufb a0, a3, a3; \
......@@ -514,7 +514,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
#define inpack32_pre(x0, x1, x2, x3, x4, x5, x6, x7, y0, y1, y2, y3, y4, y5, \
y6, y7, rio, key) \
vpbroadcastq key, x0; \
vpshufb .Lpack_bswap, x0, x0; \
vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor 0 * 32(rio), x0, y7; \
vpxor 1 * 32(rio), x0, y6; \
......@@ -565,7 +565,7 @@ SYM_FUNC_END(roundsm32_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
vmovdqu x0, stack_tmp0; \
\
vpbroadcastq key, x0; \
vpshufb .Lpack_bswap, x0, x0; \
vpshufb .Lpack_bswap(%rip), x0, x0; \
\
vpxor x0, y7, y7; \
vpxor x0, y6, y6; \
......
......@@ -77,11 +77,13 @@
#define RXORbl %r9b
#define xor2ror16(T0, T1, tmp1, tmp2, ab, dst) \
leaq T0(%rip), tmp1; \
movzbl ab ## bl, tmp2 ## d; \
xorq (tmp1, tmp2, 8), dst; \
leaq T1(%rip), tmp2; \
movzbl ab ## bh, tmp1 ## d; \
rorq $16, ab; \
xorq T0(, tmp2, 8), dst; \
xorq T1(, tmp1, 8), dst;
xorq (tmp2, tmp1, 8), dst;
/**********************************************************************
1-way camellia
......
......@@ -84,15 +84,19 @@
#define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
movzbl src ## bh, RID1d; \
leaq s1(%rip), RID2; \
movl (RID2,RID1,4), dst ## d; \
movzbl src ## bl, RID2d; \
leaq s2(%rip), RID1; \
op1 (RID1,RID2,4), dst ## d; \
shrq $16, src; \
movl s1(, RID1, 4), dst ## d; \
op1 s2(, RID2, 4), dst ## d; \
movzbl src ## bh, RID1d; \
leaq s3(%rip), RID2; \
op2 (RID2,RID1,4), dst ## d; \
movzbl src ## bl, RID2d; \
interleave_op(il_reg); \
op2 s3(, RID1, 4), dst ## d; \
op3 s4(, RID2, 4), dst ## d;
leaq s4(%rip), RID1; \
op3 (RID1,RID2,4), dst ## d;
#define dummy(d) /* do nothing */
......@@ -151,15 +155,15 @@
subround(l ## 3, r ## 3, l ## 4, r ## 4, f);
#define enc_preload_rkr() \
vbroadcastss .L16_mask, RKR; \
vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor kr(CTX), RKR, RKR;
#define dec_preload_rkr() \
vbroadcastss .L16_mask, RKR; \
vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor kr(CTX), RKR, RKR; \
vpshufb .Lbswap128_mask, RKR, RKR;
vpshufb .Lbswap128_mask(%rip), RKR, RKR;
#define transpose_2x4(x0, x1, t0, t1) \
vpunpckldq x1, x0, t0; \
......@@ -235,9 +239,9 @@ SYM_FUNC_START_LOCAL(__cast5_enc_blk16)
movq %rdi, CTX;
vmovdqa .Lbswap_mask, RKM;
vmovd .Lfirst_mask, R1ST;
vmovd .L32_mask, R32;
vmovdqa .Lbswap_mask(%rip), RKM;
vmovd .Lfirst_mask(%rip), R1ST;
vmovd .L32_mask(%rip), R32;
enc_preload_rkr();
inpack_blocks(RL1, RR1, RTMP, RX, RKM);
......@@ -271,7 +275,7 @@ SYM_FUNC_START_LOCAL(__cast5_enc_blk16)
popq %rbx;
popq %r15;
vmovdqa .Lbswap_mask, RKM;
vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RR1, RL1, RTMP, RX, RKM);
outunpack_blocks(RR2, RL2, RTMP, RX, RKM);
......@@ -308,9 +312,9 @@ SYM_FUNC_START_LOCAL(__cast5_dec_blk16)
movq %rdi, CTX;
vmovdqa .Lbswap_mask, RKM;
vmovd .Lfirst_mask, R1ST;
vmovd .L32_mask, R32;
vmovdqa .Lbswap_mask(%rip), RKM;
vmovd .Lfirst_mask(%rip), R1ST;
vmovd .L32_mask(%rip), R32;
dec_preload_rkr();
inpack_blocks(RL1, RR1, RTMP, RX, RKM);
......@@ -341,7 +345,7 @@ SYM_FUNC_START_LOCAL(__cast5_dec_blk16)
round(RL, RR, 1, 2);
round(RR, RL, 0, 1);
vmovdqa .Lbswap_mask, RKM;
vmovdqa .Lbswap_mask(%rip), RKM;
popq %rbx;
popq %r15;
......@@ -504,8 +508,8 @@ SYM_FUNC_START(cast5_ctr_16way)
vpcmpeqd RKR, RKR, RKR;
vpaddq RKR, RKR, RKR; /* low: -2, high: -2 */
vmovdqa .Lbswap_iv_mask, R1ST;
vmovdqa .Lbswap128_mask, RKM;
vmovdqa .Lbswap_iv_mask(%rip), R1ST;
vmovdqa .Lbswap128_mask(%rip), RKM;
/* load IV and byteswap */
vmovq (%rcx), RX;
......
......@@ -84,15 +84,19 @@
#define lookup_32bit(src, dst, op1, op2, op3, interleave_op, il_reg) \
movzbl src ## bh, RID1d; \
leaq s1(%rip), RID2; \
movl (RID2,RID1,4), dst ## d; \
movzbl src ## bl, RID2d; \
leaq s2(%rip), RID1; \
op1 (RID1,RID2,4), dst ## d; \
shrq $16, src; \
movl s1(, RID1, 4), dst ## d; \
op1 s2(, RID2, 4), dst ## d; \
movzbl src ## bh, RID1d; \
leaq s3(%rip), RID2; \
op2 (RID2,RID1,4), dst ## d; \
movzbl src ## bl, RID2d; \
interleave_op(il_reg); \
op2 s3(, RID1, 4), dst ## d; \
op3 s4(, RID2, 4), dst ## d;
leaq s4(%rip), RID1; \
op3 (RID1,RID2,4), dst ## d;
#define dummy(d) /* do nothing */
......@@ -175,10 +179,10 @@
qop(RD, RC, 1);
#define shuffle(mask) \
vpshufb mask, RKR, RKR;
vpshufb mask(%rip), RKR, RKR;
#define preload_rkr(n, do_mask, mask) \
vbroadcastss .L16_mask, RKR; \
vbroadcastss .L16_mask(%rip), RKR; \
/* add 16-bit rotation to key rotations (mod 32) */ \
vpxor (kr+n*16)(CTX), RKR, RKR; \
do_mask(mask);
......@@ -258,9 +262,9 @@ SYM_FUNC_START_LOCAL(__cast6_enc_blk8)
movq %rdi, CTX;
vmovdqa .Lbswap_mask, RKM;
vmovd .Lfirst_mask, R1ST;
vmovd .L32_mask, R32;
vmovdqa .Lbswap_mask(%rip), RKM;
vmovd .Lfirst_mask(%rip), R1ST;
vmovd .L32_mask(%rip), R32;
inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
......@@ -284,7 +288,7 @@ SYM_FUNC_START_LOCAL(__cast6_enc_blk8)
popq %rbx;
popq %r15;
vmovdqa .Lbswap_mask, RKM;
vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
......@@ -306,9 +310,9 @@ SYM_FUNC_START_LOCAL(__cast6_dec_blk8)
movq %rdi, CTX;
vmovdqa .Lbswap_mask, RKM;
vmovd .Lfirst_mask, R1ST;
vmovd .L32_mask, R32;
vmovdqa .Lbswap_mask(%rip), RKM;
vmovd .Lfirst_mask(%rip), R1ST;
vmovd .L32_mask(%rip), R32;
inpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
inpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
......@@ -332,7 +336,7 @@ SYM_FUNC_START_LOCAL(__cast6_dec_blk8)
popq %rbx;
popq %r15;
vmovdqa .Lbswap_mask, RKM;
vmovdqa .Lbswap_mask(%rip), RKM;
outunpack_blocks(RA1, RB1, RC1, RD1, RTMP, RX, RKRF, RKM);
outunpack_blocks(RA2, RB2, RC2, RD2, RTMP, RX, RKRF, RKM);
......
......@@ -93,7 +93,7 @@ SYM_FUNC_START(clmul_ghash_mul)
FRAME_BEGIN
movups (%rdi), DATA
movups (%rsi), SHASH
movaps .Lbswap_mask, BSWAP
movaps .Lbswap_mask(%rip), BSWAP
pshufb BSWAP, DATA
call __clmul_gf128mul_ble
pshufb BSWAP, DATA
......@@ -110,7 +110,7 @@ SYM_FUNC_START(clmul_ghash_update)
FRAME_BEGIN
cmp $16, %rdx
jb .Lupdate_just_ret # check length
movaps .Lbswap_mask, BSWAP
movaps .Lbswap_mask(%rip), BSWAP
movups (%rdi), DATA
movups (%rcx), SHASH
pshufb BSWAP, DATA
......
......@@ -12,6 +12,7 @@
#include <linux/kvm_host.h>
#include <linux/kernel.h>
#include <linux/highmem.h>
#include <linux/psp.h>
#include <linux/psp-sev.h>
#include <linux/pagemap.h>
#include <linux/swap.h>
......
......@@ -124,7 +124,7 @@ async_tx_channel_switch(struct dma_async_tx_descriptor *depend_tx,
/**
* submit_disposition - flags for routing an incoming operation
* enum submit_disposition - flags for routing an incoming operation
* @ASYNC_TX_SUBMITTED: we were able to append the new operation under the lock
* @ASYNC_TX_CHANNEL_SWITCH: when the lock is dropped schedule a channel switch
* @ASYNC_TX_DIRECT_SUBMIT: when the lock is dropped submit directly
......@@ -258,7 +258,7 @@ EXPORT_SYMBOL_GPL(async_trigger_callback);
/**
* async_tx_quiesce - ensure tx is complete and freeable upon return
* @tx - transaction to quiesce
* @tx: transaction to quiesce
*/
void async_tx_quiesce(struct dma_async_tx_descriptor **tx)
{
......
......@@ -2,7 +2,6 @@
extern void *jent_zalloc(unsigned int len);
extern void jent_zfree(void *ptr);
extern void jent_panic(char *s);
extern void jent_memcpy(void *dest, const void *src, unsigned int n);
extern void jent_get_nstime(__u64 *out);
......