Commits · 0f8bc4bd48dd148046c19c38568cd9449c79b45f · Kirill Smelkov / linux

25 Nov, 2022 13 commits

crypto: x86/nhpoly1305 - eliminate unnecessary CFI wrappers · 0f8bc4bd

Eric Biggers authored Nov 18, 2022

Since the CFI implementation now supports indirect calls to assembly
functions, take advantage of that rather than use wrapper functions.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0f8bc4bd

crypto: x86/aria - fix crash with CFI enabled · c67b553a

Eric Biggers authored Nov 18, 2022

aria_aesni_avx_encrypt_16way(), aria_aesni_avx_decrypt_16way(),
aria_aesni_avx_ctr_crypt_16way(), aria_aesni_avx_gfni_encrypt_16way(),
aria_aesni_avx_gfni_decrypt_16way(), and
aria_aesni_avx_gfni_ctr_crypt_16way() are called via indirect function
calls.  Therefore they need to use SYM_TYPED_FUNC_START instead of
SYM_FUNC_START to cause their type hashes to be emitted when the kernel
is built with CONFIG_CFI_CLANG=y.  Otherwise, the code crashes with a
CFI failure.

Fixes: ccace936 ("x86: Add types to indirectly called assembly functions")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

c67b553a

crypto: x86/aegis128 - fix possible crash with CFI enabled · 8bd9974b

Eric Biggers authored Nov 18, 2022

crypto_aegis128_aesni_enc(), crypto_aegis128_aesni_enc_tail(),
crypto_aegis128_aesni_dec(), and crypto_aegis128_aesni_dec_tail() are
called via indirect function calls.  Therefore they need to use
SYM_TYPED_FUNC_START instead of SYM_FUNC_START to cause their type
hashes to be emitted when the kernel is built with CONFIG_CFI_CLANG=y.
Otherwise, the code crashes with a CFI failure (if the compiler didn't
happen to optimize out the indirect calls).

Fixes: ccace936 ("x86: Add types to indirectly called assembly functions")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8bd9974b

padata: Fix list iterator in padata_do_serial() · 57ddfecc

Daniel Jordan authored Nov 16, 2022

list_for_each_entry_reverse() assumes that the iterated list is nonempty
and that every list_head is embedded in the same type, but its use in
padata_do_serial() breaks both rules.

This doesn't cause any issues now because padata_priv and padata_list
happen to have their list fields at the same offset, but we really
shouldn't be relying on that.

Fixes: bfde23ce ("padata: unbind parallel jobs from specific CPUs")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

57ddfecc

padata: Always leave BHs disabled when running ->parallel() · 34c3a47d

Daniel Jordan authored Nov 16, 2022

A deadlock can happen when an overloaded system runs ->parallel() in the
context of the current task:

    padata_do_parallel
      ->parallel()
        pcrypt_aead_enc/dec
          padata_do_serial
            spin_lock(&reorder->lock) // BHs still enabled
              <interrupt>
                ...
                  __do_softirq
                    ...
                      padata_do_serial
                        spin_lock(&reorder->lock)

It's a bug for BHs to be on in _do_serial as Steffen points out, so
ensure they're off in the "current task" case like they are in
padata_parallel_worker to avoid this situation.

Reported-by: syzbot+bc05445bc14148d51915@syzkaller.appspotmail.com
Fixes: 4611ce22 ("padata: allocate work structures for parallel jobs from a pool")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

34c3a47d

crypto: tcrypt - Fix multibuffer skcipher speed test mem leak · 1aa33fc8

Zhang Yiqun authored Nov 16, 2022

In the past, the data for mb-skcipher test has been allocated
twice, that means the first allcated memory area is without
free, which may cause a potential memory leakage. So this
patch is to remove one allocation to fix this error.

Fixes: e161c593 ("crypto: tcrypt - add multibuf skcipher...")
Signed-off-by: Zhang Yiqun <zhangyiqun@phytium.com.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

1aa33fc8

crypto: algboss - compile out test-related code when tests disabled · 441cb1b7

Eric Biggers authored Nov 13, 2022

When CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is set, the code in algboss.c
that handles CRYPTO_MSG_ALG_REGISTER is unnecessary, so make it be
compiled out.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

441cb1b7

crypto: kdf - silence noisy self-test · 790c4c9f

Eric Biggers authored Nov 13, 2022

Make the kdf_sp800108 self-test only print a message on success when
fips_enabled, so that it's consistent with testmgr.c and doesn't spam
the kernel log with a message that isn't really important.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

790c4c9f

crypto: kdf - skip self-test when tests disabled · 0bf365c0

Eric Biggers authored Nov 13, 2022

Make kdf_sp800108 honor the CONFIG_CRYPTO_MANAGER_DISABLE_TESTS kconfig
option, so that it doesn't always waste time running its self-test.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

0bf365c0

crypto: api - compile out crypto_boot_test_finished when tests disabled · 06bd9c96

Eric Biggers authored Nov 13, 2022

The crypto_boot_test_finished static key is unnecessary when self-tests
are disabled in the kconfig, so optimize it out accordingly, along with
the entirety of crypto_start_tests(). This mainly avoids the overhead
of an unnecessary static_branch_enable() on every boot.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

06bd9c96

crypto: algboss - optimize registration of internal algorithms · 9cadd73a

Eric Biggers authored Nov 13, 2022

Since algboss always skips testing of algorithms with the
CRYPTO_ALG_INTERNAL flag, there is no need to go through the dance of
creating the test kthread, which creates a lot of overhead.  Instead, we
can just directly finish the algorithm registration, like is now done
when self-tests are disabled entirely.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

9cadd73a

crypto: api - optimize algorithm registration when self-tests disabled · a7008584

Eric Biggers authored Nov 13, 2022

Currently, registering an algorithm with the crypto API always causes a
notification to be posted to the "cryptomgr", which then creates a
kthread to self-test the algorithm. However, if self-tests are disabled
in the kconfig (as is the default option), then this kthread just
notifies waiters that the algorithm has been tested, then exits.

This causes a significant amount of overhead, especially in the kthread
creation and destruction, which is not necessary at all. For example,
in a quick test I found that booting a "minimum" x86_64 kernel with all
the crypto options enabled (except for the self-tests) takes about 400ms
until PID 1 can start. Of that, a full 13ms is spent just doing this
pointless dance, involving a kthread being created, run, and destroyed
over 200 times. That's over 3% of the entire kernel start time.

Fix this by just skipping the creation of the test larval and the
posting of the registration notification entirely, when self-tests are
disabled.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

a7008584

Merge branch 'i2c/client_device_id_helper-immutable' of... · 719c547c

Herbert Xu authored Nov 25, 2022

Merge branch 'i2c/client_device_id_helper-immutable' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Merge i2c tree to pick up i2c_client_get_device_id helper.

719c547c

22 Nov, 2022 1 commit

crypto: ccree - Make cc_debugfs_global_fini() available for module init function · 8e96729f

Uwe Kleine-König authored Nov 21, 2022

ccree_init() calls cc_debugfs_global_fini(), the former is an init
function and the latter an exit function though.

A modular build emits:

	WARNING: modpost: drivers/crypto/ccree/ccree.o: section mismatch in reference: init_module (section: .init.text) -> cc_debugfs_global_fini (section: .exit.text)

(with CONFIG_DEBUG_SECTION_MISMATCH=y).

Fixes: 4f1c596d ("crypto: ccree - Remove debugfs when platform_driver_register failed")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

8e96729f

18 Nov, 2022 13 commits

crypto: hisilicon/sec - remove continuous blank lines · 75df46b5

Wenkai Lin authored Nov 12, 2022

Fix that put two or more continuous blank lines inside function.
Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

75df46b5

crypto: hisilicon/sec - fix spelling mistake 'ckeck' -> 'check' · 2132d4ef

Kai Ye authored Nov 12, 2022

There are a couple of spelling mistakes in sec2. Fix them.
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2132d4ef

crypto: hisilicon/qm - the command dump process is modified · 9c756098

Kai Ye authored Nov 12, 2022

Reduce the function complexity by use the function table in the
process of dumping queue. The function input parameters are
unified. And maintainability is enhanced.
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

9c756098

crypto: hisilicon/qm - split a debugfs.c from qm · 94476b2b

Kai Ye authored Nov 12, 2022

Considering that the qm feature and debugfs feature are independent.
The code related to debugfs is getting larger and larger. It should be
separate as a debugfs file. So move some debugfs code to new file from
qm file. The qm code logic is not modified. And maintainability is
enhanced.
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

94476b2b

crypto: hisilicon/qm - modify the process of regs dfx · b40b62ed

Kai Ye authored Nov 12, 2022

The last register logic and different register logic are combined.
Use "u32" instead of 'int' in the regs function input parameter to
simplify some checks.
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b40b62ed

crypto: hisilicon/qm - delete redundant null assignment operations · 7bbbc9d8

Kai Ye authored Nov 12, 2022

There is no security data in the pointer. It is only a value transferred
as a structure. It makes no sense to zero a variable that is on the stack.
So not need to set the pointer to null.
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7bbbc9d8

crypto: skcipher - Allow sync algorithms with large request contexts · e6cb02bd

Herbert Xu authored Nov 11, 2022

Some sync algorithms may require a large amount of temporary
space during its operations.  There is no reason why they should
be limited just because some legacy users want to place all
temporary data on the stack.

Such algorithms can now set a flag to indicate that they need
extra request context, which will cause them to be invisible
to users that go through the sync_skcipher interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

e6cb02bd

crypto: hisilicon/qm - add missing pci_dev_put() in q_num_set() · cc7710d0

Xiongfeng Wang authored Nov 11, 2022

pci_get_device() will increase the reference count for the returned
pci_dev. We need to use pci_dev_put() to decrease the reference count
before q_num_set() returns.

Fixes: c8b4b477 ("crypto: hisilicon - add HiSilicon HPRE accelerator")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cc7710d0

crypto: cryptd - Use request context instead of stack for sub-request · 3a58c231

Herbert Xu authored Nov 11, 2022

cryptd is buggy as it tries to use sync_skcipher without going
through the proper sync_skcipher interface.  In fact it doesn't
even need sync_skcipher since it's already a proper skcipher and
can easily access the request context instead of using something
off the stack.

Fixes: 36b3875a ("crypto: cryptd - Remove VLA usage of skcipher")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

3a58c231

crypto: arm64 - Fix unused variable compilation warnings of cpu_feature · 824db5cd

Tianjia Zhang authored Nov 10, 2022

The cpu feature defined by MODULE_DEVICE_TABLE is only referenced when
compiling as a module, and the warning of unused variable will be
encountered when compiling with intree. The warning can be removed by
adding the __maybe_unused flag.

Fixes: 03c9a333 ("crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL")
Fixes: ae1b83c7 ("crypto: arm64/sm4 - add CE implementation for GCM mode")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

824db5cd

crypto: ccree - Remove debugfs when platform_driver_register failed · 4f1c596d

Gaosheng Cui authored Nov 08, 2022

When platform_driver_register failed, we need to remove debugfs,
which will caused a resource leak, fix it.

Failed logs as follows:
[   32.606488] debugfs: Directory 'ccree' with parent '/' already present!

Fixes: 4c3f9727 ("crypto: ccree - introduce CryptoCell driver")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

4f1c596d

hwrng: stm32 - rename readl return value · 7cdc5e6b

Tomas Marek authored Nov 08, 2022

Use a more meaningful name for the readl return value variable.

Link: https://lore.kernel.org/all/Y1J3QwynPFIlfrIv@loth.rohan.me.apana.org.au/Signed-off-by: Tomas Marek <tomas.marek@elrest.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

7cdc5e6b

hwrng: core - treat default_quality as a maximum and default to 1024 · 16bdbae3

Jason A. Donenfeld authored Nov 07, 2022

Most hw_random devices return entropy which is assumed to be of full
quality, but driver authors don't bother setting the quality knob. Some
hw_random devices return less than full quality entropy, and then driver
authors set the quality knob. Therefore, the entropy crediting should be
opt-out rather than opt-in per-driver, to reflect the actual reality on
the ground.

For example, the two Raspberry Pi RNG drivers produce full entropy
randomness, and both EDK2 and U-Boot's drivers for these treat them as
such. The result is that EFI then uses these numbers and passes the to
Linux, and Linux credits them as boot, thereby initializing the RNG.
Yet, in Linux, the quality knob was never set to anything, and so on the
chance that Linux is booted without EFI, nothing is ever credited.
That's annoying.

The same pattern appears to repeat itself throughout various drivers. In
fact, very very few drivers have bothered setting quality=1024.

Looking at the git history of existing drivers and corresponding mailing
list discussion, this conclusion tracks. There's been a decent amount of
discussion about drivers that set quality < 1024 -- somebody read and
interepreted a datasheet, or made some back of the envelope calculation
somehow. But there's been very little, if any, discussion about most
drivers where the quality is just set to 1024 or unset (or set to 1000
when the authors misunderstood the API and assumed it was base-10 rather
than base-2); in both cases the intent was fairly clear of, "this is a
hardware random device; it's fine."

So let's invert this logic. A hw_random struct's quality knob now
controls the maximum quality a driver can produce, or 0 to specify 1024.
Then, the module-wide switch called "default_quality" is changed to
represent the maximum quality of any driver. By default it's 1024, and
the quality of any particular driver is then given by:

min(default_quality, rng->quality ?: 1024);

This way, the user can still turn this off for weird reasons (and we can
replace whatever driver-specific disabling hacks existed in the past),
yet we get proper crediting for relevant RNGs.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

16bdbae3

14 Nov, 2022 1 commit

i2c: core: Introduce i2c_client_get_device_id helper function · 66223373

Angel Iglesias authored Nov 13, 2022

Introduces new helper function to aid in .probe_new() refactors. In order
to use existing i2c_get_device_id() on the probe callback, the device
match table needs to be accessible in that function, which would require
bigger refactors in some drivers using the deprecated .probe callback.

This issue was discussed in more detail in the IIO mailing list.

Link: https://lore.kernel.org/all/20221023132302.911644-11-u.kleine-koenig@pengutronix.de/Suggested-by: Nuno Sá <noname.nuno@gmail.com>
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Suggested-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Angel Iglesias <ang.iglesiasg@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

66223373

11 Nov, 2022 5 commits

crypto: qat - remove ADF_STATUS_PF_RUNNING flag from probe · 557ffd5a

Shashank Gupta authored Nov 04, 2022

The ADF_STATUS_PF_RUNNING bit is set after the successful initialization
of the communication between VF to PF in adf_vf2pf_notify_init().
So, it is not required to be set after the execution of the function
adf_dev_init().
Signed-off-by: Shashank Gupta <shashank.gupta@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

557ffd5a

crypto: rockchip - Remove surplus dev_err() when using platform_get_irq() · fb11cddf

Yang Li authored Nov 04, 2022

There is no need to call the dev_err() function directly to print a
custom message when handling an error from either the platform_get_irq()
or platform_get_irq_byname() functions as both are going to display an
appropriate error message in case of a failure.

./drivers/crypto/rockchip/rk3288_crypto.c:351:2-9: line 351 is
redundant because platform_get_irq() already prints an error

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2677Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

fb11cddf

crypto: lib/aesgcm - Provide minimal library implementation · 520af5da

Ard Biesheuvel authored Nov 03, 2022

Implement a minimal library version of AES-GCM based on the existing
library implementations of AES and multiplication in GF(2^128). Using
these primitives, GCM can be implemented in a straight-forward manner.

GCM has a couple of sharp edges, i.e., the amount of input data
processed with the same initialization vector (IV) should be capped to
protect the counter from 32-bit rollover (or carry), and the size of the
authentication tag should be fixed for a given key. [0]

The former concern is addressed trivially, given that the function call
API uses 32-bit signed types for the input lengths. It is still up to
the caller to avoid IV reuse in general, but this is not something we
can police at the implementation level.

As for the latter concern, let's make the authentication tag size part
of the key schedule, and only permit it to be configured as part of the
key expansion routine.

Note that table based AES implementations are susceptible to known
plaintext timing attacks on the encryption key. The AES library already
attempts to mitigate this to some extent, but given that the counter
mode encryption used by GCM operates exclusively on known plaintext by
construction (the IV and therefore the initial counter value are known
to an attacker), let's take some extra care to mitigate this, by calling
the AES library with interrupts disabled.

[0] https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-38d.pdf

Link: https://lore.kernel.org/all/c6fb9b25-a4b6-2e4a-2dd1-63adda055a49@amd.com/Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

520af5da

crypto: lib/gf128mul - make gf128mul_lle time invariant · b67ce439

Ard Biesheuvel authored Nov 03, 2022

The gf128mul library has different variants with different
memory/performance tradeoffs, where the faster ones use 4k or 64k lookup
tables precomputed at runtime, which are based on one of the
multiplication factors, which is commonly the key for keyed hash
algorithms such as GHASH.

The slowest variant is gf128_mul_lle() [and its bbe/ble counterparts],
which does not use precomputed lookup tables, but it still relies on a
single u16[256] lookup table which is input independent. The use of such
a table may cause the execution time of gf128_mul_lle() to correlate
with the value of the inputs, which is generally something that must be
avoided for cryptographic algorithms. On top of that, the function uses
a sequence of if () statements that conditionally invoke be128_xor()
based on which bits are set in the second argument of the function,
which is usually a pointer to the multiplication factor that represents
the key.

In order to remove the correlation between the execution time of
gf128_mul_lle() and the value of its inputs, let's address the
identified shortcomings:
- add a time invariant version of gf128mul_x8_lle() that replaces the
  table lookup with the expression that is used at compile time to
  populate the lookup table;
- make the invocations of be128_xor() unconditional, but pass a zero
  vector as the third argument if the associated bit in the key is
  cleared.

The resulting code is likely to be significantly slower. However, given
that this is the slowest version already, making it even slower in order
to make it more secure is assumed to be justified.

The bbe and ble counterparts could receive the same treatment, but the
former is never used anywhere in the kernel, and the latter is only
used in the driver for a asynchronous crypto h/w accelerator (Chelsio),
where timing variances are unlikely to matter.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b67ce439

crypto: move gf128mul library into lib/crypto · 61c581a4

Ard Biesheuvel authored Nov 03, 2022

The gf128mul library does not depend on the crypto API at all, so it can
be moved into lib/crypto. This will allow us to use it in other library
code in a subsequent patch without having to depend on CONFIG_CRYPTO.

While at it, change the Kconfig symbol name to align with other crypto
library implementations. However, the source file name is retained, as
it is reflected in the module .ko filename, and changing this might
break things for users.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

61c581a4

04 Nov, 2022 7 commits

crypto: doc - use correct function name · 329cfa42

Ralph Siemsen authored Oct 27, 2022

The hashing API does not have a function called .finish()
Signed-off-by: Ralph Siemsen <ralph.siemsen@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

329cfa42

crypto: arm64/sm4 - add CE implementation for GCM mode · ae1b83c7

Tianjia Zhang authored Oct 27, 2022

This patch is a CE-optimized assembly implementation for GCM mode.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 224 and 224
modes of tcrypt, and compared the performance before and after this patch (the
driver used before this patch is gcm_base(ctr-sm4-ce,ghash-generic)).
The abscissas are blocks of different lengths. The data is tabulated and the
unit is Mb/s:

Before (gcm_base(ctr-sm4-ce,ghash-generic)):

gcm(sm4)     |     16      64      256      512     1024     1420     4096     8192
-------------+---------------------------------------------------------------------
  GCM enc    |  25.24   64.65   104.66   116.69   123.81   125.12   129.67   130.62
  GCM dec    |  25.40   64.80   104.74   116.70   123.81   125.21   129.68   130.59
  GCM mb enc |  24.95   64.06   104.20   116.38   123.55   124.97   129.63   130.61
  GCM mb dec |  24.92   64.00   104.13   116.34   123.55   124.98   129.56   130.48

After:

gcm-sm4-ce   |     16      64      256      512     1024     1420     4096     8192
-------------+---------------------------------------------------------------------
  GCM enc    | 108.62  397.18   971.60  1283.92  1522.77  1513.39  1777.00  1806.96
  GCM dec    | 116.36  398.14  1004.27  1319.11  1624.21  1635.43  1932.54  1974.20
  GCM mb enc | 107.13  391.79   962.05  1274.94  1514.76  1508.57  1769.07  1801.58
  GCM mb dec | 113.40  389.36   988.51  1307.68  1619.10  1631.55  1931.70  1970.86
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

ae1b83c7

crypto: arm64/sm4 - add CE implementation for CCM mode · 67fa3a7f

Tianjia Zhang authored Oct 27, 2022

This patch is a CE-optimized assembly implementation for CCM mode.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 223 and 225
modes of tcrypt, and compared the performance before and after this patch (the
driver used before this patch is ccm_base(ctr-sm4-ce,cbcmac-sm4-ce)).
The abscissas are blocks of different lengths. The data is tabulated and the
unit is Mb/s:

Before (rfc4309(ccm_base(ctr-sm4-ce,cbcmac-sm4-ce))):

ccm(sm4)     |     16      64     256     512    1024    1420    4096    8192
-------------+---------------------------------------------------------------
  CCM enc    |  35.07  125.40  336.47  468.17  581.97  619.18  712.56  736.01
  CCM dec    |  34.87  124.40  335.08  466.75  581.04  618.81  712.25  735.89
  CCM mb enc |  34.71  123.96  333.92  465.39  579.91  617.49  711.45  734.92
  CCM mb dec |  34.42  122.80  331.02  462.81  578.28  616.42  709.88  734.19

After (rfc4309(ccm-sm4-ce)):

ccm-sm4-ce   |     16      64     256     512    1024    1420    4096    8192
-------------+---------------------------------------------------------------
  CCM enc    |  77.12  249.82  569.94  725.17  839.27  867.71  952.87  969.89
  CCM dec    |  75.90  247.26  566.29  722.12  836.90  865.95  951.74  968.57
  CCM mb enc |  75.98  245.25  562.91  718.99  834.76  864.70  950.17  967.90
  CCM mb dec |  75.06  243.78  560.58  717.13  833.68  862.70  949.35  967.11
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

67fa3a7f

crypto: arm64/sm4 - add CE implementation for cmac/xcbc/cbcmac · 6b5360a5

Tianjia Zhang authored Oct 27, 2022

This patch is a CE-optimized assembly implementation for cmac/xcbc/cbcmac.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 300 mode of
tcrypt, and compared the performance before and after this patch (the driver
used before this patch is XXXmac(sm4-ce)). The abscissas are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

update-size    |      16      64     256    1024    2048    4096    8192
---------------+--------------------------------------------------------
cmac(sm4-ce)   |  293.33  403.69  503.76  527.78  531.10  535.46  535.81
xcbc(sm4-ce)   |  292.83  402.50  504.02  529.08  529.87  536.55  538.24
cbcmac(sm4-ce) |  318.42  415.79  497.12  515.05  523.15  521.19  523.01

After:

update-size    |      16      64     256    1024    2048    4096    8192
---------------+--------------------------------------------------------
cmac-sm4-ce    |  371.99  675.28  903.56  971.65  980.57  990.40  991.04
xcbc-sm4-ce    |  372.11  674.55  903.47  971.61  980.96  990.42  991.10
cbcmac-sm4-ce  |  371.63  675.33  903.23  972.07  981.42  990.93  991.45
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

6b5360a5

crypto: arm64/sm4 - add CE implementation for XTS mode · 01f63311

Tianjia Zhang authored Oct 27, 2022

This patch is a CE-optimized assembly implementation for XTS mode.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of
tcrypt, and compared the performance before and after this patch (the driver
used before this patch is xts(ecb-sm4-ce)). The abscissas are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

xts(ecb-sm4-ce) |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
        XTS enc |  117.17   430.56   732.92  1134.98  2007.03  2136.23  2347.20
        XTS dec |  116.89   429.02   733.40  1132.96  2006.13  2130.50  2347.92

After:

xts-sm4-ce      |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
        XTS enc |  224.68   798.91  1248.08  1714.60  2413.73  2467.84  2612.62
        XTS dec |  229.85   791.34  1237.79  1720.00  2413.30  2473.84  2611.95
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

01f63311

crypto: arm64/sm4 - add CE implementation for CTS-CBC mode · b1863fd0

Tianjia Zhang authored Oct 27, 2022

This patch is a CE-optimized assembly implementation for CTS-CBC mode.

Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of
tcrypt, and compared the performance before and after this patch (the driver
used before this patch is cts(cbc-sm4-ce)). The abscissas are blocks of
different lengths. The data is tabulated and the unit is Mb/s:

Before:

cts(cbc-sm4-ce) |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
    CTS-CBC enc |  286.09   297.17   457.97   627.75   868.58   900.80   957.69
    CTS-CBC dec |  286.67   285.63   538.35   947.08  2241.03  2577.32  3391.14

After:

cts-cbc-sm4-ce  |      16       64      128      256     1024     1420     4096
----------------+--------------------------------------------------------------
    CTS-CBC enc |  288.19   428.80   593.57   741.04   911.73   931.80   950.00
    CTS-CBC dec |  292.22   468.99   838.23  1380.76  2741.17  3036.42  3409.62
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

b1863fd0

crypto: arm64/sm4 - export reusable CE acceleration functions · 45089dbe

Tianjia Zhang authored Oct 27, 2022

In the accelerated implementation of the SM4 algorithm using the Crypto
Extension instructions, there are some functions that can be reused in
the upcoming accelerated implementation of the GCM/CCM mode, and the
CBC/CFB encryption is reused in the optimized implementation of SVESM4.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

45089dbe