Commit c150b809 authored by Linus Torvalds

Merge tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for various vector-accelerated crypto routines

 - Hibernation is now enabled for portable kernel builds

 - mmap_rnd_bits_max is larger on systems with larger VAs

 - Support for fast GUP

 - Support for membarrier-based instruction cache synchronization

 - Support for the Andes hart-level interrupt controller and PMU

 - Some cleanups around unaligned access speed probing and Kconfig
   settings

 - Support for ACPI LPI and CPPC

 - Various cleanups related to barriers

 - A handful of fixes

* tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits)
  riscv: Fix syscall wrapper for >word-size arguments
  crypto: riscv - add vector crypto accelerated AES-CBC-CTS
  crypto: riscv - parallelize AES-CBC decryption
  riscv: Only flush the mm icache when setting an exec pte
  riscv: Use kcalloc() instead of kzalloc()
  riscv/barrier: Add missing space after ','
  riscv/barrier: Consolidate fence definitions
  riscv/barrier: Define RISCV_FULL_BARRIER
  riscv/barrier: Define __{mb,rmb,wmb}
  RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ
  cpufreq: Move CPPC configs to common Kconfig and add RISC-V
  ACPI: RISC-V: Add CPPC driver
  ACPI: Enable ACPI_PROCESSOR for RISC-V
  ACPI: RISC-V: Add LPI driver
  cpuidle: RISC-V: Move few functions to arch/riscv
  riscv: Introduce set_compat_task() in asm/compat.h
  riscv: Introduce is_compat_thread() into compat.h
  riscv: add compile-time test into is_compat_task()
  riscv: Replace direct thread flag check with is_compat_task()
  riscv: Improve arch_get_mmap_end() macro
  ...
parents 1e3cd03c a9ad7329
@@ -144,14 +144,8 @@ passing 0 into the hint address parameter of mmap. On CPUs with an address space
smaller than sv48, the CPU maximum supported address space will be the default.
Software can "opt-in" to receiving VAs from another VA space by providing
a hint address to mmap. A hint address passed to mmap will cause the largest
address space that fits entirely into the hint to be used, unless there is no
space left in the address space. If there is no space available in the requested
address space, an address in the next smallest available address space will be
returned.
For example, in order to obtain 48-bit VA space, a hint address greater than
:code:`1 << 47` must be provided. Note that this is 47 due to sv48 userspace
ending at :code:`1 << 47` and the addresses beyond this are reserved for the
kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater
than or equal to :code:`1 << 56` must be provided.
a hint address to mmap. When a hint address is passed to mmap, the returned
address will never use more bits than the hint address. For example, if a hint
address of `1 << 40` is passed to mmap, a valid returned address will never use
bits 41 through 63. If no mappable addresses are available in that range, mmap
will return `MAP_FAILED`.
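The behavior described above can be exercised from userspace with plain mmap. A minimal sketch (illustrative values only; assumes an sv48-or-larger capable kernel): a hint inside the 47-bit range keeps the returned address within that range, while a hint of 0 selects the default.

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* Hint inside the sv48 userspace range, which ends at 1 << 47. */
	void *hint = (void *)(1UL << 46);

	/* Per the text above, the returned address never uses bits above the hint. */
	void *p = mmap(hint, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("mapped at %p\n", p);
	munmap(p, 4096);
	return 0;
}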
@@ -110,7 +110,11 @@ properties:
const: 1
compatible:
const: riscv,cpu-intc
oneOf:
- items:
- const: andestech,cpu-intc
- const: riscv,cpu-intc
- const: riscv,cpu-intc
interrupt-controller: true
@@ -477,5 +477,12 @@ properties:
latency, as ratified in commit 56ed795 ("Update
riscv-crypto-spec-vector.adoc") of riscv-crypto.
- const: xandespmu
description:
The Andes Technology performance monitor extension for counter overflow
and privilege mode filtering. For more details, see Counter Related
Registers in the AX45MP datasheet.
https://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf
additionalProperties: true
...
@@ -10,6 +10,22 @@
# Rely on implicit context synchronization as a result of exception return
# when returning from IPI handler, and when returning to user-space.
#
# * riscv
#
# riscv uses xRET as return from interrupt and to return to user-space.
#
# Given that xRET is not core serializing, we rely on FENCE.I for providing
# core serialization:
#
# - by calling sync_core_before_usermode() on return from interrupt (cf.
# ipi_sync_core()),
#
# - via switch_mm() and sync_core_before_usermode() (respectively, for
# uthread->uthread and kthread->uthread transitions) before returning
# to user-space.
#
# The serialization in switch_mm() is activated by prepare_sync_core_cmd().
#
# * x86
#
# x86-32 uses IRET as return from interrupt, which takes care of the IPI.
@@ -43,7 +59,7 @@
| openrisc: | TODO |
| parisc: | TODO |
| powerpc: | ok |
| riscv: | TODO |
| riscv: | ok |
| s390: | ok |
| sh: | TODO |
| sparc: | TODO |
@@ -7,6 +7,7 @@ Scheduler
completion
membarrier
sched-arch
sched-bwc
sched-deadline
.. SPDX-License-Identifier: GPL-2.0
========================
membarrier() System Call
========================
MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements
=====================================================================
Memory barriers before updating rq->curr
----------------------------------------
The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDITED
require each architecture to have a full memory barrier after coming from
user-space, before updating rq->curr. This barrier is implied by the sequence
rq_lock(); smp_mb__after_spinlock() in __schedule(). The barrier matches a full
barrier in the proximity of the membarrier system call exit, cf.
membarrier_{private,global}_expedited().
Memory barriers after updating rq->curr
---------------------------------------
The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDITED
require each architecture to have a full memory barrier after updating rq->curr,
before returning to user-space. The schemes providing this barrier on the various
architectures are as follows.
- alpha, arc, arm, hexagon, mips rely on the full barrier implied by
spin_unlock() in finish_lock_switch().
- arm64 relies on the full barrier implied by switch_to().
- powerpc, riscv, s390, sparc, x86 rely on the full barrier implied by
switch_mm(), if mm is not NULL; they rely on the full barrier implied
by mmdrop(), otherwise. On powerpc and riscv, switch_mm() relies on
membarrier_arch_switch_mm().
The barrier matches a full barrier in the proximity of the membarrier system call
entry, cf. membarrier_{private,global}_expedited().
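For context, the register-then-issue pattern that these barrier requirements back can be sketched from userspace as follows (a hedged example: raw syscall(2) is used since glibc provides no wrapper, and the commands come from <linux/membarrier.h>; the SYNC_CORE variants noted in the features file follow the same pattern with the *_SYNC_CORE commands).

#include <linux/membarrier.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return syscall(__NR_membarrier, cmd, flags, cpu_id);
}

int main(void)
{
	/* Register once per process before using the private expedited command. */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0, 0)) {
		perror("register");
		return 1;
	}
	/*
	 * Every running thread of this process has executed a full memory
	 * barrier by the time this call returns.
	 */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0, 0)) {
		perror("membarrier");
		return 1;
	}
	return 0;
}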
@@ -14134,7 +14134,9 @@ M: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
M: "Paul E. McKenney" <paulmck@kernel.org>
L: linux-kernel@vger.kernel.org
S: Supported
F: arch/powerpc/include/asm/membarrier.h
F: Documentation/scheduler/membarrier.rst
F: arch/*/include/asm/membarrier.h
F: arch/*/include/asm/sync_core.h
F: include/uapi/linux/membarrier.h
F: kernel/sched/membarrier.c
@@ -2,6 +2,7 @@
obj-y += kernel/ mm/ net/
obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
obj-$(CONFIG_CRYPTO) += crypto/
obj-y += errata/
obj-$(CONFIG_KVM) += kvm/
@@ -27,14 +27,18 @@ config RISCV
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_CALLBACKS
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_MMIOWB
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PMEM_API
select ARCH_HAS_PREPARE_SYNC_CORE_CMD
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_SET_DIRECT_MAP if MMU
select ARCH_HAS_SET_MEMORY if MMU
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
select ARCH_HAS_SYSCALL_WRAPPER
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN
@@ -47,6 +51,9 @@ config RISCV
select ARCH_SUPPORTS_CFI_CLANG
select ARCH_SUPPORTS_DEBUG_PAGEALLOC if MMU
select ARCH_SUPPORTS_HUGETLBFS if MMU
# LLD >= 14: https://github.com/llvm/llvm-project/issues/50505
select ARCH_SUPPORTS_LTO_CLANG if LLD_VERSION >= 140000
select ARCH_SUPPORTS_LTO_CLANG_THIN if LLD_VERSION >= 140000
select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU
select ARCH_SUPPORTS_PER_VMA_LOCK if MMU
select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK
@@ -106,6 +113,7 @@ config RISCV
select HAVE_ARCH_KGDB_QXFER_PKT
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
@@ -124,6 +132,7 @@ config RISCV
select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
select HAVE_EBPF_JIT if MMU
select HAVE_FAST_GUP if MMU
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_GCC_PLUGINS
@@ -155,6 +164,7 @@ config RISCV
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_RCU_TABLE_FREE if SMP && MMU
select MODULES_USE_ELF_RELA if MODULES
select MODULE_SECTIONS if MODULES
select OF
@@ -576,6 +586,13 @@ config TOOLCHAIN_HAS_ZBB
depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
depends on AS_HAS_OPTION_ARCH
# This symbol indicates that the toolchain supports all v1.0 vector crypto
# extensions, including Zvk*, Zvbb, and Zvbc. LLVM added all of these at once.
# binutils added all except Zvkb, then added Zvkb. So we just check for Zvkb.
config TOOLCHAIN_HAS_VECTOR_CRYPTO
def_bool $(as-instr, .option arch$(comma) +v$(comma) +zvkb)
depends on AS_HAS_OPTION_ARCH
config RISCV_ISA_ZBB
bool "Zbb extension support for bit manipulation instructions"
depends on TOOLCHAIN_HAS_ZBB
@@ -686,27 +703,61 @@ config THREAD_SIZE_ORDER
affects irq stack size, which is equal to thread stack size.
config RISCV_MISALIGNED
bool "Support misaligned load/store traps for kernel and userspace"
bool
select SYSCTL_ARCH_UNALIGN_ALLOW
default y
help
Say Y here if you want the kernel to embed support for misaligned
load/store for both kernel and userspace. When disabled, misaligned
accesses will generate SIGBUS in userspace and panic in the kernel.
Embed support for emulating misaligned loads and stores.
choice
prompt "Unaligned Accesses Support"
default RISCV_PROBE_UNALIGNED_ACCESS
help
This determines the level of support for unaligned accesses. This
information is used by the kernel to perform optimizations. It is also
exposed to user space via the hwprobe syscall. The hardware will be
probed at boot by default.
config RISCV_PROBE_UNALIGNED_ACCESS
bool "Probe for hardware unaligned access support"
select RISCV_MISALIGNED
help
During boot, the kernel will run a series of tests to determine the
speed of unaligned accesses. This probing will dynamically determine
the speed of unaligned accesses on the underlying system. If unaligned
memory accesses trap into the kernel as they are not supported by the
system, the kernel will emulate the unaligned accesses to preserve the
UABI.
config RISCV_EMULATED_UNALIGNED_ACCESS
bool "Emulate unaligned access where system support is missing"
select RISCV_MISALIGNED
help
If unaligned memory accesses trap into the kernel as they are not
supported by the system, the kernel will emulate the unaligned
accesses to preserve the UABI. When the underlying system does support
unaligned accesses, the unaligned accesses are assumed to be slow.
config RISCV_SLOW_UNALIGNED_ACCESS
bool "Assume the system supports slow unaligned memory accesses"
depends on NONPORTABLE
help
Assume that the system supports slow unaligned memory accesses. The
kernel and userspace programs may not be able to run at all on systems
that do not support unaligned memory accesses.
config RISCV_EFFICIENT_UNALIGNED_ACCESS
bool "Assume the CPU supports fast unaligned memory accesses"
bool "Assume the system supports fast unaligned memory accesses"
depends on NONPORTABLE
select DCACHE_WORD_ACCESS if MMU
select HAVE_EFFICIENT_UNALIGNED_ACCESS
help
Say Y here if you want the kernel to assume that the CPU supports
efficient unaligned memory accesses. When enabled, this option
improves the performance of the kernel on such CPUs. However, the
kernel will run much more slowly, or will not be able to run at all,
on CPUs that do not support efficient unaligned memory accesses.
Assume that the system supports fast unaligned memory accesses. When
enabled, this option improves the performance of the kernel on such
systems. However, the kernel and userspace programs will run much more
slowly, or will not be able to run at all, on systems that do not
support efficient unaligned memory accesses.
If unsure what to do here, say N.
endchoice
endmenu # "Platform type"
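The help text above notes that the probed unaligned-access speed is exposed to userspace through the hwprobe syscall. A minimal sketch of querying it (an assumption-laden example: it relies on the RISCV_HWPROBE_KEY_CPUPERF_0 key and the misaligned-access values from <asm/hwprobe.h>, invoked via the raw syscall):

#include <asm/hwprobe.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 };

	/* One key/value pair, all CPUs (cpusetsize = 0, cpus = NULL), no flags. */
	if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0)) {
		perror("riscv_hwprobe");
		return 1;
	}

	switch (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) {
	case RISCV_HWPROBE_MISALIGNED_FAST:
		puts("unaligned accesses are fast");
		break;
	case RISCV_HWPROBE_MISALIGNED_EMULATED:
		puts("unaligned accesses are emulated by the kernel");
		break;
	default:
		puts("unaligned accesses are slow or unsupported");
		break;
	}
	return 0;
}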
@@ -1011,11 +1062,8 @@ menu "Power management options"
source "kernel/power/Kconfig"
# Hibernation is only possible on systems where the SBI implementation has
# marked its reserved memory as not accessible from, or does not run
# from the same memory as, Linux
config ARCH_HIBERNATION_POSSIBLE
def_bool NONPORTABLE
def_bool y
config ARCH_HIBERNATION_HEADER
def_bool HIBERNATION
@@ -50,6 +50,11 @@ ifndef CONFIG_AS_IS_LLVM
KBUILD_CFLAGS += -Wa,-mno-relax
KBUILD_AFLAGS += -Wa,-mno-relax
endif
# LLVM has an issue with target-features and LTO: https://github.com/llvm/llvm-project/issues/59350
# Ensure it is aware of linker relaxation with LTO, otherwise relocations may
# be incorrect: https://github.com/llvm/llvm-project/issues/65090
else ifeq ($(CONFIG_LTO_CLANG),y)
KBUILD_LDFLAGS += -mllvm -mattr=+c -mllvm -mattr=+relax
endif
ifeq ($(CONFIG_SHADOW_CALL_STACK),y)
@@ -27,7 +27,7 @@ cpu0: cpu@0 {
riscv,isa-base = "rv64i";
riscv,isa-extensions = "i", "m", "a", "f", "d", "c",
"zicntr", "zicsr", "zifencei",
"zihpm";
"zihpm", "xandespmu";
mmu-type = "riscv,sv39";
i-cache-size = <0x8000>;
i-cache-line-size = <0x40>;
@@ -39,7 +39,7 @@ cpu0: cpu@0 {
cpu0_intc: interrupt-controller {
#interrupt-cells = <1>;
compatible = "riscv,cpu-intc";
compatible = "andestech,cpu-intc", "riscv,cpu-intc";
interrupt-controller;
};
};
@@ -44,6 +44,7 @@ CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m
CONFIG_CPUFREQ_DT=y
CONFIG_ACPI_CPPC_CPUFREQ=m
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_ACPI=y
@@ -215,6 +216,7 @@ CONFIG_MMC=y
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PLTFM=y
CONFIG_MMC_SDHCI_CADENCE=y
CONFIG_MMC_SDHCI_OF_DWCMSHC=y
CONFIG_MMC_SPI=y
CONFIG_MMC_DW=y
CONFIG_MMC_DW_STARFIVE=y
@@ -224,6 +226,7 @@ CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_SUN6I=y
CONFIG_DMADEVICES=y
CONFIG_DMA_SUN6I=m
CONFIG_DW_AXI_DMAC=y
CONFIG_RZ_DMAC=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_BALLOON=y
# SPDX-License-Identifier: GPL-2.0
menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
config CRYPTO_AES_RISCV64
tristate "Ciphers: AES, modes: ECB, CBC, CTS, CTR, XTS"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_ALGAPI
select CRYPTO_LIB_AES
select CRYPTO_SKCIPHER
help
Block cipher: AES cipher algorithms
Length-preserving ciphers: AES with ECB, CBC, CTS, CTR, XTS
Architecture: riscv64 using:
- Zvkned vector crypto extension
- Zvbb vector extension (XTS)
- Zvkb vector crypto extension (CTR)
- Zvkg vector crypto extension (XTS)
config CRYPTO_CHACHA_RISCV64
tristate "Ciphers: ChaCha"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_SKCIPHER
select CRYPTO_LIB_CHACHA_GENERIC
help
Length-preserving ciphers: ChaCha20 stream cipher algorithm
Architecture: riscv64 using:
- Zvkb vector crypto extension
config CRYPTO_GHASH_RISCV64
tristate "Hash functions: GHASH"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_GCM
help
GCM GHASH function (NIST SP 800-38D)
Architecture: riscv64 using:
- Zvkg vector crypto extension
config CRYPTO_SHA256_RISCV64
tristate "Hash functions: SHA-224 and SHA-256"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_SHA256
help
SHA-224 and SHA-256 secure hash algorithm (FIPS 180)
Architecture: riscv64 using:
- Zvknha or Zvknhb vector crypto extensions
- Zvkb vector crypto extension
config CRYPTO_SHA512_RISCV64
tristate "Hash functions: SHA-384 and SHA-512"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_SHA512
help
SHA-384 and SHA-512 secure hash algorithm (FIPS 180)
Architecture: riscv64 using:
- Zvknhb vector crypto extension
- Zvkb vector crypto extension
config CRYPTO_SM3_RISCV64
tristate "Hash functions: SM3 (ShangMi 3)"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_HASH
select CRYPTO_SM3
help
SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012)
Architecture: riscv64 using:
- Zvksh vector crypto extension
- Zvkb vector crypto extension
config CRYPTO_SM4_RISCV64
tristate "Ciphers: SM4 (ShangMi 4)"
depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO
select CRYPTO_ALGAPI
select CRYPTO_SM4
help
SM4 block cipher algorithm (OSCCA GB/T 32907-2016,
ISO/IEC 18033-3:2010/Amd 1:2021)
SM4 (GBT.32907-2016) is a cryptographic standard issued by the
Organization of State Commercial Administration of China (OSCCA)
as an authorized cryptographic algorithm for use within China.
Architecture: riscv64 using:
- Zvksed vector crypto extension
- Zvkb vector crypto extension
endmenu
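The entries above register drivers that the kernel crypto API looks up by algorithm name, with cra_priority deciding which implementation is selected. A minimal in-kernel sketch, assuming a caller that simply requests "sha256" and lets the API pick the vector-accelerated driver when it is available:

#include <crypto/hash.h>
#include <linux/err.h>

static int sha256_digest_example(const u8 *data, unsigned int len, u8 *out)
{
	struct crypto_shash *tfm;
	int err;

	/* Resolves to the highest-priority "sha256" provider registered. */
	tfm = crypto_alloc_shash("sha256", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	{
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		err = crypto_shash_digest(desc, data, len, out);
	}

	crypto_free_shash(tfm);
	return err;
}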
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o
aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o \
aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o
obj-$(CONFIG_CRYPTO_CHACHA_RISCV64) += chacha-riscv64.o
chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o
obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o
obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o
obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o
obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh-zvkb.o
obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed-zvkb.o
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
// Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// This file contains macros that are shared by the other aes-*.S files. The
// generated code of these macros depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector AES block cipher extension ('Zvkned')
// Loads the AES round keys from \keyp into vector registers and jumps to code
// specific to the length of the key. Specifically:
// - If AES-128, loads round keys into v1-v11 and jumps to \label128.
// - If AES-192, loads round keys into v1-v13 and jumps to \label192.
// - If AES-256, loads round keys into v1-v15 and continues onwards.
//
// Also sets vl=4 and vtype=e32,m1,ta,ma. Clobbers t0 and t1.
.macro aes_begin keyp, label128, label192
lwu t0, 480(\keyp) // t0 = key length in bytes
li t1, 24 // t1 = key length for AES-192
vsetivli zero, 4, e32, m1, ta, ma
vle32.v v1, (\keyp)
addi \keyp, \keyp, 16
vle32.v v2, (\keyp)
addi \keyp, \keyp, 16
vle32.v v3, (\keyp)
addi \keyp, \keyp, 16
vle32.v v4, (\keyp)
addi \keyp, \keyp, 16
vle32.v v5, (\keyp)
addi \keyp, \keyp, 16
vle32.v v6, (\keyp)
addi \keyp, \keyp, 16
vle32.v v7, (\keyp)
addi \keyp, \keyp, 16
vle32.v v8, (\keyp)
addi \keyp, \keyp, 16
vle32.v v9, (\keyp)
addi \keyp, \keyp, 16
vle32.v v10, (\keyp)
addi \keyp, \keyp, 16
vle32.v v11, (\keyp)
blt t0, t1, \label128 // If AES-128, goto label128.
addi \keyp, \keyp, 16
vle32.v v12, (\keyp)
addi \keyp, \keyp, 16
vle32.v v13, (\keyp)
beq t0, t1, \label192 // If AES-192, goto label192.
// Else, it's AES-256.
addi \keyp, \keyp, 16
vle32.v v14, (\keyp)
addi \keyp, \keyp, 16
vle32.v v15, (\keyp)
.endm
// Encrypts \data using zvkned instructions, using the round keys loaded into
// v1-v11 (for AES-128), v1-v13 (for AES-192), or v1-v15 (for AES-256). \keylen
// is the AES key length in bits. vl and vtype must already be set
// appropriately. Note that if vl > 4, multiple blocks are encrypted.
.macro aes_encrypt data, keylen
vaesz.vs \data, v1
vaesem.vs \data, v2
vaesem.vs \data, v3
vaesem.vs \data, v4
vaesem.vs \data, v5
vaesem.vs \data, v6
vaesem.vs \data, v7
vaesem.vs \data, v8
vaesem.vs \data, v9
vaesem.vs \data, v10
.if \keylen == 128
vaesef.vs \data, v11
.elseif \keylen == 192
vaesem.vs \data, v11
vaesem.vs \data, v12
vaesef.vs \data, v13
.else
vaesem.vs \data, v11
vaesem.vs \data, v12
vaesem.vs \data, v13
vaesem.vs \data, v14
vaesef.vs \data, v15
.endif
.endm
// Same as aes_encrypt, but decrypts instead of encrypts.
.macro aes_decrypt data, keylen
.if \keylen == 128
vaesz.vs \data, v11
.elseif \keylen == 192
vaesz.vs \data, v13
vaesdm.vs \data, v12
vaesdm.vs \data, v11
.else
vaesz.vs \data, v15
vaesdm.vs \data, v14
vaesdm.vs \data, v13
vaesdm.vs \data, v12
vaesdm.vs \data, v11
.endif
vaesdm.vs \data, v10
vaesdm.vs \data, v9
vaesdm.vs \data, v8
vaesdm.vs \data, v7
vaesdm.vs \data, v6
vaesdm.vs \data, v5
vaesdm.vs \data, v4
vaesdm.vs \data, v3
vaesdm.vs \data, v2
vaesdf.vs \data, v1
.endm
// Expands to aes_encrypt or aes_decrypt according to \enc, which is 1 or 0.
.macro aes_crypt data, enc, keylen
.if \enc
aes_encrypt \data, \keylen
.else
aes_decrypt \data, \keylen
.endif
.endm
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector AES block cipher extension ('Zvkned')
// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
#include <linux/linkage.h>
.text
.option arch, +zvkned, +zvkb
#include "aes-macros.S"
#define KEYP a0
#define INP a1
#define OUTP a2
#define LEN a3
#define IVP a4
#define LEN32 a5
#define VL_E32 a6
#define VL_BLOCKS a7
.macro aes_ctr32_crypt keylen
// LEN32 = number of blocks, rounded up, in 32-bit words.
addi t0, LEN, 15
srli t0, t0, 4
slli LEN32, t0, 2
// Create a mask that selects the last 32-bit word of each 128-bit
// block. This is the word that contains the (big-endian) counter.
li t0, 0x88
vsetvli t1, zero, e8, m1, ta, ma
vmv.v.x v0, t0
// Load the IV into v31. The last 32-bit word contains the counter.
vsetivli zero, 4, e32, m1, ta, ma
vle32.v v31, (IVP)
// Convert the big-endian counter into little-endian.
vsetivli zero, 4, e32, m1, ta, mu
vrev8.v v31, v31, v0.t
// Splat the IV to v16 (with LMUL=4). The number of copies is the
// maximum number of blocks that will be processed per iteration.
vsetvli zero, LEN32, e32, m4, ta, ma
vmv.v.i v16, 0
vaesz.vs v16, v31
// v20 = [x, x, x, 0, x, x, x, 1, ...]
viota.m v20, v0, v0.t
// v16 = [IV0, IV1, IV2, counter+0, IV0, IV1, IV2, counter+1, ...]
vsetvli VL_E32, LEN32, e32, m4, ta, mu
vadd.vv v16, v16, v20, v0.t
j 2f
1:
// Set the number of blocks to process in this iteration. vl=VL_E32 is
// the length in 32-bit words, i.e. 4 times the number of blocks.
vsetvli VL_E32, LEN32, e32, m4, ta, mu
// Increment the counters by the number of blocks processed in the
// previous iteration.
vadd.vx v16, v16, VL_BLOCKS, v0.t
2:
// Prepare the AES inputs into v24.
vmv.v.v v24, v16
vrev8.v v24, v24, v0.t // Convert counters back to big-endian.
// Encrypt the AES inputs to create the next portion of the keystream.
aes_encrypt v24, \keylen
// XOR the data with the keystream.
vsetvli t0, LEN, e8, m4, ta, ma
vle8.v v20, (INP)
vxor.vv v20, v20, v24
vse8.v v20, (OUTP)
// Advance the pointers and update the remaining length.
add INP, INP, t0
add OUTP, OUTP, t0
sub LEN, LEN, t0
sub LEN32, LEN32, VL_E32
srli VL_BLOCKS, VL_E32, 2
// Repeat if more data remains.
bnez LEN, 1b
// Update *IVP to contain the next counter.
vsetivli zero, 4, e32, m1, ta, mu
vadd.vx v16, v16, VL_BLOCKS, v0.t
vrev8.v v16, v16, v0.t // Convert counters back to big-endian.
vse32.v v16, (IVP)
ret
.endm
// void aes_ctr32_crypt_zvkned_zvkb(const struct crypto_aes_ctx *key,
// const u8 *in, u8 *out, size_t len,
// u8 iv[16]);
SYM_FUNC_START(aes_ctr32_crypt_zvkned_zvkb)
aes_begin KEYP, 128f, 192f
aes_ctr32_crypt 256
128:
aes_ctr32_crypt 128
192:
aes_ctr32_crypt 192
SYM_FUNC_END(aes_ctr32_crypt_zvkned_zvkb)
// SPDX-License-Identifier: GPL-2.0-only
/*
* ChaCha20 using the RISC-V vector crypto extensions
*
* Copyright (C) 2023 SiFive, Inc.
* Author: Jerry Shih <jerry.shih@sifive.com>
*/
#include <asm/simd.h>
#include <asm/vector.h>
#include <crypto/internal/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/linkage.h>
#include <linux/module.h>
asmlinkage void chacha20_zvkb(const u32 key[8], const u8 *in, u8 *out,
size_t len, const u32 iv[4]);
static int riscv64_chacha20_crypt(struct skcipher_request *req)
{
u32 iv[CHACHA_IV_SIZE / sizeof(u32)];
u8 block_buffer[CHACHA_BLOCK_SIZE];
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
unsigned int nbytes;
unsigned int tail_bytes;
int err;
iv[0] = get_unaligned_le32(req->iv);
iv[1] = get_unaligned_le32(req->iv + 4);
iv[2] = get_unaligned_le32(req->iv + 8);
iv[3] = get_unaligned_le32(req->iv + 12);
err = skcipher_walk_virt(&walk, req, false);
while (walk.nbytes) {
nbytes = walk.nbytes & ~(CHACHA_BLOCK_SIZE - 1);
tail_bytes = walk.nbytes & (CHACHA_BLOCK_SIZE - 1);
kernel_vector_begin();
if (nbytes) {
chacha20_zvkb(ctx->key, walk.src.virt.addr,
walk.dst.virt.addr, nbytes, iv);
iv[0] += nbytes / CHACHA_BLOCK_SIZE;
}
if (walk.nbytes == walk.total && tail_bytes > 0) {
memcpy(block_buffer, walk.src.virt.addr + nbytes,
tail_bytes);
chacha20_zvkb(ctx->key, block_buffer, block_buffer,
CHACHA_BLOCK_SIZE, iv);
memcpy(walk.dst.virt.addr + nbytes, block_buffer,
tail_bytes);
tail_bytes = 0;
}
kernel_vector_end();
err = skcipher_walk_done(&walk, tail_bytes);
}
return err;
}
static struct skcipher_alg riscv64_chacha_alg = {
.setkey = chacha20_setkey,
.encrypt = riscv64_chacha20_crypt,
.decrypt = riscv64_chacha20_crypt,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = CHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 4 * CHACHA_BLOCK_SIZE,
.base = {
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct chacha_ctx),
.cra_priority = 300,
.cra_name = "chacha20",
.cra_driver_name = "chacha20-riscv64-zvkb",
.cra_module = THIS_MODULE,
},
};
static int __init riscv64_chacha_mod_init(void)
{
if (riscv_isa_extension_available(NULL, ZVKB) &&
riscv_vector_vlen() >= 128)
return crypto_register_skcipher(&riscv64_chacha_alg);
return -ENODEV;
}
static void __exit riscv64_chacha_mod_exit(void)
{
crypto_unregister_skcipher(&riscv64_chacha_alg);
}
module_init(riscv64_chacha_mod_init);
module_exit(riscv64_chacha_mod_exit);
MODULE_DESCRIPTION("ChaCha20 (RISC-V accelerated)");
MODULE_AUTHOR("Jerry Shih <jerry.shih@sifive.com>");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("chacha20");
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
#include <linux/linkage.h>
.text
.option arch, +zvkb
#define KEYP a0
#define INP a1
#define OUTP a2
#define LEN a3
#define IVP a4
#define CONSTS0 a5
#define CONSTS1 a6
#define CONSTS2 a7
#define CONSTS3 t0
#define TMP t1
#define VL t2
#define STRIDE t3
#define NROUNDS t4
#define KEY0 s0
#define KEY1 s1
#define KEY2 s2
#define KEY3 s3
#define KEY4 s4
#define KEY5 s5
#define KEY6 s6
#define KEY7 s7
#define COUNTER s8
#define NONCE0 s9
#define NONCE1 s10
#define NONCE2 s11
.macro chacha_round a0, b0, c0, d0, a1, b1, c1, d1, \
a2, b2, c2, d2, a3, b3, c3, d3
// a += b; d ^= a; d = rol(d, 16);
vadd.vv \a0, \a0, \b0
vadd.vv \a1, \a1, \b1
vadd.vv \a2, \a2, \b2
vadd.vv \a3, \a3, \b3
vxor.vv \d0, \d0, \a0
vxor.vv \d1, \d1, \a1
vxor.vv \d2, \d2, \a2
vxor.vv \d3, \d3, \a3
vror.vi \d0, \d0, 32 - 16
vror.vi \d1, \d1, 32 - 16
vror.vi \d2, \d2, 32 - 16
vror.vi \d3, \d3, 32 - 16
// c += d; b ^= c; b = rol(b, 12);
vadd.vv \c0, \c0, \d0
vadd.vv \c1, \c1, \d1
vadd.vv \c2, \c2, \d2
vadd.vv \c3, \c3, \d3
vxor.vv \b0, \b0, \c0
vxor.vv \b1, \b1, \c1
vxor.vv \b2, \b2, \c2
vxor.vv \b3, \b3, \c3
vror.vi \b0, \b0, 32 - 12
vror.vi \b1, \b1, 32 - 12
vror.vi \b2, \b2, 32 - 12
vror.vi \b3, \b3, 32 - 12
// a += b; d ^= a; d = rol(d, 8);
vadd.vv \a0, \a0, \b0
vadd.vv \a1, \a1, \b1
vadd.vv \a2, \a2, \b2
vadd.vv \a3, \a3, \b3
vxor.vv \d0, \d0, \a0
vxor.vv \d1, \d1, \a1
vxor.vv \d2, \d2, \a2
vxor.vv \d3, \d3, \a3
vror.vi \d0, \d0, 32 - 8
vror.vi \d1, \d1, 32 - 8
vror.vi \d2, \d2, 32 - 8
vror.vi \d3, \d3, 32 - 8
// c += d; b ^= c; b = rol(b, 7);
vadd.vv \c0, \c0, \d0
vadd.vv \c1, \c1, \d1
vadd.vv \c2, \c2, \d2
vadd.vv \c3, \c3, \d3
vxor.vv \b0, \b0, \c0
vxor.vv \b1, \b1, \c1
vxor.vv \b2, \b2, \c2
vxor.vv \b3, \b3, \c3
vror.vi \b0, \b0, 32 - 7
vror.vi \b1, \b1, 32 - 7
vror.vi \b2, \b2, 32 - 7
vror.vi \b3, \b3, 32 - 7
.endm
// void chacha20_zvkb(const u32 key[8], const u8 *in, u8 *out, size_t len,
// const u32 iv[4]);
//
// |len| must be nonzero and a multiple of 64 (CHACHA_BLOCK_SIZE).
// The counter is treated as 32-bit, following the RFC7539 convention.
SYM_FUNC_START(chacha20_zvkb)
srli LEN, LEN, 6 // Bytes to blocks
addi sp, sp, -96
sd s0, 0(sp)
sd s1, 8(sp)
sd s2, 16(sp)
sd s3, 24(sp)
sd s4, 32(sp)
sd s5, 40(sp)
sd s6, 48(sp)
sd s7, 56(sp)
sd s8, 64(sp)
sd s9, 72(sp)
sd s10, 80(sp)
sd s11, 88(sp)
li STRIDE, 64
// Set up the initial state matrix in scalar registers.
li CONSTS0, 0x61707865 // "expa" little endian
li CONSTS1, 0x3320646e // "nd 3" little endian
li CONSTS2, 0x79622d32 // "2-by" little endian
li CONSTS3, 0x6b206574 // "te k" little endian
lw KEY0, 0(KEYP)
lw KEY1, 4(KEYP)
lw KEY2, 8(KEYP)
lw KEY3, 12(KEYP)
lw KEY4, 16(KEYP)
lw KEY5, 20(KEYP)
lw KEY6, 24(KEYP)
lw KEY7, 28(KEYP)
lw COUNTER, 0(IVP)
lw NONCE0, 4(IVP)
lw NONCE1, 8(IVP)
lw NONCE2, 12(IVP)
.Lblock_loop:
// Set vl to the number of blocks to process in this iteration.
vsetvli VL, LEN, e32, m1, ta, ma
// Set up the initial state matrix for the next VL blocks in v0-v15.
// v{i} holds the i'th 32-bit word of the state matrix for all blocks.
// Note that only the counter word, at index 12, differs across blocks.
vmv.v.x v0, CONSTS0
vmv.v.x v1, CONSTS1
vmv.v.x v2, CONSTS2
vmv.v.x v3, CONSTS3
vmv.v.x v4, KEY0
vmv.v.x v5, KEY1
vmv.v.x v6, KEY2
vmv.v.x v7, KEY3
vmv.v.x v8, KEY4
vmv.v.x v9, KEY5
vmv.v.x v10, KEY6
vmv.v.x v11, KEY7
vid.v v12
vadd.vx v12, v12, COUNTER
vmv.v.x v13, NONCE0
vmv.v.x v14, NONCE1
vmv.v.x v15, NONCE2
// Load the first half of the input data for each block into v16-v23.
// v{16+i} holds the i'th 32-bit word for all blocks.
vlsseg8e32.v v16, (INP), STRIDE
li NROUNDS, 20
.Lnext_doubleround:
addi NROUNDS, NROUNDS, -2
// column round
chacha_round v0, v4, v8, v12, v1, v5, v9, v13, \
v2, v6, v10, v14, v3, v7, v11, v15
// diagonal round
chacha_round v0, v5, v10, v15, v1, v6, v11, v12, \
v2, v7, v8, v13, v3, v4, v9, v14
bnez NROUNDS, .Lnext_doubleround
// Load the second half of the input data for each block into v24-v31.
// v{24+i} holds the {8+i}'th 32-bit word for all blocks.
addi TMP, INP, 32
vlsseg8e32.v v24, (TMP), STRIDE
// Finalize the first half of the keystream for each block.
vadd.vx v0, v0, CONSTS0
vadd.vx v1, v1, CONSTS1
vadd.vx v2, v2, CONSTS2
vadd.vx v3, v3, CONSTS3
vadd.vx v4, v4, KEY0
vadd.vx v5, v5, KEY1
vadd.vx v6, v6, KEY2
vadd.vx v7, v7, KEY3
// Encrypt/decrypt the first half of the data for each block.
vxor.vv v16, v16, v0
vxor.vv v17, v17, v1
vxor.vv v18, v18, v2
vxor.vv v19, v19, v3
vxor.vv v20, v20, v4
vxor.vv v21, v21, v5
vxor.vv v22, v22, v6
vxor.vv v23, v23, v7
// Store the first half of the output data for each block.
vssseg8e32.v v16, (OUTP), STRIDE
// Finalize the second half of the keystream for each block.
vadd.vx v8, v8, KEY4
vadd.vx v9, v9, KEY5
vadd.vx v10, v10, KEY6
vadd.vx v11, v11, KEY7
vid.v v0
vadd.vx v12, v12, COUNTER
vadd.vx v13, v13, NONCE0
vadd.vx v14, v14, NONCE1
vadd.vx v15, v15, NONCE2
vadd.vv v12, v12, v0
// Encrypt/decrypt the second half of the data for each block.
vxor.vv v24, v24, v8
vxor.vv v25, v25, v9
vxor.vv v26, v26, v10
vxor.vv v27, v27, v11
vxor.vv v29, v29, v13
vxor.vv v28, v28, v12
vxor.vv v30, v30, v14
vxor.vv v31, v31, v15
// Store the second half of the output data for each block.
addi TMP, OUTP, 32
vssseg8e32.v v24, (TMP), STRIDE
// Update the counter, the remaining number of blocks, and the input and
// output pointers according to the number of blocks processed (VL).
add COUNTER, COUNTER, VL
sub LEN, LEN, VL
slli TMP, VL, 6
add OUTP, OUTP, TMP
add INP, INP, TMP
bnez LEN, .Lblock_loop
ld s0, 0(sp)
ld s1, 8(sp)
ld s2, 16(sp)
ld s3, 24(sp)
ld s4, 32(sp)
ld s5, 40(sp)
ld s6, 48(sp)
ld s7, 56(sp)
ld s8, 64(sp)
ld s9, 72(sp)
ld s10, 80(sp)
ld s11, 88(sp)
addi sp, sp, 96
ret
SYM_FUNC_END(chacha20_zvkb)
// SPDX-License-Identifier: GPL-2.0-only
/*
* GHASH using the RISC-V vector crypto extensions
*
* Copyright (C) 2023 VRULL GmbH
* Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
*
* Copyright (C) 2023 SiFive, Inc.
* Author: Jerry Shih <jerry.shih@sifive.com>
*/
#include <asm/simd.h>
#include <asm/vector.h>
#include <crypto/ghash.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
#include <linux/linkage.h>
#include <linux/module.h>
asmlinkage void ghash_zvkg(be128 *accumulator, const be128 *key, const u8 *data,
size_t len);
struct riscv64_ghash_tfm_ctx {
be128 key;
};
struct riscv64_ghash_desc_ctx {
be128 accumulator;
u8 buffer[GHASH_BLOCK_SIZE];
u32 bytes;
};
static int riscv64_ghash_setkey(struct crypto_shash *tfm, const u8 *key,
unsigned int keylen)
{
struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(tfm);
if (keylen != GHASH_BLOCK_SIZE)
return -EINVAL;
memcpy(&tctx->key, key, GHASH_BLOCK_SIZE);
return 0;
}
static int riscv64_ghash_init(struct shash_desc *desc)
{
struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
*dctx = (struct riscv64_ghash_desc_ctx){};
return 0;
}
static inline void
riscv64_ghash_blocks(const struct riscv64_ghash_tfm_ctx *tctx,
struct riscv64_ghash_desc_ctx *dctx,
const u8 *src, size_t srclen)
{
/* The srclen is nonzero and a multiple of 16. */
if (crypto_simd_usable()) {
kernel_vector_begin();
ghash_zvkg(&dctx->accumulator, &tctx->key, src, srclen);
kernel_vector_end();
} else {
do {
crypto_xor((u8 *)&dctx->accumulator, src,
GHASH_BLOCK_SIZE);
gf128mul_lle(&dctx->accumulator, &tctx->key);
src += GHASH_BLOCK_SIZE;
srclen -= GHASH_BLOCK_SIZE;
} while (srclen);
}
}
static int riscv64_ghash_update(struct shash_desc *desc, const u8 *src,
unsigned int srclen)
{
const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
unsigned int len;
if (dctx->bytes) {
if (dctx->bytes + srclen < GHASH_BLOCK_SIZE) {
memcpy(dctx->buffer + dctx->bytes, src, srclen);
dctx->bytes += srclen;
return 0;
}
memcpy(dctx->buffer + dctx->bytes, src,
GHASH_BLOCK_SIZE - dctx->bytes);
riscv64_ghash_blocks(tctx, dctx, dctx->buffer,
GHASH_BLOCK_SIZE);
src += GHASH_BLOCK_SIZE - dctx->bytes;
srclen -= GHASH_BLOCK_SIZE - dctx->bytes;
dctx->bytes = 0;
}
len = round_down(srclen, GHASH_BLOCK_SIZE);
if (len) {
riscv64_ghash_blocks(tctx, dctx, src, len);
src += len;
srclen -= len;
}
if (srclen) {
memcpy(dctx->buffer, src, srclen);
dctx->bytes = srclen;
}
return 0;
}
static int riscv64_ghash_final(struct shash_desc *desc, u8 *out)
{
const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm);
struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
int i;
if (dctx->bytes) {
for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++)
dctx->buffer[i] = 0;
riscv64_ghash_blocks(tctx, dctx, dctx->buffer,
GHASH_BLOCK_SIZE);
}
memcpy(out, &dctx->accumulator, GHASH_DIGEST_SIZE);
return 0;
}
static struct shash_alg riscv64_ghash_alg = {
.init = riscv64_ghash_init,
.update = riscv64_ghash_update,
.final = riscv64_ghash_final,
.setkey = riscv64_ghash_setkey,
.descsize = sizeof(struct riscv64_ghash_desc_ctx),
.digestsize = GHASH_DIGEST_SIZE,
.base = {
.cra_blocksize = GHASH_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct riscv64_ghash_tfm_ctx),
.cra_priority = 300,
.cra_name = "ghash",
.cra_driver_name = "ghash-riscv64-zvkg",
.cra_module = THIS_MODULE,
},
};
static int __init riscv64_ghash_mod_init(void)
{
if (riscv_isa_extension_available(NULL, ZVKG) &&
riscv_vector_vlen() >= 128)
return crypto_register_shash(&riscv64_ghash_alg);
return -ENODEV;
}
static void __exit riscv64_ghash_mod_exit(void)
{
crypto_unregister_shash(&riscv64_ghash_alg);
}
module_init(riscv64_ghash_mod_init);
module_exit(riscv64_ghash_mod_exit);
MODULE_DESCRIPTION("GHASH (RISC-V accelerated)");
MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("ghash");
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector GCM/GMAC extension ('Zvkg')
#include <linux/linkage.h>
.text
.option arch, +zvkg
#define ACCUMULATOR a0
#define KEY a1
#define DATA a2
#define LEN a3
// void ghash_zvkg(be128 *accumulator, const be128 *key, const u8 *data,
// size_t len);
//
// |len| must be nonzero and a multiple of 16 (GHASH_BLOCK_SIZE).
SYM_FUNC_START(ghash_zvkg)
vsetivli zero, 4, e32, m1, ta, ma
vle32.v v1, (ACCUMULATOR)
vle32.v v2, (KEY)
.Lnext_block:
vle32.v v3, (DATA)
vghsh.vv v1, v2, v3
addi DATA, DATA, 16
addi LEN, LEN, -16
bnez LEN, .Lnext_block
vse32.v v1, (ACCUMULATOR)
ret
SYM_FUNC_END(ghash_zvkg)
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* SHA-256 and SHA-224 using the RISC-V vector crypto extensions
*
* Copyright (C) 2022 VRULL GmbH
* Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
*
* Copyright (C) 2023 SiFive, Inc.
* Author: Jerry Shih <jerry.shih@sifive.com>
*/
#include <asm/simd.h>
#include <asm/vector.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
#include <crypto/sha256_base.h>
#include <linux/linkage.h>
#include <linux/module.h>
/*
* Note: the asm function only uses the 'state' field of struct sha256_state.
* It is assumed to be the first field.
*/
asmlinkage void sha256_transform_zvknha_or_zvknhb_zvkb(
struct sha256_state *state, const u8 *data, int num_blocks);
static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
/*
* Ensure struct sha256_state begins directly with the SHA-256
* 256-bit internal state, as this is what the asm function expects.
*/
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
if (crypto_simd_usable()) {
kernel_vector_begin();
sha256_base_do_update(desc, data, len,
sha256_transform_zvknha_or_zvknhb_zvkb);
kernel_vector_end();
} else {
crypto_sha256_update(desc, data, len);
}
return 0;
}
static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (crypto_simd_usable()) {
kernel_vector_begin();
if (len)
sha256_base_do_update(
desc, data, len,
sha256_transform_zvknha_or_zvknhb_zvkb);
sha256_base_do_finalize(
desc, sha256_transform_zvknha_or_zvknhb_zvkb);
kernel_vector_end();
return sha256_base_finish(desc, out);
}
return crypto_sha256_finup(desc, data, len, out);
}
static int riscv64_sha256_final(struct shash_desc *desc, u8 *out)
{
return riscv64_sha256_finup(desc, NULL, 0, out);
}
static int riscv64_sha256_digest(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha256_base_init(desc) ?:
riscv64_sha256_finup(desc, data, len, out);
}
static struct shash_alg riscv64_sha256_algs[] = {
{
.init = sha256_base_init,
.update = riscv64_sha256_update,
.final = riscv64_sha256_final,
.finup = riscv64_sha256_finup,
.digest = riscv64_sha256_digest,
.descsize = sizeof(struct sha256_state),
.digestsize = SHA256_DIGEST_SIZE,
.base = {
.cra_blocksize = SHA256_BLOCK_SIZE,
.cra_priority = 300,
.cra_name = "sha256",
.cra_driver_name = "sha256-riscv64-zvknha_or_zvknhb-zvkb",
.cra_module = THIS_MODULE,
},
}, {
.init = sha224_base_init,
.update = riscv64_sha256_update,
.final = riscv64_sha256_final,
.finup = riscv64_sha256_finup,
.descsize = sizeof(struct sha256_state),
.digestsize = SHA224_DIGEST_SIZE,
.base = {
.cra_blocksize = SHA224_BLOCK_SIZE,
.cra_priority = 300,
.cra_name = "sha224",
.cra_driver_name = "sha224-riscv64-zvknha_or_zvknhb-zvkb",
.cra_module = THIS_MODULE,
},
},
};
static int __init riscv64_sha256_mod_init(void)
{
/* Both zvknha and zvknhb provide the SHA-256 instructions. */
if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
riscv_isa_extension_available(NULL, ZVKNHB)) &&
riscv_isa_extension_available(NULL, ZVKB) &&
riscv_vector_vlen() >= 128)
return crypto_register_shashes(riscv64_sha256_algs,
ARRAY_SIZE(riscv64_sha256_algs));
return -ENODEV;
}
static void __exit riscv64_sha256_mod_exit(void)
{
crypto_unregister_shashes(riscv64_sha256_algs,
ARRAY_SIZE(riscv64_sha256_algs));
}
module_init(riscv64_sha256_mod_init);
module_exit(riscv64_sha256_mod_exit);
MODULE_DESCRIPTION("SHA-256 (RISC-V accelerated)");
MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("sha256");
MODULE_ALIAS_CRYPTO("sha224");
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
// Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector SHA-2 Secure Hash extension ('Zvknha' or 'Zvknhb')
// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
#include <linux/cfi_types.h>
.text
.option arch, +zvknha, +zvkb
#define STATEP a0
#define DATA a1
#define NUM_BLOCKS a2
#define STATEP_C a3
#define MASK v0
#define INDICES v1
#define W0 v2
#define W1 v3
#define W2 v4
#define W3 v5
#define VTMP v6
#define FEBA v7
#define HGDC v8
#define K0 v10
#define K1 v11
#define K2 v12
#define K3 v13
#define K4 v14
#define K5 v15
#define K6 v16
#define K7 v17
#define K8 v18
#define K9 v19
#define K10 v20
#define K11 v21
#define K12 v22
#define K13 v23
#define K14 v24
#define K15 v25
#define PREV_FEBA v26
#define PREV_HGDC v27
// Do 4 rounds of SHA-256. w0 contains the current 4 message schedule words.
//
// If not all the message schedule words have been computed yet, then this also
// computes 4 more message schedule words. w1-w3 contain the next 3 groups of 4
// message schedule words; this macro computes the group after w3 and writes it
// to w0. This means that the next (w0, w1, w2, w3) is the current (w1, w2, w3,
// w0), so the caller must cycle through the registers accordingly.
.macro sha256_4rounds last, k, w0, w1, w2, w3
vadd.vv VTMP, \k, \w0
vsha2cl.vv HGDC, FEBA, VTMP
vsha2ch.vv FEBA, HGDC, VTMP
.if !\last
vmerge.vvm VTMP, \w2, \w1, MASK
vsha2ms.vv \w0, VTMP, \w3
.endif
.endm
.macro sha256_16rounds last, k0, k1, k2, k3
sha256_4rounds \last, \k0, W0, W1, W2, W3
sha256_4rounds \last, \k1, W1, W2, W3, W0
sha256_4rounds \last, \k2, W2, W3, W0, W1
sha256_4rounds \last, \k3, W3, W0, W1, W2
.endm
// void sha256_transform_zvknha_or_zvknhb_zvkb(u32 state[8], const u8 *data,
// int num_blocks);
SYM_TYPED_FUNC_START(sha256_transform_zvknha_or_zvknhb_zvkb)
// Load the round constants into K0-K15.
vsetivli zero, 4, e32, m1, ta, ma
la t0, K256
vle32.v K0, (t0)
addi t0, t0, 16
vle32.v K1, (t0)
addi t0, t0, 16
vle32.v K2, (t0)
addi t0, t0, 16
vle32.v K3, (t0)
addi t0, t0, 16
vle32.v K4, (t0)
addi t0, t0, 16
vle32.v K5, (t0)
addi t0, t0, 16
vle32.v K6, (t0)
addi t0, t0, 16
vle32.v K7, (t0)
addi t0, t0, 16
vle32.v K8, (t0)
addi t0, t0, 16
vle32.v K9, (t0)
addi t0, t0, 16
vle32.v K10, (t0)
addi t0, t0, 16
vle32.v K11, (t0)
addi t0, t0, 16
vle32.v K12, (t0)
addi t0, t0, 16
vle32.v K13, (t0)
addi t0, t0, 16
vle32.v K14, (t0)
addi t0, t0, 16
vle32.v K15, (t0)
// Setup mask for the vmerge to replace the first word (idx==0) in
// message scheduling. There are 4 words, so an 8-bit mask suffices.
vsetivli zero, 1, e8, m1, ta, ma
vmv.v.i MASK, 0x01
// Load the state. The state is stored as {a,b,c,d,e,f,g,h}, but we
// need {f,e,b,a},{h,g,d,c}. The dst vtype is e32m1 and the index vtype
// is e8mf4. We use index-load with the i8 indices {20, 16, 4, 0},
// loaded using the 32-bit little endian value 0x00041014.
li t0, 0x00041014
vsetivli zero, 1, e32, m1, ta, ma
vmv.v.x INDICES, t0
addi STATEP_C, STATEP, 8
vsetivli zero, 4, e32, m1, ta, ma
vluxei8.v FEBA, (STATEP), INDICES
vluxei8.v HGDC, (STATEP_C), INDICES
.Lnext_block:
addi NUM_BLOCKS, NUM_BLOCKS, -1
// Save the previous state, as it's needed later.
vmv.v.v PREV_FEBA, FEBA
vmv.v.v PREV_HGDC, HGDC
// Load the next 512-bit message block and endian-swap each 32-bit word.
vle32.v W0, (DATA)
vrev8.v W0, W0
addi DATA, DATA, 16
vle32.v W1, (DATA)
vrev8.v W1, W1
addi DATA, DATA, 16
vle32.v W2, (DATA)
vrev8.v W2, W2
addi DATA, DATA, 16
vle32.v W3, (DATA)
vrev8.v W3, W3
addi DATA, DATA, 16
// Do the 64 rounds of SHA-256.
sha256_16rounds 0, K0, K1, K2, K3
sha256_16rounds 0, K4, K5, K6, K7
sha256_16rounds 0, K8, K9, K10, K11
sha256_16rounds 1, K12, K13, K14, K15
// Add the previous state.
vadd.vv FEBA, FEBA, PREV_FEBA
vadd.vv HGDC, HGDC, PREV_HGDC
// Repeat if more blocks remain.
bnez NUM_BLOCKS, .Lnext_block
// Store the new state and return.
vsuxei8.v FEBA, (STATEP), INDICES
vsuxei8.v HGDC, (STATEP_C), INDICES
ret
SYM_FUNC_END(sha256_transform_zvknha_or_zvknhb_zvkb)
.section ".rodata"
.p2align 2
.type K256, @object
K256:
.word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
.word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
.word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
.word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
.word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
.word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
.word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
.word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
.word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
.word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
.word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
.word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
.word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
.word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
.word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
.word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
.size K256, . - K256
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* SHA-512 and SHA-384 using the RISC-V vector crypto extensions
*
* Copyright (C) 2023 VRULL GmbH
* Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
*
* Copyright (C) 2023 SiFive, Inc.
* Author: Jerry Shih <jerry.shih@sifive.com>
*/
#include <asm/simd.h>
#include <asm/vector.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
#include <crypto/sha512_base.h>
#include <linux/linkage.h>
#include <linux/module.h>
/*
* Note: the asm function only uses the 'state' field of struct sha512_state.
* It is assumed to be the first field.
*/
asmlinkage void sha512_transform_zvknhb_zvkb(
struct sha512_state *state, const u8 *data, int num_blocks);
static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
/*
* Ensure struct sha512_state begins directly with the SHA-512
* 512-bit internal state, as this is what the asm function expects.
*/
BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0);
if (crypto_simd_usable()) {
kernel_vector_begin();
sha512_base_do_update(desc, data, len,
sha512_transform_zvknhb_zvkb);
kernel_vector_end();
} else {
crypto_sha512_update(desc, data, len);
}
return 0;
}
static int riscv64_sha512_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
if (crypto_simd_usable()) {
kernel_vector_begin();
if (len)
sha512_base_do_update(desc, data, len,
sha512_transform_zvknhb_zvkb);
sha512_base_do_finalize(desc, sha512_transform_zvknhb_zvkb);
kernel_vector_end();
return sha512_base_finish(desc, out);
}
return crypto_sha512_finup(desc, data, len, out);
}
static int riscv64_sha512_final(struct shash_desc *desc, u8 *out)
{
return riscv64_sha512_finup(desc, NULL, 0, out);
}
static int riscv64_sha512_digest(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return sha512_base_init(desc) ?:
riscv64_sha512_finup(desc, data, len, out);
}
static struct shash_alg riscv64_sha512_algs[] = {
{
.init = sha512_base_init,
.update = riscv64_sha512_update,
.final = riscv64_sha512_final,
.finup = riscv64_sha512_finup,
.digest = riscv64_sha512_digest,
.descsize = sizeof(struct sha512_state),
.digestsize = SHA512_DIGEST_SIZE,
.base = {
.cra_blocksize = SHA512_BLOCK_SIZE,
.cra_priority = 300,
.cra_name = "sha512",
.cra_driver_name = "sha512-riscv64-zvknhb-zvkb",
.cra_module = THIS_MODULE,
},
}, {
.init = sha384_base_init,
.update = riscv64_sha512_update,
.final = riscv64_sha512_final,
.finup = riscv64_sha512_finup,
.descsize = sizeof(struct sha512_state),
.digestsize = SHA384_DIGEST_SIZE,
.base = {
.cra_blocksize = SHA384_BLOCK_SIZE,
.cra_priority = 300,
.cra_name = "sha384",
.cra_driver_name = "sha384-riscv64-zvknhb-zvkb",
.cra_module = THIS_MODULE,
},
},
};
static int __init riscv64_sha512_mod_init(void)
{
if (riscv_isa_extension_available(NULL, ZVKNHB) &&
riscv_isa_extension_available(NULL, ZVKB) &&
riscv_vector_vlen() >= 128)
return crypto_register_shashes(riscv64_sha512_algs,
ARRAY_SIZE(riscv64_sha512_algs));
return -ENODEV;
}
static void __exit riscv64_sha512_mod_exit(void)
{
crypto_unregister_shashes(riscv64_sha512_algs,
ARRAY_SIZE(riscv64_sha512_algs));
}
module_init(riscv64_sha512_mod_init);
module_exit(riscv64_sha512_mod_exit);
MODULE_DESCRIPTION("SHA-512 (RISC-V accelerated)");
MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("sha512");
MODULE_ALIAS_CRYPTO("sha384");
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
// Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector SHA-2 Secure Hash extension ('Zvknhb')
// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
#include <linux/cfi_types.h>
.text
.option arch, +zvknhb, +zvkb
#define STATEP a0
#define DATA a1
#define NUM_BLOCKS a2
#define STATEP_C a3
#define K a4
#define MASK v0
#define INDICES v1
#define W0 v10 // LMUL=2
#define W1 v12 // LMUL=2
#define W2 v14 // LMUL=2
#define W3 v16 // LMUL=2
#define VTMP v20 // LMUL=2
#define FEBA v22 // LMUL=2
#define HGDC v24 // LMUL=2
#define PREV_FEBA v26 // LMUL=2
#define PREV_HGDC v28 // LMUL=2
// Do 4 rounds of SHA-512. w0 contains the current 4 message schedule words.
//
// If not all the message schedule words have been computed yet, then this also
// computes 4 more message schedule words. w1-w3 contain the next 3 groups of 4
// message schedule words; this macro computes the group after w3 and writes it
// to w0. This means that the next (w0, w1, w2, w3) is the current (w1, w2, w3,
// w0), so the caller must cycle through the registers accordingly.
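// (sha512_16rounds below performs this rotation by permuting the w0-w3
// arguments across its four calls.)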
.macro sha512_4rounds last, w0, w1, w2, w3
vle64.v VTMP, (K)
addi K, K, 32
vadd.vv VTMP, VTMP, \w0
vsha2cl.vv HGDC, FEBA, VTMP
vsha2ch.vv FEBA, HGDC, VTMP
.if !\last
vmerge.vvm VTMP, \w2, \w1, MASK
vsha2ms.vv \w0, VTMP, \w3
.endif
.endm
.macro sha512_16rounds last
sha512_4rounds \last, W0, W1, W2, W3
sha512_4rounds \last, W1, W2, W3, W0
sha512_4rounds \last, W2, W3, W0, W1
sha512_4rounds \last, W3, W0, W1, W2
.endm
// void sha512_transform_zvknhb_zvkb(u64 state[8], const u8 *data,
// int num_blocks);
SYM_TYPED_FUNC_START(sha512_transform_zvknhb_zvkb)
// Setup mask for the vmerge to replace the first word (idx==0) in
// message scheduling. There are 4 words, so an 8-bit mask suffices.
vsetivli zero, 1, e8, m1, ta, ma
vmv.v.i MASK, 0x01
// Load the state. The state is stored as {a,b,c,d,e,f,g,h}, but we
// need {f,e,b,a},{h,g,d,c}. The dst vtype is e64m2 and the index vtype
// is e8mf4. We use index-load with the i8 indices {40, 32, 8, 0},
// loaded using the 32-bit little endian value 0x00082028.
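// Offsets 40, 32, 8, 0 from STATEP select f, e, b, a; the same offsets
// applied from STATEP + 16 select h, g, d, c.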
li t0, 0x00082028
vsetivli zero, 1, e32, m1, ta, ma
vmv.v.x INDICES, t0
addi STATEP_C, STATEP, 16
vsetivli zero, 4, e64, m2, ta, ma
vluxei8.v FEBA, (STATEP), INDICES
vluxei8.v HGDC, (STATEP_C), INDICES
.Lnext_block:
la K, K512
addi NUM_BLOCKS, NUM_BLOCKS, -1
// Save the previous state, as it's needed later.
vmv.v.v PREV_FEBA, FEBA
vmv.v.v PREV_HGDC, HGDC
// Load the next 1024-bit message block and endian-swap each 64-bit word
vle64.v W0, (DATA)
vrev8.v W0, W0
addi DATA, DATA, 32
vle64.v W1, (DATA)
vrev8.v W1, W1
addi DATA, DATA, 32
vle64.v W2, (DATA)
vrev8.v W2, W2
addi DATA, DATA, 32
vle64.v W3, (DATA)
vrev8.v W3, W3
addi DATA, DATA, 32
// Do the 80 rounds of SHA-512.
sha512_16rounds 0
sha512_16rounds 0
sha512_16rounds 0
sha512_16rounds 0
sha512_16rounds 1
// Add the previous state.
vadd.vv FEBA, FEBA, PREV_FEBA
vadd.vv HGDC, HGDC, PREV_HGDC
// Repeat if more blocks remain.
bnez NUM_BLOCKS, .Lnext_block
// Store the new state and return.
vsuxei8.v FEBA, (STATEP), INDICES
vsuxei8.v HGDC, (STATEP_C), INDICES
ret
SYM_FUNC_END(sha512_transform_zvknhb_zvkb)
.section ".rodata"
.p2align 3
.type K512, @object
K512:
.dword 0x428a2f98d728ae22, 0x7137449123ef65cd
.dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
.dword 0x3956c25bf348b538, 0x59f111f1b605d019
.dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118
.dword 0xd807aa98a3030242, 0x12835b0145706fbe
.dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
.dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1
.dword 0x9bdc06a725c71235, 0xc19bf174cf692694
.dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3
.dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
.dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483
.dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
.dword 0x983e5152ee66dfab, 0xa831c66d2db43210
.dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4
.dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725
.dword 0x06ca6351e003826f, 0x142929670a0e6e70
.dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926
.dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
.dword 0x650a73548baf63de, 0x766a0abb3c77b2a8
.dword 0x81c2c92e47edaee6, 0x92722c851482353b
.dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001
.dword 0xc24b8b70d0f89791, 0xc76c51a30654be30
.dword 0xd192e819d6ef5218, 0xd69906245565a910
.dword 0xf40e35855771202a, 0x106aa07032bbd1b8
.dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53
.dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8
.dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb
.dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3
.dword 0x748f82ee5defb2fc, 0x78a5636f43172f60
.dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec
.dword 0x90befffa23631e28, 0xa4506cebde82bde9
.dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b
.dword 0xca273eceea26619c, 0xd186b8c721c0c207
.dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178
.dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6
.dword 0x113f9804bef90dae, 0x1b710b35131c471b
.dword 0x28db77f523047d84, 0x32caab7b40c72493
.dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c
.dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a
.dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817
.size K512, . - K512
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* SM3 using the RISC-V vector crypto extensions
*
* Copyright (C) 2023 VRULL GmbH
* Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
*
* Copyright (C) 2023 SiFive, Inc.
* Author: Jerry Shih <jerry.shih@sifive.com>
*/
#include <asm/simd.h>
#include <asm/vector.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/simd.h>
#include <crypto/sm3_base.h>
#include <linux/linkage.h>
#include <linux/module.h>
/*
* Note: the asm function only uses the 'state' field of struct sm3_state.
* It is assumed to be the first field.
*/
asmlinkage void sm3_transform_zvksh_zvkb(
struct sm3_state *state, const u8 *data, int num_blocks);
static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
/*
* Ensure struct sm3_state begins directly with the SM3
* 256-bit internal state, as this is what the asm function expects.
*/
BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0);
if (crypto_simd_usable()) {
kernel_vector_begin();
sm3_base_do_update(desc, data, len, sm3_transform_zvksh_zvkb);
kernel_vector_end();
} else {
sm3_update(shash_desc_ctx(desc), data, len);
}
return 0;
}
static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
struct sm3_state *ctx;
if (crypto_simd_usable()) {
kernel_vector_begin();
if (len)
sm3_base_do_update(desc, data, len,
sm3_transform_zvksh_zvkb);
sm3_base_do_finalize(desc, sm3_transform_zvksh_zvkb);
kernel_vector_end();
return sm3_base_finish(desc, out);
}
ctx = shash_desc_ctx(desc);
if (len)
sm3_update(ctx, data, len);
sm3_final(ctx, out);
return 0;
}
static int riscv64_sm3_final(struct shash_desc *desc, u8 *out)
{
return riscv64_sm3_finup(desc, NULL, 0, out);
}
static struct shash_alg riscv64_sm3_alg = {
.init = sm3_base_init,
.update = riscv64_sm3_update,
.final = riscv64_sm3_final,
.finup = riscv64_sm3_finup,
.descsize = sizeof(struct sm3_state),
.digestsize = SM3_DIGEST_SIZE,
.base = {
.cra_blocksize = SM3_BLOCK_SIZE,
.cra_priority = 300,
.cra_name = "sm3",
.cra_driver_name = "sm3-riscv64-zvksh-zvkb",
.cra_module = THIS_MODULE,
},
};
static int __init riscv64_sm3_mod_init(void)
{
if (riscv_isa_extension_available(NULL, ZVKSH) &&
riscv_isa_extension_available(NULL, ZVKB) &&
riscv_vector_vlen() >= 128)
return crypto_register_shash(&riscv64_sm3_alg);
return -ENODEV;
}
static void __exit riscv64_sm3_mod_exit(void)
{
crypto_unregister_shash(&riscv64_sm3_alg);
}
module_init(riscv64_sm3_mod_init);
module_exit(riscv64_sm3_mod_exit);
MODULE_DESCRIPTION("SM3 (RISC-V accelerated)");
MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("sm3");
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector SM3 Secure Hash extension ('Zvksh')
// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
#include <linux/cfi_types.h>
.text
.option arch, +zvksh, +zvkb
#define STATEP a0
#define DATA a1
#define NUM_BLOCKS a2
#define STATE v0 // LMUL=2
#define PREV_STATE v2 // LMUL=2
#define W0 v4 // LMUL=2
#define W1 v6 // LMUL=2
#define VTMP v8 // LMUL=2
.macro sm3_8rounds i, w0, w1
// Do 4 rounds using W_{0+i}..W_{7+i}.
vsm3c.vi STATE, \w0, \i + 0
vslidedown.vi VTMP, \w0, 2
vsm3c.vi STATE, VTMP, \i + 1
// Compute W_{4+i}..W_{11+i}.
vslidedown.vi VTMP, \w0, 4
vslideup.vi VTMP, \w1, 4
// Do 4 rounds using W_{4+i}..W_{11+i}.
vsm3c.vi STATE, VTMP, \i + 2
vslidedown.vi VTMP, VTMP, 2
vsm3c.vi STATE, VTMP, \i + 3
.if \i < 28
// Compute W_{16+i}..W_{23+i}.
vsm3me.vv \w0, \w1, \w0
.endif
// For the next 8 rounds, w0 and w1 are swapped.
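// (See the alternating W0/W1 arguments in the sm3_8rounds calls below.)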
.endm
// void sm3_transform_zvksh_zvkb(u32 state[8], const u8 *data, int num_blocks);
SYM_TYPED_FUNC_START(sm3_transform_zvksh_zvkb)
// Load the state and endian-swap each 32-bit word.
vsetivli zero, 8, e32, m2, ta, ma
vle32.v STATE, (STATEP)
vrev8.v STATE, STATE
.Lnext_block:
addi NUM_BLOCKS, NUM_BLOCKS, -1
// Save the previous state, as it's needed later.
vmv.v.v PREV_STATE, STATE
// Load the next 512-bit message block into W0-W1.
vle32.v W0, (DATA)
addi DATA, DATA, 32
vle32.v W1, (DATA)
addi DATA, DATA, 32
// Do the 64 rounds of SM3.
sm3_8rounds 0, W0, W1
sm3_8rounds 4, W1, W0
sm3_8rounds 8, W0, W1
sm3_8rounds 12, W1, W0
sm3_8rounds 16, W0, W1
sm3_8rounds 20, W1, W0
sm3_8rounds 24, W0, W1
sm3_8rounds 28, W1, W0
// XOR in the previous state.
vxor.vv STATE, STATE, PREV_STATE
// Repeat if more blocks remain.
bnez NUM_BLOCKS, .Lnext_block
// Store the new state and return.
vrev8.v STATE, STATE
vse32.v STATE, (STATEP)
ret
SYM_FUNC_END(sm3_transform_zvksh_zvkb)
// SPDX-License-Identifier: GPL-2.0-only
/*
* SM4 using the RISC-V vector crypto extensions
*
* Copyright (C) 2023 VRULL GmbH
* Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
*
* Copyright (C) 2023 SiFive, Inc.
* Author: Jerry Shih <jerry.shih@sifive.com>
*/
#include <asm/simd.h>
#include <asm/vector.h>
#include <crypto/internal/cipher.h>
#include <crypto/internal/simd.h>
#include <crypto/sm4.h>
#include <linux/linkage.h>
#include <linux/module.h>
asmlinkage void sm4_expandkey_zvksed_zvkb(const u8 user_key[SM4_KEY_SIZE],
u32 rkey_enc[SM4_RKEY_WORDS],
u32 rkey_dec[SM4_RKEY_WORDS]);
asmlinkage void sm4_crypt_zvksed_zvkb(const u32 rkey[SM4_RKEY_WORDS],
const u8 in[SM4_BLOCK_SIZE],
u8 out[SM4_BLOCK_SIZE]);
static int riscv64_sm4_setkey(struct crypto_tfm *tfm, const u8 *key,
unsigned int keylen)
{
struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (crypto_simd_usable()) {
if (keylen != SM4_KEY_SIZE)
return -EINVAL;
kernel_vector_begin();
sm4_expandkey_zvksed_zvkb(key, ctx->rkey_enc, ctx->rkey_dec);
kernel_vector_end();
return 0;
}
return sm4_expandkey(ctx, key, keylen);
}
static void riscv64_sm4_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (crypto_simd_usable()) {
kernel_vector_begin();
sm4_crypt_zvksed_zvkb(ctx->rkey_enc, src, dst);
kernel_vector_end();
} else {
sm4_crypt_block(ctx->rkey_enc, dst, src);
}
}
static void riscv64_sm4_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm);
if (crypto_simd_usable()) {
kernel_vector_begin();
sm4_crypt_zvksed_zvkb(ctx->rkey_dec, src, dst);
kernel_vector_end();
} else {
sm4_crypt_block(ctx->rkey_dec, dst, src);
}
}
static struct crypto_alg riscv64_sm4_alg = {
.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
.cra_blocksize = SM4_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct sm4_ctx),
.cra_priority = 300,
.cra_name = "sm4",
.cra_driver_name = "sm4-riscv64-zvksed-zvkb",
.cra_cipher = {
.cia_min_keysize = SM4_KEY_SIZE,
.cia_max_keysize = SM4_KEY_SIZE,
.cia_setkey = riscv64_sm4_setkey,
.cia_encrypt = riscv64_sm4_encrypt,
.cia_decrypt = riscv64_sm4_decrypt,
},
.cra_module = THIS_MODULE,
};
static int __init riscv64_sm4_mod_init(void)
{
if (riscv_isa_extension_available(NULL, ZVKSED) &&
riscv_isa_extension_available(NULL, ZVKB) &&
riscv_vector_vlen() >= 128)
return crypto_register_alg(&riscv64_sm4_alg);
return -ENODEV;
}
static void __exit riscv64_sm4_mod_exit(void)
{
crypto_unregister_alg(&riscv64_sm4_alg);
}
module_init(riscv64_sm4_mod_init);
module_exit(riscv64_sm4_mod_exit);
MODULE_DESCRIPTION("SM4 (RISC-V accelerated)");
MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("sm4");
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector SM4 Block Cipher extension ('Zvksed')
// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')
#include <linux/linkage.h>
.text
.option arch, +zvksed, +zvkb
// void sm4_expandkey_zksed_zvkb(const u8 user_key[16], u32 rkey_enc[32],
// u32 rkey_dec[32]);
SYM_FUNC_START(sm4_expandkey_zvksed_zvkb)
vsetivli zero, 4, e32, m1, ta, ma
// Load the user key.
vle32.v v1, (a0)
vrev8.v v1, v1
// XOR the user key with the family key.
la t0, FAMILY_KEY
vle32.v v2, (t0)
vxor.vv v1, v1, v2
// Compute the round keys. Store them in forwards order in rkey_enc
// and in reverse order in rkey_dec.
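// (SM4 decryption uses the round keys in reverse order, i.e.
// rkey_dec[i] = rkey_enc[31 - i].)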
addi a2, a2, 31*4
li t0, -4
.set i, 0
.rept 8
vsm4k.vi v1, v1, i
vse32.v v1, (a1) // Store to rkey_enc.
vsse32.v v1, (a2), t0 // Store to rkey_dec.
.if i < 7
addi a1, a1, 16
addi a2, a2, -16
.endif
.set i, i + 1
.endr
ret
SYM_FUNC_END(sm4_expandkey_zvksed_zvkb)
// void sm4_crypt_zvksed_zvkb(const u32 rkey[32], const u8 in[16], u8 out[16]);
SYM_FUNC_START(sm4_crypt_zvksed_zvkb)
vsetivli zero, 4, e32, m1, ta, ma
// Load the input data.
vle32.v v1, (a1)
vrev8.v v1, v1
// Do the 32 rounds of SM4, 4 at a time.
.set i, 0
.rept 8
vle32.v v2, (a0)
vsm4r.vs v1, v2
.if i < 7
addi a0, a0, 16
.endif
.set i, i + 1
.endr
// Store the output data (in reverse element order).
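// (The SM4 ciphertext is the final four state words in reverse order, hence
// the byte-swap followed by the negative-stride store.)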
vrev8.v v1, v1
li t0, -4
addi a2, a2, 12
vsse32.v v1, (a2), t0
ret
SYM_FUNC_END(sm4_crypt_zvksed_zvkb)
.section ".rodata"
.p2align 2
.type FAMILY_KEY, @object
FAMILY_KEY:
.word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC
.size FAMILY_KEY, . - FAMILY_KEY
......@@ -18,9 +18,9 @@
#include <asm/sbi.h>
#include <asm/vendorid_list.h>
#define ANDESTECH_AX45MP_MARCHID 0x8000000000008a45UL
#define ANDESTECH_AX45MP_MIMPID 0x500UL
#define ANDESTECH_SBI_EXT_ANDES 0x0900031E
#define ANDES_AX45MP_MARCHID 0x8000000000008a45UL
#define ANDES_AX45MP_MIMPID 0x500UL
#define ANDES_SBI_EXT_ANDES 0x0900031E
#define ANDES_SBI_EXT_IOCP_SW_WORKAROUND 1
......@@ -32,7 +32,7 @@ static long ax45mp_iocp_sw_workaround(void)
* The ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI extension checks whether the IOCP
* is missing and the cache is controllable; only then is CMO applied to the
* platform.
*/
ret = sbi_ecall(ANDESTECH_SBI_EXT_ANDES, ANDES_SBI_EXT_IOCP_SW_WORKAROUND,
ret = sbi_ecall(ANDES_SBI_EXT_ANDES, ANDES_SBI_EXT_IOCP_SW_WORKAROUND,
0, 0, 0, 0, 0, 0);
return ret.error ? 0 : ret.value;
......@@ -50,7 +50,7 @@ static void errata_probe_iocp(unsigned int stage, unsigned long arch_id, unsigne
done = true;
if (arch_id != ANDESTECH_AX45MP_MARCHID || impid != ANDESTECH_AX45MP_MIMPID)
if (arch_id != ANDES_AX45MP_MARCHID || impid != ANDES_AX45MP_MIMPID)
return;
if (!ax45mp_iocp_sw_workaround())
......
......@@ -183,6 +183,16 @@
REG_L x31, PT_T6(sp)
.endm
/* Annotate a function as being unsuitable for kprobes. */
#ifdef CONFIG_KPROBES
#define ASM_NOKPROBE(name) \
.pushsection "_kprobe_blacklist", "aw"; \
RISCV_PTR name; \
.popsection
#else
#define ASM_NOKPROBE(name)
#endif
#endif /* __ASSEMBLY__ */
#endif /* _ASM_RISCV_ASM_H */
......@@ -17,7 +17,6 @@
#endif
#include <asm/cmpxchg.h>
#include <asm/barrier.h>
#define __atomic_acquire_fence() \
__asm__ __volatile__(RISCV_ACQUIRE_BARRIER "" ::: "memory")
......@@ -207,7 +206,7 @@ static __always_inline int arch_atomic_fetch_add_unless(atomic_t *v, int a, int
" add %[rc], %[p], %[a]\n"
" sc.w.rl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
" fence rw, rw\n"
RISCV_FULL_BARRIER
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
: [a]"r" (a), [u]"r" (u)
......@@ -228,7 +227,7 @@ static __always_inline s64 arch_atomic64_fetch_add_unless(atomic64_t *v, s64 a,
" add %[rc], %[p], %[a]\n"
" sc.d.rl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
" fence rw, rw\n"
RISCV_FULL_BARRIER
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
: [a]"r" (a), [u]"r" (u)
......@@ -248,7 +247,7 @@ static __always_inline bool arch_atomic_inc_unless_negative(atomic_t *v)
" addi %[rc], %[p], 1\n"
" sc.w.rl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
" fence rw, rw\n"
RISCV_FULL_BARRIER
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
:
......@@ -268,7 +267,7 @@ static __always_inline bool arch_atomic_dec_unless_positive(atomic_t *v)
" addi %[rc], %[p], -1\n"
" sc.w.rl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
" fence rw, rw\n"
RISCV_FULL_BARRIER
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
:
......@@ -288,7 +287,7 @@ static __always_inline int arch_atomic_dec_if_positive(atomic_t *v)
" bltz %[rc], 1f\n"
" sc.w.rl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
" fence rw, rw\n"
RISCV_FULL_BARRIER
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
:
......@@ -310,7 +309,7 @@ static __always_inline bool arch_atomic64_inc_unless_negative(atomic64_t *v)
" addi %[rc], %[p], 1\n"
" sc.d.rl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
" fence rw, rw\n"
RISCV_FULL_BARRIER
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
:
......@@ -331,7 +330,7 @@ static __always_inline bool arch_atomic64_dec_unless_positive(atomic64_t *v)
" addi %[rc], %[p], -1\n"
" sc.d.rl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
" fence rw, rw\n"
RISCV_FULL_BARRIER
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
:
......@@ -352,7 +351,7 @@ static __always_inline s64 arch_atomic64_dec_if_positive(atomic64_t *v)
" bltz %[rc], 1f\n"
" sc.d.rl %[rc], %[rc], %[c]\n"
" bnez %[rc], 0b\n"
" fence rw, rw\n"
RISCV_FULL_BARRIER
"1:\n"
: [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter)
:
......
......@@ -11,28 +11,27 @@
#define _ASM_RISCV_BARRIER_H
#ifndef __ASSEMBLY__
#include <asm/fence.h>
#define nop() __asm__ __volatile__ ("nop")
#define __nops(n) ".rept " #n "\nnop\n.endr\n"
#define nops(n) __asm__ __volatile__ (__nops(n))
#define RISCV_FENCE(p, s) \
__asm__ __volatile__ ("fence " #p "," #s : : : "memory")
/* These barriers need to enforce ordering on both devices or memory. */
#define mb() RISCV_FENCE(iorw,iorw)
#define rmb() RISCV_FENCE(ir,ir)
#define wmb() RISCV_FENCE(ow,ow)
#define __mb() RISCV_FENCE(iorw, iorw)
#define __rmb() RISCV_FENCE(ir, ir)
#define __wmb() RISCV_FENCE(ow, ow)
/* These barriers do not need to enforce ordering on devices, just memory. */
#define __smp_mb() RISCV_FENCE(rw,rw)
#define __smp_rmb() RISCV_FENCE(r,r)
#define __smp_wmb() RISCV_FENCE(w,w)
#define __smp_mb() RISCV_FENCE(rw, rw)
#define __smp_rmb() RISCV_FENCE(r, r)
#define __smp_wmb() RISCV_FENCE(w, w)
#define __smp_store_release(p, v) \
do { \
compiletime_assert_atomic_type(*p); \
RISCV_FENCE(rw,w); \
RISCV_FENCE(rw, w); \
WRITE_ONCE(*p, v); \
} while (0)
......@@ -40,7 +39,7 @@ do { \
({ \
typeof(*p) ___p1 = READ_ONCE(*p); \
compiletime_assert_atomic_type(*p); \
RISCV_FENCE(r,rw); \
RISCV_FENCE(r, rw); \
___p1; \
})
......@@ -69,7 +68,7 @@ do { \
* instances the scheduler pairs this with an mb(), so nothing is necessary on
* the new hart.
*/
#define smp_mb__after_spinlock() RISCV_FENCE(iorw,iorw)
#define smp_mb__after_spinlock() RISCV_FENCE(iorw, iorw)
#include <asm-generic/barrier.h>
......
......@@ -22,6 +22,16 @@
#include <asm-generic/bitops/fls.h>
#else
#define __HAVE_ARCH___FFS
#define __HAVE_ARCH___FLS
#define __HAVE_ARCH_FFS
#define __HAVE_ARCH_FLS
#include <asm-generic/bitops/__ffs.h>
#include <asm-generic/bitops/__fls.h>
#include <asm-generic/bitops/ffs.h>
#include <asm-generic/bitops/fls.h>
#include <asm/alternative-macros.h>
#include <asm/hwcap.h>
......@@ -37,8 +47,6 @@
static __always_inline unsigned long variable__ffs(unsigned long word)
{
int num;
asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
......@@ -52,32 +60,7 @@ static __always_inline unsigned long variable__ffs(unsigned long word)
return word;
legacy:
num = 0;
#if BITS_PER_LONG == 64
if ((word & 0xffffffff) == 0) {
num += 32;
word >>= 32;
}
#endif
if ((word & 0xffff) == 0) {
num += 16;
word >>= 16;
}
if ((word & 0xff) == 0) {
num += 8;
word >>= 8;
}
if ((word & 0xf) == 0) {
num += 4;
word >>= 4;
}
if ((word & 0x3) == 0) {
num += 2;
word >>= 2;
}
if ((word & 0x1) == 0)
num += 1;
return num;
return generic___ffs(word);
}
/**
......@@ -93,8 +76,6 @@ static __always_inline unsigned long variable__ffs(unsigned long word)
static __always_inline unsigned long variable__fls(unsigned long word)
{
int num;
asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
......@@ -108,32 +89,7 @@ static __always_inline unsigned long variable__fls(unsigned long word)
return BITS_PER_LONG - 1 - word;
legacy:
num = BITS_PER_LONG - 1;
#if BITS_PER_LONG == 64
if (!(word & (~0ul << 32))) {
num -= 32;
word <<= 32;
}
#endif
if (!(word & (~0ul << (BITS_PER_LONG - 16)))) {
num -= 16;
word <<= 16;
}
if (!(word & (~0ul << (BITS_PER_LONG - 8)))) {
num -= 8;
word <<= 8;
}
if (!(word & (~0ul << (BITS_PER_LONG - 4)))) {
num -= 4;
word <<= 4;
}
if (!(word & (~0ul << (BITS_PER_LONG - 2)))) {
num -= 2;
word <<= 2;
}
if (!(word & (~0ul << (BITS_PER_LONG - 1))))
num -= 1;
return num;
return generic___fls(word);
}
/**
......@@ -149,46 +105,23 @@ static __always_inline unsigned long variable__fls(unsigned long word)
static __always_inline int variable_ffs(int x)
{
int r;
if (!x)
return 0;
asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
if (!x)
return 0;
asm volatile (".option push\n"
".option arch,+zbb\n"
CTZW "%0, %1\n"
".option pop\n"
: "=r" (r) : "r" (x) :);
: "=r" (x) : "r" (x) :);
return r + 1;
return x + 1;
legacy:
r = 1;
if (!(x & 0xffff)) {
x >>= 16;
r += 16;
}
if (!(x & 0xff)) {
x >>= 8;
r += 8;
}
if (!(x & 0xf)) {
x >>= 4;
r += 4;
}
if (!(x & 3)) {
x >>= 2;
r += 2;
}
if (!(x & 1)) {
x >>= 1;
r += 1;
}
return r;
return generic_ffs(x);
}
/**
......@@ -204,46 +137,23 @@ static __always_inline int variable_ffs(int x)
static __always_inline int variable_fls(unsigned int x)
{
int r;
if (!x)
return 0;
asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
if (!x)
return 0;
asm volatile (".option push\n"
".option arch,+zbb\n"
CLZW "%0, %1\n"
".option pop\n"
: "=r" (r) : "r" (x) :);
: "=r" (x) : "r" (x) :);
return 32 - r;
return 32 - x;
legacy:
r = 32;
if (!(x & 0xffff0000u)) {
x <<= 16;
r -= 16;
}
if (!(x & 0xff000000u)) {
x <<= 8;
r -= 8;
}
if (!(x & 0xf0000000u)) {
x <<= 4;
r -= 4;
}
if (!(x & 0xc0000000u)) {
x <<= 2;
r -= 2;
}
if (!(x & 0x80000000u)) {
x <<= 1;
r -= 1;
}
return r;
return generic_fls(x);
}
/**
......
......@@ -8,7 +8,6 @@
#include <linux/bug.h>
#include <asm/barrier.h>
#include <asm/fence.h>
#define __xchg_relaxed(ptr, new, size) \
......@@ -313,7 +312,7 @@
" bne %0, %z3, 1f\n" \
" sc.w.rl %1, %z4, %2\n" \
" bnez %1, 0b\n" \
" fence rw, rw\n" \
RISCV_FULL_BARRIER \
"1:\n" \
: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
: "rJ" ((long)__old), "rJ" (__new) \
......@@ -325,7 +324,7 @@
" bne %0, %z3, 1f\n" \
" sc.d.rl %1, %z4, %2\n" \
" bnez %1, 0b\n" \
" fence rw, rw\n" \
RISCV_FULL_BARRIER \
"1:\n" \
: "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \
: "rJ" (__old), "rJ" (__new) \
......
......@@ -14,9 +14,28 @@
static inline int is_compat_task(void)
{
if (!IS_ENABLED(CONFIG_COMPAT))
return 0;
return test_thread_flag(TIF_32BIT);
}
static inline int is_compat_thread(struct thread_info *thread)
{
if (!IS_ENABLED(CONFIG_COMPAT))
return 0;
return test_ti_thread_flag(thread, TIF_32BIT);
}
static inline void set_compat_task(bool is_compat)
{
if (is_compat)
set_thread_flag(TIF_32BIT);
else
clear_thread_flag(TIF_32BIT);
}
struct compat_user_regs_struct {
compat_ulong_t pc;
compat_ulong_t ra;
......
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2022-2023 Rivos, Inc
* Copyright 2022-2024 Rivos, Inc
*/
#ifndef _ASM_CPUFEATURE_H
......@@ -28,29 +28,38 @@ struct riscv_isainfo {
DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
DECLARE_PER_CPU(long, misaligned_access_speed);
/* Per-cpu ISA extensions. */
extern struct riscv_isainfo hart_isa[NR_CPUS];
void riscv_user_isa_enable(void);
#ifdef CONFIG_RISCV_MISALIGNED
bool unaligned_ctl_available(void);
bool check_unaligned_access_emulated(int cpu);
#if defined(CONFIG_RISCV_MISALIGNED)
bool check_unaligned_access_emulated_all_cpus(void);
void unaligned_emulation_finish(void);
bool unaligned_ctl_available(void);
DECLARE_PER_CPU(long, misaligned_access_speed);
#else
static inline bool unaligned_ctl_available(void)
{
return false;
}
#endif
#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
DECLARE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
static inline bool check_unaligned_access_emulated(int cpu)
static __always_inline bool has_fast_unaligned_accesses(void)
{
return false;
return static_branch_likely(&fast_unaligned_access_speed_key);
}
#else
static __always_inline bool has_fast_unaligned_accesses(void)
{
if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
return true;
else
return false;
}
static inline void unaligned_emulation_finish(void) {}
#endif
unsigned long riscv_get_elf_hwcap(void);
......@@ -135,6 +144,4 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
return __riscv_isa_extension_available(hart_isa[cpu].isa, ext);
}
DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
#endif
......@@ -53,13 +53,9 @@ extern bool compat_elf_check_arch(Elf32_Ehdr *hdr);
#define ELF_ET_DYN_BASE ((DEFAULT_MAP_WINDOW / 3) * 2)
#ifdef CONFIG_64BIT
#ifdef CONFIG_COMPAT
#define STACK_RND_MASK (test_thread_flag(TIF_32BIT) ? \
#define STACK_RND_MASK (is_compat_task() ? \
0x7ff >> (PAGE_SHIFT - 12) : \
0x3ffff >> (PAGE_SHIFT - 12))
#else
#define STACK_RND_MASK (0x3ffff >> (PAGE_SHIFT - 12))
#endif
#endif
/*
......@@ -139,10 +135,7 @@ do { \
#ifdef CONFIG_COMPAT
#define SET_PERSONALITY(ex) \
do { if ((ex).e_ident[EI_CLASS] == ELFCLASS32) \
set_thread_flag(TIF_32BIT); \
else \
clear_thread_flag(TIF_32BIT); \
do { set_compat_task((ex).e_ident[EI_CLASS] == ELFCLASS32); \
if (personality(current->personality) != PER_LINUX32) \
set_personality(PER_LINUX | \
(current->personality & (~PER_MASK))); \
......
......@@ -12,8 +12,8 @@
#include <asm/vendorid_list.h>
#ifdef CONFIG_ERRATA_ANDES
#define ERRATA_ANDESTECH_NO_IOCP 0
#define ERRATA_ANDESTECH_NUMBER 1
#define ERRATA_ANDES_NO_IOCP 0
#define ERRATA_ANDES_NUMBER 1
#endif
#ifdef CONFIG_ERRATA_SIFIVE
......@@ -112,15 +112,6 @@ asm volatile(ALTERNATIVE( \
#define THEAD_C9XX_RV_IRQ_PMU 17
#define THEAD_C9XX_CSR_SCOUNTEROF 0x5c5
#define ALT_SBI_PMU_OVERFLOW(__ovl) \
asm volatile(ALTERNATIVE( \
"csrr %0, " __stringify(CSR_SSCOUNTOVF), \
"csrr %0, " __stringify(THEAD_C9XX_CSR_SCOUNTEROF), \
THEAD_VENDOR_ID, ERRATA_THEAD_PMU, \
CONFIG_ERRATA_THEAD_PMU) \
: "=r" (__ovl) : \
: "memory")
#endif /* __ASSEMBLY__ */
#endif
#ifndef _ASM_RISCV_FENCE_H
#define _ASM_RISCV_FENCE_H
#define RISCV_FENCE_ASM(p, s) "\tfence " #p "," #s "\n"
#define RISCV_FENCE(p, s) \
({ __asm__ __volatile__ (RISCV_FENCE_ASM(p, s) : : : "memory"); })
#ifdef CONFIG_SMP
#define RISCV_ACQUIRE_BARRIER "\tfence r , rw\n"
#define RISCV_RELEASE_BARRIER "\tfence rw, w\n"
#define RISCV_ACQUIRE_BARRIER RISCV_FENCE_ASM(r, rw)
#define RISCV_RELEASE_BARRIER RISCV_FENCE_ASM(rw, w)
#define RISCV_FULL_BARRIER RISCV_FENCE_ASM(rw, rw)
#else
#define RISCV_ACQUIRE_BARRIER
#define RISCV_RELEASE_BARRIER
#define RISCV_FULL_BARRIER
#endif
#endif /* _ASM_RISCV_FENCE_H */
......@@ -80,6 +80,7 @@
#define RISCV_ISA_EXT_ZFA 71
#define RISCV_ISA_EXT_ZTSO 72
#define RISCV_ISA_EXT_ZACAS 73
#define RISCV_ISA_EXT_XANDESPMU 74
#define RISCV_ISA_EXT_XLINUXENVCFG 127
......
......@@ -47,10 +47,10 @@
* sufficient to ensure this works sanely on controllers that support I/O
* writes.
*/
#define __io_pbr() __asm__ __volatile__ ("fence io,i" : : : "memory");
#define __io_par(v) __asm__ __volatile__ ("fence i,ior" : : : "memory");
#define __io_pbw() __asm__ __volatile__ ("fence iow,o" : : : "memory");
#define __io_paw() __asm__ __volatile__ ("fence o,io" : : : "memory");
#define __io_pbr() RISCV_FENCE(io, i)
#define __io_par(v) RISCV_FENCE(i, ior)
#define __io_pbw() RISCV_FENCE(iow, o)
#define __io_paw() RISCV_FENCE(o, io)
/*
* Accesses from a single hart to a single I/O address must be ordered. This
......
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef _ASM_RISCV_MEMBARRIER_H
#define _ASM_RISCV_MEMBARRIER_H
static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
struct mm_struct *next,
struct task_struct *tsk)
{
/*
* Only need the full barrier when switching between processes.
* Barrier when switching from kernel to userspace is not
* required here, given that it is implied by mmdrop(). Barrier
* when switching from userspace to kernel is not needed after
* store to rq->curr.
*/
if (IS_ENABLED(CONFIG_SMP) &&
likely(!(atomic_read(&next->membarrier_state) &
(MEMBARRIER_STATE_PRIVATE_EXPEDITED |
MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
return;
/*
* The membarrier system call requires a full memory barrier
* after storing to rq->curr, before going back to user-space.
*
* This barrier is also needed for the SYNC_CORE command when
* switching between processes; in particular, on a transition
* from a thread belonging to another mm to a thread belonging
* to the mm for which a membarrier SYNC_CORE is done on CPU0:
*
* - [CPU0] sets all bits in the mm icache_stale_mask (in
* prepare_sync_core_cmd());
*
* - [CPU1] stores to rq->curr (by the scheduler);
*
* - [CPU0] loads rq->curr within membarrier and observes
* cpu_rq(1)->curr->mm != mm, so the IPI is skipped on
* CPU1; this means membarrier relies on switch_mm() to
* issue the sync-core;
*
* - [CPU1] switch_mm() loads icache_stale_mask; if the bit
* is zero, switch_mm() may incorrectly skip the sync-core.
*
* Matches a full barrier in the proximity of the membarrier
* system call entry.
*/
smp_mb();
}
#endif /* _ASM_RISCV_MEMBARRIER_H */
......@@ -12,6 +12,7 @@
#define _ASM_RISCV_MMIO_H
#include <linux/types.h>
#include <asm/fence.h>
#include <asm/mmiowb.h>
/* Generic IO read/write. These perform native-endian accesses. */
......@@ -131,8 +132,8 @@ static inline u64 __raw_readq(const volatile void __iomem *addr)
* doesn't define any ordering between the memory space and the I/O space.
*/
#define __io_br() do {} while (0)
#define __io_ar(v) ({ __asm__ __volatile__ ("fence i,ir" : : : "memory"); })
#define __io_bw() ({ __asm__ __volatile__ ("fence w,o" : : : "memory"); })
#define __io_ar(v) RISCV_FENCE(i, ir)
#define __io_bw() RISCV_FENCE(w, o)
#define __io_aw() mmiowb_set_pending()
#define readb(c) ({ u8 __v; __io_br(); __v = readb_cpu(c); __io_ar(__v); __v; })
......
......@@ -7,7 +7,7 @@
* "o,w" is sufficient to ensure that all writes to the device have completed
* before the write to the spinlock is allowed to commit.
*/
#define mmiowb() __asm__ __volatile__ ("fence o,w" : : : "memory");
#define mmiowb() RISCV_FENCE(o, w)
#include <linux/smp.h>
#include <asm-generic/mmiowb.h>
......
......@@ -95,13 +95,19 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
__pud_free(mm, pud);
}
#define __pud_free_tlb(tlb, pud, addr) \
do { \
if (pgtable_l4_enabled) { \
pagetable_pud_dtor(virt_to_ptdesc(pud)); \
tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
} \
} while (0)
static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
unsigned long addr)
{
if (pgtable_l4_enabled) {
struct ptdesc *ptdesc = virt_to_ptdesc(pud);
pagetable_pud_dtor(ptdesc);
if (riscv_use_ipi_for_rfence())
tlb_remove_page_ptdesc(tlb, ptdesc);
else
tlb_remove_ptdesc(tlb, ptdesc);
}
}
#define p4d_alloc_one p4d_alloc_one
static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
......@@ -130,11 +136,16 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
__p4d_free(mm, p4d);
}
#define __p4d_free_tlb(tlb, p4d, addr) \
do { \
if (pgtable_l5_enabled) \
tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
} while (0)
static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
unsigned long addr)
{
if (pgtable_l5_enabled) {
if (riscv_use_ipi_for_rfence())
tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d));
else
tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
}
}
#endif /* __PAGETABLE_PMD_FOLDED */
static inline void sync_kernel_mappings(pgd_t *pgd)
......@@ -159,19 +170,31 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
#ifndef __PAGETABLE_PMD_FOLDED
#define __pmd_free_tlb(tlb, pmd, addr) \
do { \
pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
} while (0)
static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
unsigned long addr)
{
struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
pagetable_pmd_dtor(ptdesc);
if (riscv_use_ipi_for_rfence())
tlb_remove_page_ptdesc(tlb, ptdesc);
else
tlb_remove_ptdesc(tlb, ptdesc);
}
#endif /* __PAGETABLE_PMD_FOLDED */
#define __pte_free_tlb(tlb, pte, buf) \
do { \
pagetable_pte_dtor(page_ptdesc(pte)); \
tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
} while (0)
static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
unsigned long addr)
{
struct ptdesc *ptdesc = page_ptdesc(pte);
pagetable_pte_dtor(ptdesc);
if (riscv_use_ipi_for_rfence())
tlb_remove_page_ptdesc(tlb, ptdesc);
else
tlb_remove_ptdesc(tlb, ptdesc);
}
#endif /* CONFIG_MMU */
#endif /* _ASM_RISCV_PGALLOC_H */
......@@ -127,16 +127,10 @@
#define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1))
#define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1))
#ifdef CONFIG_COMPAT
#define MMAP_VA_BITS_64 ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
#define MMAP_MIN_VA_BITS_64 (VA_BITS_SV39)
#define MMAP_VA_BITS (is_compat_task() ? VA_BITS_SV32 : MMAP_VA_BITS_64)
#define MMAP_MIN_VA_BITS (is_compat_task() ? VA_BITS_SV32 : MMAP_MIN_VA_BITS_64)
#else
#define MMAP_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS)
#define MMAP_MIN_VA_BITS (VA_BITS_SV39)
#endif /* CONFIG_COMPAT */
#else
#include <asm/pgtable-32.h>
#endif /* CONFIG_64BIT */
......@@ -439,9 +433,11 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
}
#ifdef CONFIG_RISCV_ISA_SVNAPOT
#define pte_leaf_size(pte) (pte_napot(pte) ? \
napot_cont_size(napot_cont_order(pte)) :\
PAGE_SIZE)
#endif
#ifdef CONFIG_NUMA_BALANCING
/*
......@@ -517,12 +513,12 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
WRITE_ONCE(*ptep, pteval);
}
void flush_icache_pte(pte_t pte);
void flush_icache_pte(struct mm_struct *mm, pte_t pte);
static inline void __set_pte_at(pte_t *ptep, pte_t pteval)
static inline void __set_pte_at(struct mm_struct *mm, pte_t *ptep, pte_t pteval)
{
if (pte_present(pteval) && pte_exec(pteval))
flush_icache_pte(pteval);
flush_icache_pte(mm, pteval);
set_pte(ptep, pteval);
}
......@@ -535,7 +531,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
page_table_check_ptes_set(mm, ptep, pteval, nr);
for (;;) {
__set_pte_at(ptep, pteval);
__set_pte_at(mm, ptep, pteval);
if (--nr == 0)
break;
ptep++;
......@@ -547,7 +543,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
static inline void pte_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
__set_pte_at(ptep, __pte(0));
__set_pte_at(mm, ptep, __pte(0));
}
#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS /* defined in mm/pgtable.c */
......@@ -662,6 +658,12 @@ static inline int pmd_write(pmd_t pmd)
return pte_write(pmd_pte(pmd));
}
#define pud_write pud_write
static inline int pud_write(pud_t pud)
{
return pte_write(pud_pte(pud));
}
#define pmd_dirty pmd_dirty
static inline int pmd_dirty(pmd_t pmd)
{
......@@ -713,14 +715,14 @@ static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
{
page_table_check_pmd_set(mm, pmdp, pmd);
return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd));
return __set_pte_at(mm, (pte_t *)pmdp, pmd_pte(pmd));
}
static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
pud_t *pudp, pud_t pud)
{
page_table_check_pud_set(mm, pudp, pud);
return __set_pte_at((pte_t *)pudp, pud_pte(pud));
return __set_pte_at(mm, (pte_t *)pudp, pud_pte(pud));
}
#ifdef CONFIG_PAGE_TABLE_CHECK
......@@ -871,8 +873,8 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
#define TASK_SIZE_MIN (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
#ifdef CONFIG_COMPAT
#define TASK_SIZE_32 (_AC(0x80000000, UL))
#define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \
#define TASK_SIZE_32 (_AC(0x80000000, UL) - PAGE_SIZE)
#define TASK_SIZE (is_compat_task() ? \
TASK_SIZE_32 : TASK_SIZE_64)
#else
#define TASK_SIZE TASK_SIZE_64
......
......@@ -14,22 +14,21 @@
#include <asm/ptrace.h>
#ifdef CONFIG_64BIT
#define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1))
#define STACK_TOP_MAX TASK_SIZE
/*
* addr is a hint to the maximum userspace address that mmap should provide, so
* this macro needs to return the largest address space available such that
* mmap_end < addr, where mmap_end is the top of that address space.
* See Documentation/arch/riscv/vm-layout.rst for more details.
*/
#define arch_get_mmap_end(addr, len, flags) \
({ \
unsigned long mmap_end; \
typeof(addr) _addr = (addr); \
if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && is_compat_task())) \
mmap_end = STACK_TOP_MAX; \
else if ((_addr) >= VA_USER_SV57) \
if ((_addr) == 0 || is_compat_task() || \
((_addr + len) > BIT(VA_BITS - 1))) \
mmap_end = STACK_TOP_MAX; \
else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
mmap_end = VA_USER_SV48; \
else \
mmap_end = VA_USER_SV39; \
mmap_end = (_addr + len); \
mmap_end; \
})
......@@ -39,17 +38,17 @@
typeof(addr) _addr = (addr); \
typeof(base) _base = (base); \
unsigned long rnd_gap = DEFAULT_MAP_WINDOW - (_base); \
if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && is_compat_task())) \
if ((_addr) == 0 || is_compat_task() || \
((_addr + len) > BIT(VA_BITS - 1))) \
mmap_base = (_base); \
else if (((_addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) \
mmap_base = VA_USER_SV57 - rnd_gap; \
else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \
mmap_base = VA_USER_SV48 - rnd_gap; \
else \
mmap_base = VA_USER_SV39 - rnd_gap; \
mmap_base = (_addr + len) - rnd_gap; \
mmap_base; \
})
#ifdef CONFIG_64BIT
#define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1))
#define STACK_TOP_MAX TASK_SIZE_64
#else
#define DEFAULT_MAP_WINDOW TASK_SIZE
#define STACK_TOP_MAX TASK_SIZE
......
......@@ -34,9 +34,9 @@ static __must_check inline bool may_use_simd(void)
return false;
/*
* Nesting is acheived in preempt_v by spreading the control for
* Nesting is achieved in preempt_v by spreading the control for
* preemptible and non-preemptible kernel-mode Vector into two fields.
* Always try to match with prempt_v if kernel V-context exists. Then,
* Always try to match with preempt_v if kernel V-context exists. Then,
* fallback to check non preempt_v if nesting happens, or if the config
* is not set.
*/
......
......@@ -56,4 +56,7 @@ int hibernate_resume_nonboot_cpu_disable(void);
asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
unsigned long cpu_resume);
asmlinkage int hibernate_core_restore_code(void);
bool riscv_sbi_hsm_is_supported(void);
bool riscv_sbi_suspend_state_is_valid(u32 state);
int riscv_sbi_hart_suspend(u32 state);
#endif
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_RISCV_SYNC_CORE_H
#define _ASM_RISCV_SYNC_CORE_H
/*
* RISC-V implements return to user-space through an xRET instruction,
* which is not core serializing.
*/
static inline void sync_core_before_usermode(void)
{
asm volatile ("fence.i" ::: "memory");
}
#ifdef CONFIG_SMP
/*
* Ensure the next switch_mm() on every CPU issues a core serializing
* instruction for the given @mm.
*/
static inline void prepare_sync_core_cmd(struct mm_struct *mm)
{
cpumask_setall(&mm->context.icache_stale_mask);
}
#else
static inline void prepare_sync_core_cmd(struct mm_struct *mm)
{
}
#endif /* CONFIG_SMP */
#endif /* _ASM_RISCV_SYNC_CORE_H */
......@@ -12,25 +12,51 @@
asmlinkage long __riscv_sys_ni_syscall(const struct pt_regs *);
#define SC_RISCV_REGS_TO_ARGS(x, ...) \
__MAP(x,__SC_ARGS \
,,regs->orig_a0,,regs->a1,,regs->a2 \
#ifdef CONFIG_64BIT
#define __SYSCALL_SE_DEFINEx(x, prefix, name, ...) \
static long __se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
static long __se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__))
#define SC_RISCV_REGS_TO_ARGS(x, ...) \
__MAP(x,__SC_ARGS \
,,regs->orig_a0,,regs->a1,,regs->a2 \
,,regs->a3,,regs->a4,,regs->a5,,regs->a6)
#else
/*
* Use type aliasing to ensure registers a0-a6 are correctly passed to the syscall
* implementation when >word-size arguments are used.
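* For example, on rv32 a 64-bit argument such as loff_t occupies a register
* pair; taking all of a0-a6 as word-sized parameters in the visible stub and
* aliasing it to the real implementation preserves that register-to-argument
* mapping.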
*/
#define __SYSCALL_SE_DEFINEx(x, prefix, name, ...) \
__diag_push(); \
__diag_ignore(GCC, 8, "-Wattribute-alias", \
"Type aliasing is used to sanitize syscall arguments"); \
static long __se_##prefix##name(ulong, ulong, ulong, ulong, ulong, ulong, \
ulong) \
__attribute__((alias(__stringify(___se_##prefix##name)))); \
__diag_pop(); \
static long noinline ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
static long ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__))
#define SC_RISCV_REGS_TO_ARGS(x, ...) \
regs->orig_a0,regs->a1,regs->a2,regs->a3,regs->a4,regs->a5,regs->a6
#endif /* CONFIG_64BIT */
#ifdef CONFIG_COMPAT
#define COMPAT_SYSCALL_DEFINEx(x, name, ...) \
asmlinkage long __riscv_compat_sys##name(const struct pt_regs *regs); \
ALLOW_ERROR_INJECTION(__riscv_compat_sys##name, ERRNO); \
static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
asmlinkage long __riscv_compat_sys##name(const struct pt_regs *regs) \
__SYSCALL_SE_DEFINEx(x, compat_sys, name, __VA_ARGS__) \
{ \
return __se_compat_sys##name(SC_RISCV_REGS_TO_ARGS(x,__VA_ARGS__)); \
return __do_compat_sys##name(__MAP(x,__SC_DELOUSE,__VA_ARGS__)); \
} \
static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
asmlinkage long __riscv_compat_sys##name(const struct pt_regs *regs) \
{ \
return __do_compat_sys##name(__MAP(x,__SC_DELOUSE,__VA_ARGS__)); \
return __se_compat_sys##name(SC_RISCV_REGS_TO_ARGS(x,__VA_ARGS__)); \
} \
static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
......@@ -51,19 +77,18 @@ asmlinkage long __riscv_sys_ni_syscall(const struct pt_regs *);
#define __SYSCALL_DEFINEx(x, name, ...) \
asmlinkage long __riscv_sys##name(const struct pt_regs *regs); \
ALLOW_ERROR_INJECTION(__riscv_sys##name, ERRNO); \
static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
asmlinkage long __riscv_sys##name(const struct pt_regs *regs) \
{ \
return __se_sys##name(SC_RISCV_REGS_TO_ARGS(x,__VA_ARGS__)); \
} \
static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
__SYSCALL_SE_DEFINEx(x, sys, name, __VA_ARGS__) \
{ \
long ret = __do_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
__MAP(x,__SC_TEST,__VA_ARGS__); \
__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
return ret; \
} \
asmlinkage long __riscv_sys##name(const struct pt_regs *regs) \
{ \
return __se_sys##name(SC_RISCV_REGS_TO_ARGS(x,__VA_ARGS__)); \
} \
static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
#define SYSCALL_DEFINE0(sname) \
......
......@@ -10,6 +10,24 @@ struct mmu_gather;
static void tlb_flush(struct mmu_gather *tlb);
#ifdef CONFIG_MMU
#include <linux/swap.h>
/*
* Platforms where riscv_ipi_for_rfence is true perform TLB shootdown with an
* IPI, while platforms where it is false use SBI calls instead. To keep
* software page-table walkers safe in the latter case we switch to RCU-based
* table freeing (MMU_GATHER_RCU_TABLE_FREE). See the
* comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h
* for more details.
*/
static inline void __tlb_remove_table(void *table)
{
free_page_and_swap_cache(table);
}
#endif /* CONFIG_MMU */
#define tlb_flush tlb_flush
#include <asm-generic/tlb.h>
......
......@@ -284,4 +284,15 @@ static inline bool riscv_v_vstate_ctrl_user_allowed(void) { return false; }
#endif /* CONFIG_RISCV_ISA_V */
/*
* Return the implementation's vlen value.
*
* riscv_v_vsize contains the value of "32 vector registers with vlenb length"
* so rebuild the vlen value in bits from it.
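* For example, with VLEN = 128 bits, vlenb is 16 bytes, riscv_v_vsize is
* 32 * 16 = 512, and 512 / 32 * 8 = 128.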
*/
static inline int riscv_vector_vlen(void)
{
return riscv_v_vsize / 32 * 8;
}
#endif /* ! __ASM_RISCV_VECTOR_H */
......@@ -5,7 +5,7 @@
#ifndef ASM_VENDOR_LIST_H
#define ASM_VENDOR_LIST_H
#define ANDESTECH_VENDOR_ID 0x31e
#define ANDES_VENDOR_ID 0x31e
#define SIFIVE_VENDOR_ID 0x489
#define THEAD_VENDOR_ID 0x5b7
......
......@@ -39,7 +39,6 @@ extra-y += vmlinux.lds
obj-y += head.o
obj-y += soc.o
obj-$(CONFIG_RISCV_ALTERNATIVE) += alternative.o
obj-y += copy-unaligned.o
obj-y += cpu.o
obj-y += cpufeature.o
obj-y += entry.o
......@@ -64,6 +63,9 @@ obj-y += tests/
obj-$(CONFIG_MMU) += vdso.o vdso/
obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o
obj-$(CONFIG_RISCV_MISALIGNED) += unaligned_access_speed.o
obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) += copy-unaligned.o
obj-$(CONFIG_FPU) += fpu.o
obj-$(CONFIG_RISCV_ISA_V) += vector.o
obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o
......
......@@ -43,7 +43,7 @@ static void riscv_fill_cpu_mfr_info(struct cpu_manufacturer_info_t *cpu_mfr_info
switch (cpu_mfr_info->vendor_id) {
#ifdef CONFIG_ERRATA_ANDES
case ANDESTECH_VENDOR_ID:
case ANDES_VENDOR_ID:
cpu_mfr_info->patch_func = andes_errata_patch_func;
break;
#endif
......
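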
......@@ -111,6 +111,7 @@ SYM_CODE_START(handle_exception)
1:
tail do_trap_unknown
SYM_CODE_END(handle_exception)
ASM_NOKPROBE(handle_exception)
/*
* The ret_from_exception must be called with interrupt disabled. Here is the
......@@ -184,6 +185,7 @@ SYM_CODE_START_NOALIGN(ret_from_exception)
sret
#endif
SYM_CODE_END(ret_from_exception)
ASM_NOKPROBE(ret_from_exception)
#ifdef CONFIG_VMAP_STACK
SYM_CODE_START_LOCAL(handle_kernel_stack_overflow)
......@@ -219,6 +221,7 @@ SYM_CODE_START_LOCAL(handle_kernel_stack_overflow)
move a0, sp
tail handle_bad_stack
SYM_CODE_END(handle_kernel_stack_overflow)
ASM_NOKPROBE(handle_kernel_stack_overflow)
#endif
SYM_CODE_START(ret_from_fork)
......
......@@ -9,6 +9,9 @@ KBUILD_CFLAGS := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) -fpie \
-fno-asynchronous-unwind-tables -fno-unwind-tables \
$(call cc-option,-fno-addrsig)
# Disable LTO
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
KBUILD_CFLAGS += -mcmodel=medany
CFLAGS_cmdline_early.o += -D__NO_FORTIFY
......
......@@ -377,14 +377,14 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
return ret;
}
#else
static const struct user_regset_view compat_riscv_user_native_view = {};
#endif /* CONFIG_COMPAT */
const struct user_regset_view *task_user_regset_view(struct task_struct *task)
{
#ifdef CONFIG_COMPAT
if (test_tsk_thread_flag(task, TIF_32BIT))
if (is_compat_thread(&task->thread_info))
return &compat_riscv_user_native_view;
else
#endif
return &riscv_user_native_view;
}
......@@ -28,7 +28,6 @@
#include <asm/cpufeature.h>
#include <asm/cpu_ops.h>
#include <asm/cpufeature.h>
#include <asm/irq.h>
#include <asm/mmu_context.h>
#include <asm/numa.h>
......
......@@ -147,6 +147,7 @@ static bool hwprobe_ext0_has(const struct cpumask *cpus, unsigned long ext)
return (pair.value & ext);
}
#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
static u64 hwprobe_misaligned(const struct cpumask *cpus)
{
int cpu;
......@@ -169,6 +170,18 @@ static u64 hwprobe_misaligned(const struct cpumask *cpus)
return perf;
}
#else
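/*
 * Without per-CPU probing, fall back to what the build configuration
 * guarantees: fast if the kernel assumes efficient unaligned accesses,
 * emulated if misaligned traps are emulated and userspace control is
 * available, otherwise slow.
 */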
static u64 hwprobe_misaligned(const struct cpumask *cpus)
{
if (IS_ENABLED(CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS))
return RISCV_HWPROBE_MISALIGNED_FAST;
if (IS_ENABLED(CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS) && unaligned_ctl_available())
return RISCV_HWPROBE_MISALIGNED_EMULATED;
return RISCV_HWPROBE_MISALIGNED_SLOW;
}
#endif
static void hwprobe_one_pair(struct riscv_hwprobe *pair,
const struct cpumask *cpus)
......
/* SPDX-License-Identifier: GPL-2.0-only */
#include <linux/linkage.h>
#include <asm-generic/export.h>
#include <asm/asm.h>
#include <asm/asm-extable.h>
#include <asm/csr.h>
......