Commit 4b88524c authored by Marc Zyngier's avatar Marc Zyngier

Merge remote-tracking branch 'arm64/for-next/sme' into kvmarm-master/next

Merge arm64's SME branch to resolve conflicts with the WFxT branch.
Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
parents 672c0c51 2e29b997
...@@ -264,6 +264,39 @@ HWCAP2_MTE3 ...@@ -264,6 +264,39 @@ HWCAP2_MTE3
Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described
by Documentation/arm64/memory-tagging-extension.rst. by Documentation/arm64/memory-tagging-extension.rst.
HWCAP2_SME
Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described
by Documentation/arm64/sme.rst.
HWCAP2_SME_I16I64
Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111.
HWCAP2_SME_F64F64
Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1.
HWCAP2_SME_I8I32
Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111.
HWCAP2_SME_F16F32
Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1.
HWCAP2_SME_B16F32
Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1.
HWCAP2_SME_F32F32
Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1.
HWCAP2_SME_FA64
Functionality implied by ID_AA64SMFR0_EL1.FA64 == 0b1.
4. Unused AT_HWCAP bits 4. Unused AT_HWCAP bits
----------------------- -----------------------
......
...@@ -21,6 +21,7 @@ ARM64 Architecture ...@@ -21,6 +21,7 @@ ARM64 Architecture
perf perf
pointer-authentication pointer-authentication
silicon-errata silicon-errata
sme
sve sve
tagged-address-abi tagged-address-abi
tagged-pointers tagged-pointers
......
This diff is collapsed.
...@@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com> ...@@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com>
Date: 4 August 2017 Date: 4 August 2017
This document outlines briefly the interface provided to userspace by Linux in This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE). order to support use of the ARM Scalable Vector Extension (SVE), including
interactions with Streaming SVE mode added by the Scalable Matrix Extension
(SME).
This is an outline of the most important features and issues only and not This is an outline of the most important features and issues only and not
intended to be exhaustive. intended to be exhaustive.
...@@ -23,6 +25,10 @@ model features for SVE is included in Appendix A. ...@@ -23,6 +25,10 @@ model features for SVE is included in Appendix A.
* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
tracked per-thread. tracked per-thread.
* In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present
in the system, when it is not supported and these interfaces are used to
access streaming mode FFR is read and written as zero.
* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
AT_HWCAP entry. Presence of this flag implies the presence of the SVE AT_HWCAP entry. Presence of this flag implies the presence of the SVE
instructions and registers, and the Linux-specific system interfaces instructions and registers, and the Linux-specific system interfaces
...@@ -53,10 +59,19 @@ model features for SVE is included in Appendix A. ...@@ -53,10 +59,19 @@ model features for SVE is included in Appendix A.
which userspace can read using an MRS instruction. See elf_hwcaps.txt and which userspace can read using an MRS instruction. See elf_hwcaps.txt and
cpu-feature-registers.txt for details. cpu-feature-registers.txt for details.
* On hardware that supports the SME extensions, HWCAP2_SME will also be
reported in the AT_HWCAP2 aux vector entry. Among other things SME adds
streaming mode which provides a subset of the SVE feature set using a
separate SME vector length and the same Z/V registers. See sme.rst
for more details.
* Debuggers should restrict themselves to interacting with the target via the * Debuggers should restrict themselves to interacting with the target via the
NT_ARM_SVE regset. The recommended way of detecting support for this regset NT_ARM_SVE regset. The recommended way of detecting support for this regset
is to connect to a target process first and then attempt a is to connect to a target process first and then attempt a
ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is
present and streaming SVE mode is in use the FPSIMD subset of registers
will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
in the target.
* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
between userspace and the kernel, the register value is encoded in memory in between userspace and the kernel, the register value is encoded in memory in
...@@ -126,6 +141,11 @@ the SVE instruction set architecture. ...@@ -126,6 +141,11 @@ the SVE instruction set architecture.
are only present in fpsimd_context. For convenience, the content of V0..V31 are only present in fpsimd_context. For convenience, the content of V0..V31
is duplicated between sve_context and fpsimd_context. is duplicated between sve_context and fpsimd_context.
* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which
if set indicates that the thread is in streaming mode and the vector length
and register data (if present) describe the streaming SVE data and vector
length.
* The signal frame record for SVE always contains basic metadata, in particular * The signal frame record for SVE always contains basic metadata, in particular
the thread's vector length (in sve_context.vl). the thread's vector length (in sve_context.vl).
...@@ -170,6 +190,11 @@ When returning from a signal handler: ...@@ -170,6 +190,11 @@ When returning from a signal handler:
the signal frame does not match the current vector length, the signal return the signal frame does not match the current vector length, the signal return
attempt is treated as illegal, resulting in a forced SIGSEGV. attempt is treated as illegal, resulting in a forced SIGSEGV.
* It is permitted to enter or leave streaming mode by setting or clearing
the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
when doing so sve_context.vl and any register data are appropriate for the
vector length in the new mode.
6. prctl extensions 6. prctl extensions
-------------------- --------------------
...@@ -265,8 +290,14 @@ prctl(PR_SVE_GET_VL) ...@@ -265,8 +290,14 @@ prctl(PR_SVE_GET_VL)
7. ptrace extensions 7. ptrace extensions
--------------------- ---------------------
* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and * New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
PTRACE_SETREGSET. PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
streaming mode SVE registers and NT_ARM_SVE describes the
non-streaming mode SVE registers.
In this description a register set is referred to as being "live" when
the target is in the appropriate streaming or non-streaming mode and is
using data beyond the subset shared with the FPSIMD Vn registers.
Refer to [2] for definitions. Refer to [2] for definitions.
...@@ -297,7 +328,7 @@ The regset data starts with struct user_sve_header, containing: ...@@ -297,7 +328,7 @@ The regset data starts with struct user_sve_header, containing:
flags flags
either at most one of
SVE_PT_REGS_FPSIMD SVE_PT_REGS_FPSIMD
...@@ -331,6 +362,10 @@ The regset data starts with struct user_sve_header, containing: ...@@ -331,6 +362,10 @@ The regset data starts with struct user_sve_header, containing:
SVE_PT_VL_ONEXEC (SETREGSET only). SVE_PT_VL_ONEXEC (SETREGSET only).
If neither FPSIMD nor SVE flags are provided then no register
payload is available, this is only possible when SME is implemented.
* The effects of changing the vector length and/or flags are equivalent to * The effects of changing the vector length and/or flags are equivalent to
those documented for PR_SVE_SET_VL. those documented for PR_SVE_SET_VL.
...@@ -346,6 +381,13 @@ The regset data starts with struct user_sve_header, containing: ...@@ -346,6 +381,13 @@ The regset data starts with struct user_sve_header, containing:
case only the vector length and flags are changed (along with any case only the vector length and flags are changed (along with any
consequences of those changes). consequences of those changes).
* In systems supporting SME when in streaming mode a GETREGSET for
NT_REG_SVE will return only the user_sve_header with no register data,
similarly a GETREGSET for NT_REG_SSVE will not return any register data
when not in streaming mode.
* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD.
* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the * For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
requested VL is not supported, the effect will be the same as if the requested VL is not supported, the effect will be the same as if the
payload were omitted, except that an EIO error is reported. No payload were omitted, except that an EIO error is reported. No
...@@ -355,17 +397,25 @@ The regset data starts with struct user_sve_header, containing: ...@@ -355,17 +397,25 @@ The regset data starts with struct user_sve_header, containing:
unspecified. It is up to the caller to translate the payload layout unspecified. It is up to the caller to translate the payload layout
for the actual VL and retry. for the actual VL and retry.
* Where SME is implemented it is not possible to GETREGSET the register
state for normal SVE when in streaming mode, nor the streaming mode
register state when in normal mode, regardless of the implementation defined
behaviour of the hardware for sharing data between the two modes.
* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
if the target was not in streaming mode.
* The effect of writing a partial, incomplete payload is unspecified. * The effect of writing a partial, incomplete payload is unspecified.
8. ELF coredump extensions 8. ELF coredump extensions
--------------------------- ---------------------------
* A NT_ARM_SVE note will be added to each coredump for each thread of the * NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
dumped process. The contents will be equivalent to the data that would have each thread of the dumped process. The contents will be equivalent to the
been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread data that would have been read if a PTRACE_GETREGSET of the corresponding
when the coredump was generated. type were executed for each thread when the coredump was generated.
9. System runtime configuration 9. System runtime configuration
-------------------------------- --------------------------------
......
...@@ -1948,6 +1948,17 @@ config ARM64_SVE ...@@ -1948,6 +1948,17 @@ config ARM64_SVE
booting the kernel. If unsure and you are not observing these booting the kernel. If unsure and you are not observing these
symptoms, you should assume that it is safe to say Y. symptoms, you should assume that it is safe to say Y.
config ARM64_SME
bool "ARM Scalable Matrix Extension support"
default y
depends on ARM64_SVE
help
The Scalable Matrix Extension (SME) is an extension to the AArch64
execution state which utilises a substantial subset of the SVE
instruction set, together with the addition of new architectural
register state capable of holding two dimensional matrix tiles to
enable various matrix operations.
config ARM64_MODULE_PLTS config ARM64_MODULE_PLTS
bool "Use PLTs to allow module memory to spill over into vmalloc area" bool "Use PLTs to allow module memory to spill over into vmalloc area"
depends on MODULES depends on MODULES
......
...@@ -58,11 +58,15 @@ struct cpuinfo_arm64 { ...@@ -58,11 +58,15 @@ struct cpuinfo_arm64 {
u64 reg_id_aa64pfr0; u64 reg_id_aa64pfr0;
u64 reg_id_aa64pfr1; u64 reg_id_aa64pfr1;
u64 reg_id_aa64zfr0; u64 reg_id_aa64zfr0;
u64 reg_id_aa64smfr0;
struct cpuinfo_32bit aarch32; struct cpuinfo_32bit aarch32;
/* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */ /* pseudo-ZCR for recording maximum ZCR_EL1 LEN value: */
u64 reg_zcr; u64 reg_zcr;
/* pseudo-SMCR for recording maximum SMCR_EL1 LEN value: */
u64 reg_smcr;
}; };
DECLARE_PER_CPU(struct cpuinfo_arm64, cpu_data); DECLARE_PER_CPU(struct cpuinfo_arm64, cpu_data);
......
...@@ -622,6 +622,13 @@ static inline bool id_aa64pfr0_sve(u64 pfr0) ...@@ -622,6 +622,13 @@ static inline bool id_aa64pfr0_sve(u64 pfr0)
return val > 0; return val > 0;
} }
static inline bool id_aa64pfr1_sme(u64 pfr1)
{
u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_SME_SHIFT);
return val > 0;
}
static inline bool id_aa64pfr1_mte(u64 pfr1) static inline bool id_aa64pfr1_mte(u64 pfr1)
{ {
u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_MTE_SHIFT); u32 val = cpuid_feature_extract_unsigned_field(pfr1, ID_AA64PFR1_MTE_SHIFT);
...@@ -759,6 +766,23 @@ static __always_inline bool system_supports_sve(void) ...@@ -759,6 +766,23 @@ static __always_inline bool system_supports_sve(void)
cpus_have_const_cap(ARM64_SVE); cpus_have_const_cap(ARM64_SVE);
} }
static __always_inline bool system_supports_sme(void)
{
return IS_ENABLED(CONFIG_ARM64_SME) &&
cpus_have_const_cap(ARM64_SME);
}
static __always_inline bool system_supports_fa64(void)
{
return IS_ENABLED(CONFIG_ARM64_SME) &&
cpus_have_const_cap(ARM64_SME_FA64);
}
static __always_inline bool system_supports_tpidr2(void)
{
return system_supports_sme();
}
static __always_inline bool system_supports_cnp(void) static __always_inline bool system_supports_cnp(void)
{ {
return IS_ENABLED(CONFIG_ARM64_CNP) && return IS_ENABLED(CONFIG_ARM64_CNP) &&
......
...@@ -143,6 +143,50 @@ ...@@ -143,6 +143,50 @@
.Lskip_sve_\@: .Lskip_sve_\@:
.endm .endm
/* SME register access and priority mapping */
.macro __init_el2_nvhe_sme
mrs x1, id_aa64pfr1_el1
ubfx x1, x1, #ID_AA64PFR1_SME_SHIFT, #4
cbz x1, .Lskip_sme_\@
bic x0, x0, #CPTR_EL2_TSM // Also disable SME traps
msr cptr_el2, x0 // Disable copro. traps to EL2
isb
mrs x1, sctlr_el2
orr x1, x1, #SCTLR_ELx_ENTP2 // Disable TPIDR2 traps
msr sctlr_el2, x1
isb
mov x1, #0 // SMCR controls
mrs_s x2, SYS_ID_AA64SMFR0_EL1
ubfx x2, x2, #ID_AA64SMFR0_FA64_SHIFT, #1 // Full FP in SM?
cbz x2, .Lskip_sme_fa64_\@
orr x1, x1, SMCR_ELx_FA64_MASK
.Lskip_sme_fa64_\@:
orr x1, x1, #SMCR_ELx_LEN_MASK // Enable full SME vector
msr_s SYS_SMCR_EL2, x1 // length for EL1.
mrs_s x1, SYS_SMIDR_EL1 // Priority mapping supported?
ubfx x1, x1, #SYS_SMIDR_EL1_SMPS_SHIFT, #1
cbz x1, .Lskip_sme_\@
msr_s SYS_SMPRIMAP_EL2, xzr // Make all priorities equal
mrs x1, id_aa64mmfr1_el1 // HCRX_EL2 present?
ubfx x1, x1, #ID_AA64MMFR1_HCX_SHIFT, #4
cbz x1, .Lskip_sme_\@
mrs_s x1, SYS_HCRX_EL2
orr x1, x1, #HCRX_EL2_SMPME_MASK // Enable priority mapping
msr_s SYS_HCRX_EL2, x1
.Lskip_sme_\@:
.endm
/* Disable any fine grained traps */ /* Disable any fine grained traps */
.macro __init_el2_fgt .macro __init_el2_fgt
mrs x1, id_aa64mmfr0_el1 mrs x1, id_aa64mmfr0_el1
...@@ -153,15 +197,26 @@ ...@@ -153,15 +197,26 @@
mrs x1, id_aa64dfr0_el1 mrs x1, id_aa64dfr0_el1
ubfx x1, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4 ubfx x1, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
cmp x1, #3 cmp x1, #3
b.lt .Lset_fgt_\@ b.lt .Lset_debug_fgt_\@
/* Disable PMSNEVFR_EL1 read and write traps */ /* Disable PMSNEVFR_EL1 read and write traps */
orr x0, x0, #(1 << 62) orr x0, x0, #(1 << 62)
.Lset_fgt_\@: .Lset_debug_fgt_\@:
msr_s SYS_HDFGRTR_EL2, x0 msr_s SYS_HDFGRTR_EL2, x0
msr_s SYS_HDFGWTR_EL2, x0 msr_s SYS_HDFGWTR_EL2, x0
msr_s SYS_HFGRTR_EL2, xzr
msr_s SYS_HFGWTR_EL2, xzr mov x0, xzr
mrs x1, id_aa64pfr1_el1
ubfx x1, x1, #ID_AA64PFR1_SME_SHIFT, #4
cbz x1, .Lset_fgt_\@
/* Disable nVHE traps of TPIDR2 and SMPRI */
orr x0, x0, #HFGxTR_EL2_nSMPRI_EL1_MASK
orr x0, x0, #HFGxTR_EL2_nTPIDR2_EL0_MASK
.Lset_fgt_\@:
msr_s SYS_HFGRTR_EL2, x0
msr_s SYS_HFGWTR_EL2, x0
msr_s SYS_HFGITR_EL2, xzr msr_s SYS_HFGITR_EL2, xzr
mrs x1, id_aa64pfr0_el1 // AMU traps UNDEF without AMU mrs x1, id_aa64pfr0_el1 // AMU traps UNDEF without AMU
...@@ -196,6 +251,7 @@ ...@@ -196,6 +251,7 @@
__init_el2_nvhe_idregs __init_el2_nvhe_idregs
__init_el2_nvhe_cptr __init_el2_nvhe_cptr
__init_el2_nvhe_sve __init_el2_nvhe_sve
__init_el2_nvhe_sme
__init_el2_fgt __init_el2_fgt
__init_el2_nvhe_prepare_eret __init_el2_nvhe_prepare_eret
.endm .endm
......
...@@ -37,7 +37,8 @@ ...@@ -37,7 +37,8 @@
#define ESR_ELx_EC_ERET (0x1a) /* EL2 only */ #define ESR_ELx_EC_ERET (0x1a) /* EL2 only */
/* Unallocated EC: 0x1B */ /* Unallocated EC: 0x1B */
#define ESR_ELx_EC_FPAC (0x1C) /* EL1 and above */ #define ESR_ELx_EC_FPAC (0x1C) /* EL1 and above */
/* Unallocated EC: 0x1D - 0x1E */ #define ESR_ELx_EC_SME (0x1D)
/* Unallocated EC: 0x1E */
#define ESR_ELx_EC_IMP_DEF (0x1f) /* EL3 only */ #define ESR_ELx_EC_IMP_DEF (0x1f) /* EL3 only */
#define ESR_ELx_EC_IABT_LOW (0x20) #define ESR_ELx_EC_IABT_LOW (0x20)
#define ESR_ELx_EC_IABT_CUR (0x21) #define ESR_ELx_EC_IABT_CUR (0x21)
...@@ -75,6 +76,7 @@ ...@@ -75,6 +76,7 @@
#define ESR_ELx_IL_SHIFT (25) #define ESR_ELx_IL_SHIFT (25)
#define ESR_ELx_IL (UL(1) << ESR_ELx_IL_SHIFT) #define ESR_ELx_IL (UL(1) << ESR_ELx_IL_SHIFT)
#define ESR_ELx_ISS_MASK (ESR_ELx_IL - 1) #define ESR_ELx_ISS_MASK (ESR_ELx_IL - 1)
#define ESR_ELx_ISS(esr) ((esr) & ESR_ELx_ISS_MASK)
/* ISS field definitions shared by different classes */ /* ISS field definitions shared by different classes */
#define ESR_ELx_WNR_SHIFT (6) #define ESR_ELx_WNR_SHIFT (6)
...@@ -327,6 +329,15 @@ ...@@ -327,6 +329,15 @@
#define ESR_ELx_CP15_32_ISS_SYS_CNTFRQ (ESR_ELx_CP15_32_ISS_SYS_VAL(0, 0, 14, 0) |\ #define ESR_ELx_CP15_32_ISS_SYS_CNTFRQ (ESR_ELx_CP15_32_ISS_SYS_VAL(0, 0, 14, 0) |\
ESR_ELx_CP15_32_ISS_DIR_READ) ESR_ELx_CP15_32_ISS_DIR_READ)
/*
* ISS values for SME traps
*/
#define ESR_ELx_SME_ISS_SME_DISABLED 0
#define ESR_ELx_SME_ISS_ILL 1
#define ESR_ELx_SME_ISS_SM_DISABLED 2
#define ESR_ELx_SME_ISS_ZA_DISABLED 3
#ifndef __ASSEMBLY__ #ifndef __ASSEMBLY__
#include <asm/types.h> #include <asm/types.h>
......
...@@ -64,6 +64,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr, ...@@ -64,6 +64,7 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr,
struct pt_regs *regs); struct pt_regs *regs);
void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs); void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs);
void do_sve_acc(unsigned int esr, struct pt_regs *regs); void do_sve_acc(unsigned int esr, struct pt_regs *regs);
void do_sme_acc(unsigned int esr, struct pt_regs *regs);
void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs); void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs);
void do_sysinstr(unsigned int esr, struct pt_regs *regs); void do_sysinstr(unsigned int esr, struct pt_regs *regs);
void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs); void do_sp_pc_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs);
......
...@@ -46,11 +46,23 @@ extern void fpsimd_restore_current_state(void); ...@@ -46,11 +46,23 @@ extern void fpsimd_restore_current_state(void);
extern void fpsimd_update_current_state(struct user_fpsimd_state const *state); extern void fpsimd_update_current_state(struct user_fpsimd_state const *state);
extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state, extern void fpsimd_bind_state_to_cpu(struct user_fpsimd_state *state,
void *sve_state, unsigned int sve_vl); void *sve_state, unsigned int sve_vl,
void *za_state, unsigned int sme_vl,
u64 *svcr);
extern void fpsimd_flush_task_state(struct task_struct *target); extern void fpsimd_flush_task_state(struct task_struct *target);
extern void fpsimd_save_and_flush_cpu_state(void); extern void fpsimd_save_and_flush_cpu_state(void);
static inline bool thread_sm_enabled(struct thread_struct *thread)
{
return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_SM_MASK);
}
static inline bool thread_za_enabled(struct thread_struct *thread)
{
return system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_ZA_MASK);
}
/* Maximum VL that SVE/SME VL-agnostic software can transparently support */ /* Maximum VL that SVE/SME VL-agnostic software can transparently support */
#define VL_ARCH_MAX 0x100 #define VL_ARCH_MAX 0x100
...@@ -62,7 +74,14 @@ static inline size_t sve_ffr_offset(int vl) ...@@ -62,7 +74,14 @@ static inline size_t sve_ffr_offset(int vl)
static inline void *sve_pffr(struct thread_struct *thread) static inline void *sve_pffr(struct thread_struct *thread)
{ {
return (char *)thread->sve_state + sve_ffr_offset(thread_get_sve_vl(thread)); unsigned int vl;
if (system_supports_sme() && thread_sm_enabled(thread))
vl = thread_get_sme_vl(thread);
else
vl = thread_get_sve_vl(thread);
return (char *)thread->sve_state + sve_ffr_offset(vl);
} }
extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr); extern void sve_save_state(void *state, u32 *pfpsr, int save_ffr);
...@@ -71,11 +90,17 @@ extern void sve_load_state(void const *state, u32 const *pfpsr, ...@@ -71,11 +90,17 @@ extern void sve_load_state(void const *state, u32 const *pfpsr,
extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1); extern void sve_flush_live(bool flush_ffr, unsigned long vq_minus_1);
extern unsigned int sve_get_vl(void); extern unsigned int sve_get_vl(void);
extern void sve_set_vq(unsigned long vq_minus_1); extern void sve_set_vq(unsigned long vq_minus_1);
extern void sme_set_vq(unsigned long vq_minus_1);
extern void za_save_state(void *state);
extern void za_load_state(void const *state);
struct arm64_cpu_capabilities; struct arm64_cpu_capabilities;
extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused); extern void sve_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern void sme_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern void fa64_kernel_enable(const struct arm64_cpu_capabilities *__unused);
extern u64 read_zcr_features(void); extern u64 read_zcr_features(void);
extern u64 read_smcr_features(void);
/* /*
* Helpers to translate bit indices in sve_vq_map to VQ values (and * Helpers to translate bit indices in sve_vq_map to VQ values (and
...@@ -119,6 +144,7 @@ struct vl_info { ...@@ -119,6 +144,7 @@ struct vl_info {
extern void sve_alloc(struct task_struct *task); extern void sve_alloc(struct task_struct *task);
extern void fpsimd_release_task(struct task_struct *task); extern void fpsimd_release_task(struct task_struct *task);
extern void fpsimd_sync_to_sve(struct task_struct *task); extern void fpsimd_sync_to_sve(struct task_struct *task);
extern void fpsimd_force_sync_to_sve(struct task_struct *task);
extern void sve_sync_to_fpsimd(struct task_struct *task); extern void sve_sync_to_fpsimd(struct task_struct *task);
extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task); extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
...@@ -170,6 +196,12 @@ static inline void write_vl(enum vec_type type, u64 val) ...@@ -170,6 +196,12 @@ static inline void write_vl(enum vec_type type, u64 val)
tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK; tmp = read_sysreg_s(SYS_ZCR_EL1) & ~ZCR_ELx_LEN_MASK;
write_sysreg_s(tmp | val, SYS_ZCR_EL1); write_sysreg_s(tmp | val, SYS_ZCR_EL1);
break; break;
#endif
#ifdef CONFIG_ARM64_SME
case ARM64_VEC_SME:
tmp = read_sysreg_s(SYS_SMCR_EL1) & ~SMCR_ELx_LEN_MASK;
write_sysreg_s(tmp | val, SYS_SMCR_EL1);
break;
#endif #endif
default: default:
WARN_ON_ONCE(1); WARN_ON_ONCE(1);
...@@ -208,6 +240,8 @@ static inline bool sve_vq_available(unsigned int vq) ...@@ -208,6 +240,8 @@ static inline bool sve_vq_available(unsigned int vq)
return vq_available(ARM64_VEC_SVE, vq); return vq_available(ARM64_VEC_SVE, vq);
} }
size_t sve_state_size(struct task_struct const *task);
#else /* ! CONFIG_ARM64_SVE */ #else /* ! CONFIG_ARM64_SVE */
static inline void sve_alloc(struct task_struct *task) { } static inline void sve_alloc(struct task_struct *task) { }
...@@ -247,8 +281,93 @@ static inline void vec_update_vq_map(enum vec_type t) { } ...@@ -247,8 +281,93 @@ static inline void vec_update_vq_map(enum vec_type t) { }
static inline int vec_verify_vq_map(enum vec_type t) { return 0; } static inline int vec_verify_vq_map(enum vec_type t) { return 0; }
static inline void sve_setup(void) { } static inline void sve_setup(void) { }
static inline size_t sve_state_size(struct task_struct const *task)
{
return 0;
}
#endif /* ! CONFIG_ARM64_SVE */ #endif /* ! CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
static inline void sme_user_disable(void)
{
sysreg_clear_set(cpacr_el1, CPACR_EL1_SMEN_EL0EN, 0);
}
static inline void sme_user_enable(void)
{
sysreg_clear_set(cpacr_el1, 0, CPACR_EL1_SMEN_EL0EN);
}
static inline void sme_smstart_sm(void)
{
asm volatile(__msr_s(SYS_SVCR_SMSTART_SM_EL0, "xzr"));
}
static inline void sme_smstop_sm(void)
{
asm volatile(__msr_s(SYS_SVCR_SMSTOP_SM_EL0, "xzr"));
}
static inline void sme_smstop(void)
{
asm volatile(__msr_s(SYS_SVCR_SMSTOP_SMZA_EL0, "xzr"));
}
extern void __init sme_setup(void);
static inline int sme_max_vl(void)
{
return vec_max_vl(ARM64_VEC_SME);
}
static inline int sme_max_virtualisable_vl(void)
{
return vec_max_virtualisable_vl(ARM64_VEC_SME);
}
extern void sme_alloc(struct task_struct *task);
extern unsigned int sme_get_vl(void);
extern int sme_set_current_vl(unsigned long arg);
extern int sme_get_current_vl(void);
/*
* Return how many bytes of memory are required to store the full SME
* specific state (currently just ZA) for task, given task's currently
* configured vector length.
*/
static inline size_t za_state_size(struct task_struct const *task)
{
unsigned int vl = task_get_sme_vl(task);
return ZA_SIG_REGS_SIZE(sve_vq_from_vl(vl));
}
#else
static inline void sme_user_disable(void) { BUILD_BUG(); }
static inline void sme_user_enable(void) { BUILD_BUG(); }
static inline void sme_smstart_sm(void) { }
static inline void sme_smstop_sm(void) { }
static inline void sme_smstop(void) { }
static inline void sme_alloc(struct task_struct *task) { }
static inline void sme_setup(void) { }
static inline unsigned int sme_get_vl(void) { return 0; }
static inline int sme_max_vl(void) { return 0; }
static inline int sme_max_virtualisable_vl(void) { return 0; }
static inline int sme_set_current_vl(unsigned long arg) { return -EINVAL; }
static inline int sme_get_current_vl(void) { return -EINVAL; }
static inline size_t za_state_size(struct task_struct const *task)
{
return 0;
}
#endif /* ! CONFIG_ARM64_SME */
/* For use by EFI runtime services calls only */ /* For use by EFI runtime services calls only */
extern void __efi_fpsimd_begin(void); extern void __efi_fpsimd_begin(void);
extern void __efi_fpsimd_end(void); extern void __efi_fpsimd_end(void);
......
...@@ -93,6 +93,12 @@ ...@@ -93,6 +93,12 @@
.endif .endif
.endm .endm
.macro _sme_check_wv v
.if (\v) < 12 || (\v) > 15
.error "Bad vector select register \v."
.endif
.endm
/* SVE instruction encodings for non-SVE-capable assemblers */ /* SVE instruction encodings for non-SVE-capable assemblers */
/* (pre binutils 2.28, all kernel capable clang versions support SVE) */ /* (pre binutils 2.28, all kernel capable clang versions support SVE) */
...@@ -174,6 +180,54 @@ ...@@ -174,6 +180,54 @@
| (\np) | (\np)
.endm .endm
/* SME instruction encodings for non-SME-capable assemblers */
/* (pre binutils 2.38/LLVM 13) */
/* RDSVL X\nx, #\imm */
.macro _sme_rdsvl nx, imm
_check_general_reg \nx
_check_num (\imm), -0x20, 0x1f
.inst 0x04bf5800 \
| (\nx) \
| (((\imm) & 0x3f) << 5)
.endm
/*
* STR (vector from ZA array):
* STR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
*/
.macro _sme_str_zav nw, nxbase, offset=0
_sme_check_wv \nw
_check_general_reg \nxbase
_check_num (\offset), -0x100, 0xff
.inst 0xe1200000 \
| (((\nw) & 3) << 13) \
| ((\nxbase) << 5) \
| ((\offset) & 7)
.endm
/*
* LDR (vector to ZA array):
* LDR ZA[\nw, #\offset], [X\nxbase, #\offset, MUL VL]
*/
.macro _sme_ldr_zav nw, nxbase, offset=0
_sme_check_wv \nw
_check_general_reg \nxbase
_check_num (\offset), -0x100, 0xff
.inst 0xe1000000 \
| (((\nw) & 3) << 13) \
| ((\nxbase) << 5) \
| ((\offset) & 7)
.endm
/*
* Zero the entire ZA array
* ZERO ZA
*/
.macro zero_za
.inst 0xc00800ff
.endm
.macro __for from:req, to:req .macro __for from:req, to:req
.if (\from) == (\to) .if (\from) == (\to)
_for__body %\from _for__body %\from
...@@ -208,6 +262,17 @@ ...@@ -208,6 +262,17 @@
921: 921:
.endm .endm
/* Update SMCR_EL1.LEN with the new VQ */
.macro sme_load_vq xvqminus1, xtmp, xtmp2
mrs_s \xtmp, SYS_SMCR_EL1
bic \xtmp2, \xtmp, SMCR_ELx_LEN_MASK
orr \xtmp2, \xtmp2, \xvqminus1
cmp \xtmp2, \xtmp
b.eq 921f
msr_s SYS_SMCR_EL1, \xtmp2 //self-synchronising
921:
.endm
/* Preserve the first 128-bits of Znz and zero the rest. */ /* Preserve the first 128-bits of Znz and zero the rest. */
.macro _sve_flush_z nz .macro _sve_flush_z nz
_sve_check_zreg \nz _sve_check_zreg \nz
...@@ -254,3 +319,25 @@ ...@@ -254,3 +319,25 @@
ldr w\nxtmp, [\xpfpsr, #4] ldr w\nxtmp, [\xpfpsr, #4]
msr fpcr, x\nxtmp msr fpcr, x\nxtmp
.endm .endm
.macro sme_save_za nxbase, xvl, nw
mov w\nw, #0
423:
_sme_str_zav \nw, \nxbase
add x\nxbase, x\nxbase, \xvl
add x\nw, x\nw, #1
cmp \xvl, x\nw
bne 423b
.endm
.macro sme_load_za nxbase, xvl, nw
mov w\nw, #0
423:
_sme_ldr_zav \nw, \nxbase
add x\nxbase, x\nxbase, \xvl
add x\nw, x\nw, #1
cmp \xvl, x\nw
bne 423b
.endm
...@@ -109,6 +109,14 @@ ...@@ -109,6 +109,14 @@
#define KERNEL_HWCAP_AFP __khwcap2_feature(AFP) #define KERNEL_HWCAP_AFP __khwcap2_feature(AFP)
#define KERNEL_HWCAP_RPRES __khwcap2_feature(RPRES) #define KERNEL_HWCAP_RPRES __khwcap2_feature(RPRES)
#define KERNEL_HWCAP_MTE3 __khwcap2_feature(MTE3) #define KERNEL_HWCAP_MTE3 __khwcap2_feature(MTE3)
#define KERNEL_HWCAP_SME __khwcap2_feature(SME)
#define KERNEL_HWCAP_SME_I16I64 __khwcap2_feature(SME_I16I64)
#define KERNEL_HWCAP_SME_F64F64 __khwcap2_feature(SME_F64F64)
#define KERNEL_HWCAP_SME_I8I32 __khwcap2_feature(SME_I8I32)
#define KERNEL_HWCAP_SME_F16F32 __khwcap2_feature(SME_F16F32)
#define KERNEL_HWCAP_SME_B16F32 __khwcap2_feature(SME_B16F32)
#define KERNEL_HWCAP_SME_F32F32 __khwcap2_feature(SME_F32F32)
#define KERNEL_HWCAP_SME_FA64 __khwcap2_feature(SME_FA64)
/* /*
* This yields a mask that user programs can use to figure out what * This yields a mask that user programs can use to figure out what
......
...@@ -279,6 +279,7 @@ ...@@ -279,6 +279,7 @@
#define CPTR_EL2_TCPAC (1U << 31) #define CPTR_EL2_TCPAC (1U << 31)
#define CPTR_EL2_TAM (1 << 30) #define CPTR_EL2_TAM (1 << 30)
#define CPTR_EL2_TTA (1 << 20) #define CPTR_EL2_TTA (1 << 20)
#define CPTR_EL2_TSM (1 << 12)
#define CPTR_EL2_TFP (1 << CPTR_EL2_TFP_SHIFT) #define CPTR_EL2_TFP (1 << CPTR_EL2_TFP_SHIFT)
#define CPTR_EL2_TZ (1 << 8) #define CPTR_EL2_TZ (1 << 8)
#define CPTR_NVHE_EL2_RES1 0x000032ff /* known RES1 bits in CPTR_EL2 (nVHE) */ #define CPTR_NVHE_EL2_RES1 0x000032ff /* known RES1 bits in CPTR_EL2 (nVHE) */
......
...@@ -295,8 +295,11 @@ struct vcpu_reset_state { ...@@ -295,8 +295,11 @@ struct vcpu_reset_state {
struct kvm_vcpu_arch { struct kvm_vcpu_arch {
struct kvm_cpu_context ctxt; struct kvm_cpu_context ctxt;
/* Guest floating point state */
void *sve_state; void *sve_state;
unsigned int sve_max_vl; unsigned int sve_max_vl;
u64 svcr;
/* Stage 2 paging state used by the hardware on next switch */ /* Stage 2 paging state used by the hardware on next switch */
struct kvm_s2_mmu *hw_mmu; struct kvm_s2_mmu *hw_mmu;
...@@ -451,6 +454,7 @@ struct kvm_vcpu_arch { ...@@ -451,6 +454,7 @@ struct kvm_vcpu_arch {
#define KVM_ARM64_DEBUG_STATE_SAVE_TRBE (1 << 13) /* Save TRBE context if active */ #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE (1 << 13) /* Save TRBE context if active */
#define KVM_ARM64_FP_FOREIGN_FPSTATE (1 << 14) #define KVM_ARM64_FP_FOREIGN_FPSTATE (1 << 14)
#define KVM_ARM64_ON_UNSUPPORTED_CPU (1 << 15) /* Physical CPU not in supported_cpus */ #define KVM_ARM64_ON_UNSUPPORTED_CPU (1 << 15) /* Physical CPU not in supported_cpus */
#define KVM_ARM64_HOST_SME_ENABLED (1 << 16) /* SME enabled for EL0 */
#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \ #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
KVM_GUESTDBG_USE_SW_BP | \ KVM_GUESTDBG_USE_SW_BP | \
......
...@@ -118,6 +118,7 @@ struct debug_info { ...@@ -118,6 +118,7 @@ struct debug_info {
enum vec_type { enum vec_type {
ARM64_VEC_SVE = 0, ARM64_VEC_SVE = 0,
ARM64_VEC_SME,
ARM64_VEC_MAX, ARM64_VEC_MAX,
}; };
...@@ -153,6 +154,7 @@ struct thread_struct { ...@@ -153,6 +154,7 @@ struct thread_struct {
unsigned int fpsimd_cpu; unsigned int fpsimd_cpu;
void *sve_state; /* SVE registers, if any */ void *sve_state; /* SVE registers, if any */
void *za_state; /* ZA register, if any */
unsigned int vl[ARM64_VEC_MAX]; /* vector length */ unsigned int vl[ARM64_VEC_MAX]; /* vector length */
unsigned int vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */ unsigned int vl_onexec[ARM64_VEC_MAX]; /* vl after next exec */
unsigned long fault_address; /* fault info */ unsigned long fault_address; /* fault info */
...@@ -168,6 +170,8 @@ struct thread_struct { ...@@ -168,6 +170,8 @@ struct thread_struct {
u64 mte_ctrl; u64 mte_ctrl;
#endif #endif
u64 sctlr_user; u64 sctlr_user;
u64 svcr;
u64 tpidr2_el0;
}; };
static inline unsigned int thread_get_vl(struct thread_struct *thread, static inline unsigned int thread_get_vl(struct thread_struct *thread,
...@@ -181,6 +185,19 @@ static inline unsigned int thread_get_sve_vl(struct thread_struct *thread) ...@@ -181,6 +185,19 @@ static inline unsigned int thread_get_sve_vl(struct thread_struct *thread)
return thread_get_vl(thread, ARM64_VEC_SVE); return thread_get_vl(thread, ARM64_VEC_SVE);
} }
static inline unsigned int thread_get_sme_vl(struct thread_struct *thread)
{
return thread_get_vl(thread, ARM64_VEC_SME);
}
static inline unsigned int thread_get_cur_vl(struct thread_struct *thread)
{
if (system_supports_sme() && (thread->svcr & SYS_SVCR_EL0_SM_MASK))
return thread_get_sme_vl(thread);
else
return thread_get_sve_vl(thread);
}
unsigned int task_get_vl(const struct task_struct *task, enum vec_type type); unsigned int task_get_vl(const struct task_struct *task, enum vec_type type);
void task_set_vl(struct task_struct *task, enum vec_type type, void task_set_vl(struct task_struct *task, enum vec_type type,
unsigned long vl); unsigned long vl);
...@@ -194,6 +211,11 @@ static inline unsigned int task_get_sve_vl(const struct task_struct *task) ...@@ -194,6 +211,11 @@ static inline unsigned int task_get_sve_vl(const struct task_struct *task)
return task_get_vl(task, ARM64_VEC_SVE); return task_get_vl(task, ARM64_VEC_SVE);
} }
static inline unsigned int task_get_sme_vl(const struct task_struct *task)
{
return task_get_vl(task, ARM64_VEC_SME);
}
static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl) static inline void task_set_sve_vl(struct task_struct *task, unsigned long vl)
{ {
task_set_vl(task, ARM64_VEC_SVE, vl); task_set_vl(task, ARM64_VEC_SVE, vl);
...@@ -354,9 +376,11 @@ extern void __init minsigstksz_setup(void); ...@@ -354,9 +376,11 @@ extern void __init minsigstksz_setup(void);
*/ */
#include <asm/fpsimd.h> #include <asm/fpsimd.h>
/* Userspace interface for PR_SVE_{SET,GET}_VL prctl()s: */ /* Userspace interface for PR_S[MV]E_{SET,GET}_VL prctl()s: */
#define SVE_SET_VL(arg) sve_set_current_vl(arg) #define SVE_SET_VL(arg) sve_set_current_vl(arg)
#define SVE_GET_VL() sve_get_current_vl() #define SVE_GET_VL() sve_get_current_vl()
#define SME_SET_VL(arg) sme_set_current_vl(arg)
#define SME_GET_VL() sme_get_current_vl()
/* PR_PAC_RESET_KEYS prctl */ /* PR_PAC_RESET_KEYS prctl */
#define PAC_RESET_KEYS(tsk, arg) ptrauth_prctl_reset_keys(tsk, arg) #define PAC_RESET_KEYS(tsk, arg) ptrauth_prctl_reset_keys(tsk, arg)
......
...@@ -118,6 +118,10 @@ ...@@ -118,6 +118,10 @@
* System registers, organised loosely by encoding but grouped together * System registers, organised loosely by encoding but grouped together
* where the architected name contains an index. e.g. ID_MMFR<n>_EL1. * where the architected name contains an index. e.g. ID_MMFR<n>_EL1.
*/ */
#define SYS_SVCR_SMSTOP_SM_EL0 sys_reg(0, 3, 4, 2, 3)
#define SYS_SVCR_SMSTART_SM_EL0 sys_reg(0, 3, 4, 3, 3)
#define SYS_SVCR_SMSTOP_SMZA_EL0 sys_reg(0, 3, 4, 6, 3)
#define SYS_OSDTRRX_EL1 sys_reg(2, 0, 0, 0, 2) #define SYS_OSDTRRX_EL1 sys_reg(2, 0, 0, 0, 2)
#define SYS_MDCCINT_EL1 sys_reg(2, 0, 0, 2, 0) #define SYS_MDCCINT_EL1 sys_reg(2, 0, 0, 2, 0)
#define SYS_MDSCR_EL1 sys_reg(2, 0, 0, 2, 2) #define SYS_MDSCR_EL1 sys_reg(2, 0, 0, 2, 2)
...@@ -181,6 +185,7 @@ ...@@ -181,6 +185,7 @@
#define SYS_ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 4, 0) #define SYS_ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 4, 0)
#define SYS_ID_AA64PFR1_EL1 sys_reg(3, 0, 0, 4, 1) #define SYS_ID_AA64PFR1_EL1 sys_reg(3, 0, 0, 4, 1)
#define SYS_ID_AA64ZFR0_EL1 sys_reg(3, 0, 0, 4, 4) #define SYS_ID_AA64ZFR0_EL1 sys_reg(3, 0, 0, 4, 4)
#define SYS_ID_AA64SMFR0_EL1 sys_reg(3, 0, 0, 4, 5)
#define SYS_ID_AA64DFR0_EL1 sys_reg(3, 0, 0, 5, 0) #define SYS_ID_AA64DFR0_EL1 sys_reg(3, 0, 0, 5, 0)
#define SYS_ID_AA64DFR1_EL1 sys_reg(3, 0, 0, 5, 1) #define SYS_ID_AA64DFR1_EL1 sys_reg(3, 0, 0, 5, 1)
...@@ -204,6 +209,8 @@ ...@@ -204,6 +209,8 @@
#define SYS_ZCR_EL1 sys_reg(3, 0, 1, 2, 0) #define SYS_ZCR_EL1 sys_reg(3, 0, 1, 2, 0)
#define SYS_TRFCR_EL1 sys_reg(3, 0, 1, 2, 1) #define SYS_TRFCR_EL1 sys_reg(3, 0, 1, 2, 1)
#define SYS_SMPRI_EL1 sys_reg(3, 0, 1, 2, 4)
#define SYS_SMCR_EL1 sys_reg(3, 0, 1, 2, 6)
#define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0) #define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0)
#define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1) #define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1)
...@@ -396,6 +403,8 @@ ...@@ -396,6 +403,8 @@
#define TRBIDR_ALIGN_MASK GENMASK(3, 0) #define TRBIDR_ALIGN_MASK GENMASK(3, 0)
#define TRBIDR_ALIGN_SHIFT 0 #define TRBIDR_ALIGN_SHIFT 0
#define SMPRI_EL1_PRIORITY_MASK 0xf
#define SYS_PMINTENSET_EL1 sys_reg(3, 0, 9, 14, 1) #define SYS_PMINTENSET_EL1 sys_reg(3, 0, 9, 14, 1)
#define SYS_PMINTENCLR_EL1 sys_reg(3, 0, 9, 14, 2) #define SYS_PMINTENCLR_EL1 sys_reg(3, 0, 9, 14, 2)
...@@ -451,8 +460,13 @@ ...@@ -451,8 +460,13 @@
#define SYS_CCSIDR_EL1 sys_reg(3, 1, 0, 0, 0) #define SYS_CCSIDR_EL1 sys_reg(3, 1, 0, 0, 0)
#define SYS_CLIDR_EL1 sys_reg(3, 1, 0, 0, 1) #define SYS_CLIDR_EL1 sys_reg(3, 1, 0, 0, 1)
#define SYS_GMID_EL1 sys_reg(3, 1, 0, 0, 4) #define SYS_GMID_EL1 sys_reg(3, 1, 0, 0, 4)
#define SYS_SMIDR_EL1 sys_reg(3, 1, 0, 0, 6)
#define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7) #define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7)
#define SYS_SMIDR_EL1_IMPLEMENTER_SHIFT 24
#define SYS_SMIDR_EL1_SMPS_SHIFT 15
#define SYS_SMIDR_EL1_AFFINITY_SHIFT 0
#define SYS_CSSELR_EL1 sys_reg(3, 2, 0, 0, 0) #define SYS_CSSELR_EL1 sys_reg(3, 2, 0, 0, 0)
#define SYS_CTR_EL0 sys_reg(3, 3, 0, 0, 1) #define SYS_CTR_EL0 sys_reg(3, 3, 0, 0, 1)
...@@ -461,6 +475,10 @@ ...@@ -461,6 +475,10 @@
#define SYS_RNDR_EL0 sys_reg(3, 3, 2, 4, 0) #define SYS_RNDR_EL0 sys_reg(3, 3, 2, 4, 0)
#define SYS_RNDRRS_EL0 sys_reg(3, 3, 2, 4, 1) #define SYS_RNDRRS_EL0 sys_reg(3, 3, 2, 4, 1)
#define SYS_SVCR_EL0 sys_reg(3, 3, 4, 2, 2)
#define SYS_SVCR_EL0_ZA_MASK 2
#define SYS_SVCR_EL0_SM_MASK 1
#define SYS_PMCR_EL0 sys_reg(3, 3, 9, 12, 0) #define SYS_PMCR_EL0 sys_reg(3, 3, 9, 12, 0)
#define SYS_PMCNTENSET_EL0 sys_reg(3, 3, 9, 12, 1) #define SYS_PMCNTENSET_EL0 sys_reg(3, 3, 9, 12, 1)
#define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2) #define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2)
...@@ -477,6 +495,7 @@ ...@@ -477,6 +495,7 @@
#define SYS_TPIDR_EL0 sys_reg(3, 3, 13, 0, 2) #define SYS_TPIDR_EL0 sys_reg(3, 3, 13, 0, 2)
#define SYS_TPIDRRO_EL0 sys_reg(3, 3, 13, 0, 3) #define SYS_TPIDRRO_EL0 sys_reg(3, 3, 13, 0, 3)
#define SYS_TPIDR2_EL0 sys_reg(3, 3, 13, 0, 5)
#define SYS_SCXTNUM_EL0 sys_reg(3, 3, 13, 0, 7) #define SYS_SCXTNUM_EL0 sys_reg(3, 3, 13, 0, 7)
...@@ -546,6 +565,9 @@ ...@@ -546,6 +565,9 @@
#define SYS_HFGITR_EL2 sys_reg(3, 4, 1, 1, 6) #define SYS_HFGITR_EL2 sys_reg(3, 4, 1, 1, 6)
#define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0) #define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0)
#define SYS_TRFCR_EL2 sys_reg(3, 4, 1, 2, 1) #define SYS_TRFCR_EL2 sys_reg(3, 4, 1, 2, 1)
#define SYS_HCRX_EL2 sys_reg(3, 4, 1, 2, 2)
#define SYS_SMPRIMAP_EL2 sys_reg(3, 4, 1, 2, 5)
#define SYS_SMCR_EL2 sys_reg(3, 4, 1, 2, 6)
#define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0) #define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
#define SYS_HDFGRTR_EL2 sys_reg(3, 4, 3, 1, 4) #define SYS_HDFGRTR_EL2 sys_reg(3, 4, 3, 1, 4)
#define SYS_HDFGWTR_EL2 sys_reg(3, 4, 3, 1, 5) #define SYS_HDFGWTR_EL2 sys_reg(3, 4, 3, 1, 5)
...@@ -605,6 +627,7 @@ ...@@ -605,6 +627,7 @@
#define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0) #define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0)
#define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2) #define SYS_CPACR_EL12 sys_reg(3, 5, 1, 0, 2)
#define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0) #define SYS_ZCR_EL12 sys_reg(3, 5, 1, 2, 0)
#define SYS_SMCR_EL12 sys_reg(3, 5, 1, 2, 6)
#define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0) #define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0)
#define SYS_TTBR1_EL12 sys_reg(3, 5, 2, 0, 1) #define SYS_TTBR1_EL12 sys_reg(3, 5, 2, 0, 1)
#define SYS_TCR_EL12 sys_reg(3, 5, 2, 0, 2) #define SYS_TCR_EL12 sys_reg(3, 5, 2, 0, 2)
...@@ -628,6 +651,7 @@ ...@@ -628,6 +651,7 @@
#define SYS_CNTV_CVAL_EL02 sys_reg(3, 5, 14, 3, 2) #define SYS_CNTV_CVAL_EL02 sys_reg(3, 5, 14, 3, 2)
/* Common SCTLR_ELx flags. */ /* Common SCTLR_ELx flags. */
#define SCTLR_ELx_ENTP2 (BIT(60))
#define SCTLR_ELx_DSSBS (BIT(44)) #define SCTLR_ELx_DSSBS (BIT(44))
#define SCTLR_ELx_ATA (BIT(43)) #define SCTLR_ELx_ATA (BIT(43))
...@@ -836,6 +860,7 @@ ...@@ -836,6 +860,7 @@
#define ID_AA64PFR0_ELx_32BIT_64BIT 0x2 #define ID_AA64PFR0_ELx_32BIT_64BIT 0x2
/* id_aa64pfr1 */ /* id_aa64pfr1 */
#define ID_AA64PFR1_SME_SHIFT 24
#define ID_AA64PFR1_MPAMFRAC_SHIFT 16 #define ID_AA64PFR1_MPAMFRAC_SHIFT 16
#define ID_AA64PFR1_RASFRAC_SHIFT 12 #define ID_AA64PFR1_RASFRAC_SHIFT 12
#define ID_AA64PFR1_MTE_SHIFT 8 #define ID_AA64PFR1_MTE_SHIFT 8
...@@ -846,6 +871,7 @@ ...@@ -846,6 +871,7 @@
#define ID_AA64PFR1_SSBS_PSTATE_ONLY 1 #define ID_AA64PFR1_SSBS_PSTATE_ONLY 1
#define ID_AA64PFR1_SSBS_PSTATE_INSNS 2 #define ID_AA64PFR1_SSBS_PSTATE_INSNS 2
#define ID_AA64PFR1_BT_BTI 0x1 #define ID_AA64PFR1_BT_BTI 0x1
#define ID_AA64PFR1_SME 1
#define ID_AA64PFR1_MTE_NI 0x0 #define ID_AA64PFR1_MTE_NI 0x0
#define ID_AA64PFR1_MTE_EL0 0x1 #define ID_AA64PFR1_MTE_EL0 0x1
...@@ -874,6 +900,23 @@ ...@@ -874,6 +900,23 @@
#define ID_AA64ZFR0_AES_PMULL 0x2 #define ID_AA64ZFR0_AES_PMULL 0x2
#define ID_AA64ZFR0_SVEVER_SVE2 0x1 #define ID_AA64ZFR0_SVEVER_SVE2 0x1
/* id_aa64smfr0 */
#define ID_AA64SMFR0_FA64_SHIFT 63
#define ID_AA64SMFR0_I16I64_SHIFT 52
#define ID_AA64SMFR0_F64F64_SHIFT 48
#define ID_AA64SMFR0_I8I32_SHIFT 36
#define ID_AA64SMFR0_F16F32_SHIFT 35
#define ID_AA64SMFR0_B16F32_SHIFT 34
#define ID_AA64SMFR0_F32F32_SHIFT 32
#define ID_AA64SMFR0_FA64 0x1
#define ID_AA64SMFR0_I16I64 0x4
#define ID_AA64SMFR0_F64F64 0x1
#define ID_AA64SMFR0_I8I32 0x4
#define ID_AA64SMFR0_F16F32 0x1
#define ID_AA64SMFR0_B16F32 0x1
#define ID_AA64SMFR0_F32F32 0x1
/* id_aa64mmfr0 */ /* id_aa64mmfr0 */
#define ID_AA64MMFR0_ECV_SHIFT 60 #define ID_AA64MMFR0_ECV_SHIFT 60
#define ID_AA64MMFR0_FGT_SHIFT 56 #define ID_AA64MMFR0_FGT_SHIFT 56
...@@ -926,6 +969,7 @@ ...@@ -926,6 +969,7 @@
/* id_aa64mmfr1 */ /* id_aa64mmfr1 */
#define ID_AA64MMFR1_ECBHB_SHIFT 60 #define ID_AA64MMFR1_ECBHB_SHIFT 60
#define ID_AA64MMFR1_HCX_SHIFT 40
#define ID_AA64MMFR1_AFP_SHIFT 44 #define ID_AA64MMFR1_AFP_SHIFT 44
#define ID_AA64MMFR1_ETS_SHIFT 36 #define ID_AA64MMFR1_ETS_SHIFT 36
#define ID_AA64MMFR1_TWED_SHIFT 32 #define ID_AA64MMFR1_TWED_SHIFT 32
...@@ -1119,9 +1163,24 @@ ...@@ -1119,9 +1163,24 @@
#define ZCR_ELx_LEN_SIZE 9 #define ZCR_ELx_LEN_SIZE 9
#define ZCR_ELx_LEN_MASK 0x1ff #define ZCR_ELx_LEN_MASK 0x1ff
#define SMCR_ELx_FA64_SHIFT 31
#define SMCR_ELx_FA64_MASK (1 << SMCR_ELx_FA64_SHIFT)
/*
* The SMCR_ELx_LEN_* definitions intentionally include bits [8:4] which
* are reserved by the SME architecture for future expansion of the LEN
* field, with compatible semantics.
*/
#define SMCR_ELx_LEN_SHIFT 0
#define SMCR_ELx_LEN_SIZE 9
#define SMCR_ELx_LEN_MASK 0x1ff
#define CPACR_EL1_FPEN_EL1EN (BIT(20)) /* enable EL1 access */ #define CPACR_EL1_FPEN_EL1EN (BIT(20)) /* enable EL1 access */
#define CPACR_EL1_FPEN_EL0EN (BIT(21)) /* enable EL0 access, if EL1EN set */ #define CPACR_EL1_FPEN_EL0EN (BIT(21)) /* enable EL0 access, if EL1EN set */
#define CPACR_EL1_SMEN_EL1EN (BIT(24)) /* enable EL1 access */
#define CPACR_EL1_SMEN_EL0EN (BIT(25)) /* enable EL0 access, if EL1EN set */
#define CPACR_EL1_ZEN_EL1EN (BIT(16)) /* enable EL1 access */ #define CPACR_EL1_ZEN_EL1EN (BIT(16)) /* enable EL1 access */
#define CPACR_EL1_ZEN_EL0EN (BIT(17)) /* enable EL0 access, if EL1EN set */ #define CPACR_EL1_ZEN_EL0EN (BIT(17)) /* enable EL0 access, if EL1EN set */
...@@ -1170,6 +1229,8 @@ ...@@ -1170,6 +1229,8 @@
#define TRFCR_ELx_ExTRE BIT(1) #define TRFCR_ELx_ExTRE BIT(1)
#define TRFCR_ELx_E0TRE BIT(0) #define TRFCR_ELx_E0TRE BIT(0)
/* HCRX_EL2 definitions */
#define HCRX_EL2_SMPME_MASK (1 << 5)
/* GIC Hypervisor interface registers */ /* GIC Hypervisor interface registers */
/* ICH_MISR_EL2 bit definitions */ /* ICH_MISR_EL2 bit definitions */
...@@ -1233,6 +1294,12 @@ ...@@ -1233,6 +1294,12 @@
#define ICH_VTR_TDS_SHIFT 19 #define ICH_VTR_TDS_SHIFT 19
#define ICH_VTR_TDS_MASK (1 << ICH_VTR_TDS_SHIFT) #define ICH_VTR_TDS_MASK (1 << ICH_VTR_TDS_SHIFT)
/* HFG[WR]TR_EL2 bit definitions */
#define HFGxTR_EL2_nTPIDR2_EL0_SHIFT 55
#define HFGxTR_EL2_nTPIDR2_EL0_MASK BIT_MASK(HFGxTR_EL2_nTPIDR2_EL0_SHIFT)
#define HFGxTR_EL2_nSMPRI_EL1_SHIFT 54
#define HFGxTR_EL2_nSMPRI_EL1_MASK BIT_MASK(HFGxTR_EL2_nSMPRI_EL1_SHIFT)
#define ARM64_FEATURE_FIELD_BITS 4 #define ARM64_FEATURE_FIELD_BITS 4
/* Create a mask for the feature bits of the specified feature. */ /* Create a mask for the feature bits of the specified feature. */
......
...@@ -82,6 +82,8 @@ int arch_dup_task_struct(struct task_struct *dst, ...@@ -82,6 +82,8 @@ int arch_dup_task_struct(struct task_struct *dst,
#define TIF_SVE_VL_INHERIT 24 /* Inherit SVE vl_onexec across exec */ #define TIF_SVE_VL_INHERIT 24 /* Inherit SVE vl_onexec across exec */
#define TIF_SSBD 25 /* Wants SSB mitigation */ #define TIF_SSBD 25 /* Wants SSB mitigation */
#define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */ #define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */
#define TIF_SME 27 /* SME in use */
#define TIF_SME_VL_INHERIT 28 /* Inherit SME vl_onexec across exec */
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
......
...@@ -79,5 +79,13 @@ ...@@ -79,5 +79,13 @@
#define HWCAP2_AFP (1 << 20) #define HWCAP2_AFP (1 << 20)
#define HWCAP2_RPRES (1 << 21) #define HWCAP2_RPRES (1 << 21)
#define HWCAP2_MTE3 (1 << 22) #define HWCAP2_MTE3 (1 << 22)
#define HWCAP2_SME (1 << 23)
#define HWCAP2_SME_I16I64 (1 << 24)
#define HWCAP2_SME_F64F64 (1 << 25)
#define HWCAP2_SME_I8I32 (1 << 26)
#define HWCAP2_SME_F16F32 (1 << 27)
#define HWCAP2_SME_B16F32 (1 << 28)
#define HWCAP2_SME_F32F32 (1 << 29)
#define HWCAP2_SME_FA64 (1 << 30)
#endif /* _UAPI__ASM_HWCAP_H */ #endif /* _UAPI__ASM_HWCAP_H */
...@@ -109,7 +109,7 @@ struct user_hwdebug_state { ...@@ -109,7 +109,7 @@ struct user_hwdebug_state {
} dbg_regs[16]; } dbg_regs[16];
}; };
/* SVE/FP/SIMD state (NT_ARM_SVE) */ /* SVE/FP/SIMD state (NT_ARM_SVE & NT_ARM_SSVE) */
struct user_sve_header { struct user_sve_header {
__u32 size; /* total meaningful regset content in bytes */ __u32 size; /* total meaningful regset content in bytes */
...@@ -220,6 +220,7 @@ struct user_sve_header { ...@@ -220,6 +220,7 @@ struct user_sve_header {
(SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \ (SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \
SVE_PT_SVE_PREGS_OFFSET(vq)) SVE_PT_SVE_PREGS_OFFSET(vq))
/* For streaming mode SVE (SSVE) FFR must be read and written as zero */
#define SVE_PT_SVE_FFR_OFFSET(vq) \ #define SVE_PT_SVE_FFR_OFFSET(vq) \
(SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq)) (SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq))
...@@ -243,7 +244,9 @@ struct user_sve_header { ...@@ -243,7 +244,9 @@ struct user_sve_header {
#define SVE_PT_SIZE(vq, flags) \ #define SVE_PT_SIZE(vq, flags) \
(((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \ (((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_SVE ? \
SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \ SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, flags) \
: SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags)) : ((((flags) & SVE_PT_REGS_MASK) == SVE_PT_REGS_FPSIMD ? \
SVE_PT_FPSIMD_OFFSET + SVE_PT_FPSIMD_SIZE(vq, flags) \
: SVE_PT_REGS_OFFSET)))
/* pointer authentication masks (NT_ARM_PAC_MASK) */ /* pointer authentication masks (NT_ARM_PAC_MASK) */
...@@ -265,6 +268,62 @@ struct user_pac_generic_keys { ...@@ -265,6 +268,62 @@ struct user_pac_generic_keys {
__uint128_t apgakey; __uint128_t apgakey;
}; };
/* ZA state (NT_ARM_ZA) */
struct user_za_header {
__u32 size; /* total meaningful regset content in bytes */
__u32 max_size; /* maxmium possible size for this thread */
__u16 vl; /* current vector length */
__u16 max_vl; /* maximum possible vector length */
__u16 flags;
__u16 __reserved;
};
/*
* Common ZA_PT_* flags:
* These must be kept in sync with prctl interface in <linux/prctl.h>
*/
#define ZA_PT_VL_INHERIT ((1 << 17) /* PR_SME_VL_INHERIT */ >> 16)
#define ZA_PT_VL_ONEXEC ((1 << 18) /* PR_SME_SET_VL_ONEXEC */ >> 16)
/*
* The remainder of the ZA state follows struct user_za_header. The
* total size of the ZA state (including header) depends on the
* metadata in the header: ZA_PT_SIZE(vq, flags) gives the total size
* of the state in bytes, including the header.
*
* Refer to <asm/sigcontext.h> for details of how to pass the correct
* "vq" argument to these macros.
*/
/* Offset from the start of struct user_za_header to the register data */
#define ZA_PT_ZA_OFFSET \
((sizeof(struct user_za_header) + (__SVE_VQ_BYTES - 1)) \
/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
/*
* The payload starts at offset ZA_PT_ZA_OFFSET, and is of size
* ZA_PT_ZA_SIZE(vq, flags).
*
* The ZA array is stored as a sequence of horizontal vectors ZAV of SVL/8
* bytes each, starting from vector 0.
*
* Additional data might be appended in the future.
*
* The ZA matrix is represented in memory in an endianness-invariant layout
* which differs from the layout used for the FPSIMD V-registers on big-endian
* systems: see sigcontext.h for more explanation.
*/
#define ZA_PT_ZAV_OFFSET(vq, n) \
(ZA_PT_ZA_OFFSET + ((vq * __SVE_VQ_BYTES) * n))
#define ZA_PT_ZA_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
#define ZA_PT_SIZE(vq) \
(ZA_PT_ZA_OFFSET + ZA_PT_ZA_SIZE(vq))
#endif /* __ASSEMBLY__ */ #endif /* __ASSEMBLY__ */
#endif /* _UAPI__ASM_PTRACE_H */ #endif /* _UAPI__ASM_PTRACE_H */
...@@ -132,6 +132,17 @@ struct extra_context { ...@@ -132,6 +132,17 @@ struct extra_context {
#define SVE_MAGIC 0x53564501 #define SVE_MAGIC 0x53564501
struct sve_context { struct sve_context {
struct _aarch64_ctx head;
__u16 vl;
__u16 flags;
__u16 __reserved[2];
};
#define SVE_SIG_FLAG_SM 0x1 /* Context describes streaming mode */
#define ZA_MAGIC 0x54366345
struct za_context {
struct _aarch64_ctx head; struct _aarch64_ctx head;
__u16 vl; __u16 vl;
__u16 __reserved[3]; __u16 __reserved[3];
...@@ -186,9 +197,16 @@ struct sve_context { ...@@ -186,9 +197,16 @@ struct sve_context {
* sve_context.vl must equal the thread's current vector length when * sve_context.vl must equal the thread's current vector length when
* doing a sigreturn. * doing a sigreturn.
* *
* On systems with support for SME the SVE register state may reflect either
* streaming or non-streaming mode. In streaming mode the streaming mode
* vector length will be used and the flag SVE_SIG_FLAG_SM will be set in
* the flags field. It is permitted to enter or leave streaming mode in
* a signal return, applications should take care to ensure that any difference
* in vector length between the two modes is handled, including any resizing
* and movement of context blocks.
* *
* Note: for all these macros, the "vq" argument denotes the SVE * Note: for all these macros, the "vq" argument denotes the vector length
* vector length in quadwords (i.e., units of 128 bits). * in quadwords (i.e., units of 128 bits).
* *
* The correct way to obtain vq is to use sve_vq_from_vl(vl). The * The correct way to obtain vq is to use sve_vq_from_vl(vl). The
* result is valid if and only if sve_vl_valid(vl) is true. This is * result is valid if and only if sve_vl_valid(vl) is true. This is
...@@ -249,4 +267,37 @@ struct sve_context { ...@@ -249,4 +267,37 @@ struct sve_context {
#define SVE_SIG_CONTEXT_SIZE(vq) \ #define SVE_SIG_CONTEXT_SIZE(vq) \
(SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq)) (SVE_SIG_REGS_OFFSET + SVE_SIG_REGS_SIZE(vq))
/*
* If the ZA register is enabled for the thread at signal delivery then,
* za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
* and the register data may be accessed using the ZA_SIG_*() macros.
*
* If za_context.head.size < ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl))
* then ZA was not enabled and no register data was included in which case
* ZA register was not enabled for the thread and no register data
* the ZA_SIG_*() macros should not be used except for this check.
*
* The same convention applies when returning from a signal: a caller
* will need to remove or resize the za_context block if it wants to
* enable the ZA register when it was previously non-live or vice-versa.
* This may require the caller to allocate fresh memory and/or move other
* context blocks in the signal frame.
*
* Changing the vector length during signal return is not permitted:
* za_context.vl must equal the thread's current SME vector length when
* doing a sigreturn.
*/
#define ZA_SIG_REGS_OFFSET \
((sizeof(struct za_context) + (__SVE_VQ_BYTES - 1)) \
/ __SVE_VQ_BYTES * __SVE_VQ_BYTES)
#define ZA_SIG_REGS_SIZE(vq) ((vq * __SVE_VQ_BYTES) * (vq * __SVE_VQ_BYTES))
#define ZA_SIG_ZAV_OFFSET(vq, n) (ZA_SIG_REGS_OFFSET + \
(SVE_SIG_ZREG_SIZE(vq) * n))
#define ZA_SIG_CONTEXT_SIZE(vq) \
(ZA_SIG_REGS_OFFSET + ZA_SIG_REGS_SIZE(vq))
#endif /* _UAPI__ASM_SIGCONTEXT_H */ #endif /* _UAPI__ASM_SIGCONTEXT_H */
...@@ -261,6 +261,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = { ...@@ -261,6 +261,8 @@ static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
}; };
static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = { static const struct arm64_ftr_bits ftr_id_aa64pfr1[] = {
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_SME_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MPAMFRAC_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_MPAMFRAC_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_RASFRAC_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR1_RASFRAC_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE), ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_MTE),
...@@ -293,6 +295,24 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = { ...@@ -293,6 +295,24 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = {
ARM64_FTR_END, ARM64_FTR_END,
}; };
static const struct arm64_ftr_bits ftr_id_aa64smfr0[] = {
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_FA64_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_I16I64_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F64F64_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_I8I32_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F16F32_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_B16F32_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_F32F32_SHIFT, 1, 0),
ARM64_FTR_END,
};
static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = { static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_ECV_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_ECV_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_FGT_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR0_FGT_SHIFT, 4, 0),
...@@ -561,6 +581,12 @@ static const struct arm64_ftr_bits ftr_zcr[] = { ...@@ -561,6 +581,12 @@ static const struct arm64_ftr_bits ftr_zcr[] = {
ARM64_FTR_END, ARM64_FTR_END,
}; };
static const struct arm64_ftr_bits ftr_smcr[] = {
ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE,
SMCR_ELx_LEN_SHIFT, SMCR_ELx_LEN_SIZE, 0), /* LEN */
ARM64_FTR_END,
};
/* /*
* Common ftr bits for a 32bit register with all hidden, strict * Common ftr bits for a 32bit register with all hidden, strict
* attributes, with 4bit feature fields and a default safe value of * attributes, with 4bit feature fields and a default safe value of
...@@ -645,6 +671,7 @@ static const struct __ftr_reg_entry { ...@@ -645,6 +671,7 @@ static const struct __ftr_reg_entry {
ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1, ARM64_FTR_REG_OVERRIDE(SYS_ID_AA64PFR1_EL1, ftr_id_aa64pfr1,
&id_aa64pfr1_override), &id_aa64pfr1_override),
ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0), ARM64_FTR_REG(SYS_ID_AA64ZFR0_EL1, ftr_id_aa64zfr0),
ARM64_FTR_REG(SYS_ID_AA64SMFR0_EL1, ftr_id_aa64smfr0),
/* Op1 = 0, CRn = 0, CRm = 5 */ /* Op1 = 0, CRn = 0, CRm = 5 */
ARM64_FTR_REG(SYS_ID_AA64DFR0_EL1, ftr_id_aa64dfr0), ARM64_FTR_REG(SYS_ID_AA64DFR0_EL1, ftr_id_aa64dfr0),
...@@ -666,6 +693,7 @@ static const struct __ftr_reg_entry { ...@@ -666,6 +693,7 @@ static const struct __ftr_reg_entry {
/* Op1 = 0, CRn = 1, CRm = 2 */ /* Op1 = 0, CRn = 1, CRm = 2 */
ARM64_FTR_REG(SYS_ZCR_EL1, ftr_zcr), ARM64_FTR_REG(SYS_ZCR_EL1, ftr_zcr),
ARM64_FTR_REG(SYS_SMCR_EL1, ftr_smcr),
/* Op1 = 1, CRn = 0, CRm = 0 */ /* Op1 = 1, CRn = 0, CRm = 0 */
ARM64_FTR_REG(SYS_GMID_EL1, ftr_gmid), ARM64_FTR_REG(SYS_GMID_EL1, ftr_gmid),
...@@ -960,6 +988,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info) ...@@ -960,6 +988,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
init_cpu_ftr_reg(SYS_ID_AA64PFR0_EL1, info->reg_id_aa64pfr0); init_cpu_ftr_reg(SYS_ID_AA64PFR0_EL1, info->reg_id_aa64pfr0);
init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1); init_cpu_ftr_reg(SYS_ID_AA64PFR1_EL1, info->reg_id_aa64pfr1);
init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0); init_cpu_ftr_reg(SYS_ID_AA64ZFR0_EL1, info->reg_id_aa64zfr0);
init_cpu_ftr_reg(SYS_ID_AA64SMFR0_EL1, info->reg_id_aa64smfr0);
if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
init_32bit_cpu_features(&info->aarch32); init_32bit_cpu_features(&info->aarch32);
...@@ -969,6 +998,12 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info) ...@@ -969,6 +998,12 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
vec_init_vq_map(ARM64_VEC_SVE); vec_init_vq_map(ARM64_VEC_SVE);
} }
if (id_aa64pfr1_sme(info->reg_id_aa64pfr1)) {
init_cpu_ftr_reg(SYS_SMCR_EL1, info->reg_smcr);
if (IS_ENABLED(CONFIG_ARM64_SME))
vec_init_vq_map(ARM64_VEC_SME);
}
if (id_aa64pfr1_mte(info->reg_id_aa64pfr1)) if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
init_cpu_ftr_reg(SYS_GMID_EL1, info->reg_gmid); init_cpu_ftr_reg(SYS_GMID_EL1, info->reg_gmid);
...@@ -1195,6 +1230,9 @@ void update_cpu_features(int cpu, ...@@ -1195,6 +1230,9 @@ void update_cpu_features(int cpu,
taint |= check_update_ftr_reg(SYS_ID_AA64ZFR0_EL1, cpu, taint |= check_update_ftr_reg(SYS_ID_AA64ZFR0_EL1, cpu,
info->reg_id_aa64zfr0, boot->reg_id_aa64zfr0); info->reg_id_aa64zfr0, boot->reg_id_aa64zfr0);
taint |= check_update_ftr_reg(SYS_ID_AA64SMFR0_EL1, cpu,
info->reg_id_aa64smfr0, boot->reg_id_aa64smfr0);
if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) { if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) {
taint |= check_update_ftr_reg(SYS_ZCR_EL1, cpu, taint |= check_update_ftr_reg(SYS_ZCR_EL1, cpu,
info->reg_zcr, boot->reg_zcr); info->reg_zcr, boot->reg_zcr);
...@@ -1205,6 +1243,16 @@ void update_cpu_features(int cpu, ...@@ -1205,6 +1243,16 @@ void update_cpu_features(int cpu,
vec_update_vq_map(ARM64_VEC_SVE); vec_update_vq_map(ARM64_VEC_SVE);
} }
if (id_aa64pfr1_sme(info->reg_id_aa64pfr1)) {
taint |= check_update_ftr_reg(SYS_SMCR_EL1, cpu,
info->reg_smcr, boot->reg_smcr);
/* Probe vector lengths, unless we already gave up on SME */
if (id_aa64pfr1_sme(read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1)) &&
!system_capabilities_finalized())
vec_update_vq_map(ARM64_VEC_SME);
}
/* /*
* The kernel uses the LDGM/STGM instructions and the number of tags * The kernel uses the LDGM/STGM instructions and the number of tags
* they read/write depends on the GMID_EL1.BS field. Check that the * they read/write depends on the GMID_EL1.BS field. Check that the
...@@ -1288,6 +1336,7 @@ u64 __read_sysreg_by_encoding(u32 sys_id) ...@@ -1288,6 +1336,7 @@ u64 __read_sysreg_by_encoding(u32 sys_id)
read_sysreg_case(SYS_ID_AA64PFR0_EL1); read_sysreg_case(SYS_ID_AA64PFR0_EL1);
read_sysreg_case(SYS_ID_AA64PFR1_EL1); read_sysreg_case(SYS_ID_AA64PFR1_EL1);
read_sysreg_case(SYS_ID_AA64ZFR0_EL1); read_sysreg_case(SYS_ID_AA64ZFR0_EL1);
read_sysreg_case(SYS_ID_AA64SMFR0_EL1);
read_sysreg_case(SYS_ID_AA64DFR0_EL1); read_sysreg_case(SYS_ID_AA64DFR0_EL1);
read_sysreg_case(SYS_ID_AA64DFR1_EL1); read_sysreg_case(SYS_ID_AA64DFR1_EL1);
read_sysreg_case(SYS_ID_AA64MMFR0_EL1); read_sysreg_case(SYS_ID_AA64MMFR0_EL1);
...@@ -2442,6 +2491,33 @@ static const struct arm64_cpu_capabilities arm64_features[] = { ...@@ -2442,6 +2491,33 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.matches = has_cpuid_feature, .matches = has_cpuid_feature,
.min_field_value = 1, .min_field_value = 1,
}, },
#ifdef CONFIG_ARM64_SME
{
.desc = "Scalable Matrix Extension",
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.capability = ARM64_SME,
.sys_reg = SYS_ID_AA64PFR1_EL1,
.sign = FTR_UNSIGNED,
.field_pos = ID_AA64PFR1_SME_SHIFT,
.field_width = 4,
.min_field_value = ID_AA64PFR1_SME,
.matches = has_cpuid_feature,
.cpu_enable = sme_kernel_enable,
},
/* FA64 should be sorted after the base SME capability */
{
.desc = "FA64",
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.capability = ARM64_SME_FA64,
.sys_reg = SYS_ID_AA64SMFR0_EL1,
.sign = FTR_UNSIGNED,
.field_pos = ID_AA64SMFR0_FA64_SHIFT,
.field_width = 1,
.min_field_value = ID_AA64SMFR0_FA64,
.matches = has_cpuid_feature,
.cpu_enable = fa64_kernel_enable,
},
#endif /* CONFIG_ARM64_SME */
{}, {},
}; };
...@@ -2575,6 +2651,16 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = { ...@@ -2575,6 +2651,16 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP(SYS_ID_AA64MMFR0_EL1, ID_AA64MMFR0_ECV_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ECV), HWCAP_CAP(SYS_ID_AA64MMFR0_EL1, ID_AA64MMFR0_ECV_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ECV),
HWCAP_CAP(SYS_ID_AA64MMFR1_EL1, ID_AA64MMFR1_AFP_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AFP), HWCAP_CAP(SYS_ID_AA64MMFR1_EL1, ID_AA64MMFR1_AFP_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_AFP),
HWCAP_CAP(SYS_ID_AA64ISAR2_EL1, ID_AA64ISAR2_RPRES_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RPRES), HWCAP_CAP(SYS_ID_AA64ISAR2_EL1, ID_AA64ISAR2_RPRES_SHIFT, 4, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_RPRES),
#ifdef CONFIG_ARM64_SME
HWCAP_CAP(SYS_ID_AA64PFR1_EL1, ID_AA64PFR1_SME_SHIFT, 4, FTR_UNSIGNED, ID_AA64PFR1_SME, CAP_HWCAP, KERNEL_HWCAP_SME),
HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_FA64_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_FA64, CAP_HWCAP, KERNEL_HWCAP_SME_FA64),
HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_I16I64_SHIFT, 4, FTR_UNSIGNED, ID_AA64SMFR0_I16I64, CAP_HWCAP, KERNEL_HWCAP_SME_I16I64),
HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F64F64_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_F64F64, CAP_HWCAP, KERNEL_HWCAP_SME_F64F64),
HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_I8I32_SHIFT, 4, FTR_UNSIGNED, ID_AA64SMFR0_I8I32, CAP_HWCAP, KERNEL_HWCAP_SME_I8I32),
HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F16F32_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_F16F32, CAP_HWCAP, KERNEL_HWCAP_SME_F16F32),
HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_B16F32_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_B16F32, CAP_HWCAP, KERNEL_HWCAP_SME_B16F32),
HWCAP_CAP(SYS_ID_AA64SMFR0_EL1, ID_AA64SMFR0_F32F32_SHIFT, 1, FTR_UNSIGNED, ID_AA64SMFR0_F32F32, CAP_HWCAP, KERNEL_HWCAP_SME_F32F32),
#endif /* CONFIG_ARM64_SME */
{}, {},
}; };
...@@ -2872,6 +2958,23 @@ static void verify_sve_features(void) ...@@ -2872,6 +2958,23 @@ static void verify_sve_features(void)
/* Add checks on other ZCR bits here if necessary */ /* Add checks on other ZCR bits here if necessary */
} }
static void verify_sme_features(void)
{
u64 safe_smcr = read_sanitised_ftr_reg(SYS_SMCR_EL1);
u64 smcr = read_smcr_features();
unsigned int safe_len = safe_smcr & SMCR_ELx_LEN_MASK;
unsigned int len = smcr & SMCR_ELx_LEN_MASK;
if (len < safe_len || vec_verify_vq_map(ARM64_VEC_SME)) {
pr_crit("CPU%d: SME: vector length support mismatch\n",
smp_processor_id());
cpu_die_early();
}
/* Add checks on other SMCR bits here if necessary */
}
static void verify_hyp_capabilities(void) static void verify_hyp_capabilities(void)
{ {
u64 safe_mmfr1, mmfr0, mmfr1; u64 safe_mmfr1, mmfr0, mmfr1;
...@@ -2924,6 +3027,9 @@ static void verify_local_cpu_capabilities(void) ...@@ -2924,6 +3027,9 @@ static void verify_local_cpu_capabilities(void)
if (system_supports_sve()) if (system_supports_sve())
verify_sve_features(); verify_sve_features();
if (system_supports_sme())
verify_sme_features();
if (is_hyp_mode_available()) if (is_hyp_mode_available())
verify_hyp_capabilities(); verify_hyp_capabilities();
} }
...@@ -3041,6 +3147,7 @@ void __init setup_cpu_features(void) ...@@ -3041,6 +3147,7 @@ void __init setup_cpu_features(void)
pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n"); pr_info("emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching\n");
sve_setup(); sve_setup();
sme_setup();
minsigstksz_setup(); minsigstksz_setup();
/* Advertise that we have computed the system capabilities */ /* Advertise that we have computed the system capabilities */
......
...@@ -98,6 +98,14 @@ static const char *const hwcap_str[] = { ...@@ -98,6 +98,14 @@ static const char *const hwcap_str[] = {
[KERNEL_HWCAP_AFP] = "afp", [KERNEL_HWCAP_AFP] = "afp",
[KERNEL_HWCAP_RPRES] = "rpres", [KERNEL_HWCAP_RPRES] = "rpres",
[KERNEL_HWCAP_MTE3] = "mte3", [KERNEL_HWCAP_MTE3] = "mte3",
[KERNEL_HWCAP_SME] = "sme",
[KERNEL_HWCAP_SME_I16I64] = "smei16i64",
[KERNEL_HWCAP_SME_F64F64] = "smef64f64",
[KERNEL_HWCAP_SME_I8I32] = "smei8i32",
[KERNEL_HWCAP_SME_F16F32] = "smef16f32",
[KERNEL_HWCAP_SME_B16F32] = "smeb16f32",
[KERNEL_HWCAP_SME_F32F32] = "smef32f32",
[KERNEL_HWCAP_SME_FA64] = "smefa64",
}; };
#ifdef CONFIG_COMPAT #ifdef CONFIG_COMPAT
...@@ -401,6 +409,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) ...@@ -401,6 +409,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1); info->reg_id_aa64pfr0 = read_cpuid(ID_AA64PFR0_EL1);
info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1); info->reg_id_aa64pfr1 = read_cpuid(ID_AA64PFR1_EL1);
info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1); info->reg_id_aa64zfr0 = read_cpuid(ID_AA64ZFR0_EL1);
info->reg_id_aa64smfr0 = read_cpuid(ID_AA64SMFR0_EL1);
if (id_aa64pfr1_mte(info->reg_id_aa64pfr1)) if (id_aa64pfr1_mte(info->reg_id_aa64pfr1))
info->reg_gmid = read_cpuid(GMID_EL1); info->reg_gmid = read_cpuid(GMID_EL1);
...@@ -412,6 +421,10 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) ...@@ -412,6 +421,10 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
id_aa64pfr0_sve(info->reg_id_aa64pfr0)) id_aa64pfr0_sve(info->reg_id_aa64pfr0))
info->reg_zcr = read_zcr_features(); info->reg_zcr = read_zcr_features();
if (IS_ENABLED(CONFIG_ARM64_SME) &&
id_aa64pfr1_sme(info->reg_id_aa64pfr1))
info->reg_smcr = read_smcr_features();
cpuinfo_detect_icache_policy(info); cpuinfo_detect_icache_policy(info);
} }
......
...@@ -537,6 +537,14 @@ static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr) ...@@ -537,6 +537,14 @@ static void noinstr el0_sve_acc(struct pt_regs *regs, unsigned long esr)
exit_to_user_mode(regs); exit_to_user_mode(regs);
} }
static void noinstr el0_sme_acc(struct pt_regs *regs, unsigned long esr)
{
enter_from_user_mode(regs);
local_daif_restore(DAIF_PROCCTX);
do_sme_acc(esr, regs);
exit_to_user_mode(regs);
}
static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr) static void noinstr el0_fpsimd_exc(struct pt_regs *regs, unsigned long esr)
{ {
enter_from_user_mode(regs); enter_from_user_mode(regs);
...@@ -645,6 +653,9 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs) ...@@ -645,6 +653,9 @@ asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
case ESR_ELx_EC_SVE: case ESR_ELx_EC_SVE:
el0_sve_acc(regs, esr); el0_sve_acc(regs, esr);
break; break;
case ESR_ELx_EC_SME:
el0_sme_acc(regs, esr);
break;
case ESR_ELx_EC_FP_EXC64: case ESR_ELx_EC_FP_EXC64:
el0_fpsimd_exc(regs, esr); el0_fpsimd_exc(regs, esr);
break; break;
......
...@@ -86,3 +86,39 @@ SYM_FUNC_START(sve_flush_live) ...@@ -86,3 +86,39 @@ SYM_FUNC_START(sve_flush_live)
SYM_FUNC_END(sve_flush_live) SYM_FUNC_END(sve_flush_live)
#endif /* CONFIG_ARM64_SVE */ #endif /* CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
SYM_FUNC_START(sme_get_vl)
_sme_rdsvl 0, 1
ret
SYM_FUNC_END(sme_get_vl)
SYM_FUNC_START(sme_set_vq)
sme_load_vq x0, x1, x2
ret
SYM_FUNC_END(sme_set_vq)
/*
* Save the SME state
*
* x0 - pointer to buffer for state
*/
SYM_FUNC_START(za_save_state)
_sme_rdsvl 1, 1 // x1 = VL/8
sme_save_za 0, x1, 12
ret
SYM_FUNC_END(za_save_state)
/*
* Load the SME state
*
* x0 - pointer to buffer for state
*/
SYM_FUNC_START(za_load_state)
_sme_rdsvl 1, 1 // x1 = VL/8
sme_load_za 0, x1, 12
ret
SYM_FUNC_END(za_load_state)
#endif /* CONFIG_ARM64_SME */
This diff is collapsed.
...@@ -250,6 +250,8 @@ void show_regs(struct pt_regs *regs) ...@@ -250,6 +250,8 @@ void show_regs(struct pt_regs *regs)
static void tls_thread_flush(void) static void tls_thread_flush(void)
{ {
write_sysreg(0, tpidr_el0); write_sysreg(0, tpidr_el0);
if (system_supports_tpidr2())
write_sysreg_s(0, SYS_TPIDR2_EL0);
if (is_compat_task()) { if (is_compat_task()) {
current->thread.uw.tp_value = 0; current->thread.uw.tp_value = 0;
...@@ -298,16 +300,42 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) ...@@ -298,16 +300,42 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
/* /*
* Detach src's sve_state (if any) from dst so that it does not * Detach src's sve_state (if any) from dst so that it does not
* get erroneously used or freed prematurely. dst's sve_state * get erroneously used or freed prematurely. dst's copies
* will be allocated on demand later on if dst uses SVE. * will be allocated on demand later on if dst uses SVE.
* For consistency, also clear TIF_SVE here: this could be done * For consistency, also clear TIF_SVE here: this could be done
* later in copy_process(), but to avoid tripping up future * later in copy_process(), but to avoid tripping up future
* maintainers it is best not to leave TIF_SVE and sve_state in * maintainers it is best not to leave TIF flags and buffers in
* an inconsistent state, even temporarily. * an inconsistent state, even temporarily.
*/ */
dst->thread.sve_state = NULL; dst->thread.sve_state = NULL;
clear_tsk_thread_flag(dst, TIF_SVE); clear_tsk_thread_flag(dst, TIF_SVE);
/*
* In the unlikely event that we create a new thread with ZA
* enabled we should retain the ZA state so duplicate it here.
* This may be shortly freed if we exec() or if CLONE_SETTLS
* but it's simpler to do it here. To avoid confusing the rest
* of the code ensure that we have a sve_state allocated
* whenever za_state is allocated.
*/
if (thread_za_enabled(&src->thread)) {
dst->thread.sve_state = kzalloc(sve_state_size(src),
GFP_KERNEL);
if (!dst->thread.sve_state)
return -ENOMEM;
dst->thread.za_state = kmemdup(src->thread.za_state,
za_state_size(src),
GFP_KERNEL);
if (!dst->thread.za_state) {
kfree(dst->thread.sve_state);
dst->thread.sve_state = NULL;
return -ENOMEM;
}
} else {
dst->thread.za_state = NULL;
clear_tsk_thread_flag(dst, TIF_SME);
}
/* clear any pending asynchronous tag fault raised by the parent */ /* clear any pending asynchronous tag fault raised by the parent */
clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT); clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
...@@ -343,6 +371,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, ...@@ -343,6 +371,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
* out-of-sync with the saved value. * out-of-sync with the saved value.
*/ */
*task_user_tls(p) = read_sysreg(tpidr_el0); *task_user_tls(p) = read_sysreg(tpidr_el0);
if (system_supports_tpidr2())
p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
if (stack_start) { if (stack_start) {
if (is_compat_thread(task_thread_info(p))) if (is_compat_thread(task_thread_info(p)))
...@@ -353,10 +383,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, ...@@ -353,10 +383,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
/* /*
* If a TLS pointer was passed to clone, use it for the new * If a TLS pointer was passed to clone, use it for the new
* thread. * thread. We also reset TPIDR2 if it's in use.
*/ */
if (clone_flags & CLONE_SETTLS) if (clone_flags & CLONE_SETTLS) {
p->thread.uw.tp_value = tls; p->thread.uw.tp_value = tls;
p->thread.tpidr2_el0 = 0;
}
} else { } else {
/* /*
* A kthread has no context to ERET to, so ensure any buggy * A kthread has no context to ERET to, so ensure any buggy
...@@ -387,6 +419,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, ...@@ -387,6 +419,8 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
void tls_preserve_current_state(void) void tls_preserve_current_state(void)
{ {
*task_user_tls(current) = read_sysreg(tpidr_el0); *task_user_tls(current) = read_sysreg(tpidr_el0);
if (system_supports_tpidr2() && !is_compat_task())
current->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
} }
static void tls_thread_switch(struct task_struct *next) static void tls_thread_switch(struct task_struct *next)
...@@ -399,6 +433,8 @@ static void tls_thread_switch(struct task_struct *next) ...@@ -399,6 +433,8 @@ static void tls_thread_switch(struct task_struct *next)
write_sysreg(0, tpidrro_el0); write_sysreg(0, tpidrro_el0);
write_sysreg(*task_user_tls(next), tpidr_el0); write_sysreg(*task_user_tls(next), tpidr_el0);
if (system_supports_tpidr2())
write_sysreg_s(next->thread.tpidr2_el0, SYS_TPIDR2_EL0);
} }
/* /*
......
This diff is collapsed.
...@@ -56,6 +56,7 @@ struct rt_sigframe_user_layout { ...@@ -56,6 +56,7 @@ struct rt_sigframe_user_layout {
unsigned long fpsimd_offset; unsigned long fpsimd_offset;
unsigned long esr_offset; unsigned long esr_offset;
unsigned long sve_offset; unsigned long sve_offset;
unsigned long za_offset;
unsigned long extra_offset; unsigned long extra_offset;
unsigned long end_offset; unsigned long end_offset;
}; };
...@@ -218,6 +219,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx) ...@@ -218,6 +219,7 @@ static int restore_fpsimd_context(struct fpsimd_context __user *ctx)
struct user_ctxs { struct user_ctxs {
struct fpsimd_context __user *fpsimd; struct fpsimd_context __user *fpsimd;
struct sve_context __user *sve; struct sve_context __user *sve;
struct za_context __user *za;
}; };
#ifdef CONFIG_ARM64_SVE #ifdef CONFIG_ARM64_SVE
...@@ -226,11 +228,17 @@ static int preserve_sve_context(struct sve_context __user *ctx) ...@@ -226,11 +228,17 @@ static int preserve_sve_context(struct sve_context __user *ctx)
{ {
int err = 0; int err = 0;
u16 reserved[ARRAY_SIZE(ctx->__reserved)]; u16 reserved[ARRAY_SIZE(ctx->__reserved)];
u16 flags = 0;
unsigned int vl = task_get_sve_vl(current); unsigned int vl = task_get_sve_vl(current);
unsigned int vq = 0; unsigned int vq = 0;
if (test_thread_flag(TIF_SVE)) if (thread_sm_enabled(&current->thread)) {
vl = task_get_sme_vl(current);
vq = sve_vq_from_vl(vl); vq = sve_vq_from_vl(vl);
flags |= SVE_SIG_FLAG_SM;
} else if (test_thread_flag(TIF_SVE)) {
vq = sve_vq_from_vl(vl);
}
memset(reserved, 0, sizeof(reserved)); memset(reserved, 0, sizeof(reserved));
...@@ -238,6 +246,7 @@ static int preserve_sve_context(struct sve_context __user *ctx) ...@@ -238,6 +246,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
__put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16), __put_user_error(round_up(SVE_SIG_CONTEXT_SIZE(vq), 16),
&ctx->head.size, err); &ctx->head.size, err);
__put_user_error(vl, &ctx->vl, err); __put_user_error(vl, &ctx->vl, err);
__put_user_error(flags, &ctx->flags, err);
BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved)); BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved)); err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
...@@ -258,18 +267,28 @@ static int preserve_sve_context(struct sve_context __user *ctx) ...@@ -258,18 +267,28 @@ static int preserve_sve_context(struct sve_context __user *ctx)
static int restore_sve_fpsimd_context(struct user_ctxs *user) static int restore_sve_fpsimd_context(struct user_ctxs *user)
{ {
int err; int err;
unsigned int vq; unsigned int vl, vq;
struct user_fpsimd_state fpsimd; struct user_fpsimd_state fpsimd;
struct sve_context sve; struct sve_context sve;
if (__copy_from_user(&sve, user->sve, sizeof(sve))) if (__copy_from_user(&sve, user->sve, sizeof(sve)))
return -EFAULT; return -EFAULT;
if (sve.vl != task_get_sve_vl(current)) if (sve.flags & SVE_SIG_FLAG_SM) {
if (!system_supports_sme())
return -EINVAL;
vl = task_get_sme_vl(current);
} else {
vl = task_get_sve_vl(current);
}
if (sve.vl != vl)
return -EINVAL; return -EINVAL;
if (sve.head.size <= sizeof(*user->sve)) { if (sve.head.size <= sizeof(*user->sve)) {
clear_thread_flag(TIF_SVE); clear_thread_flag(TIF_SVE);
current->thread.svcr &= ~SYS_SVCR_EL0_SM_MASK;
goto fpsimd_only; goto fpsimd_only;
} }
...@@ -301,6 +320,9 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user) ...@@ -301,6 +320,9 @@ static int restore_sve_fpsimd_context(struct user_ctxs *user)
if (err) if (err)
return -EFAULT; return -EFAULT;
if (sve.flags & SVE_SIG_FLAG_SM)
current->thread.svcr |= SYS_SVCR_EL0_SM_MASK;
else
set_thread_flag(TIF_SVE); set_thread_flag(TIF_SVE);
fpsimd_only: fpsimd_only:
...@@ -326,6 +348,101 @@ extern int restore_sve_fpsimd_context(struct user_ctxs *user); ...@@ -326,6 +348,101 @@ extern int restore_sve_fpsimd_context(struct user_ctxs *user);
#endif /* ! CONFIG_ARM64_SVE */ #endif /* ! CONFIG_ARM64_SVE */
#ifdef CONFIG_ARM64_SME
static int preserve_za_context(struct za_context __user *ctx)
{
int err = 0;
u16 reserved[ARRAY_SIZE(ctx->__reserved)];
unsigned int vl = task_get_sme_vl(current);
unsigned int vq;
if (thread_za_enabled(&current->thread))
vq = sve_vq_from_vl(vl);
else
vq = 0;
memset(reserved, 0, sizeof(reserved));
__put_user_error(ZA_MAGIC, &ctx->head.magic, err);
__put_user_error(round_up(ZA_SIG_CONTEXT_SIZE(vq), 16),
&ctx->head.size, err);
__put_user_error(vl, &ctx->vl, err);
BUILD_BUG_ON(sizeof(ctx->__reserved) != sizeof(reserved));
err |= __copy_to_user(&ctx->__reserved, reserved, sizeof(reserved));
if (vq) {
/*
* This assumes that the ZA state has already been saved to
* the task struct by calling the function
* fpsimd_signal_preserve_current_state().
*/
err |= __copy_to_user((char __user *)ctx + ZA_SIG_REGS_OFFSET,
current->thread.za_state,
ZA_SIG_REGS_SIZE(vq));
}
return err ? -EFAULT : 0;
}
static int restore_za_context(struct user_ctxs __user *user)
{
int err;
unsigned int vq;
struct za_context za;
if (__copy_from_user(&za, user->za, sizeof(za)))
return -EFAULT;
if (za.vl != task_get_sme_vl(current))
return -EINVAL;
if (za.head.size <= sizeof(*user->za)) {
current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK;
return 0;
}
vq = sve_vq_from_vl(za.vl);
if (za.head.size < ZA_SIG_CONTEXT_SIZE(vq))
return -EINVAL;
/*
* Careful: we are about __copy_from_user() directly into
* thread.za_state with preemption enabled, so protection is
* needed to prevent a racing context switch from writing stale
* registers back over the new data.
*/
fpsimd_flush_task_state(current);
/* From now, fpsimd_thread_switch() won't touch thread.sve_state */
sme_alloc(current);
if (!current->thread.za_state) {
current->thread.svcr &= ~SYS_SVCR_EL0_ZA_MASK;
clear_thread_flag(TIF_SME);
return -ENOMEM;
}
err = __copy_from_user(current->thread.za_state,
(char __user const *)user->za +
ZA_SIG_REGS_OFFSET,
ZA_SIG_REGS_SIZE(vq));
if (err)
return -EFAULT;
set_thread_flag(TIF_SME);
current->thread.svcr |= SYS_SVCR_EL0_ZA_MASK;
return 0;
}
#else /* ! CONFIG_ARM64_SME */
/* Turn any non-optimised out attempts to use these into a link error: */
extern int preserve_za_context(void __user *ctx);
extern int restore_za_context(struct user_ctxs *user);
#endif /* ! CONFIG_ARM64_SME */
static int parse_user_sigframe(struct user_ctxs *user, static int parse_user_sigframe(struct user_ctxs *user,
struct rt_sigframe __user *sf) struct rt_sigframe __user *sf)
...@@ -340,6 +457,7 @@ static int parse_user_sigframe(struct user_ctxs *user, ...@@ -340,6 +457,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->fpsimd = NULL; user->fpsimd = NULL;
user->sve = NULL; user->sve = NULL;
user->za = NULL;
if (!IS_ALIGNED((unsigned long)base, 16)) if (!IS_ALIGNED((unsigned long)base, 16))
goto invalid; goto invalid;
...@@ -393,7 +511,7 @@ static int parse_user_sigframe(struct user_ctxs *user, ...@@ -393,7 +511,7 @@ static int parse_user_sigframe(struct user_ctxs *user,
break; break;
case SVE_MAGIC: case SVE_MAGIC:
if (!system_supports_sve()) if (!system_supports_sve() && !system_supports_sme())
goto invalid; goto invalid;
if (user->sve) if (user->sve)
...@@ -405,6 +523,19 @@ static int parse_user_sigframe(struct user_ctxs *user, ...@@ -405,6 +523,19 @@ static int parse_user_sigframe(struct user_ctxs *user,
user->sve = (struct sve_context __user *)head; user->sve = (struct sve_context __user *)head;
break; break;
case ZA_MAGIC:
if (!system_supports_sme())
goto invalid;
if (user->za)
goto invalid;
if (size < sizeof(*user->za))
goto invalid;
user->za = (struct za_context __user *)head;
break;
case EXTRA_MAGIC: case EXTRA_MAGIC:
if (have_extra_context) if (have_extra_context)
goto invalid; goto invalid;
...@@ -528,6 +659,9 @@ static int restore_sigframe(struct pt_regs *regs, ...@@ -528,6 +659,9 @@ static int restore_sigframe(struct pt_regs *regs,
} }
} }
if (err == 0 && system_supports_sme() && user.za)
err = restore_za_context(&user);
return err; return err;
} }
...@@ -594,11 +728,12 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user, ...@@ -594,11 +728,12 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
if (system_supports_sve()) { if (system_supports_sve()) {
unsigned int vq = 0; unsigned int vq = 0;
if (add_all || test_thread_flag(TIF_SVE)) { if (add_all || test_thread_flag(TIF_SVE) ||
int vl = sve_max_vl(); thread_sm_enabled(&current->thread)) {
int vl = max(sve_max_vl(), sme_max_vl());
if (!add_all) if (!add_all)
vl = task_get_sve_vl(current); vl = thread_get_cur_vl(&current->thread);
vq = sve_vq_from_vl(vl); vq = sve_vq_from_vl(vl);
} }
...@@ -609,6 +744,24 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user, ...@@ -609,6 +744,24 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
return err; return err;
} }
if (system_supports_sme()) {
unsigned int vl;
unsigned int vq = 0;
if (add_all)
vl = sme_max_vl();
else
vl = task_get_sme_vl(current);
if (thread_za_enabled(&current->thread))
vq = sve_vq_from_vl(vl);
err = sigframe_alloc(user, &user->za_offset,
ZA_SIG_CONTEXT_SIZE(vq));
if (err)
return err;
}
return sigframe_alloc_end(user); return sigframe_alloc_end(user);
} }
...@@ -649,13 +802,21 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user, ...@@ -649,13 +802,21 @@ static int setup_sigframe(struct rt_sigframe_user_layout *user,
__put_user_error(current->thread.fault_code, &esr_ctx->esr, err); __put_user_error(current->thread.fault_code, &esr_ctx->esr, err);
} }
/* Scalable Vector Extension state, if present */ /* Scalable Vector Extension state (including streaming), if present */
if (system_supports_sve() && err == 0 && user->sve_offset) { if ((system_supports_sve() || system_supports_sme()) &&
err == 0 && user->sve_offset) {
struct sve_context __user *sve_ctx = struct sve_context __user *sve_ctx =
apply_user_offset(user, user->sve_offset); apply_user_offset(user, user->sve_offset);
err |= preserve_sve_context(sve_ctx); err |= preserve_sve_context(sve_ctx);
} }
/* ZA state if present */
if (system_supports_sme() && err == 0 && user->za_offset) {
struct za_context __user *za_ctx =
apply_user_offset(user, user->za_offset);
err |= preserve_za_context(za_ctx);
}
if (err == 0 && user->extra_offset) { if (err == 0 && user->extra_offset) {
char __user *sfp = (char __user *)user->sigframe; char __user *sfp = (char __user *)user->sigframe;
char __user *userp = char __user *userp =
...@@ -759,6 +920,13 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka, ...@@ -759,6 +920,13 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
/* TCO (Tag Check Override) always cleared for signal handlers */ /* TCO (Tag Check Override) always cleared for signal handlers */
regs->pstate &= ~PSR_TCO_BIT; regs->pstate &= ~PSR_TCO_BIT;
/* Signal handlers are invoked with ZA and streaming mode disabled */
if (system_supports_sme()) {
current->thread.svcr &= ~(SYS_SVCR_EL0_ZA_MASK |
SYS_SVCR_EL0_SM_MASK);
sme_smstop();
}
if (ka->sa.sa_flags & SA_RESTORER) if (ka->sa.sa_flags & SA_RESTORER)
sigtramp = ka->sa.sa_restorer; sigtramp = ka->sa.sa_restorer;
else else
......
...@@ -158,11 +158,36 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr, ...@@ -158,11 +158,36 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
syscall_trace_exit(regs); syscall_trace_exit(regs);
} }
static inline void sve_user_discard(void) /*
* As per the ABI exit SME streaming mode and clear the SVE state not
* shared with FPSIMD on syscall entry.
*/
static inline void fp_user_discard(void)
{ {
/*
* If SME is active then exit streaming mode. If ZA is active
* then flush the SVE registers but leave userspace access to
* both SVE and SME enabled, otherwise disable SME for the
* task and fall through to disabling SVE too. This means
* that after a syscall we never have any streaming mode
* register state to track, if this changes the KVM code will
* need updating.
*/
if (system_supports_sme() && test_thread_flag(TIF_SME)) {
u64 svcr = read_sysreg_s(SYS_SVCR_EL0);
if (svcr & SYS_SVCR_EL0_SM_MASK)
sme_smstop_sm();
}
if (!system_supports_sve()) if (!system_supports_sve())
return; return;
/*
* If SME is not active then disable SVE, the registers will
* be cleared when userspace next attempts to access them and
* we do not need to track the SVE register state until then.
*/
clear_thread_flag(TIF_SVE); clear_thread_flag(TIF_SVE);
/* /*
...@@ -177,7 +202,7 @@ static inline void sve_user_discard(void) ...@@ -177,7 +202,7 @@ static inline void sve_user_discard(void)
void do_el0_svc(struct pt_regs *regs) void do_el0_svc(struct pt_regs *regs)
{ {
sve_user_discard(); fp_user_discard();
el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table); el0_svc_common(regs, regs->regs[8], __NR_syscalls, sys_call_table);
} }
......
...@@ -821,6 +821,7 @@ static const char *esr_class_str[] = { ...@@ -821,6 +821,7 @@ static const char *esr_class_str[] = {
[ESR_ELx_EC_SVE] = "SVE", [ESR_ELx_EC_SVE] = "SVE",
[ESR_ELx_EC_ERET] = "ERET/ERETAA/ERETAB", [ESR_ELx_EC_ERET] = "ERET/ERETAA/ERETAB",
[ESR_ELx_EC_FPAC] = "FPAC", [ESR_ELx_EC_FPAC] = "FPAC",
[ESR_ELx_EC_SME] = "SME",
[ESR_ELx_EC_IMP_DEF] = "EL3 IMP DEF", [ESR_ELx_EC_IMP_DEF] = "EL3 IMP DEF",
[ESR_ELx_EC_IABT_LOW] = "IABT (lower EL)", [ESR_ELx_EC_IABT_LOW] = "IABT (lower EL)",
[ESR_ELx_EC_IABT_CUR] = "IABT (current EL)", [ESR_ELx_EC_IABT_CUR] = "IABT (current EL)",
......
...@@ -82,6 +82,26 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu) ...@@ -82,6 +82,26 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN) if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED; vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED;
/*
* We don't currently support SME guests but if we leave
* things in streaming mode then when the guest starts running
* FPSIMD or SVE code it may generate SME traps so as a
* special case if we are in streaming mode we force the host
* state to be saved now and exit streaming mode so that we
* don't have to handle any SME traps for valid guest
* operations. Do this for ZA as well for now for simplicity.
*/
if (system_supports_sme()) {
if (read_sysreg(cpacr_el1) & CPACR_EL1_SMEN_EL0EN)
vcpu->arch.flags |= KVM_ARM64_HOST_SME_ENABLED;
if (read_sysreg_s(SYS_SVCR_EL0) &
(SYS_SVCR_EL0_SM_MASK | SYS_SVCR_EL0_ZA_MASK)) {
vcpu->arch.flags &= ~KVM_ARM64_FP_HOST;
fpsimd_save_and_flush_cpu_state();
}
}
} }
/* /*
...@@ -109,9 +129,14 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) ...@@ -109,9 +129,14 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
WARN_ON_ONCE(!irqs_disabled()); WARN_ON_ONCE(!irqs_disabled());
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) { if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
/*
* Currently we do not support SME guests so SVCR is
* always 0 and we just need a variable to point to.
*/
fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs, fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
vcpu->arch.sve_state, vcpu->arch.sve_state,
vcpu->arch.sve_max_vl); vcpu->arch.sve_max_vl,
NULL, 0, &vcpu->arch.svcr);
clear_thread_flag(TIF_FOREIGN_FPSTATE); clear_thread_flag(TIF_FOREIGN_FPSTATE);
update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu)); update_thread_flag(TIF_SVE, vcpu_has_sve(vcpu));
...@@ -130,6 +155,22 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu) ...@@ -130,6 +155,22 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
local_irq_save(flags); local_irq_save(flags);
/*
* If we have VHE then the Hyp code will reset CPACR_EL1 to
* CPACR_EL1_DEFAULT and we need to reenable SME.
*/
if (has_vhe() && system_supports_sme()) {
/* Also restore EL0 state seen on entry */
if (vcpu->arch.flags & KVM_ARM64_HOST_SME_ENABLED)
sysreg_clear_set(CPACR_EL1, 0,
CPACR_EL1_SMEN_EL0EN |
CPACR_EL1_SMEN_EL1EN);
else
sysreg_clear_set(CPACR_EL1,
CPACR_EL1_SMEN_EL0EN,
CPACR_EL1_SMEN_EL1EN);
}
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) { if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
if (vcpu_has_sve(vcpu)) { if (vcpu_has_sve(vcpu)) {
__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR); __vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
......
...@@ -47,10 +47,24 @@ static void __activate_traps(struct kvm_vcpu *vcpu) ...@@ -47,10 +47,24 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val |= CPTR_EL2_TFP | CPTR_EL2_TZ; val |= CPTR_EL2_TFP | CPTR_EL2_TZ;
__activate_traps_fpsimd32(vcpu); __activate_traps_fpsimd32(vcpu);
} }
if (cpus_have_final_cap(ARM64_SME))
val |= CPTR_EL2_TSM;
write_sysreg(val, cptr_el2); write_sysreg(val, cptr_el2);
write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el2); write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el2);
if (cpus_have_final_cap(ARM64_SME)) {
val = read_sysreg_s(SYS_HFGRTR_EL2);
val &= ~(HFGxTR_EL2_nTPIDR2_EL0_MASK |
HFGxTR_EL2_nSMPRI_EL1_MASK);
write_sysreg_s(val, SYS_HFGRTR_EL2);
val = read_sysreg_s(SYS_HFGWTR_EL2);
val &= ~(HFGxTR_EL2_nTPIDR2_EL0_MASK |
HFGxTR_EL2_nSMPRI_EL1_MASK);
write_sysreg_s(val, SYS_HFGWTR_EL2);
}
if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) { if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt; struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
...@@ -94,9 +108,25 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu) ...@@ -94,9 +108,25 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2); write_sysreg(this_cpu_ptr(&kvm_init_params)->hcr_el2, hcr_el2);
if (cpus_have_final_cap(ARM64_SME)) {
u64 val;
val = read_sysreg_s(SYS_HFGRTR_EL2);
val |= HFGxTR_EL2_nTPIDR2_EL0_MASK |
HFGxTR_EL2_nSMPRI_EL1_MASK;
write_sysreg_s(val, SYS_HFGRTR_EL2);
val = read_sysreg_s(SYS_HFGWTR_EL2);
val |= HFGxTR_EL2_nTPIDR2_EL0_MASK |
HFGxTR_EL2_nSMPRI_EL1_MASK;
write_sysreg_s(val, SYS_HFGWTR_EL2);
}
cptr = CPTR_EL2_DEFAULT; cptr = CPTR_EL2_DEFAULT;
if (vcpu_has_sve(vcpu) && (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)) if (vcpu_has_sve(vcpu) && (vcpu->arch.flags & KVM_ARM64_FP_ENABLED))
cptr |= CPTR_EL2_TZ; cptr |= CPTR_EL2_TZ;
if (cpus_have_final_cap(ARM64_SME))
cptr &= ~CPTR_EL2_TSM;
write_sysreg(cptr, cptr_el2); write_sysreg(cptr, cptr_el2);
write_sysreg(__kvm_hyp_host_vector, vbar_el2); write_sysreg(__kvm_hyp_host_vector, vbar_el2);
......
...@@ -41,7 +41,8 @@ static void __activate_traps(struct kvm_vcpu *vcpu) ...@@ -41,7 +41,8 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
val = read_sysreg(cpacr_el1); val = read_sysreg(cpacr_el1);
val |= CPACR_EL1_TTA; val |= CPACR_EL1_TTA;
val &= ~(CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN); val &= ~(CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN |
CPACR_EL1_SMEN_EL0EN | CPACR_EL1_SMEN_EL1EN);
/* /*
* With VHE (HCR.E2H == 1), accesses to CPACR_EL1 are routed to * With VHE (HCR.E2H == 1), accesses to CPACR_EL1 are routed to
...@@ -62,6 +63,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu) ...@@ -62,6 +63,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
__activate_traps_fpsimd32(vcpu); __activate_traps_fpsimd32(vcpu);
} }
if (cpus_have_final_cap(ARM64_SME))
write_sysreg(read_sysreg(sctlr_el2) & ~SCTLR_ELx_ENTP2,
sctlr_el2);
write_sysreg(val, cpacr_el1); write_sysreg(val, cpacr_el1);
write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el1); write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el1);
...@@ -83,6 +88,10 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu) ...@@ -83,6 +88,10 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
*/ */
asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT)); asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
if (cpus_have_final_cap(ARM64_SME))
write_sysreg(read_sysreg(sctlr_el2) | SCTLR_ELx_ENTP2,
sctlr_el2);
write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1); write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
if (!arm64_kernel_unmapped_at_el0()) if (!arm64_kernel_unmapped_at_el0())
......
...@@ -1132,6 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, ...@@ -1132,6 +1132,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
case SYS_ID_AA64PFR1_EL1: case SYS_ID_AA64PFR1_EL1:
if (!kvm_has_mte(vcpu->kvm)) if (!kvm_has_mte(vcpu->kvm))
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_MTE); val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_MTE);
val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_SME);
break; break;
case SYS_ID_AA64ISAR1_EL1: case SYS_ID_AA64ISAR1_EL1:
if (!vcpu_has_ptrauth(vcpu)) if (!vcpu_has_ptrauth(vcpu))
...@@ -1553,7 +1555,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { ...@@ -1553,7 +1555,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_UNALLOCATED(4,2), ID_UNALLOCATED(4,2),
ID_UNALLOCATED(4,3), ID_UNALLOCATED(4,3),
ID_SANITISED(ID_AA64ZFR0_EL1), ID_SANITISED(ID_AA64ZFR0_EL1),
ID_UNALLOCATED(4,5), ID_HIDDEN(ID_AA64SMFR0_EL1),
ID_UNALLOCATED(4,6), ID_UNALLOCATED(4,6),
ID_UNALLOCATED(4,7), ID_UNALLOCATED(4,7),
...@@ -1596,6 +1598,8 @@ static const struct sys_reg_desc sys_reg_descs[] = { ...@@ -1596,6 +1598,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility }, { SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
{ SYS_DESC(SYS_TRFCR_EL1), undef_access }, { SYS_DESC(SYS_TRFCR_EL1), undef_access },
{ SYS_DESC(SYS_SMPRI_EL1), undef_access },
{ SYS_DESC(SYS_SMCR_EL1), undef_access },
{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 }, { SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 }, { SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
{ SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 }, { SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 },
...@@ -1678,8 +1682,10 @@ static const struct sys_reg_desc sys_reg_descs[] = { ...@@ -1678,8 +1682,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr }, { SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr },
{ SYS_DESC(SYS_CLIDR_EL1), access_clidr }, { SYS_DESC(SYS_CLIDR_EL1), access_clidr },
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 }, { SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
{ SYS_DESC(SYS_CTR_EL0), access_ctr }, { SYS_DESC(SYS_CTR_EL0), access_ctr },
{ SYS_DESC(SYS_SVCR_EL0), undef_access },
{ PMU_SYS_REG(SYS_PMCR_EL0), .access = access_pmcr, { PMU_SYS_REG(SYS_PMCR_EL0), .access = access_pmcr,
.reset = reset_pmcr, .reg = PMCR_EL0 }, .reset = reset_pmcr, .reg = PMCR_EL0 },
...@@ -1719,6 +1725,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { ...@@ -1719,6 +1725,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_TPIDR_EL0), NULL, reset_unknown, TPIDR_EL0 }, { SYS_DESC(SYS_TPIDR_EL0), NULL, reset_unknown, TPIDR_EL0 },
{ SYS_DESC(SYS_TPIDRRO_EL0), NULL, reset_unknown, TPIDRRO_EL0 }, { SYS_DESC(SYS_TPIDRRO_EL0), NULL, reset_unknown, TPIDRRO_EL0 },
{ SYS_DESC(SYS_TPIDR2_EL0), undef_access },
{ SYS_DESC(SYS_SCXTNUM_EL0), undef_access }, { SYS_DESC(SYS_SCXTNUM_EL0), undef_access },
......
...@@ -43,6 +43,8 @@ KVM_PROTECTED_MODE ...@@ -43,6 +43,8 @@ KVM_PROTECTED_MODE
MISMATCHED_CACHE_TYPE MISMATCHED_CACHE_TYPE
MTE MTE
MTE_ASYMM MTE_ASYMM
SME
SME_FA64
SPECTRE_V2 SPECTRE_V2
SPECTRE_V3A SPECTRE_V3A
SPECTRE_V4 SPECTRE_V4
......
...@@ -431,6 +431,8 @@ typedef struct elf64_shdr { ...@@ -431,6 +431,8 @@ typedef struct elf64_shdr {
#define NT_ARM_PACG_KEYS 0x408 /* ARM pointer authentication generic key */ #define NT_ARM_PACG_KEYS 0x408 /* ARM pointer authentication generic key */
#define NT_ARM_TAGGED_ADDR_CTRL 0x409 /* arm64 tagged address control (prctl()) */ #define NT_ARM_TAGGED_ADDR_CTRL 0x409 /* arm64 tagged address control (prctl()) */
#define NT_ARM_PAC_ENABLED_KEYS 0x40a /* arm64 ptr auth enabled keys (prctl()) */ #define NT_ARM_PAC_ENABLED_KEYS 0x40a /* arm64 ptr auth enabled keys (prctl()) */
#define NT_ARM_SSVE 0x40b /* ARM Streaming SVE registers */
#define NT_ARM_ZA 0x40c /* ARM SME ZA registers */
#define NT_ARC_V2 0x600 /* ARCv2 accumulator/extra registers */ #define NT_ARC_V2 0x600 /* ARCv2 accumulator/extra registers */
#define NT_VMCOREDD 0x700 /* Vmcore Device Dump Note */ #define NT_VMCOREDD 0x700 /* Vmcore Device Dump Note */
#define NT_MIPS_DSP 0x800 /* MIPS DSP ASE registers */ #define NT_MIPS_DSP 0x800 /* MIPS DSP ASE registers */
......
...@@ -272,6 +272,15 @@ struct prctl_mm_map { ...@@ -272,6 +272,15 @@ struct prctl_mm_map {
# define PR_SCHED_CORE_SCOPE_THREAD_GROUP 1 # define PR_SCHED_CORE_SCOPE_THREAD_GROUP 1
# define PR_SCHED_CORE_SCOPE_PROCESS_GROUP 2 # define PR_SCHED_CORE_SCOPE_PROCESS_GROUP 2
/* arm64 Scalable Matrix Extension controls */
/* Flag values must be in sync with SVE versions */
#define PR_SME_SET_VL 63 /* set task vector length */
# define PR_SME_SET_VL_ONEXEC (1 << 18) /* defer effect until exec */
#define PR_SME_GET_VL 64 /* get task vector length */
/* Bits common to PR_SME_SET_VL and PR_SME_GET_VL */
# define PR_SME_VL_LEN_MASK 0xffff
# define PR_SME_VL_INHERIT (1 << 17) /* inherit across exec */
#define PR_SET_VMA 0x53564d41 #define PR_SET_VMA 0x53564d41
# define PR_SET_VMA_ANON_NAME 0 # define PR_SET_VMA_ANON_NAME 0
......
...@@ -117,6 +117,12 @@ ...@@ -117,6 +117,12 @@
#ifndef SVE_GET_VL #ifndef SVE_GET_VL
# define SVE_GET_VL() (-EINVAL) # define SVE_GET_VL() (-EINVAL)
#endif #endif
#ifndef SME_SET_VL
# define SME_SET_VL(a) (-EINVAL)
#endif
#ifndef SME_GET_VL
# define SME_GET_VL() (-EINVAL)
#endif
#ifndef PAC_RESET_KEYS #ifndef PAC_RESET_KEYS
# define PAC_RESET_KEYS(a, b) (-EINVAL) # define PAC_RESET_KEYS(a, b) (-EINVAL)
#endif #endif
...@@ -2541,6 +2547,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, ...@@ -2541,6 +2547,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_SVE_GET_VL: case PR_SVE_GET_VL:
error = SVE_GET_VL(); error = SVE_GET_VL();
break; break;
case PR_SME_SET_VL:
error = SME_SET_VL(arg2);
break;
case PR_SME_GET_VL:
error = SME_GET_VL();
break;
case PR_GET_SPECULATION_CTRL: case PR_GET_SPECULATION_CTRL:
if (arg3 || arg4 || arg5) if (arg3 || arg4 || arg5)
return -EINVAL; return -EINVAL;
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment