Commit 8cb1ae19 authored by Linus Torvalds's avatar Linus Torvalds

Merge tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu updates from Thomas Gleixner:

 - Cleanup of extable fixup handling to be more robust, which in turn
   allows to make the FPU exception fixups more robust as well.

 - Change the return code for signal frame related failures from
   explicit error codes to a boolean fail/success as that's all what the
   calling code evaluates.

 - A large refactoring of the FPU code to prepare for adding AMX
   support:

      - Distangle the public header maze and remove especially the
        misnomed kitchen sink internal.h which is despite it's name
        included all over the place.

      - Add a proper abstraction for the register buffer storage (struct
        fpstate) which allows to dynamically size the buffer at runtime
        by flipping the pointer to the buffer container from the default
        container which is embedded in task_struct::tread::fpu to a
        dynamically allocated container with a larger register buffer.

      - Convert the code over to the new fpstate mechanism.

      - Consolidate the KVM FPU handling by moving the FPU related code
        into the FPU core which removes the number of exports and avoids
        adding even more export when AMX has to be supported in KVM.
        This also removes duplicated code which was of course
        unnecessary different and incomplete in the KVM copy.

      - Simplify the KVM FPU buffer handling by utilizing the new
        fpstate container and just switching the buffer pointer from the
        user space buffer to the KVM guest buffer when entering
        vcpu_run() and flipping it back when leaving the function. This
        cuts the memory requirements of a vCPU for FPU buffers in half
        and avoids pointless memory copy operations.

        This also solves the so far unresolved problem of adding AMX
        support because the current FPU buffer handling of KVM inflicted
        a circular dependency between adding AMX support to the core and
        to KVM. With the new scheme of switching fpstate AMX support can
        be added to the core code without affecting KVM.

      - Replace various variables with proper data structures so the
        extra information required for adding dynamically enabled FPU
        features (AMX) can be added in one place

 - Add AMX (Advanced Matrix eXtensions) support (finally):

   AMX is a large XSTATE component which is going to be available with
   Saphire Rapids XEON CPUs. The feature comes with an extra MSR
   (MSR_XFD) which allows to trap the (first) use of an AMX related
   instruction, which has two benefits:

    1) It allows the kernel to control access to the feature

    2) It allows the kernel to dynamically allocate the large register
       state buffer instead of burdening every task with the the extra
       8K or larger state storage.

   It would have been great to gain this kind of control already with
   AVX512.

   The support comes with the following infrastructure components:

    1) arch_prctl() to
        - read the supported features (equivalent to XGETBV(0))
        - read the permitted features for a task
        - request permission for a dynamically enabled feature

       Permission is granted per process, inherited on fork() and
       cleared on exec(). The permission policy of the kernel is
       restricted to sigaltstack size validation, but the syscall
       obviously allows further restrictions via seccomp etc.

    2) A stronger sigaltstack size validation for sys_sigaltstack(2)
       which takes granted permissions and the potentially resulting
       larger signal frame into account. This mechanism can also be used
       to enforce factual sigaltstack validation independent of dynamic
       features to help with finding potential victims of the 2K
       sigaltstack size constant which is broken since AVX512 support
       was added.

    3) Exception handling for #NM traps to catch first use of a extended
       feature via a new cause MSR. If the exception was caused by the
       use of such a feature, the handler checks permission for that
       feature. If permission has not been granted, the handler sends a
       SIGILL like the #UD handler would do if the feature would have
       been disabled in XCR0. If permission has been granted, then a new
       fpstate which fits the larger buffer requirement is allocated.

       In the unlikely case that this allocation fails, the handler
       sends SIGSEGV to the task. That's not elegant, but unavoidable as
       the other discussed options of preallocation or full per task
       permissions come with their own set of horrors for kernel and/or
       userspace. So this is the lesser of the evils and SIGSEGV caused
       by unexpected memory allocation failures is not a fundamentally
       new concept either.

       When allocation succeeds, the fpstate properties are filled in to
       reflect the extended feature set and the resulting sizes, the
       fpu::fpstate pointer is updated accordingly and the trap is
       disarmed for this task permanently.

    4) Enumeration and size calculations

    5) Trap switching via MSR_XFD

       The XFD (eXtended Feature Disable) MSR is context switched with
       the same life time rules as the FPU register state itself. The
       mechanism is keyed off with a static key which is default
       disabled so !AMX equipped CPUs have zero overhead. On AMX enabled
       CPUs the overhead is limited by comparing the tasks XFD value
       with a per CPU shadow variable to avoid redundant MSR writes. In
       case of switching from a AMX using task to a non AMX using task
       or vice versa, the extra MSR write is obviously inevitable.

       All other places which need to be aware of the variable feature
       sets and resulting variable sizes are not affected at all because
       they retrieve the information (feature set, sizes) unconditonally
       from the fpstate properties.

    6) Enable the new AMX states

   Note, this is relatively new code despite the fact that AMX support
   is in the works for more than a year now.

   The big refactoring of the FPU code, which allowed to do a proper
   integration has been started exactly 3 weeks ago. Refactoring of the
   existing FPU code and of the original AMX patches took a week and has
   been subject to extensive review and testing. The only fallout which
   has not been caught in review and testing right away was restricted
   to AMX enabled systems, which is completely irrelevant for anyone
   outside Intel and their early access program. There might be dragons
   lurking as usual, but so far the fine grained refactoring has held up
   and eventual yet undetected fallout is bisectable and should be
   easily addressable before the 5.16 release. Famous last words...

   Many thanks to Chang Bae and Dave Hansen for working hard on this and
   also to the various test teams at Intel who reserved extra capacity
   to follow the rapid development of this closely which provides the
   confidence level required to offer this rather large update for
   inclusion into 5.16-rc1

* tag 'x86-fpu-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  Documentation/x86: Add documentation for using dynamic XSTATE features
  x86/fpu: Include vmalloc.h for vzalloc()
  selftests/x86/amx: Add context switch test
  selftests/x86/amx: Add test cases for AMX state management
  x86/fpu/amx: Enable the AMX feature in 64-bit mode
  x86/fpu: Add XFD handling for dynamic states
  x86/fpu: Calculate the default sizes independently
  x86/fpu/amx: Define AMX state components and have it used for boot-time checks
  x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers
  x86/fpu/xstate: Add fpstate_realloc()/free()
  x86/fpu/xstate: Add XFD #NM handler
  x86/fpu: Update XFD state where required
  x86/fpu: Add sanity checks for XFD
  x86/fpu: Add XFD state to fpstate
  x86/msr-index: Add MSRs for XFD
  x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit
  x86/fpu: Reset permission and fpstate on exec()
  x86/fpu: Prepare fpu_clone() for dynamically enabled features
  x86/fpu/signal: Prepare for variable sigframe length
  x86/signal: Use fpu::__state_user_size for sigalt stack validation
  ...
parents 7d20dd32 d7a9590f
......@@ -5497,6 +5497,15 @@
stifb= [HW]
Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
strict_sas_size=
[X86]
Format: <bool>
Enable or disable strict sigaltstack size checks
against the required signal frame size which
depends on the supported FPU features. This can
be used to filter out binaries which have
not yet been made aware of AT_MINSIGSTKSZ.
sunrpc.min_resvport=
sunrpc.max_resvport=
[NFS,SUNRPC]
......
......@@ -37,3 +37,4 @@ x86-specific Documentation
sgx
features
elf_auxvec
xstate
Using XSTATE features in user space applications
================================================
The x86 architecture supports floating-point extensions which are
enumerated via CPUID. Applications consult CPUID and use XGETBV to
evaluate which features have been enabled by the kernel XCR0.
Up to AVX-512 and PKRU states, these features are automatically enabled by
the kernel if available. Features like AMX TILE_DATA (XSTATE component 18)
are enabled by XCR0 as well, but the first use of related instruction is
trapped by the kernel because by default the required large XSTATE buffers
are not allocated automatically.
Using dynamically enabled XSTATE features in user space applications
--------------------------------------------------------------------
The kernel provides an arch_prctl(2) based mechanism for applications to
request the usage of such features. The arch_prctl(2) options related to
this are:
-ARCH_GET_XCOMP_SUPP
arch_prctl(ARCH_GET_XCOMP_SUPP, &features);
ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of
type uint64_t. The second argument is a pointer to that storage.
-ARCH_GET_XCOMP_PERM
arch_prctl(ARCH_GET_XCOMP_PERM, &features);
ARCH_GET_XCOMP_PERM stores the features for which the userspace process
has permission in userspace storage of type uint64_t. The second argument
is a pointer to that storage.
-ARCH_REQ_XCOMP_PERM
arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr);
ARCH_REQ_XCOMP_PERM allows to request permission for a dynamically enabled
feature or a feature set. A feature set can be mapped to a facility, e.g.
AMX, and can require one or more XSTATE components to be enabled.
The feature argument is the number of the highest XSTATE component which
is required for a facility to work.
When requesting permission for a feature, the kernel checks the
availability. The kernel ensures that sigaltstacks in the process's tasks
are large enough to accommodate the resulting large signal frame. It
enforces this both during ARCH_REQ_XCOMP_SUPP and during any subsequent
sigaltstack(2) calls. If an installed sigaltstack is smaller than the
resulting sigframe size, ARCH_REQ_XCOMP_SUPP results in -ENOSUPP. Also,
sigaltstack(2) results in -ENOMEM if the requested altstack is too small
for the permitted features.
Permission, when granted, is valid per process. Permissions are inherited
on fork(2) and cleared on exec(3).
The first use of an instruction related to a dynamically enabled feature is
trapped by the kernel. The trap handler checks whether the process has
permission to use the feature. If the process has no permission then the
kernel sends SIGILL to the application. If the process has permission then
the handler allocates a larger xstate buffer for the task so the large
state can be context switched. In the unlikely cases that the allocation
fails, the kernel sends SIGSEGV.
......@@ -1288,6 +1288,9 @@ config ARCH_HAS_ELFCORE_COMPAT
config ARCH_HAS_PARANOID_L1D_FLUSH
bool
config DYNAMIC_SIGFRAME
bool
source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig"
......
......@@ -125,6 +125,7 @@ config X86
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
select CLOCKSOURCE_WATCHDOG
select DCACHE_WORD_ACCESS
select DYNAMIC_SIGFRAME
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
select GENERIC_CLOCKEVENTS_BROADCAST if X86_64 || (X86_32 && X86_LOCAL_APIC)
......@@ -2399,6 +2400,22 @@ config MODIFY_LDT_SYSCALL
Saying 'N' here may make sense for embedded or server kernels.
config STRICT_SIGALTSTACK_SIZE
bool "Enforce strict size checking for sigaltstack"
depends on DYNAMIC_SIGFRAME
help
For historical reasons MINSIGSTKSZ is a constant which became
already too small with AVX512 support. Add a mechanism to
enforce strict checking of the sigaltstack size against the
real size of the FPU frame. This option enables the check
by default. It can also be controlled via the kernel command
line option 'strict_sas_size' independent of this config
switch. Enabling it might break existing applications which
allocate a too small sigaltstack but 'work' because they
never get a signal delivered.
Say 'N' unless you want to really enforce this check.
source "kernel/livepatch/Kconfig"
endmenu
......
......@@ -14,6 +14,7 @@
#include <linux/perf_event.h>
#include <asm/fpu/xstate.h>
#include <asm/intel_ds.h>
#include <asm/cpu.h>
......
......@@ -24,7 +24,6 @@
#include <linux/syscalls.h>
#include <asm/ucontext.h>
#include <linux/uaccess.h>
#include <asm/fpu/internal.h>
#include <asm/fpu/signal.h>
#include <asm/ptrace.h>
#include <asm/ia32_unistd.h>
......@@ -57,8 +56,8 @@ static inline void reload_segments(struct sigcontext_32 *sc)
/*
* Do a signal return; undo the signal stack.
*/
static int ia32_restore_sigcontext(struct pt_regs *regs,
struct sigcontext_32 __user *usc)
static bool ia32_restore_sigcontext(struct pt_regs *regs,
struct sigcontext_32 __user *usc)
{
struct sigcontext_32 sc;
......@@ -66,7 +65,7 @@ static int ia32_restore_sigcontext(struct pt_regs *regs,
current->restart_block.fn = do_no_restart_syscall;
if (unlikely(copy_from_user(&sc, usc, sizeof(sc))))
return -EFAULT;
return false;
/* Get only the ia32 registers. */
regs->bx = sc.bx;
......@@ -111,7 +110,7 @@ COMPAT_SYSCALL_DEFINE0(sigreturn)
set_current_blocked(&set);
if (ia32_restore_sigcontext(regs, &frame->sc))
if (!ia32_restore_sigcontext(regs, &frame->sc))
goto badframe;
return regs->ax;
......@@ -135,7 +134,7 @@ COMPAT_SYSCALL_DEFINE0(rt_sigreturn)
set_current_blocked(&set);
if (ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext))
if (!ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext))
goto badframe;
if (compat_restore_altstack(&frame->uc.uc_stack))
......@@ -220,8 +219,8 @@ static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
*fpstate = (struct _fpstate_32 __user *) sp;
if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
math_size) < 0)
if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
math_size))
return (void __user *) -1L;
sp -= frame_size;
......
......@@ -122,28 +122,19 @@
#ifdef __KERNEL__
# include <asm/extable_fixup_types.h>
/* Exception table entry */
#ifdef __ASSEMBLY__
# define _ASM_EXTABLE_HANDLE(from, to, handler) \
# define _ASM_EXTABLE_TYPE(from, to, type) \
.pushsection "__ex_table","a" ; \
.balign 4 ; \
.long (from) - . ; \
.long (to) - . ; \
.long (handler) - . ; \
.long type ; \
.popsection
# define _ASM_EXTABLE(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_default)
# define _ASM_EXTABLE_UA(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_uaccess)
# define _ASM_EXTABLE_CPY(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_copy)
# define _ASM_EXTABLE_FAULT(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
# ifdef CONFIG_KPROBES
# define _ASM_NOKPROBE(entry) \
.pushsection "_kprobe_blacklist","aw" ; \
......@@ -155,27 +146,15 @@
# endif
#else /* ! __ASSEMBLY__ */
# define _EXPAND_EXTABLE_HANDLE(x) #x
# define _ASM_EXTABLE_HANDLE(from, to, handler) \
# define _ASM_EXTABLE_TYPE(from, to, type) \
" .pushsection \"__ex_table\",\"a\"\n" \
" .balign 4\n" \
" .long (" #from ") - .\n" \
" .long (" #to ") - .\n" \
" .long (" _EXPAND_EXTABLE_HANDLE(handler) ") - .\n" \
" .long " __stringify(type) " \n" \
" .popsection\n"
# define _ASM_EXTABLE(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_default)
# define _ASM_EXTABLE_UA(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_uaccess)
# define _ASM_EXTABLE_CPY(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_copy)
# define _ASM_EXTABLE_FAULT(from, to) \
_ASM_EXTABLE_HANDLE(from, to, ex_handler_fault)
/* For C file, we already have NOKPROBE_SYMBOL macro */
/*
......@@ -188,6 +167,17 @@ register unsigned long current_stack_pointer asm(_ASM_SP);
#define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)
#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */
#define _ASM_EXTABLE(from, to) \
_ASM_EXTABLE_TYPE(from, to, EX_TYPE_DEFAULT)
#define _ASM_EXTABLE_UA(from, to) \
_ASM_EXTABLE_TYPE(from, to, EX_TYPE_UACCESS)
#define _ASM_EXTABLE_CPY(from, to) \
_ASM_EXTABLE_TYPE(from, to, EX_TYPE_COPY)
#define _ASM_EXTABLE_FAULT(from, to) \
_ASM_EXTABLE_TYPE(from, to, EX_TYPE_FAULT)
#endif /* __KERNEL__ */
#endif /* _ASM_X86_ASM_H */
......@@ -277,6 +277,7 @@
#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC instruction */
#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 instruction */
#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS instructions */
#define X86_FEATURE_XFD (10*32+ 4) /* "" eXtended Feature Disabling */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
......@@ -298,6 +299,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */
#define X86_FEATURE_AMX_TILE (18*32+24) /* AMX tile Support */
/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */
......
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_EXTABLE_H
#define _ASM_X86_EXTABLE_H
#include <asm/extable_fixup_types.h>
/*
* The exception table consists of triples of addresses relative to the
* exception table entry itself. The first address is of an instruction
* that is allowed to fault, the second is the target at which the program
* should continue. The third is a handler function to deal with the fault
* caused by the instruction in the first field.
* The exception table consists of two addresses relative to the
* exception table entry itself and a type selector field.
*
* The first address is of an instruction that is allowed to fault, the
* second is the target at which the program should continue.
*
* The type entry is used by fixup_exception() to select the handler to
* deal with the fault caused by the instruction in the first field.
*
* All the routines below use bits of fixup code that are out of line
* with the main instruction path. This means when everything is well,
......@@ -15,7 +21,7 @@
*/
struct exception_table_entry {
int insn, fixup, handler;
int insn, fixup, type;
};
struct pt_regs;
......@@ -25,21 +31,27 @@ struct pt_regs;
do { \
(a)->fixup = (b)->fixup + (delta); \
(b)->fixup = (tmp).fixup - (delta); \
(a)->handler = (b)->handler + (delta); \
(b)->handler = (tmp).handler - (delta); \
(a)->type = (b)->type; \
(b)->type = (tmp).type; \
} while (0)
enum handler_type {
EX_HANDLER_NONE,
EX_HANDLER_FAULT,
EX_HANDLER_UACCESS,
EX_HANDLER_OTHER
};
extern int fixup_exception(struct pt_regs *regs, int trapnr,
unsigned long error_code, unsigned long fault_addr);
extern int fixup_bug(struct pt_regs *regs, int trapnr);
extern enum handler_type ex_get_fault_handler_type(unsigned long ip);
extern int ex_get_fixup_type(unsigned long ip);
extern void early_fixup_exception(struct pt_regs *regs, int trapnr);
#ifdef CONFIG_X86_MCE
extern void ex_handler_msr_mce(struct pt_regs *regs, bool wrmsr);
#else
static inline void ex_handler_msr_mce(struct pt_regs *regs, bool wrmsr) { }
#endif
#if defined(CONFIG_BPF_JIT) && defined(CONFIG_X86_64)
bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs);
#else
static inline bool ex_handler_bpf(const struct exception_table_entry *x,
struct pt_regs *regs) { return false; }
#endif
#endif
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_EXTABLE_FIXUP_TYPES_H
#define _ASM_X86_EXTABLE_FIXUP_TYPES_H
#define EX_TYPE_NONE 0
#define EX_TYPE_DEFAULT 1
#define EX_TYPE_FAULT 2
#define EX_TYPE_UACCESS 3
#define EX_TYPE_COPY 4
#define EX_TYPE_CLEAR_FS 5
#define EX_TYPE_FPU_RESTORE 6
#define EX_TYPE_WRMSR 7
#define EX_TYPE_RDMSR 8
#define EX_TYPE_BPF 9
#define EX_TYPE_WRMSR_IN_MCE 10
#define EX_TYPE_RDMSR_IN_MCE 11
#define EX_TYPE_DEFAULT_MCE_SAFE 12
#define EX_TYPE_FAULT_MCE_SAFE 13
#endif
......@@ -12,6 +12,8 @@
#define _ASM_X86_FPU_API_H
#include <linux/bottom_half.h>
#include <asm/fpu/types.h>
/*
* Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
* disables preemption so be careful if you intend to use it for long periods
......@@ -48,9 +50,9 @@ static inline void kernel_fpu_begin(void)
}
/*
* Use fpregs_lock() while editing CPU's FPU registers or fpu->state.
* Use fpregs_lock() while editing CPU's FPU registers or fpu->fpstate.
* A context switch will (and softirq might) save CPU's FPU registers to
* fpu->state and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in
* fpu->fpstate.regs and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in
* a random state.
*
* local_bh_disable() protects against both preemption and soft interrupts
......@@ -108,4 +110,56 @@ extern int cpu_has_xfeatures(u64 xfeatures_mask, const char **feature_name);
static inline void update_pasid(void) { }
/* Trap handling */
extern int fpu__exception_code(struct fpu *fpu, int trap_nr);
extern void fpu_sync_fpstate(struct fpu *fpu);
extern void fpu_reset_from_exception_fixup(void);
/* Boot, hotplug and resume */
extern void fpu__init_cpu(void);
extern void fpu__init_system(struct cpuinfo_x86 *c);
extern void fpu__init_check_bugs(void);
extern void fpu__resume_cpu(void);
#ifdef CONFIG_MATH_EMULATION
extern void fpstate_init_soft(struct swregs_state *soft);
#else
static inline void fpstate_init_soft(struct swregs_state *soft) {}
#endif
/* State tracking */
DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
/* Process cleanup */
#ifdef CONFIG_X86_64
extern void fpstate_free(struct fpu *fpu);
#else
static inline void fpstate_free(struct fpu *fpu) { }
#endif
/* fpstate-related functions which are exported to KVM */
extern void fpstate_clear_xstate_component(struct fpstate *fps, unsigned int xfeature);
/* KVM specific functions */
extern bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu);
extern void fpu_free_guest_fpstate(struct fpu_guest *gfpu);
extern int fpu_swap_kvm_fpstate(struct fpu_guest *gfpu, bool enter_guest);
extern void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf, unsigned int size, u32 pkru);
extern int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, u64 xcr0, u32 *vpkru);
static inline void fpstate_set_confidential(struct fpu_guest *gfpu)
{
gfpu->fpstate->is_confidential = true;
}
static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
{
return gfpu->fpstate->is_confidential;
}
/* prctl */
struct task_struct;
extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
#endif /* _ASM_X86_FPU_API_H */
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_X86_FPU_SCHED_H
#define _ASM_X86_FPU_SCHED_H
#include <linux/sched.h>
#include <asm/cpufeature.h>
#include <asm/fpu/types.h>
#include <asm/trace/fpu.h>
extern void save_fpregs_to_fpstate(struct fpu *fpu);
extern void fpu__drop(struct fpu *fpu);
extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags);
extern void fpu_flush_thread(void);
/*
* FPU state switching for scheduling.
*
* This is a two-stage process:
*
* - switch_fpu_prepare() saves the old state.
* This is done within the context of the old process.
*
* - switch_fpu_finish() sets TIF_NEED_FPU_LOAD; the floating point state
* will get loaded on return to userspace, or when the kernel needs it.
*
* If TIF_NEED_FPU_LOAD is cleared then the CPU's FPU registers
* are saved in the current thread's FPU register state.
*
* If TIF_NEED_FPU_LOAD is set then CPU's FPU registers may not
* hold current()'s FPU registers. It is required to load the
* registers before returning to userland or using the content
* otherwise.
*
* The FPU context is only stored/restored for a user task and
* PF_KTHREAD is used to distinguish between kernel and user threads.
*/
static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
{
if (cpu_feature_enabled(X86_FEATURE_FPU) &&
!(current->flags & PF_KTHREAD)) {
save_fpregs_to_fpstate(old_fpu);
/*
* The save operation preserved register state, so the
* fpu_fpregs_owner_ctx is still @old_fpu. Store the
* current CPU number in @old_fpu, so the next return
* to user space can avoid the FPU register restore
* when is returns on the same CPU and still owns the
* context.
*/
old_fpu->last_cpu = cpu;
trace_x86_fpu_regs_deactivated(old_fpu);
}
}
/*
* Delay loading of the complete FPU state until the return to userland.
* PKRU is handled separately.
*/
static inline void switch_fpu_finish(void)
{
if (cpu_feature_enabled(X86_FEATURE_FPU))
set_thread_flag(TIF_NEED_FPU_LOAD);
}
#endif /* _ASM_X86_FPU_SCHED_H */
......@@ -5,6 +5,11 @@
#ifndef _ASM_X86_FPU_SIGNAL_H
#define _ASM_X86_FPU_SIGNAL_H
#include <linux/compat.h>
#include <linux/user.h>
#include <asm/fpu/types.h>
#ifdef CONFIG_X86_64
# include <uapi/asm/sigcontext.h>
# include <asm/user32.h>
......@@ -31,6 +36,12 @@ fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
unsigned long fpu__get_fpstate_size(void);
extern void fpu__init_prepare_fx_sw_frame(void);
extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
extern void fpu__clear_user_states(struct fpu *fpu);
extern bool fpu__restore_sig(void __user *buf, int ia32_frame);
extern void restore_fpregs_from_fpstate(struct fpstate *fpstate, u64 mask);
extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
#endif /* _ASM_X86_FPU_SIGNAL_H */
......@@ -120,6 +120,9 @@ enum xfeature {
XFEATURE_RSRVD_COMP_13,
XFEATURE_RSRVD_COMP_14,
XFEATURE_LBR,
XFEATURE_RSRVD_COMP_16,
XFEATURE_XTILE_CFG,
XFEATURE_XTILE_DATA,
XFEATURE_MAX,
};
......@@ -136,12 +139,21 @@ enum xfeature {
#define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
#define XFEATURE_MASK_PASID (1 << XFEATURE_PASID)
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
| XFEATURE_MASK_ZMM_Hi256 \
| XFEATURE_MASK_Hi16_ZMM)
#ifdef CONFIG_X86_64
# define XFEATURE_MASK_XTILE (XFEATURE_MASK_XTILE_DATA \
| XFEATURE_MASK_XTILE_CFG)
#else
# define XFEATURE_MASK_XTILE (0)
#endif
#define FIRST_EXTENDED_XFEATURE XFEATURE_YMM
struct reg_128_bit {
......@@ -153,6 +165,9 @@ struct reg_256_bit {
struct reg_512_bit {
u8 regbytes[512/8];
};
struct reg_1024_byte {
u8 regbytes[1024];
};
/*
* State component 2:
......@@ -255,6 +270,23 @@ struct arch_lbr_state {
u64 ler_to;
u64 ler_info;
struct lbr_entry entries[];
};
/*
* State component 17: 64-byte tile configuration register.
*/
struct xtile_cfg {
u64 tcfg[8];
} __packed;
/*
* State component 18: 1KB tile data register.
* Each register represents 16 64-byte rows of the matrix
* data. But the number of registers depends on the actual
* implementation.
*/
struct xtile_data {
struct reg_1024_byte tmm;
} __packed;
/*
......@@ -309,6 +341,91 @@ union fpregs_state {
u8 __padding[PAGE_SIZE];
};
struct fpstate {
/* @kernel_size: The size of the kernel register image */
unsigned int size;
/* @user_size: The size in non-compacted UABI format */
unsigned int user_size;
/* @xfeatures: xfeatures for which the storage is sized */
u64 xfeatures;
/* @user_xfeatures: xfeatures valid in UABI buffers */
u64 user_xfeatures;
/* @xfd: xfeatures disabled to trap userspace use. */
u64 xfd;
/* @is_valloc: Indicator for dynamically allocated state */
unsigned int is_valloc : 1;
/* @is_guest: Indicator for guest state (KVM) */
unsigned int is_guest : 1;
/*
* @is_confidential: Indicator for KVM confidential mode.
* The FPU registers are restored by the
* vmentry firmware from encrypted guest
* memory. On vmexit the FPU registers are
* saved by firmware to encrypted guest memory
* and the registers are scrubbed before
* returning to the host. So there is no
* content which is worth saving and restoring.
* The fpstate has to be there so that
* preemption and softirq FPU usage works
* without special casing.
*/
unsigned int is_confidential : 1;
/* @in_use: State is in use */
unsigned int in_use : 1;
/* @regs: The register state union for all supported formats */
union fpregs_state regs;
/* @regs is dynamically sized! Don't add anything after @regs! */
} __aligned(64);
struct fpu_state_perm {
/*
* @__state_perm:
*
* This bitmap indicates the permission for state components, which
* are available to a thread group. The permission prctl() sets the
* enabled state bits in thread_group_leader()->thread.fpu.
*
* All run time operations use the per thread information in the
* currently active fpu.fpstate which contains the xfeature masks
* and sizes for kernel and user space.
*
* This master permission field is only to be used when
* task.fpu.fpstate based checks fail to validate whether the task
* is allowed to expand it's xfeatures set which requires to
* allocate a larger sized fpstate buffer.
*
* Do not access this field directly. Use the provided helper
* function. Unlocked access is possible for quick checks.
*/
u64 __state_perm;
/*
* @__state_size:
*
* The size required for @__state_perm. Only valid to access
* with sighand locked.
*/
unsigned int __state_size;
/*
* @__user_state_size:
*
* The size required for @__state_perm user part. Only valid to
* access with sighand locked.
*/
unsigned int __user_state_size;
};
/*
* Highest level per task FPU state data structure that
* contains the FPU register state plus various FPU
......@@ -337,19 +454,100 @@ struct fpu {
unsigned long avx512_timestamp;
/*
* @state:
* @fpstate:
*
* Pointer to the active struct fpstate. Initialized to
* point at @__fpstate below.
*/
struct fpstate *fpstate;
/*
* @__task_fpstate:
*
* Pointer to an inactive struct fpstate. Initialized to NULL. Is
* used only for KVM support to swap out the regular task fpstate.
*/
struct fpstate *__task_fpstate;
/*
* @perm:
*
* Permission related information
*/
struct fpu_state_perm perm;
/*
* @__fpstate:
*
* In-memory copy of all FPU registers that we save/restore
* over context switches. If the task is using the FPU then
* the registers in the FPU are more recent than this state
* copy. If the task context-switches away then they get
* saved here and represent the FPU state.
* Initial in-memory storage for FPU registers which are saved in
* context switch and when the kernel uses the FPU. The registers
* are restored from this storage on return to user space if they
* are not longer containing the tasks FPU register state.
*/
union fpregs_state state;
struct fpstate __fpstate;
/*
* WARNING: 'state' is dynamically-sized. Do not put
* WARNING: '__fpstate' is dynamically-sized. Do not put
* anything after it here.
*/
};
/*
* Guest pseudo FPU container
*/
struct fpu_guest {
/*
* @fpstate: Pointer to the allocated guest fpstate
*/
struct fpstate *fpstate;
};
/*
* FPU state configuration data. Initialized at boot time. Read only after init.
*/
struct fpu_state_config {
/*
* @max_size:
*
* The maximum size of the register state buffer. Includes all
* supported features except independent managed features.
*/
unsigned int max_size;
/*
* @default_size:
*
* The default size of the register state buffer. Includes all
* supported features except independent managed features and
* features which have to be requested by user space before usage.
*/
unsigned int default_size;
/*
* @max_features:
*
* The maximum supported features bitmap. Does not include
* independent managed features.
*/
u64 max_features;
/*
* @default_features:
*
* The default supported features bitmap. Does not include
* independent managed features and features which have to
* be requested by user space before usage.
*/
u64 default_features;
/*
* @legacy_features:
*
* Features which can be reported back to user space
* even without XSAVE support, i.e. legacy features FP + SSE
*/
u64 legacy_features;
};
/* FPU state configuration information */
extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;
#endif /* _ASM_X86_FPU_H */
......@@ -2,17 +2,6 @@
#ifndef _ASM_X86_FPU_XCR_H
#define _ASM_X86_FPU_XCR_H
/*
* MXCSR and XCR definitions:
*/
static inline void ldmxcsr(u32 mxcsr)
{
asm volatile("ldmxcsr %0" :: "m" (mxcsr));
}
extern unsigned int mxcsr_feature_mask;
#define XCR_XFEATURE_ENABLED_MASK 0x00000000
static inline u64 xgetbv(u32 index)
......
......@@ -14,6 +14,8 @@
#define XSTATE_CPUID 0x0000000d
#define TILE_CPUID 0x0000001d
#define FXSAVE_SIZE 512
#define XSAVE_HDR_SIZE 64
......@@ -33,7 +35,8 @@
XFEATURE_MASK_Hi16_ZMM | \
XFEATURE_MASK_PKRU | \
XFEATURE_MASK_BNDREGS | \
XFEATURE_MASK_BNDCSR)
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_XTILE)
/*
* Features which are restored when returning to user space.
......@@ -43,6 +46,9 @@
#define XFEATURE_MASK_USER_RESTORE \
(XFEATURE_MASK_USER_SUPPORTED & ~XFEATURE_MASK_PKRU)
/* Features which are dynamically enabled for a process on request */
#define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA
/* All currently supported supervisor features */
#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
......@@ -78,78 +84,42 @@
XFEATURE_MASK_INDEPENDENT | \
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED)
#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
#else
#define REX_PREFIX
#endif
extern u64 xfeatures_mask_all;
static inline u64 xfeatures_mask_supervisor(void)
{
return xfeatures_mask_all & XFEATURE_MASK_SUPERVISOR_SUPPORTED;
}
/*
* The xfeatures which are enabled in XCR0 and expected to be in ptrace
* buffers and signal frames.
* The feature mask required to restore FPU state:
* - All user states which are not eagerly switched in switch_to()/exec()
* - The suporvisor states
*/
static inline u64 xfeatures_mask_uabi(void)
{
return xfeatures_mask_all & XFEATURE_MASK_USER_SUPPORTED;
}
/*
* The xfeatures which are restored by the kernel when returning to user
* mode. This is not necessarily the same as xfeatures_mask_uabi() as the
* kernel does not manage all XCR0 enabled features via xsave/xrstor as
* some of them have to be switched eagerly on context switch and exec().
*/
static inline u64 xfeatures_mask_restore_user(void)
{
return xfeatures_mask_all & XFEATURE_MASK_USER_RESTORE;
}
/*
* Like xfeatures_mask_restore_user() but additionally restors the
* supported supervisor states.
*/
static inline u64 xfeatures_mask_fpstate(void)
{
return xfeatures_mask_all & \
(XFEATURE_MASK_USER_RESTORE | XFEATURE_MASK_SUPERVISOR_SUPPORTED);
}
static inline u64 xfeatures_mask_independent(void)
{
if (!boot_cpu_has(X86_FEATURE_ARCH_LBR))
return XFEATURE_MASK_INDEPENDENT & ~XFEATURE_MASK_LBR;
return XFEATURE_MASK_INDEPENDENT;
}
#define XFEATURE_MASK_FPSTATE (XFEATURE_MASK_USER_RESTORE | \
XFEATURE_MASK_SUPERVISOR_SUPPORTED)
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
extern void __init update_regset_xstate_info(unsigned int size,
u64 xstate_mask);
void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
int xfeature_size(int xfeature_nr);
int copy_uabi_from_kernel_to_xstate(struct xregs_state *xsave, const void *kbuf);
int copy_sigframe_from_user_to_xstate(struct xregs_state *xsave, const void __user *ubuf);
void xsaves(struct xregs_state *xsave, u64 mask);
void xrstors(struct xregs_state *xsave, u64 mask);
enum xstate_copy_mode {
XSTATE_COPY_FP,
XSTATE_COPY_FX,
XSTATE_COPY_XSAVE,
};
int xfd_enable_feature(u64 xfd_err);
#ifdef CONFIG_X86_64
DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
#endif
#ifdef CONFIG_X86_64
DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
struct membuf;
void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
static __always_inline __pure bool fpu_state_size_dynamic(void)
{
return static_branch_unlikely(&__fpu_state_size_dynamic);
}
#else
static __always_inline __pure bool fpu_state_size_dynamic(void)
{
return false;
}
#endif
#endif
......@@ -691,11 +691,10 @@ struct kvm_vcpu_arch {
*
* Note that while the PKRU state lives inside the fpu registers,
* it is switched out separately at VMENTER and VMEXIT time. The
* "guest_fpu" state here contains the guest FPU context, with the
* "guest_fpstate" state here contains the guest FPU context, with the
* host PRKU bits.
*/
struct fpu *user_fpu;
struct fpu *guest_fpu;
struct fpu_guest guest_fpu;
u64 xcr0;
u64 guest_supported_xcr0;
......@@ -1686,8 +1685,6 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int reason, bool has_error_code, u32 error_code);
void kvm_free_guest_fpu(struct kvm_vcpu *vcpu);
void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4);
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
......
......@@ -625,6 +625,8 @@
#define MSR_IA32_BNDCFGS_RSVD 0x00000ffc
#define MSR_IA32_XFD 0x000001c4
#define MSR_IA32_XFD_ERR 0x000001c5
#define MSR_IA32_XSS 0x00000da0
#define MSR_IA32_APICBASE 0x0000001b
......
......@@ -92,7 +92,7 @@ static __always_inline unsigned long long __rdmsr(unsigned int msr)
asm volatile("1: rdmsr\n"
"2:\n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_unsafe)
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR)
: EAX_EDX_RET(val, low, high) : "c" (msr));
return EAX_EDX_VAL(val, low, high);
......@@ -102,7 +102,7 @@ static __always_inline void __wrmsr(unsigned int msr, u32 low, u32 high)
{
asm volatile("1: wrmsr\n"
"2:\n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_unsafe)
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR)
: : "c" (msr), "a"(low), "d" (high) : "memory");
}
......
......@@ -2,7 +2,7 @@
#ifndef _ASM_X86_PKRU_H
#define _ASM_X86_PKRU_H
#include <asm/fpu/xstate.h>
#include <asm/cpufeature.h>
#define PKRU_AD_BIT 0x1
#define PKRU_WD_BIT 0x2
......
......@@ -461,9 +461,6 @@ DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr);
DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr);
#endif /* !X86_64 */
extern unsigned int fpu_kernel_xstate_size;
extern unsigned int fpu_user_xstate_size;
struct perf_event;
struct thread_struct {
......@@ -537,12 +534,12 @@ struct thread_struct {
*/
};
/* Whitelist the FPU state from the task_struct for hardened usercopy. */
extern void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size);
static inline void arch_thread_struct_whitelist(unsigned long *offset,
unsigned long *size)
{
*offset = offsetof(struct thread_struct, fpu.state);
*size = fpu_kernel_xstate_size;
fpu_thread_struct_whitelist(offset, size);
}
static inline void
......
......@@ -40,6 +40,6 @@ void x86_report_nx(void);
extern int reboot_force;
long do_arch_prctl_common(struct task_struct *task, int option,
unsigned long cpuid_enabled);
unsigned long arg2);
#endif /* _ASM_X86_PROTO_H */
......@@ -339,7 +339,7 @@ static inline void __loadsegment_fs(unsigned short value)
"1: movw %0, %%fs \n"
"2: \n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_clear_fs)
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_CLEAR_FS)
: : "rm" (value) : "memory");
}
......
......@@ -22,8 +22,8 @@ DECLARE_EVENT_CLASS(x86_fpu,
__entry->fpu = fpu;
__entry->load_fpu = test_thread_flag(TIF_NEED_FPU_LOAD);
if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
__entry->xfeatures = fpu->state.xsave.header.xfeatures;
__entry->xcomp_bv = fpu->state.xsave.header.xcomp_bv;
__entry->xfeatures = fpu->fpstate->regs.xsave.header.xfeatures;
__entry->xcomp_bv = fpu->fpstate->regs.xsave.header.xcomp_bv;
}
),
TP_printk("x86/fpu: %p load: %d xfeatures: %llx xcomp_bv: %llx",
......
......@@ -10,6 +10,10 @@
#define ARCH_GET_CPUID 0x1011
#define ARCH_SET_CPUID 0x1012
#define ARCH_GET_XCOMP_SUPP 0x1021
#define ARCH_GET_XCOMP_PERM 0x1022
#define ARCH_REQ_XCOMP_PERM 0x1023
#define ARCH_MAP_VDSO_X32 0x2001
#define ARCH_MAP_VDSO_32 0x2002
#define ARCH_MAP_VDSO_64 0x2003
......
......@@ -22,7 +22,7 @@
#include <asm/bugs.h>
#include <asm/processor.h>
#include <asm/processor-flags.h>
#include <asm/fpu/internal.h>
#include <asm/fpu/api.h>
#include <asm/msr.h>
#include <asm/vmx.h>
#include <asm/paravirt.h>
......
......@@ -42,7 +42,7 @@
#include <asm/setup.h>
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/fpu/internal.h>
#include <asm/fpu/api.h>
#include <asm/mtrr.h>
#include <asm/hwcap2.h>
#include <linux/numa.h>
......
......@@ -75,6 +75,8 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_SGX_LC, X86_FEATURE_SGX },
{ X86_FEATURE_SGX1, X86_FEATURE_SGX },
{ X86_FEATURE_SGX2, X86_FEATURE_SGX1 },
{ X86_FEATURE_XFD, X86_FEATURE_XSAVES },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
{}
};
......
......@@ -373,13 +373,16 @@ static int msr_to_offset(u32 msr)
return -1;
}
__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr)
void ex_handler_msr_mce(struct pt_regs *regs, bool wrmsr)
{
pr_emerg("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
(unsigned int)regs->cx, regs->ip, (void *)regs->ip);
if (wrmsr) {
pr_emerg("MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
(unsigned int)regs->cx, (unsigned int)regs->dx, (unsigned int)regs->ax,
regs->ip, (void *)regs->ip);
} else {
pr_emerg("MSR access error: RDMSR from 0x%x at rIP: 0x%lx (%pS)\n",
(unsigned int)regs->cx, regs->ip, (void *)regs->ip);
}
show_stack_regs(regs);
......@@ -387,8 +390,6 @@ __visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
while (true)
cpu_relax();
return true;
}
/* MSR access wrappers used for error injection */
......@@ -420,32 +421,13 @@ static noinstr u64 mce_rdmsrl(u32 msr)
*/
asm volatile("1: rdmsr\n"
"2:\n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_rdmsr_fault)
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_RDMSR_IN_MCE)
: EAX_EDX_RET(val, low, high) : "c" (msr));
return EAX_EDX_VAL(val, low, high);
}
__visible bool ex_handler_wrmsr_fault(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr)
{
pr_emerg("MSR access error: WRMSR to 0x%x (tried to write 0x%08x%08x) at rIP: 0x%lx (%pS)\n",
(unsigned int)regs->cx, (unsigned int)regs->dx, (unsigned int)regs->ax,
regs->ip, (void *)regs->ip);
show_stack_regs(regs);
panic("MCA architectural violation!\n");
while (true)
cpu_relax();
return true;
}
static noinstr void mce_wrmsrl(u32 msr, u64 v)
{
u32 low, high;
......@@ -470,7 +452,7 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
/* See comment in mce_rdmsrl() */
asm volatile("1: wrmsr\n"
"2:\n"
_ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_wrmsr_fault)
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_WRMSR_IN_MCE)
: : "c" (msr), "a"(low), "d" (high) : "memory");
}
......
......@@ -61,7 +61,7 @@ static inline void cmci_disable_bank(int bank) { }
static inline void intel_init_cmci(void) { }
static inline void intel_init_lmce(void) { }
static inline void intel_clear_lmce(void) { }
static inline bool intel_filter_mce(struct mce *m) { return false; };
static inline bool intel_filter_mce(struct mce *m) { return false; }
#endif
void mce_timer_kick(unsigned long interval);
......@@ -183,17 +183,7 @@ extern bool filter_mce(struct mce *m);
#ifdef CONFIG_X86_MCE_AMD
extern bool amd_filter_mce(struct mce *m);
#else
static inline bool amd_filter_mce(struct mce *m) { return false; };
static inline bool amd_filter_mce(struct mce *m) { return false; }
#endif
__visible bool ex_handler_rdmsr_fault(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr);
__visible bool ex_handler_wrmsr_fault(const struct exception_table_entry *fixup,
struct pt_regs *regs, int trapnr,
unsigned long error_code,
unsigned long fault_addr);
#endif /* __X86_MCE_INTERNAL_H__ */
......@@ -265,25 +265,25 @@ static bool is_copy_from_user(struct pt_regs *regs)
*/
static int error_context(struct mce *m, struct pt_regs *regs)
{
enum handler_type t;
if ((m->cs & 3) == 3)
return IN_USER;
if (!mc_recoverable(m->mcgstatus))
return IN_KERNEL;
t = ex_get_fault_handler_type(m->ip);
if (t == EX_HANDLER_FAULT) {
m->kflags |= MCE_IN_KERNEL_RECOV;
return IN_KERNEL_RECOV;
}
if (t == EX_HANDLER_UACCESS && regs && is_copy_from_user(regs)) {
m->kflags |= MCE_IN_KERNEL_RECOV;
switch (ex_get_fixup_type(m->ip)) {
case EX_TYPE_UACCESS:
case EX_TYPE_COPY:
if (!regs || !is_copy_from_user(regs))
return IN_KERNEL;
m->kflags |= MCE_IN_KERNEL_COPYIN;
fallthrough;
case EX_TYPE_FAULT_MCE_SAFE:
case EX_TYPE_DEFAULT_MCE_SAFE:
m->kflags |= MCE_IN_KERNEL_RECOV;
return IN_KERNEL_RECOV;
default:
return IN_KERNEL;
}
return IN_KERNEL;
}
static int mce_severity_amd_smca(struct mce *m, enum context err_ctx)
......
......@@ -2,7 +2,7 @@
/*
* x86 FPU bug checks:
*/
#include <asm/fpu/internal.h>
#include <asm/fpu/api.h>
/*
* Boot time CPU/FPU FDIV bug detection code:
......
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __X86_KERNEL_FPU_CONTEXT_H
#define __X86_KERNEL_FPU_CONTEXT_H
#include <asm/fpu/xstate.h>
#include <asm/trace/fpu.h>
/* Functions related to FPU context tracking */
/*
* The in-register FPU state for an FPU context on a CPU is assumed to be
* valid if the fpu->last_cpu matches the CPU, and the fpu_fpregs_owner_ctx
* matches the FPU.
*
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
static inline void __cpu_invalidate_fpregs_state(void)
{
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu)
{
fpu->last_cpu = -1;
}
static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
{
return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
}
static inline void fpregs_deactivate(struct fpu *fpu)
{
__this_cpu_write(fpu_fpregs_owner_ctx, NULL);
trace_x86_fpu_regs_deactivated(fpu);
}
static inline void fpregs_activate(struct fpu *fpu)
{
__this_cpu_write(fpu_fpregs_owner_ctx, fpu);
trace_x86_fpu_regs_activated(fpu);
}
/* Internal helper for switch_fpu_return() and signal frame setup */
static inline void fpregs_restore_userregs(void)
{
struct fpu *fpu = &current->thread.fpu;
int cpu = smp_processor_id();
if (WARN_ON_ONCE(current->flags & PF_KTHREAD))
return;
if (!fpregs_state_valid(fpu, cpu)) {
/*
* This restores _all_ xstate which has not been
* established yet.
*
* If PKRU is enabled, then the PKRU value is already
* correct because it was either set in switch_to() or in
* flush_thread(). So it is excluded because it might be
* not up to date in current->thread.fpu.xsave state.
*
* XFD state is handled in restore_fpregs_from_fpstate().
*/
restore_fpregs_from_fpstate(fpu->fpstate, XFEATURE_MASK_FPSTATE);
fpregs_activate(fpu);
fpu->last_cpu = cpu;
}
clear_thread_flag(TIF_NEED_FPU_LOAD);
}
#endif
This diff is collapsed.
......@@ -2,7 +2,7 @@
/*
* x86 FPU boot time init code:
*/
#include <asm/fpu/internal.h>
#include <asm/fpu/api.h>
#include <asm/tlbflush.h>
#include <asm/setup.h>
......@@ -10,6 +10,10 @@
#include <linux/sched/task.h>
#include <linux/init.h>
#include "internal.h"
#include "legacy.h"
#include "xstate.h"
/*
* Initialize the registers found in all CPUs, CR0 and CR4:
*/
......@@ -34,7 +38,7 @@ static void fpu__init_cpu_generic(void)
/* Flush out any pending x87 state: */
#ifdef CONFIG_MATH_EMULATION
if (!boot_cpu_has(X86_FEATURE_FPU))
fpstate_init_soft(&current->thread.fpu.state.soft);
fpstate_init_soft(&current->thread.fpu.fpstate->regs.soft);
else
#endif
asm volatile ("fninit");
......@@ -121,23 +125,14 @@ static void __init fpu__init_system_mxcsr(void)
static void __init fpu__init_system_generic(void)
{
/*
* Set up the legacy init FPU context. (xstate init might overwrite this
* with a more modern format, if the CPU supports it.)
* Set up the legacy init FPU context. Will be updated when the
* CPU supports XSAVE[S].
*/
fpstate_init(&init_fpstate);
fpstate_init_user(&init_fpstate);
fpu__init_system_mxcsr();
}
/*
* Size of the FPU context state. All tasks in the system use the
* same context size, regardless of what portion they use.
* This is inherent to the XSAVE architecture which puts all state
* components into a single, continuous memory block:
*/
unsigned int fpu_kernel_xstate_size __ro_after_init;
EXPORT_SYMBOL_GPL(fpu_kernel_xstate_size);
/* Get alignment of the TYPE. */
#define TYPE_ALIGN(TYPE) offsetof(struct { char x; TYPE test; }, test)
......@@ -162,13 +157,13 @@ static void __init fpu__init_task_struct_size(void)
* Subtract off the static size of the register state.
* It potentially has a bunch of padding.
*/
task_size -= sizeof(((struct task_struct *)0)->thread.fpu.state);
task_size -= sizeof(current->thread.fpu.__fpstate.regs);
/*
* Add back the dynamically-calculated register state
* size.
*/
task_size += fpu_kernel_xstate_size;
task_size += fpu_kernel_cfg.default_size;
/*
* We dynamically size 'struct fpu', so we require that
......@@ -177,7 +172,7 @@ static void __init fpu__init_task_struct_size(void)
* you hit a compile error here, check the structure to
* see if something got added to the end.
*/
CHECK_MEMBER_AT_END_OF(struct fpu, state);
CHECK_MEMBER_AT_END_OF(struct fpu, __fpstate);
CHECK_MEMBER_AT_END_OF(struct thread_struct, fpu);
CHECK_MEMBER_AT_END_OF(struct task_struct, thread);
......@@ -192,37 +187,34 @@ static void __init fpu__init_task_struct_size(void)
*/
static void __init fpu__init_system_xstate_size_legacy(void)
{
static int on_boot_cpu __initdata = 1;
WARN_ON_FPU(!on_boot_cpu);
on_boot_cpu = 0;
unsigned int size;
/*
* Note that xstate sizes might be overwritten later during
* fpu__init_system_xstate().
* Note that the size configuration might be overwritten later
* during fpu__init_system_xstate().
*/
if (!boot_cpu_has(X86_FEATURE_FPU)) {
fpu_kernel_xstate_size = sizeof(struct swregs_state);
if (!cpu_feature_enabled(X86_FEATURE_FPU)) {
size = sizeof(struct swregs_state);
} else if (cpu_feature_enabled(X86_FEATURE_FXSR)) {
size = sizeof(struct fxregs_state);
fpu_user_cfg.legacy_features = XFEATURE_MASK_FPSSE;
} else {
if (boot_cpu_has(X86_FEATURE_FXSR))
fpu_kernel_xstate_size =
sizeof(struct fxregs_state);
else
fpu_kernel_xstate_size =
sizeof(struct fregs_state);
size = sizeof(struct fregs_state);
fpu_user_cfg.legacy_features = XFEATURE_MASK_FP;
}
fpu_user_xstate_size = fpu_kernel_xstate_size;
fpu_kernel_cfg.max_size = size;
fpu_kernel_cfg.default_size = size;
fpu_user_cfg.max_size = size;
fpu_user_cfg.default_size = size;
fpstate_reset(&current->thread.fpu);
}
/* Legacy code to initialize eager fpu mode. */
static void __init fpu__init_system_ctx_switch(void)
static void __init fpu__init_init_fpstate(void)
{
static bool on_boot_cpu __initdata = 1;
WARN_ON_FPU(!on_boot_cpu);
on_boot_cpu = 0;
/* Bring init_fpstate size and features up to date */
init_fpstate.size = fpu_kernel_cfg.max_size;
init_fpstate.xfeatures = fpu_kernel_cfg.max_features;
}
/*
......@@ -231,6 +223,7 @@ static void __init fpu__init_system_ctx_switch(void)
*/
void __init fpu__init_system(struct cpuinfo_x86 *c)
{
fpstate_reset(&current->thread.fpu);
fpu__init_system_early_generic(c);
/*
......@@ -241,8 +234,7 @@ void __init fpu__init_system(struct cpuinfo_x86 *c)
fpu__init_system_generic();
fpu__init_system_xstate_size_legacy();
fpu__init_system_xstate();
fpu__init_system_xstate(fpu_kernel_cfg.max_size);
fpu__init_task_struct_size();
fpu__init_system_ctx_switch();
fpu__init_init_fpstate();
}
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __X86_KERNEL_FPU_INTERNAL_H
#define __X86_KERNEL_FPU_INTERNAL_H
extern struct fpstate init_fpstate;
/* CPU feature check wrappers */
static __always_inline __pure bool use_xsave(void)
{
return cpu_feature_enabled(X86_FEATURE_XSAVE);
}
static __always_inline __pure bool use_fxsr(void)
{
return cpu_feature_enabled(X86_FEATURE_FXSR);
}
#ifdef CONFIG_X86_DEBUG_FPU
# define WARN_ON_FPU(x) WARN_ON_ONCE(x)
#else
# define WARN_ON_FPU(x) ({ (void)(x); 0; })
#endif
/* Used in init.c */
extern void fpstate_init_user(struct fpstate *fpstate);
extern void fpstate_reset(struct fpu *fpu);
#endif
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __X86_KERNEL_FPU_LEGACY_H
#define __X86_KERNEL_FPU_LEGACY_H
#include <asm/fpu/types.h>
extern unsigned int mxcsr_feature_mask;
static inline void ldmxcsr(u32 mxcsr)
{
asm volatile("ldmxcsr %0" :: "m" (mxcsr));
}
/*
* Returns 0 on success or the trap number when the operation raises an
* exception.
*/
#define user_insn(insn, output, input...) \
({ \
int err; \
\
might_fault(); \
\
asm volatile(ASM_STAC "\n" \
"1: " #insn "\n" \
"2: " ASM_CLAC "\n" \
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FAULT_MCE_SAFE) \
: [err] "=a" (err), output \
: "0"(0), input); \
err; \
})
#define kernel_insn_err(insn, output, input...) \
({ \
int err; \
asm volatile("1:" #insn "\n\t" \
"2:\n" \
".section .fixup,\"ax\"\n" \
"3: movl $-1,%[err]\n" \
" jmp 2b\n" \
".previous\n" \
_ASM_EXTABLE(1b, 3b) \
: [err] "=r" (err), output \
: "0"(0), input); \
err; \
})
#define kernel_insn(insn, output, input...) \
asm volatile("1:" #insn "\n\t" \
"2:\n" \
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FPU_RESTORE) \
: output : input)
static inline int fnsave_to_user_sigframe(struct fregs_state __user *fx)
{
return user_insn(fnsave %[fx]; fwait, [fx] "=m" (*fx), "m" (*fx));
}
static inline int fxsave_to_user_sigframe(struct fxregs_state __user *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return user_insn(fxsave %[fx], [fx] "=m" (*fx), "m" (*fx));
else
return user_insn(fxsaveq %[fx], [fx] "=m" (*fx), "m" (*fx));
}
static inline void fxrstor(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
kernel_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
kernel_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int fxrstor_safe(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return kernel_insn_err(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
return kernel_insn_err(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int fxrstor_from_user_sigframe(struct fxregs_state __user *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
else
return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline void frstor(struct fregs_state *fx)
{
kernel_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int frstor_safe(struct fregs_state *fx)
{
return kernel_insn_err(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline int frstor_from_user_sigframe(struct fregs_state __user *fx)
{
return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
}
static inline void fxsave(struct fxregs_state *fx)
{
if (IS_ENABLED(CONFIG_X86_32))
asm volatile( "fxsave %[fx]" : [fx] "=m" (*fx));
else
asm volatile("fxsaveq %[fx]" : [fx] "=m" (*fx));
}
#endif
......@@ -5,10 +5,14 @@
#include <linux/sched/task_stack.h>
#include <linux/vmalloc.h>
#include <asm/fpu/internal.h>
#include <asm/fpu/api.h>
#include <asm/fpu/signal.h>
#include <asm/fpu/regset.h>
#include <asm/fpu/xstate.h>
#include "context.h"
#include "internal.h"
#include "legacy.h"
#include "xstate.h"
/*
* The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
......@@ -74,8 +78,8 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
sync_fpstate(fpu);
if (!use_xsave()) {
return membuf_write(&to, &fpu->state.fxsave,
sizeof(fpu->state.fxsave));
return membuf_write(&to, &fpu->fpstate->regs.fxsave,
sizeof(fpu->fpstate->regs.fxsave));
}
copy_xstate_to_uabi_buf(to, target, XSTATE_COPY_FX);
......@@ -110,15 +114,15 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
fpu_force_restore(fpu);
/* Copy the state */
memcpy(&fpu->state.fxsave, &newstate, sizeof(newstate));
memcpy(&fpu->fpstate->regs.fxsave, &newstate, sizeof(newstate));
/* Clear xmm8..15 */
BUILD_BUG_ON(sizeof(fpu->state.fxsave.xmm_space) != 16 * 16);
memset(&fpu->state.fxsave.xmm_space[8], 0, 8 * 16);
BUILD_BUG_ON(sizeof(fpu->__fpstate.regs.fxsave.xmm_space) != 16 * 16);
memset(&fpu->fpstate->regs.fxsave.xmm_space[8], 0, 8 * 16);
/* Mark FP and SSE as in use when XSAVE is enabled */
if (use_xsave())
fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
fpu->fpstate->regs.xsave.header.xfeatures |= XFEATURE_MASK_FPSSE;
return 0;
}
......@@ -149,7 +153,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
/*
* A whole standard-format XSAVE buffer is needed:
*/
if (pos != 0 || count != fpu_user_xstate_size)
if (pos != 0 || count != fpu_user_cfg.max_size)
return -EFAULT;
if (!kbuf) {
......@@ -164,7 +168,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
}
fpu_force_restore(fpu);
ret = copy_uabi_from_kernel_to_xstate(&fpu->state.xsave, kbuf ?: tmpbuf);
ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf);
out:
vfree(tmpbuf);
......@@ -283,7 +287,7 @@ static void __convert_from_fxsr(struct user_i387_ia32_struct *env,
void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
__convert_from_fxsr(env, tsk, &tsk->thread.fpu.state.fxsave);
__convert_from_fxsr(env, tsk, &tsk->thread.fpu.fpstate->regs.fxsave);
}
void convert_to_fxsr(struct fxregs_state *fxsave,
......@@ -326,7 +330,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
return fpregs_soft_get(target, regset, to);
if (!cpu_feature_enabled(X86_FEATURE_FXSR)) {
return membuf_write(&to, &fpu->state.fsave,
return membuf_write(&to, &fpu->fpstate->regs.fsave,
sizeof(struct fregs_state));
}
......@@ -337,7 +341,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
copy_xstate_to_uabi_buf(mb, target, XSTATE_COPY_FP);
fx = &fxsave;
} else {
fx = &fpu->state.fxsave;
fx = &fpu->fpstate->regs.fxsave;
}
__convert_from_fxsr(&env, target, fx);
......@@ -366,16 +370,16 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
fpu_force_restore(fpu);
if (cpu_feature_enabled(X86_FEATURE_FXSR))
convert_to_fxsr(&fpu->state.fxsave, &env);
convert_to_fxsr(&fpu->fpstate->regs.fxsave, &env);
else
memcpy(&fpu->state.fsave, &env, sizeof(env));
memcpy(&fpu->fpstate->regs.fsave, &env, sizeof(env));
/*
* Update the header bit in the xsave header, indicating the
* presence of FP.
*/
if (cpu_feature_enabled(X86_FEATURE_XSAVE))
fpu->state.xsave.header.xfeatures |= XFEATURE_MASK_FP;
fpu->fpstate->regs.xsave.header.xfeatures |= XFEATURE_MASK_FP;
return 0;
}
......
This diff is collapsed.
This diff is collapsed.
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __X86_KERNEL_FPU_XSTATE_H
#define __X86_KERNEL_FPU_XSTATE_H
#include <asm/cpufeature.h>
#include <asm/fpu/xstate.h>
#ifdef CONFIG_X86_64
DECLARE_PER_CPU(u64, xfd_state);
#endif
static inline void xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
{
/*
* XRSTORS requires these bits set in xcomp_bv, or it will
* trigger #GP:
*/
if (cpu_feature_enabled(X86_FEATURE_XSAVES))
xsave->header.xcomp_bv = mask | XCOMP_BV_COMPACTED_FORMAT;
}
static inline u64 xstate_get_host_group_perm(void)
{
/* Pairs with WRITE_ONCE() in xstate_request_perm() */
return READ_ONCE(current->group_leader->thread.fpu.perm.__state_perm);
}
enum xstate_copy_mode {
XSTATE_COPY_FP,
XSTATE_COPY_FX,
XSTATE_COPY_XSAVE,
};
struct membuf;
extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u32 pkru_val, enum xstate_copy_mode copy_mode);
extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf);
extern int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, const void __user *ubuf);
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system_xstate(unsigned int legacy_size);
extern void *get_xsave_addr(struct xregs_state *xsave, int xfeature_nr);
static inline u64 xfeatures_mask_supervisor(void)
{
return fpu_kernel_cfg.max_features & XFEATURE_MASK_SUPERVISOR_SUPPORTED;
}
static inline u64 xfeatures_mask_independent(void)
{
if (!cpu_feature_enabled(X86_FEATURE_ARCH_LBR))
return XFEATURE_MASK_INDEPENDENT & ~XFEATURE_MASK_LBR;
return XFEATURE_MASK_INDEPENDENT;
}
/* XSAVE/XRSTOR wrapper functions */
#ifdef CONFIG_X86_64
#define REX_PREFIX "0x48, "
#else
#define REX_PREFIX
#endif
/* These macros all use (%edi)/(%rdi) as the single memory argument. */
#define XSAVE ".byte " REX_PREFIX "0x0f,0xae,0x27"
#define XSAVEOPT ".byte " REX_PREFIX "0x0f,0xae,0x37"
#define XSAVES ".byte " REX_PREFIX "0x0f,0xc7,0x2f"
#define XRSTOR ".byte " REX_PREFIX "0x0f,0xae,0x2f"
#define XRSTORS ".byte " REX_PREFIX "0x0f,0xc7,0x1f"
/*
* After this @err contains 0 on success or the trap number when the
* operation raises an exception.
*/
#define XSTATE_OP(op, st, lmask, hmask, err) \
asm volatile("1:" op "\n\t" \
"xor %[err], %[err]\n" \
"2:\n\t" \
_ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_FAULT_MCE_SAFE) \
: [err] "=a" (err) \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
/*
* If XSAVES is enabled, it replaces XSAVEOPT because it supports a compact
* format and supervisor states in addition to modified optimization in
* XSAVEOPT.
*
* Otherwise, if XSAVEOPT is enabled, XSAVEOPT replaces XSAVE because XSAVEOPT
* supports modified optimization which is not supported by XSAVE.
*
* We use XSAVE as a fallback.
*
* The 661 label is defined in the ALTERNATIVE* macros as the address of the
* original instruction which gets replaced. We need to use it here as the
* address of the instruction where we might get an exception at.
*/
#define XSTATE_XSAVE(st, lmask, hmask, err) \
asm volatile(ALTERNATIVE_2(XSAVE, \
XSAVEOPT, X86_FEATURE_XSAVEOPT, \
XSAVES, X86_FEATURE_XSAVES) \
"\n" \
"xor %[err], %[err]\n" \
"3:\n" \
".pushsection .fixup,\"ax\"\n" \
"4: movl $-2, %[err]\n" \
"jmp 3b\n" \
".popsection\n" \
_ASM_EXTABLE(661b, 4b) \
: [err] "=r" (err) \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
/*
* Use XRSTORS to restore context if it is enabled. XRSTORS supports compact
* XSAVE area format.
*/
#define XSTATE_XRESTORE(st, lmask, hmask) \
asm volatile(ALTERNATIVE(XRSTOR, \
XRSTORS, X86_FEATURE_XSAVES) \
"\n" \
"3:\n" \
_ASM_EXTABLE_TYPE(661b, 3b, EX_TYPE_FPU_RESTORE) \
: \
: "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
: "memory")
#if defined(CONFIG_X86_64) && defined(CONFIG_X86_DEBUG_FPU)
extern void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor);
#else
static inline void xfd_validate_state(struct fpstate *fpstate, u64 mask, bool rstor) { }
#endif
#ifdef CONFIG_X86_64
static inline void xfd_update_state(struct fpstate *fpstate)
{
if (fpu_state_size_dynamic()) {
u64 xfd = fpstate->xfd;
if (__this_cpu_read(xfd_state) != xfd) {
wrmsrl(MSR_IA32_XFD, xfd);
__this_cpu_write(xfd_state, xfd);
}
}
}
#else
static inline void xfd_update_state(struct fpstate *fpstate) { }
#endif
/*
* Save processor xstate to xsave area.
*
* Uses either XSAVE or XSAVEOPT or XSAVES depending on the CPU features
* and command line options. The choice is permanent until the next reboot.
*/
static inline void os_xsave(struct fpstate *fpstate)
{
u64 mask = fpstate->xfeatures;
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
WARN_ON_FPU(!alternatives_patched);
xfd_validate_state(fpstate, mask, false);
XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
/* We should never fault when copying to a kernel buffer: */
WARN_ON_FPU(err);
}
/*
* Restore processor xstate from xsave area.
*
* Uses XRSTORS when XSAVES is used, XRSTOR otherwise.
*/
static inline void os_xrstor(struct fpstate *fpstate, u64 mask)
{
u32 lmask = mask;
u32 hmask = mask >> 32;
xfd_validate_state(fpstate, mask, true);
XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
}
/* Restore of supervisor state. Does not require XFD */
static inline void os_xrstor_supervisor(struct fpstate *fpstate)
{
u64 mask = xfeatures_mask_supervisor();
u32 lmask = mask;
u32 hmask = mask >> 32;
XSTATE_XRESTORE(&fpstate->regs.xsave, lmask, hmask);
}
/*
* Save xstate to user space xsave area.
*
* We don't use modified optimization because xrstor/xrstors might track
* a different application.
*
* We don't use compacted format xsave area for backward compatibility for
* old applications which don't understand the compacted format of the
* xsave area.
*
* The caller has to zero buf::header before calling this because XSAVE*
* does not touch the reserved fields in the header.
*/
static inline int xsave_to_user_sigframe(struct xregs_state __user *buf)
{
/*
* Include the features which are not xsaved/rstored by the kernel
* internally, e.g. PKRU. That's user space ABI and also required
* to allow the signal handler to modify PKRU.
*/
struct fpstate *fpstate = current->thread.fpu.fpstate;
u64 mask = fpstate->user_xfeatures;
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
xfd_validate_state(fpstate, mask, false);
stac();
XSTATE_OP(XSAVE, buf, lmask, hmask, err);
clac();
return err;
}
/*
* Restore xstate from user space xsave area.
*/
static inline int xrstor_from_user_sigframe(struct xregs_state __user *buf, u64 mask)
{
struct xregs_state *xstate = ((__force struct xregs_state *)buf);
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
xfd_validate_state(current->thread.fpu.fpstate, mask, true);
stac();
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
clac();
return err;
}
/*
* Restore xstate from kernel space xsave area, return an error code instead of
* an exception.
*/
static inline int os_xrstor_safe(struct fpstate *fpstate, u64 mask)
{
struct xregs_state *xstate = &fpstate->regs.xsave;
u32 lmask = mask;
u32 hmask = mask >> 32;
int err;
/* Ensure that XFD is up to date */
xfd_update_state(fpstate);
if (cpu_feature_enabled(X86_FEATURE_XSAVES))
XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
else
XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
return err;
}
#endif
This diff is collapsed.
......@@ -41,7 +41,7 @@
#include <asm/ldt.h>
#include <asm/processor.h>
#include <asm/fpu/internal.h>
#include <asm/fpu/sched.h>
#include <asm/desc.h>
#include <linux/err.h>
......@@ -160,7 +160,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
struct thread_struct *prev = &prev_p->thread,
*next = &next_p->thread;
struct fpu *prev_fpu = &prev->fpu;
struct fpu *next_fpu = &next->fpu;
int cpu = smp_processor_id();
/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
......@@ -213,7 +212,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
this_cpu_write(current_task, next_p);
switch_fpu_finish(next_fpu);
switch_fpu_finish();
/* Load the Intel cache allocation PQR MSR. */
resctrl_sched_in();
......
This diff is collapsed.
......@@ -29,9 +29,9 @@
#include <linux/uaccess.h>
#include <asm/processor.h>
#include <asm/fpu/internal.h>
#include <asm/fpu/signal.h>
#include <asm/fpu/regset.h>
#include <asm/fpu/xstate.h>
#include <asm/debugreg.h>
#include <asm/ldt.h>
#include <asm/desc.h>
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment