Commit d6ec9d9a authored by Linus Torvalds

Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Ingo Molnar:
 "Note that in this cycle most of the x86 topics interacted at a level
  that caused them to be merged into tip:x86/asm - but this should be a
  temporary phenomenon, hopefully we'll be back to the usual patterns in
  the next merge window.

  The main changes in this cycle were:

  Hardware enablement:

   - Add support for the Intel UMIP (User Mode Instruction Prevention)
     CPU feature. This is a security feature that disables certain
     instructions (SGDT, SLDT, SIDT, SMSW and STR) when executed in
     user mode. (Ricardo Neri)

     [ Note that this is disabled by default for now; there are some
       smaller enhancements in the pipeline that I'll follow up with in
       the next 1-2 days, which will allow this to be enabled by default. ]

   - Add support for the AMD SEV (Secure Encrypted Virtualization) CPU
     feature, on top of SME (Secure Memory Encryption) support that was
     added in v4.14. (Tom Lendacky, Brijesh Singh)

   - Enable new SSE/AVX/AVX512 CPU features: AVX512_VBMI2, GFNI, VAES,
     VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG. (Gayatri Kammela)

  Other changes:

   - A big series of entry code simplifications and enhancements (Andy
     Lutomirski)

   - Make the ORC unwinder the default on x86, plus various objtool
     enhancements. (Josh Poimboeuf)

   - 5-level paging enhancements (Kirill A. Shutemov)

   - Micro-optimize the entry code a bit (Borislav Petkov)

   - Improve the handling of interdependent CPU features in the early
     FPU init code (Andi Kleen)

   - Build system enhancements (Changbin Du, Masahiro Yamada)

   - ... plus misc enhancements, fixes and cleanups"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits)
  x86/build: Make the boot image generation less verbose
  selftests/x86: Add tests for the STR and SLDT instructions
  selftests/x86: Add tests for User-Mode Instruction Prevention
  x86/traps: Fix up general protection faults caused by UMIP
  x86/umip: Enable User-Mode Instruction Prevention at runtime
  x86/umip: Force a page fault when unable to copy emulated result to user
  x86/umip: Add emulation code for UMIP instructions
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
  x86/insn-eval: Add support to resolve 16-bit address encodings
  x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
  x86/insn-eval: Add wrapper function for 32 and 64-bit addresses
  x86/insn-eval: Add support to resolve 32-bit address encodings
  x86/insn-eval: Compute linear address in several utility functions
  resource: Fix resource_size.cocci warnings
  X86/KVM: Clear encryption attribute when SEV is active
  X86/KVM: Decrypt shared per-cpu variables when SEV is active
  percpu: Introduce DEFINE_PER_CPU_DECRYPTED
  x86: Add support for changing memory encryption attribute in early boot
  x86/io: Unroll string I/O when SEV is active
  x86/boot: Add early boot support when running with SEV active
  ...
parents 3e201463 91a6a6cf
Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
features found on AMD processors.
SME provides the ability to mark individual pages of memory as encrypted using
the standard x86 page tables. A page that is marked encrypted will be
...@@ -6,24 +7,38 @@ automatically decrypted when read from DRAM and encrypted when written to
DRAM. SME can therefore be used to protect the contents of DRAM from physical
attacks on the system.
SEV enables running encrypted virtual machines (VMs) in which the code and data
of the guest VM are secured so that a decrypted version is available only
within the VM itself. SEV guest VMs have the concept of private and shared
memory. Private memory is encrypted with the guest-specific key, while shared
memory may be encrypted with the hypervisor key. When SME is enabled, the
hypervisor key is the same key used by SME.
A page is encrypted when a page table entry has the encryption bit set (see
below on how to determine its position). The encryption bit can also be
specified in the cr3 register, allowing the PGD table to be encrypted. Each
successive level of page tables can also be encrypted by setting the encryption
bit in the page table entry that points to the next table. This allows the full
page table hierarchy to be encrypted. Note that just because the encryption
bit is set in cr3 doesn't imply the full hierarchy is encrypted.
Each page table entry in the hierarchy needs to have the encryption bit set to
achieve that. So, theoretically, you could have the encryption bit set in cr3
so that the PGD is encrypted, but not set the encryption bit in the PGD entry
for a PUD, which results in the PUD pointed to by that entry not being
encrypted.
When SEV is enabled, instruction pages and guest page tables are always treated
as private. All the DMA operations inside the guest must be performed on shared
memory. Since the memory encryption bit is controlled by the guest OS when it
is operating in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware
forces the memory encryption bit to 1.
Support for SME and SEV can be determined through the CPUID instruction. The
CPUID function 0x8000001f reports information related to SME:
0x8000001f[eax]:
	Bit[0] indicates support for SME
	Bit[1] indicates support for SEV
0x8000001f[ebx]:
	Bits[5:0] pagetable bit number used to activate memory
		  encryption
...@@ -39,6 +54,13 @@ determine if SME is enabled and/or to enable memory encryption:
	Bit[23]	0 = memory encryption features are disabled
		1 = memory encryption features are enabled
If SEV is supported, MSR 0xc0010131 (MSR_AMD64_SEV) can be used to determine if
SEV is active:
	0xc0010131:
		Bit[0]	0 = memory encryption is not active
			1 = memory encryption is active
Linux relies on BIOS to set this bit if BIOS has determined that the reduction
in the physical address space as a result of enabling memory encryption (see
CPUID information above) will not conflict with the address space resource
......
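For reference, the CPUID leaf and MSR described above can also be probed from
user space. The following is a minimal sketch, not part of this series: it
assumes a GCC-compatible toolchain providing <cpuid.h>, and the MSR read
assumes the msr driver is loaded (/dev/cpu/0/msr) and root privileges; without
the driver the program simply skips that step.

	#include <cpuid.h>
	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;
		int fd;

		/* CPUID 0x8000001f: memory encryption feature information */
		if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx)) {
			puts("CPUID leaf 0x8000001f is not available");
			return 1;
		}

		printf("SME supported:  %s\n", (eax & (1u << 0)) ? "yes" : "no");
		printf("SEV supported:  %s\n", (eax & (1u << 1)) ? "yes" : "no");
		printf("C-bit position: %u\n", ebx & 0x3f);

		/* MSR_AMD64_SEV (0xc0010131), bit 0: SEV active in this guest */
		fd = open("/dev/cpu/0/msr", O_RDONLY);
		if (fd >= 0) {
			uint64_t sev_status;

			if (pread(fd, &sev_status, sizeof(sev_status), 0xc0010131) ==
			    sizeof(sev_status))
				printf("SEV active:     %s\n",
				       (sev_status & 1) ? "yes" : "no");
			close(fd);
		}

		return 0;
	}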
...@@ -4,7 +4,7 @@ ORC unwinder
Overview
--------
The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is
similar in concept to a DWARF unwinder. The difference is that the
format of the ORC data is much simpler than DWARF, which in turn allows
the ORC unwinder to be much simpler and faster.
......
...@@ -34,7 +34,7 @@ ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space
ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
... unused hole ...
ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB)
... unused hole ...
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
... unused hole ... ... unused hole ...
......
...@@ -934,8 +934,8 @@ ifdef CONFIG_STACK_VALIDATION ...@@ -934,8 +934,8 @@ ifdef CONFIG_STACK_VALIDATION
ifeq ($(has_libelf),1) ifeq ($(has_libelf),1)
objtool_target := tools/objtool FORCE objtool_target := tools/objtool FORCE
else else
ifdef CONFIG_ORC_UNWINDER ifdef CONFIG_UNWINDER_ORC
$(error "Cannot generate ORC metadata for CONFIG_ORC_UNWINDER=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel") $(error "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel")
else else
$(warning "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel") $(warning "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel")
endif endif
......
...@@ -91,11 +91,13 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image) ...@@ -91,11 +91,13 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
* and that value will be returned. If all free regions are visited without * and that value will be returned. If all free regions are visited without
* func returning non-zero, then zero will be returned. * func returning non-zero, then zero will be returned.
*/ */
int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *)) int arch_kexec_walk_mem(struct kexec_buf *kbuf,
int (*func)(struct resource *, void *))
{ {
int ret = 0; int ret = 0;
u64 i; u64 i;
phys_addr_t mstart, mend; phys_addr_t mstart, mend;
struct resource res = { };
if (kbuf->top_down) { if (kbuf->top_down) {
for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0, for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0,
...@@ -105,7 +107,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *)) ...@@ -105,7 +107,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
* range while in kexec, end points to the last byte * range while in kexec, end points to the last byte
* in the range. * in the range.
*/ */
ret = func(mstart, mend - 1, kbuf); res.start = mstart;
res.end = mend - 1;
ret = func(&res, kbuf);
if (ret) if (ret)
break; break;
} }
...@@ -117,7 +121,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *)) ...@@ -117,7 +121,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
* range while in kexec, end points to the last byte * range while in kexec, end points to the last byte
* in the range. * in the range.
*/ */
ret = func(mstart, mend - 1, kbuf); res.start = mstart;
res.end = mend - 1;
ret = func(&res, kbuf);
if (ret) if (ret)
break; break;
} }
......
...@@ -171,7 +171,7 @@ config X86 ...@@ -171,7 +171,7 @@ config X86
select HAVE_PERF_USER_STACK_DUMP select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER_UNWINDER && STACK_VALIDATION select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION
select HAVE_STACK_VALIDATION if X86_64 select HAVE_STACK_VALIDATION if X86_64
select HAVE_SYSCALL_TRACEPOINTS select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_UNSTABLE_SCHED_CLOCK
...@@ -303,7 +303,6 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC ...@@ -303,7 +303,6 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC
config KASAN_SHADOW_OFFSET config KASAN_SHADOW_OFFSET
hex hex
depends on KASAN depends on KASAN
default 0xdff8000000000000 if X86_5LEVEL
default 0xdffffc0000000000 default 0xdffffc0000000000
config HAVE_INTEL_TXT config HAVE_INTEL_TXT
...@@ -1803,6 +1802,16 @@ config X86_SMAP ...@@ -1803,6 +1802,16 @@ config X86_SMAP
If unsure, say Y. If unsure, say Y.
config X86_INTEL_UMIP
def_bool n
depends on CPU_SUP_INTEL
prompt "Intel User Mode Instruction Prevention" if EXPERT
---help---
The User Mode Instruction Prevention (UMIP) is a security
feature in newer Intel processors. If enabled, a general
protection fault is issued if the instructions SGDT, SLDT,
SIDT, SMSW and STR are executed in user mode.
config X86_INTEL_MPX config X86_INTEL_MPX
prompt "Intel MPX (Memory Protection Extensions)" prompt "Intel MPX (Memory Protection Extensions)"
def_bool n def_bool n
......
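To illustrate the behavior the UMIP help text above describes, here is a
small, assumed user-space probe (the authoritative tests are the selftests
added by this series): it executes SMSW and catches the SIGSEGV that results
when UMIP blocks the instruction. Keep in mind that the series also adds
kernel-side emulation of SGDT, SIDT and SMSW, and that UMIP is disabled by
default in this cycle, so on many kernels the instruction will still appear
to succeed.

	#include <setjmp.h>
	#include <signal.h>
	#include <stdio.h>

	static sigjmp_buf jump_buf;

	static void segv_handler(int sig)
	{
		siglongjmp(jump_buf, 1);
	}

	int main(void)
	{
		unsigned long msw = 0;

		signal(SIGSEGV, segv_handler);

		if (sigsetjmp(jump_buf, 1)) {
			puts("SMSW faulted: UMIP blocked the instruction");
			return 1;
		}

		/* SMSW is one of the instructions UMIP protects */
		asm volatile("smsw %0" : "=r" (msw));

		printf("SMSW executed, machine status word: %#lx\n", msw);
		return 0;
	}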
...@@ -359,28 +359,14 @@ config PUNIT_ATOM_DEBUG ...@@ -359,28 +359,14 @@ config PUNIT_ATOM_DEBUG
choice choice
prompt "Choose kernel unwinder" prompt "Choose kernel unwinder"
default FRAME_POINTER_UNWINDER default UNWINDER_ORC if X86_64
default UNWINDER_FRAME_POINTER if X86_32
---help--- ---help---
This determines which method will be used for unwinding kernel stack This determines which method will be used for unwinding kernel stack
traces for panics, oopses, bugs, warnings, perf, /proc/<pid>/stack, traces for panics, oopses, bugs, warnings, perf, /proc/<pid>/stack,
livepatch, lockdep, and more. livepatch, lockdep, and more.
config FRAME_POINTER_UNWINDER config UNWINDER_ORC
bool "Frame pointer unwinder"
select FRAME_POINTER
---help---
This option enables the frame pointer unwinder for unwinding kernel
stack traces.
The unwinder itself is fast and it uses less RAM than the ORC
unwinder, but the kernel text size will grow by ~3% and the kernel's
overall performance will degrade by roughly 5-10%.
This option is recommended if you want to use the livepatch
consistency model, as this is currently the only way to get a
reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE).
config ORC_UNWINDER
bool "ORC unwinder" bool "ORC unwinder"
depends on X86_64 depends on X86_64
select STACK_VALIDATION select STACK_VALIDATION
...@@ -396,7 +382,22 @@ config ORC_UNWINDER ...@@ -396,7 +382,22 @@ config ORC_UNWINDER
Enabling this option will increase the kernel's runtime memory usage Enabling this option will increase the kernel's runtime memory usage
by roughly 2-4MB, depending on your kernel config. by roughly 2-4MB, depending on your kernel config.
config GUESS_UNWINDER config UNWINDER_FRAME_POINTER
bool "Frame pointer unwinder"
select FRAME_POINTER
---help---
This option enables the frame pointer unwinder for unwinding kernel
stack traces.
The unwinder itself is fast and it uses less RAM than the ORC
unwinder, but the kernel text size will grow by ~3% and the kernel's
overall performance will degrade by roughly 5-10%.
This option is recommended if you want to use the livepatch
consistency model, as this is currently the only way to get a
reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE).
config UNWINDER_GUESS
bool "Guess unwinder" bool "Guess unwinder"
depends on EXPERT depends on EXPERT
---help--- ---help---
...@@ -411,7 +412,7 @@ config GUESS_UNWINDER ...@@ -411,7 +412,7 @@ config GUESS_UNWINDER
endchoice endchoice
config FRAME_POINTER config FRAME_POINTER
depends on !ORC_UNWINDER && !GUESS_UNWINDER depends on !UNWINDER_ORC && !UNWINDER_GUESS
bool bool
endmenu endmenu
...@@ -7,3 +7,6 @@ zoffset.h ...@@ -7,3 +7,6 @@ zoffset.h
setup setup
setup.bin setup.bin
setup.elf setup.elf
fdimage
mtools.conf
image.iso
...@@ -123,63 +123,26 @@ image_cmdline = default linux $(FDARGS) $(if $(FDINITRD),initrd=initrd.img,) ...@@ -123,63 +123,26 @@ image_cmdline = default linux $(FDARGS) $(if $(FDINITRD),initrd=initrd.img,)
$(obj)/mtools.conf: $(src)/mtools.conf.in $(obj)/mtools.conf: $(src)/mtools.conf.in
sed -e 's|@OBJ@|$(obj)|g' < $< > $@ sed -e 's|@OBJ@|$(obj)|g' < $< > $@
quiet_cmd_genimage = GENIMAGE $3
cmd_genimage = sh $(srctree)/$(src)/genimage.sh $2 $3 $(obj)/bzImage \
$(obj)/mtools.conf '$(image_cmdline)' $(FDINITRD)
# This requires write access to /dev/fd0 # This requires write access to /dev/fd0
bzdisk: $(obj)/bzImage $(obj)/mtools.conf bzdisk: $(obj)/bzImage $(obj)/mtools.conf
MTOOLSRC=$(obj)/mtools.conf mformat a: ; sync $(call cmd,genimage,bzdisk,/dev/fd0)
syslinux /dev/fd0 ; sync
echo '$(image_cmdline)' | \
MTOOLSRC=$(src)/mtools.conf mcopy - a:syslinux.cfg
if [ -f '$(FDINITRD)' ] ; then \
MTOOLSRC=$(obj)/mtools.conf mcopy '$(FDINITRD)' a:initrd.img ; \
fi
MTOOLSRC=$(obj)/mtools.conf mcopy $(obj)/bzImage a:linux ; sync
# These require being root or having syslinux 2.02 or higher installed # These require being root or having syslinux 2.02 or higher installed
fdimage fdimage144: $(obj)/bzImage $(obj)/mtools.conf fdimage fdimage144: $(obj)/bzImage $(obj)/mtools.conf
dd if=/dev/zero of=$(obj)/fdimage bs=1024 count=1440 $(call cmd,genimage,fdimage144,$(obj)/fdimage)
MTOOLSRC=$(obj)/mtools.conf mformat v: ; sync @$(kecho) 'Kernel: $(obj)/fdimage is ready'
syslinux $(obj)/fdimage ; sync
echo '$(image_cmdline)' | \
MTOOLSRC=$(obj)/mtools.conf mcopy - v:syslinux.cfg
if [ -f '$(FDINITRD)' ] ; then \
MTOOLSRC=$(obj)/mtools.conf mcopy '$(FDINITRD)' v:initrd.img ; \
fi
MTOOLSRC=$(obj)/mtools.conf mcopy $(obj)/bzImage v:linux ; sync
fdimage288: $(obj)/bzImage $(obj)/mtools.conf fdimage288: $(obj)/bzImage $(obj)/mtools.conf
dd if=/dev/zero of=$(obj)/fdimage bs=1024 count=2880 $(call cmd,genimage,fdimage288,$(obj)/fdimage)
MTOOLSRC=$(obj)/mtools.conf mformat w: ; sync @$(kecho) 'Kernel: $(obj)/fdimage is ready'
syslinux $(obj)/fdimage ; sync
echo '$(image_cmdline)' | \
MTOOLSRC=$(obj)/mtools.conf mcopy - w:syslinux.cfg
if [ -f '$(FDINITRD)' ] ; then \
MTOOLSRC=$(obj)/mtools.conf mcopy '$(FDINITRD)' w:initrd.img ; \
fi
MTOOLSRC=$(obj)/mtools.conf mcopy $(obj)/bzImage w:linux ; sync
isoimage: $(obj)/bzImage isoimage: $(obj)/bzImage
-rm -rf $(obj)/isoimage $(call cmd,genimage,isoimage,$(obj)/image.iso)
mkdir $(obj)/isoimage @$(kecho) 'Kernel: $(obj)/image.iso is ready'
for i in lib lib64 share end ; do \
if [ -f /usr/$$i/syslinux/isolinux.bin ] ; then \
cp /usr/$$i/syslinux/isolinux.bin $(obj)/isoimage ; \
if [ -f /usr/$$i/syslinux/ldlinux.c32 ]; then \
cp /usr/$$i/syslinux/ldlinux.c32 $(obj)/isoimage ; \
fi ; \
break ; \
fi ; \
if [ $$i = end ] ; then exit 1 ; fi ; \
done
cp $(obj)/bzImage $(obj)/isoimage/linux
echo '$(image_cmdline)' > $(obj)/isoimage/isolinux.cfg
if [ -f '$(FDINITRD)' ] ; then \
cp '$(FDINITRD)' $(obj)/isoimage/initrd.img ; \
fi
mkisofs -J -r -o $(obj)/image.iso -b isolinux.bin -c boot.cat \
-no-emul-boot -boot-load-size 4 -boot-info-table \
$(obj)/isoimage
isohybrid $(obj)/image.iso 2>/dev/null || true
rm -rf $(obj)/isoimage
bzlilo: $(obj)/bzImage bzlilo: $(obj)/bzImage
if [ -f $(INSTALL_PATH)/vmlinuz ]; then mv $(INSTALL_PATH)/vmlinuz $(INSTALL_PATH)/vmlinuz.old; fi if [ -f $(INSTALL_PATH)/vmlinuz ]; then mv $(INSTALL_PATH)/vmlinuz $(INSTALL_PATH)/vmlinuz.old; fi
......
...@@ -78,6 +78,7 @@ vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o ...@@ -78,6 +78,7 @@ vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
ifdef CONFIG_X86_64 ifdef CONFIG_X86_64
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
vmlinux-objs-y += $(obj)/mem_encrypt.o
endif endif
$(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
......
...@@ -131,6 +131,19 @@ ENTRY(startup_32) ...@@ -131,6 +131,19 @@ ENTRY(startup_32)
/* /*
* Build early 4G boot pagetable * Build early 4G boot pagetable
*/ */
/*
* If SEV is active then set the encryption mask in the page tables.
* This will ensure that when the kernel is copied and decompressed
* it will be done so encrypted.
*/
call get_sev_encryption_bit
xorl %edx, %edx
testl %eax, %eax
jz 1f
subl $32, %eax /* Encryption bit is always above bit 31 */
bts %eax, %edx /* Set encryption mask for page tables */
1:
/* Initialize Page tables to 0 */ /* Initialize Page tables to 0 */
leal pgtable(%ebx), %edi leal pgtable(%ebx), %edi
xorl %eax, %eax xorl %eax, %eax
...@@ -141,12 +154,14 @@ ENTRY(startup_32) ...@@ -141,12 +154,14 @@ ENTRY(startup_32)
leal pgtable + 0(%ebx), %edi leal pgtable + 0(%ebx), %edi
leal 0x1007 (%edi), %eax leal 0x1007 (%edi), %eax
movl %eax, 0(%edi) movl %eax, 0(%edi)
addl %edx, 4(%edi)
/* Build Level 3 */ /* Build Level 3 */
leal pgtable + 0x1000(%ebx), %edi leal pgtable + 0x1000(%ebx), %edi
leal 0x1007(%edi), %eax leal 0x1007(%edi), %eax
movl $4, %ecx movl $4, %ecx
1: movl %eax, 0x00(%edi) 1: movl %eax, 0x00(%edi)
addl %edx, 0x04(%edi)
addl $0x00001000, %eax addl $0x00001000, %eax
addl $8, %edi addl $8, %edi
decl %ecx decl %ecx
...@@ -157,6 +172,7 @@ ENTRY(startup_32) ...@@ -157,6 +172,7 @@ ENTRY(startup_32)
movl $0x00000183, %eax movl $0x00000183, %eax
movl $2048, %ecx movl $2048, %ecx
1: movl %eax, 0(%edi) 1: movl %eax, 0(%edi)
addl %edx, 4(%edi)
addl $0x00200000, %eax addl $0x00200000, %eax
addl $8, %edi addl $8, %edi
decl %ecx decl %ecx
......
/*
* AMD Memory Encryption Support
*
* Copyright (C) 2017 Advanced Micro Devices, Inc.
*
* Author: Tom Lendacky <thomas.lendacky@amd.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/linkage.h>
#include <asm/processor-flags.h>
#include <asm/msr.h>
#include <asm/asm-offsets.h>
.text
.code32
ENTRY(get_sev_encryption_bit)
xor %eax, %eax
#ifdef CONFIG_AMD_MEM_ENCRYPT
push %ebx
push %ecx
push %edx
push %edi
/*
* RIP-relative addressing is needed to access the encryption bit
* variable. Since we are running in 32-bit mode we need this call/pop
* sequence to get the proper relative addressing.
*/
call 1f
1: popl %edi
subl $1b, %edi
movl enc_bit(%edi), %eax
cmpl $0, %eax
jge .Lsev_exit
/* Check if running under a hypervisor */
movl $1, %eax
cpuid
bt $31, %ecx /* Check the hypervisor bit */
jnc .Lno_sev
movl $0x80000000, %eax /* CPUID to check the highest leaf */
cpuid
cmpl $0x8000001f, %eax /* See if 0x8000001f is available */
jb .Lno_sev
/*
* Check for the SEV feature:
* CPUID Fn8000_001F[EAX] - Bit 1
* CPUID Fn8000_001F[EBX] - Bits 5:0
* Pagetable bit position used to indicate encryption
*/
movl $0x8000001f, %eax
cpuid
bt $1, %eax /* Check if SEV is available */
jnc .Lno_sev
movl $MSR_AMD64_SEV, %ecx /* Read the SEV MSR */
rdmsr
bt $MSR_AMD64_SEV_ENABLED_BIT, %eax /* Check if SEV is active */
jnc .Lno_sev
movl %ebx, %eax
andl $0x3f, %eax /* Return the encryption bit location */
movl %eax, enc_bit(%edi)
jmp .Lsev_exit
.Lno_sev:
xor %eax, %eax
movl %eax, enc_bit(%edi)
.Lsev_exit:
pop %edi
pop %edx
pop %ecx
pop %ebx
#endif /* CONFIG_AMD_MEM_ENCRYPT */
ret
ENDPROC(get_sev_encryption_bit)
.code64
ENTRY(get_sev_encryption_mask)
xor %rax, %rax
#ifdef CONFIG_AMD_MEM_ENCRYPT
push %rbp
push %rdx
movq %rsp, %rbp /* Save current stack pointer */
call get_sev_encryption_bit /* Get the encryption bit position */
testl %eax, %eax
jz .Lno_sev_mask
xor %rdx, %rdx
bts %rax, %rdx /* Create the encryption mask */
mov %rdx, %rax /* ... and return it */
.Lno_sev_mask:
movq %rbp, %rsp /* Restore original stack pointer */
pop %rdx
pop %rbp
#endif
ret
ENDPROC(get_sev_encryption_mask)
.data
enc_bit:
.int 0xffffffff
...@@ -109,4 +109,6 @@ static inline void console_init(void) ...@@ -109,4 +109,6 @@ static inline void console_init(void)
{ } { }
#endif #endif
unsigned long get_sev_encryption_mask(void);
#endif #endif
...@@ -77,16 +77,18 @@ static unsigned long top_level_pgt; ...@@ -77,16 +77,18 @@ static unsigned long top_level_pgt;
* Mapping information structure passed to kernel_ident_mapping_init(). * Mapping information structure passed to kernel_ident_mapping_init().
* Due to relocation, pointers must be assigned at run time not build time. * Due to relocation, pointers must be assigned at run time not build time.
*/ */
static struct x86_mapping_info mapping_info = { static struct x86_mapping_info mapping_info;
.page_flag = __PAGE_KERNEL_LARGE_EXEC,
};
/* Locates and clears a region for a new top level page table. */ /* Locates and clears a region for a new top level page table. */
void initialize_identity_maps(void) void initialize_identity_maps(void)
{ {
unsigned long sev_me_mask = get_sev_encryption_mask();
/* Init mapping_info with run-time function/buffer pointers. */ /* Init mapping_info with run-time function/buffer pointers. */
mapping_info.alloc_pgt_page = alloc_pgt_page; mapping_info.alloc_pgt_page = alloc_pgt_page;
mapping_info.context = &pgt_data; mapping_info.context = &pgt_data;
mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sev_me_mask;
mapping_info.kernpg_flag = _KERNPG_TABLE | sev_me_mask;
/* /*
* It should be impossible for this not to already be true, * It should be impossible for this not to already be true,
......
#!/bin/sh
#
# This file is subject to the terms and conditions of the GNU General Public
# License. See the file "COPYING" in the main directory of this archive
# for more details.
#
# Copyright (C) 2017 by Changbin Du <changbin.du@intel.com>
#
# Adapted from code in arch/x86/boot/Makefile by H. Peter Anvin and others
#
# "make fdimage/fdimage144/fdimage288/isoimage" script for x86 architecture
#
# Arguments:
# $1 - fdimage format
# $2 - target image file
# $3 - kernel bzImage file
# $4 - mtools configuration file
# $5 - kernel cmdline
# $6 - initrd image file
#
# Use "make V=1" to debug this script
case "${KBUILD_VERBOSE}" in
*1*)
set -x
;;
esac
verify () {
if [ ! -f "$1" ]; then
echo "" 1>&2
echo " *** Missing file: $1" 1>&2
echo "" 1>&2
exit 1
fi
}
export MTOOLSRC=$4
FIMAGE=$2
FBZIMAGE=$3
KCMDLINE=$5
FDINITRD=$6
# Make sure the files actually exist
verify "$FBZIMAGE"
verify "$MTOOLSRC"
genbzdisk() {
mformat a:
syslinux $FIMAGE
echo "$KCMDLINE" | mcopy - a:syslinux.cfg
if [ -f "$FDINITRD" ] ; then
mcopy "$FDINITRD" a:initrd.img
fi
mcopy $FBZIMAGE a:linux
}
genfdimage144() {
dd if=/dev/zero of=$FIMAGE bs=1024 count=1440 2> /dev/null
mformat v:
syslinux $FIMAGE
echo "$KCMDLINE" | mcopy - v:syslinux.cfg
if [ -f "$FDINITRD" ] ; then
mcopy "$FDINITRD" v:initrd.img
fi
mcopy $FBZIMAGE v:linux
}
genfdimage288() {
dd if=/dev/zero of=$FIMAGE bs=1024 count=2880 2> /dev/null
mformat w:
syslinux $FIMAGE
echo "$KCMDLINE" | mcopy - W:syslinux.cfg
if [ -f "$FDINITRD" ] ; then
mcopy "$FDINITRD" w:initrd.img
fi
mcopy $FBZIMAGE w:linux
}
genisoimage() {
tmp_dir=`dirname $FIMAGE`/isoimage
rm -rf $tmp_dir
mkdir $tmp_dir
for i in lib lib64 share end ; do
for j in syslinux ISOLINUX ; do
if [ -f /usr/$i/$j/isolinux.bin ] ; then
isolinux=/usr/$i/$j/isolinux.bin
cp $isolinux $tmp_dir
fi
done
for j in syslinux syslinux/modules/bios ; do
if [ -f /usr/$i/$j/ldlinux.c32 ]; then
ldlinux=/usr/$i/$j/ldlinux.c32
cp $ldlinux $tmp_dir
fi
done
if [ -n "$isolinux" -a -n "$ldlinux" ] ; then
break
fi
if [ $i = end -a -z "$isolinux" ] ; then
echo 'Need an isolinux.bin file, please install syslinux/isolinux.'
exit 1
fi
done
cp $FBZIMAGE $tmp_dir/linux
echo "$KCMDLINE" > $tmp_dir/isolinux.cfg
if [ -f "$FDINITRD" ] ; then
cp "$FDINITRD" $tmp_dir/initrd.img
fi
mkisofs -J -r -input-charset=utf-8 -quiet -o $FIMAGE -b isolinux.bin \
-c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table \
$tmp_dir
isohybrid $FIMAGE 2>/dev/null || true
rm -rf $tmp_dir
}
case $1 in
bzdisk) genbzdisk;;
fdimage144) genfdimage144;;
fdimage288) genfdimage288;;
isoimage) genisoimage;;
*) echo 'Unknown image format'; exit 1;
esac
CONFIG_NOHIGHMEM=y CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set # CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set # CONFIG_HIGHMEM64G is not set
CONFIG_GUESS_UNWINDER=y CONFIG_UNWINDER_GUESS=y
# CONFIG_FRAME_POINTER_UNWINDER is not set # CONFIG_UNWINDER_FRAME_POINTER is not set
...@@ -299,6 +299,7 @@ CONFIG_DEBUG_STACKOVERFLOW=y ...@@ -299,6 +299,7 @@ CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_RODATA_TEST is not set # CONFIG_DEBUG_RODATA_TEST is not set
CONFIG_DEBUG_BOOT_PARAMS=y CONFIG_DEBUG_BOOT_PARAMS=y
CONFIG_OPTIMIZE_INLINING=y CONFIG_OPTIMIZE_INLINING=y
CONFIG_UNWINDER_ORC=y
CONFIG_SECURITY=y CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_SELINUX=y CONFIG_SECURITY_SELINUX=y
......
...@@ -142,56 +142,25 @@ For 32-bit we have the following conventions - kernel is built with ...@@ -142,56 +142,25 @@ For 32-bit we have the following conventions - kernel is built with
UNWIND_HINT_REGS offset=\offset UNWIND_HINT_REGS offset=\offset
.endm .endm
.macro RESTORE_EXTRA_REGS offset=0 .macro POP_EXTRA_REGS
movq 0*8+\offset(%rsp), %r15 popq %r15
movq 1*8+\offset(%rsp), %r14 popq %r14
movq 2*8+\offset(%rsp), %r13 popq %r13
movq 3*8+\offset(%rsp), %r12 popq %r12
movq 4*8+\offset(%rsp), %rbp popq %rbp
movq 5*8+\offset(%rsp), %rbx popq %rbx
UNWIND_HINT_REGS offset=\offset extra=0 .endm
.endm
.macro POP_C_REGS
.macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1 popq %r11
.if \rstor_r11 popq %r10
movq 6*8(%rsp), %r11 popq %r9
.endif popq %r8
.if \rstor_r8910 popq %rax
movq 7*8(%rsp), %r10 popq %rcx
movq 8*8(%rsp), %r9 popq %rdx
movq 9*8(%rsp), %r8 popq %rsi
.endif popq %rdi
.if \rstor_rax
movq 10*8(%rsp), %rax
.endif
.if \rstor_rcx
movq 11*8(%rsp), %rcx
.endif
.if \rstor_rdx
movq 12*8(%rsp), %rdx
.endif
movq 13*8(%rsp), %rsi
movq 14*8(%rsp), %rdi
UNWIND_HINT_IRET_REGS offset=16*8
.endm
.macro RESTORE_C_REGS
RESTORE_C_REGS_HELPER 1,1,1,1,1
.endm
.macro RESTORE_C_REGS_EXCEPT_RAX
RESTORE_C_REGS_HELPER 0,1,1,1,1
.endm
.macro RESTORE_C_REGS_EXCEPT_RCX
RESTORE_C_REGS_HELPER 1,0,1,1,1
.endm
.macro RESTORE_C_REGS_EXCEPT_R11
RESTORE_C_REGS_HELPER 1,1,0,1,1
.endm
.macro RESTORE_C_REGS_EXCEPT_RCX_R11
RESTORE_C_REGS_HELPER 1,0,0,1,1
.endm
.macro REMOVE_PT_GPREGS_FROM_STACK addskip=0
subq $-(15*8+\addskip), %rsp
.endm .endm
.macro icebp .macro icebp
......
...@@ -221,10 +221,9 @@ entry_SYSCALL_64_fastpath: ...@@ -221,10 +221,9 @@ entry_SYSCALL_64_fastpath:
TRACE_IRQS_ON /* user mode is traced as IRQs on */ TRACE_IRQS_ON /* user mode is traced as IRQs on */
movq RIP(%rsp), %rcx movq RIP(%rsp), %rcx
movq EFLAGS(%rsp), %r11 movq EFLAGS(%rsp), %r11
RESTORE_C_REGS_EXCEPT_RCX_R11 addq $6*8, %rsp /* skip extra regs -- they were preserved */
movq RSP(%rsp), %rsp
UNWIND_HINT_EMPTY UNWIND_HINT_EMPTY
USERGS_SYSRET64 jmp .Lpop_c_regs_except_rcx_r11_and_sysret
1: 1:
/* /*
...@@ -246,17 +245,18 @@ entry_SYSCALL64_slow_path: ...@@ -246,17 +245,18 @@ entry_SYSCALL64_slow_path:
call do_syscall_64 /* returns with IRQs disabled */ call do_syscall_64 /* returns with IRQs disabled */
return_from_SYSCALL_64: return_from_SYSCALL_64:
RESTORE_EXTRA_REGS
TRACE_IRQS_IRETQ /* we're about to change IF */ TRACE_IRQS_IRETQ /* we're about to change IF */
/* /*
* Try to use SYSRET instead of IRET if we're returning to * Try to use SYSRET instead of IRET if we're returning to
* a completely clean 64-bit userspace context. * a completely clean 64-bit userspace context. If we're not,
* go to the slow exit path.
*/ */
movq RCX(%rsp), %rcx movq RCX(%rsp), %rcx
movq RIP(%rsp), %r11 movq RIP(%rsp), %r11
cmpq %rcx, %r11 /* RCX == RIP */
jne opportunistic_sysret_failed cmpq %rcx, %r11 /* SYSRET requires RCX == RIP */
jne swapgs_restore_regs_and_return_to_usermode
/* /*
* On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP
...@@ -274,14 +274,14 @@ return_from_SYSCALL_64: ...@@ -274,14 +274,14 @@ return_from_SYSCALL_64:
/* If this changed %rcx, it was not canonical */ /* If this changed %rcx, it was not canonical */
cmpq %rcx, %r11 cmpq %rcx, %r11
jne opportunistic_sysret_failed jne swapgs_restore_regs_and_return_to_usermode
cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */ cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */
jne opportunistic_sysret_failed jne swapgs_restore_regs_and_return_to_usermode
movq R11(%rsp), %r11 movq R11(%rsp), %r11
cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */ cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */
jne opportunistic_sysret_failed jne swapgs_restore_regs_and_return_to_usermode
/* /*
* SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot
...@@ -302,12 +302,12 @@ return_from_SYSCALL_64: ...@@ -302,12 +302,12 @@ return_from_SYSCALL_64:
* would never get past 'stuck_here'. * would never get past 'stuck_here'.
*/ */
testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11 testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
jnz opportunistic_sysret_failed jnz swapgs_restore_regs_and_return_to_usermode
/* nothing to check for RSP */ /* nothing to check for RSP */
cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */ cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */
jne opportunistic_sysret_failed jne swapgs_restore_regs_and_return_to_usermode
/* /*
* We win! This label is here just for ease of understanding * We win! This label is here just for ease of understanding
...@@ -315,14 +315,20 @@ return_from_SYSCALL_64: ...@@ -315,14 +315,20 @@ return_from_SYSCALL_64:
*/ */
syscall_return_via_sysret: syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */ /* rcx and r11 are already restored (see code above) */
RESTORE_C_REGS_EXCEPT_RCX_R11
movq RSP(%rsp), %rsp
UNWIND_HINT_EMPTY UNWIND_HINT_EMPTY
POP_EXTRA_REGS
.Lpop_c_regs_except_rcx_r11_and_sysret:
popq %rsi /* skip r11 */
popq %r10
popq %r9
popq %r8
popq %rax
popq %rsi /* skip rcx */
popq %rdx
popq %rsi
popq %rdi
movq RSP-ORIG_RAX(%rsp), %rsp
USERGS_SYSRET64 USERGS_SYSRET64
opportunistic_sysret_failed:
SWAPGS
jmp restore_c_regs_and_iret
END(entry_SYSCALL_64) END(entry_SYSCALL_64)
ENTRY(stub_ptregs_64) ENTRY(stub_ptregs_64)
...@@ -423,8 +429,7 @@ ENTRY(ret_from_fork) ...@@ -423,8 +429,7 @@ ENTRY(ret_from_fork)
movq %rsp, %rdi movq %rsp, %rdi
call syscall_return_slowpath /* returns with IRQs disabled */ call syscall_return_slowpath /* returns with IRQs disabled */
TRACE_IRQS_ON /* user mode is traced as IRQS on */ TRACE_IRQS_ON /* user mode is traced as IRQS on */
SWAPGS jmp swapgs_restore_regs_and_return_to_usermode
jmp restore_regs_and_iret
1: 1:
/* kernel thread */ /* kernel thread */
...@@ -612,8 +617,21 @@ GLOBAL(retint_user) ...@@ -612,8 +617,21 @@ GLOBAL(retint_user)
mov %rsp,%rdi mov %rsp,%rdi
call prepare_exit_to_usermode call prepare_exit_to_usermode
TRACE_IRQS_IRETQ TRACE_IRQS_IRETQ
GLOBAL(swapgs_restore_regs_and_return_to_usermode)
#ifdef CONFIG_DEBUG_ENTRY
/* Assert that pt_regs indicates user mode. */
testb $3, CS(%rsp)
jnz 1f
ud2
1:
#endif
SWAPGS SWAPGS
jmp restore_regs_and_iret POP_EXTRA_REGS
POP_C_REGS
addq $8, %rsp /* skip regs->orig_ax */
INTERRUPT_RETURN
/* Returning to kernel space */ /* Returning to kernel space */
retint_kernel: retint_kernel:
...@@ -633,15 +651,17 @@ retint_kernel: ...@@ -633,15 +651,17 @@ retint_kernel:
*/ */
TRACE_IRQS_IRETQ TRACE_IRQS_IRETQ
/* GLOBAL(restore_regs_and_return_to_kernel)
* At this label, code paths which return to kernel and to user, #ifdef CONFIG_DEBUG_ENTRY
* which come from interrupts/exception and from syscalls, merge. /* Assert that pt_regs indicates kernel mode. */
*/ testb $3, CS(%rsp)
GLOBAL(restore_regs_and_iret) jz 1f
RESTORE_EXTRA_REGS ud2
restore_c_regs_and_iret: 1:
RESTORE_C_REGS #endif
REMOVE_PT_GPREGS_FROM_STACK 8 POP_EXTRA_REGS
POP_C_REGS
addq $8, %rsp /* skip regs->orig_ax */
INTERRUPT_RETURN INTERRUPT_RETURN
ENTRY(native_iret) ENTRY(native_iret)
...@@ -818,7 +838,7 @@ ENTRY(\sym) ...@@ -818,7 +838,7 @@ ENTRY(\sym)
ASM_CLAC ASM_CLAC
.ifeq \has_error_code .if \has_error_code == 0
pushq $-1 /* ORIG_RAX: no syscall to restart */ pushq $-1 /* ORIG_RAX: no syscall to restart */
.endif .endif
...@@ -1059,6 +1079,7 @@ idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK ...@@ -1059,6 +1079,7 @@ idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
idtentry stack_segment do_stack_segment has_error_code=1 idtentry stack_segment do_stack_segment has_error_code=1
#ifdef CONFIG_XEN #ifdef CONFIG_XEN
idtentry xennmi do_nmi has_error_code=0
idtentry xendebug do_debug has_error_code=0 idtentry xendebug do_debug has_error_code=0
idtentry xenint3 do_int3 has_error_code=0 idtentry xenint3 do_int3 has_error_code=0
#endif #endif
...@@ -1112,17 +1133,14 @@ ENTRY(paranoid_exit) ...@@ -1112,17 +1133,14 @@ ENTRY(paranoid_exit)
DISABLE_INTERRUPTS(CLBR_ANY) DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF_DEBUG TRACE_IRQS_OFF_DEBUG
testl %ebx, %ebx /* swapgs needed? */ testl %ebx, %ebx /* swapgs needed? */
jnz paranoid_exit_no_swapgs jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ TRACE_IRQS_IRETQ
SWAPGS_UNSAFE_STACK SWAPGS_UNSAFE_STACK
jmp paranoid_exit_restore jmp .Lparanoid_exit_restore
paranoid_exit_no_swapgs: .Lparanoid_exit_no_swapgs:
TRACE_IRQS_IRETQ_DEBUG TRACE_IRQS_IRETQ_DEBUG
paranoid_exit_restore: .Lparanoid_exit_restore:
RESTORE_EXTRA_REGS jmp restore_regs_and_return_to_kernel
RESTORE_C_REGS
REMOVE_PT_GPREGS_FROM_STACK 8
INTERRUPT_RETURN
END(paranoid_exit) END(paranoid_exit)
/* /*
...@@ -1223,10 +1241,13 @@ ENTRY(error_exit) ...@@ -1223,10 +1241,13 @@ ENTRY(error_exit)
jmp retint_user jmp retint_user
END(error_exit) END(error_exit)
/* Runs on exception stack */ /*
/* XXX: broken on Xen PV */ * Runs on exception stack. Xen PV does not go through this path at all,
* so we can use real assembly here.
*/
ENTRY(nmi) ENTRY(nmi)
UNWIND_HINT_IRET_REGS UNWIND_HINT_IRET_REGS
/* /*
* We allow breakpoints in NMIs. If a breakpoint occurs, then * We allow breakpoints in NMIs. If a breakpoint occurs, then
* the iretq it performs will take us out of NMI context. * the iretq it performs will take us out of NMI context.
...@@ -1284,7 +1305,7 @@ ENTRY(nmi) ...@@ -1284,7 +1305,7 @@ ENTRY(nmi)
* stacks lest we corrupt the "NMI executing" variable. * stacks lest we corrupt the "NMI executing" variable.
*/ */
SWAPGS_UNSAFE_STACK swapgs
cld cld
movq %rsp, %rdx movq %rsp, %rdx
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
...@@ -1328,8 +1349,7 @@ ENTRY(nmi) ...@@ -1328,8 +1349,7 @@ ENTRY(nmi)
* Return back to user mode. We must *not* do the normal exit * Return back to user mode. We must *not* do the normal exit
* work, because we don't want to enable interrupts. * work, because we don't want to enable interrupts.
*/ */
SWAPGS jmp swapgs_restore_regs_and_return_to_usermode
jmp restore_regs_and_iret
.Lnmi_from_kernel: .Lnmi_from_kernel:
/* /*
...@@ -1450,7 +1470,7 @@ nested_nmi_out: ...@@ -1450,7 +1470,7 @@ nested_nmi_out:
popq %rdx popq %rdx
/* We are returning to kernel mode, so this cannot result in a fault. */ /* We are returning to kernel mode, so this cannot result in a fault. */
INTERRUPT_RETURN iretq
first_nmi: first_nmi:
/* Restore rdx. */ /* Restore rdx. */
...@@ -1481,7 +1501,7 @@ first_nmi: ...@@ -1481,7 +1501,7 @@ first_nmi:
pushfq /* RFLAGS */ pushfq /* RFLAGS */
pushq $__KERNEL_CS /* CS */ pushq $__KERNEL_CS /* CS */
pushq $1f /* RIP */ pushq $1f /* RIP */
INTERRUPT_RETURN /* continues at repeat_nmi below */ iretq /* continues at repeat_nmi below */
UNWIND_HINT_IRET_REGS UNWIND_HINT_IRET_REGS
1: 1:
#endif #endif
...@@ -1544,29 +1564,34 @@ end_repeat_nmi: ...@@ -1544,29 +1564,34 @@ end_repeat_nmi:
nmi_swapgs: nmi_swapgs:
SWAPGS_UNSAFE_STACK SWAPGS_UNSAFE_STACK
nmi_restore: nmi_restore:
RESTORE_EXTRA_REGS POP_EXTRA_REGS
RESTORE_C_REGS POP_C_REGS
/* Point RSP at the "iret" frame. */ /*
REMOVE_PT_GPREGS_FROM_STACK 6*8 * Skip orig_ax and the "outermost" frame to point RSP at the "iret"
* at the "iret" frame.
*/
addq $6*8, %rsp
/* /*
* Clear "NMI executing". Set DF first so that we can easily * Clear "NMI executing". Set DF first so that we can easily
* distinguish the remaining code between here and IRET from * distinguish the remaining code between here and IRET from
* the SYSCALL entry and exit paths. On a native kernel, we * the SYSCALL entry and exit paths.
* could just inspect RIP, but, on paravirt kernels, *
* INTERRUPT_RETURN can translate into a jump into a * We arguably should just inspect RIP instead, but I (Andy) wrote
* hypercall page. * this code when I had the misapprehension that Xen PV supported
* NMIs, and Xen PV would break that approach.
*/ */
std std
movq $0, 5*8(%rsp) /* clear "NMI executing" */ movq $0, 5*8(%rsp) /* clear "NMI executing" */
/* /*
* INTERRUPT_RETURN reads the "iret" frame and exits the NMI * iretq reads the "iret" frame and exits the NMI stack in a
* stack in a single instruction. We are returning to kernel * single instruction. We are returning to kernel mode, so this
* mode, so this cannot result in a fault. * cannot result in a fault. Similarly, we don't need to worry
* about espfix64 on the way back to kernel mode.
*/ */
INTERRUPT_RETURN iretq
END(nmi) END(nmi)
ENTRY(ignore_sysret) ENTRY(ignore_sysret)
......
...@@ -337,8 +337,7 @@ ENTRY(entry_INT80_compat) ...@@ -337,8 +337,7 @@ ENTRY(entry_INT80_compat)
/* Go back to user mode. */ /* Go back to user mode. */
TRACE_IRQS_ON TRACE_IRQS_ON
SWAPGS jmp swapgs_restore_regs_and_return_to_usermode
jmp restore_regs_and_iret
END(entry_INT80_compat) END(entry_INT80_compat)
ENTRY(stub32_clone) ENTRY(stub32_clone)
......
# SPDX-License-Identifier: GPL-2.0 # SPDX-License-Identifier: GPL-2.0
out := $(obj)/../../include/generated/asm out := arch/$(SRCARCH)/include/generated/asm
uapi := $(obj)/../../include/generated/uapi/asm uapi := arch/$(SRCARCH)/include/generated/uapi/asm
# Create output directory if not already present # Create output directory if not already present
_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \ _dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \
......
...@@ -114,10 +114,11 @@ static int vvar_fault(const struct vm_special_mapping *sm, ...@@ -114,10 +114,11 @@ static int vvar_fault(const struct vm_special_mapping *sm,
struct pvclock_vsyscall_time_info *pvti = struct pvclock_vsyscall_time_info *pvti =
pvclock_pvti_cpu0_va(); pvclock_pvti_cpu0_va();
if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) { if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
ret = vm_insert_pfn( ret = vm_insert_pfn_prot(
vma, vma,
vmf->address, vmf->address,
__pa(pvti) >> PAGE_SHIFT); __pa(pvti) >> PAGE_SHIFT,
pgprot_decrypted(vma->vm_page_prot));
} }
} else if (sym_offset == image->sym_hvclock_page) { } else if (sym_offset == image->sym_hvclock_page) {
struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page(); struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
......
...@@ -45,7 +45,7 @@ static inline bool rdrand_long(unsigned long *v) ...@@ -45,7 +45,7 @@ static inline bool rdrand_long(unsigned long *v)
bool ok; bool ok;
unsigned int retry = RDRAND_RETRY_LOOPS; unsigned int retry = RDRAND_RETRY_LOOPS;
do { do {
asm volatile(RDRAND_LONG "\n\t" asm volatile(RDRAND_LONG
CC_SET(c) CC_SET(c)
: CC_OUT(c) (ok), "=a" (*v)); : CC_OUT(c) (ok), "=a" (*v));
if (ok) if (ok)
...@@ -59,7 +59,7 @@ static inline bool rdrand_int(unsigned int *v) ...@@ -59,7 +59,7 @@ static inline bool rdrand_int(unsigned int *v)
bool ok; bool ok;
unsigned int retry = RDRAND_RETRY_LOOPS; unsigned int retry = RDRAND_RETRY_LOOPS;
do { do {
asm volatile(RDRAND_INT "\n\t" asm volatile(RDRAND_INT
CC_SET(c) CC_SET(c)
: CC_OUT(c) (ok), "=a" (*v)); : CC_OUT(c) (ok), "=a" (*v));
if (ok) if (ok)
...@@ -71,7 +71,7 @@ static inline bool rdrand_int(unsigned int *v) ...@@ -71,7 +71,7 @@ static inline bool rdrand_int(unsigned int *v)
static inline bool rdseed_long(unsigned long *v) static inline bool rdseed_long(unsigned long *v)
{ {
bool ok; bool ok;
asm volatile(RDSEED_LONG "\n\t" asm volatile(RDSEED_LONG
CC_SET(c) CC_SET(c)
: CC_OUT(c) (ok), "=a" (*v)); : CC_OUT(c) (ok), "=a" (*v));
return ok; return ok;
...@@ -80,7 +80,7 @@ static inline bool rdseed_long(unsigned long *v) ...@@ -80,7 +80,7 @@ static inline bool rdseed_long(unsigned long *v)
static inline bool rdseed_int(unsigned int *v) static inline bool rdseed_int(unsigned int *v)
{ {
bool ok; bool ok;
asm volatile(RDSEED_INT "\n\t" asm volatile(RDSEED_INT
CC_SET(c) CC_SET(c)
: CC_OUT(c) (ok), "=a" (*v)); : CC_OUT(c) (ok), "=a" (*v));
return ok; return ok;
......
...@@ -143,7 +143,7 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr) ...@@ -143,7 +143,7 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr) static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
{ {
bool negative; bool negative;
asm volatile(LOCK_PREFIX "andb %2,%1\n\t" asm volatile(LOCK_PREFIX "andb %2,%1"
CC_SET(s) CC_SET(s)
: CC_OUT(s) (negative), ADDR : CC_OUT(s) (negative), ADDR
: "ir" ((char) ~(1 << nr)) : "memory"); : "ir" ((char) ~(1 << nr)) : "memory");
...@@ -246,7 +246,7 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long * ...@@ -246,7 +246,7 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
{ {
bool oldbit; bool oldbit;
asm("bts %2,%1\n\t" asm("bts %2,%1"
CC_SET(c) CC_SET(c)
: CC_OUT(c) (oldbit), ADDR : CC_OUT(c) (oldbit), ADDR
: "Ir" (nr)); : "Ir" (nr));
...@@ -286,7 +286,7 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long ...@@ -286,7 +286,7 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
{ {
bool oldbit; bool oldbit;
asm volatile("btr %2,%1\n\t" asm volatile("btr %2,%1"
CC_SET(c) CC_SET(c)
: CC_OUT(c) (oldbit), ADDR : CC_OUT(c) (oldbit), ADDR
: "Ir" (nr)); : "Ir" (nr));
...@@ -298,7 +298,7 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon ...@@ -298,7 +298,7 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
{ {
bool oldbit; bool oldbit;
asm volatile("btc %2,%1\n\t" asm volatile("btc %2,%1"
CC_SET(c) CC_SET(c)
: CC_OUT(c) (oldbit), ADDR : CC_OUT(c) (oldbit), ADDR
: "Ir" (nr) : "memory"); : "Ir" (nr) : "memory");
...@@ -329,7 +329,7 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l ...@@ -329,7 +329,7 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
{ {
bool oldbit; bool oldbit;
asm volatile("bt %2,%1\n\t" asm volatile("bt %2,%1"
CC_SET(c) CC_SET(c)
: CC_OUT(c) (oldbit) : CC_OUT(c) (oldbit)
: "m" (*(unsigned long *)addr), "Ir" (nr)); : "m" (*(unsigned long *)addr), "Ir" (nr));
......
...@@ -7,6 +7,7 @@ ...@@ -7,6 +7,7 @@
*/ */
#include <linux/types.h> #include <linux/types.h>
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <asm/processor.h> #include <asm/processor.h>
#include <asm/user32.h> #include <asm/user32.h>
#include <asm/unistd.h> #include <asm/unistd.h>
......
...@@ -126,11 +126,10 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; ...@@ -126,11 +126,10 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
#define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit) #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)
#define set_cpu_cap(c, bit) set_bit(bit, (unsigned long *)((c)->x86_capability)) #define set_cpu_cap(c, bit) set_bit(bit, (unsigned long *)((c)->x86_capability))
#define clear_cpu_cap(c, bit) clear_bit(bit, (unsigned long *)((c)->x86_capability))
#define setup_clear_cpu_cap(bit) do { \ extern void setup_clear_cpu_cap(unsigned int bit);
clear_cpu_cap(&boot_cpu_data, bit); \ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
set_bit(bit, (unsigned long *)cpu_caps_cleared); \
} while (0)
#define setup_force_cpu_cap(bit) do { \ #define setup_force_cpu_cap(bit) do { \
set_cpu_cap(&boot_cpu_data, bit); \ set_cpu_cap(&boot_cpu_data, bit); \
set_bit(bit, (unsigned long *)cpu_caps_set); \ set_bit(bit, (unsigned long *)cpu_caps_set); \
......
...@@ -16,6 +16,12 @@ ...@@ -16,6 +16,12 @@
# define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31)) # define DISABLE_MPX (1<<(X86_FEATURE_MPX & 31))
#endif #endif
#ifdef CONFIG_X86_INTEL_UMIP
# define DISABLE_UMIP 0
#else
# define DISABLE_UMIP (1<<(X86_FEATURE_UMIP & 31))
#endif
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
# define DISABLE_VME (1<<(X86_FEATURE_VME & 31)) # define DISABLE_VME (1<<(X86_FEATURE_VME & 31))
# define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31)) # define DISABLE_K6_MTRR (1<<(X86_FEATURE_K6_MTRR & 31))
...@@ -63,7 +69,7 @@ ...@@ -63,7 +69,7 @@
#define DISABLED_MASK13 0 #define DISABLED_MASK13 0
#define DISABLED_MASK14 0 #define DISABLED_MASK14 0
#define DISABLED_MASK15 0 #define DISABLED_MASK15 0
#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57) #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
#define DISABLED_MASK17 0 #define DISABLED_MASK17 0
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18) #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
......
...@@ -97,6 +97,16 @@ ...@@ -97,6 +97,16 @@
#define INAT_MAKE_GROUP(grp) ((grp << INAT_GRP_OFFS) | INAT_MODRM) #define INAT_MAKE_GROUP(grp) ((grp << INAT_GRP_OFFS) | INAT_MODRM)
#define INAT_MAKE_IMM(imm) (imm << INAT_IMM_OFFS) #define INAT_MAKE_IMM(imm) (imm << INAT_IMM_OFFS)
/* Identifiers for segment registers */
#define INAT_SEG_REG_IGNORE 0
#define INAT_SEG_REG_DEFAULT 1
#define INAT_SEG_REG_CS 2
#define INAT_SEG_REG_SS 3
#define INAT_SEG_REG_DS 4
#define INAT_SEG_REG_ES 5
#define INAT_SEG_REG_FS 6
#define INAT_SEG_REG_GS 7
/* Attribute search APIs */ /* Attribute search APIs */
extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode); extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode);
extern int inat_get_last_prefix_id(insn_byte_t last_pfx); extern int inat_get_last_prefix_id(insn_byte_t last_pfx);
......
#ifndef _ASM_X86_INSN_EVAL_H
#define _ASM_X86_INSN_EVAL_H
/*
* A collection of utility functions for x86 instruction analysis to be
* used in a kernel context. Useful when, for instance, making sense
* of the registers indicated by operands.
*/
#include <linux/compiler.h>
#include <linux/bug.h>
#include <linux/err.h>
#include <asm/ptrace.h>
#define INSN_CODE_SEG_ADDR_SZ(params) ((params >> 4) & 0xf)
#define INSN_CODE_SEG_OPND_SZ(params) (params & 0xf)
#define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))
void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs);
int insn_get_modrm_rm_off(struct insn *insn, struct pt_regs *regs);
unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx);
char insn_get_code_seg_params(struct pt_regs *regs);
#endif /* _ASM_X86_INSN_EVAL_H */
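A standalone sketch of how the packed value returned by
insn_get_code_seg_params() is meant to be produced and decoded with the
macros above; the macros are copied here (with extra parentheses) so the
example builds outside the kernel tree, and the sizes used are illustrative
only.

	#include <stdio.h>

	/* Copies of the header's packing helpers, with added parentheses */
	#define INSN_CODE_SEG_ADDR_SZ(params) (((params) >> 4) & 0xf)
	#define INSN_CODE_SEG_OPND_SZ(params) ((params) & 0xf)
	#define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) ((oper_sz) | ((addr_sz) << 4))

	int main(void)
	{
		/* e.g. a 64-bit code segment: 4-byte operands, 8-byte addresses */
		char params = INSN_CODE_SEG_PARAMS(4, 8);

		printf("operand size: %d bytes\n", INSN_CODE_SEG_OPND_SZ(params));
		printf("address size: %d bytes\n", INSN_CODE_SEG_ADDR_SZ(params));

		return 0;
	}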
...@@ -266,6 +266,21 @@ static inline void slow_down_io(void) ...@@ -266,6 +266,21 @@ static inline void slow_down_io(void)
#endif #endif
#ifdef CONFIG_AMD_MEM_ENCRYPT
#include <linux/jump_label.h>
extern struct static_key_false sev_enable_key;
static inline bool sev_key_active(void)
{
return static_branch_unlikely(&sev_enable_key);
}
#else /* !CONFIG_AMD_MEM_ENCRYPT */
static inline bool sev_key_active(void) { return false; }
#endif /* CONFIG_AMD_MEM_ENCRYPT */
#define BUILDIO(bwl, bw, type) \ #define BUILDIO(bwl, bw, type) \
static inline void out##bwl(unsigned type value, int port) \ static inline void out##bwl(unsigned type value, int port) \
{ \ { \
...@@ -296,14 +311,34 @@ static inline unsigned type in##bwl##_p(int port) \ ...@@ -296,14 +311,34 @@ static inline unsigned type in##bwl##_p(int port) \
\ \
static inline void outs##bwl(int port, const void *addr, unsigned long count) \ static inline void outs##bwl(int port, const void *addr, unsigned long count) \
{ \ { \
if (sev_key_active()) { \
unsigned type *value = (unsigned type *)addr; \
while (count) { \
out##bwl(*value, port); \
value++; \
count--; \
} \
} else { \
asm volatile("rep; outs" #bwl \ asm volatile("rep; outs" #bwl \
: "+S"(addr), "+c"(count) : "d"(port) : "memory"); \ : "+S"(addr), "+c"(count) \
: "d"(port) : "memory"); \
} \
} \ } \
\ \
static inline void ins##bwl(int port, void *addr, unsigned long count) \ static inline void ins##bwl(int port, void *addr, unsigned long count) \
{ \ { \
if (sev_key_active()) { \
unsigned type *value = (unsigned type *)addr; \
while (count) { \
*value = in##bwl(port); \
value++; \
count--; \
} \
} else { \
asm volatile("rep; ins" #bwl \ asm volatile("rep; ins" #bwl \
: "+D"(addr), "+c"(count) : "d"(port) : "memory"); \ : "+D"(addr), "+c"(count) \
: "d"(port) : "memory"); \
} \
} }
BUILDIO(b, b, char) BUILDIO(b, b, char)
......
...@@ -42,11 +42,17 @@ void __init sme_early_init(void); ...@@ -42,11 +42,17 @@ void __init sme_early_init(void);
void __init sme_encrypt_kernel(void); void __init sme_encrypt_kernel(void);
void __init sme_enable(struct boot_params *bp); void __init sme_enable(struct boot_params *bp);
int __init early_set_memory_decrypted(unsigned long vaddr, unsigned long size);
int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
/* Architecture __weak replacement functions */ /* Architecture __weak replacement functions */
void __init mem_encrypt_init(void); void __init mem_encrypt_init(void);
void swiotlb_set_mem_attributes(void *vaddr, unsigned long size); void swiotlb_set_mem_attributes(void *vaddr, unsigned long size);
bool sme_active(void);
bool sev_active(void);
#else /* !CONFIG_AMD_MEM_ENCRYPT */ #else /* !CONFIG_AMD_MEM_ENCRYPT */
#define sme_me_mask 0ULL #define sme_me_mask 0ULL
...@@ -64,6 +70,14 @@ static inline void __init sme_early_init(void) { } ...@@ -64,6 +70,14 @@ static inline void __init sme_early_init(void) { }
static inline void __init sme_encrypt_kernel(void) { } static inline void __init sme_encrypt_kernel(void) { }
static inline void __init sme_enable(struct boot_params *bp) { } static inline void __init sme_enable(struct boot_params *bp) { }
static inline bool sme_active(void) { return false; }
static inline bool sev_active(void) { return false; }
static inline int __init
early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
static inline int __init
early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
#endif /* CONFIG_AMD_MEM_ENCRYPT */ #endif /* CONFIG_AMD_MEM_ENCRYPT */
/* /*
......
...@@ -6,7 +6,7 @@ ...@@ -6,7 +6,7 @@
#include <asm/orc_types.h> #include <asm/orc_types.h>
struct mod_arch_specific { struct mod_arch_specific {
#ifdef CONFIG_ORC_UNWINDER #ifdef CONFIG_UNWINDER_ORC
unsigned int num_orcs; unsigned int num_orcs;
int *orc_unwind_ip; int *orc_unwind_ip;
struct orc_entry *orc_unwind; struct orc_entry *orc_unwind;
......
...@@ -324,6 +324,9 @@ ...@@ -324,6 +324,9 @@
#define MSR_AMD64_IBSBRTARGET 0xc001103b #define MSR_AMD64_IBSBRTARGET 0xc001103b
#define MSR_AMD64_IBSOPDATA4 0xc001103d #define MSR_AMD64_IBSOPDATA4 0xc001103d
#define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */ #define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */
#define MSR_AMD64_SEV 0xc0010131
#define MSR_AMD64_SEV_ENABLED_BIT 0
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
/* Fam 17h MSRs */ /* Fam 17h MSRs */
#define MSR_F17H_IRPERF 0xc00000e9 #define MSR_F17H_IRPERF 0xc00000e9
......
...@@ -16,10 +16,9 @@ ...@@ -16,10 +16,9 @@
#include <linux/cpumask.h> #include <linux/cpumask.h>
#include <asm/frame.h> #include <asm/frame.h>
static inline void load_sp0(struct tss_struct *tss, static inline void load_sp0(unsigned long sp0)
struct thread_struct *thread)
{ {
PVOP_VCALL2(pv_cpu_ops.load_sp0, tss, thread); PVOP_VCALL1(pv_cpu_ops.load_sp0, sp0);
} }
/* The paravirtualized CPUID instruction. */ /* The paravirtualized CPUID instruction. */
......
...@@ -134,7 +134,7 @@ struct pv_cpu_ops { ...@@ -134,7 +134,7 @@ struct pv_cpu_ops {
void (*alloc_ldt)(struct desc_struct *ldt, unsigned entries); void (*alloc_ldt)(struct desc_struct *ldt, unsigned entries);
void (*free_ldt)(struct desc_struct *ldt, unsigned entries); void (*free_ldt)(struct desc_struct *ldt, unsigned entries);
void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t); void (*load_sp0)(unsigned long sp0);
void (*set_iopl_mask)(unsigned mask); void (*set_iopl_mask)(unsigned mask);
......
...@@ -526,7 +526,7 @@ static inline bool x86_this_cpu_variable_test_bit(int nr, ...@@ -526,7 +526,7 @@ static inline bool x86_this_cpu_variable_test_bit(int nr,
{ {
bool oldbit; bool oldbit;
asm volatile("bt "__percpu_arg(2)",%1\n\t" asm volatile("bt "__percpu_arg(2)",%1"
CC_SET(c) CC_SET(c)
: CC_OUT(c) (oldbit) : CC_OUT(c) (oldbit)
: "m" (*(unsigned long __percpu *)addr), "Ir" (nr)); : "m" (*(unsigned long __percpu *)addr), "Ir" (nr));
......
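Background on the hunk above: the explicit "\n\t" is dropped because CC_SET() itself starts a new line, and CC_SET()/CC_OUT() expand to GCC's flag-output operands when the compiler supports them (falling back to SETcc otherwise), so the tested bit comes straight out of the carry flag. A standalone user-space sketch of that pattern, not part of the patch — test_bit_asm() is a demo helper and it assumes GCC 6+ on x86:

#include <stdio.h>
#include <stdbool.h>

static bool test_bit_asm(const unsigned long *addr, int nr)
{
	bool oldbit;

	/* BTL copies the tested bit into CF; "=@ccc" reads CF directly. */
	asm volatile("btl %2, %1"
		     : "=@ccc" (oldbit)
		     : "m" (*addr), "Ir" (nr));
	return oldbit;
}

int main(void)
{
	unsigned long word = 0x28;	/* bits 3 and 5 set */

	printf("bit 3: %d, bit 4: %d\n",
	       test_bit_asm(&word, 3), test_bit_asm(&word, 4));	/* 1, 0 */
	return 0;
}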
...@@ -200,10 +200,9 @@ enum page_cache_mode { ...@@ -200,10 +200,9 @@ enum page_cache_mode {
#define _PAGE_ENC (_AT(pteval_t, sme_me_mask)) #define _PAGE_ENC (_AT(pteval_t, sme_me_mask))
#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_ENC)
#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ #define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
_PAGE_DIRTY | _PAGE_ENC) _PAGE_DIRTY | _PAGE_ENC)
#define _PAGE_TABLE (_KERNPG_TABLE | _PAGE_USER)
#define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC) #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
#define __PAGE_KERNEL_ENC_WP (__PAGE_KERNEL_WP | _PAGE_ENC) #define __PAGE_KERNEL_ENC_WP (__PAGE_KERNEL_WP | _PAGE_ENC)
......
...@@ -431,7 +431,9 @@ typedef struct { ...@@ -431,7 +431,9 @@ typedef struct {
struct thread_struct { struct thread_struct {
/* Cached TLS descriptors: */ /* Cached TLS descriptors: */
struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES];
#ifdef CONFIG_X86_32
unsigned long sp0; unsigned long sp0;
#endif
unsigned long sp; unsigned long sp;
#ifdef CONFIG_X86_32 #ifdef CONFIG_X86_32
unsigned long sysenter_cs; unsigned long sysenter_cs;
...@@ -518,16 +520,9 @@ static inline void native_set_iopl_mask(unsigned mask) ...@@ -518,16 +520,9 @@ static inline void native_set_iopl_mask(unsigned mask)
} }
static inline void static inline void
native_load_sp0(struct tss_struct *tss, struct thread_struct *thread) native_load_sp0(unsigned long sp0)
{ {
tss->x86_tss.sp0 = thread->sp0; this_cpu_write(cpu_tss.x86_tss.sp0, sp0);
#ifdef CONFIG_X86_32
/* Only happens when SEP is enabled, no need to test "SEP"arately: */
if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
tss->x86_tss.ss1 = thread->sysenter_cs;
wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
}
#endif
} }
static inline void native_swapgs(void) static inline void native_swapgs(void)
...@@ -547,15 +542,20 @@ static inline unsigned long current_top_of_stack(void) ...@@ -547,15 +542,20 @@ static inline unsigned long current_top_of_stack(void)
#endif #endif
} }
static inline bool on_thread_stack(void)
{
return (unsigned long)(current_top_of_stack() -
current_stack_pointer) < THREAD_SIZE;
}
#ifdef CONFIG_PARAVIRT #ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h> #include <asm/paravirt.h>
#else #else
#define __cpuid native_cpuid #define __cpuid native_cpuid
static inline void load_sp0(struct tss_struct *tss, static inline void load_sp0(unsigned long sp0)
struct thread_struct *thread)
{ {
native_load_sp0(tss, thread); native_load_sp0(sp0);
} }
#define set_iopl_mask native_set_iopl_mask #define set_iopl_mask native_set_iopl_mask
...@@ -804,6 +804,15 @@ static inline void spin_lock_prefetch(const void *x) ...@@ -804,6 +804,15 @@ static inline void spin_lock_prefetch(const void *x)
#define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \ #define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \
TOP_OF_KERNEL_STACK_PADDING) TOP_OF_KERNEL_STACK_PADDING)
#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1))
#define task_pt_regs(task) \
({ \
unsigned long __ptr = (unsigned long)task_stack_page(task); \
__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \
((struct pt_regs *)__ptr) - 1; \
})
#ifdef CONFIG_X86_32 #ifdef CONFIG_X86_32
/* /*
* User space process size: 3GB (default). * User space process size: 3GB (default).
...@@ -823,23 +832,6 @@ static inline void spin_lock_prefetch(const void *x) ...@@ -823,23 +832,6 @@ static inline void spin_lock_prefetch(const void *x)
.addr_limit = KERNEL_DS, \ .addr_limit = KERNEL_DS, \
} }
/*
* TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack.
* This is necessary to guarantee that the entire "struct pt_regs"
* is accessible even if the CPU haven't stored the SS/ESP registers
* on the stack (interrupt gate does not save these registers
* when switching to the same priv ring).
* Therefore beware: accessing the ss/esp fields of the
* "struct pt_regs" is possible, but they may contain the
* completely wrong values.
*/
#define task_pt_regs(task) \
({ \
unsigned long __ptr = (unsigned long)task_stack_page(task); \
__ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \
((struct pt_regs *)__ptr) - 1; \
})
#define KSTK_ESP(task) (task_pt_regs(task)->sp) #define KSTK_ESP(task) (task_pt_regs(task)->sp)
#else #else
...@@ -873,11 +865,9 @@ static inline void spin_lock_prefetch(const void *x) ...@@ -873,11 +865,9 @@ static inline void spin_lock_prefetch(const void *x)
#define STACK_TOP_MAX TASK_SIZE_MAX #define STACK_TOP_MAX TASK_SIZE_MAX
#define INIT_THREAD { \ #define INIT_THREAD { \
.sp0 = TOP_OF_INIT_STACK, \
.addr_limit = KERNEL_DS, \ .addr_limit = KERNEL_DS, \
} }
#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
extern unsigned long KSTK_ESP(struct task_struct *task); extern unsigned long KSTK_ESP(struct task_struct *task);
#endif /* CONFIG_X86_64 */ #endif /* CONFIG_X86_64 */
......
...@@ -136,9 +136,9 @@ static inline int v8086_mode(struct pt_regs *regs) ...@@ -136,9 +136,9 @@ static inline int v8086_mode(struct pt_regs *regs)
#endif #endif
} }
#ifdef CONFIG_X86_64
static inline bool user_64bit_mode(struct pt_regs *regs) static inline bool user_64bit_mode(struct pt_regs *regs)
{ {
#ifdef CONFIG_X86_64
#ifndef CONFIG_PARAVIRT #ifndef CONFIG_PARAVIRT
/* /*
* On non-paravirt systems, this is the only long mode CPL 3 * On non-paravirt systems, this is the only long mode CPL 3
...@@ -149,8 +149,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs) ...@@ -149,8 +149,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
/* Headers are too twisted for this to go in paravirt.h. */ /* Headers are too twisted for this to go in paravirt.h. */
return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs; return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs;
#endif #endif
#else /* !CONFIG_X86_64 */
return false;
#endif
} }
#ifdef CONFIG_X86_64
#define current_user_stack_pointer() current_pt_regs()->sp #define current_user_stack_pointer() current_pt_regs()->sp
#define compat_user_stack_pointer() current_pt_regs()->sp #define compat_user_stack_pointer() current_pt_regs()->sp
#endif #endif
......
...@@ -29,7 +29,7 @@ cc_label: \ ...@@ -29,7 +29,7 @@ cc_label: \
#define __GEN_RMWcc(fullop, var, cc, clobbers, ...) \ #define __GEN_RMWcc(fullop, var, cc, clobbers, ...) \
do { \ do { \
bool c; \ bool c; \
asm volatile (fullop ";" CC_SET(cc) \ asm volatile (fullop CC_SET(cc) \
: [counter] "+m" (var), CC_OUT(cc) (c) \ : [counter] "+m" (var), CC_OUT(cc) (c) \
: __VA_ARGS__ : clobbers); \ : __VA_ARGS__ : clobbers); \
return c; \ return c; \
......
...@@ -2,6 +2,8 @@ ...@@ -2,6 +2,8 @@
#ifndef _ASM_X86_SWITCH_TO_H #ifndef _ASM_X86_SWITCH_TO_H
#define _ASM_X86_SWITCH_TO_H #define _ASM_X86_SWITCH_TO_H
#include <linux/sched/task_stack.h>
struct task_struct; /* one of the stranger aspects of C forward declarations */ struct task_struct; /* one of the stranger aspects of C forward declarations */
struct task_struct *__switch_to_asm(struct task_struct *prev, struct task_struct *__switch_to_asm(struct task_struct *prev,
...@@ -73,4 +75,26 @@ do { \ ...@@ -73,4 +75,26 @@ do { \
((last) = __switch_to_asm((prev), (next))); \ ((last) = __switch_to_asm((prev), (next))); \
} while (0) } while (0)
#ifdef CONFIG_X86_32
static inline void refresh_sysenter_cs(struct thread_struct *thread)
{
/* Only happens when SEP is enabled, no need to test "SEP"arately: */
if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs))
return;
this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs);
wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
}
#endif
/* This is used when switching tasks or entering/exiting vm86 mode. */
static inline void update_sp0(struct task_struct *task)
{
#ifdef CONFIG_X86_32
load_sp0(task->thread.sp0);
#else
load_sp0(task_top_of_stack(task));
#endif
}
#endif /* _ASM_X86_SWITCH_TO_H */ #endif /* _ASM_X86_SWITCH_TO_H */
...@@ -21,7 +21,7 @@ asmlinkage long sys_ioperm(unsigned long, unsigned long, int); ...@@ -21,7 +21,7 @@ asmlinkage long sys_ioperm(unsigned long, unsigned long, int);
asmlinkage long sys_iopl(unsigned int); asmlinkage long sys_iopl(unsigned int);
/* kernel/ldt.c */ /* kernel/ldt.c */
asmlinkage int sys_modify_ldt(int, void __user *, unsigned long); asmlinkage long sys_modify_ldt(int, void __user *, unsigned long);
/* kernel/signal.c */ /* kernel/signal.c */
asmlinkage long sys_rt_sigreturn(void); asmlinkage long sys_rt_sigreturn(void);
......
...@@ -34,11 +34,6 @@ DECLARE_EVENT_CLASS(x86_fpu, ...@@ -34,11 +34,6 @@ DECLARE_EVENT_CLASS(x86_fpu,
) )
); );
DEFINE_EVENT(x86_fpu, x86_fpu_state,
TP_PROTO(struct fpu *fpu),
TP_ARGS(fpu)
);
DEFINE_EVENT(x86_fpu, x86_fpu_before_save, DEFINE_EVENT(x86_fpu, x86_fpu_before_save,
TP_PROTO(struct fpu *fpu), TP_PROTO(struct fpu *fpu),
TP_ARGS(fpu) TP_ARGS(fpu)
...@@ -74,11 +69,6 @@ DEFINE_EVENT(x86_fpu, x86_fpu_activate_state, ...@@ -74,11 +69,6 @@ DEFINE_EVENT(x86_fpu, x86_fpu_activate_state,
TP_ARGS(fpu) TP_ARGS(fpu)
); );
DEFINE_EVENT(x86_fpu, x86_fpu_deactivate_state,
TP_PROTO(struct fpu *fpu),
TP_ARGS(fpu)
);
DEFINE_EVENT(x86_fpu, x86_fpu_init_state, DEFINE_EVENT(x86_fpu, x86_fpu_init_state,
TP_PROTO(struct fpu *fpu), TP_PROTO(struct fpu *fpu),
TP_ARGS(fpu) TP_ARGS(fpu)
......
...@@ -38,9 +38,9 @@ asmlinkage void simd_coprocessor_error(void); ...@@ -38,9 +38,9 @@ asmlinkage void simd_coprocessor_error(void);
#if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV) #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
asmlinkage void xen_divide_error(void); asmlinkage void xen_divide_error(void);
asmlinkage void xen_xennmi(void);
asmlinkage void xen_xendebug(void); asmlinkage void xen_xendebug(void);
asmlinkage void xen_xenint3(void); asmlinkage void xen_xenint3(void);
asmlinkage void xen_nmi(void);
asmlinkage void xen_overflow(void); asmlinkage void xen_overflow(void);
asmlinkage void xen_bounds(void); asmlinkage void xen_bounds(void);
asmlinkage void xen_invalid_op(void); asmlinkage void xen_invalid_op(void);
...@@ -145,4 +145,22 @@ enum { ...@@ -145,4 +145,22 @@ enum {
X86_TRAP_IRET = 32, /* 32, IRET Exception */ X86_TRAP_IRET = 32, /* 32, IRET Exception */
}; };
/*
* Page fault error code bits:
*
* bit 0 == 0: no page found 1: protection fault
* bit 1 == 0: read access 1: write access
* bit 2 == 0: kernel-mode access 1: user-mode access
* bit 3 == 1: use of reserved bit detected
* bit 4 == 1: fault was an instruction fetch
* bit 5 == 1: protection keys block access
*/
enum x86_pf_error_code {
X86_PF_PROT = 1 << 0,
X86_PF_WRITE = 1 << 1,
X86_PF_USER = 1 << 2,
X86_PF_RSVD = 1 << 3,
X86_PF_INSTR = 1 << 4,
X86_PF_PK = 1 << 5,
};
#endif /* _ASM_X86_TRAPS_H */ #endif /* _ASM_X86_TRAPS_H */
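The X86_PF_* flags above give the page-fault error code a single shared definition. As a quick illustration of the bit layout, here is a standalone user-space sketch, not kernel code — the PF_* macros are local to the demo and the sample error-code value is made up:

#include <stdio.h>

#define PF_PROT  (1 << 0)	/* protection fault vs. page not present */
#define PF_WRITE (1 << 1)	/* write vs. read access */
#define PF_USER  (1 << 2)	/* user-mode vs. kernel-mode access */
#define PF_RSVD  (1 << 3)	/* reserved bit set in a paging entry */
#define PF_INSTR (1 << 4)	/* instruction fetch */
#define PF_PK    (1 << 5)	/* blocked by protection keys */

int main(void)
{
	/* Example: a user-mode write hit a present but protected page. */
	unsigned long error_code = PF_PROT | PF_WRITE | PF_USER;

	printf("%s %s access from %s mode%s%s%s\n",
	       error_code & PF_PROT ? "protection-fault" : "not-present",
	       error_code & PF_WRITE ? "write" : "read",
	       error_code & PF_USER ? "user" : "kernel",
	       error_code & PF_RSVD ? ", reserved bit set" : "",
	       error_code & PF_INSTR ? ", instruction fetch" : "",
	       error_code & PF_PK ? ", protection key" : "");
	return 0;
}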
#ifndef _ASM_X86_UMIP_H
#define _ASM_X86_UMIP_H
#include <linux/types.h>
#include <asm/ptrace.h>
#ifdef CONFIG_X86_INTEL_UMIP
bool fixup_umip_exception(struct pt_regs *regs);
#else
static inline bool fixup_umip_exception(struct pt_regs *regs) { return false; }
#endif /* CONFIG_X86_INTEL_UMIP */
#endif /* _ASM_X86_UMIP_H */
...@@ -13,11 +13,11 @@ struct unwind_state { ...@@ -13,11 +13,11 @@ struct unwind_state {
struct task_struct *task; struct task_struct *task;
int graph_idx; int graph_idx;
bool error; bool error;
#if defined(CONFIG_ORC_UNWINDER) #if defined(CONFIG_UNWINDER_ORC)
bool signal, full_regs; bool signal, full_regs;
unsigned long sp, bp, ip; unsigned long sp, bp, ip;
struct pt_regs *regs; struct pt_regs *regs;
#elif defined(CONFIG_FRAME_POINTER_UNWINDER) #elif defined(CONFIG_UNWINDER_FRAME_POINTER)
bool got_irq; bool got_irq;
unsigned long *bp, *orig_sp, ip; unsigned long *bp, *orig_sp, ip;
struct pt_regs *regs; struct pt_regs *regs;
...@@ -51,7 +51,7 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, ...@@ -51,7 +51,7 @@ void unwind_start(struct unwind_state *state, struct task_struct *task,
__unwind_start(state, task, regs, first_frame); __unwind_start(state, task, regs, first_frame);
} }
#if defined(CONFIG_ORC_UNWINDER) || defined(CONFIG_FRAME_POINTER_UNWINDER) #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
{ {
if (unwind_done(state)) if (unwind_done(state))
...@@ -66,7 +66,7 @@ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) ...@@ -66,7 +66,7 @@ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
} }
#endif #endif
#ifdef CONFIG_ORC_UNWINDER #ifdef CONFIG_UNWINDER_ORC
void unwind_init(void); void unwind_init(void);
void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size, void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size,
void *orc, size_t orc_size); void *orc, size_t orc_size);
......
...@@ -110,5 +110,4 @@ struct kvm_vcpu_pv_apf_data { ...@@ -110,5 +110,4 @@ struct kvm_vcpu_pv_apf_data {
#define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
#define KVM_PV_EOI_DISABLED 0x0 #define KVM_PV_EOI_DISABLED 0x0
#endif /* _UAPI_ASM_X86_KVM_PARA_H */ #endif /* _UAPI_ASM_X86_KVM_PARA_H */
...@@ -105,6 +105,8 @@ ...@@ -105,6 +105,8 @@
#define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT) #define X86_CR4_OSFXSR _BITUL(X86_CR4_OSFXSR_BIT)
#define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */ #define X86_CR4_OSXMMEXCPT_BIT 10 /* enable unmasked SSE exceptions */
#define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT) #define X86_CR4_OSXMMEXCPT _BITUL(X86_CR4_OSXMMEXCPT_BIT)
#define X86_CR4_UMIP_BIT 11 /* enable UMIP support */
#define X86_CR4_UMIP _BITUL(X86_CR4_UMIP_BIT)
#define X86_CR4_LA57_BIT 12 /* enable 5-level page tables */ #define X86_CR4_LA57_BIT 12 /* enable 5-level page tables */
#define X86_CR4_LA57 _BITUL(X86_CR4_LA57_BIT) #define X86_CR4_LA57 _BITUL(X86_CR4_LA57_BIT)
#define X86_CR4_VMXE_BIT 13 /* enable VMX virtualization */ #define X86_CR4_VMXE_BIT 13 /* enable VMX virtualization */
...@@ -152,5 +154,8 @@ ...@@ -152,5 +154,8 @@
#define CX86_ARR_BASE 0xc4 #define CX86_ARR_BASE 0xc4
#define CX86_RCR_BASE 0xdc #define CX86_RCR_BASE 0xdc
#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
X86_CR0_PG)
#endif /* _UAPI_ASM_X86_PROCESSOR_FLAGS_H */ #endif /* _UAPI_ASM_X86_PROCESSOR_FLAGS_H */
...@@ -26,8 +26,8 @@ KASAN_SANITIZE_head$(BITS).o := n ...@@ -26,8 +26,8 @@ KASAN_SANITIZE_head$(BITS).o := n
KASAN_SANITIZE_dumpstack.o := n KASAN_SANITIZE_dumpstack.o := n
KASAN_SANITIZE_dumpstack_$(BITS).o := n KASAN_SANITIZE_dumpstack_$(BITS).o := n
KASAN_SANITIZE_stacktrace.o := n KASAN_SANITIZE_stacktrace.o := n
KASAN_SANITIZE_paravirt.o := n
OBJECT_FILES_NON_STANDARD_head_$(BITS).o := y
OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o := y OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o := y
OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y
OBJECT_FILES_NON_STANDARD_test_nx.o := y OBJECT_FILES_NON_STANDARD_test_nx.o := y
...@@ -127,10 +127,11 @@ obj-$(CONFIG_EFI) += sysfb_efi.o ...@@ -127,10 +127,11 @@ obj-$(CONFIG_EFI) += sysfb_efi.o
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_PERF_EVENTS) += perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o
obj-$(CONFIG_X86_INTEL_UMIP) += umip.o
obj-$(CONFIG_ORC_UNWINDER) += unwind_orc.o obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o
obj-$(CONFIG_FRAME_POINTER_UNWINDER) += unwind_frame.o obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o
obj-$(CONFIG_GUESS_UNWINDER) += unwind_guess.o obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o
### ###
# 64 bit specific files # 64 bit specific files
......
...@@ -442,7 +442,6 @@ static void alternatives_smp_lock(const s32 *start, const s32 *end, ...@@ -442,7 +442,6 @@ static void alternatives_smp_lock(const s32 *start, const s32 *end,
{ {
const s32 *poff; const s32 *poff;
mutex_lock(&text_mutex);
for (poff = start; poff < end; poff++) { for (poff = start; poff < end; poff++) {
u8 *ptr = (u8 *)poff + *poff; u8 *ptr = (u8 *)poff + *poff;
...@@ -452,7 +451,6 @@ static void alternatives_smp_lock(const s32 *start, const s32 *end, ...@@ -452,7 +451,6 @@ static void alternatives_smp_lock(const s32 *start, const s32 *end,
if (*ptr == 0x3e) if (*ptr == 0x3e)
text_poke(ptr, ((unsigned char []){0xf0}), 1); text_poke(ptr, ((unsigned char []){0xf0}), 1);
} }
mutex_unlock(&text_mutex);
} }
static void alternatives_smp_unlock(const s32 *start, const s32 *end, static void alternatives_smp_unlock(const s32 *start, const s32 *end,
...@@ -460,7 +458,6 @@ static void alternatives_smp_unlock(const s32 *start, const s32 *end, ...@@ -460,7 +458,6 @@ static void alternatives_smp_unlock(const s32 *start, const s32 *end,
{ {
const s32 *poff; const s32 *poff;
mutex_lock(&text_mutex);
for (poff = start; poff < end; poff++) { for (poff = start; poff < end; poff++) {
u8 *ptr = (u8 *)poff + *poff; u8 *ptr = (u8 *)poff + *poff;
...@@ -470,7 +467,6 @@ static void alternatives_smp_unlock(const s32 *start, const s32 *end, ...@@ -470,7 +467,6 @@ static void alternatives_smp_unlock(const s32 *start, const s32 *end,
if (*ptr == 0xf0) if (*ptr == 0xf0)
text_poke(ptr, ((unsigned char []){0x3E}), 1); text_poke(ptr, ((unsigned char []){0x3E}), 1);
} }
mutex_unlock(&text_mutex);
} }
struct smp_alt_module { struct smp_alt_module {
...@@ -489,8 +485,7 @@ struct smp_alt_module { ...@@ -489,8 +485,7 @@ struct smp_alt_module {
struct list_head next; struct list_head next;
}; };
static LIST_HEAD(smp_alt_modules); static LIST_HEAD(smp_alt_modules);
static DEFINE_MUTEX(smp_alt); static bool uniproc_patched = false; /* protected by text_mutex */
static bool uniproc_patched = false; /* protected by smp_alt */
void __init_or_module alternatives_smp_module_add(struct module *mod, void __init_or_module alternatives_smp_module_add(struct module *mod,
char *name, char *name,
...@@ -499,7 +494,7 @@ void __init_or_module alternatives_smp_module_add(struct module *mod, ...@@ -499,7 +494,7 @@ void __init_or_module alternatives_smp_module_add(struct module *mod,
{ {
struct smp_alt_module *smp; struct smp_alt_module *smp;
mutex_lock(&smp_alt); mutex_lock(&text_mutex);
if (!uniproc_patched) if (!uniproc_patched)
goto unlock; goto unlock;
...@@ -526,14 +521,14 @@ void __init_or_module alternatives_smp_module_add(struct module *mod, ...@@ -526,14 +521,14 @@ void __init_or_module alternatives_smp_module_add(struct module *mod,
smp_unlock: smp_unlock:
alternatives_smp_unlock(locks, locks_end, text, text_end); alternatives_smp_unlock(locks, locks_end, text, text_end);
unlock: unlock:
mutex_unlock(&smp_alt); mutex_unlock(&text_mutex);
} }
void __init_or_module alternatives_smp_module_del(struct module *mod) void __init_or_module alternatives_smp_module_del(struct module *mod)
{ {
struct smp_alt_module *item; struct smp_alt_module *item;
mutex_lock(&smp_alt); mutex_lock(&text_mutex);
list_for_each_entry(item, &smp_alt_modules, next) { list_for_each_entry(item, &smp_alt_modules, next) {
if (mod != item->mod) if (mod != item->mod)
continue; continue;
...@@ -541,7 +536,7 @@ void __init_or_module alternatives_smp_module_del(struct module *mod) ...@@ -541,7 +536,7 @@ void __init_or_module alternatives_smp_module_del(struct module *mod)
kfree(item); kfree(item);
break; break;
} }
mutex_unlock(&smp_alt); mutex_unlock(&text_mutex);
} }
void alternatives_enable_smp(void) void alternatives_enable_smp(void)
...@@ -551,7 +546,7 @@ void alternatives_enable_smp(void) ...@@ -551,7 +546,7 @@ void alternatives_enable_smp(void)
/* Why bother if there are no other CPUs? */ /* Why bother if there are no other CPUs? */
BUG_ON(num_possible_cpus() == 1); BUG_ON(num_possible_cpus() == 1);
mutex_lock(&smp_alt); mutex_lock(&text_mutex);
if (uniproc_patched) { if (uniproc_patched) {
pr_info("switching to SMP code\n"); pr_info("switching to SMP code\n");
...@@ -563,10 +558,13 @@ void alternatives_enable_smp(void) ...@@ -563,10 +558,13 @@ void alternatives_enable_smp(void)
mod->text, mod->text_end); mod->text, mod->text_end);
uniproc_patched = false; uniproc_patched = false;
} }
mutex_unlock(&smp_alt); mutex_unlock(&text_mutex);
} }
/* Return 1 if the address range is reserved for smp-alternatives */ /*
* Return 1 if the address range is reserved for SMP-alternatives.
* Must hold text_mutex.
*/
int alternatives_text_reserved(void *start, void *end) int alternatives_text_reserved(void *start, void *end)
{ {
struct smp_alt_module *mod; struct smp_alt_module *mod;
...@@ -574,6 +572,8 @@ int alternatives_text_reserved(void *start, void *end) ...@@ -574,6 +572,8 @@ int alternatives_text_reserved(void *start, void *end)
u8 *text_start = start; u8 *text_start = start;
u8 *text_end = end; u8 *text_end = end;
lockdep_assert_held(&text_mutex);
list_for_each_entry(mod, &smp_alt_modules, next) { list_for_each_entry(mod, &smp_alt_modules, next) {
if (mod->text > text_end || mod->text_end < text_start) if (mod->text > text_end || mod->text_end < text_start)
continue; continue;
......
...@@ -23,6 +23,7 @@ obj-y += rdrand.o ...@@ -23,6 +23,7 @@ obj-y += rdrand.o
obj-y += match.o obj-y += match.o
obj-y += bugs.o obj-y += bugs.o
obj-$(CONFIG_CPU_FREQ) += aperfmperf.o obj-$(CONFIG_CPU_FREQ) += aperfmperf.o
obj-y += cpuid-deps.o
obj-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
......
...@@ -329,6 +329,28 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c) ...@@ -329,6 +329,28 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
} }
} }
static __always_inline void setup_umip(struct cpuinfo_x86 *c)
{
/* Check the boot processor, plus build option for UMIP. */
if (!cpu_feature_enabled(X86_FEATURE_UMIP))
goto out;
/* Check the current processor's cpuid bits. */
if (!cpu_has(c, X86_FEATURE_UMIP))
goto out;
cr4_set_bits(X86_CR4_UMIP);
return;
out:
/*
* Make sure UMIP is disabled in case it was enabled in a
* previous boot (e.g., via kexec).
*/
cr4_clear_bits(X86_CR4_UMIP);
}
/* /*
* Protection Keys are not available in 32-bit mode. * Protection Keys are not available in 32-bit mode.
*/ */
...@@ -1147,9 +1169,10 @@ static void identify_cpu(struct cpuinfo_x86 *c) ...@@ -1147,9 +1169,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */ /* Disable the PN if appropriate */
squash_the_stupid_serial_number(c); squash_the_stupid_serial_number(c);
/* Set up SMEP/SMAP */ /* Set up SMEP/SMAP/UMIP */
setup_smep(c); setup_smep(c);
setup_smap(c); setup_smap(c);
setup_umip(c);
/* /*
* The vendor-specific functions might have changed features. * The vendor-specific functions might have changed features.
...@@ -1301,18 +1324,16 @@ void print_cpu_info(struct cpuinfo_x86 *c) ...@@ -1301,18 +1324,16 @@ void print_cpu_info(struct cpuinfo_x86 *c)
pr_cont(")\n"); pr_cont(")\n");
} }
static __init int setup_disablecpuid(char *arg) /*
* clearcpuid= was already parsed in fpu__init_parse_early_param.
* But we need to keep a dummy __setup around otherwise it would
* show up as an environment variable for init.
*/
static __init int setup_clearcpuid(char *arg)
{ {
int bit;
if (get_option(&arg, &bit) && bit >= 0 && bit < NCAPINTS * 32)
setup_clear_cpu_cap(bit);
else
return 0;
return 1; return 1;
} }
__setup("clearcpuid=", setup_disablecpuid); __setup("clearcpuid=", setup_clearcpuid);
#ifdef CONFIG_X86_64 #ifdef CONFIG_X86_64
DEFINE_PER_CPU_FIRST(union irq_stack_union, DEFINE_PER_CPU_FIRST(union irq_stack_union,
...@@ -1572,9 +1593,13 @@ void cpu_init(void) ...@@ -1572,9 +1593,13 @@ void cpu_init(void)
initialize_tlbstate_and_flush(); initialize_tlbstate_and_flush();
enter_lazy_tlb(&init_mm, me); enter_lazy_tlb(&init_mm, me);
load_sp0(t, &current->thread); /*
* Initialize the TSS. Don't bother initializing sp0, as the initial
* task never enters user mode.
*/
set_tss_desc(cpu, t); set_tss_desc(cpu, t);
load_TR_desc(); load_TR_desc();
load_mm_ldt(&init_mm); load_mm_ldt(&init_mm);
clear_all_debug_regs(); clear_all_debug_regs();
...@@ -1596,7 +1621,6 @@ void cpu_init(void) ...@@ -1596,7 +1621,6 @@ void cpu_init(void)
int cpu = smp_processor_id(); int cpu = smp_processor_id();
struct task_struct *curr = current; struct task_struct *curr = current;
struct tss_struct *t = &per_cpu(cpu_tss, cpu); struct tss_struct *t = &per_cpu(cpu_tss, cpu);
struct thread_struct *thread = &curr->thread;
wait_for_master_cpu(cpu); wait_for_master_cpu(cpu);
...@@ -1627,9 +1651,13 @@ void cpu_init(void) ...@@ -1627,9 +1651,13 @@ void cpu_init(void)
initialize_tlbstate_and_flush(); initialize_tlbstate_and_flush();
enter_lazy_tlb(&init_mm, curr); enter_lazy_tlb(&init_mm, curr);
load_sp0(t, thread); /*
* Initialize the TSS. Don't bother initializing sp0, as the initial
* task never enters user mode.
*/
set_tss_desc(cpu, t); set_tss_desc(cpu, t);
load_TR_desc(); load_TR_desc();
load_mm_ldt(&init_mm); load_mm_ldt(&init_mm);
t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap); t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
......
/* Declare dependencies between CPUIDs */
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <asm/cpufeature.h>
struct cpuid_dep {
unsigned int feature;
unsigned int depends;
};
/*
* Table of CPUID features that depend on others.
*
* This only includes dependencies that can be usefully disabled, not
* features part of the base set (like FPU).
*
* Note this all is not __init / __initdata because it can be
* called from cpu hotplug. It shouldn't do anything in this case,
* but it's difficult to tell that to the init reference checker.
*/
const static struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_XSAVEOPT, X86_FEATURE_XSAVE },
{ X86_FEATURE_XSAVEC, X86_FEATURE_XSAVE },
{ X86_FEATURE_XSAVES, X86_FEATURE_XSAVE },
{ X86_FEATURE_AVX, X86_FEATURE_XSAVE },
{ X86_FEATURE_PKU, X86_FEATURE_XSAVE },
{ X86_FEATURE_MPX, X86_FEATURE_XSAVE },
{ X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE },
{ X86_FEATURE_FXSR_OPT, X86_FEATURE_FXSR },
{ X86_FEATURE_XMM, X86_FEATURE_FXSR },
{ X86_FEATURE_XMM2, X86_FEATURE_XMM },
{ X86_FEATURE_XMM3, X86_FEATURE_XMM2 },
{ X86_FEATURE_XMM4_1, X86_FEATURE_XMM2 },
{ X86_FEATURE_XMM4_2, X86_FEATURE_XMM2 },
{ X86_FEATURE_XMM3, X86_FEATURE_XMM2 },
{ X86_FEATURE_PCLMULQDQ, X86_FEATURE_XMM2 },
{ X86_FEATURE_SSSE3, X86_FEATURE_XMM2, },
{ X86_FEATURE_F16C, X86_FEATURE_XMM2, },
{ X86_FEATURE_AES, X86_FEATURE_XMM2 },
{ X86_FEATURE_SHA_NI, X86_FEATURE_XMM2 },
{ X86_FEATURE_FMA, X86_FEATURE_AVX },
{ X86_FEATURE_AVX2, X86_FEATURE_AVX, },
{ X86_FEATURE_AVX512F, X86_FEATURE_AVX, },
{ X86_FEATURE_AVX512IFMA, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512PF, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512ER, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512CD, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512DQ, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512BW, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512VL, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512VBMI, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512_VBMI2, X86_FEATURE_AVX512VL },
{ X86_FEATURE_GFNI, X86_FEATURE_AVX512VL },
{ X86_FEATURE_VAES, X86_FEATURE_AVX512VL },
{ X86_FEATURE_VPCLMULQDQ, X86_FEATURE_AVX512VL },
{ X86_FEATURE_AVX512_VNNI, X86_FEATURE_AVX512VL },
{ X86_FEATURE_AVX512_BITALG, X86_FEATURE_AVX512VL },
{ X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F },
{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F },
{}
};
static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature)
{
/*
* Note: This could use the non atomic __*_bit() variants, but the
* rest of the cpufeature code uses atomics as well, so keep it for
* consistency. Cleanup all of it separately.
*/
if (!c) {
clear_cpu_cap(&boot_cpu_data, feature);
set_bit(feature, (unsigned long *)cpu_caps_cleared);
} else {
clear_bit(feature, (unsigned long *)c->x86_capability);
}
}
/* Take the capabilities and the BUG bits into account */
#define MAX_FEATURE_BITS ((NCAPINTS + NBUGINTS) * sizeof(u32) * 8)
static void do_clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature)
{
DECLARE_BITMAP(disable, MAX_FEATURE_BITS);
const struct cpuid_dep *d;
bool changed;
if (WARN_ON(feature >= MAX_FEATURE_BITS))
return;
clear_feature(c, feature);
/* Collect all features to disable, handling dependencies */
memset(disable, 0, sizeof(disable));
__set_bit(feature, disable);
/* Loop until we get a stable state. */
do {
changed = false;
for (d = cpuid_deps; d->feature; d++) {
if (!test_bit(d->depends, disable))
continue;
if (__test_and_set_bit(d->feature, disable))
continue;
changed = true;
clear_feature(c, d->feature);
}
} while (changed);
}
void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature)
{
do_clear_cpu_cap(c, feature);
}
void setup_clear_cpu_cap(unsigned int feature)
{
do_clear_cpu_cap(NULL, feature);
}
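The loop in do_clear_cpu_cap() above is a small fixpoint computation: clearing one feature keeps pulling in everything that transitively depends on it until no new bits get set. A standalone sketch of that idea, not kernel code — the feature numbers and the three-entry dependency table are made up for the demo:

#include <stdio.h>

enum { F_XSAVE, F_AVX, F_AVX2, F_AVX512F };

static const struct { int feature, depends; } deps[] = {
	{ F_AVX,     F_XSAVE },
	{ F_AVX2,    F_AVX   },
	{ F_AVX512F, F_AVX   },
	{ -1, -1 }
};

int main(void)
{
	unsigned long disable = 1UL << F_XSAVE;	/* user asked to clear XSAVE */
	int changed, i;

	do {	/* loop until no new dependent features get pulled in */
		changed = 0;
		for (i = 0; deps[i].feature >= 0; i++) {
			if (!(disable & (1UL << deps[i].depends)))
				continue;
			if (disable & (1UL << deps[i].feature))
				continue;
			disable |= 1UL << deps[i].feature;
			changed = 1;
		}
	} while (changed);

	printf("disabled mask: %#lx\n", disable);	/* 0xf */
	return 0;
}

In this toy table, clearing XSAVE cascades to AVX, AVX2 and AVX512F in the first pass, and the second pass makes no changes, which terminates the loop.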
...@@ -209,7 +209,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) ...@@ -209,7 +209,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
} }
#ifdef CONFIG_KEXEC_FILE #ifdef CONFIG_KEXEC_FILE
static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg) static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
{ {
unsigned int *nr_ranges = arg; unsigned int *nr_ranges = arg;
...@@ -342,7 +342,7 @@ static int elf_header_exclude_ranges(struct crash_elf_data *ced, ...@@ -342,7 +342,7 @@ static int elf_header_exclude_ranges(struct crash_elf_data *ced,
return ret; return ret;
} }
static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg) static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
{ {
struct crash_elf_data *ced = arg; struct crash_elf_data *ced = arg;
Elf64_Ehdr *ehdr; Elf64_Ehdr *ehdr;
...@@ -355,7 +355,7 @@ static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg) ...@@ -355,7 +355,7 @@ static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
ehdr = ced->ehdr; ehdr = ced->ehdr;
/* Exclude unwanted mem ranges */ /* Exclude unwanted mem ranges */
ret = elf_header_exclude_ranges(ced, start, end); ret = elf_header_exclude_ranges(ced, res->start, res->end);
if (ret) if (ret)
return ret; return ret;
...@@ -518,14 +518,14 @@ static int add_e820_entry(struct boot_params *params, struct e820_entry *entry) ...@@ -518,14 +518,14 @@ static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
return 0; return 0;
} }
static int memmap_entry_callback(u64 start, u64 end, void *arg) static int memmap_entry_callback(struct resource *res, void *arg)
{ {
struct crash_memmap_data *cmd = arg; struct crash_memmap_data *cmd = arg;
struct boot_params *params = cmd->params; struct boot_params *params = cmd->params;
struct e820_entry ei; struct e820_entry ei;
ei.addr = start; ei.addr = res->start;
ei.size = end - start + 1; ei.size = resource_size(res);
ei.type = cmd->type; ei.type = cmd->type;
add_e820_entry(params, &ei); add_e820_entry(params, &ei);
...@@ -619,12 +619,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) ...@@ -619,12 +619,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
return ret; return ret;
} }
static int determine_backup_region(u64 start, u64 end, void *arg) static int determine_backup_region(struct resource *res, void *arg)
{ {
struct kimage *image = arg; struct kimage *image = arg;
image->arch.backup_src_start = start; image->arch.backup_src_start = res->start;
image->arch.backup_src_sz = end - start + 1; image->arch.backup_src_sz = resource_size(res);
/* Expecting only one range for backup region */ /* Expecting only one range for backup region */
return 1; return 1;
......
...@@ -249,6 +249,10 @@ static void __init fpu__init_system_ctx_switch(void) ...@@ -249,6 +249,10 @@ static void __init fpu__init_system_ctx_switch(void)
*/ */
static void __init fpu__init_parse_early_param(void) static void __init fpu__init_parse_early_param(void)
{ {
char arg[32];
char *argptr = arg;
int bit;
if (cmdline_find_option_bool(boot_command_line, "no387")) if (cmdline_find_option_bool(boot_command_line, "no387"))
setup_clear_cpu_cap(X86_FEATURE_FPU); setup_clear_cpu_cap(X86_FEATURE_FPU);
...@@ -266,6 +270,13 @@ static void __init fpu__init_parse_early_param(void) ...@@ -266,6 +270,13 @@ static void __init fpu__init_parse_early_param(void)
if (cmdline_find_option_bool(boot_command_line, "noxsaves")) if (cmdline_find_option_bool(boot_command_line, "noxsaves"))
setup_clear_cpu_cap(X86_FEATURE_XSAVES); setup_clear_cpu_cap(X86_FEATURE_XSAVES);
if (cmdline_find_option(boot_command_line, "clearcpuid", arg,
sizeof(arg)) &&
get_option(&argptr, &bit) &&
bit >= 0 &&
bit < NCAPINTS * 32)
setup_clear_cpu_cap(bit);
} }
/* /*
......
...@@ -15,6 +15,7 @@ ...@@ -15,6 +15,7 @@
#include <asm/fpu/xstate.h> #include <asm/fpu/xstate.h>
#include <asm/tlbflush.h> #include <asm/tlbflush.h>
#include <asm/cpufeature.h>
/* /*
* Although we spell it out in here, the Processor Trace * Although we spell it out in here, the Processor Trace
...@@ -36,6 +37,19 @@ static const char *xfeature_names[] = ...@@ -36,6 +37,19 @@ static const char *xfeature_names[] =
"unknown xstate feature" , "unknown xstate feature" ,
}; };
static short xsave_cpuid_features[] __initdata = {
X86_FEATURE_FPU,
X86_FEATURE_XMM,
X86_FEATURE_AVX,
X86_FEATURE_MPX,
X86_FEATURE_MPX,
X86_FEATURE_AVX512F,
X86_FEATURE_AVX512F,
X86_FEATURE_AVX512F,
X86_FEATURE_INTEL_PT,
X86_FEATURE_PKU,
};
/* /*
* Mask of xstate features supported by the CPU and the kernel: * Mask of xstate features supported by the CPU and the kernel:
*/ */
...@@ -59,26 +73,6 @@ unsigned int fpu_user_xstate_size; ...@@ -59,26 +73,6 @@ unsigned int fpu_user_xstate_size;
void fpu__xstate_clear_all_cpu_caps(void) void fpu__xstate_clear_all_cpu_caps(void)
{ {
setup_clear_cpu_cap(X86_FEATURE_XSAVE); setup_clear_cpu_cap(X86_FEATURE_XSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT);
setup_clear_cpu_cap(X86_FEATURE_XSAVEC);
setup_clear_cpu_cap(X86_FEATURE_XSAVES);
setup_clear_cpu_cap(X86_FEATURE_AVX);
setup_clear_cpu_cap(X86_FEATURE_AVX2);
setup_clear_cpu_cap(X86_FEATURE_AVX512F);
setup_clear_cpu_cap(X86_FEATURE_AVX512IFMA);
setup_clear_cpu_cap(X86_FEATURE_AVX512PF);
setup_clear_cpu_cap(X86_FEATURE_AVX512ER);
setup_clear_cpu_cap(X86_FEATURE_AVX512CD);
setup_clear_cpu_cap(X86_FEATURE_AVX512DQ);
setup_clear_cpu_cap(X86_FEATURE_AVX512BW);
setup_clear_cpu_cap(X86_FEATURE_AVX512VL);
setup_clear_cpu_cap(X86_FEATURE_MPX);
setup_clear_cpu_cap(X86_FEATURE_XGETBV1);
setup_clear_cpu_cap(X86_FEATURE_AVX512VBMI);
setup_clear_cpu_cap(X86_FEATURE_PKU);
setup_clear_cpu_cap(X86_FEATURE_AVX512_4VNNIW);
setup_clear_cpu_cap(X86_FEATURE_AVX512_4FMAPS);
setup_clear_cpu_cap(X86_FEATURE_AVX512_VPOPCNTDQ);
} }
/* /*
...@@ -726,6 +720,7 @@ void __init fpu__init_system_xstate(void) ...@@ -726,6 +720,7 @@ void __init fpu__init_system_xstate(void)
unsigned int eax, ebx, ecx, edx; unsigned int eax, ebx, ecx, edx;
static int on_boot_cpu __initdata = 1; static int on_boot_cpu __initdata = 1;
int err; int err;
int i;
WARN_ON_FPU(!on_boot_cpu); WARN_ON_FPU(!on_boot_cpu);
on_boot_cpu = 0; on_boot_cpu = 0;
...@@ -759,6 +754,14 @@ void __init fpu__init_system_xstate(void) ...@@ -759,6 +754,14 @@ void __init fpu__init_system_xstate(void)
goto out_disable; goto out_disable;
} }
/*
* Clear XSAVE features that are disabled in the normal CPUID.
*/
for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
if (!boot_cpu_has(xsave_cpuid_features[i]))
xfeatures_mask &= ~BIT(i);
}
xfeatures_mask &= fpu__get_supported_xfeatures_mask(); xfeatures_mask &= fpu__get_supported_xfeatures_mask();
/* Enable xstate instructions to be able to continue with initialization: */ /* Enable xstate instructions to be able to continue with initialization: */
......
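The xsave_cpuid_features[] table above maps each xstate component number to the CPUID feature that guards it, and the new loop in fpu__init_system_xstate() masks component i out of xfeatures_mask when that feature is absent. A standalone sketch of the same filtering step, not kernel code — the component names and the cpu_has() results are made up for the demo:

#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Component number -> name of the CPUID feature guarding it (demo values). */
static const char * const guard[] = {
	"FPU", "SSE", "AVX", "MPX bndregs", "MPX bndcsr",
	"AVX-512 opmask", "AVX-512 ZMM_Hi256", "AVX-512 Hi16_ZMM",
};

static int cpu_has(unsigned int i)
{
	return i < 3;	/* pretend only FPU, SSE and AVX are present */
}

int main(void)
{
	unsigned long long xfeatures_mask = 0xffULL;	/* what XSAVE advertised */
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(guard); i++)
		if (!cpu_has(i))
			xfeatures_mask &= ~(1ULL << i);	/* drop component i */

	printf("xfeatures_mask after filtering: %#llx\n", xfeatures_mask);	/* 0x7 */
	for (i = 0; i < ARRAY_SIZE(guard); i++)
		if (xfeatures_mask & (1ULL << i))
			printf("  keeping component %u (%s)\n", i, guard[i]);
	return 0;
}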
...@@ -212,9 +212,6 @@ ENTRY(startup_32_smp) ...@@ -212,9 +212,6 @@ ENTRY(startup_32_smp)
#endif #endif
.Ldefault_entry: .Ldefault_entry:
#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
X86_CR0_PG)
movl $(CR0_STATE & ~X86_CR0_PG),%eax movl $(CR0_STATE & ~X86_CR0_PG),%eax
movl %eax,%cr0 movl %eax,%cr0
...@@ -402,7 +399,7 @@ ENTRY(early_idt_handler_array) ...@@ -402,7 +399,7 @@ ENTRY(early_idt_handler_array)
# 24(%rsp) error code # 24(%rsp) error code
i = 0 i = 0
.rept NUM_EXCEPTION_VECTORS .rept NUM_EXCEPTION_VECTORS
.ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1 .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
pushl $0 # Dummy error code, to make stack frame uniform pushl $0 # Dummy error code, to make stack frame uniform
.endif .endif
pushl $i # 20(%esp) Vector number pushl $i # 20(%esp) Vector number
......
...@@ -38,11 +38,12 @@ ...@@ -38,11 +38,12 @@
* *
*/ */
#define p4d_index(x) (((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1))
#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) #define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH)
PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE) PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
PGD_START_KERNEL = pgd_index(__START_KERNEL_map) PGD_START_KERNEL = pgd_index(__START_KERNEL_map)
#endif
L3_START_KERNEL = pud_index(__START_KERNEL_map) L3_START_KERNEL = pud_index(__START_KERNEL_map)
.text .text
...@@ -50,6 +51,7 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map) ...@@ -50,6 +51,7 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map)
.code64 .code64
.globl startup_64 .globl startup_64
startup_64: startup_64:
UNWIND_HINT_EMPTY
/* /*
* At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0, * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
* and someone has loaded an identity mapped page table * and someone has loaded an identity mapped page table
...@@ -89,6 +91,7 @@ startup_64: ...@@ -89,6 +91,7 @@ startup_64:
addq $(early_top_pgt - __START_KERNEL_map), %rax addq $(early_top_pgt - __START_KERNEL_map), %rax
jmp 1f jmp 1f
ENTRY(secondary_startup_64) ENTRY(secondary_startup_64)
UNWIND_HINT_EMPTY
/* /*
* At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0, * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
* and someone has loaded a mapped page table. * and someone has loaded a mapped page table.
...@@ -133,6 +136,7 @@ ENTRY(secondary_startup_64) ...@@ -133,6 +136,7 @@ ENTRY(secondary_startup_64)
movq $1f, %rax movq $1f, %rax
jmp *%rax jmp *%rax
1: 1:
UNWIND_HINT_EMPTY
/* Check if nx is implemented */ /* Check if nx is implemented */
movl $0x80000001, %eax movl $0x80000001, %eax
...@@ -150,9 +154,6 @@ ENTRY(secondary_startup_64) ...@@ -150,9 +154,6 @@ ENTRY(secondary_startup_64)
1: wrmsr /* Make changes effective */ 1: wrmsr /* Make changes effective */
/* Setup cr0 */ /* Setup cr0 */
#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \
X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \
X86_CR0_PG)
movl $CR0_STATE, %eax movl $CR0_STATE, %eax
/* Make changes effective */ /* Make changes effective */
movq %rax, %cr0 movq %rax, %cr0
...@@ -235,7 +236,7 @@ ENTRY(secondary_startup_64) ...@@ -235,7 +236,7 @@ ENTRY(secondary_startup_64)
pushq %rax # target address in negative space pushq %rax # target address in negative space
lretq lretq
.Lafter_lret: .Lafter_lret:
ENDPROC(secondary_startup_64) END(secondary_startup_64)
#include "verify_cpu.S" #include "verify_cpu.S"
...@@ -247,6 +248,7 @@ ENDPROC(secondary_startup_64) ...@@ -247,6 +248,7 @@ ENDPROC(secondary_startup_64)
*/ */
ENTRY(start_cpu0) ENTRY(start_cpu0)
movq initial_stack(%rip), %rsp movq initial_stack(%rip), %rsp
UNWIND_HINT_EMPTY
jmp .Ljump_to_C_code jmp .Ljump_to_C_code
ENDPROC(start_cpu0) ENDPROC(start_cpu0)
#endif #endif
...@@ -266,26 +268,24 @@ ENDPROC(start_cpu0) ...@@ -266,26 +268,24 @@ ENDPROC(start_cpu0)
.quad init_thread_union + THREAD_SIZE - SIZEOF_PTREGS .quad init_thread_union + THREAD_SIZE - SIZEOF_PTREGS
__FINITDATA __FINITDATA
bad_address:
jmp bad_address
__INIT __INIT
ENTRY(early_idt_handler_array) ENTRY(early_idt_handler_array)
# 104(%rsp) %rflags
# 96(%rsp) %cs
# 88(%rsp) %rip
# 80(%rsp) error code
i = 0 i = 0
.rept NUM_EXCEPTION_VECTORS .rept NUM_EXCEPTION_VECTORS
.ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1 .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0
UNWIND_HINT_IRET_REGS
pushq $0 # Dummy error code, to make stack frame uniform pushq $0 # Dummy error code, to make stack frame uniform
.else
UNWIND_HINT_IRET_REGS offset=8
.endif .endif
pushq $i # 72(%rsp) Vector number pushq $i # 72(%rsp) Vector number
jmp early_idt_handler_common jmp early_idt_handler_common
UNWIND_HINT_IRET_REGS
i = i + 1 i = i + 1
.fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc .fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc
.endr .endr
ENDPROC(early_idt_handler_array) UNWIND_HINT_IRET_REGS offset=16
END(early_idt_handler_array)
early_idt_handler_common: early_idt_handler_common:
/* /*
...@@ -313,6 +313,7 @@ early_idt_handler_common: ...@@ -313,6 +313,7 @@ early_idt_handler_common:
pushq %r13 /* pt_regs->r13 */ pushq %r13 /* pt_regs->r13 */
pushq %r14 /* pt_regs->r14 */ pushq %r14 /* pt_regs->r14 */
pushq %r15 /* pt_regs->r15 */ pushq %r15 /* pt_regs->r15 */
UNWIND_HINT_REGS
cmpq $14,%rsi /* Page fault? */ cmpq $14,%rsi /* Page fault? */
jnz 10f jnz 10f
...@@ -327,8 +328,8 @@ early_idt_handler_common: ...@@ -327,8 +328,8 @@ early_idt_handler_common:
20: 20:
decl early_recursion_flag(%rip) decl early_recursion_flag(%rip)
jmp restore_regs_and_iret jmp restore_regs_and_return_to_kernel
ENDPROC(early_idt_handler_common) END(early_idt_handler_common)
__INITDATA __INITDATA
...@@ -362,10 +363,7 @@ NEXT_PAGE(early_dynamic_pgts) ...@@ -362,10 +363,7 @@ NEXT_PAGE(early_dynamic_pgts)
.data .data
#ifndef CONFIG_XEN #if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH)
NEXT_PAGE(init_top_pgt)
.fill 512,8,0
#else
NEXT_PAGE(init_top_pgt) NEXT_PAGE(init_top_pgt)
.quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
.org init_top_pgt + PGD_PAGE_OFFSET*8, 0 .org init_top_pgt + PGD_PAGE_OFFSET*8, 0
...@@ -382,6 +380,9 @@ NEXT_PAGE(level2_ident_pgt) ...@@ -382,6 +380,9 @@ NEXT_PAGE(level2_ident_pgt)
* Don't set NX because code runs from these pages. * Don't set NX because code runs from these pages.
*/ */
PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD) PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
#else
NEXT_PAGE(init_top_pgt)
.fill 512,8,0
#endif #endif
#ifdef CONFIG_X86_5LEVEL #ifdef CONFIG_X86_5LEVEL
......
...@@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg) ...@@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall); early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64); static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64); static DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64);
static int has_steal_clock = 0; static int has_steal_clock = 0;
/* /*
...@@ -312,7 +312,7 @@ static void kvm_register_steal_time(void) ...@@ -312,7 +312,7 @@ static void kvm_register_steal_time(void)
cpu, (unsigned long long) slow_virt_to_phys(st)); cpu, (unsigned long long) slow_virt_to_phys(st));
} }
static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED; static DEFINE_PER_CPU_DECRYPTED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val) static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
{ {
...@@ -426,9 +426,42 @@ void kvm_disable_steal_time(void) ...@@ -426,9 +426,42 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0); wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
} }
static inline void __set_percpu_decrypted(void *ptr, unsigned long size)
{
early_set_memory_decrypted((unsigned long) ptr, size);
}
/*
* Iterate through all possible CPUs and map the memory region pointed
* to by apf_reason, steal_time and kvm_apic_eoi as decrypted at once.

*
* Note: we iterate through all possible CPUs to ensure that CPUs
* hotplugged will have their per-cpu variable already mapped as
* decrypted.
*/
static void __init sev_map_percpu_data(void)
{
int cpu;
if (!sev_active())
return;
for_each_possible_cpu(cpu) {
__set_percpu_decrypted(&per_cpu(apf_reason, cpu), sizeof(apf_reason));
__set_percpu_decrypted(&per_cpu(steal_time, cpu), sizeof(steal_time));
__set_percpu_decrypted(&per_cpu(kvm_apic_eoi, cpu), sizeof(kvm_apic_eoi));
}
}
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
static void __init kvm_smp_prepare_boot_cpu(void) static void __init kvm_smp_prepare_boot_cpu(void)
{ {
/*
* Map the per-cpu variables as decrypted before kvm_guest_cpu_init()
* shares the guest physical address with the hypervisor.
*/
sev_map_percpu_data();
kvm_guest_cpu_init(); kvm_guest_cpu_init();
native_smp_prepare_boot_cpu(); native_smp_prepare_boot_cpu();
kvm_spinlock_init(); kvm_spinlock_init();
...@@ -496,6 +529,7 @@ void __init kvm_guest_init(void) ...@@ -496,6 +529,7 @@ void __init kvm_guest_init(void)
kvm_cpu_online, kvm_cpu_down_prepare) < 0) kvm_cpu_online, kvm_cpu_down_prepare) < 0)
pr_err("kvm_guest: Failed to install cpu hotplug callbacks\n"); pr_err("kvm_guest: Failed to install cpu hotplug callbacks\n");
#else #else
sev_map_percpu_data();
kvm_guest_cpu_init(); kvm_guest_cpu_init();
#endif #endif
......
...@@ -27,6 +27,7 @@ ...@@ -27,6 +27,7 @@
#include <linux/sched.h> #include <linux/sched.h>
#include <linux/sched/clock.h> #include <linux/sched/clock.h>
#include <asm/mem_encrypt.h>
#include <asm/x86_init.h> #include <asm/x86_init.h>
#include <asm/reboot.h> #include <asm/reboot.h>
#include <asm/kvmclock.h> #include <asm/kvmclock.h>
...@@ -45,7 +46,7 @@ early_param("no-kvmclock", parse_no_kvmclock); ...@@ -45,7 +46,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
/* The hypervisor will put information about time periodically here */ /* The hypervisor will put information about time periodically here */
static struct pvclock_vsyscall_time_info *hv_clock; static struct pvclock_vsyscall_time_info *hv_clock;
static struct pvclock_wall_clock wall_clock; static struct pvclock_wall_clock *wall_clock;
struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void) struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
{ {
...@@ -64,15 +65,15 @@ static void kvm_get_wallclock(struct timespec *now) ...@@ -64,15 +65,15 @@ static void kvm_get_wallclock(struct timespec *now)
int low, high; int low, high;
int cpu; int cpu;
low = (int)__pa_symbol(&wall_clock); low = (int)slow_virt_to_phys(wall_clock);
high = ((u64)__pa_symbol(&wall_clock) >> 32); high = ((u64)slow_virt_to_phys(wall_clock) >> 32);
native_write_msr(msr_kvm_wall_clock, low, high); native_write_msr(msr_kvm_wall_clock, low, high);
cpu = get_cpu(); cpu = get_cpu();
vcpu_time = &hv_clock[cpu].pvti; vcpu_time = &hv_clock[cpu].pvti;
pvclock_read_wallclock(&wall_clock, vcpu_time, now); pvclock_read_wallclock(wall_clock, vcpu_time, now);
put_cpu(); put_cpu();
} }
...@@ -249,11 +250,39 @@ static void kvm_shutdown(void) ...@@ -249,11 +250,39 @@ static void kvm_shutdown(void)
native_machine_shutdown(); native_machine_shutdown();
} }
static phys_addr_t __init kvm_memblock_alloc(phys_addr_t size,
phys_addr_t align)
{
phys_addr_t mem;
mem = memblock_alloc(size, align);
if (!mem)
return 0;
if (sev_active()) {
if (early_set_memory_decrypted((unsigned long)__va(mem), size))
goto e_free;
}
return mem;
e_free:
memblock_free(mem, size);
return 0;
}
static void __init kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
{
if (sev_active())
early_set_memory_encrypted((unsigned long)__va(addr), size);
memblock_free(addr, size);
}
void __init kvmclock_init(void) void __init kvmclock_init(void)
{ {
struct pvclock_vcpu_time_info *vcpu_time; struct pvclock_vcpu_time_info *vcpu_time;
unsigned long mem; unsigned long mem, mem_wall_clock;
int size, cpu; int size, cpu, wall_clock_size;
u8 flags; u8 flags;
size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS); size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
...@@ -267,21 +296,35 @@ void __init kvmclock_init(void) ...@@ -267,21 +296,35 @@ void __init kvmclock_init(void)
} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))) } else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
return; return;
printk(KERN_INFO "kvm-clock: Using msrs %x and %x", wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
msr_kvm_system_time, msr_kvm_wall_clock); mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
if (!mem_wall_clock)
return;
mem = memblock_alloc(size, PAGE_SIZE); wall_clock = __va(mem_wall_clock);
if (!mem) memset(wall_clock, 0, wall_clock_size);
mem = kvm_memblock_alloc(size, PAGE_SIZE);
if (!mem) {
kvm_memblock_free(mem_wall_clock, wall_clock_size);
wall_clock = NULL;
return; return;
}
hv_clock = __va(mem); hv_clock = __va(mem);
memset(hv_clock, 0, size); memset(hv_clock, 0, size);
if (kvm_register_clock("primary cpu clock")) { if (kvm_register_clock("primary cpu clock")) {
hv_clock = NULL; hv_clock = NULL;
memblock_free(mem, size); kvm_memblock_free(mem, size);
kvm_memblock_free(mem_wall_clock, wall_clock_size);
wall_clock = NULL;
return; return;
} }
printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
msr_kvm_system_time, msr_kvm_wall_clock);
if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT)) if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
......
...@@ -13,6 +13,7 @@ ...@@ -13,6 +13,7 @@
#include <linux/string.h> #include <linux/string.h>
#include <linux/mm.h> #include <linux/mm.h>
#include <linux/smp.h> #include <linux/smp.h>
#include <linux/syscalls.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/vmalloc.h> #include <linux/vmalloc.h>
#include <linux/uaccess.h> #include <linux/uaccess.h>
...@@ -295,8 +296,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode) ...@@ -295,8 +296,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
return error; return error;
} }
asmlinkage int sys_modify_ldt(int func, void __user *ptr, SYSCALL_DEFINE3(modify_ldt, int , func , void __user * , ptr ,
unsigned long bytecount) unsigned long , bytecount)
{ {
int ret = -ENOSYS; int ret = -ENOSYS;
...@@ -314,5 +315,14 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, ...@@ -314,5 +315,14 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr,
ret = write_ldt(ptr, bytecount, 0); ret = write_ldt(ptr, bytecount, 0);
break; break;
} }
return ret; /*
* The SYSCALL_DEFINE() macros give us an 'unsigned long'
* return type, but the ABI for sys_modify_ldt() expects
* 'int'. This cast gives us an int-sized value in %rax
* for the return code. The 'unsigned' is necessary so
* the compiler does not try to sign-extend the negative
* return codes into the high half of the register when
* taking the value from int->long.
*/
return (unsigned int)ret;
} }
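The cast discussed in the comment above is an integer-widening issue: a negative int assigned straight to the long return value is sign-extended into the upper 32 bits of %rax, while going through unsigned int zero-extends it, leaving an int-sized result. A standalone illustration, not kernel code — it assumes a 64-bit long, and -38 is just a sample negative error code:

#include <stdio.h>

int main(void)
{
	int ret = -38;

	printf("sign-extended: %#lx\n", (long)ret);		/* 0xffffffffffffffda */
	printf("zero-extended: %#lx\n", (long)(unsigned int)ret);	/* 0xffffffda */
	return 0;
}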
...@@ -7,7 +7,7 @@ ...@@ -7,7 +7,7 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/ioport.h> #include <linux/ioport.h>
static int found(u64 start, u64 end, void *data) static int found(struct resource *res, void *data)
{ {
return 1; return 1;
} }
......
...@@ -49,7 +49,13 @@ ...@@ -49,7 +49,13 @@
*/ */
__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = {
.x86_tss = { .x86_tss = {
.sp0 = TOP_OF_INIT_STACK, /*
* .sp0 is only used when entering ring 0 from a lower
* privilege level. Since the init task never runs anything
* but ring 0 code, there is no need for a valid value here.
* Poison it.
*/
.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
#ifdef CONFIG_X86_32 #ifdef CONFIG_X86_32
.ss0 = __KERNEL_DS, .ss0 = __KERNEL_DS,
.ss1 = __KERNEL_CS, .ss1 = __KERNEL_CS,
......
...@@ -284,9 +284,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) ...@@ -284,9 +284,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/* /*
* Reload esp0 and cpu_current_top_of_stack. This changes * Reload esp0 and cpu_current_top_of_stack. This changes
* current_thread_info(). * current_thread_info(). Refresh the SYSENTER configuration in
* case prev or next is vm86.
*/ */
load_sp0(tss, next); update_sp0(next_p);
refresh_sysenter_cs(next);
this_cpu_write(cpu_current_top_of_stack, this_cpu_write(cpu_current_top_of_stack,
(unsigned long)task_stack_page(next_p) + (unsigned long)task_stack_page(next_p) +
THREAD_SIZE); THREAD_SIZE);
......
...@@ -274,7 +274,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, ...@@ -274,7 +274,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
struct inactive_task_frame *frame; struct inactive_task_frame *frame;
struct task_struct *me = current; struct task_struct *me = current;
p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE;
childregs = task_pt_regs(p); childregs = task_pt_regs(p);
fork_frame = container_of(childregs, struct fork_frame, regs); fork_frame = container_of(childregs, struct fork_frame, regs);
frame = &fork_frame->frame; frame = &fork_frame->frame;
...@@ -464,8 +463,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) ...@@ -464,8 +463,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
*/ */
this_cpu_write(current_task, next_p); this_cpu_write(current_task, next_p);
/* Reload esp0 and ss1. This changes current_thread_info(). */ /* Reload sp0. */
load_sp0(tss, next); update_sp0(next_p);
/* /*
* Now maybe reload the debug registers and handle I/O bitmaps * Now maybe reload the debug registers and handle I/O bitmaps
......
...@@ -380,8 +380,10 @@ static void __init reserve_initrd(void) ...@@ -380,8 +380,10 @@ static void __init reserve_initrd(void)
* If SME is active, this memory will be marked encrypted by the * If SME is active, this memory will be marked encrypted by the
* kernel when it is accessed (including relocation). However, the * kernel when it is accessed (including relocation). However, the
* ramdisk image was loaded decrypted by the bootloader, so make * ramdisk image was loaded decrypted by the bootloader, so make
* sure that it is encrypted before accessing it. * sure that it is encrypted before accessing it. For SEV the
* ramdisk will already be encrypted, so only do this for SME.
*/ */
if (sme_active())
sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image); sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
initrd_start = 0; initrd_start = 0;
......
...@@ -963,8 +963,7 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle) ...@@ -963,8 +963,7 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle)
#ifdef CONFIG_X86_32 #ifdef CONFIG_X86_32
/* Stack for startup_32 can be just as for start_secondary onwards */ /* Stack for startup_32 can be just as for start_secondary onwards */
irq_ctx_init(cpu); irq_ctx_init(cpu);
per_cpu(cpu_current_top_of_stack, cpu) = per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);
(unsigned long)task_stack_page(idle) + THREAD_SIZE;
#else #else
initial_gs = per_cpu_offset(cpu); initial_gs = per_cpu_offset(cpu);
#endif #endif
......
@@ -60,6 +60,7 @@
 #include <asm/trace/mpx.h>
 #include <asm/mpx.h>
 #include <asm/vm86.h>
+#include <asm/umip.h>
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -141,8 +142,7 @@ void ist_begin_non_atomic(struct pt_regs *regs)
 	 * will catch asm bugs and any attempt to use ist_preempt_enable
 	 * from double_fault.
 	 */
-	BUG_ON((unsigned long)(current_top_of_stack() -
-			       current_stack_pointer) >= THREAD_SIZE);
+	BUG_ON(!on_thread_stack());
 	preempt_enable_no_resched();
 }
@@ -518,6 +518,11 @@ do_general_protection(struct pt_regs *regs, long error_code)
 	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 	cond_local_irq_enable(regs);
+	if (static_cpu_has(X86_FEATURE_UMIP)) {
+		if (user_mode(regs) && fixup_umip_exception(regs))
+			return;
+	}
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
...
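For context on the UMIP hunk above: with User-Mode Instruction Prevention enabled, instructions such as SMSW, SGDT and SIDT raise #GP when executed from user space, and the new fixup_umip_exception() path lets the kernel emulate them instead of delivering the fault. A minimal user-space sketch (assumption: built and run on x86; with UMIP active the printed value may be the kernel's emulated result rather than real hardware state):

	#include <stdio.h>

	int main(void)
	{
		unsigned long msw;

		/* SMSW is one of the instructions UMIP restricts to ring 0. */
		asm volatile("smsw %0" : "=r" (msw));
		printf("smsw returned %#lx\n", msw);
		return 0;
	}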
@@ -271,12 +271,15 @@ static bool is_prefix_bad(struct insn *insn)
 	int i;
 	for (i = 0; i < insn->prefixes.nbytes; i++) {
-		switch (insn->prefixes.bytes[i]) {
-		case 0x26:	/* INAT_PFX_ES   */
-		case 0x2E:	/* INAT_PFX_CS   */
-		case 0x36:	/* INAT_PFX_DS   */
-		case 0x3E:	/* INAT_PFX_SS   */
-		case 0xF0:	/* INAT_PFX_LOCK */
+		insn_attr_t attr;
+
+		attr = inat_get_opcode_attribute(insn->prefixes.bytes[i]);
+		switch (attr) {
+		case INAT_MAKE_PREFIX(INAT_PFX_ES):
+		case INAT_MAKE_PREFIX(INAT_PFX_CS):
+		case INAT_MAKE_PREFIX(INAT_PFX_DS):
+		case INAT_MAKE_PREFIX(INAT_PFX_SS):
+		case INAT_MAKE_PREFIX(INAT_PFX_LOCK):
 			return true;
 		}
 	}
...
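The rewritten check above resolves prefix bytes through the instruction decoder's attribute tables instead of hard-coding opcode values. A hedged in-kernel sketch of the same pattern, assuming a caller that already holds a code buffer (insn_init(), insn_get_length() and the inat helpers come from the x86 instruction decoder; buf_has_lock_prefix() is a made-up name for illustration):

	#include <linux/types.h>
	#include <asm/insn.h>
	#include <asm/inat.h>

	/* Return true if the instruction in @buf carries a LOCK prefix. */
	static bool buf_has_lock_prefix(const void *buf, int len, int x86_64)
	{
		struct insn insn;
		int i;

		insn_init(&insn, buf, len, x86_64);
		insn_get_length(&insn);	/* decodes prefixes, opcode, operands */

		for (i = 0; i < insn.prefixes.nbytes; i++) {
			insn_attr_t attr;

			attr = inat_get_opcode_attribute(insn.prefixes.bytes[i]);
			if (attr == INAT_MAKE_PREFIX(INAT_PFX_LOCK))
				return true;
		}
		return false;
	}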
@@ -33,7 +33,7 @@
 #include <asm/cpufeatures.h>
 #include <asm/msr-index.h>
-verify_cpu:
+ENTRY(verify_cpu)
 	pushf				# Save caller passed flags
 	push	$0			# Kill any dangerous flags
 	popf
@@ -139,3 +139,4 @@ verify_cpu:
 	popf				# Restore caller passed flags
 	xorl	%eax, %eax
 	ret
+ENDPROC(verify_cpu)
@@ -55,6 +55,7 @@
 #include <asm/irq.h>
 #include <asm/traps.h>
 #include <asm/vm86.h>
+#include <asm/switch_to.h>
 /*
  * Known problems:
@@ -94,7 +95,6 @@
 void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 {
-	struct tss_struct *tss;
 	struct task_struct *tsk = current;
 	struct vm86plus_struct __user *user;
 	struct vm86 *vm86 = current->thread.vm86;
@@ -146,12 +146,13 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 		do_exit(SIGSEGV);
 	}
-	tss = &per_cpu(cpu_tss, get_cpu());
+	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	load_sp0(tss, &tsk->thread);
+	update_sp0(tsk);
+	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
-	put_cpu();
+	preempt_enable();
 	memcpy(&regs->pt, &vm86->regs32, sizeof(struct pt_regs));
@@ -237,7 +238,6 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg)
 static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 {
-	struct tss_struct *tss;
 	struct task_struct *tsk = current;
 	struct vm86 *vm86 = tsk->thread.vm86;
 	struct kernel_vm86_regs vm86regs;
@@ -365,15 +365,17 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 	vm86->saved_sp0 = tsk->thread.sp0;
 	lazy_save_gs(vm86->regs32.gs);
-	tss = &per_cpu(cpu_tss, get_cpu());
 	/* make room for real-mode segments */
+	preempt_disable();
 	tsk->thread.sp0 += 16;
-	if (static_cpu_has(X86_FEATURE_SEP))
+	if (static_cpu_has(X86_FEATURE_SEP)) {
 		tsk->thread.sysenter_cs = 0;
+		refresh_sysenter_cs(&tsk->thread);
+	}
-	load_sp0(tss, &tsk->thread);
-	put_cpu();
+	update_sp0(tsk);
+	preempt_enable();
 	if (vm86->flags & VM86_SCREEN_BITMAP)
 		mark_screen_rdonly(tsk->mm);
...
@@ -24,7 +24,7 @@ lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
-lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
...
@@ -29,26 +29,6 @@
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
-/*
- * Page fault error code bits:
- *
- *   bit 0 ==	 0: no page found	1: protection fault
- *   bit 1 ==	 0: read access		1: write access
- *   bit 2 ==	 0: kernel-mode access	1: user-mode access
- *   bit 3 ==	 1: use of reserved bit detected
- *   bit 4 ==	 1: fault was an instruction fetch
- *   bit 5 ==	 1: protection keys block access
- */
-enum x86_pf_error_code {
-	PF_PROT		=	1 << 0,
-	PF_WRITE	=	1 << 1,
-	PF_USER		=	1 << 2,
-	PF_RSVD		=	1 << 3,
-	PF_INSTR	=	1 << 4,
-	PF_PK		=	1 << 5,
-};
 /*
  * Returns 0 if mmiotrace is disabled, or if the fault is not
  * handled by mmiotrace:
@@ -150,7 +130,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
 	 * If it was a exec (instruction fetch) fault on NX page, then
 	 * do not ignore the fault:
 	 */
-	if (error_code & PF_INSTR)
+	if (error_code & X86_PF_INSTR)
 		return 0;
 	instr = (void *)convert_ip_to_linear(current, regs);
@@ -180,7 +160,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr)
 	 * siginfo so userspace can discover which protection key was set
 	 * on the PTE.
 	 *
-	 * If we get here, we know that the hardware signaled a PF_PK
+	 * If we get here, we know that the hardware signaled a X86_PF_PK
 	 * fault and that there was a VMA once we got in the fault
 	 * handler. It does *not* guarantee that the VMA we find here
 	 * was the one that we faulted on.
@@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info, u32 *pkey)
 	/*
 	 * force_sig_info_fault() is called from a number of
 	 * contexts, some of which have a VMA and some of which
-	 * do not. The PF_PK handing happens after we have a
+	 * do not. The X86_PF_PK handing happens after we have a
 	 * valid VMA, so we should never reach this without a
 	 * valid VMA.
 	 */
@@ -698,7 +678,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 	if (!oops_may_print())
 		return;
-	if (error_code & PF_INSTR) {
+	if (error_code & X86_PF_INSTR) {
 		unsigned int level;
 		pgd_t *pgd;
 		pte_t *pte;
@@ -780,7 +760,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	 */
 	if (current->thread.sig_on_uaccess_err && signal) {
 		tsk->thread.trap_nr = X86_TRAP_PF;
-		tsk->thread.error_code = error_code | PF_USER;
+		tsk->thread.error_code = error_code | X86_PF_USER;
 		tsk->thread.cr2 = address;
 		/* XXX: hwpoison faults will set the wrong code. */
@@ -898,7 +878,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	struct task_struct *tsk = current;
 	/* User mode accesses just cause a SIGSEGV */
-	if (error_code & PF_USER) {
+	if (error_code & X86_PF_USER) {
 		/*
 		 * It's possible to have interrupts off here:
 		 */
@@ -919,7 +899,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		 * Instruction fetch faults in the vsyscall page might need
 		 * emulation.
 		 */
-		if (unlikely((error_code & PF_INSTR) &&
+		if (unlikely((error_code & X86_PF_INSTR) &&
 			     ((address & ~0xfff) == VSYSCALL_ADDR))) {
 			if (emulate_vsyscall(regs, address))
 				return;
@@ -932,7 +912,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		 * are always protection faults.
 		 */
 		if (address >= TASK_SIZE_MAX)
-			error_code |= PF_PROT;
+			error_code |= X86_PF_PROT;
 		if (likely(show_unhandled_signals))
 			show_signal_msg(regs, error_code, address, tsk);
@@ -993,11 +973,11 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 	if (!boot_cpu_has(X86_FEATURE_OSPKE))
 		return false;
-	if (error_code & PF_PK)
+	if (error_code & X86_PF_PK)
 		return true;
 	/* this checks permission keys on the VMA: */
-	if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
-				       (error_code & PF_INSTR), foreign))
+	if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+				       (error_code & X86_PF_INSTR), foreign))
 		return true;
 	return false;
 }
@@ -1025,7 +1005,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 	int code = BUS_ADRERR;
 	/* Kernel mode? Handle exceptions or die: */
-	if (!(error_code & PF_USER)) {
+	if (!(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
 		return;
 	}
@@ -1053,14 +1033,14 @@ static noinline void
 mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, u32 *pkey, unsigned int fault)
 {
-	if (fatal_signal_pending(current) && !(error_code & PF_USER)) {
+	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address, 0, 0);
 		return;
 	}
 	if (fault & VM_FAULT_OOM) {
 		/* Kernel mode? Handle exceptions or die: */
-		if (!(error_code & PF_USER)) {
+		if (!(error_code & X86_PF_USER)) {
 			no_context(regs, error_code, address,
 				   SIGSEGV, SEGV_MAPERR);
 			return;
@@ -1085,16 +1065,16 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 static int spurious_fault_check(unsigned long error_code, pte_t *pte)
 {
-	if ((error_code & PF_WRITE) && !pte_write(*pte))
+	if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
 		return 0;
-	if ((error_code & PF_INSTR) && !pte_exec(*pte))
+	if ((error_code & X86_PF_INSTR) && !pte_exec(*pte))
 		return 0;
 	/*
 	 * Note: We do not do lazy flushing on protection key
-	 * changes, so no spurious fault will ever set PF_PK.
+	 * changes, so no spurious fault will ever set X86_PF_PK.
 	 */
-	if ((error_code & PF_PK))
+	if ((error_code & X86_PF_PK))
 		return 1;
 	return 1;
@@ -1140,8 +1120,8 @@ spurious_fault(unsigned long error_code, unsigned long address)
 	 * change, so user accesses are not expected to cause spurious
 	 * faults.
 	 */
-	if (error_code != (PF_WRITE | PF_PROT)
-	    && error_code != (PF_INSTR | PF_PROT))
+	if (error_code != (X86_PF_WRITE | X86_PF_PROT) &&
+	    error_code != (X86_PF_INSTR | X86_PF_PROT))
 		return 0;
 	pgd = init_mm.pgd + pgd_index(address);
@@ -1201,19 +1181,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 	 * always an unconditional error and can never result in
 	 * a follow-up action to resolve the fault, like a COW.
 	 */
-	if (error_code & PF_PK)
+	if (error_code & X86_PF_PK)
 		return 1;
 	/*
 	 * Make sure to check the VMA so that we do not perform
-	 * faults just to hit a PF_PK as soon as we fill in a
+	 * faults just to hit a X86_PF_PK as soon as we fill in a
 	 * page.
 	 */
-	if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE),
-				       (error_code & PF_INSTR), foreign))
+	if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE),
+				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
-	if (error_code & PF_WRITE) {
+	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
@@ -1221,7 +1201,7 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
 	}
 	/* read, present: */
-	if (unlikely(error_code & PF_PROT))
+	if (unlikely(error_code & X86_PF_PROT))
 		return 1;
 	/* read, not present: */
@@ -1244,7 +1224,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs)
 	if (!static_cpu_has(X86_FEATURE_SMAP))
 		return false;
-	if (error_code & PF_USER)
+	if (error_code & X86_PF_USER)
 		return false;
 	if (!user_mode(regs) && (regs->flags & X86_EFLAGS_AC))
@@ -1297,7 +1277,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * protection error (error_code & 9) == 0.
 	 */
 	if (unlikely(fault_in_kernel_space(address))) {
-		if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) {
+		if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
 			if (vmalloc_fault(address) >= 0)
 				return;
@@ -1325,7 +1305,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	if (unlikely(kprobes_fault(regs)))
 		return;
-	if (unlikely(error_code & PF_RSVD))
+	if (unlikely(error_code & X86_PF_RSVD))
 		pgtable_bad(regs, error_code, address);
 	if (unlikely(smap_violation(error_code, regs))) {
@@ -1351,7 +1331,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 */
 	if (user_mode(regs)) {
 		local_irq_enable();
-		error_code |= PF_USER;
+		error_code |= X86_PF_USER;
 		flags |= FAULT_FLAG_USER;
 	} else {
 		if (regs->flags & X86_EFLAGS_IF)
@@ -1360,9 +1340,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
-	if (error_code & PF_WRITE)
+	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
-	if (error_code & PF_INSTR)
+	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 	/*
@@ -1382,7 +1362,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 	 * space check, thus avoiding the deadlock:
 	 */
 	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
-		if ((error_code & PF_USER) == 0 &&
+		if (!(error_code & X86_PF_USER) &&
 		    !search_exception_tables(regs->ip)) {
 			bad_area_nosemaphore(regs, error_code, address, NULL);
 			return;
@@ -1409,7 +1389,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
 		bad_area(regs, error_code, address);
 		return;
 	}
-	if (error_code & PF_USER) {
+	if (error_code & X86_PF_USER) {
 		/*
 		 * Accessing the stack below %sp is always a bug.
 		 * The large cushion allows instructions like enter
...
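The PF_* definitions removed at the top of this file were not dropped: every user in the hunks above now spells them X86_PF_*. As a hedged sketch reconstructed from the removed comment and the new names (the authoritative definition presumably moved to a shared x86 header that is collapsed in this view), the relocated flags look roughly like:

	/* Sketch only; bit meanings carried over from the removed comment. */
	enum x86_pf_error_code {
		X86_PF_PROT	= 1 << 0,	/* 0: no page found    1: protection fault */
		X86_PF_WRITE	= 1 << 1,	/* 0: read access      1: write access     */
		X86_PF_USER	= 1 << 2,	/* 0: kernel-mode      1: user-mode access */
		X86_PF_RSVD	= 1 << 3,	/* 1: use of reserved bit detected         */
		X86_PF_INSTR	= 1 << 4,	/* 1: fault was an instruction fetch       */
		X86_PF_PK	= 1 << 5,	/* 1: protection keys block access         */
	};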
@@ -1426,16 +1426,16 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HAVE_BOOTMEM_INFO_NODE)
 void register_page_bootmem_memmap(unsigned long section_nr,
-				  struct page *start_page, unsigned long size)
+				  struct page *start_page, unsigned long nr_pages)
 {
 	unsigned long addr = (unsigned long)start_page;
-	unsigned long end = (unsigned long)(start_page + size);
+	unsigned long end = (unsigned long)(start_page + nr_pages);
 	unsigned long next;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd;
-	unsigned int nr_pages;
+	unsigned int nr_pmd_pages;
 	struct page *page;
 	for (; addr < end; addr = next) {
@@ -1482,9 +1482,9 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 		if (pmd_none(*pmd))
 			continue;
-		nr_pages = 1 << (get_order(PMD_SIZE));
+		nr_pmd_pages = 1 << get_order(PMD_SIZE);
 		page = pmd_page(*pmd);
-		while (nr_pages--)
+		while (nr_pmd_pages--)
 			get_page_bootmem(section_nr, page++,
 					 SECTION_INFO);
 	}
...
@@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 	unsigned long start;
 	int ret;
-	/* Nothing to do if the SME is not active */
-	if (!sme_active())
+	/* Nothing to do if memory encryption is not active */
+	if (!mem_encrypt_active())
 		return 0;
 	/* Should not be working on unaligned addresses */
...
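The hunk above (like the reserve_initrd() and setup_real_mode() hunks) distinguishes three related predicates: sme_active() for host-side SME only, sev_active() for an SEV guest, and mem_encrypt_active() for either case. A hedged, illustrative sketch of how they relate; these are not the kernel's actual definitions, which are driven by the SME encryption mask and SEV state:

	/* Illustrative sketch only; assumed state flags, not real kernel code. */
	static bool sme_me_mask_set;	/* assumed: an encryption mask is configured */
	static bool sev_guest;		/* assumed: running as an SEV guest          */

	static inline bool mem_encrypt_active(void)	/* SME or SEV */
	{
		return sme_me_mask_set;
	}

	static inline bool sme_active(void)		/* host SME, not SEV */
	{
		return sme_me_mask_set && !sev_guest;
	}

	static inline bool sev_active(void)		/* SEV guest */
	{
		return sme_me_mask_set && sev_guest;
	}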
@@ -64,8 +64,9 @@ static void __init setup_real_mode(void)
 	/*
 	 * If SME is active, the trampoline area will need to be in
 	 * decrypted memory in order to bring up other processors
-	 * successfully.
+	 * successfully. This is not needed for SEV.
 	 */
-	set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+	if (sme_active())
+		set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
 	memcpy(base, real_mode_blob, size);
...
@@ -6,6 +6,7 @@
 #include <linux/mm.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/syscalls.h>
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
 #include <os.h>
@@ -369,7 +370,9 @@ void free_ldt(struct mm_context *mm)
 	mm->arch.ldt.entry_count = 0;
 }
-int sys_modify_ldt(int func, void __user *ptr, unsigned long bytecount)
+SYSCALL_DEFINE3(modify_ldt, int , func , void __user * , ptr ,
+		unsigned long , bytecount)
 {
-	return do_modify_ldt_skas(func, ptr, bytecount);
+	/* See non-um modify_ldt() for why we do this cast */
+	return (unsigned int)do_modify_ldt_skas(func, ptr, bytecount);
 }
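The UML stub above now goes through SYSCALL_DEFINE3 like the regular x86 modify_ldt(). For reference, user space reaches this syscall via syscall(2), since glibc provides no wrapper; a minimal sketch (the descriptor values below are arbitrary and merely install an empty LDT entry):

	#include <asm/ldt.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <string.h>
	#include <stdio.h>

	int main(void)
	{
		struct user_desc desc;
		long ret;

		memset(&desc, 0, sizeof(desc));
		desc.entry_number = 0;
		desc.read_exec_only = 1;	/* together with seg_not_present == 1, */
		desc.seg_not_present = 1;	/* this describes an empty descriptor  */

		/* func 1 == write an LDT entry */
		ret = syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));
		printf("modify_ldt returned %ld\n", ret);
		return ret ? 1 : 0;
	}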